1458 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 6, DECEMBER 2009

Faster Self-Organizing Fuzzy Neural Network Training and a Hyperparameter Analysis for a Brain–Computer Interface

Damien Coyle, Member, IEEE, Girijesh Prasad, Senior Member, IEEE, and Thomas Martin McGinnity, Member, IEEE

Abstract—This paper introduces a number of modifications to the learning algorithm of the self-organizing fuzzy neural network (SOFNN) to improve computational efficiency. It is shown that the modified SOFNN compares favorably to other evolving fuzzy systems in terms of accuracy and structural complexity. An analysis of the SOFNN's effectiveness when applied in an electroencephalogram (EEG)-based brain–computer interface (BCI) involving the neural-time-series-prediction-preprocessing (NTSPP) framework is also presented, where a sensitivity analysis (SA) of the SOFNN hyperparameters was performed using EEG data recorded from three subjects during left/right-motor-imagery-based BCI experiments. The aim of this one-time SA was to eliminate the need to choose subject- and signal-specific hyperparameters for the SOFNN and thus apply the SOFNN in the NTSPP framework as a parameterless self-organizing framework for EEG preprocessing. The results indicate that a general set of NTSPP parameters chosen via the SA provides the best results when tested in a BCI system. Therefore, with this general set of SOFNN parameters and its self-organizing structure, in conjunction with parameterless feature extraction and linear discriminant classification, a fully parameterless BCI that lends itself well to autonomous adaptation is realizable.

Index Terms—Autonomous, brain–computer interface (BCI), electroencephalogram (EEG), fuzzy neural network (NN), self-organization, time-series prediction.

I. INTRODUCTION

In recent years, there has been significant emphasis on developing self-organizing fuzzy systems that continuously evolve and adapt to nonstationary dynamics in complex data sets [1]–[8]. Many of the developments have successfully been used for applications such as function approximation, system identification, and time-series prediction and are often tested on benchmark problems such as the two-input nonlinear-sinc problem, Mackey–Glass time-series prediction, and others [1]–[4], [9].

An example of a network with an online self-organizing training algorithm is the self-organizing fuzzy neural network (SOFNN) [1], [2].

Manuscript received July 1, 2008; revised December 23, 2008. First published May 29, 2009; current version published November 18, 2009. This paper was recommended by Associate Editor T. H. Lee.

The authors are with the Intelligent Systems Research Center, School of Computing and Intelligent Systems, Faculty of Computing and Engineering, University of Ulster, BT48 7JL Londonderry, U.K. (e-mail: dh.coyle@ulster.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2009.2018469

The SOFNN is capable of self-organizing its architecture, adding and pruning neurons as required. New neurons are added to cluster new data that the existing neurons are unable to cluster. Inevitably, if the data are highly nonlinear and nonstationary, then as training progresses and more data are fed to the network, the structure complexity increases, and the training efficiency begins to degrade. This is due to the necessity to calculate each neuron's firing strength, for a greater number of neurons, for all data previously presented to the network, to ensure that changes to the network do not affect the network's ability to optimally cope with older, previously learned data dynamics. This problem occurs if the structure update algorithm depends on information contained in the error derived from the existing network, which is often the case. The problem is amplified if the neurons contain fuzzy membership functions (MFs) that are expensive to compute, e.g., the exponential function.

There are algorithms in the literature that have tackled this problem, but the fuzzy-based approaches have a structure that restricts utilization of many of the proposed techniques. For example, the growing and pruning radial basis function network (GAP-RBFN) [10] enables determination of the effect of neuron removal on the whole structure by only checking that neuron's contribution to the network. For fuzzy-based approaches such as the SOFNN [1], [2], this method cannot be applied because the fourth layer of the network contains the consequent part of the fuzzy rules in the fuzzy inference system (FIS), which employs a global learning strategy. A single fuzzy rule does not determine the system output alone; it works conjunctively with all fuzzy rules in the FIS; therefore, the SOFNN error cannot be determined by removing a neuron and implementing the same technique described for the GAP-RBFN [10]. Furthermore, as the fuzzy rules of the SOFNN are based on zero- or first-order Takagi–Sugeno (TS) models, the linear parameters (consequent parameters of the fuzzy rules) are updated using the recursive least squares estimator (RLSE); however, if the structure is modified during training, the parameters must be estimated using the least squares estimator (LSE). Therefore, the SOFNN is only recursive if the network structure remains unchanged. A number of approaches have been developed for online recursive estimation of the consequent parameters and local learning rules [4]–[7], although in this paper it is shown that these algorithms are not as accurate as the SOFNN, even though they normally evolve complex structures.



Fig. 1. SOFNN.

To improve the computational efficiency of the SOFNN [1], [2], a new method of checking the network structure after it has been modified is proposed. Instead of testing the entire structure every time it has been modified, a record is kept of each neuron's firing strength for all data previously clustered by the network. This record is updated as training progresses and is used to reduce the computational load of checking network structure changes to ensure that performance degradation does not occur, resulting in significantly reduced training times.

Applying the SOFNN in a BCI: Extensive research undertaken over the last 15 years has produced substantial evidence that humans can become skilled at manipulating the amplitude and frequency of rhythms in their electroencephalogram (EEG). A person's ability to manipulate his/her EEG can enable him/her to communicate devoid of the prerequisite of neuromuscular control that is necessary for the usual methods of communication, such as speech or sign language. EEG-based communication can be realized via a brain–computer interface (BCI) and may be beneficial for people with neuromuscular disorders [11]–[20].

A framework for EEG preprocessing, referred to as neural-time-series-prediction preprocessing (NTSPP), has recently been proposed [19]–[22]. The NTSPP framework involves training prediction/regression models to predict different types of EEG time series m steps ahead, after which features are extracted from the predicted signals. This process of extracting features from the predicted signals rather than the original signals has a number of benefits, the most significant being that the predicted signals are more separable than the original signals: features extracted from the predicted signals are more separable in terms of the Euclidean distance between class means, reduced interclass correlation, and reduced intraclass variance [19], [20]. Additionally, multiple-step-ahead prediction in the NTSPP framework can improve the information transfer (IT) rate [21], and the NTSPP framework allows other BCI components (e.g., feature extraction and classification) to be applied more efficiently, reducing the need for subject-specific parameters.
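To make the NTSPP arrangement concrete, the following minimal sketch (Python/NumPy) wires up one predictor per class/channel combination, trains each on EEG of its own class, and then passes every new trial through all predictors so that features can later be extracted from the predicted time series. The class names, the fit/predict interface, and the trivial stand-in predictor are hypothetical; in the paper, each predictor is an SOFNN.

    import numpy as np

    class LagPredictor:
        """Trivial stand-in one-step-ahead predictor (not the SOFNN): x_hat(t+1) = a * x(t)."""
        def fit(self, x):
            self.a = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
        def predict(self, x):
            return self.a * x

    class NTSPP:
        """One predictor per (imagery class, EEG channel) pair, as illustrated in Fig. 3."""
        def __init__(self, make_predictor, classes=("left", "right"), channels=("C3", "C4")):
            self.predictors = {(cl, ch): make_predictor() for cl in classes for ch in channels}

        def fit(self, training_eeg):
            # training_eeg[(cl, ch)]: EEG from channel ch recorded during imagery of class cl
            for key, predictor in self.predictors.items():
                predictor.fit(training_eeg[key])

        def transform(self, trial):
            # trial[ch]: a new, unlabeled trial; every channel is fed through every
            # class-specialized predictor, so two recorded channels yield four
            # predicted time series from which features are then extracted
            return {key: p.predict(trial[key[1]]) for key, p in self.predictors.items()}

    # Hypothetical usage with random stand-in data
    rng = np.random.default_rng(0)
    train = {(cl, ch): rng.standard_normal(500)
             for cl in ("left", "right") for ch in ("C3", "C4")}
    ntspp = NTSPP(LagPredictor)
    ntspp.fit(train)
    predicted = ntspp.transform({"C3": rng.standard_normal(500), "C4": rng.standard_normal(500)})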

In [21], the SOFNN [1], [2] was applied in the NTSPP framework using arbitrarily chosen hyperparameters. When applying any algorithm to EEG for a BCI, completely autonomous adaptation to each individual and each signal is desirable, and therefore, subject-specific parameters should be minimized. The self-organizing structure of the SOFNN lends itself well to this requirement, although it was noted from the initial experimentation of the SOFNN in the NTSPP

Fig. 2. Structure of the jth neuron Rj within the EBF layer.

framework [21] that, to utilize the algorithm more efficiently and to improve the transparency of its predefined parameters, it was necessary to investigate the sensitivity of the network performance to each of five predefined parameters. This paper shows that parameters chosen via a sensitivity analysis (SA) can enhance the SOFNN's automaticity and applicability to the NTSPP framework and that these parameters enable an efficient parameterless BCI involving the NTSPP framework with parameterless feature extraction procedures (FEPs) and a linear discriminant classifier.

This paper is outlined as follows. Sections II and III describe the SOFNN algorithm. In Section IV, a novel modified version of the SOFNN learning algorithm is introduced. Section V outlines the data acquisition and configuration for applying and analyzing the SOFNN in predicting different types of EEG time series in the NTSPP framework; an SA technique for assessing the SOFNN parameters is also described. Section VI provides details of the NTSPP framework and Hjorth- and Barlow-based feature extraction. Sections VII and VIII provide the results, discussion, and conclusions.

II. ARCHITECTURE OF THE SOFNN

The SOFNN is a five-layer fuzzy neural network (NN) and has the ability to self-organize its neurons in the learning process for implementing TS fuzzy models [23] (cf. Fig. 1). In the elliptical basis function (EBF) layer, each neuron is a T-norm of Gaussian fuzzy MFs belonging to the inputs of the network. Every MF thus has a distinct center and width; therefore, every neuron has a center and a width vector. Fig. 2


illustrates the internal structure of the jth neuron, where the input vector is x = [x1, x2, . . . , xr], cj = [c1j, c2j, . . . , crj] is the vector of centers in the jth neuron, and σj = [σ1j, σ2j, . . . , σrj] is the vector of widths in the jth neuron. Layer 1 is the input layer with r neurons xi, i = 1, 2, . . . , r. Layer 2 is the EBF layer. Each neuron in this layer represents a premise part of a fuzzy rule. The outputs of the EBF neurons are computed as products of the grades of the MFs. Each MF is in the form of a Gaussian function

μij = exp[−(xi − cij)² / (2σij²)],    j = 1, 2, . . . , u        (1)

where
μij is the ith MF in the jth neuron;
cij is the center of the ith MF in the jth neuron;
σij is the width of the ith MF in the jth neuron;
r is the number of input variables;
u is the number of EBF neurons.

For the jth neuron, the output is

φj = exp[−Σ_{i=1}^{r} (xi − cij)² / (2σij²)],    j = 1, 2, . . . , u.        (2)

Layer 3 is the normalized layer. The number of neurons in this layer is equal to that in layer 2. The output of the jth neuron in this layer is

ψj = φj / Σ_{k=1}^{u} φk,    j = 1, 2, . . . , u.        (3)

Layer 4 is the weighted layer. Each neuron in this layer has two inputs, and the product of these inputs is its output. One of the inputs is the output of the related neuron in layer 3, and the other is the weighted bias w2j. For the TS model [23], the bias is B = [1, x1, x2, . . . , xr]^T, and Aj = [aj0, aj1, aj2, . . . , ajr] represents the set of parameters corresponding to the consequent of fuzzy rule j, which are obtained using the LSE or RLSE. The weighted bias w2j is

w2j = Aj · B = aj0 + aj1 x1 + · · · + ajr xr,    j = 1, 2, . . . , u.        (4)

This is the consequent part of the jth fuzzy rule of the fuzzy model. The output of each neuron is fj = w2j ψj.

Layer 5 is the output layer, where the incoming signals from layer 4 are summed, as shown in

y(x) = Σ_{j=1}^{u} fj        (5)

where y is the value of an output variable. If u neurons are generated from n training exemplars, then the output of the network can be written as

Y = W2Ψ (6)

where for the TS model

Y = [ y1 y2 · · · yn ] (7)

Ψ = [ ψ11      · · ·  ψ1n
      ψ11x11   · · ·  ψ1nx1n
        ⋮               ⋮
      ψ11xr1   · · ·  ψ1nxrn
        ⋮               ⋮
      ψu1      · · ·  ψun
      ψu1x11   · · ·  ψunx1n
        ⋮               ⋮
      ψu1xr1   · · ·  ψunxrn ]        (8)

W2 = [ a10  a11  · · ·  a1r  · · ·  au0  au1  · · ·  aur ]        (9)

where W2 is the parameter matrix, and ψjt is the output of the jth neuron in the normalized layer for the tth training exemplar.
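To make the layer-by-layer computation in (1)–(6) concrete, the following sketch evaluates a fixed SOFNN structure for a batch of inputs. It is a minimal illustration only, not the training algorithm; the centers, widths, and consequent parameters are hypothetical placeholders.

    import numpy as np

    def sofnn_forward(X, centers, widths, A):
        """Forward pass of a fixed SOFNN structure, following equations (1)-(5).

        X       : (n, r) input exemplars
        centers : (u, r) center vector c_j of each EBF neuron
        widths  : (u, r) width vector sigma_j of each EBF neuron
        A       : (u, r + 1) consequent parameters [a_j0, a_j1, ..., a_jr]
        returns : (n,) network outputs y(x)
        """
        n, r = X.shape
        # Layer 2 (EBF): firing strength of each neuron, eq. (2)
        diff = X[:, None, :] - centers[None, :, :]                                    # (n, u, r)
        phi = np.exp(-np.sum(diff ** 2 / (2.0 * widths[None, :, :] ** 2), axis=2))    # (n, u)
        # Layer 3: normalization, eq. (3)
        psi = phi / np.sum(phi, axis=1, keepdims=True)                                # (n, u)
        # Layer 4: weighted bias w2_j = A_j . B with B = [1, x1, ..., xr], eq. (4)
        B = np.hstack([np.ones((n, 1)), X])                                           # (n, r + 1)
        w2 = B @ A.T                                                                  # (n, u)
        # Layer 5: sum of f_j = w2_j * psi_j, eq. (5)
        return np.sum(w2 * psi, axis=1)

    # Hypothetical two-input structure with three EBF neurons
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 2))
    y = sofnn_forward(X, centers=rng.standard_normal((3, 2)),
                      widths=np.full((3, 2), 0.5), A=rng.standard_normal((3, 3)))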

III. SOFNN LEARNING ALGORITHM

The learning process of the SOFNN includes structure learning and parameter learning. The structure learning process attempts to achieve an economical network size by dynamically modifying, adding, and/or pruning neurons. There are two criteria for judging whether to generate a new EBF neuron: the system error criterion and the if-part criterion. The error criterion considers the generalization performance of the overall network. The if-part criterion evaluates whether existing fuzzy rules or EBF neurons can suitably cluster the current input vector. The SOFNN pruning strategy is based on the optimal-brain-surgeon approach [24]. Basically, the idea is to use second-derivative information to find the least important neuron. If the performance of the entire network is accepted when the least important neuron is pruned, the new structure of the network is maintained.

This section only provides a basic outline of the structure learning process; the complete structure and weight learning algorithm for the SOFNN is detailed in [1] and [2]. It must be noted that the neuron modifying, adding, and pruning procedures are fully dependent upon determining the network error as the structure changes; therefore, a significant amount of network testing is necessary, either to update the structure based on finalized neuron changes or simply to check whether a temporarily deleted neuron is significant. This can be computationally demanding, and therefore, an alternative approach that minimizes the computational cost of error checking during the learning process is described below.

IV. FASTER SOFNN STRUCTURE LEARNING PROCEDURE

Updating the structure requires updating the matrix Ψ, as shown in (8), after which the linear parameters are estimated using the LSE, i.e., nonrecursively, after structure changes only. As can be seen from (3) and (8), the Ψ matrix requires that all neuron outputs are calculated for every input exemplar that has already entered the network. The time required is therefore on the order of O(nUM²), where n is the number of training data


that have already entered the network, U is the number of times the structure needs to be updated, and M = u · (r + 1), where r is the number of input variables, and u is the number of EBF neurons in the network structure. In small training sets (n < 2000) with a relatively small input training dimension (r < 7), the number of neurons added may not be large (depending on the data complexity); therefore, O(nUM²) is normally low and tolerable. However, real-world data sets can often exhibit complex dynamics and are inherently large. In the case of EEG data sets, n is large and can thus result in long training times; therefore, there is a need to reduce some of the other variables in O(nUM²).

In this paper, it is proposed to reduce the training time by reducing the time required to perform tests on the structure when updating the structure and evaluating the importance of neurons. To do this, as training progresses, a record of intermediate network responses can be stored, which contains each EBF neuron's most recent firing strength, denoted as Φ ∈ R^(u×n), and the normalization factor for each input that previously entered the network, denoted as ξ ∈ R^(1×n), i.e., the outputs from each neuron in layer 2 and the sum of the outputs from all neurons in layer 2 are stored. Then, when checking structural modifications during training, instead of calculating the firing strength for all neurons in layer 2 for all previously entered inputs, only those neurons that have been changed should be checked. However, there is an implication in doing this. As can be seen from (3), ψj, the output of each neuron in layer 3, depends on the normalization factor, which is the sum of the firing strengths of all neurons for each input exemplar. Therefore, if any neuron is modified, added, or deleted, the normalization factor for all inputs that previously entered the network is changed, and therefore, all historical data become invalid. To overcome this problem, if a neuron, e.g., q, is changed, then that neuron's previous firing strength is obtained from the stored data and deducted from the saved normalization factor for each of the previously entered inputs. Then, the firing strength of neuron q for all inputs is recalculated. The new firing strength is stored in place of the old firing strength and added to the normalization factor. Subsequently, the output from layer 3 can be obtained by dividing all the saved firing strengths by the respective normalization factor for all previously clustered data. To demonstrate this, consider that n data exemplars have entered the network and u neurons have been created; then, Φ is a u × n matrix containing the firing strength of each neuron for all previously entered data, and ξ is a 1 × n vector containing the normalization factor for all previously clustered data. Therefore, we have the following cases for neuron/network changes.

1) Neuron modifying: If a neuron q has been modified, then first, its contribution to the normalization factor is excluded using

ξi,new = ξi,old − Φqi,    i = 1, 2, . . . , n        (10)

and the firing strength ςi of the qth neuron for the ith input exemplar is calculated using (2) and (3). This is carried out for all previously entered data (i.e., i = 1, 2, . . . , n). The qth row of the saved firing strength matrix Φ is subsequently updated using (11), and the normalization factor is updated using (12) [note that (10) must be calculated prior to (12), after which ξi,new obtained using (10) becomes ξi,old in (12)]

Φqi = ςi, i = 1, 2, . . . , n (11)

ξi,new = ξi,old + ςi,    i = 1, 2, . . . , n.        (12)

2) Neuron adding: If a new neuron q has been added to the structure, then the firing strength ςi of that neuron for all data previously entered into the structure is calculated, after which Φ is updated using (11) and ξ is obtained using (12).

3) Neuron deleting: If the qth neuron has been deleted, ξ is updated using (10), after which the qth row of Φ is removed.

Subsequent to any network changes and the respective changes to the data stored in Φ and ξ, the normalized firing strength for all data and all neurons (the output of layer 3) is simply and efficiently calculated by dividing each row of Φ by ξ.
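The bookkeeping in (10)–(12) can be summarized by the following sketch, assuming the stored record Φ (u × n firing strengths) and ξ (1 × n normalization factors) described above. The function names are hypothetical, and the SOFNN-specific criteria for deciding when to modify, add, or prune a neuron are omitted.

    import numpy as np

    def firing_strengths(X_hist, center, width):
        """Eq. (2) for one EBF neuron over all previously entered exemplars (shape (n,))."""
        return np.exp(-np.sum((X_hist - center) ** 2 / (2.0 * width ** 2), axis=1))

    def modify_neuron(Phi, xi, q, X_hist, new_center, new_width):
        """Case 1: neuron q was modified -- swap its stored firing strengths, eqs. (10)-(12)."""
        xi = xi - Phi[q]                                            # remove old contribution, eq. (10)
        Phi[q] = firing_strengths(X_hist, new_center, new_width)    # store new strengths, eq. (11)
        return Phi, xi + Phi[q]                                     # add new contribution, eq. (12)

    def add_neuron(Phi, xi, X_hist, center, width):
        """Case 2: a new neuron is appended; its strengths join Phi and xi."""
        new_row = firing_strengths(X_hist, center, width)
        return np.vstack([Phi, new_row]), xi + new_row

    def delete_neuron(Phi, xi, q):
        """Case 3: neuron q is removed; its contribution leaves xi and its row leaves Phi."""
        xi = xi - Phi[q]
        return np.delete(Phi, q, axis=0), xi

    def normalized_outputs(Phi, xi):
        """Layer-3 outputs for all stored data: each row of Phi divided by xi, eq. (3)."""
        return Phi / xi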

V. SOFNN SETUP AND ANALYSIS FOR NTSPP

A. Data Acquisition

The data used in this paper were recorded from three subjects (S1–S3) in a timed recording procedure [11], [26]. All signals were sampled at 125 Hz and filtered between 0.5 and 30 Hz. Two bipolar EEG channels were measured using two electrodes positioned 2.5 cm posterior and anterior to positions C3 and C4 according to the international standard (10/20 system) [27]. Data were recorded during virtual reality feedback and gaming experiments, where the task was to control and perform decisions in various types of virtual environments or to control the horizontal position of a ball falling from the top to the bottom of the monitor of a personal computer using motor imagery (cf. [26] for details). The timing of the recording paradigm and feedback duration was similar for all subjects [11]. The data sets for subjects S1–S3 consist of 476, 1080, and 1080 trials, respectively. Each trial lasts ∼10 s, of which 4–5 s is related to left/right movement imagery, i.e., event related.

B. Data Configuration

There are four different time series recorded during the experiments described above, i.e., left (C3 and C4) and right (C3 and C4). These will be referred to as l3, l4, r3, and r4. In the NTSPP framework, each of these time series is predicted using a different SOFNN (cf. Fig. 3 for an illustration). For prediction, the recorded EEG time-series data are structured so that the signal measurements from sample indices t to t − (Δ − 1)τ are used to make a prediction of the signal at sample index t + m. The parameter Δ is the embedding dimension, and

ĉx(t + m) = f (cx(t), . . . , cx(t − (Δ − 1)τ))        (13)

where τ is the time delay, m is the prediction horizon (minus τ), f(·) is the SOFNN model, cx is the recorded signal (i.e., l3, r3, l4, or r4), and ĉx is the predicted signal. There are many techniques used for selecting the best values of Δ and τ [28]; however,


Fig. 3. Illustration of the NTSPP framework, FEP, and classifier.

this paper was not concerned with selecting the optimal Δ and τ values, although an SA was carried out during tests with τ ranging from 1 to 10 and m ranging from 1 to 150 in steps of 50 (i.e., four values of m).
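As a concrete reading of (13), the following sketch builds the input/target pairs used to train one predictor from a single recorded channel; the embedding dimension Δ, lag τ, and horizon m are the quantities defined above, and the example values are only illustrative.

    import numpy as np

    def embed_for_prediction(c, delta, tau, m):
        """Build (input, target) pairs according to eq. (13):
        inputs  [c(t), c(t - tau), ..., c(t - (delta - 1) * tau)]
        targets c(t + m)."""
        start = (delta - 1) * tau                 # first t with a full history available
        stop = len(c) - m                         # last t with a target m steps ahead
        X = np.array([[c[t - k * tau] for k in range(delta)]
                      for t in range(start, stop)])
        y = np.array([c[t + m] for t in range(start, stop)])
        return X, y

    # Hypothetical usage: embedding dimension 6, lag 1, one-step-ahead prediction,
    # matching the EEG prediction configuration described in Section VII-A
    eeg = np.sin(np.linspace(0, 20, 1000))        # stand-in for a recorded EEG channel
    X, y = embed_for_prediction(eeg, delta=6, tau=1, m=1)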

C. Sensitivity Analysis

For an autonomous BCI, the signal processing algorithm must adapt to the characteristics of each individual's EEG, and it is expected to continue to autonomously adapt to the evolving dynamics of the EEG. In [25], a method to measure the automaticity of an algorithm based on its sensitivity to the training parameters is described, i.e., a sensitivity analysis (SA). The goal is to find how sensitive the performance is to changes in the predefined parameters. For example, if a slight change in a parameter θi results in a significant change in the prediction error and/or structural complexity, then this parameter is critical and requires considered selection. The five SOFNN parameters that can be predefined are the error tolerance (δ), the desired training root-mean-square error (RMSE) of the network (krmse), the distance threshold vector (kd), the initial width vector (σ0), and the error tolerance multiplier (λ). All these parameters are standard among many evolving fuzzy algorithms and, in many cases, are not analyzed in great detail. The selection of these types of parameters is often carried out arbitrarily, or expert knowledge is used; therefore, these parameters are often neglected or not identified as predefined. For the SOFNN, selection can also be achieved using the expert recommendations presented in [1] and [2]; however, for this BCI study, the objective was to critically assess the effects of these parameters on the overall BCI performance.

The sensitivity s of parameter θi (here, i = 1 to 5) is [25]

s = [1/(κ − 1)] · [1/(πmax − πmin)] · Σ_{j=1}^{κ−1} |π(θi^(j+1)) − π(θi^(j))|        (14)

where κ is the number of samples of a particular parameter, π(θi) is the performance of the network when using a particular value of θi, and πmin and πmax are, respectively, the minimum and maximum values of the chosen performance measure. For this SA, the performance measure π(θi) is the RMSE in predicting a test data set, πmin = 0.001, and πmax = 0.1; the RMSE almost always falls within this range. For each parameter combination, an SOFNN was trained on 1400 samples of event-related data (350 samples from each of four randomly selected trials) for each of the four EEG signal types, i.e., l3, l4, r3, and r4. For subjects 1–3, respectively, 136, 276, and 276 trials were used for testing the SOFNN during the SA. Tests were performed on 4 s of event-related data, which resulted in a maximum test set of (4 × 125) × 276 = 138 000 test samples. For each iteration of the training and testing procedure, the predefined parameter combinations were varied in the following ranges: kd = 0.05, 0.1, . . . , 1; krmse = 0.01, 0.02, . . . , 0.1; δ = 0.01, 0.02, . . . , 0.2; σ0 = 0.1, 0.2, . . . , 1; and λ = 0.05, 0.1, . . . , 1.05. While each of these parameters was varied in its respective range, the other parameters were initialized as follows: kd = 0.15, krmse = 0.01, δ = 0.02, σ0 = 0.2, and λ = 1. The parameter value that achieved the minimum RMSE for each subject was averaged over the three subjects, and the standard deviation (STD) from the average is reported in Section VII-C (for each of the four signal types). Results for τ = 1 and m = 1 and for τ = 10 and m = 150 (i.e., the minimum and maximum time lag and prediction horizon) are presented.
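A small sketch of how (14) can be evaluated for one parameter sweep is given below; the RMSE values are hypothetical, and πmin and πmax are the fixed bounds quoted above.

    import numpy as np

    def sensitivity(perf, pi_min=0.001, pi_max=0.1):
        """Eq. (14): mean absolute change in performance between consecutive settings
        of one parameter, normalized by the assumed performance range.
        perf[j] is the test RMSE obtained with the j-th value of the parameter."""
        perf = np.asarray(perf, dtype=float)
        kappa = len(perf)
        return np.sum(np.abs(np.diff(perf))) / ((kappa - 1) * (pi_max - pi_min))

    # Hypothetical sweep of one hyperparameter (e.g., lambda over 0.05, 0.1, ..., 1.05)
    rmse_per_setting = [0.021, 0.019, 0.024, 0.031, 0.028]
    s = sensitivity(rmse_per_setting)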

VI. BCI EXPERIMENTATION

A. BCI With NTSPP

To verify the practicality of the parameters selected through the SA, first, tests of a BCI involving NTSPP using SOFNNs trained with the parameters selected via the SA were performed, and the results were compared with those obtained using arbitrarily selected parameters (cf. Table VIII in Section VII-C). The amount of training data used to train the SOFNNs also has a significant effect on NTSPP performance; therefore, for each parameter combination, the SOFNNs were trained


using a 4-s segment of event-related data, drawn from one randomly chosen trial for one set of tests and from ten different trials for a second set of tests (cf. Sections V-A and B for further information on data configuration). When all networks are trained and during application, each trial is fed into the prediction modules, and features are extracted from both the original signals (Os) and the NTSPP-predicted signals (Ys) (cf. Section VI-B for details on FEPs). The main objective of NTSPP is to first improve feature separability and, thus, the resultant classification accuracy (CA) when compared to the features extracted from the Os signals. In addition, when applying multiple-step-ahead prediction of the EEG time series for a BCI, the objective is also to reduce the classification time (CT) by predicting the point of maximum separability in the trial prior to its occurrence, thus enhancing feature separability earlier than would be possible using the Os signals only.1

1The signals are classified at the rate of the sampling interval, i.e., at every time instant t between the initiation of the communication signal, i.e., the motor-imagery-based mental task, and the end. During this period, there is normally a time t-max where the classification accuracy is maximal.

B. Feature Extraction

Hjorth introduced a method that uses time-domain information from the signal for EEG analysis [29], [30]. The Hjorth method produces three features: 1) activity, which is the mean power in the time domain (i.e., the variance); 2) mobility, which is an estimate of the mean frequency; and 3) complexity, which can be seen as a measure of the variability in the frequency content of the input signal. Barlow-based feature extraction is somewhat similar to Hjorth-based feature extraction, where, again, three features are extracted from each signal [30], [31]: 1) mean amplitude; 2) mean frequency; and 3) the spectral purity index, which is a measure of the irregularity of the signal.

The advantages of using a Barlow-based FEP are similar to those of using the Hjorth method, i.e., frequency information can be obtained without iterating a complex function with multiple parameters, although these methods only produce a crude estimate of the frequency information. The feature extraction window width is the only parameter for these FEPs and was set to 1 s; therefore, there is no parameter tuning for these FEPs. Features are extracted from each signal at the rate of the sampling interval. For the Hjorth or Barlow method, the feature vector fv has a dimension of six (two signals and three features per signal) for Os signals, whereas fv using Ys signals has a dimension of 12 (four signals). The signal dynamics that are extracted by the FEPs are influenced by NTSPP; thus, it was anticipated that NTSPP would perform well with these FEPs.
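As an illustration of the Hjorth features described above, the following sketch computes activity, mobility, and complexity over the 1-s window ending at a given sample and concatenates them across signals. The discrete-difference derivatives and the helper names are simplifications of my own, and the Barlow variant is not shown.

    import numpy as np

    def hjorth_features(x):
        """Hjorth activity, mobility, and complexity of one windowed signal."""
        dx = np.diff(x)                      # first derivative (discrete difference)
        ddx = np.diff(dx)                    # second derivative
        var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
        activity = var_x                                    # mean power (variance)
        mobility = np.sqrt(var_dx / var_x)                  # estimate of mean frequency
        complexity = np.sqrt(var_ddx / var_dx) / mobility   # variability of frequency content
        return np.array([activity, mobility, complexity])

    def feature_vector(signals, fs=125, win=1.0, t=500):
        """Concatenate Hjorth features of every (original or predicted) signal over the
        1-s window ending at sample t; two channels give a 6-D fv, four predicted
        signals give a 12-D fv."""
        w = int(win * fs)
        return np.concatenate([hjorth_features(s[t - w:t]) for s in signals])

    # Hypothetical usage with two original channels sampled at 125 Hz
    rng = np.random.default_rng(1)
    c3, c4 = rng.standard_normal(1000), rng.standard_normal(1000)
    fv = feature_vector([c3, c4])            # length 6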

C. Classification

The data sets for each subject were split into three sessions. Using a linear discriminant analysis (LDA) classifier [32], a fivefold cross validation was carried out on session 1 for each subject, where the data were partitioned into a training set (80%) and a validation set (20%). Tests were performed five times using a different validation partition each time. The mean CA (mCA) rates on the five folds of validation data and 95% confidence intervals were estimated. Subsequently, all session-1 data were utilized to train the system, and the classifier was set up on the features that produced the highest mCA rate on session 1. The system's generalization abilities were then tested in a one-pass test on session 2. The best results were used to determine the best setup for a final test on data from session 3.
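A rough sketch of this validation scheme, assuming a simple two-class LDA with a pooled covariance estimate (a simplification of the LDA in [32]) and hypothetical feature/label arrays, is shown below.

    import numpy as np

    def lda_train(F, labels):
        """Two-class LDA: weight vector w = Sigma^-1 (mu1 - mu0), bias from class means."""
        F0, F1 = F[labels == 0], F[labels == 1]
        mu0, mu1 = F0.mean(axis=0), F1.mean(axis=0)
        pooled = np.cov(F0, rowvar=False) + np.cov(F1, rowvar=False)   # crude pooled covariance
        w = np.linalg.solve(pooled, mu1 - mu0)
        b = -0.5 * w @ (mu0 + mu1)
        return w, b

    def lda_classify(F, w, b):
        return (F @ w + b > 0).astype(int)

    def fivefold_accuracy(F, labels, folds=5, seed=0):
        """Fivefold cross validation: 80% of trials train the classifier, the remaining
        20% validate it; the mean accuracy over the folds is returned."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(labels))
        accs = []
        for split in np.array_split(idx, folds):
            train = np.setdiff1d(idx, split)
            w, b = lda_train(F[train], labels[train])
            accs.append(np.mean(lda_classify(F[split], w, b) == labels[split]))
        return np.mean(accs)

    # Hypothetical session-1 features (200 trials x 6 Hjorth features) and binary labels
    rng = np.random.default_rng(3)
    F = rng.standard_normal((200, 6))
    labels = rng.integers(0, 2, 200)
    mca = fivefold_accuracy(F, labels)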

VII. RESULTS AND DISCUSSION

A. Optimized SOFNN Training Algorithm

The computational effort involved in carrying out the above analysis benefited significantly from the reduction in training times afforded by the modified SOFNN learning approach introduced in this paper. To demonstrate the speedup, a comparative analysis on ten benchmark data sets was carried out with the older version of the SOFNN and the modified version, using a 3.6-GHz Intel dual Xeon central processing unit with 4 GB of random access memory. Error and network complexity results for these benchmarks were presented in [1] and [2], where comparisons to well-known algorithms such as the adaptive neuro-fuzzy inference system, the radial basis function NN, the dynamic fuzzy NN (DFNN), the generalized DFNN [3], [9], and others were presented, showing that the SOFNN outperformed these algorithms. For the current tests, as expected, the neuron numbers and training RMSEs for the old and new networks were equivalent. Columns 5 and 6 of Table I show the training times for the old and new versions of the SOFNN. In all cases, the newer version trained more quickly than the older version, although data sets 8 and 9 most clearly show the difference in training times when more neurons are recruited by the SOFNN to cluster larger data sets.

The results shown in bold in Table I are for tests performed on an EEG data set, where the SOFNN was trained on 6000 samples of motor imagery EEG recorded from the C3 electrode during imagination of left-hand movement. The data were taken from 16 randomly chosen trials and consist of 3-s segments of the most separable event-related portion of each of the trials [to help illustrate how the SOFNN structure adapts to different data, two tests were performed on EEG data sets, i.e., each data set was randomly selected but has different complexity; these data sets are referred to as sEEG (simple) and cEEG (complex)]. The SOFNN was trained to perform one-step-ahead prediction with an embedding dimension of six and a time lag of one. The parameters were stringently selected so that the network would attempt to evolve for maximum performance in terms of accuracy and was thus likely to add a significant number of neurons. Again, the two SOFNN versions produced identical results, i.e., both converged on the same minimum error and the same number of neurons pruned and added. However, the elapsed time, from the initiation of training to completion, was significantly different for each version of the network. For sEEG, the elapsed time for the old version was approximately 18.4 min, whereas the elapsed time for the modified SOFNN was ∼5.3 min. This is


Fig. 4. SOFNN performance and structure growth for (top) the sEEG data set (13 neurons) and (bottom) the cEEG data set (20 neurons). Left: Training result. Middle: Growth of neurons. Right: Testing result.

a ∼71% reduction in the training duration. Only 13 neurons were added, which shows the compactness of the network while still giving good performance (RMSE = 0.009, which was the target). Conversely, for the more complex EEG data set (cEEG), 20 neurons were added to maintain the desired prediction accuracy, and the training duration was significantly increased, taking approximately 47 min for the old SOFNN and 15 min for the new SOFNN.2 Again, the speedup provided by the modifications to the SOFNN is illustrated, which is circa 68% in this case.

2These times are greater than normally required. As outlined previously, for these tests and for illustrative purposes, the SOFNN parameters were selected to ensure that the accuracy was maximized regardless of the structural complexity.

These tests clearly show the significance of the speedup when working on larger data sets. When learning these larger EEG data sets, whenever a neuron was added or deleted, only that portion of the network was rechecked, using the historical records of the firing strengths and simple data manipulations involving subtractions and divisions (cf. Section IV). In the older version, after structure changes, if there were six inputs, six neurons in the structure, and 5000 past data exemplars, then, as described in Section IV, there are O(nUM²) = 5000 · (1)(42²) = 8 820 000 calculations, involving the exponential function, required to estimate the firing strengths of all the neurons for all past data. With the modified version, only the firing strengths for the neuron that has been modified need to be updated; therefore, the number of calculations is O(nUM²) = 5000 · (1)(7²) = 245 000. The difference in structure-updating complexity illustrated by this example verifies how the improved performance is acquired and the significance of the computational efficiency instilled by the modifications to the SOFNN presented in this paper. There is not a large cost in terms of memory required to facilitate the modified structure learning procedure. For example, in the cEEG example, where the firing strengths of 20 neurons are stored for 6000 exemplars, the firing strength matrix Φ has dimensions 20 × 6000, and the normalization factor vector ξ has dimensions 1 × 6000. Together, Φ and ξ require circa 928 KB of memory; therefore, the modified approach does not have a large memory overhead, unless the structure grows significantly, in which case the computations may become intractable, or the data sets are very large, in which case various techniques can be employed, the simplest being to only consider data within a predefined window, forgetting about data that have been clustered in the distant past.

Figs. 4 and 5 show the training performance information, the neuron growth during training, and the MFs for each input for the sEEG and cEEG data sets. Table III provides the fuzzy rules for the sEEG data set.

B. Comparative Analysis With Other Evolving Fuzzy Systems

It may be argued that the proposed modifications to the SOFNN are essential and should be an inherent part of the learning algorithm; however, none of the other fuzzy NN approaches in the literature have employed this approach to structure checking, which was inspired by the work of Huang et al. [10]. Many of the evolving fuzzy system approaches [4]–[6] do not check the network using past input data, because the structure and parameter updating in these algorithms is based on current input data only and on recursive estimation techniques; however, it has been observed that these approaches may not maintain good accuracy and low structural complexity in applications where these are demanded. To verify how techniques referred to as evolving fuzzy approaches compare to the SOFNN, the well-known evolving fuzzy NN (EFuNN) [7], [8] was tested on all the benchmark data sets


Fig. 5. SOFNN MFs for each input. Left: sEEG data set. Right: cEEG data set.

TABLE I
TIME ENHANCEMENTS BY SOFNN MODIFICATIONS WITH ACCURACY AND COMPLEXITY PERFORMANCE

introduced above. Two versions of the EFuNN [the parameter self-tuning version [7] and the TS version, the dynamic evolving neuro-fuzzy inference system [8]] are compared with the SOFNN. These methods have been shown to outperform many other evolving and nonevolving approaches [4], [5], [7], [8]. The results are presented in Tables IV and V (the information from Table I is reproduced as Table VI; refer to Table II for data set information).

It can be seen that in the ten standard benchmark cases, for both versions of the EFuNN, the structural complexity, in terms of the number of neurons added, is significantly higher than the number obtained with the SOFNN, yet all the training and testing errors are higher than those obtained with the SOFNN. In terms of training speed, the new version of the SOFNN trains more quickly for the first seven (smaller) data sets and trains more quickly than the self-tuning EFuNN on data set 10, but it is slower than the TS-based EFuNN. For data sets 8 and 9, the SOFNN is significantly slower than both versions of the EFuNN. This is because these are larger data sets that require a greater

TABLE II
BENCHMARK DATA SET DETAILS (COLUMNS 2 AND 3 SHOW THE TRAINING AND TESTING DATA SET SIZES)

number of neurons than the other data sets. In these situations, the SOFNN training speed is reduced, due to the necessity to check the structure using more past samples more often (more neurons and more data). The problem does not occur in the EFuNN because it employs a recursive updating procedure; however, the EFuNN performs much more poorly in terms of accuracy and structural complexity, and therefore, there is a significant tradeoff between speed and accuracy when using the EFuNN.

The main ethos of evolving-based approaches is the potential for more rapid evolution of the structure; however, this advantage requires a substantial tradeoff between performance and training speed. Often, when large data sets are being processed, evolving fuzzy systems exhibit unbounded structural growth and fail to ensure that errors are minimized, whereas the SOFNN does not exhibit these problems. The SOFNN algorithm employs a global learning approach and uses past data to ensure optimal performance. These methods are more computationally demanding than local learning approaches involving continuous recursive parameter estimation, such as those employed in the EFuNN and the evolving TS approach [5]; however, it is clear that the best performance in terms of accuracy and structural complexity is not ensured by local learning and continuous recursion. The following question must be


TABLE III
FUZZY RULES IN SOFNN sEEG DATA SET—13-NEURON STRUCTURE

posed: is the issue of online training speed critically important? Obviously, this depends on the application, but it is unlikely that many applications have such fast-changing dynamics that they require algorithms to be updated on every sample and within the sampling interval. To investigate the implications of training speed, the training and testing times for the EEG data sets with 6000 training exemplars and 10 500 testing exemplars were analyzed.

As can be seen in Tables IV–VI, both versions of the EFuNN evolve a substantial number of neurons to cluster this data set, even though the error level is significantly higher than that of the SOFNN, i.e., an order of magnitude higher for both the training and testing data sets. This performance differentiation can be attributed to the recursive nature of the algorithm, which does result in faster training in some cases, as can be seen in Tables IV–VI, but has shortcomings in robustly ensuring that the accuracy and the structural complexity are optimal. It must be noted that for the EEG data set, the SOFNN requires an average of 323 s/6000 ≈ 54 ms to organize the structure after each exemplar enters the network during training, whereas the TS-based EFuNN requires on average ∼37 ms to update the structure. If either algorithm had to adapt to the EEG signal at the rate of the sampling interval, which is 8 ms (125-Hz sampling), then both would be incapable. The above timings are based on MATLAB m-files and therefore could be dramatically improved by encoding the algorithm in C or C-Mex using MATLAB Simulink, as described in [19] and [30]. For a BCI involving the SOFNN-based NTSPP framework, it would not be necessary to update

the algorithm after every sample is recorded. If necessary, each SOFNN in the NTSPP structure could be updated after every seven exemplars (i.e., 53 ms/8 ms ≈ 7), but it would be sufficient to adapt the network less often, or perhaps periodically, using the time break between each trial or the rest periods when the BCI is not in operation. This will be a topic of further investigation.

Notably, however, there is a significant issue in relation to the testing of the EFuNN when between 1500 and 1800 neurons are added and the data set is large—the EFuNN requires ∼8–10 min to process the test data, whereas the SOFNN requires 1–1.5 s. This is a significant difference, and the main cause of the EFuNN's time overhead is the structural complexity—for every exemplar of testing data, the distance between the data exemplar and the 1628 clusters for the self-tuning EFuNN (1794 clusters for the TS-based EFuNN) (cEEG data set) must be calculated using a Euclidean metric. This is done to determine which local cluster the exemplar belongs to. In the SOFNN, the firing strength of each Gaussian MF neuron is calculated using multiplicative and summation terms for each of the 20 added neurons for each data exemplar, which is therefore much more efficient in this respect. Based on the above figures, the self-tuning EFuNN would require (∼13 min)/10 500 ≈ 74 ms to process one data exemplar, which is a considerable duration for network testing. Again, the problem may be alleviated using more efficient programming languages, but this analysis does illustrate a shortcoming of local recursive learning with unbounded structural complexity.


TABLE IV
EFuNN WITH PARAMETER SELF-TUNING

TABLE V
EFuNN TS

C. Results of Sensitivity Analysis

Results from the SA are presented in Table VII. Column 1 specifies the signal type, and column 2 specifies the average best value of each parameter ± STD. Column 3 specifies the mean number of neurons added to the structure ± STD. Column 4 specifies the mean RMSE ± STD. Column 5 specifies the mean sensitivity. Columns 2–5 and 6–9 specify the same results using different τ and m combinations.

In the majority of tests, the average number of neurons was one, indicating that one neuron was sufficient to cluster the training data used in the SA. It can be seen that the network complexity (i.e., increasing the number of neurons) is sensitive to parameters σ0 and λ. λ is critical in the pruning process. For example, if an existing neuron in the structure is pruned and the error rate does not rise above a percentage (λ · 100) of the error obtained when that neuron was included in the structure, then that neuron is permanently deleted. If λ is close to one, any reduction in error will result in neuron deletion. A λ value slightly greater than one indicates that, even if the error is

TABLE VI
MODIFIED SOFNN ACCURACY, COMPLEXITY, AND TRAINING TIMES

slightly increased due to neuron removal, this error is tolerable, and the neuron can be permanently deleted. Therefore, the choice of λ is critical to the complexity of the network.

The sensitivity (s) values show that the performance is more sensitive to some of the parameters than to others. These are, in order of sensitivity from maximum to minimum, λ, σ0, δ, krmse, and kd. This is true for all combinations of τ and m, although the sensitivity to all parameters increases when τ and/or m is increased (in general, the sensitivity is circa an order of magnitude higher for τ = 10 and m = 50); therefore, as the complexity of the prediction task (the prediction horizon) increases, more care in parameter selection is necessary. The error also increases as the prediction horizon is increased, which is expected.

The mean value of each parameter that achieved the lowest mean error is different for each signal type, indicating that each signal type has different dynamics, and therefore, it may be necessary to have different parameter values for each signal type. However, in the majority of cases, the STDs overlap, indicating that the mean value of each parameter over all signal types is a reasonable estimate of each parameter. It was hypothesized that in applications that involve training an SOFNN to predict each of the motor imagery signals, such as the NTSPP framework [12], [21], it could be sufficient to specify one value for each parameter for all four signals for any subject, thus removing the necessity to specify subject- and signal-specific hyperparameters. To test this hypothesis, the average values of each of the hyperparameters shown in column 2 of Table VII were tested in a BCI, and comparisons with stringently (S) and nonstringently (NS) selected parameters were carried out with the aim of verifying the efficacy of the parameters selected via the SA (cf. Table VIII).

D. Parameterless BCI

The best parameter setups for the BCI experiments are presented in Table IX (details are given in the caption). The P columns show that 9 out of 12 of the best results produced by the NTSPP signals (Ys-1 and Ys-50; cf. the next paragraph)


TABLE VII
RESULTS FROM THE SA OF SOFNN PARAMETERS (AVERAGES OF THREE SUBJECTS)

TABLE VIII
SELECTIONS FOR PREDEFINED SOFNN PARAMETERS

TABLE IX
PARAMETERS THAT PRODUCED THE BEST RESULTS FOR ALL TESTS FOR EACH SUBJECT—S, NS, OR SA—AND THE PARAMETERS THAT DICTATE THE PREDICTION HORIZON (τ AND m)

were obtained using the SA parameters, which indicates that a general set may be chosen to obtain the best results. To verify the statistical significance, an analysis of more subjects would be required; however, these results are strongly indicative that the parameters chosen through the SA do indeed work well in a BCI and that it may be possible to generalize these parameters across multiple subjects for easily applying the NTSPP framework. This verification was a major objective of this paper, but to verify the overall efficacy of the SOFNN-based NTSPP framework for application in a BCI, a practical analysis of the approach has been carried out using a number of different BCI performance measures.

Fig. 6 presents the CAs obtained with each of the signal types for both feature types. As outlined, the best results produced with Os signals were compared with the best results using the

Ys signals obtained with one-step-ahead (Ys-1) and 50-step-ahead (Ys-50) predictions. In 16 out of the 18 tests (∼88%), Ys signals produced better CA rates than Os signals.

Considering the single best CA result for all feature types and subjects (sessions 2 and 3), 75% of the best results were obtained with 50-step-ahead predicted signals, ∼17% of the best results were obtained using Ys signals with m = 1 or 50, and 8% of the best results were obtained with Os signals. Therefore, NTSPP improved the performance in 92% of the cross-session tests. Hjorth and Barlow features each produced the best results in 50% of these cases. In general, the CA rates are much lower than desired due to the simplicity of these FEPs (> 90% CA is desirable for a BCI); however, it is clearly evident that the NTSPP framework improves feature separability and overall system performance. More advanced FEPs utilized along with NTSPP have been shown to produce CA results between 90% and 100% [35]–[37], but those FEPs require subject-specific parameter selection and are thus not as apposite to an autonomous BCI.

Improved CA is not the only advantage of the NTSPP framework. Using Barlow features and Ys signals with m ≥ 50 on session 2, the CT is reduced, resulting in a significantly higher IT rate [12]. The graphs in Fig. 7 show the average values of these performance metrics for each signal type across all subjects. As can be seen, the CT obtained for Ys-1 and Ys-50 is reduced in all cases, indicating that NTSPP has the potential to reduce CT and thus help increase IT rates. The increase in IT rate through 50-step-ahead prediction via the NTSPP framework is indicative that NTSPP can enable faster BCI communication. For example, the average increase in IT rate is > 3 b/min for the cross-session tests of sessions 1 and 2 and ∼2 b/min for the cross-session tests of sessions 1 and 3. NTSPP achieves this by either reducing the CT and/or increasing the CA (cf. [12] for the IT rate calculation). In terms of mutual


Fig. 6. Results for each subject obtained using Hjorth- and Barlow-based FEPs carried out on the original signals and the NTSPP signals. Left: mCA (in percentage) and the ±90% confidence intervals for session-1 cross validation. Middle: CA rate for cross-session tests of sessions 1 and 2. Right: CA rate for cross-session tests of sessions 1 and 3.

Fig. 7. Graphs showing average CTs, IT rates, and MI across all subjects for each signal type and feature type for the cross-session tests.

information (MI) [15], NTSPP introduces slight improvements over the Os signals in the cross-session tests of sessions 1 and 3, but in general, an increased MI is desirable. Due to the simplistic nature of Hjorth- and Barlow-based features, the information extracted is not effective for increasing MI, and MI is thus low in all cases.3 Fig. 8 shows the time course of the CA (separability) from the beginning to the end of the trial for each subject. A steep rise in CA and a higher peak are also indicative that the multiple-step-ahead NTSPP can improve IT rates.

3MI reflects the amount of information that can be derived from the classifier output, i.e., when a continuous signal is produced that varies in time and magnitude over the duration of the trials and reflects the distance of the features from the separating hyperplane at each time instant, i.e., the time-varying signed distance [15].

VIII. CONCLUSION

A modified learning algorithm that involves storing the mostrecent neuron firing strength information for previously clus-tered input data and using this information to more efficientlycheck the network as training progresses has been shown to sig-nificantly enhance the computational efficiency of the SOFNN.The results show that the processing times are reduced by morethan 70% for EEG data, and significant training time reductionscan also be observed on ten different well-known benchmarktests. The SOFNN favorably compares to many other fuzzy NNalgorithms in terms of performance and structural complexity.To further reduce the training times, different local-learning-and recursive-based updating strategies will be investigated.The SOFNN employs a global learning strategy; however, if

3MI reflects the amount of information that can be derived from the classifieroutput, i.e., when a continuous signal is produced that varies in time andmagnitude over the duration of the trials and reflects the distance of the featuresfrom the separating hyperplane at each time instant, i.e., the time-varying signeddistance [15].

Fig. 8. Time courses of CA (in percentage) for Os and Ys (sessions 1 and 2).

The SOFNN employs a global learning strategy; however, if a local learning strategy were employed, the consequent parameters of each rule/neuron could be trained separately from the others. Methods such as those described in [4]–[6] and [34] may be employed for local learning; however, as shown in this paper, even though recursive and local learning methods can be trained faster, they fall short in performance and can result in increased structural complexity. Therefore, when accuracy is paramount, the semirecursive, global learning algorithm of the SOFNN may be the best approach.
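To make the firing-strength caching idea behind the modified training algorithm more concrete, the sketch below shows one plausible way strengths for already-clustered samples could be stored and reused so that, when a neuron is added, only the new neuron's responses to past data need to be computed. The class name, the Gaussian membership, and the method signatures are hypothetical illustrations of the caching principle, not the SOFNN implementation itself.

import numpy as np

class FiringStrengthCache:
    # Illustrative cache of per-sample firing strengths for a growing network.
    # Each neuron is treated as an RBF-like unit with a center and width.

    def __init__(self):
        self.centers = []     # one center vector per neuron
        self.widths = []      # one width vector per neuron
        self.samples = []     # stored training inputs
        self.strengths = []   # strengths[i][j] = firing of neuron j on sample i

    def _fire(self, x, c, s):
        # Simplified Gaussian membership, used purely for illustration.
        return float(np.exp(-np.sum(((x - c) ** 2) / (2.0 * s ** 2))))

    def add_sample(self, x):
        x = np.asarray(x, dtype=float)
        self.samples.append(x)
        # Evaluate the new sample against all existing neurons once.
        self.strengths.append([self._fire(x, c, s)
                               for c, s in zip(self.centers, self.widths)])

    def add_neuron(self, center, width):
        c = np.asarray(center, dtype=float)
        s = np.asarray(width, dtype=float)
        self.centers.append(c)
        self.widths.append(s)
        # Only the new neuron's column is computed for past samples;
        # previously stored strengths are reused unchanged.
        for i, x in enumerate(self.samples):
            self.strengths[i].append(self._fire(x, c, s))

    def max_strengths(self):
        # Per-sample maximum firing strength, the quantity consulted when
        # checking whether the current structure still covers all past data.
        return [max(row) if row else 0.0 for row in self.strengths]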

Results from an SA have been utilized to determine a good set of parameters for predicting motor-imagery-altered EEG signals using the SOFNN algorithms. The parameter changes that are most likely to influence the sensitivity of the network error have been identified. A general set of SOFNN parameters for predicting motor imagery data for all subjects has been selected, and multiple tests on a BCI involving the NTSPP framework indicate that these parameters are indeed the most suitable for these data. This BCI, which employs an SOFNN-based NTSPP framework that autonomously adapts its structure to suit each individual and does not require predefined parameters, used in conjunction with Hjorth- or Barlow-based feature extraction and an LDA classifier, has the potential to be a fully parameterless BCI and lends itself well to autonomous adaptation. The NTSPP framework is the only framework that enables features to be extracted from signals predicted multiple time steps ahead, and the IT rate increases are indicative of this unique potential.


The prediction power of the SOFNN over long prediction horizons is advantageous in this respect. Further BCI work will involve enhancing NTSPP by simultaneously training the SOFNNs with an objective function which may produce networks that more strongly influence the separable signal characteristics obtained by the FEPs which extract more precise frequency information. In addition, very promising results have been achieved by applying the SOFNN-based NTSPP framework in a multiclass BCI and in conjunction with other preprocessing procedures [35]–[37]. Further work will involve continuing these investigations.

REFERENCES

[1] G. Leng, G. Prasad, and T. M. McGinnity, “An on-line algorithm for creating self-organizing fuzzy neural networks,” Neural Netw., vol. 17, no. 10, pp. 1477–1493, Dec. 2004.

[2] G. Leng, “Algorithmic developments for self-organizing fuzzy neural networks,” Ph.D. dissertation, Univ. Ulster, Coleraine, U.K., Sep. 2003.

[3] S. Wu and M. J. Er, “Dynamic fuzzy neural networks—A novel approach to function approximation,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 2, pp. 358–364, Apr. 2000.

[4] N. K. Kasabov, Evolving Connectionist Systems: Methods and Applications in Bioinformatics. Berlin, Germany: Springer-Verlag, 2003.

[5] P. P. Angelov and D. Filev, “An approach to online identification of Takagi-Sugeno fuzzy models,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 484–498, Feb. 2004.

[6] E. Lughofer and E. P. Klement, “FLEXFIS: A variant for incremental learning of Takagi-Sugeno fuzzy systems,” in Proc. FUZZ-IEEE, 2005, pp. 915–920.

[7] N. K. Kasabov, “Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 6, pp. 902–918, Dec. 2001.

[8] N. K. Kasabov and Q. Song, “DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction,” IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Apr. 2002.

[9] J.-S. R. Jang, Neuro-Fuzzy & Soft Computing. Englewood Cliffs, NJ: Prentice-Hall, 1997.

[10] G. B. Huang, P. Saratchandran, and N. Sundararajan, “An efficient sequential learning algorithm for growing and pruning RBFNs,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 6, pp. 2284–2292, Dec. 2004.

[11] G. Pfurtscheller, C. Neuper, A. Schlogl, and K. Lugger, “Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters,” IEEE Trans. Rehabil. Eng., vol. 6, no. 3, pp. 316–324, Sep. 1998.

[12] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain-computer interfaces for communication and control,” Clin. Neurophysiol., vol. 113, no. 6, pp. 767–791, Jun. 2002.

[13] J. R. Wolpaw, H. Ramoser, D. J. McFarland, and G. Pfurtscheller, “EEG-based communication: Improved accuracy by response verification,” IEEE Trans. Rehabil. Eng., vol. 6, no. 3, pp. 326–333, Sep. 1998.

[14] T. M. Vaughan, “Guest editorial brain-computer interface technology: A review of the second international meeting,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 94–109, Jun. 2003.

[15] A. Schlogl, C. Keinrath, R. Scherer, and G. Pfurtscheller, “Estimating the mutual information of an EEG-based brain-computer interface,” Biomed. Technik, vol. 47, no. 1/2, pp. 3–8, Jan./Feb. 2002.

[16] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Muller, and G. Curio, “The non-invasive Berlin brain-computer interface: Fast acquisition of effective performance in untrained subjects,” Neuroimage, vol. 37, no. 2, pp. 539–550, Aug. 2007.

[17] S. G. Mason, A. Bashashati, M. Fatourechi, K. F. Navarro, and G. E. Birch, “A comprehensive survey of brain interface technology designs,” Ann. Biomed. Eng., vol. 35, no. 2, pp. 137–169, Feb. 2007.

[18] A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, “A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals,” J. Neural Eng., vol. 4, no. 2, pp. R32–R37, Jun. 2007.

[19] D. Coyle, “Intelligent preprocessing and feature extraction techniques for a brain computer interface,” Ph.D. dissertation, Univ. Ulster, Coleraine, U.K., 2006.

[20] D. Coyle, G. Prasad, and T. M. McGinnity, “A time-series prediction approach for feature extraction in a BCI,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 13, no. 4, pp. 461–467, Dec. 2005.

[21] D. Coyle, G. Prasad, and T. M. McGinnity, “Improving information transfer rates of a BCI by SOFNN-based multi-step-ahead time-series prediction,” in Proc. 3rd IEEE Syst., Man, Cybern. Conf., Sep. 2004, pp. 230–235.

[22] D. Coyle, G. Prasad, and T. M. McGinnity, “Improving signal separability and inter-session stability of a motor imagery-based brain-computer interface by neural-time-series-prediction-preprocessing,” in Proc. 27th Int. IEEE Eng. Med. Biol. Conf., Sep. 2005, pp. 2294–2299.

[23] T. Takagi and M. Sugeno, “Derivation of fuzzy control rules from human operator’s control action,” in Proc. IFAC Symp. Fuzzy Inf., Knowl. Representation Decision Anal., 1983, pp. 55–60.

[24] B. Hassibi and D. G. Stork, “Second order derivatives for network pruning: Optimal brain surgeon,” in Adv. Neural Inf. Process. Syst., 1993, vol. 4, pp. 164–171.

[25] A. Rizzi, M. Panella, and F. M. Frattale Mascioli, “Adaptive resolution min-max classifiers,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 402–414, Mar. 2002.

[26] Graz Dataset IIIb for the BCI 2005 Data Classification Competition. [Online]. Available: http://ida.first.fhg.de/projects/bci/competition_iii/desc_IIIb.pdf

[27] B. J. Fisch, Fisch and Spehlmann’s EEG Primer: Basic Principles of Digital and Analog EEG. Amsterdam, The Netherlands: Elsevier, 1999.

[28] G. P. Williams, Chaos Theory Tamed. New York: Taylor & Francis, 1997.

[29] B. Hjorth, “EEG analysis based on time-domain properties,” Electroencephalogr. Clin. Neurophysiol., vol. 29, no. 3, pp. 306–310, Sep. 1970.

[30] C. Guger, “Real-time data processing under Windows for an EEG-based brain-computer interface,” Ph.D. dissertation, Tech. Univ. Graz, Graz, Austria, Sep. 1999.

[31] I. I. Goncharova and J. S. Barlow, “Changes in EEG mean frequency and spectral purity during spontaneous alpha blocking,” Electroencephalogr. Clin. Neurophysiol., vol. 76, no. 3, pp. 197–204, Sep. 1990.

[32] B. S. Everitt and G. Dunn, Applied Multivariate Data Analysis. London, U.K.: Arnold, 1991.

[33] The MathWorks Website. [Online]. Available: http://www.mathworks.com

[34] M. Birattari, G. Bontempi, and H. Bersini, “Lazy learning meets the recursive least-squares algorithm,” in Advances in Neural Information Processing Systems, vol. 11, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds. Cambridge, MA: MIT Press, 1999.

[35] D. Coyle, A. Satti, G. Prasad, and T. M. McGinnity, “Neural time-series prediction preprocessing meets common spatial patterns in a brain-computer interface,” in Proc. 30th Int. IEEE Eng. Med. Biol. Conf., 2008, pp. 2626–2629.

[36] D. Coyle, T. M. McGinnity, and G. Prasad, “A multi-class brain-computer interface with SOFNN-based prediction preprocessing,” in Proc. IEEE World Congr. Comput. Intell., Jun. 2008, pp. 3695–3702.

[37] BCI Competition IV, 2008. [Online]. Available: http://ida.first.fhg.de/projects/bci/competition_iv/results/index.html

Damien Coyle (S’04–M’05) received the First Class Honours degree in computing and electronic engineering and the Ph.D. degree in intelligent systems engineering from the University of Ulster, Londonderry, U.K., in 2002 and 2006, respectively.

He is currently a Lecturer with the School of Computing and Intelligent Systems, Faculty of Computing and Engineering, University of Ulster, where he is also a member of the Intelligent Systems Research Center. His research interests include bio-inspired cognitive and adaptive systems and biosignal processing for brain–computer interface technology and diagnostics.

Dr. Coyle is a member of the Institution of Engineering and Technology and the Isaac Newton Institute for Mathematical Sciences. He received the IEEE Computational Intelligence Society Outstanding Doctoral Dissertation Award in 2008. He is a Member of the Executive Committee of the IEEE Engineering in Medicine and Biology Society [U.K. and Republic of Ireland (UKRI) Chapter] and the Chair of the IEEE Computational Intelligence Society (CIS) UKRI Chapter and the IEEE CIS Graduates of the Last Decade (GOLD) Subcommittee. He is the inaugural Chair of the IEEE CIS Brain–Computer Interface Task Force and a Member of the IEEE CIS Awards Committee.


Girijesh Prasad (M’98–SM’07) received the B.Tech. degree in electrical engineering from the Regional Engineering College, Calicut, India, in 1987, the M.Tech. degree in computer science and technology from the University of Roorkee, Roorkee, India, in 1992, and the Ph.D. degree from Queen’s University, Belfast, U.K., in 1997.

Since 1999, he has been with the School of Computing and Intelligent Systems, Faculty of Computing and Engineering, University of Ulster, Londonderry, U.K., where he is currently a Reader and an executive member of the Intelligent Systems Research Center, leading the Brain–Computer Interface and Assistive Technology Team. His research interests are in self-organizing hybrid intelligent systems, neural computation, fuzzy neural networks, type-1 and type-2 fuzzy logic, local model networks, evolutionary algorithms, and adaptive predictive modeling and control with applications in complex industrial and biological systems, including brain–computer interface and assistive robotic systems. He is the author of more than 85 peer-reviewed academic papers in international journals, books, and conference proceedings.

Dr. Prasad is a Chartered Engineer and a member of the Institution of Engineering and Technology.

Thomas Martin McGinnity (M’82) received the First Class Honours degree in physics and the Ph.D. degree from the University of Durham, Durham, U.K.

Since 1992, he has been with the School of Computing and Intelligent Systems, Faculty of Computing and Engineering, University of Ulster, Londonderry, U.K., where he is currently a Professor of intelligent systems engineering and the Director of the Intelligent Systems Research Center, which encompasses the research activities of more than 60 researchers.

He has 30 years of experience in teaching and research in electronic and computer engineering and was formerly the Head of the School of Computing and Intelligent Systems. More recently, he was the Acting Associate Dean of the Faculty of Engineering with responsibility for research and development and knowledge and technology transfer. He was a founding member of the Intelligent Systems Engineering Laboratory. He is also currently the Director of the University of Ulster’s technology transfer company, UUTech. He is the author or a coauthor of more than 175 research papers. His current research interests are focused on computational intelligence and the creation of bio-inspired intelligent computational systems in general, particularly in relation to hardware and software implementations of biologically plausible artificial neural networks, fuzzy systems, genetic algorithms, embedded intelligent systems utilizing reconfigurable logic devices, brain–computer interfacing, and intelligent systems in robotics.

Dr. McGinnity is a Chartered Engineer and a Fellow of the Institution of Engineering and Technology. He is a recipient of a Senior Distinguished Research Fellowship and a Distinguished Learning Support Fellowship from the University of Ulster in recognition of his contribution to teaching and research.