
Biomedical Signal Processing and Control 5 (2010) 196–204, doi:10.1016/j.bspc.2010.03.004


Improving the separability of multiple EEG features for a BCI by neural-time-series-prediction-preprocessing

Damien Coyle ∗, T. Martin McGinnity, Girijesh Prasad
Intelligent Systems Research Center, School of Computing and Intelligent Systems, Faculty of Computing and Engineering, Magee Campus, University of Ulster, Northland Road, Derry, Northern Ireland BT48 7JL, UK

Article info

Article history: Received 9 January 2009; received in revised form 16 March 2010; accepted 16 March 2010

Keywords: Brain–computer interface; Time–frequency; Time-series prediction; Neural network; Electroencephalogram (EEG)

Abstract

A recently proposed method for EEG preprocessing is extended and analyzed in this work via a range of different tests in combination with various other BCI components. Neural-time-series-prediction-preprocessing (NTSPP) is a predictive approach to EEG preprocessing where prediction models (PMs) are trained to perform one-step-ahead prediction of the EEG time series which reflect motor imagery induced alterations in neuronal activity. Due to the specialization of distinct PMs, the predicted signals (Ys) and error signals (Es) are distinctly different from the original (Os) signals. The PMs map the Os signals to a higher dimension which, in the majority of cases, produces features that are more separable than those produced by the Os signals. Four feature extraction procedures, ranging in complexity and in terms of the information which is extracted, i.e., time domain, frequency domain and time–frequency (t–f) domain, are used to determine the separability enhancements, which are verified by comparative statistical tests and brain–computer interface (BCI) tests on six subjects. It is shown that, in the majority of the tests, features extracted from the NTSPP signals are more separable than those extracted from the Os signals, in terms of increased Euclidean distance between class means, reduced inter-class correlations and intra-class variance, and higher classification accuracy (CA), information transfer (IT) rate and mutual information (MI).

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Brain–computer interface (BCI) research is growing at a significant pace and, since the beginning of the 21st century, has seen explosive growth [1–3]. BCI technology can provide a communication pathway from the brain to the computer which does not rely on neuromuscular control, therefore there are many potential beneficiaries of the technology. These include people with neuromuscular deficiencies due to disease or spinal cord injury. Being able to offer these people an alternative means of communication through BCI could have an obvious impact on their quality of life [1–11]. There are other applications of BCI, yet to be fully proven and exploited, such as neurofeedback for stroke rehabilitation [12,13] and epileptic seizure prediction [14]. BCI is also emerging as an augmentative technology in computer games and virtual reality technology [15].

Recent studies on multiple subjects with different levels of experience in using a BCI for communication have shown that between 5% and 30% of attempted BCI communications are often misclassified [11–21].

∗ Corresponding author. Tel.: +44 0 28 7137 5170; fax: +44 0 28 7137 5470. E-mail address: [email protected] (D. Coyle).

Insufficient robustness and low accuracy and information transfer rate have prevented BCIs being offered to those who need them, and thus there is an ongoing need to develop and optimize signal processing techniques for maximum performance and to develop new techniques specifically suited to processing biological signals [22].

This paper presents a thorough analysis of a technique, introduced in [5], which has been specifically developed for preprocessing EEG signals, referred to as neural-time-series-prediction-preprocessing (NTSPP). The basic concept behind NTSPP is to use the differences in prediction outputs produced by predictor networks, each specialized in predicting a different type of EEG signal, to improve the separability of EEG data and enhance overall BCI performance. Consider two EEG time series, xi, i ∈ {1,2}, drawn from two different signal classes ci, i ∈ {1,2}, respectively, assuming, in general, that the time series have different dynamics in terms of spectral content and signal amplitude but have some similarities. Consider also two prediction NNs, f1 and f2, where f1 is trained to predict the values of x1 at time t + π given values of x1 up to time t (likewise, f2 is trained on time series x2), where π is the number of samples in the prediction horizon. If each network is sufficiently trained to specialize on its respective training data, either x1 or x2, using a standard error-based objective function and a standard training algorithm, then each network could be considered an ideal predictor for the data type on which it was trained,1 i.e., specialized on a particular data type.



If each prediction NN is an ideal predictor then each should predict the time series on which it was trained perfectly, leaving only an error residual equivalent to white or Gaussian noise with zero mean.

In such cases the expected value of the mean error residual given predictor f1 for signal x1 is E[x1 − f1(x1)] = 0 and the expected power of the error residual, E[(x1 − f1(x1))²], would be low (i.e., in relative terms) whereas, if x2 is predicted by f1, then E[x2 − f1(x2)] ≠ 0 and E[(x2 − f1(x2))²] would be high (i.e., again in relative terms). The opposite would be observed when xi, i ∈ {1,2}, data are predicted by predictor f2. Based on the above assumptions, a simple set of rules could be used to determine the signal class to which an unknown signal, u, belongs. To classify u one, or both, of the following rules could be used:

1. If E[u − f1(u)] = 0 and E[u − f2(u)] ≠ 0 then u ∈ C1, otherwise u ∈ C2.

2. If E[(u − f1(u))²] < E[(u − f2(u))²] then u ∈ C1, otherwise u ∈ C2.

These are simple rules and may only work successfully in cases where the predictors are ideal. Due to the complexity of EEG data, predictors trained on EEG data are rarely ideal; however, when trained on EEG with different dynamics, e.g., left and right movement imagination (left or right motor imagery), predictor networks can introduce desirable effects in the predicted outputs which make them more separable than the original signals and thus aid in determining the class to which an unknown signal belongs. Instead of using only one signal channel, the NTSPP framework can be used for each signal class and two or more channels and, instead of using simple rules such as those described above, information relevant to the differences introduced by the predictors for each class of signal can be extracted via feature extraction and classified using a trained classifier.
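As an illustration only (not part of the original study), the following minimal Python sketch shows how rule 2 could be applied to a single-channel segment using two trained one-step-ahead predictors; the predictor objects f1 and f2 and their predict() method are hypothetical placeholders.

```python
import numpy as np

def classify_by_residual_power(u, f1, f2, delta=6, tau=1):
    """Assign the unknown single-channel EEG segment `u` to class 1 or 2
    by comparing the mean squared one-step-ahead prediction error of two
    class-specific predictors (rule 2 in the text).

    `f1` and `f2` are hypothetical predictor objects exposing
    predict(window) -> next-sample estimate; delta is the embedding
    dimension and tau the time lag, as in Section 2.2."""
    errors = {1: [], 2: []}
    span = (delta - 1) * tau
    for t in range(span, len(u) - 1):
        window = u[t - span:t + 1:tau]           # delta lagged samples
        for cls, f in ((1, f1), (2, f2)):
            errors[cls].append(u[t + 1] - f.predict(window))
    p1 = np.mean(np.square(errors[1]))           # residual power, predictor f1
    p2 = np.mean(np.square(errors[2]))           # residual power, predictor f2
    return 1 if p1 < p2 else 2
```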

In this work, we tested this hypothesis by analyzing NTSPP statistically using frequency- and time-domain features and by evaluating the performance of a BCI system. The objective was to compare the performance of a BCI with and without the NTSPP framework, where features are extracted from the error residual (Es) and predicted signals (Ys) using time–frequency (t–f), adaptive autoregressive coefficient (AARc), and power- and mean-based feature extraction procedures (FEPs) and compared to the same features extracted from the original signals (Os).

Although NTSPP has been presented in [5], this work offers a significant extension to previous work in that the proposed NTSPP framework is thoroughly tested using a spectrum of feature extraction techniques ranging from simple to complex and involving information extracted from both the time and frequency domains. This comparison of feature extraction techniques has not been performed previously, but the main focus of this work is not to compare feature extraction techniques per se; instead, features are compared when used in conjunction with the NTSPP framework against a system where no NTSPP is used, i.e., only raw EEG signals (Os). Such an analysis has not been presented previously. In addition, a comparison of the efficacy and separability characteristics of each of the feature types is also presented. Along with these results, further details are presented in this paper pertaining to how the NTSPP approach improves feature separability in BCI systems and how hyperparameters can be easily chosen. In addition, statistical tests are presented showing that NTSPP can increase the Euclidean distance between class means and reduce the inter-class correlation and the intra-class variance.

1 Multilayered feedforward NNs and adaptive-network-based-fuzzy-inference-systems (ANFIS) are considered universal approximators due to having the capacity to approximate any function to any desired degree of accuracy with as few as one hidden layer that has sufficient neurons [22–24].

Table 1
EEG data acquisition details.

Subject   Feedback type              Feedback time (s)   Trials
S1        Bar extension [17]         4–9                 280
S2, S3    Bar extension [17]         4–8                 320
S4        Virtual reality [29,31]    4–8                 476
S5, S6    Basket paradigm [29,33]    4–7                 1080

This is shown to contribute to improved classification accuracy, information transfer rates [25] and mutual information [26] in a BCI, in addition to enhancing robustness and feature stability across sessions. It is clearly demonstrated that significant performance improvement can be realized when features are extracted from the NTSPP signals compared to those extracted from the original signals.

The paper is organized as follows. Section 2 describes the data acquisition and configuration procedures and the complete NTSPP framework. Section 3 presents four feature extraction procedures and the methods by which they are applied in conjunction with the NTSPP procedure. Section 4 provides details on data partitioning and classification. Section 5 illustrates the effectiveness of the NTSPP procedure through comparative statistical analysis of feature separability and practical tests on 6 healthy subjects. Section 6 includes a discussion and concludes the paper.

2. Time-series prediction for EEG preprocessing

2.1. Data acquisition and description

The EEG data used in this work were recorded by the Graz BCI research group [17,29]. The data were recorded from 6 subjects (S1–S6) in timed experimental recording procedures. The recordings were made using Ag/AgCl electrodes. All signals were sampled at 128 Hz (for S1, S2 and S3) or 125 Hz (for S4, S5 and S6) and filtered between 0.5 and 30 Hz. Two bipolar EEG channels were measured using two electrodes positioned 2.5 cm posterior and anterior to positions C3 and C4 according to the international standard (10/20 system) electrode positioning nomenclature [30]. Each trial was 7–9 s in length. The first 2 s were quiet; at t = 2 s an acoustic stimulus signified the beginning of a trial, then at t = 3 s a cue stimulus was presented, indicating that a right or left motor imagery should be performed. For subjects S1–S3, the task was to move a bar in the direction of a cue arrow by imagining moving the left or right hand [17]. The data for subject S4 were recorded during virtual reality feedback experiments, similar to those described in [29,31,32]. The data for subjects S5 and S6 were recorded in experiments where the cue and feedback were provided using the "basket paradigm" [29,33]. The data for each subject were split into two sets (referred to as session 1 and 2), in the same way in which the datasets were split for the BCI competitions for subjects S1 and S4–S6 [29], where half of the data were used for training and cross-validation and the other half was used for testing (unlabelled for the competition). For subjects S2 and S3 the data for sessions 1 and 2 were recorded in separate sessions on different days (cf. Table 1 for a summary of the data for each subject and Section 4 for details of further partitioning of the training and validation set in this work).

2.2. Data configuration for time-series prediction

For the purpose of prediction, the recorded EEG time-series data are structured so that the signal measurements from sample indices t to t − (Δ − 1)τ are used to make a prediction of the signal at sample index t + π. The parameter Δ is the embedding dimension. Each NN predicts data recorded from both the C3 and C4 electrodes.


The training input exemplar therefore contains Δ measurements from both the C3 and C4 time series and has the following format:

$$[c3_t,\ c3_{t-\tau},\ \ldots,\ c3_{t-(\Delta-1)\tau},\ c4_t,\ c4_{t-\tau},\ \ldots,\ c4_{t-(\Delta-1)\tau}\ ;\ c3_{t+\pi},\ c4_{t+\pi}] \qquad (1)$$

where τ is the time delay/lag and π is the prediction horizon (both measured in samples). In this work the features are directly influenced by the NNs' preprocessing capabilities, therefore the quality of the features is a good indication of the best τ and Δ combination to utilize.2 All values of Δ ranging from 2 to 8 with τ ranging from 1 to 2 were tested. One-step-ahead prediction is focused on in this work, therefore π = 1, and for the purpose of this investigation the majority of results presented are based on τ = 1. Details of methods for determining the best τ and Δ values are described in [35,36].
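To make the exemplar format of Eq. (1) concrete, the following sketch (illustrative, not taken from the paper) assembles lagged input/target pairs from the two channels; the function name and array layout are assumptions.

```python
import numpy as np

def build_exemplars(c3, c4, delta=6, tau=1, pi=1):
    """Assemble one-step-ahead training exemplars in the format of Eq. (1):
    inputs are `delta` lagged samples from each of C3 and C4, targets are
    the samples `pi` steps ahead on both channels."""
    c3, c4 = np.asarray(c3, float), np.asarray(c4, float)
    span = (delta - 1) * tau
    X, Y = [], []
    for t in range(span, len(c3) - pi):
        lags = t - np.arange(delta) * tau        # t, t-tau, ..., t-(delta-1)tau
        X.append(np.concatenate([c3[lags], c4[lags]]))
        Y.append([c3[t + pi], c4[t + pi]])
    return np.array(X), np.array(Y)
```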

2.3. Prediction NNs – architecture and training

Two feedforward multi-layer perceptron (MLP) NNs are used to perform prediction, referred to as pNNs and labelled 'L' for 'left' (LpNN) and 'R' for 'right' (RpNN), corresponding to the type of motor imagery EEG data on which they are trained. For prediction, the optimum number of hidden layers, the number of neurons in each layer and the types of transfer function are important parameters, and the best choice depends on the complexity of the data to be learned by the NN. In this investigation, many different pNN architectures were tested. Neuron numbers were varied between 1 and 12 for single-hidden-layer NNs. Previous work has shown that model orders in this range are sufficient for EEG modeling and prediction [1,11,27]. The input and output layer neuron numbers are dictated by the number of signals being predicted and the chosen value of Δ. Pure linear and tansigmoidal transfer functions were tested in most cases, for all subjects. Although multi-hidden-layer NNs were investigated, it was noted that there was no advantage gained by the increased NN architecture complexity [5]. The pNNs were trained using the Levenberg–Marquardt [28] algorithm, using 20% of the training data folds for early-stopping cross-validation during network training to ensure good generalization performance. The quality of the features produced determined the best architecture (cf. Section 3 for the feature extraction methodology).
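The pNNs in the paper are MLPs trained with Levenberg–Marquardt and 20% early-stopping data; as a rough stand-in only, the sketch below fits a single-hidden-layer MLP predictor with scikit-learn, whose solvers do not include Levenberg–Marquardt, so this is an approximation of the setup rather than the authors' implementation.

```python
from sklearn.neural_network import MLPRegressor

def train_pnn(X_class, Y_class, hidden=6):
    """Train one class-specific predictor (LpNN or RpNN) for one-step-ahead
    prediction of the C3/C4 samples. X_class/Y_class come from
    build_exemplars() applied to the trials of a single class.
    Note: scikit-learn does not offer Levenberg-Marquardt, so 'adam' with
    early stopping on 20% of the data is used here instead."""
    pnn = MLPRegressor(hidden_layer_sizes=(hidden,),
                       activation="tanh",           # tansigmoidal hidden units
                       early_stopping=True,
                       validation_fraction=0.2,     # 20% held out, as in the paper
                       max_iter=2000,
                       random_state=0)
    pnn.fit(X_class, Y_class)                       # multi-output: [c3(t+pi), c4(t+pi)]
    return pnn
```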

2.4. Neural-time-series-prediction-preprocessing

Subsequent to each set of pNNs being trained to perform one-step-ahead prediction, the data are input to both pNNs, trial by trial (cf. Section 2.2 for data configuration information). When a trial is input to both pNNs, eight new signals are produced by the NTSPP framework, i.e., four prediction error signals (Es) and four predicted signals (Ys) (cf. Fig. 1). Eq. (2) shows how each new signal type is obtained.

$$s_{y_k}(t) = \begin{cases} y(t) - y_k(t) & \text{for } E_s \\ y_k(t) & \text{for } Y_s \end{cases} \qquad (2)$$

where y(t) and y_k(t) are the values of the actual and predicted signals (i.e., for either C3 or C4) at time index t, respectively, and s_{y_k}(t) is the NTSPP signal at time index t, where the k index indicates whether the signal is the output from the LpNN or RpNN (i.e., k can be l or r). During application, the unknown data type is fed into both the LpNN and RpNN and, because each pNN provides a prediction for

2 The quality of the features, i.e., the attributes of the signal which convey the most discriminative information, is verified by the feature separability, which is quantified by the classification accuracy [1].

two signals (C3 and C4), features can be extracted from the Es or Ys signals or both. A new set of features can be extracted after each new set of predictions (i.e., at the rate of the sampling interval). Fig. 1 shows details of the prediction framework used in conjunction with the t–f based feature extraction process (cf. Section 3.3 for details of the t–f based feature extraction procedure).
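A minimal sketch of how the eight NTSPP signals of Eq. (2) could be generated from one trial, building on the hypothetical helpers sketched above (build_exemplars and the trained pNN objects):

```python
import numpy as np

def ntspp_signals(c3, c4, lpnn, rpnn, delta=6, tau=1, pi=1):
    """Feed one trial through both trained pNNs and return the four predicted
    signals (Ys) and four error signals (Es) defined in Eq. (2)."""
    X, targets = build_exemplars(c3, c4, delta, tau, pi)
    ys, es = {}, {}
    for k, pnn in (("l", lpnn), ("r", rpnn)):
        pred = pnn.predict(X)                    # columns: predicted C3, C4
        for ch, name in enumerate(("C3", "C4")):
            ys[(name, k)] = pred[:, ch]                     # Ys signal
            es[(name, k)] = targets[:, ch] - pred[:, ch]    # Es signal
    return ys, es   # 2 channels x 2 pNNs = 4 Ys + 4 Es signals
```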

3. Feature extraction procedures

Any type of feature can be extracted from the signals produced by the NTSPP framework. In this work four FEPs are used to verify the advantages of the NTSPP framework.

3.1. NTSPP and power and mean-based feature extraction

When a trial is input to both pNNs, features are obtained by continually calculating the mean and power (mean squared) of the Es signals or the Ys signals as they pass through a feature extraction (FE) window. The FE window is illustrated by dashed lines in Fig. 1. Eqs. (3) and (4) are used to obtain power- and mean-based features from the Es and/or Ys signals.

$$f_{y_k} = \begin{cases} \dfrac{1}{M}\displaystyle\sum_{t=1}^{M} \left(y(t) - y_k(t)\right)^2 & \text{for } E_s \text{ signals} \\[2ex] \dfrac{1}{M}\displaystyle\sum_{t=1}^{M} \left(y_k(t)\right)^2 & \text{for } Y_s \text{ signals} \end{cases} \qquad (3)$$

$$f_{y_k} = \begin{cases} \dfrac{1}{M}\displaystyle\sum_{t=1}^{M} \left(y(t) - y_k(t)\right) & \text{for } E_s \text{ signals} \\[2ex] \dfrac{1}{M}\displaystyle\sum_{t=1}^{M} y_k(t) & \text{for } Y_s \text{ signals} \end{cases} \qquad (4)$$

Most of the variables and indices are described in Section 2.4. Four features can be extracted after each new set of predictions (i.e., at the rate of the sampling interval). M is the number of prediction samples (i.e., the FE window length). Within the FE window, new predictions contribute to each feature calculation while predictions made more than M samples earlier are removed from the calculation, so that predictions at the beginning of a trial (or non-event-related predictions) are forgotten as new predictions are made. In this work M is selected empirically during cross-validation on the training data. A large value of M (more data in the window) often produces better accuracy than a smaller window; however, this can have implications during online feedback sessions where a rapid feedback response is required. Large windows can result in latency in producing high accuracy early in the trial. Most BCI systems requiring rapid response feedback will employ a window width of between 0.5 and 1.5 s. In this work the objective was to find a subject-specific window width which maximised classification accuracy, not necessarily to reduce the latency, although the results section does provide information on latency and evidence that NTSPP can also reduce latency.
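For illustration, the sliding-window power and mean features of Eqs. (3) and (4) could be computed along the following lines (a sketch, not the authors' code):

```python
import numpy as np

def power_mean_features(signal, M):
    """Return the running power (Eq. (3)) and mean (Eq. (4)) of one NTSPP
    signal (an Es or Ys series) over a sliding FE window of M samples;
    one feature value is produced per new prediction once the window fills."""
    s = np.asarray(signal, float)
    power, mean = [], []
    for t in range(M, len(s) + 1):
        window = s[t - M:t]                  # most recent M predictions
        power.append(np.mean(window ** 2))   # Eq. (3)
        mean.append(np.mean(window))         # Eq. (4)
    return np.array(power), np.array(mean)
```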

3.2. NTSPP and AAR-coefficients based feature extraction

The AARc FEP is described in [1,17,27,37]. The AAR model is of order m and a1,t, . . ., am,t are the time-varying AR coefficients, which are estimated with the recursive least squares (RLS) algorithm using a Kalman filter update coefficient (UC = 0.006) [17,27]. The AAR coefficients are used as features for each signal at each time point.


Fig. 1. Schematic illustrating the NTSPP framework utilised in conjunction with the t–f FEP.

For each signal, at each time point, there are m coefficients; therefore there are 2m features when Os signals are used and 4m features when either Es or Ys signals are used. The best model order m for each signal type was selected by varying m between 1 and 12. The best model order for motor imagery EEG usually falls within this range [1,11,27].
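For readers unfamiliar with adaptive AR estimation, the sketch below shows a generic RLS estimator of time-varying AR coefficients with a forgetting factor; mapping the paper's update coefficient UC to a forgetting factor as lam = 1 − UC is an assumption, and the Kalman-filter-based estimator of [17,27] differs in detail.

```python
import numpy as np

def aar_rls(signal, m=6, uc=0.006):
    """Estimate time-varying AR coefficients a_{1,t}..a_{m,t} of one signal
    with recursive least squares and a forgetting factor. Returns an array
    of shape (len(signal), m); row t is the coefficient (feature) vector at
    time t. lam = 1 - uc is an assumed mapping of the update coefficient."""
    s = np.asarray(signal, float)
    lam = 1.0 - uc
    a = np.zeros(m)                      # AR coefficients
    P = np.eye(m) * 100.0                # inverse correlation matrix
    coeffs = np.zeros((len(s), m))
    for t in range(m, len(s)):
        x = s[t - 1::-1][:m]             # [s(t-1), ..., s(t-m)]
        e = s[t] - a @ x                 # one-step prediction error
        k = P @ x / (lam + x @ P @ x)    # gain vector
        a = a + k * e                    # coefficient update
        P = (P - np.outer(k, x @ P)) / lam
        coeffs[t] = a
    return coeffs
```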

3.3. NTSPP and t–f-based feature extraction

A t–f based FEP, which is detailed in [38], was employed to extract frequency domain information using the short-time Fourier transform (STFT), where features are extracted from the interpolated (smoothed) spectra within subject-specific frequency bands. Eq. (5) is used to calculate the STFT for the NTSPP signals.

$$Y_k(f,\tau) = \begin{cases} \displaystyle\sum_{t=\tau-(N/2)}^{\tau+(N/2)} w^{*}(t-\tau)\,\left(y(t)-y_k(t)\right)\,e^{-j2\pi f t/N} & \text{for } E_s \\[2ex] \displaystyle\sum_{t=\tau-(N/2)}^{\tau+(N/2)} w^{*}(t-\tau)\,y_k(t)\,e^{-j2\pi f t/N} & \text{for } Y_s \end{cases} \qquad (5)$$

where f = 0, 1, . . ., Nf − 1 and Nf is the number of frequency points or Fourier transforms. Yk(f, τ) contains the frequency spectrum for each STFT window centered at τ. w is a Gaussian window function as shown in (6) and * denotes convolution. The other variables and indices are described in Section 2.4.

$$w(t) = e^{-\frac{1}{2}\left(\alpha\,\frac{t-(N/2)}{N/2}\right)^{2}} \qquad (6)$$

where 0 ≤ t < N and α is the reciprocal of the standard deviation. The width of the window is inversely related to the value of α. N is a constant parameter and denotes the length of the window, and α denotes the degree of localization in the time domain (cf. [38] and references therein for further details).

The number of STFT windows used to analyze the data contained in the FE window depends on the length of the FE window, M, the STFT window length, N, and the amount of overlap, ovl, between adjacent STFT windows (cf. Fig. 1). M must always be larger than N. Yk is a matrix with Nf rows and E = (M − ovl)/(N − ovl) columns (i.e., the rows contain the power of the signal for each harmonic and E is the number of STFT windows that are contained within the FE window).
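A compact sketch in the spirit of Eqs. (5) and (6), computing Gaussian-windowed spectra over the FE window (illustrative only: the window is applied by multiplication and the parameter names and defaults are assumptions):

```python
import numpy as np

def stft_spectra(signal, N=150, alpha=3.68, ovl=1, nfft=None):
    """Slide a Gaussian-windowed DFT of length N over `signal` with `ovl`
    samples of overlap between adjacent windows. Returns |Y| with one
    column per STFT window (E columns) and nfft frequency rows."""
    s = np.asarray(signal, float)
    nfft = nfft or N
    t = np.arange(N)
    w = np.exp(-0.5 * (alpha * (t - N / 2) / (N / 2)) ** 2)   # Eq. (6)
    hop = N - ovl
    starts = range(0, len(s) - N + 1, hop)
    spectra = [np.abs(np.fft.fft(w * s[i:i + N], nfft)) for i in starts]
    return np.array(spectra).T        # shape: (nfft, E)
```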

The interpolation process and final part of the feature extraction procedure are carried out using Eqs. (7)–(9).

$$Y^{ip}_{kv}(u) = \sum_{i=IP_S}^{IP_E} \frac{Y_k(i, v)}{IP_E - IP_S + 1}, \quad v = 1, \ldots, E, \qquad (7)$$

$$IP_S = \begin{cases} 0 & \text{if } u - ip < 0, \\ u - ip & \text{otherwise,} \end{cases} \qquad (8)$$

$$IP_E = \begin{cases} N_f - 1 & \text{if } u + ip > N_f, \\ u + ip & \text{otherwise.} \end{cases} \qquad (9)$$

Eq. (5) is used to calculate Yk, E is the number of spectra and u is the index of the interpolated spectrum at each frequency point (harmonic), therefore u = 0, 1, . . ., Nf − 1. Nf is the number of frequency points or Fourier transforms in the spectra. The value of ip determines the number of interpolation points, which in turn determines the degree of smoothing. A feature, f^v_{y_k}, is obtained by taking the l2 norm of the interpolated spectra within pre-selected subject-specific frequency bands [38]. E is the number of spectra (i.e., the number of windows for one spectrogram), Z is the number of signals (Z = 2 for Os and Z = 4 for Ys) and:

$$m = ZE \qquad (10)$$


$$f^{v}_{y_k} = \left\| Y^{ip}_{kv} \right\| \quad \text{where } v = 1, \ldots, E \qquad (11)$$

where f^v_{y_k} is a feature obtained from the reactive frequency bands of the vth interpolated spectrum of the signal produced by the kth prediction model, i.e., l or r, for output channel y, i.e., 3 or 4, corresponding to C3 or C4 data. According to (10), if there are 3 spectra (i.e., 3 STFT windows) within the FE window for each signal, then E = 3, Z = 2 (2 signals) and m = 6; thus, each feature vector would contain six features. The feature vector for an LpNN with E interpolated spectra is shown in (12). Fig. 1 shows the NTSPP framework combined with the t–f FEP.

$$f^{v} = \left(f^{1}_{3_l},\, f^{2}_{3_l},\, \ldots,\, f^{E}_{3_l},\, f^{1}_{4_l},\, f^{2}_{4_l},\, \ldots,\, f^{E}_{4_l}\right) \qquad (12)$$

4. Data partitioning and classification

All parameter selection was performed using data from session 1. Using a linear discriminant analysis (LDA) classifier [34], a 5-fold cross-validation was performed where session 1 for each subject was partitioned into a training set (80% of the data) and a test set (20% of the data) which was used to calculate the classification accuracy (CA) rates. For each 5-fold cross-validation this procedure was performed five times using a different test partition (20%) each time. For each set of cross-validation tests the trials were randomly mixed. The mean CA (mCA), calculated from the CA rates obtained on the five test partitions (i.e., $(1/5)\sum_{i=1}^{5} CA_i$), and 90% confidence intervals were estimated for each subject and used to select the best system parameters. 5-fold cross-validations were iterated, changing the parameter combination for each iteration, until a good set of parameters was obtained, after which all session 1 data were utilized to obtain a feature set and classifier using the chosen best parameters. The system is tested on session 2, which is considered the most informative offline, single-trial test [5]. Cross-validation was used for parameter tuning and only provides indicative results. Session 2 is completely unseen data and the session 2 results better reflect the results that would be obtained in online single-trial tests after the system parameters have been chosen.
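A minimal sketch of the 5-fold cross-validation with an LDA classifier described above, using scikit-learn; the feature matrix F and label vector y are assumed to come from one of the FEPs:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

def mean_ca(F, y, n_splits=5, seed=0):
    """Return the mean classification accuracy (mCA) of an LDA classifier
    over shuffled stratified 5-fold cross-validation, mirroring the
    80%/20% train/test partitioning used for parameter selection."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs = []
    for train_idx, test_idx in cv.split(F, y):
        lda = LinearDiscriminantAnalysis()
        lda.fit(F[train_idx], y[train_idx])
        accs.append(lda.score(F[test_idx], y[test_idx]))
    return np.mean(accs)
```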

5. Results

5.1. A statistical analysis of features and signals

A comparative analysis of the feature types extracted from Os, Es and Ys signals was carried out where the mean values of the features extracted from the signals, circa the point of maximum separability3 on session 2 data, were analyzed. The Euclidean distance (ε) between class means is a metric used to verify whether the Es signals and/or Ys signals were more separable than the Os signals. Other separability characteristics, including the inter-class correlation coefficient (ρ) and the intra-class variance, were also analyzed. For comparative purposes the intra-class variances were normalized across the three signal types so that a relative metric was obtained, i.e., relative mean variance (rmv). Minimization of ρ and rmv, along with maximization of ε, is a major objective of data preprocessing and feature extraction.
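For illustration, the three separability metrics could be computed roughly as follows; in particular, treating the inter-class correlation ρ as the Pearson correlation between the two class-mean feature vectors is an assumption about the exact definition used:

```python
import numpy as np

def separability_metrics(F1, F2):
    """F1, F2: feature matrices (trials x features) for class 1 and class 2,
    taken around the point of maximum separability. Returns the Euclidean
    distance between class means, an inter-class correlation coefficient
    (assumed here to be the Pearson correlation of the class means) and the
    mean intra-class variance (to be normalized across signal types for rmv)."""
    m1, m2 = F1.mean(axis=0), F2.mean(axis=0)
    eps = np.linalg.norm(m1 - m2)                  # distance between class means
    rho = np.corrcoef(m1, m2)[0, 1]                # inter-class correlation (assumed)
    intra_var = 0.5 * (F1.var(axis=0).mean() + F2.var(axis=0).mean())
    return eps, rho, intra_var
```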

Best (B), median (M) and worst (W) are used to rank each signal type (Os, Es and Ys) for each particular performance metric. For each signal type there are 3 performance metrics for each of the four feature types, therefore each signal type is compared based on twelve performance measures. Ys and Es signals produce the most B and M rankings and rarely produce W rankings, whereas Os signals obtain the majority of W rankings (cf. Fig. 2(a)).

3 It must be noted that the signals are classified at the rate of the sampling interval, i.e., at every time instant t, between the initiation of the communication signal, i.e., the motor imagery based mental task, and the end. During this period there is normally a time, t-max, where classification accuracy is maximal.

These observations indicate that, in general, the NTSPP procedure does indeed improve feature separability when utilized in conjunction with a number of different FEPs. Most notably, the Ys signals are most proficient at producing separable features and have no W rankings for subjects S1 and S3.

Fig. 2(b) provides a general overview of the best features. It can be seen that, in terms of maximizing ε, the most complex FEP (complexity is quantified by the number of parameters to be tuned), i.e., the t–f FEP, is the best, with the next most complex, i.e., the AARc-based FEP, producing the next maximum, then the simpler mean- and then power-based features. The t–f approach also produces very low rmv and introduces negative ρ, a desirable outcome for feature extraction. It can be seen that each of the other FEPs produces reasonable outcomes for only one of the metrics: mean/power may be potentially useful for reducing ρ and the AARc approach can result in a greater ε, as well as minimizing rmv, whereas the t–f approach can produce the most desired values of all performance metrics. These observations can be viewed in a number of different ways: (1) FEPs with the least complexity (least number of parameters to tune) are less capable of creating good separability and/or (2) FEPs which operate in the time domain only are less proficient than those that extract frequency domain information, such as the AAR or t–f approaches.

5.2. Parameter selection

Performance on session 1 is analyzed based on mean CA rates and 90% confidence intervals (ci) obtained using a t-statistic (level of significance α = 0.1) for parameter selection. For each subject the features for the window-based t–f, mean and power FEPs for subjects S1–S6 were extracted from FE windows of width M = 390, 350, 300, 430, 175 and 225, respectively. The parameter values selected for each of the feature types and pNNs are presented in Table 2, where details are outlined in the caption. For subjects S1–S3 all pNN architectures were chosen based on performance produced by power-based features. For these three subjects the best chosen pNN architecture, for utilization in conjunction with power-based features, was also used for the other FEPs. In column 3, the number represents the number of neurons and the letters N and L represent non-linear and linear NN transfer functions, respectively.

Using the AAR, the mean- and the power-based FEPs for subjects S4–S6, the pNN parameters were optimized in conjunction with the optimization of the FEP parameters. For these subjects, only non-linear transfer functions were analyzed because an analysis carried out in [5] indicated that, in the majority of cases, non-linear transfer functions provided better performance than their linear counterparts. pNN predictors were tested with pre-selected values for the predefined parameters (such as neuron numbers and transfer functions) when used in conjunction with the t–f based FEP for subjects S4–S6.

5.3. Performance results from BCI tests

The graphs in the left column of Fig. 3 show results for session 1. It can be seen that for 100% of the subjects, for the mean- and power-based FEPs, the maximum mCA rates were obtained using NTSPP signals. For the AAR approach, in 50% of the subjects analyzed the NTSPP signals produced the highest mCA rates. The t–f approach achieved the highest mCA rate for 50% of the subjects using NTSPP signals.

The graphs in the middle column of Fig. 3 show results for session 2. It can be seen that the NTSPP signals (Es or Ys) worked best with the power-based features in 100% of the tests.


Fig. 2. (a) Bar graph showing the frequency (i.e., proportion) at which each signal produces the best, median and worst separability for all features in terms of maximizing the distance between class means and minimizing inter-class correlation and intra-class variance. (b) Bar graph showing the normalized values of the separability metrics for each feature type across all signals and all subjects.

The mean-based features worked best in conjunction with the NTSPP signals in 66% of the tests, whereas Os produced the best results in the remaining 34%; however, in these cases, the improvement over the NTSPP signals is less than 0.5%. The AARc and t–f features worked best in conjunction with the NTSPP signals in 66% of the tests. These results are summarized in Fig. 4, where the average increase in CA rate achieved by either Es or Ys signals, compared to the Os signals for session 2, is shown, indicating that the NTSPP procedure

has the potential to significantly improve the performance of simpler BCI systems. It must be noted that the results are based on comparisons involving both Es and Ys against Os and there is still a requirement to choose which of the two signal types, Es or Ys, should be used. This can be easily achieved by analyzing the results for each subject for a particular feature type.

Reduced classification time (CT), increased information transfer (IT) rate [25] and increased mutual information (MI) [26] are also considered.

Table 2
Parameters selected for each feature extraction procedure and the NTSPP framework. Columns 1 and 2 specify the subject and the signal type, respectively. Column 3 specifies the pNN parameters for power and mean features. Columns 4 and 5 specify the pNN parameters for AAR features and the model order, respectively. Column 6 specifies the pNN parameters used with t–f features and columns 7–10, respectively, specify the STFT-window length and width (N and α), the overlap between STFT windows and the interpolation interval.

Features →         Mean/power   AAR             t–f
Sb    Sig.   pNN          pNN    m      pNN     N     α      ovl   ip

S1    Os     na           na     6      na^a    150   3.68   1     7
      Es     5N           5N     9      5N      50    0.68   1     17
      Ys                         8              150   3.68   5     3

S2    Os     na           na     10     na      50    0.68   5     10
      Es     2L           2L     7      2L      50    0.68   5     16
      Ys                         10             50    0.68   1     19

S3    Os     na           na     10     na      50    0.68   15    6
      Es     4N           3N     10     3N      50    0.68   5     8
      Ys                         8              50    0.68   5     10

S4    Os     na           na     8      na      100   1      1     9
      Es     9N           10N    7      6N      200   0.5    1     6
      Ys                         5              100   1      15    5

S5    Os     na           na     7      na      50    RW^b   1     4
      Es     6N           6N     4      6N      175   RW     1     5
      Ys     7N                  4              50    RW     1     4

S6    Os     na           na     7      na      225   0.5    1     6
      Es     8N           4N     11     6N      200   0.5    1     2
      Ys     7N                  6              200   0.5    1     6

^a Not applicable.
^b Rectangular window provided better results than a Gaussian window.


Fig. 3. Results for each subject obtained using power, mean, AAR and t–f based FEPs carried out on the original signals and the NTSPP signals. Graphs in the first column show the mean CA [%] ± 90% confidence intervals for session 1 cross-validation. The graphs in the middle column show the CA rates for session 2. In the majority of cases the Os level is lower than that of either the Es or Ys signals, or both. The graphs in the last column show the average classification time (CT), information transfer rates (IT) and mutual information (MI) across all subjects for each signal type for session 2.

The graphs in the right column of Fig. 3 show the average values of these performance metrics. CT is reduced or equal to Os for either Es or Ys in all cases except for mean features, indicating that NTSPP has the potential to reduce CT and thus help increase IT rates. The increase in IT shown in Fig. 3 for all feature types indicates another significant advantage of preprocessing with NTSPP. For power and mean features the average increase is >2 bits/min. NTSPP achieves this by either reducing CT or increasing CA or both. For AARc and t–f features, where the CA enhancements were less pronounced, the average IT rates are improved by >1 bit/min. It must be noted that this work has focused on basing all performance measurements on the point of maximum separability in the trial. This provides indicative information from an offline analysis of the potential maximal performance attainable by the subject/method, assuming the classification at the point determined via offline analysis is used to make the decision during online BCI control. This is satisfactory for comparison of algorithms for synchronous BCIs involving a single binary communication per trial. Depending on the BCI application, where continuous control of a cursor is sought (perhaps self-paced, i.e., user-initiated timing), other performance quantifiers are needed and the time at which performance becomes maximal, or at which the user is able to produce a classifier output which exceeds a threshold, needs to be considered. For example, the steepness in rise of the MI quantifier was used in BCI competition III to judge the best algorithm for dataset IIIb [29]. Including the "steepness in rise" quantifier enabled determination of which algorithms produced best control in the quickest time within each trial. In terms of MI in this work, NTSPP introduces improvements over the Os signals in all cases, with t–f producing the highest MI. This is indicative of another significant benefit of using NTSPP. The MI reflects the amount of information that can be derived from the classifier output, i.e., the time-varying signed distance (TSD) [26]. MI is scaled up by one order of magnitude in Fig. 3 to improve visual clarity.
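For reference, the IT rate of [25] for an N-class BCI with accuracy P can be computed in bits per selection as sketched below (multiply by the number of selections per minute for bits/min); the helper function is illustrative and not taken from the paper.

```python
import math

def wolpaw_bits_per_trial(p, n_classes=2):
    """Wolpaw information transfer rate in bits per selection for an
    n-class BCI with classification accuracy p (0 < p < 1)."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    b = math.log2(n_classes) + p * math.log2(p)
    if n_classes > 1:
        b += (1 - p) * math.log2((1 - p) / (n_classes - 1))
    return b

# e.g. bits/min for 2 classes at 85% accuracy and 10 selections per minute:
# wolpaw_bits_per_trial(0.85) * 10
```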

6. Discussion and conclusion

From this investigation it has been shown that single-hidden-layer NNs with up to 10 neurons in the hidden layer provide the best results. For subject S2 linear activation functions worked best, while for subjects S1 and S3 non-linear activation functions worked best.


Fig. 4. Bar graph showing the percentage of cases in which NTSPP signals produced better CA than the Os signals for sessions 1 and 2, and the average increase in CA on session 2 produced by NTSPP for each of the FEPs.

(For subjects S4–S6, only non-linear transfer functions were analyzed and, for t–f features, the prediction embedding dimension, Δ = 6, was predefined.) Subject-specific parameter tuning in the ranges outlined above should provide further improvements to the results. Generally, M ranging from 175 to 430 provided the best results. The best pNN parameters can be chosen utilizing an iterative approach similar to that described for selecting the best FE window width, M, where firstly a good window width (M = 300–400) is chosen and the pNN architecture is adjusted and cross-validated through an iterative training and validation procedure. This approach was fully automated via strategic programming but has the disadvantage of being computationally expensive. An iterative process for converging upon a good set of t–f parameters is outlined in [38]. The NN parameters in this work were also empirically selected and these are quite critical for performance. Although the networks are trained using an error-based objective function, selection of the parameters is based on the average cross-validation classification accuracy of the overall system. A general guideline for the time delay embedding parameters is to use Δ = 6 and τ = 1; however, subject-specific parameter tuning always produces better results and the empirical approach is normally the best approach. Other types of prediction models that may enable automated detection and processing of temporal information could be considered and tested within this framework. The recurrent network is a popular choice for prediction-based applications because the recurrence makes it possible for the networks to process sequential inputs, enabling these networks to potentially discover an intrinsic temporally successive structure in the EEG time series [1]. However, networks with recurrent connections often have inherent sensitivity and can become unstable. For example, if outliers or artifacts cause a large error in the recurrent feedback signal, a recurrent network will diverge instead of converging upon the optimal solution. This can result in extensive training durations and/or poor network performance. A network such as the recurrent network which is prone to divergence may be problematic or require extensive system setup on multiple subjects; however, it may be useful to investigate recurrent connections in the NTSPP framework in the future.

Similar issues do not occur in the feedforward MLP-NN, hence its choice for this BCI application. The feedforward MLP-NN always converges and usually converges upon a good predictive solution efficiently. The generalization performance is ensured by using 20% of the training data for early stopping during training and this is confirmed by the overall results presented on the unseen data, indicating that the networks in the NTSPP framework specialize on their respective data to produce the overall performance enhancement.

The results have demonstrated that NTSPP is advantageous for conjunctive utilization with all the FEPs tested in this investigation. It has been shown that the extent of the enhancements to signal separability and inter-session stability is dependent on the complexity of the FEP, i.e., in some cases the greatest improvements were made using simple feature extraction methods, power-based features being the most pertinent. It is expected that the Es signals would work well in conjunction with power features because, for example, if left data is fed into the LpNN, the power in the error residual would be expected to be lower than the power in the error residual after feeding the same data into the RpNN. Changes in the power in error residual morphology can be associated with the backpropagation training function and error-based objective function as described in Section 1. These are fundamental concepts of the NTSPP procedure and, even though the separability does not consistently occur as outlined in Section 1, in the majority of cases there is some derivative of the concept creating a significant increase in signal separability. NTSPP can map the Os signals to a higher dimension which produces features that are more separable than those produced by the Os signals, in terms of increased Euclidean distance between class means and reduced inter-class correlation and intra-class variance [6]. NTSPP can also act as a filter of irregular transients and noise sources, since filtering and prediction go hand in hand. NTSPP is different to basic filtering in that different filters/predictors are developed for different data types but used to process both data types. These are benefits of the NTSPP framework and the results presented in this work provide clear evidence of the benefits and indeed the shortcomings of NTSPP, one shortcoming being that it does not significantly improve the separability of the features which work best, i.e., the more complex t–f based approaches. However, simplicity is favored over complexity in BCI development, to enable easier adaptation to each individual and continuous adaptation in the long term [39]. NTSPP increases the potential for using simpler feature extraction methods. Each individual has a different capacity to manipulate their brain waves; this can be due to differences in brain physiology, alertness and fatigue levels, and levels of BCI experience. This often results in large inter-subject variability, therefore the results do not consistently show the same improvement for each subject for each of the feature types; however, it is shown that there is an overall improvement in the aggregate performance given by the NTSPP framework for all features, as shown in Fig. 4, justifying the application of NTSPP as an important preprocessing method in BCI.

Further extensions to the NTSPP framework, involving a self-organizing fuzzy neural network (SOFNN), to improve autonomy in adaptation and performance and to reduce latency, thereby improving IT rate, have also been identified, and results from preliminary analyses are outlined in [40,41].


The main differences of this approach are that subject-specific parameters for the NTSPP were not selected for any subject; instead the self-organizing characteristics of the SOFNN allowed autonomous adaptation to account for subject-specific variability. In addition, the FEPs and classifier used did not have any subject-specific parameters and thus the BCI presented was completely parameterless. Using a similar approach, a multiclass implementation of the NTSPP framework is outlined in [42], where preliminary results show the potential of using NTSPP to enhance performance in a multiclass system with a minimal number of electrode channels. Future work will involve enhancing the NTSPP procedure by training the pNNs simultaneously with an objective function that will produce networks that are more specialized in producing predictions for the class of data on which they are trained but that also help the networks to produce significant differences when predicting data from other classes (classes of data which the network has not been trained to predict). This might be achieved through a process of negatively correlating the outputs of networks during the training of networks for different classes [43,44]. This would offer more control over the framework and ensure that in all cases separability is improved.

References

[1] T.M. Vaughan, J.R. Wolpaw, Guest editorial: the third international meeting on brain–computer interface technology: making a difference, IEEE Trans. Neural Syst. Rehab. Eng. 14 (June (2)) (2006).

[2] S.G. Mason, A. Bashashati, M. Fatoruechi, K.F. Navarro, G.E. Birch, A comprehensive survey of brain interface technology designs, Ann. Biomed. Eng. 35 (2) (2007) 137–169.

[3] A. Bashashati, M. Fatourechi, R.K. Ward, G.E. Birch, A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals, J. Neural Eng. 4 (2007) R32–R37.

[4] J.R. Wolpaw, N. Birbaumer, D.J. McFarland, G. Pfurtscheller, T.M. Vaughan, Brain–computer interfaces for communication and control, J. Clin. Neurophysiol. 113 (2002) 767–791.

[5] D. Coyle, G. Prasad, T.M. McGinnity, A time-series prediction approach for feature extraction in a brain–computer interface, IEEE Trans. Neural Syst. Rehab. Eng. 13 (December (4)) (2005) 461–467.

[6] D. Coyle, Intelligent preprocessing and feature extraction techniques for a brain computer interface, PhD thesis, Faculty of Computing and Engineering, University of Ulster, N. Ireland, 2006.

[8] G. Pfurtscheller, C. Guger, G. Muller, G. Krausz, C. Neuper, Brain oscillations control hand orthosis in a tetraplegic, Neurosci. Lett. 292 (2000) 211–214.

[9] J. Kaiser, J. Perelmouter, I. Iversen, N. Neumann, N. Ghanayim, T. Hinterberger, A. Kubler, B. Kotchoubey, N. Birbaumer, Self-initiation of EEG-based communication in paralyzed patients, J. Clin. Neurophysiol. 112 (2001) 551–554.

[10] A. Kubler, B. Kotchoubey, T. Hinterberger, N. Ghanayim, J. Perelmouter, M. Schauer, C. Fritsch, E. Taub, N. Birbaumer, The thought translation device: a neurophysiological approach to communication in total motor paralysis, Exp. Brain Res. 124 (1999) 223–232.

[11] G. Pfurtscheller, C. Neuper, A. Schlogl, K. Lugger, Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters, IEEE Trans. Rehab. Eng. 6 (September (3)) (1998) 316–324.

[12] T.E. Ward, C.J. Soraghan, F. Mathews, C. Markham, A concept for extending the applicability of constraint induced movement therapy through motor cortex activity feedback using a neural prosthesis, Comput. Int. Neurosci. (2007), http://www.hindawi.com/journals/cin/2007/051363.abs.html.

[13] G. Prasad, P. Herman, D. Coyle, S. McDonough, J. Crosbie, Using a motor imagery-based brain–computer interface for post-stroke rehabilitation, in: Proceedings of the 4th IEEE EMB Conf. Neural Engineering, April, 2009, pp. 258–262.

[14] L.D. Iasemidis, Epileptic seizure prediction and control, IEEE Trans. Biomed. Eng. 50 (May (5)) (2003) 549–558.

[15] R. Leeb, F. Lee, C. Kenraith, H. Bischof, G. Pfurtscheller, Brain–computer communication: motivation, aim, and impact of exploring a virtual apartment, IEEE Trans. Neural Syst. Rehab. Eng. 15 (December (4)) (2007).

[16] C. Guger, H. Ramouser, G. Pfurtscheller, Real-time EEG analysis with subject-specific spatial pattern for a brain–computer interface (BCI), IEEE Trans. Rehab. Eng. 8 (December (4)) (2000) 447–456.

[17] C. Guger, A. Schlogl, C. Neuper, T. Strein, D. Walterspacher, G. Pfurtscheller, Rapid prototyping of an EEG-based brain–computer interface, IEEE Trans. Neural Syst. Rehab. Eng. 9 (March (1)) (2001) 49–57.

[18] S. Lemm, C. Schafer, G. Curio, BCI competition—data set III. Probabilistic modelling of sensorimotor μ rhythms for classification of imaginary hand movements, IEEE Trans. Biomed. Eng. 51 (June (6)) (2004) 1077–1080.

[19] R. Scherer, G.R. Muller, C. Neuper, B. Graimann, G. Pfurtscheller, An asynchronously controlled EEG-based virtual keyboard: improvement of the spelling rate, IEEE Trans. Biomed. Eng. 51 (September (6)) (2004) 979–984.

[20] G.E. Fabiani, D.J. McFarland, J.R. Wolpaw, G. Pfurtscheller, Conversion of EEG activity into cursor movement by a brain–computer interface (BCI), IEEE Trans. Neural Syst. Rehab. Eng. 12 (September (3)) (2004) 331–338.

[21] D.J. McFarland, J.R. Wolpaw, Sensorimotor rhythm-based brain–computer interface (BCI): feature selection by regression improves performance, IEEE Trans. Neural Syst. Rehab. Eng. 13 (September (3)) (2005) 372–379.

[22] Vaughan, et al., Guest editorial – brain–computer interface technology: a review of the second international meeting, IEEE Trans. Neural Syst. Rehab. Eng. 11 (June (2)) (2003) 94–109.

[23] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (1989) 359–366.

[24] J.-S.R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy & Soft Computing, Prentice-Hall, Englewood Cliffs, NJ, 1997.

[25] J.R. Wolpaw, H. Ramouser, D.J. McFarland, G. Pfurtscheller, EEG-based communication: improved accuracy by response verification, IEEE Trans. Rehab. Eng. 6 (September (3)) (1998) 326–333.

[26] A. Schlogl, C. Keinrath, R. Scherer, G. Pfurtscheller, Estimating the mutual information of an EEG-based brain computer interface, Biomedizinische Technik, Band 47 (2002) 03–08.

[27] A. Schlogl, D. Flotzinger, G. Pfurtscheller, Adaptive autoregressive modelling used for single-trial EEG classification, Biomedizinische Technik, Band 42 (1997) 162–167.

[28] M.T. Hagan, M. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw. 5 (6) (1994) 989–993.

[29] The Graz dataset IIIb description for the BCI2005 competition. http://ida.first.fhg.de/projects/bci/competition iii/desc IIIb.pdf, 2005.

[30] B.J. Fisch, Fisch & Spellmann's EEG Primer: Basic Principles of Digital and Analog EEG, Elsevier, 1999.

[31] R. Leeb, C. Keinrath, C. Guger, R. Scherer, D. Friedman, M. Slater, G. Pfurtscheller, Using a BCI as a navigation tool in virtual environments, in: Proc. of the 2nd International Brain–Computer Interface Workshop and Training Course, September, 2004, pp. 65–66.

[32] R. Leeb, R. Scherer, F. Lee, H. Bischof, G. Pfurtscheller, Navigation in virtual environments through motor imagery, in: Proc. of the 9th Computer Vision Winter Workshop, 2004, pp. 99–108.

[33] C. Vidaurre, A. Schlogl, R. Cabeza, G. Pfurtscheller, A fully online adaptive brain–computer interface, Biomedizinische Technik, Band 49 (September (2)) (2004) 760–761 (special issue).

[34] B.S. Everitt, G. Dunn, Applied Multivariate Data Analysis, Edward Arnold, 1991.

[35] G.P. Williams, Chaos Theory Tamed, Taylor and Francis, 1997.

[36] D. Coyle, G. Prasad, T.M. McGinnity, P. Herman, Estimating the predictability of EEG recorded over the motor cortex using information theoretic functionals, in: Proceedings of the 2nd International Brain–Computer Interface Workshop, September, 2004, pp. 43–44.

[37] D. Coyle, G. Prasad, T.M. McGinnity, Improving signal separability and inter-session stability of a motor imagery-based brain–computer interface by neural-time-series-prediction-preprocessing, in: Proceedings of the 27th International IEEE Eng. Med. and Biol. Conference, September, 2005, pp. 2294–2299.

[38] D. Coyle, G. Prasad, T.M. McGinnity, A time–frequency approach to feature extraction for a brain–computer interface with a comparative analysis of performance measures, EURASIP J. Appl. Signal Process. 19 (November) (2005) 3141–3151 (special issue on Trends in Brain–Computer Interfaces).

[39] J.R. Wolpaw, Brain–computer interfaces for communication and control: current status, in: Proceedings of the 2nd International Brain–Computer Interface Workshop and Training Course, Biomedizinische Technik, September, 2004, pp. 43–44.

[40] D. Coyle, G. Prasad, T.M. McGinnity, Improving information transfer rates of a BCI by self-organising fuzzy neural network-based multi-step-ahead time-series prediction, in: Proceedings of the 3rd IEEE Syst., Man and Cyber. (UK&RI Chapter) Conference, September, 2004, pp. 230–235.

[41] D. Coyle, G. Prasad, T.M. McGinnity, Faster self-organizing fuzzy neural network training and a hyperparameter analysis for a brain–computer interface, IEEE Trans. Syst. Man Cybernet. B 39 (December (6)) (2009) 1458–1471.

[42] D. Coyle, T.M. McGinnity, G. Prasad, A multi-class brain–computer interface with SOFNN-based prediction preprocessing, in: IEEE World Congress on Computational Intelligence, June, 2008, pp. 3695–3702.

[43] Y. Liu, X. Yao, Simultaneous training of negatively correlated neural networks in an ensemble, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 29 (December (6)) (1999) 716–725.

[44] X. Yao, Y. Liu, Making use of population information in evolutionary artificial neural networks, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 29 (June (3)) (1998) 417–425.