Measuring Time Series Predictability Using Support Vector Regression

Communications in Statistics—Simulation and Computation®, 37: 1183–1197, 2008
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0918 print/1532-4141 online
DOI: 10.1080/03610910801942422

Time Series Analysis

Measuring Time Series Predictability Using Support Vector Regression

JOÃO R. SATO¹,², SERGI COSTAFREDA³, PEDRO A. MORETTIN¹, AND MICHAEL JOHN BRAMMER³

¹Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
²LIM44/NIF, Institute of Radiology, University of São Paulo, São Paulo, Brazil
³Brain Image Analysis Unit, Institute of Psychiatry, King's College, London, UK

Most studies involving statistical time series analysis rely on assumptions of linearity, whose simplicity facilitates parameter interpretation and estimation. However, the linearity assumption may be too restrictive for many practical applications. The implementation of nonlinear models in time series analysis involves estimating a large set of parameters, frequently leading to overfitting problems. In this article, a predictability coefficient is estimated using a combination of nonlinear autoregressive models, and the use of support vector regression in this model is explored. We illustrate the usefulness and interpretability of results using electroencephalographic records of an epileptic patient.

Keywords Autoregressive; Machine learning; Non-linear; Prediction; Regression; Support vector.

Mathematics Subject Classification 62-Statistics; 68-Computer Science.

1. Introduction

The main goals of time series analysis are the identification of implicit/explicit properties and forecasting. The properties of interest may be temporal patterns such as dependence on previous events, trends, and/or periodicities. These characteristics are generally extracted by statistical modeling and their significance examined by hypothesis testing. Once these properties are identified, one may use the model to predict future observations.

Received September 6, 2007; Accepted January 24, 2008.
Address correspondence to João R. Sato, LIM44 – Department of Radiology, University of São Paulo, Av. Dr. Enéas de Carvalho Aguiar, 255, Cerqueira César, São Paulo 05403-001, Brazil; E-mail: [email protected]


Many studies involving the extraction of underlying properties from time series can be found in econometrics (Enders, 2003; Stock and Watson, 2001), signal processing (Stoica and Moses, 2005), epidemiology (Lai, 2005), geophysics (Grinsted et al., 2004), and recently in neuroscience (Andrzejak et al., 2001; Baccala and Sameshima, 2001; Friston et al., 1995; Harrison et al., 2003; Sato et al., 2006a,b).

Time series analysis of brain signals has become an important tool for neuroscience studies. The advent of technologies such as EEG (electroencephalography), fMRI (functional magnetic resonance imaging), and MEG (magnetoencephalography), which allow the temporal quantification of localized brain activity, provides an objective neurophysiological measure for understanding brain processes. However, most analytic approaches assume that the observed time series belongs to the class of linear stationary processes. Generally, for most processes in this class, the models are simple, the interpretation of parameters is intuitive, and estimation procedures are computationally easy to implement. The autoregressive and moving average processes are representatives of the linear class. Nevertheless, the linearity assumption may be too restrictive in some real applications, as nonlinear processes have often been described in neuroscientific studies (see Andrzejak et al., 2001; Deneux and Faugeras, 2006; Friston et al., 2000; Riera et al., 2004). Some nonlinear parametric models have been suggested to overcome the linearity limitation. Haggan and Ozaki (1981) proposed the exponential autoregressive model (EXPAR), and Tong (1983) introduced the threshold autoregressive model (TAR). A more general type of model is the functional-coefficient autoregressive class described by Chen and Tsay (1993), Cai et al. (1999), and Morettin and Chiann (2006). Generalized additive models (GAM) were also applied to the nonparametric estimation of nonlinear models by Hyndman (1999). Such nonparametric models can be attractive alternatives to parametric approaches for the analysis of nonlinearity in time series, mainly due to their flexibility.

Although they perform reasonably well in regressions with one response and one predictor variable, many nonparametric approaches have limitations in multiple regression estimation. These limitations may be overcome by constraining the space of functions to be estimated, e.g., by assuming additive models (linear combinations of functions). By nonparametric methods, we mean methods without a pre-specified functional form (e.g., quadratic, sigmoid, etc.), not methods without parameters. Furthermore, the application of most nonparametric procedures is limited in small-sample datasets, because the estimates are poor when the number of parameters to be estimated is large. In addition, if the number of parameters to be estimated is greater than the sample size, most models cannot even be estimated (ill-posed problems).

An attractive way to deal with these limitations and ill-posed problems is to use support vector methods, first described by Vapnik (1995, 1998) and embedded in artificial intelligence and statistical learning theory (Hastie et al., 2001). Support vector machines (SVM) were primarily developed in the context of supervised classification, as robust alternatives to Fisher linear discriminant analysis (Fisher, 1936). Generally, least squares and maximum likelihood estimation are founded on empirical risk minimization, i.e., minimizing the error between the fit and the observed data, which may lead to overfitting problems. On the other hand, the main idea of support vector methods is structural risk minimization, which is related to


the concept of generalization, i.e., the prediction of a new observation from the properties of previously analyzed data. This concept can also be applied to real-valued curve estimation (support vector regression, SVR) and density estimation (Vapnik, 1995, 1998).

Despite these apparently attractive properties, studies applying support vector methods to statistical time series analysis are still few in number. Müller et al. (1997) and Mukherjee et al. (1997) introduced support vector regression for time series forecasting and empirically demonstrated its superiority over classical ARMA models. Tay and Cao (2001) and Ping-Feng and Chih-Sheng (2005) applied support vector methods to predicting the time series behavior of financial assets, obtaining good results.

In the current study, a nonlinear predictability coefficient based on SVR is introduced and some of its properties are explored using computational simulations. An application of this approach to electroencephalography (EEG) datasets from an epileptic patient during a seizure is presented. The main focus of this application is to estimate the predictability of EEG time series, comparing linear and nonlinear models.

2. Methods

In developing approaches for dealing with ill-posed problems in classification, Vapnik (1995, 1998) introduced the concept of structural risk minimization, founded on statistical learning theory. As a practical result of mathematical theorems involving complexity and the VC-dimension (Vapnik–Chervonenkis dimension, which measures the capacity, i.e., flexibility, of a classification procedure), Vapnik introduced the support vector machine (SVM), which can be seen as an optimal solution to a classification problem. Furthermore, empirical studies have also shown the superiority of SVM over Fisher linear discriminant analysis (1936) for the analysis of high-dimensional data (Mourão-Miranda et al., 2005). Finally, the concept of structural risk minimization has also been extended to the case of real-valued function estimation (Gunn, 1998; Smola and Schoelkopf, 1998; Vapnik, 1995), i.e., to regression models.

2.1. Support Vector Regression

In this section, the mathematical basis of support vector regression (SVR) is briefly described. More theoretical detail and efficient computational implementations can be found in the works of Vapnik (1995), Smola and Schoelkopf (1998), and Hastie et al. (2001).

Let $x_i$ be a $k$-dimensional real-valued vector and $y_i$ a real number ($i = 1, \ldots, N$). The main idea of support vector nonlinear regression (SVR) is to estimate a function $f(x_i)$, $f: \mathbb{R}^k \rightarrow \mathbb{R}$, such that

$$y_i = f(x_i) + \varepsilon_i,$$

where the $\varepsilon_i$ are random errors with mean zero and variance $\sigma^2$, the space of the $x_i$ is the input space, and $\mathcal{F}$ is a high-dimensional space called the feature space (an extension of the input space incorporating nonlinearities and interactions). The function $f(x_i)$ can be written as

$$f(x_i) = \langle \omega, \Phi(x_i) \rangle + b,$$

where $\Phi: \mathbb{R}^k \rightarrow \mathcal{F}$ (i.e., a map from the input space to the feature space) and $\omega \in \mathcal{F}$.

An estimate of $\omega$ can be obtained by minimizing the structural risk $R_{struct}(\omega)$, defined as the sum of the empirical risk of a functional, $R_{emp}(f(\omega))$, and a complexity term. The structural risk is minimized by seeking a balance between the residual error and the flatness (flexibility) of $\omega$. Thus, SVR balances the residual variance against the flexibility of the estimated curve, preventing overfitting problems. Formally,

$$R_{struct}(\omega) = R_{emp}(\omega) + \|\omega\|_2^2 = \sum_{i=1}^{N} C(f(x_i) - y_i) + \|\omega\|_2^2,$$

where $C$ is a cost function for the errors. It can be shown that this minimization may be reduced to a quadratic optimization problem with a unique solution. For computational convenience, the objective function $R_{struct}(\omega)$ is solved in the space of Lagrange multipliers, i.e., the dual space. Additionally, the vector $\omega$ can be written as a linear combination of the data, i.e.,

$$\omega = \sum_{j=1}^{N} (\alpha_j - \alpha_j^*)\, \Phi(x_j),$$

where $\alpha_j$ and $\alpha_j^*$ are the solutions in the dual space, satisfying

$$f(x) = \sum_{j=1}^{N} (\alpha_j - \alpha_j^*)\, k(x_j, x) + b,$$

and $k(x_l, x_m): \mathbb{R}^k \times \mathbb{R}^k \rightarrow \mathbb{R}$ is called the kernel function, satisfying $k(x_l, x_m) = \langle \Phi(x_l), \Phi(x_m) \rangle$.

The advantage of using kernel functions lies not only in incorporating nonlinearities but also in the fact that the function $\Phi$ does not need to be explicitly estimated. Note that the kernel function $k(x_l, x_m)$ requires only the dot products of $\Phi$ (this property is commonly named the "kernel trick"). However, an important question that arises in this analysis is which kernel functions represent the dot product of some function $\Phi: \mathbb{R}^k \rightarrow \mathcal{F}$. Basically, the Mercer condition states that for $k \in L_2(\mathcal{X})$, if

$$\int_{\mathcal{X} \times \mathcal{X}} k(x, x')\, f(x) f(x')\, dx\, dx' \geq 0 \quad \text{for all } f \in L_2(\mathcal{X}),$$

then $k(x, x')$ can be written as a dot product in some feature space (more details and mathematical formality can be found in Mercer, 1909, and Smola and Schoelkopf, 1998, Sec. 2.3). Due to its simplicity and flexibility, one of the most frequently used kernels is the radial basis function (RBF), given by

$$k(x, x') = \exp(-\gamma \|x - x'\|^2),$$

where $\gamma > 0$ is a smoothness parameter. In addition, the definition of a cost function $C$ for the errors is necessary.

The role of cost functions in SVR is to quantify the impact of the distance between observed and fitted values in the optimization problem. The cost function also controls the impact of outliers on the fit, and consequently, the robustness of the estimator. The most commonly used cost function for regression problems is the $\epsilon$-insensitive function, defined by

$$C(y, f(x)) = \begin{cases} 0 & \text{if } |y - f(x)| < \epsilon \\ |y - f(x)| & \text{otherwise.} \end{cases}$$

The main idea of the $\epsilon$-insensitive cost is to consider a "tube" of diameter $2\epsilon$ centered on the fitted curve, within which all errors are ignored. In many practical applications, the kernel smoothness parameter $\gamma$ and the tolerance $\epsilon$ are set in an ad hoc fashion or via cross-validation (e.g., minimizing the leave-one-out sum of errors).
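As a minimal illustration, the $\epsilon$-insensitive cost can be written directly in R, the platform used later in the simulations; the function name and the default $\epsilon = 0.1$ below are our own choices, not part of the original text:

```r
# epsilon-insensitive cost: errors inside the 2*eps tube are ignored,
# larger errors are penalized linearly (illustrative helper only)
eps_insensitive <- function(y, fx, eps = 0.1) {
  err <- abs(y - fx)
  ifelse(err < eps, 0, err)
}

eps_insensitive(c(1.0, 1.5), c(1.05, 1.0))  # returns 0.0 0.5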

It is important to mention that linear models are particular cases of SVR. These models may be fitted using the linear kernel, given by

$$k(x, x') = \langle x, x' \rangle.$$

In this case, the model reduces to a linear multiple regression model. However, in contrast to the least-squares approach, which minimizes the empirical risk, the coefficients are estimated considering the cost function and the structural risk. In practice, the goodness of fit using linear and nonlinear kernels may be compared by quantifying the leave-one-out (or leave-n-out) cross-validation error, as illustrated in the sections dealing with simulations and applications to real data.

Measuring predictability. The main idea of using support vector regression in univariate time series analysis is to forecast future observations based solely on $p$ observed past values, i.e., a nonlinear autoregressive model of order $p$. Consider a time series $y_t$ ($t = 1, 2, \ldots, T$) observed at $T$ time points. The aim is to estimate a function $f$ such that

$$y_t(h) = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}),$$

where $y_t(h)$ is the prediction $h$ steps ahead. Hence, one may apply SVR to estimate the function $f$, using $y_t$ as the response variable and the set $x = (y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ as the feature vector. The model order $p$ may be automatically selected as the one that minimizes the variance of the leave-one-out cross-validation residuals. An illustrative diagram of the SVR application for estimating univariate time series models is shown in Fig. 1.

Downloaded By: [Sistema Integrado de Bibliotecas/USP] At: 16:56 16 January 2009

1188 Sato et al.

Figure 1. Diagram of time series prediction using SVR.
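A sketch of this autoregressive SVR fit, using the e1071 package cited in the simulations section, might look as follows; the helper names (`lag_matrix`, `loo_error_var`, `select_order`) and the naive leave-one-out loop are our own, and the SVR settings mirror those stated later (C = 1, epsilon = 0.1, gamma = 1/p):

```r
library(e1071)  # provides svm() with eps-regression and RBF/linear kernels

# Build the lagged design: response y_t, features (y_{t-1}, ..., y_{t-p})
lag_matrix <- function(y, p) {
  E <- embed(y, p + 1)          # row t: y_t, y_{t-1}, ..., y_{t-p}
  list(y = E[, 1], x = E[, -1, drop = FALSE])
}

# Leave-one-out cross-validation residual variance for order p and kernel
loo_error_var <- function(y, p, kernel = "radial") {
  d <- lag_matrix(y, p)
  n <- length(d$y)
  res <- numeric(n)
  for (i in seq_len(n)) {
    fit <- svm(d$x[-i, , drop = FALSE], d$y[-i], type = "eps-regression",
               kernel = kernel, cost = 1, epsilon = 0.1, gamma = 1 / p)
    res[i] <- d$y[i] - predict(fit, d$x[i, , drop = FALSE])
  }
  var(res)
}

# Select p minimizing the LOO residual variance (search range is our choice)
select_order <- function(y, p_max = 6, kernel = "radial") {
  errs <- sapply(1:p_max, function(p) loo_error_var(y, p, kernel))
  which.min(errs)
}
```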

An intuitive measure of time series predictability is the relative prediction error, given by

$$\lambda = \frac{\sigma_\varepsilon^2}{\sigma_y^2},$$

where $\sigma_\varepsilon^2$ is the prediction error variance and $\sigma_y^2$ is the variance of $y_t$. In other words, $1 - \lambda$ can be interpreted as the percentage of the variance of $y_t$ that is due to an underlying linear/nonlinear deterministic structure.

A computationally intensive approach (but one requiring minimal assumptions) for assessing the statistical significance of $\lambda$ is the use of permutation techniques. Note that under the null hypothesis of no predictability (i.e., $\lambda = 1$), the values of the time series $y_t$ can simply be permuted. Hence, one may obtain the respective bootstrap quantiles of $\lambda$ to derive a permutation-based estimated p-value.
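In code, the coefficient and its permutation test might look like the following sketch, built on the `loo_error_var` helper above; the choice of B = 200 permutations is ours:

```r
# Relative prediction error: LOO residual variance over the series variance
predictability <- function(y, p, kernel = "radial") {
  loo_error_var(y, p, kernel) / var(y)
}

# Permutation test: under H0 (no predictability) the time ordering is
# exchangeable, so lambda is re-estimated on permuted copies of the series
predictability_pvalue <- function(y, p, kernel = "radial", B = 200) {
  lambda_obs  <- predictability(y, p, kernel)
  lambda_perm <- replicate(B, predictability(sample(y), p, kernel))
  mean(lambda_perm <= lambda_obs)  # small p-value: more predictable than noise
}
```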

3. Simulations

In this section, simulations were carried out to evaluate and compare the performance of linear and nonlinear predictability estimation (using the RBF kernel) with SVR. The algorithms and all simulations were implemented on the R platform (www.r-project.org, requiring the e1071 package). For each group, 500 simulations were carried out, considering time series of length (T) 100 and 200 and four different levels of the random errors' standard deviation. The response variable and the predictors were normalized to zero mean and unit variance, and the SVR parameters were set to C = 1, epsilon = 0.1, and gamma = 1/p (Hsu et al., 2000). The number of lags p in both the linear and nonlinear autoregressive models was estimated by minimizing the leave-one-out cross-validation error.


Group 1. For this first group of simulations, the following AR(2) process was considered:

$$x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + \sigma \varepsilon_t.$$

The main interest in this series of simulations was to compare the linear and nonlinear prediction errors when the underlying autocorrelation structure was purely linear. In this case, the linear kernel is expected to be more adequate; however, the size of the difference in accuracy between the two methods was also of interest. In addition, the influences of the noise level ($\sigma$) and the sample size $T$ were quantified.

Group 2. The aim of this second group of simulations was to verify the adequacy of nonlinear autoregressive models using SVR in cases of nonlinear processes. We considered the following model:

$$x_t = 0.2 x_{t-1} + 2\cos(x_{t-2}) + \sigma \varepsilon_t.$$

In theory, the nonlinear prediction error was expected to be smaller than the linear one. However, the validity of this statement needed to be verified in cases of small samples. The effects of the noise level ($\sigma$) and of the sample size on the accuracy obtained using a nonlinear kernel were also quantified.

Group 3. The model considered in these simulations was:

$$x_t = \cos(x_{t-1})\cos(x_{t-2}) + \sigma \varepsilon_t.$$

This group of simulations focused on comparing the performance of nonlinear and linear models in quantifying predictability in cases of nonadditive nonlinear models. A nonlinear interaction term involving the first and second temporal lags was included in the generation of the simulated datasets.
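For concreteness, the three data-generating processes above can be simulated as in the following sketch; the burn-in length and function names are our own assumptions:

```r
# Group 1: linear AR(2) process (arima.sim passes sd through to rnorm)
sim_group1 <- function(T, sigma) {
  as.numeric(arima.sim(list(ar = c(0.5, -0.3)), n = T, sd = sigma))
}

# Group 2: additive nonlinear autoregression with a cosine term
sim_group2 <- function(T, sigma, burn = 100) {
  n <- T + burn
  x <- numeric(n)
  for (t in 3:n) x[t] <- 0.2 * x[t - 1] + 2 * cos(x[t - 2]) + rnorm(1, sd = sigma)
  x[(burn + 1):n]   # discard burn-in to reduce dependence on initial values
}

# Group 3: nonadditive model with a lag-1 x lag-2 interaction
sim_group3 <- function(T, sigma, burn = 100) {
  n <- T + burn
  x <- numeric(n)
  for (t in 3:n) x[t] <- cos(x[t - 1]) * cos(x[t - 2]) + rnorm(1, sd = sigma)
  x[(burn + 1):n]
}
```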

Group 4. This set of simulations focused on evaluating multiple-horizon forecasting with both the linear and nonlinear estimators. The three models described in the previous simulations were considered, assuming $\sigma = 0.1$ and horizons of h = 2, 4, and 6. In these cases, the SVR was trained to achieve the best predictor for each step ahead.
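Direct h-step-ahead training only changes the alignment of the design matrix: the response $y_t$ is paired with features lagged $h, \ldots, h+p-1$ steps back. A minimal variant of the earlier `lag_matrix` helper (our own naming; for h = 1 it reduces to the one-step case):

```r
# Design for direct h-step-ahead forecasting:
# response y_t, features (y_{t-h}, y_{t-h-1}, ..., y_{t-h-p+1})
lag_matrix_h <- function(y, p, h) {
  E <- embed(y, p + h)             # row t: y_t, y_{t-1}, ..., y_{t-p-h+1}
  list(y = E[, 1], x = E[, (h + 1):(h + p), drop = FALSE])
}
```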

3.1. Simulation Results

Figure 2 shows the results for the first group of simulations. As expected, the linear model performs (slightly) better than the nonlinear one, but this difference does not seem to be influenced by the noise standard deviation. The variability in the computed predictability estimates seems to be greater using the linear model.

The results for the second group of simulations are shown in Fig. 3. Here, the nonlinear predictive power seems greater than the linear one, and this superiority does not appear to depend on the innovations' variance. In addition, similarly to the results of the first group of simulations, the variability in the predictability estimates using the linear model was greater than that using the nonlinear model.


Figure 2. Simulations 1: Top panel shows the results for T = 100 at different levels of noise. Bottom panel shows the difference in prediction error between the nonlinear and linear estimates.

The results for the third group of simulations are summarized in Fig. 4. The simulations suggest that, in this case, the autoregressive models estimated using nonlinear kernels are more adequate, but the differences between the predictive powers were not as large as in the second group of simulations.

For all simulations, the prediction error decreases as the sample size increases, which is a reasonable and expected result. On the other hand, the difference between the nonlinear and linear prediction errors seems to depend neither on the sample size nor on the noise level.

The multiple-horizon forecasting comparisons are shown in Fig. 5. The results suggest that the nonlinear estimator is clearly superior in the nonlinear case (Group 2), which is expected. Nevertheless, the prediction errors of the nonlinear and linear estimators seem equivalent for the simulations of Groups 1 and 3. For the linear model, this result is reasonable, considering that the prediction errors differed only slightly even in one-step-ahead forecasting. However, the simulations involving the model of Group 3 suggest that, although superior in one-step prediction, the nonlinear estimator may be equivalent to the linear one in multiple-horizon forecasting.


Figure 3. Simulations 2: Top panel shows the results for T = 100 at different levels of noise. Bottom panel shows the difference in prediction error between the nonlinear and linear estimates.

In summary, the simulations confirmed the hypothesis that the linear kernel should perform better in cases of linear autoregressive processes. In contrast, the RBF kernel is superior in performance when there are underlying nonlinearities in the data. This difference is not large in the linear case but can be extremely large in the nonlinear case. This result is expected, as linear autoregressive structures are particular cases of the nonlinear model.

4. Application to an EEG Dataset

The EEG dataset used in this study was collected at the Department of Neurology, Hospital das Clínicas/University of São Paulo, Brazil. The data consist of 10,000 time points sampled at a rate of 200 Hz using the international 10–20 EEG system. The volunteer was a patient with generalized epilepsy in the left mesial temporal lobe, with a seizure focus close to the T3 channel.


Figure 4. Simulations 3: Top panel shows the results for T = 100 at different levels of noise. Bottom panel shows the difference in prediction error between the nonlinear and linear estimates.

The literature describes several studies that have attempted to characterize or predict epileptic seizures using quantitative analysis of intracranial/extracranial EEG recordings. Junling and Dazong (2005) described a linear epileptic seizure predictor based on the analysis of low-frequency characteristics of EEG signals measured on the scalp. Mormann et al. (2005, 2006, 2007) studied and reviewed methods and algorithms for predicting seizures and discussed the clinical usefulness and applicability of these approaches. Most studies in the literature aim to identify explicit/implicit temporal or spectral characteristics that may anticipate epileptic seizures. Sabesan et al. (2003) described a comparative study using the Lyapunov exponent and the Shannon and Kullback-Leibler entropies of EEG signals to predict temporal lobe epileptic attacks. Adeli et al. (2007) introduced the correlation dimension and Lyapunov exponent in the wavelet domain to characterize the signals of healthy subjects and of epileptic subjects during seizure-free intervals and during attacks.


Figure 5. Simulations 4: Prediction error in multi-step-ahead forecasting for the models described in the previous simulation descriptions.

Figure 6. Predictability analysis of EEG recordings at channel T3: time series, prediction error estimates using the nonlinear and linear models, and the respective difference. The vertical dotted lines show the seizure start-point.

In the current study, the EEG time series of channel T3 (the seizure focus) is presented. The time series was normalized to zero mean and unit variance (Fig. 6, top-left panel). The patient is seizure-free approximately from timepoints 0–3,500 (pre-ictal period), in a transition state from 3,501–6,000, and in seizure from 6,001–10,000 (ictal period). The prediction errors using the linear and nonlinear kernels were estimated locally, over time segments of length (T) 50 samples, due to the nonstationarity of EEG channels during epileptic seizures (Andrzejak et al., 2001; Manuca et al., 1998; Rankine et al., 2007). The SVR parameters and lag order were set identically to those used in the simulations.
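The windowed analysis can be sketched as below, reusing the `predictability` helper defined earlier; the window length of 50 follows the text, while the non-overlapping step and the fixed order p are our simplifications:

```r
# Local predictability over sliding windows, for both kernels
windowed_predictability <- function(y, width = 50, step = 50, p = 2) {
  starts <- seq(1, length(y) - width + 1, by = step)
  data.frame(
    start     = starts,
    linear    = sapply(starts, function(s)
                  predictability(y[s:(s + width - 1)], p, kernel = "linear")),
    nonlinear = sapply(starts, function(s)
                  predictability(y[s:(s + width - 1)], p, kernel = "radial"))
  )
}
```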

The analysis of the EEG recordings (Fig. 6) raised some interesting points. First, both the linear and nonlinear estimates indicate that the randomness of the signal increased before the seizure (timepoints 0–2,000). On the other hand, the signal tends to be quite predictable during the transition (3,500–5,000) and ictal (6,000–10,000) periods (almost 95% of the signal variance can be explained by a deterministic structure). For both the linear and nonlinear estimators, the mean difference between the estimated prediction errors before and during the seizure (after 3,500) was tested using the general linear model (with Cochrane-Orcutt correction, due to autocorrelation), resulting in a p-value less than 0.001 and indicating that the signal is significantly more predictable during the seizure. These results are in agreement with previous studies in the literature. Yambe et al. (2005) applied KS entropy to show that the predictability of subdural EEG data increases during the transition period. Sabesan et al. (2003) and Mormann et al. (2006, 2007) have shown that determinism measures may be useful for detecting epileptic seizures.

Furthermore, the comparison between the linear and nonlinear models is a point of interest. During the seizure-free period, the linear and nonlinear prediction errors are statistically equivalent (p-value = 0.488, mean difference tested using the general linear model with Cochrane-Orcutt correction). Nevertheless, nonlinearities are more evident during the seizure (p-value = 0.006; the variance of the differences is clearly reduced compared to the seizure-free period). In fact, the average difference between the linear and nonlinear prediction errors was small (approximately 1%). However, point-wise estimates are more relevant than averages, since predictability in EEG is dynamic, i.e., varies over time. The 5% quantile of the differences between the nonlinear and linear estimators was 11%, which is physiologically relevant for describing a seizure model. In fact, nonlinear predictors are not expected to be better at all seizure timepoints, but the identification of even a few periods is enough, and important, to improve epilepsy characterization. These results regarding nonlinearities are also comparable with those obtained in previous studies. For example, Casdagli et al. (1997) detected the existence of nonlinearities in depth and subdural EEG recordings of patients with seizures of mesial temporal origin. Gautama et al. (2003) reported the existence of nonlinear structures in brain electrical activity, and Andrzejak et al. (2006) showed that consideration of signal nonlinearity improved the spatial characterization of epilepsy datasets by comparing linear and nonlinear measures.


5. Conclusion

Most approaches for the analysis of biological time series require the assumption of linearity. This approach has the virtue of simplicity but for many practical applications may be too restrictive or even invalid.

In the current study, we introduce a time series predictability measure based on support vector regression estimation and linear and nonlinear autoregressive processes. The advantage of SVR is that it deals with overfitting problems by minimizing the structural risk and models nonlinearities using few parameters compared to other approaches such as kernel smoothing, splines, etc. The simulation results indicated that the approach is adequate, even for small samples. The existence of nonlinearities in time series was shown in an illustrative application using EEG recordings from an epileptic patient. It was shown that it is possible to characterize the dynamics of epileptic seizures not only by measuring the predictability of the recordings but also by quantifying the nonlinearities in the data. Most of the results obtained were in agreement with previous studies in the literature.

The main limitations of SVR lie in the area of parameter interpretation. In the linear case, parameters can generally be described in terms of expected variations (as in multiple linear regression models). However, in SVR, functions are not estimated explicitly, as they are implicitly mapped using the kernel trick. In some ways, SVR can be viewed as a kind of "black box" used to make predictions. In other words, the main focus is obtaining good predictions, not inferences about model parameters. On the other hand, the ability to produce nonlinear forecasts can be used for inferences about the predictability and degree of nonlinearity of time series, as shown in this study.

Acknowledgments

The authors are grateful to Prof. Dr. Koichi Sameshima (Hospital das Clínicas, University of São Paulo) for providing the EEG datasets. This research was partially supported by CNPq (J.R.S.) and FAPESP (03/10105-2 and ClnAPCe 05/56464-9), Brazil.

References

Adeli, H., Ghosh-Dastidar, S., Dadmehr, N. (2007). A wavelet-chaos methodology for analysis of EEGs and EEG subbands to detect seizure and epilepsy. IEEE Transactions on Biomedical Engineering 54(2):205–211.

Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Physical Review E 64(6):061907.

Andrzejak, R. G., Mormann, F., Widman, G., Kreuz, T., Elger, C. E., Lehnertz, K. (2006). Improved spatial characterization of the epileptic brain by focusing on nonlinearity. Epilepsy Research 69(1):30–44.

Baccala, L. A., Sameshima, K. (2001). Partial directed coherence: a new concept in neural structure determination. Biological Cybernetics 84(6):463–474.

Cai, Z., Fan, J., Yao, Q. (1999). Functional-coefficient regression models for nonlinear time series. Working Paper, University of North Carolina, Charlotte.

Casdagli, M. C., Iasemidis, L. D., Savit, R. S., Gilmore, R. L., Roper, S. N., Sackellares, J. C. (1997). Non-linearity in invasive EEG recordings from patients with temporal lobe epilepsy. Electroencephalography and Clinical Neurophysiology 102(2):98–105.

Chen, R., Tsay, R. S. (1993). Functional-coefficient autoregressive models. Journal of the American Statistical Association 88:298–308.

Deneux, T., Faugeras, O. (2006). Using nonlinear models in fMRI data analysis: model selection and activation detection. Neuroimage 32(4):1669–1689.

Enders, W. (2003). Applied Econometric Time Series. 2nd ed. New York: Wiley.

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7:179–188.

Friston, K. J., Holmes, A. P., Poline, J. B., Grasby, P. J., Williams, S. C., Frackowiak, R. S., Turner, R. (1995). Analysis of fMRI time-series revisited. Neuroimage 2(1):45–53.

Friston, K. J., Mechelli, A., Turner, R., Price, C. J. (2000). Nonlinear responses in fMRI: the Balloon model, Volterra kernels, and other hemodynamics. Neuroimage 12(4):466–477.

Gautama, T., Mandic, D. P., Van Hulle, M. M. (2003). Indications of nonlinear structures in brain electrical activity. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 67(4 Pt 2):046204.

Grinsted, A., Moore, J. C., Jevrejeva, S. (2004). Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics 11:561–566.

Gunn, S. (1998). Support vector machines for classification and regression. Technical Report ISIS-1-98, Department of Electronics and Computer Science, University of Southampton.

Haggan, V., Ozaki, T. (1981). Modelling nonlinear random vibrations using an amplitude-dependent autoregressive time series model. Biometrika 68:189–196.

Harrison, L., Penny, W. D., Friston, K. (2003). Multivariate autoregressive modeling of fMRI time series. Neuroimage 19(4):1477–1491.

Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning. New York: Springer-Verlag.

Hsu, C. W., Chang, C. C., Lin, C. J. (2000). A practical guide to support vector classification. Working paper, National Taiwan University. URL: http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Hyndman, R. J. (1999). Nonparametric regression models for binary time series. Proceedings of the 1999 Australasian Meeting of the Econometric Society. University of Technology, Sydney.

Junling, Z., Dazong, J. (2005). A linear epileptic seizure predictor based on slow waves of scalp EEGs. Conference Proceedings of the IEEE Engineering in Medicine and Biology Society 7:7277–7280.

Lai, D. (2005). Monitoring the SARS epidemic in China: a time series analysis. Journal of Data Science 3:279–293.

Manuca, R., Casdagli, M. C., Savit, R. S. (1998). Nonstationarity in epileptic EEG and implications for neural dynamics. Mathematical Biosciences 147(1):1–22.

Mercer, J. (1909). Functions of positive and negative type and their connections with the theory of integral equations. Philosophical Transactions of the Royal Society, London A 209:415–446.

Morettin, P. A., Chiann, C. (2006). Nonparametric estimation of functional-coefficient autoregressive models. Technical Report. Institute of Mathematics and Statistics, University of São Paulo.

Mormann, F., Kreuz, T., Rieke, C., Andrzejak, R. G., Kraskov, A., David, P., Elger, C. E., Lehnertz, K. (2005). On the predictability of epileptic seizures. Clinical Neurophysiology 116(3):569–587.

Mormann, F., Elger, C. E., Lehnertz, K. (2006). Seizure anticipation: from algorithms to clinical practice. Current Opinion in Neurology 19(2):187–193. Review.

Mormann, F., Andrzejak, R. G., Elger, C. E., Lehnertz, K. (2007). Seizure prediction: the long and winding road. Brain 130:314–333. Review.

Mourão-Miranda, J., Bokde, A. L. W., Born, C., Hampel, H., Stetter, S. (2005). Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data. Neuroimage 28(4):980–995.

Mukherjee, S., Osuna, E., Girosi, F. (1997). Nonlinear prediction of chaotic time series using support vector machines. Proceedings of IEEE NNSP'97, 24–26. See: http://citeseer.ist.psu.edu/9026.html.

Müller, K. R., Smola, A. J., Rätsch, G., Schölkopf, B., Kohlmorgen, J., Vapnik, V. (1997). Predicting time series with support vector machines. Proceedings of ICANN'97. Springer LNCS 1327, pp. 999–1004.

Ping-Feng, P., Chih-Sheng, L. (2005). Using support vector machines to forecast the production values of the machinery industry in Taiwan. International Journal of Advanced Manufacturing Technology 27:205–210.

Rankine, L., Stevenson, N., Mesbah, M., Boashash, B. (2007). A nonstationary model of newborn EEG. IEEE Transactions on Biomedical Engineering 54(1):19–28.

Riera, J., Bosch, J., Yamashita, O., Kawashima, R., Sadato, N., Okada, T., Ozaki, T. (2004). fMRI activation maps based on the NN-ARx model. Neuroimage 23(2):680–697.

Sabesan, S., Narayanan, K., Prasad, A., Spanias, A., Sackellares, J. C., Iasemidis, L. D. (2003). Predictability of epileptic seizures: a comparative study using Lyapunov exponent and entropy based measures. Biomedical Sciences Instrumentation 39:129–135.

Sato, J. R., Amaro, E., Jr., Takahashi, D. Y., de Maria Felix, M., Brammer, M. J., Morettin, P. A. (2006a). A method to produce evolving functional connectivity maps during the course of an fMRI experiment using wavelet-based time-varying Granger causality. Neuroimage 31(1):187–196.

Sato, J. R., Takahashi, D. Y., Cardoso, E. F., Martin, M. G., Amaro, E., Jr., Morettin, P. A. (2006b). Intervention models in functional connectivity identification applied to fMRI. International Journal of Biomedical Imaging 1–7.

Smola, A. J., Schoelkopf, B. (1998). A tutorial on support vector regression. NeuroCOLT2 Technical Report Series.

Stock, J. H., Watson, M. W. (2001). Vector autoregressions. Journal of Economic Perspectives 15(4):101–115.

Stoica, P., Moses, R. (2005). Spectral Analysis of Signals. Prentice-Hall.

Tay, F., Cao, L. (2001). Application of support vector machines in financial time series forecasting. Omega 29:309–317.

Tong, H. (1983). Threshold Models in Non-linear Time Series Analysis. Lecture Notes in Statistics 21. Heidelberg: Springer.

Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.

Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley and Sons.

Yambe, T., Asano, E., Mauyama, S., Shiraishi, Y., Shibata, M., Sekine, K., Watanabe, M., Yamaguchi, T., Kuwayama, T., Konno, S., Nitta, S. (2005). Chaos analysis of electroencephalography and control of seizure attack of epilepsy patients. Biomedicine & Pharmacotherapy 59 Suppl 1:S236–S238.
