
International Journal of Bifurcation and Chaos, Vol. 16, No. 7 (2006) 2053–2062 © World Scientific Publishing Company

COMPUTATIONAL INTELLIGENCE METHODS FOR FINANCIAL TIME SERIES MODELING

N. G. PAVLIDIS∗, D. K. TASOULIS†, V. P. PLAGIANAKOS‡, and M. N. VRAHATIS§

Computational Intelligence Laboratory, Department of Mathematics, University of Patras,
University of Patras Artificial Intelligence Research Center (UPAIRC), GR-26110 Patras, Greece
∗[email protected]
†[email protected]
‡[email protected]
§[email protected]

Received February 16, 2005; Revised April 11, 2005

In this paper, the combination of unsupervised clustering algorithms with feedforward neural networks in exchange rate time series forecasting is studied. Unsupervised clustering algorithms have the desirable property of deciding on the number of partitions required to accurately segment the input space during the clustering process, thus relieving the user from making this ad hoc choice. Combining this input space partitioning methodology with feedforward neural networks acting as local predictors for each identified cluster helps alleviate the problem of nonstationarity frequently encountered in real-life applications. An improvement in the one-step-ahead forecasting accuracy was achieved compared to a global feedforward neural network model for the time series of the exchange rate of the German Mark to the US Dollar.

Keywords: Time series modeling and prediction; unsupervised clustering; neural networks.

1. Introduction

System identification and time series prediction are embodiments of the old problem of function approximation [Principe et al., 1998]. A discrete time series is a set of observations of a given variable z(t) ordered according to the parameter time, and denoted as z_1, z_2, ..., z_N, where N is the size of the time series.

Conventional time series models rely on global approximation, employing techniques such as linear regression, polynomial fitting and artificial neural networks. Global models are well suited to problems with stationary dynamics. In the analysis of real-world systems, however, two of the key problems are nonstationarity (often in the form of switching between regimes) and overfitting (which is particularly serious for noisy processes) [Weigend et al., 1995]. Nonstationarity implies that the statistical properties of the data generator vary through time. This leads to gradual changes in the dependency between the input and output variables.

Noise, on the other hand, refers to the unavailability of complete information from the past behavior of the time series to fully capture the dependency between the future and the past. Noise can be the source of overfitting, which implies that the performance of the forecasting model will be poor when applied to new data [Cao, 2003; Milidiu & Renteria, 1999]. Although global approximation methods can be applied to model and forecast time series having the aforementioned characteristics, it is reasonable to expect that the forecasting accuracy can be improved if regions of the input space exhibiting similar dynamics are identified and subsequently a local model is constructed for each of them. A number of researchers have proposed alternative methodologies to perform this task effectively [Cao, 2003; Milidiu & Renteria, 1999; Pavlidis et al., 2003; Pavlidis et al., 2005; Principe et al., 1998; Sfetsos & Siriopoulos, 2004; Weigend et al., 1995]. In principle, these methodologies are formed by the combination of two distinct approaches: an algorithm for the partitioning of the input space and a function approximation model. Evidently, the partitioning of the input space is critical for the successful application of these methodologies.

In this paper we investigate the improvement in one-step-ahead forecasting accuracy that can be attained if the partitioning of the input space is performed through unsupervised clustering algorithms, while the function approximation model is a feedforward neural network. Clustering can be defined as the process of "grouping a collection of objects into subsets or clusters, such that those within one cluster are more closely related to one another than objects assigned to different clusters" [Hastie et al., 2001]. Unsupervised clustering algorithms automatically approximate the number of clusters in the dataset during their execution. This property is important in the context of partitioning the input space for the purposes of time series forecasting, since the number of partitions corresponding to the different regimes is typically unknown a priori.

As a benchmark we consider the time series of the spot exchange rate of the German Mark against the US Dollar. Foreign exchange rates are among the most important economic indices in international monetary markets. Currently, foreign exchange markets are the most active of all financial markets, with average daily trading volumes in traditional (nonelectronic broker) markets estimated at $1.2 trillion [Bank of International Settlements, 2001]. Although the precise scale of speculative trading on spot markets is unknown, it is estimated that only around 15% of the trading is driven by nondealer/financial institution trading. Approximately 90% of all foreign currency transactions involve the US Dollar [Bank of International Settlements, 2001]. Foreign exchange rates are affected by many highly correlated economic, political and psychological factors, the interaction of which is very complex. Thus, forecasting foreign exchange rates poses many theoretical and experimental challenges [Yao & Tan, 2000].

The remaining paper is organized as follows: in the next section we present the clustering and neural network algorithms employed in this study. In Sec. 3 experimental results regarding the spot exchange rate of the German Mark against the US Dollar are presented. The paper ends with a short discussion of the results and concluding remarks.

2. Methods

In this section we briefly describe the application of unsupervised clustering and neural networks in the context of time series modeling and prediction. Furthermore, three unsupervised clustering algorithms, as well as three neural network training algorithms, are outlined.

2.1. Unsupervised clustering algorithms

A critical issue in the process of partitioning the input space for the purpose of time series modeling and forecasting is to obtain an appropriate estimation of the number of subsets. Over- or under-estimation of this quantity can cause the appearance of clusters with little or no physical meaning, and/or clusters containing patterns from regions with different dynamics, and/or clusters with very few patterns that are insufficient for the training of a feedforward neural network.

This is a fundamental and unresolved problem in cluster analysis, independent of the clustering technique applied. For instance, well-known and widely used iterative techniques, such as Self-Organizing Maps (SOMs) [Kohonen, 1997], the k-means algorithm [Hartigan & Wong, 1979], as well as the Fuzzy c-means algorithm [Bezdek, 1981], require the user to specify the number of clusters present in the dataset prior to the execution of the algorithm.

On the other hand, algorithms that have the ability to approximate the number of clusters present in a dataset belong to the category of unsupervised clustering algorithms. In this study we consider only unsupervised clustering algorithms. In particular, we employ the Growing Neural Gas [Fritzke, 1995], the DBSCAN [Ester et al., 1996], and the unsupervised k-windows [Tasoulis & Vrahatis, 2004; Vrahatis et al., 2002] clustering algorithms. Next, the aforementioned unsupervised algorithms are briefly presented.


2.1.1. Growing Neural Gas clustering algorithm

The Growing Neural Gas (GNG) clustering algorithm [Fritzke, 1995] is an incremental neural network. It can be described as a graph consisting of k nodes, each of which has an associated weight vector, defining the node's position in the data space, and a set of edges between the node and its neighbors. During the clustering procedure, new nodes are added to the network until a maximal number of nodes is reached. GNG starts with two nodes, randomly positioned in the data space, connected by an edge. Adaptation of weights, i.e. the nodes' positions, is performed iteratively. For each data object the closest node (winner), s1, and the closest neighbor of the winner node, s2, are identified. These two nodes are connected by an edge. An age variable is associated with each edge. When the edge between s1 and s2 is created, its age is set to zero. At each learning step the age variables of all edges emanating from the winner node are increased by one. By tracing the changes of the age variable it is possible to detect inactive nodes. Edges exceeding a maximal age, R, and any nodes having no emanating edges, are removed. The neighborhood of the winner is limited to its topological neighbors. The winner and its topological neighbors are moved in the data space toward the presented object by a constant fraction of the distance, defined separately for the winner and its topological neighbors. There is no neighborhood function or ranking concept and thus all topological neighbors are updated in the same manner.
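A minimal sketch of a single adaptation step under the description above follows; the constants eps_w, eps_n and max_age (playing the role of R) are assumed values, and the insertion of new nodes and the removal of isolated nodes are omitted:

```python
import numpy as np

def gng_adapt(nodes, edges, ages, xi, eps_w=0.05, eps_n=0.006, max_age=50):
    """One GNG adaptation step for a presented data object xi.
    nodes: (k, d) array of weight vectors; edges: set of (i, j) index
    pairs with i < j; ages: dict mapping each edge to its age variable."""
    dist = np.linalg.norm(nodes - xi, axis=1)
    s1, s2 = np.argsort(dist)[:2]            # winner and its closest neighbor
    for e in [e for e in edges if s1 in e]:  # age edges emanating from the winner
        ages[e] += 1
    edge = (min(s1, s2), max(s1, s2))        # connect s1 and s2, age set to zero
    edges.add(edge)
    ages[edge] = 0
    nodes[s1] += eps_w * (xi - nodes[s1])    # move the winner toward the object
    neighbors = {e[0] if e[1] == s1 else e[1] for e in edges if s1 in e}
    for n in neighbors:                      # move topological neighbors likewise
        nodes[n] += eps_n * (xi - nodes[n])
    for e in [e for e in edges if ages[e] > max_age]:
        edges.discard(e)                     # drop edges past the maximal age
        del ages[e]
    return nodes, edges, ages
```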

2.1.2. The DBSCAN clustering algorithm

The DBSCAN clustering algorithm [Sander et al., 1998] relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape and to distinguish noise. More specifically, the algorithm relies on the idea that for each point in a cluster at least a minimum number of objects, MinPts, should be contained in a neighborhood of a given radius, Eps, around it. Thus, by iteratively scanning all the points in the dataset, DBSCAN forms clusters of points that are connected through chains of Eps-neighborhoods of at least MinPts points each.
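As an illustration of the role of the two parameters, a hypothetical call to scikit-learn's DBSCAN implementation (a modern stand-in, not the implementation used in the paper) might look as follows, with eps and min_samples corresponding to Eps and MinPts:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(500, 6)  # stand-in for a 6-dimensional pattern set
labels = DBSCAN(eps=0.3, min_samples=4).fit_predict(X)
# Points labeled -1 are treated as noise; the remaining labels
# enumerate the density-connected clusters the algorithm discovered.
```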

2.1.3. Unsupervised k-windows

The unsupervised k-windows clustering algorithm [Tasoulis & Vrahatis, 2004; Vrahatis et al., 2002] uses a windowing technique to discover the clusters present in a dataset. More specifically, if we suppose that the dataset lies in d dimensions, it initializes a number of d-dimensional windows over the dataset. At a next step it iteratively moves and enlarges these windows to enclose all the patterns that belong to one cluster in a window. The movement and enlargement procedures are guided by the points that lie within a window at each iteration. As soon as the movement and enlargement procedures do not alter significantly the number of points within a window, they terminate. The final set of windows defines the clustering result of the algorithm. The unsupervised k-windows algorithm (UKW) applies the k-windows algorithm using a "sufficiently" large number of initial windows. The windowing technique of the k-windows algorithm allows a large number of initial windows to be examined without any significant overhead in time complexity. At a final step, the windows that contain a high percentage of common points from the dataset are considered to belong to the same cluster. Thus the number of clusters can be determined [Alevizos et al., 2002; Alevizos et al., 2004; Tasoulis & Vrahatis, 2004].

2.2. Feedforward Neural Networks

Artificial Neural Networks (ANNs) have been widely employed in numerous fields and have shown their strengths in solving real-world problems. ANNs are parallel computational models comprised of interconnected adaptive processing units (neurons), characterized by an inherent propensity for storing experiential knowledge. They resemble the human brain in two fundamental respects: firstly, knowledge is acquired by the network from its environment through a learning process, and secondly, interneuron connection strengths (known as weights) are employed to store the acquired knowledge [Haykin, 1999].

Numerous neural network models have been proposed, but multilayered Feedforward Neural Networks (FNNs) are the most common. In FNNs neurons are arranged in layers and there are connections from the neurons in one layer to the neurons of the following layer. The learning rule typically used for FNNs is supervised training. Two critical parameters for the successful application of FNNs are the appropriate selection of the network architecture and the training algorithm. For the general problem of function approximation, the universal approximation theorem, proved in [White, 1990], states that:

Theorem 2.1. Standard Feedforward Networks with only a single hidden layer can approximate any continuous function uniformly on any compact set and any measurable function to any desired degree of accuracy.

An immediate implication of the above theorem is that any lack of success in applications must arise from inadequate learning, and/or an insufficient number of hidden units, and/or the lack of a deterministic relationship between the input patterns and the desired response (target).

In the context of time series modeling the inputs to the FNN typically consist of a number of delayed observations, while the target is the next value of the series. The universal myopic mapping theorem [Sandberg & Xu, 1997a, 1997b] states that any shift-invariant map can be approximated arbitrarily well by a structure consisting of a bank of linear filters feeding an FNN. An implication of this theorem is that, in practice, FNNs alone can be insufficient to capture the dynamics of a nonstationary system [Haykin, 1999]. This is also verified by the results presented in this paper.

The selection of the optimal network architecture for a specific task remains to date an open problem. An upper bound on the architecture of an FNN designed to approximate a continuous function defined on the unit cube in R^n is given by the following theorem [Pinkus, 1999]:

Theorem 2.2. On the unit cube in R^n any continuous function can be uniformly approximated, to within any error, by using a two hidden layer network having 2n + 1 units in the first layer and 4n + 3 units in the second layer.

2.3. Supervised training of neural networks

The supervised training process is an incremental adaptation of the weights that propagate information between the neurons. Learning in FNNs is achieved by minimizing the network error using a batch, also called offline, or a stochastic, also called online, training algorithm.

Batch training is considered the classical machine learning approach. In time series applications, a set of patterns is used for modeling the system before the network is actually used for prediction. In this case, the goal is to find a minimizer w* = (w*_1, w*_2, ..., w*_n) ∈ R^n, such that:

w* = min_{w ∈ R^n} E(w),

where E is the batch error measure of the FNN, whose lth layer (l = 1, ..., M) contains N_l neurons:

E = (1/2) Σ_{p=1}^{P} Σ_{j=1}^{N_M} (y^M_{j,p} − t_{j,p})^2 = Σ_{p=1}^{P} E_p.   (1)

In the above relation, the error function is based on the squared difference between the actual output value at the jth output layer neuron for pattern p, y^M_{j,p}, and the target output value, t_{j,p}. E_p is the error of the pth pattern and p is the index over the input–output pairs. To predict the next value of the time series, there is only one output neuron (N_M = 1). On the other hand, when the problem is formulated as a classification task the value of N_M can vary according to the number of classes.

Supervised training is a difficult task since, in general, the dimension of the weight space is very high and the function E generates a complicated surface, characterized by multiple local minima and broad flat regions adjoined to narrow steep ones.
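As a sketch of Eq. (1), assuming a hypothetical routine forward(w, x) that returns the network output vector y^M for weight vector w, the batch error can be written as:

```python
import numpy as np

def batch_error(forward, w, patterns, targets):
    """Batch error E of Eq. (1): half the squared output-target
    difference, summed over all output neurons and all P patterns."""
    return 0.5 * sum(np.sum((forward(w, x) - t) ** 2)
                     for x, t in zip(patterns, targets))
```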

In online training, the FNN weights are updated after the presentation of each training pattern. Online training may be the appropriate choice for learning a task either because of the very large (or even redundant) training set, or because of the slowly time-varying nature of the task. Although batch training seems faster for small-size training sets and networks, online training is probably more efficient for large training sets and FNNs. It often helps to avoid local minima and provides a more natural approach for learning nonstationary tasks, such as time series modeling and prediction. Online methods seem to be more robust than batch methods, as errors, omissions, or redundant data in the training set can be corrected or ejected during the training phase.

In this paper we have employed and compared four algorithms for batch training and one online training algorithm. The batch training algorithms were the well-known Resilient Propagation (RPROP) [Riedmiller & Braun, 1993], a Scaled Conjugate Gradient (SCG) [Møller, 1993] and two population-based algorithms, namely the Differential Evolution algorithm (DE) [Storn & Price, 1997] and the Particle Swarm Optimization (PSO) [Eberhart et al., 1996]. We also implemented the recently proposed Adaptive Online BackPropagation training algorithm (AOBP) [Magoulas et al., 2001; Plagianakos et al., 2000]. Next, we briefly describe the AOBP, the DE, as well as the PSO algorithms.

2.3.1. The Online Neural Network training algorithm

Despite the abundance of methods for learning from examples, there are only a few that can be used effectively for online learning. For example, the classic batch training algorithms cannot straightforwardly handle nonstationary data. Even when some of them are used in online training, the problem of "catastrophic interference" appears, in which training on new examples interferes excessively with previously learned examples, leading to saturation and slow convergence [Sutton & Whitehead, 1993].

Methods suited to online learning are those that can handle time-varying data, while at the same time requiring relatively little additional memory and computation in order to process an additional example. The AOBP method proposed in [Magoulas et al., 2001; Plagianakos et al., 2000] belongs to this class of methods.

The key features of this method are the low storage requirements and the inexpensive computations. At each iteration, the d-dimensional weight vector is updated using the following formula:

w_{g+1} = w_g − η_g ∇E(w_g).

To calculate the learning rate for the next iteration, η_{g+1}, AOBP uses information from the current and the previous iteration. In detail, the new learning rate is calculated through the following relation:

η_{g+1} = η_g + K 〈∇E(w_{g−1}), ∇E(w_g)〉,

where η is the learning rate, K is the meta-learning rate constant (typically K = 0.5), and 〈·, ·〉 stands for the usual inner product in R^d. This approach stabilizes the learning rate adaptation process, and previous experiments [Magoulas et al., 2001; Plagianakos et al., 2000] have shown that it allows the method to exhibit good generalization and a high convergence rate.
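A minimal sketch of the two update rules, assuming g_prev and g are the gradients ∇E(w_{g−1}) and ∇E(w_g) of the online error:

```python
import numpy as np

def aobp_step(w, eta, g_prev, g, K=0.5):
    """One AOBP iteration: take a gradient step with the current
    learning rate, then adapt the rate from the inner product of the
    previous and current gradients (K is the meta-learning rate)."""
    w_next = w - eta * g                    # w_{g+1} = w_g - eta_g grad E(w_g)
    eta_next = eta + K * np.dot(g_prev, g)  # eta_{g+1} = eta_g + K <g_{g-1}, g_g>
    return w_next, eta_next
```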

2.3.2. Differential Evolution training algorithm

DE [Storn & Price, 1997] is a novel minimiza-tion method designed to handle nondifferentiable,

nonlinear and multimodal objective functions, byexploiting a population of NP potential solutions,that is d-dimensional vectors, to probe the searchspace. At each iteration of the algorithm, called gen-eration, g, three steps, mutation, recombination andselection, are performed to obtain more accurateapproximations [Plagianakos & Vrahatis, 2002]. Ini-tially, all weight vectors are initialized by using arandom number generator. At the mutation step,for each i = 1, . . . , NP a new mutant weight vectorvig+1 is generated by combining weight vectors, ran-

domly chosen from the population, and exploitingthe following variation operator:

vig+1 = ωi

g + µ(ωbestg − ωi

g + ωr1g − ωr2

g ), (2)

where ωr1g and ωr2

g are randomly selected vectors,different from ωi

g, and ωbestg is the member of the

current generation that yielded the lowest errorfunction value. Finally, the positive mutation con-stant µ, controls the magnification of the differencebetween two weight vectors (typically µ = 0.8).

The resulting mutant vectors are mixed witha predetermined weight vector, called target vec-tor. This operation is called recombination, and itgives rise to the trial vector. At the recombina-tion step, for each component j = 1, 2, . . . , d of themutant weight vector a random number r ∈ [0, 1] isgenerated. If r is smaller than the predefined recom-bination constant p (typically p = 0.9), the jth com-ponent of the mutant vector vi

g+1 becomes the jthcomponent of the trial vector. Otherwise, the jthcomponent of the target vector, ωi

g, is selected asthe jth component of the trial vector. Finally, atthe selection step, the trial weight vector obtainedafter the recombination step is accepted for the nextgeneration, if and only if, it yields a reduction of thevalue of the error function relative to the previousweight vector; otherwise, the previous weight vectoris retained.
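The three steps can be sketched as one synchronous generation over a population of weight vectors; error_fn is a hypothetical routine returning the batch error E(w) of Eq. (1):

```python
import numpy as np

def de_generation(pop, errors, error_fn, mu=0.8, p=0.9, rng=None):
    """One DE generation following the variation operator of Eq. (2).
    pop: (NP, d) array of weight vectors; errors: their error values."""
    rng = rng or np.random.default_rng()
    NP, d = pop.shape
    best = pop[np.argmin(errors)]
    new_pop, new_errors = pop.copy(), errors.copy()
    for i in range(NP):
        # Mutation: combine the best member with two random members, Eq. (2).
        r1, r2 = rng.choice([k for k in range(NP) if k != i], size=2, replace=False)
        mutant = pop[i] + mu * (best - pop[i] + pop[r1] - pop[r2])
        # Recombination: component-wise mix of mutant and target vector.
        mask = rng.random(d) < p
        trial = np.where(mask, mutant, pop[i])
        # Selection: keep the trial vector only if it reduces the error.
        e = error_fn(trial)
        if e < errors[i]:
            new_pop[i], new_errors[i] = trial, e
    return new_pop, new_errors
```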

2.3.3. Particle Swarm Optimization training algorithm

PSO is a swarm-intelligence optimization algorithm capable of minimizing nondifferentiable, nonlinear and multimodal objective functions. Each member of the swarm, called a particle, moves with an adaptable velocity within the search space, and retains in its memory the best position it ever encountered. At each iteration, the best position ever attained by the swarm is communicated among the particles [Eberhart et al., 1996].


Assume a d-dimensional search space, S ⊂ R^d, and a swarm of NP particles. Both the position and the velocity of the ith particle are d-dimensional vectors, x_i ∈ S and v_i ∈ R^d, respectively. The best previous position ever encountered by the ith particle is denoted by p_i, while the best previous position attained by the swarm is denoted by p_g. The velocity [Clerc & Kennedy, 2002] of the ith particle at the (g+1)th iteration is obtained through Eq. (3). The new position of this particle is determined by simply adding the velocity vector to the previous position vector, Eq. (4):

v_i^{(g+1)} = χ(v_i^{(g)} + c1 r1 (p_i^{(g)} − x_i^{(g)}) + c2 r2 (p_g^{(g)} − x_i^{(g)})),   (3)

x_i^{(g+1)} = x_i^{(g)} + v_i^{(g+1)},   (4)

where i = 1, ..., NP; c1 and c2 are positive constants (typically c1 = c2 = 2.05); r1, r2 are random numbers uniformly distributed in [0, 1]; and χ is the constriction factor (typically χ = 0.729). In general, PSO has proved to be very efficient and effective in tackling various difficult problems [Parsopoulos & Vrahatis, 2002].
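A minimal sketch of one constriction-factor iteration over the whole swarm, with error_fn a hypothetical routine returning the error of a weight vector and one random number drawn per particle, as in the text:

```python
import numpy as np

def pso_step(x, v, p_best, p_best_err, error_fn,
             chi=0.729, c1=2.05, c2=2.05, rng=None):
    """One PSO iteration following Eqs. (3)-(4). x, v, p_best are
    (NP, d) arrays; p_best_err holds the error at each particle's
    best previous position."""
    rng = rng or np.random.default_rng()
    NP = x.shape[0]
    g_best = p_best[np.argmin(p_best_err)]  # best position attained by the swarm
    r1 = rng.random((NP, 1))
    r2 = rng.random((NP, 1))
    v = chi * (v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x))  # Eq. (3)
    x = x + v                                                         # Eq. (4)
    errs = np.array([error_fn(xi) for xi in x])
    better = errs < p_best_err              # update each particle's memory
    p_best = np.where(better[:, None], x, p_best)
    p_best_err = np.where(better, errs, p_best_err)
    return x, v, p_best, p_best_err
```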

3. Presentation of Experimental Results

The time series considered was that of the daily spot prices of the exchange rate of the German Mark relative to the US Dollar [Keogh & Folias, 2002]. The time period considered extends from 10/9/1986 to 8/9/1996, covering approximately ten years. The total number of observations was 2567. The first 2317 were used to evaluate the parameters of the predictive models, while the remaining 250, covering approximately the final year of the dataset, were used to evaluate their performance.

The first step in the analysis and prediction of time series originating from real-world systems is the choice of an appropriate time delay, T, and the determination of the embedding dimension, D. To select T, an established approach is to use the value that yields the first minimum of the mutual information function [Fraser, 1989]. For the considered time series no minimum occurs for T = 1, ..., 20, as illustrated in Fig. 1. In this case a time delay of one is typically selected. To determine the minimum embedding dimension for state space reconstruction we applied the method of "False Nearest Neighbors" [Hegger et al., 1999; Kennel et al., 1992]. As illustrated in Fig. 1, the proportion of false nearest neighbors as a function of D drops sharply to the value of 0.006 for D equal to five, which is the embedding dimension that we selected, and it becomes zero for dimensions higher than seven. With this embedding dimension the number of patterns used to evaluate the parameters of the predictive models was 2312, while the performance of the models was evaluated on the last 250 patterns.
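As a sketch, with the values D = 5 and T = 1 selected above, the one-step-ahead input-target patterns can be constructed as follows (z is the scalar exchange-rate series):

```python
import numpy as np

def delay_embed(z, D=5, T=1):
    """Build one-step-ahead patterns: inputs are D delayed observations
    with delay T; the target is the next value of the series."""
    X = np.array([z[t:t + D * T:T] for t in range(len(z) - D * T)])
    y = z[D * T:]
    return X, y

# With 2567 observations, D = 5 and T = 1 yield 2562 patterns; the
# first 2312 form the training set and the last 250 the test set.
```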

Having selected an embedding dimension, we tested numerous FNNs with different architectures and training algorithms, but no FNN was capable of producing a satisfactory test set prediction accuracy. In fact, the forecasts resembled a time-lagged version of the original series.

Fig. 1. Mutual information as a function of T (left) and proportion of "false nearest neighbors" as a function of D (right).


Next, the three unsupervised clustering algorithms, namely GNG, DBSCAN and UKW, were applied on the patterns of the training set to obtain a partition of the input space. Note that the value to be predicted (target value) by the FNNs acting as local approximators was also included in the patterns comprising the dataset supplied to the clustering algorithms. Our experience suggests that this approach slightly improves the overall forecasting performance. Once the clusters present in the training set are identified, each pattern from the test set is assigned to one of the clusters. Since the target value for patterns in the test set is unknown, the assignment is performed without taking into consideration the additional dimension that corresponds to the target component. A test set pattern is assigned to the cluster to which the nearest (in terms of Euclidean distance) node, pattern, or window center belongs, for the GNG, DBSCAN, and UKW algorithms, respectively.
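A sketch of this assignment rule, where centers stands in for the GNG node positions, the DBSCAN training patterns, or the UKW window centers, restricted to the input dimensions (target component dropped):

```python
import numpy as np

def assign_cluster(pattern, centers):
    """Index of the cluster whose representative is nearest to the
    test pattern in Euclidean distance; `pattern` and `centers` are
    assumed to already exclude the unknown target dimension."""
    return int(np.argmin(np.linalg.norm(centers - pattern, axis=1)))
```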

We evaluate the accuracy of the FNNs by the percentage of correct sign prediction [de Bodt et al., 2001; Giles et al., 2001; Walczak, 2001]. This measure captures the percentage of forecasts in the test set for which the following inequality is satisfied:

(x̂_{t+d} − x_{t+d−1}) · (x_{t+d} − x_{t+d−1}) > 0,   (5)

where x̂_{t+d} represents the prediction generated by the FNN, x_{t+d} refers to the true value of the exchange rate at period t + d and, finally, x_{t+d−1} stands for the value of the exchange rate at the current period, t + d − 1. Correct sign prediction in effect captures the percentage of profitable trades enabled by the forecasting system. To successfully train FNNs capable of forecasting the direction of change of the time series, a modified, nondifferentiable, error function was implemented:

E_k = 0.5 · |x_{t+d} − x̂_{t+d}|, if (x̂_{t+d} − x_{t+d−1}) · (x_{t+d} − x_{t+d−1}) > 0;
E_k = |x_{t+d} − x̂_{t+d}|, otherwise.   (6)

Since RPROP, SCG and AOBP are gradient-based algorithms, this function is employed only when the FNNs are trained through the DE and PSO algorithms.
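Both the evaluation measure of Eq. (5) and the modified error of Eq. (6) are straightforward to express; a sketch over arrays of predictions, true values, and current values:

```python
import numpy as np

def sign_prediction_rate(x_hat, x_true, x_curr):
    """Fraction of forecasts satisfying inequality (5): predicted and
    actual changes from the current value share the same sign."""
    return float(np.mean((x_hat - x_curr) * (x_true - x_curr) > 0))

def modified_error(x_hat, x_true, x_curr):
    """Nondifferentiable error of Eq. (6): the absolute forecast error,
    halved whenever the direction of change is predicted correctly."""
    correct = (x_hat - x_curr) * (x_true - x_curr) > 0
    return np.where(correct, 0.5, 1.0) * np.abs(x_true - x_hat)
```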

Numerical experiments were performed using a Clustering and a Neural Network C++ Interface, built under the Fedora Core Linux 3.0 operating system using the GNU compiler collection (gcc) version 3.4.2. The results obtained are reported in Tables 1–3 and the accompanying figures. Each table reports the total number of clusters identified in the training set. Furthermore, it reports the number of clusters to which test set patterns were assigned. For each such cluster, the number of patterns from the training set and the test set assigned to this cluster are also reported.

Table 1. UKW: Results.

               Patterns in the
            Train Set    Test Set
Cluster 1       84          33
Cluster 2       82          57
Cluster 3       65           3
Cluster 4      239          67
Cluster 5      210          90

Total number of clusters: 13. Clusters used in test set: 5.

Fig. 2. Proportion of correct sign prediction based on the clustering of the input space using the UKW algorithm.


Table 2. DBSCAN: Results.

               Patterns in the
            Train Set    Test Set
Outliers      1353         123
Cluster 1       95          59
Cluster 2       81          53
Cluster 3        4           4
Cluster 4       11          11

Total number of clusters: 12. Clusters used in test set: 5.

Fig. 3. Proportion of correct sign prediction based on the clustering of the input space using the DBSCAN algorithm.

Table 3. GNG: Results.

               Patterns in the
            Train Set    Test Set
Cluster 1       90          57
Cluster 2       61          29
Cluster 3       94           6
Cluster 4      496         158

Total number of clusters: 9. Clusters used in test set: 4.

Fig. 4. Proportion of correct sign prediction based on the clustering of the input space using the GNG algorithm.

Notice that, irrespective of the clustering algorithm, a relatively small proportion of the patterns contained in the training set was actually used to generate the predictions, since training only the FNNs corresponding to the particular clusters is necessary. The accompanying figures provide candlestick plots. Each candlestick depicts, for a cluster and a training algorithm, the forecasting accuracy with respect to sign prediction obtained over 100 experiments. A filled box is plotted between the first and third quartile of the data. The lines extending from each end of the box (whiskers) show the range of the data. The black line inside the box stands for the mean value of the measurements. An immediate observation from the inspection of the figures is that there are significant differences in the predictability of the different clusters, irrespective of the clustering algorithm. Moreover, within the same cluster, different training algorithms produced FNNs yielding different predictive accuracy.

For clusters 1, 3 and 5 identified by the UKW algorithm, with the corresponding FNNs trained by the DE and PSO algorithms, a mean predictive accuracy exceeding 60% was achieved. These three clusters together comprise more than 50% of the test set. However, the predictability of cluster 4 (26.8% of the test set) is rather low. As previously mentioned, DBSCAN has the ability to identify outliers in a dataset. In this case, close to 50% of the patterns of the test set were characterized as outliers. The FNN trained on these patterns produced a poor performance. On the other hand, the mean predictability for clusters 1 and 2 was around 55%. Note that cluster 3 (to which four test patterns were assigned) exhibited extremely high predictability. The GNG algorithm distinguished cluster 2, for which the corresponding FNN produced a mean accuracy close to 60% irrespective of the training algorithm used. For cluster 4, PSO and DE exhibited good performance, but the other three algorithms yielded the worst performance witnessed in this study.

4. Discussion and Concluding Remarks

In this study, we report results from the application of unsupervised clustering algorithms combined with feedforward neural networks in exchange rate time series forecasting. The desirable property of unsupervised clustering algorithms is that they can automatically approximate the number of partitions (clusters), thus relieving the user from making this critical, problem-specific choice.

Combining this input space partitioning methodology with feedforward neural networks acting as local predictors for each identified cluster helps alleviate the problem of nonstationarity frequently encountered in real-life applications. Feedforward neural networks are selected as local approximation models due to their ability to cope with noise. Through this approach an improvement, compared to global feedforward neural networks, in the one-step-ahead prediction of the direction of change of the daily spot exchange rate of the German Mark to the US Dollar was achieved.

Among the unsupervised clustering algorithms considered, UKW's performance was the most robust. Both the DBSCAN and the GNG algorithms, however, were capable of identifying meaningful clusters that yielded increased predictability in the test set. Of the training algorithms considered, FNNs trained using the AOBP training algorithm exhibited the highest maximum performance. The performance of the population-based algorithms, DE and PSO, exhibited wide variations.

Future work will include the synthesis of the results of the different clustering algorithms to improve the forecasting performance in larger regions of the input space, and also the examination of other real-life time series.

Acknowledgments

The authors thank the European Social Fund (ESF), Operational Program for Educational and Vocational Training II (EPEAEK II) and particularly the Program PYTHAGORAS, for funding this work. This work was supported in part by the "Karatheodoris" research grant awarded by the Research Committee of the University of Patras.

References

Alevizos, P., Boutsinas, B., Tasoulis, D. K. & Vrahatis, M. N. [2002] "Improving the orthogonal range search k-windows clustering algorithm," Proc. 14th IEEE Int. Conf. Tools with Artificial Intelligence, Washington, D.C., pp. 239–245.

Alevizos, P., Tasoulis, D. K. & Vrahatis, M. N. [2004] "Parallelizing the unsupervised k-windows clustering algorithm," Lecture Notes in Computer Science, Vol. 3019, pp. 225–232.

Bank of International Settlements [2001] "Central bank survey of foreign exchange and derivative market activity in April 2001," Bank of International Settlements.

Bezdek, J. C. [1981] Pattern Recognition with Fuzzy Objective Function Algorithms (Kluwer Academic Publishers).

Cao, L. [2003] "Support vector machines experts for time series forecasting," Neurocomputing 51, 321–329.

Clerc, M. & Kennedy, J. [2002] "The particle swarm — explosion, stability, and convergence in a multidimensional complex space," IEEE Trans. Evol. Comput. 6, 58–73.

de Bodt, E., Rynkiewicz, J. & Cottrell, M. [2001] "Some known facts about financial data," European Symp. Artificial Neural Networks (ESANN'2001), pp. 223–236.

Eberhart, R. C., Simpson, P. & Dobbins, R. [1996] Computational Intelligence PC Tools (Academic Press).

Ester, M., Kriegel, H. P., Sander, J. & Xu, X. [1996] "A density-based algorithm for discovering clusters in large spatial databases with noise," Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining, pp. 226–231.

Fraser, A. M. [1989] "Information and entropy in strange attractors," IEEE Trans. Inform. Th. 35, 245–262.

Fritzke, B. [1995] "A growing neural gas network learns topologies," Advances in Neural Information Processing Systems, eds. Tesauro, G., Touretzky, D. S. & Leen, T. K. (MIT Press, Cambridge, MA), pp. 625–632.

Giles, L. C., Lawrence, S. & Tsoi, A. H. [2001] "Noisy time series prediction using a recurrent neural network and grammatical inference," Mach. Learn. 44, 161–183.

Hartigan, J. A. & Wong, M. A. [1979] "A k-means clustering algorithm," Appl. Stat. 28, 100–108.

Hastie, T., Tibshirani, R. & Friedman, J. [2001] The Elements of Statistical Learning (Springer-Verlag).

Haykin, S. [1999] Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, NY).

Hegger, R., Kantz, H. & Schreiber, T. [1999] "Practical implementation of nonlinear time series methods: The TISEAN package," Chaos 9, 413–435.

Kennel, M. B., Brown, R. & Abarbanel, H. D. [1992] "Determining embedding dimension for phase-space reconstruction using a geometrical construction," Phys. Rev. A 45, 3403–3411.

Keogh, E. & Folias, T. [2002] The UCR Time Series Data Mining Archive.

Kohonen, T. [1997] Self-Organizing Maps (Springer, Berlin).

Magoulas, G. D., Plagianakos, V. P. & Vrahatis, M. N. [2001] "Adaptive stepsize algorithms for online training of neural networks," Nonlin. Anal. T.M.A. 47, 3425–3430.

Milidiu, R. L., Machado, R. J. & Renteria, R. P. [1999] "Time-series forecasting through wavelets transformation and a mixture of expert models," Neurocomputing 28, 145–156.

Møller, M. [1993] "A scaled conjugate gradient algorithm for fast supervised learning," Neural Networks 6, 525–533.

Parsopoulos, K. E. & Vrahatis, M. N. [2002] "Recent approaches to global optimization problems through particle swarm optimization," Natural Comput. 1, 235–306.

Pavlidis, N. G., Tasoulis, D. K. & Vrahatis, M. N. [2003] "Financial forecasting through unsupervised clustering and evolutionary trained neural networks," Proc. Congress on Evolutionary Computation (CEC 2003), pp. 2314–2321.

Pavlidis, N. G., Tasoulis, D. K. & Vrahatis, M. N. [2005] "Time series forecasting methodology for multiple-step-ahead prediction," The IASTED Int. Conf. Computational Intelligence (CI 2005), pp. 456–461.

Pinkus, A. [1999] "Approximation theory of the MLP model in neural networks," Acta Numerica, 143–195.

Plagianakos, V. P., Magoulas, G. D. & Vrahatis, M. N. [2000] "Global learning rate adaptation in online neural network training," Proc. Second Int. ICSC Symp. Neural Computation (NC'2000).

Plagianakos, V. P. & Vrahatis, M. N. [2002] "Parallel evolutionary training algorithms for 'hardware-friendly' neural networks," Natural Comput. 1, 307–322.

Principe, J. C., Wang, L. & Motter, M. A. [1998] "Local dynamic modeling with self-organizing maps and applications to nonlinear system identification and control," Proc. IEEE 86, 2240–2258.

Riedmiller, M. & Braun, H. [1993] "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," Proc. IEEE Int. Conf. Neural Networks (San Francisco, CA), pp. 586–591.

Sandberg, I. W. & Xu, L. [1997a] "Uniform approximation and gamma networks," Neural Networks 10, 781–784.

Sandberg, I. W. & Xu, L. [1997b] "Uniform approximation of multidimensional myopic maps," IEEE Trans. Circuits Syst. I: Fund. Th. Appl. 44, 477–485.

Sander, J., Ester, M., Kriegel, H.-P. & Xu, X. [1998] "Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications," Data Min. Knowl. Discov. 2, 169–194.

Sfetsos, A. & Siriopoulos, C. [2004] "Time series forecasting with a hybrid clustering scheme and pattern recognition," IEEE Trans. Syst., Man Cybern. Part A: Syst. Humans 34, 399–405.

Storn, R. & Price, K. [1997] "Differential evolution — A simple and efficient adaptive scheme for global optimization over continuous spaces," J. Glob. Optim. 11, 341–359.

Sutton, R. S. & Whitehead, S. D. [1993] "Online learning with random representations," Proc. Tenth Int. Conf. Machine Learning (Morgan Kaufmann), pp. 314–321.

Tasoulis, D. K. & Vrahatis, M. N. [2004] "Unsupervised distributed clustering," IASTED Int. Conf. Parallel and Distributed Computing and Networks (Innsbruck, Austria), pp. 347–351.

Vrahatis, M. N., Boutsinas, B., Alevizos, P. & Pavlides, G. [2002] "The new k-windows algorithm for improving the k-means clustering algorithm," J. Compl. 18, 375–391.

Walczak, S. [2001] "An empirical analysis of data requirements for financial forecasting with neural networks," J. Manag. Inform. Syst. 17, 203–222.

Weigend, A. S., Mangeas, M. & Srivastava, A. N. [1995] "Nonlinear gated experts for time series: Discovering regimes and avoiding overfitting," Int. J. Neural Syst. 6, 373–399.

White, H. [1990] "Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings," Neural Networks 3, 535–549.

Yao, J. & Tan, C. L. [2000] "A case study on using neural networks to perform technical forecasting of forex," Neurocomput. 34, 79–98.