Data Modeling for Kuala Lumpur Composite Index with ANFIS


Zuriahati Mohd Yunos, Siti Mariyam Shamsuddin, Roselina Sallehuddin

Soft Computing Research Group, Faculty of Computer Science and Information System, University Technology Malaysia, 81300 Skudai, Johor

{zuriahati, mariyam, roselina}@utm.my

Abstract

Stock market trading is one of the most popular investment activities. Many conventional techniques are used, including technical and fundamental analysis. Recently, AI techniques such as ANN, GA, FL and RS have been widely used by researchers because of their ability to predict the behavior of the stock market efficiently. In this research, a comprehensive pre-processing data model of the stock market is developed to acquire granular information that represents the behavior of the data to be fed to the classifier. The pre-processing methodology includes splitting, scaling, normalization and feature selection, followed by the Ten-Fold Cross Validation method as a benchmark for estimating the predictive accuracy and the effectiveness of splitting and selecting the input data. Daily KLCI data are captured and analyzed, and the movements of the index are found to be unstable; hence the forecasting process becomes difficult. A hybrid neurofuzzy approach with ANFIS is suggested to predict the behavior of the index. Four technical indicators are chosen to analyze the data. To verify the effectiveness of the ANFIS model, two experiments have been carried out, and the results show that the ANFIS method is competent in forecasting the KLCI compared with ANN.

Keywords: Hybrid, Neurofuzzy, Forecasting, ANFIS, ANN, technical analysis

1. Introduction

The stock market has long been considered a high-return investment field. The major forecasting methods used in the financial area are either technical or fundamental. Because stock markets are affected by many highly interrelated economic, political and even psychological factors that interact in a very complex fashion, it is very difficult to forecast movements in the stock market. ANN has been highly recommended in forecasting theory, leading to successful applications in time series and explanatory sales forecasting. ANN is a well-tested method for financial analysis of the stock market [6, 7, 8]. ANN has been shown to be able to decode nonlinear time series data, which adequately describe the characteristics of the stock market [10]. Other intelligent techniques, such as fuzzy expert systems and genetic algorithms, have also been applied to forecasting problems. Examples of ANN in stock market applications include the indication of buy and sell trading signals [8]. FIS (Fuzzy Inference System) is a popular framework for solving complex problems based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning, and neurofuzzy is an integrated combination of ANN and FIS. Many studies have combined these two techniques and applied them to financial problems [1, 2, 3, 4, 5, 10, 11, 13, 14]. Figure 1 shows the steps involved in forecasting time series. Generally, the process consists of collecting historical data, preprocessing the data, implementing the ANFIS model and generating rules.

Figure 1. Forecasting time series process
[Flow diagram boxes: Input: raw data, technical indicators; Preprocessing data; Implement ANFIS model and generate rules; Output: closing price]

In this research, the application of neurofuzzy in predicting the stock market closing price is implemented. We employ an ANFIS network in modeling and predicting the KLCI (Kuala Lumpur Composite Index) stock. Neurofuzzy computing is a popular framework for solving complex problems [1, 2]. ANFIS is the basic network architecture, and its hybrid learning rule was proposed by Roger Jang [12]. Subsequently, many researchers have used these techniques, particularly in economic forecasting [1, 2, 9, 15]. Hence, in this study, ANFIS is used to investigate its efficiency in predicting the KLCI movement. The forecast values are then compared with the actual values, with the aim of reducing the error between them. The results obtained from the ANFIS model are compared with those of the ANN model.

1 FIS: Fuzzy Inference System

Second Asia International Conference on Modelling & Simulation
978-0-7695-3136-6/08 $25.00 © 2008 IEEE. DOI 10.1109/AMS.2008.56

The remainder of this paper is organized as follows. Section 2 describes technical analysis, and Section 3 presents the data preprocessing, followed by the introduction of ANFIS in Section 4. The results are presented in Section 5, and Section 6 discusses the evaluation of the experiments. Finally, Section 7 concludes the paper with future work.

2. Technical Analysis

Since the emergence of stock markets, investors have used technical analysis, fundamental analysis and even mathematical models to predict stock prices. In finance, two main analyses are normally used to evaluate shares in order to make investment decisions: fundamental and technical analysis. Technical analysis evaluates the merit of a security by studying the statistics generated by market activity, namely past prices and volumes, to forecast future price movements. This theory is applicable to any tradable instrument whose price is influenced by the forces of demand and supply. Technical analysis disregards the financial statements of the issuer; instead it relies on market trends to ascertain investor sentiment and so predict the performance of the security. The theory holds that the price history of a stock is a strong indication of its future performance and that it is accurate most of the time. The main objectives of this method are to find the appropriate timing to enter and exit the stock market and to read the psychology of the market, since it is believed that stock market movement is 10% logical and 90% psychological in nature.

In this study, we combined technical analysis indicators to capture the information and subsequently employed them to forecast the KLCI. The parameters are the 14-day RSI (Relative Strength Index), the 5-day MA (Moving Average) and the stochastic indicator (%K, %D). These are common stock market indicators, and their values are derived from the original stock market data.

2 KLCI: Kuala Lumpur Composite Index
3 RSI: Relative Strength Index
4 MA: Moving Average

Stochastic Indicator

The stochastic indicator is a momentum oscillator that can signal the strength or weakness of the market. It is plotted as two lines: %K, a fast line, and %D, a slow line. The %K line is more sensitive than the %D line; the %D line is a moving average of %K, and it is the %D line that triggers the trading signals.

Treat %K as a fast moving average and %D as a slow moving average, with both lines plotted on a scale of 0 to 100. "Trigger" lines are normally drawn on stochastic charts at the 80% and 20% levels. A signal is generated when these lines are crossed, as below.
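The two lines described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the computation used in this study: it uses the common textbook definition of %K (position of the close within the recent high-low range), takes %D as a simple moving average of %K, and assumes lookback periods of 14 and 3 days, none of which are given in the text. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def stochastic_oscillator(
    highs: List[float],
    lows: List[float],
    closes: List[float],
    k_period: int = 14,  # assumed lookback for %K
    d_period: int = 3,   # assumed smoothing window for %D
) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """Return the (%K, %D) series; entries are None until enough data exists."""
    percent_k: List[Optional[float]] = []
    for t in range(len(closes)):
        if t + 1 < k_period:
            percent_k.append(None)
            continue
        window_high = max(highs[t + 1 - k_period : t + 1])
        window_low = min(lows[t + 1 - k_period : t + 1])
        span = window_high - window_low
        # A flat window has no range; reporting the midpoint is an assumption.
        percent_k.append(50.0 if span == 0 else 100.0 * (closes[t] - window_low) / span)

    percent_d: List[Optional[float]] = []
    for t in range(len(percent_k)):
        window = percent_k[max(0, t + 1 - d_period) : t + 1]
        if len(window) < d_period or any(v is None for v in window):
            percent_d.append(None)
        else:
            # %D is the slow line: a simple moving average of %K.
            percent_d.append(sum(window) / d_period)
    return percent_k, percent_d
```

Values of %D above 80 or below 20 would then correspond to crossings of the trigger lines.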

(1)

Relative Strength Index 14-days

RSI compares the relative strength of price gains on days that close above the previous day's close with price losses on days that close below the previous day's close.

RS is the average of the positive closing changes over a specified number of days divided by the average of the negative closing changes over the same number of days (Equation (2)).
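As a rough illustration of the RS definition above, the sketch below computes RS from simple averages over the last 14 closing changes and then applies the widely used conversion RSI = 100 - 100 / (1 + RS). The conversion formula and the use of simple (rather than smoothed) averages are assumptions, and the function name is hypothetical.

```python
from typing import List


def relative_strength_index(closes: List[float], period: int = 14) -> float:
    """RSI over the last `period` closing changes, using simple averages."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    # Day-over-day closing changes for the most recent `period` days.
    changes = [closes[i] - closes[i - 1] for i in range(len(closes) - period, len(closes))]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losing days in the window
    rs = avg_gain / avg_loss  # RS as defined in the text
    return 100.0 - 100.0 / (1.0 + rs)


# Equal gains and losses give RS = 1 and therefore an RSI of 50.
print(relative_strength_index([10, 11, 10, 11, 10], period=4))
```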

(2)

Moving Average

The concept is to compute the mean price over a period of time. This can be done over a week, a fortnight, a month or any other suitable length of time. MA acts as a detector of trend direction and thus determines buy and sell signals. If closing prices remain above the moving average, the market is in an uptrend, which makes it suitable for selling. If closing prices remain below the moving average, the market is in a downtrend, and buying would be best during that period.
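The trend rule above can be sketched as follows, assuming a simple (unweighted) 5-day moving average and comparing only the latest close against it. Both function names are hypothetical.

```python
from typing import List


def simple_moving_average(closes: List[float], period: int = 5) -> float:
    """Mean of the last `period` closing prices."""
    if len(closes) < period:
        raise ValueError("need at least `period` closing prices")
    return sum(closes[-period:]) / period


def trend(closes: List[float], period: int = 5) -> str:
    """Classify the trend by comparing the latest close with the moving average."""
    ma = simple_moving_average(closes, period)
    last = closes[-1]
    if last > ma:
        return "uptrend"    # close above the average
    if last < ma:
        return "downtrend"  # close below the average
    return "flat"


print(trend([1.0, 2.0, 3.0, 4.0, 5.0]))  # prints "uptrend" (MA is 3.0, close is 5.0)
```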

(3)


3. Preprocessing Data

Generally, data preprocessing improves efficiency and gives better generalization ability. In this study, we combined technical analysis methods to capture the information for KLCI forecasting. The input data are selected using common technical analysis indicators, namely the moving average, stochastic indicator and relative strength index, together with the close price itself. These data are pre-processed prior to the daily prediction. Table 1 lists the input and output variables used in the ANFIS model, whereas Table 2 shows the sample data used in this research. For stock market data, the values should be converted into returns using Equation (4) [13].

(4)
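A minimal sketch of converting closing prices into returns, assuming the simple one-period return r_t = (p_t - p_{t-1}) / p_{t-1}. Log returns are an equally common choice, and the exact form of Equation (4) may differ. The function name is hypothetical.

```python
from typing import List


def simple_returns(prices: List[float]) -> List[float]:
    """One-period simple returns; the result is one element shorter than the input."""
    return [(prices[t] - prices[t - 1]) / prices[t - 1] for t in range(1, len(prices))]


# A rise from 100 to 110 is roughly +0.1; a fall from 110 to 99 is roughly -0.1.
print(simple_returns([100.0, 110.0, 99.0]))
```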

Table 1. Input and output variables for ANFIS model

| Variables | Description | Mapping |
|---|---|---|
| Input 1 | Close price a day before | t-1 |
| Input 2 | Current close price | t |
| Input 3 | 5-days moving average | MA5 |
| Input 4 | 14-days relative strength index | RSI-14 |
| Input 5 | Stochastic indicator | %K |
| Input 6 | Stochastic indicator | %D |
| Input 7 | Output | t+1 |

Table 2. Sample data used in the model
(27 March - 31 May 2004)

Columns: Training data, Testing data, Real data

Sample 1: 1000, July 2004
Sample 2: 700, July 2004
Sample 3: 550, July 2004

Partitioning is the process of dividing the data into a training set, a testing set and a validation set. We use ten-fold cross validation, in which the data are split into 10 equal partitions. Each partition is used in turn for testing while the remaining partitions are used for training. That is, 9/10 of the data are used for training and 1/10 for testing, and the procedure is repeated 10 times (Figure 2).
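The partitioning scheme can be illustrated with a short index-based sketch. It assumes contiguous, unshuffled folds (a natural choice for time-ordered data, though the text does not say how the partitions were drawn), and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# With 1000 samples, each of the 10 rounds trains on 900 and tests on 100.
for train, test in k_fold_splits(1000):
    assert len(train) == 900 and len(test) == 100
```

Every sample appears in exactly one test fold, so each observation is used for testing once and for training nine times.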

Table 3. Results generated through fuzzy decision tree

| No. of nodes | Degree of membership | No. of rules (2^n) |
|---|---|---|
| 2 | 2 | 4 |
| 3 | 2 | 8 |
| 4 | 2 | 16 |
| 5 | 2 | 32 |
| 6 | 2 | 64 |

Figure 2. Ten-fold cross validation and simulation

Subsequent to the partitioning process, the training set is split into 80% training data and 20% testing data. The training data are propagated into the network together with the target values. As a result, every data sample contains 6 input nodes and 1 output node. The training data are used to train the network and to develop the prediction model. To validate and verify the model, we test it with new data without target values, namely the July daily data.

Table 3 shows the results generated from the fuzzy decision tree, consisting of the number of input nodes, the degree of membership and the number of rules. This information is crucial in developing the forecasting model.
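The rule counts in Table 3 follow from combining every membership function of every input, which gives m^n rules for n inputs with m membership functions each. The sketch below reproduces the table for m = 2 and builds the "inputs-rules-output" structure labels used later in the paper; the function names are hypothetical.

```python
def rule_count(n_inputs: int, mfs_per_input: int = 2) -> int:
    """Number of rules when every combination of membership functions forms a rule."""
    return mfs_per_input ** n_inputs


def structure_label(n_inputs: int, mfs_per_input: int = 2) -> str:
    """Label of the form inputs-rules-outputs, with a single output node."""
    return f"{n_inputs}-{rule_count(n_inputs, mfs_per_input)}-1"


for n in range(2, 7):
    print(n, 2, rule_count(n), structure_label(n))
```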

4. ANFIS Model

The ANFIS model is built for various input combinations, each trained with a single application of the least-squares method. The model with the best performance is then chosen for further training. This learning method works similarly to that of neural networks. Figure 3 shows the ANFIS network architecture that corresponds to the first-order Sugeno fuzzy model, while Figure 4 shows the basic flow diagram of ANFIS computation.

Figure 3. ANFIS architecture

[Figure 2 diagram labels: Experiment 1 to Experiment 10; legend: Training set, Testing set]

Figure 4. Basic flow diagram of ANFIS computation

ANFIS only supports Sugeno-type systems, and these must have the following properties:

• Be first or zero order Sugeno-type systems.
• Have a single output, obtained using weighted average defuzzification. All output membership functions must be the same type and be either linear or constant.
• Have no rule sharing. Different rules cannot share the same output membership function; that is, the number of output membership functions must be equal to the number of rules.
• Have unity weight for each rule.

In this study, the Gaussian membership function is applied in the input layer. A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1.

Layer 1 is the input layer. Every node i in this layer is an adaptive node with a node function $\mu_{A_i}(x)$ or $\mu_{B_i}(y)$, where x or y is the input to node i and $A_1, A_2, B_1, B_2$ are the linguistic labels associated with the nodes.

Nodes in this layer perform fuzzification. In Jang's model, fuzzification nodes have a bell function, which can be specified as:

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \frac{x - c_i}{a_i} \right|^{2 b_i}}$$

or a Gaussian function:

$$\mu_{A_i}(x) = \exp\left[ -\left( \frac{x - c_i}{a_i} \right)^{2} \right]$$

where x is the input and $\mu_{A_i}(x)$ is the output, while $(a_i, b_i, c_i)$ are the parameters that control the width, slope and center of the bell function of node i (the Gaussian function uses only $a_i$ and $c_i$). Parameters in this layer are referred to as premise parameters.
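To make the role of the premise parameters concrete, the sketch below evaluates a generalized bell function and a Gaussian function in the forms commonly given for Jang's ANFIS. These forms are assumed here, and the parameter values are illustrative only.

```python
import math


def bell_mf(x: float, a: float, b: float, c: float) -> float:
    """Generalized bell: a sets the width, b the slope, c the center."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def gaussian_mf(x: float, a: float, c: float) -> float:
    """Gaussian: a sets the width, c the center."""
    return math.exp(-(((x - c) / a) ** 2))


# Both functions peak at 1 when x equals the center c.
print(bell_mf(0.0, a=2.0, b=2.0, c=0.0), gaussian_mf(0.0, a=2.0, c=0.0))
# One width away from the center, the bell gives 0.5 and the Gaussian gives exp(-1).
print(bell_mf(2.0, a=2.0, b=2.0, c=0.0), gaussian_mf(2.0, a=2.0, c=0.0))
```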

Layer 2 is the rule layer. Each node in this layer corresponds to a Sugeno-type fuzzy rule. A rule node receives inputs from the respective fuzzification nodes and calculates the firing strength of the rule it represents:

$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y)$$

Layer 3 is also known as the normalization layer. Each node in this layer receives inputs from all nodes in the rule layer and calculates the normalized firing strength of a given rule.

The normalized firing strength is the ratio of the firing strength of a given rule to the sum of the firing strengths of all rules:

$$\bar{w}_i = \frac{w_i}{\sum_j w_j}$$

Layer 4 is the defuzzification layer. Each node in this layer is connected to the respective normalization node and also receives the initial inputs, x and y. A defuzzification node calculates the weighted consequent value of a given rule as:

$$\bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right)$$

where $\bar{w}_i$ is the normalized firing strength from Layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set of this node. Parameters in this layer are referred to as consequent parameters.

Layer 5 is represented by a single summation node, which computes the sum of the outputs of all defuzzification nodes and produces the overall ANFIS output.
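The five layers can be traced in a short forward-pass sketch. It is an illustration under assumptions rather than the implementation used in this study: it uses Gaussian membership functions, takes the product of memberships as the firing strength, and forms one rule per combination of membership functions (matching the 2^n rule counts in Table 3). The function names, argument layout and parameter values are hypothetical, and no training is performed.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def gaussian_mf(x: float, a: float, c: float) -> float:
    return math.exp(-(((x - c) / a) ** 2))


def anfis_forward(
    inputs: Sequence[float],
    premise: Sequence[Sequence[Tuple[float, float]]],
    consequent: Sequence[Sequence[float]],
) -> float:
    """First-order Sugeno forward pass.

    premise[i] lists the (a, c) pairs of the membership functions of input i.
    consequent holds one tuple of len(inputs) + 1 coefficients per rule, in the
    order produced by itertools.product over the membership functions.
    """
    # Layer 1: fuzzification of each input by each of its membership functions.
    memberships = [
        [gaussian_mf(x, a, c) for (a, c) in mfs] for x, mfs in zip(inputs, premise)
    ]
    # Layer 2: firing strength of each rule (product of its memberships).
    firing = [math.prod(combo) for combo in product(*memberships)]
    # Layer 3: normalized firing strengths.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: weighted first-order consequents, w_bar * (p*x + q*y + ... + r).
    weighted: List[float] = []
    for w_bar, coeffs in zip(normalized, consequent):
        f = sum(p * x for p, x in zip(coeffs[:-1], inputs)) + coeffs[-1]
        weighted.append(w_bar * f)
    # Layer 5: single summation node.
    return sum(weighted)


# Two inputs with two membership functions each give 2^2 = 4 rules.
premise = [[(1.0, -1.0), (1.0, 1.0)], [(1.0, -1.0), (1.0, 1.0)]]
consequent = [(1.0, 2.0, 3.0)] * 4  # identical rules, so output is x + 2y + 3
print(anfis_forward([0.5, -1.0], premise, consequent))  # approximately 1.5
```

Because the normalized firing strengths sum to one, identical consequents reduce the output to the shared linear function, which gives a simple sanity check for the layer wiring.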

5. Experimental Results

Two experimental sets are tested. Experiment Set 1 varies the network structure and Experiment Set 2 varies the forecasting period. The purpose of these experiments is to identify the effectiveness of the network performance. Tables 4(a) and 4(b) list the settings varied in each experiment. To select the best model for each experiment, we measure the accuracy using RMSE and MAPE.

For Experiment 1, each data sample is tested with five different network structures (Table 4(a)). With a small network structure, the network converges very fast. However, as the network structure grows, convergence takes longer, probably because ANFIS is poorly suited to large data sets and to the maximum number of input nodes. Despite the longer learning time, the results on the large network structures are encouraging. Experiment 2, on the other hand, examines different forecasting periods (Table 4(b)).
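The two accuracy measures named above can be sketched as follows, using their usual definitions. The results tables report values near 100 under the MAPE heading, which suggests that the figure shown is the accuracy 100 - MAPE; that reading is an assumption, and the helper `accuracy_percent` is hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_percent(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed reading of the reported accuracy: 100 minus MAPE."""
    return 100.0 - mape(actual, predicted)


actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(actual, predicted), mape(actual, predicted), accuracy_percent(actual, predicted))
```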

[Figure 4 flow diagram steps: Initialize the fuzzy system; give other parameters for learning (important are the number of iterations (epochs) and the tolerance (error)); start the learning process (use command Anfis); stop when the tolerance is achieved; validate with independent data]

Table 4(a). List of network for experiment 1

| Sample | Network structures |
|---|---|
| Sample 1 | 2-4-1, 3-8-1, 4-16-1, 5-32-1, 6-64-1 |
| Sample 2 | 2-4-1, 3-8-1, 4-16-1, 5-32-1, 6-64-1 |
| Sample 3 | 2-4-1, 3-8-1, 4-16-1, 5-32-1, 6-64-1 |

Table 4(b). Period of forecast for experiment 2

| Sample | Period of forecast (n-days ahead) |
|---|---|
| Sample 1 | 10, 30, 50 |
| Sample 2 | 10, 30, 50 |
| Sample 3 | 10, 30, 50 |

Table 5 and Table 6 present the results of Experiment 1 and Experiment 2. The best forecasting model for each sample in Experiment 1 is the 3-8-1 structure, with 3 input nodes and 8 rules. The RMSE is quite small, and MAPE shows high prediction accuracy, close to 100% (Table 5).

Table 5. Prediction accuracy for Experiment 1 for each sample data

| Sample data | ANFIS RMSE | ANFIS MAPE | ANFIS best network |
|---|---|---|---|
| Sample 1 | 0.00376 | 99.32 | 3-8-1 |
| Sample 2 | 0.003008 | 99.45 | 3-8-1 |
| Sample 3 | 0.004069 | 99.30 | 3-8-1 |

In Experiment 2, the best models for Sample 1, Sample 2 and Sample 3 are the 5-32-1, 3-8-1 and 2-4-1 architectures, respectively. The results are encouraging: the RMSE errors become smaller and the prediction accuracy is close to 100%.

Table 6. Prediction accuracy for Experiment 2 for each sample data

| Sample data | Period | ANFIS RMSE | ANFIS MAPE | ANFIS best network |
|---|---|---|---|---|
| Sample 1 | 10 | 0.002825 | 99.49 | 5-32-1 |
| Sample 2 | 50 | 0.003008 | 99.45 | 3-8-1 |
| Sample 3 | 10 | 0.003871 | 99.34 | 2-4-1 |

6. Results Evaluation

In this study, performance is evaluated by comparing the forecasting methods in terms of the error between the actual and the desired output of the ANFIS model and the ANN model. For the ANN model, a back-propagation network is trained with the different topologies shown in Table 7. Table 8 and Table 9 compare the results of the ANFIS model and the ANN model for Experiment 1 and Experiment 2. The tables indicate that the ANFIS model is capable of forecasting the KLCI and that it outperforms the ANN model on Sample 2. In Experiment 1, ANFIS with the 3-8-1 structure is the best model for Sample 2, whereas ANN is the best model for Sample 1 and Sample 3. Experiment 2 gives the analogous result: ANFIS is the best model for Sample 2, whereas ANN is the best model for Sample 1 and Sample 3.

Table 7. Neural network architecture to be tested

| n | 2n | 2n+1 |
|---|---|---|
| 2-2-1 | 2-4-1 | 2-5-1 |
| 3-3-1 | 3-6-1 | 3-7-1 |
| 4-4-1 | 4-8-1 | 4-9-1 |
| 5-5-1 | 5-10-1 | 5-11-1 |
| 6-6-1 | 6-12-1 | 6-13-1 |
| 8-8-1 | 8-16-1 | 8-17-1 |
| 10-10-1 | 10-20-1 | 10-21-1 |

Notes: n is number of input nodes
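The topologies in Table 7 follow a simple pattern: for each number of input nodes n, the hidden layer has n, 2n or 2n+1 nodes, with a single output node. A small sketch that regenerates the table (the function name is hypothetical):

```python
from typing import List, Sequence, Tuple


def ann_topologies(
    input_sizes: Sequence[int] = (2, 3, 4, 5, 6, 8, 10),
) -> List[Tuple[str, str, str]]:
    """Rows of input-hidden-output labels with n, 2n and 2n+1 hidden nodes."""
    rows = []
    for n in input_sizes:
        rows.append(tuple(f"{n}-{hidden}-1" for hidden in (n, 2 * n, 2 * n + 1)))
    return rows


for row in ann_topologies():
    print(*row)
```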

Table 8. Comparison between ANFIS and ANN for Experiment 1

| Sample | ANFIS RMSE | ANFIS MAPE | ANFIS best N/w | ANN RMSE | ANN MAPE | ANN best N/w |
|---|---|---|---|---|---|---|
| Sample 1 | 0.00376 | 99.32 | 3-8-1 | 0.003118 | 99.49 | 10-10-1 |
| Sample 2 | 0.003008 | 99.45 | 3-8-1 | 0.00321 | 99.43 | 4-8-1 |
| Sample 3 | 0.004069 | 99.30 | 3-8-1 | 0.003625 | 99.36 | 10-20-1 |

Table 9. Comparison between ANFIS and ANN for Experiment 2

| Sample | ANFIS RMSE | ANFIS MAPE | ANFIS best N/w | ANN RMSE | ANN MAPE | ANN best N/w |
|---|---|---|---|---|---|---|
| Sample 1 | 0.002825 | 99.49 | 5-32-1 | 0.0021 | 99.63 | 6-6-1 |
| Sample 2 | 0.003008 | 99.45 | 3-8-1 | 0.0037 | 99.34 | 4-9-1 |
| Sample 3 | 0.003871 | 99.34 | 2-4-1 | 0.0021 | 99.58 | 5-11-1 |

To verify and validate the best models, the KLCI daily data for July are used as test data. For Experiment 1, the best ANFIS and ANN models are the 3-8-1 and 10-10-1 architectures; for Experiment 2, the best ANFIS and ANN models are the 5-32-1 and 6-6-1 architectures. All of these models have been tested using the real data set (July 2004) (refer to Table 10). The errors become smaller after testing with the real data, and the prediction accuracy approaches 100%. This shows that the ANFIS model is capable of forecasting the KLCI stock market. Figure 6 and Figure 7 show the KLCI real data together with the ANFIS and ANN outputs.

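Table 10. Result tested with real data (July 2004)

| Experiment | ANFIS N/w | ANFIS RMSE | ANFIS MAPE | ANN N/w | ANN RMSE | ANN MAPE |
|---|---|---|---|---|---|---|
| Exp. 1 | 3-8-1 | 0.0027 | 99.5 | 10-10-1 | 0.00317 | 99.47 |
| Exp. 2 | 5-32-1 | 0.0022 | 99.59 | 6-6-1 | 0.00259 | 99.57 |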

7. Conclusions and future works

In working towards a more robust financial forecasting model, the following issues are worth examining. First, since perfect forecasting is impossible in reality, other financial criteria should be considered instead of emphasizing forecasting accuracy alone. Second, the forecasting data should be adequately organized and processed. Preprocessing and proper sampling of the input data can affect the forecasting performance, and choosing indicators as inputs through sensitivity analysis could help to eliminate redundant inputs. Third, a trading system should be used to decide on the best tool to use. ANN is not the only tool that can be used for financial forecasting. Nor can we claim that it is the best forecasting tool, since post-forecasting analysis is needed to determine the suitability of the models and series.

Acknowledgement

This work is supported by Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia and the Ministry of Higher Education (MOHE) under the Fundamental Research Grant Scheme (FRGS). The authors would like to thank the Soft Computing Research Group, Faculty of Computer Science and Information Systems, for their incisive comments and for suggesting many helpful ways to improve this article.

References

[1] Abraham, A., Nath, B. and Mahanti, P.K. (2001). Hybrid Intelligent for Stock Market Analysis. Springer-Verlag Berlin Heidelberg, pp. 337-345.

[2] Abraham, A., Nath, B. and Nath, M. (2001). A Neuro-Fuzzy Approach for Forecasting Electricity Demand in Victoria. Applied Soft Computing Journal, Elsevier Science, pp. 127-138.

[3] Abraham, A. (2001(a)). Beyond Neuro-Fuzzy Systems: Reviews, Prospects, Perspectives and Directions, 7th International Mendel Conference on Soft Computing (MENDEL 2001), Published by Brno University of Technology, Matousek Radek and Osmera Pavel (Eds.), Brno, ISBN 80-214-1894-X, Czech Republic, (06-08).

[4] Abraham, A. (2001(b)). Neuro-Fuzzy Systems: State-of-the-Art Modeling Techniques, Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence, Springer-Verlag Germany, Jose Mira and Alberto Prieto (Eds.), ISBN 3-540-42235-8, Granada, Spain, pp. 269-276.

[5] Bouqata, B., Bensaid, A.M., Palliam, R. and Gomez, S.A. (2000). Time Series Prediction Using Crisp and Fuzzy Neural Networks: A Comparative Study, Conference on Computational Intelligence for Financial Engineering, New York.

[6] Jing, T.Y., Chew, L.T. and Hean, L.P. (1999). Neural Networks for Technical Analysis: a Study on KLCI, International Journal of Theoretical and Applied Finance, Vol. 2, No. 2, pp. 221-241.

[7] Jing, T.Y. and Chew, L.T. (2002). Neural Networks for Technical Forecasting of Foreign Exchange Rates, in Smith, K.A. and Gupta, J.N.D. (Eds.), Neural Networks in Business: Techniques and Applications, Idea Group Publishing, Hershey, Pennsylvania, pp. 191-207.

[8] Jing, T.Y. and Chew, L.T. (2000). A Case Study on Using Neural Networks to Perform Technical Forecasting of Forex, Neurocomputing, Vol. 34, No. 1-4, pp. 79-98.

[9] Pantazopoulos, K.N., Tsoukalas, L.H., Bourbakis, N.G., Brun, M.J. and Houstis, E.N. (1998). Financial Prediction and Trading Strategies Using Neurofuzzy Approaches. IEEE Transactions on Systems, Man, and Cybernetics, Piscataway, NJ, USA, pp. 520-531.

[10] Lapedes, A. and Farber, R. (1987). How neural nets work. In: Anderson, D.Z. (Ed.), Neural Information Processing Systems, American Institute of Physics, New York, pp. 442-456.

[11] Pantazopoulos, K.N., Tsoukalas, L.H. and Houstis, E.N. (1997). Neurofuzzy Characterization of Financial Time Series in an Anticipatory Framework. IEEE, Piscataway, NJ, USA, pp. 50-56.

[12] Roger, J.S. and Sun, C.T. (1995). Predicting Chaotic Time Series with Fuzzy if-then Rules, Proceedings of IEEE International Conference on Fuzzy Systems, San Francisco.

[13] Siekmann, S., Gebhardt, J. and Kruse, R. (1999). Information Fusion in the Context of Stock Index Prediction, Springer-Verlag Berlin Heidelberg, pp. 363-373.

[14] Wu Xiaodan, Fung Ming and Flitman, A. (2001). Forecasting Stock Market Performance Using Hybrid Intelligent System, Springer-Verlag Berlin Heidelberg, pp. 447-456.

[15] Kodogiannis, V. and Lolis, A. (2002). Forecasting Financial Time Series Using Neural Network and Fuzzy System-based Techniques. Springer-Verlag London Heidelberg, pp. 90-102.
