applied sciences
Article
A Convex Combination Approach for Artificial Neural Network of Interval Data
Woraphon Yamaka , Rungrapee Phadkantha * and Paravee Maneejuk
Citation: Yamaka, W.; Phadkantha, R.; Maneejuk, P. A Convex Combination Approach for Artificial Neural Network of Interval Data. Appl. Sci. 2021, 11, 3997. https://doi.org/10.3390/app11093997
Academic Editor: Tag Gon Kim
Received: 12 April 2021; Accepted: 26 April 2021; Published: 28 April 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand; [email protected] (W.Y.); [email protected] (P.M.)
* Correspondence: [email protected]
Abstract: As the conventional models for time series forecasting often use single-valued data (e.g., closing daily price data or the end of the day data), a large amount of information during the day is neglected. Traditionally, the fixed reference points from intervals, such as midpoints, ranges, and lower and upper bounds, are generally considered to build the models. However, as different datasets provide different information in intervals and may exhibit nonlinear behavior, conventional models cannot be effectively implemented and may not be guaranteed to provide accurate results. To address these problems, we propose the artificial neural network with convex combination (ANN-CC) model for interval-valued data. The convex combination method provides a flexible way to explore the best reference points from both input and output variables. These reference points were then used to build the nonlinear ANN model. Both simulation and real application studies are conducted to evaluate the accuracy of the proposed forecasting ANN-CC model. Our model was also compared with traditional linear regression forecasting (information-theoretic method, parametrized approach, center and range) and conventional ANN models for interval-valued data prediction (regularized ANN-LU and ANN-Center). The simulation results show that the proposed ANN-CC model is a suitable alternative to interval-valued data forecasting because it provides the lowest forecasting error in both linear and nonlinear relationships between the input and output data. Furthermore, empirical results on two datasets also confirmed that the proposed ANN-CC model outperformed the conventional models.
Keywords: artificial neural network; convex combination method; interval-valued data; time series
1. Introduction
Time-series point (single-value data) forecasting normally fails to reflect the range of fluctuation or uncertainty for economic, financial, and environmental data. Moreover, the existing models of interval forecasting are still incomplete, complex, and relatively low in accuracy. Thus, interval-valued data forecasting has become an important issue to be investigated [1,2]. This study comes within the interval-valued time series forecasting framework by introducing the convex combination (CC) method designed to choose the reference points that better represent the interval-valued data. The CC method automatically explores the set of reference points from input and output variables to build the neural network (NN) models. This is an enhancement and a generalization over existing methods.
Interval-valued data forecasting serves the needs of investors and data scientists who are sometimes interested in both single-value data and the variability of value intervals in the data. Interval-valued data provides rich information that can help investors and data scientists make accurate decisions [3]. With the advances in data science, enormous information and big data can be collected nowadays. However, conventional forecasting methods cannot be effectively implemented to deal with this big data to yield accurate results. Furthermore, these methods are generally proposed to forecast future observations using only point-valued data, which gives rise to a higher computation cost when dealing with big data. Moreover, it is sometimes difficult to express the real behavior
of the variable using only point-valued data. Thus, interval-valued data, such as the range of temperature, stock returns, and willingness to pay (minimum and maximum), is generally used to predict uncertainty in many situations. Interval analysis, suggested by Lauro and Palumbo [4], assumed that observations and estimations in the world are usually uncertain and incomprehensive and do not precisely represent the real behavior of the data. This is quite true, as our use of point-valued data will cause us to lose substantial information about prices or values during the day or the week. With interval-valued data, however, we can capture more realistic movement or information of the variables and can also handle big data forecasting at the same time. Thus, the interval approach should be considered to explain real data behavior under the context of big data. Interval-valued data is a special type for symbolic data analysis (SDA), composed of the lower and upper bounds of an interval. The objective of SDA is to provide a way to construct aggregated data described by multivalued variables, and thereby provide an efficient way to summarize large data sets by some reference value of symbolic data. Thus, estimation tools for interval-valued data analysis have been intensively required in recent studies. The use of interval data to represent uncertainty is common in various situations, such as in coalitional games where the payoffs of coalitions are uncertain and could be modeled as intervals of real numbers; see, e.g., Branzei et al. [5] and Kiekintveld et al. [6].
From a methodological point of view, interval-valued data is generally transformed into point-valued data or reference point data by some techniques; then, this reference point data is further used as a variable in the models. One of the famous approaches for dealing with interval-valued data is the mid-point method, which was introduced by Billard and Diday [7]. They analyzed these data using ordinary least squares regression on the midpoints of the intervals, namely the lower and upper bounds of the independent and dependent variables. Neto and De Carvalho [8] improved this approach by presenting a new method based on two linear regression models (the center-range method). The first regression model is fitted to the midpoints of the intervals and the second to the ranges of the intervals. It was found to be more efficient than that of Billard and Diday [7]. More recently, Souza et al. [9], Chanaim et al. [10], and Phadkantha et al. [11] argued that the mid-point and range methods are not appropriate as references of the interval data, as they cannot present the real behavior of the data and both are also too restrictive. Thus, Souza et al. [9] introduced the parametrized approach (PM) to the intervals of input variables. This method can choose the reference points that better represent the input intervals before building the regression. Chanaim et al. [10] and Phadkantha et al. [11] extended the PM approach by suggesting the convex combination (CC) method to get the reference points of the input and output intervals before estimating the regression model. Specifically, instead of restricting the weight of interval-valued data (w) to be 0.5 (mid-point), Xc = 0.5(Xu) + 0.5(Xl), they generalized this weight to be unknown. Hence, the reference point data become Xcc = w(Xu) + (1 − w)(Xl), where w ∈ [0, 1] is the weight parameter. Then, this reference point data is used as the input data in the regression model. Buansing et al. [12] also proposed the iterative, information-theoretic (IT) method for forecasting interval-valued data. Their method differs from others, as it does not assume that every point in the interval emerged from the same underlying process; there may be multiple models behind the process. They showed that the IT method provides more accurate forecasting of the upper and lower bounds when compared to the center-range method.
There is voluminous literature on stock market estimation and forecasting based on a wide variety of both linear and nonlinear models. Linear models like the autoregressive (AR) and autoregressive moving average (ARMA) models [13,14] and nonlinear models like the threshold and Markov switching models [15–18] have been used for time series forecasting. However, the equivocal and unforeseeable nature of time series data has brought about difficulty in prediction [19], so artificial neural network (ANN) models were introduced to forecast the future return of the stock. The major advantage of ANN models is their flexible nonlinear modeling capability. With ANN, there is no need to specify a particular model
form. Instead, the model is adaptively formed based on the features inherent in the data. This data-driven approach is suitable for many empirical data sets where no theoretical guidance is available to suggest an appropriate data generating process. ANN provides desirable properties that some traditional linear and nonlinear regression models lack, such as being noise tolerant. The structure of the ANN model is inspired by the real-life information-processing abilities of the human brain. Key attributes of the brain's information network include a nonlinear, parallel information processing structure and multiple connections between information nodes [20].
In the recent decade, several studies have indicated the higher performance of ANN models in forecasting compared to that of regression models. Zhang et al. [21] and Leung et al. [22] examined various prediction models based on multivariate classification techniques and compared both neural network and classical forecasting models. Their experiment results suggested that the probabilistic neural network can outperform the level estimation models, including adaptive exponential smoothing, vector autoregression with Kalman filter updating, the multivariate transfer function, and the multilayered feedforward neural network, in terms of prediction accuracy. In a more recent study, Cao et al. [23] demonstrated the accuracy of ANN in predicting Shanghai Stock Exchange (SHSE) movement by comparing neural networks and linear models under the capital asset pricing model (CAPM) and Fama and French's three-factor contexts, and they found that the neural networks outperformed the linear models.
As we mentioned above, this paper comes within the framework of interval-valued data forecasting by ANN. To the best of our knowledge, work related to interval-valued data forecasting by neural networks is somewhat limited. Some attempts along this line include San Roque et al. [24], Maia et al. [25], Maia and De Carvalho [26], and Yang et al. [27]. Maia et al. [25] proposed the ANN-Center method to predict the interval value by using the midpoint of the interval as the input. Later, Maia and De Carvalho [26] introduced the ANN-LU method by using ANN-Center and ANN-Range (predicting the difference between upper and lower bounds) for predicting the lower and upper bounds of intervals separately. Yang et al. [27] suggested that ANN-LU may face the unnatural interval crossing problem when the predicted lower bounds of intervals are larger than the predicted upper ones and vice versa, thereby leading to an invalid interval prediction. Hence, they introduced the regularized ANN-LU (RANN-LU) model for interval-valued time series forecasting.
Although ANN-Center, ANN-LU, and RANN-LU are found to be superior to the classical linear regression models for interval-valued data prediction of Billard and Diday [7], these models rely either on the midpoint (ANN-Center) or on the lower and upper bounds of the interval for forecasting, which may not be a good input for predicting the future value of an interval. For example, in the case of RANN-LU, if we predict the future values of the upper bound and lower bound separately, the prediction may not be reliable, since the whole information of the interval is not taken into account [27,28]. Although ANN-Center and ANN-LU consider both upper and lower bounds in the prediction process, the prediction still relies on the midpoint of the interval, indicating a symmetric weight between the lower and upper bounds. To overcome these problems, in this study, we introduce the convex combination (CC) approach to ANN for predicting interval-valued data, say, the lower and upper bounds. Our model is a generalization of the ANN-Center, allowing the weight to be more flexible and not fixed at 0.5 (asymmetric weight).
The novelty of the proposed ANN-CC can be summarized in the following two aspects. First, our method can construct prediction intervals based on the CC approach and ANN models. More specifically, this study proposes the novel CC method for interval ANN modeling. In this approach, the intervals of input and output variables are parametrized through the convex combination of the lower and upper bounds. The proposed ANN-CC is a promising alternative to the existing approaches. With optimal reference points that better represent the intervals, we are able to find an efficient solution that improves the prediction accuracy of the lower and upper bounds. Second, to the best of our knowledge, there is no study extending the CC approach to interval ANN modeling. Our proposed
method fills in such a literature gap and can capture both linear and nonlinear patterns within interval-valued data.
The rest of this paper is organized as follows. Section 2 gives a brief review of interval-valued prediction methods. Section 3 presents the proposed ANN with the convex combination method. In Section 4, we provide a simulation study to assess the performance of our proposed method. Section 5 describes the data used in this study. The analytical results are presented in Section 5. Section 6 provides the conclusion of this study.
2. Reviews of Existing Methods
2.1. Linear Regression Based on Center Method
Let $X^l = (x^l_{t1}, \ldots, x^l_{tk})$ and $X^u = (x^u_{t1}, \ldots, x^u_{tk})$, $t = 1, \ldots, T$, be the lower and upper bounds of the intervals, respectively.

According to Billard and Diday [7], the center or mid-point of the interval-valued explanatory variables, denoted as $X^c$, is calculated from

$$X^c = \frac{X^l + X^u}{2}. \quad (1)$$

Likewise, the interval-valued response variable, denoted as $Y^c$, is calculated from

$$Y^c = \frac{Y^l + Y^u}{2}, \quad (2)$$

where $Y^l = (y^l_1, \ldots, y^l_T)$ and $Y^u = (y^u_1, \ldots, y^u_T)$. Thus, the regression based on the center method can be constructed as

$$Y^c = (X^c)' \beta^c + \varepsilon^c, \quad (3)$$

where $\beta^c = (\beta^c_1, \ldots, \beta^c_k)$ is the vector of parameters and $\varepsilon^c = (\varepsilon^c_1, \ldots, \varepsilon^c_T)$ are normally distributed errors. Using matrix notation, this problem can be estimated by the ordinary least squares (OLS) method under the full rank assumption:

$$\hat{\beta}^c = (X^{c\prime} X^c)^{-1} X^{c\prime} Y^c. \quad (4)$$

Then, the estimates for the response lower and upper bounds are $\hat{Y}^l = X^l \hat{\beta}^c$ and $\hat{Y}^u = X^u \hat{\beta}^c$, respectively.
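As an illustration of Equations (1)–(4), the center method reduces to a single OLS fit on the interval midpoints, with both bound predictions reusing the same coefficient vector. A minimal NumPy sketch (the function name and interface are ours, not from the paper):

```python
import numpy as np

def center_method_fit(X_l, X_u, Y_l, Y_u):
    """Center method (Eqs. (1)-(4)): OLS on interval midpoints,
    then predict each bound with the same coefficient vector."""
    X_c = (X_l + X_u) / 2.0                              # Eq. (1)
    Y_c = (Y_l + Y_u) / 2.0                              # Eq. (2)
    beta_c, *_ = np.linalg.lstsq(X_c, Y_c, rcond=None)   # Eq. (4)
    Y_l_hat = X_l @ beta_c                               # lower-bound prediction
    Y_u_hat = X_u @ beta_c                               # upper-bound prediction
    return beta_c, Y_l_hat, Y_u_hat
```

Note that no intercept is included here, matching the bare form of Equation (3); in practice, a column of ones can be appended to the bound matrices.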
2.2. Linear Regression Based on Center-Range Method
Neto and De Carvalho [8] also introduced the center-range method to predict the upper and lower bounds of the dependent variable intervals. In this method, the lower and upper bounds of the interval-valued response variable are separately predicted by the mid-points and ranges of the interval-valued explanatory variables. Thus, this model is built on two linear regression models, namely the regression based on the center method (Equation (3)) and the regression based on the range method:
$$Y^r = X^{r\prime} \beta^r + \varepsilon^r, \quad (5)$$

where $X^r = (X^u - X^l)/2$ are the half-ranges of the explanatory variables, $Y^r$ are the half-ranges of the response variables, and $\varepsilon^r$ is the error. Using matrix notation, the least-squares estimator of $\beta^r$ is given by

$$\hat{\beta}^r = (X^{r\prime} X^r)^{-1} X^{r\prime} Y^r. \quad (6)$$

Then, we can predict the response lower and upper bounds as $\hat{Y}^l = X^{c\prime} \hat{\beta}^c - X^{r\prime} \hat{\beta}^r$ and $\hat{Y}^u = X^{c\prime} \hat{\beta}^c + X^{r\prime} \hat{\beta}^r$, respectively.
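Equations (5) and (6) add a second OLS fit on the half-ranges, and the bound predictions subtract or add the predicted half-range around the predicted center. A minimal NumPy sketch (function name and interface are ours):

```python
import numpy as np

def center_range_fit(X_l, X_u, Y_l, Y_u):
    """Center-range method (Eqs. (5)-(6)): one OLS fit on midpoints,
    a second on half-ranges; bounds are center -/+ predicted range."""
    X_c, Y_c = (X_l + X_u) / 2.0, (Y_l + Y_u) / 2.0      # midpoints
    X_r, Y_r = (X_u - X_l) / 2.0, (Y_u - Y_l) / 2.0      # half-ranges
    beta_c, *_ = np.linalg.lstsq(X_c, Y_c, rcond=None)
    beta_r, *_ = np.linalg.lstsq(X_r, Y_r, rcond=None)   # Eq. (6)
    Y_l_hat = X_c @ beta_c - X_r @ beta_r                # lower bound
    Y_u_hat = X_c @ beta_c + X_r @ beta_r                # upper bound
    return Y_l_hat, Y_u_hat
```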
Appl. Sci. 2021, 11, 3997 5 of 25
2.3. Linear Regression Based on Convex Combination Method
Chanaim et al. [10] suggested that the center method may lead to the misspecification problem, as the midpoint of the intervals might not be a good reference of the intervals. To tackle this problem, they proposed employing the convex combination approach to determine the best reference point within the ranges of the interval data:
$$x^{cc}_k = w_{1k} x^l_k + (1 - w_{1k}) x^u_k, \quad w_{1k} \in [0, 1], \quad (7)$$

$$Y^{cc} = w_2 Y^l + (1 - w_2) Y^u, \quad w_2 \in [0, 1], \quad (8)$$

where $w = [w_{11}, \ldots, w_{1k}, w_2]$ is the weight parameter of the interval data with values in $[0, 1]$. The advantage of this method lies in the flexibility to assign weights in calculating the appropriate value between intervals. Thus,

$$Y^{cc} = X^{cc\prime} \beta^{cc} + \varepsilon^{cc}, \quad (9)$$

where $X^{cc} = (x^{cc}_{t1}, \ldots, x^{cc}_{tk})$, $t = 1, \ldots, T$. Using matrix notation, this problem is also estimated by the OLS method under the full rank assumption:

$$\hat{\beta}^{cc} = (X^{cc\prime} X^{cc})^{-1} X^{cc\prime} Y^{cc}. \quad (10)$$

The response lower bound prediction is described in Equation (11), and the model to predict the response upper bound is given by Equation (12),

$$\hat{Y}^l = \hat{X}^{cc} \hat{\beta}^{cc} + \hat{Y}^r(\hat{w}_2), \quad (11)$$

$$\hat{Y}^u = \hat{X}^{cc} \hat{\beta}^{cc} + \hat{Y}^r(1 - \hat{w}_2), \quad (12)$$

where

$$\hat{Y}^r = X^{l\prime} \hat{\beta}^{cc} - X^{u\prime} \hat{\beta}^{cc}, \quad (13)$$

$\hat{Y}^r$ is the range prediction, and $\hat{X}^{cc} = \left( \hat{w}_{1j} x^l_j + (1 - \hat{w}_{1j}) x^u_j \right)$, $j = 1, \ldots, k$.
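For the linear CC regression of Equations (7)–(10), the interval weights can be explored by a simple grid search that fits OLS on the implied reference points and keeps the pair with the smallest in-sample squared error. A minimal NumPy sketch with a single shared input weight w1 (the grid resolution and all names are our own illustrative choices):

```python
import numpy as np

def cc_regression(X_l, X_u, Y_l, Y_u, grid=np.linspace(0.0, 1.0, 21)):
    """Linear CC regression (Eqs. (7)-(10)) via grid search over the
    interval weights (w1, w2); returns the best (sse, w1, w2, beta)."""
    best = None
    for w1 in grid:
        X_cc = w1 * X_l + (1.0 - w1) * X_u                        # Eq. (7)
        for w2 in grid:
            Y_cc = w2 * Y_l + (1.0 - w2) * Y_u                    # Eq. (8)
            beta, *_ = np.linalg.lstsq(X_cc, Y_cc, rcond=None)    # Eq. (10)
            sse = float(np.sum((Y_cc - X_cc @ beta) ** 2))
            if best is None or sse < best[0]:
                best = (sse, w1, w2, beta)
    return best
```

Note the caveat that both the target and the regressors change with the candidate weights, so the selected pair minimizes the fit error of its own reference points; the ANN-CC loss in Section 3 has the same structure.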
2.4. Regularized Artificial Neural Network (RANN)
Yang et al. [27] introduced RANN for interval-valued data prediction. This method is able to approximate various forms of nonlinearity in the data and directly models the non-crossing lower and upper bounds of intervals. In this model, the relation between the interval-valued output (Yu and Yl) and the interval-valued inputs (Xu and Xl) is as follows:
$$Y^l = f\!\left( \sum_{j=1}^{J} g\!\left( \sum_{i=1}^{2k} X_i \omega^{I,l}_{ij} + b^{I,l}_j \right) \omega^{o,l}_j + b^{o,l}_j \right), \quad (14)$$

$$Y^u = f\!\left( \sum_{j=1}^{J} g\!\left( \sum_{i=1}^{2k} X_i \omega^{I,u}_{ij} + b^{I,u}_j \right) \omega^{o,u}_j + b^{o,u}_j \right), \quad (15)$$

where $X = (X^u, X^l) \in \mathbb{R}^{2k}$ consists of the $2k$ inputs; $\omega^{I,u}_{ij}, \omega^{I,l}_{ij}$ and $b^{I,u}_j, b^{I,l}_j$ represent the weight parameters and bias terms of the $j$th hidden-layer neuron of the input layer; $\omega^{o,u}_j, \omega^{o,l}_j$ and $b^{o,u}_j, b^{o,l}_j$ represent the weight parameters and bias terms of the $j$th hidden-layer neuron of the output layer; and $f(\cdot)$ and $g(\cdot)$ are the activation functions. To meet the non-crossing requirement on the lower and upper bounds of intervals, a non-crossing regularizer is introduced in the loss function as follows:
$$\mathrm{Loss} = \frac{1}{2T} \sum_{t=1}^{T} \left( \hat{Y}^l_t - Y^l_t \right)^2 + \frac{1}{2T} \sum_{t=1}^{T} \left( \hat{Y}^u_t - Y^u_t \right)^2 + \frac{\lambda}{2T} \sum_{t=1}^{T} \left\{ \max\!\left( 0, \hat{Y}^l_t - \hat{Y}^u_t \right) \right\}^2, \quad (16)$$
where λ > 0 is the regularization parameter for controlling the non-crossing strength.
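The non-crossing loss of Equation (16) penalizes squared errors on both bounds plus any amount by which a predicted lower bound exceeds its predicted upper bound. A minimal NumPy sketch (function and argument names are ours):

```python
import numpy as np

def rann_loss(yl_hat, yu_hat, yl, yu, lam=1.0):
    """Regularized loss of Eq. (16): bound-wise squared errors plus
    a hinge-style penalty on crossed predicted bounds."""
    T = len(yl)
    fit = np.sum((yl_hat - yl) ** 2) + np.sum((yu_hat - yu) ** 2)
    cross = np.sum(np.maximum(0.0, yl_hat - yu_hat) ** 2)  # crossing penalty
    return (fit + lam * cross) / (2.0 * T)
```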
3. The Proposed Method: Artificial Neural Network with Convex Combination (ANN-CC)
Artificial neural network (ANN) models can approximate various forms of nonlinearity in the data. There are various types of ANN, but the most popular one is the Multilayer Perceptron (MLP). ANN models have been successfully applied in a variety of fields such as accounting, economics, finance, and marketing, as well as forecasting [20,26]. In this study, we use a three-layer ANN model in which both the inputs and outputs contain the lower and upper bounds of intervals. As shown in Figure 1, suppose the interval-valued data (X, Y) consists of one predictor and one response. The first layer is the input layer, the middle layer is called the hidden layer, and the last layer is the output layer. All the data are expressed in the form of a lower-upper bound. Thus, we have $\{X^u_1, X^l_1\}$ and $\{Y^u, Y^l\}$. For convenience, we use the notation $X^{cc} = w_1 X^l_1 + (1 - w_1) X^u_1$ for the input variable and $Y^{cc} = w_2 Y^l + (1 - w_2) Y^u$ for the output variable. Note that ANN-Center is the particular case of ANN-CC obtained by setting $w_1 = w_2 = 0.5$. Each layer contains $j$ neurons which are connected to one another and are also connected with all neurons in the immediately following layer. The neuron input path has a signal on its $X^{cc}$, and the strength of the path is characterized by the weight of neuron $j$ ($\omega_j$). The neuron is modeled as summing the path weight times the input signal over all paths and adding the node bias ($b$). Note that the neural network consists of an input function and an output function; thus, we can express these functions as:
$$H^{cc,I}_j = g\!\left( X^{cc\prime} \omega^I_j + b^I_j \right), \quad (17)$$
where $H^{cc,I}_j$ is the $j$th hidden neuron's input, $\omega^I_j$ is the weight vector between the hidden layer and the input layer, $g(\cdot)$ is the activation function for the hidden layer, and $b^I_j$ is the bias term of the input layer. Then, $H^{cc,I}_j$ is transformed into the output $Y^{cc}$ with the activation function of the output layer. Thus, the model can be written as:
$$Y^{cc} = f\!\left( \sum_{j=1}^{J} H^{cc,I}_j \omega^o_j + b^o_j \right), \quad (18)$$
where $\omega^o = \{\omega^o_1, \ldots, \omega^o_J\}$ is the weight vector between the hidden layer and the output layer, $b^o_j$ is the bias term of the output layer, and $f(\cdot)$ is the activation function of the output layer. A challenge in ANN design is the selection of the activation function. It is also known as the transfer function and can be basically divided into four types: the hyperbolic tangent activation function (tanh), the sigmoid or logistic activation function (sigmoid), the linear activation function (linear), and the exponential activation function (exp).
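A minimal sketch of the forward pass in Equations (17) and (18), assuming NumPy; the function name, array shapes, and default activations (tanh hidden layer, linear output) are our own illustrative choices:

```python
import numpy as np

def ann_cc_forward(X_l, X_u, w1, W_I, b_I, W_o, b_o,
                   g=np.tanh, f=lambda z: z):
    """Forward pass of the three-layer ANN-CC: convex-combination
    reference points feed J hidden neurons (Eq. (17)), whose
    activations are combined and passed through f (Eq. (18))."""
    X_cc = w1 * X_l + (1.0 - w1) * X_u      # reference points, shape (T, k)
    H = g(X_cc @ W_I + b_I)                 # Eq. (17), shape (T, J)
    return f(H @ W_o + b_o)                 # Eq. (18), shape (T,)
```

Here W_I has shape (k, J), b_I shape (J,), W_o shape (J,), and b_o is a scalar.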
Learning occurs through the adjustment of the path weights and node biases. The most common method used for this adjustment is backpropagation. In this method, the optimal weights $\omega^I_j$ and $\omega^o_j$ are estimated by minimizing the squared difference between the model output and the observed output. We formulate the loss function as follows:

$$\mathrm{loss} = \frac{1}{T} \sum_{t=1}^{T} \left( \hat{Y}^{cc}_t - Y^{cc}_t \right)^2, \quad (19)$$

where $\hat{Y}^{cc}$ and $Y^{cc}$ are the estimated output and observed output, respectively. In addition to the weights of neuron $j$ ($\omega_j$), our estimation also considers the weight parameter of the interval data ($w$) to determine the reference of the output and input variables.
Figure 1. The network architecture of the interval-valued data artificial neural network with convex combination (ANN-CC) model (one hidden layer is assumed).
As the interval data is manipulated as a new type of number represented by an ordered pair of its minimum and maximum values, the numerical manipulations of interval data should follow "interval calculus" [25]. In practice, we can separately predict the lower and upper bounds of the interval using the min-max method. However, this method does not guarantee the mathematical coherence of the predicted bounds; that is, the predicted lower bounds of intervals should be smaller than the predicted upper ones. Otherwise, the unnatural interval crossing problem will occur, which leads to an invalid interval prediction [27]. Furthermore, the forecasting performance can be impaired if there is not a clear dependency between the respective bounds of output and input [28,29]. Instead of restricting the weight of interval-valued data ($w = (w_1, w_2)$) to be 0.5 (mid-point), in this study, we consider the convex combination method to get the reference point of the interval-valued data. Thus, in this estimation, the weights of neuron $j$ ($\omega_j$) depend on the given weight $w$. Since there is no closed-form solution for this weight parameter of the interval data, we employ a grid search to select the $w$ that minimizes the sum of squared errors, denoted as loss($w$). Then, we can rewrite our loss function in Equation (19) as
$$\mathrm{loss}(w) = \frac{1}{T} \sum_{t=1}^{T} \left( Y^{cc}_t - \hat{Y}^{cc}_t(w) \right)^2. \quad (20)$$
This loss function is computed using two steps of estimation. Firstly, it is important to solve a nonlinear optimization to obtain $\omega_j$, which depends on the candidate $w_i$. In the second step, the loss function is then minimized with respect to the candidate $w_i$. Then, we introduce another candidate $w$ to repeat the first step. After loss($w$) is computed for all candidates $w_i$, the minimum value of loss($w$) is preferred. Thus, the optimal $w = (w_1, w_2)$ is obtained by
$$\hat{w} = \operatorname*{argmin}_{w} \, \mathrm{loss}(w). \quad (21)$$
We note that the ANN is estimated over a grid search between 0 and 1. Finally, following the CC method in Section 2.3, we can predict the lower and upper bounds as follows.
$$\hat{Y}^l = f\!\left( \sum_{j=1}^{J} H^{cc,I}_j(\hat{w}_1) \, \hat{\omega}^o_j + \hat{b}^o_j \right) - \hat{Y}^r(\hat{w}_2), \quad (22)$$
$$\hat{Y}^u = f\!\left( \sum_{j=1}^{J} H^{cc,I}_j(\hat{w}_1) \, \hat{\omega}^o_j + \hat{b}^o_j \right) + \hat{Y}^r(1 - \hat{w}_2), \quad (23)$$
where $H^{cc,I}_j(\hat{w}_1) = g\!\left( X^{cc}(\hat{w}_1)' \hat{\omega}^I_j + \hat{b}^I_j \right)$ is the $j$th hidden neuron's input given $X^{cc}(\hat{w}_1)$, and the range prediction is computed by
$$\hat{Y}^r = f\!\left( \sum_{j=1}^{J} H^{u,I}_j \hat{\omega}^o_j + \hat{b}^o_j \right) - f\!\left( \sum_{j=1}^{J} H^{l,I}_j \hat{\omega}^o_j + \hat{b}^o_j \right), \quad (24)$$
where $H^{l,I}_j = g\!\left( X^{l\prime} \hat{\omega}^I_j + \hat{b}^I_j \right)$ and $H^{u,I}_j = g\!\left( X^{u\prime} \hat{\omega}^I_j + \hat{b}^I_j \right)$. With this prediction, the predicted lower bounds of the intervals should not cross over the corresponding upper bounds. We also note that $\hat{\omega}^I_j$ and $\hat{\omega}^o_j$ contain the information of the weight parameters of the interval data $X$ and $Y$.

In our work, the ANN-CC methodology includes a grid search for the parameter $\hat{w} = \{\hat{w}_1, \hat{w}_2\}$. Note that the grid search is executed before estimating the weight parameters $\hat{\omega}^o_j$ and $\hat{\omega}^I_j$ in the ANN structure. A pseudo-code for the grid search and ANN-CC estimation is presented in Algorithm 1.
Algorithm 1. Pseudo-code for the proposed ANN-CC with one predictor and one response.

Require: w = {w1, w2}, where w1 and w2 are the sets of candidate weights of the input {X1^u, X1^l} and the output {Y^u, Y^l}, respectively, within [0, 1].

# Search for the optimal w-hat
for each (w_i1, w_i2) in w1 x w2 = [0.001, 0.002, ..., 1]:
    Calculate Y_i^cc = w_i2 Y^l + (1 - w_i2) Y^u and X_i^cc = w_i1 X1^l + (1 - w_i1) X1^u
    # Define the loss function of the ANN structure
    omega_i^o = {omega_i1^o, ..., omega_iJ^o}, omega_i^I = {omega_i1^I, ..., omega_iJ^I} = Parameters()
    H_ij^cc,I = g(X_i^cc' omega_ij^I + b_ij^I)
    Yhat_i^cc = f(sum_j H_ij^cc,I omega_ij^o + b_ij^o)
    loss_i(w_i) = ||Y_i^cc - Yhat_i^cc||^2
    # Follow the gradients until convergence
    omega_i = (omega_i^o, omega_i^I)
    repeat
        omega_i = omega_i - rho * grad_omega(loss_i)
    until convergence
end for
# Choose the w = {w1, w2} with the lowest loss_i as the optimal w-hat = {w1-hat, w2-hat}
w-hat = argmin_w loss(w)
Calculate Yhat^cc = w2-hat Y^l + (1 - w2-hat) Y^u and Xhat^cc = w1-hat X1^l + (1 - w1-hat) X1^u
# Compute the loss function of the ANN structure using Yhat^cc and Xhat^cc
Loss*(w-hat) = ||Y^cc - Yhat^cc||^2
# Follow the gradients until convergence
omega = (omega^o, omega^I)
repeat
    omega = omega - rho * grad_omega(Loss*)
until convergence
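Under stated assumptions, the grid search of Algorithm 1 can be sketched in Python. The inner estimation here is a plain gradient-descent fit of a one-hidden-layer tanh network with a linear output; the hidden size J, learning rate, epoch count, and all function names are our own illustrative choices, not values from the paper:

```python
import numpy as np

def train_ann(x, y, J=3, lr=0.05, epochs=2000, seed=0):
    """Inner step of Algorithm 1 (sketch): fit a one-hidden-layer tanh
    network to scalar reference points by plain gradient descent and
    return the final mean squared error."""
    rng = np.random.default_rng(seed)
    W_I = rng.normal(scale=0.5, size=(1, J)); b_I = np.zeros(J)
    W_o = rng.normal(scale=0.5, size=J);      b_o = 0.0
    X = x.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W_I + b_I)             # Eq. (17)
        y_hat = H @ W_o + b_o                  # Eq. (18), linear f
        err = y_hat - y
        gW_o = H.T @ err / len(y); gb_o = err.mean()
        dH = np.outer(err, W_o) * (1 - H ** 2)  # backprop through tanh
        gW_I = X.T @ dH / len(y); gb_I = dH.mean(axis=0)
        W_o -= lr * gW_o; b_o -= lr * gb_o
        W_I -= lr * gW_I; b_I -= lr * gb_I
    return np.mean((y_hat - y) ** 2)

def ann_cc_grid(X_l, X_u, Y_l, Y_u, grid):
    """Outer loop of Algorithm 1 (sketch): pick the (w1, w2) whose
    reference points give the smallest trained-network loss."""
    losses = {}
    for w1 in grid:
        for w2 in grid:
            x = w1 * X_l + (1.0 - w1) * X_u
            y = w2 * Y_l + (1.0 - w2) * Y_u
            losses[(w1, w2)] = train_ann(x, y)
    return min(losses, key=losses.get)
```

A full implementation would use the fine grid [0.001, ..., 1] of Algorithm 1 and a proper convergence check rather than a fixed epoch count.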
4. Simulation Study
To examine the performance of our proposed method, we conducted a simulation study. We considered two data generation processes which are different in structure: linear and nonlinear.
4.1. Linear Structure
A simple interval generation process is conducted, and one independent variable is assumed. We considered three typical data generation processes with different weight parameters. For this purpose, we considered the following three scenarios of weight in the intervals:
Scenario 1. Center of the interval data: $w_1 = 0.5$, $w_2 = 0.5$,

$$\left[ (0.5)Y^l + (1 - 0.5)Y^u \right] = 1 + 5\left[ (0.5)X^l + (1 - 0.5)X^u \right] + \varepsilon. \quad (25)$$

Scenario 2. Deviation from the center toward the lower bound of the interval data: $w_1 = 0.2$, $w_2 = 0.2$,

$$\left[ (0.2)Y^l + (1 - 0.2)Y^u \right] = 1 + 5\left[ (0.2)X^l + (1 - 0.2)X^u \right] + \varepsilon. \quad (26)$$

Scenario 3. Deviation from the center toward the upper bound of the interval data: $w_1 = 0.8$, $w_2 = 0.8$,

$$\left[ (0.8)Y^l + (1 - 0.8)Y^u \right] = 1 + 5\left[ (0.8)X^l + (1 - 0.8)X^u \right] + \varepsilon. \quad (27)$$
For each scenario, we performed 100 replications. In each simulation, we proceeded as follows.
(1) Generate the error from the normal distribution with mean zero and variance one.
(2) Generate the upper bound of the independent variable, $X^u$, from the uniform distribution $U(1, 3)$. Then, compute the lower bound of the independent variable as $X^l = X^u - r_x$, where $r_x \sim U(0, 2)$ denotes the range between the upper and lower bounds.
(3) Compute the expected independent variable $X^{cc}$ of the intervals as $X^{cc} = w_1 X^u + (1 - w_1) X^l$. Then, generate the expected dependent variable $Y^{cc} = X^{cc}\beta + \varepsilon$.
(4) Finally, derive the upper and lower bounds of the intervals by $Y^l = Y^{cc} - r_y$, where $r_y \sim U(0, 2)$, and $Y^u = \left( Y^{cc} - (1 - w_2)Y^l \right) / w_2$. We note that $r_x$ and $r_y$ are random numbers for simulating the intervals of $X$ and $Y$, respectively. This guarantees that the bounds do not cross each other.
In this simulation study, we performed 100 replications with a sample size n = 1000 for all three scenarios. Each simulation dataset was randomly split, with 80% for training and 20% for testing. Our ANN with the convex combination method (ANN-CC) was then compared with two conventional models, namely the ANN-Center and the RANN-LU method of Yang et al. [27]. In this simulation study, four transfer functions, namely the hyperbolic tangent activation function (tanh), the sigmoid or logistic activation function (sigmoid), the linear activation function (linear), and the exponential activation function (exp), were considered. To simplify the comparison, one input layer, one output layer, one hidden layer, and one hidden neuron were assumed. We note that the ANN-Center could be estimated by the validann package in the R programming language [30]. In addition, this package also provides validation methods for the replicative, predictive, and structural validation of artificial neural network models.
To assess the performance of these models, we used the following measures: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The formulae of MAE, MSE, and RMSE are as follows:

$$\mathrm{MAE} = \frac{ \sum_{i=1}^{N} \left| \hat{Y}^u_i - Y^u_i \right| / N + \sum_{i=1}^{N} \left| \hat{Y}^l_i - Y^l_i \right| / N }{2}, \quad (28)$$

$$\mathrm{MSE} = \frac{ \sum_{i=1}^{N} \left( \hat{Y}^u_i - Y^u_i \right)^2 / N + \sum_{i=1}^{N} \left( \hat{Y}^l_i - Y^l_i \right)^2 / N }{2}, \quad (29)$$

$$\mathrm{RMSE} = \frac{ \sqrt{ \sum_{i=1}^{N} \left( \hat{Y}^u_i - Y^u_i \right)^2 / N } + \sqrt{ \sum_{i=1}^{N} \left( \hat{Y}^l_i - Y^l_i \right)^2 / N } }{2}. \quad (30)$$
We repeated the simulation 100 times and obtained the simulation data with 100 samples. An example of each of the simulated interval-valued time series is presented in Figure 2. In this figure, each square plot represents the relationship between the interval X and Y.
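Equations (28)–(30) average each error measure over the upper-bound and lower-bound predictions; a minimal NumPy sketch (function and argument names are ours):

```python
import numpy as np

def interval_metrics(yl_hat, yu_hat, yl, yu):
    """Interval accuracy measures of Eqs. (28)-(30): each metric is the
    average of its upper-bound and lower-bound counterparts."""
    mae = (np.mean(np.abs(yu_hat - yu)) + np.mean(np.abs(yl_hat - yl))) / 2
    mse = (np.mean((yu_hat - yu) ** 2) + np.mean((yl_hat - yl) ** 2)) / 2
    rmse = (np.sqrt(np.mean((yu_hat - yu) ** 2))
            + np.sqrt(np.mean((yl_hat - yl) ** 2))) / 2
    return mae, mse, rmse
```

Note that RMSE here averages the two per-bound root errors rather than taking the root of the averaged MSE, exactly as Equation (30) is written.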
(a) Scenario 1; (b) Scenario 2; (c) Scenario 3
Figure 2. Interval-valued data plot for the linear case. The red dots indicate the mid-point valuewithin the interval.
Table 1 presents the results of 100 repetitions for the linear structure case. The MAE,MSE, and RMSE are reported. We observed that the ANN model with the CC method
(ANN-CC) showed powerful nonlinear approximation ability (tanh, sigmoid, exp), as its MAE, MSE, and RMSE values were lower than those of the ANN-Center and RANN-LU in Scenarios 2 and 3. The tanh function provided the best fit for the ANN-CC model on these simulated datasets. Not surprisingly, our ANN-CC did not outperform the ANN-Center method under Scenario 1, because the interval data in that scenario were simulated from the midpoint; even so, ANN-CC still performed better than RANN-LU. In sum, the evaluation leads to the same conclusion for Scenarios 2 and 3: the proposed model performed well in the simulation study, and the ANN-CC method showed high performance across the scenarios.
Table 1. Experimental results for the linear case. Each row reports MAE, MSE, and RMSE (standard deviations in parentheses) for ANN-CC | ANN-Center | RANN-LU.

Scenario 1
tanh: 2.4011 (0.1548), 10.8590 (1.3251), 3.2951 (1.2434) | 2.3154 (0.1215), 9.4115 (1.1154), 3.0681 (1.0859) | 3.1244 (1.1245), 14.2251 (2.3414), 3.7729 (1.5521)
sigmoid: 2.5870 (0.1211), 10.9894 (1.3511), 3.3154 (1.2433) | 2.4023 (0.1011), 9.5558 (1.1584), 3.0917 (1.0756) | 3.1148 (1.3584), 12.5441 (3.6974), 3.5423 (1.2532)
linear: 2.7244 (0.1513), 12.0524 (1.1125), 3.4723 (1.3234) | 2.4488 (0.1254), 10.3554 (1.0015), 3.2184 (0028) | 3.5415 (1.2121), 14.3554 (1.5487), 3.7890 (1.1112)
exp: 2.5980 (0.1148), 11.3486 (1.981), 3.3693 (1.2113) | 2.4223 (0.1011), 10.0215 (1.1057), 3.1661 (1.0723) | 3.1057 (1.2554), 12.5015 (3.4548), 3.5361 (0.9723)

Scenario 2
tanh: 2.1350 (0.1254), 7.0789 (1.3258), 2.6610 (1.1123) | 3.1332 (2.1254), 9.0173 (2.4848), 3.0021 (1.4434) | 3.5445 (1.3145), 15.1541 (2.6879), 3.8935 (1.2329)
sigmoid: 2.3328 (0.1158), 8.6030 (1.2217), 2.9337 (1.0283) | 4.2214 (3.3141), 10.6030 (1.6278), 3.2565 (1.3233) | 2.5847 (1.2597), 9.6984 (3.3354), 3.1154 (1.2810)
linear: 2.7825 (0.1698), 8.9356 (1.1369), 2.9894 (1.0022) | 4.3112 (2.6545), 10.9356 (2.3679), 3.3078 (1.2232) | 4.1125 (1.3541), 14.0778 (1.1548), 3.7533 (1.0012)
exp: 2.4625 (0.1354), 8.5072 (1.2589), 2.9173 (0.9233) | 3.5845 (2.3651), 10.4072 (2.2589), 3.2269 (1.1129) | 3.4797 (1.5479), 10.3155 (1.1554), 3.2129 (0.9928)

Scenario 3
tanh: 2.1049 (0.1112), 6.9441 (1.5159), 2.6352 (0.6333) | 3.4488 (2.1254), 9.9410 (4.3549), 3.1539 (1.6727) | 2.4141 (1.0413), 8.5454 (2.5444), 2.9245 (1.5529)
sigmoid: 2.1249 (0.1874), 7.0088 (1.6511), 2.6468 (0.7843) | 3.9784 (2.3743), 15.0113 (5.4035), 3.8730 (1.2270) | 2.5454 (1.1115), 10.5544 (2.1125), 3.2456 (1.2332)
linear: 2.8524 (0.1369), 12.9612 (1.3594), 3.6013 (1.0091) | 3.8411 (2.6588), 14.9023 (5.8941), 3.8607 (1.3410) | 2.9445 (0.8797), 13.1154 (2.1112), 3.6210 (1.2221)
exp: 2.4489 (0.1364), 10.4617 (1.5114), 3.2344 (0.8833) | 4.8778 (4.2643), 16.4107 (5.4113), 4.0511 (2.0013) | 3.0124 (1.0694), 11.3547 (1.9967), 3.2683 (1.1009)

Note: Parentheses denote standard deviations. Bold numbers indicate the lowest values of mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
4.2. Nonlinear Structure
Similar to the linear structure, we considered three typical data generation processes with different weight parameters. The three scenarios of interval weights were as follows:
Scenario 1: Center of the interval data: $w_1 = 0.5$, $w_2 = 0.5$:

$$\left[(0.5)Y^{l} + (1-0.5)Y^{u}\right] = 1 + 3e^{\left[(0.5)X^{l} + (1-0.5)X^{u}\right]^{2}} + \varepsilon. \qquad (31)$$

Scenario 2: Deviation from the center toward the lower bound of the interval: $w_1 = 0.2$, $w_2 = 0.2$:

$$\left[(0.2)Y^{l} + (1-0.2)Y^{u}\right] = 1 + 3e^{\left[(0.2)X^{l} + (1-0.2)X^{u}\right]^{2}} + \varepsilon. \qquad (32)$$

Scenario 3: Deviation from the center toward the upper bound of the interval: $w_1 = 0.8$, $w_2 = 0.8$:

$$\left[(0.8)Y^{l} + (1-0.8)Y^{u}\right] = 1 + 3e^{\left[(0.8)X^{l} + (1-0.8)X^{u}\right]^{2}} + \varepsilon. \qquad (33)$$
For each scenario, we performed 100 replications. These data were more complicated than those in the linear structure case, as they exhibit a nonlinear relationship between the dependent and independent variables, as shown in Figure 3. The results are summarized in Table 2.
Table 2 presents the simulation results for the nonlinear case. Similar results were obtained: the proposed ANN-CC showed higher performance in Scenarios 2 and 3, while the ANN-Center again performed poorly in those scenarios. The reason is simple: the ANN-Center fixes the weight parameter at the center of the interval, which does not correspond to the true data generating process, and this mismatch leads to a higher prediction bias.
The experiments were carried out on an Intel Core i5-6400 CPU (2.7 GHz, 4 cores) with 16 GB of RAM. The computational cost of the ANN-CC model was slightly higher than that of the ANN-Center and RANN-LU models, with training being the only time-consuming step for all models. The longer computation time of the proposed model arises because the additional weights of the interval-valued data are estimated simultaneously during the optimization.
(a) Scenario 1; (b) Scenario 2; (c) Scenario 3
Figure 3. Interval-valued data plot for the nonlinear case. The red dots indicate the mid-point valuewithin the interval.
Table 2. Experimental results for the nonlinear case. Each row reports MAE, MSE, and RMSE (standard deviations in parentheses) for ANN-CC | ANN-Center | RANN-LU.

Scenario 1
tanh: 3.6341 (0.3015), 14.8140 (2.3840), 3.8491 (1.0023) | 3.1258 (0.2474), 9.8741 (2.0126), 3.1438 (0.9343) | 3.9874 (1.4126), 14.9874 (3.2158), 3.8715 (0.9323)
sigmoid: 3.4558 (0.3114), 10.9894 (5.0114), 3.3155 (0.9833) | 3.2154 (0.2099), 9.9741 (2.3654), 3.1582 (0.8834) | 4.9845 (1.9136), 15.3898 (5.1145), 3.9245 (0.9823)
linear: 5.3155 (1.4259), 17.1148 (5.6584), 4.1377 (1.0824) | 5.2145 (1.3978), 16.4213 (3.3665), 4.0523 (1.1112) | 5.4136 (2.0113), 17.8854 (5.7897), 4.2295 (1.2334)
exp: 4.6211 (0.8797), 16.3123 (2.8557), 4.0391 (1.0067) | 3.1158 (0.9788), 14.1314 (2.6547), 3.7599 (0.9823) | 4.8654 (1.3541), 17.0198 (3.9844), 4.1256 (1.2220)

Scenario 2
tanh: 3.2250 (0.8453), 9.1588 (3.6941), 3.0372 (0.8832) | 3.9788 (2.5481), 10.5448 (4.6641), 3.2475 (1.1234) | 3.7888 (1.1255), 9.8368 (3.6879), 3.1367
sigmoid: 4.0581 (1.3511), 13.5154 (3.6444), 3.6765 (0.8734) | 5.3698 (2.4145), 20.3688 (4.3658), 4.5139 (2.4449) | 4.3781 (1.8746), 16.1556 (5.1034), 4.0197
linear: 4.8744 (2.5584), 17.9981 (4.3398), 4.2428 (0.9734) | 6.1142 (3.9451), 27.6548 (10.4891), 5.2597 (2.1240) | 5.6684 (2.9876), 22.8314 (6.3598), 4.7783
exp: 4.1158 (0.7446), 15.4458 (3.3658), 3.9311 (0.9872) | 5.4101 (2.3155), 20.0123 (4.4115), 4.4744 (2.1098) | 4.2666 (1.3659), 14.9155 (1.4231), 3.8628

Scenario 3
tanh: 3.4589 (0.7894), 9.3158 (4.1158), 3.0544 (0.9239) | 5.8115 (3.0125), 21.1930 (10.3661), 4.6033 (1.2323) | 3.5012 (1.2458), 9.4125 (4.2320), 3.0681 (1.9383)
sigmoid: 4.1585 (1.3841), 14.3651 (3.9887), 3.7901 (1.1112) | 5.9884 (2.3688), 20.0113 (10.4035), 4.4730 (1.2223) | 4.3685 (1.5645), 14.8554 (4.3598), 3.8544 (1.2234)
linear: 4.8664 (2.1355), 16.6557 (6.0123), 4.0814 (1.0389) | 5.9424 (2.6871), 25.1253 (10.3211), 5.0131 (1.4980) | 4.9785 (1.8994), 16.9974 (5.3145), 4.1229 (1.4409)
exp: 4.2556 (1.2154), 14.4456 (4.1106), 3.8009 (0.9227) | 6.6698 (5.1155), 31.4107 (12.9987), 5.6044 (1.3409) | 4.8664 (2.0115), 15.5024 (2.1258), 3.9377 (1.2284)

Note: One hidden-layer node is assumed for the ANN in this simulation study. Parentheses denote standard deviations. Bold numbers indicate the lowest MAE, MSE, and RMSE.
5. Application to Real Data

5.1. Capital Asset Pricing Model: Thai Stocks
The performance of the ANN-CC model was assessed using interval-valued returns (maximum and minimum returns) from the Stock Exchange of Thailand. In this study, we compared the forecasting accuracy of several ANN models in the context of the capital asset pricing model (CAPM). Our analysis employed the lower and upper bounds of real daily stock returns, including SET50, PTT, SCC, and CPALL, over the period from 26 October 2011 to 7 May 2019. The three companies were selected because they are considered well-performing in terms of their share prices; moreover, they experienced large trade volumes in the current decade and are regarded as fast-growing and highly volatile stocks in the Thai market. All data were collected from Thomson Reuters DataStream. The criteria used to choose the best-fit model were MAE, MSE, and RMSE.
The CAPM was proposed in separate studies by Sharpe [31] and Lintner [32] to measure the risk of an individual stock against the market in terms of the beta risk βi. The βi measures a stock's risk (volatility of returns) by the fluctuation of its price changes relative to the overall market; in other words, it is the stock's sensitivity to market risk. The model can be written as
$$r_{it} - r_{ft} = \beta_0 + \beta_i\left(r_{mt} - r_{ft}\right) + \varepsilon_{it}, \qquad (34)$$

where $r_{it}$ is the return of stock i, $r_{mt}$ is the return of the market, $r_{ft}$ is the risk-free rate, and $\varepsilon_{it}$ is the error term at time t. If $\beta_i > 1$, the stock is called an aggressive stock (high risk); otherwise, it is a defensive stock (low risk) in terms of the excess stock return. In this study, we preserve the interval format of stock i as

$$\left[r_{it}^{l},\, r_{it}^{u}\right] = \left[\frac{P_{it}^{l} - PA_{it-1}}{PA_{it-1}},\ \frac{P_{it}^{u} - PA_{it-1}}{PA_{it-1}}\right], \qquad (35)$$

where $P_{it}^{u}$, $P_{it}^{l}$, and $PA_{it}$ are the maximum, minimum, and average prices of the individual stock i at time t, respectively. Note that the risk-free rates $r_{ft}^{l}$ and $r_{ft}^{u}$ are assumed to be zero in this empirical study.
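For intuition, the beta risk of Equation (34) can be recovered by ordinary least squares on single-valued excess returns. This is the classical CAPM slope, not the paper's interval ANN estimator; the function name and inputs are illustrative.

```python
import numpy as np

def capm_beta(stock_excess, market_excess):
    """Ordinary least-squares estimates of Equation (34): regress the
    stock's excess return on the market's excess return; the slope is
    the beta risk and the intercept is beta_0."""
    x = np.asarray(market_excess, float)
    y = np.asarray(stock_excess, float)
    beta_i = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    beta_0 = y.mean() - beta_i * x.mean()
    return beta_0, beta_i
```

The interval models below replace these single-valued returns with convex combinations of the lower and upper return bounds.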
Again, we preserve the interval format of the market return as

$$\left[r_{mt}^{l},\, r_{mt}^{u}\right] = \left[\frac{P_{mt}^{l} - PA_{mt-1}}{PA_{mt-1}},\ \frac{P_{mt}^{u} - PA_{mt-1}}{PA_{mt-1}}\right], \qquad (36)$$

where $P_{mt}^{u}$, $P_{mt}^{l}$, and $PA_{mt}$ are the maximum, minimum, and average prices of the market at time t, respectively.

We constructed a deep neural network using stock returns from the Stock Exchange
of Thailand (SET), the major stock market in Thailand. We selected three major stocks with high market capitalization at the beginning of the sample period. We collected the SET50 (a proxy of the market) and three company stocks, PTT, SCC, and CPALL, over the period from 4 January 2012 to 30 December 2019. The interval-valued data were constructed from the daily range of the selected price indexes, i.e., the lowest and highest trading values of the day, to capture the market movement for that day. The interval-valued predictions for these three stocks were then made based on the models developed in Section 3. We note that all interval price series were transformed into interval returns following Equations (35) and (36). The descriptive statistics, namely the mean, standard deviation, minimum, and maximum of the variables for the full sample, are summarized in Table 3. We observe that the returns exhibited negative skewness for the lower-bound returns and positive skewness for the upper-bound returns. In addition, the skewness values on the two sides were asymmetric, indicating that gains and losses in the Thai stock market behaved quite differently. For illustration, the relationship between the stock market and each stock is shown in Figure 4.
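The interval-return transformation of Equations (35) and (36) can be sketched as follows; the price series and function names are illustrative.

```python
import numpy as np

def interval_returns(p_low, p_high, p_avg):
    """Interval-valued returns of Equation (35): the day's low and high
    prices relative to the previous day's average price give the lower
    and upper return bounds."""
    p_low = np.asarray(p_low, float)
    p_high = np.asarray(p_high, float)
    p_avg = np.asarray(p_avg, float)
    prev_avg = p_avg[:-1]                     # PA_{t-1}
    r_l = (p_low[1:] - prev_avg) / prev_avg   # lower bound of the return
    r_u = (p_high[1:] - prev_avg) / prev_avg  # upper bound of the return
    return r_l, r_u
```

The same transformation applied to the SET50 low, high, and average prices yields the market interval return of Equation (36).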
Table 3. Data description. Columns: SET_u, SET_l, PTT_u, PTT_l, SCC_u, SCC_l, CPALL_u, CPALL_l.

Mean: 0.012, −0.011, 0.019, −0.018, 0.018, −0.017, 0.021, −0.019
Median: 0.010, −0.009, 0.016, −0.015, 0.016, −0.015, 0.017, −0.016
Maximum: 0.097, 0.025, 0.136, 0.033, 0.113, 0.027, 0.185, 0.032
Minimum: −0.019, −0.106, −0.027, −0.128, −0.016, −0.105, −0.058, −0.172
Std. Dev.: 0.010, 0.010, 0.016, 0.015, 0.014, 0.013, 0.017, 0.016
Skewness: 1.894, −1.812, 1.576, −1.564, 1.653, −1.475, 2.038, −2.280
Kurtosis: 12.247, 11.383, 7.981, 8.775, 8.337, 7.522, 12.824, 15.743
Jarque–Bera: 7659.845, 6398.846, 2665.961, 3308.472, 3022.738, 2235.950, 8677.507, 14,052.600
MBF Jarque–Bera: 0.000 for all series
Observations: 1841 for all series
Unit root test: −2.230, −3.454, −2.125, −2.125, −2.661, −2.385, −2.307, −3.255
MBF-unit root: 0.083, 0.003, 0.105, 0.105, 0.029, 0.058, 0.070, 0.005

Note: _u and _l denote the upper and lower bounds, respectively.
(a) PTT; (b) SCC; (c) CPALL
Figure 4. Interval-valued data plots for Stock Exchange of Thailand (SET) and stocks. The red dot presents the midpointdata plot for SET and stocks.
Moreover, a unit root test was conducted to examine whether each time series was nonstationary, i.e., possessed a unit root. In this study, we used the minimum Bayes factor (MBF) as the testing tool. The MBF has a significant advantage over the p-value because the likelihood of the observed data can be expressed under each hypothesis [33]. MBF values in the ranges 1 to 1/3, 1/3 to 1/10, 1/10 to 1/30, 1/30 to 1/100, 1/100 to 1/300, and below 1/300 indicate, respectively, weak, moderate, substantial, strong, very strong, and decisive evidence against the null hypothesis. According to the results, all data series were decisively stationary, as shown by the low MBF values [33].
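The MBF values in Table 3 are consistent with Goodman's minimum Bayes factor for a z-type statistic, MBF = exp(−z²/2). This formula is assumed here, since the excerpt cites [33] without printing it; it reproduces the "MBF-unit root" row of Table 3 from the unit root statistics.

```python
import math

def minimum_bayes_factor(z):
    """Minimum Bayes factor for a z-type test statistic:
    MBF = exp(-z^2 / 2). (Goodman's formula; assumed here, as the
    paper cites [33] without printing it.)"""
    return math.exp(-z * z / 2.0)

# Unit root statistics reported in Table 3:
stats = [-2.230, -3.454, -2.125, -2.125, -2.661, -2.385, -2.307, -3.255]
mbfs = [round(minimum_bayes_factor(z), 3) for z in stats]
```

Rounded to three decimals, these values match the "MBF-unit root" row of Table 3.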
5.2. Comparison Results
This section presents the results of the artificial neural network models for interval-valued data from Thailand's stock market. In this empirical example, we considered one to three hidden neurons combined with the four activation functions; again, one input layer, one output layer, and one hidden layer were assumed. Thus, twelve ANN specifications were used to describe and forecast the excess returns of PTT, SCC, and CPALL under the CAPM framework. To evaluate the forecasting performance of the twelve specifications, each dataset was split into 80% for training and 20% for testing, so both in-sample and out-of-sample forecasts were conducted in this comparison. The performance of the interval-valued ANN-CC forecasting models under CAPM was evaluated through MAE, MSE, and RMSE.
Table 4 shows the in-sample and out-of-sample estimation results for the different ANN-CC specifications. Focusing on the MAE, MSE, and RMSE, the interval-valued prediction results were somewhat sensitive to the activation function. Comparing the activation functions, we found that the exponential activation enabled more accurate forecasts in most cases, as its MAE, MSE, and RMSE were lower than those of the other activations. Comparing the numbers of hidden nodes, we observed that a larger number of hidden nodes lowered the MAE, MSE, and RMSE in some cases. Finally, we considered the weights of the interval-valued data calculated by the convex combination method. Most of the weight parameters for the interval excess stock return and the interval excess market return were not equal to 0.5. Therefore, the center-method assumption in the neural network forecasting of Maia et al. [25] may be inappropriate for practical problems. This result confirms the reliability of the convex combination method in ANN-CC forecasting.
Table 4. Experimental results of ANN-CC. Each row reports the weight parameters w1 and w2, followed by MAE, MSE, and RMSE, each as out-of-sample / in-sample.

One hidden node

PTT
tanh: w1 = 0.648521, w2 = 0.999000 | MAE 0.014094 / 0.013565 | MSE 0.000315 / 0.000298 | RMSE 0.017751 / 0.017265
sigmoid: w1 = 0.608577, w2 = 0.962942 | MAE 0.011501 / 0.011931 | MSE 0.000249 / 0.000237 | RMSE 0.015792 / 0.015362
linear: w1 = 0.611678, w2 = 0.999000 | MAE 0.013707 / 0.013696 | MSE 0.000303 / 0.000301 | RMSE 0.017401 / 0.017378
exp: w1 = 0.607746, w2 = 0.999000 | MAE 0.010572 / 0.010023 | MSE 0.000221 / 0.000209 | RMSE 0.014862 / 0.014465

SCC
tanh: w1 = 0.510112, w2 = 0.927919 | MAE 0.011382 / 0.011377 | MSE 0.000218 / 0.000211 | RMSE 0.014768 / 0.014533
sigmoid: w1 = 0.513121, w2 = 0.910014 | MAE 0.010506 / 0.011060 | MSE 0.000190 / 0.000160 | RMSE 0.013781 / 0.012658
linear: w1 = 0.519982, w2 = 0.920239 | MAE 0.011793 / 0.011423 | MSE 0.000235 / 0.000232 | RMSE 0.015343 / 0.015238
exp: w1 = 0.509987, w2 = 0.930019 | MAE 0.010734 / 0.011065 | MSE 0.000193 / 0.000168 | RMSE 0.013897 / 0.012970

CPALL
tanh: w1 = 0.412312, w2 = 0.481917 | MAE 0.013848 / 0.013835 | MSE 0.000391 / 0.000384 | RMSE 0.019765 / 0.019588
sigmoid: w1 = 0.461660, w2 = 0.500013 | MAE 0.012627 / 0.012231 | MSE 0.000235 / 0.000191 | RMSE 0.015343 / 0.013832
linear: w1 = 0.386464, w2 = 0.342022 | MAE 0.013619 / 0.012975 | MSE 0.000353 / 0.000277 | RMSE 0.018756 / 0.016650
exp: w1 = 0.461668, w2 = 0.500101 | MAE 0.013608 / 0.012891 | MSE 0.000342 / 0.000269 | RMSE 0.018489 / 0.016413

Two hidden nodes

PTT
tanh: w1 = 0.398420, w2 = 0.398420 | MAE 0.014168 / 0.013570 | MSE 0.000328 / 0.000294 | RMSE 0.018121 / 0.017151
sigmoid: w1 = 0.409278, w2 = 0.409278 | MAE 0.010203 / 0.009748 | MSE 0.000167 / 0.000157 | RMSE 0.012934 / 0.012535
linear: w1 = 0.421373, w2 = 0.421373 | MAE 0.013431 / 0.013755 | MSE 0.000315 / 0.000298 | RMSE 0.017763 / 0.017269
exp: w1 = 0.389728, w2 = 0.389728 | MAE 0.009030 / 0.009085 | MSE 0.000151 / 0.000145 | RMSE 0.012275 / 0.012040

SCC
tanh: w1 = 0.358503, w2 = 0.358503 | MAE 0.012146 / 0.011344 | MSE 0.000242 / 0.000201 | RMSE 0.0155543 / 0.014181
sigmoid: w1 = 0.387659, w2 = 0.387659 | MAE 0.010360 / 0.010467 | MSE 0.000162 / 0.000168 | RMSE 0.012736 / 0.012977
linear: w1 = 0.359730, w2 = 0.359730 | MAE 0.010858 / 0.010981 | MSE 0.000172 / 0.000190 | RMSE 0.013121 / 0.013779
exp: w1 = 0.351820, w2 = 0.351820 | MAE 0.010011 / 0.010006 | MSE 0.000156 / 0.000155 | RMSE 0.012494 / 0.012469

CPALL
tanh: w1 = 0.423709, w2 = 0.423709 | MAE 0.014073 / 0.013789 | MSE 0.000318 / 0.000319 | RMSE 0.017836 / 0.017856
sigmoid: w1 = 0.376741, w2 = 0.376741 | MAE 0.012700 / 0.012163 | MSE 0.000255 / 0.000262 | RMSE 0.015960 / 0.016191
linear: w1 = 0.431735, w2 = 0.431735 | MAE 0.015190 / 0.013474 | MSE 0.000393 / 0.000301 | RMSE 0.019825 / 0.017356
exp: w1 = 0.427827, w2 = 0.427827 | MAE 0.011830 / 0.011123 | MSE 0.000212 / 0.000201 | RMSE 0.014567 / 0.014183

Three hidden nodes

PTT
tanh: w1 = 0.394010, w2 = 0.394010 | MAE 0.012486 / 0.011650 | MSE 0.000279 / 0.000215 | RMSE 0.016751 / 0.014672
sigmoid: w1 = 0.387755, w2 = 0.387755 | MAE 0.010510 / 0.009682 | MSE 0.000192 / 0.000151 | RMSE 0.013866 / 0.012291
linear: w1 = 0.425890, w2 = 0.425890 | MAE 0.012102 / 0.011565 | MSE 0.000258 / 0.000222 | RMSE 0.016080 / 0.014912
exp: w1 = 0.383586, w2 = 0.383586 | MAE 0.010312 / 0.009675 | MSE 0.000180 / 0.000150 | RMSE 0.013425 / 0.012256

SCC
tanh: w1 = 0.363167, w2 = 0.363167 | MAE 0.011112 / 0.011669 | MSE 0.000183 / 0.000215 | RMSE 0.013529 / 0.014668
sigmoid: w1 = 0.337693, w2 = 0.337693 | MAE 0.010553 / 0.010319 | MSE 0.000175 / 0.000161 | RMSE 0.013230 / 0.012692
linear: w1 = 0.364510, w2 = 0.364510 | MAE 0.011384 / 0.011578 | MSE 0.000187 / 0.000214 | RMSE 0.013677 / 0.014633
exp: w1 = 0.367622, w2 = 0.367622 | MAE 0.010887 / 0.010341 | MSE 0.000179 / 0.000164 | RMSE 0.013381 / 0.012815

CPALL
tanh: w1 = 0.399262, w2 = 0.399262 | MAE 0.014182 / 0.013737 | MSE 0.000397 / 0.000306 | RMSE 0.019931 / 0.017499
sigmoid: w1 = 0.410937, w2 = 0.410937 | MAE 0.012159 / 0.012331 | MSE 0.000261 / 0.000261 | RMSE 0.016160 / 0.016158
linear: w1 = 0.415327, w2 = 0.415327 | MAE 0.013745 / 0.012910 | MSE 0.000319 / 0.000282 | RMSE 0.017865 / 0.016780
exp: w1 = 0.423794, w2 = 0.423794 | MAE 0.011926 / 0.012321 | MSE 0.000231 / 0.000252 | RMSE 0.015185 / 0.015873

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in each case.
Furthermore, we compared the performance of the ANN-CC models with the traditional ANN models: ANN-Center (Table 5) and RANN-LU (Table 6). Tables 5 and 6 report the MAE, MSE, and RMSE of the ANN-Center and RANN-LU models, respectively. Four activation functions and different numbers of hidden neurons were again compared, and the exponential activation function again provided higher performance in most cases. Furthermore, the prediction error decreased with a larger number of hidden neurons, as the MAE, MSE, and RMSE of the ANN-Center and RANN-LU models were lower when the number of hidden neurons increased.
Table 5. Experimental results of ANN-Center.

One Hidden Node
PTT       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.019527  0.019337  0.000626  0.000593  0.025023  0.024332
sigmoid   0.500000  0.500000  0.019831  0.019362  0.000639  0.000596  0.025286  0.024424
linear    0.500000  0.500000  0.019064  0.019355  0.000597  0.000583  0.024441  0.02415
exp       0.500000  0.500000  0.012656  0.011672  0.000261  0.000225  0.016154  0.015011

SCC       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.017691  0.017901  0.000479  0.000496  0.021866  0.022275
sigmoid   0.500000  0.500000  0.017738  0.017932  0.000506  0.000499  0.022495  0.022342
linear    0.500000  0.500000  0.017561  0.017892  0.000465  0.000486  0.021569  0.022032
exp       0.500000  0.500000  0.017469  0.017703  0.000455  0.000472  0.021335  0.021737

CPALL     w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.021078  0.020439  0.000801  0.000631  0.028314  0.025126
sigmoid   0.500000  0.500000  0.021049  0.020295  0.000797  0.000626  0.028236  0.025027
linear    0.500000  0.500000  0.021850  0.020635  0.000857  0.000658  0.029283  0.025666
exp       0.500000  0.500000  0.022129  0.020771  0.000891  0.000733  0.029861  0.027078

Two Hidden Nodes
PTT       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.018938  0.019209  0.000558  0.000556  0.023622  0.023830
sigmoid   0.500000  0.500000  0.019426  0.019363  0.000608  0.000611  0.024658  0.024490
linear    0.500000  0.500000  0.018735  0.018200  0.000548  0.000541  0.023434  0.023269
exp       0.500000  0.500000  0.019611  0.019322  0.000629  0.000595  0.025092  0.024391

SCC       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.017883  0.017851  0.000472  0.000487  0.021731  0.022073
sigmoid   0.500000  0.500000  0.018067  0.017798  0.000517  0.000476  0.022742  0.021825
linear    0.500000  0.500000  0.018658  0.017653  0.000546  0.000461  0.023375  0.021481
exp       0.500000  0.500000  0.013203  0.012860  0.000291  0.000281  0.017057  0.016770

CPALL     w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.020415  0.020792  0.000690  0.000681  0.026271  0.026090
sigmoid   0.500000  0.500000  0.021457  0.020544  0.000752  0.000665  0.027426  0.025781
linear    0.500000  0.500000  0.020851  0.020698  0.000644  0.000692  0.025382  0.026314
exp       0.500000  0.500000  0.015538  0.014341  0.000472  0.000386  0.021739  0.019653

Three Hidden Nodes
PTT       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.019384  0.019376  0.000581  0.000607  0.024115  0.024637
sigmoid   0.500000  0.500000  0.019355  0.019373  0.000608  0.000600  0.024667  0.024495
linear    0.500000  0.500000  0.019215  0.019423  0.000603  0.000602  0.024549  0.024536
exp       0.500000  0.500000  0.009504  0.009139  0.000178  0.000165  0.013344  0.012845

SCC       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.017898  0.017843  0.000493  0.000482  0.022213  0.021961
sigmoid   0.500000  0.500000  0.017337  0.017988  0.000458  0.000491  0.021412  0.022166
linear    0.500000  0.500000  0.017841  0.017864  0.000489  0.000483  0.022121  0.021968
exp       0.500000  0.500000  0.012653  0.012585  0.000273  0.000267  0.016530  0.016353
Appl. Sci. 2021, 11, 3997 18 of 25
Table 5. Cont.

CPALL     w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.014358  0.014825  0.000393  0.000414  0.019813  0.020359
sigmoid   0.500000  0.500000  0.020873  0.020682  0.000718  0.000674  0.026790  0.025967
linear    0.500000  0.500000  0.021531  0.020515  0.000791  0.000655  0.028131  0.025598
exp       0.500000  0.500000  0.014286  0.014245  0.000413  0.000385  0.020323  0.019629

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in each case.
Table 6. Experimental results of regularized ANN-LU (RANN-LU).

One Hidden Neuron
PTT       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020572  0.020551  0.000699  0.000633  0.026443  0.025164
sigmoid   0.019717  0.019512  0.000641  0.000612  0.025322  0.024745
linear    0.011574  0.014203  0.000348  0.000239  0.018662  0.015467
exp       0.011498  0.014183  0.000347  0.000204  0.018634  0.014291

SCC       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.017615  0.017566  0.000461  0.000463  0.021466  0.021522
sigmoid   0.017725  0.017619  0.000529  0.000470  0.023012  0.021684
linear    0.018800  0.017891  0.000541  0.000487  0.023267  0.022070
exp       0.018008  0.017821  0.000519  0.000480  0.022791  0.021915

CPALL     MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020698  0.020667  0.000679  0.000673  0.026053  0.025945
sigmoid   0.020784  0.020711  0.000735  0.000675  0.027123  0.025978
linear    0.020946  0.020754  0.000785  0.000730  0.028028  0.027032
exp       0.018078  0.016966  0.000571  0.000516  0.023883  0.022729

Two Hidden Neurons
PTT       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020031  0.019463  0.000657  0.000611  0.025640  0.024736
sigmoid   0.009529  0.009196  0.000186  0.000165  0.013632  0.012854
linear    0.019771  0.019383  0.000608  0.000610  0.024667  0.024689
exp       0.009449  0.009187  0.000184  0.000160  0.013573  0.012657

SCC       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.018178  0.017976  0.000517  0.000506  0.022743  0.022488
sigmoid   0.013537  0.013251  0.000314  0.000294  0.017732  0.017155
linear    0.013337  0.013334  0.000295  0.000300  0.017182  0.017329
exp       0.013058  0.013048  0.000285  0.000281  0.016898  0.016773

CPALL     MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.014284  0.014469  0.000359  0.000406  0.018953  0.020149
sigmoid   0.014353  0.014793  0.000393  0.000415  0.019832  0.020377
linear    0.020375  0.020873  0.000663  0.000698  0.025759  0.026432
exp       0.020621  0.020821  0.000675  0.000696  0.025989  0.026378

Three Hidden Neurons
PTT       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.019891  0.019421  0.000646  0.000611  0.025424  0.024728
sigmoid   0.019574  0.019510  0.000602  0.000621  0.024545  0.024933
linear    0.009158  0.009339  0.000173  0.000171  0.013169  0.013071
exp       0.009113  0.009108  0.000162  0.000156  0.012731  0.012481

SCC       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.017995  0.018026  0.000514  0.000507  0.022674  0.022523
sigmoid   0.014593  0.013787  0.000355  0.000315  0.018857  0.017751
linear    0.018369  0.017916  0.000524  0.000503  0.022878  0.022455
exp       0.013318  0.013200  0.000296  0.000295  0.017223  0.017189

CPALL     MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020252  0.020950  0.000637  0.000713  0.025246  0.026711
sigmoid   0.020111  0.020974  0.000604  0.000719  0.024565  0.026829
linear    0.020553  0.020850  0.000651  0.000705  0.025521  0.026563
exp       0.014032  0.014000  0.000364  0.000393  0.019083  0.019832

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in each case.
For clarity, the best prediction models for the three stocks are summarized in Table 7. The exponential activation function was selected in nearly all cases (except for ANN-Center in the PTT case), indicating a nonlinear pattern in these three stocks. The performance measures of the best ANN-CC specification were lower than those of ANN-Center and RANN-LU for all stock indices, meaning that the ANN-CC model was superior to the conventional models in both in-sample and out-of-sample forecast performance. Additionally, in this section, we also compared the performance of our ANN-CC against methods proposed in the literature, namely the center method of Billard and Diday [7], the center-range method of Neto and de Carvalho [8], the PM method of Souza et al. [9], and the IT method of Buansing et al. [12]. The results clearly confirmed the higher prediction performance of our proposed ANN-CC.
A robustness check was also conducted to confirm the performance of our proposed model. In practice, simple loss functions such as MAE, MSE, and RMSE may not yield sufficient information to identify a single forecasting model as "best". Therefore, in this study, another accuracy measure, the model confidence set (MCS), was used to evaluate forecasting performance. Hansen et al. [34] introduced the MCS test to validate the forecasting performance of competing models. Our MCS tests were based on three loss functions: MAE, MSE, and RMSE. Note that a higher MCS p-value means that a model is less likely to be eliminated from the set of models with equal predictive ability; in other words, the greater the p-value, the better the model. For more details of the MCS test, we refer to Hansen et al. [34].
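The elimination logic of the MCS test can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a non-studentized range statistic and an i.i.d. bootstrap, whereas Hansen et al. [34] studentize the loss differentials and use a block bootstrap for dependent data:

```python
import numpy as np

rng = np.random.default_rng(0)

def mcs_pvalues(losses, n_boot=1000):
    """Simplified model confidence set sketch.

    losses: (T, m) array of per-period losses for m models.
    Returns {model index: MCS p-value}; models with a p-value below
    the chosen threshold (e.g., 0.10) are eliminated from the set."""
    T, m = losses.shape
    models = list(range(m))
    pvals, p_prev = {}, 0.0
    while len(models) > 1:
        sub = losses[:, models]
        means = sub.mean(axis=0)
        stat = np.abs(means[:, None] - means[None, :]).max()  # range statistic
        # Bootstrap the null distribution of the range of mean differentials.
        dev = sub - means            # center losses under the null
        boot = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, T, T)
            bm = dev[idx].mean(axis=0)
            boot[b] = np.abs(bm[:, None] - bm[None, :]).max()
        p = (boot >= stat).mean()
        p_prev = max(p_prev, p)      # MCS p-values are non-decreasing
        worst = models[int(np.argmax(means))]  # drop the largest mean loss
        pvals[worst] = p_prev
        models.remove(worst)
    pvals[models[0]] = 1.0           # the last survivor gets p-value 1
    return pvals
```

Applied to the per-period MAE, MSE, or RMSE losses of the competing models, the surviving model receives a p-value of one, matching the pattern reported for ANN-CC in Tables 7 and 8.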
The statistical performance results for all competing models are reported in brackets in Table 7. According to the MCS test results, the ANN-CC model clearly outperformed the other competing models in both in-sample and out-of-sample forecasts. The p-values of the ANN-CC model were equal to one, while those of the other models fell below the 0.10 threshold. This means that the ANN-Center, RANN-LU, and linear regression forecasting models were eliminated during the MCS procedure, leaving ANN-CC as the only surviving model.
Table 7. Summary of the forecasting performance and model confidence set (MCS) test of the first dataset.

PTT           Activation  Hidden Neurons  MAE out            MAE in             MSE out            MSE in             RMSE out           RMSE in
ANN-CC        exp         2               0.009030 (1.0000)  0.009085 (1.0000)  0.000151 (1.0000)  0.000145 (1.0000)  0.012275 (1.0000)  0.012040 (1.0000)
ANN-Center    linear      2               0.018735 (0.0000)  0.018200 (0.0000)  0.000548 (0.0000)  0.000541 (0.0000)  0.023434 (0.0000)  0.023269 (0.0000)
RANN-LU       exp         3               0.009113 (0.2012)  0.009108 (0.1015)  0.000162 (0.2450)  0.000156 (0.0010)  0.012731 (0.3232)  0.012481 (0.0000)
IT                                        0.017932 (0.0000)  0.017817 (0.0000)  0.000513 (0.0000)  0.000501 (0.0000)  0.022652 (0.0000)  0.022383 (0.0000)
PM                                        0.018212 (0.0000)  0.018326 (0.0000)  0.000523 (0.0000)  0.000561 (0.0000)  0.022873 (0.0000)  0.0236861 (0.0000)
Center                                    0.021453 (0.0000)  0.020541 (0.0000)  0.000751 (0.0000)  0.000664 (0.0000)  0.027410 (0.0000)  0.025771 (0.0000)
Center-range                              0.019877 (0.0000)  0.01532 (0.0000)   0.000717 (0.0000)  0.000602 (0.0000)  0.026771 (0.0000)  0.024538 (0.0000)

SCC           Activation  Hidden Neurons  MAE out            MAE in             MSE out            MSE in             RMSE out           RMSE in
ANN-CC        exp         2               0.010011 (1.0000)  0.010006 (1.0000)  0.000156 (1.0000)  0.000155 (1.0000)  0.012494 (1.0000)  0.012469 (1.0000)
ANN-Center    exp         3               0.012653 (0.0000)  0.012585 (0.0000)  0.000273 (0.0000)  0.000267 (0.0000)  0.016530 (0.0000)  0.016353 (0.0000)
RANN-LU       exp         2               0.013058 (0.0000)  0.013048 (0.0000)  0.000285 (0.0000)  0.000281 (0.0000)  0.016898 (0.0000)  0.016773 (0.0000)
IT                                        0.012455 (0.0000)  0.012101 (0.0000)  0.000250 (0.0000)  0.000236 (0.0000)  0.015813 (0.0000)  0.015366 (0.0000)
PM                                        0.013013 (0.0000)  0.012992 (0.0000)  0.000271 (0.0000)  0.000263 (0.0000)  0.016464 (0.0000)  0.016221 (0.0000)
Center                                    0.014044 (0.0000)  0.014021 (0.0000)  0.000285 (0.0000)  0.000281 (0.0000)  0.016877 (0.0000)  0.016765 (0.0000)
Center-range                              0.013221 (0.0000)  0.013019 (0.0000)  0.000277 (0.0000)  0.000275 (0.0000)  0.016642 (0.0000)  0.016581 (0.0000)

CPALL         Activation  Hidden Neurons  MAE out            MAE in             MSE out            MSE in             RMSE out           RMSE in
ANN-CC        exp         2               0.011830 (1.0000)  0.011123 (1.0000)  0.000212 (1.0000)  0.000201 (1.0000)  0.014567 (1.0000)  0.014183 (1.0000)
ANN-Center    exp         3               0.014286 (0.0000)  0.014245 (0.0000)  0.000413 (0.0000)  0.000385 (0.0000)  0.020323 (0.0000)  0.019629 (0.0000)
RANN-LU       exp         3               0.014032 (0.0000)  0.014000 (0.0000)  0.000396 (0.0000)  0.000393 (0.0000)  0.019083 (0.0000)  0.019832 (0.0000)
IT                                        0.013251 (0.0000)  0.013656 (0.0000)  0.000372 (0.0000)  0.000369 (0.0000)  0.019291 (0.0000)  0.019205 (0.0000)
PM                                        0.015130 (0.0000)  0.014561 (0.0000)  0.000431 (0.0000)  0.000426 (0.0000)  0.020763 (0.0000)  0.02071 (0.0000)
Center                                    0.016125 (0.0000)  0.015365 (0.0000)  0.00510 (0.0000)   0.000439 (0.0000)  0.071423 (0.0000)  0.020961 (0.0000)
Center-range                              0.019877 (0.0000)  0.01532 (0.0000)   0.000717 (0.0000)  0.000602 (0.0000)  0.026762 (0.0000)  0.024524 (0.0000)

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in all cases. The numbers in brackets are p-values of the MCS test obtained from 1000 bootstrap replications.
5.3. Hong Kong Air Quality Monitoring Dataset
In the second dataset, we considered the Hong Kong air quality monitoring data as another example. This dataset was suggested in Yang et al. [27] and can be retrieved from http://www.epd.gov.hk (accessed on 20 February 2021), which provides hourly air quality data from 16 monitoring stations in Hong Kong. In this study, we considered the data from the Central/Western station and downloaded the hourly records from 1 January 2020 to 31 December 2020. We then aggregated the hourly data into minimum and maximum form according to each day's record. There were seven air quality indicators in the database; of these, we selected respirable suspended particulates (RSP) as the interval-valued response variable and nitrogen dioxide (NO2) and sulfur dioxide (SO2) as the interval-valued explanatory variables.
Again, the dataset was split into 80% for training and 20% for testing, so both in-sample and out-of-sample forecasts were conducted in this comparison. The performance of the interval-valued data forecasting models was evaluated through MAE, MSE, and RMSE. As shown in Table 8, similar to the first example, our proposed ANN-CC model was the best in terms of MAE, MSE, RMSE, and the MCS p-value.
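The daily interval construction and the chronological 80/20 split described above can be sketched as follows. The column name and the random values are illustrative only and do not reproduce the EPD file layout:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for one indicator (e.g., RSP).
idx = pd.date_range("2020-01-01", periods=96, freq="h")  # 4 days of hourly data
hourly = pd.DataFrame(
    {"rsp": np.random.default_rng(0).uniform(20, 80, len(idx))}, index=idx
)

# Aggregate each day's hourly records into a [min, max] interval.
daily = hourly["rsp"].resample("D").agg(["min", "max"])
daily.columns = ["rsp_lower", "rsp_upper"]

# Chronological 80/20 split into training and testing sets.
cut = int(len(daily) * 0.8)
train, test = daily.iloc[:cut], daily.iloc[cut:]
```

The same aggregation is applied to each explanatory variable (NO2, SO2) so that every series enters the model as a lower/upper-bound pair.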
Table 8. Summary of the forecasting performance and MCS test of the second dataset.

RSP           Activation  Hidden Neurons  MAE out          MAE in           MSE out          MSE in           RMSE out         RMSE in
ANN-CC        sigmoid     2               2.6374 (1.0000)  2.7092 (1.0000)  7.3232 (1.0000)  7.4085 (1.0000)  2.7078 (1.0000)  2.7225 (1.0000)
ANN-Center    sigmoid     2               2.9833 (0.0000)  2.8363 (0.0000)  8.1092 (0.0000)  8.3233 (0.0000)  2.8483 (0.0000)  2.8854 (0.0000)
RANN-LU       sigmoid     3               2.7762 (0.0000)  2.7423 (0.0000)  7.6403 (0.0000)  7.5409 (0.0510)  2.7656 (0.0000)  2.74613 (0.0000)
IT                                        2.8532 (0.0000)  2.7821 (0.0000)  8.5110 (0.0000)  8.3368 (0.0000)  2.9161 (0.0000)  2.8866 (0.0000)
PM                                        2.7011 (0.0000)  2.6893 (0.0000)  7.5215 (0.0000)  7.4212 (0.0000)  2.7427 (0.0000)  2.7240 (0.0000)
Center                                    3.0029 (0.0000)  2.9861 (0.0000)  8.7334 (0.0000)  8.7433 (0.0000)  2.9543 (0.0000)  2.9571 (0.0000)
Center-range                              2.8763 (0.0000)  2.7823 (0.0000)  8.5532 (0.0000)  8.3499 (0.0000)  2.9239 (0.0000)  2.8883 (0.0000)

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in all cases. The numbers in brackets are p-values of the MCS test obtained from 1000 bootstrap replications.
To illustrate the performance of the ANN-CC model, we show its out-of-sample forecasting results for the Thai stocks and RSP in Figure 5. For clarity, only 10% of the out-of-sample forecasts are shown. In this figure, each red vertical line segment represents a predicted interval-valued observation, while each gray vertical line segment represents the corresponding actual interval-valued observation; the extremes of each segment correspond to the minimum and maximum interval values. The comparison between actual and predicted values indicates the quality of the modeling and prediction task. The predicted values were very close to the actual values, indicating the goodness of fit of our model.
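A plot in the style of Figure 5 can be produced by drawing each interval as a vertical line segment. This is an illustrative sketch with made-up values, not the authors' plotting code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

def plot_interval_forecast(actual_lo, actual_hi, pred_lo, pred_hi, title=""):
    """Draw actual intervals as thick gray vertical segments and
    predicted intervals as thinner red segments, as in Figure 5."""
    t = np.arange(len(actual_lo))
    fig, ax = plt.subplots()
    ax.vlines(t, actual_lo, actual_hi, color="gray", lw=4,
              label="Actual interval-valued data")
    ax.vlines(t, pred_lo, pred_hi, color="red", lw=1.5,
              label="Predicted interval-valued data")
    ax.set_xlabel("Out-of-sample forecast")
    ax.set_ylabel("Interval")
    ax.set_title(title)
    ax.legend()
    return fig, ax

# Demo with illustrative values only.
fig, ax = plot_interval_forecast([0.0, 1.0, 2.0], [1.0, 2.0, 3.0],
                                 [0.1, 1.1, 2.1], [0.9, 1.9, 2.9],
                                 title="Illustrative interval forecast")
```

One panel is drawn per series (PTT, SCC, CPALL, RSP) by calling the function with that series' actual and predicted bounds.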
[Figure 5: four panels of out-of-sample interval forecasts, (a) PTT, (b) SCC, (c) CPALL, (d) RSP. In each panel, the horizontal axis is the out-of-sample forecast index and the vertical axis is the interval; red segments show the predicted interval-valued data and gray segments the actual interval-valued data.]

Figure 5. The out-of-sample interval forecasts by ANN-CC (red) vs. the actual data (gray) for example data.
5.4. Discussion
Although traditional models such as ANN-Center and RANN-LU provide acceptable prediction results for all stocks and RSP, they still face some limitations. The ANN-Center relies on the midpoint of the data, which may not reflect the real behavior of the data during the day. Likewise, the RANN-LU predicts future observations based on the lower and upper bounds separately, clearly leaving out the information within the interval-valued data. Our proposed ANN-CC model addresses this challenge by considering the whole information within the interval. From the above experimental results, we can draw the following conclusions:
(1) Regarding prediction performance, our ANN-CC was superior to the other traditional models in all datasets.
(2) The weight within the interval should not be fixed at the symmetric value w = 0.5. We found that the prediction result was sensitive to the weight w; thus, the weight should be estimated rather than treated as a fixed parameter.
(3) Our model outperformed the ANN-Center and RANN-LU models in situations in which the interval series exhibited either linear or nonlinear behavior.
(4) We also studied the sensitivity of the results to the activation function and found that the quality of the prediction model was not very sensitive to this choice in many cases. However, careful assessment is still needed when choosing the activation function.
(5) Even though the exponential activation function seemed to be the best fit in the ANN architecture, other activation functions performed well in some cases. Although the exponential activation function performed very well for the three selected stocks, it may not be reliable for other stocks or under other ANN structures.
(6) Overall, we can draw the important conclusion that our ANN-CC is a promising model for interval-valued data forecasting. The ANN-CC method has the advantage of neither constraining the weight nor fixing the reference points. The model is adaptive and adjusts itself for the best fit. The fitted model allows the behavior of the response's lower and upper bounds to be analyzed based on the variation of the reference points of the input and output intervals.
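The convex-combination reference point behind points (2) and (6) can be illustrated as follows. The placement of w on the lower bound is an assumption for illustration, and in the actual ANN-CC the weights are estimated jointly with the network parameters rather than fixed:

```python
import numpy as np

def convex_reference(lower, upper, w):
    """Convex-combination reference series: w*lower + (1-w)*upper.
    w = 0.5 recovers the midpoint used by ANN-Center; any w in [0, 1]
    yields a valid reference point inside the interval."""
    w = np.clip(w, 0.0, 1.0)  # keep the combination convex
    return w * np.asarray(lower) + (1.0 - w) * np.asarray(upper)

lo, hi = np.array([1.0, 2.0]), np.array([3.0, 6.0])
mid = convex_reference(lo, hi, 0.5)    # midpoint reference
skew = convex_reference(lo, hi, 0.25)  # reference closer to the upper bound
```

Because the reference series changes smoothly with w, the weight can be optimized together with the network weights instead of being fixed at the symmetric value 0.5.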
6. Conclusions
This paper proposed an artificial neural network with convex combination (ANN-CC) method for interval-valued data prediction. Simulation and experimental results on real data showed that the proposed ANN-CC model is a useful tool for interval-valued prediction tasks, especially for complicated nonlinear datasets. Moreover, the proposed ANN-CC model fills a research gap by modeling interval-valued data through the convex combination method. Our proposed model was examined by comparing its performance with the conventional ANN with the center method (ANN-Center), the regularized ANN-LU (RANN-LU), and linear regression with the center, center-range, PM, and IT methods. We considered three stock returns in the Thai stock market and the Hong Kong air quality monitoring dataset in our empirical comparison. According to the in-sample and out-of-sample forecasts, the performances of the various ANN-CC specifications were not much different. However, we observed that the tanh activation function performed well in both in-sample and out-of-sample forecasts, while the linear activation function performed relatively well in the in-sample forecast. In addition, we confirmed the higher performance of our ANN-CC compared to that of ANN-Center and RANN-LU. Experimental results on two real datasets also confirmed that the proposed ANN-CC model outperformed the conventional models.
In this study, a neural network with one hidden layer was assumed. However, real-world data are quite complex, and one hidden layer may not be enough to learn the data. Thus, a deep neural network (with two or more hidden layers) should be more promising in approximation performance. Another meaningful direction for future work is to employ other deep learning methods, such as recurrent neural networks and long short-term memory networks. These models handle incoming data in time order and learn from previous time steps to predict future values. In addition to deep learning methods, the fuzzy inference system (FIS) modeling approach for interval-valued time series forecasting [1] is also suggested for further study. Finally, our proposed model can be applied to forecasting in other areas, such as environmental and medical sciences.
Author Contributions: Conceptualization, R.P. and W.Y.; methodology, W.Y.; software, W.Y.; validation, R.P., P.M. and W.Y.; formal analysis, R.P.; investigation, W.Y.; resources, P.M.; data curation, P.M.; writing—original draft preparation, W.Y.; writing—review and editing, W.Y.; visualization, R.P.; supervision, W.Y. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: In this study, we used simulated data to show the performance of our model, and the simulation processes are explained in the paper. For the real data analysis section, the data can be freely collected from Thomson Reuters DataStream. The data are also available from the corresponding author upon request ([email protected]).

Acknowledgments: The authors would like to thank Laxmi Worachai, Vladik Kreinovich, and Hung T. Nguyen for their helpful comments on this paper. The authors are also grateful for the financial support offered by the Center of Excellence in Econometrics, Chiang Mai University, Thailand.

Conflicts of Interest: The authors declare no conflict of interest.
References
1. Maciel, L.; Ballini, R. A fuzzy inference system modeling approach for interval-valued symbolic data forecasting. Knowl. Based Syst. 2019, 164, 139–149.
2. Ma, X.; Dong, Y. An estimating combination method for interval forecasting of electrical load time series. Expert Syst. Appl. 2020, 158, 113498.
3. Chou, J.S.; Truong, D.N.; Le, T.L. Interval forecasting of financial time series by accelerated particle swarm-optimized multi-output machine learning system. IEEE Access 2020, 8, 14798–14808.
4. Lauro, C.N.; Palumbo, F. Principal component analysis of interval data: A symbolic data analysis approach. Comput. Stat. 2000, 15, 73–87.
5. Branzei, R.; Branzei, O.; Gök, S.Z.A.; Tijs, S. Cooperative interval games: A survey. Cent. Eur. J. Oper. Res. 2010, 18, 397–411.
6. Kiekintveld, C.; Islam, T.; Kreinovich, V. Security games with interval uncertainty. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, St. Paul, MN, USA, 6–10 May 2013; pp. 231–238.
7. Billard, L.; Diday, E. Regression analysis for interval-valued data. In Data Analysis, Classification, and Related Methods; Springer: Berlin/Heidelberg, Germany, 2000; pp. 369–374.
8. Neto, E.D.A.L.; de Carvalho, F.D.A. Centre and range method for fitting a linear regression model to symbolic interval data. Comput. Stat. Data Anal. 2008, 52, 1500–1515.
9. Souza, L.C.; Souza, R.M.; Amaral, G.J.; Silva Filho, T.M. A parametrized approach for linear regression of interval data. Knowl. Based Syst. 2017, 131, 149–159.
10. Chanaim, S.; Sriboonchitta, S.; Rungruang, C. A Convex Combination Method for Linear Regression with Interval Data. In Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2016; Huynh, V.N., Inuiguchi, M., Le, B., Le, B., Denoeux, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9978, pp. 469–480.
11. Phadkantha, R.; Yamaka, W.; Tansuchat, R. Analysis of Risk, Rate of Return and Dependency of REITs in Asia with Capital Asset Pricing Model. In Predictive Econometrics and Big Data. TES 2018; Kreinovich, V., Sriboonchitta, S., Chakpitak, N., Eds.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2018; Volume 753, pp. 536–548.
12. Buansing, T.T.; Golan, A.; Ullah, A. An information-theoretic approach for forecasting interval-valued SP500 daily returns. Int. J. Forecast. 2020, 36, 800–813.
13. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175.
14. Mondal, P.; Shit, L.; Goswami, S. Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices. Int. J. Comput. Sci. Eng. Appl. 2014, 4, 13.
15. McMillan, D.G. Nonlinear predictability of stock market returns: Evidence from nonparametric and threshold models. Int. Rev. Econ. Financ. 2001, 10, 353–368.
16. Nyberg, H. Predicting bear and bull stock markets with dynamic binary time series models. J. Bank. Financ. 2013, 37, 3351–3363.
17. Pastpipatkul, P.; Maneejuk, P.; Sriboonchitta, S. Markov switching regression with interval data: Application to financial risk via CAPM. Adv. Sci. Lett. 2017, 23, 10794–10798.
18. Phochanachan, P.; Pastpipatkul, P.; Yamaka, W.; Sriboonchitta, S. Threshold regression for modeling symbolic interval data. Int. J. Appl. Bus. Econ. Res. 2017, 15, 195–207.
19. Hiransha, M.; Gopalakrishnan, E.A.; Menon, V.K.; Soman, K.P. NSE stock market prediction using deep-learning models. Procedia Comput. Sci. 2018, 132, 1351–1362.
20. Haykin, S.; Principe, J. Making sense of a complex world [chaotic events modeling]. IEEE Signal Process. Mag. 1998, 15, 66–81.
21. Zhang, G.; Patuwo, B.E.; Hu, M.Y. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast. 1998, 14, 35–62.
22. Leung, D.; Abbenante, G.; Fairlie, D.P. Protease inhibitors: Current status and future prospects. J. Med. Chem. 2000, 43, 305–341.
23. Cao, Q.; Leggio, K.B.; Schniederjans, M.J. A comparison between Fama and French's model and artificial neural networks in predicting the Chinese stock market. Comput. Oper. Res. 2005, 32, 2499–2512.
24. San Roque, A.M.; Maté, C.; Arroyo, J.; Sarabia, Á. iMLP: Applying multi-layer perceptrons to interval-valued data. Neural Process. Lett. 2007, 25, 157–169.
25. Maia, A.L.S.; de Carvalho, F.D.A.; Ludermir, T.B. Forecasting models for interval-valued time series. Neurocomputing 2008, 71, 3344–3352.
26. Maia, A.L.S.; de Carvalho, F.D.A. Holt's exponential smoothing and neural network models for forecasting interval-valued time series. Int. J. Forecast. 2011, 27, 740–759.
27. Yang, Z.; Lin, D.K.; Zhang, A. Interval-valued data prediction via regularized artificial neural network. Neurocomputing 2019, 331, 336–345.
28. Mir, M.; Nasirzadeh, F.; Kabir, H.D.; Khosravi, A. Neural network-based interval forecasting of construction material prices. J. Build. Eng. 2021, 39, 102288.
29. Moore, R.E. Methods and Applications of Interval Analysis; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1979.
30. Humphrey, G.B.; Maier, H.R.; Wu, W.; Mount, N.J.; Dandy, G.C.; Abrahart, R.J.; Dawson, C.W. Improved validation framework and R-package for artificial neural network models. Environ. Model. Softw. 2017, 92, 82–106.
31. Sharpe, W.F. Capital asset prices: A theory of market equilibrium under conditions of risk. J. Financ. 1964, 19, 425–442.
32. Lintner, J. Security prices, risk, and maximal gains from diversification. J. Financ. 1965, 20, 587–615.
33. Maneejuk, P.; Yamaka, W. Significance test for linear regression: How to test without P-values? J. Appl. Stat. 2021, 48, 827–845.
34. Hansen, P.R.; Lunde, A.; Nason, J.M. The model confidence set. Econometrica 2011, 79, 453–497.