applied sciences
Article
A Convex Combination Approach for Artificial Neural Network of Interval Data
Woraphon Yamaka , Rungrapee Phadkantha * and Paravee Maneejuk
Citation: Yamaka, W.; Phadkantha, R.; Maneejuk, P. A Convex Combination Approach for Artificial Neural Network of Interval Data. Appl. Sci. 2021, 11, 3997. https://doi.org/10.3390/app11093997
Academic Editor: Tag Gon Kim
Received: 12 April 2021; Accepted: 26 April 2021; Published: 28 April 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand; [email protected] (W.Y.); [email protected] (P.M.)
* Correspondence: [email protected]
Abstract: As the conventional models for time series forecasting often use single-valued data (e.g., closing daily price data or the end of the day data), a large amount of information during the day is neglected. Traditionally, the fixed reference points from intervals, such as midpoints, ranges, and lower and upper bounds, are generally considered to build the models. However, as different datasets provide different information in intervals and may exhibit nonlinear behavior, conventional models cannot be effectively implemented and may not be guaranteed to provide accurate results. To address these problems, we propose the artificial neural network with convex combination (ANN-CC) model for interval-valued data. The convex combination method provides a flexible way to explore the best reference points from both input and output variables. These reference points were then used to build the nonlinear ANN model. Both simulation and real application studies are conducted to evaluate the accuracy of the proposed forecasting ANN-CC model. Our model was also compared with traditional linear regression forecasting (information-theoretic method, parametrized approach, center and range) and conventional ANN models for interval-valued data prediction (regularized ANN-LU and ANN-Center). The simulation results show that the proposed ANN-CC model is a suitable alternative to interval-valued data forecasting because it provides the lowest forecasting error in both linear and nonlinear relationships between the input and output data. Furthermore, empirical results on two datasets also confirmed that the proposed ANN-CC model outperformed the conventional models.
Keywords: artificial neural network; convex combination method; interval-valued data; time series
1. Introduction
Time-series point (single-value data) forecasting normally fails to reflect the range of fluctuation or uncertainty for economic, financial, and environmental data. Moreover, the existing models of interval forecasting are still incomplete, complex, and relatively low in accuracy. Thus, interval-valued data forecasting has become an important issue to be investigated [1,2]. This study comes within the interval-valued time series forecasting framework by introducing the convex combination (CC) method designed to choose the reference points that better represent the interval-valued data. The CC method automatically explores the set of reference points from input and output variables to build the neural network (NN) models. This is an enhancement and a generalization over existing methods.
Interval-valued data forecasting serves the needs of investors and data scientists who are sometimes interested in both single-value data and the variability of value intervals in the data. Interval-valued data provides rich information that can help investors and data scientists make accurate decisions [3]. With the advances in data science, enormous information and big data can be collected nowadays. However, conventional forecasting methods cannot be effectively implemented to deal with this big data to yield accurate results. Furthermore, these methods are generally proposed to forecast future observations using only point-valued data, which gives rise to a higher computation cost when dealing with big data. Moreover, it is sometimes difficult to express the real behavior
of the variable using only point-valued data. Thus, interval-valued data, such as the range of temperature, stock returns, and willingness to pay (minimum and maximum), is generally used to predict uncertainty in many situations. Interval analysis, suggested by Lauro and Palumbo [4], assumed that observations and estimations in the world are usually uncertain and incomprehensive and do not precisely represent the real behavior of the data. This is quite true, as our use of point-valued data will cause us to lose substantial information about prices or values during the day or the week. With interval-valued data, however, we can capture more realistic movement or information of the variables and can also handle big data forecasting at the same time. Thus, the interval approach should be considered to explain real data behavior under the context of big data. Interval-valued data is a special type for symbolic data analysis (SDA), composed of the lower and upper bounds of an interval. The objective of SDA is to provide a way to construct aggregated data described by multivalued variables, and thereby provide an efficient way to summarize large data sets by some reference value of symbolic data. Thus, estimation tools for interval-valued data analysis have been intensively required in recent studies. The use of interval data to represent uncertainty is common in various situations, such as in coalitional games where the payoffs of coalitions are uncertain and could be modeled as intervals of real numbers; see, e.g., Branzei et al. [5] and Kiekintveld et al. [6].
From a methodological point of view, interval-valued data is generally transformed into point-valued data or reference point data by some techniques; then, this reference point data is further used as a variable in the models. One of the famous approaches for dealing with interval-valued data is the mid-point method, which was introduced by Billard and Diday [7]. They analyzed these data using ordinary least squares regression on the midpoints of the intervals, namely the lower and upper bounds of the independent and dependent variables. Neto and De Carvalho [8] improved this approach by presenting a new method based on two linear regression models (the center-range method). The first regression model is fitted to the midpoints of the intervals and the second to the ranges of the intervals. It was found to be more efficient than that of Billard and Diday [7]. More recently, Souza et al. [9], Chanaim et al. [10], and Phadkantha et al. [11] argued that the mid-point and range methods are not appropriate as references of the interval data, as they cannot present the real behavior of the data and both are also too restrictive. Thus, Souza et al. [9] introduced the parametrized approach (PM) to the intervals of input variables. This method can choose the reference points that better represent the input intervals before building the regression. Chanaim et al. [10] and Phadkantha et al. [11] extended the PM approach by suggesting the convex combination (CC) method to get the reference points of the input and output intervals before estimating the regression model. Specifically, instead of restricting the weight of interval-valued data (w) to be 0.5 (mid-point), Xc = 0.5(Xu) + 0.5(Xl), they generalized this weight to be unknown. Hence, the reference point data become Xcc = w(Xu) + (1 − w)(Xl), where w ∈ [0, 1] is the weight parameter. Then, this reference point data is used as the input data in the regression model. Buansing et al. [12] also proposed the iterative, information-theoretic (IT) method for forecasting interval-valued data. Their method differs from others, as it does not assume that every point in the interval emerged from the same underlying process; there may be multiple models behind the process. They showed that the IT method provides more accurate forecasting of the upper and lower bounds when compared to the center-range method.
There is voluminous literature on stock market estimation and forecasting based on a wide variety of both linear and nonlinear models. Linear models like the autoregressive (AR) and autoregressive moving average (ARMA) models [13,14] and nonlinear models like the threshold and Markov switching models [15–18] have been used for time series forecasting. However, the equivocal and unforeseeable nature of time series data has brought about difficulty in prediction [19], so artificial neural network (ANN) models were introduced to forecast the future return of the stock. The major advantage of ANN models is their flexible nonlinear modeling capability. With ANN, there is no need to specify a particular model
form. Instead, the model is adaptively formed based on the features inherent in the data. This data-driven approach is suitable for many empirical data sets where no theoretical guidance is available to suggest an appropriate data generating process. ANN provides desirable properties that some traditional linear and nonlinear regression models lack, such as being noise tolerant. The structure of the ANN model is inspired by the real-life information-processing abilities of the human brain. Key attributes of the brain's information network include a nonlinear, parallel information processing structure and multiple connections between information nodes [20].
In the recent decade, several studies have indicated the higher performance of ANN models in forecasting compared to that of regression models. Zhang et al. [21] and Leung et al. [22] examined various prediction models based on multivariate classification techniques and compared both neural network and classical forecasting models. Their experiment results suggested that the probabilistic neural network can outperform the level estimation models, including adaptive exponential smoothing, vector autoregression with Kalman filter updating, the multivariate transfer function, and the multilayered feedforward neural network, in terms of prediction accuracy. In a more recent study, Cao et al. [23] demonstrated the accuracy of ANN in predicting Shanghai Stock Exchange (SHSE) movement by comparing neural networks and linear models under the capital asset pricing model (CAPM) and Fama and French's three-factor contexts, and they found that the neural networks outperformed the linear models.
As we mentioned above, this paper comes within the framework of interval-valued data forecasting by ANN. To the best of our knowledge, work related to interval-valued data forecasting by neural networks is somewhat limited. Some attempts along this line include San Roque et al. [24], Maia et al. [25], Maia and De Carvalho [26], and Yang et al. [27]. Maia et al. [25] proposed the ANN-Center method to predict the interval value by using the midpoint of the interval as the input. Later, Maia and De Carvalho [26] introduced the ANN-LU method by using ANN-Center and ANN-Range (predicting the difference between upper and lower bounds) for predicting the lower and upper bounds of intervals separately. Yang et al. [27] suggested that ANN-LU may face the unnatural interval crossing problem when the predicted lower bounds of intervals are larger than the predicted upper ones and vice versa, thereby leading to an invalid interval prediction. Hence, they introduced the regularized ANN-LU (RANN-LU) model for interval-valued time series forecasting.
Although ANN-Center, ANN-LU, and RANN-LU are found to be superior to the classical linear regression models for interval-valued data prediction of Billard and Diday [7], these models rely either on the midpoint (ANN-Center) or on the lower and upper bounds of the interval for forecasting, which may not be a good input for predicting the future value of an interval. For example, in the case of RANN-LU, if we predict the future values of the upper bound and lower bound separately, the prediction may not be reliable, since the whole information of the interval is not taken into account [27,28]. Although ANN-Center and ANN-LU consider both upper and lower bounds in the prediction process, the prediction still relies on the midpoint of the interval, indicating a symmetric weight between the lower and upper bounds. To overcome these problems, in this study, we introduce the convex combination (CC) approach to ANN for predicting interval-valued data, say, the lower and upper bounds. Our model is a generalization of the ANN-Center, allowing the weight to be more flexible and not fixed at 0.5 (asymmetric weight).
The novelty of the proposed ANN-CC can be summarized in the following two aspects. First, our method can construct prediction intervals based on the CC approach and ANN models. More specifically, this study proposes the novel CC method for interval ANN modeling. In this approach, the intervals of input and output variables are parametrized through the convex combination of the lower and upper bounds. The proposed ANN-CC is a promising alternative to the existing approaches. With optimal reference points that better represent the intervals, we are able to find an efficient solution that improves the prediction accuracy of the lower and upper bounds. Second, to the best of our knowledge, there is no study extending the CC approach to interval ANN modeling. Our proposed
method fills in such a literature gap and can capture both linear and nonlinear patterns within interval-valued data.
The rest of this paper is organized as follows. Section 2 gives a brief review of interval-valued prediction methods. Section 3 presents the proposed ANN with the convex combination method. In Section 4, we provide a simulation study to assess the performance of our proposed method. Section 5 describes the data used in this study. The analytical results are presented in Section 5. Section 6 provides the conclusion of this study.
2. Reviews of Existing Methods
2.1. Linear Regression Based on Center Method
Let $X^l = (x^l_{t1}, \ldots, x^l_{tk})$ and $X^u = (x^u_{t1}, \ldots, x^u_{tk})$, $t = 1, \ldots, T$, be the lower and upper bounds of the intervals, respectively.

According to Billard and Diday [7], the center or mid-point of the interval-valued explanatory variables, denoted as $X^c$, is calculated from

$$X^c = \frac{X^l + X^u}{2}. \quad (1)$$

Likewise, the interval-valued response variable, denoted as $Y^c$, is calculated from

$$Y^c = \frac{Y^l + Y^u}{2}, \quad (2)$$

where $Y^l = (y^l_1, \ldots, y^l_T)$ and $Y^u = (y^u_1, \ldots, y^u_T)$. Thus, the regression based on the center method can be constructed as

$$Y^c = (X^c)' \beta^c + \varepsilon^c, \quad (3)$$

where $\beta^c = (\beta^c_1, \ldots, \beta^c_k)$ is the vector of parameters and $\varepsilon^c = (\varepsilon^c_1, \ldots, \varepsilon^c_T)$ are normally distributed errors. Using matrix notation, this problem can be estimated by the ordinary least squares (OLS) method under the full rank assumption:

$$\hat{\beta}^c = (X^{c\prime} X^c)^{-1} X^{c\prime} Y^c. \quad (4)$$

Then, the estimates for the response lower and upper bounds are $\hat{Y}^l = X^l \hat{\beta}^c$ and $\hat{Y}^u = X^u \hat{\beta}^c$, respectively.
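As an illustration of Equations (1)–(4), the center method reduces to a single OLS fit on the interval midpoints, with both bound predictions reusing the same coefficient vector. A minimal NumPy sketch (the function name and interface are ours, not from the paper):

```python
import numpy as np

def center_method_fit(X_l, X_u, Y_l, Y_u):
    """Center method (Eqs. (1)-(4)): OLS on interval midpoints,
    then predict each bound with the same coefficient vector."""
    X_c = (X_l + X_u) / 2.0                              # Eq. (1)
    Y_c = (Y_l + Y_u) / 2.0                              # Eq. (2)
    beta_c, *_ = np.linalg.lstsq(X_c, Y_c, rcond=None)   # Eq. (4)
    Y_l_hat = X_l @ beta_c                               # lower-bound prediction
    Y_u_hat = X_u @ beta_c                               # upper-bound prediction
    return beta_c, Y_l_hat, Y_u_hat
```

Note that no intercept is included here, matching the bare form of Equation (3); in practice, a column of ones can be appended to the bound matrices.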
2.2. Linear Regression Based on Center-Range Method
Neto and De Carvalho [8] also introduced the center-range method to predict the upper and lower bounds of the dependent variable intervals. In this method, the lower and upper bounds of the interval-valued response variable are separately predicted by the mid-points and ranges of the interval-valued explanatory variables. Thus, this model is built on two linear regression models, namely the regression based on the center method (Equation (3)) and the regression based on the range method:
$$Y^r = X^{r\prime} \beta^r + \varepsilon^r, \quad (5)$$

where $X^r = (X^u - X^l)/2$ are the half-ranges of the explanatory variables, $Y^r$ are the half-ranges of the response variables, and $\varepsilon^r$ is the error. Using matrix notation, the least-squares estimator of $\beta^r$ is given by

$$\hat{\beta}^r = (X^{r\prime} X^r)^{-1} X^{r\prime} Y^r. \quad (6)$$

Then, we can predict the response lower and upper bounds as $\hat{Y}^l = X^{c\prime} \hat{\beta}^c - X^{r\prime} \hat{\beta}^r$ and $\hat{Y}^u = X^{c\prime} \hat{\beta}^c + X^{r\prime} \hat{\beta}^r$, respectively.
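Equations (5) and (6) add a second OLS fit on the half-ranges, and the bound predictions subtract or add the predicted half-range around the predicted center. A minimal NumPy sketch (function name and interface are ours):

```python
import numpy as np

def center_range_fit(X_l, X_u, Y_l, Y_u):
    """Center-range method (Eqs. (5)-(6)): one OLS fit on midpoints,
    a second on half-ranges; bounds are center -/+ predicted range."""
    X_c, Y_c = (X_l + X_u) / 2.0, (Y_l + Y_u) / 2.0      # midpoints
    X_r, Y_r = (X_u - X_l) / 2.0, (Y_u - Y_l) / 2.0      # half-ranges
    beta_c, *_ = np.linalg.lstsq(X_c, Y_c, rcond=None)
    beta_r, *_ = np.linalg.lstsq(X_r, Y_r, rcond=None)   # Eq. (6)
    Y_l_hat = X_c @ beta_c - X_r @ beta_r                # lower bound
    Y_u_hat = X_c @ beta_c + X_r @ beta_r                # upper bound
    return Y_l_hat, Y_u_hat
```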
Appl. Sci. 2021, 11, 3997 5 of 25
2.3. Linear Regression Based on Convex Combination Method
Chanaim et al. [10] suggested that the center method may lead to the misspecification problem, as the midpoint of the intervals might not be a good reference of the intervals. To tackle this problem, they proposed employing the convex combination approach to determine the best reference point within the ranges of the interval data:
$$x^{cc}_k = w_{1k} x^l_k + (1 - w_{1k}) x^u_k, \quad w_{1k} \in [0, 1], \quad (7)$$

$$Y^{cc} = w_2 Y^l + (1 - w_2) Y^u, \quad w_2 \in [0, 1], \quad (8)$$

where $w = [w_{11}, \ldots, w_{1k}, w_2]$ is the weight parameter of the interval data with values in $[0, 1]$. The advantage of this method lies in the flexibility to assign weights in calculating the appropriate value between intervals. Thus,

$$Y^{cc} = X^{cc\prime} \beta^{cc} + \varepsilon^{cc}, \quad (9)$$

where $X^{cc} = (x^{cc}_{t1}, \ldots, x^{cc}_{tk})$, $t = 1, \ldots, T$. Using matrix notation, this problem is also estimated by the OLS method under the full rank assumption:

$$\hat{\beta}^{cc} = (X^{cc\prime} X^{cc})^{-1} X^{cc\prime} Y^{cc}. \quad (10)$$

The response lower bound prediction is described in Equation (11), and the model to predict the response upper bound is given by Equation (12),

$$\hat{Y}^l = \hat{X}^{cc} \hat{\beta}^{cc} + \hat{Y}^r(\hat{w}_2), \quad (11)$$

$$\hat{Y}^u = \hat{X}^{cc} \hat{\beta}^{cc} + \hat{Y}^r(1 - \hat{w}_2), \quad (12)$$

where

$$\hat{Y}^r = X^{l\prime} \hat{\beta}^{cc} - X^{u\prime} \hat{\beta}^{cc}, \quad (13)$$

$\hat{Y}^r$ is the range prediction, and $\hat{X}^{cc} = \left( \hat{w}_{1j} x^l_j + (1 - \hat{w}_{1j}) x^u_j \right)$, $j = 1, \ldots, k$.
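For the linear CC regression of Equations (7)–(10), the interval weights can be explored by a simple grid search that fits OLS on the implied reference points and keeps the pair with the smallest in-sample squared error. A minimal NumPy sketch with a single shared input weight w1 (the grid resolution and all names are our own illustrative choices):

```python
import numpy as np

def cc_regression(X_l, X_u, Y_l, Y_u, grid=np.linspace(0.0, 1.0, 21)):
    """Linear CC regression (Eqs. (7)-(10)) via grid search over the
    interval weights (w1, w2); returns the best (sse, w1, w2, beta)."""
    best = None
    for w1 in grid:
        X_cc = w1 * X_l + (1.0 - w1) * X_u                        # Eq. (7)
        for w2 in grid:
            Y_cc = w2 * Y_l + (1.0 - w2) * Y_u                    # Eq. (8)
            beta, *_ = np.linalg.lstsq(X_cc, Y_cc, rcond=None)    # Eq. (10)
            sse = float(np.sum((Y_cc - X_cc @ beta) ** 2))
            if best is None or sse < best[0]:
                best = (sse, w1, w2, beta)
    return best
```

Note the caveat that both the target and the regressors change with the candidate weights, so the selected pair minimizes the fit error of its own reference points; the ANN-CC loss in Section 3 has the same structure.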
2.4. Regularized Artificial Neural Network (RANN)
Yang et al. [27] introduced RANN for interval-valued data prediction. This method is able to approximate various forms of nonlinearity in the data and directly models the non-crossing lower and upper bounds of intervals. In this model, the relation between the interval-valued output (Yu and Yl) and the interval-valued inputs (Xu and Xl) is as follows:
$$Y^l = f\!\left( \sum_{j=1}^{J} g\!\left( \sum_{i=1}^{2k} X_i \omega^{I,l}_{ij} + b^{I,l}_j \right) \omega^{o,l}_j + b^{o,l}_j \right), \quad (14)$$

$$Y^u = f\!\left( \sum_{j=1}^{J} g\!\left( \sum_{i=1}^{2k} X_i \omega^{I,u}_{ij} + b^{I,u}_j \right) \omega^{o,u}_j + b^{o,u}_j \right), \quad (15)$$

where $X = (X^u, X^l) \in \mathbb{R}^{2k}$ consists of the $2k$ inputs; $\omega^{I,u}_{ij}, \omega^{I,l}_{ij}$ and $b^{I,u}_j, b^{I,l}_j$ represent the weight parameters and bias terms of the $j$th hidden-layer neuron of the input layer; $\omega^{o,u}_j, \omega^{o,l}_j$ and $b^{o,u}_j, b^{o,l}_j$ represent the weight parameters and bias terms of the $j$th hidden-layer neuron of the output layer; and $f(\cdot)$ and $g(\cdot)$ are the activation functions. To meet the non-crossing requirement on the lower and upper bounds of intervals, a non-crossing regularizer is introduced in the loss function as follows:
$$\mathrm{Loss} = \frac{1}{2T} \sum_{t=1}^{T} \left( \hat{Y}^l_t - Y^l_t \right)^2 + \frac{1}{2T} \sum_{t=1}^{T} \left( \hat{Y}^u_t - Y^u_t \right)^2 + \frac{\lambda}{2T} \sum_{t=1}^{T} \left\{ \max\!\left( 0, \hat{Y}^l_t - \hat{Y}^u_t \right) \right\}^2, \quad (16)$$
where λ > 0 is the regularization parameter for controlling the non-crossing strength.
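The non-crossing loss of Equation (16) penalizes squared errors on both bounds plus any amount by which a predicted lower bound exceeds its predicted upper bound. A minimal NumPy sketch (function and argument names are ours):

```python
import numpy as np

def rann_loss(yl_hat, yu_hat, yl, yu, lam=1.0):
    """Regularized loss of Eq. (16): bound-wise squared errors plus
    a hinge-style penalty on crossed predicted bounds."""
    T = len(yl)
    fit = np.sum((yl_hat - yl) ** 2) + np.sum((yu_hat - yu) ** 2)
    cross = np.sum(np.maximum(0.0, yl_hat - yu_hat) ** 2)  # crossing penalty
    return (fit + lam * cross) / (2.0 * T)
```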
3. The Proposed Method: Artificial Neural Network with Convex Combination (ANN-CC)
Artificial neural network (ANN) models can approximate various forms of nonlinearity in the data. There are various types of ANN, but the most popular one is the Multilayer Perceptron (MLP). ANN models have been successfully applied in a variety of fields such as accounting, economics, finance, and marketing, as well as forecasting [20,26]. In this study, we use a three-layer ANN model in which both the inputs and outputs contain the lower and upper bounds of intervals. As shown in Figure 1, suppose the interval-valued data (X, Y) consists of one predictor and one response. The first layer is the input layer, the middle layer is called the hidden layer, and the last layer is the output layer. All the data are expressed in the form of a lower-upper bound. Thus, we have $\{X^u_1, X^l_1\}$ and $\{Y^u, Y^l\}$. For convenience, we use the notation $X^{cc} = w_1 X^l_1 + (1 - w_1) X^u_1$ for the input variable and $Y^{cc} = w_2 Y^l + (1 - w_2) Y^u$ for the output variable. Note that ANN-Center is the particular case of ANN-CC obtained by setting $w_1 = w_2 = 0.5$. Each layer contains $j$ neurons which are connected to one another and are also connected with all neurons in the immediately following layer. The neuron input path has a signal on its $X^{cc}$, and the strength of the path is characterized by the weight of neuron $j$ ($\omega_j$). The neuron is modeled as summing the path weight times the input signal over all paths and adding the node bias ($b$). Note that the neural network consists of an input function and an output function; thus, we can express these functions as:
$$H^{cc,I}_j = g\!\left( X^{cc\prime} \omega^I_j + b^I_j \right), \quad (17)$$
where $H^{cc,I}_j$ is the $j$th hidden neuron's input, $\omega^I_j$ is the weight vector between the hidden layer and the input layer, $g(\cdot)$ is the activation function for the hidden layer, and $b^I_j$ is the bias term of the input layer. Then, $H^{cc,I}_j$ is transformed into the output $Y^{cc}$ with the activation function of the output layer. Thus, the model can be written as:
$$Y^{cc} = f\!\left( \sum_{j=1}^{J} H^{cc,I}_j \omega^o_j + b^o_j \right), \quad (18)$$
where $\omega^o = \{\omega^o_1, \ldots, \omega^o_J\}$ is the weight vector between the hidden layer and the output layer, $b^o_j$ is the bias term of the output layer, and $f(\cdot)$ is the activation function of the output layer. A challenge in ANN design is the selection of the activation function. It is also known as the transfer function and can be basically divided into four types: the hyperbolic tangent activation function (tanh), the sigmoid or logistic activation function (sigmoid), the linear activation function (linear), and the exponential activation function (exp).
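A minimal sketch of the forward pass in Equations (17) and (18), assuming NumPy; the function name, array shapes, and default activations (tanh hidden layer, linear output) are our own illustrative choices:

```python
import numpy as np

def ann_cc_forward(X_l, X_u, w1, W_I, b_I, W_o, b_o,
                   g=np.tanh, f=lambda z: z):
    """Forward pass of the three-layer ANN-CC: convex-combination
    reference points feed J hidden neurons (Eq. (17)), whose
    activations are combined and passed through f (Eq. (18))."""
    X_cc = w1 * X_l + (1.0 - w1) * X_u      # reference points, shape (T, k)
    H = g(X_cc @ W_I + b_I)                 # Eq. (17), shape (T, J)
    return f(H @ W_o + b_o)                 # Eq. (18), shape (T,)
```

Here W_I has shape (k, J), b_I shape (J,), W_o shape (J,), and b_o is a scalar.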
Learning occurs through the adjustment of the path weights and node biases. The most common method used for this adjustment is backpropagation. In this method, the optimal weights $\omega^I_j$ and $\omega^o_j$ are estimated by minimizing the squared difference between the model output and the observed output. We formulate the loss function as follows:

$$\mathrm{loss} = \frac{1}{T} \sum_{t=1}^{T} \left( \hat{Y}^{cc}_t - Y^{cc}_t \right)^2, \quad (19)$$

where $\hat{Y}^{cc}$ and $Y^{cc}$ are the estimated output and observed output, respectively. In addition to the weights of neuron $j$ ($\omega_j$), our estimation also considers the weight parameter of the interval data ($w$) to determine the reference of the output and input variables.
Figure 1. The network architecture of the interval-valued data artificial neural network with convex combination (ANN-CC) model (one hidden layer is assumed).
As the interval data is manipulated as a new type of number represented by an ordered pair of its minimum and maximum values, the numerical manipulations of interval data should follow "interval calculus" [25]. In practice, we can separately predict the lower and upper bounds of the interval using the min-max method. However, this method does not guarantee the mathematical coherence of the predicted bounds; that is, the predicted lower bounds of intervals should be smaller than the predicted upper ones. Otherwise, the unnatural interval crossing problem will occur, which leads to an invalid interval prediction [27]. Furthermore, the forecasting performance can be impaired if there is not a clear dependency between the respective bounds of output and input [28,29]. Instead of restricting the weight of interval-valued data ($w = (w_1, w_2)$) to be 0.5 (mid-point), in this study, we consider the convex combination method to get the reference point of the interval-valued data. Thus, in this estimation, the weights of neuron $j$ ($\omega_j$) depend on the given weight $w$. Since there is no closed-form solution for this weight parameter of the interval data, we employ a grid search to select the $w$ that minimizes the sum of squared errors, denoted as loss($w$). Then, we can rewrite our loss function in Equation (19) as
$$\mathrm{loss}(w) = \frac{1}{T} \sum_{t=1}^{T} \left( Y^{cc}_t - \hat{Y}^{cc}_t(w) \right)^2. \quad (20)$$
This loss function is computed using two steps of estimation. Firstly, it is important to solve a nonlinear optimization to obtain $\omega_j$, which depends on the candidate $w_i$. In the second step, the loss function is then minimized with respect to the candidate $w_i$. Then, we introduce another candidate $w$ to repeat the first step. After loss($w$) is computed for all candidates $w_i$, the minimum value of loss($w$) is preferred. Thus, the optimal $w = (w_1, w_2)$ is obtained by
$$\hat{w} = \operatorname*{argmin}_{w} \, \mathrm{loss}(w). \quad (21)$$
We note that the ANN is estimated over a grid search between 0 and 1. Finally, following the CC method in Section 2.3, we can predict the lower and upper bounds as follows.
$$\hat{Y}^l = f\!\left( \sum_{j=1}^{J} H^{cc,I}_j(\hat{w}_1) \, \hat{\omega}^o_j + \hat{b}^o_j \right) - \hat{Y}^r(\hat{w}_2), \quad (22)$$
$$\hat{Y}^u = f\!\left( \sum_{j=1}^{J} H^{cc,I}_j(\hat{w}_1) \, \hat{\omega}^o_j + \hat{b}^o_j \right) + \hat{Y}^r(1 - \hat{w}_2), \quad (23)$$
where $H^{cc,I}_j(\hat{w}_1) = g\!\left( X^{cc}(\hat{w}_1)' \hat{\omega}^I_j + \hat{b}^I_j \right)$ is the $j$th hidden neuron's input given $X^{cc}(\hat{w}_1)$, and the range prediction is computed by
$$\hat{Y}^r = f\!\left( \sum_{j=1}^{J} H^{u,I}_j \hat{\omega}^o_j + \hat{b}^o_j \right) - f\!\left( \sum_{j=1}^{J} H^{l,I}_j \hat{\omega}^o_j + \hat{b}^o_j \right), \quad (24)$$
where $H^{l,I}_j = g\!\left( X^{l\prime} \hat{\omega}^I_j + \hat{b}^I_j \right)$ and $H^{u,I}_j = g\!\left( X^{u\prime} \hat{\omega}^I_j + \hat{b}^I_j \right)$. With this prediction, the predicted lower bounds of the intervals should not cross over the corresponding upper bounds. We also note that $\hat{\omega}^I_j$ and $\hat{\omega}^o_j$ contain the information of the weight parameters of the interval data $X$ and $Y$.

In our work, the ANN-CC methodology includes a grid search for the parameter $\hat{w} = \{\hat{w}_1, \hat{w}_2\}$. Note that the grid search is executed before estimating the weight parameters $\hat{\omega}^o_j$ and $\hat{\omega}^I_j$ in the ANN structure. A pseudo-code for the grid search and ANN-CC estimation is presented in Algorithm 1.
Algorithm 1. Pseudo-code for the proposed ANN-CC with one predictor and one response.

Require: w = {w1, w2}, where w1 and w2 are the sets of candidate weights of the input {X1^u, X1^l} and the output {Y^u, Y^l}, respectively, within [0, 1].

# Search for the optimal w-hat
for each (w_i1, w_i2) in w1 x w2 = [0.001, 0.002, ..., 1]:
    Calculate Y_i^cc = w_i2 Y^l + (1 - w_i2) Y^u and X_i^cc = w_i1 X1^l + (1 - w_i1) X1^u
    # Define the loss function of the ANN structure
    omega_i^o = {omega_i1^o, ..., omega_iJ^o}, omega_i^I = {omega_i1^I, ..., omega_iJ^I} = Parameters()
    H_ij^cc,I = g(X_i^cc' omega_ij^I + b_ij^I)
    Yhat_i^cc = f(sum_j H_ij^cc,I omega_ij^o + b_ij^o)
    loss_i(w_i) = ||Y_i^cc - Yhat_i^cc||^2
    # Follow the gradients until convergence
    omega_i = (omega_i^o, omega_i^I)
    repeat
        omega_i = omega_i - rho * grad_omega(loss_i)
    until convergence
end for
# Choose the w = {w1, w2} with the lowest loss_i as the optimal w-hat = {w1-hat, w2-hat}
w-hat = argmin_w loss(w)
Calculate Yhat^cc = w2-hat Y^l + (1 - w2-hat) Y^u and Xhat^cc = w1-hat X1^l + (1 - w1-hat) X1^u
# Compute the loss function of the ANN structure using Yhat^cc and Xhat^cc
Loss*(w-hat) = ||Y^cc - Yhat^cc||^2
# Follow the gradients until convergence
omega = (omega^o, omega^I)
repeat
    omega = omega - rho * grad_omega(Loss*)
until convergence
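Under stated assumptions, the grid search of Algorithm 1 can be sketched in Python. The inner estimation here is a plain gradient-descent fit of a one-hidden-layer tanh network with a linear output; the hidden size J, learning rate, epoch count, and all function names are our own illustrative choices, not values from the paper:

```python
import numpy as np

def train_ann(x, y, J=3, lr=0.05, epochs=2000, seed=0):
    """Inner step of Algorithm 1 (sketch): fit a one-hidden-layer tanh
    network to scalar reference points by plain gradient descent and
    return the final mean squared error."""
    rng = np.random.default_rng(seed)
    W_I = rng.normal(scale=0.5, size=(1, J)); b_I = np.zeros(J)
    W_o = rng.normal(scale=0.5, size=J);      b_o = 0.0
    X = x.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W_I + b_I)             # Eq. (17)
        y_hat = H @ W_o + b_o                  # Eq. (18), linear f
        err = y_hat - y
        gW_o = H.T @ err / len(y); gb_o = err.mean()
        dH = np.outer(err, W_o) * (1 - H ** 2)  # backprop through tanh
        gW_I = X.T @ dH / len(y); gb_I = dH.mean(axis=0)
        W_o -= lr * gW_o; b_o -= lr * gb_o
        W_I -= lr * gW_I; b_I -= lr * gb_I
    return np.mean((y_hat - y) ** 2)

def ann_cc_grid(X_l, X_u, Y_l, Y_u, grid):
    """Outer loop of Algorithm 1 (sketch): pick the (w1, w2) whose
    reference points give the smallest trained-network loss."""
    losses = {}
    for w1 in grid:
        for w2 in grid:
            x = w1 * X_l + (1.0 - w1) * X_u
            y = w2 * Y_l + (1.0 - w2) * Y_u
            losses[(w1, w2)] = train_ann(x, y)
    return min(losses, key=losses.get)
```

A full implementation would use the fine grid [0.001, ..., 1] of Algorithm 1 and a proper convergence check rather than a fixed epoch count.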
4. Simulation Study
To examine the performance of our proposed method, we conducted a simulation study. We considered two data generation processes which are different in structure: linear and nonlinear.
4.1. Linear Structure
A simple interval generation process is conducted, and one independent variable is assumed. We considered three typical data generation processes with different weight parameters. For this purpose, we considered the following three scenarios of weight in the intervals:
Scenario 1. Center of the interval data: $w_1 = 0.5$, $w_2 = 0.5$,

$$\left[ (0.5)Y^l + (1 - 0.5)Y^u \right] = 1 + 5\left[ (0.5)X^l + (1 - 0.5)X^u \right] + \varepsilon. \quad (25)$$

Scenario 2. Deviation from the center toward the lower bound of the interval data: $w_1 = 0.2$, $w_2 = 0.2$,

$$\left[ (0.2)Y^l + (1 - 0.2)Y^u \right] = 1 + 5\left[ (0.2)X^l + (1 - 0.2)X^u \right] + \varepsilon. \quad (26)$$

Scenario 3. Deviation from the center toward the upper bound of the interval data: $w_1 = 0.8$, $w_2 = 0.8$,

$$\left[ (0.8)Y^l + (1 - 0.8)Y^u \right] = 1 + 5\left[ (0.8)X^l + (1 - 0.8)X^u \right] + \varepsilon. \quad (27)$$
For each scenario, we performed 100 replications. In each simulation, we proceeded as follows.
(1) Generate the error from the normal distribution with mean zero and variance one.
(2) Generate the upper bound of the independent variable, $X^u$, from the uniform distribution $U(1, 3)$. Then, compute the lower bound of the independent variable as $X^l = X^u - r_x$, where $r_x \sim U(0, 2)$ denotes the range between the upper and lower bounds.
(3) Compute the expected independent variable $X^{cc}$ of the intervals as $X^{cc} = w_1 X^u + (1 - w_1) X^l$. Then, generate the expected dependent variable $Y^{cc} = X^{cc}\beta + \varepsilon$.
(4) Finally, derive the upper and lower bounds of the intervals by $Y^l = Y^{cc} - r_y$, where $r_y \sim U(0, 2)$, and $Y^u = \left( Y^{cc} - (1 - w_2)Y^l \right) / w_2$. We note that $r_x$ and $r_y$ are random numbers for simulating the intervals of $X$ and $Y$, respectively. This guarantees that the bounds do not cross each other.
In this simulation study, we performed 100 replications with a sample size n = 1000 for all three scenarios. Each simulation dataset was randomly split, with 80% for training and 20% for testing. Our ANN with the convex combination method (ANN-CC) was then compared with two conventional models, namely the ANN-Center and the RANN-LU method of Yang et al. [27]. In this simulation study, four transfer functions, namely the hyperbolic tangent activation function (tanh), the sigmoid or logistic activation function (sigmoid), the linear activation function (linear), and the exponential activation function (exp), were considered. To simplify the comparison, one input layer, one output layer, one hidden layer, and one hidden neuron were assumed. We note that the ANN-Center could be estimated by the validann package in the R programming language [30]. In addition, this package also provides validation methods for the replicative, predictive, and structural validation of artificial neural network models.
To assess the performance of these models, we used the following measures: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The formulae of MAE, MSE, and RMSE are as follows:

$$\mathrm{MAE} = \frac{ \sum_{i=1}^{N} \left| \hat{Y}^u_i - Y^u_i \right| / N + \sum_{i=1}^{N} \left| \hat{Y}^l_i - Y^l_i \right| / N }{2}, \quad (28)$$

$$\mathrm{MSE} = \frac{ \sum_{i=1}^{N} \left( \hat{Y}^u_i - Y^u_i \right)^2 / N + \sum_{i=1}^{N} \left( \hat{Y}^l_i - Y^l_i \right)^2 / N }{2}, \quad (29)$$

$$\mathrm{RMSE} = \frac{ \sqrt{ \sum_{i=1}^{N} \left( \hat{Y}^u_i - Y^u_i \right)^2 / N } + \sqrt{ \sum_{i=1}^{N} \left( \hat{Y}^l_i - Y^l_i \right)^2 / N } }{2}. \quad (30)$$
We repeated the simulation 100 times and obtained the simulation data with 100 samples. An example of each of the simulated interval-valued time series is presented in Figure 2. In this figure, each square plot represents the relationship between the interval X and Y.
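Equations (28)–(30) average each error measure over the upper-bound and lower-bound predictions; a minimal NumPy sketch (function and argument names are ours):

```python
import numpy as np

def interval_metrics(yl_hat, yu_hat, yl, yu):
    """Interval accuracy measures of Eqs. (28)-(30): each metric is the
    average of its upper-bound and lower-bound counterparts."""
    mae = (np.mean(np.abs(yu_hat - yu)) + np.mean(np.abs(yl_hat - yl))) / 2
    mse = (np.mean((yu_hat - yu) ** 2) + np.mean((yl_hat - yl) ** 2)) / 2
    rmse = (np.sqrt(np.mean((yu_hat - yu) ** 2))
            + np.sqrt(np.mean((yl_hat - yl) ** 2))) / 2
    return mae, mse, rmse
```

Note that RMSE here averages the two per-bound root errors rather than taking the root of the averaged MSE, exactly as Equation (30) is written.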
(a) Scenario 1; (b) Scenario 2; (c) Scenario 3
Figure 2. Interval-valued data plot for the linear case. The red dots indicate the mid-point valuewithin the interval.
Table 1 presents the results of 100 repetitions for the linear structure case. The MAE,MSE, and RMSE are reported. We observed that the ANN model with the CC method
(ANN-CC) showed powerful nonlinear approximation ability (tanh, sigmoid, exp), as its MAE, MSE, and RMSE values were lower than those of the ANN-Center and RANN-LU in Scenarios 2 and 3. The tanh function provided the best fit for the ANN-CC model on these simulated datasets. Not surprisingly, our ANN-CC did not outperform the ANN-Center method under Scenario 1, because the interval data in that scenario were simulated from the midpoint; even so, ANN-CC still performed better than RANN-LU. In sum, the evaluation leads to the same conclusion for Scenarios 2 and 3: the proposed model performed well in the simulation study, and the ANN-CC method showed high performance across the scenarios.
Table 1. Experimental results for the linear case. Each row reports MAE, MSE, and RMSE (standard deviations in parentheses) for ANN-CC | ANN-Center | RANN-LU.

Scenario 1
tanh: 2.4011 (0.1548), 10.8590 (1.3251), 3.2951 (1.2434) | 2.3154 (0.1215), 9.4115 (1.1154), 3.0681 (1.0859) | 3.1244 (1.1245), 14.2251 (2.3414), 3.7729 (1.5521)
sigmoid: 2.5870 (0.1211), 10.9894 (1.3511), 3.3154 (1.2433) | 2.4023 (0.1011), 9.5558 (1.1584), 3.0917 (1.0756) | 3.1148 (1.3584), 12.5441 (3.6974), 3.5423 (1.2532)
linear: 2.7244 (0.1513), 12.0524 (1.1125), 3.4723 (1.3234) | 2.4488 (0.1254), 10.3554 (1.0015), 3.2184 (0028) | 3.5415 (1.2121), 14.3554 (1.5487), 3.7890 (1.1112)
exp: 2.5980 (0.1148), 11.3486 (1.981), 3.3693 (1.2113) | 2.4223 (0.1011), 10.0215 (1.1057), 3.1661 (1.0723) | 3.1057 (1.2554), 12.5015 (3.4548), 3.5361 (0.9723)

Scenario 2
tanh: 2.1350 (0.1254), 7.0789 (1.3258), 2.6610 (1.1123) | 3.1332 (2.1254), 9.0173 (2.4848), 3.0021 (1.4434) | 3.5445 (1.3145), 15.1541 (2.6879), 3.8935 (1.2329)
sigmoid: 2.3328 (0.1158), 8.6030 (1.2217), 2.9337 (1.0283) | 4.2214 (3.3141), 10.6030 (1.6278), 3.2565 (1.3233) | 2.5847 (1.2597), 9.6984 (3.3354), 3.1154 (1.2810)
linear: 2.7825 (0.1698), 8.9356 (1.1369), 2.9894 (1.0022) | 4.3112 (2.6545), 10.9356 (2.3679), 3.3078 (1.2232) | 4.1125 (1.3541), 14.0778 (1.1548), 3.7533 (1.0012)
exp: 2.4625 (0.1354), 8.5072 (1.2589), 2.9173 (0.9233) | 3.5845 (2.3651), 10.4072 (2.2589), 3.2269 (1.1129) | 3.4797 (1.5479), 10.3155 (1.1554), 3.2129 (0.9928)

Scenario 3
tanh: 2.1049 (0.1112), 6.9441 (1.5159), 2.6352 (0.6333) | 3.4488 (2.1254), 9.9410 (4.3549), 3.1539 (1.6727) | 2.4141 (1.0413), 8.5454 (2.5444), 2.9245 (1.5529)
sigmoid: 2.1249 (0.1874), 7.0088 (1.6511), 2.6468 (0.7843) | 3.9784 (2.3743), 15.0113 (5.4035), 3.8730 (1.2270) | 2.5454 (1.1115), 10.5544 (2.1125), 3.2456 (1.2332)
linear: 2.8524 (0.1369), 12.9612 (1.3594), 3.6013 (1.0091) | 3.8411 (2.6588), 14.9023 (5.8941), 3.8607 (1.3410) | 2.9445 (0.8797), 13.1154 (2.1112), 3.6210 (1.2221)
exp: 2.4489 (0.1364), 10.4617 (1.5114), 3.2344 (0.8833) | 4.8778 (4.2643), 16.4107 (5.4113), 4.0511 (2.0013) | 3.0124 (1.0694), 11.3547 (1.9967), 3.2683 (1.1009)

Note: Parentheses denote standard deviations. Bold numbers indicate the lowest values of mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
4.2. Nonlinear Structure
Similar to the linear structure, we considered three typical data generation processes with different weight parameters. The three scenarios of interval weights were as follows:
Scenario 1: Center of the interval data: $w_1 = 0.5$, $w_2 = 0.5$:

$$\left[(0.5)Y^{l} + (1-0.5)Y^{u}\right] = 1 + 3e^{\left[(0.5)X^{l} + (1-0.5)X^{u}\right]^{2}} + \varepsilon. \qquad (31)$$

Scenario 2: Deviation from the center toward the lower bound of the interval: $w_1 = 0.2$, $w_2 = 0.2$:

$$\left[(0.2)Y^{l} + (1-0.2)Y^{u}\right] = 1 + 3e^{\left[(0.2)X^{l} + (1-0.2)X^{u}\right]^{2}} + \varepsilon. \qquad (32)$$

Scenario 3: Deviation from the center toward the upper bound of the interval: $w_1 = 0.8$, $w_2 = 0.8$:

$$\left[(0.8)Y^{l} + (1-0.8)Y^{u}\right] = 1 + 3e^{\left[(0.8)X^{l} + (1-0.8)X^{u}\right]^{2}} + \varepsilon. \qquad (33)$$
For each scenario, we performed 100 replications. These data were more complicated than those in the linear structure case, as they exhibit a nonlinear relationship between the dependent and independent variables, as shown in Figure 3. The results are summarized in Table 2.
Table 2 presents the simulation results for the nonlinear case. Similar results were obtained: the proposed ANN-CC showed higher performance in Scenarios 2 and 3, while the ANN-Center again performed poorly in those scenarios. The reason is simple: the ANN-Center fixes the weight parameter at the center of the interval, which does not correspond to the true data generating process, and this mismatch leads to a higher prediction bias.
The experiments were carried out on an Intel Core i5-6400 CPU (2.7 GHz, 4 cores) with 16 GB of RAM. The computational cost of the ANN-CC model was slightly higher than that of the ANN-Center and RANN-LU models, with training being the only time-consuming step for all models. The longer computation time of the proposed model arises because the additional weights of the interval-valued data are estimated simultaneously during the optimization.
(a) Scenario 1; (b) Scenario 2; (c) Scenario 3
Figure 3. Interval-valued data plot for the nonlinear case. The red dots indicate the mid-point valuewithin the interval.
Table 2. Experimental results for the nonlinear case. Each row reports MAE, MSE, and RMSE (standard deviations in parentheses) for ANN-CC | ANN-Center | RANN-LU.

Scenario 1
tanh: 3.6341 (0.3015), 14.8140 (2.3840), 3.8491 (1.0023) | 3.1258 (0.2474), 9.8741 (2.0126), 3.1438 (0.9343) | 3.9874 (1.4126), 14.9874 (3.2158), 3.8715 (0.9323)
sigmoid: 3.4558 (0.3114), 10.9894 (5.0114), 3.3155 (0.9833) | 3.2154 (0.2099), 9.9741 (2.3654), 3.1582 (0.8834) | 4.9845 (1.9136), 15.3898 (5.1145), 3.9245 (0.9823)
linear: 5.3155 (1.4259), 17.1148 (5.6584), 4.1377 (1.0824) | 5.2145 (1.3978), 16.4213 (3.3665), 4.0523 (1.1112) | 5.4136 (2.0113), 17.8854 (5.7897), 4.2295 (1.2334)
exp: 4.6211 (0.8797), 16.3123 (2.8557), 4.0391 (1.0067) | 3.1158 (0.9788), 14.1314 (2.6547), 3.7599 (0.9823) | 4.8654 (1.3541), 17.0198 (3.9844), 4.1256 (1.2220)

Scenario 2
tanh: 3.2250 (0.8453), 9.1588 (3.6941), 3.0372 (0.8832) | 3.9788 (2.5481), 10.5448 (4.6641), 3.2475 (1.1234) | 3.7888 (1.1255), 9.8368 (3.6879), 3.1367
sigmoid: 4.0581 (1.3511), 13.5154 (3.6444), 3.6765 (0.8734) | 5.3698 (2.4145), 20.3688 (4.3658), 4.5139 (2.4449) | 4.3781 (1.8746), 16.1556 (5.1034), 4.0197
linear: 4.8744 (2.5584), 17.9981 (4.3398), 4.2428 (0.9734) | 6.1142 (3.9451), 27.6548 (10.4891), 5.2597 (2.1240) | 5.6684 (2.9876), 22.8314 (6.3598), 4.7783
exp: 4.1158 (0.7446), 15.4458 (3.3658), 3.9311 (0.9872) | 5.4101 (2.3155), 20.0123 (4.4115), 4.4744 (2.1098) | 4.2666 (1.3659), 14.9155 (1.4231), 3.8628

Scenario 3
tanh: 3.4589 (0.7894), 9.3158 (4.1158), 3.0544 (0.9239) | 5.8115 (3.0125), 21.1930 (10.3661), 4.6033 (1.2323) | 3.5012 (1.2458), 9.4125 (4.2320), 3.0681 (1.9383)
sigmoid: 4.1585 (1.3841), 14.3651 (3.9887), 3.7901 (1.1112) | 5.9884 (2.3688), 20.0113 (10.4035), 4.4730 (1.2223) | 4.3685 (1.5645), 14.8554 (4.3598), 3.8544 (1.2234)
linear: 4.8664 (2.1355), 16.6557 (6.0123), 4.0814 (1.0389) | 5.9424 (2.6871), 25.1253 (10.3211), 5.0131 (1.4980) | 4.9785 (1.8994), 16.9974 (5.3145), 4.1229 (1.4409)
exp: 4.2556 (1.2154), 14.4456 (4.1106), 3.8009 (0.9227) | 6.6698 (5.1155), 31.4107 (12.9987), 5.6044 (1.3409) | 4.8664 (2.0115), 15.5024 (2.1258), 3.9377 (1.2284)

Note: One hidden-layer node is assumed for the ANN in this simulation study. Parentheses denote standard deviations. Bold numbers indicate the lowest MAE, MSE, and RMSE.
5. Application to Real Data

5.1. Capital Asset Pricing Model: Thai Stocks
The performance of the ANN-CC model was assessed using interval-valued returns (maximum and minimum returns) from the Stock Exchange of Thailand. In this study, we compared the forecasting accuracy of several ANN models in the context of the capital asset pricing model (CAPM). Our analysis employed the lower and upper bounds of real daily stock returns, including SET50, PTT, SCC, and CPALL, over the period from 26 October 2011 to 7 May 2019. The three companies were selected because they are considered well-performing in terms of their share prices; moreover, they experienced large trade volumes in the current decade and are regarded as fast-growing and highly volatile stocks in the Thai market. All data were collected from Thomson Reuters DataStream. The criteria used to choose the best-fit model were MAE, MSE, and RMSE.
The CAPM was proposed in separate studies by Sharpe [31] and Lintner [32] to measure the risk of an individual stock against the market in terms of the beta risk βi. The βi measures a stock's risk (volatility of returns) by the fluctuation of its price changes relative to the overall market; in other words, it is the stock's sensitivity to market risk. The model can be written as
$$r_{it} - r_{ft} = \beta_0 + \beta_i\left(r_{mt} - r_{ft}\right) + \varepsilon_{it}, \qquad (34)$$

where $r_{it}$ is the return of stock i, $r_{mt}$ is the return of the market, $r_{ft}$ is the risk-free rate, and $\varepsilon_{it}$ is the error term at time t. If $\beta_i > 1$, the stock is called an aggressive stock (high risk); otherwise, it is a defensive stock (low risk) in terms of the excess stock return. In this study, we preserve the interval format of stock i as

$$\left[r_{it}^{l},\, r_{it}^{u}\right] = \left[\frac{P_{it}^{l} - PA_{it-1}}{PA_{it-1}},\ \frac{P_{it}^{u} - PA_{it-1}}{PA_{it-1}}\right], \qquad (35)$$

where $P_{it}^{u}$, $P_{it}^{l}$, and $PA_{it}$ are the maximum, minimum, and average prices of the individual stock i at time t, respectively. Note that the risk-free rates $r_{ft}^{l}$ and $r_{ft}^{u}$ are assumed to be zero in this empirical study.
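For intuition, the beta risk of Equation (34) can be recovered by ordinary least squares on single-valued excess returns. This is the classical CAPM slope, not the paper's interval ANN estimator; the function name and inputs are illustrative.

```python
import numpy as np

def capm_beta(stock_excess, market_excess):
    """Ordinary least-squares estimates of Equation (34): regress the
    stock's excess return on the market's excess return; the slope is
    the beta risk and the intercept is beta_0."""
    x = np.asarray(market_excess, float)
    y = np.asarray(stock_excess, float)
    beta_i = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    beta_0 = y.mean() - beta_i * x.mean()
    return beta_0, beta_i
```

The interval models below replace these single-valued returns with convex combinations of the lower and upper return bounds.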
Again, we preserve the interval format of the market return as

$$\left[r_{mt}^{l},\, r_{mt}^{u}\right] = \left[\frac{P_{mt}^{l} - PA_{mt-1}}{PA_{mt-1}},\ \frac{P_{mt}^{u} - PA_{mt-1}}{PA_{mt-1}}\right], \qquad (36)$$

where $P_{mt}^{u}$, $P_{mt}^{l}$, and $PA_{mt}$ are the maximum, minimum, and average prices of the market at time t, respectively.

We constructed a deep neural network using stock returns from the Stock Exchange
of Thailand (SET), the major stock market in Thailand. We selected three major stocks with high market capitalization at the beginning of the sample period. We collected the SET50 (a proxy of the market) and three company stocks, PTT, SCC, and CPALL, over the period from 4 January 2012 to 30 December 2019. The interval-valued data were constructed from the daily range of the selected price indexes, i.e., the lowest and highest trading values of the day, to capture the market movement for that day. The interval-valued predictions for these three stocks were then made based on the models developed in Section 3. We note that all interval price series were transformed into interval returns following Equations (35) and (36). The descriptive statistics, namely the mean, standard deviation, minimum, and maximum of the variables for the full sample, are summarized in Table 3. We observe that the returns exhibited negative skewness for the lower-bound returns and positive skewness for the upper-bound returns. In addition, the skewness values on the two sides were asymmetric, indicating that gains and losses in the Thai stock market behaved quite differently. For illustration, the relationship between the stock market and each stock is shown in Figure 4.
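The interval-return transformation of Equations (35) and (36) can be sketched as follows; the price series and function names are illustrative.

```python
import numpy as np

def interval_returns(p_low, p_high, p_avg):
    """Interval-valued returns of Equation (35): the day's low and high
    prices relative to the previous day's average price give the lower
    and upper return bounds."""
    p_low = np.asarray(p_low, float)
    p_high = np.asarray(p_high, float)
    p_avg = np.asarray(p_avg, float)
    prev_avg = p_avg[:-1]                     # PA_{t-1}
    r_l = (p_low[1:] - prev_avg) / prev_avg   # lower bound of the return
    r_u = (p_high[1:] - prev_avg) / prev_avg  # upper bound of the return
    return r_l, r_u
```

The same transformation applied to the SET50 low, high, and average prices yields the market interval return of Equation (36).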
Table 3. Data description. Columns: SET_u, SET_l, PTT_u, PTT_l, SCC_u, SCC_l, CPALL_u, CPALL_l.

Mean: 0.012, −0.011, 0.019, −0.018, 0.018, −0.017, 0.021, −0.019
Median: 0.010, −0.009, 0.016, −0.015, 0.016, −0.015, 0.017, −0.016
Maximum: 0.097, 0.025, 0.136, 0.033, 0.113, 0.027, 0.185, 0.032
Minimum: −0.019, −0.106, −0.027, −0.128, −0.016, −0.105, −0.058, −0.172
Std. Dev.: 0.010, 0.010, 0.016, 0.015, 0.014, 0.013, 0.017, 0.016
Skewness: 1.894, −1.812, 1.576, −1.564, 1.653, −1.475, 2.038, −2.280
Kurtosis: 12.247, 11.383, 7.981, 8.775, 8.337, 7.522, 12.824, 15.743
Jarque–Bera: 7659.845, 6398.846, 2665.961, 3308.472, 3022.738, 2235.950, 8677.507, 14,052.600
MBF Jarque–Bera: 0.000 for all series
Observations: 1841 for all series
Unit root test: −2.230, −3.454, −2.125, −2.125, −2.661, −2.385, −2.307, −3.255
MBF-unit root: 0.083, 0.003, 0.105, 0.105, 0.029, 0.058, 0.070, 0.005

Note: _u and _l denote the upper and lower bounds, respectively.
(a) PTT; (b) SCC; (c) CPALL
Figure 4. Interval-valued data plots for Stock Exchange of Thailand (SET) and stocks. The red dot presents the midpointdata plot for SET and stocks.
Moreover, a unit root test was conducted to examine whether each time series was nonstationary, i.e., possessed a unit root. In this study, we used the minimum Bayes factor (MBF) as the testing tool. The MBF has a significant advantage over the p-value because the likelihood of the observed data can be expressed under each hypothesis [33]. MBF values in the ranges 1 to 1/3, 1/3 to 1/10, 1/10 to 1/30, 1/30 to 1/100, 1/100 to 1/300, and below 1/300 indicate, respectively, weak, moderate, substantial, strong, very strong, and decisive evidence against the null hypothesis. According to the results, all data series were decisively stationary, as shown by the low MBF values [33].
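The MBF values in Table 3 are consistent with Goodman's minimum Bayes factor for a z-type statistic, MBF = exp(−z²/2). This formula is assumed here, since the excerpt cites [33] without printing it; it reproduces the "MBF-unit root" row of Table 3 from the unit root statistics.

```python
import math

def minimum_bayes_factor(z):
    """Minimum Bayes factor for a z-type test statistic:
    MBF = exp(-z^2 / 2). (Goodman's formula; assumed here, as the
    paper cites [33] without printing it.)"""
    return math.exp(-z * z / 2.0)

# Unit root statistics reported in Table 3:
stats = [-2.230, -3.454, -2.125, -2.125, -2.661, -2.385, -2.307, -3.255]
mbfs = [round(minimum_bayes_factor(z), 3) for z in stats]
```

Rounded to three decimals, these values match the "MBF-unit root" row of Table 3.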
5.2. Comparison Results
This section presents the results of the artificial neural network models for interval-valued data from Thailand's stock market. In this empirical example, we considered one to three hidden neurons combined with the four activation functions; again, one input layer, one output layer, and one hidden layer were assumed. Thus, twelve ANN specifications were used to describe and forecast the excess returns of PTT, SCC, and CPALL under the CAPM framework. To evaluate the forecasting performance of the twelve specifications, each dataset was split into 80% for training and 20% for testing, so both in-sample and out-of-sample forecasts were conducted in this comparison. The performance of the interval-valued ANN-CC forecasting models under CAPM was evaluated through MAE, MSE, and RMSE.
Table 4 shows the in-sample and out-of-sample estimation results for the different ANN-CC specifications. Focusing on the MAE, MSE, and RMSE, the interval-valued prediction results were somewhat sensitive to the activation function. Comparing the activation functions, we found that the exponential activation enabled more accurate forecasts in most cases, as its MAE, MSE, and RMSE were lower than those of the other activations. Comparing the numbers of hidden nodes, we observed that a larger number of hidden nodes lowered the MAE, MSE, and RMSE in some cases. Finally, we considered the weights of the interval-valued data calculated by the convex combination method. Most of the weight parameters for the interval excess stock return and the interval excess market return were not equal to 0.5. Therefore, the center-method assumption in the neural network forecasting of Maia et al. [25] may be inappropriate for practical problems. This result confirms the reliability of the convex combination method in ANN-CC forecasting.
Table 4. Experimental results of ANN-CC. Each row reports the weight parameters w1 and w2, followed by MAE, MSE, and RMSE, each as out-of-sample / in-sample.

One hidden node

PTT
tanh: w1 = 0.648521, w2 = 0.999000 | MAE 0.014094 / 0.013565 | MSE 0.000315 / 0.000298 | RMSE 0.017751 / 0.017265
sigmoid: w1 = 0.608577, w2 = 0.962942 | MAE 0.011501 / 0.011931 | MSE 0.000249 / 0.000237 | RMSE 0.015792 / 0.015362
linear: w1 = 0.611678, w2 = 0.999000 | MAE 0.013707 / 0.013696 | MSE 0.000303 / 0.000301 | RMSE 0.017401 / 0.017378
exp: w1 = 0.607746, w2 = 0.999000 | MAE 0.010572 / 0.010023 | MSE 0.000221 / 0.000209 | RMSE 0.014862 / 0.014465

SCC
tanh: w1 = 0.510112, w2 = 0.927919 | MAE 0.011382 / 0.011377 | MSE 0.000218 / 0.000211 | RMSE 0.014768 / 0.014533
sigmoid: w1 = 0.513121, w2 = 0.910014 | MAE 0.010506 / 0.011060 | MSE 0.000190 / 0.000160 | RMSE 0.013781 / 0.012658
linear: w1 = 0.519982, w2 = 0.920239 | MAE 0.011793 / 0.011423 | MSE 0.000235 / 0.000232 | RMSE 0.015343 / 0.015238
exp: w1 = 0.509987, w2 = 0.930019 | MAE 0.010734 / 0.011065 | MSE 0.000193 / 0.000168 | RMSE 0.013897 / 0.012970

CPALL
tanh: w1 = 0.412312, w2 = 0.481917 | MAE 0.013848 / 0.013835 | MSE 0.000391 / 0.000384 | RMSE 0.019765 / 0.019588
sigmoid: w1 = 0.461660, w2 = 0.500013 | MAE 0.012627 / 0.012231 | MSE 0.000235 / 0.000191 | RMSE 0.015343 / 0.013832
linear: w1 = 0.386464, w2 = 0.342022 | MAE 0.013619 / 0.012975 | MSE 0.000353 / 0.000277 | RMSE 0.018756 / 0.016650
exp: w1 = 0.461668, w2 = 0.500101 | MAE 0.013608 / 0.012891 | MSE 0.000342 / 0.000269 | RMSE 0.018489 / 0.016413

Two hidden nodes

PTT
tanh: w1 = 0.398420, w2 = 0.398420 | MAE 0.014168 / 0.013570 | MSE 0.000328 / 0.000294 | RMSE 0.018121 / 0.017151
sigmoid: w1 = 0.409278, w2 = 0.409278 | MAE 0.010203 / 0.009748 | MSE 0.000167 / 0.000157 | RMSE 0.012934 / 0.012535
linear: w1 = 0.421373, w2 = 0.421373 | MAE 0.013431 / 0.013755 | MSE 0.000315 / 0.000298 | RMSE 0.017763 / 0.017269
exp: w1 = 0.389728, w2 = 0.389728 | MAE 0.009030 / 0.009085 | MSE 0.000151 / 0.000145 | RMSE 0.012275 / 0.012040

SCC
tanh: w1 = 0.358503, w2 = 0.358503 | MAE 0.012146 / 0.011344 | MSE 0.000242 / 0.000201 | RMSE 0.0155543 / 0.014181
sigmoid: w1 = 0.387659, w2 = 0.387659 | MAE 0.010360 / 0.010467 | MSE 0.000162 / 0.000168 | RMSE 0.012736 / 0.012977
linear: w1 = 0.359730, w2 = 0.359730 | MAE 0.010858 / 0.010981 | MSE 0.000172 / 0.000190 | RMSE 0.013121 / 0.013779
exp: w1 = 0.351820, w2 = 0.351820 | MAE 0.010011 / 0.010006 | MSE 0.000156 / 0.000155 | RMSE 0.012494 / 0.012469

CPALL
tanh: w1 = 0.423709, w2 = 0.423709 | MAE 0.014073 / 0.013789 | MSE 0.000318 / 0.000319 | RMSE 0.017836 / 0.017856
sigmoid: w1 = 0.376741, w2 = 0.376741 | MAE 0.012700 / 0.012163 | MSE 0.000255 / 0.000262 | RMSE 0.015960 / 0.016191
linear: w1 = 0.431735, w2 = 0.431735 | MAE 0.015190 / 0.013474 | MSE 0.000393 / 0.000301 | RMSE 0.019825 / 0.017356
exp: w1 = 0.427827, w2 = 0.427827 | MAE 0.011830 / 0.011123 | MSE 0.000212 / 0.000201 | RMSE 0.014567 / 0.014183

Three hidden nodes

PTT
tanh: w1 = 0.394010, w2 = 0.394010 | MAE 0.012486 / 0.011650 | MSE 0.000279 / 0.000215 | RMSE 0.016751 / 0.014672
sigmoid: w1 = 0.387755, w2 = 0.387755 | MAE 0.010510 / 0.009682 | MSE 0.000192 / 0.000151 | RMSE 0.013866 / 0.012291
linear: w1 = 0.425890, w2 = 0.425890 | MAE 0.012102 / 0.011565 | MSE 0.000258 / 0.000222 | RMSE 0.016080 / 0.014912
exp: w1 = 0.383586, w2 = 0.383586 | MAE 0.010312 / 0.009675 | MSE 0.000180 / 0.000150 | RMSE 0.013425 / 0.012256

SCC
tanh: w1 = 0.363167, w2 = 0.363167 | MAE 0.011112 / 0.011669 | MSE 0.000183 / 0.000215 | RMSE 0.013529 / 0.014668
sigmoid: w1 = 0.337693, w2 = 0.337693 | MAE 0.010553 / 0.010319 | MSE 0.000175 / 0.000161 | RMSE 0.013230 / 0.012692
linear: w1 = 0.364510, w2 = 0.364510 | MAE 0.011384 / 0.011578 | MSE 0.000187 / 0.000214 | RMSE 0.013677 / 0.014633
exp: w1 = 0.367622, w2 = 0.367622 | MAE 0.010887 / 0.010341 | MSE 0.000179 / 0.000164 | RMSE 0.013381 / 0.012815

CPALL
tanh: w1 = 0.399262, w2 = 0.399262 | MAE 0.014182 / 0.013737 | MSE 0.000397 / 0.000306 | RMSE 0.019931 / 0.017499
sigmoid: w1 = 0.410937, w2 = 0.410937 | MAE 0.012159 / 0.012331 | MSE 0.000261 / 0.000261 | RMSE 0.016160 / 0.016158
linear: w1 = 0.415327, w2 = 0.415327 | MAE 0.013745 / 0.012910 | MSE 0.000319 / 0.000282 | RMSE 0.017865 / 0.016780
exp: w1 = 0.423794, w2 = 0.423794 | MAE 0.011926 / 0.012321 | MSE 0.000231 / 0.000252 | RMSE 0.015185 / 0.015873

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in each case.
Furthermore, we compared the performance of the ANN-CC models with the traditional ANN models: ANN-Center (Table 5) and RANN-LU (Table 6). Tables 5 and 6 report the MAE, MSE, and RMSE of the ANN-Center and RANN-LU models, respectively. Four activation functions and different numbers of hidden neurons were again compared, and the exponential activation function again provided higher performance in most cases. Furthermore, the prediction error decreased with a larger number of hidden neurons, as the MAE, MSE, and RMSE of the ANN-Center and RANN-LU models were lower when the number of hidden neurons increased.
Table 5. Experimental results of ANN-Center.

One Hidden Node
PTT       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.019527  0.019337  0.000626  0.000593  0.025023  0.024332
sigmoid   0.500000  0.500000  0.019831  0.019362  0.000639  0.000596  0.025286  0.024424
linear    0.500000  0.500000  0.019064  0.019355  0.000597  0.000583  0.024441  0.02415
exp       0.500000  0.500000  0.012656  0.011672  0.000261  0.000225  0.016154  0.015011

SCC       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.017691  0.017901  0.000479  0.000496  0.021866  0.022275
sigmoid   0.500000  0.500000  0.017738  0.017932  0.000506  0.000499  0.022495  0.022342
linear    0.500000  0.500000  0.017561  0.017892  0.000465  0.000486  0.021569  0.022032
exp       0.500000  0.500000  0.017469  0.017703  0.000455  0.000472  0.021335  0.021737

CPALL     w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.021078  0.020439  0.000801  0.000631  0.028314  0.025126
sigmoid   0.500000  0.500000  0.021049  0.020295  0.000797  0.000626  0.028236  0.025027
linear    0.500000  0.500000  0.021850  0.020635  0.000857  0.000658  0.029283  0.025666
exp       0.500000  0.500000  0.022129  0.020771  0.000891  0.000733  0.029861  0.027078

Two Hidden Nodes
PTT       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.018938  0.019209  0.000558  0.000556  0.023622  0.023830
sigmoid   0.500000  0.500000  0.019426  0.019363  0.000608  0.000611  0.024658  0.024490
linear    0.500000  0.500000  0.018735  0.018200  0.000548  0.000541  0.023434  0.023269
exp       0.500000  0.500000  0.019611  0.019322  0.000629  0.000595  0.025092  0.024391

SCC       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.017883  0.017851  0.000472  0.000487  0.021731  0.022073
sigmoid   0.500000  0.500000  0.018067  0.017798  0.000517  0.000476  0.022742  0.021825
linear    0.500000  0.500000  0.018658  0.017653  0.000546  0.000461  0.023375  0.021481
exp       0.500000  0.500000  0.013203  0.012860  0.000291  0.000281  0.017057  0.016770

CPALL     w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.020415  0.020792  0.000690  0.000681  0.026271  0.026090
sigmoid   0.500000  0.500000  0.021457  0.020544  0.000752  0.000665  0.027426  0.025781
linear    0.500000  0.500000  0.020851  0.020698  0.000644  0.000692  0.025382  0.026314
exp       0.500000  0.500000  0.015538  0.014341  0.000472  0.000386  0.021739  0.019653

Three Hidden Nodes
PTT       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.019384  0.019376  0.000581  0.000607  0.024115  0.024637
sigmoid   0.500000  0.500000  0.019355  0.019373  0.000608  0.000600  0.024667  0.024495
linear    0.500000  0.500000  0.019215  0.019423  0.000603  0.000602  0.024549  0.024536
exp       0.500000  0.500000  0.009504  0.009139  0.000178  0.000165  0.013344  0.012845

SCC       w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.017898  0.017843  0.000493  0.000482  0.022213  0.021961
sigmoid   0.500000  0.500000  0.017337  0.017988  0.000458  0.000491  0.021412  0.022166
linear    0.500000  0.500000  0.017841  0.017864  0.000489  0.000483  0.022121  0.021968
exp       0.500000  0.500000  0.012653  0.012585  0.000273  0.000267  0.016530  0.016353
Appl. Sci. 2021, 11, 3997 18 of 25
Table 5. Cont.

CPALL     w1        w2        MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.500000  0.500000  0.014358  0.014825  0.000393  0.000414  0.019813  0.020359
sigmoid   0.500000  0.500000  0.020873  0.020682  0.000718  0.000674  0.026790  0.025967
linear    0.500000  0.500000  0.021531  0.020515  0.000791  0.000655  0.028131  0.025598
exp       0.500000  0.500000  0.014286  0.014245  0.000413  0.000385  0.020323  0.019629

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in each case.
Table 6. Experimental results of regularized ANN-LU (RANN-LU).

One Hidden Neuron
PTT       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020572  0.020551  0.000699  0.000633  0.026443  0.025164
sigmoid   0.019717  0.019512  0.000641  0.000612  0.025322  0.024745
linear    0.011574  0.014203  0.000348  0.000239  0.018662  0.015467
exp       0.011498  0.014183  0.000347  0.000204  0.018634  0.014291

SCC       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.017615  0.017566  0.000461  0.000463  0.021466  0.021522
sigmoid   0.017725  0.017619  0.000529  0.000470  0.023012  0.021684
linear    0.018800  0.017891  0.000541  0.000487  0.023267  0.022070
exp       0.018008  0.017821  0.000519  0.000480  0.022791  0.021915

CPALL     MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020698  0.020667  0.000679  0.000673  0.026053  0.025945
sigmoid   0.020784  0.020711  0.000735  0.000675  0.027123  0.025978
linear    0.020946  0.020754  0.000785  0.000730  0.028028  0.027032
exp       0.018078  0.016966  0.000571  0.000516  0.023883  0.022729

Two Hidden Neurons
PTT       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020031  0.019463  0.000657  0.000611  0.025640  0.024736
sigmoid   0.009529  0.009196  0.000186  0.000165  0.013632  0.012854
linear    0.019771  0.019383  0.000608  0.000610  0.024667  0.024689
exp       0.009449  0.009187  0.000184  0.000160  0.013573  0.012657

SCC       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.018178  0.017976  0.000517  0.000506  0.022743  0.022488
sigmoid   0.013537  0.013251  0.000314  0.000294  0.017732  0.017155
linear    0.013337  0.013334  0.000295  0.000300  0.017182  0.017329
exp       0.013058  0.013048  0.000285  0.000281  0.016898  0.016773

CPALL     MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.014284  0.014469  0.000359  0.000406  0.018953  0.020149
sigmoid   0.014353  0.014793  0.000393  0.000415  0.019832  0.020377
linear    0.020375  0.020873  0.000663  0.000698  0.025759  0.026432
exp       0.020621  0.020821  0.000675  0.000696  0.025989  0.026378

Three Hidden Neurons
PTT       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.019891  0.019421  0.000646  0.000611  0.025424  0.024728
sigmoid   0.019574  0.019510  0.000602  0.000621  0.024545  0.024933
linear    0.009158  0.009339  0.000173  0.000171  0.013169  0.013071
exp       0.009113  0.009108  0.000162  0.000156  0.012731  0.012481

SCC       MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.017995  0.018026  0.000514  0.000507  0.022674  0.022523
sigmoid   0.014593  0.013787  0.000355  0.000315  0.018857  0.017751
linear    0.018369  0.017916  0.000524  0.000503  0.022878  0.022455
exp       0.013318  0.013200  0.000296  0.000295  0.017223  0.017189

CPALL     MAE out   MAE in    MSE out   MSE in    RMSE out  RMSE in
tanh      0.020252  0.020950  0.000637  0.000713  0.025246  0.026711
sigmoid   0.020111  0.020974  0.000604  0.000719  0.024565  0.026829
linear    0.020553  0.020850  0.000651  0.000705  0.025521  0.026563
exp       0.014032  0.014000  0.000364  0.000393  0.019083  0.019832

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in each case.
For clarity, the best prediction models for the three stocks are summarized in Table 7. The exponential activation function was selected in nearly all cases (except for ANN-Center in the PTT case), indicating a nonlinear pattern in these three stocks. The performance measures of the best ANN-CC specification were lower than those of ANN-Center and RANN-LU for all stock indices, meaning that the ANN-CC model was superior to the conventional models in both in-sample and out-of-sample forecast performance. Additionally, in this section, we also compared the performance of our ANN-CC against methods proposed in the literature, namely the center method of Billard and Diday [7], the center-range method of Neto and de Carvalho [8], the PM method of Souza et al. [9], and the IT method of Buansing et al. [12]. The results clearly confirmed the higher prediction performance of our proposed ANN-CC.
A robustness check was also conducted to confirm the performance of our proposed model. In practice, simple loss functions such as MAE, MSE, and RMSE may not yield sufficient information to identify a single forecasting model as "best". Therefore, in this study, another accuracy measure, the model confidence set (MCS), was used to evaluate forecasting performance. Hansen et al. [34] introduced the MCS test to validate the forecasting performance of competing models. Our MCS tests were based on three loss functions: MAE, MSE, and RMSE. Note that a higher MCS p-value means that a model is less likely to be eliminated from the set of models with equal predictive ability; in other words, the greater the p-value, the better the model. For more details of the MCS test, we refer to Hansen et al. [34].
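The elimination logic of the MCS test can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a non-studentized range statistic and an i.i.d. bootstrap, whereas Hansen et al. [34] studentize the loss differentials and use a block bootstrap for dependent data:

```python
import numpy as np

rng = np.random.default_rng(0)

def mcs_pvalues(losses, n_boot=1000):
    """Simplified model confidence set sketch.

    losses: (T, m) array of per-period losses for m models.
    Returns {model index: MCS p-value}; models with a p-value below
    the chosen threshold (e.g., 0.10) are eliminated from the set."""
    T, m = losses.shape
    models = list(range(m))
    pvals, p_prev = {}, 0.0
    while len(models) > 1:
        sub = losses[:, models]
        means = sub.mean(axis=0)
        stat = np.abs(means[:, None] - means[None, :]).max()  # range statistic
        # Bootstrap the null distribution of the range of mean differentials.
        dev = sub - means            # center losses under the null
        boot = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, T, T)
            bm = dev[idx].mean(axis=0)
            boot[b] = np.abs(bm[:, None] - bm[None, :]).max()
        p = (boot >= stat).mean()
        p_prev = max(p_prev, p)      # MCS p-values are non-decreasing
        worst = models[int(np.argmax(means))]  # drop the largest mean loss
        pvals[worst] = p_prev
        models.remove(worst)
    pvals[models[0]] = 1.0           # the last survivor gets p-value 1
    return pvals
```

Applied to the per-period MAE, MSE, or RMSE losses of the competing models, the surviving model receives a p-value of one, matching the pattern reported for ANN-CC in Tables 7 and 8.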
The statistical performance results for all competing models are reported in brackets in Table 7. According to the MCS test results, the ANN-CC model clearly outperformed the other competing models in both in-sample and out-of-sample forecasts. The p-values of the ANN-CC model were equal to one, while those of the other models fell below the 0.10 threshold. This means that the ANN-Center, RANN-LU, and linear regression forecasting models were eliminated during the MCS procedure, leaving ANN-CC as the only surviving model.
Table 7. Summary of the forecasting performance and model confidence set (MCS) test of the first dataset.

PTT           Activation  Hidden Neurons  MAE out            MAE in             MSE out            MSE in             RMSE out           RMSE in
ANN-CC        exp         2               0.009030 (1.0000)  0.009085 (1.0000)  0.000151 (1.0000)  0.000145 (1.0000)  0.012275 (1.0000)  0.012040 (1.0000)
ANN-Center    linear      2               0.018735 (0.0000)  0.018200 (0.0000)  0.000548 (0.0000)  0.000541 (0.0000)  0.023434 (0.0000)  0.023269 (0.0000)
RANN-LU       exp         3               0.009113 (0.2012)  0.009108 (0.1015)  0.000162 (0.2450)  0.000156 (0.0010)  0.012731 (0.3232)  0.012481 (0.0000)
IT                                        0.017932 (0.0000)  0.017817 (0.0000)  0.000513 (0.0000)  0.000501 (0.0000)  0.022652 (0.0000)  0.022383 (0.0000)
PM                                        0.018212 (0.0000)  0.018326 (0.0000)  0.000523 (0.0000)  0.000561 (0.0000)  0.022873 (0.0000)  0.0236861 (0.0000)
Center                                    0.021453 (0.0000)  0.020541 (0.0000)  0.000751 (0.0000)  0.000664 (0.0000)  0.027410 (0.0000)  0.025771 (0.0000)
Center-range                              0.019877 (0.0000)  0.01532 (0.0000)   0.000717 (0.0000)  0.000602 (0.0000)  0.026771 (0.0000)  0.024538 (0.0000)

SCC           Activation  Hidden Neurons  MAE out            MAE in             MSE out            MSE in             RMSE out           RMSE in
ANN-CC        exp         2               0.010011 (1.0000)  0.010006 (1.0000)  0.000156 (1.0000)  0.000155 (1.0000)  0.012494 (1.0000)  0.012469 (1.0000)
ANN-Center    exp         3               0.012653 (0.0000)  0.012585 (0.0000)  0.000273 (0.0000)  0.000267 (0.0000)  0.016530 (0.0000)  0.016353 (0.0000)
RANN-LU       exp         2               0.013058 (0.0000)  0.013048 (0.0000)  0.000285 (0.0000)  0.000281 (0.0000)  0.016898 (0.0000)  0.016773 (0.0000)
IT                                        0.012455 (0.0000)  0.012101 (0.0000)  0.000250 (0.0000)  0.000236 (0.0000)  0.015813 (0.0000)  0.015366 (0.0000)
PM                                        0.013013 (0.0000)  0.012992 (0.0000)  0.000271 (0.0000)  0.000263 (0.0000)  0.016464 (0.0000)  0.016221 (0.0000)
Center                                    0.014044 (0.0000)  0.014021 (0.0000)  0.000285 (0.0000)  0.000281 (0.0000)  0.016877 (0.0000)  0.016765 (0.0000)
Center-range                              0.013221 (0.0000)  0.013019 (0.0000)  0.000277 (0.0000)  0.000275 (0.0000)  0.016642 (0.0000)  0.016581 (0.0000)

CPALL         Activation  Hidden Neurons  MAE out            MAE in             MSE out            MSE in             RMSE out           RMSE in
ANN-CC        exp         2               0.011830 (1.0000)  0.011123 (1.0000)  0.000212 (1.0000)  0.000201 (1.0000)  0.014567 (1.0000)  0.014183 (1.0000)
ANN-Center    exp         3               0.014286 (0.0000)  0.014245 (0.0000)  0.000413 (0.0000)  0.000385 (0.0000)  0.020323 (0.0000)  0.019629 (0.0000)
RANN-LU       exp         3               0.014032 (0.0000)  0.014000 (0.0000)  0.000396 (0.0000)  0.000393 (0.0000)  0.019083 (0.0000)  0.019832 (0.0000)
IT                                        0.013251 (0.0000)  0.013656 (0.0000)  0.000372 (0.0000)  0.000369 (0.0000)  0.019291 (0.0000)  0.019205 (0.0000)
PM                                        0.015130 (0.0000)  0.014561 (0.0000)  0.000431 (0.0000)  0.000426 (0.0000)  0.020763 (0.0000)  0.02071 (0.0000)
Center                                    0.016125 (0.0000)  0.015365 (0.0000)  0.00510 (0.0000)   0.000439 (0.0000)  0.071423 (0.0000)  0.020961 (0.0000)
Center-range                              0.019877 (0.0000)  0.01532 (0.0000)   0.000717 (0.0000)  0.000602 (0.0000)  0.026762 (0.0000)  0.024524 (0.0000)

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in all cases. The numbers in brackets are p-values of the MCS test obtained from 1000 bootstrap replications.
5.3. Hong Kong Air Quality Monitoring Dataset
In the second dataset, we considered the Hong Kong air quality monitoring data as another example. This dataset was suggested in Yang et al. [27] and can be retrieved from http://www.epd.gov.hk (accessed on 20 February 2021), which provides hourly air quality data from 16 monitoring stations in Hong Kong. In this study, we considered the data from the Central/Western station and downloaded the hourly records from 1 January 2020 to 31 December 2020. We then aggregated the hourly data into minimum and maximum form according to each day's record. There were seven air quality indicators in the database; of these, we selected respirable suspended particulates (RSP) as the interval-valued response variable and nitrogen dioxide (NO2) and sulfur dioxide (SO2) as the interval-valued explanatory variables.
Again, the dataset was split into 80% for training and 20% for testing, so both in-sample and out-of-sample forecasts were conducted in this comparison. The performance of the interval-valued data forecasting models was evaluated through MAE, MSE, and RMSE. As shown in Table 8, similar to the first example, our proposed ANN-CC model was the best in terms of MAE, MSE, RMSE, and the MCS p-value.
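The daily interval construction and the chronological 80/20 split described above can be sketched as follows. The column name and the random values are illustrative only and do not reproduce the EPD file layout:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for one indicator (e.g., RSP).
idx = pd.date_range("2020-01-01", periods=96, freq="h")  # 4 days of hourly data
hourly = pd.DataFrame(
    {"rsp": np.random.default_rng(0).uniform(20, 80, len(idx))}, index=idx
)

# Aggregate each day's hourly records into a [min, max] interval.
daily = hourly["rsp"].resample("D").agg(["min", "max"])
daily.columns = ["rsp_lower", "rsp_upper"]

# Chronological 80/20 split into training and testing sets.
cut = int(len(daily) * 0.8)
train, test = daily.iloc[:cut], daily.iloc[cut:]
```

The same aggregation is applied to each explanatory variable (NO2, SO2) so that every series enters the model as a lower/upper-bound pair.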
Table 8. Summary of the forecasting performance and MCS test of the second dataset.

RSP           Activation  Hidden Neurons  MAE out          MAE in           MSE out          MSE in           RMSE out         RMSE in
ANN-CC        sigmoid     2               2.6374 (1.0000)  2.7092 (1.0000)  7.3232 (1.0000)  7.4085 (1.0000)  2.7078 (1.0000)  2.7225 (1.0000)
ANN-Center    sigmoid     2               2.9833 (0.0000)  2.8363 (0.0000)  8.1092 (0.0000)  8.3233 (0.0000)  2.8483 (0.0000)  2.8854 (0.0000)
RANN-LU       sigmoid     3               2.7762 (0.0000)  2.7423 (0.0000)  7.6403 (0.0000)  7.5409 (0.0510)  2.7656 (0.0000)  2.74613 (0.0000)
IT                                        2.8532 (0.0000)  2.7821 (0.0000)  8.5110 (0.0000)  8.3368 (0.0000)  2.9161 (0.0000)  2.8866 (0.0000)
PM                                        2.7011 (0.0000)  2.6893 (0.0000)  7.5215 (0.0000)  7.4212 (0.0000)  2.7427 (0.0000)  2.7240 (0.0000)
Center                                    3.0029 (0.0000)  2.9861 (0.0000)  8.7334 (0.0000)  8.7433 (0.0000)  2.9543 (0.0000)  2.9571 (0.0000)
Center-range                              2.8763 (0.0000)  2.7823 (0.0000)  8.5532 (0.0000)  8.3499 (0.0000)  2.9239 (0.0000)  2.8883 (0.0000)

Note: "in" and "out" denote in-sample and out-of-sample forecasts, respectively. Bold numbers indicate the lowest MAE, MSE, and RMSE in all cases. The numbers in brackets are p-values of the MCS test obtained from 1000 bootstrap replications.
To illustrate the performance of the ANN-CC model, we show its out-of-sample forecasting results for the Thai stocks and RSP in Figure 5. For clarity, only 10% of the out-of-sample forecasts are shown. In this figure, each red vertical line segment represents a predicted interval-valued observation, while each gray vertical line segment represents the corresponding actual interval-valued observation; the extremes of each segment correspond to the minimum and maximum interval values. The comparison between actual and predicted values indicates the quality of the modeling and prediction task. The predicted values were very close to the actual values, indicating the goodness of fit of our model.
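A plot in the style of Figure 5 can be produced by drawing each interval as a vertical line segment. This is an illustrative sketch with made-up values, not the authors' plotting code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

def plot_interval_forecast(actual_lo, actual_hi, pred_lo, pred_hi, title=""):
    """Draw actual intervals as thick gray vertical segments and
    predicted intervals as thinner red segments, as in Figure 5."""
    t = np.arange(len(actual_lo))
    fig, ax = plt.subplots()
    ax.vlines(t, actual_lo, actual_hi, color="gray", lw=4,
              label="Actual interval-valued data")
    ax.vlines(t, pred_lo, pred_hi, color="red", lw=1.5,
              label="Predicted interval-valued data")
    ax.set_xlabel("Out-of-sample forecast")
    ax.set_ylabel("Interval")
    ax.set_title(title)
    ax.legend()
    return fig, ax

# Demo with illustrative values only.
fig, ax = plot_interval_forecast([0.0, 1.0, 2.0], [1.0, 2.0, 3.0],
                                 [0.1, 1.1, 2.1], [0.9, 1.9, 2.9],
                                 title="Illustrative interval forecast")
```

One panel is drawn per series (PTT, SCC, CPALL, RSP) by calling the function with that series' actual and predicted bounds.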
[Figure 5: four panels of out-of-sample interval forecasts, (a) PTT, (b) SCC, (c) CPALL, (d) RSP. In each panel, the horizontal axis is the out-of-sample forecast index and the vertical axis is the interval; red segments show the predicted interval-valued data and gray segments the actual interval-valued data.]

Figure 5. The out-of-sample interval forecasts by ANN-CC (red) vs. the actual data (gray) for example data.
5.4. Discussion
Although traditional models such as ANN-Center and RANN-LU provide acceptable prediction results for all stocks and RSP, they still face some limitations. The ANN-Center relies on the midpoint of the data, which may not reflect the real behavior of the data during the day. Likewise, the RANN-LU predicts future observations based on the lower and upper bounds separately, clearly leaving out the information within the interval-valued data. Our proposed ANN-CC model addresses this challenge by considering the whole information within the interval. From the above experimental results, we can draw the following conclusions:
(1) Regarding prediction performance, our ANN-CC was superior to the other traditional models in all datasets.
(2) The weight within the interval should not be fixed at the symmetric value w = 0.5. We found that the prediction result was sensitive to the weight w; thus, the weight should be estimated rather than treated as a fixed parameter.
(3) Our model outperformed the ANN-Center and RANN-LU models in situations in which the interval series exhibited either linear or nonlinear behavior.
(4) We also studied the sensitivity of the results to the activation function and found that the quality of the prediction model was not very sensitive to this choice in many cases. However, careful assessment is still needed when choosing the activation function.
(5) Even though the exponential activation function seemed to be the best fit in the ANN architecture, other activation functions performed well in some cases. Although the exponential activation function performed very well for the three selected stocks, it may not be reliable for other stocks or under other ANN structures.
(6) Overall, we can draw the important conclusion that our ANN-CC is a promising model for interval-valued data forecasting. The ANN-CC method has the advantage of neither constraining the weight nor fixing the reference points. The model is adaptive and adjusts itself for the best fit. The fitted model allows the behavior of the response's lower and upper bounds to be analyzed based on the variation of the reference points of the input and output intervals.
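The convex-combination reference point behind points (2) and (6) can be illustrated as follows. The placement of w on the lower bound is an assumption for illustration, and in the actual ANN-CC the weights are estimated jointly with the network parameters rather than fixed:

```python
import numpy as np

def convex_reference(lower, upper, w):
    """Convex-combination reference series: w*lower + (1-w)*upper.
    w = 0.5 recovers the midpoint used by ANN-Center; any w in [0, 1]
    yields a valid reference point inside the interval."""
    w = np.clip(w, 0.0, 1.0)  # keep the combination convex
    return w * np.asarray(lower) + (1.0 - w) * np.asarray(upper)

lo, hi = np.array([1.0, 2.0]), np.array([3.0, 6.0])
mid = convex_reference(lo, hi, 0.5)    # midpoint reference
skew = convex_reference(lo, hi, 0.25)  # reference closer to the upper bound
```

Because the reference series changes smoothly with w, the weight can be optimized together with the network weights instead of being fixed at the symmetric value 0.5.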
6. Conclusions
This paper proposed an artificial neural network with convex combination (ANN-CC) method for interval-valued data prediction. Simulation and experimental results on real data showed that the proposed ANN-CC model is a useful tool for interval-valued prediction tasks, especially for complicated nonlinear datasets. Moreover, the proposed ANN-CC model fills a research gap by modeling interval-valued data through the convex combination method. Our proposed model was examined by comparing its performance with the conventional ANN with the center method (ANN-Center), the regularized ANN-LU (RANN-LU), and linear regression with the center, center-range, PM, and IT methods. We considered three stock returns in the Thai stock market and the Hong Kong air quality monitoring dataset in our empirical comparison. According to the in-sample and out-of-sample forecasts, the performances of the various ANN-CC specifications were not much different. However, we observed that the tanh activation function performed well in both in-sample and out-of-sample forecasts, while the linear activation function performed relatively well in the in-sample forecast. In addition, we confirmed the higher performance of our ANN-CC compared to that of ANN-Center and RANN-LU. Experimental results on two real datasets also confirmed that the proposed ANN-CC model outperformed the conventional models.
In this study, a neural network with one hidden layer was assumed. However, real-world data are quite complex, and one hidden layer may not be enough to learn the data. Thus, a deep neural network (with two or more hidden layers) should be more promising in approximation performance. Another meaningful direction for future work is to employ other deep learning methods, such as recurrent neural networks and long short-term memory networks. These models handle incoming data in time order and learn from previous time steps to predict future values. In addition to deep learning methods, the fuzzy inference system (FIS) modeling approach for interval-valued time series forecasting [1] is also suggested for further study. Finally, our proposed model can be applied to forecasting in other areas, such as environmental and medical sciences.
Author Contributions: Conceptualization, R.P. and W.Y.; methodology, W.Y.; software, W.Y.; validation, R.P., P.M. and W.Y.; formal analysis, R.P.; investigation, W.Y.; resources, P.M.; data curation, P.M.; writing—original draft preparation, W.Y.; writing—review and editing, W.Y.; visualization, R.P.; supervision, W.Y. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: In this study, we used simulated data to show the performance of our model, and the simulation processes are explained in the paper. For the real data analysis section, the data can be freely collected from Thomson Reuters DataStream. The data are also available from the corresponding author upon request ([email protected]).

Acknowledgments: The authors would like to thank Laxmi Worachai, Vladik Kreinovich, and Hung T. Nguyen for their helpful comments on this paper. The authors are also grateful for the financial support offered by the Center of Excellence in Econometrics, Chiang Mai University, Thailand.

Conflicts of Interest: The authors declare no conflict of interest.
References
1. Maciel, L.; Ballini, R. A fuzzy inference system modeling approach for interval-valued symbolic data forecasting. Knowl. Based Syst. 2019, 164, 139–149.
2. Ma, X.; Dong, Y. An estimating combination method for interval forecasting of electrical load time series. Expert Syst. Appl. 2020, 158, 113498.
3. Chou, J.S.; Truong, D.N.; Le, T.L. Interval forecasting of financial time series by accelerated particle swarm-optimized multi-output machine learning system. IEEE Access 2020, 8, 14798–14808.
4. Lauro, C.N.; Palumbo, F. Principal component analysis of interval data: A symbolic data analysis approach. Comput. Stat. 2000, 15, 73–87.
5. Branzei, R.; Branzei, O.; Gök, S.Z.A.; Tijs, S. Cooperative interval games: A survey. Cent. Eur. J. Oper. Res. 2010, 18, 397–411.
6. Kiekintveld, C.; Islam, T.; Kreinovich, V. Security games with interval uncertainty. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, St. Paul, MN, USA, 6–10 May 2013; pp. 231–238.
7. Billard, L.; Diday, E. Regression analysis for interval-valued data. In Data Analysis, Classification, and Related Methods; Springer: Berlin/Heidelberg, Germany, 2000; pp. 369–374.
8. Neto, E.D.A.L.; de Carvalho, F.D.A. Centre and range method for fitting a linear regression model to symbolic interval data. Comput. Stat. Data Anal. 2008, 52, 1500–1515.
9. Souza, L.C.; Souza, R.M.; Amaral, G.J.; Silva Filho, T.M. A parametrized approach for linear regression of interval data. Knowl. Based Syst. 2017, 131, 149–159.
10. Chanaim, S.; Sriboonchitta, S.; Rungruang, C. A Convex Combination Method for Linear Regression with Interval Data. In Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2016; Huynh, V.N., Inuiguchi, M., Le, B., Le, B., Denoeux, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9978, pp. 469–480.
11. Phadkantha, R.; Yamaka, W.; Tansuchat, R. Analysis of Risk, Rate of Return and Dependency of REITs in Asia with Capital Asset Pricing Model. In Predictive Econometrics and Big Data. TES 2018; Kreinovich, V., Sriboonchitta, S., Chakpitak, N., Eds.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2018; Volume 753, pp. 536–548.
12. Buansing, T.T.; Golan, A.; Ullah, A. An information-theoretic approach for forecasting interval-valued SP500 daily returns. Int. J. Forecast. 2020, 36, 800–813.
13. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175.
14. Mondal, P.; Shit, L.; Goswami, S. Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices. Int. J. Comput. Sci. Eng. Appl. 2014, 4, 13.
15. McMillan, D.G. Nonlinear predictability of stock market returns: Evidence from nonparametric and threshold models. Int. Rev. Econ. Financ. 2001, 10, 353–368.
16. Nyberg, H. Predicting bear and bull stock markets with dynamic binary time series models. J. Bank. Financ. 2013, 37, 3351–3363.
17. Pastpipatkul, P.; Maneejuk, P.; Sriboonchitta, S. Markov switching regression with interval data: Application to financial risk via CAPM. Adv. Sci. Lett. 2017, 23, 10794–10798.
18. Phochanachan, P.; Pastpipatkul, P.; Yamaka, W.; Sriboonchitta, S. Threshold regression for modeling symbolic interval data. Int. J. Appl. Bus. Econ. Res. 2017, 15, 195–207.
19. Hiransha, M.; Gopalakrishnan, E.A.; Menon, V.K.; Soman, K.P. NSE stock market prediction using deep-learning models. Procedia Comput. Sci. 2018, 132, 1351–1362.
20. Haykin, S.; Principe, J. Making sense of a complex world [chaotic events modeling]. IEEE Signal Process. Mag. 1998, 15, 66–81.
21. Zhang, G.; Patuwo, B.E.; Hu, M.Y. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast. 1998, 14, 35–62.
22. Leung, D.; Abbenante, G.; Fairlie, D.P. Protease inhibitors: Current status and future prospects. J. Med. Chem. 2000, 43, 305–341.
23. Cao, Q.; Leggio, K.B.; Schniederjans, M.J. A comparison between Fama and French's model and artificial neural networks in predicting the Chinese stock market. Comput. Oper. Res. 2005, 32, 2499–2512.
24. San Roque, A.M.; Maté, C.; Arroyo, J.; Sarabia, Á. iMLP: Applying multi-layer perceptrons to interval-valued data. Neural Process. Lett. 2007, 25, 157–169.
25. Maia, A.L.S.; de Carvalho, F.D.A.; Ludermir, T.B. Forecasting models for interval-valued time series. Neurocomputing 2008, 71, 3344–3352.
26. Maia, A.L.S.; de Carvalho, F.D.A. Holt's exponential smoothing and neural network models for forecasting interval-valued time series. Int. J. Forecast. 2011, 27, 740–759.
27. Yang, Z.; Lin, D.K.; Zhang, A. Interval-valued data prediction via regularized artificial neural network. Neurocomputing 2019, 331, 336–345.
28. Mir, M.; Nasirzadeh, F.; Kabir, H.D.; Khosravi, A. Neural network-based interval forecasting of construction material prices. J. Build. Eng. 2021, 39, 102288.
29. Moore, R.E. Methods and Applications of Interval Analysis; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1979.
30. Humphrey, G.B.; Maier, H.R.; Wu, W.; Mount, N.J.; Dandy, G.C.; Abrahart, R.J.; Dawson, C.W. Improved validation framework and R-package for artificial neural network models. Environ. Model. Softw. 2017, 92, 82–106.
31. Sharpe, W.F. Capital asset prices: A theory of market equilibrium under conditions of risk. J. Financ. 1964, 19, 425–442.
32. Lintner, J. Security prices, risk, and maximal gains from diversification. J. Financ. 1965, 20, 587–615.
33. Maneejuk, P.; Yamaka, W. Significance test for linear regression: How to test without P-values? J. Appl. Stat. 2021, 48, 827–845.
34. Hansen, P.R.; Lunde, A.; Nason, J.M. The model confidence set. Econometrica 2011, 79, 453–497.