www.elsevier.com/locate/simpat
Simulation Modelling Practice and Theory 11 (2003) 211–222
Methods to improve prediction performance of ANN models
Chungen Yin a,*, Lasse Rosendahl a, Zhongyang Luo b
a Institute of Energy Technology, Aalborg University, Pon 101, DK-9220 Aalborg East, Denmark
b Clean Energy and Environment Engineering Key Lab of MOE, Zhejiang University,
Hangzhou 310027, China
Received 30 May 2002; received in revised form 18 February 2003; accepted 4 March 2003
Abstract
The artificial neural network (ANN) is a powerful tool that has been applied successfully in numerous
fields. However, there are still two limitations on its use. One is over-training, which occurs when
the capacity of the ANN for training is too great because the network is too large or is allowed too
many training iterations. The other is that ANNs are not effective for extrapolation, which is
sometimes very important because the existing data used to train an ANN do not necessarily cover the
entire range. These two limitations seriously degrade the prediction performance of ANN models. In
this paper, two practices are introduced to alleviate or overcome their negative effects.
Demonstrations based on these practices indicate that they are general and useful, and that they can
greatly improve the prediction performance of the resulting ANN models, making them really suitable
for engineering applications.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Artificial neural networks; Prediction performance; Over-training; Coal blending
1. Introduction
The artificial neural network (ANN) is a powerful tool and has been applied success-
fully in numerous fields; see, for example, the recent applications in [1–4]. Besides these
numerous applications, many researchers have made efforts to improve the learning effi-
ciency of ANN algorithms through various enhancements and to speed up the learning
* Corresponding author. Tel.: +45-9635-9248; fax: +45-9815-1411.
E-mail address: chy@iet.auc.dk (C. Yin).
1569-190X/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S1569-190X(03)00044-3
algorithms, which is very important, especially for on-line training. Some recent
achievements in this respect can be found, for example, in [5–7].
Compared with the numerous successful applications and the achievements in im-
proving the learning efficiency of ANNs, how to improve the prediction performance of
the resulting ANN models is less often reported, although it is in fact also an important issue.
An ANN model may be able to remember all the samples that participate in the training
process with very high accuracy, but this does not necessarily mean that the model will
also have good prediction performance for other samples that do not participate in the training.
As pointed out by Pleune and Chopra [8], there are two limitations on the use of
ANN models, which seriously degrade their prediction performance. One is over-training,
which occurs when the capacity of the ANN for training is too great, because the network
is too large or is allowed too many training iterations. The other is that ANN models are
not effective for extrapolation: the benefit of ANNs is lost when they are required to
extrapolate beyond the available experimental data. The extrapolation problem is sometimes
very important, because the existing data used to train an ANN do not necessarily cover the
entire range of the modeled object. In fact, these two limitations are common to any kind
of regression technique, not specific to ANNs. However, no solutions for the two
limitations were suggested in [8]: the optimal number of iterations for training the
ANN was set to 3000 for the data in their study, and it was concluded that interpolation
is much more meaningful for ANNs, while extrapolation beyond the range of the training
database cannot be made.
As a solution to the problems summarized in [8], this paper focuses on methods to
alleviate or overcome the above two limitations. To achieve an ANN model with good
prediction performance, some useful practices are introduced. First, artificial samples,
covering the entire range as much as possible, are drawn based on the existing knowl-
edge about the modeled problem, and are then used to initialize the ANN, so that most
future predictions will be interpolations. This initialization also stores an approximate
relation in the ANN and improves the training efficiency. Secondly, instead of pre-imposing
a high precision (for example 10^-6) or a large iteration number on the training process,
as in the traditional way of ANN training, the error function for the samples that do not
participate in the training, and are used only for testing the prediction performance of the
model, is also monitored during the training process to avoid over-training. Two examples
in the field of coal chemistry are given to demonstrate the effectiveness of these practices:
the resulting ANN models have good prediction performance and are in every-day use in a
large-scale coal-blends production center in China to co-manage the coal library and
co-direct coal blends production [9].
2. Artificial neural networks
An ANN is an information-processing system, which consists of a number of in-
terconnected processing elements, commonly referred to as neurons. The neurons are
logically arranged into two or more layers, and interact with each other via weighted
connections. Each neuron is connected to all the neurons in the next layer. There is
an input layer where data are presented to the ANN, and an output layer that holds
the response of the network to the input. It is the intermediate layers, also known as
hidden layers, which enable these networks to represent and compute complicated
associations between patterns. A typical 3-layer ANN is shown in Fig. 1, where
w^l_ij is the weight of the link between neuron u^l_j in layer l and neuron u^(l+1)_i in layer
l+1, and bias^l_i is the bias of neuron u^l_i in layer l. In this architecture, the input (0th),
1st and output layers have n_0 (= 2), n_1 (= 2) and n_o (= 1) neurons, respectively.

For a training sample (X_p, D_p), X_p is the input pattern and can be expressed as an
n_0-dimension vector (x_1p, x_2p, ..., x_n0p), and D_p is the corresponding expected
solution and can be expressed as an n_o-dimension vector (d_1p, d_2p, ..., d_nop).
The output of a neuron u^0_i in the input layer is simply its input x_ip. For the other
layers, the net input to a neuron is computed as the sum of all inputs to it. The net input
net^(l+1)_pi to a neuron u^(l+1)_i for the input pattern X_p is given by

    net^(l+1)_pi = sum_{j=1}^{n_l} w^l_ij * out^l_j + bias^(l+1)_i,    (1)

where out^l_j is the output of neuron u^l_j in layer l; n_l is the number of neurons in layer l;
and bias^(l+1)_i is the threshold value of neuron u^(l+1)_i in layer l+1. The net input to a
neuron is passed through an activation function, here the most commonly used
sigmoidal function, to produce an output

    out^l_pi = f(net^l_pi) = 1 / (1 + exp(-net^l_pi)).    (2)
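As an illustration, the forward pass of Eqs. (1) and (2) can be sketched in a few lines of Python (a minimal sketch: the 2-2-1 layout follows Fig. 1, but the weights and input values below are arbitrary examples, not parameters from this paper):

```python
import numpy as np

def sigmoid(net):
    # Eq. (2): out = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, weights, biases):
    """Propagate one input pattern through the layers.

    weights[l] has shape (n_{l+1}, n_l) and biases[l] has shape (n_{l+1},),
    matching the w^l_ij and bias^{l+1}_i of Eq. (1).
    """
    out = np.asarray(x, dtype=float)   # the input layer outputs its input
    for W, b in zip(weights, biases):
        net = W @ out + b              # Eq. (1): weighted sum plus bias
        out = sigmoid(net)             # Eq. (2): sigmoidal activation
    return out

# Example: the 2-2-1 network of Fig. 1 with arbitrary random weights
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 2)), rng.standard_normal((1, 2))]
biases = [rng.standard_normal(2), rng.standard_normal(1)]
y = forward([0.3, 0.7], weights, biases)
```

Because the sigmoid maps any net input into (0, 1), the output always lies strictly between 0 and 1, which is why target values are normally scaled into this range before training.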
The error back-propagation (BP) scheme is one of the most popular learning algorithms
for training a multi-layer ANN; it is basically a gradient descent algorithm designed to
minimize the error function in the weight space. It is also the learning algorithm
used in this study.
In the batch BP algorithm, each training sample is presented once and its weight
correction is calculated, but the weights are not actually adjusted; the calculated
corrections for each weight are added together over all the samples, and the weights
are then updated only once using the cumulative correction. The update rule is given by
Fig. 1. A 3-layer ANN.
    w^l_ij[s+1] = w^l_ij[s] + eta * G^l_ij[s],    (3)

where eta is the step size, or learning rate coefficient, and the index s labels the
iteration number in the learning process. G^l_ij denotes the partial derivative of the
energy function E (i.e., the error function) with respect to the weight w^l_ij, which can
be written as

    G^l_ij[s] = - dE / dw^l_ij[s],    (4)

where the energy function E can be expressed as follows:

    E = (1/2) * sum_{p=1}^{P} sum_{i=1}^{n_o} (d_pi - out^o_pi)^2,    (5)

in which P is the size of the training set, i.e., the number of samples used for
training.
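The batch update of Eqs. (3)–(5) can be sketched as follows for the 3-layer case (a sketch only: the network sizes, learning rate and toy data are illustrative assumptions, not the settings used in this paper):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def batch_bp_step(X, D, W1, b1, W2, b2, eta):
    """One batch BP iteration for a 3-layer sigmoid ANN.

    Corrections are accumulated over all P samples and the weights are
    updated once with the cumulative correction, as in Eqs. (3)-(5).
    Returns the energy E of Eq. (5) before the update.
    """
    H = sigmoid(X @ W1.T + b1)               # hidden-layer outputs
    Y = sigmoid(H @ W2.T + b2)               # output-layer outputs

    err = D - Y                              # (d_pi - out_pi)
    delta2 = err * Y * (1.0 - Y)             # output-layer delta (f' = f(1-f))
    delta1 = (delta2 @ W2) * H * (1.0 - H)   # hidden-layer delta

    # cumulative corrections G = -dE/dw, Eq. (4), applied once, Eq. (3)
    W2 += eta * delta2.T @ H
    b2 += eta * delta2.sum(axis=0)
    W1 += eta * delta1.T @ X
    b1 += eta * delta1.sum(axis=0)
    return 0.5 * np.sum(err ** 2)            # energy function E, Eq. (5)

# Toy usage: a 2-2-1 network trained on an arbitrary batch of 8 samples
rng = np.random.default_rng(1)
X = rng.random((8, 2))
D = (X.sum(axis=1, keepdims=True) > 1.0).astype(float)
W1 = rng.standard_normal((2, 2)); b1 = np.zeros(2)
W2 = rng.standard_normal((1, 2)); b2 = np.zeros(1)
E0 = batch_bp_step(X, D, W1, b1, W2, b2, eta=0.1)
for _ in range(200):
    E = batch_bp_step(X, D, W1, b1, W2, b2, eta=0.1)
```

Note that the weights are adjusted only after the corrections for the whole batch have been summed; this is what distinguishes the batch variant from per-sample (on-line) BP.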
3. Practice to alleviate the extrapolation limitation of ANNs
As an example, an ANN modeling the relationship between the ultimate analysis data
of any coal and its proximate analysis data is established, to demonstrate how to
alleviate the extrapolation limitation and the effect of this practice.
Coal is one of the most important energy resources all over the world. A coal is
primarily described by its proximate analysis and its ultimate analysis. The proximate
analysis provides information about the moisture content of the coal, the amount
of volatile matter, the fixed carbon and the ash content, and can be easily obtained
by means of thermogravimetric analysis. In contrast, the ultimate analysis, which
gives the elemental composition of the coal (carbon, hydrogen, oxygen, nitrogen
and a negligible amount of sulfur), is relatively difficult to conduct and thus is not
always available to users. It has been proven that an intrinsic, coal-type-dependent
relationship exists between the ultimate analysis of a coal and its proximate analysis
[10,11]. Conventional techniques, such as empirical formulae, are often used to ap-
proximate this relationship. For example, Eqs. (6)–(9) show the most commonly used
empirical model, which is defined on a dry-ash-free basis for lignite [11]:
    Cdaf = 96.04 - 0.48 Vdaf,      (6)
    Hdaf = 1.205 + 0.0835 Vdaf,    (7)
    Odaf = 0.21 + 0.39 Vdaf,       (8)
    Ndaf = 1.36 + 0.01 Vdaf.       (9)
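Eqs. (6)–(9) are simple enough to encode directly; the following sketch evaluates them for a given volatile matter content (the coefficients are those of the equations above, with all quantities in per cent on a dry-ash-free basis):

```python
def lignite_ultimate_from_vdaf(v_daf):
    """Estimate the ultimate analysis of a lignite (dry-ash-free basis, %)
    from its volatile matter content Vdaf (%), using Eqs. (6)-(9)."""
    return {
        "Cdaf": 96.04 - 0.48 * v_daf,    # Eq. (6)
        "Hdaf": 1.205 + 0.0835 * v_daf,  # Eq. (7)
        "Odaf": 0.21 + 0.39 * v_daf,     # Eq. (8)
        "Ndaf": 1.36 + 0.01 * v_daf,     # Eq. (9)
    }

# Example: a lignite at the classification limit Vdaf = 40%
# gives Cdaf = 76.84, Hdaf = 4.545, Odaf = 15.81, Ndaf = 1.76
result = lignite_ultimate_from_vdaf(40.0)
```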
To establish the ANN models, 693 different coal samples, covering almost all the
typical Chinese coals, are collected and classified into several different groups ac-
cording to China's coal-type classification standard. Here, lignite, whose major
classification indexes are a volatile fraction Vdaf > 40% and a calorific value
Qnet,ar > 11.70 MJ/kg, is used as a demonstration. There are in total 110 lignite
samples, which are divided into two sets: a training set and a testing set. The samples
in the training set are used to train the ANN, while the samples in the testing set do
not participate in the training and are used only for testing the prediction performance
of the resulting ANN model, i.e., whether the model is really suitable for practical use.
When preparing the training set, firstly, the coal samples with the maximum or min-
imum value of any index (including Aad, Vad or Wad in the proximate analysis, Cad,
Had, Oad or Nad in the ultimate analysis, or the calorific value) are picked out from
the 110 experimental samples and used to train the ANN. The remaining experimen-
tal samples are then grouped randomly into the two sets: one for ANN training
and the other for model testing. Secondly, some artificial samples, covering the entire
range of lignite as much as possible, are drawn from the existing empirical formulae
shown in Eqs. (6)–(9). For example, artificial samples with the indexes Vdaf = 40% and
Qnet,ar = 11.70 MJ/kg, i.e., the lowest limit of this kind of coal, are drawn from the
formulae. The artificial samples are used to pre-train the ANN, i.e., to initialize it.
Although these artificial samples are not necessarily accurate, the ANN pre-trained
with them will be able to represent a rough relationship between the ultimate analysis
and the proximate analysis of lignite, and will serve as a good initialization for the
experimental data. This practice largely reduces the possibility of predicting new
lignite by means of extrapolation, since the artificial samples, theoretically covering
the entire range of lignite, are used to initialize the ANN, and the experimental
samples with the maximum or minimum value of any index all participate in the
training process.

With the same structure as the ANN in Fig. 1, a 3-layer ANN is employed to predict
the ultimate analysis data of any lignite from its known proximate analysis data:
three neurons in the input layer denote the proximate analysis (Aad, Vad and Wad,
respectively); four neurons in the output layer represent the ultimate analysis (Cad,
Had, Oad and Nad, respectively); and the seven neurons in the hidden layer are
determined by numerical experiments, as described, for example, in Goh [12].
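The sample-preparation practice described above can be sketched as follows (a sketch under stated assumptions: the record layout, the index names and the split fraction are illustrative, not the exact procedure used for the 110 lignite samples):

```python
import random

def split_with_extremes(samples, index_names, test_fraction=0.35):
    """Force every sample holding the maximum or minimum of any index into
    the training set, then split the remaining samples at random."""
    extremes = set()
    for name in index_names:
        values = [s[name] for s in samples]
        extremes.add(values.index(max(values)))
        extremes.add(values.index(min(values)))
    forced_train = [s for i, s in enumerate(samples) if i in extremes]
    rest = [s for i, s in enumerate(samples) if i not in extremes]
    random.shuffle(rest)
    n_test = int(len(rest) * test_fraction)
    return forced_train + rest[n_test:], rest[:n_test]

def artificial_samples(v_daf_values):
    """Draw artificial pre-training samples from the empirical model of
    Eqs. (6)-(9) over the lignite range (Vdaf >= 40%)."""
    return [{"Vdaf": v,
             "Cdaf": 96.04 - 0.48 * v,
             "Hdaf": 1.205 + 0.0835 * v,
             "Odaf": 0.21 + 0.39 * v,
             "Ndaf": 1.36 + 0.01 * v} for v in v_daf_values]
```

The artificial samples are used only to pre-train (initialize) the network; the experimental training set, with the extreme samples forced in, is then used for the actual training.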
The ultimate analysis data (Cad, Had, Oad and Nad) predicted by the resulting ANN
model and by the empirical model in Eqs. (6)–(9) are plotted in Fig. 2 against the
measured values on the same basis (ad); the line y = x is shown only to guide the eye.
As shown in Appendix A, the parameters of the ANN after 4000 training cycles are
accepted and used to predict the ultimate analysis of any lignite from its proximate
analysis data. The details of how to terminate the training of an ANN are shown
and explained in the next section. To ease the comparison between the ANN and
the formulae, the experimental samples are also divided into training samples and
testing samples in the predictions of the formulae, in the same way as for the ANN.

From Fig. 2, one can see clearly that the resulting ANN model achieves better
prediction performance than the empirical model, which was developed on a database
similar to that of the ANN and is currently widely used to estimate the ultimate
analysis data of Chinese lignite.

Fig. 2. Predicted ultimate analysis data with different models against the measurements:
(a) predicted Cad vs. measured Cad; (b) predicted Had vs. measured Had; (c) predicted
Oad vs. measured Oad; (d) predicted Nad vs. measured Nad.

One reason why the resulting ANN model shows a much better prediction performance
than the traditional empirical model, and is in every-day use, is the practice used in
the preparation of the training samples, which reduces the possibility of extrapolation
as much as possible. Another is the practice to fix the over-training problem, as
explained below.
4. Practice to overcome the over-training limitation of ANNs
To ease the illustration and make the comparison clear, the prediction of coal ash
fusion temperature from the ash chemical composition is used to demonstrate this
practice, with only one output in the ANN: the ash fusion temperature.
Besides the ultimate and proximate analysis data of a coal, its ash fusion temperature
is also an important input for the design and operation of coal-fired boilers. Coal
with a low ash fusion temperature promotes deposits that accumulate around the
heat transfer pipes and lead to the corrosion of furnace components. Coal ash is an
extremely complex mixture of mineral matter, mainly including SiO2, Fe2O3, CaO,
MgO, K2O, Na2O, Al2O3, SO3 and TiO2, together with a few other oxides. These
chemical components determine the ash fusion temperature, but in a quite complicated
and not yet precisely known manner: the mineral matters interact with each other at
high temperature and have an important effect on the fusion temperature. So, although
some regression analyses have been conducted and some formulae derived to estimate
ash fusion temperatures from the chemical components, they are still far from capable
of giving satisfactory predictions, as shown in Yin et al. [9].
Here, an ANN is used to alleviate this problem. Data for 160 typical coal ash sam-
ples are collected for use in training and testing the ANN; one half is used for
training the ANN and the other half for testing the generalization of the resulting
ANN model. Again, a 3-layer ANN with the structure shown in Fig. 1 is employed,
with one input layer, one hidden layer and one output layer. There are seven
neurons in the input layer, denoting the contents of SiO2, Al2O3, Fe2O3, CaO, MgO,
TiO2 and K2O + Na2O in the coal ash, respectively; one neuron in the output layer,
representing the corresponding ash softening temperature; and eight neurons in the
hidden layer, determined also by numerical experiments.

Fig. 3. Over-fitting of the measurement data.
In most existing applications of ANNs, a very high training precision, for example
10^-6, is set a priori, or a very large number of training cycles is preset, to terminate
the training course. If all the training patterns are clean, i.e., contain no errors, this
practice is appropriate. Nevertheless, in actual engineering systems the samples
available for training are usually erroneous, so too high a precision will over-fit the
training samples and degrade the prediction performance of the resulting ANN model,
as shown schematically in Fig. 3.
In principle, an ANN can always remember all the training samples after a sufficiently
large number of training cycles, so the energy function for the training samples
normally decreases with the progress of training. However, the energy function for the
testing samples usually decreases only in the beginning stage and then increases as
training proceeds, especially for engineering problems. The energy functions for the
training set, the testing set and all the samples in this case are shown in Fig. 4, in
which this tendency is also observed. So, in this study a useful practice is introduced
to terminate the training course: the energy functions for both the testing set and all
the samples are monitored during the training course, as well as that for the training
set. The training should be terminated when both the energy function for the testing
set and that for all the samples have decreased to a relatively low value, and all the
parameters of the ANN at that point should be accepted for future predictions. Here,
the training is terminated after 600 iterations according to Fig. 4, and the ANN, whose
parameters are shown in Appendix A, is accepted for future prediction.

Fig. 4. Evolution of the energy functions along the iterations.

Fig. 5. Predicted results by the ANNs terminated after different training cycles.
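The termination practice can be sketched as a training loop that monitors the energy functions and keeps the parameters from the iteration where the monitored energies are lowest (a minimal sketch: the `step` and `energy` callbacks and the toy one-parameter model below are illustrative assumptions, not the paper's implementation):

```python
import copy

def train_with_monitoring(params, step, energy, train_set, test_set, max_iters):
    """Train while monitoring the energy function (Eq. (5)) on the testing
    set and on all samples; return the best-seen parameters and the
    iteration at which they occurred.

    step(params, data) performs one training iteration in place;
    energy(params, data) evaluates the energy function on data.
    """
    best_e, best_params, best_iter = float("inf"), copy.deepcopy(params), 0
    for it in range(1, max_iters + 1):
        step(params, train_set)
        e_test = energy(params, test_set)
        e_all = energy(params, train_set + test_set)
        # accept the parameters only while both monitored energies are low
        if e_test + e_all < best_e:
            best_e, best_params, best_iter = e_test + e_all, copy.deepcopy(params), it
    return best_params, best_iter

# Toy usage: fit y ~ a*x by gradient descent on E = 0.5 * sum (y - a*x)^2
train_data = [(1.0, 2.1), (2.0, 3.9)]
test_data = [(3.0, 6.2)]
def energy(p, data):
    return 0.5 * sum((y - p[0] * x) ** 2 for x, y in data)
def step(p, data):
    p[0] -= 0.05 * sum(-(y - p[0] * x) * x for x, y in data)
params, stop_iter = train_with_monitoring([0.0], step, energy,
                                          train_data, test_data, 100)
```

In the paper's setting, `step` would be one batch BP iteration and `energy` the function of Eq. (5); training stops, and the stored parameters are accepted, once further iterations no longer reduce the testing and overall energies.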
Fig. 5 shows the ash fusion temperatures predicted by the ANNs trained with 600
iterations and with 150,000 iterations, respectively. It can be seen that the latter cannot
predict the testing patterns satisfactorily, although it recalls the training patterns
very well. This ANN is, of course, not suitable for practical applications. It resembles
the over-fitting curve shown in Fig. 3, which reproduces very well the measurement
data used for training but deviates considerably from the reasonable model behind the
data, and thus cannot be extended to practical use.
5. Conclusions
To alleviate or overcome the two major limitations on the use of ANNs, over-
training and poor extrapolation ability, which seriously degrade the prediction per-
formance of the resulting ANN models, two useful practices are introduced and
demonstrated in this paper. The practices are general and can be extended to any
similar application of ANNs; in fact, they are also valid for any other regression
technique. The demonstrated ANN models based on the two practices in the
Table 1
Parameters of the ANN to predict ultimate analysis of any lignite from its proximate analysis^a

Weights, w^l_ij

Input 1   Input 2   Input 3    Hidden neuron   Output 1   Output 2   Output 3   Output 4
 3.448    -4.820     2.583     Hidden 1         -4.553     -1.590     -3.752     -1.936
 2.975     7.926    -4.977     Hidden 2         -2.133     -0.174     -1.238     -2.663
-1.887    -8.351     0.269     Hidden 3         -5.553     -1.889     -1.555      3.692
-3.879    -7.741    -1.330     Hidden 4          5.748      1.345     -1.688     -6.966
36.032   -12.285    -0.393     Hidden 5          1.991      8.670     -6.543    -14.102
 2.782    -0.624     6.690     Hidden 6         -2.805     -1.110      4.362     -5.203
33.152   -12.963     3.117     Hidden 7         -2.447     -9.435      5.348     14.025

Threshold values of different neurons, bias^l_i

Hidden neurons (7):
No. 1     No. 2     No. 3     No. 4     No. 5     No. 6     No. 7
-3.470    -5.111     4.818     4.715     4.704    -0.725     5.305

Output neurons (4):
Output 1  Output 2  Output 3  Output 4
 4.099     2.489    -0.846     3.501

^a Terminated after 4000 training cycles. 110 experimental samples in total: 72 for training; 38 for testing.
Table 2
Parameters of the ANN for ash fusion temperature prediction^a

Weights, w^l_ij

Input 1   Input 2   Input 3   Input 4   Input 5   Input 6   Input 7    Hidden neuron   Output 1
-0.8005   -0.4531    0.7683    1.2116    1.2944   -0.8723   -0.5339    Hidden 1          1.1124
-0.5924   -0.5462    0.6592    1.0458    0.7930   -0.4359   -0.3945    Hidden 2          0.7517
-0.6120   -0.7525    0.6905    0.8584    0.3296   -0.2114   -0.2177    Hidden 3          0.5046
-0.4226   -0.7626    0.7826    0.6263   -0.0068    0.0972   -0.0789    Hidden 4          0.2521
-0.3314   -0.7775    1.1558    0.6783   -0.0836    0.3460    0.0917    Hidden 5          0.0112
-0.6834   -1.0007    1.7747    1.4242    0.2602    0.2659    0.3075    Hidden 6         -0.3711
-1.8958   -1.9084    2.4055    2.7201    0.8786   -0.4589    0.2971    Hidden 7         -0.9806
-2.8246   -4.2516    3.5228    4.4852    1.6333   -1.5458   -0.7226    Hidden 8         -2.0724

Threshold values of different neurons, bias^l_i

Hidden 1  Hidden 2  Hidden 3  Hidden 4  Hidden 5  Hidden 6  Hidden 7  Hidden 8  Output 1
-0.4506   -0.3101   -0.2927   -0.0849    0.5400    1.5407    2.0709    3.0808    1.3661

^a Terminated after 600 training iterations. 160 experimental samples in total: 80 for training and 80 for testing.
field of coal chemistry, whose parameters are shown in Appendix A, have proven to
have good prediction performance and are in every-day use in a large-scale coal-blends
production center in China to co-manage the coal library and co-direct coal blends
production.
Acknowledgements
The first author gratefully thanks Academician Prof. Kefa Cen of the Clean Energy
and Environment Engineering Key Lab of the Ministry of Education, Zhejiang Univer-
sity, for his discussions during the project "Catalytic clean coal combustion and
development of a novel coal blending technology for power stations based on
non-linear programming".
Appendix A. Parameters of the resulting ANNs
This appendix gives the detailed parameters of the resulting ANNs for the two ex-
amples in this paper, for further verification, application to similar problems, or
other purposes (Tables 1 and 2).
References
[1] B. Ayrulu, B. Barshan, Neural networks for improved target differentiation and localization with
sonar, Neural Networks 14 (2001) 355–373.
[2] M. Liang, M.J. Palakal, Airborne sonar target recognition using artificial neural network,
Mathematical and Computer Modelling 35 (2002) 429–440.
[3] N. Tosun, L. Ozler, A study of tool life in hot machining using artificial neural networks and
regression analysis method, Journal of Materials Processing Technology 124 (2002) 99–104.
[4] K.A. Nagaty, On learning to estimate the block directional image of a fingerprint using a hierarchical
neural network, Neural Networks 16 (2003) 133–144.
[5] S.Y. Jeong, S.Y. Lee, Adaptive learning algorithms to incorporate additional functional constraints
into neural networks, Neurocomputing 35 (2000) 73–90.
[6] H.M. Lee, C.M. Chen, T.C. Huang, Learning efficiency improvement of back-propagation algorithm
by error saturation prevention method, Neurocomputing 41 (2001) 125–143.
[7] N. Ampazis, S.J. Perantonis, J.G. Taylor, A dynamical model for the analysis and acceleration of
learning in feedforward networks, Neural Networks 14 (2001) 1075–1088.
[8] T. Pleune, O. Chopra, Using artificial neural networks to predict the fatigue life of carbon and low-
alloy steels, Nuclear Engineering and Design 197 (2000) 1–12.
[9] C. Yin, Z. Luo, J. Zhou, K. Cen, A novel non-linear programming-based coal blending technology
for power plants, Chemical Engineering Research and Design 78 (2000) 118–124.
[10] J. Bai, F. Liu, Coal quality analysis, Coal Industry Press, Beijing, 1982 (in Chinese).
[11] Beijing Research Institute of Coal Chemistry, Handbook of coal chemistry examination, Coal
Industry Press, Beijing, 1983 (in Chinese).
[12] A. Goh, Back-propagation neural networks for modeling complex systems, Artificial Intelligence in
Engineering 9 (1995) 143–151.