Methods to improve prediction performance of ANN models



Simulation Modelling Practice and Theory 11 (2003) 211–222

Methods to improve prediction performance of ANN models

Chungen Yin a,*, Lasse Rosendahl a, Zhongyang Luo b

a Institute of Energy Technology, Aalborg University, Pontoppidanstraede 101, DK-9220 Aalborg East, Denmark
b Clean Energy and Environment Engineering Key Lab of MOE, Zhejiang University, Hangzhou 310027, China

* Corresponding author. Tel.: +45-9635-9248; fax: +45-9815-1411. E-mail address: chy@iet.auc.dk (C. Yin).

1569-190X/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S1569-190X(03)00044-3

Received 30 May 2002; received in revised form 18 February 2003; accepted 4 March 2003

Abstract

The artificial neural network (ANN) is a powerful tool that has been applied successfully in numerous fields, but two limitations still restrict its use. One is over-training, which occurs when the capacity of the ANN for training is too great because it is allowed too many training iterations. The other is that ANNs are not effective for extrapolation, which sometimes matters greatly because the existing data used to train an ANN do not necessarily cover the entire range of the modeled problem. These two limitations seriously degrade the prediction performance of ANN models. In this paper, two practices are introduced to alleviate or overcome their negative effects. Demonstrations based on these practices indicate that they are general and useful, and that they can greatly improve the prediction performance of the resulting ANN models, making them genuinely suitable for engineering applications.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Artificial neural networks; Prediction performance; Over-training; Coal blending

1. Introduction

Artificial neural network (ANN) is a powerful tool and has been applied successfully in numerous fields; see, for example, the recent applications in [1–4]. Besides these applications, many researchers are working to improve the learning efficiency of ANN algorithms through various enhancements and to speed up the learning algorithms, which is especially important for on-line training. Some recent achievements in this respect can be found, for example, in [5–7].

Compared with the numerous successful applications and the achievements in improving the learning efficiency of ANNs, how to improve the prediction performance of the resulting ANN models is less often reported, although it is also an important issue. An ANN model may be able to remember, with very high accuracy, all the samples that participate in the training process, but this does not necessarily mean the model will also predict well for samples that did not participate in the training.

As pointed out by Pleune and Chopra [8], there are two limitations on the use of ANN models, which seriously degrade their prediction performance. One is over-training. Over-training occurs when the capacity of the ANN for training is too great, because the network is too large or is allowed too many training iterations. The other is that ANN models are not effective for extrapolation. The benefit of ANNs is lost when they are needed to extrapolate beyond the available experimental data. The extrapolation problem is sometimes very important, because the existing data used to train an ANN do not necessarily cover the entire range of the modeled object. In fact, these two limitations are common to any kind of regression technique, not special to ANNs. However, no solutions for the two limitations were suggested in [8]: the optimal number of training iterations was simply set to 3000 for the data in that study, and it was concluded that interpolation is much more meaningful for ANNs, while extrapolation beyond the range of the training database cannot be made.

As a solution to the problems summarized in [8], this paper focuses on methods to alleviate or overcome the above two limitations. To achieve an ANN model with good prediction performance, some useful practices are introduced. First, artificial samples covering the entire range as much as possible are drawn from the existing knowledge about the modeled problem and then used to initialize the ANN, so that most future predictions will be interpolations. This initialization also stores an approximate relation in the ANN and improves the training efficiency. Secondly, instead of pre-imposing a high precision (for example, $10^{-6}$) or a large iteration number on the training process, as in the traditional way of ANN training, the error function for the samples that do not participate in the training and are used only for testing the prediction performance of the model is also monitored during training, to avoid over-training. Two examples in the field of coal chemistry are given to demonstrate the effectiveness of these practices: the resulting ANN models have good prediction performance and are in every-day use in a large-scale coal-blends production center in China to co-manage the coal library and co-direct coal-blends production [9].

2. Artificial neural networks

An ANN is an information-processing system consisting of a number of interconnected processing elements, commonly referred to as neurons. The neurons are logically arranged into two or more layers and interact with each other via weighted connections. Each neuron is connected to all the neurons in the next layer. There is an input layer, where data are presented to the ANN, and an output layer, which holds the response of the network to the input. It is the intermediate layers, also known as hidden layers, that enable these networks to represent and compute complicated associations between patterns.

A typical 3-layer ANN is shown in Fig. 1, where $w^l_{ij}$ is the weight of the link between neuron $u^l_j$ in layer $l$ and neuron $u^{l+1}_i$ in layer $l+1$, and $bias^l_i$ is the bias of neuron $u^l_i$ in layer $l$. In this architecture, the input (0th), 1st and output layers have $n_0$ (= 2), $n_1$ (= 2) and $n_o$ (= 1) neurons, respectively. For a training sample $(X_p, D_p)$, $X_p$ is the input pattern, expressed as an $n_0$-dimensional vector $(x_{1p}, x_{2p}, \ldots, x_{n_0 p})$, and $D_p$ is the corresponding expected solution, expressed as an $n_o$-dimensional vector $(d_{1p}, d_{2p}, \ldots, d_{n_o p})$. The output of a neuron $u^0_i$ in the input layer is simply its input $x_{ip}$. For the other layers, the net input to a neuron is computed as the sum of all inputs to it. The net input $net^{l+1}_{pi}$ to a neuron $u^{l+1}_i$ for the input pattern $X_p$ is given by

$$net^{l+1}_{pi} = \sum_{j=1}^{n_l} w^l_{ij}\, out^l_{pj} + bias^{l+1}_i, \qquad (1)$$

where $out^l_{pj}$ is the output of neuron $u^l_j$ in layer $l$, $n_l$ is the number of neurons in layer $l$, and $bias^{l+1}_i$ is the threshold value of neuron $u^{l+1}_i$ in layer $l+1$. The net input to a neuron is passed through an activation function, here the most commonly used sigmoidal function, to produce the output

$$out^l_{pi} = f(net^l_{pi}) = \frac{1}{1 + e^{-net^l_{pi}}}. \qquad (2)$$

Fig. 1. A 3-layer ANN.
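To make Eqs. (1) and (2) concrete, the forward pass of such a fully connected sigmoid network can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the layer sizes and random weights below are arbitrary.

```python
import numpy as np

def sigmoid(x):
    # Eq. (2): logistic activation
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Forward pass through a fully connected sigmoid network.

    x       : input vector, shape (n0,)
    weights : list of arrays W[l] with shape (n_{l+1}, n_l), cf. w^l_ij in Eq. (1)
    biases  : list of arrays b[l] with shape (n_{l+1},),  cf. bias^{l+1}_i
    Returns the outputs of every layer (needed later for back-propagation).
    """
    outs = [x]
    for W, b in zip(weights, biases):
        net = W @ outs[-1] + b      # Eq. (1): weighted sum plus bias
        outs.append(sigmoid(net))   # Eq. (2)
    return outs

# Tiny example matching Fig. 1 (2 inputs, 2 hidden neurons, 1 output):
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(1, 2))]
biases = [rng.normal(size=2), rng.normal(size=1)]
print(forward(np.array([0.3, 0.7]), weights, biases)[-1])
```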

The error back-propagation (BP) scheme is one of the most popular learning algorithms for training a multi-layer ANN. It is basically a gradient-descent algorithm designed to minimize the error function in the weight space, and it is the learning algorithm used in this study.

In the batch BP algorithm, each training sample is presented once and a weight correction is calculated, but the weights are not immediately adjusted; the calculated corrections for each weight are accumulated over all the samples, and the weights are then updated only once using the cumulative correction. The update rule is given by

$$w^l_{ij}[s+1] = w^l_{ij}[s] + \eta\, G^l_{ij}[s], \qquad (3)$$

where $\eta$ is the step size or learning-rate coefficient, and the index $s$ labels the iteration number in the learning process. $G^l_{ij}$ denotes the partial derivative of the energy function $E$ (i.e., the error function) with respect to the weight $w^l_{ij}$, which can be written as

$$G^l_{ij}[s] = -\frac{\partial E}{\partial w^l_{ij}[s]}, \qquad (4)$$

where the energy function $E$ can be expressed as

$$E = \frac{1}{2} \sum_{p=1}^{P} \sum_{i=1}^{n_o} \left( d_{pi} - out^{o}_{pi} \right)^2, \qquad (5)$$

in which $P$ is the size of the training set, i.e., the number of samples used for training.
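A minimal sketch of one epoch of this batch scheme for a single-hidden-layer network follows: the gradient of Eq. (5) is accumulated over all $P$ samples, and the weights are then updated once via Eqs. (3) and (4). The learning rate and the network shape here are assumptions for illustration, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_bp_epoch(X, D, W1, b1, W2, b2, eta=0.5):
    """One epoch of batch back-propagation for a 1-hidden-layer sigmoid net.

    X : (P, n0) input patterns; D : (P, no) expected outputs.
    Gradients of E (Eq. (5)) are summed over all samples, and the weights
    are then updated once with the cumulative correction (Eq. (3)).
    """
    gW1 = np.zeros_like(W1); gb1 = np.zeros_like(b1)
    gW2 = np.zeros_like(W2); gb2 = np.zeros_like(b2)
    for x, d in zip(X, D):
        h = sigmoid(W1 @ x + b1)                   # hidden outputs, Eqs. (1)-(2)
        y = sigmoid(W2 @ h + b2)                   # network outputs
        delta_o = (d - y) * y * (1 - y)            # output-layer error term
        delta_h = (W2.T @ delta_o) * h * (1 - h)   # back-propagated error term
        gW2 += np.outer(delta_o, h); gb2 += delta_o  # accumulate G = -dE/dw, Eq. (4)
        gW1 += np.outer(delta_h, x); gb1 += delta_h
    # Eq. (3): w[s+1] = w[s] + eta * G[s], applied once per epoch
    return W1 + eta * gW1, b1 + eta * gb1, W2 + eta * gW2, b2 + eta * gb2
```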

3. Practice to alleviate the extrapolation limitation of ANNs

As an example, an ANN modeling the relationship between the ultimate analysis data of a coal and its proximate analysis data is established, to demonstrate how to alleviate the extrapolation limitation and the effect of this practice.

Coal is one of the most important energy resources all over the world. A coal is primarily described by its proximate analysis and its ultimate analysis. The proximate analysis provides information about the moisture content of the coal, the amount of volatile matter, the fixed carbon and the ash content, and can easily be obtained by means of thermogravimetric analysis. On the contrary, the ultimate analysis, which gives the elemental composition of the coal (carbon, hydrogen, oxygen, nitrogen and a negligible amount of sulfur), is relatively difficult to conduct and thus is not always available to users. It has been proven that an intrinsic, coal-type-dependent relationship exists between the ultimate analysis of a coal and its proximate analysis [10,11]. Conventional techniques, such as empirical formulae, are often used to approximate this relationship. For example, Eqs. (6)–(9) show the most commonly used empirical model, defined on a dry-ash-free basis for lignite [11]:

$$C_{daf} = 96.04 - 0.48\,V_{daf}, \qquad (6)$$
$$H_{daf} = 1.205 + 0.0835\,V_{daf}, \qquad (7)$$
$$O_{daf} = 0.21 + 0.39\,V_{daf}, \qquad (8)$$
$$N_{daf} = 1.36 + 0.01\,V_{daf}. \qquad (9)$$
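For reference, Eqs. (6)–(9) translate directly into code; the function name is a placeholder of mine, not from the paper.

```python
def lignite_ultimate_from_vdaf(v_daf):
    """Empirical ultimate analysis of lignite (dry-ash-free basis, wt%)
    from the volatile fraction V_daf (wt%), per Eqs. (6)-(9)."""
    return {
        "C_daf": 96.04 - 0.48 * v_daf,    # Eq. (6)
        "H_daf": 1.205 + 0.0835 * v_daf,  # Eq. (7)
        "O_daf": 0.21 + 0.39 * v_daf,     # Eq. (8)
        "N_daf": 1.36 + 0.01 * v_daf,     # Eq. (9)
    }

print(lignite_ultimate_from_vdaf(45.0))
```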

To establish the ANN models, 693 different coal samples, covering almost all typical Chinese coals, are collected and classified into several groups according to China's coal-type classification standard. Here, lignite, whose major classification indexes are a volatile fraction $V_{daf} > 40\%$ and a calorific value $Q_{net,ar} > 11.70$ MJ/kg, is used as a demonstration. There are in total 110 lignite samples, which are divided into two sets: a training set and a testing set. The samples in the training set are used to train the ANN, while the samples in the testing set do not participate in the training and are used only for testing the prediction performance of the resulting ANN model, i.e., whether the model is really suitable for practical use.

When preparing the training set, firstly, the coal samples with the maximum or minimum value of any index (including $A_{ad}$, $V_{ad}$ or $W_{ad}$ in the proximate analysis, $C_{ad}$, $H_{ad}$, $O_{ad}$ or $N_{ad}$ in the ultimate analysis, or the calorific value) are picked out from the 110 experimental samples and used to train the ANN. The remaining experimental samples are then grouped randomly into the two sets: one for ANN training and the other for model testing. Secondly, artificial samples covering the entire range of lignite as much as possible can be drawn from the existing empirical formulae, Eqs. (6)–(9). For example, artificial samples with the indexes $V_{daf} = 40\%$ and $Q_{net,ar} = 11.70$ MJ/kg, i.e., the lower limit of this kind of coal, are drawn from the formulae. The artificial samples are used to pre-train, i.e., to initialize, the ANN (a minimal sketch of this pre-training sequence is given below). Although these artificial samples are not necessarily accurate, the ANN pre-trained with them can represent a rough relationship between the ultimate analysis and the proximate analysis of lignite, and serves as a good initialization for the experimental data. This practice largely reduces the possibility of predicting a new lignite by means of extrapolation, since the artificial samples theoretically covering the entire range of lignite are used to initialize the ANN, and the experimental samples with the maximum or minimum value of any index all participate in the training process.

With the same structure as the ANN in Fig. 1, a 3-layer ANN is employed to predict the ultimate analysis data of any lignite from its known proximate analysis data, with three neurons in the input layer denoting the proximate analysis ($A_{ad}$, $V_{ad}$ and $W_{ad}$, respectively), four neurons in the output layer representing the ultimate analysis ($C_{ad}$, $H_{ad}$, $O_{ad}$ and $N_{ad}$, respectively), and seven neurons in the hidden layer, determined by numerical experiments, as seen, for example, in Goh [12].
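A minimal sketch of the initialization practice follows, under stated assumptions: the paper's model maps three air-dried proximate inputs to four ultimate-analysis outputs, but Eqs. (6)–(9) are on the dry-ash-free basis, so this simplified sketch uses $V_{daf}$ as the only input. The sample count, the assumed upper bound of 60% on $V_{daf}$, and the `train` routine (any BP trainer, e.g. the epoch function sketched in Section 2) are all hypothetical.

```python
import numpy as np

def make_artificial_samples(n=50, v_daf_min=40.0, v_daf_max=60.0):
    """Artificial (V_daf -> ultimate analysis) samples spanning the lignite
    range, drawn from the empirical model of Eqs. (6)-(9). The upper bound
    of 60% is an assumed placeholder, not a value from the paper. Inputs and
    targets must still be normalized before training a sigmoid network."""
    v = np.linspace(v_daf_min, v_daf_max, n)
    X = v.reshape(-1, 1)
    D = np.column_stack([96.04 - 0.48 * v,      # C_daf, Eq. (6)
                         1.205 + 0.0835 * v,    # H_daf, Eq. (7)
                         0.21 + 0.39 * v,       # O_daf, Eq. (8)
                         1.36 + 0.01 * v])      # N_daf, Eq. (9)
    return X, D

# Pre-train (initialize) on artificial samples, then train on real data:
#   for _ in range(pretrain_epochs):
#       params = train(X_art, D_art, params)   # rough relation stored first
#   for _ in range(epochs):
#       params = train(X_exp, D_exp, params)   # then refined on measurements
```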

The ultimate analysis data ($C_{ad}$, $H_{ad}$, $O_{ad}$ and $N_{ad}$) predicted by the resulting ANN model and by the empirical model of Eqs. (6)–(9) are plotted in Fig. 2 against the measured values on the same basis (ad); the line $y = x$ is drawn only to guide the eye. As shown in Appendix A, the parameters of the ANN after 4000 training cycles are accepted and used to predict the ultimate analysis of any lignite from its proximate analysis data. The details of how to terminate the training of an ANN are shown and explained in the next section. To ease the comparison between the ANN and the formulae, the experimental samples are divided into training samples and testing samples in the predictions of the formulae in the same way as for the ANN.

Fig. 2. Predicted ultimate analysis data with the different models against the measurements: (a) predicted $C_{ad}$ vs. measured $C_{ad}$; (b) predicted $H_{ad}$ vs. measured $H_{ad}$; (c) predicted $O_{ad}$ vs. measured $O_{ad}$; (d) predicted $N_{ad}$ vs. measured $N_{ad}$.

From Fig. 2, one can see clearly that the resulting ANN model achieves better prediction performance than the empirical model, which was developed on a database similar to the ANN's and is currently widely used to estimate the ultimate analysis data of Chinese lignite. One cause of the resulting ANN model's much better prediction performance, and of its every-day use, is the practice used in the preparation of the training samples, which reduces the possibility of extrapolation as much as possible. Another is the practice used to fix the over-training problem, as explained below.

4. Practice to overcome the over-training limitation of ANNs

To ease the illustration and make the comparison clear, predicting the coal ash fusion temperature from the ash chemical composition is used to demonstrate this practice, with only one output in the ANN: the ash fusion temperature.

Besides the ultimate and proximate analysis data of a coal, its ash fusion temperature is also an important input for the design and operation of coal-fired boilers. Coal with a low ash fusion temperature promotes deposits that accumulate around the heat-transfer pipes and lead to the corrosion of furnace components. Coal ash is an extremely complex mixture of mineral matter, mainly including SiO2, Fe2O3, CaO, MgO, K2O, Na2O, Al2O3, SO3 and TiO2 plus a few other oxides, and these chemical components determine the ash fusion temperature, but in a quite complicated and not yet precisely known manner: the mineral matters interact with each other at high temperature and have an important effect on the fusion temperature. So, although some regression analyses have been conducted and some formulae derived to estimate ash fusion temperatures from the chemical composition, they are still far from capable of giving satisfactory predictions, as shown in Yin et al. [9].

Here, an ANN is used to alleviate this problem. Data for 160 typical coal ash samples are collected for use in training and testing the ANN, of which one half is used for training and the other half for testing the generalization of the resulting ANN model. Again a 3-layer ANN is employed, with the same structure as shown in Fig. 1: one input layer, one hidden layer, and one output layer. There are seven neurons in the input layer, denoting the contents of SiO2, Al2O3, Fe2O3, CaO, MgO, TiO2 and K2O + Na2O in the coal ash, respectively; one neuron in the output layer, representing the corresponding ash softening temperature; and eight neurons in the hidden layer, determined also by numerical experiments.

Fig. 3. Over-fitting of the measurement data.

In most existing applications of ANNs, a very high training precision, for example $10^{-6}$, is set a priori, or a very large number of training cycles is preset to terminate the training course. If all the training patterns are clean, i.e., have no errors, this practice is sound. Nevertheless, in actual engineering systems the samples available for training are usually erroneous, so too high a precision will over-fit the training samples and degrade the prediction performance of the resulting ANN model, as shown schematically in Fig. 3.

In principle, an ANN can always remember all the training samples after a large enough number of training cycles, so the energy function for the training samples normally decreases as training progresses. However, the energy function for the testing samples usually decreases only in the beginning stage and then increases as training continues, especially for engineering problems. The energy functions for the training set, the testing set and all the samples in this case are shown in Fig. 4, in which this tendency is clearly observed. So, in this study a useful practice is introduced to terminate the training course: the energy functions for the testing set and for all the samples, as well as that for the training set, are monitored during training. The training should be terminated when both the energy function for the testing set and that for all the samples have been reduced to relatively low values, and the parameters of the ANN at that point should be accepted for future predictions. Here, the training is terminated after 600 iterations according to Fig. 4, and the resulting ANN, whose parameters are shown in Appendix A, is accepted for future prediction.

Fig. 4. Evolution of the energy functions along the iterations.
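A minimal sketch of this termination practice, assuming generic `step` (one training epoch) and `predict` helpers: the energy of Eq. (5) is evaluated on the testing set and on all samples after every epoch, and the parameters at the epoch with the lowest monitored energy are kept. The patience-style stopping window is my own addition; the paper simply reads the stopping point (600 iterations here) off the monitored curves in Fig. 4.

```python
import copy
import numpy as np

def energy(params, X, D, predict):
    """Eq. (5) evaluated on an arbitrary sample set."""
    Y = np.array([predict(params, x) for x in X])
    return 0.5 * np.sum((D - Y) ** 2)

def train_with_monitoring(params, train_set, test_set, step, predict,
                          max_epochs=150_000, patience=200):
    """Run training epochs while monitoring the testing-set and overall
    energies; return the parameters from the epoch where the monitored
    energy was lowest. `step` performs one training epoch on the training
    set; `predict` maps (params, x) -> network output."""
    X_tr, D_tr = train_set
    X_te, D_te = test_set
    X_all = np.vstack([X_tr, X_te]); D_all = np.vstack([D_tr, D_te])
    best = (np.inf, copy.deepcopy(params), 0)
    for epoch in range(1, max_epochs + 1):
        params = step(params, X_tr, D_tr)
        e_mon = energy(params, X_te, D_te, predict) \
              + energy(params, X_all, D_all, predict)
        if e_mon < best[0]:
            best = (e_mon, copy.deepcopy(params), epoch)
        elif epoch - best[2] > patience:  # no improvement for a while: stop
            break
    return best[1]
```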

Fig. 5. Predicted results by the ANNs terminated after different training cycles.

Fig. 5 shows the ash fusion temperatures predicted by the ANNs trained with 600 iterations and with 150,000 iterations, respectively. It can be seen that the latter cannot predict the testing patterns satisfactorily, although it can recall the training patterns very well; this ANN is of course not suitable for practical applications. It behaves somewhat like the over-fitted curve shown in Fig. 3, which remembers very well the measurement data used for training but deviates considerably from the reasonable model behind those data, and thus cannot be extended to practical use.

5. Conclusions

To alleviate or overcome the two major limitations on the use of ANNs, over-training and poor extrapolation ability, which seriously degrade the prediction performance of the resulting ANN models, two useful practices are introduced and demonstrated in this paper. The practices are general and can be extended to any similar application of ANNs; in fact, they are also valid for any other regression technique. The demonstrated ANN models based on the two practices in the field of coal chemistry, whose parameters are given in Appendix A, are proven to have good prediction performance and are in every-day use in a large-scale coal-blends production center in China to co-manage the coal library and co-direct coal-blends production.

Table 1
Parameters of the ANN to predict the ultimate analysis of any lignite from its proximate analysis (a)

Weights $w^l_{ij}$:

Input 1   Input 2   Input 3   Hidden neuron   Output 1   Output 2   Output 3   Output 4
 3.448    -4.820     2.583    Hidden 1        -4.553     -1.590     -3.752     -1.936
 2.975     7.926    -4.977    Hidden 2        -2.133     -0.174     -1.238     -2.663
-1.887    -8.351     0.269    Hidden 3        -5.553     -1.889     -1.555      3.692
-3.879    -7.741    -1.330    Hidden 4         5.748      1.345     -1.688     -6.966
36.032   -12.285    -0.393    Hidden 5         1.991      8.670     -6.543    -14.102
 2.782    -0.624     6.690    Hidden 6        -2.805     -1.110      4.362     -5.203
33.152   -12.963     3.117    Hidden 7        -2.447     -9.435      5.348     14.025

Threshold values of the different neurons, $bias^l_i$:

Hidden neurons (7): -3.470, -5.111, 4.818, 4.715, 4.704, -0.725, 5.305
Output neurons (4): 4.099, 2.489, -0.846, 3.501

(a) Terminated after 4000 training cycles. 110 experimental samples in total: 72 for training; 38 for testing.


Table 2
Parameters of the ANN for ash fusion temperature prediction (a)

Weights $w^l_{ij}$:

Input 1   Input 2   Input 3   Input 4   Input 5   Input 6   Input 7   Hidden neuron   Output 1
-0.8005   -0.4531    0.7683    1.2116    1.2944   -0.8723   -0.5339   Hidden 1         1.1124
-0.5924   -0.5462    0.6592    1.0458    0.7930   -0.4359   -0.3945   Hidden 2         0.7517
-0.6120   -0.7525    0.6905    0.8584    0.3296   -0.2114   -0.2177   Hidden 3         0.5046
-0.4226   -0.7626    0.7826    0.6263   -0.0068    0.0972   -0.0789   Hidden 4         0.2521
-0.3314   -0.7775    1.1558    0.6783   -0.0836    0.3460    0.0917   Hidden 5         0.0112
-0.6834   -1.0007    1.7747    1.4242    0.2602    0.2659    0.3075   Hidden 6        -0.3711
-1.8958   -1.9084    2.4055    2.7201    0.8786   -0.4589    0.2971   Hidden 7        -0.9806
-2.8246   -4.2516    3.5228    4.4852    1.6333   -1.5458   -0.7226   Hidden 8        -2.0724

Threshold values of the different neurons, $bias^l_i$:

Hidden 1   Hidden 2   Hidden 3   Hidden 4   Hidden 5   Hidden 6   Hidden 7   Hidden 8   Output 1
-0.4506    -0.3101    -0.2927    -0.0849     0.5400     1.5407     2.0709     3.0808     1.3661

(a) Terminated after 600 training iterations. 160 experimental samples in total: 80 for training and 80 for testing.



Acknowledgements

The first author gratefully thanks Academician Prof. Kefa Cen of the Clean Energy and Environment Engineering Key Lab of the Ministry of Education, Zhejiang University, for his discussions during the project "Catalytic clean coal combustion and development of a novel coal blending technology for power stations based on non-linear programming".

Appendix A. Parameters of the resulting ANNs

This appendix gives the detailed parameters of the resulting ANNs for the two examples in this paper, for further verification, application to similar problems, or other purposes (Tables 1 and 2).
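As a usage note, the parameters of Table 2 drop straight into the forward pass of Eqs. (1) and (2), as sketched below. One caveat: with sigmoid activations the network must operate on normalized inputs and outputs, and the normalization ranges are not reproduced in this paper, so the scaling of the ash components and of the returned temperature is left to the user.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Table 2: weights of the ash-fusion-temperature ANN (7 inputs, 8 hidden, 1 output).
W1 = np.array([  # rows: hidden neurons 1-8; columns: inputs 1-7
    [-0.8005, -0.4531, 0.7683, 1.2116, 1.2944, -0.8723, -0.5339],
    [-0.5924, -0.5462, 0.6592, 1.0458, 0.7930, -0.4359, -0.3945],
    [-0.6120, -0.7525, 0.6905, 0.8584, 0.3296, -0.2114, -0.2177],
    [-0.4226, -0.7626, 0.7826, 0.6263, -0.0068, 0.0972, -0.0789],
    [-0.3314, -0.7775, 1.1558, 0.6783, -0.0836, 0.3460, 0.0917],
    [-0.6834, -1.0007, 1.7747, 1.4242, 0.2602, 0.2659, 0.3075],
    [-1.8958, -1.9084, 2.4055, 2.7201, 0.8786, -0.4589, 0.2971],
    [-2.8246, -4.2516, 3.5228, 4.4852, 1.6333, -1.5458, -0.7226],
])
b1 = np.array([-0.4506, -0.3101, -0.2927, -0.0849, 0.5400, 1.5407, 2.0709, 3.0808])
W2 = np.array([[1.1124, 0.7517, 0.5046, 0.2521, 0.0112, -0.3711, -0.9806, -2.0724]])
b2 = np.array([1.3661])

def predict_ash_softening(x_scaled):
    """x_scaled: the 7 ash components (SiO2, Al2O3, Fe2O3, CaO, MgO, TiO2,
    K2O+Na2O), already normalized; returns the normalized temperature,
    which must be rescaled with the (unpublished) output range."""
    return sigmoid(W2 @ sigmoid(W1 @ x_scaled + b1) + b2)
```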

References

[1] B. Ayrulu, B. Barshan, Neural networks for improved target differentiation and localization with sonar, Neural Networks 14 (2001) 355–373.
[2] M. Liang, M.J. Palakal, Airborne sonar target recognition using artificial neural network, Mathematical and Computer Modelling 35 (2002) 429–440.
[3] N. Tosun, L. Ozler, A study of tool life in hot machining using artificial neural networks and regression analysis method, Journal of Materials Processing Technology 124 (2002) 99–104.
[4] K.A. Nagaty, On learning to estimate the block directional image of a fingerprint using a hierarchical neural network, Neural Networks 16 (2003) 133–144.
[5] S.Y. Jeong, S.Y. Lee, Adaptive learning algorithms to incorporate additional functional constraints into neural networks, Neurocomputing 35 (2000) 73–90.
[6] H.M. Lee, C.M. Chen, T.C. Huang, Learning efficiency improvement of back-propagation algorithm by error saturation prevention method, Neurocomputing 41 (2001) 125–143.
[7] N. Ampazis, S.J. Perantonis, J.G. Taylor, A dynamical model for the analysis and acceleration of learning in feedforward networks, Neural Networks 14 (2001) 1075–1088.
[8] T. Pleune, O. Chopra, Using artificial neural networks to predict the fatigue life of carbon and low-alloy steels, Nuclear Engineering and Design 197 (2000) 1–12.
[9] C. Yin, Z. Luo, J. Zhou, K. Cen, A novel non-linear programming-based coal blending technology for power plants, Chemical Engineering Research and Design 78 (2000) 118–124.
[10] J. Bai, F. Liu, Coal Quality Analysis, Coal Industry Press, Beijing, 1982 (in Chinese).
[11] Beijing Research Institute of Coal Chemistry, Handbook of Coal Chemistry Examination, Coal Industry Press, Beijing, 1983 (in Chinese).
[12] A. Goh, Back-propagation neural networks for modeling complex systems, Artificial Intelligence in Engineering 9 (1995) 143–151.