
Sensitivity based feature selection for recurrent neural network applied to forecasting of heating gas consumption

Martin Macas1, Fiorella Lauro2, Fabio Moretti2, Stefano Pizzuti2, Mauro Annunziato2, Alessandro Fonti3, Gabriele Comodi3, and Andrea Giantomassi4

1 Department of Cybernetics, Czech Technical University in Prague, Prague, Czech Republic, [email protected]

2 Unità Tecnica Tecnologie Avanzate per l'Energia e l'Industria, ENEA (Italian National Agency for New Technologies, Energy and Sustainable Economic Development), Casaccia Research Center, Roma, Italy

3 Dipartimento di Ingegneria Industriale e Scienze Matematiche, Università Politecnica delle Marche, Ancona, Italy

4 Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Ancona, Italy

Abstract. The paper demonstrates the importance of feature selection for a recurrent neural network applied to the problem of one-hour-ahead forecasting of gas consumption for office building heating. Although the forecasting accuracy is similar for both the feed-forward and the recurrent network, the removal of features degrades accuracy much earlier for the feed-forward network. The recurrent network can perform well even with 50% of the features. This brings significant benefits in scenarios where the neural network is used as a black-box model of building consumption, called by an optimizer that minimizes the consumption. The reduction of input dimensionality reduces not only the costs related to measurement equipment, but also the costs related to data transfer.

Keywords: forecasting, consumption, gas, heating, neural networks, feature selection

1 Introduction

Although artificial neural networks are very popular soft-computing techniques used in industrial applications ([1],[2]), recurrent neural networks are not used as often as the feed-forward models. One possible cause is the fact that their training is usually much more difficult and more complex recurrent models are more sensitive to over-fitting. It can therefore be crucial to perform a proper selection of network inputs, which can simplify the training and can lead to better generalization abilities [3]. A proper input selection was observed to be very important particularly in real-world applications [1].


In this paper, a simple recurrent neural network model is adopted. The resulting network can be further used as a data-based black-box model for optimization of the building heating. At each hour, a building management system finds the indoor air temperature set points that lead to a minimum output of the consumption model and a proper level of comfort. For this purpose, it is crucial to reach a good prediction accuracy. Since the cost function is highly nonlinear and multi-modal, population-based metaheuristics can be used with advantage. Because the neural network is used in an optimization loop many times, it is also crucial to keep the network as small as possible. Moreover, if the optimization is performed remotely, one must minimize the amount of data that is measured and transferred from the building to the optimization agent. All these requirements imply a critical need for a proper selection of features, which leads to reasonable data acquisition requirements and proper prediction accuracy.
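The optimization loop described above can be sketched as follows. This is a minimal random-search stand-in for the population-based metaheuristic; the function names, the number of zones, and the comfort bounds are our assumptions, not taken from the paper:

```python
import random

def optimize_setpoints(consumption_model, n_zones=12, iters=200,
                       bounds=(19.0, 23.0)):
    """Query the black-box consumption model with many candidate
    set-point vectors and keep the cheapest one found."""
    best_sp, best_cost = None, float("inf")
    for _ in range(iters):
        # candidate indoor air temperature set points within comfort bounds
        sp = [random.uniform(*bounds) for _ in range(n_zones)]
        cost = consumption_model(sp)  # one fast surrogate evaluation
        if cost < best_cost:
            best_sp, best_cost = sp, cost
    return best_sp, best_cost
```

Because the surrogate is evaluated `iters` times every hour, a small network keeps each call, and therefore the whole loop, cheap.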

We focus on one-hour-ahead forecasting of the total consumption of gas for heating of a particular office building. Although there are many papers focusing on nation-wide gas consumption prediction [4], the consumption of a single office building is more variable and can therefore be more difficult to predict. Probably the most popular methods applied in this area are artificial neural networks [5]. In the literature, most approaches use feed-forward neural networks [6]. Recurrent neural networks, on the other hand, are mostly used for electric energy consumption prediction [7], but not for gas heating systems. We demonstrate that although the accuracy of the recurrent model is comparable to the accuracy of the feed-forward network, a sensitivity-based feature selection can help the recurrent network reach a higher reduction of the input dimensionality while keeping a good accuracy level.

In Section 2, we briefly describe all the methods used in the experimental part, which is described in Section 3. Some final discussion and conclusions can be found in Section 4.

2 Data and Methods

2.1 F40 Building model

An actual office building located at ENEA (Casaccia Research Centre, Rome, Italy) was considered as a case study (see Figure 1). The building is composed of three floors and a thermal subplant in the basement. The building is equipped with an advanced monitoring system aimed at collecting data about energy consumption (electrical and thermal) and the environmental conditions. In the building there are 41 offices of different size with a floor area ranging from 14 to 36 m2, 2 EDP rooms each of about 20 m2, 4 laboratories, 1 control room and 2 meeting rooms. Each office room has 1 or 2 occupants.

In order to estimate the thermal energy consumption for heating the whole building, a MATLAB Simulink simulator based on the HAMBASE model ([8], [9]) was developed. In particular, the building was divided into 15 different zones according to different thermal behavior depending on solar radiation exposure.


Fig. 1. Outside of F40 building

Each zone is therefore modeled with similar thermal and physical characteristics. Figure 2 shows the division of zones for each floor. Each zone covers several rooms. Although there are 15 zones in total, zones 3, 8 and 13 correspond to corridors and do not have fan coils. Below, these zones are called non-active zones, while all the other zones are called active zones. The simulator estimates the gas consumption needed for heating each zone according to given indoor temperature set points and external meteorological conditions.

2.2 Data

A potential remote control agent would be based on a simple data-driven black-box model that is a surrogate of the simulator. Metaheuristic algorithms use such a model to optimize the temperature set points of the zones so as to minimize thermal consumption and maximize user comfort. The optimized temperature set points are then applied to the simulator in order to evaluate the resulting energy savings and comfort. We used the simulator with the following settings.

To obtain valid and reliable results, we simulated four heating seasons: 2005/2006, 2006/2007, 2007/2008 and 2008/2009. Each data set consists of 75 days, which corresponds to 75 × 24 = 1800 hourly data instances. The data from the first heating season are called the training data and are used for both the training and the feature selection. The data from 2006/2007 are used for the selection of the best number of inputs, and the data from 2007/2008 and 2008/2009 are used for the final validation of the methods.
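The role of each simulated season can be written down explicitly (a trivial bookkeeping sketch; the constant and dictionary names are ours):

```python
# 75 days of hourly data per heating season
HOURS_PER_SEASON = 75 * 24  # 1800 instances

SEASON_ROLE = {
    "2005/2006": "training + feature selection",
    "2006/2007": "testing (selection of the number of inputs)",
    "2007/2008": "final validation",
    "2008/2009": "final validation",
}
```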

Fig. 2. Partitioning of F40 building zones

The behavior of the supply water temperature set point was controlled by a simple weather compensation rule. To excite the dynamics of the system to a proper degree, we also added a random component. The value of the set point is a Gaussian random number with standard deviation 10 °C and mean equal to 70 − 2Te, where Te is the external temperature. If the generated number is outside the feasibility interval ⟨35; 70⟩ °C, the value of the water temperature set point is replaced by a uniformly distributed random number from this feasibility interval. The behavior of the inside air temperature set points differs for daytime and nighttime hours. Between 6 a.m. and 8 p.m., they are also Gaussian random numbers with mean 21 °C and standard deviation 1 °C. Moreover, there is a saturation below 19 °C and above 23 °C. Between 8 p.m. and 6 a.m., there is a nighttime regime and all the set points are 17 °C.
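The excitation rules above translate directly into code (a sketch; the function names are ours, and we use Python in place of the authors' MATLAB environment):

```python
import random

def supply_water_setpoint(t_ext):
    """Weather compensation rule with Gaussian excitation:
    mean 70 - 2*Te, std 10 degC; values outside the feasibility
    interval <35; 70> degC are redrawn uniformly from it."""
    sp = random.gauss(70.0 - 2.0 * t_ext, 10.0)
    if not 35.0 <= sp <= 70.0:
        sp = random.uniform(35.0, 70.0)
    return sp

def air_setpoint(hour):
    """Indoor air set point: Gaussian (mean 21, std 1 degC) saturated
    to [19, 23] degC between 6 a.m. and 8 p.m.; 17 degC at night."""
    if 6 <= hour < 20:
        return min(23.0, max(19.0, random.gauss(21.0, 1.0)))
    return 17.0
```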

The whole set of features used as inputs for our neural networks is described in Table 1. The first 12 features are the set point values for air temperatures (held constant within each hour) at hour t in the 12 active zones (zones that have at least one fan coil). The 13th feature is the supply water temperature set point at hour t. The remaining features describe the external environment climatic conditions in the previous hour t − 1. All the meteorological data are obtained from real measurements at the Roma Ciampino location. The target variable is the total building gas consumption during hour t.

2.3 Neural networks

In the underlying experiments, two simple models were used. The first one is a one-hidden-layer feed-forward neural network trained by the Levenberg-Marquardt algorithm. This is one of the most popular methods used in neural network applications. The second network is a recurrent neural network with one hidden layer whose delayed outputs are connected back to the input [10]. This network was also trained by the Levenberg-Marquardt algorithm. This popular algorithm is used because of its relatively high speed, and because it is highly recommended as a first-choice supervised algorithm by the Matlab Neural Network toolbox, although it does require more memory than other algorithms [11].


Table 1. The description of features in the original set

Number Feature

1 Air temperature set point in active zone 1 [°C]

...

12 Air temperature set point in active zone 12 [°C]

13 Supply water temperature set point [°C]

14 Diffuse solar radiation [W m−2]

15 Exterior air temperature [°C]

16 Direct solar radiation (plane normal to the direction) [W m−2]

17 Cloud cover (1...8)

18 Relative humidity outside [%]

19 Wind velocity [m s−1]

20 Wind direction [degrees from north]

Both networks were simulated in the Neural Network Toolbox for Matlab [11]. For reasons described in Section 3, justified also by preliminary experiments, only networks with one hidden unit are used here. The hidden and output neurons use the sigmoid and linear transfer function, respectively. The mean squared error was minimized by the training procedure. The training was stopped after 100 epochs without any improvement, after the number of training epochs exceeded 300, or if the error gradient reached 10−7.
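The three stopping criteria can be combined in a single predicate (our paraphrase as a sketch; the training itself was performed inside the Matlab toolbox):

```python
def should_stop(epoch, epochs_since_best, grad_norm,
                patience=100, max_epochs=300, min_grad=1e-7):
    """Stop after 100 epochs without improvement, after 300 training
    epochs in total, or when the error gradient reaches 1e-7."""
    return (epochs_since_best >= patience
            or epoch >= max_epochs
            or grad_norm <= min_grad)
```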

2.4 Feature selection

Although both studied neural networks are nonlinear, they significantly differ in their dynamics, and the feature selection should be adapted to the particular network. To select proper features tailored to a particular network, we decided to use a well-known sensitivity-based method developed by Moody [12], called the Sensitivity based Pruning (SBP) algorithm. It evaluates the change in training mean squared error (MSE) that would be obtained if the ith input's influence was removed from the network. The removal of the influence of an input is simply modeled by replacing it with its average value. Let x_j = (x_1j, ..., x_ij, ..., x_Dj) be the jth of N instances of the input vector (N is the size of the training data set). Let x^i_j = (x_1j, ..., Σ_j x_ij / N, ..., x_Dj) be the jth instance modified at the ith position. For each data instance j, the partial sensitivity is defined by

S_ij = (f(x^i_j) − y_j)^2 − (f(x_j) − y_j)^2,   (1)

where f is the neural network function and y_j is the target value for the jth data instance. Further, the sensitivity of the network to variable i is defined as

S_i = (Σ_j S_ij) / N.   (2)

In our implementation of SBP, the algorithm starts with the full set of features (D = 20). At each step, a target neural network is trained. Then its sensitivity is computed for particular inputs, and the feature for which the sensitivity is smallest is removed from the data. Note that a new neural network is trained at each backward step. Moreover, compared to the original Moody's approach [12], which uses only the training set for the sensitivity computation, we split the training set into two parts; on the first we train the network and on the second we compute the sensitivity. This approach was chosen after some preliminary experiments, where it slightly outperformed the original method.
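The whole backward procedure, including the train/sensitivity split, can be sketched as follows. This is our own illustration with a generic model in place of the neural networks; `make_model` is an assumed factory that fits a predictor exposing a `.predict` method:

```python
import numpy as np

def sensitivity(f, X, y):
    """Per-feature sensitivity S_i from Eqs. (1)-(2): the increase in
    squared error when feature i is replaced by its mean value.
    f maps an (N, D) array to N predictions."""
    base = (f(X) - y) ** 2
    S = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, i] = X[:, i].mean()  # remove the influence of input i
        S[i] = np.mean((f(X_mod) - y) ** 2 - base)
    return S

def sbp_elimination(make_model, X_fit, y_fit, X_sens, y_sens):
    """Backward elimination: retrain, then drop the least sensitive
    feature. Returns feature indices in the order they were removed."""
    remaining = list(range(X_fit.shape[1]))
    removed = []
    while len(remaining) > 1:
        model = make_model(X_fit[:, remaining], y_fit)
        S = sensitivity(model.predict, X_sens[:, remaining], y_sens)
        worst = int(np.argmin(S))  # smallest sensitivity goes first
        removed.append(remaining.pop(worst))
    removed.append(remaining[0])
    return removed
```

With a linear toy model, an irrelevant input is removed first and the strongest predictor survives longest, which mirrors the behavior described in the text.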

An obvious question is how many features to select. To answer this question, we test the neural networks with different numbers of inputs on an independent testing data set (2006/2007) and select the proper number of inputs according to the testing error. The final errors of the methods are estimated on the 2007/2008 and 2008/2009 data sets, which are not used in any part of the predictor design process.

3 Experiments

In this section, we experimentally compare how much benefit the feature selection brings for the feed-forward and the recurrent neural network. From preliminary experiments performed on the training data set 2005/2006, we chose neural networks with only one hidden unit. For two or three neurons in the hidden layer, the average of the error estimate (computed over multiple runs) was similar, but the standard deviation was much higher. Therefore, it seems to be "less risky" to use only one-hidden-unit topologies for both networks.

– First, the training set 2005/2006 was used for feature selection and subsequent training.

– Second, the error obtained for different numbers of selected features was estimated on the testing set 2006/2007. This error can be found in the upper-left part of Figure 3. From a brief analysis of the upper-left subfigure, one can observe that the feed-forward neural network gives a reasonable testing error for 15 and more inputs. The term "reasonable" means that the prediction is not disrupted too much. The difference is demonstrated in Figure 4, where the upper part shows a bad prediction result obtained by the feed-forward network with MSE approximately equal to 3. On the other hand, the lower part shows a reasonably good prediction obtained by the recurrent neural network with MSE approximately equal to 1.5. Finally, to reduce the risk related to the random character of the results, one would choose 16 features as the final number of inputs. Analogically, we select 11 features as the final number of inputs for the recurrent network.


– Third, the final models were validated on the validation sets 2007/2008 and 2008/2009. This error can be found in the lower-left part of Figure 3. One can see that the comparison results for the two methods are the same for both the testing and validation data. This means that the testing data sufficiently represent the behavior of the system and can be used for the final model selection. The most important conclusion is that the recurrent model can perform much better for smaller numbers of input features than the feed-forward network. For completeness, the numerical testing and validation values of MSE can be found in Table 2. The upper part describes the results for one-unit networks; the lower part shows the results for networks with two units. This supports our previous topology choice. The more complex two-unit networks give a much higher average error and also a higher standard deviation, which means that the training data (from one heating season) are not sufficient for more complex models. Moreover, on the right side of Figure 3, one can see the Mean Absolute Percentage Error, which can also be used for the evaluation of forecasting accuracy. One can notice that its value is relatively high (more than 20%), which is caused by the high variability of the target time series, as can be observed in Figure 4.
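For reference, the two error measures reported in Figure 3 and Table 2 can be computed as follows (standard definitions, not code from the paper):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, the quantity minimized during training."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error; hours with low consumption
    inflate this measure, which helps explain values above 20%."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs((y_true - np.asarray(y_pred)) / y_true)))
```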

Table 2. The MSE results for two different numbers of selected features for 1-unit networks (above) and 2-unit networks (below). For the finally chosen models with 1 unit, RN with 11 features can lead to the same MSE as FF with 16 features.

Neural network          Chosen dimensionality   testing MSE 2006/2007   validation MSE 2007/2008+2008/2009
Feed-forward                    16                  1.49 ± 0.00             1.40 ± 0.00
Recurrent                       16                  1.55 ± 0.38             1.38 ± 0.00
Feed-forward                    11                  2.96 ± 0.06             2.24 ± 0.09
Recurrent                       11                  1.49 ± 0.00             1.48 ± 0.00
Feed-forward (2 units)          16                  2.17 ± 0.79             1.55 ± 0.27
Recurrent (2 units)             16                  2.07 ± 0.98             1.38 ± 0.16
Feed-forward (2 units)          11                  2.52 ± 0.88             1.78 ± 0.35
Recurrent (2 units)             11                  2.14 ± 1.39             1.48 ± 0.21

Fig. 3. The dependence of errors on the number of selected features (testing MSE and MAPE above, validation MSE and MAPE below) for the feed-forward (FF) and recurrent (RN) networks. Based on the testing error curve, it was decided to select 11 features for RN and 16 features for FF.

Fig. 4. Typical prediction result (prediction vs. target gas consumption per hour [m3]) during seven days, obtained by the feed-forward and the recurrent neural network trained with only 11 selected features.

4 Discussion and conclusions

The main conclusion of the paper is that, for our forecasting problem, the feature selection based on the sensitivity of the network to the removal of features leads to a significant reduction of input dimensionality without increasing the MSE. For the recurrent model such a reduction is much higher (11 of 20 inputs). This also means that a real benefit of the recurrent neural network is the possibility of a much simpler data acquisition process (remote transfer) and faster computation of network outputs. The fact that we used such small networks can seem strange, but it is related to the small training data set consisting of one heating season of measurements. For much bigger data, bigger networks could be more suitable, and the results could be different. However, in a real-plant case, one usually does not have enough time to collect data from multiple heating seasons, and small networks must be used.

From a deeper analysis of Figure 3, one can observe that even if we selected only 10 input features, the result would be the same. Thus, we are able to reduce the input dimensionality for the recurrent network by 50%.

Finally, we describe the feature selection result itself. In Table 3, one can find the order in which the features were removed from the original set. Although we averaged 20 runs, the feature selection was the same for most runs (19 of 20), so we show the most typical result. The direct solar radiation and the cloud cover were the worst features, filtered out first for both networks. On the other hand, most of the air temperature set points were crucial for both methods. In fact, the recurrent predictor does not even need any of the external environmental conditions, which is an important conclusion, since the meteorological data acquisition causes significant costs.

The findings from this paper will be directly used in our recent experiments with the optimization of building heating. In the future, we also want to focus on more sophisticated feature selection methods and also on more complex neural models of gas heating consumption.

Table 3. The feature numbers (see Table 1 for feature names) in the same order as they were removed from the feature set by the backward feature elimination procedure. E.g., for both methods, the feature number 16 (direct solar radiation) was removed as the first one.

Feed-forward 16 17 7 8 9 10 11 12 13 14 15 18 19 20 1 2 3 4 5 6

Recurrent 16 17 18 19 20 15 8 14 2 3 4 5 6 7 9 10 11 12 13 1

Acknowledgments. The research was supported by the Czech Science Foundation project no. 13-21696P, "Feature selection for temporal context aware models of multivariate time series".

References

1. Villar, J.R., Gonzalez, S., Sedano, J., Corchado, E., Puigpinos, L., de Ciurana, J.: Meta-heuristic improvements applied for steel sheet incremental cold shaping. Memetic Computing 4(4) (2012) 249–261


2. Calvo-Rolle, J.L., Corchado, E.: A bio-inspired knowledge system for improving combined cycle plant control tuning. Neurocomputing 126(0) (2014) 95–105

3. Macas, M., Lhotska, L.: Wrapper feature selection significantly improves nonlinear prediction of electricity spot prices. In: Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on. (2013) 1171–1174

4. Sarak, H., Satman, A.: The degree-day method to estimate the residential heating natural gas consumption in Turkey: a case study. Energy 28(9) (2003) 929–939

5. Kalogirou, S.A.: Applications of artificial neural-networks for energy systems. Applied Energy 67(1–2) (2000) 17–35

6. Khotanzad, A., Elragal, H., Lu, T.L.: Combination of artificial neural-network forecasters for prediction of natural gas consumption. Neural Networks, IEEE Transactions on 11(2) (2000) 464–473

7. Kalogirou, S.A., Bojic, M.: Artificial neural networks for the prediction of the energy consumption of a passive solar building. Energy 25(5) (2000) 479–491

8. Schijndel, A.W.M.V.: HAMLab: Integrated heat air and moisture modeling and simulation. PhD thesis, Eindhoven: Technische Universiteit (2007)

9. de Wit, M.: HAMBASE: Heat, Air and Moisture Model for Building And Systems Evaluation. Technische Universiteit Eindhoven, Faculteit Bouwkunde (2006)

10. Elman, J.L.: Finding structure in time. Cognitive Science 14(2) (1990) 179–211
11. Mathworks: Neural Network Toolbox for Matlab ver. 2012b (2012)
12. Moody, J.E.: The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In: NIPS, Morgan Kaufmann (1991) 847–854