
Short-Term Electrical Load Forecasting using Constructive Feed-Forward Neural Network

Kazi Rafiqul Islam

Department of Electrical and Electronic Engineering

Dhaka University of Engineering and Technology, Gazipur

June 2013

Short-Term Electrical Load Forecasting using Constructive Feed-Forward Neural Network

A dissertation submitted in partial fulfilment of the requirements for the degree of

Master of Science in Electrical and Electronic Engineering

By

Kazi Rafiqul Islam
Student No. 072213

Under Supervision of

Dr. Md. Monirul Kabir
Assistant Professor, Dept. of EEE

Department of Electrical and Electronic Engineering

Dhaka University of Engineering and Technology, Gazipur

June 2013

Declaration

I declare that this thesis is my own work and has not been submitted in any form for another degree

or diploma at any university or other institute of tertiary education. Information derived from the

published and unpublished work of others has been acknowledged in the text and a list of references

is given.

Kazi Rafiqul Islam Date: 25/06/2013


Acknowledgements

First of all, I thank the Almighty Allah, who gave me the opportunity and strength to carry

out this work.

I would like to express my sincere gratitude and profound indebtedness to my supervisor Dr. Md. Monirul Kabir for his constant guidance, insightful advice, helpful criticism, valuable suggestions, commendable support, and endless patience throughout the completion of this

thesis. I feel very proud to have worked with him. Without his inspiring enthusiasm and

encouragement, this work could not have been completed.

I thank all the colleagues, staff, and friends at the Department of Electrical and Electronic

Engineering, Dhaka University of Engineering and Technology, for their support and

encouragement.

I wish to express my gratitude to Dhaka University of Engineering and Technology, Gazipur

for providing an excellent environment for research. The support I have received from

Dhaka University of Engineering and Technology in terms of approving study leave is

gratefully acknowledged.

Last but not least, I would like to thank my parents and my wife, who taught me the value

of hard work by their own example. They rendered me enormous support during the

whole tenure of my study.


Abstract

The economy of the operation and control of power systems is sensitive to the system demand; large savings can be obtained by increasing the accuracy of the demand forecast. The effect of a large forecast error shows up as over-conservative or over-risky operation. Over-prediction leads to the start-up of too many units or excessive energy purchases, supplying an unnecessary level of reserve and hence a high operating cost. Under-prediction, on the other hand, results in insufficient spinning reserve and forces the system to operate in a region vulnerable to disturbances; the resulting shortfall in reserve capacity must then be met with expensive peaking units, which again increases the operating cost. Thus, improved load forecasting accuracy yields cost savings and increased system security. Because each next day's power generation must be scheduled for dispatch, day-ahead short-term load forecasting (STLF) is a necessary task.

A number of approaches in the literature attempt to solve the short-term electrical load forecasting (STELF) variant of the electrical load forecasting (ELF) problem using neural networks (NNs). It has been confirmed that using NNs for STELF outperforms human-based computational analysis in terms of accuracy and ease of maintenance, because an NN can map inputs to outputs even as the load (i.e., the output) grows day by day. Feed-forward NNs (FFNNs) have been used to solve the ELF problem for different regions at a reasonable computational cost. FFNNs are well suited to mapping static relationships between inputs and outputs and ultimately give good results in ELF. However, FFNNs need large amounts of historical data and have a limited capability to predict the loads of holidays and fast load changes. To overcome these shortcomings, a number of recent efforts have used echo state NNs, radial basis function NNs, recurrent NNs, and nonlinear autoregressive NNs, respectively. The performance of these NN models in predicting the electrical load is satisfactory compared to the FFNN, but they are computationally expensive, demanding substantial hardware setups as well as experts for maintenance.

This thesis describes a new single-stage online ELF approach using an FFNN on a short-term basis, called the constructive approach for electrical load forecasting (CAELF). This approach differs from previous work in that CAELF determines the appropriate NN architecture automatically, through constructive NN training, before the ELF starts. In contrast, previous approaches generally use a fixed NN architecture in which the number of hidden neurons is selected randomly before training. It is well known that the random selection of hidden neurons affects the generalization performance of NNs, because the performance of any NN depends greatly on its architecture. Determining the number of hidden neurons automatically therefore provides a novel way of building NN learning models for ELF. To evaluate the performance of CAELF, the daily electrical load demand data of Spain have been used. Experimental results show that CAELF forecasts the electrical load significantly better than other standard FFNN models.


Abbreviations

AI Artificial Intelligence

ANN Artificial Neural Network

ARIMA Autoregressive Integrated Moving Average

ARMA Autoregressive–Moving-Average

BPN Back Propagation Network

CFFNN Constructive Feed Forward Neural Network

CAELF Constructive Approaches for Electrical Load Forecasting

CDD Cooling Degree Days

ELF Electrical Load Forecasting

DSM Demand Side Management

EPRI Electric Power Research Institute

FL Fuzzy Logic

HN Hidden Neuron

HDD Heating Degree Days

MAPE Mean Absolute Percentage Error

MSE Mean Squared Error

NARX Nonlinear Autoregressive Neural Network with Exogenous Inputs

PG&E Pacific Gas And Electric Company

RBFN Radial Basis Function Neural Network

SELF Standard Electrical Load Forecasting

SVM Support Vector Machine

STELF Short Term Electrical Load Forecasting

SCADA Supervisory Control and Data Acquisition

THI Temperature-Humidity Index

WCI Wind Chill Index


Contents

Acknowledgements
Abstract
Abbreviations

1 Introduction
1.1 Business Needs of Load Forecasts
1.2 Characteristics of the Power System Load
1.2.1 Weather
1.2.2 Time
1.2.3 Economy
1.2.4 Random Disturbance
1.3 Classification of Developed ELF Methods
1.4 Short-Term Electrical Load Forecast
1.5 Application of Short-Term Electrical Load Forecast
1.6 Specific Aims of the Thesis

2 Literature Review
2.1 Overview
2.2 Statistical Approaches
2.2.1 Regression Analysis
2.2.2 Time Series Analysis
2.3 Neural Network based Approaches

3 Artificial Neural Network
3.1 Introduction
3.2 Fundamentals of Neural Networks
3.2.1 Processing Unit
3.2.2 Activation Function
3.2.3 Network Topologies
3.2.4 Network Learning
3.2.5 Objective Function
3.3 Feed-Forward Neural Networks
3.3.1 Basic Architecture
3.3.2 Representation Capability
3.3.3 Network Structure Design
3.3.3.1 Determination of Number of Hidden Layers
3.3.3.2 Determination of Optimal Number of Hidden Units
3.3.3.3 Training Algorithm of Neural Network
3.3.3.4 Update of Output Layer Weights
3.3.3.5 Update of Hidden Layer Weights

4 Proposed Model for Electrical Load Forecasting
4.1 Electrical Load Forecasting
4.2 Constructive Approaches for Electrical Load Forecasting
4.2.1 Performance Criterion of NN Training
4.2.2 Termination Criterion of NN Training
4.2.3 Hidden Neuron Addition
4.3 Experimental Studies
4.3.1 Description of Data
4.3.2 Experimental Setup
4.3.3 Experimental Results
4.4 Results of CAELF for Prototype Data

5 Analysis and Comparisons
5.1 Computational Complexity
5.2 T-Test
5.3 Mackey-Glass Time Series
5.3.1 Forecasting Results
5.4 Lorenz Time Series
5.4.1 Forecasting Results
5.5 Rossler Time Series
5.5.1 Forecasting Results
5.6 Comparison with Other Works

6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works

Bibliography


List of Figures

Figure 2.1: A typical STELF process
Figure 3.1: Processing unit
Figure 3.2: Identity function
Figure 3.3: Binary step function
Figure 3.4: Sigmoid function
Figure 3.5: Bipolar sigmoid function
Figure 3.6: Recurrent neural network
Figure 3.7: Supervised learning model
Figure 3.8: Feed-forward neural network
Figure 4.1: Flowchart of CAELF
Figure 4.2: Model of feed-forward NN for forecasting the electrical load
Figure 4.3: Comparison between actual load and predicted load for 120 days obtained from CAELF, with the corresponding error in percentage
Figure 4.4: Comparison between actual load and predicted load for 120 days obtained from SELF, with the corresponding error in percentage
Figure 4.5: Comparison between actual load and predicted load for 30 days obtained from CAELF
Figure 4.6: Comparison between actual load and predicted load for 7 days obtained from CAELF
Figure 4.7: Comparison between actual load and predicted load for the holidays of January 99 obtained from CAELF
Figure 5.1: Sample data of the Mackey-Glass time series
Figure 5.2: Comparison between the actual data and the predicted data for Mackey-Glass data obtained from CAELF
Figure 5.3: Sample data of the Lorenz time series
Figure 5.4: Comparison between the actual data and the predicted data for Lorenz data obtained from CAELF
Figure 5.5: Sample data of the Rossler time series
Figure 5.6: Comparison between the actual data and the predicted data for Rossler data obtained from CAELF

List of Tables

Table 1.1: Needs of forecasts in utilities
Table 4.1: A sample of data showing the log(load), HDD, CDD, and dummy variables
Table 4.2: Comparison between actual load and predicted load for the holidays of January 99
Table 4.3: User-designed electrical load forecasting prototype data for samples 1 to 5
Table 4.4: Results for sample prototype data
Table 5.1: The MAPE value of CAELF on Mackey-Glass data
Table 5.2: The MAPE value of CAELF on Lorenz data
Table 5.3: The MAPE value of CAELF on Rossler data
Table 5.4: Comparisons with other models for the ELF problem in terms of the next 120 days

Chapter 1

Introduction

With the growth of power system networks and the increase in their complexity, many

factors have become influential in electric power generation, demand or load management.

Load forecasting is one of the critical factors for economic operation of power systems.

Forecasting of future loads also plays a significant role in network planning, infrastructure

development, and so on. Power system load forecasting is, however, a two-dimensional concept, comprising consumer-based forecasting and utility-based forecasting, and each forecast should be handled coherently. Consumer-based forecasts provide guidelines for optimizing network planning and investments, managing risk, and reducing operational costs. Undoubtedly, both utility companies and consumers are challenged to accurately predict their respective loads [1]. This challenge has existed for decades; thus a variety of load forecasting techniques, ranging from classical to intelligent systems, have been developed to date and highlighted in a number of studies. The ultimate distinction among these methods can be drawn on the basis of forecast accuracy.

In the design stages, utilities need to plan ahead for anticipated future load growth under

different scenarios. Their decisions and designs can affect the gain or loss of huge revenues

for their companies/utilities as well as customer satisfaction and future economic

growth in their area [2].

The decisions on sale-purchase, banking of power and generating electric power, load

switching, and infrastructure development can be carried out with the help of load

forecasting. In the market environment, precise forecasting is the basis of electrical energy

trade and spot price establishment for the system to gain the minimum electricity

purchasing cost [3].


Because of its high complexity, load forecasting remains a research challenge for electrical engineering scholars. Forecasting from historical data, especially for holidays and days with extreme weather, has remained difficult up to now [4]. To improve forecasting results, new mathematical, data-mining, and artificial intelligence tools are being implemented.

1.1 Business Needs of Load Forecasts

In today’s world, load forecasting is an important process in most utilities with the

applications spread across several departments, such as planning department, operations

department, trading department, etc. The business needs of utilities include, but are not limited to, the following:

1) Energy Purchasing: Whether a utility purchases its own energy supplies from the

market place, or outsources this function to other parties, load forecasts are essential for

purchasing energy. The utilities can perform bi-lateral purchases and asset commitment

in the long term, e.g., 10 years ahead. They can also make hedging and block purchases one month to three years ahead, and adjust (buy or sell) the energy purchase in

the day-ahead market.

2) Transmission and Distribution (T&D) Planning: Transmission and distribution planning is described in [4]. The utilities need to properly maintain and upgrade the system to satisfy the growth of demand in the service territory and improve reliability. Sometimes the utilities also need to secure real estate in advance to place future substations. The planning decisions rely heavily on forecasts, known as spatial load forecasts, which indicate when, where, and how much the load, as well as the number of customers, will grow.

3) Operations and Maintenance: In daily operations, load patterns obtained during the

load forecasting process guide the system operators to make switching and loading

decisions, and schedule maintenance outages.

4) Demand Side Management (DSM): Although many DSM activities belong to daily operations, it is worthwhile to separate DSM from the operations category because of its importance in the smart-grid world. A load forecast can support the decisions in load


control and voltage reduction. In addition, through the studies performed during load forecasting, utilities can carry out long-term planning according to the characteristics of the end-use behaviour of certain customers.

5) Financial Planning: The load forecasts can also help utility executives project medium- and long-term revenues, make decisions during acquisitions, approve or disapprove project budgets, plan human resources and technologies, etc.

According to the lead time range of each business need described above, the minimum

updating cycle and maximum horizon of the forecasts are summarized in Table 1.1.

TABLE 1.1. Needs of forecasts in utilities.

Business need        Minimum updating cycle    Maximum horizon
Energy purchasing    1 hour                    10 years and above
T&D planning         1 day                     30 years
Operations           15 minutes                2 weeks
DSM                  15 minutes                10 years and above
Financial planning   1 month                   10 years and above

1.2 Characteristics of the Power System Load

The system load is the sum of all consumers' loads at a given time. A good

understanding of the system characteristics helps to design reasonable forecasting

models and select appropriate models operating in different situations. Various factors

that influence the system load behaviour can be classified into the following major

categories:

● Weather

● Time

● Economy

● Random disturbance

The effects of all these factors are introduced in the remaining part of this section to

provide a basic understanding of the load characteristics.


1.2.1 Weather

Weather factors include temperature, humidity, rainfall, wind speed, cloud cover, light intensity, etc. A change in the weather changes consumers' comfort and, in turn, the usage of appliances such as space heaters, water heaters, and air conditioners. Weather-sensitive load also includes agricultural irrigation equipment, owing to the irrigation needs of cultivated plants. In areas where summer and winter differ greatly meteorologically, the load patterns also differ greatly.

Normally, the intraday temperatures are the most important weather variables in terms of their effects on the load; hence they are often selected as the independent variables in Electrical Load Forecasting (ELF). Temperatures of the previous days also affect the load profile; for example, continuous high-temperature days might lead to heat build-up and, in turn, a new demand peak. Humidity is also an important factor, because it greatly affects human comfort: people feel hotter at 35 °C with 70% relative humidity than at 37 °C with 50% relative humidity. That is why the temperature-humidity index (THI) is sometimes employed as an input factor in load forecasting. Furthermore, the wind chill index (WCI) is another factor, one that measures the feeling of cold. Selecting the appropriate weather variables as the inputs of ELF is thus a meaningful topic.
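Heating and cooling degree days (HDD and CDD, listed in the Abbreviations and used later in Table 4.1) are the standard transforms that turn raw temperatures into such load-relevant inputs. A minimal Python sketch, assuming daily mean temperatures in degrees Celsius and a base temperature of 18 °C (the base value is an illustrative assumption; utilities calibrate it to their region):

```python
def degree_days(t_mean_c, base_c=18.0):
    """Return (HDD, CDD) for one day from its mean temperature in Celsius.

    HDD measures how far the day fell below the base temperature (heating
    demand); CDD measures how far it rose above it (cooling demand).
    The 18 C base is an assumption, not a value fixed by this thesis.
    """
    hdd = max(0.0, base_c - t_mean_c)
    cdd = max(0.0, t_mean_c - base_c)
    return hdd, cdd

# Example: a 25 C summer day has no heating demand and 7 cooling degree days.
print(degree_days(25.0))  # (0.0, 7.0)
```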

1.2.2 Time

Time factors influencing the load include the time of day, holidays, the weekday/weekend distinction, and the season. The weekend or holiday load curve is lower than the weekday curve, owing to the decrease in working load. Shifts to and from daylight saving time and the start of the school year also contribute to significant changes in the previous load profiles. Periodicity is another property of the load curve: there is very strong daily, weekly, seasonal, and yearly periodicity in the load data. Making good use of this property can benefit the load forecasting result; such calendar effects are commonly encoded as dummy inputs, as sketched below.
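A minimal sketch of encoding calendar effects as binary dummy inputs; the particular feature set here is illustrative, not the exact set of dummy variables used later in Table 4.1:

```python
from datetime import date

def calendar_features(d, holidays):
    """Encode day-of-week, weekend, and holiday effects as binary dummies."""
    dow = [1 if d.weekday() == k else 0 for k in range(7)]  # Mon..Sun one-hot
    weekend = 1 if d.weekday() >= 5 else 0                  # Saturday or Sunday
    holiday = 1 if d in holidays else 0
    return dow + [weekend, holiday]

# New Year's Day 1999 fell on a Friday and is flagged as a holiday.
print(calendar_features(date(1999, 1, 1), {date(1999, 1, 1)}))
# [0, 0, 0, 0, 1, 0, 0, 0, 1]
```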


1.2.3 Economy

Electricity is a kind of commodity. The economic situation also influences the

utilization of this commodity. Economic factors, such as the degree of industrialization,

price of electricity, and load management policy, have significant impacts on the system

load growth/decline trend. With the development of modern electricity markets, the relationship between electricity price and load profile is even stronger. Although time-of-use pricing and demand-side management arrived before deregulation, the volatility

of spot markets and incentives for consumers to adjust loads are potentially of a much

greater magnitude. At low prices, elasticity is still negligible, but at times of extreme

conditions, price-induced rationing is a much more likely scenario in a deregulated market

compared to that under central planning.

1.2.4 Random Disturbance

The modern power system is composed of numerous electricity users. Although it is not possible to predict how each individual user consumes energy, the total load of all the small users shows good statistical regularity and, in turn, leads to smooth load curves. This is the groundwork of load forecasting. However, the start-up and shutdown of large loads, such as steel mills, synchrotrons, and wind tunnels, always produce an obvious impulse in the load curve. This is a random disturbance: for the dispatchers, the start-up and shutdown times of these users are quite random, i.e., there is no obvious rule for when and how they draw power from the grid. When data from such a load curve are used in load forecasting training, the impulse component of the load adds to the difficulty of forecasting. Special events, which are known in advance but whose effect on load is not certain, are another source of random disturbance. A typical special event is a World Cup football match, which dispatchers know will increase television usage, though they cannot precisely estimate by how much. Other typical events include strikes and the government's compulsory demand-side management due to a forecasted electricity shortage.


1.3 Classification of Developed ELF Methods

In terms of lead time, load forecasting is divided into four categories:

● Long-term forecasting, with a lead time of more than one year
● Mid-term forecasting, with a lead time of six months to one year
● Short-term load forecasting, with a lead time of one week to six months
● Very short-term load forecasting, with a lead time shorter than one day

Different categories of forecasting serve different purposes. This thesis focuses on short-term load forecasting, which serves the next day(s)' unit commitment and reliability analysis.

1.4 Short-Term Electrical Load Forecast

The term “short” implies prediction times on the order of days. The basic quantity of interest in short-term load forecasting is, typically, the daily integrated total system load. In addition to predicting the hourly values of the system load, short-term electrical load forecasting (STELF) is also concerned with forecasting:

● the daily peak system load
● the values of the system load at certain times of the day
● the daily, weekly, and monthly system energy

This dissertation develops a novel constructive feed-forward neural network model for the short-term electrical load forecasting process. The term “short-term” here refers to initiating all forecasts over a period of one week to six months.

1.5 Application of Short-Term Electrical Load Forecast

STELF plays a key role in the formulation of economic, reliable, and secure operating

strategies for the power system [5]. The principal objective of the STELF function is to provide the load predictions for:

● the basic generation scheduling functions
● assessing the security of the power system at any time
● timely dispatcher information

The primary application of the STELF function is to drive the scheduling functions that

determine the most economic commitment of generation sources consistent with

reliability requirements, operational constraints and policies, and physical, environmental,

and equipment limitations. For purely hydro systems, the load forecasts are required for the

hydro scheduling function to determine the optimal releases from the reservoirs and

generation levels in the power houses. For purely thermal systems, the load forecasts are

needed by the unit commitment function to determine the minimal cost hourly strategies for

the start-up and shutdown of units to supply the forecast load. For mixed hydro and thermal

systems, the load forecasts are required by the hydro-thermal coordination function to

schedule the hourly operation of the various resources so as to minimize production costs.

The hydro scheduling, unit commitment, and hydro-thermal coordination functions require system load forecasts for the next day or the next week to determine the least-cost operating plans subject to the various constraints imposed on system operation.

A second application of STELF is for predictive assessment of the power system security.

The system load forecast is an essential data requirement of the off-line network analysis

function for the detection of future conditions under which the power system may be

vulnerable. This information permits the dispatchers to prepare the necessary corrective

actions (e.g. bringing peaking units on line, load shedding, power purchases, switching

operations) to operate the power systems securely.

The third application of STELF is to provide system dispatchers with timely information, i.e., the most recent load forecast, with the latest weather prediction and random behaviour taken into account. The dispatchers need this information to operate the system economically and reliably.


1.6 Specific Aims of the Thesis

This thesis is built on the following objectives:

1) To develop a new straightforward model for electrical load forecasting (CAELF) using

constructive feed-forward NN that might reduce the current difficulties of NN for the

STELF problem.

2) To predict loads of holidays and fast load changes effectively using CAELF.

3) To compare the performance of CAELF with other conventional NN-based ELF models.

Chapter 2

Literature Review

Thousands of papers and reports have been published in the load forecasting field over the past 50 years. The literature review presented in this chapter concentrates on the Short-Term Electrical Load Forecasting (STELF) literature published in reputed journals. The papers are reviewed from three aspects: the techniques developed or applied, the various variables deployed, and the representative work done by several major research groups. The review focuses on the major developments in the field rather than covering every aspect of the matter. The comments on the papers in this review are made at the conceptual level. This chapter is organized as follows: Section 2.1 gives an overview; Section 2.2 reviews the statistical techniques, including regression analysis and time series analysis, applied to STELF; and Section 2.3 presents the neural network techniques.

2.1 Overview

Fig. 2.1 shows a typical STELF process conducted in the utilities that rely on the weather

information. Weather and load history are taken as the inputs to the modelling process.

After the parameters are estimated, the model and weather forecast are extrapolated to

generate the final forecast. A time series, including the load series, can be decomposed to

systematic variation and noise. The modelling process in Fig. 2.1 tends to capture the

systematic variation, which, as an input to the extrapolating process, is crucial to the

forecast accuracy. As a consequence, a large variety of pioneering research and practice in the

field of STELF has been devoted to the modelling process. Most of the model development

work can be summarized from two aspects: techniques and variables.

In addition, people have been adopting or developing various techniques for STELF, treating it as a time series forecasting problem. Many of these techniques can be roughly


categorized into two groups: statistical approaches, such as regression analysis [6] and

time series analysis [7], and Artificial Intelligence (AI) based approaches, such as ANNs [8]. Various combinations of these techniques have also been studied and applied to STELF

problems.

On the other hand, people have been seeking the most suitable variables for each particular

problem and trying to generalize the conclusions to interpret the causality of the electric

load consumption. Most of these efforts were embedded coherently into the

development of the techniques. For instance, temperature and relative humidity were

considered in [10], while the effect of humidity and wind speed were considered through a

linear transformation of temperature in the improved version [11]. In general, the electric

load is mainly driven by nature and human activities. The effects of nature are normally

reflected by weather variables, e.g., temperature, while the effects of human activities are

normally reflected by the calendar variables, e.g., business hours.

The combined effects of both elements exist as well but are nontrivial. With the progress of

deregulation, more and more parties have joined the energy markets, where electricity is

traded as a commodity. In some situations, the consumers tend to shift electricity

consumption from the expensive hours to other times when possible. Price information

would affect the load profiles in such a price - sensitive environment. Following this

thought, a price-sensitive load forecaster was proposed, whose results are reported to be superior to an existing STELF program [12]. Since the price-sensitive environment is not

generic in the current Bangladesh utility industry, price information was not included in

Fig. 2.1 or the scope of this thesis.

Fig. 2.1 A typical STELF process, adapted from [9]


Although the majority of the literature in STELF is on the modelling process, there is some

research concerning other aspects to improve the forecast. Weather forecast, as an input to

the extrapolating process, is also very important to the accuracy of STELF. Consequently,

another branch of research work is focusing on developing, improving or incorporating the

weather forecast [13]. A temperature forecaster is proposed for STELF [14].

2.2 Statistical Approaches

2.2.1 Regression Analysis

A regression-based approach to STELF is proposed by Papalexopoulos and Hesterberg [6].

The proposed approach was reported to be tested using Pacific Gas and Electric Company’s

(PG&E) data for the peak and hourly load forecasts of the next 24 hours. This is one of the

few papers fully focused on regression analysis for STELF in the past 20 years [15]-[17].

Several modelling concepts for applying multiple linear regression to STELF were used: the weighted least squares technique, temperature modelling using heating and cooling degree functions, holiday modelling using binary variables, a robust parameter estimation method, and so on. Through a thorough test, the new model was concluded to be superior to the existing one used at PG&E. The weighted least squares idea is sketched below.
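A minimal NumPy sketch of weighted least squares with degree-day and holiday-dummy regressors; the design matrix, weights, and numbers are illustrative assumptions, not PG&E's actual model:

```python
import numpy as np

# Illustrative design matrix: intercept, HDD, CDD, holiday dummy (one row per day).
X = np.array([[1.0, 12.0, 0.0, 0.0],
              [1.0,  0.0, 7.0, 0.0],
              [1.0,  0.0, 9.0, 1.0],
              [1.0,  5.0, 0.0, 0.0]])
y = np.array([930.0, 880.0, 760.0, 905.0])  # observed daily load, arbitrary units
w = np.array([1.0, 1.0, 0.25, 1.0])         # down-weight a suspected outlier day

# Weighted least squares: scale each row by sqrt(w), then solve ordinary LS.
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta)  # [intercept, HDD coefficient, CDD coefficient, holiday effect]
```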

While the paper clearly introduced the proposed approach, two issues appeared in the details. Firstly, the method of weighted least squares was applied to "minimize the effect of outliers". However, there is neither an analysis of the data showing the existence of outliers nor supporting data showing their cause. Furthermore, the new model using weighted least squares was compared only to the existing model, not to the same model without the weighted least squares technique.

Therefore, the advantage of the proposed weighted least squares method was not convincing. Secondly, adding noise to the temperature history to obtain a more robust forecast was not well justified. This issue was also pointed out by Larson in the appended discussion of the paper, and the authors' response did not provide strong evidence to support their statement. Nevertheless, Papalexopoulos and Hesterberg's paper offers comprehensive groundwork for applying regression analysis to STELF.

A nonparametric regression based approach was applied to STELF in [18]. The load model

was constructed to reflect the probability density function of load and the factors that

affected the load. The corresponding load forecast was the conditional expectation of the

load given the explanatory variables, including time, weather conditions, etc. The proposed method did not require a weather forecast to produce the load forecast, which differed from the regression-based approaches discussed above. Three weeks of load and weather

history were used to generate a one-week load forecast. The results were shown to be competitive with those of an ANN-based approach. Since the proposed approach was tested using only one set of summer data, it was not quite convincing as to whether the method would work well throughout the year. Further thorough tests were necessary to show the credibility of this approach.

2.2.2 Time Series Analysis

Regression techniques were combined with ARIMA models for STELF in [19]. Regression techniques were used to model and forecast the peak and low loads, as well as to weather-normalize the load history, i.e., to "remove the weather-sensitive trend" from the load series. ARIMA was then applied to the weather-normalized load to produce the forecast. Finally, the forecasted normalized load was adjusted based on the forecasted peak and low loads.

ARIMA models, together with other Box-Jenkins time series models, were applied to STELF and shown to be "well suited to this application" in Hagan and Behr's paper [7]. A nonlinear transformation, more precisely a 3rd-order polynomial of the temperature, was proposed to reflect the nonlinear relationship between load and temperature. Three time series methods, ARIMA models, standard transfer function models, and transfer function models with the nonlinear transformation, were compared with a conventional procedure deployed in the utilities, which relied on input from the dispatchers, over three 20-day periods (winter, spring, and summer) in 1984. The results showed that all three types of time series models performed better than the conventional forecasting approach. Among the time series models, the nonlinear extension of the transfer function model provided the best results.

The modified ARIMA method produced STELF results with better accuracy than the other three approaches. Several other time series modelling approaches have also been applied to STELF. Threshold autoregressive models with a stratification rule were discussed in [20]. A modified ARMA approach was proposed to include non-Gaussian process considerations [21]. An adaptive ARMA approach was tested against the conventional Box-Jenkins approach and showed better accuracy [22]. A method using periodic autoregressive models was reported [23]. An ARMAX model, with particle swarm optimization as the technique to identify the parameters, was proposed for STELF [24]. Beyond Box-Jenkins models, a nonlinear system identification technique has been applied to STELF as well [25]. All these techniques and the associated engineering solutions provided good insights into certain aspects of STELF.

2.3 Neural Network based Approaches

The history of applying NN to STELF can be traced back to the early 1990s [27], when

ANN was proposed as an algorithm to combine both time series and regression approaches.

In addition, the NN was expected to perform nonlinear modelling for the relationship

between the load and weather variables and be adaptable to new data. The algorithm was

tested using Puget Sound Power and Light Company's data, which included hourly temperature and load for the Seattle/Tacoma area from Nov. 1, 1988 to Jan. 30, 1989. Three test

cases were constructed for peak, total, and hourly load of the day respectively. Normal

weekdays were the focus of the test cases. The proposed algorithm was compared with an

existing algorithm deployed in the utility.

A number of approaches in the literature (e.g., [27]-[37]) attempt to solve the STELF problem using neural networks (NNs). It has been confirmed that using NNs for STELF outperforms human-based computational analysis in terms of accuracy and ease of maintenance, because an NN can map inputs to outputs even as the load (i.e., the output) grows day by day [38]. Feed-forward NNs (FFNNs) have been used in [33]-[37] to solve the ELF problem for different regions at a reasonable computational cost. FFNNs are well suited to mapping static relationships between inputs and outputs and ultimately give good results in ELF. However, FFNNs need large amounts of historical data and have a limited capability to predict the loads of holidays and fast load changes [39]. To overcome these shortcomings, a number of recent efforts [28], [30]-[32] have used echo state NNs, radial basis function NNs, recurrent NNs, and nonlinear autoregressive NNs, respectively. The performance of these NN models in predicting the electrical load is satisfactory compared to the FFNN, but they are computationally expensive, demanding substantial hardware setups as well as experts for maintenance.

Chapter 3

Artificial Neural Network

3.1 Introduction

Neural networks, more accurately called Artificial Neural Networks (ANN), are

computational models that consist of a number of simple processing units that communicate

by sending signals to each other over a large number of weighted connections. They were originally inspired by the human brain, in which a biological neuron collects signals from other neurons through a host of fine structures called dendrites [40].

The neuron sends out spikes of electrical activity through a long, thin strand known as an

axon, which splits into thousands of branches. At the end of each branch, a structure called a

synapse converts the activity from the axon into electrical effects that inhibit or excite

activity in the connected neurons. When a neuron receives excitatory input that is

sufficiently large compared with its inhibitory input, it sends a spike of electrical activity

down its axon.

Learning occurs by changing the effectiveness of the synapses so that the influence of one

neuron on another changes. Like human brains, neural networks also consist of processing

units (artificial neurons) and connections (weights) between them. The processing units

transport incoming information on their outgoing connections to other units. The "electrical"

information is simulated with specific values stored in those weights that make these

networks have the capacity to learn, memorize, and create relationships amongst data. A

very important feature of these networks is their adaptive nature where "learning by

example" replaces "programming" in solving problems. This feature renders these


computational models very appealing in application domains where one has little or

incomplete understanding of the problems to be solved, but where training data are available.

There are many different types of neural networks, and they are being used in many fields; new uses for neural networks are devised daily by researchers. Some of the most traditional applications include [41][42]:

● Classification: to determine military operations from satellite photographs; to distinguish among different types of radar returns (weather, birds, or aircraft); to identify diseases of the heart from electrocardiograms.
● Noise reduction: to recognize a number of patterns (voice, images, etc.) corrupted by noise.
● Prediction: to predict the value of a variable given historic values. Examples include forecasting of various types of loads, market and stock forecasting, and weather forecasting. The model built in this thesis falls into this category.

3.2 Fundamentals of Neural Networks

Neural networks, sometimes referred to as connectionist models, are parallel-distributed models that have several distinguishing features [43]:

1) A set of processing units;
2) An activation state for each unit, which is equivalent to the output of the unit;
3) Connections between the units; generally each connection is defined by a weight wjk that determines the effect that the signal of unit j has on unit k;
4) A propagation rule, which determines the effective input of a unit from its external inputs;
5) An activation function, which determines the new level of activation based on the effective input and the current activation;
6) An external input (bias, offset) for each unit;
7) A method for information gathering (the learning rule);
8) An environment within which the system can operate and which provides input signals and, if necessary, error signals.


3.2.1 Processing Unit

A processing unit (Fig. 3.1), also called a neuron or node, performs a relatively simple job; it

receives inputs from neighbors or external sources and uses them to compute an output

signal that is propagated to other units.

Fig. 3.1 Processing unit: inputs $x_0, x_1, \ldots, x_n$ enter unit $j$ through weights $w_{j0}, w_{j1}, \ldots, w_{jn}$; with bias $\theta_j$, the net input is $a_j = \sum_{i} w_{ji} x_i + \theta_j$ and the output is $y_j = f(a_j)$

Within neural systems there are three types of units:

1) Input units, which receive data from outside of the network;

2) Output units, which send data out of the network;

3) Hidden units, whose input and output signals remain within the network.

Each unit j can have one or more inputs x0, x1, x2, … xn, but only one output yj. An input to a

unit is either the data from outside of the network, or the output of another unit, or its own

output.
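In code, a processing unit is simply a weighted sum followed by an activation function. A minimal Python sketch (the activation function f is the subject of the next subsection):

```python
def unit_output(x, w, theta, f):
    """One processing unit: net input a_j = sum_i w_ji * x_i + theta_j,
    output y_j = f(a_j)."""
    a = sum(w_i * x_i for w_i, x_i in zip(w, x)) + theta
    return f(a)

# Example: two inputs with an identity activation.
print(unit_output([1.0, 2.0], [0.5, -0.25], 0.1, lambda a: a))  # 0.1
```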

3.2.2 Activation Function

Most units in a neural network transform their net inputs by using a scalar-to-scalar function

called an activation function, yielding a value called the unit's activation. Except possibly for

output units, the activation value is fed to one or more other units. Activation functions with

a bounded range are often called squashing functions. Some of the most commonly used

activation functions are [44]:

1) Identity function:

$$f(x) = x \qquad (3.1)$$


It is obvious that the input units use the identity function. Sometimes a constant is multiplied

by the net input to form a linear function.

Fig. 3.2 Identity function

2) Binary step function:

Also known as threshold function or Heaviside function. The output of this function is

limited to one of the two values:

$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases} \qquad (3.2)$$

where $\theta$ is the threshold.

This kind of function is often used in single layer networks.

Fig. 3.3 Binary step function

3) Sigmoid function (Fig. 3.4)

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (3.3)$$

This function is especially advantageous for use in neural networks trained by back-propagation, because it is easy to differentiate and thus can dramatically reduce the computational burden of training. It applies to applications whose desired output values are between 0 and 1.


Fig. 3.4 Sigmoid function

4) Bipolar sigmoid function:

$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad (3.4)$$

This function has properties similar to those of the sigmoid function. It works well for applications that yield output values in the range [-1, 1].

Fig. 3.5 Bipolar sigmoid function

Activation functions for the hidden units are needed to introduce non-linearity into the

networks. The reason is that a composition of linear functions is again a linear function.

However, it is the non-linearity (i.e., the capability to represent nonlinear functions) that

makes multi-layer networks so powerful. Almost any nonlinear function does the job,

although for back-propagation learning it must be differentiable and it helps if the function is

bounded. The sigmoid functions are the most common choices [45].

For the output units, activation functions should be chosen to suit the distribution of the target values. We have already seen that for binary [0, 1] outputs the sigmoid function is an excellent choice. For continuous-valued targets with a bounded range, the sigmoid functions are again useful, provided that either the outputs or the targets are scaled to the range of the output activation function.
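A minimal NumPy sketch of the four activation functions above (Eqs. 3.1-3.4):

```python
import numpy as np

def identity(x):                 # Eq. 3.1
    return x

def binary_step(x, theta=0.0):   # Eq. 3.2, threshold theta
    return np.where(x >= theta, 1.0, 0.0)

def sigmoid(x):                  # Eq. 3.3, output bounded to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):          # Eq. 3.4, output bounded to (-1, 1)
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 5)
print(sigmoid(x))  # smooth and differentiable, hence suited to back-propagation
```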

3.2.3 Network Topologies

The topology of a network is defined by the number of layers, the number of units per layer,

and the interconnection patterns between layers. They are generally divided into two

categories based on the pattern of connections:

1) Feed-forward Neural Networks: where the data flow from input units to output units is

strictly feed-forward. The data processing can extend over multiple layers of units, but no

feedback connections are present. That is, connections extending from outputs of units to

inputs of units in the same layer or previous layers are not permitted. Feed-forward

networks are the main focus of this thesis. Details have been described in Section 3.3.

2) Recurrent Neural Networks: These contain feedback connections. Contrary to feed-forward

networks, the dynamical properties of the network are important. In some cases, the

activation values of the units undergo a relaxation process such that the network will

evolve to a stable state in which activation does not change further. In other applications

in which the dynamical behaviour constitutes the output of the network, the changes of

the activation values of the output units are significant. A schematic model of an RNN is shown in Fig. 3.6.

Fig. 3.6 Recurrent neural network



3.2.4 Network Learning

The functionality of a neural network is determined by the combination of the topology

(number of layers, number of units per layer, and the interconnection pattern between the

layers) and the weights of the connections within the network. The topology is usually held

fixed, and the weights are determined by a certain training algorithm. The process of

adjusting the weights to make the network learn the relationship between the inputs and

targets is called learning, or training. Many learning algorithms have been invented to help

find an optimum set of weights that result in the solution of the problems. They can roughly

be divided into two main groups: Supervised and unsupervised about these learning

methodologies are mentioned below:

1) Supervised Learning: The network is trained on pairs of inputs and desired outputs (i.e., target values). These input-output pairs are provided by an external teacher, or by the system containing the network. The difference between the real outputs and the desired outputs is used by the algorithm to adapt the weights in the network (Fig. 3.7). This is often referred to as a function approximation problem: given training data consisting of pairs of input patterns x and corresponding targets t, the goal is to find a function f(x) that matches the desired response for each training input.

Fig. 3.7 Supervised learning model, adapted from [46]

2) Unsupervised Learning: In unsupervised learning, there is no feedback from the

environment to indicate if the outputs of the network are correct. The network must

discover features, regularities, correlations, or categories in the input data automatically.

In fact, for most varieties of unsupervised learning, the targets are the same as inputs. In

Chapter 3 Artificial Neural Network

22

other words, unsupervised learning usually performs the same task as an auto-associative

network, compressing the information from the inputs.
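As a tiny concrete instance of supervised learning, the sketch below adapts the weights of a single linear unit by gradient descent on the squared error between outputs and targets; it is purely illustrative and is not the training algorithm developed later in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 training patterns, 3 inputs each
t = X @ np.array([0.5, -1.0, 2.0]) + 0.3   # targets from a known linear rule

w, b, lr = np.zeros(3), 0.0, 0.05
for _ in range(500):                       # repeated weight adjustment = learning
    y = X @ w + b                          # network outputs
    err = y - t                            # difference from desired outputs
    w -= lr * (X.T @ err) / len(t)         # move weights against the error gradient
    b -= lr * err.mean()

print(np.round(w, 2), round(b, 2))         # approaches [0.5, -1.0, 2.0] and 0.3
```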

3.2.5 Objective Function

To train a network and measure how well it performs, an objective function (or cost function)

must be defined to provide an unambiguous numerical rating of system performance.

Selection of an objective function is very important because the function represents the design goals and determines which training algorithm can be used. Developing an objective function that measures exactly what we want is not an easy task. A few basic functions are

very commonly used. One of them is the sum-of-squares error function,

$$E = \frac{1}{NP} \sum_{p=1}^{P} \sum_{i=1}^{N} \left( t_{pi} - y_{pi} \right)^2 \qquad (3.5)$$

where $N$ and $P$ are the numbers of output nodes and training patterns, respectively, $i$ indexes the output nodes, and $t_{pi}$ and $y_{pi}$ are the target and actual network outputs of the $i$th output unit for pattern $p$.
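Eq. 3.5 translates directly into NumPy; a minimal sketch:

```python
import numpy as np

def sum_squares_error(targets, outputs):
    """Eq. 3.5: squared error averaged over P patterns and N output nodes.

    targets, outputs: arrays of shape (P, N).
    """
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    P, N = targets.shape
    return ((targets - outputs) ** 2).sum() / (N * P)

# Two output nodes, one pattern: ((1-0.8)^2 + (0-0.1)^2) / 2 = 0.025
print(sum_squares_error([[1.0, 0.0]], [[0.8, 0.1]]))
```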

3.3 Feed-Forward Neural Networks

3.3.1 Basic Architecture

A layered feed-forward network consists of a certain number of layers, and each layer

contains a certain number of units. There is an input layer, an output layer, and one or more

hidden layers between the input and the output layer. Each unit receives its inputs directly

from the previous layer (except for input units) and sends its output directly to units in the

next layer (except for output units). Unlike recurrent networks, which contain feedback connections, there are no connections from any unit to the inputs of previous layers, to other units in the same layer, or to units more than one layer ahead. Every unit acts as an input only to the immediately following layer. Obviously, this class of networks is easier

to analyze theoretically than other general topologies because their outputs can be

represented with explicit functions of the inputs and the weights.


Fig. 3.8 Feed-forward neural network

An example of a layered network with one hidden layer is shown in Fig. 3.8. In this network

there are l inputs, m hidden units, and n output units. The output of the jth hidden unit is

obtained by first forming a weighted linear combination of the l input values, then adding a

bias,

a_j = \sum_{i=1}^{l} w_{ji}^{(1)} x_i + w_{j0}^{(1)}    (3.6)

where w_{ji}^{(1)} is the weight from input i to hidden unit j in the first layer and w_{j0}^{(1)} is the bias for hidden unit j. If we consider the bias term as the weight from an extra input x_0 = 1, Eq. 3.6 can be rewritten in the form

a_j = \sum_{i=0}^{l} w_{ji}^{(1)} x_i    (3.7)

The activation of hidden unit j can then be obtained by transforming the linear sum using an activation function f(\cdot):

h_j = f(a_j)    (3.8)


The outputs of the network can be obtained by transforming the activation of the hidden

units using a second layer of processing units. For each output unit k, first we get the linear

combination of the output of the hidden units,

a_k = \sum_{j=1}^{m} w_{kj}^{(2)} h_j + w_{k0}^{(2)}    (3.9)

Again we can absorb the bias and rewrite the above equation to,

a_k = \sum_{j=0}^{m} w_{kj}^{(2)} h_j    (3.10)

Then, applying the activation function f_2(\cdot) to Eq. 3.10, we get the k-th output

y_k = f_2(a_k)    (3.11)

Combining Eq. 3.7, Eq. 3.8, Eq. 3.10 and Eq. 3.11, we get the complete representation of the network as

y_k = f_2\left( \sum_{j=0}^{m} w_{kj}^{(2)} \, f\left( \sum_{i=0}^{l} w_{ji}^{(1)} x_i \right) \right)    (3.12)

The network of Fig. 3.8 has one hidden layer. We can easily extend it to two or more hidden layers by applying the above transformation repeatedly.

One thing we need to note is that the input units are very special units. They are hypothetical

units that produce outputs equal to their supposed inputs. No processing is done by these

input units.
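To make Eq. 3.12 concrete, the following is a minimal Python sketch of the forward pass of a one-hidden-layer network; the weight shapes, the use of a sigmoid for f, and the identity for f_2 are illustrative assumptions, not prescriptions from this thesis.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, W2):
    """Forward pass of Eq. 3.12 with absorbed biases.

    x:  input vector of length l
    W1: (m, l+1) first-layer weights, column 0 holding the biases w_j0
    W2: (n, m+1) second-layer weights, column 0 holding the biases w_k0
    """
    x = np.concatenate(([1.0], x))   # extra input x_0 = 1 (Eq. 3.7)
    h = sigmoid(W1 @ x)              # hidden activations h_j (Eq. 3.8)
    h = np.concatenate(([1.0], h))   # extra hidden unit h_0 = 1 (Eq. 3.10)
    return W2 @ h                    # outputs y_k, here with f_2 = identity

rng = np.random.default_rng(0)
y = forward(rng.uniform(-1, 1, 4), rng.uniform(-1, 1, (3, 5)), rng.uniform(-1, 1, (2, 4)))
print(y.shape)  # (2,)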

3.3.2 Representation Capability

Feed-forward neural networks provide a general framework for representing non-linear

functional mapping between a set of input variables and a set of output variables. The

representation capability of a network can be defined as the range of mappings that can be

implemented when the weights are varied. Theories [45], [47]-[48] show that:


1) Single-layer networks are capable of representing only linearly separable functions or

linearly separable decision domains.

2) Networks with two hidden layers can represent an arbitrary decision boundary to arbitrary accuracy with threshold activation functions, and can approximate any smooth mapping to any accuracy with sigmoid activation functions.

3.3.3 Network Structure Design

Determining the optimal architecture of an NN for a certain problem is not an easy matter. To reduce this complexity, various methodologies have been introduced. Some of those methods are discussed below.

3.3.3.1 Determination of Number of Hidden Layers

Because networks with two hidden layers can represent functions with any kind of shapes,

there is no theoretical reason to use networks with more than two hidden layers. It has also

been determined that for the vast majority of practical problems, there is no reason to use

more than one hidden layer. Problems that require two hidden layers are only rarely

encountered in practice. Even for problems requiring more than one hidden layer

theoretically, most of the time, using one hidden layer performs much better than using two

hidden layers in practice [41]. Training often slows dramatically when more hidden layers

are used. There are several reasons why we should use as few layers as possible in practice:

1) Most training algorithms for feed-forward networks are gradient-based. Each additional layer through which errors must be back-propagated makes the gradient more unstable.

The success of any gradient-directed optimization algorithm is dependent on the degree

to which the gradient remains unchanged as the parameters vary.

2) The number of local minima increases dramatically with more hidden layers. Most gradient-based optimization algorithms can only find local minima and thus miss the global minimum. Even when the training algorithm can in principle find the global minimum, there is a higher probability that, after many time-consuming iterations, we will find ourselves stuck in a local minimum and have to escape or start over.


Of course, it is possible that for a certain problem, using more hidden layers of just a few

units is better than using fewer hidden layers requiring too many units, especially for

networks that need to learn a function with discontinuities. In general, it is strongly

recommended that one hidden layer be the first choice for any practical feed-forward

network design. If using a single hidden layer with a large number of hidden units does not

perform well, then it may be worth trying a second hidden layer with fewer processing units.

3.3.3.2 Determination of Optimal Number of Hidden Units

Another important issue in designing a network is how many units to place in each layer.

Using too few units can fail to detect the signals fully in a complicated data set, leading to

underfitting. Using too many units will increase the training time, perhaps so much that it

becomes impossible to train it adequately in a reasonable period of time. A large number of

hidden units might cause overfitting, in which case the network has so much information

processing capacity, that the limited amount of information contained in the training set is

not enough to train the network.

The best number of hidden units depends on many factors – the numbers of input and output

units, the number of training cases, the amount of noise in the targets, the complexity of the

error function, the network architecture, and the training algorithm [45].

In most situations, there is no easy way to determine the optimal number of hidden units

without training using different numbers of hidden units and estimating the generalization

error of each. The best approach to find the optimal number of hidden units is trial and

error. In practice, we can use either the forward selection or backward selection to determine

the hidden layer size. Forward selection starts by choosing an appropriate criterion for evaluating the performance of the network. We then select a small number of hidden units (for example two, if it is difficult to guess a suitable starting size), train and test the network, and record its performance. Next, we slightly increase the number of hidden units and train and test again, until the error is acceptably small or no significant improvement is noted, whichever comes first.

Backward selection, in contrast with forward selection, starts with a large number of hidden units and then decreases the number gradually [41], [49]. This process is time-consuming, but it works well.
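A minimal sketch of forward selection in Python follows; the train_and_validate callback is a hypothetical stand-in for training an NN with a given number of hidden units and returning its validation error, and the tolerance values are illustrative.

def forward_selection(train_and_validate, start=2, max_units=50, min_gain=1e-4):
    """Grow the hidden layer until validation error stops improving.

    train_and_validate(n): trains a network with n hidden units and
    returns its validation error (hypothetical user-supplied function).
    """
    best_n, best_err = start, train_and_validate(start)
    for n in range(start + 1, max_units + 1):
        err = train_and_validate(n)
        if best_err - err < min_gain:   # no significant improvement: stop
            break
        best_n, best_err = n, err
    return best_n, best_err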

3.3.3.3 Training Algorithm of Neural Network

Back-propagation is the most commonly used method for training multi-layer feed-forward

networks. It can be applied to any feed-forward network with differentiable activation

functions. This technique was popularized by Rumelhart, Hinton and Williams [50].

For most networks, the learning process is based on a suitable error function, which is then

minimized with respect to the weights and biases. If a network has differentiable activation

functions, then the activations of the output units become differentiable functions of input

variables, the weights and bias. If we also define a differentiable error function of the

network outputs such as the sum-of-square error function, then the error function itself is a

differentiable function of the weights. Therefore, we can evaluate the derivative of the error

with respect to weights, and these derivatives can then be used to find the weights that

minimize the error function, by either using the popular gradient descent or other

optimization methods. The algorithm for evaluating the derivative of the error function is

known as back-propagation, because it propagates the errors backward through the network.

The back-propagation learning algorithm is intuitively appealing because it is based on a relatively simple concept: if the network gives the wrong answer, the weights are corrected so that the error is lessened, and as a result future responses of the network are more likely to be correct.

The back-propagation learning algorithm involves two phases: During the first phase the

input is presented and propagated forward through the network to compute the output value

Opk for each unit. This output is then compared with the targets, resulting in an error signal

δpk for each output unit. The second phase involves a backward pass through the network

(analogous to the initial forward pass) during which the error signal is passed to each unit in

the network and the appropriate weight changes are made. This second backward pass

allows the recursive computation of δ as indicated above. The first step is to compute δ for

each of the output units. This is simply the difference between the actual and desired output


values times the derivative of the squashing function. Then the weight changes for all

connections that feed into the final layer can be computed. After this is done, the δ’s are computed for all units in the penultimate layer. This propagates the errors back one layer, and the same

process can be repeated for every layer.

The significance of the process is that, as the network trains, the nodes in the intermediate

layers organize themselves such that different nodes learn to recognize different features of

the total input space. After training, when presented with an arbitrary input pattern that is

noisy or incomplete, the units in the hidden layers of the network will respond with an active

output if the new input contains a pattern that resembles the feature the individual unit learned to recognize during training. Conversely, hidden-layer units tend to inhibit their outputs if the input pattern does not contain the feature that they were trained to recognize.

The back-propagation network is a layered feed-forward network that is fully interconnected between adjacent layers. Thus, there are no feedback connections and no connections that bypass one layer to go directly to a later layer. Although only three layers are used in this discussion, more than one hidden layer is permissible.

Suppose a set of P vector pairs, (x_1, y_1), (x_2, y_2), \ldots, (x_P, y_P), which are examples of a functional mapping y = \phi(x), \; x \in R^N, \; y \in R^M [35]. The network will learn an approximation o = y'(x) through its training. Such training usually works, provided the training vector pairs have been chosen properly and there is a sufficient number of them. Learning in a neural network means finding an appropriate set of weights. The learning technique described here resembles the problem of finding the equation of a line that best fits a number of known points.

Let us consider an input vector, X_p = (x_{p1}, x_{p2}, \ldots, x_{pN})^t, applied to the input layer of the network. The subscript p refers to the p-th training vector. The input units distribute the values to the hidden-layer units. The net input to the j-th hidden unit is

net_{pj}^{h} = \sum_{i=1}^{N} w_{ji}^{h} x_{pi} + \theta_{j}^{h}    (3.13)

where w_{ji}^{h} is the weight of the connection from the i-th input unit to the j-th hidden unit, and \theta_{j}^{h} is the bias term. The superscript h refers to quantities on the hidden layer. Assuming that the activation of this node is equal to the net input, the output of this node is

i_{pj} = f_{j}^{h}(net_{pj}^{h})    (3.14)

where the function f_{j}^{h}(\cdot) is referred to as an activation function. Its domain is the set of activation values, net, of the neuron model.

The equations for the output nodes are

net_{pk}^{o} = \sum_{j=1}^{L} w_{kj}^{o} i_{pj} + \theta_{k}^{o}    (3.15)

o_{pk} = f_{k}^{o}(net_{pk}^{o})    (3.16)

where the superscript o refers to quantities on the output layer.

3.3.3.4 Update of Output-layer Weights

The error at a single output unit is defined as \delta_{pk} = (y_{pk} - o_{pk}), where the subscript p refers to the p-th training vector and k refers to the k-th output unit. Here, y_{pk} is the desired output and o_{pk} is the actual output of the k-th unit. The error to be minimized is the sum of the squares of the errors for all output units:

E_p = \frac{1}{2} \sum_{k=1}^{M} \delta_{pk}^{2}    (3.17)

To determine the direction in which to change the weights, the negative of the gradient of E_p with respect to the weights, w_{kj}, is calculated. The values of the weights can then be adjusted such that the total error is reduced. It is often useful to think of E_p as a surface in weight space. From Eq. (3.17) and the definition of \delta_{pk},

E_p = \frac{1}{2} \sum_{k} (y_{pk} - o_{pk})^{2}    (3.18)


\frac{\partial E_p}{\partial w_{kj}^{o}} = -(y_{pk} - o_{pk}) \frac{\partial f_{k}^{o}}{\partial (net_{pk}^{o})} \frac{\partial (net_{pk}^{o})}{\partial w_{kj}^{o}}    (3.19)

where Eq. (3.18) has been used for the output value o_{pk}, along with the chain rule for partial derivatives. The last factor of Eq. 3.19 is

\frac{\partial (net_{pk}^{o})}{\partial w_{kj}^{o}} = \frac{\partial}{\partial w_{kj}^{o}} \left( \sum_{j=1}^{L} w_{kj}^{o} i_{pj} + \theta_{k}^{o} \right) = i_{pj}    (3.20)

Combining Eq. 3.19 and Eq. 3.20, the negative gradient is

-\frac{\partial E_p}{\partial w_{kj}^{o}} = (y_{pk} - o_{pk}) \, f_{k}^{o\prime}(net_{pk}^{o}) \, i_{pj}    (3.21)

The magnitude of the weight change is taken to be proportional to the negative gradient. Thus, the weights on the output layer are updated according to

w_{kj}^{o}(t+1) = w_{kj}^{o}(t) + \Delta_p w_{kj}^{o}(t)    (3.22)

where

\Delta_p w_{kj}^{o} = \eta \, (y_{pk} - o_{pk}) \, f_{k}^{o\prime}(net_{pk}^{o}) \, i_{pj}    (3.23)

The factor η is called the learning rate parameter. If a sigmoid activation function is used, the weight update equation for an output unit becomes

w_{kj}^{o}(t+1) = w_{kj}^{o}(t) + \eta \, (y_{pk} - o_{pk}) \, o_{pk} (1 - o_{pk}) \, i_{pj}    (3.24)

By defining the output-layer error term

\delta_{pk}^{o} = (y_{pk} - o_{pk}) \, f_{k}^{o\prime}(net_{pk}^{o})    (3.25)

and combining Eq. 3.24 and Eq. 3.25, the weight update equation becomes

w_{kj}^{o}(t+1) = w_{kj}^{o}(t) + \eta \, \delta_{pk}^{o} \, i_{pj}    (3.26)

3.3.3.5 Update of Hidden Layer Weights

The error of the hidden layer is given by

E_p = \frac{1}{2} \sum_{k} (y_{pk} - o_{pk})^{2}
    = \frac{1}{2} \sum_{k} \left( y_{pk} - f_{k}^{o}(net_{pk}^{o}) \right)^{2}
    = \frac{1}{2} \sum_{k} \left( y_{pk} - f_{k}^{o}\left( \sum_{j} w_{kj}^{o} i_{pj} + \theta_{k}^{o} \right) \right)^{2}    (3.27)

The gradient of E_p with respect to the hidden-layer weights is

\frac{\partial E_p}{\partial w_{ji}^{h}} = -\sum_{k} (y_{pk} - o_{pk}) \frac{\partial o_{pk}}{\partial (net_{pk}^{o})} \frac{\partial (net_{pk}^{o})}{\partial i_{pj}} \frac{\partial i_{pj}}{\partial (net_{pj}^{h})} \frac{\partial (net_{pj}^{h})}{\partial w_{ji}^{h}}    (3.28)

Each of the factors in Eq. 3.28 can be calculated explicitly from the previous equations. The result is

\frac{\partial E_p}{\partial w_{ji}^{h}} = -\sum_{k} (y_{pk} - o_{pk}) \, f_{k}^{o\prime}(net_{pk}^{o}) \, w_{kj}^{o} \, f_{j}^{h\prime}(net_{pj}^{h}) \, x_{pi}    (3.29)

The hidden-layer weights are updated in proportion to the negative of Eq. 3.29:

\Delta_p w_{ji}^{h} = \eta \, f_{j}^{h\prime}(net_{pj}^{h}) \, x_{pi} \sum_{k} (y_{pk} - o_{pk}) \, f_{k}^{o\prime}(net_{pk}^{o}) \, w_{kj}^{o}

Using Eq. 3.25, this becomes

\Delta_p w_{ji}^{h} = \eta \, f_{j}^{h\prime}(net_{pj}^{h}) \, x_{pi} \sum_{k} \delta_{pk}^{o} \, w_{kj}^{o}    (3.30)

Every weight update on the hidden layer depends on all the error terms \delta_{pk}^{o} on the output layer. The known errors on the output layer are propagated back to the hidden layer to determine the appropriate weight changes on that layer. By defining the hidden-layer error term

\delta_{pj}^{h} = f_{j}^{h\prime}(net_{pj}^{h}) \sum_{k} \delta_{pk}^{o} \, w_{kj}^{o}    (3.31)

the weight update equation becomes analogous to that for the output layer:

w_{ji}^{h}(t+1) = w_{ji}^{h}(t) + \eta \, \delta_{pj}^{h} \, x_{pi}    (3.32)

The amount of weight adjustment depends on three factors: δ, η, and x. The size of the weight adjustment is proportional to δ, the error value of the unit. Thus, a larger error value for that unit results in a larger adjustment to its incoming weights.

The weight adjustment is also proportional to x, the output value for that originating unit. If

this output value is small, then the weight adjustments are small. If this output value is large,

then the weight adjustment is large. Thus, a higher activation value for an incoming unit results

in a larger adjustment to its outgoing weight.

The variable η in the weight adjustment equation is the learning rate. Its value, commonly between 0.25 and 0.75, is chosen by the neural network user and usually reflects the rate of learning of the network [50].
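The update rules of Eqs. 3.22-3.32 can be collected into a single training step. Below is a minimal Python sketch of one such step for a one-hidden-layer network with sigmoid hidden units and linear output units; the function and variable names are our own, and biases are omitted for brevity.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bp_step(x, y, Wh, Wo, eta=0.5):
    """One back-propagation update for a single pattern (biases omitted).

    x: input vector (N,); y: target vector (M,)
    Wh: hidden weights (L, N); Wo: output weights (M, L)
    """
    # Forward pass (Eqs. 3.13-3.16)
    net_h = Wh @ x
    i_h = sigmoid(net_h)                    # hidden outputs i_pj
    o = Wo @ i_h                            # linear output units o_pk
    # Output-layer error term (Eq. 3.25; f' = 1 for linear outputs)
    delta_o = y - o
    # Hidden-layer error term (Eq. 3.31; sigmoid derivative is i(1 - i))
    delta_h = i_h * (1.0 - i_h) * (Wo.T @ delta_o)
    # Weight updates (Eqs. 3.26 and 3.32)
    Wo += eta * np.outer(delta_o, i_h)
    Wh += eta * np.outer(delta_h, x)
    return Wh, Wo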

Chapter 4

Proposed Model for Electrical Load Forecasting

This chapter describes the proposed model (i.e., CAELF) for solving the forecasting of electrical load demand efficiently. In order to evaluate CAELF, extensive experimental results are reported in this chapter.

4.1 Electrical Load

If an electric circuit has a well-defined output terminal, the circuit connected to this terminal is the load. In other words, the term “load” may also refer to the power consumed by a circuit or consumer. Estimating the electrical load demand in a locality or a large area is a very difficult task, because consumers' electricity consumption depends on different criteria, such as weather conditions, the nature of the day (i.e., working days versus weekend days), and so on.

Thus, estimating or forecasting the electrical load demand is vitally important for the electric

industry in the deregulated economy as well as essential to the operation and planning of a

utility company. Load forecasting helps an electric utility to make important decisions

including decisions on purchasing and generating electric power, load switching, and

infrastructure development. It is also important for energy suppliers, financial institutions,

and other participants in electric energy generation, transmission, distribution, and markets.

4.2 Constructive Approach for Electrical Load Forecasting (CAELF)

This thesis describes a new single-stage electrical load forecasting model using a constructive approach, called CAELF. This model differs from the previous works in such a way


that CAELF determines the appropriate NN architecture in advance, before the ELF starts, using constructive NN training. In contrast, the previous approaches (e.g., [33]-[37]) generally use a fixed NN architecture in which the hidden neurons of the hidden layer are selected randomly before the ELF starts. It is well known that the random selection of hidden neurons affects the generalization performance of NNs, since the performance of any NN is greatly dependent on its architecture [51], [52]. Thus, determining the number of hidden neurons automatically provides a novel approach to building NN-based learning models for electrical load forecasting.

On the other hand, the proposed CAELF efficiently overcomes an existing problem of NN-based ELF approaches, namely the limited capability of FFNNs to predict the loads of holidays and fast load changes [39]. Although a number of recent efforts (e.g., [28]-[32]) have been made to overcome this shortcoming, they suffer from huge computational cost. In this regard, CAELF uses a very simple NN architecture involving a constructive technique that does not require expensive computation during training. Furthermore, CAELF ultimately enhances the prediction performance for holidays and fast-changing loads.

Fig. 4.1 Model of feed-forward NN for forecasting the electrical load. Here, HDD and CDD refer to the exogenous variables of degree days that are calculated as heating degree days and cooling degree days, respectively. On the other hand, Wd and Mt are the dummy variables that represent the weekly and monthly seasonalities, respectively. More information about these input variables can be found in [53]


CAELF uses a training approach in association with incremental training to find a minimum number of hidden neurons for NN models. Hidden neurons (HNs) are added one by one in a constructive fashion during the training process of an NN. If the addition of an HN does not improve the NN’s accuracy, it is removed. The major steps of CAELF are summarized in Fig. 4.2 and are explained further as follows:

Step 1) At first, choose a feed-forward NN of minimal size. Precisely, the sizes of the input layer and output layer are decided by the total number of input variables and the output load of the given ELF dataset, respectively, whereas the hidden layer is initialized with one hidden neuron.

Step 2) Start the partial training of the NN on the training dataset for τ epochs using the back-propagation (BP) algorithm [54]. The number of training epochs, τ, is specified by the user. Partial training, which was first used in conjunction with an evolutionary algorithm [55], means that the NN is trained for a fixed number of epochs regardless of whether it has converged or not.

Fig. 4.2 Flowchart of CAELF. Here, NN and HN refer to neural network and hidden neuron, respectively.

Step 3) Check the termination criterion of NN training. If it is satisfied, the current NN

architecture is the outcome of CAELF for a given dataset. Otherwise, follow the next step. In

this work, the average training error [56], E_a, is calculated on the validation set; in other words, the average training error is considered here as a mean squared error (MSE). The average error is a modified form of the error estimator given in Eq. 3.5. Thus, the error E_a is calculated as

E_a = \frac{1}{2P} \sum_{p=1}^{P} \sum_{c=1}^{C} \left( t_c(p) - y_c(p) \right)^{2}    (4.1)

where t_c(p) and y_c(p), respectively, are the actual and predicted responses of the c-th output neuron for the validation pattern p, and P and C represent the total numbers of validation patterns and output neurons, respectively.

Step 4) Check the performance criterion of the network training. If the criterion is satisfied, the network is to be trained further: go to Step 2. Otherwise, follow the next step.

Step 5) Add a hidden neuron to the network and go to Step 2 to perform partial training again.

Step 6) The NN is then tested with the unseen testing patterns. Finally, the electrical load forecast is obtained from the current NN.

CAELF uses only one cost function, namely the training error on the validation set. In this way, CAELF tries to design a better NN-based load forecaster. Details about some basic steps of CAELF are given in the following sections.

4.2.1 Performance Criterion of NN Training

If the average training error on the validation set reduces by a predefined amount, ε, after the training epoch τ, it is assumed that the training process is progressing well; thus, further training is necessary and the process returns to Step 2. The reduction of the training error can be described as

E_a(t) - E_a(t+\tau) > \varepsilon, \qquad t = \tau, 2\tau, 3\tau, \ldots    (4.2)

where τ and t are positive integers specified by the user.

4.2.2 Termination Criterion of NN Training

Since CAELF adds hidden neurons one by one during the training process of an NN, the training error reduces as the training process progresses. However, the objective of CAELF is to improve the generalization ability of the NN. This means the training error may not


be a right choice to be used for terminating the training process of the NN. Generally, a

separate dataset, called the validation set, is widely used for termination. It is assumed that

the validation error gives an unbiased estimate because the validation data are not used for

modifying the weights of the NN.

In order to achieve good generalization ability, CAELF uses the average training error on the validation set in its termination criterion. It measures the validation error after every τ epochs of training; these intervals are called strips. Training terminates when the average training error increases by a predefined amount (λ) for T successive times, measured at the end of each of T successive strips [57]. Since the average training error on the validation set increases not once but T successive times, it can be assumed that such increases indicate the beginning of final overfitting, not just an intermittent fluctuation. The termination criterion can be expressed as

E_a(i) > E_a(i-1) + \lambda, \qquad i = 1, 2, 3, \ldots, T    (4.3)

where τ and T are positive integers specified by the user. Our model, CAELF, tests the termination criterion after every τ epochs of training and stops training when the condition described by Eq. 4.3 is satisfied. In this work, the value of T is chosen as 3.

4.2.3 Hidden Neuron Addition

CAELF adds a hidden neuron to the existing network architecture according to Eq. 4.4. The reason is that the existing network architecture is not capable of acquiring all the information in the dataset; thereby, increasing the size of the network becomes necessary. The modified architecture is then trained for a further τ epochs.

E_a(t) - E_a(t+\tau) \le \varepsilon, \qquad t = \tau, 2\tau, 3\tau, \ldots    (4.4)

where ε is the predefined amount specified by the user.
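The steps above can be summarized in pseudocode. The following Python sketch of the constructive loop follows Fig. 4.2 under our own naming; partial_train and validation_error are hypothetical stand-ins for BP training over τ epochs and for Eq. 4.1, net is assumed to expose num_hidden() and add_hidden_neuron(), and the criteria follow Eqs. 4.2-4.4 as reconstructed here.

def caelf(net, partial_train, validation_error, tau, eps, lam, T=3, max_hidden=50):
    """Constructive training loop of CAELF (sketch).

    net: NN initialized with one hidden neuron (Step 1)
    partial_train(net, tau): trains net for tau epochs with BP (Step 2)
    validation_error(net): average error E_a on the validation set (Eq. 4.1)
    """
    history = []
    while True:
        partial_train(net, tau)                    # Step 2
        history.append(validation_error(net))     # Step 3
        # Termination: E_a rose by lam for T successive strips (Eq. 4.3)
        if len(history) > T and all(
            history[-i] > history[-i - 1] + lam for i in range(1, T + 1)
        ):
            return net                             # training stops
        # Performance criterion: error still falling by more than eps (Eq. 4.2)
        if len(history) >= 2 and history[-2] - history[-1] > eps:
            continue                               # Step 4: train further
        if net.num_hidden() >= max_hidden:         # safety cap (our addition)
            return net
        net.add_hidden_neuron()                    # Step 5: grow the network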

4.3 Experimental Studies

In this section, the performance of CAELF in predicting the electrical load in the near future is presented using a daily load dataset. The data used in this study are the daily electrical demand in megawatts/hour in Spain [32], [53]. CAELF’s performance was evaluated in terms of the predicted error. Precisely, the predicted error refers to the error of the resulting NN on the


testing set. For more clarification about the performance evaluation of CAELF, this section is organized into the following subsections.

4.3.1 Description of Data

The sample used in this study comprises the daily electricity demand, given in megawatts/hour (MW/h), in Spain from January 1, 1993 to June 1998, for a total of 2007 days. All sectors (industrial, commercial, and residential) are included, as sectorally disaggregated data were not available at this time frequency. The sample has been transformed by taking natural logarithms of the electricity demand data to reduce the impact of the heteroskedasticity that could be present due to the large scale of the series.

The exogenous variables of degree-days are calculated as heating degree days, HDD = max(0, T_ref − T_ave), and cooling degree days, CDD = max(0, T_ave − T_ref), where T_ave is the population-weighted daily mean temperature and T_ref = 18 °C. The mean daily temperature is collected over four different weather stations representing different climatic subregions of Spain, and a population-weighted temperature is assessed in [53]. To clarify the data, Table 4.1 shows a partial sample of the data sheet. Using these data, the feed-forward NN architecture presented in Fig. 4.1 has been trained.

To capture the significant seasonal daily components in the electricity load series, a qualitative variable ‘day of the week’ has been introduced into the model through the specification of six dummy variables (W_it) representing all days in the week except the base day of Monday. The index i represents the day of the week (Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday), and W_it equals 1 if observation t falls on day i, and 0 otherwise.
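As an illustration of these input features, a small Python sketch of the degree-day and day-of-week encodings is shown below; the 18 °C reference follows the description above, while the function names are our own.

import numpy as np

T_REF = 18.0  # reference temperature in degrees Celsius

def degree_days(t_ave):
    """HDD = max(0, T_REF - t_ave); CDD = max(0, t_ave - T_REF)."""
    return max(0.0, T_REF - t_ave), max(0.0, t_ave - T_REF)

def weekday_dummies(day_index):
    """Six dummies W_it for Tuesday..Sunday; Monday (0) is the base day."""
    w = np.zeros(6)
    if day_index > 0:            # day_index: 0 = Monday .. 6 = Sunday
        w[day_index - 1] = 1.0
    return w

print(degree_days(12.5))         # (5.5, 0.0) -> a heating day
print(weekday_dummies(6))        # Sunday: [0. 0. 0. 0. 0. 1.]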

To improve the forecast of electricity consumption, anomalous events related to holidays, or days near a holiday, have been considered. Electricity consumption, mainly in the industrial sector, decreases appreciably during holidays. For the model to take this into account, three additional dummy variables have been introduced. First, a variable H_t has been defined, which equals 1 if t is a holiday and 0 otherwise. Secondly, another dummy variable H_{t-1} has been defined, which equals 1 if observation t corresponds to the day following a holiday, and 0 otherwise. This variable is introduced to check the impact on electricity consumption


TABLE 4.1 A Sample of Data Showing the Log (Load), Actual Load (MW/hr), HDD, CDD, and Dummy Variables

produced by the proximity of a holiday. Finally, a third variable has been included to analyze

the influence of Easter. This holiday has been treated separately because it has a variable

time location each year within the sample. Thus, a new variable Gt has been defined which

equals 1 if the t observation corresponds to Easter Thursday or Good Friday, and 0

otherwise. To account in the model for the monthly seasonality, eleven dummy variables

(Mjt) were introduced, each representing one of the months in a year and taking January as

the base month. Thus, j refers to February, March, April, May, June, July, August,


September, October, November and December, and M_jt equals 1 if observation t falls in month j, and 0 otherwise.

4.3.2 Experimental Setup

The data used for training the NN model in CAELF were from January 1, 1993 to December 31, 1997, for a total of 1826 days, whereas the NN was validated during training using the data from July 1, 1998 to December 31, 1998, in total 184 samples. Precisely, these samples are called “in-sample” data, as they were used in the NN model for training. On the other hand, a data sample period of 120 days, from January 1, 1999 to April 30, 1999, was used to test the forecasting performance by comparing the model output (i.e., the predicted load) with the actual load. These data samples are called “out-of-sample”, as they were not used during the training of the NN.

In all experiments, one bias unit with a fixed input +1 was connected to the hidden layer and

output layer. The learning rate and momentum term for training of NN were chosen as 0.05–

0.1 and 0.4–0.7, respectively. The initial connection weights for an NN were randomly

chosen in the range between -1.0 and 1.0. A sigmoid function was used as an activation

function.

4.3.3 Experimental Results

The performance of CAELF in terms of forecasting the out-of-samples was measured by

making a comparison between actual values and model outputs during the same period.

Furthermore, we measured the mean absolute percentage error (MAPE), the best relative accuracy measure among the various forecasting accuracy criteria [58]. The MAPE was calculated as

MAPE = \frac{100}{N} \sum_{n=1}^{N} \left| \frac{P_n - \hat{P}_n}{P_n} \right|

where P_n and \hat{P}_n represent the actual and predicted electrical load, respectively, and N is the total number of samples available.
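A minimal Python sketch of this MAPE computation follows; the example arrays reuse a few actual/predicted values from Table 4.2 purely for illustration.

import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error between actual and predicted loads."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    return float(100.0 / len(actual) * np.sum(np.abs((actual - predicted) / actual)))

actual = np.array([13.1482, 13.1809, 13.2833])     # log-load values
predicted = np.array([13.1888, 13.1870, 13.2189])
print(round(mape(actual, predicted), 4))           # ~0.28 (percent)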


In this context, Fig. 4.3 and Fig. 4.4 present the forecasting analysis for the period of 120 days (including errors in percentage). In particular, a comparison between the actual load and the predicted load (i.e., the forecast load) obtained using CAELF is made in Fig. 4.3. We calculated the MAPE between these two loads and found that it was 0.2132. In addition, further analytical forecasting results of CAELF can be found in Figs. 4.5 and 4.6, where the electrical load was forecast over 30 days and 7 days, respectively.

In the case of holiday load demand, predicting such demand perfectly is a really difficult task for any kind of model. The reason is that consumers spend their time in different places in various ways, which is why the electrical load fluctuates nonlinearly. In order to observe this issue, CAELF was run in one experiment over 30 days, where the forecasting results, especially for the holidays, are pointed out between the actual load curve and the predicted load curve in Fig. 4.7. It can be seen that the load variations are satisfactory except in some cases. Table 4.2 also shows that the difference between the predicted load and the actual load is very low.

Fig. 4.3 (a) Comparison between the actual load and the predicted load for 120 days obtained from CAELF using a constructive feed-forward neural network and (b) the corresponding errors in percentage


Fig. 4.4 (a) Comparison between the actual load and the predicted load for 120 days obtained from the standard model (SELF) using a feed-forward neural network

Fig. 4.4 (b) The percentage of error between the actual load and the predicted load for 120 days obtained from the standard model (SELF) using a feed-forward neural network

Fig. 4.5 Comparison between the actual load and the predicted load for 30 days obtained from CAELF using a constructive feed-forward neural network

Fig. 4.6 Comparison between the actual load and the predicted load for 7 days obtained from CAELF using a constructive feed-forward neural network


Fig. 4.7 Comparison between the actual load and the predicted load for holidays of January 1999 obtained from CAELF using a constructive feed-forward neural network

4.4 Results of CAELF in Prototype Data

In order to justify our model CAELF, we applied it to some prototype data to check whether it works well or not. In this regard, we designed some data, presented in Table 4.3, based on the methodology of the Spanish electrical load data. In these prototype data, the actual load demand is unknown. We then fed these prototype data to the trained NN in CAELF and finally obtained the corresponding forecasting results, which are presented in Table 4.4.

TABLE 4.2 Comparisons between the actual and predicted load for holidays of January 1999

Holiday     Predicted   Actual      Difference
Jan 1st     13.1888     13.1482      0.0406
Jan 2nd     13.187      13.1809      0.0061
Jan 6th     13.2189     13.2833     -0.0644
Jan 9th     13.1974     13.2569     -0.0595
Jan 15th    13.2077     13.2582     -0.0505
Jan 16th    13.1921     13.2542     -0.0621
Jan 23rd    13.1645     13.1842     -0.0197
Jan 24th    13.1078     13.0741      0.0337
Jan 30th    13.1744     13.2209     -0.0465
Jan 31st    13.0745     13.0928     -0.0183


TABLE 4.4 Results for Sample Prototype Data

Observation   Sample      Predicted Output (MW/h)
01            Sample-1    534346.62
02            Sample-2    492779.08
03            Sample-3    544541.81
04            Sample-4    476632.01
05            Sample-5    204771.00

It is observed from Table 4.4 that the predicted output results are reasonable compared to the results given in Table 4.1. This means that our model works well on the prototype data. Hence, by using our proposed model we can forecast the electrical load of the near future.

TABLE 4.3 User-designed electrical load forecasting prototype data for samples 1 to 5. Here, the output load is unknown in each sample


Chapter 5

Analysis and Comparisons

In this chapter, a rigorous analysis of the experimental performance of CAELF is carried out from different aspects in order to measure its complexity, significance, and generalization ability. Particularly, for measuring the complexity of CAELF, a rigorous computational complexity analysis is done. In order to justify whether CAELF is statistically significant or not, a t-test analysis is made. For the measurement of generalization ability, three synthetic time series data samples are used. Finally, a comparison between the forecasting performance of CAELF and that of other related models is presented in this chapter.

5.1 Computational Complexity

Computational complexity is a measure of how much computation a model requires in its computational process. More broadly, computational

complexity theory is a branch of the theory of computation in theoretical computer science

and mathematics that focuses on classifying computational problems according to their

inherent difficulty, and relating those classes to each other. A computational problem is

understood to be a task that is in principle amenable to being solved by a computer, which is

equivalent to stating that the problem may be solved by mechanical application of

mathematical steps.

The analysis of computational complexity helps to understand the actual computational cost of an algorithm. Since Kudo and Sklansky [59] presented such an analysis in the form of big-O notation, we are inspired to compute the computational cost of our CAELF in the same way. The following few paragraphs present the computational complexity of CAELF to show that the inclusion of different techniques does not increase the computational complexity of training NNs.

(i) Partial Training: In our thesis, we use the standard back-propagation (BP) algorithm [54] for training. Each epoch of BP takes O(W) computations for training one example, where W is the number of weights in the current NN. Thus, training all examples in the training set for τ epochs needs O(τ p_t W) computations, where p_t denotes the number of examples in the training set.

(ii) Termination Criterion: The termination criterion employed in CAELF for stopping the training of the NN uses both training and validation errors. Since the training error is computed as a part of the training process, the termination criterion takes O(p_v W) computations, where p_v denotes the number of examples in the validation set. Since p_v < p_t, O(p_v W) < O(p_t W).

(iii) Further Training: Our CAELF uses Eq. (4.4) to check whether further training for the added HN is necessary. The evaluation of Eq. (4.4) takes a constant computation, O(1), since the error values used in Eq. (4.4) have already been evaluated during training.

(iv) Adding a Hidden Neuron: The computational cost for adding a hidden neuron is O(N_1 + C) for initializing its connection weights, where N_1 is the number of input features and C is the number of neurons in the output layer. It is also noted that O(N_1 + C) < O(p_t W).

All the computations mentioned above are done for one partial training consisting of τ epochs. In general, CAELF needs several, say M, such partial trainings. Thus, the total computational cost of CAELF for training a total of T epochs (T = Mτ) is O(M(N_1 + C)) + O(Mτ p_t W). However, in practice the first term is much smaller than the second one. Hence, the total computational cost of CAELF is O(Mτ p_t W), which is the same as that for training a fixed network architecture using BP [54]. It is clear that the incorporation of several techniques in CAELF does not increase its computational cost.

5.2 T-Test

The t-test is a kind of statistical significance test that is usually performed in order to know whether a model is statistically significant for solving a particular task. This test is used to compare responses from two groups of data, where the two groups

Chapter 5 Analysis and Comparisons

47

can come from different experimental treatments. A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It

can be used to determine if two sets of data are significantly different from each other, and is

most commonly applied when the test statistic would follow a normal distribution if the

value of a scaling term in the test statistic were known. When the scaling term is unknown

and is replaced by an estimate based on the data, the test statistic (under certain conditions)

follows a Student's t distribution.

The null hypothesis is that the two data means are equal to each other. To test the null hypothesis, we have to calculate the following values: \bar{x}_1 and \bar{x}_2 are the means of the two samples; s_1^2 and s_2^2 refer to the variances of the two samples; n_1 and n_2 are the sample sizes of the two samples; and k is the degrees of freedom. The sample mean and variance are

\bar{x} = \frac{1}{n}(x_1 + x_2 + x_3 + \cdots + x_n) = \frac{1}{n}\sum_i x_i

s^2 = \frac{1}{n-1}\left[ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 \right]

By shortening the above equation, we get

s^2 = \frac{1}{n-1}\left[ \sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2 \right]

We use the following equation to calculate the t-statistic:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

Here, we compared the calculated t-value, with k degrees of freedom, to the critical t-value from the t-distribution table at the chosen confidence level, and then decided whether to accept or reject the null hypothesis. After computation, it was found that the t-value is 7.65, while the critical t-value from the t-table is 2.81. Since the calculated t-value is greater than the tabulated value, the null hypothesis is rejected and the obtained predicted load forecasting result of CAELF is statistically significant.
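A small Python sketch of the two-sample t-statistic defined above follows; the two sample lists are illustrative placeholders, not the thesis data.

import math

def t_statistic(x1, x2):
    """Two-sample t-statistic with unequal variances (as defined above)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)   # sample variance s1^2
    s2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)   # sample variance s2^2
    return (m1 - m2) / math.sqrt(s1 / n1 + s2 / n2)

a = [0.21, 0.22, 0.21, 0.20]   # hypothetical group-1 errors
b = [0.31, 0.32, 0.31, 0.32]   # hypothetical group-2 errors
print(t_statistic(a, b))       # ~ -21.0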

5.3 Mackey-Glass Time Series

The Mackey-Glass series, based on the Mackey-Glass differential equation, is widely regarded as a benchmark for comparing the generalization ability of different methods. It is a chaotic time series generated from the following time-delay differential equation:

\frac{dx(t)}{dt} = \frac{a \, x(t-\tau)}{1 + x^{10}(t-\tau)} - b \, x(t)    (5.1)

It can be numerically solved using, for example, the 4th-order Runge-Kutta (RK4) method at discrete, equally spaced time steps:

x(t + \Delta t) = \mathrm{mackeyglass\_rk4}(x(t), x(t-\tau), \Delta t, a, b)

where the function mackeyglass_rk4 advances the Mackey-Glass delayed differential equation by one step of the 4th-order Runge-Kutta method:

k_1 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t), x(t-\tau), a, b)
k_2 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t) + k_1/2, x(t-\tau), a, b)
k_3 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t) + k_2/2, x(t-\tau), a, b)
k_4 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t) + k_3, x(t-\tau), a, b)

x(t + \Delta t) = x(t) + \frac{k_1 + 2k_2 + 2k_3 + k_4}{6}

where mackeyglass_eq is the function which returns the value of the Mackey-Glass delayed differential equation in (5.1) once its inputs and its parameters (a, b) are provided.

The generated data samples are presented in Fig. 5.1, where the values τ = 17, a = 0.2, and b = 0.1 are used.
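A self-contained Python sketch of this RK4 generator is shown below; the step size, the constant-history initialization, and the function names are our own choices rather than the thesis code (the delayed value x(t−τ) is held fixed across the four sub-steps, matching the scheme above).

import numpy as np

def mackeyglass_eq(x, x_tau, a=0.2, b=0.1):
    """Right-hand side of Eq. 5.1."""
    return a * x_tau / (1.0 + x_tau ** 10) - b * x

def mackeyglass_series(n, tau=17, dt=1.0, x0=1.2):
    """Generate n samples of the Mackey-Glass series with RK4 steps."""
    delay = int(round(tau / dt))
    x = np.full(n + delay, x0)           # constant history x(t <= 0) = x0
    for t in range(delay, n + delay - 1):
        x_tau = x[t - delay]
        k1 = dt * mackeyglass_eq(x[t], x_tau)
        k2 = dt * mackeyglass_eq(x[t] + 0.5 * k1, x_tau)
        k3 = dt * mackeyglass_eq(x[t] + 0.5 * k2, x_tau)
        k4 = dt * mackeyglass_eq(x[t] + k3, x_tau)
        x[t + 1] = x[t] + (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x[delay:]

series = mackeyglass_series(6000)
print(series[:5])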

Fig. 5.1 Sample data of the Mackey-glass time series


5.3.1 Forecasting Results

Using the Mackey-glass time series model, we generated some data samples that were used

as training and testing data samples. After training CAELF with the training data samples,

the testing samples were applied to the trained NN model in order to find the forecasting

result. After that, we found a promising result using Mackey-glass time series data samples,

which are exhibited in Fig. 5.2 and Table 5.1.

Fig. 5.2 Comparison between the actual data and the predicted data for Mackey-Glass data samples obtained from CAELF.

TABLE 5.1 The value of MAPE of CAELF on Mackey-Glass data. Here SD refers to standard deviation.

        Mean       SD         Max        Min
MAPE    0.108234   0.014332   0.127487   0.087631

In accordance with Fig. 5.2, it has been found that the two curves, that is to say, the predicted and actual data curves, closely overlap each other. This is reflected in the MAPE, which is very low (0.10823), and in the value of SD, which is also quite low, as presented in Table 5.1. Thus, we can say that our CAELF model is robust and performs well in predicting the values of the Mackey-Glass time series problem.

5.4 Lorenz Time Series

The Lorenz system is a system of ordinary differential equations (i.e., the Lorenz equations)

first studied by Edward Lorenz. It is notable for having chaotic solutions for certain

parameter values and initial conditions. In particular, the Lorenz attractor is a set of chaotic

solutions of the Lorenz system which, when plotted, resemble a butterfly or figure eight.


In 1963, Edward Lorenz developed a simplified mathematical model for atmospheric

convection. The model is a system of three ordinary differential equations now known as the

Lorenz equations:

\frac{dx}{dt} = \sigma (y - x),

\frac{dy}{dt} = x (\rho - z) - y,

\frac{dz}{dt} = x y - \beta z.

Here, x, y and z make up the system state, t is time, and \sigma, \rho, \beta are the system parameters.

From a technical point of view, the Lorenz system is nonlinear, three-dimensional and

deterministic.
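For reference, a minimal Python sketch of integrating the Lorenz equations with SciPy follows; the classic parameter values σ = 10, ρ = 28, β = 8/3 and the initial state are illustrative choices, and the Rössler system of Section 5.5 can be sampled with the same pattern by swapping in its right-hand side.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Integrate from t = 0 to 50 and sample 2000 equally spaced points
t_eval = np.linspace(0.0, 50.0, 2000)
sol = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0], t_eval=t_eval)
series = sol.y[0]          # use the x component as a univariate series
print(series[:5])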

Fig. 5.3 Sample data of the Lorenz time series

5.4.1 Forecasting Results

Using the Lorenz time series model, we generated some data samples that were used as training and testing data samples. After training CAELF with the training data samples, the testing samples were applied to the trained NN model in order to find the forecasting result. We then found promising results on the Lorenz time series data samples, which are exhibited in Fig. 5.4 and Table 5.2.


Fig. 5.4 Comparison between the actual data and the predicted data for Lorenz data obtained from CAELF.

TABLE 5.2 The value of MAPE of CAELF on Lorenz data

        Mean       SD         Max       Min
MAPE    0.916056   0.015022   0.93433   0.89491

In accordance with Fig. 5.4, it has been found that the two curves, that is to say, the predicted and actual data curves, closely overlap each other. This is reflected in the MAPE, which is low (0.916056), and in the value of SD, which is quite low, as presented in Table 5.2. Thus, we can say that our CAELF model is robust and performs well in predicting the values of the Lorenz time series problem.

5.5 Rössler Time Series

The Rössler attractor is the attractor for the Rössler system, a system of three non-linear

ordinary differential equations originally studied by Otto Rössler. These differential

equations define a continuous-time dynamical system that exhibits chaotic dynamics

associated with the fractal properties of the attractor.

The defining equations of the Rössler system are:

\frac{dx}{dt} = -y - z,

\frac{dy}{dt} = x + a y,

\frac{dz}{dt} = b + z (x - c).


Otto E. Rössler studied the chaotic attractor with a = 0.2, b = 0.2, and c = 5.7, though the properties for a = 0.1, b = 0.1, and c = 14 have been more commonly used since. Another line of the parameter space was investigated using topological analysis; it corresponds to b = 2, c = 4, with a chosen as the bifurcation parameter.

Fig. 5.5 Sample data of the Rössler time series

5.5.1 Forecasting Results

Using the Rössler time series model, we generated some data samples that were used as training and testing data samples. After training CAELF with the training data samples, the testing samples were applied to the trained NN model in order to find the forecasting result. We then found promising results on the Rössler time series data samples, which are exhibited in Fig. 5.6 and Table 5.3.

Fig. 5.6 Comparison between the actual data and the predicted data for Rössler data obtained from CAELF


TABLE 5.3 The value of MAPE of CAELF on Rössler data

        Mean       SD         Max       Min
MAPE    0.823876   0.033428   0.84915   0.78577

In accordance with Fig. 5.6, it has been found that the two curves, that is to say, the predicted and actual data curves, closely overlap each other. This is reflected in the MAPE, which is low (0.823876), and in the value of SD, which is quite low, as presented in Table 5.3. Thus, we can say that our CAELF model is robust and performs well in predicting the values of the Rössler time series problem.

5.6 Comparison with Other Works

The obtained forecasting result of CAELF on the Spanish daily electrical load data has been compared with the results of three electrical load forecasting models: (i) standard electrical load forecasting (SELF), (ii) NNELF-1 [32], and (iii) NARx-2 [32]. The first two models use a standard feed-forward NN for electrical load forecasting, in which a fixed number of hidden neurons in the hidden layer of the NN and a fixed number of iterations for the NN training are considered. The third is a nonlinear autoregressive model that is composed of two parts: (i) the true available output is fed as an input to train the NN; (ii) the resulting network has a purely feed-forward architecture, and the BP algorithm is used for training. A detailed explanation of NNELF and NARx has been given in Chapter 3. We used one measure for the comparisons here, namely MAPE.

TABLE 5.4 Comparisons with other models for the ELF problem in terms of the next 120 days. Here, the comparisons are made according to the MAPE

Models          Mean     SD       Max      Min
NNELF-1 [32]    0.6426   0.0551   0.7686   0.4946
NARx-2 [32]     0.3852   0.1132   0.6455   0.2367
SELF            0.3151   0.0009   0.3168   0.3142
CAELF           0.2132   0.0008   0.2136   0.2110

Chapter 5 Analysis and Comparisons

54

In SELF, the whole setup of CAELF was used, except for the constructive approach and partial training. In this case, 5 hidden neurons were considered in the hidden layer of the NN, with 200 iterations for the NN training. For the comparisons, we ran SELF 10 times and averaged the forecasting results. On the other hand, the models NNELF-1 [32] and NARx-2 [32] used 10 hidden neurons in the hidden layer, and their forecasting results were averaged over 20 individual runs.

Table 5.4 shows the comparison results among these four models, including CAELF. From the table we see that the mean value of MAPE for CAELF is lower than those of NNELF-1, NARx-2, and SELF, and its standard deviation is the lowest as well. The maximum and minimum values are also lower than those of the existing models. Hence, we can conclude that our model is superior to the existing models.

Chapter 6

Conclusion and Future Works

6.1 Conclusion

The thesis proposes a new short-term electrical load forecasting model, CAELF, that is formulated around a feed-forward NN training scheme. The size of the hidden layer is determined automatically by the constructive approach during the training process. Thereby, the strength of the standard feed-forward NN has been clearly enhanced for the ELF problem, which is exhibited in Fig. 4.3 and Fig. 4.4 for the case of the 120-day prediction analysis. To make this clearer, we have also presented the forecasting results of CAELF for 30 days and 7 days in Fig. 4.5 and Fig. 4.6, respectively. From these figures we conclude that the proposed CAELF has a remarkable capability for forecasting the electrical load on a short-term basis. In the case of holiday load demand, the forecasting of electrical load using CAELF is satisfactory, as the predicted load curve is very close to the actual load curve except at some points, which is exhibited in Fig. 4.7.

Table 5.4 shows comparison results in terms of MAPE for three other models, namely SELF, NNELF-1 [32], and NARx-2 [32], on the Spanish daily load demand data. It is found that the forecasting result of CAELF has the lowest MAPE compared to the other models. In addition, the value of SD of the MAPE in the case of CAELF is lower than that of the other models.

In order to justify the logical significance of our proposed CAELF, we conducted three further experiments, i.e., the Mackey-Glass, Lorenz, and Rössler time series analyses, which are described in detail in Section 5.3, Section 5.4, and Section 5.5. It can be observed that the forecasting results in these three cases are obtained on artificially generated data samples. Regarding the results of SD for all data samples, including the real Spanish data and the artificial time series data exhibited in Table 5.1 to Table 5.3, it has been found that the value of SD is very low.

Table 5.4 also shows that the value of SD of CAELF is lower than that of the other existing models. Hence, we can claim that our CAELF model is robust and significantly better for the electrical load forecasting problem.

6.2 Future Works

Although the forecasting performance of CAELF is very satisfactory, there are some areas where it does not perform well. As can be seen from Fig. 4.3, the forecasting errors of the initial part and the last part are a little higher compared to the middle part. The reason behind this reduced performance is the nonlinearity of the electrical load. To overcome such difficulties, incorporating some heuristic techniques into CAELF is recommended as future work.


Bibliography

[1] J.P. Rothe, A.K. Wadhwani and S. Wadhwani, “Hybrid and integrated approach to short

term load forecasting.” IEEE Transactions on Power Systems, vol. 2, issue 12, pp. 7127-

7132, 2010.

[2] Shahram Javadi, “Spatial Load Forecasting Using Fuzzy Logic.” 4th WSEAS International Conference on Power Engineering Systems (ICOPES), Brazil, pp. 51-56, 2005.

[3] M. Amina and V. S. Kodogiannis, “Load forecasting using fuzzy wavelet Neural networks.”

IEEE International Conference on Fuzzy Systems, pp. 1033 – 1040, 2011.

[4] H. L. Willis, “Power Distribution Planning Reference Book.” Second Edition, Revised and

Expanded. New York: Marcel Dekker, 2002.

[5] G. Gross and F. D. Galiana, “Short-term load forecasting.” Proceedings of the IEEE, vol. 75, no. 12, pp. 1558-1573, December 1987.

[6] A. D. Papalexopoulos and T. C. Hesterberg, “A regression-based approach to short-term

system load forecasting.” IEEE Transactions on Power Systems, vol. 5, pp. 1535-1547, 1990.

[7] M. T. Hagan and S. M. Behr, “The Time Series Approach to Short Term Load Forecasting,” IEEE Transactions on Power Systems, vol. 2, pp. 785-791, 1987.

[8] H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for short- term load

forecasting: a review and evaluation,” IEEE Transactions on Power Systems, vol. 16, pp. 44-

55, 2001

[9] Zhanshou Yu, “Feed-Forward Neural Networks and Their Applications in Forecasting”, M.Sc

thesis, December, 2000.

[10] A. Khotanzad, R. Afkhami-Rohani, L. Tsun-Liang, A. Abaye, M. Davis and D. J.

Maratukulam, “ANNSTLF-a neural-network-based electric load forecasting system,” IEEE

Transactions on Neural Networks, vol. 8, pp. 835-846, 1997.

[11] A. Khotanzad, R. Afkhami-Rohani, and D. Maratukulam, “ANNSTLF- Artificial Neural

Network Short-Term Load Forecaster generation three,” IEEE Transactions on Power Systems,

vol. 13, pp. 1413-1422, 1998.


[12] A. Khotanzad, Z. Enwang, and H. Elragal, “A neuro-fuzzy approach to short- term load

forecasting in a price-sensitive environment.” IEEE Transactions on Power Systems, vol. 17,

pp. 1273-1282, 2002.

[13] H. S. Hippert and C. E. Pedreira, “Estimating temperature profiles for short- term load

forecasting: neural networks compared to linear models,” IEE Proceedings - Generation,

Transmission and Distribution, vol. 151, pp. 543-547, 2004.

[14] A. Khotanzad, M. H. Davis, A. Abaye and D. J. Maratukulam, “An artificial neural network

hourly temperature forecaster with applications in load forecasting,” IEEE Transactions on

Power Systems, vol. 11, pp. 870-876, 1996.

[15] T. Haida and S. Muto, “Regression based peak load forecasting using a transformation

technique,” IEEE Transactions on Power Systems, vol. 9, pp.1788-1794, 1994.

[16] O. Hyde and P. F. Hodnett, “An adaptable automated procedure for short-term electricity load

forecasting,” IEEE Transactions on Power Systems, vol. 12,pp. 84-94, 1997.

[17] S. Ruzic, A. Vuckovic, and N. Nikolic, “Weather sensitive method for short term load

forecasting in Electric Power Utility of Serbia.” IEEE Transactions on Power Systems, vol. 18,

pp. 1581-1586, 2003.

[18] W. Charytoniuk, M. S. Chen and P. Van Olinda, “Nonparametric regression based short-term

load forecasting,” IEEE Transactions on Power Systems, vol.13, pp. 725-730, 1998.

[19] B. Krogh, E. S. de Llinas and D. Lesser, “Design and Implementation of An on-Line Load

Forecasting Algorithm,” IEEE Transactions on Power Apparatus and Systems, vol. PAS-101,

pp. 3284-3289, 1982.

[20] S. R. Huang, "Short-term load forecasting using threshold autoregressive models," IEE

Proceedings-Generation, Transmission and Distribution, vol.144, pp. 477-481, 1997.

[21] S.-J. Huang and K.-R. Shih, “Short-term load forecasting via ARMA model identification

including non-Gaussian process considerations.” IEEE Transactions on Power Systems, vol.

18, pp. 673-679, 2003.

[22] J. F. Chen, W. M. Wang and C.-M. Huang, “Analysis of an adaptive time- series autoregressive

moving-average (ARMA) model for short-term load forecasting.” Electric Power Systems

Research, vol. 34, pp. 187-196, 1995.

[23] M. Espinoza, C. Joye, R. Belmans and B. DeMoor, “Short-Term Load Forecasting, Profile


Identification, and Customer Segmentation: A Methodology Based on Periodic Time Series.”

IEEE Transactions on Power Systems, vol. 20, pp. 1622-1630, 2005.

[24] C. M. Huang, C. J. Huang and M. L. Wang, “A particle swarm optimization to identifying the

ARMAX model for short-term load forecasting.” IEEE Transactions on Power Systems, vol.

20, pp. 1126-1133, 2005.

[25] M. Espinoza, J. A. K. Suykens, R. Belmans and B. De Moor, “Electric Load Forecasting.”

IEEE Control Systems Magazine, vol. 27, pp. 43-57, 2007.

[26] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas and M. J. Damborg, “Electric load forecasting using an artificial neural network.” IEEE Transactions on Power Systems, vol. 6, pp. 442-449, 1991.

[27] K. Methaprayoon, W. J. Lee, S. Rasmiddatta, J. R. Liao and R. J. Ross, “Multistage artificial neural network short-term load forecasting engine with front-end weather forecast,” IEEE Transactions on Industry Applications, vol. 43, no. 6, pp. 1410-1416, 2007.

[28] A. Deihimi and H. Showkati, “Application of echo state networks in short-term electric load forecasting,” Energy, vol. 39, pp. 327-340, 2012.

[29] V. H. Ferreira and A. P. Alves da Silva, “Toward estimating autonomous neural network-based electric load forecasters,” IEEE Transactions on Power Systems, vol. 22, no. 4, pp. 1554-1562, 2007.

[30] C. Xia, J. Wang and K. McMenemy, “Short, medium and long term load forecasting model and virtual load forecaster based on radial basis function neural networks,” International Journal of Electrical Power & Energy Systems, vol. 32, no. 7, pp. 743-750, 2010.

[31] J. Vermaak and E. C. Botha, “Recurrent neural networks for short-term load forecasting,” IEEE Transactions on Power Systems, vol. 13, pp. 126-132, 1998.

[32] R. S. Elias, L. Fang and M. I. M. Wahab, “Electrical load forecasting based on weather variables and seasonalities: A neural network approach,” 8th International Conference on Service Systems and Service Management, Canada, 2011.

[33] H. A. Malki, N. B. Karayiannis and M. Balasubramanian, “Short-term electric power load forecasting using feedforward neural network,” Expert Systems, vol. 21, no. 3, pp. 157-167, 2004.

[34] D. O. Arroyo, M. K. Skov and Q. Huynh, “Accurate electricity load forecasting with artificial neural networks,” International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’05), 2005.

[35] A. G. Bakirtzis, V. Petridis, S. J. Kiartzis and M. C. Alexiadis, “A neural network short term load forecasting model for the Greek power system,” IEEE Transactions on Power Systems, vol. 11, no. 2, pp. 858-863, 1996.

[36] D. Srinivasan, A. C. Liew and C. S. Chang, “A neural network short-term load forecaster,” Electric Power Systems Research, vol. 28, pp. 227-234, 1994.

[37] C. C. Hsu and C. Y. Chen, “Regional load forecasting in Taiwan-applications of artificial neural networks,” Energy Conversion and Management, vol. 44, pp. 1941-1949, 2003.

[38] H. S. Hippert, C. E. Pedreira and R. C. Souza, “Neural Networks for Short-Term Load Forecasting: A Review and Evaluation,” IEEE Transactions on Power Systems, vol. 16, no. 1, pp. 44-55, 2001.

[39] Y. Chen, P. B. Luh, C. Guan, Y. Zhao, L. D. Michel, M. A. Coolbeth, et al., “Short-term load forecasting: similar day-based wavelet neural networks,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 322-330, 2010.

[40] https://en.wikipedia.org/wiki/Artificial_Neural_Network

[41] T. Masters, “Practical Neural Network Recipes in C++,” Academic Press, Inc., 1993.

[42] S. T. Welstead, “Neural Network and Fuzzy Logic Applications in C++,” John Wiley & Sons, Inc., 1994.

[43] B. Kröse and P. van der Smagt, “An Introduction to Neural Networks,” The University of Amsterdam, 1996.

[44] L. Fausett, “Fundamentals of Neural Networks: Architectures, Algorithms, and Applications,” Prentice-Hall, Inc., 1994.

[45] W. S. Sarle, “Neural Network FAQ,” periodic posting to the Usenet newsgroup comp.ai.neural-nets, URL: ftp://ftp.sas.com/pub/neural/FAQ.html, 1997.

[46] Z. Yu, “Feed-Forward Neural Networks and Their Applications in Forecasting,” M.Sc. thesis, December 2000.

[47] R. D. Reed and R. J. Marks II, “Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks,” The MIT Press, 1999.

[48] C. M. Bishop, “Neural Networks for Pattern Recognition,” Oxford University Press, 1995.

[49] B. D. Ripley, “Pattern Recognition and Neural Networks,” Cambridge University Press, 1996.

[50] D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533-536, 1986.

[51] M. M. Islam and K. Murase, “A new algorithm to design compact two-hidden-layer artificial neural networks,” Neural Networks, vol. 14, no. 9, pp. 1265-1278, 2001.

[52] T. Y. Kwok and D. Y. Yeung, “Constructive algorithms for structure learning in feed-forward neural networks for regression problems,” IEEE Transactions on Neural Networks, vol. 8, pp. 630-645, 1997.

[53] A. Pardo, V. Meneu and E. Valor, “Temperature and seasonality influences on Spanish electricity load,” Energy Economics, vol. 24, pp. 50-70, 2002.

[54] D. E. Rumelhart and J. L. McClelland, “Parallel Distributed Processing,” MIT Press, 1986.

[55] X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694-713, 1997.

[56] M. M. Kabir, M. M. Islam and K. Murase, “A new wrapper feature selection approach using neural network,” Neurocomputing, vol. 73, pp. 3273-3283, 2010.

[57] L. Prechelt, “PROBEN1: A set of neural network benchmark problems and benchmarking rules,” Technical Report 21/94, Faculty of Informatics, University of Karlsruhe, 1994.

[58] C. Brooks, “Introductory Econometrics for Finance,” 2nd edition, Cambridge University Press, UK, 2008.

[59] M. Kudo and J. Sklansky, “Comparison of algorithms that select features for pattern classifiers,” Pattern Recognition, vol. 33, pp. 25-41, 2000.
