
COMPLEMENTARY QUADRATIC PROGRAMMING AND ARTIFICIAL NEURAL

NETWORK FOR COMPUTATIONALLY EFFICIENT MICROGRID DISPATCH

OPTIMIZATION WITH UNIT COMMITMENT

By

NADIA VICTORIA PANOSSIAN

A thesis submitted in partial fulfillment of

the requirements for the degree of

MASTER OF SCIENCE IN MECHANICAL ENGINEERING

WASHINGTON STATE UNIVERSITY

School of Mechanical and Materials Engineering

MAY 2018

© Copyright by NADIA VICTORIA PANOSSIAN, 2018

All Rights Reserved



To the Faculty of Washington State University:

The members of the Committee appointed to examine the thesis of NADIA VICTORIA

PANOSSIAN find it satisfactory and recommend that it be accepted.

Dustin McLarty, Ph.D., Chair

Noel Schulz, Ph.D.

Soumik Banerjee, Ph.D.

Kshitij Jerath, Ph.D.


ACKNOWLEDGMENT

The author would like to recognize the help of Dr. Matthew E. Taylor for his patient guidance in

machine learning techniques, Dr. Dustin McLarty for advice on the project, and Dr. Srinivas

Katipamula at PNNL for making this work possible.

The author would also like to recognize the PNNL-WSU Distinguished Graduate Research

Program and WSU’s Research Assistantships for Diverse Scholars Program for their support of

this research.


COMPLEMENTARY QUADRATIC PROGRAMMING AND ARTIFICIAL NEURAL

NETWORK FOR COMPUTATIONALLY EFFICIENT MICROGRID DISPATCH

OPTIMIZATION WITH UNIT COMMITMENT

Abstract

by Nadia Victoria Panossian, M.S.

Washington State University

May 2018

Chair: Dustin McLarty

Microgrid infrastructures allow for a cleaner energy future by reducing transmission losses,

enabling combined heat and power efficiency upgrades, employing onsite renewable generation,

and providing power stability especially when paired with energy storage devices. Microgrid

dispatch optimization allows wider implementation of microgrid infrastructures by lowering

microgrid operations costs. The computational bottleneck of dispatch optimization is unit

commitment, which is a mixed integer optimization problem. Three methods to reduce the

computational effort of unit commitment and maintain satisfactory optimality are presented:

Complementary Quadratic Programming (cQP), modified complementary Quadratic

Programming (mcQP), and Artificial Neural Network (ANN) with dynamic economic dispatch.

Both cQP and mcQP are capable of quickly optimizing receding horizon dispatches with storage,

creating training sets that facilitate machine learning approaches such as the third method. This

thesis presents cQP and mcQP development as a means of training a neural network unit

commitment solver, and compares all three approaches to solutions of the full mixed-integer

problem using a commercial solver. Decision trees are employed for feature selection, and ANNs


of varying depth are compared for ANN structure selection. The mcQP method is the most

robust, and the ANN method is the most computationally efficient. All three methods outperform

the commercial solver in computational efficiency, robustness, and dispatch cost.


TABLE OF CONTENTS

Page

ACKNOWLEDGEMENT………………………………………………………………………..iii

ABSTRACT……………………………………………………………………………………...iv

LIST OF TABLES……………………………………………………………………………...viii

LIST OF FIGURES……………………………………………………………………………....ix

1 Introduction…………………………………………………………………………………..1

2 Literature Review…………………………………………………………………………….4

2.1 Gradient Based Methods………………………………………………………………...4

2.2 Search Methods……………………………………………………………………….....8

2.3 Machine Learning Methods……………………………………………………………10

3 Problem Statement…………………………………………………………………………..15

4 Methodology………………………………………………………………………………...16

4.1 Problem Formulation:………………………………………………………………….16

4.1.1 Combined Cooling, Heating, and Power…………………………………………….21

4.2 Complementary Quadratic Programming……………………………………………...23

4.3 Modified Complementary Quadratic Programming…………………………………...25

4.4 Artificial Neural Network……………………………………………………………...29

4.4.1 Network Structure Selection………………………………………………………....31

4.4.2 ANN Training………………………………………………………………………..36


4.4.3 Algorithm Execution and Division of Work………………………………………...38

4.5 Test Systems…………………………………………………………………………....42

5 Results………………………………………………………………………………………45

5.1 Complementary Quadratic Programming Dispatch Cost and Computational

Efficiency……………………………………………………………………………………... 45

5.2 ANN for Unit Commitment…………………………………………………………52

5.2.1 ANN Structure Optimization………………………………………………………...52

5.2.2 ANN Implementation………………………………………………………………..61

6 Conclusion…………………………………………………………………………………..70

7 Discussion…………………………………………………………………………………...72

8 References…………………………………………………………………………………..74

9 Appendix A: Sample Source Code………………………………………………………….80

9.1 Neural Network Class………………………………………………………………….80

9.2 Single Layer ANN Training Algorithm………………………………………………..81


LIST OF TABLES

Page

Table 1 Generator component parameters used in the test campus system. ................................. 43

Table 2 Chiller component parameters used in the test campus system ....................................... 44

Table 3 Energy storage components used in the test campus system ........................................... 44

Table 4 Startup costs associated with coming online. .................................................................. 44

Table 5 Summary of Winter and Summer comparison of FMI, cQP and mcQP ......................... 48

Table 6: Accuracy of decision trees for each component's unit commitment .............................. 52

Table 7: Highest level layer for input threshold for each component ........................................... 55

Table 8: Test accuracy of single and double layer ANN .............................................................. 57

Table 9: Time in seconds to complete various tasks using the mcQP method versus the ANN. . 63

Table 10: Comparison of Dispatch methods for year in receding horizon ................................... 71


LIST OF FIGURES

Page

Figure 1: Artificial Neural Network basic structure ..................................................................... 10

Figure 2: Decision trees ................................................................................................................ 13

Figure 3: Conceptual depiction of generator performance and cost functions ............................. 20

Figure 4: Pseudo code for assuring feasibility when using cQP. .................................................. 24

Figure 5 Pseudo code for the elimination of combinations which are infeasible ......................... 27

Figure 6 Process for developing an ANN for unit commitment. .................................................. 30

Figure 7: Evolution of complementary Quadratic Programming algorithm and software. .......... 39

Figure 8 Electric utility rates vary throughout the week ............................................................... 44

Figure 9: Distribution of operating costs for each optimization ................................................... 46

Figure 10 Distribution of operating costs for each optimization for winter and summer ............. 47

Figure 11 Comparison of electrical dispatch of January 8th. ....................................................... 49

Figure 12 Dispatch from mcQP, cQP, and FMI methods for June 26th ....................................... 51

Figure 13: Training and testing error for each component versus training iterations ................... 58

Figure 14: Validation testing accuracy of unit commitment for each component ........................ 60

Figure 15: Training accuracy of unit commitment for each component from ANNs .................. 59

Figure 16: Dispatch Comparison for ANN1 and mcQP for the simple case microgrid ............... 62

Figure 17: ANN1 as compared to mcQP for a sample dispatch. .................................................. 64

Figure 18: The ANN is capable of eliminating shutdown of ICE. ............................................... 65

Figure 19: Cost Distribution of single layer ANN dispatch as compared to mcQP dispatch. ...... 66

Figure 20: Comparison of mcQP and ANN1 electric dispatch for August 1st ............................. 68

Figure 21: Hot thermal and cold power dispatches using mcQP and ANN1 for August 1st. ....... 69


Dedication

Thanks to my parents, Jack and Linda Panossian for their love and support, and thanks to my

cousin Dr. Emil Rahim for his guidance on the grad school process.


1. INTRODUCTION

Despite the United States’ (US) withdrawal from the Paris Climate Accords, all other

nations, many US cities, and many US states have pledged to reduce harmful carbon emissions

as part of the international effort to slow the pace of global climate change [1]. Emissions

reduction efforts focus on a switch from fossil fuel burning energy production to renewable

sources such as wind and solar, as well as a switch from gas and oil-based transportation to

electric powered vehicles [2]. In many nations, such as Germany, legacy technology is being

replaced by renewable generation from wind and solar [3]. These variable renewable sources

require a higher level of control to ensure power dispatch stability with minimized cost and

emissions. Wind and solar generated power can be curtailed or stored to assure stable power

supply at all times. Other sources such as gas turbines or fuel cells that accommodate demand

not supplied by renewables and storage must be dispatched optimally to maintain stability, lower

power generation cost, and minimize emissions.

Emissions from remaining legacy generation can be reduced by using waste heat from

generators such as gas turbines, reciprocating engines, or fuel cells. This heat can be used to

satisfy heat demand if piped over short distances in the form of steam as in the case of

Washington State University’s steam plant that is used to heat campus buildings and melt ice on

central campus walkways [4]. The heat can also be used to provide cooling when connected to an

absorption chiller as is the case at University of California, Irvine [5]. Finally, waste heat can be

used for electric generation via a steam turbine connected to a heat recovery steam generator

unit, producing electricity that can be transmitted over much longer distances than heat [6]. Using


waste heat to meet thermal demand avoids energy conversion losses, but is limited to short

supply line distances of a few miles or less as are present in microgrid infrastructures.

Microgrid infrastructures are an important part of emissions reduction plans and improve

power reliability due to their independence via an ability to disconnect from the larger power

grid [7]. High reliability is necessary for locations with high cost of power failure such as

mission critical military locations, ship-board electrical systems, and hospital campuses [8] [9].

Microgrid infrastructures can also be cost effective in large campus installations which

accumulate high power demand across many buildings such as college campuses [6]. Because

microgrids may not be able to rely on the surrounding power grid for stability, dispatch control

planning is important to provide power stability, reduce cost, and avert high emissions [6]. The

distributed generation optimization applied to microgrid systems may be applied to large grid

systems as energy generation diversifies and new generation is added to areas that were once

exclusively power demand locations.

Microgrid dispatch optimization is the process of finding set points for all power generation

components, and sometimes demand components, such that cost of generation or emissions are

minimized while still providing stable power [10]. Many components such as gas turbines, fuel

cells, and vapor compression chillers have a lower limit of generation below which they are not

self-sustaining and cannot operate. Unit commitment is the problem of optimizing which

components should be online and above the lower limit of operation, or offline with zero

production at a given time to minimize cost of generation or emissions. Unit commitment

algorithms output binary values for every component with non-zero lower limit and dispatch

optimization methods use those binary values when determining real-valued setpoints for

components.


Complementary Quadratic Programming can reduce the computational complexity of the

unit commitment and dispatch optimization problems to provide optimal microgrid dispatching

[11] [12]. Using a neural network for unit commitment within complementary Quadratic

Programming enables real time dispatch optimization and control allowing effective use of

renewable generation, storage, and dispatchable components for a reliable power supply with

minimized cost and emissions [13].


2. LITERATURE REVIEW

Dispatch optimization methods can be separated into three main approaches. Gradient-

based methods reach global optima quickly, but are limited in the types of constraints and

efficiency curves over which they can optimize. Search based methods are more flexible in the

efficiency curves and constraints that can be accommodated, but rely on computationally

expensive search algorithms to converge to an optimal solution. Machine learning approaches

can accommodate more complex constraints and efficiency curves with low computational

demand after training, but require large training sets for robust outputs.

2.1 Gradient-Based Methods

Gradient-based methods rely on gradient descent which converges on a minimum value, in

this case generation cost, by moving in the opposite direction of the surface gradient until the

absolute value of the gradient falls below some threshold [14]. Gradient descent may not

converge to a global optimum if the problem is non-convex because all points on the surface may

not have gradients which lead towards the global minimum. Gradient-based methods require

convex and often linear relationships between power production and cost to assure global

convergence. When the problem is constrained, as is the case with power dispatch, interior-point

methods are used to limit the gradient descent search space [15] with linear and possibly

quadratic constraints. Approximating the problem as a convex function with linear constraints

creates some error between the actual and estimated costs, but allows for rapid computation by

avoiding the necessity to search and check multiple possible solutions. If gradient-based methods

are used for power flow optimization with unit commitment then they lose their computational

demand advantage, because one optimization is run for each possible unit commitment


combination to find a global optimum. The numeric methods discussed here are linear

programming, quadratic programming, and representation as energy hubs.

Standard form linear programming is an interior-point method where an objective, in this

case cost, is minimized subject to linear constraints [15]. Linear programming approximates

component efficiency curves as a straight line and optimizes for cost as in the equations shown

below.

\min(C) = \sum_{i=1}^{N} P_i c_i B_i

s.t. \quad \forall k: \; Demand_k = \sum_{i=1}^{N} P_{i,k}

Where C is the cost, P is the power from each component in the microgrid, c is the slope of the

linearized generator cost curve, B is a binary value indicating the component unit commitment,

and for all timesteps k, the demand equals the sum of generation. Linear approximation may not

have high accuracy, especially near rated power; however, it drastically simplifies the problem.
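As an illustrative sketch only (not the implementation used in this work), the linearized dispatch above can be posed for a single timestep in MATLAB with linprog; the cost slopes, limits, and demand below are assumed values and all units are treated as online (B = 1).

% Minimal single-timestep linearized dispatch sketch (hypothetical data).
% Decision variables: x = [P1; P2; Pgrid], all assumed online (B = 1).
c      = [0.08; 0.11; 0.15];    % assumed linearized cost slopes ($/kWh)
Demand = 900;                   % assumed electric demand (kW)
Pmin   = [0; 0; 0];             % lower bounds (unit commitment ignored here)
Pmax   = [500; 600; 1000];      % assumed upper bounds (kW)

Aeq = [1 1 1];  beq = Demand;   % generation must equal demand
x = linprog(c, [], [], Aeq, beq, Pmin, Pmax);
cost = c' * x;                  % dispatch cost for this timestep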

If all components have linear efficiencies, then the optimal dispatch will be the point where

marginal power is equivalent for all generators. When minimum and maximum constraints are

applied, the problem is more complex because the point at which all components have the same

marginal cost may be outside of the boundary conditions [16]. Some constraints may be removed

if the optimal solution is typically not near the boundary to reduce memory and computational

demand [16]. However, any constraint that is removed must be checked a priori to assure

feasibility. Linear programming is often paired with mixed integer problems such as unit

commitment, because of the speed with which an optimization can be performed [17] [18].

Because one mixed integer configuration can be found quickly, checking the entire domain of

unit commitment combinations may be completed rapidly for smaller systems [19].


Unfortunately, the unit commitment process is still NP-hard, so for large networks the mixed

integer problem remains computationally expensive [20]. Linear programming is also limited in

the type of acceptable constraints. Because constraints must be linear, this method is limited to

optimization of DC power grids. Linear programming can be used to approximate a solution to

the AC problem, where linear programming finds optimal real power output from each

component and then the Newton-Raphson or other convergence method is used to find the

network flow for the AC grid, with real power generation close to the optimal value [21] [22].

Quadratic programming methods can be used for optimization of DC power grids, but are

limited to linear or quadratic constraints [14]. This means that these methods do not extend to

AC grids easily due to the non-linear relationships between active power, reactive power, and

voltage [23]. Quadratic programming methods can still be used to minimize line losses or cost of

generation. If the problem is non-convex it can still be solved using Metzler matrices and semi-

definite programming relaxation [23] [24]. The problem is approximated, as shown in the

equations below, by a quadratic curve for the relationship between generation and cost for each

generator, and by assuming that the grid is DC, eliminating the constraints for reactive power.

\min(C) = \sum_{i=1}^{N} F(P_i) B_i

F(P_i) = f P_i + H P_i^{2}

s.t. \quad \forall k: \; Demand_k = \sum_{i=1}^{N} P_{i,k}

Where the cost is a quadratic function of the power output with linear coefficient f and quadratic coefficient H. The approximation of the cost curve as a quadratic function is valid because the inverse efficiency curves of most generators, such as gas turbines, diesel generators, or fuel cells, follow a

convex quadratic shape near their rated power, and the optimal dispatch drives set points near

rated powers, because efficiency is often the highest near rated power. In this approximation the

only constraints are upper and lower bounds on real power, ramp rates, line losses, storage

limitations, and that generation equals demand. All constraints mentioned so far in this paragraph

are linear. The DC grid solution is often used as an approximation of the real power for an AC

grid [23]. The quadratic programming method has a high computational efficiency when not

performing unit commitment, because the convex nature of the quadratic cost functions

guarantees a global optimal. If quadratic programming methods are used for power flow

optimization including unit commitment, then it is less computationally efficient, because each

unit commitment possibility becomes a new convex optimization problem that must be solved

independently, resulting in 2^G quadratic programming optimizations, where G is the number of

components with non-zero generation limit.
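As an illustration of one such convex subproblem (a sketch with assumed data, not the thesis implementation), the quadratic dispatch for a single timestep and a single fixed on/off combination can be solved with MATLAB's quadprog, which minimizes 0.5·x'Hx + f'x.

% Quadratic dispatch for one fixed unit commitment combination (hypothetical data).
B    = [1; 1; 0];                       % assumed on/off status of three generators
f    = [0.07; 0.10; 0.09];              % linear cost coefficients ($/kWh)
H    = diag(2*[2e-5; 3e-5; 2.5e-5]);    % quadratic coefficients (factor 2 because of 0.5*x'*H*x)
Pmin = [100; 150; 120] .* B;            % lower bounds apply only to online units
Pmax = [500; 600; 450] .* B;            % offline units are forced to zero
Demand = 800;                           % assumed electric demand (kW)

Aeq = ones(1,3);  beq = Demand;         % supply must meet demand
P = quadprog(H, f, [], [], Aeq, beq, Pmin, Pmax);
cost = f'*P + P'*(H/2)*P;               % F(P) = f*P + H*P^2 summed over the online units

Repeating this optimization for every feasible on/off combination is what makes the full unit commitment problem expensive.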

Quadratic programming can incorporate multiple loads and demands with multiple

equality constraints [25] [26]. However, since constraints are limited to linear relationships,

modeling of co-generation plants is limited to linear relationships between fuel and heat or

electric power production and heat.

Microgrid components are often grouped into energy hubs to reduce the order of the unit

commitment and dispatch problem. In an energy hub approach energy is passed through the hub

or converted from one form to another [27]. Since hubs can represent multiple components, there

are redundant energy inputs and outputs, resulting in higher reliability and flexibility of the

optimization. Adding a natural gas network requires modeling of pump power to maintain

pressure in the gas pipelines [28]. Power flow equations and network energy carrier balance is

met with equality constraints, while system limitations such as voltage limits, power generation


limits, and compression limits are handled with inequality constraints [28]. Once an in-bound

solution is found for the energy hubs, the outputs from each hub are broken down by component

in a subroutine. The portion of each energy source and supply to and from each component

within an energy hub is proportional to a predetermined constant factor. Combining components

into hubs reduces computational time by reducing the number of variables. The final cost of the

solution is determined by the sum of the cost of the energy input to each component [28].

When complex power flow and real efficiency curves are included, the problem becomes

non-linear, non-convex with nonlinear constraints, preventing numerical methods from being

applied. If efficiency curves are constrained to be quadratic and constraints are approximated as

linear, then numerical methods can be used to find global optimal points.

2.2 Search Methods

The dispatch optimization and unit commitment problems are often solved with search

methods where multiple feasible solutions are found and the cost of each solution is checked.

Search methods can capture nonlinear, non-convex cost relationships, as well as nonlinear

constraints. The search methods discussed here are particle swarm optimization, teaching-

learning optimization, and multi-agent genetic algorithm.

One method for finding an optimum point is particle swarm optimization (PSO) where

particles settle on possible solutions that are checked for optimality [29] [30] [31] [32] [33].

Particles are initialized at random solutions with random velocities. All particles’ solutions are

compared using the objective function, and the particles move according to velocity and inertia

terms. Because constraints, including nonlinear power system relationships, are considered, each


potential solution must be checked for feasibility against the constraint boundaries before the

cost is evaluated [29].

There are several varieties of particle swarm optimizations which can be used for

dispatch optimization including Adaptive Modified Particle Swarm Optimization [34], Particle

Diffusion Optimization [35], Ant colony optimization [36] [37], and Cuckoo search algorithm

[38].

Another method for solving the optimization is through Teaching-Learning Based

Optimization (TLBO) where Pareto solutions are found using fuzzy logic and clustering, and a

final solution is selected using objective weights [39].

A multi-agent genetic algorithm (MAGA) is proposed in [28] as an optimization

approach for multi-carrier energy systems, but is constrained to optimizing for only DC grids,

eliminating reactive power elements of the problem. There are many varieties of genetic

algorithm that can be used for dispatch optimization, with varying reproduction processes

including Artificial Immune System [40] [41], Hypermutation [41], and Matrix Real-Coded

Genetic Algorithm (MRCGA) [42].

Search based methods are non-deterministic and may converge to different solutions in

successive runs. Microgrids with multiple components with similar efficiencies have several

dispatch solutions which are similarly optimal, creating flat optimization surface regions.

Employing a search based method in receding horizon dispatch may create rapid fluctuations

between similar cost solutions, creating instability on the power grid and possibly damaging

generators with rapid startups and shutdowns.

Comparison of search methods within a receding horizon dispatch optimization is beyond

the scope of this thesis.

2.3 Machine Learning Methods

Some machine learning methods, such as Artificial Neural Networks, are deterministic,

and thus not susceptible to the random outcomes of some search based methods caused by

multiple similar local minima. Machine learning methods use sets of input features and desired

outputs to train model parameters so that the model can reproduce desired outputs. Model

training can be computationally expensive, but once the model is trained, outputs can be

produced quickly. Artificial Neural Networks (ANN) and decision trees fit into this category.

ANNs have the potential to solve the microgrid dispatch and unit commitment problems

efficiently, and without rapid fluctuations caused by non-deterministic methods.

Artificial Neural Networks are adept at classification and have been used for medical

diagnosis [43] [44] [45], image recognition [46] [47] [48], and industrial quality control [49] [50]

[51]. ANNs were initially developed as a computational model of the human brain, where

neurons are interconnected and connections between interrelated neurons become stronger [52].

Figure 1: Artificial Neural Network basic structure: inputs are fed to the network, transformed as they pass along interconnections and through nodes, and converted into an output.


The basic structure of an ANN is shown in Figure 1, where inputs are fed to the network, the

input data is transformed as it is passed through the network, and an output is reached.

Features are the aspects of a problem which characterize the situation. For example,

when classifying images, a feature could be the color of a pixel. Input feature values are

multiplied by weight factors when passed along connections from node to node. All information

going into a node is summed and added to the node’s bias term before being transformed by an

operating function. The ANN structure, feature inputs, and operating functions are

predetermined, established by the user. The network’s weight and bias terms are learned to

minimize the error between the desired and actual outputs of a training set. ANNs must be

provided with examples of inputs and desired outputs, called training examples. When an ANN

is initialized its weights and biases are not tuned to generate accurate outputs, so training is

necessary. The inputs from each training example are fed through the network, and the ANN

output is compared to the desired output. If the output classification deviates from the desired

classification, then the weights and biases are altered using a learning method. Back propagation,

the learning method used here, is further described in Section 4.4.2. Training a network to

achieve high accuracy outputs requires significant computation, with long training times, using

large training, testing, and validation data sets, but once the network has been optimized, use of

the network is computationally efficient.

ANNs can also be used in dispatch optimization. ANNs have been used for renewable

generation forecasting [53] [54], load forecasting [55] [56], load shedding [53], and replication

of quadratic programming optimization [57]. In the case of renewable generation and load

forecasting, weather data, as well as solar production, wind production, and electrical demand

historical data is often logged at generation sites, so large training sets can be created easily from


these historical profiles [53] [54] [55] [56]. In the case of load shedding decisions, the ANN was

trained from load shedding simulations run on a cloud based Graphics Processing Unit because

of the large computing and memory demands involved with creating a large set of training data

[53]. In the case of quadratic programming optimization replication, generator cost functions

were represented as piecewise quadratic, convex functions, and all components were assumed to

be online to facilitate rapid creation of a training set using quadratic programming [57]. The

assumption that all components are always online eliminates the unit commitment problem

completely, reducing computational time of training set creation, but restricting the solution from

turning any component off. This case also does not employ energy storage, so quadratic

programming optimizations can be conducted for individual timesteps, instead of over a receding

horizon. This no-storage, no-unit-commitment case facilitates rapid creation of training

examples.

This thesis evaluates use of a neural network for unit commitment over a receding

horizon with storage, trained by mcQP, with decision trees used to facilitate neural network structure

optimization. Modified complementary Quadratic Programming is demonstrated to rapidly create

training examples for cases with energy storage and unit commitment.

Decision trees are useful for classification, but also require implementation with an

optimization method with real valued outputs for dispatch optimization [58]. Decision trees are

series of nodes with thresholds on features values. Each node has one feature threshold. The

outcome (above or below the value) of the threshold comparison determines the branch that the

example follows to the next node. This process of thresholding and branching is repeated until

the example reaches a node with no branches, called a leaf node. Each leaf node is associated

with a classification and the example is classified according to the leaf node where it ends.


Decision tree thresholds and node structures are created by sorting training examples. All

training examples start in one group at the root node, and a threshold is determined that will

effectively separate the examples into two groups.

The method for determining which feature to split on and what threshold value to use varies by algorithm. The ID3 algorithm for threshold determination is described in Section 4.4.1. The process of selecting a

threshold and separating the training examples is repeated until all examples at a node are of the

same classification. Once all examples at a node are of the same classification, the node is said to

be “pure” and becomes a leaf node.

Decision trees are very flexible and can be used for classifications with complex

boundaries and mixed integer features. Decision trees have been used to optimize islanding for

blackout prevention as trained by mixed-integer non-linear programming solutions [59], active

power and thermal generation security and contingency planning [60], and fault response

stability determination as trained by multi-objective biogeography-based optimization [61].

Decision trees are classification structures where all examples begin at the root node and are

separated along branches by a threshold on one feature. Each node that contains examples of

more than one class is split further until all nodes only contain one class type. Most algorithms

stop before all nodes are pure or employ some form of pruning to prevent overfitting. Pruning

involves removing nodes which contain few examples to prevent overfitting.

Figure 2: Decision trees classify examples using thresholds on important features. The ID3 algorithm selects feature thresholds which create the highest information gain.

Following the ID3 algorithm, nodes are split by the threshold that results in the largest information gain, or

reduction in entropy [62] as shown in the equations below.

IG = H(B) - H(B|f)

H(B) = \sum_{b \in B} -p(b) \log_2\!\big(p(b)\big)

H(B|f) = \sum_{b \in B} p(b)\, H(B \mid f = f_{thresh}) = \sum_{b \in B,\; f_i \in f} p(f_i, b) \log_2\!\left(\frac{p(b)}{p(f_i, b)}\right)

where IG is the information gain, H is the information entropy, B is the set of unit commitment

classifications for all examples, b is one unit commitment example (online or offline), p(b) is the

probability of classification b, p(fi, b) is the probability that an example will have feature value fi

and classification b, and f is the feature threshold. A node that only contains one classification

type is called a leaf. Pruning is conducted to prevent overfitting the tree to the training data.

WEKA’s J48 algorithm includes a pruning step in which nodes that contain fewer than three

examples are eliminated [63].
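The sketch below illustrates the information-gain calculation for one candidate threshold on a single feature; the feature values, classes, and threshold are hypothetical, and the conditional entropy is computed by weighting the class entropy on each side of the split by the fraction of examples on that side.

% Information gain of one candidate feature threshold (hypothetical data).
feature = [200 350 500 650 800 950];    % e.g. forecast electric demand (kW)
B       = [0   0   0   1   1   1  ];    % unit commitment class (1 = online, 0 = offline)
thresh  = 600;                          % candidate threshold

Hb = @(p) -(p.*log2(max(p,eps)) + (1-p).*log2(max(1-p,eps)));  % binary entropy of P(online)

above = feature > thresh;
H_B   = Hb(mean(B));                                      % H(B)
H_Bf  = mean(above)*Hb(mean(B(above))) + ...
        mean(~above)*Hb(mean(B(~above)));                 % H(B|f), weighted by split size
IG    = H_B - H_Bf;                                       % information gain of this threshold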

Decision trees require large training sets for high accuracy with complex classification

boundaries. Overtraining is also an issue with decision trees, because a tree with many layers

can be extremely accurate for sorting the training examples, but it will replicate any noise or

inaccuracies present in the training set. Decision trees could be used for classification as online

or offline in the unit commitment problem, but would need a supplementary optimizing function

to solve the dispatch optimization problem. In this thesis, one decision tree is created for

classification for each component in a microgrid and the decision tree thresholds are used to

evaluate the importance of features for a neural network.


3. PROBLEM STATEMENT

A reliable dispatch optimization system with high computational efficiency is required to

enable real time optimization and control of generation systems. The computational bottleneck of

the dispatch optimization process is unit commitment. Unit commitment involves selecting

which components will be online for the minimal cost dispatch. A reliable dispatch optimization

is desired with low computational demand for unit commitment. An artificial neural network

trained by modified complementary quadratic programming is proposed to meet computational

and reliability requirements for real time dispatch optimization. The accuracy of various artificial

neural network feature and depth configurations is compared to assess the minimum

computational resources needed for high accuracy output.

This thesis describes generation and storage dispatch optimization by breaking it down into

two problems: unit commitment, and dispatch optimization.


4. METHODOLOGY

4.1 Problem Formulation

The cost function to be minimized is defined as:

\min C = \sum_{k=1}^{N} \left\{ \sum_{i=1}^{G} F(P_i) + F(P_{grid}) \right\}_k + \sum_{r=1}^{S} F\!\left((SOC_r)_N\right) \qquad (1)

Where there are N time steps, k = 1, 2, 3,…, and G dispatchable generators whose cost, F(Pi), is

a function of their power output, Pi. Connection to an external electric grid, represented by Pgrid,

is assigned a time dependent price for either purchasing or selling power, F(Pgrid).

Dispatch cost estimates typically include the cost of storing energy by dividing the average

cost of energy generation by the round-trip efficiency of the storage system. This energy storage

cost estimate fails to account for the degrading round-trip efficiency with the duration of storage

and the time dependence of cost of generation for storage. In this problem formulation, power

put into or coming from energy storage devices, S, incurs cost only at the time it was generated,

Pi, or purchased, Pgrid. The charging efficiency, ηc, is included in the round trip efficiency loss

term (9), because there is some power lost during the charging process. Any residual state-of-

charge, SOC, must have some value; otherwise the SOC would always be driven to zero at the

end of the dispatch horizon. Assigning the residual SOC value instead of mandating an ending

SOC set point allows more flexibility so that if the forecast changes, the storage use can be

adapted. Assigning value only to the final state of charge assures that storage is discharged

during peak pricing periods, because it allows flexibility by avoiding artificial price assignments.

If value was assigned to stored energy at all timesteps, then storage devices would discharge as

soon as the cost of energy was above the assigned storage value, which may deplete the stored

energy before peak pricing hours. The function describing the value of this residual charge, (2),


is a convex quadratic such that the first kWh of storage is valued slightly more, 1+δ, than the

highest marginal cost dispatchable generation, and the last kWh of storage is valued less than,

1/(1+δ), the smallest marginal cost of generation [11]. The discharge efficiency, ηd, is included

because only the energy that can be extracted has value.

F\!\left((SOC_r)_N\right) = a_1 (SOC_r)_N + a_2 (SOC_r)_N^{\,2} \qquad (2)

Where:

a_1 = -\eta_d (1+\delta) \max_i\!\left(\frac{dF(P_i)}{dP_i}\right) \qquad (3)

a_2 = \eta_d \left[ (1+\delta) \max_i\!\left(\frac{dF(P_i)}{dP_i}\right) - \frac{1}{1+\delta} \min_i\!\left(\frac{dF(P_i)}{dP_i}\right) \right] \Big/ \left( 2\, SOC_r^{max} \right) \qquad (4)

The steep negative slope, a1, at zero SOC implies a preference to use the most expensive

generator before fully depleting the storage. Similarly, the less negative, or possibly positive,

slope at full SOC, a1 + a2, suggests a preference to discharge storage before using the least

expensive dispatchable generator. This method does not assign cost or value to stored energy at

intermediate time steps, thus maximizing utilization within the dispatch horizon.
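For illustration, the sketch below evaluates the residual-charge value term using the reconstruction of (2)-(4) given above; the marginal costs, discharge efficiency, capacity, and δ are assumed values, and the coefficient expressions should be checked against the original formulation.

% Value of residual state-of-charge per (2)-(4), with assumed parameters.
dFdP_max = 0.14;    % highest marginal cost of dispatchable generation ($/kWh)
dFdP_min = 0.06;    % lowest marginal cost of dispatchable generation ($/kWh)
delta    = 0.05;    % premium/discount factor
eta_d    = 0.95;    % storage discharge efficiency
SOC_max  = 2000;    % storage capacity (kWh)

a1 = -eta_d * (1 + delta) * dFdP_max;                        % eq (3): steep negative slope at empty
a2 = eta_d * ((1+delta)*dFdP_max - dFdP_min/(1+delta)) ...
     / (2 * SOC_max);                                        % eq (4): flattens the slope toward full
F_SOC = @(SOC) a1.*SOC + a2.*SOC.^2;                         % eq (2): residual-charge value term

slope_empty_full = a1 + 2*a2.*[0, SOC_max];                  % marginal value at empty vs. full storage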

It is important to avoid an optimal solution which fully charges or discharges the energy

storage when it is relied upon to provide the moment-to-moment balancing of generation and

demand, because of uncertainty in forecasting loads. Over charging or over discharging energy

storage may also damage or reduce the life of the system. The most straightforward approach

would optimize the middle 80% of the available capacity, leaving 10% as a buffer for

uncertainty. This approach may underutilize the storage device. A second approach adds soft

buffers through the cost function that grow stronger as the storage approaches 100% charged or

fully discharged. The buffer can be proportional, Π, to the maximum capacity of the storage, or a


fixed value. Two pseudo-states, l and u, are given quadratic costs, the severity of which

determines the relative ‘softness’ of the boundary (5-6). The soft constraints (5) and (6) can be in

addition to the hard capacity constraint (11). Soft constraints act as buffers in the receding

horizon control by placing a thumb on the scale in the modified cost function (9) as the storage

approaches full or empty.

\Pi \cdot SOC_r^{max} \ge -(SOC_r)_k - l_k \quad \& \quad l_k \ge 0 \qquad (5)

\Pi \cdot SOC_r^{max} \ge (SOC_r)_k - u_k \quad \& \quad u_k \ge 0 \qquad (6)

The minimization of (1) is constrained by:

Energy balance: for each energy demand category at every time step, k = 1,2,3….

\forall k: \left\{ \sum_{i=1}^{G} P_i + P_{grid} + \sum_{r=1}^{S} (P_r - \phi_r) \right\}_k = \left\{ L - P_{unctrl} + P_{vent} \right\}_k \qquad (7)

Each energy demand category, e.g. DC power, heating, cooling, or steam production, has a

separate energy balance. There is a subset of generators, G, and storage devices, S. The power

supplied to or extracted from energy storage devices, (8), includes the round-trip energy loss, ϕr,

(9). The discharging power of the storage system, Pr, is calculated from the change in state-of-

charge, SOC and a self-discharge factor 𝜅. The charging loss term, ϕr, accounts for both charging

and discharging losses and is strictly non-negative. The indirect cost of producing additional

energy to satisfy the energy balance (7) ensures this charging loss is equal to, not greater than,

the actual round-trip energy losses. The energy storage charging and discharging efficiencies,

represented by ηc and ηd, are constant.

(P_r)_k = -\frac{\left\{ (SOC_r)_k - (1 - \kappa^{*} \cdot \Delta t_k)\cdot(SOC_r)_{k-1} - \kappa \right\} \cdot \eta_d}{\Delta t_k} \qquad (8)

(\phi_r)_k \ge \left( \frac{1}{\eta_c} - \eta_d \right) \cdot \frac{(SOC_r)_k - (1 - \kappa^{*}\cdot\Delta t_k)\cdot(SOC_r)_{k-1} - \kappa}{\Delta t_k} \quad \& \quad (\phi_r)_k \ge 0 \qquad (9)
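A minimal numeric sketch of (8) and (9) for a short assumed SOC trajectory follows; the efficiencies and self-discharge factors are hypothetical, and the charging loss is non-zero only when the storage is being charged.

% Storage power and charging loss per (8)-(9), assumed trajectory and parameters.
SOC     = [1000 1200 1150 900];   % state of charge at k = 0..3 (kWh)
dt      = 1;                      % timestep length (h)
eta_c   = 0.97;  eta_d = 0.95;    % charging / discharging efficiencies
kappa_p = 1e-4;  kappa_f = 0;     % proportional (kappa*) and fixed self-discharge

k   = 2:numel(SOC);
Pr  = -((SOC(k) - (1 - kappa_p*dt).*SOC(k-1) - kappa_f) * eta_d) / dt;  % eq (8): > 0 when discharging
phi = max(0, (1/eta_c - eta_d) .* (-Pr./eta_d));                        % eq (9): loss only when charging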

The load, L, determines the net sink of power from the generators and transmission lines

at each node. Any uncontrollable power generation, such as rooftop solar PV, is captured in the

term Punctrl. Curtailment is not considered here, so all solar or wind generation is considered must-

take power. The Pvent term captures any excess production of heat, cooling, or steam that is

vented instead of used or stored. In some cases, such as the Savona Microgrid, there is no bypass

valve for heat created by CHP generators, so Pvent is zero for heat [11]. The inability to vent extra

heat production creates an additional constraint on generators, because it means that CHP

generators may not operate at a setpoint which overproduces heat, even if the microgrid must use

a utility for electrical power supply. Linear conversion from one energy category to another, e.g.

DC power to cooling power, is represented as a negative generator in the source energy balance,

-Pi, and a positive term in the converted energy category, Pi·β, where β represents the conversion

efficiency.

Capacity constraints on dispatchable energy systems, energy storage systems, and grid

connections respectively assure that these components do not exceed their rated power or operate

below their self-sustaining lower limit.

P_i^{min} \le P_i \le P_i^{max} \qquad (10)

SOC_r^{min} \le SOC_r \le SOC_r^{max} \qquad (11)

P_{grid}^{min} \le P_{grid} \le P_{grid}^{max} \qquad (12)

Ramping constraints on dispatchable energy systems and charging/discharging limits on

energy storage systems assure that all components can safely reach their optimal setpoints in the

amount of time given.

\left| (P_i)_k - (P_i)_{k-1} \right| \le r_i^{max} \cdot \Delta t_k \qquad (13)

-P_{r,c}^{max} \le P_r \le P_{r,d}^{max} \qquad (14)
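As an illustration of how constraints (10) and (13) are applied, the sketch below checks an assumed candidate dispatch against capacity and ramping limits; all numbers are hypothetical.

% Feasibility check of a candidate dispatch against (10) and (13), assumed data.
P_prev = [300; 0; 200];      % setpoints at timestep k-1 (kW)
P_new  = [420; 0; 180];      % candidate setpoints at timestep k (kW)
Pmin   = [100; 150; 120];    % lower operating limits (kW)
Pmax   = [500; 600; 450];    % rated capacities (kW)
r_max  = [150; 200; 100];    % maximum ramp rates (kW/h)
dt     = 1;                  % timestep length (h)
online = P_new > 0;

cap_ok   = all(P_new(online) >= Pmin(online) & P_new(online) <= Pmax(online));  % eq (10)
ramp_ok  = all(abs(P_new - P_prev) <= r_max * dt);                              % eq (13)
feasible = cap_ok && ramp_ok;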

Generator cost functions, F(Pi), determine the complexity of the optimization problem.

The input-to-output conversion efficiency (η) may be a non-linear function of output, depicted in

Figure 2A. The standard unit commitment problem inverts efficiency to find the specific cost of

generation ($/kWh), which is typically convex, and solves for the appropriate cost of energy that

balances supply and demand. Doing so requires multiplying the cost of energy, ($/kWh), by the

energy delivered, kWh, which results in the non-convex operating cost ($/hr), shown in Figure

3A. The unit-commitment problem typically solves for a balanced supply and demand at a single

moment in time, and thus must give an equivalent cost to energy drawn from a storage system.

The mixed-integer aspect of the unit commitment problem arises from the discontinuity between

a generator’s minimum operating condition (LB in Figure 3) and its off-line state.

Figure 3: Conceptual depiction of generator performance and cost functions. A) Typical electric generator efficiency (η), specific cost of generation ($/kWh), and non-linear operating cost curve ($/hr). B) Piecewise convex quadratic cost functions. Fit A is linear from 0 to peak efficiency, D, and quadratic from D to the upper bound, UB. Fit B is discontinuous from 0 to the lower bound, LB, linear from LB to the cost curve inflection point, I, and quadratic from I to UB.

It is common practice in optimization approaches to estimate convex functions with a series of

linear segments to linearize the optimization. The methodology described in this thesis optimizes

the cost function (1) representing each generator operating cost, F(Pi), with a piecewise convex

quadratic function. Fit A represents the best possible piecewise convex quadratic that avoids the

lower bound discontinuity and has zero cost at zero output. Fit B is more accurate and includes

the discontinuity and has a non-zero initial cost. Limiting the cost functions to convex quadratics

enables a gradient-based interior-point search method to quickly converge on a global minimum

cost for the entire time horizon. Convex quadratic functions better approximate generator

efficiency curves than linear fits and avoid artificially guiding the optimization as would occur

with piecewise linear optimizations. The optimal dispatch points when using piecewise linear fits

are driven toward the junctions between piecewise linear segments. Constraining the piecewise

convex quadratics to have smooth junctions eliminates the artificial bias toward junction points.

Most generators have peak operational efficiencies at or near rated capacity, in which case a

linear approximation is equivalent to Fit A. However, chillers, fuel cells and other distributed

energy systems operate more efficiently at part load. In the instances where part load is most

efficient, a piecewise quadratic cost drives the solution towards these non-upper bound optimal

operating conditions, where a linear fit would not.
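The sketch below evaluates a Fit B style operating cost for one hypothetical generator: zero cost when offline, a linear segment from the lower bound LB to the inflection point I, and a quadratic segment from I to the upper bound UB; the coefficients are assumed and chosen only so that the two pieces join smoothly at I.

% Fit B piecewise convex quadratic operating cost (hypothetical generator).
LB = 100;  I = 300;  UB = 500;      % lower bound, inflection point, upper bound (kW)
c0 = 4;  c1 = 0.075;  c2 = 4e-4;    % assumed cost coefficients

% offline (below LB) contributes zero; the quadratic term activates only above I
fitB = @(P, on) on .* ( c0 + c1.*P + c2.*max(P - I, 0).^2 );

P    = [0 150 300 450];             % sample setpoints (kW)
cost = fitB(P, P >= LB);            % operating cost ($/hr) at each setpoint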

4.1.1 Combined Cooling, Heating, and Power

Combined heating and power, CHP, generators’ outputs appear in two energy balances.

The secondary heating output does not alter the cost of the generator, and must be linearly

proportional to the primary output, i.e. Pi·β. This may over or under represent actual heat co-

production in the case of a partially loaded CHP unit. Generally, there is a greater tolerance for


variance in heating than electricity, so an increase or decrease in demand during the subsequent

forecast optimization accommodates any deviation in heat supplied.

A piecewise linear fit of the heat co-generation can be employed to achieve a more

accurate representation in partially loaded cases, but the optimal location of the end-points

between pieces is subjective and may drastically alter the dispatch of the problem. If heat

demand is a significant portion of total demand, then the optimal dispatch setpoints will tend

toward the end points of the piecewise representation, artificially emphasizing those setpoints

above other, more continuous solutions.

Electric chillers typically represent a non-linear conversion of electrical power to cooling

power not captured by a constant coefficient of performance, COP. Without cold thermal

storage, there is little flexibility in meeting the thermal demand, and chillers are often run at non-

optimal performance. In this scenario, it is preferable to first optimize the chiller dispatch

independent of other systems, where the linear and quadratic cost terms represent the non-linear

electric power consumption, then add the resulting electric demand to net electric load and

proceed with the optimization of the remaining energy systems.

With cold energy storage, it becomes feasible to use chiller loads to balance the electric

demand, and thus dispatch all systems concurrently. In this scenario, the chillers have no direct

costs, i.e. F(Pi) = 0. The chillers appear in the electric energy balance as a load, -Pi, and in the

cooling energy balance as a generator, Pi · COP. The cost of operating a chiller is accounted for

in the cost of electric power it consumes. Given the flexibility in dispatch afforded by the

thermal storage, it is generally preferable to operate all chillers at their design condition, thus

justifying the assumption of constant COP.
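A small numeric sketch of this coupling follows; the COP and loads are assumed values, and the chiller's electric draw is simply added to the electric balance while its cooling output covers the cooling balance.

% Chiller coupling of the electric and cooling balances (assumed values).
COP       = 4.5;       % assumed constant coefficient of performance
cool_load = 1800;      % cooling demand (kW_th)
elec_load = 2500;      % building electric demand (kW_e)

P_chiller      = cool_load / COP;          % electric power consumed by the chiller
net_elec_load  = elec_load + P_chiller;    % chiller appears as a load (-Pi) in the electric balance
cool_supplied  = P_chiller * COP;          % and as a generator (Pi*COP) in the cooling balance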

4.2 Complementary Quadratic Programming

Generally, the problem formulation of Section 4.1 is a mixed-integer problem with 2^(N·G) states for

the generators to be online or offline at each time step. The number of on/off decision variables

quickly increases beyond what is practical to solve. Complementary Quadratic Programming is a

modified dynamic economic dispatch solution strategy applicable to district energy systems, with

a focus on microgrids with energy storage and is applicable to a receding horizon control

approach. The optimization strategy is part of an open-source platform for the design, simulation,

and control of district energy systems. Complementary Quadratic Programming greatly reduces

the mixed-integer aspect of the optimization problem by separating the problem into three steps:

Step 1. Estimation

Step 2. Unit Commitment

Step 3. Dispatch Optimization

Complementary quadratic programming’s base theory significantly reduces the burden of

the mixed-integer unit commitment problem.

Step 1: Estimation

The optimization of (1) is solved with Fit A, which results in a close approximation of the true

optimal operation without the need to solve the mixed-integer unit commitment problem. Fit A

assumes that all generators can come online or go offline without unit commitment because there

is no discontinuity between the lowest operating point and zero. It is likely only one component

of each energy type is dispatched within the region of Fit A between zero and the component’s


lower bound, since the slope of each component’s linear segment is unique. The ramping

constraints may force two or more components into this region for short periods of time when a

component is transitioning between on and offline states.

Step 2: Unit Commitment

The part-loaded component/s as estimated in Step 1 may be operating in the discontinuity

between offline and the lower bound. Either the part loaded component/s must shut down and

allow other systems to pick up the slack, or the part loaded component/s stay on with other

systems operating at part-load to accommodate the extra capacity. The default unit commitment sets all components above their lower bound in the estimation step online and all components below their lower bound offline. If the unit commitment is infeasible,

then the threshold between online and offline is incrementally lowered until a feasible solution is

reached. Figure 4 outlines the relaxation of the on/off threshold for the unit commitment

problem. Lowering the threshold for determining the on/off status does not impact the lower

operating constraint. It may force a generator that was initially dispatched at 15% power to be

online and operating above its 20% minimum.

attempt = 1;                                % initialize the first threshold at the lower bound
percent_of_LB = [1, 0.9, 0.5, 0.1, 0, -1];  % array of thresholds
Feasible = false;
% continue to lower the threshold until a feasible solution is found
while ~Feasible && attempt <= length(percent_of_LB)
    % if the estimated dispatch setpoint is above the threshold, set that component online
    Unit_Commitment = Estimation > LB*percent_of_LB(attempt);
    Feasible = checkFeasibility(Unit_Commitment);
    attempt = attempt + 1;
end

Figure 4: Pseudo code for assuring feasibility when using cQP.


Feasibility with all components online is the last check because it is the most expensive.

Gradual threshold relaxation assures feasibility and reduces the computational demand of the

unit commitment problem from 2^(N·G) to a small finite number of threshold steps.

Step 3: Dispatch Optimization

With the resulting unit commitment schedule of generator operation known, the optimization

(1) can be re-solved using Fit B to reach a better approximation of the marginal cost of each

component.

Estimating the solution of (1) with Fit A, checking the feasibility of the part-loaded

component/s, then solving (1) with Fit B replaces the mixed-integer optimization with two

straightforward quadratic optimizations. This approach is valid for most simple arrangements of

generators and storage devices. Arrangements that are more complex may still require solving a

portion of the mixed-integer problem as described in Section 4.3.

4.3 Modified Complementary Quadratic Programming

For complex or highly constrained district energy systems it may be beneficial to check a

broader set of feasible operating conditions between the optimizations with Fit A and Fit B. The

error between Fit A and the actual cost at part-load varies by component. The start-up and re-

start costs are not captured in (1). Vastly varying equipment sizes may mean that accommodating

a large generator results in shutting down one or more smaller units. These additional costs and


complications can be accommodated in a more robust unit commitment step.

Step 1: Estimation

This step is the same as the estimation step for cQP, but it is used to estimate the storage use

over the horizon. The preliminary optimization over the horizon using Fit A is used to estimate

use of storage. The estimate of storage power output is then subtracted from the demand at each

timestep and each timestep is optimized individually. Storage planning is the reason for

optimizing over the entire horizon simultaneously. Estimating storage use with Fit A reduces the

unit commitment problem from 2^(N·G) to N·2^G, because each timestep can be optimized

independently.
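As a sketch of the combination set handled in Step 2 (assumed small G, not the thesis implementation), the 2^G on/off patterns for one timestep can be enumerated and pre-filtered against the demand, in the spirit of the pseudo code of Figure 5 below.

% Enumerate and pre-filter the 2^G on/off combinations for one timestep (assumed data).
G  = 4;
all_Combinations = dec2bin(0:2^G-1, G) == '1';   % 2^G-by-G logical matrix of on/off patterns
UB = [500 600 450 300];                          % rated capacities (kW)
LB = [100 150 120  60];                          % lower operating limits (kW)
Demand = 800;                                    % demand net of the estimated storage power (kW)

capacity   = double(all_Combinations) * UB';     % total capacity of each combination
minimum    = double(all_Combinations) * LB';     % total minimum output of each combination
feasible   = capacity >= Demand & minimum <= Demand;
candidates = all_Combinations(feasible, :);      % combinations passed to the optimization of (20)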

Step 2: Unit Commitment

The intermediate unit commitment step formulates a set of optimizations of (1) at each

discrete time step. Each optimization considers a feasible arrangement of the 2^G generator

combinations available at that time. Combinations in which either the lower bound of the

combination of generators is above the demand, or in which the upper bound of the combination

of generators is below the demand are eliminated, before optimization and comparisons are made

according to Figure 5.

Figure 5: Pseudo code for the elimination of combinations which are infeasible for meeting demand.

% keep only the combinations that can feasibly meet the demand
keep = true(size(all_Combinations,1), 1);
for c = 1:size(all_Combinations,1)
    this_Combination = logical(all_Combinations(c,:));
    if sum(UB(this_Combination)) < Demand        % cannot produce enough power
        keep(c) = false;
    elseif sum(LB(this_Combination)) > Demand    % cannot all stay above their lower bounds
        keep(c) = false;
    end
end
all_Combinations = all_Combinations(keep, :);

The algorithm in Figure 5 eliminates combinations that are infeasible because they either

cannot produce enough power to meet demand, or cannot all be online without over producing

power.

All combinations that do not meet ramp rate conditions are also eliminated before

optimizations are run and compared. The set of feasible combinations is often much smaller than

the set of all combinations. The cost differential between the feasible alternatives and the original

combination of generators is then compared to any start-up costs avoided by changing the unit

commitment schedule from the first optimization of (1).

When optimizing a single time step, (20), energy storage lacks the ‘big-picture’ perspective of

the simultaneous optimization. This perspective is incorporated by using the SOC determined by

the first optimization to set the nominal discharge target, Pr0. A quadratic cost constrains

deviations from the original storage, but allows for deviations when significant savings can

accrue. The energy balance at each step thus becomes (21). The planned power output from the

storage is placed on the right-hand-side as a constant, reducing apparent load. The deviation from

this power and the charging penalty remain on the left-hand side.


\min C = \sum_{i=1}^{G} F(P_i) + F(P_{grid}) + F(\varepsilon_{Res}) + \sum_{r=1}^{S} F(P_r) \qquad (20)

\sum_{i=1}^{G} P_{i,k} + P_{grid,k} + \sum_{r=1}^{S} \left( P_{r,k}^{*} - \phi_{r,k} \right) = L_k - P_{r,k}^{0} \qquad (21)

The planned power output, Pr0, can be calculated based on the SOC states from the first

optimization, (22), including any proportional or fixed losses, κ* or κ. The allowable range of the

deviation is thus the nominal power output range of the storage device shifted by Pr0 as per (23).

The power output might be further constrained by the available stored energy or remaining

storage capacity if that is more restrictive given the current SOC.

P_r^{0} = -\frac{\left\{ (SOC_r)_k - (1 - \kappa^{*}\cdot\Delta t_k)\cdot(SOC_r)_{k-1} - \kappa \right\}\cdot\eta_d}{\Delta t_k} \qquad (22)

P_r^{min} - P_r^{0} \le P_r^{*} \le P_r^{max} - P_r^{0} \qquad (23)

\phi_r \ge \left( \frac{1}{\eta_c} - \eta_d \right) \cdot \left[ \frac{-P_r^{0}}{\eta_d} - P_r^{*} \right] \qquad (24)
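A minimal numeric sketch of the offset logic in (22)-(24) follows; the SOC values, efficiencies, losses, power limits, and chosen deviation are all assumed.

% Planned storage power, deviation bounds, and charging penalty offset per (22)-(24).
SOC_k   = 950;   SOC_km1 = 1000;   % Step 1 SOC at timesteps k and k-1 (kWh)
dt      = 1;     eta_c = 0.97;  eta_d = 0.95;
kappa_p = 1e-4;  kappa_f = 0;      % proportional and fixed self-discharge
Pr_min  = -500;  Pr_max  = 500;    % physical charge/discharge power limits (kW)

Pr0     = -((SOC_k - (1 - kappa_p*dt)*SOC_km1 - kappa_f) * eta_d) / dt;   % eq (22): planned output
dev_lo  = Pr_min - Pr0;                                                   % eq (23): allowable
dev_hi  = Pr_max - Pr0;                                                   %          deviation range
Pr_star = -60;                                                            % assumed deviation chosen by the QP
phi     = max(0, (1/eta_c - eta_d) * (-Pr0/eta_d - Pr_star));             % eq (24): offset charging penalty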

Step 3: Dispatch Optimization

The individual timestep optimizations are used to select the least cost unit commitment for

each step. The unit commitment is then used for one final optimization using Fit B over the

whole horizon to assure the best dispatch setpoints when optimizing for an entire horizon instead

of a single step. In this final optimization the energy storage can fluctuate throughout the day, but

the marginal cost of power is applied to the state of charge at the last timestep, such that the


storage unit is not constrained to an end SOC, but maintains stored energy that can be used in

future horizons.

4.4 Artificial Neural Network

Although modified complementary quadratic programming produces dispatches closer to optimal than complementary quadratic programming or search-based methods and maintains a

much lower computational demand than traditional numerical methods, the optimization may be

out-of-date before it can be solved and setpoints implemented. This computational time

constraint prevents the mcQP from solving highly complex problems or problems with high

frequency (e.g. sub minute) receding horizon updates.

A machine learning approach can send dispatch signals at the desired frequency, but

machine learning systems require large training sets before outputs reach a robust representation

of the optimal.

Developing training sets with full mixed-integer solvers would be

prohibitively time and resource consuming. Thus, mcQP, with its balance between

reliability and efficiency, can quickly generate the large training sets necessary for

machine learning methods. The efficiency and reliability of mcQP are key in

allowing an Artificial Neural Network to train on the full range of seasonal, weekly,

and diurnal market and demand input fluctuations.

Neural Network methods are amenable to classification problems, so the neural network

developed for this work acts as a classifier for unit commitment. An ANN, trained from mcQP

solutions for specific loads, classifies each component as either online or offline, reducing the

unit commitment step computational time. Dispatch constraints, feasibility, and robustness are


maintained by using the unit commitment solution from the ANN as the equipment status for a

quadratic programming optimization using Fit B, thus replacing the first two steps of both cQP

and mcQP. Real time control can be achieved with an ANN that reduces the order of the problem

from 2^NG to constant time.
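A minimal sketch of this two-stage pattern is shown below, assuming a trained network and MATLAB's quadprog; the helper predictUnitCommitment, the two-generator limits, and the cost coefficients are hypothetical placeholders rather than the EAGERS implementation.

% ANN classifies each unit on/off, then a single QP finds the setpoints.
demand = 1800;                             % net electric demand at this timestep (kW)
LBgen  = [500; 100];  UBgen = [2000; 500]; % per-generator limits when online (kW)
H = diag([2*3e-6; 2*1e-4]);                % quadratic cost terms (2*b1 for each unit)
f = [0.04; 0.07];                          % linear cost terms (a1), $/kWh

onOff = [true; false];   % stand-in for predictUnitCommitment(net, features) > 0.5
lb = LBgen;  ub = UBgen;
lb(~onOff) = 0;  ub(~onOff) = 0;           % offline units are pinned to zero output

% Equality constraint: total generation must equal demand
[P, cost] = quadprog(H, f, [], [], ones(1, numel(lb)), demand, lb, ub);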

Figure 6 Process for developing an ANN for unit commitment. A large set of robust training examples is created with mcQP. Those examples are used to train an ANN. The ANN is used for rapid unit commitment for dispatch

optimization.

An ANN passes features along connections to nodes until, eventually, an output node

reveals the classification of the example. Features are multiplied by weights along the

connections and all features leading into a node are summed and added to a bias. At each node is

an activation function which operates on the sum of the inputs to a node plus the bias, and

outputs the value for that node. In this case, the activation function is a sigmoid to facilitate


binary classification. The sigmoid has asymptotes at zero and one with a steep slope for input

values between -1 and 1. This shape facilitates binary classification by pushing inputs toward

either online (one) or offline (zero). All nodes employ the sigmoid activation function (25) to

emphasize any deviation from normal, so that small changes in the middle range have large

impacts, while outliers are prevented from dominating the characterization of an example.

$$y = \frac{1}{e^{x}+1}, \qquad x = b - \sum_{i=1}^{F} w_i\, f_i \qquad (25)$$

where y is the output of the node, b is the bias at the node, wi is the weight corresponding

to the ith connection, fi is the feature or input corresponding to the ith connection, and F is the

number of input features.
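For example, a single node with three inputs can be evaluated directly from (25); the weight, bias, and feature values below are arbitrary illustrations.

w = [0.8; -0.3; 0.5];     % connection weights, w_i
fin = [0.62; 0.10; 0.95]; % normalized input features, f_i
b = 0.4;                  % node bias
x = b - sum(w .* fin);    % weighted input with the sign convention of (25)
y = 1/(exp(x) + 1);       % sigmoid output of the node
online = y > 0.5;         % interpreted as an online/offline classification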

The output from mcQP can be re-organized into a training set where non-zero component

setpoints map to the online class, while setpoints at zero map to the offline class. Feature

selection is more flexible and is explained further in Section 1.7.1 Network Structure Selection.

1.7.1 Network Structure Selection

There are several challenges in accurately training an ANN, specifically determining the

features to consider as inputs and determining the number of layers. Feature selection is a

challenge because unimportant features, those whose weighting would eventually train to zero,

significantly add to the training time and obfuscate the training of more significant parameters

until the weighting reaches zero. Features must be carefully selected to reduce training time and

prevent obfuscation, while still capturing all aspects of the problem. A decision tree as described

later in this section is used for feature selection.

The available features from the dispatch estimation include:


● Estimated component set point from the previous dispatch for each component at each

step in the horizon

● Estimated energy storage input/output for each storage device at each step in the horizon

● Market price of energy at each step in the horizon:

○ Price of electricity from the grid

● Forecasted Energy demand at each step in the horizon:

○ Electric, Cooling, and Heating demand

● Forecasted renewable energy production at each step in the horizon

● Upper and lower bounds of each component at each step in the horizon

○ The upper and lower bounds change at each timestep because the generation from

each component is limited by a ramp rate

The market prices for fuel are not included because they are assumed to be constant

values throughout the year, while the market price of electricity varies throughout the day and

seasonally. Static values, such as the generator efficiency curve constants and generator types, are not considered as features because training on static inputs obfuscates the learning of important features. The upper and lower bounds of each component

are not static values because they change depending on component initial condition and ramp

rate.

Instead of including the estimated hourly dispatch for each component at each timestep

into the horizon, timesteps 1, 2, 3, 6, 12, 18, and 23 were selected to reveal the value of including

estimated dispatches farther into the future without incurring the computational slowdown

associated with including all timesteps from 1 to 23. The estimations do not reach timestep 24,

because they are provided by the previous dispatch, and the last step of the previous dispatch is


timestep 23 of the current dispatch. Power from storage devices is only included at the previous

timestep, instead of all estimated timesteps, because the power from the previous step has

already been implemented and is not an estimated value. Also, reducing the features from all

estimated timesteps to only the previous timestep reduces computational time for the decision

tree.

The accuracy of the ANN is also dependent on the number of layers, also known as the

ANN’s depth. Deep ANN’s can achieve high accuracy, but deep learning requires a longer

training time to propagate through all the layers, and more training examples to accurately define

the classification boundary without overfitting as compared to shallow ANNs. Unit commitment

has a more predictable behavior than common problems that employ deep learning, such as

complex image recognition [64] or human behavior recognition [65], so it is assumed that the

problem will require few layers. When there are few layers, the accuracy varies more quickly

with the number of layers. A single layer ANN will not capture any interaction between features,

while a two layered ANN will capture feature interaction. The difference between one and two

layers is large while the difference between 99 and 100 layers is smaller because with many

layers, complex interactions are already captured without the addition of another layer. The

depth of the ANN is more crucial when assuming a shallow ANN is sufficient, because an extra

layer will add a significant percentage to the training time, while too few layers will fail to

capture the complexity of the problem.

Feature and layer selection is challenging because a rigorous search of all configurations

would require training, testing, and comparing test accuracy for D·2^F different ANNs for each component, where D is the number of different ANN depths to check and F is the number of features. The case study of this thesis, with 6 possible depths and 10 possible features, would result in 6,144 ANNs that would need to be trained and evaluated for the unit commitment problem. Six

possible ANN layer depths were compared because the ANN layers were increased until

validation accuracy no longer improved for any component's unit commitment. The ten possible

features include all possible features with varying values as described in the list above. Instead of

testing all combinations of features, a decision tree selects the features for training ANNs with 1

to 6 layer depths. MATLAB’s Neural Network Toolbox is used to train and evaluate neural

networks with 2 to 6 layers, but the toolbox does not support single layer ANNs, so an author-created open-source algorithm is used to train and evaluate the single layer ANN.

Network structure selection is conducted using the complex campus system described in

Section 1.8 Test Systems. The complex system encompasses all possible features for the simple

system. Training examples were created from optimal dispatches of the complex case using

mcQP. The optimal dispatches span a full year with a receding horizon of 24 hours, timesteps of

one hour, predictions updated every hour, and 24 hours to establish initial conditions resulting in

365*24*24+24 = 210264 training examples. Training solutions are the unit commitment state of

the dispatchable components, such that a component with an output of 0 is offline, and a

component with an output above its minimum self-sustaining threshold is online. The component

lower bound constraint assures that no components have outputs between 0 and the minimum

self-sustaining threshold.

All inputs are normalized by subtracting the minimum value and dividing by the

maximum value. Normalization (26-33) prevents large inputs from overpowering smaller inputs

and facilitates the use of a sigmoid function for binary classification. The predicted power from


the storage devices is found from the predicted state of charge at the current timestep minus the

predicted state of charge at the previous timestep (28).

$$f_{demand} = \frac{Demand - X_{renew}}{\max(Demand)} \qquad (26)$$

$$f_{cost} = \frac{C(Utility)\cdot dt}{\max\left(C(Utility)\cdot dt\right)} \qquad (27)$$

$$f_{storedPower} = \frac{SOC_{t-1}^{A} - SOC_{t}^{A}}{UB_{storage}} \qquad (28)$$

$$f_{UB} = \frac{\min\left(UB,\; X_{t-1}^{A} + RampRate\right)}{UB} \qquad (29)$$

$$f_{LB} = \frac{\max\left(LB,\; X_{t-1}^{A} - RampRate\right)}{UB} \qquad (30)$$

$$f_{predictedState\,-1} = \frac{X_{t-1}^{A}}{UB} \qquad (31)$$

$$f_{predictedState\,+1} = \frac{X_{t+1}^{A}}{UB} \qquad (32)$$

$$f_{predictedState\,0} = \frac{X_{t}}{UB} \qquad (33)$$

Where f is the normalized feature input, X is the predicted state of the component, C(Utility)*dt

is the cost of electricity from the utility in the form of $/kWh at an hourly timestep, SOC is the

state of charge of an energy storage device, UB is the absolute upper limit on component

production or stored energy, and LB is the lower limit on power output or stored energy when

the component is online.
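A compact way to express the normalization of (26)-(33) for one component at timestep t is sketched below; the function and argument names are illustrative rather than the EAGERS implementation.

function feat = normalizeFeatures(X_A, SOC_A, Demand, X_renew, C_util, dt, ...
                                  UB, LB, UB_storage, RampRate, t, maxDemand, maxCost)
% Normalized ANN input features following (26)-(33).
feat.demand      = (Demand - X_renew)/maxDemand;           % (26)
feat.cost        = C_util*dt/maxCost;                      % (27)
feat.storedPower = (SOC_A(t-1) - SOC_A(t))/UB_storage;     % (28)
feat.UB          = min(UB, X_A(t-1) + RampRate)/UB;        % (29)
feat.LB          = max(LB, X_A(t-1) - RampRate)/UB;        % (30)
feat.prevState   = X_A(t-1)/UB;                            % (31)
feat.nextState   = X_A(t+1)/UB;                            % (32)
feat.currState   = X_A(t)/UB;                              % (33)
end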

The features of the ANN are determined according to accuracy associated with decision

trees for each component in the unit commitment problem to reduce the computational effort

involved in finding the most effective ANN structure.


A decision tree trained from the inputs listed above and the solutions as given by the final

unit commitment from mcQP is used for the selection of valuable features. If an input is never

thresholded in the tree, then that input is not useful for deciding if the generator is online. If an

input is thresholded in a node near the top of the tree, then that input is very important, because it

provides the most information gain for all examples. Using a decision tree for input evaluation is

more efficient than the exhaustive search of all ANN inputs, because it only requires the training

of one tree per dispatchable component.

WEKA’s J48 decision tree algorithm is used to conduct unit commitment for each

dispatchable component given the described set of features [63]. Trees are made with 10-fold

cross validation. The trees are generated with a smaller sample size of 21,137 labeled examples

for each component, which is one tenth of the available example set. Only one tenth of the full

set is used to train the decision tree to improve training time. The J48 implementation follows the C4.5 algorithm, an extension of ID3, where each node is split along the feature with the highest information gain. One

tree is created for each component.
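The same workflow can be reproduced outside of WEKA; the sketch below uses MATLAB's fitctree as a stand-in for J48, with placeholder data, to fit a tree for one component, estimate 10-fold cross-validated accuracy, and list the features the tree actually splits on.

X = rand(200, 5);                                % placeholder feature matrix
onlineLabels = X(:,1) + 0.2*rand(200,1) > 0.6;   % placeholder on/off labels
tree   = fitctree(X, onlineLabels);              % one tree per dispatchable component
cvTree = crossval(tree, 'KFold', 10);            % 10-fold cross validation
acc    = 1 - kfoldLoss(cvTree);                  % cross-validated accuracy
splits = tree.CutPredictor;                      % split feature per node ('' at leaves)
usedFeatures = unique(splits(~cellfun(@isempty, splits)));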

1.7.2 ANN Training

The ANN is trained using batch learning and back propagation until reaching a 0.0001 maximum square error or 10,000 training iterations, whichever comes first. A

maximum square error of 0.0001 is chosen because inputs include forecasted demands and

generator limits which are often values accurate to four decimal places, so this small square error

value maintains the importance of all significant figures. For example, a forecasted demand of

1001 kW is distinct from a forecast of 1000 kW when there are two generators with maximum capacities at 1000 kW. The threshold of 10,000 training iterations is chosen because this is the

point at which accuracy reaches saturation. For a more detailed accuracy curve see Section


1.10.1 ANN Structure Selection. Back propagation alters the weights and biases using gradient

descent starting with the output layer and working backwards in the network. The weights and

biases of the layer in question are updated according to the error between expected output and

actual output.

$$\frac{dw}{dE} = -2\,x\,E\,y\,(1-y)$$

$$\Delta w_j = \frac{dw}{dE}\cdot error\cdot\frac{a}{100} + \frac{m}{100}\,\Delta w_{j-1}$$

$$\frac{db}{dE} = -\frac{2}{F}\sum_{i=1}^{F} error_i\, y_i\,(1-y_i)$$

$$\Delta b_j = \frac{db}{dE}\cdot\frac{a}{100} + \frac{m}{100}\,\Delta b_{j-1}$$

The learning rate, a, of 1 is divided by 100 to prevent overshooting the minimal error solution. A momentum factor, m, of 0.25 is employed to increase the speed of learning at points of high change in error.
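The sketch below illustrates batch training of a single layer sigmoid classifier with a scaled learning rate and momentum term. It uses the standard mean-square-error gradient and the standard sigmoid sign convention rather than the exact update expressions above, so it should be read as an approximation of the scheme, not the author's implementation.

function [w, b] = trainSingleLayer(X, t, a, m, maxIter, tolMSE)
% X: examples-by-features matrix, t: 0/1 targets, a: learning rate,
% m: momentum factor, maxIter: iteration cap, tolMSE: square error target
[n, F] = size(X);
w = zeros(F, 1);  b = 0;
dwPrev = zeros(F, 1);  dbPrev = 0;
for iter = 1:maxIter
    y   = 1 ./ (1 + exp(-(X*w + b)));               % sigmoid activation (batch)
    err = t - y;                                    % per-example error
    if mean(err.^2) < tolMSE, break; end            % stop at the error target
    gradW = -(2/n) * (X' * (err .* y .* (1 - y)));  % MSE gradient w.r.t. weights
    gradB = -(2/n) * sum(err .* y .* (1 - y));      % MSE gradient w.r.t. bias
    dw = -(a/100)*gradW + (m/100)*dwPrev;           % scaled step plus momentum
    db = -(a/100)*gradB + (m/100)*dbPrev;
    w = w + dw;  b = b + db;
    dwPrev = dw;  dbPrev = db;
end
end

With the settings described above, a call such as trainSingleLayer(X, labels, 1, 0.25, 10000, 0.0001) would reproduce the stopping criteria of a 0.0001 maximum square error or 10,000 iterations.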

A Hessian training approach was attempted to increase accuracy and speed up

convergence, but the large multi-parameter dependency of the output prevents inversion of the sparse Hessian matrix, which stalls learning. Biases are necessary to realize the non-zero

minimum effect of parameters such as temperature and forecasted demand. The sigmoid

activation function facilitates rapid training to the binary off/on problem in unit commitment

because it has asymptotes at 0 and 1 with a region of steep slope around an input value of zero.

Sample source code for the neural network learning algorithm and the neural network

class definition can be found in APPENDIX A: SAMPLE SOURCE CODE.


1.7.3 Algorithm Execution and Division of Work

A robust execution of the optimization algorithms was developed through different

contributors across several years. The evolution of software and algorithms, as well as the

division of work is detailed in Figure 7. Blue text indicates software that was developed without

author involvement, purple text indicates software that was developed with author collaboration,

and red text indicates software that was developed by the author independently.


Figure 7: Evolution of complementary Quadratic Programming algorithm and software implementation. Blue text indicates software developed by other members of Washington State University’s Clean Energy Systems Integration (CESI) Lab including D. McLarty, H. Mikeska, A. Mills, and N. Jones. Purple text indicates software developed by the CESI Lab in collaboration with the author. Red text indicates software developed by the author independently. GRID MEND was

developed first, followed by the more robust and generalized EAGERS.


The software Efficient Allocation of Grid Energy Resources including Storage (EAGERS

0.0), which was developed by D. McLarty prior to 2015, followed an algorithm where unit

commitment was conducted using heuristic rules which accommodated the UC Irvine Campus

use case and the Savona Energy Hub use case. Version 0.0 accommodated CHP gas turbines,

CHP fuel cells, vapor compression chillers, thermal storage units, and electric utilities with time

of use rates.

EAGERS underwent restructuring, feature expansion, and generalization starting when

the author joined Washington State University’s Clean Energy Systems Integration Lab in 2015.

The new modular, more robust, generalized algorithm became the open source software

EAGERS 0.1.1. The algorithm was generalized such that different processes can be used for unit

commitment. There are currently three algorithm options for unit commitment implemented in

EAGERS. The first option follows cQP as described in Section 1.4 Theory Development. The

second option follows mcQP as described in Section 1.6 Modified Complementary

Programming. The third option uses an artificial neural network for unit commitment as

described in Section 1.7 Artificial Neural Network.

The modular structure of EAGERS 0.1.1 is more adaptable to new microgrid

components, new power demand types, new optimization implementations for benchmarking,

and new features. The algorithm was generalized such that optimization timesteps could have

any arbitrary value and span an arbitrary horizon length. Timesteps longer than an hour and

horizons longer than 24 hours enable optimization of long-term phenomena, such as power

storage at a dam [66]. Timesteps shorter than one hour enable higher resolution optimization,

bringing the algorithm one step closer to bridging the gap between optimization and control.


Non-linear timesteps can be used in scenarios where the near future requires high resolution, but

the far-future forecast has high uncertainty.

The robustness of the new algorithm and implementation in EAGERS 0.1.1 is important

for creating training sets for neural network training. Machine learning methods require large

training sets to capture the entire space of an optimization. Other robust algorithms that search

larger unit commitment spaces run slowly, making training set generation prohibitive. CVX’s

Gurobi is a commercial mixed integer quadratic programming solver, but its long run time makes

creating large training sets prohibitive. This aspect of the computational advantage of EAGERS

0.1.1 is further described in Section 1.7 Artificial Neural Network.

The artificial neural network for unit commitment method was developed by the author

independently. Artificial neural network structure optimization is available using WEKA’s J48

decision tree algorithm for feature selection, and MATLAB’s Neural Network Toolbox for ANN

depth evaluation. Single layer ANNs are available for unit commitment using an author-generated class structure, training routine, and unit commitment algorithm. These single layer ANNs improve

training time, require fewer training examples, reduce memory use, and improve output

computation time. For multi-layer ANNs, MATLAB’s Neural Network Toolbox is used for

training and dispatch within the unit commitment method. Large training example sets were

created using EAGERS’s mcQP option as applied to scenarios further described in Section 1.8

Test Systems.

Future work includes extending the neural network to include dispatch optimization for

real power, then adapting a neural network for dispatch optimization including reactive power

control and AC/DC connections. The ANN for reactive power control would be trained using

reinforcement learning. Future work is described in more detail in Section 7 Future Work.


EAGERS is open source and free to download at https://github.com/CESI-Lab/EAGERS

or by contacting the author. Sample source code is shown in APPENDIX A: SAMPLE SOURCE

CODE.

1.8 Test Systems

Two test systems are used for training and testing the optimization methods to show the exponential increase in computational demand with increasing microgrid complexity. The simple microgrid is dispatched for one week and is used to validate the optimality of the neural network approach, since it is more difficult to determine whether a complex system has been dispatched optimally than a simple system. The complex

system is dispatched for a full year, allowing for a full generalized training set for the ANN. This

system is used to benchmark computational demands of different methods and verify that a

neural network can accommodate complex systems.

The first system includes 5 components: an electric utility, a gas utility, an internal

combustion engine, a gas turbine, and a hot water thermal energy storage unit. Both the internal

combustion engine and the gas turbine are combined heat and power units. This system has

electrical and heat demand.

The complex microgrid system consists of 18 components: an electric utility, a natural

gas utility, a diesel supply, two combined heat and power gas turbines with dissimilar efficiency

curves, two combined heat and power fuel cells with dissimilar efficiency curves, one battery, a

rooftop solar PV array, two large chillers with dissimilar efficiency curves, two small chillers

with dissimilar efficiency curves, one diesel generator, a cold thermal storage water tank, a hot


thermal storage water tank, a gas heater, and a small non-CHP gas turbine. This system has

electric, heat, and cooling demand. The gas turbines, fuel cells, chillers, heater, and diesel

generator are all dispatchable components requiring unit commitment. The components’ sizing

efficiency fit curves are described in Table 1 below. The table shows the minimum power from

each component when online, the maximum power from each component when online, the

maximum ramp rate, the linear and quadratic cost coefficients respectively, and the power

conversion factor for each component. The power conversion factor describes instances such as

CHP where the generator outputs heat as a piecewise linear function of the electrical power

output. For chillers, cost is incurred by the electricity needed to run the chiller, so chillers have

no direct cost terms, but have power conversion terms as a piecewise linear fit of the amount of

cooling power output with respect to the amount of electric power needed. Energy storage devices

incur cost as the power used to charge them, so there is no direct cost for energy storage use.
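As a simplified illustration of how these fit coefficients are used, the sketch below evaluates an operating cost and CHP heat output from the first linear and quadratic terms of Gas Turbine 1 in Table 1; the actual Fit B is piecewise, so the segment selection logic is omitted here.

% Single-segment approximation of the Fit B cost and CHP heat curves (Gas Turbine 1)
a0 = 173.52;  a1 = 0.0653;  b1 = 0;     % constant, linear, and quadratic cost terms
beta0 = 4257.8;  beta1 = 1.033;         % constant and linear CHP heat terms
P = 5000;                               % electrical setpoint (kWE), illustrative
cost = a0 + a1*P + b1*P^2;              % operating cost at this setpoint ($/hr)
heat = beta0 + beta1*P;                 % recovered CHP heat at this setpoint (kWH)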

Table 1 Generator component parameters used in the test campus system. The linear and quadratic cost coefficients of Fit B, [a0, a1, …, aj] and [b1, …, bj], are listed along with the linear CHP coefficients, β0, β1, …, βj. The constant terms, a0 and β0, are used to determine the total operating cost and total heat production.

Component | Pi min (kWE) | Pi max (kWE) | ri max (kWE/hr) | a0, a1, …, aj ($, $/kW, …) | b1, …, bj ($/kW²) | β0 (kWH) | β1, …, βj (kWH/kWE)
Fuel Cell 1 (CHP) | 500 | 2,000 | 500 | 16.58, 0.0359, 0.0359, 0.0394, 0.0394, 0.0470 | 0, 3.12e-6, 0, 2.14e-9, 2.27e-5 | 305.7 | 0.726, 0.694, 0.645, 0.578, 0.5782
Fuel Cell 2 (CHP) | 500 | 2,000 | 600 | 41.54, 0.0211, 0.0211, 0.0211, 0.0346, 0.0606 | 0, 0, 1.35e-5, 9.83e-10, 4.54e-8 | 752.9 | 0.494, 0.380, 0.348, 0.348, 0.348
Gas Turbine 1 (CHP) | 4,000 | 7,000 | 4,000 | 173.52, 0.0653, 0.0653, 0.0653, 0.0700, 0.0717 | 0, 0, 0, 1.0557e-5, 9.4938e-6 | 4,257.8 | 1.033, 1.004, 0.795, 0.795, 0.795
Gas Turbine 2 (CHP) | 2,000 | 5,000 | 2,000 | 225.38, 0.0416, 0.0416, 0.0688, 0.0688, 0.1270 | 0, 2.312e-8, 0, 3.071e-5, 0 | 4,351.2 | 0.834, 0.778, 0.778, 0.778, 0.778
Gas Turbine 3 | 100 | 500 | 300 | 21.65, 0.0449, 0.0449, 0.0449, 0.0747, 0.1150 | 0, 0, 1.763e-5, 0, 1.017e-4 | |
Diesel Generator | 500 | 1,500 | 1,000 | 21.81, 0.275, 0.275, 0.275, 0.289, 0.292 | 0, 0, 3.92e-8, 5.88e-7, 5.92e-5 | |
Heater | 2,000 | 20,000 | 10,000 | 0.853, 0.02556 | 0 | |
Rooftop PV | 0 | 3,000 | ∞ | | | |

Table 2 Chiller component parameters used in the test campus system. Chiller costs are incurred as the electrical power used by each chiller, determined by the linear energy conversion factors β.

Component | Pi min (kWC) | Pi max (kWC) | ri max (kWC/hr) | β0 (kWE) | β1, …, βj (kWE/kWC)
Chiller 1 | 2,000 | 10,000 | 3,000 | 183.1 | 0.133, 0.161, 0.304
Chiller 2 | 1,500 | 10,000 | 2,000 | 186.1 | 0.135, 0.157, 0.305
Chiller 3 | 2,500 | 7,500 | 5,000 | 199.6 | 0.150, 0.151, 1.52, 0.238, 0.368
Chiller 4 | 2,000 | 7,500 | 2,000 | 182.9 | 0.166, 0.222, 0.390

Table 3 Energy storage components used in the test campus system. Energy storage costs are incurred as the power used to charge the device, so there is no direct cost for energy storage.

Component | SOCr max (MWhr) | Pi min (kW) | Pi max (kW) | κ* (%/hr), κ (kW) | ηc, ηd (%)
Battery | 25.5 | -14,900 | 7,540 | 0.0, 3.41 | 99.48, 99.48
Cold Thermal Storage | 20 | -50,000 | 50,000 | 0.0, 123.75 | 99, 99
Hot Thermal Storage | 7.5 | -30,000 | 30,000 | 0.0, 330 | 99, 99

Table 4 The fuel cells, gas turbines, diesel generator, and chillers have startup costs associated with coming online.

Component | Start-up Cost
Fuel Cell 1 | $300
Fuel Cell 2 | $250
Gas Turbine 1 | $1000
Gas Turbine 2 | $300
Gas Turbine 3 | $10
Diesel Generator | $100
Chiller 1 | $150
Chiller 2 | $200
Chiller 3 | $50
Chiller 4 | $50

Figure 8 Electric utility rates vary throughout the week and are dependent on season. From June 1st through September 30th summer rates are used, while from October 1st through May 31st winter rates apply. Peak pricing is during the middle of the day on weekdays

The electric utility pricing varies by season, day of the week, and hour of the day. Pricing

schedules are split into two seasons: summer which is from June 1st through September 30th, and

winter which is from October 1st through May 31st. The profiles for winter and summer pricing

are shown in Figure 8. These pricing profiles are representative of a typical time of use schedule

where the highest price for electricity occurs mid-day during summer weekdays.
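The seasonal and weekday structure of the rate schedule can be encoded as a simple lookup, as in the sketch below; the rate values are placeholders and are not the actual prices plotted in Figure 8, and the peak window is assumed.

function rate = touRate(dt)                              % dt: MATLAB datetime
isSummer  = month(dt) >= 6 && month(dt) <= 9;            % June 1 through September 30
isWeekday = ~ismember(weekday(dt), [1 7]);               % Sunday = 1, Saturday = 7
isPeak    = isWeekday && hour(dt) >= 12 && hour(dt) < 18;% assumed mid-day peak window
if isSummer && isPeak
    rate = 0.30;          % $/kWh, placeholder summer peak rate
elseif isSummer
    rate = 0.12;          % $/kWh, placeholder summer off-peak rate
elseif isPeak
    rate = 0.15;          % $/kWh, placeholder winter peak rate
else
    rate = 0.10;          % $/kWh, placeholder winter off-peak rate
end
end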


5. RESULTS

1.9 Complementary Quadratic Programming Dispatch Cost and Computational Efficiency

The cQP method is benchmarked against mcQP, and the full mixed integer approach as

implemented by CVX’s Gurobi mixed-integer quadratic programming optimizer. The second,

complex microgrid setup is used for benchmarking the cQP method to demonstrate its

computational efficiency when applied to complex systems.

A historic energy profile for electric, heating, and cooling demand for a college campus

in California for a full year at one hour intervals is used for the demand. The historical demand

profile is surface fit for demand versus time of day and temperature to simulate forecasting and

include forecasting error where the actual demand is the historic data and the forecasted demand

is the surface fit for that temperature and time of day. Forecasting error is important to include

for testing the robustness of the receding horizon method, because it tests that the dispatch

method can adapt to errors in demand forecasts as well as adjust to new information as the horizon recedes. The hourly dispatch profile for one year allows for the creation

of 8760 optimizations from both cQP and mcQP methods on real data. The large dispatch set

allows for comparison of computational time, dispatches, and cost at all ranges in time of day,

weekly, and seasonal profiles. The cQP is

able to optimize a 24 hour dispatch in an average of 1.6 seconds, while the mcQP method takes


on average 6.8 seconds. Both approaches simulated an entire year in a receding horizon control

approach.

Figure 9 compares the distribution of operating costs for each optimization of the 24-hour

horizon. The mean cost for the cQP method is $28,096 with a standard deviation of $3,870. The

mean cost for mcQP method is $26,992 with a standard deviation of $3,464. The cQP dispatch

averages $1,104 more expensive than the mcQP dispatch. The lower cost solutions of the mcQP

method result from searching a greater space during the unit commitment step. The smaller

standard deviation for the mcQP method also indicates a more stable solution for the receding

horizon control.

The full mixed integer problem typically takes longer than an hour to reach a solution. In

a receding horizon with hourly timesteps, new component setpoints must be generated in less

than an hour, so a time limit was implemented for the full mixed integer approach. The best

solution that was found within 3600 seconds of computation is returned. Since the FMI method runs at nearly real-time speed, with each optimization consuming most of the hour, only two weeks were simulated: a winter week from January 8th to 15th, and a summer week from June 25th to July 1st.

Figure 9: Distribution of operating costs for each optimization of the 24-hour horizon using either the mcQP or cQP methods in a receding horizon control strategy with perfect foreknowledge of the demands.

To facilitate direct comparison of FMI, cQP and

mcQP, the initial condition from the previous mcQP optimization was used for all three

approaches at each step.

During the winter week the FMI method converged to an optimal solution in less than an

hour for 104 of the 168 optimizations, found a non-optimal but feasible solution 10 times, and

failed to find a feasible operating condition 54 times. The commercial FMI solver takes longer to

find a solution when there is high cooling demand, e.g. summertime, because the multi-unit

chiller dispatch further complicates the mixed integer search space. During the summer week the

FMI method converged to an optimal solution only 35 times, found a feasible outcome for 17

additional cases, and failed to find a feasible dispatch for 116 of the 168 optimizations. Highly

complex scenarios are computationally costly, and a converged or even feasible solution may not

be found. Figure 10 compares the distribution of operating costs for each optimization of the 24-

hour horizon.

There are fewer cost samples for the FMI method because cost can only be calculated for the

feasible scenarios. During the summer week the average 24-hour dispatch horizon cost is

$31,641, $32,026, and $31,979 for mcQP, cQP, and FMI respectively. During the winter week simulation the costs are $26,938, $27,464, and $27,339 for mcQP, cQP, and FMI respectively.

Figure 10 Distribution of operating costs for each optimization of the 24-hour horizon using identical initial conditions and solving with either the mcQP, cQP, or FMI method for winter (left) and summer (right) sample weeks.

The mcQP method never fails to find a feasible solution and consistently finds the lowest cost

solution of all benchmarked methods in both seasons.

Table 5 Summary of Winter and Summer comparison of FMI, cQP and mcQP optimization methods

Method | FMI | cQP | mcQP
Winter mean cost | $27,339 | $27,464 | $26,938
Winter std. dev. | $1,682 | $2,739 | $1,972
Summer mean cost | $31,979 | $32,026 | $31,641
Summer std. dev. | $1,983 | $2,588 | $2,160

Figure 11 presents results of a single optimization, midnight of January 8th, for all three

methods. For this case, the FMI solver converged on an optimal solution. The figure illustrates

only the electric portion of the dispatch solution, as the heating and chilling dispatches showed

greater similarity. The hourly cost over the course of a day varies with dispatch and unit

commitment. The spikes seen in the cost dispatch are a result of start-up costs as new generators

are brought online. The stacked bar chart illustrates the cumulative generation from each

component. Tracing the top of the stacked bars, and subtracting the charging power of the

storage that appears below the x-axis, equals the net demand at each hour. Discharging storage

power is stacked on top of the generation as it adds to the cumulative power. The overlaid line

represents the state-of-charge of the energy storage at each timestep. The generation scale is

shown on the left while the state of charge scale is shown on the right. The operating costs for

each optimization of the 24-hour horizon using identical initial conditions are $28,442, $33,648,

and $28,018 for mcQP, cQP, and FMI respectively.


The mcQP method charges the battery in the morning when electric prices are low, and

thus operates without the second gas turbine for much of the day. The FMI and cQP solutions

avoid using the electric utility altogether. The cQP solution employs the small micro-turbine and diesel reciprocating engine to make up additional power, which accounts for the majority of the additional cost.

Figure 11 Comparison of electrical dispatch of January 8th for mcQP (top), cQP (middle), and FMI (bottom) methods with the same initial conditions and constrained to have the same ending state of charge. The cost for the FMI method is lowest, followed by mcQP, and cQP.

Additional control logic bespoke to this system could improve the general cQP

approach by forcing a check of the microturbine and diesel generator operating status. Generally

the cQP approach more closely approximates mcQP, and this particular optimization may be one

of the outliers of Figure 10.


Figure 12 presents results of a single optimization, midnight of June 26th, for all three

methods. The operating costs for each optimization of the 24-hour horizon using identical initial conditions are $34,693, $35,751, and $35,141 for mcQP, cQP, and FMI respectively. The costs and dispatches produced by all three approaches are similar. This particular summer optimization is simpler than most, as evidenced by the FMI method's ability to reach a feasible solution.

Figure 12 Dispatch from mcQP (top), cQP (middle), and FMI (bottom) methods for June 26th. The cost at each hour for each method is shown at the top.

Unlike mcQP, the cQP method keeps the second gas turbine off from 7-8am, 10am-1pm, and

from 6-7pm. During the middle of the day, the cQP method brings GT3 online and relies on the

battery and electric utility to compensate for not using GT2. This significantly changes the

battery discharge dynamics. The cQP method also reduces the time that the first gas turbine is

online. The reduction in use of the larger two gas turbines results in a higher overall cost for

cQP, as the utility is more heavily relied on.

1.10 ANN for Unit Commitment

1.10.1 ANN Structure Optimization

A decision tree was created for each component to determine which features should be

used for ANN inputs.

The first two rows of Table 6 show the accuracy and number of nodes in each respective

component tree. The percent of examples that belong to the majority class is shown in the bottom

row for comparison to the accuracy.

Table 6: Accuracy of decision trees for each component's unit commitment. The accuracy must be high in comparison to the percent of training examples that are of one class to show that the problem is well defined by the

features included.

Component | Accuracy (%) | Number of Nodes | Percent One Class (%)
GT 1 | 98.46 | 217 | 93.16
GT 2 | 92.13 | 857 | 73.96
Fuel Cell 1 | 100 | 1 | 100
Fuel Cell 2 | 100 | 1 | 100
Small GT | 93.63 | 599 | 88.60
Diesel Gen | 100 | 1 | 100
Heater | 90.40 | 1103 | 55.95
Chiller 1 | 98.22 | 171 | 92.64
Chiller 2 | 95.35 | 403 | 66.91
Chiller 3 | 96.83 | 303 | 68.47
Chiller 4 | 98.55 | 151 | 90.16

The first fuel cell is online for every timestep in the year, the second fuel cell is online for

every timestep except one, and the diesel generator is offline for every timestep in the year when

dispatched using the mcQP method. These three components do not require a decision tree or

neural network evaluation, because they always remain online due to the fuel cells’ high

efficiency, or offline due to the diesel generator’s low efficiency combined with high cost of

fuel.

All component trees have high accuracies above the percentage of training examples

belonging to one class, indicating that the problem is well represented by the trees with the

features used for thresholds. If the component tree accuracy were high, but below the percentage

of examples in one class, then the tree would perform worse than always outputting one class,

indicating a poorly represented problem. The comparison between accuracy and examples in one

class is important here because some components have a high percentage of examples in one

class. The trees for GT2, the Small GT, and the heater have lower accuracy because they are all

components which come online when there is not enough power provided by other components,

so their behaviors are more difficult to predict.

The decision trees are used to provide insight into which features are valuable and which

ones are not important when structuring an ANN. The features which appear near the top of

many components’ trees are more valuable than the features that appear at the bottom of the

trees, because thresholds are selected based on the feature that reduces information entropy the

most, so features near the top of the tree immediately provide value to the classification problem,

while features near the bottom of the tree may only separate a few examples from the group.

Features that do not appear in the tree at all are not important for component unit commitment.


All component trees have more than 150 nodes indicating that the problem has complex

behavior. The trees with more nodes have more complex relationships between the features and

unit commitment classification. The behavior of GT1 is relatively easy to predict compared to

the behavior of the heater as evidenced by the larger decision tree needed to predict the heater

unit commitment.

The priority of each feature can be determined by its initial presence in the decision tree.

Features appearing earlier in the tree provide more immediate information gain. Table 7 shows

the highest layer at which each feature appears for each component. The estimations for the

current timestep, the next timestep, and the third timestep always appear in the fourth layer or

higher, meaning they are very important for all components. The importance of the components’

estimated dispatch is high because this can be used to estimate the online/offline state, similarly

to the cQP method. The upper and lower bounds either do not appear at all in the component

decision trees or are thresholded near the bottom of the tree, showing low importance of these

features. The low importance of the upper and lower bound features is likely because they are

captured by other features such as the initial condition and estimated dispatch. Estimation of

component dispatch states further into the future, from t = 6 to t= 23 do not show as high

importance as other features and are sometimes not included in component decision trees. The

power from storage devices shows lower importance than near future estimated dispatch, but

higher importance than far future estimated dispatch.
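A tabulation like Table 7 can be computed programmatically from a fitted tree by recording the depth of the shallowest node that splits on each feature; the sketch below assumes a ClassificationTree from MATLAB's fitctree as a stand-in for the WEKA trees.

function lvl = firstSplitLevel(tree)
% Depth (root = 0) of the shallowest node that splits on each feature.
n = tree.NumNodes;
depth = zeros(n, 1);                     % node depth, root = 0
queue = 1;                               % breadth-first traversal from the root
while ~isempty(queue)
    node  = queue(1);  queue(1) = [];
    kids  = tree.Children(node, :);
    kids  = kids(kids > 0);              % leaves have children [0 0]
    depth(kids) = depth(node) + 1;
    queue = [queue, kids];
end
lvl = containers.Map();                  % feature name -> shallowest split depth
for k = 1:n
    f = tree.CutPredictor{k};            % '' at leaf nodes
    if ~isempty(f) && (~isKey(lvl, f) || depth(k) < lvl(f))
        lvl(f) = depth(k);
    end
end
end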


Table 7: Highest level layer for input threshold for each component with a decision tree. A level of 0 denotes that this feature was thresholded at the root node. A threshold level of None denotes that this feature was never thresholded for this component, so it is not an important feature in deciding the online status of the component. Estimations of future states are provided by the previous step's horizon dispatch. The estimation states only go out to t = 23 because the previous dispatch does not include this timestep's t = 24.

Thresholds | GT 1 | GT 2 | Small GT | Heater | Chiller 1 | Chiller 2 | Chiller 3 | Chiller 4
Electric Demand | 7 | 4 | 4 | 7 | 10 | 11 | 7 | 10
Heat Demand | 3 | 5 | 7 | 11 | None | 5 | 17 | 3
Cooling Demand | 5 | 5 | 9 | 7 | 17 | 13 | 5 | None
Utility Cost | 9 | 6 | 4 | 3 | 6 | 6 | 8 | None
Initial Condition | 7 | 3 | 2 | 4 | 5 | 8 | 5 | None
Power from Battery | 9 | 7 | 5 | 3 | 7 | 6 | 12 | 7
Power from Hot Thermal Storage | 4 | 4 | 3 | 2 | None | 2 | 2 | 10
Power from Cold Thermal Storage | 4 | 4 | 5 | 6 | 3 | 8 | 5 | 8
Estimation of t = 1 | 0 | 3 | 1 | 1 | 1 | 1 | 6 | 0
Estimation of t = 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 1
Estimation of t = 3 | 1 | 1 | 3 | 2 | 3 | 3 | 4 | 1
Estimation of t = 6 | 3 | 9 | 3 | 6 | 5 | 7 | 10 | 8
Estimation of t = 12 | 8 | 4 | 7 | 5 | 9 | 9 | 6 | None
Estimation of t = 18 | None | 3 | 4 | 6 | 15 | 8 | 8 | None
Estimation of t = 23 | None | 3 | 2 | 6 | 4 | None | 12 | None
Upper Bound | None | 5 | 4 | 4 | 9 | 5 | 5 | None
Lower Bound | None | None | 8 | None | None | 7 | 18 | None

Some components’ trees include thresholds on the estimated output from other

components with similar generation near the top of the tree. The importance of the estimated

output from other components with similar generation is intuitive, because an inefficient

component will not come online unless a more efficient component is unable to meet the full

demand.

Based on the information gathered from the decision tree structures, the following features

are selected for training the ANN:


• Estimated component set point from the previous dispatch for each component for

timesteps t = 1, t = 2, and t = 3

• Initial condition of each component for the current timestep

• Predicted energy storage input/output for each storage device at the current timestep

• Price of Electricity from the Grid at the current timestep

• Net Electric, Heating, and Cooling demand at the current timestep

Feature selection was conducted before training so that the optimal number of ANN layers is

found using the most effective set of features. Feature selection is conducted using decision trees,

while layer evaluation is conducted by testing ANNs with 1 through 6 layers with important

training features.

Neural Networks with layers ranging from 1 to 6 were compared to determine the

minimum number of layers necessary for acceptable accuracy. The unit commitment problem

follows logistic principles, so an ANN with few layers is sufficient. MATLAB’s Neural Network

Toolbox is used for multi-layer neural network testing. The toolbox does not support networks with fewer than one hidden layer (two layers total), so an independently constructed ANN is needed to test the accuracy of a single layer ANN.

The ANNs are trained with 10-fold cross validation using the full training set of one year

and two days. The training set is two days longer than a year to assure that dispatches from the

end of one year to the beginning of the next year are included. There are 24 dispatches per day,

with 24 timesteps per dispatch, for 367 days excluding the initial condition dispatch which is 25

hours: 24 x 24 x 367 – 25 = 211,367 labeled examples.


The pattern recognition ANN algorithm from MATLAB’s Neural Network Toolbox was

used to train multi-layer ANNs for unit commitment using the features found to be important

from the decision tree. ANN’s were trained with 10-fold cross validation on training sets of

190,231 examples. The validation testing set is taken from the full 211,367 examples for the year

sampled at every tenth timestep to assure even representation of each hour of the day, day of the

week, month and season without aliasing. The validation test set size is 21,137 examples. The

validation testing set is removed from the total set, leaving behind an evenly distributed, but

distinct training set. Both sets are shuffled to avoid emphasizing any season. A threshold of 0.5

was used to sort the ANN outputs into the two distinct classes, with any output above 0.5

considered online, and any output below or equal to 0.5 considered offline.
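The sampling and thresholding described above can be expressed compactly as in the sketch below; the placeholder ANN outputs stand in for the trained networks' predictions.

N        = 211367;                           % total labeled examples for the year
valIdx   = 10:10:N;                          % every tenth timestep forms the validation set
trainIdx = setdiff(1:N, valIdx);             % remaining examples form the training set
valIdx   = valIdx(randperm(numel(valIdx)));  % shuffle both sets to avoid seasonal emphasis
trainIdx = trainIdx(randperm(numel(trainIdx)));
annOutput = rand(numel(valIdx), 1);          % placeholder for trained-ANN outputs
online    = annOutput > 0.5;                 % outputs above 0.5 are classed as online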

The percentage of training and testing examples which are of one class are shown in

Table 8. Any decision process accuracy should exceed the percentage of examples in one class,

because always locking a component online or offline would meet the accuracy of examples in

one class.

Table 8: Test accuracy of single and double layer ANN when evaluated on the test set for each component unit commitment and the percent of training and testing examples that are of one class for each component.

Component | ANN 1 Test Accuracy (%) | ANN 2 Test Accuracy (%) | Training Percent One Class (%) | Testing Percent One Class (%)
GT 1 | 98.15 | 99.01 | 93.00 | 93.76
GT 2 | 90.62 | 93.39 | 72.45 | 72.21
Fuel Cell 1 | 99.94 | 100 | 100 | 100
Fuel Cell 2 | 99.91 | 100 | 100 | 100
Small GT | 94.12 | 94.67 | 88.66 | 88.64
Diesel Gen | 99.87 | 100 | 100 | 100
Heater | 82.94 | 92.04 | 55.84 | 55.61
Chiller 1 | 97.33 | 98.12 | 92.82 | 92.24
Chiller 2 | 92.65 | 96.33 | 66.65 | 66.37
Chiller 3 | 95.52 | 97.44 | 68.81 | 68.79
Chiller 4 | 97.50 | 99.02 | 88.76 | 90.55

The learning curve for the single layer ANN is shown in Figure 13. The training and

testing error follow each other closely for all components for all iterations. Saturation is reached

after 10,000 iterations for all components, without test error increasing, so 10,000 iterations is

the limit. The single layer ANN employs batch training, so every iteration is an epoch, because

all training examples are seen every iteration. The heater does not reduce it’s training time until

after a few thousand iterations. This training slowdown may be due to the low number of

examples with the heater online and the difficulty predicting its unit commitment because it is a

backup heat source with fluctuating output.

The training accuracies for each component using increasing ANN depth is shown in

Figure 14. The training accuracy increases from a single layer ANN to a two layer ANN, but

plateaus after the two layer ANN for all components except Chiller 1, which has a dip in

accuracy when using two layers. All components show high training accuracy which risks

overfitting the training set, so the validation test set is compared.

Figure 13: Training and testing error for each component versus training iterations. The error for Fuel Cell 2 and GT 3 continues to decrease until 10,000 iterations.


The validation testing accuracies for each component are shown in Figure 15. Validation

testing accuracy is similar to the training accuracy for all components at all layer sizes. There is

an increase in test accuracy from one layer ANN to two layer ANN for all components except

Chiller 1 and Chiller 4. All components except the heater have accuracies above 90% for a single

layer. The fuel cell and diesel generator accuracies are near 100% because all instances are either

online or offline, indicating that a neural network should not be used for these components,

because they are always online or offline.

The training accuracies for each component are comparable, and validation test errors do

not begin to increase with iterations, so the ANNs are not overfit if the validation test set is

representative of the entire space of unit commitment.

Figure 14: Training accuracy of unit commitment for each component from ANNs with 1 through 6 layers. There is an increase in accuracy from one to two layers, followed by a plateau after for all components except Chiller 1 which shows a decrease in accuracy when switching to the 2 layer ANN.


All component ANNs achieved high accuracies without including upper and lower bound

features, and without far future dispatch estimates, so the removal of these features is

recommended to reduce network complexity, improve training time, and reduce memory

requirements for training.

Accuracy of component neural networks increases when moving from a single layer

network to a multi-layer network, however, the accuracy gains diminish as the number of layers

increases beyond two layers. The large initial accuracy increase followed by a tapering of

accuracy increase, indicates that feature inputs are interrelated, but that the unit commitment

problem is not sufficiently complex to require many layers.

Single and double layer ANNs are selected for training and benchmarking because of

their high accuracy and reduced computational demand for all components.

Figure 15: Validation testing accuracy of unit commitment for each component from ANNs with 1 through 6 layers during cross validation. There is an increase in accuracy from one to two layers, followed by a plateau after for all components except Chiller 1 and Chiller 4, which show a decrease in accuracy when switching to the 2 layer ANN. The test accuracy follows the training accuracy for most components.


1.10.2 ANN Implementation

A single layer ANN, ANN1, is implemented for unit commitment in the dispatch

optimization problem. The ANN is implemented on the simple case microgrid to show optimal

solution output and the complex case to show robustness and benchmark it against the mcQP

method.

For the simple case, mcQP is used to create a training set from one week’s worth of

historical demand data with added Gaussian noise. Only one week is used so that the network is

more closely trained to the near future scenario. Gaussian noise is added to the training set, so

that actual historical demand profiles can be used for test dispatches without having identical

examples for training and testing. The simple case is not analyzed statistically, because it was

only run for one week, so the analysis would not be representative of the full year stresses.


As seen in Figure 16, the unit commitment is almost identical for the simple case. The

only difference occurs from 6pm to 7pm when the ANN method uses the mGT and the mcQP

method does not.

The computational effort, measured in time (seconds) to complete each optimization,

demonstrates the greatest benefit of the ANN1. The initial ANN1 training steps require

additional time because the training set must be generated. Each subsequent ANN1 dispatch

takes a few milliseconds rather than seconds. Once unit commitment is established by ANN1,

quadratic programming is used to find the set points for the online generators. The final setpoint optimization step is faster than the Fit B filter and dispatch of mcQP because the unit commitment is pre-determined.

Figure 16: Dispatch Comparison for ANN1 and mcQP for the simple case microgrid. Elec Utility1 is the electric utility, ICE is the internal combustion engine, mGT is the microturbine, and HotWater Tank is a hot water thermal storage unit. The top two graphs show the electric dispatch of the utility, ICE, and mGT to meet the demand. The lower two graphs show the heat dispatch using thermal storage and combined heat and power from the ICE and mGT to meet demand. The ANN (right) is capable of replicating the unit commitment of the mcQP (left) method with lower computational demand.

Table 9: Time in seconds to complete various tasks using the mcQP method versus the ANN method for unit commitment and dispatch.

Task | cQP (dt = 1 hr) | cQP (dt = 15 min) | ANN (dt = 1 hr) | ANN (dt = 15 min)
Creating matrices for optimization | 0.87162 | 1.4581 | 0.87162 | 1.4581
Fit A dispatch | 0.087135 | 0.16499 | N/A | N/A
Unit commitment | 0.52386 | 1.2186 | 0.0042934 | 0.0040753
Fit B dispatch with heuristic rules | 0.27072 | 0.62072 | 0.034862 | 0.094266
Training for one week of data | N/A | N/A | 7.7968 | 43.4794
Total time for initial dispatch | 1.7533 | 3.4624 | 8.7076 | 45.0318
Total time for subsequent dispatches | 0.8817 | 2.0043 | 0.0392 | 0.0983
Total time for one week's dispatches | 149.00 | 1348.4 | 15.25 | 111.02

The ANN avoids flicker, successive start-ups and shutdowns in short succession, that the mcQP does not always eliminate for the simple case. Flicker incurs the detrimental effects of ramping, shutdown, and startup on emissions, efficiency, and system maintenance. Flicker occurs in the

mcQP optimization when the forecast changes and there is a small marginal cost and start-up

cost difference, in opposite directions, between two or more generators. A fluctuation in the load

forecast can cause a more expensive to operate generator with a lower start-up cost to be slightly

preferred over a less expensive generator with a larger start-up cost. A subsequent shift of the

forecast in the opposite direction causes the solution to alternate back. During ANN training

flicker is seen as noise, and does not change the weight or bias structure sufficiently to replicate

the flicker behavior. Figure 17 illustrates how flicker, present in the mcQP solution, is

eliminated by the ANN. The mcQP method suggests shutting down the internal combustion

engine (ICE) at 11pm, and re-starting at 12am.


Figure 17: ANN1 (right) as compared to mcQP (left) for a sample dispatch. Note the elimination of shutting off the

internal combustion engine for a singular hour at midnight.

Flicker is potentially more disruptive in higher frequency dispatch schedules. Figure 18

illustrates results from dispatching the same system with 15 minute dispatch resolution. The

mcQP suggests two shutdowns of the ICE at 5:45 am and 5:15pm that the ANN method does not


agree with.

Figure 18: The ANN (right) is capable of eliminating shutdown of ICE at 6am and at 5pm and allows for a more steady state output from the engines. Note: in this scenario excess heat is dumped in order to prevent generators

from being controlled by heat demand.

ANN performs exceptionally well for the complex campus energy dispatch for which

cQP and mcQP were benchmarked against a full mixed integer solution. Nine tenths of the full

year receding horizon mcQP dispatch is used as the training set, with the remainder used for

validation testing. Once training and validation testing are completed, the ANN is run for a whole year as a receding horizon control using forecasted demand for testing.

selected to prevent aliasing and assure inclusion of even proportions of each starting hour of the

day, day of the week, week of the month, month of the year, and season. The training and

validation sets are then shuffled and used in batch training to prevent over-emphasis of the end of

the year dispatches.


During receding horizon control, ANN1 produces feasible solutions for 8728 timesteps of

the 8760, a 0.37% failure rate. Dispatches where ANN1 returned an infeasible unit commitment

were run using mcQP for that timestep to maintain stability of the dispatch. Training takes 43

minutes to complete 10,000 iterations. Each dispatch takes an average of 0.167 seconds; one

tenth the time for cQP and one fiftieth the time for mcQP. The mean cost of a day’s dispatch is

$27,761, which is $769 more expensive than the mcQP method and $335 more expensive than

cQP. The standard deviation of dispatch cost for ANN1 is $3,485 which is $212 more than the

standard deviation for the mcQP method. The close cost and similar standard deviation indicate

that ANN1 dispatches similarly to mcQP for the complex case.

Figure 19: Cost Distribution of single layer ANN dispatch as compared to mcQP dispatch. Both dispatches approximate Gaussian with similar averages and standard deviations.

Figure 20 presents a dispatch comparison between mcQP and ANN1 from August 1st.

Nine of the 32 infeasible dispatches occur in August, making it the month with the most


infeasible dispatches. August has the highest cooling demand and often requires the use of all

chillers and cold thermal energy storage. Since the fourth chiller is rarely dispatched and is used

mostly as a backup component, the ANN had difficulty replicating the fluctuating behavior of

the chillers when the cooling demand reached high levels. The ANN1 dispatch is infeasible for

1.25% of the attempted optimizations in August. The cost of ANN1 for August 1st is higher than

mcQP for all hours except 11pm to midnight when mcQP brings GT2 back online and pays a

startup cost. ANN1 keeps GT2 on for the whole day instead of using the utility and battery

storage during the middle of the day, thus the higher operating costs throughout the day. The

difference in GT2’s unit commitment is also seen in the thermal dispatch in Figure 21, because it

is a CHP generator. ANN1 does not dispatch Chiller 3 and instead pushes Chillers 1 and 2 to

higher setpoints and uses the cold water storage tank for the middle of the day demand.


Figure 20: Comparison of mcQP (top) and ANN1 (bottom) electric dispatch for August 1st. The dispatches are similar except that the ANN keeps GT 2 on for longer while mcQP uses the utility and battery during the middle of the day. The cost of the ANN1 dispatch is higher than the mcQP dispatch except at the end of the day when the mcQP brings

GT2 back online and pays a startup cost.

Figure 21 presents results of a two layered ANN, ANN2.

The double layer ANN, ANN2 achieved a higher training and validation testing accuracy

than ANN1. ANN2 reached a failure point at September 24th because it created a scenario where

storage was so depleted that all components combined were unable to meet the high forecasted demand.

The storage depletion scenario was likely created because the test dispatch employed surface fit


load forecasting creating load fluctuations outside of the range of the training set loads. Further

refinement of the campus scenario, equipment sizes, or forecasting method would avoid this

challenge. Analysis presented covers dispatches from January 1st through September 23rd, or 266

days and 6384 24-hour horizon optimizations. ANN2 took 20 minutes to train which is lower

than the training time for ANN1 because ANN2 employs the commercial MATLAB Neural

Network Toolbox which parallelizes processes more efficiently than the author created program

for ANN training used for ANN1. ANN2 took an average of 0.368 seconds per dispatch. ANN2

did not find a feasible unit commitment for 118 timesteps, or 1.85% of dispatches. The higher

ANN2 failure rate is likely due to overfitting.

Figure 21: Comparisons of hot thermal (left) and cold thermal (right) power dispatches using mcQP (top) and ANN1 (bottom) for August 1st.


6. CONCLUSION

Dispatch optimization with mcQP is a dramatic improvement over full mixed-integer solutions in both computational time and reliability, where reliability is measured as the percentage of feasible solutions when dispatching a complex microgrid. The method is highly suitable for the design of microgrid systems and for large time-step, e.g. hourly, dispatch optimizations. The exponential increase in effort for complex systems may limit its use as a real-time controller when higher-frequency dispatches are required. Dispatch optimization with cQP is sufficiently fast but reaches less optimal solutions and is applicable only to systems that do not require complex unit commitment. Full mixed-integer dispatch optimization using CVX with the Gurobi solver is unreliable for complex cases and is not a viable option for either dispatch optimization or training set generation because of its excessive computational time and high failure rate.

Unit commitment with an ANN trained by mcQP provides sufficient reliability and is computationally efficient. A single-layer ANN achieves sufficiently high accuracy in replicating mcQP unit commitment for complex cases without overfitting. To avoid computational slowdown and prevent excessive memory requirements, ANN input features should be limited to:

• Net electric demand: total electric demand minus solar PV electric generation

• Heat demand

• Cooling demand

• Market price of electricity at the current timestep

• Estimated power from each storage device at the current timestep

• Estimated dispatch at the previous timestep for each dispatchable component

• Estimated dispatch at the current timestep for each dispatchable component


• Estimated dispatch at the next timestep for each dispatchable component

• Power from energy storage at the current timestep

Other features, such as estimated dispatches at later timesteps in the horizon and upper and lower bounds, should not be incorporated, because they add little value and obfuscate the impact of important features, slowing learning and even reducing accuracy in some cases.
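The following minimal MATLAB sketch illustrates how one input vector could be assembled from this limited feature set; the variable names (netElecDemand, storagePower, dispatchEstimate, and so on) and matrix layouts are illustrative placeholders rather than the names used in the thesis code.

function x = buildFeatureVector(t, netElecDemand, heatDemand, coolDemand, price, storagePower, dispatchEstimate)
%assemble one ANN input vector for timestep t from the limited feature set
%dispatchEstimate: (timesteps x dispatchable components) estimated setpoints
%storagePower: (timesteps x storage devices) estimated storage power
%assumes 1 < t < horizon length so the previous and next timesteps exist
x = [netElecDemand(t);          %net electric demand (total minus solar PV)
     heatDemand(t);             %heat demand
     coolDemand(t);             %cooling demand
     price(t);                  %market price of electricity
     storagePower(t,:)';        %power from each storage device
     dispatchEstimate(t-1,:)';  %estimated dispatch at previous timestep
     dispatchEstimate(t,:)';    %estimated dispatch at current timestep
     dispatchEstimate(t+1,:)']; %estimated dispatch at next timestep
end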

Table 10: Comparison of dispatch methods for a year in receding horizon. The full mixed-integer (FMI) method took longer than an hour to complete a single dispatch, so it was unable to run through a full year. The two-layer ANN method (ANN2) crashed in September of the full-year dispatch, so its cost is not compared. The training time is lower for ANN2 than for ANN1 because ANN2 employs a commercial training algorithm, while ANN1 employs an author-created training algorithm that is likely less optimally parallelized.

                              FMI      cQP      mcQP     ANN1     ANN2
Feasibility                   --       100%     100%     99.63%   98.15%*
Time (s/dispatch)             >3600    1.6      6.8      0.167    0.368
Training Time                 N/A      N/A      N/A      43 min   20 min
Mean Cost ($)                 --       28,096   26,992   27,761   --
Standard Deviation Cost ($)   --       3,870    3,464    3,485    --

Single-layer ANNs reduce flicker in dispatches while also improving computational speed. A multi-layer ANN likely leads to overfitting in energy dispatch applications.

The mcQP method is well suited to dispatch optimization and training set generation. The cQP method should be used for less complex cases of training set generation and dispatch optimization, because it can create dispatches quickly and reliably. The single-layer ANN should be used for real-time optimization and control, or for dispatch optimization, because, once trained, it rapidly solves the unit commitment problem.


7. DISCUSSION

The five methods benchmarked here (mcQP, cQP, FMI, ANN1, and ANN2) show a range of dispatch reliability and computational speed. In general, there is a tradeoff between reliability and speed; however, the mcQP method creates dispatches both more reliably and faster than the traditional FMI method. ANN1 also creates dispatches similar to those of the mcQP method reliably, provided it is given a wide enough range of training examples.

The reliability and speed of both the mcQP and ANN1 methods have the potential to close the gap between optimization at large timesteps and real-time control. If generation setpoints can be optimized in real time, then microgrid operating costs could be reduced, enabling higher penetration of technologies like combined heat and power, energy storage, and on-site renewable generation. The speed with which robust training sets can be created with mcQP also opens the door to other machine learning techniques that require large training sets.

A single-layer ANN is very similar in nature to non-linear regression: the weights and biases are the fitting parameters, and backpropagation with a single layer is analogous to the iterations of a regression fit. The high reliability and accuracy of the single-layer ANN demonstrate that the unit commitment problem is simple enough for regression-style methods. They also suggest that more complex methods, such as multi-layer ANNs or deep learning, are not well suited to this problem, because they provide no additional noise filtering and would be unnecessarily computationally demanding for the problem at hand.
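Written out, the single-layer network computes the following (generic regression notation; the learning rate alpha is not a variable from the thesis code, though it plays the same role as the step size a in the Appendix A training routine):

\hat{y} = \sigma(Wx + b), \qquad E = \tfrac{1}{2}\sum_{k}\left(y_k - \hat{y}_k\right)^2

W \leftarrow W - \alpha\,\frac{\partial E}{\partial W}, \qquad b \leftarrow b - \alpha\,\frac{\partial E}{\partial b}

Here the weights W and bias b are the regression coefficients, and each backpropagation pass is one iteration of a gradient-based least-squares fit.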

Expansions to the artificial neural network approach should be investigated further. The next step is to test the accuracy of an ANN trained for the entire dispatch problem, including both unit commitment and dispatch optimization. The training sets can once again be taken from pre-solved dispatches using the mcQP method; however, the input features would be only the data supplied to mcQP, and the training targets would be the real-valued setpoint for each component instead of just the unit commitment decision. If an ANN can achieve high accuracy with this method, then a reinforcement learning approach should be tested, in which an ANN is trained without a pre-solved training set using feedback from the cost function. Since solutions to this problem are known, the ANN's solutions could be compared against the known optimal values to determine accuracy and robustness.

The high level of reliability, accuracy, and computational speed from ANN1 means that it

should be investigated for real and reactive power control. The extension to include reactive

power creates non-linear constraints, so solutions are not known a priori since cQP and mcQP

are limited to linear constraints. Reinforcement learning would avert the need for an established

training set, expanding the set of optimizable problems from convex mixed integer with linear

constraints to non-convex mixed integer with non-linear constraints. The success of the ANN

would be measured with the objective function and tested against all constraints. A violation of

any constraint would result in an extremely high cost, guiding the ANN away from constraint

violation scenarios.
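A minimal sketch of such a penalized objective is shown below; the function name, the violation vector, and the penalty weight are hypothetical illustrations of the idea rather than part of the implemented tool.

function J = penalizedCost(dispatchCost, constraintViolations)
%hypothetical reinforcement-learning cost: a feasible dispatch is scored by
%its operating cost, while any constraint violation (e.g. unmet demand or
%exceeded capacity) adds a prohibitively large penalty to steer the ANN away
bigM = 1e9;                                     %arbitrary large penalty weight
J = dispatchCost + bigM*sum(max(constraintViolations,0));
end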

The potential for a neural network to be trained on a complex non-convex optimization problem and to create dispatches at high computational speed could open the door to unifying optimization and control for real power systems with transmission effects and reactive power demands. Computationally efficient methods such as ANN1 are crucial for reducing the operating costs of systems with renewables and energy storage, allowing the advancement of a cleaner power generation infrastructure.


8. APPENDIX A: SAMPLE SOURCE CODE

1.11 Neural Network Class

classdef Neural_Network

properties

%define structure parameters

inputLayerSize

outputLayerSize

%define weight parameters

Wlayer1

%define bias

blayer1

%define scale factor for how complex you want to allow your system

%to be

lambda

%is it a classification network (generators off/on)

classify

%constant for node function

nodeconst

%statistical stuff to update with more info

avrginputs

stddev

end

methods

function obj = Neural_Network(inputLayerSize, outputLayerSize, varargin)

%initialization

%inputs include UpperBound, LowerBound, Demand, quadratic

%portion of cost, linear portion of cost, cost/kWh from grid

%for each generator, so inputLayerSize is 4*#ofgenerators+2

if isnumeric(inputLayerSize)

obj.inputLayerSize = inputLayerSize;

obj.Wlayer1 = rand(obj.inputLayerSize,outputLayerSize);%give a different weight to each input's connection to each node

end

if isnumeric(outputLayerSize)

obj.outputLayerSize = outputLayerSize;

obj.blayer1 = rand(1,outputLayerSize);

end

%default conditions

obj.lambda = .0001;

obj.classify = false;

obj.nodeconst = 1;

obj.avrginputs = [];

obj.stddev = [];

if length(varargin)==1
%single optional input: either the string 'classify' or a numeric lambda
if ischar(varargin{1})
obj.classify = strcmp(varargin{1},'classify');
elseif isnumeric(varargin{1})
obj.lambda = varargin{1};
end

elseif length(varargin)==2

obj.classify = strcmp(varargin{1},'classify');

obj.nodeconst = varargin{2};

end

end

function yHat = forward(self, X)

%forward propagate inputs, X is the inputs, this must be in a

%genparameters x number of outputs size

if sum(size(X')==size(self.Wlayer1))==2 %if inputs directly allow multiplication

z2 = X.*self.Wlayer1';

yHat = sum(z2,2)+self.blayer1';

else%if only one row of inputs, or one row of inputs per timestep

yHat = (X*self.Wlayer1 + self.blayer1);

end

yHat = activationf(self, yHat); %use the activation function scaled by 1

end

function a2 = activationf(self,yHat)

%apply activation function

%if it is a classifier use a sigmoid function

if self.classify

a2 = 1./(1+exp(-self.nodeconst*yHat));

if nnz(isnan(a2))>0

a2(and(isnan(a2),yHat>0)) = 1;%if numbers are too big (inf/inf), make a2=1
a2(isnan(a2)) = 0;%if numbers are too negative (-inf/-inf), make a2=0

end

else %if it is a numeric output network don't include activation

a2 = yHat*self.nodeconst;

end

end

end

end
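A brief usage sketch of the class above, paired with the trainNetwork routine listed in the next section. The feature, generator, and example counts, and the random placeholder data, are illustrative only; in practice the inputs and targets come from pre-solved mcQP dispatches.

%illustrative usage only: train a single-layer classification network to map
%feature vectors to on/off unit commitment decisions
nFeatures = 25; nGenerators = 6; nExamples = 500;          %placeholder sizes
inputs = rand(nExamples,nFeatures);                        %rows are training examples
desiredOut = round(rand(nExamples,nGenerators));           %0/1 commitment targets
net = Neural_Network(nFeatures,nGenerators,'classify',1);  %sigmoid outputs, nodeconst = 1
[net,sqrerror] = trainNetwork(net,desiredOut,inputs);      %gradient descent training
commitment = forward(net,inputs) > 0.5;                    %threshold the sigmoid outputs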

1.12 Single Layer ANN Training Algorithm

function [Net,sqrerror] = trainNetwork(Net,desiredOut, inputs)

%this does forward propagation for a one layer network for a set of

%generators and a demand

%inputs: network, desiredOut: desired network output when using forward

%function, in the form of a vertical vector, inputs: matrix of inputs of

%size inputlength x number of outputs

%inputs in order: ub, lb, f, H for each generator, demand, $/kWgrid


[sqrerror,dedW,dedb] = finderror(Net,inputs,desiredOut);%find the error and the gradient of the error

tolerance = .0001;

% initialize an approximation of the Hessian matrix = d^2f/(dx_i dx_j)

warning('off','all')%prevent print of warning as Hessian gets close to singular

%for i = 1:1:length(dedW(1,:)) %each set for each node output must be trained individually

iterations = 0;

laststep = zeros(size(dedW));

lastbstep = zeros(size(dedb));

a = 1;

momentum = .25;%.3 is too high, .1 is too low, .2 does well for test2E_1BS

while nnz(sqrerror>tolerance)>0 %keep training until you get the desired output

%find error and relation to weights and biases

[sqrerror, dedW, dedb] = finderror(Net, inputs, desiredOut);

iterations = iterations+1;

step = dedW.*a/100+laststep.*momentum/100;%training step for weights

bstep = dedb.*a/100+lastbstep.*momentum/100;%training step for bias

Net.Wlayer1 = Net.Wlayer1-step;%minimize error, so go down the slope

Net.blayer1 = Net.blayer1-bstep;

%check error with new weight and bias

[sqrerrornew, ~, ~] = finderror(Net, inputs, desiredOut);

%if it gets worse, try the other direction and try a different step size

if sum(sum(abs(sqrerrornew)))>=sum(sum(abs(sqrerror))) || nnz(isinf(sqrerrornew))>0 %|| nnz(isnan(dedWnew))>0 %if the error gets worse or you have reached a flat point

if abs(a) <1e-12 %try different size steps

%direction = direction + 1;

a = 1;

else

a = a/10;

end

%undo the last change

Net.Wlayer1 = Net.Wlayer1+step;

Net.blayer1 = Net.blayer1+bstep;

laststep = zeros(size(laststep));

lastbstep = zeros(size(lastbstep));

%if it gets better, keep the change and keep going

else %if it works

laststep = step;

lastbstep = bstep;

sqrerror = sqrerrornew;

%if you are below tolerance, go to the next weight

if nnz(sqrerrornew>tolerance)==0

disp('below tolerance');

break


end

end

%if you have hit your max iterations, stop

if iterations>1e+4

disp('not converging after 10^4 iterations, exiting loop');

if Net.classify

sqrerror = sqrerrornew;

break

else

sqrerror = sqrerrornew;

break

end

end

end

function [cost,derrordW,derrordb] = finderror(Net,inputs,desiredOut)

NetOut = forward(Net,inputs);

error = (desiredOut-NetOut);%all errors

cost = error.^2.*0.5;

%keep model simple using lambda to prevent over fitting

if Net.classify %if it has a sigmoid function

%use cross error to prevent learning slowdown with sigmoid functions

derrordW = -2*inputs'*(error.*(NetOut.*(1-NetOut))*Net.nodeconst)/length(desiredOut(:,1));
derrordb = -2*sum(error.*(NetOut.*(1-NetOut))*Net.nodeconst)/length(desiredOut(:,1));

else%if no activation function

derrordW = (-error*inputs)';% + Net.lambda*Net.Wlayer1; %no activation function so this is just the error

derrordb = -1/length(error(1,:))*sum(error,2)';

end