
FAKULTÄT FÜR INFORMATIK DER TECHNISCHEN UNIVERSITÄT MÜNCHEN

Master’s Thesis in Informatics

Short Term Household Electricity Load Forecasting Using a Distributed In-Memory

Event Stream Processing System

Alexander Aprelkin

FAKULTÄT FÜR INFORMATIK DER TECHNISCHEN UNIVERSITÄT MÜNCHEN

Master’s Thesis in Informatics

Short Term Household Electricity Load Forecasting Using a Distributed In-Memory Event Stream

Processing System

Kurzfristige Haushaltsstromlastprognose mit Hilfe eines hauptspeicherbasierten verteilten Systems zur

Verarbeitung von Ereignisströmen

Author: Alexander Aprelkin
Supervisor: Prof. Dr. Hans-Arno Jacobsen
Advisor: Mag. Christoph Doblander
Date: June 15, 2014

Ich versichere, dass ich diese Diplomarbeit selbständig verfasst und nur die angegebenen Quellen und Hilfsmittel verwendet habe.

München, den 16. Juni 2014 Alexander Aprelkin

Acknowledgments

Foremost, I would like to thank my advisor Christoph Doblander, who supported me with infrastructure, important advice, freedom and a comfortable atmosphere during the writing of this thesis. I thank my friends Gennady and Denis for their state of mind and constant motivational support. I would also like to thank my family, who supported me during my studies and during the process of writing this thesis. Special thanks go to the researchers of the chair for their insightful and useful comments during the presentation of this thesis.


Abstract

This thesis investigates the task of short term electricity load forecasting in households using a bottom-up approach with data originating from smart meters. The proposed forecasting system uses Spark Streaming, a state-of-the-art in-memory stream processing engine for distributed data analytics. Four different load forecasting methods are implemented and benchmarked with respect to latency. The results show that the tested framework achieves sub-second latency with all forecasting approaches and is able to sustain a reliable throughput.


Contents

Acknowledgments

Abstract

Outline of the Thesis

I. Introduction and Overview

1. Introduction
   1.1. Focus of the Thesis
        1.1.1. Motivation
        1.1.2. Purpose and Goals
        1.1.3. Thesis Scope
   1.2. Method and Contributions
   1.3. Thesis Structure

2. Background and Related Work
   2.1. Electricity Load Forecasting
        2.1.1. Electric Power Industry and Smart Grids
        2.1.2. Goals and Metrics of Load Forecasting
        2.1.3. Simple and Advanced Load Forecasting Methods
   2.2. Big Data Analytics and Stream Processing
        2.2.1. Big Data
        2.2.2. Classification of Analytics Methods
        2.2.3. Complex Event Processing
        2.2.4. Event Stream Processing
        2.2.5. Distributed Data Analytics
        2.2.6. State-of-the-Art in Complex Event Processing and Event Stream Processing Frameworks
        2.2.7. Framework of Choice: Apache Spark Streaming
   2.3. Literature Review
        2.3.1. Review of Short Term Load Forecasting Strategies
        2.3.2. Household Short Term Load Forecasting
        2.3.3. Load Forecasting using Complex Event Processing
   2.4. Concluding Remarks

II. Proposed Method and Implementation

3. Data
   3.1. Data Source
   3.2. Structure of Data
   3.3. Data Exploration
   3.4. Data Modeling
   3.5. Missing Data
   3.6. Concluding Remarks

4. Method and Implementation
   4.1. Infrastructure
   4.2. Method
        4.2.1. Interpolation
        4.2.2. Architecture
        4.2.3. Forecasting Methods
        4.2.4. Configuration
        4.2.5. The Strategy of Relearning
        4.2.6. Other Considered Machine Learning Libraries
        4.2.7. Further Implementation Details and Limitations
   4.3. Concluding Remarks

III. Experimental Results and Outlook

5. Experiments and Results
   5.1. Latency
   5.2. Analysis of the Experimental Results
        5.2.1. Latency
        5.2.2. Forecasting Accuracy
        5.2.3. Discussion and Contributions
        5.2.4. Open Problems and Limitations

6. Conclusions and Outlook
   6.1. Conclusions
   6.2. Future Work

Bibliography


List of Tables

2.1. Forecast accuracy metrics

3.1. Data file structure and field descriptions

5.1. Comparison of latency for different forecasting methods for 1 plug and a sliding window of 30 seconds with a 10-second shift size and 1 grouping thread

5.2. Comparison of latency performance of different forecasting methods on the 64-core machine

5.3. Comparison of the impact of different numbers of grouping threads on latency on the 8-core machine

5.4. Comparison of the impact of different numbers of grouping threads on latency on the 64-core machine

5.5. Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping on forecasting of load values of one house, on the 8-core machine

5.6. Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping on forecasting of load values of one house, on the 64-core machine

5.7. Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 8-core machine

5.8. Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 64-core machine

5.9. Latency depending on different sliding window sizes and numbers of threads, on the 8-core machine

5.10. Latency depending on different sliding window sizes and numbers of threads, on the 64-core machine


List of Figures

2.1. Traditional electricity delivery industry with one-way communication

2.2. Smart grid

2.3. Sliding window

2.4. Map-Reduce concept

2.5. Data processing using Spark Streaming

2.6. RDDs and stateful computation using DStreams

2.7. Related scientific work and literature: the green and the black areas

3.1. Snapshot of the data file

3.2. Number of events per second is not regular

3.3. Distribution of measurements from different plugs belonging to different households during the first approximately 10000 data rows

3.4. Distribution of measurements belonging to different households and houses during the first approximately 10000 data rows

3.5. Distribution of appearance of plug measurements belonging to different houses during the first approximately 10000 data rows

3.6. Distribution of load values belonging to different plugs within the first approximately 10000 data rows

3.7. Distribution of load values of all plugs along the time axis

3.8. Model for an event for the stream processing

3.9. Problem of missing load values

4.1. Data processing

4.2. Scheme of a region with missing load values with an immediate work value increase

4.3. Scheme of a region with missing load values with a later work value increase

4.4. Data flow and architecture of the prototype

4.5. UML class diagram: application architecture

4.6. Slice-based model for forecasting with a representation of lag values, an actual value and a value to be predicted

4.7. Slice-based model for forecasting with a representation of lag values, a sliding window and a value to be predicted

4.8. Transformation of data for the regression-based forecasting method

4.9. Persistence forecasting method

5.1. Comparison of latency performance of different forecasting methods on the 8-core machine on a heatmap

5.2. Comparison of latency performance of different forecasting methods on the 64-core machine on a heatmap


5.3. Combined heatmap with latency performance of all algorithms on both machines

5.4. Comparison of latency performance depending on different numbers of threads for grouping as a heatmap

5.5. Comparison of the impact of different numbers of grouping threads on latency on the 64-core machine

5.6. Combined heatmap with a comparison of the impact of different numbers of grouping threads on latency on both machines

5.7. Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping on forecasting of load values of one house, on the 8-core machine, as a heatmap

5.8. Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping on forecasting of load values of one house, on the 64-core machine, as a heatmap

5.9. Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping on forecasting of load values of one house, on both machines

5.10. Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 64-core machine

5.11. Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 64-core machine, as a heatmap

5.12. Comparison of throughput (records per second) depending on the Spark Streaming batch duration on both machines, as a heatmap

5.13. Impact of different sliding window sizes and numbers of threads, on the 8-core machine

5.14. Impact of different sliding window sizes and numbers of threads, on the 64-core machine

5.15. Impact of different sliding window sizes and numbers of threads, on both machines


Outline of the Thesis

Part I: Introduction and Overview

CHAPTER 1: INTRODUCTION

This chapter introduces the domain and states the goals of the thesis.

CHAPTER 2: BACKGROUND AND RELATED WORK

This chapter contains the domain theory and related work.

Part II: Proposed Method and Implementation

CHAPTER 3: DATA

In this chapter, data used for the method is described.

CHAPTER 4: METHOD AND IMPLEMENTATION

This chapter presents the architecture and implementation details.

Part III: Experimental Results and Outlook

CHAPTER 5: EXPERIMENTS AND RESULTS

Experimental results and their evaluation are presented in this chapter.

CHAPTER 6: CONCLUSIONS AND OUTLOOK

This chapter concludes the thesis and discusses future work.


Part I.

Introduction and Overview


1. Introduction

The first chapter highlights the importance of the project, states the research gaps and provides a theoretical context for the research. Furthermore, it contains a brief description of the method and contributions and provides a plan of the thesis.

1.1. Focus of the Thesis

This section gives an overview of the thesis goals. It describes the need in the application domain and introduces the topic.

1.1.1. Motivation

Smart grids will allow consumers to be more directly involved in the process of energy consumption. Consumers will receive real-time feedback about their current energy consumption and its price. This information enables consumers to make their own just-in-time decisions about changing their energy consumption and to make their behavior more efficient and carbon neutral.

According to expert estimates [19], [66], [115], smart grids will in the long term save billions of dollars for both sides: consumers and energy operators.

A fine-granular energy consumption forecast makes it possible to determine the energy demand on parts of the distribution infrastructure, enabling better planning and load balancing.

Short term forecasts enable real-time reaction and a more optimal scheduling of energy generation, distribution and consumption. If the forecast value is too small compared to the real consumption, the electricity delivery infrastructure may become unstable and energy reserves insufficient. If it is too large, costs for reserve energy are wasted. The accuracy of the forecast therefore determines the economics of energy generation and distribution.

Energy demand forecasts can also support distribution network operators in investigating an optimal strategy for network planning.

Usually, the energy demand profile has a cyclical form on different levels of granularity.

Different models have been and are being used to predict the energy consumption of residential households: artificial intelligence models (neural networks, decision trees, support vector regression and others), time series analysis, and hybrid models.

Since millions of end-consumers will be directly involved in the processes and information flows of a smart grid, high scalability of the methods becomes an important issue. Distributed applications running on clusters of interconnected nodes, an intelligent choice of fast algorithms, and frameworks allowing real-time data analytics have to solve the arising problems.

A complex event processing framework enables a system to work with a real-time continuous data stream.


Sophisticated machine learning methods for load forecasting based on historical consumption enable accurate planning of electricity distribution. A learning method should include the most recent events and perform well in terms of computation time to be usable in a real-time application.

As shown in [31], [66], most European residential electricity consumption is caused by refrigerators, heating and lighting systems.

However, consumers are not always aware of that. As presented in [79], consumers could be interested in saving energy if they were given an informative chart.

1.1.2. Purpose and Goals

This thesis proposes the use of a distributed mini-batch event stream processing framework for the task of load forecasting and evaluates its performance in terms of prediction accuracy and latency.

According to the study of the related work, the following research questions represent a current research gap and involve the following problems and challenges:

• Is the system able to deal with relatively high throughput of data?

• What is an optimal number of processors in terms of system performance?

• What is an optimal sliding window size for the load forecasting task?

• What is the optimal strategy in terms of latency to do short-term load forecasting using that framework?

• Is there an optimal time point to relearn the prediction function for batch learning algorithms?

• What is the optimal algorithm in terms of accuracy to do short-term load forecasting?

• How efficiently can short-term load forecasting be done using a batch in-memory stream processing framework?

1.1.3. Thesis Scope

Two areas are considered to be in the thesis scope: short-term energy load forecasting and distributed big data analytics. In particular, we are interested in the application of a distributed stream analytics framework to the task of short-term load forecasting.

1.2. Method and Contributions

A prototype is used to investigate the research questions. For the prototype development, an existing version of a prototype mentioned in [137] is used as a basic reference. The software is written in Scala, including Java libraries. As the distributed event stream processing framework, Apache Spark Streaming [4] (based on Apache Spark [3]) is used.


Scala is a good choice for the task of machine learning because it is functional, offers an easy way to manage data flow, and provides a natural interface for Spark. Additionally, the MapReduce concept, which is used extensively in Spark, is also functional.

The data processing flow will be similar to the one shown in [137]. SVM-, persistence-, median- and regression-based methods are used in batch learning mode. The experimental results are evaluated using MAPE.
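Two of the four forecasting methods are simple enough to sketch here. The following Java sketch is illustrative only (the thesis prototype itself is written in Scala, and the class and method names below are assumptions, not the prototype's API): the persistence method forecasts the next value as the last observed value, and the median method forecasts the median of the values in the current sliding window.

```java
import java.util.Arrays;

// Illustrative sketch of two baseline forecasters; the names and the
// plain-array interface are assumptions, not the thesis prototype's API.
public class BaselineForecasters {

    // Persistence method: the forecast equals the most recent observed load value.
    public static double persistence(double[] window) {
        return window[window.length - 1];
    }

    // Median method: the forecast is the median of the load values in the window.
    public static double median(double[] window) {
        double[] sorted = window.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2]
                            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] window = {1.2, 0.8, 1.5, 1.1};  // example load values in one window
        System.out.println(persistence(window)); // 1.1
        System.out.println(median(window));      // approximately 1.15
    }
}
```

The SVM- and regression-based methods additionally require model training and are covered in chapter 4.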

1.3. Thesis Structure

The next chapter covers the thesis background. It concentrates on the development of big data processing technologies and describes the related work in the area of load forecasting. Chapters 3 and 4 describe the data, introduce our method for load forecasting and go into implementation details. Chapter 5 shows how our method performs and provides an analysis of the experimental results. The last chapter concludes the thesis and gives an outlook on future work.


2. Background and Related Work

In this chapter, an introduction to the application area and goals of household electricity load forecasting is given, and an overview of cutting-edge data analytics methods is provided. The chapter provides a theoretical framework for the main research work and discusses the current state of the art and the limitations of previous research.

2.1. Electricity Load Forecasting

Our civilization highly depends on a reliable supply of electricity. Its efficient delivery is very important and consists of three stages: electricity generation, transmission and distribution. The concept of a smart grid involves accurate planning in all of these stages.

The basis for correct resource planning is accurate forecasting. Various factors impact the forecast accuracy: weather, past consumption, electricity price, time of day and others.

The availability of rich sensor information from smart metering devices makes it possible to control electricity consumption and to provide near-real-time forecasts about it [129].

2.1.1. Electric Power Industry and Smart Grids

The traditional electric grid that has been developed over the last two centuries is inefficient due to transmission and distribution losses and fossil energy sources. In the near future, the supply volume of conventional energy sources like coal, oil and natural gas will decrease, because these sources are not renewable and will be exhausted within a few decades. On the other hand, the supply of energy from renewable sources like solar, hydro, geothermal energy, biomass and wind will have to increase [129].

According to the US Department of Energy [18], a smart grid in 2030 is a fully automated power delivery network that monitors and controls every customer and node, ensuring a two-way flow of electricity and information between the power plant and the appliance, and all points in between. Its distributed intelligence, coupled with broadband communications and automated control systems, enables real-time market transactions and seamless interfaces among people, buildings, industrial plants, generation facilities, and the electric networks.

Another definition by the US Department of Energy reads: the smart grid is the electricity delivery system, from point of generation to point of consumption, integrated with communications and information technology for enhanced grid operations, customer services, and environmental benefits [23].

The electric power industry is now starting to move away from the traditional infrastructure architecture, shown in Figure 2.1, towards the smart grid.


Figure 2.1.: Traditional electricity delivery industry with one-way communication

The conventional power supply process represents centralized electricity control and a one-way scheduling of electricity from the generators to the end-consumers. Consumers have a very restricted ability to include their own energy storage and generator cells in the grid. It is furthermore impossible to dynamically change the amount of electricity flowing into the buildings from the substations. Also, consumers have very little knowledge about the background of their monthly electricity bills and energy prices.

In contrast, the vision of a smart grid, presented in Figure 2.2, has the following key characteristics [54]:

• All electric devices are combined with a smart meter, constantly streaming measurements to a control center

• Consumers are informed exactly which device consumed how much energy and what they have to pay for it

• Additionally, customers may change their consumption behavior in order to obtain a lower price at a low-demand time of day

• Grid operators receive real-time information about energy consumption and can optimize the distribution costs by providing more or less electricity from one or another energy source

• Integration of renewable power generators into the grid is optimized


• Unused energy can be efficiently stored in a battery and fed into the network later on.

• The grid is interconnected not only with energy cables, but also with communication cables to enable a constant data flow from smart meters and between the grid nodes

• The smart grid allows all stakeholders to optimize their costs

• It provides a two-way power flow between all stakeholders

• Demand forecasts are used for an optimal use of available resources

• The grid automatically re-routes the load between appliances and houses based on a learned model and the current demand

This makes it possible to meet the following requirements of the electric grid of the future:

• Reliability: the grid continuously monitors itself and immediately reacts to problems at a node

• Efficiency: due to the integration of renewable sources and an intelligent management of load shifts, the efficiency of the network is high and energy losses are minimized

• Economic efficiency: constant self-monitoring of the grid reduces maintenance costs

• Impact on the environment: due to an increased usage of renewable energy sources, the carbon-dioxide footprint of mankind is minimized and global warming can be less dramatic

• Security and safety: continuous grid condition monitoring predicts failures and (cyber-)attacks, and effective management reduces the possibility of their occurrence.


Figure 2.2.: Smart grid

The fulfillment of these requirements can be managed by the following three main smart grid components [56]:

• Smart infrastructure system: involves advanced electricity generation, delivery and consumption, advanced data metering and monitoring, and advanced communication

• Smart management system: includes advanced management and control

• Smart protection system: provides grid reliability, safety and security analysis, and protection services

The smart infrastructure system includes equipping electrical appliances with smart meters, which can send the electrical load consumption of a device to a control center in real time.

Data coming in real time from the smart meters will be huge in volume, and the data centers of a smart grid must be able to handle that amount of data and process it efficiently.

The installation and development of the systems above will require large investments, which, however, will in the long term be more economical than continuing to operate the old systems and will save costs for all involved stakeholders [54].

The smart grid will be able to automatically coordinate the needs of transmission and distribution network operators, power generation companies and electricity consumers. A reliable smart grid will make it possible to change the route of the electricity flow automatically in case of an outage, avoiding blackouts.

In general, smart grid technology will provide better quality and reliability of service, better control of and transparency over the costs on all sides, and better management of the grid network [54].

As shown in [31], [66], a large portion of European residential energy consumption is caused by refrigerators, heating and lighting systems. This information is, however, not obvious to many customers.

An energy consumption report for Germany in 2009-2010 [21] presents a distribution of electricity consumption depending on consumption purposes: cooling, usage of entertainment devices, cooking and heating turn out to be the most electricity-consuming tasks. It additionally shows which energy sources were responsible for which kind of consumption purpose.

An informative chart about current energy consumption may influence the energy-saving behavior of residential consumers, as Karjalainen has shown in [80].

In order to automatically and efficiently manage the electricity flow between the electricity users, forecasts about future consumption have to be performed.

2.1.2. Goals and Metrics of Load Forecasting

The term forecasting is frequently confused with the terms prediction and predictive analytics.

A prediction in the general sense evokes an oracle which can reason about the past based on some experience and which, on this basis, is able to look into the future to predict a certain event.

Prediction and predictive analytics in the scientific sense mean predicting the behavior of someone, or a trend, characterized by a probability and based on statistical data analysis and the current evolution. In contrast, forecasting refers to predicting an (aggregated) value, or the occurrence of an event at a certain time point, based on historical data analysis, the current state and sometimes on predictive analytics [10].

Electricity load forecasts provide a prediction of the amount of electricity consumed at a certain point in time. The purpose of electricity load forecasting is in most cases efficient economic and quality planning. Good forecasts ensure the economic profitability of the service and the safety of the network.

Energy consumption forecasts can be performed at different levels of time interval resolution. The range of the forecasts generally depends on the available reliable data and the goal of the forecast.

Usually, the following three terms for the forecasting interval are used: short term, medium term and long term forecasts.

Short term load forecasting (STLF) means giving forecasts for the next minutes up to one day, on a minute or hourly basis. Such forecasts are required for the scheduling, capacity planning and control of power systems.

Medium term load forecasts (MTLF) are required for the planning and operation of power systems. Such forecasts can be provided from one day to months ahead, on an hourly or daily basis.

In contrast to short and medium term forecasting, which support operational decisions, the aim of long term load forecasting (LTLF) is to support strategic decisions, more than a year ahead [29].

Mean Squared Error (MSE): $\frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)^2$

Root Mean Squared Error (RMSE): $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)^2}$

Mean Forecast Error (MFE): $\frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)$

Mean Absolute Deviation (MAD): $\frac{1}{n} \sum_{i=1}^{n} |A_i - F_i|$

Mean Absolute Percentage Error (MAPE): $\frac{100}{n} \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right|$

Weighted Mean Absolute Percentage Error (wMAPE): $\frac{\sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right| \cdot A_i}{\sum_{i=1}^{n} A_i}$

$A_i$ represents the actual value with index $i$; $F_i$ is the corresponding forecast value.

Table 2.1.: Forecast accuracy metrics

Metrics to measure the quality of load forecasting can be subdivided into two main categories: measuring the forecast accuracy and measuring the processing delay (latency).

2.1.2.1. Metrics of Forecasting Accuracy

There exist many methods how forecasting accuracy can be measured. A good summaryis given in [138] and [22]. The most important metrics are shown in 2.1.

One of the most popular measures of forecast accuracy is MAPE [138]. It delivers representative, intuitive values and is easy to understand.

However, for our goal, weighted MAPE should be used, because it handles low values better than other methods. This is achieved by weighting every value with an appropriate coefficient, which can be, for example, the actual value divided by the sum of the actuals observed so far.

Both MAPE and wMAPE have a problem with handling actual values of zero: the presence of actual values in the denominator makes a division by zero possible.

We decided to use wMAPE as the accuracy measure and to introduce a positive bias of 10 in the calculation of accuracy in order to avoid zero values in the denominator.
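As an illustration, the two error measures and the biased denominator can be sketched in Python (the thesis prototype itself runs on Spark Streaming; the function names and the exact placement of the bias of 10 are our assumptions):

```python
def mape(actuals, forecasts):
    # Mean Absolute Percentage Error in percent; undefined when any
    # actual value is zero.
    n = len(actuals)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts))

def wmape(actuals, forecasts, bias=10.0):
    # Weighted MAPE: the A_i inside the ratio cancels against the weight,
    # leaving sum|A_i - F_i| / sum A_i.  The positive `bias` (10 in the
    # thesis) is added to each actual in the denominator so a stream of
    # zero-valued actuals cannot cause a division by zero.
    numerator = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    denominator = sum(a + bias for a in actuals)
    return 100.0 * numerator / denominator
```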


2.1. Electricity Load Forecasting

2.1.2.2. Latency

Grid stability can only be ensured if all operational actions are performed with low latency. Latency is the total amount of time a process takes to complete.

The goal of all software with a real-time processing requirement is to minimize the latency of the processing. It is a common mistake to believe that latency can be neglected. According to [114], three types of latency can be considered:

• Data capture latency - the amount of time it takes to send, receive and unmarshal the sent message

• Analysis latency - the amount of time during which the data is being analyzed

• Action latency - the time to react to the received data.

Users are often interested in several latency values: worst-case latency, average-case latency and the 99th percentile. A percentile indicates a value which a given percentage of observations does not exceed. Even a small number of customers facing an unexpectedly large latency may cost a company much money. A small latency becomes a must especially in short term forecasts and, of course, in real-time applications. In the electricity supply service, low latency of smart metering data analysis and of the decisions based upon that analysis can make the service cost efficient for grid operators and safe for the grid.

Additionally, it is worth understanding the difference between end-to-end and processing latency.

The former includes all three of the above, whereas the latter includes only the last two.

Usually, latency is reported in a percentile representation. Therefore, in order to calculate a statistical distribution of the latency, a sufficient number of tests or observations should be considered.

Another approach is to calculate the average latency, which, however, tells nothing about the distribution.

Furthermore, a worst-case latency and a best-case latency can be given. In our work, we will concentrate only on the processing latency and will calculate it using percentiles.
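A minimal sketch of how such a latency report can be computed, using the nearest-rank percentile definition (the sample values and all names are illustrative):

```python
import math

def percentile(latencies, p):
    # Nearest-rank percentile: the smallest sample value which at least
    # p percent of the observations do not exceed.
    ordered = sorted(latencies)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-event processing latencies in milliseconds:
samples = [12, 7, 9, 30, 11, 8, 10, 9, 250, 10]
report = {
    "average": sum(samples) / len(samples),   # hides the distribution
    "worst_case": max(samples),
    "p99": percentile(samples, 99),           # 99th percentile
}
```

Note how a single outlier dominates the worst case and the 99th percentile while barely moving the average, which is why the average alone is insufficient.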

2.1.3. Simple and Advanced Load Forecasting Methods

Forecasting has always been an important problem with many application areas. Not all forecasts can be confirmed to be true. Forecasting is a discipline with a certain relativity and a probabilistic component. Therefore, a forecast is only a guide value which cannot be considered absolute truth given by an oracle.

Instead, forecasting relies on data analysis, which implies that the underlying data plays an important role in the quality and predictability of an event or a value. If the data does not contain any patterns, the forecast is likely to be almost random or merely similar to one of the previously seen values. From another perspective, qualitative but noisy or maliciously manipulated data might lead to a wrong forecast.

We will concentrate only on quantitative forecasting models, i.e. models that are based on data and not on the opinions of experts as in qualitative models. In contrast to qualitative methods, quantitative methods provide an objective result.


A forecasting model is a mathematical expression with a number of independent variables, which serve as input data and input parameters, and a dependent variable, which is the output of the model: the prediction (forecast) value.

The simplest, nevertheless powerful, forecasting model is the assumption that the value at the forecasting time point will be the same as the current actual value. Such a model is called a persistence model. Another simple method is to look for some similarity of the input data with previous observations and to expect the same outcome as at a similar time point, for example when the data exhibits seasonality.
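Both simple approaches can be sketched in a few lines (the function names are ours, not part of the thesis):

```python
def persistence_forecast(history):
    # Persistence model: the forecast equals the last observed value.
    return history[-1]

def seasonal_naive_forecast(history, season_length):
    # Similarity-based variant: expect the same outcome as at the
    # corresponding time point one season (e.g. one day) ago.
    return history[-season_length]
```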

Other basic forecasting methods can be classified into two categories: causal and time series analysis methods.

Causal methods are based on the assumption that the value of the dependent variable can be estimated by a function, given the values of the independent input variables. In contrast, time series methods rely on certain historical, in most cases cyclical, patterns observed in the past data.

2.1.3.1. Regression Models

An important causal technique for estimating a forecast using relationships between dependent and independent variables, involving some parameters, is called regression analysis.

If the dependent variable depends only on a single independent variable, the model is called univariate or simple regression; if the dependent variable is a scalar but the independent variable is a vector, multiple regression; and in the case when both the dependent and independent variables are vectors, multivariate regression [110].

The method of least squares by Gauss and the regression theory of Pearson and Yule led to the modern regression model, presented by R.A. Fisher in 1922 in [28].

Without loss of generality, in the following we represent the input data and the output result with the lower-case characters xt and yt, respectively, at time point t. It is assumed that, given some observations from the past, one of the following relationships between the dependent and independent variables can be built, as presented in [7]:

• The simplest regression model is the linear one, modeling a linear relationship between the dependent and independent variables. It is represented as a function:

yt = a + b · xt, where a and b are the intercept and the slope of the line, respectively.

Other methods assume a non-linear relationship.

• Exponential Function

yt = a · b^xt

• Power Function

yt = a · xt^b

• Logarithmic Function

yt = a + b · log(xt)


• Gompertz Function

This method attempts to fit a 'Gompertz' or 'S' curve.

yt = c · a^(b^xt)

• Logistic Function

Logistic regression is a type of regression analysis normally used for a binary classification task, that is, to estimate a binary label of an object based on its features. The logistic function used in this kind of regression takes values between 0 and 1 and allows estimating the probability that an object belongs to one class or the other.

yt = 1/(c + a · b^xt)

• Parabola Function

This method attempts to fit a 'Parabolic' (second order polynomial) curve.

yt = a + b · xt + c · xt^2

In our project, we will concentrate on linear and logistic regression. We will model the non-linear dependency with a transformation of the independent variable onto a logarithmic scale.
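A minimal ordinary-least-squares sketch of the linear model and of the logarithmic transformation of the independent variable (the helper names are ours; the actual thesis implementation is built on Spark Streaming):

```python
import math

def fit_linear(xs, ys):
    # Ordinary least squares for yt = a + b * xt.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def fit_logarithmic(xs, ys):
    # yt = a + b * log(xt): map the independent variable onto a
    # logarithmic scale, then fit a straight line to the result.
    return fit_linear([math.log(x) for x in xs], ys)
```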

2.1.3.2. Time Series Analysis Methods

Time series analysis is a collection of techniques applied when the data exhibits autocorrelation, is cyclical, or has a trend or seasonality.

A time series can be decomposed into its three basic components: trend, season and cycles. A trend is a long-term decrease or increase in the data.

A time series has a seasonal component when its behavior shows defined season-related periodic changes, for example due to the day of the week. Larger periodic changes in a time series are called cycles; however, their periods are not fixed.

Autoregression is a regression of future time series values on their past observations (lags).

The properties of a stationary time series do not depend on the time at which the time series is observed.

In the following, an overview of the most important time series analysis methods is presented, based on [7]:

• Simple methods. One of the most basic time series approaches is simple moving averages, where the forecast value is obtained by averaging some of the last values from the past, giving them equal weight.

A similar approach, called weighted moving averages, assigns a different weight to every value. However, moving averages approaches do not allow following the trend, always lagging behind the actual values.

• Exponential smoothing and seasonal smoothing methods


Single exponential smoothing methods automatically weight past data with exponentially decreasing weights over time. Such methods can effectively model data with a flat trend.

Double Exponential Smoothing applies Single Exponential Smoothing twice. It is useful where the historical data series is not stationary.

Holt's Double Exponential Smoothing allows handling time series that are not stationary.

Triple Exponential Smoothing applies Single Exponential Smoothing three times.

Holt-Winters' Seasonal Method provides three equations for modeling the seasonality and the trend along with the smoothing.

• ARIMA models

The auto-regressive moving average (ARMA) model was first introduced in 1938, when moving average and auto-regressive models were combined to model time series [87]. Box and Jenkins made the ARMA model popular in 1976 [?], applying some modifications and recommendations. The extended model included the letter "I" for integrated and was called ARIMA. Since then, the model has been extensively used and analyzed for the task of forecasting, as we will describe below.
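The simple moving average, weighted moving average and single exponential smoothing methods described above can be sketched as follows (function names are ours):

```python
def simple_moving_average(values, window):
    # Forecast = mean of the last `window` observations, equal weights.
    recent = values[-window:]
    return sum(recent) / len(recent)

def weighted_moving_average(values, weights):
    # The most recent observation pairs with the last weight.
    recent = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

def single_exponential_smoothing(values, alpha):
    # s_t = alpha * x_t + (1 - alpha) * s_{t-1}: weights on past data
    # decay exponentially; s after the last value is the forecast.
    s = values[0]
    for x in values[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```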

2.1.3.3. State Space Modeling

State space modeling using the Kalman filter [76] enables a different, iterative view on the forecasting task. Kalman filtering is based on state space modeling and iteratively updates the state variable using new observations [76], [99]. The state space form of the Kalman filter can employ many different models, for example the ARIMA model [75]. However, the model for the Kalman filter has to be identified in advance [96].
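As an illustration of the iterative update, a scalar Kalman filter for a simple random-walk state model can be sketched as follows (the model choice and the noise variances are our assumptions, not those of the cited works):

```python
def kalman_filter(observations, process_var=1e-3, obs_var=1.0):
    # Scalar Kalman filter for a random-walk state model:
    #   state:       x_t = x_{t-1} + w,  w ~ N(0, process_var)
    #   observation: z_t = x_t + v,      v ~ N(0, obs_var)
    # The state estimate is updated iteratively with every new
    # observation, as described above.
    x, p = observations[0], 1.0          # initial estimate and variance
    estimates = [x]
    for z in observations[1:]:
        p = p + process_var              # predict step
        k = p / (p + obs_var)            # Kalman gain
        x = x + k * (z - x)              # update with the new observation
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```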

2.1.3.4. Machine Learning Methods

The task of predicting a value given an actual value, where an error function can be computed for each new instance, belongs to the class of supervised learning tasks. Unsupervised learning, in contrast, includes tasks where no exact objective solution of the target function and no current error is known (an example of unsupervised learning is clustering) [62].

Machine learning methods are algorithms that allow efficiently learning an underlying model from training examples in order to reason about unseen data.

Offline or batch learning refers to learning a target function from a training set of examples only once. After the learning procedure, the target function (model) can be applied to unseen data to predict its label (classification task) or value (regression task).

Online or incremental learning, on the other hand, does not include a training procedure at the beginning; instead, it allows updating (training) the target function after every new data item during the productive stage.

Another mode, described and benchmarked in [109], is called batch-incremental learning, which is a hybrid of the two previous modes. In batch-incremental learning, training examples are buffered as they arrive, and the model is relearned in batch learning mode once a buffer of a certain predefined size is full. After that, the procedure starts again with an empty buffer. According to [109], the performance results achieved in batch-incremental mode are comparable to those of the pure incremental learning mode.

In our project, we focused on a modification of the batch-incremental learning mode: the model is relearned not when a buffer of examples gets full, but when the accuracy of the learned model degrades.
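The modified batch-incremental mode can be sketched as follows; the class, the rolling mean-absolute-error trigger and all names are illustrative assumptions, not the thesis implementation:

```python
class AccuracyTriggeredLearner:
    # Buffer examples as they arrive, but relearn the model only when
    # the rolling error of the current model exceeds a threshold.
    def __init__(self, train, threshold, window=50):
        self.train = train            # train(examples) -> model (a callable)
        self.threshold = threshold    # tolerated rolling mean absolute error
        self.window = window          # number of recent errors to average
        self.buffer = []              # (x, y) examples seen so far
        self.errors = []
        self.model = None

    def process(self, x, y):
        # Returns True when this example triggered a (re)training.
        if self.model is not None:
            self.errors.append(abs(self.model(x) - y))
            self.errors = self.errors[-self.window:]
        self.buffer.append((x, y))
        degraded = (self.errors and
                    sum(self.errors) / len(self.errors) > self.threshold)
        if self.model is None or degraded:
            self.model = self.train(self.buffer)   # relearn on the buffer
            self.errors = []
            return True
        return False
```

For example, with a trivial "predict the mean" trainer, the model is relearned once predictions start missing by more than the threshold.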

The most important machine learning methods for load forecasting include:

• Learning regression models. The idea is to learn a regression model (one of the models presented in the previous subsection) from a set of training examples. The learning algorithm tries to minimize the error between the target function and the values of all previously seen training examples.

• Knowledge-based systems. Knowledge-based methods, developed in the 1960s, usually employ a set of reference values from the past. These values are adapted as new information becomes available and are used as direct references when modeling the forecast value. Often, the knowledge is represented in the form of a rule-based system [57].

• Fuzzy logic. Due to their nature, fuzzy logic systems can be very robust when used for load forecasting in the presence of noise [57]. The rules of a fuzzy logic system are inferred from the fuzzy input, which is mapped onto a certain output. The similarity of the input with historical values is used to forecast the next value and to adapt the reference value. In [81], a fuzzy rule-based system is presented that achieves highly competitive accuracy results.

• Artificial neural networks. The invention of artificial neural networks (ANNs) was based on the attempt to model how the human brain works. The output of an ANN is a linear or non-linear combination of its inputs.

Developed in the second half of the twentieth century, artificial neural networks relatively quickly became one of the most popular forecasting tools and, as a research area, still remain under development and improvement. They are also a widely used forecasting tool in industry [133], [57].

• Support vector regression. Support Vector Machines (SVMs) were developed from the results of statistical learning theory and introduced by Boser, Guyon and Vapnik in 1992 [39]. Initially developed for the task of classification, the support vector machine was adapted for the regression case [124]. Since the introduction of SVMs, the method has been continuously improved and modified, becoming a strong state-of-the-art machine learning tool for efficiently learning non-linear relationships, especially in load forecasting. Support vector regression is one of the most recent machine learning tools and outperforms many other methods [57].

2.2. Big Data Analytics and Stream Processing

Analysis is a term describing the decomposition of something complex into pieces. Data analysis in this context means the process of gaining knowledge through decomposition and statistical analysis of data.


Data analytics is a newer term for data analysis, meaning the application of various technologies besides statistical analysis, such as machine learning, data mining and visualization methods, to extract hidden knowledge from the data. Additionally, domain-specific knowledge has an impact on the overall analysis result.

Data science is a term used similarly to data analytics, but it puts slightly more weight on machine learning and on recommendations of future actions.

In the following, we will speak of data analytics in all cases where data science could be meant as well.

According to [69], the history of data analytics goes back to 1962, when John W. Tukey wrote in "The Future of Data Analysis" about the rising importance of data analysis.

Since then, the field of statistics has expanded into a broad number of areas, and the interest in knowledge discovery in data stored in databases has been growing.

In the 21st century, data science is a broad and still emerging field of research, including the processes of data collection, qualitative and quantitative analysis, predictive modeling and everything else that can be done with data.

A typical data analytics process consists of the following stages: data collection, data preparation and preprocessing, data exploration, data analysis and visualization of the analysis results.

In our project we will follow this process as far as possible.

2.2.1. Big Data

The high volume, velocity and variety of data flowing between the nodes of a smart grid create the demand for special, efficient techniques to deal with the data. Extracting valuable information from such "big data" generated by smart meters is one of the central aspects of the smart grid concept.

How big is big data? Generally, "big" in this context means that the data volume cannot be loaded into the main memory of a single system at once. Therefore, in most cases a distributed system or an extremely powerful single system is needed to analyze big data with efficient algorithms.

Big data has already become a reality in many organizations. The data explosion has overwhelmed many areas: mobile and Internet communication, location tracking, extensive use of smart meters, weather and climate forecasting, financial operations, the use of electronic devices with special sensors and many others.

The combination of high volume and velocity is a big challenge for today's industry. Real-time reactions to a changing environment and to business issues require sophisticated analytics methods that are able to process large amounts of data fast enough.

The use of big data provides a deeper understanding of the application domain, which implies more security and better planning and prediction.

However, as indicated by Intel in 2012 in [20], rapid technological change is required in order to analyze big data efficiently.

Often, data comes in the form of an unbounded stream, and its processing does not allow any interruptions.

Furthermore, data can be noisy and unusable without a preprocessing step. Many security and privacy issues have to be understood and dealt with.


2.2.2. Classification of Analytics Methods

There exist two main hardware-related approaches to deal with Big Data:

• hold the data within one machine with a large main memory

• use a distributed system

The use of a distributed system has many advantages compared to a centralized single-machine system:

• a distributed system scales better

• it is more reliable due to some redundancy

• computing power can be incrementally increased

Data analytics methods can be divided into four categories:

• standard analytics systems with the data size limited by the size of the hard disk drive (for example, traditional SAS or MATLAB)

• In-Memory analytical systems (for example, R)

• distributed analytical systems for tasks requiring unlimited scaling, but with no real-time requirement

• complex event processing and event streaming systems, used when a real-time processing requirement is given.

These four categories of methods have different capabilities in terms of data processinglatency.

The first three are generally not able to process dynamic information in real time, whereas event processing systems are designed to achieve the sub-second latencies needed in streaming tasks.

2.2.3. Complex Event Processing

Complex event processing (CEP) is a newer big data analytics technique that treats data as an unordered series of events coming from several sources. Incoming data is continuously monitored and acted upon using declarative conditions. Moreover, the data monitoring and processing work at near-zero latency.

An event can be understood as an incoming predefined atomic action or transaction from the environment. A complex event is a composition of several events belonging to one semantic object.

Different events may come from different sources, and the CEP system can assemble a complex event internally, modeling the parts as one object.

A CEP system basically aims to solve the velocity problem of big data, where data comes as a stream of predefined events. The sliding window approach used by CEP systems ensures that only a portion of the current data resides in main memory at any time, whereas old events can be discarded or archived. This way, not all of the data has to fit into memory, but the most recent events can still be efficiently analyzed.


A CEP framework allows aggregating data with a given function over small portions of time. The aggregated information can be used as long as it is not discarded from memory.

CEP systems rely on extracting a number of events from a superset using a certain pattern, normally an SQL-like query. Therefore, the events must follow a certain common structure. The publish-subscribe concept, already followed by the first article on CEP systems [113], includes two sides: a set of subscribers (observers) which are interested in a topic and get notified when something of interest is published, and a publisher which indirectly broadcasts information in the form of events to its subscribers without directly addressing any of them [49].

Complex event processing is used for anomaly and fault detection, fraud detection, real-time web or log analysis, smart metering analysis, business operations, financial trading and many other applications.

A survey on CEP frameworks [61] summarizes the achievements of different vendors and gives an overview of the area as of 2010. Furthermore, it discusses the open questions regarding forecasting that need to be solved using CEP.

2.2.4. Event Stream Processing

Event stream processing (ESP) systems are a subset of complex event processing systems. Unlike a general CEP system, an event stream processing system always relies on data ordered by a provided time stamp, usually coming from only one source.

Formally, an event in an ESP system represents a tuple (x, y), where x is a timestamp and y is a vector of the remaining event attribute values. A stream of events is a collection of events arriving in linear order of their time stamps.

Just like CEP, an event stream processing system allows data analytics with low latency. However, as event stream processing relies on a single stream strictly ordered by the time component, algorithms should perform faster than on a more general-purpose CEP system.

Stonebraker et al. discuss in [118] the requirements on software for real-time stream processing and compare the abilities of rule engines, database management systems and event stream processing systems. The authors come to the conclusion that any traditional system other than a dedicated event stream processing system fails to meet the requirements for successful real-time stream processing. The requirements on such systems are: continuous data movement, handling of stream imperfections, data safety and availability, low latency, scalability, generation of predictive outcomes, integration of stored and streaming data, and querying the data using a window.

Example applications of event stream processing systems include position tracking, stock quote changes and smart metering analysis.

In [128], an event stream processing system is presented which is able to query RFID readings from a window and process them in real time.

2.2.4.1. Stream Mining

Guha et al. [70] define a data stream as an ordered sequence of points x1, ..., xn that must be accessed in order and can be read only once or a small number of times.

Some key characteristics of a data stream according to [33] are:


• a stream can be unbounded in size,

• stream elements arrive online,

• memory used to process streams is comparatively small, such that processed stream elements have to be discarded or archived.

Stream items can be processed in online mode, where every single arriving item may be processed immediately and can modify a model or a function.

Another approach is to buffer streaming elements into batches as they arrive, based on an interval of time. The latter method should be used when the computational processing of arrived items is expensive and should be invoked less frequently. The batch interval should be chosen as small as possible; however, a batch interval that is too small might lead to an ever-increasing delay if the system is not able to process arriving elements at a high rate.

If data rates are still too high, load shedding [34] can be used, which ignores some stream elements and provides only an approximate computational answer. In this case, error bounds can be given along with the result.

A synopsis data structure can enable a fast update of the model [33].

In our method, we focused on a technology relying on batches that are small enough, ideally shorter than the streaming interval, to achieve near-real-time performance.

The values of the elements of a data stream can be of different types: cumulative or time series [101]. In the first case, the sent value has to be added to the value of the previous time step. In the second case, the value of the previous time step has to be overwritten by the new value. Furthermore, a cumulative value can also be sent directly.

A data stream can include several independent time series as well as only one. If data stream elements arrive in no particular predictable order, the data stream includes several independent time series, and the amount of time between two stream elements belonging to the same time series is irregular, then a time resolution grid has to be defined. The time resolution grid is based on the underlying frequency of the data stream. The time resolution step should be a multiple of the data stream frequency and the greatest common divisor of the frequencies of the individual time series. Based on the time resolution grid, the data stream can be discretized equidistantly more easily.
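A sketch of such a grid computation, assuming the reporting periods of the individual time series are given in whole seconds (the thesis states the rule in terms of frequencies; working with periods here is our simplification):

```python
import math
from functools import reduce

def time_resolution_step(periods_s):
    # Greatest common divisor of the reporting periods (in whole seconds)
    # of the independent time series interleaved in the stream.
    return reduce(math.gcd, periods_s)

def snap_to_grid(timestamp_s, step_s):
    # Discretize an irregular timestamp onto the equidistant grid.
    return round(timestamp_s / step_s) * step_s
```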

A common technique for working with data streams is to introduce a sliding window [63].

Many special algorithms and data structures have been developed specifically for data streams, for example running median calculation. A commonly used approach for data streams is to introduce buckets or a histogram and place each incoming value in one of the buckets according to a special functionality or characteristic. In a next step, a function can be applied to the items in one particular bucket, allowing aggregation based on a common property of the stream elements.

Efficient data mining and machine learning algorithms operating on data streams can be implemented to run in real time, retraining and updating their models as new stream elements arrive.

Some learning algorithms (for example, the support vector machine) require a sufficiently large training set. Therefore, applying a technique with this requirement to a batch of learning examples from a data stream requires sufficiently large batch sizes or a modification of the learning algorithm.

In [38], the most important classes of machine learning algorithms on data streams and their requirements are summarized.

Common applications of data streams are: internet protocol traffic, smart metering and digital telephone services.

Information from the sensors of smart meters can be treated as a stream of data and clustered into groups with a similar historical profile. In a next step, each cluster can be equipped with a learning prediction model, for example in online mode, as shown in [112].

As smart meter data frequently provides time series with many zero values when the device is turned off, some optimization methods, such as stochastic gradient descent, can lead to intermediate irregular matrices and oscillating results. Some approaches for dealing with zero-inflated data are shown in [92], [60], [111].

2.2.4.2. Sliding Window

The concept of a sliding window, presented in Figure 2.3, means considering at once only the portion of a data stream that arrived during a defined, fixed time interval. This window slides by a fixed number of time steps (the sliding interval) into the future, discarding or archiving elements from the past which no longer fit into the window [63].


[Figure: the stream elements a b c d e f g h with a fixed-size window shown at positions t = 1 and t = 2]

Figure 2.3.: Sliding window

A sliding window can also vary over time and change its size; such sliding windows are called adaptive.

In our work, we will use traditional sliding windows with a fixed size over time.
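A fixed-size sliding window can be sketched as follows; for simplicity, the window size and the sliding interval are counted in stream elements rather than in time units, which is our simplification:

```python
from collections import deque

def sliding_windows(stream, window_size, slide):
    # Fixed-size sliding window over an (already ordered) event stream:
    # emit the current window content every `slide` elements; elements
    # falling out of the window are simply discarded by the deque.
    window = deque(maxlen=window_size)
    for i, event in enumerate(stream, start=1):
        window.append(event)
        if i >= window_size and (i - window_size) % slide == 0:
            yield list(window)
```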

2.2.5. Distributed Data Analytics

According to [119], a distributed system is a collection of connected autonomous computers which appear to the user as a single system.

As discussed before, distributed data analytics offers more advantages than the conventional use of a single machine. Simple parallelization on a single machine is often not fast enough for an application on big data to run in a reasonable amount of time.

Distributed data analytics enables near-real-time optimizations, predictions and prescriptions.

Google Inc. introduced in [51] a new programming paradigm and software framework called MapReduce, enabling large-scale distributed computations on a cluster, and in [67] the corresponding underlying distributed file system.

In 2006, an implementation built upon the ideas from these papers was moved to the Apache open-source project Hadoop, which attracted many contributors all over the world.


In the following years, Apache Hadoop, including HDFS (the Hadoop Distributed File System) and MapReduce, became for many users the system of choice for executing parallel computations on a cluster.

In the following, we will consider the basics of the MapReduce paradigm. Input and output for MapReduce are stored on HDFS. A cluster for MapReduce incorporates a master node and a number of worker nodes.

Basically, the MapReduce concept consists of two steps, Map and Reduce, shown in Figure 2.4. During the Map step, the master node subdivides the input into stateless tasks and distributes them among the workers. This step can continue hierarchically until a leaf worker node processes the task and returns a result. The results are grouped by key and collected in parallel during the Reduce step.

Formally, a map function takes as input a key-value pair (k1, v1) and returns a list of other key-value pairs (k2, v2), which are distributed among the worker nodes:

map(k1, v1) -> list(k2, v2)

In the next step, the elements of the latter list are grouped by the key k2, preparing the input for the reduce function. In the Reduce stage, the application of a function to the grouped lists is distributed among the worker nodes, and the corresponding result values of the function are returned as the output:

reduce(k2, list(v2)) -> list(v3)

Figure 2.4 visualizes the data flow in the MapReduce concept along with the intermediate data types.

ate data types.


[Figure: (key1, value1) inputs are processed by map functions on worker nodes 1-4 into (key2, value2) results; these are grouped by key and sorted into (key2, list(value2)) inputs for the reduce functions, which produce the (value3) outputs]

Figure 2.4.: Map-Reduce concept

Unlike in ordinary functional programming, MapReduce uses map and reduce functions which are applied to (key, value) pairs and not to simple lists.
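The data flow of Figure 2.4 can be sketched sequentially with the classic word-count example (a single-process sketch; in a real Hadoop cluster the map and reduce calls would be distributed over worker nodes):

```python
from collections import defaultdict

def map_fn(key, line):
    # map(k1, v1) -> list(k2, v2): emit (word, 1) for every word.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # reduce(k2, list(v2)) -> list(v3): sum the counts for one word.
    return [sum(values)]

def map_reduce(inputs, map_fn, reduce_fn):
    # Map every input pair, group the emitted pairs by key, sort the
    # keys, then reduce each group -- the four stages of Figure 2.4.
    grouped = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            grouped[k2].append(v2)
    return {k2: reduce_fn(k2, vs) for k2, vs in sorted(grouped.items())}
```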

The official MapReduce tutorial offers a variety of examples for learning the framework [9].

With the introduction of Apache Hadoop YARN (Yet Another Resource Negotiator) [1], the resource management of Hadoop was separated from the MapReduce tasks. The original MapReduce had to do both: schedule tasks and manage resources.

Apache Hadoop offers broad possibilities for big data analytics. However, it is impossible to achieve low latency and real-time performance with it, because computational results are stored on the hard disk drive (as in HDFS).

Therefore, many additional projects around Hadoop are arising nowadays, allowing real-time event stream processing using the idea of MapReduce.

Some innovative commercial software, like SAP HANA, offers the possibility of combining fast access to data in main memory with distributed computation using Hadoop.

The shift to storing data in main memory can significantly reduce the latency and provide near-real-time processing ability.

Distributed event stream and complex event processing systems therefore play an important role in reducing the latency further towards real time, combining the advantages of stream mining and the MapReduce paradigm. In that sense, event stream processing addresses the velocity aspect of big data.


2. Background and Related Work

2.2.6. State-of-the-Art in Complex Event Processing and Event Stream Processing Frameworks

Many large vendors have developed or acquired CEP platforms. The current state of the art in near real-time event processing can be divided into two main categories: general-purpose complex event processing frameworks and narrowly focused, pure event stream processing frameworks.

A recent survey [50] analyzed the capabilities of different event processing frameworks. It can easily be seen that most of the compared frameworks are based on Apache Hadoop. This shows that it offers one of the most important and valuable paradigms among distributed computing libraries today. On the other hand, as stated in the previous subsection, its core abilities do not allow queries to be executed or data to be analyzed in real time, due to the need for hard disk drive accesses. This gap led to development aimed at optimizing latency.

In the following, we compare the most popular event processing frameworks Spark Streaming [4], S4 [2] and Storm [5], which were moved to the Apache Incubator, with the CEP systems Esper [6] and StreamBase [15], and summarize the comparison results. StreamBase, as mentioned in the survey, represents all commercial CEP frameworks, since they offer nearly the same capabilities.

Storm, S4 and Spark Streaming share the following characteristics:

• Stream processing

• Fault Tolerance

• Distributed parallel computation

• Modular design

• Open Source

• Based on JVM

Although Storm, S4 and Spark Streaming follow the idea of MapReduce, they are not based on Hadoop.

Unlike the above, StreamBase and Esper are general-purpose CEP systems. Esper is still an open source project, whereas StreamBase was commercialized.

Like the stream processing systems above, Esper runs on the JVM and is available in Java and, as NEsper, in .NET.

As can be seen from the table in [50], Storm, S4 and Spark Streaming were introduced after 2010, whereas StreamBase and Esper are much older, with release years of 2003 and 2006 respectively. Nevertheless, no significant independent comparison of the performance of all the named frameworks exists to date.

As argued in [132], the throughput of Spark Streaming is comparable to the throughput of commercial CEP systems and Esper. According to this publication, Spark Streaming achieves a throughput per node of up to 640 000 records per second on a 4-core node, whereas Oracle CEP achieves 1 million records per second on a 16-core machine, StreamBase 245 000 records per second on an 8-core machine and Esper 500 000 records per second on a 4-core machine.


The comparison of Spark Streaming to S4 and Storm gave the following result: S4 could only achieve 7500 records per second, and Storm's performance was 115 000 records per second on small messages and 2 times slower than Spark Streaming on larger ones, as discussed in [132].

The scalability (in terms of a distributed system) of the pure Esper and StreamBase libraries is very limited due to their initial design. In contrast, Storm, S4 and Spark Streaming can perform the computations on distributed machines. However, several possibilities exist for enabling Esper and StreamBase to work in a distributed mode, including integration with Hadoop and the distributed stream processing systems. Spark Streaming offers better scalability (linearly up to 100 nodes) and a higher throughput than the other frameworks [132].

Unlike Esper and the stream processing frameworks, which claim to reach real-time or sub-second latencies (in the case of Spark Streaming), StreamBase claims to reach microsecond latencies.

StreamBase is mainly written in C++, whereas the other compared libraries are written in Java or Scala (Spark Streaming).

Spark Streaming is based on Apache Spark, a distributed in-memory batch processing framework which can be executed on Hadoop, on YARN or standalone. Spark provides a stack of libraries for different purposes, such as MLLib for machine learning [13] and Streaming [4] for event stream processing. When comparing Spark Streaming to one of the most popular stream processing frameworks, Storm, the following issues should be considered.

Storm maintains state recovery using so-called upstream backup [78], which makes the recovery process very slow. Spark Streaming, on the other hand, saves the graph of applied actions and can easily recover the stack of applied operations within one second in case of faults [132]. S4 relies on the ZooKeeper service to manage its resources and has an even worse recovery system, which does not prevent state loss [12].

Spark mainly implements the idea that it is cheaper, in terms of performance, to move the computation to the big data than to move the big data to the computation. Since the nodes in Spark store the data sets and the processes are performed on the nodes, the computation is applied to the data and not the other way around. Following this concept, iterative algorithms can be very efficient when run in memory on Spark, because every output of the previous step can be the input for the next step.

Storm, on the other hand, lets the data flow between the processes. Compared to S4 and Spark, Storm is more mature, with a larger community.

Unlike the others, Spark Streaming follows a hybrid approach, filling the gap between batch and stream processing. Spark Streaming buffers the incoming events into small batches and processes them in near real time. Compared to the others, Spark Streaming offers the advantage that the same code used for batch processing in Spark can be used for real-time computations.

There is no generally perfect framework, as all of them have their pros and cons, offering different interfaces and performing differently on various tasks.

Specific stream event processing frameworks have been developed to be used for just one specific task, as in [102] and [59] for short term load forecasting, reaching millisecond-scale latencies.

Another survey on CEP systems [61] provides a more detailed overview of the CEP frameworks as of 2010, but does not include an analysis of the stream processing frameworks mentioned above. Furthermore, the survey shows a relationship between predictive analytics and complex event processing.

2.2.7. Framework of Choice: Apache Spark Streaming

We selected Apache Spark Streaming as a framework with high potential and implemented our prototype in it.

Apache Spark Streaming offers a variety of advantages. In comparison to the original MapReduce, Spark does not rely on data serialization to hard disk. Instead, it holds the data exclusively in main memory, offering a performance increase of up to two orders of magnitude [131].

Spark uses read-only, efficient distributed collections of objects, resilient distributed datasets (RDDs), for intermediate results. Stored in memory across the cluster, the objects inside RDDs can be manipulated in parallel. Since the series of transformations is logged, RDDs can be automatically recomputed when a failure occurs.

An RDD supports actions, which return a value, and transformations, which produce another RDD. The data stream is managed as a sequence of RDDs, a discretized stream (DStream).

On DStreams, two types of methods can be applied: transformations and output operations. Transformations create new DStreams.
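As an illustration only (a toy model in plain Python, not the actual Spark Streaming API), a DStream can be pictured as a sequence of micro-batches: transformations map every batch to a new batch and yield a new stream, while output operations consume the batches for their side effects:

```python
# Toy model of a discretized stream: each inner list stands in for one RDD
# (one micro-batch). Class and method names are mine, chosen to mirror the
# transformation/output-operation distinction described in the text.
class ToyDStream:
    def __init__(self, batches):
        self.batches = batches  # list of micro-batches (lists of records)

    def transform(self, fn):
        # A transformation creates a new DStream, batch by batch.
        return ToyDStream([[fn(r) for r in batch] for batch in self.batches])

    def for_each_batch(self, output_fn):
        # An output operation produces no new DStream; it consumes batches.
        for batch in self.batches:
            output_fn(batch)

stream = ToyDStream([[1, 2], [3], [4, 5]])
doubled = stream.transform(lambda x: 2 * x)   # new DStream, batch structure kept
collected = []
doubled.for_each_batch(collected.extend)      # output operation
print(collected)  # [2, 4, 6, 8, 10]
```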

The main advantages of the underlying platform Apache Spark, which form its high potential, are:

• Fault-tolerance due to storage of lineage of RDD transformations

• High performance due to holding the data in memory: low latencies are achievable

• Ability to work with data from different sources, including HDFS

• Simple programming model

• High scalability to large clusters

• Interactive shell

• Ability to use the Spark’s native machine learning library

• A relatively small codebase, which can easily be extended

Spark’s RDD abstraction works across all the libraries in the Spark stack: MLLib, the machine learning library; GraphX, the graph processing library; Spark Streaming, for event processing; and Shark SQL, for RDD-based tables.

Spark Streaming integrates seamlessly into the Apache Spark environment, enabling high scalability, fault tolerance and the direct use of machine learning methods from MLLib, as well as a combination of batch and stream processing.

Unlike traditional streaming approaches that process one record at a time, Spark Streaming divides the data into small batches, which makes it possible to use the fault-tolerance mechanism of Spark: failed tasks can be rerun on other nodes and recovered from the transformation logs. Specifically in the area of sensor metering, the fault tolerance of Spark Streaming can be an important issue.

As discussed in the previous subsection, Spark Streaming achieves a high data throughput, comparable to state-of-the-art commercial CEP systems. Spark Streaming reads data in small interval batches and holds it in the form of RDDs in memory.

Additionally, a DStream can be combined with standard Spark RDDs and run in batch mode, for example to compute a specific report. Attaching a Scala console enables users to interact with the stream in real time to obtain an up-to-date report on the data.

The built-in sliding window in Spark Streaming allows stateful operations to be performed on incoming data using DStreams. The data is buffered based on a predefined batch interval and is stored as RDDs in memory across the cluster, building the input for RDD transformations (map, reduce, groupby etc.) and actions (save, count etc.), which either produce new RDDs (in the case of transformations) or return a value (in the case of actions).
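The windowed buffering just described can be imitated in a few lines of plain Python. This is a toy stand-in for Spark Streaming's windowed operations, with the window length and slide interval counted in numbers of micro-batches; the helper function is mine, not a Spark API:

```python
from collections import deque

# Sketch of a stateful sliding-window reduction over micro-batches: the newest
# batches are buffered, and once per slide interval the reduction (here: a sum)
# is emitted over the whole window.
def windowed_sums(batches, window_len, slide):
    window, out = deque(maxlen=window_len), []
    for i, batch in enumerate(batches, start=1):
        window.append(batch)          # buffer the newest micro-batch
        if i % slide == 0:            # emit once per slide interval
            out.append(sum(sum(b) for b in window))
    return out

# Six one-value batches, a 3-batch window sliding every 2 batches:
batches = [[1], [2], [3], [4], [5], [6]]
print(windowed_sums(batches, window_len=3, slide=2))  # [3, 9, 15]
```

In Spark Streaming the window length and slide interval are given as durations that must be multiples of the batch interval; counting in batches here makes the same mechanics visible.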

Similar to MapReduce, Spark relies on a master and a set of worker nodes. Like batch systems, especially MapReduce, Spark Streaming is able to balance the load on the cluster and react to failures, but it achieves lower latencies than MapReduce because the data is stored as RDDs in memory.

Figure 2.5.: Data processing using Spark Streaming (a live data stream is divided into batches of X seconds, forming a DStream, which Spark processes into the results)

As shown in figure 2.5, streaming computation in Spark Streaming is a series of small batch computations. The incoming data builds the DStream, which gets processed by the desired operations in Spark.


Figure 2.6.: RDDs and stateful computation using DStreams (at t = 1 and t = 2, a batch operation transforms the immutable input datasets of DStream 1 into the immutable datasets of DStream 2)

Figure 2.6 visualizes the process of stateful streaming computation on a DStream. In the first step, the RDDs, as immutable datasets, receive the input data and store it across the cluster for a defined period of time, using an RDD key as identifier. In the next step, the parallel operations (transformations and actions) are performed on the RDDs, creating new RDDs. Finally, the results are written across the cluster as the new RDDs. The resulting transformed DStream, as a series of RDDs, is able to maintain its state over time and to be kept in memory.

Furthermore, all stages and the time performance of the computations can be monitored at http://localhost:4040/stages/.

A limitation of Apache Spark Streaming nowadays is the inability to perform with millisecond latency, due to the batch nature of Spark. Also, the latency of the overall computation depends on the length of the sliding window and the size of the available memory.

2.3. Literature Review

2.3.1. Review of short term load forecasting strategies

2.3.1.1. Time Series Analysis

The first class of state-of-the-art methods for short term load forecasting relies on classical time series analysis. The most well-established time series forecasting methods used in practice are ARIMA, the Holt-Winters seasonal model and its modifications [40].

2.3.1.2. Artificial Neural Networks

The second class of forecasting methods is based on artificial neural networks. An early application of ANNs to short term forecasting of the Italian electric load is shown in [41]. In [47], Hong Chen et al. perform short term load forecasting using a three-layer artificial neural network, achieving a good accuracy result. Fuzzy neural networks are used in [35], with a faster training procedure than traditional neural networks. An online ANN-based system architecture for short term load forecasting in smart grids is presented in [71].

Hippert et al. discuss in [72] the most important issues in effectively modeling the electrical short term load using ANNs, defend the methodology and identify the main reasons for its criticism by some early adopters. In publication [123], an ANN applied to short term load forecasting for residential consumers performed much better than linear regression.

2.3.1.3. Support Vector Machines

The third class of techniques applies support vector machines. Support vector machines were first introduced for the task of load forecasting in 2001 in [44]. In 2002, SVMs were first used for short-term load forecasting [97]. Support vector machines provide highly satisfactory results in terms of accuracy and in most cases outperform many other state-of-the-art methods on short term load forecasting, including ANNs and many other complex methods [130] [43] [45] [116].

2.3.1.4. Regression Analysis

The next class of techniques uses regression analysis. Regression analysis is a very well-developed and broadly used forecasting method, with a huge number of variations and applications. The authors of [27] apply a complex regression-based model to forecast electricity peak loads.

2.3.1.5. Combination of Techniques

A combination of methods from the previously introduced classes builds the next class of approaches, one with a high impact.


2.3.1.6. Other Methods

A number of other interesting techniques represent the last class of analyzed forecasting models. As mentioned above, load forecasting can also be modeled using state space modeling, knowledge-based methods, fuzzy logic and many others.

In the following, we concentrate the review of related work on two subcategories: residential household short term load forecasting in general, and load forecasting using event processing frameworks.

A lot of work has already been done that falls into the first subcategory (residential household load forecasting), whereas the usage of event processing frameworks for the task of load forecasting is still at a very early stage.

Our related work analysis can be expressed using figure 2.7.

Figure 2.7.: Related scientific work and literature: the green and the black areas (overlapping areas of load forecasting, short term forecasting, load forecasting in households and event processing; household STLF forms the black area and household load forecasting using event processing the green one)

We are mostly interested in research covering the black area (household short term load forecasting) and the green one, which is the explicit intersection of event processing systems and load forecasting in households.

The problem of household electricity load forecasting, and of electric load forecasting in general, has been actively worked on over the last decades [29], but efficient short-term load forecasting still has many open research problems.

According to our research, most of the work was done on the analysis of methods in terms of their forecasting accuracy. Scalability, performance in terms of latency and suitable, reliable frameworks for use in the smart grid have been investigated by only a very small number of scientists in recent years.


2.3.2. Household Short Term Load Forecasting

2.3.2.1. Bottom-up versus top-down

Forecasting on the short term should make it possible to effectively and efficiently manage the electricity flow and match the consumption of the electricity load at household and device level.

For the purpose of bottom-up household load forecasting [42], the devices inside a household are usually equipped with smart meters, which measure the load exactly in real time. The conventional top-down forecasting strategy, on the other hand, involves a long-term estimation of house-level consumption based on the total demand in a region and econometric parameters [83]. We are interested only in the more fine-grained bottom-up approach.

The methods used for this task are exactly those introduced in the previous subsection.

In [105], the authors developed an average household demand profile curve from a Finnish data set, following the bottom-up strategy.

2.3.2.2. Linear Regression

Papalexopoulos and Hesterberg describe in [106] a method for short term load forecasting using linear regression.

Again, a simple regression model sometimes performs even better than machine learning methods, as shown in [46], where linear regression is compared to SVM.

2.3.2.3. Time Series Analysis

[30] discusses various basic time series methods for bottom-up next-day appliance consumption forecasting. An application of the ARIMA model to short-term load forecasting at a Slovenian public utility company is shown in [139].

The authors of [117] use a seasonal ARIMA model to forecast electricity load demand in Brazil.

In the article [74], several ARIMA models for different day types in Malaysia are presented.

In [120], different methods to forecast energy consumption up to one day ahead are compared; as a result, an extended Holt-Winters exponential smoothing method performs better in terms of accuracy than ARIMA, regression with principal component analysis, and neural networks.

In [108], Seasonal Auto Regressive Moving Average (SARMA), Support Vector Regression and Feed Forward Neural Networks are compared on varying levels of aggregation, based on the aggregated number of users. SVR and NN reached the best performance, especially in comparison to SARMA.

The authors of [48] compared ARMA and ARIMA models for household electricity consumption. The ARIMA model had a smaller accuracy error than the ARMA model.

In [58], different algorithms are benchmarked on the load forecasting task for non-residential buildings. Although the autoregressive model is simpler than SVM, it performs fast and shows a better accuracy. The authors used variable data windows, variable weighting of days and the introduction of artificial noise to correct the prediction bias.

In [125], Sneha Vasudevan performed one-step-ahead load forecasting for the smart grid using autoregressive models based on weather data and compared the load profiles of residential, industrial and commercial buildings.

The authors of cite6691774 benchmarked the ARIMA model for scalable forecasting of incrementally clustered data.

As data analysis significantly depends on the conditions and the data, sometimes even simple forecasting models like autoregressive models work better than sophisticated ones like neural networks, as shown in [107].

The traditional models like regression analysis and time series analysis are, however, often too simple to model the non-linear behavior of the load. Novel approaches include the use of artificial intelligence tools like SVMs and neural networks, and their combinations.

2.3.2.4. Artificial Neural Networks

In [64], every cluster of electrical devices is equipped with its own ANN, which is incrementally relearned as stream data comes in.

Similarly, the same authors [65] follow the approach of attaching an ANN model to every smart meter, which is updated in online learning mode.

Using self-organizing maps, Marinescu et al. in [88] also perform clustering as the first step and then use an ANN for the prediction.

2.3.2.5. Support Vector Machines

In [73] it is shown that support vector regression scales better than the linear regression method in terms of forecasting accuracy and the number of correlated consumption patterns involved.

The authors of [32] developed an online learning support vector regression model for load forecasting at the household level and achieved accurate results very efficiently.

The model proposed in [104] uses a modification of the support vector machine, which outperforms ANN and SVM on load forecasting.

The authors of [94] benchmarked the Holt-Winters method, SVR, sigma-SVR, a state space model and seasonal ARIMA, and came to the result that the average and the median of all these methods would give the best accuracy on a one-day-ahead forecast. On the one-hour-ahead forecast, however, sigma-SVR reached the best performance.

2.3.2.6. Hybrid models

A new area of research is the combination of several machine learning methods, which often results in performance increases [93].

A move towards the combination of methods could be observed quite early, as shown in [29].

[55] combined support vector regression with locally weighted regression and achieved highly accurate results.


An evaluation of several methods (neural networks, fuzzy logic-based methods and AR/ARMA/ARIMA) resulted in no particular winning model. The authors therefore recommended a combination of all of them [89].

2.3.2.7. Other methods

Artificial intelligence methods have turned out to be the most complex but in most cases effective. However, they also have their disadvantages, providing little value when the demand profile of a household remains constant over many hours during the day. Exactly this situation is shown in the survey [126], which benchmarks state-of-the-art load forecasting methods. Surprisingly, the persistence method shows one of the best performances among ARIMA, neural networks, BATS and TBATS. The performance evaluation resulted in the smallest accuracy error for the persistence method.
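The persistence method itself is trivial to state: the forecast for the next step is simply the last observed value. A minimal sketch with made-up numbers (not data from the thesis) shows why it is hard to beat on a nearly constant profile:

```python
# Persistence ("naive") baseline: predict that the next value equals the
# current one. On a flat household profile the errors are tiny.
def persistence_forecast(series):
    # One-step-ahead forecasts for series[1:], each using the previous value.
    return series[:-1]

def mae(actual, predicted):
    # Mean absolute error between the observed and the forecast values.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

load = [60.0, 60.0, 61.0, 60.0, 60.0]   # a nearly constant load profile (W)
forecast = persistence_forecast(load)
print(round(mae(load[1:], forecast), 2))  # 0.5
```

Any sophisticated model should at least outperform this baseline before its extra complexity is justified.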

Several projects used load shedding, a technique which discards some input values in order to increase data throughput and performance, for load forecasting [34]. [90] presents a clustering approach for load forecasting based on similar patterns.

In publication [82], Koponen uses transfer functions for modeling short term load forecasting given smart meter data.

The authors of [36] propose a system which analyzes the data coming from smart meters every 24 hours and makes a forecast of device usage for the next day.

A probabilistic Bayesian framework, as shown in [103], provides another perspective on the forecasting task.

Also, contributions from another direction of research, expert-knowledge-driven forecasting methods, are applied in household short term load forecasting [37].

A forecasting method described in [52] separates the load time series into its components.

Paper [53] benchmarks various models for short-term load forecasting and analyzes the impact of human activity at smart meter level. It tested RF (Random Forests), ANNs, SVR and ARMAX (Autoregressive Moving Average Model with Exogenous Inputs) and came to the result that RF and ARMAX showed the best accuracy, although no impact of human activity at a low aggregation level was observed.

A survey on electric load forecasting from 2002 does not yet include SVMs [29]. The authors see a trend towards new methods like fuzzy logic, expert systems and neural networks. A move towards a combination of some of the techniques is also noted.

In [134], stream mining using a fuzzy logic system is performed.

Truong et al. propose a graphical model for multi-appliance usage prediction in smart home management [121].

In the article [86], short term load forecasting using a Gaussian process as the forecasting function is discussed.

The authors of [122] predict the future consumption of homes using the EGH (episode generated Hidden Markov Model) [85], which describes the dependency between the usage of several appliances, outperforming several algorithms in terms of computational costs and accuracy.

In [100], the residential power demand is modeled using Markov chains.

Ghofrani et al. use Kalman filtering in [68] to perform forecasting based on smart metering data.


2.3.3. Load Forecasting using Complex Event Processing

Zhou et al. discuss in [135] the use of complex event processing for smart grid operations. Later, in 2013, the same authors extended the framework to respond to the load demand [136].

The prototype our work is based on achieved the best accuracy using SVR, with Esper as the CEP framework [137].

Two papers were published on the competition from which the data set and the task for our project were obtained. Both described approaches created their own event processing systems and achieve very low, millisecond latencies [59][102].

In [91], the application of the Esper CEP framework to smart grid management is analyzed. However, load forecasting was not the focus of that work.

Paper [98] discusses the software infrastructure needed for smart grid management, including a complex event processing framework.

An approach to tracking the weather conditions for load forecasting using a CEP is presented in [127].

Publication [95] discusses the use of complex event processing in a smart grid environment.

A recent survey of the state of the art in event processing discusses the main event processing tools and compares their abilities [50].

An approach to keeping the average latency in stream processing systems bounded is to use load shedding, i.e. discarding some incoming packets, as discussed in [77].

2.4. Concluding remarks

In summary, it can be stated that short-term load forecasting is a well-researched area, with a huge number of contributions due to its high industrial and economic value. Although new, complex artificial intelligence methods arise and dominate current research, sometimes, depending on the data set, it is easier and more practical to use a simple approach. Therefore, the performance of sophisticated methods should be compared to that of the simplest ones, like the persistence method. As already mentioned above, the characteristics and boundary conditions of every case study vary and have a high impact on the forecasting method that should be used. With the growing availability of smart metering data, new software technologies and the appropriate hardware infrastructure should be carefully chosen to meet the main goals of short-term load forecasting: low latency and high accuracy.


Part II.

Proposed Method and Implementation


3. Data

The most important factor in data analytics is the underlying data and its structure. In this chapter, the data used for the experiments in our project is described.

3.1. Data Source

The data set originates from the 8th ACM International Conference on Distributed Event-Based Systems (DEBS) 2014 [24]. The ACM International Conference on DEBS provides competitions with problems which are relevant for the industry.

In 2014, the conference concentrated on the ability of complex event processing systems to be applied to real-time analytics of large amounts of sensor data. For this purpose, household energy consumption measurements collected in a number of real-life German smart homes were taken.

The measurements were collected within one month (September 2013) from sensors on 2125 smart plugs, which were connected to electric devices in private households in 40 houses.

Due to the real-life nature of the data, some measurements could be missing.

3.2. Structure of Data

The data, in the form of comma-separated values, has a hierarchical structure. A house has several households, which have several plugs. This allows analysis and aggregation on different hierarchy levels.

In the following (table 3.1), the data set fields are explained in detail, as presented in [8]:


Name          Description                                                                  Type                     Unit
id            unique identifier of the measurement                                         32 bit unsigned integer  number
timestamp     timestamp of the measurement                                                 32 bit unsigned integer  number of seconds since January 1, 1970, 00:00:00 GMT
value         measurement value                                                            32 bit floating point    kWh and Watt
property      type of the measurement: 0 for work or 1 for load                            boolean                  0 or 1
plug id       unique identifier of the smart plug                                          32 bit unsigned integer  number
household id  unique identifier of the household where the plug is located                 32 bit unsigned integer  number
house id      unique identifier of the house where the household with the plug is located  32 bit unsigned integer  number

Table 3.1.: Data file structure and fields’ descriptions

The data represents many univariate time series, with value being the dependent variable and all other variables being independent variables.

The sampling interval is one second. The sampling period is one month. The file size of the data is 135 Gigabytes.

In the following (figure 3.1), a snippet of the data is given:


id, timestamp, value, property, plug id, household id, house id
...
3395,1377986402,3.216,0,2,2,9
3396,1377986402,0,1,2,2,9
3397,1377986402,0,1,3,2,9
3398,1377986402,2.25,0,3,2,9
3399,1377986402,0,1,5,2,9
3400,1377986402,2.25,0,5,2,9
3401,1377986403,68.451,0,11,0,0
3402,1377986403,22.682,1,11,0,0
3403,1377986403,10.236,1,2,0,0
3404,1377986403,11.721,0,2,0,0
3405,1377986403,0,1,4,0,0
3406,1377986403,0.962,0,4,0,0
3407,1377986403,35.086,0,5,0,0
3408,1377986403,4.146,1,5,0,0
3409,1377986403,35.086,0,6,0,0

Figure 3.1.: Snapshot of the data file

3.3. Data Exploration

In order to get a general overview of the data, we considered the data file structure from the previous subsection and made an initial visualization of the first 10396 rows, out of 27 trillion rows in total, using WEKA. The time between two measurements of the same smart meter is approximately 2 seconds, which implies a frequency of 0.5 Hz. As can be seen from the data snippet above, each work event always comes paired with the corresponding load event of the same smart meter.

Furthermore, we could observe that the data arrives irregularly; in particular, at each timestamp a different number of sensor measurements is available.

Figure 3.2 visualizes this fact.


Figure 3.2.: Number of events per second is not regular


Figure 3.3.: Distribution of measurements from different plugs belonging to different households during the first approximately 10000 data rows

From figures 3.3 and 3.4 we can observe that the number of households in one house is bounded by 18 (from 0 to 17) and the number of plugs in one household is bounded by 14 (from 0 to 13). Also, from figure 3.4, we can confirm that the total number of houses is 40.

As the total number of plugs, 2125, cannot be divided by 40 without a remainder, each house has a different number of plugs. Without an overview of the whole data, it is impossible to know the identifiers of all plugs in advance.


Figure 3.4.: Distribution of measurements belonging to different households and houses during the first approximately 10000 data rows

Figure 3.4 shows that every house in the data set has a different number of households.


A visual inspection of the source data showed that sometimes two load or two work values belonging to the same plug appear within the same second. In this case, the latter value is used as a correction.

As can be seen on the left-hand side of figure 3.5, there are some gaps in the naming order of the plugs, which means that plug events do not arrive in the order of the plugs' naming. In particular, in house number 2 (third column from the left), measurements from plug number 10 had still not arrived after more than 10000 rows, although measurements from plugs number 11, 12 and 13 were already available.

Figure 3.5.: Distribution of the appearance of plug measurements belonging to different houses during the first approximately 10000 data rows


In the figure 3.6 one can see the distribution of load values of plugs during the firstseconds. Most of the values are located around zero.

The plug id for the chart in this case is calculated in WEKA as an additional attribute, composed using the formula 1000000 · house_id + 1000 · household_id + plug_id. The load values are extracted from instances having 1 in the property field (work/load).
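The composite identifier above can be sketched as a small helper (a sketch for illustration; the function name is not taken from the prototype, and the id ranges are assumed to fit the stated bounds):

```scala
// Sketch of the composite plug identifier used for the WEKA chart:
// 1000000 * house_id + 1000 * household_id + plug_id.
def compositePlugId(houseId: Int, householdId: Int, plugId: Int): Long =
  1000000L * houseId + 1000L * householdId + plugId

println(compositePlugId(2, 5, 10)) // house 2, household 5, plug 10 -> 2005010
```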

Figure 3.6.: Distribution of load values belonging to different plugs within the first approximately 10000 data rows


Figure 3.7 again visualizes that the load values are mostly concentrated around zero, which may potentially result in some computational problems, especially with the measurement of the accuracy and during the optimization step of learning algorithms.

Figure 3.7.: Distribution of load values of all plugs along the time axis


3.4. Data Modeling

The data structure presented above provides an excellent basis for an event model. However, the data structure is not based on a plug, while our application demands building a forecasting model on the device level, so we need to extend the data model with an identifier of the plug. We chose this identifier to be the string concatenation of the house id, household id and plug id.

Additionally, we extend the data structure for the event model with an extra flag 'on', which signals whether the plug is turned off (load value equals zero).

The data rows can be represented using the event model as a UML diagram, which is shown in figure 3.8.

EnergyEvent
- fromString(in: String): EnergyEvent
- id: Long, timestamp: Long, value: Double, work_load: Byte, plug_id: Int, household_id: Int, house_id: Int, hhp: String, hh: String, h: String, on: Double

Figure 3.8.: Model for an event for the stream processing
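The event model of figure 3.8 can be sketched as a Scala case class (a hedged sketch, not the prototype's exact code: the CSV column order follows the DEBS 2014 data set description, and the derivation of the concatenated identifier fields and the 'on' flag is an assumption):

```scala
// Hedged sketch of the event model in figure 3.8. Assumed CSV column order:
// id, timestamp, value, property (work/load), plug_id, household_id, house_id.
case class EnergyEvent(id: Long, timestamp: Long, value: Double, workLoad: Byte,
                       plugId: Int, householdId: Int, houseId: Int,
                       hhp: String, hh: String, h: String, on: Double)

object EnergyEvent {
  def fromString(line: String): EnergyEvent = {
    val f = line.split(',')
    val (plug, household, house) = (f(4).toInt, f(5).toInt, f(6).toInt)
    EnergyEvent(f(0).toLong, f(1).toLong, f(2).toDouble, f(3).toByte,
                plug, household, house,
                s"$house$household$plug", s"$house$household", house.toString,
                // 'on' flag assumption: a zero load value means the plug is off.
                if (f(2).toDouble == 0.0) 0.0 else 1.0)
  }
}
```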


3.5. Missing Data


The field 'property' puts the measurements into two categories: work and load values. Unlike the load values, work measurements are cumulative.

Work values (energy) have a lower unit resolution and are measured in kWh, whereas the load values (power) are measured in Watt.

Therefore, it is more accurate to use load values for load forecasting. However, some load values are missing for some periods of time, although the work values continue counting due to the cumulative nature of the work representation. In that case, the work values show a significant growth at the next available time point.

Figure 3.9, taken from [8], visualizes the problem of missing load but available work values between 15:36 and 19:03.

Figure 3.9.: Problem of missing load values

For the periods where the load values are not available, an interpolation using the work values is needed. In the next chapter the interpolation procedure will be presented.

3.6. Concluding Remarks

The data structure provides a basis for the upcoming data analysis. The used data set is not perfect and contains some inconsistencies, discussed in the previous subsection. Also, the data is very zero-inflated, so we have to deal with a huge number of zero values. Still, the event model we defined is simple and can easily be used for our analysis. The data set represents many time series of different plugs, whose values are interleaved between all plugs' time series. As we aim to extract a model for every plug, we will have to extract the individual time series and look at them separately. Furthermore, we had to enhance the event model with an additional field needed to identify the individual plugs. Also, we added an extra field for an advanced analysis.



4. Method and Implementation

4.1. Infrastructure

For development and initial benchmarking, an 8-core processor with 8 GB of main memory and a Windows operating system was used. The prototype, written in Scala, can easily be ported to any Linux machine or cluster using Docker [25], [26], which provides an additional virtualization layer.

For further benchmarking, a 64-core Linux machine with 512 GB of main memory was used, which improved the performance significantly.

Sufficient computational power is mandatory for this performance-intensive task. As mentioned above, Spark Streaming can run on a cluster with up to 100 nodes with linear scaling.

Scala runs on the JVM, so the program can easily be packaged into an executable JAR file, which can be executed on any machine with at least Java 7 installed.

4.2. Method

For the implementation and testing we decided to follow the following scheme:

1. Get to know Spark and Spark Streaming

2. Implement the short term load forecasting algorithm of [8] using the data, as defined in [24], following a modular approach

3. Implement different predictor functions using open source machine learning libraries

4. Run experiments on a normal machine

5. Run experiments on a powerful machine

In order to test several load forecasting approaches, we decided to implement a tool using the in-memory distributed event stream processing framework Apache Spark Streaming. It is written in Scala, so our prototype is also written in Scala. Spark Streaming processes incoming data in small batches, so a batch duration has to be provided.

Furthermore, Spark Streaming provides an intuitive mechanism of a sliding window. It can easily be initialized by providing two parameters: window size and window shift size, where the window shift size has to be a multiple of the batch duration.
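The multiple-of-batch-duration constraint can be sketched as a small check (the helper name is hypothetical; in Spark Streaming itself the window would be created via stream.window(Seconds(windowSec), Seconds(shiftSec))):

```scala
// Sketch: in Spark Streaming, both the window size and the shift size must be
// multiples of the batch duration. Helper name is illustrative.
def validWindowConfig(batchSec: Long, windowSec: Long, shiftSec: Long): Boolean =
  windowSec % batchSec == 0 && shiftSec % batchSec == 0

println(validWindowConfig(5, 30, 10)) // the configuration used in the experiments
```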

Using the data described in chapter 3, the following continuous process needed to be implemented:


Figure 4.1.: Data processing. Input stream: (1) interpolate missing values if needed; (2) update values in the models using new values; (3) update forecasts, calculate accuracy, retrain models if needed, increment the slice number.

Due to the hierarchical data structure and the ability of Spark Streaming to group the events by a field, it is easily possible to perform load forecasting on different levels of the hierarchy: on the plug level, on the household level and on the house level. In the following, the term "plug" will be used, with no loss of generality, instead of plug/household/house in order to abstract from the possible level of hierarchy.
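Choosing the grouping key per hierarchy level could be sketched as follows (a hedged sketch; the function name, the level codes for the household level and the separator are assumptions, not the prototype's code):

```scala
// Sketch: choosing the grouping key for a given hierarchy level before grouping
// events. 'h' selects the house level, 'd' the household level (assumed code),
// anything else the plug level.
def groupKey(houseId: Int, householdId: Int, plugId: Int, level: Char): String =
  level match {
    case 'h' => houseId.toString                 // house level
    case 'd' => s"$houseId-$householdId"         // household level
    case _   => s"$houseId-$householdId-$plugId" // plug level
  }
```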

In the first step, the data (work and load values), coming as a stream separated into small batches, is tested for possible missing load values. If load values are missing, an interpolation procedure has to be performed.

The interpolation procedure returns an array of load values along with the name of the plug they belong to.

During the execution, a hash map of all plugs is maintained, which contains their values and forecasts divided into slices of one day.

After the interpolation, the load values are saved into the array of values of the appropriate plug.

The forecasts are updated using the predicted values from the selected forecasting model. If no model is available yet, it has to be trained.

The whole procedure repeats after an increment of the slice number, which corresponds to the shift of the window.

In the following subsections, the steps 1, 2 and 3 from figure 4.1 are described in detail.

4.2.1. Interpolation

Two possible scenarios can be considered for the problem of missing load values, presented in section 3.5, due to the cumulative nature of work values:

• The work value shows an increase after the skipped region

• The work value remains unchanged after the skipped region


The first case is presented in figure 4.2. The region of missing load values is located between the wavy red lines. As can be seen in the figure, there is an increase of the work value between points 2 and 3, which are the boundaries of the problem region.

Figure 4.2.: Scheme of a region with missing load values with an immediate work value increase. (Point 1: previous work increase; point 2: last point before the missing load values; point 3: first point after the missing load values, with an increase of work. Axes: work in kWh, load in Watt.)

The second case, demonstrated in figure 4.3, is characterized by an unchanged work value before and after the period of missing load values.


Figure 4.3.: Scheme of a region with missing load values with a later work value increase. (Point 1: previous work increase; point 2: last point before the missing load values; point 3: first point after the missing load values, without an increase of work; point 7: next work increase. Axes: work in kWh, load in Watt.)

Work values are measured in kWh, whereas the load values are measured in Watt. The conversion of load into work values can be done by integrating the load with respect to time. This implies the following simplified formula for the relationship between load and work:

work(i) = work(i − 1) + load · timespan, (4.1)

where

timespan = work(i).timestamp − work(i − 1).timestamp (4.2)

and work(i), work(i − 1) are the work values at the time points of the increases of the values. Solving equation 4.1 for the load using 4.2, we receive:

load = (work(i) − work(i − 1)) / (work(i).timestamp − work(i − 1).timestamp). (4.3)

This formula can be used to interpolate the load values using the available work values.

Assuming that points 1 and 3 in figure 4.2 represent the work(i − 1) and work(i) values respectively, the work increase between these points includes the problem region as well as the region between points 1 and 2. Because the work values are measured in kWh, in contrast to the load values, whose unit is Watt, the amount of work accumulates very slowly in comparison to the changes of the load values.


In order to take this issue into consideration, the value accumulated since the last increase of the work value has to be subtracted from the amount of work work(i) − work(i − 1).

In other words, the amount of work between points 2 and 3 is equal to the amount of work between points 1 and 3, minus the amount of work between points 1 and 2.

Additionally, every value has 3 decimal places. This means the smallest recorded work increase is 1 Wh, which is 3600 Watt seconds, whereas the smallest identifiable load change is 0.001 Watt.

These facts imply the following formula for the missing region:

load = (work(p3) · 1000 · 3600 − work(p2) · 1000 · 3600 − (work(p2) − work(p1))) / (work(p3).timestamp − work(p2).timestamp). (4.4)

The multiplication by 1000 and 3600 converts the work units from kWh to Watt seconds. This conversion of the work values is needed because the timestamp is given in seconds and the load is measured in Watts. The work accumulated until the missing region is already measured in Watt seconds due to the nature of its calculation from loads and timestamps, so it does not need to be converted to another unit.
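Formula (4.4) can be sketched as a small function (names and parameter layout are illustrative, not the prototype's code; workP3 and workP2 are the cumulative work readings in kWh at points 3 and 2, and accumulatedWs is the work already accumulated between points 1 and 2, in Watt seconds):

```scala
// Sketch of eq. (4.4): interpolated load in Watts for the missing region.
def interpolatedLoad(workP3: Double, workP2: Double, accumulatedWs: Double,
                     tsP3: Long, tsP2: Long): Double = {
  val kwhToWs = 1000.0 * 3600.0 // kWh -> Watt seconds
  (workP3 * kwhToWs - workP2 * kwhToWs - accumulatedWs) / (tsP3 - tsP2)
}
```

For example, a 0.001 kWh increase over one hour with nothing previously accumulated corresponds to an average load of 1 Watt.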

The other case, when there was no work increase after the missing data, means an insignificant growth of work (less than 1 Wh, i.e. 3600 Ws) since the last work value increase. Therefore, the load can be interpolated using the maximum value of 3600 Watt seconds between two subsequent work increases.

As the interpolation takes place before the actual consideration of the work value at point 7 in figure 4.3, the amount of work between points 3 and 7 cannot be known exactly in advance. But the amount of work between points 2 and 3 (the missing region) is equal to the amount of work between points 1 and 7, minus the amount of work between points 1 and 2, minus the amount of work between points 3 and 7.

The uncertainty about the unknown work between points 3 and 7 is expressed by using half of the work inside the missing region.

The following formula describes the interpolated value for this case:

load = (3600 − (work(p2) − work(p1))) / (2 · (work(i).timestamp − work(p2).timestamp)). (4.5)

In addition to the mentioned formulas, each Spark sliding window may or may not cover a value needed for the interpolation. If not, the value will be available in the next batch, or was available in the previous one. However, as the data needs to be processed just in time, an assumption about the missing work or load value in the sliding window has to be made. As discussed in the chapter ??, work and load of a plug usually come as subsequent events. In our approach, the latest work increase is saved in the plug object, in order to have all data for the interpolation.

Based on this, and taking into account some possible fluctuations, the following problems needed to be solved:


1. Whole missing region is inside the sliding window

2. Increase of work value is ”to the right” of the sliding window

3. Corresponding load value is ”to the right” of the sliding window

4. Increase of work value is ”to the left” of the sliding window

5. Corresponding load value is ”to the left” of the sliding window

6. Both work and load values are ”to the left” of the sliding window

When a value is "to the right" of the sliding window, the last one inside is taken. When a value is located "to the left" of the sliding window, the first one inside the sliding window is taken instead.

Both values being to the left of the sliding window is a situation in which the appropriate values were in the window before the last shift. In that case, the work increase and the corresponding load are already saved in the appropriate plug object.

After the interpolation is done, an array of load events is passed to the procedure of updating the values.

4.2.2. Architecture

In figure 4.4 an overview of the architecture and the data flow inside the prototype is given. The program is based on the prototype developed in [137], which allowed an online prediction given a stream of energy measurements.


Figure 4.4.: Data flow and architecture of the prototype. (Events generator → events listener → group by plug/house → sliding window → data preprocessing/interpolation → update values in memory → calculate plug forecasts using the forecasting models → calculate forecasting accuracy; if the accuracy is bad, relearn the forecasting model, otherwise update the slice number.)


Figure 4.5.: UML class diagram: application architecture. (Interface Predictor with predictValue(p: ForecastingUnit): Double and relearnModel(p: ForecastingUnit), implemented by SVMLightPredictor (SVMLightModel), SparkPredictor (LogisticRegressionModel, LinearRegressionModel), BaselinePredictor (BaselineModel) and PersistencePredictor. Generator and Listener each provide a main(args: Array[String]) method; the Listener holds a HashMap[String, Plug] of plugs. Further classes: EnergyEvent (the event model of figure 3.8), StreamTimer (WindowShiftSizeInSeconds, WindowSizeInSeconds, DayNumber, ShiftNr, ShiftModuloDay) and Plug (a Predictor, lag arrays lags1–lags7, actual and forecast value arrays, name, houseid, median, valuesWindow, with methods updateValues(), updateMedian(), calculateAccuracy() and updateForecast()).)

In figure 4.5 a UML class diagram for the developed tool is shown. The tool basically contains two main actors: a generator (Scala class Generator) and a listener (Scala class Listener). The generator reads the data set and sends it to a defined port, simulating the process of receiving the data from the sensors at the time stamp defined in each data row. The listener, on the other hand, receives the data from the port and processes it.

The class Plug is responsible for the modeling of the individual plugs. It contains a list of lag values in the form of slices within one day. Also, it contains a list of actual and corresponding forecast values as slices of one day.

An independent timer of window shifts is implemented in the object StreamTimer. It updates the shift number with every window shift and enables control over the forecasting process in slices.

Every class which inherits from the class Predictor needs to implement its two methods: relearnModelPlug() and predictValuePlug(). This way, predictors can easily be exchanged.
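The exchangeable-predictor design can be sketched as follows (a simplified sketch, not the prototype's code: the prototype's ForecastingUnit type is replaced here by a plain Array[Double], and the method names are shortened):

```scala
// Sketch of the predictor interface: implementations can be swapped freely.
trait Predictor {
  def predictValue(lags: Array[Double]): Double
  def relearnModel(lags: Array[Double]): Unit
}

// The persistence method as the simplest implementation: f(i+1) = values(i).
class PersistencePredictor extends Predictor {
  def predictValue(lags: Array[Double]): Double = lags.last
  def relearnModel(lags: Array[Double]): Unit = () // nothing to learn
}
```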

Currently, four models are implemented: SVMLightPredictor, BaselinePredictor, SparkPredictor and PersistencePredictor.

In the following, these four models are described in detail.

4.2.3. Forecasting methods

The idea for the implementation was provided by the challenge of the conference DEBS 2014 [8].

Every sliding window shift corresponds to a slice of the data. The load value of a slice is calculated as the average of all load values inside the slice. An array of slices inside a period of 24 hours is maintained in every plug model. This idea relies on the cyclical pattern of electricity consumption during a day.

A lag value of the current slice is the value of the same slice of the day before. In our method, the values of the last seven days are saved and used for the forecasting, if they are available.
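The mapping from the global window-shift counter to a slice-of-day index (cf. ShiftModuloDay in the prototype's StreamTimer) can be sketched as follows (names and the assumed 10-second shift are illustrative):

```scala
// Sketch: with a 10-second window shift there are 86400 / 10 = 8640 slices
// per 24 hours; the slice of day is the shift counter modulo that number.
def slicesPerDay(shiftSec: Long): Long = 86400L / shiftSec
def sliceOfDay(shiftNr: Long, shiftSec: Long = 10L): Long =
  shiftNr % slicesPerDay(shiftSec)
```

The lag value for slice s of the current day is then simply the stored value at index s of the previous day's array.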

4.2.3.1. Median-based method

The first method is the implementation of the baseline algorithm of the mentioned conference.

It uses a calculation of the running median across the lag values and the actual value. In our implementation, the forecast for the next slice is calculated as the average of the running median of the lags and the value of the actual slice, as shown schematically in equation 4.6:

f(i + 1) = (actualValue + median(lags)) / 2 (4.6)


Figure 4.6.: Slice-based model for the forecasting with a representation of lag values (V1–V3 at 24-hour intervals), an actual value and a value to be predicted (V4)

In the following, this method will be referred to as the median-based or baseline method. The calculation of the running median is approximated using an efficient formula, presented in [14].
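The exact formula of [14] is not reproduced here; the following is a common stochastic running-median approximation, given purely as an illustrative assumption of how such an approximation can work:

```scala
// Assumption: a generic stochastic running-median update, not necessarily the
// formula of [14]. The estimate moves a fixed step eta towards each new value;
// for a stationary stream it settles near the median.
def updateMedian(estimate: Double, x: Double, eta: Double = 0.1): Double =
  estimate + eta * math.signum(x - estimate)

// Feeding a constant stream drives the estimate to that value (within eta).
val est = (1 to 100).foldLeft(0.0)((m, _) => updateMedian(m, 5.0))
```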

4.2.3.2. SVM-based method

The next implemented method, the SVMLightPredictor, uses the open-source library SVMLight [16], through a Java interface for it from [11].

Figure 4.7.: Slice-based model for forecasting with a representation of lag values (V1–V4 at 24-hour intervals), the sliding window and a value to be predicted (V5)

The support vector machine is trained using either the lag values, or the values of the last slices, or only the values inside the sliding window alone (when a sufficient number of previous slices is not yet available).

model = Train(lags, slidingwindow) (4.7a)
f(i + 1) = model.predict(i + 1) (4.7b)

It again relies on the sliding window approach and on the schema of slices. Figure 4.7 demonstrates the forecasting using the SVMLightPredictor.

4.2.3.3. Regression-based method

The next method is an attempt to use the native Spark machine learning library MLlib [13], which can be executed in parallel. MLlib provides an implementation of linear regression with stochastic gradient descent. Besides, it provides an implementation of logistic regression for classification. We slightly adjusted the logistic regression in MLlib to return a float value representing a probability instead of a classification label.

The use of linear regression alone did not lead to good results. The learned model predicted oscillating values due to the stochastic gradient descent optimization and the large number of zero values in the stream. A big problem of the current gradient descent implementation is its inability to handle sparse matrices.

Other methods like support vector machines or neural networks are not yet implemented in MLlib.

Therefore, a combination of linear and logistic regression was used to make forecasts. For this purpose, the plug model received an additional field maxValue, which was updated every time an incoming load value exceeded the maximal load value of the plug. A simple approach presented in [60] was taken as a reference for the method. The values inside the sliding window were marked according to whether they are 0 or not. The non-zero values were transformed using the simple formula xt = ln(x + 1). This transformation did not change the 0's, but significantly decreased large load values, as shown in figure 4.8.
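The zero-preserving transformation and its inverse can be sketched as follows (a minimal sketch; the function names are illustrative, not taken from the prototype):

```scala
// Sketch of the transformation xt = ln(1 + x): zeros stay zero, large load
// values are compressed. backTransform inverts it for the final forecast.
def transform(x: Double): Double = math.log1p(x)
def backTransform(xt: Double): Double = math.expm1(xt)

println(transform(22.0)) // roughly 3.14, matching the example in figure 4.8
```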

The transformed values were put into the linear regression, whereas the boolean array, containing the information about which load values are zero, was the input for the logistic regression.

Then, a weighted sum of both regression calculations was taken as the prediction:

f(i + 1) = backTransform(linearRegression(transformed)) · 0.25 + logisticRegression(isNonZero) · maxValue · 0.75 (4.8)

Although this approach led to a stable forecasting method, and the predictions did not oscillate or become extremely large as with linear regression alone, the forecast accuracy was not sufficient.


Values:       22      0     19     29      0     28      0      0
IsNonZero:     1      0      1      1      0      1      0      0
Transformed, using xt = ln(1 + x):
            3.14      0   3.00   3.40      0   3.37      0      0

Figure 4.8.: Transformation of data for the regression-based forecasting method

4.2.3.4. Persistence method

The next implemented method belongs to the simplest but effective approaches: the persistence method.

Figure 4.9.: Persistence forecasting method (the value of the actual slice is copied as the forecast of the next slice)

It simply forecasts the value for the next slice as being equal to the value of the actual slice:

f(i+ 1) = values(i) (4.9)

4.2.4. Configuration

The configuration of the tool is performed using program parameters. The generator can be adjusted with the following parameters:


1. Destination port, e.g. 44444

2. File name of the data set, e.g. data.csv

3. Data rate: how much time in milliseconds should pass between two events, e.g. 50, or -1 for a simulation of reality

For the generator, the parameters are thus the port number, the file name of the data set and a value controlling the data rate. The listener has the following parameters:

1. Source host, e.g. localhost

2. Source port, e.g. 44444

3. Number of threads for the ”Map” calculations, e.g. 64

4. Number of threads for the ”GroupBy” calculations, e.g. 2

5. Spark batch duration in seconds, e.g. 5

6. Spark sliding window size in seconds, e.g. 30

7. Spark sliding window shift size in seconds, e.g. 10

8. Level of hierarchy: h for house, p for plug

9. Whether the debug information about the current data rate should be printed: noRates or rates

10. Forecasting algorithm: median for BaselinePredictor, svm for SVMLightPredictor, persist for PersistencePredictor and spark for SparkPredictor

11. Number of plugs/houses to be considered: 0 for all, a value greater than zero for a certain number of plugs/houses


4.2.5. The Strategy of Relearning

In order to avoid the problem of concept drift, the model has to be relearned. One way to deal with concept drifts is to forget outdated examples. In this case there is a trade-off between the number of examples needed for training the model and the number of examples to be forgotten.

We assume that the underlying distribution of the sensor measurement data can change over time. So the learning model has to be adapted after some intervals of time. There can be two approaches:

• retrain a model in batches

• change the model incrementally

In our task, the training set of the future becomes the testing set of the current step, which can be understood as a supervised learning task.

In our case, the model is retrained either if the accuracy has become insufficient (less than 80 percent), or if the model was trained many slices ago and the accuracy is below 90 percent.
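The relearning rule above can be sketched as a small decision function (a hedged sketch: the function name and the staleness threshold maxAgeSlices are assumptions, not values taken from the thesis):

```scala
// Sketch of the relearning rule: retrain when accuracy drops below 80 percent,
// or when the model is stale and accuracy is below 90 percent.
// maxAgeSlices is an assumed staleness threshold, not from the prototype.
def shouldRelearn(accuracy: Double, slicesSinceTraining: Int,
                  maxAgeSlices: Int = 96): Boolean =
  accuracy < 0.80 || (slicesSinceTraining > maxAgeSlices && accuracy < 0.90)
```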

4.2.6. Other Considered Machine Learning Libraries

It was considered to use R in the background to have access to its powerful time series analysis algorithms, as R offers a variety of libraries very useful for our task. However, after some tests, the interface from Scala to R turned out to be too complicated to invest time in its investigation.

Another idea was to use the MATLAB online-learning implementation of the support vector regression algorithm. However, during the execution of the program in MATLAB several temporary files are created, which block access to them and therefore make the latency unnecessarily large.

The library LASVM also provides an online learning implementation of the support vector machine, but it can only be used for classification.

We decided not to use WEKA, because it has nearly the same implementation of SVM as SVMLight, so it would not have contributed much.

The Scala machine learning library Nak [17] provides only a linear regression. So there would also be no contribution, as we decided to use the native Spark MLlib.

4.2.7. Further Implementation Details and Limitations

As Spark Streaming does not provide any information on the number of the current window shift, we had to implement an additional stream timer, which updates after every shift. Fortunately, Spark Streaming provides an interface StreamingListener, with the functions onBatchCompleted and onBatchStarted. We used these functions to implement the slice number update after every completed batch and to measure the latency of the window shifts. Furthermore, in this intermediate step, the forecasts of all models can be updated.

In order to be able to group events on different levels of the hierarchy (household, house), additional identifier attributes on the level of the event model had to be introduced. Such an attribute identifies the plug with its name as a concatenation of the plug id within the household, the household id within the house and the global house id.

The data preprocessing part (interpolation of load values) as well as the structure of the class EnergyEvent heavily depend on the structure of the data. So, the current implementation can be used without modification only for the data set of DEBS 2014.

The implementation of the median-based algorithm can only be successfully benchmarked when the program runs for more than a week, since faster-than-real-time performance was not achieved: every window shift requires at least additional sub-second latencies. Therefore, a one-month data set cannot be processed in less than a month with Spark Streaming.

Modern methods are better suited for non-linear forecasting.

4.3. Concluding Remarks

The implementation required insight into the domain knowledge and the forecasting methods. The program relies on Spark Streaming's methods and on its performance, which could have been a bottleneck if mechanisms like the sliding window had not existed. In the next chapter it will be shown how the described prototype performs in practice, using a presentation of experimental results.


Part III.

Experimental Results and Outlook


5. Experiments and Results

The prototype was tested on two different systems. The first one had 8 GB of main memory and an 8-core processor. The other one had 512 GB of main memory and a 64-core processor.

The tests included the successive start of the events generator and events listener programs, with different parameters representing our research questions.

The programs were run for a duration of 50 sliding window interval shifts. After that, the programs were manually stopped and the test results were extracted from a log file. The log files were created at the start of the program and were populated every 10 sliding window interval shifts.

The generator was configured to produce events at the time they would have been produced in reality, that is, it simulated the sensors from the data set.

The measure used to estimate the latency is the 90th percentile, that is, the value below which 90 percent of the observations fall.

5.1. Latency

The first iteration of Spark Streaming requires some more time than the subsequent ones, because during the first iteration the application has to be initialized and Spark Streaming has to organize its environment.

Therefore, most test results have a significant latency peak during the 0-th iteration. We considered this interval for the latency measure as well, as the data stream already sent events during this iteration.

As the sensor data was synthesized, we decided not to concentrate on the accuracy of the methods. Furthermore, the accuracy of the methods does not depend much on the selected infrastructure and framework. The focus of the experiments was, therefore, the latency.

In all following tables, the 90th percentiles of the latency, measured in milliseconds, are presented.

During the first testing stage, all four forecasting methods were benchmarked on the 8-core machine using different maximal numbers of threads for Spark Streaming. The listener program was configured with a sliding window of 30 seconds and a 10 seconds shift size.


Method \ Number of threads      2     4     6     8    10    12
SVM                          1874   795   670   725   587   606
Regression-based             1486   827   712   769   647   630
Median-based                 2293   939   764   756   750   793
Persistence                  2253   885   839   666   732   699

Table 5.1.: Comparison of latency for different forecasting methods for 1 plug and a sliding window of 30 seconds with 10 seconds shift size and 1 grouping thread

The batch duration of Spark Streaming was 5 seconds. Furthermore, in this test the number of threads for grouping operations was set to 1 and all plugs except one were filtered out from the experiment.

In table 5.1 the latency of each method is presented. The corresponding heat map is shown below in figure 5.1.

As can be recognized on the heat map in figure 5.1, the worst performance was shown by all methods when only two threads were available. An increase of the number of threads almost automatically led to a decrease of the latency. The best performance was shown by the support vector regression. All methods performed comparably and achieved sub-second latency, except in the case with 2 threads.

Figure 5.1.: Comparison of latency performance of different forecasting methods on an 8-core machine on a heatmap

The same tests were run on the 64-core machine and led to the results presented in table 5.2 and the heat map in figure 5.2. Also in this case, the SVM-based model performed better than the other methods. The persistence method was surprisingly not that successful.

Method \ Number of threads     16    32    64   128   256
SVM                           682   685   681   640   657
Regression-based              672   696   693   698   659
Median-based                  725   641   691   655   675
Persistence                   722   710   684   695   695

Table 5.2.: Comparison of latency performance of different forecasting methods on the 64-core machine

Figure 5.2.: Comparison of latency performance of different forecasting methods on the 64-core machine on a heatmap

A combination of the test results is presented in the heat map in figure 5.3. It demonstrates a decreasing latency with an increasing number of threads, as expected. The methods have comparable latency values, with a slight advantage of the SVM-based method.


Figure 5.3.: Combined heat map with latency performance of all algorithms on both machines

The next test aimed to understand a second thread parameter of Spark Streaming: the number of threads used for the grouping operations. This test is significant for our application, because all events are grouped by the plug or house identifier in the initial step.

For these tests, the persistence method on one house was used. The house level of the hierarchy ensured that the values are not always zero. The configuration of the sliding window remained the same (30 seconds size, 10 seconds shift size, 5 seconds batch duration).

As can be seen on the heat map in figure 5.4, a growing number of grouping threads does not necessarily lead to an increase of performance. The corresponding values are shown in table 5.3.

Instead, there seems to be an optimal number of grouping threads, which on this 8-core machine is equal to 2.

Figure 5.4.: Comparison of latency performance depending on different numbers of threads for grouping as a heatmap

The tests on the 64-core machine showed a similar behavior, with an even stronger intensity. Also for the 64-core machine, 2 grouping threads seem to be optimal.

Nr gr. thr. \ Nr of threads     2     4     6     8    10    12
1                            1166   749   664   575   684   563
2                            1289   869   516   588   545   546
4                            1786  1196   784   592   607   765

Table 5.3.: Comparison of the impact of different numbers of grouping threads on latency on the 8-core machine

Figure 5.5.: Comparison of the impact of different numbers of grouping threads on latency on the 64-core machine

The summary and combination of the test results on both machines is presented in the heat map in Figure 5.6. One can easily recognize that performance increases with a larger number of threads available, but the number of grouping threads should be selected carefully. The chart shows better performance when a lower number (1 or 2) of grouping threads is used rather than a higher one.


Nr gr. thr. \ Nr of thr.     16     32     64    128    256
1                           553    667    575    568    576
2                           470    544    465    541    476
4                           511    506    499    529    529

Table 5.4.: Comparison of the impact of different numbers of grouping threads on latency on the 64-core machine

Figure 5.6.: Combined heat map comparing the impact of different numbers of grouping threads on latency on both machines

A comparison in terms of latency of the SVM-based predictor and the persistence method, applied on one house with the same sliding window configuration but 2 grouping threads, is shown for the 8-core machine in Table 5.5 and the corresponding heat map in Figure 5.7. The surprising results show that the SVM was able to learn and predict the values at the same latency as the persistence method.


5.1. Latency

Method \ Nr of thr.      2      4      6      8     10     12
SVM                   1907    656    543    463    529    504
Persistence           1289    869    516    588    545    546

Table 5.5.: Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping, forecasting load values of one house, on the 8-core machine

Figure 5.7.: Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping, forecasting load values of one house, on the 8-core machine, as a heat map


Method \ Nr of thr.     16     32     64    128    256
SVM                    548    551    455    506    464
Persistence            470    544    465    541    476

Table 5.6.: Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping, forecasting load values of one house, on the 64-core machine

Figure 5.8.: Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping, forecasting load values of one house, on the 64-core machine, as a heat map


Batch dur. \ Nr of thr.      2      4      6      8     10     12
2                         1818   1830   1820   1839   1950   1839
5                         1757   1810   1810   1717   1741   1714
10                        1717   1753   1749   1717   1712   1688

Table 5.7.: Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 8-core machine

Figure 5.9.: Comparison of latency of the SVM-based method and the persistence method using 2 threads for grouping, forecasting load values of one house, on both machines

The throughput was measured by counting the number of elements inside the RDDs every second, using the persistence forecasting method, a 30-second window with a 10-second shift size, and 2 grouping threads. The results are presented in Tables 5.7 and 5.8 and in Figures 5.10, 5.11 and 5.12.

In general, there is an optimal batch duration, which has to be adjusted accurately. In our case, the system throughput was optimal with a batch duration of 5 seconds, although a higher throughput is reached when the batch duration decreases. The number of approximately 1700 records per second is almost equal to the number of events sent from the data set every second. Therefore, the batch duration of 5 seconds allowed a continuous, near real-time data flow without exponential growth of the latency.
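
Counting elements per second, as done here for the RDDs, can be sketched independently of Spark as bucketing record-processing timestamps into whole seconds. The record shape and the roughly 1700 events/s rate below mirror the discussion above; the code itself is an illustrative assumption:

```python
from collections import Counter

def records_per_second(timestamps):
    """Bucket record timestamps (seconds, float) into whole seconds and
    count records per bucket -- yielding the throughput series."""
    return Counter(int(ts) for ts in timestamps)

# Fake arrival times: 1700 records per second for 3 seconds, matching the
# approximate event rate of the data set reported above.
stamps = [s + i / 1700.0 for s in range(3) for i in range(1700)]
counts = records_per_second(stamps)
```

The resulting per-second counts are what is compared against the sender's event rate to decide whether the system keeps up.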


Batch dur. \ Nr of thr.     16     32     64    128    256
2                         1904   1839   1820   1814   1950
5                         1734   1715   1761   1701   1757
10                        1742   1689   1695   1680   1695

Table 5.8.: Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 64-core machine

Figure 5.10.: Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 64-core machine


Figure 5.11.: Comparison of throughput (records per second) depending on the Spark Streaming batch duration on the 64-core machine, as a heat map

Figure 5.12.: Comparison of throughput (records per second) depending on the Spark Streaming batch duration on both machines, as a heat map

Figures 5.13, 5.14 and 5.15 demonstrate the performance of the persistence method on different machines with different sliding window sizes. The number of grouping threads was set to 2. The window shift size was 10 seconds in every case, which seems not to be a perfect choice for the 5-minute sliding window. A result not presented here showed,


Window size \ Nr of thr.      2      4      6      8     10     12
5 Min                     23352   5743   4615   4186   4796   4999
1 Min                      2768   1527    922    799    916    891
30 Sec                     1289    869    516    588    545    546

Table 5.9.: Latency depending on different sliding window sizes and numbers of threads, on the 8-core machine

Window size \ Nr of thr.     16     32     64    128    256
5 Min                      3683   3574   3975   3908   3714
1 Min                       707    791    806    651    691
30 Sec                      470    544    465    541    476

Table 5.10.: Latency depending on different sliding window sizes and numbers of threads, on the 64-core machine


that the optimal shift size is approximately 1/6 of the window size.
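
The 1/6 heuristic is straightforward to apply to the window sizes tested (illustrative sketch):

```python
def suggested_shift(window_seconds):
    """Heuristic from the observation above: shift is roughly 1/6 of the window."""
    return window_seconds / 6

# Window sizes used in the experiments: 30 s, 1 min, 5 min.
shifts = {w: suggested_shift(w) for w in (30, 60, 300)}
```

For the 1-minute window this yields exactly the 10-second shift used in the tests, while for the 5-minute window it suggests a 50-second shift rather than the 10 seconds used.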

Nevertheless, as can be seen from the values in Tables 5.9 and 5.10 and the heat maps, the number of threads does not have a direct impact on the latency, except for the case with only 2 threads, which always has the worst performance.

Figure 5.13.: Impact of different sliding window sizes and numbers of threads, on the 8-core machine

Figure 5.14.: Impact of different sliding window sizes and numbers of threads, on the 64-core machine


Figure 5.15.: Impact of different sliding window sizes and numbers of threads, on both machines

The tests involving all houses of the data set, or all plugs, are not described due to the following issue.

Our tests showed that some data was discarded when the system was not able to process the items fast enough before a new window shift. Therefore, it becomes difficult to compare several methods when a different amount of data is processed and updated with new incoming values. The filtering of the number of houses/plugs involved in testing takes place at a later stage, where all events are available for processing.

5.2. Analysis of the Experimental Results

5.2.1. Latency

The latency seems to depend on many different parameters; accurate performance tuning needs to be done in advance. In summary, all methods perform almost equally well, achieving sub-second latencies when more than 2 threads are assigned. It is also important to use 2 threads for the grouping operation.

A high throughput sometimes caused the system to drop events without processing them. Therefore, it is important to aim for a throughput that is not necessarily as high as real time, but that allows all values to be processed.

In general, the system's throughput is high enough to maintain a good data rate and respond with sub-second latencies.

In general, a higher number of threads (i.e., a move to a more powerful machine) led to better latency performance.


A combination of the right window and shift size is also important.

5.2.2. Forecasting Accuracy

Although forecasting accuracy was not the focus of our work, results not presented here showed that the SVM and the persistence method were comparable, with over 90 percent accuracy. The devised combination of linear and logistic regression did not perform well, struggling with small oscillations and with predictions that were an order of magnitude too high. This indicates the need for an efficient and accurate implementation of regression methods in MLlib that is capable of working with zero-inflated data.
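
The combination of linear and logistic regression referred to above is, in spirit, a hurdle model for zero-inflated data: a logistic stage decides whether the load is nonzero, and a linear stage estimates its magnitude. A self-contained sketch under that interpretation (the toy data and hyperparameters are assumptions, not the thesis implementation or the MLlib API):

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """One-feature logistic regression via gradient descent, predicting
    whether the load is nonzero (ys are 0/1 labels)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def fit_ols(xs, ys):
    """Ordinary least squares, fitted on the nonzero observations only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def hurdle_predict(x, logit, ols):
    """Expected load = P(load nonzero) * E[load | nonzero]."""
    w, b = logit
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    slope, intercept = ols
    return p * (slope * x + intercept)

# Toy zero-inflated series: zero load below x = 5, linear above (assumed data).
xs = list(range(10))
loads = [0.0] * 5 + [2.0 * x for x in range(5, 10)]
logit = fit_logistic(xs, [1 if v > 0 else 0 for v in loads])
nonzero = [(x, v) for x, v in zip(xs, loads) if v > 0]
ols = fit_ols([x for x, _ in nonzero], [v for _, v in nonzero])
```

Fitting the magnitude model only on nonzero observations is what keeps the many zeros from dragging the regression line down, which is the failure mode described above.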

5.2.3. Discussion and Contributions

The evaluation of the in-memory distributed stream processing system Spark Streaming as a framework for short-term load forecasting showed that it can be used to process a significant amount of data in near real-time. However, the sub-second latency achieved during the tests is usually not sufficient in the field of efficient short-term load forecasting.

Due to its modular structure, the developed prototype can be used in the future to benchmark different predictor methods. To our knowledge, it was one of the first usages of Spark Streaming for short term load forecasting.

The throughput and the processing latency highly depend on the configuration parameters of Spark Streaming.

5.2.4. Open Problems and Limitations

A benchmarking of the method on a cluster is needed in order to answer the upcoming question of the scalability of the approach.

The scalability of Spark Streaming should be very good; however, it would be interesting to benchmark which forecasting method scales better and can be parallelized.

Another open problem is whether the online learning mode is a better approach in this environment, as we did not benchmark any online learning methods.

The optimal time point at which to relearn the model is still an open question, at least for the proposed infrastructure and method.

A limitation is the need for scalable parallel algorithms that do not create access violations due to dependencies on the same files.


6. Conclusions and Outlook

The prototype was successfully implemented and tested on the task of short term load forecasting. In the following, we discuss which research questions were answered.

6.1. Conclusions

The tested environment achieved its goals: it was able to maintain a relatively high throughput of smart metering data and achieved sub-second latencies. Different algorithms were implemented and tested, showing nearly the same latency performance.

A higher number of threads and processor cores leads to a slightly lower latency, and several further parameters can be adjusted.

We did not find an optimal point at which to relearn the model, but proposed a strategy that depends on the accuracy of the method. This strategy was successfully applied.
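
An accuracy-dependent relearning strategy of this kind can be sketched as a monitor that tracks a rolling relative forecast error and signals retraining once it degrades past a threshold. The window length, threshold, and error metric below are illustrative assumptions, not the exact parameters used in the prototype:

```python
from collections import deque

class RelearnMonitor:
    """Track relative forecast error over a rolling window and signal
    when the model should be relearned."""
    def __init__(self, window=10, threshold=0.10):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, forecast, actual):
        # Relative error; zero actuals are skipped to avoid division by zero.
        if actual != 0:
            self.errors.append(abs(forecast - actual) / abs(actual))

    def should_relearn(self):
        if not self.errors:
            return False
        return sum(self.errors) / len(self.errors) > self.threshold

# Degrading forecasts: the mean relative error crosses the threshold.
mon = RelearnMonitor(window=5, threshold=0.10)
for f, a in [(100, 101), (100, 99), (100, 130), (100, 140), (100, 150)]:
    mon.observe(f, a)
```

The bounded deque ensures that only recent accuracy drives the decision, so a model that recovers stops triggering retraining.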

A smaller sliding window seems to be better in terms of latency performance.

However, tests on a large number of houses/plugs did not lead to sufficient results.

6.2. Future work

The next step in evaluating the performance of the mini-batch streaming processor of Spark Streaming is applying the program on a large cluster in order to decrease the latency. The model behind the prototype should be kept modular to enable the use of different predictors in the future. Another direction of research is the implementation and benchmarking of a stable machine learning prediction method in MLlib.

Tests on real data (with no artificial components) and accuracy tests on the used forecasting methods could increase the knowledge about the possibilities of Spark Streaming for short term load forecasting.

85

Bibliography

[1] Apache Hadoop NextGen MapReduce (YARN). http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html. Accessed on2014-05-26.

[2] Apache S4. Distributed Stream Computing Framework. http://incubator.apache.org/s4/. Accessed on 2014-05-26.

[3] Apache Spark. Lightning-fast cluster computing. http://spark.apache.org/.Accessed on 2014-05-26.

[4] Apache Spark Streaming. http://spark.apache.org/streaming/. Accessedon 2014-05-26.

[5] Apache Storm. Distributed and fault-tolerant stream computation. http://storm.incubator.apache.org/. Accessed on 2014-05-26.

[6] Esper. Complex Event Processing. http://esper.codehaus.org/. Accessed on2014-05-26.

[7] Forecast Methods used in ezForecaster. http://www.ezforecaster.com/fcmethod.htm. Accessed on 2014-05-26.

[8] Grand challenge — the 8th acm international conference on distributed event basedsystems (debs 2014). http://www.cse.iitb.ac.in/debs2014/?page_id=42. Accessed on 2014-05-27.

[9] Hadoop mapreduce tutorial. https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html. Accessed on 2014-05-26.

[10] How is predictive analytics different from forecasting? http://www.predictiveanalyticsworld.com/faq.php#q3-2. Accessed on 2014-05-06.

[11] Java native interface (jni) for svmlight. http://adrem.ua.ac.be/˜tmartin/.Accessed on 2014-06-16.

[12] S4. Distributed Stream Computing Platform. Fail-over mechanism. http://incubator.apache.org/s4/doc/0.6.0/fault_tolerance/. Accessed on2014-05-27.

[13] Spark Machine Learning Library. http://spark.apache.org/mllib/. Ac-cessed on 2014-05-26.

87

Bibliography

[14] Stackoverflow: Median filter super efficient implementa-tion. http://stackoverflow.com/questions/11482529/median-filter-super-efficient-implementation/15150968#15150968. Accessed on 2014-06-16.

[15] Streambase. complex event processing. event stream processing. streambase stream-ing platform. http://www.streambase.com/. Accessed on 2014-05-26.

[16] Svmlight: implementation of support vector machine in c. http://svmlight.joachims.org/. Accessed on 2014-06-16.

[17] The Nak Machine Learning Library. https://github.com/scalanlp/nak. Ac-cessed on 2014-06-16.

[18] ”Grid 2030” - A National Vision for Electricity’s Second 100 Years.http://energy.gov/sites/prod/files/oeprod/DocumentsandMedia/Electric_Vision_Document.pdf, 2003. Accessed on: 2014-06-10.

[19] Reuters: U.S. smart grid to cost billions, save tril-lions. http://www.reuters.com/article/2011/05/24/us-utilities-smartgrid-epri-idUSTRE74N7O420110524, 5 2011. Ac-cessed on 2014-06-10.

[20] Distributed Data Mining and Big Data, 8 2012. Accessed on 2014-05-26.

[21] Bundesministerium fuer Wirtschaft und Energie: Erhebung des Energieverbrauchsder privaten Haushalte fuer die Jahre 2009-2010, 2013.

[22] Forecasting: principles and practice, 2013.

[23] What is the Smart Grid? Definitions, Perspectives, and Ultimate Goals. TarbiatModares University (TMU), 2013.

[24] The 8th acm international conference on distributed event based systems (debs2014). http://www.cse.iitb.ac.in/debs2014/, 3 2014.

[25] Docker - open source project to pack, ship and run any application as a lightweightcontainer. https://www.docker.io/, 2014. [Online; accessed 22-April-2014].

[26] How to spin up a Spark cluster using Docker.https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/,2014. Accessed on 2014-04-22.

[27] Gail Adams, P.Geoffrey Allen, and Bernard J. Morzuch. Probability distributions ofshort-term electricity peak load forecasts. International Journal of Forecasting, 7(3):283– 297, 1991.

[28] John Aldrich. Fisher and regression. Statistical Science, 20(4):401–417, 11 2005.

88

Bibliography

[29] Hesham K. Alfares and Mohammad Nazeeruddin. Electric load forecasting: Liter-ature survey and classification of methods. International Journal of Systems Science,33(1):23–34, 2002.

[30] Nicoleta Arghira, Lamis Hawarah, StA©phane Ploix, and Mireille Jacomino. Pre-diction of appliances energy use in smart homes. Energy, 48(1):128 – 134, 2012. 6thDubrovnik Conference on Sustainable Development of Energy Water and Environ-mental Systems, {SDEWES} 2011.

[31] Bogdan Atanasiu and Paolo Bertoldi. Residential electricity consumption in newmember states and candidate countries. Energy and Buildings, 40(2):112–125, 2008.

[32] Zeyar Aung, Mohamed Toukhy, John Williams, Abel Sanchez, and Sergio Herrero.Towards accurate electricity load forecasting in smart grids. In DBKDA 2012, TheFourth International Conference on Advances in Databases, Knowledge, and Data Applica-tions, pages 51–57, 2012.

[33] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom.Models and issues in data stream systems. In Proceedings of the twenty-first ACMSIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 1–16.ACM, 2002.

[34] Brian Babcock, Mayur Datar, and Rajeev Motwani. Load shedding in data streamsystems. In CharuC. Aggarwal, editor, Data Streams, volume 31 of Advances inDatabase Systems, pages 127–147. Springer US, 2007.

[35] A.G. Bakirtzis, J.B. Theocharis, S. J. Kiartzis, and K.J. Satsios. Short term load fore-casting using fuzzy neural networks. Power Systems, IEEE Transactions on, 10(3):1518–1524, Aug 1995.

[36] A. Barbato, A. Capone, M. Rodolfi, and D. Tagliaferri. Forecasting the usage ofhousehold appliances through power meter sensors for demand management in thesmart grid. In Smart Grid Communications (SmartGridComm), 2011 IEEE InternationalConference on, pages 404–409, Oct 2011.

[37] Kaustav Basu, Mathieu Guillame-Bert, Hussein Joumaa, Stephane Ploix, and JamesCrowley. Predicting home service demands from appliance usage data. InternationalConference on Information and Communication Technologies and Applications ICTA, 2011.

[38] Albert Bifet and Richard Kirkby. Data stream mining a practical approach. 2009.

[39] Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training algo-rithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop onComputational Learning Theory, COLT ’92, pages 144–152, New York, NY, USA, 1992.ACM.

[40] P.J. Brockwell and R.A. Davis. Introduction to Time Series and Forecasting. SpringerTexts in Statistics. Springer New York, 2013.

89

Bibliography

[41] M. Caciotta, R. Lamedica, V. Orsolini Cencelli, A. Prudenzi, and M. Sforna. Appli-cation of artificial neural networks to historical data analysis for short-term electricload forecasting. European Transactions on Electrical Power, 7(1):49–56, 1997.

[42] A. Capasso, W. Grattieri, R. Lamedica, and A. Prudenzi. A bottom-up approach toresidential load modeling. Power Systems, IEEE Transactions on, 9(2):957–964, May1994.

[43] E. Ceperic, V. Ceperic, and A. Baric. A strategy for short-term load forecasting bysupport vector regression machines. Power Systems, IEEE Transactions on, 28(4):4356–4364, Nov 2013.

[44] Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin. Load forecasting using supportvector machines: a study on eunite competition 2001. Power Systems, IEEE Transac-tions on, 19(4):1821–1830, Nov 2004.

[45] Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin. Load forecasting using supportvector machines: a study on eunite competition 2001. Power Systems, IEEE Transac-tions on, 19(4):1821–1830, Nov 2004.

[46] Chao Chen and Diane J. Cook. Behavior-based home energy prediction. In Proceed-ings of the 2012 Eighth International Conference on Intelligent Environments, IE ’12, pages57–63, Washington, DC, USA, 2012. IEEE Computer Society.

[47] Hong Chen, C.A. Canizares, and A. Singh. Ann-based short-term load forecastingin electricity markets. In Power Engineering Society Winter Meeting, 2001. IEEE, vol-ume 2, pages 411–415 vol.2, 2001.

[48] Pasapitch Chujai, Nittaya Kerdprasop, and Kittisak Kerdprasop. Time series analy-sis of household electric consumption with arima and arma models. In Proceedings ofthe International MultiConference of Engineers and Computer Scientists 2013, volume 1,pages 217–235, Hong Kong, 3 2013.

[49] Gianpaolo Cugola and Alessandro Margara. Processing flows of information: Fromdata stream to complex event processing. ACM Computing Surveys (CSUR), 44(3):15,2012.

[50] Otavio M de Carvalho, Eduardo Roloff, and Philippe OA Navaux. A survey of thestate-of-the-art in event processing. 2013.

[51] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on largeclusters. San Francisco, 12 2004.

[52] Ni Ding, Y. Besanger, F. Wurtz, G. Antoine, and P. Deschamps. Time series methodfor short-term load forecasting using smart metering in distribution systems. InPowerTech, 2011 IEEE Trondheim, pages 1–6, June 2011.

[53] Yong Ding, Martin Alexander Neumann, Per Goncalves Da Silva, and Michael Beigl.A framework for short-term activity-aware load forecasting. In Joint Proceedings ofthe Workshop on AI Problems and Approaches for Intelligent Environments and Workshopon Semantic Cities, AIIP ’13, pages 23–28, New York, NY, USA, 2013. ACM.

90

Bibliography

[54] Keith Dodrill. Understanding the benefits of the smart grid, 6 2010.

[55] E.E. Elattar, J. Goulermas, and Q.H. Wu. Electric load forecasting based on locallyweighted support vector regression. Systems, Man, and Cybernetics, Part C: Applica-tions and Reviews, IEEE Transactions on, 40(4):438–447, July 2010.

[56] Xi Fang, Satyajayant Misra, Guoliang Xue, and Dejun Yang. Smart grid 2014; thenew and improved power grid: A survey. Communications Surveys Tutorials, IEEE,14(4):944–980, Fourth 2012.

[57] EugeneA. Feinberg and Dora Genethliou. Load forecasting. In JoeH. Chow, FelixF.Wu, and James Momoh, editors, Applied Mathematics for Restructured Electric PowerSystems, Power Electronics and Power Systems, pages 269–285. Springer US, 2005.

[58] I. Fernandez, C.E. Borges, and Y.K. Penya. Efficient building load forecasting. InEmerging Technologies Factory Automation (ETFA), 2011 IEEE 16th Conference on, pages1–8, Sept 2011.

[59] Raul Castro Fernandez, Matthias Weidlich, Peter Pietzuch, and Avigdor Gal. Debsgrand challenge: Scalable stateful stream processing for smart grids. 2014.

[60] David Fletcher, Darryl MacKenzie, and Eduardo Villouta. Modelling skewed datawith many zeros: A simple approach combining ordinary and logistic regression.Environmental and Ecological Statistics, 12(1):45–54, 2005.

[61] Lajos Jeno Fulop, Gabriella Toth, Robert Racz, Janos Panczel, Tamas Gergely, ArpadBeszedes, and Lorant Farkas. Survey on complex event processing and predictiveanalytics. Nokia Siemens Networks, 2010.

[62] Joao Gama, Pedro Pereira Rodrigues, and Raquel Sebastiao. Evaluating algorithmsthat learn from data streams. In Proceedings of the 2009 ACM Symposium on AppliedComputing, SAC ’09, pages 1496–1500, New York, NY, USA, 2009. ACM.

[63] Joao Gama and Mohamed Medhat Gaber. Learning from data streams. Springer, 2007.

[64] Joao Gama and Pedro Pereira Rodrigues. Stream-Based Electricity Load Forecast.In JoostN. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, DunjaMladenic, and Andrzej Skowron, editors, Knowledge Discovery in Databases: PKDD2007, volume 4702 of Lecture Notes in Computer Science, pages 446–453. SpringerBerlin Heidelberg, 2007.

[65] Joao Gama and Pedro Pereira Rodrigues. Electricity load forecast using data streamstechniques. 2008.

[66] C. Gellings. State-of-the-art projects for estimating the elictricity end-use de-mand. http://sintef.biz/project/ElDeK/Publisering/TR%20A6999%20State%20of%20the%20art%20Projects%20for%20estimating%20the%20electricity%20end-use%20demand.pdf, 2010.

[67] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system.In ACM SIGOPS Operating Systems Review, volume 37, pages 29–43. ACM, 2003.

91

Bibliography

[68] M. Ghofrani, M. Hassanzadeh, M. Etezadi-Amoli, and M.S. Fadali. Smart meterbased short-term load forecasting for residential customers. In North American PowerSymposium (NAPS), 2011, pages 1–5, Aug 2011.

[69] Forbes Gil Press. A Very Short History Of Data Science.http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/. Accessed on 2014-05-26.

[70] Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, and LiadanO’Callaghan. Clustering data streams: Theory and practice. Knowledge and DataEngineering, IEEE Transactions on, 15(3):515–528, 2003.

[71] Luis Hernandez, Carlos Baladron, Javier M. Aguiar, Belen Carro, Antonio J. Sanchez-Esguevillas, and Jaime Lloret. Short-term load forecasting for microgrids based onartificial neural networks. Energies, 6(3):1385–1408, 2013.

[72] H.S. Hippert, C.E. Pedreira, and R.C. Souza. Neural networks for short-term loadforecasting: a review and evaluation. Power Systems, IEEE Transactions on, 16(1):44–55, Feb 2001.

[73] S. Humeau, T.K. Wijaya, M. Vasirani, and K. Aberer. Electricity load forecasting forresidential customers: Exploiting aggregation and correlation between households.In Sustainable Internet and ICT for Sustainability (SustainIT), 2013, pages 1–6, Oct 2013.

[74] Arfah binti Ahmad Shah bin Majid Intan Azmira binti Wan Abdul Razak, MohdShahrieel bin Mohd. Aras. 2012-09-12.

[75] Piet De Jong and Jeremy Penzer. The arima model in state space form, 2000.

[76] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems.Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960.

[77] Evangelia Kalyvianaki, Themistoklis Charalambous, Marco Fiscato, and Peter Piet-zuch. Overload management in data stream processing systems with latency guar-antees, 2012.

[78] Supun Kamburugamuve, Geoffrey Fox, David Leake, and Judy Qiu. Survey of Dis-tributed Stream Processing for Large Stream Sources.

[79] Sami Karjalainen. Consumer preferences for feedback on household electricity con-sumption. Energy and Buildings, 43(2):458–467, 2011.

[80] Sami Karjalainen. Consumer preferences for feedback on household electricity con-sumption. Energy and Buildings, 43(2-3):458 – 467, 2011.

[81] S. J. Kiartzis, A.G. Bakirtzis, J.B. Theocharis, and G. Tsagas. A fuzzy expert systemfor peak load forecasting. application to the greek power system. In ElectrotechnicalConference, 2000. MELECON 2000. 10th Mediterranean, volume 3, pages 1097–1100vol.3, May 2000.

92

Bibliography

[82] P. Koponen. Short-term load forecasting model based on smart metering data: Dailyenergy prediction using physically based component model structure. In Smart GridTechnology, Economics and Policies (SG-TEP), 2012 International Conference on, pages1–4, Dec 2012.

[83] S.W. Lai, G.G. Messier, H. Zareipour, and C.H. Wai. Wireless network performancefor residential demand-side participation. In Innovative Smart Grid Technologies Con-ference Europe (ISGT Europe), 2010 IEEE PES, pages 1–4, Oct 2010.

[84] S. Laxman, P. S. Sastry, and K. P. Unnikrishnan. Discovering frequent episodes andlearning hidden markov models: a formal connection. Knowledge and Data Engineer-ing, IEEE Transactions on, 17(11):1505–1517, Nov 2005.

[85] Joao Lourenco and Paulo Santos. Short term load forecasting using gaussian processmodels. Proceedings of Instituto de Engenharia de Sistemas e Computadores de Coimbra,2010.

[86] SPYROS MAKRIDAKIS and MICHELE HIBON. Arma models and the box-jenkinsmethodology. Journal of Forecasting, 16(3):147–163, 1997.

[87] A. Marinescu, I. Dusparic, C. Harris, V. Cahill, and S. Clarke. A dynamic forecastingmethod for small scale residential electrical demand. In International Joint Conferenceon Neural Networks (IJCNN). IEEE, 2014. (to appear).

[88] A. Marinescu, C. Harris, I. Dusparic, S. Clarke, and V. Cahill. Residential electri-cal demand forecasting in very small scale: An evaluation of forecasting methods.In Software Engineering Challenges for the Smart Grid (SE4SG), 2013 2nd InternationalWorkshop on, pages 25–32, May 2013.

[89] F. Martinez Alvarez, A. Troncoso, J.C. Riquelme, and J.S. Aguilar Ruiz. Energy timeseries forecasting based on pattern sequence similarity. Knowledge and Data Engineer-ing, IEEE Transactions on, 23(8):1230–1243, Aug 2011.

[90] Ralph Matroos. Smart grid management using java-based complex event processing.Master’s thesis, 2013.

[91] Yongyi Min, Alan Agresti, and AWT-TAG. Modeling nonnegative data with clump-ing at zero: A survey. 2, 1.

[92] Zhijie Wang Xiaoli Zhu Ge Zhang Ming Zeng, Song Xue. Short-term load forecastingof smart grid systems by combination of general regression neural network and leastsquares-support vector machine algorithm optimized by harmony search algorithmmethod. Applied Mathematics and Information Sciences, 7(1L):291–298, 2013.

[93] Piotr Mirowski, Sining Chen, Tin Kam Ho, and Chun-Nam Yu. Demand forecastingin smart grids. Bell Labs Technical Journal, 18(4):135–158, 2014.

[94] Dunja Mladenic and Alexandra Moraru. Complex event processing and data miningfor smart cities. 2012.

93

Bibliography

[95] I.S. Moghram and S. Rahman. Analysis and evaluation of five short-term load fore-casting techniques. Power Systems, IEEE Transactions on, 4(4):1484–1491, Nov 1989.

[96] Mohamed Mohandes. Support vector machines for short-term electrical load fore-casting. International Journal of Energy Research, 26(4):335–345, 2002.

[97] A. Monacchi, D. Egarter, and W. Elmenreich. Integrating households into the smartgrid. In Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES), 2013Workshop on, pages 1–6, May 2013.

[98] G. W. Morrison. Kalman filtering applied to statistical forecasting. Management Sci-ence., 1977.

[99] Matteo Muratori, Matthew C Roberts, Ramteen Sioshansi, Vincenzo Marano, andGiorgio Rizzoni. A highly resolved modeling technique to simulate residentialpower demand. Applied Energy, 107:465–473, 2013.

[100] S Muthukrishnan. Data streams: Algorithms and applications. Now Publishers Inc,2005.

[101] Christopher Mutschler, Christoffer Loffler, Nicolas Witt, Thorsten Edelhaußer, andMichael Philippsen. Debs grand challenge: Predictive load management in smartgrid environments. 2014.

[102] P. Grindrod D. V. Greetham N. Charlton, S. A. Haben and C. Singleton. A probabilis-tic framework for forecasting household energy demand profiles. 3, 2014.

[103] Dongxiao Niu, Yongli Wang, and Desheng Dash Wu. Power load forecasting usingsupport vector machine and ant colony optimization. Expert Systems with Applica-tions, 37(3):2531 – 2539, 2010.

[104] Jukka V. Paatero and Peter D. Lund. A model for generating household electricityload profiles. International Journal of Energy Research, 30(5):273–290, 2006.

[105] A.D. Papalexopoulos and T.C. Hesterberg. A regression-based approach to short-term system load forecasting. Power Systems, IEEE Transactions on, 5(4):1535–1547,Nov 1990.

[106] Yoseba K. Penya, Cruz E. Borges, Denis Agote, and Ivan Fernandez. Short-term loadforecasting in air-conditioned non-residential Buildings. pages 1359–1364, 2011.

[107] Ram Rajagopal Raffi Sevlian. Short term electricity load forecasting on varying levelsof aggregation. 3 2014.

[108] Jesse Read, Albert Bifet, Bernhard Pfahringer, and Geoff Holmes. Batch-incrementalversus instance-incremental learning in dynamic and evolving data. In Advances inIntelligent Data Analysis XI, pages 313–323. Springer, 2012.

[109] Dr. rer. nat. Florian Leitenstorfer. Vorlesung multivariate verfahren, kapitel ”multi-variate regression”. Accessed on 2014-05-26.

94

Bibliography

[110] Martin Ridout, Clarice GB Demetrio, and John Hinde. Models for count data withmany zeros. 1998.

[111] Pedro Pereira Rodrigues and Joao Gama. Online prediction of streaming sensor data.In Proceedings of the 3rd international workshop on knowledge discovery from data streams(IWKDDS 2006), in conjuntion with the 23rd international conference on machine learning,2006.

[112] David S Rosenblum and Alexander L Wolf. A design framework for Internet-scale eventobservation and notification, volume 22. ACM, 1997.

[113] Ronald G. Ross. The latency of decisions. new ideas on the roi of business rules.Accessed on 2014-05-26.

[114] Hanna Saeli, Eva Rosenberg, and Nicolai Feilberg. Estimating costs and benefits ofthe smart grid. a preliminary estimate of the investment requirements and the resul-tant benefits of a fully functioning smart grid. http://www.rmi.org/Content/Files/EstimatingCostsSmartGRid.pdf, 2011.

[115] Zhi-biao Shi, Yang Li, and Tao Yu. Short-term load forecasting based on ls-svm op-timized by bacterial colony chemotaxis algorithm. In Proceedings of the 2009 Interna-tional Conference on Information and Multimedia Technology, ICIMT ’09, pages 306–309,Washington, DC, USA, 2009. IEEE Computer Society.

[116] Lacir J. Soares and Marcelo C. Medeiros. Modelling and forecasting short-term elec-tricity load: a two step methodology. Technical Report 495, Department of Eco-nomics PUC-Rio (Brazil), 2005.

[117] Michael Stonebraker, UC§ur Cetintemel, and Stan Zdonik. The 8 requirements ofreal-time stream processing. ACM SIGMOD Record, 34(4):42–47, 2005.

[118] Andrew S Tanenbaum and Maarten van Steen. Distributed systems, volume 2. Pren-tice Hall Upper Saddle River, 2002.

[119] James W. Taylor, Lilian M. de Menezes, and Patrick E. McSharry. A comparison of univariate methods for forecasting electricity demand up to a day ahead. International Journal of Forecasting, 22(1):1–16, 2006.

[120] Ngoc Cuong Truong, James McInerney, Long Tran-Thanh, Enrico Costanza, and Sarvapali D. Ramchurn. Forecasting multi-appliance usage for smart home energy management. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI’13, pages 2908–2914. AAAI Press, 2013.

[121] Ngoc Cuong Truong, Long Tran-Thanh, Enrico Costanza, and Sarvapali D. Ramchurn. Activity prediction for agent-based home energy management. In Agent Technologies for Energy Systems (ATES 2013), May 2013.

[122] University of Oradea. Consumers load profile classification corelated to the electric energy forecast, January 2012.

[123] Vladimir Vapnik, Steven E. Golowich, and Alex Smola. Support vector method for function approximation, regression estimation, and signal processing. In Advances in Neural Information Processing Systems 9, pages 281–287. MIT Press, 1996.

[124] Sneha Vasudevan. One-step-ahead load forecasting for smart grid applications. Master’s thesis, 2011.

[125] Andreas Veit, Christoph Goebel, Rohit Tidke, Christoph Doblander, and Hans-Arno Jacobsen. Household electricity demand forecasting: Benchmarking state-of-the-art methods. arXiv preprint arXiv:1404.0200, 2014.

[126] Nithya Vijayakumar and Beth Plale. Tracking stream provenance in complex event processing systems for workflow-driven computing. In EDA-PS Workshop, 2007.

[127] Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 407–418. ACM, 2006.

[128] Xinghuo Yu, C. Cecati, T. Dillon, and M. G. Simoes. The new frontier of smart grids. Industrial Electronics Magazine, IEEE, 5(3):49–63, September 2011.

[129] Li Yuancheng, Fang Tingjian, and Yu Erkeng. Short-term electrical load forecasting using least squares support vector machines. In Power System Technology, 2002. Proceedings. PowerCon 2002. International Conference on, volume 1, pages 230–233, October 2002.

[130] Matei Zaharia. An Architecture for Fast and General Data Processing on Large Clusters. PhD thesis, EECS Department, University of California, Berkeley, February 2014.

[131] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 423–438. ACM, 2013.

[132] Guoqiang Zhang, B. Eddy Patuwo, and Michael Y. Hu. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1):35–62, 1998.

[133] Liang Zhao, Liyuan He, Wong Harry, and Xing Jin. Intelligent agricultural forecasting system based on wireless sensor. JNW, 8(8):1817–1824, 2013.

[134] Qunzhi Zhou, Yogesh Simmhan, and Viktor Prasanna. Incorporating semantic knowledge into dynamic data processing for smart power grids. In The Semantic Web – ISWC 2012, pages 257–273. Springer, 2012.

[135] Qunzhi Zhou, Yogesh Simmhan, and Viktor K. Prasanna. On using complex event processing for dynamic demand response optimization in microgrid. CoRR, abs/1311.6146, 2013.

[136] H. Ziekow, C. Doblander, C. Goebel, and H.-A. Jacobsen. Forecasting household electricity demand with complex event processing: Insights from a prototypical solution. 2013.

[137] Nadav Zivelin. Forecast metrics and evaluation. http://demantrasig.oaug.org/file/NadavForecastEvaluation1203532889.pdf. Accessed on 2014-05-26.

[138] Peter Zunko and Irena Komprej. Short term load forecasting. In Mediterranean Electrotechnical Conference, Ljubljana, volume 2, pages 1470–1473, June 1991.
