
DEGREE PROJECT IN TECHNOLOGY, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2021

Edge Orchestrator for Mobile Robotics to provide on-demand run-time support

KTH Thesis Report

Ahmed El Yaacoub

KTH ROYAL INSTITUTE OF TECHNOLOGY
ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Authors
Ahmed El Yaacoub <[email protected]>
Embedded Systems
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden
RISE SICS

Examiner
György Dán
KTH Royal Institute of Technology

Academic Supervisor
Sladana Josilo
KTH Royal Institute of Technology

Industrial Supervisor

Luca Mottola

RISE SICS


Abstract

Edge computing has emerged as an attractive method of distributing computational resources in a network. When compared with cloud computing, edge computing presents a number of key benefits, which include improved response times, scalability, privacy, and redundancy. This makes edge computing desirable for use in mobile robotics, in which low response times and redundancy are key issues.

This thesis covers the design and implementation of a general-purpose edge orchestrator that can support a wide range of domains because it is built around the concept of modularity. An edge orchestrator is a program that manages an edge network by analyzing the edge network and the requirements of the devices within that network, and then optimizing how the computational resources are distributed among the devices in the network. Modules have been designed and implemented on top of the orchestrator that allow for optimizations specific to mobile robotics. A proof-of-concept module was designed to optimize for latency and was compared with an external algorithm that also seeks to optimize for latency. Both were implemented on the orchestrator and an evaluation was performed to compare the two approaches. It was found that the module designed in this thesis is better suited for optimizing for latency.

LXD, a container-based software packaging solution, was chosen for software packaging. A software packaging solution is used to package the software that is deployed by the orchestrator. The choice of LXD is analyzed through an evaluation procedure that compares it with Docker, another container-based software packaging solution. It was found that LXD produces containers of smaller size but requires more time to generate those containers, when compared with Docker. It was also found that LXD container images exhibited better performance than the Docker ones for software that is not I/O heavy. Based on this evaluation, LXD was judged to be the better choice for the orchestrator.


Keywords

Edge Computing; Orchestration; Mobile Robotics; Software Packaging.


Abstract

Edge computing is an attractive method for distributing computational resources in a network. Compared with cloud computing, edge computing has a number of key benefits that include improved response times, scalability, privacy, and redundancy. This makes edge computing desirable for use in mobile robotics, where low response times and redundancy are key concerns.

This thesis covers my design and implementation of a general-purpose edge orchestrator that can support a wide range of domains because it is built in a modular way. An edge orchestrator is a program that manages an edge network by analyzing the edge network and the requirements of the devices within that network, and then optimizing how the computational resources are distributed among the devices in the network. I have designed and implemented modules on top of the orchestrator that enable optimizations specific to mobile robotics. I also designed a proof-of-concept module that optimizes for latency, which I compared with an external algorithm that also seeks to optimize for latency. I implemented both on the orchestrator and performed an evaluation to compare the two approaches. The results show that the module designed in this thesis is better suited for optimizing for latency.

For software packaging, I chose to use LXD, which is a container-based software packaging solution. Its purpose is to package the software that is deployed by the orchestrator. I analyzed the choice of LXD through an evaluation procedure that compares it with Docker, another container-based software packaging solution. I found that LXD produces smaller containers but requires more time to generate them compared with Docker. I also found that LXD container images exhibited better performance than the Docker ones for software that is not I/O-intensive. Through this evaluation, I concluded that LXD was the better choice for the orchestrator.


Keywords

Edge Computing; Orchestration; Mobile Robotics; Software Packaging.


Acronyms

VM Virtual Machine

LXC Linux Containers

RAM Random-Access Memory

JSON JavaScript Object Notation

IoT Internet of Things

QoS Quality of Service

KVM Kernel-based Virtual Machine

OVF Open Virtualization Format

MCDR Multi-Camera Distributed Rendering

HUD Head-Up Display

REST Representational State Transfer

API Application Programming Interface


Contents

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Edge Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 Mobile Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.4 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.5 Ethics and Sustainability . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Background 10
2.1 Edge Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.1.1 What is Edge Computing? . . . . . . . . . . . . . . . . . . . . . 10

2.1.2 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 Mobile Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.1 What is a Mobile Robot? . . . . . . . . . . . . . . . . . . . . . . 12

2.2.2 Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3 Mobile Robotics and Edge Computing integration . . . . . . . . . . . . 14

2.3.1 Active Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.3.2 Video Streaming . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Edge Orchestrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4.1 Gamelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4.2 ECHO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4.3 MicroELementS . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.4.4 Container Migration . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.4.5 Service Orchestration . . . . . . . . . . . . . . . . . . . . . . . . 22

3 Building Blocks 24


3.1 iDrOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.1.3 Deployment Example . . . . . . . . . . . . . . . . . . . . . . . . 28

3.1.4 Implemented Parts . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.2 Edge Deployment Platform . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2.2 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . 33

3.2.3 Rejected Platforms . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2.4 Fog05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.3 Software Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3.3 Possible Candidates . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.3.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4 Edge Orchestrator - Design 44
4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.2 Main Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.3 Data Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.4 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.5 Optimization Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.6 Action Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.7 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5 Edge Orchestrator - Implementation 51
5.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.2 Monitoring Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.3 Module Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.4 Surveying Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.4.1 Implemented Metrics . . . . . . . . . . . . . . . . . . . . . . . . 57

5.5 Graph Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.6 Analysis Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.6.1 Implemented Optimization Strategies . . . . . . . . . . . . . . . 63


5.7 Action Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.8 Execution Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6 Results and Analysis 71
6.1 Software Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.1.1 Setup and Implementation . . . . . . . . . . . . . . . . . . . . . 71
6.1.2 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . 73

6.2 Optimization Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2.1 Edge Scheduling Strategy (ESS) . . . . . . . . . . . . . . . . . . 74
6.2.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2.3 Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.4 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.5 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . 78

6.3 Orchestrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7 Conclusions and Future work 87
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

References 91


List of Figures

2.4.1 Architecture of the gamelet solution . . . . . . . . . . . . . . . . . . . . 16

3.1.1 System Architecture of iDrOS . . . . . . . . . . . . . . . . . . . . . . . . 26

3.1.2 An example illustrating iDrOS interfaces . . . . . . . . . . . . . . . . . 29

4.5.1 An example of the decision making process . . . . . . . . . . . . . . . . 47

5.1.1 System Architecture of the orchestrator. . . . . . . . . . . . . . . . . . . 51

5.4.1 Process for obtaining GPS location and battery percentage for drone. . 59

5.4.2 Data acquisition process between Surveying Module and Monitoring Modules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.2.1 Architecture of the virtual machines in the evaluation environment. . . 76

6.2.2 Graph of average initialization times at different numbers of active nodes 82

6.3.1 Graph of RAM usage at different numbers of instances deployed . . . . 84

A.0.1 Module Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

A.0.2 Metric class diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

A.0.3 Network Graph class diagram . . . . . . . . . . . . . . . . . . . . . . . . 96

A.0.4 Graph interface class diagram . . . . . . . . . . . . . . . . . . . . . . . 97

A.0.5 Optimization Strategies Class Diagram . . . . . . . . . . . . . . . . . . 97

A.0.6 Action Interface Class Diagram . . . . . . . . . . . . . . . . . . . . . . . 97

A.0.7 Action Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98


List of Tables

6.1.1 Results for software packaging evaluations . . . . . . . . . . . . . . . . 73

6.2.1 Network conditions for different node types . . . . . . . . . . . . . . . 76

6.2.2 Configurations tested for optimization strategy evaluation . . . . . . . 77

6.2.3 Evaluation results of the Latency Optimization strategy . . . . . . . . . 78

6.2.4 Evaluation results of the ESS optimization strategy . . . . . . . . . . . 79

6.2.5 Low configuration but with one edge node at 300ms time delay . . . . 81

6.3.1 Evaluation of RAM usage on the orchestrator node as more instances are instantiated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83


Listings

3.1 Adding battery percentage to telemetry object. . . . . . . . . . . . . . . 30

3.2 Modification to process MAVLink message to obtain battery percentage. 30

3.3 Modification to add battery percentage as a drone property. . . . . . . 30

3.4 Request to obtain GPS coordinates . . . . . . . . . . . . . . . . . . . . . 31

3.5 Drone request handler for exposing GPS location and battery percentage. 31

5.1 Injecting custom statistics into the node status. . . . . . . . . . . . . . . 54

5.2 Function to ping all other nodes and obtain the time taken by each. . . . 54

5.3 Function to obtain network usage. . . . . . . . . . . . . . . . . . . . . . 54

5.4 Metric handling functions in Surveying Module. . . . . . . . . . . . . . 56

5.5 General module functions for the Surveying Module. . . . . . . . . . . 57

5.6 Function to obtain status for all online nodes. . . . . . . . . . . . . . . . 58

5.7 Function to obtain active nodes. . . . . . . . . . . . . . . . . . . . . . . 59

5.8 Function to update node graphs. . . . . . . . . . . . . . . . . . . . . . . 61

5.9 Function to update node edges. . . . . . . . . . . . . . . . . . . . . . . . 62

5.10 Graph Interface function. . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.11 Necessary functions for analysis module. . . . . . . . . . . . . . . . . . 63

5.12 Pseudocode for Latency Optimization strategy. . . . . . . . . . . . . . . 66

5.13 Function to terminate an instance. . . . . . . . . . . . . . . . . . . . . . 69

5.14 Execute action function. . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


Chapter 1

Introduction

This chapter will provide a brief background on the two domains under which this thesis falls: edge computing and mobile robotics. This is followed by the problem statement, which is obtained by studying how and why to integrate those two domains. The problem statement is followed by the purpose of the thesis, which outlines what will be produced as part of this thesis to solve the highlighted problem; this is my main contribution. The purpose also highlights the consequences that solving the problem has for the two domains. The stakeholders for the thesis are then listed and a description of the scope of the thesis is presented. Lastly, an outline of the structure of the thesis is presented.

1.1 Background

In broad terms, this thesis is about the integration of two domains: edge computing and mobile robotics. In this section, those two domains are introduced and explained to provide the background necessary to understand the motivation for integrating them and the key challenges faced when attempting to do so.

1.1.1 Edge Computing

Over the past several years, edge computing has emerged as a method of distribut-

ing computational resources that provides a compromise between on-device and cloud

computing [33]. Edge computing is described as bringing computationphysically closer

to where it is needed, thereby reducing response times by deploying edge nodes close


to client devices [33]. Edge nodes are defined as computation or storage devices con-

nected to the same network as the client devices while being physically close to them.

There is no absolute threshold for how close a node must be; this depends on the application. For example, in the game streaming domain, an edge node is typically defined as being one or two hops away from the client device [4].

Edge computing has a number of key benefits over cloud computing, of which four are

highlighted [33]. Firstly, edge computing enables highly responsive services due to the

closer proximity to the end users [33]. Secondly, edge computing allows for increased

scalability by processing on the edge then only transmitting processed information to

the cloud [33]. Thirdly, edge computing allows for more private operation by restrict-

ing the amount of information sent over the internet [33]. Lastly, edge computing can

mask network outages in the cloud by switching to edge nodes which remain connected

[33].

Yet despite its benefits, edge computing has some challenges. Utilizing edge computing

increases the potential of security vulnerabilities being exploited when compared with

a completely isolated local system that does not utilize edge computing [41]. As a result,

it is vital to utilize platforms that are regularly maintained and receive security updates

to deal with future unknown vulnerabilities.

Edge computing is also less standardized than cloud computing. There are many dif-

ferent standards and definitions as to what constitutes edge computing [41]. There is a

need for universally agreed definitions so that problems can be defined in a clearer manner [41].

Edge computing also utilizes a wide array of devices from multiple generations [41].

Therefore, it would be challenging to build an edge network that can handle and mon-

itor many different types of devices, which can range from sensors to compute nodes.

Thus there exists a need for handling and monitoring solutions that can be adapted to

a wide range of devices.

1.1.2 Mobile Robots

The term mobile robot has multiple definitions, depending on what mobile refers to. In this thesis, a mobile robot refers to a robot that is capable of motion and is also

connected to a mobile network. To be more specific, this thesis will focus on aerial


robots (colloquially known as drones); however, the techniques developed can be ex-

tended and applied to any robot which satisfies the definition of mobile robot.

Mobile robots typically perform computations locally on a single-board computer such

as a Raspberry Pi. Those single-board computers may not have the computing capabil-

ities for applications with high computational intensity such as real-time object recog-

nition. An approach to increase the computational capabilities of mobile robots is to

integrate an edge computing network with the mobile robot. This edge computing net-

work is composed of a number of edge nodes, which can be used for processing and

storage of data. Communication between the mobile robot and the edge network uti-

lizes high-speed mobile networks that enable the transfer of vast amounts of data at

high throughputs and low latencies.

The integration of mobile robotics with edge computing will be the main topic of this

thesis. Mobile robots have some unique characteristics that give rise to requirements and considerations specific to them, which would not be present with other devices that seek to integrate edge computing.

Those considerations stem from three main characteristics:

• The first characteristic is the limited energy supply, caused by mobile robots relying on batteries with very limited capacities, which in turn limits the maximum operational time. As a result, it becomes essential to maximize

efficiency and reduce resource usage both from mechanical (i.e. motors) and

computational (i.e. onboard computer) perspectives.

• The second characteristic is the inconsistent network availability. This is caused

by a number of factors. The first is that mobile robots operate in a wide variety of environments, including some with poor access to a mobile network. The second is the capability of movement in any direction, which means that the mobile antenna may accidentally be blocked by an object obstructing it from the network. Those scenarios may be predictable in some cases but not always, which presents the challenging problem of how a mobile robot should always be prepared for any sudden loss of network connectivity.

• The third characteristic is that mobile robots have physical movements as part

of their application logic. This means that they are both aware and in control of

the physical movements that they will take. This is opposed to smartphones that


may be aware of their physical movements, but are unable to actively choose to

move in a particular direction.

1.2 Problem

Research was conducted to find whether there exists a solution that integrates edge computing and mobile robotics in a manner that considers those three characteristics. Unfortunately, no such solution was found. However, literature was found that integrates edge computing with other domains; this was done through the use of an edge orchestrator.

An edge orchestrator is a program that manages an edge network. This is done by an-

alyzing the edge network and the requirements of devices within that network, and

utilizing the available edge nodes to establish an edge deployment that achieves the

requirements that the devices need for their operation. The edge network refers to the

network layer connections between the devices. The edge deployment refers to the ap-

plication layer, i.e., which application, if any, each device in the network is running. Application here refers to a set of storage and/or computations that are running

on the device. The devices’ requirements may be computational, storage, and/or net-

working requirements. An application that has been deployed on a node is called an

application instance.

The orchestrator is intended to use measurements and data about the current edge

deployment to compute and execute a more optimized deployment. The deployment

would be optimized for one or multiple measurements, such as, within the specific mobile robotics domain, battery levels and current GPS location. The purpose

of a deployment is to utilize resources available on edge nodes to carry out computa-

tions by deploying application instances that can access the resources directly and then

communicate results of those computations to appropriate nodes. The orchestrator al-

lows for the deployment and management of those application instances but does not

handle the computations or the communication of results; those are instead handled by the application being deployed. This management takes the form of creating, terminating, and migrating application instances between different nodes in a network. Computing

a certain deployment is akin to determining which nodes should have application in-

stances running and which should not. Determining and executing this deployment is

what the edge orchestrator is supposed to do.
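To make the terminology concrete, the sketch below shows one possible way of representing a deployment in code. It is only an illustration under assumed names (Node, ApplicationInstance, and Deployment are hypothetical and not the orchestrator's actual data model): a deployment is essentially a mapping from application instances to the nodes they run on, and computing a new deployment means choosing that mapping.

from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch only: the class and field names below are hypothetical
# and do not correspond to the orchestrator's actual implementation.

@dataclass
class Node:
    name: str
    cpu_free: float        # fraction of CPU currently unused on this node
    ram_free_mb: int       # free RAM on this node, in megabytes

@dataclass
class ApplicationInstance:
    app_name: str          # which packaged application this instance runs
    node: str              # name of the node the instance is deployed on

@dataclass
class Deployment:
    nodes: Dict[str, Node] = field(default_factory=dict)
    instances: List[ApplicationInstance] = field(default_factory=list)

    def instances_on(self, node_name: str) -> List[ApplicationInstance]:
        # Which application instances are currently running on a given node.
        return [i for i in self.instances if i.node == node_name]

# "Computing a deployment" then amounts to choosing the node field of each
# instance so that the devices' requirements (latency, CPU, storage, ...) are met.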


Edge orchestrators were found that either apply to specific domains, or are general or-

chestrators intended for general purpose edge computing. Unfortunately, there was no

orchestrator that considers the three characteristics of mobile robotics previously men-

tioned (limited energy supply, inconsistent network availability, and physical move-

ments as application logic) in the orchestration. This presents an opportunity to build

an orchestrator that does so.

This thesis will address the following questions:

1. What functionalities should the edge orchestrator intended for mobile robots

have?

2. Which parts of the edge orchestrator should be modular?

3. How should applications be deployed using the edge orchestrator?

The next section will discuss the benefits that this orchestrator could provide.

1.3 Purpose

The purpose of the degree project was to present a modular edge orchestrator that en-

ables the integration of edge computing with mobile robotics. By using this orchestra-

tor, developers that intend to add edge computing functionality to their domains will

have an orchestrator which is modular so that they can add their own customizations

for their specific use-cases. In our case, we will be using the mobile robotics domain as

a proof of concept implementation and therefore modules are implemented with that

domain in mind on top of the orchestrator.

This orchestrator, as it is implemented in this thesis, greatly simplifies the process of

integrating edge computing into a mobile robotics platform and this integration has a

number of key benefits. Firstly, developers working on mobile robotics applications

can utilize the benefits that edge computing provides, while minimizing the drawbacks

of off-device computing. Historically, due to real-time processing requirements and

poor performance ofmobile networks, processing onmobile robots had to be restricted

to on-device processing by relying on single board computers such as a Raspberry Pi

[18].

However, with the increased capabilities of modern mobile networks, and by building

an orchestrator with customizations designed specifically for mobile robots, mobile


robots would no longer be limited by the computing power of the single-board computer on the robot but can instead utilize powerful edge nodes that offer lower latencies than

cloud nodes and higher computational power than on-device computers.

As a result, this will allow for new applications that utilize those additional resources and were previously not possible, enabling mobile robots to be deployed in those applications. An example of this is a building analysis drone, which takes pictures and heatmaps of various buildings and sends them to an edge node that compares this data against a database of previously obtained images and heatmaps, a task which may be too memory- and processing-intensive to be done on-device [26]. Based on the comparison, building deterioration can be calculated in real time and reported back to the drone, which can make real-time on-the-spot adjustments to the mission to gather more data if the de-

terioration was found to be substantial enough. Such an application would have been

more difficult using just on-device processing since the operator would have to wait

until the mission is over and the drone has returned to get the data. It also would have

been more difficult using cloud computing because the higher latency of cloud nodes

when compared with edge nodes may prevent real-time on-the-spot adjustments from being made. This is because knowledge about the position and state of the drone requires

more time to reach cloud nodes compared with edge nodes, due to the higher latencies

associated with cloud nodes.

Throughout this project, I have performed the following tasks. The purpose of those

tasks is described below the list:

• Researched various software packaging solutions

– Chose the most appropriate software packaging solution

– Evaluated the choice of software packaging solution

– Researched, decided on, and utilized an edge deployment platform

• Designed, implemented, and tested the edge orchestrator

• Expanded on the mobile robotics platform developed by previous students

– Added reporting of mobile robot telemetry on a network

– Added measurement of battery percentage of mobile robot

The mobile robotics platform was mostly developed by two previous thesis students.


The reason this software platform is being described is that it is deployed on nodes by the orchestrator. The platform enables drones to utilize a more dynamic approach to

drone navigation by relying on the concept of Active Sensing [26] that considers mea-

surements from sensors to dynamically determine the path of the drone. The platform

also simplifies the process of adding sensors to drones by abstracting sensor drivers.

The platform enables communication and management of the drone through the in-

ternet.

An edge deployment platform is a platform that has the capability to utilize compute,

storage, and networking resources on edge nodes. The edge deployment platform is

described because it is a necessary component to build the orchestrator on top of. The edge deployment platform provides APIs that enable functionality that the orchestrator

will use.

A software packaging solution packages a software platform, along with all the required dependencies, into one file that can be deployed on any system that supports the packaging solution. This is described because edge deployment platforms need the drone software platform and application to be packaged in such a manner so that they can be deployed dynamically, without requiring the drone software platform’s dependencies to be installed on every node prior to deployment.

1.4 Scope

The scope of the project was to design and implement an edge orchestrator, using an

edge deployment platform which is already available, to be able to organize the de-

ployment of instances in an edge network. An edge deployment platform is a platform

that has the capabilities to deploy and manage instances onto edge nodes, thereby

utilizing their compute, storage, and networking capabilities. An instance refers to

software running on an edge node that requires compute, storage and/or network re-

sources.

The design of the orchestrator has been made to be platform agnostic, in the sense

that the design could be implemented with any edge deployment platform. This does not carry over to the proof-of-concept implementation, which has been implemented to

utilize a specific edge deployment platform.

There are some aspects that are beyond the scope of this project. The orchestrator can


only perform macro-management of instances, meaning that it can instantiate, terminate, and migrate entire instances. Instantiation means that an instance is deployed on a particular edge node. Termination means that a deployed instance is closed and its resources are deallocated. Migration means that an instance is transferred from one node to another.
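As a rough illustration of what macro-management means in practice, the sketch below models the three operations as actions on whole instances. The names (ActionType, Action) are hypothetical and are not taken from the orchestrator's implementation.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

# Hypothetical sketch of the three macro-management operations described above;
# the orchestrator's real action interface may look different.

class ActionType(Enum):
    INSTANTIATE = auto()   # deploy an instance on a particular edge node
    TERMINATE = auto()     # close a deployed instance and free its resources
    MIGRATE = auto()       # move an entire instance from one node to another

@dataclass
class Action:
    kind: ActionType
    instance_id: str
    source_node: Optional[str] = None   # used by MIGRATE
    target_node: Optional[str] = None   # used by INSTANTIATE and MIGRATE

# A migration expressed at this granularity always moves the whole instance;
# moving only part of an instance would require micro-management.
move = Action(ActionType.MIGRATE, "sensor-preprocessing",
              source_node="edge-1", target_node="edge-2")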

Micro-management manages parts of an instance between different edge nodes, such

as migrating one function from one node to another. Micro-management is not im-

plemented as part of this thesis and therefore cannot be utilized by the orchestrator.

The unavailability of micro-management limits the capabilities of the orchestrator by

limiting the types of modifications it can make to the edge deployment.

1.5 Ethics and Sustainability

This project will be using a number of different open-source software packages, one of which

will be modified to fit our needs. As a result, the modified software’s licenses must be

analyzed to ensure that we have permission to make changes to the code. The software

that will be modified follows the Eclipse Public License 2.0 and the Apache 2.0 License

[13] [5]. Fortunately, both licenses permit the modification of the code as well as pri-

vate use and commercial use, which means that we are able to freely modify the code

to suit our purposes.

Mobile robots are battery powered devices, and as such, they are a part of the sus-

tainability aspect of this project. Sustainability has been tackled by including battery

conservation as a key characteristic of mobile robots. As a result, a solution to integrate edge computing into mobile robots must consider the problem of battery conservation. Prioritizing battery conservation both increases the utility of the robot by extending its flight hours and makes the robot more sustainable by using less energy and extending

the lifespan of batteries.

1.6 Outline

Chapter 2 of this report will go over the background required to read and understand

the rest of the report. Firstly, edge computing is introduced and explained. Then, the

same is done for mobile robotics. Then, certain applications where the integration of edge computing into mobile robots is beneficial are outlined. Lastly, edge orchestrators from other

works are described and analyzed.

Chapter 3 discusses the design and implementation of various parts of the project that

form the building blocks that are necessary to implement the orchestrator. Firstly,

the drone software platform, iDrOS, is described. Then the edge deployment platform

used, Fog05, is described. Finally, the design and implementation of the software pack-

aging solution chosen is illustrated.

Chapter 4 discusses the design of the orchestrator: how the orchestrator has been designed to be modular, and which components can be customized and which cannot. It also presents the rationale behind the design decisions taken, to provide the context for why those decisions were made.

Chapter 5 discusses the implementation of the orchestrator. It will also discuss the im-

plementation of the proof of concept customizations for the mobile robotics domain.

By discussing the customizations, it can serve as a guide for how to create other cus-

tomizations for other domains on top of the orchestrator, to adapt it to those other

domains.

Chapter 6 shows the evaluations performed, results of the evaluations, and the analysis

performed on those results. The evaluation in this section is split into two main parts,

one evaluating the use of the application packaging solution chosen, LXD. It was found

that this choice was an appropriate one. The other evaluation is focused on evaluating

one of the proof of concept implementations for the mobile robotics domain, which was

to design a module that optimizes for latency between different nodes. This module

was compared to an external algorithm called ESS that also optimizes for latency. It

was found that the module designed in this thesis was better suited to optimizing for

latency.

Finally, Chapter 7 provides conclusions to the work done on this project. It then presents and discusses some future work that could be performed to expand on this project.


Chapter 2

Background

This chapter will provide the necessary background on concepts important for the un-

derstanding of the thesis. The two domains presented in the introduction chapter (edge

computing and mobile robotics) will be explained in further detail with a larger focus

on specifics of their importance in building the orchestrator. Finally, examples of edge

orchestrators from the literature are presented. Those edge orchestrators were not de-

signed specifically for mobile robotics; they are either edge orchestrators for general-purpose applications or orchestrators designed specifically for other domains. Their suitability

towards being used for mobile robotics will be discussed. Their key innovations will be

highlighted since those innovations could be reused in the design of the orchestrator

presented in this thesis.

2.1 Edge Computing

2.1.1 What is Edge Computing?

Edge computing is “the processing and analyzing of data along a network edge, closest

to the point of its collection, so that data becomes actionable” [16]. The main problem that edge computing was created to solve is proximity: moving the computations closer to where they are needed [16].


2.1.2 Benefits

Compared to nodes in a cloud network, nodes in an edge network are physically closer

to client devices, both in terms of the number of hops and in terms of physical distance.

By utilizing closer nodes the latency between the node and the client device is smaller

[16]. This immediately highlights a major benefit of edge computing within the do-

main of mobile robotics: reduced latency and improved response time, both of which are vital when dealing with devices that have real-time deadlines. By relying on

edge computing instead of cloud computing, the reduced latency of edge computing

means the likelihood of meeting real-time deadlines is higher.

A proof-of-concept face recognition application was implemented in both a cloud-based and an edge-based variant [40]. The response time, defined as the time from when the client device begins uploading the image to when it receives the answer, was 900 ms for the cloud-based implementation and 169 ms for the edge-based implementation [40]. The processing time once the data is on the server is 2.492 ms for the cloud server and 2.479 ms for the edge server [40]. This speedup was due to both lower latencies and higher bandwidth on edge nodes when compared with cloud nodes, which resulted in faster transmission times for images. This demonstrates that for applications where the transmission times are significantly higher than the processing times, edge computing can result in significant speedups compared with cloud computing.
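To see where the speedup comes from, the reported numbers can be decomposed, under the simplifying assumption that the response time is roughly the transfer time plus the server-side processing time:

Cloud: 900 ms total - 2.492 ms processing ≈ 897.5 ms spent transferring data
Edge:  169 ms total - 2.479 ms processing ≈ 166.5 ms spent transferring data
Overall speedup: 900 / 169 ≈ 5.3x

Almost the entire difference therefore lies in transmission, which is exactly the component that the proximity of edge nodes improves.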

A common misconception is that the decision to utilize edge computing means cloud

computing is not utilized at all. However, edge nodes could be placed between the

client device and cloud nodes. In this setup, edge nodes would be connected over a local

area network to the client device, rather than over the internet [16]. Cloud nodes are

connected to the client device and edge nodes over the internet [16]. Data processing

is done on edge nodes. Only the results of the processing are communicated to the

cloud nodes [16]. This results in the second benefit of edge computing, which is that

the amount of data sent to the internet is greatly reduced by only sending the results

to the cloud [16].

By deploying such a network, the edge nodes would be utilized for computations re-

quiring fast response times. Computations can be grouped in terms of complexity. Less

complex computations can be done on the client device. Highly complex computations

can be done on the powerful cloud node and computations of medium complexity can


be done on the edge node. This approach is known as osmotic computing [7]. Achiev-

ing the right balance of performance, response time and energy consumption is the job

of the edge orchestrator, which should deploy computations on the most appropriate node to achieve that balance. This is an example of orchestration by optimizing for computational intensity. The choice of which metric to optimize for is a key decision when building an orchestrator geared towards a specific domain.

Edge computing should theoretically be more secure than cloud computing [16]. In

particular, by keeping most of the data between the client device and the edge node,

most of the data is not exposed to the wider internet as it is with cloud computing

[16]. This may reduce the attack surface by reducing the number of devices that the

data could be exposed to. However, edge computing can potentially increase the at-

tack surface depending on the implementation. If sensitive data is transferred to the

cloud node either way, then passing through the edge node increases the number of

hops and as a result increases the likelihood of a vulnerability in one of the hops being

exploited. Therefore, security-sensitive applications would require a security-focused implementation for this benefit to be applicable. It is therefore fair to say that edge computing provides a higher ceiling of security, but does not necessarily increase security in and of itself. It should be noted that security was not an aspect of the orchestrator explored in this thesis.

2.2 Mobile Robotics

2.2.1 What is a Mobile Robot?

Mobile robots are robots which are capable of motion as well as being connected to a

mobile network. In this definition the word mobile refers to both mobility in physi-

cal terms as well as connectivity terms. Mobile robots also have the unique property

that the mobility is part of the application logic. This means that the mobility can be

controlled and manipulated by the mobile robot. This is opposed to mobile phones,

which cannot directly control their mobility since they are being carried by the user.

The consequence of this is two-fold: it means that mobility can be considered by an orchestrator, so that orchestration decisions are made with mobility taken into account. However, it also means that there is added complexity if this is to be done.


2.2.2 Networking

The integration of edge computingwithmobile robotics ismade possible by high-speed

mobile networks. Therefore, it is important to study previous attempts at using high-

speed networks with mobile robots or drones.

The use of a 4G mobile network to enable drones to be connected was examined in

[21]. Field trials were performed by utilizing a “commercial LTE network occupying a

suburban area in Masala, Finland”. LTE in this quote refers to Long-Term Evolution

which is a standard for mobile networks [22]. The drone used was a DJI Phantom 4

Pro, which was connected to an LTE smartphone running TEMS Pocket 16.3 for data

collection. Based on those trials, “the applicability of terrestrial networks for connected

drones” was demonstrated [21]. It was also found that current LTE networks are ca-

pable of supporting low-altitude (heights up to 300m) drones, however, “there may

be challenges related to interference as well as to mobility.” [21]. This interference

is caused by the fact that mobile antennas are usually tilted downwards towards the

ground which is where the majority of users are located, whereas drones are typically

at higher altitudes and therefore are more likely to experience decreased performance

or even complete connection loss [39].

The 5G standard, which is currently being deployed, has multiple optimizations that

make it more suitable than 4G for drones. For example, the user plane latency (one-

way time for delivering a packet from one 5G device to another 5G device) has been

reduced to 1ms when using ultra-reliable and low-latency communications and 4ms

when using enhanced Mobile Broadband communications [25]. Minimizing this la-

tency results in a significant reduction in the latency overhead caused by the network.

As a comparison, 4G LTE experiences a latency of around 50 ms [17].

The 5G standard has guarantees for the Quality of Service (QoS) at different physi-

cal speeds [25]. This guarantee is important for automotive applications to ensure a

certain QoS for autonomous cars or trains, but has the side effect of being useful for mobile robotics [25]. The standard defines four classes of mobility, of which the latter two are relevant for mobile robotics: vehicular at 10 km/h to 120 km/h and high-speed vehicular at 120 km/h to 500 km/h [25]. The former achieves the defined QoS in dense urban as well as rural areas, while the latter does so in rural areas. Based on those guarantees, the 5G standard is expected to be a marked improvement over 4G for the purposes of mobile robots and would therefore aid in the adoption of mobile connected robots. This demonstrates that the integration of mobile robots with edge computing is becoming more feasible with the improvements brought by the 5G standard.

2.3 Mobile Robotics and Edge Computing integration

This section will highlight different scenarios where the integration of mobile robotics

with edge computing would be beneficial.

2.3.1 Active Sensing

Active sensing refers to the usage of sensor data to dynamically modify flight param-

eters [26]. In cases where sensor data must be preprocessed before use, and the pre-

processing requires computational resources that are not available on the mobile robot,

edge computing provides increased computational resources in the form of edge nodes

which could be used to aid in the preprocessing of data. In this scenario, the integra-

tion of edge computing would boost the computational resources available to the robot,

which would allow for the preprocessing of more complex computational tasks.

This could also be accomplished with cloud computing, since that also increases the

computational resources available. However, cloud computing lacks the most impor-

tant benefit of edge computing, which is being close to the client device, both in terms

of physical distance and the number of hops. If we expand this scenario to require

preprocessing to be completed within a certain real-time deadline, then an edge com-

puting based implementation meets a wider array of deadlines than a cloud computing

based implementation.

2.3.2 Video Streaming

The high bandwidths and low latencies of edge computing mean it can be adopted in

applications where vast amounts of data have to be transferred. In 2019, video files

accounted for 60% of traffic over the internet [1]. A scenario where a mobile robot with

a camera would have to stream, in real-time, the camera feed over a mobile network

would benefit from utilizing an edge network. The video stream would be sent to an

edge node, which would have the bandwidth and latency capabilities to handle large

video files. Edge nodes have been shown to have up-link bandwidths almost 50 times

higher than cloud nodes and latencies around one order of magnitude lower [40]. This makes them more likely to satisfy the requirements of transmitting a live video feed with minimal delay. In this case, the integration of mobile robotics and edge computing relaxes the bandwidth and latency limitations, resulting in higher-quality video streams transmitted with lower delays.

2.4 Edge Orchestrators

Throughout the past several years, there have been multiple papers about possible ap-

proaches to orchestrating an edge network. Those approaches range from generalized

approaches to approaches focused on specific domains such as game streaming. In

this section, those orchestrators will be discussed. Their suitability will be analyzed

when it comes to the domain of mobile robotics. Some useful concepts will also be

highlighted since they may be useful to adapt to the design of the orchestrator from

this thesis.

2.4.1 Gamelets

An orchestration system called the Gamelet system was developed [4]. This system was

developed specifically to tackle the problem of video game streaming. This system is

being analyzed because it is analogous to the integration of edge computing intomobile

robotics. This system integrates edge computing into video game streaming. Studying

this system will provide some ideas on how to integrate edge computing into other

domains, as well as show if it is beneficial to do so.

Video game streaming refers to an approach where the client device (the one the player

has) handles game inputs (controls and voice) as well as game outputs (video feed of

the game). But the processing of the game (such as rendering and the control of non-

player characters) is handled on an external server [20]. In video game streaming, the

client device sends the player’s inputs to the server [20]. The server runs those inputs through the game and returns to the client device the video stream corresponding to the next frame after those inputs were processed [20]. The client device then displays that frame to the user [20]. This is done multiple (typically 30 or 60) times per second to create a live video stream of the game. This highlights the importance of low

latencies especially in games that require fast user inputs such as driving or shooting

games.


The architecture of Gamelets is shown in Figure 2.4.1 [4].

Figure 2.4.1: Architecture of the gamelet solution

This architecture is commonly found in edge computing networks. However, the Gamelet

system has three optimizations that are specific to its domain:

Zone Distribution: Zone Distribution reduces the size of the game on each gamelet

by only downloading the zone of the game the user is on. This zone can represent the

location, or level the user is on. Additionally, this allows a gamelet to serve multiple

clients if they are on different zones, by serving each client the zone it is on. To give an

example, if player one was on level A and player two was on level B, they can both be

served by the same gamelet. The gamelet downloads levels A and B but not the other

levels. It can then run them in parallel to be served to players one and two respectively.

Using this approach, two players were served by one gamelet which only had to down-

load levels A and B. This type of approach is known as micro-orchestration because

the orchestrator is able to split up different parts of the entire game and transfer those

parts between different nodes. This is very powerful because it results in lower network consumption, since less data is transferred.
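A minimal sketch of the zone-distribution idea is shown below, with hypothetical names and a made-up capacity limit (max_zones_per_gamelet); it is not the Gamelet system's actual algorithm, only an illustration of how players whose zones fit on the same gamelet can share it.

# Illustrative sketch of zone distribution; names and the capacity limit are
# hypothetical, not taken from the Gamelet system itself.

def assign_players_to_gamelets(player_zones, max_zones_per_gamelet=2):
    # player_zones: dict mapping a player id to the zone (level) the player is on.
    # Returns a list of gamelets, each holding the zones it must download and
    # the players it serves.
    gamelets = []
    for player, zone in player_zones.items():
        placed = False
        for g in gamelets:
            # Reuse a gamelet if it already holds the zone, or if it still has
            # room to download one more zone.
            if zone in g["zones"] or len(g["zones"]) < max_zones_per_gamelet:
                g["zones"].add(zone)
                g["players"].append(player)
                placed = True
                break
        if not placed:
            gamelets.append({"zones": {zone}, "players": [player]})
    return gamelets

# Example from the text: player one on level A and player two on level B are
# served by a single gamelet that only downloads levels A and B.
print(assign_players_to_gamelets({"player1": "A", "player2": "B"}))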

Distributed Rendering: Distributed Rendering utilizes parallel processing by ren-


dering different camera views in different Gamelets. This method is called Multi-

Camera Distributed Rendering (MCDR). This can be best demonstrated with virtual

reality games that rely on a virtual reality headset. In this case, there are two camera views, each representing an eye. MCDR allows each camera view to be processed on

a separate gamelet. This means that each camera view can utilize the full resources of

the gamelet, resulting in improved graphical fidelity. This approach is the opposite of zone distribution: it tries to increase the number of gamelets used in order to increase the amount of resources available, whereas zone distribution tries to pack as many parts as possible onto one gamelet. This approach instead prioritizes computational perfor-

mance, which would be preferable in mobile robotics use-cases such as active sensing

where each sensor being processed can have one dedicated edge node that ensures it

meets its real-time requirements.

Adaptive Streaming: A unique characteristic of game feeds is that not all the content is required to be streamed at the same rate. For example, head-up displays (HUDs) in games, which display important information such as the player’s health and ammo, do not need to be refreshed at a consistent 30-60 frames per second. Therefore, the gamelet

can stream those areas at lower framerates, preserving useful bandwidth for areas of

the stream which require a consistently high frame rate. An example is in shooting

games where the amount of ammo is displayed on the screen. This display does not

need to be refreshed 30 times a second, but instead whenever the amount of ammo

changes. Therefore this display can be frozen until the gamelet detects a change in

which case the updated display is streamed to the client device. This reduces band-

width since the entire video feed does not have to be streamed at 30-60 frames per

second, but only the sections which change at that rate. This could be applied directly

to the mobile robotics use-case of video streaming. If we have a video stream that has

a HUD displaying GPS coordinates updated every x seconds, then the video stream

would use adaptive streaming to only send the updated HUD when the GPS coordi-

nates are updated which results in bandwidth savings.
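The adaptive-streaming idea can be sketched as a simple loop, shown below with hypothetical callbacks (capture_regions, send_region); it is only an illustration of the principle, not the Gamelet implementation: fast-changing regions are sent every frame, while slow regions such as a HUD are re-sent only when they change.

import time

# Sketch of adaptive streaming; capture_regions and send_region are assumed
# callbacks provided by the caller, not real Gamelet APIs.

def stream(capture_regions, send_region, fps=30):
    # capture_regions() returns a dict of region name -> rendered bytes.
    # send_region(name, data) transmits one region of the frame to the client.
    last_sent = {}                # last payload sent for each slow region
    frame_time = 1.0 / fps
    while True:
        regions = capture_regions()
        # The gameplay view is streamed at the full frame rate.
        send_region("gameplay", regions["gameplay"])
        # Slow regions (e.g. the HUD) are only re-sent when their content
        # actually changes, saving bandwidth the rest of the time.
        for name, data in regions.items():
            if name == "gameplay":
                continue
            if last_sent.get(name) != data:
                send_region(name, data)
                last_sent[name] = data
        time.sleep(frame_time)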

A user study was conducted with seven undergraduate students to test their perception

of using the gamelet solution after playing the game locally first. They were asked to

rate their experience from 1 (very noticeable artifacts) to 5 (no difference between local

version and gamelet). This was also compared with a pure cloud streaming platform.

As a result, using an architecture with 2 or 4 gamelets resulted in scores slightly above

4, 1 gamelet in 3.5, 8 gamelets in 3 and the pure cloud platform in 3.


Those findings demonstrated that there is a sweet spot in terms of the number of

gamelets where the best compromise between synchronization errors between multi-

ple gamelets and pure performance lies. This was around 2-4 gamelets. It also demon-

strated anoticeable improvement in perceptionwhen comparedwith pure cloud stream-

ing. As a result, this study demonstrates the potential in building and customizing an

edge computing network towards a specific domain, rather than relying on a general-

ized architecture. It also demonstrated three key innovations that could be applicable

in integrating edge computing with mobile robotics.

2.4.2 ECHO

An orchestration platform called ECHO was developed in [31]. ECHO manages re-

sources which utilize Linux Containers (LXC). LXC is a Linux platform that enables

support for isolated Linux systems. Each of those systems is known as a container.

The containers share one Linux kernel, but have separate environments and names-

paces. Resources managed by ECHO are composed of containers which are deployed

accordingly.

Another container platform was considered which is known as Docker. LXC was cho-

sen because Docker was “more resource intensive for low-end edge devices” [31].

ECHO utilizes a number of interesting innovations. Firstly, ECHO uses a JavaScript

Object Notation (JSON)-based registry called the resource directory to provide infor-

mation about all the resources in the system. A Representational state transfer (REST)

based Application programming interface (API) is used to obtain and update informa-

tion in the registry. This information includes memory, disk, IP addresses, and current

CPU utilization which are produced for each node in the network. Exposing informa-

tion about all the resources in the system in such a manner simplifies the process of

building an orchestrator to utilize this information. It also allows this information to

be easily accessible by any component or device that is in the same network. Since

the orchestrator built in this thesis is intended to manage both mobile robots and edge

nodes, the flexibility of having all the resources in the network exposed through a com-

mon REST API means it is easier to add new types of nodes, such as a specific type of

node for mobile robots.
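The exact schema and endpoints of ECHO's resource directory are not given here, so the record and URL below are hypothetical; they only illustrate the idea of a per-node JSON record that any component can read or update over a common REST API, and that a mobile-robot node type could extend with robot-specific fields.

import json
import requests  # third-party HTTP client, used here purely for illustration

# Hypothetical example of what one entry in a JSON-based resource directory
# could look like; the field names are assumptions, not ECHO's actual schema.
example_entry = {
    "node_id": "edge-node-01",
    "ip_address": "192.168.1.20",
    "memory_mb": 8192,
    "disk_gb": 64,
    "cpu_utilization": 0.37,
    # A mobile-robot node type could extend the same record with extra fields:
    "gps": {"lat": 59.33, "lon": 18.07},
    "battery_percent": 72,
}

def fetch_registry(base_url="http://orchestrator.local/resources"):
    # Fetch all node records from a (hypothetical) resource directory endpoint.
    response = requests.get(base_url, timeout=5)
    response.raise_for_status()
    return response.json()

print(json.dumps(example_entry, indent=2))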

Secondly, ECHO uses a service which runs on each connected device called the De-

vice Service. This service is responsible for device registration as well as management


of the containers. It logs various information such as CPU and memory utilization of

each node in the network. The choice to have a service running on each device can po-

tentially be adapted to mobile robots. Different node types can have different services.

For example, edge nodes would run a generic service that communicates its processing

capabilities and the current status of its resources. Other more specialized nodes such

as mobile robots can have a specialized service that obtains more specific information,

such as the GPS coordinates of the robot’s location.

Thirdly, the Platform Master is a service that manages the dataflow between different

devices. To do this, each dataflow is assigned a unique ID which is used for its man-

agement. The Platform Master utilizes another service that is running on each Virtual

Machine (VM) or container called the Platform Service. The Platform Service has built

in access to the VM or container each application is running on which allows it to di-

rectly enable features such as dataflow rebalancing. This would not be possible without

a service that has direct access to the data within the VM or container.

As a result of those optimizations, ECHO is suited towards applications with a heavy

emphasis on large data sets. The evaluation was performed using an Extract Transform Load (ETL) workload, which performs pre-processing on sensor data. The testing procedure was that for the first half of the testing period, only Raspberry Pis (client devices) were used in the computations. At the midpoint, two VMs (edge nodes) were initialized and added to the available resources. As a result, the throughput increased from 15

events/sec to 80 events/sec, which is an over five-fold increase in the throughput.

Based on the ECHO architecture, three key innovations that could be utilized in our

edge orchestrator were presented. The first is the usage of containers as the resources

being orchestrated. The second is the adoption of a standard registry for resource infor-

mation. The third is the usage of services which run on each node to collect data and

manage containers.

2.4.3 MicroELementS

Another interesting approach, called MicroELementS or MELS for short,

was developed in [7]. It utilized a new computing paradigm called Osmotic Comput-

ing, which combines Internet of Things (IoT), Edge, and Cloud Computing in a hier-

archical manner, with IoT at the bottom, Edge in the middle, and Cloud Computing at

the top [7]. This facilitates the movement of different computational elements up the

hierarchy when additional resources are needed. MELS is composed of MicroSer-

vices (MS) that represent functionality and MicroData (MD) that represents dataflows

[7]. MELS can be deployed using containers that include all the software components

they require. Those MELS are created dynamically by translating the user-specified

device requirements and constraints [7]. The orchestration of the MELS is done by

utilizing deep learning to “train a predictive model able to create a MELS deployment

manifest” based on previous attempts at monitoring the MELS [7].

The key innovation here is the osmotic computing structure. This approach, in the-

ory, appears to have the best of both worlds since it scales up in a hierarchical manner

as more resources are required. By having IoT devices at the bottom, it prioritizes

using those devices if they have available resources. This minimizes latency. As the

resource requirements increase, it scales to utilizing devices in the middle of the hier-

archy, edge devices. This sacrifices latency for more resources. The same compromise

is made again once resources in the middle are saturated, that is, when the system

scales to utilizing devices at the top of the hierarchy, cloud devices. Those devices in-

crease the amount of resources available at the cost of poor latencies. This ability to

scale as more resources are required makes this approach well suited for applications

with a wide range of computational requirements that vary over time. Mobile robots

are an example of such an application, since their computational requirements range from

low, in cases where only simple unit conversions of sensor data are required, to high,

in cases such as active sensing using an image recognition plat-

form. Consequently, this hierarchical scaling approach is a promising candidate for

mobile robotics.

2.4.4 Container Migration

The issue of container migration time is an important aspect of orchestration, since

container migration is one of the tools used by an edge orchestrator to achieve an opti-

mized deployment [14]. Container migration is the transfer of a container from one

node to another. Container migration consumes a significant amount of network

resources because typical migration techniques require the transfer of the

entire container. A deployment mechanism was developed that minimized the time it

takes for container migration. This was done by relying on application replicas which

exist in multiple nodes to allow for faster migration [14].

Two components were introduced to allow for this. The first is proactive instance

scheduling. Proactive instance scheduling selects two things, the number of applica-

tion replicas andwhich application replicas to place onwhich nodes [14]. The second is

the management of application replicas, which ensures that various constraints are held

[14]. Containers have been separated into application replicas, which contain files that

do not typically change over time (such as application dependencies), and container

context information, which contains files that change often over time, such as sensor

data.

By relying on the replica system, the amount of data transferred is reduced since the

application replicas can be stored prior to migration on each node, with only the con-

tainer context information being transferred during migration.

The implementation of the application replicas is done using Docker containers. Con-

tainer context information is stored in a Data Volume. This is required to ensure that

the data generated by containers remains available after the containers have been re-

moved. When containers are removed, data within those containers is removed as

well.

The synchronization algorithm that was developed for ensuring consistency between

containers follows the following structure [14]. Firstly, the data volume of the source

container is transferred to the target server. Then processing is stopped in the source

container. The previous 2 steps are repeated to ensure all the changes after processing

is stopped are included in the data volume. Then, the container known as the target

container is created on the target server with the updated data volume. Next, the target

container is started and traffic is then routed from the source container to the target

container. Finally, the source container is released and the synchronization is com-

pleted.
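The listing below is a minimal sketch of this seven-step synchronization flow. Every helper function is a hypothetical placeholder standing in for the container runtime and file-transfer mechanisms used in [14]; only the ordering of the steps follows the description above.

# Placeholder helpers; a real implementation would call into the container
# runtime and a file-transfer tool instead of printing.
def transfer_volume(volume, source, target):
    print(f"copying data volume {volume} from {source} to {target}")

def stop_processing(container):
    print(f"stopping processing in {container}")

def create_container(node, volume):
    print(f"creating target container on {node} with volume {volume}")
    return f"{node}:target"

def start_container(container):
    print(f"starting {container}")

def reroute_traffic(source, target):
    print(f"routing traffic from {source} to {target}")

def release_container(container):
    print(f"releasing {container}")

def migrate(source, target, volume):
    transfer_volume(volume, source, target)               # 1. initial copy of the data volume
    stop_processing(source)                               # 2. stop processing in the source container
    transfer_volume(volume, source, target)               # 3. re-sync changes made after the stop
    target_container = create_container(target, volume)   # 4. create the target container from the local replica
    start_container(target_container)                     # 5. start the target container
    reroute_traffic(source, target_container)             # 6. route traffic to the target
    release_container(source)                             # 7. release the source container

migrate("node-a", "node-b", "sensor-data-volume")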

This approach increases the reliability of the migration process by ensuring that the pro-

cess can be rolled back at any point in the process. Additionally, application replicas are

typically available on both the source and target prior to this process, and thus only

the data volume is transferred [14]. This results in significant time and bandwidth

savings. A drawback to this approach is the increased storage consumption, caused

by having multiple copies of the application replica at different nodes. The usefulness

of this approach depends on the priorities of the application. If an application pri-

oritizes minimal migration time and storage is not a concern, then this approach is

suitable.

Testing was conducted comparing this method to “traditional reactive stateless migra-

tion” [14]. It was found that total migration time was reduced by 52% for 10MB data

volume sizes and 84% for 50MB data volume sizes [14]. Further improvements are

obtained when changing the values of the other parameters, for example, when vary-

ing the latency of the network and adjusting the synchronization periods between data

volumes [14].

This container approach could be useful to apply to our edge orchestrator depending on

the containers being deployed. For containers in which the bulk of the data is composed

of files that are not frequently modified (such as dependencies), this approach could

result in significant reductions in both the network consumption and the migration

times when performing migrations. Within the domain of mobile robotics, files which

do change are typically data from sensor readings. If those sensors output large files

such as video feeds, then this approach may not be as beneficial, since video files are

quite large in size. Therefore, the benefits are very much application dependent even

within the context of mobile robotics.

2.4.5 Service Orchestration

Another architecture was developed which was geared towards the orchestration of

services in edge networks [9]. This architecture is composed of twomain components.

The first component is the Edge Orchestrator Agent (EOA) which is available on each

node and has three main responsibilities. Firstly, it handles the management of con-

tainers, resources and attached devices on each node. Secondly, it handles the moni-

toring of each node. Finally, it allows for machine to machine communication for ma-

chines in the edge network [9]. The second component is the Edge Orchestrator (EO)

which is only available on the node running the orchestrator, of which there can only

be one in an edge network. It has three responsibilities. Firstly, it manages the life-

cycle of all the nodes. Secondly, it enables and organizes the deployment of services on

nodes. Lastly, it communicates with the edge orchestrators on each node [9].

This basic architecture would make sense in the domain of mobile robotics, even with-

out the focus on orchestration of services. Having an EOA running on each node

reduces the responsibilities of the EO, which does not have to be concerned with

the specifics of monitoring, management, and communication. This means that the

EO could be more platform generic by relegating the implementation of platform spe-

cific aspects to the EOA. For example, a possible approach for mobile robotics would

be to build two versions of the EOA, one for mobile robot nodes and another for edge

nodes, allowing the EOA to focus on the relevant aspects for each node type. Con-

versely, the EO would support both types of EOA and would have the ability to apply

the appropriate logic depending on the type of nodes. This makes this approach suit-

able for domains where there is a wide variety of devices such as the mobile robotics

domain.

Service orchestration can also be tackled by modelling the virtualized service migra-

tion problem as an integer programming problem [42]. An algorithm was developed

to generate migration actions, based on representing the dependencies of the virtual-

ized service as a graph. The algorithm developed was compared with both the optimal

integer programming solution as well as a baseline algorithm. The evaluation was per-

formed based on both execution time, and service value performance. The algorithm

developed had service value performance that was near identical to the optimal solu-

tionwhile also being several orders ofmagnitude faster in terms of execution time. This

solution demonstrated that graph-based approaches can be quite powerful, since there

is a significant amount of existing theory and optimization work on graph-based algorithms. The

solution also demonstrated that significant speedups can be obtained by aiming for a

near optimal solution rather than an optimal solution without sacrificing a significant

amount of performance.

Chapter 3

Building Blocks

This chapter will first describe the drone software platform that will be used in this

project. This platform was mostly developed by two previous thesis students; however,

a few modifications have been made as part of this thesis that will be highlighted in the

chapter. The reason this software platform is being described is that it will be

deployed on nodes by the orchestrator.

Then, the edge deployment platform chosen will be described. An edge deployment

platform is a platform that has the capability to utilize compute, storage, and network-

ing resources on edge nodes. The edge deployment platform is described because it is a

necessary component to build the orchestrator on top of. The edge deployment platform

provides APIs that enable functionality that the orchestrator will use.

Finally, the software packaging solution chosen will be described along with the deci-

sion making process and possible candidates considered. A software packaging solu-

tion packages a software platform, along with all the required dependencies, into one

file which could be deployed on any system that supports the packaging solution. The

software packaging solution is described because the drone software platform and ap-

plication must be packaged into one file. By packaging them into one file, they can be

deployed dynamically without requiring the drone software platform’s dependencies

to be installed on every node prior to deployment.

The design and implementation of the orchestrator are not included in this chapter but

instead compose the entirety of Chapter 4 and Chapter 5 respectively. This was done

because it was the largest part of the project and needed to be in its own chapter for

readability purposes. It is also where my key contribution lies and therefore presents the

majority of the work I performed throughout this project.

3.1 iDrOS

This section will provide a brief overview of the drone software platform, which is called

iDrOS. The first version of iDrOS was developed by a previous student, Daniel Cantoni,

for his master thesis. The second version was developed by Pietro Avolio for his master

thesis and builds on the work by Daniel Cantoni. The version described in this section

will be the one after Pietro Avolio has completed his work. It is important to note that

besides adding the ability to obtain the GPS location and drone battery level through

a server, iDrOS was not implemented as part of this project, but rather is the work of

the two previous thesis students.

3.1.1 Motivation

The first purpose of iDrOS is to enable drones to utilize a more dynamic approach to

drone navigation by relying on the concept of Active Sensing [26].

Unlike traditional drone navigation that typically relies on the concept of predefined

waypoints, active sensing utilizes sensor data to adjust the navigation dynamically. As

a result, active sensing supports tracking and following a specific object, which makes

it a promising technique for applications such as tracking and video surveillance. Ac-

tive sensing has the potential to make these applications possible by detecting the ob-

ject and its relative location through a camera sensor, then adjusting the navigation

to move towards that location. This flexibility would greatly increase the number of

applications possible on drones.

The second purpose of iDrOS is to simplify the process of adding sensors to drones.

Typically, only a predetermined list of sensors is supported by drone software, which

means that sensors produced in the future would not be supported. It also means that

the user is restricted to a small list of sensors produced by a smaller list of hardware

vendors and therefore provides little flexibility. iDrOS solves this by enabling the use

of custom sensor drivers meaning that any sensor would be compatible by writing the

appropriate driver.

The third purpose of iDrOS was to integrate an Internet component into drones. This

Internet component is composed of interfaces to facilitate monitoring as well as control

of the drone over the Internet. This enables drones to be connected over a mobile

network, facilitating control and monitoring from anywhere on Earth, provided that

there is an Internet connection present on both the drone and the control device.

3.1.2 Design

The design of iDrOS is composed of three different layers. Each layer represents a partic-

ular independent set of modules responsible for a particular set of tasks. This design

is shown in Figure 3.1.1 [6].

Figure 3.1.1: System Architecture of iDrOS

Remote Control Layer

The Remote Control layer is composed of various network protocols that are used

to expose other layers within iDrOS to the Internet [6]. This enables monitoring of

various telemetry on the drone, control of the mission, and accessing various sensors.

Additional features could easily be added by implementing a handler that performs a

particular task depending on the input. As part of this layer, a socket server implemen-

tation was done to allow for socket requests to be used to obtain information such as the

GPS location of the drone or to issue various commands such as modifying the flight

parameters. This socket server was extended to allow for obtaining the GPS location

and drone battery level. This extension was performed as part of this project.

Application Logic Layer

The Application Logic layer is responsible for control of missions and applications [6].

A mission has two components: one Navigation Module and any number of Data Ac-

quisition Modules. Data Acquisition Modules communicate with sensors to gather

data. Navigation Modules use this data to control the navigation and movement of

the drone.

To manage a mission, a Mission Management component is implemented which has four

modules. Firstly, the Module Manager manages the Navigation Module and Data

Acquisition Modules mentioned earlier [6]. It is also responsible for managing sensor

drivers. This management includes loading the relevant drivers or modules, listing the

current ones, and deleting unused ones.

Secondly, the Mission Manager takes the Navigation Module and Data Acquisi-

tion Modules loaded into the system and selects those included in the current mis-

sion. It also allows for direct control of the mission and facilitates information transfer

between the Navigation Module and Data Acquisition Modules.

Thirdly, the Sensor Manager manages the sensors onboard the system [6]. This in-

cludes both sensors on the drone (local sensors) as well as sensors exposed by the Con-

nection Layer (remote sensors). The sensor manager abstracts the handling of both

sensor types making it easier for application developers to utilize the appropriate sen-

sor without worrying about where it is located or what kind of sensor it is.

Lastly, the Fail Safe Manager is used in situations where the drone could be endan-

gered [6]. In such a scenario, the mission execution is stopped and the drone lands

safely. The detection of when this manager should be utilized is beyond the scope of

this module and it is left to the application developers.

Connection Layer

The Connection Layer has two main functions. Firstly, it facilitates communication with

the flight controller and sensors [6]. This is called the Hardware Abstraction compo-

nent. Secondly, it facilitates utilizing high-performance networks. This is called the

High-Performance Networks Abstraction component.

The Hardware Abstraction component utilizes MAVLink to communicate messages to

and from the flight controller [19] [6]. A high-level API is provided to abstract the

most commonly used functionalities such as takeoff, and landing among others. To

communicate with sensors, this component includes a sensor interface that must be

implemented by each sensor using the sensor drivers. This allows for a high-level API

to be used which simplifies the process of using sensors once the interface is imple-

mented.

The High-Performance Networks Abstraction component has three modules [6]. Firstly,

Remote Sensors is responsible for communicating with sensors on other iDrOS in-

stances. This includes discovering remote sensors in the samenetwork, retrieving their

properties, obtaining measurements and making them available to other instances.

Secondly, Communication Bus enables the use of a bus based communication channel

for the various instances [6]. Since this is a commonly used communication scheme

by developers, it was vital to implement it in iDrOS.

Thirdly, the Computation Offloader enables moving functions and computations from one

iDrOS instance to another [6]. It allows for invoking functions across instances and

receiving their result. An example of how this could be used is offloading a computa-

tionally intensive function from the iDrOS instance running on the drone to an iDrOS

instance running on a powerful cloud computer and leveraging the increased resources

to compute the result more quickly. In this example, the edge orchestrator would be

responsible for deploying iDrOS instances on different nodes that can then be used to

offload computations from one instance to another through the use of the computation

offloader.

3.1.3 Deployment Example

Figure 3.1.2 shows an example deployment of iDrOS instances and the corresponding

interfaces. In this example, there are 4 iDrOS instances running in the same network,

with one running on the drone computer which is referred to as iDrOS 0 in this text.

The drone instance is connected to local sensors onboard the drone, as well as the flight

controller that enables control of the flight hardware. It is also subscribed to a commu-

nication channel with two other iDrOS instances, iDrOS 2 and iDrOS 3. Additionally,

through a control interface, it is connected to iDrOS 1. Through this interface, functions

from the drone instance have been offloaded to iDrOS 1. This interface also allows a

sensor that is connected to iDrOS 1 to be accessible by the drone instance. This example

demonstrates the interconnectivity of instances and how they can be utilized to build a

distributed network of instances that could be used to access peripherals across devices

and allow for various forms of communication between different instances.

This example deployment is made possible by the edge orchestrator. The edge orches-

trator would deploy an instance of iDrOS on each of the nodes in the example. The

iDrOS instances that have been deployed can then communicate using the various lay-

ers of iDrOS as discussed above. The iDrOS instance iDrOS 1 has sensors physically

connected to it, which can be accessed by iDrOS 0 using the Sensor Manager layer

which exposes sensors between different instances. iDrOS 0 has offloaded functions

to iDrOS 1 through the Computation Offloader layer. The communication channel be-

tween iDrOS 0, 2, and 3 is enabled by the Communication Bus layer. The communica-

tion between iDrOS0 and the flight controller is enabled by theHardware Abstraction

layer.

Figure 3.1.2: An example illustrating iDrOS interfaces

3.1.4 Implemented Parts

This section will cover the only parts of iDrOS that have been implemented as part of

this thesis. Any part of iDrOS not mentioned in this section has been implemented by

the two students that have previously worked on iDrOS. There have been two modifi-

cations made to iDrOS as part of this thesis.

Drone Battery Percentage

The first modification was to extend the hardware abstraction within the connection

layer to allow for obtaining the remaining battery percentage for the drone. This was

done because the orchestrator needed to use the battery percentage as an input met-

ric.

The modification was implemented by adding a message to be communicated over

MAVLink to obtain this information from the flight controller. This was done in three

steps.

The first one, which is shown in Listing 3.1, adds the battery percentage to the ob-

ject that encapsulates all of the telemetry. This is where the battery percentage will

be stored. The second one, which is shown in Listing 3.2, adds the capability to re-

quest the battery percentage from the flight controller by supporting the message BAT-

TERY_STATUS which is defined in the MAVLink specification. This updates the bat-

tery percentage stored in the telemetry object with the output from the flight controller.

The third one, which is shown in Listing 3.3, adds battery percentage as a property of

the object encapsulating the drone, making it simple to obtain the battery percentage

of the drone through the use of the Properties feature supported by Python.

# battery
battery_remaining: int = None

Listing 3.1: Adding battery percentage to telemetry object.

elif message_type == "BATTERY_STATUS":
    self.telemetry.battery_remaining = message.battery_remaining

Listing 3.2: Modification to process MAVLink message to obtain battery percentage.

@property
def battery_remaining(self) -> int:
    return self._mav.telemetry.battery_remaining

Listing 3.3: Modification to add battery percentage as a drone property.
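As a usage illustration, the short sketch below polls the new property until the battery drops below a threshold. The import path and the get_drone() accessor are assumptions that borrow the naming used in Listing 3.5; only the battery_remaining property itself comes from the modification described above.

import time

from idros import drone  # hypothetical import path, not part of the documented API

def wait_until_battery_below(threshold: int = 20, poll_seconds: float = 5.0) -> int:
    """Block until the drone reports a battery percentage at or below the threshold."""
    vehicle = drone.get_drone()
    while True:
        remaining = vehicle.battery_remaining  # property added in Listing 3.3
        if remaining is not None and remaining <= threshold:
            return remaining
        time.sleep(poll_seconds)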

Socket Server Extension

The second modification to iDrOS was to extend the remote control layer to allow the

socket server, which is used for exposing iDrOS functionality, to expose the GPS loca-

tion of the drone as well as the remaining battery percentage of the drone. The purpose

of the modification was to enable the orchestrator to access those two values. The socket

server was a simple way to do so since it had already been implemented. It was simply

extended to expose the GPS location and battery percentage as well.

The extension was done by adding a new request handler, which handles requests re-

ceived by the socket server, that is specifically used for obtaining telemetry from the

drone. The telemetry obtained was restricted to just the GPS location and battery per-

centage since those were the only values used by the orchestrator that needed to be

obtained from iDrOS. An example of the request type that would be sent is Listing 3.4,

which shows a request to obtain the GPS location. It specifies that the handler is the

drone handler, and that the action is get_gps.

The request handler implementation is shown in Listing 3.5. The main function is

the handle function, which has the logic for processing requests for this handler. This

function is only used if the request specifies the drone handler, as shown in the

example request previously discussed. The handle function

first obtains the drone object, which is used to obtain the drone’s telemetry. Next, the

action field is checked. If it corresponds to get_gps then a message with the latitude

and longitude of the drone is sent as the response through the socket server. If the field

corresponds to get_battery_remaining then the remaining battery percentage is sent

as the response through the socket server. Otherwise, an error message is sent through

the socket server since the action field does not correspond to an available action.

{"handler": "drone", "payload": {"action": "get_gps"}}

Listing 3.4: Request to obtain GPS coordinates

class DroneHandler(IdrosRequestHandler):
    def __init__(self):
        self.drone = None

    def get_handler_name(self):
        return Handlers.DRONE.value

    def handle(self, socket_request_handler, request):
        # Obtain the drone object, which exposes location and battery telemetry.
        self.drone = drone.get_drone()
        req_action = request["action"]
        lat = 0
        lon = 0
        battery_remaining = 200
        if req_action == "get_gps":
            # Respond with the drone's current GPS coordinates.
            lat = self.drone.location.lat
            lon = self.drone.location.lon
            msg = json.dumps({'handler': Handlers.DRONE.value,
                              'payload': {'status': HandlersResponse.OK.value,
                                          'lat': lat,
                                          'lon': lon}})
        elif req_action == "get_battery_remaining":
            # Respond with the remaining battery percentage (see Listing 3.3).
            battery_remaining = self.drone.battery_remaining
            msg = json.dumps({'handler': Handlers.DRONE.value,
                              'payload': {'status': HandlersResponse.OK.value,
                                          'battery_remaining': battery_remaining}})
        else:
            # Unknown action: report an error status.
            msg = json.dumps({'handler': Handlers.DRONE.value,
                              'payload': {'status': HandlersResponse.ERROR.value}})
        socket_request_handler.sendall(msg)

Listing 3.5: Drone request handler for exposing GPS location and battery percentage.
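For completeness, the sketch below shows how a client such as the orchestrator might query this handler. The host address, port, and message framing (one UTF-8 encoded JSON message per request) are assumptions made for illustration; the request and response shapes follow Listings 3.4 and 3.5.

import json
import socket

IDROS_HOST = "192.168.1.10"  # hypothetical address of the iDrOS instance
IDROS_PORT = 5000            # hypothetical socket server port

def query_drone(action: str) -> dict:
    """Send one request to the drone handler and return the decoded response."""
    request = {"handler": "drone", "payload": {"action": action}}
    with socket.create_connection((IDROS_HOST, IDROS_PORT), timeout=5) as sock:
        sock.sendall(json.dumps(request).encode("utf-8"))
        response = sock.recv(4096)
    return json.loads(response.decode("utf-8"))

gps = query_drone("get_gps")                    # payload contains 'lat' and 'lon'
battery = query_drone("get_battery_remaining")  # payload contains 'battery_remaining'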

3.2 Edge Deployment Platform

3.2.1 Motivation

An edge deployment platform is a platform that manages nodes in an edge network. An

instance refers to an application running on an edge node that requires computation

and/or storage resources. The advantage of an edge deployment platform is that soft-

ware can be deployed on any node as long as the edge deployment platform is installed

on that node.

Additionally, using such a platform means that physical access to nodes would not be

required after installing the platform. Therefore, such a platform is well suited towards

the mobile robotics domain, since the orchestrator would be able to dynamically man-

age any instances on the edge nodes without requiring the user to actively access the

edge node. It additionally means that it does not matter which application is deployed,

since the application could come froma remote server and does not have to be available

prior to deployment on each node.

3.2.2 Hardware Requirements

It is important to note that any edge deployment platform would have to be deployed

both on edge computing hardware and on the onboard computer that drones have.

Since the onboard computer that drones have is typically a single-board computer with

limited hardware capabilities, a common onboard computer’s hardware capabilities

will be treated as the limiting factor that determines whether the edge deployment

platform’s minimum requirements are met. Those requirements include processing

power and memory.

The typical onboard computer used for the purpose of this project is the Raspberry Pi

4. The choice is justified by the fact that Raspberry Pi 4 is one of the most modern

single board computers that has a large number of users and wide support in terms

of both supported operating systems and technical support. The configuration of the

Raspberry Pi 4 with the highest specifications has the following hardware specifica-

tions [2]:

• 1.5GHz quad-core ARM Cortex-A72 SoC,

• 4GB of LPDDR4 SDRAM,

• micro-SD card support with up to 2TB of storage.

It is important to note that since then, a configuration of the Raspberry Pi 4 with 8GB of

LPDDR4 SDRAM was released. However, this was not a factor in the decision making

process since it was not available at the time the decision was made [2].

Boards utilizing the more ubiquitous x86-64 architecture were considered. The UDOO

x86 II ADVANCED PLUS, which uses the x86-64 architecture, claims to be ten times

more powerful than a Raspberry Pi 3 [36]. However, using the computing capabilities

of the UDOO as the minimum requirements for the edge deployment platform would

be restricting the range of devices that can be deployed on the edge network. There-

fore, it was instead decided to fix the requirements to match the specifications of the

Raspberry Pi 4 to support any device that is at least as powerful. This means that our

orchestrator would support a wider range of devices to act as nodes.

3.2.3 Rejected Platforms

Before discussing the chosen edge deployment platform, other platforms which were

considered but rejected will be discussed in this section. Showing the rejected plat-

forms highlights why the chosen platform was picked.

OpenNebula

OpenNebula has been developed to be a datacenter management solution for cloud

and edge deployments [27]. Those deployments can be in the form of virtual machines

or system containers. OpenNebula provides a number of virtual machines and system

containers from its marketplace that run popular Linux distributions.

OpenNebula has the capability of being managed through a number of different inter-

faces. The most relevant is the Python interface called PyONE. This interface allows for

the integration and management of OpenNebula deployments through a Python API.

The API simplifies the development of an orchestrator built with OpenNebula since

the orchestrator can utilize the vast ecosystem of Python libraries and modules.

OpenNebula has some minimal hardware requirements that must be met by nodes that

run it [27]:

• 4 GiB RAM,

• 20 GiB of free space.

Apart from the above minimal hardware requirements, OpenNebula has extra require-

ments so that it can deploy virtual machines or containers. A processor is required that

uses the x86-64 architecture. Unfortunately, the Raspberry Pi 4 uses a processor that

uses the ARMv8 instruction set. This meant that it was not possible to use OpenNeb-

ula while maintaining compatibility with devices that utilize the ARMv8 instruction set

which is commonly found on drones.

OpenStack

OpenStack is a cloud computing platform that is intended to be scalable, have a simple

implementation, and a large set of features [30]. One of the supported features is a

marketplace where different distributions and appliances could be deployed from [28].

Other features include a load balancer, a DNS service, and a messaging service [28].

The wide range of features is enabled by the modular architecture of OpenStack that

allows for plug and play of different components that enable different features and

services depending on the user's needs [28].

A cloud network that utilizes OpenStack is composed of nodes of which there are two

mandatory types. Those nodes are mandatory in that a cloud network must have at

least one of each. The first mandatory node is the Compute node, which is where vir-

tual machines that should be deployed would run [30]. The Compute nodes use the

kernel-based virtual machine hypervisor, which is commonly found in Linux systems.

The second mandatory node is the Controller node, which manages the various Com-

pute nodes [30]. The Controller node is responsible for managing the networking that

allows nodes to communicate with each other. It exposes the dashboard that the user

uses to manipulate the network.

Unfortunately, OpenStack Nova requires significant available resources on Controller

and Compute nodes. Controller nodes must have the following hardware require-

ments:

• 1-2 core CPU,

• 8GB RAM,

• 100GB Storage,

• 2 network interface controllers.

Compute nodes must have the following hardware requirements:

• 2-4 core CPU,

• 8GB RAM,

• 100GB Storage,

• 2 network interface controllers.

The 8GB RAM requirement is well above the hardware specification we are target-

ing (i.e., 4GB of RAM). The targeted hardware specification requires a more

lightweight platform, so the OpenStack Nova platform was rejected.

3.2.4 Fog05

The edge deployment platform chosen was Fog05. It was chosen due to its active de-

velopment status, meaning that our requirements could feasibly be accommodated by

the Fog05 team. It supports a wide array of runtimes, providing increased flexibility

to use the runtime that is most appropriate for our use-case.

It has a Python API with which the orchestrator could be built [12]. Python is the same

programming language used for iDrOS, meaning there would be a lower technical bar-

rier when moving from developing iDrOS to developing the orchestrator. The Python

API also allowed us to use a wide array of Python libraries that are available for the

language.

Another benefit of Fog05 is that it is completely open source, which means that its

functionality could be extended by forking it and implementing additional features that

will be used by the orchestrator.

Fog05 is composed of 4 main services which allow for its operation [12]:

1. Agent Plugin: The Agent Plugin is responsible for the closed-loop management

of a node. It enables the other plugins to work by exposing node resources to

them.

2. Operating System Plugin: The Operating System Plugin is responsible for

operating system level functionality. Therefore, building an operating system

plugin for a specific operating system would be akin to adding support for that

operating system to run fog05.

3. Networking Plugin: The Networking Plugin is responsible for the manage-

ment of a network of nodes. Some of the functionality the Networking Plugin

enables include creating andmanaging virtual networks, and virtual bridged net-

works.

4. Runtime Plugin: The Runtime Plugin is responsible for what could be run us-

ing Fog05. For example, one of those plugins is the LXD runtime plugin which

enables support for running and managing LXD containers on any fog05 node.

Fog05 has support for Linux based operating systems and has a wide array of runtimes

supported, in which one could deploy applications. The supported runtimes are

[12]:

1. Native executables,

2. LXD containers,

3. Kernel-based Virtual Machine (KVM),

4. Containerd (which includes Docker containers) has experimental support.

Those runtimes represent the possible options when it comes to packaging iDrOS.

This was important to explore in order to find the most appropriate method of pack-

aging iDrOS and is explored in the next section.

3.3 Software Packaging

3.3.1 Motivation

There is a wide range of runtimes supported by Fog05. iDrOS and any application

running on top of iDrOS that is to be deployed over Fog05 must be packaged into one of

those runtimes. This section will present information about each of the software pack-

aging platforms, then discuss the decision making process behind which packaging

solution was chosen.

iDrOS and applications running on top of it have a large number of dependencies, mak-

ing them non-trivial to set up and run. This is especially the case when iDrOS instances are dy-

namically instantiated on edge nodes that do not necessarily have the dependencies

installed. However, iDrOS, the applications running on iDrOS and the dependencies

can instead be packaged into one file using a software packaging solution. This file can

be distributed to the different edge nodes and run accordingly, without needing any de-

pendency beyond what is needed to run the software packaging solution. This makes

the deployment platform-agnostic, meaning that any software that has been packaged us-

ing a supported software packaging solution could be deployed. It also increases sup-

port for software on different platforms.

For example, if a piece of software is designed only to run on Debian systems, a pack-

aging solution could be a virtual machine running Debian with the software installed.

This virtual machine could have been packaged using the Open Virtualization For-

mat (OVF), which means it is compatible with a wide array of virtualization solutions

that themselves are compatible with a wider array of operating systems. As a result,

software which only runs on Debian systems can now run on any operating system that

has a virtualization solution that supports OVF. There is, unfortunately, a cost to this

in terms of performance and hardware requirements, which is studied in more depth in

the next section. The compromise between performance, compatibility, and flexibility

is why it is so important to pick the appropriate software packaging solution.

3.3.2 Background

There are several methods to automatically deploy software in packages that ensure

the same environment regardless of which machine or operating system the user is

running. By ensuring the same environment is replicated, the user would not have to

worry about incompatible packages in their system. It also saves valuable time since

the setting up process is automatic, and the user would have a setup that can run the

software immediately. Furthermore, it provides a consistent environment that makes

it easier to debug by reducing the number of software configurations to one.

This section will focus on three deployment methods. The first and oldest method is

to deploy a full Linux virtual machine with certain configuration parameters that en-

sure each machine contains unique identifiers. The second is to deploy the software

in containers, such as Docker. Containers package the software, libraries and configu-

ration files into one container that is isolated from other containers in the user-space

but share the same operating system kernel. The third method is to compile the soft-

ware and its dependencies into a unikernel. A unikernel is an operating system that

has been compiled to only contain the necessary software and its dependencies. This

has the consequence of there being no distinct user and kernel modes. Instead, the

OS is simply composed of the software, the dependencies and runtime to run the soft-

ware.

There are several performance metrics which are important for the purposes of the

orchestrator. Because of this, several studies will be presented that compare those

performance metrics between different software packaging platforms.

The first of those performance metrics is throughput in I/O intensive applications. This

is important because there are certain applications of drones that are I/O intensive

such as capturing high definition video. It is therefore necessary to check which software

packaging platform performs best in I/O intensive applications.

The performance of running transactions in a MySQL database was compared between

running Linux natively, Kernel-based Virtual Machine (KVM), and Docker [15]. The

results found that the Docker configuration had similar performance to native (2%

lower than native at high concurrency). KVM had an overhead of 40% when compared

with native [15]. Despite the poor performance of KVM, it had no significant overhead

in CPU and memory usage but had significant I/O performance penalties. Therefore,

KVM could be a good choice provided the software is not I/O intensive.

The second important performance metric is the startup time of the software packag-

ing runtime. The start up time determines how long it takes for the software pack-

aging runtime to start and the software that is being packaged to run. The faster the

start up process is, the lower the response time would be when deploying applications

on nodes during orchestration and receiving the signal that the application has been

deployed.

Startup times of a unikernel (OSv running on KVM), a Linux virtual machine (KVM)

and a container (Docker) were compared [38]. This was done for 10, 20 and 30 in-

stances of each. It was found that the container was the fastest to start up at 0.52

seconds followed by the unikernel at 0.72 seconds followed by the virtual machine at

6.98 seconds, all of which were for 10 instances. Note that those times are for the OS-

/container startups and do not include the time to start the virtual machine process,

which is identical for the Linux VM and unikernel since they both utilize KVM. This

study provides some insight as to what provisioning times would be expected for each

of those methods. This will be useful in determining provisioning time requirements

when instantiating instances on edge.

A virtualization solution was developed called LightVM [24]. It is a virtualization solu-

tion based on Xen that utilizes distributed mode of operation to reduce the number of

interactions with the hypervisor. LightVM can boot a virtual machine in 2.3ms which

is two orders of magnitude faster than Docker. LightVM also utilizes a build system

called Tinyx that creates small Linux VM images that are designed to run one soft-

ware platform but keep the flexibility of being based on Linux. This build system could

also be automated making it a candidate for dynamically generating small Linux VMs,

regardless of the virtualization solution.

3.3.3 Possible Candidates

This section will present information about possible software packaging solutions that

could be used. In the Discussion section, the information will be used to analyze and

decide which software packaging solution to pick.

OSv

OSv is a modular unikernel that runs on micro-VMs in the cloud [29]. It is intended

to run unmodified Linux applications [29]. OSv only supports software which re-

quires a single process [29]. OSv can be run on the following four platforms [29]:

1. Locally on a Linux/Mac/Windows installation,

2. In a virtual machine using VirtualBox or KVM,

3. Amazon EC2,

4. Google GCE.

UniK

UniK is a tool for compiling applications into unikernels [35]. UniK supports a wide va-

riety of languages and a wide variety of unikernels, however, Python 3 support is avail-

able only for one unikernel (rumprun) [35]. UniK supports Python 3.5 with a simple

compilation procedure [32]. Python code is compiled to Rumprun unikernels but there

are two specifications that the code must meet. Firstly, it must have one main file in

the project [32]. Secondly, all dependencies must be installed locally to the root direc-

tory of the project [32]. The following 5 platforms are supported for running unikernels

which support Python 3: AWS, OpenStack, QEMU, VirtualBox, and vSphere [32].

Docker

Docker is the most popular container software [10]. Due to its popularity, it has a large

number of base images, which can be used as building blocks to build custom images

from [3]. Due to Docker’s high popularity, there is a high probability of Docker being

supported by whichever edge deployment platform is used. Unfortunately, software

running on Docker can only run one process at a time. Custom Docker images can be

created using a Dockerfile, which is composed of a list of commands to be run in a base

image that generate a new image [10].

In terms of Docker base images, there are multiple feasible options:

1. Python images (preconfigured to contain necessary Python environments), [3]

(a) Based on Debian.

(b) Available in normal and slim (reduced size) variants.

2. Ubuntu images (must be configured with correct Python environments). [3]

(a) Less convenient than Python images.

(b) Similar base (Ubuntu which is based on Debian).

3. LinuxAlpine images (must be configuredwith correct Python environments). [3]

(a) Utilize less disk space, so they could result in faster instantiation times (36.8MB

compared to 145MB for Ubuntu for the application shown in [11]).

(b) Have slower performance than Ubuntu/Debian-based packages.

LXD

LXD is a container manager that “offers a user experience similar to virtual machines

but using Linux containers instead.” [23]. The LXD runtime is supported by any Linux dis-

tribution that supports Snaps, which is a package manager for Linux operating systems

[23]. LXD also enables more isolation than Docker [34]. For example, each container

uses its own networking and storage stack. Therefore it is more comparable to VMware

and KVM and other hypervisors. LXD supports applications that require more than

one running process [23]. Since it uses the same Linux kernel as the host machine,

equivalent LXD images should be significantly smaller than KVM images [23]. LXD

is less popular than Docker and hence there is a higher chance that the instance man-

agement platform used does not support it. However, this is not a problem for

Fog05, which supports LXD. Using LXD, it is possible to create a container using base

OS images, install all the required dependencies, then export the container as an im-

age [8]. Using this process, it becomes possible to create an image that contains the

required application dependencies, which is similar to what is accomplished by Dock-

erfiles.

3.3.4 Discussion

Comparing these findings with the Fog05 platform, a number of decisions could be

made in terms of the viability and suitability of each packaging method. The requirement

for running native software platforms in Fog05 is that each machine must contain

the native software platform as well as its dependencies prior to instantiation. This

is quite problematic since dynamic instantiation would not be possible because the

dependencies must be present prior to starting operation.

A study analyzing the performance of KVM is presented in Section 3.3.2. It demon-

strated that large VM sizes would result in long startup times, high RAM and disk us-

age. Therefore, large VM sizes decrease the maximum number of parallel instances

that could be deployed. In comparison, Docker and LXD containers provide many

of the benefits of KVM in terms of isolation between instances, without the large draw-

backs, therefore making KVM less likely to meet the maximum memory usage require-

ments.

Moving on to the unikernels, each unikernel discussed has a few problems which pre-

vent their use in this project. OSv supports packaging Python software platforms as

long as they use Python 2. Since official Python 2 support ended on the 1st of Jan-

uary 2020, from a security and support standpoint it would be preferable to go with

another option, seeing as applications that can be packaged with OSv would not meet

the requirements for the support and project lifecycle.

UniK is the most promising of the unikernel-related tools, since it supports a relatively

modern version of Python (3.5) and uses a rather simple compilation procedure. How-

ever, the one major issue is that none of the platforms that the unikernel could be pack-

aged into are supported by Fog05 at the moment.

This narrows the number of choices down to two, namely Docker and LXD. Both options

are used to run and deploy containers. LXD is currently supported by Fog05 while

Docker support is in development. While they are both container based packaging

systems, there are differences between them. Docker containers share the networking

and storage stack with each other, while LXD containers isolate these aspects so that

each container has its own networking and storage stack. This is quite useful since

iDrOS makes heavy use of networking through features such as remote sensor access.

LXD isolating containers from a networking perspective allows for more flexibility with

regards to networking and communication. Furthermore, Docker support in Fog05 is

currently experimental, while LXD has first-class support. This was the reason LXD

was chosen for packaging iDrOS and the applications running on iDrOS.

3.3.5 Implementation

After choosing LXD as an adequate software packaging solution, the next step was to

develop a method of packaging iDrOS and the applications running on iDrOS into one file

that will be deployed using Fog05. To simplify this, a script was implemented which is

composed of the following steps:

1. Obtain and run a base Ubuntu LXD container using a base image.

2. Install dependencies inside the container.

3. Copy iDrOS and application files into the container.

4. Configure permissions inside the container.

5. Enable iDrOS bootup service inside the container.

6. Stop the container.

7. Export the container image as a tar.gz file.

By using this script, an image containing iDrOS and the application running on iDrOS

can be generated with one command, making it very simple to regenerate if changes to

iDrOS or its application were made.
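The sketch below condenses these steps into a short Python driver around the LXD command-line client. The container name, base image version, file paths, and service name are assumptions used for illustration; the actual script is specific to iDrOS and its dependencies.

import subprocess

CONTAINER = "idros-build"    # hypothetical container and image names
IMAGE_ALIAS = "idros-image"

def run(*cmd):
    subprocess.run(cmd, check=True)

# 1. Obtain and run a base Ubuntu LXD container.
run("lxc", "launch", "ubuntu:20.04", CONTAINER)
# 2. Install dependencies inside the container.
run("lxc", "exec", CONTAINER, "--", "apt-get", "install", "-y", "python3-pip")
# 3. Copy iDrOS and application files into the container.
run("lxc", "file", "push", "-r", "./idros", CONTAINER + "/opt/")
# 4. Configure permissions inside the container.
run("lxc", "exec", CONTAINER, "--", "chown", "-R", "root:root", "/opt/idros")
# 5. Enable the iDrOS boot-up service inside the container.
run("lxc", "exec", CONTAINER, "--", "systemctl", "enable", "idros.service")
# 6. Stop the container.
run("lxc", "stop", CONTAINER)
# 7. Publish the stopped container as an image and export it as a tarball.
run("lxc", "publish", CONTAINER, "--alias", IMAGE_ALIAS)
run("lxc", "image", "export", IMAGE_ALIAS, "idros-image")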

In addition to iDrOS and the applications running on iDrOS, this image includes a ser-

vice which was implemented to allow iDrOS and its application to run whenever the

container boots up. This service was implemented using the systemd interface, which

simplifies the creation of services while providing flexibility in terms of powerful fea-

tures. This flexibility enabled features such as automatic restart on failure and logging

to the system output. While the implementation of the script and service is specific to

iDrOS, the architecture and steps taken are general enough to apply to any software

platform needing packaging.

Chapter 4

Edge Orchestrator - Design

4.1 Motivation

The reason to build the edge orchestrator is to provide developers who wish to integrate

edge computing into their applications with a platform that makes it significantly simpler

to do so. With this in mind, the orchestrator has been designed to be general but also

capable of supporting modular application specific customization. This customizabil-

ity was added to increase the number of possible applications that can be served by

the edge orchestrator, and to allow developers to program their own customizations to

make sure they get an optimized orchestration for their specific application.

4.2 Main Objectives

The design of the orchestrator has been shaped around three objectives that the or-

chestrator must achieve.

First, the orchestrator must capture and store data about the edge network. This data

includes the number of nodes in the edge network, network conditions, etc. Applica-

tion developers, who develop applications that are to be deployed using the edge or-

chestrator, should be able to add their own custom data that they need recorded.

Second, the orchestrator must utilize this data to make decisions on whether and how

to modify the current edge deployment. Those decisions take the form of actions to

instantiate an instance on a particular node, stop an instance that is currently on a

node, or migrate an instance from one node to another. The orchestrator must gen-

erate those actions to convert the current edge deployment to an optimized deploy-

ment. That optimized deployment is generated from analyzing the current data about

the edge network and also application specific data that has been added by the appli-

cation developers. Third, the orchestrator must be able to perform the conversion of

the current edge deployment to the optimized deployment that has been previously

generated. This means that the orchestrator must be able to interface with the edge

deployment platform (Fog05 in this case) to modify the current edge deployment to

match the intended deployment.

4.3 Data Capture

In this section, the first objective, which is capturing and storing data is discussed in

more detail. Data that provides information about the network itself (e.g., information

about a set of nodes that are online, a set of nodes that have an instance running and

the latency between nodes) is not application specific and thus needs to be obtained

regardless of which application is being deployed.

In contrast, other types of data that are application specific need to be gathered

only when the particular application is deployed. For example, when using iDrOS and

an application on top of iDrOS it would be important to gather data that provides in-

formation about GPS location and the battery level of the drone.

The capability to gather data that is specific for different purposes means that flexibility

and modularity would have to be inherent to the design. It means there should be a way

for the orchestrator to distinguish between data depending on its purpose. Data that is

on the application level would have to be defined in a standard way to allow developers

the ability to add and capture custom data as required by their application.

This modularity has been integrated in the form of an information grouping system

called a Metric. A Metric is a set of data that has been grouped together due to a

relation. This relation is up to developers to decide depending on their needs. This

grouping system allows for modularity by supporting the addition and removal of dif-

ferent metrics, which inherently corresponds to the addition and removal of different

data to be gathered.

Metrics can also be designed to obtain data through different means. For example,

drone battery percentage needs to be obtained through MAVLink messages. By group-

ing closely related data together into one set, the libraries required for obtaining the

data are similarly grouped and certain optimizations can be made (e.g., sending one

MAVLink message that obtains several pieces of data rather than sending one message

for each piece of data that must be obtained).
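To make the grouping idea concrete, the sketch below shows one hypothetical way a Metric could be expressed in code. The class and method names are illustrative assumptions and do not reproduce the orchestrator's actual interface.

from abc import ABC, abstractmethod

class Metric(ABC):
    """A named group of related data items that are collected together."""
    name: str

    @abstractmethod
    def collect(self) -> dict:
        """Gather and return the current values of this metric group."""

class DroneStatusMetric(Metric):
    """Groups drone-specific data (GPS position and battery) obtained from iDrOS."""
    name = "drone_status"

    def __init__(self, idros_client):
        # idros_client is assumed to expose a query_drone(action) method,
        # analogous to the client function sketched in Section 3.1.4.
        self.idros_client = idros_client

    def collect(self) -> dict:
        gps = self.idros_client.query_drone("get_gps")["payload"]
        battery = self.idros_client.query_drone("get_battery_remaining")["payload"]
        return {"lat": gps["lat"], "lon": gps["lon"],
                "battery_remaining": battery["battery_remaining"]}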

4.4 Data Generation

Some data is automatically generated, such as the drone battery percentage, which is

produced by the flight computer, as well as which nodes are online and which

nodes have an instance running. The latter two are automatically provided by the

Fog05 API. Other types of data must be generated through methods that have been

designed and implemented as part of this thesis. Two examples are latencies between

all the nodes in the network and the total amount of data that has been used by each

node, in up and down directions.

The last two examples of data types share one common characteristic, which is that

each node would have a unique set of values for this data, and thereby, this data must

be generated on each node independently. This necessitates that there must be a data

generation system running on each node that is constantly measuring and updating

this data. It also necessitates that this system must be able to communicate with the

orchestrator to report this data.

Two benefits of the Fog05 API are that it is both open-source and extensible which

means it could be extended to allow for the communication of this additional data nat-

urally because it already reports data that is specific to each node. As for the problem

of constant measuring and updating of the data, this could be solved through a service

which runs during the boot of each node. This service monitors and measures data

such as the two mentioned previously, latency and data usage. Since it is a service

which runs during boot, it is able to measure this data for the entire time a node is

powered on, until a shutdown or reboot, which would reset the values. The decision

to reset the values when power-cycling the node is because those measurements are

intended to be for the duration of the powered-on period for each node. This data is

updated periodically to ensure that it reflects the latest changes.

Note that data generation is not an objective of the orchestrator, because it lies beyond the orchestrator's responsibilities. Data generation is not part of the orchestrator but is instead composed of code that communicates with the orchestrator. The orchestrator's responsibility is to collect and store this data, not to generate it.

4.5 Optimization Strategies

In this section, the second objective, which is using the captured data tomake decisions

about the current deployment and how and whether it should be modified is discussed

in more detail. Those decisions will be made by first generating an intended edge de-

ployment by using the data available. This intended edge deployment is then compared

with the current deployment and the differences between them are used to generate ac-

tions that convert the current edge deployment to the intended edge deployment. In

the scenario that no differences were found, then no actions are generated. Otherwise,

one of the following actions can be made:

• Instantiate action that deploys an instance on the node that is specified. This

instance in our case would be a container containing iDrOS and the application

on top of it.

• Terminate action that stops and deletes the instance that is currently running

on the node that is specified.

• Migrate action that moves the instance running on the specified source node,

to the specified destination node.

Figure 4.5.1: An example of the decision making process

An example of the decision making process is shown in Figure 4.5.1. In this example,

we have the current deployment that has 4 nodes of which Node 1 (highlighted in red)


has an instance running. Let us assume that there has been a change in the conditions

that made Node 2 more desirable for the instance to be running on (e.g., due to improved latency of Node 2 compared with Node 1). The decision making process sees

this through the data that reports the improved latency of Node 2 compared with Node

1. Based on this, the intended deployment calculated by the decision making process

reflects this and shows that Node 2 should have the instance running, as demonstrated

on the right in Figure 4.5.1. The decisionmaking process then generates the action that

is required to convert the current deployment to the intended deployment, which is to

migrate the instance from Node 1 to Node 2.

Because this orchestrator is intended to be used by various different applications, and

the data being captured is dependent on what the application developer has required,

there is no standard algorithm that could be followed for determining how to make decisions and how to modify the deployment. This necessitates that the decision making

process should be modular and customizable so that different decision making strate-

gies could be implemented and used depending on the kind of application.

This modularization was achieved by designing a flexible template for building custom

decision making components and utilizing them. This template is called an Optimiza-

tion Strategy. AnOptimization Strategy is composed of twomain parts that can be cus-

tomized. The first part is the decision making logic, which is responsible for analyzing

the current deployment and generating a new deployment graph if deemed necessary.

This part can be customized as long as the input is the current deployment graph, and

the output is the intended deployment graph. The logic to determine the deployment

graph is where the flexibility of the decisionmaking process lies. The second part of the

Optimization Strategy template is the execution conditions. The execution conditions

determine whether an Optimization Strategy should run, i.e., should the Optimization

Strategy in question be used to generate the intended deployment graph, or should

another Optimization Strategy be used. This highlights an important feature of the

decision making process, which is support for multiple Optimization Strategies, with

the execution conditions and a priority list determining which strategy would be used.

Optimization Strategies would be ranked by the developer on a priority based ranking,

then the execution conditions are used so that the Optimization Strategy with the high-

est priority that meets its execution conditions would be the one used for generating

the intended deployment.


The reason to have those execution conditions is to provide some intelligence in decid-

ing which Optimization Strategies to run. The benefit of having multiple Optimization

strategies as well as execution conditions can be demonstrated in the following exam-

ple. Let us say that we have two Optimization Strategies, one which optimizes for com-

putation speed, and another which optimizes for battery life. If our drone had plenty

of battery life remaining, then the execution conditions for the battery optimization

would not be met, since battery optimization is not needed at the current battery level.

In this case, the computation speed optimization would run to maximize the compu-

tation speed. If the battery level drops to a certain low level, which is dependent on

what the user determines to be a low battery level, then the execution conditions for

the battery optimization would be met and the battery optimization would run instead

of the computation speed optimization to optimize for battery life. Therefore, having

execution conditions andmultiple Optimization Strategies allowed for amore dynamic

system that chooses the most appropriate Optimization Strategy for each specific pur-

pose and scenario.

4.6 Action Execution

In this section, the objective to execute the actions to convert the current deployment

to the intended deployment is discussed in more detail. To be able to execute those ac-

tions, there needs to be the capability to map those actions to actions understood

by the edge computing platform used, which in our case is Fog05.

There is a need for some form of verification to ensure that only valid actions are

executed. Let us assume that an action is generated on a certain node. Then, this

node disconnects from the network before the action has begun executing. The action

is guaranteed to fail as long as the node is still disconnected. Therefore, there needs

to be a verification process immediately before the beginning of the execution of each

action that verifies that the action is still valid.

There needs to be a procedure for shutting down the orchestrator that should ensure

that all instances on all nodes shut down gracefully. This is necessary because this or-

chestrator considers each session to be entirely self-contained. The session is defined

to be from when the orchestrator begins running to when the orchestrator shuts down.

Self-contained in this case means that any instances which have been initialized in a

session, must also be shut down before the session ends. The requirement that each


session is self-contained simplifies the process of obtaining data from particular nodes

and monitoring instances, since the orchestrator in a session has been responsible for

every instance which has been initialized. The procedure for shutting down the orches-

trator is based on the requirement to be self-contained and therefore the procedure is

to shut down all instances in the network before shutting down the orchestrator pro-

gram.

4.7 Hardware Requirements

The hardware requirements for the orchestrator are derived from the hardware re-

quirements for the edge deployment platform from Section 3.2.2. Those are the fol-

lowing specifications:

• 1.5GHz quad-core ARM Cortex-A72 SoC,

• 4GB of LPDDR4 SDRAM,

• micro-SD card support with up to 2TB of storage.


Chapter 5

Edge Orchestrator - Implementation

5.1 Architecture

In what follows we discuss the architecture of the orchestrator.

Figure 5.1.1: System Architecture of the orchestrator.

As shown in Figure 5.1.1, the orchestrator architecture is based on three modules: Sur-

veying, Analysis, and Execution. The Surveying Module is responsible for gathering

data from edge nodes in the network. The Analysis Module is responsible for utilizing

that data to analyze the current deployment graph and generate an optimized deploy-

ment graph. The Execution Module is responsible for turning the current deployment


graph into the optimized deployment graph. The deployment graph refers to a graph

of all the nodes in the edge network, which includes mobile robots and edge nodes.

Edges in this graph represent the distance between the nodes. The measure of dis-

tance is up to the application that will be deployed to determine. The 3 modules are

each running periodically so that updated information is constantly being factored into

the orchestration. It is important to mention that the Monitoring Module is an additional

module that is not strictly part of the orchestrator but rather runs on each node in the

network to gather important monitoring data that will be sent to the orchestrator. The

Monitoring Module will be explained in further detail in the next section.

There are two interfaces: Graph and Action. The Graph Interface is located between

the Surveying Module and the Analysis Module. The Action Interface is located be-

tween the Analysis Module and the Execution Module. These two interfaces improve

the flow of information between the three modules and they also perform conversions

of objects, since different modules use different types of objects to store data.

There are 3 custom objects that are in use by different modules. Those objects are

Metric, Network Graph, and Optimization Strategy. The Metric object is used by the

SurveyingModule and represents a group of related data. Updating and gathering that

data is the responsibility of the Metric object that is attached to that data. The Metric

object is based on the Metric information grouping system discussed in Chapter 4. Net-

work Graph is used by the Graph Interface and is designed to abstract themodification

of the deployment graph. Updating of the deployment graph and generation of new

deployment graph is the responsibility of the Network Graph object. The last object is

the Optimization Strategy which is utilized by the Analysis Module. An Optimization

Strategy would be responsible for implementing the optimization logic to optimize the

current deployment graph. Different Optimization Strategies may prioritize different

aspects to optimize. Optimization strategies may have execution conditions that de-

termine if the strategy should run.

Class diagrams for the different modules of the orchestrator are found in Appendix

A. The architecture of the orchestrator was chosen to maintain the modularity of dif-

ferent modules of the orchestration. For example, if the execution parameters were

to be modified, only the execution module would have to be reimplemented rather

than reimplementing the entire orchestrator. This applies to the other modules as well

meaning that only the module that is changing, and potentially the interface that the module is connected to, would have to be modified. By utilizing the 3 custom objects, i.e., Met-

ric, Network Graph and Optimization Strategy, the implementation specific properties

for those 3 objects can be abstracted from the modules. An example of where this is

used is the implementation of the Optimization Strategies which has been abstracted

from the other parts of the orchestrator. Consequently, a new Optimization Strategy

can easily be added to the system by simply implementing it and adding it to the list of

available Optimization Strategies.

The architecture of the orchestrator was based on maintaining a continuous flow of

data. The flow of data goes from all the Monitoring Modules on the edge nodes, to

the Metric object. Those Metric objects are collected by the Surveying Module and

passed to the Graph Interface. The Graph Interface uses those Metrics to update the

Network Graph object stored in the Analysis Module. The Analysis Module passes

the Network Graph object to the highest priority Optimization Strategy that meets its

execution conditions. The Optimization Strategy passes back an optimized intended

graph to the Analysis Module, which forwards it along with the current deployment graph

to the Action Interface. The Action Interface generates a list of Actions necessary to

convert the Network Graph to the optimized intended graph. This list of Actions is

passed to the Execution Module which executes those actions on the edge nodes that

require any changes.
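To make this flow concrete, a rough sketch of one pass through the pipeline is given below. The run_once helper, the way the module and interface objects are passed in, and the reading of the current deployment graph from the Graph Interface are assumptions made for illustration; the method names come from the listings later in this chapter.

def run_once(surveying_module, graph_interface, analysis_module, action_interface, execution_module):
    # Gather the latest data reported by the Monitoring Modules on the edge nodes.
    surveying_module.update_metrics()
    # Convert the Metrics into updates of the Network Graph used by the Analysis Module.
    graph_interface.update_analysis_module_information(surveying_module.metrics)
    # Let the highest-priority Optimization Strategy that meets its conditions produce an intended graph.
    current_graph = graph_interface.node_graph
    intended_graph = analysis_module.optimize_graph(current_graph)
    # Derive the actions needed to go from the current to the intended deployment and execute them.
    actions = action_interface.action_generator(current_graph, intended_graph)
    execution_module.execute_all_actions(actions)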

5.2 Monitoring Module

To be able to gather useful data about each node in the edge network, the Monitoring

Module was included in the architecture. The purpose of this module is to run in the

background in any node that is currently online, gathering data, and communicating

with other nodes in the same network. This module gathers two important types of data.

First, the network usage of every node (in both up and down directions) is gathered.

This is important information for the orchestrator so that it can restrict the network

traffic if the user has a limited data plan, i.e., a limit on the total amount of data transferred during a single session. Second, this module includes a

ping systemwhich checks which nodes are currently online, then sends a fixed number

of ping packets to each node, measuring the average amount of time taken to receive an

acknowledgement. This provides the orchestrator with information about the distance

between nodes in terms of latency.


When implementing the Monitoring Module the objective was to integrate it with the

Fog05 LinuxOS plugin. The OS plugin contains a function for compiling and reporting

the status of each node. By extending this function to utilize the Monitoring Module,

we were able to add the custom data that we require. This was done to promote code

reusability since the functionality to report the status was already implemented by the

Fog05 team. This injection can be seen in Listing 5.1.

ping = self.monitoring_module.ping_nodes()
status.update({'ping_nodes': ping})
bytes_received, bytes_sent = self.monitoring_module.network_usage()
status.update({'bytes_received': bytes_received})
status.update({'bytes_sent': bytes_sent})

Listing 5.1: Injecting custom statistics into the node status.

As Listing 5.1 shows, there are two key functions which obtain the necessary data. The

first one, ping_nodes, utilizes the pythonping library to ping all the online nodes and

obtain the ping information. A simplified implementation of the ping_nodes function

is shown in Listing 5.2. In this implementation, the function sends 5 packets of size 40

bytes each and measures the average round-trip time, i.e., the average time it takes for a packet to be reported as acknowledged.

response_list = ping(ip, verbose=False, count=5, size=40, timeout=2)
time_elapsed[x] = response_list.rtt_avg_ms

Listing 5.2: Function to ping all other nodes and obtain the time taken for each.

The other key function is the network_usage function, which is used to obtain the net-

work usage of each node. It reports both the amount of downloaded and uploaded

bytes. A simplified implementation of the network_usage function is shown in List-

ing 5.3. This function utilizes the psutil library to interface with the operating system

and obtain the network usage of the network adapter. The downlink network usage is

stored in the bytes_received variable while the uplink network usage is stored in the

bytes_sent variable.

net_info = psutil.net_io_counters(pernic=True)
bytes_received = net_info['ens160'].bytes_recv
bytes_sent = net_info['ens160'].bytes_sent

Listing 5.3: Function to obtain network usage.


5.3 Module Structure

The three modules running on the orchestrator (Surveying Module, Analysis Module,

and Execution Module) inherit the generic Module structure which was designed. This structure was designed to ensure periodicity and minimize coupling. Each module is composed of a periodic_loop function which is intended to run periodically at a predefined period. The purpose of this function depends on the design of the specific module.

The other function which should be implemented is start_threads which should in-

clude the implementation of how the periodic_loop can be made periodic. It also en-

sures the correct module boot sequence by ensuring that periodic_loop does not begin

until all the other modules are initialized and ready.
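A minimal sketch of what such a generic Module structure could look like is shown below. The threading details mirror Listing 5.5, but the class itself, the period attribute, and the use of abc are assumptions made for illustration.

import abc
import threading
import time


class Module(abc.ABC):
    # Hypothetical sketch of the generic Module structure.

    def __init__(self, period=5):
        self.period = period
        self.finished_first_iteration = False

    @abc.abstractmethod
    def periodic_loop(self, period):
        """Work to perform once per period; defined by each concrete module."""

    def start_threads(self):
        # Run the periodic loop on a daemon thread and wait until its first
        # iteration has completed before allowing the next module to start.
        thread = threading.Thread(target=self.periodic_loop, args=[self.period], daemon=True)
        thread.start()
        while not self.finished_first_iteration:
            time.sleep(0.1)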

5.4 Surveying Module

In order to make decisions, the orchestrator first needs to obtain the necessary infor-

mation. This is done through the Surveying Module. The Surveying Module obtains

various information reported by every node in the edge network. This information

gathering system utilizes objects called Metric that should be implemented by devel-

opers utilizing the orchestrator. Each Metric should correspond to a group of related

data. The Surveying Module manages those Metrics by adding the available ones to its

list of loaded Metrics and ensuring they are updated at appropriate times.

The Metric object specifies three functions that should be implemented by developers

when building their own custom Metrics. The first function is update_metric which

is intended to update all the data that is associated with that metric. This function is

expected to be called whenever the metric should be updated. The second function is

clean_metricwhich is intended for metrics that require clean up before shutting down

the orchestrator. Some metrics rely on connections that must be terminated before

shutting down the orchestrator. So the clean_metric function is used to facilitate a

clean and graceful shutdown of the orchestrator. The third function is log_metric,

which is intended to log the operation of the metric. This could consist of logging all

the data obtained by the metric every time it is updated, to provide timed logging data,

which is useful for debugging and record keeping.
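To illustrate this interface, a minimal sketch of a custom Metric base class is given below. Only the three function names come from the design above; the class layout, constructor, and logger are assumptions made for illustration.

import abc
import logging


class Metric(abc.ABC):
    # Hypothetical base class for the Metric interface described above.

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.logger = logging.getLogger(name)

    @abc.abstractmethod
    def update_metric(self):
        """Refresh all data associated with this metric."""

    def clean_metric(self):
        """Release connections or other resources before shutdown (no-op by default)."""

    def log_metric(self):
        """Log the currently stored data for debugging and record keeping."""
        self.logger.debug('%s: %s', self.name, self.data)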


The Surveying Module requires the implementation of five functions, two from the

Module structure and three specific to the Surveying Module. The first of the three specific functions is add_metric, which adds

a metric to the list of metrics that is stored in the surveying module. The second is

remove_metric, which removes a metric and should call the clean function for that

metric as well. The third is update_metrics, which is a function that looks at all the

metrics that are currently stored in the surveying module, and updates them using the

update_metric function that is part of each metric. This is considered updating all the

information that the Surveying Module obtains. The implementation of those three

functions is shown in Listing 5.4. The implementations are quite simple with the only

layer of complexity being checking to make sure the functions can be performed. For

example, the add_metric function checks that the metric being added is not already in the list of metrics before adding it. Metrics are stored in the form of a list

containing all the metrics that are a part of the surveying module. The periodic loop

updates the information available to all the metrics, by utilizing the update_metrics

function. Then, it logs each metric using the log_metric function associated with each

metric in the list. Finally, the list of metrics is sent to the graph interface for process-

ing.

def add_metric(self, metric):
    if metric not in self.metrics:
        self.metrics.append(metric)

def remove_metric(self, metric):
    metric.clean_metric()
    if metric in self.metrics:
        self.metrics.remove(metric)

def update_metrics(self):
    for metric in self.metrics:
        metric.update_metric()

Listing 5.4: Metric handling functions in Surveying Module.

The Surveying Module contains a periodic loop that updates the information period-

ically. This loop structure is a part of all 3 of the modules of the orchestrator and is

based on the Module structure. The implementation of the 2 functions required by the

Module Structure is shown in Listing 5.5. The Python Threading library is used to im-

plement the periodic loop by keeping it on a separate thread. This implementation is

the same for the other modules.


def start_threads(self):
    information_update_thread = threading.Thread(target=self.periodic_loop,
                                                 args=[5], daemon=True)
    information_update_thread.setDaemon(True)
    information_update_thread.start()
    while not self.finished_first_iteration:
        pass
    self.graph_interface.analysis_module.start_threads()

def periodic_loop(self, period):
    while True:
        self.update_metrics()
        for metric in self.metrics:
            metric.log_metric()
        self.graph_interface.update_analysis_module_information(self.metrics)
        time.sleep(period)

Listing 5.5: General module functions for the Surveying Module.

5.4.1 Implemented Metrics

Two distinct metrics have been implemented, Deployment and Drone. Deployment

is concerned with the deployment of the network such as which nodes are online and

how they are connected. Drone is concerned with data specific to the drone, which is not relevant for other nodes, such as GPS location and remaining battery percentage.

A full list of the obtained data for each metric is below:

1. Deployment Metric

(a) List of nodes which are currently online.

(b) Node status which includes the following:

i. node ID,

ii. total and free Random-Access Memory (RAM),

iii. total and free disk space,

iv. ping information,


v. network usage (up and down directions).

(c) Node info which includes the following:

i. node ID,

ii. node name,

iii. operating system the node runs on,

iv. CPU model, frequency and architecture,

v. RAM size,

vi. disks sizes,

vii. network interfaces.

(d) List of nodes with active instances deployed.

2. Drone Metric

(a) Battery percentage,

(b) antenna signal strength,

(c) current GPS location of drone (latitude and longitude).

Two functions (get_status and get_info) that obtain the status and information for all

the online nodes have been implemented. The implementation of the get_status func-

tion is shown in Listing 5.6, which is nearly identical to the implementation of the

get_info function. The get_status function utilizes the function provided by the Fog05

API to get the status of a single node, and simply calls this function for each node, re-

turning a dictionary of the corresponding status with the corresponding Node ID as

the key.

def get_status(self):
    status = dict()
    for node in self.online_nodes:
        status[node] = self.api.node.status(node)
    return status

Listing 5.6: Function to obtain status for all online nodes.

The deployment metric contains a function to return nodes with active instances, which determines whether a node has an instance or not (i.e., whether it is considered active).


This function utilizes the Fog05 API to obtain all the instances in the network, then

finds the Node ID where that instance is contained. The Node ID of each node with

an instance and the Instance ID of each instance are both returned by this function.

The implementation is shown in Listing 5.7. This implementation works by scanning

through all the nodes in the network and then checking how many instances are on

each node. Nodes with at least one instance are added to the active nodes list.

for node in self.scan_nodes():
    fdus[node] = self.api.fdu.list_node(node)
    if len(fdus[node]) > 0:
        instances[node] = self.api.fdu.instance_list(fdus[node][0])[node][0]

Listing 5.7: Function to obtain active nodes.

To obtain the GPS coordinates of the drone location and the battery percentage of

the drone, a socket server which has been integrated into the iDrOS is utilized. This

socket server receives commands, handles them, and returns the appropriate data from

within iDrOS. To communicate with this server, the update_drone_data function uti-

lizes the socket library to send a command (in the form of a JSON) to get the GPS coor-

dinates and battery percentage and then waits for the response containing this data in the form of a JSON. The implementation of the update_drone_data function is demonstrated in a sequence diagram in Figure 5.4.1.

Figure 5.4.1: Process for obtaining GPS location and battery percentage for drone.
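A minimal sketch of how such a request could be issued is shown below. The address, port, and command name are assumptions made for illustration; the actual message format used by the iDrOS socket server is not specified here.

import json
import socket

# Hypothetical address and command; the real values depend on the iDrOS configuration.
IDROS_ADDRESS = ('10.0.0.2', 5555)

def update_drone_data():
    # Send a JSON command to the iDrOS socket server and wait for the JSON reply,
    # which is assumed to contain the GPS coordinates and the battery percentage.
    with socket.create_connection(IDROS_ADDRESS, timeout=2) as connection:
        connection.sendall(json.dumps({'command': 'get_drone_data'}).encode())
        reply = connection.recv(4096)
    return json.loads(reply)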


A sequence diagram describing the data acquisition process of both metrics in more

general terms is shown in Figure 5.4.2. It includes both the Monitoring Modules present on every node and the LXD container running iDrOS and the application running on top of iDrOS. This sequence diagram assumes there are two nodes, with one of them being

a drone node and the other being an edge node.

Figure 5.4.2: Data acquisition process between Surveying Module and MonitoringModules.

5.5 Graph Interface

The Graph Interface is used by the Surveying Module to convert and save the metrics in a form that can be understood by the Analysis Module. The Analysis Module stores all the data in the form of a graph, with graph nodes representing edge nodes or drones and graph edges representing the interconnections between them. If an edge exists between two nodes, then the two nodes are connected and accessible from one another. If a graph node exists, then there is an edge node or drone running Fog05 with an identifier that corresponds to that graph node.

To perform the conversion from metrics to a graph, a custom object called Network

Graph has been designed to enable and simplify this conversion. The Network Graph

object is designed to abstract the modification of the deployment graph. There are

two vital functions and three helper functions. The first vital function is the up-

date_nodes_graph, which creates and updates the nodes in the network deployment

graph. This function utilizes the NetworkX library, which is the library used by the

Analysis Module for storing edge network information. The NetworkX library was

used for its flexibility by allowing any arbitrary type of object to be defined as either

a node or edge. A simplified version of the update_nodes_graph function is shown in

Listing 5.8. The ID of each node is considered the node object. The node attributes

are the various information about the node and they are stored using a key/value pair.

For example, Listing 5.8 shows how the node attribute bytes_received is used to store the number of bytes a node has received. It is also seen how the add_node func-

tion is used to create the node. Note that Listing 5.8 shows a simplified version of the

code to highlight the important parts and avoid repetition, making it easier to compre-

hend.

for node in deployment.online_nodes:
    if not self.node_graph.has_node(node):
        self.node_graph.add_node(node)
    self.node_graph.nodes[node]['bytes_received'] = deployment.status[node]["bytes_received"]

Listing 5.8: Function to update node graphs.

The update_nodes_graph function only updates the nodes of the graph. The second

function is used to update the edges and is called populate_ping_edges. This function

looks at all the nodes in the graph populated by the previous function, and adds the

latencies between them, which represent the edges in the graph. Another way to think

about this is that the latencies represent the distance between one node and any other

node. Those distances are inserted as edge attributes. A simplified version of the popu-

late_ping_edges function is shown in Listing 5.9. This implementation scans through

all the different nodes, and adds the appropriate edges between each node pair. Those

edges come from the latencies between each pair of nodes, as recorded by the Monitoring Modules running on each node.

for source_node in deployment.online_nodes:
    for destination_node in deployment.status[source_node]["ping_nodes"]:
        if source_node != destination_node:
            self.node_graph.add_edge(source_node, destination_node,
                                     ping_time=deployment.status[source_node]["ping_nodes"][destination_node])

Listing 5.9: Function to update node edges.

The specifics of graph generation have been abstracted to the Network Graph object,

which implies that the Graph Interface can be simplified. To achieve this, we need to

implement only one function that simply updates the information the Analysis Module

has based on the updated Metrics from the Surveying Module. This function utilizes

the Network Graph functions, update_nodes_graph and populate_graph_edges, to

update the nodes and populate the edges of the graph in the analysis module and its

simplified implementation is shown in Listing 5.10.

def update_analysis_module_information(self, metrics: []):
    self.node_graph.update_nodes_graph(metrics)
    self.node_graph.populate_graph_edges(metrics)

Listing 5.10: Graph Interface function.

5.6 Analysis Module

The main function of the Analysis Module is to use the information about all the nodes (i.e., the network graph) to generate a list of actions that achieve an optimized deployment. This optimization is achieved by utilizing Optimization Strategies. An Optimization Strategy is composed of two functions, check_conditions and generate_intended_graph. check_conditions checks whether the current deployment fulfills a set of conditions that determine whether the strategy should run. Those conditions are dependent on the Optimization Strategy and its implementation. If the conditions are met, i.e., the function returns True, then generate_intended_graph is used to optimize the current graph and generate a new graph of the intended deployment.
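A minimal sketch of the Optimization Strategy interface, with the two functions above as abstract methods, is given below; the class layout and the use of NetworkX graph types in the signatures are assumptions made for illustration.

import abc

import networkx as nx


class OptimizationStrategy(abc.ABC):
    # Hypothetical sketch of the Optimization Strategy interface.

    @abc.abstractmethod
    def check_conditions(self, current_graph: nx.Graph) -> bool:
        """Return True if this strategy should run for the current deployment."""

    @abc.abstractmethod
    def generate_intended_graph(self, current_graph: nx.Graph) -> nx.Graph:
        """Return the intended deployment graph derived from the current one."""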

Similarly to the Surveying Module, the Analysis Module manages the available Optimization Strategies. It organizes them in terms of priority, and as the outcome the strategy with the highest priority that meets its conditions will run. The intended graph generated by that Optimization Strategy will be passed on to the Action Interface for processing.

Listing 5.11 shows the implementations of the specific functions required to be imple-

mented in the analysis module. In this case, the Optimization Strategies are stored in

an ordered list. The strategy with the lowest index is considered of highest priority,

and thus its conditions are checked first. The strategy with the highest index is consid-

ered of lowest priority and is considered the default strategy that runs when all other

strategies do not meet the required conditions. Therefore, it is required for the strat-

egy with the highest index (lowest priority) to have no conditions but instead always

return True and execute if no other strategy executed first.

def add_optimization_strategy(self, strategy: OptimizationStrategy):
    if strategy not in self.optimization_strategies:
        self.optimization_strategies.append(strategy)

def remove_optimization_strategy(self, strategy: OptimizationStrategy):
    if strategy in self.optimization_strategies:
        self.optimization_strategies.remove(strategy)

def optimize_graph(self, current_graph: NetworkXGraph) -> NetworkXGraph:
    for strategy in self.optimization_strategies:
        if strategy.check_conditions(current_graph):
            return strategy.generate_intended_graph(current_graph)

Listing 5.11: Necessary functions for analysis module.

5.6.1 Implemented Optimization Strategies

There have been four Optimization Strategies designed with three being implemented.

It is important to note that these optimization strategies are examples that have been

implemented to demonstrate the capability of the orchestrator to have multiple Op-

timization Strategies and demonstrate how custom Optimization Strategies could be

built. So they are intended to be a demonstration of the customizability and modular-

ity of the orchestrator. The four Optimization Strategies are shown here:

1. Network Latency: Optimizing in terms of latency utilizes a specific algorithm

called the closeness centrality algorithm. This algorithm computes the shortest-path distance between every pair of nodes, with the distance corresponding to the latency between the nodes. The highest number indicates the highest centrality, which means the nodes are as close to each other as possible. The centrality for multiple

different possible deployments is computed and the highest is considered the

most optimized in terms of latency. A threshold is used to decide whether or not

the deployment needs to be updated. This threshold is modifiable and can be

used for tuning the sensitivity of the algorithm.

2. Network reliability: This Optimization Strategy should utilize micro-migration of all offloaded functions back to the drone node if poor network conditions are detected. This means that if the network was completely lost, all the functions are

available locally on the drone and can continue executing albeit at a slower rate

than when using edge nodes. The implementation of this Optimization Strategy

will not be completed as part of this thesis due to the implementation of micro-

migration of functions being beyond the scope of this thesis.

3. Network consumption: To optimize for network consumption, the strategy is

to terminate all nodes with the exception of the drone node, which must remain

online. This results in an offline system, with iDrOS and the application on top of it running only on the drone node. This achieves the lowest possible

amount of network consumption, since the network is only used by the Surveying

Module for gathering data.

4. Battery optimization: To optimize for battery consumption, the strategy is to

keep the same deployment. The reason for this is that instantiation, migration,

and termination of instances are considered energy intensive processes, since

they utilize a large amount of CPU cycles. Therefore, by keeping the same de-

ployment, nodes could still be utilized for offloading if they are active. If none

are active, then the energy savings from offloading may not counteract the energy

consumption required for instantiation and migration.

The Analysis Module implementation ranks the Optimization Strategies in a particular

order, which can be easily changed by changing the order in which they are loaded in.

It is important to note that this is a recommended order and that the user is free to

change this order by merely changing the indices of the Optimization Strategies in the

list. The strategies are ranked in terms of importance for keeping the mission active in

the following manner:

1. Drone battery conservation: This was ranked first because if the drone ran

out of battery the mission would fail and there is a reasonable chance that the


drone may be lost. Therefore, it is extremely vital that if the battery levels are

low, battery conservation must preserve the remaining battery at all costs. The condition for

this strategy to run is that the drone battery percentage drops below X%.

2. Network reliability: Poor network conditions are regularly experienced by

drones, as shown in Section 2.2.2, and therefore it becomes important to onload

the data if conditions worsen, to ensure that the data is not lost and can be used

by the drone in the case of complete network failure. Onload refers to transfer-

ring necessary data from edge nodes to the drone node that this data is associated

with. The conditions for this strategy to run are that there are fewer than X mobile

antennas within range and the antenna signal is below Y.

3. Network consumption: This conservation mode assumes the user has a lim-

ited data plan. In such a case that the limit is being approached, this mode re-

duces data consumption to avoid the scenario where the user must incur additional charges. However, this mode was ranked lower than the first two

because we believe that the dangers associated with a depleted battery (losing the

drone andmission) and unaccounted for loss of network (data loss) far outweigh

the cost of additional charges associated with exceeding the data plan. However

this would be dependent on the user and therefore could be easily modified. The condition for this strategy to run is that the sum of the upload and download network consumption of a drone is above X.

4. Latency optimization: This was ranked last since poor latency optimization

does not typically endanger the mission, but rather degrades the performance.

As a result, this should be considered the default mode of operation provided

there are no issues affecting the safety of the drone and the status of the mission.

There are no conditions for this strategy to run since it is the default strategy when none of the other ones run, so its execution condition always returns True in this case.

The implementation of the Latency Optimization strategy’s graph generation will now

be explained. First, the strategy checks whether there are any active nodes. If not then

it returns a deployment graph with all drone nodes being active. Otherwise, the strat-

egy calculates the closeness centrality C(u) for each node in the current deployment

graph using the following formula:


C(u) = \frac{n-1}{N-1} \cdot \frac{n-1}{\sum_{v=1}^{n-1} d(v, u)},

where n refers to the number of reachable nodes from node u, N refers to the number

of total nodes in the graph, and d(v,u) refers to the latency between node u and node

v.

After calculating the closeness centrality for each node, the average closeness centrality

A for the entire deployment can be expressed as

A = \frac{\sum_{u=1}^{N} C(u)}{N},
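Since the Analysis Module stores the deployment as a NetworkX graph with the latency stored in the ping_time edge attribute, the per-node and average closeness centrality can be computed along the following lines. This is only a sketch of the calculation, not the exact code of the strategy.

import networkx as nx

def average_closeness(graph):
    # Closeness centrality per node, using the measured latency as the distance;
    # wf_improved=True applies the (n-1)/(N-1) scaling used in the formula above.
    centrality = nx.closeness_centrality(graph, distance='ping_time', wf_improved=True)
    return sum(centrality.values()) / graph.number_of_nodes()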

Afterwards, the function generates deployment graphs with the updated sets of nodes

that have instances running on them. A number of those graphs are generated and the

average closeness centrality is calculated for each one. The generated

graphwith the highest average closeness centrality is compared with the average close-

ness centrality for the current deployment graph. If the average closeness centrality for

the generated graph is higher than the one for the current deployment graph by a cer-

tain percentage, then the graph returned is the generated graph, which means there

will be some changes made to the deployment graph. Otherwise, the current deploy-

ment graph is returned, which means that no changes will be made. This percentage

determines the willingness of the strategy to modify the current deployment graph. A

higher percentage means it is less likely to modify the graph since a larger increase in

average closeness centrality is required for changes to be made. A pseudocode that is

based on the actual code of this optimization strategy but simplified is shown in List-

ing 5.12.

centrality_current = closeness_centrality(graph_current)
average_centrality_current = sum(centrality_current) / len(graph_current.nodes)

for i in range(iterations):
    if i == 0:
        graphs_generated.append(network_generator(graph_current, "ALL_ACTIVE"))
    elif i == 1:
        graphs_generated.append(network_generator(graph_current, "ALL_INACTIVE"))
    elif i == 2:
        graphs_generated.append(network_generator(graph_current, "HALF_ACTIVE"))
    else:
        graphs_generated.append(network_generator(graph_current, "RANDOM"))

    centrality_generated = closeness_centrality(graphs_generated[i])
    average_centrality_generated.append(sum(centrality_generated) / len(graphs_generated[i].nodes))

max_centrality = max(average_centrality_generated)

if max_centrality > threshold * average_centrality_current:
    return generated_graph_with_max_centrality
else:
    return graph_current

Listing 5.12: Pseudocode for Latency Optimization strategy.

5.7 Action Interface

The Execution Module utilizes the concept of an Action, which is composed of the fol-

lowing attributes:

1. Action type:

(a) migrate an instance from one node to another,

(b) instantiate an instance at a node,

(c) terminate an instance.

2. Source node: the node where the instance is currently or will be located if instan-

tiating.

3. Destination node: the node the instance will be migrated to (only applicable for

migrations).
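A minimal sketch of such an Action object, consistent with the attributes listed above and with the Action.ActionTypes usage in the Execution Module (Listing 5.14), is shown below; the use of an enum and the exact constructor are assumptions made for illustration.

import enum


class Action:
    # Hypothetical representation of an orchestration action.

    class ActionTypes(enum.Enum):
        MIGRATE = 'migrate'
        INSTANTIATE = 'instantiate'
        TERMINATE = 'terminate'

    def __init__(self, action_type, source, destination=None):
        self.action_type = action_type    # one of the ActionTypes above
        self.source = source              # node where the instance is (or will be) located
        self.destination = destination    # only used for migrations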

The purpose of the Action Interface is to pass and convert data from the Analysis Module to the Execution Module. Since the Analysis Module generates an intended graph, the Action Interface utilizes the current graph and the intended graph to generate a list of actions that will be needed to convert the current graph to the intended graph. This is done through the action_generator function.

The action_generator function takes in the current deployment graph and the in-

tended deployment graph. For both graphs, the action_generator function checks

whether or not the nodes have instances running. Nodes with and without instances run-

ning are called active and inactive nodes, respectively. For the nodes that undergo a

change between the two deployment graphs, i.e., from inactive to active or active to

inactive, a list of actions will be generated. First, migrations are generated for pairs of

nodes in which one must turn inactive and another must turn active. Once there are

no pairs left, actions to instantiate instances are generated for nodes that are currently

inactive but should be active. Actions to terminate instances are generated for nodes

that are currently active but should be inactive. Those actions are combined in a list and

sent to the Execution Module.
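A simplified sketch of this pairing logic is given below. It reuses the hypothetical Action class sketched earlier in this section and assumes that each graph node carries a boolean active attribute; both are assumptions made for illustration.

def action_generator(current_graph, intended_graph):
    # Both graphs are assumed to contain the same nodes, each with an 'active'
    # attribute marking whether an instance is running on it.
    to_deactivate = [n for n in current_graph.nodes
                     if current_graph.nodes[n].get('active')
                     and not intended_graph.nodes[n].get('active')]
    to_activate = [n for n in intended_graph.nodes
                   if intended_graph.nodes[n].get('active')
                   and not current_graph.nodes[n].get('active')]

    actions = []
    # Pair nodes first: one migration replaces a terminate plus an instantiate.
    while to_deactivate and to_activate:
        source = to_deactivate.pop()
        destination = to_activate.pop()
        actions.append(Action(Action.ActionTypes.MIGRATE, source, destination))
    # Remaining nodes only need a single action each.
    actions += [Action(Action.ActionTypes.INSTANTIATE, n) for n in to_activate]
    actions += [Action(Action.ActionTypes.TERMINATE, n) for n in to_deactivate]
    return actions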

5.8 Execution Module

The purpose of the Execution Module is to execute the list of actions previously gener-

ated by the Action Interface. The actions generated are intended to convert the current

deployment graph, to the intended deployment graph generated by the chosen Opti-

mization Strategy. This is done by utilizing the Fog05 API to execute its actions, since

those actions are directly based on functionality implemented in the Fog05 API.

There are three functions in the Execution Module which map to corresponding func-

tions in the Fog05 API:

• instantiate_instancemaps to the instantiate function in the Fog05 API,

• terminate_instancemaps to the terminate function in the Fog05 API, and

• migrate_instancemaps to themigrate function in the Fog05 API.

The implementation of the terminate_instance function is shown in Listing 5.13. This

function first ensures that the node is currently active, before actually attempting to

terminate it. This increases the reliability of the orchestrator. Consider the following situation: a sudden connection loss results in a node no longer being active, but an action to terminate it has already been generated. The terminate action would still attempt to execute

despite no longer being needed or even possible. By ensuring the node is active first,


the situation mentioned will not result in any problems. Performance measuring is in-

tegrated into this function, bymeasuring the time taken for each action (using the time

library) and logging it (using the logging library). Similar implementations are done

for the instantiate_instance andmigrate_instance functions.

def terminate_instance(self, node: str):
    error = 0
    if node in self.active_nodes:
        start = time.time()
        self.api.fdu.terminate(self.active_nodes[node])
        end = time.time()
        self.logger.debug('Instance terminated: {}'.format(self.active_nodes[node]))
        self.logger.debug('Termination time taken: {}'.format(end - start))
    else:
        self.logger.error('Node: {} is not active, cannot be terminated'.format(node))
        error = -1
    return error

Listing 5.13: Function to terminate an instance.

Another vital function is the execute_action function which maps the Action object

to a particular function such as mapping actions of type TERMINATE to the termi-

nate_instance function. A simplified version of the execute_action function is shown

in Listing 5.14. This function is composed of IF statements that check the action types

of the action received, andmap those actions to the corresponding function (e.g., map-

ping the action type TERMINATE to the terminate_instance function).

def execute_action(self, action: Action):
    if action.action_type == Action.ActionTypes.MIGRATE:
        inst_info = self.migrate_instance(action.source, action.destination)
        return inst_info
    elif action.action_type == Action.ActionTypes.INSTANTIATE:
        error = self.instantiate_instance(action.source)
    elif action.action_type == Action.ActionTypes.TERMINATE:
        error = self.terminate_instance(action.source)
    return error

Listing 5.14: Execute action function.


Another important function is the execute_all_actions, which goes through the list of

actions generated by the Action Interface and executes each one using the execute_action function. In the current implementation, the order in which the actions are executed is not relevant. By the end of the execution of execute_all_actions, the deployment graph should match the deployment graph that was

generated by the chosen Optimization Strategy.

The terminate_all_instances function is intended to be used when the orchestrator

must shutdown. This function should be called before the program exits to make sure

no more instances are active by terminating the ones that are.
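A sketch of how these two functions could be built on top of execute_action and terminate_instance is shown below; as in the other listings, they are shown as methods of the Execution Module, and the bookkeeping through self.active_nodes is an assumption based on its use in Listing 5.13.

def execute_all_actions(self, actions):
    # Execute every action generated by the Action Interface; the order is not significant.
    for action in actions:
        self.execute_action(action)

def terminate_all_instances(self):
    # Terminate every active instance so that the session remains self-contained.
    for node in list(self.active_nodes):
        self.terminate_instance(node)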


Chapter 6

Results and Analysis

6.1 Software Packaging

The software packaging solution was used to package iDrOS to one file, which can be

easily deployed on the edge. This evaluation compares the software packaging solution that has been chosen, LXD, against the most widely used software packaging solution, Docker containers.

6.1.1 Setup and Implementation

iDrOS will be packaged using both Docker and LXD. The Docker implementation will

be based on a Dockerfile and a Docker image will be produced. Two Docker implemen-

tations will be performed, the first one will be using a base Ubuntu image which is the

same image used by the LXD implementation. The second one will use a Python im-

age which is an image that has Python preinstalled already. This kind of image is not

available for LXD and is instead made possible by the large developer community built

around Docker.

The LXD implementation has been described in Section 3.3; in essence, it relies on a script to generate an LXD image.

The measurements will be run on the same machine to ensure that the conditions are

as similar as possible. The measurements will be taken on a Linux VM running on a

VMware ESXi server. The VM has 2 vCPUs and 2GB of RAM and the server has the


following specifications:

• 4 CPUs x Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz,

• 64GB of RAM, and

• 10.91TB of storage.

The previously mentioned setup was used to take the following measurements:

• The image size demonstrates the scalability of the solution, the memory over-

head and impact on migration times.

• The initialization time demonstrates the performance overhead and expected

downtime when booting a new container.

• The image generation time demonstrates the feasibility of adding functional-

ity to dynamically create new images on the fly, which would be particularly useful with a more modularized version of iDrOS that can package different parts separately.

Note that this modularized version has not been implemented but is merely a po-

tential next step for iDrOS in future work. The image is generated on the same

VM that is used for the previous measurements.

The previous measurements will be taken for the following configurations of the soft-

ware packaging solutions:

• LXD Ubuntu image: This is an LXD image based on a base Ubuntu 18.04

image from LXD servers. This is the image actually used throughout the project.

• Docker Ubuntu Image: This is a Docker image based on a base Ubuntu 18.04

from the Docker server. This is a near identical baseline to compare the LXD

image with (with the only difference being the software packaging solution).

• Docker Python Image: This is a Docker image designed specifically for Python

3.7 software. As a result, it includes Python 3 and 2 preinstalled and is based on

Debian Buster. This is not a direct comparison to the LXD image but presents

one of the strengths of Docker which is its large ecosystem of application specific

images. It should be noted that this can be replicated for LXD. However, the

Python image would have to be custom made by us whereas it is an officially

supported image for Docker.


6.1.2 Results and Analysis

Table 6.1.1: Results for software packaging evaluations

                              LXD (current       Docker (baseline)   Docker Python
                              implementation)    Ubuntu image        image
Image size (MB)               561.9              862.4               704.3
Initialization time (s)       2.498              6.343               4.553
Image generation time (s)     691.03             465.17              357.37

Table 6.1.1 shows that the LXD implementation generates the image with the smallest

size at 561.9MB, followed by the Docker Python image at 704.3MB, and lastly the Docker Ubuntu image at 862.4MB. Since images may be downloaded from an external server, lower image sizes are highly desirable since they lower the file transfer times during

migration, which is especially important when experiencing poor network conditions.

Additionally, smaller images typically require less time to unpack and deploy, which

reduces the time it takes to deploy an image.

Table 6.1.1 shows that the LXD implementation experiences the lowest initialization

time at around 2.5 seconds, followed by the Docker Python image at 4.5 seconds, and

the Docker Ubuntu image at 6.3 seconds. This is the time it takes for iDrOS and the

application running on top of iDrOS to set up and reach a stable state (i.e., deploy its

control server) once the container is running. Therefore, it is considered a measure of

the execution performance of the packaging runtime. The LXD implementation having

the lowest time is a good indication of its execution performance. The Docker Python

image experienced better execution time than the Ubuntu image, which may be due to

its more lightweight nature, as indicated by its smaller image size.

Table 6.1.1 shows that the LXD image takes the longest time to generate an image at 691

seconds. This is followed by the Docker Ubuntu image at 465 seconds and the Docker

Python image at 357 seconds. The Docker Python image takes the least amount of

time because it already comes preloaded with the Python libraries, which otherwise

must be installed during image generation. As a result, a significant amount of time

is saved by the Docker Python image. This process was particularly slow for the LXD

implementation and contributed to its long image generation time.


To summarize, the results show that LXD has better runtime performance so it can

be concluded that LXD is a better choice than Docker when runtime performance is

of priority. This evaluation also indicates that there may be a trade-off between the

image size and the image generation time. This is indicated by the LXD images be-

ing the smallest but requiring the most time to generate, if we compare both Ubuntu

images.

6.2 Optimization Strategies

An optimization strategy generates an optimized configuration of the edge deployment

based on a particular parameter. In this section, we compare the Closeness Optimiza-

tion strategy proposed in this thesis with the Edge Scheduling Strategy (ESS) proposed

in [37]. Both optimization strategies focus on minimizing latency. However, in order

to allow for a fair comparison of the performance achieved by the two strategies, the

ESS was reimplemented for the purpose of this project. The details concerning the

implementation of the ESS are discussed in the following section.

6.2.1 Edge Scheduling Strategy (ESS)

This strategy is composed of multiple components. The first component is the Classi-

fication Component [37]. The classification component is responsible for sorting scal-

ing requests accordingly. Scaling requests are requests for resources in an edge net-

work. This component has not been adapted since the orchestrator built in this thesis

is intended to be autonomous, and thus the concept of manual scaling does not apply

here.

The second component of the strategy is the Edge Scheduling which is responsible for

selecting on which node to deploy a container. This is analogous to the concept of an

Optimization Strategy from the orchestrator and will be adapted and compared to the

Latency Optimization strategy. Edge Scheduling uses the following procedure to

pick the node onto which to deploy a container next. Firstly, the latencies between

nodes in the deployment network are sorted in non-descending order in a list. Then,

a list of nodes in the network is iterated through. If the current node in the iteration

(Node A) has the resources to deploy the container then the container is deployed on it.

Otherwise, the list of latencies is iterated through, and the node with the closest latency


to Node A that has the resources to deploy the container, would be the node where the

container is deployed. Since the latency list has been sorted, iterating through should

give the node with the closest latency to Node A. Each run of the algorithm generates

at most one change in the deployment graph, which corresponds to which node the

container should be deployed on. If the process completes without a node being found,

then an error message indicating that there is no node available that has the resources

to deploy the container is shown and the request for resources is rejected.
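To make the adapted procedure concrete, a simplified sketch of the node selection step is given below. It is based on the description above rather than on the original code from [37], and the data structures (a latency dictionary keyed by node pairs and a set of active nodes) are assumptions made for illustration.

def select_node(nodes, latencies, active_nodes):
    # nodes:        list of node IDs in the network
    # latencies:    dict mapping (node_a, node_b) pairs to the measured latency
    # active_nodes: set of nodes that already run a container (adapted resource check)
    sorted_pairs = sorted(latencies.items(), key=lambda item: item[1])  # non-descending order
    for node_a in nodes:
        if node_a not in active_nodes:
            return node_a                     # Node A itself has the resources
        # Otherwise, pick the node closest to Node A that still has capacity.
        for (source, destination), _latency in sorted_pairs:
            if source == node_a and destination not in active_nodes:
                return destination
    return None                               # no node available: the request is rejected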

In order to use ESS as a baseline for comparison, I had to perform two modifications.

The first modification is related to the Classification Component and it was needed

because the orchestrator is intended to be autonomous, which means that the idea

of scaling requests does not apply anymore. The modification is that the number of

containers to be deployed is equal to the number of nodes available in the network. The

second modification is related to the amount of resources each node could handle. In

particular, since in our system each node has atmost one container running, we needed

to take into consideration whether or not the resources on a node are available when

deploying a new container. Consequently, in our system, a container can be deployed

on a node only if the node has no other container running.

6.2.2 Setup

The setup followed for the evaluation utilizes the same physical machine as the eval-

uation for the software packaging solutions. In this case, however, the setup is com-

posed of 9 VMs. One VM represents the image server, while the other eight VMs are

intended to represent either a drone node or an edge node. An image server is a server

which contains LXD images of iDrOSmaking themaccessible to every other node in the

network. Any node can download the iDrOS images from the image server. The distri-

bution between drone nodes and edge nodes is different throughout different parts of

the evaluation.

During the evaluation, I used two types of nodes: drone nodes, which are intended to simulate drones and therefore experience worse network conditions, and edge nodes, which have better network conditions. I used throughput and time delay as the main performance metrics for characterizing the network conditions. Throughput here refers to the maximum number of bits that can be transferred from one node to another in one second. Time delay here refers to an artificial delay that postpones packets sent from the source node by the specified amount of time. Latency can be approximated by summing the time delays of the source and destination nodes; for example, with the values in Table 6.2.1, the latency between a drone node and an edge node is approximately 100 ms + 20 ms = 120 ms. Therefore, in the analysis, latency refers to the sum of the time delays of the source and destination nodes.

The corresponding specifications are given in Table 6.2.1.

Table 6.2.1: Network conditions for different node types

Specifications       Drones   Edge nodes
Number of vCPUs      2        2
RAM (GB)             2        2
Time Delay (ms)      100      20
Throughput (Mbps)    50       200

Figure 6.2.1: Architecture of the virtual machines in the evaluation environment.


6.2.3 Process

We run the orchestrator with each optimization strategy separately. Then, we wait until the deployment remains the same for 3 consecutive runs, which means that the optimization strategy has reached a stable deployment. After this is achieved, we terminate all instances and shut down the orchestrator. We measure the amount of time between the start of the orchestrator and the moment the orchestrator shuts down.
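
As a rough illustration, the measurement loop can be sketched as below. The orchestrator API assumed here (run_analysis_cycle, current_deployment, terminate_all_instances, shutdown) is hypothetical and only mirrors the procedure described above.

import time

def measure_stable_deployment_time(orchestrator, stable_runs=3):
    """Time from orchestrator start until shutdown after a stable deployment."""
    start = time.monotonic()
    previous, unchanged = None, 0
    while unchanged < stable_runs:
        orchestrator.run_analysis_cycle()           # one analysis + execution run
        deployment = orchestrator.current_deployment()
        unchanged = unchanged + 1 if deployment == previous else 1
        previous = deployment
    orchestrator.terminate_all_instances()
    orchestrator.shutdown()
    return time.monotonic() - start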

This will be done for different configurations to provide an idea of how different con-

ditions and different network setups impact both strategies. The configurations are

listed in Table 6.2.2:

Table 6.2.2: Configurations tested for optimization strategy evaluation

                       Low   Medium   Max drones   Max edge nodes
Number of drones       2     4        8            1
Number of edge nodes   2     4        0            7

Configurations Low and Medium are intended to represent a case where each drone node has one edge node. These configurations focus on scalability by increasing the number of drones that have a corresponding edge node, keeping a 1:1 ratio between drones and edge nodes. Configuration Max drones represents an offline scenario in which eight drones are deployed but no edge nodes. Configuration Max edge nodes focuses on performance by assigning one drone node seven edge nodes. This gives that drone node plenty of resources and therefore suits multitasking, with different edge nodes assigned different computations.

6.2.4 Measurements

The following measurements will be taken for both optimization strategies:

• Stable deployment time: the time it takes for the deployment to remain unchanged for 3 consecutive runs of the analysis, plus the time to terminate all instances and close the orchestrator. This is measured because a lower stable deployment time means the strategy is able to reach its goal more quickly.

• Optimization computing time: the average time it takes to generate the optimized graph. This is measured because it is useful to observe what proportion of time is taken by the analysis, in order to judge whether more complex algorithms are feasible.

• Deployment time: the time it takes for the deployment to match the first optimized graph. This is measured because the shorter this time is, the more frequently the analysis can be performed and the more responsive the system is to changes in the deployment.

• Total number of actions performed: the number of actions that were performed throughout the operation. This is measured to provide context on which kinds of actions were taken and to relate the stable deployment time to the number of actions taken.

6.2.5 Results and Analysis

Throughout the evaluation, both optimization strategies, Latency and ESS, were analyzed under 4 different network configurations (Low, Medium, Max drones, Max edge nodes). These configurations allow the tested strategies to be compared across different numbers of nodes and different network conditions. It was observed that both optimization strategies achieved the same final deployment, with one exception highlighted in Table 6.2.5 that will be explained later.

The results for the Latency and ESS optimization strategies are shown in Table 6.2.3 and Table 6.2.4, respectively.

Table 6.2.3: Evaluation results of the Latency Optimization strategy

Latency Optimization                Low         Medium    Max drones   Max edge nodes
Stable deployment time (s)          255         875       966          668
Optimization computing time (s)     0.0105567   0.07267   0.07737237   0.047581387
First deployment time (s)           81          211       832          41
Total number of actions performed   8           16        16           16

Table 6.2.4: Evaluation results of the ESS optimization strategy

ESS                                 Low          Medium    Max drones   Max edge nodes
Stable deployment time (s)          291          1,306     1,100        927
Optimization computing time (s)     0.00055977   0.00904   0.00144261   0.005997831
First deployment time (s)           40           80        40           27
Total number of actions performed   8            16        16           16

It was observed that the stable deployment times were consistently lower for the Latency Optimization strategy, which indicates that this strategy achieves its optimized configuration faster. This can be explained by the closeness optimization being capable of generating multiple actions in each analysis iteration, since it looks at the entire deployment graph. On the contrary, the ESS optimization generates one action in each analysis cycle: it picks the most appropriate node to deploy an instance on, deploys an instance on that node, and repeats the analysis once that instance is deployed. As a result, the ESS algorithm needs more iterations of the analysis module, which take up more time due to their periodic nature.

The approach of the ESS algorithm does, however, simplify its optimization computing time. This is reflected in the average time it takes to optimize a graph: the ESS algorithm is roughly one order of magnitude faster than the closeness algorithm on this metric. The more complex graph analysis required by the closeness algorithm results in more time required. However, it is important to note that the closeness algorithm is still quite fast; the most time consuming optimization observed took 0.22 seconds, which is extremely quick compared to the times taken to actually execute the generated actions, which reached 244 seconds in one case. On average, the closeness optimization requires less than 0.1 seconds to generate an intended graph. This opens the door for even more complex optimizations, since the time spent optimizing could result in large time savings if an action is deemed unnecessary, which would be well worth it considering how long action execution takes.

The final deployment achieved by both strategies was to initialize an instance on each online node, and this deployment was considered the best in terms of latency. However, the method by which this deployment was reached differs. The total number of actions performed corresponds to how many instances were initialized and terminated (since no migrations occurred throughout the evaluations). For example, in the Low configuration, 8 actions were performed, of which 4 were initializations and 4 were terminations. In the Medium, Max drones, and Max edge nodes configurations from Table 6.2.3 and Table 6.2.4, 8 initializations and 8 terminations were performed.

If ESS receives a request to initialize an instance, the algorithm is only capable of modifying one node in one analysis iteration, where an analysis iteration refers to one run of the analysis module. This is quite different from the closeness algorithm, which instead looks at the entire graph and determines the full deployment that optimizes the latency in one analysis iteration after the initial iteration. In this case, the closeness algorithm first initializes instances on the drone nodes, which is a requirement since drones must have an instance running. The subsequent analysis iterations look at the deployment graph, which now has the drone nodes initialized, and attempt to find the graph that minimizes the average latency. Based on the results, in each case the deployment that minimizes the average latency was found to be the one where all the nodes have an active instance.
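
As a rough illustration of the graph-wide view the closeness algorithm relies on, the sketch below computes closeness centrality over a fully connected latency graph. It is a simplified stand-in, not the thesis implementation; the dictionary-based latencies structure used here is an assumption.

def closeness_centrality(latencies, node):
    """Closeness of `node` in a fully connected latency graph.

    `latencies` maps each node to a dict of pairwise latencies, e.g.
    latencies["drone1"]["edge1"] == 120 (ms). A higher closeness value
    means the node is, on average, nearer (lower latency) to the others.
    """
    others = [v for v in latencies if v != node]
    total = sum(latencies[node][v] for v in others)
    return len(others) / total if total > 0 else 0.0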

This behaviour explains why the ESS strategy took less time to reach its first deployment. The first deployment of the ESS strategy is when the first node has an instance instantiated, whereas the first deployment of the Latency strategy is when all the drone nodes have active instances. When there is only one drone node in the network, as in the Max edge nodes configuration, the Latency strategy has a first deployment time that is close to the one for the ESS strategy.

There is a good reason for the Latency strategy to deploy on all the nodes. Drone nodes are required to have an instance, and the drone nodes had higher latencies than edge nodes, meaning it was more beneficial to maximize the number of instances on edge nodes because edge nodes are significantly closer to drone nodes than drone nodes are to each other. In a scenario where an edge node experiences high latencies, it is expected that an instance will not be deployed on that edge node, since it would increase the average latency throughout the network.

To test this behaviour, the Low configuration was repeated with one change: one edge node had its time delay increased from 20 ms to 300 ms. The corresponding results for both strategies are shown in Table 6.2.5. The Latency strategy was able to detect the poor latency and avoid deploying an instance on that node; the ESS strategy was not, and ended up deploying an instance there. This demonstrates the power of considering the latencies of all the nodes in the network when making decisions (as is done by the Latency strategy), rather than only looking for the available node with the lowest latency (as is done by the ESS strategy).

Table 6.2.5: Low configuration but with one edge node at 300 ms time delay

                                    Latency       ESS
Stable deployment time (s)          167           283
Optimization computing time (s)     0.006013966   0.000457355
First deployment time (s)           69            38
Total number of actions performed   6             8

An interesting feature that can be observed in Figure 6.2.2 is that when no nodes have instances yet, the time it takes to instantiate an instance is around 50 seconds. This time increases to around 75 seconds when 3 nodes are online, then increases rapidly to around 100 seconds with 4, 120 seconds with 5, 150 seconds with 6, and 180 seconds with 7 nodes. This is caused by the evaluation setup used. Since the evaluation was performed on a single machine, with each node running in a separate VM, this shows the hardware resources becoming more constrained as more instances are actively running. It was observed that running a container with iDrOS increased CPU usage to around 70%, and instantiating an instance maximized the CPU of the VM. This is also reflected in the stable deployment times from Table 6.2.3 and Table 6.2.4, which do not scale linearly with the number of actions performed. The Low configuration performed half the actions of the Medium configuration, yet it took less than a third of the Medium configuration's time for the Latency strategy and less than a quarter of the Medium configuration's time for the ESS strategy.


Figure 6.2.2: Graph of average initialization times at different numbers of active nodes

The claim that hardware resources become more constrained was verified by running the orchestrator under the ESS strategy and monitoring the CPU usage of the physical machine as more instances were initialized. The CPU usage was around 16% when idle. It increased to 42% when the first instance was deployed, then to 69% with the second instance, 77% with the third, and finally 100% with the fourth. This demonstrates that when four instances were running, the host CPU was being utilized fully.

To summarize, this evaluation demonstrates that the Latency Optimization strategy is better suited to optimizing for latency than the ESS strategy. Latency Optimization was able to adapt when one of the edge nodes had high latencies, and it consistently reached its stable deployments more quickly. The evaluation demonstrated that the time to compute an optimized graph was insignificant for both strategies compared to the time to carry out the actions that convert the current deployment graph into the optimized graph. Even though calculating the optimized graph with Latency Optimization was slower than with ESS, it was still insignificant compared to the potential time savings of finding out that an action was not necessary. This evaluation also revealed a weakness of the evaluation setup: running a large number of virtual machines on one physical device resulted in performance degradation, because the same physical resources (mainly CPU resources) are shared across multiple competing VMs.

6.3 Orchestrator

In this section, the orchestrator I built will be evaluated based on the RAM require-

ments presented in Section 4.7. This evaluation is performed using the same setup

presented in Section 6.2.2, which consists of a virtual machine running on a server,

but with 4GB of RAM instead of 2GB.

In this evaluation, I will run the orchestrator and observe the RAM usage on the node running it as more instances are deployed. The orchestrator node will deploy instances sequentially on the seven other nodes, but not on itself.

The results obtained are shown in Table 6.3.1:

Table 6.3.1: Evaluation of RAM usage on the orchestrator node as more instances are instantiated

Instances Deployed   RAM Usage (MB)   Incremental RAM Usage (MB)
0                    50               0
1                    67               17
2                    97               30
3                    130              33
4                    155              25
5                    190              35
6                    235              45
7                    296              61

These results show that the orchestrator uses 50 MB when no instances have been deployed. During this stage, the orchestrator is collecting data from all eight nodes in the network, which includes the node the orchestrator is running on, and it is also running the analysis module to choose the optimization strategy that passes its conditions. The table shows two columns for RAM usage. Incremental RAM usage is the difference between the current RAM usage and the RAM usage at the previous stage; consequently, it represents the increase in RAM when deploying one more instance. The general trend in RAM usage is shown in Figure 6.3.1:

Figure 6.3.1: Graph of RAM usage at different numbers of instances deployed

By fitting a second-order polynomial trendline to this graph, we can obtain an equation that links the RAM usage with the number of instances deployed:

O(n) = 2.5287n² + 16.575n + 50,

where O represents the orchestrator RAM usage in MB and n represents the number of instances deployed.

Using this equation, we can predict how many instances can be managed before the 4 GB of RAM is saturated. During this evaluation, the base OS consumed 779 MB of RAM, which is added in the following formula to obtain the total RAM usage including the base OS:

T(n) = 2.5287n² + 16.575n + 829,

where T represents the total RAM usage in MB.

The RAM usage formula can be rearranged to find the maximum number of instances that fit within 4096 MB of RAM, which we found to be 32 instances.
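
This back-of-the-envelope calculation can be reproduced with a few lines of Python; the sketch below simply solves the quadratic inequality T(n) ≤ 4096 and is not part of the orchestrator itself.

import math

# Largest integer n with T(n) = 2.5287 n^2 + 16.575 n + 829 <= 4096 (MB).
a, b = 2.5287, 16.575
baseline = 829        # 50 MB orchestrator baseline + 779 MB base OS
budget = 4096         # total RAM of the evaluation VM

c = baseline - budget
n_max = math.floor((-b + math.sqrt(b * b - 4 * a * c)) / (2 * a))
print(n_max)          # prints 32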

Unfortunately, since there is no similar orchestrator available to compare with, it is difficult to judge how good this value is. However, it is an important value for developers who intend to use the orchestrator to keep in mind, because it gives them a hard limit on how many instances can be managed. This evaluation has also provided an equation to estimate RAM usage for any number of instances. As a result, developers can organize and partition their RAM ahead of time if they know exactly how many instances they want to deploy. Consequently, they can either use a computer with less RAM, resulting in cost savings, or utilize the leftover RAM for other software.

The limitations of this evaluation should also be noted. Because the orchestrator is modular, the custom modules developers use will affect RAM usage. Additionally, this evaluation looks only at RAM; it does not evaluate whether the processing requirements are satisfied by the orchestrator. The reason is that, unlike RAM, processing is not a hard limit but rather determines the orchestrator's performance. Evaluating the orchestrator's performance requires a comparable orchestrator to act as a baseline, whereas for RAM, the hard RAM limitation is the baseline that the orchestrator was compared against.

In Section 1.2 I outlined the questions that the thesis will attempt to answer, which are

repeated here:

1. What functionalities should the edge orchestrator intended for mobile robots

have?

2. Which parts of the edge orchestrator should be modular?

3. How should applications be deployed using the edge orchestrator?

The functionalities developed that are intended for mobile robots are the four optimization strategies. Each of those strategies tackles a specific need that mobile robots have. The network latency strategy tackles the need for performance optimization and achieving the lowest response times when utilizing edge nodes. Network reliability tackles the unpredictability of mobile networks when operating at the high altitudes that drones commonly operate at. Network consumption tackles the use case in which a drone has a limited amount of mobile bandwidth and therefore must restrict its network usage to avoid additional costs to the user. Battery optimization tackles the issue of the success of the mission being dependent on having enough battery to last through the entire mission.

As for the second question, two key parts of the orchestrator have been modularized because they were deemed the most necessary to be modular. Metrics have been modularized because of the large range of sensors that are used on mobile robots. Modularizing metrics allows any sensor to be made compatible with and used by the orchestrator, including future sensors that have not been released yet. This comes at the cost of the developer having to create and implement a metric for the custom sensors being used. However, we deem this a worthwhile cost, because restricting the orchestrator to the specific sensors we have implemented would severely curtail its adoption. Optimization strategies have also been modularized due to the wide range of scenarios that mobile robots are deployed in. In this thesis, we identified that different applications of mobile robots have different priorities and objectives. By giving developers the ability to develop and use custom optimization strategies, and to order them according to custom priorities, the orchestrator is well suited to handle applications with unique objectives and priorities. This will improve the adoption of the orchestrator by making it suitable for a wider range of applications.
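
To make the modularity concrete, the sketch below shows one possible shape for such modules. It is a hypothetical illustration only; the actual interfaces are defined by the orchestrator's class diagrams in Appendix A, and the method names used here (collect, applies, optimize) are assumptions.

from abc import ABC, abstractmethod

class Metric(ABC):
    """A custom metric collects one kind of data (e.g. battery level) per node."""

    @abstractmethod
    def collect(self, node):
        """Return the current value of this metric for the given node."""

class OptimizationStrategy(ABC):
    """A custom strategy turns the current network graph into an intended one."""

    @abstractmethod
    def applies(self, network_graph) -> bool:
        """Condition check deciding whether this strategy should run."""

    @abstractmethod
    def optimize(self, network_graph):
        """Return the intended deployment graph for the network."""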

The question of how to deploy applications was covered by the sections focused on software packaging, where a number of different software packaging solutions were compared. Packaging software into one file and deploying it simplifies the process of software dissemination, because only one file has to be transferred to the different nodes. This one file contains all the dependencies needed by the software being deployed, which means that those dependencies do not need to be installed on the nodes prior to deployment. As a result, any piece of software can be deployed without requiring preconfiguration of the nodes, provided that the software packaging runtime is installed on them. The software packaging solution chosen was LXD.


Chapter 7

Conclusions and Future work

7.1 Conclusions

In this thesis, a general purpose modular edge orchestrator has been designed and im-

plemented. The orchestrator can be extended to apply to different domains by building

modules in the form of custommetrics, which are used to collect custom data, and cus-

tom optimization strategies, which are used to add custom optimization logic for spe-

cific purposes. This modularity was utilized to build custom modules for the mobile

robotics domain, which is a domain that benefits from the advantages (e.g., improved

response times and redundancy) that edge computing provides. One of those custom

modules includes an optimization strategy that has been designed to optimize for la-

tency by utilizing the closeness centrality algorithm. This optimization strategy was

compared with an external optimization strategy called ESS that also optimizes for la-

tency. Both optimization strategies were implemented on top of the orchestrator for a

direct comparison. It was found that the optimization strategy designed in this thesis

was better suited towards optimizing for latency.

By building this modular orchestrator it was demonstrated through the modules de-

signed for mobile robotics that the orchestrator could be adapted towards different

domains. The orchestrator greatly simplifies the process of adding edge computing

support and utilizing computation nodes that have been deployed on edge.

As part of this thesis, a choice was made on which software packaging platform to adopt. The decision was to adopt LXD, which is a Linux container based software packaging solution. This decision was evaluated by comparing LXD with Docker, which is another container based software packaging solution. It was found that LXD containers exhibited better performance than Docker containers. It was also found that LXD images are smaller but require more time to generate than Docker based images; there is therefore a trade-off between image generation time and image size. As a result of the evaluation, LXD was found to be the better option for deployment by the orchestrator built in this thesis.

Additionally, two improvements were made to iDrOS as part of this thesis. The first is the addition of monitoring of the drone battery percentage, which is used in one of the optimization strategies that has been developed. The second improvement is the extension of the socket server to expose additional data, namely the GPS location of the drone and the remaining battery percentage. This improvement allowed the GPS location and the battery percentage to be obtained by the orchestrator and used in the optimization strategies.

7.2 Future Work

An important extension to the orchestrator built in this project is to expand its capabilities to include micro-migrations. These micro-migrations would allow for the offloading of computations between different edge nodes by transferring functions between the different instances, thereby transferring them from one node to another. This provides an additional layer of flexibility and optimization, because a deployment could then be optimized not just in terms of where instances are located, but also in terms of which instances various computations are performed on.

An interesting extension would be to add the capability to consider time of deployment and precompute an optimized deployment at various points in time. An example of this would be if the drone's entire flight path were known prior to flying. An optimization strategy could look at the concentration of network antennas along the flight path and calculate the optimized deployment at each point in time to ensure reliable operation, before actually flying. This adds an additional dimension to consider when optimizing, which is time. Consider, for example, a flight whose first half is in a residential area with a high antenna density (good network conditions) while the second half is in an area with a low antenna density (poor network conditions). In this example, it is preferable for the drone to schedule intensive computations during the first half, since the edge network is accessible under the good network conditions. The drone can then delay the computations that can run offline to the second half of the flight, since the edge network is unlikely to be accessible due to the poor network conditions. This capability to consider time when optimizing is an interesting extension of this orchestrator.

It is also important to add a form of redundancy. Not all network losses can be predicted, and the orchestrator should therefore not lose data if such a network loss were to suddenly occur. This could take the form of a specialized redundancy module that ensures a local copy of the data necessary to restart computations is available on the drone nodes. The design and implementation of such a module would be an important extension to increase reliability. The goal of this redundancy module is that a complete network loss for any node, at any point throughout the mission, results in the least amount of data lost. For example, assume that we have two nodes, one being the drone and the other being an edge computer. The edge computer is used for computationally intensive computations and for sending the results back to the drone. Suppose the drone completely loses its network connection due to entering an area with high interference; the edge computer is then unable to send the results of the computations back to the drone. The redundancy module present on both nodes should be able to detect that the network has been lost and react accordingly. This could take the form of keeping a backup of the results on the edge node until the network connection has been regained, after which the results would be sent to the drone. Another option would be to back up the results to a cloud service, since it is possible that the network connection to the drone has been lost permanently. The redundancy module therefore needs to ensure that no data is lost: in the case of a failure in any node, there should be an alternative source from which that node's data can be obtained. One approach is to keep multiple copies of the data on multiple nodes; other solutions that ensure no data is lost are also possible.

In addition, the implemented battery optimization strategy could be expanded further to conserve battery through other means, for example by limiting the energy consumed by the drone's motors at the cost of decreased performance, which may be preferable if the remaining battery percentage is low. To utilize such methods, the capabilities of the Execution Module must be expanded to include actions such as manipulating drone motors. It should be noted that this expansion should be designed in a manner that maintains the modularity and flexibility of the orchestrator. Rather than directly integrating such battery conservation techniques into the orchestrator, there should instead be a formal structure that allows the addition of more specific actions.


Bibliography

[1] 2019. URL: https://www.ncta.com/whats-new/report-where-does-the-majority-of-internet-traffic-come.

[2] Aug. 2020. URL: https://www.raspberrypi.org/products/raspberry-pi-4-model-b/specifications/.

[3] URL: https://hub.docker.com/.

[4] Anand, B. and Hao Edwin, A. J. "Gamelets — Multiplayer mobile games with distributed micro-clouds". In: 2014 Seventh International Conference on Mobile Computing and Ubiquitous Networking (ICMU). Jan. 2014, pp. 14–20. DOI: 10.1109/ICMU.2014.6799051.

[5] Apache 2.0 License. July 2020. URL: https://www.apache.org/licenses/LICENSE-2.0.

[6] Avolio, Pietro. Apr. 2020. URL: http://hdl.handle.net/10589/154050.

[7] Carnevale, L., Celesti, A., Galletta, A., Dustdar, S., and Villari, M. "From the Cloud to Edge and IoT: a Smart Orchestration Architecture for Enabling Osmotic Computing". In: 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA). 2018, pp. 419–424.

[8] Containers - lxd: Server documentation. May 2020. URL: https://ubuntu.com/server/docs/containers-lxd.

[9] de Brito, M. S., Hoque, S., Magedanz, T., Steinke, R., Willner, A., Nehls, D., Keils, O., and Schreiner, F. "A service orchestration architecture for Fog-enabled infrastructures". In: 2017 Second International Conference on Fog and Mobile Edge Computing (FMEC). 2017, pp. 127–132.

[10] Docker. June 2020. URL: https://www.docker.com/.


[11] Docker Hub Alpine. URL: https://hub.docker.com/_/alpine.

[12] Eclipse fog05. Apr. 2020. URL: https://fog05.io/.

[13] Eclipse Public License 2.0. Sept. 2020. URL: https://choosealicense.com/licenses/epl-2.0/.

[14] Farris, I., Taleb, T., Flinck, H., and Iera, A. "Providing ultra-short latency to user-centric 5G applications at the mobile network edge". In: Transactions on Emerging Telecommunications Technologies 29.4 (2018), e3169. DOI: 10.1002/ett.3169.

[15] Felter, W., Ferreira, A., Rajamony, R., and Rubio, J. "An updated performance comparison of virtual machines and Linux containers". In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2015, pp. 171–172.

[16] Hamilton, Eric. What is Edge Computing: The Network Edge Explained. Dec. 2018. URL: https://www.cloudwards.net/what-is-edge-computing/.

[17] Hill, Simon. 5G vs. 4G: Differences in Speed, Latency, and Coverage Explained. Nov. 2019. URL: https://www.digitaltrends.com/mobile/5g-vs-4g/.

[18] Hou, X., Ren, Z., Cheng, W., Chen, C., and Zhang, H. "Fog Based Computation Offloading for Swarm of Drones". In: 2019 IEEE International Conference on Communications (ICC). 2019, pp. 1–7.

[19] Introduction. Aug. 2020. URL: https://mavlink.io/en/.

[20] Laikari, A., Fechteler, P., Prestele, B., Eisert, P., and Laulajainen, J. "Accelerated video streaming for gaming architecture". In: 2010 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video. 2010, pp. 1–4.

[21] Lin, X., Wiren, R., Euler, S., Sadam, A., Määttänen, H., Muruganathan, S., Gao, S., Wang, Y.-P. E., Kauppi, J., Zou, Z., and Yajnanarayana, V. "Mobile Network-Connected Drones: Field Trials, Simulations, and Design Insights". In: IEEE Vehicular Technology Magazine 14.3 (Sept. 2019), pp. 115–125. ISSN: 1556-6080. DOI: 10.1109/MVT.2019.2917363.

[22] lteencyclopedia. June 2020. URL: https://sites.google.com/site/lteencyclopedia/home.

[23] LXD. June 2020. URL: https://linuxcontainers.org/lxd/.


[24] Manco, Filipe, Lupu, Costin, Schmidt, Florian, Mendes, Jose, Kuenzer, Simon, Sati, Sumit, Yasukata, Kenichi, Raiciu, Costin, and Huici, Felipe. "My VM is Lighter (and Safer) than Your Container". In: Proceedings of the 26th Symposium on Operating Systems Principles. SOSP '17. Shanghai, China: Association for Computing Machinery, 2017, pp. 218–233. ISBN: 9781450350853. DOI: 10.1145/3132747.3132763.

[25] Minimum requirements related to technical performance for IMT-2020 radio interface(s). Nov. 2017. URL: https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-M.2410-2017-PDF-E.pdf.

[26] Mottola, Luca, Moretta, Mattia, Whitehouse, Kamin, and Ghezzi, Carlo. "Team-level programming of drone sensor networks". In: Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems. 2014, pp. 177–190.

[27] OpenNebula miniONE. Aug. 2020. URL: https://docs.opennebula.io/minione/.

[28] OpenStack Components. Aug. 2020. URL: https://www.openstack.org/software/project-navigator/openstack-components.

[29] OSv - the operating system designed for the cloud. Apr. 2019. URL: http://osv.io/.

[30] Overview. Aug. 2020. URL: https://docs.openstack.org/nova/latest/install/overview.html.

[31] Ravindra, Pushkara, Khochare, Aakash, Reddy, Siva Prakash, Sharma, Sarthak, Varshney, Prateeksha, and Simmhan, Yogesh. "ECHO: An Adaptive Orchestration Platform for Hybrid Dataflows across Cloud and Edge". In: arXiv preprint arXiv:1707.00889 (2017).

[32] Rumprun Python. Apr. 2017. URL: https://github.com/solo-io/unik/blob/master/docs/compilers/rump.md#python-3.

[33] Satyanarayanan, M. "The Emergence of Edge Computing". In: Computer 50.1 (Jan. 2017), pp. 30–39. ISSN: 1558-0814. DOI: 10.1109/MC.2017.9.

[34] Singh, Ranvir. LXD vs Docker. 2017. URL: https://linuxhint.com/lxd-vs-docker/.

[35] Solo-Io. unik. July 2019. URL: https://github.com/solo-io/unik.


[36] UDOO X86 II ADVANCED PLUS. Aug. 2020. URL: https://shop.udoo.org/udoo-x86-ii-advanced-plus.html.

[37] Wong, Walter, Zavodovski, Aleksandr, Zhou, Pengyuan, and Kangasharju, Jussi. "Container deployment strategy for edge networking". In: Proceedings of the 4th Workshop on Middleware for Edge Clouds & Cloudlets. 2019, pp. 1–6.

[38] Xavier, B., Ferreto, T., and Jersak, L. "Time Provisioning Evaluation of KVM, Docker and Unikernels in a Cloud Platform". In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). 2016, pp. 277–280.

[39] Yajnanarayana, Vijaya, Wang, Y.-P. Eric, Gao, Shiwei, Muruganathan, Siva, and Lin, Xingqin. "Interference mitigation methods for unmanned aerial vehicles served by cellular networks". In: 2018 IEEE 5G World Forum (5GWF). IEEE. 2018, pp. 118–122.

[40] Yi, S., Hao, Z., Qin, Z., and Li, Q. "Fog Computing: Platform and Applications". In: 2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 2015, pp. 73–78.

[41] Yousefpour, Ashkan, Fung, Caleb, Nguyen, Tam, Kadiyala, Krishna, Jalali, Fatemeh, Niakanlahiji, Amirreza, Kong, Jian, and Jue, Jason P. "All one needs to know about fog computing and related edge computing paradigms: A complete survey". In: Journal of Systems Architecture 98 (2019), pp. 289–330.

[42] Zhao, P. and Dán, G. "Scheduling Parallel Migration of Virtualized Services Under Time Constraints in Mobile Edge Clouds". In: 2019 31st International Teletraffic Congress (ITC 31). 2019, pp. 28–36.


Appendix A

Class diagrams for orchestrator

Figure A.0.1: Module Class Diagram


Figure A.0.2: Metric class diagram

Figure A.0.3: Network Graph class diagram


Figure A.0.4: Graph interface class diagram

Figure A.0.5: Optimization Strategies Class Diagram

Figure A.0.6: Action Interface Class Diagram


Figure A.0.7: Action Class Diagram



TRITA-EECS-EX-2020:807