Dynamic Path Planning for Mars Data Collection Mission using Probability Distributions


Fredric Moezinia

under the direction of Mr. Lawrence Bush

MIT Computer Science and Artificial Intelligence Laboratory

Research Science Institute
July 30, 2013

Abstract

This research explores many aspects of an adjustably autonomous Mars mission. One of our

several objectives is to find a viable path for a ground-vehicle to take from a starting point

to a pre-determined destination while avoiding obstacles along the way. This is achieved

by merging a Gaussian Process heuristic with an online simulation. In this way, we find

computationally effective methods to relieve the problems of exact value iteration. Moreover,

we introduce a tree-branching recursive algorithm, named UMP (Uncertainty Minimization

Planning), designed to reduce the uncertainty within an area on the Martian surface. This

method investigates a series of actions with high uncertainty in order to have a better

understanding of the long-term consequences of the ground-vehicle’s first action. As well

as innovating upon approximate value iteration, we introduce the idea of a human-machine

interface to make the decisions of the adjustably autonomous robot more tailored to the current

situation.

Summary

Central to dynamic programming is the concept of dividing up one complicated idea

into smaller, more manageable pieces in a recursive manner. We use this notion by splitting

up a multi-stage Mars mission into simpler subsections. We try to find the safest path for a

ground-rover to take from a starting point to a pre-determined destination without hitting

obstacles. This path is determined using information collected by overhead UAVs (Unmanned

Aerial Vehicles), which try to gather as much useful data about an area as possible. We assign

certain values to smaller portions of the area, namely, the estimated likelihood that a vehicle

can traverse through a single portion of the area safely, and the certainty of that estimate.

Subsequently, an online algorithm –called Uncertainty Minimization Planning– can suggest

a viable path to a point of interest for Mars data collection.

1 Introduction

In recent years, the search for signs of life on Mars has become a major focus of scientific and

astronomical research. Two NASA missions especially –the Curiosity Rover (2012) and Spirit

Rover (2004)– were instrumental in developing humanity’s knowledge about the molecular

composition of minerals on Mars. The Curiosity Rover even unveiled the site of an ancient

stream-bed near its landing point, substantiating speculation over the presence of liquid H2O

on Mars [1] . These missions were groundbreaking in revealing important new information

about Mars, and serve as examples of possible applications of our research.

Information travelling at the speed of light takes between 4 and 12 minutes to relay from

Earth to Mars depending on the planetary positions. Given the latency involved in sending

commands to and receiving data from an unmanned robot at such distances, our objective

is to create an adjustably autonomous robot. This robot can generate an informative map

of the surrounding area in a short time period and from this, output an optimal path to

a pre-determined destination. This plan is accumulated from data collected by Mars-based

scout airplanes and other UAVs (Unmanned Aerial Vehicles). The actions of these surveying

aircraft are also central to our project, as they have a direct impact on the capabilities of

the ground-rover.

Despite the success of previous missions to Mars, there are two fundamental ways in

which our work could improve the actions of the ground rover on the Martian surface, as

well as the decisions of the UAVs. First of all, by reducing the overall uncertainty associated

with the topography in a given area using a new strategy, the rover can plan its path from

landing point to destination with greater knowledge and accuracy. Secondly, we can refine

current path planning algorithms in order to optimize the path for the ground vehicle to

follow to its destination of preference.


1.1 Inadequacies of Current Mission

Current systems that use hazard avoidance cameras or the ‘Field D*’ path planner are

sub-optimal; the built-in cameras can only detect nearby obstacles, and cannot plan around

larger obstacles in the long term. As an example of the consequences of such shortcomings,

the Spirit Rover spent 105 minutes trying to maneuver around a rock cluster, and ultimately

never reached its destination. Furthermore, the current path planner mechanism is inefficient

and time consuming, as it plans the shortest Euclidean distance to the destination, only

updating its awareness of obstacles once the rover reaches them. Our process will tackle

these issues by using a scout satellite and lightweight airplanes to gather information about the area surrounding the ground vehicle. Finally, we address the current inadequacies and limitations by recognizing the long-term effects of our real-time decisions.

1.2 Approach

The main task of our present research is to try to create a series of algorithms that enable a

semi-autonomous vehicle to carry out a mission using only data collected from the aircraft.

The path-planning algorithms are given a map of the terrain in matrix form. This map is

divided into cells, where each cell contains a probability density function (PDF) modelling

the probability that the rover will be able to safely pass through that cell. Combining these distributions yields an approximate heuristic known as a Gaussian Process, which can form our policy (decision-making process). For example, if it is revealed that there is

an obstacle in a cell square, such as a boulder or pothole, the distribution will contain a

likelihood of successful traversability close to 0. We then proceed by picking a certain path

which has a strong likelihood of a successful journey, where this likelihood is bounded by 0

and 1. Although there is a trade-off between selecting a shorter or safer path, in practical

terms the preference is to take the safer one.
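As a concrete illustration of this representation, the following is a minimal Python sketch (not part of the mission software) of a cell map in which each cell stores a mean traversability µ and an uncertainty σ, together with a deliberately crude path score; the grid values and the scoring rule are illustrative assumptions only, and the proper metric is developed in Sections 3 and 4.

    import numpy as np

    # Hypothetical 4x4 traversability map: mu is the estimated likelihood of
    # safe passage through each cell, sigma the uncertainty of that estimate.
    mu = np.array([[0.90, 0.80, 0.90, 0.70],
                   [0.90, 0.10, 0.20, 0.80],   # low mu: suspected boulder/pothole
                   [0.80, 0.90, 0.90, 0.90],
                   [0.70, 0.80, 0.90, 0.95]])
    sigma = np.full_like(mu, 0.10)             # uniform uncertainty for simplicity
    sigma[1, 1:3] = 0.30                       # poorly imaged cells are less certain

    def path_success(path, mu):
        """Crude score: product of per-cell traversal likelihoods along a path."""
        return float(np.prod([mu[r, c] for r, c in path]))

    safer   = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3)]          # detours around the hazard
    shorter = [(0, 0), (1, 0), (1, 1), (1, 2), (1, 3)]          # cuts through risky cells
    print(path_success(safer, mu), path_success(shorter, mu))   # the safer path scores higher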


The robot is independently active for 4 hours every Mars-day, spending the remaining

20.5 hours recharging using solar cells. The robot receives instructions from Earth before its

period of activity [2]. This is part of the human-machine interface, which allows an operator to

either permit or disallow certain paths to be followed under different circumstances. Despite

this, there may be a pre-determined threshold value for the probability of path success, only

above which the robot is allowed to move. Throughout the mission, there are continuous

updates of the estimated best route as more information is collected and processed, which

makes this adjustable level of autonomy achieved through the human-machine interface

useful.

Another aspect of our project is to plan the flight paths of the scout airplanes, which

survey an area in order to procure more data, either to inform the rover, or to decide upon

an interesting destination. Once a potential path for the rover has been chosen, we must

configure an algorithm enabling the airplanes to fly over parts of the path which we know

the least about. Priority is shifted to these areas of high uncertainty for further investigation.

From this exploration, more information about the path can be collated, and the viability

of the path examined more closely.

In summary, this project aims to provide a framework of algorithms to be used in a

planetary mission project which calculates low-risk paths for rovers on Mars to follow. To be

useful in unknown situations, these algorithms must be able to provide acceptable paths from

initial data collected, update their paths when new data is acquired, and improve their results

as time elapses. Our addition to this field will be combining the Gaussian Process with both

an existing algorithm (Rollout) as well as our new recursive algorithm UMP (Uncertainty

Minimization Planning).


2 Minimizing Information Uncertainty

The optimal exploration problem can be defined as a task to successively select locations to

observe so as to minimize the uncertainty of a state estimate. The scout satellite surveys areas

which yield the highest expected uncertainty reduction, creating a probabilistic activity map.

Our objective is to find the most informative and useful map, an assignment often approached

by starting to collect information in the center of an area of high uncertainty. However, this

common sub-optimal approach, often referred to as a 'greedy strategy', is myopic and ends up gathering less useful information than our new strategy. Past research by Lawrence Bush has suggested a non-myopic strategy [3] which can gather more information about a given

area in the same amount of time. This is beneficial as more cell distributions can be inferred

with greater precision if we know more about a given area. The comparison of strategies is

shown in Fig. 1.


By making use of this new development, we can provide the path planning algorithm with

more precise details of the path-planning area and thereby construct a better estimate of

the optimal path.
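For reference, the myopic baseline criticized above can be written in a few lines; the sketch below is an illustrative Python fragment (with a made-up uncertainty map) that simply surveys the single most uncertain cell next, whereas the non-myopic planner of [3] scores whole candidate surveys by their total expected uncertainty reduction.

    import numpy as np

    def greedy_next_observation(sigma):
        # Myopic rule: observe the cell whose estimate is currently most uncertain.
        return np.unravel_index(np.argmax(sigma), sigma.shape)

    sigma = np.array([[0.10, 0.25, 0.05],
                      [0.30, 0.40, 0.15],    # the centre cell is the least certain
                      [0.10, 0.20, 0.05]])
    print(greedy_next_observation(sigma))    # -> (1, 1)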

3 Analysis of Safety Distribution

In order to split up the large problem into a set of smaller ones (an approach central to dynamic programming), the area of interest is gridded into smaller square cells. Each cell

contains a probability density function, represented by a series of Gaussian distributions.

One distribution contains a mean value µ that represents the likelihood of the vehicle passing

through the cell safely and a standard deviation, σ, which represents the uncertainty of that


belief. While the mean is deduced from the presence or absence of obstacles, the uncertainty value could be due, for example, to the low resolution of the images collected by the scout airplane. These values are inferred from images taken by the scout aircraft; this

procedure constitutes another topic within machine learning.

From the inferred values of µ and σ, we then implement value iteration to find the path

from the start location to the end location that has the greatest chance of mission success.

Such a path is defined as our optimal path. Note that the probability of mission success does not depend solely on µ, but also on σ. The Gaussian distribution is defined as

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

where the area underneath the curve is exactly one. P(x) spans the domain (−∞, ∞). However, because the x-axis represents probability, it is preferable to have a domain of (0, 1). This probability is the likelihood that the cell is traversable, where the larger the x-value is, the safer the contents of that cell are. To truncate the distribution at x = 0 and x = 1 yet retain an area of 1 beneath the curve, we define a new function, F(x):

$$F(x) = \frac{P(x)}{\int_0^1 P(x)\,dx}$$

[4]. For practical reasons, the sections of F(x) that lie outside the domain (0, 1) are ignored.

While there is a loss of accuracy, this bounded function is more appropriate and can be

more readily manipulated for our purposes [5]. For example, we can find the probability of

a range of x-values occurring, such as [0.5, 0.9], by taking a bounded integral between these

two x-values, representing the area under this part of the curve.
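As a worked example of this bounded integral, the short Python sketch below evaluates the probability of the range [0.5, 0.9] under the truncated distribution F(x), using the normal CDF; the particular µ and σ are arbitrary illustrative choices.

    from scipy.stats import norm

    def truncated_prob(mu, sigma, lo, hi):
        """P(lo <= x <= hi) under F(x) = P(x) / integral_0^1 P(x) dx."""
        z = norm.cdf(1.0, mu, sigma) - norm.cdf(0.0, mu, sigma)      # normalising constant
        return (norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)) / z

    # e.g. a cell believed to be fairly safe (mu = 0.7) but uncertain (sigma = 0.2)
    print(truncated_prob(0.7, 0.2, 0.5, 0.9))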


4 Choosing the Optimal Path

In order to find the optimal path, we need a metric to compare the viability of different

paths. Because both the mean and standard deviation are used, constructing a metric is not

a straightforward matter. If we compare the distributions of two cells, it is not sufficient simply to choose the state with the higher traversability value (the likelihood of successfully traversing the cell), since this might be offset by a large standard deviation, which represents a high uncertainty

of the contents of that cell. Likewise, it is not sensible to compare distributions solely on their standard deviations. We can define a metric A, such that

$$A = \int_T^1 F(x)\,dx,$$

where T is a certain threshold x-value, representing the minimum likelihood of traversing a cell successfully. We seek to maximize A in order to choose the better path option. This policy is comparable to that used in 'Exact Value Iteration' (Section 5.1), where we make our decisions solely based on the next quantitatively highest value.

As an example, we can take the threshold value T to be 0.75 for these two very

different distributions. Clearly, the area underneath the green curve is much greater than

that of the blue curve, implying that the probability that the green action is optimal is

greater.
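Continuing the example, the metric A can be computed directly from the normal CDF; the sketch below uses two invented stand-ins for the 'green' and 'blue' distributions with T = 0.75, and the green one indeed carries more probability mass above the threshold.

    from scipy.stats import norm

    def metric_A(mu, sigma, T=0.75):
        # A = integral from T to 1 of F(x) dx: the probability that the
        # traversability of the cell exceeds the threshold T.
        z = norm.cdf(1.0, mu, sigma) - norm.cdf(0.0, mu, sigma)
        return (norm.cdf(1.0, mu, sigma) - norm.cdf(T, mu, sigma)) / z

    green = metric_A(0.85, 0.05)   # confident and safe
    blue  = metric_A(0.70, 0.15)   # less safe and more uncertain
    print(green, blue)             # green > blue, so the green action is preferred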


At certain times however, this metric A is not enough to decide the best path for

the rover. When there are two state distributions with differences in both risk belief and

uncertainty, it is hard for the dynamic programming algorithm or even a human to come to

a decisive conclusion over which path would be better to follow. Fortunately, we can calculate

the probability that one distribution has a higher x-value (traversability) than the other.

In this example (Fig. 3), even though it might seem conclusive for our policy to pick the distribution with both the higher safety level and the higher uncertainty, it is actually advantageous to investigate the probability that the mean of the grey distribution is greater than that of the blue.


We calculate this probability using the equation [6]:

$$P(\text{grey} \geq \text{blue}) = \int_0^{\infty} \left( \int_{-\infty}^{\infty} \text{blue}(\tau)\, \text{grey}(t + \tau)\, d\tau \right) dt \qquad (1)$$

In the equation above, we integrate over the probability that blue equals τ and grey equals τ plus some quantity t, which varies from 0 to ∞. This is a convolution over the difference between the grey and blue distributions [7]. If the resulting probability exceeds a specified human-implemented threshold, we can carry out an iterative 'if' clause to output an expected value for the difference between the grey and blue distributions:

$$E[\text{grey} - \text{blue} \mid \text{grey} \geq \text{blue}] \times P(\text{grey} \geq \text{blue}) \qquad (2)$$

Based on equation (1) and expression (2), we find that it is sometimes worth sending the

scout airplanes over this area in order to collect more information. On account of this, we

obtain a better comprehension of the real risk involved with executing Action 1 or Action


2. Though this could be a waste of time, it could also mean finding a safer or shorter path.

This technique is later incorporated into our new algorithm, UMP.
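For readers who want to check equation (1) and expression (2) numerically, the Python sketch below estimates both by Monte Carlo and compares P(grey ≥ blue) against the closed form for the difference of two independent (untruncated) Gaussians; the means and standard deviations are illustrative assumptions only.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    def prob_and_gain(mu_g, sig_g, mu_b, sig_b, n=200_000):
        # Monte Carlo estimates of equation (1), P(grey >= blue), and of
        # expression (2), E[grey - blue | grey >= blue] * P(grey >= blue).
        d = rng.normal(mu_g, sig_g, n) - rng.normal(mu_b, sig_b, n)
        return float(np.mean(d >= 0)), float(np.mean(np.where(d >= 0, d, 0.0)))

    def prob_closed_form(mu_g, sig_g, mu_b, sig_b):
        # For independent Gaussians, grey - blue ~ N(mu_g - mu_b, sig_g^2 + sig_b^2).
        return norm.cdf((mu_g - mu_b) / np.hypot(sig_g, sig_b))

    print(prob_and_gain(0.75, 0.20, 0.65, 0.05))
    print(prob_closed_form(0.75, 0.20, 0.65, 0.05))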

5 Value Iteration

5.1 Exact Value Iteration

As a precursor to the actual kernels used in our algorithms, it is useful to think about

the steps leading up to the final algorithm, labelled ‘Exact Value Iteration’. This is a form

of a Markov Decision Process (MDP), defined by the tuple ⟨S, A, R, γ⟩, where

• S is the set of states,

• A is the set of actions,

• R is the reward function, and

• γ is the discount factor.

If a simple grid is taken with a starting point, destination and obstacles, we can

simulate through the grid and find the optimal path. The set of actions A consists of movements in one of the four cardinal directions only. Each state is given an

arbitrary starting value of 0, which is refined with each successive recursion until it converges

to V*. This optimal value function, V*, is defined for each state s ∈ S as

$$V^*(s) = \max_a Q^a_{ss'},$$

where Q is the state-action-value function. This value is useful as it represents the value of the best action to take in a certain state. V* represents the expected total discounted reward received along an optimal trajectory starting at state s with action a.


In this example (Fig. 5), we can find the optimal state-action value Q*(s, a) of each cell using the Bellman equation for each possible action:

$$Q(s, a, s') = \gamma V(s') + R^a_{ss'},$$

where $Q^*(s, a, s') = \max Q(s, a, s')$ and $R^a_{ss'}$ is the reward for proceeding with action a from state s to s'. We assign a reward of 1 for reaching the goal state, and negative infinity for

moving into an obstacle or a wall.

In each cell, there are 4 Q functions representing the value of moving into an adjacent cell-state. The highest Q function returns the true utility, or value, of that state. In other


words, the value obtained by following the best policy or action in that state. At the same

time, we do not discard the other state-action-value functions, rather we keep them stored

for possible future use.

In this example, the state-action value functions of each cell are determined as the

program works backwards from the goal cell until each state has a calculated value. The

constant γ is called the discount factor, and is set between 0 and 1; in our code, we assign γ a value of 0.9. This is used to discount rewards that lie further in the future, so that reaching the goal sooner is more valuable. Understandably, we want the rover to end up at the goal, which is why we make reaching the goal the most valuable action. We can therefore utilize

a greedy policy, defined as always moving to the adjacent cell with the highest value, which outputs the safest path to the destination [8]. This policy is equivalent to the one we use to form

paths in a Gaussian Process simulation.
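To make the procedure of Section 5.1 concrete, the following Python sketch runs exact value iteration on a small illustrative grid with four cardinal actions, γ = 0.9, a reward of 1 for entering the goal, and moves into obstacles or walls excluded from the maximization (equivalently, valued at negative infinity); the grid layout is an assumption for demonstration, not the mission map.

    import numpy as np

    GAMMA = 0.9
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # four cardinal moves

    grid = np.array([[0, 0, 0, 0],                      # 0 = free cell, 1 = obstacle
                     [0, 1, 1, 0],
                     [0, 0, 0, 0],
                     [0, 1, 0, 0]])
    goal = (3, 3)

    def value_iteration(grid, goal, iters=100):
        # V(s) = max_a [ R(s, a, s') + gamma * V(s') ]; the goal is absorbing.
        V = np.zeros(grid.shape)
        for _ in range(iters):
            newV = V.copy()
            for r in range(grid.shape[0]):
                for c in range(grid.shape[1]):
                    if (r, c) == goal or grid[r, c] == 1:
                        continue
                    best = -np.inf                      # forbidden moves count as -inf
                    for dr, dc in ACTIONS:
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1] \
                                and grid[nr, nc] == 0:
                            reward = 1.0 if (nr, nc) == goal else 0.0
                            best = max(best, reward + GAMMA * V[nr, nc])
                    newV[r, c] = best
            V = newV
        return V

    print(np.round(value_iteration(grid, goal), 3))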

5.2 Approximate Value Iteration

In actuality, the problem-area involves many more cells and complicated functions; there-

fore, the dynamic program only stores approximate state-action-value functions (Q̂) instead

of exact ones when iterating, to alleviate storage problems. Moreover, without this approx-

imation, the algorithms would take an interminable length of time to find V*(s), the real value of each state. However, to avoid cases where the information is not detailed enough, we store a normal distribution rather than a single number,

$$N(\mu_{s,a},\, \sigma^2_{s,a}) \approx \hat{Q}_{s,a},$$

to incorporate uncertainty into the iterative process. Additionally, we assign starting values

which are closer to V*(s) in order to reduce the number of iterations and reach convergence

in less time.


We use a Gaussian Process to approximate the value iteration for each state. Hyper-

parameters and random variables –each of which has a normal distribution– are compressed

to fit a low-order function. This function incorporates complex features and contains more

substance and information than simple features such as an x- and y-value. It is defined by multivariate distributions and the marginal likelihood, which takes into account these

variables. After the Gaussian Process is fitted, it can be used to approximate value iteration.
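A minimal sketch of this fitting step is shown below, under the assumption that scikit-learn's GaussianProcessRegressor is an acceptable stand-in for the paper's Gaussian Process machinery; the (row, col, action) features and training values are invented purely for illustration. Each queried state-action pair then receives both a mean µ and an uncertainty σ, matching the N(µ, σ²) ≈ Q̂ representation used above.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Hypothetical training data: (row, col, action) features with state-action
    # values computed exactly on a small training grid.
    X_train = np.array([[0, 0, 0], [0, 3, 1], [2, 1, 2], [3, 2, 3], [1, 0, 1]])
    y_train = np.array([0.59, 0.73, 0.81, 0.90, 0.66])

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-3),
                                  normalize_y=True)
    gp.fit(X_train, y_train)

    # Predictive mean and standard deviation for unvisited state-action pairs.
    X_query = np.array([[2, 2, 0], [1, 3, 2]])
    mu, sigma = gp.predict(X_query, return_std=True)
    print(mu, sigma)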

6 Bellman’s Principle of Optimality

Our dynamic programming algorithms attempt to rely on a property, known as the

‘Principle of Optimality’. This states that:

An optimal policy has the property that whatever the initial state and initial decision are,

the remaining decisions must constitute an optimal policy with regard to the state resulting

from the first decision [9].

This is true when the substructures of an optimal path are also optimal. Essentially, if we find a path $k: C_s \to C_i$, then our calculated path $k': C_i \to C_G$ must be optimal [10]. The path $k'$ is easily found if $C_s$ and $C_i$ are adjacent. We can reduce the problem of finding the optimal path $k'$ by finding optimal sub-paths $k'^*$ [10]. Again, this is the very essence of dynamic

programming: solving complex problems recursively by partitioning them into smaller ones.

7 Training

One way to gauge the efficacy of different algorithms is to restrict the input data

and analyze the resulting output. For example, in our simple grid case, we can initialize the

Gaussian Process without telling it where the obstacles or walls are. We only label a starting

and ending point, and the program tries to predict the best state-action value for each cell.


However, this Gaussian Process (which forms our base policy) thrives in situations where it

can collect and utilize as many parameters as possible. Therefore, as expected, it produced

nonsensical results due to an over-simplified situation. Even when we repeated the process in

several different locations, the resulting state-action values were incorrect. This

shows the necessity of balancing training the system with offline heuristics and tailoring the

process to the current situation using online re-optimization. This is why we combine the

Gaussian Process with two algorithms, Rollout and UMP: to strike a balance between an oversimplified offline heuristic and online optimization. We propose that this trade-off is optimal for working

under stochastic conditions.

8 Simulation and Results

8.1 Rollout

There are two methods that we can use to find a viable path to a target location.

They are variants of tree-based search simulations used to find the optimal initial action. In

the ‘Rollout algorithm’, we take an action from one cell, and then using a base (one step

lookahead) policy, we forward a certain number of steps into the future, sometimes reaching

the goal state (depending on the depth of the search). Our base policy is formed by the

Gaussian Process and represents the putative optimal action. After the simulation, we move

back up our ‘search tree’ and assign a more precise value to the initial cell. For example, if

we reach our goal state in 5 steps, the state-action value in our first cell will have a value of $V_{goal}(s) \times \gamma^5$. We obtain this value from the equation

$$\gamma^n\, V_{final\,state}(s) + \sum R(s, a, s') \qquad (3)$$


This operation is then repeated for all possible actions in the initial cell, and each time the

maximum Q function is updated. In simple cases, this can tell us which is the best first

action, as we see where this base policy ends up taking us. We can then take this optimal

initial action to an adjacent cell and simulate again from that cell.
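The sketch below captures the Rollout loop just described in Python, under the assumption that the caller supplies the base policy, transition, reward and value functions (hypothetical callables standing in for the Gaussian Process policy and the grid dynamics); it backs up γⁿ · V(final state) plus the rewards collected on the way, following equation (3).

    GAMMA = 0.9

    def rollout_value(state, first_action, base_policy, step, reward, value,
                      is_terminal, depth=20):
        # Take first_action, then follow the base policy for up to `depth` steps;
        # return gamma^n * V(final state) + sum of rewards, as in equation (3).
        s, a, total = state, first_action, 0.0
        for n in range(1, depth + 1):
            s_next = step(s, a)
            total += reward(s, a, s_next)
            s = s_next
            if is_terminal(s):              # e.g. the goal state was reached
                return GAMMA ** n * value(s) + total
            a = base_policy(s)
        return GAMMA ** depth * value(s) + total

    def best_first_action(state, actions, **kw):
        # Repeat the simulation for every possible first action and keep the best.
        return max(actions, key=lambda a: rollout_value(state, a, **kw))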

There are caveats associated with this method. For example, in larger grids, the

number of computations in a worst-case scenario is $b^n$, where b is the branching factor and

n is the depth of search steps. In our 4x4 grid, the algorithm took 0.613 seconds to complete

82 computations. Consequently, when we want to apply this method to a larger grid, such

as 50x50, the simulation time will be intractable. If we restrict the depth of the simulation

to mitigate this, then the beneficial information that we normally receive about the first

action is lost. Instead, the algorithm will return substandard values as the search depth is

too shallow to reach the goal state.

8.2 UMP (Uncertainty Minimization Planning)

The second algorithm, named UMP, is useful both in defining the optimal path, and

in maximizing the uncertainty reduction yield in an area of interest. The benefit of using

action uncertainty is that it finds the optimal way to discover additional information related

to the chosen action. In this case, we start out by focusing on the 2 state-action value functions with

the highest mean. When we are faced with these two different distributions, an exploratory

policy is employed and we simulate choosing the action with the higher standard deviation

(uncertainty). Whereas ‘Rollout’ ignores decision uncertainty, we use this uncertainty to

focus on simulations which might drastically change our original state-action values.

Initially, we call in an offline matrix containing a certain state, search depth, and

state-action values procured from the Gaussian Process. While we branch down this tree,

constantly choosing the more uncertain of two actions, we encounter more state values which

are recursively called by the base policy. Once the goal state or depth limit is reached, we


once again use equation (3) to predict the initial state-action value, using the final state

value and any intermediary rewards. Since this value, originally forecasted by the Gaussian

Process, is often too approximate, it is advantageous to replace it with a value which is more

representative of the true aggregate discounted future reward. As in 'Rollout', we can then

follow the calculated optimal first action to an adjacent cell, and simulate UMP again. After

an operator-determined number of simulations is performed, the ground-vehicle can utilize

the algorithm to follow a path with the updated Q functions.
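A compact Python rendering of this exploratory recursion is sketched below; q_hat is assumed to map a state to a dictionary {action: (mean, standard deviation)} supplied by the Gaussian Process, and the remaining callables are hypothetical stand-ins for the grid dynamics, so this illustrates the selection rule rather than the exact mission code (the paper's own pseudocode is given in Appendix A).

    GAMMA = 0.9

    def ump_simulation(state, depth, q_hat, step, reward, is_terminal, value):
        # At each level, look at the two actions whose Q-estimates have the
        # highest means and simulate the one with the larger standard deviation
        # (the more uncertain choice); back the result up with equation (3).
        if depth == 0 or is_terminal(state):
            return value(state)
        s, total = state, 0.0
        for n in range(1, depth + 1):
            top_two = sorted(q_hat(s), key=lambda a: q_hat(s)[a][0])[-2:]
            a = max(top_two, key=lambda a: q_hat(s)[a][1])
            s_next = step(s, a)
            total += reward(s, a, s_next)
            s = s_next
            if is_terminal(s):
                break
        return GAMMA ** n * value(s) + total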

Again, there are two major caveats associated with this UMP and Gaussian Process

combination. Firstly, the base policy is a low-order function which cannot always capture the complexity of the situation [7]. Consequently, in our 4x4 grid, simulations suggest that the initial values are off by more than 10%, approximately 77% of the time. Furthermore,

when we are faced with a larger grid (sometimes up to 10,000 cells) we are limited by our

search depth. If we keep the depth long enough to reach the goal state and compute a good

initial value, the program will take a prohibitively long time to run. If we instead limit the running time of the program, the algorithm will often return a substandard approximation.


This is because the algorithm returns the best initial value when the search reaches the goal

state, as its value is deterministic.

When working with UMP, we can describe the worst-case complexity or performance

of the algorithm by ‘Big O notation’ [11]. The summative order of UMP is a function of the

search depth, the number of actions and the number of repetitions of UMP. In the worst-case

scenario, the search depth is linearly related to the number of cells in a grid, and therefore the search-depth factor grows quadratically with the grid's side length. This is assuming that

we want to have a good chance of reaching the goal state in every grid size. We also repeat

UMP in every cell so that each state has updated state-action value functions, though this

can later be truncated when time becomes a major focus. Since we restrict the number of simulated actions to one in all cases, the running time is nominally quartic in the grid's side length.

We use the execution time to gauge the efficiency of our code in a real case. As

UMP runs 49 computations in 0.251 seconds in a 4x4 grid, the worst-case scenario in a 50x50

grid runs in about 4 days. Although this is a lengthy time, in typical cases the time will be

reduced by the presence of obstacles and the Gaussian Process base policy. These variables

shrink both the search depth and the number of UMP repetitions. It is within reason to

expect an optimal path in a 50x50 grid to be found in approximately one hour. Moreover, if

we refine this search using an even shallower search depth or other methods such as storing

the last simulated action, we could easily trim this figure down further.

Finally, an analysis of the comparative complexities of both Rollout and UMP reveals that, as the size of the area grid increases, UMP is the more efficient algorithm. Rollout has a linear complexity of

$$\sum_{i=0}^{d} b^{\min(i,1)},$$

where b, in the worst case, is a branching factor of 4 and d is the search depth. UMP is constructed so that its complexity,

$$\sum_{i=0}^{d} b^{i},$$

though seemingly exponential, is also linear because its branching factor is 1:

$$\sum_{i=0}^{d} 1^{i} = d + 1.$$

Thus, as the search depth increases, UMP becomes the less complex algorithm and therefore less time-consuming.
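A quick numeric check of these two sums (a standalone snippet, using the worst-case branching factor b = 4 from the text) shows how the gap widens with depth: at d = 50, Rollout visits 1 + 4·50 = 201 nodes while UMP visits only 51.

    def rollout_nodes(d, b=4):
        return sum(b ** min(i, 1) for i in range(d + 1))    # 1 + b*d

    def ump_nodes(d):
        return sum(1 ** i for i in range(d + 1))            # d + 1

    for d in (5, 10, 50):
        print(d, rollout_nodes(d), ump_nodes(d))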

9 Conclusion

Achieving optimality in real scenarios is a demanding task. There are an infinite number of

paths and many ways of quantifying these paths. Our combination of Rollout with the Gaussian Process was

intractable for a real scenario. However, by changing our strategy in UMP and simulating

fewer actions, we could reduce the amount of time needed to complete the task. Despite the

fact that UMP was approximate and not compliant with Bellman’s Principle of Optimality,

it served as a pragmatic online algorithm.

10 Future Research

Although the field of machine learning has expanded rapidly since its inception, it is

still a burgeoning branch of study. There are various possibilities for advancements and

improvements which can be made. In our project, there are still inadequacies throughout

the algorithmic planning and new developments can be made to enhance the quality of our

current research project. For example, there is inefficiency in the fact that we only allow the

ground-rover four possible actions, as shown in Fig. 7, while we would prefer the rover to


have the capability to move diagonally as well. To further this argument, ideally the rover

would be able to move smoothly and continuously change direction, and not be restricted

by only straight-line choices.

Another improvement which could be made is storing the ‘last action’ data in the

MDP tuple. Normally, there is a higher probability of continuing in the same direction (and

therefore executing the previous action again) when moving towards a goal. This means

that storing this piece of information could be convenient, as it could speed up the decision

making of the online algorithm. Additionally, implementing a way of retaining and utilizing


data procured from short-term searches could be explored in the future, as a means of

reducing the simulation time.

A final inadequacy of the current program is that we do not consider the size of the

obstacles or the ability of obstacles to encroach on neighbouring cells. One analogy is that it

would be undesirable to drive an expensive car merely a few inches away from a potentially

unstable truck. On Mars, there are numerous strong winds and sandstorms which are not

stationary, but rather move about randomly. In a real life scenario, this dynamic aspect could

prove disruptive.

11 Acknowledgements

My foremost thanks are to my mentor Mr. Lawrence Bush of MIT's Computer Science

and Artificial Intelligence Laboratory (CSAIL) for inviting me to work on such an interesting

and engaging project and for all of his help and expertise. I would also like to thank my

tutor Mr. Sam Spencer for all of his wisdom and guidance. As well, it was a pleasure to

work alongside my colleague Jon Xia, who provided me with invaluable assistance through-

out the research journey. I would like to thank the Research Science Institute (RSI), the

Center for Excellence in Education (CEE), and the Massachusetts Institute of Technology

(MIT) for their generosity in providing me with a tremendous opportunity to conduct my

research. I would also like to acknowledge the e↵orts of all the RSI sta↵ and Teaching As-

sistants for helping me learn LaTeX and polish my paper. My thanks go out especially to

Evgenia Sendova, Megan Belzner and Charlie Pasternak for their constant help throughout

my research project. Finally, I would like to convey my sincerest thanks to my sponsor Gilad

Sheba, who truly made my stay at RSI possible.


References

[1] P. S. Anderson. A chronicle of planetary exploration, 2012.

[2] D. Ferguson and A. Stentz. Global path planning on board the Mars Exploration Rovers.

[3] L. A. Bush, B. Williams, and N. Roy. Computing exploration policies via closed-form least squares value iteration. Doctoral Consortium, Eighteenth International Conference on Automated Planning & Scheduling (ICAPS-08), pages 1–8, 2008.

[4] C. Ke. Dynamic path planning for disaster relief transport vehicles using probability distributions. page 14, 2011.

[5] C. E. Rasmussen. Gaussian processes in machine learning. page 75, 2009.

[6] L. Bush. Decision uncertainty minimization planning.

[7] Decision Uncertainty Minimization Planning and Autonomous Information Gathering. PhD thesis, 2013.

[8] L. A. Bush, A. J. Wang, and B. C. Williams. Risk-based sensing in support of adjustable autonomy. pages 1–18, 2012.

[9] R. Bellman. Dynamic Programming. 1957, republished 2003.

[10] L. Missik. Dynamic path planning using probability distributions. page 16, 2011.

[11] R. Bell. Big O notation, 2008.


A Pseudocode

Algorithm 1 UMP (Uncertainty Minimization Planning), branching factor 2

function UncertaintyMinimizationPlanning(M = ⟨S, A, R, V_G, γ⟩, d ∈ ℕ, s ∈ S)
    if d = 0 then
        return Q̂ = γ^d × V_G
    else
        for a ∈ A do
            s' ⇐ DeterministicSimulation(s, a)
            Q̂(s, a) ⇐ UncertaintyMinimizationPlanning(M, d − 1, s')
        end for
        return max_a Q̂(s, a)
    end if
end function
