Dynamic Path Planning for Mars Data Collection
Mission using Probability Distributions
Fredric Moezinia
under the direction of Mr. Lawrence Bush
MIT Computer Science and Artificial Intelligence Laboratory
Research Science Institute
July 30, 2013
Abstract
This research explores many aspects of an adjustably autonomous Mars mission. One of our
several objectives is to find a viable path for a ground-vehicle to take from a starting point
to a pre-determined destination while avoiding obstacles along the way. This is achieved
by merging a Gaussian Process heuristic with an online simulation. In this way, we find
computationally effective methods to relieve the problems of exact value iteration. Moreover,
we introduce a tree-branching recursive algorithm, named UMP (Uncertainty Minimization
Planning), designed to reduce the uncertainty within an area on the Martian surface. This
method investigates a series of actions with high uncertainty in order to have a better
understanding of the long-term consequences of the ground-vehicle’s first action. As well
as innovating upon approximate value iteration, we introduce the idea of a human-machine
interface to make the decisions of the adjustably autonomous robot more tailored to the current
situation.
Summary
Central to dynamic programming is the concept of dividing up one complicated problem
into smaller, more manageable pieces in a recursive manner. We use this notion by splitting
up a multi-stage Mars mission into simpler subsections. We try to find the safest path for a
ground-rover to take from a starting point to a pre-determined destination without hitting
obstacles. This path is determined using information collected by overhead UAVs (Unmanned
Aerial Vehicles), which try to gather as much useful data about an area as possible. We assign
certain values to smaller portions of the area, namely, the estimated likelihood that a vehicle
can traverse through a single portion of the area safely, and the certainty of that estimate.
Subsequently, an online algorithm –called Uncertainty Minimization Planning– can suggest
a viable path to a point of interest for Mars data collection.
1 Introduction
In recent years, the search for signs of life on Mars has become a major focus of scientific and
astronomical research. Two NASA missions especially –the Curiosity Rover (2012) and Spirit
Rover (2004)– were instrumental in developing humanity’s knowledge about the molecular
composition of minerals on Mars. The Curiosity Rover even unveiled the site of an ancient
stream-bed near its landing point, substantiating speculation over the presence of liquid H2O
on Mars [1]. These missions were groundbreaking in revealing important new information
about Mars, and serve as examples of possible applications of our research.
Information travelling at the speed of light takes between 4 and 12 minutes to relay from
Earth to Mars, depending on the planetary positions. Given the latency involved in sending
commands to and receiving data from an unmanned robot at such distances, our objective
is to create an adjustably autonomous robot. This robot can generate an informative map
of the surrounding area in a short time period and from this, output an optimal path to
a pre-determined destination. This plan is assembled from data collected by Mars-based
scout airplanes and other UAVs (Unmanned Aerial Vehicles). The actions of these surveying
aircraft are also central to our project, as they have a direct impact on the capabilities of
the ground-rover.
Despite the success of previous missions to Mars, there are two fundamental ways in
which our work could improve the actions of the ground rover on the Martian surface, as
well as the decisions of the UAVs. First of all, by reducing the overall uncertainty associated
with the topography in a given area using a new strategy, the rover can plan its path from
landing point to destination with greater knowledge and accuracy. Secondly, we can refine
current path planning algorithms in order to optimize the path for the ground vehicle to
follow to its destination of preference.
1.1 Inadequacies of Current Mission
Current systems that use hazard avoidance cameras or the ‘Field D*’ path planner are
sub-optimal; the built-in cameras can only detect nearby obstacles, and cannot plan around
larger obstacles in the long term. As an example of the consequences of such shortcomings,
the Spirit Rover spent 105 minutes trying to maneuver around a rock cluster, and ultimately
never reached its destination. Furthermore, the current path planner mechanism is inefficient
and time consuming, as it plans the shortest Euclidean distance to the destination, only
updating its awareness of obstacles once the rover reaches them. Our process will tackle
these issues by using a scout satellite and lightweight-airplanes to gather information about
the area surrounding the ground vehicle. Finally, we address the current inadequacies and
limitations by recognizing the long-term effects of our real-time decisions.
1.2 Approach
The main task of our present research is to create a series of algorithms that enable a
semi-autonomous vehicle to carry out a mission using only data collected from the aircraft.
The path-planning algorithms are given a map of the terrain in matrix form. This map is
divided into cells, where each cell contains a probability density function (PDF) modelling
the probability that the rover will be able to safely pass through that cell. A combination
of these distributions is an approximate heuristic known as the Gaussian Process, which
can form our policy (decision making process). For example, if it is revealed that there is
an obstacle in a cell square, such as a boulder or pothole, the distribution will contain a
likelihood of successful traversability close to 0. We then proceed by picking a certain path
which has a strong likelihood of a successful journey, where this likelihood is bounded by 0
and 1. Although there is a trade-off between selecting a shorter or safer path, in practical
terms the preference is to take the safer one.
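To make this representation concrete, the gridded map can be sketched as two arrays: one holding each cell's estimated traversability µ, and one holding the uncertainty σ of that estimate. This is our own minimal illustration rather than the mission software; the grid size and all numerical values are arbitrary.

```python
# Illustrative sketch of the cell-based map: each cell (i, j) carries a
# Gaussian belief over traversability, summarized by mu[i, j] and sigma[i, j].
import numpy as np

rng = np.random.default_rng(0)
N = 4  # hypothetical 4x4 grid, matching the paper's small example

mu = rng.uniform(0.2, 0.95, size=(N, N))      # estimated safe-traversal likelihood
sigma = rng.uniform(0.05, 0.30, size=(N, N))  # uncertainty of each estimate

# A detected obstacle (e.g., a boulder or pothole) drives the belief near 0.
mu[2, 1], sigma[2, 1] = 0.02, 0.01
```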
The robot is independently active for 4 hours every Mars-day, spending the remaining
20.5 hours recharging using solar cells. The robot receives instructions from Earth before its
period of activity [2]. This is part of the human-machine interface, which allows an operator to
either permit or disallow certain paths to be followed under different circumstances. Despite
this, there may be a pre-determined threshold value for the probability of path success, only
above which the robot is allowed to move. Throughout the mission, there are continuous
updates of the estimated best route as more information is collected and processed, which
makes this adjustable level of autonomy achieved through the human-machine interface
useful.
Another aspect of our project is to plan the flight paths of the scout airplanes, which
survey an area in order to procure more data, either to inform the rover, or to decide upon
an interesting destination. Once a potential path for the rover has been chosen, we must
configure an algorithm enabling the airplanes to fly over parts of the path which we know
the least about. Priority is shifted to these areas of high uncertainty for further investigation.
From this exploration, more information about the path can be collated, and the viability
of the path examined more closely.
In summary, this project aims to provide a framework of algorithms to be used in a
planetary mission project which calculates low-risk paths for rovers on Mars to follow. To be
useful in unknown situations, these algorithms must be able to provide acceptable paths from
initial data collected, update their paths when new data is acquired, and improve their results
as time elapses. Our addition to this field will be combining the Gaussian Process with both
an existing algorithm (Rollout) as well as our new recursive algorithm UMP (Uncertainty
Minimization Planning).
2 Minimizing Information Uncertainty
The optimal exploration problem can be defined as a task to successively select locations to
observe so as to minimize the uncertainty of a state estimate. The scout satellite surveys areas
which yield the highest expected uncertainty reduction, creating a probabilistic activity map.
Our objective is to find the most informative and useful map, an assignment often approached
by starting to collect information in the center of an area of high uncertainty. However, this
common sub-optimal approach, often referred to as ‘a greedy strategy’, is myopic and ends
up gathering less useful information than our new strategy. Past research by Lawrence Bush
has suggested a non-myopic strategy [3] which can find out more information about a given
area in the same amount of time. This is beneficial as more cell distributions can be inferred
with greater precision if we know more about a given area. The comparison of strategies is
shown in Fig. 1.
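For contrast, the greedy baseline criticized above can be sketched in a few lines: it repeatedly surveys whichever cell is currently most uncertain, with no lookahead. The assumption that one overflight halves a cell's σ is our own illustrative stand-in, not a model taken from [3].

```python
# Sketch of the myopic ("greedy") survey strategy: always observe the cell whose
# estimate is most uncertain right now, ignoring long-term information gain.
import numpy as np

def greedy_survey_targets(sigma, n_observations, reduction=0.5):
    """Pick observation sites, assuming each survey scales a cell's sigma by `reduction`."""
    sigma = sigma.copy()
    targets = []
    for _ in range(n_observations):
        i, j = np.unravel_index(np.argmax(sigma), sigma.shape)
        targets.append((i, j))
        sigma[i, j] *= reduction  # assumed effect of one overflight
    return targets

sigma0 = np.full((4, 4), 0.30)
sigma0[2, 2] = 0.90  # one poorly-imaged cell
print(greedy_survey_targets(sigma0, n_observations=3))
```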
By making use of this new development, we can provide the path planning algorithm with
more precise details of the path-planning area and hence construct a better estimate of
the optimal path.
3 Analysis of Safety Distribution
In order to split up the large problem into a set of smaller ones (a policy central to
dynamic programming), the area of interest is gridded up into smaller square cells. Each cell
contains a probability density function, represented by a series of Gaussian distributions.
One distribution contains a mean value µ that represents the likelihood of the vehicle passing
through the cell safely, and a standard deviation, σ, which represents the uncertainty of that
belief. While the mean is deduced from either the presence or the absence of obstacles, the
uncertainty value could be, for example, due to the low resolution of the images collected by
the scout airplane. These values are inferred from images taken by the scout aircraft; this
procedure constitutes another topic within machine learning.
From the inferred values of µ and σ, we then implement value iteration to find the path
from the start location to the end location that has the greatest chance of mission success.
Such a path is defined as our optimal path. Note that the probability of mission success does
not depend solely on µ, but also on σ. The Gaussian distribution is defined as:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where the area underneath the curve is exactly one. $P(x)$ spans the domain $(-\infty, \infty)$.
However, because the x-axis represents probability, it is preferable to have a domain of
(0, 1). This probability is the likelihood that the cell is traversable, where the larger the
x-value is, the safer the contents of that cell are. To truncate the distribution at x = 0 and
x = 1 while retaining an area of 1 beneath the curve, we define a new function, F(x):
$$F(x) = \frac{P(x)}{\int_0^1 P(x)\,dx}$$
[4]. For practical reasons, the sections of F(x) that lie outside the domain (0, 1) are ignored.
While there is a loss of accuracy, this bounded function is more appropriate and can be
more readily manipulated for our purposes [5]. For example, we can find the probability of
a range of x-values occurring, such as [0.5-0.9], by taking a bounded integral between these
two x-values, representing the area under this part of the curve.
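A short sketch of this truncation, using SciPy's truncated normal (the µ and σ values are illustrative, not mission data), shows how the probability of a range such as [0.5, 0.9] is obtained:

```python
# F(x): the Gaussian renormalized over [0, 1], integrated over a sub-range.
from scipy.stats import truncnorm

mu, sigma = 0.7, 0.2  # example cell belief

# truncnorm takes its bounds in standard-deviation units relative to loc
a, b = (0.0 - mu) / sigma, (1.0 - mu) / sigma
F = truncnorm(a, b, loc=mu, scale=sigma)

# Probability that the cell's traversability lies in [0.5, 0.9]
print(F.cdf(0.9) - F.cdf(0.5))
```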
4 Choosing the Optimal Path
In order to find the optimal path, we need a metric to compare the viability of different
paths. Because both the mean and standard deviation are used, constructing a metric is not
a straightforward matter. If we compare the distributions of two cells, it is not pertinent
to choose the state with a higher traversability value (likelihood of successful cell traverse),
since this might be offset by a large standard deviation, which represents a high uncertainty
of the contents of that cell. Likewise, it is not feasible to compare distributions solely on the
distribution's standard deviation. We can define a metric A such that
$$A = \int_T^1 F(x)\,dx$$
where T is a certain threshold x-value, representing the minimum likelihood of traversing
a cell successfully. We seek to maximize A in order to choose the better path option. This policy is
comparable to that used in 'Exact Value Iteration' (Section 5.1), where we make our decisions solely
based on the next quantitatively highest value.
As an example, we can take the threshold value T to be 0.75 for these two very
different distributions. Clearly, the area underneath the green curve is much greater than
that of the blue curve, implying that the probability that the green action is optimal is
greater.
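The metric can be computed directly from the truncated distribution. In the sketch below, the parameters chosen for the "green" and "blue" distributions are hypothetical stand-ins for the curves in the figure; the action with the larger A is preferred.

```python
# Metric A = integral from T to 1 of F(x) dx, for two candidate distributions.
from scipy.stats import truncnorm

def metric_A(mu, sigma, T=0.75):
    a, b = (0.0 - mu) / sigma, (1.0 - mu) / sigma
    F = truncnorm(a, b, loc=mu, scale=sigma)
    return F.sf(T)  # survival function: P(x > T), the area from T up to 1

print(metric_A(0.85, 0.10))  # hypothetical "green" distribution
print(metric_A(0.70, 0.20))  # hypothetical "blue" distribution
```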
At certain times, however, this metric A is not enough to decide the best path for
the rover. When there are two state distributions with differences in both risk belief and
uncertainty, it is hard for the dynamic programming algorithm or even a human to come to
a decisive conclusion over which path would be better to follow. Fortunately, we can calculate
the probability that one distribution has a higher x-value (traversability) than the other.
In this example (Fig. 3), even though it might seem conclusive for our policy to pick
the distribution with both a higher safety level and a higher uncertainty, it is actually advantageous
to investigate the probability that the mean of the grey distribution is greater than that of
the blue.
We calculate this probability using the equation [6]:
$$P(\text{grey} \geq \text{blue}) = \int_0^{\infty} \left( \int_{-\infty}^{\infty} \text{blue}(\tau)\, \text{grey}(t+\tau)\, d\tau \right) dt \quad (1)$$
In the equation above, we integrate over the probability that blue equals τ and grey equals
τ plus some quantity t, which varies from 0 to ∞. This is a convolution over the difference
between the grey and blue distributions [7]. If this yields a probability which
exceeds a specified human-implemented threshold, we can carry out an iterative 'if' clause
to output an expected mean value E(x) of the grey distribution:
$$E[\,\text{grey} - \text{blue} \mid \text{grey} \geq \text{blue}\,] \times P(\text{grey} \geq \text{blue}) \quad (2)$$
Based on equation (1) and expression (2), we find that it is sometimes worth sending the
scout airplanes over this area in order to collect more information. On account of this, we
obtain a better comprehension of the real risk involved with executing Action 1 or Action
2. Though this could be a waste of time, it could also mean finding a safer or shorter path.
This technique is later incorporated into our new algorithm, UMP.
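For two independent, untruncated Gaussian beliefs, the double integral of equation (1) collapses to a single normal CDF, because the difference grey − blue is itself Gaussian. The sketch below uses this simplification; with the truncated F(x) of Section 3, the convolution would instead be evaluated numerically. The parameter values are illustrative.

```python
# P(grey >= blue) for independent Gaussians: the difference grey - blue is
# distributed N(mu_g - mu_b, sigma_g^2 + sigma_b^2), so one CDF call suffices.
from math import sqrt
from scipy.stats import norm

mu_g, sigma_g = 0.80, 0.25  # grey: higher mean, higher uncertainty
mu_b, sigma_b = 0.70, 0.05  # blue: lower mean, lower uncertainty

p = norm.cdf((mu_g - mu_b) / sqrt(sigma_g**2 + sigma_b**2))
print(f"P(grey >= blue) = {p:.3f}")
```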
5 Value Iteration
5.1 Exact Value Iteration
As a precursor to the actual kernels used in our algorithms, it is useful to think about
the steps leading up to the final algorithm, labelled ‘Exact Value Iteration’. This is a form
of a Markov Decision Process (MDP), defined by the tuple ⟨S, A, R, γ⟩, where
• S is the ‘State’
• A is the ‘Set of Actions’
• R is the ‘Reward’ and
• γ is the ‘Discount Factor’.
If a simple grid is taken with a starting point, destination and obstacles, we can
simulate through the grid and find the optimal path. We define the set of actions A as
movements in one of the four cardinal directions. Each state is given an
arbitrary starting value of 0, which is refined with each successive recursion until it converges
to $V^*$. This optimal value function, $V^*$, is defined for each state $s \in S$ as
$$V^*(s) = \max_a Q^a_{ss'}$$
where Q is the state-action-value function. This value is useful as it represents the best action
to take in a certain state. $V^*$ represents the expected total discounted reward received along
an optimal trajectory, starting at state s with action a.
In this example (Fig. 5), we can find the optimal state-action value $Q^*(s)$ of each cell
using the Bellman equation for each possible action:
$$Q(s) = \gamma V(s') + R^a_{ss'}$$
where $Q^*(s, a, s') = \max Q(s, a, s')$ and $R^a_{ss'}$ is the reward of proceeding with action a from
state s to s'. We assign a reward of 1 for reaching the goal state, and negative infinity for
moving into an obstacle or a wall.
In each cell, there are 4 Q functions representing the value of moving into an adjacent
cell-state. The highest Q function returns the true utility, or value, of that state; in other
words, the value obtained by following the best policy or action in that state. At the same
time, we do not discard the other state-action-value functions; rather, we keep them stored
for possible future use.
In this example, the state-action value functions of each cell are determined as the
program works backwards from the goal cell until each state has a calculated value. The
constant γ is called the discount factor, and is set between 0 and 1; in our code,
we assign γ a value of 0.9. This is used to discount rewards that lie further in the future, so
that reaching the goal sooner is more valuable. Understandably, we want the rover to end up at the goal, which is
why we make reaching the goal the most valuable long-term action. We can therefore utilize
a greedy policy, defined as always choosing the adjacent cell with the best value, which outputs
the safest path to the destination [8]. This policy is tantamount to what we use to form
paths in a Gaussian Process simulation.
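A minimal sketch of this procedure on a 4x4 grid follows. The obstacle layout is arbitrary, and the "negative infinity" reward for walls and obstacles is implemented by simply excluding those moves from consideration.

```python
# Exact value iteration on a small grid: V converges as each sweep applies
# Q = gamma * V(s') + R over the four cardinal moves.
import numpy as np

N, GOAL = 4, (3, 3)
OBSTACLES = {(1, 1), (2, 1)}  # hypothetical layout
GAMMA = 0.9
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

V = np.zeros((N, N))
for _ in range(100):
    V_new = V.copy()
    for i in range(N):
        for j in range(N):
            if (i, j) == GOAL or (i, j) in OBSTACLES:
                continue
            best = -np.inf
            for di, dj in ACTIONS:
                ni, nj = i + di, j + dj
                # moving into a wall or obstacle is forbidden (reward -infinity)
                if not (0 <= ni < N and 0 <= nj < N) or (ni, nj) in OBSTACLES:
                    continue
                r = 1.0 if (ni, nj) == GOAL else 0.0
                best = max(best, GAMMA * V[ni, nj] + r)
            V_new[i, j] = best
    if np.allclose(V, V_new):  # values have converged to V*
        break
    V = V_new
print(V)
```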
5.2 Approximate Value Iteration
In actuality, the problem-area involves many more cells and complicated functions; therefore,
the dynamic program only stores approximate state-action-value functions ($\hat{Q}$) instead
of exact ones when iterating, to alleviate storage problems. Moreover, without this approximation,
the algorithms would take an interminable length of time to find $V^*(s)$, the real
value of each state. However, to avoid cases where the information is not detailed enough,
we store a normal distribution rather than a single number,
$$N(\mu_{s,a}, \sigma^2_{s,a}) \approx \hat{Q}_{s,a}$$
to incorporate uncertainty into the iterative process. Additionally, we assign starting values
which are closer to $V^*(s)$ in order to reduce the number of iterations and reach convergence
in less time.
We use a Gaussian Process to approximate the value iteration for each state. Hyper-
parameters and random variables –each of which has a normal distribution– are compressed
to fit a low-order function. This function incorporates complex features and contains more
substance and information than a simple feature such as an x- and y-value. It is defined by
multi-variate distributions and the marginal likelihood law which takes into account these
variables. After the Gaussian Process is fitted, it can be used to approximate value iteration.
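The paper does not spell out its implementation, but the idea can be illustrated with a generic GP regressor: fit it to a handful of exactly computed values, then query it at unvisited states for a predicted mean and standard deviation, which play the roles of µ and σ for each state-action pair. The kernel and training points below are assumptions of ours.

```python
# Sketch: approximate the value function with a Gaussian Process over (x, y).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A few states whose values were computed exactly (illustrative numbers)
X_train = np.array([[0, 0], [0, 3], [3, 0], [2, 2]], dtype=float)
q_train = np.array([0.59, 0.73, 0.73, 0.90])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X_train, q_train)

# Predicted mean and uncertainty at an unvisited state
q_mean, q_std = gp.predict(np.array([[1.0, 2.0]]), return_std=True)
print(q_mean[0], q_std[0])
```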
6 Bellman’s Principle of Optimality
Our dynamic programming algorithms attempt to rely on a property known as the
‘Principle of Optimality’, which states that:
An optimal policy has the property that whatever the initial state and initial decision are,
the remaining decisions must constitute an optimal policy with regard to the state resulting
from the first decision [9].
This is true when the substructures of an optimal path are also optimal. Essentially, if
we find a path $k: C_s \to C_i$, then our calculated path $k': C_i \to C_G$ must be optimal [10]. The
path $k'$ is easily found if $C_s$ and $C_i$ are adjacent. We can reduce the problem of finding the
optimal path $k'$ by finding sub-paths $k'^*$ [10]. Again, this is the very essence of dynamic
programming: solving complex problems recursively by partitioning them into smaller ones.
7 Training
One way to gauge the efficacy of different algorithms is to restrict the input data
and analyze the output results. For example, in our simple grid-case, we can initiate the
Gaussian Process without telling it where the obstacles or walls are. We only label a starting
and ending point, and the program tries to predict the best state-action value for each cell.
However, this Gaussian Process (which forms our base policy) thrives in situations where it
can collect and utilize as many parameters as possible. Therefore, as expected, it produced
nonsensical results due to an over-simplified situation. Even when we repeated the process in
several di↵erent locations, the results of the state-action values shown were incorrect. This
shows the necessity of balancing training the system with offline heuristics and tailoring the
process to the current situation using online re-optimization. This is why we combine the
Gaussian Process with two algorithms, Rollout and UMP: to create a balance between an
oversimplified function and an optimizing one. We propose that this trade-off is optimal for working
under stochastic conditions.
8 Simulation and Results
8.1 Rollout
There are two methods that we can use to find a viable path to a target location.
They are variants of tree-based search simulations used to find the optimal initial action. In
the ‘Rollout algorithm’, we take an action from one cell, and then, using a base (one-step
lookahead) policy, we simulate forward a certain number of steps into the future, sometimes reaching
the goal state (depending on the depth of the search). Our base policy is formed by the
Gaussian Process and represents the putative optimal action. After the simulation, we move
back up our ‘search tree’ and assign a more precise value to the initial cell. For example, if
we reach our goal state in 5 steps, the state-action value in our first cell will have a value of
$V_{goal}(s) \times \gamma^5$. We obtain this value from the equation
$$\gamma^n\, V_{\text{final state}}(s) + \sum R(s, a, s') \quad (3)$$
This operation is then repeated for all possible actions in the initial cell, and each time the
maximum Q function is updated. In simple cases, this can tell us which is the best first
action, as we see where this base policy ends up taking us. We can then take this optimal
initial action to an adjacent cell and simulate again from that cell.
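A compact sketch of one such simulation follows. Here a greedy walk over the traversability means µ stands in for the Gaussian Process base policy, and intermediary rewards are taken to be zero, so equation (3) reduces to the discounted goal value; all of this is our illustration rather than the mission implementation.

```python
# Rollout sketch: try a first action, then follow a greedy base policy and
# discount any goal reward back to the first action (e.g. V_goal * gamma^5).
import numpy as np

def rollout_value(start, first_action, mu, goal, depth=10, gamma=0.9):
    n = mu.shape[0]
    pos = (start[0] + first_action[0], start[1] + first_action[1])
    if not (0 <= pos[0] < n and 0 <= pos[1] < n):
        return -np.inf  # first action walks into a wall
    for step in range(depth):
        if pos == goal:
            return gamma ** (step + 1)  # goal reward of 1, discounted
        # base policy: step to the adjacent cell believed safest
        neighbours = [(pos[0] + di, pos[1] + dj)
                      for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                      if 0 <= pos[0] + di < n and 0 <= pos[1] + dj < n]
        pos = max(neighbours, key=lambda c: mu[c])
    return 0.0  # depth limit reached before the goal

mu = np.random.default_rng(0).uniform(0.2, 0.95, size=(4, 4))
for a in [(1, 0), (0, 1)]:  # compare candidate first actions from (0, 0)
    print(a, rollout_value((0, 0), a, mu, goal=(3, 3)))
```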
There are caveats associated with this method. For example, in larger grids, the
number of computations in a worst-case scenario is $b^n$, where b is the branching factor and
n is the depth of search steps. In our 4x4 grid, the algorithm took 0.613 seconds to complete
82 computations. Consequently, when we want to apply this method to a larger grid, such
as 50x50, the simulation time will be intractable. If we restrict the depth of the simulation
to mitigate this, then the beneficial information that we normally receive about the first
action is lost. Instead, the algorithm will return substandard values as the search depth is
too shallow to reach the goal state.
8.2 UMP (Uncertainty Minimization Planning)
The second algorithm, named UMP, is useful both in defining the optimal path, and
in maximizing the uncertainty reduction yield in an area of interest. The benefit of using
action uncertainty is that it finds the optimal way to discover additional information related
to the chosen action. In this case, we start out by focusing on the two state-action value functions with
the highest mean. When we are faced with these two different distributions, an exploratory
policy is employed and we simulate choosing the action with the higher standard deviation
(uncertainty). Whereas ‘Rollout’ ignores decision uncertainty, we use this uncertainty to
focus on simulations which might drastically change our original state-action values.
Initially, we call in an offline matrix containing a certain state, search depth, and
state-action values procured from the Gaussian Process. While we branch down this tree,
constantly choosing the more uncertain of two actions, we encounter more state values which
are recursively called by the base policy. Once the goal state or depth limit is reached, we
once again use equation (3) to predict the initial state-action value, using the final state
value and any intermediary rewards. Since this value, originally forecasted by the Gaussian
Process, is often too approximate, it is advantageous to replace it with a value which is more
representative of the true aggregate discounted future reward. Like in ‘Rollout’, we can then
follow the calculated optimal first action to an adjacent cell, and simulate UMP again. After
an operator-determined number of simulations is performed, the ground-vehicle can utilize
the algorithm to follow a path with the updated Q functions.
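The recursion can be sketched as below. The q_mu and q_sigma dictionaries are random stand-ins for the Gaussian Process predictions, intermediary rewards are omitted for brevity, and the depth-limit fallback simply reuses the GP estimate; none of this is the authors' exact implementation.

```python
# UMP sketch: among the two best-looking actions, descend into the one whose
# value estimate is most *uncertain*, refining the initial state-action value.
import itertools, random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N, GOAL = 4, (3, 3)

def ump_value(pos, q_mu, q_sigma, depth, gamma=0.9):
    if pos == GOAL:
        return 1.0  # reward of 1 for reaching the goal state
    legal = [a for a in ACTIONS
             if 0 <= pos[0] + a[0] < N and 0 <= pos[1] + a[1] < N]
    if depth == 0:
        return max(q_mu[pos, a] for a in legal)  # fall back on the GP estimate
    ranked = sorted(legal, key=lambda a: q_mu[pos, a], reverse=True)[:2]
    a = max(ranked, key=lambda act: q_sigma[pos, act])  # the more uncertain one
    nxt = (pos[0] + a[0], pos[1] + a[1])
    return gamma * ump_value(nxt, q_mu, q_sigma, depth - 1, gamma)

random.seed(1)
cells = list(itertools.product(range(N), range(N)))
q_mu = {(c, a): random.random() for c in cells for a in ACTIONS}
q_sigma = {(c, a): random.uniform(0.05, 0.40) for c in cells for a in ACTIONS}
print(ump_value((0, 0), q_mu, q_sigma, depth=8))
```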
Again, there are two major caveats associated with this UMP and Gaussian Process
combination. Firstly, the base policy is a low-order function which cannot always quantify
the complexity of the situation [7]. Consequently, in our 4x4 grid, simulations suggest that
the initial values are off by more than 10%, approximately 77% of the time. Furthermore,
when we are faced with a larger grid (sometimes up to 10,000 cells) we are limited by our
search depth. If we keep the depth long enough to reach the goal state and compute a good
initial value, the program will take a prohibitively long time to run. And if we want to limit
the time length of the program, the algorithm will often return a substandard approximation.
This is because the algorithm returns the best initial value when the search reaches the goal
state, as its value is deterministic.
When working with UMP, we can describe the worst-case complexity or performance
of the algorithm by ‘Big O notation’ [11]. The summative order of UMP is a function of the
search depth, the number of actions and the number of repetitions of UMP. In the worst-case
scenario, the search depth is linearly related with the number of cells in a grid, and therefore
the search depth factor displays quadratic growth with the grid size. This is assuming that
we want to have a good chance of reaching the goal state in every grid size. We also repeat
UMP in every cell so that each state has updated state-action value functions, though this
can later be truncated when time becomes a major focus. Since we restrict the number of
actions to one in all cases, the running time is nominally quartic.
We use the execution time to gauge the efficiency of our code in a real case. As
UMP runs 49 computations in 0.251 seconds in a 4x4 grid, the worst-case scenario in a 50x50
grid runs in about 4 days. Although this is a lengthy time, in typical cases the time will be
reduced by the presence of obstacles and the Gaussian Process base policy. These variables
shrink both the search depth and the number of UMP repetitions. It is within reason to
expect an optimal path in a 50x50 grid to be found in approximately one hour. Moreover, if
we refine this search using an even shallower depth search or other methods such as storing
the last simulated action, we could easily trim this figure down further.
Finally, an analysis of the comparable complexities of both Rollout and UMP reveals
that as the size of the area grid increases, UMP is a more efficient algorithm. Rollout has a
linear complexity of
$$\sum_{i=0}^{d} b^{\min(i,1)}$$
where b, in the worst case, is a branching factor of 4 and d is the search depth. UMP is
constructed so that its complexity,
$$\sum_{i=0}^{d} b^{i}$$
(though seemingly exponential) is also linear, because the branching factor is 1:
$$\sum_{i=0}^{d} 1^{i}$$
Moreover, as the search depth increases, UMP becomes the less complex algorithm and
therefore less time-consuming.
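To make the comparison concrete, both sums can be evaluated for, say, b = 4 and d = 10 (a worked example of our own):

```python
# Rollout: sum of b^min(i,1) = 1 + d*b operations; UMP: sum of 1^i = d + 1.
b, d = 4, 10
rollout_ops = sum(b ** min(i, 1) for i in range(d + 1))  # 1 + 10*4 = 41
ump_ops = sum(1 ** i for i in range(d + 1))              # 11
print(rollout_ops, ump_ops)
```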
9 Conclusion
Achieving optimality in real scenarios is a demanding task. There are an infinite number of
paths and many ways of quantifying these paths. Our combination of Rollout with GP was
intractable for a real scenario. However, by changing our strategy in UMP and simulating
fewer actions, we could reduce the amount of time needed to complete the task. Despite the
fact that UMP was approximate and not compliant with Bellman’s Principle of Optimality,
it served as a pragmatic online algorithm.
10 Future Research
Although the field of machine learning has expanded rapidly since its inception, it is
still a burgeoning branch of study. There are various possibilities for advancements and
improvements which can be made. In our project, there are still inadequacies throughout
the algorithmic planning and new developments can be made to enhance the quality of our
current research project. For example, there is inefficiency in the fact that we only allow the
ground-rover four possible actions, as shown in Fig. 7, while we would prefer the rover to
have the capability to move diagonally as well. To further this argument, ideally the rover
would be able to move smoothly and continuously change direction, and not be restricted
by only straight-line choices.
Another improvement which could be made is storing the ‘last action’ data in the
MDP tuple. Normally, there is a higher probability of continuing in the same direction (and
therefore executing the previous action again) when moving towards a goal. This means
that storing this piece of information could be convenient, as it could speed up the decision
making of the online algorithm. Additionally, implementing a way of retaining and utilizing
data procured from short-term searches could be explored in the future, as a means of
reducing the simulation time.
A final inadequacy of the current program is that we do not consider the size of the
obstacles or the ability of obstacles to encroach on neighbouring cells. One analogy is that it
would be undesirable to drive an expensive car merely a few inches away from a potentially
unstable truck. On Mars, there are numerous strong winds and sandstorms which are not
stationary, but rather move about randomly. In a real life scenario, this dynamic aspect could
prove disruptive.
11 Acknowledgements
My foremost thanks are to my mentor Mr. Lawrence Bush of MIT's Computer Science
and Artificial Intelligence Laboratory (CSAIL) for inviting me to work on such an interesting
and engaging project and for all of his help and expertise. I would also like to thank my
tutor Mr. Sam Spencer for all of his wisdom and guidance. It was also a pleasure to
work alongside my colleague Jon Xia, who provided me with invaluable assistance throughout
the research journey. I would like to thank the Research Science Institute (RSI), the
Center for Excellence in Education (CEE), and the Massachusetts Institute of Technology
(MIT) for their generosity in providing me with a tremendous opportunity to conduct my
research. I would also like to acknowledge the efforts of all the RSI staff and Teaching
Assistants for helping me learn LaTeX and polish my paper. My thanks go out especially to
Evgenia Sendova, Megan Belzner and Charlie Pasternak for their constant help throughout
my research project. Finally, I would like to convey my sincerest thanks to my sponsor Gilad
Sheba, who truly made my stay at RSI possible.
References
[1] P. S. Anderson. A chronicle of planetary exploration, 2012.
[2] D. Ferguson and A. Stentz. Global path planning on board the Mars Exploration Rovers.
[3] L. A. Bush, B. Williams, and N. Roy. Computing exploration policies via closed-form least squares value iteration. Doctoral Consortium, Eighteenth International Conference on Automated Planning & Scheduling (ICAPS-08), pages 1–8, 2008.
[4] C. Ke. Dynamic path planning for disaster relief transport vehicles using probability distributions. page 14, 2011.
[5] C. E. Rasmussen. Gaussian processes in machine learning. page 75, 2009.
[6] L. Bush. Decision uncertainty minimization planning.
[7] Decision Uncertainty Minimization Planning and Autonomous Information Gathering. PhD thesis, 2013.
[8] L. A. Bush, A. J. Wang, and B. C. Williams. Risk-based sensing in support of adjustableautonomy. pages 1–18, 2012.
[9] R. Bellman. Dynamic programming. 1957; republished 2003.
[10] L. Missik. Dynamic path planning using probability distributions. page 16, 2011.
[11] R. Bell. Big O notation, 2008.