A Linear Programming Approach for Probabilistic Robot Path Planning with Missing Information of Outcomes

Mohamad Ali Movafaghpour and Ellips Masehian

2011 IEEE International Conference on Automation Science and Engineering, Trieste, Italy, August 24-27, 2011


Abstract—In practical robot motion planning, robots usually do not have full models of their surroundings, and hence no complete and correct plan exists that the robot can execute fully. In most real-world problems a robot operates in a partially-known environment, meaning that most of the environment is known to the robot at the time of planning, but there is incomplete information about some ‘hidden’ variables which represent potential blockages (e.g. open/closed doors, or corridors congested with other robots or obstacles). For these hidden variables, the robot has an estimated probability distribution and a prioritized preference over their possible values. In this paper, to deal with the problem of choosing an optimal policy for planning in offline mode, a stochastic dynamic programming model is developed, which is converted to and solved by linear programming. Next, a heuristic method is proposed for conditional planning in the presence of numerous hidden variables which produces near-optimal plans.

I. INTRODUCTION

Mobile robots are generally used for indoor or outdoor large-scale tasks such as navigation or transportation. A mobile robot may be used for exploring either known or unknown areas. In offline robot path planning, it is generally assumed that robots have complete information about their world. This assumption is often unrealistic, since robots operating in real-world environments often lack complete information about their surroundings and instead acquire this knowledge incrementally by interleaving plan execution and sensing. Therefore, robots need search methods other than the widely implemented conventional static search methods such as the A* algorithm.

Aiming to develop autonomous agents capable of planning in dynamic environments, an increasing amount of research has been dedicated to the problem of planning with incomplete information as a response to the growing need for acting in real-world applications, such as robot path planning [1], aeromedical evacuation of injured people in crisis situations [2], air campaign planning [3], and planning for path clearance for a robot navigating to its goal without being detected by adversaries [4].

When planning with missing information, the outcomes or costs of some actions are unknown to the robot, but the status of the robot is completely known. The certainty about the robot’s state and the uncertainty about the outcomes of its actions are properties of a Markov Decision Process (MDP), which differentiate it from a Partially Observable Markov Decision Process (POMDP), where there are uncertainties about both the robot’s state and the action outcomes.

Authors are with the Faculty of Engineering, Tarbiat Modares University, Tehran, Iran. Corresponding author’s email: [email protected].

The missing information about the outcomes of actions can be modeled with ‘Hidden Variables’ (HV), the exact values of which are not known at the beginning of path planning. For instance, there is not enough evidence to be sure that a door is open/closed, or a narrow corridor will/won’t be congested with other agents or dynamic obstacles before the robot reaches there. Instead, the robot uses a probability distribution for estimating the possible values of these hidden variables [5]. The exact values for HVs will be unfolded only after the robot arrives there and senses their values. The probability distribution for possible values of each HV may be based on previous observations or inferred from some other sensing actions.

For many real-world path planning problems with missing information, the initial values of the hidden variables can be roughly labeled as ‘best’, ‘good’, ‘bad’, etc. with an ordinal preference over them. For example, for a robot navigating through an initially unknown passage, the ‘best’ status is “the passage is open and traversable”, the ‘good’ status might be “the passage can be made traversable by removing the obstacles”, and the ‘bad’ status could be “the passage is impassable”.
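To make the notion of a multi-valued hidden variable with ordinal preferences concrete, the following Python sketch shows one possible in-memory representation. The class name, field names, and the example passage statuses are illustrative assumptions for this transcript, not part of the authors' formulation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HiddenVariable:
    """A hypothetical container for one hidden variable (HV).

    values : possible statuses, listed from most to least preferred
             (the ordinal preference), e.g. 'best' > 'good' > 'bad'.
    belief : estimated probability of each status before sensing.
    """
    name: str
    values: List[str]          # index 0 = most preferred value
    belief: Dict[str, float]

    def most_preferred(self) -> str:
        return self.values[0]

# Example: an initially unknown passage with three ordinal statuses.
passage = HiddenVariable(
    name="corridor_C3",
    values=["open", "clearable", "impassable"],
    belief={"open": 0.5, "clearable": 0.3, "impassable": 0.2},
)
assert abs(sum(passage.belief.values()) - 1.0) < 1e-9
print(passage.most_preferred())   # -> 'open'
```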

In the presence of uncertainty about the outcomes of actions with exact sensing, a probabilistic plan can be either online or offline, and open-loop or closed-loop. Since open-loop plans are similar in online and offline modes, we can define a total of three distinct classes of probabilistic plans as follows:
− Open-loop plans (also known as Conformant plans),
− Closed-loop offline plans (also known as Conditional or Probabilistic Contingent plans), and
− Closed-loop online plans (which we will call Incremental Conditional).
These classes are described in the following subsections.

A. Conformant Plans

Some planning systems try to achieve their goal states without sensing, and thus find sequences of actions regardless of the different possibilities for action outcomes. When preparing a conformant plan, the initial state is no longer assumed to be known, and actions may also be non-deterministic. Conformant plans assume that nature exhibits the fiercest behavior and always chooses the worst possible successor state [8]. In conformant planning, a solution is a sequence of actions that achieves the goal from every possible initial state [7], and the plan should succeed regardless of which particular initial world we start from [9].

B. Conditional Plans

When dealing with missing information, the planner may be interested in finding plans that contain both sensing and selecting actions, depending on the observations made during plan execution. That is, before execution, the planner must prepare a comprehensive scheme that specifies the proper actions for each step and after each hidden variable is unfolded.

The most important difference between conformant and conditional planning is the proof of existence of a solution and optimality: for some initial states and domains there may not be conformant plans, although conditional plans may exist [7]. If suitable solution procedures such as Dynamic Programming are utilized, then conditional plans can be proved to be optimal policies. While many conditional planners can deal with sensing actions, none of the conformant planners do so.

When trying to prepare an optimal conditional plan using an arbitrary search method, the belief-state space grows exponentially as the number of hidden variables (n) increases. The complexity order of this increase is O(a^n), in which a refers to the number of different possible values for the hidden variables. More formally, a is called the theoretical worst-case branching factor [10]. Generally, if a system has |A| different actions and |P| different percepts at each transition, the computational cost of finding an optimal plan of length k is in O(|A|^k · |P|^k) [10].
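As a quick numerical illustration of this growth (our own, not from the paper), the snippet below simply tabulates a^n for a few values of n, restating the O(a^n) expression above.

```python
# Worst-case branching: with a possible values per hidden variable,
# n hidden variables induce a^n value combinations to plan for.
a = 3   # e.g. 'best' / 'good' / 'bad'
for n in (2, 5, 10, 20):
    print(f"n = {n:2d} hidden variables -> a^n = {a ** n:,} combinations")
```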

Because of the intractable nature of optimal conditional planning, there are two approaches for preparing effective solution procedures: (1) divide and conquer, which iteratively solves the problem with different subsets of HVs, and (2) incremental addition of HVs (also called Assumptive Planning), which initializes the belief-state space with an empty or nonempty subset of HVs as the active HVs (with the other HVs assumed to have arbitrary values) and iteratively appends more active HVs to the belief-state space.

The first approach results in suboptimal conditional plans, whereas the second one yields near-optimal or even provably optimal plans if suitable procedures are implemented. Since the second approach is able to produce optimal plans, it is a good alternative for online conditional planning, as discussed later in the paper. Our developed algorithm takes advantage of the second approach (Incremental Assumptive Planning) for path planning with missing information, and is capable of handling numerous HVs in offline execution mode.

C. Incremental Conditional Plans

Most of the research on path planning has investigated solution methods for a priori planning problems. However, for many real-world problems, the environment either is partially known at the time of planning, or changes dynamically. Thus, the robot must continuously adapt its plan to environment changes, or should interleave planning and plan execution for rapid response to newly collected sensory data. In these cases the initial plan might no longer be applicable or effective. It turns out that the idea of incremental addition of hidden variables to the set of searchable variables can be used to construct an efficient planner that sequentially solves subproblems by performing a series of probabilistic low-dimensional searches. In other words, at the beginning of the initial planning phase, the robot solves a subproblem after assigning some arbitrary values to all hidden variables.

These arbitrary values could be the most preferred (desired) values for the hidden variables, as suggested by Likhachev and Stentz [5]. In that case the first subproblem is solved by a backward A* search. After that, if any hidden variable is found to have taken a single value (the best value in this case) and is located on the generated suboptimal path, then it should be investigated further by being added to the set of expandable hidden variables in the subsequent iteration. Through this approach, the first plan is obtained quickly, can be used by an online planning robot, and can be updated incrementally to work out a comprehensive plan containing all ‘effective’ hidden variables. As a result, a suboptimal plan always exists which can be executed in an online manner.

Nourbakhsh and Genesereth in [10] developed a robot architecture for interleaving planning and execution in the presence of incomplete information, benefiting from three types of assumptions about the robot’s initial conditions, behavior of actions, and behavior of percepts. Their assumptive algorithm was tested on a real robot. Since assumptive planning does not analyze all the hidden variables as thoroughly as conditional planning does, its computational cost is lower.

Bonet and Geffner in [11] dealt with the problem of probabilistic sensing and action modeled as a Partially Observable Markov Decision Process (POMDP). They introduced an algorithm called Real Time Dynamic Programming (RTDP) which combines a real-time greedy search with the Dynamic Programming (DP) method. The DP updates the set of observations and the cost function.

Koenig in [6] benefited from interleaving planning and plan execution to reduce the total planning and plan execution cost, and called this approach an ‘agent-centered search’. He generalized the previously developed LRTA* algorithm by Korf [12] and also developed a method based on POMDP planning and utility theory for goal-directed acting in the presence of incomplete information. In [13] Koenig introduced Lifelong Planning A* (LPA*), which is an incremental version of A* that repeatedly finds shortest paths in a dynamic graph from a given start vertex to a goal vertex. The graph’s edges or vertices are either added/deleted, or the costs of edges are changed. Through extensive experimental results he showed that in the case of small changes in the initial graph, LPA* significantly outperforms the A* algorithm.

Recently, Likhachev and Stentz [5] focused on hidden variables taking binary values that are either clearly preferred or clearly non-preferred. They used a series of low-dimensional backward A* searches applied to subproblems of the base problem in an incremental manner. The subproblems are produced by assigning preferred values to hidden variables. This assumption is later corrected for effective hidden variables, i.e., those through which the path generated by the backward A* search passes or whose exact values are sensed.

In this paper, we have extended the notion of binary-valued hidden variables to multi-valued hidden variables. This enables us to solve a wider range of problems with higher varieties of stochastic behaviors. Moreover, instead of A*-based search methods, we have developed a stochastic dynamic programming solution approach for planning with missing information which is formulated as a Linear Programming (LP) model (called model P1) and generates optimal conditional plans. Also, another LP model is presented (called model P2) that produces optimal conditional plans for incrementally-generated subproblems.

In addition, a heuristic algorithm called Incremental Assumptive Probabilistic Planning (IAPP) is developed which benefits from assumptive planning for missing information with multi-valued ordinal preferences to generate only the active states needed for incremental plans and to deal with large-scale state-space graphs. The IAPP applies the model P2 to generate optimal conditional plans for subsets of active states.

The paper is organized as follows: In Section II, planning with incomplete information and the concept of ordinal preferences are explained. The developed solution approaches for Markov Decision Processes of the problem at hand are described in Section III. Next, the developed framework IAPP is presented in Section IV. Experimental results are given in Section V, and finally some concluding remarks are presented in Section VI.

II. PLANNING WITH MISSING INFORMATION

Let us consider the simple example introduced in [5] in its original form:

Example 1. A mobile robot is to plan a path in a partially known terrain (Fig. 1). At the beginning of planning, it is located in cell A4 and its goal is to reach cell F4. Untraversable cells are shown in black, and there are two cells, B5 and E4 (shaded in grey), whose status is unknown to the robot. For each, the probability of containing an obstacle is 50%. In this example, the robot is restricted to move only in the four compass directions. Whenever the robot attempts to enter an unknown cell, it is assumed that it moves towards the cell, senses it, enters it if it is free, and returns back otherwise. The cost of each move is 1, and the cost of moving towards an unknown cell, sensing it and then returning back is 2.

More formally, in the graph G in Fig. 1(d) which represents the planning problem, all nodes and edge lengths are fully known except for the edges incident to B5 and E4, i.e., e(B5, B6) and e(E4, F4), respectively. In particular, these are hidden variables whose statuses affect the outcomes and/or costs of certain actions (or state transitions) and are unknown to the robot at the time of planning. Instead, the robot has probability distributions (beliefs) over the possible values of these hidden variables. During plan execution, however, the robot may sense one or more of these hidden variables at certain states. Once the robot senses the status of an HV, its actual value becomes known. That is, the connection or disconnection of the corresponding node to all its neighbors is determined at once.
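A minimal sketch of how the planning graph of Example 1 might be held in memory is given below; the dictionary layout and helper name are our own assumptions for illustration, and only a fragment of Fig. 1(d) is included.

```python
# Fragment of the graph G of Example 1: known edges have fixed cost 1,
# while edges incident to an unknown cell carry a belief instead.
known_edges = {
    ("A4", "B4"): 1, ("B4", "C4"): 1, ("C4", "D4"): 1,
    ("D4", "E4"): 1, ("E4", "F4"): 1,       # usable only if E4 is free
}

# Beliefs over the hidden variables (probability that the cell is blocked).
hidden_cells = {"B5": 0.5, "E4": 0.5}

def is_uncertain(edge):
    """An edge is uncertain if it is incident to a cell with a hidden value."""
    return any(cell in hidden_cells for cell in edge)

print(is_uncertain(("D4", "E4")))   # True: outcome unknown until sensed
print(is_uncertain(("A4", "B4")))   # False: ordinary, fully known edge
```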

The problem of planning with incomplete information and perfect sensing can be modeled as an MDP. Incomplete information refers to the situation in which the real values of hidden variables are unfolded only after sensing. The special problem of interest in this paper can be modeled as an MDP with finite states and infinite horizon.

Fig. 1. The problem of robot navigation with binary-valued hidden variables in a partially-known terrain: (a) The robot tries to enter the cell E4. (b) E4 is sensed to be open. (c) E4 is sensed to be blocked and the robot returns back. (d) Graph representation of the environment.

POMDP is a more general form of MDP, and its solution approaches are less efficient. Papadimitriou and Tsitsiklis [16] state that POMDP is PSPACE-complete and thus more intractable than NP-complete problems. Consequently, optimal conditional planning for real-sized POMDP problems becomes impossible due to memory requirements.

Optimal planning for the finite horizon MDP is also challenging because the straightforward solution approach for it is stochastic dynamic programming [15], and the time complexity of this general-purpose solution method depends on the problem structure and usually grows exponentially with the number of state variables [17]. However, in some cases efficient dynamic programming methods have been developed [16].

Although optimal planning for POMDP is intractable and finite horizon MDPs cannot be solved efficiently in general form, infinite horizon MDPs can be transformed into Linear Programming forms [15]. LP models are usually solved by the efficient Simplex algorithm, which is proved to have strongly polynomial average-case complexity [18]. Klee and Minty constructed a family of LP problems for which the Simplex method takes a number of steps exponential in the problem size [19]. However, other polynomial-time solution approaches, such as the ellipsoid method and some interior point methods, have been developed to deal with LP problems [20].

III. SOLVING THE MARKOV DECISION PROCESS

Solving an MDP means finding a policy π that prescribes the optimal action in every state i. Indeed, the policy π is a conditional plan. The MDP can be solved through a variety of algorithms such as Dynamic Programming (DP), Value Iteration, and Linear Programming (when dealing with infinite horizon MDPs). In the remainder of the paper we discuss these solution approaches.

A. A Stochastic Dynamic Programming Scheme for MDP

Markov Decision Processes are models for sequential decision making when outcomes are uncertain [14]. The MDP model consists of decision periods, states, actions, rewards, and transition probabilities. The process is summarized as below:
1) At time t, a certain state i of the Markov chain is observed.


2) After the observation of the state, an action, e.g. k, is taken from a set of possible decisions Ai. Different states may have different sets of decisions.
3) An immediate gain (or loss) qi(k) is then incurred according to the current state i and the action k taken.
4) The transition probability pij(k) is then affected by the action k.
5) As the time parameter t increases, a transition occurs again and steps (1)–(5) are repeated.
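The five steps above can be phrased as a small simulation loop; the sketch below is a generic illustration with made-up states, actions, gains and transition probabilities, not the paper's model.

```python
import random

# Toy MDP: two states, each with an action set A_i, immediate gains q_i(k)
# and transition probabilities p_ij(k).
q = {("s1", "go"): -1, ("s1", "wait"): 0, ("s2", "go"): -1}
p = {("s1", "go"): {"s2": 0.8, "s1": 0.2},
     ("s1", "wait"): {"s1": 1.0},
     ("s2", "go"): {"s2": 1.0}}

state, total_gain = "s1", 0.0
for t in range(5):                                   # steps (1)-(5), repeated
    action = "go"                                    # (2) pick an action from A_i
    total_gain += q[(state, action)]                 # (3) incur q_i(k)
    nxt = p[(state, action)]                         # (4) transition distribution
    state = random.choices(list(nxt), weights=list(nxt.values()))[0]
print(state, total_gain)
```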

Dynamic Programming is a solution approach for solving sequential decision models based on inductive computation. In Stochastic Dynamic Programming (SDP), the total cost v_s is the expected path length between the start (s) and goal (g) nodes, and should be minimized subject to v_g = 0 (in backward calculations) and some normal and Bellman equations representing certain and uncertain state transitions, respectively (explained later).

Consider a planning problem with missing information as a directed graph G(V, E) with the set of certain nodes V as states and the set of edges E as admissible transitions. The set E is partitioned into two mutually exclusive subsets Ec and Eu, which contain certain and uncertain edges, respectively. e(i, k, j) ∈ Ec means that in each state si ∈ V, trying to execute the action a(i, k) results in a certain state transition that positions the robot in state j ∈ V. Alternatively, e(i, k′, j) ∈ Eu means that in trying to execute the uncertain action a′(i, k′) with incomplete knowledge about its consequences, the robot is positioned in the state j ∈ V with probability pij(k′). Further notations are described as follows:

Parameters
u0: The set of all unknown hidden variables.
uA(t): The set of all active combinations in iteration t.
si^u: The state i when the set of unknown hidden variables is u.
qi(k): The immediate gain or loss when the robot is in si and actuates the action k.
pij(k): The probability of reaching sj when action k is selected from si.
succC(si): The set of certain successors of si.
succC(si, k): The certain successor of si when action k is selected.
succU(si): The set of probabilistic successors of si.
succU(si, k): The probabilistic successor of si when action k is selected.
a(i, k): The k-th action executed in state i.
a′(i, k′): The k′-th uncertain action with missing information of outcomes, executed in state i.
A(i, u): The set of all possible actions in state i with the set of unknown hidden variables u.
e(i, k, j): The edge e ∈ E corresponding to the state transition from si into sj when action k is selected.

Decision variables
vi^u: The expected path length connecting si to sgoal when all the hidden variables in the subset u have unknown values.

Normal equations are the set of constraints for certain state transitions from si^u to sj^u if action k is selected, as:

$$v_i^u = \min_k \big\{\, q_i(k) + v_j^u \,\big\}, \qquad \forall u, i, k, j \mid e(i,k,j) \in E_c \qquad (1)$$

and Bellman’s equations refer to uncertain state transitions from si^u succeeded by e(i, k, j) ∈ Eu if action k is selected, as:

$$v_i^u = \min_{k \in A_i} \Big\{ \sum_j p_{ij}(k)\big(q_i(k) + v_j^{u'}\big) \Big\}, \qquad \forall i, u \qquad (2)$$

in which

$$v_j^{u'} = \begin{cases} v_j^{u}, & \text{if } i \neq j,\ e(i,k,j) \notin E_u, \\ v_j^{\,u-\{k\}}, & \text{if } i = j,\ e(i,k,j) \in E_u. \end{cases}$$

For further clarification, the Bellman equations are applied to the graph representation of Example 1 in Fig. 1(d). As a result, the optimal v-values for the active nodes are:

$$v_{A4}^{u_0} = 10,\ v_{B4}^{u_0} = 9,\ v_{C4}^{u_0} = 8,\ v_{D4}^{u_0} = 7,\ v_{E4}^{u_0} = 1,$$
$$v_{D4}^{u_0-\{E4\}} = 10,\ v_{C4}^{u_0-\{E4\}} = 9,\ v_{C3}^{u_0-\{E4\}} = 8,\ v_{C2}^{u_0-\{E4\}} = 7,\ v_{C1}^{u_0-\{E4\}} = 6,\ v_{D1}^{u_0-\{E4\}} = 5,$$
$$v_{E1}^{u_0-\{E4\}} = 4,\ v_{F1}^{u_0-\{E4\}} = 3,\ v_{F2}^{u_0-\{E4\}} = 2,\ v_{F3}^{u_0-\{E4\}} = 1,\ v_{F4}^{u_0-\{E4\}} = 0.$$

This means that the robot’s optimal plan starts from cell A4 and passes through cells B4 to D4. At cell D4, the robot must verify whether E4 is traversable. If it is open, then the robot passes through it and reaches its goal state, F4. Otherwise, the robot should retract to C4 and continue its way through C3 towards its goal, F4. The v-value of each state demonstrates the expected path length to the goal node starting from that state. For instance, v_{A4}^{u_0} = 10 means that the expected path length from A4, before the values of the hidden variables are unfolded, is 10.
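As a sanity check on these numbers (our own illustration, not part of the paper), the value at D4 can be recomputed directly from equation (2): with probability 0.5 the robot pays 1 to enter E4 and then 1 more to reach F4, and with probability 0.5 it pays 2 for the failed attempt and then follows the 10-step detour.

```python
# Recomputing v_D4^{u0} from equation (2) for Example 1.
p_open = 0.5
v_E4_u0 = 1.0            # E4 -> F4 once E4 is known to be free
v_D4_detour = 10.0       # v_D4^{u0 - {E4}}: detour via C4, C3, ..., F4
cost_enter, cost_bump = 1.0, 2.0   # move cost vs. move-sense-return cost

v_D4_u0 = p_open * (cost_enter + v_E4_u0) + (1 - p_open) * (cost_bump + v_D4_detour)
print(v_D4_u0)   # 7.0, matching the value reported above
```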

Probabilistic planning for the robot’s environment with the graph representation shown in Fig. 1(d) is equivalent to planning for a belief-state space like that in Fig. 2. Each unknown cell is shaded in gray and is succeeded by two possibilities, Open or Closed, illustrated with the letters ‘O’ and ‘C’ associated with the arcs extended from the cells with hidden values. In a graph with N nodes, the existence of two hidden variables with m possible values each (two values in this case, for the Open and Closed states) results in a belief-state space graph with N×m^2 nodes. In Fig. 2, each of the four subgraphs represents a combination of HVs. Overall, with n HVs and a branching factor of m, all possible combinations will be m^n.

In our backward Stochastic DP method, only the previously calculated v-values are required for solving an equation, and there is no need to consider ‘unknown’ values for HVs. This makes the SDP efficient since the worst-case branching factor is set to m, while in most of the similar works the worst-case branching factor is set to m+1, with the additional one being assigned for taking into account an ‘unknown’ value as a possible value for each HV, and thus causing an exponential expansion in the belief-state space.

B. Linear Programming Formulation

As mentioned in Section II, infinite horizon MDPs can be transformed into Linear Programming form [15], and since planning with missing information is an infinite horizon MDP, it can be modeled as an LP problem as well. Solving the recursive equations introduced in (1) and (2) in their general forms is tedious, but can be simplified by considering them as a system of simultaneous equations which are easily transformed into an LP model. In this way, all the Bellman equations can be easily solved simultaneously using any off-the-shelf LP solver. Transforming the normal and Bellman equations into their respective linear equations is depicted in Fig. 3, and the corresponding LP model is given as model P1.

Fig. 2. Possible robot states in the belief-state space graph.

(a) $$v_i^u = \min_k \big\{\, q_i(k) + v_j^u \,\big\} \quad \text{is equivalent to} \quad v_i^u \le q_i(k) + v_j^u, \ \forall k$$

(b) $$v_i^u = \min_{k \in A_i} \Big\{ \sum_{j=1}^{n} p_{ij}(k)\big(q_i(k) + v_j^{u'}\big) \Big\} \quad \text{is equivalent to} \quad v_i^u \le \sum_{j=1}^{n} p_{ij}(k)\big(q_i(k) + v_j^{u'}\big)$$

Fig. 3. Different situations in a belief-state space graph and their respective formulations for LP modeling: (a) The node i has a set of certain successors as admissible choices. (b) The node i has a set of probabilistic successors.

$$
\textbf{P1:}\quad
\begin{aligned}
\max\ & z = v_{start}^{u_0} \\
\text{s.t.}\ \ & v_{goal}^{u} = 0, && \forall u \subset u_0, \\
& v_i^u \le q_i(k) + v_j^u, && \forall i,\ u \subset u_0,\ k \in A(i,u),\ j \in succC(i), \\
& v_i^u \le \sum_{j=1}^{n} p_{ij}(k)\big(q_i(k) + v_j^{u'}\big), && \forall i,\ u \subset u_0,\ k \in A(i,u),\ j \in succU(i), \\
& v_i^u \ge 0, && \forall i,\ u \subset u_0.
\end{aligned}
$$

The term ∀u ⊂ u0 in the constraint set of P1 refers to each combination u of HVs, and since for n HVs each with m possible states the total number of combinations equals m^n, the solution space grows exponentially both in constraints and decision variables. When comparing P1 with the Bellman equations introduced earlier, it can be easily inferred that the two approaches use equivalent sets of linear equations.
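To show what such a model looks like to an off-the-shelf solver, here is a hedged sketch that feeds a two-variable fragment of Example 1 (the states D4 and E4 with all hidden variables unknown) to scipy.optimize.linprog. The variable ordering and the hard-coded detour value 10 are simplifying assumptions made only for this illustration, not part of the paper's implementation.

```python
from scipy.optimize import linprog

# Decision variables: x = [v_D4_u0, v_E4_u0]; v_F4 = 0 and the detour value
# v_D4^{u0-{E4}} = 10 are folded in as constants to keep the fragment tiny.
# P1 maximizes v_start, so we minimize its negative.
c = [-1.0, 0.0]

# Bellman constraint at D4: v_D4 <= 0.5*(1 + v_E4) + 0.5*(2 + 10)
#   ->  v_D4 - 0.5*v_E4 <= 6.5
# Normal constraint at E4:  v_E4 <= 1 + v_F4 = 1
A_ub = [[1.0, -0.5],
        [0.0,  1.0]]
b_ub = [6.5, 1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)   # approximately [7.0, 1.0], i.e. v_D4^{u0} = 7 as before
```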

C. An Incremental LP Approach to Assumptive Planning

As mentioned above, in planning with missing information using Stochastic DP or, equivalently, P1, the equation set and thus the solution space (i.e. the belief-state space) grows exponentially with an increase in the number of hidden variables. However, by introducing ordinal preferences on the missing information, a new and scalable method based on the notions of assumptive planning and the Incremental Conditional Planning approach can be developed. In fact, we will be able to solve the problem by exploring only ‘Active combinations’ of hidden variables rather than all their possible combinations (i.e. the original belief-state space). The concept of an active combination is defined as follows:

Definition 1. Assume that ρ is an optimal/suboptimal policy on a belief-state space graph. Any node (or hidden variable) i that the policy ρ traverses through is called an Active node (or Active HV). That is, ∃ u; si^u ∈ ρ.

Definition 2. An Active Combination u′ is a subset of u0 such that the policy ρ traverses through it. That is, ∃ i; si^u ∈ ρ, u′ = u0 − u.

Exploring just the Active combinations instead of planning with regard to all possible combinations means replacing the term ∀u ⊂ u0 with ∀u ⊂ uA in P1, where uA represents the set of all Active combinations.

The Incremental LP algorithm is designed such that initially uA(0) = {φ} (in which φ is the most preferred value of hidden variables), and all Active combinations are identified and appended to it iteratively. The procedure of generating new active combinations in the t-th iteration is as follows:

Procedure Update(uA(t)): For each i-th hidden variable (HVi) with an assumed value under the combination u′ ∈ uA(t) such that the given policy ρ passes through it, the set of Active combinations in the next iteration uA(t+1) will be:

uA(t+1) = uA(t) ∪ {u′ ∪ HVi}.
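A direct transcription of this procedure into Python could look as follows; representing combinations as frozensets and passing the policy's assumed-value HVs through the policy_hvs argument are our own modelling choices for this sketch.

```python
def update_active_combinations(u_A, policy_hvs):
    """One iteration of Update(uA(t)).

    u_A        : set of Active combinations, each a frozenset of HVs
                 (frozenset() plays the role of phi, i.e. all HVs still at
                 their most preferred values).
    policy_hvs : mapping combination u' -> hidden variables with an assumed
                 value that the current policy rho traverses through.
    Returns uA(t+1) = uA(t) united with {u' union {HV_i}} for every such HV_i.
    """
    new_u_A = set(u_A)
    for u_prime, hvs in policy_hvs.items():
        for hv in hvs:
            new_u_A.add(frozenset(u_prime) | {hv})
    return new_u_A

# Example 1: the first policy passes through E4 under u' = phi.
u_A0 = {frozenset()}
u_A1 = update_active_combinations(u_A0, {frozenset(): ["E4"]})
print(u_A1)   # {frozenset(), frozenset({'E4'})}
```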

Having set the Active combinations uA(t), the model P1 is reformulated based on the reduced belief-state space as the model P2:


$$
\textbf{P2:}\quad
\begin{aligned}
\max\ & z = v_{start}^{u_0} \\
\text{s.t.}\ \ & v_{goal}^{u} = 0, && \forall u \subset u_A, \\
& v_i^u \le q_i(k) + v_j^u, && \forall i,\ u \subset u_A,\ k \in A(i,u),\ j \in succC(i), \\
& v_i^u \le \sum_{j=1}^{n} p_{ij}(k)\big(q_i(k) + v_j^{u'}\big), && \forall i,\ u \subset u_A,\ k \in A(i,u),\ j \in succU(i), \\
& v_i^u \ge 0, && \forall i,\ u \subset u_A.
\end{aligned}
$$

IV. INCREMENTAL ASSUMPTIVE PROBABILISTIC PLANNING

In preparing a conditional plan for path planning with ordinally preferred missing information, when the number of hidden variables is small, solving the recursive equations of the corresponding Stochastic DP formulation (presented in Section III.A) or its equivalent LP model (introduced in Section III.B) is recommended. But for real-world problems with a large number of hidden variables, we propose a new 5-step iterative algorithm called Incremental Assumptive Probabilistic Planning (IAPP).

The IAPP sequentially applies the model P2 to subproblems related to a subset of combinations of u0 (i.e., the Active combinations) instead of considering all possible combinations of u0. The algorithm is described as follows:

Step 0. Set all hidden variables to their most preferred values, i.e., uA(0) = {φ}.

Step 1. Plan with the model P2 to find the optimal policy (which is the preferred path to the goal). If the generated policy does not pass through any HV with an assumed value, then go to Step 4. Otherwise, pass on the newly identified Active HVs and their relative combination (u′ = φ) to Step 2.

Step 2. Use all the newly identified active HVs to generate new Active combinations, and update the set uA(t) according to the aforementioned procedure. Reformulate P2 to accommodate the updated uA(t).

Step 3. Plan with the model P2 to find the optimal policy for the updated subproblem. If the generated policy does not pass through any hidden variable with an assumed value, then go to Step 4. Otherwise, send the newly identified Active HVs to Step 2.

Step 4. Stop and report the optimal policy.

This approach does not guarantee finding globally optimal solutions unless no hidden variable is contained in the shortest path generated by the first execution of Step 1. This optimality condition can be proved straightforwardly.
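The overall IAPP loop can be summarized in a few lines of Python. Here solve_P2 and active_hvs_on_policy are placeholders for the LP of model P2 and for the check performed in Steps 1 and 3, so this is only a structural sketch of the algorithm under those assumptions, not the authors' implementation.

```python
def iapp(solve_P2, active_hvs_on_policy, max_iters=100):
    """Structural sketch of Incremental Assumptive Probabilistic Planning.

    solve_P2(u_A)            : returns the optimal policy for the subproblem
                               restricted to the Active combinations u_A.
    active_hvs_on_policy(pi) : returns {combination u': [HVs with assumed
                               values that policy pi passes through]}.
    """
    u_A = {frozenset()}                       # Step 0: all HVs most preferred
    policy = None
    for _ in range(max_iters):
        policy = solve_P2(u_A)                # Steps 1 / 3: plan with P2
        new_hvs = active_hvs_on_policy(policy)
        if not new_hvs:                       # no assumed HV on the policy
            return policy                     # Step 4: report the policy
        for u_prime, hvs in new_hvs.items():  # Step 2: grow Active combinations
            for hv in hvs:
                u_A.add(frozenset(u_prime) | {hv})
    return policy

# Tiny smoke test with stub callables: the first policy crosses E4, after
# which no assumed HV remains on the generated policy.
calls = {"n": 0}
def fake_solve(u_A):
    return "policy over %d combinations" % len(u_A)
def fake_active(policy):
    calls["n"] += 1
    return {frozenset(): ["E4"]} if calls["n"] == 1 else {}
print(iapp(fake_solve, fake_active))   # -> 'policy over 2 combinations'
```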

Instead of planning with the linear model P2 to find a suboptimal policy, one may think that implementing the A* search might be more favorable for finding the optimal or a near-optimal solution to the subproblems. However, we should note that A* is a heuristic search method and is unable to consider all the v*-values at once. For instance, Likhachev and Stentz adapted the standard A* into a backward search with the Bellman equations applied on estimated v-values [5]. As a result, their algorithm lacks optimality guarantees in the general case.

Unlike the stepwise heuristic A* search, since the model P2 simultaneously considers all the Bellman equations pertaining to all active HVs, it is capable of finding the minimum expected cost policy for each subproblem. Remarkably, if we manage to prove that the subset of HVs considered in a subproblem is the optimal subset, then our developed LP model is capable of finding the global optimal policy for the original problem. This proof, however, is among our future works.

Here the solution of Example 1 by the IAPP method is presented. In the first iteration the generated path is:

A4 → B4 → C4 → D4 → E4 → F4.

Since E4 is a hidden variable and the robot has to traverse through it with u′ = φ (which is a subset of uA(0) = {φ}), the next set of active combinations will be uA(1) = uA(0) ∪ {u′ ∪ E4}. The path generated in the second iteration will be:

A4 → B4 → C4 → D4 → E4 → F4 (if E4 is Open), or
A4 → B4 → C4 → D4 → E4 → D4 → C4 → C3 → C2 → C1 → D1 → E1 → F1 → F2 → F3 → F4 (if E4 is Closed).

In order to evaluate the efficiency of the IAPP algorithm we designed and solved a number of problems. Also, to assess the effect of reducing the size of the belief-state space through incremental implementation of the Active combinations, we solved the P1 models of the same problems by the Lingo™ solver and compared their runtimes and generated path lengths.

The test problems consisted of finding the optimal policy for a robot moving in a 4-connected grid as a terrain environment from a start cell to a goal cell. For producing random grid worlds we used randomly generated fractal environments that are often used to model outdoor environments, as discussed in [21]. The start and goal states were placed in the north-west and south-east corner of the grid, respectively. Both approaches were implemented on a PC with 1.5 GHz of CPU speed and 500 MB of RAM.

Table 1 summarizes the elapsed times (in seconds) for finding optimal policies obtained by solving the P1, and near-optimal policies calculated by the IAPP method. For each instance a Lower Bound (LB) and an Upper Bound (UB) on the path length was calculated by assigning certain values to HVs. When the most preferred value is assigned to all HVs, the calculated shortest path is considered as an LB, and if the least preferred value is assigned to all HVs, then the calculated shortest path is considered as an UB.
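The LB/UB computation described above amounts to two deterministic shortest-path queries; the sketch below illustrates it with a plain Dijkstra search on a small assumed graph, where adding or omitting the hidden edges stands in for substituting the most or least preferred HV values. The graph and helper names are hypothetical.

```python
import heapq

def dijkstra(adj, start, goal):
    """Plain Dijkstra on a dict-of-dicts graph; returns the shortest length."""
    dist, pq = {start: 0.0}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def bound(base_adj, hv_edges, start, goal, assume_open):
    """LB if assume_open=True (most preferred values), UB otherwise."""
    adj = {u: dict(nbrs) for u, nbrs in base_adj.items()}
    for (u, v), w in hv_edges.items():
        if assume_open:                 # most preferred: edge is traversable
            adj.setdefault(u, {})[v] = w
    return dijkstra(adj, start, goal)

# Hypothetical 4-node example: a certain route of length 3 and a shortcut
# of length 1 that exists only if the hidden passage is open.
base = {"s": {"a": 1}, "a": {"b": 1}, "b": {"g": 1}}
hv   = {("s", "g"): 1}
print(bound(base, hv, "s", "g", True))    # LB = 1
print(bound(base, hv, "s", "g", False))   # UB = 3
```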

We observed that small instances with fewer than 10 HVs could be solved optimally with P1, but larger instances could not be solved because an “Out of Memory” error occurred in the LP solver. This is caused by the exponential growth in the solution space mentioned in Sec. III.A.

For example, formulating the model P1 for a 20×20 4-connected grid with 20 HVs requires 20×4×2^20 decision variables. This value becomes 20×4×2^40 for the same grid with 40 HVs. As can be seen in Fig. 4, the runtimes for optimal solutions grow exponentially and depend on the number of hidden variables, but not on the grid size. Likewise, the runtimes of IAPP are largely independent of the grid size, but are affected by the number of HVs, though still much less than the optimally-solved P1.

TABLE I. Experimental Results of solving random problems by the two methods.

| Grid size | No. of HVs | % of HVs | Time Opt. (s) | Time IAPP (s) | Path LB | Path Opt. | Path IAPP | Path UB |
|-----------|------------|----------|---------------|---------------|---------|-----------|-----------|---------|
| 5×5       | 3          | 12       | < 1           | < 1           | 8.96    | 10.07     | 10.07     | 10.80   |
| 5×5       | 6          | 24       | < 1           | < 1           | 5.10    | 7.56      | 7.56      | 7.56    |
| 10×10     | 5          | 5        | < 1           | < 1           | 20.96   | 21.15     | 21.15     | 21.15   |
| 10×10     | 10         | 10       | 152           | < 1           | 17.08   | 17.38     | 17.38     | 18.05   |
| 10×10     | 20         | 20       | −             | 1             | 22.19   | −         | 23.18     | 23.22   |
| 20×20     | 20         | 5        | −             | < 1           | 37.16   | −         | 37.19     | 37.36   |
| 20×20     | 40         | 10       | −             | 5             | 44.50   | −         | 45.40     | 46.29   |
| 20×20     | 80         | 20       | −             | 103           | 41.62   | −         | 45.08     | 45.92   |
| 40×40     | 80         | 5        | −             | 9             | 87.42   | −         | 88.47     | 88.72   |
| 40×40     | 160        | 10       | −             | 496           | 90.00   | −         | 92.86     | 94.56   |

Fig. 4. Runtimes versus the number of hidden variables.

V. CONCLUSION

In this paper we have focused on conditional planning for multi-valued missing information in robot path planning. The missing information about the outcomes of actions can be modeled with ‘Hidden Variables’ (HV), the exact values of which are not known at the beginning of path planning.

Since the belief-state space of such problems grows exponentially with the number of hidden variables (HVs), an incremental algorithm was developed based on the notion of assumptive planning to plan in a search space comprised of only ‘Active’ combinations of HVs instead of a search space containing ‘all’ the possible combinations of HV states. In order to deal with the intractable state-space graph, using assumptive planning we initially assign the most preferred values to the hidden variables, but then relax this optimistic assumption for HVs identified to be Active and thus used in the generation of subsets of active combinations. The subproblem corresponding to each set of active combinations is solved with a linear programming model (P2), which is a transformation of the stochastic dynamic programming model of the problem.

Because the hidden variables are assumed to take their most preferred values before being added to the belief-state space, this optimistic assumption yields near-optimal solutions. In our future research we will try to work out the optimality conditions or lower/upper bounds for the LP formulation of the reduced Active belief-state space.

REFERENCES

[1] A. Stentz, “Optimal and efficient path planning for partially-known environments,” in Proc. IEEE International Conference on Robotics and Automation (ICRA), San Diego, CA, 1994, pp. 3310–3317.
[2] A. Kott, V. Saks, and A. Mercer, “A new technique enables dynamic replanning and rescheduling of aeromedical evacuation,” Artificial Intelligence Magazine, vol. 20, no. 1, pp. 43–53, 1999.
[3] K. Myers, “CPEF: A continuous planning and execution framework,” Artificial Intelligence Magazine, vol. 20, no. 4, pp. 63–69, 1999.
[4] M. Likhachev and A. Stentz, “Path clearance,” IEEE Robotics & Automation Magazine, vol. 16, no. 2, pp. 62–72, 2009.
[5] M. Likhachev and A. Stentz, “Probabilistic planning with clear preferences on missing information,” Artificial Intelligence, vol. 173, pp. 696–721, 2009.
[6] S. Koenig, “Goal-directed acting with incomplete information,” Ph.D. dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1997.
[7] T. Son, P. Tu, and C. Baral, “Planning with sensing actions and incomplete information using logic programming,” in Proc. 7th Int. Conf. Logic Programming and Nonmonotonic Reasoning (LPNMR’04), V. Lifschitz and I. Niemelä (Eds.), LNCS vol. 2923, Springer, pp. 261–274, 2004.
[8] S. Koenig, “Minimax real-time heuristic search,” Artificial Intelligence, vol. 129, pp. 165–197, 2001.
[9] J. Hoffmann and R. I. Brafman, “Conformant planning via heuristic forward search: A new approach,” Artificial Intelligence, vol. 170, pp. 507–541, 2006.
[10] I. R. Nourbakhsh and M. R. Genesereth, “Assumptive planning and execution: a simple, working robot architecture,” Autonomous Robots, vol. 3, no. 1, pp. 49–67, 1996.
[11] B. Bonet and H. Geffner, “Planning with incomplete information as heuristic search in belief space,” in Proc. 5th Int. Conf. on AI Planning and Scheduling (AIPS), Colorado, AAAI Press, pp. 52–61, 2000.
[12] R. Korf, “Real-time heuristic search,” Artificial Intelligence, vol. 42, no. 2–3, pp. 189–211, 1990.
[13] S. Koenig, M. Likhachev, and D. Furcy, “Lifelong planning A*,” Artificial Intelligence, vol. 155, pp. 93–146, 2004.
[14] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[15] W. K. Ching and K. N. Michael, Markov Chains: Models, Algorithms and Applications, Springer Science, 2006.
[16] C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of Markov decision processes,” Mathematics of Operations Research, vol. 12, no. 3, pp. 441–450, 1987.
[17] H. A. Taha, Operations Research: An Introduction, 8th ed., Pearson Prentice Hall, Upper Saddle River, NJ, 2007.
[18] H. Petra and K. H. Borgwardt, “Interior-point methods: worst case and average case analysis of a phase-I algorithm and a termination procedure,” Journal of Complexity, vol. 18, pp. 833–910, 2002.
[19] V. Klee and G. J. Minty, “How good is the Simplex algorithm?” in O. Shisha (Ed.), Inequalities, III, Academic Press, NY, pp. 159–175, 1972.
[20] M. S. Bazaraa, J. J. Jarvis, and H. D. Sherali, Linear Programming and Network Flows, 4th ed., John Wiley & Sons, Hoboken, NJ, 2010.
[21] A. Stentz, “Map-based strategies for robot navigation in unknown environments,” in Proc. AAAI Spring Symposium on Planning with Incomplete Information for Robot Problems, Stanford University, California, pp. 110–116, 1996.
