A game theoretical approach to finding optimal strategies for pursuit evasion in grid environments


Francesco Amigoni and Nicola Basilico

Abstract— Pursuit evasion problems, in which evading targets must be cleared from an environment, are encountered in surveillance and search and rescue applications. Several works have addressed variants of this problem in order to study strategies for the pursuers. As a common trait, many of these works present results in the general form: given some assumptions on the environment, on the pursuers, and on the evaders, upper and lower bounds are calculated for the time needed for (the probability of, the resources needed for, ...) clearing the environment. The question “what is the optimal strategy for a given pursuer in a given environment to clear a given evader?” is left largely open. In this paper, we propose a game theoretical framework that contributes to finding an answer to the above question in a version of the pursuit evasion problem in which the evader enters and exits a grid environment and the pursuer has to intercept it along its path. We adopt a criterion for optimality related to the probability of capture. We experimentally evaluate the proposed approach in simulated settings and we provide some hints to generalize the framework to other versions of the pursuit evasion problem.

I. INTRODUCTION

In a pursuit evasion problem, pursuers have to capture some evaders in an environment [1]–[3]. Pursuit evasion problems are encountered in many real world applications, for example in those related to security and to search and rescue. The literature has investigated manifold variations of the problem, producing a large number of results. Most of the work has been devoted to establishing bounds, valid for classes of environments, pursuers, and evaders, on the number of pursuers needed to guarantee the capture of the evaders, on the expected time required for capture, and on the computational complexity of the strategies adopted by the pursuers. Relatively few works have addressed the problem of finding the optimal pursuing strategy for a specific setting (i.e., for a given environment, pursuer, and evader).

In this paper, we present a game theoretical approach to calculate optimal pursuing strategies for given environments, pursuers, and evaders. We consider a pursuit evasion setting in which a single pursuer moves in a grid environment, while an evader is initially outside the environment, enters it through an entrance door, traverses it until it reaches an exit door, and leaves through that door. If the evader is intercepted by the pursuer while it is in the environment, namely if they occupy the same cell of the grid at the same time, then the evader is captured. In our approach, the optimal strategy for the pursuer is the one that maximizes the probability of capturing the evader. We conducted experimental activities in simulation to assess the properties of the proposed approach.

F. Amigoni is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy.

N. Basilico is with the School of Engineering, University of California, Merced, USA.

The main original contribution of this paper is the proposal of a game theoretical framework that can be used for calculating the optimal strategy for the pursuer in a pursuit evasion setting. Finding the optimal pursuing strategy in a given setting is a task that arises when applying pursuit evasion techniques to specific practical scenarios. This problem has found solutions based on genetic algorithms [4] and on heuristic approaches [5]. We propose a game theoretical approach inspired by our previous work on robotic patrolling [6], which in this paper is applied to a rather different scenario and significantly extended (e.g., by considering an evader that is as fast as the pursuer).

This paper is organized as follows. The next section reviews related works in the area of pursuit evasion. Section III introduces the pursuit evasion setting we consider, while Section IV presents our approach for finding the optimal strategy for the pursuer. Section V reports on the experimental validation of our approach. Finally, Section VI concludes the paper.

II. RELATED WORK

A pursuit evasion game is played by some pursuers and some evaders in an environment. The goal of the pursuers is to capture the evaders that, in turn, try to escape and avoid capture. Several specialized versions of the pursuit evasion game have been studied. Each problem is associated with specific characteristics of the environment and with specific capabilities of, and information available to, the pursuers and the evaders. In this section, we review some of the works that have appeared in the literature, without attempting to provide complete coverage (see [7] for a recent survey), but trying to give the general flavor of the results that have been obtained, in order to motivate the point of view we take in this paper.

The study of pursuit evasion games can be traced back to von Neumann’s hide-and-seek games [8], where a hider agent chooses one cell of a two-dimensional grid in which to hide itself and a seeker agent chooses a subset of cells of the grid (usually one row and one column) in which to seek the hider. If the seeker agent seeks the cell chosen by the hider agent, then the hider is captured and the seeker wins the game. Otherwise, the hider wins the game. Starting from this seminal work, adversarial pursuit evasion settings have been modeled as differential games [9]. Environments in which the game takes place are usually represented as graphs or geometrically.


Let us start from works that consider topological, graph-based representations of the environment (extensively surveyed in [2]). For instance, [1] presents complexity and algorithmic results for a number of pursuers that have to clear a graph and explicitly links pursuit evasion problems to patrolling problems addressed in mobile robotics [6], [10]. This is in line with our contribution, which draws inspiration from work on robotic patrolling to define a game theoretical framework for finding optimal pursuing strategies. Other results on pursuit evasion on graphs are presented in [11], where a pursuer with limited visibility has to capture an evader with global visibility. The paper also addresses the symmetric visibility case, presenting a characterization of environments for which greedy algorithms guarantee the capture of the evader. For the asymmetric case, an upper bound on expected capture time is derived. This result is general, in the sense that it holds for every environment represented as a graph. Although a lower bound on expected capture time is presented for a special class of graphs, the question of what is the optimal pursuing strategy for a given graph is left unanswered.

An attempt to calculate the optimal pursuing strategy on graph environments is reported in [5]. The environment is represented by a special form of graph, called “series-parallel” (i.e., a treewidth-2 graph). The optimal pursuing strategy is defined as the strategy that minimizes the sum of travelled distances of the pursuers or that minimizes the expected capture time. Since the problem is NP-hard, the authors propose a heuristic approach producing generally sub-optimal strategies, which are then experimentally compared with the optimal strategies (calculated with a brute-force search) in small settings. In our approach we determine the optimal pursuing strategies, but in a different setting and using the probability of capturing the evader as the criterion for optimality.

Another class of approaches tries to learn optimal strategies for the pursuers. For example, [4] considers grid settings (that can be easily reduced to graph settings) and applies genetic algorithms to learn the optimal behavior (represented by a set of situation/action pairs) for pursuers. In the variants of the pursuit-evasion game considered in [4], the evader escapes either by moving randomly or by trying to maximize its distance from the pursuers. The assumption is that the evader knows the positions of the pursuers. In our setting, we assume that the evader knows the randomized strategy of the pursuer. Another difference is that the strategy learned in [4] (the set of situation/action rules) is deterministic, while the strategy returned by our approach is a probability distribution.

Another large fraction of works considers a geometrical representation of the environment, for example, as a collection of geometric primitives (like polygons). Although our approach represents environments as discrete grids and cannot be directly compared with these works, there are some common traits in the formulation of the pursuit evasion game that are discussed in the following. For example, [3] considers bounded environments in which a team of aerial and ground pursuers have to capture evaders and, at the same time, build a map of the initially unknown environment. In the theoretical framework of [3], evaders are assumed to be randomly moving, but in the experimental activity (which is performed with simulated and real robotic agents) evaders that intelligently try to avoid capture are also considered. The theoretical framework is used to decide the optimal (with respect to the expected capture time) policy of the pursuers, namely where they move at the next time instant. Two sub-optimal but efficiently computable greedy policies are also proposed. Although we also look for the optimal strategy of the pursuer, our scenario is rather different from that of [3] because we assume that the environment is known and that the optimality refers to the probability of capturing an evader.

In the setting considered in [12], there is an arbitrarily fast evader that has full visibility over a geometrically represented environment (and, in particular, over the position of the pursuer) and that actively tries to avoid capture. Similarly to our work, there is a door through which the evader can escape the environment and win the game. Also similarly to our work, the pursuer uses a randomized strategy to make its decisions unpredictable for the evader. The evader knows the randomized strategy of the pursuer, but does not know its realization. For instance, the evader knows that the pursuer will move to point A with probability 1/3 and to point B with probability 2/3, but it does not know to which point the pursuer will actually move. The main result of the paper is an upper bound on the expected time of capture. As noted by the authors, “the randomized strategy may not be optimal for some environments”. This is exactly the issue we address in this paper: given an environment, what is the optimal pursuing strategy for given pursuers and evaders?

III. OUR PURSUIT EVASION SETTING

This section introduces the pursuit evasion setting we consider. The environment is represented by a bounded (8-connected) grid map, which has m entrance cells and n exit cells. For simplicity, we focus on rectangularly-bounded environments, but our approach is applicable to any environment. Each cell belongs to one of the following types: entrance cell, exit cell, obstacle cell, free cell. Obstacle cells are not traversable by the agents, while all other cells are. We assume that the environment is fixed.

There are a pursuer and an evader. Both agents know the environment and, in particular, the position of entrance and exit cells. The pursuer is equipped with sensors able to detect the evader when they are in the same cell. When the evader is detected by the pursuer, we say that it is captured. Following the discussion in [13], it is not difficult to extend the approach to pursuers with wider sensor ranges, but the notation becomes more complicated. In this paper we consider the simplest perception model for the pursuer in an attempt to make the presentation clearer. The goal of the evader is to enter the environment (from one of the entrance cells) and to leave it (from one of the exit cells) without being captured. The goal of the pursuer is to capture the evader while it is in the environment. If the evader successfully enters and leaves the environment without being captured, it wins the game. If it is captured, the pursuer wins the game.

The pursuer and the evader act at discrete time steps. Every agent performs an action at each time step. The pursuer can move from its current cell to an adjacent free cell or can remain in its current cell. For example, a pursuer currently in cell 4 of the simple environment of Fig. 1 can perform the 8 possible actions shown in the figure.

Fig. 1. Possible actions (black arrows) for a pursuer in cell 4 of a 3 × 3 environment (cell 7 is an obstacle cell).
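The action set of Fig. 1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming that cells are numbered row by row from the top-left corner (an assumption, although it is consistent with the cell numbers used in the experiments) and that every non-obstacle cell is traversable; the function name `pursuer_actions` is hypothetical.

```python
from typing import List, Set


def pursuer_actions(cell: int, rows: int, cols: int, obstacles: Set[int]) -> List[int]:
    """Cells the pursuer may occupy at the next time step (moving or staying).

    Assumes row-major cell numbering: cell = row * cols + col.
    """
    r, c = divmod(cell, cols)
    actions = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):  # (0, 0) is the "remain in the current cell" action
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                target = nr * cols + nc
                if target not in obstacles:
                    actions.append(target)
    return actions


# Pursuer in cell 4 of a 3 x 3 grid where cell 7 is an obstacle (as in Fig. 1).
print(pursuer_actions(4, 3, 3, {7}))  # [0, 1, 2, 3, 4, 5, 6, 8]
```

Under these assumptions the pursuer has seven moves plus the option of staying in cell 4, which gives the 8 actions mentioned in the text.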

The evader is assumed to start outside the environment and to decide through which entrance cell it will enter and through which exit cell it will leave the environment. We assume that the evader always follows the shortest path to go from an entrance cell to an exit cell. Recalling that we consider 8-connected grid environments, we assume that any movement (vertical, horizontal, or diagonal) from a cell to an adjacent cell has the same cost. Fig. 2 shows an example of shortest paths for the evader in a simple environment. As a consequence, the possible actions for the evader are the shortest paths that connect any entrance cell with any exit cell. For example, in the case of Fig. 2, the evader has 4 possible actions. Note that selecting a path implies selecting the entrance and the exit cells.

Fig. 2. Possible paths (in blue) for the evader in a 3 × 3 environment with one entrance cell (cell 1) and two exit cells (cells 6 and 8).
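The evader's action set can be enumerated with a breadth-first search. The sketch below is illustrative only: it assumes row-major cell numbering and unit cost for every move, and the helper names `neighbors` and `shortest_paths` are hypothetical. For the environment of Fig. 2 it finds two shortest paths to each exit cell, i.e., the 4 actions mentioned in the text.

```python
from collections import deque
from typing import Dict, List, Set


def neighbors(cell: int, rows: int, cols: int, obstacles: Set[int]) -> List[int]:
    """8-connected, non-obstacle neighbours of a cell (row-major numbering)."""
    r, c = divmod(cell, cols)
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and nr * cols + nc not in obstacles:
                out.append(nr * cols + nc)
    return out


def shortest_paths(src: int, dst: int, rows: int, cols: int,
                   obstacles: Set[int]) -> List[List[int]]:
    """All shortest paths from src to dst, each as a list of cells."""
    dist: Dict[int, int] = {src: 0}
    queue = deque([src])
    while queue:  # breadth-first search gives the distance of every cell from src
        cur = queue.popleft()
        for n in neighbors(cur, rows, cols, obstacles):
            if n not in dist:
                dist[n] = dist[cur] + 1
                queue.append(n)
    if dst not in dist:
        return []
    paths: List[List[int]] = []

    def extend(cell: int, suffix: List[int]) -> None:
        # Walk back from dst through cells that are one step closer to src.
        if cell == src:
            paths.append([src] + suffix)
            return
        for n in neighbors(cell, rows, cols, obstacles):
            if dist.get(n) == dist[cell] - 1:
                extend(n, [cell] + suffix)

    extend(dst, [])
    return paths


# Fig. 2: entrance cell 1, exit cells 6 and 8, in a 3 x 3 grid.
evader_actions = [p for exit_cell in (6, 8)
                  for p in shortest_paths(1, exit_cell, 3, 3, set())]
print(evader_actions)  # [[1, 3, 6], [1, 4, 6], [1, 4, 8], [1, 5, 8]]
```

The count does not change if cell 7 is treated as an obstacle, since none of these paths passes through it.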

We assume that the evader has to stay $d_t$ time steps ($d_t = 0, 1, 2, \ldots$) in an exit cell $t$ before leaving the environment. During this time it can be captured by the pursuer. We consider two models for the evader’s movement. In the simplest one, the evader can move arbitrarily fast and can go from the entrance cell to the exit cell in one time step, independently of the length of the (shortest) path connecting them. During the time step in which the evader moves along the path, it can be captured by the pursuer in any cell of the path. Although this model is rather unrealistic, it helps highlight some properties, as shown in Section V (infinite velocity for evaders is sometimes assumed in pursuit evasion models, see Section II). In the more realistic model, the evader spends one time step in moving from a cell to an adjacent free cell along the shortest path. Hence, with the simplest model the evader is faster than the pursuer, while in the more realistic model the two agents are equally fast.

Some comments on the above assumptions are in order. Assuming that the evader follows the shortest path from an entrance cell to an exit cell is without loss of generality when the evader is arbitrarily fast. In this case, since the probability of being captured is larger if the evader passes over a larger number of cells, the evader will only follow shortest paths. If the evader is as fast as the pursuer, the assumption that it will follow shortest paths can entail a loss of generality, because the evader could take a longer path as an evading manoeuvre. However, this assumption is justified by the need to limit the (possibly infinite) actions of the evader for computational reasons.

We make two assumptions on the knowledge of the evader: we assume that, while it waits outside the environment, the evader can see the position of the pursuer and that the evader knows the pursuer’s strategy. These assumptions amount to a worst case scenario for the pursuer, because the evader knows almost everything about it. A similar worst case stance is taken in some pursuit evasion works [12] and also in many robotic patrolling works [6], [10]. We note that the pursuer can adopt a randomized strategy, mitigating the “power” of the evader. Indeed, in this case, the evader will not know the next action of the pursuer, but only a probability distribution over the possible next actions. We also note that, when moving in the environment, the evader cannot see the position of the pursuer (unless they are in the same cell) and so it will blindly follow the selected path. This assumption makes the perception model of the evader similar to that of the pursuer discussed before.

There are several real-world situations that can be represented by the setting described in this section. A first example is a large museum corridor that connects the entrance doors and some sites that have some value (represented by the exit cells). Another example is a road that connects the entrance gates of a building with strategically important sites (exit cells). In both these cases a robotic guard (pursuer) has to capture any thief (evader) that tries to enter the environment and reach the sites of interest. The time $d_t$ the evader spends in an exit cell can model the fact that opening a securely closed door and going outside takes some time. It can also model some situations in which the evader has to wait some time for an elevator before being safe. The assumption that the evader knows the strategy of the pursuer is motivated by the fact that the evader, while waiting outside the environment, can observe the behavior of the pursuer for enough time and derive a correct belief over its strategy.

IV. FINDING THE OPTIMAL PURSUIT STRATEGY

A. Formulating the Strategic Game

In order to find the optimal pursuing strategy for the setting described in the previous section, we formulate the problem as a two-player multi-stage zero-sum game with imperfect information and infinite horizon [14]. (This formal framework is inspired by previous works on robotic patrolling [6].) The players are the pursuer and the evader that, at each stage of the game (corresponding to a time step), act simultaneously.

The actions available to the pursuer and to the evader are, as discussed in the previous section, the movements to adjacent cells and the shortest paths from an entrance cell to an exit cell, respectively. The actions of the pursuer are denoted by move(j), where j is a free cell adjacent to the current cell of the pursuer. When action move(j) is played at time k, then at time k + 1 the pursuer occupies cell j and checks it for the presence of the evader.

The available actions of the evader are denoted by wait and follow(p), where p is a shortest path connecting an entrance cell to an exit cell. With a slight overload of the notation, p also denotes the sequence of cells covered by the path: the starting (entrance) cell is p(1) and the ending (exit) cell is p(l), where l is the length of the path. Playing action wait at time k means waiting outside the environment for that time step. According to our simplest model for the movements of the evader, playing action follow(p) at time k means entering the environment, reaching cell t = p(l) at time k + 1, and staying there for $d_t$ time steps. While in an exit cell, the evader stays there for the time interval $\{k + 1, \ldots, k + d_t\}$. According to the more realistic model for the movements of the evader, playing action follow(p) at time k means entering the environment (cell p(1)) at time k, reaching cell p(2) at time k + 1, and so on, until exit cell t = p(l) is reached at the end of the path. Also in this case, the evader stays $d_t$ further time steps (namely, $\{k + l - 1, \ldots, k + l - 1 + d_t\}$) in cell t = p(l).
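The timing of follow(p) under the two movement models can be made concrete with a small sketch. The function `exposure` below is hypothetical; it returns, for each time step, the set of cells in which the evader can be captured, following the intervals stated above. For the arbitrarily fast model it always includes the traversal step at time k + 1, which is an assumption when $d_t = 0$.

```python
from typing import Dict, List, Set


def exposure(path: List[int], k: int, d_t: int, fast: bool) -> Dict[int, Set[int]]:
    """Map each time step to the cells where the evader can be captured.

    path is the sequence p(1), ..., p(l); k is the time at which follow(p) is played.
    """
    exit_cell = path[-1]
    length = len(path)  # l in the text
    schedule: Dict[int, Set[int]] = {}
    if fast:
        # Simplest model: the whole path is covered between k and k + 1, then
        # the evader sits in the exit cell up to time k + d_t.
        schedule[k + 1] = set(path)
        for time in range(k + 2, k + d_t + 1):
            schedule[time] = {exit_cell}
        return schedule
    # More realistic model: p(1) at time k, p(2) at k + 1, ..., p(l) at
    # k + l - 1, followed by d_t further steps in the exit cell.
    for step in range(length + d_t):
        schedule[k + step] = {path[min(step, length - 1)]}
    return schedule


print(exposure([1, 4, 8], k=0, d_t=2, fast=True))   # {1: {1, 4, 8}, 2: {8}}
print(exposure([1, 4, 8], k=0, d_t=2, fast=False))  # {0: {1}, 1: {4}, 2: {8}, 3: {8}, 4: {8}}
```

In the second call the evader occupies the exit cell over the interval $\{k + l - 1, \ldots, k + l - 1 + d_t\} = \{2, 3, 4\}$, as in the text.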

The evader’s actions are not perfectly observable and thus the pursuer, when acting, does not know whether the evader is currently within the environment or waiting outside. The game has an infinite horizon, since the evader is allowed to wait indefinitely outside the environment.

The possible outcomes of the game are:

• null: when the evader plays wait forever, i.e., it never enters the environment;

• capture: when the evader plays follow(p) at time k and the pursuer captures it on a cell of the path p or in exit cell t = p(l);

• escape: when the evader plays follow(p) at time k and the pursuer does not capture it on a cell of the path p nor in exit cell t = p(l).

The game is zero-sum, so a win for a player is a loss for the other player. We call $Y_0$ the utility the evader gets if it is captured and $Y_{d_t}$ the utility the evader gets when it wins the game by leaving the environment from cell $t$.

B. Solving the Strategic Game

We now discuss how to solve the strategic game for finding the optimal strategy for the pursuer, which maximizes the probability of capturing the evader. A strategy for the pursuer is a probability distribution over its possible next actions, namely over the actions move(j) where j is a free cell adjacent to the current cell of the pursuer. Ideally, the strategy can be represented by a function that, given the past history of the actions taken by the pursuer, returns the probability distribution over the possible next actions. However, for computational efficiency, we assume a Markovian strategy and we limit the history to the last action played by the pursuer. Then, the pursuer’s strategy is defined by the probabilities $\alpha_{i,j}$, where $\alpha_{i,j}$ is the probability that the pursuer moves to cell j if it is currently in cell i.

As in the robotic patrolling scenarios described in [6], [15], the evader’s knowledge of the pursuer’s strategy “naturally” leads to looking for a leader-follower equilibrium, where the pursuer is the leader and the evader is the follower. In a leader-follower equilibrium [16], the leader plays the strategy that maximizes its expected utility given that the follower observes this strategy and acts as a best responder. It has been shown that in two-player games a leader-follower equilibrium is never worse (in terms of the leader’s expected utility) than any Nash equilibrium [16].

We now show how a leader-follower equilibrium can be calculated for our setting. For simplicity, we do not consider the null outcome in which the evader waits forever outside the environment and we assume that $d_t$ is the same for all exit cells t (these two limitations can be easily removed). It is convenient to define the possible best responses for the evader as enter-when(p, c), namely, waiting until the pursuer is in cell c and then following path p (recall that the evader can fully observe the environment while it is waiting outside). The leader-follower equilibrium can be computed by resorting to mathematical programming.

We first describe the algorithm for the simple model of the movement of the evader (according to which the evader is arbitrarily fast and covers any path in a single time step). We introduce the variable $\gamma^{h,p}_{i,j}$, referring to the probability that the pursuer, starting from cell i, reaches cell j h time steps after the evader has entered, without having detected the evader along path p.

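$$
\begin{aligned}
\min\ & u \\
& \alpha_{i,j} \ge 0 && \forall i, j \in C && (1) \\
& \sum_{j \in C} \alpha_{i,j} = 1 && \forall i \in C && (2) \\
& \alpha_{i,j} \le G(i, j) && \forall i, j \in C && (3) \\
& \gamma^{1,p}_{i,j} = \alpha_{i,j} && \forall p \in P,\ i \in C,\ j \in C \setminus p && (4) \\
& \gamma^{h,p}_{i,j} = \sum_{x \in C \setminus \{p(l)\}} \left( \gamma^{h-1,p}_{i,x}\, \alpha_{x,j} \right) && \forall h \in \{2, \ldots, d_t\},\ p \in P,\ i \in C,\ j \in C \setminus \{p(l)\} && (5) \\
& Y_0 + (Y_{d_t} - Y_0) \times \sum_{j \in C \setminus \{p(l)\}} \gamma^{d_t,p}_{i,j} \le u && \forall i \in C,\ p \in P && (6)
\end{aligned}
$$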

where u is the expected utility of the evader, C is the set of free, entrance, and exit cells of the environment, G(i, j) = 1 if cells i and j are adjacent and G(i, j) = 0 if they are not, and P is the set of all paths available to the evader.

Constraints (1)-(2) express that probabilities $\alpha_{i,j}$ are well defined. Constraints (3) limit the pursuer’s actions to those compatible with the topology of the environment. Constraints (4)-(5) impose the Markov property on the pursuing strategy. Constraints (6) impose u as the upper bound of the expected utility of the evader. Finally, the objective function minimizes u, namely maximizes the expected utility of the pursuer (since the game is zero-sum), which is proportional to the probability of capturing the evader. This is our criterion for optimality. A solution of this problem is a set of probabilities $\{\alpha_{i,j}\}$ that represent the optimal strategy of the pursuer. The best response of the evader is a path p associated with the optimal strategy of the pursuer. Note that the above formulation holds for any starting location (cell) of the pursuer.
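To see what constraints (4)-(6) compute, it can help to evaluate them for a fixed strategy. The sketch below is not the optimization itself (which the paper delegates to a mathematical programming solver): for a given matrix `alpha` it propagates the recursion (4)-(5) and returns the smallest u that satisfies (6), i.e., the value of the evader's best response. The function names, the toy three-cell corridor, and the restriction to $d_t \ge 1$ are assumptions made for illustration.

```python
from typing import List, Sequence


def escape_probability_fast(alpha: Sequence[Sequence[float]], path: List[int],
                            start: int, d_t: int) -> float:
    """Probability that a pursuer starting in `start` misses an arbitrarily fast evader.

    Assumes d_t >= 1; alpha[i][j] is the probability of moving from cell i to cell j.
    """
    n = len(alpha)
    exit_cell = path[-1]
    on_path = set(path)
    # Constraint (4): the first move must land outside the whole path.
    gamma = [0.0 if j in on_path else alpha[start][j] for j in range(n)]
    # Constraint (5): in the following steps only the exit cell must be avoided.
    for _ in range(2, d_t + 1):
        gamma = [
            0.0 if j == exit_cell
            else sum(gamma[x] * alpha[x][j] for x in range(n) if x != exit_cell)
            for j in range(n)
        ]
    return sum(g for j, g in enumerate(gamma) if j != exit_cell)


def evader_utility(alpha: Sequence[Sequence[float]], paths: List[List[int]],
                   d_t: int, y0: float = 0.0, y_escape: float = 5.0) -> float:
    """Smallest u satisfying (6): the evader's best response over start cells and paths."""
    return max(
        y0 + (y_escape - y0) * escape_probability_fast(alpha, p, i, d_t)
        for p in paths
        for i in range(len(alpha))
    )


# Toy corridor with cells 0, 1, 2; the evader's only path is 0 -> 1 (exit cell 1).
# From cell 2 the pursuer stays or moves to cell 1 with equal probability.
alpha = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
]
print(evader_utility(alpha, [[0, 1]], d_t=2))  # 1.25
```

In this toy case the evader's best option is to enter when the pursuer is in cell 2: the pursuer misses it with probability 0.5 × 0.5 = 0.25, giving u = 5 × 0.25 = 1.25 with $Y_0 = 0$ and $Y_{d_t} = 5$.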

We now describe the algorithm for the more realistic model of the movement of the evader (according to which the evader is as fast as the pursuer). The new algorithm is a variation of the previous one and accounts for the fact that the evader moves one cell at a time.

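$$
\begin{aligned}
\min\ & u \\
& \alpha_{i,j} \ge 0 && \forall i, j \in C && (7) \\
& \sum_{j \in C} \alpha_{i,j} = 1 && \forall i \in C && (8) \\
& \alpha_{i,j} \le G(i, j) && \forall i, j \in C && (9) \\
& \gamma^{1,p}_{i,j} = \alpha_{i,j} && \forall p \in P,\ i \in C,\ j \in C \setminus \{p(1)\} && (10) \\
& \gamma^{h,p}_{i,j} = \sum_{x \in C \setminus \{p(h-1)\}} \left( \gamma^{h-1,p}_{i,x}\, \alpha_{x,j} \right) && \forall h \in \{2, \ldots, l + d_t\},\ p \in P,\ i \in C,\ j \in C,\ j \ne p(h) && (11) \\
& Y_0 + (Y_{d_t} - Y_0) \times \sum_{j \in C \setminus \{p(l)\}} \gamma^{d_t,p}_{i,j} \le u && \forall i \in C,\ p \in P && (12)
\end{aligned}
$$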

Constraints (7)-(9) are the same as above. Constraints (10)-(11) impose the Markov property, considering that the evader changes cell at each step (note that, in (11), p(h) = p(l) if h ≥ l). Finally, constraints (12) impose, as in the case of the arbitrarily fast evader, that u is an upper bound of the expected utility of the evader.
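The recursion (10)-(11) can be read as propagating the distribution of the pursuer's position while discarding, at each step h, the probability mass that lands on the cell p(h) occupied by the evader. The following sketch evaluates this for a fixed strategy. It assumes that the escape probability is the mass that survives all $l + d_t$ steps of (11); this horizon is our reading for the sake of illustration, and the function name and the toy corridor are hypothetical.

```python
from typing import List, Sequence


def escape_probability_equal_speed(alpha: Sequence[Sequence[float]], path: List[int],
                                   start: int, d_t: int) -> float:
    """Probability that a pursuer starting in `start` never meets the evader.

    The evader is in p(h) at step h (and stays in p(l) for h >= l); the check
    runs for l + d_t steps, which is an assumption of this sketch.
    """
    n = len(alpha)
    length = len(path)  # l in the text
    dist = [0.0] * n
    dist[start] = 1.0
    for h in range(1, length + d_t + 1):
        evader_cell = path[min(h, length) - 1]  # p(h), with p(h) = p(l) if h >= l
        # One Markov step of the pursuer, as in the sums of (10)-(11).
        moved = [sum(dist[x] * alpha[x][j] for x in range(n)) for j in range(n)]
        moved[evader_cell] = 0.0  # mass on the evader's cell means capture
        dist = moved
    return sum(dist)


# Toy corridor 0 - 1 - 2; the evader follows the path 0 -> 1 -> 2 with d_t = 1.
stay_put = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sweep_left = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(escape_probability_equal_speed(stay_put, [0, 1, 2], start=1, d_t=1))    # 0.0
print(escape_probability_equal_speed(sweep_left, [0, 1, 2], start=2, d_t=1))  # 1.0
```

In the second case the two agents swap cells without ever being in the same cell at the same time, so under the capture rule of Section III the evader escapes.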

The complexity of the proposed model is related to the number of constraints. For example, in the case of the more realistic model for the movement of the evader, the number of constraints is $O(|C|^2 \cdot |P| \cdot \max d_t)$ (due to constraints (11)), and so the number of constraints grows quadratically with the size $|C|$ of the environment and linearly with both the number $|P|$ of paths for the evader and the maximum time $\max d_t$ spent in an exit cell. The computational time for calculating a solution depends, once a solver for the mathematical programming problems has been selected, on $|C|$, $|P|$, and $\max d_t$, as the next section will show.

V. EXPERIMENTAL RESULTS

The mathematical programming problems of the previous section have been represented in AMPL (A Mathematical Programming Language, [17]) and solved using SNOPT (Sequential Non-linear OPTimizer, [18]). Tests have been run on a Linux computer equipped with a 2.33 GHz CPU and 4 GB RAM. For each experiment, we defined a rectangular grid environment and a value for $d_t$ (the time the evader has to stay in an exit cell before leaving the environment). We set $Y_0 = 0$ and $Y_{d_t} = 5$ (changing these values amounts to rescaling results, but does not affect the qualitative considerations we present). In the following, we present a representative sample of the many experiments we conducted.

We start with experiments in which we consider the simple model for the movement of an arbitrarily fast evader. A first test is shown in Fig. 3. Fig. 3(a) shows the environment: cells 0, 1, and 2 are entrance cells, while cells 6, 7, and 8 are exit cells. Blue arrows represent the possible shortest paths for the evader. Fig. 3(b) shows the expected utility u of the evader for different values of $d_t$. The expected utility of the pursuer can be simply calculated as $Y_{d_t} - u$. Of course, the longer the evader has to stay in an exit cell, the smaller its expected utility (and the larger that of the pursuer). For $d_t \ge 3$, the pursuer always captures the evader (u = 0). It is easy to see that, starting from any cell of the environment of Fig. 3(a), the pursuer can reach any of the exit cells in at most 4 time steps (the evader spends 1 time step to complete a path and $d_t = 3$ time steps in an exit cell). Fig. 3(c) shows the optimal strategy of the pursuer for $d_t = 2$, with the convention that the thicker the arrow, the more probable the corresponding action. The pursuer prefers to go toward the exit cells to capture the evader there. This is true also for larger values of $d_t$, for which the strategy of the pursuer tends to form a cycle over cells 4, 8, 7, 6, 4, . . . .
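One way to sanity-check the 4-step figure is to compute, for every starting cell, the minimum number of moves the pursuer needs to have visited all three exit cells. This is a back-of-the-envelope check under our own reading of the argument, not the optimization itself: the sketch assumes row-major cell numbering, no obstacle cells in the environment of Fig. 3(a), and that the starting cell does not count as visited. The function names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, Optional, Set


def moves(cell: int, rows: int, cols: int) -> List[int]:
    """Cells reachable in one step on an obstacle-free 8-connected grid (staying included)."""
    r, c = divmod(cell, cols)
    return [
        (r + dr) * cols + (c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def steps_to_cover(start: int, exits: Set[int], rows: int, cols: int) -> Optional[int]:
    """Minimum number of moves after which every exit cell has been visited."""
    goal: FrozenSet[int] = frozenset(exits)
    first = (start, frozenset())  # the starting cell is not counted as visited
    seen = {first}
    queue = deque([(start, frozenset(), 0)])
    while queue:  # breadth-first search over (cell, exits visited so far)
        cell, visited, steps = queue.popleft()
        if visited == goal:
            return steps
        for nxt in moves(cell, rows, cols):
            state = (nxt, visited | ({nxt} & goal))
            if state not in seen:
                seen.add(state)
                queue.append((state[0], state[1], steps + 1))
    return None


worst = max(steps_to_cover(s, {6, 7, 8}, 3, 3) for s in range(9))
print(worst)  # 4
```

The worst case is a pursuer starting in the top row, which needs two moves to reach the bottom row and two more to sweep it. This matches the 1 + $d_t$ = 4 time steps during which the evader is exposed when $d_t = 3$.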

An experiment in a more complex environment is presented in Fig. 4. In this case, the pursuer is not guaranteed to capture the evader even if the time $d_t$ the evader spends in an exit cell is arbitrarily long. This impossibility is related to the fact that the strategies of the pursuer must be Markovian. Ideally, given that the evader stays in an exit cell for enough time, the pursuer could be sure to capture it by following a cycle like 5, 8, 12, 8, 5, 6, 11, 15, 14, 11, 6, 5, and so on. The problem is that this cycle is not Markovian, because the pursuer, when in cell 8, has to go to different cells (12 or 5) according to the second-to-last cell (i.e., according to the last cell visited just before reaching 8). The same happens for cell 11. It is easy to show that, for the environment in Fig. 4(a), the Markovian optimal strategy is outperformed by a non-Markovian optimal strategy. Since our framework can only find Markovian strategies, the pursuer is not guaranteed to capture an evader, regardless of how large $d_t$ is. Fig. 4(c) shows the strategy of the pursuer for $d_t = 10$. The use of Markovian strategies restricts the cases in which our approach finds an optimal solution, but limits the explosion of computational complexity. The size of the mathematical programming problems, and in particular the number of probabilities $\{\alpha_{\mathit{hist},j}\}$ that must be calculated, grows exponentially in the length of the history hist of last actions played by the pursuer.
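The reason why this cycle cannot be followed by looking only at the current cell can be checked mechanically: a deterministic tour is reproducible by such a rule only if every cell has a single successor. The sketch below (with a hypothetical function name) lists the cells of the cycle that have more than one successor.

```python
from collections import defaultdict
from typing import Dict, List, Set


def history_dependent_cells(cycle: List[int]) -> Dict[int, Set[int]]:
    """Cells of a tour whose next cell is not determined by the current cell alone."""
    successors: Dict[int, Set[int]] = defaultdict(set)
    for current, following in zip(cycle, cycle[1:]):
        successors[current].add(following)
    return {cell: nxt for cell, nxt in successors.items() if len(nxt) > 1}


# The cycle of the text; the final 5 is the point where the cycle restarts.
cycle = [5, 8, 12, 8, 5, 6, 11, 15, 14, 11, 6, 5]
ambiguous = history_dependent_cells(cycle)
print(sorted(ambiguous))  # [5, 6, 8, 11]
```

Cells 8 and 11, discussed in the text, are flagged (with successors {12, 5} and {15, 6} respectively). Cells 5 and 6 are flagged as well, since the pursuer leaves them in different directions depending on where it came from.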

Fig. 3. First experiment in a 3 × 3 environment: (a) environment; (b) expected utility of the evader vs. dt; (c) optimal strategy for the pursuer for dt = 2.

Fig. 4. Second experiment in a 4 × 4 environment: (a) environment; (b) u vs. dt; (c) optimal strategy for the pursuer for dt = 10.

In both the previous experiments, the environments are corridor-like, namely entrance cells are all along the top edge and exit cells are all along the bottom edge. We now consider other environments with entrance and exit cells in different positions along their borders. Fig. 5 shows an experiment in an environment with two entrance cells (cells 3 and 12) and two exit cells (cells 0 and 15). In this case, given that dt is large enough, the pursuer can visit all the exit cells within dt + 1 time steps and capture the evader. Another interesting situation is reported in Fig. 6, where one of the two possible shortest paths for the evader (that going from cell 2 to cell 3) is included in the other one (that going from cell 2 to cell 14). Given that dt ≥ 4, the pursuer is guaranteed to capture the evader just by monitoring the surroundings of cell 3, which will be visited by the evader for any path it chooses. This clearly appears from Fig. 6(c), showing the optimal pursuing strategy for dt = 5 that guarantees the capture of the evader. The worst case for the pursuer is when the evader enters the environment when the pursuer is in cell 12. When dt ≥ 4, the pursuer can visit both cells 14 and 3 (in this order) within 5 time steps, capturing the evader in one of them.
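The worst case described for the environment of Fig. 6 can be reproduced with a short computation. This is a sketch under assumptions not confirmed by the text: the 3 × 5 environment is taken to have 3 rows of 5 cells numbered row by row from the top-left corner, no obstacles, and 4-adjacent moves that take one time step each. The function names are illustrative.

```python
ROWS, COLS = 3, 5   # assumed layout: 3 rows of 5 cells, row-major numbering


def manhattan(a, b):
    """Time steps between cells a and b on an obstacle-free 4-connected grid."""
    ra, ca = divmod(a, COLS)
    rb, cb = divmod(b, COLS)
    return abs(ra - rb) + abs(ca - cb)


def tour_length(cells):
    """Total time steps needed to visit `cells` in the given order."""
    return sum(manhattan(a, b) for a, b in zip(cells, cells[1:]))


# Pursuer starts in cell 12, then visits exit cells 14 and 3 in this order.
print(tour_length([12, 14, 3]))
```

Under these assumptions the tour 12, 14, 3 takes 2 + 3 = 5 time steps, in agreement with the bound dt + 1 = 5 for dt = 4. The same layout also makes cell 3 adjacent to cell 2 and places it on a shortest path from cell 2 to cell 14.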

Computational times for calculating the optimal strategy of the pursuer mainly depend on the size of the environment and on dt. Fig. 7 shows how the number of iterations SNOPT employs for finding the optimal strategy changes with the size of a (square) environment. We considered dt = D for a D × D environment. Increasing the number of entrance (in) and exit (out) cells slows down the computation, basically because there are increasingly many possible paths for the evader (recall that the possible paths for the evader connect any entrance cell to any exit cell). This analysis shows the need to limit the number |P| of paths for the evader, as discussed in Section III. Fig. 8 (referring to a 3 × 3 environment) shows that the number of iterations depends in a rather complex way on dt. This is because large values of dt do not always lead to easy solutions for the pursuer (as in the case of Fig. 4). All our experiments terminated in less than 2 hours. Considering that we did not apply any optimization, these results support the feasibility of our approach for calculating optimal pursuing strategies. In practical applications, these strategies are expected to be calculated off-line, at design time, and then implemented on robotic guards.

We now show some experiments in which we consider the second, more realistic, model for the movement of the evader. Fig. 9 compares the expected utilities obtained with the two models for the movement of the evader in the environment of Fig. 3(a). With the more realistic model, the expected utility of the evader is decreased because the evader is slower and stays for more time in the environment, facilitating the capture for the pursuer.

Fig. 5. Third experiment in a 4 × 4 environment: (a) environment; (b) utility of the evader vs. dt.

Fig. 6. Fourth experiment in a 3 × 5 environment: (a) environment; (b) u vs. dt; (c) optimal strategy for the pursuer for dt = 5.

Fig. 7. Number of iterations vs. size of the environment (x-axis: #cells, from 1×1 to 6×6; y-axis: #iterations (× 1000); legend: 1 in, 1 out; 2 in, 2 out; 3 in, 3 out; 4 in, 4 out).

Fig. 8. Number of iterations vs. dt (x-axis: dt; y-axis: #iterations (× 1000)).

A somewhat counterintuitive result is that sometimes the evader has some advantages in being slower. An example is presented in Fig. 10. In this situation, the minimum dt for which the pursuer is guaranteed to capture the evader is larger in the case of a slower evader. This can be explained by noting that, with an arbitrarily fast evader, the task of the pursuer is relatively easy: it can go to the exit cells, where the evader spends most of its time in the environment, and capture the evader there. A slower evader can move synchronously with the pursuer and has the possibility to win games also for larger values of dt.

Computational times for the more realistic model of the movement of the evader are qualitatively similar to those of Figs. 7 and 8, the only difference being that the more realistic model requires a slightly larger number of iterations.

Fig. 9. Expected utility of the evader vs. dt in the case of an arbitrarily fast evader (blue curve) and in the case of an evader as fast as the pursuer (yellow curve) for the environment of Fig. 3(a).

Fig. 10. An experiment in a 3 × 3 environment: (a) environment; (b) expected utility of the evader vs. dt in the case of an arbitrarily fast evader (blue curve) and in the case of an evader as fast as the pursuer (yellow curve).

VI. CONCLUSION

In this paper, we have presented an approach to compute the optimal strategy for a pursuer that maximizes the probability of capturing an evader in a given environment. The proposed approach is based on the definition of a game theoretical model of a pursuit evasion setting and on its solution by means of mathematical programming. Experimental results confirm the feasibility of the approach and provide some interesting insights on the relations between the optimal strategies and the features of the environments in which the game is played.

Future work will consider more complex environments and the computation of more general, non-Markovian, strategies in order to overcome situations like that of Fig. 4. This will require the development of methods to reduce the computational complexity of calculating optimal strategies. With a way to find solutions in (nearly) real time, dynamically changing environments could also be considered. Moreover, our pursuit evasion setting could be generalized. Extending our approach to generic graph environments is straightforward. Issues like assuming that the evader is already inside the environment at the start of the game and that the evader can stop in some cells along its path can also be easily considered in our model. Accounting for pursuers and evaders that adjust their paths and velocities opportunistically deserves more work towards a model for on-line pursuit evasion that considers perceptions and adaptive strategies. Our approach can handle multiple synchronized pursuers by considering a single “virtual” pursuer whose actions and perceptions are elements of the Cartesian product of the sets of actions and perceptions of the actual pursuers. Handling multiple independent pursuers and multiple evaders requires a deeper investigation. Finally, implementing our approach on real robots will further assess its significance.
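The “virtual” pursuer construction mentioned above can be illustrated as follows. This is a minimal sketch, assuming an obstacle-free 3 × 3 grid with row-major numbering and 4-adjacent moves; the grid, the movement model, and the function names are illustrative assumptions rather than part of the model described in the paper.

```python
from itertools import product

ROWS, COLS = 3, 3   # illustrative obstacle-free grid, row-major numbering


def actions(cell):
    """Cells a single pursuer may move to from `cell` (4-adjacent moves, assumed)."""
    r, c = divmod(cell, COLS)
    moves = []
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            moves.append(nr * COLS + nc)
    return moves


def joint_actions(joint_state):
    """Actions of the virtual pursuer: one target cell per actual pursuer."""
    return list(product(*(actions(cell) for cell in joint_state)))


# Two synchronized pursuers located in cells 0 and 4.
print(joint_actions((0, 4)))
```

The number of joint actions is the product of the numbers of individual actions (here 2 × 4 = 8), so the action set of the virtual pursuer grows quickly with the number of actual pursuers.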

ACKNOWLEDGEMENTS

The authors gladly thank Simone Priamo for his contributions to the experimental part of this work.

REFERENCES

[1] R. Borie, C. Tovey, and S. Koenig, “Algorithms and complexity results for pursuit-evasion problems,” in Proc. IJCAI, 2009, pp. 59–66.

[2] A. Kolling and S. Carpin, “Pursuit-evasion on trees by robot teams,” IEEE T ROBOT, vol. 26, no. 1, pp. 32–47, 2010.

[3] R. Vidal, O. Shakernia, J. Kim, D. Shim, and S. Sastry, “Probabilistic pursuit-evasion games: Theory, implementation and experimental results,” IEEE T ROBOTIC AUTOM, vol. 18, no. 5, pp. 662–669, 2002.

[4] J. Denzinger and M. Fuchs, “Experiments in learning prototypical situations for variants of the pursuit game,” in Proc. ICMAS, 1996, pp. 48–55.

[5] K. Daniel, R. Borie, S. Koenig, and C. Tovey, “ESP: pursuit evasion on series-parallel graphs,” in Proc. AAMAS, 2010, pp. 1519–1520.

[6] F. Amigoni, N. Basilico, and N. Gatti, “Finding the optimal strategies in robotic patrolling with adversaries in topologically-represented environments,” in Proc. ICRA, 2009, pp. 819–824.

[7] T. Chung, G. Hollinger, and V. Isler, “Search and pursuit-evasion in mobile robotics: A survey,” AUTON ROBOT, vol. 31, no. 4, pp. 299–316, 2011.

[8] M. Flood, “The hide and seek game of Von Neumann,” Management Science, vol. 18, no. 5, pp. 107–109, 1972.

[9] O. Hajek, Pursuit Games. Academic Press, 1975.

[10] N. Agmon, S. Kraus, and G. Kaminka, “Multi-robot perimeter patrol in adversarial settings,” in Proc. ICRA, 2008, pp. 2339–2345.

[11] V. Isler and N. Karnad, “The role of information in the cop-robber game,” THEOR COMPUT SCI, vol. 399, pp. 179–190, 2008.

[12] V. Isler, S. Kannan, and S. Khanna, “Randomized pursuit-evasion in a polygonal environment,” IEEE T ROBOT, vol. 21, no. 5, pp. 864–875, 2005.

[13] N. Basilico, N. Gatti, T. Rossi, S. Ceppi, and F. Amigoni, “Extending algorithms for mobile robot patrolling in the presence of adversaries to more realistic settings,” in Proc. IAT, 2009, pp. 557–564.

[14] D. Fudenberg and J. Tirole, Game Theory. The MIT Press, 1991.

[15] P. Paruchuri, J. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus, “Efficient algorithms to solve bayesian Stackelberg games for security applications,” in Proc. AAAI, 2008, pp. 1559–1562.

[16] B. von Stengel and S. Zamir, “Leadership with commitment to mixed strategies,” London School of Economics, CDAM Research Report LSE-CDAM-2004-01, 2004.

[17] R. Fourer, D. Gay, and B. Kernighan, “A modeling language for mathematical programming,” Management Science, vol. 36, no. 5, pp. 519–554, 1990.

[18] Stanford Business Software Inc., “http://www.sbsi-sol-optimize.com/.”

































