A co-evolutionary method for pursuit-evasion games with non-zero lethal radii


Engineering Optimization, Vol. 36, No. 1, February 2004, 19–36

A CO-EVOLUTIONARY METHOD FOR PURSUIT-EVASION GAMES WITH NON-ZERO LETHAL RADII

HAN-LIM CHOI, HYEOK RYU, MIN-JEA TAHK∗ and HYOCHOONG BANG

Division of Aerospace Engineering, Korea Advanced Institute of Science and Technology, 373-1, Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea

(Received 28 May 2003; In final form 28 September 2003)

This study suggests a co-evolutionary method for solving pursuit-evasion games with consideration of non-zero lethal radii. The proposed method has three key features. First, it can handle both the final time problem and the miss distance problem simultaneously, by adopting a separated payoff function. Second, the Stackelberg equilibrium instead of the security strategy solution is employed to consider the maximin characteristics of an open-loop solution. Finally, an additional evolving group is introduced to treat an unprescribed final time. Numerical simulations are performed to verify the proposed method by comparing it with the gradient-based method. In addition, the effect of lethal radius is discussed based on the numerical results.

Keywords: Pursuit-evasion game; Co-evolutionary algorithm; Differential game; Direct optimization method; Stackelberg equilibrium

1 INTRODUCTION

Today, there are increasing needs for a robust missile guidance law, which guarantees acceptable interception performance against an intelligent target. Thus, the pursuit-evasion game, first introduced by Isaacs [1], has become a fascinating concept to meet those needs, since it considers worst-case design. The pursuit-evasion game results in a zero-sum game between the pursuer and the evader control; the payoff of the game is usually the final time or the miss distance. However, an analytic solution of the pursuit-evasion game is available only for very simple cases. Thus, to implement the game concept into real guidance problems, a reliable numerical solver, which can deal with realistic pursuit-evasion situations, is essential. An indirect method like the multiple shooting method is not suitable for that practical purpose, since it entails a complicated formulation and difficulty in the initial guess of the costate.

Recently, many researchers have been interested in obtaining numerical solutions of pursuit-evasion games using a direct optimization method, in which control parameterization is involved. Tahk et al. [2, 3] proposed a gradient-based method that finds the maximin solution throughout the update and correction loop, and Ryu [4] extended it to three-dimensional pursuit-evasion games. Ehtamo and Raivio [5] introduced a bilevel programming approach, which is based on the decomposition of the necessary conditions and the feasible direction search optimization method. These two methods have provided desirable solutions for rather complicated three-dimensional pursuit-evasion games. However, they have only considered final time problems, although a miss distance problem can give more useful information in real situations. The miss distance problem inevitably requires a more complicated formulation than the final time problem; this leads to difficulty in obtaining a numerical solution.

∗ Corresponding author. Tel.: +82-42-869-3718; Fax: +82-42-869-3710; E-mail: [email protected]

Engineering Optimization ISSN 0305-215X print; ISSN 1029-0273 online © 2004 Taylor & Francis Ltd

http://www.tandf.co.uk/journals DOI: 10.1080/03052150310001634862

The co-evolutionary algorithm [6] can also be a good candidate for solving pursuit-evasion games. It was developed as a minimax solver for saddle point optimization problems; it has provided remarkable results when applied to the robust control of an aircraft [6] and to constrained parameter optimization [7]. Since a direct approach to the pursuit-evasion game leads to a saddle point parameter optimization, the basic architecture of the co-evolutionary algorithm is very suitable for solving pursuit-evasion games. It can be noted that the co-evolutionary algorithm provides a general framework for the optimization process which is not problem-specific, and which therefore depends little on the payoff of the game. Hence, when applied to the pursuit-evasion game, it is expected to give a reasonable game solution whether the payoff is the final time or the miss distance.

However, the conventional co-evolutionary algorithm cannot directly be implemented into the pursuit-evasion game for the following reasons: (1) the final time is not pre-determined in the pursuit-evasion game; (2) a differential game cannot be treated as a security strategy game after it is formulated as a direct optimization problem. Therefore, modifications of the evolution structure and of the way the fitness evaluation is taken are required to graft the co-evolution concept into the pursuit-evasion game.

The authors have made efforts to implement the co-evolutionary algorithm into pursuit-evasion games by modifying it [8, 9]. Both of these works dealt with the final time problem, and shared a mindset in treating the capture condition: they considered that condition as a constraint of minimax optimization. Choi and Tahk [8] introduced a penalty function method, which leads to a bi-matrix game problem, to deal with this constraint, while Kim and Tahk [9] tried to convert the constrained minimax problem into an unconstrained one by adopting an augmented Lagrangian function. Unfortunately, these approaches were not particularly successful. The former suffered from solution ambiguity caused by the penalty function, whereas the latter suffered from a logical discrepancy which resulted from inseparability of the constraint. However, Choi and Tahk [8] hinted at the effectiveness of (1) a new evolving group to deal with an unprescribed final time, and (2) the Stackelberg equilibrium in fitness evaluation of the players.

This study suggests a new co-evolutionary approach to the pursuit-evasion game based on the co-evolution framework of Choi and Tahk [8]. This work handles the final time problem and the miss distance problem at the same time by modifying the payoff, and adopts the Stackelberg equilibrium for fitness evaluation. Moreover, a new evolving group is employed for guessing the final time. Numerical examples for verifying the proposed method are also given and compared with results of the gradient-based method.

This paper is organized as follows. Section 2 introduces the preliminaries of this work, such as the formulation of a pursuit-evasion game and an introduction to the co-evolutionary algorithm. In Section 3, the main ideas of the suggested method are explained, whereas details of the algorithm are given in Section 4. Section 5 verifies the proposed method numerically by comparing it with the gradient-based method. Section 6 concludes this work.

2 PRELIMINARIES

2.1 Pursuit-Evasion Game

The pursuit-evasion game considered in this paper is a perfect information zero-sum differential game, in which two players (the pursuer and the evader) are concerned. In addition, this paper assumes that the dynamics of each player is decoupled from the other; therefore, the equations of motion can be stated as

$$
\begin{pmatrix} \dot{x}_P(t) \\ \dot{x}_E(t) \end{pmatrix} = \dot{x}(t) = f(x(t), u_P(t), u_E(t), t) = \begin{pmatrix} f_P(x_P(t), u_P(t), t) \\ f_E(x_E(t), u_E(t), t) \end{pmatrix}, \qquad x(0) = x_0, \tag{1}
$$

where $x_i(t) \in \mathbb{R}^{n_i}$, $u_i(t) \in \mathbb{R}^{m_i}$, $i = P, E$ (Pursuer, Evader). The function $f$ is continuous in $t$, $u_P$, and $u_E$ and is continuously differentiable in $x$. The final time of the game is defined as

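$$
t_f = \inf\{t \in \mathbb{R}^+ \mid (x(t), t) \in \Lambda\}, \tag{2}
$$

where $\Lambda$ is a closed subset, called the target set, in the product space $\mathbb{R}^{n_P+n_E} \times \mathbb{R}^+$. The boundary $\partial\Lambda$ of $\Lambda$ is an $(n_P + n_E)$-dimensional manifold in the space $\mathbb{R}^{n_P+n_E} \times \mathbb{R}^+$ and is characterized by the scalar equation

$$
l(x(t), t) = 0 \tag{3}
$$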

which is assumed to be continuous in $t$ and continuously differentiable in $x$. The payoff of the game is assumed to be of Mayer form,

$$
J(u_P, u_E) = q(x(t_f), t_f), \tag{4}
$$

where $q$ is continuous in $t_f$ and continuously differentiable in $x(t_f)$.

The final time $t_f$ or the miss distance $r_f$ is often selected as the payoff of the pursuit-evasion game. For each case, the following boundary function can be employed:

$$
l(x(t), t) = r - r_c \quad \text{if } q(x(t_f), t_f) = t_f, \tag{5}
$$

$$
l(x(t), t) = \dot{r} = \frac{dr}{dt} \quad \text{if } q(x(t_f), t_f) = r_f, \tag{6}
$$

where $r$ is the distance between the players, and $r_c$ is the lethal radius of the pursuer's warhead.

Suppose that a pair $(\sigma_P^*, \sigma_E^*) \in \Sigma_P \times \Sigma_E$ is a saddle-point solution in feedback strategies for the game and $x^*(t)$ is the corresponding trajectory. The value function satisfying the Isaacs equation is then defined by

$$
V(x, t) = \min_{\sigma_P \in \Sigma_P} \max_{\sigma_E \in \Sigma_E} q(x(t_f), t_f) = \max_{\sigma_E \in \Sigma_E} \min_{\sigma_P \in \Sigma_P} q(x(t_f), t_f) \tag{7}
$$

for the initial condition $(x, t)$. Since the dynamics of each player is decoupled, and the payoff is terminal, the open-loop representations $u_i^*(t) := \sigma_i^*(x^*(t), t)$, $i = P, E$, are also the saddle-point solution [10]. This study mainly focuses on this open-loop represented saddle-point solution.

Numerical methods for obtaining open-loop represented solutions can be classified into two categories: indirect methods and direct methods. Since indirect methods inevitably entail two-point boundary value problems (TPBVPs) that are difficult to solve, direct methods, such as nonlinear programming and the gradient method, have become more attractive in solving complicated pursuit-evasion games.

This paper also proposes a direct approach, which is based on the discretization of the control variables. The authors assume that the interval $[0, t_f]$ is divided into $N$ subintervals of the same length, and that the time history of each player's control variable can be approximated as follows:

$$
u_i(t) \approx u_i^D(t) = \sum_{k=1}^{N} \left[ u_{i,k} \times \pi(t - (k-1)\Delta t,\; t - k\Delta t) \right], \quad i = P, E, \tag{8}
$$

where

$$
\pi(t_1, t_2) = \begin{cases} 1 & \text{if } t_1 \ge 0 \text{ and } t_2 < 0, \\ 0 & \text{otherwise,} \end{cases} \qquad \Delta t = \frac{t_f}{N}.
$$

In other words, the control history of each player is assumed to be a sequence of $N$ piecewise constant control inputs, each of which lasts for $\Delta t$. Therefore, the control parameter vector can be expressed as

$$
u_i^V = [u_{i,1}, u_{i,2}, \ldots, u_{i,N}], \quad i = P, E. \tag{9}
$$
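The parameterization of Eqs. (8) and (9) can be illustrated with a short Python sketch. The function name `piecewise_constant_control` and the choice to hold the last value at exactly $t = t_f$ are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def piecewise_constant_control(u_params, t, t_f):
    """Evaluate the piecewise-constant control of Eq. (8) at time t.

    u_params : the N control parameters [u_1, ..., u_N] of one player (Eq. (9)).
    t_f      : final time, so each segment lasts dt = t_f / N.
    """
    u_params = np.asarray(u_params, dtype=float)
    n = len(u_params)
    dt = t_f / n
    # Segment k (1-based) is active when (k - 1) * dt <= t < k * dt.
    k = int(np.floor(t / dt))
    # Assumption: clamp so that t = t_f (and any overshoot) reuses the last value.
    k = min(max(k, 0), n - 1)
    return float(u_params[k])


u_vec = [1.0, -1.0, 0.5]  # hypothetical parameter vector with N = 3
print([piecewise_constant_control(u_vec, t, 3.0) for t in (0.5, 1.0, 2.9, 3.0)])
```

With this representation, an optimizer only has to search over the $N$ numbers in `u_params` rather than over a continuous function of time.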

2.2 Conventional Co-evolutionary Algorithm

The conventional co-evolutionary algorithm introduced by Park and Tahk [6] is a stochastic global search algorithm for obtaining a saddle point of a minimax parameter optimization problem. If a minimax problem has a saddle point solution, then it can be treated as a zero-sum game. Discretization of the strategy variables of the zero-sum game leads to a static matrix game from which the saddle point can be found approximately. At this point, the concept of evolution is applied to generate the next static matrix game that has a denser population around the saddle point than the previous matrix game. In the co-evolutionary algorithm, the population of each group represents the set of strategies of each player involved in the game.

Consider a static zero-sum game $G$, for which a payoff function $F = F(a, b)$ is to be minimized by $a \in A \subset \mathbb{R}^{k_A}$ and maximized by $b \in B \subset \mathbb{R}^{k_B}$. The zero-sum game can be approximated as a static matrix game $G_M$ for which the players $a$ and $b$ have a finite number of strategies as follows:

$$
\begin{aligned}
a_i &\in A_M \subset A \subset \mathbb{R}^{k_A}, \quad i = 1, 2, \ldots, \mu_a, \\
b_j &\in B_M \subset B \subset \mathbb{R}^{k_B}, \quad j = 1, 2, \ldots, \mu_b,
\end{aligned} \tag{10}
$$

where $\mu_a$ and $\mu_b$ denote the number of strategies of $a$ and $b$, respectively.

For this static matrix game, the security strategy of $a$ is defined as $a^S$ that satisfies the following condition:

$$
\bar{F}(G_M) \triangleq \max_j F(a^S, b_j) \le \max_j F(a_i, b_j), \quad i = 1, \ldots, \mu_a \tag{11}
$$

and $\bar{F}(G_M)$ is called the security level for $a$'s losses. Likewise, the security strategy of $b$ satisfies the condition,

$$
\underline{F}(G_M) \triangleq \min_i F(a_i, b^S) \ge \min_i F(a_i, b_j), \quad j = 1, \ldots, \mu_b \tag{12}
$$

and $\underline{F}(G_M)$ is called the security level for $b$'s gains. It is noteworthy that employing the security strategy is a very reasonable choice for each player in a zero-sum game. If each player is not given any information about the other player's strategy, and if the game is played only once, a reasonable mode of play for the minimizer is to secure his losses against any (rational or irrational) behavior of the maximizer. Likewise, the maximizer's reasonable mode of play is to secure his gains against any behavior of the minimizer. By adopting the security strategy, each player can achieve this purpose [10].

In addition, the security level for $b$'s gains never exceeds the security level for $a$'s losses, that is, $\underline{F}(G_M) \le \bar{F}(G_M)$. If the equality holds, the security strategy pair $(a^S, b^S)$ corresponds to the saddle-point solution of $G_M$. Since $G_M$ is just an approximation of $G$, $(a^S, b^S)$ may be quite different from $(a^*, b^*)$, the saddle point solution of $G$. However, as the population density of $A_M$ and $B_M$ about the saddle point increases, the security strategy $(a^S, b^S)$ guarantees a better approximation of $(a^*, b^*)$.
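As a small illustration of Eqs. (11) and (12), the following Python sketch computes the security strategies and security levels of a matrix game stored as a NumPy array. The function name `security_levels` and the two example matrices are hypothetical; rows are taken to be the minimizer's strategies and columns the maximizer's.

```python
import numpy as np


def security_levels(F):
    """Return (i_s, j_s, upper, lower) for a payoff matrix F[i, j] = F(a_i, b_j).

    Rows belong to the minimizer a, columns to the maximizer b.
    """
    F = np.asarray(F, dtype=float)
    worst_for_a = F.max(axis=1)  # max_j F(a_i, b_j): worst loss of each a_i
    worst_for_b = F.min(axis=0)  # min_i F(a_i, b_j): worst gain of each b_j
    i_s = int(np.argmin(worst_for_a))  # security strategy of a, Eq. (11)
    j_s = int(np.argmax(worst_for_b))  # security strategy of b, Eq. (12)
    return i_s, j_s, float(worst_for_a[i_s]), float(worst_for_b[j_s])


# A game with a pure saddle point: both security levels coincide.
print(security_levels([[4, 2, 3], [5, 1, 6], [7, 0, 8]]))
# A game without one (matching pennies): the lower level is below the upper one.
print(security_levels([[1, -1], [-1, 1]]))
```

In the first matrix the two levels are equal, so the security strategy pair is a saddle point of the matrix game; in the second the gap between the levels shows that no pure saddle point exists.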

Based on the discussion above, a static matrix game is interpreted in the context of co-evolution. Let $\xi^i$ and $\eta^j$ denote individuals of populations $X$ and $Y$, representing the strategies $a_i$ and $b_j$, respectively. Define the score of the match between $\xi^i$ and $\eta^j$ as the value of $F(a_i, b_j)$. Then the triplet $(F, X, Y)$ defines a matrix game $G_M$. Fitness is evaluated according to each player group's objective in the game. Thus the following fitness definitions are given:

$$
\text{fitness of } \xi^i = \max_j F(\xi^i, \eta^j), \qquad \text{fitness of } \eta^j = \min_i F(\xi^i, \eta^j).
$$

Security strategies of $\xi^i$ and $\eta^j$ become the saddle-point solution of $G_M$. Hence, $\xi^i$ and $\eta^j$ converge to $\xi^*$ and $\eta^*$. Since the offspring population is denser around the saddle point as the generations increase, a co-evolution process with a security strategy will converge to the saddle point. The flow chart of the conventional co-evolutionary algorithm is given in Figure 1. It adopts the algorithm of evolution strategy for genetic operations such as selection, recombination, and mutation variances.

FIGURE 1 Flow chart of the conventional co-evolutionary algorithm.
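The security-strategy fitness described above can be tried out on a toy problem. The sketch below is a deliberately simplified $(\mu + \lambda)$ loop written for illustration only: it uses mutation with a decaying step size and no recombination, and the scalar payoff $F(a, b) = a^2 - b^2$ (saddle point at the origin) is an assumed test function, not an example from the paper.

```python
import numpy as np


def coevolve(F, mu=5, lam=10, generations=60, sigma0=0.5, decay=0.95, seed=0):
    """Toy co-evolution of a minimizer population X and a maximizer population Y."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2.0, 2.0, size=mu)  # minimizer strategies a_i
    Y = rng.uniform(-2.0, 2.0, size=mu)  # maximizer strategies b_j
    sigma = sigma0
    for _ in range(generations):
        # Offspring are mutated copies of randomly chosen parents.
        X_all = np.concatenate([X, rng.choice(X, lam) + sigma * rng.standard_normal(lam)])
        Y_all = np.concatenate([Y, rng.choice(Y, lam) + sigma * rng.standard_normal(lam)])
        # Full match table: scores[i, j] = F(a_i, b_j).
        scores = F(X_all[:, None], Y_all[None, :])
        fit_x = scores.max(axis=1)  # minimizer fitness: worst-case loss
        fit_y = scores.min(axis=0)  # maximizer fitness: worst-case gain
        X = X_all[np.argsort(fit_x)[:mu]]   # keep the smallest worst-case losses
        Y = Y_all[np.argsort(-fit_y)[:mu]]  # keep the largest worst-case gains
        sigma *= decay  # crude stand-in for an annealing schedule
    return float(X[0]), float(Y[0])


a_best, b_best = coevolve(lambda a, b: a**2 - b**2)
print(a_best, b_best)
```

Both returned values should end up close to zero, the saddle point of the assumed payoff, which is the behavior the paragraph above attributes to a co-evolution process driven by security strategies.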

3 MAIN IDEAS

3.1 Simultaneous Consideration of Final Time and Miss Distance

Most direct approaches to the pursuit-evasion game have dealt with the final time problem due to its suitability for evaluation of the capture set. From a practical point of view, though, the miss distance problem might give more insightful information. If the missile's thrust is sustained long enough, it is not that important whether the missile intercepts the target earlier or later. Moreover, the final time problem does not provide any quantitative information when the interception fails. The so-called lethal radius is a probabilistic concept; therefore, a guidance law needs to guarantee the smallest miss distance, even if that is greater than the lethal radius. Nevertheless, the miss distance problem is accompanied by a more complicated formulation. This entails difficulty in solving it with classical optimization methods. By using an evolutionary approach, this difficulty can be alleviated. An evolutionary approach is a very general framework of optimization, thereby being insensitive to the characteristics of a particular problem. Based on this idea, this paper adopts an evolutionary optimization methodology to solve both the final time problem and the miss distance problem.

In addition, this paper tries to handle both problems simultaneously. The suggested method considers the time-optimal problem if the initial state lies in the capture set, while it considers the miss-distance problem otherwise. The payoff of the game is as follows:

$$
J = \begin{cases} t_f & \text{if } x_0 \in C, \\ r_f & \text{if } x_0 \notin C, \end{cases} \tag{13}
$$

where $C$ is defined by $C = \{x_0 : r_f^*(x_0) \le r_c\}$. However, this expression needs to be modified in order to be used for payoff evaluation in each generation of co-evolution. Against a particular evader, the pursuer prefers a certain strategy resulting in interception rather than all the strategies failing in interception, regardless of the value of the final time. Likewise, the evader prefers a strategy resulting in avoidance while sacrificing a small amount of final time. Under this argument, this paper modifies the payoff as the following:

$$
J = \begin{cases} t_f & \text{if } r_f \le r_c, \\ r_f + M & \text{if } r_f > r_c, \end{cases} \tag{14}
$$

where $M$ is a very large constant number that is greater than any possible value of $t_f$. When this payoff function is applied to the co-evolution process, the game looks like the miss distance problem in the initial phase of evolution. Afterwards, in the case $x_0 \in C$, no evader individual succeeds in avoidance; the game comes to resemble the final time problem. In the case $x_0 \notin C$, no pursuer individual succeeds in the interception against all the evader individuals; the game ends as a miss distance problem. Thus, with the proposed method, one can expect to know the success or failure of the interception, and to obtain the optimal trajectories regardless of the success or failure of interception.
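The separated payoff of Eq. (14) is simple to express in code. The following Python sketch uses the hypothetical name `modified_payoff` and an arbitrary default for $M$; the only requirement stated in the text is that $M$ exceed any possible final time.

```python
def modified_payoff(t_f, r_f, r_c, big_m=1.0e6):
    """Separated payoff of Eq. (14).

    t_f   : final time of the match
    r_f   : miss distance of the match
    r_c   : lethal radius
    big_m : constant M, assumed larger than any attainable final time
    """
    if r_f <= r_c:
        return t_f          # interception: the game is scored by final time
    return r_f + big_m      # miss: scored by miss distance, shifted above all t_f


# Hypothetical numbers: a capture after 12 s versus a miss by 8 m.
print(modified_payoff(12.0, 3.0, 5.0))
print(modified_payoff(12.0, 8.0, 5.0, big_m=1000.0))
```

Because every miss is scored above every capture, a minimizing pursuer always ranks an intercepting strategy ahead of a non-intercepting one, and a maximizing evader does the opposite, which is the preference ordering argued for above.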


3.2 Fitness Evaluation Using Stackelberg Equilibrium

As mentioned before, if the payoff is terminal and the dynamics of both players are decoupled, an open-loop solution of the pursuit-evasion game also satisfies the saddle point condition. Thus, the minimax value should be the same as the maximin value. However, numerically only the maximin solution is available. It is fair to describe the open-loop consideration of the game as 'a game between trajectories,' since an open-loop solution shows only the trajectories the players should follow, not the way they should behave. A so-called open-loop game solution is game-optimal in the sense that there exists a pursuer trajectory resulting in less final time (or miss distance) unless the evader moves along the game-optimal trajectory; this is a maximin statement. The following discussion justifies the validity of this statement.

Let us consider two trajectory pairs, $T_{11} = (P_1, E_1)$ and $T_{22} = (P_2, E_2)$, assuming $T_{11}$ is game-optimal, while $T_{22}$ is not game-optimal but is a certain minimum-time trajectory pair. The payoff values corresponding to the two trajectory pairs, when denoted as $J_{11}$ and $J_{22}$, obviously satisfy the maximin condition, $J_{11} > J_{22}$. If we know the feedback strategies of both players for both cases, such as $\sigma_{P1}$, $\sigma_{E1}$, $\sigma_{P2}$, and $\sigma_{E2}$, we can easily construct the cross-related trajectories such as $T_{12} = T_{12}(\sigma_{P1}, \sigma_{E2})$ and $T_{21} = T_{21}(\sigma_{P2}, \sigma_{E1})$, satisfying the saddle point condition, $J_{21} > \max(J_{11}, J_{12}) > \min(J_{11}, J_{12}) > J_{22}$ (Figure 2(a)). However, if we do not know the feedback strategies but just know the open-loop representation of the control histories, we cannot guarantee to construct such cross-related trajectories satisfying the saddle point condition. We cannot guarantee even the success of capture for the cross-related cases (Figure 2(b)). Thus, when considering the open-loop solution, we cannot optimize it satisfying both the maximin and minimax condition throughout the optimization process.

Under the above discussion, the security strategy concept in the conventional co-evolutionary algorithm is not appropriate for the pursuit-evasion game between trajectories, since the saddle point condition cannot be guaranteed throughout the evolution process. Instead, the Stackelberg equilibrium concept should be employed. If the Stackelberg equilibrium is adopted, one player (leader) selects its strategy in advance, and the other player (follower) chooses its strategy later. The leader may well select its security strategy, but the follower does not have to select its security strategy. The follower had better choose the best strategy against the leader's security strategy, since it knows what strategy the leader selected. For pursuit-evasion games, the evader should play the role of the leader, and the pursuer should adopt that of the follower. By assigning the roles like this, we can optimize the open-loop solution guaranteeing the maximin condition during the evolution process.

Originally, the Stackelberg equilibrium was devised for an N-person non-zero-sum game in which each player has his own payoff that need not be the same as that of another player. Nevertheless, if two players share the payoff and there exists a saddle point, the Stackelberg game is the same as the saddle point game governed by the security strategies. Namely, for the pursuit-evasion game, a finally converged solution should be the saddle point solution, even if the fitness evaluation in each generation relies on the Stackelberg equilibrium.

FIGURE 2 Cross-related trajectories: (a) when the feedback strategies are known, (b) when the feedback strategies are not known.

3.3 Adoption of a New Evolving Group

As stated in the previous section, the conventional co-evolutionary algorithm adopts two evolving groups with opposite objectives. When pursuit-evasion is considered, naturally the pursuer's control input vector comprises the minimizer group ($X$), and the evader's the maximizer group ($Y$). However, this group assignment involving two evolving groups is not sufficient for pursuit-evasion games. This is because the pursuit-evasion game does not have any prescribed final time. Instead, the final time is evaluated as a byproduct or an end product of the game optimization. Ironically, without any information about the final time, the trajectories of both players cannot be computed, and neither can the payoff.

One way to resolve this discrepancy is to include the final time as an optimization parameter. In other words, during the evolution process of searching for the players' game-optimal controls, the optimal final time is also sought by its own optimization process. Several optimization methods can be implemented into this process. This study also applies an evolution process to the final time search in order to prevent a wrong initial guess of the final time from affecting the overall co-evolution process of the minimizer and the maximizer. Thus, another evolving group $T$, which deals with the final time, is introduced.

The new group $T$ plays two essential roles: one is to generate the candidates of the optimal final time in each generation; the other is to determine the control update time of both players when computing the payoff of the game. Since the control vector only consists of $N$ components of control value, in order to obtain an actual time history of the control, one needs to know the control update time, which is equal to $t_f/N$. On the other hand, it is more desirable to let $T$ give the integration termination time rather than the exact final time. Let us denote the integration termination time as $\tau$; then the control update time becomes $\tau/N$. However, $\tau$ should be close enough to $t_f$.

4 ALGORITHM

The overall architecture of the proposed co-evolutionary method is represented in Figure 3. Three evolving groups are involved in the evolution process. Genetic operations to generate the offspring populations proceed in parallel for the three groups, whereas the fitness evaluation is performed sequentially. Details of the proposed algorithm will now be introduced.

4.1 Evolving Group Assignment

An individual of $X$ and $Y$ is the parameterized control vector of each player, and that of $T$ is the termination time. Thus, the parent populations of each evolving group can be expressed as

$$
\begin{aligned}
X &= \{\xi^1, \xi^2, \ldots, \xi^{\mu_X}\} \\
Y &= \{\eta^1, \eta^2, \ldots, \eta^{\mu_Y}\} \\
T &= \{\tau^1, \tau^2, \ldots, \tau^{\mu_T}\}
\end{aligned} \tag{15}
$$

where

$$
\begin{aligned}
\xi^i &= u_P^{V,i} = [u_{P,1}^{V,i}, u_{P,2}^{V,i}, \ldots, u_{P,N}^{V,i}]^T, \quad i = 1, 2, \ldots, \mu_X, \\
\eta^j &= u_E^{V,j} = [u_{E,1}^{V,j}, u_{E,2}^{V,j}, \ldots, u_{E,N}^{V,j}]^T, \quad j = 1, 2, \ldots, \mu_Y.
\end{aligned}
$$

Offspring populations with sizes of $\lambda_X$, $\lambda_Y$, and $\lambda_T$ can be expressed similarly. For sequential fitness evaluation by the Stackelberg equilibrium, the sizes of $X$ and $Y$ should be the same. That is, $\mu_X = \mu_Y$, and $\lambda_X = \lambda_Y$.

FIGURE 3 Architecture of the proposed co-evolutionary method for pursuit-evasion games.

4.2 Offspring Generation

The three evolving groups adopt recombination and mutation operations conventionally used in evolution strategies [11]. An annealing scheme is implemented to avoid the premature freezing phenomenon [7]. The soft-wall method is employed to handle the boundary limit of each parameter [12].


4.3 Payoff Evaluation

Since the final time is defined as in Eq. (2), its formulation should be different with respect to the payoff function. For the miss distance problem, the final time and miss distance are defined as

$$
t_f \triangleq \arg\{\min r(t) : t \in [0, \infty)\} \tag{16}
$$

$$
r_f \triangleq r(t_f). \tag{17}
$$

However, if the payoff is the final time, the final time is defined as

$$
t_f \triangleq \inf\{t \in [0, \infty) : r(t) < r_c\} \tag{18}
$$

and in this case the miss distance is trivially $r_c$.

However, it is almost impossible to evaluate the final time exactly during the calculation of the players' trajectories, since numerical integration methods indispensably consider a discretized time space. Thus, this work also employs a modification scheme to calculate a more accurate final time and miss distance based on the concept of zero effort miss distance.

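The final time is defined as

$$
\begin{aligned}
t_f &\triangleq \bar{t}_f + t_{go}, \\
\bar{t}_f &\triangleq \arg\{\min_k r_f(t_k)\} = \arg\{\min_k r_f(k\,\delta t)\},
\end{aligned} \tag{19}
$$

where $\delta t$ is the integration time step, and $k \in [1, 2, \ldots, K]$ for sufficiently large $K$. The final time-to-go, $t_{go}$, is defined as

$$
t_{go} = \frac{r_{go}}{\bar{v}}, \tag{20}
$$

where

$$
r_{go} = \begin{cases} -\dfrac{\bar{\mathbf{r}} \cdot \bar{\mathbf{v}}}{\bar{v}} & \text{if } r_f > r_c, \\[2ex] -\dfrac{\bar{\mathbf{r}} \cdot \bar{\mathbf{v}}}{\bar{v}} - \sqrt{r_c^2 - r_f^2} & \text{if } r_f \le r_c, \end{cases} \tag{21}
$$

and

$$
r_f = \bar{r} \sin\theta, \qquad \theta = \cos^{-1}\left(-\frac{\bar{\mathbf{r}} \cdot \bar{\mathbf{v}}}{\bar{r}\,\bar{v}}\right), \tag{22}
$$

where $\mathbf{r}$ and $\mathbf{v}$ are the relative position and velocity vectors, and $r$ and $v$ are their magnitudes. The over-bar means the value at time $\bar{t}_f$. Figure 4 shows a final relative geometry along with the definition of $t_f$ and $r_f$. It can be found that the definition of $r_{go}$ depends on whether the interception is successful or not.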
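The terminal correction of Eqs. (20) to (22) can be sketched in Python as follows. The function name `terminal_correction` and the numerical values in the example are hypothetical, and the sketch assumes non-zero relative range and speed at the discrete closest-approach step.

```python
import numpy as np


def terminal_correction(r_vec, v_vec, r_c):
    """Return (t_go, r_f) from the relative state at the discrete time t_f-bar.

    r_vec, v_vec : relative position and velocity vectors at t_f-bar
    r_c          : lethal radius
    Assumes |r_vec| > 0 and |v_vec| > 0.
    """
    r_vec = np.asarray(r_vec, dtype=float)
    v_vec = np.asarray(v_vec, dtype=float)
    r = np.linalg.norm(r_vec)
    v = np.linalg.norm(v_vec)
    along_track = -np.dot(r_vec, v_vec) / v          # -r.v / v
    theta = np.arccos(np.clip(along_track / r, -1.0, 1.0))
    r_f = r * np.sin(theta)                          # Eq. (22): zero effort miss
    if r_f > r_c:
        r_go = along_track                           # Eq. (21), failed interception
    else:
        r_go = along_track - np.sqrt(r_c**2 - r_f**2)  # Eq. (21), successful interception
    return float(r_go / v), float(r_f)               # Eq. (20)


# Hypothetical geometry: 100 m ahead, 10 m lateral offset, closing at 50 m/s.
print(terminal_correction([100.0, 10.0], [-50.0, 0.0], r_c=5.0))
print(terminal_correction([100.0, 10.0], [-50.0, 0.0], r_c=26.0))
```

In the first call the lateral offset exceeds the lethal radius, so the time-to-go runs to the closest approach; in the second the trajectory pierces the lethal sphere, and the time-to-go stops at the point of entry, as the two branches of Eq. (21) prescribe.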

4.4 Fitness Evaluation and Selection

The fitness of each individual is determined by the score of full-matches between the two player groups. Let us consider a match between the $i$th individual of the minimizer ($\xi^i$) and the $j$th individual of the maximizer ($\eta^j$). In order to evaluate the payoff value, a termination time has to be selected out of $T$. Let us denote the corresponding termination time by $\tau^{ij}$. $\tau^{ij}$ is selected randomly out of $T$ if either $i$ or $j$ is greater than $\mu_X$ and $\mu_Y$, respectively (that is, if at least one of the two individuals is an offspring). If not, $\tau^{ij}$ is inherited from the previous generation. The payoff value evaluated for the match of $\xi^i$ and $\eta^j$ is denoted as

$$
J^{ij} = J(\xi^i, \eta^j, \tau^{ij}). \tag{23}
$$

FIGURE 4 Terminal relative geometry with respect to the success or failure of interception: (a) successful interception, (b) failed interception.

By evaluating $J^{ij}$ for all $(\mu + \lambda)^2$ matches, a cost matrix for the static game is completed, which is to be used for the fitness evaluation of each individual. Since a Stackelberg game whose leader is the maximizing parameter is considered, the fitness of the $\eta^j$s should be evaluated first. Fitness of a maximizing strategy is its worst-case performance, that is,

$$
F(\eta^j) = \min_i J^{ij}. \tag{24}
$$

By this fitness, $\eta^j$'s rank in the matrix game is determined. Fitness of a minimizing strategy is evaluated after all the ranks for the $\eta$'s are determined. The individual in $Y$ which occupies the $i$th rank is denoted as $\eta^i$. Then, the fitness of $\xi$ is determined as follows:

$$
\xi^i = \{\xi \in X^-_{i-1} : J(\xi, \eta^i) \le J(\xi^j, \eta^i),\ \forall \xi^j \in X^-_{i-1}\}, \tag{25}
$$

where $X^-_0 = X$, and $X^-_{k+1} = X^-_k - \{\xi^k\}$.

In other words, the $\xi$ which records the best score in the match versus $\eta^i$ becomes the $i$th fittest one. However, no $\xi^i$ is allowed to hold two or more ranks, for the sake of wide exploration of the evolutionary search. Instead, the next best scorer takes the rank, if he does not already hold a higher rank [13].
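The sequential ranking of Eqs. (24) and (25) can be sketched as below. This is one possible reading of the procedure, written for illustration: the function name `stackelberg_ranks`, the tie-breaking by index order, and the example cost matrix are assumptions, with `J[i, j]` holding the payoff of pursuer individual `i` against evader individual `j`.

```python
import numpy as np


def stackelberg_ranks(J):
    """Rank evaders (leaders) first, then assign each pursuer (follower) a rank.

    Returns (pursuer_order, evader_order): index k of each list holds the
    individual that occupies rank k (0 = fittest).
    """
    J = np.asarray(J, dtype=float)
    n_x, n_y = J.shape
    if n_x != n_y:
        raise ValueError("X and Y must have the same size for sequential ranking")
    # Eq. (24): an evader's fitness is its worst-case payoff over all pursuers.
    evader_fitness = J.min(axis=0)
    evader_order = [int(j) for j in np.argsort(-evader_fitness, kind="stable")]
    # Eq. (25): the k-th ranked pursuer is the best remaining one against the
    # k-th ranked evader; a pursuer that already holds a rank is skipped.
    remaining = list(range(n_x))
    pursuer_order = []
    for j in evader_order:
        best = min(remaining, key=lambda i: J[i, j])
        pursuer_order.append(best)
        remaining.remove(best)
    return pursuer_order, evader_order


J = [[3.0, 1.0, 4.0],
     [2.0, 5.0, 6.0],
     [6.0, 2.0, 3.0]]  # hypothetical 3 x 3 cost matrix
print(stackelberg_ranks(J))
```

Removing each chosen pursuer from `remaining` is what prevents one individual from holding two ranks; the next best scorer against the current evader is selected automatically from those still unranked.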

Fitness of $\tau$ is determined after rank evaluation of $\xi$ and $\eta$ is over. If $\xi^i$ and $\eta^j$ mark the $k$th rank, $\tau^{ij}$ also marks the $k$th rank. That is,

$$
\tau^k = \tau^{ij}, \quad \text{if } \xi^k = \xi^i \text{ and } \eta^k = \eta^j. \tag{26}
$$

After all rank assignments for the three evolving groups are finished, the next parent population is selected by using the $(\mu + \lambda)$ selection scheme.


5 COMPARISON WITH GRADIENT-BASED METHOD

In this section, the authors will verify the proposed co-evolutionary method numerically by comparing it with the gradient-based method. First, the authors introduce the gradient-based method briefly and comment on its merits and demerits.

5.1 Gradient-Based Method

The gradient-based method deals with the final time problem, and assumes the lethal radius to be zero. Thus, the payoff and the boundary function are written as $J(u_P, u_E) = t_f$ and $l(x(t_f), t_f) = r_f$, respectively. The initial setting of the gradient-based method is a pursuit-evasion situation in which $u_P$ is the minimum-time solution for an arbitrary $u_E$. The minimum-time solution $u_P^*$ and the corresponding payoff $t_f^*$ are eventually functions of $u_E$. The gradient-based method is composed of two procedures, update and correction. The update procedure determines $\Delta u_E$, a small variation of $u_E$ in the direction of maximizing $t_f^*$ by using an explicit expression of the gradient vector of $u_E$. It also determines $\Delta u_P$ for $u_P^* + \Delta u_P$ to satisfy the capture condition. However, the reiteration of the update procedure eventually may give rise to a situation where the computed $u_P^*$ is far from the minimum-time solution. In this case, the correction procedure provides a fine tuning of $u_P^*$ to satisfy the optimality condition within the user-specified error bounds. The above procedures are repeated until there is no improvement in $t_f^*$.

The gradient-based method has been very successful in that (1) it escapes from the inner-outer loop structure that leads to long computing time, and (2) it has provided good performance in calculating capture sets. In spite of these merits of the gradient-based method, it suffers from the following two demerits: (1) it considers only the final time problem, and (2) it assumes zero-size lethal radius. The former has already been discussed in Section 1. The latter assumption is not appropriate for practical situations. Most air-to-air missiles can harm the target aircraft even if the miss distance is a few dozen meters. Thus, a capture set obtained by the gradient-based method might underestimate the missile's ability.

5.2 Numerical Examples

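A two-dimensional pursuit-evasion situation is considered, as described in Figure 5. The equations of motion of the pursuer and the evader are expressed as follows:

\[
\begin{aligned}
\dot{x}_i &= v_i \cos\gamma_i, \\
\dot{y}_i &= v_i \sin\gamma_i, \\
\dot{\gamma}_i &= \frac{v_i}{\rho_i} u_i, \quad |u_i| \le 1, \\
\dot{v}_i &= -\frac{v_i^2}{\rho_i}\left(\alpha_i + \beta_i u_i^2\right), \quad i = P, E,
\end{aligned}
\tag{27}
\]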

where $x$ and $y$ are the position coordinates of the pursuer or the evader, $v$ is the speed, and $\gamma$ is the flight path angle; $u$ is the normalized control input, and $\rho$ is the minimum turn radius. $\alpha$ and $\beta$ are related to aerodynamic coefficients. The values of $\rho$, $\alpha$, and $\beta$ for each player are as follows: $\alpha_P = 0.0875$, $\beta_P = 0.40$, $\rho_P = 1515.15$ m, $\alpha_E = 0$, $\beta_E = 0.40$, and $\rho_E = 600$ m.
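For readers who wish to experiment with Eq. (27), the following sketch integrates the equations of motion for one player over a single time step. The parameter values are those quoted above; the fourth-order Runge-Kutta integrator, the assumption that the control is held constant over the step, and the names `derivative` and `rk4_step` are choices made here for illustration and are not taken from the paper.

```python
import math
from typing import Dict, Tuple

State = Tuple[float, float, float, float]  # (x [m], y [m], gamma [rad], v [m/s])

# Parameter values quoted in the text.
PURSUER = {"alpha": 0.0875, "beta": 0.40, "rho": 1515.15}
EVADER = {"alpha": 0.0, "beta": 0.40, "rho": 600.0}


def derivative(state: State, u: float, p: Dict[str, float]) -> State:
    """Right-hand side of Eq. (27) for one player."""
    _, _, gamma, v = state
    return (
        v * math.cos(gamma),
        v * math.sin(gamma),
        v / p["rho"] * u,
        -(v * v / p["rho"]) * (p["alpha"] + p["beta"] * u * u),
    )


def rk4_step(state: State, u: float, p: Dict[str, float], dt: float) -> State:
    """One classical Runge-Kutta step with the control held constant (assumption)."""
    u = max(-1.0, min(1.0, u))  # enforce |u| <= 1

    def shifted(k: State, h: float) -> State:
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = derivative(state, u, p)
    k2 = derivative(shifted(k1, 0.5 * dt), u, p)
    k3 = derivative(shifted(k2, 0.5 * dt), u, p)
    k4 = derivative(shifted(k3, dt), u, p)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


# A pursuer at 600 m/s applying a full turn for 0.1 s: it turns and loses speed.
print(rk4_step((0.0, 0.0, 0.0, 600.0), 1.0, PURSUER, 0.1))
```

Note that the evader, with $\alpha_E = 0$, loses no speed while flying straight ($u = 0$), whereas the pursuer decelerates even without maneuvering.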

To verify the proposed co-evolutionary algorithm, the authors consider two engagement situations: one is inside the capture set, while the other is outside of it.


FIGURE 5 Two-dimensional pursuit-evasion geometry.

Engagement 1: Inside the Capture Set

The initial conditions of state are:

$(x_P, y_P, \gamma_P, v_P) = (0\ \text{m},\ 0\ \text{m},\ 0^\circ,\ 600\ \text{m/s})$,

$(x_E, y_E, \gamma_E, v_E) = (1500\ \text{m},\ 0\ \text{m},\ 90^\circ,\ 200\ \text{m/s})$,

FIGURE 6 Trajectories for engagement 1.


which is known to be inside the capture set even when $r_c = 0$; therefore, this engagement leads to a final time problem. By utilizing the gradient-based method, one can obtain the game-optimal solution for a zero-size lethal radius; the optimal final time is 3.5168 s. The authors solve this problem using the proposed method with three lethal radii (10 m, 20 m, and 30 m) in order to investigate the effect of the size of the lethal radius on the game-optimal solution. The players' control inputs are parameterized with 30 time steps; for each of the three evolving groups, the parent population size is 8 and the offspring population size is 40.

Figure 6 shows the game-optimal trajectories obtained by the proposed method. Two observations can be made from Figure 6: (1) the trajectories differ from one another, and (2) the differences follow a consistent tendency. The authors attribute these differences to the variations in lethal radius. Figure 7, in which the relative trajectories between the two players are depicted, shows this tendency more clearly. Since maximin solutions are considered, the pursuer trajectory should be the minimum-time solution against the corresponding evader trajectory. In this sense, the authors argue that each relative trajectory in Figure 7 should be the minimum-time trajectory of an imaginary pursuer against an imaginary stationary target as large as the corresponding lethal radius. If this argument is accepted, the tendency is easily explained. To intercept a large imaginary target, the imaginary pursuer does not have to follow the trajectory that is optimal against a small target. The large maneuvers required to follow the optimal trajectory against a small target cause a loss of speed and an ensuing increase in final time. Thus, the optimal relative trajectory appears lifted upward as the lethal radius increases.
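The picture of an imaginary pursuer closing on a stationary target of radius $r_c$ can be made concrete with a small sketch that finds the first time a sampled relative trajectory enters the lethal radius. The function name `first_capture_time`, the sampled data, and the linear interpolation of the range between samples are assumptions made here; the paper does not state how intercept times along a trajectory are estimated.

```python
import math
from typing import Optional, Sequence, Tuple


def first_capture_time(times: Sequence[float],
                       rel_positions: Sequence[Tuple[float, float]],
                       r_c: float) -> Optional[float]:
    """First time the relative range drops to r_c, or None if it never does.

    rel_positions[k] is the evader position minus the pursuer position at times[k].
    The range is interpolated linearly between samples (an assumption).
    """
    prev_t: Optional[float] = None
    prev_r = 0.0
    for t, (dx, dy) in zip(times, rel_positions):
        r = math.hypot(dx, dy)
        if r <= r_c:
            if prev_t is None:
                return t  # already inside the lethal radius at the first sample
            # Here prev_r > r_c >= r, so the denominator is strictly positive.
            frac = (prev_r - r_c) / (prev_r - r)
            return prev_t + frac * (t - prev_t)
        prev_t, prev_r = t, r
    return None


# Made-up head-on closing geometry sampled at 1 s intervals.
times = [0.0, 1.0, 2.0]
rel = [(100.0, 0.0), (50.0, 0.0), (0.0, 0.0)]
print(first_capture_time(times, rel, 0.0))   # 2.0
print(first_capture_time(times, rel, 25.0))  # 1.5
```

On a fixed trajectory, a larger lethal radius can only bring the capture time forward, which is the trend later seen along each row of Table I.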

FIGURE 7 Relative trajectories for engagement 1.


TABLE I Estimated final times (s) along game trajectories for various lethal radii.

| Solution case | t (r = 0 m) | t (r = 10 m) | t (r = 20 m) | t (r = 30 m) |
|---|---|---|---|---|
| Gradient-based (rc = 0) | **3.5168** | 3.4843 | 3.4523 | 3.4204 |
| Co-evolution (rc = 10 m) | N/A | **3.4789** | 3.4428 | 3.4094 |
| Co-evolution (rc = 20 m) | N/A | N/A | **3.4372** | 3.4052 |
| Co-evolution (rc = 30 m) | N/A | N/A | N/A | **3.3990** |

Also, one can implicitly verify that the solution obtained by the proposed method is optimal, based on the minimum-time solution argument above. Table I shows the estimated intercept time with respect to the lethal radius along each trajectory. The bold value on the diagonal is the resultant payoff of the corresponding simulation. Notably, the bold value is the minimum of each column; that is, the pursuer is guaranteed the minimum intercept time when it follows the trajectory given by the proposed method for the appropriate lethal radius. Thus, the proposed method seems to provide the game-optimal solution.
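The column-minimum property can be checked mechanically from the values in Table I. The short script below is only a verification aid; the data layout and the name `diagonal_is_column_minimum` are assumptions introduced here, and `None` stands for the N/A entries.

```python
from typing import List, Optional, Sequence

# Rows of Table I: estimated final times (s) for r = 0, 10, 20, 30 m.
TABLE_I: List[List[Optional[float]]] = [
    [3.5168, 3.4843, 3.4523, 3.4204],  # gradient-based, r_c = 0
    [None,   3.4789, 3.4428, 3.4094],  # co-evolution, r_c = 10 m
    [None,   None,   3.4372, 3.4052],  # co-evolution, r_c = 20 m
    [None,   None,   None,   3.3990],  # co-evolution, r_c = 30 m
]


def diagonal_is_column_minimum(table: Sequence[Sequence[Optional[float]]]) -> bool:
    """True if every diagonal entry is the smallest available value in its column."""
    for col in range(len(table)):
        column = [row[col] for row in table if row[col] is not None]
        if min(column) != table[col][col]:
            return False
    return True


print(diagonal_is_column_minimum(TABLE_I))  # True
```

For example, in the $r = 30$ m column the four trajectories give 3.4204, 3.4094, 3.4052, and 3.3990 s, and the smallest value belongs to the trajectory computed with $r_c = 30$ m.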

In Figure 8, the game-optimal control histories are depicted. In spite of chattering in the control histories, one can find a tendency in the changes of control input caused by variations in lethal radius. The chattering results partly from the stochastic characteristics of the co-evolution and partly from the insensitivity of the final time to rapid changes in control. For the latter reason, chattering is not particularly significant for obtaining the optimal trajectory. Chattering can be reduced if more computing time is allowed, or if the parameterization scheme is modified. Since the co-evolutionary approach is a very general framework, applying another parameterization scheme is comparatively easy.

FIGURE 8 Control histories for engagement 1.

FIGURE 9 Trajectories for engagement 2.

FIGURE 10 Control histories for engagement 2.

Engagement 2: Outside the Capture Region

The initial conditions of the state are:

$(x_P, y_P, \gamma_P, v_P) = (0\ \text{m},\ 0\ \text{m},\ 30^\circ,\ 600\ \text{m/s})$,

$(x_E, y_E, \gamma_E, v_E) = (2000\ \text{m},\ 0\ \text{m},\ 180^\circ,\ 200\ \text{m/s})$.

The gradient-based method cannot provide a solution for this case, since the pursuer fails to intercept the evader. However, the proposed method is able to give a game solution whose payoff is the miss distance. The population sizes and the number of parameters are the same as those for the engagement 1 simulation; the lethal radius is assumed to be 30 m.

Figures 9 and 10 portray the trajectories and control histories. The optimal $r_f$ of 246.26 m at $t_f = 3.5338$ s is obtained. Referring to the figures, the pursuer maneuvers as intensively as he can to overcome the deficiency caused by the initial heading error, while the evader also turns with large lateral acceleration to move in the direction opposite to the pursuer. After a while, however, the evader gradually decreases the magnitude of his maneuver, since a large maneuver entails a loss of speed, which may allow interception. All these behaviors are reasonable, which supports the optimality of the solution given by the proposed algorithm.

6 CONCLUSIONS

In this work, a co-evolutionary numerical solver for pursuit-evasion games that takes the lethal radius into account is devised. The aim is to obtain time-optimal and miss-distance-optimal game solutions for the cases inside and outside the capture set, respectively. To achieve this objective, the payoff function of the game is defined separately according to the success or failure of the interception. The Stackelberg game whose leader is the evader is adopted for fitness evaluation. An additional evolving group of integration time is introduced to treat an unprescribed final time. Numerical examples are given to verify the suggested method. The proposed method provides reasonable solutions for initial conditions both inside and outside the capture set. In addition, the effect of the size of the lethal radius is discussed based on the simulation results. In conclusion, the proposed co-evolutionary method can provide reasonable game solutions regardless of which payoff is involved.

Acknowledgements

The authors gratefully acknowledge the financial support given by the Agency for Defense Development and the Automatic Control Research Center, Seoul National University.

References

[1] Isaacs, R. (1967) Differential Games. John Wiley and Sons, New York

[2] Tahk, M. J., Ryu, H., Kim, J. G. and Rhee, I. S. (1998) A gradient-based direct method for complex pursuit-evasion games. Proceedings of 8th International Symposium on Dynamic Games, Vaals, Netherlands, pp. 579–582

[3] Tahk, M. J., Ryu, H. and Kim, J. G. (1998) An iterative numerical method for a class of quantitative pursuit-evasion games. Proceedings of AIAA Guidance, Navigation, and Control Conference, Boston, USA, pp. 175–182

[4] Ryu, H. (2000) A gradient-based direct method for a class of quantitative pursuit-evasion games. PhD Dissertation, KAIST

[5] Ehtamo, H. and Raivio, T. (2001) On applied nonlinear and bilevel programming for pursuit-evasion games. Journal of Optimization Theory and Applications, 108(1), 65–96

[6] Park, C. S. and Tahk, M. J. (1998) A co-evolutionary minimax solver and its application to autopilot design. Proceedings of AIAA Guidance, Navigation, and Control Conference, Boston, USA, pp. 408–415

[7] Tahk, M. J. and Sun, B. C. (2000) Co-evolutionary augmented Lagrangian methods for constrained optimization. IEEE Transactions on Evolutionary Computation, 4(2), 114–124

[8] Choi, H. L. and Tahk, M. J. (2000) An improved co-evolutionary method for pursuit-evasion game. Proceedings of JSASS Aircraft Symposium, Sendai, Japan, pp. 637–640

[9] Kim, J. G. and Tahk, M. J. (2001) Co-evolutionary computation for constrained min-max problems and its applications for pursuit-evasion games. Proceedings of Congress on Evolutionary Computation, Seoul, Korea, pp. 1205–1212

[10] Basar, T. and Olsder, G. J. (1999) Dynamic Noncooperative Game Theory. SIAM, Philadelphia

[11] Back, T. (1996) Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, New York

[12] Choi, H. L. and Tahk, M. J. (2001) New boundary handling techniques for evolution strategies. International Conference on Control, Automation, and Systems, Jeju, Korea, pp. 1222–1225

[13] Hur, J. and Tahk, M. J. (2000) Robust flight control design using co-evolution with considerations of design constraints. 3rd Asian Control Conference, Shanghai, China, p. 73