
Hamiltonian Cycles and Singularly Perturbed Markov Chains∗

Vladimir Ejov, Jerzy A. Filar†, Minh-Tuan Nguyen‡

School of Mathematics – University of South Australia, Mawson Lakes Campus, Mawson Lakes, SA 5095, Australia

Abstract

We consider the Hamiltonian cycle problem embedded in a singularly perturbed Markov decision process. We also consider a functional on the space of deterministic policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by those policies. We show that when the perturbation parameter, ε, is less than or equal to 1/N², the Hamiltonian cycles of the directed graph are precisely the minimizers of our functional over the space of deterministic policies. In the process, we derive analytical expressions for the possible N distinct values of the functional over the, typically, much larger space of deterministic policies.

Key words: Hamiltonian Cycle, Markov Decision Process, Optimal Policy, Singular Perturbation.

1 Introduction

This paper is a continuation of a line of research (Chen & Filar, 1992; Filar & Krass, 1994; Andramonov et al., 2000; Filar & Lasserre, 2000; Feinberg, 2000) which aims to exploit the tools of Markov Decision Processes (MDPs) to study the properties of a famous problem of combinatorial optimization: the Hamiltonian Cycle Problem (HCP). More specifically, the present paper extends and improves the results presented in (Filar & Liu, 1996). In the latter paper it was shown that Hamiltonian cycles of a graph can be characterized as the minimizers of a functional based on the fundamental matrices of the Markov chains induced by deterministic policies in a suitably perturbed MDP, provided that the value of the perturbation parameter, ε, is sufficiently small.

∗ This work is supported by Australian Research Council grants.
† Corresponding author. Tel.: +61 8302 3530, Fax: +61 83025785, E-mail: [email protected].
‡ Present address: Joint Systems Branch, DSTO, PO Box 1500, Edinburgh SA 5111, Australia.

However, no estimate was provided in (Filar & Liu, 1996) of how small the above "sufficiently small" ε must be. Indeed, it was entirely conceivable that such an ε would tend to 0 so fast, as the number of nodes N increased, as to render the practical value of the preceding result quite negligible. Fortunately, that is not the case. In particular, we demonstrate that ε ≤ 1/N² qualifies as sufficiently small. Thus for a graph of 10,000 nodes, ε ≤ 10⁻⁸ is good enough, and that is well within the precision limits of most computers.

The proof of the above bound is quite technical and utilizes the precise form of closed-form expressions for the up to N distinct values of our functional, corresponding to the deterministic policies belonging to an N-way partition of the space of such policies. The fact that the functional is invariant with respect to the policies belonging to a single member of this partition was also observed in (Filar & Liu, 1996); however, the proofs provided here are more elegant and include the analytical closed-form expressions for the N distinct values of the functional that were only estimated in (Filar & Liu, 1996).

Thus the present paper fully supersedes (Filar & Liu, 1996) but, arguably, it would not have been possible without the results of the latter as a starting point. We also make use of classical results on MDPs due to Blackwell (1962).

In the line of research initiated in (Filar & Krass, 1994), the goal is to express the HCP as a problem involving the minimization of functionals from the theory of Markov decision processes. This is done with the hope that the resulting embedding in a continuous setting, endowed with a probabilistic interpretation, will stimulate research into algorithmic procedures that exploit this novel relaxation of the HCP. The particular functional discussed in this paper is based on the fundamental matrix of the Markov chains induced by deterministic policies. The latter is a classical and important operator in the theory of Markov chains but, to the best of our knowledge, its relationship with combinatorial problems such as the HCP had not been studied prior to (Filar & Liu, 1996), the predecessor of the present contribution.

2 Representing a directed graph as an MDP

Consider a directed graph G with node set S and arc set A. We can associate a Markov Decision Process Γ with the graph G as follows.

• The set of N nodes is the finite state space S = {1, 2, ..., N}, and the set of arcs in G is the total action space A = {(i, j) | i, j ∈ S}, where, for each state (node) i, the action space is the set of arcs (i, j) emanating from this node.

• The set of (one-step) transition probabilities is
$$\big\{\, p(j \mid i, a) = \delta_{aj} \;\big|\; a = (i, j) \in A,\ i, j \in S \,\big\},$$
where δ_{aj} is the Kronecker delta. Note that we are adopting the convention that a denotes both the arc (i, j) and its "head" j, whenever there is no possibility of confusion as to the "tail" i.

A stationary policy π in Γ is a set of N probability vectors π(i) = (π(i, 1), π(i, 2), ..., π(i, N)), where π(i, k) denotes the probability of choosing action k (the arc emanating from i to k) whenever state (node) i is visited. Of course, $\sum_{k=1}^{N} \pi(i, k) = 1$, and if the arc (i, k) ∉ A, then π(i, k) = 0.

A deterministic policy f is simply a stationary policy that selects a single action with probability 1 in every state (hence, all other available actions are selected with probability 0). That is, f(i, k) = 1 for some (i, k) ∈ A. For convenience, we will write f(i) = k in this case.

Assume that 1 is the initial state (home node). We shall say that a deterministic policy f in Γ is a Hamiltonian Cycle (HC) in G if the sub-graph G_f with the set of arcs {(1, f(1)), (2, f(2)), ..., (N, f(N))} is a HC in G. If the sub-graph G_f contains a cycle of length m < N, we say that f has an m-sub-cycle.

However, such a straightforward identification of G with Γ leads to an inevitable difficulty: confronting the multiple ergodic classes induced by various deterministic policies. This can be illustrated on the complete graph G₄ on 4 nodes (without self-loops) in Figure 1. A policy f such that f(1) = 2, f(2) = 1, f(3) = 4 and f(4) = 3 induces a sub-graph G_f = {(1, 2), (2, 1), (3, 4), (4, 3)}, which contains two 2-sub-cycles (see Figure 2).

Figure 1: Complete graph G₄

Figure 2: Sub-graph G_f

Policy f also induces a Markov chain with the probability transition matrix
$$P(f) = \begin{pmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix},$$
which has two ergodic classes corresponding to the sub-cycles of G_f.

Hence, the direct embedding of G in Γ, in general, induces a multi-chain ergodic structure. This and some other technical difficulties would vanish if we force the MDP to be "unichain". This is achieved by passing to a singularly perturbed MDP Γ_ε, obtained from Γ by introducing the perturbed transition probabilities {p_ε(j | i, a) | (i, j) ∈ A, i, j ∈ S}, where for any ε ∈ (0, 1)
$$
p_\varepsilon(j \mid i, a) \;\stackrel{\text{def}}{=}\;
\begin{cases}
1 & \text{if } i = 1 \text{ and } a = j,\\
0 & \text{if } i = 1 \text{ and } a \neq j,\\
1 & \text{if } i > 1 \text{ and } a = j = 1,\\
\varepsilon & \text{if } i > 1,\ a \neq j \text{ and } j = 1,\\
1-\varepsilon & \text{if } i > 1,\ a = j \text{ and } j > 1,\\
0 & \text{if } i > 1,\ a \neq j \text{ and } j > 1.
\end{cases}
$$

Note that 1 denotes the "home" node. For each pair of nodes i, j (not equal to 1) corresponding to a (deterministic) arc (i, j), our perturbation replaces that arc by a pair of "stochastic arcs" (i, 1) and (i, j) (see Figure 3), with weights ε and 1 − ε respectively. This stochastic perturbation has the interpretation that a decision to move along arc (i, j) results in movement along (i, j) only with probability 1 − ε, while with probability ε it results in a return to the home node 1.

Figure 3: Perturbation of a deterministic action (arc)
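To make the construction concrete, the following is a minimal sketch in Python (with NumPy) of how the perturbed transition matrix P_ε(f) of a deterministic policy can be assembled; the helper name perturbed_matrix and the 0-indexed convention (node 0 playing the role of the home node 1) are our own illustrative choices, not part of the paper.

```python
import numpy as np

def perturbed_matrix(f, eps):
    """Build P_eps(f) for a deterministic policy f, where f[i] is the successor
    of node i. Node 0 plays the role of the home node 1 of the paper.

    An arc (i, j) with i, j both different from the home node is split into
    the pair (i, home) with weight eps and (i, j) with weight 1 - eps; arcs
    leaving the home node, or entering it deterministically, are unperturbed,
    mirroring the definition of p_eps(j | i, a) above.
    """
    N = len(f)
    P = np.zeros((N, N))
    for i, j in enumerate(f):
        if i == 0 or j == 0:       # from home, or a deterministic return home
            P[i, j] = 1.0
        else:                      # the pair of "stochastic arcs"
            P[i, 0] += eps
            P[i, j] += 1.0 - eps
    return P

# the 2-sub-cycle policy of Figure 2: f(1)=2, f(2)=1, f(3)=4, f(4)=3
print(perturbed_matrix([1, 0, 3, 2], eps=0.1))
```

With the perturbation in place, every non-home node leaks probability ε back to the home node, which is what makes the resulting chain unichain.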

Next, we recall that any stationary policy π induces a probability transition matrix
$$P_\varepsilon(\pi) = \big[\, p_\varepsilon(j \mid i, \pi) \,\big], \qquad i, j = 1, \dots, N,$$
where for all i, j ∈ S
$$p_\varepsilon(j \mid i, \pi) = \sum_{a=1}^{N} p_\varepsilon(j \mid i, a)\, \pi(i, a).$$
In the above summation, we assume that p_ε(j | i, a) = 0 if the arc (i, a) ∉ A.

We emphasize that the perturbation is chosen to ensure that the Markov chain defined by P_ε(π) contains only a single ergodic class. On the other hand, the ε-perturbed process Γ_ε clearly "tends" to Γ as ε → 0, in the sense that P_ε(π) → P₀(π) for every stationary policy π.

Our ultimate goal is to construct a simple function that is minimized over the set of stationary policies Π only by Hamiltonian cycles. Here we achieve this for the set of deterministic policies D, using a function extracted from a well-known characteristic associated with the MDP Γ_ε: the fundamental matrix.

3 Main Result

Let π ∈ Π be a stationary policy and P_ε(π) the corresponding probability transition matrix. By P*_ε(π) we denote its stationary distribution matrix, defined as the limit Cesaro-sum matrix
$$P^*_\varepsilon(\pi) \;\stackrel{\text{def}}{=}\; \lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} P^t_\varepsilon(\pi), \qquad P^0_\varepsilon(\pi) = I_N,$$
where I_N is the N × N identity matrix. The fundamental matrix F associated with π is defined by
$$F(\pi) \;\stackrel{\text{def}}{=}\; \big( I_N - P_\varepsilon(\pi) + P^*_\varepsilon(\pi) \big)^{-1}.$$

In order to distinguish HCs from among the other deterministic policies in D, we use the (1, 1)-entry of the matrix F(π),
$$L(\pi) \;\stackrel{\text{def}}{=}\; [F(\pi)]_{1,1},$$
as an objective function. We also define
$$d_m(\varepsilon) \;\stackrel{\text{def}}{=}\; 1 + \sum_{k=2}^{m} (1-\varepsilon)^{k-2} \;=\; \frac{1 + \varepsilon - (1-\varepsilon)^{m-1}}{\varepsilon}, \qquad \forall\, m \ge 2. \tag{3.1}$$
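These definitions translate directly into a short numerical routine. Below is a minimal sketch (Python with NumPy; our own illustration). Since the perturbed chain is unichain, the Cesaro-limit matrix P*_ε equals the matrix whose N identical rows are the stationary distribution, so we compute that distribution by solving p(I − P) = 0 with the entries of p summing to 1, instead of summing powers of P.

```python
import numpy as np

def L_of(P):
    """L = [(I - P + P*)^{-1}]_{1,1} for a unichain transition matrix P."""
    N = P.shape[0]
    # stationary row vector p*: solve p*(I - P) = 0 with sum(p*) = 1
    # (an overdetermined but consistent linear system)
    A = np.vstack([(np.eye(N) - P).T, np.ones(N)])
    b = np.zeros(N + 1); b[-1] = 1.0
    p_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    P_star = np.tile(p_star, (N, 1))            # N identical rows
    F = np.linalg.inv(np.eye(N) - P + P_star)   # the fundamental matrix
    return F[0, 0]

def d(m, eps):
    """d_m(eps) of equation (3.1)."""
    return (1 + eps - (1 - eps)**(m - 1)) / eps
```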

We state our main results as the theorem below.

Theorem 3.1.

(i) Graph G contains a Hamiltonian cycle G_HC, with a corresponding deterministic policy f_HC, if and only if, for ε ≤ 1/N² and ∀N ≥ 3,
$$\min_{f \in D} L(f) = L(f_{HC}) \;\stackrel{\text{def}}{=}\; L_{HC},$$
where
$$L_{HC} = \frac{1}{d_N(\varepsilon)} - \frac{1}{d_N^2(\varepsilon)} + \frac{1}{\varepsilon d_N(\varepsilon)} - \frac{(1-\varepsilon) + (N-1)(1-\varepsilon)^{N-1}}{\varepsilon d_N^2(\varepsilon)}.$$

(ii) If G does not contain Hamiltonian cycles, then, for ε ≤ 1/N² and ∀N ≥ 3,
$$\min_{f \in D} L(f) > L_{HC}.$$

It should be noted that whenever part (ii) applies, min_{f∈D} L(f) identifies the length of the longest cycle that passes through the home node. By considering all possible home nodes, the value of the above minimum also identifies the length of the longest cycle in a non-Hamiltonian graph.¹ The essence of the proof of Theorem 3.1 will be presented in the next two sections, whereas its technical aspects are gathered in the Appendix.

¹We thank an anonymous referee for this observation.

4 Derivation of the objective function L

To start with, we shall derive a useful partition of D that is based on the graphs "traced out" in G by deterministic policies (see also Chen & Filar, 1992; Filar & Vrieze, 1996). As above, with each f ∈ D we associate a sub-graph G_f of G defined by
$$(i, j) \in G_f \iff f(i) = j.$$

We shall also denote a simple cycle of length m beginning at 1 by a set of arcs
$$c^1_m = \big\{ (i_1 = 1, i_2), (i_2, i_3), \dots, (i_m, i_{m+1} = 1) \big\}, \qquad m = 2, 3, \dots, N.$$
Note that c¹_N is a HC. If G_f contains a cycle c¹_m, we write G_f ⊃ c¹_m. Let
$$C^1_m \;\stackrel{\text{def}}{=}\; \big\{ f \in D \;\big|\; G_f \supset c^1_m \big\},$$
namely, the set of deterministic policies that trace out a simple cycle of length m beginning at node 1, for each m = 2, 3, ..., N. Of course, C¹_N is the (possibly empty) set of policies that correspond to HCs, and any single C¹_m can be empty, depending on the structure of the original graph G. Thus, a typical policy f ∈ C¹_m traces out a graph G_f in G that might look as in Figure 4,

Figure 4: A typical policy f in C¹_m

where the dots indicate the "immaterial" remainder of G_f, corresponding to states that are transient in P_ε(f) as a result of the perturbation (without the perturbation there might exist other ergodic classes that do not contain the home node 1).

We use the further partition of the space of deterministic policies of the form
$$D = \Big[ \bigcup_{m=2}^{N} C^1_m \Big] \cup B,$$
where B contains all deterministic policies that are not in any of the C¹_m's. A typical policy f ∈ B traces out a sub-graph G_f in G as in Figure 5,

Figure 5: A typical policy f in B

where the dots again denote the immaterial part of G_f. It is important, however, to note that the states {1, i₂, ..., i_{k−1}}, which are transient in Γ, are no longer transient in Γ_ε for any ε > 0.

According to the theory of Markov chains, the stationary distribution matrix P*_ε(π) consists of N identical row-vectors p*_ε(π) := (p*₁(π), ..., p*_N(π)). That is,
$$P^*_\varepsilon(\pi) \;\stackrel{\text{def}}{=}\;
\begin{pmatrix} p^*_\varepsilon(\pi)\\ \vdots\\ p^*_\varepsilon(\pi) \end{pmatrix}
= \begin{pmatrix} p^*_1(\pi) & \cdots & p^*_N(\pi)\\ \vdots & & \vdots\\ p^*_1(\pi) & \cdots & p^*_N(\pi) \end{pmatrix}.$$

Using this notation, we define the long-run expected state-action frequencies induced by a stationary policy π ∈ Π as
$$x_{ia}(\pi) \;\stackrel{\text{def}}{=}\; p^*_i(\pi)\, \pi(i, a)$$
for i = 1, 2, ..., N and all arcs (i, a) ∈ A. Assuming that π(i, a) = 0 for any arc (i, a) ∉ A, we may arrange {x_{ia}(π)} as the entries of an N × N frequency matrix X(π) = [x_{ia}(π)], i, a = 1, ..., N.

The frequency of visits to state 1, for example, is then given by
$$x_1(\pi) = \sum_{a=1}^{N} x_{1a}(\pi) = p^*_1(\pi), \quad \text{since} \quad \sum_{a=1}^{N} \pi(1, a) = 1.$$

For technical reasons, it is convenient to represent our objective function L(π) in the form
$$L(\pi) = x_1(\pi) + y_1(\pi),$$
where x₁(π) = p*₁(π) is the long-run frequency of visits to node 1, and y₁(π) is the first component of the vector
$$y(\pi) = \big[ I_N - P_\varepsilon(\pi) + P^*_\varepsilon(\pi) \big]^{-1} r - P^*_\varepsilon(\pi)\, r, \qquad r = (1, 0, \dots, 0)^T.$$
Using the Markov chain identities
$$P^*_\varepsilon(\pi) P_\varepsilon(\pi) = P_\varepsilon(\pi) P^*_\varepsilon(\pi) = P^*_\varepsilon(\pi),$$
we can alternatively (cf. Blackwell, 1962) express y(π) as the unique solution of the system
$$p^*_\varepsilon(\pi)\, y(\pi) = 0, \tag{4.1a}$$
$$\big[ I_N - P_\varepsilon(\pi) \big]\, y(\pi) = r - \big[ x_1(\pi), \dots, x_1(\pi) \big]^T. \tag{4.1b}$$
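A numerical version of this decomposition is straightforward; the sketch below (our own illustration, in the same hypothetical Python setting as before) solves (4.1b) together with the normalization (4.1a):

```python
import numpy as np

def x1_and_y1(P):
    """Return (x_1, y_1), so that L = x_1 + y_1, following system (4.1)."""
    N = P.shape[0]
    A = np.vstack([(np.eye(N) - P).T, np.ones(N)])
    b = np.zeros(N + 1); b[-1] = 1.0
    p_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # stationary distribution
    r = np.zeros(N); r[0] = 1.0
    # (4.1b) alone has rank N - 1; appending (4.1a), p* y = 0, pins y down
    M = np.vstack([np.eye(N) - P, p_star])
    rhs = np.append(r - p_star[0] * np.ones(N), 0.0)
    y, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return p_star[0], y[0]

# sanity check on the 2-sub-cycle policy of Figure 2: x_1 + y_1 equals L
P = perturbed_matrix([1, 0, 3, 2], eps=0.1)
x1, y1 = x1_and_y1(P)
print(x1 + y1, L_of(P))        # the two printed values agree
```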

We will explicitly calculate x₁(f) and y₁(f) for the different types of deterministic policies, namely f ∈ C¹_N, f ∈ C¹_m (m < N) and f ∈ B. Due to the following lemma, it will be sufficient to perform the computation for a particular representative of each class.

Lemma 4.1. Let f ∈ D and let θ be a permutation of the nodes {1, ..., N} that preserves the home node {1}. Define the permuted deterministic policy θ(f) by θ(f)(i) := θ(f(θ⁻¹(i))). Then
$$L(f) = L(\theta(f)).$$

Proof. The permutation θ induces the conjugate transformation on the transition probability and stationary distribution matrices:
$$P_\varepsilon(\theta(f)) = \Theta P_\varepsilon(f) \Theta^{-1}, \qquad P^*_\varepsilon(\theta(f)) = \Theta P^*_\varepsilon(f) \Theta^{-1},$$
where Θ is the matrix with (i, j)-entry Θ_{i,j} = δ_{θ(i),j}, representing the permutation θ. Hence the fundamental matrix F(θ(f)) = (I_N − P_ε(θ(f)) + P*_ε(θ(f)))⁻¹ transforms as
$$F(\theta(f)) = \Theta F(f) \Theta^{-1}.$$
Since θ preserves {1}, the same is true of θ⁻¹. The unit in position (1, 1) is the only non-zero element in the first row and the first column of Θ and of Θ⁻¹. Therefore
$$L(\theta(f)) = \big[ F(\theta(f)) \big]_{1,1} = \big[ \Theta F(f) \Theta^{-1} \big]_{1,1} = \big[ F(f) \big]_{1,1} = L(f). \qquad \square$$
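A quick numerical spot-check of Lemma 4.1, in the same hypothetical Python setting (0-indexed nodes, node 0 the home node):

```python
f = [1, 2, 3, 4, 0]            # a 5-cycle through the home node
theta = [0, 3, 1, 4, 2]        # a permutation of the nodes fixing node 0
theta_f = [0] * 5
for i in range(5):             # theta(f)(theta(i)) = theta(f(i))
    theta_f[theta[i]] = theta[f[i]]
eps = 0.04                     # eps = 1/N^2 for N = 5
print(L_of(perturbed_matrix(f, eps)),
      L_of(perturbed_matrix(theta_f, eps)))   # the two values coincide
```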

Thus, without loss of generality, it is sufficient to consider only a policy f_m ∈ C¹_m tracing out the graph in Figure 6

Figure 6: A representative f_m of the whole class C¹_m

as the representative of the whole class C¹_m, and also a policy f^k_m ∈ B that traces out the graph in Figure 7

Figure 7: A representative f^k_m of the whole class B

as the representative of B. The following lemma verifies that the transient states for f_m ∈ C¹_m and f^k_m ∈ B are, indeed, immaterial as far as L(f) is concerned.

Lemma 4.2. Let Γ^m_ε be the restriction of Γ_ε to the first m < N nodes {1, 2, ..., m}, and let f̄_m and f̄^k_m be, respectively, the restrictions of f_m ∈ C¹_m and f^k_m ∈ B to the part of G consisting of the first m nodes. Then
$$L(f_m) = L(\bar{f}_m), \qquad L(f^k_m) = L(\bar{f}^k_m).$$

Proof. In both cases, f = f_m and f = f^k_m, the matrix [F(f)]⁻¹ = I_N − P_ε(f) + P*_ε(f) has the form
$$[F(f)]^{-1} = \begin{pmatrix} X & O\\ Y & Z \end{pmatrix},$$
where X, Y, Z are matrices of dimensions m × m, (N − m) × m and (N − m) × (N − m), respectively, and O is the m × (N − m) zero matrix. Hence,
$$F(f) = \begin{pmatrix} X & O\\ Y & Z \end{pmatrix}^{-1} = \begin{pmatrix} X' & W'\\ Y' & Z' \end{pmatrix},$$
where XX' = X'X = I_m, XW' = 0, YX' + ZY' = 0 and YW' + ZZ' = I_{N−m}. Thus X' = X⁻¹ and L(f) = [X⁻¹]_{1,1}. The matrix X⁻¹ is, in fact, the fundamental matrix that corresponds to the restriction f̄ of f, and this completes the proof of the lemma. □

Now we are ready to derive L(f).

Lemma 4.3. Let f = f^k_N ∈ B. Then

• its stationary probability distribution vector is
$$p^*_\varepsilon(f) = \Big( \frac{\varepsilon}{1+\varepsilon},\ \frac{\varepsilon}{1+\varepsilon},\ \frac{\varepsilon(1-\varepsilon)}{1+\varepsilon},\ \dots,\ \frac{\varepsilon(1-\varepsilon)^{k-3}}{1+\varepsilon},\ \frac{(1-\varepsilon)^{k-2}}{K_N},\ \dots,\ \frac{(1-\varepsilon)^{k-3+j}}{K_N},\ \dots,\ \frac{(1-\varepsilon)^{N-2}}{K_N} \Big), \tag{4.2}$$
where j = 1, ..., N − k + 1 and $K_N \stackrel{\text{def}}{=} (1+\varepsilon) \sum_{i=0}^{N-k} (1-\varepsilon)^i = \frac{1+\varepsilon}{\varepsilon} \big( 1 - (1-\varepsilon)^{N-k+1} \big)$, and

• the value of the objective function is
$$L(f) = \frac{1 + \varepsilon + \varepsilon^2}{(1+\varepsilon)^2}. \tag{4.3}$$

Proof. The stationary probability distribution vector p*_ε(f) is the unique solution of the system
$$p^*_\varepsilon(f) \big( I_N - P_\varepsilon(f) \big) = 0, \tag{4.4a}$$
$$\sum_{j=1}^{N} p^*_j(f) = 1. \tag{4.4b}$$
Checking the vector p*_ε(f) specified in (4.2), we observe that
$$\frac{\varepsilon}{1+\varepsilon} + \frac{\varepsilon}{1+\varepsilon} + \frac{\varepsilon(1-\varepsilon)}{1+\varepsilon} + \cdots + \frac{\varepsilon(1-\varepsilon)^{k-3}}{1+\varepsilon} = \frac{1 + \varepsilon - (1-\varepsilon)^{k-2}}{1+\varepsilon},$$
and also
$$\sum_{j=1}^{N-k+1} \frac{(1-\varepsilon)^{k-3+j}}{K_N} = \frac{(1-\varepsilon)^{k-2}}{1+\varepsilon}.$$
Hence,
$$\sum_{j=1}^{N} p^*_j(f) = \frac{1 + \varepsilon - (1-\varepsilon)^{k-2}}{1+\varepsilon} + \frac{(1-\varepsilon)^{k-2}}{1+\varepsilon} = 1,$$

which verifies (4.4b). For the first equation (4.4a), let us expand it as follows:
$$\big( p^*_1(f), \dots, p^*_N(f) \big)
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0\\
\varepsilon & 0 & 1-\varepsilon & 0 & \cdots & 0 & \cdots & 0 & 0\\
\varepsilon & 0 & 0 & 1-\varepsilon & \cdots & 0 & \cdots & 0 & 0\\
\vdots & & & & & \vdots & & & \vdots\\
\varepsilon & 0 & 0 & 0 & \cdots & 1-\varepsilon & \cdots & 0 & 0\\
\vdots & & & & & \vdots & & & \vdots\\
\varepsilon & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & 1-\varepsilon\\
\varepsilon & 0 & 0 & 0 & \cdots & 1-\varepsilon & \cdots & 0 & 0
\end{pmatrix}
= \big( p^*_1(f), \dots, p^*_N(f) \big),$$
where the sixth displayed column is the kth column; it carries the 1 − ε entries of rows k − 1 and N. For the first entry of both sides, we have
$$\varepsilon \sum_{j=2}^{N} p^*_j(f) = \varepsilon \big( 1 - p^*_1(f) \big) = \varepsilon - \frac{\varepsilon^2}{1+\varepsilon} = \frac{\varepsilon}{1+\varepsilon} = p^*_1(f).$$

For the second entry:
$$p^*_1(f) = \frac{\varepsilon}{1+\varepsilon} = p^*_2(f).$$
For the jth entries, j ∈ [3, N]:

• if j ≠ k, then
$$(1-\varepsilon)\, p^*_{j-1}(f) = p^*_j(f);$$

• if j = k, then
$$(1-\varepsilon)\, p^*_{k-1}(f) + (1-\varepsilon)\, p^*_N(f) = \frac{\varepsilon}{1+\varepsilon}(1-\varepsilon)^{k-2} + \frac{(1-\varepsilon)^{N-1}}{K_N} = \frac{(1-\varepsilon)^{k-2}}{K_N} = p^*_k(f).$$

These verify the expression for p*_ε(f).

To prove (4.3), we note that x₁(f) = p*₁(f) = ε/(1+ε), and find the value of y₁(f). We solve (4.1b) for y(f) = (y₁, y₂, ..., y_N):
$$\begin{aligned}
y_1 - y_2 &= 1 - \frac{\varepsilon}{1+\varepsilon},\\
-\varepsilon y_1 + y_2 - (1-\varepsilon) y_3 &= -\frac{\varepsilon}{1+\varepsilon},\\
&\ \,\vdots\\
-\varepsilon y_1 + y_{N-1} - (1-\varepsilon) y_N &= -\frac{\varepsilon}{1+\varepsilon},\\
-\varepsilon y_1 + y_N - (1-\varepsilon) y_k &= -\frac{\varepsilon}{1+\varepsilon}.
\end{aligned}$$
The rank of this linear system equals N − 1. Hence, if we set y_j = y₁ + w_j, j = 2, ..., N, then the first N − 1 equations suffice to determine uniquely w_j = −1/(1+ε), j = 2, ..., N. The last equation is then clearly verified:
$$-\varepsilon y_1 - (1-\varepsilon) y_1 + \frac{1-\varepsilon}{1+\varepsilon} + y_1 - \frac{1}{1+\varepsilon} = -\frac{\varepsilon}{1+\varepsilon}.$$

To determine y₁ we need to use (4.1a), which implies
$$\Big( \sum_{j=1}^{N} p^*_j(f) \Big) y_1 - \frac{1}{1+\varepsilon} \sum_{j=2}^{N} p^*_j(f)
= y_1 - \frac{1}{1+\varepsilon} \sum_{j=1}^{N} p^*_j(f) + \frac{1}{1+\varepsilon}\, p^*_1(f)
= y_1 - \frac{1}{(1+\varepsilon)^2} = 0.$$
Thus y₁ = y₁(f) = 1/(1+ε)², and we conclude that
$$L(f) = x_1(f) + y_1(f) = \frac{\varepsilon}{1+\varepsilon} + \frac{1}{(1+\varepsilon)^2} = \frac{1 + \varepsilon + \varepsilon^2}{(1+\varepsilon)^2}. \qquad \square$$

We note that L(f) does not depend on the number of transient states for f ∈ B.
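Both the closed form (4.3) and its independence of k are easy to observe numerically. A sketch in the same hypothetical Python setting, on N = 6 nodes:

```python
# two different policies in B on N = 6 nodes: the sub-cycle starts at node 2
# in the first policy and at node 3 in the second
for f in ([1, 2, 3, 4, 5, 2], [1, 2, 3, 4, 5, 3]):
    for eps in (0.1, 0.01):
        numeric = L_of(perturbed_matrix(f, eps))
        closed_form = (1 + eps + eps**2) / (1 + eps)**2   # equation (4.3)
        print(f, eps, numeric, closed_form)               # last two columns agree
```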

Along the same lines, we proceed with the policy f_HC that corresponds to the Hamiltonian cycle.

Lemma 4.4. Let f = f_HC := f_N ∈ C¹_N. Then

• its stationary probability distribution vector is
$$p^*_\varepsilon(f) = \Big( \frac{1}{d_N},\ \frac{1}{d_N},\ \frac{1-\varepsilon}{d_N},\ \frac{(1-\varepsilon)^2}{d_N},\ \dots,\ \frac{(1-\varepsilon)^{N-2}}{d_N} \Big),$$
and

• the value of the objective function is
$$L(f) = \frac{1}{d_N} - \frac{1}{d_N^2} + \frac{1}{\varepsilon d_N} - \frac{(1-\varepsilon) + (N-1)(1-\varepsilon)^{N-1}}{\varepsilon d_N^2}.$$

Proof. The expression for p*_ε(f) can be found in (Filar & Krass, 1994) or (Chen & Filar, 1992). However, it can also be verified as the unique solution of system (4.4). In particular, x₁(f) = 1/d_N.

Again, we employ system (4.1) to determine y(f) and hence y₁(f). Note that (4.1b) takes the form
$$\begin{aligned}
y_1 - y_2 &= 1 - \frac{1}{d_N},\\
-\varepsilon y_1 + y_2 - (1-\varepsilon) y_3 &= -\frac{1}{d_N},\\
&\ \,\vdots\\
-\varepsilon y_1 + y_{N-1} - (1-\varepsilon) y_N &= -\frac{1}{d_N},\\
-y_1 + y_N &= -\frac{1}{d_N}.
\end{aligned}$$

By induction on k, one can easily verify that the kth component of y(f) is
$$y_k = y_1 - \frac{1}{d_N} + \frac{1}{(1-\varepsilon)^{k-2}} \cdot \frac{(1-\varepsilon)^{N-1} - (1-\varepsilon)^{k-1}}{1 + \varepsilon - (1-\varepsilon)^{N-1}}, \qquad \forall\, k = 2, \dots, N.$$

The above and (4.1a) then imply
$$y_1 \sum_{k=1}^{N} p^*_k(f) - \frac{1}{d_N} \sum_{k=1}^{N} p^*_k(f) + \frac{1}{d_N}\, p^*_1(f)
+ \sum_{k=2}^{N} \frac{p^*_k(f)}{(1-\varepsilon)^{k-2}} \cdot \frac{(1-\varepsilon)^{N-1} - (1-\varepsilon)^{k-1}}{1 + \varepsilon - (1-\varepsilon)^{N-1}} = 0.$$
Using (3.1), the property $\sum_{k=1}^{N} p^*_k(f) = 1$, and the expressions for p*₁(f) and p*_k(f), k ≥ 2, the above equation can be rewritten as
$$\begin{aligned}
0 &= y_1 - \frac{1}{d_N} + \frac{1}{d_N^2} + \sum_{k=2}^{N} \frac{1}{d_N} \cdot \frac{(1-\varepsilon)^{N-1} - (1-\varepsilon)^{k-1}}{1 + \varepsilon - (1-\varepsilon)^{N-1}}\\
&= y_1 - \frac{1}{d_N} + \frac{1}{d_N^2} + \frac{1}{\varepsilon d_N^2} \Big\{ \sum_{k=2}^{N} (1-\varepsilon)^{N-1} - (1-\varepsilon) \sum_{k=2}^{N} (1-\varepsilon)^{k-2} \Big\}\\
&= y_1 - \frac{1}{d_N} + \frac{1}{d_N^2} + \frac{1}{\varepsilon d_N^2} \Big\{ (N-1)(1-\varepsilon)^{N-1} - (1-\varepsilon)(d_N - 1) \Big\}\\
&= y_1 + \frac{1}{d_N^2} - \frac{1}{\varepsilon d_N} + \frac{(1-\varepsilon) + (N-1)(1-\varepsilon)^{N-1}}{\varepsilon d_N^2}.
\end{aligned}$$
Thus,
$$y_1(f) = y_1 = -\frac{1}{d_N^2} + \frac{1}{\varepsilon d_N} - \frac{(1-\varepsilon) + (N-1)(1-\varepsilon)^{N-1}}{\varepsilon d_N^2},$$
and so
$$L(f) = x_1(f) + y_1(f) = \frac{1}{d_N} - \frac{1}{d_N^2} + \frac{1}{\varepsilon d_N} - \frac{(1-\varepsilon) + (N-1)(1-\varepsilon)^{N-1}}{\varepsilon d_N^2}. \qquad \square$$

Corollary 4.1. For m > 2, if f = f_m ∈ C¹_m, then
$$L(f) = x_1(f) + y_1(f) = \frac{1}{d_m} - \frac{1}{d_m^2} + \frac{1}{\varepsilon d_m} - \frac{(1-\varepsilon) + (m-1)(1-\varepsilon)^{m-1}}{\varepsilon d_m^2}.$$

Proof. This follows immediately from Lemma 4.2 and Lemma 4.4. □
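The closed form of Lemma 4.4 and Corollary 4.1 can likewise be cross-checked against the matrix computation; a sketch under the same assumptions as the earlier snippets:

```python
def L_cycle(m, eps):
    """Closed-form value of L on C^1_m (Lemma 4.4 and Corollary 4.1)."""
    dm = (1 + eps - (1 - eps)**(m - 1)) / eps         # d_m(eps) of (3.1)
    return (1/dm - 1/dm**2 + 1/(eps*dm)
            - ((1 - eps) + (m - 1)*(1 - eps)**(m - 1)) / (eps*dm**2))

f_HC = [1, 2, 3, 4, 5, 0]      # the Hamiltonian cycle 0 -> 1 -> ... -> 5 -> 0
eps = 1.0 / 36                 # eps = 1/N^2 for N = 6
print(L_cycle(6, eps),
      L_of(perturbed_matrix(f_HC, eps)))   # the two values agree
```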

5 Crucial argument about the estimate ε ≤ 1/N²

Lemma 5.1. For ε ≤ 1/N², f_{N−1} ∈ C¹_{N−1} and f_HC = f_N ∈ C¹_N, the following holds:
$$L(f_N) < L(f_{N-1}).$$

We postpone the proof of this lemma (which is straightforward but technical) to the Appendix. Here we demonstrate that the statement of the lemma is crucial for the proof of our main result, Theorem 3.1.

Indeed, if we compare L(f_N) and L(f_m), m < N, f_m ∈ C¹_m, then by iteratively applying Lemma 5.1 one can easily see that
$$\forall\, \varepsilon \le \frac{1}{N^2},\ m \in [2, N-1]: \quad L(f_N) < L(f_m).$$
Thus, for any policy f ∈ ⋃_{m=2}^{N−1} C¹_m, whenever ε ≤ 1/N², we have L(f_HC) < L(f).

It remains to verify that L(f_HC) < L(f) for f ∈ B. By the above argument, and since L(f) with f = f^k_N ∈ B is independent of k and N, it suffices to show that L(f₃) < L(f²₃), where f₃ ∈ C¹₃ and f²₃ ∈ B.

By virtue of Lemma 4.3 and Corollary 4.1, we know that
$$L(f^2_3) - L(f_3) = \frac{1 + \varepsilon + \varepsilon^2}{(1+\varepsilon)^2} - \frac{3(2-\varepsilon)}{(3-\varepsilon)^2} = \frac{3 - 6\varepsilon + 4\varepsilon^2 - 2\varepsilon^3 + \varepsilon^4}{(1+\varepsilon)^2 (3-\varepsilon)^2},$$
which is clearly positive for any ε ≤ 1/2.

Theorem 3.1 is therefore proved, provided that the validity of Lemma 5.1 can be established. The latter is done in the Appendix.

6 Conclusion and open problems

Theorem 3.1 motivates the following characterization of graphs with Hamiltonian cycles.

Conjecture 6.1. Let ε ≤ 1/N². Graph G contains a Hamiltonian cycle if and only if
$$\min_{\pi \in \Pi} L(\pi) = L_{HC},$$
where L_HC is defined in Theorem 3.1 and Π is the set of all stationary policies.

The results of (Filar & Krass, 1994), based on the stationary distribution operator of Markov chains, have already led to algorithmic approaches to the HCP (e.g., see (Andramonov, 2002) and (Ejov et al., 2002)). If the above conjecture can be established, it is reasonable to expect that algorithms that begin with fully randomized policies and approach the degenerate (deterministic) policies corresponding to Hamiltonian cycles could be explored. However, if the conjecture were false, such "interior point" methods might arrive at a randomized policy π_LM ∈ Π with L(π_LM) < L_HC. By analogy, the elegance of linear programming stems from the fact that it is impossible to find an interior global minimum that is strictly better than all extreme points. Therefore, a confirmation of our conjecture would raise hopes that an intelligent search procedure for a Hamiltonian cycle could be devised. It is possible that the results of (Kallenberg, 1983; Hordijk & Kallenberg, 1984; Ross, 1989; Kallenberg, 2002) could be of help in establishing the above conjecture.

Conjecture 6.1 is supported by our numerical experiments in Maple on convex combinations π = π(λ, f, g) := λf + (1−λ)g, λ ∈ [0, 1], of deterministic policies f, g ∈ D, with the corresponding probability transition matrices P_ε(π) = λP_ε(f) + (1−λ)P_ε(g). Our experiments for N ≤ 20 show that for such π the objective function satisfies
$$L(\pi) \ge L(f_{HC}),$$
with equality if and only if π = f_HC, that is, g = f_HC and λ = 0. Figure 8 shows the behaviour of the objective function L(π) for several such convex combinations built from policies f_B, g_B ∈ B, f_HC, and f_m ∈ C¹_m, m < N.
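A small Python analogue of that Maple experiment, on our own instance with N = 4 and the hypothetical helpers from the earlier sketches (f_B a policy in B, f_HC the Hamiltonian cycle):

```python
import numpy as np

eps = 1.0 / 16                               # eps = 1/N^2 for N = 4
P_B  = perturbed_matrix([1, 2, 3, 2], eps)   # f_B: path 0 -> 1 -> 2, 2-sub-cycle 2 <-> 3
P_HC = perturbed_matrix([1, 2, 3, 0], eps)   # f_HC: the Hamiltonian cycle
for lam in np.linspace(0.0, 1.0, 6):
    # pi(lam, f_B, f_HC); L stays above L(f_HC) except at lam = 0
    print(lam, L_of(lam * P_B + (1 - lam) * P_HC))
```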

Figure 8: Behaviour of the objective function L(π) along the convex combinations π(λ, f_B, f'_B), π(λ, f_B, f_N), π(λ, f_B, f'_N) and π(λ, f_B, f_m)

7 Appendix: Verification of the estimate ε ≤ 1/N² (Proof of Lemma 5.1)

Setting t = 1 − ε, we observe from (3.1) that
$$d_m(\varepsilon) = \frac{1 + \varepsilon - t^{m-1}}{\varepsilon}.$$
Using this expression for m = N and m = N + 1, we obtain by Lemma 4.4 that
$$L(f_N) - L(f_{N+1}) = \frac{t^{N-1}\, Q_N(\varepsilon, t)}{(1 + \varepsilon - t^N)^2 (1 + \varepsilon - t^{N-1})^2}, \tag{7.1}$$
where
$$\begin{aligned}
Q_N(\varepsilon, t) ={}& -t^{2N} + t^{2N-1} + t^{N+1} - t^{N-1} - t + 1\\
&+ \varepsilon \big\{ -N t^{2N} + (N+1) t^{2N-1} + t^{N+1} - 2t^N - t^{N-1} + (N-1)t - (N-2) \big\}\\
&+ \varepsilon^2 \big\{ t^{N+1} - 2t^N - t^{N-1} + (2N-1)t - (2N-3) \big\}\\
&+ \varepsilon^3 \big\{ (N-1)t - (N-2) \big\}. \tag{7.2}
\end{aligned}$$
So, it is sufficient to verify that Q_N(ε, t) > 0 for all ε ≤ 1/N².

Substituting t = 1 − ε, we now write Q_N as a polynomial in ε alone, by expanding every term in the above formula for Q_N(ε, t). For example, the contribution of −t^{2N} to the coefficient of ε^k, for 0 ≤ k ≤ 2N + 1, equals $-(-1)^k \binom{2N}{k}$, which we denote by
$$-t^{2N} \;\underset{\varepsilon^k}{\Longrightarrow}\; -(-1)^k \binom{2N}{k},$$
with $\binom{\alpha}{\beta}$ being the usual notation for the binomial coefficients. We adopt the convention that $\binom{\alpha}{\beta} = 0$ for β > α and also for β < 0. The analogous contribution of the term ε(N + 1)t^{2N−1} is
$$\varepsilon (N+1) t^{2N-1} \;\underset{\varepsilon^k}{\Longrightarrow}\; (N+1)(-1)^{k-1} \binom{2N-1}{k-1} = -(N+1)(-1)^k \binom{2N-1}{k-1}.$$

Adding up all such contributions to the coefficient of ε^k, we obtain
$$\begin{aligned}
Q_N(\varepsilon) = Q_N(\varepsilon, t) = \sum_{k=0}^{2N+1} (-1)^k \bigg(
&-\binom{2N}{k} + \binom{2N-1}{k} + \binom{N+1}{k} - \binom{N-1}{k} - \binom{1}{k} + \binom{0}{k}\\
&+ N\binom{2N}{k-1} - (N+1)\binom{2N-1}{k-1} - \binom{N+1}{k-1} + 2\binom{N}{k-1} + \binom{N-1}{k-1}\\
&- (N-1)\binom{1}{k-1} + (N-2)\binom{0}{k-1}\\
&+ \binom{N+1}{k-2} - 2\binom{N}{k-2} - \binom{N-1}{k-2} + (2N-1)\binom{1}{k-2} - (2N-3)\binom{0}{k-2}\\
&- (N-1)\binom{1}{k-3} + (N-2)\binom{0}{k-3}
\bigg)\, \varepsilon^k. \tag{7.3}
\end{aligned}$$

It turns out that the coefficients of ε⁰, ε, ε² and ε³ in the above formula all vanish. The expression Q_N(ε) thus takes the form
$$\begin{aligned}
Q_N(\varepsilon, t) = Q_N(\varepsilon) ={}& \tfrac{1}{2} N(N+1)\, \varepsilon^4 - \tfrac{1}{6}(N-1)(N^3 + 6N^2 - 4N - 6)\, \varepsilon^5\\
&+ \sum_{k=6}^{2N+1} (-1)^k \bigg( -\binom{2N}{k} + \binom{2N-1}{k} + \binom{N+1}{k} - \binom{N-1}{k}\\
&\qquad + N\binom{2N}{k-1} - (N+1)\binom{2N-1}{k-1} - \binom{N+1}{k-1} + 2\binom{N}{k-1} + \binom{N-1}{k-1}\\
&\qquad + \binom{N+1}{k-2} - 2\binom{N}{k-2} - \binom{N-1}{k-2} \bigg)\, \varepsilon^k. \tag{7.4}
\end{aligned}$$

Grouping together the terms with $\binom{2N}{\cdot}$ and $\binom{2N-1}{\cdot}$, we obtain a single term with the common factor $\binom{2N-1}{k-2}$; for example,
$$\binom{2N}{k} = \frac{2N(2N - k + 1)}{k(k-1)} \binom{2N-1}{k-2}.$$
Analogously, grouping together all the remaining terms and extracting $\binom{N-1}{k-4}$ as the common factor, using identities such as
$$\binom{N+1}{k} = \frac{N(N+1)(N-k+2)(N-k+3)}{(k-3)(k-2)(k-1)k} \binom{N-1}{k-4},$$

we obtain the following representation for Q_N(ε):
$$\begin{aligned}
Q_N(\varepsilon) ={}& \tfrac{1}{2} N(N+1)\, \varepsilon^4 - \tfrac{1}{6}(N-1)(N^3 + 6N^2 - 4N - 6)\, \varepsilon^5\\
&+ \sum_{k=6}^{2N+1} (-1)^k \frac{\alpha_k}{k-1} \binom{2N-1}{k-2}\, \varepsilon^k
+ \sum_{k=6}^{N+3} (-1)^k \frac{\beta_k}{(k-1)(k-2)(k-3)} \binom{N-1}{k-4}\, \varepsilon^k, \tag{7.5}
\end{aligned}$$
where
$$\alpha_k = Nk - 5N + 2k - 2,$$
$$\beta_k = 4N^3 - 13kN^2 + 25N^2 + 13Nk^2 - 52kN + 47N - 3k^3 + 18k^2 - 33k + 18. \tag{7.6}$$

In order to see the positivity of Q_N(ε), let us combine the pairs of consecutive terms for k = 2i and k = 2i + 1 and re-arrange the expression as follows:
$$\begin{aligned}
Q_N(\varepsilon) ={}& \tfrac{1}{2} N(N+1)\, \varepsilon^4 - \tfrac{1}{6}(N-1)(N^3 + 6N^2 - 4N - 6)\, \varepsilon^5\\
&+ \sum_{i=3}^{N} \frac{(2N-1)!}{(2i-1)!\,(2N-2i)!} \Big( \frac{\alpha_{2i}}{2N-2i+1} - \varepsilon\, \frac{\alpha_{2i+1}}{2i} \Big)\, \varepsilon^{2i}\\
&+ \sum_{i=3}^{\frac{N}{2}+1} \frac{(N-1)!}{(2i-1)!\,(N-2i+2)!} \Big( \frac{\beta_{2i}}{N-2i+3} - \varepsilon\, \frac{\beta_{2i+1}}{2i} \Big)\, \varepsilon^{2i} + R_N(\varepsilon), \tag{7.7}
\end{aligned}$$
where
$$R_N(\varepsilon) = \begin{cases} \varepsilon^{N+3} & \text{if } N \text{ is odd and } N \ge 3,\\ 0 & \text{if } N \text{ is even and } N \ge 4. \end{cases}$$

We observe the following.

• For ε ≤ 1/N²,
$$\tfrac{1}{2} N(N+1)\, \varepsilon^4 - \tfrac{1}{6}(N-1)(N^3 + 6N^2 - 4N - 6)\, \varepsilon^5
\ \ge\ \varepsilon^4 \Big\{ \tfrac{1}{2} N(N+1) - \tfrac{1}{6N^2}(N-1)(N^3 + 6N^2 - 4N - 6) \Big\}
\ =\ \frac{\varepsilon^4}{3N^2} \big( N^4 - N^3 + 5N^2 + N - 3 \big) \ >\ 0.$$

• The last term satisfies R_N(ε) ≥ 0.

• The positivity of the two sums over i in (7.7) is a consequence of the following technical result.

Proposition 7.1. Suppose that ε is chosen in such a way that 0 < ε ≤ 1/N². Then the following inequalities hold:
$$2i\, \alpha_{2i} - \varepsilon\, (2N - 2i + 1)\, \alpha_{2i+1} > 0, \qquad \forall\, 3 \le i \le N,$$
$$2i\, \beta_{2i} - \varepsilon\, (N - 2i + 3)\, \beta_{2i+1} > 0, \qquad \forall\, 3 \le i \le \left\lfloor \tfrac{N+3}{2} \right\rfloor \text{ and } N > 14.$$

Remark 7.1. Since the last inequality of Proposition 7.1 is only valid for N > 14, we still need to check that L(f_N) − L(f_{N+1}) > 0 for all 3 ≤ N ≤ 14. This can be done by a simple loop program using, for example, Maple or any other computer algebra software. Figure 9 shows typical plots of L(f_N) and L(f_{N+1}) for 0 < ε < 1/N², with the particular values N = 7, 10 and 14.
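The same check is immediate in Python, using the closed form L_cycle from the sketch in Section 4 (a grid-based numerical sanity check rather than a symbolic proof; the grid resolution is our own arbitrary choice):

```python
import numpy as np

for N in range(3, 15):
    eps = np.linspace(1e-6, 1.0 / N**2, 1000)      # grid over (0, 1/N^2]
    diff = L_cycle(N, eps) - L_cycle(N + 1, eps)   # L(f_N) - L(f_{N+1})
    assert (diff > 0).all(), N
print("L(f_N) > L(f_{N+1}) on the grid for all N = 3, ..., 14")
```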

Proof of Proposition 7.1. We shall prove a stronger version of Proposition 7.1, namely that
$$k\, \alpha_k - \varepsilon\, (2N - k + 1)\, \alpha_{k+1} > 0, \qquad \forall\, 6 \le k \le 2N, \tag{7.8}$$
$$k\, \beta_k - \varepsilon\, (N - k + 3)\, \beta_{k+1} > 0, \qquad \forall\, 6 \le k \le N + 3 \text{ and } N > 14, \tag{7.9}$$
which hold for all 0 < ε ≤ 1/N².

Figure 9: Typical plots of L(f_N) and L(f_{N+1}) for 0 < ε < 1/N², with N = 7, 10 and 14

We start with inequality (7.8) for α_k and α_{k+1}. First notice that Nk > 5N and 2k > 2 for k ≥ 6, so α_k > 0 for any k ≥ 6. Therefore, for any ε ≤ 1/N², one can write
$$k\, \alpha_k - \varepsilon\, (2N - k + 1)\, \alpha_{k+1} \ \ge\ k\, \alpha_k - \frac{1}{N^2}(2N - k + 1)\, \alpha_{k+1},$$
where, upon substituting for α_k and α_{k+1}, the numerator of the latter expression,
$$N^2 k\, \alpha_k - (2N - k + 1)\, \alpha_{k+1} = (k^2 - 5k) N^3 + 2(k^2 - 2k + 4) N^2 + (k^2 - 9k + 4) N + 2k(k-1),$$
is strictly positive for all k ≥ 6, since

• the leading coefficient (k² − 5k) of N³ is positive for all k ≥ 6,

• the coefficient 2(k² − 2k + 4) of N² is at least twice the absolute value of the coefficient (k² − 9k + 4) of N, and

• the last term 2k(k − 1) is also positive for k ≥ 6.

Hence, our first assertion (7.8) holds.

Now fix a positive integer N and check that β_k > 0. Setting t := k/N, where t ∈ (0, 2) since 3 ≤ k ≤ N + 3, we rewrite β_k as
$$\beta_k = 18 + (47 - 33t)\, N + (25 - 52t + 18t^2)\, N^2 + (4 - 13t + 13t^2 - 3t^3)\, N^3. \tag{7.10}$$

Figure 10: Plot of 4 − 13t + 13t² − 3t³ and of the ratios (25 − 52t + 18t²)/(4 − 13t + 13t² − 3t³) and (47 − 33t)/(4 − 13t + 13t² − 3t³)

The above expression has the properties (see Figure 10) that
$$\min_{0 < t \le 2} \big\{ 4 - 13t + 13t^2 - 3t^3 \big\} > 0,$$
$$\min_{0 < t \le 2} \bigg\{ \frac{25 - 52t + 18t^2}{4 - 13t + 13t^2 - 3t^3} \bigg\} > -14,$$
$$\min_{0 < t \le 2} \bigg\{ \frac{47 - 33t}{4 - 13t + 13t^2 - 3t^3} \bigg\} > -4.$$
Thus,
$$\beta_k > \mu N (N^2 - 14N - 4) + 18 = \mu N \big( N - 7 + \sqrt{53} \big)\big( N - 7 - \sqrt{53} \big) + 18,$$
where μ := 4 − 13t + 13t² − 3t³. Therefore, it is clear that β_k > 0 for any N > 14.

Under the assumption N > 14, we now proceed to the proof of inequality (7.9) for β_k and β_{k+1}. We know that
$$k\, \beta_k - \varepsilon\, (N - k + 3)\, \beta_{k+1} \ \ge\ k\, \beta_k - \frac{1}{N^2}(N - k + 3)\, \beta_{k+1}, \qquad \forall\, \varepsilon \le \frac{1}{N^2}.$$

Figure 11: Plot of y = (51 − 46t + 13t² − 3t³)/(4 − 13t + 13t² − 3t³) and y = −(24 − 65t + 48t² − 9t³)(1/k)/(4 − 13t + 13t² − 3t³), with k = 2

Figure 12: Plot of y = (30 − 26t + 9t²)/(4 − 13t + 13t² − 3t³) and y = −(44 − 84t + 27t²)(1/k)/(4 − 13t + 13t² − 3t³), with k = 6

Figure 13: Plot of y = (8 − 6t)/(4 − 13t + 13t² − 3t³) and y = −(24 − 18t)(1/k)/(4 − 13t + 13t² − 3t³), with k = 6

Multiplying the right-hand side of the above inequality by N²/k, we obtain
$$\begin{aligned}
\frac{N^2}{k} \Big( k\, \beta_k - \frac{1}{N^2}(N - k + 3)\, \beta_{k+1} \Big)
&= N^2 \beta_k + \beta_{k+1} - \frac{1}{k}(N + 3)\, \beta_{k+1}\\
&= \Big[ -(24 - 18t)\tfrac{1}{k} + (8 - 6t) \Big] N
+ \Big[ -(44 - 84t + 27t^2)\tfrac{1}{k} + (30 - 26t + 9t^2) \Big] N^2\\
&\quad + \Big[ -(24 - 65t + 48t^2 - 9t^3)\tfrac{1}{k} + (51 - 46t + 13t^2 - 3t^3) \Big] N^3\\
&\quad + \Big[ -(4 - 13t + 13t^2 - 3t^3)\tfrac{1}{k} + (25 - 52t + 18t^2) \Big] N^4
+ (4 - 13t + 13t^2 - 3t^3)\, N^5, \tag{7.11}
\end{aligned}$$

where t ∈ (0, 1.22) (since k ≥ 6 and N > 14). The latter expression has the properties that, for t ∈ (0, 1.22) and k ≥ 6:

• 4 − 13t + 13t² − 3t³ > 0 (see Figure 10);

• $\dfrac{-(4 - 13t + 13t^2 - 3t^3)\frac{1}{k} + (25 - 52t + 18t^2)}{4 - 13t + 13t^2 - 3t^3} > -14 - \dfrac{1}{6}$ (see again Figure 10);

• $\dfrac{-(24 - 65t + 48t^2 - 9t^3)\frac{1}{k} + (51 - 46t + 13t^2 - 3t^3)}{4 - 13t + 13t^2 - 3t^3} > 0$ (see Figure 11);

• $\dfrac{-(44 - 84t + 27t^2)\frac{1}{k} + (30 - 26t + 9t^2)}{4 - 13t + 13t^2 - 3t^3} > 0$ (see Figure 12);

• $\dfrac{-(24 - 18t)\frac{1}{k} + (8 - 6t)}{4 - 13t + 13t^2 - 3t^3} > 0$ (see Figure 13).

Therefore,
$$\frac{N^2}{k} \Big( k\, \beta_k - \frac{1}{N^2}(N - k + 3)\, \beta_{k+1} \Big) \ >\ \mu \Big( N^5 - \big( 14 + \tfrac{1}{6} \big) N^4 \Big) \ >\ 0, \qquad \forall\, N > 14.$$
This concludes the proof of the second inequality (7.9). □

References

Andramonov, M. (2002). Solving problems with min-type functions by disjunctive programming. Submitted to Journal of Global Optimization.

Andramonov, M., Filar, J. A., Rubinov, A., & Pardalos, P. (2000). Hamiltonian cycle problem via Markov chains and min-type approaches. In P. M. Pardalos (Ed.), Approximation and Complexity in Numerical Optimization: Continuous and Discrete Problems (pp. 31–47). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Blackwell, D. (1962). Discrete dynamic programming. Ann. Math. Statist., 33, 719–726.

Chen, M. & Filar, J. A. (1992). Hamiltonian cycles, quadratic programming and ranking of extreme points. In C. A. Floudas & P. M. Pardalos (Eds.), Recent Advances in Global Optimization (pp. 32–49). Princeton, New Jersey: Princeton University Press.

Ejov, V., Filar, J. A., & Gondzio, J. (2002). MDP-based optimization algorithm for the HCP. In preparation.

Feinberg, E. (2000). Constrained discounted Markov decision processes with Hamiltonian cycles. Mathematics of Operations Research, 25, 130–140.

Filar, J. A. & Krass, D. (1994). Hamiltonian cycles and Markov chains. Mathematics of Operations Research, 19, 223–227.

Filar, J. A. & Lasserre, J.-B. (2000). A non-standard branch and bound method for the Hamiltonian cycle problem. ANZIAM J., 42(E), C556–C577.

Filar, J. A. & Liu, K. (1996). Hamiltonian cycle problem and singularly perturbed Markov decision process. In Statistics, Probability and Game Theory: Papers in Honor of David Blackwell, IMS Lecture Notes – Monograph Series, USA.

Filar, J. A. & Vrieze, K. (1996). Competitive Markov Decision Processes. New York: Springer-Verlag.

Hordijk, A. & Kallenberg, L. (1984). Constrained undiscounted stochastic dynamic programming. Math. Oper. Res., 9, 276–289.

Kallenberg, L. C. M. (1983). Linear Programming and Finite Markovian Control Problems. Mathematical Centre Tracts 148. Amsterdam: Mathematical Centre.

Kallenberg, L. C. M. (2002). Finite state and action MDPs. In E. Feinberg & A. Shwartz (Eds.), Handbook of Markov Decision Processes (pp. 21–87). Boston: Kluwer.

Ross, K. (1989). Randomized and past-dependent policies for Markov decision processes with multiple constraints. Oper. Res., 37, 474–477.