Hamiltonian Cycles and Singularly Perturbed Markov Chains∗
Vladimir Ejov, Jerzy A. Filar†, Minh-Tuan Nguyen‡
School of Mathematics – University of South Australia
Mawson Lakes Campus, Mawson Lakes, SA 5095, Australia
Abstract
We consider the Hamiltonian cycle problem embedded in a singularly perturbed Markov decision process. We also consider a functional on the space of deterministic policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by these policies. We show that when the perturbation parameter ε is less than or equal to 1/N², the Hamiltonian cycles of the directed graph are precisely the minimizers of our functional over the space of deterministic policies. In the process, we derive analytical expressions for the possible N distinct values of the functional over the, typically, much larger space of deterministic policies.
Key words: Hamiltonian Cycle, Markov Decision Process, Optimal Policy, Singular Perturbation.
1 Introduction
This paper is a continuation of a line of research (Chen & Filar, 1992; Filar & Krass,
1994; Andramonov et al., 2000; Filar & Lasserre, 2000; Feinberg, 2000) which aims to
exploit the tools of Markov Decision Processes (MDP’s) to study the properties of a
famous problem of combinatorial optimization: the Hamiltonian Cycle Problem (HCP).
More specifically, the present paper extends and improves the results presented in (Filar & Liu, 1996).

∗This work is supported by Australian Research Council Grants.
†Corresponding author. Tel.: +61 8302 3530, Fax: +61 8302 5785, E-mail: [email protected]
‡Present address: Joint Systems Branch, DSTO, PO Box 1500, Edinburgh SA 5111, Australia.

In the latter paper it was shown that Hamiltonian Cycles of a graph can
be characterized as the minimizers of a functional based on the fundamental matrices of
Markov Chains induced by deterministic policies in a suitably perturbed MDP; provided
that the value of the perturbation parameter, ε, is sufficiently small.
However, no estimate was provided in (Filar & Liu, 1996) as to how small is the above
“sufficiently small” ε. Indeed, it was entirely conceivable that such an ε would tend to
0 so fast, as the number of nodes, N , increased, as to render the practical value of the
preceding result quite negligible. Fortunately, that is not the case. In particular, we
demonstrate that ε ≤ 1/N² qualifies as sufficiently small.
Thus for a graph of 10,000 nodes, ε ≤ 10⁻⁸ is good enough and that is well within the
precision limits of most computers. The proof of the above bound is quite technical
and it utilizes the precise form of closed-form expressions for up to N distinct values of our functional, corresponding to the deterministic policies belonging to an N-way
partition of the space of such policies. The fact that the functional is invariant with
respect to the policies belonging to a single member of this partition was also observed
in (Filar & Liu, 1996), however, the proofs provided here are more elegant and include
the analytical closed form expressions for the N distinct values of the functional that
were only estimated in (Filar & Liu, 1996).
Thus the present paper fully supersedes (Filar & Liu, 1996) but, arguably, it would not
have been possible without the results of the latter as a starting point. We also make
use of classical results on MDP’s due to (Blackwell, 1962).
In the line of research initiated in (Filar & Krass, 1994), the goal is to express the
HCP as a problem involving the minimization of functionals from the theory of Markov
decision processes. This is done with the hope that the resulting embedding in a continuous setting, endowed with probabilistic interpretation, will stimulate research into
algorithmic procedures that exploit this novel relaxation of the HCP. The particular
functional discussed in this paper is based on the fundamental matrix of Markov chains
induced by deterministic policies. The latter is a classical, important operator in the theory of Markov chains but, to the best of our knowledge, its relationship with combinatorial problems such as the HCP has not been studied prior to (Filar & Liu, 1996);
the predecessor of the present contribution.
2 Representing a directed graph as an MDP
Consider a directed graph G with the node set S and the arc set A. We can associate a Markov Decision Process Γ with the graph G as follows:

• The set of N nodes is the finite state space S = {1, 2, ..., N} and the set of arcs in G is the total action space A = {(i, j) | i, j ∈ S}, where, for each state (node) i, the action space is the set of arcs (i, j) emanating from this node.

• The set of (one-step) transition probabilities is {p(j|i, a) = δ_{aj} | a = (i, j) ∈ A, i, j ∈ S}, where δ_{aj} is the Kronecker delta. Note that we are adopting the convention that a equals both the arc (i, j) and its “head” j, whenever there is no possibility of confusion as to the “tail” i.

A stationary policy π in Γ is a set of N probability vectors π(i) = (π(i, 1), π(i, 2), ..., π(i, N)), where π(i, k) denotes the probability of choosing an action k (arc emanating from i to k) whenever state (node) i is visited. Of course, Σ_{k=1}^{N} π(i, k) = 1, and if the arc (i, k) ∉ A, then π(i, k) = 0.
A deterministic policy f is simply a stationary policy that selects a single action with
probability 1 in every state (hence, all other available actions are selected with probability 0). That is, f(i, k) = 1 for some (i, k) ∈ A. For convenience, we will write f(i) = k
in this case.
Assume that 1 is the initial state (home node). We shall say that a deterministic policy
f in Γ is a Hamiltonian Cycle (HC) in G if the sub-graph Gf with the set of arcs {(1, f(1)), (2, f(2)), ..., (N, f(N))} is a HC in G. If the sub-graph Gf contains cycles of length less than N, say m, we say that f has an m-sub-cycle.

However, such a straightforward identification of G with Γ leads to an inevitable difficulty of confronting multiple ergodic classes induced by various deterministic policies. This can be illustrated on a complete graph G4 on 4 nodes (without self-loops) in Figure 1. A policy f such that f(1) = 2, f(2) = 1, f(3) = 4 and f(4) = 3 induces a sub-graph Gf = {(1, 2), (2, 1), (3, 4), (4, 3)} which contains two 2-sub-cycles (see Figure 2).

Figure 1: Complete graph G4

Figure 2: Sub-graph Gf

Policy f also induces a Markov chain with the probability transition matrix

    P(f) =
        0 1 0 0
        1 0 0 0
        0 0 0 1
        0 0 1 0

which has two ergodic classes corresponding to the sub-cycles of Gf.
Hence, the direct embedding of G in Γ, in general, induces a multi-chain ergodic structure. This and some other technical difficulties would vanish if we force the MDP to
be “unichain”. This is achieved by passing to a singularly perturbed MDP Γε, that is obtained from Γ by introducing perturbed transition probabilities {p_ε(j|i, a) | (i, j) ∈ A, i, j ∈ S}, where for any ε ∈ (0, 1)

    p_ε(j|i, a) :=
        1        if i = 1 and a = j,
        0        if i = 1 and a ≠ j,
        1        if i > 1 and a = j = 1,
        ε        if i > 1, a ≠ j and j = 1,
        1 − ε    if i > 1, a = j and j > 1,
        0        if i > 1, a ≠ j and j > 1.

Note that 1 denotes the “home” node. For each pair of nodes i, j (not equal to 1) corresponding to a (deterministic) arc (i, j), our perturbation replaces that arc by a pair of “stochastic arcs” (i, 1) and (i, j) (see Figure 3) with weights ε and 1 − ε, respectively.
Figure 3: Perturbation of a deterministic action (arc)

This stochastic perturbation has the interpretation that a decision to move along arc (i, j) results in movement along (i, j) only with probability (1 − ε), and with probability ε it results in a return to the home node 1.
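The six-case definition above translates directly into code. The following sketch (our own helper, not code from the paper; 0-based indexing, with index 0 playing the role of the home node 1) builds P_ε(f) for a deterministic policy f:

```python
def perturbed_matrix(f, eps):
    """Perturbed transition matrix P_eps(f) of a deterministic policy f.

    f[i] is the node chosen from node i (0-based; index 0 is the home node).
    Implements the six-case definition of p_eps(j|i,a) given above.
    """
    n = len(f)
    P = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(f):
        if i == 0:
            P[i][j] = 1.0           # moves out of the home node stay deterministic
        elif j == 0:
            P[i][0] = 1.0           # chosen arcs into the home node are unperturbed
        else:
            P[i][0] = eps           # return to the home node with probability eps
            P[i][j] = 1.0 - eps     # otherwise follow the chosen arc
    return P

# The policy f(1)=2, f(2)=1, f(3)=4, f(4)=3 of Figure 2, perturbed with eps = 0.1:
P = perturbed_matrix([1, 0, 3, 2], 0.1)
```

Every row of P sums to one, and nodes 3 and 4 can now reach the home node, so the perturbed chain has a single ergodic class.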
Next, we recall that any stationary policy π induces a probability transition matrix

    P_ε(π) = [p_ε(j|i, π)],    i, j = 1, ..., N,

where for all i, j ∈ S

    p_ε(j|i, π) = Σ_{a=1}^{N} p_ε(j|i, a) π(i, a).

In the above summation, we assume that p_ε(j|i, a) = 0 if the arc (i, a) ∉ A.
We emphasize that the perturbation is chosen to ensure that the Markov chain defined
by Pε(π) contains only a single ergodic class. On the other hand, the ε-perturbed process
Γε clearly “tends” to Γ as ε → 0, in the sense that Pε(π) → P0(π) for every stationary
policy π.
Our ultimate goal is to construct a simple function that is minimized over the set of
stationary policies Π only by a Hamiltonian cycle. Here we achieve this for the set of
deterministic policies D using the function extracted from the well-known characteristic,
associated with the MDP Γε, called the fundamental matrix.
3 Main Result
Let π ∈ Π be a stationary policy and P_ε(π) be the corresponding probability transition matrix. By P*_ε(π) we denote its stationary distribution matrix, which is defined as the limit Cesaro-sum matrix

    P*_ε(π) := lim_{T→∞} [1/(T+1)] Σ_{t=0}^{T} P_ε^t(π),    P_ε^0(π) = I_N,

where I_N is the N × N identity matrix. The fundamental matrix F associated with π is defined by

    F(π) := (I_N − P_ε(π) + P*_ε(π))^{−1}.
In order to distinguish HCs from among other deterministic policies in D, we use the (1, 1)-entry of the matrix F(π),

    L(π) := [F(π)]_{1,1},

as an objective function. Denote

    d_m(ε) := 1 + Σ_{k=2}^{m} (1 − ε)^{k−2} = [1 + ε − (1 − ε)^{m−1}]/ε,    ∀ m ≥ 2.    (3.1)
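The two expressions for d_m(ε) in (3.1), the finite geometric sum and its closed form, can be checked against each other directly; a throwaway sketch (names are ours):

```python
def d_sum(m, eps):
    """d_m(eps) as the finite sum in (3.1)."""
    return 1 + sum((1 - eps) ** (k - 2) for k in range(2, m + 1))

def d_closed(m, eps):
    """d_m(eps) as the closed form in (3.1)."""
    return (1 + eps - (1 - eps) ** (m - 1)) / eps
```

The equality is just the geometric series Σ_{i=0}^{m−2} (1−ε)^i = [1 − (1−ε)^{m−1}]/ε, shifted by the leading 1.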
We state our main results as the theorem below.

Theorem 3.1.

(i) Graph G contains a Hamiltonian cycle G_HC with a corresponding deterministic policy f_HC if and only if, for ε ≤ 1/N², ∀ N ≥ 3,

    min_{f∈D} L(f) = L(f_HC) := L_HC,

where

    L_HC = 1/d_N(ε) − 1/d_N(ε)² + 1/(ε d_N(ε)) − [(1 − ε) + (N − 1)(1 − ε)^{N−1}] / (ε d_N(ε)²).

(ii) If G does not contain Hamiltonian cycles, then for ε ≤ 1/N², ∀ N ≥ 3,

    min_{f∈D} L(f) > L_HC.
It should be noted that whenever part (ii) applies, min_{f∈D} L(f) identifies the length of the longest cycle that passes through the home node. By considering all possible home nodes, the value of the above minimum also identifies the length of the longest cycle in a non-Hamiltonian graph.¹ The essence of the proof of Theorem 3.1 will be presented in the next two sections, whereas its technical aspects will be gathered in the Appendix.
4 Derivation of the objective function L
To start with, we shall derive a useful partition of D that is based on the graphs “traced
out” in G by deterministic policies (see also Chen & Filar, 1992; Filar & Vrieze, 1996).
As above, with each f ∈ D we associate a sub-graph Gf of G defined by
(i, j) ∈ Gf ⇐⇒ f(i) = j.
¹We thank an anonymous referee for this observation.
We shall also denote a simple cycle of length m beginning at 1 by a set of arcs

    c^1_m = {(i_1 = 1, i_2), (i_2, i_3), ..., (i_m, i_{m+1} = 1)},    m = 2, 3, ..., N.

Note that c^1_N is a HC. If Gf contains a cycle c^1_m, we write Gf ⊃ c^1_m. Let

    C^1_m := {f ∈ D | Gf ⊃ c^1_m},

namely, the set of deterministic policies that trace out a simple cycle of length m, beginning at node 1, for each m = 2, 3, ..., N. Of course, C^1_N is the (possibly empty) set of policies that correspond to HCs, and any single C^1_m can be empty depending on the structure of the original graph G. Thus, a typical policy f ∈ C^1_m traces out a graph Gf in G that might look as in Figure 4.
Figure 4: A typical policy f in C^1_m
where the dots indicate the “immaterial” remainder of Gf that corresponds to states
that are transient in Pε(f), as a result of the perturbation (without the perturbation
there might exist other ergodic classes that do not contain the home node 1).
We use the further partition of the deterministic policies of the form

    D = [⋃_{m=2}^{N} C^1_m] ∪ B,

where B contains all deterministic policies that are not in any of the C^1_m's. A typical policy f ∈ B traces out a sub-graph Gf in G as in Figure 5.
Figure 5: A typical policy f in B
where the dots again denote the immaterial part of Gf . It is important, however, to
note that the transient states {1, i_2, ..., i_{k−1}} in Γ are no longer transient in Γε for any
ε > 0.
According to the theory of Markov chains, the stationary distribution matrix P*_ε(π) consists of N identical row-vectors p*_ε(π) := (p*_1(π), ..., p*_N(π)). That is, every row of P*_ε(π) equals p*_ε(π).
Using this notation, we define the long-run expected state-action frequencies induced by a stationary policy π ∈ Π as

    x_{ia}(π) := p*_i(π) π(i, a),

for i = 1, 2, ..., N and all arcs (i, a) ∈ A. Assuming that π(i, a) = 0 for any arc (i, a) ∉ A, we may arrange {x_{ia}(π)} as entries of an N × N frequency matrix X(π) = [x_{ia}(π)], i, a = 1, ..., N.

The frequency of visits to the state {1}, for example, is then given by

    x_1(π) = Σ_{a=1}^{N} x_{1a}(π) = p*_1(π),    since Σ_{a=1}^{N} π(1, a) = 1.
For technical reasons, it is convenient to represent our objective function L(π) in the form

    L(π) = x_1(π) + y_1(π),

where x_1(π) = p*_1(π) is the long-run frequency of visits to node 1 and y_1(π) is the first component of the vector

    y(π) = [I_N − P_ε(π) + P*_ε(π)]^{−1} r − P*_ε(π) r,    r = (1, 0, ..., 0)^T.

Using the Markov chain identities

    P*_ε(π) P_ε(π) = P_ε(π) P*_ε(π) = P*_ε(π),

we can alternatively (ref. Blackwell, 1962) express y(π) as the unique solution of the system

    p*_ε(π) y(π) = 0,    (4.1a)
    [I_N − P_ε(π)] y(π) = r − [x_1(π), ..., x_1(π)]^T.    (4.1b)
We will explicitly calculate x_1(f) and y_1(f) for different types of deterministic policies, namely, f ∈ C^1_N, f ∈ C^1_m (m < N) and f ∈ B. Due to the following lemma, it will be sufficient to perform the computation for particular representatives of each class.
Lemma 4.1. Let f ∈ D and θ be a permutation of the nodes {1, ..., N} that preserves the home node {1}. Define the permuted deterministic policy θ(f) by θ(f)(i) := θ(f(θ^{−1}(i))). Then

    L(f) = L(θ(f)).
Proof. Permutation θ induces the conjugate transformation on the transition probability and stationary distribution matrices of the form

    P_ε(θ(f)) = Θ P_ε(f) Θ^{−1},
    P*_ε(θ(f)) = Θ P*_ε(f) Θ^{−1},

where Θ is the matrix with (i, j)-entry Θ_{i,j} = δ_{θ(i),j} representing the permutation θ. Hence, the fundamental matrix F(θ(f)) = (I_N − P_ε(θ(f)) + P*_ε(θ(f)))^{−1} transforms as

    F(θ(f)) = Θ F(f) Θ^{−1}.

Since θ preserves {1}, the same is true for θ^{−1}. The unit in position (1, 1) is the only non-zero element in the first row and the first column of both Θ and Θ^{−1}. Therefore

    L(θ(f)) = [F(θ(f))]_{1,1} = [Θ F(f) Θ^{−1}]_{1,1} = [F(f)]_{1,1} = L(f).
Thus, without loss of generality, it is sufficient to consider only f_m ∈ C^1_m tracing out the graph

Figure 6: A representative f_m of the whole class C^1_m

as the representative of the whole class C^1_m, and also f^k_m ∈ B that traces out

Figure 7: A representative f^k_m of the whole class B
as the representative of B. The following lemma verifies that the transient states for f_m ∈ C^1_m and f^k_m ∈ B are, indeed, immaterial as far as L(f) is concerned.
Lemma 4.2. Let Γ^m_ε be the restriction of Γε to the first m < N nodes {1, 2, ..., m}; let f̄_m and f̄^k_m be, respectively, the restrictions of f_m ∈ C^1_m and f^k_m ∈ B to the part of G consisting of the first m nodes. Then

    L(f_m) = L(f̄_m),
    L(f^k_m) = L(f̄^k_m).
Proof. In both cases, for f = f_m and f = f^k_m, the matrix [F(f)]^{−1} = I_N − P_ε(f) + P*_ε(f) has the block form

    [F(f)]^{−1} =
        X  O
        Y  Z

where X, Y, Z are matrices of dimensions m × m, (N − m) × m, (N − m) × (N − m), respectively, and O is the m × (N − m) zero matrix. Hence,

    F(f) =
        X′  W′
        Y′  Z′

where X X′ = X′ X = I_m, X W′ = 0, Y X′ + Z Y′ = 0 and Y W′ + Z Z′ = I_{N−m}. Thus X′ = X^{−1} and L(f) = [X^{−1}]_{1,1}. The matrix X^{−1} is, in fact, the fundamental matrix that corresponds to the restriction f̄ of f, and this completes the proof of the lemma.
Now we are ready to derive L(f).
Lemma 4.3. Let f = f^k_N ∈ B. Then

• its stationary probability distribution vector is

    p*_ε(f) = ( ε/(1+ε), ε/(1+ε), ε(1−ε)/(1+ε), ..., ε(1−ε)^{k−3}/(1+ε),
                (1−ε)^{k−2}/K_N, ..., (1−ε)^{k−3+j}/K_N, ..., (1−ε)^{N−2}/K_N ),    (4.2)

where j = 1, ..., N−k+1 and K_N := (1+ε) Σ_{i=0}^{N−k} (1−ε)^i = [(1+ε)/ε] [1 − (1−ε)^{N−k+1}], and

• the value of the objective function is

    L(f) = (1 + ε + ε²)/(1 + ε)².    (4.3)
Proof. The stationary probability distribution vector p*_ε(f) is the unique solution of the system

    p*_ε(f) (I_N − P_ε(f)) = 0,    (4.4a)
    Σ_{j=1}^{N} p*_j(f) = 1.    (4.4b)

Checking the vector p*_ε(f) specified in (4.2), we observe that

    ε/(1+ε) + ε/(1+ε) + ε(1−ε)/(1+ε) + ··· + ε(1−ε)^{k−3}/(1+ε) = [1 + ε − (1−ε)^{k−2}]/(1+ε),

and also

    Σ_{j=1}^{N−k+1} (1−ε)^{k−3+j}/K_N = (1−ε)^{k−2}/(1+ε).

Hence,

    Σ_{j=1}^{N} p*_j(f) = [1 + ε − (1−ε)^{k−2}]/(1+ε) + (1−ε)^{k−2}/(1+ε) = 1,

which verifies (4.4b). For the first equation (4.4a), let us expand it as follows:

    (p*_1(f), ..., p*_N(f)) ·

        0  1  0    0    ···  0    ···  0  0
        ε  0  1−ε  0    ···  0    ···  0  0
        ε  0  0    1−ε  ···  0    ···  0  0
        ⋮
        ε  0  0    0    ···  1−ε  ···  0  0
        ⋮
        ε  0  0    0    ···  0    ···  0  1−ε
        ε  0  0    0    ···  1−ε  ···  0  0

    = (p*_1(f), ..., p*_N(f)),

where the (1−ε)-entry of the last row stands in the k-th column.
For the first entry of both sides, we have

    ε Σ_{j=2}^{N} p*_j(f) = ε (1 − p*_1) = ε − ε²/(1+ε) = ε/(1+ε) = p*_1(f).

For the second entry:

    p*_1(f) = ε/(1+ε) = p*_2(f).

For the j-th entries, j ∈ [3, N]:

• if j ≠ k, then (1−ε) p*_{j−1}(f) = p*_j(f),

• if j = k, then

    (1−ε) p*_{k−1}(f) + (1−ε) p*_N(f) = [ε/(1+ε)] (1−ε)^{k−2} + (1−ε)^{N−1}/K_N = (1−ε)^{k−2}/K_N = p*_k(f).

These verify the expression for p*_ε(f).
To prove (4.3), we note that x_1(f) = p*_1(f) = ε/(1+ε), and find the value of y_1(f). We solve (4.1b) for y(f) = (y_1, y_2, ..., y_N):

    y_1 − y_2 = 1 − ε/(1+ε)
    −ε y_1 + y_2 − (1−ε) y_3 = −ε/(1+ε)
    · · ·
    −ε y_1 + y_{N−1} − (1−ε) y_N = −ε/(1+ε)
    −ε y_1 + y_N − (1−ε) y_k = −ε/(1+ε).

The rank of this linear system equals N − 1. Hence, if we set y_j = y_1 + w_j, j = 2, ..., N, then the first N − 1 equations are sufficient to uniquely determine w_j = −1/(1+ε), j = 2, ..., N. The last equation is then clearly verified:

    −ε y_1 − (1−ε) y_1 + (1−ε)/(1+ε) + y_1 − 1/(1+ε) = −ε/(1+ε).
To determine y_1 we need to use (4.1a), which implies

    [Σ_{j=1}^{N} p*_j(f)] y_1 − [1/(1+ε)] Σ_{j=2}^{N} p*_j(f)
        = y_1 − [1/(1+ε)] Σ_{j=1}^{N} p*_j(f) + [1/(1+ε)] p*_1(f)
        = y_1 − 1/(1+ε)²
        = 0.

Thus, y_1 = y_1(f) = 1/(1+ε)², and we conclude that

    L(f) = x_1(f) + y_1(f) = ε/(1+ε) + 1/(1+ε)² = (1 + ε + ε²)/(1+ε)².
We note that L(f) does not depend on the number of transient states for f ∈ B.
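The closed forms (4.2) and (4.3) lend themselves to a quick numerical spot-check. The sketch below (our own code; the choice N = 6, k = 3 is arbitrary) builds p*_ε(f) from (4.2) together with P_ε(f) for f = f^k_N, then confirms normalization and stationarity:

```python
N, k, eps = 6, 3, 0.02
KN = (1 + eps) / eps * (1 - (1 - eps) ** (N - k + 1))   # K_N of Lemma 4.3

# the vector (4.2): transient-path entries, then the entries on the sub-cycle
p = ([eps / (1 + eps)]
     + [eps * (1 - eps) ** (j - 2) / (1 + eps) for j in range(2, k)]
     + [(1 - eps) ** (j - 2) / KN for j in range(k, N + 1)])

# f^k_N: 1 -> 2 -> ... -> N and N -> k (0-based: node N-1 maps to k-1)
f = [i + 1 for i in range(N - 1)] + [k - 1]
P = [[0.0] * N for _ in range(N)]
for i, j in enumerate(f):
    if i == 0:
        P[i][j] = 1.0                 # home node: deterministic move
    else:
        P[i][0], P[i][j] = eps, 1.0 - eps

pP = [sum(p[i] * P[i][c] for i in range(N)) for c in range(N)]   # p* P_eps(f)
```

Both checks below pass: p* sums to one and is left-invariant under P_ε(f), exactly as the proof of Lemma 4.3 verifies entry by entry.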
Along the same lines, we proceed with the policy f_HC that corresponds to the Hamiltonian cycle.

Lemma 4.4. Let f = f_HC := f_N ∈ C^1_N. Then

• its stationary probability distribution vector is

    p*_ε(f) = ( 1/d_N, 1/d_N, (1−ε)/d_N, (1−ε)²/d_N, ..., (1−ε)^{N−2}/d_N ),

• the value of the objective function is

    L(f) = 1/d_N − 1/d_N² + 1/(ε d_N) − [(1−ε) + (N−1)(1−ε)^{N−1}]/(ε d_N²).
Proof. The expression for p*_ε(f) can be found in (Filar & Krass, 1994) or (Chen & Filar, 1992). However, it can be verified as the unique solution of system (4.4). In particular, x_1(f) = 1/d_N.

Again, we employ system (4.1) to determine y(f) and hence y_1(f). Note that (4.1b) takes the form

    y_1 − y_2 = 1 − 1/d_N
    −ε y_1 + y_2 − (1−ε) y_3 = −1/d_N
    · · ·
    −ε y_1 + y_{N−1} − (1−ε) y_N = −1/d_N
    −y_1 + y_N = −1/d_N.

By induction on k, one can easily verify that the k-th component of y(f) is

    y_k = y_1 − 1/d_N + [1/(1−ε)^{k−2}] · [(1−ε)^{N−1} − (1−ε)^{k−1}] / [1 + ε − (1−ε)^{N−1}],    ∀ k = 2, ..., N.
The above and (4.1a) then imply

    y_1 Σ_{k=1}^{N} p*_k(f) − (1/d_N) Σ_{k=1}^{N} p*_k(f) + (1/d_N) p*_1(f)
        + Σ_{k=2}^{N} p*_k(f) [1/(1−ε)^{k−2}] [(1−ε)^{N−1} − (1−ε)^{k−1}] / [1 + ε − (1−ε)^{N−1}] = 0.

Using (3.1), the property Σ_{k=1}^{N} p*_k(f) = 1 and the expressions for p*_1(f) and p*_k(f), k ≥ 2, the above equation can be rewritten as

    y_1 − 1/d_N + 1/d_N² + Σ_{k=2}^{N} (1/d_N) [(1−ε)^{N−1} − (1−ε)^{k−1}] / [1 + ε − (1−ε)^{N−1}]
    = y_1 − 1/d_N + 1/d_N² + [1/(ε d_N²)] { Σ_{k=2}^{N} (1−ε)^{N−1} − (1−ε) Σ_{k=2}^{N} (1−ε)^{k−2} }
    = y_1 − 1/d_N + 1/d_N² + [1/(ε d_N²)] { (N−1)(1−ε)^{N−1} − (1−ε)(d_N − 1) }
    = y_1 + 1/d_N² − 1/(ε d_N) + [(1−ε) + (N−1)(1−ε)^{N−1}]/(ε d_N²)
    = 0.

Thus,

    y_1(f) = y_1 = −1/d_N² + 1/(ε d_N) − [(1−ε) + (N−1)(1−ε)^{N−1}]/(ε d_N²),

and so

    L(f) = x_1(f) + y_1(f) = 1/d_N − 1/d_N² + 1/(ε d_N) − [(1−ε) + (N−1)(1−ε)^{N−1}]/(ε d_N²).
Corollary 4.1. For m > 2, if f = f_m ∈ C^1_m, then

    L(f) = x_1(f) + y_1(f) = 1/d_m − 1/d_m² + 1/(ε d_m) − [(1−ε) + (m−1)(1−ε)^{m−1}]/(ε d_m²).

Proof. Follows immediately from the application of Lemma 4.2 and Lemma 4.4.
5 Crucial argument about the estimate ε ≤ 1/N²

Lemma 5.1. For ε ≤ 1/N², f_{N−1} ∈ C^1_{N−1} and f_HC = f_N ∈ C^1_N, the following holds:

    L(f_N) < L(f_{N−1}).
We postpone the proof of this lemma (which is straightforward but technical) to the Appendix. Here we demonstrate that the statement of the lemma is crucial for the proof of our main result, Theorem 3.1.

Indeed, if we compare L(f_N) and L(f_m), m < N, f_m ∈ C^1_m, then iteratively using Lemma 5.1, one can easily see that

    ∀ ε ≤ 1/N², m ∈ [2, N−1]:    L(f_N) < L(f_m).

Thus, for any policy f ∈ ⋃_{m=2}^{N−1} C^1_m, whenever ε ≤ 1/N², then L(f_HC) < L(f).
It remains to verify that L(f_HC) < L(f) for f ∈ B. By the above argument, since L(f) with f = f^k_N ∈ B is independent of k and N, it will be sufficient to show that L(f_3) < L(f^2_3), with f_3 ∈ C^1_3 and f^2_3 ∈ B. By virtue of Lemma 4.3 and Lemma 4.4, we know that

    L(f^2_3) − L(f_3) = (1 + ε + ε²)/(1+ε)² − 3(2−ε)/(3−ε)² = (3 − 6ε + 4ε² − 2ε³ + ε⁴) / [(1+ε)²(3−ε)²],

which is clearly positive for any ε ≤ 1/2.
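The final comparison is easy to re-derive numerically; a quick sketch (our own naming) using the closed forms of Lemma 4.3 and Corollary 4.1 with d_3 = 3 − ε:

```python
def diff(eps):
    """L(f_3^2) - L(f_3) from Lemma 4.3 and Corollary 4.1 (d_3 = 3 - eps)."""
    L_B = (1 + eps + eps ** 2) / (1 + eps) ** 2        # f_3^2 in B
    L_3 = 3 * (2 - eps) / (3 - eps) ** 2               # f_3 in C_3^1
    return L_B - L_3

def closed(eps):
    """The single-fraction form of the same difference, as in the text."""
    return ((3 - 6 * eps + 4 * eps ** 2 - 2 * eps ** 3 + eps ** 4)
            / ((1 + eps) ** 2 * (3 - eps) ** 2))
```

Both functions agree, and the difference stays positive throughout (0, 1/2].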
Theorem 3.1 is, therefore, proved provided that the validity of Lemma 5.1 can be estab-
lished. The latter is done in the Appendix.
6 Conclusion and open problems
Theorem 3.1 motivates the following characterization of graphs with Hamiltonian cycles:
Conjecture 6.1. Graph G contains a Hamiltonian cycle if and only if

    min_{π∈Π} L(π) = L_HC,

where L_HC is defined in Theorem 3.1, Π is the set of all stationary policies, and ε ≤ 1/N².
The results of (Filar & Krass, 1994) - based on the stationary distribution operator
of Markov chains - have already led to algorithmic approaches to the HCP (e.g. see
(Andramonov, 2002) and (Ejov et al., 2002)). If the above conjecture can be established
it is reasonable to expect that algorithms that begin with fully randomized policies and
approach the degenerate (deterministic) policies corresponding to Hamiltonian cycles
could be explored. However, if the conjecture were false such “interior point” methods
may arrive at a randomized policy πLM ∈ Π with L(πLM) < LHC . By analogy, the
elegance of linear programming stems from the fact that it is impossible to find an
interior global minimum that is strictly better than all extreme points. Therefore, a
confirmation of our conjecture would raise hopes that an intelligent search procedure
for a Hamiltonian cycle could be devised. It is possible that the results of (Kallenberg,
1983; Hordijk & Kallenberg, 1984; Ross, 1989; Kallenberg, 2002) could be of help in
establishing the above conjecture.
Conjecture 6.1 is confirmed by our numerical experiments in Maple for convex combinations π = π(λ, f, g) := λ f + (1−λ) g, λ ∈ [0, 1], of deterministic policies f, g ∈ D, with the corresponding probability transition matrices P_ε(π) = λ P_ε(f) + (1−λ) P_ε(g). Our experiments for N ≤ 20 show that for such π the objective function satisfies

    L(π) ≥ L(f_HC),

and the equality takes place if and only if π = f_HC, that is, g = f_HC and λ = 0. Figure 8 shows the behaviour of the objective function L(π) for several such convex combinations, with f_B, g_B ∈ B, f_HC and f_m ∈ C^1_m, m < N.
Figure 8: Behaviour of the objective function L(π) for the convex combinations π(λ, f_B, f′_B), π(λ, f_B, f_N), π(λ, f_B, f′_N) and π(λ, f_B, f_m)
7 Appendix: Verification of the estimate ε ≤ 1/N² (Proof of Lemma 5.1)

Setting t = 1 − ε, we observe from (3.1) that

    d_m(ε) = (1 + ε − t^{m−1})/ε.

Using this expression for m = N and m = N+1, we obtain by Lemma 4.4 that

    L(f_N) − L(f_{N+1}) = t^{N−1} Q_N(ε, t) / [(1 + ε − t^N)² (1 + ε − t^{N−1})²],    (7.1)

where

    Q_N(ε, t) = −t^{2N} + t^{2N−1} + t^{N+1} − t^{N−1} − t + 1
        + ε { −N t^{2N} + (N+1) t^{2N−1} + t^{N+1} − 2 t^N − t^{N−1} + (N−1) t − (N−2) }
        + ε² { t^{N+1} − 2 t^N − t^{N−1} + (2N−1) t − (2N−3) }
        + ε³ { (N−1) t − (N−2) }.    (7.2)
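Identity (7.1) with the polynomial (7.2) can be confirmed numerically against the closed form of Lemma 4.4; a sketch under our own naming:

```python
def L_cycle(N, eps):
    """L(f_N) from Lemma 4.4, with d_N(eps) from (3.1)."""
    d = (1 + eps - (1 - eps) ** (N - 1)) / eps
    return (1 / d - 1 / d ** 2 + 1 / (eps * d)
            - ((1 - eps) + (N - 1) * (1 - eps) ** (N - 1)) / (eps * d ** 2))

def Q(N, eps):
    """Q_N(eps, t) of (7.2) with t = 1 - eps."""
    t = 1 - eps
    return (-t ** (2 * N) + t ** (2 * N - 1) + t ** (N + 1) - t ** (N - 1) - t + 1
            + eps * (-N * t ** (2 * N) + (N + 1) * t ** (2 * N - 1) + t ** (N + 1)
                     - 2 * t ** N - t ** (N - 1) + (N - 1) * t - (N - 2))
            + eps ** 2 * (t ** (N + 1) - 2 * t ** N - t ** (N - 1)
                          + (2 * N - 1) * t - (2 * N - 3))
            + eps ** 3 * ((N - 1) * t - (N - 2)))

def rhs(N, eps):
    """Right-hand side of (7.1)."""
    t = 1 - eps
    return (t ** (N - 1) * Q(N, eps)
            / ((1 + eps - t ** N) ** 2 * (1 + eps - t ** (N - 1)) ** 2))
```

For ε = 1/N² the two sides of (7.1) agree to roundoff and Q_N stays positive, which is the content of Lemma 5.1 for these values.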
So, it is sufficient to verify that Q_N(ε, t) > 0 for all ε ≤ 1/N².

Substituting t = 1 − ε, we now write Q_N as a polynomial in ε only, by expanding every term in the above formula for Q_N(ε, t). For example, the contribution of −t^{2N} to the coefficient of ε^k, for 0 ≤ k ≤ 2N+1, equals −(−1)^k C(2N, k), and we denote this as

    −t^{2N} ⟹_{ε^k} −(−1)^k C(2N, k),

with C(α, β) denoting the usual binomial coefficient. We assume that C(α, β) = 0 for β > α and also that C(α, β) = 0 for β < 0.

The analogous contribution of ε(N+1) t^{2N−1} is

    ε(N+1) t^{2N−1} ⟹_{ε^k} (N+1)(−1)^{k−1} C(2N−1, k−1) = −(N+1)(−1)^k C(2N−1, k−1).
Adding up all such contributions to the coefficient of ε^k, we obtain

    Q_N(ε) = Q_N(ε, t) = Σ_{k=0}^{2N+1} (−1)^k [ −C(2N, k) + C(2N−1, k) + C(N+1, k) − C(N−1, k) − C(1, k) + C(0, k)
        + N C(2N, k−1) − (N+1) C(2N−1, k−1) − C(N+1, k−1) + 2 C(N, k−1) + C(N−1, k−1) − (N−1) C(1, k−1) + (N−2) C(0, k−1)
        + C(N+1, k−2) − 2 C(N, k−2) − C(N−1, k−2) + (2N−1) C(1, k−2) − (2N−3) C(0, k−2)
        − (N−1) C(1, k−3) + (N−2) C(0, k−3) ] ε^k.    (7.3)
It turns out that the coefficients of ε⁰, ε, ε² and ε³ in the above formula all vanish. The expression Q_N(ε) thus takes the form

    Q_N(ε, t) = Q_N(ε) = (1/2) N(N+1) ε⁴ − (1/6)(N−1)(N³ + 6N² − 4N − 6) ε⁵
        + Σ_{k=6}^{2N+1} (−1)^k [ −C(2N, k) + C(2N−1, k) + C(N+1, k) − C(N−1, k)
            + N C(2N, k−1) − (N+1) C(2N−1, k−1) − C(N+1, k−1) + 2 C(N, k−1) + C(N−1, k−1)
            + C(N+1, k−2) − 2 C(N, k−2) − C(N−1, k−2) ] ε^k.    (7.4)
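The vanishing of the first four coefficients, and the ε⁴ and ε⁵ coefficients quoted in (7.4), can be verified mechanically from (7.3); a sketch with our own helper names:

```python
from math import comb

def C(a, b):
    """Binomial coefficient with the conventions C(a,b) = 0 for b < 0 or b > a."""
    return comb(a, b) if 0 <= b <= a else 0

def Q_coeffs(N):
    """Coefficients of eps^k, k = 0..2N+1, in Q_N(eps) via (7.3)."""
    out = []
    for k in range(2 * N + 2):
        c = (-C(2*N, k) + C(2*N - 1, k) + C(N + 1, k) - C(N - 1, k)
             - C(1, k) + C(0, k)
             + N * C(2*N, k - 1) - (N + 1) * C(2*N - 1, k - 1)
             - C(N + 1, k - 1) + 2 * C(N, k - 1) + C(N - 1, k - 1)
             - (N - 1) * C(1, k - 1) + (N - 2) * C(0, k - 1)
             + C(N + 1, k - 2) - 2 * C(N, k - 2) - C(N - 1, k - 2)
             + (2*N - 1) * C(1, k - 2) - (2*N - 3) * C(0, k - 2)
             - (N - 1) * C(1, k - 3) + (N - 2) * C(0, k - 3))
        out.append((-1) ** k * c)
    return out
```

Since all quantities are integers, the checks are exact rather than floating-point.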
Grouping together the terms with C(2N, ·) and C(2N−1, ·), we obtain a single term with the common factor C(2N−1, k−2). For example,

    C(2N, k) = [2N(2N − k + 1) / (k(k−1))] C(2N−1, k−2).

Analogously, grouping together all remaining terms and extracting C(N−1, k−4) as the common factor, such as, for example,

    C(N+1, k) = [N(N+1)(N−k+2)(N−k+3) / ((k−3)(k−2)(k−1)k)] C(N−1, k−4),

we obtain a representation for Q_N(ε) as follows:

    Q_N(ε) = (1/2) N(N+1) ε⁴ − (1/6)(N−1)(N³ + 6N² − 4N − 6) ε⁵
        + Σ_{k=6}^{2N+1} (−1)^k [α_k/(k−1)] C(2N−1, k−2) ε^k
        + Σ_{k=6}^{N+3} (−1)^k [β_k/((k−1)(k−2)(k−3))] C(N−1, k−4) ε^k,    (7.5)

where

    α_k = Nk − 5N + 2k − 2,
    β_k = 4N³ − 13kN² + 25N² + 13Nk² − 52kN + 47N − 3k³ + 18k² − 33k + 18.    (7.6)
In order to see the positivity of Q_N(ε), let us combine pairs of consecutive terms for k = 2i and k = 2i+1 and re-arrange as follows:

    Q_N(ε) = (1/2) N(N+1) ε⁴ − (1/6)(N−1)(N³ + 6N² − 4N − 6) ε⁵
        + Σ_{i=3}^{N} [(2N−1)! / ((2i−1)!(2N−2i)!)] [ α_{2i}/(2N−2i+1) − ε α_{2i+1}/(2i) ] ε^{2i}
        + Σ_{i=3}^{⌊N/2⌋+1} [(N−1)! / ((2i−1)!(N−2i+2)!)] [ β_{2i}/(N−2i+3) − ε β_{2i+1}/(2i) ] ε^{2i}
        + R_N(ε),    (7.7)

where

    R_N(ε) = ε^{N+3}    if N is odd and N ≥ 3,
    R_N(ε) = 0          if N is even and N ≥ 4.
We observe that:

• For ε ≤ 1/N²,

    (1/2) N(N+1) ε⁴ − (1/6)(N−1)(N³ + 6N² − 4N − 6) ε⁵
        ≥ ε⁴ { (1/2) N(N+1) − [1/(6N²)](N−1)(N³ + 6N² − 4N − 6) }
        = [ε⁴/(3N²)] (N⁴ − N³ + 5N² + N − 3) > 0.

• The last term satisfies R_N(ε) ≥ 0.

• The positivity of the terms in the two sums of (7.7) will be the consequence of the following technical result.
Proposition 7.1. Suppose that ε is chosen in such a way that 0 < ε ≤ 1/N². Then the following inequalities hold:

    2i α_{2i} − ε (2N−2i+1) α_{2i+1} > 0,    ∀ 3 ≤ i ≤ N,
    2i β_{2i} − ε (N−2i+3) β_{2i+1} > 0,    ∀ 3 ≤ i ≤ ⌊(N+3)/2⌋ and N > 14.
Remark 7.1. Since the last inequality of Proposition 7.1 is only valid for N > 14, we still need to check that L(f_N) − L(f_{N+1}) > 0 for all 3 ≤ N ≤ 14. This can be done by a simple loop program using, for example, Maple or any other computer algebra software. Figure 9 shows typical plots of L(f_N) and L(f_{N+1}) for 0 < ε < 1/N², with the particular values N = 7, 10 and 14.
Proof of Proposition 7.1. We shall prove a stronger version of Proposition 7.1. That is,

    k α_k − ε (2N−k+1) α_{k+1} > 0,    ∀ 6 ≤ k ≤ 2N,    (7.8)
    k β_k − ε (N−k+3) β_{k+1} > 0,    ∀ 6 ≤ k ≤ N+3 and N > 14,    (7.9)

which hold for all 0 < ε ≤ 1/N².
Figure 9: Typical plots of L(f_N) and L(f_{N+1}) for 0 < ε < 1/N² (N = 7, 10, 14)
We start with inequality (7.8) for α_k and α_{k+1}. Firstly, notice that Nk > 5N and 2k > 2 for k ≥ 6, and so α_k > 0 for any k ≥ 6. Therefore, with any ε ≤ 1/N², one can write

    k α_k − ε (2N−k+1) α_{k+1} ≥ k α_k − (1/N²)(2N−k+1) α_{k+1},

where, upon substituting for α_{k+1}, the numerator of the latter expression,

    N² k α_k − (2N−k+1) α_{k+1} = (k² − 5k) N³ + 2(k² − 2k + 4) N² + (k² − 9k + 4) N + 2k(k−1),

is always strictly positive for all k ≥ 6, since

• the leading coefficient of N³, namely (k² − 5k), is positive for all k ≥ 6,

• the coefficient 2(k² − 2k + 4) of N² is at least twice the coefficient (k² − 9k + 4) of N, and

• the last term 2k(k−1) is also positive for k ≥ 6.

Hence, our first assertion (7.8) holds.
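The cubic-in-N expansion used above can be double-checked exactly over a range of integers; a throwaway sketch with our own naming:

```python
def alpha(N, k):
    """alpha_k of (7.6)."""
    return N * k - 5 * N + 2 * k - 2

def lhs(N, k):
    """N^2 * k * alpha_k - (2N - k + 1) * alpha_{k+1}."""
    return N ** 2 * k * alpha(N, k) - (2 * N - k + 1) * alpha(N, k + 1)

def cubic(N, k):
    """The expanded cubic in N claimed in the text."""
    return ((k * k - 5 * k) * N ** 3 + 2 * (k * k - 2 * k + 4) * N ** 2
            + (k * k - 9 * k + 4) * N + 2 * k * (k - 1))
```

All quantities are integers, so the identity and the positivity claim can be tested exactly over the relevant range 6 ≤ k ≤ 2N.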
Fix a positive integer N and check that β_k > 0. Setting t := k/N, where t ∈ (0, 2) since 3 ≤ k ≤ N+3, we rewrite β_k as

    β_k = 18 + (47 − 33t) N + (25 − 52t + 18t²) N² + (4 − 13t + 13t² − 3t³) N³.    (7.10)
20
t
21.510.50
6
5
4
3
2
1
PSfrag replacements
y = −4
y = −14
y = 25−52t+18t2
4−13t+13t2−3t3
y = 47−33t
4−13t+13t2−3t3
y = 4 − 13t + 13t2 − 3t3
t
21.510.50
120
100
80
60
40
20
0
-20
PSfrag replacements
y = −4
y = −14
y = 25−52t+18t2
4−13t+13t2−3t3
y = 47−33t
4−13t+13t2−3t3
y = 4 − 13t + 13t2 − 3t3
Figure 10: Plot of 4 − 13t + 13t2 − 3t3 and ratios 25−52t+18t2
4−13t+13t2−3t3, 47−33t
4−13t+13t2−3t3
The above expression has the properties (see Figure 10) that

    min_{0<t≤2} { 4 − 13t + 13t² − 3t³ } > 0,
    min_{0<t≤2} { (25 − 52t + 18t²)/(4 − 13t + 13t² − 3t³) } > −14,
    min_{0<t≤2} { (47 − 33t)/(4 − 13t + 13t² − 3t³) } > −4.

Thus,

    β_k > μ N (N² − 14N − 4) + 18 = μ N (N − 7 + √53)(N − 7 − √53) + 18,

where μ := 4 − 13t + 13t² − 3t³. Therefore, it is clear that β_k > 0 for any N > 14.
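Both the rewriting (7.10) and the resulting positivity claim can be checked numerically; a sketch with our own names, with k running over the integer range 3 ≤ k ≤ N + 3:

```python
def beta(N, k):
    """beta_k of (7.6)."""
    return (4 * N ** 3 - 13 * k * N ** 2 + 25 * N ** 2 + 13 * N * k ** 2
            - 52 * k * N + 47 * N - 3 * k ** 3 + 18 * k ** 2 - 33 * k + 18)

def beta_via_t(N, k):
    """The same quantity rewritten as (7.10) with t = k/N."""
    t = k / N
    return (18 + (47 - 33 * t) * N + (25 - 52 * t + 18 * t * t) * N ** 2
            + (4 - 13 * t + 13 * t * t - 3 * t ** 3) * N ** 3)
```

The two forms agree up to floating-point rounding, and beta stays positive for N > 14 over the whole admissible range of k, as the argument above concludes.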
Under the assumption N > 14, we now proceed to the proof of inequality (7.9) for β_k and β_{k+1}. We know that

    k β_k − ε (N−k+3) β_{k+1} ≥ k β_k − (1/N²)(N−k+3) β_{k+1},    ∀ ε ≤ 1/N².
Figure 11: Plot of y = (51 − 46t + 13t² − 3t³)/(4 − 13t + 13t² − 3t³) and y = −(24 − 65t + 48t² − 9t³)(1/k)/(4 − 13t + 13t² − 3t³) with k = 2
Figure 12: Plot of y = (30 − 26t + 9t²)/(4 − 13t + 13t² − 3t³) and y = −(44 − 84t + 27t²)(1/k)/(4 − 13t + 13t² − 3t³) with k = 6
Figure 13: Plot of y = (8 − 6t)/(4 − 13t + 13t² − 3t³) and y = −(24 − 18t)(1/k)/(4 − 13t + 13t² − 3t³) with k = 6
Multiplying the right-hand side of the above inequality by N²/k, we obtain

    (N²/k) [ k β_k − (1/N²)(N−k+3) β_{k+1} ] = N² β_k + β_{k+1} − (1/k)(N+3) β_{k+1}
        = [ −(24 − 18t)(1/k) + (8 − 6t) ] N
        + [ −(44 − 84t + 27t²)(1/k) + (30 − 26t + 9t²) ] N²
        + [ −(24 − 65t + 48t² − 9t³)(1/k) + (51 − 46t + 13t² − 3t³) ] N³
        + [ −(4 − 13t + 13t² − 3t³)(1/k) + (25 − 52t + 18t²) ] N⁴
        + (4 − 13t + 13t² − 3t³) N⁵,    (7.11)

where t ∈ (0, 1.22) (since k ≥ 6 and N > 14). The latter expression has the properties that, for t ∈ (0, 1.22) and k ≥ 6,

• 4 − 13t + 13t² − 3t³ > 0 (see Figure 10),

• [ −(4 − 13t + 13t² − 3t³)(1/k) + (25 − 52t + 18t²) ] / (4 − 13t + 13t² − 3t³) > −14 − 1/6 (see again Figure 10),

• [ −(24 − 65t + 48t² − 9t³)(1/k) + (51 − 46t + 13t² − 3t³) ] / (4 − 13t + 13t² − 3t³) > 0 (see Figure 11),

• [ −(44 − 84t + 27t²)(1/k) + (30 − 26t + 9t²) ] / (4 − 13t + 13t² − 3t³) > 0 (see Figure 12),

• [ −(24 − 18t)(1/k) + (8 − 6t) ] / (4 − 13t + 13t² − 3t³) > 0 (see Figure 13).

Therefore,

    (N²/k) [ k β_k − (1/N²)(N−k+3) β_{k+1} ] > μ [ N⁵ − (14 + 1/6) N⁴ ] > 0,    ∀ N > 14.

This concludes the proof of the second inequality (7.9).
References

Andramonov, M. (2002). Solving problems with min-type functions by disjunctive programming. Submitted to Journal of Global Optimization.

Andramonov, M., Filar, J. A., Rubinov, A., & Pardalos, P. (2000). Hamiltonian cycle problem via Markov chains and min-type approaches. In P. M. Pardalos (Ed.), Approximation and Complexity in Numerical Optimization: Continuous and Discrete Problems (pp. 31–47). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Blackwell, D. (1962). Discrete dynamic programming. Ann. Math. Statist., 33, 719–726.

Chen, M. & Filar, J. A. (1992). Hamiltonian cycles, quadratic programming and ranking of extreme points. In C. A. Floudas & P. M. Pardalos (Eds.), Recent Advances in Global Optimization (pp. 32–49). Princeton, New Jersey: Princeton University Press.

Ejov, V., Filar, J. A., & Gondzio, J. (2002). MDP-based optimization algorithm for the HCP. In preparation.

Feinberg, E. (2000). Constrained discounted Markov decision processes with Hamiltonian cycles. Mathematics of Operations Research, 25, 130–140.

Filar, J. A. & Krass, D. (1994). Hamiltonian cycles and Markov chains. Mathematics of Operations Research, 19, 223–227.

Filar, J. A. & Lasserre, J.-B. (2000). A non-standard branch and bound method for the Hamiltonian cycle problem. ANZIAM J., 42(E), C556–C577.

Filar, J. A. & Liu, K. (1996). Hamiltonian cycle problem and singularly perturbed decision process. In Statistics, Probability and Game Theory: Papers in Honor of David Blackwell, IMS Lecture Notes - Monograph Series, USA.

Filar, J. A. & Vrieze, K. (1996). Competitive Markov Decision Processes. N.Y.: Springer-Verlag.

Hordijk, A. & Kallenberg, L. (1984). Constrained undiscounted stochastic dynamic programming. Math. Oper. Res., 9, 276–289.

Kallenberg, L. C. M. (1983). Linear Programming and Finite Markovian Control Problems. Mathematical Centre Tracts 148. Mathematical Centre, Amsterdam.

Kallenberg, L. C. M. (2002). Finite state and action MDPs. In E. Feinberg & A. Shwartz (Eds.), Handbook of Markov Decision Processes (pp. 21–87). Boston: Kluwer.

Ross, K. (1989). Randomized and past-dependent policies for Markov decision processes with multiple constraints. Oper. Res., 37, 474–477.