Nested Family of Cyclic Games with k-total Effective Rewards



arXiv:1412.6072v1 [cs.DM] 20 Nov 2014

Nested Family of Cyclic Games with k-total Effective Rewards

Endre Boros∗  Khaled Elbassioni†  Vladimir Gurvich†  Kazuhisa Makino‡

∗MSIS Department and RUTCOR, Rutgers University, 100 Rockafellar Road, Livingston Campus, Piscataway, NJ 08854, USA; ({boros,gurvich}@rutcor.rutgers.edu)
†Masdar Institute of Science and Technology, P.O. Box 54224, Abu Dhabi, UAE; ([email protected])
‡Research Institute for Mathematical Sciences (RIMS), Kyoto University, Kyoto 606-8502, Japan; ([email protected])

December 19, 2014

Abstract

We consider Gillette’s two-person zero-sum stochastic games with perfect information. For each k ∈ Z+ we introduce a payoff function, called the k-total reward. For k = 0 and 1 these payoffs are known as mean payoff and total reward, respectively. We restrict our attention to the deterministic case, the so-called cyclic games. For all k, we prove the existence of a saddle point which can be realized by pure stationary strategies. We also demonstrate that k-total reward games can be embedded into (k+1)-total reward games. In particular, all of these classes contain mean payoff cyclic games.

Keywords: stochastic game with perfect information, cyclic games, two-person, zero-sum, mean payoff, total reward

1 Introduction

We consider two-person zero-sum stochastic games with perfect information and, for each nonnegative integer k, we define an effective payoff function, called the k-total reward, generalizing the classical mean payoffs [4] (k = 0), as well as the total rewards [19, 20] (k = 1).

In this paper, we restrict ourselves to two-person zero-sum games with deterministic states, and the solution concept is the Nash equilibrium, which is just a saddle point in the considered case. We call the considered family of games k-total reward BW-games, where B and W stand for the two players: Black, the minimizer, and White, the maximizer.

We denote by R the set of reals, by Z the set of integers, and by Z+ the set of nonnegative integers. For a subset S ⊆ Z+, let R^S denote the set of vectors indexed by the elements of S. In particular, S = S(R) ⊆ R^{Z+\{0}} denotes the set of infinite integral sequences with elements not larger in absolute value than R, and for a ∈ S we write a = (a1, a2, . . .). Furthermore, for n ∈ Z+ \ {0} we define [n] = {1, 2, . . . , n} and write simply R^n instead of R^{[n]}.

To describe BW-games, let us consider a directed graph (digraph) G = (V,E) whose vertices (positions, or states) are partitioned into two sets V = B ∪ W, a fixed initial state v0 ∈ V, an integer-valued function r : E → Z assigning weights to the arcs (moves), and a mapping π : S → R. We call the tuple (G,B,W) a BW-game form and r its local reward function, while the tuple (G,B,W, r, π) is called a BW-game and π its effective reward. Two players, Black (the minimizer) and White (the maximizer), control the positions of B and W, respectively. The game is played starting at time t = 0 in the initial node (or position) s0 = v0. In a general step, at time t, we are at node st ∈ V. The player who controls st chooses an outgoing arc et+1 = (st, v) ∈ E, and the game moves to node st+1 = v. We assume, in fact without any loss of generality, that every vertex in G has an outgoing arc. (Indeed, if not, one can add loops to the corresponding vertices.) We assume that an initial vertex v0 is fixed. However, when we talk about solving a BW-game, we consider (separately) all possible initial vertices.

In the course of this game the players generate an infinite sequence of edges p = (e1, e2, . . .) (a play) and the corresponding real sequence r(p) = (r(e1), r(e2), . . .) ∈ S of local rewards. At the end (after infinitely many steps) Black pays White the amount π(r(p)). Naturally, White’s aim is to create a play which maximizes this payoff, while Black tries to minimize it. (Let us note that the local reward function r may take negative values, and π(r(p)) may be negative too, in which case White has to pay Black.) As usual, a pair of (not necessarily stationary) strategies is a saddle point if neither of the players can improve individually by changing her/his strategy. The corresponding π(r(p)) is the value of the game.

As we shall see later, it will be enough to restrict ourselves, and the players, to stationary strategies in these games. This means that each player chooses, in advance, a move in every position that he controls and makes this move whenever the play comes to this position. Then the play is uniquely determined by the one-time selection of arcs and by the initial position. Such a play always looks like a “lasso”: it consists of an initial path entering a directed cycle, which is then repeated infinitely many times.

Mean payoff (undiscounted) stochastic games, introduced in [4], are BW-games (see also [16, 15, 3, 8]) with payoff function π = φ:

φ(a) = lim inf_{T→∞} (1/T) ∑_{j=1}^{T} a_j.   (1)

Such a game is known to have a saddle point in pure stationary strategies [4, 12], and these strategies can be computed in pseudo-polynomial time [17, 22]. Let us remark that the problem of deciding whether the value of a mean payoff BW-game is below (or above) a given threshold belongs to both NP and co-NP ([8, 11, 22]). The exact complexity of this problem is, however, still not known; the best known algorithms are either pseudo-polynomial [17, 22] or randomized subexponential [1, 9, 21].

Discounted mean payoff stochastic games were in fact introduced earlier, in [18], and have payoff function π = φβ:

φβ(a) = (1 − β) ∑_{j=1}^{∞} β^{j−1} a_j.   (2)

As a consequence of the classical Hardy–Littlewood Tauberian theorems [10] we have the equality

φ(a) = lim_{β→1} φβ(a).   (3)

Discounted games, in general, are easier to solve, due to the fact that a standard value iteration is in fact a converging contraction. Hence they are widely used in the literature on stochastic games together with the above limit equality. In fact, for mean payoff BW-games it is known [22] that for two sequences a, b ∈ S(R) we have φβ(a) < φβ(b) if and only if φ(a) < φ(b), whenever 1 − β ≤ 1/(4n³R).

Total reward, introduced in [19] and considered in more detail in [20], is defined by

ψ(a) = lim inf_{T→∞} (1/T) ∑_{i=1}^{T} ∑_{j=1}^{i} a_j.   (4)

It was shown in [20] that a total reward game is equivalent to a mean payoff game having countably many states. The authors derive from this that every total reward game has a value. Furthermore, ε-optimal Markovian strategies can be constructed by solving a discounted mean payoff game on the same graph with the same rewards, using a discount factor β close enough to 1. The proof of the latter is analogous to the proof in [13].

It is worth noting that 1-total reward games with nonnegative local rewards are polynomially solvable [14]. This contrasts with the fact that mean payoff games with nonnegative rewards are as hard as general mean payoff games, and that the fastest known algorithms for mean payoff BW-games are either pseudo-polynomial [8, 17, 22] or randomized subexponential [1, 9, 21].

Main Results

In this paper we extend and generalize the above results. For every k ∈ Z+, we define the k-total effective reward, which coincides with the mean payoff when k = 0 and with the total reward when k = 1.

In general, given a sequence of local rewards a = (a1, a2, . . .), let us associate to it another sequence M(a) = (a1, a1 + a2, a1 + a2 + a3, . . .). Then the k-total payoff is defined as the mean payoff of the sequence M^k(a).
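To make this construction concrete, here is a minimal Python sketch (ours, not the authors'; the helper names M and k_total and the prefix length are arbitrary choices) that evaluates the k-total payoff on a long finite prefix of a play; for the lasso-shaped plays studied below, such prefix averages converge to the intended lim inf.

```python
from itertools import accumulate

def M(seq):
    """Partial-sum operator: (a1, a1+a2, a1+a2+a3, ...)."""
    return list(accumulate(seq))

def k_total(prefix, k):
    """Average of M applied k times -- a finite-prefix proxy for the
    k-total payoff (the true payoff is a lim inf over prefixes)."""
    for _ in range(k):
        prefix = M(prefix)
    return sum(prefix) / len(prefix)

# The play 5, 1, -1, 1, -1, ... has mean payoff (k = 0) equal to 0,
# while its 1-total reward is 5.5.
play = [5] + [1, -1] * 50000
print(k_total(play, 0), k_total(play, 1))
```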


First, we show that for every k, a k-total reward game, when restricted to stationary strategies, has the same optimal strategies as the discounted mean payoff game on the same directed graph and with the same local rewards, if the discount factor is close enough to 1.

Second, we show that for every k, in a k-total reward game, for a fixed stationary strategy of one player there exists a best response of the other that is pure and stationary. This, together with the previous claim, implies the existence of a saddle point in uniformly optimal stationary strategies in any k-total reward BW-game.

Next, we prove that the k-total reward games can be embedded into the family of (k+1)-total reward games, for each k ∈ Z+. In particular, mean payoff games can be embedded into k-total reward games for all k ∈ Z+. This containment and the example in [7] prove that for each k ∈ Z+ there is a non-zero-sum k-total reward game without Nash equilibria.

Finally, it seems possible (and important) to generalize the above results to general stochastic games with (or even without) perfect information; in other words, to replace the BW-model considered in this paper by the BWR-model, where R stands for positions of chance. For k = 1, it was shown in [2] that there always exists a saddle point which can be realized by pure stationary uniformly optimal strategies. The analogous question for k > 1 remains open.

Remark 1 The k-total effective payoff looks somewhat similar to the kth moment in the theory of probability. The first two cases k = 0 and k = 1 are easy to interpret and they play a very important role, just as the low-order moments (expectation and variance) do in probability; yet the higher moments have some important applications too.

Remark 2 We note that, for k = 1, the existence of optimal stationary strategies in k-total reward games can be derived from the general results of [5]. The authors of [5] also left an open question at the end of their paper about payoff functions φ that guarantee positional optima for all BW-games, namely, whether φ can always be expressed as a non-decreasing function of a fairly mixing mapping of the reward sequence. It is easy, however, to check that the 2-total reward answers this question in the negative.

2 Iterated Total Rewards

In this section we introduce a complete hierarchy of payoff functions providing a natural generalization of mean and total payoffs, and of their discounted counterparts.

Let us first introduce three operators acting on infinite sequences of reals. The limiting average A : S → R and the discounted limiting average Aβ : S → R map an infinite sequence into the set of reals, while the moment operator M : S → S maps it into another infinite sequence.

More precisely, given an infinite sequence a = (a1, a2, . . .) and a real 0 < β ≤ 1, we define

A(a) = lim inf_{T→∞} (1/T) ∑_{j=1}^{T} a_j   (5)

and

Aβ(a) = (1 − β) ∑_{j=1}^{∞} β^{j−1} a_j.   (6)

For convenience, we also extend the definition of the operator A to finite sequences a, to denote the average of the elements of a.

Finally, recall that we define M(a) = b = (b1, b2, . . .) ∈ S by setting

b_i = ∑_{j=1}^{i} a_j for all i = 1, 2, . . .   (7)

For k = 0, 1, . . ., we call M^k(a) the kth moment sequence of a, and define M^0(a) = a.

We also introduce the following families of functions φ(k) : S → R and φ(k)β : S → R, for k = 0, 1, . . ., defined by

φ(k)(a) = A(M^k(a)) and φ(k)β(a) = Aβ(M^k(a)).   (8)

Let us note that φ(0) = φ is the mean payoff function and φ(0)β = φβ is the discounted mean payoff, while φ(1) = ψ is the total reward. Following this terminology, we call φ(k) the k-total reward, and φ(k)β the discounted k-total reward. Thus the above hierarchy of payoff functions provides a natural generalization of mean payoff and total reward.
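The nesting built into this hierarchy is easy to observe numerically: whenever φ(ℓ) is nonzero, the next payoff φ(ℓ+1) blows up. In the sketch below (ours; finite prefix averages stand in for the lim inf in (5)) a cycle with positive mean keeps the φ(0) estimates near 1/3 while the φ(1) estimates grow without bound.

```python
from itertools import accumulate

def phi_k(prefix, k):
    """Finite-prefix estimate of phi^(k)(a) = A(M^k(a)), cf. (8)."""
    for _ in range(k):
        prefix = list(accumulate(prefix))
    return sum(prefix) / len(prefix)

cycle = [1, -1, 1]                       # positive mean: phi^(0) = 1/3 > 0
for T in (10**3, 10**4, 10**5):
    a = (cycle * (T // 3 + 1))[:T]
    # phi^(0) stays near 1/3, while the phi^(1) estimates diverge to +infinity
    print(T, round(phi_k(a, 0), 3), round(phi_k(a, 1), 1))
```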

We show first that the M operator changes the discounted total rewards by a factor that depends only on β.

Fact 1 For all a ∈ S and for all 0 < β < 1 we have

Aβ(M(a)) = (1/(1 − β)) Aβ(a).

Proof Using definitions (6) and (7) we can write

Aβ(M(a)) = (1 − β) ∑_{i=1}^{∞} β^{i−1} ( ∑_{j=1}^{i} a_j )
= (1 − β) ∑_{j=1}^{∞} a_j ( ∑_{i=j}^{∞} β^{i−1} )
= (1 − β) ∑_{j=1}^{∞} β^{j−1} a_j ( ∑_{i=j}^{∞} β^{i−j} )
= (1 − β) ∑_{j=1}^{∞} β^{j−1} a_j ( ∑_{ℓ=0}^{∞} β^ℓ )
= (1 − β) ∑_{j=1}^{∞} β^{j−1} a_j · (1/(1 − β))
= (1/(1 − β)) Aβ(a).  □
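Fact 1 is easy to sanity-check numerically; the sketch below (ours) truncates both infinite sums at a horizon where β^T is negligible.

```python
from itertools import accumulate

def A_beta(seq, beta):
    """Truncated discounted average (6): (1 - beta) * sum of beta^(j-1) a_j."""
    return (1 - beta) * sum(beta**j * a for j, a in enumerate(seq))

beta, T = 0.99, 5000                       # beta^T ~ 1e-22, so tails are negligible
a = [(-1)**j * (j % 5) for j in range(T)]  # any bounded sequence works
lhs = A_beta(list(accumulate(a)), beta)    # A_beta(M(a))
rhs = A_beta(a, beta) / (1 - beta)
print(lhs, rhs)                            # agree up to truncation error
```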


Given two sequences x ∈ Z^p and y ∈ Z^q, let us denote by a = x(y) the infinite sequence obtained by listing first the elements of x and then repeating y cyclically, infinitely many times. Let us call such an a ∈ S a lasso sequence, and let us denote the set of lasso sequences that can arise from a graph on n vertices by

Sn(R) = { a = x(y) | p, q ∈ Z+, p + q ≤ n, x ∈ [−R,R]^p, y ∈ [−R,R]^q },

where [−R,R] denotes the set of integers of absolute value not exceeding R. Note that a BW-game with n states in stationary strategies always produces a play p such that the corresponding reward sequence r(p) belongs to Sn(R), where R is an upper bound on the absolute values of the integral arc rewards. We shall simply write Sn when R is not specified.

To be able to state and prove our main results about iterated total rewards, we need next to analyze the above operations on lasso sequences.

Fact 2 For a = x(y) ∈ Sn, we have

a_i = x_i if i ≤ p, and a_i = y_r if i = p + ℓq + r for some integers ℓ ≥ 0, 0 < r ≤ q.

Fact 3 For a = x(y) ∈ Sn, we have

M(a)_i = ∑_{j=1}^{i} x_j if i ≤ p, and
M(a)_i = ∑_{j=1}^{p} x_j + ℓ ∑_{j=1}^{q} y_j + ∑_{j=1}^{r} y_j if i = p + ℓq + r for integers ℓ ≥ 0, 0 < r ≤ q.

Fact 4 For a = x(y) ∈ Sn, we have A(M(x(y))) = +∞ if A(x(y)) > 0, and A(M(x(y))) = −∞ if A(x(y)) < 0.

Note that there is an example in [2] showing that Fact 4 does not necessarily hold for reward sequences corresponding to non-stationary strategies.

Let us also note that M(x(y)) is not a lasso sequence, in general. However, the above facts, obtained by simple counting arguments, imply the following claim, the second part of which can be obtained from the first part by induction on k:

Corollary 1 If x(y) ∈ Sn(R) is such that A(y) = 0, then M(x(y)) = x̄(ȳ) ∈ Sn(nR), where

x̄ = M(x), and ȳ = pA(x) + M(y).

Furthermore, if x(y) ∈ Sn(R) is such that A(x(y)) = A(M(x(y))) = · · · = A(M^{k−1}(x(y))) = 0, then M^k(x(y)) ∈ Sn(n^k R).
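The first part of Corollary 1 can be checked mechanically; the following sketch (with our ad hoc helpers) verifies M(x(y)) = x̄(ȳ) on an example with A(y) = 0.

```python
from itertools import accumulate

def M(seq):
    return list(accumulate(seq))

def lasso_prefix(x, y, T):
    """First T entries of the lasso sequence x(y)."""
    out = list(x)
    while len(out) < T:
        out.extend(y)
    return out[:T]

x, y = [3, -1], [2, -1, -1]            # S(y) = 0, i.e. A(y) = 0 as required
Sx = sum(x)                            # = p * A(x), the scalar of Corollary 1
x_bar = M(x)
y_bar = [Sx + b for b in M(y)]         # the scalar is added to every component
assert M(lasso_prefix(x, y, 30)) == lasso_prefix(x_bar, y_bar, 30)
print("M(x(y)) coincides with the lasso x_bar(y_bar) on this example")
```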


Recall that by adding a scalar to a vector we mean incrementing all components of the vector by the same scalar value.

The above properties allow us to generalize an inequality between the discounted and undiscounted payoffs shown in [22] for the mean payoff case.

Lemma 1 If x(y) ∈ Sn(R) is such that A(x(y)) = A(M(x(y))) = · · · = A(M^{k−1}(x(y))) = 0, then

| φ(k)(x(y)) − (1/(1−β)^k) φβ(x(y)) | ≤ 2(1 − β) n^{k+2} R.

Proof It was shown in [22] that for a lasso sequence x(y) ∈ Sn(R) we have

| A(x(y)) − Aβ(x(y)) | ≤ 2(1 − β) n² R.

Applying this to an arbitrary lasso sequence x̄(ȳ) ∈ Sn(n^k R) we get

| A(x̄(ȳ)) − Aβ(x̄(ȳ)) | ≤ 2(1 − β) n² (n^k R) = 2(1 − β) n^{k+2} R.

By Corollary 1 we have M^k(x(y)) ∈ Sn(n^k R); thus, applying the above for x̄(ȳ) = M^k(x(y)), we get

| A(M^k(x(y))) − Aβ(M^k(x(y))) | ≤ 2(1 − β) n^{k+2} R.

By Fact 1 we have Aβ(M^k(a)) = (1/(1−β)^k) Aβ(a), and thus the above implies our claim. □
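The inequality of [22] used at the start of this proof can also be tested empirically. The sketch below computes A and Aβ of a lasso in closed form (the geometric-series formulas are our own derivation, not taken from the paper) and checks the bound on random small lassos.

```python
import random

def A_lasso(y):
    """Limiting average of x(y): it depends on the cycle y only."""
    return sum(y) / len(y)

def A_beta_lasso(x, y, beta):
    """Exact discounted average of x(y) via geometric series."""
    p, q = len(x), len(y)
    head = sum(beta**j * x[j] for j in range(p))
    cyc = sum(beta**j * y[j] for j in range(q))
    return (1 - beta) * (head + beta**p * cyc / (1 - beta**q))

random.seed(1)
n, R = 6, 10
beta = 1 - 1 / (4 * n**3 * R)
for _ in range(1000):
    p = random.randint(0, n - 1)
    q = random.randint(1, n - p)
    x = [random.randint(-R, R) for _ in range(p)]
    y = [random.randint(-R, R) for _ in range(q)]
    assert abs(A_lasso(y) - A_beta_lasso(x, y, beta)) <= 2 * (1 - beta) * n**2 * R
print("the bound from [22] holds on all sampled lassos")
```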

Let us note that we can actually prove a slightly better inequality for k = 1; see Lemma 9 in the appendix. This is because not all sequences in Sn(nR) are of the form M(x(y)) for some x(y) ∈ Sn(R). To prove a similarly stronger inequality for arbitrary k, we would need to understand better the functional dependence of M^k(x(y)) on x(y) under the constraints A(x(y)) = A(M(x(y))) = · · · = A(M^{k−1}(x(y))) = 0, which might be a harder task. For our purposes the above weaker inequalities are still enough.

3 Uniform Optimality within Stationary Strategies

Let us consider a BW-game form (G,B,W) and an integral local reward function r : E → Z. As before, we denote by R the largest absolute value of an arc reward and use V = B ∪ W for the set of nodes. For a subset F ⊆ E of the arcs of the directed graph G = (V,E) we denote by d+_F(v) the out-degree of vertex v ∈ V in the subgraph (V, F). A subset F ⊆ E of the arcs is a pure stationary strategy of Black (resp., White) if d+_F(v) = 1 for all v ∈ B (resp., for all v ∈ W). Let us denote by SB and SW the sets of stationary strategies of Black and White, respectively.

Given a situation (a pair of stationary strategies) b ∈ SB and w ∈ SW, we have a unique walk e1, e2, . . . , ep, ep+1, . . . , ep+q, ep+1, ep+2, . . . , ep+q, . . . from every initial node v0 ∈ V, which consists of an initial path e1, e2, . . . , ep followed by a cycle ep+1, . . . , ep+q, which is traversed infinitely many times. We denote by x = x(b,w) and y = y(b,w) the corresponding reward sequences x = (r(ej) | j = 1, . . . , p) and y = (r(ep+j) | j = 1, . . . , q), and by c(v0; b,w) the payoff value corresponding to the payoff function π:

c(v0; b,w) = π(x(y)).

We say that a situation (b∗,w∗), b∗ ∈ SB, w∗ ∈ SW, is a saddle point for initial position v0 ∈ V if

c(v0; b∗,w) ≤ c(v0; b∗,w∗) ≤ c(v0; b,w∗)   (9)

holds for all b ∈ SB and w ∈ SW. We say that (b∗,w∗) is a uniform saddle point if (9) holds for all initial positions v0 ∈ V.

The following result follows essentially from [18].

Fact 5 A BW-game with the discounted mean payoff function π = φβ has a uniform saddle point for all 0 < β < 1.

We shall show next that the same pair of stationary strategies forms a uniform saddle point with respect to the k-total reward φ(k), if β is close enough to 1.

Theorem 1 Consider a BW-game (G,B,W, r) as above, and choose a discount factor satisfying 0 ≤ 1 − β < 1/(4n^{k+4}R). Let us consider a uniform saddle point (b∗,w∗) with respect to the discounted mean payoff φβ. Then (b∗,w∗) is also a uniform saddle point with respect to the k-total reward φ(k).

Proof We introduce two preorders ≺β and ≺k on the set of lasso sequences Sn(R) as follows: for two sequences x(y) and x̄(ȳ) we say that x(y) ≺β x̄(ȳ) (resp., x(y) ≺k x̄(ȳ)) if φβ(x(y)) ≤ φβ(x̄(ȳ)) (resp., φ(k)(x(y)) ≤ φ(k)(x̄(ȳ))). We shall show that the preorder ≺β induced by φβ on Sn(R) is a refinement of the preorder ≺k induced by φ(k). To this end let us first prove a partial claim about ≺β. Introduce

S(ℓ,+) = { x(y) ∈ Sn(R) | φ(j)(x(y)) = 0 for j = 0, 1, . . . , ℓ−1, and φ(ℓ)(x(y)) > 0 },
S(ℓ,−) = { x(y) ∈ Sn(R) | φ(j)(x(y)) = 0 for j = 0, 1, . . . , ℓ−1, and φ(ℓ)(x(y)) < 0 }

for ℓ = 0, 1, . . . , k − 1, and set

S∗ = { x(y) ∈ Sn(R) | φ(j)(x(y)) = 0 for j = 0, 1, . . . , k − 1 }.

It is immediate to see from these definitions that these sets partition Sn(R). We claim that the following relations hold:

S(0,−) ≺β · · · ≺β S(k−1,−) ≺β S∗ ≺β S(k−1,+) ≺β · · · ≺β S(0,+).   (10)

To see this, let us fix an index 0 ≤ ℓ < k, and consider x(y) ∈ S(ℓ,+) and x̄(ȳ) ∈ S(ℓ+1,+) (or x̄(ȳ) ∈ S∗ if ℓ = k−1). Lemma 1 implies that

| φ(ℓ)(x(y)) − φβ(x(y))/(1−β)^ℓ | ≤ 2(1−β) n^{ℓ+2} R.

Since M^ℓ(x(y)) ∈ Sn(n^ℓ R) (by Corollary 1) is an integer sequence, by the definition of the A operator we get

φ(ℓ)(x(y)) ≥ 1/q ≥ 1/n.

Thus, since we have 2(1−β) n^{ℓ+2} R < 1/(2n), the above implies

φβ(x(y))/(1−β)^ℓ > 1/(2n).   (11)

On the other hand, we can apply Lemma 1 for x̄(ȳ) too, yielding

| φ(ℓ)(x̄(ȳ)) − φβ(x̄(ȳ))/(1−β)^ℓ | ≤ 2(1−β) n^{ℓ+2} R.

Since we have φ(ℓ)(x̄(ȳ)) = 0, we get

φβ(x̄(ȳ))/(1−β)^ℓ < 1/(2n).   (12)

Inequalities (11) and (12) together imply that S(ℓ+1,+) ≺β S(ℓ,+) (or S∗ ≺β S(k−1,+)). Analogous arguments work for the negative side too, and hence (10) follows.

Since φ(k) has value +∞ on S(ℓ,+) for ℓ = 0, 1, . . . , k − 1 and value −∞ on the sets S(ℓ,−) for ℓ = 0, 1, . . . , k − 1, by Fact 4 and Corollary 1, the only thing left to prove, to show that ≺β is a refinement of ≺k, is that these two preorders do not conflict on the set S∗.

To this end let us consider two lasso sequences x(y), x̄(ȳ) ∈ S∗ such that φ(k) has different values on these sequences. Since φ(k)(x(y)) = A(M^k(x(y))) and for x(y) ∈ S∗ we have M^k(x(y)) ∈ Sn(n^k R), we can again conclude that the difference between two nonequal φ(k) values is always at least 1/n², say

φ(k)(x(y)) − φ(k)(x̄(ȳ)) ≥ 1/n².

We can apply Lemma 1 for these sequences, yielding

| φ(k)(x(y)) − φβ(x(y))/(1−β)^k | ≤ 2(1−β) n^{k+2} R, and
| φ(k)(x̄(ȳ)) − φβ(x̄(ȳ))/(1−β)^k | ≤ 2(1−β) n^{k+2} R.

Since we have 2(1−β) n^{k+2} R < 1/(2n²), the above three inequalities imply φβ(x(y)) > φβ(x̄(ȳ)), proving our claim, and hence completing the proof of the theorem. □

Let us remark that the proof of the above theorem in fact provides a more complete picture about these iterated total reward functions than the statement of the theorem alone. Namely, the payoff functions φ(k), k = 0, 1, . . ., form a nested sequence, in the sense that the lasso sequences on which φ(k+1) vanishes form a subset of those on which φ(k) vanishes, and φ(k+1) has a finite value only on sequences on which φ(k) vanishes. Furthermore, (10) can be claimed for an arbitrary integer k, providing a complete hierarchy in the limit, and we conjecture that S∗ in the limit contains only zero sequences. It is also interesting to see that all these payoffs rank lasso sequences in agreement with the discounted payoff, and all of them have "essentially" the same discounted version.

4 Best Response to Stationary Strategies

In this section we will use two operators on finite sequences: for a sequence a = (a1, a2, . . . , an) we have S(a) = ∑_{i=1}^{n} a_i and M(a) = (a1, a1 + a2, . . . , S(a)). Note that both M and S are linear operators.

Lemma 2 For sequences a, b ∈ R^n and real λ ∈ R we have

S(a + b) = S(a) + S(b)   (13a)
S(λa) = λ S(a)   (13b)
M(a + b) = M(a) + M(b)   (13c)
M(λa) = λ M(a)   (13d)

Proof By the definitions of these operators. □

For a sequence I of integers (indices), we denote by aI the sequence of the corresponding components of a. E.g., if a = (a1, a2, a3, . . .) and I = (1, 2, 3, 2), then aI = (a1, a2, a3, a2). We denote by [1, n] the sequence of integers from 1 to n. If a and b are two finite sequences, then we denote by [a; b] their concatenation. For a subset I of indices we denote by eI the sequence of 1-s of length |I| indexed by i ∈ I. We define M^0(a) = a, and for k ≥ 1 we define M^k(a) = M(M^{k−1}(a)).

With this notation, the k-total value of an infinite sequence a of local rewards (corresponding to a play) can be rewritten as

φ(k)(a) = lim inf_{T→∞} (1/T) S(M^k(a[1,T])).

Assume in the sequel that X, Y and Z are subsets of the indices, and a, b, and c are finite sequences of appropriate lengths.

Lemma 3

M(a[X;Y]) = [M(aX); M(aY) + S(aX) eY]   (14a)

M^k(a[X;Y]) = [ M^k(aX); M^k(aY) + ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) M^{ℓ−1}(eY) ].   (14b)

Proof Equality (14a) follows from the definitions of M and S. For (14b) we use (14a), (13c), (13d) and induction on k. For k = 0 we get M^0(a[X;Y]) = [aX; aY], and for k = 1 we get

M(a[X;Y]) = M([aX; aY]) = [M(aX); M(aY) + S(aX) eY]

by the definition of M, as in (14a). Then, by induction on k, we get

M^{k+1}(a[X;Y]) = M([ M^k(aX); M^k(aY) + ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) M^{ℓ−1}(eY) ])
= [ M^{k+1}(aX); M^{k+1}(aY) + S(M^k(aX)) eY + ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) M^ℓ(eY) ]
= [ M^{k+1}(aX); M^{k+1}(aY) + ∑_{ℓ=1}^{k+1} S(M^{k+1−ℓ}(aX)) M^{ℓ−1}(eY) ].  □

Corollary 2

S(M^k(a[X;Y])) = S(M^k(aX)) + S(M^k(aY)) + ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) S(M^{ℓ−1}(eY))
= S(M^k(aX)) + S(M^k(aY)) + ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) \binom{|Y|+ℓ−1}{ℓ}.

Proof By Lemmas 2 and 3, and by the equality

S(M^{ℓ−1}(eY)) = \binom{|Y|+ℓ−1}{ℓ}.   (15)  □
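Both the identity (15) and Corollary 2 are statements about finite sequences, so they can be verified by direct computation; here is a short sketch (ours; math.comb plays the role of the binomial coefficient):

```python
from itertools import accumulate
from math import comb

def Mk(seq, k):
    for _ in range(k):
        seq = list(accumulate(seq))
    return seq

S = sum

# identity (15): S(M^(l-1)(e_Y)) = binom(|Y| + l - 1, l)
for q in range(1, 8):
    for l in range(1, 6):
        assert S(Mk([1] * q, l - 1)) == comb(q + l - 1, l)

# Corollary 2 on a concrete pair of blocks
aX, aY, k = [3, -2, 5], [1, 4, -1, 2], 3
lhs = S(Mk(aX + aY, k))
rhs = S(Mk(aX, k)) + S(Mk(aY, k)) + sum(
    S(Mk(aX, k - l)) * comb(len(aY) + l - 1, l) for l in range(1, k + 1))
assert lhs == rhs
print("identity (15) and Corollary 2 verified")
```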

Corollary 3

M^k(a[X;Y;Z]) = [ M^k(aX); M^k(aY) + ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) M^{ℓ−1}(eY);
M^k(aZ) + ∑_{ℓ=1}^{k} ( S(M^{k−ℓ}(aX)) + S(M^{k−ℓ}(aY)) ) M^{ℓ−1}(eZ)
+ ∑_{ℓ=1}^{k} S(M^{k−ℓ}(aX)) ∑_{m=1}^{ℓ−1} \binom{|Y|+ℓ−1−m}{ℓ−m} M^{m−1}(eZ) ].

Proof Apply Lemma 3 with [Y;Z] instead of Y, and then again with Y, Z instead of X, Y, and use S(M^{ℓ−1−m}(eY)) = \binom{|Y|+ℓ−1−m}{ℓ−m}.  □

Corollary 4

S(M^k(a[X;Y;Z])) = S(M^k(aX)) + S(M^k(aY)) + S(M^k(aZ))
+ ∑_{ℓ=1}^{k} ( S(M^{k−ℓ}(aX)) \binom{|Y|+|Z|+ℓ−1}{ℓ} + S(M^{k−ℓ}(aY)) \binom{|Z|+ℓ−1}{ℓ} ).

Proof Apply (13a) and (13b) of Lemma 2 and Corollary 3 above, together with the equality (15) applied to both eY and eZ, and finally use the binomial identity

∑_{m=0}^{ℓ} \binom{|Y|+ℓ−1−m}{ℓ−m} \binom{|Z|+m−1}{m} = \binom{|Y|+|Z|+ℓ−1}{ℓ}.   (16)  □

Corollary 5

S(M^k(a[X;Y;Z])) = S(M^k(a[X;Z])) + ∑_{ℓ=0}^{k} \binom{|Z|+k−1−ℓ}{k−ℓ} S(M^ℓ(a[X;Y])).

Proof Elementary calculations based on Corollaries 2 and 4, and on the binomial identity (16). □

Corollary 6 If a = x(y) is a lasso sequence, where x = aX and y = aY, and

S(M^ℓ(a[X;Y])) = 0 for ℓ = 0, 1, . . . , k − 1,   (17a)

then we have

φ(k)(a) = (1/|Y|) S(M^k(a[X;Y])).   (17b)

Furthermore, if condition (17a) does not hold, and 0 ≤ m < k is the smallest index with S(M^m(a[X;Y])) ≠ 0, then we have

φ(k)(a) = −∞ if S(M^m(a[X;Y])) < 0, and φ(k)(a) = +∞ if S(M^m(a[X;Y])) > 0.   (17c)

Proof For a positive integer T let us define r(T) = (T − |X|) mod |Y| and α(T) = (T − |X| − r(T))/|Y|. Furthermore, let us denote by Rr the first r elements of Y, for 0 ≤ r < |Y|, and by αY = [Y; Y; . . . ; Y] the concatenation of α copies of Y. Then, for the lasso sequence a = aX(aY) we have

φ(k)(a) = lim inf_{T→∞} (1/T) S(M^k(a[X;α(T)Y;R_{r(T)}])).   (18)

Let us now apply Corollary 5 with Z = [(β−1)Y; R_{r(T)}] for an arbitrary positive integer β to get

S(M^k(a[X;βY;R_{r(T)}])) = S(M^k(a[X;(β−1)Y;R_{r(T)}])) + ∑_{ℓ=0}^{k} \binom{r(T)+(β−1)|Y|+k−1−ℓ}{k−ℓ} S(M^ℓ(a[X;Y])).   (19)

Assume first that condition (17a) holds. Then the above equality simplifies to

S(M^k(a[X;βY;R_{r(T)}])) = S(M^k(a[X;(β−1)Y;R_{r(T)}])) + S(M^k(a[X;Y])).

Summing up the above equalities for β = 1, . . . , α(T), we get

S(M^k(a[X;α(T)Y;R_{r(T)}])) = S(M^k(a[X;R_{r(T)}])) + α(T) S(M^k(a[X;Y])).   (20)

Since S(M^k(a[X;R_{r(T)}])) and |X| + r(T) are both bounded independently of T, and since

lim_{T→∞} α(T)/T = lim_{T→∞} α(T)/(|X| + α(T)|Y| + r(T)) = 1/|Y|,

we get from (18) and (20) that

φ(k)(a) = (1/|Y|) S(M^k(a[X;Y]))

as claimed in (17b).

Let us assume next that condition (17a) does not hold, and let m be the smallest index such that S(M^m(a[X;Y])) ≠ 0. Consider equality (19), and note that the summation on the right hand side can be viewed as a polynomial p(β), which is of degree k − m and in which the sign of the leading term is the same as the sign of S(M^m(a[X;Y])). Then

∑_{β=1}^{α(T)} p(β) = q(α(T))

is a polynomial in α(T) of degree k − m + 1, and the sign of its leading term is the same as the sign of S(M^m(a[X;Y])). Thus, by summing up (19) for β = 1, . . . , α(T), we get

S(M^k(a[X;α(T)Y;R_{r(T)}])) = S(M^k(a[X;R_{r(T)}])) + q(α(T)).   (21)

Since k − m + 1 > 1 and since S(M^k(a[X;R_{r(T)}])) and |X| + r(T) are both bounded independently of T, we obtain (17c) from (18) and (21) by dividing by T and taking limits. □

Given a BW-game Γ, we denote by PB(Γ) and PW(Γ) the sets of (not necessarily pure and/or stationary) strategies of Black and White, respectively. Let us note that SB and SW are proper subsets of PB and PW, respectively.

Now we are ready to prove the main result of this section.

Theorem 2 There is a pure stationary best response to a pure stationary strategy in a k-total reward BW-game, for any k ∈ Z+.


Proof Let us consider a positive integer k and a k-total reward game Γ = (G,B,W, v0, r, φ(k)). Assume that Black fixes an arbitrary pure stationary strategy b∗ ∈ SB = SB(Γ); we will show that White has a pure stationary best response. Though there is a mild asymmetry due to the definition of the value of the game, the case when White fixes a pure stationary strategy can be handled similarly.

Let us note that at this moment we may not (yet) assume that a best response exists at all. We know, however, that for every 1 > ε > 0 there exists a strategy w∗ ∈ PW = PW(Γ) of White (not necessarily pure and/or stationary) for which

φ(k)(b∗,w∗) ≥ sup_{w∈PW} φ(k)(b∗,w) − ε,   (22)

simply by the definition of sup. Let us also note that we can assume that

φ(k)(b∗,w∗) > −∞,

since otherwise any pure and stationary strategy of White is a best response.

To prove our theorem, let us first use the fact that b∗ is pure and stationary: delete all the arcs going out from vertices controlled by Black which are not chosen in b∗, and denote by G∗ = (V,E∗) the resulting directed graph.

Let us note that any walk in G∗ starting from v0 arises as the play corresponding to strategies (b∗,w) for some strategy w of White. Furthermore, if w is a pure and stationary strategy then this walk is a lasso, and every lasso starting at v0 corresponds in this way to some pure and stationary strategy of White.

Let us next consider the pair (b∗,w∗) satisfying (22), which defines an infinite walk in the directed graph G∗ along the vertices W = (v0, v1, . . .). As before, let us denote by et = (vt−1, vt) the edges of this walk, and by at = r(et) the local rewards, for t = 1, 2, . . ., and set F = {e1, e2, . . .} to denote the sequence of arcs in this infinite walk in G∗.

Let us denote for integers i < j by [i, j] the set of integers {i, i+1, . . . , j}, and set J0 = {1, 2, . . .} to denote the set of positive integers. To an increasing subset I = {i1, i2, . . .} ⊆ J0 of the indices, i1 < i2 < · · ·, we associate the edge set F(I) = {ei | i ∈ I}, and we say that F(I) is a walk in G∗ if the endpoint of e_{i_s} is the beginning of e_{i_{s+1}}, that is, if v_{i_s} = v_{i_{s+1}−1} for all s = 1, 2, . . .. Note that if s ∈ I, then F(I ∩ [1, s]) is also a walk in G∗.

Let us consider an arbitrary infinite increasing sequence I of integers such that F(I) is a walk in G∗, and let q = q(I) be the smallest integer q ∈ I such that vertex vq is repeated in the walk F(I ∩ [1, q]). Let us then denote by p = p(I) the unique index p < q, p ∈ I, for which vp = vq. Let us introduce C(I) = I ∩ [p + 1, q], set P(I) = I ∩ [1, p] and define J(I) = I \ [p+1, q]. Observe that F([P(I);C(I)]) is a lasso sequence with F(C(I)) as its cycle, and that F(J(I)) is again a walk in G∗.

Next, starting with I = J0 and F = F(J0), let us define ps = p(Js−1), qs = q(Js−1), Ps = P(Js−1), Cs = C(Js−1), Ls = [Ps;Cs], and set Js = J(Js−1), recursively for s = 1, 2, . . .. Note that by the above observation F(Js) ⊆ F is a walk in G∗, F(Cs) is a cycle, and F(Ls) is a lasso with F(Cs) as its cycle, for every index s = 1, 2, . . ..

Let us finally denote by ws, s = 1, 2, . . ., a pure stationary strategy of White for which F(Ls) is the corresponding play when Black chooses b∗ and White chooses ws. (As we observed above, every lasso in G∗ starting from v0 corresponds to such a pure stationary strategy of White.)

Recall that we have the notation at = r(et) for all t ∈ J0, and consider a = aJ0 = (a1, a2, . . .), the infinite sequence of local rewards of the play (b∗,w∗).

Let us first observe that if for some index s and m < k we have S(M^ℓ(a[Ps;Cs])) = 0 for ℓ < m and S(M^m(a[Ps;Cs])) > 0, then by Corollary 6 we have

φ(k)(b∗,ws) = φ(k)(aPs(aCs)) = +∞,

implying that ws is a pure and stationary best response of White to b∗, completing the proof of the claim.

Consequently, for the rest of the proof we can assume that

S(M^m(a[Ps;Cs])) ≤ 0 for all 0 ≤ m < k and s = 1, 2, . . .   (23)

For a positive integer t let us define s(t) as the largest index s for which qs ≤ t, and set Qt = Js(t) ∩ [1, t]. Let us then note that F(Qt) is a simple path in G∗.

By repeated applications of Corollary 5, and by the inequalities (23), we can write for any integer T that

S(M^k(a[1,T])) ≤ S(M^k(aQT)) + ∑_{s=1}^{s(T)} S(M^k(a[Ps;Cs])).

Since F(QT) and F(Ps ∪ Cs), for s = 1, . . . , s(T), are all subgraphs with at most n arcs, there is a polynomial p(n, k), independent of T, such that all quantities on the right hand side are bounded from above by p(n, k)R. Consequently,

(1/T) S(M^k(a[1,T])) ≤ p(n, k)R

follows, implying that φ(k)(b∗,w∗) ≤ p(n, k)R; in other words, φ(k)(b∗,w∗) is finite.

We shall show next that in fact we have equalities in (23). To this end, let us note first that if we have S(M^m(a[Pi;Ci])) = 0 for all m = 0, . . . , k − 1 and i = 1, 2, . . . , ℓ − 1, then for every integer T ≥ qℓ we can write by Corollaries 5 and 6 that

S(M^k(a[1,T])) = S(M^k(a[P1;Z])) + ∑_{i=1}^{ℓ−1} |Ci| φ(k)(aPi(aCi)),

where Z = [q_{ℓ−1} + 1, T] = [1, T] \ ∪_{i=1}^{ℓ−1} (Pi ∪ Ci). Assume now that s is the smallest index such that S(M^m(a[Ps;Cs])) < 0 for some 0 ≤ m < k. Then for all T > qs we can write, analogously to the above, by Corollaries 5 and 6 that

S(M^k(a[1,T])) − S(M^k(a[P1;Z])) = ∑_{i=1}^{s−1} |Ci| φ(k)(aPi(aCi)) + ∑_{m=0}^{k} \binom{|Z|+k−1−m}{k−m} S(M^m(a[Ps;Cs])),

where Z = [q_{s−1} + 1, T] = [1, T] \ ∪_{i=1}^{s−1} (Pi ∪ Ci). We can view the right hand side here as a polynomial in T = |Z| (of degree at most k). According to (23) and our choice of s, the leading term in this polynomial has a negative coefficient. Thus, for all large enough values of T, we have

S(M^k(a[1,T])) − S(M^k(a[P1;Z])) < −1,

implying that for the strategy w̃ of White corresponding to F([P1;Z]) we have φ(k)(b∗, w̃) > φ(k)(b∗,w∗), contradicting the choice of w∗. This contradiction proves that we must have equalities in (23) for all s = 1, 2, . . . and m = 0, . . . , k − 1.

The above then implies that for all T we have the equality

S(M^k(a[1,T])) = S(M^k(aQT)) + ∑_{i=1}^{s(T)} |Ci| φ(k)(aPi(aCi)).

Since there are only a finite number of different pure stationary strategies of White, let us choose w∗∗ such that φ(k)(b∗,w∗∗) is maximum among all pure stationary strategies of White. In particular, we have φ(k)(aPi(aCi)) ≤ φ(k)(b∗,w∗∗) for all i = 1, . . . , s(T).

Then, since |QT| ≤ n and hence ∑_{i=1}^{s(T)} |Ci| ≥ T − n, from the above equality we get

φ(k)(b∗,w∗) = lim inf_{T→∞} (1/T) S(M^k(a[1,T]))
≤ lim inf_{T→∞} [ p(n, k)R/T + ( ∑_{i=1}^{s(T)} |Ci| / T ) φ(k)(b∗,w∗∗) ]
≤ lim inf_{T→∞} ((T − n)/T) φ(k)(b∗,w∗∗) = φ(k)(b∗,w∗∗).

Since this inequality holds for every choice of ε > 0, it follows that w∗∗ is a pure stationary best response to b∗, completing the proof of the theorem. □

Remark 3 Let us observe that the above proof goes through even if each player has a (possibly different) payoff function which is a convex combination of lim inf and lim sup.


5 Hierarchy of k-Total Rewards

In what follows we shall show that the k-total reward is a special case of the (k+1)-total reward. In particular, mean payoff games are a special case of 1-total reward games and, in general, of k-total reward games.

To state and prove our results we need to obtain a functional relation between M(x(y)) and the vectors x and y. To start our analysis, let us consider the all-one vector e = (1, 1, . . . , 1) ∈ R^n, and compute its iterated M images and the corresponding sums.

Lemma 4 For e = (1, 1, . . . , 1) ∈ R^n and for every k ∈ Z+ we have

M^k(e) = ( \binom{k−1+j}{k} | j = 1, . . . , n )   (24)

and correspondingly

S(M^k(e)) = \binom{n+k}{k+1}.   (25)

Proof The above expressions are clearly correct for k = 0. We can prove them by induction on k using the binomial identity

∑_{j=0}^{k} \binom{a+j}{a} = \binom{a+k+1}{k}   (26)

for all integers a and k. □
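Formulas (24) and (25) are easy to confirm mechanically; a minimal sketch (ours):

```python
from itertools import accumulate
from math import comb

def Mk(seq, k):
    for _ in range(k):
        seq = list(accumulate(seq))
    return seq

n = 7
for k in range(0, 5):
    assert Mk([1] * n, k) == [comb(k - 1 + j, k) for j in range(1, n + 1)]  # (24)
    assert sum(Mk([1] * n, k)) == comb(n + k, k + 1)                        # (25)
print("Lemma 4 verified for k = 0..4")
```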

Lemma 5 For z = (z1, z2, . . . , zn) ∈ R^n and for every k ∈ Z+ we have

M^k(z) = ( ∑_{j=1}^{i} \binom{k−1+i−j}{k−1} z_j | i = 1, . . . , n )   (27)

and correspondingly

S(M^k(z)) = ∑_{j=1}^{n} \binom{k+n−j}{k} z_j.   (28)

Proof For k = 0, the above formula, with an extended definition of the binomial coefficients [6], shows that M^0 is the identity operator, as assumed. For k = 1 the above expressions coincide with the definitions of the M and S operators. Thus, by induction on k, we can write

M^{k+1}(z) = M(M^k(z)) = ( ∑_{ℓ=1}^{i} ∑_{j=1}^{ℓ} \binom{k−1+ℓ−j}{k−1} z_j | i = 1, . . . , n )
= ( ∑_{j=1}^{i} z_j ∑_{ℓ=j}^{i} \binom{k−1+ℓ−j}{k−1} | i = 1, . . . , n )
= ( ∑_{j=1}^{i} \binom{k+i−j}{k} z_j | i = 1, . . . , n ),

where the second equality follows by (27), and the last one by (26). Finally, (28) follows by (27) and (26). □
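The same kind of brute-force check works for (27) and (28); the sketch below (ours) tests them on a fixed integer vector for k = 1, . . . , 4.

```python
from itertools import accumulate
from math import comb

def Mk(seq, k):
    for _ in range(k):
        seq = list(accumulate(seq))
    return seq

z = [3, -1, 4, 1, -5, 9]
n = len(z)
for k in range(1, 5):
    direct = Mk(z, k)
    by_27 = [sum(comb(k - 1 + i - j, k - 1) * z[j - 1] for j in range(1, i + 1))
             for i in range(1, n + 1)]
    assert direct == by_27                                               # (27)
    assert sum(direct) == sum(comb(k + n - j, k) * z[j - 1]
                              for j in range(1, n + 1))                  # (28)
print("Lemma 5 verified for k = 1..4")
```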

Let us recall a few combinatorial identities from [6], which we shall need in the sequel.

Lemma 6

∑_{u=0}^{N} (−1)^u \binom{N}{u} \binom{X−u}{R} = \binom{X−N}{X−R}.

Lemma 7

∑_{u=0}^{N} \binom{X+u}{u} \binom{Y+N−u}{N−u} = \binom{X+Y+N+1}{N}.
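Both identities are finite sums, so they can be checked by brute force over small parameter ranges (the ranges below are our arbitrary choice):

```python
from math import comb

# Lemma 6 (keeping X >= N so that all binomial arguments are nonnegative)
for N in range(6):
    for X in range(N, 12):
        for r in range(X + 1):
            lhs = sum((-1)**u * comb(N, u) * comb(X - u, r) for u in range(N + 1))
            assert lhs == comb(X - N, X - r)

# Lemma 7
for N in range(8):
    for X in range(6):
        for Y in range(6):
            lhs = sum(comb(X + u, u) * comb(Y + N - u, N - u) for u in range(N + 1))
            assert lhs == comb(X + Y + N + 1, N)

print("Lemmas 6 and 7 verified on small ranges")
```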

Now we are ready to provide an algebraic description of the M operator over the set of lasso sequences.

Theorem 3 Let us consider integers p, q > 0, k ≥ 0 and a lasso sequence x(y) ∈ Sn(R) with x ∈ Z^p and y ∈ Z^q, satisfying

φ(ℓ)(x(y)) = 0 for 0 ≤ ℓ ≤ k − 1.   (29)

Then we have

M^k(x(y)) = M^k(x) ( ∑_{ℓ=1}^{k} S(M^{k−ℓ}(x)) M^{ℓ−1}(e) + M^k(y) ),   (30)

where e = (1, 1, . . . , 1) ∈ Z^q, and

φ(k)(x(y)) = A( ∑_{ℓ=1}^{k} S(M^{k−ℓ}(x)) M^{ℓ−1}(e) + M^k(y) )
= (1/q) S( ∑_{ℓ=1}^{k} S(M^{k−ℓ}(x)) M^{ℓ−1}(e) + M^k(y) )
= (1/q) ∑_{j=1}^{p} x_j [ \binom{q+p−j+k}{k} − \binom{p−j+k}{k} ] + (1/q) ∑_{i=1}^{q} \binom{k+q−i}{k} y_i.   (31)

Proof Let us note first that for k = 0 the condition (29) is empty and the summation in (30) is an empty sum, yielding M^0(x(y)) = x(y). Furthermore, for k = 1 we have by Corollary 1 that M(x(y)) = M(x)(S(x)e + M(y)), in agreement with (30), since condition (29) is equivalent to saying that S(y) = 0, by definition (8). Thus, (30) follows by induction on k, using the linearity of M as in Lemma 2.

Finally, (31) follows from (30) after applying the S operator to both sides of (30), yielding the second line of (31). Then, using the linearity of S (Lemma 2) and applying Lemmas 4 and 5, we get

(1/q) ∑_{ℓ=1}^{k} \binom{q+ℓ−1}{ℓ} ∑_{j=1}^{p} \binom{k−ℓ+p−j}{k−ℓ} x_j + (1/q) ∑_{i=1}^{q} \binom{k+q−i}{k} y_i
= (1/q) ∑_{j=1}^{p} x_j [ ∑_{ℓ=1}^{k} \binom{q+ℓ−1}{ℓ} \binom{k−ℓ+p−j}{k−ℓ} ] + (1/q) ∑_{i=1}^{q} \binom{k+q−i}{k} y_i.

Finally, applying Lemma 7 with u = ℓ, N = k, X = q − 1 and Y = p − j, and subtracting the u = 0 term from both sides, we get the last line of (31). □

Remark 4 Formula (31) shows that the value φ(k)(x(y)) is a linear combination of the components of x and y. Furthermore, it can be verified that these linear combinations, for k = 0, 1, . . . , n−1, are linearly independent. Consequently, if φ(n) takes a finite value on a lasso sequence, then all local rewards on this sequence must be equal to 0. On the other hand, there are BW-games such that the φ(n−1) value, from a certain starting position, is finite and different from zero.

For the main claim of this section we introduce a split operation on sequences: to a given sequence x = (x1, x2, . . .) we associate x^{(1)} = (x1, −x1, x2, −x2, . . .).

We need an additional technical lemma.

Lemma 8 For every integer k there exist reals αj, j = 0, 1, . . . , k, such that

\binom{2X+k+1}{k} = ∑_{j=0}^{k} αj \binom{X+j}{j}

holds for all integers X. In particular, we have αk = 2^k.

Proof Let us note that gj(X) = \binom{X+j}{j} is a polynomial in X of degree exactly j, for all j = 0, . . . , k. Thus the polynomials gj, j = 0, . . . , k, form a basis for the polynomials in X of degree at most k. Hence every polynomial in X of degree k can be expressed as a linear combination of the functions gj, j = 0, . . . , k, proving the existence of the coefficients αj, j = 0, . . . , k. To see that αk = 2^k, we compare the coefficients of X^k on both sides. □

Remark 5 One can also obtain the explicit values of the coefficients αj in Lemma 8; for completeness, we give these in Lemma 10 in the appendix.

The main result in this section is the following.

Theorem 4 For a nonnegative integer k and any lasso sequence x(y) we have

φ(k+1)(x^{(1)}(y^{(1)})) = 2^{k−1} φ(k)(x(y)).   (32)


Proof We prove this claim by induction on k. For k = 0, let us note that M(x1, −x1, x2, −x2, . . .) = (x1, 0, x2, 0, . . .), implying that φ(1)(x^{(1)}) = (1/2) φ(0)(x).

Let us next assume that we have the equalities

φ(ℓ+1)(x^{(1)}(y^{(1)})) = 2^{ℓ−1} φ(ℓ)(x(y))

for all 0 ≤ ℓ < k. Thus, in particular, the signs of φ(ℓ+1)(x^{(1)}(y^{(1)})) and φ(ℓ)(x(y)) are the same for all ℓ < k. Therefore, by Fact 4 and the definition of φ(k), both sides of (32) are simultaneously equal to ±∞ whenever φ(k)(x^{(1)}(y^{(1)})) ≠ 0. Hence it is enough to prove the claim for lasso sequences satisfying

φ(ℓ+1)(x^{(1)}(y^{(1)})) = φ(ℓ)(x(y)) = 0 for all ℓ = 0, 1, . . . , k − 1.   (33)

Under these conditions Theorem 3 can be applied, and we get for the split sequence x^{(1)}(y^{(1)}) that 2q φ(k+1)(x^{(1)}(y^{(1)})) is equal to

∑_{j=1}^{p} x_j ∑_{r=0}^{1} (−1)^r [ \binom{2(q+p−j)+1−r+k+1}{k+1} − \binom{2(p−j)+1−r+k+1}{k+1} ] + ∑_{i=1}^{q} y_i ∑_{r=0}^{1} (−1)^r \binom{2(q−i)+1−r+k+1}{k+1}

= ∑_{j=1}^{p} x_j [ \binom{2(q+p−j)+k+1}{k} − \binom{2(p−j)+k+1}{k} ] + ∑_{i=1}^{q} y_i \binom{2(q−i)+k+1}{k}.

Let us also note that under conditions (33) we can apply Theorem 3 to φ(ℓ)(x(y)) and express it as in (31), for all ℓ < k. Let us finally note that using these expressions and applying Lemma 8 three times, with X = q+p−j, X = p−j and X = q−i, we get

2 φ(k+1)(x^{(1)}(y^{(1)})) = ∑_{j=0}^{k} αj φ(j)(x(y)).

By condition (33) this further simplifies to

2 φ(k+1)(x^{(1)}(y^{(1)})) = αk φ(k)(x(y)) = 2^k φ(k)(x(y)),

from which the statement follows. □
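Theorem 4 can be observed numerically; in the sketch below (ours, with prefix averages standing in for the lim inf) the split of the lasso x = (5), y = (1, −1) satisfies φ(1) ≈ 2^{−1} φ(0) and φ(2) ≈ 2^0 φ(1).

```python
from itertools import accumulate

def phi(prefix, k):
    for _ in range(k):
        prefix = list(accumulate(prefix))
    return sum(prefix) / len(prefix)

def split(seq):
    """The split operation: (a1, a2, ...) -> (a1, -a1, a2, -a2, ...)."""
    return [v for a in seq for v in (a, -a)]

def lasso_prefix(x, y, T):
    out = list(x)
    while len(out) < T:
        out.extend(y)
    return out[:T]

x, y, T = [5], [1, -1], 200000
a = lasso_prefix(x, y, T)
b = lasso_prefix(split(x), split(y), 2 * T)
print(phi(b, 1), 2**(-1) * phi(a, 0))   # k = 0: both ~0
print(phi(b, 2), 2**0 * phi(a, 1))      # k = 1: both ~5.5
```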

Corollary 7 k-total reward BW-games are special cases of (k+1)-total reward BW-games.

Proof Let us consider an arbitrary BW-game Γ = (G,B,W, r) and let us define its split, denoted by Γ̃ = (G̃, B̃, W̃, r̃), as the game obtained from Γ by subdividing each arc (u, v) of local reward r(u, v) by a vertex w = w_{uv} and defining r̃(u,w) = r(u, v) and r̃(w, v) = −r(u, v). Clearly, there is a one-to-one correspondence between the strategies in Γ and Γ̃. It is then also clear that the reward sequence arising from a play in Γ̃ is the split sequence of the reward sequence arising from the corresponding play in Γ. Theorem 4 implies that the games (G,B,W, r, φ(k)) and (G̃, B̃, W̃, r̃, φ(k+1)) are equivalent; more precisely, the effective values of the corresponding plays are equal up to a multiplicative factor of 2^{k−1}. □

Remark 6 Due to an example of a non-zero-sum mean payoff BW-game [7] that does not have a Nash equilibrium, we can conclude that, for any k ∈ Z+, there are non-zero-sum k-total reward games that have no Nash equilibria either.

Let us note that the above reduction creates games in which every directed cycle has total reward zero. In fact, we have no NE-free examples with 1-total effective payoff and without directed cycles of total reward zero.

It is also interesting to notice that in [20] the 1-total reward was viewed as a refinement of the 0-total one. The above result shows that in fact every 0-total game can be viewed as a special 1-total game. Yet, a polynomial reduction in the other direction is not known.

References

[1] H. Björklund and S. Vorobyov. Combinatorial structure and randomized subexponential algorithms for infinite games. Theoretical Computer Science, 349(3):347–360, 2005.

[2] E. Boros, K. Elbassioni, V. Gurvich, and K. Makino. Markov decision processes and stochastic games with total effective payoff. RRR-4-2014, RUTCOR, Rutgers University, 2014.

[3] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109–113, 1979.

[4] D. Gillette. Stochastic games with zero stop probabilities. In Contributions to the Theory of Games, Vol. III, volume 39 of Annals of Mathematics Studies, pages 179–187. Princeton, N.J., 1957.

[5] Hugo Gimbert and Wiesław Zielonka. When can you play positionally? In Mathematical Foundations of Computer Science 2004, volume 3153 of Lecture Notes in Computer Science, pages 686–697. Springer Berlin Heidelberg, 2004.

[6] Henry W. Gould. Combinatorial identities, 2012. http://www.math.wvu.edu/~gould/.

[7] V. Gurvich. A stochastic game with complete information and without equilibrium situations in pure stationary strategies. Russian Mathematical Surveys, 43(2):171–172, 1988.

[8] V.A. Gurvich, A.V. Karzanov, and L.G. Khachiyan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys., 28:85–91, 1988.

[9] N. Halman. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica, 49(1):37–50, 2007.

[10] G.H. Hardy and J.E. Littlewood. Notes on the theory of series (XVI): two Tauberian theorems. J. of London Mathematical Society, 6:281–286, 1931.

[11] A.V. Karzanov and V.N. Lebedev. Cyclical games with prohibition. Mathematical Programming, 60:277–293, 1993.

[12] Thomas M. Liggett and Steven A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Review, 11(4):604–607, 1969.

[13] J. F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory, 10:53–66, 1981.

[14] Rolf H. Möhring, Martin Skutella, and Frederik Stork. Scheduling with AND/OR precedence constraints. SIAM J. Comput., 33(2):393–415, 2004.

[15] H. Moulin. Extension of two person zero sum games. Journal of Mathematical Analysis and Applications, 55(2):490–507, 1976.

[16] H. Moulin. Prolongement des jeux à deux joueurs de somme nulle. Une théorie abstraite des duels. Mémoires de la Soc. Math. France, 45:5–111, 1976.

[17] N.N. Pisaruk. Mean cost cyclical games. Mathematics of Operations Research, 24(4):817–828, 1999.

[18] L. Shapley. Stochastic games. Proc. Nat. Acad. Sci. USA, 39:1095–1100, 1953.

[19] F. Thuijsman and O. J. Vrieze. The bad match, a total reward stochastic game. Operations Research Spektrum, 9:93–99, 1987.

[20] F. Thuijsman and O. J. Vrieze. Total reward stochastic games and sensitive average reward strategies. Journal of Optimization Theory and Applications, 98:175–196, 1998.

[21] S. Vorobyov. Cyclic games and linear programming. Discrete Applied Mathematics, 156(11):2195–2231, 2008.

[22] Uri Zwick and Mike Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1–2):343–359, 1996.

A Stronger Version of Lemma 1 for k = 1

Lemma 9 Let ε be a positive constant, 0 < ε < min{ 1/R, 1 − 1/2^{1/n} }, and let x(y) be a reward sequence from the set S∗ such that A(y) = 0. Then we have

| ε φ(1)(x(y)) − φ_{1−ε}(x(y)) | ≤ 2ε² n² R.

Proof Using β = 1 − ε we can write

φ(1)(x(y)) − φβ(x(y))/ε = ∑_{i=1}^{p} x_i (1 − β^{i−1}) + ∑_{j=1}^{q} y_j ( (q−j)/q − β^{p+j−1}/(1−β^q) ).

Since A(y) = 0 is assumed, we have (1 − β^{p−1}/(1−β^q)) ∑_{j=1}^{q} y_j = 0, and thus we can rewrite the right hand side as

φ(1)(x(y)) − φβ(x(y))/ε = ε ∑_{i=2}^{p} x_i (1 + β + · · · + β^{i−2}) + ∑_{j=1}^{q} y_j ( β^{p−1}(1−β^j)/(1−β^q) − j/q ).

Here we have 1 + β + · · · + β^{i−2} ≤ i − 1 and

| β^{p−1}(1−β^j)/(1−β^q) − j/q | ≤ 2εnj/q,

from which, together with |x_i| ≤ R, i = 1, . . . , p, and |y_j| ≤ R, j = 1, . . . , q, one can derive

| φ(1)(x(y)) − φβ(x(y))/ε | ≤ 2εn²R,

implying the claim of the lemma. □

B Explicit coefficients in Lemma 8

Lemma 10 Let A, k ≥ 0 be integers. Then

∑_{j=0}^{⌊k/2⌋} (−1)^j 2^{k−2j} \binom{k−j}{j} \binom{A+k−j}{k−j} = \binom{2A+k+1}{k}.   (34)

Proof Let g(A, k) denote the summation on the left hand side; thus, we want to show that g(A, k) = \binom{2A+k+1}{k}. We apply induction on A ≥ 0. For A = 0, we have g(0, k) = k + 1 by (2.4) in [6], and hence (34) holds in this case. We assume now that it holds for A, and verify it for A + 1. First we show the following claim.

Claim 1

g(A+1, k) = (1/(A+1)) ∑_{j=0}^{⌊k/2⌋} (A + k + 1 − 2j) g(A, k − 2j).   (35)

Proof

g(A+1, k) = ∑_{j=0}^{⌊k/2⌋} (−1)^j 2^{k−2j} \binom{k−j}{j} \binom{A+1+k−j}{k−j}
= ∑_{j=0}^{⌊k/2⌋} (−1)^j 2^{k−2j} \binom{k−j}{j} \binom{A+k−j}{k−j} · (A+k+1−j)/(A+1)
= ((A+k+1)/(A+1)) g(A, k) + h(k),

where

h(k) := − ∑_{j=0}^{⌊k/2⌋} (−1)^j 2^{k−2j} \binom{k−j}{j} \binom{A+k−j}{k−j} · j/(A+1)
= ∑_{j=0}^{⌊(k−2)/2⌋} (−1)^j 2^{(k−2)−2j} \binom{(k−2)−j}{j} \binom{A+(k−2)−j}{(k−2)−j} · (A+k−1−j)/(A+1)
= ((A+k−1)/(A+1)) g(A, k−2) + h(k−2).   (36)

The claim follows by iterative application of (36). □

By induction, we have g(A, k−2j) = \binom{2A+k−2j+1}{k−2j}. Thus, it remains to prove the following claim.

Claim 2

(1/(A+1)) ∑_{j=0}^{⌊k/2⌋} (A+k+1−2j) \binom{2A+k−2j+1}{k−2j} = \binom{2(A+1)+k+1}{k}.   (37)

Proof We use induction on k. The base cases k = 0 and k = 1 are easily verified. Assume the statement holds for all integers less than k. Denote by f(k) the left hand side of (37). Then

f(k) = (1/(A+1)) (A+k+1) \binom{2A+k+1}{k} + f(k−2)
= (1/(A+1)) (A+k+1) \binom{2A+k+1}{k} + \binom{2(A+1)+(k−2)+1}{k−2}
= ((2A+k+1)! / (k! (2A+3)!)) [ 2(2A+3)(A+k+1) + k(k−1) ]
= \binom{2(A+1)+k+1}{k}.  □
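Identity (34) is a finite statement for each pair (A, k), so it can also be checked exhaustively for small parameters; a short sketch (ours):

```python
from math import comb

def g(A, k):
    """Left hand side of (34)."""
    return sum((-1)**j * 2**(k - 2 * j) * comb(k - j, j) * comb(A + k - j, k - j)
               for j in range(k // 2 + 1))

assert all(g(A, k) == comb(2 * A + k + 1, k)
           for A in range(12) for k in range(12))
print("identity (34) verified for A, k = 0..11")
```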
