Cooperative user–network interactions in next generation communication networks

34
Cooperative User-Network Interactions in Next Generation Communication Networks Josephina Antoniou a,* , Vicky Papadopoulou b , Vasos Vassiliou a , and Andreas Pitsillides a a University of Cyprus Dept. Computer Science 75 Kallipoleos Street, P.O. Box 20537, CY-1678 Nicosia, Cyprus b European University Cyprus Dept. Computer Science 6, Diogenes Street, P.O. Box 22006, CY-1516, Nicosia, Cyprus Abstract Next Generation Communication Networks employ the idea of convergence, where heterogeneous access technologies may coexist, and a user may be served by anyone of the participating access networks, motivating the emergence of a Network Selec- tion mechanism. The triggering and execution of the Network Selection mechanism becomes a challenging task due to the heterogeneity of the entities involved, i.e., the users and the access networks. This heterogeneity results in different and often conflicting interests for these entities, motivating the question of how they should behave in order to remain satisfied from their interactions. This paper studies co- operative user-network interactions and seeks appropriate modes of behaviour for these entities such that they achieve own satisfaction overcoming their conflicting interests. Key words: next generation networks, access network selection, game theory, cooperative games, present value * Corresponding author Email addresses: [email protected] (Josephina Antoniou), [email protected] (Vicky Papadopoulou), [email protected] (Vasos Vassiliou), [email protected] (Andreas Pitsillides). Preprint submitted to Elsevier 20 November 2009

Transcript of Cooperative user–network interactions in next generation communication networks

Cooperative User-Network Interactions in

Next Generation Communication Networks

Josephina Antoniou a,∗, Vicky Papadopoulou b,Vasos Vassiliou a, and Andreas Pitsillides a

aUniversity of CyprusDept. Computer Science

75 Kallipoleos Street, P.O. Box 20537, CY-1678 Nicosia, CyprusbEuropean University Cyprus

Dept. Computer Science6, Diogenes Street, P.O. Box 22006, CY-1516, Nicosia, Cyprus

Abstract

Next Generation Communication Networks employ the idea of convergence, whereheterogeneous access technologies may coexist, and a user may be served by anyoneof the participating access networks, motivating the emergence of a Network Selec-tion mechanism. The triggering and execution of the Network Selection mechanismbecomes a challenging task due to the heterogeneity of the entities involved, i.e.,the users and the access networks. This heterogeneity results in different and oftenconflicting interests for these entities, motivating the question of how they shouldbehave in order to remain satisfied from their interactions. This paper studies co-operative user-network interactions and seeks appropriate modes of behaviour forthese entities such that they achieve own satisfaction overcoming their conflictinginterests.

Key words: next generation networks, access network selection, game theory,cooperative games, present value

∗ Corresponding authorEmail addresses: [email protected] (Josephina Antoniou),

[email protected] (Vicky Papadopoulou), [email protected] (VasosVassiliou), [email protected] (Andreas Pitsillides).

Preprint submitted to Elsevier 20 November 2009

1 Introduction1

Converged Networks [1] allow different access networks, terminals and ser-2

vices to coexist bringing forth a new communication paradigm, which is user-3

centric [2], i.e., the user is no longer bound to only one access network but4

may indirectly select the best available access network to support a service5

session [3]. Upon a new service request or even dynamic change affecting the6

session, (e.g., mobility) one of the participating access networks needs to be7

selected in order to support the session. Next Generation Communication8

Networks - controlled by an IP Core Network - need to be equipped with9

a Network Selection mechanism to assign the best access network to handle10

service activation, or any dynamic session change. Such decision may result11

in a single access network or even multiple networks cooperating to handle a12

service (the multiple networks case is treated in [4]). This paper studies the13

resulting interaction between a user and a network, and seeks the best be-14

haviour for each entity such that their conflicting interests are overcome and15

satisfaction is achieved.16

Interactions between entities with conflicting interests, follow action plans17

designed by each entity in such a way as to achieve a particular selfish goal, and18

are known as strategic interactions. Game Theory is a theoretical framework19

that studies strategic interactions, by developing models that prescribe actions20

in order for the interacting entities to achieve satisfactory gains from the21

situation. In this paper, we utilise Game Theory in order to model, analyze22

and finally propose solutions for the user-network interactions arising due to23

the Network Selection mechanism. Initially, we utilise Game Theory in order24

to model the relationship between the user and the network as a strategic25

game. Particularly, we utilise the notion of Present Value (PV) for a sequence26

of payoffs in a repeated game [5] in order to reach a decision that will be both27

user-satisfying and network-satisfying.28

In an attempt to improve the user-network interaction according to the knowl-29

edge acquired during the history of user-network interaction, we proceed to30

define a notion of adaptivity for the user as a player in the access network31

selection decision game, and based on that, we propose a new user strategy32

for the repeated user-network interaction game, as defined in this paper. The33

proposed adaptive strategy, together with the rest of the strategies studied34

in this paper, are evaluated through simulations that explore the behaviour35

of these strategies. The numerical results verify that the players gain more36

when cooperating, as expected from the theoretical analysis. Moreover, the37

simulation results show that the proposed adaptive user strategy results in38

profitable payoffs, also reinforcing the conclusions of the theoretical analysis.39

The paper is organized as follows. Section 2 describes related work and moti-40

2

vates the subsequent work. Section 3 investigates the user-network interaction41

both as a one-shot and as a repeated interaction. The paper continues with a42

study of access network selection (Section 4) leading to the proposal of a new43

adaptive user strategy. A numerical evaluation of all strategies analyzed in the44

paper is found in Section 5 and finally, conclusions are drawn in Section 6.45

2 Related Work and Motivation46

Although network selection is a relatively young research area, it has been47

extensively studied in recent years [6–8]. The growing popularity of networks48

that promote technology convergence and allow the co-existence of heteroge-49

neous access networks, has pushed towards the efficient resource management50

of the overall converged system upon a given dynamic change (e.g., service51

request, context change, mobility, etc).52

From the earlier works on network selection various approaches describing53

and analyzing the effects of this new resource management mechanism have54

been introduced and investigated. Such approaches include fuzzy logic [9,10],55

adaptive techniques [2,11], utility-based and game-theoretic models [12,13],56

technology-specific solutions, especially focussing on the inter-operation of57

cellular systems with wireless LANs [8,14], as well as architectural models58

focussing on a more comprehensive architectural view [3]. Decision making in59

these works is either user-controlled [10,12,14] or network-controlled [3,8,11].60

The interaction between them is not considered in detail and the selection61

decision mainly involves one entity, in some cases involving the other entity62

indirectly, i.e., decision is taken by the network with the consideration of some63

user preference [2,13], or the decision is taken by the user with the considera-64

tion of some network-specific rankings [9].65

In addition to the above techniques, more recent research development ex-66

plores more dynamic/automated solutions that, similarly to the earlier works,67

either are based on the user as a decision-making entity [15,16], or are network-68

controlled solutions proposing improvements to the user-perceived quality dur-69

ing netwok selection [6,17,18]. Besides the general solutions for network selec-70

tion, specific solutions handling particular services such as multicast, have71

been recently proposed [19,20]. These focus on the effect of network selec-72

tion on the particular service (e.g., multicast) in addition to user and network73

satisfaction aspects. Again, in these works the decision-making is either user-74

controlled or network-controlled.75

Since network selection is a mechanism involving important decision making,76

it is essential to take into consideration some strategical planning on behalf of77

the entities involved in this decision. Game theory is a theoretical framework78

3

for strategical decision-making. It has been a very popular approach among79

recently presented research works [4,7,21,22]. These explore various game the-80

oretic models such as non-cooperative games [21,22], and cooperation schemes81

where limited resources and/or need for quality guarantees exist [4,7].82

Cooperative approaches [4,7] explore the formation of coalitions between the83

various networks, where a terminal is capable to simultaneously connect to84

and be served by more than one network. The game selects the best group of85

networks to serve a new connection or a service demand forecast. In the non-86

cooperative papers, [21] studies the competition between the mobile users87

when they are able to select from a set of available access networks, each88

aiming to minimize the potential selection cost, whereas [22] considers network89

selection to be a non-cooperative game between the users and the networks90

instead of between the users themselves. A user in this game seeks to maximize91

quality of service and a network seeks to maximize its number of customers.92

Mathematical programming is used to find the equilibrium in this game.93

The current paper is closer to [22] because it models a game where the inter-94

acting entities involve the mobile user and the candidate access networks. As95

in [22], both the user and the network have different payoff functions reflecting96

their own satisfaction: best predicted quality per unit payment for the user and97

greatest profit for the network. The current paper improves on the work of [22]98

by capturing additional elements (e.g., user payment and network cost), and99

in addition it identifies and formalizes the cooperative properties instead of100

the non-cooperative aspects of the user-network interaction addressed in [22].101

We utilise the mathematical framework and theoretical tools from Game The-102

ory in order to capture the involvement of both interacting entities in the103

Network Selection mechanism, i.e., the user and the available networks. The104

game-theoretic framework enabled us to compute a solution for Network Se-105

lection that is both user- and network-controlled, and most importantly, the106

solution is satisfactory for both the user and the available networks.107

3 User-Network Interaction108

In this section we explore, using game theoretical tools, the relationship be-109

tween the user and the network participating in a converged system. Par-110

ticularly, we investigate the interaction resulting from the network selection111

procedure, and we aim through the proposed game model to reach a decision112

that is both user-satisfying and network-satisfying.113

4

3.1 Incentives114

The interaction between a user and a network in a converged system may115

be viewed as an exchange between two entities. The user gives some kind of116

compensation (most likely monetary) and the network gives a promise, in this117

case of a specific quality level, such that the promise can be evaluated in terms118

of compensation, i.e., a quality q that corresponds to a compensation c.119

Investigating the interaction between entities, we seek to discover the incen-120

tives for each entity to select certain strategies, i.e., sets of actions, that result121

in (i) a cooperative behaviour, and (ii) are such that both entities are satisfied.122

The incentive function is usually realized by the payoff of each entity involved.123

Considering a typical scenario situation of the user-network interaction, we as-124

sume that the payoffs of the user and the network are the following:125

- The payoff of the user is the difference between the perceived satisfaction,126

which is considered to be an increasing function of perceived quality, and127

the compensation offered by the user to the network. The perceived quality128

is based on quality measurements (e.g., the signal strength received by the129

terminal from the specific access network and the signal alteration rate130

measured at the terminal). The details of such a function are left for future131

work.132

- The payoff of the network is basically the profit of the network, represented133

as the difference between the compensation received from the user for the134

specific service, and the cost of supporting the session, which is considered135

to be an increasing function of the requested quality level.136

Both payoffs are based on the fact that the compensation and the quality level137

representing satisfaction in the user payoff, as well as the cost of supporting138

a session in the network payoff are comparable. In the first case, the compen-139

sation and satisfaction may be measured by similar units of satisfaction, i.e.,140

the more compensation given the less the units of satisfaction for the user,141

and the more quality received the more the units of satisfaction for the user.142

In the second case, quality corresponds to the cost for setting up the resources143

to support it and again we may view the comparison in terms of units of satis-144

faction, i.e., the more compensation received the more the units of satisfaction145

for the network, whereas the more the quality to be supported, the less the146

units of satisfaction for the network.147

3.2 General Assumptions and Requirements148

In order to explore the realization of the above incentives, we proceed to149

model the user-network interaction as a game, based on the following initial150

5

assumptions :151

(1) The players in the network selection game are heterogeneous players,152

aiming at different payoffs. This assumption is realistic given the diverse153

nature of a network and a user.154

(2) The modeling of the game assumes a complete game, i.e., one in which155

players are aware of the available actions and corresponding payoffs of156

their opponents, but of imperfect information, since the players make157

decisions without having knowledge of their opponents’ moves.158

(3) There is a minimum and a maximum compensation as well as a minimum159

and a maximum quality.160

(4) There is always a probability that quality degradation will be perceived161

by the user. This assumption is justified by the dynamic nature of the162

network, even if the network decides to offer the requested quality.163

(5) At any time the sum of the payoffs of the two players is a constant value,164

not equal to zero, i.e. a general sum game model; neither player wins165

as in a zero-sum game. This is a reasonable assumption since we aim to166

extract conclusions on the cooperation of the two entities.167

(6) Both the user and the network have non-negative payoff functions. This168

assumption is introduced in order to motivate the players to participate169

in the interaction (reflecting a selection of an appropriate access network170

during Network Selection).171

The above assumptions result in the following specifications, which are im-172

posed by the game model:173

- Assumption (6) results in the requirement that the maximum compensation174

offered by the user is less than or equal to the satisfaction corresponding175

to the minimum requested quality. Thus, given that compensation and sat-176

isfaction from perceived quality are measurable in comparable units, the177

maximum compensation offered by the user should be defined to be less178

than or equal to the satisfaction corresponding to the minimum requested179

quality. In this way, the user plays the game without risking to have a180

negative payoff, satisfying Assumption (6).181

- Assumption (6) results in the requirement that the minimum compensation182

offered by the user is greater than or equal to the cost of the maximum183

requested quality. Thus, given that the compensation and cost of support-184

ing a requested quality are measurable in comparable units, the minimum185

compensation offered by the user should be defined to be greater than or186

equal to the cost for the maximum requested quality. In this way, the net-187

work plays the game without risking to have a negative payoff, satisfying188

Assumption (6).189

- Moreover at any time of the user-network interaction, we require that the190

network’s cost from supporting the requested service is less than the com-191

pensation offered by an amount ε > 0, such that the network may gain at192

6

least marginal profit from this interaction.193

Given ε > 0, it is better for a network not to reject the user’s offer, but accept194

it and decide whether to cooperate, i.e., offer the requested quality, or cheat,195

i.e., offer a lower quality than the one requested. On the other hand, we also196

allow the user to be able to cheat by giving a lower amount of compensation197

than the one offered as incentive to initiate the user-network interaction.198

Initially, in our study, we treat this interaction as a one-shot interaction in199

order to derive some fundamental properties of the interaction (although un-200

realistic since users and networks interact multiple times). In particular, the201

theoretical analysis of the one-shot model of the game reveals that cooperation202

between the entities is not achieved. This result motivated us to investigate203

whether the knowledge of previous outcomes of the repeated interaction, which204

is a more realistic assumption, results in cooperative behaviour.205

3.3 One-Shot User-Network Interaction Game206

In this subsection, we examine the interaction by considering it first to be a207

one-shot game, i.e., that it is not affected by outcomes of previous interactions208

and that it does not affect any future interactions between the two entities.209

The user-network interaction model is defined as follows.210

Definition 3.1 (User-Network Interaction game) Let the user choose an211

incentive compensation scheme κi ∈ K to offer to the network, where K =212

{κ1, κ2, . . . κn}, in order to encourage the network to make and keep a promise213

of offering qi ∈ Q, where Q = {q1, q2, . . . qn}. The user has the capability to214

cheat and offer κ′i, such that ∀ i ∈ [n], κ′i < κi, but this will not be known to215

the network until the end of the game when payoffs are collected. The network216

makes a decision of whether to accept the user’s compensation κi and promise217

to offer qi or to reject it. If the network rejects the offer, the interaction ter-218

minates and the payoff to both players is 0. If the network accepts the offer,219

it can decide to keep the promise of supporting qi or to cheat, by offering q′i,220

such that ∀ i ∈ [n], q′i < qi. This will be known to the user also at the end of221

the game, when payoffs are collected.222

Regardless of the network’s decision, there is a random event, represented by223

Nature, i.e. an event that is not controlled by neither the network nor the224

user. This random event has two outcomes: the quality offered by the network225

degrades (the user perceives a lower quality q′i < qi), or the perceived quality in-226

dicates no degradation (quality qi perceived by the user). A quality degradation227

might be observed by the user, even when the network takes all the necessary228

measures to keep the promise for the requested quality. Whether the network229

7

cheats, or Nature’s event is one of degradation, or whether both occur, the230

user perceives a quality q′i without knowing what caused it.231

In the rest of the section, we assume a fixed requested quality q and corre-232

sponding compensation κi, so we simplify κi to κ, and qi to q. Table 1 illustrates233

the possible payoffs of the user and the network in the user-network interac-234

tion game, where π(q) represents the satisfaction resulting from the perceived235

quality q, and c(q) represents the cost for supporting quality q. The payoffs236

are presented by order pairs: the first term is the user payoff and the second237

term is the network payoff.238

Table 1User & Network play the Prisoner’s Dilemma, Nature’s Play: No Degradation

Network Accepts & Cooperates Network Accepts & Cheats

User Accepts & Cooperates π(q)− κ, κ− c(q) π(q′)− κ, κ− c(q′)

User Accepts & Cheats π(q)− κ′, κ′ − c(q) π(q′)− κ′, κ′ − c(q′)

3.3.1 Equivalence to Prisoner’s Dilemma239

Consider the well-known Prisoner’s Dilemma Game (Definition 3.2).240

Definition 3.2 (Prisoner’s Dilemma type of game) [23] Consider an one-241

shot strategic game with two players in which each player has two possible242

actions: to cooperate with his opponent or to defect from cooperation. Further-243

more, assume that the two following additional restrictions on the payoffs are244

satisfied:245

(1) The order of the payoffs is shown in Table 2 for each player j ∈ [2] and246

is such that Aj > Bj > Cj > Dj.247

(2) The reward for mutual cooperation should be such that each player is not248

motivated to exploit his opponent or be exploited with the same probability,249

i.e. for each player it must hold that Bj >Aj+Dj

2.250

Table 2General payoffs for the Prisoner’s Dilemma

Player 2 Cooperates Player 2 Cheats

Player 1 Cooperates B1, B2 D1, A2

Player 2 Cheats A1, D2 C1, C2

Then, the game is said to be equivalent to a Prisoner’s Dilemma type of game.251

Consider now the following outcome for the user-network interaction game:252

the network accepts the user’s offer and the outcome of Nature’s random event253

8

is no degradation. We prove that this outcome of the game is equivalent to a254

Prisoner’s Dilemma type of game. In the subsequent discussion, we make use255

of the following fact:256

Fact 3.3 For a set quality q it holds that π(q) − κ > 0, i.e. the difference257

between the maximum satisfaction perceived by the user and the maximum258

compensation given by the user is greater than 0. Similarly, the difference259

between the minimum satisfaction and the minimum compensation is greater260

than 0, i.e. π(q′)− κ′ > 0.261

We make the following assumptions for the User-Network Interaction game:262

(1) The difference between the maximum and minimum satisfaction, π(q)−263

π(q′), is greater than the diifference between the maximum and minimum264

compensation, κ−κ′, that may be offered by the user, since the user will265

always prefer that the maximum compensation offered is set as low as it is266

allowed by the platform administrator, whereas the satisfaction received267

from the maximum quality may be evaluated as high as possible, since268

no such constraints exist.269

(2) The network accepts to participate in the interaction in a riskless man-270

ner, i.e. only if the range of possible compensations exceeds the range of271

possible costs, κ− κ′ > c(q)− c(q′).272

Table 3The mapping between the user & network payoffs and the payoffs in the Prisoner’sDilemma type of game

User Payoffs (j=1) Network Payoffs (j=2)

Aj π(q)− κ′ κ− c(q′)

Bj π(q)− κ κ− c(q)

Cj π(q′)− κ′ κ′ − c(q′)

Dj π(q′)− κ κ′ − c(q)

Proposition 3.4 Consider the User-Network Interaction game. Assume that273

the network accepts the user’s offer, and the event of Nature does not generate274

degradation. Then the game is equivalent to a Prisoner’s Dilemma game.275

PROOF. By Definition 3.1 we immediately conclude that:276

Claim 3.5 There are two possible actions for both players: to cooperate and277

to cheat (i.e. defect from cooperation).278

Claim 3.5 combined with Definition 3.2 imply that the actions of the players279

in the user-network interaction game match the actions of the players of a280

9

Prisoner’s Dilemma game. In particular, Table 3 maps each player’s payoffs281

(for the cases that the network accepts the user’s offer), of Table 1 to actions282

Aj, Bj, Cj, Dj, where j ∈ [2], as defined in Definition 3.2. We proceed to prove:283

Lemma 3.6 Set Aj, Bj, Cj, Dj according to Table 3. Then it holds that: Aj >284

Bj > Cj > Dj for each j ∈ [2].285

PROOF. The User-Network Interaction game satisfies condition 1 of Defini-286

tion 3.2, when the network accepts the user’s offer and no degradation occurs.287

Examining the user, we verify straightforward that π(q)− κ′ > π(q)− κ, thus288

A1 > B1, and that π(q′)− κ′ > π(q′)− κ, thus C1 > D1, since κ > κ′. Given289

Fact 3.3 and the assumption that π(q)−π(q′) > κ−κ′, it holds that B1 > C1.290

Examining the network, we verify straightforward that κ − c(q′) > κ − c(q),291

thus A2 > B2, and that κ′ − c(q′) > κ′ − c(q), since c(q) > c(q′), thus C2 >292

D2. Assuming that the network accepts to participate in the interaction in a293

riskless manner, i.e. only if the range of possible compensations exceeds the294

range of possible costs, then κ− κ′ > c(q)− c(q′), and B2 > C2. 2295

We now proceed to prove that:296

Lemma 3.7 The User-Network Interaction game, satisfies condition 2 of Def-297

inition 3.2.298

PROOF. To prove the claim we must prove that the reward for cooperation299

is greater than the payoff for the described situation, i.e. for each player it300

must hold that Bj >Aj+Dj

2.301

For the user,

π(q)− κ > π(q)− κ′ + π(q′)− κ2

>π(q) + π(q′)

2− κ′ + κ

2, (1)

since π(q) > π(q′) and κ > κ′.302

For the network,

κ− c(q) > κ− c(q′) + κ′ − c(q)2

>κ+ κ′

2− c(q) + c(q′)

2, (2)

since c(q) > c(q′). 2303

Claim 3.5, Lemma 3.6, and Lemma 3.7 together complete the proof of Propo-304

sition 3.4. 2305

10

The decision of each player in the User-Network Interaction game is based on306

the following reasoning [23]: If the opponent cooperates, defect to maximize307

payoff (Aj in Table 3); if the opponent defects, defect (payoff Cj instead of308

payoff Dj). This reasoning immediately implies:309

Corollary 3.8 [23] In a one-shot Prisoner’s Dilemma game, for any profile310

of the game, a best response strategy of both players is to defect.311

Proposition 3.4 combined together with Corollary 3.8 immediately imply:312

Corollary 3.9 In the one-shot user-network interaction game, when the event313

of Nature is one of no degradation, a best response strategy of both players is314

to cheat.315

Corrolary 3.8 and Corollary 3.9 give rise to the question of how cooperation316

could be motivated, since the user and the network must cooperate in or-317

der to satisfy their corresponding needs for higher quality levels and higher318

compensation; if both players are motivated to cheat, then the quality and319

compensation are kept at minimum levels. A memoryless user-network inter-320

action results in no cooperation between the players. However, since in realistic321

scenarios previous outcomes of the interaction can affect the behaviour of the322

two entities, we introduce in Section 3.4, more realistic models, in an attempt323

to check whether cooperation could be reached. In particular, we present a324

model that captures the case where players have a one-unit window of the325

previous history of the game, i.e., they know their opponent’s last action.326

3.3.2 Sequential Moves Game327

Here, we present a sequential moves game modeling the user-network interac-328

tion, in order to check whether the same conclusions hold. In this game, the329

two involved entities make sequential decisions, i.e., the user makes a decision330

first and the network makes the decision second. The game ends by a random331

event, which is not controlled by either of the two entities. We utilise a no-332

tion from Game Theory to model this sequential decision-making: sequential333

moves games, a.k.a extensive games.334

According to this model, the user is a first-mover, the network moves second,335

and each player has one more move before the event of Nature, which randomly336

sets the outcome of the game to be equal to degradation or no degradation.337

The payoff functions and the set of actions of the players remain the same338

as in the one-shot game. Extensive games can be represented as trees, where339

branches represent decisions and nodes represent decision makers (except the340

leaves which represent end states). So, the user-network interaction game is341

modeled in extensive form as illustrated in Figure 1.342

11

Fig. 1. The one shot sequential moves game of user-network interaction in extensiveform

In the case where the network decides to cooperate and keep the promise to343

the user, degradation might or might not be perceived. If degradation is not344

perceived, the payoff to the user is the result of the difference between an345

increasing function of quality q, representing the user’s satisfaction, π(q), and346

the offered compensation κ, such that π(q) − κ; the payoff to the network347

in this case, is the difference between the received compensation κ and an348

increasing function of quality q ∈ Q, representing the network’s cost, c(q), that349

is κ−c(q). On the other hand, if degradation is perceived, the quality perceived350

is q′ < q, and the payoff to the user changes to π(q′)− κ, where π(q′) < π(q).351

However, the payoff to the network does not change; we assume that both the352

compensation offer and the setup for supporting a specific quality level (and353

consequently the relevant cost), happen at the beginning of a session, whereas354

any degradation is not observed until the end of the session. In the case that355

the network decides to cheat, the payoffs to the network also change, since356

the cost for quality q′ < q decreases such that c(q′) < c(q). All the payoffs are357

illustrated at the leaves of the tree in Figure 1.358

Both players want to make choices that maximize their payoff since they are359

rational players. Let’s first examine the behaviour of the network. The first360

choice involves whether to accept or reject a compensation by the user. Since361

we have considered only the cases where the compensation offered is always362

greater than or equal to the network’s cost, the best choice for the network is363

to accept the offer made by the user. Given that the network will accept the364

compensation offer, it knows that the lower the quality for which it reserves365

resources, the lower its own cost will be. Since, according to the payoffs, the366

network will not experience any punishment for cheating in the one-shot game,367

12

and since the current decision of the network has no impact on any future368

interactions between the two players, the network is motivated to only offer369

minimum quality, increasing its own payoff but also cheating in every case that370

the quality promised was greater than the minimum. Since Nature does not371

affect the network’s payoff, cheating is the decision that will give the highest372

payoff to the network.373

Now, since the user is a rational player and wants to maximize his own payoff,374

he will choose to offer a payment that in combination with the quality level re-375

quested will give him the highest attainable payoff. Using backward induction376

thinking, i.e., by considering which is the best move for the network, the user377

may guess that the best response to the network’s best behaviour of cheating378

is to only offer minimum compensation, expecting in return minimum quality.379

This is done in order to maximize his own payoff as well, by making sure not to380

experience any losses due to the network’s cheating behaviour. Corollary 3.10381

summarizes these findings:382

Corollary 3.10 In the sequential moves one-shot game modelling the user-383

network interaction, the outcome of the game is the following:384

The user offers minimum compensation and the network supports minimum385

quality, with analogous payoffs as in the case where both entities were cheating.386

3.3.3 Remarks on One-Shot Game387

The above discussion on the one-shot game leads to the conclusion that coop-388

eration between the user and the network is not supported for all quality levels.389

This conclusion motivated our research to move towards a different model for390

the user-network interaction game, a repeated game model. The result of this391

study is presented in the next section, where we model the user-network in-392

teraction game as a repeated game with infinite horizon [5], particularly as a393

repeated Prisoner’s Dilemma game, with the payoffs for each period as indi-394

cated in Table 1.395

3.4 Repeated User-Network Interaction Game396

If the one-shot game model were an accurate representation of the real in-397

teraction, why would networks ever be motivated to satisfy users’ requests398

for higher qualities? The reason is that, in reality, the interactions between399

networks and users are not one-shot but re-occuring. More importantly, the400

previous outcomes of their interactions affect their future behaviours. In such401

relationships, the players do not only seek the immediate maximization of pay-402

offs but instead the long-run optimal solution. Such situations are modelled403

13

in Game Theory by repeated games. There are two kinds of repeated games:404

the finite horizon repeated games and the infinite horizon repeated games,405

which are actually models of games of unknown length [5]. We categorize the406

user-network interaction model as an infinite horizon repeated game, since the407

users keep requesting new sessions from the networks but the number of such408

requests is not known.409

A repeated game makes it possible for the players to condition their moves on410

the complete previous history of the various stages, by employing strategies411

that define appropriate actions for each period. Such strategies are called trig-412

ger strategies [24]. A trigger strategy is a strategy that changes in the presence413

of a predefined trigger; it dictates that a player must follow one strategy until414

a certain condition is met and then follow a different strategy, for the rest415

of the game. One of the most popular trigger strategies is the grim trigger416

strategy [5], which dictates that the player participates in the relationship,417

but if dissatisfied for some known reason, leaves the relationship forever. The418

grim trigger strategy may be used by the user in the user-network interaction419

game, such that if degradation is detected by the user in one stage, in the next420

stage the user may punish the network by leaving the relatiosnhip (e.g., stop421

interacting with the specific network). Given such a strategy, the network has422

a stronger incentive to keep the promise given for a certain quality, since it423

faces the threat of losing its customer forever. The threat of non-renewal of424

the user’s contract to the network, secures compliance of the network to keep425

the promise of quality. Exchanges based on such threats of non-renewing a re-426

lationship, which is based on a particular agreement between the two parties,427

are often referred to as Contingent Renewal Exchanges [24]. Therefore, the428

user employs a grim trigger strategy to elicit performance from the network429

and the loss of the relationship is costly to the network because it has a nega-430

tive impact on the user-network relationship. Another popular strategy used431

to elicit cooperative performance from an opponent, is for a player to mimic432

the actions of his opponent, giving him the incentive to play cooperatively,433

since in this way he will be rewarded with a similar mirroring behaviour. This434

strategy is referred to as tit-for-tat strategy [5].435

The subsequent study of the repeated user-network interaction employs the436

grim strategy as a possible strategy for the user and the tit-for-tat strategy437

as a possible strategy for the network, since the strengths of each strategy438

mentioned above are considered appropriate for the user-network interaction.439

In addition, we define initially, two more strategies for the user and one more440

strategy for the network. Specifically, for the user we define: (a) the Cheat-and-441

Leave strategy, and (b) the Leave-and-Return strategy, and for the network442

we define the Cheat-and-Return strategy. The Cheat-and-Leave strategy is443

defined for the user to reflect the user’s option to cheat, since the grim strat-444

egy is a cooperative strategy. The user leaves after cheating in this case, i.e.,445

does not continue interaction with the particular network, in order to avoid446

14

any punishment for cheating. The Leave-and-Return strategy is a cooperative447

strategy similar to the grim strategy but we define it so as to capture the case448

where the punishment if the network cheats does not involve the user leaving449

the relationship forever, but leaving for only one period and returning in the450

subsequent interaction period. Similarly, the Cheat-and-Return strategy gives451

the opportunity to the Network to cheat, and since it cannot in reality leave452

the user-network relationship (if the user selects to interact with a particular453

network), it returns to the interaction and accepts the user’s punishment, if454

any. Consequently, the profiles considered in the analysis involve combinations455

of these strategies.456

3.4.1 Present Value457

Examining the user-network relationship, we observe that even though the458

network might decide to keep the promise for a specific quality, there is a459

probability that degradation will be perceived, since this can not be completely460

controlled by the network. Therefore, there is always a probability that the461

game will not continue in the next period, which is equal to the probability that462

degradation is perceived. We model this probability as a function of quality,463

p(q). In order to compare different sequences of payoffs in repeated games, we464

utilize the idea of the present value of a payoff sequence [5], and we refer to465

it as the Present Value (PV). PV is the sum that a player is willing to accept466

currently instead of waiting for the future payoff, i.e., accept a smaller payoff467

today that will be worth more in the future, similar to making an investment468

in the current period that will be increased by a rate r in the next period.469

Therefore, if the payoff in the next period were equal to 1, today the payoff a470

player would be willing to accept would be equal to 11+r

. If there is a probability471

that the game will not continue in the next period, equal to the probability472

of no degradation 1− p(q), then the payoff a player is willing to accept today,473

i.e., the player’s PV, would be equal to 1−p(q)1+r

, where p(q) gives the probability474

of degradation, which is not controlled by the network. Let δ = 1−p(q)1+r

, where475

δ ∈ [0, 1] and often referred to as the discount factor in repeated games [5].476

Therefore, given a payoff X in the next period, its PV in the current period477

equals δ · X. Now, for an infinitely repeated game, a PV should include the478

discounted payoff of all subsequent periods of the game. Let the payoff from the479

current period be equal to 1. Then, the additional payoff a player is willing to480

accept for the next period equals to δ, for the period after that the additional481

payoff equals to δ2 and so on. Thus, PV equals to 1+δ+δ2+δ3+δ4+. . . , which,482

according to the sum of infinite geometric series, equals to 11−δ . Therefore, for483

a payoff X payable at the end of each period, the present value in an infinitely484

repeated game equals to X1−δ .485

15

In order to determine whether cooperation is a better strategy in the repeated486

game for both the network and the user, we utilize PV and examine for which487

values of δ = 1−p(q)1+r

a given strategy is a player’s best response to the other488

player’s strategy. A strategy in an infinitely repeated game, gives the action489

to take at each decision node. At each decision node, i.e., for each period the490

network has the choice of two actions: either to take the risk and cooperate491

with the user offering the requested quality, or to cheat and offer the lowest492

quality. In case the network cheats, and the grim trigger strategy is used, the493

user will leave the network at the end of the session, because degradation will494

definitely be perceived.495

3.4.2 Equilibria496

Since in such a game we have an infinite number of decision nodes, we describe497

decision nodes in terms of histories, i.e., records of all past actions that the498

players took [25], thus a history corresponds to a path to a particular decision499

node in the infinitely repeated game tree. When a strategy instructs a player500

to play the best response to the opponent’s strategy after every history, i.e.,501

giving the player a higher payoff than any other action available after each502

particular history, it is called a subgame perfect strategy [5]. When all players503

play their subgame perfect strategies, then we have an equilibrium in the504

repeated game, known as a subgame perfect equilibrium [5].505

Definition 3.11 When a player cheats in one period, and leaves in the next506

period to avoid punishment by the other player, the strategy employed is re-507

ferred to as the cheat-and-leave strategy.508

Definition 3.12 When a player cheats in one period, and returns to cooper-509

ation to accept the punishment by the other player, the strategy employed is510

referred to as the cheat-and-return strategy.511

Let the user have a choice between the two following strategies: (i) the grim512

strategy, i.e., offer a compensation κ and keep offering the compensation as513

long as the network cooperates and no degradation is perceived; if degradation514

is perceived, then leave the relationship forever, and (ii) the cheat-and-leave515

strategy. Let the network have a choice between the two following strategies:516

(a) the tit-for-tat strategy, i.e., mimic the actions of its opponent, and (b) the517

cheat-and-return strategy. When neither of the two players cheats, the game518

profile is one of cooperation as identified in Definition 3.13.519

Definition 3.13 When the user employs the grim strategy and the network520

employs the tit-for-tat strategy, the profile of the repeated game is referred to521

as conditional-cooperation profile of the game.522

16

Next, we formulate Theorem 3.14, which identifies the conditional-cooperation523

profile as a subgame perfect equilibrium for the repeated user-network inter-524

action game.525

Theorem 3.14 In a repeated user-network interaction game, assume δ >526

c(q)−c(q′)κ−c(q′) and δ > κ−κ′

π(q)−κ′ , where δ is the discount factor of the repeated game.527

Then, the conditional-cooperation profile is a subgame perfect equilibrium for528

the game.529

PROOF. Since the players’ strategies suggest that cheating by either playerwould eventually result in a termination of the interaction, we assume a historyof cooperative moves in the past. Then in the current period, both the userand the network could choose to either cooperate or cheat. Assume that theuser plays the grim strategy. If the network cooperates, PV is equal to,

PVcooperate =κ− c(q)

1− δ

If the network cheats, PV will be a sum of what the network may get in thecurrent period and what the network may get in the rest of the game beginningin the next period. Thus,

PVcheat = κ− c(q′) +δ · 01− δ

For the network to be motivated to cooperate with the user instead of cheating,we must show that PV in case of cooperation is preferrable for the networkthan PV in case of cheating. The inequality is given next.

PVcooperate > PVcheat =κ− c(q)

1− δ> κ− c(q′) +

δ · 01− δ

Simplifying, we get δ > c(q)−c(q′)κ−c(q′) . Now, assume that the user plays the cheat-

and-leave strategy. If the network cooperates, its PV is equal to,

PVcooperate = κ′ − c(q) +δ · 01− δ

On the other hand, if the network cheats, its PV is equal to,

PVcheat = κ′ − c(q′) +δ · 01− δ

Clearly, in case the user plays the cheat-and-leave strategy, the network is notmotivated to cooperate since:

PVcheat > PVcooperate

17

Next, assume that the network employs a tit-for-tat strategy, and the usermay choose to cooperate or cheat. The PV if the user cooperates is the valueof the current payoff, π(q)− κ discounted for the whole infinite game.

PVcooperate =π(q)− κ

1− δ

The PV if the user cheats, i.e., offers a compensation κ′ < κ and then leavethe relationship, is the current payoff π(q) − κ′ plus the forever discountedpayoff for leaving the relationship, which is 0.

PVcheat = π(q)− κ′ + δ · 01− δ

The inequality of cooperation compared to cheating is given next.

PVcooperate > PVcheat =π(q)− κ

1− δ> π(q)− κ′ + δ · 0

1− δ

Simplifying, we get δ > κ−κ′

π(q)−κ′ . Now, assume that the network plays the cheat-and-return strategy. If the user cooperates, its PV is equal to:

PVcooperate = π(q′)− κ+δ · 01− δ

On the other hand, if the user cheats, its PV is equal to,

PVcheat = π(q′)− κ′ + δ · 01− δ

Clearly, in case the network plays the cheat-and-return strategy, the user isnot motivated to cooperate since:

PVcheat > PVcooperate

Thus, it has been shown that as long as δ > c(q)−c(q′)κ−c(q′) and δ > κ−κ′

π(q)−κ′ , the530

conditional-cooperation profile is the subgame perfect equilibrium for the user-531

network interaction game. 2532

Remark 3.15 For the network, cooperation is motivated if δ > c(q)−c(q′)κ−c(q′) .533

Since c(q′) < c(q) < κ, Theorem 3.14 implies that the bigger c(q′) is the534

more motivated the network is to cooperate; in other words, if the network535

cannot save a substantial fraction of the cost by cheating, then it is best to co-536

operate. For the user, cooperation is motivated if δ > κ−κ′

π(q)−κ′ ; since κ′ < κ and537

κ < π(q). Theorem 3.14 implies that the greater κ′ is, the better it is for the538

user to cooperate, in other words; unless the user can lower κ′ substantially,539

he will gain more by cooperating.540

18

Let the strategies selected by the user and by the network, be such that the541

punishment they impose on their opponent, in case the opponent cheats, lasts542

only for one period; namely, let the user employ the leave-and-return strategy543

and the network employ the cheat-and-return strategy as these are defined544

below.545

Definition 3.16 When a player cooperates as long as the other player coop-546

erates, and leaves for one period in case the other player cheats, returning in547

the subsequent period to cooperate again, the strategy employed is referred to548

as the leave-and-return strategy.549

Based on the newly defined strategy, another profile of the game is defined550

next.551

Definition 3.17 When the user employs the leave-and-return strategy and552

the network employs the tit-for-tat strategy, the profile of the repeated game is553

referred to as one-period-punishment profile of the game.554

It is assumed that when punished, a player accepts the punishment without555

cheating. This set of strategies is not as strict as the one previously considered,556

since the punishment lasts for only one period for the player that cheats. It557

has been proven in [26], that the conditions to sustain cooperation with grim558

trigger strategies, which are the stricter strategies that may be employed by559

the players in a repeated Prisoner’s Dilemma, are necessary conditions for the560

possibility of any form of conditional cooperation, i.e., a grim trigger strategy561

can sustain cooperation in the iterated Prisoner’s Dilemma under the least562

favourable circumstances of any strategy that can sustain cooperation. Moti-563

vated by the result in [26], and assuming a history of cooperation between the564

user and the network, we show that it is easier to impose cooperation in the re-565

peated user-network interaction game under the conditional-cooperation pro-566

file elaborated in Definition 3.13, where the imposed punishment ends in the567

termination of the interaction, rather than under the one-period-punishment568

profile.569

Theorem 3.18 Assume that δ > c(q)−c(q′)κ−c(q′) and δ > κ−κ′

π(q)−κ′ in the repeated570

User-Network Interaction game. Then, the conditional-cooperation profile mo-571

tivates cooperation of the players. These conditions on δ are also necessary to572

motivate cooperation in the one-period-punishment profile.573

PROOF. Given a history of the game where both players have cooperatedin the past, and that the user has decided to cooperate for the current pe-riod, the network has two options: to cooperate or to cheat. If the networkcooperates, PV will consist of the payoff κ− c(q) in the current period, plus adiscounted version of this payoff for the next period plus a discounted versionof the same payoff for all the rest of the periods. We consider the first two

19

periods separately in order to be able to compare PV when cheating, since thebehaviour of the network alternates in these periods of the game. Therefore,

PVcooperate = κ− c(q) + δ · (κ− c(q)) +δ2 · (κ− c(q))

1− δ

If the network cheats, PV is a sum of what the network may get in the currentperiod, what the network may get in the next period (when it is punished)and what it may get in the rest of the game beginning two periods from thecurrent period. Thus,

PVcheat = κ− c(q′) + δ · 0 +δ2 · (κ− c(q))

1− δ

The network’s motivation to cooperate is calculated in terms of δ, by showing574

through an inequality that PV in case of cooperation is preferrable for the575

network than PV in case of cheating as follows:576

PVcooperate > PVcheat =

κ− c(q) + δ · (κ− c(q)) +δ2 · (κ− c(q))

1− δ>

κ− c(q′) + δ · 0 +δ2 · (κ− c(q))

1− δ

Simplifying, we get δ > c(q)−c(q′)κ−c(q) . Now, considering that the network decides to

cooperate in the current period, the user has two options as well: to cooperateor to cheat. The PV of the user if he cooperates is the sum of the current payoff,π(q)− κ, the discounted same payoff for the next period, and the discountedsame payoff for the rest of the infinitely repeated game. We consider threediscounted periods to facilitate the comparison with the user’s PV in the casethat he cheats.

PVcooperate = π(q)− κ+ δ · (π(q)− κ) +δ2 · (π(q)− κ)

1− δ

If the user decides to cheat, PV is a sum of the current payoff π(q) − κ′, thediscounted payoff π(q′)− κ in the next period (since the network employs thetit-for-tat strategy), and the cooperation payoff π(q) − κ discounted for therest of the game.

PVcheat = π(q)− κ′ + δ · (π(q′)− κ) +δ2 · (π(q)− κ)

1− δ

We compare PVs for cooperation and cheating in the inequality given next.577

20

Table 4Cooperation Thresholds

conditional cooperation one-period punishment

Network Cooperates If: δ > c(q)−c(q′)κ−c(q′) δ > c(q)−c(q′)

κ−c(q)

conditional cooperation one-period punishment

User Cooperates If: δ > κ−κ′

π(q)−κ′ δ > κ−κ′

π(q)−π(q′)

PVcooperate > PVcheat =

π(q)− κ+ δ · (π(q)− κ) +δ2 · (π(q)− κ)

1− δ>

π(q)− κ′ + δ · (π(q′)− κ) +δ2 · (π(q)− κ)

1− δ

Simplifying, we get δ > κ−κ′

π(q)−π(q′). The cooperation thresholds for each player,578

given its opponent’s different strategies, are summarrized in Table 4. The con-579

ditions for sustaining cooperation under the conditional-cooperation profile are580

necessary for sustaining cooperation under the one-period-punishment profile581

as well. 2582

3.4.3 Remarks on Repeated Games583

It appears to be easier to sustain cooperation when strategies that involve584

harsher punishments are used. For the network, cooperation is motivated if585

δ > c(q)−c(q′)κ−c(q) , which is a similar result with the result for the conditional-586

cooperation profile. The difference between the two results is the second term587

in the denominator, c(q) instead of c(q′). Therefore, the second result is always588

greater than the first since c(q) > c(q′). For the user, cooperation is motivated589

if δ > κ−κ′

π(q)−π(q′), differing from the result of the conditional-cooperation profile590

in the second term of the denominator only. Since π(q′) < π(q), and given the591

initial model assumptions that any compensation offered is less or equal to592

the minimum satisfaction that could be received, the second result (under the593

one-period-punishment profile) is always greater than the first result (under594

the conditional-cooperation profile), since based on the above the denominator595

is always less.596

On a more practical note, in order to be able to enforce these solutions in a real597

heterogeneous communication network, additional issues must be considered598

such as the possible architecture that would enable easier management of the599

heterogeneity of the system, the repeatability of the user-network interaction,600

and the compensation set. The architecture considered to host such model is601

envisioned to be a hierarchical system, where a platform administrator (either602

a centralized or a distributed process) has knowledge of all participating access603

21

networks joined to a common core network, which is either IP-based with SIP604

signalling, or supporting a fully implemented IP Multimedia Subsystem (IMS)605

infrastructure to ensure multimedia support over all participating networks in606

an access-agnostic manner. Such architecture would support the existence of607

several autonomous entities (e.g., content and context providers, network oper-608

ators, etc.) motivating the overall architecture to be more user-centric instead609

of network-centric, since all these entities have a common goal of satisfying610

the user in order to receive the appropriate compensation/payment as in the611

user-network interaction model.612

Furthermore, the existence of several autonomous entities acting indepen-613

dently requires the existence of certain policies which may integrate the in-614

terests of these entities by enforcing some rules for the better management615

and operation of the overall system. Such policies may deal with setting the616

compensation, i.e., the payment, corresponding to a particular quality level617

of a requested service. This has to do with the range of qualities in which a618

particular content (e.g., a video) is available and the corresponding costs and619

allowable profits.620

In addition, policies may have to do with strategy/profile configurations en-621

forced onto users and networks by the platform administrator process. To622

achieve the repeatability, which is a major element in these strategies, there623

arises the need for the existence of a variable as part of the internal logic624

of a user terminal or a network gateway node such that both the user and625

the network may remember the previous action of the opponent entity. Thus,626

ideas of reward and punishment, elaborated in the theoretical model, may be627

implemented. In any case, the element of repeatability exists, since each user628

terminal has a number of activated access network interfaces (corresponding to629

different Radio Access Technologies - RATs) with which it interacts repeat-630

edly; each of these access networks may be under a different profit-seeking631

network operator who participates in the heterogeneous communication sys-632

tem.633

4 Access Network Selection Decision634

4.1 The User as an Adaptive Entity635

Having examined the generic relationship between a user and a network through636

two different types of strategies for each of the players, in this section we pro-637

pose a model for network selection in a converged environment, which is based638

on the knowledge obtained from the analysis of these strategies. The Network639

Selection decision is modelled as a game between one user of the converged640

22

environment and the participating access networks that are available to the641

specific user: the networks play simultaneously as one player (the payoff for642

the player called Networks is given as an array of payoffs corresponding to643

each of the individual access networks).644

The situation we model is the following: the user plays first and offers a com-645

pensation to Networks, and Networks examine the compensation and decide646

how many networks will accept and how many networks will reject the com-647

pensation. Any subset of Networks could accept or reject the proposed offer,648

including all accepting and all rejecting. In the latter case the game termi-649

nates with zero payoff to the user and to Networks. If one or more networks650

accept the compensation, then for each network, the user predicts his own651

satisfaction, in terms of π(q)expected, and selects the network that is predicted652

to offer the highest satisfaction, evaluated using several parameters; if only653

one network accepts, the selection is trivial.654

The evaluation of π(q)expected is based on an evaluation of user and network655

context in an attempt to predict the possible satisfaction from the received656

quality, and is different for each network. Network context includes such pa-657

rameters as: QoS offered by the network, maximum throughput offered by the658

network, current signal strength at the terminal, current signal alteration rate659

at the terminal, current noise/interference etc. These parameters are compared660

to the service QoS requirements corresponding to the specific service request,661

the user terminal capabilities and the user preferences in order to generate an662

adequately accurate prediction of the possible satisfaction (in terms of per-663

ceived quality). Currently, the expected satisfaction is only considered in terms664

of service quality. However, an interesting enhancement would be to consider665

service cost as a part of this function and investigate the satisfaction-cost666

tradeoff. Future work plans to investigate this aspect further.667

The user’s decision to select one of the networks, induces the specific network668

to start interacting with the user, having the options to cooperate with the user669

or to cheat, while the rest of the networks do not interact any further with the670

user during the game. From then on, the interaction between the user and the671

network is as previously described in the repeated user-network interaction672

model. The payoffs for the networks that are not selected are zero, while673

the payoffs for the user and the selected network are the same as previously674

discussed. The flow of the network selection game is outlined in Figure 2.675

Within the context of network selection, the user-network interaction may676

be improved to reflect a user preference towards networks that do not often677

demonstrate degradation. Perceived degradation could be a Nature-induced678

random event or could be the result of the network cheating and offering a679

lower level of quality than the one requested. Nature-induced degradation is680

a random event with a probability of re-occuring; it is acceptable to assume681

23

Fig. 2. Network Selection model based on User-Network interactions

that such a random event will re-occur in a uniformly random manner. How-682

ever, such prediction cannot be made for perceived degradation in the case683

that the network cheats. We employ the idea of an adaptive player, such that684

the user’s decision of which network to select considers a normalized weight of685

the network’s past degradation behaviour, based on the acquired knowledge a686

user gains over the course of the repeated game [26]. This normalized weight687

is placed as a coefficient to the π(q)expected function, thus it is considered ad-688

ditionally to the parameters previously mentioned to comprise this evaluation689

function.690

Thus, when the user evaluates his predicted satisfaction, or selects each one691

of the available networks, the evaluation should consider this weighted coef-692

ficient, calculated dynamically from observing past network behaviour. Being693

an adaptive player, the user makes a more informed selection decision that694

considers the past. More specifically, this is achieved by multiplying the pre-695

dicted satisfaction π(q)expected by the variable α (Definition 4.1), representing696

the normalized weight of past degradation behaviour (the bigger the weight697

value, the less the degradation observed in the past 1 ).698

Definition 4.1 In the network selection game, a user possesses an internalstate, which, based on a network’s history of degradation behaviour, acts as anormalized weight coefficient, α ∈ [0, 1], to its payoff function for better evalua-tion of the predicted satisfaction from a particular network. Given that the user

1 To achieve this, we assume that the user has an internal state, which modifies αafter every interactive period with the network, and that for each different networkthat interacts with the user, a different α is evaluated.

24

has an expected satisfaction for a service request, e, such that π(q)expected ≥ e,the value of α at the end of an interactive period is modified according to (3).

α =

αprevious + (αprevious · π(q)final−e

π(q)final), if π(q)final ≥ e, α ≤ 1

1, if π(q)final ≥ e, α ≥ 1

αprevious · π(q)final

e, otherwise .

(3)

By introducing the variable α ∈ [0, 1], the user considers the network’s history,699

approaching the selection decision in an adaptive manner, i.e., by evaluating700

π(q)expected · α instead of only π(q)expected, thus taking account of past history.701

4.2 A New Adaptive User Strategy702

Once the user makes a selection decision, he interacts with the selected net-703

work by specifying an appropriate strategy for this interaction. In Section 3.4704

we have proposed two game profiles; in the conditional-cooperation profile,705

the user punishes the network forever if degradation is perceived even once,706

while in the one-period-punishment profile, the user punishes the network with707

only one period of absence, even if the network demonstrates degradation fre-708

quently.709

The conditional-cooperation profile involves a harsh punishment to the net-710

work in case of cheating, i.e., the user leaves the relationship forever. This711

strongly motivates the network to cooperate. However, in case of cheating the712

relationship terminates forever and one could argue that this is costly for both713

the user and the network, considering the finite and relatively small number of714

access networks. On the other hand, the one-period punishment profile is less715

motivating for cooperation because punishment is minimum, i.e., leaving for716

only one period, and this is known to both players. Considering the adaptive717

way in which the user takes a decision during the selection process, with the718

use of α, we propose a new strategy for the user to be employed as a means719

to interact with the selected network. This strategy selects a punishment that720

adaptively sets the number of periods that the user will leave the network ac-721

cording to the past behaviour of the network, in order to motivate the network722

to keep cooperating, without having to enforce the harsh punishment of the723

grim strategy.724

Let the user’s strategy be the following: cooperate as long as the network725

cooperates; if the network cheats, then leave for an x number of periods; after726

that, return and cooperate again. Let the number x be equal to 1 if α = 1 or727

d 1αe otherwise; such that a network with a lower value for α suffers a separation728

of more periods with the user, whereas a network with a higher value for α is729

25

punished for less periods (minimum punishment is 1 period).730

Definition 4.2 In the network selection game, the adaptive-return strategyfor the user dictates that if the user perceives degradation, he punishes thenetwork by leaving for an x number of periods, before returning back to coop-eration. The value of x is a user-generated value as defined in (4).

x =

1, if α = 1

d 1αe, otherwise .

(4)

It is important that the network has a motivation to cooperate with the user,731

when the user employs the Adaptive Return strategy. Given that a trigger732

strategy that employs a punishment that lasts forever is the strategy that733

provides the strongest motivation to cooperate [26], we show that when the734

user employs the Adaptive Return strategy, the network is at least as moti-735

vated to cooperate with the user as when the user employs the one-period736

punishment strategy, because the minimum number of punishment periods737

imposed by the Adaptive Return strategy equals to one. However, the knowl-738

edge that the number of punishment periods could increase according to the739

network’s behaviour, increases the motivation of the network to cooperate.740

Theorem 4.3 Assume that δ > c(q)−c(q′)κ−c(q) in the repeated User-Network Inter-741

action game. Then, when the user employs the Leave-and-Return strategy, i.e.742

when the profile of the game is the one-period-punishment profile, the network743

is motivated to cooperate. This condition on δ is also necessary to motivate744

cooperation by the network when the user employs the Adaptive-Return strat-745

egy.746

PROOF. Given a history of the game where both players have cooperatedin the past, and the user employs the adaptive-return strategy, the networkhas two options in the current period: cooperate or cheat. When the networkcooperates, the PV is as follows:

PVcooperate =(κ− c(q)) · (1− δx+1)

1− δ+δx+2 · (κ− c(q))

1− δ

The sum of a finite geometric progression is used to calculate the discountedvalue for the first x+ 1 periods. If the network cheats, PV is:

PVcheat = κ− c(q′) +(1− δx) · 0

1− δ+δx+2 · (κ− c(q))

1− δ

The conditions necessary to impose cooperation are calculated next:747

26

PVcooperate > PVcheat =

(κ− c(q)) · (1− δx+1)

1− δ+δx+2 · (κ− c(q))

1− δ>

(κ− c(q′)) +(1− δx) · 0

1− δ+δx+2 · (κ− c(q))

1− δ

Since δ ∈ (0, 1), x ≥ 1, and δx+1 ≤ δ2, such that δx+1 − 1 ≤ δ2 − 1, we may748

simplify for δ to get δ > c(q)−c(q′)κ−c(q) , the same as the value of δ when employing749

the one-period punishment strategy. In fact the simplification of δx+1 ≤ δ2,750

considers the case, that the adaptive-return Strategy generates a punishment751

of only one period, whereas if we consider greater values of x, the motivation for752

the network to cooperate increases, showing that the conditions for sustaining753

cooperation by the network when the adaptive-return strategy is used by the754

user are in the limit the same as the ones necessary for sustaining cooperation755

under the one-period-punishment profile. 2756

5 User and Network Behaviour with Different Strategies757

This section examines the numerical behaviour of user and network strate-758

gies defined and used in the previous sections. The evaluation is based on759

a Matlab [27] implementation of an iterated user-network interaction game,760

where all user and network strategies are played against each other multiple761

times in order to evaluate the behaviour of each strategy in terms of pay-762

off. The implementation of the user-network interaction game was based on a763

publicly available Matlab implementation of the Iterated Prisoner’s Dilemma764

Game [28], which has been extended to include all the strategies examined in765

this paper.766

The implementation makes use of the following guidelines, set to reflect the767

analytical model of the repeated user-network interaction game. In each sim-768

ulation run, both players play their strategies and get payoffs accordingly. In769

the first set of simulations the payoffs are the following: when the user leaves,770

they both get 0 in the specific period, if one cheats and one cooperates, the771

first gets 4 and the other gets 1, if they both cheat, each gets 2, and if they772

both cooperate each gets 3. In the second set of simulations, we investigate the773

behaviour of the players when the numerical difference between cheating and774

cooperating increases. The payoffs for the second set of simulations are the775

following: when the user leaves, they both get 0 in the specific period, if one776

cheats and one cooperates, the first gets 100 and the other gets 1, if they both777

cheat, each gets 40, and if they both cooperate each gets 60. We use simple778

numbers as payoffs to help us get some scores for different strategy combina-779

tions but these numbers follow the relationships of the payoffs as described in780

27

their general case in the repeated game model (Table 3). Furthermore, for the781

adaptive strategy, the value of α is randomly generated at the beginning of782

each simulation run but adapted according to the network’s behaviour during783

the actual simulation (since each run simulates an iterative process). A ran-784

domly generated perceived quality and a fixed threshold of expected quality785

are also implemented for the strategies in all simulation runs.786

A randomly generated number of iterations was run for each set of simulations787

to get cumulative user and network payoffs for each combination of a user788

strategy playing against a network strategy. The user payoffs per strategy789

and the network payoffs per strategy are eventually added to give the most790

profitable user and network strategies respectively, for the total number of791

iterations of a simulation run; then the average cumulative payoffs from all792

simulation runs are calculated. Although the number of iterations is randomly793

generated, we still repeat the process 100 times for each set of simulations,794

i.e. by randomly generating 100 different numbers of iterations, in order to795

include behaviours when the number of iterations is both small and large.796

For the first set of simulations, i.e. with the payoffs ranging from 0 to 4,797

the payoffs are calculated for an average of 264.38 iterations per simulation798

run 2 . In both tables we see a score for each strategy combination. The score799

corresponds to either a user payoff (Table 5), or a network payoff (Table 6).800

Table 5User Payoffs from all strategy combinations (1st simulation set)

USER PAYOFFS

Network Strategies

User Strategies Tit-for-Tat Cheat&Return

Grim 793.14 4.42

Cheat&Leave 6.62 4.65

Leave&Return 793.14 252.92

Adaptive-Return 793.14 355.05

Table 6Network Payoffs from all strategy combinations (1st simulation set)

NETWORK PAYOFFS

Network Strategies

User Strategies Tit-for-Tat Cheat&Return

Grim 793.14 7.42

Cheat&Leave 3.82 4.23

Leave&Return 793.14 617.21

Adaptive-Return 793.14 618.57

2 minimum iterations generated: 8, maximum iterations generated: 1259

28

The results for the first set of simulations, show that the most profitable user801

strategy is the Adaptive-Return strategy, and that the most profitable net-802

work strategy is the Tit-for-Tat strategy for most payoffs 3 . Furthermore, the803

combination of these two strategies in the same game profile gives the highest804

cumulative payoffs to both players. Based on these numerical results, we de-805

fine the Adaptive-Punishment profile (Definition 5.1) for the game to consist806

of the Adaptive-Return strategy for the user and the Tit-for-Tat strategy for807

the network.808

Definition 5.1 (Adaptive-Punishment profile) When the user employs809

the Adaptive-Return strategy and the network employs the Tit-for-Tat strat-810

egy, the profile of the repeated game is referred to as Adaptive-Punishment811

profile of the game.812

For the second set of simulations, i.e. with the payoffs ranging from 0 to 100,813

the payoffs are calculated for an average of 240.59 iterations per simulation814

run 4 . As previously, we see in both tables a score for each strategy combi-815

nation. The score corresponds to either a user payoff (Table 7), or a network816

payoff (Table 8).817

Table 7User Payoffs from all strategy combinations (2nd simulation set)

USER PAYOFFS

Network Strategies

User Strategies Tit-for-Tat Cheat&Return

Grim 14435.4 47.8

Cheat&Leave 164.2 108.36

Leave&Return 14435.4 4894.76

Adaptive-Return 14435.4 4974.47

It is interesting to observe that for the second set of simulations, again the818

most profitable strategy for the user is the Adaptive-Return strategy and the819

network’s most profitable strategy is the Tit-for-Tat strategy, except for the820

payoff in combination with the user’s Cheat&Leave strategy (as in the pre-821

vious example). Overall, however, the preferred profile for the game is still822

the Adaptive-Punishment profile. The increase in the differences between co-823

operating and cheating payoffs, resulted in higher overall payoffs but has not824

changed the general payoff trend for the two players. In total, the highest825

payoffs are experienced by the players when they decide to use cooperating826

3 except for the payoff received from the combination with the user’s Cheat&Leavestrategy. However, the difference between the payoffs received by the network fromplaying either the Tit-for-Tat strategy or the Cheat&Return strategy, in combina-tion with the user’s Cheat&Leave strategy, is negligible.4 minimum iterations generated: 2, maximum iterations generated: 1124

29

Table 8Network Payoffs from all strategy combinations (2nd simulation set)

NETWORK PAYOFFS

Network Strategies

User Strategies Tit-for-Tat Cheat&Return

Grim 14435.4 146.8

Cheat&Leave 65.2 95.49

Leave&Return 14337.4 12850.4

Adaptive-Return 14435.4 12861.8

strategies instead of cheating; this result motivates the players to go ahead827

and cooperate.828

Furthermore, it is worth noting that, for both simulation sets, when the user829

plays the Leave&Return strategy, the payoffs received by the user are compa-830

rable (though less) to the highest payoffs received, i.e. payoffs for employing831

the Adaptive-Return strategy. The justification for these results is the follow-832

ing: we have shown by Theorem 4.3 that the minimum conditions for the833

network to cooperate when the user employs the Adaptive-Return strategy,834

are necessary for the network to cooperate when the Leave&Return strategy835

is employed. In fact, the conditions for the two strategies are at least equal,836

and given this, it is expected to observe payoff values that are numerically837

closer compared to payoff values for the other user strategies, although it ap-838

pears that the Adaptive-Return strategy manages to achieve higher payoffs839

than the Leave&Return strategy.840

An additional observation is that the Leave&Return strategy is weaker in the841

case the network decides to cheat, because the user’s reaction is a fixed-period842

punishment. On the other hand, when the user employs the Adaptive-Return843

strategy, the punishment period is not fixed but adapts to the network’s past844

behaviour, appearing to consequently achieve an improvement in the overall845

user payoff. It is important to note here, that according to the theoretical846

results when the Adaprive-Return strategy is used it is easier to motivate847

cooperation than when the Leave&Return strategy is used. Therefore, it is848

more likely for the network to decide to cheat when Leave&Return is used.849

Further, when interpreting the obtained payoffs, we should keep in mind that850

each simulation investigates a single user-network interaction, thus not con-851

sidering dependencies that may arise in an evaluation of multiple co-existing852

interactions, where a user may interact with several networks and vice versa.853

Future work plans to investigate such aspects.854

30

6 Conclusions855

In this paper, we have studied the interaction between a user and a network re-856

sulting from Network Selection in Next Generation Communication Networks857

and we utilized Game-Theoretic tools in order to capture this interaction. This858

framework enabled us to compute a cooperative solution that is satisfactory859

for both the user and the network in a repeated interactive situation between860

the two entities.861

Prior to analyzing the repeatability of the interaction, we introduced a model862

that captures the interaction of the two entities at any time, and does not863

consider their previous and future actions. The study revealed that the prob-864

lem in this case is equivalent to the fundamental game theoretic problem of865

Prisoner’s Dilemma. This enabled us to extract previous knowledge from game866

theory in order to understand our problem. In particular, we concluded that867

in this model a best response strategy of both players is to cheat.868

This conclusion gave rise to the question of how cooperation between the869

user and the network could be motivated, since it is expected to be beneficial870

so as to satisfy their corresponding needs for higher quality levels and higher871

compensation. Our second model addressed this question through a sequential872

moves game, in order to reflect better the reality that the decisions of the873

involving entities are taken in sequential time units. The study revealed that874

the behavior of the two players would be the same as in the previous (one-875

shot game) model: they will both try to cheat, since repeatability is still not876

considered.877

This undesired behavior of the interacting entities emerges because the net-878

work will not experience any punishment for cheating the user. This obser-879

vation has led to the next model, which successfully solves this inadequacy880

of previous models: in this model, the interaction between the user and the881

available networks is represented as a repeated game. Moreover, we utilized882

some notions from economic game theory, such as the present value, in order883

to evaluate the profits of the players in case they choose to cheat/cooperate884

given that their previous actions will affect the other player’s future actions.885

In particular, we assume that the user will punish the network by switch-886

ing to another network, in case that he experiences some degradation in the887

perceived quality of service.888

The study has shown that if a harsh punishment is given to the network in889

case of cheating, i.e., the user leaves the relationship forever, this strongly mo-890

tivates the network to cooperate. However, in case of cheating the relationship891

terminates forever and one could argue that this is costly for both the user892

and the network. On the other hand, if punishment is minimum, i.e., leaving893

31

for only one period, and this is known to both players, it is less motivating for894

them to engage in cooperation. This motivates a new adapive user strategy,895

which is not as harsh as leaving forever, but also not as soft as returning after896

one period.897

This adaptive user strategy feeds on the complete history of the network898

behaviour. This proposed user strategy as well as the proposed adaptive α899

component that affects the network leaving period selection compose a new,900

game-theoretic, adaptive network selection approach. The importance of this901

new strategy lies in the following: The network will only choose to cooperate if902

it is motivated to do so. We have shown in the theoretical results that already903

the motivation for the network is weaker when the Leave&Return strategy is904

used than when the Grim Strategy is used, i.e. when the network is threatened905

with the minimum punishment it is not as motivated to cooperate than when906

it is threatened with the maximum punishment. However, we must consider907

that the application of the Grim strategy in Next Generation Communication908

Networks, is not realistic because a user may only select from a finite number909

of access networks; leaving forever would be disastrous for the user in the long910

run, as could be left with no access network to select. Therefore, the user will911

choose to employ the Adaptive-Return strategy because it offers a solution912

which theoretically is at least as motivating for cooperation as the Leave-and913

Return strategy, with varying punishments depending on the history of the914

network’s behaviour. In addition, the use of particular strategy is numerically915

shown to achieve at least as high payoffs as the other user strategies, thus it916

is a satisfactory solution both theoretically and numerically.917

The approach is evaluated both analytically and numerically through Matlab918

simulations. The theoretical results show that the proposed approach easily919

motivates cooperation between the user and the network and the simulative920

results provide supporting evidence that the proposed approach can result in921

comparatively higher payoffs for both players.922

References923

[1] M. Turek, Voice and video over IP: Leveraging network convergence for924

collaboration, Issue paper, Nemertes Research (2006).925

[2] A. Iera, A. Molinaro, C. Campolo, M. Amadeo, An Access Network Selection926

Algorithm Dynamically Adapted to User Needs and Preferences, in: Proceedings927

of the IEEE International Symposium on Personal, Indoor and Mobile Radio928

Communications (PIMRC ’06), IEEE, 2006.929

[3] F. Bari, V. C. M. Leung, Service delivery over heterogeneous wireless networks:930

Network selection aspects, in: Proceedings of the ACM IWCMC 2006, ACM,931

2006.932

32

[4] J. Antoniou, I. Koukoutsidis, E. Jaho, A. Pitsillides, I. Stavrakakis, Access933

Network Synthesis in Next Generation Networks, To appear in Elsevier934

Computer Networks.935

[5] A. Dixit, S. Skeath, Games of Strategy, W.W.Norton & Company, New York,936

1999.937

[6] O. Markaki, D. Charilas, D. Nikitopoulos, Enhancing Quality of Experience938

in Next Generation Networks through Network Selection Mechanisms, in:939

Proceedings of the B3G Environment Workshop, PIMRC 2007, IEEE, 2007.940

[7] D. Niyato, E. Hossain, A cooperative game framework for bandwidth allocation941

in 4g heterogeneous wireless networks, in: Proceedings of the IEEE International942

Conference on Communications 2006 (ICC ’06), IEEE, 2006, pp. 4357–4362.943

[8] Qingyang Song, Abbas Jamalipour, Network Selection in an integrated Wireless944

LAN and UMTS environment using mathemaical modeling and computer945

techniques, IEEE Wireless Communications Magazine 12 (3) (2005) 42–48.946

[9] Shubba Kher, Arun K. Somani, Rohit Gupta, Network Selection using Fuzzy947

Logic, in: Proceedings of the 2nd International Conference on Broadband948

Networks, IEEE, 2005, pp. 876–885.949

[10] W. Zhang, Handover Decision Using Fuzzy MADM in Heterogeneous Networks,950

in: Proceedings of the IEEE WIreless Communications and Networking951

Conference (WCNC ’04), IEEE, 2004.952

[11] Q. Song, Abbas Jamalipour, An adaptive quality-of-service network selection953

mechanism for heterogeneous mobile networks, Wireless Communications and954

Mobile Computing 5 (6) (2005) 697–708.955

[12] O. Ormond, G.-M. Muntean, J. Murphy, Economic model for cost effective956

network selection strategy in service oriented heterogeneous wireless network957

environment, in: Proceedings of the Network Operations and Management958

Symposium 2006, IEEE/IFIP, 2006.959

[13] J. Antoniou, A. Pitsillides, Radio access network selection scheme in next960

generation heterogeneous environments, in: Proceedings of the IST Mobile961

Summit 2006, IEEE, 2006.962

[14] V. Gazis, N. Alonistioti, L. Merakos, Towards a Generic “Always Best963

Connected” Capability in integrated WLAN/UMTS Cellular Mobile Networks,964

IEEE Wireless Communications Magazine 12 (2005) 22–29.965

[15] X. Cai, L. Chen, R. Sofia, Y. Wu, Dynamic and user-centric network selection in966

heterogeneous networks, in: Proceedings of the IEEE International Conference967

on Performance, Computing and Communications, IEEE, 2007, pp. 538–544.968

[16] F. Bari, V. C. M. Leung, Automated network selection in a heterogeneous969

wireless network enviroment, IEEE Network 21 (1) (2007) 34–40.970

33

[17] M. S. Jawad, W. Ismael, M. S. J. Jingh, Optimizing network selection to support971

end-user qos requirements for next generation networks, IJCSNS International972

Journal of Computer Science and Network Security 8 (6) (2008) 113–117.973

[18] D. Charilas, O. Markaki, D. Nikitopoulos, M. Theologou, Packet switched974

network selection with the highest qos in 4g networks, Computer Networks975

52 (1) (2008) 248–258.976

[19] J. Antoniou, M. Stylianou, A. Pitsillides, RAN Selection for Converged977

Networks supporting IMS Signalling and MBMS Service, in: Proceedings of978

the 18th International Symposium on Personal, Indoor and Mobile Radio979

Communications (PIMRC ’07), IEEE, 2007.980

[20] J. Kim, J. H. Lee, T. Kwon, Y. Choi, Wireless Access Network Selection for Live981

Streaming Multicast in Future Internet, in: Proceedings of the 3rd International982

Conference on Future Internet Technologies, 2008.983

[21] M. Cesana, N. Gatti, I. Malanchini, Game Theoretic Analysis of Wireless984

Access Network Selection: Models, Inefficiency Bounds, and Algorithms,985

in: Proceedings of the 2nd International Workshop on Game Theory in986

Communication Networks 2008, ACM, 2008.987

[22] M. Cesana, I. Malanchini, A. Capone, Modelling network selection and988

resource allocation in wireless access networks with non-cooperative games, in:989

Proceedings of the 5th IEEE Mobile Ad-Hoc and Sensor Systems, IEEE, 2008.990

[23] G. Kendall, X. Yao, S. Y. Chong, The Iterated Prisoner’s Dilemma: 20 Years991

On, World Scientific Publishing Co., 2009, advances In Natural Computation992

Book Series, vol. 4.993

[24] H. Gintis, Game Theory Evolving: A Problem-Centered Introduction to994

Modeling Strategic Interaction, Princeton University Press, Princeton, New995

Jersey, 2000.996

[25] R. B. Myerson, Game Theory: Analysis of Conflict, Harvard University Press,997

Cambridge, Massachusetts, 2004.998

[26] R. Hegselmann, A. Flache, Rational and adaptive playing: A comparative999

analysis for all possible prisoner’s dilemmas, Analyse and Kritik 22 (1) (2000)1000

75–97.1001

[27] Matlab: The language of technical computing, The Mathworks, version 7.6.0.3241002

(R2008a) (February 2008).1003

[28] G. Taylor, Iterated Prisoner’s Dilemma in MATLAB, Archive for the “Game1004

Theory” Category “http://maths.straylight.co.uk/archives/category/game-1005

theory” (March 2007).1006

34