
ALGORITHM DESIGN FOR LOW LATENCY COMMUNICATION IN WIRELESS NETWORKS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Sherif ElAzzouni, B.S., M.S.

Graduate Program in Electrical and Computer Engineering

The Ohio State University

2020

Dissertation Committee:

Eylem Ekici, Advisor

Ness B. Shroff, Advisor

Atilla Eryilmaz

© Copyright by

Sherif ElAzzouni

2020

ABSTRACT

The new generation of wireless networks is expected to be a key enabler of a myriad of new industries and applications. Disruptive technologies such as autonomous driving, cloud gaming, smart healthcare, and virtual reality are expected to rely on a robust wireless infrastructure to support those applications' vast and diverse communication requirements. The successful realization of a large number of those applications hinges on timely information exchange, and thus latency arises as the critical requirement essential to unlock the true potential of the new 5G wireless generation. In order to ensure reliable low-latency communication, new network algorithms and protocols prioritizing latency need to be developed across different layers of the network stack. Furthermore, a theoretical framework is needed to better demonstrate the behavior of delay at the wireless edge and the proposed solutions' performance.

In this dissertation, we study the problem of designing algorithms for low latency communication by addressing traditional problems such as resource allocation and scheduling from a delay-oriented standpoint, as well as new problems that arise from the new 5G architecture, such as caching and Heterogeneous Networks (HetNets) access. We start by addressing the problem of designing real-time cellular downlink resource allocation algorithms for flows with hard deadlines. Attempting to solve this problem brings about the following two key challenges: (i) the flow arrival and the wireless channel state information are not known to the Base Station (BS) a priori, thus the allocation decisions need to be made in an online manner; (ii) resource allocation algorithms that attempt to maximize a reward in the wireless setting will likely be unfair, causing unacceptable service for some users. We model the problem of allocating resources to deadline-sensitive traffic as an online convex optimization problem. We address the question of whether we can efficiently solve that problem with low complexity; in particular, whether we can design a constant-competitive scheduling algorithm that is oblivious to requests' deadlines. To this end, we propose a primal-dual Deadline-Oblivious (DO) algorithm and show it is approximately 3.6-competitive. We also explore the issues of fairness and long-term performance guarantees. We then address the potential of caching at wireless end-users, where caches are typically very small, orders of magnitude smaller than the catalog size. We develop a predictive multicasting and caching scheme, where the BS in a wireless cell proactively multicasts popular content for end-users to cache and access locally if requested. We analyze the impact of this joint multicasting and caching on delay performance and show that predictive caching fundamentally alters the asymptotic throughput-delay scaling. This practically translates to a several-fold delay improvement in simulations over the baseline at high network loads. We highlight a fundamental delay-memory trade-off in the system and identify the correct memory scaling to fully benefit from the network multicasting gains.

We then shift our focus from centralized wireless networks to distributed wireless ad-hoc networks. We build on recent results in wireless networks showing that CSMA can be made throughput optimal by optimizing over activation rates, at the cost of poor delay performance, especially for large networks. Motivated by those shortcomings, we propose a Node-Based version of throughput-optimal CSMA (NB-CSMA), as opposed to traditional link-based CSMA algorithms, where links are treated as separate entities. Our algorithm is fully distributed and corresponds to Glauber dynamics with "block updates". We show analytically and via simulations that NB-CSMA outperforms conventional link-based CSMA in terms of delay, highlighting that exploiting the natural "hotspots" in wireless networks can greatly improve delay performance. Finally, we shift our attention to the problem of Heterogeneous Networks (HetNets) access, where a cellular device can connect to multiple Radio Access Technologies (RATs) simultaneously, including costly ubiquitous cellular technologies and other cheap intermittent technologies such as WiFi and mmWave. A natural question arises: "How should traffic be distributed over different interfaces, taking into account different application QoS requirements and the diverse nature of radio interfaces?" To this end, we propose the Discounted Rate Utility Maximization (DRUM) framework with interface costs as a means to quantify application preferences in terms of throughput, delay, and cost. We propose an online predictive algorithm that exploits the predictability of wireless connectivity for a small look-ahead window w. We show that the proposed algorithm achieves a constant competitive ratio independent of the time horizon. Furthermore, the competitive ratio approaches 1 as the prediction window increases. We conduct experiments to better demonstrate the behavior of both delay-sensitive and delay-tolerant applications under intermittent connectivity.

Our research demonstrates how low-complexity algorithms at the wireless edge can be designed to enable reliable low-latency communication and enables a deeper understanding of algorithm performance analysis from a delay standpoint.


To my parents, Waguih ElAzzouni and Mary Iskander.


ACKNOWLEDGMENTS

First, I would like to express my sincere gratitude to my advisors. It was a privilege working under the guidance of Prof. Eylem Ekici throughout my PhD journey. As my advisor, Prof. Ekici taught me to think analytically and critically about research problems and how to translate an idea into a research problem. He also offered a lot of guidance and advice on how to find interesting research directions, how to look at a problem from all sides, and how to present my work effectively. I am very grateful for the effort he put into my growth as a researcher, through continuous guidance and thought-provoking feedback, as well as his genuine care for his students' success. As a friend, Prof. Ekici was a source of continuous support and encouragement. I was also very lucky to work under the guidance of Prof. Ness Shroff. Prof. Shroff's mentorship was invaluable to me, as he guided me on how to conduct good research and how to formulate an interesting research problem. His breadth of knowledge and level of rigor were also immensely integral to my growth and evolution as an independent researcher and are something I will forever benefit from in my career moving forward. I am also very appreciative of Prof. Shroff's perspective during our meetings, as he always offered an enlightening point of view. On a personal level, Prof. Shroff was always a wise, supportive, and caring friend.

I am also thankful to Prof. Atilla Eryilmaz for serving on my candidacy committee, providing many insightful comments and suggestions, and for all the courses he taught that I attended at OSU. I have benefited immensely from Prof. Eryilmaz's inspiring work in our field and his teaching talent. My discussions with Prof. Eryilmaz were very helpful and enjoyable. I would also like to thank Prof. Abhishek Gupta for serving on my candidacy committee. Prof. Gupta's useful feedback and insightful comments have significantly helped me improve my dissertation. I would also like to thank Dr. Gagan Choudhury for hosting me and being my mentor during my summer internship. His guidance during our collaboration was very useful for me and for my perspective on our field. I would also like to thank my collaborator Fei Wu for our joint work on caching; it has been a pleasure working together and has definitely benefited me greatly. I am also very grateful for all the useful courses I have attended and the discussions I have had with Prof. Can Emre Koksal, Prof. Hesham El Gamal, Prof. Andrea Serrani, and Prof. Tasos Sidiropoulos. I would also like to thank all my friends and colleagues at the IPS lab for their continued support and for all the discussions and time we had together.

On a personal level, I am immensely grateful to my family for being my support system, especially my parents, my brother Karim, and my cousin Omar. I would also like to thank all my friends in Egypt and the USA for all the time we shared and for their continued support and advice throughout my PhD.


VITA

November 6th, 1988 . . . . . . . . . . . . Born in Alexandria, Egypt

2010 . . . . . . . . . . . . . . . . . . . . B.S., Electrical Engineering, Alexandria University

2014 . . . . . . . . . . . . . . . . . . . . M.S., Wireless Technologies, Nile University

2014-Present . . . . . . . . . . . . . . . . Graduate Research Associate, Graduate Teaching Associate, The Ohio State University

PUBLICATIONS

Sherif ElAzzouni and Eylem Ekici, "A Node-Based CSMA Algorithm For Improved Delay Performance in Wireless Networks", in Proceedings of ACM MobiHoc, Paderborn, Germany, July 2016.

Sherif ElAzzouni and Eylem Ekici, "Node-Based Distributed Channel Access with Enhanced Delay Characteristics", IEEE/ACM Transactions on Networking, 2018.

Sherif ElAzzouni, Eylem Ekici, and Ness Shroff, "QoS-Aware Predictive Rate Allocation Over Heterogeneous Wireless Interfaces", in Proceedings of WiOpt, Shanghai, China, May 2018.

Sherif ElAzzouni, Eylem Ekici, and Ness Shroff, "Is Deadline Oblivious Scheduling Efficient for Controlling Real-Time Traffic in Cellular Downlink Systems?", to appear in Proceedings of IEEE INFOCOM, Toronto, Canada, 2020.

Sherif ElAzzouni, Fei Wu, Ness Shroff, and Eylem Ekici, "Predictive Caching at The Wireless Edge Using Near-Zero Caches", to appear in Proceedings of ACM MobiHoc, Shanghai, China, 2020.

FIELDS OF STUDY

Major Field: Electrical and Computer Engineering

Specialization: Network Science


TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Figures

1 Introduction
  1.1 Resource Allocation for Cellular Real-Time Traffic
  1.2 Caching at The Wireless Edge
  1.3 Distributed Throughput-Optimal Low-Latency Scheduling
  1.4 Managing Cellular Access across Heterogeneous Wireless Interfaces
  1.5 Contribution and Thesis Organization

2 Resource Allocation for Cellular Real-Time Traffic
  2.1 Introduction
  2.2 System Model
  2.3 Problem Formulation
  2.4 Deadline Oblivious (DO) Algorithm
    2.4.1 Algorithm
    2.4.2 Analysis
    2.4.3 Lightweight Algorithm
  2.5 Stochastic Setting with Timely Throughput Constraints
    2.5.1 Virtual Queue Structure
    2.5.2 D Look-ahead Algorithm
    2.5.3 Long-term Fair Deadline Oblivious (LFDO) Algorithm
  2.6 Numerical Results
  2.7 Conclusion and Future Work

3 Predictive Caching at the Wireless Edge
  3.1 Introduction
  3.2 System Model
    3.2.1 Baseline On-demand Unicast System
    3.2.2 Predictive Caching
  3.3 Analysis of Unicast On-Demand System
  3.4 Duality Framework
    3.4.1 Capacity of Predictive Caching
    3.4.2 Duality between Scheduling and Routing
  3.5 Performance of Predictive Caching
    3.5.1 Main Result
    3.5.2 Discussion of Main Result
    3.5.3 Proof of Main Result
    3.5.4 Closed-Form Delay-Memory Trade-off for the Approximate Model
  3.6 Simulations
  3.7 Conclusion and Future Work

4 Distributed Node-Based Low Latency Scheduling
  4.1 Introduction
  4.2 Related Work
  4.3 System Model
  4.4 Node-Based CSMA: Glauber Dynamics with Block Updates
    4.4.1 Step 1: Forming Blocks
    4.4.2 Step 2: Updating Blocks
  4.5 Performance of the NB-CSMA Algorithm
  4.6 Collocated Networks
    4.6.1 Q-CSMA Starvation Time
    4.6.2 NB-CSMA Starvation Time
  4.7 Numerical Results
    4.7.1 General Networks
    4.7.2 Collocated Networks
  4.8 Practical Implementation of NB-CSMA
  4.9 Conclusion

5 Cellular Access Over Heterogeneous Wireless Interfaces
  5.1 Introduction
  5.2 System Model
    5.2.1 Channel Model
    5.2.2 Flow Utility
  5.3 Problem Formulation
  5.4 Online Predictive Rate Allocation
    5.4.1 Receding Horizon Control
    5.4.2 Average Fixed Horizon Control (AFHC)
  5.5 Competitive Ratio of AFHC
  5.6 Numerical Results
  5.7 Conclusion

6 Conclusion

Appendix A: Proofs for Chapter 2
  A.1 Proof of Lemma 2.4.2
  A.2 Proof of Lemma 2.4.3
  A.3 Proof of Lemma 2.4.5
  A.4 Proof of Theorem 2.4.1
  A.5 Proof of Theorem 2.5.2

Appendix B: Proofs for Chapter 3
  B.1 Proof of Lemma 3.3.1
  B.2 Proof of Proposition 3.5.1

Appendix C: Proofs for Chapter 4
  C.1 Proof of Theorem 4.5.1
  C.2 Proof of Theorem 4.5.2
  C.3 Proof of Theorem 4.5.4
  C.4 Proof of Theorem 4.5.6

Appendix D: Proofs for Chapter 5
  D.1 Proof of Lemma 5.5.1
  D.2 Proof of Lemma 5.5.2
  D.3 Proof of Theorem 5.5.1

Bibliography

LIST OF FIGURES

2.1 Real-Time Cellular Traffic System Model
2.2 Comparison of performance of different algorithms
2.3 Resource allocation per user under DO and LFDO
3.1 On-Demand Unicast Baseline System Model
3.2 Predictive Caching Model
3.3 Predictive Caching Model
3.4 Capacity Region of different delivery systems
3.5 Duality between routing and scheduling problems
3.6 Scaling of (3.5.25)
3.7 Effect of Predictive Caching on Delay
3.8 Normalized Cache Size supporting Predictive Caching
3.9 Empirical Delay-Cache Size Trade-off
4.1 An example of a simple 5-node network topology and the corresponding conflict graph (1-hop interference relationship)
4.2 State space of Q-CSMA in collocated network, equal throughput case
4.3 State space of NB-CSMA in collocated network, equal throughput case
4.4 600x600m Random Network Topology
4.5 Average Queue Length per link vs. ρ
4.6 Average Queue Length per link vs. ρ (Delayed CSMA, T = 2, T = 8)
4.7 Mean Starvation Time of link 1 vs. ρ
4.8 Average Queue Length per link vs. ρ (Collocated Networks)
4.9 The POLLING/CTU exchange in the contention period
5.1 System Model
5.2 Secondary Interface capacity process
5.3 Optimal Rate Allocations of heterogeneous flows, U(r) = log(1+r), p_c = 0.75
5.4 Competitive Ratios of RHC and AFHC compared to the function f(w) = 1 − 1/(w+1); U(r) = r^(1−α)/(1−α), p_c = 0.55, β = 0, 0.7, 0.95, α = 0.5
5.5 Competitive Ratios of RHC and AFHC as a function of β_max; U(r) = r^(1−α)/(1−α), p_c = 0.55, β = 0, 0.3, β_max, α = 0.5, w = 3
5.6 Reward obtained by OPT, RHC and AFHC as a function of β_max; U(r) = r^(1−α)/(1−α), p_c = 0.55, β = 0, 0.3, β_max, α = 0.5, w = 3
5.7 Comparison between theoretical lower bound and simulated competitive ratio, β = 0.2, 0.4, 0.6, U(r) = (1+r)^(1−α)/(1−α) − 1/(1−α), α = 0.5
B.1 Generic Resource Pooling queue

CHAPTER 1

INTRODUCTION

The last few years have witnessed an explosion in the proliferation, capability, and ubiquity of mobile devices and networks. This growth has opened the door for a new "5G" ecosystem, where mobile networks are increasingly relied on for the technological advancements that are expected to power economic growth and shape various aspects of our lives in the next decade. Applications like remote medicine, autonomous driving, Internet-of-Things (IoT), and virtual/augmented reality are examples of new disruptive applications that are expected to rely heavily on the robust communication infrastructure promised by the new wireless technology generation. For many of the applications that 5G aims to facilitate, such as critical communication and interactive applications, latency is the most important performance consideration. The survey in [1] identifies several key 5G services that require an end-to-end latency on the order of 1 millisecond (ms). This requirement is much more stringent than the delays seen in current LTE 4G networks, reported to be in the 20-60 ms range [2] [3]. Network latency or network delay (we shall use the terms interchangeably) is the time it takes for data to travel between the transmitter and the receiver. Thus, the total latency seen by the average packet is affected by every layer in the stack in both radio and core networks. This means that a novel network architecture aiming to provide reliable low-latency communication has to redesign every layer in the network with this fundamental goal in mind.

From a theoretical standpoint, latency is a complicated metric to analyze and optimize in algorithm design due to the complex interactions between different aspects of the system. This is exacerbated for wireless networks due to the interaction between wireless channel conditions such as fading, scattering, shadowing, path loss, etc., physical layer techniques employed by the network, and the higher layer design choices. This means that a wireless low-latency communication system design should target different problems in the system, from traditional ones such as scheduling to new ones included in the 5G design with the goal of reducing latency, such as caching.

In this dissertation, we focus on system design and evaluation across four aspects of the low-latency communication system:

1. Resource Allocation for Cellular Real-Time Traffic.
2. Caching at The Wireless Edge.
3. Distributed Throughput-Optimal Low-Latency Scheduling.
4. Managing Cellular Access across Heterogeneous Wireless Interfaces.

1.1 Resource Allocation for Cellular Real-Time Traffic

Next generation mobile networks are poised to support a set of diverse applications, many of which are both bandwidth-intensive and latency-sensitive, having strict requirements on end-to-end delay. In applications like Virtual Reality, Cloud Gaming, and Video Streaming, it is critical that end users receive the bulk of their data within a prespecified hard deadline. Any extra delay would usually render the transmission useless. On the other hand, the high bandwidth requirements of those applications would often make streaming all users' data within the deadline impossible; thus, a good scheduler has to balance those two goals, intelligently making decisions on how to use the available bandwidth to maximize end users' satisfaction. This motivates the design of resource allocation schemes that jointly account for bandwidth requirements, hard deadlines, and applications' priorities in terms of what has to be transmitted to end-users to maintain a seamless experience.

Due to the continuous arrival of flows and changes in the wireless channel state, the allocation of spectrum resources has to be made online, without knowledge of future events. The central question becomes: "Can we find a constant-competitive solution that has low complexity?" Specifically, we are interested in the class of "deadline-oblivious" algorithms that make scheduling decisions without taking individual flows' deadline requirements into account. Those algorithms have low complexity, are more amenable to implementation than deadline-aware schedulers, and are robust against absent or inaccurate deadline information.

We show that the answer to this question is affirmative and develop an online primal-dual algorithm that is provably constant-competitive and has empirical performance very close to the offline optimal, as illustrated by simulations. We then address the problem of long-term fairness by reformulating the online resource allocation problem as a stochastic optimization problem and modifying our algorithm to account for long-term throughput fairness between users. We then combine the primal-dual analysis of online algorithms with Lyapunov tools for stochastic control to show that our modified algorithm tracks the offline problem closely in terms of reward while satisfying long-term stochastic constraints. Thus, we show that deadline-oblivious scheduling algorithms can be efficient in controlling real-time cellular traffic.

1.2 Caching at The Wireless Edge

Caching is poised to play a key role in most proposed future network architectures. The huge increase in mobile traffic, expected to reach 77 exabytes of data per month by 2022 [4], and higher user expectations in terms of high throughput and low latency have pushed the networking community to rely on edge caching as a central pillar of emerging architectures, due to its potential to increase network capacity, reduce latency, and alleviate peak-hour congestion, among other expected benefits. Recently, a special IEEE JSAC issue was dedicated to answering the question "What role will caching play in future communication systems?" [5]. As an example, the Information Centric Networks (ICN) [6] proposal is an ambitious project to evolve the internet away from the host-centric paradigm to a new content-centric paradigm that decouples senders and receivers. In ICN, the sender requests a certain object, rather than establishing a connection with the object's host, and the network then leverages in-network caches to locate that item and deliver it to the user. The reliance on caching has motivated modeling ICN as a "network of caches". Another domain where caching has been gaining significant traction is 5G cellular networks. There have been many works examining the potential of caching in both the core and the RAN edge [7] [1], and significant commercialization efforts to deploy caching in existing networks, enabling the operation of the Radio Access Network as a Content Delivery Network [8] [9].

Central to the recent increased interest in caching is the possibility of caching at the wireless edge [10]. Utilizing Base Stations (BSs)/Access Points (APs) to cache popular content has been proposed [11], which has sometimes been referred to as femtocaching. This enables users to fetch content from the closest Base Station, if possible, which decreases the round-trip delay of communication. Furthermore, femtocaching reduces network congestion by alleviating the need to continuously move popular content between the core servers and edge devices (such as RANs). Although femtocaching can significantly reduce access delay, it cannot reduce the last-mile wireless network delay. Thus, we present the following question: "If caching content in last-mile edge devices can cause significant delay reduction, can we go one step further and push popular content to end-users' devices' caches?" Users can access cached content locally with zero delay. Furthermore, this helps reduce the overall delay by avoiding having to continuously transmit redundant content over the wireless medium, which dissipates expensive wireless resources that may be the delay bottleneck, especially during the busy hour. We address that fundamental question: "Can small cache sizes at the end-users be exploited to significantly reduce delay?"

To this end, we propose a predictive caching scheme whereby popular content is multicast to all users proactively and end users cache that content. We formalize the notion of "small caches" by defining the concept of "vanishing caches", where the usable cache size goes to zero as the network approaches full load. To analyze the effect of our proposed scheme, we apply heavy-traffic analysis in a novel way and show that predictive caching has the ability to fundamentally alter the delay-throughput scaling, practically translating to several-fold delay improvements. We quantify the effect of cache sizes on delay savings, highlighting the fundamental delay-memory trade-off in the system. Furthermore, we identify the correct memory scaling to obtain multicasting gains. We expect this result to aid in future practical cache dimensioning at the wireless edge.

1.3 Distributed Throughput-Optimal Low-Latency Scheduling

Scheduling is an essential task for resource allocation in communication networks. This task is especially challenging in wireless networks due to inherent mutual interference among wireless links and, for various networks, the absence of central control and/or resource management decision making. Typically, a good scheduling algorithm should achieve three goals: (i) High throughput: characterized by the fraction of the network capacity region a scheduling algorithm achieves. Ideally, a scheduling algorithm should be able to support any set of arrival rates within the capacity region. (ii) Low delay: a good scheduling algorithm should be able to maintain the throughput required by the application without incurring excessive delay at any of the links. Furthermore, the expected delay should scale favorably with the size of the network. (iii) Low complexity: required to ensure easy implementation and to minimize the resources required to run the algorithm.

A recent breakthrough in wireless network scheduling happened when it was shown that CSMA-like algorithms can be made throughput optimal if every link's activation rate is optimized [12] or taken to be an appropriate function of the queue length [13], [14]. This result is attractive because CSMA algorithms are fully distributed. However, despite the promise of those algorithms, they were found to suffer from poor delay performance due to the well-known CSMA starvation problem. A weakness of current implementations of throughput-optimal CSMA algorithms is that they tend to treat all wireless links as separate autonomous entities that do not communicate. This is not true in many instances of wireless networks. In many practical wireless network deployments (e.g., wireless mesh networks and wireless ad-hoc networks), nodes typically control multiple outgoing links. We use this fact to motivate our proposed Node-Based CSMA (NB-CSMA) algorithm, where scheduling decisions are made on a node level rather than a link level. This allows us to exploit the interdependence between links, along with the presence of hotspots in wireless networks, to guide the scheduling process. We design our NB-CSMA algorithms guided by the Glauber dynamics with block transitions idea from statistical physics, to ensure throughput optimality and fully exploit link interdependencies to minimize delay. We rigorously analyze the proposed NB-CSMA. First, we demonstrate that for fixed networks the delay performance of NB-CSMA is no worse than the baseline, by analyzing the second-order properties of link starvation times. Second, we analyze the fraction of the capacity region where the delay is guaranteed to grow as a polynomial in the size of the network (as opposed to exponential) and show that this fraction in NB-CSMA networks is larger than that of link-based CSMA. Third, we derive a closed-form mean starvation time for collocated networks for both NB-CSMA and the baseline, which further highlights the key reasons node-based scheduling improves delay performance. Finally, to assess our proposed algorithm, we build a simulator to test different distributed scheduling algorithms for arbitrary topologies and show that NB-CSMA consistently achieves around 50% delay reduction over the baseline for a variety of practical scenarios.

1.4 Managing Cellular Access across Heterogeneous Wireless Interfaces

The demand growth is straining cellular networks, as it is becoming clear that operators cannot increase capacity to meet the demand by deploying more base stations. Thus, alternative approaches to capacity increase must be undertaken to provide users with the bandwidth needed to support their applications. One such approach is exploiting alternative Radio Access Technologies (RATs) that may be available to smartphones, such as WiFi, Bluetooth, mmWave, etc., to aid the cellular network in data transmission [15]. Furthermore, the Dynamic Spectrum Access (DSA) technology [16] could also be used to aid the cellular network in data transfer. An interesting question arises from this proposal: How can we best distribute mobile traffic over heterogeneous RATs, taking into account the inherent differences between interfaces? Cellular networks are ubiquitous but impose a high cost on the operator in terms of congesting the cellular network, as well as having high energy consumption that may drain the phone battery. WiFi networks, if accessible, are usually free, but WiFi coverage is not always present; furthermore, it is typical for public places to throttle WiFi rates. DSA is usually free of charge and has high rates; however, the connection is intermittent, as the user is only allowed to access this spectrum when the spectrum owner is absent. These heterogeneous properties necessitate a framework for mobile users to take all these factors into account and make a decision on traffic allocation that is optimal in terms of throughput, Quality of Service (QoS) constraint satisfaction, and cost.

The "correct" solution of this problem is different for different applications. For example, for interactive applications such as phone/video calls, it is critical that a minimum amount of throughput be available all the time, irrespective of the interface, whereas a delay-tolerant application such as downloads would be served better by waiting until a "cheap" connection such as WiFi is available. The challenge in managing cellular access is that all those applications are contending for available resources; thus, a good scheduler needs to balance the available resources along with application-specific QoS requirements. We approach the problem by designing an application-specific utility function that proportionally weighs the long-term throughput as well as the short-term service regularity to capture each application's QoS requirements. We then leverage the well-documented link prediction capability of mobile users to design predictive QoS-aware resource allocation algorithms that provably maximize applications' utility while minimizing the expensive cellular link usage. We rigorously analyze our predictive solutions and assess the effect of prediction quality on the competitive ratio, showing that our algorithms are O(1/w)-competitive, where w is the length of the prediction window. We demonstrate via simulations how our predictive solutions allocate available resources across applications and adapt to the intermittency of secondary connections such as WiFi.

1.5 Contribution and Thesis Organization

This dissertation focuses on designing efficient network control algorithms to enable robust low-latency communication by addressing four specific networking areas: resource allocation for real-time traffic (Chapter 2), caching and multicasting at the wireless edge (Chapter 3), distributed scheduling (Chapter 4), and heterogeneous network access (Chapter 5). Specifically, in Chapter 2, we investigate the capability that low-complexity deadline-oblivious algorithms have in controlling real-time wireless traffic. In Chapter 3, we analyze predictive wireless caching using multicasting and its effect on the asymptotic delay-throughput relationship. In Chapter 4, we investigate throughput-optimal distributed algorithms, outlining how exploiting interlink dependencies can significantly improve delay performance. In Chapter 5, we address the problem of heterogeneous network access for contending applications and how the predictability of link quality can be exploited for resource allocation tailored to applications' needs. The contributions of this dissertation are described in more detail below.

In Chapter 2, we study the problem of resource allocation for cellular traffic with hard deadlines. Attempting to solve this problem brings about the following two key challenges: (i) the flow arrival and the wireless channel state information are not known to the Base Station (BS) a priori, thus the allocation decisions need to be made in an online manner; (ii) resource allocation algorithms that attempt to maximize a reward in the wireless setting will likely be unfair, causing unacceptable service for some users. We model the problem of allocating resources to deadline-sensitive traffic as an online convex optimization problem, where the BS acquires a per-request reward that depends on the amount of traffic transmitted within the required deadline. We address the question of whether we can efficiently solve that problem with low complexity; in particular, whether we can design a constant-competitive scheduling algorithm that is oblivious to requests' deadlines. To this end, we propose a primal-dual Deadline-Oblivious (DO) algorithm and show it is approximately 3.6-competitive. Furthermore, we show via simulations that our algorithm tracks the prescient offline solution very closely, significantly outperforming several previously proposed algorithms. Our results demonstrate that even though a scheduler may not know the deadlines of each flow, it can still achieve good theoretical and empirical performance. In the second part, we impose a stochastic constraint on the allocation, requiring a guarantee that each user achieves a certain timely throughput (amount of traffic delivered within the deadline over a period of time). We propose a modified version of our algorithm, called the Long-term Fair Deadline Oblivious (LFDO) algorithm, for that setup. We combine the Lyapunov framework for stochastic optimization with the primal-dual analysis of online algorithms to show that LFDO retains the high performance of DO while satisfying the long-term stochastic constraints.

In Chapter 3, we study the effect of predictive caching on the delay of wireless networks. We explore the possibility of caching at the wireless end-users, where caches are typically very small, orders of magnitude smaller than the catalog size. We develop a predictive multicasting and caching scheme, where the Base Station (BS) in a wireless cell proactively multicasts popular content for end-users to cache and access locally if requested. We analyze the impact of this joint multicasting and caching on the delay performance. Our analysis uses a novel application of heavy-traffic theory under the assumption of vanishing caches to show that predictive caching fundamentally alters the asymptotic throughput-delay scaling. This in turn translates to a several-fold delay improvement in simulations over the baseline as the network operates close to full load. We highlight a fundamental delay-memory trade-off in the system and identify the correct memory scaling to fully benefit from the network multicasting gains.

In Chapter 4, we consider the problem of distributed scheduling in wireless ad-hoc networks. We build on recent studies in wireless scheduling that have shown that CSMA can be made throughput optimal by optimizing over activation rates. It has been found that those throughput-optimal CSMA algorithms suffer from poor delay performance, especially at high throughputs, where the delay can potentially grow exponentially in the size of the network. We argue that exploiting interdependencies between wireless links can greatly improve delay performance while preserving the fully distributed scheduling functionality of CSMA. We propose a Node-Based version of throughput-optimal CSMA (NB-CSMA), as opposed to traditional link-based CSMA algorithms, where links are treated as separate entities. Our algorithm is fully distributed and corresponds to Glauber dynamics with "block updates". We show analytically and via simulations that NB-CSMA outperforms conventional link-based CSMA in terms of delay for any fixed-size network. We also characterize the fraction of the capacity region for which the average queue lengths (and the average delay) grow polynomially in the size of the network, for networks with bounded-degree conflict graphs. This fraction is no smaller than the fraction known for link-based CSMA and is significantly larger for many instances of practical wireless ad-hoc networks. Finally, we restrict our focus to the special case of collocated networks, analyze the mean starvation time using a Markov chain with rewards framework, and use the results to quantitatively demonstrate the improvement of NB-CSMA over the baseline link-based algorithm.

In Chapter 5, we focus on the problem of heterogeneous network access through multiple radio interfaces. A natural approach to alleviate cellular network congestion is to use, in addition to the cellular interface, secondary interfaces such as WiFi, dynamic spectrum, and mmWave to aid cellular networks in handling mobile traffic. The fundamental question now becomes: How should traffic be distributed over different interfaces, taking into account different application QoS requirements and the diverse nature of radio interfaces? To this end, we propose the Discounted Rate Utility Maximization (DRUM) framework with interface costs as a means to quantify application preferences in terms of throughput, delay, and cost. The flow rate allocation problem can be formulated as a convex optimization problem. However, solving this problem requires non-causal knowledge of the time-varying capacities of all radio interfaces. To this end, we propose an online predictive algorithm that exploits the predictability of wireless connectivity for a small look-ahead window w. We show that, under some mild conditions, the proposed algorithm achieves a constant competitive ratio independent of the time horizon T. Furthermore, the competitive ratio approaches 1 as the prediction window increases. We also propose another predictive algorithm, based on the "Receding Horizon Control" principle from control theory, that performs very well in practice. Numerical simulations serve to validate our formulation by showing that, under the DRUM framework, the more delay-tolerant the flow, the less it uses the cellular network, preferring to transmit in high-rate bursts over the secondary interfaces. Conversely, delay-sensitive flows consistently transmit irrespective of different interfaces' availability. Simulations also show that the proposed online predictive algorithms have near-optimal performance compared to the offline prescient solution under all considered scenarios.

Finally, concluding remarks and possible future research directions are presented in Chapter 6.

CHAPTER 2

RESOURCE ALLOCATION FOR CELLULAR REAL-TIME TRAFFIC

2.1 Introduction

In this chapter, we address the problem of resource allocation for real-time cellular traffic [17]. To model the problem of resource allocation and scheduling for bandwidth-intensive latency-critical applications, we propose approaching it as an online scheduling problem, where requests arrive at the BS carrying a payload, a hard deadline, and a concave reward function that rewards successful partial transmission within the prespecified hard deadline. Our motivation is that, in many applications, completing a request partially within a deadline is acceptable. For example, in video transmission, frame-dropping and error concealment are used to adapt to lower bandwidths; this fits our model, where (1) transmitting a frame after the deadline is useless, and (2) the portion of the request completed exhibits a diminishing return. Another example is VR applications and/or 360 videos, where tiles outside the field-of-view can be adaptively streamed at a lower rate if needed [18]. A third example is mobile cloud gaming, where the cloud server adaptively transmits the most-likely sequences depending on bandwidth availability [19]; this is thus also an example of a high-bandwidth hard-deadline application with diminishing returns.

We focus on the class of Deadline Oblivious algorithms that allocate resources without taking deadline information into account. Deadline Oblivious algorithms are preferable due to their lower complexity, as the scheduler need not track individual flows' deadlines, and due to their robustness against absent or inaccurate deadline information. Our solution to the problem follows the online primal-dual approach presented in [20] for online linear programs and used in [21] [22] [23] in the context of datacenter scheduling. The problem of online deadline-sensitive scheduling in wireless networks presents the following unique challenges: (1) Time-varying, complex, non-orthogonal capacity regions, due to the nature of the wireless channel and the set of power control, coding, and MIMO capabilities that a Base Station (BS) can use to achieve rates within the capacity region. Our problem formulation treats the instantaneous capacity region as a time-varying closed convex region with no assumptions on the orthogonality of user rates. (2) Susceptibility of opportunistic scheduling to unfairness, as any utility-maximizing algorithm would prefer users with consistently good channels. We tackle long-term unfairness through stochastic timely-throughput constraints. Our key contributions can be summarized as follows:

1. We develop a Primal-Dual Deadline Oblivious (DO) algorithm to solve the problem of scheduling deadline-sensitive traffic, and show in Theorem 2.4.1 that our online solution provides a 3.6 competitive ratio compared to the offline prescient solution that has all the information a priori.

2. We show in Theorem 2.5.3 that the Primal-Dual algorithm can be modified to satisfy long-term stochastic "timely throughput" constraints. Timely throughput is the amount of traffic delivered to the end user within the allowed deadline over a certain time period. We show that this modification causes minimal sacrifice to performance by utilizing a virtual queue structure and Lyapunov arguments in a novel way.

3. We show via simulations that our algorithm outperforms some well-known algorithms proposed in the literature for deadline-sensitive traffic scheduling. We also show that our algorithm closely tracks the offline optimal solution. Furthermore, we verify the efficacy of the modified Long-term Fair Deadline Oblivious (LFDO) algorithm in satisfying timely throughput constraints.

Online scheduling of deadline-constrained traffic is a classical problem in networking [24]. This problem has received increased recent attention with the proliferation of deadline-sensitive applications in datacenters. A preemptive algorithm that relies on the slackness metric was proposed in [22]. In [23], it was shown that online primal-dual algorithms are also energy efficient. Perhaps closest to our setup is the work in [21], where hard deadlines and partial utilities are considered for multi-resource allocation. We compare our algorithm to the one in [21] in the simulation section and show that our algorithm has better performance due to its reliance on primal-dual updates rather than only primal updates. The aforementioned works, however, do not take into account the fundamental challenges of the wireless setup that we have discussed.

In the wireless setting, there has been increasing interest in deadline-constrained traffic. In particular, the concept of "timely throughput" has been proposed and studied extensively [25] [26] [27] for packets with deadlines. However, these works target packet transmissions and do not consider the "diminishing returns" properties of bandwidth-intensive traffic at the flow level.

2.2 System Model

The system model is shown in Fig. 2.1. Every time slot, we model every job/request j arriving at the BS as the tuple (a_j, d_j, Y_j, f_j(·), U_j), representing the arrival time, deadline, job size, a concave reward function that rewards the amount x of the job served with f_j(x), and an intended user among the available N users, that is, U_j ∈ {1, 2, . . . , N}.

[Figure 2.1: Real-Time Cellular Traffic System Model. The figure shows a sequence of jobs (a_j: arrival time, d_j: deadline, Y_j: job size, f_j(·): reward function, U_j: user) arriving over time at the BS, alongside the capacity region of two users.]

At each time slot t, the BS calculates an instantaneous feasible rate region R[t], based on the CSI feedback. The feasible rate region determines the rates that the BS can allocate to different users at each time slot. We do not make any assumptions on R[t], except that it is closed, bounded, and convex. We model the feasible rate regions over time in this way to capture both the time variability characteristic of wireless networks as well as the BS capabilities to employ power control, coding, and MIMO to extend the rate region beyond the simple orthogonal capacity region (see for example [28]). We remark that this assumption changes the problem significantly from the typical datacenter job-resource pairing (e.g., [21]), where the capacity is assumed to be orthogonal with no time variation.

Each job j is active between its arrival time a_j and its deadline d_j, after which the job expires and no reward can be gained from transmitting it. At each time slot t, each active job j is allocated a rate x_tj. We use the variable A_tj as an indicator of whether job j is active at time t, and we collect those indicators at time t in a diagonal matrix that we refer to as A_t. We denote all the jobs that arrive over the problem horizon by the set J, and all rates given to all jobs at time t by x_t = (x_t1, x_t2, . . . , x_tJ).


We assume that the utility functions f_j(·) are continuous, strictly concave, non-decreasing, and differentiable with a gradient ∂f_j(·), and that f_j(0) = 0 for all jobs j. This captures the diminishing-return properties of the job service. With some abuse of notation, we will refer to the vector of the gradients of all functions as ∇f(·) = (∂f_1(·), ∂f_2(·), . . . , ∂f_J(·)).
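To make the model concrete, the following minimal sketch (our own illustration; the class name, fields, and the log reward are assumptions, not the dissertation's code) represents a job tuple (a_j, d_j, Y_j, f_j(·), U_j) and its activity indicator A_tj:

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Job:
    """One request (a_j, d_j, Y_j, f_j(.), U_j) from the system model."""
    arrival: int                      # a_j: slot the job arrives at the BS
    deadline: int                     # d_j: last slot the job may be served
    size: float                       # Y_j: total payload (job size)
    reward: Callable[[float], float]  # f_j: concave, non-decreasing, f_j(0) = 0
    user: int                         # U_j in {1, ..., N}

    def is_active(self, t: int) -> bool:
        """A_tj indicator: the job is active between arrival and deadline."""
        return self.arrival <= t <= self.deadline

# Example: a log utility satisfies the assumptions (concave, f(0) = 0).
job = Job(arrival=3, deadline=10, size=50.0,
          reward=lambda x: math.log(1.0 + x), user=1)
assert job.is_active(5) and not job.is_active(11)
```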

2.3 Problem Formulation

We model the problem as a finite-horizon online convex optimization problem aiming to maximize the total utility obtained from the total resources received by each job prior to expiry. Formally:

$$\max_{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T} \quad \sum_{j \in J} f_j\Big(\sum_{t=1}^{T} A_{tj} x_{tj}\Big) \tag{2.3.1a}$$

$$\text{subject to} \quad \sum_{t=1}^{T} x_{tj} \le Y_j, \quad \forall j \tag{2.3.1b}$$

$$\mathbf{x}_t \in R[t], \quad \forall t = 1, 2, \ldots, T. \tag{2.3.1c}$$

The objective function (2.3.1a) is the utility achieved by each job, due to the sum of resources allocated to that job over its activity window. The constraint (2.3.1b) ensures jobs are not allocated more than their size. The constraint (2.3.1c) ensures that the rates allocated by the BS are feasible w.r.t. the rate region estimated from the CSI feedback. Technically, this constraint should be on the users' rates, not on the jobs. However, it is easy to transform the constraints on users' sum rates to constraints on individual jobs, since every job has a single intended user.
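For intuition, the offline benchmark of (2.3.1), in which all arrivals and rate regions are known in advance, can be solved directly with a convex solver. The sketch below is a hypothetical toy instance that replaces R[t] with a simple per-slot sum-capacity box purely for illustration; the formulation above makes no such orthogonality assumption, and all numbers are made up:

```python
import cvxpy as cp
import numpy as np

T, J = 20, 3                                  # horizon and number of jobs
A = np.zeros((T, J))                          # A[t, j] = 1 if job j active at slot t
A[3:11, 0] = A[0:6, 1] = A[8:20, 2] = 1.0     # activity windows [a_j, d_j]
Y = np.array([50.0, 30.0, 40.0])              # job sizes Y_j
cap = np.full(T, 10.0)                        # per-slot capacity (box stand-in for R[t])

x = cp.Variable((T, J), nonneg=True)          # rates x_tj
served = cp.sum(cp.multiply(A, x), axis=0)    # sum_t A_tj x_tj for each job
objective = cp.Maximize(cp.sum(cp.log(1.0 + served)))  # concave rewards f_j
constraints = [cp.sum(x, axis=0) <= Y,        # (2.3.1b): never exceed the job size
               cp.sum(x, axis=1) <= cap]      # (2.3.1c): simplified rate region
prob = cp.Problem(objective, constraints)
prob.solve()
print("offline optimal reward P* ≈", prob.value)
```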

Our performance metric throughout will be the Competitive Ratio (CR). The competitive ratio γ guarantees that the online algorithm always achieves at least a 1/γ fraction of the total reward achieved by an optimal offline prescient solution that knows all jobs' details beforehand, as well as all the rate regions, independent of the problem size. Denote the total reward achieved by an online algorithm as
$$P = \sum_{j \in J} f_j\Big(\sum_{t=1}^{T} A_{tj} x_{tj}\Big).$$
We call the offline optimal algorithm OPT, and denote the total reward achieved by OPT as
$$P^* = \sum_{j \in J} f_j\Big(\sum_{t=1}^{T} A_{tj} x^*_{tj}\Big).$$

Definition 2.3.1. Competitive Ratio: An online algorithm is γ-competitive if the following holds:
$$\gamma \ge \sup_{S_j,\, R[1], R[2], \ldots, R[T]} \frac{P^*}{P} \tag{2.3.2}$$
where S_j is the input job sequence over all slots.

Dual Problem

Since our solution is based on simultaneously updating the primal and dual solutions, we start by deriving the dual optimization problem:
$$\min_{\alpha, \beta} \quad \sum_{t=1}^{T} \max_{\mathbf{x}_t \in R[t]} \langle A_t\alpha - \beta, \mathbf{x}_t \rangle + \beta^T Y - \sum_{j=1}^{J} f_j^*(\alpha_j) \tag{2.3.3a}$$
$$\text{subject to} \quad \alpha, \beta \ge 0 \tag{2.3.3b}$$
where α = [α_1, . . . , α_J] is the J × 1 Fenchel dual vector, β = [β_1, . . . , β_J] is the J × 1 multiplier of the constraint (2.3.1b), and Y = [Y_1, . . . , Y_J]. The operator ⟨· , ·⟩ is the inner product operator. The function f*_j(α_j) is the concave conjugate of the function f_j(·) [29], which can be written as:
$$f_j^*(\alpha_j) = \inf_{x \ge 0} \; \langle \alpha_j, x \rangle - f_j(x) \tag{2.3.4}$$
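As a concrete check of (2.3.4), the concave conjugate of the log utility used in our sketches can be computed in closed form; this worked example is ours, not from the text:

```latex
% Concave conjugate of f(x) = \log(1+x) for \alpha > 0:
% minimize g(x) = \alpha x - \log(1+x) over x \ge 0.
% Setting g'(x) = \alpha - \tfrac{1}{1+x} = 0 gives x = \tfrac{1}{\alpha} - 1,
% valid when 0 < \alpha \le 1; otherwise the infimum is attained at x = 0. Hence
f^*(\alpha) = \inf_{x \ge 0}\,\{\alpha x - \log(1+x)\} =
\begin{cases}
1 - \alpha + \log\alpha, & 0 < \alpha \le 1,\\[2pt]
0, & \alpha > 1.
\end{cases}
```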

A solution (x, α, β) is a primal-dual solution if and only if:
$$\mathbf{x}_t = \operatorname*{argmax}_{\mathbf{x} \in R[t]} \langle A_t\alpha - \beta, \mathbf{x} \rangle, \qquad \alpha_j = \partial f_j\Big(\sum_{t=1}^{T} A_{tj} x_{tj}\Big). \tag{2.3.5}$$

To derive a competitive ratio bound for our algorithm, we use the following theorem on primal and dual problems:

Theorem 2.3.1 (Weak and Strong Duality [29]). Let (x_1, . . . , x_T) and (α, β) be feasible solutions for the Primal and the Dual problems respectively; then the following holds:
$$D = \sum_{t=1}^{T} \sigma_t(\alpha, \beta) + \beta^T Y - \sum_{j=1}^{J} f_j^*(\alpha_j) \;\ge\; \sum_{j \in J} f_j\Big(\sum_{t=1}^{T} A_{tj} x_{tj}\Big) = P \tag{2.3.6}$$
where $\sigma_t(\alpha, \beta) = \max_{\mathbf{x} \in R[t]} \langle A_t\alpha - \beta, \mathbf{x} \rangle$. For the optimal offline Primal and Dual solutions, assuming strong duality, the following holds:
$$D \ge D^* = P^* \ge P \tag{2.3.7}$$

This gives us a method to bound the competitive ratio of any primal-dual online algorithm by showing that D ≤ γP, which implies that P* ≤ D ≤ γP. This technique is covered in depth for online linear programs in [20] (e.g., Theorem 2.3), for many applications. We use the same idea to analyze our online algorithm, presented in the next section.
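The bounding technique can be sanity-checked numerically: for any feasible primal solution and any nonnegative (α, β), the dual value D of (2.3.6) must dominate the primal reward P. The sketch below does this on the toy box-region instance from the earlier sketch; the particular allocation rule and dual guesses are arbitrary illustrations of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, cap = 20, 3, 10.0
A = np.zeros((T, J))
A[3:11, 0] = A[0:6, 1] = A[8:20, 2] = 1.0     # activity windows, as before
Y = np.array([50.0, 30.0, 40.0])

f = lambda x: np.log(1.0 + x)                 # rewards f_j(x) = log(1 + x)
f_star = lambda a: np.where(a < 1.0, 1.0 - a + np.log(a), 0.0)  # their conjugate

# A feasible primal: split each slot's capacity equally among active jobs,
# then scale columns down so no job exceeds its size Y_j.
x = np.where(A > 0, cap / np.maximum(A.sum(axis=1, keepdims=True), 1), 0.0)
x *= np.minimum(1.0, Y / np.maximum(x.sum(axis=0), 1e-9))
P = f((A * x).sum(axis=0)).sum()

# A feasible dual (alpha, beta >= 0): here, a random one.
alpha, beta = rng.uniform(0.1, 1.0, J), rng.uniform(0.0, 0.5, J)
# sigma_t over the box region: put all capacity on the best positive coordinate.
sigma = cap * np.maximum(0.0, (A * alpha - beta).max(axis=1))
D = sigma.sum() + beta @ Y - f_star(alpha).sum()
assert D >= P                                  # weak duality (2.3.6)
print(f"P = {P:.2f} <= D = {D:.2f}")
```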


2.4 Deadline Oblivious (DO) Algorithm

2.4.1 Algorithm

Before presenting our algorithm, we give some intuition on how we developed it. It is useful to think of our problem as an online fractional matching problem with edge weights on a bipartite graph: on one side of the graph are the jobs, and on the other side are the time slots. Each time slot brings new information on the capacity, edge weights, and utility functions. It is well known that for the simplest online matching problem with linear rewards, there exists an e/(e−1)-competitive primal-dual algorithm that outperforms the simple greedy algorithm, which is 2-competitive [30]. Later, this framework was extended to concave reward functions for covering/packing problems [31] and for online matching problems [32]. In fact, our algorithm builds on the algorithm presented in [32] for online matching with capacity constraints only and no job size constraints. We develop a complete resource allocation algorithm for deadline-sensitive traffic with job size constraints, as well as tackling the long-term stochastic constraints. The algorithm continuously allocates resources to active jobs by controlling x_t, and accordingly updates the per-job dual variables α_t = [α_t1, . . . , α_tJ] and β_t = [β_t1, . . . , β_tJ] every time slot. Line 4 of the algorithm jointly allocates the primal and dual variables by solving a low-complexity saddle point problem. We will later show how to use approximation to further reduce the complexity of the problem. Line 5 updates the dual variable β, which ensures that no job is allocated more resources than its size. This discounts the reward obtained from any job as it gets closer to completion; hence, this discounting gives priority to jobs that have more work remaining. Note that the instantaneous primal and dual allocations of all jobs do not use knowledge of the activity window after time t. Since the algorithm is deadline oblivious, decisions depend only on the current activity of a job and do not take into account the future activity until the deadline.

20

Algorithm 1: Deadline Oblivious (DO) Algorithm

1 Initialize: At t = 0, set βtj = 0, ∀j2 for t = 1 to T do3 BS receives new jobs arriving at time t, and calculates R[t]4 Calculate the pair (αt,xt) that solves the following saddle point problem:

minα≥0

maxx∈R[t]

− f ∗(α)+ < α− βt−1,

t−1∑s=1

Asxs + Atx >

Update the dual variable for every job βtj as follows:

βtj =∂f(∑t

s=1 Asjxsj)

∂f(∑t−1

s=1 Asjxsj)

(1 +

AtjxtjYj

)βt−1j

+∂f(∑t

s=1Asjxsj)Atjxtj(C − 1)Yj

deadline oblivious, decisions only depend on current activity of a job and do not take

into account the future activity until the deadline.

We define the capacity-to-file-size ratio, Fmax, as the maximum ratio between

the resources any job can receive at any one time slot and the total job size. We

assume, that Fmax > 1, i.e., no job can be fully transmitted over one time-slot. This

assumption is essential to obtain a constant competitive ratio. This is equivalent to

the “bid-to-budget” ratio assumption in online matching problems [30]. Also, let C

in line 5 of the algorithm be C = (1 +Fmax)1

Fmax . Note that as Fmax approaches zero,

C approaches e, which will be useful when we derive the competitive ratio.

2.4.2 Analysis

In the next few Lemmas, we will show that the DO algorithm has some useful prop-

erties that enable us to derive a relationship between the primal and dual objectives.

We first define a complementary pair

21

Definition 2.4.1. x and α are said to be a Complementary Pair if any one of those

properties hold (It can be shown that they are all equivalent)

f ′(x) = α, f ∗′(α) = x, f(x) + f ∗(α) = xα,

where f ∗(α) is the concave conjugate defined in (2.3.4).

Lemma 2.4.1. DO produces a primal-dual solution (x, α, β) that guarantees the fol-

lowing for all time slots:

1. (αtj,t∑

s=1

Asjxsj) are a complementary pair for all time slots t, and for all jobs

j ∈ J , i.e., αtj ∈ ∂fj(t∑

s=1

Asjxsj)

2. xt ∈ argmaxx∈R[t]

< α− β,t−1∑s=1

Asxs + Atx >

The Proof of the Lemma is immediate from the properties of the concave-conjugate

property and the inner maximization problem in line 4 of the algorithm. The next

two Lemmas ensures that DO produces a feasible primal-dual solution

Lemma 2.4.2. For any job j, the dual variable βtj grows as a geometric series that

can be bounded from below as follows

βtj ≥∂f(∑t

s=0Asjxsj)

C − 1(C

∑ts=0 Asjxsj

Yj − 1) (2.4.1)

Proof. see Appendix A.1.

Lemma 2.4.3. (Properties of DO) DO produces a primal solution [xtj],∀j ∈ J , and

a dual solution (αtj, βtj),∀j ∈ J , for all time slots t, with the following properties:

1. The dual solution is feasible for all jobs at all time-slots:

αtj ≥ 0,∀j ∈ J, ∀t = 1, 2, . . . , T (2.4.2)

22

βtj ≥ 0, ∀j ∈ J, ∀t = 1, 2, . . . , T (2.4.3)

2. The Primal solution is almost feasible for all jobs at all time slots. The following

conditions are satisfied:

xt ∈ R[t],∀t = 1, 2, . . . , T (2.4.4)T∑t=1

xtj ≤ Yj(1 + Fmax),∀j ∈ J (2.4.5)

We say that the solution is “almost feasible” since the job size constraint can

be slightly violated as seen in (2.4.5). In particular, allocations of a job can exceed

the job size by Fmax, which we assume to be small. We can easily obtain a feasible

solution by multiplying all allocations xtj by (1− Fmax).

Proof. See Appendix A.2

To prove a competitive ratio bound, we will bound the Dual cost in terms of the

Primal reward using the next key theorem, and then use the weak duality in Theorem

2.3.1 to obtain our main result.

Theorem 2.4.1. (Key Theorem) The dual cost given the Primal-Dual online solution

obtained by DO can be bounded as follows:

D =T∑t=1

σt(ATt αT − βT ) + βTT Y −

J∑j=1

f ∗j (αTj) (2.4.6)

≤ P + P + P

(1 +

1

C − 1

)= P

(3 +

1

C − 1

)(2.4.7)

To prove the Theorem, we will give three lemmas. Each of those lemmas is to

bound one term on the RHS of (2.4.6).

23

Lemma 2.4.4. For any time slot t, DO chooses an allocation that satisfies the fol-

lowing:

< αt,Atxt >≤ 4P (2.4.8)

where 4P =∑j

4Pj =∑j

fj(t∑

s=1

Atjxtj)−fj(t−1∑s=1

Atjxtj) is the instantaneous utility

obtained by DO at time t.

Proof. Let f(y) =∑j

fj(yj). By Lemma 2.4.1, we know that αt ∈ ∇f(t∑

s=1

Asxs).

Substituting in the LHS of (2.4.8), and using the concavity of utility function, we get

the following

< ∇f(t∑

s=1

Asxs),Atxt >≤ f(t∑

s=1

Asxs)− f(t−1∑s=1

Asxs) = 4P (2.4.9)

Lemma 2.4.5. The sequence of vectors [β1, β2, . . . , βt] produced by DO has the fol-

lowing property:

(βt − βt−1)TY ≤ 4P(

1 +1

C − 1

)(2.4.10)

Proof. See Appendix A.3

The next Lemma bounds the last term in (2.4.6) by bounding the concave conju-

gate in terms of the original function.

Lemma 2.4.6. The concave conjugate f ∗(α) can be bounded using the term, µf given

by

µf = supc|f ∗(α) ≥ cf(u), α ∈ ∂f(u), u ∈ K (2.4.11)

24

for a proper cone K, and −1 ≤ µf ≤ 0.

The proof is straightforward from Lemma 2.4.1. A complete proof of this property

is given in Lemma 1 in [32].

We are now ready to proof our key Theorem:

Proof. (Theorem 2.4.1): See Appendix A.4

Corollary 2.4.1.1. The online solution found by DO is (3 +1

C − 1)-competitive.

We note two things about our results

1. To guarantee primal feasibility, the BS can multiply the resource allocation

solution by (1 − Fmax) at each time slot. This adds an extra factor to the

Competitive Ratio making the algorithm (3 +1

C − 1)(1− Fmax)-competitive.

2. Practically, we expect Fmax to be small as the job service times have a slower

time scale than the scheduling job completion time scale. Thus we expect

Fmax → 0 making the algorithm approximately 3 +1

e− 1-competitive.

2.4.3 Lightweight Algorithm

The complexity of the DO Algorithm can be further reduced by splitting the saddle

point problem in line 4 into two separate steps as follows:

maxx∈R[t]

< αt−1 − βt−1, Atx >, αtj ∈ ∂(fj(t∑

s=1

Asjxsj))

This approximation was proposed in [32] in the context of online bipartite matching.

This formulation approximates the saddle point problem with a Linear Programming

problem, reducing complexity. However, the price of this reduction in complexity is an

increase in the constant-competitive ratio bound that depends on the specific utility

25

function gradients ( [32] analyzes this penalty in the bipartite matching problem).

We will show using numerical simulations that this approximation retains the good

performance of the DO algorithm.

2.5 Stochastic Setting with timely throughput constraints

Although the job/reward formulation in (2.3.3) has been used extensively in modeling

scheduling with hard deadlines, for example [21] [22] [23] , a formulation that aims to

maximize total rewards of jobs is susceptible to unfairness. For example, the BS can

maximize the sum of rewards by consistently allocating resources to a nearby user

experiencing better channels all the time. This phenomenon was reported in previous

works [33] and is further validated by simulations. Furthermore, the results in the

previous section hold for adversarial models, designed for “worst case” inputs. In

practice however, both the job arrivals processes and the rate regions are stochastic.

We propose a new model to deal with those two issues that have the following extra

assumptions:

Assumption 2.5.1. 1. We assume a frame structure: At the beginning of a frame

of size D, some jobs arrive to the BS to be transmitted to users. By the end of

the frame after D slots, all jobs expire, and the system is empty. Note that jobs

can still have different deadlines as long as they are all upper bounded by D.

The frame structure has been extensively used in modeling deadline-constrained

traffic [25] [34] [35]. This assumption has been shown to adequately approximate

practical scenarios, while enabling the design of efficient scheduling algorithms

with deterministic bounds on delay.

2. We assume that there are l-job classes with specified deadlines, reward func-

tions, and sizes. Each of these l-classes arrive at the beginning of the frame

26

according to an i.i.d arrival process Ak. We assume that the number of the new

jobs arriving at the beginning of a frame can be deterministically bounded, i.e.,

(m(t)) ≤M , where m(t) is a random variable representing the number of active

jobs at time t.

3. We assume that the instantaneous rate region R[t] is sampled every time slot

from a set of finite convex regions in an i.i.d manner unknown to the BS. The

realization of rate regions over a frame is denoted as Rk.

The new formulation is presented in (2.5.1). Our goal now is to maximize the

long-term average expected rewards over frames k = 1, . . . , K. We denote the jobs

that arrive at frame k as Jk. In (2.5.1b), we introduce a new constraint to guarantee

fairness by ensuring that every user gets an expected timely-throughput higher

than δn. Timely-throughput is the amount of traffic delivered within the deadline

over a period of time. It has been used extensively to analyze networks with real-

time traffic [25] [26]. The function U( ) simply maps the job j to its intended user n.

maxx1,...,xt

lim infK→∞

1

K

K∑k=1

E∑j∈Jk

fj(

(k+1)D−1∑t=kD

Atjxtj)

(2.5.1a)

subject to lim infK→∞

1

K

K∑k=1

E ∑j∈Jk∩U(j)=n

(k+1)D−1∑t=kD

Atjxtj

≥ δn (2.5.1b)

T∑t=1

xtj ≤ Yj, ∀j (2.5.1c)

xt ∈ R[t], ∀t = 1, 2, . . . , T. (2.5.1d)

We refer to a random realization of job arrivals and rate regions over a frame as q.

The optimization problem (2.5.1) can be solved by a stationary scheduler that maps

27

q = Ak,Rk into the set of feasible actions over the frame: χ = x|(k+1)D−1∑t=kD

xtj ≤

Yj, ∀j ∈ Jk,xt ∈ R[t],∀t = kD, . . . , (k + 1)D − 1 with probabilities pqχ. Thus, the

optimal solution can be derived by finding the probabilities pqχ that solve (2.5.2).

This is practically infeasible as the probabilities q are typically unknown to the BS.

Even if the probabilities were known, the BS needs to non-causally know the rate

regions for the entire frame. This motivates us to extend our DO algorithm for the

stochastic setting to solve (2.5.2) and derive performance guarantees.

maxpqχ

∑q

νq

∫χ∈Xq

pqχ∑j

fj(

(K+1)D−1∑t=KD

Atjxtj)dχ (2.5.2a)

subject to∑q

νq

∫χ∈Xq

pqχ∑

j|U(j)=n

(K+1)D−1∑t=KD

Atjxtjdχ ≥ δn (2.5.2b)∫χ∈Xq

pqχdχ = 1 ∀q (2.5.2c)∫χ∈Xq

pqχ ≥ 0 ∀q (2.5.2d)

2.5.1 Virtual Queue Structure

To deal with the new timely throughput constraints (2.5.1b) for each user n, we define

a virtual queue that records constraint violations. For every frame, the amount of

unserved work under the δn requirement, δn −T∑t=1

Atjxtj is added to the queue, i.e.,

the queue is updated as follows:

Qn[k + 1] = (Qn[k] + δn −∑

j∈Jk∩U(j)=n

(k+1)D−1∑t=kD

Atjxtj)+, (2.5.3)

where (x)+ = max(0, x). There are two time-scales at play here. First, the slower

frame-level time scale. At the beginning of a frame, jobs arrive and by the end of the

frame, those jobs expire. Second, the faster slot level time-scale, where the channels

28

change and the BS allocates rates x. Each frame consists of D time slots where all

jobs are guaranteed to expire by the end of the frame by Assumption 2.5.1. Virtual

queues are used to analyze the time-average constraint violation for a given scheduling

policy. It can be shown that stability of the virtual queue ensures that the constraint

is satisfied in the long term. We state that well-known result as a Lemma without

proof (The proof is simple and can be found in [36] [37])

Lemma 2.5.1. For any user n, the virtual queue length upper bounds the constraint

violation at all times as follows:

Qn[K]

K− Qn[0]

K≥ δn −

1

K

K∑k=1

∑j∈Jk∩U(j)=n

(k+1)D−1∑t=kD

Atjxtj (2.5.4)

Furthermore the mean rate stability defined as:

limK→∞

E(Qn[K])

K= 0 (2.5.5)

implies that the constraint (2.5.1b) is satisfied in the long-term.

2.5.2 D Look-ahead Algorithm

Before explaining our algorithm, we present and analyze a non-causal frame-based

algorithm that we refer to as the D look-ahead algorithm. The benefits of this hy-

pothetical algorithm are two-fold: First, it guides our design of the practical LFDO

algorithm in the next section, and second, it will be crucial in analyzing the perfor-

mance of LFDO.

The D look-ahead algorithm observes the jobs Jk at the beginning of the frame and

non-causally observes all rate regions over the frame R[k],R[k+ 1], . . . ,R[k+D−1],

29

and allocates rates x′ of jobs over the frame k by solving the following optimization

problem:

maxxkD,..,x(k+1)D−1

V∑j∈Jk

fj(

(k+1)D−1∑t=kD

Atjxtj)

+N∑n=1

Qn[k]

( ∑j|U(j)=n

(k+1)D−1∑t=kD

Atjxtj

) (2.5.6a)

subject toT∑t=1

xtj ≤ Yj, ∀j ∈ Jk (2.5.6b)

xt ∈ R[t], ∀t ∈ [kD, (k + 1)D − 1] (2.5.6c)

where V is a free parameter that will be used to manage the trade-off between the

timely-throughput short-term constraint violation and total reward achieved by the

algorithm. The D look-ahead algorithm is essentially a version of the well-known drift-

plus-penalty algorithm introduced in [36] that has been used extensively in stochastic

constrained optimization problems, where a queue structure can be used to deal with

long-term constraints.

To simplify the notation, we will refer to the frame k D look-ahead reward and

timely throughput, respectively as follows:

P ′[k] =∑j∈Jk

fj(

(k+1)D−1∑t=kD

Atjx′tj) (2.5.7)

b′n[k] =

( ∑j∈Jk|U(j)=n

(k+1)D−1∑t=kD

Atjx′tj

)(2.5.8)

30

We define the quadratic Lyapunov function L(Q[t]) =1

2

N∑n=1

Q2n[t]. We also define

the one step Lyapunov drift and bound it as follows:

4Θ(Q) = E(L(Q[k + 1])− L(Q[k])|Q[k] = Q)

≤ B +N∑n=1

Qn[k](δn − bn[k]) (2.5.9)

where B is a bound on the term E((δn− bn[k])2

), which is guaranteed to exist due to

the boundedness of the number of jobs and the job sizes. It can be seen that the D

Look-ahead algorithm in (2.5.6) attempts to maximize the reward while minimizing

the drift (and subsequently the queue lengths), using the parameter V to manage the

trade-off. We are now ready to state the theorem that bounds the performance of

the D look-ahead theorem.

Theorem 2.5.2. Suppose there exists a solution that can achieve a timely throughput

strictly greater than δn + ε, for some ε > 0, for all users. Under the D look-ahead

solution, the queues Qn,∀n are mean-rate stable, and the following holds:

lim infK→∞

1

K

K∑k=1

E(P ′[k]) ≥ P ∗ − B

V(2.5.10)

lim supK→∞

1

K

K∑k=1

N∑n=1

E(Qn[k]) ≤ B + VMfmax(Ymax)

ε(2.5.11)

Before giving the proof, we point out that Theorem 2.5.2 shows that the D look-

ahead algorithm can be made arbitrarily close to OPT by increasing V , at the cost of

increasing the queue lengths, which implies higher short term violation of the timely

throughput constraint. The main assumption of the theorem is a mild assumption

31

that a strictly feasible solution exists, i.e., timely throughput constraints cannot be

set arbitrarily and must be strictly feasible under some solution. This corresponds to

the “Slater conditions” that are essential to applying the Lyapunov arguments [36].

Proof. See Appendix A.5

2.5.3 Long-term Fair Deadline Oblivious (LFDO) Algorithm

We are now ready to present our modified deadline oblivious algorithm that can

satisfy long-term timely throughput constraints. As can be seen in Algorithm 2,

Algorithm 2: Long-term Fair Deadline Oblivious (LFDO) Algorithm

Initialize: At k = 0, set Qn[k] = 0, ∀n1 for k = 1 to K do

Initialize Frame: Receive jobs at the beginning of the frame2 for t = kD to (k + 1)D − 1 do3 Perform the DO algorithm with the modified job reward function, gj:4

gj(x) = V fj(x) +N∑n=1

1(U(j) = n)Qn[k]Atjxtj (2.5.12)

5 Update the queues according to (2.5.3)

LFDO is a modified version of the DO algorithm incorporating long term timely

throughput guarantees. This is done by building on the virtual queue idea shown in

the D look-ahead solution. There are two time scales at play here:

• Frame time scale: The slower time scale where virtual queues are updated according

to the LFDO solution over the frame duration.

32

• Slot time scale: The faster time scale where the DO algorithm operates. Every

frame length acts as the “horizon” for the DO algorithm. At the beginning of the

frame, DO re-initializes to serve the jobs that belong to that frame.

The reward function in line 3 has been modified to add the user queue length infor-

mation to the job reward function. This follows the drift-plus-reward maximization

used to obtain the D look-ahead solution in (2.5.6). The difference is, unlike the D

look-ahead solution, LFDO does not know the future rate regions. Thus, on time-

slot scale, LFDO uses the primal-dual optimization used for DO with the modified

reward. We are now ready to combine our results of the DO algorithm performance

and the D look-ahead solution performance to obtain a powerful performance result

for the LFDO algorithm in the next theorem

Theorem 2.5.3. Under the LFDO Algorithm in the stochastic setting, all queues are

mean rate-stable. Furthermore, the expected reward and the expected queue length can

be bounded as follows:

lim infK→∞

1

K

K∑k=1

E(P [k]) ≥ 1

γ(P ∗ − B

V) (2.5.13)

lim supK→∞

1

K

K∑k=1

N∑n=1

E(Qn[k]) ≤ γ(B + VMfmax(Ymax))

ε(2.5.14)

where γ is the Competitive Ratio achieved by the DO algorithm.

This result asserts that the LFDO algorithm maintains its power when moving

from the adversarial to the stochastic setting. In particular, LFDO satisfies the

timely throughput constraint by (2.5.14). This comes at the cost of a larger queue

length compared to the D Look-ahead algorithm. Similarly, the LFDO algorithm can

be made arbitrarily close to achieve a1

γ-fraction of the stationary optimal reward,

where γ is the constant competitive ratio achieved by the DO algorithm in Corollary

33

2.4.1.1. This reduction of reward compared to the D Look-ahead algorithm is due

to the non-causality advantage that the D look-ahead algorithm has over the LDFO

algorithm. However, Theorem 2.5.3 shows that LFDO guarantees each user a long-

term stochastic timely throughput while achieving a constant fraction of the long-term

optimal reward independent of the problem size. Proving 2.5.3 is straight-forward

given the machinery we have already built.

Proof. We prove the theorem by applying the key Theorem 2.4.1 with reward-plus-

drift function, over the frame length. Since LFDO maximizes the sum of gj( ) func-

tions over every frame, Theorem 2.4.1 guarantees that LFDO achieves a modified

reward that is at least a1

γ-fraction of the reward achieved by any offline solution.

Thus, we can relate the reward-plus-drift achieved by the LFDO (in the LHS) to the

one achieved by the D look-ahead (in the RHS) as follows:

γ

( N∑n=1

Qn[k](δn − bn[k])− V P [k])≤

N∑n=1

Qn[k](δn − b′n[k])− V P ′[k] (2.5.15)

The rest of the proof is straight-forward and follows exactly the steps of the proof of

Theorem 2.5.2

2.6 Numerical Results

We assess the performance of our proposed algorithms with numerical simulations.

We first show that Lightweight DO tracks the offline solution very closely, outper-

forming several existing algorithms in the literature. We compare DO to a state of

the art algorithm that was proposed in [21] in the datacenter context. We call that

algorithm “Primal” since it is also a deadline oblivious algorithm that only relies on

the primal but not the dual updates to determine the allocation. Despite being a dat-

acenter algorithm, Primal also attempts to maximize total partial job rewards, and is

34

therefore comparable to DO (although the competitive ratio results were derived for a

wired setting only). We also compare the performance against the Earliest-Due Date

(EDD) that was analyzed in [38] for packets as a benchmark, and a greedy algorithm

that was proposed for linear reward functions in [39].

Setup: We simulate a downlink cell with three users (we chose a small number

of users to enable the offline solver to run with reasonable memory requirements).

Each time slot, for each user, a new jobs arrive to the BS intended to that user with

probability p. Thus, p represents the traffic intensity of the system. The job sizes

are uniformly distributed between 5 and 25 units. Each job has a random deadline

uniformly distributed between 2 and Dmax time slots. Dmax represents the laxity

of the system. Smaller Dmax means tighter deadlines. Large Dmax implies more

variety in traffic. The instantaneous rate region is generated by sampling a uniformly

random distribution for each user, then taking the convex hull of those user samples.

The resultant rate region is non-orthogonal. Finally, each job has a random reward

function ofv(0.1 + x)(1−ψ)

1− ψ, where v and ψ are uniformly distributed between 0 and

1.

Performance of DO: In Fig. 2.2a, we plot the performance of different al-

gorithms and OPT while varying traffic intensity, p. It is clear that DO tracks the

OPT very closely, confirming our premise that Deadline Oblivious scheduling is ef-

ficient for real-time traffic. DO consistently performs 8 − 15% better than Primal

at a lower complexity, since lightweight DO has the complexity of a linear program

while primal solves a general convex program. Greedy and EDD perform significantly

worse. In Fig. 2.2b, we vary Dmax between 2 and 40 time slots to simulate different

workloads. The results are similar to the previous figure with DO closely tracking

OPT. Interestingly, there is a slight performance degradation for very small values

35

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1

2

3

4

5

6

7

8

9

(a) Varying p

0 10 20 30 402

3

4

5

6

7

8

(b) Varying Dmax

Figure 2.2: Comparison of performance of different algorithms

of Dmax when deadlines are very tight. This is consistent with our findings regarding

the dependence of competitive ratio bound on Fmax, the job-size-to-capacity ratio.

Performance of LFDO: In Fig. 2.3, we simulate the system for five users. We

set up the simulation, such that User 1 consistently gets low feasible rates compared

to other users. In particular, we sample the random rates such that User 1 can get

a maximum timely throughput of 0.05, and other users can get up to 0.5 timely

throughput. The instantaneous rate region is the convex hull of random rates. We

set a minimum timely throughput constraint of 0.045, thus pushing the system to the

boundary of the “capacity region” by forcing User 1 to operate very close to its upper

limit. In Fig. 2.3a, we show the timely throughput of all users under DO. Since DO

tries to maximize reward with no regard to timely throughput constraints, we see

that User 1 converges to a timely throughput well below the requirement. In Fig.

2.3b, we run LFDO for the same system with V = 1. Despite the improvement over

DO, the convergence to the required timely-throughput level is slow since virtual

queues are allowed to backlog before being cleared. In Fig. 2.3c, we set V = 0.1

emphasizing the importance of timely throughput constraints. The result is that

36

0 100 200 3000

0.1

0.2

0.3

0.4

0.5

(a) DO

0 100 200 3000

0.1

0.2

0.3

0.4

0.5

(b) LFDO, V = 1

0 100 200 3000

0.1

0.2

0.3

0.4

0.5

(c) LFDO, V = 0.1

Figure 2.3: Resource allocation per user under DO and LFDO

User 1 can now satisfy the constraint with fairly quick convergence at the expense of

slightly decreased reward (within 95% of DO reward). This outlines the previously

stated trade-off between the reward and the timely throughput guarantees.

2.7 Conclusion and Future Work

We have studied the problem of resource-allocation of low-latency bandwidth-intensive

traffic. We have formulated the problem as an online convex optimization problem,

developed a low-complexity Primal-Dual DO algorithm and derived its competitive

ratio. We have demonstrated that our algorithm is efficient and does not rely on

deadline information. We have also proposed the LFDO algorithm, that modifies

DO to satisfy long-term stochastic timely-throughput constraints. We have shown

via simulations that our proposed algorithms tracks the offline optimal solution very

closely and performs better than existing solutions. In the future work, we aim to

understand the properties of DO algorithm better, for example, we aim to analyze

how many jobs are served to their completion. This will enable us to expand the

algorithm to serve traffic that must be served to completion as well as traffic that

has the partial utility property. We aim to develop our work to take the unreliability

37

of wireless channels and inaccurate channel estimations into account. We also plan

to test our algorithm with a real-time setup through the variety of traffic seen in 5G

networks.

38

CHAPTER 3

PREDICTIVE CACHING AT THE WIRELESS EDGE

3.1 Introduction

In this chapter, we investigate the potential of caching at mobile end-users in wire-

less networks. While Base Station (BS) caching has been extensively studied, we

present the following question: “If caching content in last mile edge devices can cause

significant delay reduction, can we go one step further and push popular content to

end-users devices’ caches?”. Users can access cached content locally with zero-delay.

Furthermore, this helps reduce the overall delay by avoiding having to continuously

transmit redundant content over the wireless medium, which dissipates expensive

wireless resources, that may be the delay bottleneck especially during the busy hour.

We address that fundamental question “Can small cache sizes at the end-users be

exploited to significantly reduce delay?”. The motivating principle behind our work

is the “commonality of information”, i.e., the same content being requested by a

large number of users over a short period of time. Indeed, most content publishers

and sharing platforms often exhibit their trending content, encouraging more users

to request that content. This phenomena has been widely studied and a significant

effort has been done to statistically model it [40]. From a networking standpoint,

serving trending content in an on-demand unicast fashion (as is prevalent in today’s

39

networks) unnecessarily strains the network, wasting radio resources on fulfilling re-

dundant requests. Intuitively, the network is better off multicasting trending content

to all users that may request it, exploiting the broadcast nature of the wireless chan-

nel. Thus, instead of using a single resource (for example the number of LTE resource

blocks needed to transmit a video) per each user request, the operator can use a single

resource to fulfill all requests.

The joint deployment of multicasting and caching has been previously proposed

in [41] for energy minimization for transmissions that can tolerate a small amount of

delay. Joint caching and multicasting was studied for minimization of delay, power,

and fetching cost in [42], and maximization of successful transmission probability

in [43]. A deep reinforcement learning approach was undertaken in [44] to determine

which content to multicast. Despite the interest in joint caching and multicasting,

a study of the effect of combining both on delay has been absent. The fundamental

idea of multicasting popular content for end users to cache faces two fundamental

challenges. The first challenge is that users are likely to request the same content

at different times. Some works [41] have circumvented that obstacle by waiting for a

constant window before multicasting content to all users with outstanding requests,

forcing users to wait until the end of the window (on the order of a few minutes)

which might be unacceptable for users less tolerant to delay. Conversely, we propose

proactively multicasting popular content upon generation then exploiting end-users

caches to hold popular content. We refer to this scheme as Predictive Caching. Pre-

dictive Caching consists of two steps:1. Popular content is proactively multicast to

all users in the cell. 2. End users cache that content upon receipt in their local caches

for a duration equal to the typical requests lifetime, before discarding that content

to empty cache space for newer content. The users can then access that content any

time from the local cache with zero-delay. The second challenge is that end-users

40

have very limited cache sizes. Much of the previous work on wireless caching [11] [45]

assumed that the local cache size is on the order of the catalog size. This assumption

is not suitable for a variety of wireless networks, where the end devices (e.g., smart

phones, tablets, etc.) have limited memory. Thus, we carry out our analysis under

the assumption that cache sizes at end users can be very small. More precisely, we

show that significant delay reductions are attained even if the end-user cache sizes

vanish as the load increases. Our contributions can be summarized as follows:

1. We propose a predictive caching system whereby a BS (or an AP) divides the

bandwidth as a load-dependent θ-fraction (constrained by the cache sizes) for

predictive caching and a (1 − θ)-fraction for traditional on-demand unicast.

The BS then uses that θ-fraction to proactively multicast popular content for

end-users to cache by exploiting the wireless broadcast channel.

2. We model the predictive caching system as a downlink scheduling problem.

We introduce the Heavy-Traffic (HT) queuing framework to analyze the delay

performance under predictive caching. We use a novel duality framework to

simplify the scheduling problem with a load-dependent capacity region into a

single dimensional routing problem that is easier to analyze using standard HT

tools.

3. We analyze the predictive caching system for vanishing cache sizes vis-a-vis the

baseline unicast on-demand system. We show that predictive caching alters

the asymptotic delay scaling in the heavy traffic limit. This means that the

average delay of the predictive caching regime grows slower than the baseline

as a function of the network load, leading to significant delay savings as the

network approaches full load. We also illustrate via simulations that this delay

41

scaling altering translates to many-fold savings in delay for reasonable cache

sizes.

4. We characterize the effect of cache sizes, popularity distribution, number of

users in the system, and network load on the delay. We identify and formalize

the inherent delay-memory trade-off in the system, which is expected to aid

in end-user cache dimensioning. We also characterize the memory scaling as a

function of throughput to attain favorable delay scaling.

3.2 System Model

3.2.1 Basline On-demand Unicast System

New

Content

generated

with rate r

Edge

Server:

Receives

and routes

requestsShared Wireless Channel

Figure 3.1: On-Demand Unicast Baseline System Model

The system model is shown in Fig. 3.1. We assume new content is continu-

ously generated by the network with rate r. Each new content/item has a popularity

p ∈ [0, 1] drawn from a prior popularity distribution f(p). Upon content generation,

each user will request this new content with probability p. For ease of presentation,

42

we assume the popularity distribution is homogeneous across different users. Never-

theless, the theoretical framework could be extended to the case with heterogeneous

popularity distributions. The Base Station (BS) keeps a queue, Qi, for each user

i, to hold their requests until they are served. Each queue has an arrival rate Ai[t]

depending on the content requests and a service rate Si[t] that depends on the BS

scheduling algorithm. We assume that the channel between the BS and the end users

is a collision channel, where each time slot, the BS can transfer one item to one user.

For simplicity, we assume that all items are equal in size, which is justified by the fact

that large items can be split up to smaller chunks of equal size. The BS can deploy

any scheduling algorithm to serve outstanding requests. However, in the on-demand

unicast system, requests have to be fulfilled individually and reactively.

Formally, the on-demand unicast queues evolve as follows:

Qi[t+ 1] = (Qi[t] + Ai[t]− Si[t])+, ∀i = 1, . . . , N,where (3.2.1)

Si[t] ∈ 0, 1, ∀i = 1, . . . , N.,N∑i=1

Si[t] ≤ 1, ∀i = 1, . . . , N. (3.2.2)

λ = E[Ai[t]] = r

∫ 1

0

pf(p)dp (3.2.3)

ε = E(SΣ[t])− E(AΣ[t]) = 1−Nλ, (3.2.4)

where (x)+ = max(0, x). AΣ[t] and SΣ[t] are the sum of arrivals and service of all

queues at time t, respectively, and ε quantifies the network load, so a network with

ε→ 0 is said to be operating at full load. Condition (3.2.2) highlights that only one

user can be served at a a time-slot We can see from Fig. 1 that popular contents that

are requested by multiple users over a short period of time cause a lot of redundancy

in the queues, which for crowded cells can cause users to experience large delays.

This phenomena was empirically verified in [46], where the traffic from big events

43

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Content Popularity: p

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

PD

F: f(

p)

Popular Content

Proactively multicast

and cached by

all users

(a) Popularity Distribution highlightingγ(θ, ε)

New

Content

generated

with rate r

Edge

Server:

Receives

and routes

requestsShared Wireless Channel

Multicast

Channel

𝜽-fraction

(1-𝜽)-fraction

Popular content

sent to

predictive

multicast queue

(b) Predictive Caching System Model

Figure 3.2: Predictive Caching Model

was analyzed (in this case the superbowl), and it was reported that popular content

(in this case game related content) constituted the majority of traffic requests by

users. However, it was reported that simultaneous requests for same content were

rare, rendering direct multicast ineffective. This motivates our predictive caching

solution that we now present.

3.2.2 Predictive Caching

In order to reduce the redundancy in the network and exploit the commonality of

information and temporal locality in users’ content requests, we propose the model

in Fig. 3.2. The main idea is that content that is known to be popular is likely to be

widely requested, thus, the BS can proactively multicast those items and have end

users cache them and access them locally. In order to do that, we propose the BS

divides the bandwidth into a θ(ε)-fraction dedicated to multicasting, and a (1− θ(ε))-

fraction dedicated to on-demand unicast. The choice of θ(ε) is determined by two

things: The first is the network load ε, and the second is the amount of physical

44

memory available at the end users. Physical memory imposes a constraint on how

large θ(ε) can be independent of the load. To see this, assume that all items have a

lifetime T , for which they can be requested. This approximates the temporal locality

phenomenon of content requests reported in [40]. Assuming a physical cache size of

M (with respect to a normalized item size), θ(ε) can be bounded proportionally to

M

Tto ensure that items are available in the cache for a duration no shorter than

their lifetime. From here on, we will use the multicast bandwidth fraction, θ(ε), as

representative of the amount of end user cache used to hold content. We further

assume that the multicast channel has a rate of 1 to transmit an item to all users.

Once new content is generated in the network, the BS makes a choice on whether

to predictively multicast that content and have end-users cache it. The BS makes

that decision by simply setting a popularity threshold γ(θ, ε). Any item that has

popularity greater than γ(θ, ε) is automatically multicast, where the threshold is

chosen to ensure that the multicast queue is stable. Thus, the contents are divided

into two sets, as shown in Fig. 3.2(a), a popular set to be multicast, C(θ,ε), and a

unicast on-demand set, C(θ,ε)

.

C(θ,ε) =

c ∈ Contents |p(c) ≥ γ(ε, θ) = F−1

(1− θ(ε)

r

), (3.2.5)

where F−1 denotes the inverse CDF of the popularity distribution. This can be

equivalently written in terms of the cache arrival rate by picking the threshold to

ensure a certain arrival rate of items to the cache size that guarantees that, with

high probability, items stay in the cache for a duration no shorter than their request

lifetime:

r

∫ 1

γ(θ,ε)

f(p)dp = θ(ε) (3.2.6)

45

Predictive caching reduces both the arrival rates and service rates of unicast queues.

We denote those two quantities as A∗i [t], S∗i [t], respectively. We can now write down

the equations that make up the unicast queues of the Predictive Caching system as

follows:

Qi[t+ 1] = (Qi[t] + A∗i [t]− S∗i [t])+, ∀i = 1, . . . , N. (3.2.7)

S∗i [t] ∈ 0, 1, ∀i = 1, . . . , N,N∑i=1

S∗i [t] =

0 w.p. θ(ε),

1 w.p. 1− θ(ε).

(3.2.8)

E[A∗i [t]] = λ∗ = r

∫ γ(θ,ε)

0

pf(p)dp (3.2.9)

δ(ε, θ) = E[SΣ[t]]− E[AΣ[t]] = 1− θ(ε) −Nλ∗ (3.2.10)

Note that the now in the predictive caching regime, arrivals to the unicast queues

excludes items that belong to the set C(θ,ε), since those items are multicast, cached,

and accessed locally by end-users whenever requested. Similarly, the channel is acces-

sible by unicast queues in (3.2.8) only for a fraction of 1− θ(ε), and for a θ(ε)-fraction,

the wireless channel is dedicated to serving the multicast queue. Thus, the load

of delivering those requests is now deferred to the multicast queue at the cost of a

θ(ε)-fraction of the bandwidth.

The key question in this chapter is: Can we achieve delay gains even if the cache

available is asymptotically zero as the network approaches the full load, i.e., θ(ε) → 0

as ε→ 0? An affirmative answer to that question means that in practice, even small

caches at end users could be leveraged to reduce the overall user delay. We carry

out a detailed analysis that shows that indeed, predictive caching can improve the

asymptotic delay scaling for vanishing caches.

46

λ "

λ #

(λ #, λ ")

𝜖

ℱ(#)

𝑐 #(*)1

1

Figure 3.3: Predictive Caching Model

3.3 Analysis of Unicast On-Demand System

We start by analyzing the baseline unicast on-demand system to provide a basis

for comparison when we analyze the predictive caching scheme. We first derive a

lower bound on the sum queue lengths at the BS by utilizing the resource pooling

lower bound [47]. In the unicast problem, the BS scheduler makes a decision on

which user should be served every time slot depending possibly on the requests’

queue lengths (For example the scheduler can give the channel to the user with

the longest queue of outstanding requests). It is known that the capacity region is

R = Convex Hull(S), where S is the set of feasible schedules. Under our assumption

of a collision homogeneous channel, S could be written as S = Si, i = 1, . . . , N, |Si ∈

0, 1,∀i,N∑i=1

Si = 1

Since the number of users, N , is finite, the region S is a polyhedron that can be

fully described by as intersection of hyperplanes as follows:

R = r ≥ 0 : 〈c(k), r〉 ≤ b(k), k = 1, . . . , K (3.3.1)

where K is the number of hyperplanes describing the polyhedron. The notation

〈., .〉 indicates an inner product. The k−th hyperpane, H(k), can be described by the

47

pair (c(k), b(k)). For the special case of the collision channel, the capacity region can

be described by the single hyperplane R = r : r ≥ 0,1√N〈1, r〉 ≤ 1√

N. We plot in

Fig. 3.3 the capacity region for the two user case.

Having defined the capacity region, we can now utilize the “resource pooling”

system to derive a lower bound on the steady-state queue lengths in the heavy traffic

setting. The queue lengths process Qi[t]Ni=1 can be modelled as a Markov chain

that converges in distribution to steady-state QiNi=1 when the system is stable, i.e.,

the Markov chain is positive Harris recurrent. We are interested in characterizing the

steady-state of the sum queue lengths. Intuitively, pooling the resources of all queues

into one queue leads to a natural lower bound on the system. We parameterize the

system with the network load, ε = 1 − Nλ, and derive the steady-state sum queue

lengths at this loadN∑i=1

Q(ε)

, in particular, we are interested in the behavior of the

system in the Heavy-traffic limit, i.e., when ε → 0 pushing the operating point to

the boundary of the capacity region. This idea was introduced and applied in [47]

for both the routing and scheduling problems. The next Lemma characterizes the

resource pooling lower bound for the on-demand unicast baseline system:

Lemma 3.3.1. In the on-demand unicast system described above, let ε = 1 − Nλ,

the sum queue lengths in the network is lower-bounded as follows:

E[N∑i=1

Q(ε)

i ] ≥ ζ(ε)

2ε− 1

2(3.3.2)

where ζ(ε) =√N(σ

(ε)A )2 + ε2, and (σ

(ε)A )2 is the variance of the arrivals for each queue

at each time slot. Furthermore, using the conditional variance of the arrivals, we

can show that (σ(ε)A )2 = µ(ε)

p − (µ(ε)p )2, where µp is the mean of the prior popularity

distribution. Furthermore, taking the heavy traffic limit as ε → 0, the steady-state

48

sum queue lengths limit is asymptotically lower bounded as follows:

lim infε→0

εE[ N∑i=1

Qi

]≥ ζ

2(3.3.3)

Where Qi is the limit of Q(ε)

i as ε → 0. ζ =√Nσ2

A, where σ2A = lim

ε→0(σ

(ε)A )2. Equiva-

lently, the sum queue lengths in the steady-state asymptotically scales as Ω(1

ε).

Proof. The proof of the result can be directly obtained by applying the resource

pooling lower bound for the generic single queue in [47] to a queue with an arrival

rate of 〈1,A[t]〉, where 1 = [1, 1, . . . , 1] and a deterministic service rate equal to 1. A

complete proof is given in Appendix B.1

The important thing to note in Lemma 1 is that the expected steady state sum

queue lengths in the on-demand system scales as Ω(1

ε). We will show that the predic-

tive caching fundamentally alters this scaling to a slower scaling leading to arbitrarily

large delay saving in the HT limit. In order to do that we introduce the duality frame-

work that maps the scheduling problem into an easier routing problem.

3.4 Duality Framework

3.4.1 Capacity of Predictive Caching

In order to motivate our duality framework, it is essential to understand how pre-

dictive caching alters the capacity region. Consider a general capacity region, C, for

an on-demand system. Recall that the predictive caching reserves a fraction θ(ε) for

multicasting popular content for end users to cache. We make the key design choice

of vanishing cache size: θ(ε) → 0 as ε → 0, i.e., as the network load approaches the

full load, the multicast bandwidth decreases until this multicast bandwidth vanishes

at the full-load. The motivation for this choice is two-fold:

49

1. We aim to show that small memory sizes typical of end-user devices, such as

hand held devices, can still be used to achieve significant delay savings. Having

the vanishing cache size assumption emphasizes that even diminishing caches

can be useful at high network loads, furthermore, a smaller cache at high load

can be more useful than a larger cache at lower load. This is crucial to show that

our results hold for practical systems and do not make the common assumption

in wirless caching previous works [11] [48] [45] [49] that the cache size can hold

a significant fraction of the content catalog.

2. As the network approaches full-load, less resources could be dedicated to pre-

dictive caching, and the scheduler needs to dedicate all of the resources to fulfill

on-demand requests to guarantee stability.

Since θ(ε) of the the bandwidth is sacrificed to predictive caching, the BS unicast

scheduler sees a reduced capacity region C(ε), as shown in Fig. 4. Specifically, the

scheduler gets an aggregate 1 − θ(ε) capacity to allocate to unicast traffic. However,

the average user arrival rate is also reduced from λi to λ∗i due to predictive caching

of popular content. Thus, the network load changes from ε to a new load: δ(ε, θ) (we

write the new load δ as a function of ε and θ to emphasize the dependence). Ideally,

we design our algorithm such that δ(ε, θ) > ε. If we can show that the average delay of

the unicast queues under predictive caching scales as O(1

δ(ε, θ)), then this establishes

that predictive caching alters the asymptotic scaling of delay at the heavy-traffic

leading to arbitrarily large delay savings as ε→ 0.

We can see in Fig. 4 that the capacity region, C(ε) is dependent on the load, ε, with

the property that limε→∞C(ε) = C, due to the vanishing caches assumption. The standard

analysis for scheduling algorithms in the HT regime [47] [50] [51] [52] is carried out

under the assumption of a fixed capacity region independent of ε. It is unclear how the

analysis can be altered to fit the load-dependent regime, since now the Hyperplanes

50

!"

!#

!

! (#)

! (&)

! (")' (")' (#)' (&)

(a) On-demand Unicast CapacityRegion: C

𝜃(#) Multicast

𝜆∗

𝜆/

𝜆0 1 − 𝜃(#) Unicast

δ(𝜖, 𝜃) (8)δ(𝜖, 𝜃)

(0)

δ(𝜖, 𝜃) (/)

𝜆 ∗(0)

𝜆 ∗(8)

𝜆 ∗(/)

(b) Predictive Caching CapacityRegion: C(ε)

Figure 3.4: Capacity Region of different delivery systems

that define the capacity region are dependent on ε. To avoid a complicated analysis,

we introduce a duality framework that transforms the scheduling problem into a

simpler routing problem more amenable to standard HT tools.

3.4.2 Duality between Scheduling and Routing

We now present a method to map the scheduling problem into an equivalent routing

problem that allows us to extend the existing HT framework to directly analyze the

predictive caching problem. Intuitively, scheduling a non-empty queue (denoted as

i) over a time-slot leads to reduction of that queue by one request. This can be

equivalently viewed as all queues except queue i adding one request if all queues have

a constant independent service equal to 1 request/slot, as shown in Fig. 5. Thus,

Instead of solving the scheduling problem by determining which queue to schedule at

time t, we can equivalently solve a routing problem by determining where to route

N − 1 “artificial arrivals”. We denote those artificial arrivals as B[t]. We formalize

this intuition in the next Theorem.

Theorem 3.4.1. Duality of Routing and Scheduling Problems: Given two Systems

U1 and U2. U1 is a scheduling problem described by equations (3.2.1)-(3.2.4) and some

51

(possibly random) scheduling rule s[t] = g(Q[t]), that depends on the queue lengths at

time t. U2 is an (N − 1)-routing problem described by the following equations:

Qi[t+ 1] = (Qi[t] + Ai[t] +Bi[t]− Si[t])+, ∀i = 1, . . . , N. (3.4.1)

Si[t] = 1, E[Ai[t]] = λ, ∀i = 1, . . . , N.

Bi[t] ∈ 0, 1, BΣ[t] =N∑i=1

Bi[t] = N − 1, ∀i = 1, . . . , N.

ε = E[SΣ]− E[AΣ]− E[BΣ] = 1−Nλ (3.4.2)

Where the router makes routing decisions according to some, possibly random func-

tion, B[t] = h(Q[t]), depending on the queue lengths at time t. If the scheduling rule

in U1 and the routing rule in U2 satisfy:

Pg(Q) chooses Qifor scheduling = Ph(Q) routes requests to

all queues except Qi ∀i = 1, . . . , N (3.4.3)

Then the systems U2 and U1 are Sample-path equivalent, i.e., for the same sample

path (same realizations of requests arrivals and scheduling/routing random decisions),

Q[t] are equal in U1 and U2 with probability 1 at all times t, assuming the same initial

state Q[0].

The Proof is straightforward by induction. We give two examples to further

illustrate the duality condition (3.4.3). The first example is the Longest-Queue-First

(LQF) scheduling algorithm, breaking ties uniformly at random. Thus g(Q[t]) =

RANDargmax(Q[t]). This scheduling rule can be mapped to the Join-the-Shortest

N − 1 Queues (JS(N − 1)Q), as we can express the routing rule that routes to N − 1

shortest queues as follows g(Q[t]) = Q \ RANDargmax(Q[t]). where ‘\’ is the set

52

𝐴"[𝑡]

𝑺 𝑡 =𝑆( 𝑡 , i = 1, . . , N |𝑆 𝑡 𝜖 0,1,∑(2"3 𝑆( 𝑡 =1

𝐴4[𝑡]

𝐴5[𝑡]

𝑆"[𝑡]

𝑆4[𝑡]

𝑆5[𝑡]

Scheduler

(a) Single Resource Scheduling Problem

𝐴"[𝑡]𝑩 𝑡 =𝐵( 𝑡 , i =1, . . , n |𝐵( 𝑡 𝜖 0,1,∑(2"3 𝐵( 𝑡 =N-1 𝐴4[𝑡]

𝐴5[𝑡]

𝑆" 𝑡 = 1

Router

𝑆5 𝑡 = 1

𝑆4 𝑡 = 1

𝐵"[𝑡]

𝐵5[𝑡]

𝐵4[𝑡]𝐵7 𝑡 = 𝑁 − 1

(b) N − 1-Routing problem

Figure 3.5: Duality between routing and scheduling problems

difference notation. It is straightforward to see that LQF and JS(N − 1)Q satisfy

(3.4.3). Another example is Random Scheduling (RS) and (N−1)−Random Routing

(N − 1)-RR which respectively make the routing and scheduling decisions uniformly

at random (where each queue can get at most one request in the routing system). It

is easy to see that RS and (N − 1)RR satisfy (3.4.3).

3.5 Performance of Predictive Caching

3.5.1 Main Result

We now analyze a predictive caching algorithm that we call Predictive Caching

Longest-Queue First (PC-LQF) that follows the outline of Section II. PC-LQF mul-

ticasts items that have a popularity p higher than threshold γ(ε, θ) for end users to

cache, and unicasts all other items. PC-LQF serves the multicast queue with prob-

ability θ(ε), and the unicast queue with probability 1 − θ(ε). In the unicast mode,

the BS schedules the longest queue for unicast transmissions breaking ties randomly.

PC-LQF is summarized in Algorithm 1. Having described the PC-LQF algorithm.

We are now interested in the delay scaling in the HT limit as ε→ 0, under the vanish-

ing caches assumption. In particular, we are interested in the scaling as it compares

53

Algorithm 3: Predictive Caching-Longest Queue First (PC-LQF)

1 for time slot t do2 Receive new content items generated by the network3 if new item c has popularity pc ≥ γ(ε, θ) then4 Send item c to Multicast Queue QM to be sent to all users to cache

5 else6 Only forward c to Qi when requested by user i

7 w.p. θ(ε)

8 Serve Multicast Queue, QM

9 w.p. 1− θ(ε)

10 Choose the longest unicast queue to serve, i.e.,

s[t] = RANDargmaxi

Qi[t]

to Lemma 3.3.1. We are also interested in how different factors such as popularity

distribution, cache sizes, and number of users in the network affect the asymptotic

delay scaling. The result in Theorem 1 exactly characterizes that

Theorem 3.5.1. Main Result: Consider the Predictive Caching System shown in Fig.

3.2 with homogeneous arrivals equal to λi = r

∫ 1

0

pf(p)dp, satisfying ε = 1−Nλ > 0.

If the BS applies Algorithm 1, then, the system is stable as long as γ(θ, ε) >1

N, and

the limiting steady-state queue length vector Q(ε) satisfies the following:

E[ N∑i=1

Q(ε)

i

]≤ ζ∗(ε)

2δ(ε, θ)+B

∗(ε)(3.5.1)

where ζ∗(ε) = (N(σ(ε)A )2 + θ(1− θ) + δ(ε, θ)2), and B

∗(ε)= o(

1

δ(ε, θ)), with (σ

(ε)A )2 being

the variance of a single queue arrival at any time slot. Furthermore, the scaling factor

54

δ(ε, θ) can be bounded as follows

ε+ θ(Nγ(θ, ε)− 1) ≤ δ(ε, θ) ≤ ε+ θ(N − 1) (3.5.2)

Additionally, in the Heavy-traffic limit as ε → 0, which implies Nrλ → 1,

(σ(ε)A )2 → σ2

A, and θ(ε) → 0, by the vanishing caches assumption, the asymptotic

limit becomes:

lim supε→0

δ(ε, θ)E[ N∑i=1

Qi

]≤ ζ

2(3.5.3)

where ζ = Nσ2A.

Finally, PC-LQF is Heavy-Traffic optimal within all algorithms with predictive

caching capability.

3.5.2 Discussion of Main Result

Before proving our main result, we highlight a few key observations from the main

theorem:

1. The most important observation is that predictive caching alters the asymptotic

delay scaling, namely, it significantly “slows down” the delay build-up as ε

vanishes. To see that it is useful to contrast the scaling in Theorem 3.5.1 with

Lemma 3.3.1. We see that for the baseline unicast system, the sum queue

lengths in the steady state is lower bounded by a Ω(1

ε) scaling, whereas the

PC-LQF system is upper bounded by O(1

δ(ε, θ)) scaling. Under the assumption

that δ(ε, θ) > ε (we will show the mild conditions for this assumption to be

true), the average queue lengths under PC-LQF are arbitrarily smaller than

55

the unicast system as ε→ 0. This leads to many-fold delay savings in practical

heavily-loaded systems as seen in simulations.

2. To get a sufficient condition for delay scaling improvement, it is useful to note

that having γ(ε,θ) >1

N, guarantees δ(ε, θ) > ε. This means that the BS should

use the rule in (3.2.5) for caching as long as the item’s popularity p >1

N. This

is expected since this rule guarantees that an item that is multicast and cached

is expected to be requested more than once, giving multicasting gains over the

unicast system. Thus, this rule can lead the BS to be more conservative in

multicasting in cases where end users have no commonality of information to

be exploited.

3. The main result answers the question of “How should cache sizes scale with

respect to the network load to maintain asymptotic reduction in delay?”. This

can be seen by inspecting (3.5.2): To obtain asymptotic delay improvement, we

need the second term in the bound to be on the order of Ω(ε). The next corollary

characterizes the expected improvement in delay according to the cache size

scaling with ε.

Corollary 3.5.1.1. Under a mild condition that limε→0

γ(ε, θ) > c, for some con-

stant c, the cache-size scaling with the network load ε determines the improve-

ment in delay as follows:

(a) Case I: θ(ε) = o(ε): No improvement in delay asymptotics can be achieved

since δ(ε, θ) and ε are on the same order.

(b) Case II: θ(ε) = Θ(ε): Improves the constant in the delay asymptotics.

(c) Case III: θ(ε) = ω(ε): Improves the scaling in the delay asymptotics.

This corollary introduces the fundamental requirement for delay improvement

56

in terms of cache scaling. This requirement can translate practically by having

end-users allocate the appropriate device memory for content caching at high

network loads when instructed by the BS. Thus at congestion, the end-users

can still decrease their cache sizes to zero as long as the decrease is slower than

ε.

4. From Corollary 3.5.1.1 , we see that scaling memory as Ω(ε) alters the asymp-

totic delay scaling as O( 1

ε+NΩ(ε)

), reducing the average delay linearly in the

number of users N . This is the Multicasting Gain that the predictive caching

offers over the baseline system.

5. Finally, the theorem points to the effect of “commonality of information”. Given

any value of ε, we see from (3.2.5), that a heavier tail of the popularity distribu-

tion means a higher value for the threshold γ(ε, θ) which in turn further slows

down the delay scaling lower bound as ε grows. This is expected since a heavier

tail means the existence of more high popularity items that are useful to cache.

Equivalently, a heavier tail implies that users are more likely to request similar

contents from the distribution tail which increases the multicasting gains.

3.5.3 Proof of Main Result

The proof of the main result utilizes the main idea used in HT analysis of routing

and scheduling algorithms presented in [47]. The outline of the proof can be broken

down into four steps.

1. We find the appropriate dual system (as in Fig. 3.5) that satisfies Theorem

3.4.1. The analysis is carried out for the dual system.

2. We derive the resource pooling lower bound to be used to show Heavy-traffic

optimality of PC-LQF.

57

3. We show that under PC-LQF, the queue lengths are close to each other at high

network loads. This is formally known as the State-space collapse.

4. We use the results from state-space collapse to derive the main result in Theorem

3.5.1.

Deriving the Dual System

We follow the guidelines implied by Theorem 3.4.1 to derive an equivalent dual sys-

tem to the PC-LQF system. We refer to the dual system as Predictve Caching-Join

the Shortest (N − 1) Queues (PC-JS(N − 1)Q). The new system retains the struc-

ture of PC-LQF by forwarding the most popular content to a multicast queue, and

splitting the bandwidth between on-demand unicast and multicast. However, the key

difference is that in the new system model, the queues are not contending for the

wireless shared channel. The dual system equations can be derived from Theorem

3.4.1 as follows:

Qi[t+ 1] = Qi[t] + A∗i [t] +B∗i [t]− S∗i [t] + Ui[t], ∀i = 1, . . . , N.

BΣ[t] =

0 w.p. θ(ε),

N − 1 w.p. 1− θ(ε).

, Si[t] =

0 w.p. θ(ε),

1 w.p. 1− θ(ε).

E[A∗i [t]] = λ∗ =

∫ γ(θ,ε)

0

pf(p)dp, δ(ε, θ) = 1− θ(ε) − rNλ∗

Where BΣ[t] is the sum of “artificial” arrivals to be routed in the new equivalent dual

problem. Also Si[t] and BΣ[t] are coupled, meaning they follow the same “coin flip”

to decide the value they take at time t. Finally, Ui[t] denotes the unused service of

queue i at time t, namely, Ui[t] = max(0, Si[t]− Ai[t]−Bi[t]−Qi[t]) .

58

Resource Pooling Lower Bound

We start by providing a lower bound on the queue lengths in the next lemma.

Lemma 3.5.1. For any predictive caching system, the steady-state sum queue length for a load 1 − Nλ = ε can be lower bounded as follows:

E[Σ_{i=1}^N Q̄_i^{(ε)}] ≥ ζ′(ε)/(2δ(ε, θ)) − N/2,   (3.5.4)

where ζ′(ε) = N(σ_A^{(ε)})² + θ(1 − θ) + δ(ε, θ)².

The proof is identical to the resource pooling lower bound in Appendix B.1, applied to the predictive caching system.

State-Space Collapse

In order to prove state-space collapse, we use the result in [53] for bounding the moments of a Markov chain defined on a countable state space:

Lemma 3.5.2. [53] For an irreducible and aperiodic Markov chain {Q[t]}_{t≥0} over a countable state space χ, suppose Z : χ → R_+ is a nonnegative-valued Lyapunov function. We define the drift of Z at Q as

∆Z(Q) ≜ [Z(Q[t + 1]) − Z(Q[t])] I(Q[t] = Q),   (3.5.5)

where I(·) is the indicator function. If the following conditions are satisfied:

1. There exist η > 0 and B < ∞ such that

E[∆Z(Q) | Q[t] = Q] ≤ −η,  ∀Q ∈ χ with Z(Q) ≥ B.   (3.5.6)

2. There exists D < ∞ such that

P(|∆Z(Q)| ≤ D) = 1,  ∀Q ∈ χ.   (3.5.7)

Then there exist θ* > 0 and C* < ∞ such that

lim sup_{t→∞} E[e^{θ* Z(Q[t])}] ≤ C*.   (3.5.8)

Furthermore, if {Q[t]}_t is positive recurrent, then Z(Q[t]) converges in distribution to a random variable Z̄ that satisfies

E[e^{θ* Z̄}] ≤ C*,   (3.5.9)

which implies that all moments of Z̄ exist and are finite.

A key step in proving our main result is showing that as ε → 0, under PC-JS(N − 1)Q, all user queue lengths are close to each other in size. This enables us to show that in steady state, the system behaves as a single resource-pooled queue that scales more slowly than the unicast regime. We parametrize the model by the unicast load gap ε = 1 − Nλ, where λ = r ∫_0^1 p f(p) dp. We are interested in the queue-length process {Q^{(ε)}[t]}_t and its steady state Q̄^{(ε)} under the PC-JS(N − 1)Q policy. This is done by decomposing the queue-length vector into two components: a parallel component that averages all queue lengths, i.e., a projection of Q onto the vector c = (1/√N)1, and a perpendicular component that quantifies the differences in queue lengths:

Q_∥^{(ε)} ≜ ((1/N) Σ_{i=1}^N Q_i^{(ε)}) 1,   Q_⊥^{(ε)} ≜ [Q_i^{(ε)} − (1/N) Σ_{j=1}^N Q_j^{(ε)}]_{i=1}^N.
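As a quick numerical illustration of this decomposition (the queue vector is arbitrary, chosen only to show the projection):

```python
import numpy as np

Q = np.array([3.0, 5.0, 4.0, 8.0])   # arbitrary queue-length vector (illustrative)
N = len(Q)
c = np.ones(N) / np.sqrt(N)          # unit vector along the all-ones direction

Q_par = Q.dot(c) * c                 # parallel component: projection of Q onto c
Q_perp = Q - Q_par                   # perpendicular component: per-queue deviations

assert np.allclose(Q_par, Q.mean() * np.ones(N))  # every entry equals the average
assert abs(Q_perp.sum()) < 1e-12                  # the deviations sum to zero
print("Q_par =", Q_par, " Q_perp =", Q_perp)
```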

From the continuous mapping theorem, we know that the convergence of {Q^{(ε)}[t]}_t implies the convergence of {Q_∥^{(ε)}[t]}_t and {Q_⊥^{(ε)}[t]}_t. Following the approach of [47], we are interested in showing that Q̄_⊥^{(ε)} is uniformly bounded for all ε > 0.

Proposition 3.5.1. Consider the dual system parametrized by ε = 1 − Nλ, applying JS(N − 1)Q routing to the arrivals {B_Σ[t]}_t. Then for any feasible arrival rate, i.e., Nλ < 1, there exists a sequence of finite numbers {N_r}_{r=1,2,...} such that E[‖Q̄_⊥^{(ε)}‖^r] < N_r for all ε > 0 and all r = 1, 2, . . ..

Before proving the proposition, we state an important lemma from [47] that bounds the Lyapunov function of Q_⊥ in terms of other functions that are easier to manipulate.

Lemma 3.5.3. [47] Consider any queueing system where the arrival and service processes of each queue are bounded in every time slot by A_max and S_max, respectively. Let ∆L(X) = (L(X[t + 1]) − L(X[t])) 1(X[t] = X) denote the single-step drift of any appropriate Lyapunov function L at any state X. Then the following bounds hold for V_⊥(Q):

∆V_⊥(Q) ≤ (1/(2‖Q_⊥‖)) (∆W(Q) − ∆W_∥(Q)),  ∀Q ∈ R_+^N,   (3.5.10)

|∆V_⊥(Q)| ≤ 2√N max(A_max, S_max),  ∀Q ∈ R_+^N.   (3.5.11)

We are now ready to prove the main state-space collapse result:

Proof. See Appendix B.2

Deriving Upper Bounds on the Queues

The next step in our proof is to use Proposition 3.5.1 to obtain the sum queue-length bound of the main theorem. We state a lemma from [47], applied to our predictive caching system, which enables us to bound the sum queue lengths in steady state.

Lemma 3.5.4. [47] Consider the dual predictive caching routing system with arrival A[t], artificial arrival B[t], and service S[t] vectors at time t, where the artificial arrivals depend on the queue lengths. Suppose {Q[t]}_t converges in distribution to a random vector Q̄ with all moments bounded. Then for any positive vector c ∈ R_+^N, the following holds:

E[⟨c, Q̄⟩⟨c, S − A − B(Q̄)⟩] = E[⟨c, S − A − B(Q̄)⟩²]/2 + E[⟨c, U(Q̄)⟩²]/2   (3.5.12)
  + E[⟨c, S − A − B(Q̄)⟩⟨c, U(Q̄)⟩],   (3.5.13)

where the term in (3.5.13) can be further bounded as follows:

(3.5.13) ≤ √( E[‖Q̄_⊥‖²] E[‖U(Q̄)‖²] ).   (3.5.14)

The proof of the lemma is a straightforward application of Lemma 8 and Lemma 9 in [47] to our system. We can now bound the expressions (3.5.12)-(3.5.13) to conclude the main theorem. We start by analyzing the LHS of (3.5.12):

E[⟨c, Q̄⟩⟨c, S − A − B(Q̄)⟩] = (δ(ε, θ)/N) E[Σ_{i=1}^N Q̄_i^{(ε)}].   (3.5.15)

Denote the first term on the RHS as ζ(ε)/2. This can be calculated directly as

ζ(ε) ≜ E[⟨c, S − A − B(Q̄)⟩²] = (1/N)(N(σ_A^{(ε)})² + θ(1 − θ) + δ(ε, θ)²).   (3.5.16)

The second term on the RHS of (3.5.12) can be bounded as follows:

E[⟨c, U(Q̄)⟩²] ≤ ⟨c, 1⟩ E[⟨c, U(Q̄)⟩] ≤ δ(ε, θ).   (3.5.17)

Similarly, we can bound (3.5.13) by applying Proposition 3.5.1 to the bound (3.5.14):

(3.5.13) ≤ √( E[‖Q̄_⊥‖²] E[‖U(Q̄)‖²] ) ≤ √( E[‖Q̄_⊥‖²] E[⟨1, U(Q̄)⟩] ) ≤ √( N_2 δ(ε, θ) ).   (3.5.18)

Substituting (3.5.15), (3.5.16), (3.5.17), and (3.5.18) into Lemma 3.5.4:

(δ(ε, θ)/N) E[Σ_{i=1}^N Q̄_i^{(ε)}] ≤ ζ(ε)/2 + δ(ε, θ)/2 + √( N_2 δ(ε, θ) ).   (3.5.19)

This proves (3.5.1). Taking the limit as ε → 0 leads to the expression in (3.5.3). Also, taking the limit and comparing with the lower bound in Lemma 3.5.1 establishes HT optimality, since the lower and upper bounds match asymptotically.

Deriving δ(ε, θ)

To conclude the proof of Theorem 3.5.1, it remains to find the bounds characterizing δ(ε, θ). These can be obtained by rewriting δ(ε, θ) as follows:

δ(ε, θ) = E[S*_Σ] − E[A*_Σ] = 1 − θ(ε) − rN ∫_0^{γ(θ,ε)} p f(p) dp   (3.5.20)
= 1 − Nλ + rN ∫_{γ(θ,ε)}^1 p f(p) dp − θ = ε + rN ∫_{γ(θ,ε)}^1 p f(p) dp − θ.   (3.5.21)

We use the fact that ∫_{γ(θ,ε)}^1 p f(p) dp ≥ ∫_{γ(θ,ε)}^1 γ(θ, ε) f(p) dp to obtain the following lower bound:

δ(ε, θ) ≥ ε + rN γ(θ, ε) ∫_{γ(θ,ε)}^1 f(p) dp − θ  (a)=  ε + θ(Nγ(θ, ε) − 1),   (3.5.22)

where (a) follows from the substitution in (3.2.6). Similarly, an upper bound on δ(ε, θ) is obtained using the inequality ∫_{γ(θ,ε)}^1 p f(p) dp ≤ ∫_{γ(θ,ε)}^1 f(p) dp:

δ(ε, θ) ≤ ε + rN ∫_{γ(θ,ε)}^1 f(p) dp − θ  (a)=  ε + Nθ − θ = ε + θ(N − 1).   (3.5.23)

3.5.4 Closed-Form Delay-Memory Trade-off for the Approximate Model

The result in Theorem 3.5.1 illustrates the fundamental delay-memory trade-off in the predictive caching system. Intuitively, larger cache sizes at the users mean that more bandwidth can be used for multicasting, which in turn means that more items can be served locally, leading to lower delay. We further illustrate this by proposing a specific approximate model.

We use the Poisson shot-noise model, which was found to empirically fit content requests well in [40] and has been used in the caching literature [54], to obtain an approximate request model as follows:

1. The aggregate request process from all users for a single item follows a Poisson(Nrλ) distribution.

2. The parameter λ is random for every item, sampled from a Pareto distribution, i.e., f(λ) = βα^β/(λ + α)^{β+1}, λ > 0, where α and β are the scale and shape parameters, respectively.

The Pareto distribution approximates the well-known Zipf distribution [55] used to model content requests in the infinite-catalog regime. We use this model for its practical utility and analytic tractability, as the scaling term δ(ε, θ) in (3.5.1) can be obtained exactly in closed form in the next corollary:

Corollary 3.5.1.2. For the approximate Poisson-Pareto model, the PC-LQF algorithm multicasts the items with Poisson parameter λ > γ(θ, ε), where the threshold can be quantified as

γ(θ, ε) = α((rN/θ)^{1/β} − 1).   (3.5.24)

Then, PC-LQF achieves an asymptotic (1/δ(ε, θ), θ(ε)) queue-length-scaling-memory trade-off, where δ(ε, θ) is quantified as

δ(ε, θ) = ε + Nrα [ 1/((β − 1)(rN/θ)^{1−1/β}) + α((rN/θ)^{1/β} − 1)/(rN/θ) ] − θ.   (3.5.25)
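A small numerical sketch of the closed forms (3.5.24) and (3.5.25); the parameter values are illustrative, and the expression for δ is transcribed directly from the corollary:

```python
import numpy as np

def gamma_thresh(theta, r, N, alpha, beta):
    """Multicast threshold (3.5.24): items with Poisson parameter above it are multicast."""
    return alpha * ((r * N / theta) ** (1.0 / beta) - 1.0)

def delta_drift(eps, theta, r, N, alpha, beta):
    """Drift term delta(eps, theta), transcribed from (3.5.25)."""
    m = r * N / theta                  # shorthand for rN/theta
    return (eps
            + N * r * alpha * (1.0 / ((beta - 1.0) * m ** (1.0 - 1.0 / beta))
                               + alpha * (m ** (1.0 / beta) - 1.0) / m)
            - theta)

# Pareto(1, 3.5) popularity and a 100-user cell, as in Fig. 3.6; r is illustrative.
r, N, alpha, beta = 0.05, 100, 1.0, 3.5
for eps in (0.02, 0.01):
    for theta in (eps, np.sqrt(eps)):  # Theta(eps) vs. omega(eps) cache scaling
        g = gamma_thresh(theta, r, N, alpha, beta)
        d = delta_drift(eps, theta, r, N, alpha, beta)
        print(f"eps={eps:.2f}  theta={theta:.4f}  gamma={g:.2f}  1/delta={1.0 / d:.1f}")
```

For these parameters γ(θ, ε) > 1 in all four settings, so deeper caching increases δ and shrinks the 1/δ queue-length scaling, with the √ε cache schedule clearly ahead at the same ε.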

The corollary is a straightforward application of Theorem 3.5.1 to the approximate model. Although the theorem was derived for a different model, an identical proof can be carried out, with one exception: the Poisson arrivals in every slot cannot be bounded by a number A_max as required by Lemma 3.5.3. However, it was shown in [56] that the boundedness condition can be relaxed to a bound on the moment generating function, which is satisfied for our Poisson arrivals, since the parameter γ clips the Poisson parameter of the arrivals in any slot. We proceed to plot the queue-length-scaling-memory trade-off of the approximate model in Fig. 3.6, with a Pareto(1, 3.5) popularity distribution and a cell with 100 users, for different values of the network utilization ρ = E[A_Σ]/E[S_Σ], to further illustrate the essential dynamic in our system. We note two important observations: 1. The reduction in scaling is more significant at higher network utilization, confirming the utility of our proposal in congested networks. 2. The relationship between the queue-length scaling and the cache size is concave and decreasing. The decreasing part highlights that intelligent caching indeed causes a continuous decrease in scaling as the cache size increases (as long as the items being cached have expected requests higher than 1 and the unicast regime is stable). The concave part highlights the diminishing returns of increasing cache sizes: a relatively small cache that can hold most of the “trending content” can offer great savings by eliminating most redundancy. Once the cache sizes increase beyond that, the BS starts multicasting less popular items that are otherwise not widely requested by the network, causing the savings to slow down. This further confirms our main message that small, practical cache sizes can be very beneficial in reducing delay.

Figure 3.6: Scaling of (3.5.25)

3.6 Simulations

We simulate a cellular downlink channel with 100 users, following our request system model. We are interested in the effect of predictive caching at the “busy hour”, i.e., in a congested network. Thus, we define the network utilization ρ = E[A_Σ]/E[S_Σ] and simulate cells with varying values of ρ. Note that ρ → 1 is equivalent to ε → 0.

Figure 3.7: Effect of Predictive Caching on Delay. (a) Average delay vs. ρ, f(p): Beta(1, 4); (b) Average delay vs. ρ, f(p): Beta(1, 9). Curves: baseline LQF, PC-LQF with θ = 0.1ε, and PC-LQF with θ = 0.0447√ε.

Figure 3.8: Normalized Cache Size supporting Predictive Caching. (a) Normalized cache size vs. ρ, f(p): Beta(1, 4); (b) Normalized cache size vs. ρ, f(p): Beta(1, 9). Curves: baseline LQF, PC-LQF with θ = 0.1ε, and PC-LQF with θ = 0.0447√ε.

We simulate three scenarios: the baseline unicast on-demand system, a predictive caching system with cache sizes that scale as c₁ε, and a predictive caching system with cache sizes that scale as c₂√ε, i.e., a scaling of ω(ε) (as ε decays to 0).
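The cache-size schedules for the three scenarios can be written down directly; a short sketch (the constants c₁ and c₂ are illustrative):

```python
import numpy as np

rhos = np.array([0.80, 0.85, 0.90, 0.95, 0.99])
eps = 1.0 - rhos                       # rho -> 1 is equivalent to eps -> 0
c1, c2 = 1.0, 0.5                      # illustrative constants

baseline = np.zeros_like(eps)          # on-demand unicast: no cache
linear = c1 * eps                      # theta(eps) = c1 * eps: Theta(eps) scaling
sqrt_scaling = c2 * np.sqrt(eps)       # theta(eps) = c2 * sqrt(eps): omega(eps) scaling

for rho, b, l, s in zip(rhos, baseline, linear, sqrt_scaling):
    print(f"rho={rho:.2f}: baseline={b:.0f}  c1*eps={l:.4f}  c2*sqrt(eps)={s:.4f}")
```

Both cache schedules vanish as ρ → 1, but the square-root schedule vanishes more slowly, matching the ω(ε) condition of Corollary 3.5.1.1.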

In Fig. 3.7, we plot the average delay against the network utilization ρ for two scenarios: Fig. 3.7(a) for contents sampled from the popularity distribution Beta(1, 4), and Fig. 3.7(b) for contents sampled from Beta(1, 9). We plot the corresponding normalized cache sizes (with respect to the user request rate r) in Fig. 3.8. Fig. 3.8 constitutes the price we pay, in terms of cache size, to obtain the delay savings. Following the vanishing-cache-sizes assumption, the normalized cache sizes decay to zero for both the θ(ε) and the θ(√ε) scaling. The first thing to note in both figures is the vast delay reduction of predictive caching over the baseline as ρ → 1; for example, at ρ = 0.99, predictive caching offers a 10-fold delay reduction for θ(ε) cache sizes and a 30-fold delay reduction for θ(√ε) cache sizes, which indicates the benefits of predictive caching. This comes at the cost of normalized cache sizes of 0.1 and 0.5, respectively, roughly meaning a cache size equal to 10%-50% of the user request rate per content lifetime (often on the order of a day or a few days [40]), indicating the power of a small cache to offer great delay reduction in a congested network. The second thing to notice is that the figures further solidify the intuition gained from Corollary 3.5.1.1, since the θ(√ε) scaling empirically alters the delay asymptotics: the delay build-up is very slow compared to the other case. Finally, the discrepancy between the average delays in Fig. 3.7(a) and Fig. 3.7(b) points to the effect of the popularity distribution on the average delay. The reason case (a) has better delay performance is that the Beta(1, 4) distribution has a heavier tail than the Beta(1, 9) distribution. Heavier tails imply more items with high popularity, suggesting more homogeneity in user requests, which increases the multicasting gains in delay savings. Thus, predictive caching exploits that commonality of information, which might offer a guiding design principle: users with similar tastes should be grouped in physical or virtual cells to fully realize the benefits of predictive caching.

In Fig. 3.9, we plot the empirical delay-memory trade-off by directly plotting the delay against the cache size for various values of ρ. The average delay of the on-demand unicast system (equivalently, when the cache sizes are zero) appears at the Y-axis intercept, and the predictive caching average delay appears as the cache size grows along the X-axis. The figure further highlights that a small cache can offer very significant delay reductions, especially in congested networks.

Figure 3.9: Empirical Delay-Cache size Trade-off

3.7 Conclusion and Future Work

We have studied the potential of predictive caching to reduce delay in wireless cells, especially at high traffic loads. We introduced a novel duality framework between routing and scheduling problems that we expect to be of independent interest in simplifying the analysis of scheduling algorithms. We have shown that, under a vanishing cache size assumption, predictive caching that utilizes multicasting alters the delay-throughput scaling as the network approaches full load, which translates to a many-fold reduction in average delay in simulations. We highlighted a fundamental delay-memory trade-off in the system and characterized the cache scaling needed to obtain linear multicasting gains in the number of users. Future work includes the treatment of personalized predictions, where multicasting and caching can take into account information about user tastes. This combines advances in recommender systems and online learning with the delivery problem that aims to build efficient multicasting trees. Furthermore, we aim to develop the PC-LQF scheduling algorithm to operate under non-ideal radio conditions, such as fading, where achievable rates can vary across receivers; choosing the correct multicasting rate then becomes a non-trivial problem. We also plan to test the practical algorithm with real-life data traces on testbeds to accurately quantify the empirical effect of memory usage on delay.

CHAPTER 4

DISTRIBUTED NODE-BASED LOW LATENCY SCHEDULING

4.1 Introduction

In this chapter, we study the design, analysis, and empirical performance of node-based distributed scheduling algorithms [57], [58]. It is well known that distributed scheduling in large wireless networks is very challenging due to interference between wireless links and the lack of central coordination. This problem has hampered the development and deployment of wireless ad-hoc networks and wireless mesh networks on a large scale. Thus, developing efficient fully-distributed scheduling algorithms is of utmost importance to realize large decentralized wireless networks and fully benefit from their capabilities. A good scheduling algorithm balances three objectives: (i) High throughput: characterized by the fraction of the network capacity region a scheduling algorithm achieves; ideally, a scheduling algorithm should be able to support any set of arrival rates within the capacity region. (ii) Low delay: a good scheduling algorithm should be able to maintain the throughput required by the application without incurring excessive delay at any of the links; furthermore, the expected delay should scale favorably with the size of the network. (iii) Low complexity: required to ensure easy implementation and to minimize the resources required to run the algorithm.

The seminal work of [59] is the first example of a throughput-optimal scheduling algorithm, which can support any arrival rate vector within the network capacity region without any of the link queues growing to infinity. It was shown that if the interference relationships of the network are modeled by a conflict graph, the max-weight algorithm, where the weight of a link is taken to be its queue size, is throughput optimal. However, max-weight-based algorithms suffer from high complexity: in general networks, determining the maximum-weight independent set is NP-hard.

It was recently shown that CSMA-like algorithms can be made throughput-optimal if designed intelligently, namely, if every static link's activation rate is optimized [12] according to the network topology and operating point, or taken as an appropriate function of the queue length [13], [14]. This result is attractive because CSMA algorithms are fully distributed. The idea behind such algorithms is to run a Markov chain of collision-free schedules whose stationary distribution approximates the max-weight solution. When the Markov chain converges to the max-weight solution, throughput optimality is achieved. A major shortcoming of these algorithms is their delay performance, which has been shown to be unsatisfactory in many cases [60]. For example, in [61], it was shown that the delay can grow exponentially with the size of the network in general graphs. Furthermore, it was shown in [62] that there exist worst-case topologies such that, even to attain a fraction of the capacity region, either the delay or the complexity must increase exponentially. The canonical example that illustrates the poor performance of CSMA is a network whose conflict graph is a torus or lattice. We can easily see the existence of two optimal schedules: the “odd” and the “even” schedules. As the network size increases, the transitions between the “odd” and “even” schedules become less frequent, causing the average delay to increase. This is known as the starvation problem of CSMA [63]: CSMA gets stuck in a “good” schedule for a very long time. This means that the key to decreasing the average delay of CSMA-like algorithms is to decrease the starvation time of all links.

Throughput-optimal CSMA algorithms tend to treat links as separate autonomous entities that do not communicate. We argue that this squanders the opportunity to exploit hotspots in the network, where a node controls a large number of outgoing links, to obtain favorable delay performance. In many practical wireless networks, such as wireless mesh networks and wireless ad-hoc networks, many nodes control multiple outgoing links, and those nodes can use this to make better scheduling decisions. Furthermore, it is a desirable trait for wireless ad-hoc networks to be k-connected [64] to ensure connectivity and fault tolerance, which means that every node will have a minimum of k outgoing links. We use those practical properties of networks to motivate a new Node-Based CSMA (NB-CSMA), where scheduling decisions are made at the node level rather than the link level. A node-based CSMA implementation was touched upon in [14]; however, that implementation is described as a straightforward extension of the link-based Q-CSMA, that is, it still relies on “single-site updates” in the underlying Glauber dynamics, as opposed to the proposed NB-CSMA, which updates a number of vertices in the conflict graph jointly. The motivation behind the NB-CSMA algorithm is our observation that in high throughput regimes, activation rates tend to be high, and links tend to be increasingly greedy in acquiring and keeping the medium. Thus, a node's ability to switch between two links in one slot, without having to go through an idle slot, is expected to make switching between dominant schedules more frequent. This causes the starvation period of all links to decrease. Our contributions can be summarized as follows:

1. We propose a new throughput-optimal distributed Node-Based CSMA (NB-CSMA) algorithm, where scheduling decisions are made at the node level rather than the link level.

2. We compare NB-CSMA to the link-based CSMA (Q-CSMA) [14] in terms of expected delay for any fixed network. We show analytically and via simulations that NB-CSMA performs no worse than Q-CSMA for any network setting.

3. We use mixing-time analysis to characterize the fraction of the capacity region for which, under the NB-CSMA algorithm, the expected queue lengths and expected delay can be bounded by a polynomial in the size of the network (as opposed to exponential mixing). We show that this fraction is no smaller than the known fraction of the capacity region under Q-CSMA.

4. For a special class of networks, namely collocated networks, we analytically derive a closed form for the mean link starvation time using a Markov chain with rewards framework. We then use the results, in the case where all link throughputs are equal, to quantitatively demonstrate the improvement of NB-CSMA over Q-CSMA as a function of both topology and network load.

This list shows that NB-CSMA achieves all three objectives of scheduling: throughput optimality, improved delay over Q-CSMA, and distributed implementation. Throughput optimality can be shown by analyzing the Markov chain generated by the scheduling algorithm. Delay performance, however, is very hard to analyze. Therefore, to show the delay benefits of NB-CSMA, we look at delay performance from different angles, namely, the second-order behavior of the per-link service rate under NB-CSMA as compared to Q-CSMA, the delay scaling as a function of the network size, and the per-link mean starvation time. These different angles give us a comprehensive view of the delay improvements of NB-CSMA. We supplement the analysis with extensive numerical simulations, which show that NB-CSMA results in around a 50% reduction of average delay over Q-CSMA in all considered scenarios.

4.2 Related Work

The delay performance of throughput-optimal CSMA algorithms has been discussed in several works in the literature. In [65], it was shown via a mixing-time analysis that, for bounded-degree graphs and for a fraction of the capacity region, the delay growth is upper bounded by a polynomial in the size of the network. In [66], it was further shown that for a reduced fraction of the capacity region, the delay is bounded by a constant independent of the network size. In addition to delay analysis, much of the existing research has focused on how to alter CSMA algorithms to improve the network delay performance while maintaining throughput optimality and low complexity. In [61], periodically resetting the algorithm is proposed to prevent starvation in the network; this was shown to be order optimal (with respect to the size of the network) for networks that have torus- or lattice-shaped conflict graphs. In [67], the authors propose a new update rule, where the Metropolis algorithm is used as a substitute for the underlying Glauber-dynamics-based schedule update; this was shown to have better delay performance for fixed-size networks. In [68], a modified version of the CSMA algorithm is proposed, where only links with queue lengths exceeding a certain threshold are allowed to contend for the medium. This has the effect of reducing the number of contending links in every time slot and consequently reducing the average delay. Elegant solutions to the delay problem were proposed in [69] and [70], where multiple Markov chains of collision-free schedules are run in parallel and the actual schedule is chosen from them probabilistically [69] or periodically [70]. Intuitively, as the number of parallel Markov chains increases, the probability that the scheduling algorithm gets “stuck” in one good schedule for a long time decreases, decreasing the expected delay. It was also shown in [71] that the delayed-CSMA proposed in [70], with a suitable number of parallel schedules, achieves order-optimal per-link steady-state delay. However, this vast improvement in steady-state delay comes at the cost of a rapidly increasing convergence time as the number of parallel schedules grows. This has a detrimental effect on the transient delay: these algorithms have favorable steady-state delay performance, but the time to reach that steady state can still be exponential in the size of the network, so the resulting delay performance can still be unsatisfactory. Complementary to delay analysis, some works, such as [63], [72], have focused on studying link starvation, another quantity of interest that relates to delay. Link starvation can be roughly defined as the time it takes a link to regain the transmission medium after ceasing transmission. Some works, such as [69] and [68], have also studied the starvation performance of their respective proposed algorithms and related it to the head-of-line packet waiting time and average delay, respectively.

4.3 System Model

We model the wireless network by the connectivity graph G(K, V), where K is the set of nodes in the network and V is the set of directional links. Links are assumed to have binary interference relationships. The interference relationships between different links in the network can be represented by a conflict graph G(V, E), where V is the set of communication links in the network. An undirected edge (i, j) ∈ E exists if link i and link j interfere with each other. This is called the interference graph model, and it can be used to represent any interference relationship, such as geometric, M-hop, etc., as long as the interference relationships between links are binary. We define the neighbors of a link v as N_v = {w ∈ V : (v, w) ∈ E}.

Figure 4.1: An example of a simple 5-node network topology and the corresponding conflict graph (1-hop interference relationship). (a) G(K, V); (b) G(V, E).

Neighboring links in the conflict graph are not allowed to transmit simultaneously, to avoid collisions. We assume that a node can only activate one outgoing link in each slot, which is the case in most wireless networks. Note that under this assumption, the outgoing links of node k ∈ K form a clique (complete subgraph) in the conflict graph. We denote this set of links as K_k. Fig. 4.1 is an example of this modeling for a simple 5-node network with primary exclusive constraints, i.e., links can be scheduled if and only if they constitute a matching in G. The clique K₁ is also highlighted as an example.

We consider a slotted system, where each slot is divided into a contention period and a transmission period. Our system follows the discrete-time synchronous-CSMA framework. At the beginning of each time slot, there is a fixed contention period in which all links cease transmission and contend for the medium while sensing, to determine whether they should transmit or sense during the transmission period of the current time slot. We define a schedule s(t) = (s_v(t))_{v∈V} ∈ {0, 1}^{|V|}, which represents the set of transmitting links in a given time slot, i.e., link v is active at time t if s_v(t) = 1. We use the bold notation s(t) to denote the schedule of all links in a certain time slot.

The mean service rates of all links are E(s(t)) = µ. A feasible schedule is one that does not violate the conflict relationships of E; therefore, a schedule s is feasible if s_i + s_j ≤ 1 for all (i, j) ∈ E. We denote the set of all feasible schedules by Ω. Note that a feasible schedule maps to an independent set of vertices in the conflict graph.

Each link v has a queue q_v to store incoming packets. We assume an independent stationary arrival process a_v(t) at each link with mean E(a_v(t)) = ν_v. The queue dynamics of each link are given by

q_v(t + 1) = (q_v(t) − s_v(t) + a_v(t))^+,  ∀v ∈ V.   (4.3.1)

We assume all traffic is single-hop, where a packet exits the network right after a successful transmission. These assumptions imply that the queue state of all links, (q_1(t), q_2(t), . . . , q_{|V|}(t)), evolves as a discrete-time Markov chain. We define the capacity region of the network, Λ ⊆ [0, 1]^n, to be the set of arrival rate vectors {ν_v}_{v∈V} for which there exists a scheduling algorithm that can stabilize all the queues, i.e., under some feasible scheduling algorithm, the expected queue length remains finite for all links in the network. A condition for stability is that the Markov chain of the queue evolution process is positive recurrent. It was shown in [59] that the capacity region of any network is the convex hull of all feasible schedules:

Λ = {ν ≽ 0 : ∃µ ∈ Co(Ω), ν ≺ µ},   (4.3.2)

where Co(·) is the convex hull operator and the vector inequalities are component-wise.

Definition 4.3.1. A scheduling algorithm is said to be Throughput-Optimal if it can keep all queues stable for any arrival rate vector ν ∈ Λ.

CSMA: Glauber Dynamics Single-Site Update

We now briefly explain the discrete-time throughput-optimal link-based CSMA proposed in [14] (Q-CSMA), to motivate and introduce our NB-CSMA algorithm in the next section. The throughput-optimal CSMA algorithm applies Glauber dynamics from statistical physics with appropriate activation rates. Here, a feasible schedule corresponds to an independent set in the conflict graph G. One application of Glauber dynamics is sampling independent sets on a graph; this is known in the statistical physics literature as the hardcore model (see [73] for details and examples). Q-CSMA is basically an application of Glauber dynamics on the conflict graph to sample weighted independent sets. In every time slot t, a link v is randomly chosen to update as follows:

• If Σ_{w∈N_v} s_w(t) = 0, then s_v(t + 1) = 1 w.p. λ_v/(1 + λ_v), and s_v(t + 1) = 0 w.p. 1/(1 + λ_v).

• Otherwise, s_v(t + 1) = 0.

• For all w ≠ v, s_w(t + 1) = s_w(t).

The schedule s(t) then forms an irreducible, aperiodic, reversible Markov chain with stationary distribution

π(s) = (1/Z) ∏_{v∈V} λ_v^{s_v},   (4.3.3)

where Z = Σ_{s∈Ω} ∏_{v∈V} λ_v^{s_v} is a normalizing constant. The throughput-optimal Q-CSMA [14] uses a modified version of the Glauber dynamics where multiple parallel updates at non-conflicting links are allowed using the same update rule. The activation rate, or fugacity (we use the two terms interchangeably), λ_v of any link v changes dynamically over time depending on a local weight function w_v, taken as a concave non-decreasing function of the queue length q_v. For example, if the fugacity λ_v is taken as

λ_v = e^{w_v},   (4.3.4)

the stationary distribution of Q-CSMA is

π(s) = (1/Z) e^{Σ_{v∈s} w_v(t)}.   (4.3.5)

The intuition behind the throughput optimality is that Q-CSMA approximates the max-weight solution, which is known to be Throughput-Optimal [59].
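A minimal sketch of the single-site Glauber update above, on a small conflict graph with fixed fugacities (the 5-cycle and the value of λ are assumptions for the example only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Conflict graph as adjacency lists: a 5-cycle (illustrative)
nbrs = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
lam = {v: 2.0 for v in nbrs}          # fixed fugacities lambda_v (illustrative)
s = {v: 0 for v in nbrs}              # schedule: s_v = 1 iff link v transmits

def glauber_step(s):
    v = int(rng.integers(len(nbrs)))  # pick one link uniformly at random
    if all(s[w] == 0 for w in nbrs[v]):               # no active neighbor
        s[v] = 1 if rng.random() < lam[v] / (1 + lam[v]) else 0
    else:
        s[v] = 0                      # a blocked link is turned off
    return s                          # all other links keep their state

for _ in range(10_000):
    s = glauber_step(s)
print(s)  # a sample approximately from pi(s) proportional to prod_v lambda_v^{s_v}
```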

The extension to a node-based implementation in [14] describes a protocol that determines which nodes can update their links using an RTS/CTS mechanism. Selected nodes then choose an outgoing link uniformly at random and update that link's activity using the Q-CSMA rules. Clearly, this node-based implementation of Q-CSMA does not allow switching between outgoing links. This motivates our Node-Based CSMA (NB-CSMA).

4.4 Node-Based CSMA: Glauber Dynamics with Block Updates

In this section, we introduce our proposed NB-CSMA algorithm. The reason behind Q-CSMA's poor delay performance is the need for high activation rates, which causes all links to be greedy when contending for the medium; this, in turn, causes Q-CSMA to converge to one good schedule and remain at that schedule (or fluctuate around it) for a long time. In NB-CSMA, we attempt to solve that problem by examining the possibility of directly switching between links that share a common transmitter. The intuition is that the network will then switch between dominant schedules more often, without having to pay the price of an idle time slot every time a switch happens.

4.4.1 Step 1: Forming Blocks

We now explain how NB-CSMA works. Recall that the outgoing links of every node k ∈ K are mapped to a clique in the conflict graph that we call K_k. At the beginning of each slot, a subset of nodes is selected for update. Each of those selected nodes chooses a subset of its outgoing links to update in that slot. We call this subset of links the update clique, C_k, for a node k that is allowed to update in this slot. The selection of update cliques C_k is made such that:

1. C_k ⊆ K_k.

2. For any two nodes k, l ∈ K that update in the same slot, we have (v, w) ∉ E for all v ∈ C_k, w ∈ C_l.

The first condition simply states that each update clique C_k is a subset of the outgoing links that have the common transmitter node k ∈ K. This ensures that the algorithm is fully distributed with respect to the nodes, i.e., the only information needed to make a scheduling decision comes from the queue lengths at each physical node. The second condition is more subtle: it states that the links (vertices of the conflict graph) of different update cliques should not have an edge between them, i.e., they do not interfere. This requirement is necessary to ensure that the resulting schedule is feasible, since an update clique can turn any of its links on. We denote the collection of update cliques in each time slot as C. A simple RTS/CTS-like scheme can be used, whereby each node polls its outgoing links and adds them to the update clique if no conflict exists. This simple scheme can be shown to find update cliques that satisfy the two conditions with no significant messaging overhead. We denote the probability of choosing a particular update clique collection C in any time slot as P(C).

4.4.2 Step 2: Updating Blocks

Now that the update cliques C have been found, we proceed to explain how the schedule is updated. We call x(t + 1) the proposed schedule; it will be used to obtain the actual schedule s(t + 1). The NB-CSMA update procedure is given in Algorithm 4.

Algorithm 4: NB-CSMA Algorithm
1  for C_k in C do
2    if ∃v ∈ C_k s.t. s_v(t) = 1 then
3      w.p. 1/|C_k|:
4        x_v(t+1) = 1, x_w(t+1) = 0 ∀w ∈ C_k \ {v},  w.p. λ_v/(1 + λ_v)
5        x_v(t+1) = 0, x_w(t+1) = 0 ∀w ∈ C_k \ {v},  w.p. 1/(1 + λ_v)
6      w.p. (|C_k| − 1)/|C_k|:
7        x_w(t+1) = 1, x_z(t+1) = 0 ∀z ∈ C_k \ {w},  w.p. λ_w / Σ_{z∈C_k}(1 + λ_z),  ∀w ∈ C_k \ {v}
8        x_w(t+1) = s_w(t) ∀w ∈ C_k,  otherwise
9    else
10     pick link w ∈ C_k uniformly at random
11     x_w(t+1) = 1 w.p. λ_w/(1 + λ_w)
12     x_w(t+1) = 0 w.p. 1/(1 + λ_w)
13   if x_w(t+1) + s_z(t) ≤ 1, ∀w ∈ C_k, ∀z ∈ N_w \ C_k then
14     s_w(t+1) = x_w(t+1), ∀w ∈ C_k
15   else
16     s_w(t+1) = s_w(t), ∀w ∈ C_k

Lines 3-8 of Algorithm 4 describe an operation where an already active link is chosen to be updated according to a procedure similar to Q-CSMA. Similarly, lines 10-12 describe how an inactive link with no active neighbor is chosen randomly and added to the schedule with some probability. This is again similar to the Q-CSMA procedure; i.e., lines 3-8 and 10-12 just describe the classical Glauber dynamics used to generate independent sets with some desired distribution. Lines 6-8 represent the addition provided by making the CSMA algorithm node-based: a node can switch from one active link to an inactive one according to the probability defined in line 7. Lines 13-16 state that a link update is only accepted if the new schedule is an independent set, to ensure that no collisions happen; otherwise, the links' states in the new time slot remain identical to those in the previous time slot.

Note that the NB-CSMA algorithm corresponds to performing “block updates”, as opposed to the single-site updates of Q-CSMA. Block updates have been used before in the analysis of the mixing time of Glauber dynamics [74], [73].
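A sketch of one block update over a single update clique, following the structure of Algorithm 4 (the two-link clique, the fugacities, and the dictionary data structures are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def nb_csma_clique_update(s, clique, lam, external_nbrs):
    """One block update of the proposed schedule x over an update clique C_k,
    accepted only if x does not conflict with active links outside the clique."""
    x = dict(s)
    active = [v for v in clique if s[v] == 1]
    if active:
        v = active[0]                                  # at most one active link per clique
        if rng.random() < 1.0 / len(clique):
            # keep or release the active link (single-site rule, lines 3-5)
            x[v] = 1 if rng.random() < lam[v] / (1 + lam[v]) else 0
            for w in clique:
                if w != v:
                    x[w] = 0
        else:
            # attempt a direct switch to another link of the same node (lines 6-8)
            denom = sum(1 + lam[z] for z in clique)
            u, acc = rng.random(), 0.0
            for w in (w for w in clique if w != v):
                acc += lam[w] / denom
                if u < acc:                            # switch v -> w without an idle slot
                    for z in clique:
                        x[z] = 1 if z == w else 0
                    break                              # otherwise x stays equal to s
    else:
        w = clique[int(rng.integers(len(clique)))]     # all links idle (lines 10-12)
        x[w] = 1 if rng.random() < lam[w] / (1 + lam[w]) else 0
    # accept only if the proposal remains an independent set (lines 13-16)
    ok = all(x[w] + s[z] <= 1 for w in clique for z in external_nbrs[w])
    return x if ok else s

s = {"a": 1, "b": 0}                                   # two outgoing links of one node
print(nb_csma_clique_update(s, ["a", "b"], {"a": 2.0, "b": 2.0}, {"a": [], "b": []}))
```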

4.5 Performance of the NB-CSMA Algorithm

Since the schedule s(t + 1) depends only on the schedule of the previous time slot, s(t), the evolution of schedules over time forms a Discrete Time Markov Chain (DTMC) with state space Ω. To write the transition probability between two feasible schedules s and s′, it is useful to look at the conditional transition probabilities given a particular choice of update cliques C. Let s_{C_k}(t) be the schedule of clique C_k at time slot t. We can break update cliques down into three types and characterize their different transition probabilities P(s_{C_k}, s′_{C_k}):

1. Type A (C_A): s_w(t) = 0 ∀w ∈ C_A, and s′_v(t + 1) = 1 for some v ∈ C_A, with
P(s_{C_A}, s′_{C_A}) = (1/|C_A|) · λ_v/(1 + λ_v).

2. Type B (C_B): s_v(t) = 1 for some v ∈ C_B, and s′_w(t + 1) = 0 ∀w ∈ C_B, with
P(s_{C_B}, s′_{C_B}) = (1/|C_B|) · 1/(1 + λ_v).

3. Type C (C_C): s_v(t) = 1 for some v ∈ C_C, and s′_w(t + 1) = 1 for some w ∈ C_C \ {v}, with
P(s_{C_C}, s′_{C_C}) = ((|C_C| − 1)/|C_C|) · λ_w / Σ_{z∈C_C}(1 + λ_z).

Since changes over different cliques in each time slot are independent, the conditional transition probability given some update cliques C, P(s, s′|C), is calculated by multiplying the transition probabilities over every clique, depending on the transition type described above. Let C_A, C_B, and C_C be the cliques that have a transition of type A, B, or C, respectively. The total transition probability follows:

P(s, s′) = Σ_C P(C) P(s, s′|C)   (4.5.1)
= Σ_C P(C) ( ∏_{C_A∈C_A} P(s_{C_A}, s′_{C_A}) ∏_{C_B∈C_B} P(s_{C_B}, s′_{C_B}) ∏_{C_C∈C_C} P(s_{C_C}, s′_{C_C}) ).   (4.5.2)

The DTMC of transmission schedules is both irreducible and aperiodic. Irreducibility can be checked by noticing that, starting from the empty state s with s_v = 0 ∀v ∈ V, any feasible state s′ ∈ Ω can be reached in a finite number of steps. Aperiodicity is easy to check as well, by noticing that every schedule s ∈ Ω has a self-transition whenever every clique in the update cliques makes no transition, which always happens with non-zero probability.

Theorem 4.5.1. The DTMC is reversible with stationary distribution

π(s) = (1/Z) ∏_{v∈V} λ_v^{s_v},   (4.5.3)

where Z = Σ_{s∈Ω} ∏_{v∈V} λ_v^{s_v} is a normalizing constant.

Proof. See Appendix C.1

Throughput Optimality

It was shown in [14] that any scheduling algorithm whose stationary distribution has the form (4.5.3) is Throughput-Optimal under the time-scale separation assumption, i.e., assuming that the DTMC of schedules converges on a fast time scale compared to the queue evolution. This means that it is sufficient for the scheduling algorithm to have the “correct” stationary distribution (4.5.3) in order to be Throughput-Optimal; thus, NB-CSMA is Throughput-Optimal. The time-scale separation assumption means that the schedules can be assumed to always be at the stationary distribution; this assumption was justified in [75]. The intuition behind Throughput-Optimality can be shown by defining the weight of link v as

w_v = f(q_v),  ∀v ∈ V,   (4.5.4)

where f is a concave nondecreasing function of the queue length q (usually taken to grow more slowly than log(·) for technical reasons). Then, if we take the link fugacity to be

λ_v = e^{w_v},   (4.5.5)

we obtain the stationary distribution of the DTMC as

π(s) = (1/Z) e^{Σ_{v∈s} w_v(t)}.   (4.5.6)

Intuitively, (4.5.6) approximates the max-weight solution, which is known to be Throughput-Optimal, in every time slot [59]. We note that the stationary distribution of NB-CSMA is equal to that of Q-CSMA in [14]; that is, in terms of the stationary distribution of schedules, NB-CSMA and Q-CSMA are equivalent. It is the second-order behavior that is different, which causes NB-CSMA to have favorable delay performance.

Delay Performance

Due to complex interactions between different links, delay performance is generally hard to assess. However, several tools have been used to analyze the delay of CSMA-like algorithms. We begin our delay assessment by comparing the delay performance of Q-CSMA and NB-CSMA for a fixed-size network, showing that under the NB-CSMA algorithm, each link sees a service process with less variability, implying better delay performance. We then proceed to examine how NB-CSMA scales as the network size increases. In particular, we are interested in characterizing the fraction of the capacity region for which the queue sizes can be bounded polynomially in the number of links (as opposed to exponentially). A similar result for Q-CSMA was found in [65]. We show that the fraction of the throughput region for which NB-CSMA is fast mixing is usually larger than the one found in [65], except for some special cases where the two regions coincide. Before we proceed, we make three simplifying assumptions for the sake of tractability:

A1 All fugacities λ_v are fixed and possibly heterogeneous, as opposed to the dynamic fugacities used to prove throughput optimality. Suitable fixed fugacities can be found using the problem formulation and solution in [12] to stabilize any arrival rate vector in the capacity region.

A2 We assume, for comparison purposes, that both NB-CSMA and Q-CSMA perform a single update per time slot, i.e., in Q-CSMA one link (vertex in the conflict graph) is updated, and in NB-CSMA one node (clique in the conflict graph) is updated. This does not change the stationary distribution in (4.5.3); it just provides easier grounds for comparison, at the cost of slower convergence to the stationary distribution, since only one update per slot is allowed.

A3 We assume that in Q-CSMA, in each time slot, a link is chosen uniformly at random to be updated (with probability 1/n, where n is the number of links). In NB-CSMA, on the other hand, in each time slot a node k is chosen at random with probability proportional to its number of outgoing links (i.e., with probability |K_k|/n). This makes the probability that a particular link is turned ON or OFF equal in Q-CSMA and NB-CSMA.

We note that A2-A3 suit the case of continuous-time CSMA. However, in discrete-time NB-CSMA, the network has to perform parallel updates (dropping A2-A3) for two reasons: 1. Performing single-site updates requires network-wide coordination, which is not practical in large networks. 2. Parallel updates enable the network to track the optimal max-weight solution faster (and speed up the mixing time by O(n) [65]), which translates to a reduction in network delay.

Comparison to Q-CSMA for Fixed-Size Networks

In a fixed-size network setting, we are interested in comparing the steady-state delay performance of Q-CSMA and NB-CSMA. The approach we use is similar to the one used in [67] to compare fixed-size networks, and relies on looking at the service process of each link v in isolation. First, define the service process of link v under the NB-CSMA and Q-CSMA algorithms as σ_v(t) and σ̃_v(t), respectively. Also define P, P̃, π, π̃ to be the transition matrices and the stationary distributions of the NB-CSMA and Q-CSMA Markov chains, respectively. Throughout this analysis, we assume that both Markov chains P and P̃ have already converged to their steady-state distributions.

Let B be the subset of schedules that include link v: B = {s ∈ Ω : s_v = 1}. The service process σ_v(t) is a 0-1 process with

P(σ_v(t) = 1) = Σ_{s∈B} π(s).   (4.5.7)

We define

T_1 = min{t ≥ 0 : σ_v(t) = 1},   (4.5.8)
T_{i+1} = min{t > T_i : σ_v(t) = 1}.   (4.5.9)

For the states s ∈ B, let τ_i = T_{i+1} − T_i (i ≥ 1) be the sequence of recurrence times and π_B their steady-state probability. The quantities π̃_B, σ̃_v(t), τ̃_i are defined similarly. We are interested in comparing the quantities E(τ_i), E(τ_i²) with the quantities E(τ̃_i), E(τ̃_i²).

Theorem 4.5.2. Under assumptions A1-A3, for any link v, the following hold:

E(τ_i) = E(τ̃_i),   (4.5.10)
E(τ_i²) ≤ E(τ̃_i²).   (4.5.11)

Proof. See Appendix C.2

Fast Mixing Activation Rates

We are now interested in characterizing the asymptotic delay performance of NB-CSMA as the size of the network grows, for networks with bounded-degree conflict graphs. It was shown in [61] that at high throughput values, the delay of the conventional CSMA algorithm grows exponentially with the number of links in the network, which makes the delay performance unacceptable in large networks. Our goal is to characterize the throughput region where the average delay is bounded by a polynomial, because this is the region where the network is guaranteed to operate with acceptable delay performance. In [65], this region was shown to be contained in the region where the schedule Markov chain is “fast mixing”. Thus, our target is to find the fraction of the capacity region that makes the network fast mixing.

We first define the total variation distance between the Markov chain distribution at time t, starting from state x, and the stationary distribution π:

d_TV(P_x^t, π) = max_{A⊂Ω} |P_x^t(A) − π(A)|.   (4.5.12)

The mixing time of the Markov chain is defined as

T_mix(ε) = min{ t : max_x d_TV(P_x^t, π) < ε }.   (4.5.13)

Our tool for bounding the mixing time will be coupling. A coupling of two Markov chains is a pair process (X^t, Y^t) such that:

1. Each of the chains (X^t, Y^t), when viewed in isolation, remains faithful to the original Markov chain.

2. If X^t = Y^t, then X^{t+1} = Y^{t+1}.

The mixing time is bounded by the stopping time taken for any two coupled processes to meet, that is,

T_mix ≤ max_{x,y} min{ t : X^t = Y^t | X^0 = x, Y^0 = y }.   (4.5.14)

Equivalently, instead of computing the stopping time explicitly, we can define a distance metric on Ω and compute the time taken for the distance between the two processes to reach zero. Path coupling, introduced in [76], is a powerful tool that makes it easier to design and analyze couplings, by showing that to bound the mixing time, it is enough to restrict the couplings to pairs of states that are adjacent under the metric. This is much easier than bounding the mixing time using couplings of arbitrary pairs of states. Formally:

Theorem 4.5.3. [76] Let δ be an integer-valued metric on Ω × Ω taking values in {0, . . . , D}. Let S be a subset of Ω × Ω such that for all (x^t, y^t) ∈ Ω × Ω, there exists a path x^t = e_0, e_1, . . . , e_r = y^t between x^t and y^t such that (e_l, e_{l+1}) ∈ S for all 0 ≤ l < r and

Σ_{l=0}^{r−1} δ(e_l, e_{l+1}) = δ(x^t, y^t).   (4.5.15)

Define a coupling (x, y) → (x′, y′) of the Markov chain P on all pairs (x, y) ∈ S, and suppose there exists β ≤ 1 such that E(δ(x′, y′)) ≤ β δ(x, y) for all (x, y) ∈ S. If β < 1, the mixing time T_mix(ε) satisfies

T_mix(ε) ≤ log(Dε^{−1}) / (1 − β),   (4.5.16)

where D = max_{x,y∈Ω} δ(x, y), i.e., the maximum distance between any two states in terms of the metric δ(x, y).

Later, we shall apply this theorem with the Hamming distance metric, i.e., δ(x, y) = Σ_{v∈V} 1(x_v ≠ y_v). Thus, for our results, we can substitute D with the number of links in the network, n. Instead of analyzing the NB-CSMA Markov chain P directly, we analyze a different Markov chain Q that is linearly related to the original chain P, and rely on the relationship between P and Q to estimate the mixing time of P. Throughout this section, we assume that assumptions A1, A2, and A3 hold. For notational convenience, we refer to the set of outgoing links of the node k that contains link v as K_v; recall that K_v is also a clique in the conflict graph. Furthermore, under A2, since only one node gets to update its schedule in every time slot, we assume this node always chooses to update all of its outgoing links K_v. Let λ_1 ≤ λ_2 ≤ · · · ≤ λ_max be the set of fugacities of all links v ∈ V. Define the transitions of the Markov chain Q as specified in Algorithm 5. It is not difficult to check that the Markov chain Q has a stationary distribution equal to that of (4.3.3) and (4.5.3).

Theorem 4.5.4. Given the Markov chain Q, if λ_max < 1/max_v(d_v − |K_v|), then the mixing time of Q satisfies

T_mix(ε) ≤ 2n(1 + λ_max) log(nε^{−1}) / (1 − max_v(d_v − |K_v|) λ_max) = O(n log n),   (4.5.17)

where d_v is the interference degree of link v (the number of neighbors in the conflict graph, i.e., the number of interferers).

Proof. See Appendix C.3
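To get a feel for the bound, (4.5.17) can be evaluated numerically; the network parameters below are illustrative:

```python
import math

def q_mixing_upper_bound(n, lam_max, max_ext_degree, eps=0.01):
    """Evaluate the upper bound (4.5.17), where max_ext_degree = max_v (d_v - |K_v|).
    Valid only in the regime lam_max < 1 / max_ext_degree."""
    assert lam_max * max_ext_degree < 1.0, "outside the fast-mixing regime"
    return 2 * n * (1 + lam_max) * math.log(n / eps) / (1 - max_ext_degree * lam_max)

# e.g., 100 links, at most 4 external interferers per link, lam_max = 0.2
print(q_mixing_upper_bound(n=100, lam_max=0.2, max_ext_degree=4))
```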

Now that we have bounded the mixing time of Q, we need to relate it to the mixing time of P. We make use of the following lemma, stated as Theorem 5.3 in [77].

Lemma 4.5.1. [77] Given two Markov chains P and Q, let their mixing times be T^P_mix and T^Q_mix, respectively. If

P ≥ αQ,   (4.5.18)

then T^P_mix ≤ 2α^{−1} T^Q_mix log(π*(2ε)^{−1}), where π* = max_{x∈Ω} √((1 − π(x))/π(x)).

Algorithm 5: Evolution of Markov Chain Q
1  Pick a clique K_v to update at random, w.p. |K_v|/n
2  w.p. 1/2:
3    s(t+1) = s(t)
4  w.p. 1/2:
5    if ∃v ∈ K_v s.t. s_v(t) = 1 then
6      w.p. 1/|K_v|:
7        x_v(t+1) = 1, x_w(t+1) = 0 ∀w ∈ K_v \ {v},  w.p. λ_v/(1 + λ_max)
8        x_v(t+1) = 0, x_w(t+1) = 0 ∀w ∈ K_v \ {v},  w.p. 1/(1 + λ_max)
9      w.p. (|K_v| − 1)/|K_v|:
10       x_w(t+1) = 1, x_z(t+1) = 0 ∀z ∈ K_v \ {w},  w.p. λ_w/(2(|K_v| − 1)(1 + λ_max)),  ∀w ∈ K_v \ {v}
11       x_w(t+1) = s_w(t) ∀w ∈ K_v,  otherwise
12     else
13       pick link w ∈ K_v uniformly at random
14       x_w(t+1) = 1 w.p. λ_w/(1 + λ_max)
15       x_w(t+1) = 0 w.p. 1/(1 + λ_max)
16     if x_w(t+1) + s_z(t) ≤ 1, ∀w ∈ K_v, ∀z ∈ N_w \ K_v then
17       s_w(t+1) = x_w(t+1), ∀w ∈ K_v
18     else
19       s_w(t+1) = s_w(t), ∀w ∈ K_v

Before we apply this lemma to bound the mixing time of P, we state another lemma.

Lemma 4.5.2. Let P* be the lazy version of the Markov chain P (obtained by adding a self-transition with probability 1/2 in every time slot). Then P* ≥ αQ, where α = 1/2.

Proof. Denote the transitions from s to s′ as P*(s, s′) and Q(s, s′) in the P* and Q chains, respectively. We verify that α = 1/2 works for all three transition types (adding type T4 for self-transitions):

T1: P*(s, s′) = λ_v/(2n(1 + λ_v)) ≥ λ_v/(2n(1 + λ_max)) ≥ Q(s, s′) ≥ αQ(s, s′)

T2: P*(s, s′) = 1/(2n(1 + λ_v)) ≥ 1/(2n(1 + λ_max)) ≥ Q(s, s′) ≥ αQ(s, s′)

T3: P*(s, s′) = ((|K_v| − 1)/(2n)) · λ_w/Σ_{z∈K_v}(1 + λ_z) ≥ ((|K_v| − 1)/(2n)) · λ_w/(|K_v|(1 + λ_max)) ≥ αQ(s, s′)

T4: P*(s, s′) ≥ 1/2 ≥ α ≥ αQ(s, s′)

We can now directly bound the mixing time of P using the following theorem.

Theorem 4.5.5. Given the NB-CSMA Markov chain P, if λ_max < 1/max_v(d_v − |K_v|), then the mixing time of P can be bounded as

T_mix(ε) = O(n² log n).   (4.5.19)

Proof. The theorem is proved by applying Lemma 4.5.1 to P* with α = 1/2 from Lemma 4.5.2. Also, for any graph, log(π*) = O(n) [77]. Applying Lemma 4.5.1 with these quantities, and noticing that P* has double the mixing time of P, concludes the theorem.

It remains to find the fraction of the capacity region that causes the Markov chain P to be fast mixing, and consequently causes the queue lengths of the links to be polynomially bounded in the number of links. Before we state our main theorem, we state a related result from [12] as a lemma.

Lemma 4.5.3. [12] Given any ν ∈ Λ, there exist suitable activation rates λ such that for every link v, the mean service rate E(s_v) is equal to the mean arrival rate ν_v.

This lemma ensures the existence of fixed activation rates (fugacities) that stabilize the queues whenever the arrival rate vector falls within the capacity region. We are now ready to present our final theorem.

Theorem 4.5.6. Given an arrival rate vector ν that satisfies ν ∈ γΛ, where

γ = 1 / max_v(d_v − |K_v| + 1),   (4.5.20)

the Markov chain P is fast mixing, and the expected queue length of any link v can be bounded as

E(q_v(t)) = O(T_mix) = O(n² log(n)).   (4.5.21)

Proof. See Appendix C.4

Discussion

In the last theorem, we characterized the fraction of the capacity region that makes the Markov chain P fast mixing. This is the fraction of the capacity region for which the average queue lengths grow polynomially in the number of links n. The region in Theorem 4.5.6 is no smaller than the regions found in [65] and [66]. In fact, the result in [65] showed that the network is fast mixing for ν ∈ (1/Δ)Λ, where Δ = max_v d_v is the conflict graph degree. The difference between this result and ours is the following: in [65], the throughput has to decrease as the number of interferers increases in order to obtain acceptable delay performance; in Theorem 4.5.6, the throughput has to decrease only when the number of external interferers (interferers that have a different transmitter) increases. We expect the difference to be significant in ad-hoc networks, where the average node degree increases as the network becomes dense. Another difference is the following: the mixing time bound obtained in [65] was O(log(n)) under a parallel-updates assumption (with assumption A2 removed). However, the mixing time of Q-CSMA under single-site updates is lower bounded by Ω(n log(n)) [78]. Furthermore, in Theorem 4.5.5, an additional O(n) factor comes from the log(π*) term in the comparison theorem in Lemma 4.5.1; thus, the mixing time upper bound of P is of order O(n) larger than that of Q. However, as asserted in [77], this additional O(n) factor is almost certainly an artifact of the analysis. Thus, although the bound of Theorem 4.5.5 is O(n²) times that of [65], we do not expect the mixing time of Q-CSMA to be smaller than that of NB-CSMA for any arrival rate vector. This will be further validated by simulations.

4.6 Collocated Networks

In this section, we restrict our study to collocated networks, where every link interferes with all other links. It is easy to see that the resulting conflict graph is complete and that at most one link can be active in each time slot; i.e., for a network with n links, we have n + 1 feasible schedules, enumerating all links in the network plus the empty schedule. We focus on collocated networks both for their relative simplicity, compared to general networks which have O(2^n) feasible schedules, and for their practical importance, as they are often used to model wireless local area networks. Several works have analyzed collocated networks separately: [65] analyzed the mixing time of the schedule Markov chain, [79] formulated the link access probability as an optimization problem, and [80] derived a lower bound on the mean delay for dynamic activation rates. Our goals in this section are as follows:

1. We analyze collocated networks using the Markov chain with rewards framework [81] to derive a closed-form expression for the mean starvation time of individual links under both Q-CSMA and NB-CSMA.

2. We use the mean starvation expressions to infer, in a quantifiable way, the expected gains of NB-CSMA over Q-CSMA as a function of the activation rates and the network topology. To this end, we use simplifying assumptions on the network for the sake of obtaining closed-form expressions.

We remark that our approach is inspired by [66], where the marginal service rate process, along with stochastic dominance, was used to derive an average delay bound for any link in a general network with a bounded-degree conflict graph. However, we note that finding the mean-delay bounds obtained in [66] involves inverting the transition matrix, which is only possible computationally and does not offer clear insights into how the bound depends on different network parameters. We first note that for any collocated network, the feasible schedules can be enumerated as (e_0, e_1, e_2, . . . , e_n), where e_0 is the empty schedule and e_u = {s : s_u = 1 and s_v = 0 ∀v ≠ u}. For the rest of our analysis, we consider the behavior of link 1, without loss of generality. Similar to our earlier analysis, we define ζ_i as the i-th starvation time. Formally, ζ_i = T_{i+1} − T_i given that s_1(T_i) = 1, s_1(T_{i+1}) = 1, and s_1(t) = 0 for all T_i < t < T_{i+1}. The difference between the starvation time ζ_i and the recurrence time τ_i is that the starvation time counts the time between two “ON” slots such that link 1 is “OFF” in all the slots between them, whereas the recurrence time counts the time between “ON” slots even if they are consecutive. Our goal is to explicitly express E(ζ_i) for a collocated network performing Q-CSMA and NB-CSMA, and to relate their performance ratio to the network topology.

4.6.1 Q-CSMA Starvation time

General Case

To compute the mean starvation time for a collocated network executing Q-CSMA, we assume without loss of generality that in each time slot one link is chosen uniformly at random to update, i.e., each link updates w.p. $\frac{1}{n}$. We then modify the Markov chain governing the schedules by making the state $e_1$ absorbing (by modifying probabilities such that $P_{11} = 1$ and $P_{1u} = 0$ for $u \neq 1$). We can now write the modified Markov chain transition matrix $S_Q$ (where the transitions from/to schedule $e_1$ were moved to the last row/column, respectively) as follows:

$$S_Q = \begin{pmatrix} 1-\sum_{i=1}^{n}\frac{\lambda_i}{n(1+\lambda_i)} & \frac{\lambda_2}{n(1+\lambda_2)} & \cdots & \frac{\lambda_n}{n(1+\lambda_n)} & \frac{\lambda_1}{n(1+\lambda_1)} \\[4pt] \frac{1}{n(1+\lambda_2)} & 1-\frac{1}{n(1+\lambda_2)} & & & 0 \\ \vdots & & \ddots & & \vdots \\ \frac{1}{n(1+\lambda_n)} & & & 1-\frac{1}{n(1+\lambda_n)} & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix} \tag{4.6.1}$$

Let $v_i$ be the expected first-passage time from $e_i$ to $e_1$. The vector $v$ can be found using a first-step analysis by solving the following linear equations [81, Chapter 4.5]:
$$v = \mathbf{1} + Qv \tag{4.6.2}$$
where $Q$ here denotes the restriction of $S_Q$ to the transient states,

which can be simplified to:
$$v_0 = 1 + \sum_{i=2}^{n}\frac{\lambda_i}{n(1+\lambda_i)}\,v_i + \left(1-\sum_{i=1}^{n}\frac{\lambda_i}{n(1+\lambda_i)}\right)v_0 \tag{4.6.3}$$


Figure 4.2: State space of Q-CSMA in collocated network, equal throughput case

$$v_i = 1 + \frac{1}{n(1+\lambda_i)}\,v_0 + \left(1-\frac{1}{n(1+\lambda_i)}\right)v_i, \quad 1 < i \le n \tag{4.6.4}$$

We can easily solve these linear equations to obtain $v_i$ for $i = 0$ and all $i > 1$. Note that under Q-CSMA, the scheduling Markov chain can only depart schedule $e_1$ to schedule $0$, i.e., we have $\mathbb{E}(\zeta_1) = v_0$; thus, solving (4.6.3) and (4.6.4), we get
$$\mathbb{E}(\zeta_1) = v_0 = \frac{\left(\sum_{i=2}^{n}\lambda_i + 1\right)n(1+\lambda_1)}{\lambda_1}. \tag{4.6.5}$$
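As a quick numerical illustration (ours, not part of the original derivation), the first-step analysis above can be verified by solving the linear system $v = \mathbf{1} + Qv$ directly; the sketch below, with arbitrary example fugacities, compares the solved $v_0$ against the closed form (4.6.5).

```python
import numpy as np

# Numerical check of (4.6.5): mean starvation time of link 1 under Q-CSMA
# in a collocated network, with the target state e_1 made absorbing.
# Transient states are ordered (0, e_2, ..., e_n); v_{e_1} = 0 is implicit.
lam = np.array([0.3, 0.5, 0.8, 1.2])     # example fugacities lambda_1..lambda_n
n = len(lam)

a = lam / (n * (1.0 + lam))              # a_i = lambda_i / (n(1 + lambda_i))
P = np.zeros((n, n))                     # sub-stochastic matrix over transients
P[0, 0] = 1.0 - a.sum()                  # stay in the empty schedule
P[0, 1:] = a[1:]                         # empty schedule -> e_i, i >= 2
for i in range(1, n):                    # index i corresponds to e_{i+1}
    b = 1.0 / (n * (1.0 + lam[i]))       # e_i -> empty schedule
    P[i, 0], P[i, i] = b, 1.0 - b

# First-step analysis (4.6.2): v = 1 + P v  <=>  (I - P) v = 1
v = np.linalg.solve(np.eye(n) - P, np.ones(n))
closed_form = (lam[1:].sum() + 1.0) * n * (1.0 + lam[0]) / lam[0]
print(v[0], closed_form)                 # both print the same value
```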

Equal Throughput Case

If we further simplify the network by assuming that all links have the same arrival rate, we can take all activation rates in the network to be fixed and equal; we drop the subscript and refer to this unified fugacity as $\lambda$. Under these assumptions, we can simplify the state space of the scheduling Markov chain $s(t)$ by making the following observation: link 1 only sees the medium in three states: 1. ON: whenever $s(t) = e_1$, i.e., link 1 currently owns the medium and freely transmits if it has any packets in its queues; 2. IDLE: whenever $s(t) = e_0$: no link owns the medium, and link 1 can acquire the medium with probability $\frac{\lambda}{n(1+\lambda)}$; and finally, 3.


BLOCKED: whenever $s(t) \in \{e_2, e_3, \ldots, e_n\}$. The state diagram of the new transition matrix is shown in Fig. 4.2. The reason for this is that when activation rates are equal, it makes no difference, from the point of view of link 1, which other link has the medium. More formally, the states $e_2, e_3, \ldots, e_n$ are statistically indistinguishable when we are interested in analyzing the behavior of link 1. We consolidate these states into a single state $e_B$ and rewrite the matrix in (4.6.1), with states ordered $(e_0, e_B, e_1)$, as

$$Q = \begin{pmatrix} \frac{1}{1+\lambda} & \frac{(n-1)\lambda}{n(1+\lambda)} & \frac{\lambda}{n(1+\lambda)} \\[4pt] \frac{1}{n(1+\lambda)} & 1-\frac{1}{n(1+\lambda)} & 0 \\[4pt] 0 & 0 & 1 \end{pmatrix} \tag{4.6.6}$$

Performing a first-step analysis on the new Markov chain to obtain the mean first-passage times $(v_0, v_B)$ starting from the schedules $(e_0, e_B)$, respectively, we get:
$$v_0 = 1 + \frac{1}{1+\lambda}\,v_0 + \frac{(n-1)\lambda}{n(1+\lambda)}\,v_B \tag{4.6.7}$$
$$v_B = 1 + \frac{1}{n(1+\lambda)}\,v_0 + \left(1-\frac{1}{n(1+\lambda)}\right)v_B \tag{4.6.8}$$

Solving the two linear equations, we get the following expression for the mean starvation time:
$$\mathbb{E}(\zeta_1^Q) = v_0 = n^2 + n(n-1)\lambda + \frac{n}{\lambda}, \tag{4.6.9}$$
where we have used the fact that any starvation epoch has to start from the state $e_0$, since the state $e_1$ can only transition to state $e_0$. As a sanity check, we note that setting all activation rates $\lambda_i = \lambda$ in (4.6.5) recovers the expression in (4.6.9).


Figure 4.3: State space of NB-CSMA in collocated network, equal throughput case

4.6.2 NB-CSMA Starvation time

We follow a similar procedure to compute the average starvation time of networks under the NB-CSMA algorithm. We begin by writing the transition matrix $B$ of the NB-CSMA schedules. As an example, the matrix in (4.6.10) represents a collocated network with two transmitting nodes and a total of $n$ transmitting links, where node 1 has $K_1$ outgoing links and node 2 has $K_2$ outgoing links. In this section, we will use the set $\mathcal{K}_v$ of outgoing links from node $v$ and its cardinality $|\mathcal{K}_v|$ interchangeably for clarity of presentation. Note that in this case, $K_1 + K_2 = n$. The corresponding transition matrix $B$ is an $(n+1)\times(n+1)$ matrix given in (4.6.10), where the node transition sub-matrix $P_m$, which contains the transitions between the schedules within node $m$, i.e., among $(e_i, e_{i+1}, \ldots, e_{i+K_m})$, is given in (4.6.11).


$$B = \begin{pmatrix} 1-\sum_{i=1}^{n}\frac{\lambda_i}{n(1+\lambda_i)} & \frac{\lambda_1}{n(1+\lambda_1)}\ \cdots\ \frac{\lambda_{K_1}}{n(1+\lambda_{K_1})} & \frac{\lambda_{K_1+1}}{n(1+\lambda_{K_1+1})}\ \cdots\ \frac{\lambda_n}{n(1+\lambda_n)} \\[8pt] \begin{matrix}\frac{1}{n(1+\lambda_1)}\\ \vdots \\ \frac{1}{n(1+\lambda_{K_1})}\end{matrix} & P_1 & \mathbf{0} \\[8pt] \begin{matrix}\frac{1}{n(1+\lambda_{K_1+1})}\\ \vdots \\ \frac{1}{n(1+\lambda_n)}\end{matrix} & \mathbf{0} & P_2 \end{pmatrix} \tag{4.6.10}$$

We can repeat the steps in the previous section to obtain a result similar to (4.6.5). However, due to the less sparse structure of the matrix $B$, we cannot get a closed-form solution for the mean starvation time of an arbitrary link in a network operating under the NB-CSMA scheduling algorithm. Therefore, we focus on the special case of equal throughputs.

Similar to Section 4.6.1, when studying the mean starvation time of link 1 of node 1, which has a total of $K_1$ outgoing links, under the equal throughput assumption we simplify the state space into four states relevant to link 1: 1. ON: whenever $s(t) = e_1$, i.e., link 1 currently owns the medium and freely transmits if it has any packets in its queues; 2. IDLE: whenever $s(t) = e_0$: no link owns the medium, and link 1 can acquire

$$P_m = \begin{pmatrix} 1-\frac{1}{n(1+\lambda_i)}-\sum_{j\neq i}\frac{(K_m-1)\lambda_j}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} & \frac{(K_m-1)\lambda_{i+1}}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} & \cdots & \frac{(K_m-1)\lambda_{i+K_m}}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} \\[6pt] \frac{(K_m-1)\lambda_i}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} & 1-\frac{1}{n(1+\lambda_{i+1})}-\sum_{j\neq i+1}\frac{(K_m-1)\lambda_j}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} & \cdots & \frac{(K_m-1)\lambda_{i+K_m}}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{(K_m-1)\lambda_i}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} & \cdots & \cdots & 1-\frac{1}{n(1+\lambda_{i+K_m})}-\sum_{j\neq i+K_m}\frac{(K_m-1)\lambda_j}{n\sum_{l=i}^{i+K_m}(1+\lambda_l)} \end{pmatrix} \tag{4.6.11}$$


the medium with probability $\frac{\lambda}{n(1+\lambda)}$; 3. NEIGHBOR: the medium is owned by another link outgoing from node 1, so in any slot the schedule can switch to link 1 w.p. $\frac{(K_1-1)\lambda}{nK_1(1+\lambda)}$; the NEIGHBOR state is entered whenever $s(t) \in e_{\mathcal{K}_1}\setminus\{e_1\}$; 4. BLOCKED: whenever $s(t) \in e_{\mathcal{S}}\setminus(e_{\mathcal{K}_1}\cup\{e_0\})$. In the rest of this section, we will denote $K_1$ simply as $K$ for clarity of presentation. The state diagram of the new transition matrix is shown in Fig. 4.3.

To derive the mean starvation time of link 1, we modify the Markov chain of schedules in Fig. 4.3 by making the state $e_1$ absorbing. We repeat the procedure of Section 4.6.1 (General Case) to derive the first-passage times of all states to the target state $e_1$. Solving the associated linear equations, we obtain the vector of mean first-passage times $v = [v_0\ v_N\ v_B]$. Under NB-CSMA, as shown in Fig. 4.3, the schedule can transition out of state $e_1$ to either the IDLE state $e_0$ or the NEIGHBOR state $e_N$, with probabilities $\frac{P_{10}}{P_{10}+P_{1N}}$ and $\frac{P_{1N}}{P_{10}+P_{1N}}$, respectively. Therefore, to compute the mean starvation time, we take the inner product of the initial distribution $\alpha = \left[\frac{P_{10}}{P_{10}+P_{1N}}\ \ \frac{P_{1N}}{P_{10}+P_{1N}}\ \ 0\right]$ with the mean first-passage times $v$ to obtain:
$$\mathbb{E}(\zeta_1^N) = \alpha^{\mathsf T}v = \frac{Kn(\lambda+1)(n\lambda-\lambda+1)}{\lambda\left(\lambda K^2+(1-2\lambda)K+\lambda\right)} \tag{4.6.12}$$

We now compare the service processes of Q-CSMA and NB-CSMA for collocated networks in the equal throughput case, when all nodes have $K$ outgoing links. We begin by stating the following lemma on choosing the activation rate $\lambda$ in our special case of equal throughputs.

Lemma 4.6.1. Given a collocated network running Q-CSMA or NB-CSMA with equal throughputs for all links, an activation rate of $\lambda = \frac{\rho}{n(1-\rho)} + \epsilon$, for any $\epsilon > 0$, stabilizes any throughput requirement $\nu = \frac{\rho}{n}$ for $0 < \rho < 1$.

The proof is immediate by checking that the stationary probability of the Markov chain satisfies $\pi(e_i) = \frac{\lambda}{1+n\lambda} > \frac{\rho}{n}$, which guarantees the stability of all queues. Recall that the stability region of a collocated network is given by $\{\nu \mid \sum_{i=1}^{n}\nu_i < 1\}$, and for the special case of equal throughputs the stability condition reduces to the single condition $\nu < \frac{1}{n}$. The $\rho$ in Lemma 4.6.1 takes values between 0 and 1 and represents the traffic intensity in the network.

We are now ready to compare the mean starvation times of Q-CSMA and NB-CSMA in the following theorem.

Theorem 4.6.1. Given a collocated network with $n$ links, $\frac{n}{K}$ nodes each with $K$ outgoing links, and equal throughputs $\nu = \frac{\rho}{n}$ at all links, the ratio $r$ between the mean starvation times of NB-CSMA and Q-CSMA is given by
$$r = \frac{K}{K+\lambda(K^2-2K+1)} \tag{4.6.13}$$
where
$$\lambda = \frac{\rho}{n(1-\rho)}+\epsilon, \quad \text{for any } \epsilon > 0. \tag{4.6.14}$$

The proof is immediate by dividing (4.6.12) by (4.6.9). The ratio quantifies the decrease in mean starvation time obtained by using NB-CSMA instead of Q-CSMA in a given network. Theorem 4.6.1 leads to the following observations:

1. The ratio decreases in the number of outgoing links $K$: the denominator of (4.6.13) grows as $\Theta(K^2)$ while the numerator grows only linearly, so for a given network we see greater improvement in starvation times as the number of outgoing links per node increases. This result is expected, since a larger $K$ means that decisions are centralized for more links.

2. The ratio also decreases with the traffic intensity $\rho$, since $\lambda$ grows as $\Theta\!\left(\frac{1}{1-\rho}\right)$ by (4.6.14): the higher the network traffic intensity, the more improvement we expect to get from NB-CSMA in terms of starvation times.

3. Each link sees an alternating renewal service process, with equal long-term mean service times across all links, under both the NB-CSMA and Q-CSMA scheduling algorithms. However, the service process under each algorithm is as follows.

Q-CSMA: the link is ON for a geometrically distributed epoch, $\mathrm{Geo}\!\left(\frac{1}{n(1+\lambda)}\right)$, then OFF for an epoch that has a discrete phase-type distribution with mean $\mathbb{E}(\zeta_1^Q)$ given in (4.6.9).

NB-CSMA: the link is ON for a geometrically distributed epoch, $\mathrm{Geo}\!\left(\frac{1}{n(1+\lambda)}+\frac{(K-1)^2\lambda}{nK(1+\lambda)}\right)$, then OFF for an epoch that has a discrete phase-type distribution with mean $\mathbb{E}(\zeta_1^N)$ given in (4.6.12).
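As an illustrative numerical check (ours, not part of the original text), the ratio in (4.6.13) can be confirmed by evaluating (4.6.9) and (4.6.12) directly:

```python
# Check that (4.6.13) equals the ratio of (4.6.12) to (4.6.9) for example
# parameters; epsilon in (4.6.14) is taken to be negligibly small.
n, K, rho = 24, 6, 0.8
lam = rho / (n * (1.0 - rho))                        # activation rate (4.6.14)

E_Q = n**2 + n * (n - 1) * lam + n / lam             # Q-CSMA, (4.6.9)
E_N = (K * n * (lam + 1) * (n * lam - lam + 1)
       / (lam * (lam * K**2 + (1 - 2 * lam) * K + lam)))   # NB-CSMA, (4.6.12)
r = K / (K + lam * (K - 1) ** 2)                     # ratio, (4.6.13)

print(E_N / E_Q, r)    # the two values coincide
```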

4.7 Numerical Results

4.7.1 General Networks

We simulate a random topology where 20 wireless nodes are placed uniformly at random in a 600m × 600m square. A wireless link is established with probability 1 if the transmitter node is within 150m of the receiver node, and with probability 0.5 if the transmitter is within 250m of the receiver. All links in this simulation are unidirectional. We assume a geometric interference relationship between links, whereby two links interfere with each other if (a) they share a transmitter node, (b) they share a receiver node, or (c) the transmitter node of one link is within 250m of the receiver node of the other. The resulting instance of the geometric random network has 48 links.

Figure 4.4: 600m × 600m Random Network Topology

To determine the decision schedule of Q-CSMA and the update blocks of NB-CSMA, we use a contention window of size 8 and a random back-off scheme, where every link waits a random time and then attempts to include itself in the update blocks (decision schedule). If no conflict happens, the inclusion is successful. We also use the dynamic fugacities $\lambda_v = \frac{\log(1+q_v)}{\log(e+\log(1+q_v))}$, found to have the best delay performance [75]. Thus, effectively, in the simulation we have dropped assumptions (A1)-(A3) of Section 4.5. The arrivals at different links are independent Bernoulli processes. We determine the arrival rates of all links using the following three steps:

processes. We determine the arrival rates of all links using the following three steps:

a) We compute all the maximal independent sets of the conflict graphG of the network

G in Fig. 4.4, we call this set of independent sets AM . b) For each of these sets,

Am∈M , we obtain an arrival rate vector νm on the boundary of the capacity region

by setting the arrival rates of all links v included in the set to νvm = 1, ∀v ∈ m

and otherwise νvm = 0, ∀v /∈ m. c) By taking the average of νm that is, taking the

average of the binary vector [νvm]m∈M at each link v, we get an arrival rate vector

ν∗ at the boundary of the capacity region that has a strictly positive arrival rate

for every link v ∈ V . We multiply ν∗ by a factor ρ ∈ (0, 1) that we call “Traffic
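The three-step construction can be expressed compactly in code. The sketch below is our illustration (assuming networkx is available and the conflict graph is given as a networkx Graph); it enumerates the maximal independent sets as the maximal cliques of the complement graph and averages the resulting boundary points:

```python
import networkx as nx
import numpy as np

def boundary_arrival_rates(conflict_graph):
    """Steps (a)-(c): average the indicator vectors of all maximal
    independent sets of the conflict graph to obtain nu* on the
    boundary of the capacity region."""
    links = sorted(conflict_graph.nodes())
    index = {v: i for i, v in enumerate(links)}
    # Maximal independent sets of G = maximal cliques of the complement of G.
    max_ind_sets = list(nx.find_cliques(nx.complement(conflict_graph)))
    nu = np.zeros(len(links))
    for m in max_ind_sets:
        for v in m:
            nu[index[v]] += 1.0
    return nu / len(max_ind_sets)

# Example: a small 5-link conflict graph, scaled by a traffic intensity rho.
nu_star = boundary_arrival_rates(nx.cycle_graph(5))
print(0.7 * nu_star)     # per-link Bernoulli arrival rates at rho = 0.7
```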


Figure 4.5: Average Queue Length per link vs. ρ

In Fig. 4.5,

we plot the time-average queue lengths per link against the traffic intensity $\rho$. We calculate the average queue lengths over $2\times 10^5$ time slots. In all simulations, when calculating time-averages we neglect the evolution of queue lengths over the first half of the simulation time, to minimize the influence of the transient behavior at the beginning of the simulation. We can see, as expected, that NB-CSMA outperforms Q-CSMA for all values of $\rho$. In this example, the average queue length of NB-CSMA is on average half that of Q-CSMA for all values of $\rho$. By Little's law [81], the average delay is the ratio of the average queue length of any link (and of the whole network) to the arrival rate. Thus, for this example, NB-CSMA results in a 50% decrease in average delay for any arrival rate vector within the capacity region.

We consider a recent proposal that greatly improves the performance of Q-CSMA, and compare the performance of the modified Q-CSMA to that of NB-CSMA when both implement the same modification.

Figure 4.6: Average Queue Length per link vs. ρ (Delayed CSMA, T = 2, T = 8)

Namely, we consider the

case of "Delayed CSMA," proposed in [70, 71] (and in slightly different forms in [69, 72]), whereby the links keep $T$ parallel schedules and update each schedule independently. For instance, if $T = 2$, then the network keeps two separate schedules: one for even time slots and one for odd time slots. Intuitively, when two or more schedules evolve independently, it becomes less likely that any link will be starved for a long time. In Fig. 4.6, the results of the implementation of Delayed CSMA for both Q-CSMA and NB-CSMA are presented. We simulate the network in Fig. 4.4 for different traffic intensities. We note that both Q-CSMA and NB-CSMA see improvements when the number of parallel schedules $T$ grows from 2 to 8. In fact, Q-CSMA sees a 28% reduction in average delay, whereas NB-CSMA sees a 33% reduction, i.e., NB-CSMA benefits more when the schedules are parallelized. Also, when $T = 8$, at high throughputs the average delay under Delayed NB-CSMA is 40-50% that of Delayed CSMA. This suggests that the benefits of improvements to link-based CSMA algorithms, such as parallelization, carry over to NB-CSMA whenever those improvements are implemented on top of NB-CSMA.


Figure 4.7: Mean Starvation time of link 1 vs. ρ (the top axis shows the corresponding ratio r, decreasing from 0.4776 at ρ = 0.6 to 0.1949 at ρ = 0.95)

4.7.2 Collocated Networks

We simulate a collocated topology under the assumptions of Section 4.6 (symmetric networks, fixed fugacities, equal throughputs), with four nodes each having six outgoing links ($K = 6$, $n = 24$), for $10^6$ slots. In Fig. 4.7, we plot the analytical and simulated mean starvation times against the traffic intensity $\rho$; we also added the corresponding ratio $r$ for each $\rho$ on the top x-axis. We first note that the mean starvation time increases as we increase $\rho$ for Q-CSMA, whereas it decreases for NB-CSMA. The behavior of Q-CSMA can be explained by taking the derivative of (4.6.9) with respect to $\lambda$: we find that the mean starvation time is decreasing for $\lambda < \frac{1}{\sqrt{n-1}}$ and increasing for $\lambda > \frac{1}{\sqrt{n-1}}$. Therefore, at high traffic intensities we observe a sharp increase in mean starvation times. The behavior of NB-CSMA is more complicated, as the expression in (4.6.12) is a ratio of polynomials; thus, increasing traffic intensity may cause the mean starvation time to increase, decrease, or oscillate, depending on the network topology.

Figure 4.8: Average Queue Length per link vs. ρ (Collocated Networks)

As traffic intensity increases, there are two forces that affect the mean starvation time: an attractive force that pulls the schedule towards link

1 from the neighbors that share the transmitter node: when $\lambda$ increases, the node tends to switch the schedule rapidly between its outgoing links, decreasing the mean starvation time of all links. On the other hand, since all other links become more attractive as well, link 1 has to wait longer to get the medium back whenever a link outgoing from a different node is transmitting, which increases the mean starvation time. Thus, the mean starvation time under NB-CSMA does not have a simple monotonic relationship with the throughput, unlike that of Q-CSMA. Another observation from Fig. 4.7 is that the ratio of NB-CSMA to Q-CSMA mean starvation times, $r$, decreases sharply with throughput, in line with our findings in Theorem 4.6.1. For example, this ratio decreases from $r = 0.47$ at $\rho = 0.6$ to $r = 0.19$ at $\rho = 0.95$, demonstrating the vast decrease in mean starvation time as we approach the boundary of the capacity region. We also simulate the mean queue length for the same network at different traffic intensities $\rho$ under the same assumptions. The results, shown in Fig. 4.8,


show that NB-CSMA consistently achieves around 50% of the delay achieved by Q-

CSMA across all traffic intensities ρ. Interestingly, this is similar to the reduction we

obtained in the random topology with unequal throughputs and adaptive fugacities.

4.8 Practical Implementation of NB-CSMA

We now discuss the practical implementation of NB-CSMA. NB-CSMA is synchronized, in the sense that all nodes perform contention simultaneously. As we shall show, this completely eliminates problems such as the hidden-terminal and exposed-terminal problems. This is unlike the 802.11 protocol, which is asynchronous, but follows earlier practical implementations of throughput-optimal CSMA, such as [14], which are synchronous. A similar discussion was presented in [14] for the link-based Q-CSMA. We give the node-based counterpart of that implementation, satisfying the NB-CSMA requirements and procedures.

Similar to the WiFi RTS/CTS/Data/ACK scheme, our implementation of NB-

CSMA takes four steps:

1. POLLING: Sent by a potentially transmitting node, S, each time slot to explore

outgoing link states.

2. Clear To Update (CTU): Sent by a receiving node, R, as a response to the

POLLING signal by S, to indicate that a link (S,R) is available for update this

time slot.

3. Data: Transmission of the packet by transmitter node S to receiver node R.

4. ACK: Sent by receiver R to acknowledge successfully receiving a packet.

For the purposes of Forming Blocks (Section 4.4, Step 1), each node keeps a number of flag bits that are updated each time slot:


• AVAILABLE Tx, AVAILABLE Rx : A bit that indicates whether a node is

available or blocked as a transmitter or a receiver, respectively.

• SENSE Tx, SENSE Rx : A bit that indicates that a node has sensed an active

transmitter or receiver, respectively, in its neighborhood in the previous slot.

• ACTIVE Tx, ACTIVE Rx : A bit that indicates that a node was active as a

transmitter or receiver, respectively, in the previous slot.

Recall from Section 4.4 that each node $k$ has a set of outgoing links $\mathcal{K}_k$. Each time slot, node $k$ forms an update clique $C_k$ where schedule updates take place. Furthermore, we define $H_k$ as the set of nodes that can overhear a transmission from node $k$ and vice versa. We assume symmetry in hearing, i.e., if $k' \in H_k$ then $k \in H_{k'}$. Finally, each node keeps a set of potential receivers, $R_k$, where a link $(k, k')$ exists if $k' \in R_k$, and a set of potential transmitters, $T_k$, where a link $(k', k)$ exists if $k' \in T_k$.

At the beginning of each time slot, there is a contention period of length $W$ mini-slots, where the "Forming Blocks" step takes place. Each frame in the contention window consists of $V + 1$ sub-mini-slots. Nodes use the contention window to form the update cliques $C_k$ satisfying both conditions stated in Section 4.4, Step 1. We will shortly give the details of how that is achieved, and later show that this takes care of both the hidden- and exposed-terminal problems.

First, we give the details of how each node performs the "Updating Blocks" step of Section 4.4, Step 2. In order to perform that step, each node needs to know the state, in the previous time slot $t$, of the links neighboring each of its outgoing links. This is achieved as follows:

Updating Blocks: At time t, node k updates its flags as follows.

1. If one of the outgoing links of node $k$ (say $(k, k')$) was active as a transmitter in the previous time slot, then node $k$ sets ACTIVE Tx and ACTIVE Rx to 1 and 0, respectively. By the collision-free nature of the algorithm, no other transmitter or receiver neighbor can have been active in the previous time slot; thus, SENSE Tx and SENSE Rx are set to 0. Using this information, node $k$ knows that $s_{k,k'}(t) = 1$. Thus, by Algorithm 1, node $k$ is free to activate any of its outgoing links $(k, j)$ at time $t+1$, provided that (1) $(k, j) \in C_k(t+1)$, and (2) the CTU message from $k'$ confirms that $k'$ sensed no other active transmitter in the previous time slot.

2. If one of the incoming links of node $k$ (say $(k', k)$) was active at time $t$, then node $k$ sets ACTIVE Rx = 1 and all other flags to 0. At time slot $t+1$, node $k$ knows that none of its outgoing or incoming links can be activated, except for $(k', k)$. This informs node $k$ that $\sum_{z \in N(w)} s_z = 1,\ \forall w \in T_k$, and by lines 17-20 in Algorithm 1, none of the links in $T_k$ can be active at $t+1$ to avoid collisions. Thus, node $k$ sets all outgoing links to OFF at $t+1$.

3. If node $k$ overhears a Data packet at time slot $t$, it sets SENSE Tx to 1. At time $t+1$, node $k$ knows it cannot be active as a receiver (by lines 17-20 in Algorithm 1). Furthermore, node $k$ piggybacks its flags on the CTU messages in the next time slot to inform potential transmitters that they cannot transmit in the next slot, avoiding hidden-terminal collisions.

4. If node $k$ overhears an ACK packet at time slot $t$ intended for another node, it sets SENSE Rx to 1. At time $t+1$, node $k$ knows it cannot be active as a transmitter (by lines 17-20 in Algorithm 1). Thus, node $k$ sets all outgoing links to OFF at time $t+1$.

5. If node $k$ was neither transmitting nor receiving at time $t$, and does not sense a Data or an ACK packet, then node $k$ is free to be a transmitter or a receiver at time $t+1$ if one of its outgoing/incoming links $(k, k')$ is included in $C_k$ and is updated to ON. (A minimal code sketch of these flag updates is given after Fig. 4.9.)

Figure 4.9: The POLLING/CTU exchange in the contention period
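The flag logic of cases 1-5 can be summarized in a few lines of code; the following is a minimal sketch (ours; the names and structure are illustrative, not the dissertation's implementation), showing how a node might set its flags from what it did and overheard in the previous slot:

```python
from dataclasses import dataclass

@dataclass
class NodeFlags:
    active_tx: bool = False   # node transmitted on an outgoing link last slot
    active_rx: bool = False   # node received on an incoming link last slot
    sense_tx: bool = False    # overheard a Data packet last slot
    sense_rx: bool = False    # overheard an ACK meant for another node

def update_flags(was_tx, was_rx, heard_data, heard_foreign_ack):
    """Cases 1-5 of the Updating Blocks step, in order of precedence."""
    f = NodeFlags()
    if was_tx:
        f.active_tx = True        # case 1: may activate outgoing links at t+1,
                                  # subject to the CTU confirmation from k'
    elif was_rx:
        f.active_rx = True        # case 2: all outgoing links OFF at t+1
    else:
        f.sense_tx = heard_data           # case 3: cannot act as a receiver
        f.sense_rx = heard_foreign_ack    # case 4: cannot act as a transmitter
        # case 5: all flags clear -> free as transmitter or receiver at t+1
    return f

print(update_flags(False, False, True, False))   # a node that overheard Data
```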

We note that, as mentioned in [14], the state flags SENSE Tx, SENSE Rx, ACTIVE Tx, and ACTIVE Rx can be piggybacked on the CTU message to inform the transmitter of the state of its potential receiver. This eliminates any possibility of collision due to hidden terminals, since the transmitter $k$ knows whether any of the neighbors of its receiver $k'$ were active during the previous time slot (through the SENSE Tx bit of node $k'$, piggybacked on the CTU packet from $k'$ to $k$). Also, each node differentiates between being blocked as a transmitter and as a receiver, which eliminates the exposed-terminal problem, since a node blocked as a receiver can still be a transmitter, provided its receiver is not blocked.

Forming Blocks: At the beginning of each contention period, each node sets the bits AVAILABLE Tx and AVAILABLE Rx to 1. Each node randomly chooses an integer $T_{\mathrm{Backoff}}$ in $[0, W-1]$; this will be the back-off time of that node for the current contention period. Node $k$ then does the following:


1. At time $T_{\mathrm{Backoff}}$, if node $k$ has AVAILABLE Tx = 1, it chooses $\min(|\mathcal{K}_k|, V)$ outgoing links at random (recall that $\mathcal{K}_k$ is the set of outgoing links from node $k$, and $V+1$ is the number of contention sub-mini-slots in each contention mini-slot). Node $k$ then sends a POLLING message, directing those $\min(|\mathcal{K}_k|, V)$ chosen potential receivers to send a CTU message in a designated sub-mini-slot, as shown in Fig. 4.9. Node $k$ then listens for CTU messages from the potential receivers. If node $k$ hears a CTU message from $k'$, then $(k, k')$ will be included in the update block $C_k$. At the "Update Block" step, node $k$ can use the information piggybacked on the CTU message to infer the state of $k'$'s neighbors. Node $k$ then sets AVAILABLE Rx = 0, indicating that no update block can contain any incoming links to $k$.

2. At time $t < T_{\mathrm{Backoff}}$, if node $k$ hears a POLLING message directed to it (possibly among other potential receivers), node $k$ first checks the AVAILABLE Rx bit. If the AVAILABLE Rx bit is 1, node $k$ sets both AVAILABLE Tx and AVAILABLE Rx to 0 and then sends a CTU message in its designated sub-mini-slot, as instructed by the POLLING message.

3. At time $t < T_{\mathrm{Backoff}}$, if node $k$ hears a CTU message not directed to it, node $k$ knows it cannot include any outgoing links without risking collisions, and thus sets the AVAILABLE Tx bit to 0.

4. At time $t < T_{\mathrm{Backoff}}$, if node $k$ hears a POLLING message not directed to it, node $k$ sets AVAILABLE Rx = 0, indicating that none of the incoming links to node $k$ can be included in the update block $C_{k'}$ of any node.

5. At time $t < T_{\mathrm{Backoff}}$, if node $k$ hears a collision of POLLING messages from two transmitters, node $k$ sets AVAILABLE Rx = 0, indicating that none of the incoming links to node $k$ can be included in the update block $C_{k'}$ of any node. (A compact simulation sketch of this contention period is given below.)
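A compact simulation of one contention period, under heavy simplifications (a single collision domain, perfect sensing, no propagation delay), might look as follows; this is our sketch, not the dissertation's implementation, and the function and variable names are hypothetical:

```python
import random

def contention_period(nodes, W, V, out_links):
    """Simplified Forming Blocks step: each node draws a back-off in
    [0, W-1]; a node whose turn comes alone POLLs up to V outgoing links,
    and every node overhearing POLLING/CTU traffic blocks accordingly."""
    backoff = {k: random.randrange(W) for k in nodes}
    avail_tx = {k: True for k in nodes}      # AVAILABLE Tx
    avail_rx = {k: True for k in nodes}      # AVAILABLE Rx
    cliques = {}
    for t in sorted(set(backoff.values())):
        polling = [k for k in nodes if backoff[k] == t and avail_tx[k]]
        if not polling:
            continue
        if len(polling) > 1:                 # step 5: POLLING collision
            for k in nodes:
                if k not in polling:
                    avail_rx[k] = False
            continue
        s = polling[0]
        receivers = [r for r in out_links[s][:V] if avail_rx[r]]
        if receivers:
            cliques[s] = [(s, r) for r in receivers]    # update block C_s
            for k in nodes:                  # steps 2-4: everyone who overheard
                if k != s:                   # the POLLING/CTU blocks both ways
                    avail_tx[k] = False      # (single collision domain)
                    avail_rx[k] = False
        avail_rx[s] = False                  # step 1: no incoming links to s
    return cliques

links = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": ["C"]}
print(contention_period(list(links), W=8, V=2, out_links=links))
```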


Theorem 4.8.1. The update cliques formed using the above signaling protocol satisfy the conditions:

1. $C_k \subseteq \mathcal{K}_k$;

2. for any two nodes $k, l$ that update in the same slot, we have $(v, w) \notin E$, $\forall v \in C_k,\ w \in C_l$.

It is straightforward to see that the update cliques formed by the provided POLLING/CTU scheme are non-interfering. The key is noticing that the handshake is successful only if none of the links that interfere with $C_k$ are included in any update clique $C_{k'}$. Steps 2-5 indicate that a CTU message is sent as a response to a POLLING message only if (1) no other POLLING messages were heard in the previous mini-slots, and (2) the POLLING message was received successfully, with no collisions. This completely eliminates the hidden-terminal problem. Furthermore, differentiating between the transmitter and receiver availability of any node eliminates the exposed-terminal problem.

To summarize, our model assumes a discrete-time synchronized CSMA system, where all nodes have a common contention period. Nodes keep multiple flags to indicate availability as a transmitter or a receiver, as well as sensing results. Following a POLLING/CTU handshake, update cliques can be formed that satisfy the theoretical conditions needed to prove our results. Furthermore, the synchronization and the signaling scheme together eliminate the possibility of hidden or exposed terminals.

4.9 Conclusion

We have proposed a node-based CSMA algorithm that is throughput-optimal and outperforms link-based CSMA. The improvement margin depends on the network topology, but for practical ad-hoc networks, we expect the improvement to be significant. NB-CSMA is also fully distributed, in the sense that nodes make their decisions based solely on local information. Our mixing time analysis gives insight into how topology affects the low-delay fraction of the capacity region. In particular, we have shown that interferers that originate from the same transmitter do not contribute to the shrinkage of the fraction of the capacity region with low delays under NB-CSMA. Furthermore, our framework for deriving mean starvation times helps us understand how making node-based decisions affects link starvation. Our simulation results show a significant improvement in mean delay and mean starvation time over link-based CSMA. In future work, we can investigate the delay performance of NB-CSMA in multihop ad-hoc networks and refine our delay analysis to include dynamic update rules and more general topologies. We will also investigate how our mean starvation time framework could be extended to obtain bounds on the mean delay in both special collocated and more general networks.


CHAPTER 5

CELLULAR ACCESS OVER HETEROGENEOUS

WIRELESS INTERFACES

5.1 Introduction

In this chapter, we tackle the problem of allocating resources to different applications

with Quality of Service (QoS) requirements across different radio interfaces [82]. Mobile devices are often presented with two types of connectivity coming from their accessible Radio Access Technologies (RATs): a ubiquitous, expensive cellular RAT provided by the cellular operator, and an intermittent, cheap connection coming from alternative RATs such as WiFi, mmWave, and Dynamic Spectrum Access (DSA) in white spaces. The interesting question is: how can we best distribute mobile traffic over heterogeneous RATs, taking into account the availability and cost differences between interfaces? Furthermore, different applications have different connectivity requirements in terms of long-term throughput and short-term regularity. The heterogeneous cost and availability properties, along with the applications' differing connectivity requirements, introduce the need for schedulers that can balance applications' utilities along with RAT properties.

In general, two approaches have been employed to address this question. The first

approach is Intelligent Network Selection by choosing the suitable RAT for each

application. This approach assumes that the user is allowed to use only one RAT at


any given time. The current default policy employed by Android phones falls into this category: Android's default policy is to choose WiFi over LTE whenever possible, with the option of setting some applications to use only WiFi. This is known as delayed offloading, where delay-tolerant applications are only allowed to use WiFi. If WiFi is

not immediately available, then these applications will delay their transmissions up

to a deadline or until a WiFi connection is established. This policy was proposed and

analyzed in [83] and [84,85], respectively. However, this solution is more suitable for

3G deployments, where WiFi consistently offers significantly higher rates than the

cellular network. More recently, however, [86] has shown that this is not the case in

LTE deployments. In particular, it was shown that LTE outperforms WiFi in terms

of rate 40% of the time. One feature that has been used in the literature is the

predictability of wireless connectivity. In particular, it was shown in [87] that short-

term future wireless connectivity can be forecast accurately. Thus, a slightly delay-

tolerant application can delay a transmission if the connectivity forecast indicates

that preferable network conditions will be available in the near future. Using this

predictive ability, [88] modeled the problem as a Finite-Horizon Markov Decision

Process where the user follows a network-selection policy that minimizes the expected

energy when uploading a single file before a deadline. In [89], a Lyapunov drift-plus-

penalty approach was taken for network selection with the aim of minimizing power

subject to queue stability. In [90], the different applications’ QoS needs in terms of

throughput, delay and cellular cost were quantified as a utility function. Then, each

user solves an open-loop planning problem to choose the best transmission time for

each application according to its QoS and the availability of WiFi at any given time.

Aside from client-controlled solutions, [91,92] proposed network-controlled centralized

solutions to the problem, where a single entity has a perfect view of all RAT states, of

all users, and can assign users to RATs accordingly. The centralized control, however,

is not realistic in today's settings, as different networks are often controlled by different entities. This approach is somewhat similar to ours; however, as will be seen later, that work did not consider the possibility of operating multiple interfaces simultaneously. Furthermore, the timescale assumed there is on the order of hours, whereas we consider decisions made over a few seconds. Finally, that work considered deferring whole files to future time slots, whereas we consider elastic traffic sources with no pre-defined file size.

The second possible approach to the problem is simultaneously utilizing all RATs at any given time. This is sometimes referred to as multihoming. This approach is more flexible, as it gives the user the opportunity to utilize the entire bandwidth of all available RATs at any given time. Our problem formulation takes this approach. The current de-facto solution for utilizing different RATs is the MPTCP protocol [93]. However, MPTCP suffers from a variety of problems, such as high energy consumption [94] and over-utilization of the cellular link [95], which may cause an increased monetary cost to the user. Several other approaches have been proposed to exploit different RATs: [96] models scheduling over different RATs as a Mixed-Integer Linear Program and proposes a greedy heuristic to defer delay-tolerant flows to later times. In [97], the problem is considered with flow-interface assignment constraints; a minimum deficit round-robin policy is proposed and shown to be max-min fair. However, the result strongly depends on the policy being work-conserving, which might cause over-utilization of a metered cellular link. In [95], managing several RATs is studied for the case of transmission of video chunks. The problem is formulated as a 0-1 min-knapsack problem that takes into account the different bandwidths, usage costs, and deadlines of video chunks. A practical online heuristic is proposed, and good performance is established via simulations. In [98], the authors proposed an integrated transport layer that modifies SCTP to


allow the exploitation of heterogeneous RATs in vehicular networks. The paper uses a Network Utility Maximization formulation with link costs. However, the issues of per-application QoS differentiation, application-level fairness, and temporal variation in secondary capacity are not addressed, as implementation details are emphasized.

There are many challenges in solving the problem of rate allocation over heterogeneous RATs: (1) Cellular and secondary interfaces have different costs. While the cellular network is usually metered, secondary RATs such as public WiFi are often free to use. This may cause the optimal policy in terms of throughput, cost, and QoS to be non-work-conserving, which complicates the problem. (2) Secondary interfaces are inherently intermittent and unreliable: WiFi has limited coverage, and DSA is only allowed to access the spectrum in the absence of the spectrum owner. (3) Different applications have different requirements in terms of delay and throughput. Thus, a good rate-allocation policy has to incorporate individual application requirements when allocating rates. However, all these applications share the same RATs, which have limited capacity; thus, the allocations of all applications are coupled. Our contributions can be summarized as follows:

1. We apply the DRUM framework (proposed in [99] in the context of allocating rates to different cellular users to exploit temporal diversity) to the problem of application rate allocation over different RATs. We demonstrate that using the DRUM framework with a discount tied to application delay sensitivity results in a fair allocation with desirable characteristics in terms of balancing the per-application trade-off between throughput, delay, and cost.

2. We propose two online low-complexity algorithms that exploit limited look-ahead predictions of future connectivity.

3. We analyze one of these online algorithms and show that, in the presence of a prediction window of length $w$ time slots, under some mild conditions the online algorithm achieves a reward no less than $\left(1-\frac{c}{w+1}\right)\mathrm{Reward(OPT)}$, where $c$ is a constant and OPT is the prescient offline solution. Thus, the proposed algorithm is constant-competitive independently of the time horizon $T$, and approaches the optimal reward as the prediction window increases. Simulations show that, in practice, the proposed algorithms perform much better than this theoretical bound under all considered scenarios.

Our work relates to [89] in being predictive and QoS-aware; the difference is that [89] does not offer differentiated service to flows. The formulation in [98] relates to our formulation of utility with link costs. However, [98] does not differentiate between flows, nor does it consider the time variation of the secondary RAT; instead, it solves a static optimization problem every time slot. Perhaps the closest works to our problem are [90, 96], which considered all three factors of throughput, delay, and cost. [96] considered inelastic traffic that needs to be served over $T$ time slots, whereas [90] considered a mixture of inelastic and fixed-time elastic traffic. However, both of these papers assumed full knowledge of (or a good estimate of) all future connectivity, and used heuristics to find good solutions for the hard traffic-assignment problem.

5.2 System Model

We consider a mobile user running N applications, where each application creates a

flow i. At each time slot, the mobile user can use the cellular network, the secondary

network or both to transmit traffic belonging to flow i. We denote the rate received

by flow i over the cellular network at time t as yi[t], and the rate received by flow i

over the secondary network as xi[t].


Figure 5.1: System Model

5.2.1 Channel Model

We consider a smartphone with two interfaces, as shown in Fig. 5.1: a cellular interface and a secondary interface. The extension of the formulation and the online algorithms to the case of multiple secondary interfaces is straightforward. The secondary interface has a time-varying capacity $c[t]$ every time slot, capturing the effects of intermittence, unreliability, and possible user mobility. We do not make any statistical assumptions on $c[t]$. We assume the user can accurately predict the secondary capacity up to a future window of $w$ time slots. The predictability assumption has been used extensively for similar problems [83, 89, 100], and the feasibility of WiFi prediction was shown in [87, 101]. We assume that the cellular interface has a constant normalized capacity equal to 1 every time slot, i.e., we assume that the cellular operator offers a constant rate to the user throughout the time horizon. This captures the ubiquity of the cellular network, in contrast to the intermittence of the secondary network. Although the cellular network is affected by fading, we assume that the cellular operator can employ scheduling, resource-block allocation, MIMO, etc., to guarantee that the client gets a constant rate every time slot over the problem


time for the state of secondary interface connectivity to change. In our model, the

rates xi[t] and yi[t] take continuous non-negative values. Finally, we assume that all

queues carrying different flows are infinitely backlogged, i.e., we assume that the flows

are elastic.

5.2.2 Flow Utility

We use the Discounted Rate Utility framework introduced in [99] to capture the utility of flow $i$.

Definition 5.2.1. ($\beta$-Discounted Rate [99]): For a given $\beta \in [0, 1]$, we define the $\beta$-discounted rate of flow $i$ at time $t \ge 0$ as:
$$R_i^{(\beta_i)}[t] \triangleq \frac{\sum_{\tau=0}^{t}\beta^{t-\tau}\left(x_i[\tau]+y_i[\tau]\right)}{\sum_{\tau=0}^{t}\beta^{t-\tau}} \tag{5.2.1}$$

As an illustrative example, we write down the $\beta$-discounted rate for $\beta = 0$, $\beta \in (0, 1)$, and $\beta = 1$ as follows:
$$R_i^{(\beta_i)}[t] = \begin{cases} x_i[t]+y_i[t] & \text{if } \beta = 0,\\[6pt] \dfrac{\sum_{\tau=0}^{t}\beta^{t-\tau}(x_i[\tau]+y_i[\tau])}{\sum_{\tau=0}^{t}\beta^{t-\tau}} & \text{if } \beta \in (0,1),\\[6pt] \dfrac{1}{t}\sum_{\tau=0}^{t}(x_i[\tau]+y_i[\tau]) & \text{if } \beta = 1. \end{cases} \tag{5.2.2}$$

The $\beta$-discounted rate ties the utility of a flow to both its throughput and its average delay by putting a weight $\beta$ on the history of allocated rates. Closer inspection of (5.2.2) shows that when $\beta = 0$, the $\beta$-discounted rate equals the instantaneous rate, representing maximum delay sensitivity, as no weight is given to the rate-allocation history. When $\beta = 1$, the $\beta$-discounted rate is the time-average of the allocated rates since time 0, which is suitable for modeling a flow with no delay sensitivity. To summarize, an increase in $\beta$ models less delay sensitivity; thus, an application with a high value of $\beta$ can afford to wait for "favorable" transmission opportunities, whereas a lower $\beta$ represents a flow that emphasizes delay over possible cost. Finally, we model the cost of using the cellular network as a linear coefficient $p_c$: every flow $i$ has to pay a cost $p_c y_i[t]$ each time slot in order to transmit at rate $y_i[t]$ on the cellular network. This cost corresponds to a cellular operator that meters usage of the cellular network. Furthermore, the cellular cost acts as a factor discouraging delay-tolerant applications from using the cellular interface if they can afford to wait and transmit on the secondary interface. This encapsulates the idea of delayed offloading. However, while most existing literature on delayed offloading considers inelastic traffic that must be transmitted in full, our model considers elastic traffic that balances the trade-off between throughput and delay by using $\beta$ as a control knob.
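To make the role of $\beta$ concrete, the short sketch below (ours, not part of the original text) evaluates (5.2.1) on the same bursty rate sequence for the three regimes in (5.2.2):

```python
import numpy as np

def discounted_rate(rates, beta):
    """beta-discounted rate R^(beta)[t] of (5.2.1), for every t."""
    out = []
    for t in range(len(rates)):
        w = beta ** (t - np.arange(t + 1))   # weights beta^{t-tau}, tau=0..t
        out.append(np.dot(w, rates[: t + 1]) / w.sum())
    return np.array(out)

rates = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 3.0])   # bursty total rate x + y
for beta in (0.0, 0.9, 1.0):
    print(beta, np.round(discounted_rate(rates, beta), 3))
# beta = 0 tracks the instantaneous rate; beta = 1 gives the running
# time-average; intermediate beta interpolates, favoring recent slots.
```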

5.3 Problem Formulation

We formulate the Finite-Horizon Discounted-Rate Utility Maximization (DRUM) problem. For a horizon of $T$ slots and $N$ flows, each having its own discount factor $\beta_i$, we can write the problem as:

$$\text{P1:}\quad \max_{x[1],y[1],\ldots,x[T],y[T]}\ \sum_{t=1}^{T}\sum_{i=1}^{N} w_i U\!\left(R_i^{(\beta_i)}[t]\right) - p_c y_i[t] \tag{5.3.1}$$
subject to
$$\sum_{i=1}^{N} x_i[t] \le c[t], \quad t = 1,\ldots,T \tag{5.3.3}$$
$$\sum_{i=1}^{N} y_i[t] \le 1, \quad t = 1,\ldots,T \tag{5.3.4}$$
$$x_i[t],\ y_i[t] \ge 0, \quad i = 1,\ldots,N,\ \text{and}\ t = 1,\ldots,T, \tag{5.3.5}$$

where the bold notation $\mathbf{x}[t], \mathbf{y}[t]$ refers to the allocations of all flows, $(x_1[t], x_2[t], \ldots, x_N[t])$ and $(y_1[t], y_2[t], \ldots, y_N[t])$, at time $t$, respectively. Also, $w_i$ is a positive weight, and $U(\cdot)$ is a suitable concave non-decreasing utility function that aims to achieve fairness between different flows. Examples of utility functions that provide fairness are the $\alpha$-fairness functions of the form $U(r) = \frac{r^{1-\alpha}}{1-\alpha}$ introduced in [102]. The difference between this formulation and the standard Network Utility Maximization (NUM) framework is that the utility function is taken over a $\beta$-discounted rate that puts a weight $\beta$ on the history of allocated rates. Every flow $i$ is parameterized by the pair $(w_i, \beta_i)$, where a higher $w_i$ indicates a higher priority in rate allocation and a lower $\beta_i$ indicates a higher sensitivity to delay. The constraint (5.3.3) ensures that the sum of the rates allocated on the secondary interface does not exceed the instantaneous capacity $c[t]$. Similarly, the constraint (5.3.4) ensures that the sum of the rates allocated on the cellular interface does not exceed the constant normalized cellular capacity. Problem P1 is a standard constrained convex optimization problem with $2NT$ decision variables (rate per flow, per time slot, per interface) and $2T + 2NT$ constraints. Solving this problem requires non-causal knowledge of the secondary capacities $(c[1], c[2], \ldots, c[T])$. In the next section, we provide two predictive online solutions that depend on knowledge of the capacities only up to a window $w$, and that have theoretical bounds on worst-case performance as well as good practical performance.

5.4 Online Predictive Rate Allocation

5.4.1 Receding Horizon Control

RHC, also referred to in the control literature as Model Predictive Control (MPC) [103, 104], is a feedback control technique that provides an online solution to the original problem by approximating it as a sequence of open-loop optimization problems over the prediction horizon $[t, t+w]$. After solving the open-loop problem and obtaining the solution, the algorithm implements only the first step of the solution, i.e., $(\mathbf{x}[t], \mathbf{y}[t])$, updates the state, obtains the new prediction at time $t+w+1$, and repeats the procedure at time $t+1$. We now give a detailed description of the algorithm.

System State: Since the optimization is over the $\beta$-discounted rate, which is a function of the rates allocated in the past, the system has to keep a "memory" of past allocations. Since the equivalent rates in (5.2.2) are updated as a discounted sum, it is sufficient to save a vector $\mathbf{R}[t-1] = (R_1^{(\beta_1)}[t-1], R_2^{(\beta_2)}[t-1], \ldots, R_N^{(\beta_N)}[t-1])$ of the equivalent rates at time $t-1$. Define the control input $\theta_i[t] = (x_i[t], y_i[t])^{\mathsf T}$, where $(\cdot)^{\mathsf T}$ denotes the vector transpose. It can be shown from (5.2.2) that the $\beta$-discounted rate of flow $i$ is updated over time as follows:
$$R_i^{(\beta_i)}[t] = \beta_i R_i^{(\beta_i)}[t-1]\left(1-\frac{\beta_i^{t-1}(1-\beta_i)}{1-\beta_i^{t}}\right) + \frac{1-\beta_i}{1-\beta_i^{t}}\,\mathbf{1}^{\mathsf T}\theta_i[t] \approx \beta_i R_i^{(\beta_i)}[t-1] + (1-\beta_i)\,\mathbf{1}^{\mathsf T}\theta_i[t] \tag{5.4.1}$$

To obtain the online rate allocation, we first solve the following open-loop RHC optimization problem. Let $A = [1, 0]^{\mathsf T}$ and $B = [0, 1]^{\mathsf T}$. Also define $\Theta(\mathbf{R}[t-1]) = (\theta[t\,|\,\mathbf{R}[t-1]], \theta[t+1\,|\,\mathbf{R}[t-1]], \ldots, \theta[t+w\,|\,\mathbf{R}[t-1]])$ as the $2\times(w+1)$ vector that solves the following open-loop optimization problem:
$$\text{P2:}\quad \max_{\theta[t],\ldots,\theta[t+w]}\ \sum_{\tau=t}^{t+w}\sum_{i=1}^{N} w_i U\!\left(R_i^{(\beta_i)}[\tau]\right) - p_c B^{\mathsf T}\theta_i[\tau] \tag{5.4.2}$$
subject to
$$R_i^{(\beta_i)}[\tau] = \beta_i R_i^{(\beta_i)}[\tau-1] + (1-\beta_i)\,\mathbf{1}^{\mathsf T}\theta_i[\tau], \quad \tau = t,\ldots,t+w,\ \text{and}\ i = 1,\ldots,N \tag{5.4.3}$$
$$\sum_{i=1}^{N} A^{\mathsf T}\theta_i[\tau] \le c[\tau], \quad \tau = t,\ldots,t+w \tag{5.4.4}$$
$$\sum_{i=1}^{N} B^{\mathsf T}\theta_i[\tau] \le 1, \quad \tau = t,\ldots,t+w \tag{5.4.5}$$
$$\theta_i[\tau] \ge 0, \quad i = 1,\ldots,N,\ \text{and}\ \tau = t,\ldots,t+w \tag{5.4.6}$$

After solving the RHC optimization problem, the scheduler implements only the first step of the solution, i.e.,
$$\theta_{\mathrm{RHC}}[t] = \Theta(\mathbf{R}[t-1])[t] \tag{5.4.7}$$
The state $\mathbf{R}[t]$ is then updated according to (5.4.1), the new prediction $c[t+w+1]$ is obtained from the predictor, and the procedure is repeated to obtain the updated solution.
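A minimal sketch of one RHC step using cvxpy (our illustration, not the dissertation's code; $U(r) = \log(1+r)$ is chosen only because it is concave and DCP-representable, and all parameter values are arbitrary):

```python
import cvxpy as cp

def rhc_step(R_prev, betas, weights, c_window, p_c=0.75):
    """Solve the open-loop problem P2 over [t, t+w] and return only the
    first-slot allocation (x[t], y[t]), as RHC prescribes."""
    N, w1 = len(betas), len(c_window)          # w1 = w + 1 visible slots
    x = cp.Variable((N, w1), nonneg=True)      # secondary-interface rates
    y = cp.Variable((N, w1), nonneg=True)      # cellular-interface rates
    obj, R = 0, list(R_prev)
    for tau in range(w1):
        for i in range(N):
            # state update (5.4.3): R <- beta*R + (1 - beta)*(x + y)
            R[i] = betas[i] * R[i] + (1 - betas[i]) * (x[i, tau] + y[i, tau])
            obj += weights[i] * cp.log(1 + R[i]) - p_c * y[i, tau]
    cons = [cp.sum(x[:, tau]) <= c_window[tau] for tau in range(w1)]  # (5.4.4)
    cons += [cp.sum(y[:, tau]) <= 1 for tau in range(w1)]             # (5.4.5)
    cp.Problem(cp.Maximize(obj), cons).solve()
    return x.value[:, 0], y.value[:, 0]        # implement the first step only

# One step: a delay-sensitive flow (beta = 0) and a tolerant one (beta = 0.9).
x0, y0 = rhc_step(R_prev=[0.5, 0.5], betas=[0.0, 0.9],
                  weights=[1.0, 1.0], c_window=[2.0, 0.0, 0.0, 4.0])
print(x0, y0)
```

In a full run, the scheduler would apply this first-slot allocation, update the state via (5.4.1), shift the prediction window by one slot, and repeat.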

5.4.2 Average Fixed Horizon Control (AFHC)

The AFHC algorithm was proposed in [105] and analyzed in [106] for online convex optimization with switching costs (where the objective function is unknown in each time slot). AFHC is more amenable to theoretical analysis and sometimes outperforms RHC. Like RHC, AFHC approximates the offline problem with a series of open-loop optimization problems. Unlike RHC, however, AFHC does not implement only the first step of the solution and discard the rest. Instead, AFHC saves the solutions from all open-loop approximations and averages them. Thus, both algorithms have the same time complexity; however, AFHC needs $2N$ space whereas RHC only needs $N$ space. We next give the algorithm formally.

First, we define the Fixed Horizon Control parameterized by $k$, where $k = 0, 1, \ldots, w$. FHC$^{(k)}$ solves the problem only at the time slots $\Omega_k = \{z : z \equiv k \bmod (w+1)\}$, i.e., FHC$^{(k)}$ solves the problem P2 every $w+1$ slots, at times $k, k+(w+1), k+2(w+1), \ldots$, and implements its solutions for the entire horizon. Let $\theta^{(k)}[t]$ be the solution obtained by FHC$^{(k)}$ for time $t$; we have
$$[\theta^{(k)}[t], \theta^{(k)}[t+1], \ldots, \theta^{(k)}[t+w]] = \Theta(\mathbf{R}[t-1]), \quad \forall t \in \Omega_k \tag{5.4.8}$$
For example, FHC$^{(0)}$ implements the solutions obtained by solving the problem at $0, w+1, 2(w+1), \ldots$; FHC$^{(1)}$ implements the solutions obtained by solving the problem at $1, 1+(w+1), 1+2(w+1), \ldots$; and so on up to $k = w$. To complete the AFHC solution, at time slot $t \in \Omega_k$ the scheduler first solves FHC$^{(k)}$ to obtain the allocation $[\theta^{(k)}[t], \theta^{(k)}[t+1], \ldots, \theta^{(k)}[t+w]]$ and then sets
$$\theta_{i,\mathrm{AFHC}}[t] = \frac{\sum_{k=0}^{w}\theta_i^{(k)}[t]}{w+1}, \quad \forall i = 1,\ldots,N \tag{5.4.9}$$
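The AFHC bookkeeping in (5.4.8)-(5.4.9) reduces to interleaving $w+1$ FHC copies and averaging their plans; below is a minimal sketch (ours, with the open-loop solver stubbed out so the scheduling pattern stands alone):

```python
import numpy as np

def afhc(T, w, solve_open_loop):
    """FHC(k) re-plans at slots t in Omega_k = {t : t = k mod (w+1)} and
    commits a (w+1)-slot plan; AFHC plays the average of the w+1 copies."""
    plans = np.zeros((w + 1, T))                 # plans[k, t] = theta^(k)[t]
    for k in range(w + 1):
        for t in range(k, T, w + 1):             # t in Omega_k
            h = min(w + 1, T - t)                # truncate at the horizon
            plans[k, t:t + h] = solve_open_loop(t, h)
    return plans.mean(axis=0)                    # the average in (5.4.9)

# Stub solver: in practice this would solve P2 given the current state.
stub = lambda t, h: np.full(h, float(t % 3))
print(afhc(T=10, w=3, solve_open_loop=stub))
```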

5.5 Competitive Ratio of AFHC

A natural question is how well the proposed algorithm performs under different conditions: the number of flows, the $(w_i, \beta_i)$ of each flow, the length of the prediction window, etc. While deriving competitive ratios for general online convex optimization problems is known to be hard [106], under some mild conditions we are able to derive a lower bound on the competitive ratio. (The competitive ratio here is with respect to reward rather than cost, so we are looking for lower bounds on the ratio between the rewards achieved by the online and offline algorithms, respectively.)

Definition 5.5.1. (Competitive Ratio): An algorithm ALG is said to be $\gamma$-competitive if
$$\inf_{c[1],c[2],\ldots,c[T]} \frac{\mathrm{Reward(ALG)}}{\mathrm{Reward(OPT)}} \ge \gamma \tag{5.5.1}$$
where $\mathrm{Reward}(\cdot)$ computes the objective function according to (5.3.1), and OPT is the offline prescient solution of P1.

Note that the definition we use here is slightly different from the one used in most of the online algorithms literature. Conventionally, an algorithm is called $c$-competitive if $\mathrm{Reward(ALG)} \ge \frac{1}{c}\mathrm{Reward(OPT)}$. We choose to use $\gamma = \frac{1}{c}$ in Definition 5.5.1, instead of the conventional $c$-competitive notation, since it makes our results more intuitive and understandable.

Theorem 5.5.1. Given $N$ flows, all with weights $w_i = 1$ and $\beta_i \in [0, 1)$, under the following assumptions:

A1 $U(0) = 0$ and $U(r) - p_c r > 0$ for all $r > 0$.

A2 The (sub)-gradient of $U(\cdot)$ is uniformly bounded by $G$ over the feasible domain.

A3 $c[t] \le c_{\max},\ \forall t \in \{1, 2, \ldots, T\}$; we take the cellular capacity to be $y[t] = y_c,\ \forall t$ (instead of the normalized value 1 on the RHS of (5.3.4)) to derive a more general result.

then, under AFHC:
$$\text{Competitive Ratio} \ge 1 - \frac{1}{w+1}\,\frac{G\beta_{\max}}{D(1-\beta_{\max})} \tag{5.5.2}$$
where
$$D = \min\left(\frac{U(y_c)-p_c y_c}{y_c},\ \frac{U(y_c+c_{\max})-p_c y_c}{y_c+c_{\max}}\right) \tag{5.5.3}$$

Before proving the theorem, we discuss the implications of this result. First, it is clear that the bound improves with an increased prediction window $w$. This is expected, since better foresight enables the scheduler to make better instantaneous decisions. Second, the factor $\frac{1}{1-\beta_{\max}}$ implies that an increase in $\beta$ worsens the bound. This is also expected, since a delay-tolerant flow has more flexibility on when it should be allocated optimally; thus, a finite prediction window discards important information about future opportunities to defer delay-tolerant transmissions. The competitive ratio bound is valid for all sequences of secondary capacities $(c[1], c[2], \ldots, c[T])$, including cases where an adversary can observe the system state and controls and then generate future secondary capacities accordingly. The bound is expectedly loose for practical cases, and the empirical competitive ratio does not degrade sharply with $\beta$ as the bound suggests. In order to prove Theorem 5.5.1, three lemmas are needed. The proof generally follows the same approach as [105], by bounding the difference between the reward of OPT and the reward of FHC$^{(k)}$ over a short horizon, and then using Jensen's inequality to bound the competitive ratio. The differences between our proof and [105] are: (1) the formulation in [105] minimizes a convex function of the current control action plus a switching cost that penalizes the difference between the current and previous control actions, whereas in our formulation the reward is a function of a state that depends on both the control action and the previous state; thus, we need extra steps using first-order conditions to bound the difference between rewards (Lemma 5.5.2); (2) the way we define the competitive ratio in (5.5.1) requires finding a linear underestimator of the reward function (Lemma 5.5.1).

Lemma 5.5.1. Given a sequence of $\beta$-discounted rate vectors $(\mathbf{R}[1], \mathbf{R}[2], \ldots, \mathbf{R}[T])$, the total reward achieved by this sequence according to (5.3.1) satisfies the following linear bound:
$$\sum_{t=1}^{T}\sum_{i=1}^{N} U\!\left(R_i^{(\beta_i)}[t]\right) - p_c y_i[t] \ge \sum_{t=1}^{T}\sum_{i=1}^{N} D\,R_i^{(\beta_i)}[t] \tag{5.5.4}$$
where $D$ is given by (5.5.3).


Proof. See Appendix D.1

The next lemma provides a bound on the difference between the reward achieved by the offline solution and the reward achieved by the approximation P2. We first write the reward achieved in the interval $[t, t+w]$ with a control decision vector $(\theta[t], \theta[t+1], \ldots, \theta[t+w])$, as a function of the initial state at time $t-1$, as follows:
$$g(\mathbf{R}[t-1];\theta[t],\ldots,\theta[t+w]) = \sum_{\tau=t}^{t+w}\sum_{i=1}^{N} U\!\left(R_i^{(\beta_i)}[\tau]\right) - p_c B^{\mathsf T}\theta_i[\tau]$$
$$= \sum_{i=1}^{N}\sum_{\tau=t}^{t+w} U\!\left(\beta_i^{\tau-t+1}R_i^{(\beta_i)}[t-1] + \sum_{\eta=t}^{\tau}\beta_i^{\tau-\eta}(1-\beta_i)\,\mathbf{1}^{\mathsf T}\theta_i[\eta]\right) - p_c\sum_{i=1}^{N}\sum_{\tau=t}^{t+w} B^{\mathsf T}\theta_i[\tau]. \tag{5.5.5}$$

Lemma 5.5.2. Denote the offline solution (OPT) of problem P1 and the resulting states by $(\theta^*[1], \theta^*[2], \ldots, \theta^*[T])$ and $(\mathbf{R}^*[1], \mathbf{R}^*[2], \ldots, \mathbf{R}^*[T])$, respectively. Suppose we run the FHC$^{(k)}$ algorithm from time 0 up to time $t \in \Omega_k$. Let the system state at time $t-1$ be $\mathbf{R}^{(k)}[t-1]$, and denote the online solution at time $t$ of problem P2 given $\mathbf{R}^{(k)}[t-1]$ by $\Theta^{(k)}(\mathbf{R}^{(k)}[t-1]) = (\theta^{(k)}[t\,|\,\mathbf{R}^{(k)}[t-1]], \theta^{(k)}[t+1\,|\,\mathbf{R}^{(k)}[t-1]], \ldots, \theta^{(k)}[t+w\,|\,\mathbf{R}^{(k)}[t-1]])$. Under assumptions A1-A3 of Theorem 5.5.1, the following inequality holds:
$$g\!\left(\mathbf{R}^{(k)}[t-1];\Theta^{(k)}(\mathbf{R}^{(k)}[t-1])\right) \ge g\!\left(\mathbf{R}^*[t-1];\theta^*[t],\ldots,\theta^*[t+w]\right) - \sum_{i=1}^{N}\frac{G\beta_i}{1-\beta_i}\left(R_i^{(\beta_i)*}[t-1] - R_i^{(\beta_i)(k)}[t-1]\right) \tag{5.5.6}$$
where $G$ is the uniform bound on the (sub)-gradient of the function $U(\cdot)$ over the domain, as stated by A2 in Theorem 5.5.1.


Figure 5.2: Secondary Interface capacity process (a Markov chain over c[t] ∈ {0, 2, 4})

Proof. See Appendix D.2

The next lemma bounds the total reward achieved by the FHC$^{(k)}$ algorithm as a function of the total reward achieved by OPT. With some abuse of notation, we denote the rewards gained over the horizon by AFHC, FHC$^{(k)}$, and OPT as $g_{1:T}(\theta_{\mathrm{AFHC}})$, $g_{1:T}(\theta^{(k)})$, and $g_{1:T}(\theta^*)$, respectively.

Lemma 5.5.3. For any $k = 0, 1, \ldots, w$, the following holds:
$$g_{1:T}(\theta^{(k)}) \ge g_{1:T}(\theta^*) - \sum_{\tau\in\Omega_k}\sum_{i=1}^{N}\frac{G\beta_i}{1-\beta_i}\left(R_i^{(\beta_i)*}[\tau-1] - R_i^{(\beta_i)(k)}[\tau-1]\right) \tag{5.5.7}$$

The proof of this lemma is straightforward, by summing the expression in (5.5.6) over the set $\Omega_k$.

Proof of Theorem 5.5.1. See Appendix D.3

5.6 Numerical Results

To validate our formulation, we first assume that the secondary capacity generation

process follows the Markov chain in Fig. 5.2. We simulate the case when two flows

exist: a delay-sensitive flow with $\beta = 0$ and a delay-tolerant flow with $\beta = 0.9$.

Figure 5.3: Optimal Rate Allocations of heterogeneous flows, U(r) = log(1 + r), p_c = 0.75

Fig. 5.3 shows the allocation of OPT when the utility function is $U(r) = \log(1+r)$ and

the cellular price is $p_c = 0.75$. We note the following. The delay-tolerant flow's cellular usage is almost non-existent (less than 1% of its total flow rate), whereas the delay-sensitive flow uses the cellular interface whenever the secondary connectivity is weak or absent; in Fig. 5.3, the delay-sensitive flow transmits 15% of its traffic over the cellular interface. The delay-tolerant flow transmits in bursts whenever a secondary network is available. Furthermore, the delay-tolerant flow tends to increase its rate whenever it predicts a period with no secondary connectivity, at the expense of the delay-sensitive flow. The burstiness effect can be captured by noticing the following: the means of the total rate allocated to the delay-tolerant and the delay-sensitive flows are 1.01 and 0.87, respectively, whereas the variance of the total rate allocated to the delay-tolerant flow is 1.11, more than 4 times that of the delay-sensitive flow (0.27).


Figure 5.4: Competitive Ratios of RHC and AFHC compared to the function f(w) = 1 − 1/(w+1). U(r) = r^{1−α}/(1−α), p_c = 0.55, β ∈ {0, 0.7, 0.95}, α = 0.5

Figure 5.5: Competitive Ratios of RHC and AFHC as a function of β_max. U(r) = r^{1−α}/(1−α), p_c = 0.55, β ∈ {0, 0.3, β_max}, α = 0.5, w = 3


Figure 5.6: Reward obtained by OPT, RHC and AFHC as a function of β_max. U(r) = r^{1−α}/(1−α), p_c = 0.55, β ∈ {0, 0.3, β_max}, α = 0.5, w = 3

In Fig. 5.4, we simulate three flows with β values equal to 0, 0.7, and 0.95, representing different delay sensitivities, for a horizon T = 500. The secondary capacity evolves as the Markov chain shown in Fig. 5.2. We take the utility function to be the α-fairness function with α = 0.5, and we set $p_c = 0.55$. We plot the empirical competitive ratio as a function of w. We see that the RHC performance is slightly superior to AFHC for all values of the prediction window w. Naturally, the competitive ratio increases as w increases. Interestingly, the empirical rate of increase is very similar to the growth of the function $1 - \frac{1}{w+1}$, which suggests that our theoretical lower bound matches the empirical results order-wise up to a constant factor. Moreover, our findings confirm that the CR converges to 1 as w increases. It is worth noting that the α-fairness functions do not satisfy assumption A2 in Theorem 5.5.1, since the gradient is unbounded. However, this can be fixed by modifying the utility functions to be $U(r) = \frac{(\epsilon + r)^{1-\alpha} - \epsilon^{1-\alpha}}{1-\alpha}$. This causes the gradient to be bounded by $\epsilon^{-\alpha}$. A small ε approximates α-fairness functions efficiently, at the expense of a loose lower bound on the competitive ratio. Thus, to get a fairly tight bound, we have to either use a larger ε or refine the bound in Lemma 5.5.2 by using special properties of the utility function such as strong concavity.

In Fig. 5.5, we use the same setup as the previous case to simulate the empirical performance of three flows with parameters 0, 0.3, and $\beta_{max}$, where $\beta_{max}$ is varied between 0.4 and 0.99. We use a small prediction window with w = 3. Our lower bound suggested that there might be some performance degradation of online algorithms as flows become more delay-tolerant. However, while our lower bound suggests very fast degradation in performance (as $\frac{1}{1-\beta_{max}}$), simulations show that degradation happens at a much slower rate, and that a small window, w = 3, achieves over 85% of the utility of OPT, even as $\beta_{max}$ is increased to 0.95. In Fig. 5.6, we plot the objective function of OPT, AFHC, and RHC under the same setup. Fig. 5.6 shows that Reward(OPT) is guaranteed to be non-decreasing in $\beta_{max}$, since more delay tolerance enables flows to defer transmissions until favorable conditions appear, whereas Reward(AFHC) and Reward(RHC) are not guaranteed to increase with $\beta_{max}$.

In Fig. 5.7, we compare the lower bound derived from Theorem 5.5.1 to the empirical competitive ratio. For this figure, we assume that c[t] is an i.i.d. random variable uniformly distributed between 0 and 5. We use a modified α-fairness function as our utility function; in particular, we use $U(r) = \frac{(1+r)^{1-\alpha} - 1}{1-\alpha}$, which is concave increasing and satisfies assumptions A1-A3. The system for this figure has 3 flows with β = 0.2, 0.4, 0.6. We can see that the lower bound is loose. This is due to two reasons. First, the lower bound covers all possible sequences of secondary capacities, including adversarial cases, where an adversary can view the scheduler's decision and generate future capacities to minimize the scheduler's reward; in Fig. 5.7, the capacities are sampled as an i.i.d. sequence, which naturally performs much better than the adversarial worst case. Second, the approximation in Lemma 5.5.1 is very coarse since it must apply to all possible utility functions. We can see that up to w = 2, there is no theoretical guarantee on performance, whereas practically, AFHC achieves over 90% of the utility achieved by OPT. However, the bound gets tighter as the prediction window increases.

Figure 5.7: Comparison between theoretical lower bound and simulated competitive ratio, β = 0.2, 0.4, 0.6, $U(r) = \frac{(1+r)^{1-\alpha} - 1}{1-\alpha}$, α = 0.5.

5.7 Conclusion

In this chapter, we have studied the problem of application rate allocation over different radio interfaces. We have addressed the issue of different delay requirements of applications using the discounted-rate framework. We have proposed two online predictive algorithms to handle the intermittence of secondary interface(s). We have shown that for the AFHC algorithm, the competitive ratio is $1 - \Omega\!\left(\frac{1}{w+1}\right)$ when using a prediction window of length w. We have tested our algorithms for a number


of practical scenarios using different utility functions. The empirical performance of the proposed online algorithms is consistently near-optimal using small prediction windows. We intend to extend our work in two directions: 1. We plan to consider systems with arrivals, whereby flows of different classes arrive randomly. The flows are then served at a rate determined by the proposed algorithms and exit the system once they receive a total rate equal to their random size. The interesting questions include a characterization of the stable region of the proposed algorithms (in the sense of [107]), as well as the effect of $(w_i, \beta_i)$ on the mean flow response time. 2. We plan on testing our algorithms in realistic scenarios using real-world traces of user mobility and mobile flows that belong to real applications, which will enable us to compare the performance of the proposed algorithms to other solutions in the literature.


CHAPTER 6

CONCLUSION

In this dissertation, we have studied the design of algorithms to enable low latency

communication in wireless networks. We have identified the fundamental properties

that make low latency wireless communication a challenge, and how network algo-

rithms and protocols across the stack could be designed to enable it. Furthermore,

we have shown how a wide set of theoretical tools from optimization and queuing theory can be applied to analyze network performance from a delay standpoint.

Our research results enable a deeper understanding of how reliable high-throughput

low-latency wireless communication needed to support many 5G applications can be

realized using delay-oriented algorithm design across the network stack.

First, we looked at the fundamental problem of cellular downlink resource allo-

cation for real-time traffic. Specifically, we were interested in bandwidth-intensive

latency-critical traffic with hard deadlines such as Virtual/Augmented Reality and

Cloud Gaming. We were able to show that deadline-oblivious resource allocation

schemes can be very efficient by approaching the problem as an online convex op-

timization problem and utilizing primal-dual analysis tools from online algorithms.


Furthermore, we developed a modified version of our deadline-oblivious resource al-

location algorithm to account for long-term fairness and were able to show via com-

bining primal-dual analysis with Lyapunov stochastic control tools that deadline-

oblivious schedulers can be competitive while satisfying long-term performance guar-

antees. This result is important as it shows that efficient real-time schedulers need

not know the individual jobs' deadlines, which drastically reduces their complexity compared to schedulers that track individual jobs' deadlines. We expect this result to guide MAC-layer protocol design for 5G bandwidth-intensive traffic, as we have outlined theoretical guidelines for designing low-complexity schedulers along with methods to assess their performance.

Second, we studied the problem of predictive caching in wireless networks. As

storage and networking become increasingly intertwined in datacenters and content

delivery networks, there is a similar trend in wireless networks of relying on edge

caches for content dissemination to mobile users. We have proposed pushing the

problem of edge caching one step further by caching popular content at end users’

mobile devices, utilizing the wireless channel broadcast property. To model the small

end users’ caches, we have the developed the notion of “diminishing caches” to ana-

lyze the system in heavy traffic as cache size approaches zero in the limit. We have

studied the delay effect of this wireless “predictive caching” system and have shown

that intelligent predictive caching fundamentally alters the delay-throughput scaling

in the heavy traffic translating to several fold practical savings in delay as the wire-

less link approaches full-utilization. To this end, we applied the heavy-traffic tools

from queuing theory to characterize the fundamental delay-memory trade-off in the

system highlighting the critical memory dimensioning with throughput increase to

realize multicasting gains in delay savings. This result motivates the practical im-

plementation of predictive caching which can utilize existing LTE/5G multicasting


protocols to reduce delay, and further lays a theoretical foundation for analyzing the

interplay between delay and storage as caching becomes an integral part of emerging

wireless systems.

Third, we addressed the classical open problem of distributed wireless scheduling, building on recent developments [13], [14] which have shown that distributed CSMA algorithms can be made throughput-optimal but suffer from poor delay performance due to the CSMA starvation problem. We proposed a Node-Based CSMA (NB-CSMA) algorithm that exploits inter-dependencies between links and the presence of hotspots in large wireless networks. The underlying algorithm design builds on ideas from statistical physics, namely Glauber dynamics with "Block Updates", which enabled us to preserve the throughput optimality of the algorithm. We have shown that

NB-CSMA has superior delay performance for fixed networks by analyzing the second-order properties of the underlying scheduling Markov chains. We have also shown that NB-CSMA further expands the polynomial delay fraction (the capacity-region fraction where delay scales polynomially with the size of the network) of the comparable link-based CSMA algorithms in the literature. We also studied the special case of collocated networks and presented a new, simple method to obtain the expected per-link starvation time. We have utilized that to further highlight the benefits of NB-CSMA and the effects of network topology. To empirically assess NB-CSMA, we built a simulator that can evaluate the performance of any scheduling algorithm for any network topology. We have shown that NB-CSMA provides around 50% delay

reduction consistently over comparable link-based algorithms in the literature. This

work can guide the design of layer 2 protocols in large wireless ad-hoc networks,

and illustrates the benefits of designing node-based layer 2 scheduling protocols while

maintaining the fully distributed properties of CSMA.


Finally, we studied the problem of managing different cellular interfaces in a mo-

bile device. This problem is especially relevant as Heterogeneous Networks (HetNets)

are becoming an integral part of the 5G architecture. As mobile devices add more Ra-

dio Access Technology (RAT) capabilities, how to manage interfaces becomes critical

to realizing the full potential of the emerging technologies such as mmWave. In mobile

devices, there are usually two connection modes: a ubiquitous "expensive" cellular connection (in terms of congestion to the cellular network, cost, or energy), and an intermittent cheap connection such as WiFi, Dynamic Spectrum Access, and mmWave. A scheduler allocates those two types of resources to applications with different QoS requirements. Moreover, some works have shown that the wireless link quality can be reliably predicted a few minutes ahead from mobility information [87]. We modelled the

problem of allocating wireless resources over different RATs to applications as an on-

line convex optimization problem. We leveraged the prediction capability to develop

two low complexity allocation algorithms to proactively allocate wireless resources

to applications. We have shown that our allocation schemes appropriately serve applications according to their QoS preference, maintaining a consistent throughput for delay-stringent applications while taking advantage of cheap connections for delay-tolerant ones by transmitting in bursts. We also analyzed the effect of prediction quality on allocation efficiency, quantified by the competitive ratio, showing that the competitive ratio gap shrinks as $O\!\left(\frac{1}{w}\right)$ with the prediction window length w. The implications of this work are two-fold: first, we give a simple client-based predictive rate allocation scheme for different applications over radio interfaces; second, we quantify the benefit of link prediction using online convex optimization tools. We expect this to aid in designing and assessing the benefits of predictive resource allocation schemes in other wireless networking domains.

The investigations in this dissertation show how Online Convex Optimization


could be used to assess wireless resource allocation algorithms. Traditionally, convex

optimization has been used for resource allocation in networks [108], [109], [110] to

optimize long-term averages. However, traditional convex optimization techniques

are not suited to real-time traffic with deadlines as in Chapter 2 or non-stochastic

models such as the mobility model in Chapter 5. This dissertation demonstrates how

these cases can be approached using an Online Convex Optimization framework. Fur-

thermore, it is shown how Primal-Dual and Predictive algorithms can be designed to

obtain performance that is constant-competitive compared to the offline optimal.

We also show in Chapter 2 how to combine Online Convex Optimization tools with

traditional Stochastic Control tools to obtain powerful algorithms for real-time traffic

satisfying long-term constraints.

We have also provided a novel method to analyze caching from a network delay

standpoint. Caching networks are notoriously hard to analyze due to non-linearities

in the system and the curse of dimensionality. Thus, the effect of caching on network

delay has been mostly empirically studied without proper theoretical understanding.

To this end, we have introduced a novel application of the Heavy-Traffic framework, used to analyze delay scaling as the network approaches full load, that is suited to analyzing caching systems. We have introduced the "zero caches" notion to

capture the practical limitations of end-user cache sizes. We have leveraged those

two developments to present a strong delay scaling result for caching systems. Thus,

this investigation reveals the applicability of Heavy-traffic tools for analyzing delay

aspects of caching systems, which could be leveraged to understand more caching

systems from a theoretical delay standpoint.

Regarding the main focus in this dissertation, there are still many interesting

open problems to fully realize reliable low-latency wireless communication in the next

generation of networks. One line of work that could be extended in this dissertation is


a better understanding of the interplay of delay and caching in wireless networks. As

networking and storage become more coupled at the edge, new frameworks are needed

to approach networking and caching jointly as one problem at the wireless edge.

For instance, cache replacement policies are well understood from a miss-probability standpoint, but combining the delivery costs of predictive or opportunistic caching with networking metrics such as throughput and delay opens up new space to better understand cache policies from a networking standpoint. Furthermore, building on our result that multicasting and caching can still reduce delay opens the door to designing complex multicast algorithms that target users with personalized content, instead of broadcasting globally popular content to every user in the cell.

Thus, combining recommender systems, caching, and multicasting could aid us in

developing new network/link layers that intelligently disseminate content in novel

ways that drastically reduce delay. There is a need for such a holistic approach in

both 5G and Information Centric Networks.

Another interesting research direction is the control of real-time traffic in cellular

downlinks. We have shown that deadline-oblivious algorithms can indeed be very

efficient in the “best effort” model, where a concave reward is obtained according

to the amount of traffic served from every flow. While this model is accurate in

applications such as Virtual Reality and Cloud gaming, a 5G downlink has to allocate

resources to other applications that do not necessarily fit this model; for example, some applications are only rewarded if they are fully served, so their reward function is a step function rather than a smooth concave function. Modifications would be needed to accommodate that type of traffic. It is also worth investigating whether preemption might be the right solution for this type of traffic. Other real-time cellular traffic includes Ultra-Reliable Low-Latency Communication (URLLC) traffic that requires deterministic delay along with very high reliability. In that case, the scheduler needs


to take into account coding and retransmissions along with very stringent reliability

constraints. This also implies that the functionality of the scheduler needs to be

significantly expanded and new algorithms need to be developed to accommodate

that large variety of low-latency communication.

Finally, low-latency communication would further require developments across the entire network stack, starting from new physical-layer technologies such as mmWave and massive MIMO, to radio-layer developments such as the HetNets addressed in Chapter 5, to upper-layer developments such as the caching capabilities addressed in Chapter 3. Novel holistic approaches are still needed to fully integrate the benefits

offered by the developments across each layer to further enhance wireless networks

capability in supporting low latency communication.


APPENDIX A: PROOFS FOR CHAPTER 2

A.1 Proof of Lemma 2.4.2

We prove the Lemma by induction. The base case is $t = 0$, where $\beta_j^t \geq 0$ is trivially satisfied. Suppose the claim is true for $t-1$; then, substituting in the update equation in Algorithm 1, line 5, we obtain the following:
\[
\beta_j^t = \frac{\partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)}{\partial f\left(\sum_{s=1}^{t-1} A_{sj}x_{sj}\right)}\left(1 + \frac{A_{tj}x_{tj}}{Y_j}\right)\beta_j^{t-1} + \frac{\partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)A_{tj}x_{tj}}{(C-1)Y_j} \tag{A.1.1}
\]
\[
\overset{(a)}{\geq} \frac{\partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)}{C-1}\left(C^{\frac{\sum_{s=0}^{t-1} A_{sj}x_{sj}}{Y_j}}\left(1 + \frac{A_{tj}x_{tj}}{Y_j}\right) - 1\right) \tag{A.1.2}
\]
\[
\overset{(b)}{\geq} \frac{\partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)}{C-1}\left(C^{\frac{\sum_{s=0}^{t} A_{sj}x_{sj}}{Y_j}} - 1\right), \tag{A.1.3}
\]
where (a) is from the induction hypothesis and (b) follows from the inequality $\frac{\log(1+y)}{y} \leq \frac{\log(1+x)}{x}$ when $y \geq x$, where we have chosen $F_{max} \geq \frac{A_{tj}x_{tj}}{Y_j}$, $\forall j, \forall t$.

A.2 Proof of Lemma 2.4.3

(2.4.2) is straightforward, since by line 4 in the algorithm, $\alpha \geq 0$. (2.4.3) can be shown by noticing that for any job $j$, $\beta_j^t$ is a non-decreasing geometric series that starts from 0; thus, $\beta_j^t \geq 0$, $\forall j, \forall t$. (2.4.4) is also guaranteed by the choice of $x_t$ in line 4 of the algorithm. (2.4.5) is a consequence of Lemma 2.4.1 and Lemma 2.4.2. Given that a job is completely served, i.e., $\sum_{s=1}^{t} A_{sj}x_{sj} \geq Y_j$, Lemma 2.4.2 guarantees that its dual variable satisfies $\beta_j^t \geq \partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)$. Lemma 2.4.1 tells us that $\alpha_j^t = \partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right) \leq \beta_j^t$. Since DO tries to maximize the inner product $\left\langle \alpha - \beta, \sum_{s=1}^{t-1} A_s x_s + A_t x \right\rangle$, having $\alpha_j^t \leq \beta_j^t$ implies that $x_{tj} = 0$ is optimal. It follows that once a job is completely served, no resources are allocated to that job from then on. There can be only one iteration in which a job is served beyond its size; bounding that excess of resources by $F_{max}Y_j$ concludes the Lemma.

A.3 Proof of Lemma 2.4.5

For any active job $j$, we can bound each element of the inner product on the LHS as follows:
\[
(\beta_j^t - \beta_j^{t-1})Y_j \overset{(a)}{\leq} \beta_j^{t-1}Y_j\left(\frac{\partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)}{\partial f\left(\sum_{s=1}^{t-1} A_{sj}x_{sj}\right)}\left(1 + \frac{A_{tj}x_{tj}}{Y_j}\right) - 1\right) + \frac{\partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)A_{tj}x_{tj}}{C-1} \tag{A.3.1}
\]
\[
\overset{(b)}{\leq} \partial f\left(\sum_{s=1}^{t} A_{sj}x_{sj}\right)A_{tj}x_{tj}\left(\frac{\beta_j^{t-1}}{\partial f\left(\sum_{s=1}^{t-1} A_{sj}x_{sj}\right)} + \frac{1}{C-1}\right) \tag{A.3.2}
\]
\[
\overset{(c)}{\leq} 4P_j\left(1 + \frac{1}{C-1}\right) \tag{A.3.3}
\]
Here (a) is due to the update equation of $\beta$. (b) is obtained by noticing that $\partial f\left(\sum_{s=1}^{t} A_s x_s\right) \leq \partial f\left(\sum_{s=1}^{t-1} A_s x_s\right)$ by concavity. (c) is because $\beta_j^{t-1} \leq \partial f\left(\sum_{s=1}^{t-1} A_s x_s\right)$ if $x_{tj} > 0$ (since this implies that $\alpha_j^{t-1} > \beta_j^{t-1}$).


A.4 Proof of Theorem 2.4.1

The first two terms in (2.4.6) can be bounded as follows:
\[
D' = \sum_{t=1}^{T} \sigma_t\left(A_t^T\alpha_T - \beta_T\right) + \beta_T^T Y \tag{A.4.1}
\]
\[
= \sum_{t=1}^{T} \left\langle \alpha_T - \beta_T, \sum_{s=1}^{t} A_s x_s \right\rangle + \beta_T^T Y \tag{A.4.2}
\]
\[
\overset{(a)}{\leq} \sum_{t=1}^{T} \left\langle \alpha_T, \sum_{s=1}^{t} A_s x_s \right\rangle + \beta_T^T Y \tag{A.4.3}
\]
\[
\overset{(b)}{\leq} \sum_{t=1}^{T} \left\langle \alpha_t, \sum_{s=1}^{t} A_s x_s \right\rangle + \beta_T^T Y \tag{A.4.4}
\]
\[
\overset{(c)}{=} \sum_{t=1}^{T} \left\langle \alpha_t, \sum_{s=1}^{t} A_s x_s \right\rangle + \sum_{t=1}^{T}(\beta_t - \beta_{t-1})^T Y \tag{A.4.5}
\]
\[
\overset{(d)}{\leq} 4P\left(2 + \frac{1}{C-1}\right) \tag{A.4.6}
\]
where (a) is because $\beta_T \geq 0$, so dropping the term $\left\langle -\beta_T, \sum_{s=1}^{t} A_s x_s \right\rangle$ can only increase the objective; (b) is because $\alpha_t \geq \alpha_T$, by Lemma 2.4.1 and the concavity of the function (thus, decreasing gradients); (c) is true due to telescoping and the fact that $\beta_0 = 0$; and (d) holds by substituting the bounds from Lemmas 2.4.4 and 2.4.5. Finally, by (2.4.6), we have $D = D' - \sum_{j=1}^{J} f_j^*(\alpha_{Tj})$. We can bound the extra $-\sum_{j=1}^{J} f_j^*(\alpha_{Tj})$ term on the RHS by $P$ utilizing Lemma 2.4.6. Adding that bound to the bound on $D'$ concludes the proof.

A.5 Proof of Theorem 2.5.2

Let the reward achieved and the amount of traffic served by the D Look-ahead algorithm be denoted by $P'[k]$ and $b'[k]$ as in (2.5.7) and (2.5.8), respectively. Similarly, OPT achieves $P^*[k]$ and $b^*[k]$. By the maximization in (2.5.6a), we have:
\[
\sum_{n=1}^{N} Q_n[k]\left(\delta_n - b'_n[k]\right) - VP'[k] \leq \sum_{n=1}^{N} Q_n[k]\left(\delta_n - b^*_n[k]\right) - VP^*[k] \tag{A.5.1}
\]
for any frame instance; thus, the inequality holds for the conditional expectation given that the queue lengths equal $Q$. Noting the definition of the drift in (2.5.9), we can bound $E(\Delta\Theta'(Q)) - VE(P'[k]\,|\,Q)$ as follows:
\[
\leq E(\Delta\Theta^*(Q)) - VE(P^*[k]\,|\,Q) \tag{A.5.2}
\]
\[
\overset{(a)}{\leq} B + \sum_{n=1}^{N} Q_n[k]\,E\left(\delta_n - b^*_n[k]\right) - VE(P^*[k]\,|\,Q) \tag{A.5.3}
\]
\[
\overset{(b)}{\leq} B - VE(P^*[k]\,|\,Q) \tag{A.5.4}
\]
where (a) is by the bound in (2.5.9), and (b) is because the optimal stationary solution satisfies the constraint in expectation independently of $Q$. Taking the expectation over $Q$ and the time average over all frames, we can use telescoping sums to arrive at the key equation
\[
\frac{L(Q[K]) - L(Q[0])}{K} - \frac{V}{K}\sum_{k=1}^{K} E(P'[k]) \leq B - VP^* \tag{A.5.5}
\]
Noting that $L(Q[K])$ is non-negative, initializing $L(Q[0])$ to 0, and rearranging the sum gives (2.5.10). To prove (2.5.11), we can follow the same steps by comparing the solution produced by the D Look-ahead algorithm to another solution that strictly satisfies the constraint (2.5.1b), i.e., $E(\delta_n - b_n[k]) < -\epsilon, \forall n$, for some $\epsilon > 0$. This solution is guaranteed to exist by the assumption in the Theorem statement. We denote the reward of that solution as $P^{(\epsilon)}$. Repeating the same steps up to (A.5.3), we get the following inequality:
\[
E(\Delta\Theta'(Q[k])) - VE(P'[k]) \leq B - \epsilon\sum_{n=1}^{N} E(Q_n[k]) - VE(P^{(\epsilon)}[k]\,|\,Q) \tag{A.5.6}
\]
Similar to the last part, we can take the time average over frames and telescope to get
\[
\frac{1}{K}\sum_{k=1}^{K}\sum_{n=1}^{N} E(Q_n[k]) \leq \frac{B + V\left(E(P^{(\epsilon)}) - E(P')\right) + E(L(Q[0]))}{\epsilon}
\]
Bounding $P^{(\epsilon)}$ by $Mf_{max}(Y_{max})$, the maximum achievable reward over the frame, and taking the limit gives (2.5.11).
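To see the V-tradeoff behind (2.5.10) and (2.5.11) concretely, the following toy drift-plus-penalty simulation may help. It is an illustrative sketch under assumed dynamics (a single virtual queue, a hypothetical concave reward P(b) = log(1 + b) - 0.8b, and a long-term constraint E(b) >= delta), not the dissertation's scheduler: the time-average reward approaches the constrained optimum as V grows, while the virtual-queue backlog grows on the order of V.

import numpy as np

# Toy drift-plus-penalty: each frame, pick b in [0,1] maximizing
# V*P(b) - Q*(delta - b); the queue Q penalizes constraint violation.
delta = 0.6
grid = np.linspace(0.0, 1.0, 101)

def reward(b):
    return np.log1p(b) - 0.8 * b        # unconstrained optimum is b = 0.25 < delta

def run(V, K=20000):
    Q, total = 0.0, 0.0
    for _ in range(K):
        b = grid[np.argmax(V * reward(grid) + Q * grid)]
        total += float(reward(b))
        Q = max(Q + delta - b, 0.0)      # virtual-queue update
    return total / K, Q                  # avg reward, final backlog

for V in (1, 10, 100):
    print(V, run(V))                     # reward improves, backlog grows with V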


APPENDIX B: PROOFS FOR CHAPTER 3

B.1 Proof of Lemma 3.3.1

To prove a lower bound, we follow the resource-pooling approach in [47], by first defining the generic queuing system shown in Fig. B.1 that evolves as follows:
\[
\phi[t+1] = (\phi[t] + \alpha[t] - \beta[t])^+ \tag{B.1.1}
\]
\[
= \phi[t] + \alpha[t] - \beta[t] + \chi[t] \tag{B.1.2}
\]
where $\chi[t]$ is the unused work, equal to $\max(0, \beta[t] - \alpha[t] - \phi[t])$. We assume that both distributions have finite support, i.e., there exist $\alpha_{max}$ and $\beta_{max}$ such that $\alpha[t] \leq \alpha_{max}$ and $\beta[t] \leq \beta_{max}$ for all $t$ almost surely. The means and variances of the arrival and service rates, respectively, are $\mu_\alpha, \sigma_\alpha^2$ and $\mu_\beta, \sigma_\beta^2$. It was shown in [47] that, in steady state, the expected queue length can be lower bounded as follows:
\[
E[\phi] \geq \frac{\zeta(\epsilon)}{2\epsilon} - B_1 \tag{B.1.3}
\]

Figure B.1: Generic Resource Pooling queue (arrivals $\alpha[t]$, backlog $\phi[t]$, service $\beta[t]$).


where $\zeta(\epsilon) = \sigma_\alpha^2 + \mu_\beta^2 + \epsilon^2$, and $B_1 = \frac{\beta_{max}}{2}$. A resource-pooling lower bound for the on-demand system can be obtained from the capacity region face, $\mathcal{F}$, by taking the arrival to be $\alpha[t] = \langle \mathbf{c}, A[t]\rangle$, where $\mathbf{c} = \left[\frac{1}{\sqrt{N}}, \frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right]$, and $\beta[t] = \frac{1}{\sqrt{N}}$. We have used the assumption that the users' arrivals are homogeneous, as well as the assumption of a collision channel. Applying the resource-pooling lower bound to the scheduling problem in [47], we obtain the following bound on the steady-state queue lengths:
\[
E\left[\langle \mathbf{c}, \mathbf{Q}^{(\epsilon)}\rangle\right] \geq \frac{(\sigma^{(\epsilon)})^2 + \epsilon^2}{\frac{N\epsilon}{\sqrt{N}}} - \frac{1}{2\sqrt{N}} \tag{B.1.4}
\]
Multiplying the LHS and RHS by $\sqrt{N}$ gives (3.3.2). Taking $\epsilon \to 0$ gives (3.3.3).
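The generic queue in (B.1.1) is also easy to simulate, and doing so illustrates the $\Theta(1/\epsilon)$ heavy-traffic scaling behind (B.1.3). The sketch below uses illustrative Bernoulli arrival and service distributions (an assumption made only for this example).

import numpy as np

# Simulate the Lindley recursion (B.1.1): phi[t+1] = (phi[t]+alpha[t]-beta[t])^+.
# The mean backlog grows on the order of 1/eps as the drift eps -> 0.
rng = np.random.default_rng(1)

def mean_backlog(eps, T=200_000):
    beta = rng.binomial(1, 0.5, size=T)            # service, mu_beta = 0.5
    alpha = rng.binomial(1, 0.5 * (1 - eps), T)    # arrivals, mu_alpha = mu_beta*(1-eps)
    phi = np.empty(T)
    q = 0.0
    for t in range(T):
        q = max(q + alpha[t] - beta[t], 0.0)
        phi[t] = q
    return phi[T // 2:].mean()                     # discard the transient

for eps in (0.2, 0.1, 0.05):
    print(eps, mean_backlog(eps))                  # roughly doubles as eps halves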

B.2 Proof of Proposition 3.5.1

We begin the proof by analyzing the Lyapunov drift of the function $W(\mathbf{Q})$. For convenience, we drop the $\epsilon$ superscript notation and the time index $t$.
\[
E[\Delta W(\mathbf{Q})] = E\left[\|\mathbf{Q}[t+1]\|^2 - \|\mathbf{Q}[t]\|^2 \,\big|\, \mathbf{Q}\right] \tag{B.2.1}
\]
\[
= E\left[\|\mathbf{Q} + \mathbf{A} + \mathbf{B} - \mathbf{S}\|^2 + 2\langle \mathbf{Q} + \mathbf{A} + \mathbf{B} - \mathbf{S}, \mathbf{U}\rangle + \|\mathbf{U}\|^2 - \|\mathbf{Q}\|^2 \,\big|\, \mathbf{Q}\right]
\]
\[
\overset{(a)}{\leq} E\left[\|\mathbf{Q} + \mathbf{A} + \mathbf{B} - \mathbf{S}\|^2 - \|\mathbf{Q}\|^2 \,\big|\, \mathbf{Q}\right] \overset{(b)}{\leq} 2E[\langle \mathbf{Q}, \mathbf{A} + \mathbf{B} - \mathbf{S}\rangle \,|\, \mathbf{Q}] + 2N \tag{B.2.2}
\]
where (a) follows from $(Q_i + A_i + B_i - S_i)U_i \leq 0$, and (b) is since $A_i$, $B_i$, and $S_i$ are all bounded by 1. We proceed to bound the first term on the RHS of (B.2.2) by defining a hypothetical arrival rate $\lambda_B = \frac{(1-\theta)(N-1)}{N}\mathbf{1}$. Denote the expectation of the service rate as $E[S_i[t]] = \mu$. The term can then be bounded as follows:
\[
E[\langle \mathbf{Q}, \mathbf{A} + \mathbf{B} - \mathbf{S}\rangle \,|\, \mathbf{Q}] = E[\langle \mathbf{Q}, \mathbf{B} - \lambda_B\rangle \,|\, \mathbf{Q}] + E[\langle \mathbf{Q}, \lambda_B + \mathbf{A} - \mathbf{S}\rangle \,|\, \mathbf{Q}]
\]
\[
= \langle \mathbf{Q}, E[\mathbf{B}\,|\,\mathbf{Q}]\rangle - \langle \mathbf{Q}, \lambda_B\rangle + \langle \mathbf{Q}, \lambda_A^* + \lambda_B - \mu\rangle
\]
\[
\overset{(a)}{\leq} (1-\theta)\left(\sum_{i=1}^{N} Q_i - Q_{max}\right) - \sum_{i=1}^{N}\frac{(1-\theta)(N-1)Q_i}{N} - \frac{\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\|
\]
\[
= -\frac{(1-\theta)}{N}\sum_{i=1}^{N}\left(Q_{max} - Q_i\right) - \frac{\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\|
\]
\[
= -\frac{(1-\theta)}{N}\left\|Q_{max}\mathbf{1} - \mathbf{Q}\right\|_1 - \frac{\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\|
\]
\[
\overset{(b)}{\leq} -\frac{(1-\theta)}{N}\left\|Q_{max}\mathbf{1} - \mathbf{Q}\right\| - \frac{\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\|
\]
\[
\overset{(c)}{\leq} -\frac{(1-\theta)}{N}\left\|\mathbf{1}\frac{\sum_{i=1}^{N} Q_i}{N} - \mathbf{Q}\right\| - \frac{\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\|
\]
\[
\overset{(d)}{=} -\frac{(1-\theta)}{N}\left\|\mathbf{Q}_{\perp}\right\| - \frac{\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\| \tag{B.2.3}
\]
where the first term in (a) follows from the fact that the JS(N − 1)Q routing policy increases the lengths of all queues by 1, except for one queue having the maximal length, whenever $N-1$ requests arrive at the router, which happens with probability $(1-\theta)$; the second term is by direct computation; and the third term is by the fact that $\lambda_A^* + \lambda_B - \mu = -\left[\frac{(1-\theta)}{N} - r\int_0^{\tau(\theta,\epsilon)} pf(p)\,dp\right]\mathbf{1} = -\frac{\delta(\epsilon,\theta)}{N}\mathbf{1}$. (b) follows from the fact that for any vector $\mathbf{x} \in \mathbb{R}^N$, $\|\mathbf{x}\|_1 \geq \|\mathbf{x}\|$, i.e., the L1 norm of any vector is always greater than or equal to the L2 norm. (c) follows from the fact that the average is less than or equal to the maximum. (d) is by the definition of $\mathbf{Q}_{\perp}$. The next step in the proof is finding a lower bound for $E[\Delta W_{\parallel}(\mathbf{Q})\,|\,\mathbf{Q}]$. It is straightforward to show that the following holds:
\[
E[\Delta W_{\parallel}(\mathbf{Q})\,|\,\mathbf{Q}] = E\left[\langle \mathbf{c}, \mathbf{Q} + \mathbf{A} + \mathbf{B} - \mathbf{S} + \mathbf{U}\rangle^2 - \langle \mathbf{c}, \mathbf{Q}\rangle^2 \,\big|\, \mathbf{Q}\right]
\]
\[
\geq 2\langle \mathbf{c}, \mathbf{Q}\rangle\langle \mathbf{c}, \lambda^* + E[\mathbf{B}\,|\,\mathbf{Q}] - \mu\rangle - 2E[\langle \mathbf{c}, \mathbf{S}\rangle\langle \mathbf{c}, \mathbf{U}\rangle]
\]
\[
\geq -\frac{2\delta(\epsilon,\theta)}{\sqrt{N}}\left\|\mathbf{Q}_{\parallel}\right\| - 2\delta(\epsilon,\theta) \tag{B.2.4}
\]
We can plug the bounds (B.2.2), (B.2.3), and (B.2.4) into (3.5.10) to get:
\[
E[\Delta V_{\perp}(\mathbf{Q})\,|\,\mathbf{Q}] \leq -\frac{(1-\theta)}{N} + \frac{N+1}{\left\|\mathbf{Q}_{\perp}\right\|} \tag{B.2.5}
\]
This inequality establishes the first condition (3.5.6) in Lemma 3.5.2. The second condition is satisfied by assumption; thus, applying the conclusion of Lemma 3.5.2 in (3.5.9) to $V_{\perp}(\mathbf{Q})$ concludes the proof.


APPENDIX C: PROOFS FOR CHAPTER 4

C.1 Proof of Theorem 4.5.1

Given the update cliques $\mathcal{C}$, for two feasible schedules $(s, s')$, define the symmetric difference as $s \triangle s' = (s \setminus s') \cup (s' \setminus s)$. It is easy to see that, for the transition to happen, the update cliques should fulfill the condition that
\[
s \triangle s' \subseteq \mathcal{C} \tag{C.1.1}
\]
Given any such selection of update cliques, we have for any $C_k \in \mathcal{C}$:

1. If $\exists! v \in C_k \cap (s' \setminus s)$ s.t. $s \triangle s' \cap C_k = \{v\}$, then $P(s_{C_k}, s'_{C_k})$ is a Type A transition, where $\exists! a$ means "there exists a unique $a$".

2. If $\exists! w \in C_k \cap (s \setminus s')$ s.t. $s \triangle s' \cap C_k = \{w\}$, then $P(s_{C_k}, s'_{C_k})$ is a Type B transition.

3. If $\exists x, y \in C_k$ s.t. $x \in (s \setminus s')$ and $y \in (s' \setminus s)$, then $P(s_{C_k}, s'_{C_k})$ is a Type C transition.

A straightforward substitution shows that
\[
\pi(s)P(s, s'|\mathcal{C}) = \pi(s')P(s', s|\mathcal{C}) \tag{C.1.2}
\]
\[
\sum_{\mathcal{C}} P(\mathcal{C})\pi(s)P(s, s'|\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C})\pi(s')P(s', s|\mathcal{C}) \tag{C.1.3}
\]
\[
\pi(s)P(s, s') = \pi(s')P(s', s) \tag{C.1.4}
\]


Thus the stationary distribution satisfies the detailed balance equations.

C.2 Proof of Theorem 4.5.2

When comparing the reversible Markov chains $P$ and $\tilde{P}$ (where $P$ denotes the NB-CSMA chain and $\tilde{P}$ the Q-CSMA chain), we first notice that $\pi(s) = \tilde{\pi}(s)$, $\forall s \in \Omega$, by (4.3.3) and (4.5.3). It is a well-known fact of Markov chains that the stationary distribution of a state is the reciprocal of its expected recurrence time; i.e., for any state (or group of states), the following holds by (4.3.3) and (4.5.3):
\[
\frac{1}{E(\tau_i)} = \pi_B = \tilde{\pi}_B = \frac{1}{E(\tilde{\tau}_i)} \tag{C.2.1}
\]
Next, we define the Dirichlet form $\mathcal{E}(f, f)$ for functions $f : \Omega \to \mathbb{R}$ [73] by
\[
\mathcal{E}(f, f) = \frac{1}{2}\sum_{x,y\in\Omega}(f(x) - f(y))^2\,\pi(x)P(x, y) \tag{C.2.2}
\]
The comparison method in [111] provides a way to compare the Dirichlet forms of two reversible Markov chains defined on the same state space $\Omega$, obtaining linear inequalities between them even when the Markov chains do not necessarily have a linear relationship. Define $E = \{(x, y) : P(x, y) > 0\}$ and $\tilde{E} = \{(x, y) : \tilde{P}(x, y) > 0\}$. An $E$-path from $x$ to $y$ is a sequence $\Gamma = (e_1, e_2, \ldots, e_m)$ of edges in $E$ such that $e_1 = (x, x_1)$, $e_2 = (x_1, x_2)$, ..., $e_m = (x_{m-1}, y)$ for some states $x_1, \ldots, x_{m-1} \in \Omega$. The length of an $E$-path $\Gamma$ is denoted by $|\Gamma|$. Suppose that for each $(x, y) \in \tilde{E}$ there is an $E$-path from $x$ to $y$; we refer to this path as $\Gamma_{xy}$. Now, define the congestion ratio as
\[
A = \max_{(z,w)\in E} \frac{1}{\pi(z)P(z, w)}\sum_{\Gamma_{xy}\ni(z,w)} |\Gamma_{xy}|\,\tilde{\pi}(x)\tilde{P}(x, y) \tag{C.2.3}
\]
The comparison theorem then states that
\[
\tilde{\mathcal{E}}(f, f) \leq A\,\mathcal{E}(f, f). \tag{C.2.4}
\]
The key to calculating the congestion ratio $A$ is noticing that all of the Q-CSMA Markov chain's transitions are contained within the NB-CSMA Markov chain's transitions. In particular, line 4, line 5, line 11, and line 12 of Algorithm 4 are exactly Q-CSMA operations. Furthermore, the first and third assumptions ensure that the probability of "refreshing" any link $v \in V$ is equal for both Q-CSMA and NB-CSMA, i.e.:
\[
P(x, y) = \tilde{P}(x, y), \quad \forall x, y \in \Omega \text{ s.t. } \tilde{P}(x, y) > 0. \tag{C.2.5}
\]
We argue that the extra "transitions" in the NB-CSMA Markov chain entail better delay performance. To apply the comparison theorem, we simply take the $E$-path $\Gamma_{xy}$, given any $x$ and $y$, to be $(x, y)$. Furthermore, by (C.2.5) and the fact that both chains have the same stationary distribution, the computation of the congestion ratio in (C.2.3) gives $A = 1$. By the comparison theorem,
\[
\tilde{\mathcal{E}}(f, f) \leq \mathcal{E}(f, f). \tag{C.2.6}
\]
Define the hitting time $H_B$ as ($\tilde{H}_B$ is defined similarly)
\[
H_B = \min\{t \geq 0 : \sigma_v(t) = 1\}. \tag{C.2.7}
\]
The hitting time $H_B$ is the time needed to reach the subset of states $B$ where link $v$ is active. We are interested in the expected hitting time: the time needed to reach subset $B$ starting from a randomly chosen state. By the formula in [67] (presented originally in [112, Ch. 3, Proposition 41]), we have
\[
E(H) = \sup_g\left\{\frac{1}{\mathcal{E}(g, g)} : -\infty < g < \infty,\; g(\cdot) = 1 \text{ on } B \text{ and } \sum_{s\in\Omega}\pi(s)g(s) = 0\right\} \tag{C.2.8}
\]
By the equality of the stationary distributions of the two chains, and substituting the inequality relating their Dirichlet forms (C.2.6) into (C.2.8), we get that
\[
E(H) \leq E(\tilde{H}). \tag{C.2.9}
\]
Again from [112], we obtain an important relationship relating the recurrence time and the hitting time:
\[
E(\tau_i^2) = \frac{2E(H) + 1}{\pi_B} \tag{C.2.10}
\]
Substituting inequality (C.2.9) into (C.2.10),
\[
E(\tau_i^2) = \frac{2E(H) + 1}{\pi_B} \leq \frac{2E(\tilde{H}) + 1}{\tilde{\pi}_B} = E(\tilde{\tau}_i^2) \tag{C.2.11}
\]
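The comparison argument can also be checked numerically on a toy pair of chains. The sketch below is illustrative (the chains are hypothetical, not the CSMA chains): it builds two reversible chains on the same state space with the same stationary distribution, where one chain's transitions contain the other's, and verifies that its Dirichlet form (C.2.2) dominates, i.e., the congestion ratio is A = 1.

import numpy as np

# Two symmetric (hence reversible w.r.t. uniform pi) chains on 4 states.
# P has all of P_t's off-diagonal transitions plus extra "long-range" ones.
pi = np.array([0.25, 0.25, 0.25, 0.25])
P_t = np.array([[0.5, 0.25, 0.0, 0.25],
                [0.25, 0.5, 0.25, 0.0],
                [0.0, 0.25, 0.5, 0.25],
                [0.25, 0.0, 0.25, 0.5]])
P = np.array([[0.4, 0.25, 0.1, 0.25],
              [0.25, 0.4, 0.25, 0.1],
              [0.1, 0.25, 0.4, 0.25],
              [0.25, 0.1, 0.25, 0.4]])

def dirichlet(pi, P, f):
    """Dirichlet form (C.2.2): 0.5 * sum (f(x)-f(y))^2 pi(x) P(x,y)."""
    d = 0.0
    for x in range(len(pi)):
        for y in range(len(pi)):
            d += 0.5 * (f[x] - f[y]) ** 2 * pi[x] * P[x, y]
    return d

f = np.array([1.0, -1.0, 2.0, -2.0])
print(dirichlet(pi, P_t, f) <= dirichlet(pi, P, f))  # True: more edges, larger form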

C.3 Proof of Theorem 4.5.4

We choose our distance metric $\delta(x, y)$ to be the Hamming distance between the two schedules $x$ and $y$: $\delta(x, y) = \sum_{v\in V}\mathbb{1}(x_v \neq y_v)$. We take the subset $S \subseteq \Omega$ to be the states that differ at only one link, i.e., $\delta(x, y) = 1$. It is straightforward to check that the subset $S$ satisfies the condition (4.5.15). Line 2 in Algorithm 5 has the effect of making the Markov chain lazy (the probability of staying in any state is at least $\frac{1}{2}$). Making the Markov chain lazy makes the task of relating the mixing time of $Q$ to the mixing time of $P$ easier in Theorem 4.5.5; it also has the effect of doubling the mixing time. Therefore, we neglect this self-transition (Line 2 and Line 3 in Algorithm 5) in the analysis and compensate for it by a factor of 2 at the end of the analysis. We will use the prime symbol to denote states after one slot has elapsed; for example, $P(X', Y'|x, y)$ is the distribution of the schedule at time slot $t+1$ given that the Markov chain was at state $(x, y)$ at time $t$. The next step is to calculate $E(\delta(x', y'))$, that is, the expected distance between the states after one time slot has elapsed.

Let $x$ and $y$ be two feasible schedules in $\Omega$ that agree everywhere except at link $v$. Suppose WLOG that $x_v = 1$ and $y_v = 0$. Note that this directly implies that $x_w = y_w = 0$, $\forall w \in N_v$. We run the Markov chain $Q$ for one slot and estimate the expected distance metric after one time slot, $\delta(x', y')$. There are 5 different cases that result in different values of $E(\delta(x', y'))$. We define the coupling for each of these cases:

1. $K_v$ is chosen to be updated w.p. $\frac{|K_v|}{n}$ for both $(X, Y)$. Furthermore, both $(X, Y)$ choose link $v$ to update w.p. $\frac{1}{|K_v|}$ (where $x$ is performing lines 7, 8 of Algorithm 5 and $y$ is performing lines 13, 14 of Algorithm 5), and $P(X' = X, Y' = X) = \frac{\lambda_v}{n(1+\lambda_{max})}$, $P(X' = Y, Y' = Y) = \frac{1}{n(1+\lambda_{max})}$. Thus, in this case $\delta(x', y') = 0$ w.p. 1.

2. $K_v$ is chosen to be updated w.p. $\frac{|K_v|}{n}$ for both $(X, Y)$; also, both $X$ and $Y$ attempt to activate a new link $w \in K_v$ that has $x_z(t) = y_z(t) = 0$, $\forall z \in N_w \setminus \{v\}$. Define the coupling as follows:
\[
P(X' = Y \cup \{w\}, Y' = Y \cup \{w\}) = \frac{1}{2n}\,\frac{\lambda_w}{1+\lambda_{max}} \tag{C.3.1}
\]
\[
P(X' = X, Y' = Y \cup \{w\}) = \frac{1}{2n}\,\frac{\lambda_w}{1+\lambda_{max}} \tag{C.3.2}
\]
where $x$ is performing lines 10, 11 and $y$ is performing lines 14, 15 of Algorithm 5. Since both contributions are equal, we have $E(\delta(x', y')) = 1$.

3. A link $w \in N_v \setminus K_v$ is chosen, where $w \in C_w$ and $\sum_{u\in C_w} x_u = \sum_{u\in C_w} y_u = 0$. Now both $x$ and $y$ are performing lines 13, 14 of Algorithm 5. We have the following coupling:
\[
P(X' = X, Y' = Y \cup \{w\}) = \frac{\lambda_w}{n(1+\lambda_{max})} \tag{C.3.3}
\]
\[
P(X' = X, Y' = Y) = \frac{1}{n(1+\lambda_{max})} \tag{C.3.4}
\]
Thus, in the first equation $\delta(x', y') = 2$, and in the second equation $\delta(x', y') = 1$.

4. A link $w \in N_v \setminus K_v$ is chosen, where $w \in C_w$ and $\sum_{u\in C_w} x_u = \sum_{u\in C_w} y_u = 1$. Now both $x$ and $y$ are performing lines 10, 11 of Algorithm 5. We have the following coupling:
\[
P(X' = X, Y' = Y \cup \{w\}) = \frac{\lambda_w}{2n(1+\lambda_{max})} \tag{C.3.5}
\]
\[
P(X' = X, Y' = Y) = \frac{1}{2n(1+\lambda_{max})} \tag{C.3.6}
\]
Thus, in the first equation $\delta(x', y') = 3$, and in the second equation $\delta(x', y') = 1$.

5. A link $w$ is chosen to be updated, where $w$ does not fall in any of the previous four categories. In that case, the coupling is defined to make both $X$ and $Y$ perform the same update. In this case, we have $\delta(x', y') = 1$.

It is straightforward to see that there are at most $|K_v| - 1$ links satisfying case 2, and at most $d_v - |K_v| + 1$ satisfying case 3 and case 4. By collecting the individual contributions of all cases, we obtain the following result:
\[
E(\delta(x', y') - 1) \leq \frac{1}{n}\left(\sum_{w\in N_v\setminus K_v}\frac{\lambda_w}{1+\lambda_{max}} - 1\right) \tag{C.3.7}
\]
\[
\leq \frac{1}{n}\left((d_v - |K_v| + 1)\,\frac{\lambda_{max}}{1+\lambda_{max}} - 1\right) \tag{C.3.8}
\]


Now, by taking $\lambda_{max} < \min_v \frac{1}{d_v - |K_v|} = \frac{1}{\max_v(d_v - |K_v|)}$, we get the $\beta$ term in Theorem 4.5.3 as
\[
\beta = 1 - \frac{1}{n}\left(1 - \frac{\max_v(d_v - |K_v| + 1)\,\lambda_{max}}{1+\lambda_{max}}\right) < 1 \tag{C.3.9}
\]
Directly applying Theorem 4.5.3 proves the theorem.
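The contraction coefficient in (C.3.9) is straightforward to evaluate. The helper below is an illustrative sketch (hypothetical function and parameter names, not code from the dissertation) that checks the activation-rate condition and computes β for given link degrees and update-clique sizes.

# Evaluate the path-coupling contraction coefficient beta of (C.3.9) for
# degrees d_v, update-clique sizes |K_v|, and an activation-rate bound.
def contraction_beta(d, K, lam_max, n):
    m = max(dv - kv for dv, kv in zip(d, K))       # max_v (d_v - |K_v|)
    assert lam_max < 1.0 / m, "need lambda_max < 1/max_v(d_v - |K_v|)"
    worst = max(dv - kv + 1 for dv, kv in zip(d, K))
    return 1.0 - (1.0 / n) * (1.0 - worst * lam_max / (1.0 + lam_max))

# Example: 10 links of degree 4 with cliques of size 3 -> beta < 1, so the
# coupling contracts and the mixing-time bound of Theorem 4.5.4 applies.
print(contraction_beta([4] * 10, [3] * 10, lam_max=0.5, n=10))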

C.4 Proof of Theorem 4.5.6

We follow the approach of [65] to prove the theorem. The first equality states that

the expected per-link queue length is bounded order-wise by the underlying Markov

Chain mixing time. This result was proven in [65] using a Lyapunov analysis.

To prove the second equality, we need to show that the mixing time of the Markov chain is bounded order-wise by $O(n^2\log(n))$. We have seen in Theorem 4.5.5 that, if the activation rates are bounded by $\frac{1}{\max_v(d_v - |K_v|)}$, then the mixing-time bound holds. The critical part of the proof is showing that there exists a set of activation rates satisfying $\lambda_v < \frac{1}{d_v - |K_v|}$, $\forall v \in V$, that stabilizes the network when $\nu \in \gamma\Lambda$, i.e., those activation rates cause all the queues to see a service rate no lower than the arrival rate whenever the arrival rate vector is in the region $\gamma\Lambda$.

Now suppose that $E(s_v) = \nu_v$; this implies stability of any arrival rate in $\gamma\Lambda$. Let $p_{v0}$ be the probability that the medium, as seen by link $v$, is not blocked. It is straightforward to see (and proved in detail in [65]) that the service rate satisfies $E(s_v) = \frac{\lambda_v}{1+\lambda_v}\,p_{v0}$. By the union bound, we have
\[
1 - p_{v0} \leq \sum_{j\in K_v} s_j + \sum_{k\in N_v\setminus K_v} s_k = \sum_{j\in K_v}\nu_j + \sum_{k\in N_v\setminus K_v}\nu_k \tag{C.4.1}
\]


Also, note that $\nu' = \frac{1}{\gamma}\nu \in \Lambda$. Hence, there exists another set of activation rates, sch2, $(\lambda'_1, \lambda'_2, \ldots, \lambda'_{|V|})$, which can stabilize $\nu'$. Under sch2, $1 - \nu'_v$ is the fraction of time during which link $v$ is idle. During these idle slots, at most 1 link from $K_v$ and $d_v - |K_v| + 1$ links from $N_v \setminus K_v$ are active, but the total service of link $v$'s neighbors cannot exceed $(1 - \nu'_v)$ to ensure that $v$ is stable; thus
\[
\sum_{j\in K_v}\nu'_j + \sum_{k\in N_v\setminus K_v}\nu'_k \leq (d_v - |K_v| + 2)(1 - \nu'_v). \tag{C.4.2}
\]
Combining (C.4.1) and (C.4.2),
\[
1 - p_{v0} \leq \sum_{j\in K_v}\nu_j + \sum_{k\in N_v\setminus K_v}\nu_k \leq \gamma(d_v - |K_v| + 2)\left(1 - \frac{\nu_v}{\gamma}\right) \leq \left(1 - \frac{\nu_v}{\gamma}\right) \tag{C.4.3}
\]
Hence, $\nu_v \leq \gamma p_{v0}$, which implies $\frac{\lambda_v}{1+\lambda_v} \leq \gamma$. A direct substitution gives $\lambda_v \leq \frac{1}{d_v - |K_v|}$, and this concludes the proof.


APPENDIX D: PROOFS FOR CHAPTER 5

D.1 Proof of Lemma 5.5.1

Taking $\mathbf{1}^T\theta_i[t] = x_i[t] + y_i[t]$ in (5.4.1), we get the following bound:
\[
y_i[t] \leq \frac{1}{1-\beta_i}\left(R_i^{(\beta_i)}[t] - \beta_i R_i^{(\beta_i)}[t-1]\right). \tag{D.1.1}
\]
The per-flow reward in every time slot can then be bounded as follows:
\[
U\left(R_i^{(\beta_i)}[t]\right) - p_c y_i[t] \geq U\left(R_i^{(\beta_i)}[t]\right) - \frac{p_c}{1-\beta_i}\left(R_i^{(\beta_i)}[t] - \beta_i R_i^{(\beta_i)}[t-1]\right). \tag{D.1.2}
\]
Setting $\mathbf{R}[0] = \mathbf{0}$ and summing over all flows and over all time slots, we have the following inequality:
\[
\sum_{t=1}^{T}\sum_{i=1}^{N} U\left(R_i^{(\beta_i)}[t]\right) - p_c y_i[t] \geq \sum_{t=1}^{T}\sum_{i=1}^{N} U\left(R_i^{(\beta_i)}[t]\right) - p_c R_i^{(\beta_i)}[t]. \tag{D.1.3}
\]
By noting that the linear cost on the RHS cannot exceed $p_c y_c$ (since the cellular allocation cannot exceed the cellular capacity), we can refine the bound on the LHS of (D.1.3):
\[
\begin{cases}
\displaystyle\sum_{t=1}^{T}\sum_{i=1}^{N} U\left(R_i^{(\beta_i)}[t]\right) - p_c R_i^{(\beta_i)}[t] & \text{for } \sum_{i=1}^{N} R_i[t] \leq y_c\\[2mm]
\displaystyle\sum_{t=1}^{T}\sum_{i=1}^{N} U\left(R_i^{(\beta_i)}[t]\right) - p_c y_c & \text{for } \sum_{i=1}^{N} R_i[t] \geq y_c
\end{cases}
\;\overset{(a)}{\geq}\;
\begin{cases}
\displaystyle\sum_{t=1}^{T}\sum_{i=1}^{N} \frac{U(y_c) - p_c y_c}{y_c}\, R_i^{(\beta_i)}[t] & \text{for } \sum_{i=1}^{N} R_i[t] \leq y_c\\[2mm]
\displaystyle\sum_{t=1}^{T}\sum_{i=1}^{N} \frac{U(y_c + c_{max}) - p_c y_c}{y_c + c_{max}}\, R_i^{(\beta_i)}[t] & \text{for } \sum_{i=1}^{N} R_i[t] \geq y_c
\end{cases}
\;\geq\; \sum_{t=1}^{T}\sum_{i=1}^{N} D\, R_i^{(\beta_i)}[t]
\]
where inequality (a) comes from the fact that the two summand functions on the LHS are concave in $R_i[t]$; thus, each of those two one-dimensional functions can be lower bounded by a straight line connecting the origin to the points $(y_c, U(y_c) - p_c y_c)$ and $(y_c + c_{max}, U(y_c + c_{max}) - p_c y_c)$, respectively. Those two straight lines in turn lie between the lines $\frac{U(y_c) - p_c y_c}{y_c}R_i[t]$ and $\frac{U(y_c + c_{max}) - p_c y_c}{y_c + c_{max}}R_i[t]$; thus, taking the minimum gives us a lower bound everywhere in the domain.

D.2 Proof of Lemma 5.5.2

By concavity of the function $U(\cdot)$, it is straightforward to see that the function $g(\mathbf{R}[t-1]; \Theta(\mathbf{R}[t-1]))$ is concave in the variable $\mathbf{R}[t-1]$. By the first-order conditions of concavity in the variable $\mathbf{R}[t-1]$ only (where $\Theta(\mathbf{R}[t-1])$ is treated as a parameter):
\[
g(\mathbf{R}^*[t-1]; \theta^*[t], \ldots, \theta^*[t+w]) \leq g(\mathbf{R}^{(k)}[t-1]; \theta^*[t], \ldots, \theta^*[t+w]) + \nabla g(\mathbf{R}^{(k)}[t-1]; \theta^*[t], \ldots, \theta^*[t+w])^T\left(\mathbf{R}^*[t-1] - \mathbf{R}^{(k)}[t-1]\right) \tag{D.2.1}
\]
where $\nabla$ is the gradient operator w.r.t. $\mathbf{R}[t-1]$. The first term on the RHS of (D.2.1) can be bounded as follows:
\[
g(\mathbf{R}^{(k)}[t-1]; \theta^*[t], \ldots, \theta^*[t+w]) \leq g(\mathbf{R}^{(k)}[t-1]; \Theta^{(k)}(\mathbf{R}^{(k)}[t-1])) \tag{D.2.2}
\]
This is because, given the initial state $\mathbf{R}^{(k)}[t-1]$, $\Theta^{(k)}(\mathbf{R}^{(k)}[t-1])$ is the maximizing vector of $g(\mathbf{R}^{(k)}[t-1]; \theta[t], \ldots, \theta[t+w])$ according to the formulation in P2. To bound the second term on the RHS, we can use the expression in (5.5.5) to explicitly derive the gradient w.r.t. the vector $\mathbf{R}[t-1]$. The $i$th term of the gradient vector can be bounded as follows:
\[
\frac{\partial g(\cdot)}{\partial R_i[t-1]} = \sum_{\tau=t}^{t+w}\beta_i^{\tau-t+1}\,U'\!\left(\beta_i^{\tau-t+1}R_i[t-1] + \sum_{\eta=t}^{\tau}\beta_i^{\eta-t}(1-\beta_i)\mathbf{1}^T\theta_i[\eta]\right) \overset{(b)}{\leq} G\sum_{\tau=t}^{t+w}\beta_i^{\tau-t+1} \leq \frac{G\beta_i}{1-\beta_i} \tag{D.2.3}
\]
where (b) comes from assumption A2 in Theorem 5.5.1. Taking the inner product of the gradient in (D.2.3) with $\left(\mathbf{R}^*[t-1] - \mathbf{R}^{(k)}[t-1]\right)$ and combining the bounds gives the desired result.

D.3 Proof of Theorem 5.5.1

The reward obtained over the horizon by the AFHC control algorithm can be lower bounded as follows:
\[
g_{1:T}(\theta_{AFHC}) \overset{(c)}{\geq} \frac{1}{w+1}\sum_{k=1}^{w+1} g_{1:T}(\theta^{(k)})
\]
\[
\overset{(d)}{\geq} g_{1:T}(\theta^*) - \frac{1}{w+1}\sum_{k=1}^{w+1}\sum_{\tau\in\Omega_k}\sum_{i=1}^{N}\frac{G\beta_i}{1-\beta_i}\left(R_i^{(\beta_i)*}[\tau-1] - R_i^{(\beta_i)(k)}[\tau-1]\right)
\]
\[
\geq g_{1:T}(\theta^*) - \frac{1}{w+1}\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{G\beta_i}{1-\beta_i}R_i^{(\beta_i)*}[t-1]
\]
\[
\geq g_{1:T}(\theta^*) - \frac{1}{w+1}\,\frac{G\beta_{max}}{1-\beta_{max}}\sum_{i=1}^{N}\sum_{t=1}^{T}R_i^{(\beta_i)*}[t-1]
\]
where (c) is by Jensen's inequality (the averaging property of AFHC) and (d) is a result of Lemma 5.5.3. Dividing both sides by $g_{1:T}(\theta^*)$ results in the Competitive Ratio (CR) lower bound
\[
CR \geq 1 - \frac{1}{w+1}\,\frac{G\beta_{max}}{1-\beta_{max}}\,\frac{\sum_{t=1}^{T}\sum_{i=1}^{N}R_i^{(\beta_i)*}[t-1]}{g_{1:T}(\theta^*)} \tag{D.3.1}
\]
Using Lemma 5.5.1 to bound $g_{1:T}(\theta^*)$ linearly and noticing that $\mathbf{R}[0] = \mathbf{0}$ gives the desired result.


BIBLIOGRAPHY

[1] I. Parvez, A. Rahmati, I. Guvenc, A. I. Sarwat, and H. Dai, "A survey on low latency towards 5g: Ran, core network and caching solutions," IEEE Communications Surveys & Tutorials, vol. 20, no. 4, pp. 3098–3130, 2018.

[2] H. Holma and A. Toskala, LTE for UMTS: Evolution to LTE-advanced. John Wiley & Sons, 2011.

[3] M. Laner, P. Svoboda, P. Romirer-Maierhofer, N. Nikaein, F. Ricciato, and M. Rupp, "A comparison between one-way delays in operating hspa and lte networks," in 2012 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 286–292, IEEE, 2012.

[4] C. V. Forecast, "Cisco visual networking index: Global mobile data traffic forecast update, 2017–2022 white paper," 2019.

[5] G. S. Paschos, G. Iosifidis, M. Tao, D. Towsley, and G. Caire, "The role of caching in future communication systems and networks," IEEE Journal on Selected Areas in Communications, 2018.

[6] B. Ahlgren, C. Dannewitz, C. Imbrenda, D. Kutscher, and B. Ohlman, "A survey of information-centric networking," IEEE Communications Magazine, 2012.

[7] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. Leung, "Cache in the air: Exploiting content caching and delivery techniques for 5g systems," IEEE Communications Magazine, 2014.

[8] "Qwilt's* open edge cloud* puts content delivery at the network edge," White Paper, 2018.

[9] "Saguna* and intel – using mobile edge computing to improve mobile network performance and profitability," White Paper, 2018.

[10] G. Paschos, E. Bastug, I. Land, G. Caire, and M. Debbah, "Wireless caching: Technical misconceptions and business barriers," IEEE Communications Magazine, 2016.


[11] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "Femtocaching: Wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, 2013.

[12] L. Jiang and J. Walrand, "A distributed csma algorithm for throughput and utility maximization in wireless networks," IEEE/ACM Transactions on Networking (TON), 2010.

[13] S. Rajagopalan, D. Shah, and J. Shin, "Network adiabatic theorem: an efficient randomized protocol for contention resolution," in ACM SIGMETRICS Performance Evaluation Review, 2009.

[14] J. Ni, B. Tan, and R. Srikant, "Q-csma: queue-length-based csma/ca algorithms for achieving maximum throughput and low delay in wireless networks," IEEE/ACM Transactions on Networking, 2012.

[15] M. Wang, J. Chen, E. Aryafar, and M. Chiang, "A survey of client-controlled hetnets for 5g," IEEE Access, 2017.

[16] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, "Next generation/dynamic spectrum access/cognitive radio wireless networks: A survey," Computer Networks, 2006.

[17] S. ElAzzouni, E. Ekici, and N. Shroff, "Is deadline oblivious scheduling efficient for controlling real-time traffic in cellular downlink systems?," arXiv preprint arXiv:2002.06474, 2020.

[18] M. Hosseini and V. Swaminathan, "Adaptive 360 vr video streaming: Divide and conquer," in 2016 IEEE International Symposium on Multimedia (ISM), pp. 107–110, IEEE, 2016.

[19] K. Lee, D. Chu, E. Cuervo, J. Kopf, Y. Degtyarev, S. Grizan, A. Wolman, and J. Flinn, "Outatime: Using speculation to enable low-latency continuous interaction for mobile cloud gaming," in Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 151–165, ACM, 2015.

[20] N. Buchbinder, J. S. Naor, et al., "The design of competitive online algorithms via a primal–dual approach," Foundations and Trends® in Theoretical Computer Science, vol. 3, no. 2–3, pp. 93–263, 2009.

[21] Z. Zheng and N. B. Shroff, "Online multi-resource allocation for deadline sensitive jobs with partial values in the cloud," in Computer Communications, IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on, pp. 1–9, IEEE, 2016.


[22] B. Lucier, I. Menache, J. S. Naor, and J. Yaniv, "Efficient online scheduling for deadline-sensitive jobs," in Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures, pp. 305–314, ACM, 2013.

[23] N. R. Devanur and Z. Huang, "Primal dual gives almost optimal energy-efficient online algorithms," ACM Transactions on Algorithms (TALG), 2018.

[24] K. Pruhs, J. Sgall, and E. Torng, "Online scheduling," 2004.

[25] I.-H. Hou, "Scheduling heterogeneous real-time traffic over fading wireless channels," IEEE/ACM Transactions on Networking, vol. 22, no. 5, pp. 1631–1644, 2014.

[26] S. Lashgari and A. S. Avestimehr, "Timely throughput of heterogeneous wireless networks: Fundamental limits and algorithms," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8414–8433, 2013.

[27] S. Shakkottai and R. Srikant, "Scheduling real-time traffic with deadlines over a wireless channel," Wireless Networks, vol. 8, no. 1, pp. 13–26, 2002.

[28] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, "Non-orthogonal multiple access for 5g: solutions, challenges, opportunities, and future research trends," IEEE Communications Magazine, 2015.

[29] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[30] A. Mehta et al., "Online matching and ad allocation," Foundations and Trends® in Theoretical Computer Science, 2013.

[31] Y. Azar, N. Buchbinder, T. H. Chan, S. Chen, I. R. Cohen, A. Gupta, Z. Huang, N. Kang, V. Nagarajan, J. Naor, et al., "Online algorithms for covering and packing problems with convex objectives," in IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016, pp. 148–157, IEEE, 2016.

[32] R. Eghbali and M. Fazel, "Designing smoothing functions for improved worst-case competitive ratio in online optimization," in Advances in Neural Information Processing Systems, pp. 3287–3295, 2016.

[33] X. Liu, E. K. P. Chong, and N. B. Shroff, "Opportunistic transmission scheduling with resource-sharing constraints in wireless networks," IEEE Journal on Selected Areas in Communications, vol. 19, no. 10, pp. 2053–2064, 2001.

[34] J. J. Jaramillo and R. Srikant, "Optimal scheduling for fair resource allocation in ad hoc networks with elastic and inelastic traffic," IEEE/ACM Transactions on Networking (TON), vol. 19, no. 4, pp. 1125–1136, 2011.


[35] L. Deng, C.-C. Wang, M. Chen, and S. Zhao, "Timely wireless flows with general traffic patterns: Capacity region and scheduling algorithms," IEEE/ACM Transactions on Networking, vol. 25, no. 6, pp. 3473–3486, 2017.

[36] M. J. Neely, "Stochastic network optimization with application to communication and queueing systems," Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1–211, 2010.

[37] B. Tan and R. Srikant, "Online advertisement, optimization and stochastic networks," IEEE Transactions on Automatic Control, vol. 57, no. 11, pp. 2854–2868, 2012.

[38] A. Dua and N. Bambos, "Downlink wireless packet scheduling with deadlines," IEEE Transactions on Mobile Computing, vol. 6, no. 12, pp. 1410–1425, 2007.

[39] M. Agarwal and A. Puri, "Base station scheduling of requests with fixed deadlines," in Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, pp. 487–496, IEEE, 2002.

[40] S. Traverso, M. Ahmed, M. Garetto, P. Giaccone, E. Leonardi, and S. Niccolini, "Temporal locality in today's content caching: why it matters and how to model it," ACM SIGCOMM Computer Communication Review, 2013.

[41] K. Poularakis, G. Iosifidis, V. Sourlas, and L. Tassiulas, "Exploiting caching and multicast for 5g wireless networks," IEEE Transactions on Wireless Communications, 2016.

[42] B. Zhou, Y. Cui, and M. Tao, "Optimal dynamic multicast scheduling for cache-enabled content-centric wireless networks," IEEE Transactions on Communications, 2017.

[43] Y. Cui and D. Jiang, "Analysis and optimization of caching and multicasting in large-scale cache-enabled heterogeneous wireless networks," IEEE Transactions on Wireless Communications, 2016.

[44] S. O. Somuyiwa, A. Gyorgy, and D. Gunduz, "Multicast-aware proactive caching in wireless networks with deep reinforcement learning," in IEEE SPAWC, 2019.

[45] M. Ji, G. Caire, and A. F. Molisch, "Wireless device-to-device caching networks: Basic principles and system performance," IEEE Journal on Selected Areas in Communications, 2015.

[46] J. Erman and K. K. Ramakrishnan, "Understanding the super-sized traffic of the super bowl," in Proceedings of the 2013 conference on Internet measurement conference, ACM, 2013.

[47] A. Eryilmaz and R. Srikant, "Asymptotically tight steady-state queue length bounds implied by drift conditions," Queueing Systems, 2012.


[48] S. Gitzenis, G. S. Paschos, and L. Tassiulas, "Asymptotic laws for joint content replication and delivery in wireless networks," IEEE Transactions on Information Theory, 2012.

[49] M. A. Maddah-Ali and U. Niesen, "Fundamental limits of caching," IEEE Transactions on Information Theory, 2014.

[50] W. Wang, K. Zhu, L. Ying, J. Tan, and L. Zhang, "Maptask scheduling in mapreduce with data locality: Throughput and heavy-traffic optimality," IEEE/ACM Transactions on Networking (TON), vol. 24, no. 1, pp. 190–203, 2016.

[51] S. T. Maguluri and R. Srikant, "Heavy traffic queue length behavior in a switch under the maxweight algorithm," Stochastic Systems, vol. 6, no. 1, pp. 211–250, 2016.

[52] B. Li, R. Li, and A. Eryilmaz, "Wireless scheduling design for optimizing both service regularity and mean delay in heavy-traffic regimes," IEEE/ACM Transactions on Networking, vol. 24, no. 3, pp. 1867–1880, 2015.

[53] B. Hajek, "Hitting-time and occupation-time bounds implied by drift analysis with applications," Advances in Applied Probability, vol. 14, 1982.

[54] M. Leconte, G. Paschos, L. Gkatzikis, M. Draief, S. Vassilaras, and S. Chouvardas, "Placing dynamic content in caches with small population," in IEEE INFOCOM, 2016.

[55] M. E. Newman, "Power laws, pareto distributions and zipf's law," Contemporary Physics, 2005.

[56] X. Zhou, F. Wu, J. Tan, K. Srinivasan, and N. Shroff, "Degree of queue imbalance: Overcoming the limitation of heavy-traffic delay optimality in load balancing systems," Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2018.

[57] S. ElAzzouni and E. Ekici, "A node-based csma algorithm for improved delay performance in wireless networks," in Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pp. 31–40, 2016.

[58] S. ElAzzouni and E. Ekici, "Node-based distributed channel access with enhanced delay characteristics," IEEE/ACM Transactions on Networking, vol. 26, no. 3, pp. 1474–1487, 2018.

[59] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Transactions on Automatic Control, 1992.


[60] N. Bouman, S. C. Borst, and J. S. van Leeuwaarden, "Delay performance in random-access networks," Queueing Systems, 2014.

[61] M. Lotfinezhad and P. Marbach, "Throughput-optimal random access with order-optimal delay," in INFOCOM, IEEE, 2011.

[62] D. Shah, D. N. Tse, and J. N. Tsitsiklis, "Hardness of low delay network scheduling," IEEE Transactions on Information Theory, 2011.

[63] K.-K. Lam, C.-K. Chau, M. Chen, and S.-C. Liew, "Mixing time and temporal starvation of general csma networks with multiple frequency agility," in ISIT, IEEE, 2012.

[64] C. Bettstetter, "On the minimum node degree and connectivity of a wireless multihop network," in Proceedings of the 3rd ACM MobiHoc, 2002.

[65] L. Jiang, M. Leconte, J. Ni, R. Srikant, and J. Walrand, "Fast mixing of parallel glauber dynamics and low-delay csma scheduling," IEEE Transactions on Information Theory, 2012.

[66] V. G. Subramanian and M. Alanyali, "Delay performance of csma in networks with bounded degree conflict graphs," in ISIT, 2011.

[67] C.-H. Lee, D. Y. Eun, S.-Y. Yun, and Y. Yi, "From glauber dynamics to metropolis algorithm: Smaller delay in optimal csma," in IEEE ISIT, 2012.

[68] D. Xue and E. Ekici, "On reducing delay and temporal starvation of queue-length-based csma algorithms," in Allerton, IEEE, 2012.

[69] P.-K. Huang and X. Lin, "Improving the delay performance of csma algorithms: A virtual multi-channel approach," in INFOCOM, IEEE, 2013.

[70] J. Kwak, C.-H. Lee, et al., "A high-order markov chain based scheduling algorithm for low delay in csma networks," in INFOCOM, IEEE, 2014.

[71] D. Lee, D. Yun, J. Shin, Y. Yi, and S.-Y. Yun, "Provable per-link delay-optimal csma for general wireless network topology," in INFOCOM, IEEE, 2014.

[72] C. H. Kai and S. C. Liew, "Temporal starvation in csma wireless networks," in ICC, IEEE, 2011.

[73] D. A. Levin, Y. Peres, and E. L. Wilmer, Markov Chains and Mixing Times. American Mathematical Soc., 2009.

[74] F. Martinelli, "Lectures on glauber dynamics for discrete spin models," in Lectures on probability theory and statistics, pp. 93–191, Springer, 1999.

[75] J. Ghaderi and R. Srikant, "On the design of efficient csma algorithms for wireless networks," in 49th IEEE CDC, 2010, IEEE, 2010.


[76] R. Bubley and M. Dyer, "Path coupling: A technique for proving rapid mixing in markov chains," in 38th FOCS, IEEE, 1997.

[77] M. Dyer and C. Greenhill, "On markov chains for independent sets," Journal of Algorithms, 2000.

[78] T. P. Hayes and A. Sinclair, "A general lower bound for mixing of single-site dynamics on graphs," in 46th FOCS, IEEE, 2005.

[79] J. Ghaderi and R. Srikant, "Effect of access probabilities on the delay performance of q-csma algorithms," in INFOCOM, IEEE, 2012.

[80] N. Bouman, S. Borst, J. van Leeuwaarden, and A. Proutiere, "Backlog-based random access in wireless networks: fluid limits and delay issues," in Proceedings of the 23rd International Teletraffic Congress, 2011.

[81] R. G. Gallager, "Discrete stochastic processes," 2012.

[82] S. ElAzzouni, E. Ekici, and N. B. Shroff, "Qos-aware predictive rate allocation over heterogeneous wireless interfaces," in 2018 16th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pp. 1–8, IEEE, 2018.

[83] A. Balasubramanian, R. Mahajan, and A. Venkataramani, "Augmenting mobile 3g using wifi," in Proceedings of MobiSys, ACM, 2010.

[84] K. Lee, J. Lee, Y. Yi, I. Rhee, and S. Chong, "Mobile data offloading: How much can wifi deliver?," IEEE/ACM Transactions on Networking (TON), 2013.

[85] F. Mehmeti and T. Spyropoulos, "Is it worth to be patient? analysis and optimization of delayed mobile data offloading," in INFOCOM, IEEE, 2014.

[86] S. Deng, R. Netravali, A. Sivaraman, and H. Balakrishnan, "Wifi, lte, or both?: Measuring multi-homed wireless internet performance," in Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, 2014.

[87] A. J. Nicholson and B. D. Noble, "Breadcrumbs: forecasting mobile connectivity," in MobiCom, ACM, 2008.

[88] M. H. Cheung and J. Huang, "Optimal delayed wi-fi offloading," in WiOpt, IEEE, 2013.

[89] H. Yu, M. H. Cheung, L. Huang, and J. Huang, "Predictive delay-aware network selection in data offloading," in GLOBECOM, IEEE, 2014.

[90] Y. Im, C. Joe-Wong, S. Ha, S. Sen, M. Chiang, et al., "Amuse: Empowering users for cost-aware offloading with throughput-delay tradeoffs," IEEE Transactions on Mobile Computing, 2016.


[91] R. Mahindra, H. Viswanathan, K. Sundaresan, M. Y. Arslan, and S. Rangarajan, "A practical traffic management system for integrated lte-wifi networks," in MobiCom, ACM, 2014.

[92] H. Deng and I.-H. Hou, "Online scheduling for delayed mobile offloading," in INFOCOM, IEEE, 2015.

[93] A. Ford, C. Raiciu, M. Handley, S. Barre, and J. Iyengar, "Architectural guidelines for multipath tcp development," tech. rep., 2011.

[94] A. Nikravesh, Y. Guo, F. Qian, Z. M. Mao, and S. Sen, "An in-depth understanding of multipath tcp on mobile devices: Measurement and system design," in MobiCom, ACM, 2016.

[95] B. Han, F. Qian, L. Ji, V. Gopalakrishnan, and N. Bedminster, "Mp-dash: Adaptive video streaming over preference-aware multipath," in CoNEXT, 2016.

[96] O. B. Yetim and M. Martonosi, "Adaptive delay-tolerant scheduling for efficient cellular and wifi usage," in WoWMoM, IEEE, 2014.

[97] K.-K. Yap, T.-Y. Huang, Y. Yiakoumis, S. Chinchali, N. McKeown, and S. Katti, "Scheduling packets over multiple interfaces while respecting user preferences," in Proceedings of the ninth ACM conference on Emerging networking experiments and technologies, ACM, 2013.

[98] X. Hou, P. Deshpande, and S. R. Das, "Moving bits from 3g to metro-scale wifi for vehicular network access: An integrated transport layer solution," in ICNP, IEEE, 2011.

[99] A. Eryilmaz and I. Koprulu, "Discounted-rate utility maximization (drum): A framework for delay-sensitive fair resource allocation," in WiOpt, IEEE, 2017.

[100] S. Deng, A. Sivaraman, and H. Balakrishnan, "Delphi: A software controller for mobile network selection," 2016.

[101] J. Pang, B. Greenstein, M. Kaminsky, D. McCoy, and S. Seshan, "Wifi-reports: Improving wireless network selection with collaboration," IEEE Transactions on Mobile Computing, 2010.

[102] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Transactions on Networking (ToN), 2000.

[103] J. Mattingley, Y. Wang, and S. Boyd, "Receding horizon control," IEEE Control Systems, 2011.

[104] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, "Constrained model predictive control: Stability and optimality," Automatica, 2000.


[105] M. Lin, Z. Liu, A. Wierman, and L. L. Andrew, "Online algorithms for geographical load balancing," in Green Computing Conference (IGCC), IEEE, 2012.

[106] N. Chen, A. Agarwal, A. Wierman, S. Barman, and L. L. Andrew, "Online convex optimization using predictions," in SIGMETRICS Performance Evaluation Review, ACM, 2015.

[107] J. Liu, A. Proutiere, Y. Yi, M. Chiang, and H. V. Poor, "Stability, fairness, and performance: A flow-level study on nonconvex and time-varying rate regions," IEEE Transactions on Information Theory, 2009.

[108] D. P. Palomar and M. Chiang, "A tutorial on decomposition methods for network utility maximization," IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006.

[109] F. P. Kelly, A. K. Maulloo, and D. K. Tan, "Rate control for communication networks: shadow prices, proportional fairness and stability," Journal of the Operational Research Society, vol. 49, no. 3, pp. 237–252, 1998.

[110] X. Lin, N. B. Shroff, and R. Srikant, "A tutorial on cross-layer optimization in wireless networks," IEEE Journal on Selected Areas in Communications, 2006.

[111] P. Diaconis and L. Saloff-Coste, "Comparison theorems for reversible markov chains," The Annals of Applied Probability, 1993.

[112] D. Aldous and J. Fill, "Reversible markov chains and random walks on graphs (monograph in preparation)," 2002.
