New Directions in Streaming Algorithms
by
Ali Vakilian
B.S., Sharif University of Technology (2011)
M.S., University of Illinois at Urbana-Champaign (2013)
Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2019
© Massachusetts Institute of Technology 2019. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
August 30, 2019
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Erik D. Demaine
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Piotr Indyk
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Leslie A. Kolodziejski
Professor of Electrical Engineering and Computer Science
Chair, Department Committee on Graduate Students
New Directions in Streaming Algorithms
by
Ali Vakilian
Submitted to the Department of Electrical Engineering and Computer Science on August 30, 2019, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science
Abstract
Large volumes of available data have led to the emergence of new computational models for data analysis. One such model is captured by the notion of streaming algorithms: given a sequence of n items, the goal is to compute the value of a given function of the input items in a small number of passes and using an amount of space sublinear in n. Streaming algorithms have applications in many areas such as networking and large-scale machine learning. Despite a huge amount of work in this area over the last two decades, multiple aspects of streaming algorithms remain poorly understood, such as (a) streaming algorithms for combinatorial optimization problems and (b) incorporating modern machine learning techniques into the design of streaming algorithms.
In the first part of this thesis, we will describe (essentially) optimal streaming algorithms for set cover and maximum coverage, two classic problems in combinatorial optimization. Next, in the second part, we will show how to augment classic streaming algorithms for the frequency estimation and low-rank approximation problems with machine learning oracles in order to improve their space-accuracy tradeoffs. The new algorithms combine the benefits of machine learning with the formal guarantees available through algorithm design theory.
Thesis Supervisor: Erik D. Demaine
Title: Professor of Electrical Engineering and Computer Science
Thesis Supervisor: Piotr Indyk
Title: Professor of Electrical Engineering and Computer Science
Acknowledgments
First and foremost, I would like to thank my advisors Erik Demaine and Piotr Indyk. I had
the chance to learn from Erik even before joining MIT, through his lectures on "Introduction
to Algorithms" when I was a freshman at Sharif UT. His inspiring style of teaching played
a significant role in my passion to pursue TCS in grad school. For me, it was very valuable
to have the opportunity to work with Erik. Moreover, Erik gave me the freedom to explore
different areas and pursue my interests, while always being there to support and advise.
The main topic of this thesis, streaming algorithms, and all results in this thesis are in
collaboration with Piotr Indyk. His guidance and supervision have shaped my academic
character, and his support during my PhD studies was significant both academically and
socially. He always had time for me to discuss both research and non-research questions,
and his continuous passion for science, research and mentoring young researchers will
always be a perfect example for me.
Next, I would like to thank Ronitt Rubinfeld, who kindly agreed to serve on my thesis
committee. Ronitt introduced me to other closely related areas of research in massive data
analysis, namely Local Computation Algorithms and Sublinear Algorithms, both through her
courses and through collaborations. I also benefited a lot from her advice during my PhD years.
I would also like to thank my MS advisor at UIUC, Chandra Chekuri. My first experience
doing research in TCS was in collaboration with Chandra, and I will always be grateful for
his support and supervision. He always had time for me to discuss any topic, even after my
graduation from UIUC.
During my years in grad school, I had the chance to collaborate with many wonderful
researchers, and this thesis would not exist without their collaboration. I would like to
thank all my co-authors: Anders Aamand, Arturs Backurs, Chandra Chekuri, Yodsawalai
Chodpathumwan, Erik Demaine, Alina Ene, Timothy Goodrich, Christoph Gunru, Sariel
Har-Peled, Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Kyle Kloster, Nitish Korula, Brian
Lavallee, Quanquan Liu, Sepideh Mahabadi, Slobodan Mitrović, Amir Nayyeri, Krzysztof
Onak, Merav Parter, Ronitt Rubinfeld, Baruch Schieber, Blair Sullivan, Arash Termehchy,
Jonathan Ullman, Andrew van der Poel, Tal Wagner, Marianne Winslett, David Woodruff,
Anak Yodpinyanee and Yang Yuan.
Thanks to all members of the Theory of Computation group, and in particular Maryam
Aliakbarpour, Arturs, Mohammad Bavarian, Dylan Foster, Mohsen Ghaffari, Themis Gouleakis,
Siddhartha Jayanti, Gautam Kamath, Pritish Kamath, William M. Leiserson, Quanquan,
Sepideh, Saeed Mehraban, Slobodan, Cameron Musco, Christopher Musco, Merav, Madalina
Persu, Ilya Razenshteyn, Nicholas Schiefer, Ludwig Schmidt, Adam Sealfon, Tal, Anak, Yang
and Henry Yuen: I had a great time at MIT. Being away from home, it is thanks to my friends
in the Boston area that I had wonderful years during my PhD. I would like to thank all my friends,
and in particular those at MIT: Ehsan, Ardavan, Asma, Maryam, Fereshteh, Elaheh, Sajjad,
Mina, Alireza, Amir, Mehrdad, Zahra and Ali.
Last but not least, I would like to thank my family. My parents, Amir and Maryam,
were always sources of inspiration for me, and I am thankful for all their support and efforts
for me since my childhood. Words are not enough to acknowledge what they have done for
me. I would also like to thank my siblings Mohsen and Fatemeh, who always care about
me. Mohsen was my very first mentor, since elementary school, and had a significant role
in shaping my interest in Math and CS. I have benefited a lot from his advice and help.
Mohsen and Fatemeh, thanks for being so perfect! Sepideh, my beloved wife, is the most
valuable gift I received from God during my PhD at MIT. Besides our several collaborations
on different projects, which contribute to half of this thesis, I was very fortunate to have her
honest and unconditional support in my ups and downs during my PhD.
Contents
1 Introduction 23
1.1 Streaming Algorithms for Coverage Problems . . . . . . . . . . . . . . . . . 24
1.1.1 Streaming Algorithms for Set Cover . . . . . . . . . . . . . . . . . . . 25
1.1.2 Streaming Algorithms for Fractional Set Cover . . . . . . . . . . . . . 25
1.1.3 Sublinear Algorithms for Set Cover . . . . . . . . . . . . . . . . . . . 26
1.1.4 Streaming Algorithms for Maximum Coverage . . . . . . . . . . . . . 26
1.2 Learning-Based Streaming Algorithms . . . . . . . . . . . . . . . . . . . . . 27
1.2.1 Learning-Based Algorithms for Frequency Estimation . . . . . . . . . 28
1.2.2 Learning-Based Algorithms for Low-Rank Approximation . . . . . . . 29
I Sublinear Algorithms for Coverage Problems 31
2 Streaming Set Cover 33
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.2 Our Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 Streaming Algorithm for Set Cover . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.1 Analysis of IterSetCover . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Lower Bound for Single Pass Algorithms . . . . . . . . . . . . . . . . . . . . 44
2.4 Geometric Set Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.2 Description of GeomSC Algorithm . . . . . . . . . . . . . . . . . . . 52
2.4.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Lower Bound for Multipass Algorithms . . . . . . . . . . . . . . . . . . . 55
2.6 Lower Bound for Sparse Set Cover in Multiple Passes . . . . . . . . . . . . . 61
3 Streaming Fractional Set Cover 65
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.1.1 Our Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.1.2 Other Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.3 Our Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2 MWU Framework of the Streaming Algorithm for Fractional Set Cover . . . 70
3.2.1 Preliminaries of the MWU method for solving covering LPs . . . . . 71
3.2.2 Semi-streaming MWU-based algorithm for fractional Set Cover . . . . 73
3.2.3 First Attempt: Simple Oracle and Large Width . . . . . . . . . . . . 74
3.3 Max Cover Problem and its Application to Width Reduction . . . . . . . . . 76
3.3.1 The Maximum Coverage Problem . . . . . . . . . . . . . . . . . . . . 78
3.3.2 Sampling-Based Oracle for Fractional Max Coverage . . . . . . . . . 82
3.3.3 Final Step: Running Several MWU Rounds Together . . . . . . . . . 85
3.3.4 Extension to general covering LPs . . . . . . . . . . . . . . . . . . . . 86
4 Sublinear Set Cover 91
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.1.1 Our Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.1.3 Our Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3 Lower Bounds for the Set Cover Problem . . . . . . . . . . . . . . . . . . . . 97
4.3.1 The Set Cover Lower Bound for Small Optimal Value k . . . . . . . . 99
4.3.2 The Set Cover Lower Bound for Large Optimal Value k . . . . . . . . 104
4.4 Sub-Linear Algorithms for the Set Cover Problem . . . . . . . . . . . . . . . 111
4.4.1 First Algorithm: small values of k . . . . . . . . . . . . . . . . . . . . 113
4.4.2 Second Algorithm: large values of k . . . . . . . . . . . . . . . . . . . 118
4.5 Lower Bound for the Cover Verification Problem . . . . . . . . . . . . . . . . 120
4.5.1 Underlying Set Structure . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.6 Generalized Lower Bounds for the Set Cover Problem . . . . . . . . . . . . . 124
4.6.1 Construction of the Median Instance I* . . . . . . . . . . . . . . . . . 126
4.6.2 Distribution 𝒟(I*) of the Modified Instances Derived from I* . . . . . 130
4.7 Omitted Proofs from Section 4.3 . . . . . . . . . . . . . . . . . . . . . . . . . 133
5 Streaming Maximum Coverage 135
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.1.1 Our Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.1.2 Our Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.1.3 Other Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.2 Preliminaries and Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.2.1 Sampling Methods for Max k-Cover and Set Cover . . . . . . . . . . 139
5.2.2 HeavyHitters and Contributing Classes . . . . . . . . . . . . . . . . . 140
5.2.3 L0-Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.3 Estimating Optimal Coverage of Max k-Cover . . . . . . . . . . . . . . . 144
5.3.1 Universe Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.4 (α, δ, η)-Oracle of Max k-Cover . . . . . . . . . . . . . . . . . . . . . . . 148
5.4.1 Multi-layered Set Sampling: ∃β ≤ α s.t. |𝒰cmn_β| ≥ ηβ|𝒰|/α . . . . . 150
5.4.2 Heavy Hitters and Contributing Classes: |𝒞(OPTlarge)| ≥ |𝒞(OPTsmall)| . 152
5.4.3 Element Sampling: |𝒞(OPTlarge)| < |𝒞(OPTsmall)| . . . . . . . . . . . 157
5.5 Approximating the Max k-Cover Problem in Edge Arrival Streams . . . . 161
5.6 Lower Bound for Estimating Max k-Cover in Edge Arrival Streams . . . 165
5.7 Chernoff Bound for Applications with Limited Independence . . . . . . . . . 167
5.7.1 An Application: Set Sampling with Θ(log(nm))-wise Independence . 168
5.8 Generalization of Section 5.4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 169
II Learning-Based Streaming Algorithms 178
6 The Frequency Estimation Problem 179
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.2.1 Algorithms for Frequency Estimation . . . . . . . . . . . . . . . . . . 184
6.2.2 Zipfian Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.3 Learning-Based Frequency Estimation Algorithms . . . . . . . . . . . . . . . 185
6.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.4.1 Tight Bounds for Count-Min with Zipfians . . . . . . . . . . . . . . . 187
6.4.2 Learned Count-Min with Zipfians . . . . . . . . . . . . . . . . . . . . 190
6.4.3 (Nearly) tight Bounds for Count-Sketch with Zipfians . . . . . . . . . 194
6.4.4 Learned Count-Sketch for Zipfians . . . . . . . . . . . . . . . . . . . . 201
6.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
6.5.1 Internet Traffic Estimation . . . . . . . . . . . . . . . . . . . . . . . . 204
6.5.2 Search Query Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.5.3 Analyzing Heavy Hitter Models . . . . . . . . . . . . . . . . . . . . . 209
7 The Low-Rank Approximation Problem 211
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
7.1.1 Our Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
7.1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
7.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
7.3 Training Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
7.4 Worst Case Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7.5.1 Average test error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7.5.2 Comparing Random, Learned and Mixed . . . . . . . . . . . . . . . . 222
7.6 The case of k = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.7 Optimization Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.8 Generalization Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
7.8.1 Generalization Bound for Robust Solutions . . . . . . . . . . . . . . . 228
List of Figures
2.1.1 A collection of n²/4 distinct rectangles (i.e., sets), each containing two points. 38
2.5.1 (a) shows an example of the communication Set Chasing(4, 3) and (b) is an
instance of the communication Intersection Set Chasing(4, 3). . . . . . . . . 55
2.5.2 The gadgets used in the reduction of the communication Intersection Set
Chasing problem to the communication Set Cover problem. (a) and (b) show
the construction of the gadget for players 1 to n, and (c) and (d) show the
construction of the gadget for players n + 1 to 2n. . . . . . . . . . . . . . . 56
2.5.3 In (b), two Set Chasing instances merge in their first set of vertices, and (c)
shows the corresponding gadgets of these merged vertices in the communication
Set Cover problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.5.4 In (a), path p is shown with black dashed arcs, and (b) shows the corresponding
cover of path p. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2.1 LP relaxation of Set Cover. . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2.2 LP relaxations of the feasibility variant of set cover and of general covering
problems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.1 LP relaxation of weighted Max k-Cover. . . . . . . . . . . . . . . . . . . . 79
4.5.1 A basic slab and an example of a swapping operation. . . . . . . . . . . . . . 123
4.5.2 An example structure of a Yes instance; all elements are covered by the first 3
sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.6.1 Tables illustrating the representation of a slab under EltOf and SetOf
before and after a swap; cells modified by swap(e_2, e_4) between S_3 and S_9
are highlighted in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.3.1 Pseudo-code and block-diagram representation of our algorithms . . . . . . . 185
6.5.1 Frequency of Internet flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.5.2 Comparison of our algorithms with Count-Min and Count-Sketch on Internet
traffic data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.5.3 Frequency of search queries . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.5.4 Comparison of our algorithms with Count-Min and Count-Sketch on search
query data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.5.5 ROC curves of the learned models. . . . . . . . . . . . . . . . . . . . . . . . 209
7.3.1 The i-th iteration of the power method. . . . . . . . . . . . . . . . . . . . 216
7.5.1 Test error by datasets and sketching matrices for k = 10, m = 20 . . . . . 220
7.5.2 Test error for Logo (left), Hyper (middle) and Tech (right) when k = 10. . 220
7.5.3 Low rank approximation results for Logo video frame: the best rank-10 ap-
proximation (left), and rank-10 approximations reported by Algorithm 27 us-
ing a sparse learned sketching matrix (middle) and a sparse random sketching
matrix (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
List of Tables
2.1.1 Summary of past work and our results. Parameters ρ and ρ_g respectively
denote the approximation factor of off-line algorithms for Set Cover and its
geometric variant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.1 A summary of our algorithms and lower bounds. We use the following no-
tation: k ≥ 1 denotes the size of the optimum cover; α ≥ 1 denotes a pa-
rameter that determines the trade-off between the approximation quality and
query/time complexities; ρ ≥ 1 denotes the approximation factor of a "black
box" algorithm for set cover used as a subroutine. . . . . . . . . . . . . . . 93
5.1.1 A summary of known results on the space complexity of single-pass stream-
ing algorithms for Max k-Cover. . . . . . . . . . . . . . . . . . . . . . . . . 137
5.4.1 The values of parameters used in our algorithms. . . . . . . . . . . . . . . . 149
6.4.1 The performance of different algorithms on streams with frequencies obeying
Zipf's law. k is the number of hash functions, B is the number of buckets, and
n is the number of distinct elements. The space complexity of all algorithms
is Θ(B). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.5.1 Test error in various settings. . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.5.2 Performance of mixed sketches . . . . . . . . . . . . . . . . . . . . . . . . . . 222
List of Algorithms
1 A tight streaming algorithm for the (unweighted) Set Cover problem. Here,
OfflineSC is an offline solver for Set Cover that provides a ρ-approximation,
and c is some appropriate constant. . . . . . . . . . . . . . . . . . . . . . . . 39
2 RecoverBit uses a protocol for (Many vs One)-Set Disjointness(m, n) to
recover Alice's sets ℱ_A on Bob's side. . . . . . . . . . . . . . . . . . . . . . 47
3 A streaming algorithm for the Points-Shapes Set Cover problem. . . . . . . 53
4 FracSetCover returns a (1 + ε)-approximate solution of SetCover-LP. . . 71
5 A generic implementation of FeasibilityTest. Its performance depends on
the implementations of Oracle, UpdateProb. We will investigate different
implementations of Oracle. . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 HeavySetOracle computes c_S for every set, given the set system in a stream
or stored memory, then returns the solution x that optimally places value ℓ
on the corresponding entry. It reports INFEASIBLE if there is no sufficiently
good solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7 MaxCoverOracle returns a fractional ℓ-cover with weighted coverage at
least 1 − β/3 w.h.p. if ℓ ≥ k. It provides no guarantee on its behavior if ℓ < k. 80
8 The Prune subroutine lifts a solution in ℱ to a solution in ℱ with the same
MaxCover-LP objective value and width 1. The subroutine returns z, the
amount by which members of ℱ cover each element. The actual pruned so-
lution x may be computed, but it has no further use in our algorithm and thus
is not returned. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9 An efficient implementation of FeasibilityTest, which performs p passes
and consumes Õ(mn^O(1/(pε)) + n) space. . . . . . . . . . . . . . . . . . . . 87
10 The procedure of constructing a modified instance of I*. . . . . . . . . . . . 100
11 IterSetCover is the main procedure of the SmallSetCover algorithm
for the Set Cover problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
12 OfflineSC(S, ℓ) invokes a black box that returns a cover of size at most ρℓ
(if there exists a cover of size ℓ for S) for the Set Cover instance that is the
projection of ℱ over S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
13 The description of the SmallSetCover algorithm. . . . . . . . . . . . . . . 117
14 A (ρ + ε)-approximation algorithm for the Set Cover problem. We assume that
the algorithm has access to EltOf and SetOf oracles for Set Cover(𝒰, ℱ),
as well as |𝒰| and |ℱ|. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
15 The procedure of constructing a modified instance of I*. . . . . . . . . . . . 131
16 A single-pass streaming algorithm that, given a vector f in a stream, for each
contributing class C, returns an index i along with a (1 ± 1/2)-estimate of
its frequency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
17 A single-pass streaming algorithm that uses an (α, δ, η)-oracle of Max k-Cover
to compute a (1/Õ(α))-approximation of the optimal coverage size of Max k-
Cover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
18 An (α, δ, η)-oracle of Max k-Cover. . . . . . . . . . . . . . . . . . . . . . . 150
19 An (α, δ, η)-oracle of Max k-Cover that handles the case in which the number
of common elements is large. . . . . . . . . . . . . . . . . . . . . . . . . . . 151
20 An (α, δ, η)-oracle of Max k-Cover that handles the case in which the ma-
jority of the coverage in an optimal solution is by the sets whose coverage
contributions are at least a 1/(sα) fraction of the optimal coverage size. . . . 156
21 A single-pass streaming algorithm that estimates the optimal coverage size of
Max k-Cover(𝒰, ℱ) within a factor of 1/Õ(α) in Õ(m/α²) space. To return an
Õ(kα)-cover that achieves the computed estimate, it suffices to change the line
marked by (⋆⋆) to return the corresponding Õ(kα)-cover as well (see Section 5.5). 160
22 A single-pass streaming algorithm that reports a (1/α)-approximate solution
of Max k-Cover(𝒰, ℱ) in Õ(β + k) space when the set of common elements is
large. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
23 An (α, δ, η)-oracle of Max k-Cover that reports a (1/Õ(α))-approximate k-cover. 164
24 An (α, δ, η)-oracle of Max k-Cover that handles the case in which the majority
of the coverage in an optimal solution is by the sets whose coverage contribu-
tions are at least a 1/(sα) fraction of the optimal coverage size. By modifying
the lines marked by (⋆⋆) slightly, the algorithm returns the k-cover that
achieves the returned coverage estimate (see Section 5.5 for more explanation). 172
25 A single pass streaming algorithm. . . . . . . . . . . . . . . . . . . . . . . . 176
26 Learning-Based Frequency Estimation . . . . . . . . . . . . . . . . . . . . . 185
27 Rank-k approximation of a matrix A using a sketch matrix S (refer to Sec-
tion 4.1.1 of [52]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
28 Differentiable implementation of SVD . . . . . . . . . . . . . . . . . . . . . . 216
Chapter 1
Introduction
In recent decades, massive datasets have arisen in numerous application areas such as genomics,
finance, social networks and the Internet of Things. The main challenge of massive datasets is that
traditional data-processing architectures are not capable of analyzing them efficiently.
This state of affairs has led to the emergence of new computational models for data analysis.
In particular, since a large fraction of datasets is generated as streams, algorithms for such
data have been studied extensively in the streaming model. Formally, given a
sequence of n items, the goal is to compute the value of a given function of the input items
in a small number of passes and using an amount of space sublinear in n.
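As a concrete illustration of this model (a standard textbook example, not a result of this thesis), the Misra-Gries summary estimates item frequencies in a single pass using a number of counters that is independent of the stream length:

```python
def misra_gries(stream, k):
    """One-pass frequency summary using at most k - 1 counters.

    For every item x, the estimate never exceeds the true count and
    undercounts it by at most len(stream) / k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# On a stream of 12 items dominated by "a", the summary retains "a".
summary = misra_gries(["a", "b", "a", "c", "a", "d", "a", "b", "a", "e", "a", "a"], k=3)
```

With k = 3 the summary keeps at most two counters, yet the heavy item "a" (true count 7) survives with additive error at most 12/3 = 4, illustrating how one pass and sublinear space can still answer a global question about the stream.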
In this thesis, we advance the development of streaming algorithms in two new directions:
(a) designing streaming algorithms for new classes of problems and (b) designing streaming
algorithms using modern machine learning approaches.
The streaming model has been studied since the early 80s. The focus of the early work was
on processing numerical data, such as estimating heavy hitters, the number of distinct elements
and quantiles [74, 35, 13, 129]. Over the last two decades, there has been a significant body
of work on designing streaming algorithms for discrete optimization, and in particular graph
problems [102, 143, 73]. However, until recently not much was known about the complexity,
in the streaming setting, of coverage problems, a well-studied class of problems in discrete
optimization. In the first part of this thesis (Chapters 2-5), we provide optimal
streaming algorithms for several variants of set cover and maximum coverage, two
well-studied problems in discrete optimization.
Since classical algorithms have formal worst-case guarantees, they tend to work
equally well on all inputs. However, in most applications the underlying data has certain
properties that, if discovered and harnessed, could enable much better performance. Inspired
by the success of machine learning, in the second part of this thesis (Chapters 6-7) we design
such "learning-based" algorithms for two fundamental tasks in data analysis: frequency
estimation and low-rank approximation. We remark that these are the first learning-based
streaming algorithms. For both problems, we design learning-based algorithms that,
in theory and in practice, achieve better performance (e.g., space complexity or approximation
quality) than the existing algorithms, as long as the input stream exhibits certain patterns. At
the same time, the algorithms (usually) retain the worst-case guarantees of the existing
algorithms for these problems, even on adversarial inputs.
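To make the general approach concrete, here is a minimal, hypothetical sketch of a learned Count-Min-style scheme in Python. The predictor `predict_heavy` is a stand-in for a learned oracle (its name and interface are assumptions for this illustration, not the algorithms of Chapters 6-7): items it flags as heavy get exact dedicated counters, so they never collide with the tail inside the hashed table.

```python
import random

class LearnedCountMin:
    """Illustrative learning-based sketch (not the thesis code): a predictor
    routes likely-heavy items to exact counters, while the remaining items
    share a small Count-Min table of `depth` rows and `width` buckets."""

    def __init__(self, width, depth, predict_heavy, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.salts = [rng.randrange(1 << 30) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.exact = {}                     # dedicated counters for predicted-heavy items
        self.predict_heavy = predict_heavy  # hypothetical learned oracle

    def _hash(self, item, row):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        if self.predict_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + count
        else:
            for row in range(self.depth):
                self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        if item in self.exact:
            return self.exact[item]
        # Standard Count-Min: the row minimum over-estimates with one-sided error.
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))
```

If the predictor is accurate, the heavy items incur zero error while the table is reserved for the tail; if the predictor is wrong, the scheme degrades to an ordinary Count-Min sketch, mirroring the "best of both worlds" behavior described above.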
1.1. Streaming Algorithms for Coverage Problems
A fundamental class of problems in the area of combinatorial optimization involves minimiz-
ing a given cost function under a set of covering constraints. Perhaps the most well-studied
problem in this family is set cover: given a collection 𝒮 of sets whose union is [n], the goal
is to identify the smallest sub-collection of 𝒮 whose union is [n]. In the presence of massive
data, new techniques are required to solve even this classic problem.
The first part of this thesis is devoted to the design of efficient sublinear (space or time)
algorithms for Set Cover and the closely related Max k-Cover, here collectively called
coverage problems, which have applications in many areas, including operations research,
machine learning, information retrieval and data mining [86, 155, 48, 2, 115, 26, 117].
Although both Set Cover and Max k-Cover are NP-complete, a natural greedy algorithm,
which iteratively picks the "best" remaining set, is provably the optimal algorithm for these
problems unless P = NP [72, 152, 14, 138, 63], and it is widely used in practice. The algorithm
often finds solutions that are very close to optimal. Unfortunately, due to its sequential na-
ture, this algorithm does not scale very well to massive data sets (e.g., see Cormode et al. [56]
for an experimental evaluation). This difficulty has motivated a considerable research effort
whose goal is to design algorithms that are capable of handling large data efficiently on
modern architectures. Of particular interest are data stream algorithms, which compute the
solution using only a small number of sequential passes over the data and a limited memory.
The study of the streaming Set Cover problem was initiated in [155], and since then there has
been a large body of work on coverage problems in massive data analysis models, in particular
the streaming model [67, 61, 42, 97, 20, 29, 17, 105, 106, 107, 18, 123, 131, 19, 147, 5].
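For reference, the greedy algorithm described above admits a very short offline implementation (a plain Python sketch; note that it makes exactly the kind of random access to the sets that streaming algorithms must avoid):

```python
def greedy_set_cover(universe, sets):
    """Classic greedy: repeatedly pick the set covering the most uncovered
    elements; this gives an H_n ~ ln(n) approximation guarantee."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Random access to every set on every iteration -- fine offline,
        # prohibitively expensive in the streaming setting.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("universe is not coverable by the given sets")
        cover.append(best)
        uncovered -= sets[best]
    return cover

sets = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 2, 3, 4}]
greedy_set_cover({1, 2, 3, 4, 5}, sets)
```

On this toy instance greedy first takes {1, 2, 3, 4} (the largest gain) and then {4, 5}, returning a cover of size two.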
1.1.1. Streaming Algorithms for Set Cover
In the Set Cover problem, given a ground set of n elements 𝒰 = {e_1, . . . , e_n} and a family
of m sets ℱ = {S_1, . . . , S_m} where m ≥ n, the goal is to select a subset 𝒞 ⊆ ℱ such that 𝒞
covers 𝒰 and the number of sets in 𝒞 is as small as possible. In the streaming Set Cover
problem, as introduced in [155], the set of elements 𝒰 is stored in memory in advance;
the sets S_1, . . . , S_m are stored consecutively in a read-only repository, and an algorithm can
access the sets only by performing sequential scans of the repository. However, the amount
of read-write memory available to the algorithm is limited, and is smaller than the input
size (which could be as large as mn). The objective is to design efficient approximation
algorithms for the Set Cover problem that perform few passes over the data and use as
little memory as possible. In Chapter 2, we present an efficient streaming algorithm for this
problem, which has been proved to be optimal [17].
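To make the access model concrete, the following toy Python sketch (not the IterSetCover algorithm of Chapter 2) follows the same discipline: each pass is one sequential scan of the read-only repository, and only the set of uncovered elements and the chosen indices live in read-write memory. In pass p it keeps any set that still covers at least n/2^p uncovered elements; the callable `scan_sets` is a hypothetical handle that replays the repository in order.

```python
import math

def threshold_passes_set_cover(universe, scan_sets, max_passes=None):
    """Multi-pass streaming-style heuristic: scan_sets() replays the
    read-only repository; memory holds only `uncovered` and `cover`."""
    n = len(universe)
    uncovered = set(universe)
    cover = []
    passes = max_passes or max(1, math.ceil(math.log2(n)) + 1)
    threshold = n / 2
    for _ in range(passes):
        for i, s in enumerate(scan_sets()):   # one sequential scan
            gain = s & uncovered
            if len(gain) >= threshold:        # keep sets with large marginal gain
                cover.append(i)
                uncovered -= gain
        threshold /= 2                        # relax the bar for the next pass
        if not uncovered:
            break
    # One last scan: cover any leftovers set-by-set.
    for i, s in enumerate(scan_sets()):
        if not uncovered:
            break
        if s & uncovered:
            cover.append(i)
            uncovered -= s
    return cover, uncovered
```

The point of the sketch is the access pattern, not the approximation quality: each outer iteration touches the repository exactly once, front to back, as the streaming model requires.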
1.1.2. Streaming Algorithms for Fractional Set Cover
The LP relaxation of Set Cover is one of the most well-studied mathematical programs in ap-
proximation algorithm design. It is a continuous relaxation of the problem in which each set
S ∈ ℱ can be selected "fractionally", i.e., assigned a number x_S from [0, 1], such that for each
element e its "fractional coverage" Σ_{S : e ∈ S} x_S is at least 1, and the sum Σ_S x_S is minimized.
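Written out explicitly, the relaxation just described is the standard covering LP (restated here in display form for convenience):

```latex
\begin{align*}
\text{minimize}   \quad & \sum_{S \in \mathcal{F}} x_S \\
\text{subject to} \quad & \sum_{S \in \mathcal{F} :\, e \in S} x_S \ \ge\ 1
                          \qquad \forall e \in \mathcal{U}, \\
                        & 0 \le x_S \le 1 \qquad\qquad\ \ \forall S \in \mathcal{F}.
\end{align*}
```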
To the best of our knowledge, it is not known whether there exists an efficient and accurate
algorithm for this problem that uses only a logarithmic (or even a polylogarithmic) number of
passes. This state of affairs is perhaps surprising, given the many recent developments on fast
LP solvers [119, 174, 124, 12, 11, 167]. The only prior results on streaming Packing/Covering
LPs were presented in [6], which studied the LP relaxation of Maximum Matching.
In Chapter 3, we present the first (1 + ε)-approximation algorithm for fractional
Set Cover in the streaming model that uses a constant number of passes and a sublinear amount
of space.
1.1.3. Sublinear Algorithms for Set Cover
Another natural question related to streaming Set Cover is the following: "is it possible to
solve minimum set cover in sub-linear time?" This question was previously addressed in
[145, 172], who showed that one can design constant running-time algorithms by simulating
the greedy algorithm, under the assumption that the sets are of constant size and each
element occurs in a constant number of sets. However, those constant-time algorithms have
a few drawbacks: they only provide a mixed multiplicative/additive guarantee (the output
cover size is guaranteed to be at most k · ln n + εn), the dependence of their running times on
the maximum set size is exponential, and they only output the (approximate) minimum set
cover size, not the cover itself. From a different perspective, [119] (building on [84]) showed
that an O(1)-approximate solution to the fractional version of the problem can be found
in Õ(mk² + nk²) time,¹ where k denotes the size of the optimal cover. Combining this algorithm with randomized rounding yields an
O(log n)-approximate solution to Set Cover with the same complexity.
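The randomized-rounding step referred to above can be sketched as follows (a generic sketch of the standard technique, not the specific algorithm of [119]; the greedy patch-up at the end is one common way to guarantee feasibility, and all names are illustrative):

```python
import math
import random

def randomized_rounding(sets, x, n, c=2, seed=0):
    """Round a feasible fractional set cover x (x[i] in [0,1], every element
    fractionally covered to >= 1) into an integral cover. Each set is kept
    with probability min(1, c*ln(n)*x[i]); the expected cost is O(log n)
    times the fractional cost, and coverage holds with high probability."""
    rng = random.Random(seed)
    scale = c * math.log(max(n, 2))
    chosen = [i for i in range(len(sets)) if rng.random() < min(1.0, scale * x[i])]
    # patch any rarely-missed element greedily so the output is always feasible
    uncovered = set().union(*sets) - set().union(set(), *(sets[i] for i in chosen))
    for e in uncovered:
        chosen.append(next(i for i, s in enumerate(sets) if e in s))
    return sorted(set(chosen))
```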
In Chapter 4, we initiate a systematic study of the complexity of sub-linear time algorithms
for Set Cover with multiplicative approximation guarantees. Our upper bounds complement
the aforementioned result of [119] by presenting algorithms that are fast when k is
large, as well as algorithms that provide more accurate solutions (even with a constant-factor
approximation guarantee) using a sub-linear number of queries.² Equally importantly, we
establish nearly matching lower bounds, some of which even hold for estimating the optimal
cover size.
1.1.4. Streaming Algorithms for Maximum Coverage
In Max k-Cover, given a ground set U of n elements, a family of m sets F (each a subset of U),
and a parameter k, the goal is to select k sets in F whose union has the largest cardinality.
Moreover, Max k-Cover is an important problem in submodular maximization.
¹The method can be further improved to Õ(n + mk) (N. Young, personal communication).
²Note that polynomial-time algorithms with sub-logarithmic approximation are unlikely to exist.

The initial algorithms were developed in the set arrival model, where the input sets are
listed contiguously. This restriction is natural from the perspective of submodular optimization,
but limits the applicability of the algorithms.³ Avoiding this limitation can be difficult,
as streaming algorithms can no longer operate on sets as "unit objects". As a result, the
first maximum coverage algorithms for the general edge arrival model, where (set, element)
pairs can arrive in arbitrary order, were developed only recently. In particular, [29] was the
first to (explicitly) present a one-pass algorithm with space linear in m and a constant approximation
factor. We remark that many of the prior bounds (both upper and lower) on
set cover and Max k-Cover in set-arrival streams also hold in edge-arrival streams
(e.g., [61, 97, 20, 131, 17, 105]).
A particularly interesting line of research on set-arrival streaming Set Cover and Max
k-Cover is the design of efficient algorithms that use only Õ(n) space [155, 22, 67, 42, 131]. Previous
work has shown that the existing greedy algorithm for Max k-Cover can be adapted to
achieve a constant-factor approximation in Õ(n) space [155, 22], later improved to Õ(k)
by [131]. However, the complexity of the problem in the "low space" regime is very different
in edge-arrival streams: [29] showed that as long as the approximation factor is a constant,
any algorithm must use Ω(m) space. Still, our understanding of approximation/space tradeoffs
in the general case is far from complete. In Chapter 5, we provide tight bounds on the
approximation/space tradeoff of Max k-Cover in general edge-arrival streams.
1.2. Learning-Based Streaming Algorithms
Classical algorithms are best known for providing formal guarantees over their performance,
but often fail to leverage useful patterns in their input data to improve their output. However,
in most applications, the underlying data has certain properties that can lead to much better
performance if exploited by algorithms. On the other hand, machine learning approaches (in
particular, deep learning models) are highly successful at capturing and utilizing complex
data patterns, but often lack formal error bounds. The last few years have witnessed a
growing effort to bridge this gap and introduce algorithms that can adapt to data properties
while delivering worst-case guarantees. For example, "learning-based" models have been
integrated into the design of data structures [120, 137, 91], online algorithms [128, 151, 81],
³For example, consider a situation where the sets correspond to neighborhoods of vertices in a directed graph. Depending on the input representation, for each vertex, either the ingoing edges or the outgoing edges might be placed non-contiguously.
graph optimization [59], similarity search [156, 169] and compressive sensing [33]. This
learning-based approach to algorithm design has attracted considerable attention due to its
potential to significantly improve the efficiency of some of the most widely used algorithmic
tasks.
Indeed, many applications involve processing streams of data (e.g., videos, data logs,
customer activities and social media) by executing the same algorithm on an hourly, daily or
weekly basis. These data sets are typically not "random" or "worst-case"; instead, they come
from some distribution with useful properties which does not change rapidly from execution
to execution. This makes it possible to design better streaming algorithms tailored to the
specific data distribution, trained on past instances of the problem. In the second part of
this thesis, we develop the first learning-based algorithms in the streaming model, designing
such algorithms for two basic problems in this area: frequency estimation (in Chapter 6) and
low-rank approximation (in Chapter 7). For both of these problems, we design learning-based
algorithms that empirically (and provably) achieve better performance (e.g., space
complexity or approximation quality) than the existing algorithms when the input stream
exhibits certain patterns, while matching the worst-case guarantees of the existing algorithms
even on adversarial inputs.
1.2.1. Learning-Based Algorithms for Frequency Estimation
Estimating the frequencies of elements in a data stream is one of the most fundamental sub-
routines in data analysis. It has applications in many areas of machine learning, including
feature selection [4], ranking [66], semi-supervised learning [164] and natural language processing [82]. It has also been used for network measurements [70, 175, 127] and security [158].
Frequency estimation algorithms have been implemented in popular data processing libraries,
such as Algebird at Twitter [36]. They can answer practical questions like: what are the
most searched words on the Internet? or how much traffic is sent between any two machines
in a network?
The frequency estimation problem is formalized as follows: given a sequence S of elements
from some universe U, for any element i ∈ U, estimate f_i, the number of times i occurs in S.
If one could store all arrivals from the stream S, one could sort the elements and compute
their frequencies. However, in big data applications, the stream is too large (and may be
infinite) and cannot be stored. This challenge has motivated the development of streaming
algorithms, which read the elements of ๐ in a single pass and compute a good estimate of
the frequencies using a limited amount of space. Over the last two decades, many such
streaming algorithms have been developed, including Count-Sketch [43], Count-Min [57] and
multi-stage filters [70]. The performance guarantees of these algorithms are well-understood,
with upper and lower bounds matching up to logarithmic factors [110].

However, such streaming algorithms typically assume generic data and do not leverage
useful patterns or properties of their input. For example, in text data, the word frequency
is known to be inversely correlated with the length of the word. Analogously, in network
data, certain applications tend to generate more traffic than others. If such properties
can be harnessed, one could design frequency estimation algorithms that are much more
efficient than the existing ones. Yet, it is important to do so in a general framework that
can harness various useful properties, instead of using handcrafted methods specific to a
particular pattern or structure (e.g., word length, application type). In Chapter 6, we
introduce "learning-based" frequency estimation streaming algorithms.
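To make the classical baseline concrete, here is a minimal Count-Min sketch [57] in Python (the width/depth parameters and the hash construction are illustrative choices, not the thesis's implementation):

```python
import hashlib

class CountMin:
    """Count-Min sketch [57]: d rows of w counters. Estimates never
    undercount; with row width w the additive overcount is about 2N/w per
    row, and taking the min over d rows makes large errors unlikely."""
    def __init__(self, width, depth):
        self.w, self.d = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # one independent-looking hash per row, derived from a keyed digest
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.w

    def add(self, item, count=1):
        for r in range(self.d):
            self.table[r][self._hash(item, r)] += count

    def estimate(self, item):
        # min over rows removes most of the collision noise
        return min(self.table[r][self._hash(item, r)] for r in range(self.d))
```

The learning-based algorithms of Chapter 6 keep such a sketch as a fallback while routing elements predicted to be heavy to exact counters.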
1.2.2. Learning-Based Algorithms for Low-Rank Approximation
Low-rank approximation is one of the most widely used tools in massive data analysis, ma-
chine learning and statistics, and has been a subject of many algorithmic studies. Formally,
in low-rank approximation, given an n × d matrix A and a parameter k, the goal is to
compute a rank-k matrix

    [A]_k = argmin_{A′ : rank(A′) ≤ k} ‖A − A′‖_F .
In particular, multiple algorithms developed over the last decade use the "sketching"
approach; see e.g., [157, 171, 92, 52, 53, 144, 132, 34, 54]. The idea is to use efficiently computable random projections (a.k.a. "sketches") to reduce the problem size before performing
low-rank decomposition, which makes the computation more space and time efficient. For
example, [157, 52] show that if S is a random matrix of size m × n chosen from an appropriate
distribution,⁴ for m depending on ε, then one can recover a rank-k matrix A′ such that

    ‖A − A′‖_F ≤ (1 + ε) ‖A − [A]_k‖_F

by performing an SVD on SA ∈ R^{m×d} followed by some post-processing. Typically the
sketch length m is small, so the matrix SA can be stored using little space (in the context of
streaming algorithms) or efficiently communicated (in the context of distributed algorithms).
Furthermore, the SVD of SA can be computed efficiently, especially after another round of
sketching, reducing the overall computation time.
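The sketch-and-solve template just described can be illustrated in a few lines of NumPy (a generic sketch of the approach, with a plain Gaussian sketch matrix; it is not the exact algorithm or post-processing of [157, 52]):

```python
import numpy as np

def sketch_low_rank(A, k, m, seed=0):
    """Sketch-and-solve low-rank approximation: compress A (n x d) to SA
    (m x d) with a random Gaussian sketch S, project A onto the row space
    of SA, then truncate the projection to rank k via a small SVD."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)        # sketch matrix
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)
    P = A @ Vt.T @ Vt                                   # project A onto rowspace(SA)
    U, s, Wt = np.linalg.svd(P, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Wt[:k]                  # best rank-k of the projection
```

When m is somewhat larger than k, the row space of SA captures (approximately) the top row space of A, which is what makes the compressed SVD useful.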
In light of the success of learning-based approaches, it is natural to ask whether similar
improvements in performance could be obtained for other sketch-based algorithms, notably
for low-rank decompositions. In particular, reducing the sketch length m while preserving
its accuracy would make sketch-based algorithms more efficient. Alternatively, one could
make sketches more accurate for the same value of m. In Chapter 7, we design the first
"learning-based" (streaming) algorithm for the low-rank approximation problem.
⁴Initial algorithms used matrices with independent sub-gaussian entries or randomized Fourier/Hadamard matrices [157, 171, 92]. Starting from the seminal work of [53], researchers began to explore sparse binary matrices, see e.g., [144, 132].
Chapter 2
Streaming Set Cover
2.1. Introduction
In the streaming Set Cover problem as defined in [155], the set of elements U = {e_1, ..., e_n}
is stored in memory in advance; the sets S_1, ..., S_m are stored consecutively in a read-only
repository and an algorithm can access the sets only by performing sequential scans
of the repository. However, the amount of read-write memory available to the algorithm is
limited, and is smaller than the input size (which could be as large as mn). The objective is
to design an efficient approximation algorithm for the Set Cover problem that performs few
passes over the data, and uses as little memory as possible.
The last few years have witnessed a rapid development of new streaming algorithms for
the Set Cover problem, in both theory and applied communities, see [155, 56, 122, 67, 61, 42].
Table 2.1.1 presents the approximation and space bounds achieved by those algorithms, as
well as the lower bounds.¹
2.1.1. Related Work
The semi-streaming Set Cover problem was first studied by Saha and Getoor [155]. Their
result for the Max k-Cover problem implies an O(log n)-pass O(log n)-approximation algorithm
for the Set Cover problem that uses O(n²) space. Adapting the standard greedy algorithm of
Set Cover with a thresholding technique leads to an O(log n)-pass O(log n)-approximation using

¹Note that the simple greedy algorithm can be implemented by either storing the whole input (in one pass), or by iteratively updating the set of yet-uncovered elements (in at most n passes).
Result                               | Approximation     | Passes        | Space       | Note
-------------------------------------|-------------------|---------------|-------------|--------------------------------
Greedy Algorithm                     | ln n              | 1             | O(mn)       | Deterministic
Greedy Algorithm                     | ln n              | n             | Õ(n)        | Deterministic
[155]                                | O(log n)          | O(log n)      | O(n² ln n)  | Deterministic
[67]                                 | O(√n)             | 1             | Θ̃(n)        | Randomized (tight bounds) & deterministic algorithm
[42]                                 | O(n^δ / δ)        | 1/δ − 1       | Θ̃(n)        | Randomized (tight bounds) & deterministic algorithm
[146]                                | (1/2) log n       | O(log n)      | Ω(m)        | Randomized
[61]                                 | O(4^{1/δ} ρ)      | O(4^{1/δ})    | Õ(mn^δ)     | Randomized
[61]                                 | (1/2) log n       | O(1)          | Ω(mn)       | Deterministic
Theorem 2.2.8                        | O(ρ/δ)            | 2/δ           | Õ(mn^δ)     | Randomized
Theorem 2.3.8                        | 3/2               | 1             | Ω(mn)       | Randomized
Theorem 2.5.4                        | 1                 | 1/(2δ) − 1    | Ω(mn^δ)     | Randomized
Geometric Set Cover (Theorem 2.4.6)  | O(ρ_g)            | O(1)          | Õ(n)        | Randomized
s-Sparse Set Cover (Theorem 2.6.6)   | 1                 | 1/(2δ) − 1    | Ω(ns)       | Randomized

Table 2.1.1: Summary of past work and our results. Parameters ρ and ρ_g respectively denote the approximation factor of off-line algorithms for Set Cover and its geometric variant.

Õ(n) space. In the Õ(n) space regime, Emek and Rosén studied the design of one-pass streaming
algorithms for the Set Cover problem [67] and gave a deterministic greedy-based O(√n)-approximation
for the problem. Moreover, they proved that their algorithm is tight, even for
randomized algorithms. The lower/upper bound results of [67] also apply to a generalization
of the Set Cover problem, the ε-Partial Set Cover(U, F) problem, in which the goal is to
cover a (1 − ε) fraction of the elements of U and the size of the solution is compared to the size of an
optimal cover of Set Cover(U, F). Very recently, Chakrabarti and Wirth extended the result
of [67] and gave a multi-pass trade-off streaming algorithm for the Set Cover problem
[42]. They gave a deterministic algorithm that makes p passes over the data stream and returns a
(p+1)n^{1/(p+1)}-approximate solution of the Set Cover problem in Õ(n) space. Moreover, they
proved that achieving an approximation factor of 0.99 n^{1/(p+1)}/(p+1)² in p passes using Õ(n) space is not possible even
for randomized protocols, which shows that their algorithm is tight up to a factor of (p+1)³.
Their result also applies to the ε-Partial Set Cover problem.
In a different regime, first studied by Demaine et al. [61], the goal is to design
"low" approximation algorithms (depending on the computational model, this could mean O(log n)
or O(1)) using the smallest possible amount of space. They proved that any constant-pass deterministic
((log n)/2)-approximation algorithm for the Set Cover problem requires Ω(mn) space. This
shows that, unlike the results in the Õ(n)-space regime, obtaining a sublinear-space "low" approximation
streaming algorithm for the Set Cover problem in a constant number of passes requires
randomness. Moreover, [61] presented an O(4^{1/δ})-approximation algorithm that
makes O(4^{1/δ}) passes and uses Õ(mn^δ) memory.
The Set Cover problem is NP-hard even in restricted instances with
points in R² as elements and geometric objects (either all disks, axis-parallel rectangles,
or fat triangles) in the plane as sets [71, 75, 99]. As a result, there has been a large body of
work on designing approximation algorithms for geometric Set Cover problems; see, for
example, [142, 3, 15, 51] and the references therein.
2.1.2. Our Results
Despite the progress outlined above, some basic questions remain open. In
particular:
(A) Is it possible to design a single pass streaming algorithm with a "low" approximation
factor² that uses sublinear (i.e., o(mn)) space?
(B) If such single pass algorithms are not possible, what are the achievable trade-offs
between the number of passes and space usage?
(C) Are there special instances of the problem for which more efficient algorithms can be
designed?
²Note that the lower bound in [61] excluded this possibility only for deterministic algorithms, while the upper bounds in [67, 42] suffered from a polynomial approximation factor.
In this chapter, we make significant progress on each of these questions. Our upper and
lower bounds are depicted in Table 2.1.1.

On the algorithmic side, we give an O(1/δ)-pass algorithm with strongly sub-linear
Õ(mn^δ) space and a logarithmic approximation factor. This yields a significant improvement
over the earlier algorithm of Demaine et al. [61], which used an exponentially larger number of
passes. The trade-off offered by our algorithm matches the lower bound of Nisan [146] that
holds at the endpoint of the trade-off curve, i.e., for δ = Θ(1/log n), up to poly-logarithmic
factors in space.³ Furthermore, our algorithm is very simple and succinct, and therefore easy
to implement and deploy.
Our algorithm exhibits a natural tradeoff between the number of passes and space, which
resembles tradeoffs achieved for other problems [87, 88, 90]. It is thus natural to conjecture
that this tradeoff might be tight, at least for "low enough" approximation factors. We present
the first step in this direction by showing a lower bound for the case when the approximation
factor is equal to 1, i.e., the goal is to compute the optimal set cover. In particular, via an
information-theoretic argument, we show that any streaming algorithm that computes an optimal set
cover using (1/(2δ) − 1) passes must use Ω(mn^δ) space (even assuming exponential computational
power) in the regime m = O(n). Furthermore, we show that a stronger lower bound holds
if all the input sets are sparse, that is, if their cardinality is at most s. We prove a lower
bound of Ω(ns) for s = O(n^δ) and m = O(n).
We also consider the problem in the geometric setting, in which the elements are points in
R² and the sets are either discs, axis-parallel rectangles, or fat triangles in the plane. We show
that a slightly modified version of our algorithm achieves the optimal Õ(n) space to find an
O(ρ_g)-approximation in O(1) passes.
Finally, we show that any randomized one-pass algorithm that distinguishes between
covers of size 2 and 3 must use a linear (i.e., Ω(mn)) amount of space. This is the first
result showing that a randomized, approximate algorithm cannot achieve a sub-linear space
bound.

Recently, Assadi et al. [20] generalized this lower bound to any approximation ratio α =
o(√n). More precisely, they showed that approximating Set Cover within any factor α =
o(√n) in a single pass requires Ω(mn/α) space.

³Note that to achieve a logarithmic approximation ratio, we can use an off-line algorithm with approximation ratio ρ = 1, i.e., one that runs in exponential time (see Theorem 2.2.8).
Our techniques: basic idea. Our algorithm is based on the idea that whenever a large
enough set is encountered, we can immediately add it to the cover. Specifically, we guess
(up to a factor of two) the size k of the optimal cover. Thus, a set is "large" if it covers at
least a 1/k fraction of the remaining elements. A small set, on the other hand, can cover
only a "few" elements, and we can store (approximately) which elements it covers by storing
(in memory) an appropriate random sample. At the end of the pass, we have (in memory)
the projections of the "small" sets onto the random sample, and we compute the optimal set
cover for this projected instance using an offline solver. By carefully choosing the size of
the random sample, this guarantees that only a small fraction of the set system remains
uncovered. The algorithm then makes an additional pass to find the residual set system
(i.e., the yet-uncovered elements), making two passes in each iteration, and continuing to
the next iteration.

Thus, one can think of the algorithm as being based on a simple iterative "dimensionality
reduction" approach. Specifically, in two passes over the data, the algorithm selects a
"small" number of sets that cover all but an n^{−δ} fraction of the uncovered elements, while using
only Õ(mn^δ) space. By performing the reduction step 1/δ times we obtain a complete cover.
The dimensionality reduction step is implemented by computing a small cover for a random
subset of the elements, which also covers the vast majority of the elements in the ground set.
This ensures that the remaining sets, when restricted to the random subset of the elements,
occupy only Õ(mn^δ) space. As a result, the procedure avoids the complex set of recursive calls
used by Demaine et al. [61], which leads to a simpler and more efficient algorithm.
Geometric results. Further, using techniques and results from computational geometry,
we show how to modify our algorithm so that it achieves almost optimal bounds for the
Set Cover problem on geometric instances. In particular, we show that it gives an O(1)-pass
O(ρ_g)-approximation algorithm using Õ(n) space when the elements are points in R² and the
sets are either discs, axis-parallel rectangles, or fat triangles in the plane. In particular, we

Figure 2.1.1: A collection of n²/4 distinct rectangles (i.e., sets), each containing two points.

use the following surprising property of the set systems that arise out of points and disks:
the number of sets is nearly linear as long as one considers only sets that contain "a few"
points.
More surprisingly, this property extends, with a twist, to certain geometric range spaces
that might have a quadratic number of shallow ranges. Indeed, it is easy to construct an example of
n points in the plane where there are Ω(n²) distinct rectangles, each containing exactly
two points; see Figure 2.1.1. However, one can "split" such ranges into a small number of
canonical sets, such that the number of shallow sets in the canonical set system is near linear.
This enables us to store the small canonical sets encountered during the scan explicitly in
memory, and still use only near-linear space.

We note that the idea of splitting ranges into small canonical ranges is an old idea in
orthogonal range searching. It was used by Aronov et al. [15] for computing small ε-nets for
these range spaces. The idea, in the form we use, was further formalized by Ene et al. [68].
Lower bounds. The lower bounds for multi-pass algorithms for the Set Cover problem
are obtained via a careful reduction from Intersection Set Chasing. The latter is a
communication complexity problem in which p players need to solve a certain "set-disjointness-like"
problem in p rounds. A recent paper [90] showed that this problem requires
n^{1+Ω(1/p)} / p^{O(1)} bits of communication in p rounds. This yields our desired trade-off of Ω(mn^δ)
space in 1/(2δ) passes for exact protocols for Set Cover in the communication model, and hence
in the streaming model, for m = O(n). Furthermore, we show a stronger lower bound on the
memory space for sparse instances of Set Cover, in which all input sets have cardinality at
most s. By a reduction from a variant of Equal Pointer Chasing, which maps the problem to
Algorithm 1 A tight streaming algorithm for the (unweighted) Set Cover problem. Here, OfflineSC is an offline solver for Set Cover that provides a ρ-approximation, and c is some appropriate constant.

 1: procedure IterSetCover((U, F), δ)
 2:     ▷ try (in parallel) all possible (2-approx) sizes of optimal cover
 3:     for k ∈ {2^i | 0 ≤ i ≤ log n} in parallel do          ▷ n = |U|
 4:         sol ← ∅
 5:         for j = 1 to 1/δ do
 6:             let S be a sample of U of size cρk n^δ log m log n
 7:             L ← S,  F_S ← ∅
 8:             for R ∈ F do                                   ▷ by doing one pass
 9:                 if |L ∩ R| ≥ |S|/k then                    ▷ size test
10:                     sol ← sol ∪ {R}
11:                     L ← L ∖ R
12:                 else
13:                     F_S ← F_S ∪ {R ∩ L}    ▷ store the set R ∩ L explicitly in memory
14:             D ← OfflineSC(L, F_S, k)
15:             sol ← sol ∪ D
16:             U ← U ∖ ⋃_{R ∈ sol} R          ▷ by doing an additional pass over the data
17:     return best sol computed in all parallel executions.
a sparse instance of Set Cover, we show that in order to have an exact streaming algorithm
for s-Sparse Set Cover with Õ(ns) space, Ω(log_s n) passes are necessary. More precisely, any
(1/(2δ) − 1)-pass exact randomized algorithm for s-Sparse Set Cover requires Ω(ns) memory
space, if s ≤ n^δ and m = O(n).
Our single pass lower bound proceeds by showing a lower bound for a one-way commu-
nication complexity problem in which one party (Alice) has a collection of sets, and the
other party (Bob) needs to determine whether the complement of his set is covered by one
of the Aliceโs sets. We show that if Aliceโs sets are chosen at random, then Bob can decode
Aliceโs input by employing a small collection of โqueryโ sets. This implies that the amount
of communication needed to solve the problem is linear in the description size of Aliceโs sets,
which is ฮฉ(๐๐).
2.2. Streaming Algorithm for Set Cover
In this section, we design an efficient streaming algorithm for the Set Cover problem that
matches the known lower bounds for the problem. In the Set Cover
problem, for a given set system (U, F), the goal is to find a subfamily I ⊆ F such that I covers
U and its cardinality is minimum. In Algorithm 1, we sketch the IterSetCover algorithm.
In the IterSetCover algorithm, we have access to the OfflineSC subroutine, which
solves the given Set Cover instance offline (using linear space) and returns a ρ-approximate
solution, where ρ could be anywhere between 1 and Θ(log n) depending on the computational
model one assumes. With exponential computational power, we can achieve an optimal
cover of the given Set Cover instance (ρ = 1); however, assuming P ≠ NP, ρ
cannot be better than c · ln n for some constant c [72, 152, 14, 138, 63] given polynomial
computational power.
Let n = |U| be the initial number of elements in the given ground set. The IterSetCover
algorithm needs to guess (up to a factor of two) the size of the optimal cover of
(U, F). To this end, the algorithm tries, in parallel, all values k ∈ {2^i | 0 ≤ i ≤ log n}. This
step only increases the memory requirement by a factor of log n.

Consider the run of the IterSetCover algorithm in which the guess k is correct (i.e.,
|OPT| ≤ k < 2|OPT|, where OPT is an optimal solution). The idea is to go through O(1/δ)
iterations such that each iteration makes only two passes, and at the end of each iteration the
number of uncovered elements decreases by a factor of n^δ. Moreover, the algorithm is allowed
to use Õ(mn^δ) space.
In each iteration, the algorithm starts with the current ground set of uncovered elements
U and copies it to a leftover set L. Let S be a large enough uniform sample of U.
In a single pass, using S, we estimate the size of each set in F and add large sets to
the solution sol immediately (thus avoiding the need to store them in memory). Formally, if a set
covers at least Ω(|U|/k) yet-uncovered elements, then it is a heavy set, and the algorithm
immediately adds it to the output cover. Otherwise, if a set R is small, i.e., it covers fewer than
|U|/k uncovered elements, the algorithm stores R in memory. Fortunately, it is
enough to store its projection onto the sampled elements explicitly (i.e., R ∩ L); this requires
remembering only the O(|S|/k) indices of the elements of R ∩ L.
In order to show that a solution of the Set Cover problem over the sampled elements
is a good cover of the initial Set Cover instance, we apply the relative (p, ε)-approximation
sampling result of [100] (see Definition 2.2.4), and it suffices for S to be of size Õ(ρk n^δ).
Using relative (p, ε)-approximation sampling, we show that after two passes the number of
uncovered elements is reduced by a factor of n^δ. Note that relative (p, ε)-approximation
sampling improves over the Element Sampling technique used in [61] with respect to the
number of passes.

Since in each iteration we pick O(ρk) sets and the number of uncovered elements decreases
by a factor of n^δ, after 1/δ iterations the algorithm picks O(ρk/δ) sets and covers all the elements.
Moreover, the memory space of the whole algorithm is Õ(mn^δ) (see Lemma 2.2.2).
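The two-pass iteration just described can be sketched in a few lines of Python (a toy in-memory rendition for intuition only, with the offline solver replaced by greedy and all names illustrative; it is not the thesis's implementation):

```python
import random

def iter_set_cover_round(universe, sets, k, sample_size, seed=0):
    """One round of the two-pass scheme sketched above: sample elements,
    add 'heavy' sets (covering >= |sample|/k sampled elements) directly to
    the solution, store only the projections of light sets, then cover the
    remaining sampled elements offline (greedy stands in for OfflineSC)."""
    rng = random.Random(seed)
    sample = set(rng.sample(sorted(universe), min(sample_size, len(universe))))
    leftover, sol, projections = set(sample), [], {}
    for i, s in enumerate(sets):                  # pass 1: size test
        if len(s & leftover) >= len(sample) / k:  # heavy: pick immediately
            sol.append(i)
            leftover -= s
        else:
            projections[i] = s & leftover         # light: keep projection only
    while leftover and projections:               # offline solve on the sample
        i = max(projections, key=lambda j: len(projections[j] & leftover))
        if not projections[i] & leftover:
            break
        sol.append(i)
        leftover -= projections[i]
    covered = set().union(set(), *(sets[i] for i in sol))
    return sol, universe - covered                # pass 2: residual instance
```

Running this round 1/δ times on the residual instance mirrors the outer loop of Algorithm 1 (for a single guess k, without the parallel guessing).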
2.2.1. Analysis of IterSetCover
In the rest of this section, we prove that the IterSetCover algorithm with high probability
returns an O(ρ/δ)-approximate solution of Set Cover(U, F) in 2/δ passes using Õ(mn^δ)
memory space.
Lemma 2.2.1. The number of passes the IterSetCover algorithm makes is 2/δ.
Proof: In each of the 1/δ iterations of the IterSetCover algorithm, the algorithm makes
two passes. In the first pass, based on the set of sampled elements S, it decides whether to
pick a set or to keep its projection onto S (i.e., R ∩ L) in memory. Then the algorithm calls
OfflineSC, which does not require any passes over F. The second pass computes
the set of uncovered elements at the end of the iteration. We need this pass because we
only know the projections of the sets picked in the current iteration onto S, and not onto
the original set of uncovered elements. Thus, in total we make 2/δ passes. Also note that
for the different guesses of the value of k, we run the algorithm in parallel, and hence the total
number of passes remains 2/δ. □
Lemma 2.2.2. The memory space used by the IterSetCover algorithm is Õ(mn^δ).
Proof: In each iteration of the algorithm, it picks during the first pass at most O(k) sets (more
precisely, at most k sets), which requires O(k log m) memory. Moreover, in the first pass we
keep the projections of the sets whose projection onto the uncovered sampled elements has
size at most |S|/k. Since there are at most m such sets, the total space required for storing
the projections is bounded by O(cρ m n^δ log m log n).

Since in the second pass the algorithm only updates the set of uncovered elements, the
amount of space required in the second pass is O(n). Thus, the total space required to
perform each iteration of the IterSetCover algorithm is Õ(mn^δ). Moreover, note that
the algorithm does not need to keep the memory used by the earlier iterations; thus,
the total space consumed by the algorithm is Õ(mn^δ). □
Next, we show that the sets picked before calling OfflineSC have large size with respect to U.

Lemma 2.2.3. With probability at least 1 − m^{−c}, all sets that pass the "size test" in the
IterSetCover algorithm have size at least |U|/(ρk).
Proof: Let R be a set of size less than |U|/(ρk). In expectation, |R ∩ S| is less than (|U|/(ρk)) ·
(|S|/|U|) = c n^δ log m log n. By the Chernoff bound, for large enough c,

    Pr[ |R ∩ S| ≥ cρ n^δ log m log n ] ≤ m^{−(c+1)}.

Applying the union bound, with probability at least 1 − m^{−c}, all sets passing the "size test" have
size at least |U|/(ρk). □
In what follows, we define the relative (p, ε)-approximation sample of a set system and state
the result of Har-Peled and Sharir [100] on the minimum number of sampled
elements required to obtain a relative (p, ε)-approximation of a given set system.
Definition 2.2.4. Let (V, R) be a set system, i.e., V is a set of elements and R ⊆ 2^V is a
family of subsets of the ground set V. For given parameters 0 < p, ε < 1, a subset Z ⊆ V is
a relative (p, ε)-approximation for (V, R) if for each r ∈ R we have: if |r| ≥ p|V|, then

    (1 − ε) |r|/|V| ≤ |Z ∩ r|/|Z| ≤ (1 + ε) |r|/|V|.

If the range is light (i.e., |r| < p|V|), then it is required that

    |r|/|V| − εp ≤ |Z ∩ r|/|Z| ≤ |r|/|V| + εp.

Namely, Z is a (1 ± ε)-multiplicative good estimator for the size of ranges that are at least a
p-fraction of the ground set.
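The two conditions of the definition are straightforward to verify for a concrete sample; a small checker (illustrative code, following the definition above):

```python
def is_relative_approx(V, ranges, Z, p, eps):
    """Check whether Z <= V is a relative (p, eps)-approximation for
    (V, ranges): heavy ranges (|r| >= p|V|) must be estimated within a
    (1 +- eps) multiplicative factor, light ranges within eps*p additive
    error, as in Definition 2.2.4."""
    nV, nZ = len(V), len(Z)
    for r in ranges:
        true_frac = len(r & V) / nV
        est_frac = len(r & Z) / nZ
        if true_frac >= p:                        # heavy range: multiplicative bound
            if not (1 - eps) * true_frac <= est_frac <= (1 + eps) * true_frac:
                return False
        elif not (true_frac - eps * p <= est_frac <= true_frac + eps * p):
            return False                          # light range: additive bound
    return True
```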
The following lemma is a simplified variant of a result of Har-Peled and Sharir [100];
indeed, a set system with m sets can have VC dimension at most log m. This simplified form
also follows by a somewhat careful but straightforward application of Chernoff's inequality.
Lemma 2.2.5. Let (๐ฐ ,โฑ) be a finite set system, and ๐, ๐, ๐ be parameters. Then, a random
sample of ๐ฐ such that |๐ฐ| = ๐โฒ
๐2๐
(log |โฑ| log 1
๐+ log 1
๐
), for an absolute constant ๐โฒ is a relative
(๐, ๐)-approximation, for all ranges in โฑ , with probability at least (1โ ๐).
Lemma 2.2.6. Assuming |OPT| ≤ k ≤ 2|OPT|, after any iteration, with probability at
least 1 - m^{1-c/4}, the number of uncovered elements decreases by a factor of n^δ, and the
iteration adds O(ρ|OPT|) sets to the output cover.
Proof: Let V ⊆ 𝒰 be the set of uncovered elements at the beginning of the iteration, and
note that the total number of sets picked during the iteration is at most (1 + ρ)k (see
Lemma 2.2.3). Consider all possible such covers, that is, 𝒢 = {ℱ' ⊆ ℱ : |ℱ'| ≤ (1 + ρ)k},
and observe that |𝒢| ≤ m^{(1+ρ)k}. Let ℛ be the collection that contains all possible sets
of uncovered elements at the end of the iteration, defined as ℛ = {V - ∪_{R∈D} R : D ∈ 𝒢}.
Moreover, set p = 2/n^δ, ε = 1/2, and q = m^{-c}, and note that |ℛ| ≤ |𝒢| ≤ m^{(1+ρ)k}. Since
(c'/(ε²p)) (log |ℛ| log(1/p) + log(1/q)) ≤ ckρn^δ log m log n = |S| for a large enough c, by
Lemma 2.2.5, S is a relative (p, ε)-approximation of (V, ℛ) with probability 1 - q. Let C ⊆ ℱ
be the collection of sets picked during the iteration, which covers all elements of S. Since S
is a relative (p, ε)-approximation sample of (V, ℛ) with probability at least 1 - m^{-c}, the
number of elements of V (or 𝒰) left uncovered by C is at most εp|V| ≤ |𝒰|/n^δ.
Hence, in each iteration we pick O(ρk) sets, and at the end of the iteration the number of
uncovered elements is reduced by a factor of n^δ. □
Lemma 2.2.7. The IterSetCover algorithm computes a set cover of (𝒰, ℱ) whose size
is within an O(ρ/δ) factor of the size of an optimal cover, with probability at least 1 - m^{1-c/4}.
Proof: Consider the run of IterSetCover for which the value of k is between |OPT| and
2|OPT|. In each of the 1/δ iterations made by the algorithm, by Lemma 2.2.6, the number
of uncovered elements decreases by a factor of n^δ, where n is the number of initial elements
to be covered by the sets. Moreover, the number of sets picked in each iteration is O(ρk).
Thus after 1/δ iterations all elements are covered, and the total number of sets in
the solution is O(ρ|OPT|/δ). Moreover, by Lemma 2.2.6, the success probability of all the
iterations together is at least 1 - (1/δ) m^{-c/4} ≥ 1 - (1/m)^{c/4 - 1}. □
Theorem 2.2.8. The IterSetCover(𝒰, ℱ, δ) algorithm makes 2/δ passes, uses Õ(mn^δ)
memory, and finds an O(ρ/δ)-approximate solution of the Set Cover problem with high
probability.
Furthermore, given a sufficient number of passes, the IterSetCover algorithm matches
the known lower bound on the memory of the streaming Set Cover problem up to a
polylog(m) factor, where m is the number of sets in the input.
Proof: The first part follows from Lemma 2.2.1, Lemma 2.2.2, and Lemma 2.2.7.
As for the lower bound, note that by a result of Nisan [146], any randomized ((log n)/2)-
approximation protocol for Set Cover(𝒰, ℱ) in the one-way communication model requires
Ω(m) bits of communication, no matter how many rounds it makes. This implies
that any randomized O(log n)-pass, ((log n)/2)-approximation algorithm for Set Cover(𝒰, ℱ)
requires Ω̃(m) space, even under the exponential computational power assumption.
By the above, the IterSetCover algorithm makes O(1/δ) passes and uses Õ(mn^δ)
space to return an O(1/δ)-approximate solution under the exponential computational power
assumption (ρ = 1). Thus, letting δ = Θ(1/log n), we obtain a ((log n)/2)-approximation
streaming algorithm using Õ(m) space, which is optimal up to a polylog(m) factor. □
Theorem 2.2.8 provides a strong indication that our trade-off algorithm is optimal.
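To make the trade-off concrete, the small sketch below (illustrative only: constants and polylog factors are dropped, and the function name is ours) tabulates the pass, space, and approximation bounds of Theorem 2.2.8 for a few choices of δ:

```python
import math

def iter_set_cover_tradeoff(n, m, delta, rho=1.0):
    """Pass/space/approximation trade-off of IterSetCover,
    ignoring constants and polylog factors."""
    passes = 2 / delta        # Theorem 2.2.8
    space = m * n ** delta    # ~ m * n^delta words
    approx = rho / delta      # O(rho/delta) approximation
    return passes, space, approx

n, m = 10**6, 10**6
for delta in (1 / 2, 1 / 4, 2 / math.log2(n)):
    p, s, a = iter_set_cover_tradeoff(n, m, delta)
    print(f"delta={delta:.4f}: passes={p:.0f}, space~{s:.2e}, approx~O({a:.1f})")
```

As δ shrinks toward Θ(1/log n), the space bound approaches Õ(m) while the pass count and approximation factor grow toward Θ(log n), matching the end of the proof above.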
2.3. Lower Bound for Single Pass Algorithms
In this section, we study the Set Cover problem in the two-party communication model and
give a tight lower bound on the communication complexity of randomized protocols
solving the problem in a single round. In two-party Set Cover, we are given a set of
elements 𝒰, and there are two players, Alice and Bob, each of whom has a collection of
subsets of 𝒰: ℱ_A and ℱ_B. Their goal is to find a minimum size cover C ⊆ ℱ_A ∪ ℱ_B
covering 𝒰 while communicating the fewest possible bits from Alice to Bob (in this model
Alice communicates to Bob, and then Bob reports a solution).
Our main lower bound result for single pass protocols for Set Cover is the following
theorem, which implies that the naive approach, in which one party sends all of its sets to
the other one, is optimal.
Theorem 2.3.1. Any single round randomized protocol that approximates Set Cover(𝒰, ℱ)
within a factor better than 3/2 with error probability O(m^{-c}) requires Ω(mn) bits of
communication, where n = |𝒰|, m = |ℱ|, and c is a sufficiently large constant.
We consider the case in which the parties want to decide whether there exists a cover of
size 2 for 𝒰 in ℱ_A ∪ ℱ_B. If either party alone has a cover of size at most 2 for 𝒰, then
the problem becomes trivial. Thus the question is whether there exist S_A ∈ ℱ_A and
S_B ∈ ℱ_B such that 𝒰 ⊆ S_A ∪ S_B.
A key observation is that to decide whether there exist S_A ∈ ℱ_A and S_B ∈ ℱ_B such that
𝒰 ⊆ S_A ∪ S_B, one can instead check whether there exist complements S̄_A and S̄_B such that
S̄_A ∩ S̄_B = ∅. In other words, we need to solve the OR of a series of two-party Set Disjointness
problems. In the two-party Set Disjointness problem, Alice and Bob are given subsets S_A and S_B
of 𝒰, and the goal is to decide whether S_A ∩ S_B is empty, with the fewest possible
bits of communication. Set Disjointness is a well-studied problem in communication
complexity, and it has been shown that any randomized protocol for Set Disjointness with
O(1) error probability requires Ω(n) bits of communication, where n = |𝒰| [27, 111, 153].
We can consider the following extensions of the Set Disjointness problem.
I. Many vs One: in this variant, Alice has m subsets of 𝒰, ℱ_A, and Bob is given a single set
S_B. The goal is to determine whether there exists a set S_A ∈ ℱ_A such that S_A ∩ S_B = ∅.
II. Many vs Many: in this variant, each of Alice and Bob is given a collection of subsets
of 𝒰, and the goal is to determine whether there exist S_A ∈ ℱ_A and S_B ∈ ℱ_B
such that S_A ∩ S_B = ∅.
Note that deciding whether two-party Set Cover has a cover of size 2 is equivalent to
solving the (Many vs Many)-Set Disjointness problem. Moreover, any lower bound for
(Many vs One)-Set Disjointness clearly implies the same lower bound for the (Many vs
Many)-Set Disjointness problem. In the following theorem we show that any single-round
randomized protocol that solves (Many vs One)-Set Disjointness(m, n) with O(m^{-c}) error
probability requires Ω(mn) bits of communication.
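The equivalence between covers of size 2 and disjointness of complements can be verified by brute force on small instances; the following sketch (with hypothetical helper names of ours) does exactly that:

```python
def has_cover_of_size_2(universe, fa, fb):
    """True iff some S_A in fa and S_B in fb satisfy universe ⊆ S_A ∪ S_B."""
    return any(universe <= (sa | sb) for sa in fa for sb in fb)

def complements_disjoint(universe, fa, fb):
    """The same test, phrased as Set Disjointness on the complements."""
    return any(not ((universe - sa) & (universe - sb))
               for sa in fa for sb in fb)

U = frozenset(range(6))
FA = [frozenset({0, 1, 2}), frozenset({0, 3})]
FB = [frozenset({3, 4, 5}), frozenset({1, 2})]
assert has_cover_of_size_2(U, FA, FB) == complements_disjoint(U, FA, FB)
print(has_cover_of_size_2(U, FA, FB))  # True: {0,1,2} ∪ {3,4,5} covers U
```

Solving the (Many vs Many) variant on the complemented families is therefore exactly the cover-of-size-2 question.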
Theorem 2.3.2. Any randomized protocol for (Many vs One)-Set Disjointness(m, n) with
error probability O(m^{-c}) requires Ω(mn) bits of communication if n ≥ c₁ log m, where
c and c₁ are large enough constants.
The idea is to show that if there exists a single-round randomized protocol for the problem
with o(mn) bits of communication and error probability O(m^{-c}), then with constant proba-
bility one can distinguish Ω(2^{mn}) distinct inputs using o(mn) bits, which is a contradiction.
Suppose that Alice has a collection of m uniformly and independently random subsets of
𝒰 (each element e ∈ 𝒰 belongs to each of her subsets with probability 1/2). Assume that
there exists a single round protocol Π for (Many vs One)-Set Disjointness(m, n) with error
probability O(m^{-c}) using o(mn) bits of communication. Let ExistsDisj be Bob's algorithm
in protocol Π. We show that one can recover mn random bits with constant probability
using the ExistsDisj subroutine and the message M sent by the first party in protocol Π.
RecoverBit, shown in Algorithm 2, is the algorithm that recovers the random bits using
protocol Π and ExistsDisj.
To this end, Bob receives the message M communicated by Alice in protocol Π and considers
all subsets of 𝒰 of size c₁ log m and c₁ log m + 1. Note that M is communicated only once, and
thus the same M is used for all queries that Bob makes. At each step, Bob picks a random
subset T_i of 𝒰 of size c₁ log m and solves the (Many vs One)-Set Disjointness problem with
input (ℱ_A, T_i) by running ExistsDisj(M, T_i). We show that if T_i is disjoint from a set
in ℱ_A, then with high probability there is exactly one set in ℱ_A that is disjoint from T_i
(see Lemma 2.3.3). Thus, once Bob finds out that his query T_i is disjoint from a set in ℱ_A,
he can query all sets T_i^{+e} ∈ {T_i ∪ {e} | e ∈ 𝒰 - T_i} and recover the set (or union of sets)
in ℱ_A disjoint from T_i. By a simple pruning step we can detect the candidates that are
unions of more than one set of ℱ_A and keep only the sets of ℱ_A.
In Lemma 2.3.6, we show that the number of queries that Bob needs to make to
recover ℱ_A is O(m^c), where c is a constant.
Lemma 2.3.3. Let T_i be a random subset of 𝒰 of size c log m and let ℱ_A be a collection
of m random subsets of 𝒰. The probability that there is exactly one set in ℱ_A that is
disjoint from T_i is at least 1/m^{c+1}.
Algorithm 2 RecoverBit uses a protocol for (Many vs One)-Set Disjointness(m, n) to
recover Alice's sets ℱ_A on Bob's side.
1: procedure RecoverBit(𝒰, M)
2:   ℱ_R ← ∅
3:   for i = 1 to m^c log m do
4:     let T_i be a random subset of 𝒰 of size c₁ log m
5:     if ExistsDisj(M, T_i) = true then
6:       ▷ discovering the set (or union of sets)
7:       ▷ in ℱ_A disjoint from T_i
8:       S ← ∅
9:       for e ∈ 𝒰 - T_i do
10:        if ExistsDisj(M, T_i ∪ {e}) = false then
11:          S ← S ∪ {e}
12:      if ∃ S' ∈ ℱ_R s.t. S ⊆ S' then   ▷ pruning step
13:        ℱ_R ← ℱ_R - {S'}
14:        ℱ_R ← ℱ_R ∪ {S}
15:      else if ∄ S' ∈ ℱ_R s.t. S' ⊆ S then
16:        ℱ_R ← ℱ_R ∪ {S}
17:  return ℱ_R
Proof: The probability that T_i is disjoint from exactly one set in ℱ_A is

Pr(T_i is disjoint from ≥ 1 set in ℱ_A) - Pr(T_i is disjoint from ≥ 2 sets in ℱ_A)
  ≥ (1/2)^{c log m} - (m choose 2) (1/2)^{2c log m} ≥ 1/m^{c+1}.

For the first term: for an arbitrary set S ∈ ℱ_A, since each element is contained in S with
probability 1/2, the probability that S is disjoint from T_i is (1/2)^{c log m}. Hence,

Pr(T_i is disjoint from at least one set in ℱ_A) ≥ 2^{-c log m} = m^{-c}.

Moreover, since there are (m choose 2) pairs of sets in ℱ_A, and for each S₁, S₂ ∈ ℱ_A the
probability that both S₁ and S₂ are disjoint from T_i is m^{-2c},

Pr(T_i is disjoint from at least two sets in ℱ_A) ≤ m^{-(2c-2)}. □
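A quick Monte Carlo experiment (illustrative only, with arbitrary small parameters of ours) shows that the unique-disjoint-set event of Lemma 2.3.3 indeed occurs with non-negligible probability:

```python
import random

def p_exactly_one_disjoint(n, m, t_size, trials=20000, seed=7):
    """Monte Carlo estimate of Pr[exactly one of m random sets
    is disjoint from a random query T of size t_size]."""
    rng = random.Random(seed)
    hits = 0
    universe = range(n)
    for _ in range(trials):
        sets = [{e for e in universe if rng.random() < 0.5} for _ in range(m)]
        t = set(rng.sample(universe, t_size))
        if sum(1 for s in sets if not (t & s)) == 1:
            hits += 1
    return hits / trials

# with m = 4 and |T| = log2(m) = 2 the event is fairly common:
# analytically 4 * (1/4) * (3/4)^3 ≈ 0.42
print(p_exactly_one_disjoint(n=16, m=4, t_size=2))
```

Each fixed set avoids a query of size t with probability (1/2)^t, matching the first term of the proof above.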
A family of sets ℳ is called intersecting if and only if for any sets A, B ∈ ℳ, either both
A - B and B - A are non-empty, or both A - B and B - A are empty; in other words, there
exist no A, B ∈ ℳ such that A ⊊ B. Let ℱ_A be a collection of subsets of 𝒰. We show
that with high probability, after testing O(m^c) queries for a sufficiently large constant c, the
RecoverBit algorithm recovers ℱ_A completely if ℱ_A is intersecting. First we show that
with high probability the collection ℱ_A is intersecting.
Observation 2.3.4. Let ℱ_A be a collection of m uniformly random subsets of 𝒰, where
|𝒰| ≥ c log m. With probability at least 1 - m^{-n/4+2}, ℱ_A is an intersecting family.
Proof: The probability that S₁ ⊆ S₂ is (3/4)^n, and there are at most m(m-1) pairs of sets in
ℱ_A. Thus with probability at least 1 - m²(3/4)^n ≥ 1 - 1/m^{n/4-2}, ℱ_A is intersecting. □
Observation 2.3.5. The number of distinct inputs for Alice (collections of m random subsets
of 𝒰) that are distinguishable by RecoverBit is 2^{Ω(mn)}.
Proof: There are 2^{mn} collections of m random subsets of 𝒰. By Observation 2.3.4, Ω(2^{mn})
of them are intersecting. Since we can only recover the sets in the input collection and not
their order, the number of distinct input collections distinguished by RecoverBit
is Ω(2^{mn}/m!), which is 2^{Ω(mn)} for n ≥ c log m. □
By Observation 2.3.4, and considering only the case in which ℱ_A is intersecting, we have the
following lemma.
Lemma 2.3.6. Let ℱ_A be a collection of m uniformly random subsets of 𝒰, and suppose
that |𝒰| ≥ c log m. After testing at most m^c queries, with probability at least (1 - 1/m) γ^{m^c},
ℱ_A is fully recovered, where γ is the success rate of protocol Π for the (Many vs One)-Set
Disjointness problem.
Proof: By Lemma 2.3.3, for each T_i ⊆ 𝒰 of size c₁ log m, the probability that T_i is disjoint
from exactly one set in a random collection of sets ℱ_A is at least 1/m^{c₁+1}. Given that T_i is
disjoint from exactly one set in ℱ_A, by the symmetry of the problem, the chance that T_i is
disjoint from a specific set S ∈ ℱ_A is at least 1/m^{c₁+2}. After αm^{c₁+2} log m queries, where α
is a large enough constant, for any S ∈ ℱ_A the probability that no query T_i is disjoint from
S alone is at most (1 - 1/m^{c₁+2})^{αm^{c₁+2} log m} ≤ e^{-α log m} = 1/m^α.
Thus, after trying αm^{c₁+2} log m queries, with probability at least 1 - 1/(2m^{α-1}) ≥ 1 - 1/m,
for each S ∈ ℱ_A we have at least one query that is disjoint from S alone (and from no other
set in ℱ_A - S).
Once we have a query subset T_i that is disjoint from a single set S ∈ ℱ_A only, we can
ask n - c log m queries of size c₁ log m + 1 and recover S. Note that if T_i is disjoint from more
than one set in ℱ_A simultaneously, this process (asking n - c log m queries of size c₁ log m + 1)
ends up recovering the union of those sets. Since ℱ_A is an intersecting family with
high probability (Observation 2.3.4), the pruning step of the RecoverBit algorithm
guarantees that what the algorithm returns at the end is exactly ℱ_A. Moreover, the
total number of queries the algorithm makes is at most

n × (αm^{c₁+2} log m) ≤ αm^{c₁+3} log m ≤ m^c

for c ≥ c₁ + 4.
Thus, after testing m^c queries, ℱ_A is recovered with probability at least (1 - 1/m) γ^{m^c},
where γ is the success probability of protocol Π for (Many vs One)-Set Disjointness(m, n). □
Corollary 2.3.7. Let Π be a protocol for (Many vs One)-Set Disjointness(m, n) with error
probability O(m^{-c}) and s bits of communication, where n ≥ c log m for a large enough c.
Then RecoverBit recovers ℱ_A with constant success probability using s bits of communi-
cation.
By Observation 2.3.5, since RecoverBit distinguishes 2^{Ω(mn)} distinct inputs with con-
stant probability of success (by Corollary 2.3.7), the size of the message sent by Alice must
be Ω(mn). This proves Theorem 2.3.2.
Proof of Theorem 2.3.1: As we showed earlier, the communication complexity of (Many vs
One)-Set Disjointness is a lower bound on the communication complexity of Set Cover.
Theorem 2.3.2 showed that any protocol for (Many vs One)-Set Disjointness(|ℱ_A|, n) with
error probability O(m^{-c}) requires Ω(mn) bits of communication. Thus, any single-
round randomized protocol for Set Cover with error probability O(m^{-c}) requires Ω(mn) bits
of communication. □
Since any p-pass streaming α-approximation algorithm for a problem P that uses O(s)
memory yields a p-round two-party α-approximation protocol for P using O(sp)
bits of communication [88], by Theorem 2.3.1 we obtain the following lower bound for the
Set Cover problem in the streaming model.
Theorem 2.3.8. Any single-pass randomized streaming algorithm for Set Cover(𝒰, ℱ) that
computes a (3/2)-approximate solution with probability at least 1 - O(m^{-c}) requires Ω(mn)
memory (assuming n ≥ c₁ log m).
2.4. Geometric Set Cover
In this section, we consider the streaming Set Cover problem in a geometric setting. We
present an algorithm for the case where the elements are a set of n points in the plane R²
and the m sets are either all disks, all axis-parallel rectangles, or all α-fat triangles (which for
simplicity we call shapes), given in a data stream. As before, the goal is to find a minimum
size cover of the points from the given sets. We call this problem the Points-Shapes Set Cover
problem.
Note that the description of each shape requires O(1) space, and thus the Points-Shapes
Set Cover problem is trivially solvable in O(n + m) space. In this setting the goal is to
design an algorithm whose space is sublinear in n + m. Here we show that almost the
same algorithm as IterSetCover (with slight modifications) uses Õ(n) space to find an
O(ρ)-approximate solution of the Points-Shapes Set Cover problem in a constant number of passes.
2.4.1. Preliminaries
A triangle is called α-fat (or simply fat) if the ratio between its longest edge and the height
on that edge is bounded by a constant α > 1 (there are several equivalent definitions of α-fat
triangles).
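Fatness is easy to check from coordinates; a minimal sketch (ours, not from the thesis):

```python
import math

def fatness(a, b, c):
    """Ratio of the longest edge to the triangle's height on that edge;
    the triangle is alpha-fat iff this ratio is at most alpha."""
    pts = (a, b, c)
    longest = max(math.dist(pts[i], pts[(i + 1) % 3]) for i in range(3))
    # twice the signed area via the cross product
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2
    height = 2 * area / longest  # height on the longest edge
    return longest / height

print(fatness((0, 0), (1, 0), (0.5, math.sqrt(3) / 2)))  # equilateral: ~1.155
print(fatness((0, 0), (10, 0), (5, 0.05)) > 100)         # thin sliver: True
```

An equilateral triangle has the smallest possible ratio, 2/√3, while slivers have unbounded ratios.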
Definition 2.4.1. Let (𝒰, ℱ) be a set system such that 𝒰 is a set of points and ℱ is a
collection of shapes in the plane R². The canonical representation of (𝒰, ℱ) is a collection ℱ'
of regions such that the following conditions hold. First, each S' ∈ ℱ' has an O(1) description.
Second, for each S' ∈ ℱ', there exists S ∈ ℱ such that S' ∩ 𝒰 ⊆ S ∩ 𝒰. Finally, for each
S ∈ ℱ, there exist sets S'_1, ..., S'_{c₁} ∈ ℱ' such that S ∩ 𝒰 = (S'_1 ∪ ··· ∪ S'_{c₁}) ∩ 𝒰, for
some constant c₁.
The following two results are from [68]; they formalize ideas from [15].
Lemma 2.4.2. (Lemma 4.18 in [68]) Given a set of points 𝒰 in the plane R² and a parameter
w, one can compute a set ℱ'_total of O(|𝒰|w² log |𝒰|) axis-parallel rectangles with the following
property. For an arbitrary axis-parallel rectangle S that contains at most w points of 𝒰, there
exist two axis-parallel rectangles S'_1, S'_2 ∈ ℱ'_total whose union has the same intersection with
𝒰 as S, i.e., S ∩ 𝒰 = (S'_1 ∪ S'_2) ∩ 𝒰.
Lemma 2.4.3. (Theorem 5.6 in [68]) Given a set of points 𝒰 in R², a parameter w, and
a constant α, one can compute a set ℱ'_total of O(|𝒰|w³ log² |𝒰|) regions, each having an O(1)
description, with the following property. For an arbitrary α-fat triangle S that contains at
most w points of 𝒰, there exist nine regions in ℱ'_total whose union has the same intersection
with 𝒰 as S.
Using the above lemmas we get the following.
Lemma 2.4.4. Let 𝒰 be a set of points in R² and let ℱ be a set of shapes (disks, axis-parallel
rectangles, or fat triangles) such that each set in ℱ contains at most w points of 𝒰. Then, in
a single pass over the stream of sets ℱ, one can compute the canonical representation ℱ' of
(𝒰, ℱ). Moreover, the size of the canonical representation is at most O(|𝒰|w³ log² |𝒰|), and
the space requirement of the algorithm is Õ(|ℱ'|) = Õ(|𝒰|w³).
Proof: For the case of axis-parallel rectangles and fat triangles, we first use Lemma 2.4.2 and
Lemma 2.4.3 to compute the set ℱ'_total offline, which requires O(|ℱ'_total|) = O(|𝒰|w³ log² |𝒰|)
memory. Then, making one pass over the stream of sets ℱ, we find the canonical
representation ℱ' by picking all sets S' ∈ ℱ'_total such that S' ∩ 𝒰 ⊆ S ∩ 𝒰 for some
S ∈ ℱ. For disks, we instead make one pass over the sets ℱ and keep a maximal subset
ℱ' ⊆ ℱ such that for each pair of sets S'_1, S'_2 ∈ ℱ', their projections on 𝒰 are different, i.e.,
S'_1 ∩ 𝒰 ≠ S'_2 ∩ 𝒰. By a standard technique of Clarkson and Shor [50], it can be shown that
the size of the canonical representation, i.e., |ℱ'|, is bounded by O(|𝒰|w²). Note that this is
just counting the number of disks that contain at most w points, namely the at most w-level
disks. □
2.4.2. Description of the GeomSC Algorithm
The outline of the GeomSC algorithm (shown in Algorithm 3) is very similar to the Iter-
SetCover algorithm presented earlier in Section 2.2.
In the first pass, the algorithm picks all the sets that cover a large number of yet-uncovered
elements. Next, we sample S. Since all large ranges have been removed in the first pass,
the sizes of the remaining ranges restricted to the sample S are small. Therefore,
by Lemma 2.4.4, the canonical representation of (S, ℱ_S) has small size, and we can afford to
store it in memory. We use Lemma 2.4.4 to compute the canonical representation ℱ_S in
one pass. The algorithm then uses the sets in ℱ_S to find a cover sol_S for the points of S.
Next, in one additional pass, the algorithm replaces each set in sol_S by one of its supersets
in ℱ.
Finally, note that in IterSetCover we assumed that the size of the optimal
solution is O(k). Thus it is enough to stop the iterations once the number of uncovered
elements falls below k; we can then pick an arbitrary set for each uncovered element.
This adds only k more sets to the solution. Using this idea, we can reduce the size of
the sampled set to ckρ(n/k)^δ log m log n, which helps us achieve near-linear
space in the geometric setting. Note that the final pass of the algorithm can be folded
into the previous passes, but for the sake of clarity we present it separately.
2.4.3. Analysis
By an approach similar to the one used in Section 2.2 to analyze the pass count and approx-
imation guarantee of the IterSetCover algorithm, we can show that the number of passes
of the GeomSC algorithm is 3/δ + 1 (which can be reduced to 3/δ with minor changes), and
that the algorithm returns an O(ρ/δ)-approximate solution. Next, we analyze the space usage
and the correctness of the algorithm. Note that our analysis in this section only works for
δ ≤ 1/4.
Lemma 2.4.5. The algorithm uses Õ(n) space.
Proof: Consider an iteration of the algorithm. The memory used in the first pass of
each iteration is Õ(n). The size of S is ckρ(n/k)^δ log m log n, and after the first pass the size
Algorithm 3 A streaming algorithm for the Points-Shapes Set Cover problem.
1: procedure GeomSC((𝒰, ℱ), δ)
2:   for k ∈ {2^i | 0 ≤ i ≤ log n} in parallel do   ▷ n = |𝒰|
3:     let L ← 𝒰 and sol ← ∅
4:     for j = 1 to 1/δ do
5:       for all F ∈ ℱ do   ▷ one pass
6:         if |F ∩ L| ≥ |𝒰|/k then
7:           sol ← sol ∪ {F}
8:           L ← L - F
9:       S ← sample of L of size ckρ(n/k)^δ log m log n
10:      ℱ_S ← CompCanonicalRep(S, ℱ, |S|/k)   ▷ one pass
11:      sol_S ← OfflineSC(S, ℱ_S)
12:      for F ∈ ℱ do   ▷ one pass
13:        if ∃ F' ∈ sol_S s.t. F' ∩ S ⊆ F ∩ S then
14:          sol ← sol ∪ {F}
15:          sol_S ← sol_S - {F'}
16:          L ← L - F
17:    for F ∈ ℱ do   ▷ final pass
18:      if F ∩ L ≠ ∅ then
19:        sol ← sol ∪ {F}
20:        L ← L - F
21:    return smallest sol computed in parallel
of each set is at most |𝒰|/k. Thus, using the Chernoff bound, for each set F ∈ ℱ - sol,

Pr[ |F ∩ S| > (1 + 2) · (|𝒰|/k) · (|S|/|𝒰|) ] ≤ exp(-4|S|/(3k)) ≤ (1/m)^{c+1}.

Thus, with probability at least 1 - m^{-c} (by the union bound), all the sets that are not picked
in the first pass cover at most 3|S|/k = 3cρ(n/k)^δ log m log n elements of S. Therefore, we can
use Lemma 2.4.4 to show that the number of sets in the canonical representation of (S, ℱ_S)
is at most

O(|S| (3|S|/k)³ log² |S|) = O(ρ⁴ n log⁴ m log⁶ n),

as long as δ ≤ 1/4. To store each set in a canonical representation of (S, ℱ_S) only constant
space is required. Moreover, by Lemma 2.4.4, the space requirement of the second pass is
Õ(|ℱ_S|) = Õ(n). Therefore, the total required space is Õ(n), and the lemma follows. □
Theorem 2.4.6. Consider a set system defined over a set 𝒰 of n points in the plane and a
set ℱ of m ranges (which are either all disks, all axis-parallel rectangles, or all fat triangles).
Let ρ be the quality of approximation of the offline set-cover solver we have, and let
0 < δ ≤ 1/4 be an arbitrary parameter.
Setting δ = 1/4, the algorithm GeomSC with high probability returns an O(ρ)-approximate
solution of the optimal set cover for the instance (𝒰, ℱ). The algorithm uses Õ(n)
space and performs a constant number of passes over the data.
Proof: As before, consider the run of the algorithm in which |OPT| ≤ k < 2|OPT|. Let
V be the set of uncovered elements L at the beginning of the iteration, and note that the
total number of sets picked during the iteration is at most (1 + c₁ρ)k, where c₁
is the constant of Definition 2.4.1. Let 𝒢 denote all possible such covers, that is,
𝒢 = {ℱ' ⊆ ℱ : |ℱ'| ≤ (1 + c₁ρ)k}. Let ℛ be the collection that contains all possible sets of
uncovered elements at the end of the iteration, defined as ℛ = {V - ∪_{R∈D} R : D ∈ 𝒢}. Set
p = (k/n)^δ, ε = 1/2, and q = m^{-c}. Since, for a large enough c,
(c'/(ε²p)) (log |ℛ| log(1/p) + log(1/q)) ≤ ckρ(n/k)^δ log m log n = |S|, with probability at
least 1 - m^{-c}, by Lemma 2.2.5, the set of sampled elements S is a relative (p, ε)-approximation
sample of (V, ℛ).
Let C ⊆ ℱ be the collection of sets picked in the third pass of the algorithm that covers all
elements of S. By Lemma 2.4.4, |C| ≤ c₁ρk for some constant c₁. Since with high probability
S is a relative (p, ε)-approximation sample of (V, ℛ), the number of uncovered elements of
V (or L) after adding C to sol is at most εp|V| ≤ |𝒰|(k/n)^δ. Thus, with probability at least
1 - m^{-c}, in each iteration, by adding O(ρk) sets, the number of uncovered elements
is reduced by a factor of (n/k)^δ.
Therefore, after 4 iterations (for δ = 1/4) the algorithm has picked O(ρk) sets, and with high
probability the number of uncovered elements is at most n(k/n)^{δ·(1/δ)} = k. Thus, in the
final pass the algorithm adds only k more sets to the solution sol, and hence the approximation
factor of the algorithm is O(ρ). □
Remark 2.4.7. The result of Theorem 2.4.6 is similar to the result of Agarwal and Pan [3],
except that their algorithm performs O(log n) iterations over the data, while the algorithm
of Theorem 2.4.6 performs only a constant number of iterations. In particular, one can use
the algorithm of Agarwal and Pan [3] as the offline solver.
Figure 2.5.1: (a) shows an example of the communication Set Chasing(4, 3) problem, and (b)
shows an instance of the communication Intersection Set Chasing(4, 3) problem.
2.5. Lower Bound for Multipass Algorithms
In this section we give a lower bound on the memory required by multipass streaming
algorithms for the Set Cover problem. Our main result is an Ω(mn^δ) space bound for
streaming algorithms that return an optimal solution of the Set Cover problem in O(1/δ)
passes, for m = O(n). Our approach is to reduce the communication Intersection Set
Chasing(n, p) problem, introduced by Guruswami and Onak [90], to the communication Set
Cover problem.
Consider a communication problem P with t players P₁, ..., P_t. The problem P is a
(t, r)-communication problem if the players communicate in r rounds, and in each round they
speak in the order P₁, ..., P_t. At the end of the r-th round, P_t must return the solution.
Moreover, we assume private randomness and public messages. In what follows we define the
communication Set Chasing and Intersection Set Chasing problems.
Definition 2.5.1. The Set Chasing(n, p) problem is a (p, p-1) communication problem in
which player i has a function f_i: [n] → 2^{[n]}, and the goal is to compute
f_1(f_2(··· f_p({1}) ···)), where f_i(T) = ∪_{j∈T} f_i(j). Figure 2.5.1(a) shows an instance of
the communication Set Chasing(4, 3) problem.
Definition 2.5.2. The Intersection Set Chasing(n, p) problem is a (2p, p-1) communication
problem in which the first p players have an instance of the Set Chasing(n, p) problem and
the other p players have another instance of the Set Chasing(n, p) problem. The output of
the Intersection Set Chasing(n, p) problem is 1 if the solutions of the two instances of Set
Chasing(n, p) intersect, and 0 otherwise. Figure 2.5.1(b) shows an instance of Intersection
Set Chasing(4, 3). The function f_i of each player P_i is specified by a set of directed edges
from a copy of vertices labeled {1, ..., n} to another copy of vertices labeled {1, ..., n}.
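Evaluating a Set Chasing instance is a simple iterated union computation; the sketch below (a toy instance of ours) evaluates both chains of an Intersection Set Chasing input and tests whether their outputs intersect:

```python
def set_chasing(fs, start=(1,)):
    """Evaluate f_1(f_2(... f_p(start) ...)), where each f_i maps an
    index to a set of indices and f_i(T) is the union of f_i(j) over j in T."""
    current = set(start)
    for f in reversed(fs):   # f_p is applied first, f_1 last
        current = set().union(*(f[j] for j in current))
    return current

# a toy Intersection Set Chasing instance with n = 4 and p = 2
f = [{1: {2, 3}, 2: {1}, 3: {4}, 4: {4}},   # f_1
     {1: {1, 3}, 2: {2}, 3: {2}, 4: {1}}]   # f_2
g = [{1: {4}, 2: {3}, 3: {3}, 4: {1}},      # f'_1
     {1: {2}, 2: {4}, 3: {1}, 4: {3}}]      # f'_2

left, right = set_chasing(f), set_chasing(g)
print(left & right != set())   # the ISC output is 1 iff the chains intersect
```

The hardness, of course, lies not in the computation itself but in the fact that each player sees only its own function while the chain must be evaluated right to left.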
The communication Set Chasing problem is a generalization of the well-known
communication Pointer Chasing problem, in which player i has a function f_i: [n] → [n]
and the goal is to compute f_1(f_2(··· f_p(1) ···)).

Figure 2.5.2: The gadgets used in the reduction from the communication Intersection Set
Chasing problem to the communication Set Cover problem. (a) and (c) show the construction
of the gadget for players 1 to p, and (b) and (d) show the construction of the gadget for
players p + 1 to 2p.

Guruswami and Onak [90] showed that any randomized protocol that solves Intersection
Set Chasing(n, p) with error probability less than 1/10 requires
Ω(n^{1+1/(2p)} / (p^{16} log^{3/2} n)) bits of communication, where n is sufficiently large and
p ≤ (log n)/(log log n). In Theorem 2.5.4, we reduce the communication Intersection Set
Chasing problem to the communication Set Cover problem and thereby give the first
superlinear memory lower bound for the streaming Set Cover problem.
Definition 2.5.3 (Communication Set Cover(𝒰, ℱ, p) Problem). The communication
Set Cover(n, p) problem is a (p, p-1) communication problem in which a collection of
elements 𝒰 is given to all players, and each player i has a collection ℱ_i of subsets of 𝒰. The
goal is to solve Set Cover(𝒰, ℱ_1 ∪ ··· ∪ ℱ_p) using the minimum number of communication
bits.
Theorem 2.5.4. Any (1/(2δ) - 1)-pass streaming algorithm that solves Set Cover(𝒰, ℱ)
optimally with constant probability of error requires Ω(mn^δ) memory, where
δ ≥ (log log n)/(log n) and m = O(n).
Consider an instance ISC of the communication Intersection Set Chasing(๐, ๐). We con-
struct an instance of the communication Set Cover(๐ฐ ,โฑ , 2๐) problem such that solving Set
Cover(๐ฐ ,โฑ) optimally determines whether the output of ISC is 1 or not.
The instance ISC consists of 2๐ players. Each player 1, ยท ยท ยท , ๐ has a function ๐๐ : [๐]โ 2[๐]
and each player ๐ + 1, ยท ยท ยท , 2๐ has a function ๐ โฒ๐ : [๐] โ 2[๐] (see Figure 2.5.1). In ISC, each
function ๐๐ is shown by a set of vertices ๐ฃ1๐ , ยท ยท ยท , ๐ฃ๐๐ and ๐ฃ1๐+1, ยท ยท ยท , ๐ฃ๐๐+1 such that there is a
directed edge from ๐ฃ๐๐+1 to ๐ฃโ๐ if and only if โ โ ๐๐(๐). Similarly, each function ๐ โฒ๐ is denoted
by a set of vertices ๐ข1๐ , ยท ยท ยท , ๐ข๐
๐ and ๐ข1๐+1, ยท ยท ยท , ๐ข๐
๐+1 such that there is a directed edge from ๐ข๐๐+1
to ๐ขโ๐ if and only if โ โ ๐ โฒ
๐(๐) (see Figure 2.5.4(a) and Figure 2.5.4(b)).
In the corresponding communication Set Cover instance of ISC, we add two elements
in(๐ฃ๐๐ ) and out(๐ฃ๐๐ ) per each vertex ๐ฃ๐๐ where ๐ โค ๐ + 1, ๐ โค ๐. We also add two elements
in(๐ข๐๐ ) and out(๐ข๐
๐ ) per each vertex ๐ข๐๐ where ๐ โค ๐+1, ๐ โค ๐. In addition to these elements,
for each player ๐, we add an element ๐๐ (see Figure 2.5.4(c) and Figure 2.5.4(d)).
Next, we define a collection of sets in the corresponding Set Cover instance of ISC. For
each player ๐๐, where 1 โค ๐ โค ๐, we add a single set ๐๐๐ containing out(๐ฃ๐๐+1) and in(๐ฃโ๐ ) for
all out-going edges (๐ฃ๐๐+1, ๐ฃโ๐ ). Moreover, all ๐๐
๐ sets contain the element ๐๐. Next, for each
vertex ๐ฃ๐๐ we add a set ๐ ๐๐ that contains the two corresponding elements of ๐ฃ๐๐ , in(๐ฃ
๐๐ ) and
out(๐ฃ๐๐ ). In Figure 2.5.4(c), the red rectangles denote ๐ -type sets and the curves denote
๐-type sets for the first half of the players.
Similarly to the sets corresponding to players 1 to ๐, for each player ๐๐+๐ where 1 โค ๐ โค ๐,
we add a set ๐๐๐+๐ containing in(๐ข๐
๐ ) and out(๐ขโ๐+1) for all in-coming edges (๐ขโ
๐+1, ๐ข๐๐ ) of ๐ข
๐๐
(denoting ๐ โฒโ1๐ (๐)). The set ๐๐
๐+๐ contains the element ๐๐+๐ too. Next, for each vertex ๐ข๐๐ we
add a set ๐ ๐๐+๐ that contains the two corresponding elements of ๐ข๐
๐ , in(๐ข๐๐ ) and out(๐ข๐
๐ ). In
Figure 2.5.4(d), the red rectangles denote ๐ -type sets and the curves denote ๐-type sets for
the second half of the players.
At the end, we merge the $v_1^j$s and $u_1^j$s as shown in Figure 2.5.3. After merging the corresponding sets of the $v_1^j$s ($T_1^1, \cdots, T_1^n$) and the corresponding sets of the $u_1^j$s ($T'^1_1, \cdots, T'^n_1$), we call the merged sets $T_1^1, \cdots, T_1^n$.

The main claim is that if the solution of ISC is 1 then the size of an optimal solution of its corresponding Set Cover instance SC is $(2p+1)n+1$; otherwise, it is $(2p+1)n+2$.
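The gadget construction above can be sketched programmatically. The following Python snippet is an illustrative sketch (not from the thesis; all names are ours) that builds the elements and sets of the communication Set Cover instance from the ISC functions, with the two layer-1 vertex sets identified as in the merge step, and whose sizes match the counts $|\mathcal{U}| = (2p+1) \cdot 2n + 2p$ and $|\mathcal{F}| = (2p+1)n + 2np$ used later in the proof of Theorem 2.5.4.

```python
def build_sc_from_isc(fs, gs, n):
    """Illustrative (simplified) sketch of the ISC-to-Set-Cover reduction.

    fs[i-1] maps j -> set of ell, encoding edges (v_{i+1}^j, v_i^ell);
    gs[i-1] maps ell -> set of j, encoding edges (u_{i+1}^ell, u_i^j).
    Indices i range over 1..p and j over 1..n. Layer-1 vertices of the
    two chains are identified (the 'merge' step of the construction)."""
    p = len(fs)

    def vtx(chain, i, j):
        # layer 1 is shared between the two chains after merging
        return ('w', 1, j) if i == 1 else (chain, i, j)

    vertices = {vtx(c, i, j) for c in 'vu'
                for i in range(1, p + 2) for j in range(1, n + 1)}
    elements = {(d, w) for w in vertices for d in ('in', 'out')}
    elements |= {('e', i) for i in range(1, 2 * p + 1)}

    sets = {}
    for i in range(1, p + 1):
        for j in range(1, n + 1):
            # S-type set of player i: out(v_{i+1}^j), in(v_i^ell), and e_i
            sets[('S', i, j)] = ({('out', vtx('v', i + 1, j)), ('e', i)} |
                                 {('in', vtx('v', i, l)) for l in fs[i - 1][j]})
            # S-type set of player p+i: in(u_i^j), out(u_{i+1}^ell), e_{p+i}
            sets[('S', p + i, j)] = ({('in', vtx('u', i, j)), ('e', p + i)} |
                                     {('out', vtx('u', i + 1, l))
                                      for l in range(1, n + 1)
                                      if j in gs[i - 1][l]})
    for w in vertices:  # one T-type set per vertex (merged in layer 1)
        sets[('T', w)] = {('in', w), ('out', w)}
    return elements, sets
```

Running it on a small instance with $n = 3$ and $p = 2$ yields $34 = (2 \cdot 2 + 1) \cdot 6 + 4$ elements and $27 = 15 + 12$ sets, as predicted by the counting above.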
Figure 2.5.3: In (b) two Set Chasing instances merge in their first set of vertices and (c) shows the corresponding gadgets of these merged vertices in the communication Set Cover.
Figure 2.5.4: In (a), path $\pi$ is shown with black dashed arcs and (b) shows the corresponding cover of path $\pi$.
Lemma 2.5.5. The size of any feasible solution of SC is at least $(2p+1)n+1$.

Proof: For each player $i$ ($1 \le i \le p$), since the $out(v_{i+1}^j)$s are only covered by $T_{i+1}^j$ and $S_i^j$, at least $n$ sets are required to cover $out(v_{i+1}^1), \cdots, out(v_{i+1}^n)$. Moreover, for player $P_p$, since the $in(v_{p+1}^j)$s are only covered by $T_{p+1}^j$ and $e_p$ is only covered by $S_p$-type sets, all $n$ sets $T_{p+1}^1, \cdots, T_{p+1}^n$ together with at least one $S_p$-type set, i.e. $n+1$ sets, must be selected in any feasible solution of SC.

Similarly, for each player $p+i$ ($1 \le i \le p$), since the $in(u_i^j)$s are only covered by $T'^j_i$ and $S_{p+i}^j$, at least $n$ sets are required to cover $in(u_i^1), \cdots, in(u_i^n)$. Moreover, considering $u_{p+1}^1, \cdots, u_{p+1}^n$, since $in(u_{p+1}^j)$ is only covered by $T'^j_{p+1}$, all $n$ sets $T'^1_{p+1}, \cdots, T'^n_{p+1}$ must be selected in any feasible solution of SC.

Altogether, at least $(2p+1)n+1$ sets must be selected in any feasible solution of SC. □
Lemma 2.5.6. Suppose that the solution of ISC is 1. Then the size of an optimal solution of its corresponding Set Cover instance is exactly $(2p+1)n+1$.

Proof: By Lemma 2.5.5, the size of an optimal solution of SC is at least $(2p+1)n+1$. Here we prove that $(2p+1)n+1$ sets suffice when the solution of ISC is 1. Let $\pi = v_{p+1}^1, v_p^{j_p}, \ldots, v_2^{j_2}, v_1^{j_1}, u_1^{\ell_1}, u_2^{\ell_2}, \ldots, u_p^{\ell_p}, u_{p+1}^1$ be a path in ISC such that $j_1 = \ell_1$ (since the solution of ISC is 1, such a path exists). The corresponding solution to $\pi$ can be constructed as follows (see Figure 2.5.4):

• Pick $S_p^1$ and all $T_{p+1}^j$s ($n+1$ sets).

• For each $v_i^{j_i}$ in $\pi$ where $1 < i \le p$, pick the set $S_{i-1}^{j_i}$ in the solution. Moreover, for each such $i$ pick all sets $T_i^j$ where $j \ne j_i$ ($n(p-1)$ sets).

• For $v_1^{j_1}$ (or $u_1^{\ell_1}$), pick the set $S_{p+1}^{\ell_1}$. Moreover, pick all sets $T_1^j$ where $j \ne j_1$ ($n$ sets).

• For each $u_i^{\ell_i}$ in $\pi$ where $1 < i \le p$, pick the set $S_{p+i}^{\ell_i}$ in the solution. Moreover, for each such $i$ pick all sets $T'^\ell_i$ where $\ell \ne \ell_i$ ($n(p-1)$ sets).

• Pick all $T'^j_{p+1}$s ($n$ sets).

It is straightforward to see that the solution constructed above is a feasible solution. □
Lemma 2.5.7. Suppose that the size of an optimal solution of the corresponding Set Cover instance of ISC, SC, is $(2p+1)n+1$. Then the solution of ISC is 1.
Proof: As we proved earlier in Lemma 2.5.5, any feasible solution of SC picks $T_{p+1}^1, \cdots, T_{p+1}^n$, at least one $S_p$-type set, and $T'^1_{p+1}, \cdots, T'^n_{p+1}$. Moreover, we proved that for each $1 \le i < p$, at least $n$ sets should be selected from $T_{i+1}^1, \cdots, T_{i+1}^n, S_i^1, \cdots, S_i^n$. Similarly, for each $1 \le i \le p$, at least $n$ sets should be selected from $T'^1_i, \cdots, T'^n_i, S_{p+i}^1, \cdots, S_{p+i}^n$. Thus if a feasible solution of SC, OPT, is of size $(2p+1)n+1$, it has exactly $n$ sets from each specified group.

Next we consider the first half and the second half of the players separately. Consider $i$ such that $1 \le i < p$. Let $S_i^{j_1}, \cdots, S_i^{j_c}$ be the sets of player $i$ picked in the optimal solution (because of $e_i$ there should be at least one set of the form $S_i^j$ in OPT). Since each $out(v_{i+1}^j)$ is only covered by $S_i^j$ and $T_{i+1}^j$, for all $j \notin \{j_1, \ldots, j_c\}$, $T_{i+1}^j$ should be selected in OPT. Moreover, for all $j \in \{j_1, \cdots, j_c\}$, $T_{i+1}^j$ should not be contained in OPT (otherwise the size of OPT would be larger than $(2p+1)n+1$). Consider $j \in \{j_1, \ldots, j_c\}$. Since $T_{i+1}^j$ is not in OPT, there should be a set $S_{i+1}^\ell$ selected in OPT such that $in(v_{i+1}^j)$ is contained in $S_{i+1}^\ell$. Thus by considering the $S_i$s in decreasing order of $i$ and using induction, if $S_i^j$ is in OPT then $v_{i+1}^j$ is reachable from $v_{p+1}^1$.

Next consider a set $S_{p+i}^j$ that is selected in OPT ($1 \le i \le p$). By a similar argument, $T'^j_i$ is not in OPT and there exists a set $S_{p+i-1}^\ell$ (or $S_1^\ell$ if $i = 1$) in OPT such that $out(u_i^j)$ is contained in it. Let $u_{p+1}^{\ell_1}, \cdots, u_{p+1}^{\ell_c}$ be the set of vertices whose corresponding out elements are in $S_{2p}^j$. Then by induction, there exists an index $j$ such that $v_1^j$ is reachable from $v_{p+1}^1$ and $u_1^j$ is also reachable from all of $u_{p+1}^{\ell_1}, \cdots, u_{p+1}^{\ell_c}$. Moreover, the way we constructed the instance SC guarantees that all sets $S_{2p}^1, \cdots, S_{2p}^n$ contain $out(u_{p+1}^1)$. Hence if the size of an optimal solution of SC is $(2p+1)n+1$ then the solution of ISC is 1. □
Corollary 2.5.8. Intersection Set Chasing$(n, p)$ returns 1 if and only if the size of an optimal solution of its corresponding Set Cover instance (as described here) is $(2p+1)n+1$.
Observation 2.5.9. Any streaming algorithm for Set Cover, $\mathcal{A}$, that in $\ell$ passes solves the problem optimally with a probability of error err and consumes $O(s)$ memory space, solves the corresponding communication Set Cover problem in $\ell$ rounds using $O(s\ell p)$ bits of communication with probability of error err.
Proof: Starting from player $P_1$, each player runs $\mathcal{A}$ over its input sets, and once $P_i$ is done with its input, it sends the working memory of $\mathcal{A}$ publicly to the other players. Then the next player starts the same routine using the state of the working memory received from the previous player. Since $\mathcal{A}$ solves the Set Cover instance optimally after $\ell$ passes using $O(s)$ space with probability of error err, applying $\mathcal{A}$ as a black box we can solve the communication problem in $\ell$ rounds using $O(s\ell p)$ bits of communication with probability of error err. □
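The pass-to-round simulation in this proof can be sketched in a few lines of Python. This is an illustrative sketch only: the `process`/`state` interface of the streaming algorithm is our assumption, and the toy algorithm merely counts the sets it sees in place of an actual Set Cover solver.

```python
import pickle

class CountSets:
    """Toy stand-in for a streaming algorithm: it only counts the sets
    it processes. Any real algorithm exposing `process` and a
    serializable `state` would fit the same simulation."""
    def __init__(self):
        self.state = 0

    def process(self, item):
        self.state += 1

def simulate(players_inputs, make_alg, passes):
    """Each pass, players P_1, P_2, ... in order run the algorithm on
    their own portion of the input and broadcast its working memory;
    the total communication is the number of broadcast state bits,
    i.e. (number of passes) x (number of players) x (state size)."""
    alg = make_alg()
    bits = 0
    for _ in range(passes):
        for inp in players_inputs:
            for item in inp:
                alg.process(item)
            bits += len(pickle.dumps(alg.state)) * 8  # broadcast the state
    return alg.state, bits
```

With three players holding four sets in total, two passes of the counting algorithm end with state 8, and the communication scales with the number of passes times the number of players, mirroring the $O(s\ell p)$ bound.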
Proof of Theorem 2.5.4: By Observation 2.5.9, any $\ell$-pass $O(s)$-space algorithm that solves streaming Set Cover$(\mathcal{U}, \mathcal{F})$ optimally can be used to solve the communication Set Cover$(\mathcal{U}, \mathcal{F}, p)$ problem in $\ell$ rounds using $O(s\ell p)$ bits of communication. Moreover, by Corollary 2.5.8, we can decide the solution of the communication Intersection Set Chasing$(n, p)$ by solving its corresponding communication Set Cover problem. Note that while working with the corresponding Set Cover instance of Intersection Set Chasing$(n, p)$, all players know the collection of elements $\mathcal{U}$ and each player can construct its collection of sets $\mathcal{F}_i$ using $f_i$ (or $f'_i$).

However, by a result of [90], we know that any protocol that solves the communication Intersection Set Chasing$(n, p)$ problem with probability of error less than $1/10$ requires $\Omega(\frac{n^{1+1/(2p)}}{p^{16} \log^{3/2} n})$ bits of communication. Since in the corresponding Set Cover instance of the communication Intersection Set Chasing$(n, p)$ we have $|\mathcal{U}| = (2p+1) \times 2n + 2p = O(np)$ and $|\mathcal{F}| \le (2p+1)n + 2np = O(np)$, any $(p-1)$-pass streaming algorithm that solves the Set Cover problem optimally with a probability of error at most $1/10$ requires $\Omega(\frac{n^{1+1/(2p)}}{p^{18} \log^{3/2} n})$ bits of communication. Then using Observation 2.5.9, since $\delta \ge \frac{\log\log n}{\log n}$, any $(\frac{1}{2\delta} - 1)$-pass streaming algorithm of Set Cover that finds an optimal solution with error probability less than $1/10$ requires $\Omega(|\mathcal{F}| \cdot |\mathcal{U}|^\delta)$ space. □
2.6. Lower Bound for Sparse Set Cover in Multiple Passes
In this part we give a stronger lower bound for instances of the streaming Set Cover problem with sparse input sets. An instance of the Set Cover problem is $s$-Sparse Set Cover if for each set $S \in \mathcal{F}$ we have $|S| \le s$. We can use the same reduction approach described earlier in Section 2.5 to show that any $(\frac{1}{2\delta} - 1)$-pass streaming algorithm for $s$-Sparse Set Cover requires $\Omega(|\mathcal{F}| s)$ memory space if $s < |\mathcal{U}|^\delta$ and $|\mathcal{F}| = O(|\mathcal{U}|)$. To prove this, we need to explain more details of the approach of [90] on the lower bound of the communication Intersection Set Chasing problem. They first obtained a lower bound for the Equal Pointer Chasing$(n, p)$ problem, in which two instances of the communication Pointer Chasing$(n, p)$ are given and the goal is to decide whether these two instances point to the same value or not; that is, whether $f_p(\cdots f_1(1) \cdots) = f'_p(\cdots f'_1(1) \cdots)$.
Definition 2.6.1 ($k$-non-injective functions). A function $f : [n] \to [n]$ is called $k$-non-injective if there exists $A \subseteq [n]$ of size at least $k$ and $b \in [n]$ such that for all $a \in A$, $f(a) = b$.
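As a quick illustration of Definition 2.6.1, a function given as a table can be tested for $k$-non-injectivity simply by counting preimages; the small sketch below (with our own function names) does exactly that.

```python
from collections import Counter

def is_k_non_injective(f, k):
    """f is a dict {a: f(a)} on the domain [n]. Returns True iff some
    value b has at least k preimages, i.e. there is a set A of size
    >= k that f maps to a single b."""
    counts = Counter(f.values())
    return max(counts.values()) >= k
```

For example, a constant function on $[5]$ is $5$-non-injective, while the identity function is only $1$-non-injective.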
Definition 2.6.2 (Pointer Chasing Problem). Pointer Chasing$(n, p)$ is a $(p, p-1)$ communication problem in which player $i$ has a function $f_i : [n] \to [n]$ and the goal is to compute $f_1(f_2(\cdots f_p(1) \cdots))$.
Definition 2.6.3 (Equal Limited Pointer Chasing Problem). Equal Pointer Chasing$(n, p)$ is a $(2p, p-1)$ communication problem in which the first $p$ players have an instance of the Pointer Chasing$(n, p)$ problem and the other $p$ players have another instance of the Pointer Chasing$(n, p)$ problem. The output of Equal Pointer Chasing$(n, p)$ is 1 if the solutions of the two instances of Pointer Chasing$(n, p)$ have the same value, and 0 otherwise. Furthermore, in another variant of the pointer chasing problem, Equal Limited Pointer Chasing$(n, p, k)$, if there exists a $k$-non-injective function $f_i$, then the output is 1. Otherwise, the output is the same as the value of Equal Pointer Chasing$(n, p)$.
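The outputs in Definitions 2.6.2 and 2.6.3 can be illustrated with a few lines of Python. This is only a functional sketch with our own names; the communication aspect of the problems (who holds which function, how many rounds are allowed) is of course not modeled here.

```python
def pointer_chasing(fs, start=1):
    """Compute f_1(f_2(... f_p(start) ...)) as in Definition 2.6.2,
    where fs = [f_1, ..., f_p] and each f_i is a dict on [n]."""
    v = start
    for f in reversed(fs):  # the innermost function f_p is applied first
        v = f[v]
    return v

def equal_pointer_chasing(fs, gs):
    """Output of Equal Pointer Chasing: 1 iff the two chains agree."""
    return int(pointer_chasing(fs) == pointer_chasing(gs))
```

For instance, with $f_1 = (1\,2)$ (a swap) and $f_2$ the identity on $[2]$, the chase $f_1(f_2(1))$ evaluates to 2.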
For a boolean communication problem P, OR$_t$(P) is defined to be the OR of $t$ instances of P: the output of OR$_t$(P) is true if and only if the output of at least one of the $t$ instances is true. Using a direct sum argument, [90] showed that the communication complexity of OR$_t$(Equal Limited Pointer Chasing$(n, p, k)$) is $t$ times the communication complexity of Equal Limited Pointer Chasing$(n, p, k)$.
Lemma 2.6.4 ([90]). Let $n, p, t$ and $k$ be positive integers such that $n \ge 5k$, $t \le \frac{n}{4}$ and $k = O(\log n)$. Then the number of bits of communication required to solve OR$_t$(Equal Limited Pointer Chasing$(n, p, k)$) with error probability less than $1/3$ is $\Omega(\frac{tn}{p^{16} \log n}) - O(kt^2)$.
Lemma 2.6.5 ([90]). Let $n, p, t$ and $k$ be positive integers such that $t^2 k p / n < 1/10$. Then if there is a protocol that solves Intersection Set Chasing$(n, p)$ with probability of error less than $1/10$ using $C$ bits of communication, there is a protocol that solves OR$_t$(Equal Limited Pointer Chasing$(n, p, k)$) with probability of error at most $2/10$ using $C + 2n$ bits of communication.
Consider an instance of OR$_t$(Equal Limited Pointer Chasing$(n, p, k)$) in which $t \le n^\delta$, $k = \log n$ and $p = \frac{1}{2\delta} - 1$, where $\frac{1}{\delta} = o(\log n)$. By Lemma 2.6.4, the required number of bits of communication to solve the instance with constant success probability is $\Omega(tn)$. Then, applying Lemma 2.6.5, $\Omega(tn)$ bits of communication are required to solve the corresponding Intersection Set Chasing instance.
In the reduction from OR$_t$(Equal Limited Pointer Chasing$(n, p, k)$) to Intersection Set Chasing$(n, p)$ (proof of Lemma 2.6.5), the $k$-non-injective property is preserved. In other words, in the corresponding Intersection Set Chasing instance, each player's function $g_i : [n] \to 2^{[n]}$ is the union of the $t$ underlying functions, $g_i(a) := f_{i,1}(a) \cup \cdots \cup f_{i,t}(a)$.⁴ Given that none of the $f_{i,j}$ functions is $k$-non-injective, the corresponding Set Cover instance will have sets of size at most $kt$ ($S$-type sets are of size at most $t$ for $1 \le i \le p$ and of size at most $kt$ for $p+1 \le i \le 2p$). Since $k = O(\log n)$, the corresponding Set Cover instance is $\tilde{O}(t)$-sparse. As we showed earlier in the reduction from Intersection Set Chasing to Set Cover, the number of elements (and sets) in the corresponding Set Cover instance is $O(np)$. Thus we have the following result for the $s$-Sparse Set Cover problem.

⁴The Intersection Set Chasing instance is obtained by overlaying the $t$ instances of Equal Pointer Chasing$(n, p, k)$. To be more precise, the function of player $i$ in instance $j$ is $\sigma_{i,j} \circ f_{i,j} \circ \sigma^{-1}_{i+1,j}$ (the $\sigma$s are randomly chosen permutation functions), and then the functions are stacked on top of each other.
Theorem 2.6.6. For $s \le |\mathcal{U}|^\delta$, any streaming algorithm that solves $s$-Sparse Set Cover$(\mathcal{U}, \mathcal{F})$ optimally with probability of error less than $1/10$ in $(\frac{1}{2\delta} - 1)$ passes requires $\Omega(|\mathcal{F}| s)$ memory space for $|\mathcal{F}| = O(|\mathcal{U}|)$.
Chapter 3
Streaming Fractional Set Cover
3.1. Introduction
The LP relaxation of Set Cover (called SetCover-LP) is a continuous relaxation of the problem where each set $S \in \mathcal{F}$ can be selected "fractionally", i.e., assigned a number $x_S$ from $[0, 1]$, such that for each element $e$ its "fractional coverage" $\sum_{S : e \in S} x_S$ is at least 1, and the sum $\sum_S x_S$ is minimized.
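For intuition, the following self-contained sketch (our own toy instance, not from the thesis) exhibits the classic "triangle" system, in which the fractional optimum of SetCover-LP is $1.5$ while any integral cover requires $2$ sets.

```python
from itertools import combinations

# Toy instance: the 'triangle' system on three elements, where each set
# covers exactly two of them.
U = {'a', 'b', 'c'}
F = [{'a', 'b'}, {'b', 'c'}, {'a', 'c'}]

def coverage(x, e):
    """Fractional coverage of element e under assignment x."""
    return sum(x_S for x_S, S in zip(x, F) if e in S)

x_frac = [0.5, 0.5, 0.5]  # feasible fractional solution of cost 1.5
assert all(coverage(x_frac, e) >= 1 for e in U)

# smallest integral cover, found by brute force
int_opt = min(r for r in range(1, len(F) + 1)
              if any(set().union(*c) == U for c in combinations(F, r)))
```

Here `int_opt` equals 2, so the integrality gap of this instance is $2/1.5 = 4/3$.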
Despite the significant developments in streaming Set Cover [67, 42, 61, 97, 20, 29, 17],
the results for the fractional variant of the problem are still unsatisfactory. To the best of
our knowledge, it is not known whether there exists an efficient and accurate algorithm for
this problem that uses only a logarithmic (or even a polylogarithmic) number of passes.
This state of affairs is perhaps surprising, given the many recent developments on fast LP
solvers [119, 174, 124, 12, 11, 167]. To the best of our knowledge, the only prior results on
streaming Packing/Covering LPs were presented in [6], which studied the LP relaxation of
Maximum Matching.
3.1.1. Our Results
In this chapter, we present the first $(1+\varepsilon)$-approximation algorithm for fractional Set Cover in the streaming model with a constant number of passes. Our algorithm performs $p$ passes over the data stream and uses $\tilde{O}(mn^{O(1/(p\varepsilon))} + n)$ memory space to return a $(1+\varepsilon)$-approximate solution of the LP relaxation of Set Cover for any positive parameter $\varepsilon \le 1/2$.
We emphasize that, similarly to previous work on variants of Set Cover in the streaming setting, our result also holds for edge arrival streams, in which pairs $(S_i, e_j)$ (edges) are stored in the read-only repository and the elements of a set are not necessarily stored consecutively.
3.1.2. Other Related Work
Set Cover. The Set Cover problem was first studied in the streaming model in [155], which presented an $O(\log n)$-approximation algorithm in $O(\log n)$ passes using $\tilde{O}(n)$ space. This approximation factor and the number of passes can be improved to $O(\log n)$ by adapting the thresholding idea of the greedy algorithm presented in [56]. In the low space regime ($\tilde{O}(n)$ space), Emek and Rosén [67] designed a deterministic single pass algorithm that achieves an $O(\sqrt{n})$-approximation. This is provably the best guarantee that one can hope for in a single pass, even considering randomized algorithms. Later, Chakrabarti and Wirth [42] generalized this result and provided tight trade-off bounds for Set Cover in multiple passes. More precisely, they gave an $O(pn^{1/(p+1)})$-approximation algorithm in $p$ passes using $\tilde{O}(n)$ space and proved that this is the best possible approximation ratio, up to a factor of poly($p$), in $p$ passes and $\tilde{O}(n)$ space.
A different line of work, started by Demaine et al. [61], focused on designing a "low" approximation algorithm (between $\Theta(1)$ and $\Theta(\log n)$) in the smallest possible amount of space. In contrast to the results in the $\tilde{O}(n)$ space regime, [61] showed that randomness is necessary: any constant pass deterministic algorithm requires $\Omega(mn)$ space to achieve a constant approximation guarantee. Further, they provided an $O(4^p \log n)$-approximation algorithm that makes $O(4^p)$ passes and uses $\tilde{O}(mn^{1/p} + n)$ space. Later, Har-Peled et al. [97] improved the algorithm to a $2p$-pass $O(p \log n)$-approximation with memory space $\tilde{O}(mn^{1/p} + n)$.¹ The result was further improved by Bateni et al., who designed a $p$-pass algorithm that returns a $(1+\varepsilon)\log n$-approximate solution using $mn^{\Theta(1/p)}$ memory [29].
As for the lower bounds, Assadi et al. [20] presented a lower bound of $\Omega(mn/\alpha)$ memory for any single pass streaming algorithm that computes an $\alpha$-approximate solution. For the problem of estimating the size of an optimal solution, they proved an $\Omega(mn/\alpha^2)$ memory lower bound.

¹In the streaming model, space complexity is of interest and one can assume exponential computational power. In this case the algorithms of [61, 97] save a factor of $\log n$ in the approximation ratio.
For both settings, they complemented the results with matching tight upper bounds. Furthermore, Assadi [17] proved a lower bound on the space complexity of streaming algorithms for Set Cover with multiple passes which is tight up to polylog factors: any $\alpha$-approximation algorithm for Set Cover requires $\Omega(mn^{1/\alpha})$ space, even if it is allowed polylog($n$) passes over the stream, and even if the sets arrive in a random order in the stream. Further, [17] provided the matching upper bound: a $(2\alpha+1)$-pass algorithm that computes an $(\alpha+\varepsilon)$-approximate solution in $\tilde{O}(\frac{mn^{1/\alpha}}{\varepsilon^2} + \frac{n}{\varepsilon})$ memory (assuming exponential computational resources).
Maximum Coverage. The first result on streaming Max $k$-Cover showed how to compute a $(1/4)$-approximate solution in one pass using $\tilde{O}(mn)$ space [155]. It was improved by Badanidiyuru et al. [22] to a $(1/2 - \varepsilon)$-approximation algorithm that requires $\tilde{O}(n/\varepsilon)$ space. Moreover, their algorithm works for the more general problem of Submodular Maximization with cardinality constraints. This result was later generalized to the problem of non-monotone submodular maximization under constraints beyond cardinality [46]. Recently, McGregor and Vu [131] and Bateni et al. [29] independently obtained single pass $(1 - 1/e - \varepsilon)$-approximations with $\tilde{O}(m/\varepsilon^2)$ space. On the lower bound side, [131] showed a lower bound of $\Omega(m)$ for constant pass algorithms whose approximation is better than $(1 - 1/e)$. Moreover, [17] proved that any streaming $(1-\varepsilon)$-approximation algorithm for Max $k$-Cover in polylog($n$) passes requires $\Omega(m/\varepsilon^2)$ space, even on random order streams and for the case $k = O(1)$. This bound is complemented by the $\tilde{O}(mk/\varepsilon^2)$ and $\tilde{O}(m/\varepsilon^3)$ algorithms of [29, 131]. For a more detailed survey of the results on streaming Max $k$-Cover, refer to [29, 131, 17, 107].
Covering/Packing LPs. The study of LPs in the streaming model was first discussed in the work of Ahn and Guha [6], where they used multiplicative weights update (MWU) based techniques to solve the LP relaxation of the Maximum (Weighted) Matching problem. They used the fact that MWU returns a near optimal fractional solution with small size support: first they solve the fractional matching problem, then solve the actual matching problem considering only the edges in the support of the returned fractional solution.
Our algorithm is also based on the MWU method, which is one of the key techniques in designing fast approximation algorithms for Covering and Packing LPs [150, 173, 76, 16].
We note that the MWU method has been previously studied in the context of streaming and
distributed algorithms, leading to efficient algorithms for a wide range of graph optimization
problems [6, 23, 7].
For a related problem, covering integer LPs (covering ILP), Assadi et al. [20] designed a one pass streaming algorithm that estimates the optimal solution of $\{\min \mathbf{c}^\top\mathbf{x} \mid \mathbf{A}^\top\mathbf{x} \ge \mathbf{b}, \mathbf{x} \in \{0,1\}^n\}$ within a factor of $\alpha$ using $\tilde{O}(\frac{mn}{\alpha^2} \cdot b_{\max} + m + n \cdot b_{\max})$ space, where $b_{\max}$ denotes the largest entry of $\mathbf{b}$. In this problem, they assume that the columns of $\mathbf{A}$, i.e. the constraints, are given one by one in the stream.
In a different regime, [65] studied approximating the feasibility LP in the streaming model with additive approximation. Their algorithm performs two passes and is most efficient when the input is dense.
3.1.3. Our Techniques
Preprocessing. Let $k$ denote the value of the optimal solution. The algorithm starts by picking a uniform fractional vector (each entry of value $\tilde{O}(\frac{k}{m})$), which covers all frequently occurring elements (those appearing in $\Omega(\frac{m}{k})$ sets), and updates the set of uncovered elements in one pass. This step considerably reduces the memory usage, as the remaining uncovered elements now have lower occurrence (roughly $\frac{m}{k}$). Note that we do not need to assume knowledge of the correct value $k$: in parallel we try all powers of $(1+\varepsilon)$, denoting our guess by $\ell$.
Multiplicative Weight Update (MWU). To cover the remaining elements, we employ the MWU framework and show how to implement it in the streaming setting. In each iteration of MWU, we have a probability distribution $\mathbf{p}$ over the constraints (elements) and we need to satisfy the average covering constraint. More precisely, we need an oracle that assigns values to $x_S$ for each set $S$ so that $\sum_S p_S x_S \ge 1$ subject to $\|\mathbf{x}\|_1 \le \ell$, where $p_S$ is the sum of probabilities of the elements in the set $S$. Then, the algorithm needs to update $\mathbf{p}$ according to the amount each element has been covered by the oracle's solution. The simple greedy realization of the oracle can be implemented in the streaming setting efficiently by computing all $p_S$ while reading the stream in one pass, then choosing the heaviest set (i.e., the set with largest $p_S$) and setting its $x_S$ to $\ell$. This approach works, except that the number of rounds $T$ required by the MWU framework is large. In fact, $T = \Omega(\frac{\rho \log n}{\varepsilon^2})$, where $\rho$ is the width parameter (the maximum amount by which an oracle solution may over-cover an element), which is $\Theta(\ell)$ in this naïve realization. Next, we show how to decrease $T$ in two steps.
Step 1. A first hope would be that there is a more efficient implementation of the oracle which gives a better width parameter. Nonetheless, no matter how the oracle is implemented, if all sets in $\mathcal{F}$ contain a fixed element $e$, then the width is inevitably $\Omega(\ell)$. This observation implies that we need to work with a different set system that has small width but, at the same time, has the same objective value as the optimal solution. Consequently, we consider the extended set system where we replace $\mathcal{F}$ with all subsets of the sets in $\mathcal{F}$. This extended system preserves optimality, and under this system we may avoid over-covering elements and obtain $T = O(\log n)$ (for constant $\varepsilon$).
In order to turn a solution in our set system into a solution in the extended set system with small width, we need to remove the repeated elements from the sets in the solution so that every covered element appears exactly once, thereby getting constant width. However, as a side effect, this reduces the total weight of the solution ($\sum_{S \in \text{sol}} p_S x_S$), and thus the average covering constraint might not be satisfied anymore. In fact, we need to come up with a guarantee that, on one hand, is preserved under the pruning step, and on the other hand, implies that the solution has large enough total weight.
Therefore, to fulfill the average constraint under the pruning step, the oracle must instead solve the maximum coverage problem: given a budget, choose sets that cover the largest (fractional) amount of elements. We first show that this problem can be solved approximately via the MWU framework using the simple oracle that picks the heaviest set, but this MWU algorithm still requires $T$ passes over the data. To improve the number of passes, we perform element sampling and apply the MWU algorithm to find an approximate maximum coverage of a small number of sampled elements, so that the subproblem can be stored in memory. Fortunately, while the number of fractional solutions to maximum coverage is unbounded, by exploiting the structure of the solutions returned by the MWU method, we can limit the number of plausible solutions of this oracle and approximately solve the average constraint, thereby reducing the space usage to $\tilde{O}(n)$ for an $O(\frac{\log n}{\varepsilon^2})$-pass algorithm.
Step 2. To further reduce the number of required passes, we observe that the weights of the constraints change slowly. Thus, in a single pass, we can sample the elements for multiple rounds in advance, and then perform rejection (sub-)sampling to obtain an unbiased set of samples for each subsequent round. This leads to a streaming algorithm with $p$ passes and $mn^{O(1/p)}$ space.
Extension. We also extend our result to handle general covering LPs. More specifically, in the LP relaxation of Set Cover (minimize $\mathbf{c}^\top\mathbf{x}$ subject to $\mathbf{A}\mathbf{x} \ge \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$), $\mathbf{A}$ has entries from $\{0, 1\}$ whereas the entries of $\mathbf{b}$ and $\mathbf{c}$ are all ones. If the non-zero entries instead belong to a range $[1, M]$, we increase the number of sampled elements by poly($M$) to handle discrepancies between coefficients, leading to a poly($M$)-multiplicative overhead in the space usage.
3.2. MWU Framework of the Streaming Algorithm for
Fractional Set Cover
In this section, we present a basic streaming algorithm that computes a $(1+\varepsilon)$-approximate solution of the LP relaxation of Set Cover for any $\varepsilon > 0$ via the MWU framework. In the next section, we will improve it into an efficient algorithm that achieves the claimed $O(p)$ passes and $\tilde{O}(mn^{1/p})$ space complexity.
SetCover-LP (Input: $\mathcal{U}, \mathcal{F}$)

minimize $\sum_{S \in \mathcal{F}} x_S$
subject to $\sum_{S : e \in S} x_S \ge 1 \quad \forall e \in \mathcal{U}$
$x_S \ge 0 \quad \forall S \in \mathcal{F}$

Figure 3.2.1: LP relaxation of Set Cover.
Algorithm 4 FracSetCover returns a $(1+\varepsilon)$-approximate solution of SetCover-LP.
1: procedure FracSetCover($\varepsilon$)
2:   ▷ finds a feasible $(1+\varepsilon)$-approximate solution in $O(\frac{\log m}{\varepsilon})$ iterations
3:   for $\ell \in \{(1+\varepsilon/3)^i \mid 0 \le i \le \log_{1+\varepsilon/3} m\}$ in parallel do
4:     $\mathbf{x}_\ell \leftarrow$ FeasibilityTest($\ell$, $\varepsilon/3$)
5:     ▷ returns a solution of objective value at most $(1+\varepsilon/3)\ell$ when $\ell \ge k$
6:   return $\mathbf{x}_{\ell^*}$ where $\ell^* \leftarrow \min\{\ell : \mathbf{x}_\ell \text{ is not INFEASIBLE}\}$
Let $\mathcal{U}$ and $\mathcal{F}$ be the ground set of elements and the collection of sets, respectively, and recall that $|\mathcal{U}| = n$ and $|\mathcal{F}| = m$. Let $\mathbf{x} \in \mathbb{R}^m$ be a vector indexed by the sets in $\mathcal{F}$, where $x_S$ denotes the value assigned to the set $S$. Our goal is to compute an approximate solution to the LP in Figure 3.2.1. Throughout the analysis we assume $\varepsilon \le 1/2$, and ignore the case where some element never appears in any set, as it is easy to detect in a single pass that no cover is valid. For ease of reading, we write $\tilde{O}$ and $\tilde{\Theta}$ to hide polylog($n, m, \frac{1}{\varepsilon}$) factors.
Outline of the algorithm. Let $k$ denote the optimal objective value, and let $0 < \varepsilon \le 1/2$ be a parameter. The outline of the algorithm is shown in FracSetCover (Algorithm 4). This algorithm makes calls to the subroutine FeasibilityTest that, given a parameter $\ell$, with high probability either returns a solution of objective value at most $(1+\varepsilon/3)\ell$, or detects that the optimal objective value exceeds $\ell$. Consequently, we may search for the right value of $\ell$ by considering all values in $\{(1+\varepsilon/3)^i \mid 0 \le i \le \log_{1+\varepsilon/3} m\}$. As for some value of $\ell$ it holds that $k \le \ell \le k(1+\varepsilon/3)$, we obtain a solution of size $(1+\varepsilon/3)\ell \le (1+\varepsilon/3)(1+\varepsilon/3)k \le (1+\varepsilon)k$, which gives an approximation factor of $(1+\varepsilon)$. This whole process of searching for $k$ increases the space complexity of the algorithm by at most a multiplicative factor of $\log_{1+\varepsilon/3} m \approx \frac{3\log m}{\varepsilon}$.
The FeasibilityTest subroutine employs the multiplicative weights update method
(MWU) which is described next.
3.2.1. Preliminaries of the MWU method for solving covering LPs
In the following, we describe the MWU framework. The claims presented here are standard
results of the MWU method. For more details, see e.g. Section 3 of [16]. Note that we
introduce the general LP notation as it simplifies the presentation later on.
Let Ax โฅ b be a set of linear constraints, and let ๐ซ , {x โ R๐ : x โฅ 0} be the polytopeof the non-negative orthant. For a given error parameter 0 < ๐ฝ < 1, we would like to solve
71
an approximate version of the feasibility problem by doing one of the following:
• Compute $\mathbf{x} \in \mathcal{P}$ such that $\mathbf{A}_i \mathbf{x} - b_i \ge -\beta$ for every constraint $i$.

• Correctly report that the system $\mathbf{A}\mathbf{x} \ge \mathbf{b}$ has no solution in $\mathcal{P}$.
The MWU method solves this problem assuming the existence of the following oracle that
takes a distribution p over the constraints and finds a solution x that satisfies the constraints
on average over p.
Definition 3.2.1. Let $\rho \ge 1$ be a width parameter and $0 < \beta < 1$ be an error parameter. A $(1, \rho)$-bounded $(\beta/3)$-approximate oracle is an algorithm that takes as input a distribution $\mathbf{p}$ and does one of the following:

• Returns a solution $\mathbf{x} \in \mathcal{P}$ satisfying
  - $\mathbf{p}^\top \mathbf{A}\mathbf{x} \ge \mathbf{p}^\top \mathbf{b} - \beta/3$, and
  - $\mathbf{A}_i \mathbf{x} - b_i \in [-1, \rho]$ for every constraint $i$.

• Correctly reports that the inequality $\mathbf{p}^\top \mathbf{A}\mathbf{x} \ge \mathbf{p}^\top \mathbf{b}$ has no solution in $\mathcal{P}$.
The MWU algorithm for solving covering LPs involves $T$ rounds. It maintains the (non-negative) weight of each constraint in $\mathbf{A}\mathbf{x} \ge \mathbf{b}$, which measures how much the constraint has been satisfied by the solutions chosen so far. Let $\mathbf{w}^t$ denote the weight vector at the beginning of round $t$, and initialize the weights to $\mathbf{w}^1 \triangleq \mathbf{1}$. Then, for rounds $t = 1, \ldots, T$, define the probability vector $\mathbf{p}^t$ proportional to the weights $\mathbf{w}^t$, and use the oracle above to find a solution $\mathbf{x}^t$. If the oracle reports that the system $\mathbf{p}^\top \mathbf{A}\mathbf{x} \ge \mathbf{p}^\top \mathbf{b}$ is infeasible, the MWU algorithm also reports that the original system $\mathbf{A}\mathbf{x} \ge \mathbf{b}$ is infeasible, and terminates. Otherwise, define the cost vector incurred by $\mathbf{x}^t$ as $\mathbf{m}^t \triangleq \frac{1}{\rho}(\mathbf{A}\mathbf{x}^t - \mathbf{b})$, then update the weights so that $w_i^{t+1} \triangleq w_i^t (1 - \beta m_i^t / 6)$ and proceed to the next round. Finally, the algorithm returns the average solution $\bar{\mathbf{x}} = \frac{1}{T} \sum_{t=1}^T \mathbf{x}^t$.

The MWU theorem (e.g., Theorem 3.5 of [16]) shows that $T = O(\frac{\rho \log n}{\beta^2})$ is sufficient to correctly solve the problem, yielding $\mathbf{A}_i \bar{\mathbf{x}} - b_i \ge -\beta$ for every constraint, where $n$ is the number of constraints. In particular, the algorithm requires $T$ calls to the oracle.
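The loop just described can be sketched in a few lines of Python. The snippet below is an illustrative sketch with our own constants and function names, instantiated for the set cover system $\mathbf{A}\mathbf{x} \ge \mathbf{1}$ over $\mathcal{P}_\ell = \{\mathbf{x} \ge \mathbf{0}, \sum_S x_S \le \ell\}$ and the naive heaviest-set oracle of width $\rho = \ell$ discussed later; it is not the thesis's streaming algorithm, but shows the shape of the weight updates.

```python
import math

def mwu_set_cover(sets, n, ell, beta=0.1):
    """MWU sketch for Ax >= 1 with the heaviest-set oracle (width ell).
    Returns the average solution x-bar, or None if the oracle ever
    reports that the average constraint is infeasible."""
    m, rho = len(sets), ell
    T = max(1, round(rho * math.log(n) / beta ** 2))   # T = O(rho log n / beta^2)
    w = [1.0] * n                                      # w^1 = all-ones
    avg = [0.0] * m
    for _ in range(T):
        tot = sum(w)
        p = [wi / tot for wi in w]                     # p^t proportional to w^t
        mass = [sum(p[e] for e in S) for S in sets]    # p_S for each set
        j = max(range(m), key=mass.__getitem__)        # oracle: heaviest set
        if ell * mass[j] < 1 - beta / 3:               # p^T A x < p^T b - beta/3
            return None
        avg[j] += ell / T                              # x^t puts value ell on S_j
        for e in range(n):                             # cost m_e = (A_e x - 1)/rho
            cov = ell if e in sets[j] else 0.0
            w[e] *= 1 - beta * ((cov - 1) / rho) / 6   # w_e *= (1 - beta*m_e/6)
    return avg
```

On the small triangle instance from before, the returned average solution has total value $\ell$ and covers every element almost exactly once, illustrating why the naive oracle's width $\Theta(\ell)$ drives the round count.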
Theorem 3.2.2 (MWU Theorem [16]). For every $0 < \beta < 1$ and $\rho \ge 1$, the MWU algorithm either solves the Feasibility-Covering-LP problem up to an additive error of $\beta$ (i.e., finds $\mathbf{x}$ with $\mathbf{A}_i \mathbf{x} - b_i \ge -\beta$ for every $i$) or correctly reports that the LP is infeasible, making only $O(\frac{\rho \log n}{\beta^2})$ calls to a $(1, \rho)$-bounded $\beta/3$-approximate oracle of the LP.
3.2.2. Semi-streaming MWU-based algorithm for fractional Set Cover
Setting up our MWU algorithm. As described in the overview, we wish to solve, as a subroutine, the decision variant of SetCover-LP known as Feasibility-SC-LP, given in Figure 3.2.2a, where the parameter $\ell$ serves as the guess for the optimal objective value.

Feasibility-SC-LP (Input: $\mathcal{U}, \mathcal{F}, \ell$)
$\sum_{S \in \mathcal{F}} x_S \le \ell$
$\sum_{S : e \in S} x_S \ge 1 \quad \forall e \in \mathcal{U}$
$x_S \ge 0 \quad \forall S \in \mathcal{F}$
(a) LP relaxation of Feasibility Set Cover.

Feasibility-Covering-LP (Input: $\mathbf{A}, \mathbf{b}, \mathbf{c}, \ell$)
$\mathbf{c}^\top \mathbf{x} \le \ell$ (objective value)
$\mathbf{A}\mathbf{x} \ge \mathbf{b}$ (covering)
$\mathbf{x} \ge \mathbf{0}$ (non-negativity)
(b) LP relaxation of Feasibility Covering.

Figure 3.2.2: LP relaxations of the feasibility variant of set cover and general covering problems.
To follow the conventional notation for solving LPs in the MWU framework, consider the more standard form of covering LPs, denoted Feasibility-Covering-LP and given in Figure 3.2.2b. For our purpose, $\mathbf{A}_{n \times m}$ is the element-set incidence matrix indexed by $\mathcal{U} \times \mathcal{F}$; that is, $A_{e,S} = 1$ if $e \in S$, and $A_{e,S} = 0$ otherwise. The vectors $\mathbf{b}$ and $\mathbf{c}$ are both all-ones vectors, indexed by $\mathcal{U}$ and $\mathcal{F}$ respectively. We emphasize that, unconventionally, for our system $\mathbf{A}\mathbf{x} \ge \mathbf{b}$ there are $n$ constraints (i.e., elements) and $m$ variables (i.e., sets).

Employing the MWU approach for solving covering LPs, we define the polytope
$$\mathcal{P}_\ell \triangleq \{\mathbf{x} \in \mathbb{R}^m : \mathbf{c}^\top \mathbf{x} \le \ell \text{ and } \mathbf{x} \ge \mathbf{0}\}.$$
Observe that by applying the MWU algorithm to this polytope $\mathcal{P}_\ell$ and the constraints $\mathbf{A}\mathbf{x} \ge \mathbf{b}$, we obtain a solution $\mathbf{x} \in \mathcal{P}_\ell$ such that $\mathbf{A}_e\left(\frac{\mathbf{x}}{1-\beta}\right) \ge \frac{b_e - \beta}{1-\beta} = 1 = b_e$, where $\mathbf{A}_e$ denotes the row of $\mathbf{A}$ corresponding to $e$. This yields a $(1 + O(\varepsilon))$-approximate solution for $\beta = \Theta(\varepsilon)$.

Unfortunately, we cannot implement the MWU algorithm on the full input in our
streaming context. Therefore, the main challenge is to implement the following two subtasks of the MWU algorithm in the streaming setting. First, we need to design an oracle that solves the average constraint in the streaming setting. Moreover, we need to be able to efficiently update the weights for the subsequent rounds.
Covering the common elements. Before we proceed to applying the MWU framework,
we add a simple first step to our implementation of FeasibilityTest (Algorithm 5) that
will greatly reduce the amount of sapce required in implementing the MWU algorithm.
This can be interpreted as the fractional version of Set Sampling described in [61]. In our
subroutine, we partition the elements into the common elements that occur more frequently,
which will be covered if we simply choose a uniform vector solution, and the rare elements
that occur less frequently, for which we perform the MWU algorithm to compute a good
solution. In one pass, we can find all frequently occurring elements by counting the number of sets containing each element; the space required for this task is O(n log m).

We call an element that appears in at least m/(αℓ) sets common, and call it rare otherwise, where α = Θ(ε). Since we are aiming for a (1 + ε)-approximation, we can define x^cmn as a vector all of whose entries are αℓ/m. The total cost of x^cmn is αℓ, and all common elements are covered by x^cmn. Thus, throughout the algorithm we may restrict our attention to the rare elements.
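The one-pass partition into common and rare elements can be sketched as follows in Python (a simplified illustration with hypothetical names; the real algorithm reads the sets from the stream and thresholds at m/(αℓ)):

```python
from collections import Counter

def partition_elements(stream_of_sets, m, ell, alpha):
    """One pass: count, for each element, the number of sets containing it,
    then split elements into common (frequency >= m/(alpha*ell)) and rare."""
    freq = Counter()
    for S in stream_of_sets:
        for e in S:
            freq[e] += 1
    threshold = m / (alpha * ell)
    common = {e for e, f in freq.items() if f >= threshold}
    rare = set(freq) - common
    # Uniform fractional solution covering every common element:
    # each set gets weight alpha*ell/m, for a total cost of alpha*ell.
    x_cmn_entry = alpha * ell / m
    return common, rare, x_cmn_entry

sets = [{1, 2}, {1, 3}, {1, 4}, {1, 2, 3}]
common, rare, x_entry = partition_elements(sets, m=len(sets), ell=2.0, alpha=0.5)
# Element 1 appears in 4 >= m/(alpha*ell) = 4 sets, so it is common and its
# coverage under x^cmn is 4 * (alpha*ell/m) = 1.
```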
Our goal now is to construct an efficient MWU-based algorithm which finds a solution x^rare covering the rare elements, with objective value at most ℓ/(1 − β) ≤ (1 + ε − α)ℓ. We
note that our implementation does not explicitly maintain the weight vector w๐ก described in
Section 3.2.1, but instead updates (and normalizes) its probability vector p๐ก in every round.
3.2.3. First Attempt: Simple Oracle and Large Width
A greedy solution for the oracle. We implement the oracle for the MWU algorithm such that ρ = ℓ, thus requiring Θ(ℓ log n/β²) iterations (Theorem 3.2.2). In each iteration, we need an oracle that finds some solution x ∈ 𝒫_ℓ satisfying p⊤Ax ≥ p⊤b − β/3, or decides that no solution in 𝒫_ℓ satisfies p⊤Ax ≥ p⊤b.
Algorithm 5 A generic implementation of FeasibilityTest. Its performance depends on the implementations of Oracle and UpdateProb. We will investigate different implementations of Oracle.
1: procedure FeasibilityTest(ℓ, ε)
2:   α, β ← ε/3, p^curr ← 1_{n×1}  ▷ the initial prob. vector for the MWU algorithm on 𝒰
3:   ▷ compute a cover of common elements in one pass
4:   x^cmn ← (αℓ/m) · 1_{m×1}, freq ← 0_{n×1}
5:   for all sets S in the stream do
6:     for all elements e ∈ S do
7:       freq_e ← freq_e + 1
8:       if e appears in more than m/(αℓ) sets (i.e., freq_e > m/(αℓ)) then  ▷ common element
9:         p^curr_e ← 0
10:  p^curr ← p^curr/‖p^curr‖  ▷ p^curr represents the current prob. vector
11:  x^total ← 0_{m×1}
12:  ▷ multiplicative weight update algorithm for covering rare elements
13:  for r = 1 to T do
14:    ▷ solve the corresp. oracle of MWU and decide if the solution is feasible
15:    try x ← Oracle(p^curr, ℓ, ℱ)
16:    x^total ← x^total + x  ▷ in one pass, update p according to x
17:    z ← 0_{n×1}
18:    for all sets S in the stream do
19:      for all elements e ∈ S do
20:        z_e ← z_e + x_S
21:    if (p^curr)⊤z < 1 − β/3 then  ▷ detect infeasible solutions returned by Oracle
22:      report INFEASIBLE
23:    p^curr ← UpdateProb(p^curr, z)
24:  x^rare ← x^total/((1 − β)T)  ▷ scale up the solution to cover rare elements
25:  return x^cmn + x^rare
Observe that p⊤Ax is maximized when we place value ℓ on x_{S*}, where S* achieves the maximum value q_S ≜ Σ_{e∈S} p_e. Further, for our application, b = 1, so p⊤b = 1. Our implementation HeavySetOracle of Oracle, given in Algorithm 6 below, is a deterministic greedy algorithm that finds a solution based on this observation. As A_e x ≤ ‖x‖₁ ≤ ℓ, HeavySetOracle implements a (1, ℓ)-bounded (β/3)-approximate oracle. Therefore, the implementation of FeasibilityTest with HeavySetOracle computes a solution of objective value at most (α + 1/(1 − β))ℓ < (1 + ε)ℓ when ℓ ≥ k, as promised.
Finally, we track the space usage, which establishes the complexity of the current version of our algorithm: it only stores vectors of length m or n, each of whose entries requires a logarithmic number of bits, yielding the following theorem.
Algorithm 6 HeavySetOracle computes q_S for every set given the set system in a stream or stored memory, then returns the solution x that optimally places value ℓ on the corresponding entry. It reports INFEASIBLE if there is no sufficiently good solution.
1: procedure HeavySetOracle(p, ℓ, ℱ)
2:   compute q_S for all S ∈ ℱ while reading the set system
3:   S* ← argmax_{S∈ℱ} q_S
4:   if q_{S*} < (1 − β/3)/ℓ then
5:     report INFEASIBLE
6:   x ← 0_{m×1}, x_{S*} ← ℓ
7:   return x
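A minimal Python sketch of this greedy oracle (our own illustration; the data layout is hypothetical: p is a dict of element weights and sets maps a set id to its elements):

```python
def heavy_set_oracle(p, ell, sets, beta):
    """Pick the set S* maximizing q_S = sum of p_e over e in S and put the
    whole budget ell on it; report infeasibility if even S* is too light."""
    q = {sid: sum(p.get(e, 0.0) for e in elems) for sid, elems in sets.items()}
    s_star = max(q, key=q.get)
    if q[s_star] < (1.0 - beta / 3.0) / ell:
        return None                       # INFEASIBLE
    x = {sid: 0.0 for sid in sets}
    x[s_star] = ell
    return x

p = {1: 0.5, 2: 0.3, 3: 0.2}
sets = {"A": {1, 2}, "B": {2, 3}}
x = heavy_set_oracle(p, ell=2.0, sets=sets, beta=0.3)
# q_A = 0.8 and q_B = 0.5, so the oracle puts the full budget on set A,
# giving p^T A x = ell * q_A = 1.6 >= 1 - beta/3.
```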
Theorem 3.2.3. There exists a streaming algorithm that w.h.p. returns a (1 + ε)-approximate fractional solution of SetCover-LP(𝒰, ℱ) in O(n log m/ε²) passes and using O(m + n) memory for any positive ε ≤ 1/2. The algorithm works in both set arrival and edge arrival streams.
The presented algorithm suffers from a large number of passes over the input. In particular, we are interested in solving fractional Set Cover in a constant number of passes using sublinear space. To this end, we first reduce the required number of rounds of MWU via a more involved implementation of Oracle.
3.3. Max Cover Problem and its Application to Width
Reduction
In this section, we improve the described algorithm in the previous section and prove the
following result.
Theorem 3.3.1. There exists a streaming algorithm that w.h.p. returns a (1 + ε)-approximate fractional solution of SetCover-LP(𝒰, ℱ) in p passes and uses O(mn^{O(1/(pε))} + n) memory for any 2 ≤ p ≤ polylog(n) and 0 < ε ≤ 1/2. The algorithm works in both set arrival and edge arrival streams.
Recall that in implementing Oracle, we must find a solution x of total size โxโ1 โค โ
with a sufficiently large weight pโคAx. Our previous implementation chooses only one good
entry ๐ฅ๐ and places its entire budget โ on this entry. As the width of the solution is roughly
the maximum amount an element is over-covered by x, this implementation induces a width
of โ. In this section, we design an oracle that returns a solution in which the budget is
distributed more evenly among the entries of x to reduce the width. To this end, we design
an implementation of Oracle for the MWU approach based on the Max ℓ-Cover problem (whose precise definition will be given shortly). The solution to our Max ℓ-Cover aids in reducing the width of our Oracle solution to a constant, so the required number of rounds of the MWU algorithm decreases to O(log n/ε²), independent of ℓ. Note that if the objective value of an optimal solution of Set Cover(𝒰, ℱ) is k, then a solution of width o(k) may not exist, as shown in Lemma 3.3.2. This observation implies that we need to work with a different set system. Besides having small width, an optimal solution of the Set Cover instance on the new set system should have the same objective value as the optimal solution of Set Cover(𝒰, ℱ).
Lemma 3.3.2. There exists a set system for which the direct application of the MWU framework in computing a (1 + ε)-approximate solution induces width ρ = Ω(k), where k is the optimal objective value. Moreover, there exists a set system for which the approach from the previous section (which handles the frequent and rare elements differently) has width ρ = Θ(k) = Θ(√(m/ε)).
Proof: For the first claim, we consider an arbitrary set system, then modify it by adding a common element e to all sets. Recall that the MWU framework returns an average of the solutions from all rounds. Thus there must exist a round in which the oracle returns a solution x of size ‖x‖₁ = Θ(k). For the added element e, this solution has Σ_{S: e∈S} x_S = Σ_{S∈ℱ} x_S = Θ(k), inducing width ρ = Ω(k).
For the second claim, consider the following set system with k = √(m/ε) and n = 2k + 1. For i = 1, . . . , k, let S_i = {e_i, e_{k+i}, e_{2k+1}}, whereas the remaining m − k sets are arbitrary subsets of {e_1, . . . , e_k}. Observe that e_{k+i} is contained only in S_i, so x_{S_i} = 1 in any valid set cover. Consequently, the solution x where x_{S_1} = · · · = x_{S_k} = 1 and all other entries are 0 forms the unique (fractional) minimum set cover, of size k = √(m/ε). Next, recall that an element is considered rarely occurring if it appears in at most m/(αℓ) > m/(εk) sets. As e_{k+1}, . . . , e_{2k} each occurs only once, and e_{2k+1} appears in only k = √(m/ε) = m/(εk) sets, these k + 1 elements are deemed rare and thus handled by the MWU framework.

The solution computed by the MWU framework satisfies Σ_{S: e∈S} x_S ≥ 1 − β for every e, and in particular for each e ∈ {e_{k+1}, . . . , e_{2k}}. Therefore, the average solution places a total weight of at least (1 − β) · Θ(k) on x_{S_1}, . . . , x_{S_k}, so there must exist a round that places at least the same total weight on these sets. However, these k sets all contain e_{2k+1}, yielding Σ_{S: e_{2k+1}∈S} x_S ≥ (1 − β) · Θ(k) = Ω(k), implying a width of Ω(k) = Ω(√(m/ε)). □
Extended set system. First, we consider the extended set system (𝒰, ℱ̂), where ℱ̂ is the collection containing all subsets of sets in ℱ; that is,

ℱ̂ ≜ {S′ : S′ ⊆ S for some S ∈ ℱ}.

It is straightforward to see that the optimal objective value of Set Cover over (𝒰, ℱ̂) is equal to that of (𝒰, ℱ): we only add subsets of the original sets to create ℱ̂, and we may replace any subset from ℱ̂ in our solution with its original set in ℱ. Moreover, we may prune any collection of sets from ℱ into a collection from ℱ̂ of the same cardinality so that this pruned collection not only covers the same elements, but each of these elements is covered exactly once. This extended set system is defined for the sake of analysis only: we will never explicitly handle an exponential number of sets throughout our algorithm.
We define an ℓ-cover to be a collection of sets of total weight ℓ. Although the pruning of an ℓ-cover reduces the width, the total weight p⊤Ax of the solution will decrease. Thus, we consider the weighted constraint of the form

Σ_{e∈𝒰} (p_e · min{1, Σ_{S: e∈S} x_S}) ≥ 1;

that is, we can only gain the value p_e without any multiplicity larger than 1. The problem of maximizing the left-hand side is known as the weighted max coverage problem: for a parameter ℓ, find an ℓ-cover such that the total value of the p_e's of the covered elements is maximized.
3.3.1. The Maximum Coverage Problem
In the design of our algorithm, we consider the weighted Max ๐-Cover problem, which is
closely related to Set Cover. Extending upon the brief description given earlier, we fully
specify the LP relaxation of this problem. In the weighted Max ๐-Cover(๐ฐ ,โฑ , โ,p), givena ground set of elements ๐ฐ , a collection of sets โฑ over the ground set, a budget parameter
MaxCover-LP — Input: 𝒰, ℱ, ℓ, p

maximize Σ_{e∈𝒰} p_e z_e
subject to Σ_{S: e∈S} x_S ≥ z_e  ∀e ∈ 𝒰
Σ_{S∈ℱ} x_S = ℓ
0 ≤ z_e ≤ 1  ∀e ∈ 𝒰
x_S ≥ 0  ∀S ∈ ℱ

Figure 3.3.1: LP relaxation of weighted Max k-Cover.
โ, and a weight vector p, the goal is to return โ sets in โฑ whose weighted coverage, the
total weight of all covered elements, is maximized. Moreover, since we are aiming for a
fractional solution of Set Cover, we consider the LP relaxation of weighted Max ๐-Cover,
MaxCover-LP (see Figure 3.3.1); in this LP relaxation, ๐ง๐ denotes the fractional amount
that an element is covered, and hence is capped at 1.
As an intermediate goal, we aim to compute an approximate solution of MaxCover-LP,
given that the optimal solution covers all elements in the ground set, or to correctly detect
that no solution has weighted coverage of more than (1 − ε). In our application, the vector p is always a probability vector: p ≥ 0 and Σ_{e∈𝒰} p_e = 1. We make the following useful observation.
Observation 3.3.3. Let ๐ be the value of an optimal solution of SetCover-LP(๐ฐ ,โฑ) andlet p be an arbitrary probability vector over the ground set. Then there exists a fractional
solution of MaxCover-LP(๐ฐ ,โฑ , โ,p) whose weighted coverage is one if โ โฅ ๐.
๐ฟ-integral near optimal solution of MaxCover-LP. Our plan is to solve MaxCover-
LP over a randomly projected set system, and argue that with high probability this will
result in a valid Oracle. Such an argument requires an application of the union bound
over the set of solutions, which is generally of unbounded size. To this end, we consider a
more restrictive domain of ๐ฟ-integral solutions: this domain has bounded size, but is still
guaranteed to contain a sufficiently good solution.
Algorithm 7 MaxCoverOracle returns a fractional ℓ-cover with weighted coverage at least 1 − β/3 w.h.p. if ℓ ≥ k. It provides no guarantee on its behavior if ℓ < k.
1: procedure MaxCoverOracle((𝒰, ℱ), ℓ)
2:   x ← solution returned by MWU using HeavySetOracle on MaxCover-LP
3:   return x

Definition 3.3.4 (δ-integral solution). A fractional solution x_{m×1} of an LP is δ-integral if (1/δ) · x is an integral vector; that is, for each i ∈ [m], x_i = v_i δ where each v_i is an integer.
Next we claim that MaxCoverOracle, given in Algorithm 7 above, which is the MWU algorithm with HeavySetOracle for solving MaxCover-LP, results in a δ-integral solution.
Lemma 3.3.5. Consider a MaxCover-LP with optimal objective value OPT (where the weights of the elements form a probability vector). There exists a Θ(ε²_MC/log n)-integral solution of MaxCover-LP whose objective value is at least (1 − ε_MC)OPT. In particular, if an optimal solution covers all elements of 𝒰 (ℓ ≥ k), MaxCoverOracle returns a solution whose weighted coverage is at least 1 − ε_MC in polynomial time.
Proof: Let (x*, z*) denote the optimal solution of value OPT to MaxCover-LP, which implies that ‖x*‖₁ ≤ ℓ and Ax* ≥ z*. Consider the following covering LP: minimize ‖x‖₁ subject to Ax ≥ z* and x ≥ 0. Clearly there exists an optimal solution of objective value ℓ, namely x*. This covering LP may be solved via the MWU framework. In particular, we may use the oracle that picks one set S with maximum weight (as maintained in the MWU framework) and places the entire budget on x_S. For an accurate guess ℓ′ = Θ(ℓ) of the optimal value, this algorithm returns an average of T = Θ(ℓ′ log n/ε²_MC) = Θ(ℓ log n/ε²_MC) oracle solutions. Observe that the output solution x is of the form x_S = v_S ℓ′/T = v_S δ, where v_S is the number of rounds in which S is chosen by the oracle, and δ = ℓ′/T = ℓ′ε²_MC/(ℓ log n) = Θ(ε²_MC/log n). In other words, x is Θ(ε²_MC/log n)-integral.

By Theorem 3.2.2, x satisfies Ax ≥ (1 − ε_MC)z*. Then in MaxCover-LP, the solution (x, (1 − ε_MC)z*) yields coverage at least p⊤((1 − ε_MC)z*) = (1 − ε_MC)p⊤z* = (1 − ε_MC)OPT. □
Pruning a fractional ℓ-cover. In our analysis, we aim to solve the Set Cover problem under the extended set system. We claim that any solution x with coverage z in the actual set system may be turned into a pruned solution x̂ in the extended set system that provides the same coverage z, but satisfies the strict equality Σ_{S∈ℱ̂: e∈S} x̂_S = z_e. Since z_e ≤ 1, the pruned solution satisfies the condition for an oracle with width one.
Algorithm 8 The Prune subroutine lifts a solution in ℱ to a solution in ℱ̂ with the same MaxCover-LP objective value and width 1. The subroutine returns z, the amount by which members of ℱ̂ cover each element. The actual pruned solution x̂ may be computed but has no further use in our algorithm and thus is not returned.
1: procedure Prune(x)
2:   x̂ ← 0_{|ℱ̂|×1}, z ← 0_{n×1}  ▷ maintain the pruned solution and its coverage amount
3:   for all S ∈ ℱ do
4:     R ← S
5:     while x_S > 0 do
6:       c ← min{x_S, min_{e∈R}(1 − z_e)}  ▷ weight to be moved from x_S to x̂_R
7:       x_S ← x_S − c, x̂_R ← x̂_R + c  ▷ move weight to the pruned solution
8:       for all e ∈ R do
9:         z_e ← z_e + c  ▷ update coverage accordingly
10:      R ← R ∖ {e ∈ R : z_e = 1}  ▷ remove e with z_e = 1 from R
11:  return z
Lemma 3.3.6. A fractional ℓ-cover x of (𝒰, ℱ) can be converted, in polynomial time, to a fractional ℓ-cover x̂ of (𝒰, ℱ̂) such that for each element e, its coverage satisfies z_e = Σ_{S∈ℱ̂: e∈S} x̂_S = min(Σ_{S: e∈S} x_S, 1).

Proof: Consider the algorithm Prune in Algorithm 8. As we pick a valid amount c ≤ x_S to move from x_S to x̂_R at each step, x̂ must be an ℓ-cover (in the extended set system) when Prune finishes. Observe that if Σ_{S: e∈S} x_S < 1, then e will never be removed from any R, so z_e is increased by x_S for every S containing e, and thus z_e = Σ_{S: e∈S} x_S. Otherwise, the condition c ≤ 1 − z_e ensures that z_e stops increasing precisely when it reaches 1. Each S takes up to n + 1 rounds in the while loop, as at least one element e ∈ R is removed at the end of each round. There are at most m sets, so the algorithm must terminate (in polynomial time).

We note that in Section 3.3.4, we need to adjust Prune to instead achieve the condition z_e = min(A_e x, 1), where the entries of A are arbitrary non-negative values. We simply make the following modifications: choose c ← min(x_S, min_{e∈R} (1 − z_e)/A_{e,S}) and update z_e ← z_e + c · A_{e,S}, and the same proof follows. □
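The Prune subroutine translates to Python roughly as follows (our own sketch; set ids and tolerances are illustrative, and the leftover weight on an exhausted set is simply discarded here rather than moved to the empty set). It returns the coverage vector z, which by Lemma 3.3.6 equals min(Σ_{S∋e} x_S, 1) for every element:

```python
from collections import defaultdict

def prune(x, sets, eps=1e-12):
    """Redistribute the weight of each set S over shrinking subsets of S so
    that no element's coverage z_e ever exceeds 1 (width-1 solution)."""
    z = defaultdict(float)
    for sid, weight in x.items():
        R = set(sets[sid])            # current subset of S receiving weight
        rem = weight
        while rem > eps:
            R = {e for e in R if z[e] < 1.0 - eps}   # drop saturated elements
            if not R:                 # all elements saturated: drop the rest
                break
            c = min(rem, min(1.0 - z[e] for e in R))
            for e in R:
                z[e] += c             # the subset R of S gains weight c
            rem -= c
        # (the pruned solution x-hat itself is not needed, only z)
    return z

x = {"A": 0.7, "B": 0.8}
sets = {"A": {1, 2}, "B": {2, 3}}
z = prune(x, sets)
# z[1] = 0.7, z[2] = min(0.7 + 0.8, 1) = 1.0, z[3] = 0.8
```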
Remark that to update the weights in the MWU framework, it suffices to have the coverage values Σ_{S∈ℱ̂: e∈S} x̂_S, which are exactly the z_e's returned by Prune; the actual solution x̂ is not necessary. Observe further that our MWU algorithm can still use x instead of x̂ as its solution, because x has no worse coverage than x̂ in every iteration, and neither does the final, average solution. Lastly, notice that the coverage z returned by Prune has the simple formula z_e = min(Σ_{S: e∈S} x_S, 1). That is, we introduce Prune to show the existence of x̂, but we never actually run Prune in our algorithm.
3.3.2. Sampling-Based Oracle for Fractional Max Coverage
In the previous section, we simply needed to compute the values q_S in order to construct a
solution for the Oracle. Here as we aim to bound the width of Oracle, our new task is to
find a fractional โ-cover x whose weighted coverage is at least 1โ๐ฝ/3. The element sampling
technique, which is also known from prior work in streaming Set Cover and Max ๐-Cover,
is to sample a few elements and solve the problem over the sampled elements only. Then,
by applying the union bound over all possible candidate solutions, it is shown that w.h.p.
a nearly optimal cover of the sampled elements also covers a large fraction of the whole
ground set. This argument applies to the aforementioned problems precisely because there
are standard ways of bounding the number of all integral candidate solutions (e.g. โ-covers).
However, in the fractional setting, there are infinitely many solutions. Consequently, we employ the notion of δ-integral solutions, for which the number of candidate solutions is bounded. In Lemma 3.3.5, we showed that there always exists a δ-integral solution to MaxCover-LP whose coverage is at least a (1 − ε_MC)-fraction of that of an optimal solution. Moreover, the number of all possible solutions is bounded by the number of ways to divide the budget ℓ into ℓ/δ equal parts of value δ and distribute them (possibly with repetition) among the m entries:

Observation 3.3.7. The number of feasible δ-integral solutions to MaxCover-LP(𝒰, ℱ, ℓ, p) is O(m^{ℓ/δ}) for any multiple ℓ of δ.
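The count in Observation 3.3.7 is the number of ways to distribute ℓ/δ indivisible parts among m entries, i.e. the stars-and-bars quantity C(m + ℓ/δ − 1, ℓ/δ) ≤ m^{ℓ/δ}. A brute-force check of this count on tiny instances (our own illustration, not part of the algorithm):

```python
from itertools import product
from math import comb

def count_delta_integral(m, parts):
    """Count vectors (v_1, ..., v_m) of non-negative integers summing to
    `parts`; each encodes a delta-integral solution x with x_i = v_i * delta
    and total budget parts * delta = ell."""
    return sum(1 for v in product(range(parts + 1), repeat=m)
               if sum(v) == parts)

for m, parts in [(2, 3), (3, 2), (4, 4)]:
    brute = count_delta_integral(m, parts)
    assert brute == comb(m + parts - 1, parts)   # stars and bars
    assert brute <= m ** parts                   # the m^(l/delta) upper bound
```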
Next, we design our algorithm using the element sampling technique: we show that a (1 − β/3)-approximate solution of MaxCover-LP can be computed using the projection of all sets in ℱ over a set of elements of size Θ(ℓ log n log(nm)/β⁴) picked according to p. For every fractional solution (x, z) and subset of elements 𝒱 ⊆ 𝒰, let C_𝒱(x) ≜ Σ_{e∈𝒱} p_e z_e denote the coverage of the elements in 𝒱, where z_e = min(1, Σ_{S: e∈S} x_S). We may omit the subscript 𝒱 in C_𝒱 if 𝒱 = 𝒰.

The following lemma, which is essentially an extension of the Element Sampling lemma of [61] to our application, MaxCover-LP, shows that a (1 − ε_MC)-approximate ℓ-cover over a set of sampled elements of size Θ(ℓ log n log(nm)/γ⁴) w.h.p. has a weighted coverage of at least (1 − 2γ)(1 − ε_MC) if there exists a fractional ℓ-cover whose coverage is 1. Thus, choosing ε_MC = γ = β/9 yields the desired guarantee for MaxCoverOracle, leading to the performance given in Theorem 3.3.9.
Lemma 3.3.8. Let ε_MC and γ be parameters. Consider the MaxCover-LP(𝒰, ℱ, ℓ, p) with an optimal solution of value OPT, and let ℬ be a multi-set of s = Θ(ℓ log n log(nm)/γ⁴) elements sampled independently at random according to the probability vector p. Let x^sol be a (1 − ε_MC)-approximate Θ(γ²/log n)-integral ℓ-cover over the sampled elements. Then, with high probability, C(x^sol) ≥ (1 − 2γ)(1 − ε_MC)OPT.
Proof: Consider the MaxCover-LP(𝒰, ℱ, ℓ, p) with optimal solution (x^OPT, z^OPT) of value OPT, and let x^sol be a (1 − ε_MC)-approximate Θ(γ²/log n)-integral ℓ-cover over the sampled elements, with corresponding coverage vector z^sol. Denote the sampled elements by ℬ = {b₁, · · · , b_s}, and let C_ℬ(x) ≜ Σ_{i=1}^s z_{b_i} denote the empirical coverage of a solution x over the multi-set ℬ. Observe that by defining each X_i as a random variable that takes the value z^OPT_e with probability p_e, the expected value of X = Σ_{i=1}^s X_i = C_ℬ(x^OPT) is

E[X] = Σ_{i=1}^s E[X_i] = s Σ_{e∈𝒰} p_e · z^OPT_e = s · C(x^OPT) = s · OPT.

Let τ = s(1 − γ)OPT. Since X_i ∈ [0, 1], by applying the Chernoff bound to X, we obtain

Pr[C_ℬ(x^OPT) ≤ τ] = Pr[X ≤ (1 − γ)E[X]] ≤ e^{−γ²E[X]/3} ≤ e^{−Ω(ℓ log(nm) log n/γ²)} = (nm)^{−Ω(ℓ log n/γ²)}.

Therefore, since x^sol is a (1 − ε_MC)-approximate solution of MaxCover-LP(ℬ, ℱ, ℓ, p), with probability 1 − (nm)^{−Ω(ℓ log n/γ²)} we have C_ℬ(x^sol) ≥ (1 − ε_MC)τ.

Next, by a similar approach, we show that for any fractional solution x, if C_ℬ(x) ≥ C_ℬ(x^OPT), then with probability 1 − (nm)^{−Ω(ℓ log n/γ²)}, C(x) ≥ ((1 − γ)/(1 + γ))(1 − ε_MC)OPT. Consider a fractional ℓ-cover (x, z) whose coverage is less than ((1 − γ)/(1 + γ))(1 − ε_MC)OPT. Let Y_i denote a random variable that takes the value z_e with probability p_e, and define Y = Σ_{i=1}^s Y_i. Then E[Y_i] = C(x) < ((1 − γ)/(1 + γ))(1 − ε_MC)OPT. For ease of analysis, let each Ȳ_i ∈ [0, 1] be an auxiliary random variable that stochastically dominates Y_i with expectation E[Ȳ_i] = ((1 − γ)/(1 + γ))(1 − ε_MC)OPT, and let Ȳ = Σ_{i=1}^s Ȳ_i, which stochastically dominates Y with expectation E[Ȳ] = s · ((1 − γ)/(1 + γ))(1 − ε_MC)OPT = (1 − ε_MC)τ/(1 + γ). We then have

Pr[C_ℬ(x) > (1 − ε_MC)τ] = Pr[Y > (1 − ε_MC)τ] = Pr[Y > (1 + γ)E[Ȳ]] ≤ Pr[Ȳ > (1 + γ)E[Ȳ]] ≤ e^{−γ²E[Ȳ]/3} ≤ (nm)^{−Ω(ℓ log n/γ²)},

using the fact that ((1 − γ)/(1 + γ))(1 − ε_MC) = Θ(1) for our range of parameters of interest. Thus,

Pr[C(x) ≤ ((1 − γ)/(1 + γ))(1 − ε_MC)OPT and C_ℬ(x) > (1 − ε_MC)τ] ≤ (nm)^{−Ω(ℓ log n/γ²)}.

In other words, except with probability (nm)^{−Ω(ℓ log n/γ²)}, a chosen solution x that offers at least as good an empirical coverage over ℬ as x^OPT (namely x^sol) does have an actual coverage of at least ((1 − γ)/(1 + γ))(1 − ε_MC)OPT.

Since the total number of Θ(γ²/log n)-integral ℓ-covers is m^{O(ℓ log n/γ²)} (Observation 3.3.7), applying the union bound, with probability at least 1 − m^{O(ℓ log n/γ²)} · (nm)^{−Ω(ℓ log n/γ²)} = 1 − 1/poly(nm), a (1 − ε_MC)-approximate Θ(γ²/log n)-integral solution of MaxCover-LP(ℬ, ℱ, ℓ, p) has weighted coverage of at least ((1 − γ)/(1 + γ))(1 − ε_MC)OPT > (1 − 2γ)(1 − ε_MC)OPT over 𝒰. □
Theorem 3.3.9. There exists a streaming algorithm that w.h.p. returns a (1 + ε)-approximate fractional solution of SetCover-LP(𝒰, ℱ) in O(log n/ε²) passes and uses Õ(m/ε⁶ + n) memory for any positive ε ≤ 1/2. The algorithm works in both set arrival and edge arrival streams.
Proof: The algorithm clearly requires Θ(T) passes to simulate the MWU algorithm. The required amount of memory, besides the O(n) needed for counting elements, is dominated by the projected set system. In each pass over the stream, we sample Θ(ℓ log(mn) log n/ε⁴) elements, and since they are rarely occurring, each is contained in at most Θ(m/(εℓ)) sets. Finally, we run log_{1+Θ(ε)} n = O(log n/ε) instances of the MWU algorithm in parallel to compute a (1 + ε)-approximate solution. In total, our space complexity is Θ(ℓ log(mn) log n/ε⁴) · Θ(m/(εℓ)) · O(log n/ε) = Õ(m/ε⁶). □
3.3.3. Final Step: Running Several MWU Rounds Together
We complete our result by further reducing the number of passes at the expense of increasing the required amount of memory, yielding our full algorithm FastFeasibilityTest in Algorithm 9. More precisely, aiming for a p-pass algorithm, we show how to execute T′ ≜ T/Θ(p) = Θ(log n/(pβ²)) rounds of the MWU algorithm in a single pass. We show that this task may be accomplished with a multiplicative factor of λ · Θ(log nm) increase in memory usage, where λ ≜ n^{Θ(1/(pβ))}.
Advance sampling. Consider a sequence of T′ consecutive rounds t = 1, . . . , T′. In order to implement the MWU algorithm for these rounds, we need (multi-)sets of sampled elements ℬ₁, . . . , ℬ_{T′} drawn according to the probability vectors p¹, . . . , p^{T′}, respectively (where p^t is the probability vector corresponding to round t). Since the probabilities of subsequent rounds are not known in advance, we circumvent this problem by choosing the sets ℬ_t with probabilities according to p¹, but with the number of samples in each set being |ℬ_t| = λ · s · Θ(log nm) instead of s. Then, once p^t is revealed, we sub-sample elements from ℬ_t to obtain ℬ′_t as follows: for a (copy of a) sampled element e ∈ ℬ_t, add e to ℬ′_t with probability p^t_e/(p¹_e λ); otherwise, simply discard it. Note that it remains to be shown that this probability is indeed at most 1.

Since each e was originally sampled with probability p¹_e, the probability that a given sample in ℬ′_t equals e is exactly p^t_e/λ. By taking λ · Θ(log nm) times the originally required number of samples s in the first place, in expectation we still have E[|ℬ′_t|] = |ℬ_t| Σ_{e∈𝒰} p^t_e/λ = (λ · s · Θ(log nm)) · (1/λ) = s · Θ(log nm). Due to the Θ(log nm) factor, by the Chernoff bound we conclude that w.h.p. |ℬ′_t| ≥ s. Thus, we have a sufficient number of elements sampled with probability according to p^t to apply Lemma 3.3.8, as needed.
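The sub-sampling step can be sketched as follows (our own illustration; p1 and pt are dicts of sampling probabilities and lam plays the role of λ). Setting pt = p1 and lam = 1 keeps every sample, which gives a simple sanity check:

```python
import random

def subsample_round(B_t, p1, pt, lam, rng):
    """Keep each sampled element e of the multi-set B_t independently with
    probability pt[e] / (p1[e] * lam); requires pt[e] <= lam * p1[e]."""
    kept = []
    for e in B_t:
        accept = pt[e] / (p1[e] * lam)
        assert accept <= 1.0 + 1e-9   # guaranteed by the bound of Lemma 3.3.10
        if rng.random() < accept:
            kept.append(e)
    return kept

p1 = {1: 0.5, 2: 0.3, 3: 0.2}
B_t = [1, 1, 2, 3, 3]                 # a multi-set sampled according to p1
rng = random.Random(0)
kept = subsample_round(B_t, p1, p1, lam=1.0, rng=rng)   # accept prob = 1
# With pt = p1 and lam = 1 every copy is kept, so kept == B_t.
```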
Change in probabilities. As noted above, we must show that the probability with which we sub-sample each element is at most 1; that is, p^t_e/p¹_e ≤ λ = n^{Θ(1/(pβ))} for every element e and every round t = 1, . . . , T′. We bound the multiplicative difference between the probabilities of two consecutive rounds as follows.

Lemma 3.3.10. Let p and p′ be the probability vectors of the elements before and after an update. Then for every element e, p′_e ≤ (1 + O(β))p_e.
Proof: Recall the weight update formula w^{t+1}_e = w^t_e (1 − β(A_e x − b_e)/(6ρ)) of the MWU framework, where A_{n×|ℱ̂|} represents the membership matrix corresponding to the extended set system (𝒰, ℱ̂). In our case, the desired coverage amount is b_e = 1. By construction, we have A_e x = z_e ≤ 1; therefore, our width is ρ = 1, and −1 ≤ A_e x − b_e ≤ 0. That is, the weight of each element cannot decrease, but may increase by at most a multiplicative factor of 1 + β/6, before normalization. Thus, even after normalization, no weight may increase by more than a factor of 1 + β/6 = 1 + O(β). □
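The update of Lemma 3.3.10 can be sketched in Python as follows (our own sketch with b_e = 1 and width ρ = 1; `update_prob` is a hypothetical name for the UpdateProb step):

```python
def update_prob(p, z, beta, rho=1.0):
    """MWU update w_e <- w_e * (1 - beta*(z_e - 1)/(6*rho)), then normalize.
    With z_e <= 1 every multiplicative factor lies in [1, 1 + beta/6]."""
    w = [pe * (1.0 - beta * (ze - 1.0) / (6.0 * rho)) for pe, ze in zip(p, z)]
    total = sum(w)
    return [we / total for we in w]

p = [0.5, 0.3, 0.2]
z = [1.0, 0.4, 0.0]                  # coverage values, all at most 1
p_next = update_prob(p, z, beta=0.3)
# Lemma 3.3.10: no entry grows by more than a 1 + beta/6 factor
assert all(pn <= (1 + 0.3 / 6) * pe + 1e-12 for pn, pe in zip(p_next, p))
```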
Therefore, after T′ = Θ(log n/(pβ²)) rounds, the probability of any element may increase by at most a factor of (1 + O(β))^{Θ(log n/(pβ²))} ≤ e^{Θ(log n/(pβ))} = n^{Θ(1/(pβ))} = λ, as desired. This concludes the proof of Theorem 3.3.1.
Implementation details. We make a few remarks about the implementation given in
Algorithm 9. First, even though we perform all sampling in advance, the decisions of Max-
CoverOracle do not depend on any โ๐ of later rounds, and UpdateProb is entirely
deterministic: there is no dependency issue between rounds. Next, we only need to perform UpdateProb on the sampled elements ℬ = ℬ₁ ∪ · · · ∪ ℬ_{T′} during the current T′ rounds. We therefore denote the probabilities with a different vector q^t over the sampled elements ℬ only. Probabilities of elements outside ℬ are not required by MaxCoverOracle during these rounds, but we simply need to spend one more pass after executing the T′ rounds of MWU to aggregate the new probability vector p over all (rare) elements. Similarly, since
MaxCoverOracle does not have the ability to verify, during the MWU algorithm, that
each solution x๐ returned by the oracle indeed provides a sufficient coverage, we check all
of them during this additional pass. Lastly, we again remark that this algorithm operates on the extended set system: the solution x returned by MaxCoverOracle has at least the same coverage as its pruned counterpart x̂. While x̂ is not explicitly computed, its coverage vector z can be computed exactly.
3.3.4. Extension to general covering LPs
We remark that our MWU-based algorithm can be extended to solve a more general class
of covering LPs. Consider the problem of finding a vector x that minimizes cโคx subject to
Algorithm 9 An efficient implementation of FeasibilityTest which performs p passes and consumes O(mn^{O(1/(pε))} + n) space.
1: procedure FastFeasibilityTest(ℓ, ε)
2:   α, β ← ε/3, p^curr ← 1_{n×1}  ▷ the initial prob. vector for the MWU algorithm on 𝒰
3:   ▷ see Algorithm 5 for the implementation of the line below
4:   compute a cover of common elements in one pass
5:   x^total ← 0_{m×1}
6:   ▷ MWU algorithm for covering rare elements
7:   for j = 1 to p do
8:     T′ ← Θ(log n/(pβ²))  ▷ number of MWU iterations performed together
9:     ▷ in one pass, project all sets in ℱ over the collections of samples ℬ₁, · · · , ℬ_{T′}
10:    sample ℬ₁, . . . , ℬ_{T′} according to p^curr, each of size s · n^{Θ(1/(pβ))} poly(log nm)
11:    ℬ ← ℬ₁ ∪ · · · ∪ ℬ_{T′}, ℱ_ℬ ← ∅  ▷ ℬ is a set whereas ℬ₁, . . . , ℬ_{T′} are multi-sets
12:    for all sets S in the stream do
13:      ℱ_ℬ ← ℱ_ℬ ∪ {S ∩ ℬ}
14:    ▷ each pass simulates T′ rounds of MWU
15:    for all e ∈ ℬ do
16:      q¹_e ← p^curr_e  ▷ project p^curr_{n×1} to q¹_{|ℬ|×1} over sampled elements
17:    q¹ ← q¹/‖q¹‖
18:    for all rounds t = 1, . . . , T′ do
19:      ℬ′_t ← sample each elt e ∈ ℬ_t with probability q^t_e/(q¹_e n^{Θ(1/(pβ))})  ▷ rejection sampling
20:      x^t ← MaxCoverOracle(ℬ′_t, ℱ_ℬ, ℓ)  ▷ w.h.p. C(x^t) ≥ 1 − β/3 when ℓ ≥ k
21:      ▷ with no additional pass, update probab. q over sampled elts accord. to x^t
22:      z ← 0_{|ℬ|×1}  ▷ compute coverage over sampled elements
23:      for all element-set pairs e ∈ S where S ∈ ℱ_ℬ do
24:        z_e ← min(z_e + x^t_S, 1)
25:      q^{t+1} ← UpdateProb(q^t, z)  ▷ only update weights of elements in ℬ
26:    ▷ in one pass, update probab. p^curr over all (rare) elts according to x¹, . . . , x^{T′}
27:    z¹, . . . , z^{T′} ← 0_{n×1}  ▷ compute coverage over all (rare) elements
28:    for all element-set pairs e ∈ S in the stream do
29:      for all rounds t = 1, . . . , T′ do
30:        z^t_e ← min(z^t_e + x^t_S, 1)
31:    for all rounds t = 1, . . . , T′ do
32:      if (p^curr)⊤z^t < 1 − β/3 then  ▷ detect infeasible solutions
33:        report INFEASIBLE
34:      x^total ← x^total + x^t, p^curr ← UpdateProb(p^curr, z^t)  ▷ perform actual updates
35:  x^rare ← x^total/((1 − β)T)  ▷ scale up the solution to cover rare elements
36:  return x^cmn + x^rare
constraints Ax ≥ b and x ≥ 0. In terms of the Set Cover problem, A_{e,S} ≥ 0 indicates the multiplicity of an element e in the set S, b_e > 0 denotes the number of times we wish e to be covered, and c_S > 0 denotes the cost per unit of the set S. Now define

δ ≜ min_{(e,S): A_{e,S} ≠ 0} A_{e,S}/(b_e c_S)  and  Δ ≜ max_{(e,S)} A_{e,S}/(b_e c_S).
Then, we may modify our algorithm to obtain the following result.
Theorem 3.3.11. There exists a streaming algorithm that w.h.p. returns a (1 + ε)-approximate fractional solution to general covering LPs in p passes and using O((mΔ/(ε⁶δ)) · n^{O(1/(pε))} + n) memory for any 3 ≤ p ≤ polylog(n), where the parameters δ and Δ are defined above. The algorithm works in both set arrival and edge arrival streams.
Proof: We modify our algorithm and argue its correctness as follows. First, observe that we can convert the input LP into an equivalent LP with all entries b_e = c_S = 1 by simply replacing each A_{e,S} with A_{e,S}/(b_e c_S). Namely, let the new parameters be A′, b′ and c′, and consider the variable x′ where x′_S = c_S x_S. It is straightforward to verify that c′⊤x′ = c⊤x and A′_e x′ = A_e x/b_e, reducing the LP to the desired case. Thus, we may afford to record b and c, so that each value A_{e,S}/(b_e c_S) may be computed on-the-fly. Henceforth we assume that all entries b_e = c_S = 1 and A_{e,S} ∈ {0} ∪ [δ, Δ]. Observe as well that the optimal objective value k may lie in the expanded range [1/Δ, n/δ], so the number of guesses must be increased from O(log n/ε) to O(log(nΔ/δ)/ε).
Next consider the process for covering the common elements. We again use a uniform solution x^cmn = (αℓ/m) · 1. Observe that if an element occurs in at least m/(αℓδ) sets, then A_e x^cmn = Σ_{S: e∈S} A_{e,S} · (αℓ/m) ≥ (m/(αℓδ)) · δ · (αℓ/m) = 1. That is, we must adjust our definition so that an element is considered common if it appears in at least m/(αℓδ) sets. Consequently, whenever we perform element sampling, the required amount of memory to store the information of each element increases by a factor of 1/δ.
Next consider Lemma 3.3.5, where we show the existence of integral solutions via the MWU algorithm with a greedy oracle. As the greedy implementation chooses a set S and places the entire budget ℓ on x_S, the amount of coverage A_{e,S} x_S may be as large as ℓΔ, as A_{e,S} is no longer bounded by 1. Thus this application of the MWU algorithm has width ρ = Θ(ℓΔ) and requires T = Θ(ℓΔ log n/ε²_MC) rounds. Consequently, its solution becomes Θ(ℓ′/T) = Θ(ε²_MC/(Δ log n))-integral. As noted in Observation 3.3.7, the number of potential solutions from the greedy
oracle increases by a power of ๐ . Then, in Lemma 3.3.8, we must reduce the error probability
of each solution by the same power. We increase the number of samples ๐ by a factor of ๐
to account for this change, increasing the required amount of memory by the same factor.
As in the previous case, any solution x may always be pruned so that the width is reduced to 1: our algorithm Prune still works as long as the entries of A are non-negative. Therefore, the fact that the entries of A may take values other than 0 or 1 does not affect the number of rounds (or passes) of our overall application of the MWU framework. Thus, we may handle general covering LPs using a factor of O(ρ/δ) more memory within the same number of passes. In particular, if the non-zero entries of the input are bounded in the range [1, ρ], this introduces a factor of O(ρ/δ) ≤ O(ρ³) overhead in memory usage. □
Chapter 4
Sublinear Set Cover
4.1. Introduction
It is well-known that although finding an optimal set cover is NP-hard, a natural greedy algorithm, which iteratively picks the "best" remaining set, achieves an essentially optimal approximation guarantee. The algorithm finds a solution of size at most k ln n, where k is the optimum cover size, and can be implemented to run in time linear in the input size. However, the input size itself could be as large as Θ(mn), so for large data sets even reading the input might be infeasible.
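For reference, the greedy algorithm just described can be sketched in a few lines (a generic implementation over an explicit set system, not one of the sub-linear algorithms of this chapter):

```python
def greedy_set_cover(universe, sets):
    """Iteratively pick the set covering the most uncovered elements;
    returns indices of chosen sets (cover size is at most k * ln n)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # One scan of the input per iteration: Theta(mn) total in the worst case.
        best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
        if not uncovered & sets[best]:
            raise ValueError("no feasible cover exists")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
cover = greedy_set_cover(range(6), sets)
assert set().union(*(sets[i] for i in cover)) == set(range(6))
```
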
This raises a natural question: is it possible to solve minimum set cover in sub-linear time? This question was previously addressed by [145, 172], who showed that one can design constant-running-time algorithms by simulating the greedy algorithm, under the assumption that the sets are of constant size and each element occurs in a constant number of sets.
However, those constant-time algorithms have a few drawbacks: they only provide a mixed multiplicative/additive guarantee (the output cover size is guaranteed to be at most k · ln n + εn), the dependence of their running times on the maximum set size is exponential, and they only output the (approximate) minimum set cover size, not the cover itself. From a different perspective, [119] (building on [84]) showed that an O(1)-approximate solution to the fractional version of the problem can be found in Õ(m/ε² + n/ε²) time¹. Combining this algorithm with randomized rounding yields an O(log n)-approximate solution to Set Cover with the same complexity.
¹The method can be further improved to Õ(m + n/ε) time (N. Young, personal communication).
In this chapter we study the complexity of sub-linear time algorithms for set cover with multiplicative approximation guarantees. Our upper bounds complement the aforementioned result of [119] by presenting algorithms which are fast when k is large, as well as algorithms that provide more accurate solutions using a sub-linear number of queries.² Equally importantly, we establish nearly matching lower bounds, some of which even hold for estimating the optimal cover size. Our upper and lower bounds are presented in Table 4.1.1.
Data access model. As in the prior work [145, 172] on Set Cover, our algorithms and lower bounds assume that the input can be accessed via the adjacency-list oracle.³ More precisely, the algorithm has access to the following two oracles:
1. EltOf: Given a set S_i and an index j, the oracle returns the jth element of S_i. If j > |S_i|, ⊥ is returned.
2. SetOf: Given an element e_i and an index j, the oracle returns the jth set containing e_i. If e_i appears in fewer than j sets, ⊥ is returned.
This is a natural model, providing a "two-way" connection between the sets and the elements. Furthermore, for some graph problems modeled by Set Cover (such as Dominating Set or Vertex Cover), such oracles are essentially equivalent to the aforementioned incidence-list model studied in sub-linear graph algorithms. We also note that the other popular access model, employing a membership oracle that answers whether an element e is contained in a set S, is not suitable for Set Cover: it is easy to see that even checking whether a feasible cover exists requires Ω(mn) time.
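Concretely, the two oracles can be viewed as the rows of two tables; the following toy, in-memory version (hypothetical class name, 1-based indices, None standing for ⊥) illustrates the interface:

```python
class SetSystemOracle:
    """Adjacency-list access to a set system: EltOf(S_i, j) and SetOf(e, j).
    Indices are 1-based as in the text; the symbol ⊥ is modeled by None."""
    def __init__(self, sets):
        self.elt_of = [sorted(S) for S in sets]   # EltOf table: one row per set
        self.set_of = {}                          # SetOf table: one row per element
        for i, S in enumerate(sets):
            for e in S:
                self.set_of.setdefault(e, []).append(i)

    def EltOf(self, i, j):
        row = self.elt_of[i]
        return row[j - 1] if j <= len(row) else None

    def SetOf(self, e, j):
        row = self.set_of.get(e, [])
        return row[j - 1] if j <= len(row) else None

oracle = SetSystemOracle([{0, 1}, {1, 2}])
assert oracle.EltOf(0, 2) == 1      # 2nd element of S_0
assert oracle.SetOf(1, 2) == 1      # 2nd set containing element 1
assert oracle.EltOf(0, 3) is None   # index past |S_0| returns ⊥
```
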
4.1.1. Our Results
In this chapter, we present algorithms and lower bounds for the Set Cover problem. The results are summarized in Table 4.1.1. The NP-hardness of this problem (or even of its O(log n)-approximate version [72, 152, 14, 138, 63]) precludes the existence of highly accurate algorithms with fast running times, while (as we show) it is still possible to design algorithms
²Note that polynomial-time algorithms with sub-logarithmic approximation factors are unlikely to exist.
³In the context of graph problems, this model is also known as the incidence-list model, and has been studied extensively; see e.g., [45, 80, 31].
Problem | Approximation | Constraints | Query Complexity | Section
Set Cover | αρ + ε | α ≥ 2 | Õ((1/ε)(m(n/k)^{1/(α−1)} + nk)) | 4.4.1
Set Cover | ρ + ε | — | Õ(mn/(kε²)) | 4.4.2
Set Cover | α | k < (n/log n)^{1/(4α+1)} | Ω̃(m(n/k)^{1/(2α)}) | 4.6
Set Cover | α | α ≤ 1.01 and k = O(n/log n) | Ω(mn/k) | 4.3.2
Cover Verification | — | k ≤ n/2 | Ω(nk) | 4.5
Table 4.1.1: A summary of our algorithms and lower bounds. We use the following notation: k ≥ 1 denotes the size of the optimum cover; α ≥ 1 denotes a parameter that determines the trade-off between the approximation quality and query/time complexities; ρ ≥ 1 denotes the approximation factor of a "black box" algorithm for set cover used as a subroutine.
with sub-linear query complexities and low approximation factors. The lower bound proofs hold for the running time of any algorithm approximating set cover in the defined data access model.
We present two algorithms with a sub-linear number of queries. First, we show that the streaming algorithm presented in [97] can be adapted so that it returns an O(α)-approximate cover using Õ(m(n/k)^{1/(α−1)} + nk) queries, which could be quadratically smaller than mn. Second, we present a simple algorithm which is tailored to the case when the value of k is large. This algorithm computes an O(log n)-approximate cover in Õ(mn/k) time (not just query complexity). Hence, by combining it with the algorithm of [119], we get an O(log n)-approximation algorithm that runs in time Õ(m + n√k).
We complement the first result by proving that for low values of k, the required number of queries is Ω̃(m(n/k)^{1/(2α)}), even for estimating the size of the optimal cover. This shows that the first algorithm is essentially optimal for the values of k where the first term in the runtime bound dominates. Moreover, we prove that even the Cover Verification problem, i.e., checking whether a given collection of k sets covers all the elements, requires Ω(nk) queries. This provides strong evidence that the term nk in the first algorithm is unavoidable. Lastly, we complement the second algorithm by showing a lower bound of Ω(mn/k) when the approximation ratio is a small constant.
4.1.2. Related work
Sub-linear algorithms for Set Cover under the oracle model have been previously studied as an estimation problem: the goal is only to approximate the size of the minimum set cover rather than to construct one. Nguyen and Onak [145] consider Set Cover under the oracle model we employ in this chapter, in a specific setting where both the maximum cardinality of sets in ℱ and the maximum number of occurrences of an element over all sets are bounded by constants s and t; this allows algorithms whose time and query complexities are constant, (2(st)⁴/ε)^{O(2^s)}, containing no dependence on m or n. They provide an algorithm for estimating the size of the minimum set cover when, unlike our work, allowing both an ln n multiplicative and an εn additive error. Their result has been subsequently improved to (st)^{O(s)}/ε² by Yoshida et al. [172]. Additionally, the results of Kuhn et al. [121] on general packing/covering LPs in the distributed LOCAL model, together with the reduction method of Parnas and Ron [149], imply that estimating the set cover size to within an O(ln s) multiplicative factor (with εn additive error) can be performed with (st)^{O(log s log t)}/ε⁴ time/query complexity.
Set Cover can also be considered a generalization of the Vertex Cover problem. The estimation variant of Vertex Cover under the adjacency-list oracle model has been studied in [149, 130, 148, 172]. Set Cover has also been studied in the sub-linear space context, most notably in the streaming model of computation [155, 67, 42, 20, 17, 29, 105, 61, 97]. In this model, there are algorithms that compute approximate set covers with only multiplicative errors. Our algorithms use some of the ideas introduced in the last two papers [61, 97].
4.1.3. Our Techniques
Overview of the algorithms. The algorithmic results presented in Section 4.4 use the techniques introduced for the streaming Set Cover problem by [61, 97] to obtain new results in the context of sub-linear time algorithms for this problem. Two components previously used for the set cover problem in the streaming context are Set Sampling and Element Sampling. Assuming the size of the minimum set cover is k, Set Sampling randomly samples Õ(k) sets and adds them to the maintained solution. This ensures that all the elements that are well represented in the input (i.e., appearing in at least m/k sets) are covered by the sampled sets. On the other hand, the Element Sampling technique samples roughly Õ(k/δ) elements, and finds a set cover for the sampled elements. It can be shown that the cover for the sampled elements covers a (1 − δ) fraction of the original elements.
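The two primitives can be sketched as follows; the constants and the `solver` callback are illustrative stand-ins, not the tuned parameters of the actual algorithms:

```python
import random

def set_sampling(m, k, rng, c=4):
    """Pick ~ c*k random set indices; w.h.p. every element occurring in at
    least m/k sets is hit by the sample (constants are illustrative)."""
    return rng.sample(range(m), min(m, c * k))

def element_sampling(universe, sets, k, delta, solver, rng, c=4):
    """Sample ~ c*k/delta elements and cover only the sample via `solver`;
    the returned sets then cover a (1 - delta) fraction of the universe
    with good probability (constants are illustrative)."""
    size = min(len(universe), int(c * k / delta))
    sample = set(rng.sample(sorted(universe), size))
    projected = [S & sample for S in sets]     # restrict sets to the sample
    return solver(sample, projected)

rng = random.Random(0)
sets = [{0, 1, 2, 3}, {4, 5, 6, 7}, {0, 4}]
picked = set_sampling(len(sets), 1, rng)
assert len(picked) == 3 and set(picked) == {0, 1, 2}

# Trivial stand-in solver: take every set that touches the sample.
chosen = element_sampling(set(range(8)), sets, k=1, delta=0.5, rng=rng,
                          solver=lambda sample, proj:
                              [j for j in range(len(proj)) if proj[j]])
assert chosen == [0, 1, 2]
```
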
Specifically, the first algorithm performs a constant number of iterations. Each iteration
uses element sampling to compute a โpartialโ cover, removes the elements covered by the
sets selected so far and recurses on the remaining elements. However, making this process
work in sub-linear time (as opposed to sub-linear space) requires new technical development.
For example, the algorithm of [97] relies on the ability to test membership for a set-element
pair, which generally cannot be efficiently performed in our model.
The second algorithm performs only one round of set sampling, and then identifies the elements that are not covered by the sampled sets, without performing a full scan of those sets. This is possible because, with high probability, only those elements that belong to few input sets are not covered by the sampled sets. Therefore, we can efficiently enumerate all pairs (e_i, S_j) with e_i ∈ S_j for those elements e_i that were not covered by the sampled sets. We then run a black-box algorithm only on the set system induced by those pairs. This approach lets us avoid the nk term present in the query and runtime bounds of the first algorithm, which makes the second algorithm highly efficient for large values of k.
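Under these assumptions, the second algorithm can be summarized as below (a high-level sketch: the residual sub-instance is materialized directly here, whereas the real algorithm enumerates its (element, set) pairs via SetOf queries, and `black_box` stands in for any approximate set cover subroutine):

```python
import random

def cover_large_k(universe, sets, k, black_box, rng):
    """One round of set sampling, then run a black-box approximation
    only on the sub-instance induced by the still-uncovered elements."""
    sampled = rng.sample(range(len(sets)), min(len(sets), 4 * k))
    covered = set().union(*(sets[i] for i in sampled))
    residual = set(universe) - covered          # w.h.p. these lie in few sets
    projected = [S & residual for S in sets]
    return sampled + black_box(residual, projected)

def greedy(residual, projected):                # stand-in black box
    chosen, left = [], set(residual)
    while left:
        i = max(range(len(projected)), key=lambda j: len(left & projected[j]))
        chosen.append(i)
        left -= projected[i]
    return chosen

rng = random.Random(1)
sets = [{0, 1, 2, 3}, {4, 5}, {6, 7}]
sol = cover_large_k(range(8), sets, k=1, black_box=greedy, rng=rng)
assert set().union(*(sets[i] for i in sol)) == set(range(8))
```
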
The Set Cover lower bound for smaller optimal value k. We establish our lower bound for the problem of estimating the size of the minimum set cover by constructing two distributions of set systems. All systems in the same distribution share the same optimal set cover size, but these sizes differ by a factor α between the two distributions; thus, in order to estimate the optimal cover size correctly, the algorithm is required to determine from which distribution its input set system is drawn. Our distributions are constructed by a novel use of the probabilistic method. Specifically, we first probabilistically construct a set system called the median instance (see Lemma 4.3.6): this set system has the property that (a) its minimum set cover size is αk, and (b) a small number of changes to the instance reduces the minimum set cover size to k. We set the first distribution to be always this median instance. Then, we construct the second distribution by a random process that performs the changes (depicted in Algorithm 10), resulting in a modified instance. This
process distributes the changes almost uniformly throughout the instance, which implies
that the changes are unlikely to be detected unless the algorithm performs a large number of
queries. We believe that this construction might find applications to lower bounds for other
combinatorial optimization problems.
The Set Cover lower bound for larger optimal value k. Our lower bound for the problem of computing an approximate set cover leverages the construction above. We create a combined set system consisting of multiple modified instances, all chosen independently at random, allowing instances with much larger k. By the properties of the random process generating modified instances, we observe that most of these modified instances have different optimal set cover solutions, and that distinguishing these instances from one another requires many queries. Thus, the algorithm is unlikely to be able to compute an optimal solution for a large fraction of these modified instances, and therefore it fails to achieve the desired approximation factor on the overall combined instance.
The Cover Verification lower bound for a cover of size k. For Cover Verification, however, we instead give an explicit construction of the distributions. We first create an underlying set structure such that, initially, the candidate sets contain all but k elements. Then we may swap each uncovered element in from a non-candidate set. Our set structure is systematically designed so that each swap only modifies a small fraction of the answers over all possible queries; hence, each swap is hard to detect without Ω(n) queries. The distribution of valid set covers is composed of instances obtained by swapping in every uncovered element, and that of non-covers is obtained similarly but leaving one element uncovered.
4.2. Preliminaries
First, we formally specify the representation of the set structures of input instances, which
applies to both Set Cover and Cover Verification.
Our lower bound proofs rely mainly on the construction of instances that are hard for the algorithm to distinguish. To this end, we define the swap operation, which exchanges a pair of elements between two sets, and show how it is implemented in the actual representation.
Definition 4.2.1 (swap operation). Consider two sets S and S′. A swap on S and S′ is defined over two elements e, e′ such that e ∈ S ∖ S′ and e′ ∈ S′ ∖ S, where S and S′ exchange e and e′. Formally, after performing swap(e, e′), S = (S ∪ {e′}) ∖ {e} and S′ = (S′ ∪ {e}) ∖ {e′}. As for the representation via EltOf and SetOf, each application of swap only modifies 2 entries for each oracle. That is, if previously e = EltOf(S, i), S = SetOf(e, j), e′ = EltOf(S′, i′), and S′ = SetOf(e′, j′), then their new values change as follows: e′ = EltOf(S, i), S′ = SetOf(e, j), e = EltOf(S′, i′), and S = SetOf(e′, j′).
In particular, we extensively use the property that the amount of change to the oracle's answers incurred by each swap is minimal. We remark that when we perform multiple swaps on disjoint set-element pairs, the swaps modify distinct entries and do not interfere with one another.
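The bookkeeping in Definition 4.2.1 amounts to two writes in each table, as the following toy illustration shows (hypothetical table layout: lists indexed by position):

```python
def apply_swap(elt_of, set_of, S, Sp, e, ep):
    """swap(e, e') between sets S and S': requires e in S \\ S', e' in S' \\ S.
    Exactly two entries change in each of the EltOf and SetOf tables."""
    i, ip = elt_of[S].index(e), elt_of[Sp].index(ep)   # positions in EltOf
    j, jp = set_of[e].index(S), set_of[ep].index(Sp)   # positions in SetOf
    elt_of[S][i], elt_of[Sp][ip] = ep, e
    set_of[e][j], set_of[ep][jp] = Sp, S

elt_of = {0: [10, 11], 1: [11, 12]}        # set index -> its elements
set_of = {10: [0], 11: [0, 1], 12: [1]}    # element -> sets containing it
apply_swap(elt_of, set_of, 0, 1, 10, 12)   # exchange 10 in S_0 with 12 in S_1
assert elt_of == {0: [12, 11], 1: [11, 10]}
assert set_of == {10: [1], 11: [0, 1], 12: [0]}
```
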
Lastly, we define the notion of query-answer history, which is a common tool for estab-
lishing lower bounds for sub-linear algorithms under query models.
Definition 4.2.2. By query-answer history, we denote the sequence of query-answer pairs ⟨(q₁, a₁), (q₂, a₂), . . . , (q_r, a_r)⟩ recording the communication between the algorithm and the oracles, where each new query q_{i+1} may only depend on the query-answer pairs (q₁, a₁), . . . , (q_i, a_i). In our case, each q_i represents either a SetOf query or an EltOf query made by the algorithm, and each a_i is the oracle's answer to that respective query according to the set structure instance.
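For concreteness, the interaction can be modeled by a wrapper that logs every query-answer pair (a hypothetical harness, used here only to make the definition concrete):

```python
class HistoryOracle:
    """Wraps an EltOf/SetOf oracle and records the query-answer history
    ⟨(q_1, a_1), (q_2, a_2), ...⟩; names and layout are ours."""
    def __init__(self, oracle):
        self.oracle = oracle
        self.history = []

    def query(self, kind, *args):            # kind is "EltOf" or "SetOf"
        answer = getattr(self.oracle, kind)(*args)
        self.history.append(((kind, args), answer))
        return answer

class TinyOracle:                            # two sets over elements {0, 1, 2}
    def EltOf(self, i, j):
        row = [[0, 1], [1, 2]][i]
        return row[j - 1] if j <= len(row) else None
    def SetOf(self, e, j):
        row = {0: [0], 1: [0, 1], 2: [1]}[e]
        return row[j - 1] if j <= len(row) else None

h = HistoryOracle(TinyOracle())
assert h.query("EltOf", 0, 2) == 1
assert h.query("SetOf", 2, 1) == 1
assert h.history == [(("EltOf", (0, 2)), 1), (("SetOf", (2, 1)), 1)]
```
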
4.3. Lower Bounds for the Set Cover Problem
In this section, we present lower bounds for Set Cover, both for small values of the optimal cover size k (in Section 4.3.1) and for large values of k (in Section 4.3.2). For low values of k, we prove the following theorem, whose proof is postponed to Section 4.6.
Theorem 4.3.1. For 2 ≤ k ≤ (n/(16α log n))^{1/(4α+1)} and 1 < α ≤ log n, any randomized algorithm that solves the Set Cover problem with approximation factor α and success probability at least 2/3 requires Ω̃(m(n/k)^{1/(2α)}) queries.
Instead, in Section 4.3.1 we focus on a simple setting of this theorem, which applies to protocols for distinguishing between instances with minimum set cover sizes 2 and 3, and show a lower bound of Ω̃(mn) (which is tight up to a polylogarithmic factor) for approximation factor 3/2. This simplification serves both clarity and the fact that the result for this case is used in Section 4.3.2 to establish our lower bound for large values of k.
High level idea. Our approach for establishing the lower bound is as follows. First, we construct a median instance I* for Set Cover whose minimum set cover size is 3. We then apply a randomized procedure GenModifiedInst, which slightly modifies the median instance into a new instance containing a set cover of size 2. Applying Yao's principle, the input distribution for the deterministic algorithm is either I* with probability 1/2, or a modified instance generated through GenModifiedInst(I*), denoted by 𝒟(I*), again with probability 1/2. Next, we consider the execution of the deterministic algorithm. We show that unless the algorithm asks Ω̃(mn) queries, the query-answer history generated over I* would be the same as those generated over instances constituting a constant fraction of 𝒟(I*), reducing the algorithm's success probability to below 2/3. More specifically, we will establish the following theorem.
Theorem 4.3.2. Any algorithm that can distinguish whether the input instance is I* or belongs to 𝒟(I*) with success probability greater than 2/3 requires Ω(mn/log n) queries.
Corollary 4.3.3. For 1 < α < 3/2 and k ≤ 3, any randomized algorithm that approximates the size of the optimal cover for the Set Cover problem within a factor of α with success probability at least 2/3 requires Ω̃(mn) queries.
For simplicity, we assume that the algorithm has knowledge of our construction (which may only strengthen our lower bounds); this includes I* and 𝒟(I*), along with their representation via EltOf and SetOf. The objective of the algorithm is simply to distinguish them. Since we are distinguishing a distribution of instances 𝒟(I*) against a single instance I*, we may individually upper bound the probability that each query-answer pair reveals the modified part of the instance, and then apply the union bound directly. However, establishing such a bound requires a certain set of properties, which we obtain through a careful design of I* and GenModifiedInst. We remark that our approach shows the hardness of distinguishing instances with different cover sizes. That is, our lower bound on the query complexity also holds for the problem of approximating the size of the minimum set cover (without explicitly finding one).
Lastly, in Section 4.3.2 we provide a construction utilizing Theorem 4.3.2 to extend Corollary 4.3.3, establishing the following theorem on lower bounds for larger minimum set cover sizes.
Theorem 4.3.4. For any sufficiently small approximation factor α ≤ 1.01 and k = O(n/log n), any randomized algorithm that computes an α-approximation to the Set Cover problem with success probability at least 0.99 requires Ω(mn/k) queries.
4.3.1. The Set Cover Lower Bound for Small Optimal Value k
Construction of the Median Instance I*. Let ℱ be a collection of m sets such that, independently for each set-element pair (S, e), S contains e with probability 1 − p₀, where p₀ = √(9 log n / n) (note that log n ≤ n/36 for large enough n, so we can assume p₀ ≤ 1/2). Equivalently, we may consider the incidence matrix of this instance: each entry is either 0 (indicating e ∉ S) with probability p₀, or 1 (indicating e ∈ S) otherwise. We write ℱ ∼ ℬ(𝒰, p₀) to denote the collection of sets obtained from this construction.
Definition 4.3.5 (Median instance). An instance I of Set Cover is a median instance if it satisfies all of the following properties.
(a) No two sets cover all the elements. (The size of its minimum set cover is at least 3.)
(b) For any two sets, the number of elements not covered by the union of these sets is at most 18 log n.
(c) The intersection of any two sets has size at least n/8.
(d) For any pair of elements e, e′, the number of sets S s.t. e ∈ S but e′ ∉ S is at least m√(9 log n)/(4√n).
(e) For any triple of sets S, S₁ and S₂, |(S₁ ∩ S₂) ∖ S| ≤ 6√(n log n).
(f) For each element, the number of sets that do not contain that element is at most 6m√(log n / n).
Lemma 4.3.6. There exists a median instance I* satisfying all properties of Definition 4.3.5. In fact, an instance drawn from the distribution in which Pr[e ∈ S] = 1 − p₀ independently at random satisfies the median properties with high probability.
The proof of the lemma follows from standard applications of concentration bounds. Specif-
ically, it follows from the union bound and Lemmas 4.7.1โ4.7.6, appearing in Section 4.7.
Distribution 𝒟(I*) of Modified Instances I′ Derived from I*. Fix a median instance I*. We now show that we may perform O(log n) swap operations on I* so that the size of the minimum set cover in the modified instance becomes 2. Moreover, its incidence matrix differs from that of I* in O(log n) entries. Consequently, the number of queries to EltOf and SetOf that induce different answers from those of I* is also at most O(log n).
We define 𝒟(I*) as the distribution of instances I′ generated from a median instance I* by GenModifiedInst(I*), given below in Algorithm 10, as follows. Assume that I* = (𝒰, ℱ). We select two different sets S₁, S₂ from ℱ uniformly at random; we aim to turn these two sets into a set cover. To do so, we swap out some of the elements in S₂ and bring in the uncovered elements. For each uncovered element e, we pick an element e′ ∈ S₂ that is also covered by S₁. Next, consider the candidate sets with which we may exchange e for e′ ∈ S₂:
Definition 4.3.7 (Candidate set). For any pair of elements e, e′, the candidate sets of (e, e′) are all sets that contain e but not e′. The collection of candidate sets of (e, e′) is denoted by Candidate(e, e′). Note that Candidate(e, e′) ≠ Candidate(e′, e) (in fact, these two collections are disjoint).
Algorithm 10 The procedure for constructing a modified instance of I*.
1: procedure GenModifiedInst(I* = (𝒰, ℱ))
2:   ℳ ← ∅
3:   pick two different sets S₁, S₂ from ℱ uniformly at random
4:   for all e ∈ 𝒰 ∖ (S₁ ∪ S₂) do
5:     pick e′ ∈ (S₁ ∩ S₂) ∖ ℳ uniformly at random
6:     ℳ ← ℳ ∪ {e′}
7:     pick a random set S in Candidate(e, e′)
8:     swap(e, e′) between S and S₂
We choose a random set S from Candidate(e, e′), and swap e ∈ S with e′ ∈ S₂ so that S₂ now contains e. We apply this process repeatedly for all initially uncovered e, so that eventually S₁ and S₂ form a set cover. We show that the proposed algorithm, GenModifiedInst, can indeed be executed without getting stuck.
Lemma 4.3.8. The procedure GenModifiedInst is well-defined under the precondition that the input instance I* is a median instance.
Proof: To carry out the algorithm, we must ensure that the number of initially uncovered elements is at most the number of elements covered by both S₁ and S₂. This follows from the properties of median instances (see Definition 4.3.5): |𝒰 ∖ (S₁ ∪ S₂)| ≤ 18 log n by property (b), and the size of the intersection of S₁ and S₂ is greater than n/8 by property (c). That is, in our construction there are sufficiently many possible choices of e′ to be matched and swapped with each uncovered element e. Moreover, by property (d) there are plenty of candidate sets S for performing swap(e, e′) with S₂. □
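A direct transcription of Algorithm 10 over explicit sets (rather than the EltOf/SetOf tables) may look as follows; the toy instance is chosen so that the procedure's preconditions hold for any random choices:

```python
import random

def gen_modified_inst(sets, rng):
    """Algorithm 10, sketched: make the two randomly chosen sets a cover of
    size 2 by swapping each uncovered element into S2."""
    universe = set().union(*sets)
    i1, i2 = rng.sample(range(len(sets)), 2)
    S1 = sets[i1]
    inter = S1 & sets[i2]                      # S1 ∩ S2 before any swap
    matched = set()                            # the matching ℳ
    for e in sorted(universe - (S1 | sets[i2])):
        ep = rng.choice(sorted(inter - matched))
        matched.add(ep)
        candidates = [j for j, S in enumerate(sets)
                      if e in S and ep not in S]        # Candidate(e, e')
        j = rng.choice(candidates)
        sets[j] = (sets[j] | {ep}) - {e}       # swap(e, e') between S_j ...
        sets[i2] = (sets[i2] | {e}) - {ep}     # ... and S2
    assert S1 | sets[i2] == universe           # the two sets now form a cover
    return i1, i2

rng = random.Random(3)
sets = [{0, 1, 2}, {0, 1, 3}, {2, 3, 4}]
i1, i2 = gen_modified_inst(sets, rng)
assert sets[i1] | sets[i2] == {0, 1, 2, 3, 4}
```
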
Bounding the Probability of Modification. Let 𝒟(I*) denote the distribution of instances generated by GenModifiedInst(I*). If an algorithm were to distinguish between I* and I′ ∼ 𝒟(I*), it would have to find some cell of the EltOf or SetOf tables that was modified by GenModifiedInst, thereby confirming that GenModifiedInst was indeed executed; otherwise it would make a wrong decision half of the time. We will show an additional property of this distribution: no entry of EltOf or SetOf is significantly more likely than the others to be modified during the execution of GenModifiedInst. Consequently, no algorithm may strategically detect the difference between I* and I′ with the desired probability unless its number of queries is asymptotically the reciprocal of the maximum modification probability over all cells.
Define P_Elt-Set : 𝒰 × ℱ → [0, 1] as the probability that an element is swapped by a set. More precisely, for an element e ∈ 𝒰 and a set S ∈ ℱ, if e ∉ S in the median instance I*, then P_Elt-Set(e, S) = 0; otherwise, it is equal to the probability that S swaps e. We note that these probabilities are taken over I′ ∼ 𝒟(I*), where I* is a fixed median instance. That is, as per Algorithm 10, they correspond to the random choices of S₁ and S₂, the random matching ℳ between 𝒰 ∖ (S₁ ∪ S₂) and S₁ ∩ S₂, and the random choices of the candidate sets S. We bound the values of P_Elt-Set via the following lemma.
Lemma 4.3.9. For any e ∈ 𝒰 and S ∈ ℱ, P_Elt-Set(e, S) ≤ 4800 log n/(mn), where the probability is taken over I′ ∼ 𝒟(I*).
Proof: Let S₁, S₂ denote the first two sets picked (uniformly at random) from ℱ to construct a modified instance of I*. For each element e and set S such that e ∈ S in the median instance I*,

P_Elt-Set(e, S) = Pr[S = S₂] · Pr[e ∈ S₁ ∩ S₂] · Pr[e is matched to 𝒰 ∖ (S₁ ∪ S₂) | e ∈ S₁ ∩ S₂]
  + Pr[S ∉ {S₁, S₂}] · Pr[e ∈ S ∖ (S₁ ∪ S₂) | e ∈ S] · Pr[S swaps e with S₂ | e ∈ S ∖ (S₁ ∪ S₂)],

where all probabilities are taken over I′ ∼ 𝒟(I*). Next we bound each of the six terms above. Since we choose the sets S₁, S₂ randomly, Pr[S = S₂] = 1/m. We bound the second term by 1. For the third term, since we pick a matching uniformly at random among all possible (maximum) matchings between 𝒰 ∖ (S₁ ∪ S₂) and S₁ ∩ S₂, by symmetry the probability that a certain element e ∈ S₁ ∩ S₂ is in the matching is, by properties (b) and (c) of median instances,

|𝒰 ∖ (S₁ ∪ S₂)| / |S₁ ∩ S₂| ≤ (18 log n)/(n/8) = 144 log n / n.

We bound the fourth term by 1. To compute the fifth term, let B_e denote the number of sets in ℱ that do not contain e. By property (f) of median instances, the probability that e ∈ S is in S ∖ (S₁ ∪ S₂), given that S ∉ {S₁, S₂}, is at most

B_e(B_e − 1) / ((m − 1)(m − 2)) ≤ (36m² · (log n)/n) / (m²/2) = 72 log n / n.

Finally, for the last term, note that by symmetry each pair of matched elements (e, e′) is picked by GenModifiedInst equiprobably. Thus, for any e ∈ S ∖ (S₁ ∪ S₂), the probability that a given element e′ ∈ S₁ ∩ S₂ is matched to e is 1/|S₁ ∩ S₂|. By properties (c)–(e) of median instances, the last term is at most

Σ_{e′ ∈ (S₁ ∩ S₂) ∖ S} Pr[(e, e′) ∈ ℳ] · 1/|Candidate(e, e′)| = |(S₁ ∩ S₂) ∖ S| · (1/|S₁ ∩ S₂|) · (1/|Candidate(e, e′)|) ≤ 6√(n log n) · (8/n) · (4√n)/(m√(9 log n)) = 64/m.

Therefore,

P_Elt-Set(e, S) ≤ (1/m) · 1 · (144 log n)/n + 1 · (72 log n)/n · (64/m) ≤ 4800 log n/(mn). □
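The constants in the final bound of Lemma 4.3.9 can be sanity-checked mechanically; the following integer arithmetic mirrors the chain of bounds in the proof:

```python
# Third term: |U \ (S1 ∪ S2)| / |S1 ∩ S2| ≤ (18 log n)/(n/8) = 144 log n / n.
assert 18 * 8 == 144
# Last term: 6*sqrt(n log n) * (8/n) * 4*sqrt(n) / (m*sqrt(9 log n)) = 64/m,
# using sqrt(9 log n) = 3*sqrt(log n), so the coefficient is 6*8*4/3.
assert 6 * 8 * 4 // 3 == 64
# Total: (144 + 72*64) log n/(mn) = 4752 log n/(mn) ≤ 4800 log n/(mn).
assert 144 + 72 * 64 == 4752 and 4752 <= 4800
```
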
Proof of Theorem 4.3.2. Now we consider a median instance I* and its corresponding family of modified instances 𝒟(I*). To prove the promised lower bound for randomized protocols distinguishing I* and I′ ∼ 𝒟(I*), we apply Yao's principle and instead show that no deterministic algorithm 𝒜 may determine whether the input is I* or I′ ∼ 𝒟(I*) with success probability at least 2/3 using o(mn/log n) queries. Recall that if 𝒜's query-answer history ⟨(q₁, a₁), . . . , (q_r, a_r)⟩ when executed on I′ is the same as that on I*, then 𝒜 must unavoidably return a wrong decision for the probability mass corresponding to I′. We bound the probability of this event as follows.
Lemma 4.3.10. Let Q be the set of queries made by 𝒜 on I*, and let I′ ∼ 𝒟(I*) where I* is a given median instance. Then the probability that 𝒜 returns different outputs on I* and I′ is at most (4800 log n/(mn)) · |Q|.
Proof: Let 𝒜(I) denote the algorithm's output on input instance I (whether the given instance is I* or drawn from 𝒟(I*)). For each query q, let ans_I(q) denote the answer of I to query q. Observe that since 𝒜 is deterministic, if all of the oracle's answers to its previous queries are the same, then it must make the same next query. Combining this fact with the union bound, we may upper bound the probability that 𝒜 returns different outputs on I* and I′ ∼ 𝒟(I*) as follows:

Pr[𝒜(I*) ≠ 𝒜(I′)] ≤ Σ_{t=1}^{|Q|} Pr[ans_{I*}(q_t) ≠ ans_{I′}(q_t)].

For each q ∈ Q, let S(q) and e(q) denote, respectively, the set and element queried by q. Applying Lemma 4.3.9, we obtain

Pr[𝒜(I*) ≠ 𝒜(I′)] ≤ Σ_{t=1}^{|Q|} Pr[ans_{I*}(q_t) ≠ ans_{I′}(q_t)] ≤ Σ_{t=1}^{|Q|} P_Elt-Set(e(q_t), S(q_t)) ≤ (4800 log n)/(mn) · |Q|. □
Proof of Theorem 4.3.2. If 𝒜 does not output correctly on I*, its success probability is less than 1/2; thus, we may assume that 𝒜 returns the correct answer on I*. This implies that 𝒜 returns an incorrect answer on the fraction of I′ ∼ 𝒟(I*) for which 𝒜(I*) = 𝒜(I′). Now recall that the distribution to which we apply Yao's principle consists of I* with probability 1/2, and an instance drawn uniformly at random from 𝒟(I*) also with probability 1/2. Then over this distribution, by Lemma 4.3.10,

Pr[𝒜 succeeds] ≤ 1 − (1/2) · Pr_{I′∼𝒟(I*)}[𝒜(I*) = 𝒜(I′)] ≤ 1 − (1/2) · (1 − (4800 log n)/(mn) · |Q|) = 1/2 + (2400 log n)/(mn) · |Q|.

Thus, if the number of queries made by 𝒜 is less than mn/(14400 log n), then the probability that 𝒜 returns the correct answer over the input distribution is less than 2/3, and the proof is complete. □
4.3.2. The Set Cover Lower Bound for Large Optimal Value k.
Our construction of the median instance I* and its associated distribution 𝒟(I*) of modified instances also leads to a lower bound of Ω(mn/k) for the problem of computing an approximate solution to Set Cover. This lower bound matches the performance of our algorithm for large optimal value k and shows that it is tight for some range of values of k, albeit only for sufficiently small approximation factors α ≤ 1.01.
Proof overview. We construct a distribution over compounds: a compound is a Set Cover instance that consists of t = Θ(k) smaller instances I₁, . . . , I_t, where each of these t instances is either the median instance I* or a random modified instance drawn from 𝒟(I*). By our construction, a large majority of our distribution is composed of compounds containing at least 0.2t modified instances I_i that any deterministic algorithm 𝒜 must fail to distinguish from I* when it is only allowed a small number of queries. A deterministic 𝒜 can safely cover these modified instances with three sets each, incurring a cost (sub-optimality) of 0.2t. Still, 𝒜 may choose to cover such an I_i with two sets to reduce its cost, but it then must err on a different compound in which I_i is replaced with I*. We track the trade-off between the amount of cost that 𝒜 saves on these compounds by covering these I_i's with two sets, and the amount of error its scheme incurs on other compounds. 𝒜 is allowed a small probability δ of making errors, which we use to upper bound the expected cost that 𝒜 may save, and conclude that 𝒜 still incurs an expected cost of 0.1t overall. We apply Yao's principle (for algorithms with errors) to conclude that randomized algorithms also incur an expected cost of 0.05t on compounds with optimal solution size k ∈ [2t, 3t], yielding the impossibility result for computing solutions with approximation factor α = (k + 0.1t)/k > 1.01 when given insufficiently many queries. Next, we give a high-level overview of the lower bound argument.
Compounds. Consider the median instance ๐ผ* and its associated distribution ๐(๐ผ*) of
modified instances for Set Cover with ๐ elements and ๐ sets, and let ๐ก = ฮ(๐) be a positive
integer parameter. We define a compound $\mathcal{I} = \mathcal{I}(I_1, I_2, \ldots, I_t)$ as a set structure instance
consisting of $t$ median or modified instances $I_1, I_2, \ldots, I_t$, forming a set structure $(\mathcal{U}^t, \mathcal{F}^t)$ of
$n' \triangleq nt$ elements and $m' \triangleq mt$ sets, in such a way that each instance $I_i$ occupies separate
elements and sets. Since the optimal solution to each instance ๐ผ๐ is 3 if ๐ผ๐ = ๐ผ*, and 2 if ๐ผ๐
is any modified instance, the optimal solution for the compound is 2๐ก plus the number of
occurrences of the median instance; this optimal objective value is always ฮ(๐).
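Since the instances occupy disjoint elements and sets, this decomposition of the optimal value can be written explicitly:

```latex
\mathrm{OPT}\big(\mathcal{I}(I_1,\dots,I_t)\big)
  \;=\; \sum_{i=1}^{t} \mathrm{OPT}(I_i)
  \;=\; 2t \;+\; \big|\{\, i \in [t] \;:\; I_i = I^* \,\}\big|
  \;\in\; [2t,\, 3t].
```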
Random distribution over compounds. Employing Yaoโs principle, we construct a
distribution D of compounds I(๐ผ1, ๐ผ2, . . . , ๐ผ๐ก): it will be applied against any deterministic
algorithm ๐ for computing an approximate minimum set cover, which is allowed to err on at
most a ๐ฟ-fraction of the compounds from the distribution (for some small constant ๐ฟ > 0).
For each $i \in [t]$, we pick $I_i = I^*$ with probability $c/\binom{m}{2}$, where $c > 2$ is a sufficiently large
constant. Otherwise, we simply draw a random modified instance $I_i \sim \mathcal{D}(I^*)$. We aim to show
that, in expectation over $\mathcal{D}$, $\mathcal{A}$ must output a solution of size $\Theta(t)$ more than the optimal
set cover size of the given instance $\mathcal{I} \sim \mathcal{D}$.
๐ frequently leaves many modified instances undetected. Consider an instance I
containing at least 0.95๐ก modified instances. These instances constitute at least a 0.99-
fraction of D: the expected number of occurrences of the median instance in each compound
is only $c/\binom{m}{2} \cdot t = O(t/m^2)$, so by Markov's inequality, the probability that there are more
than $0.05t$ median instances is at most $O(1/m^2) < 0.01$ for large $m$. We make use of the
following useful lemma, whose proof is deferred to Section 4.3.2. In what follows, we say
that the algorithm โdistinguishesโ or โdetects the differenceโ between ๐ผ๐ and ๐ผ* if it makes a
query that induces different answers, and thus may deduce that one of ๐ผ๐ or ๐ผ* cannot be the
input instance. In particular, if ๐ผ๐ = ๐ผ* then detecting the difference between them would
be impossible.
Lemma 4.3.11. Fix ๐ โ [๐ก] and consider the distribution over compounds I(๐ผ1, . . . , ๐ผ๐ก)
with $I_i \sim \mathcal{D}(I^*)$ for $i \in S$ and $I_i = I^*$ for $i \notin S$. If $\mathcal{A}$ makes at most $O\!\left(\frac{mnt}{\log m}\right)$ queries to
$\mathcal{I}$, then it may detect the differences between $I^*$ and at least $0.75t$ of the modified instances
$\{I_i\}_{i \in S}$ with probability at most $0.01$.
We apply this lemma for any |๐ | โฅ 0.95๐ก (although the statement holds for any ๐ , even
vacuously for |๐ | < 0.75๐ก). Thus, for 0.99 ยท 0.99 > 0.98-fraction of D, ๐ fails to identify,
for at least 0.95๐ก โ 0.75๐ก = 0.2๐ก modified instances ๐ผ๐ in I, whether it is a median instance
or a modified instance. Observe that the query-answer history of ๐ on such I would not
change if we were to replace any combination of these 0.2๐ก modified instances by copies of
๐ผ*. Consequently, if the algorithm were to correctly cover I by using two sets for some of
these ๐ผ๐, it must unavoidably err (return a non-cover) on the compound where these ๐ผ๐โs are
replaced by copies of the median instance.
Charging argument. We call a compound I tough if ๐ does not err on I, and ๐ fails
to detect at least 0.2๐ก modified instances; denote by Dtough the conditional distribution of
D restricted to tough instances. For tough I, let cost(I) denote the number of modified
instances ๐ผ๐ that the algorithm decides to cover with three sets. That is, for each tough
compound I, cost(I) measures how far the solution returned by ๐ is, from the optimal set
cover size. Then, there are at least 0.2๐ก โ cost(I) modified instances ๐ผ๐ that ๐ chooses to
cover with only two sets despite not being able to verify whether $I_i = I^*$ or not. Let $R_{\mathcal{I}}$
denote the set of the indices of these modified instances, so $|R_{\mathcal{I}}| = 0.2t - \mathrm{cost}(\mathcal{I})$. By doing
so, $\mathcal{A}$ then errs on the replaced compound $r(\mathcal{I}, R_{\mathcal{I}})$, denoting the compound similar to $\mathcal{I}$,
except that each modified instance $I_i$ for $i \in R_{\mathcal{I}}$ is replaced by $I^*$. In this event, we say
that the tough compound $\mathcal{I}$ charges the replaced compound $r(\mathcal{I}, R_{\mathcal{I}})$ via $R_{\mathcal{I}}$. Recall that
the total error of ๐ is ๐ฟ: this quantity upper-bounds the total probability masses of charged
instances, which we will then manipulate to obtain a lower bound on EIโผD[cost(I)].
Instances must share optimal solutions for ๐ to charge the same replaced in-
stance. Observe that many tough instances may charge to the same replaced instance: we
must handle these duplicities. First, consider two tough instances $\mathcal{I}_1 \neq \mathcal{I}_2$ charging the same
$\mathcal{I}^r = r(\mathcal{I}_1, R) = r(\mathcal{I}_2, R)$ via the same $R = R_{\mathcal{I}_1} = R_{\mathcal{I}_2}$. As $\mathcal{I}_1 \neq \mathcal{I}_2$ but $r(\mathcal{I}_1, R) = r(\mathcal{I}_2, R)$,
these tough instances differ on some modified instances with indices in ๐ . Nonetheless, the
query-answer histories of ๐ operating on I1 and I2 must be the same as their instances in
๐ are both indistinguishable from ๐ผ* by the deterministic ๐. Since ๐ does not err on tough
instances (by definition), both tough I1 and I2 must share the same optimal set cover on
every instance in ๐ . Consequently, for each fixed ๐ , only tough instances that have the same
optimal solution for modified instances in ๐ may charge the same replaced instance via ๐ .
Charged instance is much heavier than charging instances combined. By our construction of $\mathcal{I}(I_1, \ldots, I_t)$ drawn from $\mathcal{D}$, $\Pr[I_i = I^*] = c/\binom{m}{2}$ for the median instance. On the
other hand, $\sum_{j=1}^{\ell} \Pr[I_i = I_j] \le \left(1 - \frac{c}{\binom{m}{2}}\right) \cdot \frac{1}{\binom{m}{2}} < \frac{1}{\binom{m}{2}}$ for modified instances $I_1, \ldots, I_\ell$
sharing the same optimal set cover, because they are all modified instances constructed to
have the two sets chosen by GenModifiedInst as their optimal set cover: each pair of sets
is chosen uniformly with probability $1/\binom{m}{2}$. Thus, the probability that $I^*$ is chosen is more
than $c$ times the total probability that any $I_j$ is chosen. Generalizing this observation, we
consider tough instances $\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_\ell$ charging the same $\mathcal{I}^r$ via $R$, and bound the difference
in probabilities that $\mathcal{I}^r$ and any $\mathcal{I}_j$ are drawn. For each index in $R$, it is more than $c$ times
more likely for $\mathcal{D}$ to draw the median instance rather than any modified instance with a fixed
optimal solution. Then, for the replaced compound $\mathcal{I}^r$ on which $\mathcal{A}$ errs, $p(\mathcal{I}^r) \ge c^{|R|} \cdot \sum_{j=1}^{\ell} p(\mathcal{I}_j)$
(where $p$ denotes the probability mass in $\mathcal{D}$, not in $\mathcal{D}^{\mathrm{tough}}$). In other words, the probability
mass of the replaced instance charged via $R$ is always at least $c^{|R|}$ times the total probability
mass of the charging tough instances.
Bounding the expected cost using $\delta$. In our charging argument by tough instances
above, we only bound the amount of charges on the replaced instances via a fixed $R$. As
there are up to $2^t$ choices for $R$, we scale down the total amount charged to a replaced
instance by a factor of $2^t$, so that $\sum_{\text{tough } \mathcal{I}} c^{|R_{\mathcal{I}}|} p(\mathcal{I})/2^t$ lower-bounds the total probability
mass of the replaced instances on which $\mathcal{A}$ errs.
Let us first focus on the conditional distribution Dtough restricted to tough instances.
Recall that at least a (0.98โ ๐ฟ)-fraction of the compounds in D are tough: ๐ fails to detect
differences between 0.2๐ก modified instances from the median instance with probability 0.98,
and among these compounds, $\mathcal{A}$ may err on at most a $\delta$-fraction. So in the conditional
distribution $\mathcal{D}^{\mathrm{tough}}$ over tough instances, the individual probability mass is scaled up to
$p^{\mathrm{tough}}(\mathcal{I}) \le \frac{p(\mathcal{I})}{0.98 - \delta}$. Thus,
$$\sum_{\text{tough } \mathcal{I}} \frac{c^{|R_{\mathcal{I}}|}\, p(\mathcal{I})}{2^t} \;\ge\; \sum_{\text{tough } \mathcal{I}} \frac{c^{|R_{\mathcal{I}}|}\, (0.98-\delta)\, p^{\mathrm{tough}}(\mathcal{I})}{2^t} \;=\; \frac{(0.98-\delta)\, \mathbb{E}_{\mathcal{I} \sim \mathcal{D}^{\mathrm{tough}}}\big[c^{|R_{\mathcal{I}}|}\big]}{2^t}.$$
As the probability mass above cannot exceed the total allowed error $\delta$, we have
$$\frac{\delta}{0.98-\delta} \cdot 2^t \;\ge\; \mathbb{E}_{\mathcal{I} \sim \mathcal{D}^{\mathrm{tough}}}\big[c^{|R_{\mathcal{I}}|}\big] \;\ge\; \mathbb{E}_{\mathcal{I} \sim \mathcal{D}^{\mathrm{tough}}}\big[c^{0.2t - \mathrm{cost}(\mathcal{I})}\big] \;\ge\; c^{0.2t - \mathbb{E}_{\mathcal{I} \sim \mathcal{D}^{\mathrm{tough}}}[\mathrm{cost}(\mathcal{I})]},$$
where Jensen's inequality is applied in the last step. So,
$$\mathbb{E}_{\mathcal{I} \sim \mathcal{D}^{\mathrm{tough}}}[\mathrm{cost}(\mathcal{I})] \;\ge\; 0.2t - \frac{t + \log \frac{\delta}{0.98-\delta}}{\log c} \;=\; \left(0.2 - \frac{1}{\log c}\right) t - \frac{\log \frac{\delta}{0.98-\delta}}{\log c} \;\ge\; 0.11t,$$
for sufficiently large $c$ (and $t$) when choosing $\delta = 0.02$.
We now return to the expected cost over the entire distribution $\mathcal{D}$. For simplicity, define
$\mathrm{cost}(\mathcal{I}) = 0$ for any non-tough $\mathcal{I}$. This yields $\mathbb{E}_{\mathcal{I} \sim \mathcal{D}}[\mathrm{cost}(\mathcal{I})] \ge (0.98-\delta)\, \mathbb{E}_{\mathcal{I} \sim \mathcal{D}^{\mathrm{tough}}}[\mathrm{cost}(\mathcal{I})] \ge (0.98-\delta) \cdot 0.11t \ge 0.1t$, establishing the expected cost of any deterministic $\mathcal{A}$ with probability
of error at most $0.02$ over $\mathcal{D}$.
Establishing the lower bound for randomized algorithms. Lastly, we apply Yaoโs
principle4 to obtain that, for any randomized algorithm with error probability ๐ฟ/2 = 0.01,
its expected cost under the worst input is at least $\frac{1}{2} \cdot 0.1t = 0.05t$. Recall now that our cost
here lower-bounds the sub-optimality of the computed set cover (that is, the algorithm uses
at least cost more sets to cover the elements than the optimal solution does). Since our input
instances have optimal solution ๐ โ [2๐ก, 3๐ก] and the randomized algorithm returns a solution
with cost at least $0.05t$ in expectation, it achieves an approximation factor no better than
$\alpha = \frac{k + 0.05t}{k} > 1.01$ with $O\!\left(\frac{mnt}{\log m}\right)$ queries. Theorem 4.3.4 then follows, noting the substitution
of our problem size: $\frac{mnt}{\log m} = \frac{(m'/t)(n'/t) \cdot t}{\log(m'/t)} = \Theta\!\left(\frac{m'n'}{k' \log m'}\right)$.
Proof of Lemma 4.3.11. First, we recall the following result from Lemma 4.3.10 for
distinguishing between ๐ผ* and a random ๐ผ โฒ โผ ๐(๐ผ*).
Corollary 4.3.12. Let $q$ be the number of queries made by $\mathcal{A}$ on $I_i \sim \mathcal{D}(I^*)$ over $n$ elements
and $m$ sets, where $I^*$ is a median instance. Then the probability that $\mathcal{A}$ detects a difference
between $I_i$ and $I^*$ in one of its queries is at most $\frac{4800\, q \log m}{nm}$.
Marbles and urns. Fix a compound $\mathcal{I}(I_1, \ldots, I_t)$. Let $R \triangleq \frac{nm}{4800 \log m}$, and then consider the
following, entirely different, scenario. Suppose that we have ๐ก urns, where each urn contains
๐ marbles. In the ๐th urn, in case ๐ผ๐ is a modified instance, we put in this urn one red marble
4Here we use the Monte Carlo version where the algorithm may err, and use cost instead of the time complexity as our measure of performance. See, e.g., Proposition 2.6 in [139] and the description therein.
and ๐ โ 1 white marbles; otherwise if ๐ผ๐ = ๐ผ*, we put in ๐ white marbles. Observe that
the probability of obtaining a red marble by drawing ๐ marbles from a single urn without
replacement is exactly ๐/๐ (for ๐ โค ๐ ). Now, we will relate the probability of drawing red
marbles to the probability of successfully distinguishing instances. We emphasize that we
are only comparing the probabilities of events for the sake of analysis, and we do not imply
or suggest any direct analogy between the events themselves.
Corollary 4.3.12 above bounds the probability that the algorithm successfully distinguishes a modified instance $I_i$ from $I^*$ using $q$ queries by $\frac{4800\, q \log m}{nm} = q/R$. Then, the probability of
distinguishing between ๐ผ๐ and ๐ผ* using ๐ queries, is bounded from above by the proba-
bility of obtaining a red marble after drawing ๐ marbles from an urn. Consequently, the
probability that the algorithm distinguishes 3๐ก/4 instances is bounded from above by the
probability of drawing the red marbles from at least 3๐ก/4 urns. Hence, to prove that the
event of Lemma 4.3.11 occurs with probability at most 0.01, it is sufficient to upper-bound
the probability that an algorithm obtains 3๐ก/4 red marbles by 0.01.
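The identity behind this reduction, that a single red marble among $R$ marbles is found within the first $q$ draws without replacement with probability exactly $q/R$, can be checked directly by counting; a minimal sketch (the helper name is ours, not from the text):

```python
from fractions import Fraction
from math import comb

def red_within_q_draws(R: int, q: int) -> Fraction:
    # The red marble is missed iff all q drawn marbles come from
    # the R - 1 white ones: miss probability C(R-1, q) / C(R, q).
    return 1 - Fraction(comb(R - 1, q), comb(R, q))

# Sanity check of the q/R identity used in the text.
for R in (10, 100, 4800):
    for q in (0, 1, R // 3, R):
        assert red_within_q_draws(R, q) == Fraction(q, R)
```

Algebraically, $\binom{R-1}{q}/\binom{R}{q} = (R-q)/R$, so the complement is exactly $q/R$.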
Consider an instance of ๐ก urns; for each urn ๐ โ [๐ก] corresponding to a modified instance
๐ผ๐, exactly one of its ๐ marbles is red. An algorithm may draw marbles from each urn, one by
one without replacement, for potentially up to ๐ times. By the principle of deferred decisions,
the red marble is equally likely to appear in any of these ๐ draws, independent of the events
for other urns. Thus, we can create a tuple of ๐ก random variables ๐ฏ = (๐1, . . . , ๐๐ก) such that
for each ๐ โ [๐ก], ๐๐ is chosen uniformly at random from {1, . . . , ๐ }. The variable ๐๐ represents
the number of draws required to obtain the red marble in the $i$th urn; that is, only the $T_i$-th
draw from the $i$th urn finds the red marble from that urn. In case $I_i$ is a median instance,
we simply set ๐๐ = ๐ +1 indicating that the algorithm never detects any difference as ๐ผ๐ and
๐ผ* are the same instance.
We now show the following two lemmas in order to bound the number of red marbles the
algorithm may encounter throughout its execution.
Lemma 4.3.13. Let $b > 3$ be a fixed constant and define $\mathcal{T}^{\mathrm{high}} = \{i \mid T_i \ge \frac{R}{b}\}$. If $t \ge 14b$,
then $|\mathcal{T}^{\mathrm{high}}| \ge (1 - \frac{2}{b})t$ with probability at least $0.99$.
Proof: Let $\mathcal{T}^{\mathrm{low}} = \{1, \ldots, t\} \setminus \mathcal{T}^{\mathrm{high}}$. Notice that for the $i$th urn, $\Pr[i \in \mathcal{T}^{\mathrm{low}}] < \frac{1}{b}$ independently
of other urns, and thus $|\mathcal{T}^{\mathrm{low}}|$ is stochastically dominated by $B(t, \frac{1}{b})$, the binomial distribution
with $t$ trials and success probability $\frac{1}{b}$. Applying the Chernoff bound, we obtain
$$\Pr\left[|\mathcal{T}^{\mathrm{low}}| \ge \frac{2t}{b}\right] \le e^{-\frac{t}{3b}} < 0.01.$$
Hence, $|\mathcal{T}^{\mathrm{high}}| \ge t - \frac{2t}{b} = (1 - \frac{2}{b})t$ with probability at least $0.99$, as desired. $\square$
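The Chernoff estimate here is comfortably loose; for the concrete values used later ($b = 4$, hence $t \ge 14b = 56$) the exact binomial tail can be evaluated directly (a quick numerical check we add for illustration, not part of the original argument):

```python
from fractions import Fraction
from math import comb

def binom_tail(t: int, p: Fraction, k: int) -> Fraction:
    # Exact P[Bin(t, p) >= k], computed with rational arithmetic.
    return sum(comb(t, i) * p**i * (1 - p)**(t - i) for i in range(k, t + 1))

b, t = 4, 56  # t = 14b, the smallest t the lemma allows for b = 4
tail = binom_tail(t, Fraction(1, b), 2 * t // b)  # P[|T_low| >= 2t/b]
assert tail < Fraction(1, 100)  # consistent with the e^{-t/(3b)} < 0.01 bound
```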
Lemma 4.3.14. If the total number of draws made by the algorithm is less than $(1 - \frac{3}{b})\frac{Rt}{b}$,
then with probability at least $0.99$, the algorithm will not obtain red marbles from at least $\frac{t}{b}$
urns.
Proof: If the total number of such draws is less than $(1 - \frac{3}{b})\frac{Rt}{b}$, then the number of draws
from at least $\frac{3t}{b}$ urns is less than $\frac{R}{b}$ each. Assume the condition of Lemma 4.3.13: for at least
$(1 - \frac{2}{b})t$ urns, $T_i \ge \frac{R}{b}$. That is, the algorithm will not encounter a red marble if it makes
fewer than $\frac{R}{b}$ draws from such an urn. Then, there are at least $\frac{t}{b}$ urns with $T_i \ge \frac{R}{b}$ from which
the algorithm makes fewer than $\frac{R}{b}$ draws, and thus does not obtain a red marble. Overall this
event holds with probability at least $0.99$ due to Lemma 4.3.13. $\square$
We substitute $b = 4$ and assume sufficiently large $t$. If the deterministic
algorithm makes fewer than $(1 - \frac{3}{4})\frac{Rt}{4} = \frac{Rt}{16}$ queries, then for a $0.99$-fraction of all possible
tuples $\mathcal{T}$, there are $t/4$ instances $I_i$ whose differences from $I^*$ the algorithm fails to detect:
the probability of this event is lower-bounded by that of the event in which the red marbles
from the corresponding urns $i$ are not drawn. Therefore, the probability that the algorithm
makes queries that detect differences between $I^*$ and more than $3t/4$ instances $I_i$ is bounded
by $0.01$, concluding our proof of Lemma 4.3.11.
4.4. Sub-Linear Algorithms for the Set Cover Problem
In this section, we present two different approximation algorithms for Set Cover that use a sublinear
number of queries in the oracle model: SmallSetCover and LargeSetCover. Both of our
algorithms rely on techniques from the recent developments on Set Cover in the streaming
model. However, adapting those techniques to the oracle model requires novel insights and
technical development.
Throughout the description of our algorithms, we assume that we have access to a black-box
subroutine that, given the full Set Cover instance (where all members of all sets are
revealed), returns a $\rho$-approximate solution.5
The first algorithm (SmallSetCover) returns an $(\alpha\rho + \varepsilon)$-approximate solution of
the Set Cover instance using $\tilde{O}\big(\frac{1}{\varepsilon}\big(m\big(\frac{n}{k}\big)^{\frac{1}{\alpha-1}} + nk\big)\big)$ queries, while the second algorithm
(LargeSetCover) achieves an approximation factor of $(\rho + \varepsilon)$ using $\tilde{O}\big(\frac{mn}{k\varepsilon^2}\big)$ queries, where
$k$ is the size of the minimum set cover. These algorithms can be combined so that the number
of queries of the algorithm becomes asymptotically the minimum of the two:
Theorem 4.4.1. There exists a randomized algorithm for Set Cover in the oracle model
that w.h.p.6 computes an $O(\rho \log n)$-approximate solution and uses $\tilde{O}\big(\min\big\{m\big(\frac{n}{k}\big)^{1/\log n} + nk,\ \frac{mn}{k}\big\}\big) = \tilde{O}(m + n\sqrt{m})$ queries.
Our algorithms use the following two sampling techniques developed for Set Cover in the
streaming model [61]: Element Sampling and Set Sampling. The first technique, Element
Sampling, states that in order to find a $(1-\delta)$-cover of $\mathcal{U}$ w.h.p., it suffices to solve Set Cover
on a subset of elements of size $O\big(\frac{\rho k \log m}{\delta}\big)$ picked uniformly at random. It shows that we may
restrict our attention to a subproblem with a much smaller number of elements, and our
solution to the reduced instance will still cover a good fraction of the elements in the original
instance. The next technique, Set Sampling, shows that if we pick $\ell$ sets uniformly at random
from $\mathcal{F}$ and add them to the solution, then w.h.p. each element that is not covered by the picked sets
occurs in only $O\big(\frac{m \log n}{\ell}\big)$ sets of $\mathcal{F}$; that is, we are left with a much sparser subproblem to solve.
The formal statements of these sampling techniques are as follows. See [61] for the proofs.
Lemma 4.4.2 (Element Sampling). Consider an instance of Set Cover on $(\mathcal{U}, \mathcal{F})$ whose
optimal cover has size at most $k$. Let $\mathcal{U}_{\mathrm{smp}}$ be a subset of $\mathcal{U}$ of size $\Theta\big(\frac{\rho k \log m}{\delta}\big)$ chosen
uniformly at random, and let $\mathcal{C}_{\mathrm{smp}} \subseteq \mathcal{F}$ be a $\rho$-approximate cover for $\mathcal{U}_{\mathrm{smp}}$. Then, w.h.p. $\mathcal{C}_{\mathrm{smp}}$
covers at least $(1 - \delta)|\mathcal{U}|$ elements.
Lemma 4.4.3 (Set Sampling). Consider an instance $(\mathcal{U}, \mathcal{F})$ of Set Cover. Let $\mathcal{F}_{\mathrm{rnd}}$ be a
collection of $\ell$ sets picked uniformly at random. Then, w.h.p. $\mathcal{F}_{\mathrm{rnd}}$ covers all elements that
appear in $\Omega\big(\frac{m \log n}{\ell}\big)$ sets of $\mathcal{F}$.
5The approximation factor $\rho$ may take on any value between 1 and $\Theta(\log n)$ depending on the computational model one assumes.
6An algorithm succeeds with high probability (w.h.p.) if its failure probability can be decreased to $n^{-c}$ for any constant $c > 0$ without affecting its asymptotic performance, where $n$ denotes the input size.
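To make Element Sampling concrete, the following sketch pairs a standard greedy solver with a uniform element sample on a toy instance (the helper names and the instance are our own illustration, not from [61]):

```python
import random

def greedy_cover(elements, sets_):
    """Standard greedy Set Cover: repeatedly take the set covering
    the most still-uncovered elements; returns indices into sets_."""
    uncovered, cover = set(elements), []
    while uncovered:
        best = max(range(len(sets_)), key=lambda j: len(sets_[j] & uncovered))
        if not sets_[best] & uncovered:
            raise ValueError("instance is not coverable")
        cover.append(best)
        uncovered -= sets_[best]
    return cover

def element_sampling_cover(universe, sets_, sample_size, rng):
    # Element Sampling: solve Set Cover on a uniform sample of elements;
    # Lemma 4.4.2 says the result covers most of the universe w.h.p.
    sample = rng.sample(sorted(universe), min(sample_size, len(universe)))
    return greedy_cover(sample, sets_)

rng = random.Random(0)
universe = set(range(1000))
sets_ = [set(range(i, 1000, 10)) for i in range(10)]            # a partition of size 10
sets_ += [set(rng.sample(range(1000), 50)) for _ in range(90)]  # 90 noise sets
cover = element_sampling_cover(universe, sets_, sample_size=200, rng=rng)
covered = set().union(*(sets_[j] for j in cover))
```

Greedy only ever touches the 200 sampled elements, yet the resulting cover already covers at least those elements of the full universe; Lemma 4.4.2 quantifies how large the sample must be for a $(1-\delta)$ coverage guarantee.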
4.4.1. First Algorithm: small values of $k$
The algorithm of this section is a modified variant of the streaming algorithm for Set Cover
in [97] that works in the sublinear query model. Similarly to the algorithm of [97], our
algorithm SmallSetCover considers different guesses of the value of an optimal solution
($\varepsilon^{-1} \log n$ guesses) and performs the core iterative algorithm IterSetCover for all of them
in parallel. For each guess $\ell$ of the size of an optimal solution, IterSetCover goes
through its iterations and, by applying Element Sampling, guarantees that w.h.p. at the
end of each iteration the number of uncovered elements decreases by a polynomial factor, so that
after the last iteration all elements will be covered. Furthermore, since the number of sets
picked in each iteration is at most $\rho\ell$, the final solution has $O(\alpha\rho\ell)$ sets, where $\rho$ is the
performance of the offline block OfflineSC that IterSetCover uses to solve the reduced
instances constructed by Element Sampling.
Although our general approach in IterSetCover is similar to the iterative core of the
streaming algorithm for Set Cover, there are challenges that we need to overcome so that it
works efficiently in the query model. First, the approach of [97] relies on the ability to
test membership for a set-element pair when executing its set filtering subroutine: given a
subset $\mathcal{S}$ of elements, the algorithm of [97] needs to compute $|S \cap \mathcal{S}|$ for sets $S$, which cannot be implemented
efficiently in the query model (in the worst case, it requires $\Omega(|\mathcal{S}|)$ queries). Instead, here we
employ Set Sampling, which w.h.p. guarantees that the number of sets containing any
(yet uncovered) element is small.
The next challenge is achieving the $\tilde{O}(m(n/k)^{1/(\alpha-1)} + nk)$ query bound for computing an $\alpha$-approximate
solution. As mentioned earlier, both our approach and the algorithm of [97] need to run the
algorithm in parallel for different guesses $\ell$ of the size of an optimal solution. However,
since IterSetCover performs $\tilde{O}(m(n/\ell)^{1/(\alpha-1)} + n\ell)$ queries, if SmallSetCover invokes
IterSetCover with guesses in increasing order, then the query complexity becomes
$\tilde{O}(mn^{1/(\alpha-1)} + nk)$; on the other hand, if it invokes IterSetCover with guesses in decreasing
order, then the query complexity becomes $\tilde{O}(m(n/k)^{1/(\alpha-1)} + nm)$. To resolve this issue,
SmallSetCover proceeds in two stages: in the first stage, it finds a $(\log n)$-estimate of
$k$ by invoking IterSetCover using $\tilde{O}(m + nk)$ queries (with guesses evaluated in
increasing order), and then in the second stage it invokes IterSetCover with approximation
factor $\alpha$ only in the smaller $O(\log n)$-approximate region around the $(\log n)$-estimate
of $k$ computed in the first stage. Thus, in our implementation, besides the desired approximation
factor, IterSetCover receives an upper bound and a lower bound on the size of
an optimal solution.
Now, we provide a detailed description of IterSetCover. It receives $\alpha$, $\varepsilon$, $l$ and $u$ as
its arguments, and it is guaranteed that the size of an optimal cover of the input instance,
$k$, is in $[l, u]$. Note that the algorithm does not know the value of $k$, while the sampling
techniques described in Section 4.2 rely on $k$. Therefore, the algorithm needs to find a
$(1+\varepsilon)$-estimate7 of $k$, denoted by $\ell$. This can be done by trying all powers of $(1+\varepsilon)$ in $[l, u]$.
The parameter $\alpha$ controls the trade-off between the query complexity and the approximation
guarantee that the algorithm achieves. Moreover, we assume that the algorithm has access
to a $\rho$-approximate black-box solver of Set Cover.
IterSetCover first performs Set Sampling to cover all elements that occur in $\tilde{\Omega}(m/\ell)$
sets. Then it goes through $\alpha - 2$ iterations, and in each iteration it performs Element
Sampling with parameter $\delta = \Theta((\ell/n)^{1/(\alpha-1)})$. By Lemma 4.4.2, after $(\alpha - 2)$ iterations,
w.h.p. only $\ell\big(\frac{n}{\ell}\big)^{1/(\alpha-1)}$ elements remain uncovered, for which the algorithm finds a cover by
invoking the offline set cover solver. The parameters are set so that all $(\alpha - 1)$ instances that
must be solved by the offline set cover solver (the $(\alpha - 2)$ instances constructed by
Element Sampling and the final instance) are of size $\tilde{O}\big(\rho\ell\big(\frac{n}{\ell}\big)^{1/(\alpha-1)}\big)$.
In the rest of this section, we show that SmallSetCover w.h.p. returns an almost $(\rho\alpha)$-approximate solution of Set Cover$(\mathcal{U}, \mathcal{F})$ with query complexity $\tilde{O}\big(\frac{1}{\varepsilon}\big(m\big(\frac{n}{k}\big)^{\frac{1}{\alpha-1}} + nk\big)\big)$, where $k$
is the size of a minimum set cover.
Theorem 4.4.4. SmallSetCover outputs an $(\alpha\rho + \varepsilon)$-approximate solution of Set Cover$(\mathcal{U}, \mathcal{F})$
using $\tilde{O}\big(\frac{1}{\varepsilon}\big(m(n/k)^{\frac{1}{\alpha-1}} + nk\big)\big)$ queries w.h.p., where $k$ is the size of an optimal solution of $(\mathcal{U}, \mathcal{F})$.
To analyze the performance of SmallSetCover, we first need to analyze the procedures invoked by SmallSetCover: IterSetCover and OfflineSC. The OfflineSC
procedure receives as input a subset of elements $\mathcal{S}$ and an estimate of the size of an optimal
cover of $\mathcal{S}$ using sets in $\mathcal{F}$. The OfflineSC algorithm first determines all occurrences
of $\mathcal{S}$ in $\mathcal{F}$. Then it invokes a black-box subroutine that returns a cover of size at most $\rho\ell$ (if
there exists a cover of size $\ell$ for $\mathcal{S}$) for the reduced Set Cover instance over $\mathcal{S}$.
Moreover, we assume that all subroutines have access to the EltOf and SetOf oracles, as well as $|\mathcal{U}|$ and $|\mathcal{F}|$.

Algorithm 11 IterSetCover is the main procedure of the SmallSetCover algorithm for the Set Cover problem.
1: procedure IterSetCover(α, ε, l, u)
2:   ▷ Try all (1 + ε/(2αρ))-approximate guesses of k
3:   for ℓ ∈ {(1 + ε/(2αρ))^i | log_{1+ε/(2αρ)} l ≤ i ≤ log_{1+ε/(2αρ)} u} do
4:     sol_ℓ ← collection of ℓ sets picked uniformly at random   ▷ Set Sampling
5:     𝒰_rem ← 𝒰 − ⋃_{S ∈ sol_ℓ} S   ▷ via EltOf
6:     for i = 1 to α − 2 do
7:       𝒮 ← sample of 𝒰_rem of size Õ(ρℓ(n/ℓ)^{1/(α−1)})
8:       𝒟 ← OfflineSC(𝒮, ℓ)
9:       if 𝒟 = null then
10:        break   ▷ Try the next value of ℓ
11:      sol_ℓ ← sol_ℓ ∪ 𝒟
12:      𝒰_rem ← 𝒰_rem − ⋃_{S ∈ 𝒟} S   ▷ via EltOf
13:    if |𝒰_rem| ≤ ℓ(n/ℓ)^{1/(α−1)} then   ▷ Feasibility Test
14:      𝒟 ← OfflineSC(𝒰_rem, ℓ)
15:      if 𝒟 ≠ null then
16:        sol_ℓ ← sol_ℓ ∪ 𝒟
17:        return sol_ℓ

7The exact estimate that the algorithm works with is a $(1 + \frac{\varepsilon}{2\rho\alpha})$ estimate.
Lemma 4.4.5. Suppose that each $e \in \mathcal{S}$ appears in $\tilde{O}(m/\ell)$ sets of $\mathcal{F}$, and assume that
there exists a collection of $\ell$ sets in $\mathcal{F}$ that covers $\mathcal{S}$. Then OfflineSC$(\mathcal{S}, \ell)$ returns a cover of $\mathcal{S}$ of size
at most $\rho\ell$ using $\tilde{O}\big(\frac{m|\mathcal{S}|}{\ell}\big)$ queries.
Proof: Since each element of $\mathcal{S}$ is contained in $\tilde{O}(m/\ell)$ sets of $\mathcal{F}$, the information required to
solve the reduced instance on $\mathcal{S}$ can be obtained with $\tilde{O}\big(\frac{m|\mathcal{S}|}{\ell}\big)$ queries (i.e., $\tilde{O}\big(\frac{m}{\ell}\big)$ SetOf queries
per element of $\mathcal{S}$). $\square$
Lemma 4.4.6. The cover sol_ℓ constructed by the outer loop of IterSetCover$(\alpha, \varepsilon, l, u)$ with
parameter $\ell \ge k$ w.h.p. covers $\mathcal{U}$.

Algorithm 12 OfflineSC(𝒮, ℓ) invokes a black box that returns a cover of size at most ρℓ (if there exists a cover of size ℓ for 𝒮) for the Set Cover instance that is the projection of ℱ over 𝒮.
1: procedure OfflineSC(𝒮, ℓ)
2:   ℱ_𝒮 ← ∅
3:   for all elements e ∈ 𝒮 do
4:     ℱ_e ← the collection of sets containing e
5:     ℱ_𝒮 ← ℱ_𝒮 ∪ ℱ_e
6:   𝒟 ← solution of size at most ρℓ for Set Cover on (𝒮, ℱ_𝒮) constructed by a solver
7:   ▷ If there exists no such cover, then 𝒟 = null
8:   return 𝒟

Proof: After picking $\ell$ sets uniformly at random, by Set Sampling (Lemma 4.4.3), w.h.p. each
element that is not covered by the sampled sets appears in $\tilde{O}(m/\ell)$ sets of $\mathcal{F}$. Next, by Element
Sampling (Lemma 4.4.2 with $\delta = \big(\frac{\ell}{n}\big)^{1/(\alpha-1)}$), at the end of each inner iteration, w.h.p. the
number of uncovered elements decreases by a factor of $\big(\frac{\ell}{n}\big)^{1/(\alpha-1)}$. Thus, after at most $(\alpha - 2)$
iterations, w.h.p. fewer than $\ell\big(\frac{n}{\ell}\big)^{1/(\alpha-1)}$ elements remain uncovered. Finally, OfflineSC is
invoked on the remaining elements; hence, sol_ℓ w.h.p. covers $\mathcal{U}$. $\square$
Next we analyze the query complexity and the approximation guarantee of IterSetCover. As we only apply Element Sampling and Set Sampling polynomially many times,
all invocations of the corresponding lemmas during an execution of the algorithm succeed
w.h.p., so we assume their high-probability guarantees throughout the proofs in the rest of this
section.
Lemma 4.4.7. Given that $l \le k \le \frac{u}{1 + \varepsilon/(2\alpha\rho)}$, w.h.p. IterSetCover$(\alpha, \varepsilon, l, u)$ finds a $(\rho\alpha + \varepsilon)$-approximate solution of the input instance using $\tilde{O}\big(\frac{1}{\varepsilon}\big(m\big(\frac{n}{k}\big)^{1/(\alpha-1)} + nk\big)\big)$ queries.
Proof: Let $\ell_k = \big(1 + \frac{\varepsilon}{2\alpha\rho}\big)^{\lceil \log_{1+\varepsilon/(2\alpha\rho)} k \rceil}$ be the smallest power of $1 + \frac{\varepsilon}{2\alpha\rho}$ greater than or equal to
$k$. Note that it is guaranteed that $\ell_k \in [l, u]$. By Lemma 4.4.6, IterSetCover terminates
with a guess value $\ell \le \ell_k$. In the following we compute the query complexity of the run of
IterSetCover with a parameter $\ell \le \ell_k$.
The Set Sampling component picks $\ell$ sets and then updates the set of elements not
covered by those sets, $\mathcal{U}_{\mathrm{rem}}$, using $\tilde{O}(n\ell)$ EltOf queries. Next, in each iteration of the inner
loop, the algorithm samples a subset $\mathcal{S}$ of size $\tilde{O}\big(\rho\ell(n/\ell)^{1/(\alpha-1)}\big)$ from $\mathcal{U}_{\mathrm{rem}}$. Recall that, by
Set Sampling (Lemma 4.4.3), each $e \in \mathcal{S} \subseteq \mathcal{U}_{\mathrm{rem}}$ appears in at most $\tilde{O}(m/\ell)$ sets. Since
each element in $\mathcal{U}_{\mathrm{rem}}$ appears in $\tilde{O}(m/\ell)$ sets, OfflineSC returns a cover $\mathcal{D}$ of size at most
$\rho\ell$ using $\tilde{O}\big(m(n/\ell)^{1/(\alpha-1)}\big)$ SetOf queries (Lemma 4.4.5). By the guarantee of Element
Sampling (Lemma 4.4.2), the number of elements in $\mathcal{U}_{\mathrm{rem}}$ that are not covered by $\mathcal{D}$ is at
most $(\ell/n)^{1/(\alpha-1)} |\mathcal{U}_{\mathrm{rem}}|$. Finally, at the end of each inner iteration, the algorithm updates the
set of uncovered elements $\mathcal{U}_{\mathrm{rem}}$ using $\tilde{O}(n\ell)$ EltOf queries. The Feasibility Test, which is
passed w.h.p. for $\ell \le \ell_k$, ensures that the final run of OfflineSC performs $\tilde{O}\big(m(n/\ell)^{1/(\alpha-1)}\big)$
SetOf queries. Hence, the total number of queries performed in each iteration of the outer
loop of IterSetCover with parameter $\ell \le \ell_k$ is $\tilde{O}\big(m(n/\ell)^{1/(\alpha-1)} + n\ell\big)$.
By Lemma 4.4.6, if $\ell_k \le u$, then the outer loop of IterSetCover is executed for
$l \le \ell \le \ell_k$ before it terminates. Thus, the total number of queries made by IterSetCover
is:
$$\sum_{i = \lceil \log_{1+\frac{\varepsilon}{2\alpha\rho}} l \rceil}^{\log_{1+\frac{\varepsilon}{2\alpha\rho}} \ell_k} \tilde{O}\left(m\left(\frac{n}{(1+\frac{\varepsilon}{2\alpha\rho})^i}\right)^{\frac{1}{\alpha-1}} + n\left(1+\frac{\varepsilon}{2\alpha\rho}\right)^i\right) = \tilde{O}\left(m\left(\frac{n}{l}\right)^{\frac{1}{\alpha-1}} \log_{1+\frac{\varepsilon}{2\alpha\rho}} \frac{\ell_k}{l} + \frac{n\ell_k}{\varepsilon/(\rho\alpha)}\right)$$
$$= \tilde{O}\left(\frac{1}{\varepsilon}\left(m\left(\frac{n}{k}\right)^{1/(\alpha-1)} + nk\right)\right).$$
Now, we show that the number of sets returned by IterSetCover is not more than
$(\alpha\rho + \varepsilon)k$. Set Sampling picks $\ell$ sets and each run of OfflineSC returns at most $\rho\ell$ sets.
Thus the size of the solution returned by IterSetCover is at most $(1 + (\alpha - 1)\rho)\ell_k < (\alpha\rho + \varepsilon)k$. $\square$
Next, we prove the main theorem of the section.
Algorithm 13 The description of the SmallSetCover algorithm.
1: procedure SmallSetCover(α, ε)
2:   sol ← IterSetCover(log n, 1, 1, n)
3:   k′ ← |sol|   ▷ Find a ρ log n estimate of k.
4:   return IterSetCover(α, ε, ⌈k′/(ρ log n)⌉, ⌈k′(1 + ε/(2αρ))⌉)
Proof of Theorem 4.4.4. The algorithm SmallSetCover first finds a $(\rho \log n)$-approximate
solution sol of Set Cover$(\mathcal{U}, \mathcal{F})$ with $\tilde{O}(m + nk)$ queries by calling IterSetCover$(\log n, 1, 1, n)$.
Given that $k \le k' = |\mathrm{sol}| \le (\rho \log n)k$, the algorithm calls IterSetCover with $\alpha$ as
the approximation factor and $\big[\lceil k'/(\rho \log n) \rceil, \lceil k'(1 + \frac{\varepsilon}{2\alpha\rho}) \rceil\big]$ as the range containing $k$. By
Lemma 4.4.7, the second call to IterSetCover in SmallSetCover returns an $(\alpha\rho + \varepsilon)$-approximate solution of Set Cover$(\mathcal{U}, \mathcal{F})$ using the following number of queries:
$$\tilde{O}\left(\frac{1}{\varepsilon}\left(m\left(\frac{n}{k'/(\rho \log n)}\right)^{\frac{1}{\alpha-1}} + nk\right)\right) = \tilde{O}\left(\frac{1}{\varepsilon}\left(m\left(\frac{n}{k}\right)^{\frac{1}{\alpha-1}} + nk\right)\right). \;\square$$
4.4.2. Second Algorithm: large values of $k$
The second algorithm, LargeSetCover, works strictly better than SmallSetCover for
large values of $k$ ($k \ge \sqrt{m}$). The advantage of LargeSetCover is that it does not need to
update the set of uncovered elements at any point, and it thereby avoids the additive $nk$ term
in the query complexity bound; the result of Section 4.5 suggests that the $nk$ term may be
unavoidable if one wishes to maintain the set of uncovered elements. Note that the guarantee of
LargeSetCover is that at the end of the algorithm, w.h.p. the ground set $\mathcal{U}$ is covered.
The algorithm LargeSetCover, given in Algorithm 14, first randomly picks $\varepsilon\ell/3$ sets.
By Set Sampling (Lemma 4.4.3), w.h.p. every element that occurs in $\tilde{\Omega}(m/(\varepsilon\ell))$ sets of $\mathcal{F}$
will be covered by the picked sets. It then solves the Set Cover instance over the elements
that occur in $\tilde{O}(m/(\varepsilon\ell))$ sets of $\mathcal{F}$ by an offline solver of Set Cover using $\tilde{O}(m/(\varepsilon\ell))$ queries
per element; note that this set of elements may include some already covered elements. In order to get
the promised query complexity, LargeSetCover enumerates the guesses $\ell$ of the size of
an optimal set cover in decreasing order. The algorithm returns feasible solutions for
$\ell \ge k$, and once it cannot find a feasible solution for some $\ell$, it returns the solution constructed for
the previous guess of $k$, i.e., $\ell(1 + \varepsilon/(3\rho))$.
Since LargeSetCover performs Set Sampling for $\tilde{O}(\varepsilon^{-1})$ iterations, w.h.p. the total
query complexity of LargeSetCover is $\tilde{O}(mn/(k\varepsilon^2))$.
Note that testing whether the number of occurrences of an element is $\tilde{O}(m/(\varepsilon\ell))$ only
requires a single query, namely SetOf$\big(e, \frac{cm \log n}{\varepsilon\ell}\big)$. We now prove the desired performance of
LargeSetCover.
Lemma 4.4.8. LargeSetCover returns a $(\rho + \varepsilon)$-approximate solution of Set Cover$(\mathcal{U}, \mathcal{F})$
w.h.p.

Algorithm 14 A (ρ + ε)-approximation algorithm for the Set Cover problem. We assume that the algorithm has access to EltOf and SetOf oracles for Set Cover(𝒰, ℱ), as well as |𝒰| and |ℱ|.
1: procedure LargeSetCover(ε)
2:   ▷ try all (1 + ε/(3ρ))-approximate guesses of k
3:   for ℓ ∈ {(1 + ε/(3ρ))^i | 0 ≤ i ≤ log_{1+ε/(3ρ)} n} in decreasing order do
4:     rnd_ℓ ← collection of εℓ/3 sets picked uniformly at random   ▷ set sampling
5:     ℱ_rare ← ∅; 𝒮 ← ∅   ▷ 𝒮: rare elements
6:     for all e ∈ 𝒰 do
7:       if e appears in < (cm log n)/(εℓ) sets then   ▷ size test: SetOf(e, (cm log n)/(εℓ))
8:         ℱ_e ← collection of sets containing e   ▷ Õ(m/(εℓ)) SetOf queries
9:         ℱ_rare ← ℱ_rare ∪ ℱ_e
10:        𝒮 ← 𝒮 ∪ {e}
11:    𝒟 ← solution of Set Cover(𝒮, ℱ_rare) returned by a ρ-approximation algorithm
12:    if |𝒟| ≤ ρℓ then
13:      sol ← rnd_ℓ ∪ 𝒟
14:    else
15:      return sol   ▷ solution for the previous value of ℓ

Proof: The algorithm LargeSetCover tries to construct set covers of decreasing sizes until
it fails. Clearly, if $k \le \ell$ then the black-box algorithm finds a cover of size at most $\rho\ell$ for
any subset of $\mathcal{U}$, because $k$ sets suffice to cover $\mathcal{U}$. In other words, the algorithm
does not terminate while $\ell \ge k$. Moreover, since the algorithm terminates once $\ell$ becomes smaller
than $k$, the size of the set cover found by LargeSetCover is at most $\big(\frac{\varepsilon}{3} + \rho\big)\big(1 + \frac{\varepsilon}{3\rho}\big)\ell < \big(\frac{\varepsilon}{3} + \rho\big)\big(1 + \frac{\varepsilon}{3\rho}\big)k < (\rho + \varepsilon)k$. $\square$
Lemma 4.4.9. The number of queries made by LargeSetCover is $\tilde{O}\big(\frac{mn}{k\varepsilon^2}\big)$.
Proof: The value of $\ell$ in any successful iteration of the algorithm is greater than $k/(\rho + \varepsilon)$;
otherwise, the size of the solution constructed by the algorithm would be at most $(\rho + \varepsilon)\ell < k$, which
is a contradiction.
Set Sampling guarantees that w.h.p. each uncovered element appears in $\tilde{O}(m/(\varepsilon\ell))$ sets,
and thus the algorithm needs to perform $\tilde{O}\big(\frac{nm}{\varepsilon\ell}\big)$ SetOf queries to construct $\mathcal{F}_{\mathrm{rare}}$. Moreover,
the number of queries required by the size-test step is $O(n)$, because we only need one SetOf
query per element of $\mathcal{U}$. Thus, the query complexity of LargeSetCover$(\varepsilon)$ is bounded
by
$$\sum_{i = \log_{1+\frac{\varepsilon}{3\rho}} \frac{k}{\rho+\varepsilon}}^{\log_{1+\frac{\varepsilon}{3\rho}} n} \tilde{O}\left(n + \frac{nm}{\varepsilon (1+\frac{\varepsilon}{3\rho})^i}\right) = \tilde{O}\left(\left(n + \frac{nm}{\varepsilon k}\right) \log_{1+\frac{\varepsilon}{3\rho}} \frac{n}{k}\right) = \tilde{O}\left(\frac{mn}{k\varepsilon^2}\right). \;\square$$
4.5. Lower Bound for the Cover Verification Problem
In this section, we give a tight lower bound for a feasibility variant of the Set Cover problem, which we refer to as Cover Verification. In Cover Verification(𝒰, ℱ, ℱ_I), besides a collection of m sets ℱ and n elements 𝒰, we are given the indices of k sets ℱ_I ⊆ ℱ, and the goal is to determine whether they cover the whole universe 𝒰. We note that, throughout this section, the parameter k is a candidate for, but not necessarily the value of, the size of the minimum set cover.

A naive approach for this decision problem is to query all elements in the given k sets and then check whether they cover 𝒰; this approach requires O(nk) queries. In what follows we show that this approach is tight: no randomized protocol using o(nk) queries can decide whether the given k sets cover the whole universe with probability of success at least 0.9.
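The naive verifier is easy to state in code; below, the EltOf oracle is modeled as a plain Python function over stored lists (a hypothetical interface, 1-indexed as in the text).

```python
def naive_cover_verification(elt_of, set_sizes, k_indices, n):
    """Naive verifier: enumerate every element of each of the k given sets
    via EltOf queries and test whether their union is the whole universe
    {1, ..., n}; this costs the sum of the k set sizes, i.e. O(nk) queries."""
    seen, queries = set(), 0
    for i in k_indices:
        for j in range(1, set_sizes[i] + 1):
            seen.add(elt_of(i, j))      # one EltOf query
            queries += 1
    return len(seen) == n, queries
```

For example, with sets S1 = {1, 2}, S2 = {2, 3}, S3 = {3, 4}, the pair (S1, S3) covers the universe {1, ..., 4} while (S1, S2) does not.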
Theorem 4.5.1. Any (randomized) algorithm that decides whether a given collection of k = Ω(log n) sets covers all elements, with probability of success at least 0.9, requires Ω(nk) queries.
Proof: Observe that according to our instance construction, the algorithm may verify, with
a single query, whether a certain swap occurs in a certain slab. Namely, it is sufficient to
query an entry of EltOf or SetOf that would have been modified by that swap, and check
whether it is actually modified or not. For simplicity, we assume that the algorithm has the
knowledge of our construction. Further, without loss of generality, the algorithm does not make multiple queries about the same swap, nor a query that does not correspond to any swap.
We employ Yao's principle as follows: to prove a lower bound for randomized algorithms, we show a lower bound for any deterministic algorithm on a fixed distribution of input instances. Let s = n − k be the number of possible swaps in each slab; assume s = Θ(n). We
define our distribution of instances as follows: each of the s^k possible Yes instances occurs with probability 1/(2s^k), and each of the k·s^{k−1} possible No instances occurs with probability 1/(2k·s^{k−1}). Equivalently, we create a random Yes instance by making one swap on each basic slab. Then we flip a coin: with probability 1/2 we pick a random slab and undo the swap on that slab to obtain a No instance; otherwise we leave it as a Yes instance.
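The two-step random process can be mirrored directly in code; `None` marking the slab whose swap was undone is a representation chosen here purely for illustration.

```python
import random

def sample_hard_instance(k, s, rng=random):
    """Draw from the hard distribution: make one uniformly random swap in
    each of the k slabs (s possible swaps per slab); then, with probability
    1/2, undo the swap in a uniformly random slab, which turns the Yes
    instance into a No instance."""
    swaps = [rng.randrange(s) for _ in range(k)]   # a random Yes instance
    if rng.random() < 0.5:
        swaps[rng.randrange(k)] = None             # undo one swap: No instance
    return swaps
```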
To prove by contradiction, assume there exists a deterministic algorithm that solves the Cover Verification problem over this distribution of instances with o(sk) queries.
Consider the Yes instances portion of the distribution, and observe that we may alternatively interpret the random process generating them as follows. For each slab, one of its s possible swaps is chosen uniformly at random. This again follows the scenario considered in Section 4.3.2: we are given k urns (slabs), each consisting of s marbles (possible swap locations), and we aim to draw the red marble (the swapped entry) from a large fraction of these urns. Following the proof of Lemmas 4.3.13–4.3.14, we obtain that if the total number of queries made by the algorithm is less than (1 − 3/c)·sk/c, then with probability at least 0.99, the algorithm will not see any swaps from at least k/c slabs.
Then, consider the corresponding No instances obtained by undoing the swap in one of the slabs of the Yes instance. Suppose that the deterministic algorithm makes less than (1 − 3/c)·sk/c queries. Then for a 0.99 fraction of all possible tuples 𝒯, the output on the Yes instance is the same as the output on a 1/c fraction of the No instances, namely those in which the swap-free slab is one of the k/c slabs where the algorithm has not detected a swap in the corresponding Yes instance; the algorithm must then answer incorrectly on half of the corresponding weight in our distribution of input instances. Thus the probability of success of any algorithm making less than (1 − 3/c)·sk/c queries is at most

    1 − Pr[|𝒯_high| ≥ (1 − 2/c)·k] · (1/c) · (1/2) ≤ 1 − 0.495/c < 0.9,

for a sufficiently small constant c > 3 (e.g., c = 4). As s = Θ(n), by Yao's principle this implies the lower bound of Ω(nk) for the Cover Verification problem. □
While this lower bound does not directly lead to a lower bound on Set Cover, it suggests that verifying the feasibility of a solution may be even more costly than finding the approximate solution itself; any algorithm bypassing this Ω(nk) lower bound cannot afford to solve Cover Verification as a subroutine.
We prove our lower bound by designing Yes and No instances that are hard to distinguish, such that for a Yes instance the union of the given k sets is 𝒰, while for a No instance their union covers only n − 1 elements. Each Yes instance is indistinguishable from a good fraction of the No instances. Thus any algorithm must unavoidably answer incorrectly on half of these fractions, and fails to reach the desired probability of success.
4.5.1. Underlying Set Structure
Our instance contains n sets and n elements (so m = n), where the first k sets form ℱ_I, the candidate set cover we wish to verify. We first consider the incidence matrix representation, in which rows represent the sets and columns represent the elements. We focus on the first n/k elements, and consider a slab, composed of n/k columns of the incidence matrix. We define a basic slab as the structure illustrated in Figure 4.5.1 (for n = 12 and k = 3), where the cell (i, j) is white if e_j ∈ S_i, and gray otherwise. The rows are divided into blocks of size k, where the first block, the query block, contains the rows whose sets we wish to check for coverage; notice that only the last element is not covered. More specifically, in a basic slab the query block contains the sets S_1, . . . , S_k, each of which is equal to {e_1, . . . , e_{n/k−1}}. The subsequent rows form the swapper blocks, each consisting of k sets. The j-th swapper block consists of the sets S_{jk+1}, . . . , S_{(j+1)k}, each of which is equal to {e_1, . . . , e_{n/k}} ∖ {e_j}. We perform one swap in this slab. Consider a parameter (x, y) representing the index of a white cell within the query block. We exchange the color of this white cell with the gray cell in the same row, and similarly exchange the same pair of cells in swapper block y. An example is given in Figure 4.5.1; the dashed blue rectangle corresponds to the indices parameterizing possible swaps, and the red squares mark the modified cells. This modification corresponds to a single swap operation; in this example, choosing the index (3, 2) swaps (e_2, e_4) between S_3 and S_9. Observe that there are k × (n/k − 1) = n − k possible swaps on a single slab, and any single swap allows the query sets to cover all n/k elements.
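The slab structure and the effect of a single swap can be checked mechanically. The sketch below (an illustration, not the thesis's code) builds one basic slab for k = 3 and n/k = 4 with 0-indexed elements, and verifies that the query rows cover all n/k elements only after a swap is applied.

```python
def basic_slab(k, r):
    """Incidence structure of one basic slab, one set per row: k query rows
    covering elements 0..r-2, followed by r-1 swapper blocks of k rows each,
    where block j misses exactly element j.  (r = n/k elements per slab.)"""
    rows = [set(range(r - 1)) for _ in range(k)]
    for j in range(r - 1):
        rows += [set(range(r)) - {j} for _ in range(k)]
    return rows

def apply_swap(slab, k, r, x, y):
    """(x, y)-swap, 1-indexed as in the text: exchange element y-1 (present)
    with element r-1 (absent) in query row x-1, and the same pair of cells
    in row x of swapper block y."""
    q, w = slab[x - 1], slab[k * y + x - 1]
    q.discard(y - 1); q.add(r - 1)
    w.discard(r - 1); w.add(y - 1)

slab = basic_slab(k=3, r=4)
apply_swap(slab, 3, 4, x=3, y=2)      # the (3, 2)-swap of Figure 4.5.1
covered = set().union(*slab[:3])      # union of the query (first k) rows
```

Before the swap the query rows miss the last element; after the (3, 2)-swap they cover all four elements, matching Figure 4.5.1b.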
Lastly, we may create the full instance by placing all k slabs together, as shown in Figure 4.5.2, shifting the elements' indices as necessary. The structure of our sets may be
Figure 4.5.1: A basic slab and an example of a swapping operation: (a) a basic slab; (b) after performing a (3, 2)-swap.
Figure 4.5.2: An example structure of a Yes instance; all elements are covered by the first 3 sets.
specified solely by the swaps made on these slabs. We define the structure of our instances
as follows.
1. For a Yes instance, we make one random swap on each slab. This allows the first k sets to cover all elements.

2. For a No instance, we make one random swap on each slab except for exactly one of them. In that slab, the last element is not covered by any of the first k sets.
Now, to properly define an instance, we must describe our structure via EltOf and SetOf. We first create a temporary instance consisting of k basic slabs, in which none of the cells are swapped, and create the EltOf and SetOf lists by sorting each list in increasing order of indices. Each instance from the above construction can then be obtained by applying up to k swaps to this temporary instance. Figure 4.6.1 provides a sample realization of a basic slab with EltOf and SetOf, as well as a sample result of applying a swap to this basic slab; these correspond to the incidence matrices of Figures 4.5.1a and 4.5.1b, respectively. Such a construction can be extended to include all k slabs. Observe here that no two distinct swaps modify the same entry; that is, the swaps do not interfere with one another on these two functions. We also note that many entries do not participate in any swap.
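The table representation makes the non-interference claim easy to see: one swap rewrites exactly two EltOf entries and two SetOf entries. The sketch below (illustrative code, reproducing the k = 3, n/k = 4 slab of Figure 4.6.1 with 0-indexed sets and elements) demonstrates this.

```python
def slab_tables(k, r):
    """EltOf/SetOf lists of one basic slab, each sorted in increasing order
    of indices: k query rows equal to {0,...,r-2}, then r-1 swapper blocks
    of k rows, where block j misses element j."""
    rows = [set(range(r - 1)) for _ in range(k)]
    for j in range(r - 1):
        rows += [set(range(r)) - {j} for _ in range(k)]
    elt_of = [sorted(s) for s in rows]
    set_of = [sorted(i for i, s in enumerate(rows) if e in s) for e in range(r)]
    return elt_of, set_of

def swap_in_tables(elt_of, set_of, i1, i2, a, b):
    """swap(a, b) between sets i1 and i2, applied in place: exactly two
    EltOf entries and two SetOf entries change, so distinct swaps never
    modify the same entry."""
    elt_of[i1][elt_of[i1].index(a)] = b
    elt_of[i2][elt_of[i2].index(b)] = a
    set_of[a][set_of[a].index(i1)] = i2
    set_of[b][set_of[b].index(i2)] = i1
```

Applying swap(e2, e4) between S3 and S9 (0-indexed: elements 1 and 3, sets 2 and 8) reproduces the "after" tables of Figure 4.6.1.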
4.6. Generalized Lower Bounds for the Set Cover Problem
In this section we generalize the approach of Section 4.3 and prove our main lower bound result (Theorem 4.3.1) on the number of queries required to approximate, within factor α, the size of an optimal solution to the Set Cover problem, where the input instance contains m sets, n elements, and a minimum set cover of size k. The structure of our proof is largely the same as in the simplified case, but the definitions and the details of our analysis will be more complicated. The size of the minimum set cover of the median instance will instead be at least αk + 1, and GenModifiedInst reduces it down to k. We now aim to prove the following statement, which implies the lower bound in Theorem 4.3.1.
Theorem 4.6.1. Let k be the size of an optimal solution of I* such that 1 < α ≤ log n and 2 ≤ k ≤ (n/(16α log n))^{1/(4α+1)}. Any algorithm that distinguishes whether the input instance is I* or belongs to 𝒟(I*) with probability of success at least 2/3 requires Ω̃(m(n/k)^{1/(2α)}) queries.

Before: EltOf table for a basic slab        After: EltOf table after applying a swap

EltOf  1   2   3                            EltOf  1   2   3
S1     e1  e2  e3                           S1     e1  e2  e3
S2     e1  e2  e3                           S2     e1  e2  e3
S3     e1  e2  e3                           S3     e1  e4  e3
S4     e2  e3  e4                           S4     e2  e3  e4
S5     e2  e3  e4                           S5     e2  e3  e4
S6     e2  e3  e4                           S6     e2  e3  e4
S7     e1  e3  e4                           S7     e1  e3  e4
S8     e1  e3  e4                           S8     e1  e3  e4
S9     e1  e3  e4                           S9     e1  e3  e2
S10    e1  e2  e4                           S10    e1  e2  e4
S11    e1  e2  e4                           S11    e1  e2  e4
S12    e1  e2  e4                           S12    e1  e2  e4

Before: SetOf table for a basic slab

SetOf  1   2   3   4   5   6   7   8   9
e1     S1  S2  S3  S7  S8  S9  S10 S11 S12
e2     S1  S2  S3  S4  S5  S6  S10 S11 S12
e3     S1  S2  S3  S4  S5  S6  S7  S8  S9
e4     S4  S5  S6  S7  S8  S9  S10 S11 S12

After: SetOf table after applying a swap

SetOf  1   2   3   4   5   6   7   8   9
e1     S1  S2  S3  S7  S8  S9  S10 S11 S12
e2     S1  S2  S9  S4  S5  S6  S10 S11 S12
e3     S1  S2  S3  S4  S5  S6  S7  S8  S9
e4     S4  S5  S6  S7  S8  S3  S10 S11 S12

Figure 4.6.1: Tables illustrating the representation of a slab under EltOf and SetOf before and after a swap; the cells modified by swap(e2, e4) between S3 and S9 are highlighted in red.
Proof: Applying the same argument as that of Lemma 4.3.10, we derive that the probability that 𝒜 returns different outputs on I* and I′ is at most

    Pr[𝒜(I*) ≠ 𝒜(I′)] ≤ Σ_{t=1}^{|Q|} Pr[ans_{I*}(q_t) ≠ ans_{I′}(q_t)] ≤ Σ_{t=1}^{|Q|} p_{Elt−Set}(e(q_t), S(q_t)) ≤ (64 p_0^k / ((1 − p_0)² m)) · |Q|,

via the result of Lemma 4.6.12. Then, over the distribution on which we applied Yao's lemma, we have

    Pr[𝒜 succeeds] ≤ 1 − (1/2) · Pr_{I′∼𝒟(I*)}[𝒜(I*) = 𝒜(I′)]
                 ≤ 1 − (1/2) · (1 − (64 p_0^k / ((1 − p_0)² m)) · |Q|)
                 = 1/2 + (32 p_0^k / ((1 − p_0)² m)) · |Q|
                 ≤ 1/2 + (32/m) · (8(αk + 2) log n / n)^{1/(2α)} · |Q|,

where the last inequality follows from Lemma 4.6.2. Thus, if the number of queries made by 𝒜 is less than (m/192) · (n / (8(αk + 2) log n))^{1/(2α)}, then the probability that 𝒜 returns the correct answer over the input distribution is less than 2/3, and the proof is complete. □
4.6.1. Construction of the Median Instance I*.

Let ℱ be a collection of m sets such that, independently for each set–element pair (S, e), S contains e with probability 1 − p_0, where we modify the probability to p_0 = (8(αk + 2) log n / n)^{1/(αk)}. We start by proving some inequalities involving p_0 that will be useful later on, and which hold for any k in the assumed range.
Lemma 4.6.2. For 2 ≤ k ≤ (n/(16α log n))^{1/(4α+1)}, we have that

(a) 1 − p_0 ≥ p_0^{k/4},

(b) p_0^{k/4} ≤ 1/2,

(c) p_0^k / (1 − p_0)² ≤ (8(αk + 2) log n / n)^{1/(2α)}.
Proof: Recall that α > 1. In the given range of k, we have k^{4α} ≤ n/(16αk log n) ≤ n/(8(αk + 2) log n), because αk ≥ 2. Thus

    p_0 = (8(αk + 2) log n / n)^{1/(αk)} ≤ (1/k^{4α})^{1/(αk)} = k^{−4/k}.

Next, rewrite k^{−4/k} = e^{−4 ln k / k} and observe that 4 ln k / k ≤ 4/e < 1.5. Since e^{−x} ≤ 1 − x/2 for any x < 1.5, we have p_0 ≤ e^{−4 ln k / k} ≤ 1 − 2 ln k / k. Further, p_0^{k/4} ≤ e^{−ln k} = 1/k. Hence p_0 + p_0^{k/4} ≤ 1 − 2 ln k / k + 1/k ≤ 1, implying the first statement.

The second statement follows easily, as p_0^{k/4} ≤ 1/k ≤ 1/2 since k ≥ 2. For the last statement, we make use of the first statement:

    p_0^k / (1 − p_0)² ≤ p_0^k / (p_0^{k/4})² = p_0^{k/2} = (8(αk + 2) log n / n)^{1/(2α)},

which completes the proof of the lemma. □
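Parts (a) and (b) can be spot-checked numerically at the extreme value p_0 = k^{−4/k} established in the proof; smaller p_0 only makes both inequalities easier, since 1 − p_0 grows while p_0^{k/4} shrinks. This is an illustration, not part of the proof.

```python
# Numeric spot check of Lemma 4.6.2 (a) and (b) at the worst case
# p0 = k**(-4/k); note p0**(k/4) then equals exactly 1/k.
for k in range(2, 200):
    p0 = k ** (-4 / k)
    assert 1 - p0 >= p0 ** (k / 4)   # (a)
    assert p0 ** (k / 4) <= 0.5      # (b), since 1/k <= 1/2 for k >= 2
```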
Next, we give the new, generalized definition of median instances.
Definition 4.6.3 (Median instance). An instance of Set Cover, I = (𝒰, ℱ), is a median instance if it satisfies all of the following properties.

(a) No αk sets cover all the elements. (The size of its minimum set cover is greater than αk.)

(b) The number of elements left uncovered by the union of any k sets is at most 2n·p_0^k.

(c) For any pair of elements e, e′, the number of sets S ∈ ℱ s.t. e ∈ S but e′ ∉ S is at least (1 − p_0)p_0·m/2.

(d) For any collection of k sets S_1, ···, S_k, |S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})| ≥ (1 − p_0)(1 − p_0^{k−1})·n/2.

(e) For any collection of k + 1 sets S, S_1, ···, S_k, |(S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})) − S| ≤ 2p_0(1 − p_0)(1 − p_0^{k−1})·n.

(f) For each element, the number of sets that do not contain the element is at most (1 + 1/k)p_0·m.
Lemma 4.6.4. For k ≤ min{√(n/(27 ln n)), (n/(16α log n))^{1/(4α+1)}}, there exists a median instance I* satisfying all the median properties of Definition 4.6.3. In fact, most of the instances constructed by the described randomized procedure satisfy the median properties.
Proof: The lemma follows from applying the union bound to the results of Lemmas 4.6.5–4.6.10. □
The proofs of Lemmas 4.6.5–4.6.10 follow from standard applications of concentration bounds. We include them here for the sake of completeness.
Lemma 4.6.5. With probability at least 1 − n^{−2} over ℱ ∼ 𝔉(𝒰, p_0), the size of the minimum set cover of the instance (𝒰, ℱ) is at least αk + 1.

Proof: The probability that an element e ∈ 𝒰 is covered by a specific collection of αk sets in ℱ is 1 − p_0^{αk} = 1 − 8(αk + 2) log n / n. Thus, the probability that the union of the αk sets covers all elements of 𝒰 is at most (1 − 8(αk + 2) log n / n)^n < n^{−8(αk+2)}. Applying the union bound, with probability at least 1 − n^{−2} the size of an optimal set cover is at least αk + 1. □
Lemma 4.6.6. With probability at least 1 − n^{−2} over ℱ ∼ 𝔉(𝒰, p_0), any collection of k sets has at most 2n·p_0^k uncovered elements.

Proof: Let S_1, ···, S_k be a collection of k sets from ℱ. For each element e ∈ 𝒰, the probability that e is not covered by the union of the k sets is p_0^k. Thus,

    E[|𝒰 − (S_1 ∪ ··· ∪ S_k)|] = p_0^k · n ≥ p_0^{αk} · n = 8(αk + 2) log n.

By the Chernoff bound,

    Pr[|𝒰 − (S_1 ∪ ··· ∪ S_k)| ≥ 2p_0^k · n] ≤ e^{−p_0^k n / 3} ≤ e^{−(αk+2) log n} ≤ n^{−αk−2}.

Thus with probability at least 1 − n^{−2}, for any collection of k sets in ℱ, the number of elements left uncovered by their union is at most 2n·p_0^k. □
Lemma 4.6.7. Suppose that ℱ ∼ 𝔉(𝒰, p_0) and let e, e′ be two elements of 𝒰. Given k ≤ (n/(16α log n))^{1/(4α+1)}, with probability at least 1 − n^{−2}, the number of sets S ∈ ℱ such that e ∈ S but e′ ∉ S is at least m·p_0(1 − p_0)/2.

Proof: For each set S, Pr[e ∈ S and e′ ∉ S] = (1 − p_0)p_0. This implies that the expected number of sets S satisfying the condition for e and e′ is

    p_0(1 − p_0)·m ≥ p_0 · p_0^{k/4} · m ≥ p_0^{αk} · n = 8(αk + 2) log n,

by Lemma 4.6.2 and m ≥ n. By the Chernoff bound, the probability that the number of sets containing e but not e′ is less than m·p_0(1 − p_0)/2 is at most

    e^{−p_0(1−p_0)m/8} ≤ e^{−(αk+2) log n} ≤ n^{−αk−2}.

Thus with probability at least 1 − n^{−2}, property (c) holds for any pair of elements of 𝒰. □
Lemma 4.6.8. Suppose that ℱ ∼ 𝔉(𝒰, p_0) and let S_1, ···, S_k be k different sets in ℱ. Given k ≤ (n/(16α log n))^{1/(4α+1)}, with probability at least 1 − n^{−2}, |S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})| ≥ (1 − p_0)(1 − p_0^{k−1})·n/2.

Proof: For each element e, Pr[e ∈ S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})] = (1 − p_0)(1 − p_0^{k−1}). This implies that the expected size of S_k ∩ (S_1 ∪ ··· ∪ S_{k−1}) is

    (1 − p_0)(1 − p_0^{k−1})·n ≥ p_0^{k/4} · p_0^{k/4} · n ≥ p_0^{αk} · n = 8(αk + 2) log n,

by Lemma 4.6.2. By the Chernoff bound, the probability that |S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})| ≤ (1 − p_0)(1 − p_0^{k−1})·n/2 is at most

    e^{−(1−p_0)(1−p_0^{k−1})n/8} ≤ e^{−(αk+2) log n} ≤ n^{−αk−2}.

Thus with probability at least 1 − n^{−2}, property (d) holds for any sets S_1, ···, S_k in ℱ. □
Lemma 4.6.9. Suppose that ℱ ∼ 𝔉(𝒰, p_0) and let S_1, ···, S_k and S be k + 1 different sets in ℱ. Given k ≤ (n/(16α log n))^{1/(4α+1)}, with probability at least 1 − n^{−2}, |(S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})) − S| ≤ 2p_0(1 − p_0)(1 − p_0^{k−1})·n.

Proof: For each element e, Pr[e ∈ (S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})) − S] = p_0(1 − p_0)(1 − p_0^{k−1}). Then,

    E[|(S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})) − S|] = p_0(1 − p_0)(1 − p_0^{k−1})·n ≥ p_0 · p_0^{k/4} · p_0^{k/4} · n ≥ p_0^{αk} · n = 8(αk + 2) log n,

by Lemma 4.6.2. By the Chernoff bound, the probability that |(S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})) − S| ≥ 2p_0(1 − p_0)(1 − p_0^{k−1})·n is at most

    e^{−p_0(1−p_0)(1−p_0^{k−1})n/3} ≤ e^{−2(αk+2) log n} ≤ n^{−2αk−4}.

Thus with probability at least 1 − n^{−2}, property (e) holds for any sets S_1, ···, S_k and S in ℱ. □
Lemma 4.6.10. Given that k ≤ (n/(16α log n))^{1/(4α+1)}, with probability at least 1 − n^{−2}, for each element, the number of sets that do not contain the element is at most (1 + 1/k)p_0·m.

Proof: First, note that k ≤ (n/(16α log n))^{1/(4α+1)} ≤ √(n/(27 ln n)), as α > 1 and n is sufficiently large.

Next, for each element e, Pr_{S∼ℱ}[e ∉ S] = p_0. This implies that E[|{S | e ∉ S}|] = p_0·m. By the Chernoff bound, the probability that |{S | e ∉ S}| ≥ (1 + 1/k)p_0·m is at most e^{−mp_0/(3k²)}. Now, if k ≥ log n, then p_0 ≥ 1/e, and thus this probability is at most exp(−m/(3ek²)) ≤ n^{−3} for any k ≤ √(n/(27 ln n)). Otherwise, the above probability is at most exp(−m·n^{−1/(αk)}/(3 log² n)) ≤ exp(−n^{1−1/(αk)}/(3 log² n)) ≤ n^{−3}, given m ≥ n and sufficiently large n. Thus with probability at least 1 − n^{−2}, property (f) holds for every element e ∈ 𝒰. □
4.6.2. Distribution 𝒟(I*) of the Modified Instances Derived from I*.

Fix a median instance I*. We now show that we may perform Õ(n^{1−1/α} k^{1/α}) swap operations on I* so that the size of the minimum set cover of the modified instance becomes k. Consequently, the number of queries to EltOf and SetOf that induce different answers from those of I* is at most Õ(n^{1−1/α} k^{1/α}). We define 𝒟(I*) as the distribution of instances I′ generated from a median instance I* by GenModifiedInst(I*), given below in Algorithm 15. The main differences from the simplified version are that we now select k different sets to turn into a set cover, and that the swaps may only occur between S_k and the candidates.
Algorithm 15 The procedure of constructing a modified instance of I*.

1:  procedure GenModifiedInst(I* = (𝒰, ℱ))
2:      ℳ ← ∅
3:      pick k different sets S_1, ···, S_k from ℱ uniformly at random
4:      for all e ∈ 𝒰 − (S_1 ∪ ··· ∪ S_k) do
5:          pick e′ ∈ (S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})) − ℳ uniformly at random
6:          ℳ ← ℳ ∪ {e′}
7:          pick a random set S in Candidate(e, e′)
8:          swap(e, e′) between S and S_k
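The repair loop admits a compact Python sketch. For reproducibility the picked sets are passed in explicitly (in the procedure above they are drawn uniformly at random), and `candidate(e, e2)` is a hypothetical helper standing in for Candidate(e, e′), returning the indices of sets that contain e but not e2.

```python
import random

def gen_modified_inst(universe, sets, picks, candidate, rng=random):
    """Repair loop of GenModifiedInst: each element e left uncovered by the
    picked sets is matched to a distinct element e2 that S_k shares with an
    earlier pick, and swap(e, e2) is performed between a random candidate
    set and S_k, so that the picked sets become a set cover."""
    s_k = picks[-1]
    union = set().union(*(sets[i] for i in picks))
    doubly = sets[s_k] & set().union(*(sets[i] for i in picks[:-1]))
    matched = set()
    for e in sorted(set(universe) - union):
        e2 = rng.choice(sorted(doubly - matched))
        matched.add(e2)
        s = rng.choice(candidate(e, e2))
        sets[s].discard(e); sets[s].add(e2)       # swap(e, e2): S loses e ...
        sets[s_k].discard(e2); sets[s_k].add(e)   # ... and S_k gains it
    return picks
```

On the toy instance in the test below, the only uncovered element is 4; after the single forced swap, the three picked sets cover the whole universe.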
Lemma 4.6.11. The procedure GenModifiedInst is well-defined under the precondition that the input instance I* is a median instance.
Proof: To carry out the algorithm, we must ensure that the number of initially uncovered elements is at most the number of elements covered by both S_k and some other set among S_1, . . . , S_{k−1}. Since I* is a median instance, by properties (b) and (d) of Definition 4.6.3 these quantities satisfy |𝒰 − (S_1 ∪ ··· ∪ S_k)| ≤ 2n·p_0^k and |S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})| ≥ (1 − p_0)(1 − p_0^{k−1})·n/2, respectively. By Lemma 4.6.2, p_0^{k/4} ≤ 1/2. Using this and Lemma 4.6.2 again,

    (1 − p_0)(1 − p_0^{k−1})·n/2 ≥ p_0^{k/4} · p_0^{k/4} · n/2 = p_0^{k/2}·n/2 ≥ 2n·p_0^k.

That is, in our construction there are sufficiently many possible choices of e′ to be matched and swapped with each uncovered element e. Moreover, since I* is a median instance, |Candidate(e, e′)| ≥ (1 − p_0)p_0·m/2 (by property (c)), so there are plenty of candidates for each swap. □
Bounding the Probability of Modification. Similarly to the simplified case, define p_{Elt−Set} : 𝒰 × ℱ → [0, 1] as the probability that an element is swapped by a set, and upper bound it via the following lemma.

Lemma 4.6.12. For any e ∈ 𝒰 and S ∈ ℱ, p_{Elt−Set}(e, S) ≤ 64p_0^k / ((1 − p_0)²·m), where the probability is taken over the random choices of I′ ∼ 𝒟(I*).
Proof: Let S_1, . . . , S_k denote the k sets picked (uniformly at random) from ℱ to construct a modified instance of I*. For each element e and set S such that e ∈ S in the basic instance I*,

    p_{Elt−Set}(e, S) = Pr[S = S_k] · Pr[e ∈ ∪_{j∈[k−1]} S_j | e ∈ S_k] · Pr[e is matched to 𝒰 − (∪_{j∈[k]} S_j) | e ∈ S_k ∩ (∪_{j∈[k−1]} S_j)]
                    + Pr[S ∉ {S_1, . . . , S_k}] · Pr[e ∈ S − (∪_{j∈[k]} S_j) | e ∈ S] · Pr[S swaps e with S_k | e ∈ S − (S_1 ∪ ··· ∪ S_k)],

where all probabilities are taken over I′ ∼ 𝒟(I*). Next we bound each of the above six terms. Clearly, since we choose the sets S_1, ···, S_k uniformly at random, Pr[S = S_k] = 1/m. We bound the second term by 1. Next, by properties (b) and (d) of median instances, the third term is at most

    |𝒰 − (∪_{j∈[k]} S_j)| / |S_k ∩ (∪_{j∈[k−1]} S_j)| ≤ 2n·p_0^k / ((1 − p_0)(1 − p_0^{k−1})·n/2) ≤ 4p_0^k / (1 − p_0)².

We bound the fourth term by 1. Let d_e denote the number of sets in ℱ that do not contain e. Using property (f) of median instances, the fifth term is at most

    d_e(d_e − 1) ··· (d_e − k + 1) / ((m − 1)(m − 2) ··· (m − k)) ≤ (d_e / (m − 1))^k ≤ ((1 + 1/k)p_0·m / (m(1 − 1/(k+1))))^k ≤ e²p_0^k.

Finally, for the last term, note that by symmetry each pair of matched elements (e, e′) is picked by GenModifiedInst equiprobably. Thus, for any e ∈ S − (S_1 ∪ ··· ∪ S_k), the probability that a given element e′ ∈ S_k ∩ (S_1 ∪ ··· ∪ S_{k−1}) is matched to e is 1/|S_k ∩ (S_1 ∪ ··· ∪ S_{k−1})|. By properties (c)–(e) of median instances, the last term is at most

    Σ_{e′ ∈ (S_k ∩ (∪_{j∈[k−1]} S_j)) − S} Pr[(e, e′) ∈ ℳ] · Pr[(S, S_k) swap (e, e′)]
        ≤ |(S_k ∩ (∪_{j∈[k−1]} S_j)) − S| · (1 / |S_k ∩ (∪_{j∈[k−1]} S_j)|) · (1 / |Candidate(e, e′)|)
        ≤ 2p_0(1 − p_0)(1 − p_0^{k−1})·n · (2 / ((1 − p_0)(1 − p_0^{k−1})·n)) · (2 / (p_0(1 − p_0)·m))
        = 8 / ((1 − p_0)·m).

Therefore,

    p_{Elt−Set}(e, S) ≤ (1/m) · 1 · (4p_0^k / (1 − p_0)²) + 1 · e²p_0^k · (8 / ((1 − p_0)·m))
                    ≤ 4p_0^k / ((1 − p_0)²·m) + 60p_0^k / ((1 − p_0)·m) ≤ 64p_0^k / ((1 − p_0)²·m). □
4.7. Omitted Proofs from Section 4.3
Lemma 4.7.1. With probability at least 1 − n^{−1} over ℱ ∼ 𝔉(𝒰, p_0), the size of the minimum set cover of the instance (𝒰, ℱ) is greater than 2.

Proof: The probability that an element e ∈ 𝒰 is covered by two sets S_1, S_2 selected from ℱ is

    Pr[e ∈ S_1 ∪ S_2] = 1 − p_0² = 1 − 9 log n / n.

Thus, the probability that S_1 ∪ S_2 covers all elements of 𝒰 is at most (1 − 9 log n / n)^n < n^{−9}. Applying the union bound, with probability at least 1 − n^{−1} the size of an optimal set cover is greater than 2. □
Lemma 4.7.2. Let S_1 and S_2 be two sets in ℱ, where ℱ ∼ 𝔉(𝒰, p_0). Then with probability at least 1 − n^{−1}, |𝒰 − (S_1 ∪ S_2)| ≤ 18 log n.

Proof: For an element e, Pr[e ∉ S_1 ∪ S_2] = p_0² = 9 log n / n. So E[|𝒰 − (S_1 ∪ S_2)|] = 9 log n. By the Chernoff bound, Pr[|𝒰 − (S_1 ∪ S_2)| ≥ 18 log n] is at most e^{−9 log n / 3} ≤ n^{−3}. Thus with probability at least 1 − n^{−1}, for any pair of sets in ℱ, the number of elements not covered by their union is at most 18 log n. □
Lemma 4.7.3. Let S_1 and S_2 be two sets in ℱ, where ℱ ∼ 𝔉(𝒰, p_0). Then |S_1 ∩ S_2| ≥ n/8 with probability at least 1 − n^{−1}.

Proof: Each element e is covered either by both of S_1, S_2, by exactly one of them, or by neither. Since p_0 ≤ 1/2, the probability that an element is covered by both sets exceeds that of the other cases; in particular, Pr[e ∈ S_1 ∩ S_2] > 1/4. Thus, E[|S_1 ∩ S_2|] > n/4. By the Chernoff bound, Pr[|S_1 ∩ S_2| ≤ n/8] is exponentially small. Thus with probability at least 1 − n^{−1}, the intersection of any pair of sets in ℱ has size at least n/8. □
Lemma 4.7.4. Suppose that ℱ ∼ 𝔉(𝒰, p_0) and let e, e′ be two elements of 𝒰. With probability at least 1 − n^{−1}, the number of sets S ∈ ℱ such that e ∈ S but e′ ∉ S is at least (m/4)·√(9 log n / n).

Proof: For each set S, Pr[e ∈ S and e′ ∉ S] = (1 − p_0)p_0 ≥ p_0/2. This implies that the expected number of sets S satisfying the condition for e and e′ is at least (m/2)·√(9 log n / n), and by the Chernoff bound, the probability that the number of sets containing e but not e′ is less than (m/4)·√(9 log n / n) is exponentially small. Thus with probability at least 1 − n^{−1}, property (d) holds for any pair of elements of 𝒰. □
Lemma 4.7.5. Suppose that ℱ ∼ 𝔉(𝒰, p_0) and let S_1, S_2 and S be sets in ℱ. With probability at least 1 − n^{−1}, |(S_1 ∩ S_2) − S| ≤ 6√(n log n).

Proof: For each element e, Pr[e ∈ (S_1 ∩ S_2) − S] = (1 − p_0)²p_0 ≤ p_0. This implies that the expected size of (S_1 ∩ S_2) − S is less than √(9n log n), and by the Chernoff bound, the probability that |(S_1 ∩ S_2) − S| ≥ 6√(n log n) is exponentially small. Thus with probability at least 1 − n^{−1}, property (e) holds for any sets S_1, S_2 and S in ℱ. □
Lemma 4.7.6. With probability at least 1 − n^{−1}, for each element, the number of sets that do not contain the element is at most 6m·√(log n / n).

Proof: For each element e, Pr_S[e ∉ S] = p_0. This implies that E[|{S | e ∉ S}|] = m·√(9 log n / n) = 3m·√(log n / n), and by the Chernoff bound, the probability that |{S | e ∉ S}| ≥ 6m·√(log n / n) is exponentially small. Thus with probability at least 1 − n^{−1}, property (f) holds for every element e ∈ 𝒰. □
Chapter 5
Streaming Maximum Coverage
5.1. Introduction
In maximum k-coverage (Max k-Cover), given a ground set 𝒰 of n elements, a family of m sets ℱ (each a subset of 𝒰), and a parameter k, the goal is to select k sets in ℱ whose union has the largest cardinality. The initial streaming algorithms for this problem were developed in the set arrival model, where the input sets are listed contiguously. This restriction is natural from the perspective of submodular optimization, but it limits the applicability of the algorithms.1 Avoiding this limitation can be difficult, as streaming algorithms can no longer operate on sets as "unit objects". As a result, the first maximum coverage algorithms for the general edge arrival model, where (set, element) pairs can arrive in arbitrary order, were developed only recently. In particular, [29] presented a one-pass algorithm with space linear in n and a constant approximation factor.2
A particularly interesting line of research on set arrival streaming set cover and max k-cover is the design of efficient algorithms that use only Õ(n) space [155, 22, 67, 42, 131]. Previous work has shown that the existing greedy algorithm for Max k-Cover can be adapted to achieve a constant factor approximation in Õ(n) space [155, 22] (later improved to Õ(k) by [131]). However, the complexity of the problem in the "low space" regime is very different in edge-arrival streams: [29] showed that as long as the approximation factor is a constant, any algorithm must use Ω(n) space. Still, our understanding of approximation/space tradeoffs in the general case is far from complete. Table 5.1.1 summarizes the known results.

1 For example, consider a situation where the sets correspond to neighborhoods of vertices in a directed graph. Depending on the input representation, for each vertex either the ingoing edges or the outgoing edges might be placed non-contiguously.

2 We remark that many of the prior bounds (both upper and lower) on the set cover and max k-cover problems in set-arrival streams also apply to edge-arrival streams (e.g., [61, 97, 20, 131, 17, 105]). However, the design of efficient streaming algorithms for coverage problems on edge-arrival streams was first studied explicitly in [29].
5.1.1. Our Results
In this chapter, we complement the work of [29] by designing space-efficient (1/α)-approximation algorithms for super-constant values of α. In fact, we show a tight (up to polylogarithmic factors) tradeoff between the two: the optimal space bound is Õ(n/α²) for estimating the maximum coverage value, and Õ(n/α² + k) for reporting an approximately optimal solution.3 The approximation factor 1/α can take any value in [1/Θ(√n), 1 − 1/e).
5.1.2. Our Techniques
In the edge arrival model, elements of each set can arrive irregularly and out of order. This necessitates methods that aggregate information about the input sets, or about their coverage. In particular, distinct element sketches were used both in [29] (implicitly) and [131] (explicitly). In this chapter we expand the use of the sketching toolkit. Specifically, in addition to distinct element estimation [13, 28, 112, 113, 32], we also need algorithms for heavy hitters with respect to the L2 norm [44, 165, 38, 37], as well as a frequency-based partitioning of elements and the detection of sets that "substantially contribute" to the solution [108]. Applications of vector-sketching techniques (e.g., Lp-sampling/estimation and heavy hitters) in graph streaming settings have been studied extensively (e.g., [8, 9, 89, 21, 49, 114]). We believe that our algorithms can lead to further connections between vector sketching methods and streaming algorithms for coverage problems.

3 We note that similar tradeoffs were previously obtained for the set cover problem, as [20] showed a Θ̃(mn/α²) bound for estimation and a Θ̃(mn/α) bound for reporting. Interestingly, the 1/α² vs. 1/α gap does not occur for our problem.

4 Their result works for general submodular maximization assuming access to a value oracle that, given a collection of sets, computes their coverage. A careful adaptation of their result to Max k-Cover (without the value oracle) uses Õ(n) space.
Problem      Stream Model    Approximation           Upper Bound                    Lower Bound
-----------  --------------  ----------------------  -----------------------------  ----------------------
Estimation   Edge Arrival    1 − ε                   —                              Ω(m/ε²) [17]
                             1 − 1/e − ε             —                              Ω(m/ε²) [131]
Reporting    Edge Arrival    1 − 1/e − ε             Õ(n/ε³) [29], Õ(n/ε²) [131]    —
             Set Arrival     1/4 [155], 1/2 [22]4    Õ(n)                           —
                             1/2 − ε                 Õ(k/ε³) [131]                  —
Estimation   Edge Arrival    1/α                     Õ(n/α²) [here]                 Ω(n/α²) [here], [29]
Reporting    Edge Arrival    1/α                     Õ(k + n/α²) [here]             —

Table 5.1.1: Summary of known results on the space complexity of single-pass streaming algorithms for Max k-Cover.
Lower bound. Our algorithm was inspired by the lower bound. Specifically, it was previously shown by [29] that approximating Max k-Cover within a factor better than 2 requires Ω(n) space. A similar approach works for larger values of α, via a reduction from the α-player set disjointness problem (DISJ[m]) with the unique intersection guarantee (i.e., either the players' sets are disjoint, or there is a unique item that appears in all of them) to the task of α-approximating Max k-Cover.

The specific hard instances in the aforementioned lower bound can be distinguished in the streaming model using space Õ(n/α²). To this end, we compute an α-approximation to the L∞-norm of a vector v that, for each element e, counts the number of sets e belongs to. This problem can be solved in Õ(n/α²) space using L2-norm sketches [13]. This suggests that it might be possible to solve the general Max k-Cover problem using sketching techniques as well.
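The L2-sketching primitive referred to here can be illustrated with a bare-bones AMS sketch of the occurrence vector. Fully random signs are used below for brevity, where a real implementation would use 4-wise independent hash functions, and the reduction from L∞ to L2 estimation is not shown; all names are illustrative.

```python
import random

def ams_f2(stream, n, reps=32, rng=random):
    """Minimal AMS sketch: each repetition maintains one counter
    z = sum_e sign(e) * v_e over the edge-arrival stream of (set, element)
    pairs, where v_e counts the sets containing e.  E[z**2] equals
    F2 = ||v||_2**2, and averaging repetitions reduces the variance."""
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]
    z = [0] * reps
    for _s, e in stream:
        for r in range(reps):
            z[r] += signs[r][e]
    return sum(x * x for x in z) / reps
```

When the whole stream touches a single element, each counter is exactly ±v_e and the estimate is exact, which makes for a deterministic check.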
Upper bound. We start our algorithm with a "coverage boosting" universe-reduction technique, which constructs an instance of reduced size (i.e., with a reduced ground set) whose optimal k-cover covers a constant fraction of the elements (see Section 5.3.1). This step is particularly important, as the space complexity of the existing methods for Max k-Cover is proportional to the reciprocal of the fraction of elements covered by an optimal solution.
Once we have a constant fraction coverage guarantee, our algorithm exploits three differ-
ent approaches so that on any instance, at least one of them reports a โgoodโ approximate
solution.
Multi-layered set sampling. Extending the set sampling approach (see Section 5.2.1) to a larger range of sampling rates, [k/m, αk/m], we design a smooth variant of set sampling: a collection of sets sampled uniformly and independently5 at rate Õ(βk/m) w.h.p. covers all elements that appear in at least m/(βk) sets. Besides expanding the application of set sampling to finding a (1/α)-approximate k-cover, this smooth variant implies more structure on the number of elements at a wider range of frequency levels, which is specifically crucial in our approach for detecting sets with "low contribution".
Unlike the set sampling based technique, whose success in finding an α-approximate k-cover only depends on the structure of the set system, the performance of the next two approaches relies on the structure of optimal solutions as well: whether the majority of the coverage (in a specific optimal k-cover) is due to (few) "large" sets or (many) "small" sets.
Heavy hitters and contributing frequencies. The high-level idea in this approach is that if, in an optimal solution, a sufficiently⁶ small number of sets covers the majority of the elements (covered by the optimal solution), it is enough to find a single large set, which naturally hints at the use of ideas related to heavy hitters. For the sake of space efficiency, we randomly partition the sets into supersets, each containing a bounded number of sets. However, once we merge sets into a single superset, we can no longer distinguish between their coverage and their total size. Since we combine sets at random, if all elements have "low" frequency in the set system, then the gap between the total size of all sets in a superset and their coverage is only Õ(1). This observation implies that if there is no "common" element in the set system, then we can use the total size of the sets in a superset as an estimate of its coverage. To get around the case with (many) "common" elements, we show that performing the heavy hitter based algorithm on a sampled set of elements finds a sufficiently large superset as desired (see Section 5.8).
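The "total size versus coverage" observation is easy to sanity-check in code. The following toy simulation (all parameters hypothetical) builds a set system in which every element appears in exactly one set, partitions the sets into random supersets, and verifies that each superset's cheaply maintainable total size equals its coverage exactly:

```python
import random

rng = random.Random(3)
m, set_size, w = 400, 200, 20

# Low-frequency set system: every element appears in exactly one set,
# so there are m * set_size distinct elements overall.
elems = list(range(m * set_size))
rng.shuffle(elems)
sets = [set(elems[i * set_size:(i + 1) * set_size]) for i in range(m)]

# Randomly partition the sets into supersets of expected size w.
buckets = [[] for _ in range(m // w)]
for s in sets:
    buckets[rng.randrange(len(buckets))].append(s)

for bucket in buckets:
    total_size = sum(len(s) for s in bucket)
    coverage = len(set().union(*bucket)) if bucket else 0
    # With no element shared between sets there are no collisions at all,
    # so the total size equals the coverage exactly.
    assert total_size == coverage
```

In the general low-frequency case elements may appear a few times rather than once, and the gap between total size and coverage is then only polylogarithmic, as the chapter argues.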
⁵ In fact, Θ(log(nm))-wise independence is sufficient for all applications in this chapter.
⁶ Depending on how large the value of α is.
Detecting k-covers with many small sets. Finally, we address the case in which an optimal k-cover consists of many "small" sets. In this case, we can show that after subsampling sets uniformly with probability 1/α, a (k/α)-cover with coverage at least Θ(1/α) times the coverage of an optimal k-cover survives. This sampling method saves a factor of α in the memory usage of the algorithm. Further, by exploiting the structural property guaranteed by the multi-layered set sampling⁷, we can show that element sampling saves another factor of α in the space complexity once applied to find a constant factor approximate Max (k/α)-Cover of the subsampled sets.
5.1.3. Other Related Work
Another important related question in this area is to design a "low-approximation" (i.e., better than the 2-approximation guarantee of the greedy approach) streaming algorithm for the Max k-Cover problem in the set arrival setting. Recently, Norouzi-Fard et al. [147] presented the first streaming algorithm that improves upon the 2-approximation guarantee of the greedy approach on random arrival streams. Very recently, Agrawal et al. [5] achieved an almost (1 − 1/e)-approximation in Õ(m) space, which is essentially the optimal bound [131].⁸ Still, it remains an important question to design such algorithms on adversarial order streams. We also remark that the algorithms of [147, 5] do not work on edge arrival streams.
In many scenarios, space is the most critical factor, and thus the question becomes: what approximation guarantees are possible within the given space bounds? This question has been studied before in the context of Set Cover in set arrival streams (e.g., [67, 42]), leading to poly(n, m)-factor approximation algorithms.
5.2. Preliminaries and Notations
5.2.1. Sampling Methods for Max k-Cover and Set Cover
Here we describe two sampling methods that have been used widely in the design of streaming algorithms for Max k-Cover and Set Cover [123, 61, 97, 20, 131, 17, 29, 105]. For a collection of sets 𝒮, we define C(𝒮) to denote the set of elements that are covered by 𝒮; C(𝒮) := ∪_{S∈𝒮} S. Moreover, we denote an optimal k-cover of (𝒰, ℱ) by OPT.
⁷ If the multi-layered set sampling fails to return a (1/α)-approximate estimate, we can infer strong conditions on the maximum number of elements that belong to each frequency level in [k/m, αk/m].
⁸ Both [147, 5] study the more general problem of submodular maximization, and their results are stated with different notation and assuming oracle access. Here, we state their guarantees for max cover on set arrival streams.
Set Sampling. Roughly speaking, set sampling says that by selecting sets uniformly at random, with high probability all elements that appear in a large number of sets will be covered.
Definition 5.2.1. An element e ∈ 𝒰 is called λ-common if it appears in at least m · polylog(n, m)/λ sets in ℱ. Furthermore, we denote the set of λ-common elements by 𝒰^cmn_λ.
Observation 5.2.2. For any 0 ≤ λ1 ≤ λ2, 𝒰^cmn_{λ1} ⊆ 𝒰^cmn_{λ2}.
Lemma 5.2.3 (Set Sampling [61]). Consider a set system (𝒰, ℱ) and let ℱ^rnd ⊆ ℱ be a collection of sets such that each set S is picked in ℱ^rnd with probability λ/m. With high probability, ℱ^rnd covers all elements that appear in Ω̃(m/λ) sets (i.e., λ-common elements).
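A quick empirical illustration of the lemma (an informal simulation with arbitrary parameters, where a constant factor stands in for the polylog term):

```python
import random

def set_sampling_demo(m=2000, n=500, lam=20, trials=20, seed=1):
    """Empirical check of the set sampling lemma: sampling each set with
    probability ~lam*polylog/m covers every element lying in >= m/lam sets."""
    rng = random.Random(seed)
    sets = []
    for j in range(m):
        s = {rng.randrange(1, n)}   # one random "rare" element per set
        if j % 2 == 0:
            s.add(0)                # element 0 is common: it lies in m/2 sets
        sets.append(s)
    rate = (lam * 10) / m           # the factor 10 stands in for the polylog term
    for _ in range(trials):
        sampled = [s for s in sets if rng.random() < rate]
        covered = set().union(*sampled) if sampled else set()
        if 0 not in covered:        # the common element must always be covered
            return False
    return True
```

The probability that the common element escapes all ~m/2 sampled chances in a trial is (1 − rate)^(m/2), which is astronomically small for these parameters.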
Element Sampling for Max k-Cover. This sampling method shows that if we sample elements of 𝒰 uniformly at a large enough rate (i.e., proportional to (k|𝒰|)/|C(OPT)|), then a constant factor approximate k-cover over the sampled elements is, w.h.p., a constant factor approximate solution of the original instance.
Lemma 5.2.4 (Element Sampling Lemma [123, 61]). Consider an instance of Max k-Cover(𝒰, ℱ) and assume that an optimal k-cover of (𝒰, ℱ) covers a (1/c)-fraction of 𝒰. Let ℒ ⊆ 𝒰 be a set of elements of size Θ̃(ck) picked uniformly at random. Then, with high probability, a Θ(1)-approximate k-cover of (ℒ, ℱ) is a Θ(1)-approximate k-cover of (𝒰, ℱ).
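The following toy experiment (hypothetical sizes; greedy as a stand-in for any Θ(1)-approximate solver) illustrates the lemma: a k-cover computed only on a small uniform sample of elements remains competitive on the full ground set:

```python
import random

def greedy_k_cover(sets, k, elements):
    """Standard greedy over the given (possibly sampled) elements."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)),
                   key=lambda i: len((sets[i] & elements) - covered))
        chosen.append(best)
        covered |= sets[best] & elements
    return chosen

def coverage(sets, chosen):
    return len(set().union(*(sets[i] for i in chosen))) if chosen else 0

rng = random.Random(7)
n, k = 10_000, 10
sets = [set(rng.sample(range(n), 400)) for _ in range(60)]
universe = set(range(n))

# Run greedy only on a small uniform sample of elements ...
sample = set(rng.sample(range(n), 800))
sampled_solution = greedy_k_cover(sets, k, sample)

# ... and compare, on the full universe, against greedy run on everything.
full = coverage(sets, greedy_k_cover(sets, k, universe))
approx = coverage(sets, sampled_solution)
```

With these parameters the sampled-instance solution typically recovers almost all of the coverage of the full-instance greedy solution.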
Observation 5.2.5. Let 𝒮 be a (βk)-cover in ℱ. Then, in any partitioning of 𝒮 into β groups, there exists one group with coverage at least |C(𝒮)|/β. In particular, an optimal k-cover in 𝒮 covers at least |C(𝒮)|/β elements.
This simple observation is particularly interesting because it relates the task of α-approximating Max k-Cover to solving instances of Max (βk)-Cover where β ≤ α.
5.2.2. HeavyHitters and Contributing Classes
Suppose that a sequence of items a₁, ..., a_T arrives in a data stream where, for each i ≤ T, a_i ∈ [n]. We can think of the stream as a sequence of (insertion only) updates on an initially zero vector c such that, upon the arrival of a_i in the stream, c[a_i] ← c[a_i] + 1. Here, we review the notions of F2-heavy hitters and contributing coordinates that are used in our algorithm for approximating Max k-Cover.
Definition 5.2.6. Given an n-dimensional vector c, an item j (corresponding to c[j]) is a φ-HeavyHitter of F2(c) if c[j]² ≥ φ · F2(c) = φ · Σ_{i∈[n]} c[i]². Intuitively, the heavy hitters are the items that appear frequently in the stream.
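One standard way to realize such F2-heavy hitter queries is a CountSketch-style data structure; the following is a minimal illustrative Python version (demo parameters, with Python's built-in tuple hashing standing in for a proper hash family):

```python
import random

class CountSketch:
    """Minimal CountSketch: `rows` arrays of `buckets` signed counters.

    The median over rows of sign(j) * counter[bucket(j)] estimates c[j],
    which suffices to recover phi-HeavyHitters of F2 (up to the usual
    randomness caveats); all parameters here are arbitrary demo choices."""

    def __init__(self, rows=7, buckets=256, seed=0):
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.buckets = buckets
        self.table = [[0] * buckets for _ in range(rows)]

    def _loc(self, salt, j):
        h = hash((salt, j))
        return (h >> 1) % self.buckets, (1 if h & 1 else -1)

    def update(self, j, delta=1):
        for salt, row in zip(self.salts, self.table):
            b, sign = self._loc(salt, j)
            row[b] += sign * delta

    def estimate(self, j):
        ests = sorted(sign * row[b]
                      for salt, row in zip(self.salts, self.table)
                      for b, sign in (self._loc(salt, j),))
        return ests[len(ests) // 2]

# A stream with one frequent item among many rare ones.
cs = CountSketch()
for _ in range(1000):
    cs.update(0)
for j in range(1, 1000):
    cs.update(j)
```

The single heavy item's frequency is recovered accurately from the sketch, while rare items' estimates stay small, so thresholding the estimates exposes the φ-HeavyHitters.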
We conceptually partition the coordinates of c into classes C_i = {j | 2^{i−1} < c[j] ≤ 2^i}.
Definition 5.2.7. A class of coordinates C_t is γ-contributing if |C_t| · 2^{2t} ≥ γF2(c) = γ · Σ_{i∈[n]} c[i]².
Let C_{t*} be a γ-contributing class and let s_{t*} denote the size of C_{t*}; s_{t*} = |C_{t*}|. Further, let j* = ⌈log s_{t*}⌉, so that 2^{j*−1} < s_{t*} ≤ 2^{j*}. Let h : [n] → [2^{j*}/(12 log n)] be a function chosen uniformly at random from a family of Θ(log(nm))-wise independent hash functions. We define S_{j*} as a sampled substream of the input stream with rate (12 log n)/2^{j*}. More precisely, S_{j*} only contains the updates corresponding to the coordinates F_{j*} = {j | h(j) = 1} that are mapped to one under h. Next, we show that the surviving coordinates of C_{t*} (j ∈ C_{t*}) in c_{j*}, which is the vector c restricted to the items in F_{j*}, are Ω(γ)-HeavyHitters of F2(c_{j*}); c[j]² ≥ Ω(γ) · F2(c_{j*}). Roughly speaking, if we subsample the stream so that only polylog(n) coordinates of C_{t*} survive, then with high probability these coordinates are Ω(γ)-HeavyHitters of the sampled substream.
Claim 5.2.8. With probability at least 1 − n^{−1}, the number of surviving coordinates in the sampled substream S_{j*} is at least (6n log n)/2^{j*}.
Proof: Let X_i be a random variable which is one if coordinate i ∈ F_{j*} and zero otherwise. We define Y := X₁ + ... + X_n. Note that the X_i are Θ(log(nm))-wise independent and E[Y] = (12n log n)/2^{j*}. Then, by an application of the Chernoff bound with limited independence (Lemma 5.7.3),
Pr(Y < (6n log n)/2^{j*}) ≤ n^{−1}.
Hence, with high probability, F_{j*} has size at least (6n log n)/2^{j*}. □
Lemma 5.2.9. With probability at least 1 − 2/(9 log² n log(nm)), a coordinate j ∈ C_{t*} is a (γ/(432 log² n log(nm)))-HeavyHitter in the sampled substream S_{j*}.
Proof: Let Y_i be a random variable corresponding to the i-th coordinate in C_{t*} such that Y_i = 1 if the coordinate belongs to F_{j*} and zero otherwise. Moreover, we define Y := Y₁ + ... + Y_{s_{t*}}. Then, E[Y] = (12 s_{t*} log n)/2^{j*} and, by an application of the Chernoff bound with limited independence,
Pr(Y < 1) ≤ Pr(Y < (1 − √(1/6)) E[Y]) ≤ n^{−1}.
Next, we show that with probability at least 1 − 1/(9 log² n log(nm)), F2(c_{j*}) ≤ Õ(F2(c)/2^{j*}). It is straightforward to check that E[F2(c_{j*})] = (12 F2(c) log n)/2^{j*}. Hence, by Markov's inequality,
Pr(F2(c_{j*}) ≥ (108 F2(c) log² n log(nm))/2^{j*}) ≤ 1/(9 log² n log(nm)).
Hence, with probability at least 1 − 2/(9 log² n log(nm)), a coordinate j ∈ C_{t*} survives in F_{j*} and F2(c_{j*}) ≤ (108 F2(c) log² n log(nm))/2^{j*}. Thus, with probability at least 1 − 2/(9 log² n log(nm)),
c[j]² ≥ (2^{t*−1})² ≥ (γ/(4 s_{t*})) F2(c) ≥ (γ/(4 · 2^{j*})) · (2^{j*}/(108 log² n log(nm))) F2(c_{j*}) = (γ/(432 log² n log(nm))) F2(c_{j*}).
In other words, with probability at least 1 − 2/(9 log² n log(nm)), a coordinate j ∈ C_{t*} is a (γ/(432 log² n log(nm)))-HeavyHitter of F2(c_{j*}). □
Next, we can use the existing algorithms for F2-HeavyHitters to complete this section.
Theorem 5.2.10 (F2-heavy hitters [44, 165, 38, 37]). Assume that c is an n-dimensional vector initialized to zero and let S be a stream of items a₁, ..., a_T where, for each i ∈ [T], a_i ∈ [n]. Then, there is a single pass algorithm F2-HeavyHitter that uses Õ(1/γ) space and with high probability returns all coordinates j such that c[j]² ≥ γF2(c). In addition, it returns (1 ± 1/2)-approximate values of these coordinates.
Finally, there exists an algorithm that, with probability at least 1 − 2/(9 log d log(nm)), finds at least one coordinate in each γ-contributing class of c using Õ(1/γ) space.
Theorem 5.2.11 (γ-contributing [108]). Assume that c is an n-dimensional vector initialized to zero. Let S be a stream of items a₁, ..., a_T where, for each i ∈ [T], a_i ∈ [n]. Moreover, assume that no item in S has frequency more than d. There exists a single pass algorithm F2-Contributing that uses Õ(1/γ) space and, with probability at least 1 − 2/(9 log d log(nm)), returns a coordinate j from each γ-contributing class. In addition, it returns a (1 ± 1/2)-approximate frequency of these coordinates.

Algorithm 16 A single pass streaming algorithm that, given a vector c in a stream, for each contributing class C, returns an index j along with a (1 ± 1/2)-estimate of its frequency.
1: procedure F2-Contributing(γ, d)
2:   ▷ Input: stream of updates i ∈ [n] on vector c
3:   ▷ d is an upper bound on the size of a γ-contributing class
4:   for all s_t ∈ {2^i | i ∈ [log d]} in parallel do
5:     ▷ s_t: number of coordinates in a γ-contributing class
6:     φ ← γ/(432 log² n log(nm))  ▷ parameter of HeavyHitter
7:     let HH_t be an instance of F2-HeavyHitter(φ)
8:     p ← (12 log n)/s_t  ▷ sampling rate of the substream
9:     pick h : [n] → [1/p] from a family of Θ(log(nm))-wise independent hash functions
10:    for all items a_i in the input stream do
11:      if h(a_i) = 1 then
12:        feed a_i to HH_t
13:    return the output of HH_t  ▷ heavy coordinates together with their approximate frequencies
Proof: There are at most log d γ-contributing classes for c (the total number of classes) and, for each γ-contributing class C_t, by Lemma 5.2.9, with probability at least 1 − 2/(9 log² d log(nm)), a coordinate in C_t will be an Ω(γ)-HeavyHitter of F2(c_{j*}) (where j* = ⌈log s_t⌉). By trying all values of j* ∈ [log d], with probability at least 1 − log d · (2/(9 log² d log(nm))) ≥ 1 − 2/(9 log d log(nm)), the F2-Contributing algorithm outputs a coordinate from each γ-contributing class. □
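The quantity that F2-Contributing approximates in a stream can also be computed exactly offline, which is convenient for testing; a small illustration of Definition 5.2.7 with arbitrary numbers:

```python
import math

def contributing_classes(c, gamma):
    """Offline check of Definition 5.2.7: coordinate j is in class C_t when
    2^(t-1) < c[j] <= 2^t, and C_t is gamma-contributing when
    |C_t| * 2^(2t) >= gamma * F2(c)."""
    f2 = sum(x * x for x in c)
    classes = {}
    for j, x in enumerate(c):
        if x > 0:
            t = math.ceil(math.log2(x))
            classes.setdefault(t, []).append(j)
    return {t: js for t, js in classes.items()
            if len(js) * (2 ** (2 * t)) >= gamma * f2}

# One very frequent coordinate and forty moderately frequent ones:
c = [64] + [8] * 40   # F2(c) = 4096 + 40 * 64 = 6656
```

For this vector, the class of the single frequency-64 coordinate is 0.5-contributing on its own, while the class of the forty frequency-8 coordinates only qualifies at smaller thresholds such as γ = 0.3.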
5.2.3. L0-Estimation
Norm estimation is one of the fundamental problems in the area of streaming algorithms: we are given an n-dimensional vector c, initialized to zero, and a sequence of items a₁, ..., a_T (updates to the vector c), where for each i ∈ [T], a_i ∈ [n], arriving in a data stream. In the well-studied task of L0-estimation (also known as the count-distinct problem), the goal is to output a (1 ± ε)-estimate of the number of distinct elements (i.e., L0(c) := |{i | c[i] ≠ 0}|) after reading the whole stream.
Theorem 5.2.12 (L0-estimation [13, 28, 112, 113, 32]). Assume that c is an n-dimensional vector initialized to zero and let S be a stream of items a₁, ..., a_T where, for each i ∈ [T], a_i ∈ [n]. There exists a single pass algorithm that returns a (1 ± 1/2)-approximation of L0(c) and uses Õ(1) space.
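A classic way to realize distinct counting in this spirit is a k-minimum-values (KMV) sketch; the following illustrative Python version (demo parameters, SHA-256 as the hash, not one of the cited constructions) estimates the number of distinct items in a stream:

```python
import bisect
import hashlib

def kmv_distinct(stream, k=256):
    """K-minimum-values estimate of L0: keep the k smallest distinct hash
    values in [0, 1); if the k-th smallest is v, roughly k/v items were
    seen, so return (k - 1) / v."""
    smallest = []                      # sorted k smallest distinct hashes
    for item in stream:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big") / 2 ** 64
        if len(smallest) == k and h >= smallest[-1]:
            continue                   # cannot enter the k smallest
        if h not in smallest:          # O(k) membership test keeps it simple
            bisect.insort(smallest, h)
            del smallest[k:]
    if len(smallest) < k:
        return len(smallest)           # fewer than k distinct items: exact
    return int((k - 1) / smallest[-1])
```

The space is O(k) hash values regardless of stream length, and repeated items do not perturb the estimate because only distinct hash values are retained.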
5.3. Estimating Optimal Coverage of Max k-Cover
In this section, we describe the outline of our single-pass algorithm that approximates the coverage size of an optimal k-cover of (𝒰, ℱ) within a factor of 1/α using Õ(m/α²) space in arbitrary order edge arrival streams. The input to our algorithm consists of k, α, n = |𝒰| and m = |ℱ|. At a high level, we perform three different subroutines in parallel and show that, for any given Max k-Cover instance, at least one of the subroutines estimates the optimal coverage size within the desired factor in the promised space.
Theorem 5.3.1. For any α ∈ [Θ(1), Θ(√m)], there exists a one-pass streaming algorithm that uses Õ(m/α²) space and, with probability at least 3/4, computes the size of an optimal coverage of Max k-Cover(𝒰, ℱ) within a factor of 1/α in edge-arrival streams.
Note that Theorem 5.3.1, together with the O(1)-approximation algorithms of [131, 29] that use Õ(m) space, implies that for any approximation ratio 1/α ∈ [1/Θ(√m), 1 − 1/e), there exists a single-pass streaming algorithm that computes a (1/α)-approximation of the optimal coverage size of Max k-Cover(𝒰, ℱ) in Õ(m/α²) space. Later, in Section 5.5, we extend our approach further to achieve a single pass algorithm that computes a (1/α)-approximate k-cover in Õ(m/α² + k) space.
Theorem 5.3.2. For any α ∈ [Θ(1), Θ(√m)], there exists a single-pass algorithm that uses Õ(m/α² + k) space and, with probability at least 3/4, returns a (1/α)-approximate solution of Max k-Cover(𝒰, ℱ) in edge-arrival streams.
Finally, we complement our upper bounds with a matching lower bound in Section 5.6.
Theorem 5.3.3. Any single pass (possibly randomized) algorithm on edge-arrival streams that (1/α)-approximates the optimal coverage size of Max k-Cover requires Ω(m/α²) space.
As a first step, we provide a mapping from the ground set 𝒰 to a small set of pseudo-elements such that the optimal k-cover on the pseudo-elements covers a constant fraction of the pseudo-elements. This reduction is particularly useful for bounding the number of required samples in methods such as element sampling.
5.3.1. Universe Reduction
In this section we show that, in order to solve Max k-Cover on edge-arrival streams, it suffices to solve instances whose optimal coverage size is at least a constant fraction of |𝒰|. To this end, suppose that we have an algorithm 𝒜 for Max k-Cover in edge-arrival streams with the following properties:
Definition 5.3.4 ((α, δ, c)-oracle for Max k-Cover). An algorithm 𝒜 is an (α, δ, c)-oracle for Max k-Cover if it satisfies the following properties (α denotes the approximation guarantee, δ denotes the failure probability and c is the promised coverage of an optimal k-cover):
• If the optimal coverage size of Max k-Cover(𝒰, ℱ) is at least |𝒰|/c, then, with probability at least 1 − δ, 𝒜 returns a (1/α)-approximation of the optimal coverage size.
• If 𝒜 returns z, then an optimal solution of Max k-Cover(𝒰, ℱ), with high probability, has coverage at least z.
Using an (α, δ, c)-oracle for Max k-Cover, we design a (1/Õ(α))-approximation algorithm EstimateMaxCover for general Max k-Cover with success probability at least 1 − Õ(δ), as in Algorithm 17.
As in EstimateMaxCover, let h : 𝒰 → [z] be a hash function picked uniformly at random from a family of 4-wise independent hash functions mapping the ground set 𝒰 onto the pseudo-elements 𝒱 = {1, ..., z}. Furthermore, for a subset of elements E, we define h(E) := {h(e) | e ∈ E}.
Lemma 5.3.5. Let h : 𝒰 → [z] be a hash function picked uniformly at random from a family of 4-wise independent hash functions, where z ≥ 32. Further, let E be a subset of 𝒰 of size at least z. Then, with probability at least 3/4, |h(E)| ≥ z/4.
Proof: For any pair of elements e_i, e_j ∈ E, let X_{i,j} be a random variable which is one if h(e_i) = h(e_j) (i.e., they collide) and zero otherwise. Let X := Σ_{e_i, e_j ∈ E} X_{i,j} denote the total number of collisions among the elements of E under h.
First, we show that if X ≤ |E|²/γ, then |h(E)| ≥ γ/4. Assume h(E) = {v₁, ..., v_r} and let n_ℓ denote the number of elements in E that are mapped to v_ℓ by h. Then, the total number of collisions, X, is
X = Σ_{ℓ=1}^{r} (n_ℓ choose 2) ≥ Σ_{ℓ=1}^{r} (n_ℓ/2)² ≥ (1/4) · r · (|E|/r)² = |E|²/(4r).
This implies that r = |h(E)| ≥ γ/4. Using this observation, it only remains to show that, with probability at least 3/4, X ≤ |E|²/z. Since h is selected from a family of 4-wise independent hash functions, the {X_{i,j}}_{e_i, e_j ∈ E} are pairwise independent. Hence,
E[X] = Σ_{e_i, e_j ∈ E} E[X_{i,j}] = (|E| choose 2) · (1/z) ≤ |E|²/(2z),
Var[X] = Σ_{e_i, e_j ∈ E} Var[X_{i,j}] = (|E| choose 2) · (1/z − 1/z²) ≥ z/8.
Applying Chebyshev's inequality,
Pr(X > |E|²/z) ≤ Pr(X > E[X] + Var[X]) ≤ 1/Var[X] ≤ 8/z ≤ 1/4.
Hence, with probability at least 3/4, X ≤ |E|²/z, which implies that with probability at least 3/4, |h(E)| ≥ z/4. □

Algorithm 17 A single-pass streaming algorithm that uses an (α, δ, c)-oracle of Max k-Cover to compute a (1/Õ(α))-approximation of the optimal coverage size of Max k-Cover.
1: procedure EstimateMaxCover(𝒜, α)
2:   ▷ Input: 𝒜 is an (α, δ, c)-oracle of Max k-Cover
3:   if kα ≥ n then
4:     return n/α  ▷ trivial bound
     ▷ run for different guesses of the optimal coverage size in parallel
5:   for all z ∈ {2^i | i ∈ [log n]} do
6:     est_z ← 0
7:     for j = 1 to log(1/δ) do  ▷ boost the success probability
8:       pick h_j : 𝒰 → [z] from a family of 4-wise independent hash functions
9:       for all (e, S) in the data stream do
10:        feed (S, h_j(e)) to the (α, δ, c)-oracle 𝒜
11:      est_z ← max(output of 𝒜 on the stream constructed by h_j, est_z)
12:  return max{est_z | est_z ≥ z/(4α)}
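Lemma 5.3.5 is easy to check empirically; in the following informal simulation (arbitrary parameters, with Python's tuple hash standing in for a 4-wise independent family), a set E considerably larger than [z] essentially always keeps at least z/4 pseudo-elements:

```python
import random

def reduction_preserves_coverage(n_covered=4096, z=256, trials=50, seed=11):
    """Empirical version of Lemma 5.3.5: hash a set E of >= z elements
    into [z] and check that at least z/4 pseudo-elements survive.
    Returns the fraction of trials in which the bound holds."""
    rng = random.Random(11 if seed is None else seed)
    E = range(n_covered)
    good = 0
    for _ in range(trials):
        salt = rng.getrandbits(64)      # fresh "hash function" per trial
        image = {hash((salt, e)) % z for e in E}
        if len(image) >= z // 4:
            good += 1
    return good / trials
```

With |E| = 16z, the image in fact covers almost all of [z] in every trial, comfortably above the z/4 threshold the lemma guarantees with probability 3/4.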
Theorem 5.3.6. If there exists a single pass (α, δ, c)-oracle for Max k-Cover(𝒰, ℱ) on edge-arrival streams that uses s(m, α) space with c ≥ 4, then EstimateMaxCover is a (1/Õ(α))-approximation algorithm for Max k-Cover with failure probability at most 4δ log n that uses Õ(s(m, α)) space on edge-arrival streams.
Proof: Let OPT be an optimal solution of Max k-Cover(𝒰, ℱ). First, we show that, with high probability, EstimateMaxCover returns Ω(|C(OPT)|/α). Note that for each guess z ≤ |C(OPT)| of the optimal coverage size, by Lemma 5.3.5, the probability that in none of the log(1/δ) iterations |h_j(C(OPT))| > z/4 is at most δ (i.e., none of the iterations preserves the optimal coverage size up to a factor of 4). Moreover, by the guarantee of (α, δ, c)-oracles for Max k-Cover, each run of 𝒜 fails with probability at most δ. Thus, by an application of the union bound, with probability at least 1 − 2δ log n, est_z is at least z/(4α) for all z ≤ |C(OPT)|. This in particular implies that the value returned by EstimateMaxCover is at least |C(OPT)|/(8α). Moreover, since the coverage of a k-cover never increases after applying the "universe reduction" step (i.e., for each S ∈ ℱ, |h(S)| ≤ |S|) and the estimate returned by the (α, δ, c)-oracle 𝒜 is, with high probability, at most the optimal coverage size, the output of EstimateMaxCover is in [|C(OPT)|/(8α), |C(OPT)|] with probability at least 1 − 4δ log n.
Finally, since EstimateMaxCover runs (log n)(log 1/δ) instances of 𝒜 with parameters (α, δ, c) in parallel and each instance has m sets, the total space of EstimateMaxCover is Õ(s(m, α)). □
The universe reduction step basically enables us to focus only on instances of Max k-Cover in which the optimal solution covers a constant fraction of the ground set, namely at least |𝒰|/4 elements. Next, in Section 5.4, we design an Õ(m/α²)-space (α, δ, c)-oracle for Max k-Cover with α = Ω(1), c ≥ 4 and δ ≤ ε/log n (ε < 1), which together with Theorem 5.3.6 completes the proof of Theorem 5.3.1. Our (α, δ, c)-oracle for Max k-Cover performs three different subroutines in parallel that together guarantee the required properties of (α, δ, c)-oracles and only use Õ(m/α²) space:
• Set sampling based approach. This subroutine, which provides the guarantee of (α, δ, c)-oracles when the number of common elements is large (see Definition 5.2.1), is an application of a "multi-layered" variant of set sampling. It is presented in Section 5.4.1.
• HeavyHitter based approach. We relate the problem of α-estimating/approximating Max k-Cover to the problem of finding contributing classes and heavy hitters on properly sampled substreams (see Section 5.2.2) when the main contribution of an optimal solution of Max k-Cover is due to "large" sets. In particular, this subroutine finds a (1/α)-estimate of the optimal coverage size when α = Ω(k). This approach is presented in Section 5.4.2.
• Element sampling based approach. Finally, we employ element sampling, together with a new sampling technique that samples a collection of sets, to find the desired estimate of Max k-Cover on instances for which the main contribution to an optimal solution comes from "small" sets. Here, we also take advantage of the structure guaranteed by the multi-layered set sampling on the number of elements at different frequency levels. This subroutine is presented in Section 5.4.3.
5.4. (α, δ, c)-Oracle of Max k-Cover
In this section, we design the promised (α, δ, c)-oracle for Max k-Cover. Let OPT denote an optimal solution of Max k-Cover(𝒰, ℱ). As described in Definition 5.3.4, the solution returned by an (α, δ, c)-oracle is, with high probability, at most |C(OPT)| and, if |C(OPT)| ≥ |𝒰|/c, with probability at least 1 − δ it outputs a value not smaller than |C(OPT)|/α. The following theorem, together with Theorem 5.3.6, proves Theorem 5.3.1.
Theorem 5.4.1. Oracle(k, α) performs a single pass on edge arrival streams of the set system (𝒰, ℱ) and implements an (Õ(α), 1/(log n log(nm)), c)-oracle of Max k-Cover(𝒰, ℱ) using Õ(m/α²) space.
Proof: The proof follows from the guarantees of LargeCommon (Theorem 5.4.4), LargeSet (Theorem 5.4.8) and SmallSet (Theorem 5.4.22). The total space of the algorithm is clearly Õ(m/α²), which is the space complexity of each of the subroutines invoked by Oracle. □
To design the promised (α, δ, c)-oracle, we design different subroutines such that each guarantees the properties required by the oracle if certain conditions, based on the size/value of the following notions, hold.
Common elements. An important property in the design of our oracle is whether there exists β ≤ α such that the number of (βk)-common elements is relatively large (see Definition 5.2.1). We also take advantage of another useful notion, which is a property of a k-cover (though here we only describe it for optimal k-covers).
Contribution to the optimal coverage. Given the input argument α and a parameter s as defined in Table 5.4.1, we define the following notion of contribution for the sets in an (optimal) k-cover.
Definition 5.4.2. For a k-cover OPT = {O₁, ..., O_k}, we consider an arbitrary ordering of the sets in OPT and define the contribution of O_i to C(OPT) as |O'_i|, where O'_i := O_i − ∪_{1≤j<i} O_j. Note that the O'_i are disjoint and |∪_{i∈[k]} O'_i| = |C(OPT)| = z. We (conceptually) define OPTlarge to be the collection of all sets in OPT that contribute more than z/(sα) to C(OPT) according to the O'_i; OPTlarge = {O_i ∈ OPT | |O'_i| ≥ z/(sα)} for s < 1 (as in Table 5.4.1). Note that, since the O'_i are disjoint, |OPTlarge| ≤ sα.

w = min{k, α},  s = (9/(2500 √(2k log(sα)) log²(nm))) · (w/α),  t = 2500 log²(nm)/s,  f = 7 log(nm),  ε = 1/(2500 log²(nm)),  c = 4
Table 5.4.1: The values of parameters used in our algorithms.
Design of the (α, δ, c)-oracle of Max k-Cover. Here we sketch a high-level outline of our (α, δ, c)-oracle for Max k-Cover (refer to Algorithm 18 for a formal description). In the following cases, ε = Θ(1/log²(nm)) (as in Table 5.4.1).
• There exists β ≤ α such that |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α. In this case, by Observation 5.2.5, to approximate Max k-Cover(𝒰, ℱ) within a factor of Õ(α), it suffices to find βk sets that cover 𝒰^cmn_{βk}, which can be done via set sampling (see Section 5.4.1).
• |C(OPTlarge)| ≥ |C(OPT)|/2 and, for all β ≤ α, |𝒰^cmn_{βk}| < εβ|𝒰|/α. The subroutine for this case, which is presented in Section 5.4.2, handles the instances of the problem in which sα ≥ 2k, or in which sα < 2k and there exists an optimal solution OPT such that |C(OPTlarge)| ≥ |C(OPT)|/2.
Claim 5.4.3. If sα ≥ 2k, then |C(OPTlarge)| ≥ |C(OPT)|/2.
Proof: Consider the optimal solution OPT and ignore the sets in OPT whose contribution to the coverage is less than |C(OPT)|/(2k). The surviving sets belong to OPTlarge and their total coverage is at least |C(OPT)| − k · |C(OPT)|/(2k) ≥ |C(OPT)|/2. □
• |C(OPTlarge)| < |C(OPT)|/2 and, for all β ≤ α, |𝒰^cmn_{βk}| < εβ|𝒰|/α. In this case, the main contribution to the coverage of OPT comes from "small" sets. This enables us to show that, if we sample sets with probability 1/α, then an Ω(1/α)-fraction of the sets in OPT survives and, with high probability, their coverage is Ω(|C(OPT)|/α). In Section 5.4.3, we show that the element sampling method, with some new ideas, can take care of this case, which can only happen when sα < 2k.

Algorithm 18 An (α, δ, c)-oracle of Max k-Cover.
1: procedure Oracle(k, α)
2:   ▷ for instances in which ∃β ≤ α s.t. |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α
3:   solcmn ← LargeCommon(k, α)
4:   if sα ≥ 2k then
5:     ▷ if sα ≥ 2k, then |C(OPTlarge)| ≥ |C(OPT)|/2
6:     solHH ← LargeSet(k, α, k)
7:   else
8:     ▷ for instances with sα < 2k and |C(OPTlarge)| ≥ |C(OPT)|/2
9:     solHH ← LargeSet(k, α, α)
10:  ▷ for instances with |C(OPTlarge)| < |C(OPT)|/2
11:  solsmall ← SmallSet(k, α)
12:  return max(solcmn, solHH, solsmall)
5.4.1. Multi-layered Set Sampling: ∃β ≤ α s.t. |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α
Here, we first guess the value of β (more precisely, a 2-approximate estimate of β) and then pick βk sets ℱ^rnd_β at random and compute their coverage in one pass using Õ(1) space. To get the desired space complexity, we use the implementation of set sampling with Õ(log(nm)) random bits, as described in Section 5.7.1.
Theorem 5.4.4. Consider an instance (𝒰, ℱ) of Max k-Cover. The LargeCommon algorithm uses Õ(1) space and, if there exists β ≤ α such that |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α, then with high probability the algorithm returns at least ε|𝒰|/(6α). Moreover, with high probability, the output of LargeCommon is at most the coverage size of an optimal solution of Max k-Cover(𝒰, ℱ).

Algorithm 19 An (α, δ, c)-oracle of Max k-Cover that handles the case in which the number of common elements is large.
1: procedure LargeCommon(k, α)
2:   for all βg ∈ {2^i | 1 ≤ i ≤ log α} in parallel do
3:     ▷ perform set sampling in one pass using Õ(1) space
4:     pick hg : ℱ → [m/(βg k)] from a family of Θ(log(nm))-wise independent hash functions
5:     let DEg be a (1 ± 1/2)-approximation streaming algorithm for L0-estimation
6:     for all (e, S) in the data stream do
7:       if hg(S) = 1 then
8:         feed e to DEg  ▷ computing the coverage of ℱ^rnd_{βg}
9:     if VAL(DEg) ≥ εβg|𝒰|/(4α) then
10:      return 2VAL(DEg)/(3βg)
11:  return infeasible  ▷ no β ≤ α with |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α exists
Claim 5.4.5. For each βg ∈ {2^i | 1 ≤ i ≤ log α}, with high probability, |ℱ^rnd_{βg}| ≤ βg k.
Lemma 5.4.6. If there exists β ≤ α such that |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α, then with high probability the output of LargeCommon is at least ε|𝒰|/(6α).
Proof: Let 2^i be the smallest power of two which is larger than or equal to β; i := ⌈log β⌉. Consider the iteration of LargeCommon in which βg = 2^i. Since 2β > βg ≥ β, by Observation 5.2.2,
|𝒰^cmn_{βg k}| ≥ |𝒰^cmn_{βk}| ≥ εβ|𝒰|/α ≥ εβg|𝒰|/(2α).
Hence, by the guarantees of the existing streaming algorithms for L0-estimation (Theorem 5.2.12) and set sampling (Lemmas 5.2.3 and 5.7.7), w.h.p., VAL(DEg) ≥ (1/2) · εβg|𝒰|/(2α) = εβg|𝒰|/(4α). Hence, the estimate returned by the algorithm, which is a lower bound on the coverage of the best k sets in ℱ^rnd_{βg} (see Observation 5.2.5), w.h.p. is at least (2/3) · (1/βg) · εβg|𝒰|/(4α) = ε|𝒰|/(6α).
Moreover, it is straightforward to check that, by the guarantee of the streaming algorithm for L0-estimation (Theorem 5.2.12), the value returned by LargeCommon with high probability is not more than the actual coverage of the best k-cover in the collection of sets sampled using hg. □
Lemma 5.4.7. If LargeCommon returns infeasible, then with high probability, for all β ≤ α, |𝒰^cmn_{βk}| ≤ εβ|𝒰|/α.
Proof: Since the algorithm returns infeasible, by the guarantee of the (1 ± 1/2)-approximation algorithm for L0-estimation (Theorem 5.2.12) and set sampling (Lemma 5.2.3), for all values of βg ∈ {2^i | i ≤ log α}, with high probability,
|𝒰^cmn_{βg k}| ≤ |C(ℱ^rnd_{βg})| ≤ 2VAL(DEg) < εβg|𝒰|/(2α). (5.4.1)
Now, for any given value β ≤ α, consider βg := 2^{⌈log β⌉} (i.e., set βg to be the smallest power of two which is larger than or equal to β). By Observation 5.2.2, |𝒰^cmn_{βk}| ≤ |𝒰^cmn_{βg k}|, which together with Eq. (5.4.1) implies that |𝒰^cmn_{βk}| ≤ εβg|𝒰|/(2α) ≤ εβ|𝒰|/α. □
Proof of Theorem 5.4.4: The guarantee on the output of LargeCommon follows from Lemma 5.4.6. Moreover, by Theorem 5.2.12, the total amount of space needed to compute the coverage of each collection ℱ^rnd_{βg} (via existing L0-estimation algorithms in streams) is Õ(1). Hence, the total space to compute the coverage of all log α collections considered in LargeCommon is Õ(1). □
5.4.2. Heavy Hitters and Contributing Classes: |C(OPTlarge)| ≥ |C(OPTsmall)|
In this section, we show that if there exists an optimal solution OPT of Max k-Cover(𝒰, ℱ) such that the main contribution to the coverage of OPT is due to large sets, formally defined as the sets whose contribution to C(OPT) is at least |C(OPT)|/(sα), then we can approximate the optimal coverage size within a factor of 1/Õ(α) by detecting Ω(α²/m)-HeavyHitters in properly sampled substreams. The following is the main result of this section.
Theorem 5.4.8. Consider an instance (𝒰, ℱ) of Max k-Cover. In a single pass, LargeSet uses Õ(m/α²) space and, if the optimal coverage size of the instance is Ω(|𝒰|), then with probability at least 1 − 1/(log n log(nm)) it returns at least Ω(|𝒰|/α). Moreover, with high probability, the estimate returned by LargeSet is at most the optimal coverage size.
We defer the proof of Theorem 5.4.8 to Section 5.8. In this section, we prove the same guarantees on the performance of a simplified variant of LargeSet, LargeSetSimple, for the case in which 𝒰 contains no "common" elements (we will define them formally later in this section), which presents the main technical ideas.
Partitioning sets into supersets. We partition the sets of ℱ randomly into (cm log n)/w supersets 𝐁 := {B_1, ..., B_{(cm log n)/w}} via a hash function h : ℱ → [(cm log n)/w] chosen from a family of Θ(log(mn))-wise independent hash functions. More precisely, each set S ∈ ℱ belongs to the superset B_{h(S)}.
The parameter w denotes the desired upper bound on the maximum number of sets in a superset in 𝐁 defined by h, and is set to min(α, k). In fact, given w, we define h to be a function picked uniformly at random from a family of Θ(log(mn))-wise independent hash functions {ℱ → [(cm log n)/w]}.
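As a concrete toy sketch of this partitioning step (with a plain callable standing in for the Θ(log(mn))-wise independent hash family, whose construction appears in Section 5.7), the superset assignment can be written as:

```python
# Illustrative sketch of the superset partition. The hash function here is
# an arbitrary toy callable, NOT the k-wise independent family of the text.

def partition_into_supersets(set_ids, h, num_buckets):
    """Place each set S into superset B_{h(S)}."""
    buckets = [[] for _ in range(num_buckets)]
    for s in set_ids:
        buckets[h(s)].append(s)    # S belongs to superset B_{h(S)}
    return buckets

# toy hash into 4 supersets; a real instantiation would draw h at random
# from a Theta(log(mn))-wise independent family (see Section 5.7)
toy_h = lambda s: (5 * s + 3) % 4
supersets = partition_into_supersets(range(12), toy_h, 4)
```

Each set lands in exactly one superset, so the buckets form a partition of ℱ.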
Claim 5.4.9. With high probability, no superset in 𝐁 has more than w sets.
Claim 5.4.10. With high probability, for each e ∈ 𝒰 ∖ 𝒰^cmn_w and B ∈ 𝐁, the number of sets in B that contain e is at most f, where f = Θ(log(mn)).
This implies that, assuming 𝒰^cmn_w = ∅, to get a (1/Õ(α))-approximation of Max k-Cover(𝒰, ℱ), it suffices to find a superset in which the total size of the sets is Ω(1/α) times the optimal coverage size. Now, we are ready to exploit the results on F_2-heavy hitters and F_2-contributing classes mentioned in Section 5.2.2 to describe our (α, δ, η)-oracle for Max k-Cover assuming 𝒰^cmn_w = ∅. Later, in Section 5.8, we show how to remove this assumption by performing our algorithm on a set of sampled elements of 𝒰 instead.
Partitioning supersets by their total size. First, setting z = |c(OPT)|, we partition the supersets in 𝐁 (conceptually) according to the total size of their sets into O(log α) classes as follows:
  𝐁_0 = {B | Σ_{S∈B} |S| ≥ z/2},
  𝐁_i = {B | z/2^{i+1} ≤ Σ_{S∈B} |S| < z/2^i},  ∀i ∈ [1, log α),
  𝐁_small = {B | Σ_{S∈B} |S| < z/α}.
Further, let m_i denote the number of supersets in 𝐁_i; m_i = |𝐁_i|. Next, we define the vector v of size (cm log n)/w such that v[j] = Σ_{S∈B_j} |S| denotes the total size of the sets in B_j. In the following, we show that a subset of supersets with large total size forms an Ω(α²/m)-contributing class of F_2(v), and any superset in this Ω(α²/m)-contributing class is a (1/α)-approximate k-cover of (𝒰, ℱ).
We consider two cases, depending on whether the coordinates corresponding to small supersets, 𝐁_small, contribute to F_2(v), i.e. whether F_2(v_small) ≥ F_2(v)/2, where v_small denotes the vector v restricted to the coordinates corresponding to supersets in 𝐁_small.
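To make the role of the vector v concrete, the following sketch accumulates v from an (element, set) edge stream and computes F_2(v) exactly. The toy stream and hash are our own illustration; a real streaming algorithm would keep only an F_2 heavy-hitter sketch of v, not v itself:

```python
from collections import defaultdict

def superset_size_vector(stream, h):
    """stream: iterable of (element, set_id); h maps set ids to supersets.
    Each stream edge adds 1 to |S|, hence to v[h(S)]."""
    v = defaultdict(int)
    for _e, s in stream:
        v[h(s)] += 1
    return v

def f2(v):
    """Second frequency moment of the superset-size vector."""
    return sum(x * x for x in v.values())

stream = [("e1", 0), ("e2", 0), ("e1", 1), ("e3", 2)]
h = lambda s: s % 2            # toy hash into two supersets
v = superset_size_vector(stream, h)
# v[0] = |S_0| + |S_2| = 3 and v[1] = |S_1| = 1, so F_2(v) = 9 + 1 = 10
```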
Case 1: Supersets with total size less than z/α contribute to F_2(v). This implies that
  F_2(v) ≤ 2F_2(v_small) ≤ 2 · ((cm log n)/w) · (z/α)² = (2cm log n / w) · z²/α².
Claim 5.4.11. If F_2(v_small) ≥ F_2(v)/2, then there exists an Ω(α²/m)-contributing class 𝐁_{i*} of F_2(v) for an index i* < log(sα).
Proof: Since each set in OPT_large has contribution at least z/(sα) to the optimal coverage, each set of OPT_large lands in one of 𝐁_0, ..., 𝐁_{log(sα)−1}. Moreover, since OPT_large has coverage at least z/2,
  Σ_{i=0}^{log(sα)−1} m_i · z/2^i ≥ Σ_{S_j ∈ OPT_large} |S_j| ≥ |c(OPT_large)| ≥ z/2,
which implies that there exists an index i* < log(sα) such that m_{i*} ≥ 2^{i*}/(2 log(sα)). Hence, 𝐁_{i*} is an Ω(α²/m)-contributing class of F_2(v):
  |𝐁_{i*}| · (z/2^{i*+1})² ≥ (2^{i*}/(2 log(sα))) · z²/(4·(2^{i*})²)
    ≥ (w/(2cm log n)) · α² · (1/(2^{i*+3} log(sα))) · F_2(v)    ▷ (2^{i*+1} ≤ sα)
    ≥ ((w/(sα)) · (1/(8c log(sα) log n))) · (α²/m) · F_2(v).
More formally, since w/(sα) = Ω(1) (see Table 5.4.1), 𝐁_{i*} is a φ_1-contributing class of F_2(v) where
  φ_1 = ((w/(sα)) · (1/(8c log(sα) log n))) · (α²/m) = Ω(α²/m).    (5.4.2)
□
Hence, by Theorem 5.2.11, a superset of total size at least (2/3) · (1/2) · z/(sα) will be identified by the subroutine F_2-Contributing(φ_1, sα) using Õ(m/α²) space.
Remark 5.4.12. Recall that in order to find a coordinate in a φ-contributing class Q_{t*}, F_2-Contributing subsamples the stream proportionally to 1/|Q_{t*}| (so that only O(1) coordinates of Q_{t*} survive), and then with high probability any surviving coordinate of Q_{t*} becomes an Ω(φ)-HeavyHitter in the sampled substream. However, here we show that there exists a φ-contributing class Q_{t*} whose intersection with OPT_large is a φ-contributing class of coordinates too. Hence, it suffices to only search for a coordinate in a φ-contributing class of size at most |OPT_large| ≤ sα.
Case 2: Supersets with coverage less than z/α do not contribute to F_2(v).
Claim 5.4.13. If F_2(v_small) < F_2(v)/2, then there exists an Ω(1)-contributing class 𝐁_{i*} of F_2(v) for an index i* < log α.
Proof: In this case, since the supersets in 𝐁_small are not contributing, there exists an index i* < log α (note that we consider all classes 𝐁_0, ..., 𝐁_{log α − 1} in this case) such that
  m_{i*} · (z/2^{i*})² ≥ F_2(v)/(2 log α);
Algorithm 20 An (α, δ, η)-oracle of Max k-Cover that handles the case in which the majority of the coverage in an optimal solution is by the sets whose coverage contributions are at least a 1/(sα) fraction of the optimal coverage size.
1: procedure LargeSetSimple(𝒱, w, thr1, thr2)
2:   ▷ Input: w is an upper bound on the desired number of sets in a superset
3:   ▷ Parameters: φ_1 = Ω(α²/m) and φ_2 = Ω(1)
4:   let Cntr_small be an instance of F_2-Contributing(φ_1, sα)  ▷ for Case 1
5:   let Cntr_large be an instance of F_2-Contributing(φ_2, (cm log n)/w)  ▷ for Case 2
6:   pick h : ℱ → [(cm log n)/w] from Θ(log(mn))-wise independent hash functions
7:   for all (e, S) in the data stream do
8:     if e ∈ 𝒱 then
9:       feed h(S) to both Cntr_small and Cntr_large
   ▷ output(Cntr) contains coordinates along with a (1 ± 1/2)-estimate of their frequencies
10:  if there exists i ∈ output(Cntr_small) such that v_i ≥ (1/2) · thr1 then
11:    return 2v_i/(3f)
12:  if there exists i ∈ output(Cntr_large) such that v_i ≥ (1/2) · thr2 then
13:    return 2v_i/(3f)
14:  return infeasible
in other words, 𝐁_{i*} is a φ_2-contributing class of F_2(v) where φ_2 = 1/(2 log α). □
Note that, by Theorem 5.2.11, a superset of total size at least (2/3) · (1/2) · z/α will be identified by the subroutine F_2-Contributing(φ_2, (cm log n)/w) using Õ(1) space.
Lemma 5.4.14. If |c(OPT)| ≥ |𝒰|/d, then with probability at least 1 − 1/(3 log n log_m n), the estimate returned by LargeSetSimple with parameters (𝒱 = 𝒰, w, thr1 = |𝒰|/(d·sα), thr2 = |𝒰|/(dα)) has coverage at least |𝒰|/(3f·dα) = Ω(|𝒰|/α).
Proof: With probability at least 1 − 2/(9 log n log_m n), the algorithm returns a superset whose total size is at least (2/3) · |𝒰|/(2dα) ≥ |𝒰|/(3dα) (in fact, if it is in Case 1, then the estimate is at least |𝒰|/(3sα·d)). Then, by Claim 5.4.10 and the assumption 𝒰^cmn_w = ∅, the coverage of the reported superset is at least (1/f) · |𝒰|/(3dα). □
Lemma 5.4.15. The amount of space used by LargeSetSimple is Õ(m/α²).
Proof: By Theorem 5.2.11, the amount of space used by Cntr_small and Cntr_large as defined in Algorithm 20 is respectively Õ(1/φ_1) = Õ(m/α²) and Õ(1/φ_2) = Õ(1). □
5.4.3. Element Sampling: |c(OPT_large)| < |c(OPT_small)|
Here we design an (α, δ, η)-oracle of Max k-Cover with the desired parameters for the case in which |c(OPT_small)| > |c(OPT_large)|. Intuitively speaking, in this case, after sampling each set in ℱ with probability Θ(1/α), we can still find O(k/α) sets whose coverage is at least Ω(|c(OPT)|/α). As proved in Claim 5.4.3, if α = Ω(k), then the main contribution to the coverage of OPT is due to OPT_large and we can (1/α)-approximate the optimal coverage size by LargeSet. Hence, in this section we assume that α = O(k). Moreover, throughout this section we assume that for all β ≤ α, |𝒰^cmn_{βk}| < cβ|𝒰|/α (otherwise, our multi-layered set sampling approach described in Section 5.4.1 returns a (1/α)-approximation of the optimal coverage size).
Lemma 5.4.16. Consider an instance (𝒰, ℱ) of Max k-Cover. Suppose that 𝒟 is a collection of k disjoint sets with coverage z such that no S ∈ 𝒟 has size more than z/(sα), where s < 1 and s = Ω(1). Assume that 𝒟_smp := 𝒟 ∩ ℳ, where each S ∈ ℱ survives in ℳ with probability ρ/(sα) for a fixed constant ρ > 1. Then, with probability at least 1 − 6/ρ, 𝒟_smp has size at most (2ρk)/(sα) and covers at least (ρz)/(2sα) elements.
Proof: Let 𝒟 = {S′_1, ..., S′_k} and for each i, let X_i be the random variable corresponding to S′_i such that X_i = |S′_i| if S′_i ∈ 𝒟_smp and zero otherwise.
Claim 5.4.17. E[X_i] = (ρ/(sα)) · |S′_i| and Var[X_i] ≤ (ρ/(sα)) · |S′_i|².
Next, we define X := X_1 + ... + X_k. Note that E[X] = (ρz)/(sα) and, by the pairwise independence of the X_i's and the assumption that |S′_i| ≤ z/(sα),
  Var[X] ≤ (ρ/(sα)) · Σ_{i=1}^{k} |S′_i|² ≤ (ρ/(sα)) · sα · (z/(sα))² = ρ · (z/(sα))².
Finally, applying the Chebyshev inequality,
  Pr[X < (ρz)/(2sα)] = Pr[X < (ρz)/(sα) − (√ρ/2) · ((√ρ/(sα)) · z)] < 4/ρ.
Hence, with probability at least 1 − 4/ρ, 𝒟_smp covers at least (ρz)/(2sα) elements.
Next, we show that with probability at least 1 − 2/ρ, 0 < |𝒟_smp| < (2ρk)/(sα). For each i, let Y_i denote the random variable corresponding to S′_i which is equal to one if S′_i ∈ 𝒟_smp and zero otherwise.
Claim 5.4.18. E[Y_i] = ρ/(sα) and Var[Y_i] ≤ ρ/(sα).
We define Y = Y_1 + ... + Y_k, which denotes the size of 𝒟_smp. Then, by the pairwise independence of the Y_i's, E[Y] = (ρk)/(sα) and Var[Y] ≤ (ρk)/(sα). Applying the Chebyshev inequality (Pr[|Y − E[Y]| ≥ t·Var[Y]] ≤ 1/(t²·Var[Y])), with probability at least 1 − (sα)/(ρk) ≥ 1 − 2/ρ (since in this case, α ≤ 2k/s), 0 < |𝒟_smp| < (2ρk)/(sα).
Hence, with probability at least 1 − 6/ρ, 𝒟_smp is a subset of size at most (2ρk)/(sα) that covers at least (ρz)/(2sα) elements. □
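The Chebyshev argument above can be checked numerically. The sketch below uses toy parameters of our own choosing (not values from the thesis): k disjoint sets of equal size are sampled independently with probability ρ/(sα), and the empirical failure rate of the coverage guarantee is compared against the 4/ρ bound:

```python
import random

# Toy numeric illustration of Lemma 5.4.16: k disjoint sets, each of size at
# most z/(s*alpha), sampled independently with probability rho/(s*alpha).
# The lemma says coverage >= rho*z/(2*s*alpha) with probability >= 1 - 4/rho.

rng = random.Random(1)
rho, s_alpha = 18, 100
k, set_size = 100, 10                  # 100 disjoint sets of size 10
z = k * set_size                       # z = 1000; each |S| = 10 <= z/(s*alpha)
p = rho / s_alpha                      # sampling probability 0.18
threshold = rho * z / (2 * s_alpha)    # coverage guarantee: 90

trials, failures = 2000, 0
for _ in range(trials):
    covered = sum(set_size for _ in range(k) if rng.random() < p)
    if covered < threshold:
        failures += 1
fail_rate = failures / trials          # Chebyshev predicts fail_rate < 4/rho
```

With these parameters the empirical failure rate is far below the (loose) Chebyshev bound of 4/ρ ≈ 0.22.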
Corollary 5.4.19. Consider an instance (𝒰, ℱ) of Max k-Cover and let OPT be an optimal solution of this instance such that |c(OPT_small)| ≥ (1/2) · |c(OPT)| ≥ |𝒰|/(2d). Moreover, let ℳ ⊆ ℱ be a collection of O(|ℱ|/α) pairwise independent sets picked uniformly at random such that each S ∈ ℱ belongs to ℳ with probability 18/(sα). With probability at least 2/3, Max (36k/(sα))-Cover(𝒰, ℳ) has an optimal solution with coverage at least 9|𝒰|/(sαd).
Proof: By the definition of OPT_small, for each S ∈ OPT_small, the contribution of S to OPT (i.e. S′) is at most z/(sα) (see Definition 5.4.2). Then, the result follows from an application of Lemma 5.4.16 to the collection 𝒟 := {S′ | S ∈ OPT_small}, setting ρ = 18 and z = |c(OPT_small)| ≥ |𝒰|/(2d). □
Next, we show that we can perform "element sampling" and find an O(1)-approximation of the Max (36k/(sα))-Cover instance specified in Corollary 5.4.19, (𝒰, ℳ), in one pass and using Õ(m/α²) space. To this end, we first compute the space complexity of (ℒ, ℱ), where ℒ ⊆ 𝒰 is a subset of size O(k) picked by element sampling.
Lemma 5.4.20. Suppose that an optimal solution of Max (k/α)-Cover(𝒰, ℱ) has coverage |𝒰|/γ = Ω(|𝒰|/α). Let ℒ ⊆ 𝒰 be a collection of elements of size O(kγ/α) picked uniformly at random. With high probability, the total amount of space to store the set system (ℒ, ℱ) is Õ(m/α).
Proof: Recall that (𝒰, ℱ) has the property that for all β ≤ α, |𝒰^cmn_{βk}| < cβ|𝒰|/α (otherwise, the result of Section 5.4.1 can be applied). Next, we (conceptually) partition the elements of 𝒰 into log α + 1 groups as follows:
  𝒲_0 = 𝒰 ∖ 𝒰^cmn_{αk},
  𝒲_i = 𝒰^cmn_{(α/2^{i−1})k} ∖ 𝒰^cmn_{(α/2^i)k},  ∀i ∈ [log α].
Note that |𝒲_0| ≤ |𝒰| and for each i ∈ [log α], |𝒲_i| ≤ c|𝒰|/2^{i−1}. Since each element e ∈ 𝒰 survives in ℒ with probability O(kγ/(α|𝒰|)), w.h.p., for each i ∈ [log α], |𝒲_i ∩ ℒ| = O(1 + kγc/(α·2^{i−1})). Furthermore, since each element in 𝒲_i appears in at most O(2^i·m/(αk)) sets of ℱ, the total amount of space required to store (ℒ, ℱ) is at most
  Σ_{i=0}^{log α} |𝒲_i ∩ ℒ| · max_{e∈𝒲_i} freq(e)
    = O(kγ/α) · O(m/(αk)) + Σ_{i=1}^{log α} O(1 + kγc/(α·2^{i−1})) · O(2^i·m/(αk))
    = O(γm/α²) + Σ_{i=1}^{log α} O(2^i·m/(αk) + cγm/α²)
    = Õ(m/α). □
Next, we show that after subsampling the sets by a factor of Θ(1/α), we can save another factor of Ω(α) in the space complexity; in other words, (ℒ, ℳ) uses Õ(m/α²) space. Note that since kα may be as large as Ω(m), we cannot hope to show directly that each element in 𝒲_i appears in at most O(m/(2^i·αk)) sets of ℳ. However, we can show that the total size of the intersection of all sets in ℳ with ℒ is Õ(m/α²), using the properties of the max cover instance.
Lemma 5.4.21. Suppose that an optimal solution of Max (k/α)-Cover(𝒰, ℱ) has coverage |𝒰|/γ = Ω(|𝒰|/α). Let ℒ ⊆ 𝒰 be a collection of elements of size O(kγ/α) picked uniformly at random and let ℳ ⊆ ℱ be a collection of sets of size O(m/α) picked uniformly at random. With high probability, the total amount of space required to store the set system (ℒ, ℳ) is Õ(m/α²).
Proof: First note that since an optimal (k/α)-cover of (𝒰, ℱ) has coverage |𝒰|/γ, with high probability, for each set S ∈ ℳ, |S ∩ ℒ| = O(k/α). Moreover, by Lemma 5.4.20, the size
Algorithm 21 A single pass streaming algorithm that estimates the optimal coverage size of Max k-Cover(𝒰, ℱ) within a factor of 1/Õ(α) in Õ(m/α²) space. To return an O(k/α)-cover that achieves the computed estimate, it suffices to change the line marked by (⋆⋆) to return the corresponding O(k/α)-cover as well (see Section 5.5).
1: procedure SmallSet(k, α)
2:   for all γ_g ∈ {2^{−i} | i ∈ [log α]} in parallel do
3:     ▷ γ_g is an estimate of the optimal coverage of an O(k/α)-cover
4:     for log n times in parallel do
5:       ℳ ← uniformly selected samples of size Θ(m/α) from ℱ
6:       ℒ ← uniformly selected samples of size Θ(γ_g · (k/α)) from 𝒰
7:       initialize S(ℒ, ℳ) to be an empty set  ▷ S(ℒ, ℳ) stores (ℒ, ℳ)
8:       for all (e, S) in the data stream do
9:         if S ∈ ℳ and e ∈ ℒ then
10:          add (e, S) to S(ℒ, ℳ)
11:        if |S(ℒ, ℳ)| > Õ(m/α²) then
12:          terminate
13:    sol_{γ_g} ← max_{ℒ,ℳ} {O(1)-approx solution of Max (36k/(sα))-Cover(S(ℒ, ℳ))}  (⋆⋆)
14:  return max_{γ_g} {(|𝒰|/(γ_g·(k/α))) · sol_{γ_g} | sol_{γ_g} = Ω(k/α)}
of the intersection of all sets in ℱ with ℒ is Õ(m/α). Next, we (conceptually) partition the
sets of ℱ into O(log n) groups based on their intersection size with ℒ as follows (b = O(1)):
  𝐁_j = {S ∈ ℱ | (1/2^j) · (bk/α) ≤ |S ∩ ℒ| < (1/2^{j−1}) · (bk/α)},  ∀1 ≤ j ≤ log n.
Since the total size of the intersection of all sets with the sampled set ℒ is w.h.p. Õ(m/α), for each j ≤ log n, |𝐁_j| ≤ Õ(m/α)/((bk)/(2^j·α)) = Õ(2^j · m/k). Since we have the assumption that k/α ≥ 1 (we took care of the case k < α in the first line of EstimateMaxCover separately), w.h.p., for each 𝐁_j, |𝐁_j ∩ ℳ| = Õ(2^j · m/(kα)). Hence, the total amount of space to store (ℒ, ℳ) is at most
  Σ_{j=1}^{log n} (1/2^{j−1}) · (bk/α) · Õ(2^j · m/(kα)) = Õ(m/α²). □
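The inner loop of SmallSet, which is what the space bound above controls, can be sketched as follows; the sampled collections M and L and the space budget are toy stand-ins for the Θ(m/α)- and Θ(γ_g·k/α)-size samples of the algorithm:

```python
# Sketch of SmallSet's edge-filtering loop (illustrative, toy parameters):
# keep only stream edges (e, S) with S in the sampled sets M and e in the
# sampled elements L, aborting if the stored sub-instance exceeds the budget.

def store_subinstance(stream, M, L, space_budget):
    stored = []
    for e, s in stream:
        if s in M and e in L:
            stored.append((e, s))
            if len(stored) > space_budget:
                return None        # "terminate": this (L, M) pair is discarded
    return stored

stream = [(e, s) for s in range(4) for e in range(5)]   # 4 sets over elements 0..4
sub = store_subinstance(stream, M={0, 2}, L={1, 3}, space_budget=10)
# keeps the edges (1,0), (3,0), (1,2), (3,2)
```

A solution of the small Max-Cover instance on `sub` would then be rescaled to 𝒰, as in line 14 of SmallSet.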
Theorem 5.4.22. If |c(OPT_small)| ≥ |𝒰|/(2d) and for all β ≤ α, |𝒰^cmn_{βk}| < cβ|𝒰|/α, then with high probability, SmallSet outputs a (1/Õ(α))-approximation of the optimal coverage size of Max k-Cover(𝒰, ℱ) in Õ(m/α²) space.
Proof: By Corollary 5.4.19, for any sampled collection of sets ℳ of size Θ(m/α) (as in SmallSet), with probability at least 2/3, there exists a subset of size at most 36k/(sα) in ℳ whose coverage is 9|𝒰|/(sαd) = |𝒰|/γ. Moreover, by the guarantee of element sampling, Lemma 5.2.4, when γ/2 < γ_g ≤ γ, an O(1)-approximate solution of Max k-Cover(ℒ, ℳ) is with high probability an O(1)-approximate solution of Max k-Cover(𝒰, ℳ). Hence, with probability 1 − n^{−1}, in at least one of the log n instances with the desired γ_g, sol_{γ_g} has coverage Ω((|𝒰|/γ) · (γ_g·(k/α))/|𝒰|) = Ω(k/α) over the sampled elements ℒ. Note that we need to scale sol_{γ_g} by a factor of Θ(|𝒰|/(γ_g·(k/α))) to reflect its coverage on 𝒰.
Further, by Lemma 5.4.21, the amount of space required to store each (ℒ, ℳ) with high probability is Õ(m/α²), and since SmallSet stores Õ(1) different instances (ℒ, ℳ), the total space of the algorithm is Õ(m/α²). □
To complete the proof that SmallSet is indeed an (α, δ, η)-oracle with the desired parameters, we need to show that it never overestimates the optimal coverage size.
Lemma 5.4.23. The output of SmallSet is with high probability not larger than the optimal coverage size of Max k-Cover(𝒰, ℱ).
Proof: Note that if the optimal coverage size of Max Õ(k/α)-Cover(ℒ, ℳ) is less than Ω(|𝒰|/(αγ_g)), then with high probability in none of the log n iterations is sol_{γ_g} = Ω(k/α). Hence, SmallSet with high probability does not overestimate the size of an optimal Õ(k/α)-cover of (𝒰, ℱ). □
5.5. Approximating Max k-Cover Problem in Edge Arrival Streams
In this section, we show that we can modify Oracle(k, α) and the subroutines within it slightly to report a (1/α)-approximate solution as well. Currently, the algorithms for estimating the maximum coverage size only maintain the coverage of a collection of sets. To provide an actual (1/α)-approximate solution, we also need to report the indices of the sets that belong to the collection.
Reporting a k-cover in LargeCommon. In this subroutine, the coverages of different covers, possibly of size more than k, are computed. For the task of estimating the maximum coverage size, this does not hurt as long as we scale the coverage by a factor that bounds how much larger the size of the cover is compared to k. However, for the task of reporting an
Algorithm 22 A single pass streaming algorithm that reports a (1/α)-approximate solution of Max k-Cover(𝒰, ℱ) in Õ(β + k) space when the set of common elements is large.
1: procedure ReportLargeCommon(k, α, β)
2:   for all β_g ∈ {2^i | 0 ≤ i ≤ log β} in parallel do
3:     ▷ perform set sampling in one pass using Õ(1) space
4:     pick h : ℱ → [(cm log n)/(β_g·k)] from Θ(log(mn))-wise independent hash functions
5:     pick h_2 : ℱ → [β_g log n] from Θ(log(mn))-wise independent hash functions
6:     let {DE_{g,j} | j ∈ [β_g]} be β_g instances of (1/2)-approximation of L_0-estimation
7:     for all (e, S) in the data stream do
8:       if h(S) = 1 then
9:         feed e to DE_{g,h_2(S)}  ▷ computing c(ℱ^rnd_{β_g,j})
10:  if ∃j* ∈ [log β] s.t. VAL(DE_{g,j*}) ≥ c|𝒰|/(4α) then
11:    return 2VAL(DE_{g,j*})/3 and {S | h(S) = 1 and h_2(S) = j*}
12:  return infeasible  ▷ no β ∈ [α] s.t. |𝒰^cmn_{βk}| = Ω(3cβ|𝒰|/(αd))
actual k-cover, it becomes crucial to have at most k sets. To this end, instead of computing the coverage of a randomly selected (β_g·k)-cover, we partition the collection into O(β_g) sub-collections, each of size at most k; by Observation 5.2.5, at least one of them achieves the reported coverage size up to polylogarithmic factors. Moreover, we need to maintain the coverage of O(β_g) = Θ(β) covers, which adds an extra β_g term to the space complexity (see Algorithm 22).
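The routing of stream edges to distinct-element estimators in this subroutine can be sketched as follows, with exact Python sets standing in for the Õ(1)-space L_0 estimators (the helper names and the toy stream are ours):

```python
# Sketch of the estimator-routing step: each surviving set's elements are fed
# to one of beta_g distinct-count estimators, chosen by h2(S). Python sets are
# exact stand-ins for the small-space L0 estimators DE_{g,j}.

def route_stream(stream, survives, h2, beta_g):
    est = [set() for _ in range(beta_g)]
    for e, s in stream:
        if survives(s):                # the role of "h(S) = 1"
            est[h2(s)].add(e)          # "feed e to DE_{g, h2(S)}"
    return [len(d) for d in est]       # VAL(DE_{g,j}) = distinct count

stream = [("a", 0), ("b", 0), ("a", 1), ("c", 2), ("a", 2)]
vals = route_stream(stream, survives=lambda s: s != 1, h2=lambda s: s % 2, beta_g=2)
# estimator 0 receives the elements of sets 0 and 2, i.e. {a, b, c}
```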
Lemma 5.5.1. The space complexity of ReportLargeCommon(k, α, β) is Õ(β + k).
Proof: Similarly to the space analysis of LargeCommon (Theorem 5.4.4), and by employing an existing (1 ± 1/2)-approximation algorithm for L_0-estimation that uses Õ(1) space (Theorem 5.2.12), the total amount of space used by the algorithm to maintain DE_{g,j} for all β log β possible choices of (β_g, j) is Õ(β). Moreover, the algorithm uses O(k) space to compute the indices of the best k-cover. In total, the space complexity is Õ(β + k). □
Note that the approximation analysis of ReportLargeCommon is essentially the same
as the analysis of LargeCommon in Theorem 5.4.4.
Reporting a k-cover in LargeSet. This subroutine invokes LargeSetComplete (Algorithm 24) on different sets of sampled elements; in order to report an actual k-cover along with a (1/Õ(α))-approximation of the optimal coverage size, we modify the LargeSetComplete subroutine slightly.
In LargeSetComplete, it is always guaranteed that a superset with sufficiently large coverage is detected and its coverage is computed and reported. To get the actual collection of sets that achieves this coverage, we need to apply h to all sets (whose indices belong to [m]) and report the indices of those that are mapped to the superset with large coverage. More precisely, anywhere in Algorithm 24 that the algorithm returns a (non-infeasible) value (the lines marked by (⋆⋆)), it also returns the set {S | h(S) = i*}, which is guaranteed to have size at most w = min(α, k).
Lemma 5.5.2. The space complexity of LargeSetComplete with the suggested modification at lines (⋆⋆) is Õ(m/α² + k).
Proof: Similarly to the space analysis of LargeSetComplete without the modification (Lemma 5.8.7), the modified algorithm consumes Õ(m/α²) space to compute a (1/Õ(α))-approximation of the optimal coverage size. Moreover, the modified algorithm uses an additional O(w) = O(k) space to find the k-cover corresponding to the returned estimate of the optimal coverage size. Hence, the total space complexity of LargeSetComplete with the suggested modification at lines (⋆⋆) is Õ(m/α² + k). □
Corollary 5.5.3. The space complexity of LargeSet invoking LargeSetComplete with the suggested modification at lines (⋆⋆) is Õ(m/α² + k).
Reporting a k-cover in SmallSet. This is the most straightforward case. The only required change is to modify the line marked by (⋆⋆) in SmallSet to return both an O(1)-approximation of the Max Õ(k/α)-Cover instance at line (⋆⋆) and its corresponding Õ(k/α)-cover. Clearly, the extra amount of space needed to implement this modification is Õ(k/α).
Lemma 5.5.4. The space complexity of the modified SmallSet with the suggested modification at line (⋆⋆) is Õ(m/α² + k/α).
We also need to slightly modify the Oracle described in Algorithm 18 so that, besides a (1/Õ(α))-estimate of the optimal coverage size, it reports a (1/Õ(α))-approximate k-cover. Note that it is crucial to run ReportLargeCommon with different parameters depending on whether sα ≥ 2k or sα < 2k. The main reason is that the space complexity of ReportLargeCommon as described in Algorithm 22 has an additive Θ(β) term
Algorithm 23 An (α, δ, η)-oracle of Max k-Cover that reports a (1/Õ(α))-approximate k-cover.
1: procedure ReportOracle(k, α)
2:   if sα ≥ 2k then
3:     ▷ if |𝒰^cmn_k| ≥ c|𝒰|/α
4:     sol_cmn ← ReportLargeCommon(k, α, 1)
5:     ▷ if sα ≥ 2k, then |OPT_large| ≥ |OPT|/2
6:     sol_HH ← LargeSet(k, α, k)  ▷ the described modified LargeSetComplete
7:   else
8:     ▷ for instances in which ∃β ≤ α s.t. |𝒰^cmn_{βk}| ≥ cβ|𝒰|/α
9:     sol_cmn ← ReportLargeCommon(k, α, α)
10:    ▷ for instances with sα < 2k and |OPT_large| ≥ |OPT|/2
11:    sol_HH ← LargeSet(k, α, α)  ▷ the described modified LargeSetComplete
12:  ▷ for instances with |OPT_large| < |OPT|/2
13:  sol_small ← SmallSet(k, α)  ▷ with the described modification in line (⋆⋆)
14:  ▷ each sol computed above is of the form (estimate of the optimal coverage size, k-cover)
15:  return max(sol_cmn, sol_HH, sol_small)  ▷ max is over the estimate size of the sols
which can be as large as α in our implementation of LargeCommon in Algorithm 19.
With all the minor modifications to the subroutines LargeCommon, LargeSet and SmallSet in place, we design a modified variant of Oracle, called ReportOracle, as in Algorithm 23.
Theorem 5.5.5. ReportOracle(k, α) performs a single pass over the edge arrival stream of the set system (𝒰, ℱ) and implements an (Õ(α), 1/(log n log_m n), η)-oracle of Max k-Cover(𝒰, ℱ) using Õ(m/α² + k) space. Moreover, the modified oracle also reports a k-cover with the promised coverage size.
Proof: The approximation analysis of the algorithm is basically the same as the analysis of Oracle and we do not repeat it here. We only need to show that the structural property on the number of elements in different frequency levels provided by LargeCommon with input parameters (k, α, 1) is sufficient for the (modified variant of) LargeSet in the case sα ≥ k.
In LargeSet with input α = Ω(k), the only bound on the number of common elements that is used in the analysis is |𝒰^cmn_w| = |𝒰^cmn_k| ≤ c|𝒰|/α (in Theorem 5.8.6). In other words, in this case, unlike the case α = O(k), we do not need the guarantee on the size of 𝒰^cmn_{βk} for all values of β ≤ α. Hence, for the case α = Ω(k), it is sufficient to invoke LargeCommon with β = 1. Note that the values of the rest of the parameters are the same as in Oracle.
To bound the space complexity, we consider two cases: (1) sα ≥ 2k and (2) sα < 2k.
1. sα ≥ 2k. By Lemma 5.5.1 and Corollary 5.5.3, ReportLargeCommon(k, α, 1) and LargeSet invoking the modified variant of LargeSetComplete use Õ(k) and Õ(m/α² + k) space, respectively. Hence, the total amount of space used in this case is Õ(m/α² + k).
2. sα < 2k. In this case, by Lemma 5.5.1, ReportLargeCommon(k, α, α) uses Õ(k + α) = Õ(k) space. Moreover, by Corollary 5.5.3 and Lemma 5.5.4, the space complexity of the other two subroutines with the described modifications is Õ(m/α² + k). Hence, the total amount of space that the algorithm uses in this case is Õ(m/α² + k).
Finally, by the modifications we described above, ReportOracle correctly reports an actual k-cover whose coverage is within an Õ(α) factor of the optimal coverage as well. □
The theorem above together with Theorem 5.3.6 completes the proof of Theorem 5.3.2.
5.6. Lower Bound for Estimating Max k-Cover in Edge Arrival Streams
By the result of [29], it is known that estimating the size of an optimal coverage of Max k-Cover within a factor of two requires Ω(m) space. Their argument relies on a reduction from the Set Disjointness problem and implies the mentioned bound for 1-cover instances. In the following, we generalize their approach and provide lower bounds for the whole range of approximation guarantees α smaller than √n. We remark that both our lower bound result and the lower bound result of [29] are similar to the lower bounds for L_∞ and L_p estimation first proved in [13, 27].
The lower bound we present in this section is based on the well-known p-player Set Disjointness problem with the unique set intersection promise, which has been studied extensively in communication complexity (e.g. [28, 41, 85]). The setting of the problem is as follows: there are p players and each has a set T_i ⊆ [n]. The promise is that the input is in one of the following forms:
• No Case: There is a unique element j ∈ [n] such that for all i ≤ p, j ∈ T_i.
• Yes Case: All sets are pairwise disjoint.
Moreover, a round of communication consists of each player i sending a message to player i + 1, in order from i = 1 to p − 1. The goal is that at the end of a single round, player p is able to correctly output whether the input belongs to the family of Yes instances or No instances. Chakrabarti et al. [41] showed the following tight lower bound on the one-way communication complexity of the p-player Set Disjointness problem with the unique set intersection promise.
Theorem 5.6.1 (From [41]). Any randomized one-way protocol that solves p-player Set Disjointness(n) with success probability at least 2/3 requires Ω(n/p) bits of communication.
We remark that the same Ω(n/p) communication lower bound was later proved for the general model (i.e. with multiple rounds) by Gronemeier [85]. However, for our application, the lower bound in the one-way communication model suffices.
Corollary 5.6.2. Any single-pass streaming algorithm that solves p-player Set Disjointness(n) with success probability at least 2/3 consumes Ω(n/p²) space.
Next, we sketch a reduction from α-player Set Disjointness(m) to Max k-Cover with m sets such that an α-approximation protocol for Max k-Cover solves the corresponding instance of α-player Set Disjointness(m).
To this end, consider an arbitrary instance ℐ of the α-player Set Disjointness(m) problem in which each player i has a set T_i ⊆ [m]. Define 𝒰_ℐ = {e_1, ..., e_α} to be the set of elements in the Max 1-Cover instance, and for each player i, if j ∈ T_i then add (e_i, S_j) to the stream. In other words, in the constructed Max 1-Cover(𝒰_ℐ, ℱ_ℐ := {S_1, ..., S_m}) instance, we have an element e_i corresponding to each player i and a set S_j corresponding to each item j ∈ [m]. Moreover, each set S_j in the Max 1-Cover instance (𝒰_ℐ, ℱ_ℐ) denotes the set of players in the Set Disjointness(m) instance ℐ whose input sets contain j; S_j := {i ∈ [α] | j ∈ T_i}.
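The reduction can be sketched in a few lines; the helper names and the toy instances below are ours, not the thesis's:

```python
from collections import defaultdict

# Sketch of the reduction: from players' sets T_1..T_p over items [n], build
# the edge stream of a Max 1-Cover instance with one element per player and
# one set per item, then measure the best single-set coverage.

def disjointness_to_stream(T):
    """T: list of player sets; yields (element e_i, set S_j) edges."""
    return [(i, j) for i, Ti in enumerate(T) for j in Ti]

def best_single_set_coverage(stream):
    cover = defaultdict(set)
    for e, j in stream:
        cover[j].add(e)            # S_j = players whose input contains item j
    return max(len(v) for v in cover.values())

# No instance: item 0 lies in every player's set, so one set covers all players.
no_inst = [{0, 1}, {0, 2}, {0, 3}]
# Yes instance: pairwise disjoint sets, so every S_j has cardinality at most 1.
yes_inst = [{1}, {2}, {3}]
```

The gap between coverage α (No case) and coverage 1 (Yes case) is what any α-approximation of Max 1-Cover must detect.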
Claim 5.6.3. If ℐ is a No instance, then the optimal coverage of the Max 1-Cover instance (𝒰_ℐ, ℱ_ℐ) is α.
Proof: In this case, by the unique intersection promise, there exists an item j that belongs to all T_i (for i ∈ [α]). Hence, by the construction of the Max 1-Cover instance, S_j covers the whole of 𝒰_ℐ. Thus, the optimal 1-cover has coverage α. □
Claim 5.6.4. If ℐ is a Yes instance, then the optimal coverage of the Max 1-Cover instance (𝒰_ℐ, ℱ_ℐ) is 1.
Proof: Since the T_i's are disjoint, for each j ∈ [m], the set S_j has cardinality at most one. □
Corollary 5.6.2 together with Claims 5.6.3 and 5.6.4 implies the stated lower bound on α-approximating the optimal coverage size of Max k-Cover in edge-arrival streams (Theorem 5.3.3): any single pass (possibly randomized) algorithm on edge-arrival streams that (1/α)-approximates the optimal coverage size of Max k-Cover requires Ω(m/α²) space.
5.7. Chernoff Bound for Applications with Limited Independence
In this section, we mention some of the results of [159] on applications of the Chernoff bound with limited independence that are used in our analysis.
Definition 5.7.1 (Family of k-wise Independent Hash Functions). A family of functions ℋ = {h : [n] → [m]} is k-wise independent if, for any set of k distinct values x_1, ..., x_k, the random variables h(x_1), ..., h(x_k) are independent and uniformly distributed in [m] when h is picked uniformly at random from ℋ.
Next, we exploit results showing that for small values of k, we can store a family of k-wise independent hash functions in small space, and that at the same time such a family suffices for our applications of the Chernoff bound.
Lemma 5.7.2 (Corollary 3.34 in [166]). For all values of n, m, and k, there is a family of k-wise independent hash functions ℋ = {h : [n] → [m]} such that selecting a random function from ℋ requires only k · log(nm) random bits.
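A standard way to realize such a family (our sketch, consistent with but not quoted from [166]) is to evaluate a random polynomial of degree k − 1 over a prime field; storing the k coefficients matches the k · log(nm)-bit bound up to constants:

```python
import random

# Sketch: a random degree-(k-1) polynomial over Z_P is a k-wise independent
# hash family [n] -> [P], assuming P is a prime larger than n. The prime and
# parameters below are illustrative choices.

P = 2_147_483_647                  # Mersenne prime 2^31 - 1

def sample_kwise_hash(k, rng=random):
    """Draw h from the k-wise independent polynomial family over Z_P."""
    coeffs = [rng.randrange(P) for _ in range(k)]
    def h(x):
        acc = 0
        for c in coeffs:           # Horner's rule: ((c0*x + c1)*x + ...) mod P
            acc = (acc * x + c) % P
        return acc
    return h
```

To hash into a smaller range [l], one typically reduces the output mod l, which is near-uniform when l divides P or when P ≫ l.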
Lemma 5.7.3 (Theorem 5 in [159]). Let X_1, ..., X_n be binary k-wise independent random variables and let X := X_1 + ... + X_n. Then,
  Pr(|X − E[X]| ≥ δE[X]) ≤ e^{−E[X]δ²/3}  if δ < 1 and k = Ω(δ²E[X]);
  Pr(|X − E[X]| ≥ δE[X]) ≤ e^{−E[X]δ/3}   if δ ≥ 1 and k = Ω(δE[X]).
Lemma 5.7.4 (Theorem 6 in [159]). Let Y_1, ..., Y_n and Z_1, ..., Z_n be Bernoulli trials such that for each i, E[Y_i] = E[Z_i] = p_i. Assume that the Y_i's are independent, but the Z_i's are only k-wise independent. Further, let P(j) and P_k(j) respectively denote Pr(Σ_{i=1}^n Y_i = j) and Pr(Σ_{i=1}^n Z_i = j), and let μ = Σ_{i=1}^n p_i be the expected number of successes in the trials.
If k ≥ μe + ln(1/P(0)) + j + D, then |P_k(j) − P(j)| ≤ e^{−D}·P(j).
5.7.1. An Application: Set Sampling with Θ(log(mn))-wise Independence
Consider a set system (𝒰, ℱ) and let h : ℱ → [(cm log n)/γ] be a function selected uniformly at random from a family of Θ(log(mn))-wise independent hash functions, where c is a sufficiently large constant. Then, we define our randomly selected sets ℱ^rnd as the collection of sets in ℱ that are mapped to one by h; ℱ^rnd := {S ∈ ℱ | h(S) = 1}.
Lemma 5.7.5. Assuming γ ≥ 6c log² n, with probability at least 1 − n^{−1}, |ℱ^rnd| ≤ γ.
Proof: Let X_i be a random variable which is one if S_i ∈ ℱ^rnd and zero otherwise. We define X := X_1 + ... + X_m. Note that the X_i's are Θ(log(mn))-wise independent and E[X] = γ/(c log n). Then, by an application of the Chernoff bound with limited independence (Lemma 5.7.3),
  Pr(X > γ) ≤ Pr(X > (1 + √(6c/γ) · log n) · E[X]) < n^{−1},
where (1 + √(6c/γ) · log n) · E[X] ≤ 2γ/(c log n) ≤ γ. Hence, with high probability, ℱ^rnd has size at most γ. □
Next, we show that ℱ^rnd covers the set of elements 𝒰^cmn_γ (see Definition 5.2.1).
Lemma 5.7.6. With probability at least 1 − n^{−1}, ℱ^rnd covers 𝒰^cmn_γ.
Proof: Let e ∈ 𝒰^cmn_γ and let S_1, ..., S_f be the sets in ℱ that cover e: for each i ≤ f, e ∈ S_i. We define X_i to be a random variable which is one if S_i ∈ ℱ^rnd and zero otherwise. We also define X := X_1 + ... + X_f to denote the number of sets in ℱ^rnd that cover e. Note that the X_i's are Θ(log(mn))-wise independent and E[X] = (γ/(cm log n)) · f ≥ log n log(mn). Then, applying the Chernoff bound for random variables with limited independence (Lemma 5.7.3),
  Pr(X < 1) < Pr(X < (1 − √((6 log n)/E[X])) · E[X]) < n^{−2},
where √((6 log n)/E[X]) ≤ 1/2. Hence, by a union bound over all elements of 𝒰^cmn_γ, with high probability, ℱ^rnd covers 𝒰^cmn_γ. □
Lemma 5.7.7. Θ(log(mn)) random bits suffice to implement the set sampling method.
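The set-sampling step can also be simulated directly. In this sketch, Python's Random is a stand-in for the Θ(log(mn))-wise independent family, and m, the range r, and the trial count are toy values of ours:

```python
import random

# Simulation of set sampling: draw a hash with range r and keep
# F_rnd = {S : h(S) = 1}. Expected |F_rnd| = m/r; Lemma 5.7.5 is about the
# upper tail of this quantity. Python's PRNG replaces the k-wise family here.

def sample_F_rnd(m, r, rng):
    h = {s: rng.randrange(r) for s in range(m)}   # one bucket per set
    return [s for s in range(m) if h[s] == 0]     # sets mapped to one bucket

rng = random.Random(7)
m, r = 10_000, 100                                # expected |F_rnd| = m/r = 100
sizes = [len(sample_F_rnd(m, r, rng)) for _ in range(50)]
avg = sum(sizes) / len(sizes)
```

Over repeated draws, the average of |ℱ^rnd| concentrates near m/r, matching the expectation in the proof of Lemma 5.7.5.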
5.8. Generalization of Section 5.4.2
In this section we generalize the approach of Section 5.4.2, which relates the results on heavy
hitters and contributing classes to approximating the optimal coverage size. In Section 5.4.2,
to simplify the presentation, we assumed that 𝒰^cmn_w is empty. This assumption is important
because it lets us treat the total size of k-covers as roughly the same as
their coverage size (up to polylogarithmic factors).
Here, we handle the case in which 𝒰^cmn_w is non-empty and complete the description
of our (α, δ, η)-oracle of Max k-Cover for the case |C(OPT_large)| ≥ |C(OPT)|/2. The high-level
idea is to sample enough elements that the algorithm based on heavy hitters
and contributing classes still works, while at the same time, with at least constant probability,
no w-common element is among the sampled elements.
Step 1. Sampling Elements. We sample a subset ℐ ⊆ 𝒰 in which each element e is included in ℐ with probability p = (t·sα·k)/|𝒰|, where t = O(1) (see Table 5.4.1 for the exact values). We
implement the process of sampling ℐ via a hash function from a family of Θ(log(mn))-wise
independent functions ℋ = {h : 𝒰 → [n/(t·sα·k)]}, such that ℐ = {e ∈ 𝒰 | h(e) = 1}.
Claim 5.8.1. With high probability, (๐|๐ฐ|)/2 โค |โ| โค (3๐|๐ฐ|)/2.
For each collection of sets 𝒜, we define 𝒜′ to be the intersection of 𝒜 with ℐ; 𝒜′ := {S ∩ ℐ | S ∈ 𝒜}.
Claim 5.8.2. If |C(𝒜)| ≥ |𝒰|/(54fkα), then with probability at least 1 − n⁻², |C(𝒜′)| ≥ p·|C(𝒜)|/2. Moreover, if |C(𝒜)| < |𝒰|/(54fkα), then with probability at least 1 − n⁻²,
|C(𝒜′)| < ts/(36f).
Proof: Since ts ≥ 27·24f log n (as in Table 5.4.1), the claim follows from two applications of the
Chernoff bound for random variables with limited independence (Lemma 5.7.3). ∎
Similarly to Lemma 5.4.14, we have the following guarantee for LargeSetSimple using
the sampled set of elements โ.
Lemma 5.8.3. If |C(OPT)| ≥ |𝒰|/k and ℐ ∩ 𝒰^cmn = ∅, then with probability at least 1 − 1/(2 log d · log mn), the output of LargeSetSimple with parameters (ℐ, w, λ1 = s′α, λ2 = (cm log n)/w, thr1 = |ℐ|/(18ksα), thr2 = |ℐ|/(6kα)) is a superset whose coverage is at least |𝒰|/(54fkα).
Proof: By Claim 5.8.1, (p|𝒰|)/2 ≤ |ℐ| ≤ (3p|𝒰|)/2. Moreover, by Claim 5.8.2 and since
|C(OPT)| ≥ |𝒰|/k, with probability at least 1 − n⁻², |C(OPT_large) ∩ ℐ| ≥ p·|C(OPT)|/4 ≥ |ℐ|/(6k). We define k′ := 6k, so that w.h.p. the coverage of OPT_large over the sampled elements ℐ is at least |ℐ|/k′.

Now, consider the collection OPT_large := {S_1, · · · , S_o}. Since for each j ≤ o, the contribution of S_j to the coverage of OPT is at least a 1/(sα) fraction (i.e., |S_j ∖ ⋃_{i<j} S_i| ≥ |C(OPT)|/(sα) ≥ |𝒰|/(ksα)), by Claim 5.8.2, with probability at least 1 − n⁻¹, for all j ≤ o,

|(S_j ∖ ⋃_{i<j} S_i) ∩ ℐ| ≥ p·|C(OPT)|/(2sα).

This implies that {S_i ∩ ℐ | S_i ∈ OPT_large} is a collection of sets whose contribution to
C(OPT) ∩ ℐ w.h.p. is at least (p·|C(OPT)|/(2sα)) / (3p·|C(OPT)|/2) = 1/(3sα). We define s′ := 3s, which
denotes the contribution of the sets in OPT_large relative to the coverage of OPT over the
sampled elements ℐ.

By an application of Lemma 5.4.14 with parameters (𝒱 := ℐ, thr1 := |ℐ|/(k′s′α), thr2 := |ℐ|/(k′α)),
with probability at least 1 − 1/(3 log d · log mn), the algorithm returns a superset 𝒜′_j whose
coverage on the sampled set ℐ is at least

|ℐ|/(3f·k′α) ≥ p·|𝒰|/(36f·αk) = ts/(36f).

Then, by Claim 5.8.2, with probability at least 1 − n⁻², 𝒜_j has coverage at least |𝒰|/(54fkα). ∎
Step 2. Handling Common Elements. Next, we turn our attention to the case ℐ ∩ 𝒰^cmn_w ≠ ∅. Although common elements may be covered Ω(1) times within a single superset,
it is important to note that the contribution of the common elements to all supersets is roughly
the same.
Claim 5.8.4. Let ℐ^cmn_w := ℐ ∩ 𝒰^cmn_w be the set of w-common elements that are sampled in ℐ.
Then, with high probability, for each superset ๐, the total number of times that elements of
โcmnw appear in ๐ (counting duplicates) belongs to [๐, 2๐ ] where ๐ is a fixed number larger
than log ๐.
Proof: Let ๐ โ ๐ฐ cmnw be a w-common element and let ๐1, ยท ยท ยท , ๐๐ be the collection of sets
that contain ๐. For a superset ๐๐, define ๐๐,๐ to be a binary random variable that denotes
whether ๐๐ โ ๐๐. Moreover, let ๐๐,๐ := ๐1,๐ + ยท ยท ยท + ๐๐,๐. Then, E[๐๐,๐] = w๐/(๐๐ log๐).
By an application of the Chernoff bound with limited independence (Lemma 5.7.3) and since
wr ≥ cm log n · log d · log(mn) (see Definition 5.2.1),

Pr(|Y_{e,ℓ} − E[Y_{e,ℓ}]| ≥ √((6cm log n · log(mn))/(wr)) · E[Y_{e,ℓ}]) ≤ (mn)⁻²,    (5.8.1)

where the deviation factor √((6cm log n · log(mn))/(wr)) ≤ 1/3.
Note that for any w-common element e and any pair of supersets 𝒜_i, 𝒜_j, E[Y_{e,i}] = E[Y_{e,j}].
In particular, we define μ_e := E[Y_{e,ℓ}], whose value is independent of the choice of superset. Next, we
define μ_cmn := ∑_{e∈𝒰^cmn_w} μ_e to denote the expected contribution of w-common elements to any
superset. Hence, for each superset 𝒜_ℓ, with probability at least 1 − 1/(mn²), the total number
of times that w-common elements are covered by 𝒜_ℓ belongs to the range [2μ_cmn/3, 4μ_cmn/3].
Hence, with probability at least 1 − (mn)⁻¹, the contribution of sampled w-common elements
to each superset belongs to [2μ_cmn/3, 4μ_cmn/3], where μ_cmn ≥ log d · log(mn) · |ℐ^cmn_w| ≥ log n. ∎
Next, we show that even if a w-common element is picked in ℐ, the algorithm still does not
return a superset with small coverage (though it may miss all large supersets). To this
end, we modify LargeSetSimple and design a new subroutine LargeSetComplete, given
in Algorithm 24. The high-level idea is to guarantee that a superset whose main contribution
comes only from the duplicate counts of w-common elements (≈ T) will not be returned.
To achieve this, unlike LargeSetSimple, we do not allow F2-Contributing to check for
contributing classes of every size (up to |𝒬|). Instead, we set parameters λ1 and λ2, which
denote how large the contributing classes we look for are. To handle the
case in which the size of a contributing class is large (i.e., larger than λ2), we sample supersets
proportionally to 1/λ2 and compute their coverage by existing L0-estimation algorithms.
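The branch for large contributing classes can be sketched as follows; the bottom-k (KMV) distinct-count estimator here is a hypothetical stand-in for the (1/2)-approximate L0-estimation subroutine the text assumes, and all names and parameters are illustrative.

```python
import heapq
import random

class KMV:
    """Bottom-k distinct-count (L0) estimator: track the k smallest hash values."""
    def __init__(self, k=64, seed=0):
        self.k = k
        self.salt = random.Random(seed).random()
        self.heap = []      # max-heap via negation: the k smallest hashes seen
        self.kept = set()   # hash values currently retained (also deduplicates)
    def add(self, item):
        h = (hash((self.salt, item)) % (1 << 30)) / float(1 << 30)
        if h in self.kept:
            return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -h)
            self.kept.add(h)
        elif h < -self.heap[0]:
            evicted = -heapq.heappushpop(self.heap, -h)
            self.kept.discard(evicted)
            self.kept.add(h)
    def estimate(self):
        if len(self.heap) < self.k:   # fewer than k distinct items: exact count
            return len(self.heap)
        return int((self.k - 1) / (-self.heap[0]))

def estimate_sampled_coverage(stream, sampled, h):
    """For each sampled superset id j, estimate its coverage from (element, set) pairs."""
    sketches = {j: KMV(seed=j) for j in sampled}
    for e, s in stream:
        j = h(s)
        if j in sketches:             # unsampled supersets are simply ignored
            sketches[j].add(e)
    return {j: sk.estimate() for j, sk in sketches.items()}
```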
Algorithm 24 An (α, δ, η)-oracle of Max k-Cover that handles the case in which the majority of the coverage in an optimal solution comes from sets whose coverage contributions are at least a 1/(sα) fraction of the optimal coverage size. By modifying the lines marked by (⋆⋆) slightly, the algorithm returns the k-cover that achieves the returned coverage estimate (see Section 5.5 for more explanation).
1: procedure LargeSetComplete(𝒱, w, λ1, λ2, thr1, thr2)
2:   ◁ Input: w is an upper bound on the desired number of sets in a superset
3:   ◁ Parameters: φ1 = Ω(α²/k) and φ2 = Ω(1)
4:   let Cntr_small be an instance of F2-Contributing(φ1, λ1)   ◁ for Case 1
5:   let Cntr_large be an instance of F2-Contributing(φ2, λ2)   ◁ for Case 2
6:   pick h : ℱ → [(cm log n)/w] from a family of Θ(log(mn))-wise independent hash functions
7:   for all (e, S) in the data stream do
8:     if e ∈ 𝒱 then
9:       feed h(S) to both Cntr_small and Cntr_large
10:  ◁ output(Cntr) contains coordinates along with (1 ± 1/2)-estimates of their frequencies
11:  if there exists j* ∈ output(Cntr_small) s.t. v_{j*} ≥ (1/2)·thr1 then
12:    return 2v_{j*}/(3f)   (⋆⋆)   ◁ add "return {S | h(S) = j*}" to get the k-cover
13:  if there exists j* ∈ output(Cntr_large) s.t. v_{j*} ≥ (1/2)·thr2 then
14:    return 2v_{j*}/(3f)   (⋆⋆)   ◁ add "return {S | h(S) = j*}" to get the k-cover
15:  ◁ Case 2: the size of the contributing class is large, i.e., Ω(|𝒬|)
16:  let ℳ ⊆ 𝒬 be a collection of size 12|𝒬| log n/λ2 picked uniformly at random
17:  for all j ∈ ℳ do   ◁ DE to estimate the coverage of the supersets in ℳ
18:    let DE_j be a (1/2)-approximation algorithm of L0-estimation initialized to zero
19:  pick h : ℱ → [(cm log n)/w] from a family of Θ(log(mn))-wise independent hash functions
20:  for all (e, S) in the data stream do
21:    if e ∈ 𝒱 and h(S) ∈ ℳ then feed e to DE_{h(S)}
22:  if there exists j* ∈ ℳ such that VAL(DE_{j*}) ≥ (1/2)·thr2 then
23:    return 2·VAL(DE_{j*})/3   (⋆⋆)   ◁ add "return {S | h(S) = j*}" to get the k-cover
24:  return infeasible
Lemma 5.8.5. Even if ℐ ∩ 𝒰^cmn_w ≠ ∅, with probability at least 1 − n⁻¹, none of the solutions
returned by LargeSetComplete with parameters (ℐ, w, λ1 = s′α, λ2 = Θ((cm log n)/w), thr1 = |ℐ|/(18ksα), thr2 = |ℐ|/(6kα)) is a superset whose coverage is less than |𝒰|/(54fkα).
Proof: Here, we need to revisit Case 1 and Case 2 of Section 5.4.2 and redo the calculations
with respect to the sampled set of elements โ.
Case 1. Suppose that F2-Contributing(Ω(α²/k), s′α) returns a solution whose coverage
is less than |𝒰|/(54fkα). Let u be a vector of size (cm log n)/w whose jth entry denotes the total size of the
intersection of the sets in 𝒜_j with ℐ^rare_w := ℐ ∖ 𝒰^cmn_w; u[j] := ∑_{S∈𝒜_j} |S ∩ ℐ^rare_w|. By Claim 5.8.2,
if |C(𝒜_j)| < |𝒰|/(54fkα), then with probability at least 1 − n⁻², |C(𝒜_j) ∩ ℐ^rare_w| ≤ |C(𝒜_j) ∩ ℐ| <
ts/(36f). Hence, together with Claim 5.4.10, with probability at least 1 − 2n⁻², u[j] < ts/36.
Similarly, let a be a vector of size (cm log n)/w whose jth entry denotes the total size of
the intersection of the sets in 𝒜_j with the sampled elements ℐ; a[j] := ∑_{S∈𝒜_j} |S ∩ ℐ|. Note that
since T ≥ 1, for each superset 𝒜_j with coverage less than |𝒰|/(54fkα), a[j] ≤ 2T + u[j] ≤ (ts/36)·T. On the other hand, by Claim 5.8.4, with probability at least 1 − n⁻¹, the value
of a[j] for every j in the sampled substream is at least T.
Moreover, since the size of the contributing classes in this case is at most s′α = 3sα (more precisely,
we can always find at most 3sα sets that are Ω(α²/k)-contributing), by Claim 5.2.8, with
probability at least 1 − (log n)/m, all sampled substreams considered by F2-Contributing(φ1, s′α)
have size at least cm log n/(wsα). Hence, by Claim 5.8.4, with probability at least 1 − 2(log n)/m,

F2(a_smp) ≥ (cm log n/(wsα)) · T²,

where a_smp is the vector corresponding to a sampled substream considered
in F2-Contributing(φ1, s′α). Since s⁴t² ≤ 81/(2c log(sα)) (see Table 5.4.1) and by
the value of φ1 (in Eq. 5.4.2), the following holds:

(ts/36)² · T² < φ1 · (cm log n/(wsα)) · T²,

which implies that with probability at least 1 − 3(log n)/m, an entry corresponding to a
superset with coverage less than |𝒰|/(54fkα) cannot be a φ1-HeavyHitter in any of the
sampled substreams considered in F2-Contributing(φ1, s′α).
Case 2. The high-level idea in this case is similar to the previous one. In Case 1, we heavily
used the fact that there exists a class containing at most s′α coordinates that is Ω(α²/k)-contributing. This observation is crucial because it lets us argue that all sampled
substreams considered in F2-Contributing(φ1, s′α) have size at least Ω(m/(ws′α)), which
rules out the possibility that a coordinate corresponding to a small superset is an Ω(α²/k)-HeavyHitter for sufficiently small values of s (recall that s′ = 3s).

However, in this case a contributing class may have size Ω(m/w), which results in a
sampled substream with only O(1) coordinates in the run of F2-Contributing! To address
this issue, we handle the case in which a contributing class has more than λ2 coordinates
separately (λ2 is a parameter to be determined shortly):
• The Ω(1)-contributing class has size less than λ2 := (cm log n)/(w·γ). Since T ≥ 1 and by
Claims 5.8.2 and 5.4.10, for each superset 𝒜_j with coverage less than |𝒰|/(54fkα), with
probability at least 1 − 2n⁻², a[j] ≤ (ts/36)·T. On the other hand, by Claim 5.8.4, with
probability at least 1 − n⁻¹, the value of a[j] for every j in the sampled substream is at least
T. Moreover, by Claim 5.2.8, with probability at least 1 − (log n)/m, all sampled substreams
considered in F2-Contributing(φ2 = 1/(2 log α), λ2) invoked by LargeSetComplete
(which handles Case 2) have at least (3cm log n)/(w·λ2) = 3γ coordinates. Hence, F2(a_smp) ≥
3γ · T². By setting

γ < 3φ2/(ts/36)² = 1944/(t²s² · log α),    (5.8.2)

we get a[j]² ≤ (ts/36)²·T² < (1/(2 log α)) · F2(a_smp), which implies that an entry corresponding to a
superset with coverage less than |𝒰|/(27fkα) cannot be a φ2-HeavyHitter in any of
the sampled substreams considered in F2-Contributing(φ2, λ2).
• The Ω(1)-contributing class has size at least λ2 := (cm log n)/(w·γ). Here, we also need to
consider an extra case compared to Lemmas 5.4.14 and 5.8.3, because we do not allow λ2
to try all values up to (cm log n)/w. To address the case in which the number of coordinates
in a contributing class is larger than λ2, we sample ℓ = (12 log n)·|𝒬|/λ2 supersets ℳ
uniformly at random from 𝒬; with high probability, ℳ contains a superset from the
contributing class. Then, we compute the coverage of the sampled supersets via an existing
algorithm for L0-estimation. By Claim 5.4.13, the coverage on the sampled set ℐ of the supersets corresponding
to the φ2-contributing class whose size is larger than λ2 is at
least |ℐ|/(k′α) ≥ ts/(12f). Hence, the algorithm finds a superset with coverage at least
ts/(36f) on ℐ, which by Claim 5.8.2 implies that the returned superset has coverage
at least |𝒰|/(54f·ksα). ∎
Theorem 5.8.6. If |C(OPT)| ≥ |𝒰|/k, then with probability at least 1 − 1/(log d · log mn),
LargeSet(k, α) returns at least |𝒰|/(54fkα). Moreover, if LargeSet(k, α) returns a value
other than infeasible, then with probability at least 1 − 4n⁻¹, |C(OPT)| ≥ |𝒰|/(54fkα).
Proof: By Lemma 5.8.5, with probability at least 1 − 3n⁻¹, LargeSet will never return
a superset with coverage less than |𝒰|/(27fkα); either it returns a large enough estimate or it
returns infeasible. Here, we show that if |C(OPT)| > |𝒰|/k, then the algorithm returns
an estimate of at least |𝒰|/(54fkα) with probability at least 1 − 1/(2 log d · log mn) − d⁻¹. To this
end, we show that with high probability, in one of the O(log d) parallel runs of LargeSet,
the sampled set of elements ℐ does not contain any common element. Then, by Lemma 5.8.3,
with probability at least 1 − 1/(2 log d · log mn), the algorithm returns |𝒰|/(54fkα) in the
iteration whose sampled set of elements contains no common element.
Now, we show that with probability at least 1 − d⁻¹, the sampled set of at least one of the O(log d)
parallel runs of LargeSet contains no common element. Let q = |𝒰^cmn_w| and define
Y_1, · · · , Y_q to be independent Bernoulli trials with success probability p. Recall
that we assume |𝒰^cmn_k| ≤ c|𝒰|/α, and since w ≤ k, |𝒰^cmn_w| ≤ |𝒰^cmn_k| ≤ c|𝒰|/α. Hence,

μ = E[∑_{i=1}^{q} Y_i] ≤ (c|𝒰|/α) · p = tsck,

Pr(∑_{i=1}^{q} Y_i = 0) = (1 − p)^q ≥ e^{−2pq} ≥ e^{−2tsck}.
Next, let us assume that 𝒰^cmn_w = {e_1, · · · , e_q}. Further, define Z_1, · · · , Z_q to be random
variables such that Z_i = 1 if e_i ∈ ℐ. Hence, Z_1, · · · , Z_q are Θ(log(mn))-wise independent
Bernoulli trials with success probability E[Z_i] = p. Next, we apply Lemma 5.7.4 with the
following parameters:

a = 0,   ln(1/p(0)) ≤ 2tsck,   μ = tsck,   D = Θ(log(mn)) = 12 log(mn),

and obtain

Pr(ℐ ∩ 𝒰^cmn_w = ∅) = Pr(∑_{i=1}^{q} Z_i = 0) ≥ Pr(∑_{i=1}^{q} Y_i = 0) · (1 − e^{−D}) ≥ e^{−2tsck}/2.
Algorithm 25 A single-pass streaming algorithm.
1: procedure LargeSet(k, α, w)
2:   ◁ run LargeSetComplete on sampled elements in parallel for O(log d) instances
3:   for O(log d) times in parallel do
4:     let ℐ ⊆ 𝒰 s.t. each e ∈ ℐ (Θ(log(mn))-wise independently) w.p. p = (t·sα·k)/|𝒰|
5:     λ1 ← s′α, λ2 ← (cm log n)/(w·γ), thr1 ← |ℐ|/(18ksα), thr2 ← |ℐ|/(6kα)   ◁ γ := 1944/(t²s²·log α)
6:     sol ← LargeSetComplete(ℐ, w, λ1, λ2, thr1, thr2)
7:     if sol ≠ infeasible then
8:       return |𝒰|/(54fkα)
9:   return infeasible
Hence, since tsck = O(1) (see Table 5.4.1),

Pr(in all runs, ℐ ∩ 𝒰^cmn ≠ ∅) ≤ (1 − e^{−2tsck})^{O(log d)} ≤ d⁻¹.

The second property follows from Lemma 5.8.5: if the algorithm returns a value other
than infeasible, then with probability at least 1 − 4n⁻¹, |C(OPT)| ≥ |𝒰|/(54fkα). ∎
Lemma 5.8.7. The amount of space used by LargeSet is ๐(๐/๐ผ2).
Proof: Note that LargeSet runs O(log d) instances of LargeSetComplete in parallel.
Hence, the total amount of space used by LargeSet is O(log d) times the space
complexity of LargeSetComplete.
Similarly to the space analysis of LargeSetSimple, the amount of space needed to run
Cntr_small and Cntr_large as defined in LargeSetComplete is respectively O(1/φ1) = O(k/α²)
and O(1/φ2) = O(1). Moreover, for the last case, in which the contributing class has size
larger than λ2, by Theorem 5.2.12, in total O(m/(w·λ2)) = O(1) space is required to compute the
coverage of the sampled supersets in ℳ. Note that, in all cases, by Lemma 5.7.2, the algorithm
can store h in O(1) space.
Hence, the total amount of space required to implement LargeSet is ๐(๐/๐ผ2). ๏ฟฝ
Proof of Theorem 5.4.8: The guarantee on the quality of the returned estimate follows
from Theorem 5.8.6 with w = min{๐ผ, ๐} and s = ๐(w/๐ผ) (as in Table 5.4.1). Moreover,
Lemma 5.8.7 shows that the space complexity of LargeSet is ๐(๐/๐ผ2).
Moreover, since with high probability the estimate returned by the algorithm is a lower
bound on the coverage size of a ๐-cover of โฑ , the output of LargeSet with high probability,
Chapter 6
The Frequency Estimation Problem
6.1. Introduction
The frequency estimation problem is formalized as follows: given a sequence ๐ of elements
from some universe ๐ , for any element ๐ โ ๐ , estimate ๐๐, the number of times ๐ occurs in ๐.
If one could store all arrivals from the stream ๐, one could sort the elements and compute
their frequencies. However, in big data applications, the stream is too large (and may be
infinite) and cannot be stored. This challenge has motivated the development of streaming
algorithms, which read the elements of ๐ in a single pass and compute a good estimate of the
frequencies using a limited amount of space. Specifically, the goal of the problem is as follows.
Given a sequence ๐ of elements from ๐ , the desired algorithm reads ๐ in a single pass while
writing into memory ๐ถ (whose size can be much smaller than the length of ๐). Then, given
any element ๐ โ ๐ , the algorithm reports an estimation of ๐๐ based only on the content of ๐ถ.
Over the last two decades, many such streaming algorithms have been developed, including
Count-Sketch [43], Count-Min [57] and multi-stage filters [70]. The performance guarantees
of these algorithms are well understood, with upper and lower bounds matching up to O(·) factors [110].
However, such streaming algorithms typically assume generic data and do not leverage
useful patterns or properties of their input. For example, in text data, the word frequency
is known to be inversely correlated with the length of the word. Analogously, in network
data, certain applications tend to generate more traffic than others. If such properties
can be harnessed, one could design frequency estimation algorithms that are much more
efficient than the existing ones. Yet, it is important to do so in a general framework that
can harness various useful properties, instead of using handcrafted methods specific to a
particular pattern or structure (e.g., word length, application type).
In this chapter, we introduce learning-based frequency estimation streaming algorithms.
Our algorithms are equipped with a learning model that enables them to exploit data prop-
erties without being specific to a particular pattern or knowing the useful property a priori.
We further provide theoretical analysis of the guarantees associated with such learning-based
algorithms.
We focus on the important class of โhashing-basedโ algorithms, which includes some of
the most used algorithms such as Count-Min, Count-Median and Count-Sketch. Informally,
these algorithms hash data items into ๐ต buckets, count the number of items hashed into
each bucket, and use the bucket value as an estimate of item frequency. The process can
be repeated using multiple hash functions to improve accuracy. Hashing-based algorithms
have several useful properties. In particular, they can handle item deletions, which are
implemented by decrementing the respective counters. Furthermore, some of them (notably
Count-Min) never underestimate the true frequencies, i.e., f̃_e ≥ f_e always holds. However,
hashing algorithms lead to estimation errors due to collisions: when two elements are mapped
to the same bucket, they affect each other's estimates. Although collisions are unavoidable
given the space constraints, the overall error significantly depends on the pattern of collisions.
For example, collisions between high-frequency elements (โheavy hittersโ) result in a large
estimation error, and ideally should be minimized. The existing algorithms, however, use
random hash functions, which means that collisions are controlled only probabilistically.
Our idea is to use a small subset of ๐, call it ๐ โฒ, to learn the heavy hitters. We can then
assign heavy hitters their own buckets to avoid the more costly collisions. It is important to
emphasize that we are learning the properties that identify heavy hitters as opposed to the
identities of the heavy hitters themselves. For example, in the word frequency case, shorter
words tend to be more popular. The subset ๐ โฒ itself may miss many of the popular words,
but whichever words are popular in S′ are likely to be short. Our objective is not to learn the
identity of high frequency words using ๐ โฒ. Rather, we hope that a learning model trained
on ๐ โฒ learns that short words are more frequent, so that it can identify popular words even
if they did not appear in ๐ โฒ.
Our main contributions are as follows:
โ We introduce learning-based frequency estimation streaming algorithms, which learn
the properties of heavy hitters in their input and exploit this information to reduce
errors.
โ We provide performance guarantees showing that our learned algorithms can deliver
a logarithmic factor improvement in the error bound over their non-learning coun-
terparts. Furthermore, we show that our learning-based instantiation of Count-Min,
a widely used algorithm, is asymptotically optimal among all instantiations of that
algorithm. See Table 6.4.1 in Section 6.4 for the details.
โ We evaluate our learning-based algorithms using two real-world datasets: network
traffic and search query popularity. In comparison to their non-learning counterparts,
our algorithms yield performance gains that range from 18% to 71%.
6.1.1. Related Work
Frequency estimation in data streams. Frequency estimation, and the closely related
problem of finding frequent elements in a data stream, are some of the most fundamental
and well-studied problems in streaming algorithms, see [55] for an overview. Hashing-based
algorithms such as Count-Sketch [43], Count-Min [57] and multi-stage filters [70] are widely
used solutions for these problems. These algorithms also have close connections to sparse
recovery and compressed sensing [40, 64], where the hashing output can be considered as a
compressed representation of the input data [79].
Several โnon-hashingโ algorithms for frequency estimation have been also proposed [136,
62, 116, 133]. These algorithms do not possess many of the properties of hashing-based
methods listed in the introduction (such as the ability to handle deletions), but they often
have better accuracy/space tradeoffs. For a fair comparison, our evaluation focuses only on
hashing algorithms. However, our approach for learning heavy hitters should be useful for
non-hashing algorithms as well.
Some papers have proposed or analyzed frequency estimation algorithms customized to
data that follows Zipf Law [43, 58, 133, 135, 154]; the last algorithm is somewhat similar to
the โlookup tableโ implementation of the heavy hitter oracle that we use as a baseline in our
experiments. Those algorithms need to know the data distribution a priori, and apply only
to one distribution. In contrast, our learning-based approach applies to any data property
or distribution, and does not need to know that property or distribution a priori.
Learning-based algorithms. Recently, researchers have begun exploring the idea of in-
tegrating machine learning models into algorithm design. In particular, researchers have
proposed improving compressed sensing algorithms, either by using neural networks to im-
prove sparse recovery algorithms [140, 33], or by designing linear measurements that are
optimized for a particular class of vectors [25, 141], or both. The latter methods can be
viewed as solving a problem similar to ours, as our goal is to design โmeasurementsโ of
the frequency vector (f_1, f_2, . . . , f_{|U|}) tailored to a particular class of vectors. However, the
aforementioned methods need to explicitly represent a matrix of size ๐ตร|๐ |, where ๐ต is the
number of buckets. Hence, they are unsuitable for streaming algorithms which, by definition,
have space limitations much smaller than the input size.
Another class of problems that benefited from machine learning is distance estimation,
i.e., compression of high-dimensional vectors into compact representations from which one
can estimate distances between the original vectors. Early solutions to this problem, such as
Locality-Sensitive Hashing, have been designed for worst case vectors. Over the last decade,
numerous methods for learning such representations have been developed [156, 169, 109, 168].
Although the objective of those papers is similar to ours, their techniques are not usable in
our applications, as they involve a different set of tools and solve different problems.
More broadly, there have been several recent papers that leverage machine learning to
design more efficient algorithms. The authors of [59] show how to use reinforcement learn-
ing and graph embedding to design algorithms for graph optimization (e.g., TSP). Other
learning-augmented combinatorial optimization problems are studied in [101, 24, 128]. More
recently, [120, 137] have used machine learning to improve indexing data structures, includ-
ing Bloom filters that (probabilistically) answer queries of the form โis a given element in the
data set?โ As in those papers, our algorithms use neural networks to learn certain properties
of the input. However, we differ from those papers both in our design and theoretical analy-
sis. Our algorithms are designed to reduce collisions between heavy items, as such collisions
greatly increase errors. In contrast, in existence indices, all collisions count equally. This
also leads to our theoretical analysis being very different from that in [137].
6.2. Preliminaries
Estimation Error. To measure and compare the overall accuracy of different frequency
estimation algorithms, we will use the expected estimation error which is defined as follows:
let ℱ = {f_1, · · · , f_n} and ℱ̃_A = {f̃_1, · · · , f̃_n} respectively denote the actual frequencies and
the estimated frequencies obtained from algorithm A for the items in the input stream. We
remark that when A is clear from the context we denote ℱ̃_A as ℱ̃. Then we define

Err(ℱ, ℱ̃_A) := E_{i∼𝒟}[|f̃_i − f_i|],    (6.2.1)
where ๐ denotes the query distribution of the items. Here, we assume that the query dis-
tribution ๐ is the same as the frequency distribution of items in the stream, i.e., for any
i* ∈ [n], Pr_{i∼𝒟}[i = i*] ∝ f_{i*} (more precisely, for any i* ∈ [n], Pr_{i∼𝒟}[i = i*] = f_{i*}/N, where
N = ∑_{j∈[n]} f_j denotes the total sum of all frequencies in the stream).
We note that the theoretical guarantees of frequency estimation algorithms are typically
phrased in the "(ε, δ)-form", e.g., Pr[|f̃_i − f_i| > εN] < δ for every i (see e.g., [57]). However,
this formulation involves two objectives (๐ and ๐ฟ). We believe that the (single objective)
expected error in Equation (6.2.1) is more natural from the machine learning perspective.
Remark 6.2.1. As all upper/lower bounds in this paper are proved by bounding the expected
error of estimating the frequency of a single item, E[|f̃_i − f_i|], using linearity of
expectation we in fact obtain analogous bounds for any query distribution (q_i)_{i∈[n]}. In
particular, this means that the bounds of Table 6.4.1 for CM and CS hold for any query
distribution. For L-CM and L-CS, the factor of log(n/B)/log n gets replaced by ∑_{i=B′+1}^{n} q_i,
where B′ = Θ(B) is the number of buckets reserved for heavy hitters.
6.2.1. Algorithms for Frequency Estimation
In this section, we recap two variants of hashing-based algorithms for frequency estimation.

Count-Min. We have d distinct hash functions h_i : U → [B] and an array C of size d × B.
The algorithm maintains C such that at the end of the stream we have C[ℓ, b] = ∑_{j : h_ℓ(j) = b} f_j.
For each i ∈ U, the frequency estimate f̃_i is equal to min_{ℓ≤d} C[ℓ, h_ℓ(i)], and always satisfies
f̃_i ≥ f_i.
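As a concrete illustration, a minimal Count-Min sketch along these lines; simple modular hashing stands in for the pairwise-independent families typically used, and the constants are illustrative.

```python
import random

# A minimal Count-Min sketch: d hash rows, B buckets per row, estimate = min
# over rows. Modular hashing is an illustrative stand-in for pairwise
# independence.
class CountMin:
    PRIME = (1 << 31) - 1
    def __init__(self, d, B, seed=0):
        rng = random.Random(seed)
        self.B = B
        self.rows = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                     for _ in range(d)]           # (a, b) per hash function
        self.C = [[0] * B for _ in range(d)]
    def _h(self, r, x):
        a, b = self.rows[r]
        return ((a * x + b) % self.PRIME) % self.B
    def add(self, x, count=1):
        for r in range(len(self.C)):
            self.C[r][self._h(r, x)] += count
    def estimate(self, x):
        # min over rows: never below the true frequency (f~ >= f)
        return min(self.C[r][self._h(r, x)] for r in range(len(self.C)))
```

Collisions only inflate the per-row counters, which is why the minimum over rows can overestimate but never underestimate.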
Count-Sketch. Similarly to Count-Min, we have d distinct hash functions h_i : U → [B]
and an array C of size d × B. Additionally, in Count-Sketch, we have d sign functions
g_i : U → {−1, 1}, and the algorithm maintains C such that C[ℓ, b] = ∑_{j : h_ℓ(j) = b} f_j · g_ℓ(j). For
each i ∈ U, the frequency estimate f̃_i is equal to the median of {g_ℓ(i) · C[ℓ, h_ℓ(i)]}_{ℓ≤d}. Note
that unlike Count-Min, here we may have f̃_i < f_i.
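And a matching minimal Count-Sketch, differing only in the per-row sign functions and the median estimate (again with illustrative modular hashing standing in for pairwise independence):

```python
import random
import statistics

# A minimal Count-Sketch: signed per-row counters, median-of-rows estimate.
class CountSketch:
    PRIME = (1 << 31) - 1
    def __init__(self, d, B, seed=0):
        rng = random.Random(seed)
        self.B = B
        self.hp = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                   for _ in range(d)]   # bucket hash parameters
        self.gp = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                   for _ in range(d)]   # sign hash parameters
        self.C = [[0] * B for _ in range(d)]
    def _h(self, r, x):
        a, b = self.hp[r]
        return ((a * x + b) % self.PRIME) % self.B
    def _g(self, r, x):
        a, b = self.gp[r]
        return 1 if ((a * x + b) % self.PRIME) % 2 == 0 else -1
    def add(self, x, count=1):
        for r in range(len(self.C)):
            self.C[r][self._h(r, x)] += self._g(r, x) * count
    def estimate(self, x):
        # median over rows; unlike Count-Min this can underestimate
        return statistics.median(self._g(r, x) * self.C[r][self._h(r, x)]
                                 for r in range(len(self.C)))
```

The random signs make each row's contribution from colliding items zero in expectation, which is what allows the median estimate to err in both directions.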
6.2.2. Zipfian Distribution
In our analysis we assume that the frequency distribution of items follows Zipf's law. That
is, if we sort the items according to their frequencies, without loss of generality assuming
that f_1 ≥ f_2 ≥ · · · ≥ f_n, then for any i ∈ [n], f_i ∝ 1/i. Given that the frequencies of items
follow Zipf's law and assuming that the query distribution is the same as the distribution of
the frequency of items in the input stream (i.e., Pr_{i∼𝒟}[i*] = f_{i*}/N = 1/(i* · H_n), where H_n
denotes the n-th harmonic number), we can write the expected error defined in (6.2.1) as
follows:
Err(ℱ, ℱ̃_A) = E_{i∼𝒟}[|f̃_i − f_i|] = (1/N) · ∑_{i∈[n]} |f̃_i − f_i| · f_i = (1/H_n) · ∑_{i∈[n]} |f̃_i − f_i| · (1/i)    (6.2.2)
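For instance, the right-hand side of (6.2.2) can be evaluated directly under the normalization f_i = 1/i; this helper and its names are illustrative, not from the text:

```python
# Evaluate the right-hand side of Eq. (6.2.2) under the Zipfian query
# distribution q_i proportional to 1/i (illustrative helper).
def zipf_expected_error(true_f, est_f):
    """true_f[i-1], est_f[i-1]: actual and estimated frequency of item i."""
    n = len(true_f)
    harmonic = sum(1.0 / i for i in range(1, n + 1))   # H_n
    total = sum(abs(est_f[i - 1] - true_f[i - 1]) / i for i in range(1, n + 1))
    return total / harmonic
```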
Throughout this paper, we present our results with respect to the objective function on
the right-hand side of (6.2.2), i.e., (1/H_n) · ∑_{i=1}^{n} |f̃_i − f_i| · f_i. We further assume that in fact
f_i = 1/i. At first sight this assumption may seem strange, since it says that item i appears
a non-integral number of times in the stream. This is, however, just a matter of scaling, and
the assumption is convenient as it removes the dependence on the length of the stream in
Algorithm 26 Learning-Based Frequency Estimation
1: procedure LearnedSketch(B, B_r, HH, SketchAlg)
2:   for each stream element i do
3:     if HH(i) = 1 then   ◁ if i is a heavy item
4:       if i is already stored in a unique bucket then
5:         count_i ← count_i + 1
6:       else
7:         initialize count_i = 1
8:     else
9:       feed i to SketchAlg
[Block diagram: each element i is routed by the learned oracle either to the unique buckets (if predicted heavy) or to SketchAlg, e.g., Count-Min (if not heavy).]
Figure 6.3.1: Pseudo-code and block-diagram representation of our algorithms
our bounds.
6.3. Learning-Based Frequency Estimation Algorithms
We aim to develop frequency estimation algorithms that exploit data properties for better
performance. To do so, we learn an oracle that identifies heavy hitters, and use the oracle to
assign each heavy hitter its unique bucket to avoid collisions. Other items are simply hashed
using any classic frequency estimation algorithm (e.g., Count-Min, or Count-Sketch), as
shown in the block-diagram in Figure 6.3.1. This design has two useful properties: first,
it allows us to augment a classic frequency estimation algorithm with learning capabilities,
producing a learning-based counterpart that inherits the original guarantees of the classic
algorithm. For example, if the classic algorithm is Count-Min, the resulting learning-based
algorithm never underestimates the frequencies. Second, it provably reduces the estimation
errors, and for the case of Count-Min it is (asymptotically) optimal.
Algorithm 26 provides pseudocode for our design. The design assumes an oracle HH(i)
that attempts to determine whether an item ๐ is a โheavy hitterโ or not. All items classified as
heavy hitters are assigned to one of the ๐ต๐ unique buckets reserved for heavy items. All other
items are fed to the remaining ๐ต โ ๐ต๐ buckets using a conventional frequency estimation
algorithm SketchAlg (e.g., Count-Min or Count-Sketch).
The estimation procedure is analogous. To compute f̃_i, the algorithm first checks whether
i is stored in a unique bucket, and if so, reports its count. Otherwise, it queries the SketchAlg
procedure. Note that if the element is stored in a unique bucket, its reported count is exact,
i.e., f̃_i = f_i.
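To make the design concrete, the following Python sketch mirrors Algorithm 26 with SketchAlg instantiated as Count-Min. The class name, bucket layout, and hashing details are illustrative choices for exposition, not the implementation used in our experiments.

```python
import random

class LearnedCountMin:
    """Illustrative sketch of Algorithm 26: items the oracle flags as heavy
    get exact unique buckets; all other items go to a Count-Min sketch."""

    def __init__(self, b_total, b_unique, k, hh_oracle, seed=0):
        self.hh = hh_oracle                     # learned oracle: item -> bool
        self.unique = {}                        # heavy item -> exact count
        self.b_unique = b_unique
        self.width = max(1, (b_total - b_unique) // k)
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(32) for _ in range(k)]
        self.rows = [[0] * self.width for _ in range(k)]

    def _bucket(self, r, item):
        return hash((self.salts[r], item)) % self.width

    def update(self, item):
        if self.hh(item) and (item in self.unique or len(self.unique) < self.b_unique):
            self.unique[item] = self.unique.get(item, 0) + 1
        else:                                   # non-heavy: feed to SketchAlg
            for r in range(len(self.rows)):
                self.rows[r][self._bucket(r, item)] += 1

    def estimate(self, item):
        if item in self.unique:                 # exact count, no collisions
            return self.unique[item]
        return min(row[self._bucket(r, item)] for r, row in enumerate(self.rows))
```

As in the analysis, a heavy item stored in a unique bucket is reported exactly, while the Count-Min fallback inherits the no-underestimation guarantee of the classic algorithm.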
Algorithm               | Expected Error, k = 1         | Expected Error, k > 1
Count-Min               | Θ(log n / B)                  | Θ(k log(kn/B) / B)
Learned Count-Min       | Θ(log²(n/B) / (B log n))      | Ω(log²(n/B) / (B log n))
Count-Sketch            | Θ(log B / B)                  | Ω(√k / (B log k)) and O(√k / B)
Learned Count-Sketch    | Θ(log(n/B) / (B log n))       | Ω(log(n/B) / (B log n))

Table 6.4.1: The performance of different algorithms on streams with frequencies obeying Zipf Law. k is the number of hash functions, B is the number of buckets, and n is the number of distinct elements. The space complexity of all algorithms is Θ(B).
The oracle is constructed using machine learning and trained with a small subset of D,
call it D′. Note that the oracle learns the properties that identify heavy hitters, as opposed
to the identities of the heavy hitters themselves. For example, in the case of word frequency,
the oracle would learn that shorter words are more frequent, so that it can identify popular
words even if they did not appear in the training set D′.
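As a toy illustration of this point (ours, not taken from the experiments): an oracle can learn a property such as "short words tend to be frequent" from the training subset D′ and then flag short words it has never seen. The helper below and its cutoff are hypothetical.

```python
def train_length_oracle(train_counts, heavy_cutoff):
    """Learn which word *lengths* are heavy on average in the training
    subset D', and classify any word by its length (hypothetical helper)."""
    by_len = {}
    for word, count in train_counts.items():
        by_len.setdefault(len(word), []).append(count)
    heavy_lens = {L for L, cs in by_len.items() if sum(cs) / len(cs) >= heavy_cutoff}
    return lambda word: len(word) in heavy_lens

# training subset D' with counts; short words are frequent
train = {"the": 900, "of": 800, "a": 950, "streaming": 3, "algorithm": 2}
oracle = train_length_oracle(train, heavy_cutoff=100)
print(oracle("to"), oracle("sublinear"))  # True False: generalizes to unseen words
```

The oracle flags "to" as heavy even though it never appeared in D′, because it shares the learned property (short length) with the frequent training words.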
6.4. Analysis
Our algorithms combine simplicity with strong error bounds. Below, we summarize our
theoretical results; the corresponding theorems, lemmas, and proofs follow in the remainder
of this section. In particular, Table 6.4.1 lists the results proven in this chapter, where each
row refers to a specific streaming algorithm and its corresponding error bound.

First, we show that if the heavy hitter oracle is accurate, then the errors of the learned
variants of Count-Min and Count-Sketch are up to logarithmic factors smaller than those of
their standard counterparts.

Second, we show that, asymptotically, our Learned Count-Min algorithm cannot be improved
any further by designing a better hashing scheme. Specifically, for the case of Learned
Count-Min with a perfect oracle, our design achieves the same asymptotic error as the "Ideal
Count-Min", which optimizes its hash function for the given input.
We remark that for both Count-Min and Count-Sketch we aim at analyzing the expected
value of the variable Σ_{i∈[n]} f_i · |f̃_i − f_i|, where f_i = 1/i and f̃_i is the estimate of f_i output by
the relevant sketching algorithm. Throughout this chapter we use the following notation: for
an event E we denote by [E] the random variable in {0, 1} which is 1 if and only if E occurs.
6.4.1. Tight Bounds for Count-Min with Zipfians
We begin by presenting our tight analysis of Count-Min with Zipfians. The main theorem
is the following.
Theorem 6.4.1. Let n, B, k ∈ ℕ with k ≥ 2 and B ≤ n/k. Let further h_1, …, h_k : [n] → [B]
be independent and truly random hash functions. For i ∈ [n], define the random variable
f̃_i = min_{ℓ∈[k]} ( Σ_{j∈[n]} [h_ℓ(j) = h_ℓ(i)] f_j ). For any i ∈ [n] it holds that

    E[|f̃_i − f_i|] = Θ( log(n/B) / B ).
Replacing B by B/k in Theorem 6.4.1 and using linearity of expectation, we obtain the desired
bound for Count-Min in the upper right entry of Table 6.4.1. The natural assumption
that B ≤ n/k simply says that the total number of buckets is upper bounded by the number
of items.
To prove Theorem 6.4.1 we start with the following special case of the theorem.
Lemma 6.4.2. Suppose that we are in the setting of Theorem 6.4.1 and further that¹ n = B.
Then

    E[|f̃_i − f_i|] = O(1/n).
Proof: It suffices to show the result when k = 2, since adding more hash functions and
corresponding tables only decreases the value of |f̃_i − f_i|. Define X_ℓ = Σ_{j∈[n]∖{i}} [h_ℓ(j) = h_ℓ(i)] f_j for ℓ ∈ [2], and note that these variables are independent. For a given t ≥ 3/n we
wish to upper bound Pr[X_ℓ ≥ t]. Let s < t and note that if X_ℓ ≥ t then one of the following
two events must hold:

E_1: There exists a j ∈ [n]∖{i} with f_j > s and h_ℓ(j) = h_ℓ(i).

E_2: The set {j ∈ [n]∖{i} : h_ℓ(j) = h_ℓ(i)} contains at least t/s elements.

¹In particular, we dispense with the assumption that B ≤ n/k.
Union bounding, we find that

    Pr[X_ℓ ≥ t] ≤ Pr[E_1] + Pr[E_2] ≤ 1/(sn) + (n choose t/s) n^{−t/s} ≤ 1/(sn) + (es/t)^{t/s}.

Choosing s = t / log(tn), a simple calculation yields that Pr[X_ℓ ≥ t] = O(log(tn)/(tn)). As X_1 and X_2
are independent, Pr[X ≥ t] = O((log(tn)/(tn))²) for X = min(X_1, X_2) = |f̃_i − f_i|, so

    E[X] = ∫_0^∞ Pr[X ≥ t] dt ≤ 3/n + O( ∫_{3/n}^∞ (log(tn)/(tn))² dt ) = O(1/n). □
Before proving the full statement of Theorem 6.4.1, we recall Bennett's inequality.
Theorem 6.4.3 (Bennett's inequality [30]). Let X_1, …, X_n be independent, mean zero
random variables. Let S = Σ_{i=1}^n X_i, and let σ², M > 0 be such that Var[S] ≤ σ² and |X_i| ≤ M
for all i ∈ [n]. For any t ≥ 0,

    Pr[S ≥ t] ≤ exp( −(σ²/M²) h(tM/σ²) ),

where h : ℝ_{≥0} → ℝ_{≥0} is defined by h(x) = (x + 1) log(x + 1) − x. The same tail bound holds
on the probability Pr[S ≤ −t].
Remark 6.4.4. It is well known and easy to check that for x ≥ 0,

    (1/2) x log(x + 1) ≤ h(x) ≤ x log(x + 1).

We will use these asymptotic bounds repeatedly in this chapter.
Proof of Theorem 6.4.1. We start out by proving the upper bound. Let N_1 = [B]∖{i} and
N_2 = [n]∖([B]∪{i}). Let ℓ ∈ [k] be such that Σ_{j∈N_1} f_j · [h_ℓ(j) = h_ℓ(i)] is minimized. Note
that ℓ is itself a random variable. We also define

    X_1 = Σ_{j∈N_1} f_j · [h_ℓ(j) = h_ℓ(i)],    X_2 = Σ_{j∈N_2} f_j · [h_ℓ(j) = h_ℓ(i)].

Clearly |f̃_i − f_i| ≤ X_1 + X_2. Using Lemma 6.4.2, we obtain that E[X_1] = O(1/B). For X_2 we
observe that

    E[X_2 | ℓ] = Σ_{j∈N_2} f_j / B = O( log(n/B) / B ).

We conclude that

    E[|f̃_i − f_i|] ≤ E[X_1] + E[X_2] = E[X_1] + E[E[X_2 | ℓ]] = O( log(n/B) / B ),

as desired.
Next we show the lower bound. For j ∈ [n] and ℓ ∈ [k] we define Y_ℓ^{(j)} = f_j · ( [h_ℓ(j) = h_ℓ(i)] − 1/B ).
Note that the variables (Y_ℓ^{(j)})_{j∈[n]} are independent. We also define Y_ℓ = Σ_{j∈N_2} Y_ℓ^{(j)} for
ℓ ∈ [k]. Observe that |Y_ℓ^{(j)}| ≤ f_j ≤ 1/B for j ≥ B, E[Y_ℓ^{(j)}] = 0, and that

    Var[Y_ℓ] = Σ_{j∈N_2} f_j² (1/B − 1/B²) ≤ 2/B².

Applying Bennett's inequality with σ² = 2/B² and M = 1/B thus gives that

    Pr[Y_ℓ ≤ −t] ≤ exp( −2 h(tB/2) ).

Defining Z_ℓ = Σ_{j∈N_2} f_j · [h_ℓ(j) = h_ℓ(i)], it holds that E[Z_ℓ] = Θ( log(n/B) / B ) and Y_ℓ =
Z_ℓ − E[Z_ℓ], so putting t = E[Z_ℓ]/2 in the inequality above we obtain that

    Pr[Z_ℓ ≤ E[Z_ℓ]/2] = Pr[Y_ℓ ≤ −E[Z_ℓ]/2] ≤ exp( −2 h( Ω(log(n/B)) ) ).
Appealing to Remark 6.4.4 and using that B ≤ n/k, the above bound becomes

    Pr[Z_ℓ ≤ E[Z_ℓ]/2] ≤ exp( −Ω( log(n/B) · log(log(n/B) + 1) ) )
        ≤ exp( −Ω( log k · log(log k + 1) ) ) = k^{−Ω(log(log k + 1))}.    (6.4.1)

By the independence of the events (Z_ℓ > E[Z_ℓ]/2)_{ℓ∈[k]} we have that

    Pr[ |f̃_i − f_i| ≥ E[Z_ℓ]/2 ] ≥ (1 − k^{−Ω(log(log k + 1))})^k = Ω(1),

and so E[|f̃_i − f_i|] = Ω(E[Z_ℓ]) = Ω( log(n/B) / B ), as desired. □
6.4.2. Learned Count-Min with Zipfians
One hash function. By the tight analysis of Count-Min, we have the following theorem
on the expected error of Learned Count-Min with k = 1. By taking B_1 = B_r = Θ(B)
and B_2 = B − B_r = Θ(B) in the theorem below, the result on Learned Count-Min for k = 1
claimed in Table 6.4.1 follows immediately.
Theorem 6.4.5. Let n, B_1, B_2 ∈ ℕ and let h : [n]∖[B_1] → [B_2] be a truly random hash
function. For i ∈ [n]∖[B_1], define the random variable f̃_i = Σ_{j∈[n]∖[B_1]} [h(j) = h(i)] f_j.
For any i ∈ [n]∖[B_1] it holds that

    E[|f̃_i − f_i|] = Θ( log(n/B_1) / B_2 ).
โโ Multiple hash functions. Next we compute an asymptotic lower bound on the expected
error of the Count-Min no matter how the hash functions are picked. In particular, this
implies a lower bound on the expected error of the learned Count-Min.
Claim 6.4.6. For a set of items I, let F(I) denote the total frequency of all items in I;
F(I) := Σ_{j∈I} f_j. Under any hash function h : I → [B_I] with minimum expected error,
an item with frequency at least F(I)/B_I does not collide with any other item in I.
Proof: For each bucket b ∈ [B_I], let C(b) denote the total frequency of the items mapped to b under
h; C(b) := Σ_{j∈I : h(j)=b} f_j. Then we can rewrite Err(F(I), F̃_h(I)) as

    Err(F(I), F̃_h(I)) = Σ_{j∈I} f_j · (C(h(j)) − f_j) = Σ_{j∈I} f_j · C(h(j)) − Σ_{j∈I} f_j²
        = Σ_{b∈[B_I]} C(b)² − Σ_{j∈I} f_j².    (6.4.2)

Note that in (6.4.2) the second term is independent of h and is a constant. Hence, an optimal
hash function minimizes the first term, Σ_{b∈[B_I]} C(b)².
Suppose that an item j* with frequency at least F(I)/B_I collides with a (non-empty) set of
items I* ⊆ I∖{j*} under an optimal hash function h*. Since the total frequency of the
items mapped to the bucket b* containing j* is greater than F(I)/B_I (i.e., C_{h*}(b*) > F(I)/B_I), there
exists a bucket b̄ such that C_{h*}(b̄) < F(I)/B_I. Next, we define a new hash function h̄ with smaller
estimation error than h*, which contradicts the optimality of h*:

    h̄(j) = h*(j) if j ∈ I∖I*, and h̄(j) = b̄ otherwise.

Formally,

    Err(F(I), F̃_{h*}(I)) − Err(F(I), F̃_{h̄}(I)) = C_{h*}(b*)² + C_{h*}(b̄)² − C_{h̄}(b*)² − C_{h̄}(b̄)²
        = (f_{j*} + F(I*))² + C_{h*}(b̄)² − f_{j*}² − (C_{h*}(b̄) + F(I*))²
        = 2 f_{j*} · F(I*) − 2 C_{h*}(b̄) · F(I*)
        = 2 F(I*) · (f_{j*} − C_{h*}(b̄))
        > 0,    since f_{j*} ≥ F(I)/B_I > C_{h*}(b̄). □
Next, we show that under any optimal hash function h* : [n] → [B], and assuming a Zipfian
distribution, the Θ(B) most frequent items do not collide with any other items under h*.

Lemma 6.4.7. Suppose that B = n/γ where γ is a large enough constant, and assume
that the items follow a Zipfian distribution. Under any hash function h* : [n] → [B] with minimum
expected error, none of the B/(2 ln γ) most frequent items collides with any other item (i.e., each of them
is mapped to a singleton bucket).
Proof: Let j* be the index of the most frequent item that is not mapped to a singleton bucket under h*.
If j* > B/(2 ln γ) then the statement holds, so suppose that j* ≤ B/(2 ln γ). Let I denote
the set of items with frequency at most f_{j*} = 1/j* (i.e., I = {j | j ≥ j*}), and let B_I denote
the number of buckets to which the items with index at least j* are mapped; B_I = B − (j* − 1).
It is straightforward to show that F(I) < ln(n/j*) + 1. Next, using Claim 6.4.6, we show that
h* does not hash the items {j*, …, n} to these B_I buckets optimally. In particular, we show that the
frequency of item j* is more than F(I)/B_I. To prove this, first observe that the function
g(x) := x · (ln(n/x) + 1) is strictly increasing on [1, n]. Hence, for any j* ≤ B/(2 ln γ),

    j* · (ln(n/j*) + 1) ≤ (B / (2 ln γ)) · (ln(2γ ln γ) + 1)
        ≤ B · (1 − 1/(2 ln γ))    since ln(2 ln γ) + 2 < ln γ for sufficiently large γ
        < B_I.

Thus, f_{j*} = 1/j* > (ln(n/j*) + 1)/B_I > F(I)/B_I, which by Claim 6.4.6 implies that an optimal hash function must map j*
to a singleton bucket, a contradiction. □
Theorem 6.4.8. If n/B is sufficiently large, then the expected error of any hash function
that maps a set of n items following a Zipfian distribution to B buckets is Ω( ln²(n/B) / B ).
Proof: Let γ = n/B, where γ is a sufficiently large constant. By Lemma 6.4.7, under any hash function with minimum expected error,
the B/(2 ln γ) most frequent items do not collide with any other items (i.e., they are mapped
to singleton buckets).

Hence, the goal is to minimize (6.4.2) for the set of items I consisting of all items
other than the B/(2 ln γ) most frequent ones. Since the sum of squares of m values summing
to S is at least S²/m, the expected error of any optimal hash function is at least:
    Err(F(I), F̃_{h*}(I)) = Σ_{b∈[B]} C(b)² − Σ_{j∈I} f_j²    by (6.4.2)
        ≥ (Σ_{j∈I} f_j)² / ( B (1 − 1/(2 ln γ)) ) − Σ_{j∈I} f_j²
        ≥ (ln(2γ ln γ) − 1)² / B − 2 ln γ / B + 1/n
        = Ω( ln² γ / B )    for sufficiently large γ
        = Ω( ln²(n/B) / B ). □
Next, we show a more general statement: the estimation error of any Count-Min sketch
with B buckets in total is Ω( ln²(n/B) / B ), no matter how many rows it has.

Theorem 6.4.9. If n/B is sufficiently large, then the estimation error of any Count-Min sketch
that maps a set of n items following a Zipfian distribution to B buckets is Ω( ln²(n/B) / B ).
Proof: We prove the statement via a reduction that, given a Count-Min sketch CM(k) with
hash functions h_1, …, h_k ∈ {h : [n] → [B/k]}, constructs a single hash function h* : [n] → [B]
whose estimation error is at most the estimation error of CM(k).

For each item j, let C′[j] be the bucket whose value is returned by CM(k) as the
estimate of f_j; C′[j] := argmin_{ℓ∈[k]} C[ℓ, h_ℓ(j)]. Since the total number of buckets in CM(k)
is B, |{C′[j] | j ∈ [n]}| ≤ B; in other words, we only consider the subset of buckets that
CM(k) uses to report the estimates of {f_j | j ∈ [n]}, which trivially has size at most B. We
define h* as follows:

    h*(j) = (ℓ*, h_{ℓ*}(j)) for each j ∈ [n], where ℓ* = argmin_{ℓ∈[k]} C[ℓ, h_ℓ(j)].

Unlike CM(k), h* maps each item to exactly one bucket in {C[ℓ, b] | ℓ ∈ [k], b ∈ [B/k]};
hence, for each item j, the total frequency C_{h*}(h*(j)) mapped to its bucket under h* satisfies
C_{h*}(h*(j)) ≤ C[h*(j)] = f̃_j, where f̃_j is the estimate of f_j returned by
CM(k). Moreover, for each j, C_{h*}(h*(j)) ≥ f_j; hence Err(F, F̃_{h*}) ≤ Err(F, F̃_{CM(k)}). Finally,
by Theorem 6.4.8, the estimation error of h* is Ω( ln²(n/B) / B ), which implies that the estimation
error of CM(k) is Ω( ln²(n/B) / B ) as well. □
6.4.3. (Nearly) Tight Bounds for Count-Sketch with Zipfians

In this section we proceed to analyze Count-Sketch for Zipfians, using either a single hash
function or several. We start with two simple lemmas which, for certain frequencies (f_j)_{j∈[n]} of
the items in the stream, can be used to obtain good upper and lower bounds, respectively, on
E[|f̃_i − f_i|] in Count-Sketch with a single hash function. We will use these two lemmas
in our analysis of both standard and learned Count-Sketch for Zipfians.
Lemma 6.4.10. Let w = (w_1, …, w_n) ∈ ℝⁿ, let η_1, …, η_n be independent Bernoulli variables
taking value 1 with probability p, and let σ_1, …, σ_n ∈ {−1, 1} be independent Rademachers, i.e.,
Pr[σ_i = 1] = Pr[σ_i = −1] = 1/2. Let S = Σ_{i=1}^n w_i η_i σ_i. Then

    E[|S|] = O( √p ‖w‖_2 ).

Proof: Using that E[σ_i σ_j] = 0 for i ≠ j and Jensen's inequality,

    E[|S|]² ≤ E[S²] = E[ Σ_{i=1}^n w_i² η_i ] = p ‖w‖_2²,

from which the result follows. □
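A quick simulation (ours, with arbitrary illustrative weights) confirms the bound of Lemma 6.4.10: the empirical mean of |S| stays below √p · ‖w‖₂.

```python
import random

def mean_abs_S(w, p, trials, rng):
    """Empirical E[|S|] for S = sum_i w_i * eta_i * sigma_i with
    eta_i ~ Bernoulli(p) and sigma_i an independent Rademacher sign."""
    total = 0.0
    for _ in range(trials):
        s = 0.0
        for wi in w:
            if rng.random() < p:                  # eta_i = 1
                s += wi if rng.random() < 0.5 else -wi
        total += abs(s)
    return total / trials

rng = random.Random(0)
w = [1 / j for j in range(1, 201)]                # Zipfian-style weights
p = 0.05
jensen_bound = (p * sum(wi * wi for wi in w)) ** 0.5   # sqrt(p) * ||w||_2
assert mean_abs_S(w, p, 500, rng) <= jensen_bound
```

The slack between the empirical mean and the bound reflects the Jensen step E[|S|]² ≤ E[S²] in the proof.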
Lemma 6.4.11. Suppose that we are in the setting of Lemma 6.4.10. Let I ⊆ [n] and let
w_I ∈ ℝⁿ be defined by (w_I)_i = [i ∈ I] · w_i. Then

    E[|S|] ≥ (1/2) p (1 − p)^{|I|−1} ‖w_I‖_1.

Proof: Let J = [n]∖I, S_1 = Σ_{i∈I} w_i η_i σ_i, and S_2 = Σ_{i∈J} w_i η_i σ_i. Let E denote the event
that S_1 and S_2 have the same sign or S_2 = 0. Then Pr[E] ≥ 1/2 by symmetry. For i ∈ I
we denote by A_i the event that {j ∈ I : η_j ≠ 0} = {i}. Then Pr[A_i] = p (1 − p)^{|I|−1}, and
furthermore A_i and E are independent. If A_i ∩ E occurs, then |S| ≥ |w_i|, and as the events
(A_i ∩ E)_{i∈I} are disjoint it thus follows that

    E[|S|] ≥ Σ_{i∈I} Pr[A_i ∩ E] · |w_i| = (1/2) p (1 − p)^{|I|−1} ‖w_I‖_1. □
One hash function. We are now ready to commence our analysis of Count-Sketch for
Zipfians. As in the discussion succeeding Theorem 6.4.1, the following theorem yields the
desired result for a single hash function as presented in Table 6.4.1.

Theorem 6.4.12. Suppose that B ≤ n, and let h : [n] → [B] and s : [n] → {−1, 1} be
truly random hash functions. Define the random variable f̃_i = Σ_{j∈[n]} [h(j) = h(i)] s(j) f_j for
i ∈ [n]. Then

    E[|f̃_i − s(i) f_i|] = Θ( log B / B ).
Proof: Let i ∈ [n] be fixed. We start by defining N_1 = [B]∖{i} and N_2 = [n]∖([B]∪{i}),
and note that

    |f̃_i − s(i) f_i| ≤ | Σ_{j∈N_1} [h(j) = h(i)] s(j) f_j | + | Σ_{j∈N_2} [h(j) = h(i)] s(j) f_j | := S_1 + S_2.

Using the triangle inequality,

    E[S_1] ≤ (1/B) Σ_{j∈N_1} f_j = O( log B / B ).

Also, by Lemma 6.4.10, E[S_2] = O(1/B), and combining the two bounds we obtain the desired
upper bound.

For the lower bound we apply Lemma 6.4.11 with I = N_1, concluding that

    E[|f̃_i − s(i) f_i|] ≥ (1/(2B)) (1 − 1/B)^{|N_1|−1} Σ_{j∈N_1} f_j = Ω( log B / B ). □
Multiple hash functions. Let k ∈ ℕ be odd. For a tuple x = (x_1, …, x_k) ∈ ℝᵏ we
denote by median x the median of the entries of x. The following theorem immediately leads
to the result on Count-Sketch with k ≥ 3 hash functions claimed in Table 6.4.1.

Theorem 6.4.13. Let k ≥ 3 be odd, n ≥ kB, and let h_1, …, h_k : [n] → [B] and s_1, …, s_k :
[n] → {−1, 1} be truly random hash functions. For each i ∈ [n], define the random variable
f̃_i = median_{ℓ∈[k]} ( s_ℓ(i) Σ_{j∈[n]} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j ). Assume that² k ≤ B. Then

    E[|f̃_i − f_i|] = Ω( 1 / (B √k log k) ),  and  E[|f̃_i − f_i|] = O( 1 / (B √k) ).

The assumption n ≥ kB simply says that the total number of buckets is upper bounded by
the number of items. Again using linearity of expectation for the summation over i ∈ [n]
and replacing B by B/k, we obtain the claimed lower and upper bounds of √k/(B log k) and
√k/B, respectively. We note that even though the bounds above are only tight up to a factor of log k,
they still imply that it is asymptotically optimal to choose k = O(1). Settling the correct
asymptotic growth is thus of merely theoretical interest. We need the following claim to
prove the theorem.
Claim 6.4.14. Let I ⊆ ℝ be the closed interval centered at the origin of length 2t, i.e.,
I = [−t, t]. Suppose that 1/(√k B) ≤ t ≤ 1/(2B). For ℓ ∈ [k], Pr[Y^{(ℓ)} ∈ I] = Ω(tB),
where Y^{(ℓ)} denotes the quantity defined in the proof of Theorem 6.4.13 below.
Proof: We first show that with probability Ω(1), Y_2^{(ℓ)} lies in the interval [1/B, γ/B] for some
constant γ. To see this, note that by Lemma 6.4.10, E[|Y_2^{(ℓ)}|] = O(1/B), so it follows by
Markov's inequality that if γ = O(1) is large enough, the probability that |Y_2^{(ℓ)}| ≥ γ/B is at
most 1/200. For a constant-probability lower bound on |Y_2^{(ℓ)}| we write

    Y_2^{(ℓ)} = Σ_{j∈N_2∩{B+1,…,2B}} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j + Σ_{j∈N_2∩{2B+1,…,n}} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j := Z_1 + Z_2.

Condition on Z_2. If B ≥ 4, the probability that there exist exactly two j ∈ N_2 ∩ {B+1, …, 2B}
with h_ℓ(j) = h_ℓ(i) is at least

    ( |N_2 ∩ {B+1, …, 2B}| choose 2 ) (1/B²) (1 − 1/B)^{B−2} ≥ ( B−1 choose 2 ) (1/(eB²)) ≥ 1/(8e).

With probability 1/4 the two corresponding signs s_ℓ(j) are both the same as the sign of Z_2. By
independence of s_ℓ and h_ℓ, the probability that this occurs is at least 1/(32e), and if it does,
|Y_2^{(ℓ)}| ≥ 1/B. Combining these two bounds, it follows that |Y_2^{(ℓ)}| ∈ [1/B, γ/B] with probability
at least 1/(32e) − 1/200 ≥ 1/200. By symmetry, Pr[Y_2^{(ℓ)} ∈ [1/B, γ/B]] ≥ 1/400 = Ω(1). Denote this event by
E. Also let F be the event that |{j ∈ N_1 : h_ℓ(j) = h_ℓ(i)}| = 1. Then Pr[F] = Ω(1), and as E
and F are independent, Pr[E ∩ F] = Ω(1). Conditioned on E ∩ F, we now lower bound the
probability that Y^{(ℓ)} ∈ I. For this it suffices to fix Y_2^{(ℓ)} = c/B for some 1 ≤ c ≤ γ and lower
bound the probability that c/B + σ f ∈ [−t, t], where σ is a Rademacher and f ∈ {1/j : j ∈ N_1}
is chosen uniformly at random. Let a_1, a_2 ∈ ℝ_{>0} be such that

    c/B − 1/a_1 = −t  and  c/B − 1/a_2 = t.

Then a_2 − a_1 = B · 2tB / (c² − t²B²) = Ω(tB²). Using that B is larger than a big enough constant
and the mild assumption k ≤ B, we have that a_2 − a_1 = Ω(B/√k) = Ω(√B) ≥ 1, and
so ⌊a_2⌋ − ⌈a_1⌉ = Ω(a_2 − a_1) = Ω(tB²) as well. As we have |N_1| ≥ B − 1 options for f, it
follows that

    Pr[ c/B + σ f ∈ [−t, t] ] ≥ (1/2) (⌊a_2⌋ − ⌈a_1⌉) / (B − 1) = Ω(tB),

as desired. □

²This very mild assumption can probably be removed at the cost of a more technical proof. In our proof
it can even be replaced by k ≤ B^{2−ε} for any ε = Ω(1).
Proof of Theorem 6.4.13. If B (and hence k) is a constant, the result follows easily from
Lemma 6.4.10, so in what follows we assume that B is larger than a sufficiently large constant.

We first prove the upper bound. Define N_1 = [B]∖{i} and N_2 = [n]∖([B]∪{i}). For ℓ ∈ [k], let
Y_1^{(ℓ)} = Σ_{j∈N_1} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j and Y_2^{(ℓ)} = Σ_{j∈N_2} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j. Finally,
write Y^{(ℓ)} = Y_1^{(ℓ)} + Y_2^{(ℓ)}.

As the absolute error in Count-Sketch with one pair of hash functions (h, s) is always
upper bounded by the corresponding error in Count-Min with the single hash function h, we
can use the bound in the proof of Lemma 6.4.2 to conclude that

    Pr[|Y_1^{(ℓ)}| ≥ t] = O( log(tB) / (tB) )

when t ≥ 3/B. Also,

    Var[Y_2^{(ℓ)}] = (1/B − 1/B²) Σ_{j∈N_2} f_j² ≤ 2/B²,
so by Bennett's inequality (with M = 1/B and σ² = 2/B²) and Remark 6.4.4,

    Pr[|Y_2^{(ℓ)}| ≥ t] ≤ 2 exp( −2 h(tB/2) ) ≤ 2 exp( −(1/2) tB log(tB/2 + 1) ) = O( log(tB) / (tB) )

for t ≥ 3/B. It follows that for t ≥ 3/B,

    Pr[|Y^{(ℓ)}| ≥ 2t] ≤ Pr[|Y_1^{(ℓ)}| ≥ t] + Pr[|Y_2^{(ℓ)}| ≥ t] = O( log(tB) / (tB) ).

Let C be the implicit constant in the O-notation above. If |f̃_i − f_i| ≥ 2t, at least half of
the values (|Y^{(ℓ)}|)_{ℓ∈[k]} are at least 2t. For t ≥ 3/B it thus follows by a union bound that

    Pr[|f̃_i − f_i| ≥ 2t] ≤ 2 ( k choose ⌈k/2⌉ ) ( C log(tB) / (tB) )^{⌈k/2⌉} ≤ 2 ( 4C log(tB) / (tB) )^{⌈k/2⌉}.    (6.4.3)
If α = O(1) is chosen sufficiently large, it thus holds that

    ∫_{α/B}^∞ Pr[|f̃_i − f_i| ≥ t] dt = 2 ∫_{α/(2B)}^∞ Pr[|f̃_i − f_i| ≥ 2t] dt
        ≤ (4/B) ∫_{α/2}^∞ ( 4C log t / t )^{⌈k/2⌉} dt
        ≤ 1/(B 2^k) ≤ 1/(B √k).

Here the first inequality uses (6.4.3) and a change of variable. The second inequality uses
that (4C log t / t)^{⌈k/2⌉} ≤ (C′/t)^{2k/5} for some constant C′, followed by a calculation of the integral.

For our upper bound it therefore suffices to show that ∫_0^{α/B} Pr[|f̃_i − f_i| ≥ t] dt = O( 1/(B √k) ).
To this end, let 1/(√k B) ≤ t ≤ 1/B be fixed. If |f̃_i − f_i| ≥ t, at least half of the values (Y^{(ℓ)})_{ℓ∈[k]}
are at least t or at most −t. Let us focus on bounding the probability that at least half are
at least t, the other bound being symmetric and giving an extra factor of 2 in the probability
bound. By symmetry and Claim 6.4.14, Pr[Y^{(ℓ)} ≥ t] = 1/2 − Ω(tB). For ℓ ∈ [k] we define
Z_ℓ = [Y^{(ℓ)} ≥ t], and we put Z = Σ_{ℓ∈[k]} Z_ℓ. Then E[Z] = k (1/2 − Ω(tB)). If at least half of
the values (Y^{(ℓ)})_{ℓ∈[k]} are at least t, then Z ≥ k/2. By Hoeffding's inequality we can bound
the probability of this event by

    Pr[Z ≥ k/2] = Pr[Z − E[Z] = Ω(ktB)] = exp( −Ω(kt²B²) ).

It follows that Pr[|f̃_i − f_i| ≥ t] ≤ 2 exp( −Ω(kt²B²) ). Thus

    ∫_0^{α/B} Pr[|f̃_i − f_i| ≥ t] dt ≤ 1/(B √k) + ∫_{1/(B√k)}^{1/(2B)} 2 exp( −Ω(kt²B²) ) dt + ∫_{1/(2B)}^{α/B} 2 exp( −Ω(k) ) dt
        = O( 1/(B √k) ) + (1/(B √k)) ∫_1^∞ exp(−t²) dt = O( 1/(B √k) ).
This completes the proof of the upper bound, and we proceed with the lower bound. Fix
ℓ ∈ [k] and let N_1 = [B log k]∖{i} and N_2 = [n]∖([B log k]∪{i}). Write

    Y := Σ_{j∈N_1} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j + Σ_{j∈N_2} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j := Y_1 + Y_2.

We also define J := {j ∈ N_1 : h_ℓ(j) = h_ℓ(i)}. Let I ⊆ ℝ be the closed interval centered at
the origin of length 1/(B √k log k). We now upper bound the probability that Y ∈ I conditioned on
the value of Y_2. To ease the notation, the conditioning on Y_2 is left out in what follows.
Note first that

    Pr[Y ∈ I] = Σ_{m=0}^{|N_1|} Pr[Y ∈ I | |J| = m] · Pr[|J| = m].

For a given m ≥ 1 we now proceed to bound Pr[Y ∈ I | |J| = m]. This probability is the same
as the probability that Y_2 + Σ_{j∈S} σ_j f_j ∈ I, where S ⊆ N_1 is a uniformly random m-subset
and the σ_j's are independent Rademachers. Suppose that we sample the elements of S as
well as the corresponding signs (σ_j)_{j∈S} sequentially, and condition on the values and
signs of the first m − 1 sampled elements. At this point, at most (B log k)/√k + 1 of the possible samples for
the last element of S bring Y into I. Indeed, the minimum distance between consecutive
points in {f_j : j ∈ N_1} is at least 1/(B log k)², so at most

    (1/(B √k log k)) · (B log k)² + 1 = (B log k)/√k + 1

samples bring Y into I. For 1 ≤ m ≤ (B log k)/2 we can thus upper bound

    Pr[Y ∈ I | |J| = m] ≤ ( (B log k)/√k + 1 ) / ( |N_1| − m + 1 ) ≤ 2/√k + 2/(B log k) ≤ 3/√k.

By a standard Chernoff bound,

    Pr[|J| ≥ (B log k)/2] = exp( −Ω(B log k) ) = k^{−Ω(B)}.

If we assume that B is larger than a constant, then Pr[|J| ≥ (B log k)/2] ≤ k^{−1}. Finally,
Pr[|J| = 0] = (1 − 1/B)^{|N_1|} ≤ k^{−1}. Combining these three bounds,

    Pr[Y ∈ I] ≤ Pr[|J| = 0] + Σ_{m=1}^{(B log k)/2} Pr[Y ∈ I | |J| = m] · Pr[|J| = m] + Σ_{m=(B log k)/2}^{|N_1|} Pr[|J| = m] = O(1/√k),

which holds even after removing the conditioning on Y_2. We now show that with probability
Ω(1), at least half of the values (Y^{(ℓ)})_{ℓ∈[k]} are at least 1/(2B √k log k). Let p_0 be the probability that
Y^{(ℓ)} ≥ 1/(2B √k log k). This probability does not depend on ℓ ∈ [k], and by symmetry and what
we showed above, p_0 = 1/2 − O(1/√k). Define the function g : {0, …, k} → ℝ by

    g(t) = ( k choose t ) p_0^t (1 − p_0)^{k−t}.

Then g(t) is the probability that exactly t of the values (Y^{(ℓ)})_{ℓ∈[k]} are at least 1/(2B √k log k). Using
that p_0 = 1/2 − O(1/√k), a simple application of Stirling's formula gives that g(t) = Θ(1/√k)
for t = ⌈k/2⌉, …, ⌈k/2 + √k⌉ when k is larger than some constant C. It follows that with
probability Ω(1), at least half of the (Y^{(ℓ)})_{ℓ∈[k]} are at least 1/(2B √k log k), and in particular

    E[|f̃_i − f_i|] = Ω( 1/(B √k log k) ).

Finally, we handle the case where k ≤ C. It is easy to check (e.g., with Lemma 6.4.11) that
Y^{(ℓ)} = Ω(1/B) with probability Ω(1). Thus this happens for all ℓ ∈ [k] with probability
Ω(1), and in particular E[|f̃_i − f_i|] = Ω(1/B), which is the desired bound for constant k. □
6.4.4. Learned Count-Sketch for Zipfians
We now proceed to analyze the learned Count-Sketch algorithm. First, we estimate the
expected error when using a single hash function, and next we show that the expected error
only increases when using more hash functions. Recall our assumption that the number of
buckets B_r used to store the heavy hitters satisfies B_r = Θ(B − B_r) = Θ(B).

One hash function. By taking B_1 = B_r = Θ(B) and B_2 = B − B_r = Θ(B) in the
theorem below, the result on L-CS for k = 1 claimed in Table 6.4.1 follows immediately.
Theorem 6.4.15. Let h : [n]∖[B_1] → [B_2] and s : [n] → {−1, 1} be truly random hash
functions, where n, B_1, B_2 ∈ ℕ and³ n − B_1 ≥ B_2 ≥ B_1. Define the random variable f̃_i =
Σ_{j=B_1+1}^n [h(j) = h(i)] s(j) f_j for i ∈ [n]∖[B_1]. Then

    E[|f̃_i − s(i) f_i|] = Θ( log((B_2 + B_1)/B_1) / B_2 ).
Proof: Let N_1 = [B_1 + B_2]∖([B_1]∪{i}) and N_2 = [n]∖([B_1 + B_2]∪{i}). Let S_1 =
Σ_{j∈N_1} [h(j) = h(i)] s(j) f_j and S_2 = Σ_{j∈N_2} [h(j) = h(i)] s(j) f_j. By the triangle inequality
and linearity of expectation,

    E[|S_1|] = O( log((B_2 + B_1)/B_1) / B_2 ).

Moreover, it follows directly from Lemma 6.4.10 that E[|S_2|] = O(1/B_2). Thus

    E[|f̃_i − s(i) f_i|] ≤ E[|S_1|] + E[|S_2|] = O( log((B_2 + B_1)/B_1) / B_2 ),

as desired.

³The first inequality is the standard assumption that we have at least as many items as buckets. The
second inequality says that we use at least as many buckets for non-heavy items as for heavy items (which
doesn't change the asymptotic space usage).
For the lower bound on E[|f̃_i − s(i) f_i|], we apply Lemma 6.4.11 to obtain that

    E[|f̃_i − s(i) f_i|] ≥ (1/(2B_2)) (1 − 1/B_2)^{|N_1|−1} Σ_{j∈N_1} f_j = Ω( log((B_2 + B_1)/B_1) / B_2 ). □
Corollary 6.4.16. Let h : [n]∖[B_r] → [B − B_r] and s : [n] → {−1, 1} be truly random
hash functions, where n, B, B_r ∈ ℕ and B_r = Θ(B) ≤ B/2. Define the random variable
f̃_i = Σ_{j=B_r+1}^n [h(j) = h(i)] s(j) f_j for i ∈ [n]∖[B_r]. Then

    E[|f̃_i − s(i) f_i|] = Θ( 1/B ).
More hash functions. We now show that, as for Count-Min, using more hash functions
does not decrease the expected error. We first state the Littlewood-Offord lemma, as
strengthened by Erdős.

Theorem 6.4.17 (Littlewood-Offord [126], Erdős [69]). Let a_1, …, a_n ∈ ℝ with |a_i| ≥
1 for i ∈ [n]. Let further σ_1, …, σ_n ∈ {−1, 1} be random variables with Pr[σ_i = 1] = Pr[σ_i =
−1] = 1/2, and define S = Σ_{i=1}^n a_i σ_i. For any v ∈ ℝ it holds that

    Pr[|S − v| ≤ 1] ≤ ( n choose ⌊n/2⌋ ) · 2^{−n} = O( 1/√n ).
Setting B_1 = B_r = Θ(B) and B_2 = B − B_r = Θ(B) in the theorem below gives the final
bound from Table 6.4.1 on L-CS with k ≥ 3.
Theorem 6.4.18. Let n ≥ B_1 + B_2 ≥ 2B_1, let k ≥ 3 be odd, and let h_1, …, h_k : [n]∖[B_1] → [B_2/k]
and s_1, …, s_k : [n]∖[B_1] → {−1, 1} be independent and truly random. Define the random
variable f̃_i = median_{ℓ∈[k]} ( s_ℓ(i) Σ_{j∈[n]∖[B_1]} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j ) for i ∈ [n]∖[B_1]. Then

    E[|f̃_i − f_i|] = Ω( 1/B_2 ).
Proof: As in the proof of the lower bound of Theorem 6.4.13, it suffices to show that for
each ℓ, the probability that the sum Y_ℓ := Σ_{j∈[n]∖([B_1]∪{i})} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j lies in the
interval I = [−1/(2B_2), 1/(2B_2)] is O(1/√k). Then at least half of the (Y_ℓ)_{ℓ∈[k]} are at least
1/(2B_2) with probability Ω(1) by an application of Stirling's formula, and it follows that
E[|f̃_i − f_i|] = Ω(1/B_2).

Let ℓ ∈ [k] be fixed, let N_1 = [2B_2]∖([B_2]∪{i}) and N_2 = [n]∖([B_1]∪N_1∪{i}), and write

    Y_ℓ = Σ_{j∈N_1} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j + Σ_{j∈N_2} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j := Y_1 + Y_2.

Now condition on the value Y_2 = y_2 of Y_2. Letting J = {j ∈ N_1 : h_ℓ(j) = h_ℓ(i)}, it follows
by Theorem 6.4.17 that

    Pr[Y_ℓ ∈ I | Y_2 = y_2] = O( Σ_{J′⊆N_1} Pr[J = J′] / √(|J′| + 1) ) = O( Pr[|J| < k/2] + 1/√k ).

An application of Chebyshev's inequality gives that Pr[|J| < k/2] = O(1/k), so Pr[Y_ℓ ∈
I] = O(1/√k). Since this bound holds for any possible value of y_2, we may remove the
conditioning, and the desired result follows. □
Remark 6.4.19. The bound above is probably only tight for B_1 = Θ(B_2). Indeed, we know
that it cannot be tight for all B_1 ≤ B_2: when B_1 becomes very small, the bound for the
standard Count-Sketch with k ≥ 3 takes over, and this is certainly worse than the bound
in the theorem. It is an interesting open problem (one that requires a better anti-concentration
inequality than the Littlewood-Offord lemma) to settle the correct bound when B_1 ≪ B_2.
6.5. Experiments
Baselines. We compare our learning-based algorithms with their non-learning counterparts.
Specifically, we augment Count-Min with a learned oracle as in Algorithm 26, and
call the learning-augmented algorithm "Learned Count-Min". We then compare Learned
Count-Min with traditional Count-Min. We also compare it with "Learned Count-Min with
Ideal Oracle", where the neural-network oracle is replaced with an ideal oracle that knows
the identities of the heavy hitters in the test data, and with "Count-Min with Lookup Table",
where the heavy hitter oracle is replaced with a lookup table that memorizes the heavy hitters
in the training set. The comparison with the latter baseline allows us to show the ability
in the training set. The comparison with the latter baseline allows us to show the ability
of Learned Count-Min to generalize and detect heavy items unseen in the training set. We
203
101 103 105 107
Sorted items (log scale)
100
101
102
103
104
105
Fre
quen
cy (
log
scal
e)
Figure 6.5.1: Frequency of Internet flows
repeat the evaluation where we replace Count-Min (CM) with Count-Sketch (CS) and the
corresponding variants.
Training a Heavy Hitter Oracle. We construct the heavy hitter oracle by training a
neural network to predict the heaviness of an item. Note that the prediction of the network
is not the final estimation. It is used in Algorithm 26 to decide whether to assign an item to
a unique bucket. We train the network to predict the item counts (or the log of the counts)
and minimize the squared loss of the prediction. Empirically, we found that when the counts
of heavy items are a few orders of magnitude larger than the average count (as is the case for
the Internet traffic data set), predicting the log of the counts leads to more stable training
and better results. Once the model is trained, we select the optimal cutoff threshold
using validation data, and use the model as the oracle described in Algorithm 26.
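A minimal sketch (ours; the real pipeline uses the validation minute and the trained network's scores) of how such a cutoff can be chosen: rank validation items by predicted score and set the cutoff so that the flagged items just fill the B_r unique buckets.

```python
def pick_cutoff(val_scores, val_counts, b_unique):
    """Choose the score threshold so that exactly b_unique validation items
    are flagged heavy (hypothetical helper; inputs are parallel lists)."""
    ranked = sorted(zip(val_scores, val_counts), reverse=True)
    kept = ranked[:b_unique]              # highest-scoring items get buckets
    cutoff = kept[-1][0]                  # lowest score still flagged
    covered = sum(count for _, count in kept)
    return cutoff, covered

scores = [0.9, 0.8, 0.7, 0.2, 0.1]        # model's predicted log-counts
counts = [500, 300, 200, 4, 2]            # true counts on validation data
cutoff, covered = pick_cutoff(scores, counts, b_unique=3)
print(cutoff, covered)  # 0.7 1000
```

Other selection rules (e.g., minimizing the validation estimation error directly) are possible; this one simply matches the number of flagged items to the unique-bucket budget.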
6.5.1. Internet Traffic Estimation
For our first experiment, the goal is to estimate the number of packets in each network flow.
A flow is a sequence of packets between two machines on the Internet, identified by the
IP addresses of its source and destination and the application ports. Estimating the size of
each flow i (i.e., the number of its packets f_i) is a basic task in network management [161].
Dataset. The traffic data is collected at a backbone link of a Tier1 ISP between Chicago
and Seattle in 2016 [39]. Each recording session is around one hour. Within each minute,
there are around 30 million packets and 1 million unique flows. For a recording session, we
use the first 7 minutes for training, the following minute for validation, and estimate the
packet counts in subsequent minutes. The distribution of packet counts over Internet flows
is heavy tailed, as shown in Figure 6.5.1.
Model. The patterns of the Internet traffic are very dynamic, i.e., the flows with heavy
traffic change frequently from one minute to the next. However, we hypothesize that the
space of IP addresses should be smooth in terms of traffic load. For example, data centers at
large companies and university campuses with many students tend to generate heavy traffic.
Thus, though the individual flows from these sites change frequently, we could still discover
regions of IP addresses with heavy traffic through a learning approach.
We trained a neural network to predict the log of the packet counts for each flow. The
model takes as input the IP addresses and ports in each packet. We use two RNNs to encode
the source and destination IP addresses separately. The RNN takes one bit of the IP address
at each step, starting from the most significant bit. We use the final states of the RNN as the
feature vector for an IP address. The reason to use RNN is that the patterns in the bits are
hierarchical, i.e., the more significant bits govern larger regions in the IP space. Additionally,
we use two-layer fully-connected networks to encode the source and destination ports. We
then concatenate the encoded IP vectors, encoded port vectors, and the protocol type as the
final features to predict the packet counts. The inference time is 2.8 microseconds per
item on a single GPU without any optimizations.
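As an illustration of the input encoding, the helper below (a simplified, hypothetical sketch; the actual feature pipeline is as described above) turns an IPv4 address into the most-significant-bit-first sequence that the RNN would consume one bit per step:

```python
def ip_to_bits(ip: str) -> list:
    """Convert a dotted-quad IPv4 address into its 32-bit representation,
    most significant bit first, as fed to the RNN one bit per step."""
    octets = [int(o) for o in ip.split(".")]
    assert len(octets) == 4 and all(0 <= o <= 255 for o in octets)
    value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    return [(value >> (31 - i)) & 1 for i in range(32)]
```

Because more significant bits govern larger regions of the IP space, feeding them first lets the RNN state refine a coarse region estimate step by step.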
Results. We plot the results of two representative test minutes (the 20th and 50th) in
Figure 6.5.2. All plots in the figure refer to the estimation error (6.2.2) as a function of the
used space. The space includes space for storing the buckets and the model. Since we use
the same model for all test minutes, the model space is amortized over the 50-minute testing
period.
Figure 6.5.2 reveals multiple findings. First, the figure shows that our learning-based
algorithms exhibit a better performance than their non-learning counterparts. Specifically,
Learned Count-Min, compared to Count-Min, reduces the error by 32% with space of 0.5 MB and 42% with space of 1.0 MB (Figure 6.5.2a). Learned Count-Sketch, compared to Count-Sketch, reduces the error by 52% at 0.5 MB and 57% at 1.0 MB (Figure 6.5.2c).

(Footnote: We use RNNs with 64 hidden units. The two-layer fully-connected networks for the ports have 16 and 8 hidden units. The final layer before the prediction has 32 hidden units.)

(Footnote: New specialized hardware such as Google TPUs, hardware accelerators, and network compression [93, 163, 47, 94, 95] can drastically improve the inference time. Further, Nvidia has predicted that GPUs will get 1000x faster by 2025. Because of these trends, the overhead of neural network inference is expected to be less significant in the future [120].)

[Figure 6.5.2: Comparison of our algorithms with Count-Min and Count-Sketch on Internet traffic data. Four panels plot the average error per item (frequency) against space (MB) for Count-Min/Count-Sketch, Table lookup, Learned (NNet), and Learned (Ideal): (a) Learned Count-Min, 20th test minute; (b) Learned Count-Min, 50th test minute; (c) Learned Count-Sketch, 20th test minute; (d) Learned Count-Sketch, 50th test minute.]
In our experiments, each regular bucket takes 4 bytes. For the learned versions, we account
for the extra space needed for the unique buckets to store the item IDs and the counts. One
unique bucket takes 8 bytes, twice the space of a normal bucket.
Second, the figure also shows that our neural-network oracle performs better than mem-
orizing the heavy hitters in a lookup table. This is likely due to the dynamic nature of
Internet traffic, i.e., the heavy flows in the training set are significantly different from those
in the test data. Hence, memorization does not work well. On the other hand, our model is
able to extract structures in the input that generalize to unseen test data.
(Footnote: By using hashing with open addressing, it suffices to store IDs hashed into $\log B_u + t$ bits (instead of whole IDs) to ensure there is no collision with probability $1 - 2^{-t}$. Since $\log B_u + t$ is comparable to the number of bits per counter, the space for a unique bucket is twice the space of a normal bucket.)
[Figure 6.5.3: Frequency of search queries. Sorted items vs. frequency, both on log scale.]
Third, the figure shows that our modelโs performance stays roughly the same from the
20th to the 50th minute (Figure 6.5.2b and Figure 6.5.2d), showing that it learns properties
of the heavy items that generalize over time.
Lastly, although we achieve significant improvement over Count-Min and Count-Sketch,
our scheme can potentially achieve even better results with an ideal oracle, as shown by the
dashed green line in Figure 6.5.2. This indicates potential gains from further optimizing the
neural network model.
6.5.2. Search Query Estimation
For our second experiment, the goal is to estimate the number of times a search query
appears.
Dataset. We use the AOL query log dataset, which consists of 21 million search queries
collected from 650 thousand users over 90 days. The users are anonymized in the dataset.
There are 3.8 million unique queries. Each query is a search phrase with multiple words
(e.g., "periodic table element poster"). We use the first 5 days for training, the following
day for validation, and estimate the number of times different search queries appear in
subsequent days. The distribution of search query frequency follows the Zipfian law, as
shown in Figure 6.5.3.
Model. Unlike traffic data, popular search queries tend to appear more consistently across
multiple days. For example, "google" is the most popular search phrase on most days in the dataset. Simply memorizing the most popular queries already yields a reasonable heavy-hitter predictor. However, beyond remembering the popular queries, other factors also
[Figure 6.5.4: Comparison of our algorithms with Count-Min and Count-Sketch on search query data. Four panels plot the average error per item (frequency) against space (MB): (a) Learned Count-Min, 50th test day; (b) Learned Count-Min, 80th test day; (c) Learned Count-Sketch, 50th test day; (d) Learned Count-Sketch, 80th test day.]
contribute to the popularity of a search phrase that we can learn. For example, popular
search phrases appearing in slightly different forms may be related to similar topics. Though
not included in the AOL dataset, in general, metadata of a search query (e.g., the location
of the search) can provide useful context of its popularity.
To construct the heavy hitter oracle, we trained a neural network to predict the number
of times a search phrase appears. To process the search phrase, we train an RNN with LSTM
cells that takes characters of a search phrase as input. The final states encoded by the RNN
are fed to a fully-connected layer to predict the query frequency. Our character vocabulary
includes lowercase English letters, digits, punctuation marks, and a token for unknown
characters. We map the character IDs to embedding vectors before feeding them to the
RNN. We choose an RNN due to its effectiveness in processing sequence data [162, 83, 120].

(Footnote: We use an embedding size of 64 dimensions, an RNN with 256 hidden units, and a fully-connected layer with 32 hidden units.)
[Figure 6.5.5: ROC curves of the learned models: (a) Internet traffic, 50th test minute, AUC = 0.900; (b) search query, 50th test day, AUC = 0.808.]
Results. We plot the estimation error vs. space for two representative test days (the 50th
and 80th day) in Figure 6.5.4. As before, the space includes both the bucket space and the
space used by the model. The model space is amortized over the test days since the same
model is used for all days.
As before, our learned sketches outperform their conventional counterparts. Learned Count-Min, compared to Count-Min, reduces the loss by 18% at 0.5 MB and 52% at 1.0 MB (Figure 6.5.4a). Learned Count-Sketch, compared to Count-Sketch, reduces the loss by 24% at 0.5 MB and 71% at 1.0 MB (Figure 6.5.4c). Further, our algorithm performs
similarly for the 50th and the 80th day (Figure 6.5.4b and Figure 6.5.4d), showing that the
properties it learns generalize over a long period.
The figures also show an interesting difference from the Internet traffic data: memorizing
the heavy hitters in a lookup table is quite effective in the low space region. This is likely
because the search queries are less dynamic compared to Internet traffic (i.e., top queries in
the training set are also popular on later days). However, as the algorithm is allowed more
space, memorization becomes ineffective.
6.5.3. Analyzing Heavy Hitter Models
We analyze the accuracy of the neural network heavy hitter models to better understand
the results on the two datasets. Specifically, we use the models to predict whether an item
is a heavy hitter (top 1% in counts) or not, and plot the ROC curves in Figure 6.5.5. The
figures show that the model for the Internet traffic data has learned to predict heavy items
more effectively, with an AUC score of 0.9. As for the model for search query data, the AUC
score is 0.8. This also explains why we see larger improvements over non-learning algorithms
in Figure 6.5.2.
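The ROC analysis above is easy to reproduce; the function below (a self-contained sketch, not the thesis code) computes the AUC directly from predicted scores and heavy/not-heavy labels via the equivalent rank-statistic formulation:

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank statistic: the probability
    that a randomly chosen positive is scored higher than a randomly
    chosen negative (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.9 thus means that 90% of the time the model ranks a true heavy hitter above a non-heavy item.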
Chapter 7
The Low-Rank Approximation Problem
7.1. Introduction
In the low-rank decomposition problem, given an $n \times d$ matrix $A$ and a parameter $k$, the goal is to compute a rank-$k$ matrix $[A]_k$ defined as

$$[A]_k = \operatorname*{argmin}_{A':\, \mathrm{rank}(A') \le k} \|A - A'\|_F.$$
Low-rank approximation is one of the most widely used tools in massive data analysis,
machine learning and statistics, and has been a subject of many algorithmic studies. In par-
ticular, multiple algorithms developed over the last decade use the "sketching" approach; see, e.g., [157, 171, 92, 52, 53, 144, 132, 34, 54]. The idea is to use efficiently computable random projections (a.k.a. "sketches") to reduce the problem size before performing the low-rank decomposition, which makes the computation more space and time efficient. For example, [157, 52] show that if $S$ is a random matrix of size $m \times n$ chosen from an appropriate distribution, with $m$ depending on $k$, then one can recover a rank-$k$ matrix $A'$ such that

$$\|A - A'\|_F \le (1+\epsilon)\|A - [A]_k\|_F$$

(Footnote: Initial algorithms used matrices with independent sub-gaussian entries or randomized Fourier/Hadamard matrices [157, 171, 92]. Starting from the seminal work of [53], researchers began to explore sparse binary matrices; see, e.g., [144, 132]. In this chapter we mostly focus on the latter distribution.)
by performing an SVD on $SA \in \mathbb{R}^{m \times d}$ followed by some post-processing. Typically the
sketch length $m$ is small, so the matrix $SA$ can be stored using little space (in the context of
streaming algorithms) or efficiently communicated (in the context of distributed algorithms).
Furthermore, the SVD of ๐๐ด can be computed efficiently, especially after another round of
sketching, reducing the overall computation time. See the survey [170] for an overview of
these developments.
In light of recent developments on learning-based algorithms [168, 59, 120, 24, 128, 151,
137, 141, 25, 33, 134, 96, 103, 1, 118], it is natural to ask whether similar improvements
in performance could be obtained for other sketch-based algorithms, notably for low-rank
decompositions. In particular, reducing the sketch length $m$ while preserving its accuracy would make sketch-based algorithms more efficient. Alternatively, one could make sketches more accurate for the same values of $m$. This is the problem we address in this chapter.
7.1.1. Our Results
Our main finding is that learned sketch matrices can indeed yield (much) more accurate
low-rank decompositions than purely random matrices. We focus our study on a streaming
algorithm for low-rank decomposition due to [157, 52], described in more detail in Section 7.2.
Specifically, suppose we have a training set of matrices $\mathrm{Tr} = \{A_1, \ldots, A_t\}$ sampled from some distribution $\mathcal{D}$. Based on this training set, we compute a matrix $S^*$ that (locally) minimizes the empirical loss

$$\sum_i \|A_i - \mathrm{SCW}(S^*, A_i)\|_F \qquad (7.1.1)$$

where $\mathrm{SCW}(S^*, A_i)$ denotes the output of the aforementioned Sarlos-Clarkson-Woodruff streaming low-rank decomposition algorithm on matrix $A_i$ using the sketch matrix $S^*$. Once the sketch matrix $S^*$ is computed, it can be used instead of a random sketch matrix in all
future executions of the SCW algorithm.
We demonstrate empirically that, for multiple types of data sets, an optimized sketch matrix $S^*$ can substantially reduce the approximation loss compared to a random matrix $S$, sometimes by an order of magnitude (see Figure 7.5.1 or 7.5.2). Equivalently, the optimized sketch matrix can achieve the same approximation loss for lower values of $m$.
A possible disadvantage of learned sketch matrices is that an algorithm that uses them no
longer offers worst-case guarantees. As a result, if such an algorithm is applied to an input
matrix that does not conform to the training distribution, the results might be worse than
if random matrices were used. To alleviate this issue, we also study mixed sketch matrices,
where (say) half of the rows are trained and the other half are random. We observe that
if such matrices are used in conjunction with the SCW algorithm, its results are no worse
than if only the random part of the matrix was used (Theorem 7.4.1 in Section 7.4). Thus,
the resulting algorithm inherits the worst-case performance guarantees of the random part
of the sketching matrix. At the same time, we show that mixed matrices still substantially
reduce the approximation loss compared to random ones, in some cases nearly matching the
performance of "pure" learned matrices with the same number of rows. Thus, mixed matrices offer "the best of both worlds": improved performance for matrices from the training distribution, and worst-case guarantees otherwise.
Finally, in order to understand the theoretical aspects of our approach further, we study the special case of $m = 1$. This corresponds to the case where the sketch matrix $S$ is just a single vector. Our results are two-fold:
• We give an approximation algorithm for minimizing the empirical loss as in Equation 7.1.1, with an approximation factor depending on the stable rank of matrices in the training set. See Appendix 7.7.
• Under certain assumptions about the robustness of the loss minimizer, we show generalization bounds for the solution computed over the training set. See Appendix 7.8.
7.1.2. Related work
As outlined in the introduction, over the last few years there have been multiple papers exploring the use of machine learning methods to improve the performance of "standard"
algorithms. Among those, the closest to the topic of this chapter are the works on learning-
based compressive sensing, such as [141, 25, 33, 134], and on learning-based streaming algo-
rithms [103]. Since neither of these two lines of research addresses computing matrix spectra,
the technical development therein was quite different from ours.
(Footnote: We note that this property is non-trivial, in the sense that it does not automatically hold for all sketching algorithms. See Section 7.4 for further discussion.)
Algorithm 27 Rank-$k$ approximation of a matrix $A$ using a sketch matrix $S$ (refer to Section 4.1.1 of [52])
1: Input: $A \in \mathbb{R}^{n \times d}$, $S \in \mathbb{R}^{m \times n}$
2: $U, \Sigma, V^\top \leftarrow \mathrm{CompactSVD}(SA)$   ▷ $r = \mathrm{rank}(SA)$, $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{d \times r}$
3: return $[AV]_k V^\top$
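Algorithm 27 can be transcribed almost directly into NumPy. The function below is an illustrative sketch (not the thesis code), using `numpy.linalg.svd` both for the compact SVD of $SA$ and for truncating $AV$ to rank $k$:

```python
import numpy as np

def scw(A, S, k):
    """Algorithm 27: rank-k approximation of A computed from the sketch SA."""
    # Compact SVD of the small (m x d) sketched matrix; the rows of Vt
    # form an orthonormal basis of rowsp(SA).
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)
    AV = A @ Vt.T                          # project A onto that row space
    # Best rank-k approximation of AV via its SVD.
    U2, s2, Vt2 = np.linalg.svd(AV, full_matrices=False)
    AVk = U2[:, :k] * s2[:k] @ Vt2[:k]
    return AVk @ Vt                        # [AV]_k V^T, a rank-<=k matrix
```

Only the $m \times d$ matrix $SA$ and the $n \times r$ matrix $AV$ are ever decomposed, which is what makes the algorithm cheap when $m \ll n, d$.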
In this chapter we focus on learning-based optimization of low-rank approximation algorithms that use linear sketches, i.e., map the input matrix $A$ into $SA$ and perform computation on the latter. There are other sketching algorithms for low-rank approximation that
involve non-linear sketches [125, 78, 77]. The benefit of linear sketches is that they are easy
to update under linear changes to the matrix $A$, and (in the context of our work) that they
are easy to differentiate, making it possible to compute the gradient of the loss function as
in Equation 7.1.1. We do not know whether it is possible to use our learning-based approach
for non-linear sketches, but we believe this is an interesting direction for future research.
7.2. Preliminaries
Notation. Consider a distribution $\mathcal{D}$ on matrices $A \in \mathbb{R}^{n \times d}$. We define the training set as $\{A_1, \cdots, A_t\}$ sampled from $\mathcal{D}$. For a matrix $A$, its singular value decomposition (SVD) can be written as $A = U \Sigma V^\top$ such that both $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{d \times d}$ have orthonormal columns and $\Sigma = \mathrm{diag}\{\sigma_1, \cdots, \sigma_d\}$ is a diagonal matrix with nonnegative entries. Moreover, if $\mathrm{rank}(A) = r$, then the first $r$ columns of $U$ are an orthonormal basis for the column space of $A$ (which we denote $\mathrm{colsp}(A)$), the first $r$ columns of $V$ are an orthonormal basis for the row space of $A$ (which we denote $\mathrm{rowsp}(A)$), and $\sigma_i = 0$ for $i > r$. In many applications it is quicker and more economical to compute the compact SVD, which only contains the rows and columns corresponding to the non-zero singular values of $\Sigma$: $A = U^c \Sigma^c (V^c)^\top$, where $U^c \in \mathbb{R}^{n \times r}$, $\Sigma^c \in \mathbb{R}^{r \times r}$ and $V^c \in \mathbb{R}^{d \times r}$.
How sketching works. We start by describing the SCW algorithm for low-rank matrix approximation; see Algorithm 27. The algorithm computes the singular value decomposition of $SA = U \Sigma V^\top$, then computes the best rank-$k$ approximation of $AV$. Finally, it outputs $[AV]_k V^\top$ as a rank-$k$ approximation of $A$.
(Footnote: The remaining columns of $V$ and $U$ are orthonormal bases for the null spaces of $A$ and $A^\top$, respectively.)
Note that if $m$ is much smaller than $n$ and $d$, the space bound of this algorithm is significantly better than that of computing a rank-$k$ approximation of $A$ in the naïve way. Thus, minimizing $m$ automatically reduces the space usage of the algorithm.
Sketching matrix. We use a sketching matrix $S$ that is sparse. Specifically, each column of $S$ has exactly one non-zero entry, which is either $+1$ or $-1$. This means that the fraction of non-zero entries in $S$ is $1/m$. Therefore, one can use a vector to represent $S$, which is very memory efficient. It is worth noting, however, that after multiplying the sketching matrix $S$ with other matrices, the resulting matrix (e.g., $SA$) is in general not sparse.
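Because each column holds a single $\pm 1$, the whole sketch can be stored as two length-$n$ arrays and applied to $A$ without ever materializing $S$. A minimal sketch of this representation (the function names are illustrative):

```python
import random

def make_sparse_sketch(n, m, seed=0):
    """Represent the m x n sparse sketch S by, for each column i, the
    row rows[i] holding its single non-zero entry and that entry's sign."""
    rng = random.Random(seed)
    rows = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return rows, signs

def apply_sketch(rows, signs, A, m):
    """Compute SA for an n x d matrix A (given as a list of rows) in
    O(nnz(A)) time: row i of A is added, with its sign, into SA[rows[i]]."""
    d = len(A[0])
    SA = [[0.0] * d for _ in range(m)]
    for i, (r, s) in enumerate(zip(rows, signs)):
        for j in range(d):
            SA[r][j] += s * A[i][j]
    return SA
```

This is exactly why sketching with sparse $S$ costs time proportional to the number of non-zeros of $A$, rather than a dense matrix multiply.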
7.3. Training Algorithm
In this section, we describe our learning-based algorithm for computing a data-dependent sketch $S$. The main idea is to use the back-propagation algorithm to compute the stochastic gradient of $S$ with respect to the rank-$k$ approximation loss in Equation (7.1.1), where the initial value of $S$ is the same random sparse matrix used in SCW. Once we have the stochastic gradient, we can run the stochastic gradient descent (SGD) algorithm to optimize $S$ in order to improve the loss. Our algorithm maintains the sparse structure of $S$, and only optimizes the values of the $n$ non-zero entries (initially $+1$ or $-1$).

However, the standard SVD implementation (step 2 in Algorithm 27) is not differentiable, which means we cannot get the gradient in the straightforward way. To make the SVD implementation differentiable, we use the fact that the SVD procedure can be represented
as $m$ individual top singular value decompositions (see, e.g., [10]), and that every top singular value decomposition can be computed using the power method; see Figure 7.3.1 and Algorithm 28. We store the results of the $i$-th iteration in the $i$-th entry of the lists $U, \Sigma, V$, and finally concatenate all entries together to get the matrix (or diagonal) form of $U, \Sigma, V$. This allows gradients to flow easily.
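The building block here, the power method for the top singular direction, is easy to state on its own. A pure-Python sketch for a small matrix follows (PyTorch autograd would differentiate through exactly these arithmetic operations):

```python
def top_right_singular_vector(A, T=100):
    """Power method on A^T A: returns (sigma_1, v_1), the top singular
    value and (right) singular vector of A, given as a list of rows."""
    n, d = len(A), len(A[0])
    v = [1.0 / d ** 0.5] * d                  # deterministic unit start vector
    for _ in range(T):
        # w = A^T (A v), i.e. one application of A^T A
        Av = [sum(A[i][j] * v[j] for j in range(d)) for i in range(n)]
        w = [sum(A[i][j] * Av[i] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]             # normalize, as in Algorithm 28
    # sigma_1 = ||A v_1||_2, matching step 8 of Algorithm 28
    sigma = sum(
        sum(A[i][j] * v[j] for j in range(d)) ** 2 for i in range(n)
    ) ** 0.5
    return sigma, v
```

Deflating ($A \leftarrow A - \sigma_1 u_1 v_1^\top$) and repeating recovers the next singular triples, which is precisely the loop of Algorithm 28.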
Due to the extremely long computational chain, it is infeasible to write down the explicit
form of loss function or the gradients. However, just like how modern deep neural networks
(Footnote: The original papers [157, 52] used dense matrices, but the work of [53] showed that sparse matrices work as well. We use sparse matrices since they are more efficient to train and to operate on.)
Algorithm 28 Differentiable implementation of SVD
1: Input: $A_1 \in \mathbb{R}^{m \times d}$ ($m < d$)
2: $U, \Sigma, V \leftarrow \{\}, \{\}, \{\}$
3: for $i \leftarrow 1 \ldots m$ do
4:    $v_1 \leftarrow$ random initialization in $\mathbb{R}^d$
5:    for $t \leftarrow 1 \ldots T$ do
6:       $v_{t+1} \leftarrow A_i^\top A_i v_t \,/\, \|A_i^\top A_i v_t\|_2$   ▷ power method
7:    $V[i] \leftarrow v_{T+1}$
8:    $\Sigma[i] \leftarrow \|A_i V[i]\|_2$
9:    $U[i] \leftarrow A_i V[i] \,/\, \Sigma[i]$
10:   $A_{i+1} \leftarrow A_i - \Sigma[i]\, U[i]\, V[i]^\top$
11: return $U, \Sigma, V$

[Figure 7.3.1: The $i$-th iteration of the power method: starting from $v_1$, the map $v_t \mapsto A_i^\top A_i v_t / \|A_i^\top A_i v_t\|_2$ is applied $T$ times, producing $V[i]$, $\Sigma[i]$, and $U[i]$.]
compute their gradients, we used the autograd feature in PyTorch to numerically compute the gradient with respect to the sketching matrix $S$.
We emphasize again that our method optimizes $S$ only during the training phase. After $S$ is fully trained, we still call Algorithm 27 for low-rank approximation, which has exactly the same running time as the SCW algorithm, but with better performance.
7.4. Worst Case Bound
In this section, we show that concatenating two sketching matrices $S_1$ and $S_2$ (of sizes $m_1 \times n$ and $m_2 \times n$, respectively) into a single matrix $S^*$ (of size $(m_1 + m_2) \times n$) will not increase the approximation loss of the final rank-$k$ solution computed by Algorithm 27, compared to the case in which only one of $S_1$ or $S_2$ is used as the sketching matrix. In the rest of this section, the sketching matrix $S^*$ denotes the concatenation of $S_1$ and $S_2$:

$$S^* = \begin{bmatrix} S_1 \\ S_2 \end{bmatrix} \in \mathbb{R}^{(m_1 + m_2) \times n}.$$

Formally, we prove the following theorem in this section.
Theorem 7.4.1. Let $U^* \Sigma^* (V^*)^\top$ and $U_1 \Sigma_1 V_1^\top$ respectively denote the SVD of $S^* A$ and $S_1 A$. Then,

$$\|[A V^*]_k (V^*)^\top - A\|_F \le \|[A V_1]_k V_1^\top - A\|_F.$$
In particular, the above theorem implies that the output of Algorithm 27 with the sketching matrix $S^*$ is at least as good a rank-$k$ approximation to $A$ as the output of the algorithm with $S_1$. In the rest of this section we prove Theorem 7.4.1.
Fact 7.4.2 (Pythagorean Theorem). If $A$ and $B$ are matrices with the same number of rows and columns, then $AB^\top = 0$ implies $\|A + B\|_F^2 = \|A\|_F^2 + \|B\|_F^2$.
Claim 7.4.3. Let $V_1$ and $V_2$ be two matrices such that $\mathrm{colsp}(V_1) \subseteq \mathrm{colsp}(V_2)$. Then,

$$\min_{\substack{\mathrm{rowsp}(B) \subseteq \mathrm{colsp}(V_1) \\ \mathrm{rank}(B) \le k}} \|B - A\|_F^2 \;\ge\; \min_{\substack{\mathrm{rowsp}(B) \subseteq \mathrm{colsp}(V_2) \\ \mathrm{rank}(B) \le k}} \|B - A\|_F^2$$
Before proving the main theorem, we state the following helpful lemma from [52].
Lemma 7.4.4 (Lemma 4.3 in [52]). Suppose that $V$ is a matrix with orthonormal columns. Then, a best rank-$k$ approximation to $A$ in $\mathrm{colsp}(V)$ is given by $[AV]_k V^\top$.
Proof: Note that $AVV^\top$ is a row projection of $A$ onto $\mathrm{colsp}(V)$. Then, for any conforming $Y$,

$$(A - AVV^\top)(AVV^\top - YV^\top)^\top = A(I - VV^\top)V(AV - Y)^\top = A(V - VV^\top V)(AV - Y)^\top = 0,$$

where the last equality follows from the fact that if $V$ has orthonormal columns then $VV^\top V = V$ (e.g., see Lemma 3.5 in [52]). Then, by the Pythagorean Theorem (Fact 7.4.2), we have

$$\|A - YV^\top\|_F^2 = \|A - AVV^\top\|_F^2 + \|AVV^\top - YV^\top\|_F^2 \qquad (7.4.1)$$

Since $V$ has orthonormal columns, for any conforming $x$, $\|x^\top V^\top\| = \|x\|$. Thus, for any $Y$ of rank at most $k$,

$$\|AVV^\top - [AV]_k V^\top\|_F = \|(AV - [AV]_k)V^\top\|_F = \|AV - [AV]_k\|_F \le \|AV - Y\|_F = \|AVV^\top - YV^\top\|_F \qquad (7.4.2)$$

Hence,

$$\|A - [AV]_k V^\top\|_F^2 = \|A - AVV^\top\|_F^2 + \|AVV^\top - [AV]_k V^\top\|_F^2 \qquad \text{(by (7.4.1))}$$
$$\le \|A - AVV^\top\|_F^2 + \|AVV^\top - YV^\top\|_F^2 = \|A - YV^\top\|_F^2 \qquad \text{(by (7.4.2) and (7.4.1))}$$

This implies that $[AV]_k V^\top$ is a best rank-$k$ approximation of $A$ in $\mathrm{colsp}(V)$. □
Since the above statement is a transposed version of the lemma from [52], we have included its proof here for completeness.
Proof: (Proof of Theorem 7.4.1) First, we show that $\mathrm{colsp}(V_1) \subseteq \mathrm{colsp}(V^*)$. By the properties of the (compact) SVD, $\mathrm{colsp}(V_1) = \mathrm{rowsp}(S_1 A)$ and $\mathrm{colsp}(V^*) = \mathrm{rowsp}(S^* A)$. Since $S^*$ contains all rows of $S_1$,

$$\mathrm{colsp}(V_1) \subseteq \mathrm{colsp}(V^*). \qquad (7.4.3)$$

By Lemma 7.4.4,

$$\|A - [AV^*]_k (V^*)^\top\|_F = \min_{\substack{\mathrm{rowsp}(B) \subseteq \mathrm{colsp}(V^*) \\ \mathrm{rank}(B) \le k}} \|B - A\|_F$$

$$\|A - [AV_1]_k V_1^\top\|_F = \min_{\substack{\mathrm{rowsp}(B) \subseteq \mathrm{colsp}(V_1) \\ \mathrm{rank}(B) \le k}} \|B - A\|_F$$
Finally, together with (7.4.3) and Claim 7.4.3,

$$\|A - [AV^*]_k (V^*)^\top\|_F = \min_{\substack{\mathrm{rowsp}(B) \subseteq \mathrm{colsp}(V^*) \\ \mathrm{rank}(B) \le k}} \|B - A\|_F \le \min_{\substack{\mathrm{rowsp}(B) \subseteq \mathrm{colsp}(V_1) \\ \mathrm{rank}(B) \le k}} \|B - A\|_F = \|A - [AV_1]_k V_1^\top\|_F,$$

which completes the proof. □
Finally, we note that the property of Theorem 7.4.1 is not universal, i.e., it does not hold for all sketching algorithms for low-rank decomposition. For example, an alternative algorithm proposed in [54] proceeds by letting $Z$ be the top $k$ singular vectors of $SA$ and then reporting $AZZ^\top$. It is not difficult to see that, by adding extra rows to the sketching matrix $S$, one can skew the output of the algorithm so that it is far from optimal.
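Theorem 7.4.1 is also easy to check numerically: appending extra rows to the sketch never increases the SCW loss. The snippet below is an illustrative NumPy demonstration, with the SCW loss inlined as a hypothetical helper `scw_error`:

```python
import numpy as np

def scw_error(A, S, k):
    """Frobenius loss of the SCW rank-k approximation of A under sketch S."""
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)
    AV = A @ Vt.T
    U2, s2, Vt2 = np.linalg.svd(AV, full_matrices=False)
    approx = (U2[:, :k] * s2[:k] @ Vt2[:k]) @ Vt   # [AV]_k V^T
    return np.linalg.norm(A - approx)

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
S1 = rng.standard_normal((4, 30))
S2 = rng.standard_normal((4, 30))
S_star = np.vstack([S1, S2])       # the concatenated sketch of Theorem 7.4.1
err1 = scw_error(A, S1, 3)
err_star = scw_error(A, S_star, 3)
assert err_star <= err1 + 1e-6     # stacking rows never hurts
```

This is the mechanism behind the mixed sketches: the random half alone already certifies the worst-case bound, and the learned half can only improve on it.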
7.5. Experimental Results
The main question considered here is whether, for natural matrix datasets, optimizing the
sketch matrix $S$ can improve the performance of the sketching algorithm for the low-rank decomposition problem. To answer this question, we implemented and compared the following methods for computing $S \in \mathbb{R}^{m \times n}$.
• Sparse Random. Sketching matrices are generated at random as in [53]. Specifically, we select a random hash function $h : [n] \to [m]$, and for all $i = 1 \ldots n$, $S_{h[i], i}$ is selected to be either $+1$ or $-1$ with equal probability. All other entries in $S$ are set to 0. Therefore, $S$ has exactly $n$ non-zero entries.
• Dense Random. All the $mn$ entries in the sketching matrix are sampled from a Gaussian distribution (we include this method for comparison).
• Learned. Using the sparse random matrix as the initialization, we run Algorithm 28 to optimize the sketching matrix using the training set, and return the optimized matrix.
• Mixed (J). We first generate two sparse random matrices $S_1, S_2 \in \mathbb{R}^{\frac{m}{2} \times n}$ (assuming $m$ is even), and define $S$ to be their combination. We then run Algorithm 28 to optimize $S$ using the training set, but only $S_1$ will be updated, while $S_2$ is fixed. Therefore, $S$ is a mixture of a learned matrix and a random matrix, and the first matrix is trained jointly with the second one.
• Mixed (S). We first compute a learned matrix $S_1 \in \mathbb{R}^{\frac{m}{2} \times n}$ using the training set, and then append another sparse random matrix $S_2$ to get $S \in \mathbb{R}^{m \times n}$. Therefore, $S$ is a mixture of a learned matrix and a random matrix, but the learned matrix is trained separately.
[Figure 7.5.1: Test error by dataset and sketching matrix, for $k = 10$, $m = 20$. Learned: 0.1 (Logo), 0.2 (Eagle), 0.2 (Friends), 0.5 (Hyper), 3.0 (Tech); Sparse Random: 2.1, 4.3, 4.1, 2.9, 8.0; Dense Random: 2.0, 4.7, 4.0, 3.5, 8.0.]
[Figure 7.5.2: Test error (log scale) as a function of $m$ for Logo (left), Hyper (middle) and Tech (right) when $k = 10$, comparing Learned, Sparse Random, and Dense Random sketching matrices.]
Datasets. We used a variety of datasets to test the performance of our methods:
• Videos: Logo, Friends, Eagle. We downloaded three high-resolution videos from YouTube: a logo video, the Friends TV show, and an eagle nest cam. From each video, we collect 500 frames of size 1920 × 1080 × 3 pixels, and use 400 (100) matrices as the training (test) set. For each frame, we reshape it into a 5760 × 1080 matrix.
• Hyper. We use matrices from HS-SOD, a dataset of hyperspectral images from natural scenes [104]. Each matrix has 1024 × 768 pixels, and we use 400 (100) matrices
(Footnote: The videos can be downloaded from http://youtu.be/L5HQoFIaT4I, http://youtu.be/xmLZsEfXEgE and http://youtu.be/ufnf_q_3Ofg.)
as the training (test) set.
• Tech. We use matrices from TechTC-300, a dataset for text categorization [60]. Each matrix has 835,422 rows, but on average only 25,389 of the rows contain non-zero entries. On average each matrix has 195 columns. We use 200 (95) matrices as the training (test) set.
Evaluation metric. To evaluate the quality of a sketching matrix $S$, it suffices to evaluate the output of Algorithm 27 using the sketching matrix $S$ on different input matrices $A$. We first define the optimal approximation loss for the test set $\mathrm{Te}$ as

$$\mathrm{App}^*_{\mathrm{Te}} \triangleq \mathbb{E}_{A \sim \mathrm{Te}} \|A - [A]_k\|_F.$$

Note that $\mathrm{App}^*_{\mathrm{Te}}$ does not depend on $S$, and in general it is not achievable by any sketch $S$ with $m < d$, because of information loss. Based on the definition of the optimal approximation loss, we define the error of the sketch $S$ for $\mathrm{Te}$ as

$$\mathrm{Err}(\mathrm{Te}, S) \triangleq \mathbb{E}_{A \sim \mathrm{Te}} \|A - \mathrm{SCW}(S, A)\|_F - \mathrm{App}^*_{\mathrm{Te}}.$$
In our datasets, some of the matrices have much larger singular values than the others.
To avoid imbalance in the dataset, we normalize the matrices so that their top singular
values are all equal.
Figure 7.5.3: Low rank approximation results for Logo video frame: the best rank-10approximation (left), and rank-10 approximations reported by Algorithm 27 using a sparselearned sketching matrix (middle) and a sparse random sketching matrix (right).
7.5.1. Average test error
We first test all methods on the different datasets, with various combinations of $k$ and $m$. See Figure 7.5.1 for the results when $k = 10$, $m = 20$. As we can see, for the video datasets, learned sketching matrices achieve up to 20× smaller test error than the sparse random or dense random sketching matrices. For the other datasets, learned sketching matrices are still more than 2× better. Similar improvements of the learned sketching matrices over the random sketching
Table 7.5.1: Test error in various settings.

k, m, Sketch      Logo   Eagle  Friends  Hyper   Tech
10, 10, Learned   0.39   0.31    1.03     1.25    6.70
10, 10, Random    5.22   6.33   11.56     7.90   17.08
10, 20, Learned   0.10   0.18    0.22     0.52    2.95
10, 20, Random    2.09   4.31    4.11     2.92    7.99
20, 20, Learned   0.61   0.66    1.41     1.68    7.79
20, 20, Random    4.18   5.79    9.10     5.71   14.55
20, 40, Learned   0.18   0.41    0.42     0.72    3.09
20, 40, Random    1.19   3.50    2.44     2.23    6.20
30, 30, Learned   0.72   1.06    1.78     1.90    7.14
30, 30, Random    3.11   6.03    6.27     5.23   12.82
30, 60, Learned   0.21   0.61    0.42     0.84    2.78
30, 60, Random    0.82   3.28    1.79     1.88    4.84
Table 7.5.2: Performance of mixed sketches.

k, m, Sketch        Logo   Hyper   Tech
10, 10, Learned     0.39   1.25    6.70
10, 10, Random      5.22   7.90   17.08
10, 20, Learned     0.10   0.52    2.95
10, 20, Mixed (J)   0.20   0.78    3.73
10, 20, Mixed (S)   0.24   0.87    3.69
10, 20, Random      2.09   2.92    7.99
10, 40, Learned     0.04   0.28    1.16
10, 40, Mixed (J)   0.05   0.34    1.31
10, 40, Mixed (S)   0.05   0.34    1.20
10, 40, Random      0.45   1.12    3.28
10, 80, Learned     0.02   0.16    0.31
10, 80, Random      0.09   0.32    0.80
matrices can be observed when $k = 10$ and $m = 10, 20, 30, 40, \cdots, 80$; see Figure 7.5.2. We also include the test error results in Table 7.5.1 for the cases $k = 20, 30$. Finally, in Figure 7.5.3, we visualize an example output of the algorithm for the case $k = 10$, $m = 20$ on the Logo dataset.
7.5.2. Comparing Random, Learned and Mixed
In Table 7.5.2, we investigate the performance of the mixed sketching matrices by comparing
them with random and learned sketching matrices. In all scenarios, mixed sketching matrices
yield much better results than random sketching matrices, and sometimes the results are
comparable to those of learned sketching matrices. This means that, in most cases, it suffices to train one half of the sketching matrix to obtain good empirical results, and at the same time, by Theorem 7.4.1, the remaining random half of the sketch matrix provides a worst-case guarantee.
7.6. The case of $m = 1$
In this section, we denote the SVD of $A$ as $U^A \Sigma^A (V^A)^\top$ such that both $U^A$ and $V^A$ have orthonormal columns and $\Sigma^A = \mathrm{diag}\{\sigma^A_1, \cdots, \sigma^A_d\}$ is a diagonal matrix with nonnegative entries. For simplicity, we assume that for all $A \sim \mathcal{D}$, $1 = \sigma^A_1 \ge \cdots \ge \sigma^A_d$. We use $U^A_i$ to denote the $i$-th column of $U^A$, and similarly for $V^A_i$.

We want to find $[A]_k$, the rank-$k$ approximation of $A$. In general, it is hard to obtain a closed-form expression for the output of Algorithm 27. However, for $m = 1$, such expressions can be calculated. Indeed, if $m = 1$, the sketching matrix becomes a vector $S \in \mathbb{R}^{1 \times n}$.
Therefore ๐ด๐ in Algorithm 27 has rank at most one, so it suffices to set ๐ = 1. Consider
a matrix ๐ด โผ ๐ as the input to Algorithm 27. By calculation, ๐ด๐ =โ
๐ ๐๐ด๐ โจ๐ , ๐๐ด
๐ โฉ(๐ ๐ด๐ )โค,
which is a vector. For example, if ๐ = ๐๐ด1 , we obtain ๐๐ด
1 (๐๐ด1 )โค.
Since $sA$ is a vector, applying the SVD to it is equivalent to normalizing it. Therefore,
$$ v = \frac{\sum_{i=1}^n \sigma^A_i \langle s, u^A_i\rangle (v^A_i)^\top}{\sqrt{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}}. $$
Ideally, we hope that $v$ is as close to $v^A_1$ as possible, because then $A v^\top v$ is close to $\sigma^A_1 u^A_1 (v^A_1)^\top$, which captures the top singular component of $A$, i.e., the optimal rank-one approximation. More formally,
$$ A v^\top = \frac{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle\, u^A_i}{\sqrt{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}}. $$
We want to maximize its squared norm, which is
$$ \frac{\sum_{i=1}^n (\sigma^A_i)^4 \langle s, u^A_i\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}. \qquad (7.6.1) $$
We note that one can simplify (7.6.1) by considering only the contribution of the top left singular vector $u^A_1$ to the numerator (recall $\sigma^A_1 = 1$), which corresponds to the maximization of the following expression:
$$ \frac{\langle s, u^A_1\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}. \qquad (7.6.2) $$
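The closed form above admits a quick numerical sanity check. The following stdlib-only sketch (illustrative, not from the thesis) fixes a diagonal $A$, so that $u_i = v_i = e_i$ and $\langle s, u_i\rangle = s_i$, runs the $m = 1$ pipeline (compute $sA$, normalize it to obtain $v$, output $Av^\top v$), and compares the squared norm of the output with Equation (7.6.1):

```python
import math
import random

sigma = [1.0, 0.8, 0.5, 0.2]           # singular values of a diagonal A
random.seed(1)
s = [random.gauss(0.0, 1.0) for _ in sigma]
nrm = math.sqrt(sum(x * x for x in s))
s = [x / nrm for x in s]               # sketch vector, uniform on S^{n-1}

# Step 1: sA (a row vector); entry i is sigma_i * s_i since A = diag(sigma).
sA = [si * xi for si, xi in zip(sigma, s)]
# Step 2: the SVD of a vector is just normalization, giving v.
n2 = math.sqrt(sum(y * y for y in sA))
v = [y / n2 for y in sA]
# Step 3: the rank-one output is A v^T v; its squared norm equals ||A v^T||^2.
Av = [si * vi for si, vi in zip(sigma, v)]
out_sq_norm = sum(y * y for y in Av)

# Equation (7.6.1) predicts exactly this value.
den = sum(si**2 * xi**2 for si, xi in zip(sigma, s))
pred = sum(si**4 * xi**2 for si, xi in zip(sigma, s)) / den
assert abs(out_sq_norm - pred) < 1e-9
```

The same check goes through for any orthonormal $U^A, V^A$; the diagonal case only simplifies the inner products.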
7.7. Optimization Bounds

Motivated by the empirical success of sketch optimization, we investigate the complexity of optimizing the loss function. We focus on the simple case where $m = 1$ and therefore the sketch $S$ is just a (dense) vector. Our main observation is that a vector $s$ picked uniformly at random from the $n$-dimensional unit sphere achieves an approximately optimal solution, with the approximation factor depending on the maximum stable rank of the matrices $A_1, \cdots, A_N$. This algorithm is not particularly useful for our purpose, as our goal is to improve over the random choice of the sketching matrix $S$. Nevertheless, it demonstrates that an algorithm with a non-trivial approximation factor exists.
Definition 7.7.1 (stable rank ($r'$)). For a matrix $A$, the stable rank of $A$ is defined as the squared ratio of the Frobenius and operator norms of $A$, i.e.,
$$ r'(A) = \frac{\|A\|_F^2}{\|A\|_2^2} = \frac{\sum_i (\sigma^A_i)^2}{\max_i (\sigma^A_i)^2}. $$
Note that since we assume that for all matrices $A \sim \mathcal{D}$, $1 = \sigma^A_1 \geq \cdots \geq \sigma^A_n > 0$, for all these matrices $r'(A) = \sum_i (\sigma^A_i)^2$.
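Given the singular values, the stable rank is a one-line computation; a minimal helper (illustrative, not from the thesis):

```python
def stable_rank(sigma):
    """Stable rank r'(A) = ||A||_F^2 / ||A||_2^2, from the singular values of A."""
    return sum(s * s for s in sigma) / max(s * s for s in sigma)

assert stable_rank([1.0, 0.0, 0.0]) == 1.0          # rank-one matrix
assert stable_rank([1.0, 1.0, 1.0]) == 3.0          # well-conditioned: r' equals the rank
assert abs(stable_rank([1.0, 0.5]) - 1.25) < 1e-12  # equals sum of sigma_i^2 when sigma_1 = 1
```

The stable rank is always between 1 and the rank, and unlike the rank it is insensitive to tiny singular values.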
First, we consider the simplified objective function (7.6.2).

Lemma 7.7.2. A vector $s$ picked uniformly at random from the $n$-dimensional unit sphere is an $O(r')$-approximation to the optimum value of the simplified objective function in Equation (7.6.2), where $r'$ is the maximum stable rank of the matrices $A_1, \cdots, A_N$.

Proof: We will show that
$$ E\left[\frac{\langle s, u^A_1\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}\right] = \Omega(1/r'(A)) $$
for all $A \sim \mathcal{D}$, where $s$ is a vector picked uniformly at random from $\mathbb{S}^{n-1}$. Since for all $A \sim \mathcal{D}$ we have $\frac{\langle s, u^A_1\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2} \leq 1$, by the linearity of expectation the vector $s$ achieves an $O(r')$-approximation to the maximum value of the objective function
$$ \sum_{j=1}^N \frac{\langle s, u^{A_j}_1\rangle^2}{\sum_{i=1}^n (\sigma^{A_j}_i)^2 \langle s, u^{A_j}_i\rangle^2}. $$
First, recall that to sample $s$ uniformly at random from $\mathbb{S}^{n-1}$ we can generate $s$ as $\sum_{i=1}^n \alpha_i u^A_i \big/ \sqrt{\sum_{i=1}^n \alpha_i^2}$, where $\alpha_i \sim \mathcal{N}(0,1)$ for all $i \leq n$. This lets us evaluate $E\left[\frac{\langle s, u^A_1\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}\right]$ for an arbitrary matrix $A \sim \mathcal{D}$:
$$ E = E\left[\frac{\langle s, u^A_1\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}\right] = E\left[\frac{\alpha_1^2}{\sum_{i=1}^n (\sigma^A_i)^2 \alpha_i^2}\right] \geq E\left[\frac{\alpha_1^2}{\sum_{i=1}^n (\sigma^A_i)^2 \alpha_i^2} \,\Big|\, \Psi_1 \cap \Psi_2\right] \cdot \Pr(\Psi_1 \cap \Psi_2), $$
where the events $\Psi_1, \Psi_2$ are defined as
$$ \Psi_1 \triangleq \mathbf{1}\left[|\alpha_1| \geq \tfrac{1}{2}\right] \quad\text{and}\quad \Psi_2 \triangleq \mathbf{1}\left[\sum_{i=2}^n (\sigma^A_i)^2 \alpha_i^2 \leq 2\, r'(A)\right]. $$
Since the $\alpha_i$ are independent, we have
$$ E \geq E\left[\frac{\alpha_1^2}{\alpha_1^2 + 2\, r'(A)} \,\Big|\, \Psi_1 \cap \Psi_2\right] \cdot \Pr(\Psi_1)\cdot\Pr(\Psi_2) \geq \frac{1}{8\, r'(A) + 1}\cdot \Pr(\Psi_1)\cdot\Pr(\Psi_2), $$
where we used that $\frac{\alpha_1^2}{\alpha_1^2 + 2 r'(A)}$ is increasing for $\alpha_1^2 \geq \frac{1}{4}$. It remains to prove that $\Pr(\Psi_1), \Pr(\Psi_2) = \Theta(1)$. We observe that, since $\alpha_1 \sim \mathcal{N}(0,1)$, we have
$$ \Pr(\Psi_1) = \Pr\left(|\alpha_1| \geq \tfrac{1}{2}\right) = \Theta(1). $$
Similarly, by Markov's inequality, since $E\left[\sum_{i=1}^n (\sigma^A_i)^2 \alpha_i^2\right] = r'(A)$, we have
$$ \Pr(\Psi_2) \geq \Pr\left(\sum_{i=1}^n (\sigma^A_i)^2 \alpha_i^2 \leq 2\, r'(A)\right) \geq 1 - \Pr\left(\sum_{i=1}^n (\sigma^A_i)^2 \alpha_i^2 > 2\, r'(A)\right) \geq \frac{1}{2}. \qquad \square $$
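The bound of Lemma 7.7.2 can be checked by Monte Carlo simulation. The sketch below (illustrative; the trial count and test spectrum are arbitrary choices) estimates $E\left[\alpha_1^2 \big/ \sum_i (\sigma^A_i)^2 \alpha_i^2\right]$ in the Gaussian coordinates used in the proof and compares it against the $\frac{1}{8 r'(A) + 1}$ constant derived above:

```python
import random

def expected_ratio(sigma, trials=20000, seed=0):
    """Monte Carlo estimate of E[alpha_1^2 / sum_i sigma_i^2 alpha_i^2],
    i.e., the simplified objective (7.6.2) for s uniform on the sphere,
    written in the Gaussian coordinates alpha_i ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        alpha = [rng.gauss(0.0, 1.0) for _ in sigma]
        total += alpha[0] ** 2 / sum(s * s * a * a for s, a in zip(sigma, alpha))
    return total / trials

sigma = [1.0, 0.9, 0.7, 0.5, 0.3]   # sigma_1 = 1, so r'(A) = sum of sigma_i^2
r = sum(s * s for s in sigma)
est = expected_ratio(sigma)
# The proof guarantees Omega(1/r'); check against the loose 1/(8 r' + 1) constant.
assert est >= 1 / (8 * r + 1)
```

Running the same experiment with a flat spectrum ($\sigma_i \equiv 1$) gives an estimate close to $1/n$, matching the fact that $r'(A) = n$ there.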
Next, we prove that a random vector $s \in \mathbb{S}^{n-1}$ achieves an $O(r'(A))$-approximation to the optimum of the main objective function in Equation (7.6.1).

Lemma 7.7.3. A vector $s$ picked uniformly at random from the $n$-dimensional unit sphere is an $O(r')$-approximation to the optimum value of the objective function in Equation (7.6.1), where $r'$ is the maximum stable rank of the matrices $A_1, \cdots, A_N$.

Proof: We assume that the vector $s$ is generated via the same process as in the proof of Lemma 7.7.2. Since $\sigma^A_1 = 1$, the numerator of (7.6.1) is at least its first term, and it follows that
$$ E\left[\frac{\sum_{i=1}^n (\sigma^A_i)^4 \langle s, u^A_i\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}\right] \geq E\left[\frac{\alpha_1^2}{\sum_{i=1}^n (\sigma^A_i)^2 \alpha_i^2}\right] = \Omega(1/r'(A)). \qquad \square $$
7.8. Generalization Bounds

Define the loss function as
$$ \mathcal{L}(s) \triangleq -\,E_{A\sim\mathcal{D}}\left[\frac{\sum_{i=1}^n (\sigma^A_i)^4 \langle s, u^A_i\rangle^2}{\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2}\right]. $$
We want to find a vector $s \in \mathbb{S}^{n-1}$ to minimize $\mathcal{L}(s)$, where $\mathbb{S}^{n-1}$ is the $n$-dimensional unit sphere. Since $\mathcal{D}$ is unknown, we instead optimize the training loss
$$ \mathcal{L}_{Tr}(s) \triangleq -\,\frac{1}{N}\sum_{j=1}^N \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{\sum_{i=1}^n (\sigma^{A_j}_i)^2 \langle s, u^{A_j}_i\rangle^2}. $$
The importance of robust solutions. We start by observing that if $s$ minimizes the training loss $\mathcal{L}_{Tr}$, it is not necessarily true that $s$ is a near-optimal solution for the population loss $\mathcal{L}$. For example, it could be the case that the training matrices $\{A_j\}_{j=1,\cdots,N}$ are diagonal matrices whose only non-zero entry is a 1 in the top-left corner, while $s = (\gamma, \sqrt{1-\gamma^2}, 0, \cdots, 0)$ for $\gamma$ close to 0. In this case, we know that $\mathcal{L}_{Tr}(s) = -1$, which is its minimum value.

However, such a solution is not robust. If the population distribution contains a matrix $A = \mathrm{diag}(\sqrt{1-100\gamma^2}, 10\gamma, 0, 0, \cdots, 0)$, then inserting $s$ into (7.6.1) gives
$$ \frac{(1-100\gamma^2)^2\gamma^2 + 10^4\gamma^4(1-\gamma^2)}{(1-100\gamma^2)\gamma^2 + 100\gamma^2(1-\gamma^2)} < \frac{\gamma^2 + 10^4\gamma^4}{101\gamma^2 - 200\gamma^4} = \frac{1 + 10^4\gamma^2}{101 - 200\gamma^2}. $$
The upper bound is very close to 0 if $\gamma$ is small enough. This is because when the denominator is extremely small, the whole expression is susceptible to minor perturbations of $A$. This is a typical example showing the importance of finding a robust solution. Because of this issue, we will show a generalization guarantee for robust solutions $s$.
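The arithmetic of this example can be verified numerically. In the stdlib-only sketch below (illustrative; $\gamma$ is the small parameter of the construction), the training matrix attains objective value 1, i.e., training loss $-1$, while the perturbed population matrix drives the value of (7.6.1) down to roughly $1/101$:

```python
import math

def objective(sigma, s):
    """Objective (7.6.1) for a diagonal matrix with singular values sigma,
    so that <s, u_i> = s_i."""
    num = sum(si**4 * xi**2 for si, xi in zip(sigma, s))
    den = sum(si**2 * xi**2 for si, xi in zip(sigma, s))
    return num / den

g = 1e-3
s = [g, math.sqrt(1 - g * g), 0.0, 0.0]

# Training matrices diag(1, 0, 0, 0): objective value 1, hence L_Tr(s) = -1.
assert abs(objective([1.0, 0.0, 0.0, 0.0], s) - 1.0) < 1e-12
# Perturbed population matrix: the objective collapses to about 1/101.
val = objective([math.sqrt(1 - 100 * g * g), 10 * g, 0.0, 0.0], s)
assert val < 0.05
```

So a tiny, adversarially aligned perturbation of the spectrum moves the loss from its minimum to nearly its maximum, which is exactly the failure mode the robustness definition below rules out.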
Definition of robust solutions. First, define the event
$$ E_{A,\delta,s} \triangleq \mathbf{1}\left[\sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2 < \delta\right], $$
i.e., the event that the denominator in the loss function is small. Ideally, we want this event to happen with small probability, which indicates that for most matrices the denominator is large, and therefore $s$ is robust in general. We use the following definition of robustness.
Definition 7.8.1 (($p, \delta$)-robustness). $s$ is $(p,\delta)$-robust with respect to $\mathcal{D}$ if $E_{A\sim\mathcal{D}}[E_{A,\delta,s}] \leq p$. $s$ is $(p,\delta)$-robust with respect to $Tr$ if $E_{A\sim Tr}[E_{A,\delta,s}] \leq p$.
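Checking empirical robustness on a training set is direct. The helper below is an illustrative sketch (it restricts to diagonal matrices so that $\langle s, u_i\rangle = s_i$; the function name and data layout are assumptions):

```python
def is_robust(train_sigmas, s, p, delta):
    """Check (p, delta)-robustness of s with respect to a training set, given
    as lists of singular values of diagonal matrices (so <s, u_i> = s_i)."""
    bad = 0
    for sigma in train_sigmas:
        den = sum(si * si * xi * xi for si, xi in zip(sigma, s))
        if den < delta:          # the event E_{A, delta, s} occurred
            bad += 1
    return bad / len(train_sigmas) <= p

train = [[1.0, 0.5, 0.0], [1.0, 0.1, 0.0], [1.0, 0.0, 0.9]]
s = [1.0, 0.0, 0.0]              # aligned with every top singular direction
assert is_robust(train, s, p=0.0, delta=0.5)           # every denominator is 1
assert not is_robust(train, [0.0, 0.0, 1.0], p=0.1, delta=0.5)
```

Lemma 7.8.3 below is precisely the statement that this empirical check transfers, with high probability, from the training set to the population.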
For a given $\mathcal{D}$, we can define the robust solution set that includes all robust vectors.

Definition 7.8.2 (($p, \delta$)-robust set). $\mathcal{M}_{\mathcal{D},p,\delta}$ is defined to be the set of all vectors $s \in \mathbb{S}^{n-1}$ such that $s$ is $(p,\delta)$-robust with respect to $\mathcal{D}$.
Estimating $\mathcal{M}_{\mathcal{D},p,\delta}$. The drawback of the above definition is that $\mathcal{M}_{\mathcal{D},p,\delta}$ is defined by the unknown distribution $\mathcal{D}$, so for fixed $\delta$ and $p$ we cannot tell whether $s$ is in $\mathcal{M}_{\mathcal{D},p,\delta}$ or not. However, we can estimate the robustness of $s$ using the training set. Specifically, we have the following lemma.

Lemma 7.8.3 (Estimating robustness). For a training set $Tr$ of size $N$ sampled uniformly at random from $\mathcal{D}$, a given $s \in \mathbb{R}^n$, and a constant $1 > \varepsilon > 0$, if $s$ is $(p,\delta)$-robust with respect to $Tr$, then with probability at least $1 - e^{-\varepsilon^2 p N/2}$, $s$ is $\left(\frac{p}{1-\varepsilon}, \delta\right)$-robust with respect to $\mathcal{D}$.
Proof: Suppose that $\Pr_{A\sim\mathcal{D}}[E_{A,\delta,s}] = p_1$, which means $E\left[\sum_{A_j \in Tr} E_{A_j,\delta,s}\right] = p_1 N$. Since the events $E_{A_j,\delta,s}$ are independent 0-1 random variables, by the Chernoff bound,
$$ \Pr\left(\sum_{A_j \in Tr} E_{A_j,\delta,s} \leq (1-\varepsilon)\, p_1 N\right) \leq e^{-\varepsilon^2 p_1 N/2}. $$
If $p_1 \leq p$, our claim is immediately true since $p \leq \frac{p}{1-\varepsilon}$. Otherwise, $p_1 > p$ and hence $e^{-\varepsilon^2 p_1 N/2} \leq e^{-\varepsilon^2 p N/2}$. Thus, with probability at least $1 - e^{-\varepsilon^2 p N/2}$, we have $pN \geq \sum_{A_j \in Tr} E_{A_j,\delta,s} > (1-\varepsilon)\, p_1 N$, where the first inequality uses that $s$ is $(p,\delta)$-robust with respect to $Tr$. This implies that with probability at least $1 - e^{-\varepsilon^2 p N/2}$, $p_1 \leq \frac{p}{1-\varepsilon}$. $\square$
Lemma 7.8.3 implies that for a fixed solution $s$, if it is $(p,\delta)$-robust on $Tr$, it is also $(O(p),\delta)$-robust on $\mathcal{D}$ with high probability. However, Lemma 7.8.3 only works for a single solution $s$, while there are infinitely many potential solutions on the $n$-dimensional unit sphere. To remedy this problem, we discretize the unit sphere to bound the number of potential solutions. Classical results tell us that discretizing the unit sphere into a grid of edge length $\lambda/\sqrt{n}$ gives $(C/\lambda)^n$ points on the grid for some constant $C$ (e.g., see Section 3.3 in [98] for more details). We will only consider these points as potential solutions, and denote this set by $\mathcal{B}_\lambda$. Thus, we can find a "robust" solution $s \in \mathcal{B}_\lambda$ with decent probability, using Lemma 7.8.3 and a union bound.

Lemma 7.8.4 (Picking a robust $s$). For fixed constants $p > 0$ and $1 > \varepsilon > 0$, with probability at least $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/2}$, any $s \in \mathcal{B}_\lambda$ that is $(p,\delta)$-robust with respect to $Tr$ is $\left(\frac{p}{1-\varepsilon}, \delta\right)$-robust with respect to $\mathcal{D}$.
Since we are working with the discretized solution set, we need a corresponding definition of the robust set.

Definition 7.8.5 (Discretized ($p, \delta$)-robust set). $\mathcal{M}_{\mathcal{D},p,\delta}$ is defined to be the set of all vectors $s \in \mathcal{B}_\lambda$ such that $s$ is $(p,\delta)$-robust with respect to $\mathcal{D}$.

Using similar arguments as in Lemma 7.8.4, all solutions in $\mathcal{M}_{\mathcal{D},p,\delta}$ are robust with respect to $Tr$ as well.

Lemma 7.8.6. For a constant $\varepsilon > 0$, with probability at least $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/3}$, all solutions in $\mathcal{M}_{\mathcal{D},p,\delta}$ are $((1+\varepsilon)p, \delta)$-robust with respect to $Tr$.
Proof: Consider a fixed solution $s \in \mathcal{M}_{\mathcal{D},p,\delta}$. Note that $E\left[\sum_{A_j \in Tr} E_{A_j,\delta,s}\right] \leq pN$ and the $E_{A_j,\delta,s}$ are independent 0-1 random variables. Therefore, by the Chernoff bound,
$$ \Pr\left(\sum_{A_j \in Tr} E_{A_j,\delta,s} \geq (1+\varepsilon)\, pN\right) \leq e^{-\varepsilon^2 p N/3}. $$
Hence, with probability at least $1 - e^{-\varepsilon^2 p N/3}$, $s$ is $((1+\varepsilon)p, \delta)$-robust with respect to $Tr$. A union bound over all points in $\mathcal{M}_{\mathcal{D},p,\delta} \subseteq \mathcal{B}_\lambda$ completes the proof. $\square$
7.8.1. Generalization Bound for Robust Solutions

Finally, we show the generalization bounds for robust solutions. To this end, we use Rademacher complexity to prove a generalization bound. Define the Rademacher complexity $\mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr)$ as
$$ \mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr) \triangleq \frac{1}{N}\, E_{\xi \sim \{\pm 1\}^N} \sup_{s \in \mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j\, \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{\sum_{i=1}^n (\sigma^{A_j}_i)^2 \langle s, u^{A_j}_i\rangle^2}. $$
The quantity $\mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr)$ is handy because we have the following theorem (notice that the loss function takes values in $[-1, 0]$):

Theorem 7.8.7 (Theorem 26.5 in [160]). Given a constant $\delta_0 > 0$, with probability at least $1 - \delta_0$, for all $s \in \mathcal{M}_{\mathcal{D},p,\delta}$,
$$ \mathcal{L}(s) - \mathcal{L}_{Tr}(s) \leq 2\,\mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr) + 4\sqrt{\frac{2\log(4/\delta_0)}{N}}. $$
That means it suffices to bound $\mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr)$ to get the generalization bound.
Lemma 7.8.8 (Bound on $\mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr)$). For a constant $\varepsilon > 0$, with probability at least $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/3}$,
$$ \mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr) \leq (1+\varepsilon)\,p + \frac{1-\delta}{2\delta} + \frac{n}{\sqrt{N}}. $$

Proof: Define $p' = (1+\varepsilon)p$. By Lemma 7.8.6, we know that with probability $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/3}$, any $s \in \mathcal{M}_{\mathcal{D},p,\delta}$ is $(p',\delta)$-robust with respect to $Tr$, hence $\sum_{A \in Tr} E_{A,\delta,s} \leq p'N$. The analysis below is conditioned on this event.

Define $h_{A,\delta,s} \triangleq \max\left\{\delta,\ \sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2\right\}$. We know that, conditioned on the above event,
$$ N \cdot \mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr) = E_{\xi\sim\{\pm1\}^N} \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j\, \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{\sum_{i=1}^n (\sigma^{A_j}_i)^2 \langle s, u^{A_j}_i\rangle^2} \leq p'N + E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j\, \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{h_{A_j,\delta,s}} \qquad (7.8.1) $$
where (7.8.1) holds because, by definition, $h_{A,\delta,s} > \sum_{i=1}^n (\sigma^A_i)^2 \langle s, u^A_i\rangle^2$ if and only if $E_{A,\delta,s} = 1$, which happens for at most $p'N$ matrices. Note that for any matrix $A_j$,
$$ \xi_j\, \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{\sum_{i=1}^n (\sigma^{A_j}_i)^2 \langle s, u^{A_j}_i\rangle^2} \leq 1. $$
Now,
$$ E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j\, \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{h_{A_j,\delta,s}} \leq E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \left( \mathbf{1}_{\xi_j = 1}\, \frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{\delta} - \mathbf{1}_{\xi_j = -1} \sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2 \right) \qquad (7.8.2) $$
$$ = E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \left( \xi_j \sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2 + \mathbf{1}_{\xi_j = 1} \sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2 \left(\frac{1}{\delta} - 1\right) \right) $$
$$ \leq \frac{N}{2\delta} - \frac{N}{2} + E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j \sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2 \qquad (7.8.3) $$
The first inequality (7.8.2) holds since $h_{A_j,\delta,s} \in [\delta, 1]$, so each ratio $\frac{\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2}{h_{A_j,\delta,s}}$ lies between $\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2$ and $\frac{1}{\delta}\sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2$. It remains to bound the last term in (7.8.3).
$$ E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j \sum_{i=1}^n (\sigma^{A_j}_i)^4 \langle s, u^{A_j}_i\rangle^2 \leq \sum_{i=1}^n E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j \left\langle s,\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\rangle^2 \qquad (7.8.4) $$
By the contraction lemma for Rademacher complexity, for each $i$ we have
$$ E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j \left\langle s,\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\rangle^2 \leq E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \sum_{j=1}^N \xi_j \left\langle s,\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\rangle = E_\xi \sup_{s\in\mathcal{M}_{\mathcal{D},p,\delta}} \left\langle s,\, \sum_{j=1}^N \xi_j\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\rangle \leq E_\xi \left\| \sum_{j=1}^N \xi_j\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\|_2, $$
where the last inequality is by the Cauchy-Schwarz inequality. Now, by Jensen's inequality,
$$ E_\xi \left\| \sum_{j=1}^N \xi_j\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\|_2 \leq \left( E_\xi \left\| \sum_{j=1}^N \xi_j\, (\sigma^{A_j}_i)^2 u^{A_j}_i \right\|_2^2 \right)^{1/2} = \left( \sum_{j=1}^N (\sigma^{A_j}_i)^4 \right)^{1/2} \leq \sqrt{N}, \qquad (7.8.5) $$
so the right-hand side of (7.8.4) is at most $n\sqrt{N}$. Combining (7.8.1), (7.8.3), (7.8.4) and (7.8.5), we have $\mathcal{R}(\mathcal{M}_{\mathcal{D},p,\delta} \circ Tr) \leq p' + \frac{1-\delta}{2\delta} + \frac{n}{\sqrt{N}}$. $\square$
Combining with Theorem 7.8.7, we get our main theorem:

Theorem 7.8.9 (Main Theorem). Given a training set $Tr = \{A_j\}_{j=1}^N$ sampled uniformly from $\mathcal{D}$, and fixed constants $1 > p \geq 0$, $\delta > 0$, $1 > \varepsilon > 0$, if there exists a $(p,\delta)$-robust solution $s \in \mathcal{B}_\lambda$ with respect to $Tr$, then with probability at least $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/2} - (C/\lambda)^n\, e^{-\varepsilon^2 p N/(3(1-\varepsilon))}$, any $s \in \mathcal{B}_\lambda$ that is a $(p,\delta)$-robust solution with respect to $Tr$ satisfies
$$ \mathcal{L}(s) \leq \mathcal{L}_{Tr}(s) + \frac{2(1+\varepsilon)\,p}{1-\varepsilon} + \frac{1-\delta}{\delta} + \frac{2n}{\sqrt{N}} + 4\sqrt{\frac{2\log(4/\delta)}{N}}. $$

Proof: Since we can find $s \in \mathcal{B}_\lambda$ such that $s$ is $(p,\delta)$-robust with respect to $Tr$, by Lemma 7.8.4, with probability $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/2}$, $s$ is $\left(\frac{p}{1-\varepsilon}, \delta\right)$-robust with respect to $\mathcal{D}$. Therefore, $s \in \mathcal{M}_{\mathcal{D},\frac{p}{1-\varepsilon},\delta}$. By Lemma 7.8.8 (applied with robustness parameter $\frac{p}{1-\varepsilon}$), with probability at least $1 - (C/\lambda)^n\, e^{-\varepsilon^2 p N/(3(1-\varepsilon))}$, $\mathcal{R}(\mathcal{M}_{\mathcal{D},\frac{p}{1-\varepsilon},\delta} \circ Tr) \leq \frac{(1+\varepsilon)\,p}{1-\varepsilon} + \frac{1-\delta}{2\delta} + \frac{n}{\sqrt{N}}$. Combined with Theorem 7.8.7, the proof is complete. $\square$

In summary, Theorem 7.8.9 states that if we can find a robust solution $s$ that "fits" the training set, then it also generalizes to the test set.
Bibliography
[1] A. Aamand, P. Indyk, and A. Vakilian. (Learned) frequency estimation algorithms under Zipfian distribution. arXiv preprint arXiv:1908.05198, 2019.
[2] Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 32-40, 2013.
[3] P. K. Agarwal and J. Pan. Near-linear algorithms for geometric hitting sets and set covers. In Proc. 30th Annu. Sympos. Comput. Geom. (SoCG), pages 271-279, 2014.
[4] A. Aghazadeh, R. Spring, D. LeJeune, G. Dasarathy, A. Shrivastava, and R. G. Baraniuk. MISSION: Ultra large-scale feature selection using count-sketches. In International Conference on Machine Learning (ICML), 2018.
[5] S. Agrawal, M. Shadravan, and C. Stein. Submodular secretary problem with shortlists. In 10th Innovations in Theoretical Computer Science Conference (ITCS), pages 1:1-1:19, 2019.
[6] K. J. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. In Proc. 38th Int. Colloq. Automata Lang. Prog. (ICALP), pages 526-538, 2011.
[7] K. J. Ahn and S. Guha. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. In Proc. 27th ACM Sympos. Parallel Alg. Arch. (SPAA), pages 202-211, 2015.
[8] K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In Proc. 23rd ACM-SIAM Sympos. Discrete Algs. (SODA), pages 459-467, 2012.
[9] K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In Proc. 31st ACM Sympos. on Principles of Database Systems (PODS), pages 5-14, 2012.
[10] Z. Allen-Zhu and Y. Li. LazySVD: even faster SVD decomposition yet without agonizing pain. In Advances in Neural Information Processing Systems, pages 974-982, 2016.
[11] Z. Allen-Zhu and L. Orecchia. Nearly-linear time positive LP solver with faster convergence rate. In Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), pages 229-236, 2015.
[12] Z. Allen-Zhu and L. Orecchia. Using optimization to break the epsilon barrier: A faster and simpler width-independent algorithm for solving positive linear programs in parallel. In Proc. 26th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1439-1456, 2015.
[13] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137-147, 1999.
[14] N. Alon, D. Moshkovitz, and S. Safra. Algorithmic construction of sets for k-restrictions. ACM Trans. Algo., 2(2):153-177, 2006.
[15] B. Aronov, E. Ezra, and M. Sharir. Small-size ε-nets for axis-parallel rectangles and boxes. SIAM J. Comput., 39(7):3248-3282, 2010.
[16] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121-164, 2012.
[17] S. Assadi. Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In Proc. 36th ACM Sympos. on Principles of Database Systems (PODS), pages 321-335, 2017.
[18] S. Assadi, N. Karpov, and Q. Zhang. Distributed and streaming linear programming in low dimensions. In Proc. 38th ACM Sympos. on Principles of Database Systems (PODS), pages 236-253, 2019.
[19] S. Assadi and S. Khanna. Tight bounds on the round complexity of the distributed maximum coverage problem. In Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 2412-2431, 2018.
[20] S. Assadi, S. Khanna, and Y. Li. Tight bounds for single-pass streaming complexity of the set cover problem. In Proc. 48th Annu. ACM Sympos. Theory Comput. (STOC), pages 698-711, 2016.
[21] S. Assadi, S. Khanna, Y. Li, and G. Yaroslavtsev. Maximum matchings in dynamic graph streams and the simultaneous communication model. In Proc. 27th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1345-1364, 2016.
[22] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 671-680, 2014.
[23] B. Bahmani, A. Goel, and K. Munagala. Efficient primal-dual graph algorithms for MapReduce. In International Workshop on Algorithms and Models for the Web-Graph, pages 59-78, 2014.
[24] M.-F. Balcan, T. Dick, T. Sandholm, and E. Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), pages 353-362, 2018.
[25] L. Baldassarre, Y.-H. Li, J. Scarlett, B. Gözcü, I. Bogunovic, and V. Cevher. Learning-based compressive subsampling. IEEE Journal of Selected Topics in Signal Processing, 10(4):809-822, 2016.
[26] N. Bansal, A. Caprara, and M. Sviridenko. A new approximation method for set covering problems, with applications to multidimensional bin packing. SIAM Journal on Computing, 39(4):1256-1278, 2009.
[27] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Sys. Sci., 68(4):702-732, 2004.
[28] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 1-10, 2002.
[29] M. Bateni, H. Esfandiari, and V. S. Mirrokni. Almost optimal streaming algorithms for coverage problems. In Proc. 29th ACM Sympos. Parallel Alg. Arch. (SPAA), pages 13-23, 2017.
[30] G. Bennett. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57(297):33-45, 1962.
[31] S. Bhattacharya, M. Henzinger, D. Nanongkai, and C. Tsourakakis. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), pages 173-182, 2015.
[32] J. Blasiok. Optimal streaming and tracking distinct elements with high probability. In Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 2432-2448, 2018.
[33] A. Bora, A. Jalal, E. Price, and A. G. Dimakis. Compressed sensing using generative models. In International Conference on Machine Learning (ICML), pages 537-546, 2017.
[34] C. Boutsidis and A. Gittens. Improved matrix algorithms via the subsampled randomized Hadamard transform. SIAM Journal on Matrix Analysis and Applications, 34(3):1301-1340, 2013.
[35] R. S. Boyer and J. S. Moore. MJRTY - a fast majority vote algorithm. In Automated Reasoning, pages 105-117. 1991.
[36] O. Boykin, A. Bryant, E. Chen, ellchow, M. Gagnon, M. Nakamura, S. Noble, S. Ritchie, A. Singhal, and A. Zymnis. Algebird. https://twitter.github.io/algebird/, 2016.
[37] V. Braverman, S. R. Chestnut, N. Ivkin, J. Nelson, Z. Wang, and D. P. Woodruff. BPTree: An ℓ2 heavy hitters algorithm using constant memory. In Proc. 36th ACM Sympos. on Principles of Database Systems (PODS), pages 361-376, 2017.
[38] V. Braverman, S. R. Chestnut, N. Ivkin, and D. P. Woodruff. Beating CountSketch for heavy hitters in insertion streams. In Proc. 48th Annu. ACM Sympos. Theory Comput. (STOC), pages 740-753, 2016.
[39] CAIDA. CAIDA internet traces 2016 Chicago. http://www.caida.org/data/monitors/passive-equinix-chicago.xml.
[40] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489-509, 2006.
[41] A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In 18th Annual IEEE Conference on Computational Complexity, pages 107-117, 2003.
[42] A. Chakrabarti and A. Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In Proc. 27th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1365-1373, 2016.
[43] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proc. 29th Int. Colloq. Automata Lang. Prog. (ICALP), pages 693-703, 2002.
[44] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proc. 29th Int. Colloq. Automata Lang. Prog. (ICALP), pages 693-703, 2002.
[45] B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning tree weight in sublinear time. SIAM Journal on Computing, 34(6):1370-1379, 2005.
[46] C. Chekuri, S. Gupta, and K. Quanrud. Streaming algorithms for submodular function maximization. In International Colloquium on Automata, Languages, and Programming, pages 318-330, 2015.
[47] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1):127-138, 2017.
[48] F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In Proc. 19th Int. Conf. World Wide Web (WWW), pages 231-240, 2010.
[49] R. Chitnis, G. Cormode, H. Esfandiari, M. Hajiaghayi, A. McGregor, M. Monemizadeh, and S. Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In Proc. 27th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1326-1344, 2016.
[50] K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, II. Discrete Comput. Geom., 4:387-421, 1989.
[51] K. L. Clarkson and K. Varadarajan. Improved approximation algorithms for geometric set cover. Discrete Comput. Geom., 37(1):43-58, 2007.
[52] K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In Proc. 41st Annu. ACM Sympos. Theory Comput. (STOC), pages 205-214, 2009.
[53] K. L. Clarkson and D. P. Woodruff. Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM), 63(6):54, 2017.
[54] M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), pages 163-172, 2015.
[55] G. Cormode and M. Hadjieleftheriou. Finding frequent items in data streams. Proceedings of the VLDB Endowment, 1(2):1530-1541, 2008.
[56] G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In Proc. 19th ACM Conf. Info. Know. Manag. (CIKM), pages 479-488, 2010.
[57] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, 2005.
[58] G. Cormode and S. Muthukrishnan. Summarizing and mining skewed data streams. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 44-55. SIAM, 2005.
[59] H. Dai, E. Khalil, Y. Zhang, B. Dilkina, and L. Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6351-6361, 2017.
[60] D. Davidov, E. Gabrilovich, and S. Markovitch. Parameterized generation of labeled datasets for text categorization based on a hierarchical directory. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '04, pages 250-257, 2004.
[61] E. D. Demaine, P. Indyk, S. Mahabadi, and A. Vakilian. On streaming and communication complexity of the set cover problem. In Proc. 28th Int. Symp. Dist. Comp. (DISC), volume 8784, pages 484-498, 2014.
[62] E. D. Demaine, A. López-Ortiz, and J. I. Munro. Frequency estimation of internet packet streams with limited space. In Proc. 10th Annu. European Sympos. Algorithms (ESA), pages 348-360, 2002.
[63] I. Dinur and D. Steurer. Analytical approach to parallel repetition. In Proc. 46th Annu. ACM Sympos. Theory Comput. (STOC), pages 624-633, 2014.
[64] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289-1306, 2006.
[65] P. Drineas, R. Kannan, and M. W. Mahoney. Sampling sub-problems of heterogeneous Max-Cut problems and approximation algorithms. In Annual Symposium on Theoretical Aspects of Computer Science, pages 57-68, 2005.
[66] F. Dzogang, T. Lansdall-Welfare, S. Sudhahar, and N. Cristianini. Scalable preference learning from data streams. In Proc. 24th Int. Conf. World Wide Web (WWW), pages 885-890, 2015.
[67] Y. Emek and A. Rosén. Semi-streaming set cover. In Proc. 41st Int. Colloq. Automata Lang. Prog. (ICALP), volume 8572 of Lect. Notes in Comp. Sci., pages 453-464, 2014.
[68] A. Ene, S. Har-Peled, and B. Raichel. Geometric packing under non-uniform constraints. In Proc. 28th Annu. Sympos. Comput. Geom. (SoCG), pages 11-20, 2012.
[69] P. Erdős. On a lemma of Littlewood and Offord. Bulletin of the American Mathematical Society, 51(12):898-902, 1945.
[70] C. Estan and G. Varghese. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Transactions on Computer Systems (TOCS), 21(3):270-313, 2003.
[71] T. Feder and D. H. Greene. Optimal algorithms for approximate clustering. In Proc. 20th Annu. ACM Sympos. Theory Comput. (STOC), pages 434-444, 1988.
[72] U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634-652, 1998.
[73] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207-216, 2005.
[74] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182-209, 1985.
[75] R. J. Fowler, M. S. Paterson, and S. L. Tanimoto. Optimal packing and covering in the plane are NP-complete. Inform. Process. Lett., 12(3):133-137, 1981.
[76] N. Garg and J. Koenemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. SIAM Journal on Computing, 37(2):630-652, 2007.
[77] M. Ghashami, E. Liberty, J. M. Phillips, and D. P. Woodruff. Frequent directions: Simple and deterministic matrix sketching. SIAM Journal on Computing, 45(5):1762-1792, 2016.
[78] M. Ghashami and J. M. Phillips. Relative errors for deterministic low-rank matrix approximations. In Proc. 25th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 707-717, 2014.
[79] A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937-947, 2010.
[80] A. Goel, M. Kapralov, and S. Khanna. Perfect matchings in O(n log n) time in regular bipartite graphs. SIAM Journal on Computing, 42(3):1392-1404, 2013.
[81] S. Gollapudi and D. Panigrahi. Online algorithms for rent-or-buy with expert advice. In International Conference on Machine Learning (ICML), pages 2319-2327, 2019.
[82] A. Goyal, H. Daumé III, and G. Cormode. Sketch algorithms for estimating point queries in NLP. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1093-1103, 2012.
[83] A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
[84] M. D. Grigoriadis and L. G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53-58, 1995.
[85] A. Gronemeier. Asymptotically optimal lower bounds on the NIH-multi-party information complexity of the AND-function and disjointness. In 26th International Symposium on Theoretical Aspects of Computer Science (STACS), pages 505-516, 2009.
[86] T. Grossman and A. Wool. Computational experience with approximation algorithms for the set covering problem. Euro. J. Oper. Res., 101(1):81-92, 1997.
[87] S. Guha and A. McGregor. Lower bounds for quantile estimation in random-order and multi-pass streaming. In Proc. 34th Int. Colloq. Automata Lang. Prog. (ICALP), volume 4596 of Lect. Notes in Comp. Sci., pages 704-715, 2007.
[88] S. Guha and A. McGregor. Tight lower bounds for multi-pass stream computation via pass elimination. In Proc. 35th Int. Colloq. Automata Lang. Prog. (ICALP), volume 5125 of Lect. Notes in Comp. Sci., pages 760-772, 2008.
[89] S. Guha, A. McGregor, and D. Tench. Vertex and hyperedge connectivity in dynamic graph streams. In Proc. 34th ACM Sympos. on Principles of Database Systems (PODS), pages 241-247, 2015.
[90] V. Guruswami and K. Onak. Superlinear lower bounds for multipass graph processing. In Proc. 28th Conf. Comput. Complexity (CCC), pages 287-298, 2013.
[91] A. Hadian and T. Heinis. Interpolation-friendly B-trees: Bridging the gap between algorithmic and learned indexes. In Advances in Database Technology - 22nd International Conference on Extending Database Technology (EDBT), pages 710-713, 2019.
[92] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.
[93] S. Han, J. Kang, H. Mao, Y. Hu, X. Li, Y. Li, D. Xie, H. Luo, S. Yao, Y. Wang, et al. ESE: Efficient speech recognition engine with sparse LSTM on FPGA. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pages 75-84, 2017.
[94] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: efficient inference engine on compressed deep neural network. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 243-254, 2016.
[95] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
[96] P. Hand and V. Voroninski. Global guarantees for enforcing deep generative priors by empirical risk. In Conference On Learning Theory (COLT), pages 970-978, 2018.
[97] S. Har-Peled, P. Indyk, S. Mahabadi, and A. Vakilian. Towards tight bounds for the streaming set cover problem. In Proc. 35th ACM Sympos. on Principles of Database Systems (PODS), pages 371-383, 2016.
[98] S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(1):321-350, 2012.
[99] S. Har-Peled and K. Quanrud. Approximation algorithms for polynomial-expansion and low-density graphs. In Proc. 23rd Annu. European Sympos. Algorithms (ESA), volume 9294 of Lect. Notes in Comp. Sci., pages 717-728.
[100] S. Har-Peled and M. Sharir. Relative (p, ε)-approximations in geometry. Discrete Comput. Geom., 45(3):462-496, 2011.
[101] H. He, H. Daume III, and J. M. Eisner. Learning to search in branch and bound algorithms. In Advances in Neural Information Processing Systems (NeurIPS), pages 3293-3301, 2014.
[102] M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. pages 107-118, 1998.
[103] C.-Y. Hsu, P. Indyk, D. Katabi, and A. Vakilian. Learning-based frequency estimation algorithms. International Conference on Learning Representations (ICLR), 2019.
[104] N. Imamoglu, Y. Oishi, X. Zhang, G. Ding, Y. Fang, T. Kouyama, and R. Nakamura. Hyperspectral image dataset for benchmarking on salient object detection. In Tenth International Conference on Quality of Multimedia Experience (QoMEX), pages 1-3, 2018.
[105] P. Indyk, S. Mahabadi, R. Rubinfeld, J. Ullman, A. Vakilian, and A. Yodpinyanee. Fractional set cover in the streaming model. Approximation, Randomization, and Combinatorial Optimization (APPROX/RANDOM), pages 198-217, 2017.
[106] P. Indyk, S. Mahabadi, R. Rubinfeld, A. Vakilian, and A. Yodpinyanee. Set cover in sub-linear time. In Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 2467-2486, 2018.
[107] P. Indyk and A. Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In Proc. 38th ACM Sympos. on Principles of Database Systems (PODS), pages 200–217, 2019.
[108] P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data streams. In Proc. 37th Annu. ACM Sympos. Theory Comput. (STOC), pages 202–208, 2005.
[109] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.
[110] H. Jowhari, M. Sağlam, and G. Tardos. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In Proc. 30th ACM Sympos. on Principles of Database Systems (PODS), pages 49–58, 2011.
[111] B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992.
[112] D. M. Kane, J. Nelson, and D. P. Woodruff. Revisiting norm estimation in data streams. arXiv preprint arXiv:0811.3648, 2008.
[113] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. 29th ACM Sympos. on Principles of Database Systems (PODS), pages 41–52, 2010.
[114] M. Kapralov, Y. T. Lee, C. Musco, C. Musco, and A. Sidford. Single pass spectral sparsification in dynamic streams. SIAM Journal on Computing, 46(1):456–477, 2017.
[115] N. Karmarkar and R. M. Karp. An efficient approximation scheme for the one-dimensional bin-packing problem. In Proc. 23rd Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 312–320, 1982.
[116] R. M. Karp, S. Shenker, and C. H. Papadimitriou. A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS), 28(1):51–55, 2003.
[117] M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
[118] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming. Adaptive neural signal detection for massive MIMO. CoRR, abs/1906.04610, 2019.
[119] C. Koufogiannakis and N. E. Young. A nearly linear-time PTAS for explicit fractional packing and covering linear programs. Algorithmica, 70(4):648–674, 2014.
[120] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 489–504, 2018.
[121] F. Kuhn, T. Moscibroda, and R. Wattenhofer. The price of being near-sighted. In Proc. 17th ACM-SIAM Sympos. Discrete Algs. (SODA), 2006.
[122] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in MapReduce and streaming. In Proc. 25th ACM Sympos. Parallel Alg. Arch. (SPAA), pages 1–10, 2013.
[123] S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proc. 23rd ACM Sympos. Parallel Alg. Arch. (SPAA), pages 85–94, 2011.
[124] Y. T. Lee and A. Sidford. Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In Proc. 55th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 424–433, 2014.
[125] E. Liberty. Simple and deterministic matrix sketching. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 581–588, 2013.
[126] J. E. Littlewood and A. C. Offord. On the number of real roots of a random algebraic equation. II. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 35, pages 133–148. Cambridge University Press, 1939.
[127] Z. Liu, A. Manousis, G. Vorsanger, V. Sekar, and V. Braverman. One sketch to rule them all: Rethinking network flow monitoring with UnivMon. In Proceedings of the 2016 ACM SIGCOMM Conference, pages 101–114, 2016.
[128] T. Lykouris and S. Vassilvitskii. Competitive caching with machine learned advice. In International Conference on Machine Learning (ICML), pages 3302–3311, 2018.
[129] G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD Record, 27(2):426–435, 1998.
[130] S. Marko and D. Ron. Distance approximation in bounded-degree and general sparse graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 475–486, 2006.
[131] A. McGregor and H. T. Vu. Better streaming algorithms for the maximum coverage problem. In 20th International Conference on Database Theory (ICDT), pages 22:1–22:18, 2017.
[132] X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proc. 45th Annu. ACM Sympos. Theory Comput. (STOC), pages 91–100, 2013.
[133] A. Metwally, D. Agrawal, and A. El Abbadi. Efficient computation of frequent and top-k elements in data streams. In International Conference on Database Theory (ICDT), pages 398–412, 2005.
[134] C. Metzler, A. Mousavi, and R. Baraniuk. Learned D-AMP: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems (NeurIPS), pages 1772–1783, 2017.
[135] G. T. Minton and E. Price. Improved concentration bounds for count-sketch. In Proc. 25th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 669–686, 2014.
[136] J. Misra and D. Gries. Finding repeated elements. Science of Computer Programming, 2(2):143–152, 1982.
[137] M. Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems (NeurIPS), pages 464–473, 2018.
[138] D. Moshkovitz. The projection games conjecture and the NP-hardness of ln n-approximating set-cover. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 276–287, 2012.
[139] R. Motwani and P. Raghavan. Randomized Algorithms. Chapman & Hall/CRC, 2010.
[140] A. Mousavi, G. Dasarathy, and R. G. Baraniuk. DeepCodec: Adaptive sensing and recovery via deep convolutional neural networks. arXiv preprint arXiv:1707.03386, 2017.
[141] A. Mousavi, A. B. Patel, and R. G. Baraniuk. A deep learning approach to structured signal recovery. In Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing, pages 1336–1343, 2015.
[142] N. H. Mustafa, R. Raman, and S. Ray. Settling the APX-hardness status for geometric set cover. In Proc. 55th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 541–550, 2014.
[143] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends® in Theoretical Computer Science, 1(2):117–236, 2005.
[144] J. Nelson and H. L. Nguyễn. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Proc. 54th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 117–126, 2013.
[145] H. N. Nguyen and K. Onak. Constant-time approximation algorithms via local improvements. In Proc. 49th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 327–336, 2008.
[146] N. Nisan. The communication complexity of approximate set packing and covering. In Proc. 29th Int. Colloq. Automata Lang. Prog. (ICALP), pages 868–875, 2002.
[147] A. Norouzi-Fard, J. Tarnawski, S. Mitrovic, A. Zandieh, A. Mousavifar, and O. Svensson. Beyond 1/2-approximation for submodular maximization on massive data streams. In International Conference on Machine Learning (ICML), pages 3826–3835, 2018.
[148] K. Onak, D. Ron, M. Rosen, and R. Rubinfeld. A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size. In Proc. 23rd ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1123–1131, 2012.
[149] M. Parnas and D. Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1):183–196, 2007.
[150] S. A. Plotkin, D. B. Shmoys, and É. Tardos. Fast approximation algorithms for fractional packing and covering problems. Mathematics of Operations Research, 20(2):257–301, 1995.
[151] M. Purohit, Z. Svitkina, and R. Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems (NeurIPS), pages 9661–9670, 2018.
[152] R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proc. 29th Annu. ACM Sympos. Theory Comput. (STOC), 1997.
[153] A. A. Razborov. On the distributional complexity of disjointness. Theoretical Computer Science, 106(2):385–390, 1992.
[154] P. Roy, A. Khan, and G. Alonso. Augmented sketch: Faster and more accurate stream processing. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD), pages 1449–1463, 2016.
[155] B. Saha and L. Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In Proc. SIAM Int. Conf. Data Mining (SDM), pages 697–708, 2009.
[156] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969–978, 2009.
[157] T. Sarlós. Improved approximation algorithms for large matrices via random projections. In Proc. 47th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 143–152, 2006.
[158] S. Schechter, C. Herley, and M. Mitzenmacher. Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks. In Proceedings of the 5th USENIX Conference on Hot Topics in Security, pages 1–8, 2010.
[159] J. P. Schmidt, A. Siegel, and A. Srinivasan. Chernoff–Hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223–250, 1995.
[160] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[161] V. Sivaraman, S. Narayana, O. Rottenstreich, S. Muthukrishnan, and J. Rexford. Heavy-hitter detection entirely in the data plane. In Proceedings of the Symposium on SDN Research, pages 164–176, 2017.
[162] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 3104–3112, 2014.
[163] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12):2295–2329, 2017.
[164] P. Talukdar and W. Cohen. Scaling graph-based semi-supervised learning to large number of labels using count-min sketch. In Artificial Intelligence and Statistics (AISTATS), pages 940–947, 2014.
[165] M. Thorup and Y. Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM Journal on Computing, 41(2):293–331, 2012.
[166] S. Vadhan. Pseudorandomness. Foundations and Trends® in Theoretical Computer Science, 7(1–3):1–336, 2012.
[167] D. Wang, S. Rao, and M. W. Mahoney. Unified acceleration method for packing and covering problems via diameter reduction. In Proc. 43rd Int. Colloq. Automata Lang. Prog. (ICALP), pages 50:1–50:13, 2016.
[168] J. Wang, W. Liu, S. Kumar, and S.-F. Chang. Learning to hash for indexing big data – a survey. Proceedings of the IEEE, 104(1):34–57, 2016.
[169] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Advances in Neural Information Processing Systems (NeurIPS), pages 1753–1760, 2009.
[170] D. P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
[171] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Applied and Computational Harmonic Analysis, 25(3):335–366, 2008.
[172] Y. Yoshida, M. Yamamoto, and H. Ito. Improved constant-time approximation algorithms for maximum matchings and other optimization problems. SIAM Journal on Computing, 41(4):1074–1093, 2012.
[173] N. E. Young. Randomized rounding without solving the linear program. In Proc. 6th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 170–178, 1995.
[174] N. E. Young. Nearly linear-work algorithms for mixed packing/covering and facility-location linear programs. arXiv preprint arXiv:1407.3015, 2014.