
On the Optimal Ordering of Maps and Selections under Factorization

Thomas Neumann, Sven Helmer, Guido Moerkotte
Universität Mannheim, Mannheim, Germany

tneumann|helmer|moerkotte@informatik.uni-mannheim.de

Paper ID: 685

Abstract

The query optimizer of a database system is confronted with two aspects when handling user-defined functions (UDFs) in query predicates: the vast differences in evaluation costs between UDFs (and other functions) and multiple calls of the same (expensive) UDF. The former is dealt with by ordering the evaluation of the predicates optimally, the latter by identifying common subexpressions and thereby avoiding costly recomputation. Current approaches order n predicates optimally (neglecting factorization) in O(n log n). Their result may deviate significantly from the optimal solution under factorization.

We formalize the problem of finding optimal orderings under factorization and prove that it is NP-hard. Furthermore, we show how to improve on the run time of the brute-force algorithm (which computes all possible orderings) by presenting different enhanced algorithms. Although in the worst case these algorithms obviously still behave exponentially, our experiments demonstrate that for real-life examples their performance is much better.

1. Introduction

User-defined types (UDTs) and user-defined functions (UDFs) have found their way into the SQL-3 standard and most commercial database management systems (DBMSs). Two observations concerning UDFs can be made:

1. The cost difference between the evaluation of a UDF and a simple built-in operation like + or >, and between two UDFs, can be large.

2. The same UDF call can occur at several places in a predicate.

As a sample query, let us assume that we want to do content-based image retrieval on a pictorial database. A query is defined by a query image q and selection predicates referring to color, shape, texture, etc. In order to evaluate the predicates of a query we have to calculate one or more feature vectors describing the differences between q and the images in the database. The extraction of the feature vectors is implemented by UDFs. Other UDFs then extract features from the feature vectors and their result is compared with constants:

select p.name
from pictures p
where coarseness(texturediff(p.image, q)) < 1.5
  and contrast(texturediff(p.image, q)) < 0.3
  and red(colordiff(p.image, q)) < 0.1
  and green(colordiff(p.image, q)) < 0.4
  and blue(colordiff(p.image, q)) < 0.2
  and containscircle(shapediff(p.image, q)) > 0.8;

In words, we are looking for an image which has roughly the same coarseness and contrast as our query image q. The shades of red and blue should not deviate too much from q, while we are more lax with green. We also want to have some differences in the shapes found in the images (one should contain a circle-like object that cannot be found in the other).

Note that the feature vector calls occur multiple times. Common subexpression elimination (factorization of common subexpressions) is the standard technique to avoid duplicate evaluation. In this paper, we study the problem of generating optimal query evaluation plans for queries like the one above. Up to now, there has been no solution for this problem. Our contributions are:

1. We formalize the problem and prove that it is NP-hard.

2. We present two basic algorithms generating optimal plans.

3. We present two techniques to improve upon the runtime of the basic algorithms while still preserving the optimality of their result.

4. We evaluate the runtime of the algorithms experimentally and demonstrate that, despite the general problem complexity, the time complexities of some variants are very low.

The rest of the paper is organized as follows. The next section discusses related work. Section 3 formalizes the problem and defines the notions needed in the remainder of the paper. In Section 4 we cover the NP-hardness proof. We look at different variations of algorithms that determine the optimal order in Section 5. Finally, Section 6 summarizes our work.

2. Related Work

Hellerstein and Naughton discuss caching results for expensive UDFs [?]. While this is an important performance-improving technique, it does not solve the problem of finding an optimal evaluation order.

Kemper et al. introduce a heuristic for optimizing complex boolean expressions in [?]. However, factorization is not considered.

In [?] they introduce bypass selections to optimize boolean expressions containing expensive selections and disjunctions. This technique is extended in [?] to joins, and [?] includes a correct handling of NULL values. Factorization is still not considered.

Chaudhuri et al. published the only related work that explicitly handles the problem of factorization, but on a different scope. Their work deals with the evaluation of a complex boolean predicate containing many simple comparisons between attribute values and constants. Assuming that indexes may exist for several of the attributes, an access path intersects and unions the results of several index scans. How to generate these access paths containing index union and intersection is the topic of [?].

In several papers [?, ?, ?, ?, ?, ?] the reordering of expensive selections and joins is discussed. The main topic is ordering selections between two joins according to a rank function. As factorization is not considered, this can lead to suboptimal plans. Let us demonstrate this by an example. First recall the applied rank function: for every predicate with selectivity s and per-tuple cost c, a rank

r = (s − 1) / c     (1)

is computed. Then, predicates are evaluated in the order of increasing ranks.

Let us apply this algorithm to the example query mentioned in the introduction. For the difference functions we assume the following evaluation costs: texturediff (7741.07), colordiff (4034.62), shapediff (9877.88). We neglect the costs for accessing parts of the feature vectors (via the functions red, green, blue, etc.) and the costs for comparing the returned numbers with a constant. Furthermore, let us assume that the selection clauses have the following selectivities: 0.343 for coarseness < 1.5, 0.380 for contrast < 0.3, 0.602 for red < 0.1, 0.634 for green < 0.4, 0.602 for blue < 0.2, and 0.025 for containscircle > 0.8.

We insert these values into equation (1) and order the selections according to their ranks (not considering factorization). This suggests first evaluating containscircle, then blue, red, green, and finally coarseness and contrast. The evaluation cost for this order is 10002.43 when applying factorization. However, we can do better: evaluating the predicates in the order red, blue, green, coarseness, contrast, containscircle yields a cost of only 6111.36.
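The following Python sketch (our own illustration, not part of the paper) reproduces this comparison. The selectivities and map costs are taken from the text; as a simplification we assume a per-tuple comparison cost of one and normalize the input cardinality to one, anticipating the cost model of Section 3.

    # predicate name -> (selectivity, set of feature-vector maps it needs)
    preds = {
        'coarseness': (0.343, {'texturediff'}),
        'contrast':   (0.380, {'texturediff'}),
        'red':        (0.602, {'colordiff'}),
        'green':      (0.634, {'colordiff'}),
        'blue':       (0.602, {'colordiff'}),
        'circle':     (0.025, {'shapediff'}),
    }
    map_cost = {'texturediff': 7741.07, 'colordiff': 4034.62, 'shapediff': 9877.88}

    def rank(name):
        # equation (1), ignoring factorization: all required maps are charged to the predicate
        s, maps = preds[name]
        return (s - 1) / sum(map_cost[m] for m in maps)

    def cost(order):
        # factorization-aware per-tuple cost: each map is paid for only once
        done, sel, total = set(), 1.0, 0.0
        for name in order:
            s, maps = preds[name]
            new = maps - done
            total += sel * (sum(map_cost[m] for m in new) + 1)
            done |= new
            sel *= s
        return total

    by_rank = sorted(preds, key=rank)
    print(by_rank, cost(by_rank))       # rank-based order, noticeably more expensive
    better = ['red', 'blue', 'green', 'coarseness', 'contrast', 'circle']
    print(better, cost(better))         # cheaper order under factorization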

3. Formalization of the Problem

We now formalize the problem of ordering selection predicates considering factorization. For doing this, we use, besides selection, the map operator [?, ?] to evaluate UDF calls.

It is defined as

χa:u(R) := {t ◦ [a : v] | t ∈ R, v = u(t)}     (2)

and adds an attribute a containing the result of a UDF call u to each input tuple.

For every UDF call in our query we introduce a map operator and for every comparison a selection. This results in the following set of operators:

χtd:texturediff(p.image,q) χx:coarseness(td) σx<1.5

χy:contrast(td) σy<0.3

χcd:colordiff(p.image,q) χr:red(cd) σr<0.1

χg:green(cd) σg<0.4

χb:blue(cd) σb<0.2

χsd:shapediff(p.image,q) χc:containscircle(sd) σc>0.8

Generating the optimal evaluation plan for our query now boils down to finding an optimal ordering of these maps and selections. In doing so, we have to pay attention to the following: before evaluating an operator whose subscript refers to an attribute generated by a map, this map has to occur before the operator. Capturing these dependencies leads us to the notion of a dependency graph.

3.1. Dependency Graph

The dependency graph has one node corresponding to each operator. Whenever an operator o uses an attribute generated by a map m, we introduce an edge from m to o. For our query, the dependency graph is shown in Figure 1.

We denote the set of map operators σ depends on by Xσ. That is, for every path from some χ to σ, we add χ to Xσ.

Ignoring the direction of the edges, we see that the (now undirected) dependency graph in Figure 1 consists of three connected components.

Figure 1. Dependency graph for example query (nodes: χtd:texture, χx:coarse, χy:contrast, σx<1.5, σy<0.3; χcd:color, χr:red, χg:green, χb:blue, σr<0.1, σg<0.4, σb<0.2; χsd:shape, χc:circle, σc>0.8)

There is one component for the predicates referring to texture, one for color, and one for shape. We can thus refer to the connected components of a dependency graph.
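To make this concrete, the following sketch (our own illustration in Python; the operator names are ours) builds the dependency graph of the example query and extracts its connected components.

    # map operator -> attribute it produces; operator -> attributes it uses
    produces = {'chi_td': 'td', 'chi_cd': 'cd', 'chi_sd': 'sd',
                'chi_x': 'x', 'chi_y': 'y', 'chi_r': 'r',
                'chi_g': 'g', 'chi_b': 'b', 'chi_c': 'c'}
    uses = {'chi_x': {'td'}, 'chi_y': {'td'},
            'chi_r': {'cd'}, 'chi_g': {'cd'}, 'chi_b': {'cd'}, 'chi_c': {'sd'},
            'sigma_x': {'x'}, 'sigma_y': {'y'}, 'sigma_r': {'r'},
            'sigma_g': {'g'}, 'sigma_b': {'b'}, 'sigma_c': {'c'}}
    edges = [(m, o) for m, a in produces.items()
                    for o, attrs in uses.items() if a in attrs]

    def components(nodes, edges):
        # connected components of the undirected version of the dependency graph
        adj = {n: set() for n in nodes}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        seen, comps = set(), []
        for n in nodes:
            if n in seen:
                continue
            stack, comp = [n], set()
            while stack:
                cur = stack.pop()
                if cur not in comp:
                    comp.add(cur)
                    stack.extend(adj[cur] - comp)
            seen |= comp
            comps.append(comp)
        return comps

    nodes = set(produces) | set(uses)
    print(components(nodes, edges))   # three components: texture, color, shape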

3.2. Costs for Evaluating Selections

We associate a selectivity si with every selection σi. Since comparisons are cheap (typically one machine instruction), we assign a per-tuple cost of one to every selection. The per-tuple cost of a map χj is denoted by cj. Since cj is equal to the cost of evaluating the UDF applied in the subscript of χj, cj is typically much larger than one.

Let σ1, . . . , σn be an arbitrary permutation of all selections. The cost of selection σi within this sequence depends on σ1, . . . , σi−1, since they demand the execution of a certain set of maps that σi may also depend on. Therefore, let us define for a sequence S of selections the set of maps that still need to be executed for σi after all selections in S (and the maps they depend on) have been executed:

XS := ∪σ∈S Xσ

and

Xσi|S := Xσi \ XS

Then we calculate the per-tuple costs of a (partial) sequence of selections σk, . . . , σl executed after some sequence S of selections as

cost(σk, . . . , σl | S) := Σ_{i=k}^{l} ( Π_{j=k}^{i−1} sj ) [ ( Σ_{χj ∈ Xσi|S} cj ) + 1 ]

Let I be the input cardinality, e.g. of a single relation in the from clause. Then the total cost of evaluating a permutation σ1, . . . , σn of all selections is defined as

cost(σ1, . . . , σn) := I ∗ cost(σ1, . . . , σn | ε)

where ε denotes the empty sequence. We call the problem of finding a permutation with minimal total cost the χ-σ-Problem.
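A direct transcription of this cost function into Python could look as follows (our own sketch, not from the paper; a selection is given as a pair of its selectivity and its map set Xσ, and the maps as a dictionary of per-tuple costs).

    def seq_cost(order, sels, maps, executed=frozenset(), I=1.0):
        # cost(sigma_k, ..., sigma_l | S): maps already demanded by S are in `executed`
        done = set(executed)
        sel_product, total = 1.0, 0.0
        for name in order:
            s, X = sels[name]                 # selectivity s_i and map set X_sigma_i
            new = X - done                    # X_{sigma_i | S}: maps still to be executed
            total += sel_product * (sum(maps[m] for m in new) + 1)
            done |= new
            sel_product *= s
        return I * total

    # the three color selections of the running example
    maps = {'colordiff': 4034.62}
    sels = {'red':   (0.602, {'colordiff'}),
            'green': (0.634, {'colordiff'}),
            'blue':  (0.602, {'colordiff'})}
    print(seq_cost(['red', 'blue', 'green'], sels, maps))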

3.3. Ranking

It is not difficult to show that our cost function has the adjacent sequence interchange (ASI) property [?]. The rank we use for determining the order is defined as

r(σi) := (si − 1) / ( Σ_{χk ∈ Xσi} ck + 1 )     (3)

In that case we can show the following

Lemma 1. Consider two selections σi and σj. If they do not share any map operators (that is, Xσi ∩ Xσj = ∅), then

cost(σiσj) ≤ cost(σjσi) ⇔ r(σi) ≤ r(σj)     (4)

As our cost and rank functions differ from the ones used in [?], we still have to prove the correctness of the above lemma.

3.4. Sequencing

As we will see later, we can choose one selection as the starting point and divide the remaining selections into connected components (in order to compute the optimum, every selection is chosen once as a starting point). Each connected component is then brought into an optimal order. After doing this, all connected components have to be assembled into one optimal sequence. We do this by normalizing the sequence of each connected component and then merging the normalized components. After that we have to denormalize again and we are finished.

Sequencing strings of jobs with parallel-chains precedence constraints that have the ASI property has been done in [?]. This technique has also been applied successfully to join ordering problems [?, ?]. We can use this technique when merging the different connected components of a dependency graph, as the selections from different components do not share map operators and Lemma 1 holds. We sketch the well-known algorithm merge in Fig. 2. For space reasons, we do not comment on it in detail. We only mention that the input precedence graph must consist of several connected components.

// Input: precedence graph G; every connected component of G
//        must be a sequence; a sequence of selections S to be
//        executed before the selections contained in G
// Output: rank-ordered sequence of selections
merge(G, S)
    for each connected component c in G do
        normalize(c, S);
    merge the normalized components according to rL;
    denormalize sequence;

// Input: a sequence of selections c to be normalized,
//        a sequence of selections S executed before c
// Output: a normalized sequence
normalize(c, S)
    let c = σk, . . . , σl;
    while(∃ σi : rL_{S,σk,...,σi−1}(σi) > rL_{S,σk,...,σi}(σi+1))
        replace σiσi+1 with compound selection σ′;

Figure 2. Merge and normalize

Each connected component must already form a (totally ordered) sequence. The result is a single merged sequence. The algorithm merge first calls normalize on each sequence. Normalization produces compound selections for any neighbored selections with contradictory local ranks. The local rank of a selection σi (whether compound or not) is calculated as

rL_S(σi) := (si − 1) / ( Σ_{χk ∈ Xσi \ XS} ck + 1 )

for a sequence of selections S executed before σi. A compound selection consisting of σl, . . . , σk has a selectivity equal to

Π_{i=l}^{k} si

and costs cost(σl, . . . , σk | S).

After normalization, the (compound) selections are sorted according to their local rank. Hence, merging them is simple. The details can be looked up in several different publications [?, ?, ?]. The following lemma is important for the correctness of some of our algorithms:

Lemma 2. Let S be a sequence of selections. Let G be a dependency graph whose connected components are sequences of selections ci that do not intersect with S. Let C be the result of merge(G, S). If for all i the concatenated sequence Sci has minimal costs among all sequences containing the selections ci preceded by S, then the sequence SC has minimal costs among all sequences containing the selections in C preceded by S.
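A compact way to realize normalize and merge is sketched below (our own Python illustration, not the paper's pseudocode). Each selection is represented by its selectivity and its residual map cost, i.e. the cost of the maps it needs that are not already provided by its predecessors within the same component or by S; the local rank of a (compound) selection is then its selectivity minus one, divided by its per-tuple cost.

    def combine(items):
        # selectivity and per-tuple cost of a compound built from (s, c) items
        sel, cost, prefix = 1.0, 0.0, 1.0
        for s, c in items:
            cost += prefix * (c + 1)   # residual map cost plus the comparison itself
            prefix *= s
            sel *= s
        return sel, cost

    def rank(items):
        sel, cost = combine(items)
        return (sel - 1) / cost

    def normalize(chain):
        # collapse adjacent selections with contradictory ranks into compounds
        out = []
        for item in chain:
            out.append([item])
            while len(out) > 1 and rank(out[-2]) > rank(out[-1]):
                out[-2].extend(out.pop())
        return out

    def merge(components):
        # normalize every chain, sort all compounds by rank, then denormalize
        compounds = [c for chain in components for c in normalize(chain)]
        compounds.sort(key=rank)
        return [item for comp in compounds for item in comp]

    # two chains without shared maps: merge interleaves them by rank
    print(merge([[(0.3, 50.0), (0.9, 0.0)], [(0.5, 10.0)]]))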

4. Problem Complexity

We show that the χ-σ-Problem is NP-hard by reducing the clique problem (determining if there is a clique of size k in a graph) to it. For the proof we need several lemmata, which we list here.

4.1. Mapping

In order to reduce the clique problem we have to map an undirected graph G = (V, E) with a maximum degree of d (where V is a set of nodes and E a set of edges) to a χ-σ-Problem. We need a constant c chosen arbitrarily in [1, ∞[. It represents the costs of a map operator χ. f denotes the selectivity of a selection operator σ and must come from ]0, 1/(4(dc+1)(d+1))]. Note that all map operators (except χi) have the same cost and all selections have the same selectivity. Finally, I is the cardinality of the input relation and can be chosen arbitrarily in [1, ∞[. We do the actual mapping in three steps:

1. For every node vi ∈ V we choose a selection σi.

2. For every edge ei,j ∈ E connecting the nodes vi and vj we introduce a map operator χi,j associated with σi and σj. That means, σi and σj use the result generated by χi,j.

3. Every node vi ∈ V with a degree di < d is associated with an additional map operator χi. This map operator has a cost of (d − di)c.

Basically, we are mapping every node to a selection operator. The edges represent factorized function calls, i.e., functions whose return values are needed in the evaluation of both selections connected by the edge. Finally, all selections that do not have d map operators associated with them are assigned an additional map operator each, in order to bring the total cost of all map operators associated with a selection up to dc.
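The mapping is easy to mechanize. The sketch below (our own illustration) builds the χ-σ instance for a given graph, reusing the (selectivity, map set) representation of the earlier sketches; names such as chi_u_v are ours.

    def clique_to_chi_sigma(V, E, c=1.0):
        # one selection per node, one shared map per edge, one padding map per
        # node of degree < d, so every selection carries total map cost d*c
        deg = {v: sum(1 for e in E if v in e) for v in V}
        d = max(deg.values(), default=0)
        f = 1.0 / (4 * (d * c + 1) * (d + 1))       # admissible selectivity
        maps, sels = {}, {v: (f, set()) for v in V}
        for (u, v) in E:
            name = 'chi_%s_%s' % (u, v)
            maps[name] = c
            sels[u][1].add(name)
            sels[v][1].add(name)
        for v in V:
            if deg[v] < d:
                maps['chi_%s' % v] = (d - deg[v]) * c
                sels[v][1].add('chi_%s' % v)
        return sels, maps

    # triangle {1,2,3} plus a pendant node 4: an optimal ordering starts with the clique
    sels, maps = clique_to_chi_sigma({1, 2, 3, 4}, [(1, 2), (2, 3), (1, 3), (3, 4)])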

4.2. Preliminaries

On our way to proving the NP-hardness of the χ-σ-Problem we are going to look at some lemmata. (The proofs can be found in the appendix.)

Lemma 3. In every execution sequence the cost of the first selection, including its map operators, is I(dc + 1). Speaking formally,

cost(σi | ∅) = I(dc + 1)   ∀i : vi ∈ V.

Lemma 4. The cost of executing a σ operator after k other σ operators is at least 4(d + 1) times the cost of executing a σ operator after more than k other σ operators, independent of the actual operators involved. More precisely,

cost(σ_{a_i} | σ_{a_1} . . . σ_{a_k} . . . σ_{a_{i−1}}) / cost(σ_{b_{k+1}} | σ_{b_1} . . . σ_{b_k}) < 1/(4(d + 1))   ∀k

Lemma 5. Given a sequence σ_{a_1} . . . σ_{a_n} of σ operators, the cost of a suffix σ_{a_{k+1}} . . . σ_{a_n} is at most half of the cost of an arbitrary χ operator in the prefix σ_{a_1} . . . σ_{a_k}. Formally,

cost(σ_{a_1} . . . σ_{a_n}) − cost(σ_{a_1} . . . σ_{a_k}) ≤ (1/2) I f^{k−1} c   ∀k ≤ n

Lemma 6. If the costs for two sequences of length n are the same for all but the last σ, the difference is at least I f^{n−1} c:

cost(σ_{a_1} . . . σ_{a_{n−1}}) = cost(σ_{b_1} . . . σ_{b_{n−1}}) ∧ cost(σ_{a_1} . . . σ_{a_n}) < cost(σ_{b_1} . . . σ_{b_n})

⇒ cost(σ_{b_1} . . . σ_{b_n}) − cost(σ_{a_1} . . . σ_{a_n}) ≥ I f^{n−1} c.

Lemma 7. If the cost of a certain sequence of σ operators of length n is minimal among all sequences of length n, then the cost of each prefix of this sequence is also minimal:

cost(σ_{a_1} . . . σ_{a_n}) minimal ⇒ ∀ 1 ≤ k ≤ n : cost(σ_{a_1} . . . σ_{a_k}) minimal

Lemma 8. The cost of executing a sequence of σ operators is minimal iff the corresponding nodes form a clique:

cost(σ_{a_1} . . . σ_{a_n}) = I Σ_{i=0}^{n−1} f^i [(d − i)c + 1]

iff v_{a_1} . . . v_{a_n} form a clique in G. Any other sequence of σ operators produces greater costs.

4.3. Proof of NP-hardness

After deriving the lemmata from Section 4.2, we are now ready to prove that the χ-σ-problem is NP-hard.

Theorem 1. The χ-σ-problem is NP-hard.

Proof. Transform a graph G = (V, E) into a χ-σ-problem as described in Section 4.1. As shown in Lemma 7, the costs of each prefix of a minimal-cost solution of the problem are also minimal. Lemma 8 showed that the costs of a prefix are minimal if the nodes in G form a clique and are higher otherwise.

// Input: S = {σ1, . . . , σn}
// Output: optimal sequence
perm(S) {
    if(|S| > 0) {
        for(each σi in S) {
            Ci = σi ◦ perm(S \ σi);
        }
        return Ci with smallest costs;
    }
    else {
        return empty sequence;
    }
}

Figure 3. Generating permutations

Therefore, if a clique of size k exists, the first k entries of a minimal-cost solution of the χ-σ-problem form a clique in G. Thus, the clique problem can be reduced to the χ-σ-problem.

5. Algorithms

We introduce two basic algorithms (perm and memo) for determining the optimal ordering of map and selection operators under factorization. Both will be enhanced by two techniques (pruning and exploitation of connected components) to reduce their runtime while preserving the optimality of the result.

5.1. Generating Permutations

We start with a simple algorithm that generates all possible permutations of the selection orderings. Figure 3 shows the pseudocode.
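In Python, the exhaustive search reduces to trying every permutation with the factorization-aware cost of Section 3.2 (our own sketch, using the same (selectivity, map set) representation as before).

    from itertools import permutations

    def perm_search(sels, maps, I=1.0):
        # exhaustive search: evaluate every ordering, return the cheapest one
        def cost(order):
            done, prefix, total = set(), 1.0, 0.0
            for name in order:
                s, X = sels[name]
                new = X - done
                total += prefix * (sum(maps[m] for m in new) + 1)
                done |= new
                prefix *= s
            return I * total
        return min(permutations(sels), key=cost)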

5.1.1. . . . with pruning   Our first variant introduces the technique of pruning. As we know from Lemma 7, the cost of each prefix of a sequence with minimal costs is also minimal. So, if we compare the costs of a sequence p ◦ σi ◦ σk with those of the sequence p ◦ σk ◦ σi (where p is a partial plan), we only have to continue expanding the cheaper of the two. That is exactly what the algorithm depicted in Figure 4 does. It starts with an empty plan and expands it by concatenating all selections that are still missing. However, if we encounter a suboptimal prefix, we skip it.

5.1.2. . . . exploiting connected components   The second technique uses the concept of sequencing the selections via merging of connected components as introduced in Section 3.4. Figure 5 shows the pseudocode of the algorithm. We take all different σi as a starting point of a plan. The remaining selections are decomposed into (one or more) connected components and optimized recursively.

// Input: current partial plan P (first call with empty plan),
//        S = {σ1, . . . , σn}
// Output: optimal sequence
perm-p(P, S) {
    if(|S| > 0) {
        for(each σi in S) {
            if(|P| > 0) {
                σk = last selection in P;
                Pk = P \ σk;
                if(cost(Pk ◦ σi ◦ σk) < cost(Pk ◦ σk ◦ σi)) {
                    continue; // skip this plan
                }
                Ci = perm-p(P ◦ σi, S \ σi);
            }
            else {
                Ci = perm-p(σi, S \ σi);
            }
        }
        return Ci with smallest cost (discard pruned Cis);
    }
    else {
        return P;
    }
}

Figure 4. Generating permutations with pruning

(For the calculation of the correct ranks we have to carry along the set of map operators we have already applied.) After the connected components are optimized, we have to merge them. This merging is done as described by the merging algorithm in Figure 2.

5.1.3. . . . with pruning and exploiting connected components   The variant in Figure 6 combines the connected component approach with pruning. We can only do pruning if there is an unambiguous last selection operator. This is the case if we have a single connected component on the current recursion level. If there is more than one connected component, we are not sure yet how their operators will be merged. So when optimizing one component cj (of many) we cannot be sure if any operators of other components will be inserted between σi and the selections of cj. For the initial call of perm-pc, σlast is also set to invalid.

If the current subplan is not pruned, we decompose S \ σi into its connected components, similar to the algorithm perm-c. There is one difference, however. We have to check the number of connected components we created. If there is just one, we allow pruning in the recursive call.

// Input: set of already applied map operators Y,
//        S = {σ1, . . . , σn}
// Output: optimal sequence
perm-c(Y, S) {
    for(each σi in S) {
        S′ = S \ σi;
        decompose S′ into connected comp. c1, . . . , ck;
        C = ∅;
        for(j = 1; j ≤ k; j++) {
            Cj = perm-c(Xσi ∪ Y, cj);
            insert Cj into C;
        }
        localBesti = σi ◦ merge(C, Y ◦ σi);
    }
    return best localBesti;
}

Figure 5. Generating permutations exploiting connected components

// Input: set of already applied map operators Y,
//        S = {σ1, . . . , σn}, selection σlast applied last,
//        set of applied map operators without Xσlast: Ylast
// Output: optimal sequence
perm-pc(Y, S, σlast, Ylast) {
    for(each σi in S) {
        if(σlast is valid) {
            if(cost(σi ◦ σlast) < cost(σlast ◦ σi)) {
                continue;
            }
        }
        decompose S \ σi into connected comp. c1, . . . , ck;
        C = ∅;
        for(j = 1; j ≤ k; j++) {
            if(k == 1)
                σtmp = σi;
            else
                σtmp = invalid;
            Cj = perm-pc(Xσi ∪ Y, cj, σtmp, Y);
            insert Cj into C;
        }
        localBesti = σi ◦ merge(C, Y ◦ σi);
    }
    return best localBesti;
}

Figure 6. Combined algorithm

Figure 7. Number of calls for the permutation algorithms (perm, perm-p, perm-c, perm-pc): number of calls plotted against the number of maps, for 5 and 10 selections

5.1.4. Evaluation   For the evaluation of the perm algorithms, we constructed two χ-σ-problems with five and ten selections, respectively. The selectivities of the selections and the costs of the map operators were determined randomly: the selectivities were uniformly distributed in [0, 1], the costs uniformly distributed in [10, 1000]. We started with no factorization at all, i.e., no selections shared any map operators. Then we added map operators in the following way: we chose two selections with the smallest number of shared maps (in the case of ties, the selections were chosen randomly) and connected them via a newly inserted map operator. This continued until every selection was connected to every other selection.

The results of the practical measurements can be seen in Figure 7. The run time was measured in the number of (recursive) calls of each perm algorithm. As can be clearly seen, all algorithms employing our enhanced techniques are better than the straightforward exhaustive search. The break-even point between the variants that exploit connected components and the one that does not is also noticeable. In support of perm-c and perm-pc we have to remind ourselves that the right-hand parts of the curves in Figure 7 represent query types that are not very realistic (as in those queries almost all selections are connected with each other). The NP-hardness proof implies that cases in which selections form cliques in a query graph are worst-case scenarios.

5.2. Memoization

Memoization is a popular technique for lowering the run time of complex algorithms. It stores a computed answer for later reuse, rather than recomputing the answer. Obviously, this does not come for free: we have to provide storage space for saving already computed answers.

// Input: set of already applied map operators Y,
//        S = {σ1, . . . , σn}
// Output: optimal sequence
memo(Y, S) {
    if(|S| > 0) {
        if(lookup of Y, S in hash table is successful) {
            return hash table entry with optimized plan;
        }
        else {
            for(each σi in S) {
                Ci = σi ◦ memo(Xσi ∪ Y, S \ σi);
            }
            store Ci with smallest cost in hash table under entry for Y, S;
            return Ci with smallest costs;
        }
    }
    else {
        return empty sequence;
    }
}

Figure 8. Memoization

In this and the following sections, we combine the algorithms perm, perm-p, perm-c, and perm-pc with memoization.

We start with the permutation-generating algorithm from Section 5.1. For the memoization part of the algorithm we have to know which map operators have already been applied in our partial plan and which selection operators are still missing. Each time we have determined the partial plan with the best costs for these two parameters, we store this plan in a hash table. When calling memo, we first do a lookup in the hash table. If we have already found an optimal plan for this parameter configuration, we simply return it.

// Input: set of already applied map operators Y,
//        current partial plan P, S = {σ1, . . . , σn}
// Output: optimal sequence
memo-p(Y, P, S) {
    if(|S| > 0) {
        if(lookup of Y, S in hash table is successful) {
            return P ◦ entry(Y, S);
        }
        else {
            estimcosti = ∞;
            for(each σi in S) {
                if(|P| > 0) {
                    σk = last selection in P;
                    Pk = P \ σk;
                    if(cost(Pk ◦ σi ◦ σk) < cost(Pk ◦ σk ◦ σi)) {
                        estimcosti = cost(P ◦ σi);
                        continue;
                    }
                    Ci = memo-p(Xσi ∪ Y, P ◦ σi, S \ σi);
                }
                else {
                    Ci = memo-p(Xσi, σi, S \ σi);
                }
            }
            if(cost(Ci with smallest cost) ≤ min_{1≤i≤n}(estimcosti)) {
                store Ci \ P with smallest cost in hash table under entry Y, S;
            }
            return Ci with smallest cost;
        }
    }
    else {
        return P;
    }
}

Figure 9. Memoization with pruning

Otherwise, we continue with the computation of further permutations.
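The key observation behind memo can also be stated directly as a recurrence: the best way to finish a plan depends only on the set Y of maps already applied and the set S of selections that are still missing. A memoized Python sketch of this recurrence (our own illustration, again over (selectivity, map set) pairs) follows.

    from functools import lru_cache

    def memo_search(sels, maps):
        @lru_cache(maxsize=None)
        def best(Y, S):
            # Y: frozenset of applied maps, S: frozenset of remaining selections
            if not S:
                return (), 0.0
            best_plan, best_cost = None, float('inf')
            for name in S:
                s, X = sels[name]
                new = X - Y                          # maps still missing for this selection
                local = sum(maps[m] for m in new) + 1
                tail_plan, tail_cost = best(Y | new, S - {name})
                total = local + s * tail_cost        # later costs are scaled by s
                if total < best_cost:
                    best_plan, best_cost = (name,) + tail_plan, total
            return best_plan, best_cost

        return best(frozenset(), frozenset(sels))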

5.2.1. . . . with pruning   Combining the pruning approach with memoization is not as straightforward as it may seem at first glance. It is not sufficient to just store the best plan found in a hash table before returning it, as in the algorithm memo.

Figure 9 depicts the pseudocode for the pruning algorithm combined with memoization. We prune considering a partial plan Pk: if the costs for Pk ◦ σi ◦ σk are smaller than the costs for Pk ◦ σk ◦ σi, we can discard the subplan Pk ◦ σk ◦ σi, as it will never be part of an optimal plan. How can we introduce memoization into this process? Memoizing subplans for a combination of the parameters Y, P, and S is futile, as we only look once at each combination of Y, P, and S.

// Input: set of already applied map operators Y,
//        S = {σ1, . . . , σn}
// Output: optimal sequence
memo-c(Y, S) {
    if(lookup of Y, S in hash table is successful) {
        return hash table entry with optimized plan;
    }
    else {
        for(each σi in S) {
            S′ = S \ σi;
            decompose S′ into connected comp. c1, . . . , ck;
            C = ∅;
            for(j = 1; j ≤ k; j++) {
                Cj = memo-c(Xσi ∪ Y, cj);
                insert Cj into C;
            }
            localBesti = σi ◦ merge(C, Y ◦ σi);
        }
        store best localBesti in hash table under entry for Y, S;
        return best localBesti;
    }
}

Figure 10. Memoization exploiting connected components

We could add the last selection σk to the current values for Y and S. Although this decreases the storage overhead considerably, the reusability of the stored entries is still quite limited. We can get away with just storing one precomputed answer for each combination of Y and S, but then we have to be careful with the actual memoization, as a straightforward application would lead to disaster.

Let us illustrate this with an example. Assume that we have four selections σ1, σ2, σ3, and σ4 and that the optimal ordering is σ4σ1σ2σ3. One call of memo-p might occur with the parameters P = σ1σ4, Y = Xσ1 ∪ Xσ4, and S = {σ2, σ3}. If we happen to find out that cost(σ1σ2σ4) is smaller than cost(σ1σ4σ2), we prune it and continue with expanding σ1σ4σ3. This suggests that for Y = Xσ1 ∪ Xσ4 and S = {σ2, σ3} ordering S would result in σ3σ2, which would be stored in our hash table as an optimal subplan for this case. If later on we call memo-p with the parameters P = σ4σ1, Y = Xσ1 ∪ Xσ4, and S = {σ2, σ3}, a lookup in the hash table would provide us with σ3σ2 as an optimal subplan. We would then return σ4σ1σ3σ2 and miss the optimal plan.

As a consequence, we should only memoize subplans if we are absolutely sure that this will not lead to wrong conclusions later on.

// Input: set of already applied map operators Y,
//        S = {σ1, . . . , σn}, selection σlast applied last,
//        set of applied map operators without Xσlast: Ylast
// Output: optimal sequence
memo-pc(Y, S, σlast, Ylast) {
    if(lookup of Y, S in hash table is successful) {
        return hash table entry with optimized plan;
    }
    else {
        estimcosti = ∞;
        for(each σi in S) {
            if(σlast is valid) {
                if(cost(Ylast ◦ σi ◦ σlast) < cost(Ylast ◦ σlast ◦ σi)) {
                    estimcosti = cost(Ylast ◦ σi);
                    continue;
                }
            }
            decompose S \ σi into conn. comp. c1, . . . , ck;
            C = ∅;
            for(j = 1; j ≤ k; j++) {
                if(k == 1)
                    σtmp = σi;
                else
                    σtmp = invalid;
                Cj = memo-pc(Xσi ∪ Y, cj, σtmp, Y);
                insert Cj into C;
            }
            localBesti = σi ◦ merge(C, Y ◦ σi);
        }
        if(cost(best localBesti) ≤ min_{1≤i≤n}(estimcosti)) {
            store best localBesti in hash table under entry for Y, S;
        }
        return best localBesti;
    }
}

Figure 11. Combined algorithm

We do not have any problems in cases in which no pruning takes place, as no permutations are discarded there. In the cases in which we prune, we estimate the costs of expanding the partial plan P ◦ σi with all remaining selections in S \ σi by cost(P ◦ σi). This is a hard lower bound, as adding more selections can only increase the costs. If the smallest estimated cost of all pruned plans is still larger than the costs of the best plan returned from the recursive call, we are on the safe side and can memoize the plan (and ignore the pruned plans).
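The safety condition can be stated in a few lines (our own illustration; best_expanded_cost and pruned_lower_bounds are hypothetical names for the values computed in memo-p).

    def safe_to_memoize(best_expanded_cost, pruned_lower_bounds):
        # cost(P ∘ σi) only grows when further selections are appended, so it is a
        # hard lower bound for every pruned branch; memoize only if none of them
        # could possibly beat the best plan found among the expanded branches
        return all(best_expanded_cost <= lb for lb in pruned_lower_bounds)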

5.2.2. . . . with exploitation of connected components   Combining the component approach with memoization is far easier than the combination of pruning with memoization. We take the basic connected component algorithm from Section 5.1.2 and add table lookups and updates as shown in Figure 10.

5.2.3. . . . with pruning and exploitation of connected components   This variant is the most complicated of all. It combines the pruning approach from perm-p with perm-c and memoization from memo. The combination of pruning with the exploitation of connected components works as shown in perm-pc. However, we have to address the problems that popped up when combining pruning with memoization. Here we have to apply the same approach as in perm-pc.

5.2.4. Evaluation   We used the same setup for the measurements of the memo algorithms as for the perm algorithms. The results can be seen in Figure 12.

The general picture is that all algorithms are faster than their non-memoizing counterparts. This does not come unexpected, as we are seeing a time-space tradeoff (we are investing space in the form of memoization and gaining time). However, memoization helps the various approaches differently. The exhaustive search algorithm profits most. After all, it has the most potential for savings and cuts down on unnecessary recomputations massively. Nevertheless, for the realistic query cases memo-c and memo-pc are still better. In contrast to this, the pure pruning technique memo-p profits least from memoization, as pruning very often prevents the memoization of partial solutions. We can draw the conclusion that pruning does not mix well with memoization.

5.3. Overall Evaluation

For the overall evaluation we pitted the best perm algorithms against the best memo algorithms. Figure 13 shows the results for five selections. The algorithm with the smallest number of calls was memo-c. We can also see that memoization has an edge over pruning. The only pruning perm algorithm that can keep up with the others for a small number of shared map operators is perm-pc. The best perm algorithms include the pruning variants, while the best memo algorithms do not include pruning. If we are short on memory and cannot memoize, then pruning is an option; otherwise memoization should be used, as the memory consumption of perm is linear, while that of memo is exponential.

Figure 14 shows the comparison for ten selections, this time comparing the number of calls as well as running times (we omitted the running times for five selections, as all algorithms finished in well under 0.1 ms).

Figure 12. Number of calls for the memoization algorithms (memo, memo-p, memo-c, memo-pc): number of calls plotted against the number of maps, for 5 and 10 selections

Figure 13. The best algorithms for 5 selections (perm-p, perm-pc, memo, memo-c): number of calls plotted against the number of maps

The algorithms display pretty much the same behavior as for five selections. The running times illustrate that although the individual calls seem different for every algorithm when looking at the pseudocode, this does not change the overall picture. The only exception is memo, which was able to improve a little bit due to its simplicity.

6. Conclusion

We have investigated the problem of ordering selections and map operators optimally under factorization for the first time. We observed that under factorization the problem of ordering selections and map operators is much harder than in the case without factorization. The algorithms currently employed fail to deliver optimal results under factorization. This is not surprising, as we have shown that this problem is NP-hard.

We went further and developed algorithms to compute the optimal ordering under factorization that improve on the straightforward approach of exhaustive search. The best technique we found, exploiting connected components, is a generalization of the previously known algorithm for the ordering problem without factorization. That means, if there are no selections that share map operators, our best algorithm memo-c (and also perm-c) behaves exactly like the already known one. Also, the running time of memo-c is directly related to the complexity of the given input. For simple problems with a small number of shared map operators the algorithm is very fast, while for hard problems involving heavily shared map operators it takes longer.

For future work we would like to integrate these algorithms into a more general framework including join operators. Another topic that still needs to be covered is the disjunction of selections; so far we have examined conjunctions of selections.

A. Proofs of Lemmata

Lemma 1. cost(p1,2) ≤ cost(p2,1) iff (s1 − 1)/(c1 + 1) ≤ (s2 − 1)/(c2 + 1).

Proof. cost(p1,2) ≤ cost(p2,1)
⇔ n[c1 + 1 + s1(c2 + 1)] ≤ n[c2 + 1 + s2(c1 + 1)]
⇔ c1 + 1 + s1(c2 + 1) ≤ c2 + 1 + s2(c1 + 1)
⇔ s1(c2 + 1) − (c2 + 1) ≤ s2(c1 + 1) − (c1 + 1)
⇔ (s1 − 1)(c2 + 1) ≤ (s2 − 1)(c1 + 1)
⇔ (s1 − 1)/(c1 + 1) ≤ (s2 − 1)/(c2 + 1)

Lemma 2. ∀i cost(Sci) minimal ⇒ cost(SC) minimal.

Proof. Consider Co, a permutation of C with minimal cost(SCo). Since cost(Sci) is minimal for all i and the different ci do not share maps, ci is a subsequence of Co and, because of the merge algorithm, also of C.

The normalize step does not change the optimal solution, as unrelated selections never benefit from being inside the compound.

Figure 14. The best algorithms for 10 selections (perm-p, perm-pc, memo, memo-c): number of calls and running times in ms, plotted against the number of maps

Therefore, we assume that the ci are normalized.

Since we know that ci must be a subsequence of the optimal solution, we can transform the selections in ci by eliminating the maps already satisfied by preceding selections. As long as we maintain the order inside ci, this does not influence the optimal solution or the result of the merge step. Now r is equal to rL.

Since C is ordered by r, we can transform Co into C by successively sorting neighbored selections by r. We never need to swap selections from the same component.

Lemma 1 shows that these swaps never increase the costs; therefore cost(SC) ≤ cost(SCo).

Lemma 3. cost(σi | ∅) = I(dc + 1)   ∀i : vi ∈ V

Proof. Since the node vi has a degree of di, σi has di connected χ operations, producing costs of I di c. If di < d, χi exists, producing additional costs of I(d − di)c. The σi itself produces costs of I.

Lemma 4. cost(σ_{a_i} | σ_{a_1} . . . σ_{a_k} . . . σ_{a_{i−1}}) / cost(σ_{b_{k+1}} | σ_{b_1} . . . σ_{b_k}) < 1/(4(d + 1))   ∀k

Proof.

cost(σ_{a_i} | σ_{a_1} . . . σ_{a_k} . . . σ_{a_{i−1}}) ≤ f^{k+1} I (dc + 1)

cost(σ_{b_{k+1}} | σ_{b_1} . . . σ_{b_k}) ≥ f^k I

⇒ cost(σ_{a_i} | σ_{a_1} . . . σ_{a_k} . . . σ_{a_{i−1}}) / cost(σ_{b_{k+1}} | σ_{b_1} . . . σ_{b_k})
  ≤ f^{k+1} I (dc + 1) / (f^k I)
  ≤ f (dc + 1)
  ≤ 1/(4(d + 1))

Lemma 5. cost(σ_{a_1} . . . σ_{a_n}) − cost(σ_{a_1} . . . σ_{a_k}) ≤ (1/2) I f^{k−1} c   ∀k ≤ n

Proof. Lemma 4 ⇒

cost(σ_{a_1} . . . σ_{a_{k+1}}) − cost(σ_{a_1} . . . σ_{a_k}) ≤ (1/(4(d + 1))) cost(σ_{a_k} | σ_{a_1} . . . σ_{a_{k−1}})

⇒ cost(σ_{a_1} . . . σ_{a_n}) − cost(σ_{a_1} . . . σ_{a_k})
  ≤ Σ_{i=1}^{n−k} (1/4)^i cost(σ_{a_k} | σ_{a_1} . . . σ_{a_{k−1}}) / (d + 1)
  ≤ (1/2) cost(σ_{a_k} | σ_{a_1} . . . σ_{a_{k−1}}) / (d + 1)
  ≤ (1/2) I f^{k−1} c

Lemma 6. cost(σ_{a_1} . . . σ_{a_{n−1}}) = cost(σ_{b_1} . . . σ_{b_{n−1}}) ∧ cost(σ_{a_1} . . . σ_{a_n}) < cost(σ_{b_1} . . . σ_{b_n})
⇒ cost(σ_{b_1} . . . σ_{b_n}) − cost(σ_{a_1} . . . σ_{a_n}) ≥ I f^{n−1} c.

Proof. The cost difference is caused by the last entry in each sequence:

cost(σ_{b_1} . . . σ_{b_n}) − cost(σ_{a_1} . . . σ_{a_n}) = cost(σ_{b_n} | σ_{b_1} . . . σ_{b_{n−1}}) − cost(σ_{a_n} | σ_{a_1} . . . σ_{a_{n−1}})

The input cardinality for the last σ operator is I f^{n−1} for both sequences, and the costs for the σ operators themselves are identical. Therefore, they must differ in at least one χ operator. Since the costs for χ operators are multiples of c, the cost difference must be at least I f^{n−1} c.

Lemma 7. cost(σ_{a_1} . . . σ_{a_n}) minimal ⇒ ∀ 1 ≤ k ≤ n : cost(σ_{a_1} . . . σ_{a_k}) minimal

Proof. By contradiction. We assume that cost(σ_{a_1} . . . σ_{a_n}) is minimal, but ∃ 1 ≤ k ≤ n such that cost(σ_{a_1} . . . σ_{a_k}) is not minimal. For k = n this is obviously a contradiction; for k = 1 this is a contradiction to Lemma 3. Below, we consider the case 1 < k < n. W.l.o.g. we look at the minimal such k.

If cost(σ_{a_1} . . . σ_{a_k}) is not minimal, there exists a sequence (b_1 . . . b_n) such that cost(σ_{b_1} . . . σ_{b_k}) < cost(σ_{a_1} . . . σ_{a_k}).

cost(σ_{a_1} . . . σ_{a_n}) ≥ cost(σ_{a_1} . . . σ_{a_k}) ≥ cost(σ_{b_1} . . . σ_{b_k}) + I f^{k−1} c

cost(σ_{b_1} . . . σ_{b_n}) ≤ cost(σ_{b_1} . . . σ_{b_k}) + (1/2) I f^{k−1} c

⇒ (as cost(σ_{a_1} . . . σ_{a_n}) ≤ cost(σ_{b_1} . . . σ_{b_n}))

cost(σ_{b_1} . . . σ_{b_k}) + I f^{k−1} c ≤ cost(σ_{b_1} . . . σ_{b_k}) + (1/2) I f^{k−1} c

which is a contradiction, as I, f and c are > 0.

Lemma 8. cost(σ_{a_1} . . . σ_{a_n}) = I Σ_{i=0}^{n−1} f^i [(d − i)c + 1] iff v_{a_1} . . . v_{a_n} form a clique in G. Any other sequence of σ operators produces greater costs.

Proof. By induction over n.

Base case: n = 1. A single node always forms a clique; the costs are I(dc + 1).

Inductive hypothesis: Lemma 8 holds for an arbitrary, fixed n.

Inductive step: n → n + 1. First we show that the costs for a clique are as stated in the lemma. This follows from the fact that every node in the clique is connected with each other node; thus, each σ eliminates one χ for the following σ steps.

v_{a_1} . . . v_{a_{n+1}} forms a clique in G
⇒ cost(σ_{a_1} . . . σ_{a_{n+1}}) = cost(σ_{a_1} . . . σ_{a_n}) + I f^n [(d − n)c + 1]

v_{a_1} . . . v_{a_n} forms a clique in G
⇒ cost(σ_{a_1} . . . σ_{a_n}) = I Σ_{i=0}^{n−1} f^i [(d − i)c + 1]

⇒ cost(σ_{a_1} . . . σ_{a_{n+1}}) = I Σ_{i=0}^{n} f^i [(d − i)c + 1]

Now we show by contradiction that the costs stated in the lemma can only be produced by nodes in a clique and that all other sequences have higher costs. Assume v_{a_1} . . . v_{a_{n+1}} does not form a clique in G, and cost(σ_{a_1} . . . σ_{a_{n+1}}) ≤ I Σ_{i=0}^{n} f^i [(d − i)c + 1]:

Since v_{a_1} . . . v_{a_{n+1}} does not form a clique in G, ∃ 1 ≤ k ≤ n such that v_{a_k} is not connected with v_{a_{n+1}}; therefore no more than n − 1 χ operators required by σ_{a_{n+1}} can already be implied by σ_{a_1} . . . σ_{a_n}.

⇒ cost(σ_{a_{n+1}} | σ_{a_1} . . . σ_{a_n}) > I f^n [(d − n)c + 1]

⇒ cost(σ_{a_1} . . . σ_{a_{n+1}})
  = cost(σ_{a_1} . . . σ_{a_n}) + cost(σ_{a_{n+1}} | σ_{a_1} . . . σ_{a_n})
  ≥ I Σ_{i=0}^{n−1} f^i [(d − i)c + 1] + cost(σ_{a_{n+1}} | σ_{a_1} . . . σ_{a_n})
  > I Σ_{i=0}^{n−1} f^i [(d − i)c + 1] + I f^n [(d − n)c + 1]
  = I Σ_{i=0}^{n} f^i [(d − i)c + 1]

This is a contradiction to the assumption.