Relational expressive power of constraint query languages

Relational Expressive Power of Constraint Query Languages

Michael Benedikt, Guozhu Dong, Leonid Libkin, andLimsoon Wong

Copyright 1996 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or [email protected].

Relational Expressive Power of Constraint Query Languages�Michael BenediktyAT&T Bell Labs Guozhu DongzU. of Melbourne Leonid LibkinxAT&T Bell Labs Limsoon Wong{Inst. of Systems ScienceAbstractWe study the expressive power of �rst order query languages with several classes of equalityand inequality constraints and give three main results. (1) We settle the conjecture that recursivequeries such as parity test and transitive closure cannot be expressed in relational calculus withpolynomial inequality constraints over the reals. (2) Noting that relational queries exhibit severalforms of genericity, we establish a number of collapse results of the following form: The class ofgeneric queries expressible in relational calculus augmented with a class of constraints coincideswith the class of queries expressible in relational calculus (with or without an order relation).We prove such results for both natural and active-domain semantics. (3) As a consequence, therelational calculus augmented with polynomial inequalities expresses the same classes of genericrelational queries under the natural and active-domain semantics.In the course of proving these results for the active-domain semantics, we establish Ramsey-type theorems saying that any query involving constraints coincides with a constraint-free query ondatabases whose elements come from a certain in�nite subset of the domain. To prove the collapseresults for the natural semantics, we make use of techniques from nonstandard analysis and fromthe model theory of analytic functions.1 IntroductionIn new database applications involving spatial data (as in geographical databases) and temporal data,it is necessary to store in databases in�nite collections of items and to evaluate queries on suchin�nite collections. The constraint database model, introduced by Kanellakis, Kuper and Revesz intheir seminal paper [18], is designed to meet the requirements of such applications and is a powerfulgeneralization of Codd's relational model. In this new paradigm, instead of tuples, queries act on\generalized tuples" expressed as quanti�er-free �rst-order constraints. A generalized relation is a�Part of this work was done when Limsoon Wong was visiting University of Melbourne and AT&T Bell Labs inMurray Hill.yAT&T Bell Laboratories, 1000 E Warrenville Rd, P.O. Box 3013, Naperville, IL 60566-7013, USA, Email:[email protected] of Computer Science, University of Melbourne, Parkville, Vic. 3052, Australia, Email:[email protected]&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA, Email: [email protected].{Real World Computing Partnership Novel Function Institute of Systems Science Laboratory, Heng Mui Keng Terrace,Singapore 0511, E-mail: [email protected]. 1

�nite set of such generalized tuples. Interesting constraint query languages are then obtained bycoupling the relational calculus with arithmetic constraints. Well-known example of queries that areinexpressible in the pure relational calculus but are expressible with this extension include testingwhether a set of points forms a convex hull.So the coupling of relational calculus with arithmetic constraints enhances power. A natural question,which attracted much attention recently, arises: How much more power can we gain from this coupling?In particular, several recent papers [17, 18, 16, 26, 11, 23] made and attempted to settle the followingconjecture.Conjecture. Queries such as transitive closure, connectivity test, parity test are not de�nable in therelational calculus plus polynomial inequality constraints over the reals.These three queries are singled out because they involve two basic primitives, recursion and counting,and because it was known that they cannot be expressed by the relational calculus [4]. In [10] it wasshown that useful properties for proving the inexpressibility of these queries in the relational calculus,such as locality [8] and 0/1-law [7], do not carry over to the constraint query languages. Nevertheless,with a considerable amount of e�ort, a number of inexpressibility results were established recently.In [12] it is shown, via an AC0 data complexity result, that the parity query cannot be expressed ifonly linear constraints are added to the relational calculus. In [2] it is shown that testing whether aconstraint database is contained in a line is not de�nable with linear constraints. Testing whether aconstraint database represents a line is not de�nable in �rst-order with the order relation [3].Parity and connectivity tests are examples of generic queries. That is, their answer does not changewhen a bijective map on the domain is applied to a database. It is therefore natural to pose themore general question below. The answer to this question can be applied to queries whose inputs aregeneralized relations using order constraints built from a �xed set of variables, since such generalizedrelations can be translated to relations [10].Question. Do constraints add pure relational expressive power? More speci�cally, when limited torelational inputs and outputs, do the extended query languages express more generic queries than therelational calculus?We answer this question under both the active and natural semantics. Under the active semantics,quanti�cation variables are assumed to range over the active domain of the database. Under the naturalsemantics, quanti�cation variables are assumed to range over the whole universe of reals numbers. (1)We show that the addition of constraints to the relational calculus does not add more power beyondordering when interpreted under the active domain semantics. We establish these results by proving,mostly constructively, several Ramsey-style theorems. (2) We show similar results for the naturalsemantics. We establish these results using techniques from nonstandard analysis and some resultsin the model theory of analytic functions. (3) As a consequence, the conjecture mentioned above isa�rmed, and the relational calculus plus polynomial inequality constraints is shown to express thesame generic queries under the two di�erent semantics.The coincidence of the two semantics was established for the special cases of the relational calculus byHull and Su [15] and of the relational calculus with linear constraints by Paredaens, Van den Bussche,and Van Gucht [24]. However, these two results of [15] and [24] are not limited to generic queries.It was also shown in [24] that linear constraints do not add expressive power beyond <. Our results2

generalize the result of [24] to apply to a wide class of signatures. Another constrast to our results isthat of Grumbach and Su [11]. They showed that, with an integer test function in the signature, onecan de�ne parity of cardinality of �nite relations over the reals under the natural interpretation. Itis necessary, therefore to put some sort of restriction on the signature. Our result about the naturalexpansion restricts to signatures that are o-minimal [25]: signatures that can de�ne only a certainkind of subset of the real line. We are able to use a number of results that establish o-minimality, andthis turns out to be su�cient to settle the main conjecture.In addition to the constraint model, other extensions of Codd's relational model have received muchattention in the past few years. Among them are extensions to complex objects and bags. It isinteresting to compare the addition of constraints with those extensions. Several conservativity resultsproved in [33, 20, 21] show that the addition of nested relations (as in complex objects) does not addrelational expressive power. On the other hand, the addition of bags does add relational expressivepower [9, 19, 22]. In particular, parity test is de�nable using bags on ordered domains [19].Organization. Section 2 presents the notations that are used throughout the paper. We also explainthe active and natural semantics of relational calculus and state two previous results relating them.Section 3 describes three notions of genericity of queries. We also brie y investigate their relationship.Section 4 studies the expressibility of various classes of generic queries using the relational calculusunder the active semantics as the ambient language. We prove powerful collapse theorems saying thatthe expressibility of these queries is independent of the presence of arithmetic and other operators inthe ambient language. These results are proved via several Ramsey-like results. Moreover, for thespecial cases of real, rational, and integer arithmetic, our proofs are constructive.Section 5 considers the expressibility of generic queries, but using the relational calculus under thenatural semantics as the ambient language. We prove an analogous collapse theorem saying that theexpressibility of these queries is again independent of the presence of a large class of arithmetic andother operators in this more powerful ambient language. This result is proved via an excursion throughnonstandard models. The conjecture mentioned earlier follows as a corollary.Section 6 concludes the paper by relating the `active' and the `natural' results of this paper. It isknown that the two semantics of relational calculus coincide if < is the only extra primitive. This factand our collapse theorems extend the coincidence of the active and natural semantics, with respect togeneric queries, to a larger number of additional primitives.2 NotationsFix a database schema SC = hR1; : : : ; Rki where Ri is of arity �i. Assume that the domain is anin�nite set U. A database (or database instance) DB is given by an interpretation of each relationalsymbol Ri as a �nite �i-ary relation over U. We denote the set of all database instances of SC withthe domain U by Inst(U;SC ). Given a database DB, its active domain adom(DB) is the set of allelements in U that appear in the database.A relational boolean query (rbq) is a �rst-order sentence built up from atomic formulae in thelanguage containing hR1; : : : ; Rki and equality via the usual logical connectives and quanti�ers of the3

form Qixi, where Qi is either 8 or 9. If we allow our atomic formulae to also mention a symbol <,which is interpreted as a linear order on U then we speak of a �-rbq. For each of the semantics wewill give for these sentences, it will be the case that an arbitrary rbq can be e�ectively transformedinto a semantically equivalent one of the form(1) Q1x1: : : : Qnxn:(x1; : : : ; xn)where each Qi is either 8 or 9 and (�) is a quanti�er-free formula with free variables among x1; : : : ; xn.Let be an interpreted signature on U; that is, a family of operations ! : Uarity (!) ! U. For example,if U = R, then may be h+;�; �;�; exi. A constraint boolean query (cbq) is a sentence built upfrom atomic formulae by connectives and quanti�ers such that is in the language that consists ofpredicate symbols R1; : : : ; Rk, equality and symbols for operations in . For example, if U = R, thena cbq may ask if for two reals in a database, their sum is also in the database. Again, each cbq canbe transformed into an equivalent one in the prenex normal form (1).There are two possible interpretations of each of these classes of sentences. Under the active-domainsemantics (or just active semantics), all quanti�ed variables range over the active domain of a database.That is, a sentence given by (1) de�nes, under active semantics, the query Q such that the value of Qon a database DB is the value ofQ1x1 2 adom(DB): : : : Qnxn 2 adom(DB):(x1; : : : ; xn)Under the natural semantics, all quanti�ed variable range over U. That is, the sentence de�nes thequery Q whose value on DB equals toQ1x1 2 U: : : : Qnxn 2 U:(x1; : : : ; xn)Convention: When the semantics (active or natural) is clear, we will often identify the the sentenceQ that de�nes an rbq or a cbq with the semantic object Q, that is, the map from Inst(U;SC ) tofT;Fg.The value of Q on DB will be denoted by Q(DB). If this value is true, we also write DB j= Q. Itwill always be clear from the context which interpretation is used for evaluating Q.The classes of �rst-order Boolean queries (that is, maps from Inst(U;SC) to fT;Fg) de�nable underthe active and natural semantics will be denoted by AFO(�) and NFO(�) respectively, where welist the domain U (if it is not understood) and the operations from in parentheses. For example,AFO is just the relational calculus, AFO(<) is the relational calculus with the total order relation,AFO(R;+;�; �; 0; 1; <) is the class of Boolean queries with polynomial inequality constraints de�nableunder active semantics, and NFO(R;+; �; 0; 1) is the same class of queries interpreted under naturalsemantics. Note that in this case we do not have to list the order relation as it is de�nable undernatural semantics: x < y , 9z:(z 6= 0) ^ (y � x = z � z), since z ranges over the reals.There are a number of results showing that the natural semantics does not add expressiveness overthe active. In particular, we shall use the following two.Fact 1 (See [15]) AFO = NFO. 24

Fact 2 (See [24]) AFO(R;+;�; 0; 1; <) = NFO(R;+;�; 0; 1; <). 2De�nition 1 Two queries Q1 and Q2 are a-equivalent i� for any database DB, under the active-domain interpretation we have Q1(DB) = Q2(DB). They are n-equivalent i� for any database DB,under the natural interpretation we have Q1(DB) = Q2(DB). When it is clear from the context howQ1 and Q2 are interpreted, we speak of equivalent queries. 2Facts 1 and 2 imply that when the signature contains only =, or linear constraints on the reals, twoqueries are a-equivalent i� they are n-equivalent.3 Notions of GenericityGiven a constraint language, one may ask queries that are speci�c to that language. For example,one may ask if a database contains a root of a given polynomial. However, purely relational queriesmust conform to the data independence principle which says that the internal structure of data has noe�ect on the answers to queries. This is usually captured by a notion of genericity. The most standardnotion of genericity states that queries are invariant under arbitrary permutations of the domain [1].For constraint databases, more complex notions of genericity were considered [23]. Here we use threenotions of genericity. If ' : U ! U is an endomap on the domain, then ' can be extended to databasesover U in an obvious way. We use 'DB to denote this extension.De�nition 2 � A queryQ is totally generic (TG) with respect to a domain U if for any databaseDB and any permutation � of U, it is the case that Q(�DB) = Q(DB).� A query Q ismonotone generic (MG) with respect to an ordered domain U if for any databaseDB and any monotone injective map ' : U ! U, it is the case that Q('DB) = Q(DB).� A query Q is locally generic (LG) with respect to an ordered domain U if for any databaseDB and any partial monotone function ' : U 7! U de�ned on adom(DB), it is the case thatQ('DB) = Q(DB).For any language L, we let LTG, LLG, and LMG stand respectively for the class of TG-, LG-, andMG-queries expressible in L. 2For example, AFOTG(<) is the class of TG-queries de�nable in relational calculus with order, andNFOLG(R;+; �; 0; 1) the class of LG-queries de�nable under the natural interpretation with polyno-mial constraints over the reals. Since queries de�nable in AFO and AFO(<) are TG and LG (MG)respectively, we have the equations AFOTG = AFO and AFOLG(<) = AFOMG(<) = AFO(<).When U is ordered, TG is the strongest notion because it implies both LG and MG. Also, LG impliesMG. Under certain mild restrictions on the order, we also have MG implies LG. We do not need thelocal notion of genericity for unordered sets because it is equivalent to TG for any in�nite U.We call a linearly ordered set hS;<i doubly transitive [27] if for every a < b and x < y there exists anautomorphism f : S ! S such that f(a) = x and f(b) = y.5

Proposition 1 With respect to any in�nite universe U, it is the case that every TG-query is also aLG-query and every LG-query is also a MG-query. With respect to any in�nite ordered universe Uthat is doubly transitive, it is the case that every MG-query is also a LG-query.Proof. To prove that TG implies LG, consider a database with D = adom(DB). Let �D : U ! Ube a partial monotone injective map de�ned on D. Since both U � D and U � �D(D) have thesame cardinality, there exists a permutation � on U that extends �D. Using TG for Q we obtainQ(�DDB) = Q(�DB) = Q(DB), which implies that Q is LG. If Q is LG, consider any monotoneinjection ' : U ! U. Given a database with D = adom(DB). Let 'D be the restriction of ' on D.Then Q('DB) = Q('DDB) = Q(DB). Hence Q is MG. The proof that MG implies LG for doublytransitive orders is similar to that of TG implies LG, because double transitivity provides the neededextension of a partial map, see [27]. 2To see that most of the examples of ordered domains used in the theory of constraint databases aredoubly transitive, we state the following lemma.Lemma 1 ([27]) hQ; <i and hR; <i are doubly transitive. Also, any countable dense linear orderingwithout endpoints is doubly transitive.Thus, when we prove the collapse results for classes of generic queries over ordered databases, we shallaim to prove the result for LG-queries. Then they will automatically imply the corresponding resultsfor TG-queries and, if U is doubly transitive, for MG-queries as well.4 Relational Expressive Power: Active SemanticsIn this section we prove a number of collapse results for the active-domain semantics. Thus throughoutthis section we assume that queries are interpreted under the active-domain semantics. We start bystating the results. Then we prove the main technical results of the section (which we call Ramseytheorems for constraint databases). These results establish the existence of an in�nite subset of thedomain on which a given cbq is equivalent to some query that does not use constraints. Finally, weuse those theorems to conclude the proofs of the collapse theorems.First, we show a very general collapse result for ordered databases.Theorem 1 Let be an arbitrary interpreted signature on the linearly ordered domain hU; <i. Thenfor every LG cbq there exists an equivalent �-rbq. In other words,AFOLG(; <) = AFO(<):This result has a number of corollaries. One of them was also independently proved by Van den Busscheand Otto [29], who used the Ehrenfeucht-Mostowski theorem about the existence of indiscernibles [5].Another corollary of this theorem is that the class of MG-queries in AFO(; <) is exactly the classof AFO(<) queries when the domain is doubly transitive. While this covers domains such as Q and6

R, it excludes Z which is not doubly transitive. To cover the case of Z, we prove a more powerfulcorollary. We call a linearly-ordered set hS;<i without endpoints weakly homogeneous if S is in�niteand for every �nite X � S and in�nite Y � S there exists a monotone injection f : S ! S suchthat f(X) � Y . It is not hard to see that any doubly transitive order is weakly homogeneous and,in addition, so is any linear order without endpoints in which all intervals are �nite (in particular,hZ; <i).Corollary 1 � Let hU; <i be weakly homogeneous and let be an arbitrary signature. ThenAFOMG(; <) = AFO(<).� (See [29]) Let hU; <i be an ordered domain and let be an arbitrary signature. ThenAFOTG(; <) = AFOTG(<).� Transitive closure, parity test, and connectivity test are not �rst-order de�nable over ordereddatabases under the active-domain interpretation, even in the presence of an arbitrary interpretedsignature on the domain. 2There is no analog of the theorem for the unordered case, because one of the interpreted operationscan de�ne a linear order, and it is known that AFO(<) 6= AFO; see [1, page 462]. However, we canstill prove the collapse result for unordered databases for a large class of signatures.Let the domain be R, and let = (h1; h2; : : :) be an at most countable signature that consists offunctions Rarity(hi ) ! R. We call sparse if the subtraction is in and for every -term h(x1; : : : ; xn)and any constants c1; : : : ; ci�1; ci+1; : : : ; cn 2 R, the function hi(x) = h(c1; : : : ; ci�1; x; ci+1; : : : ; cn) iseither identical to zero or has at most countably many zeros.There is a simple way to obtain a large number of sparse signatures over the reals, using the fact thatany analytic function has at most countably many zeros [28].Proposition 2 Let (H1;H2; : : :) be any collection of analytic functions such that the value of each Hiis in R if all its arguments are in R. Let hi be the restriction of Hi to the real arguments. If one ofthe His is subtraction, then = (h1; h2; : : :) is sparse. 2For example, (+;�; �; ex; 0), (+;�; 0; 1), and (+;�; �; 0; 1) are sparse signatures.Theorem 2 Let be a sparse signature. Then for every TG cbq in AFO(R;) there exists ana-equivalent rbq. That is, AFOTG(R;) = AFO:Note that the statements of Theorems 1 and 2 are of a very di�erent nature than complexity basedresults such as [12].Theorem 2 does not easily generalize for Q or Z because the fact that R is uncountable is crucialfor the proof. However, if we are only interested in the ring structure, there is a nice generalization.Recall that an integral domain K is a commutative ring without divisors of zero.7

Theorem 3 Let K be an in�nite integral domain. Then every TG-query in AFO(K ;+;�; �; 0; 1) isa-equivalent to some rbq. That is,AFOTG(K ;+;�; �; 0; 1) = AFO4.1 Ramsey Theorems for Constraint DatabasesThe following propositions are the basis for the proofs of the theorems in this section. They state thatany cbq coincides with some rbq on databases whose active domain is a subset of a certain in�nitesubset of U.Given U � U, we call two queries Q and Q0 U-equivalent if, for any database DB,adom(DB) � U implies Q(DB) = Q0(DB).Recall that we assume the active-domain interpretation of queries. We start with the ordered case.Proposition 3 Let be an arbitrary interpreted signature on the linearly ordered domain hU; <i.Then for any cbq Q there exists an in�nite subset UQ � U and a �-rbq Q0 which is UQ-equivalent toQ.Proof. First we make an observation that will be useful later. Assume that �i(x1; : : : ; xn), i = 1; 2,are formulae in n variables, and let �1(~d) = �2(~d) for any ~d 2 Dn for some set D. Then for anyquanti�er pre�x we have Q1x1 2 D : : : Qnxn 2 D:�1(~x) = Q1x1 2 D : : : Qnxn 2 D:�2(~x).We can transform an arbitrary �rst-order sentence into prenex normal form. Thus, from now on weassume that Q is given by (1). Without loss of generality, we can assume that is a formula indisjunctive normal form whose literals are of form:a. Ri(z1; : : : ; z�i), where all zjs are variables.b. t1(~y)�t2(~z), where t1 and t2 are terms in the signature , � 2 f=; <g, and ~y, ~z contain variablesfrom x1; : : : ; xn.Any formula Q can be rewritten to the form described above. Indeed, if the argumentsof the predicate Ri are terms rather than variables, then we replace each occurrence ofRi(t1(y11 ; : : : ; yli11 ); : : : ; t�i(y1�i ; : : : ; yli�i�i )) by 9z1 : : : 9z�i :(z1 = t1(~y1)) ^ : : : ^ (z�i = t�i(~y�i)) ^Ri(z1; : : : ; z�i). It is easy to see that such a replacement does not change the value of the formula forevery assignment of values in the domain to the variables. Then we can convert the resulting formulainto the desired prenex normal form, whose quanti�er-free part is in disjunctive normal form.We may also assume that no literal of the form b is negated. This follows because the domain islinearly ordered. That is, :(t1(~y) = t2(~z)) can be replaced by (t1(~y) < t2(~z)) _ (t2(~z) < t1(~y)), and:(t1(~y) < t2(~z)) can be replaced by (t1(~y) = t2(~z)) _ (t2(~z) < t1(~y)).Now subdivide the disjuncts into two classes. The �rst one consists of those which are conjunctionsof (possibly negated) literals of the form a. The second class consists of those which are conjunctions8

of literals, at least one of them being of the form b. Thus, = 1 _2, where 1 is a formula in thesignature hR1; : : : ; Rk;=;�i.Let pn be the number of ordered partitions of an n-element set. (An ordered partition of X is apartition X1, ..., Xs with a linear order Xi1 < : : : < Xis on its elements.) Let �i be the formulathat speci�es that values of x1, ..., xn form the ith ordered partition. For example, if n = 3, for thepartition ff1; 2g; f3gg with f1; 2g < f3g, the corresponding � is (x1 = x2) ^ (x1 < x3). Note thatWpni=1 �i = T. Hence, 2 can be replaced by Wpni=1(�i ^2).Let c1; : : : ; cm denote all the constraints of the form b present in 2. It can be readily seen that Q isequivalent to(2) Q1x1: : : : Qnxn:1 _ pn_i=12[(cj ^ �i)=cj ]Here [(cj ^ �i)=cj ] means that every cj is replaced by cj ^ �i. The equivalence follows from the factthat in 2 in every disjunct at least one literal is a condition, and no condition literal is negated. Nowwe need the following lemma.Lemma 2 Suppose that a query Q is given by (20), where(20) Q1x1: : : : Qnxn:1 __i2I2[(cj ^ �i)=cj ]I is a nonempty subset of f1; : : : ; png and 1 does not mention any symbol from . Then, for anyin�nite D � U, there exists an in�nite D0 � D, a formula 01 that does not mention any symbol in ,and an i0 2 I such that for Q0 de�ned by(200) Q1x1: : : : Qnxn:01 _ _i2I�fi0g2[(cj ^ �i)=cj ]it is the case that Q and Q0 coincide for any database DB with adom(DB) � D0.Before we prove this lemma, let us observe that the proposition follows straightforwardly from it. Wejust apply the lemma inductively to (2) until all formulae that mention symbols in disappear.To prove the lemma, pick an element i0 2 I arbitrarily. Let �i0 specify an ordered partition withs classes X1; : : : ;Xs � fx1; : : : ; xng ordered by X1 < : : : < Xs. Let yi 2 Xi; then we may rewrite2[(cj ^ �i0)=cj ] to 2[c0j=cj ] ^ where c0j is a condition of form t1�t2 that only uses fy1; : : : ; ysg and states that all yis are di�erent and that y1 < : : : < ys.Since there are m conditions c0j , there are 2m possible truth values for the system c01; : : : ; c0m. Let Albe the set of s-tuples (d1; : : : ; ds) of distinct elements of D ordered by d1 < : : : < ds such that thetruth values of c0i(d1; : : : ; ds) form the binary representation of the number l, 0 � l � 2m � 1. ThenA0; : : : ; A2m�1 form a partition of the set of s-tuples of distinct elements of D.By Ramsey's theorem, we can �nd an in�nite D0 � D such that all s-tuples of distinct elements of D0are in the same class of the partition, say Al. Let l correspond to the truth values of c0js being tjs.Let i0 = (2[t1=c1; : : : ; tm=cm] ^ ) and 01 = 1 _i0 :9

Then, for any n-tuples of elements of D0, the values of1 _ _i2I�f1;:::;png2[(cj ^ �i)=cj ] and 01 _ _i2I�fi0g2[(cj ^ �i)=cj ]coincide. Indeed, if (d1; : : : ; dn) does not satisfy �i0 , this is immediate (because the disjuncts thatmake the di�erence between the formulae evaluate to F), and if it does satisfy �i0 , it follows from thefact that i0 and 2[(cj ^ �i0)=cj ] coincide on such tuples.Applying the observation made at the beginning of the proof, we see that for Q0 de�ned by (200) where01 = 1 _ i0 , it is the case that Q coincides with Q0 on all databases with active domain in D0.Since i0 does not mention any symbols in , this proves the lemma and the proposition. 2Next, we prove a similar result for sparse signatures.Proposition 4 Let be a sparse signature. Then for any cbq Q there exists an in�nite subset UQ � Uand a rbq Q0 which is UQ-equivalent to Q.Proof. First, assume that Q is given byQx1 : : : Qxn:(x1; : : : ; xn)where is in DNF, and its literals are of form Ri(z1; : : : ; z�i) where all zjs are variables, or they arecondition literals h1(~x)�h2(~x) where h1 and h2 are -de�nable functions and � 2 f=; 6=g; see the proofof Proposition 3. Further assume without loss of generality that no condition literal is negated.Let Bn be the nth Bell number; that is, Bn is the number of partitions of an n-element set. Let�i represent the ith partition on an n-element set of variables, i = 1; : : : ; Bn. For example, for thepartition ff1; 2g; f3; 4gg of f1; 2; 3; 4g we would get � as (x1 = x2) ^ (x3 = x4) ^ (x1 6= x3). SinceWBni=1 �i = T, we have = WBni=1(�i ^).Consider each �i ^. Let �i specify the partition X1; : : : ;Xs and let yi be a representative of Xi. Foreach condition h1(~x)�h2(~x) in , we replace hi by h0i in which only y1; : : : ; ys are used. Furthermore,if h01 is identically h02, we replace the corresponding condition by T or F respectively. Thus, Q isequivalent to a query of form(3) Qx1 : : : Qxn: Bn_i=1(�i ^i)where i is a formula in s variables (provided �i speci�es a partition with s classes) and each conditionis of form h01(~x)�h02(~x), where h01 and h02 are not identical. Moreover, h01 and h02 use variables for which�i implies that their values are distinct. Next, we need the following lemma.Lemma 3 Given a system of equations f1(~x) = 0; : : : ; fk(~x) = 0 where fis are -de�nable functions inat most n variables x1; : : : ; xn. If no fi is identically zero, then there exists an in�nite set R � R suchthat assigning distinct values of R to variables x1; : : : ; xn simultaneously invalidates all the equations.10

This lemma immediately implies the statement of Proposition 4. De�ne a system of equations Fas follows: go through all the disjuncts in Q | assumed to be in the form (3) | and for eachcondition h01(~x)�h02(~x) add h01(~x) � h02(~x) = 0 to F . Note that h01(~x) � h02(~x) is not identically zeroby construction. By the lemma, �nd an in�nite R � R such that assigning distinct values from R tovariables in equations in F simultaneously invalidates all of them. Then rewrite Q to Q0 by replacingeach h01(~x) = h02(~x) by F and each h01(~x) 6= h02(~x) by T. Then, for any DB with adom(DB) � R it isthe case that Q(DB) = Q0(DB), and Q0 does not mention any symbols from , which concludes theproof.To prove the lemma, we call a function IR-de�nable if it is de�nable in extended with constantsfor all elements of R. Let f(x1; : : : ; xn) be a function in n variables and 1 � i � n. We let f i;s be thefunction in n� 1 variables de�ned as f(x1; : : : ; xi�1; s; xi+1; : : : ; xn). We need the followingClaim 1 For any at most countable collection of IR-de�nable functions G = g1; g2; : : : that are notidentically zero and any at most countable set S � R, there exists s 2 R � S such that none of thefunctions gi;s, where g 2 G, is identically zero.To prove this claim, consider a IR-de�nable non-zero function g(x1; : : : ; xn) and assume without lossof generality that i = n. First notice that there exists a (n � 1)-vector ~c such that g~c(x) = g(~c; x) isnot identically zero. Indeed, if g~c is identically zero for any ~c, then g is identically zero. Now pick sucha ~c and, by applying the fact that is sparse, �nd a countable set Sg;i of roots of g~c. Note that forevery s 62 Sg;i we obtain that gi;s is not identically zero.Now consider the countable set SG = S [ [g2G[i Sg;iIt follows from our construction that for any s 2 R � SG none of the functions gi;s is identically zero.This �nishes the proof of the claim.To conclude the proof of the lemma, we start with our original set F0 = F of functions and S = ;,and use the claim to �nd s0 such that none of the functions in F1 = F0 [ fgi;s0 j g 2 F0g isidentically zero. Continuing, at the mth step we have a �nite set S = fs0; : : : ; sm�1g � R and aset Fm�1 of functions none of which is identically zero. We �nd sm 2 R � S such that none ofthe functions in Fm = Fm�1 [ fgi;sm j g 2 Fm�1g is identically zero. Now consider the in�nite setR = fs0; s1; s2; : : :g. It follows immediately from our construction that assigning distinct sis to distinctvariables simultaneously invalidates all equations f(~x) = 0, f 2 F0. This �nishes the proof of thelemma and the proposition. 2A similar result can be shown for integral domains.Proposition 5 Let K be an in�nite integral domain, i.e. = (+;�; �; 0; 1). Then for any cbq Qthere exists an in�nite subset UQ � K and a rbq Q0 which is UQ-equivalent to Q.Proof. We use an argument that is very similar to the proof of Proposition 4. Let p 2 K [x1 ; : : : ; xn]11

be a polynomial in n variables over K . We can represent it as(P1) p = Xi2I ci � xmi11 � : : : � xminnwhere I is a �nite set of indices, Mi = (mi1; : : : ;min) is an n-tuple of natural numbers, Mi 6= Mj fori 6= j and ci 2 K � f0g for all i. (The zero polynomial is represented by I = ;.) From now on assumethat all polynomials are represented by (P1). Note that all conditions used in Q can be assumed tobe of the form p(~x)�0, where p is a polynomial over K . Hence, according to the proof of Proposition4, all that is required to conclude the proof is the following:Lemma 4 For any system of equations p1(~x) = 0; : : : ; pk(~x) = 0 where pi 2 K [x1 ; : : : ; xn] are poly-nomials that are not identically zero, there exists an in�nite set K � K such that assigning distinctvalues of K to variables x1; : : : ; xn simultaneously invalidates all the equations.We �rst need thisClaim 2 If p 2 K [x1 ; : : : ; xn] is written in (P1), then p is identically zero i� either I = ; or ci = 0for all i 2 I.To prove this claim, �rst notice that it is enough to consider polynomials in K [x]. Indeed, if p 2K [x1 ; : : : ; xn] is identically zero, then p0(x) = p(x; 1; : : : ; 1) 2 K [x] is identically zero. To prove the claimfor p 2 K [x], we use induction on card(I). If I is empty, we are done. If I is a singleton, this followsfrom the fact that there are no divisors of zero. Assume card(I) > 1 and p(x) = cnxn+ � � �+ c1x+ c0.Since p(0) = 0, we have c0 = 0. Then by cancellation we have p0(x) = cnxn�1 + � � � + c2x + c1 isidentically zero. By the induction hypothesis, we obtain c1 = : : : = cn = 0.Let p 2 K [x1 ; : : : ; xk] be a polynomial in k variables. For any index i and any s 2 K we denote thepolynomial in k � 1 variables, obtained by substituting s for xi in p, by pi;s.Next, we claim the following.Claim 3 For any �nite collection of polynomials p1; : : : ; pm 2 K [x1 ; : : : ; xp] that are not identicallyzero, and for any �nite set S � K , there exists s 2 K � S such that none of the polynomials pi;sj ,j = 1; : : : ;m, i = 1; : : : ; k is identically zero.To prove this claim, assume that all pjs are represented in the form (P1). Then none of the coe�cientscis is zero by Claim 2. Fix a polynomial p = pj and assume it is given by (P1). Let Mi1 ; : : : ;Mil bethose Ms whose components, except the ith one, coincide. If pi;s is identically zero, then(4) ci1sk1 + � � � + cilskl = 0where kr is the ith component of Mr. Doing this operation for all pjs and all collections of Ms inthe prepresentation (P1) that coincide outside of their ith component, we obtain a �nite number of12

equations that must hold if some of the polynomials pi;s are identically zero. Since all the coe�cientsin equations (4) are nonzero, they may only have �nitely many roots. Thus, if we choose s outside ofS such that s does not coincide with any root of the polynomials (4), then none of pi;s is identicallyzero by Claim 2.Now we can conclude the proof of Lemma 4 (and thus of Proposition 5) in exactly the same way aswe did for Proposition 4. 2Note that we did not give a constructive way of �nding the in�nite set UQ. While the mere existenceof such a set is su�cient for us to prove the main theorems, we now demonstrate that for the signatureof a ring the sets UQ have a very simple structure. By Dr we denote the set frri j i 2 N+g.Proposition 6 Let U be R or Q or Z. Then for any cbq Q there exists a number r in N such that theset UQ from the previous propositions can be taken to be Dr. Moreover, r and the corresponding Q0can be e�ectively constructed from Q.Proof. The proofs of all propositions in this section were based on the fact that we can simultane-ously invalidate all nontrivial constraints. Since in the given signature all constraints are of the formp(~x)f=; 6=; <; 6<g0, we have to �nd an in�nite set on which a �nite number of such constraints aresimultaneously validated or invalidated, provided they are all nontrivial.Let p be a polynomial in n variables given by (P1). Consider an arbitrary ordering i1 � i2 � : : : � inof f1; : : : ; ng. OrderMjs lexicographically with respect to �, i.e. Mj �Ml if mji1 > mli1 or mji1 =mli1and mji2 > mli2 etc. Let Mk be the maximal one of Mj , j 2 I, wrt �. Notice that Mk is uniquelyde�ned. The following is the key lemma.Lemma 5 For p and Mk (as constructed above) there exists r0 2 R (which can be e�ectively con-structed) such that for every r � r0 and for every ~x = (x1; : : : ; xn) 2 Dr � : : : � Dr such thatxi1 > xi2 > : : : > xin , it is the case thatsign(p(~x)) = sign(ck)It is easy to see that Lemma 5 implies the proposition. For the ordered case, we can constructivelyrewrite Q to the form (2). This lemma gives us the sign and the corresponding r0 of each polynomialin the rewritten formula. Hence it allows us to replace each inequality constraint by true or by falseand each equality constraint by false. We can then take r to be the greatest of these r0. Also, at thispoint, we would have already constructed the desired Q0 for the ordered case | the nonconstructivesteps described in Lemma 2 are thus avoided.For the unordered case, we can rewrite Q to the form (3). Then for each polynomial in the rewrittenformula and for each possible order of its variables, we determine a r0 using Lemma 5. Then r can betaken as the greatest of these r0. Then all equations in the rewritten formula can be replaced by falseand all inequations by true | the nonconstructive steps described in Lemma 3 can thus be skipped.It remains now to prove Lemma 5. To make the notation bearable, assume without loss of generalitythat 1 � 2 � : : : � n. Let ck > 0. We use the notation ~xM for Qni=1 xmii for M = (m1; : : : ;mn). Let13

(?) stand for x1 > x2 : : : > xn. Let � = maxi2I maxj=1;:::;nmijClaim 4 If r > n�+ 1, then for any ~x 2 Dnr satisfying (?) and for any Mi 6=Mk, we have~xMk > ~xMirTo prove this claim, assume that M = Mi is under M in the lexicographic ordering. Then, forMk = (m1; : : : ;mn) and M = (�1; : : : ; �n) we can �nd an i such that mj = �j for j < i and mi > �i.For now assume i 6= n. Let xi = rrs . We obtain, since r > 1,~xMk = nYj=1xmjj � (Yj<ixmjj ) � (rrs)miFurthermore, ~xM = (Yj<ix�jj ) � (rrs)�i � (Yj>ix�jj )� (Yj<ixmjj ) � (rrs)mi�1 � (Yj>i (rrs�1)�)� (Yj<ixmjj ) � (rrs)mi�1 � (rrs�1)n�since xjs come from Dr and we know r > 1 and xi > xj for j > i. Finally,~xMk~xM � rrs(rrs�1)n� � r(r�n�)rs�1 > rrs�1 � rsince r�n� > 1 and r > 1. The proof for the case i = n is similar (essentially the last product is notneeded). The claim is thus proved.De�ne G = maxi(jci j) � card(I) and then it su�ces to setr0 = max(Gck ; n�) + 1To see this, suppose r � r0. If I = fkg, we are done. If not, using Claim 4, we obtain for any vector~x 2 Dnr satisfying (?): p(~x) � ck � ~xMk � Xi6=k jci j �~xMi> ck � ~xMk � Xi6=k jci j �~xMkr� ck � ~xMk � Xi6=kmaxl (jcl j) � ~xMkr> ck � ~xMk � maxl (jcl j) � card(I) � ~xMkr= ~xMk(ck � Gr ) > 014

Thus, p(~x) > 0, which proves the case ck > 0. To prove the case ck < 0, just apply the above proof to�p. Lemma 5 is proved. 24.2 Proofs of Main TheoremsWe can now present the straightforward proofs of the main results of this section.Proof of Theorem 1 and Corollary 1 Consider a cbq Q and �nd a set UQ and a �-rbq Q0 asin Proposition 3. Now we claim that Q and Q0 are equivalent. Indeed, take any database DB andlet D = adom(DB). Find any partial monotone injective map � de�ned on D such that �(D) � UQ.This is possible because UQ is in�nite. Since Q is locally generic, Q(DB) = Q(�DB). By Proposition3, Q(�DB) = Q0(�DB) since adom(�DB) � UQ. Since Q0 is a �-rbq, it is locally generic, and henceQ0(�DB) = Q0(DB). Thus, Q(DB) = Q0(DB), proving that Q and Q0 are equivalent as desired.For the proof of Corollary 1, part a, note that if U is weakly homogeneous, then there is a monotoneinjective map such that �(D) � UQ. Furthermore, Q(DB) = Q(�DB) because Q is MG. Then theproof follows. 2Proof of Theorems 2 and 3 Consider a cbq Q and �nd an in�nite set UQ and a rbq Q0 as inProposition 4 or 5. We claim that Q and Q0 are equivalent. For a database DB let D = adom(DB).Find any permutation � of R such that �(D) � UQ. Since Q is generic, Q(DB) = Q(�DB). ByProposition 4 or 5, Q(�DB) = Q0(�DB) since adom(�DB) � UQ. Since Q0 is a rbq, it is totallygeneric, and thus Q0(�DB) = Q0(DB). Therefore, Q(DB) = Q0(DB), proving that Q and Q0 areequivalent as desired. 25 Relational Expressive Power: Natural SemanticsIn this section we prove the collapse theorem for the natural interpretation of queries. That is, weprove that for certain signatures , any LG-query in NFO(; <) can be de�ned just in NFO(<).Throughout this section we assume that the domain U is linearly ordered by <, and that queries areevaluated under the natural interpretation. We say that is o-minimal (see [14]) if every subsetof U that is de�nable with parameters in the model hU;i is composed of a �nite union of (possiblydegenerate) intervals. By intervals we mean sets of form fx j x < ag, fx j x > ag, fx j a < x < bg orclosed, or half-open half-closed versions of these.If U = R, then examples of o-minimal signatures include:� (+; �; <; 0; 1) | this follows from Tarski's quanti�er elimination theorem.� (+; �; ex; 0) | this follows from the work of [32], see also [31].� (+; �; ex;�(x); 0) | this was recently announced [30].15

There are many other intersting examples of o-minimal structures, see [31].For the rest of this section, we restrict our attention to signatures that are o-minimal. Our mainresult isTheorem 4 If is an o-minimal signature on U, then for every LG cbq there exists an n-equivalent�-rbq. That is NFOLG(; <) = NFOLG(<):We choose to make use of the technique of nonstandard universes. This will be of use in simplifyingsome of the bookkeeping involved in Ehrenfeucht-Fraisse games. It also allows us to construct aproof that follows this basic intuition: constraint boolean queries over the real �eld cannot distinguish\large" instances which agree on \all" relational boolean queries.5.1 PreliminariesWe start with some de�nitions of nonstandard universes. For more information, the reader shouldconsult [5].For any set S, the superstructure V (S) over S is de�ned as V (S) = Sn<! Vn(S) where V1(S) = S,and Vn+1(S) = Vn(S) [ fX j X � Vn(S)g.We will work with the structure hV (S);2i considered as a structure for the �rst-order language forthe epsilon relation. A bounded-quanti�er formula in this language is a formula built up fromatomic formulas by the logical connectives and the quanti�cation: 8X 2 Y , 9X 2 Y , where X and Yare variables.A nonstandard universe consists of a pair of superstructures V (S) and V (Y ) and a mapping� : V (S) ! V (Y ) which is the identity when restricted to S (i.e. �x = x for each x in S) and whichsatis�es1. Y = �S.2. (Transfer Principle) For any bounded quanti�er formula �(v1; : : : ; vn) and any list a1; : : : ; an ofelements from V (S), �(a1; : : : ; an) is true in V (S) if and only if �(�a1; : : : ; �an) is true in V (Y ).An element of V (Y ) is standard if it is in the image of the �-map. An element of V (Y ) is internalif it is in the downward transitive closure of the standard sets under 2. Elements of V (Y ) that arenot internal are called external. An internal map is a map whose graph is an internal set.We will assume that our universe also satis�es the following3. (Countable Saturation Principle) For every standard A, and every countable collection �(x;~v)of bounded-quanti�er formulas, and for every vector ~c of internal sets, if every �nite subset of�(x;~c=~v) is satis�ed in V (Y ) by some element of A, then �(x;~c=~v) is satis�ed by an element ofA. 16

Finally, we work in a nonstandard universe that satis�es the following axiom:4. (Isomorphism Property) For any �nite �rst-order language L, and any two L-structures M1 andM2 that are internal, if M1 and M2 are L-elementary equivalent (i.e., agree on all standardsentences of L), then there is an (not necessarily internal) L-isomorphism between M1 and M2.For example, the isomorphism property above guarantees that any two sets whose cardinality isa nonstandard integer have the same external cardinality. For basic facts about the isomorphismproperty, and a proof that saturated models with the isomorphism property exist, the reader is referredto [13].We will often omit the � when convenient: for example, if < is an ordering on S, x1 and x2 areelements of �S, then we will write x1 < x2 rather than x1�< x2. Similarly, if Q is a query on schemaSC, and M is a �-database (an element of the set �S , where S is the set of databases for SC), thenwe will refer to q(M) rather than �q(M).For the proof we take S to be the disjoint union of U and N.We �rst state a couple of propositions that will be useful.Proposition 7 Let L be a �rst-order language such that each symbol is represented by an internal setin the nonstandard universe (although the whole set of symbols need not be internal). Let M be an L-structure such that the domain of M is internal and the interpretation of each symbol in L is internal(such a model is called \internally presented" in the literature). Let �(y) be a countable collection ofL-formula. Then, if � is �nitely satis�able in M , then � is satis�ed in M .Proof. For each formula �(y) in �, letM� be the reduct of M to the (�nite, hence internal) languageof �, and let �0(y) be the formula (in the language of set theory) that says hN�; yi satis�es �(y). Theneach �0(y) is an internal formula satis�ed in the nonstandard universe, so by countable saturation,there is a y satisfying each �0(y). 2Proposition 8 Let L be a language for which each symbol is internal, and let M be an L-structuresuch that the domain of M is internal and the interpretation of each symbol in L is internal. Let �(~x)be a formula of L that has standard �nite cardinality (i.e. number of symbols). Then the internalsatisfaction predicate �j= agrees with the external satifaction predicate j= on �. That is, if ~c is a �nitesequence of parameters from M , then M �j= �(~c) i� M j= �(~c).The proof is by straightforward induction on logical complexity, using the Tarskian de�nition of truthand the transfer principle.Given the above proposition we will not distinguish the two kinds of satisfaction predicates when weare dealing with �nitary �'s.5.2 Proof of theorem 4Note that it su�ces to prove the theorem for �nite, since any counterexample to collapse wouldinvolve a single constraint boolean query, which would involve only �nitely many symbols from the17

language of . So henceforth we will asume to be �nite.Let Q be a counterexample query over some schema SC = fR1; : : : ; Rng. That is, Q is expressible in and is locally generic, but is not expressible only with order.Proposition 9 There are two �SC-databases M1 and M2 in the nonstandard universe (that is, ele-ments of the image under � of the set of databases) such that M1 and M2 agree on every <-query butdisagree on Q.Proof. Note thatM1 andM2 are internal. Let �n enumerate the <-queries. Let =n be the equivalencerelation on databases given by D1 =n D2 i� D1 and D2 agree on the �rst n �i.By saturation, it su�ces to show that for every standard natural number n, there are two modelsthat agree on �i for each i � n but disagree on Q. If this were not the case, then the models of Qare composed of �nitely many =n equivalence classes. But since each equivalence class is de�nableby a <-cbq, this would make Q de�nable as a <-cbq as well, since it would be the disjunction of the�nitely many sentences de�ning the =n classes contained in it, contrary to the assumption on Q. 2Now, let the active domain of M1 have size H, and the active domain of M2 have size K. Withoutloss of generality, we will assume K � H.Let c1; : : : ; cH be an (internal) sequence of elements of �U indiscernible over . Such a sequenceexists by transfer, since for each n there is a sequence indiscernible for the �rst n formulas of . Letd1; : : : ; dK be any subsequence of the ci's of length K.There is an (internal) order-preserving bijection from the active domain of M1 to c1; : : : ; cH . Thisis true by transfer, since the nonstandard universe believes that for any two subsets of U with thesame cardinality, there is an order-preserving map from one to the other. Let M 01 be the image of M1under this mapping | the active domain of M 01 is now c1; : : : ; cH . Similarly, we can get a model M 02with active domain d1; : : : ; dK by applying a di�erent order-preserving mapping. Since the mappingspreserve order, we still have that M 01 and M 02 also agree on every <-query. By the genericity of Q, M 01and M 02 still disagree about Q.Consider M 01 and M 02 as models for the �rst-order language containing the schema predicates and <,with �U as the domain of both models. The fact that M 01 and M 02 agree on all <-queries says exactlythat these two models are elementary equivalent.Applying the Isomorphism Principle to these models, we get that there is a mapping f from �U ontoitself that is an isomorphism (in the language of < plus the schema relations) of M 01 onto M 02. Notethat since f is an isomorphism of the database relations, f maps something in the active domain ofM 01 to something in the active domain ofM 02. Hence f maps the c1; : : : ; cH bijectively onto d1; : : : ; dK .Also note that f is external.Now we consider a new language L0 with symbols for the elements of and <, plus constant symbolskI for I � H. Let N1 be the model for L0 with domain �U, the elements of and order interpreted inthe usual way, and with kI interpreted by cI for I � H. Let N2 be the same, except kI is interpretedby f(cI). Note that the model N2 is not internal.18

We de�ne the relation ~x ) ~y on vectors of the same length from �U to mean that the model hN1; ~xiis L0-elementary equivalent to hN2; ~yi.Proposition 10 ; ) ;Proof. Let �(kj1 ; : : : ; kjn) be any sentence in L0 satis�ed by N1. That is, �; cj1 ; : : : ; cjn satis�es �0,where �0 is the formula mentioning only the symbols in the language of that is obtained from � byreplacing each kj by a free variable. By indiscernibility, �; cm1 ; : : : ; cmn satis�es �0 whenever the mi'sare ordered the same as the ji's. Since the mapping f is order-preserving, and since (as noted above)it maps each ci to some dj, we get that �; f(cj1); : : : ; f(cjn) satis�es �0. But this means N2 satis�es�. 2Proposition 11 The relation ~x) ~y has the back and forth property. That is, if ~x) ~y, then� For each w in �U there is z in �U such that h~x;wi ) h~y; zi.� For each w in �U there is z in �U such that h~x; zi ) h~y;wi.Proof. Let ~x and ~y with ~x) ~y be given, and let w in �U be arbitrary. Let Ln be the language withsymbols for the operations in , and a constant symbol kI for each I � H, and n extra constantswhere n is the length of ~x.SupposeClaim 5 There is a countable set of formulas �(z) in Ln with the property that 1) every formula in �is satis�ed by w in hN1; ~xi and 2) for every formula �(z) satis�ed by w in hN1; ~xi, there is a formula�(z) in � such that hN1; ~xi satis�es 8z:�(z)! �(z).Then the �rst part of the proposition can be argued as follows. First, we show that �(z) is satis�edin hN2; ~yi. That is, with the constants interpreted by f(ci) instead of ci and the �nitely many extraconstants interpreted by ~y instead of ~x. By Proposition 7, it su�ces to show that �(z) is �nitelysatis�ed in hN2; ~yi. But this follows from the fact that �(z) is �nitely satis�able in hN1; ~xi and thefact that ~x) ~y.So we have a w0 that satis�es �(z) in hN2; ~yi. We now show that this w0 satis�es all the same formulasof Ln that w does in hN1; ~xi. Let �(z) be a formula of Ln satis�ed by w in hN1; ~xi. By Claim 5, thereis a formula �(z) of �(z) such that hN1; ~xi j= 8z:�(z)! �(z)Since ~x) ~y we have hN2; ~yi j= 8z:�(z)! �(z)Since w0 satis�es � in hN2; ~yi, it satis�es � in hN2; ~yi. Therefore, w0 satis�es � in hN2; ~yi as desired.The second part of the proposition is proved similarly by assuming the analogous claim below onhN2; ~yi. 19

Claim 6 There is a countable set of formulas �(z) in Ln with the property that 1) every formula in �is satis�ed by w in hN2; ~yi and 2) for every formula �(z) satis�ed by w in hN2; ~yi, there is a formula�(z) in � such that hN2; ~yi satis�es 8z:�(z)! �(z).Next, we prove Claim 5. For any one-variable formula (y) in the language of , possibly includingparameters from �U, an endpoint of (y) means an endpoint of some maximal interval contained inthe subset of �U de�ned by . By o-minimality, each such formula has only �nitely many endpoints.Also, if we have two endpoints x0 < x1 for the formula (y), and there is no other endpoint for (y)lying between these two, then the truth value of (y) is constant on the open interval (x0; x1).For each formula �(~z; y) in the language of with parameters from ~x, letal� = maxfu j there is ~c from c1; : : : cH such that u is an endpoint of �(~c=~z; y) and u � wgar� = minfu j there is ~c from c1; : : : cH such that u is an endpoint of �(~c=~z; y) and w � ugThen al� � w � ar� and there is a vector ~c from c1; : : : ; cH such that both ar� and al� are de�nablefrom ~c [ ~x (since there are only �nitely endpoints for �(~c=~z; y), and we have a linear ordering todistinguish these endpoints, each endpoint is de�nable from the parameters in the formula �).Case 1. If al� = w or ar� = w for some �, then w is de�nable from parameters in c1; : : : ; cH by aformula in the language of . By replacing the parameter ci by the constant symbol from ki, weget that w is de�nable in Ln by a single formula (y). This clearly proves the claim as we can take�(y) to be (y).Case 2. Suppose al� < w < ar� for each �. Now let ��(y) be the formula9z9z0:�1�(z) ^ �2�(z0) ^ z � y � z0where �1� and �2� are the formulas de�ning al� and ar�. Let �(y) be the formula of Ln formed byreplacing each parameter cI by the constant symbol kI . Then each � is a formula of Ln.Now let �(y) = h �(y)i, where � varies over the countable set of formulas in the variables ~z and ywith parameters from ~x. Then we show that �(y) works.If we have another element w0 in �U and �(y) is a formula in Ln satis�ed by w, then there is a formula�0(~z; y) in the language of with parameters from ~x such that � is obtained from �0 by replacingvariables in ~z by constant symbols from hkIiI<H . But then the de�nitions imply that �(y) does notchange truth value between al�0 and ar�0 , hence � is of constant truth value strictly between thesetwo elements.We claim that hN1; ~xi j= �0(y)! �(y). Indeed, if w0 satis�es �0(y), then both w and w0 lie strictlybetween al�0 and ar�0 . Therefore, �(w) $ �(w0), and hence �(w0) holds. This �nishes the proof ofClaim 5.We now brie y trace out the proof of Claim 6 to complete the proof of the proposition. For eachformula �(~z; y) in the language of with parameters from ~y, letal� = maxfu j there is ~d from d1; : : : dK such that u is an endpoint of �(~d=~z; y) and u � wg20

ar� =minfu j there is ~d from d1; : : : dK such that u is an endpoint of �(~d=~z; y) and w � ugCase 1. If al� = w or ar� = w for some �, then w is de�nable from parameters coming from the dI 'sand ~y by a formula in the language of . By replacing the parameter dI with the constant symbolkJ , where f(cJ) = kI , we get that w is de�nable in Ln by a single formula 0(y). This clearly provesthe claim since we can take �(y) = 0(y).Case 2. Suppose al� < w < ar� for each �. Now let ��(y) be the formula9z9z0:�1�(z) ^ �2�(z0) ^ z � y � z0where �1� and �2� are the formulas de�ning al� and ar�. Let �(y) be the formula of Ln formedby replacing each parameters dI by the constant symbol kJ as before (i.e. kJ such that f(cJ) = dI).Then each � is a formula of Ln.We now let �(y) = h �(y)i as before, and check that this works. This completes the proof of Propo-sition 11. 2Finally, let the language L+ contain symbols for the database relations in the schema, and also theoperations in . Let P1 and P2 be the expansions of M 01 and M 02 to L+. Then we haveProposition 12 P1 is elementary equivalent in L+ to P2.Proof. We show how to win the Ehrenfeucht game for L+ (equivalently, we show P1 is partiallyisomorphic to P2). If our opponent plays cI in P1, then we play f(cI) in P2, and if our opponent playsdI in P2, then we play f�1(dI) in P1. If our opponent plays a w in P1 that is not one of the cI 's,then we apply Proposition 11 to get our response in P2, and similarly when our opponent plays in P2.The fact that this strategy works follows from Proposition 11 and the fact that the mapping f is anisomorphism of the schema relations. 2But now we have two models P1 and P2 that agree on every -query, but disagree about Q (since theyare expansions ofM 01 andM 02). Hence Q cannot be expressible in the language of , which contradictsthe assumption on Q. This concludes the proof of Theorem 4.6 Putting it all togetherIn this section, we �rst tie together our results on the active semantics and the natural semantics ofgeneric queries. Then we provide a summary of this paper followed by some concluding remarks.Theorem 5 Let o-min be an o-minimal signature. Then NFOLG(o-min; <) = AFO(<).Proof. Let Q be in NFOLG(o-min; <). By Theorem 4 it is in NFOLG(<) and hence inNFOLG(R;+;�; 0; 1; <). By Fact 2, Q is inAFOLG(R;+;�; 0; 1; <). By Theorem 1 it is inAFO(<).Note that even though we have used the real domain in the argument above, o-min does not need tobe on the reals. 221

AFOTG(sparse) � - NFOLG(o-min; <) ======== AFOLG(; <)AFOTG(+;�; �; 0; 1)wwwwwwww � - NFOTG(+; �; 0; 1) � - NFOLG(+; �; 0; 1)wwwwwwww ===== AFOLG(+;�; �; 0; 1; <)wwwwwwwwAFOTG(+;�; 0; 1)wwwwwwww � - NFOTG(+;�; 0; 1; <)wwwwwwww �- NFOLG(+;�; 0; 1; <)wwwwwwww ====[24] AFOLG(+;�; 0; 1; <)wwwwwwww

AFOwwwwwwwww =================[15] NFOTG[6 � - NFOLG(<)wwwwwwww =============== AFO(<)wwwwwwwwwFigure 1: Summary of the expressiveness results for databases over the realsIt follows immediately thatCorollary 2 Let o-min be an o-minimal signature. Then NFO(o-min; <) cannot express transitiveclosure, deterministic transitive closure, parity test, and connectivity test. In particular, none of theabove is expressible under the natural interpretation of the relational calculus with constraints of theform f(~x)�g(~x) where � 2 f=;�g and f; g are functions de�nable in the signature (+;�; �; ex; 0). 2In conclusion, we have thus settled the open problem of whether parity and connectivity can beexpressed in the relational calculus with arithmetic constraints. In addition, we have shown that theaddition of arithmetics does not give us more power to de�ne generic queries, for both the activedomain semantics and the natural semantics. In fact, we have proved that the two semantics oftencoincide when limited to generic queries. Thus, we have given a clear picture of the expressiveness ofthe relational calculus with arithmetic constraints, where generic queries are concerned.The diagram shown in Figure 1 summarizes our expressiveness results for databases over the reals.By o-min we mean any o-minimal signature and by sparse we mean any sparse signature. The== arrow means equality and the ,! arrow means proper embedding. That AFOTG(+;�; �; 0; 1),! NFOTG(+; �; 0; 1) follows from the fact that there are TG-queries that are not expressible inrelational calculus without order. The embedding NFOTG(+; �; 0; 1) ,! NFOLG(+; �; 0; 1) followsfrom the following observation. Let the scheme contain two unary relation symbols R1 and R2 andlet Q be 8x8y:(R1(x) ^ R2(y)) ! x < y. Since Q is LG but not TG, it separates NFOLG(+; �; 0; 1)from NFOTG(+; �; 0; 1). Other embeddings and equations follow from the results of this paper.22

Acknowledgements. We thank Jianwen Su, Jan Van den Bussche, Martin Otto, and Dirk VanGucht for discussions, encouragements, and references.References[1] S. Abiteboul, R. Hull and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.[2] F. Afrati, S. Cosmadakis, S. Grumbach and G. Kuper. Linear vs polynomial constraints indatabase query languages. In Proceedings of Conference on Constraint Programming, 1994.[3] F. Afrati, T. Andronikos and T. Kavalieros. On the expressiveness of �rst-order query languages.To appear in Proceedings of Conference on Constraint Programming, 1995.[4] A. V. Aho and J. D. Ullman. Universality of data retrieval languages. In Proceedings of 6thSymposium on Principles of Programming Languages, Texas, pages 110{120, January 1979.[5] C.C. Chang and H.J. Keisler. Model Theory. North Holland, 1990.[6] J. Denef and L. Van den Dries. p-adic and real subanalytics sets. Annals of Mathematics 128(1988), 79-138.[7] R. Fagin. Probabilities on �nite models. Journal of Symbolic Logic, 41(1):50-58, 1976.[8] H. Gaifman. On local and non-local properties. In Proceedings of the Herbrand Symposium, LogicColloquium '81, pages 105{135, North Holland, 1982.[9] S. Grumbach and T. Milo. Towards tractable algebras for bags. In Proceedings of 12th ACMSymposium on Principles of Database Systems, Washington, D. C., pages 49{58, May 1993.[10] S. Grumbach and J. Su. Finitely representable databases, In Proceedings of 13th ACM Symposiumon Principles of Database Systems, Minneapolis, Minnesota, pages 289{300, May 1994.[11] S. Grumbach and J. Su. First-order de�nability over constraint databases. To appear in Proceed-ings of Conference on Constraint Programming, 1995.[12] S. Grumbach, J. Su, and C. Tollu. Linear constraint databases. In Proceedings of LCC, 1994.[13] C.W. Henson. The isomorphism property in nonstandard analysis and its use in the theory ofBanach spaces. Journal of Symbolic Logic 39 (1974), 717{731.[14] W. Hodges. Model Theory. Cambridge University Press, 1992.[15] R. Hull and J. Su. Domain independence and the relational calculus. Acta Informatica 31:513{524,1994.[16] P. Kanellakis. Constraint programming and database languages: A tutorial. In Proceedings of14th ACM Symposium on Principles of Database Systems, San Jose Ca, pages 46{53, May 1995.[17] G. Kuper. On the expressive power of the relational calculus with arithmetic constraints. InProceedings of International Conference on Database Theory, pages 201-211, 1990.23

[18] P. Kanellakis, G. Kuper, and P. Revesz. Constraint query languages. Journal of Computerand System Sciences, to appear. Extended abstract in Proceedings of 9th ACM Symposium onPrinciples of Database Systems, Nashville, Tennessee, pages 299{313, May 1990.[19] L. Libkin and L. Wong. Some properties of query languages for bags. In Proceedings of 4thInternational Workshop on Database Programming Languages, New York, pages 97{114, August1993.[20] L. Libkin and L. Wong. Aggregate functions, conservative extension, and linear orders. InProceedings of 4th International Workshop on Database Programming Languages, New York,pages 282-294, August 1993.[21] L. Libkin and L. Wong. Conservativity of nested relational calculi with internal generic functions,Information Processing Letters 49:273{280, 1994.[22] L. Libkin and L. Wong. New techniques for studying set languages, bag languages, and ag-gregate functions. In Proceedings of 13th ACM Symposium on Principles of Database Systems,Minneapolis, Minnesota, pages 155{166, May 1994.[23] J. Paredaens, J. Van den Bussche, and D. Van Gucht. Towards a theory of spatial databasequeries. In Proceedings of 13th ACM Symposium on Principles of Database Systems, Minneapolis,Minnesota, pages 279{288, May 1994.[24] J. Paredaens, J. Van den Bussche, and D. Van Gucht. First-order queries on �nite structures overthe reals. In Proceedings of 10th IEEE Symposium on Logic in Computer Science, San Diego,California, pages 79{87, 1995.[25] A. Pillay, C. Steinhorn. De�nable sets in ordered structures Bulletin of the AMS, 11,pages159-62.[26] P. Revesz. The computational complexity of constraint query languages. In Working notes ofthe Workshop on Semantics in Databases, Prague, 1995. Full paper to appear in the Proceedingsof the Workshop.[27] J. G. Rosenstein. Linear Orderings. Academic Press, New York, 1982.[28] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 1966.[29] J. Van den Bussche. Private email. June 1995.[30] L. Van den Dries. New o-minimal expansions of the reals. Lecture. May 1995[31] L. Van den Dries, A. Macintyre and D. Marker. The elementary theory of restricted analytic�elds with exponentiation. Annals of Mathematics 85 (1994), 19{56.[32] A.J. Wilkie. Model completeness results for expansions of the ordered �eld of real numbers byrestricted Pfa�an functions and the exponentail function. To appear in Journal of the AMS.[33] L. Wong. Normal form and conservative properties for query languages. In Proc. 12th ACMSymp. on Principles of Database Systems, 1993.24

Relational expressive power of constraint query languages

Documents

Transcript of Relational expressive power of constraint query languages