Maintaining Views Incrementally

Maintaining Views IncrementallyExtended AbstractAshish Gupta �Stanford [email protected] Inderpal Singh MumickAT&T Bell [email protected] V.S. SubrahmanianyUniversity of [email protected] present incremental evaluation algorithms to computechanges to materialized views in relational and deductivedatabase systems, in response to changes (insertions,deletions, and updates) to the relations. The viewde�nitions can be in SQL or Datalog, and may use UNION,negation, aggregation (e.g. SUM, MIN), linear recursion,and general recursion.We �rst present a counting algorithm that tracksthe number of alternative derivations (counts) for eachderived tuple in a view. The algorithm works with bothset and duplicate semantics. We present the algorithmfor nonrecursive views (with negation and aggregation),and show that the count for a tuple can be computed atlittle or no cost above the cost of deriving the tuple. Thealgorithm is optimal in that it computes exactly thoseview tuples that are inserted or deleted. Note that westore only the number of derivations, not the derivationsthemselves.We then present the Delete and Rederive algorithm,DRed, for incremental maintenance of recursive views (ne-gation and aggregation are permitted). The algorithmworks by �rst deleting a superset of the tuples that needto be deleted, and then rederiving some of them. Thealgorithm can also be used when the view de�nition isitself altered.�This work was supported by NSF grants IRI-91-16646 andIRI-90-16358, and ARO DAAL-03-G-0177.yThis work was supported by ARO grant DAAL-03-92-G-0225, and NSF grant IRI-9109755, and AFOSR grant F49620-93-1-0065.

1 IntroductionA view is a derived (idb) relation de�ned in terms ofbase (stored, or edb) relations. A view can be ma-terialized by storing its extent in the database. In-dex structures can be built on the materialized view.Consequently, database accesses to materialized viewtuples is much faster than by recomputing the view.Materialized views are especially useful in distributeddatabases. However, deletion, insertions, and up-dates to the base relations can change the view. Re-computing the view from scratch is too wasteful inmost cases. Using the heuristic of inertia (only a partof the view changes in response to changes in the baserelations), it is often cheaper to compute only thechanges in the view. We stress that the above is onlya heuristic. For example, if an entire base relationis deleted, it may be cheaper to recompute a viewthat depends on the deleted relation (if the new viewwill quickly evaluate to an empty relation) than tocompute the changes to the view.Algorithms that compute changes to a view in re-sponse to changes to the base relations are called in-cremental view maintenance algorithms. Several suchalgorithms with di�erent applicability domains havebeen proposed [BC79, NY83, SI84, BLT86, BT88,BCL89, CW91, Kuc91, QW91, WDSY91, CW92,DT92, HD92]. View maintenance has applicationsin integrity constraint maintenance, index mainte-nance in object-oriented databases (de�ne the indexbetween attributes of interest as a view), persistentqueries, active database [SPAM91, RS93] (a rule may�re when a particular tuple is inserted into a view).We present two algorithms, counting and DRed,for incremental maintenance of a large class ofviews. Both algorithms use the view de�nition toproduce rules that compute the changes to the viewusing the changes made to the base relations andthe old materialized views. Both algorithms can 1

handle recursive and nonrecursive views (in SQL orDatalog extensions) that use negation, aggregation,and union, and can respond to insertions, deletionsand updates to the base relations. However, we areproposing the counting algorithm for nonrecursiveviews, and the DRed algorithm for recursive views,as we believe each is better than the other on thespeci�ed domain.Example 1.1 Consider the following view de�ni-tion. link(S;D) is a base relation and link(a; b) istrue if there is a link from source node a to destina-tion b. hop(c; d) is true if c is connected to d via twolinks i.e. there is a link from node c to some node xand a link from x to d.CREATE VIEW hop(S;D) AS(SELECT r1:S, r2:DFROM link r1, link r2WHERE r1:D = r2:S).Given link = f(a; b); (b; c); (b; e); (a; d); (d; c)g, theview hop evaluates to f(a; c); (a; e)g. The tuplehop(a; e) has a unique derivation. hop(a; c) on theother hand has two derivations. If the view hadduplicate semantics then hop(a; e) would have a countof 1 and hop(a; c) would have a count of 2.Suppose the tuple link(a; b) is deleted. Then wecan re-evaluate hop to f(a; c)g.The counting algorithm infers that one derivationof each of the tuples hop(a; c) and hop(a; e) isdeleted. The algorithm uses the stored counts toinfer that hop(a; c) has one remaining derivationand therefore only deletes hop(a; e), which has noremaining derivation.The DRed algorithm �rst deletes tuples hop(a; c)and hop(a; e) since they both depend upon thedeleted tuple. The DRed algorithm then looksfor alternative derivations for each of the deletedtuples. hop(a; c) is rederived and reinserted into thematerialized view in the second step. 2Counting The counting algorithm works by stor-ing the number of alternative derivations of each tu-ple t in the materialized view. We call this num-ber count(t). count(t) is derived from the multiplic-ity of tuple t under duplicate semantics, as de�nedin [Mum91] for positive programs and in [MS93a] forprograms with strati�ed negation. Given a programT de�ning a set of views V1; : : : ; Vk, the counting algo-rithm derives a program T� at compile time. T� usesthe changes made to base relations and the old valuesof the base and view relations to produce as outputthe set of changes, �(V1); : : : ;�(Vk), that need to

be made to the view relations. We assume that thecount value for each tuple is stored in the material-ized view. In the set of changes, inserted tuples arerepresented with positive counts and deleted tuplesare represented with negative counts. The new mate-rialized view is obtained by combining the changes�(V1); : : : ;�(Vk) with the stored views V1; : : : ; Vk(combining counts as de�ned in Section 3). The in-cremental view maintenance algorithmworks for bothset and duplicate semantics. On nonrecursive viewswe show that counts can be computed at little or nocost above the cost of evaluating the view (Section 5)for both set and duplicate semantics; hence it can beused for SQL. We propose to use the counting algo-rithm only for nonrecursive views, and describe thealgorithm for nonrecursive views only.Deletion and Rederivation The DRed algorithmcan incrementally maintain (general) recursive views,with negation and aggregation. Given the changesmade to base relations, changes to the view relationsare computed in three steps. First, the algorithmcomputes an overestimate of the deleted derivedtuples: a tuple t is in this overestimate if thechanges made to the base relations invalidate anyderivation of t. Second, this overestimate is prunedby removing (from the overestimate) those tuplesthat have alternative derivations in the new database.Finally, the new tuples that need to be added arecomputed using the partially updated materializedview and the changes made to the base relations.Only set semantics can be used for this algorithm.The algorithm can also maintain materialized viewsincrementally when rules de�ning derived relationsare inserted or deleted.Paper Outline Section 2 compares the results inour paper with related work. Section 3 introducesthe notation used in the paper. Section 4 describesthe counting algorithm for maintaining nonrecursiveviews. Section 5 describes how the counting algo-rithm can be implemented e�ciently. We show thata computation of counts imposes almost no overheadin execution time and storage. Section 6 explainshow negation and aggregation are handled by thecounting algorithm of Section 4. Section 7 discussesthe DRed algorithm for maintaining general recursiveviews. The results are summarized in Section 8.2 Related WorkCeri and Widom [CW91] describe a strategy to ef-�ciently update views de�ned in a subset of SQL 2

without negation, aggregation, and duplicate seman-tics. Their algorithm depends on keys, and cannot beused if the view does not contain the key attributesof a base relation. Qian and Wiederhold [QW91]use algebraic operations to derive the minimal rela-tional expression that computes the change to select-project-join views. The algorithms by Blakeley et.al. [BLT86] and Nicolas and Yazdanian (The BD-GEN system [NY83]) are perhaps most closely relatedto our counting algorithm. Blakeley's algorithm isa special case of the counting algorithm applied toselect-project-join expressions (no negation, aggrega-tion, or recursion). In BDGEN, the counts re ectonly certain types of derivations, are multiplied toguarantee an even count for derived tuples, and allrecursive queries are given �nite counts. Thus theBDGEN counts, unlike our counts, do not correspondto the number of derivations of a tuple, are more ex-pensive to compute, and the BDGEN algorithm can-not be used with (SQL) duplicate semantics or withaggregation, while our algorithm can be.Kuchenho� [Kuc91] and Harrison and Dietrich (thePF algorithm [HD92]) discuss recursive view main-tenance algorithms related to our rederivation algo-rithm. Both of these algorithms cannot handle aggre-gation (we can). Where applicable, the PF (Propaga-tion/Filteration) algorithm computes changes in onederived predicate due to changes in one base pred-icate, iterating over all derived and base predicatesto complete the view maintenance. An attempt torecompute the deleted tuples is made for each smallchange in each derived relation. In contrast, our red-erivation algorithm propagates changes from all basepredicates onto all derived predicates stratum by stra-tum, and recomputes deleted tuples only once. ThePF algorithm thus fragments computation, can red-erive changed and deleted tuples again and again, andcan be worse that our rederivation algorithm by anorder of magnitude (examples in full paper). Kuchen-ho�'s algorithm needs to store auxiliary relations,and fragments computation in a manner similar tothe PF algorithm.Dong and Topor [DT92] derive nonrecursive pro-grams to update right-linear chain views in responseto insertions only. Dong and Su [DS92] give nonrecur-sive transformations to update the transitive closureof speci�c kinds of graphs in response to insertionsand deletions. The algorithm does not apply to allgraphs or to general recursive programs. They alsoneed auxiliary derived relations, and cannot handlenegation and aggregation. Urpi and Olive [UO92]need to derive functional dependencies, a problem

that is known to be undecidable. Wolfson et. al.[WDSY91] use a rule language with negation in thehead and body of rules, along with auxiliary informa-tion about the number of certain derivations of eachtuple. They do not discuss how to handle recursivelyde�ned relations that are derivable in in�nitely manyiterations, and do not handle aggregation.3 NotationWe use Datalog, mostly as discussed in [Ull89],extended with strati�ed negation [VG86, ABW88],and strati�ed aggregation [Mum91]. Datalog ex-tended with strati�ed negation and aggregation canbe mapped to a class of recursive SQL queries, andvice versa [Mum91]. We chose Datalog syntax overSQL syntax for conciseness.De�nition 3.1 StratumNumbers (SN) and RuleStratum Number (RSN): Stratum numbers are as-signed as follows: Construct a reduced dependencygraph (RDG) of the given program by collapsing ev-ery strongly connected component (scc) of the de-pendency graph (as de�ned by [ABW88]) to a singlenode. A RDG is guaranteed to be acyclic. A topo-logical sort of the RDG assigns a stratum number toeach node. If a node represents a scc, all predicates inthe scc are assigned the stratum number of the node.By convention, base predicates have stratum number= 0. The rule stratum number of a rule r, RSN(r),having predicate p in the head is equal to SN(p). 2P refers to the relation corresponding to predicate p.P = fab;mng represents tuples p(a; b) and p(m;n).De�nition 3.2 �(P ): For every relation P , relation�(P ) contains the changes made to P . 2For each tuple t 2 P , count(t) represents thenumber of distinct derivations of tuple t. Similarlyevery tuple in �(P ) has a count associated with it.Negative and positive counts correspond to deletionsand insertions respectively. For instance, �(P ) =fab � 4;mn � �2g says that four derivations of tuplep(a; b) are inserted into P and two derivations of tuplep(m;n) are deleted.The union operator, ], is de�ned over sets of tupleswith counts. Given two such sets S1 and S2, S1]S2is de�ned as follows:1. If tuple t appears in only one of S1 or S2 with acount c, then tuple t appears in S1 ] S2 with acount c. 3

2. If a tuple t appears in S1 and S2 with counts ofc1 and c2 respectively, and c1+ c2 6= 0, then tuplet appears in S1 ] S2 with a count c1 + c2. Ifc1 + c2 = 0 then t does not appear in S1 ] S2.P � refers to the relation P after incorporating thechanges in �(P ). Thus, P � = P ] �(P ). Thecorrectness of our algorithm guarantees that a tuplein P � will not have a negative count; only tuples inrelation �(P ) will have negative counts. The joinoperator is also rede�ned for relations with counts:when two or more tuples join, the count of theresulting tuple is a product of the counts of the tuplesjoined.4 Incremental Maintenance ofNonrecursive Views using CountingThis section gives an algorithm that can be usedto maintain nonrecursive views that use negation andaggregation. We �rst give the intuition using anexample.Example 4.1 Intuition. Consider the view hopde�ned in Example 1.1. We rewrite the viewde�nition using Datalog for succinctness and ease ofexplanation. Recall that link = fab;mng representstuples link(a; b) and link(m;n).(v1): hop(X;Y ) :{ link(X;Z) & link(Z; Y ).If the base relation link changes, the derived relationhop may also change. Let the change to the relationlink be represented as �(link). �(link) containsboth inserted and deleted tuples, represented bypositive and negative counts respectively. The newrelation link� can be written as: link + �(link).The following rule computes the new value for thehop relation in terms of the relation link� :hop�(X;Y ) :{ link�(X;Z) & link�(Z; Y ).Using link� = link ] �(link) and distributingjoins over unions, the above rule can alternatively bewritten as the following set of rules:hop�(X;Y ) :{ link(X;Z) & link(Z; Y ).hop�(X;Y ) :{ �(link)(X;Z) & link(Z; Y ).hop�(X;Y ) :{ link(X;Z) & �(link)(Z; Y ).hop�(X;Y ) :{ �(link)(X;Z) & �(link)(Z; Y ).The �rst rule recomputes relation hop. The remain-ing three rules de�ne �(hop), the changes in relationhop. Of these three rules, the last two can be com-bined, using the fact that link� = link ]�(link).The set of rules that de�nes predicate �(hop) istherefore:

(d1) : �(hop)(X;Y ) :{ �(link)(X;Z) & link(Z; Y )(d2) : �(hop)(X;Y ) :{ link�(X;Z) & �(link)(Z; Y )2De�nition 4.1 Delta Rules: With every rule r ofthe form:(r): p :{ s1 & : : : & sn.we associate n delta rules �i(r); 1 � i � n, de�ningpredicate �(p) as follows:(�i(r)): �(p) :{ s�1 & : : : & s�i�1 & �(si) &si+1 & : : : & sn.2 The counting algorithm is listed as Algorithm 4.1.Ignore statement (2) (surrounded by a box) for now;it will be discussed in Section 5.Example 4.2 Consider the view tri hop de�nedusing the view hop (rule v1, Example 4.1).(v2) : tri hop(X;Y ) :{ hop(X;Z) & link(Z; Y )The stratum numbers of predicates hop and tri hopare 1 and 2 respectively. Consider the following baserelation link and the associated derived relations hopand tri hop.link = fab; ad; dc; bc; ch; fgg.hop = fac � 2; dh; bhg.tri hop = fah � 2g.Let the base relation link be altered as follows:�(link) = fab � �1; df; afg.link� = fad; af; bc; dc; ch; df; fgg.In order to compute the changes, �rst consider therule with the least RSN, namely v1.(�1(v1)) : �hop(X;Y ) :{ �link(X;Z) & link(Z; Y )(�2(v1)) : �hop(X;Y ) :{ link�(X;Z) & �link(Z; Y )Apply rule �1(v1): �(hop) = fac � �1; ag; dggApply rule �2(v1): �(hop) = fafgCombining the above changes, we get:hop� = fac; af; ag; dg; dh; bhg.Now consider the rules with RSN 2, namely rule v2that de�nes predicate tri hop.(�1(v2)): �tri hop(X;Y ) :{ �hop(X;Z) &link(Z; Y ).(�2(v2)): �tri hop(X;Y ) :{ hop�(X;Z) &�link(Z; Y ). 4

Algorithm 4.1Input: A nonrecursive program P .Stored materializations of derived relations in P.Changes (deletions/insertions) to the base relations occurring in program P.Output: Changes (deletions/insertions) to the derived relations occurring in P.Method:Mark all rules unprocessed.For all derived predicates p in program P, doinitialize P � to the materialized relation P .initialize �(P ) = fgWhile there is an unprocessed rulef Q = fr j rule r has the least RSN among all unprocessed rulesgFor every rule r 2 Q dof Compute �(P ) using the delta rules �i(r); 1 � i � n derived from rule r (De�nition 4.1)(�i(r)): �(p) :{ s�1 & : : : & s�i�1 & �(si) & si+1 & : : : & sn : : : : : : : : : : : : (1).P �= P ]�(P ): % Update the predicate that is de�ned by rule r.�(P ) = set(P �)� set(P ) : : : : : : : : : : : : (2)% For optimization only, discussed in Section 5Mark rule r as processed.g g 3Apply rule �1(v2): �(tri hop) = fah � �1; aggApply rule �2(v2): �(tri hop) = fgCombining the above changes, we get:tri hop� = fah; agg.2Lemma 4.1 Let �� be the set of base tuples deletedfrom E, and t be any ground atom that has count(t)derivations w.r.t. program P and database state (edb)E. If �� E then Algorithm 4.1 derives tuple �(t)with a count of at least �1 � count(t). 2That is, given that we insist that the deleted basetuples be a subset of the original database, nomore than the original number of derived tuples aredeleted from any derived relation during evaluationof Algorithm 4.1. Therefore all non-� subgoals havepositive counts.Theorem 4.1 Let t be any ground atom that hascount(t) derivations w.r.t. program P and databasestate (edb) E. Suppose tuple t has count(t�) deriva-tions when edb E is altered to a new edb E� (by inser-tions/deletions). Then: Algorithm 4.1 derives tuple�(t) with a count of count(t�)� count(t). 2If the program needed set semantics, then Algo-rithm 4.1 is optimized by propagating changes topredicates in higher strata only if the relation P � con-sidered as a set changes from relation P consideredas a set. This optimization is done by Statement (2)in Algorithm 4.1 and illustrated in Example 5.1.

5 Implementation Issues andOptimizationsAlgorithm 4.1 needs a count of the number ofderivations for each tuple. Let us see how countsmight be computed during bottom-up evaluation ina database system.We �rst consider database systems that implementduplicate semantics, such as DB2 and SQL/DS fromIBM, and Nonstop SQL from Tandem. The querylanguage SQL in the above systems requires dupli-cates to be retained for semantic correctness [ISO90].In an implementation, duplicates may be representedeither by keeping multiple copies of a tuple, or bykeeping a count with each tuple. In both cases, ouralgorithm works without incurring any overhead dueto duplicate computation. The ] operator in our al-gorithm is mapped to the union operator of the sys-tem when the operands have positive counts. Whenan operand has negative counts, the ] operator isequivalent to multiset di�erence. Though multisetdi�erence is not provided in any of the above exampleSQL systems, it can be executed in time O(n)log(n)or O(n) (where n is the size of the operands) depend-ing on the representation of duplicates.Second, consider systems that have set semantics,such as Glue-Nail and LDL. Such systems can treatduplicates in one of two possible ways during queryevaluation: (1) Do not eliminate duplicates since du-plicate elimination is expensive, and may not have 5

enough payo�, and (2) Eliminate duplicates aftereach iteration of the semi-naive evaluation. The �rstimplementation is likely to be useful only for non-recursive queries because recursive queries may havein�nite counts associated with them. The �rst imple-mentation is similar to computing duplicate seman-tics since all derivation trees will be derived duringevaluation. The second implementation removes du-plicates, and so it may seem that our incrementalview maintenance algorithm must do extra work toderive all the remaining derivation trees. But it is notso for nonrecursive queries.5.1 OptimizationThe boxed statement 2 in Algorithm 4.1 optimizesthe counting algorithm for views where duplicatesemantics is not desired, and for implementationsthat eliminate duplicates.First, note that duplicate elimination is an expen-sive operation, and we can augment the operation tocount the number of duplicates eliminated withoutincreasing the cost of the operation. counts can thenbe associated with each tuple in the relation obtainedafter duplicate elimination. Let us assume that wedo full duplicate computation within a stratum (byextending the evaluation method in some way), andthen do duplicate elimination and obtain counts foreach tuple computed by the stratum. When comput-ing the next higher stratum i + 1, we do not needto make derivations once for each count of a tuple instratum i or less. We do not even need to carry thecounts of tuples in stratum i or lower while evaluatingtuples in stratum i+1. We assume that each tuple ofstratum i or less has a count of one, and compute theduplicate semantics of stratum i + 1. Consequently,the count value for each tuple t corresponds to thenumber of derivations for tuple t assuming that alltuples of lower strata have count 0 or 1. Maintainingcounts as above in uences the propagation of changesin a positive manner. For instance, let predicate p instratum 1 have 20 derivations for a tuple p(a), and letchanges to the base tuples delete 10 of them. How-ever the changes need not be cascaded to a predicateq in stratum 2 because as far as derivations of q areconcerned, p(a) has a count of one as long as its ac-tual count is positive. The incremental computationtherefore stops at stratum 1. The boxed statement2 in Algorithm 4.1 causes us to maintain counts asabove. Consider Example 4.2 if the views had setsemantics.Example 5.1 Consider relations hop� and hop afterthe rules �1(v1) and �2(v1) have been applied. In

order to compute �(hop) we apply the optimizationof Statement 2 from Algorithm 4.1.�(hop)= set(hop�)� set(hop)= fac; af; ag; dg; dh; bhg� fac; dh; bhg= faf; ag; dgg.Note that unlike Example 4.2, the tuple hop(ac��1)does not appear in �(hop) and is not cascaded torelation tri hop. Consequently the tuple (ah � �1)will not be derived for �(tri hop). 2Using the above optimizations, the extra evalua-tion cost incurred by our incremental view mainte-nance algorithm is in computing the duplicate se-mantics of each stratum. For a nonrecursive stratumthere is usually no extra cost in computing the du-plicate semantics. A nonrecursive stratum consistsof a single predicate de�ned using one or more rules,evaluated by a sequence of select, join, project, andunion operators. Each of these operators derives onetuple for each derivation. Thus, tracking counts for anonrecursive view is almost as e�cient as evaluatingthe nonrecursive view.Even in SQL systems implementing duplicatesemantics, it is possible for a query to require setsemantics (by using the DISTINCT operator). Theimplementation issues for such queries are similar tothe case of systems implementing set semantics.6 Negation and AggregationAlgorithm 4.1 can be used to incrementally maintainviews de�ned using negation and aggregation. How-ever, we need to describe how statement 1 in Algo-rithm 4.1 is executed for rules with negated and ag-gregated subgoals, speci�cally how �(S) is evaluatedfor a negated or GROUPBY subgoal s in rule r.6.1 NegationWe consider safe strati�ed negation. Negation issafe as long as the variables that occur in a negatedsubgoal also occur in some positive subgoal of thesame rule. Negation is strati�ed if whenever a derivedpredicate q depends negatively on predicate p, thenSN(p) < SN(q) where SN(p) is the stratum numberof predicate p. Nonrecursive programs are alwaysstrati�ed.The following Example 6.1 gives the intuition forcomputing counts with negated subgoals. A negatedsubgoal is computed in the same way for both set andduplicate semantics.11Formal semantics of Duplicate Datalog with negation isgiven in [MS93a] 6

Example 6.1 Consider view only tri hop that usesviews tri hop and hop as de�ned in Example 4.2.only tri hop contains all pairs of nodes that are con-nected using three links but not using just two.(v3): only tri hop(X;Y ) :{ tri hop(X;Y ) &:hop(X;Y ).Consider the relation link = fab; ae; af; ag; bc;cd; ck; ed; fd; gh; hkg. The relations hop and tri hopare fac; ad � 2; ah; bd; bk; gkg and fad; ak � 2g re-spectively. The relation only tri hop = fak � 2g.Tuple (a; d) does not appear in only tri hop be-cause hop(a; d) is true. Note that hop(a; d) is trueas long as count(hop(a; d)) > 0. Therefore even ifcount(hop(a; d)) was 1 or 5 (as against the indicatedvalue of 2), relation only tri hop would not havetuple (a; d). 2Consider a negated subgoal :q(X;Y ) in rule rde�ning a view. Because negation is safe, thevariables X and Y also occur in positive subgoals inrule r. We represent the relation corresponding to thesubgoal :q as �Q. The relation �Q is computed usingrelation Q and the particular bindings for variablesX and Y provided by the positive subgoals in rule r.A tuple (a; b) is in �Q with a count of 1 if, and onlyif, (i) the positive subgoals in rule r assign the valuesa and b to the variables X and Y , and (ii) the tupleq(a; b) is false ((a; b) 62 Q).Recall that Algorithm 4.1 creates and evaluatesDelta rules of the form �i(r):(�i(r)): �(p) :{ s�1 & : : : & s�i�1 & �(si)& si+1 & : : : & sn.In order to de�ne how rule �i(r) is evaluated,we exhaustively consider all the positions where anegated subgoal can occur in rule �i(r) and de�nehow the subgoal will be computed:Case 1: Subgoal sj = :q, j between i+1 and n: Therelation �Q is computed as described above.Case 2: Subgoal s�j = (:q)� , j between 1 and i� 1:The relation �Q� is equal to the relation �Q� by thefollowing Lemma:Lemma 6.1 For a negated subgoal, :q, predicate(:q)� is equivalent to predicate :(q�). 2Because negation is strati�ed, relation Q� is com-puted before rule �i(r) is used. Relation �Q� is com-puted from Q� in the same way that �Q is computedfrom Q.Case 3: Subgoal �(si) = �(:q): The relation �( �Q)is computed from relations �(Q) and Q according toDe�nition 6.1.

De�nition 6.1 �( �Q): Say Q represents the relationfor predicate q and �(Q) represents the changes madeto Q. A tuple t is in �( �Q) with count(t) = 1 ift 2 �(Q) and t 62 Q ]�(Q);and with count(t) = �1 ift 2 �(Q) and t 62 Q:Note that t 2 �( �Q) only if t 2 �(Q). 2Note that De�nition 6.1 allows �(Q) to be com-puted without having to evaluate the positive sub-goals in rule �i(r). This is important for e�ciency,since the �-subgoal is usually the most restrictivesubgoal in the rule and would be used �rst in the joinorder.Theorem 6.1 Algorithm 4.1 works correctly in thepresence of negated subgoals. 26.2 AggregationAggregation is often used to reduce large amounts ofdata to more usable form. In this section we use thesemantics for aggregation as discussed in [Mum91].The following example illustrates the notation andsemantics.Example 6.2 Consider the relation link from Ex-ample 1.1 and let tuples in link also have a costassociated with them, i.e. link(s; d; c) represents alink from source s to destination d of cost c. We nowrede�ne the relation hop as follows:hop(S; D; C1+C2) :{ link(S; I; C1) & link(I; D; C2)Using hop we now de�ne the relation min cost hopas follows:(v4): min cost hop(S;D;M ) :{GROUPBY (hop(S;D;C) ; [S;D] ; M = MIN(C)).Relation min cost hop contains pairs of nodes alongwith the minimum cost of a hop between them. 2Consider a rule r that contains a GROUPBY subgoalde�ned over relation U . The GROUPBY subgoal repre-sents a relation T whose attributes are the variablesde�ned by the aggregation, and the variables �Y overwhich the groupby occurs. In Example 6.2, the vari-able de�ned by the aggregation isM , and the groupbyoccurs over the variables fS;Dg. The GROUPBY sub-goal GROUPBY (hop(S;D;C) ; [S;D] ; M = MIN(C))thus de�nes a relation over variables fS; D; Mg. Therelation T contains one tuple for every distinct valueof the groupby attributes. All tuples in the grouped 7

relation U that have the same values for the group-ing attributes in set �Y , say �y, contribute one tuple torelation T , a tuple we denote by T�y . In Example 6.2,the relation for the GROUPBY subgoal has one tuplefor every distinct pair of nodes [S;D].Like negation, aggregation subgoals are non-mono-tonic. Consider inserting tuple � into the relation Uwhere the values of the �Y attributes in tuple � are= �c. Inserting � can possibly change the value ofthe aggregate tuple in relation T that correspondsto �c, i.e. tuple T�c. For instance, in Example 6.2,inserting the tuple hop(a; b; 10) can only change themin cost hop tuple from a to b. The change actuallyoccurs if the previous minimum cost from a to b hada cost more than 10. A similar potential changecan occur to tuple T�c if an existing tuple � isdeleted from relation U . Using the old tuple T�cand the tuples in �(U ), the new tuple correspondingto the groupby attribute value �c can be computedincrementally for each of the aggregate functions MIN,MAX, COUNT, SUM, and for any other incrementallycomputable function [DAJ91]. For instance considerthe aggregation operation SUM. The sum of theattribute A of the tuples in a group can be computedusing the old sum when a new tuple is added tothe group by adding �:A to the old sum. Functionslike AVERAGE and VARIANCE that can be decomposedinto incrementally computable functions can also beincrementally computed.To apply Algorithm 4.1 we need to specify how aGROUPBY subgoal is evaluated in a Delta rule �i(r):(�i(r)): �(p) :{ s�1 & : : : & s�i�1 & �(si)& si+1 & : : : & sn.�i(r) could have one or more aggregate subgoals.If an aggregate subgoal t occurs between positionsi + 1 and n, then the relation T for the subgoalis computed as in the case of a normal aggregatesubgoal. If an aggregate subgoal occurs betweenpositions 1; : : : ; i � 1 then the relation T � for thesubgoal can be computed as before using relation U� .If the aggregate subgoal occurs in position j, then thefollowing algorithm is used to compute the relation�(T ):Algorithm 6.1Input: An aggregate subgoal:t = GROUPBY (U; �Y ; �Z = : : :).Changes to the relation for the groupedpredicate u: �(U ).Output: �(T ).Method:For every grouping value �y 2 ��Y�(U )

Incrementally compute T�y� from T�y (old)and �(U ).If T�y� and T�y are di�erent then�(T ) = �(T ) ] f(T�y;�1)g % Insert old% tuple T�y into �(T ) with a count = �1.�(T ) = �(T ) ] f(T�y� ; 1)g % Insert new% T�y� into �(T ) with a count = 1.% Else the aggregate tuple is unchanged% and nothing needs to be done.3If the aggregation function is not incrementallycomputable [DAJ91], and is not even decomposableinto incrementally computable functions, then thecomputation of T�y� may be more expensive. Fornon incrementally computable aggregate functions,the tuple T�y� has to be computed using the tuplesof relation U that have the value �y for the variables�Y .Lemma 6.2 For an aggregate subgoal t, relation T �is equivalent to T ]�(T ). 2Theorem 6.2 Algorithm 4.1 works correctly in thepresence of aggregate subgoals. 27 Incremental Maintenance ofRecursive ViewsWe present the DRed algorithm to incrementallymaintain recursive views that use negation andaggregation and have set semantics.2 The DRedalgorithm can also be used to maintain nonrecursiveviews; however the counting algorithm is expected tobe more e�cient for nonrecursive views. Conversely,we note that the counting algorithm can also beused to incrementally maintain certain recursiveviews [GKM92].A semi-naive [Ull89] computation is su�cientto compute new inserted tuples for a recursivelyde�ned view when insertions are made to baserelations. In the case of deletions however, simplesemi-naive computation would delete a derived tuplethat depends upon a deleted base tuple i.e. if tuple thas even one derivation tree that contains a deletedtuple, then t is deleted. Alternative derivations of tare not considered. Semi-naive therefore computesan overestimate of the tuples that actually needto be deleted. The DRed algorithm re�nes this2The DRed algorithm is similar to an algorithm developedindependently, and at the same time as our work, by Ceriand Widom [CW92], though their algorithm is presented in aproduction rule framework, and they don't handle aggregationand insertions/deletions of rules. 8

overestimate by considering alternative derivations ofthe deleted tuples (in the overestimate) as follows:1. Delete a superset of the derived tuples that needto be deleted: The overestimate is computed by asemi-naive evaluation as follows. For each rule rin the program:(r): p :{ s1 & : : : & si�1 & si & : : : & sn.Create n ��-rules to compute the potentiallydeleted tuples ��(p) for predicate p. The ith ��-rule is:��(p) :{ s1 & :: & si�1 & ��(si) & si+1:: & sn.Say subgoal si refers predicate q. If q is a base re-lation then ��(si) is the given set of tuples deletedfrom relation Q; otherwise ��(si) is the currentoverestimate of the deletions from Q. For othersubgoals sj , use the corresponding materializedor base relation (without incorporating the dele-tions). The ��-rules are applied until the set ofpotentially deleted tuples does not change. Foreach predicate p in the program, the overestimate��(p) of the deleted tuples is removed from thestored materialization P to get relation P �.2. Put back those deleted tuples that have alternativederivations: For each rule r of the form above,create one �r-rule to derive a set �+(p) of tuplesthat were deleted (in Step 1), but have alternativederivations.�+(p) :{ ��(p) & s�1 & : : : & s�n.��(p) is the overestimate of tuples deleted frompredicate p, as computed by Step 1. Let thepredicate for subgoal s�i be q. If q is a base relationthen s�i is the new value of relationQ; otherwise s�iis the current value of Q� obtained by augmentingthe relation Q� derived in Step 1 with tuples for�+(q) derived so far in Step 2. All tuples for�+(p) derived by the �r-rules are inserted into P �,increasing the accuracy of the underestimate forpredicate p, and the �r-rules are re-applied untilno more �+(p) tuples can be derived.If base tuples are inserted into the database, then athird step is used to compute new derived tuples.3. For each rule r of the form above, create n �+-rules to derive new tuples �+(p) to be insertedinto the relation for predicate p. The ith �+-ruleis:�+(p) :{ s�1 & ::s�i�1 & �+(si) & s�i+1 & :: & s�n.Say subgoal si refers predicate q. If q is a baserelation then �+(si) is the given set of tuples

inserted into relation Q; otherwise �+(si) is thecurrent set of tuples inferred to have been insertedinto Q by the �+-rules. Let the predicate forsubgoal s�j be U . Then, the relation for subgoal s�jconsists of relation U� after Step 2 and the tuplesinserted into U in Step 3 until the current point inthe computation. The �+-rules are applied untilno new inserted facts are derived.This three step process is formalized as an algorithm,and proved correct, in [GMS92].A recursive program P can be fragmented intoprograms P1; : : : ;Pn, where Pi = fr jRSN(r) = igconstitutes stratum i. The DRed algorithm computeschange to a view de�ned by a recursive programP by applying the above three steps successivelyto every stratum of P. Every derived predicatein program Pi depends only on predicates that arede�ned in P1; : : : ;Pi�1. All base tuples are instratum 0 i.e. in P0. Changes made to stratum ia�ect only those strata whose SN is � i. Propagatingthe changes stratum by stratum avoids propagatingspurious changes across strata. Let Deli�1 (Addi�1)be the set of tuples that have been deleted (inserted)from strata 1; : : : ; i�1 respectively. Consider stratumi after strata 1; : : : ; i� 1 have been updated. Tuplesare deleted from stratum i based on the set of deletedtuples Deli�1. New tuples are inserted into stratumi based on the set of inserted tuples Addi�1.Theorem 7.1 Let �� and �+ be the set of basetuples deleted and inserted respectively, from theoriginal set of base tuples E. The new derived viewcomputed by the DRed algorithm contains tuple t ifand only if t has a derivation in the database E� =(E � ��) [ �+. 2The DRed algorithm can be applied to recursiveviews with strati�ed negation and aggregation also.The details of the algorithm are given in [GMS92].8 Conclusions and Future WorkWe have presented general techniques for maintainingviews in relational and deductive databases, includingSQL with duplicate semantics, when view de�nitionsinclude negation, aggregation and general recursion.The algorithms compute changes to a materializedview in response to insertions, deletions and updatesto the base relations.The counting algorithm is presented for nonrecur-sive views. We show how this incremental view main-tenance algorithm �ts nicely into existing systemswith both set and multiset semantics. The counting 9

algorithm is a general-purpose algorithm that uni-formly applies to all nonrecursive views, and is the�rst to handle aggregation. The DRed algorithm ispresented for maintaining recursive views. DRed isthe �rst algorithm to handle aggregation in recursiveviews. The algorithm �rst computes an over esti-mate of tuples that need to be deleted in response tochanges to the underlying database. This estimateis re�ned to obtain an exact answer. New derivedtuples are computed subsequently.Counting can be used to maintain recursive viewsalso. However computing counts for recursive viewsis expensive and furthermore counting may notterminate on some views. Techniques to detect�niteness [MS93a] and to use partial derivations forcounting are being explored. Similarly DRed can beused for nonrecursive views also but it is less e�cientthan counting.The techniques to handle negation and aggregationas described in this paper can be used to extend manyother existing view maintenance techniques.References[ABW88] Krzysztof R. Apt, Howard A. Blair, and AdrianWalker. Towards a Theory of Declarative Knowl-edge. In Foundations of Deductive Databases andLogic Programming. Editor J. Minker, 1988 Mor-gan Kaufmann.[BC79] Peter O. Buneman and Eric K. Clemons. E�-ciently Monitoring Relational Databases. In ACMTODS, Vol 4, No. 3, 1979, 368-382.[BCL89] J. A. Blakeley, N. Coburn, and P. Larson. UpdatingDerived Relations: Detecting Irrelevant and Au-tonomously Computable Updates. In ACM TODSVol 14, No. 3, 369-400, 1989.[BLR91] Veronique Benzaken, Christophe Lecluse, andPhilippe Richard. Enforcing Integrity Constraintsin Database Programming Languages. TR Altair68-91, Altair, France, 1991.[BLT86] J. A. Blakeley, P. Larson, and F. W. Tompa. E�-ciently Updating Materialized Views. In SIGMOD1986, pages 61{71.[BMM92] F. Bry, R. Manthey, and B. Martens. IntegrityVer-i�cation in Knowledge Bases. In Logic Program-ming, LNAI 592, pages 114{139, 1992.[BT88] J. A. Blakeley and F. W. Tompa. MaintainingMaterialized Views without Accessing Base Data.Information Systems, 13(4):393{406, 1988.[CW91] Stefano Ceri and Jennifer Widom. Deriving Pro-duction Rules for Incremental View Maintenance.In 17th VLDB, 1991.[CW92] Stefano Ceri and Jennifer Widom. DerivingIncremental Production Rules for Deductive Data.IBM RJ 9071, IBM Almaden, 1992.[DAJ91] S. Dar, R. Agrawal, and H. V. Jagadish. Optimiza-tion of generalized transitive closure. In Seventh

IEEE International Conference on Data Engineer-ing, Kobe, Japan, 1991.[DS92] Guozhu Dong and Jianwen Su. Incremental andDecremental Evaluation of Transitive Closure byFirst-Order Queries. TRCS 92-18, University ofCalifornia, Santa Barbara, 1992.[DT92] Guozhu Dong and Rodney Topor. IncrementalEvaluation of Datalog Queries. In ICDT, 1992.[GKM92] Ashish Gupta, Dinesh Katiyar, and Inderpal SinghMumick. Counting Solutions to the View Main-tenance Problem. In Workshop on DeductiveDatabases, JICSLP, 1992.[GMS92] Ashish Gupta, Inderpal Singh Mumick, and V. S.Subrahmanian. Maintaining views incrementally.TR 921214-19-TM, AT&T Bell Labs, 1992.[HD92] John V. Harrison and Suzanne Dietrich. Main-tenance of Materialized Views in a DeductiveDatabase: An Update Propagation Approach. InWorkshop on Deductive Databases, JICSLP, 1992.[ISO90] ISO ANSI. ISO-ANSI Working Draft: DatabaseLanguage SQL2 and SQL3; X3H2; ISO/IECJTC1/SC21/WG3, 1990.[Kuc91] V. Kuchenho�. On the E�cient Computationof the Di�erence Between Consecutive DatabaseStates. In DOOD, LNCS 566, 1991.[MS93a] Inderpal Singh Mumick and Oded Shmueli. Finite-ness Properties of Database Queries. In FourthAustralian Database Conference, 1993.[Mum91] Inderpal Singh Mumick. Query Optimization inDeductive and Relational Databases. Ph.D. Thesis,Stanford University, CA 94305, 1991.[NY83] J. M. Nicolas and Yazdanian. An Outline ofBDGEN: A Deductive DBMS. In InformationProcessing, pages 705{717, 1983.[QW91] Xiaolei Qian and Gio Wiederhold. IncrementalRecomputation of Active Relational Expressions.In ACM TKDE, 1991.[RS93] Torre Risch and Martin Sk�old. Active rules basedon object-oriented queries. To Appear, ACMTKDE, 1993.[SI84] Oded Shmueli and Alon Itai. Maintenance ofViews. In Sigmod Record, 14(2):240-255, 1984.[SPAM91] Ulf Schreier, Hamid Pirahesh, Rakesh Agrawal,and C. Mohan. Alert: An Architecture forTransforming a Passive DBMS Into an ActiveDBMS. In 17th VLDB, pages 469{478, 1991.[Ull89] Je�rey D. Ullman. Principles of Database andKnowledge-Base Systems, Volumes 1 and 2. Com-puter Science Press, 1989.[UO92] Toni Urpi and Antoni Olive. A Method for ChangeComputation in Deductive Databases. In 18thVLDB, pages 225{237, 1992.[VG86] Allen Van Gelder. Negation as failure using tightderivations for general logic programs. In ThirdIEEE Symposium on Logic Programming, 1986.Springer-Verlag.[WDSY91] Ouri Wolfson, Hasanat M. Dewan, Salvatore J.Stolfo, and Yechiam Yemini. Incremental Evalu-ation of Rules and its Relationship to Parallelism.In SIGMOD 1991, pages 78{87. 10

Maintaining Views Incrementally

Documents

Transcript of Maintaining Views Incrementally