MathSAT: Tight Integration of SAT and Mathematical Decision Procedures
MATHSAT: Tight Integration of SAT and Mathematical Decision Procedures*
Marco Bozzano, Roberto Bruttomesso and Alessandro Cimatti ({bozzano,bruttomesso,cimatti}@itc.it) ITC-IRST, via Sommarive 18, 38050 Povo, Trento, Italy
Tommi Junttila ([email protected]) Helsinki University of Technology, P.O. Box 5400, FIN-02015 TKK, Finland
Peter van Rossum ([email protected]) ITC-IRST, via Sommarive 18, 38050 Povo, Trento, Italy
Stephan Schulz ([email protected]) Università di Verona, Strada le Grazie 15, 37134 Verona, Italy
Roberto Sebastiani ([email protected]) DIT, Università di Trento, via Sommarive 14, 38050 Povo, Trento, Italy
Abstract. Recent improvements in propositional satisfiability techniques (SAT) made it possible to tackle successfully some hard real-world problems (e.g. model-checking, circuit testing, propositional planning) by encoding into SAT. However, a purely boolean representation is not expressive enough for many other real-world applications, including the verification of timed and hybrid systems, of proof obligations in software, and of circuit design at RTL level. These problems can be naturally modeled as satisfiability in Linear Arithmetic Logic (LAL), i.e., the boolean combination of propositional variables and linear constraints over numerical variables.
In this paper we present MATHSAT, a new, SAT-based decision procedure for LAL, based on the (known) approach of integrating a state-of-the-art SAT solver with a dedicated mathematical solver for LAL. We improve MATHSAT in two different directions. First, the top level procedure is enhanced, and now features a tighter integration between the boolean search and the mathematical solver. In particular, we allow for theory-driven backjumping and learning, and theory-driven deduction; we use static learning in order to reduce the number of boolean models that are mathematically inconsistent; we exploit problem clustering in order to partition mathematical reasoning; and we define a stack-based interface that allows us to implement mathematical reasoning in an incremental and backtrackable way. Second, the mathematical solver is based on layering, i.e. the consistency of (partial) assignments is checked in theories of increasing strength (equality and uninterpreted functions, linear arithmetic over the reals, linear arithmetic over the integers). For each of these layers, a dedicated (sub)solver is used. Cheaper solvers are called first, and detection of inconsistency makes calls of the subsequent solvers superfluous.
We provide a thorough experimental evaluation of our approach, by taking into account a large set of previously proposed benchmarks. We first investigate the relative benefits and drawbacks of each proposed technique by comparison with respect to a reference option setting. We then demonstrate the global effectiveness of our approach by a comparison with several state-of-the-art decision procedures. We show that the behavior of MATHSAT is often superior to its competitors, both on LAL, and in the subclass of Difference Logic.
© 2005 Kluwer Academic Publishers. Printed in the Netherlands.
main.tex; 28/04/2005; 21:10; p.1
1. Motivations and Goals
Many practical domains of reasoning require a degree of expressiveness beyond propositional logic. For instance, timed and hybrid systems have a discrete component as well as a dynamic evolution of real variables; proof obligations arising in software verification are often boolean combinations of constraints over integer variables; circuits described at the Register Transfer Level, even though expressible via booleanization, might be easier to analyze at a higher level of abstraction (see e.g. [12]). The verification problems arising in such domains can often be modeled as satisfiability in Linear Arithmetic Logic (LAL), i.e., the boolean combination of propositional variables and linear constraints over numerical variables. Because of its practical relevance, LAL has attracted a lot of interest, and several decision procedures (e.g., SVC [15], ICS [23, 18], CVCLITE [15, 7], UCLID [43, 35], HDPLL [31]) are able to deal with it.
In this paper, we propose a new decision procedure for the satisfiability of LAL, both for the real-valued and integer-valued case. We start from a well known approach, previously applied in MATHSAT [27, 3] and in several other systems [23, 18, 15, 7, 42, 2, 20]: a propositional SAT procedure, modified to enumerate propositional assignments for the propositional abstraction of the problem, is integrated with dedicated theory deciders, used to check consistency of propositional assignments with respect to the theory. We extend this approach by improving (i) the top level procedure, and (ii) the mathematical reasoner.
The top level procedure features a tighter integration between the boolean search and the mathematical solver. First, we allow for theory conflict-driven backjumping (i.e. sets of inconsistent constraints identified in the mathematical solver are used to drive backjumping and learning at the boolean level) and theory deduction (i.e. when possible, assignments for unassigned theory atoms are automatically inferred from the current partial assignment). Both theory conflicts and theory deductions are learned as clauses codifying the relationships between mathematical atoms at the boolean level; subsequent search will thus avoid the generation of boolean assignments that are not mathematically consistent. Second, we suggest a systematic use of static learning, i.e. the a-priori encoding of some basic mathematical facts at the boolean level before the boolean search. This will stop many inconsistent assignments from ever being enumerated. A moderate increase in the size of the problem is often compensated by significant speed-ups in performance.
* This work has been partly supported by ISAAC, a European sponsored project, contract no. AST3-CT-2003-501848, by ORCHID, a project sponsored by Provincia Autonoma di Trento, and by a grant from Intel Corporation. The work of T. Junttila has also been supported by the Academy of Finland, project 53695. S. Schulz has also been supported by a grant of the Italian Ministero dell'Istruzione, dell'Università e della Ricerca and the University of Verona.
In this way, MATHSAT settles in the middle ground between the "lazy" approach, where mathematical facts are discovered during the search, and the "eager" approach (e.g. [43, 39]), where a very large number of facts may be required in order to lift mathematical reasoning to boolean reasoning. Third, we define a stack-based interface between the boolean level and the mathematical level, that enables the top level to add constraints, set points of backtracking, and backjump, in order to exploit the fact that increasingly larger sets of constraints are analyzed while extending a boolean model. As a result, the mathematical reasoner can be incremental and backtrackable, and exploit previously derived information rather than restarting from scratch at each call. Finally, we consider that mathematical reasoning is, in many practical cases, performed on the disjoint union of several sub-theories (or clusters). Therefore, rather than solving the problem with a single, monolithic mathematical solver, we use a separate instance of the mathematical solver for each independent cluster.
The main idea underlying the mathematical solver for linear arithmetic is that it is layered, i.e. it is implemented as a hierarchy of solvers for theories of increasing strength. The consistency of (partial) assignments is checked first in the logic of Equality and Uninterpreted Functions (EUF), then in Difference Logics, then in Linear Arithmetic over the reals, and then in Linear Arithmetic over the integers (if needed by the problem). The rationale is that cheaper, more abstract solvers are called first. If unsatisfiability at a more abstract level is detected, this is sufficient to prune the search.
We provide a thorough experimental evaluation of our approach, based on a large set of benchmarks previously proposed in the literature. We first show the respective merits of each of the proposed optimizations, comparing different configurations of MATHSAT with respect to a "golden setting", and we show to which extent each of the improvements impacts performance. Then we compare MATHSAT against the state-of-the-art systems (ICS, CVCLITE, and UCLID) on general LAL problems. We show that our approach is able to deal efficiently with a wide class of problems, with performance comparable with and often superior to the other systems. We also compare MATHSAT against the specialized decision procedures DLSAT and TSAT++ on the subclass of Difference Logics.
This paper is structured as follows. In Sect. 2 we define Linear Arithmetic Logic. In Sect. 3 we describe the basic MATHSAT approach, and in Sect. 4 we present the enhanced algorithm. In Sect. 5 we describe the ideas underlying the mathematical solver. In Sect. 6 we describe the implementation of the MATHSAT system. In Sect. 7 we present the results of the experimental evaluation. In Sect. 8 we discuss some related work; finally, in Sect. 9 we draw some conclusions and outline the directions for future work.
This paper updates and extends the content and results presented in a much shorter conference paper [11].
2. Background: Linear Arithmetic Logic
Let $\mathbb{B} := \{\bot, \top\}$ be the domain of boolean values. Let $\mathbb{R}$ and $\mathbb{Z}$ be the domains of real and integer numbers, respectively, and let $\mathcal{D}$ denote either of them. By math-terms over $\mathcal{D}$ we denote the linear mathematical expressions built on constants, variables and arithmetical operators over $\mathcal{D}$. Examples of math-terms are constants $c_i \in \mathcal{D}$, variables $v_i$ over $\mathcal{D}$, possibly with coefficients (i.e. $c_i v_j$), and applications of the arithmetic operators $+$ and $-$ to math-terms. Boolean atoms are propositions $A_i$ from $\mathbb{B}$. Mathematical atoms are formed by the applications of the arithmetic relations $=, \neq, >, <, \geq, \leq$ to math-terms. Unless otherwise specified, "atoms" can be either boolean or mathematical. By math-formulas we denote atoms and their combinations through the standard boolean connectives $\wedge, \neg, \vee, \rightarrow, \leftrightarrow$. For instance, $A_1 \wedge ((v_1 + 5) \leq 2v_3)$ is a math-formula on either $\mathbb{R}$ or $\mathbb{Z}$. A literal is either an atom (a positive literal) or its negation (a negative literal). Examples of literals are $A_1$, $\neg A_2$, $(v_1 + 5v_2 \leq 2v_3 - 2)$, $\neg(2v_1 - v_2 = 5)$. If $l$ is a negative literal $\neg\psi$, then by "$\neg l$" we denote $\psi$ rather than $\neg\neg\psi$. We denote the set of all atoms in $\phi$ by $Atoms(\phi)$, and the subset of mathematical atoms by $MathAtoms(\phi)$.
An interpretation in $\mathcal{D}$ is a mapping $\mathcal{I}$ which assigns values in $\mathcal{D}$ to variables and truth values in $\mathbb{B}$ to boolean atoms. Given an interpretation, math-terms and math-formulas are given values in $\mathcal{D}$ and in $\mathbb{B}$, respectively, by interpreting constants, arithmetical operators and boolean connectives according to their standard (arithmetical or logical) semantics. We write $\mathcal{I}(\phi)$ for the truth value of $\phi$ under the interpretation $\mathcal{I}$, and similarly $\mathcal{I}(t)$ for the domain value of the math-term $t$. We say that $\mathcal{I}$ satisfies a math-formula $\phi$, written $\mathcal{I} \models \phi$, iff $\mathcal{I}(\phi) = \top$. For example, the math-formula $\phi := (A_1 \rightarrow (v_1 - 2v_2 \geq 4)) \wedge (\neg A_1 \rightarrow (v_1 = v_2 + 3))$ is satisfied by an interpretation $\mathcal{I}$ in $\mathbb{Z}$ s.t. $\mathcal{I}(A_1) = \top$, $\mathcal{I}(v_1) = 8$, and $\mathcal{I}(v_2) = 1$.
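The definition of satisfaction can be checked mechanically. The following is a minimal sketch, not taken from MATHSAT: the tuple encoding of formulas and the names `holds`, `term_value`, `PHI` and `I` are assumptions made only for this illustration. It evaluates the example formula under the interpretation given above.

```python
import operator

# Arithmetic relations allowed in mathematical atoms.
REL = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def term_value(term, interp):
    """Value of a linear term (coeffs, const) = sum(coeff * var) + const."""
    coeffs, const = term
    return sum(c * interp[v] for v, c in coeffs.items()) + const

def holds(f, interp):
    """Truth value of a math-formula (nested tuples) under an interpretation."""
    tag = f[0]
    if tag == "bool":
        return interp[f[1]]
    if tag == "rel":
        _, op, lhs, rhs = f
        return REL[op](term_value(lhs, interp), term_value(rhs, interp))
    if tag == "not":
        return not holds(f[1], interp)
    if tag == "and":
        return holds(f[1], interp) and holds(f[2], interp)
    if tag == "or":
        return holds(f[1], interp) or holds(f[2], interp)
    if tag == "implies":
        return (not holds(f[1], interp)) or holds(f[2], interp)
    raise ValueError(f"unknown connective: {tag}")

# phi := (A1 -> (v1 - 2*v2 >= 4)) and (not A1 -> (v1 = v2 + 3))
A1 = ("bool", "A1")
ATOM_GE = ("rel", ">=", ({"v1": 1, "v2": -2}, 0), ({}, 4))
ATOM_EQ = ("rel", "=", ({"v1": 1}, 0), ({"v2": 1}, 3))
PHI = ("and", ("implies", A1, ATOM_GE), ("implies", ("not", A1), ATOM_EQ))

I = {"A1": True, "v1": 8, "v2": 1}
print(holds(PHI, I))
```

With `A1` true only the first implication matters, and $8 - 2 \cdot 1 = 6 \geq 4$, so the formula evaluates to true.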
We say that a math-formula $\phi$ is satisfiable in $\mathcal{D}$ if there exists an interpretation in $\mathcal{D}$ which satisfies $\phi$. The problem of checking the satisfiability of math-formulas is NP-hard, since standard boolean formulas are a strict subcase of math-formulas (this means theoretically "at least as hard" as standard boolean satisfiability, but in practice it turns out to be much harder).
A total (resp., partial) truth assignment for a math-formula $\phi$ is a function $\mu$ from all (resp., a subset of) the atoms of $\phi$ to truth values. We represent a truth assignment as a set of literals, with the intended meaning that positive and negative literals represent atoms assigned to true and to false, respectively. We use the notation $\mu = \{\alpha_1, \ldots, \alpha_N, \neg\beta_1, \ldots, \neg\beta_M, A_1, \ldots, A_R, \neg A_{R+1}, \ldots, \neg A_S\}$, where $\alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_M$ are mathematical atoms and $A_1, \ldots, A_S$ are boolean atoms. We say that $\mu$ propositionally satisfies $\phi$, written $\mu \models_p \phi$, iff it makes $\phi$ evaluate to true. We say that an interpretation $\mathcal{I}$ satisfies a truth assignment $\mu$ iff $\mathcal{I}$ satisfies all the elements of $\mu$; if there exists an (resp., no) interpretation which satisfies an assignment $\mu$, then $\mu$ is said to be LAL-satisfiable (resp., LAL-unsatisfiable). The truth assignment $\{A_1, (v_1 - 2v_2 \geq 4), \neg(v_1 = v_2 + 3)\}$ propositionally satisfies $(A_1 \rightarrow (v_1 - 2v_2 \geq 4)) \wedge (\neg A_1 \rightarrow (v_1 = v_2 + 3))$, and it is satisfied by $\mathcal{I}$ s.t. $\mathcal{I}(A_1) = \top$, $\mathcal{I}(v_1) = 8$, and $\mathcal{I}(v_2) = 1$.
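EXAMPLE 2.1. Consider the following math-formula $\phi$:

$$
\begin{aligned}
&\{\underline{\neg(2v_2 - v_3 > 2)} \vee A_1\} \wedge \{\underline{\neg A_2} \vee (2v_1 - 4v_5 > 3)\} \\
{}\wedge{} &\{\underline{(3v_1 - 2v_2 \leq 3)} \vee A_2\} \wedge \{\neg(2v_3 + v_4 \geq 5) \vee \underline{\neg(3v_1 - v_3 \leq 6)} \vee \neg A_1\} \\
{}\wedge{} &\{A_1 \vee \underline{(3v_1 - 2v_2 \leq 3)}\} \wedge \{\underline{(v_1 - v_5 \leq 1)} \vee (v_5 = 5 - 3v_4) \vee \neg A_1\} \\
{}\wedge{} &\{A_1 \vee \underline{(v_3 = 3v_5 + 4)} \vee A_2\}.
\end{aligned}
$$

The truth assignment $\mu$ corresponding to the underlined literals is:

$$
\{\neg(2v_2 - v_3 > 2),\ \neg A_2,\ (3v_1 - 2v_2 \leq 3),\ \neg(3v_1 - v_3 \leq 6),\ (v_1 - v_5 \leq 1),\ (v_3 = 3v_5 + 4)\}.
$$

(Notice that $\mu$ is a partial assignment, because it assigns truth values only to a subset of the atoms of $\phi$.) $\mu$ propositionally satisfies $\phi$ as it sets to true one literal of every disjunction in $\phi$. Notice that $\mu$ is not LAL-satisfiable; in fact, neither of the following sub-assignments of $\mu$ has a satisfying interpretation:

$$
\{\neg(2v_2 - v_3 > 2),\ (3v_1 - 2v_2 \leq 3),\ \neg(3v_1 - v_3 \leq 6)\} \tag{1}
$$

$$
\{\neg(3v_1 - v_3 \leq 6),\ (v_1 - v_5 \leq 1),\ (v_3 = 3v_5 + 4)\} \tag{2}
$$

$\diamond$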
Given a LAL-unsatisfiable assignment $\mu$, we call a conflict set any LAL-unsatisfiable sub-assignment $\mu' \subseteq \mu$; we say that $\mu'$ is a minimal conflict set if all proper subsets of $\mu'$ are LAL-consistent. E.g., both (1) and (2) are minimal conflict sets of $\mu$.
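As a check of the claim that (1) and (2) are conflict sets, the constraints can simply be combined; the short derivation below is added here for illustration and uses only the literals of Ex. 2.1.

$$
\begin{aligned}
\text{(1):}\quad & (2v_2 - v_3 \leq 2) + (3v_1 - 2v_2 \leq 3) \;\Longrightarrow\; 3v_1 - v_3 \leq 5, \quad\text{which contradicts } 3v_1 - v_3 > 6. \\
\text{(2):}\quad & v_3 = 3v_5 + 4 \text{ and } 3v_1 - v_3 > 6 \;\Longrightarrow\; 3(v_1 - v_5) > 10 \;\Longrightarrow\; v_1 - v_5 > \tfrac{10}{3}, \quad\text{which contradicts } v_1 - v_5 \leq 1.
\end{aligned}
$$

Minimality can be seen by inspection: dropping any single literal from (1) or from (2) leaves two constraints that are easily satisfied together, e.g. $v_1 = v_2 = v_3 = 0$ satisfies the first two literals of (1).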
3. The MATHSAT Algorithm: Basics
A much simplified, recursive representation of the basic MATHSAT procedure is outlined in Fig. 1. MATHSAT takes as input a math-formula $\phi$, and (by reference) an empty interpretation $\mathcal{I}$. Without loss of generality, $\phi$ is assumed to be in conjunctive normal form (CNF). MATHSAT returns $\top$ if $\phi$ is LAL-satisfiable (with $\mathcal{I}$ containing a satisfying interpretation), and $\bot$ otherwise. MATHSAT invokes MATHDPLL passing as arguments the boolean formula $\varphi := \mathcal{M}2\mathcal{B}(\phi)$ and (by reference) an empty assignment for $\varphi$ and the empty interpretation $\mathcal{I}$.
We introduce a bijective function $\mathcal{M}2\mathcal{B}$ (for "Math-to-Boolean"), also called boolean abstraction function, that maps boolean atoms into themselves and math-atoms into fresh boolean atoms (so that two atom instances in $\phi$ are mapped into the same boolean atom iff they are syntactically identical), and distributes over sets and boolean connectives. Its inverse function $\mathcal{B}2\mathcal{M}$ (for "Boolean-to-Math") is called refinement. Both functions can be implemented efficiently, so that they require a small constant time for mapping one atom.

function MATHSAT (Math-formula ϕ, interpretation & I)
    return MATHDPLL (M2B(ϕ), {}, I);

function MATHDPLL (Boolean-formula φ, assignment & µ, interpretation & I)
    if (φ == ⊤)                                      /* base */
        then return MATHSOLVE (B2M(µ), I);
    if (φ == ⊥)                                      /* backtrack */
        then return Unsat;
    if {l occurs in φ as a unit clause}              /* unit prop. */
        then return MATHDPLL (assign(l, φ), µ ∪ {l}, I);
    if (MATHSOLVE (B2M(µ), I) == Unsat)              /* early pruning */
        then return Unsat;
    l := choose-literal(φ);                          /* split */
    if (MATHDPLL (assign(l, φ), µ ∪ {l}, I) == Sat)
        then return Sat;
        else return MATHDPLL (assign(¬l, φ), µ ∪ {¬l}, I);

Figure 1. High level view of the MATHSAT algorithm
MATHDPLL tries to build an assignment $\mu$ satisfying $\varphi$, such that its refinement is satisfiable in LAL, and the interpretation $\mathcal{I}$ satisfies $\mathcal{B}2\mathcal{M}(\mu)$ (and $\phi$). This is done recursively, with a variant of DPLL modified to enumerate assignments, and trying to refine them according to LAL. In particular:

Base. If $\varphi == \top$, then $\mu$ propositionally satisfies $\mathcal{M}2\mathcal{B}(\phi)$. In order to check if $\mu$ is LAL-satisfiable, which shows that $\phi$ is LAL-satisfiable, MATHDPLL invokes the linear mathematical solver MATHSOLVE on the refinement $\mathcal{B}2\mathcal{M}(\mu)$, and returns a Sat or Unsat value accordingly.

Backtrack. If $\varphi == \bot$, then $\mu$ has led to a propositional contradiction. Therefore MATHDPLL returns Unsat and backtracks.

Unit. If a literal $l$ occurs in $\varphi$ as a unit clause, then $l$ must be assigned a true value. Thus, MATHDPLL is invoked recursively with the formula returned by $assign(l, \varphi)$ and the assignment obtained by adding $l$ to $\mu$ as arguments. $assign(l, \varphi)$ substitutes every occurrence of $l$ in $\varphi$ with $\top$ and propositionally simplifies the result.

Early pruning. MATHSOLVE is invoked on (the refinement of) the current assignment $\mu$. If this is found unsatisfiable, then there is no need to proceed, and the procedure backtracks.

Split. If none of the above situations occurs, then $choose\text{-}literal(\varphi)$ returns an unassigned literal $l$ according to some heuristic criterion. Then MATHDPLL is first invoked recursively with arguments $assign(l, \varphi)$ and $\mu \cup \{l\}$. If the result is Unsat, then MATHDPLL is invoked with arguments $assign(\neg l, \varphi)$ and $\mu \cup \{\neg l\}$.
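The five cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATHSAT implementation: clauses are sets of signed integers (a negative number is a negated atom), the theory solver is passed in as a callable `math_solve` that returns True for a consistent set of literals, the interpretation $\mathcal{I}$ is omitted, and the toy theory `toy_solve` (bounds on a single integer `x`) is invented for the example.

```python
def assign(lit, cnf):
    """Set lit to true: drop satisfied clauses, delete the opposite literal."""
    return [c - {-lit} for c in cnf if lit not in c]

def math_dpll(cnf, mu, math_solve):
    """Return a theory-consistent assignment extending mu, or None for Unsat."""
    if not cnf:                                     # base: all clauses satisfied
        return mu if math_solve(mu) else None
    if any(not c for c in cnf):                     # backtrack: empty clause
        return None
    for c in cnf:                                   # unit propagation
        if len(c) == 1:
            (lit,) = c
            return math_dpll(assign(lit, cnf), mu | {lit}, math_solve)
    if not math_solve(mu):                          # early pruning
        return None
    lit = min(cnf[0], key=abs)                      # split (naive heuristic)
    result = math_dpll(assign(lit, cnf), mu | {lit}, math_solve)
    if result is not None:
        return result
    return math_dpll(assign(-lit, cnf), mu | {-lit}, math_solve)

# Hypothetical toy theory over one integer x:
# atom 1 is (x <= 2), atom 2 is (x >= 5); atom 3 is a plain boolean atom.
BOUNDS = {1: (None, 2), -1: (3, None), 2: (5, None), -2: (None, 4)}

def toy_solve(mu):
    """True iff the bounds selected by mu leave a non-empty range for x."""
    lo, hi = float("-inf"), float("inf")
    for lit in mu:
        if lit in BOUNDS:
            lower, upper = BOUNDS[lit]
            if lower is not None:
                lo = max(lo, lower)
            if upper is not None:
                hi = min(hi, upper)
    return lo <= hi

# (x <= 2) and ((x >= 5) or A3)
cnf = [frozenset({1}), frozenset({2, 3})]
print(math_dpll(cnf, frozenset(), toy_solve))
```

On this input the first branch sets both `(x <= 2)` and `(x >= 5)`, which the toy solver rejects in the base case; the search then takes the other branch and returns the assignment `{1, -2, 3}`.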
4. The MATHSAT Algorithm: Enhancements
The algorithm presented in the previous section is over-simplified for explanatory purposes. It can be easily adapted to deal with advanced SAT solving techniques such as splitting heuristics, two-literals watching, and restarts (see [44] for an overview). This section describes several enhancements that have been made to the interplay between the boolean and mathematical solvers.
4.1. THEORY-DRIVEN BACKJUMPING AND LEARNING
When MATHSOLVE finds the assignment $\mu$ to be LAL-unsatisfiable, it also returns a conflict set $\eta$ causing the unsatisfiability. This enables MATHDPLL to backjump in its search to the most recent branching point in which at least one literal $l \in \eta$ is not assigned a truth value, pruning the search space below. We call this technique theory-driven backjumping. Clearly, its effectiveness strongly depends on the quality of the conflict sets generated.
EXAMPLE 4.1. Consider the formula $\phi$ and the assignment $\mu$ of Ex. 2.1. Suppose that MATHDPLL generates $\mu$ following the order of occurrence within $\phi$, and that MATHSOLVE($\mu$) returns the conflict set (1). Thus MATHDPLL can jump back directly to the branching point $\neg(3v_1 - v_3 \leq 6)$ without exploring the right branches of $(v_3 = 3v_5 + 4)$ and $(v_1 - v_5 \leq 1)$. If instead MATHSOLVE($\mu$) returns the conflict set (2), then MATHSAT backtracks to $(v_3 = 3v_5 + 4)$. Thus, (2) causes no reduction in search. $\diamond$
When MATHSOLVE returns a conflict set $\eta$, the clause $\neg\eta$ (the disjunction of the negated literals of $\eta$) can be added in conjunction to $\varphi$: this will prevent MATHDPLL from generating again any branch containing $\eta$. We call this technique theory-driven learning.
EXAMPLE 4.2. As in Ex. 4.1, suppose MATHSOLVE($\mu$) returns the conflict set (1). Then the clause $(2v_2 - v_3 > 2) \vee \neg(3v_1 - 2v_2 \leq 3) \vee (3v_1 - v_3 \leq 6)$ is added in conjunction to $\varphi$. Thus, whenever a branch contains two elements of (1), MATHDPLL will assign the third to false by unit propagation. $\diamond$
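The effect described in Ex. 4.2 can be reproduced with a small sketch. The integer numbering of atoms and the helper names `learn_clause` and `unit_literal` are assumptions made for this illustration: atom 1 stands for $(2v_2 - v_3 > 2)$, atom 2 for $(3v_1 - 2v_2 \leq 3)$ and atom 3 for $(3v_1 - v_3 \leq 6)$, so conflict set (1) is `{-1, 2, -3}`.

```python
def learn_clause(conflict_set):
    """Turn a conflict set (set of literals) into the clause that forbids it."""
    return frozenset(-lit for lit in conflict_set)

def unit_literal(clause, mu):
    """Literal forced by unit propagation on clause under mu, or None."""
    if any(lit in mu for lit in clause):
        return None                         # clause is already satisfied
    open_lits = [lit for lit in clause if -lit not in mu]
    return open_lits[0] if len(open_lits) == 1 else None

conflict_1 = {-1, 2, -3}                    # conflict set (1) of Ex. 2.1
clause = learn_clause(conflict_1)           # {1, -2, 3}

# A branch containing two elements of (1) forces the negation of the third.
print(unit_literal(clause, {-1, 2}))        # forces atom 3, i.e. (3v1 - v3 <= 6)
```

Forcing atom 3 to true is the same as assigning the third element of (1), $\neg(3v_1 - v_3 \leq 6)$, to false.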
As in the boolean case, learning must be used with some care, since it may cause an explosion in the size of $\varphi$. Therefore, some techniques can be used to discard learned clauses when necessary [8]. Notice however the difference with standard boolean backjumping and learning [8]: in the latter case, the conflict set propositionally falsifies the formula, while in our case it is inconsistent from the mathematical viewpoint.
4.2. THEORY-DRIVEN DEDUCTION
With early pruning, MATHSOLVE is used to check if $\mu$ is LAL-satisfiable, and thus to possibly prune whole branches of the search. It is also possible to use MATHSOLVE to reduce the remaining boolean search: the mathematical analysis of $\mu$ performed by MATHSOLVE can discover that the value of some mathematical atom $\psi \notin \mu$ is already determined, based on some subset $\mu' \subseteq \mu$ of the current assignment. For instance, consider the case where the literals $(v_1 = v_2)$ and $(v_2 - v_3 = 4)$ are in the current (partial) assignment $\mu$, while $(v_1 - v_3 = 4)$ is currently unassigned. Since $\{(v_1 = v_2), (v_2 - v_3 = 4)\} \models (v_1 - v_3 = 4)$, atom $(v_1 - v_3 = 4)$ must be assigned to $\top$, because assigning it to $\bot$ would make $\mu$ LAL-inconsistent.
MATHSOLVE is therefore used to detect and suggest to the boolean search which unassigned literals have forced values. This kind of deduction is often very useful, since it can trigger new boolean constraint propagation: the search is deepened without the need to split. Moreover, the implication clauses describing the deduction (e.g. $\neg(v_1 = v_2) \vee \neg(v_2 - v_3 = 4) \vee (v_1 - v_3 = 4)$) can be learned at the boolean level, and added to the main formula: this constrains the remaining boolean search even after backtracking.
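A naive way to picture theory-driven deduction is to ask a consistency checker, for each unassigned atom, whether one of its two polarities is impossible. The sketch below does exactly that and is only an illustration: a real solver would obtain such deductions as a by-product of its own analysis rather than through two extra calls per atom, and the toy theory (bounds on one integer `x`) and all names are assumptions.

```python
# Hypothetical toy theory over one integer x:
# atom 1 is (x <= 2), atom 2 is (x <= 5), atom 3 is (x >= 7).
BOUNDS = {1: (None, 2), -1: (3, None),
          2: (None, 5), -2: (6, None),
          3: (7, None), -3: (None, 6)}

def toy_solve(mu):
    """True iff the bounds selected by mu leave a non-empty range for x."""
    lo, hi = float("-inf"), float("inf")
    for lit in mu:
        lower, upper = BOUNDS[lit]
        if lower is not None:
            lo = max(lo, lower)
        if upper is not None:
            hi = min(hi, upper)
    return lo <= hi

def deduce(mu, atoms, solve):
    """Literals over unassigned atoms that every model of mu must satisfy."""
    forced = set()
    for atom in atoms:
        if atom in mu or -atom in mu:
            continue                        # already assigned
        if not solve(mu | {-atom}):
            forced.add(atom)                # the atom cannot be false
        elif not solve(mu | {atom}):
            forced.add(-atom)               # the atom cannot be true
    return forced

def implication_clause(reason, lit):
    """Clause (not reason) or lit, learnable at the boolean level."""
    return frozenset(-r for r in reason) | {lit}

print(deduce({1}, [1, 2, 3], toy_solve))
```

With `(x <= 2)` assigned, the sketch deduces that `(x <= 5)` must be true and `(x >= 7)` must be false; the corresponding implication clauses are `{-1, 2}` and `{-1, -3}`.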
4.3. A STACK-BASED INTERFACE TO MATHSOLVE
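Since the search is driven by the "stack-based" boolean procedure, we define a stack-based interface to call the math solver. In this way, MATHSOLVE can significantly exploit previous computations. Consider the following trace (left column first, then right):

$$
\begin{array}{ll}
\text{MATHSOLVE}(\mu_1) \Rightarrow \mathit{Sat} & \text{Undo } \mu_2 \\
\text{MATHSOLVE}(\mu_1 \cup \mu_2) \Rightarrow \mathit{Sat} & \text{MATHSOLVE}(\mu_1 \cup \mu'_2) \Rightarrow \mathit{Sat} \\
\text{MATHSOLVE}(\mu_1 \cup \mu_2 \cup \mu_3) \Rightarrow \mathit{Sat} & \text{MATHSOLVE}(\mu_1 \cup \mu'_2 \cup \mu'_3) \Rightarrow \mathit{Sat} \\
\text{MATHSOLVE}(\mu_1 \cup \mu_2 \cup \mu_3 \cup \mu_4) \Rightarrow \mathit{Unsat} & \text{MATHSOLVE}(\mu_1 \cup \mu'_2 \cup \mu'_3 \cup \mu'_4) \Rightarrow \mathit{Sat}
\end{array}
$$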
On the left, an assignment is repeatedly extended until a conflict is found. We notice that MATHSOLVE is invoked (during early pruning calls) on incremental assignments. When a conflict is found, the search backtracks to a previous point (on the right), and MATHSOLVE is then restarted from a previously visited state. Based on these considerations, our MATHSOLVE is not a function call: it has a persistent state, and is incremental and backtrackable. Incremental means that it avoids restarting the computation from scratch whenever it is given in input an assignment $\mu'$ such that $\mu' \supset \mu$ and $\mu$ has already been proved satisfiable. Backtrackable means that it is possible to return to a previous state on the stack in a relatively efficient manner. Therefore MATHSOLVE has primitives to add constraints to the current state, to set backtrack points, and to jump back to a previously set backtrack point.
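The three primitives can be pictured with a toy solver that keeps a trail of saved states. This is a sketch under stated assumptions, not the MATHSOLVE interface: the class name `StackSolver`, its method names, and the toy theory (lower and upper bounds on a single integer) are all invented for the illustration.

```python
class StackSolver:
    """Toy incremental, backtrackable store of bounds on one integer."""

    def __init__(self):
        self.lo, self.hi = float("-inf"), float("inf")
        self.trail = []    # state saved before each added constraint
        self.marks = []    # trail lengths recorded at backtrack points

    def add(self, lower=None, upper=None):
        """Add a bound to the current state; return False on inconsistency."""
        self.trail.append((self.lo, self.hi))
        if lower is not None:
            self.lo = max(self.lo, lower)
        if upper is not None:
            self.hi = min(self.hi, upper)
        return self.lo <= self.hi

    def set_backtrack_point(self):
        """Remember the current state so that it can be restored later."""
        self.marks.append(len(self.trail))

    def backjump(self, n=1):
        """Restore the state of the n-th most recent backtrack point."""
        for _ in range(n - 1):
            self.marks.pop()
        target = self.marks.pop()
        if target < len(self.trail):
            self.lo, self.hi = self.trail[target]
            del self.trail[target:]

s = StackSolver()
s.add(lower=0)                 # consistent
s.set_backtrack_point()
s.add(upper=10)                # consistent, reuses the stored lower bound
s.set_backtrack_point()
print(s.add(upper=-1))         # inconsistent with lower bound 0
s.backjump()                   # undo only the last extension
print((s.lo, s.hi))
```

Each `add` only updates the stored bounds instead of recomputing them from all constraints seen so far, which is the incremental part; `backjump` restores a saved state without redoing earlier work, which is the backtrackable part.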
4.4. FILTERING
Another way of speeding up MATHSOLVE is to give it smaller but in some sense sufficient sets of constraints.
Pure Literal Filtering

Assume that a math-atom $\psi$ only occurs positively in the formula $\phi$, i.e., there is no clause in $\phi$ having the literal $\neg\psi$. That is, $\psi$ is a pure literal. Now if $\psi$ is assigned to false in the current truth assignment $\mu$, i.e. $\neg\psi \in \mu$, we don't have to pass $\neg\psi$ to MATHSOLVE. This is because if an extension $\mu'$ of $\mu$ propositionally satisfies $\phi$, so will $\mu' \setminus \{\neg\psi\}$ as $\psi$ is a pure literal. Similar analysis applies to the case in which $\psi$ only occurs negatively in $\phi$.
Notice that if a pure literal $\psi$ is assigned to true in $\mu$, then it has to be passed to MATHSOLVE. Furthermore, one may not fix $\psi$ to true before the MATHDPLL search, as is done in the purely boolean case.
Theory-Deduced Literal Filtering

Another way of reducing the number of math-atoms given to MATHSOLVE is to exploit theory-deduced clauses, i.e. those clauses resulting from theory-driven learning (Sect. 4.1), theory-driven deduction (Sect. 4.2), and static learning (Sect. 4.6). For each theory-deduced clause $C = l_1 \vee \cdots \vee l_n$, each $l_i$ being a math-atom or its negation, the truth assignment $\{\neg l_1, \ldots, \neg l_n\}$ is LAL-unsatisfiable. That is, all interpretations that satisfy all $\neg l_1, \ldots, \neg l_{n-1}$ must satisfy the literal $l_n$. Therefore, if the current truth assignment $\mu$ contains the literals $\neg l_1, \ldots, \neg l_{n-1}$, and the literal $l_n$ is forced to true by unit propagation on the clause $C$, there is no need to pass $l_n$ to MATHSOLVE as $\mu$ is LAL-satisfiable iff $\mu \cup \{l_n\}$ is. In order to detect these cases, the theory-deduced clauses can be marked with a flag.

Combining the filtering methods requires some care. The literals $\neg l_1, \ldots, \neg l_{n-1}$ in the current truth assignment must have been passed to MATHSOLVE (i.e. not filtered) in order to apply theory-deduced literal filtering to $l_n$.
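Pure literal filtering amounts to a simple test: a math literal in $\mu$ needs to reach the solver only if that literal, with that polarity, occurs somewhere in the clause set. The sketch below illustrates this test only (theory-deduced literal filtering is not covered); the function name and the signed-integer encoding are assumptions made for the example.

```python
def filter_for_solver(mu, cnf, math_atoms):
    """Math literals of mu that must be passed to the theory solver.

    A literal whose polarity never occurs in cnf belongs to a pure atom
    assigned against its only polarity, so it can be withheld.
    """
    occurring = {lit for clause in cnf for lit in clause}
    passed = set()
    for lit in mu:
        if abs(lit) not in math_atoms:
            continue                 # boolean atoms never reach the solver
        if lit not in occurring:
            continue                 # pure atom assigned the "other" value
        passed.add(lit)
    return passed

# Atoms 1 and 2 are mathematical, atom 3 is boolean.
# Atom 1 occurs only positively, atom 2 only negatively.
cnf = [{1, 3}, {-2, 3}, {1, -3}]
print(filter_for_solver({-1, -2, 3}, cnf, {1, 2}))
```

Here `-1` is withheld because atom 1 is pure positive and has been assigned false, while `-2` is passed on because atom 2 occurs negatively in the clause set.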
4.5. WEAKENED EARLY PRUNING
Early pruning calls are only used to prune the search; if the current (partial) assignment $\mu$ is found to be unsatisfiable, the search backtracks, but if it is found to be satisfiable, the search goes deeper and the assignment will be extended.
Therefore, during early pruning calls, MATHSOLVE does not have to detect all inconsistencies; as long as calls to MATHSOLVE at the end of a search branch faithfully detect inconsistency, correctness and completeness are guaranteed.
We exploit this fact by using a faster, but less powerful version of MATHSOLVE for early pruning calls. Specifically, in the $\mathbb{R}$ domain, handling disequalities requires an extra solver which is often time-consuming (see Section 5). As disequalities in $\mathbb{R}$ typically constrain the problem very weakly, and thus very rarely cause inconsistency, during early pruning calls MATHSOLVE ignores disequalities, which are instead considered when checking complete search branches.
In the $\mathbb{Z}$ domain, as the theory of linear arithmetic on $\mathbb{Z}$ is much harder, in theory and in practice, than that on $\mathbb{R}$ [9], during early pruning calls MATHSOLVE looks for a solution on the reals only.
4.6. STATIC LEARNING
Before starting the actual MATHDPLL search, the problem can be pre-processed by adding some basic mathematical relationships among the math-atoms as boolean constraints to the problem. As the added constraints are consequences of the underlying theory, the satisfiability of the problem is preserved. The new constraints may significantly help to prune the search space at the boolean level, thus avoiding some LAL-unsatisfiable models and calls to the more expensive MATHSOLVE. In other words, before the search, we learn, at low cost, some basic facts that most often would have to be discovered, at a much higher cost, by the math solver during the search.
The simplest case of static learning is based on (in)equalities between math-terms and constants. Assume that $\phi$ contains a set of math-atoms of form $S_t = \{(t \bowtie_1 c_1), \ldots, (t \bowtie_n c_n)\}$, where $t$ is a math-term, $\bowtie_i \in \{<, \leq, =, \geq, >\}$, and $c_i$ are constants. First, $\phi$ is conjoined with a set of constraints over the equality atoms of form $(t = c_i)$ in $S_t$, ensuring that at most one of them can be true. This can be achieved with pairwise mutual exclusion constraints of form $\neg(t = c_i) \vee \neg(t = c_j)$. Second, the math-atoms in $S_t$ are connected with a linear number of binary constraints that compactly encode the obvious mathematical (in)equality relationships between them. For instance, if $S_t = \{(t \leq 2), (t = 3), (t > 5), (t \geq 7)\}$, then $\phi$ is conjoined with the constraints $(t = 3) \rightarrow \neg(t > 5)$, $(t = 3) \rightarrow \neg(t \leq 2)$, $(t \leq 2) \rightarrow \neg(t > 5)$, and $(t \geq 7) \rightarrow (t > 5)$. Now $(t \geq 7)$ implies $(t > 5)$, $\neg(t = 3)$ and $\neg(t \leq 2)$ at the boolean level.

Furthermore, some facts among difference constraints of the form $t_1 - t_2 \bowtie c$, $\bowtie \in \{<, \leq, =, \geq, >\}$, can be easily derived and added. First, mutually exclusive pairs of difference constraints are handled. E.g., if $(t_1 - t_2 \leq 3), (t_2 - t_1 < -4) \in MathAtoms(\phi)$, then the clause $\neg(t_1 - t_2 \leq 3) \vee \neg(t_2 - t_1 < -4)$ is conjoined to $\phi$. Second, clauses corresponding to triangle inequalities and equalities between difference constraints are added. E.g., if $(t_1 - t_2 \leq 3), (t_2 - t_3 < 5), (t_1 - t_3 < 9) \in MathAtoms(\phi)$, then $(t_1 - t_2 \leq 3) \wedge (t_2 - t_3 < 5) \rightarrow (t_1 - t_3 < 9)$ is added to $\phi$. Similarly, for $(t_1 - t_2 = 3), (t_2 - t_3 = 0), (t_1 - t_3 = 5) \in MathAtoms(\phi)$ we add the constraint $(t_1 - t_2 = 3) \wedge (t_2 - t_3 = 0) \rightarrow \neg(t_1 - t_3 = 5)$ to $\phi$.
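For the first case (atoms comparing one term $t$ with constants), the binary constraints can be generated by comparing the solution intervals of the atoms. The sketch below is an illustration only and deviates from the text in one respect: it checks all pairs, so it produces a quadratic number of clauses (a superset of the four listed above) rather than a linear one. Intervals are taken over the reals, and all names are assumptions.

```python
from itertools import combinations

INF = float("inf")

def interval(op, c):
    """Solution set of (t op c) as (lo, lo_closed, hi, hi_closed)."""
    return {"<": (-INF, False, c, False), "<=": (-INF, False, c, True),
            "=": (c, True, c, True),
            ">=": (c, True, INF, False), ">": (c, False, INF, False)}[op]

def disjoint(a, b):
    """True if the two intervals share no point."""
    (alo, alc, ahi, ahc), (blo, blc, bhi, bhc) = a, b
    return (ahi < blo or (ahi == blo and not (ahc and blc)) or
            bhi < alo or (bhi == alo and not (bhc and alc)))

def subset(a, b):
    """True if interval a is contained in interval b."""
    (alo, alc, ahi, ahc), (blo, blc, bhi, bhc) = a, b
    lo_ok = alo > blo or (alo == blo and (blc or not alc))
    hi_ok = ahi < bhi or (ahi == bhi and (bhc or not ahc))
    return lo_ok and hi_ok

def static_clauses(atoms):
    """Binary clauses over atoms numbered 1..n (negative = negated atom)."""
    clauses = []
    for (i, a), (j, b) in combinations(enumerate(atoms, start=1), 2):
        ia, ib = interval(*a), interval(*b)
        if disjoint(ia, ib):
            clauses.append(frozenset({-i, -j}))     # a -> not b
        elif subset(ia, ib):
            clauses.append(frozenset({-i, j}))      # a -> b
        elif subset(ib, ia):
            clauses.append(frozenset({-j, i}))      # b -> a
    return clauses

# S_t = {(t <= 2), (t = 3), (t > 5), (t >= 7)}, numbered 1..4
S_t = [("<=", 2), ("=", 3), (">", 5), (">=", 7)]
for clause in static_clauses(S_t):
    print(sorted(clause))
```

Among the generated clauses are `{-4, 3}`, i.e. $(t \geq 7) \rightarrow (t > 5)$, and `{-2, -3}`, i.e. $(t = 3) \rightarrow \neg(t > 5)$, matching the constraints of the example.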
4.7. CLUSTERING
At the beginning of the search, $MathAtoms(\phi)$, i.e. the set of mathematical atoms, is partitioned into a set of disjoint clusters $C_1 \cup \cdots \cup C_k$: intuitively, two atoms belong to the same cluster if they share a variable. If $L_i$ is the set of literals built with the atoms in cluster $i$, it is easy to see that an assignment $\mu$ is LAL-satisfiable if and only if each $\mu \cap L_i$ is LAL-satisfiable. Based on this idea, instead of having a single, monolithic solver for linear arithmetic, the mathematical solver is instantiated $k$ different times. Each is responsible for handling the mathematical reasoning within a single cluster. A dispatcher is responsible for the activation of the suitable mathematical solver instances, depending on the mathematical atoms occurring in the assignment to be analyzed.
The advantages of this approach are manifold. First, $k$ solvers running on $k$ disjoint problems are typically faster than one solver running monolithically on the union of the problems. Furthermore, the construction of smaller conflict sets becomes easier, and this may result in a significant gain in the overall search. Finally, when caching the results of previous calls to the linear solvers, it increases the likelihood of a hit.
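The partition into clusters can be computed with a union-find structure over the variables. The following sketch assumes that every math-atom mentions at least one variable and that atoms are given as a mapping from an atom name to its set of variables; the function name and this input format are choices made for the illustration. Clusters are the connected components of the "share a variable" relation.

```python
def clusters(atom_vars):
    """Partition atoms so that atoms sharing a variable end up together."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    # Merge all variables that occur together in one atom.
    for variables in atom_vars.values():
        ordered = sorted(variables)
        for v in ordered[1:]:
            parent[find(v)] = find(ordered[0])

    # Group atoms by the representative of any one of their variables.
    groups = {}
    for atom, variables in atom_vars.items():
        groups.setdefault(find(min(variables)), set()).add(atom)
    return sorted(groups.values(), key=sorted)

atom_vars = {
    "a": {"v1", "v2"},     # e.g. (v1 - v2 <= 3)
    "b": {"v2", "v3"},     # shares v2 with "a"
    "c": {"v4"},
    "d": {"v4", "v5"},     # shares v4 with "c"
}
print(clusters(atom_vars))
```

For this input the atoms fall into two clusters, one containing `a` and `b` and one containing `c` and `d`; each could be handed to its own solver instance.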
5. A Layered MATHSOLVE
In this section, we discuss the structure of MATHSOLVE. We disregard the issues related to clustering, since the different instances of MATHSOLVE that result are completely independent of each other. MATHSOLVE is responsible for checking the satisfiability of a set of mathematical atoms $\mu$ and returning, as appropriate, a model or a conflict set.
In many calls to MATHSOLVE, a general solver for linear constraints is not needed: very often, the unsatisfiability of the current assignment $\mu$ can be established in less expressive, but much easier, sub-theories. Thus, MATHSOLVE is organized in a layered hierarchy of solvers of increasing solving capabilities. If a higher level solver finds a conflict, then this conflict is used to prune the search at the boolean level; if it does not, the lower level solvers are activated.

[Figure 2: flow diagram. Solver boxes: EUF (CC), DL (BF), SIMPLEX, Real disequality, Branch and cut, FM ELIM; grouped under EUF, Reals and Integers; edges labelled sat, unsat, reals, int's and time out.]

Figure 2. Control flow of MATHSOLVE
Layering can be understood as trying to favour faster solvers for more abstract theories over slower solvers for more general theories. Fig. 2 shows a rough idea of the structure of MATHSOLVE. Three logical components can be distinguished. Firstly, the current assignment $\mu$ is passed to the equational solver, that only deals with (positive and negative) equalities (Sect. 5.1). Secondly, if this solver does not find a conflict, MATHSOLVE tries to find a conflict over the reals (see Sect. 5.2). Finally, if the current assignment is also satisfiable over the reals and the variables are to be interpreted over the integers, a solver for linear arithmetic over the integers is invoked (see Sect. 5.3).
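The control flow of a layered solver can be sketched as a loop over solvers ordered from cheapest to most expensive, stopping at the first conflict. The two layers below are toy stand-ins invented for this illustration (they are not the EUF, simplex or integer solvers of MATHSOLVE): literals are triples `(variable, relation, constant)`, the first layer only notices a variable equated to two different constants, and the second checks simple bounds.

```python
def equational_layer(mu):
    """Cheap toy layer: a variable equated to two distinct constants."""
    seen = {}
    for var, op, c in mu:
        if op == "=":
            if var in seen and seen[var] != c:
                return {(var, "=", seen[var]), (var, "=", c)}
            seen[var] = c
    return None

def bounds_layer(mu):
    """More expensive toy layer: lower bound above upper bound."""
    lo, hi = {}, {}
    for var, op, c in mu:
        if op in ("=", ">="):
            lo[var] = max(lo.get(var, c), c)
        if op in ("=", "<="):
            hi[var] = min(hi.get(var, c), c)
    for var in lo:
        if var in hi and lo[var] > hi[var]:
            return set(mu)          # coarse conflict set: the whole assignment
    return None

LAYERS = [("equational", equational_layer), ("bounds", bounds_layer)]

def layered_solve(mu, layers=LAYERS):
    """Call the layers in order; the first conflict makes later calls superfluous."""
    for name, solver in layers:
        conflict = solver(mu)
        if conflict is not None:
            return ("Unsat", name, conflict)
    return ("Sat", None, None)

print(layered_solve({("x", "=", 1), ("x", "=", 2)})[:2])
print(layered_solve({("x", ">=", 5), ("x", "<=", 2)})[:2])
```

The first call is refuted by the cheap layer, so the bounds layer is never run; the second passes the cheap layer and is refuted only by the more expensive one.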
5.1. EQUALITY AND UNINTERPRETED FUNCTIONS
The first layer of MATHSOLVE is provided by the equational solver, a satisfiability checker for the logic of unconditional ground equality over uninterpreted function symbols. It is incremental and supports efficient backtracking. The solver generates conflict sets, deduces assignments for equational literals, and can provide explanations for its deductions. Thanks to the equational solver, MATHSAT can be used as an efficient decision procedure for the full logic of equality over uninterpreted function symbols (EUF). However, in this section we focus on the way the equational solver is used to improve the performance on LAL.
The solver is based on the basic congruence closure algorithm suggested in [29]. We slightly extend the logic by allowing for enumerated objects and numbers, with the understanding that each object denotes a distinct domain element (i.e. an object is implicitly different from all the other objects and from all numbers). Similarly, different numbers are implicitly different from each other (and from all objects).
The congruence closure module internally constructs a congruence data structure that can determine if two arbitrary terms are necessarily forced to be equal by the currently asserted constraints, and can thus be used to determine the value of (some) equational atoms. It also maintains a list of asserted disequations, and signals unsatisfiability if either one of these or an implicit disequation is violated by the current congruence.
If two terms are equal, an auxiliary proof tree data structure allows us to extract the reason, i.e. the original constraints (and just those) that forced this equality. If a disequality constraint is violated, we can return the reason (together with the violated disequality) as a conflict set.
Similarly, we can perform forward deduction: for each unassigned equational atom, we can determine if the two sides are already forced to be equal by the current assignment, and hence whether the atom has to be asserted as true or false. Again, we can extract the reason for this deduction and use it to represent the deduction as a learned clause on the Boolean level.
There are two ways in which the equational solver can be used: as a full solver for a purely equational cluster, or as a layer in the arithmetic reasoning process. In the first case, the equational solver is associated to a cluster not involving any arithmetic at all, which only contains equations of the form $v_i \bowtie v_j$, $v_i \bowtie c_j$, with $\bowtie \in \{=, \neq\}$. As stated above, the equational solver implicitly knows that syntactically different constants in D are semantically distinct. So, it provides a full solver for some clusters, avoiding the need to call an expensive linear solver on an easy problem. This can significantly improve performance, since in practical examples it is often the case that a purely equational cluster is present; typical examples are the modeling of assignments in a programming language, and gate and multiplexer definitions in circuits.
In the second case, the equational solver also receives constraints involving arithmetic operators. While arithmetic functions are treated as fully uninterpreted, the equational solver has a limited interpretation of $<$ and $\le$, knowing only that $s < t$ implies $s \neq t$, and $s = t$ implies $s \le t$ and $\neg(s < t)$. However, all deductions and conflicts under EUF semantics are also valid under fully interpreted semantics. Thus, the efficient equational solver can be used to prune the search space. Only if the equational solver cannot deduce any new assignments and reports a tentative model, does this model need to be analyzed by lower level solvers.
5.2. LINEAR ARITHMETIC OVER THE REALS
To check a given assignment $\mu$ of linear constraints for satisfiability over the reals, MATHSOLVE first considers only those constraints that are in the difference logic fragment. I.e., it considers the subassignment of $\mu$ consisting of all constraints of the forms $v_i - v_j \bowtie c$ and $v_i \bowtie c$, with $\bowtie \in \{=, \neq, <, >, \le, \ge\}$. Satisfiability checking for this subassignment is reduced to a negative-cycle detection problem in the graph whose nodes correspond to variables and whose edges correspond to the constraints. MATHSOLVE uses an incremental version of the Bellman-Ford algorithm to search for a negative cycle and hence for a conflict. See, for instance, [13], for background information. In many practical cases, for instance in bounded model checking problems of timed automata, a sizable amount or even all of $\mu$ is in the difference logic fragment. This causes a considerable speed up, since the Bellman-Ford algorithm is much more efficient than a general linear solver and generally generates much better (smaller) conflict sets.
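The reduction to negative-cycle detection can be sketched as follows. This is a plain, non-incremental Bellman-Ford pass restricted to constraints of the form $x - y \le c$; it is not the incremental version used in MATHSOLVE, it does not handle strict inequalities or disequalities, and it returns only a yes/no answer with a model rather than a conflict set. The function name and the triple encoding of constraints are assumptions made for this example.

```python
def check_difference_constraints(constraints):
    """constraints: list of (x, y, c), each meaning x - y <= c.

    Returns a satisfying assignment (dict) or None if the constraint graph
    contains a negative cycle, i.e. the constraints are unsatisfiable.
    """
    variables = {v for x, y, _ in constraints for v in (x, y)}
    # Starting every node at 0 mimics a virtual source with 0-weight edges.
    dist = {v: 0 for v in variables}
    for _ in range(len(variables) + 1):
        changed = False
        for x, y, c in constraints:        # edge y -> x with weight c
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return dist                    # all constraints hold under dist
    return None                            # still relaxing: negative cycle

# x - y <= 1, y - z <= -2, z - x <= 0 sums to 0 <= -1: a negative cycle.
unsat = check_difference_constraints([("x", "y", 1), ("y", "z", -2), ("z", "x", 0)])
sat = check_difference_constraints([("x", "y", 1), ("y", "x", 2)])
print(unsat, sat)
```

When the relaxation stabilizes, the computed distances themselves form a model, since every edge then satisfies `dist[x] <= dist[y] + c`.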
If the difference logic fragment of $\mu$ turns out to be satisfiable, MATHSOLVE checks the satisfiability of the subassignment of $\mu$ consisting of all constraints except the disequalities by means of the simplex method. MATHSOLVE uses a variant of the simplex method, namely the Cassowary algorithm (see [10]), that uses slack variables to efficiently allow the addition and removal of constraints and the generation of a minimal conflict set.
When this also turns out to be consistent, disequalities are taken into account: the incremental and backtrackable machinery is used to check, for each disequality $\sum c_i v_i \neq c_j$ in $\mu$, and separately from the other disequalities, if it is consistent with the non-disequality constraints in $\mu$. We do so by adding and retracting both $\sum c_i v_i < c_j$ and $\sum c_i v_i > c_j$. If one of the disequalities is inconsistent, the whole assignment $\mu$ is inconsistent. However, because the theory of the reals is (logically) convex, if each disequality separately is consistent, then all of $\mu$ is consistent. This follows from a dimensionality argument, basically because it is impossible to write an affine subspace $A$ of $\mathbb{R}^k$ as a finite union of proper affine subspaces of $A$.
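The convexity argument can be made concrete in the simplest possible setting: a single variable with bounds $lo \le x \le hi$ and disequalities $x \neq c$. The sketch below is an assumption-laden toy, not MATHSOLVE code; the function names are hypothetical. It checks each disequality separately, exactly as described above (try $x < c$, then $x > c$), and then exhibits a rational witness that satisfies all of them at once.

```python
from fractions import Fraction

def diseq_consistent(lo, hi, c):
    """Is lo <= x <= hi consistent with x != c? Try x < c, then x > c."""
    return lo <= hi and (lo < c or hi > c)

def real_witness(lo, hi, excluded):
    """Return a rational x in [lo, hi] avoiding every value in `excluded`,
    or None if the bounds or some single disequality are inconsistent."""
    if lo > hi or not all(diseq_consistent(lo, hi, c) for c in excluded):
        return None
    n = len(excluded)
    # n + 1 candidate points: if lo < hi they are pairwise distinct, so at
    # most n of them can be excluded; if lo == hi they all equal lo, which
    # the separate checks have already shown to be allowed.
    candidates = [lo + (hi - lo) * Fraction(k, n + 1) for k in range(n + 1)]
    return next(x for x in candidates if x not in excluded)

# Over the reals, 0 <= x <= 1 with x != 0 and x != 1 is satisfiable ...
print(real_witness(0, 1, [0, 1]))                        # 1/3
# ... but over the integers the same constraints have no solution,
# although each disequality is consistent on its own.
print([x for x in range(0, 2) if x not in [0, 1]])       # []
```

The second print shows why the separate check is sound only over the reals: the integers are not convex in this sense, which is one reason disequalities are handled again in the integer layer.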
5.3. LINEAR ARITHMETIC OVER THE INTEGERS
Whenever the variables are interpreted over the reals, MATHSOLVE is done at this point. If the variables are to be interpreted over the integers, and the problem is unsatisfiable in $\mathbb{R}$, then it also is so over $\mathbb{Z}$. When the problem is satisfiable in the reals, it is possible that it is not so in the integers. The first step carried out by MATHSOLVE in this case is a simple form of branch-and-cut (see, e.g., [26]), that searches for solutions over the integers by tightening the constraints. The algorithm acts on the representation of the solution space constructed over the integers, and makes use of the incremental and backtrackable machinery. Branch-and-cut also takes into account disequalities.
Branch-and-cut is only complete when the solution space is bounded, and there are practical cases when it can be very slow to converge. Therefore, if it does not find either an integer solution or a conflict within a small, predetermined amount of search, the current assignment is analyzed with the Fourier-Motzkin Elimination (FME) procedure. Since it is computationally expensive, FME is called only as a last resort.
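For readers unfamiliar with Fourier-Motzkin elimination, the following sketch shows the basic variable-elimination step over the rationals. It is the textbook real-valued procedure only, with hypothetical function names; deciding satisfiability over the integers requires additional reasoning that is not shown here, as the last example illustrates. The quadratic growth of the constraint set at each step also hints at why the procedure is expensive.

```python
from fractions import Fraction

def eliminate(constraints, j):
    """Eliminate variable j from constraints of the form sum(a[i]*x[i]) <= b.

    Each constraint is a pair (a, b) with a a list of Fractions.
    """
    pos = [c for c in constraints if c[0][j] > 0]
    neg = [c for c in constraints if c[0][j] < 0]
    out = [c for c in constraints if c[0][j] == 0]
    for ap, bp in pos:
        for an, bn in neg:
            # Scale so that x_j has coefficients +1 and -1, then add.
            sp, sn = 1 / ap[j], 1 / -an[j]
            coeffs = [sp * p + sn * n for p, n in zip(ap, an)]
            out.append((coeffs, sp * bp + sn * bn))
    return out

def fme_satisfiable_over_reals(constraints, num_vars):
    """Decide real satisfiability by eliminating every variable in turn."""
    cs = [([Fraction(a) for a in coeffs], Fraction(b)) for coeffs, b in constraints]
    for j in range(num_vars):
        cs = eliminate(cs, j)
    # Only constraints of the form 0 <= b remain.
    return all(b >= 0 for _, b in cs)

# x <= 1 and x >= 2 is unsatisfiable.
print(fme_satisfiable_over_reals([([1], 1), ([-1], -2)], 1))                      # False
# x - y <= 0, y <= 3, x >= 1 is satisfiable (e.g. x = y = 1).
print(fme_satisfiable_over_reals([([1, -1], 0), ([0, 1], 3), ([-1, 0], -1)], 2))  # True
# 2x <= 1 and 2x >= 1 has the real solution x = 1/2 but no integer solution.
print(fme_satisfiable_over_reals([([2], 1), ([-2], -1)], 1))                      # True
```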
6. The MATHSAT system
The MATHSAT system is a general solver implementing the ideas and algorithms described earlier in this paper. It also has some other features and accepts a richer input language than pure LAL, as e.g. equalities over uninterpreted functions are allowed.
It is structured in three main components: (i) a preprocessor, (ii) a boolean satisfiability solver, and (iii) the MATHSOLVE theory reasoner.
Preprocessor
MATHSAT supports a rich input language, with a large variety of boolean and arithmetic operators, including ternary if-then-else constructs on the term and formula level. For reasons of simplicity and efficiency, MATHDPLL, the core engine of the solver, handles a much simplified language. Reducing the rich input language to this simpler form is done by a preprocessor module.
The preprocessor performs some basic normalization of atoms, so that the core engine only has to deal with a restricted set of predicates. It eliminates each ternary if-then-else term $t = ITE(b, t_1, t_2)$ over math terms $t_1$ and $t_2$ by replacing it with a new variable $v_t$ and adding the boolean if-then-else constraint $ITE(b, v_t = t_1, v_t = t_2)$ to the formula. Finally, it uses a standard linear-time, satisfiability preserving translation to transform the formula (including the remaining if-then-else on the boolean level) into clause normal form.
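The elimination of term-level if-then-else can be sketched on a small tuple-based term representation. The encoding (`("ite", b, t1, t2)` for term-level ITE, `("ITE", b, f1, f2)` for the boolean constraint) and the fresh-variable naming scheme `v_ite0, v_ite1, ...` are assumptions made for this example and do not reflect MATHSAT's internal data structures; conditions `b` are left untouched.

```python
import itertools

def eliminate_ite(term, side, counter):
    """Return `term` with every ("ite", b, t1, t2) subterm replaced by a fresh
    variable; the defining boolean ITE constraints are appended to `side`."""
    if not isinstance(term, tuple):
        return term                        # variable or constant
    op, *args = term
    if op == "ite":
        b = args[0]
        t1 = eliminate_ite(args[1], side, counter)
        t2 = eliminate_ite(args[2], side, counter)
        v = f"v_ite{next(counter)}"        # fresh variable standing for the term
        side.append(("ITE", b, ("=", v, t1), ("=", v, t2)))
        return v
    return (op, *[eliminate_ite(a, side, counter) for a in args])

# x + ite(b, y, z) <= 5
atom = ("<=", ("+", "x", ("ite", "b", "y", "z")), "5")
side_constraints = []
new_atom = eliminate_ite(atom, side_constraints, itertools.count())
print(new_atom)
print(side_constraints)
```

The rewritten atom mentions only the fresh variable, while the side constraint ties that variable to `y` or `z` depending on `b`; the conjunction of both is equisatisfiable with the original atom.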
Boolean solver
The propositional abstraction of the math-formula produced by the preprocessor is given to the boolean satisfiability solver extended to implement the MATHDPLL algorithm described in Sect. 3. This solver is built upon the MINISAT solver [17], from which it inherits conflict-driven learning and back-jumping, restarts [37, 8, 22], optimized boolean constraint propagation based on the two-watched literal scheme, and the VSIDS splitting heuristics [28]. In fact, if MATHSAT is given a purely Boolean problem, it behaves substantially like MINISAT, as MATHSOLVE is not instantiated (see footnote 1). The communication with MATHSOLVE is carried out through an interface (similar to the one in [20]) that passes assigned literals, LAL-consistency queries and backtracking commands, and receives back answers to the queries, mathematical conflict sets and implied literals (Sect. 3).
Footnote 1: In some experiments on some very big pure SAT formulas, which are not reported here, MATHSAT took on average 10-20% more time than MINISAT to solve the same instances.
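The flavour of such a stack-based interface can be conveyed with a toy theory solver. The class below is purely hypothetical: its method names are not MATHSOLVE's API, its "theory" consists only of lower and upper bounds on individual variables, and its consistency check rescans the whole stack instead of working incrementally. It does, however, show the three kinds of messages mentioned above: asserting literals, querying consistency (with a conflict set as answer), and backtracking.

```python
class ToyBoundsSolver:
    """Hypothetical stand-in for a theory solver. Literals are bounds of the
    form (var, ">=", c) or (var, "<=", c)."""

    def __init__(self):
        self.stack = []                   # asserted literals, oldest first

    def assert_literal(self, lit):
        self.stack.append(lit)

    def check(self):
        """Return None if consistent, else a conflict set of two literals."""
        lower, upper = {}, {}             # var -> tightest bound literal
        for lit in self.stack:
            var, op, c = lit
            if op == ">=" and (var not in lower or c > lower[var][2]):
                lower[var] = lit
            if op == "<=" and (var not in upper or c < upper[var][2]):
                upper[var] = lit
        for var in lower:
            if var in upper and lower[var][2] > upper[var][2]:
                return [lower[var], upper[var]]
        return None

    def backtrack(self, height):
        """Retract every literal asserted after the stack had this height."""
        del self.stack[height:]

solver = ToyBoundsSolver()
solver.assert_literal(("x", ">=", 3))
mark = len(solver.stack)                  # backtrack point chosen by the SAT side
solver.assert_literal(("y", "<=", 7))
solver.assert_literal(("x", "<=", 2))
first = solver.check()
solver.backtrack(mark)
second = solver.check()
print(first, second)
```

The conflict set returned by the first query contains only the two clashing bounds on `x`, not the unrelated literal on `y`; small conflict sets of this kind are what makes theory-driven learning effective.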
The boolean solver has been extended to handle some options relevant when dealing with math-formulas. For instance, MATHSAT inherits MINISAT's feature of periodically discarding some of the learned clauses to prevent explosion of the formula size. However, clauses generated by theory-driven learning and forward deduction mechanisms (Sect. 3) are never discarded, as a default option, since they may have required a lot of work in MATHSOLVE. As a second example, it is possible to initialize the VSIDS heuristics weights of literals so that either boolean or theory atoms are preferred as splitting choices early in the MATHDPLL search.
MATHSOLVE
The implementation of MATHSOLVE is composed of several software modules. The equational reasoner is implemented in C/C++, and reuses some of the data structures of the theorem prover E [33] to store and process terms and atoms. The module for handling difference constraints is developed in C++. The simplex algorithm for linear arithmetic over the reals is based on the Cassowary system [5]. The branch-and-cut procedure is implemented on top of it, and uses the incrementality features of Cassowary to perform the search. For the Fourier-Motzkin elimination, MATHSOLVE uses the Omega system [30].
A very important point is that MATHSAT is able to deal with infinite precision arithmetic. To this end, the mathematical solver handles arbitrarily large rational numbers by means of the GMP library [21].
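Why exact rationals matter for a decision procedure can be seen in a few lines. The sketch uses Python's built-in `fractions.Fraction` as a stand-in for GMP rationals; it illustrates the general issue and is not taken from MATHSAT.

```python
from fractions import Fraction

# Binary floating point cannot represent 1/10 exactly, so rounding creeps in.
float_ok = (0.1 + 0.2 == 0.3)
# Exact rationals have no rounding error.
exact_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
print(float_ok, exact_ok)                        # False True

# A value just above 1 is indistinguishable from 1 in double precision,
# so a strict constraint such as x > 1 would be evaluated incorrectly.
tiny_gap = Fraction(2**200 + 1, 2**200)
print(tiny_gap > 1, float(tiny_gap) > 1)         # True False
```

With floating point, a solver could report a satisfiable constraint as violated (or vice versa); with exact arithmetic the answer is always sound, at the price of potentially large numerators and denominators.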
7. Experimental Evaluation
In this section we report on the experiments we have carried out to evaluate the performance of our approach. The experiments were run on a bi-processor XEON 3.0GHz machine with 4GB of memory (tests in Sect. 7.2), on a 4-processor PentiumIII 700MHz machine with 6GB of memory (tests in Sect. 7.3.1), and on a bi-processor XEON 1.4GHz machine with 2GB of memory (tests in Sect. 7.3.2), all of them running Linux RedHat Enterprise. The time limit for all the experiments was set to 300 seconds, and the memory limit was set to 512 MB.
An executable version of MATHSAT and the source files of all the experiments performed in the paper are available at [27].
7.1. DESCRIPTION OF THE TEST CASES
The set of benchmarks we used in the experimentation, described below, involves all the suites available in the literature we are aware of. For the test on LAL, we used the following suites. The SAL suite, originally presented in [32], is a set of benchmarks for ground decision procedures, derived from bounded model checking of timed automata and linear hybrid systems, and from test-case generation for embedded controllers. The RTLC suite, provided by the authors of [31], formalizes safety properties for RTL circuits (see [31] for a more detailed description). The CIRC suite, generated by ourselves, encodes the verification of certain properties for some simple circuits. The suite is composed of three different kinds of benchmarks, all of them being parametric in (and scaling up with) $N$, i.e. the width of the data-path of the circuit, so that $[0..2^N - 1]$ is the range of integer variables. In the first benchmark, the modular sum of two integers is checked for inequality against the bit-wise sum of their bit decomposition. The negation of the resulting formula is therefore unsatisfiable. In the second benchmark, two identical shift-and-add multipliers and two integers $a$ and $b$ are given; $a$ and the bit decomposition of $b$ (respectively $b$ and the bit decomposition of $a$) are given as input to the first (respectively, the second) multiplier, and the outputs of the two multipliers are checked for inequality. The negation of the resulting formula is therefore unsatisfiable. In the third benchmark, an integer $a$ and the bitwise decomposition of an integer $b$ are given as input to a shift-and-add multiplier; the output of the multiplier is compared with the constant integer value $p^2$, $p$ being the biggest prime number strictly smaller than $2^N$. The resulting formula is satisfiable, but it has only one solution, where $a = b = p$. The TM suite is a set of benchmarks for (temporal) metric planning, provided to us by the authors of [36] (see also [41]).
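The arithmetic facts behind the first and third CIRC benchmarks can be checked by brute force for a small width. The sketch below is an illustration under stated assumptions, not the benchmark generator: the helper names are hypothetical, the adder is a standard ripple-carry design, and the multiplier output of the third benchmark is assumed to be the exact (non-modular) product.

```python
def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]       # least significant bit first

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def ripple_add(a_bits, b_bits):
    """Bit-wise sum; the final carry is dropped, i.e. addition modulo 2**n."""
    out, carry = [], 0
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out

def largest_prime_below(limit):
    def is_prime(k):
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    return next(k for k in range(limit - 1, 1, -1) if is_prime(k))

N = 4
# First benchmark: modular sum and bit-wise sum never differ.
adder_ok = all(
    from_bits(ripple_add(to_bits(a, N), to_bits(b, N))) == (a + b) % 2 ** N
    for a in range(2 ** N) for b in range(2 ** N)
)
# Third benchmark: a * b = p^2 has a single solution within [0 .. 2^N - 1].
p = largest_prime_below(2 ** N)
solutions = [(a, b) for a in range(2 ** N) for b in range(2 ** N) if a * b == p * p]
print(adder_ok, p, solutions)                     # True 13 [(13, 13)]
```

For $N = 4$ the only factorizations of $p^2 = 169$ are $1 \cdot 169$, $13 \cdot 13$ and $169 \cdot 1$, and only the middle one fits in the range, which is why the solver has to find a single needle in the search space.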
The benchmarks below have been used for the comparison in Sect. 7.3.2, and fall into the difference logic fragment of LAL. The DLSAT suite is provided to us by the authors of [14] (see the paper for more detail). The suite contains two different sets of benchmarks: the first set formalizes the problem of finding the optimal schedule for the job shop problem, a combinatorial optimization problem; the second set is concerned with bounded model checking of timed automata that model digital circuits with delays, and formalizes the problem of finding the maximal stabilization time for the circuits. The SEP suite [34] is a set of benchmarks for separation logic (i.e., difference logic) derived from symbolic simulation of several hardware designs, which is maintained by O. Strichman. The DTP suite [38, 1] is a set of benchmarks from the field of temporal reasoning. The set of benchmarks is similar in spirit to the standard random k-CNF SAT benchmark, and consists of randomly-generated 2-CNF difference formulas. For our tests we have selected 60 randomly-generated DTP formulas with 35 numerical variables in the "hard" satisfiability transition area.
The SAL, TM, DLSAT and DTP suites are in the domain of reals, whilst the RTLC, CIRC and SEP suites are in the domain of integers. Due to the different sources of problems within one suite, the benchmark suites cannot be straightforwardly characterized in terms of structural properties of their formulas (except for the DTP suite, in which only positive difference inequalities in the form $(x - y \le c)$ occur). Nearly all problems contain a significant quantity of boolean atoms (e.g., control variables in circuits, actions in planning problems, discrete variables in timed and hybrid systems). Nearly all problems contain many difference inequalities in the form $(x - y \le c)$ (e.g., time constraints in scheduling problems and in timed and hybrid systems verification problems, range constraints in RTL circuits). Some problems, like ATPG problems in RTLC and timed and hybrid systems in SAL, contain lots of simple equalities in the form $(x = y)$ or $(x = c)$. The problems in the CIRC suite contain complex LAL atoms with very big integer constants, like $(b = \sum_i 2^i b_i)$ or $(x \le 2^{32})$.
7.2. EVALUATING DIFFERENT OPTIMIZATIONS
In this section we evaluate the impact of several optimizations on the overall performance of MATHSAT. The experimental evaluation has been conducted in the following way. We chose a "default" option configuration for MATHSAT, that involves theory-driven backjumping and learning, theory-driven deduction, weakened early pruning, static learning, clustering, and EQ layering (that is, using the EUF solver as described in the second case of Sect. 5.1).
This configuration has been tested against each of the configurations obtained by switching off (or changing) different options one at a time (in other words, each version that has been tested differs from the default version only with respect to one of the optimizations). Specifically, the variants we considered are respectively the default version without (weakened) early pruning, with full early pruning, without clustering, without theory-driven deduction, without static learning, and without EQ-layering.
The six variants of MATHSAT have been run on the following test suites: SAL, RTLC, CIRC, TM, DLSAT, SEP, DTP.
The scatter plots of the overall results are given in Fig. 3. Each plot reports the results of the evaluation on each of the options. The X and Y axes show, respectively, the performance of the default version and of the modified version of MATHSAT. A dot in the upper part of a picture, i.e. above the diagonal, means that the default version performs better, and vice versa. The two uppermost horizontal lines represent benchmarks that ended in time-out (lower) or out-of-memory (higher) for the modified version of MATHSAT, whereas the two rightmost vertical lines represent time-out (left) or out-of-memory (right) for the default version. Notice that the axes are logarithmic, so that only big performance gaps are highlighted. E.g., the fact that a variant is 50% faster or slower than the default on some sample (i.e., a 1.5 performance factor) is hardly discernible on these plots.
Figure 3. Scatter plots for six different variations of MATHSAT (Y axis), compared against the default version (X axis). Panels: Default / No-Early-Pruning, Default / Full-Early-Pruning, Default / No-Clustering, Default / No-Deduction, Default / No-Static-Learning, Default / No-EQLayering. Both axes are logarithmic and range from 0.01 to 1000.
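The remark about logarithmic axes can be quantified with a short calculation, assuming (as the tick labels of the plots suggest) that each axis spans 0.01 to 1000 seconds.

```python
import math

# Each axis runs from 0.01 to 1000 seconds, i.e. over five decades.
decades = math.log10(1000) - math.log10(0.01)
# A 1.5x slowdown moves a point this many decades away from the diagonal.
offset = math.log10(1.5)
share = offset / decades      # fraction of the axis length
print(decades, offset, share)
```

A factor of 1.5 therefore displaces a point vertically by roughly 0.18 decades, which is only about 3.5% of the axis length.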
From the plots in Fig. 3 we observe the following facts.
- Dropping (weakened) early pruning worsens the performances significantly, or even drastically, in most benchmarks. This is due to the fact that early pruning may allow for significant cuts to the boolean search tree, and that the extra cost of intermediate calls to MATHSOLVE is much reduced by the incrementality of MATHSOLVE. From nearly all our experiments, it turns out that early pruning causes a significant reduction of the number of branches explored in the boolean search tree, which is proportional to the overall reduction of CPU time.
- Using full early pruning instead of its weakened version most often worsens performances, on both $\mathbb{R}$ and $\mathbb{Z}$ domains. From the experimental data, we see that full early pruning does not introduce significant reductions in the number of boolean branches explored, whilst the calls to MATHSOLVE require longer times on average.
Within the $\mathbb{R}$ domain, this fact seems to suggest that ignoring disequalities in the consistency check makes MATHSOLVE faster without reducing significantly the pruning effect of the boolean search space. Within the $\mathbb{Z}$ domain, this fact seems to suggest that in most cases the assignments that are consistent in $\mathbb{R}$ are consistent also in $\mathbb{Z}$, and that the overhead due to handling integers also in early pruning calls is sometimes heavy.
Figure 4. Scatter plots for the version of MATHSAT with all features disabled (Y axis), compared against the default version (X axis). Panel: Default / All-Disabled. Both axes are logarithmic and range from 0.01 to 1000.
- Dropping clustering slightly worsens the performances in most cases, although the gaps are not dramatic. A possible explanation is that the effects of "dividing and conquering" the mathematical search space are not as relevant as those of other factors (e.g., cutting the boolean search space). This combines with the fact that the mathematical solver is very effective in producing small conflict sets, even in presence of larger problems. In our tests, only a few tests actually had more than one cluster. A more refined analysis shows that for the problems with only a single cluster the overhead is not significant.
- Dropping theory-driven deduction worsens the performance in most cases. The importance of deduction is both in the immediate effect of assigning truth values to unassigned literals, which fires boolean constraint propagation, and in the learning of extra clauses from the deduction. From our experiments, it turns out that theory-driven deduction is most effective in problems which are rich in simple equalities like $(x = c)$ and $(x = y)$ (e.g., the problems in RTLC and the BMC on timed system problems in SAL), which can be easily and effectively deduced by the EUF solver.
- Static learning seems to introduce only slight improvements on average. This may be due to the fact that most benchmarks derive from the encoding of verification problems, so that short clauses which can be learned easily are already part of the encodings (see, e.g., [4]). Moreover, in general, the effect of static learning is hindered in part by theory-driven learning. From our experiments, it turns out that in some benchmarks (e.g., DTP, and partly DLSAT and CIRC) where lots of clauses can be learned off-line, static learning is effective (e.g., more than one order of magnitude faster on DTP) whilst on other benchmarks where very few or no clauses can be learned off-line, static learning is ineffective.
- Dropping EQ Layering worsens the performance in most cases. We believe this is due to the fact that many practical problems contain lots of simple equalities, from which lots of information can be deduced and learned by simply applying equality propagation and congruence closure. From our experiments, it turns out that EQ Layering is most effective in problems which are rich in simple equalities like $(x = c)$ and $(x = y)$ (e.g., the problems in RTLC and the BMC on timed system problems in SAL), which can be easily and effectively handled by the EUF solver.
Finally, Fig. 4 shows the impact of switching off simultaneously all the six options described above. We notice that, altogether, the six optimizations improve the performances significantly, and even drastically in most cases.
7.3. COMPARISON WITH OTHER STATE-OF-THE-ART TOOLS
In this section we report the results of the evaluation of MATHSAT with respect to other state-of-the-art tools. We distinguish the evaluation into two different parts: in Sect. 7.3.1 we compare MATHSAT against CVC, ICS, and UCLID, that support linear arithmetic logic (LAL), whereas in Sect. 7.3.2 we compare MATHSAT against TSAT++ and DLSAT, that are specialized solvers for difference logic (DL).
7.3.1. Comparison on Linear Arithmetic Logic
We have compared MATHSAT with ICS [23, 18], CVCLITE [15, 7], and UCLID [43, 35]. We ran ICS version 2.0 and UCLID version 1.0. For CVCLITE, we used the version available on the online repository, as of 10th October 2004, given that the latest officially released version showed a bug related to the management of integer variables (the version we used turned out to be much faster than the official one).
The results are reported in Fig. 5. Each column shows the comparison between MATHSAT and, respectively, CVCLITE, ICS and UCLID. Each of the rows corresponds to the comparison of the four systems on the SAL, RTLC, CIRC, and TM test suites, respectively.
Figure 5. Execution time ratio: the X and Y axes report MATHSAT and each competitor's times respectively. Rows: SAL suite, RTLC suite, CIRC suite, TM suite. Columns: CVCLITE, ICS, UCLID. Both axes are logarithmic and range from 0.01 to 1000.
Figure 6. Number of benchmarks solved (Y axis) versus time (X axis) for each suite. Panels: SAL suite, RTLC suite, CIRC suite, TM suite. Y axis: # of formulas solved; X axis: Total CPU Time (secs), logarithmic from 0.01 to 10000. Curves: MathSat, ICS, CVCL, and (SAL and RTLC suites only) UCLID.
Each point in the scatter plot corresponds to a problem run; on the X axis we have the execution time of MATHSAT, while the Y axis shows the execution time of the competitor system. A point above the diagonal means a better performance of MATHSAT and vice versa. The two uppermost horizontal lines represent benchmarks that ended in time-out (lower) or out-of-memory (higher) for the competitor system, whereas the two rightmost vertical lines represent time-out (left) or out-of-memory (right) for MATHSAT.
The comparison with CVCLITE shows that MATHSAT performs generally much better on the majority of the benchmarks in the SAL suite (CVCLITE times out on several of them, MATHSAT only on five of them). On the RTLC suite, the comparison is slightly in favour of MATHSAT. For the CIRC and TM suites, the comparison is definitely in favour of MATHSAT, although there are a few problems in the TM suite that neither of the systems can solve.
The comparison with ICS is reported in the second column. We see that on the SAL suite (i.e., on ICS's own test suite) ICS is slightly superior on the smaller problems. However, MATHSAT performs slightly better on the medium and significantly better on the most difficult problems in the suite, where ICS repeatedly times out. In the RTLC suite, ICS is clearly dominated by MATHSAT. In the CIRC suite MATHSAT performs better on nearly all tests, although the performance gaps are not impressive. In the TM suite, ICS performs slightly better than MATHSAT.
Figure 7. Number of benchmarks solved (Y axis) versus time (X axis) (all suites). Panel: TOTAL. Y axis: # of formulas solved; X axis: Total CPU Time (secs), logarithmic from 0.1 to 10000. Curves: MathSat, ICS, CVCL.
The comparison with UCLID is limited to the problems that can be expressed, i.e. some problems in SAL and RTLC, and shows a very substantial performance gap in favour of MATHSAT.
An alternative view of the comparison is shown in Figs. 6 and 7 (these curves are also known as runtime distributions). For each of the systems, we report the number of benchmarks solved (Y axis) in a given amount of time (X axis) (the samples are ordered by increasing computation time). The upper point in the trace also shows how many samples were solved within the time limit. (Notice that the data for UCLID must be interpreted with care, since it was confronted only with a subset of the problems. For the same reason, UCLID is not reported in the totals.)
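A curve of this kind can be computed from per-benchmark timings as sketched below. The sketch assumes that the X axis is cumulative, as the axis label "Total CPU Time" of the plots suggests; the function name and the convention of using `None` for runs that ended in time-out or out-of-memory are choices made for this example.

```python
from itertools import accumulate

def runtime_distribution(times, time_limit):
    """times: per-benchmark CPU times in seconds (None = not solved).

    Returns the curve as (total_cpu_time, number_solved) points, with the
    solved samples taken in order of increasing computation time.
    """
    solved = sorted(t for t in times if t is not None and t <= time_limit)
    return [(total, k) for k, total in enumerate(accumulate(solved), start=1)]

curve = runtime_distribution([0.5, None, 2.0, 0.25, 400.0], time_limit=300)
print(curve)   # [(0.25, 1), (0.75, 2), (2.75, 3)]
```

The last point of the curve gives the number of samples solved within the time limit, here 3 out of 5.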
The curves highlight that UCLID is the worst scorer except for the RTLC suite, where it performs better than ICS, that MATHSAT and ICS perform globally better than CVCLITE, and that MATHSAT is sometimes slower on the smaller problems than ICS, but more powerful when it comes to harder problems.
One potential criticism to every empirical comparison is that the choice of the test cases may bias the results. For our tests, however, we remark that we have run all the test cases used by the ICS team in [16], that we have also introduced other suites with problems from other application domains, and that, except for the CIRC suite, all the suites we have used have been proposed by other authors in previous papers.
7.3.2. Comparison on Difference Logic
We also compared MATHSAT with TSAT++ [42, 2] and DLSAT [14], which are specialized solvers for Difference Logics. We did not include in the comparison SEP [34, 40], a decision procedure based on an eager encoding in propositional logic, since it is known to be outperformed by TSAT++ [2].
Figure 8. Execution time ratio: the X and Y axes report MATHSAT and each competitor's times respectively. Rows: DLSAT suite, SEP suite, DTP suite. Columns: TSAT++, DLSAT. Both axes are logarithmic and range from 0.01 to 1000.
In Fig. 8 we report the results of the comparison between MATHSAT and TSAT++ (left column), and DLSAT (right column). Fig. 9 shows an overall comparison using runtime distributions.
MATHSAT performs slightly better than TSAT++ on the DLSAT suite, slightly worse or equivalently on the SEP suite, and significantly better on the DTP suite (i.e., TSAT++'s own suite). MATHSAT performs significantly better than DLSAT on its own test suite, slightly worse on the SEP suite (notice that the samples here are far fewer and much simpler) and significantly better in the DTP suite. On the whole, we can see that MATHSAT and TSAT++ both outperform DLSAT. Interestingly, MATHSAT exhibits on these problems a behaviour that is comparable to or even better than TSAT++, which is a highly specialized solver, despite its ability to deal with a larger class of problems.
Figure 9. Number of benchmarks solved (Y axis) versus time (X axis) for each suite. Panels: DLSAT suite, SEP suite, DTP suite, TOTAL. Y axis: # of formulas solved; X axis: Total CPU Time (secs), logarithmic from 0.01 to 10000. Curves: MathSat, TSAT, DLSAT.
8. Related Work
In this paper we have presented a new decision procedure for Linear Arithmetic Logic. The verification problem for LAL is well known, and has received a lot of interest in the past. In particular, the most closely related decision procedures are the ones considered in Sect. 7.3, namely CVCLITE [15, 7], ICS [23, 18] and UCLID [43, 35].
CVCLITE is a library for checking validity of quantifier-free first-order formulas over several interpreted theories, including real and integer linear arithmetic, arrays and uninterpreted functions. CVCLITE replaces the older tools SVC and CVC [15]. ICS is a decision procedure for the satisfiability of formulas in a quantifier-free, first-order theory containing both uninterpreted function symbols and interpreted symbols from a set of theories including arithmetic, tuples, arrays, and bit-vectors. Finally, UCLID is a tool incorporating a decision procedure for arithmetic of counters, the theories of uninterpreted functions and equality (EUF), separation predicates and arrays. UCLID is based on an "eager" reduction to propositional SAT, that is, the input formula is translated into a SAT formula in a single satisfiability-preserving step, and the output formula is checked for satisfiability by a SAT solver.
In this paper, we have compared these tools using benchmarks from linear arithmetic logic (in the case of UCLID the subset of arithmetic of counters). A comparison on the benchmarks dealing with the theory of EUF is part of our future work.
Other relevant systems are Verifun [19], a tool using lazy-theorem proving based on SAT-solving, supporting domain-specific procedures for the theories of EUF, linear arithmetic and the theory of arrays, and the tool ZAPATO [6], a tool for counterexample-driven abstraction refinement whose overall architecture is similar to Verifun. The DPLL(T) [20] tool is a decision procedure for the theory of EUF. Similarly to MATHSAT, DPLL(T) is based on a DPLL-like SAT-solver engine coupled with an efficient congruence closure module [29] that has inspired our own equational reasoner. However, our use of EUF reasoning is directed to tackling the harder problem of LAL satisfiability.
ASAP [25] is a decision procedure for quantifier-free Presburger arithmetic (that is, the theory of LAL over non-negative integers). ASAP is implemented on top of UCLID, and would have been a natural candidate for our experimental evaluation; unfortunately, a comparison was not possible since neither the system nor the benchmarks described in [25] have been made available.
Finally, we mention HDPLL, a decision procedure for LAL, specialized for the verification of circuits at the RTL level [31]. The procedure is based on a DPLL-like Boolean search engine integrated with a constraint solver based on Fourier-Motzkin elimination and finite domain constraint propagation. According to the experimental results in [31], HDPLL seems to be very effective for its application domain. We are very interested in incorporating some of the ideas into MATHSAT, and in performing a thorough experimental comparison. However, HDPLL is not publicly available.
Concerning the fragment of difference logic, other related tools are the ones considered in Sect. 7.3.2, namely TSAT++ [42, 2] and DLSAT [14]. While TSAT++ and DLSAT implement an approach similar to MATHSAT, they are specialized to dealing with difference logic, and do not implement any form of layering. In general, TSAT++ appears to be much more efficient than DLSAT, based on a lean implementation that tightly integrates the theory solver with a state-of-the-art library for SAT. An alternative approach is implemented in SEP [34, 40], which is based on an eager approach that reduces satisfiability of difference logic to the satisfiability of a purely propositional formula.
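To make the fragment concrete: a conjunction of difference constraints of the form x - y <= c is consistent exactly when the associated constraint graph contains no negative-weight cycle (cf. the negative-cycle detection algorithms surveyed in [13]). The following is a minimal sketch of such a check using plain Bellman-Ford relaxation. It is only an illustration; it is not the algorithm implemented in MATHSAT, TSAT++ or DLSAT, and the function name `diff_sat` and the tuple encoding of constraints are assumptions made for the example.

```python
def diff_sat(constraints):
    """Check a conjunction of difference constraints.

    constraints: iterable of (x, y, c), each meaning  x - y <= c.
    Returns a satisfying assignment (dict) or None if inconsistent.
    Hypothetical helper for illustration only.
    """
    constraints = list(constraints)
    nodes = {v for x, y, _ in constraints for v in (x, y)}
    # Starting every value at 0 plays the role of a virtual source
    # connected to all nodes by zero-weight edges.
    dist = {v: 0 for v in nodes}
    # Each constraint x - y <= c is an edge y -> x of weight c.
    for _ in range(len(nodes) + 1):
        changed = False
        for x, y, c in constraints:
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return dist  # fixed point: every constraint holds
    return None  # still relaxing after |V|+1 passes: negative cycle


# x - y <= 1 and y - x <= -1 are jointly satisfiable (x - y = 1)
model = diff_sat([("x", "y", 1), ("y", "x", -1)])
# x - y <= 1 and y - x <= -2 form a cycle of weight -1
conflict = diff_sat([("x", "y", 1), ("y", "x", -2)])
```

When the check succeeds, the returned values are themselves a model of the constraints, since the fixed point guarantees dist[x] <= dist[y] + c for every constraint.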
Concerning the very different domain of Constraint Logic Programming, we notice that some ideas related to the mathematical solver(s) presented in this paper (i.e., layering, stack-based interfaces, theory-deduction) are to some extent similar to those presented in [24].²
9. Conclusions and Future Work
In this paper we have presented a new approach to the satisfiability of Linear Arithmetic Logic. The work is carried out within the (known) framework of integration between off-the-shelf SAT solvers and specialized theory solvers. We proposed several improvements. In the top level algorithm, we exploit theory learning and deduction, theory-driven backjumping, and we adopt a stack-based interface that allows for an incremental and backtrackable implementation of the mathematical solver. We also use static learning and clustering. We heavily exploit the idea of layering: the satisfiability of theory constraints is evaluated in theories of increasing strength (Equality, Linear Arithmetic over the reals, and Linear Arithmetic over the integers). The idea is to prefer less expensive solvers (for weaker theories), thus reducing the use of more expensive solvers. We carried out a thorough experimental evaluation of our approach: our MATHSAT solver is able to tackle effectively a wide class of problems, with performance comparable with and often superior to the state-of-the-art competitors, both on LAL problems, and against specialized competitors on the subclass of Difference Logics.
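The layering idea can be illustrated with a small sketch. The code below is not MATHSAT's implementation: it is a toy, under the assumption that constraints are encoded as tuples ("eq", a, b), ("neq", a, b) and ("le", x, y, c) for x - y <= c, and all function names are hypothetical. A cheap equality layer (union-find) runs first; only if it finds no conflict is a costlier difference-constraint layer (Bellman-Ford) invoked. The toy is deliberately incomplete: the second layer ignores disequalities, which a real arithmetic solver would have to handle.

```python
def equality_layer(constraints):
    """Cheap layer: considers only equalities and disequalities."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for kind, a, b, *_ in constraints:
        if kind == "eq":
            parent[find(a)] = find(b)
    # Inconsistent if some disequality joins two merged terms.
    return all(find(a) != find(b)
               for kind, a, b, *_ in constraints if kind == "neq")


def difference_layer(constraints):
    """Costlier layer: x - y <= c constraints via Bellman-Ford.
    Disequalities are ignored here (simplifying assumption)."""
    edges = []
    for kind, a, b, *rest in constraints:
        if kind == "le":
            edges.append((a, b, rest[0]))
        elif kind == "eq":  # a = b is a - b <= 0 and b - a <= 0
            edges += [(a, b, 0), (b, a, 0)]
    nodes = {v for a, b, _ in edges for v in (a, b)}
    dist = {v: 0 for v in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for x, y, c in edges:
            if dist[y] + c < dist[x]:
                dist[x], changed = dist[y] + c, True
        if not changed:
            return True
    return False  # negative cycle: constraints are inconsistent


def layered_check(constraints,
                  layers=(equality_layer, difference_layer)):
    """Run layers from cheapest to most expensive; stop at the
    first one that detects an inconsistency."""
    for layer in layers:
        if not layer(constraints):
            return False, layer.__name__
    return True, layers[-1].__name__


r1 = layered_check([("eq", "x", "y"), ("neq", "x", "y"),
                    ("le", "x", "z", 3)])
r2 = layered_check([("eq", "x", "y"), ("le", "x", "z", -1),
                    ("le", "z", "y", 0)])
r3 = layered_check([("eq", "x", "y"), ("le", "x", "z", 1)])
```

In the first call the conflict is found by the equality layer alone, so the more expensive layer is never run; in the second, the equality layer finds nothing and the conflict surfaces only in the difference layer.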
As future work, we plan to enhance MATHSAT by investigating different splitting heuristics and the integration of other boolean reasoning techniques that are complementary to DPLL. An extension of MATHSAT to non-linear arithmetic is currently ongoing, based on the integration of computer-algebraic methods. Further extensions include the development of specialized modules to deal with memory access and bit-vector arithmetic, and the extension to the integration of EUF and LA. On the side of verification, we envisage MATHSAT as a back-end for lifting SAT-based model checking beyond the boolean case, to the verification of sequential RTL circuits and of hybrid systems.
References
1. Armando, A., C. Castellini, and E. Giunchiglia: 1999, "SAT-based procedures for temporal reasoning". In: Proc. European Conference on Planning, CP-99.
2. Armando, A., C. Castellini, E. Giunchiglia, and M. Maratea: 2004, "A SAT-based Decision Procedure for the Boolean Combination of Difference Constraints". In: Proc. Conference on Theory and Applications of Satisfiability Testing (SAT'04).
² We are grateful to an anonymous reviewer for pointing out this fact to us.
3. Audemard, G., P. Bertoli, A. Cimatti, A. Korniłowicz, and R. Sebastiani: 2002a, "A SAT Based Approach for Solving Formulas over Boolean and Linear Mathematical Propositions". In: Proc. CADE'2002, Vol. 2392 of LNAI.
4. Audemard, G., A. Cimatti, A. Korniłowicz, and R. Sebastiani: 2002b, "SAT-Based Bounded Model Checking for Timed Systems". In: Proc. FORTE'02, Vol. 2529 of LNCS.
5. Badros, G. and A. Borning: 1998, "The Cassowary Linear Arithmetic Constraint Solving Algorithm: Interface and Implementation". Technical Report UW-CSE-98-06-04, University of Washington.
6. Ball, T., B. Cook, S. Lahiri, and L. Zhang: 2004, "Zapato: Automatic Theorem Proving for Predicate Abstraction Refinement". In: Proc. CAV'04, Vol. 3114 of LNCS. pp. 457–461.
7. Barrett, C. and S. Berezin: 2004, "CVC Lite: A New Implementation of the Cooperating Validity Checker". In: Proc. CAV'04, Vol. 3114 of LNCS. pp. 515–518.
8. Bayardo, Jr., R. J. and R. C. Schrag: 1997, "Using CSP Look-Back Techniques to Solve Real-World SAT Instances". In: Proc. AAAI/IAAI'97. pp. 203–208.
9. Bockmayr, A. and V. Weispfenning: 2001, "Solving Numerical Constraints". In: Handbook of Automated Reasoning. MIT Press, pp. 751–842.
10. Borning, A., K. Marriott, P. Stuckey, and Y. Xiao: 1997, "Solving Linear Arithmetic Constraints for User Interface Applications". In: Proc. UIST'97. pp. 87–96.
11. Bozzano, M., R. Bruttomesso, A. Cimatti, T. Junttila, P. van Rossum, S. Schulz, and R. Sebastiani: 2005, "An Incremental and Layered Procedure for the Satisfiability of Linear Arithmetic Logic". In: Proc. TACAS 2005, Vol. 3440 of LNCS. pp. 317–333.
12. Brinkmann, R. and R. Drechsler: 2002, "RTL-Datapath Verification using Integer Linear Programming". In: Proc. ASP-DAC 2002. pp. 741–746.
13. Cherkassky, B. and A. Goldberg: 1999, "Negative-Cycle Detection Algorithms". Mathematical Programming 85, 277–311.
14. Cotton, S., E. Asarin, O. Maler, and P. Niebert: 2004, "Some Progress in Satisfiability Checking for Difference Logic". In: Proc. FORMATS-FTRTFT 2004.
15. CVC. CVC, CVCLITE and SVC. http://verify.stanford.edu/{CVC,CVCL,SVC}.
16. de Moura, L. and H. Ruess: 2004, "An Experimental Evaluation of Ground Decision Procedures". In: R. Alur and D. Peled (eds.): Proc. 15th Int. Conf. on Computer Aided Verification - CAV04, Vol. 3114 of LNCS. Boston, MA, pp. 162–174.
17. Een, N. and N. Sorensson: 2004, "An Extensible SAT-solver". In: Theory and Applications of Satisfiability Testing (SAT 2003), Vol. 2919 of LNCS. pp. 502–518.
18. Filliatre, J.-C., S. Owre, H. Ruess, and N. Shankar: 2001, "ICS: Integrated Canonizer and Solver". In: Proc. CAV'01, Vol. 2102 of LNCS. pp. 246–249.
19. Flanagan, C., R. Joshi, X. Ou, and J. Saxe: 2003, "Theorem Proving using Lazy Proof Explication". In: Proc. CAV'03, Vol. 2725 of LNCS. pp. 355–367.
20. Ganzinger, H., G. Hagen, R. Nieuwenhuis, A. Oliveras, and C. Tinelli: 2004, "DPLL(T): Fast Decision Procedures". In: Proc. CAV'04, Vol. 3114 of LNCS. pp. 175–188.
21. GMP. GNU Multi Precision Library. http://www.swox.com/gmp.
22. Gomes, C., B. Selman, and H. Kautz: 1998, "Boosting Combinatorial Search Through Randomization". In: Proc. of the Fifteenth National Conf. on Artificial Intelligence. pp. 431–437.
23. ICS. ICS. http://www.icansolve.com.
24. Jaffar, J., S. Michaylov, P. J. Stuckey, and R. H. C. Yap: 1992, "The CLP(R) Language and System". ACM Transactions on Programming Languages and Systems (TOPLAS) 14(3), 339–395.
25. Kroening, D., J. Ouaknine, S. Seshia, and O. Strichman: 2004, "Abstraction-Based Satisfiability Solving of Presburger Arithmetic". In: Proc. CAV'04, Vol. 3114 of LNCS. pp. 308–320.
26. Land, H. and A. Doig: 1960, "An Automatic Method For Solving Discrete Programming Problems". Econometrica 28, 497–520.
27. MATHSAT. MATHSAT. http://mathsat.itc.it.
28. Moskewicz, M. W., C. F. Madigan, Y. Zhao, L. Zhang, and S. Malik: 2001, "Chaff: Engineering an Efficient SAT Solver". In: Proc. DAC'01. pp. 530–535.
29. Nieuwenhuis, R. and A. Oliveras: 2003, "Congruence Closure with Integer Offsets". In: Proc. 10th LPAR. pp. 77–89.
30. Omega. Omega. http://www.cs.umd.edu/projects/omega.
31. Parthasarathy, G., M. Iyer, K.-T. Cheng, and L.-C. Wang: 2004, "An Efficient Finite-Domain Constraint Solver for Circuits". In: Proc. DAC'04. pp. 212–217.
32. SAL. SAL Suite. http://www.csl.sri.com/users/demoura/gdp-benchmarks.html.
33. Schulz, S.: 2002, "E – A Brainiac Theorem Prover". AI Communications 15(2/3), 111–126.
34. SEP. SEP Suite. http://iew3.technion.ac.il/~ofers/smtlib-local/benchmarks.html.
35. Seshia, S., S. Lahiri, and R. Bryant: 2003, "A Hybrid SAT-Based Decision Procedure for Separation Logic with Uninterpreted Functions". In: Proc. DAC'03. pp. 425–430.
36. Shin, J.-A. and E. Davis: 2004, "Continuous Time in a SAT-based Planner". In: Proc. AAAI-04. pp. 531–536.
37. Silva, J. P. M. and K. A. Sakallah: 1996, "GRASP - A new Search Algorithm for Satisfiability". In: Proc. ICCAD'96. pp. 220–227.
38. Stergiou, K. and M. Koubarakis: 2000, "Backtracking Algorithms for Disjunctions of Temporal Constraints". Artificial Intelligence 120(1), 81–117.
39. Strichman, O.: 2002, "On Solving Presburger and Linear Arithmetic with SAT". In: Proc. of Formal Methods in Computer-Aided Design (FMCAD 2002).
40. Strichman, O., S. Seshia, and R. Bryant: 2002, "Deciding separation formulas with SAT". In: Proc. of Computer Aided Verification (CAV'02).
41. TM. TM-LPSAT. http://cs1.cs.nyu.edu/~jiae/.
42. TSAT. TSAT++. http://www.ai.dist.unige.it/Tsat.
43. UCLID. UCLID. http://www-2.cs.cmu.edu/~uclid.
44. Zhang, L. and S. Malik: 2002, "The Quest for Efficient Boolean Satisfiability Solvers". In: Proc. CAV'02. pp. 17–36.