More Evaluation of Decision Procedures for Modal Logics

Enrico Giunchiglia
DIST, V.le Causa 13, 16145 Genova, Italy

Fausto Giunchiglia
DISA, Università di Trento. IRST, 38050 Povo, Trento, Italy

Roberto Sebastiani
IRST, 38050 Povo, Trento, Italy

Armando Tacchella
DIST, V.le Causa 13, 16145 Genova, Italy

Abstract

This paper follows on previous papers which presented and evaluated various decision procedures for modal logics. It confirms previous experimental results in showing that SAT based decision procedures, i.e., the procedures built on top of decision procedures for propositional satisfiability, are more efficient than tableau based decision procedures. It also confirms previous evidence of an easy-hard-easy pattern in the satisfiability curve for modal K. Finally, it provides further experimental results, suggesting that SAT based decision procedures are also more efficient than the decision procedures based on Ohlbach's translation method. Our results contradict some of the results presented in previous papers.

1 INTRODUCTION

Until recently, the decision procedures for modal logics which could be found in the literature were either tableau based [1] (see, e.g., [Kripke, 1959; Fitting, 1983; Massacci, 1994]), or based on Ohlbach's translation method [Ohlbach, 1991]. Only a few of these procedures were implemented, and even fewer were thoroughly tested and comparatively evaluated. Giunchiglia and Sebastiani [Giunchiglia and Sebastiani, 1996a] changed this situation in three ways. First, they proposed a novel approach where a decision procedure for modal logics, called Ksat, is defined in terms of a SAT decision procedure. To emphasize this fact, they called the resulting class of procedures SAT based. Second, they provided a new test methodology which naturally extends the fixed-clause-length test method, a very popular test method for SAT [Mitchell et al., 1992; Buro and Buning, 1992]. Third, they built a straightforward LISP implementation of Ksat, called KsatLisp from now on, and tested it against the tableau system Kris [Hollunder et al., 1990; Baader et al., 1994]. These latter results are reported in [Giunchiglia and Sebastiani, 1996c].

[1] We call "tableau based" any system that implements Smullyan's tableau calculus as defined in [Smullyan, 1968]. Thus, in our terminology, Kris [Hollunder et al., 1990; Baader et al., 1994] and LWB [Heuerding et al., 1996] are tableau based systems, whereas Ksat [Giunchiglia and Sebastiani, 1996a], TA [Hustadt and Schmidt, 1997a] and also Fact [Horrocks, 1997] are not.

Following on this work, Horrocks presented Fact, a system which further enhances Ksat's ideas by introducing several other optimizations (e.g., backjumping). According to [Horrocks, 1997], the LISP implementation of Fact further improves on the performance of KsatLisp. Hustadt and Schmidt, instead, make in [Hustadt and Schmidt, 1997a; Hustadt and Schmidt, 1997b] a very detailed and critical analysis of the work in [Giunchiglia and Sebastiani, 1996a; Giunchiglia and Sebastiani, 1996c]. They start by improving on Giunchiglia and Sebastiani's test methodology, thus avoiding possible situations where this method can generate trivial problems. Then, they challenge the following three claims made by Giunchiglia and Sebastiani in [Giunchiglia and Sebastiani, 1996c] (as Hustadt and Schmidt quote them in [Hustadt and Schmidt, 1997a]):

1. Ksat outperforms by orders of magnitude the previous state-of-the-art decision procedures.
2. All SAT based modal decision procedures are intrinsically bound to be more efficient than tableau based decision procedures.
3. There is partial evidence of an easy-hard-easy pattern on randomly generated modal logic formulas independent of all the parameters of evaluation considered.

Finally, they propose TA, a system based on Ohlbach's translation method, which is implemented by appropriately calling Flotter and the first order theorem prover Spass [Weidenbach et al., 1996]. Their analysis suggests that TA has better computational behavior than KsatLisp.

The goal of this paper is to provide further experimental results and use them to shed some light on the previous analysis, which, as hinted above, is contradictory in some parts. Thus, in Section 2 we describe KsatC, a C++ implementation of Ksat built starting from a pre-existing state-of-the-art SAT decision procedure. The goal of this section is twofold. First, it shows how the methodology and ideas introduced in [Giunchiglia and Sebastiani, 1996a; Giunchiglia and Sebastiani, 1996c] allow us, in practice, to develop decision procedures for modal logics starting from existing implementations of SAT decision procedures. Given that this latter domain has received much more attention and is far more developed, we consider this an important result: building an efficient SAT procedure from scratch requires a lot of manpower. Second, a comparison of KsatLisp and KsatC allows us to get an idea of how much we can gain by working on the implementation [2]. Then, in Section 3 we describe a first set of tests which allows us to confirm the three claims of ours listed above and challenged by Hustadt and Schmidt. These tests also suggest that KsatC and (contrary to what is reported in [Hustadt and Schmidt, 1997a; Hustadt and Schmidt, 1997b]) even KsatLisp are faster than TA. Finally, in Section 4 we focus on KsatC and TA. These tests confirm and make more evident the superior computational properties of KsatC over TA.

[2] In this respect, it must be pointed out that the implementation of KsatC is very similar in spirit to that of KsatLisp. (See Section 2 for the details.) We believe that the behavior of KsatC could be further improved by adding new heuristics, e.g., those implemented in Fact [Horrocks, 1997].

2 KsatC

We use the same conventions as [Giunchiglia and Sebastiani, 1996c; Giunchiglia and Sebastiani, 1996b]. A modal atom is a wff of the form □ψ, and an atom is either a modal atom or a propositional letter. An assignment is a partial function from the set of atoms to {True, False}. In this paper, all the assignments have a finite domain. This allows us to identify an assignment μ with the conjunction of the atoms true in μ and of the negations of the atoms false in μ. The idea is to see a modal wff as a propositional wff in its top-level atoms, so that μ corresponds to a standard assignment in propositional logic.

Ksat is a correct and complete decision procedure for satisfiability in the modal logic K (K-satisfiability from now on) presented and analyzed in detail in [Giunchiglia and Sebastiani, 1996a; Giunchiglia and Sebastiani, 1996c]. A high-level description of Ksat is reported in Figure 1. For any modal wff φ, Ksat(φ) returns True if φ is K-satisfiable and False otherwise. This is accomplished by calling KsatW(φ, ⊤), i.e., KsatW on φ with the empty assignment.
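As a rough illustration of this scheme (a propositional search over the top-level atoms, followed by a recursive consistency check on the modal atoms of each assignment found), here is a minimal Python sketch. It is not the KsatC or KsatLisp code: the tuple encoding of formulas and all function names are assumptions made for the example, and it deliberately leaves out CNF conversion, unit propagation, splitting heuristics and early pruning.

```python
# Illustrative sketch only. Formulas are nested tuples (an assumed encoding):
#   ("var", "p")   ("not", f)   ("and", f, g)   ("or", f, g)   ("box", f)

def top_level_atoms(f):
    """Propositional letters and box-formulas not nested inside another box."""
    if f[0] in ("var", "box"):
        return {f}
    return set().union(*(top_level_atoms(g) for g in f[1:]))

def evaluate(f, mu):
    """Value of f under the partial assignment mu (dict: atom -> bool).
    Returns None when mu does not yet determine the value."""
    tag = f[0]
    if tag in ("var", "box"):
        return mu.get(f)
    if tag == "not":
        v = evaluate(f[1], mu)
        return None if v is None else not v
    a, b = evaluate(f[1], mu), evaluate(f[2], mu)
    if tag == "and":
        if a is False or b is False:
            return False
        return True if (a is True and b is True) else None
    if tag == "or":
        if a is True or b is True:
            return True
        return False if (a is False and b is False) else None
    raise ValueError(f"unknown connective: {tag!r}")

def ksat(f):
    """True iff f is K-satisfiable (start from the empty assignment)."""
    return ksat_w(f, {})

def ksat_w(f, mu):
    """Propositional search over top-level atoms."""
    value = evaluate(f, mu)
    if value is True:       # base: mu propositionally satisfies f
        return ksat_a(mu)
    if value is False:      # backtrack
        return False
    # split on an unassigned atom (deterministic but otherwise arbitrary choice)
    atom = min((a for a in top_level_atoms(f) if a not in mu), key=repr)
    return ksat_w(f, {**mu, atom: True}) or ksat_w(f, {**mu, atom: False})

def ksat_a(mu):
    """K-consistency of mu: for each false box-atom []beta, the conjunction
    of all alpha with []alpha true, together with not beta, must be K-satisfiable."""
    alphas = [a[1] for a, val in mu.items() if a[0] == "box" and val]
    betas = [a[1] for a, val in mu.items() if a[0] == "box" and not val]
    for beta in betas:
        g = ("not", beta)
        for alpha in alphas:
            g = ("and", alpha, g)
        if not ksat(g):
            return False
    return True

p, q = ("var", "p"), ("var", "q")
# []p and not []q is K-satisfiable: one successor world with p true and q false.
print(ksat(("and", ("box", p), ("not", ("box", q)))))
# [](p -> q) and []p and not []q is not K-satisfiable.
print(ksat(("and", ("and", ("box", ("or", ("not", p), q)), ("box", p)),
            ("not", ("box", q)))))
```

The sketch returns as soon as one K-consistent assignment is found, and it copies the assignment at every split; an implementation built on an existing SAT procedure would avoid both the copying and the naive choice of the splitting atom.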

KsatW is a variant of the Davis-Putnam-Logemann-Loveland SAT procedure [Davis and Putnam, 1960; Davis et al., 1962] (DPLL from now on). The fundamental difference with DPLL is that, whenever an assignment μ satisfying φ has been found ("base" case), KsatW invokes KsatA(μ) instead of returning True. Basically, KsatW is used to generate a complete set of assignments for φ [3]. Then, the K-consistency of each assignment generated by KsatW is checked by KsatA, and this is done by recursively calling Ksat. These procedures recurse until we get to an assignment with no modal atoms.

[3] We say that a set S of assignments is complete for a wff ψ if ψ is logically equivalent to the disjunction of the assignments in S.

KsatC is a C++ implementation of Ksat. KsatC is implemented on top of Böhm's C implementation of DPLL (called Dpll from now on), the winner of a 1992 SAT competition [Buro and Buning, 1992] [4]. Dpll has served as the basis for the implementation of KsatW. The basic features of KsatW inherited from Dpll are:

• efficient data structures for literal assignment and wff (partial) evaluation. In Dpll (and hence in KsatW) literals are assigned and unassigned dynamically inside a wff simply by moving pointers (that is, without copying the wff in the stack at each recursive call), in time proportional to the number of their occurrences.
• smart splitting heuristics. Similarly to other state-of-the-art satisfiability checkers (see, e.g., [Crawford and Auton, 1993]), Dpll splits on the literal occurring most often in the shortest clauses [5].

[4] The code of Dpll and a report about this competition are available at the http address http://www.informatik.uni-koeln.de/ls juenger/staff/boehm.html.

[5] By contrast, in KsatLisp a copy of the wff is passed at each DPLL recursive call; literals are assigned by recursively descending the whole wff; splitting happens on atoms (and not on literals), and KsatLisp chooses the atom occurring most often.

KsatW has been obtained from Dpll by performing the following steps:

• The code implementing the pure literal rule has been removed [6]. With the pure literal rule, the set of assignments generated by KsatW might be incomplete, so that KsatC might conclude that the input wff is not satisfiable even when it actually is.
• Whenever an assignment μ satisfying φ is found, KsatW invokes KsatA(μ) (instead of printing out the assignment).
• Many of the global variables have been made local to the procedures where they are used: this is necessary to avoid interference between two calls KsatW(φ, ⊤) and KsatW(φ′, ⊤). The same holds for the static variables.
• The (optional) step of early pruning has been included in Dpll (see Figure 1). This step is not mandatory. However, as discussed in [Giunchiglia and Sebastiani, 1996a], early pruning may detect inconsistencies early in the partial assignment built so far, and therefore greatly prune the search space.

[6] According to the pure literal rule, if a literal l occurs in φ while ¬l does not, l is assigned to True [Davis and Putnam, 1960].

    function Ksat(φ)
        return KsatW(φ, ⊤);

    function KsatW(φ, μ)
        if φ = ⊤                                  /* base */
            then return KsatA(μ);
        if φ = ⊥                                  /* backtrack */
            then return False;
        if {a unit clause (l) occurs in φ}        /* unit */
            then return KsatW(assign(l, φ), μ ∧ l);
        if not KsatA(μ)                           /* early pruning (optional) */
            then return False;
        l := choose-literal(φ);                   /* split */
        return KsatW(assign(l, φ), μ ∧ l) or
               KsatW(assign(¬l, φ), μ ∧ ¬l);

    function KsatA(∧_i □α_i ∧ ∧_j ¬□β_j ∧ γ)
        for any conjunct "¬□β_j" do
            if not Ksat(∧_i α_i ∧ ¬β_j)
                then return False;
        return True;

Figure 1: The structure of Ksat.

KsatW is interfaced with the rest of the procedure by means of a global look-up table (LUT) which associates a propositional literal B_i (resp. ¬B_i) to each distinct modal literal □φ_i (resp. ¬□φ_i) occurring in φ. Each row in LUT is a pair ⟨B_i, φ*_i⟩ (resp. ⟨¬B_i, ¬φ*_i⟩), where φ*_i is the propositional wff obtained from φ_i by recursively substituting each atom □φ_j with the corresponding B_j. When invoked on the input wff φ, KsatC initializes LUT and passes φ* to Ksat. Each time it is invoked on a propositional assignment μ (which represents ∧_i □α_i ∧ ∧_j ¬□β_j ∧ γ), KsatA retrieves from LUT the CNF wffs corresponding to the literals in μ, merges them, and invokes Ksat on the resulting CNF wffs ∧_i α_i ∧ ¬β_j. Notice that each retrieval requires only constant time.

3 MORE TESTING OF THREE EARLIER CLAIMS

Our goal in this section is to confirm the three claims listed in the introduction, first made in [Giunchiglia and Sebastiani, 1996c] and then challenged by Hustadt and Schmidt.

3.1 THE EVALUATION STRATEGY

We compare five systems, namely: Kris, KsatLisp, KsatC, KsatLisp(unsorted), and TA [7]. Kris is a tableau based system implemented in LISP [Hollunder et al., 1990; Baader et al., 1994]. Both KsatC and KsatLisp pre-sort input wffs before the main routine is invoked for the first time. This allows for renaming the permutations of a modal atom (e.g., □(A1 ∨ A2) and □(A2 ∨ A1)) with the same propositional letter. This step, if not performed, may dramatically affect the performance of the algorithm [Giunchiglia and Sebastiani, 1996a]. KsatLisp(unsorted) is KsatLisp with sorting disabled. We test KsatLisp(unsorted) (which, we know a priori, is much slower than KsatLisp) in order to reproduce the tests in [Hustadt and Schmidt, 1997a; Hustadt and Schmidt, 1997b]. In fact, as they explicitly state in [Hustadt and Schmidt, 1997a, page 21], Hustadt and Schmidt test KsatLisp with sorting disabled (that is, what in this paper we call KsatLisp(unsorted)).

[7] TA is available at http://www.mpi-sb.mpg.de/ in ~hustadt/mdp/. In particular, we used TA version 1.2, Spass/Flotter version 0.55. Kris, KsatLisp, KsatC, the random wff generator and detailed instructions about how to reproduce all the tests presented are available at ftp://ftp.mrg.dist.unige.it/ in pub/mrg-systems. (As the following makes clear, KsatLisp(unsorted) is KsatLisp with a heuristic turned off.)

To test these systems, we use the test methodology described in [Giunchiglia and Sebastiani, 1996a; Giunchiglia and Sebastiani, 1996c] with the improvements suggested in [Hustadt and Schmidt, 1997a] [8]. These improvements solve the problem, highlighted in [Hustadt and Schmidt, 1997a], that for some choices of the parameters' values the generated wffs can be trivial or can be significantly simplified by a simple preprocessing step (like that, for instance, which can be optionally activated in both KsatC and KsatLisp). As a matter of fact, the samples that we used throughout our experimental analysis are highly insensitive to preprocessing (i.e., simplifying). We verified this property by running the built-in simplifier of KsatC on every sample of the PS1 test, and then comparing the original sample with the simplified one. The result is that there is little or no difference between them.

We can briefly overview our testing methodology as follows. We consider problem sets, i.e., sets of randomly generated 3CNFK wffs. A 3CNFK wff is a conjunction of 3CNFK clauses, where each clause is a disjunction of three literals.
Modal atoms are restricted to have the form □C, where C is a 3CNFK clause. A 3CNFK wff is randomly generated according to the following parameters: (i) the modal depth d; (ii) the number of clauses L; (iii) the number of propositional variables N; (iv) the probability p with which an atom occurring in a clause of depth > 0 is purely propositional. Following [Hustadt and Schmidt, 1997a], we fix p = 0 and modify the generator to avoid multiple occurrences of the same propositional atom inside one clause.

[8] The features and advantages of this methodology are described in [Giunchiglia and Sebastiani, 1996a].

A problem set is thus characterized by a fixed N and d: we let L vary in such a way as to empirically cover the "100% satisfiable - 100% unsatisfiable" transition. Then, for each tuple of the four values in a problem set, we randomly generate 100 3CNFK wffs. These wffs are given as input to the procedure under test. Satisfiability percentages and median CPU times are plotted against the number of clauses L. For practical reasons, a timeout mechanism stops the execution on one problem after 1000 seconds of CPU time. A median time of 1000 seconds thus means that more than 50% of the total number of samples has exceeded the timeout.

3.2 RESULTS

We start with the problem set PS1 (called PS12 in [Hustadt and Schmidt, 1997a]) characterized by d = 1 and N = 4. Figure 2 reports the median CPU time for the five systems (left) and the median number of DPLL calls (right) for KsatLisp and KsatC. For KsatLisp and KsatC, the CPU time does not include the time needed for wff sorting. However, this overhead is negligible when compared to the time spent in the decision process. For TA, we consider only the time that Spass takes to solve the problem. We do not take into account the time to convert modal formulas into first order formulas and the time used by Flotter to perform the conversion to conjunctive normal form. We also plot the percentage of satisfiable wffs to get a rough estimate of where the cross-over point of 50% of satisfiable wffs occurs.

Consider first Figure 2, left [9]. Notice the logarithmic scale on the vertical axis. As a first remark, observe the dramatic effect that disabling the sorting of the input formula has on the computational behavior of SAT based procedures: the gap between KsatLisp(unsorted) and KsatLisp grows up to more than 2 orders of magnitude (e.g., for L = 80). The performance gap between KsatLisp and TA is more than 1 order of magnitude for L ≥ 60, and becomes 2 orders of magnitude if we consider KsatC instead of KsatLisp. Running TA on highly constrained formulas (L = 140, 180, 200), we observed that the median CPU time on 100 samples is always higher than 40 seconds, so we still have no evidence that the run time of TA decreases when formulas become trivially unsatisfiable. If we consider Kris, the gap with KsatLisp and KsatC reaches about 4 and 5 orders of magnitude respectively for L = 28. For L = 36 no wff is solved by Kris within the timeout [10]. These results support Claim 1 that Ksat outperforms the other current state-of-the-art decision procedures.
With respect to [Giunchiglia and Sebastiani, 1996b] we have further evidence coming from the comparison with TA. These results also show that the arguments against our empirical analysis in [Giunchiglia and Sebastiani, 1996c] given in [Hustadt and Schmidt, 1997a; Hustadt and Schmidt, 1997b] are wrong. Even after fixing the testing methodology, there is still evidence supporting Claim 1. The different results obtained in [Hustadt and Schmidt, 1997a; Hustadt and Schmidt, 1997b] are simply due to the fact that Hustadt and Schmidt tested KsatLisp(unsorted) and not KsatLisp, clearly underestimating the effects of this choice.

[9] The tests in Figure 2 have been performed on a Pentium 200 MHz MMX workstation with 64 MB RAM, running Linux RedHat 2.0.30. Spass and KsatC are compiled by gcc 2.7.2.1, option -O2. Kris and KsatLisp/KsatLisp(unsorted) are compiled by allegro cl 4.3 and gcl 2.2 respectively.

[10] For each value of L the test is stopped if more than 50% of the samples exceed the timeout (i.e., the median value is greater than 1000 seconds). This saves a lot of testing time. Therefore, "no wff is solved within the timeout" means that the first 50%+1 samples all exceeded the timeout, while the others have not been tested.

[Figure 2, left panel: CPU TIME COMPARISON - PS1 (N=4, d=1, %p=0); horizontal axis: # OF CLAUSES (L), 0 to 120; vertical axis: CPU TIME [SEC], logarithmic scale from 10^-2 to 10^3; curves: Kris, KsatLisp (unsorted), TA, KsatLisp, KsatC, % satisf. Right panel: DPLL CALLS - PS1 (N=4, d=1, %p=0); horizontal axis: # OF CLAUSES (L), 0 to 120; vertical axis: # CALLS, 0 to 6000; curves: KsatLisp, KsatC, % satisf.]

Figure 2: Left: Kris, TA, KsatLisp(unsorted), KsatLisp and KsatC median CPU time, 100 samples/point. Right: KsatLisp and KsatC median search space size, 100 samples/point. Background: satisfiability percentage.

Consider now Claim 2 that SAT based decision procedures are intrinsically bound to be more efficient than tableau based decision procedures. In [Giunchiglia and Sebastiani, 1996c] (and, in more detail, in [Giunchiglia and Sebastiani, 1996b]) this strong claim was supported by some empirical results and also by a theoretical analysis. We have discussed the empirical results in the above paragraph. As for the theoretical analysis given in [Giunchiglia and Sebastiani, 1996c; Giunchiglia and Sebastiani, 1996b], we still support it. The argument goes as follows. We start from the observation that SAT deciders are intrinsically superior to propositional tableaux, for

(a) they don't generate redundant assignments, and
(b) they prune branches as soon as they violate one propositional constraint.

Then, it is sufficient to notice that, as [Giunchiglia and Sebastiani, 1996c; Giunchiglia and Sebastiani, 1996b] show, this performance gap propagates and expands with the modal depth of the formulas.

Finally, consider Claim 3 about the existence of easy-hard-easy patterns. From a theoretical point of view, it is shown in [Giunchiglia and Sebastiani, 1996b] that these patterns are a consequence of property (b) above.

From an empirical point of view, consider Figure 2, right. This figure reports the size of the search space for both KsatLisp and KsatC. It is easy to notice the existence of easy-hard-easy peaks centered about the 50% satisfiable point. This issue will be further discussed in Section 4.

4 KsatC VS. TA

Consider now the three problem sets PS2 (d = 1, N = 4), PS3 (d = 1, N = 5) and PS4 (d = 1, N = 6). PS2 and PS4 reproduce the two experiments presented in [Hustadt and Schmidt, 1997a], with a slight change in the testing methodology. In particular, the improved test methodology presented in [Hustadt and Schmidt, 1997a] and described in Section 3 is not yet optimal, as it may be the case that multiple occurrences of the same modal atom □C (or different permutations of □C) occur in clauses of depth > 0. For instance, the modal clause □(A1 ∨ A2 ∨ A3) ∨ ¬□(A1 ∨ A3 ∨ A2) becomes a tautology after sorting. To avoid this, we have introduced a further slight improvement in the wff generator: pre-sort all modal atoms and avoid multiple occurrences of (sorted) modal atoms inside clauses. It is clear that there is no point in trying to simplify these samples.

The results of PS2, PS3, PS4 are described in Figure 3 [11]. In all figures, the horizontal axis represents the number of clauses L, and each point represents the median value out of 100 values. The three columns correspond to PS2, PS3 and PS4 respectively. In the first row we compare TA and KsatC CPU times, in logarithmic scale. In the second row we plot the KsatC global number of recursive DPLL calls, i.e., the size of the space effectively searched by KsatC, in linear scale. The dotted plots in the background represent the percentages of satisfiable formulas, in linear scale, with 100% satisfiable at end scale.

[11] All the tests of Figure 3 have been performed with the same machine, operating system and compilers as those in Figure 2.

[Figure 3, first row: three panels plotting CPU TIME [SEC] (logarithmic scale from 10^-2 to 10^3) against # OF CLAUSES (L) for TA, KsatC and % satisf, with L ranging over 0 to 120, 0 to 200 and 0 to 400 respectively. Second row: DPLL CALLS - PS2 (N=4, d=1, %p=0), DPLL CALLS - PS3 (N=5, d=1, %p=0) and DPLL CALLS - PS4 (N=6, d=1, %p=0), plotting # CALLS (linear scale, up to 1500, 7 x 10^4 and 4.5 x 10^6 respectively) against # OF CLAUSES (L) for KsatC and % satisf.]

Figure 3: First row: TA versus KsatC on PS2, PS3, PS4. Second row: KsatC size of search space. Background: satisfiability percentages (100% satisfiable at end-scale).

A glance at the first row shows that KsatC performs better than TA in all three tests. For PS2, the gap between TA and KsatC is of 2 (almost 3) orders of magnitude for L = 68 (i.e., at the cross-over point of 50% of satisfiable wffs) and grows to 4 orders of magnitude at the right end of the horizontal axis. For PS3 and PS4, TA median values exceed the timeout for L = 85 and 102 respectively, while the corresponding values of KsatC are 2.0 and 22.9 seconds. TA keeps exceeding the timeout for all the successive values. After L = 125 and 162 respectively, no wff is solved by TA within the timeout.

It is very important to notice the qualitative difference between the KsatC and TA plots: when the tests enter the "unsatisfiable" area, the KsatC curves decrease with L. In fact the size of the space searched (bottom row) tends to zero whenever L approaches the "100% unsatisfiable" area. [Giunchiglia and Sebastiani, 1996b] showed that this feature is a consequence of the ability of a procedure to detect constraint violations as soon as they occur: the more constrained the formula is, the more likely a branch is to violate a constraint, and the higher up the search tree is pruned. Roughly speaking, this matches the general intuition that over-constrainedness makes unsatisfiability more evident, and thus easier to detect (see [Williams and Hogg, 1994] for a fine-grained analysis of this point). Similarly to tableau based systems [Giunchiglia and Sebastiani, 1996b], TA does not seem capable of taking full advantage of the over-constrainedness of the input wffs. For instance, in the extreme right points of some plots, TA is not able to solve a single instance within the timeout, while these instances are easy or even trivial to solve for KsatC. To support this consideration we generated a sample with L = 200, N = 5, d = 1 and tested TA without time limits. TA produced a response after 9534 seconds (more than two and a half hours) while KsatC took 0.88 seconds of CPU time.

It is also worth noticing that the plots support the evidence of a phase transition phenomenon for K-satisfiability, i.e., steep satisfiability transition plots and easy-hard-easy hardness patterns whose peaks are located about the 50% satisfiability point. This confirms the analogous results in [Giunchiglia and Sebastiani, 1996c; Giunchiglia and Sebastiani, 1996b].

[Figure 4: six three-dimensional panels, each plotting CPU TIME [SEC] against # OF CLAUSES (L) and percentiles (50 to 100). Panel titles in the top row: KsatC CPU TIME - PS2 (N=4, d=1, %p=0) and TA CPU TIME - PS2 (N=4, d=1, %p=0).]

Figure 4: The percentile graphs for KsatC (left column) and TA (right column) on PS2 (top row), PS3 (middle row) and PS4 (bottom row).

All the above considerations are confirmed by the Q%-percentile graphs in Figure 4. Formally, the Q%-percentile of a set S of values is the value V such that Q% of the values in S are less than or equal to V. The median value of a set thus corresponds to the 50%-percentile of the set. Figure 4 reports the 50%, 60%, 70%, 80%, 90% and 100% percentile curves for KsatC (left column) and TA (right column) on PS2 (top row), PS3 (middle row) and PS4 (bottom row). All the plotted Q%-percentile curves show that KsatC performs better than TA (notice the different end of scale used on the vertical axis for KsatC and TA), except for those points in PS4 on which both KsatC and TA exceed the time limit. However, on the basis of the respective values of the surrounding points, we may conjecture that even for these points KsatC values are less than the respective TA values. Interestingly, for PS2, there does not seem to be much difference between the 50%-percentile and the 90%-percentile curves for TA.
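To make the percentile definition concrete, the following Python sketch computes a Q%-percentile over a set of CPU times capped at the 1000-second timeout used in the tests. It is only an illustration: the function names are hypothetical, and the nearest-rank convention used here is an assumption, since the text does not say how ties or even-sized samples are handled.

```python
TIMEOUT = 1000.0  # seconds of CPU time, the limit used in the tests

def capped(times):
    """Replace every run that exceeded the timeout by the timeout value."""
    return [min(t, TIMEOUT) for t in times]

def percentile(values, q):
    """Nearest-rank Q%-percentile (assumed convention): the smallest value V
    in `values` such that at least q% of the values are <= V.
    `q` is an integer between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceiling of q*n/100
    return ordered[rank - 1]

# Hypothetical CPU times for four samples, three of which exceed the timeout.
times = capped([0.5, 1800.0, 2500.0, 1200.0])
print(percentile(times, 50))   # the median (50%-percentile)
print(percentile(times, 100))  # the 100%-percentile is the maximum
```

Under this convention, a 50%-percentile equal to the timeout indicates that at least half of the samples were not solved within the limit.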
On PS2, the 100%-percentile curve of TA seems to present an easy-hard-easy pattern, but this is the only curve where this seems to happen.

To extend our comparison, we have further tested TA and KsatC on the class of wffs {φ^K_d}, d = 1, 2, ..., presented in [Halpern and Moses, 1992]. These are K-satisfiable wffs, with depth d and 2d + 1 propositional letters. This test is interesting as every Kripke structure satisfying φ^K_d has at least 2^(d+1) − 1 distinct states, while |φ^K_d| is O(d^2). From the results in [Halpern and Moses, 1992] we can reasonably assume a minimum exponential growth factor of 2^d in time for any ordinary algorithm based on Kripke semantics. We have run TA and KsatC on these formulas, for increasing values of d.

Figure 5 presents the resulting CPU times [12]. TA and KsatC CPU times grow approximately as (5.0)^d and (2.3)^d, respectively, exceeding the 1000 seconds limit for d = 7 (1032.35 seconds) and d = 14 (1616.87 seconds), respectively. This confirms the performance gap highlighted above. On the same test, on the same machine, Kris and KsatLisp grow approximately as (12.7)^d and (2.6)^d, respectively. They exceed the 1000 seconds limit for d = 7 (5085 seconds) and d = 11 (2541.28 seconds) respectively, as reported in [Giunchiglia and Sebastiani, 1996c].

[12] The tests in Figure 5 have been performed on a SUN SPARC10 workstation with 32 MB RAM, running SunOS 4.1.4. Spass and KsatC are compiled by gcc 2.7.2, option -O2.

[Figure 5 panel: CPU TIME COMPARISON - HALPERN & MOSES WFFS; horizontal axis: DEPTH, 0 to 14; vertical axis: CPU TIME [SEC], logarithmic scale from 10^-2 to 10^5; curves: TA, KsatC, 1000 sec.]

Figure 5: TA and KsatC CPU times for the class of φ^K_d formulas.

5 CONCLUSION

Our analysis in this paper suggests that SAT based decision procedures behave better than decision procedures based on the translation method. In previous papers we have presented results, confirmed by the results in this paper, which suggest that SAT based decision procedures are also more efficient than tableau based decision procedures. So far we have restricted ourselves to a few basic modal logics (e.g., K, S5). An interesting open problem is whether these (substantial) computational advantages will also exist in other modal logics which are more "interesting" from an application point of view, e.g., CTL, LTL, Dynamic Logics.

ACKNOWLEDGEMENTS

The work described in this paper has benefited a lot from the interactions with Ullrich Hustadt and Renate Schmidt. The results published in their papers [Hustadt and Schmidt, 1997a; Hustadt and Schmidt, 1997b] challenged us and forced us to look deeper into the previous analysis and results. Furthermore, the help of Hustadt has been crucial in the generation of some of the results described in this paper.

This work has also benefited from many useful discussions with Ian Horrocks, who provided very useful feedback.

The MRG group at Genova put up with many weeks of time-consuming tests.

Finally, thanks to the anonymous referees whose criticisms helped us during the writing of the revised version.

References

[Baader et al., 1994] F. Baader, E. Franconi, B. Hollunder, B. Nebel, and H.J. Profitlich. An Empirical Analysis of Optimization Techniques for Terminological Representation Systems or: Making KRIS get a move on. Applied Artificial Intelligence. Special Issue on Knowledge Base Management, 4:109-132, 1994.

[Buro and Buning, 1992] M. Buro and H. Buning. Report on a SAT competition. Technical Report 110, University of Paderborn, Germany, November 1992.

[Crawford and Auton, 1993] J. Crawford and L. Auton. Experimental results on the crossover point in satisfiability problems. In Proc. of the 11th National Conference on Artificial Intelligence, pages 21-27, 1993.

[Davis and Putnam, 1960] M. Davis and H. Putnam. A computing procedure for quantification theory. Journal of the ACM, 7:201-215, 1960.

[Davis et al., 1962] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem proving. Journal of the ACM, 5(7), 1962.

[Fitting, 1983] M. Fitting. Proof Methods for Modal and Intuitionistic Logics. D. Reidel Publishing, 1983.

[Giunchiglia and Sebastiani, 1996a] F. Giunchiglia and R. Sebastiani. Building decision procedures for modal logics from propositional decision procedures - the case study of modal K. In Proc. of the 13th Conference on Automated Deduction, Lecture Notes in Artificial Intelligence, New Brunswick, NJ, USA, August 1996. Springer Verlag. Also DIST-Technical Report 96-0037 and IRST-Technical Report 9601-02.

[Giunchiglia and Sebastiani, 1996b] F. Giunchiglia and R. Sebastiani. Building decision procedures for modal logics from propositional decision procedures - the case study of modal K(m). Technical Report 9611-06, IRST, Trento, Italy, 1996. A shorter version will appear in Information and Computation.

[Giunchiglia and Sebastiani, 1996c] F. Giunchiglia and R. Sebastiani. A SAT-based decision procedure for ALC. In Proc. of the 5th International Conference on Principles of Knowledge Representation and Reasoning - KR'96, Cambridge, MA, USA, November 1996. Also DIST-Technical Report 9607-08 and IRST-Technical Report 9601-02.

[Halpern and Moses, 1992] J.Y. Halpern and Y. Moses. A guide to the completeness and complexity for modal logics of knowledge and belief. Artificial Intelligence, 54(3):319-379, 1992.

[Heuerding et al., 1996] A. Heuerding, G. Jager, S. Schwendimann, and M. Seyfried. The logics workbench LWB: A snapshot. Euromath Bulletin, 2(1):177-186, 1996.

[Hollunder et al., 1990] B. Hollunder, W. Nutt, and M. Schmidt-Schauß. Subsumption Algorithms for Concept Description Languages. In Proc. 8th European Conference on Artificial Intelligence, pages 348-353, 1990.

[Horrocks, 1997] Ian R. Horrocks. Optimising Tableaux Procedures for Description Logics. PhD thesis, Department of Computer Science, University of Manchester, 1997.

[Hustadt and Schmidt, 1997a] U. Hustadt and R.A. Schmidt. On evaluating decision procedures for modal logic. Research report MPI-I-97-2-003, Max-Planck-Institut für Informatik, Saarbrücken, Germany, February 1997.

[Hustadt and Schmidt, 1997b] U. Hustadt and R.A. Schmidt. On evaluating decision procedures for modal logic. In Proc. of the 15th International Joint Conference on Artificial Intelligence, 1997.

[Kripke, 1959] S. Kripke. A completeness theorem in modal logic, volume 24. 1959.

[Massacci, 1994] F. Massacci. Strongly analytic tableaux for normal modal logics. In Proc. of the 12th Conference on Automated Deduction, 1994.

[Mitchell et al., 1992] D. Mitchell, B. Selman, and H. Levesque. Hard and Easy Distributions of SAT Problems. In Proc. of the 10th National Conference on Artificial Intelligence, pages 459-465, 1992.

[Ohlbach, 1991] H. J. Ohlbach. Semantics based translation methods for modal logics. Journal of Logic and Computation, 1(5):691-746, 1991.

[Smullyan, 1968] R. M. Smullyan. First-Order Logic. Springer-Verlag, NY, 1968.

[Weidenbach et al., 1996] C. Weidenbach, B. Gaede, and G. Rock. SPASS & FLOTTER version 0.42. In M.A. McRobbie and J.K. Slaney, editors, Proceedings of the 13th Conference on Automated Deduction (CADE-13), volume 1104 of Lecture Notes in Artificial Intelligence, pages 141-145, New Brunswick, New Jersey, USA, July/August 1996. Springer.

[Williams and Hogg, 1994] C. P. Williams and T. Hogg. Exploiting the deep structure of constraint problems. Artificial Intelligence, 70:73-117, 1994.
