Planning as model checking: The performance of ProB vs NuSMV




Tertia Hörne and John A. van der Poll
School of Computing, University of South Africa

ABSTRACT

In this paper we investigate the feasibility of using two different model-checking techniques for solving a number of classical AI planning problems. The ProB model checker, based on mathematical set theory and first-order logic, is specifically designed to validate specifications of concurrent programs written in the B specification language. ProB uses a constraint logic programming environment to perform model checking. NuSMV is the other model checker used in this work. It is an extension of SMV and makes use of symbolic model checking techniques to deal with the state explosion problem common to model checking in general. The problem is represented using Binary Decision Diagrams and model checking is performed using tableaux theorem proving techniques. The scope of the problems chosen is currently limited but it is envisaged that the methodology proposed could usefully be extended to larger planning problems.

Categories and Subject Descriptors

F.4.1 [Theory of Computation]: Mathematical Logic and Formal Languages - Mathematical Logic; I.2.3 [Computing Methodologies]: Artificial Intelligence - Deduction and Theorem Proving; I.2.4 [Computing Methodologies]: Artificial Intelligence - Knowledge Representation Formalisms and Methods

General Terms

Planning, model checking, satisfiability, constraint logic programming, BDDs, tableaux theorem proving

Keywords

Planning, model checking, constraint logic programming, satisfiability, BDDs, tableaux theorem proving


T. Hörne, School of Computing, UNISA, P.O. Box 392, 0003 Pretoria, South Africa. [email protected]

John A. van der Poll, School of Computing, UNISA, P.O. Box 392, 0003 Pretoria, South Africa. [email protected]

1. INTRODUCTION

Model checking is a popular technique traditionally used for the verification of concurrent systems such as logic circuits, embedded systems and reactive systems. Model checking provides an environment in which the behaviour of such systems can be monitored. The system under consideration is described in a language suitable for the relevant model checker and this is referred to as a model of the system. The behavioural specifications of the model are described in a temporal logic such as Linear Temporal Logic (LTL), Computation Tree Logic (CTL) or CTL*, which is a superset of LTL and CTL [Clarke et al. 1999]. When we consider model checking techniques in general, the emphasis is usually on the validity of some formula ϕ and the object is to determine whether ϕ is valid in the model under consideration.

Recent research has shown that model checking also promises to be a good approach for planning problems. This includes the class of classical AI planning problems ([Giunchiglia and Traverso 2000]; [Cimatti and Roveri 2000]; [Pistore and Traverso 2000]; [Giunchiglia and Traverso 1999]). One of the features frequently verified by model checkers is the safety properties of a system. Safety properties are typically represented as a state, or a set of states, that should never be reached. In the case of planning, the model checking problem becomes a satisfiability problem and the question becomes 'is ¬ϕ satisfiable in this model?' In other words, the negation of the goal is stated as a safety property and presented to the model checker, either by including it in the temporal logic specifications, or by stating it as an invariant of the model. If the problem is found to be satisfiable, one of the strong points of model checkers is activated: it generates a counterexample and therefore provides a solution to the relevant planning problem. The fact that this approach is feasible is supported by Giunchiglia et al. [2000], who state that, since planning should be done by semantically checking the truth of a formula, planning as model checking is conceptually similar to planning as propositional satisfiability.


In this paper we investigate the feasibility of using two different model-checking techniques for solving a number of classical AI planning problems. We chose two model checkers that use different reasoning techniques. Firstly we selected ProB, which is based on mathematical set theory and first-order logic. It is specifically designed for the verification of program specifications written in the B specification language. Model checking is performed within a Constraint Logic Programming (CLP) environment using SICStus Prolog [Wallace 1998]. The other model checker used is NuSMV, an extension of the symbolic model checker SMV [McMillan 1993]. With NuSMV the problem is represented using Binary Decision Diagrams (BDDs) [Bryant 1992]. Model checking is performed using tableaux theorem proving techniques [Fitting 1996]. Both model checkers used can be defined as complete [Giunchiglia and Traverso 2000] since the state space is explored exhaustively: if there exists a plan, it will be found, and they always terminate. However, they do not provide all possible plans but terminate after one is found, if it exists.

The model checker NuSMV provides the option of running in Bounded Model Checking (BMC) mode. The problem representation is converted into a propositional satisfiability problem in Conjunctive Normal Form (CNF) and fed to a SAT solver [Davis and Putnam 1960] for finding a solution. In a similar fashion ProB has the option of conducting a breadth-first search when running in Temporal Model Checking mode. This is similar to the Bounded Model Checking mode of NuSMV, which is also performed in a breadth-first fashion. This makes both model checkers powerful since these two techniques are complementary and are often suitable to solve different classes of problem.

We compare the performance of the two model checkers in solving a number of classical AI planning problems in order to establish to what extent each of the two model checkers is suitable for the different classes of planning problem considered. Currently the size of the problems that were selected is limited to some extent. We envisage that the principles applied in the empirical work can easily be extended to larger planning problems that fit within the AI planning paradigm.

The layout of the paper is as follows: In the next section we give a brief overview of the concept of planning as model checking, as well as a short description of the planning problems used for the empirical work discussed in this paper. In Sections 3 and 4 we provide a summary of the reasoning approaches used by NuSMV and ProB respectively. In Section 5 the research methodology is described and in Section 6 we discuss the results. The results are analysed in Section 7 and we conclude with a final section where some suggestions for further research are discussed.

2. PLANNING AS MODEL CHECKING

As mentioned in the previous section, we treat planning problems as satisfiability problems. We are interested in the way the satisfiability of a formula ϕ, here referred to as a goal, can be determined using a model checker. One option is to state ¬ϕ as a conjunct of the invariant of the relevant model and to test for satisfiability. Alternatively, ¬ϕ can be included in the temporal specifications and we can test for modal satisfiability. For the remainder of this paper, we refer to both of these as consistency: a model is inconsistent if the invariant is violated or if the temporal specifications are not met in all possible states. If the specified goal is found to be satisfiable, i.e. the model is inconsistent, the model checker generates a counterexample which in turn is a solution to the problem statement.
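As a hedged illustration of this idea, the following Python sketch performs an exhaustive breadth-first search of a small model and reports the first reachable state that violates the invariant ¬ϕ, together with the path that leads to it. It is not the encoding used with ProB or NuSMV in this paper. The model is the Farmer problem described later in this section; the state encoding and the names `safe`, `successors`, `invariant` and `find_counterexample` are assumptions made for the example.

```python
from collections import deque

# A state records the river bank ("N" or "S") of farmer, wolf, goat, cabbage.
INITIAL = ("N", "N", "N", "N")
GOAL = ("S", "S", "S", "S")


def safe(state):
    """No goat is left alone with the wolf or with the cabbage."""
    farmer, wolf, goat, cabbage = state
    if wolf == goat and farmer != goat:
        return False
    if goat == cabbage and farmer != goat:
        return False
    return True


def successors(state):
    """The farmer crosses alone (i == 0) or with one item from his bank."""
    farmer = state[0]
    other = "S" if farmer == "N" else "N"
    for i in range(4):
        if state[i] != farmer:
            continue
        nxt = list(state)
        nxt[0] = other
        nxt[i] = other
        nxt = tuple(nxt)
        if safe(nxt):
            yield nxt


def invariant(state):
    """The negated goal: 'the goal state is never reached'."""
    return state != GOAL


def find_counterexample(initial, successors, invariant):
    """Breadth-first search; return the path to a state violating the invariant."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None  # the invariant holds in every reachable state: no plan exists


if __name__ == "__main__":
    for state in find_counterexample(INITIAL, successors, invariant):
        print(state)
```

The returned trace is the counterexample: read from the first state to the last, it is a plan for the goal. If the search returns `None`, the model is consistent and no plan exists.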

In model checking, a system can be described in terms of a Finite State Machine (FSM). An assignment of values from the domains of the individual state variables constitutes one possible state of the system [Cavada et al. 2005]. State variables may assume different values in different states. A transition relation is used to describe the way in which events nondeterministically lead from one state to some next state. One can also specify fairness conditions on the valid paths of the execution of the FSM [Cavada et al. 2005], where fairness indicates that a particular condition is true infinitely often along an infinite path.

In the context of different approaches to planning problems, the model checking approach falls into the interleaving planning paradigm: the plan is represented as a sequence of states and only actions that are ready for execution can be chosen [Marinagi et al. 2005]. The transition relation is described for all possible states and the next action to be performed is chosen nondeterministically. The counterexample that is generated in the case of an inconsistent model shows a path from the initial state to the final state. Since the problems are hard-coded, as will be briefly discussed below, we restrict ourselves to only one initial state.

The presentation of planning as a satisfiability problem is not a new one but has been limited to a large extent to deterministic classical problems [Giunchiglia and Traverso 2000]. The nature of the planning problems presented in this paper is suitable for nondeterminism. All the possible choices for a next action are given and one of these (if the current state allows the action) is selected nondeterministically. Such a choice might have the effect that the goal is not reached but, since model checkers do an exhaustive traversal of the model, all the alternative paths will be followed. Cimatti and Roveri [2000] developed a conformant planner, CMBP, that is built on top of NuSMV and is suitable for nondeterministic domains. Di Manzo, Giunchiglia and Ruffino [1999] ran bench-marking tests using SMV and NuSMV and both model checkers gave promising results in comparison with state-of-the-art planning systems. To our knowledge, no bench-marking tests of this nature have been done with ProB.

We chose the following 'classical' AI planning problems, some of which can be regarded as Solitaire game playing problems, as case studies for the empirical work:

• The Coins Problem: You are given two 2-Dollar coins and two 5-Dollar coins that are initially arranged in the following sequence: [D2,D2,space,D5,D5], where D denotes a dollar. The purpose of the game is to perform a sequence of simple moves such that the coins are in the positions identified as the goal [D5,D5,space,D2,D2], i.e. the positions of the coins are swapped around. A coin can be moved by either sliding it to the empty square adjacent to its current position, or by jumping over another coin into an empty square (space). This means that the relevant coin and the space swap positions. The D2 coins can only be moved to the right, and the D5 coins can only be moved to the left. No backing up is allowed.

• The Farmer, Goat, Cabbage, Wolf Problem, referred to as the Farmer Problem for short: A farmer has a wolf, a goat and a cabbage on the north side of a river, and also a small boat. He wants to bring his property to the south side of the river, but the boat can only carry himself and at most one of the other three. The problem is that if he leaves the wolf and the goat together unattended then the wolf will eat the goat, and if the goat is left behind with the cabbage, the goat will finish off the cabbage. The object of the exercise is to ensure that the farmer gets everyone, including himself, across safely.

• The Triangle Puzzle, one of the class of English Peg Solitaire Problems: The triangle puzzle is played with fourteen pegs and a board. The board has fifteen holes drilled into it in the following triangular pattern:

        O
       O O
      O O O
     O O O O
    O O O O O

Initially, fourteen of the fifteen holes contain a peg. The player proceeds to repeatedly jump one peg over another, each time removing the peg that has been jumped. A jump must always be in a straight line, and only one peg can be jumped at a time. The object of the game (the goal) is to finish with only one peg in the board.

• The Eight-Tiles Problem, also referred to as the Eight-Puzzle Problem [Bratko 2001]: We have eight tiles, numbered 1 to 8, that are arranged in a 3 x 3 matrix. The 9th tile is represented by a space (it is empty). A tile can be shifted horizontally or vertically into the empty space. The goal is to arrange the tiles into the following pattern, where 0 represents the empty space:

    7 8 1
    6 0 2
    5 4 3

• The Wounded Soldiers Problem: A number of wounded soldiers find themselves behind enemy lines. There is a bridge that they have to cross to safety but they only have limited time to get everyone across. The extent of their wounds has an effect on the time it takes to get across. They have one torch and the soldiers who are not heavily wounded have to help the others across. At most two soldiers can cross to safety at one time and one of them has to bring back the torch to the unsafe side for the next crossing. The goal is to get all the soldiers to safety in 60 minutes or less.
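To make the notion of a transition relation concrete for one of these case studies, here is a minimal Python sketch of the Coins Problem. It is an illustration only, not the B or NuSMV model used in the experiments; the tuple encoding of the board and the names `successors` and `solve` are assumptions. Because no coin may back up, no state can be revisited, so a plain depth-first search terminates without a visited set.

```python
# The board is a tuple of five squares; None marks the empty square.
START = ("D2", "D2", None, "D5", "D5")
GOAL = ("D5", "D5", None, "D2", "D2")


def successors(state):
    """Yield every state reachable by one slide or one jump."""
    gap = state.index(None)
    for pos, coin in enumerate(state):
        if coin is None:
            continue
        step = 1 if coin == "D2" else -1  # D2 moves right, D5 moves left
        slide = pos + step == gap
        # With a single empty square, the square jumped over must hold a coin.
        jump = pos + 2 * step == gap
        if slide or jump:
            nxt = list(state)
            nxt[gap], nxt[pos] = coin, None
            yield tuple(nxt)


def solve(state, goal):
    """Depth-first search; return the list of states from state to goal."""
    if state == goal:
        return [state]
    for nxt in successors(state):
        rest = solve(nxt, goal)
        if rest is not None:
            return [state] + rest
    return None  # dead end: no legal move leads to the goal


if __name__ == "__main__":
    for board in solve(START, GOAL):
        print(board)
```

A model checker explores the same relation, but the nondeterministic choice between enabled moves is left to the tool rather than to an explicit loop.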

Before turning to the methodology, we give a brief survey of the two model checkers used in the next two sections.

3. NUSMV

NuSMV is an extension of the symbolic model checker SMV, known as CMU SMV, which was developed at Carnegie Mellon University [McMillan 1992]. NuSMV is written in ANSI C and is a joint project between the Formal Methods group at ITC-irst, the Model Checking group at CMU, the Mechanized Reasoning Group at University of Genova, and the Mechanized Reasoning Group at University of Trento. The latest version is distributed with an OpenSource license [Cavada et al. 2005].

Like CMU SMV, NuSMV is based on Binary Decision Diagrams (BDDs) developed by Bryant [1992] and it uses the CUDD-based BDD package, a state-of-the-art BDD package developed at the University of Colorado. It is POSIX compliant, i.e. it meets the IEEE standards for application programming interfaces as well as shells and utilities for the Unix operating system. During model construction, NuSMV builds a clusterised BDD-based finite state machine using the transition relation [Cavada et al. 2005].

A model is described in terms of a hierarchy of modules. Module instantiations are semantically similar to call-by-reference [Cavada et al. 2005]. Once a module is defined it can be reused, and each instance of the module refers to different data structures. In asynchronous systems one would typically define several instances of the same module. Interleaving concurrency is modelled using processes. The model executes one step at a time by first nondeterministically choosing one process in the hierarchy, and then executing in parallel the assignment statements within the process that was chosen [Cavada et al. 2005]. If a process is currently selected for execution, a special Boolean variable associated with the process is set to 1, indicating that the process is running. Only one process can run at a time.

NuSMV allows for Boolean, integer and enumerated types for state variables. The type word can be used to model arrays of bits on which bitwise arithmetic and logical operations can be performed [Cavada et al. 2005]. Arrays and sets can also be used as data types. All expressions are typed.

NuSMV allows for the description of finite state systems as either synchronous or asynchronous. Such a system can be described using the ASSIGN constraint, where a system of equations describes how the FSM evolves over time [Cavada et al. 2005]. Alternatively, the TRANS constraint can be used to describe the transition relation in terms of current state/next state pairs.

State variables can be given an initial value. All expressions that form part of an ASSIGN constraint are executed concurrently in one time step. The next_expression is used to express transitions of state variables in terms of the current and next states.
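The concurrent evaluation of assignments can be emulated in a few lines of Python. The sketch below is not NuSMV syntax and makes no claim about NuSMV internals; it only shows the semantics described above, in which every next-state expression is evaluated on the current state before any variable is updated. The two-bit counter and the names `INIT`, `NEXT`, `step` and `run` are assumptions chosen for the example.

```python
# Initial values of the state variables.
INIT = {"b0": False, "b1": False}

# One next-state expression per variable, each reading the *current* state.
NEXT = {
    "b0": lambda s: not s["b0"],
    "b1": lambda s: s["b1"] != s["b0"],  # XOR with the current b0, not the new one
}


def step(state, next_exprs):
    """Evaluate all next-expressions on the current state, then update together."""
    return {var: expr(state) for var, expr in next_exprs.items()}


def run(state, next_exprs, steps):
    """Return the list of states visited in the given number of time steps."""
    trace = [state]
    for _ in range(steps):
        state = step(state, next_exprs)
        trace.append(state)
    return trace


if __name__ == "__main__":
    for s in run(INIT, NEXT, 4):
        print(2 * s["b1"] + s["b0"])
```

If `b1` were computed from the already updated `b0`, the counter would skip values; building the whole new state from the old one avoids that.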

In NuSMV, specifications can be expressed in CTL and in LTL, extended with PAST operators as well as the Property Specification Language (PSL). NuSMV also allows for real time CTL specifications but, since we are not using this facility in the context of this paper, the reader is referred to [Cavada et al. 2005] for further details.

NuSMV can execute either in interactive mode or in batch mode. One can manipulate the model using different settings of environment variables. Some options are provided for partially controlling the state space, for example:

• the model can be constructed using different partitioning techniques;

• the order in which the variables should be stored in the BDDs can be specified;

• the clustering algorithm may be selected;

• variable assignment can be dynamic; and

• the configuration of the BDD package can be manipulated [Cavada et al. 2005].

In standard model checking mode, tableaux-based theorem-proving techniques are used. For each LTL specification, a tableau of the behaviours of the model that falsify the relevant property is constructed. This is then synchronously composed with the model under consideration [Cavada et al. 2005]. NuSMV uses a forward search algorithm, i.e. the search starts from the initial state I and then determines the set of states reachable from I. At each step in the search it checks whether any of the new states in the execution path satisfies the goal state [Manzo et al. 1999].

When NuSMV is running in bounded model checking mode, SAT-based reasoning techniques are used. In this case a tableau is constructed in a similar fashion. However, the lengths of the encoded paths are limited. A propositional satisfiability problem is then generated and tackled by means of a SAT solver. The current default SAT solver used by NuSMV is MiniSat [Cavada et al. 2005]. The user also has the option of using an external SAT solver. In this case a DIMACS file is generated, where DIMACS is the standard input format used by most SAT solvers.
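The effect of limiting the path length can be sketched without a SAT solver. The Python fragment below swaps the CNF encoding for explicit enumeration: for increasing bounds k it asks whether some path of exactly k transitions reaches the goal, which is the question a bounded encoding poses to the solver. It uses the Eight-tiles goal pattern from Section 2; the start state and the names `successors`, `path_of_length` and `bounded_plan` are assumptions for the example, and the enumeration is exponential in k, so it is only usable for very small bounds.

```python
GOAL = (7, 8, 1,
        6, 0, 2,
        5, 4, 3)

# Hypothetical start state, two moves away from GOAL (0 is the empty space).
START = (0, 7, 1,
         6, 8, 2,
         5, 4, 3)


def successors(state):
    """Shift a horizontally or vertically adjacent tile into the empty space."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            other = 3 * r + c
            nxt = list(state)
            nxt[blank], nxt[other] = nxt[other], nxt[blank]
            yield tuple(nxt)


def path_of_length(state, goal, k):
    """Return a path of exactly k transitions from state to goal, or None."""
    if k == 0:
        return [state] if state == goal else None
    for nxt in successors(state):
        rest = path_of_length(nxt, goal, k - 1)
        if rest is not None:
            return [state] + rest
    return None


def bounded_plan(initial, goal, max_bound):
    """Try bounds 0, 1, ..., max_bound; the first hit is a shortest plan."""
    for k in range(max_bound + 1):
        path = path_of_length(initial, goal, k)
        if path is not None:
            return k, path
    return None  # no plan within the bound; says nothing about longer plans


if __name__ == "__main__":
    print(bounded_plan(START, GOAL, 5))
```

Because the bounds are tried in increasing order, the first plan found is a shortest one. A `None` result only means that no plan exists within the chosen bound.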

When running in a Windows environment, as we did for this paper, NuSMV runs under DOS, which is a bit cumbersome. In contrast, ProB, which we discuss in the next section, has a well-developed user interface.

4. PROB

The ProB model checker was developed by Leuschel, Butler and Lo Presti [Leuschel et al. 2005] for the verification of B-Machines, where a B-Machine comprises specifications for a concurrent system written in the B specification language [Abrial 1996]. The B language is based on set theory and first-order logic [Enderton 1977] and can be used for the design, specification and coding of software systems. When B is used for program specifications, this is referred to as the B-method. Abstract Machine Notation (AMN) supports the B-method and is the language used for writing the specifications of the system under consideration. Each individual module that forms part of the specifications is referred to as a B-machine [Wordsworth 1996].

Although ProB has been developed mainly in SICStus Prolog, it utilises various tools for the verification of a specification written in B [Leuschel and Butler 2003]. The Tcl/Tk library of SICStus Prolog was used for the development of the Graphical User Interface (GUI) of ProB. In order to run a B-machine, ProB uses the JBTools package [Tatibouet 2001] for the translation of B specifications into XML, a notation convenient for interpretation within Prolog.

The ProB kernel is written in SICStus Prolog. The strength of the ProB kernel lies in the way in which it utilises the Constraint Logic Programming (CLP) facilities offered by SICStus Prolog such as co-routining and constraint handling techniques. The B-kernel could therefore be considered a constraint solver over finite sets and sequences [Leuschel et al. 2001]. CLP languages such as SICStus Prolog are used for solving Constraint Satisfaction Problems (CSPs), where a CSP comprises the following:

• A finite collection of variables;

• A finite domain associated with each individual variable; and

• A set of constraints on these variables.

One of the most important features of CLP is the fact that the system is geared to deal with constraints and uses a constraint store for this purpose. If a B-machine specification does not narrow down the value of a variable to one single possible value, the typing information of the specifications is used to enumerate its domain. This information is stored in the constraint store which contains information about each individual constrained variable. The constraint store is consistent if there exists an assignment of values to all of the variables in the constraint store such that all the constraints are met. The value of a variable may depend on a number of different constraints. As constraints are dealt with, the additional information that becomes available is incrementally used for pruning the domain, which can also be referred to as the enumeration space, of the variable. This is done using consistency techniques such as constraint propagation. This results in the pruning of the state space.
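Domain pruning by constraint propagation can be shown on a toy CSP. The sketch below is a generic arc-consistency style loop written in Python; it is not ProB's kernel and not SICStus Prolog's CLP library, and the variables `x`, `y`, `z`, the predicate `less_than` and the function `propagate` are assumptions made for the example. Each binary constraint repeatedly removes values that have no supporting value in the other variable's domain, until nothing changes.

```python
def less_than(a, b):
    """Binary constraint a < b."""
    return a < b


def propagate(domains, constraints):
    """Prune finite domains until every remaining value has a support.

    domains:     dict mapping a variable name to a set of candidate values
    constraints: list of (var_a, var_b, predicate) triples, meaning
                 predicate(value_a, value_b) must hold
    """
    domains = {var: set(values) for var, values in domains.items()}
    changed = True
    while changed:
        changed = False
        for a, b, pred in constraints:
            keep_a = {x for x in domains[a] if any(pred(x, y) for y in domains[b])}
            keep_b = {y for y in domains[b] if any(pred(x, y) for x in keep_a)}
            if keep_a != domains[a] or keep_b != domains[b]:
                domains[a], domains[b] = keep_a, keep_b
                changed = True
    return domains


if __name__ == "__main__":
    start = {"x": {1, 2, 3}, "y": {1, 2, 3}, "z": {1, 2, 3}}
    print(propagate(start, [("x", "y", less_than), ("y", "z", less_than)]))
```

In this example propagation alone narrows every variable to a single value. In general some domains stay larger and the remaining values have to be enumerated; an empty domain shows that the constraints cannot all be met.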

The Windows version of ProB has a well-constructed GUI and when ProB is running a B-machine, the following windows are displayed on the screen for the benefit of the user:

• Details of the current state of the machine;

• The history that has led the user to the current state;

• A list of all the enabled operations along with proper argument instantiations; and

• The AMN code of the machine.

ProB comprises a versatile combination of techniques and can run in a variety of modes. As soon as a B-Machine is initialised, a list of enabled actions is displayed. The user can choose the mode in which ProB should operate. In Animation mode, ProB can be used interactively where the user selects the action to be executed next from the list of enabled operations. The user also has the option of executing a specified number of actions selected randomly by the ProB system, or running a fully automated animation by specifying the total number of operations to be executed. Animation mode is often used during the design phase to detect errors in the program specifications at an early stage.

In verification mode, i.e. automated testing for inconsistency, we can either use the Temporal Model Checker or the Constraint Model Checker. Although it is referred to as the Temporal Model Checker, its ability to model check temporal logic specifications is limited since this part of the software package is still being developed. It performs an exhaustive search using the initial state as a starting point to determine whether the system is consistent, i.e. no state can be reached in which the invariant is violated. A depth-first search is used as default, but the user can also choose breadth-first search if required.

It is also possible to specify the maximum number of nodes that should be traversed per iteration (the default is 100000). The latest versions of ProB can perform LTL model checking, but not CTL model checking. Alternative techniques are available for model checking both LTL and CTL specifications. One option is to save the current state space to a CSP (Communicating Sequential Processes) file [Hoare 1985] and then use this file as input to the FDR (Failures-Divergence Refinement) tool [Formal Systems Europe Ltd.]. Another option is to save the current state space to a Promela file that can be used as input to the SPIN model checker [Holzmann 1997] for the verification of LTL and/or CTL specifications. For further information the reader is referred to [Leuschel et al. 2001] and [Butler and Leuschel 2005].

Both Temporal Model Checking and Breadth-first model checking have the following options:

• Violation of the invariant: One can state the negation of the goal as part of the invariant and test for consistency of the B-machine;

• Deadlocks: If the user selects the option that deadlocks should be identified, the system will stop execution as soon as a deadlock is detected;

• Goals: The user may define a goal and the system will indicate when such a goal is reached and list the actions that led to the particular goal state; and

• Existing nodes: A facility for inspecting the existing nodes is provided.

The Constraint Model Checker determines whether a state exists, not necessarily reachable from the starting state, such that the system becomes inconsistent if one single operation is performed. If we use constraint-based checking, the user can select whether the system should search for invariant violations, assertion violations or abort conditions. No path is given. This mode is not suitable for the type of problems we are addressing in this paper since we are interested in finding a sequence of moves, i.e. a path, from the starting state to the goal; hence we did not use it.

5. METHODOLOGY

For the sake of efficiency, we hard-coded the problems in the relevant description languages. There are various ways in which the goal for a planning problem can be presented. For ProB we used the negation of the goal as an invariant. Model checking was done in Temporal Model Checking mode using a depth-first search, which is the default, and also a breadth-first search. We did not check for possible deadlocks nor did we use the LTL model checking facility. Since NuSMV does not have this facility available, we also did not use the option of searching for a defined goal in ProB.

With NuSMV we made use of several of the model checking modes that are available. The negation of the goal can be presented as an invariant or can form part of the temporal specifications for the model. Since it is common practice in a model checking environment to provide a description of the model and consequently to verify the model using temporal specifications, both LTL and CTL model checking were performed in order to observe the performance of NuSMV with the problems we considered. However, for this paper, no manipulation of the BDDs, such as experimenting with the variable ordering or with the configuration of the BDD package, was implemented. All model descriptions are synchronous.

From an optimisation point of view, the obvious choice for planning problems might seem to be the use of bounded model checking in order to obtain the shortest path. However, for a large class of planning problems, such as the class of English Peg Solitaire Problems, to which the Triangle Puzzle Problem belongs, bounded model checking is not an option. There is no backtracking and the length of the path is constant, which means that any solution is optimal. For other classes of problem such as the Eight-tiles problem, an optimal solution can be found if the model checker is run in bounded model checking mode. This is because the search is done in a breadth-first fashion [Giunchiglia and Traverso 2000]. We applied pseudo-optimisation with the Wounded Soldiers problem by setting a maximum time limit on the crossing of all the soldiers.
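The constant path length can be checked on the Triangle Puzzle itself: every jump removes exactly one peg, so going from fourteen pegs to one takes thirteen jumps in any solution. The Python sketch below is an illustration only and not one of the models used in the experiments. It numbers the holes 0 to 14 row by row and assumes that the top hole is the one that starts empty, which the problem statement does not specify; the names `hole`, `all_jumps` and `solve` are likewise assumptions.

```python
ROWS = 5
# Straight-line directions on the triangular board, as (row, column) offsets.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]


def hole(r, c):
    """Index of the hole in row r, column c (0 <= c <= r)."""
    return r * (r + 1) // 2 + c


def all_jumps():
    """List every (source, jumped-over, destination) triple on the board."""
    jumps = []
    for r in range(ROWS):
        for c in range(r + 1):
            for dr, dc in DIRECTIONS:
                r2, c2 = r + 2 * dr, c + 2 * dc
                if 0 <= r2 < ROWS and 0 <= c2 <= r2:
                    jumps.append((hole(r, c), hole(r + dr, c + dc), hole(r2, c2)))
    return jumps


JUMPS = all_jumps()
START = frozenset(range(1, 15))  # assumption: hole 0 (the top) starts empty


def solve(pegs, failed=None):
    """Depth-first search; return a list of jumps that leaves one peg."""
    if failed is None:
        failed = set()
    if len(pegs) == 1:
        return []
    if pegs in failed:
        return None
    for src, over, dst in JUMPS:
        if src in pegs and over in pegs and dst not in pegs:
            rest = solve((pegs - {src, over}) | {dst}, failed)
            if rest is not None:
                return [(src, over, dst)] + rest
    failed.add(pegs)  # remember dead ends so they are not explored twice
    return None


if __name__ == "__main__":
    plan = solve(START)
    print(len(plan), plan)
```

Since all solutions have the same length, there is nothing for a shortest-path search to optimise here.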

Problems such as the Eight-tiles problem contain cyclic paths. This could result in no solution being found. One way to address this problem is to use CTL specifications and to implement the specification methods proposed by Daniele et al. [2000].

Since we worked in a Windows environment, it should be noted that the execution times as obtained in the test runs are not accurate because of the overheads imposed by the operating system [Cavada 2008]. As an initial experiment, we repeated several runs up to five times to determine whether the execution times of the individual runs differ. No significant differences were noted, so for the remaining test runs we used only one run to obtain the respective execution times. ProB displays the execution time in a separate window using an embedded Tcl/Tk environment that also provides some statistical data. We ran NuSMV in interactive mode using the NuSMV source command that uses an input file containing a sequence of interactive commands. In this way, the time command can be used to record the execution time.

All the practical work was done on an HP Centrino Notebook with an Intel Pentium M 1.73GHz processor and 504MB RAM, running in a Microsoft Windows XP Professional environment.

6. RESULTS

The test results are summarised in the following subsections. In these tables TMC denotes Temporal Model Checking, BMC denotes Bounded Model Checking, INVAR denotes using an invariant in NuSMV, and BF denotes breadth-first search in ProB. Time is given in seconds unless otherwise indicated. An entry of ’-’ in the table indicates that no solution was found within five minutes.

6.1 The Triangle Puzzle

The following table contains a summary of the time taken by the two model checkers to find a solution to the Triangle Puzzle.

                 ProB                     NuSMV
             TMC      BF      INVAR    LTL      CTL      BMC
14 pegs    109.63     -        1.1      -        -        -
10 pegs     15.08   15.43      0.6    21.3     18.0       -
6 pegs       0.27    0.28      0.6     3.0      1.9    223.5
3 pegs       0.17    0.16      2.0     2.0      2.0     34.4

NuSMV model checking with an invariant produced the best results. The only other case where a solution for 14 pegs was found in less than five minutes was ProB running in Temporal Model Checking mode. The performance of ProB for six and three pegs respectively was better than any of the NuSMV results.

NuSMV running in BMC mode ran out of memory at depth 7 and reported that it could not convert the Boolean expression into a CNF problem. The combinatorial explosion problem is relevant here. At this stage the length of the list of clauses was already 218618 and the length of the list of variables 200. At depth 6 it took more than 100 seconds for the conversion of the problem and the MiniSat run itself.
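The reason every solution of this puzzle has the same length can be seen in a small explicit-state sketch in Python. It is an illustration only and is unrelated to the ProB and NuSMV encodings used for the measurements; the 15-hole board, the coordinate scheme and the function names are assumptions. Each jump removes exactly one peg, so a solution starting from n pegs always consists of n - 1 jumps.

```python
ROWS = 5  # assumed board: 15 holes, corresponding to the 14-peg case

# Hole (r, c) is column c of row r, with 0 <= c <= r.
HOLES = frozenset((r, c) for r in range(ROWS) for c in range(r + 1))

# The six directions along which pegs line up on a triangular board.
DIRECTIONS = ((0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1))

def jumps(pegs):
    """Yield (source, over, target) for every legal jump in this position."""
    for r, c in pegs:
        for dr, dc in DIRECTIONS:
            over = (r + dr, c + dc)
            target = (r + 2 * dr, c + 2 * dc)
            if over in pegs and target in HOLES and target not in pegs:
                yield (r, c), over, target

def solve(pegs, dead=None):
    """Depth-first search for a jump sequence that leaves a single peg."""
    dead = set() if dead is None else dead
    if len(pegs) == 1:
        return []
    if pegs in dead:            # position already shown to be a dead end
        return None
    for source, over, target in jumps(pegs):
        rest = solve((pegs - {source, over}) | {target}, dead)
        if rest is not None:
            return [(source, target)] + rest
    dead.add(pegs)
    return None
```

Calling `solve(HOLES - {(0, 0)})` searches from the position with only the apex hole empty. Any plan it returns contains 13 jumps, one for each peg that has to be removed.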

6.2 The Eight-tiles problem

In the table given below, 9 tiles means that there are nine tiles out of position in the starting state, etc. We used three different starting states, denoted by (1), (2) and (3), for the case where there are 5 tiles out of position in the starting state. The entry true in the table denotes that no counterexample could be found.

                   ProB                     NuSMV
               TMC      BF      INVAR    LTL      CTL      BMC
9 tiles       28.61   45.33      1.8      -      155.9    175.7
5 tiles (1)     -       -       73.3    93.7      78.4      -
5 tiles (2)    0.49    1.07      0.90   true      true      -
5 tiles (3)    0.06    0.06      2.9      -      159.5      -
3 tiles        0.49    0.25      0.8      -      149.0     1.1

The ’number of tiles’ out of position is frequently used in search algorithms where some cost function is associated with individual moves [Bratko 2001]. For the case where the initial state has five tiles out of position, the ProB results differ significantly depending on the starting state of the tiles, which we refer to as a starting pattern. In the first case, no solution was found in either verification mode. With another starting pattern, solutions were found within an acceptable time in both modes. The results of ProB were generally better than those of NuSMV.
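The measure itself is simple to compute. The Python sketch below is an illustration only; the goal layout, the tuple encoding and the `count_blank` option are assumptions. Counting the blank as a tile is what allows a value of nine, as in the first row of the table.

```python
# Assumed goal layout; 0 marks the blank square.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def tiles_out_of_position(state, goal=GOAL, count_blank=True):
    """Count the squares whose content differs from the goal layout.

    With count_blank=False the blank square itself is ignored, which is
    the variant usually used as a search heuristic.
    """
    return sum(
        1
        for have, want in zip(state, goal)
        if have != want and (count_blank or have != 0)
    )
```

For the state `(1, 2, 3, 4, 5, 6, 0, 7, 8)` the function returns 3 when the blank is counted and 2 when it is not.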

Except for NuSMV model checking with an invariant, the results of NuSMV were generally poor. For example, with five tiles out of position and the initial pattern numbered 2, NuSMV found a solution when model checking with an invariant. In this case, no counterexample was found by NuSMV running in either CTL or LTL mode.

The nature of this type of problem lends itself to the possibility of cycles. Combined with the fact that the next move is chosen nondeterministically, this results in execution times that vary significantly.

6.3 The coins, farmer and soldiers problems

The following table contains a summary of the respective execution times of the two model checkers to find solutions to the Coins problem, the Farmer problem and the Wounded Soldiers problem. The system description of the Wounded Soldiers problem for ProB is given in Appendix A and that for NuSMV in Appendix B.

                ProB                     NuSMV
            TMC      BF      INVAR    LTL      CTL      BMC
Coins       0.14    0.17     <0.0     0.2      0.2      0.3
Farmer      0.11    0.11     <0.0     0.2     <0.0      0.1
Soldiers    0.36    0.44      0.1     0.2      0.3      0.5

From the table it is clear that both model checkers performed well with this type of planning problem. The use of an invariant with NuSMV produced the best results overall. Where BMC techniques were used the performance of ProB was generally better than that of NuSMV.

7. ANALYSIS OF RESULTS

In this paper we compared the performance of NuSMV and ProB using a number of planning problems. It is well known that tableaux-based reasoners are generally very efficient [Ben-Ari 2001]. From the results described in the previous section it is clear that the CLP approach used by ProB compared well with the tableaux-based NuSMV approach. We also found that system descriptions for ProB were more elegant and easier to follow than those for NuSMV. This is evident in the ProB code in Appendix A versus the NuSMV code for the same problem in Appendix B.

As far as ProB is concerned, there was not a significant difference in the execution times of TMC and BF. When BMC techniques were used, the performance of ProB was generally better than that of NuSMV.

The execution times of NuSMV model checking with invariants were significantly better than those in which NuSMV was running in CTL, LTL or BMC mode. With regard to Temporal model checking in NuSMV, the performance was better using LTL rather than CTL specifications.

In general, BDD-based computations are subject to the space explosion problem when computing certain classes of Boolean functions [Cimatti and Roveri 2000]. This supports our findings for some planning problems such as the Eight-tiles problem where the only instances of a model checker running out of memory space were reported by NuSMV.

In conclusion it appears that, for the type of planning problems considered in this paper, only a few of the options reported on in the previous section are suitable. These are the CLP-based ProB, running in either Temporal Model Checking mode or performing a breadth-first search, and the tableaux-based NuSMV using an invariant. Although this was not tested, the use of a goal definition in ProB should also be suitable.

8. FURTHER RESEARCH

Although the scope of the problems we addressed in this paper was limited, it is envisaged that the methodology proposed could usefully be scaled up to larger planning problems, and eventually to industrial size problems.

Cavada et al. [2005] showed that a dramatic improvement in performance by NuSMV can be obtained by selecting a good ordering for the list of variables used to build clusters. This, together with dynamic variable allocation, will be a good avenue to explore. One could also establish whether variable ordering in the code has an effect on the performance of ProB.

Since ProB is POSIX compliant, it should be possible to obtain accurate execution times in a Linux environment. Running both ProB and NuSMV in a Linux environment could be used to do benchmarking with state-of-the-art planning systems.

We found the coding of the problems in the respective description languages cumbersome. Even for small problems the number of lines of code is quite large. In this regard, ProB is much easier to use and clearer to the reader than NuSMV. It is important that this aspect be investigated further since it might not be feasible to use these packages for large planning problems. It appears that working on the generalisation of these problems should be a priority. This could mean extensive use of parameters, e.g. input parameters that allow the number of coins in the Coins problem to be varied, and the number of soldiers plus their respective crossing times to be supplied as an input to the Soldiers problem.
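One hypothetical way to approach such parameterisation is to generate the repetitive part of a model description from the number of soldiers. The Python sketch below emits text that follows the pattern of the next(move) assignment in Appendix B. The helper name `move_cases` is an assumption, and the generated text has not been run through NuSMV; it only mirrors the layout of the hand-written code.

```python
from itertools import combinations

def move_cases(n):
    """Return the text of a next(move) assignment for n soldiers.

    A move is named by the digits of the soldier(s) crossing (e.g. 12),
    so the encoding is only unambiguous for at most nine soldiers.
    """
    if not 2 <= n <= 9:
        raise ValueError("digit encoding of moves supports 2..9 soldiers")
    soldiers = range(1, n + 1)
    # Single crossings first, then every pair, as in the hand-written model.
    groups = [(i,) for i in soldiers] + list(combinations(soldiers, 2))
    blocks = []
    for group in groups:
        guard = "&".join(f"torch=s[{i}]" for i in group)
        code = "".join(str(i) for i in group)
        blocks.append(f"case\n  {guard} :{code};\n  1 :move;\nesac")
    return "next(move):= " + " union\n".join(blocks) + " union\nmove;"
```

For four soldiers, `move_cases(4)` produces ten case blocks, the same number as the hand-written assignment in Appendix B. For five soldiers it produces fifteen.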

Another avenue that would be worthwhile exploring is a comparison of the performance of ProB, which uses constraint-based reasoning, with problems written in a constraint logic programming language such as ECLiPSe.

9. ACKNOWLEDGEMENTS

A special word of thanks to Roberto Cavada, Michael Leuschel and several members of the nusmv-usergroup for their advice and technical support via email communication.

10. REFERENCES

Abrial, J.-R. 1996. The B-Book: Assigning programs to meanings. Cambridge University Press.

Ben-Ari, M. 2001. Mathematical logic for Computer Science, 2 ed. Springer.

Bratko, I. 2001. Prolog Programming for Artificial Intelligence, 3 ed. Addison-Wesley.

Bryant, R. 1992. Symbolic boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys 24, 3, 142–170.

Butler, M. and Leuschel, M. 2005. Combining CSP and B for Specification and Property Verification. In International Symposium of Formal Methods Europe, J. Fitzgerald, I. Hayes, and A. Tarlecki, Eds. Lecture Notes in Computer Science, vol. 3582. Springer, Heidelberg, 221–236.

Cavada, R. 2008. Private email communication.

Cavada, R., Cimatti, A., Jochim, C., Keighren, G., Olivetti, E., Pistore, M., Roveri, M., and Tchaltsev, A. 2005. NuSMV 2.4 User Manual. CMU and ITC-irst.

Cimatti, A. and Roveri, M. 2000. Conformant Planning via Symbolic Model Checking. Tech. rep., ITC-irst, Trento, Italy. Technical Report 0006-04.

Clarke, E., Grumberg, O., and Peled, D. 1999. Model Checking. MIT Press.

Daniele, M., Traverso, P., and Vardi, M. 2000. Strong cyclic planning revisited. In Recent Advances in AI Planning. Springer, Heidelberg, 35–48.

Davis, M. and Putnam, H. 1960. A Computing Procedure for Quantification Theory. Journal of the ACM 7, 1, 201–215.

Enderton, H. 1977. Elements of Set Theory. Academic Press, Inc.

Fitting, M. 1996. First-Order Logic and Automated Theorem Proving, 2 ed. Springer.

Formal Systems Europe Ltd. Failures-Divergence Refinement - FDR User Manual.

Giunchiglia, F. and Traverso, P. 1999. Planning as Model Checking. In Recent advances in AI Planning. Lecture Notes in Artificial Intelligence, vol. 1809. Springer.

Giunchiglia, F. and Traverso, P. 2000. A partial order approach to branching time temporal logic model checking. In Proceedings of the 5th European Conference on Planning. Recent Advances in AI Planning. Springer.

Hoare, C. 1985. Communicating Sequential Processes. Prentice-Hall, Englewood Cliffs, NJ.

Holzmann, G. 1997. The model checker SPIN. IEEE Transactions on Software Engineering 23, 5, 279–295.

Leuschel, M. and Butler, M. 2003. ProB: A model checker for B. In FME 2003: Formal Methods. Lecture Notes in Computer Science, vol. 2805. Springer, Heidelberg, 855–874.

Leuschel, M., Butler, M., and Presti, S. L. 2005. ProB User Manual. University of Southampton, UK.

Leuschel, M., Massart, T., and Currie, A. 2001. How to Make FDR Spin: LTL Model Checking of CSP by Refinement. In International Symposium of Formal Methods Europe. Lecture Notes in Computer Science, vol. 2021. Springer, 99–118.

Manzo, M. D., Giunchiglia, E., and Ruffino, S. 1999. Planning via Model Checking in Deterministic Domains: Preliminary Report. In Proceedings of the AIMSA ’98 Conference. Number 1480 in Lecture Notes in Artificial Intelligence. Springer, Heidelberg, 221–229.

Marinagi, C., Panayiotopoulos, T., and Spyropoulos, C. 2005. AI Planning and Intelligent Agents. Idea Group Inc., 225–257.

McMillan, K. 1992. Symbolic Model Checking: An Approach to the State Explosion Problem. Ph.D. thesis, Carnegie Mellon University.

McMillan, K. 1993. Symbolic Model Checking. Kluwer Academic Press, Massachusetts.

Pistore, M. and Traverso, P. 2000. Planning as Model Checking for Extended Goals in Non-deterministic Domains. Tech. rep., ITC-irst, Trento, Italy. Technical Report 0101-03.

Tatibouet, B. 2001. The JBTools Package. Available at http://lifc.univfcomte.fr/PEOPLE/tatibouet/JBTOOLS/BParser en.html.

Wallace, M. 1998. Constraint Programming. CRC Press LLC, 17.1–17.17.

Wordsworth, J. 1996. Software Engineering with B. Addison-Wesley.

Appendix A (ProB code)

MACHINE Soldiers
SETS
  MEN = {m1,m2,m3,m4}
VARIABLES
  minutes,
  timer,
  safe,
  total_time,
  torch
INVARIANT
  torch: NATURAL &
  minutes: NATURAL &
  safe <: MEN &
  timer: MEN --> NATURAL1 &
  total_time: NATURAL &
  total_time > 60   /* violated once all soldiers are safe within 60 minutes */
INITIALISATION
  safe:= {} ||   /* soldiers are all on the unsafe side */
  timer:= {m1|-> 5, m2|-> 10, m3|-> 20, m4|-> 25} ||
  minutes:= 0 ||
  total_time:= 61 ||
  torch:= 0
OPERATIONS
  /* two soldiers cross with the torch at the pace of the slower one */
  unsafe_to_safe = PRE
    card(safe) /= 4 & torch = 0
  THEN
    ANY s1, s2 WHERE
      (s1 /: safe) & (s2 /: safe) & (s1 /= s2)
    THEN
      safe:= safe \/ {s1,s2} ||
      torch:= 1 ||
      IF timer(s1) > timer(s2)
      THEN minutes:= minutes + timer(s1)
      ELSE minutes:= minutes + timer(s2)
      END
    END
  END;

  /* one soldier carries the torch back */
  safe_to_unsafe = PRE
    card(safe) /= 4 & torch = 1
  THEN
    ANY s1 WHERE s1: safe
    THEN
      safe:= safe - {s1} ||
      minutes:= minutes + timer(s1) ||
      torch:= 0
    END
  END;

  /* record the elapsed time once everyone is safe */
  within_time = PRE
    card(safe) = 4
  THEN
    total_time:= minutes
  END
END
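The state space described by the machine above is small enough to explore directly. The Python sketch below is an illustration only and is not the search performed by ProB. It mirrors the two crossing operations (two soldiers cross to the safe side, one returns with the torch) and uses Dijkstra's algorithm to find the least total crossing time. The function name and the dictionary encoding of the timer are assumptions.

```python
import heapq
from itertools import combinations

def min_crossing_time(times):
    """Least total minutes for all soldiers to reach the safe side, or None.

    `times` maps each soldier to a crossing time, like `timer` in the machine.
    """
    men = frozenset(times)
    start = (frozenset(), 0)        # (safe set, torch side); 0 = unsafe side
    best = {start: 0}
    heap = [(0, 0, start)]          # (minutes, tie-breaker, state)
    counter = 1
    while heap:
        minutes, _, (safe, torch) = heapq.heappop(heap)
        if safe == men:
            return minutes
        if minutes > best[(safe, torch)]:
            continue                # stale queue entry
        if torch == 0:
            # unsafe_to_safe: two distinct soldiers cross at the slower pace
            moves = [(safe | {a, b}, 1, max(times[a], times[b]))
                     for a, b in combinations(sorted(men - safe), 2)]
        else:
            # safe_to_unsafe: one soldier carries the torch back
            moves = [(safe - {a}, 0, times[a]) for a in sorted(safe)]
        for new_safe, new_torch, cost in moves:
            state = (new_safe, new_torch)
            if minutes + cost < best.get(state, float("inf")):
                best[state] = minutes + cost
                heapq.heappush(heap, (minutes + cost, counter, state))
                counter += 1
    return None
```

With the crossing times of the machine, `min_crossing_time({"m1": 5, "m2": 10, "m3": 20, "m4": 25})` returns 60. A plan within the 60-minute limit therefore exists, and it is such a plan that shows up as a violation of the invariant `total_time > 60`.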

Appendix B (NuSMV code)

MODULE main
VAR
  total_time: 0..61;                    -- elapsed minutes, capped at 61
  torch: boolean;                       -- side the torch is on
  s: array 1..4 of boolean;             -- side each soldier is on
  t: array 1..4 of {5,10,20,25};        -- crossing time of each soldier
  nexttime: {0,5,10,20,25};             -- duration of the chosen move
  move: {0,1,2,3,4,12,13,14,23,24,34};  -- digits name the soldier(s) crossing
ASSIGN
  init(total_time):=0;
  init(torch):= 0;
  init(s[1]):= 0;
  init(s[2]):= 0;
  init(s[3]):= 0;
  init(s[4]):= 0;
  t[1]:= 5;
  t[2]:= 10;
  t[3]:= 20;
  t[4]:= 25;
  -- a soldier may move only if he is on the same side as the torch
  next(move):= case
      torch=s[1] :1;
      1 :move;
    esac union
    case
      torch=s[2] :2;
      1 :move;
    esac union
    case
      torch=s[3] :3;
      1 :move;
    esac union
    case
      torch=s[4] :4;
      1 :move;
    esac union
    case
      torch=s[1]&torch=s[2] :12;
      1 :move;
    esac union
    case
      torch=s[1]&torch=s[3] :13;
      1 :move;
    esac union
    case
      torch=s[1]&torch=s[4] :14;
      1 :move;
    esac union
    case
      torch=s[2]&torch=s[3] :23;
      1 :move;
    esac union
    case
      torch=s[2]&torch=s[4] :24;
      1 :move;
    esac union
    case
      torch=s[3]&torch=s[4] :34;
      1 :move;
    esac union
    move;
  -- a soldier who takes part in the move changes side with the torch
  next(s[1]):= case
      torch=s[1] & (next(move)=1|next(move)=12|
                    next(move)=13|next(move)=14) :next(torch);
      1:s[1];
    esac;
  next(s[2]):= case
      torch=s[2] & (next(move)=2|next(move)=12|
                    next(move)=23|next(move)=24) :next(torch);
      1:s[2];
    esac;
  next(s[3]):= case
      torch=s[3] & (next(move)=3|next(move)=13|
                    next(move)=23|next(move)=34) :next(torch);
      1:s[3];
    esac;
  next(s[4]):= case
      torch=s[4] & (next(move)=4|next(move)=14|
                    next(move)=24|next(move)=34) :next(torch);
      1:s[4];
    esac;
  -- a pair crosses at the pace of the slower soldier
  next(nexttime) := case
      next(move)=1:t[1];
      next(move)=2:t[2];
      next(move)=3:t[3];
      next(move)=4:t[4];
      next(move)=12 & t[1]>t[2]:t[1];
      next(move)=12:t[2];
      next(move)=13 & t[1]>t[3]:t[1];
      next(move)=13:t[3];
      next(move)=14 & t[1]>t[4]:t[1];
      next(move)=14:t[4];
      next(move)=23 & t[2]>t[3]:t[2];
      next(move)=23:t[3];
      next(move)=24 & t[2]>t[4]:t[2];
      next(move)=24:t[4];
      next(move)=34 & t[3]>t[4]:t[3];
      next(move)=34:t[4];
      1:0;
    esac;
  next(total_time):= case
      total_time+next(nexttime)>60 :61;
      1:total_time+next(nexttime);
    esac;
  next(torch):=!torch;

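-- negation of the goal: a counterexample is a plan that gets all
-- four soldiers to the safe side within 60 minutes
INVARSPEC
  (s[1] & s[2] & s[3] & s[4]) -> total_time > 60
--LTLSPEC !F(s[1] & s[2] & s[3] & s[4] & total_time<=60)
--CTLSPEC AG((s[1] & s[2] & s[3] & s[4]) -> total_time>60)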