Log auditing through model-checking


http://www.lsv.ens-cachan.fr/Publis/

In Proc. 14th IEEE Computer Security Foundations Workshop (CSFW'01), Cape Breton, Nova Scotia, Canada, June 2001, pages 220-236. IEEE Comp. Soc. Press, 2001.

Log Auditing through Model-Checking*

Muriel Roger
GIE Dyade, INRIA Rocquencourt
Domaine de Voluceau B.P. 105
78153 Le Chesnay Cedex, France
[email protected]

Jean Goubault-Larrecq
LSV, ENS Cachan
61, av. du président-Wilson
94235 Cachan Cedex, France
[email protected]

Abstract

Log auditing is a basic intrusion detection mechanism, whereby attacks are detected by uncovering matches of sequences of events against signatures. We argue that this is naturally expressed as a model-checking problem against linear Kripke models. A variant of the classic linear time temporal logic of Manna and Pnueli with first-order variables is first investigated in this framework. But this logic is in dire need of refinement, as far as expressiveness and efficiency are concerned. We therefore propose a second, less standard logic consisting of flat, Wolper-style linear-time formulae. We describe an efficient on-line algorithm, making the approach attractive for complex log auditing tasks. We also present a few optimizations that the use of a formal semantics affords us.

1 Introduction

Keeping and managing event logs is a standard and fairly universal way of ensuring basic security, whether at the application, system or network level. In particular, it is a cornerstone of intrusion detection, which relies on extracting useful information on potential or actual intruders to react accordingly.

Analyzing logs, however, is hard. Detecting intrusion patterns by hand quickly becomes infeasible as logs grow. Most intrusion detection systems include filtering and counting mechanisms [14, 17], but this is not enough in general to eliminate false positives, and new mechanisms that attempt to detect combinations of patterns throughout the logs are required. To take an example from [13], assume we wish to detect an intruder exploiting an old sendmail bug on Unix: in this attack, an intruder copies some shell to /usr/spool/mail/root at a time when the latter does not exist, sets its setuid bit¹, then sends a fake e-mail message to root. On old implementations, when root thereafter attempted to read his mail, the ownership of /usr/spool/mail/root was simply switched to root, making a setuid-bit copy of a shell available to the intruder. Assume these events are logged. Detecting copies of shell files is a good clue that this attack is being attempted, and so is detecting that a non-root user is changing setuid bits. However, as systems administrators we would like to be warned (automatically, if possible) only when the same user does both. Reports of one action without the other are false positives, where we are warned against a non-existent attack. We may also want to require that an e-mail was indeed sent to root after these two events happened. So we are looking at correlations between different entries in the log (the user has to be the same in each of the copy and setuid events), together with constraints on the order in which events occur in the log.

Our point is that signatures, i.e., specifications of attack patterns, are best expressed in a logic including temporal connectives to express ordering of events. This allows one to describe attacks in a declarative way, free of implementation decisions. As in programming languages, using a declarative language allows one to focus on what to monitor instead of how to monitor. This makes signatures easier to write and to understand, and improves maintainability of signature files.

Checking a log against temporal logic signatures will then be a model-checking problem. While model-checking may have high complexity, we shall pay special attention to efficiency. In particular, we shall see that relying on a logical language with a well-defined semantics allows the log auditing engine (the model-checker) to benefit from several optimizations. This is entirely automated, hence safe, contrary to more imperative languages like Russel [13], where defining and applying these optimizations must be done by hand, and is therefore error-prone.

* This work was done as part of Dyade, a common venture between Bull S.A. and INRIA.

¹ In Unix, executables normally run with the rights of the user who launched them. However, files with the setuid bit set will run under the identity of their owner. In particular, a shell owned by root with the setuid bit set will give any user root privilege.

We wish to do both on-line and off-line log auditing. In the latter, the log auditing engine is given one log file in its entirety (the traditional approach in model-checking). In on-line log auditing, we require the engine to detect which formulae hold, and when, as soon as possible while the log fills in. Although off-line auditing is useful, on-line auditing is even more so, but places more stringent efficiency constraints on the auditing engine. Our model-checking algorithms will work in both settings. In particular, our second algorithm definitely is on-line, in a precise sense (Theorem 3.12).

The plan of the paper is as follows. After reviewing related work, we explore the use of standard linear-time temporal logic with first-order variables for log auditing in Section 2. This will give the reader a flavor of what temporal logic is, and how we can model-check temporal signatures against (on-line) logs efficiently, or at least as efficiently as possible: we shall show that the model-checking problem for this logic is NP-complete. We identify the shortcomings of this logic, and consequently refine our temporal logic in Section 3, improving the previous approach in many respects. Although, technically, we might have presented the logic, semantics and algorithms of Section 3 directly, the logic of Section 2 is simpler, gives the right intuitions, and justifies many choices leading to Section 3. We report on preliminary practical results and conclude in Section 4.

Related Work. There are many different approaches to intrusion detection, which may be classified into anomaly detection and misuse detection systems. The former are based on normal activity profiles and detect deviations from these profiles. They are well-suited to detect previously unknown attacks, but usually generate many false positives. On the other hand, misuse detection systems are usually signature-based mechanisms that allow one to keep the number of false positives low, but can only detect behaviors obeying known patterns.

Our work lies clearly in the signature-based, misuse detection field. To be more precise, it is based on a detection language (taking the terminology of [6]), which will take the form of a temporal logic. There are now a handful of proposals for detection systems and languages, among which NFR and its N-code [16], Emerald and P-BEST [9], IDIOT [4], STATL [6], ASAX and Russel [13]. Temporal logics have the advantage that they are a high-level, compact and mostly readable notation for events occurring as time passes [10].

Amongst those, the systems closest to our work are IDIOT, STATL and ASAX, and we have borrowed some ideas from each. Our algorithms are inspired mostly by the technique of enabled rules lists of Russel [13], which are already known to provide fast detection algorithms in practice. Efficiency is indeed one of our concerns.

However, as Mounji notes, Russel is very low level, and would benefit from a higher-level language that would compile to Russel (as in [5], notably). Instead of compiling to Russel, we choose to provide our own low-level auditing engine, because it requires specific data structures that would be hard (although probably not impossible) to code in Russel.

Our high-level language is, as announced, a temporal language, which seems to make it quite different from the state-transition diagrams of STAT (State Transition Analysis [15]) and its variants, and also from the colored Petri net approach advocated in [8] and used in IDIOT [4]. In fact, the formulae of the particular brand of temporal logic that we use in Section 3 are automata, that is, state-transition diagrams. While STATL relies both on events (from an on-line log) and states (snapshots of the system to monitor), our work focuses on events alone. This allows us to do not only on-line but also off-line log auditing. On the other hand, we are not able to reason on states. In principle, as shown by Mounji [13], it is possible to piggyback a system like ASAX or ours with a static auditing engine, in charge of maintaining a consistent view of the system. While this is easy to do (embed static queries inside the guards of Section 3.1), this would draw us away from our main point, which is studying log auditing from the angle of model-checking.

One important point in applying model-checking to log auditing is to give a clean formal semantics to our signature language. None of the tools and languages above rests on a formal semantics: it is assumed that the users of intrusion detection systems will somehow grasp, by intuition or experiment, how rules match, in which order they are executed, which have precedence over which others, etc. The only exception we are aware of is Russel, which however is of a rather low level.

Having a formal semantics gives us a reference point for understanding signatures and reasoning about them, and has proved crucial already in other domains of computer science, in particular in the design of programming languages. Notably, this paves the way for other tools on top of the language, e.g., optimizers, debuggers, test case generators, etc. This already proved beneficial in our case by allowing our tool to apply non-trivial optimizations in the course of detection (Section 3.3).

On the other hand, model-checking temporal logics is a well-studied topic [10]. As we use variants of linear-time temporal logic, it would be natural to use the standard automata-theoretic model-checking algorithm [21]: given a formula $F$, build a Büchi automaton $\mathcal{A}(\neg F)$ recognizing exactly the models of $\neg F$, then check that the synchronized product of $\mathcal{A}(\neg F)$ with the automaton $\mathcal{M}$ describing the current model is empty. However, this works for propositional linear-time temporal logic, while we require a fragment of first-order temporal logic, in which case $\mathcal{A}(\neg F)$ would be an infinite automaton. Our algorithms can be seen as constructing finite portions of this infinite automaton on demand.

2 Linear Temporal Logic for Log Auditing

Consider again the sendmail attack of the introduction. Its signature can be described informally by "find some event in the log stating that some user X copied some file to /usr/spool/mail/root, followed by some event where X sets its setuid bit through chown or fchown, followed by some event stating that /usr/spool/mail/root changed ownership to root". This is a temporal formula, so it seems a good idea to investigate temporal logics to specify such signatures. Since our Kripke models (the logs) are linear sequences of events, the classic linear-time temporal logic of [11] seems a good first choice. Finding whether a given signature matches some events in the log is then just the model-checking problem for formulae over linear Kripke structures. (This choice will turn out to be rather imperfect. However it is worth investigating, if only because it is well-known and will serve as a stepping stone to our more complex logic of Section 3.)

A final requirement for our logic is as follows. Look back at the mail example: we want to be able to check that the user who copies some file to /usr/spool/mail/root is the same as the one who later sets its setuid bit. So we need a temporal logic that is not just propositional temporal logic, but has first-order variables. The syntax and semantics of this logic are described in Section 2.1. Our model-checking algorithm for this logic is described in Section 2.2. We conclude by reporting on experience with this logic in Section 2.3. This will serve to refine our approach in Section 3.

2.1 Syntax and Semantics

Linear-time temporal logic formulae $F$, $G$, ..., are given by the following grammar:

$$\begin{array}{rcll}
F, G & ::= & A & \text{atomic formulae} \\
 & \mid & \neg F & \text{negation ("not")} \\
 & \mid & F \wedge G & \text{conjunction ("and")} \\
 & \mid & F \vee G & \text{disjunction ("or")} \\
 & \mid & \Diamond F & \text{(strict) sometimes}
\end{array}$$

where it only remains to specify what atomic formulae are. The latter denote the basic observations we wish to make on states of Kripke structures, i.e., on logged events. Now logged events are records:

Definition 2.1 (Record) A record $R$ is a finite map from a set $\mathcal{I}$ of labels to a set of values, which we shall take to be strings. The value $R(id)$, where $id \in \mathcal{I}$, will also be noted $R.id$ for convenience. The domain $\mathrm{dom}\, R$ is the set of labels for which $R.id$ is defined.

We use record patterns as atomic formulae to observe events, i.e., records. For example, the record pattern {id=X, action="creat", object="^/usr/spool/mail/root$"} matches any record containing a field id that matches the value of variable X, a field named action matching the regular expression creat (i.e., containing the string creat: we define regular expression matching as finding some substring that is in the language defined by the regular expression, as in [20]), a field named object matching ^/usr/spool/mail/root$ (i.e., exactly equal to the string /usr/spool/mail/root), and which finally may contain other fields.

Formally, record patterns are lists of rows $id = pat$. In a row, $id$ is a label and $pat$ is a field pattern defined as either a regular expression $r$ (between double quotes), or a variable name $X$. We assume no two rows in a given record pattern have the same label.

Our Kripke structures are logs:

Definition 2.2 (Log) A log $L$ is a finite or infinite sequence of records. We let $L_i$ be the $i$th record in $L$, if defined. The domain $\mathrm{dom}\, L$ of $L$ is the set of indices $i$ for which $L_i$ is defined. The length $|L|$ of $L$ is the sup of all indices $i$ such that $L_i$ is defined.

The reason why we allow the log to be infinite is to handle on-line model-checking: in this setting, the actual log is only a finite prefix of a possibly infinite sequence of events, which grows through time.

Assume a regular expression pattern-matching function $\mathit{rematch}$ that takes a string $s$ and a regular expression $r$ and returns true if and only if $s$ matches $r$. Then:

Definition 2.3 (Kripke Semantics) An environment $\rho$ is any map from variables to string values.

A field pattern $pat$ matches a string $s$ under $\rho$ if and only if either $pat$ is a variable $X$ and $\rho(X) = s$, or $pat$ is a regular expression and $\mathit{rematch}(s, pat)$ is true.

Define the relation $L, \rho, i \models F$ on formulae $F$ ($i \in \mathrm{dom}\, L$), as:

• $L, \rho, i \models \{id_1 = pat_1, \ldots, id_n = pat_n\}$ iff for all $j$, $1 \le j \le n$, $L_i.id_j$ is defined and $pat_j$ matches $L_i.id_j$ under $\rho$;

• $L, \rho, i \models \neg F$ iff $L, \rho, i \not\models F$;

• $L, \rho, i \models F \wedge G$ iff $L, \rho, i \models F$ and $L, \rho, i \models G$;

• $L, \rho, i \models F \vee G$ iff $L, \rho, i \models F$ or $L, \rho, i \models G$;

• $L, \rho, i \models \Diamond F$ iff for some $j > i$, $j \in \mathrm{dom}\, L$: $L, \rho, j \models F$.

2.2 Algorithm
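As a reference point for the algorithm of this section, the semantics of Definition 2.3 can be read directly as a naive recursive checker for a fixed environment and a finite log. The following Python sketch is only an illustration: the tuple encoding of formulae, the names `holds` and `field_matches`, and the use of Python's `re.search` as the regular expression matching function are assumptions, not part of the paper.

```python
import re

def field_matches(pat, s, env):
    """pat is ("var", name) or a regex string; env maps variable names to strings."""
    if isinstance(pat, tuple):                 # variable X: rho(X) must equal s
        return env.get(pat[1]) == s
    return re.search(pat, s) is not None       # regex: some substring must match

def holds(log, env, i, f):
    """Decide L, rho, i |= F on a finite log; i is a 1-based line number."""
    op = f[0]
    if op == "pat":                            # record pattern {id_1 = pat_1, ...}
        record = log[i - 1]
        return all(label in record and field_matches(p, record[label], env)
                   for label, p in f[1].items())
    if op == "not":
        return not holds(log, env, i, f[1])
    if op == "and":
        return holds(log, env, i, f[1]) and holds(log, env, i, f[2])
    if op == "or":
        return holds(log, env, i, f[1]) or holds(log, env, i, f[2])
    if op == "sometime":                       # strict: only lines j > i count
        return any(holds(log, env, j, f[1]) for j in range(i + 1, len(log) + 1))
    raise ValueError(f"unknown connective: {op!r}")

# Hypothetical example: a failed connection followed later by a successful one.
log = [
    {"op": "connection", "result": "fail", "subject": "Joe"},
    {"op": "connection", "result": "pass", "subject": "Joe"},
    {"op": "exec", "result": "pass", "object": "emacs", "subject": "Joe"},
]
X = ("var", "X")
f1 = ("pat", {"op": "connection", "result": "fail", "subject": X})
f2 = ("pat", {"op": "connection", "result": "pass", "subject": X})
f0 = ("and", f1, ("sometime", f2))
print(holds(log, {"X": "Joe"}, 1, f0))
```

This checker needs the environment up front and re-scans the rest of the log for every "sometime", which is exactly what a one-pass, on-line algorithm has to avoid.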

Our algorithm takes as input a finite list of temporal formulae $F_1, \ldots, F_m$ and a finite log, and returns which formulae hold at which states of the log and with which environments. Recall that this algorithm must be on-line, i.e., it must output a positive match as soon as it can. In particular, it must work in one pass over the log.

The algorithm is inspired from the implementation of Russel [13], and uses two lists of formulae, $\mathit{Current}$ and $\mathit{Next}$. Given that we are currently reading entry $L_i$ of the log $L$, $\mathit{Current}$ holds all formulae $F$ such that we would like to know whether $L, \rho, i \models F$. This may depend on deciding questions of the form $L, \rho', i+1 \models F'$; e.g. if $F = \Diamond G$ then $L, \rho, i \models F$ if and only if $L, \rho, i+1 \models G$ or $L, \rho, i+1 \models \Diamond G$. Instead of deciding right away whether the latter hold, we stash such formulae $F'$ into the $\mathit{Next}$ list. When we have dealt with all formulae in $\mathit{Current}$, we advance to entry $L_{i+1}$ in $L$ (provided it exists), move all formulae from $\mathit{Next}$ to $\mathit{Current}$, set $\mathit{Next}$ to the empty list, and start again.

For example, consider the formula $F_0$:

{op="connection", result="fail", subject=X} $\wedge\ \Diamond$ {op="connection", result="pass", subject=X}

which describes the signature where user X (whoever it might be) fails to connect, then later manages to connect. Write $F_1$ for the first record pattern above, $F_2$ for the second, so $F_0 = F_1 \wedge \Diamond F_2$. Consider the log:

1 op="connection", result="fail", subject="Joe"

2 op="connection", result="pass", subject="Joe"

3 op="exec", result="pass", object="emacs", subject="Joe"

Here is how our algorithm runs. First $\mathit{Next}$ is initialized to the empty list, $\mathit{Current}$ to the list containing only formula $F_0$, i.e., $F_1 \wedge \Diamond F_2$, and we read the first record $L_1$. $F_1 \wedge \Diamond F_2$ holds at state $L_1$ if and only if both $F_1$ and $\Diamond F_2$ hold at state $L_1$, so we remove $F_1 \wedge \Diamond F_2$ from $\mathit{Current}$, and add both $F_1$ and $\Diamond F_2$ to $\mathit{Current}$. Consider $F_1$, and remove it from $\mathit{Current}$: using $\mathit{rematch}$, it is easy to see that $F_1$ is satisfied at $L_1$ if and only if X has value "Joe". Now consider the other formula in $\mathit{Current}$, $\Diamond F_2$, and remove it. There is no way to decide whether it holds at $L_1$ by just looking at $L_1$. Since $L, \rho, i \models \Diamond F_2$ if and only if $i+1 \in \mathrm{dom}\, L$ and either $L, \rho, i+1 \models F_2$ or $L, \rho, i+1 \models \Diamond F_2$, we add $F_2$ and $\Diamond F_2$ to $\mathit{Next}$, therefore postponing the examination of $F_2$ and $\Diamond F_2$ to the next stage. (The careful reader may have noticed that conjunctions and disjunctions are treated alike: whether we encounter $F \wedge G$ or $F \vee G$ in $\mathit{Current}$, we generate both $F$ and $G$. The difference will be handled through additional data structures, see below.) At this point, $\mathit{Next}$ contains $F_2$ and $\Diamond F_2$, and $\mathit{Current}$ is empty, so our work on $L_1$ is complete.

So we move $\mathit{Next}$ to $\mathit{Current}$, empty $\mathit{Next}$, and read $L_2$. $\mathit{Current}$ contains just $F_2$ and $\Diamond F_2$, and we try to see which hold at $L_2$. First take $F_2$ and remove it from $\mathit{Current}$: it holds exactly when X equals "Joe". So $\Diamond F_2$ holds at $L_1$ with X equal to "Joe", and since $F_1$ also holds at $L_1$ with X equal to the same value, the formula $F_0$ holds at $L_1$ with X equal to "Joe". (We call this argument a deduction chain.) The algorithm then goes on finding other environments and states at which the input formula $F_0$ holds, in the same spirit.

The algorithm requires additional structures. First, we need to represent the sets $E_{F,i}$ of environments $\rho$ such that $L, \rho, i \models F$. Without negation, it is enough to use sets $E_{F,i}$ of the form $\{\rho \mid \rho \supseteq \rho_0\}$, where $\rho_0$ is a single partial environment, i.e., a finite map from variables to values, and $\rho \supseteq \rho_0$ ("$\rho$ extends $\rho_0$") if and only if $\rho|_{\mathrm{dom}\, \rho_0} = \rho_0$. This would allow us to represent sets $E_{F,i}$ as a single partial environment $\rho_0$. With negation, we use additional environments $\rho_1, \ldots, \rho_n$ as negative constraints, and represent $E_{F,i}$ as $\rho_0 - \rho_1, \ldots, \rho_n \triangleq \{\rho \mid \rho \supseteq \rho_0 \text{ and } \rho \not\supseteq \rho_1 \text{ and } \ldots \text{ and } \rho \not\supseteq \rho_n\}$.

Definition 2.4 (Constraints) A constraint $C$ is either the special symbol $\bot$, or an $(n+1)$-tuple $\rho_0 - \rho_1, \ldots, \rho_n$ of partial environments; $\rho_0$ is the positive constraint, and $\rho_1, \ldots, \rho_n$ are the negative constraints. When $n = 0$, we allow the shorthand $\rho_0$.

The environment $\rho$ satisfies $C$, and we write $\rho \models C$, if and only if $C$ is of the form $\rho_0 - \rho_1, \ldots, \rho_n$, and $\rho \supseteq \rho_0$, and $\rho \not\supseteq \rho_i$ for all $i$, $1 \le i \le n$. $C$ is satisfiable if and only if $\rho \models C$ for some environment $\rho$.

The satisfiability of constraints is decidable in polynomial time. Indeed, it is easy to see that $\rho_0 - \rho_1, \ldots, \rho_n$ is unsatisfiable if and only if there is an $i$, $1 \le i \le n$, such that $\mathrm{dom}\, \rho_i \subseteq \mathrm{dom}\, \rho_0$ and for every $x \in \mathrm{dom}\, \rho_i$, $\rho_i(x) = \rho_0(x)$.
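The satisfiability criterion above is simple enough to sketch in a few lines of Python. The representation chosen here (a constraint as a pair of a positive dictionary and a list of negative dictionaries, `None` for the unsatisfiable symbol) and the function names are assumptions made for illustration only.

```python
BOTTOM = None  # stands for the special unsatisfiable constraint symbol

def extends(env, partial):
    """True iff env agrees with partial on every variable that partial defines."""
    return all(x in env and env[x] == v for x, v in partial.items())

def satisfies(env, constraint):
    """env satisfies the constraint: it extends the positive part and no negative part."""
    if constraint is BOTTOM:
        return False
    pos, negs = constraint
    return extends(env, pos) and not any(extends(env, neg) for neg in negs)

def satisfiable(constraint):
    """Unsatisfiable iff some negative environment is contained in the positive one."""
    if constraint is BOTTOM:
        return False
    pos, negs = constraint
    return not any(extends(pos, neg) for neg in negs)

print(satisfiable(({"X": "Joe"}, [{"X": "Ann"}])))   # a negative part that disagrees
print(satisfiable(({"X": "Joe"}, [{"X": "Joe"}])))   # a negative part contained in the positive one
```

The check is one pass over the negative constraints, each compared against the positive one, which is where the polynomial bound comes from. It relies on the set of possible string values being infinite, so that a variable left free by the positive part can always be given a value that avoids every negative constraint.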

Second, we need to maintain deduction chains. They are organized as an and-or tree, representing deductions such as "$F \wedge G$ holds at $L_i$ if both $F$ and $G$ do" (and branching), or "$\Diamond F$ holds at $L_i$ if either $F$ holds at $L_{i+1}$, or $\Diamond F$ holds at $L_{i+1}$" (or branching). To abstract away from implementation details, we shall use the paradigm of logic programming, which is related to and-or trees in an essential way [18]. With each formula $F$ occurring in $\mathit{Current}$ at record $i$, associate a propositional variable $q_{F,i}$, whose intended meaning is "$F$ holds at $L_i$". Deduction chains are maintained through the use of constrained clauses of the form $q_{F_0,i_0} \leftarrow q_{F_1,i_1}, \ldots, q_{F_k,i_k} \mid C$, meaning "$L, \rho, i_0 \models F_0$, provided $L, \rho, i_j \models F_j$ for every $j$, $1 \le j \le k$, and $\rho \models C$". When $C$ is the true constraint $\top$ (no negative constraint, empty positive constraint), we omit the $\mid C$ part; we also omit the $\leftarrow$ sign if $C$ is true and $k = 0$. Constrained clauses are generated by expansion rules for formulae $F$ at record $i$ as shown in Figure 1.

Definition 2.5 (On-line Model-Checking Algorithm) Given temporal formulae $F_1, \ldots, F_m$ (the signature file), and a log $L$, the model-checking algorithm operates as in Figure 2. We say that $\mathit{match}(\{id_1 = pat_1, \ldots, id_n = pat_n\}, R)$ succeeds if there is an environment $\rho$ such that $R.id_j$ is defined and $pat_j$ matches $R.id_j$ under $\rho$ for every $j$, $1 \le j \le n$; then we let it be the partial environment that maps every $pat_j$ that is a variable to $\rho(pat_j)$.
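The matching of a record pattern against a single record, as used in Definition 2.5, might be sketched as follows in Python. The encoding is an assumption made for illustration: a pattern is a dictionary from labels to field patterns, a variable is the tuple `("var", name)`, a regular expression is a plain string, and `re.search` plays the role of the regular expression matching function.

```python
import re

def match(pattern, record):
    """Return the partial environment binding the pattern's variables, or None on failure."""
    env = {}
    for label, pat in pattern.items():
        if label not in record:                 # the field must be defined in the record
            return None
        value = record[label]
        if isinstance(pat, tuple):              # variable ("var", name)
            # A variable used in two rows must be bound to one and the same value.
            if env.setdefault(pat[1], value) != value:
                return None
        elif re.search(pat, value) is None:     # regex: substring matching
            return None
    return env

# Hypothetical record for the sendmail example; extra fields are allowed.
creat = {"id": ("var", "X"), "action": "creat", "object": "^/usr/spool/mail/root$"}
event = {"id": "joe", "action": "creat", "object": "/usr/spool/mail/root", "time": "12:00"}
print(match(creat, event))
```

Returning only the bindings of the pattern's own variables, rather than a full environment, is what lets the algorithm treat the result as the positive part of a constraint.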

Constrained clauses are used to handle deduction chains by constrained resolution. Call any clause of the form $q_{F,i} \mid C$ a fact. Given a set $S$ of constrained clauses, the immediate consequence operator $T_S$ maps sets of facts to sets of facts by:

$$T_S(A) \triangleq \{\, q_{F_0,i_0} \mid C_0 \wedge C_1 \wedge \cdots \wedge C_k \;:\; (q_{F_0,i_0} \leftarrow q_{F_1,i_1}, \ldots, q_{F_k,i_k} \mid C_0) \in S,\ (q_{F_1,i_1} \mid C_1) \in A,\ \ldots,\ (q_{F_k,i_k} \mid C_k) \in A \,\}.$$

The conjunction of two constraints is defined by $C \wedge \bot \triangleq \bot \wedge C \triangleq \bot$, and $(\rho_0 - \rho_1, \ldots, \rho_n) \wedge (\rho'_0 - \rho'_1, \ldots, \rho'_{n'})$ is $(\rho_0 \cup \rho'_0) - \rho_1, \ldots, \rho_n, \rho'_1, \ldots, \rho'_{n'}$ provided $\rho_0$ and $\rho'_0$ agree on $\mathrm{dom}\, \rho_0 \cap \mathrm{dom}\, \rho'_0$; otherwise, the conjunction is $\bot$. The facts deduced from $S$ are the elements of $\bigcup_{k \ge 0} T_S^k(\emptyset)$. The algorithm of Definition 2.5 is completed by a reporting procedure that outputs all facts of the form $q_{F_j,i} \mid C$ where $C \triangleq \rho_0 - \rho_1, \ldots, \rho_n$ is satisfiable, $1 \le j \le m$, $i \in \mathrm{dom}\, L$ ("Attack #$j$ starting at line $i$ with values: $\rho_0$"). We stress that our use of constraint logic programming is mostly a means of explanation, and that these facts are deduced incrementally, using a variant of and-or trees, as the algorithm runs.
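To make constrained resolution concrete, here is a small Python sketch of the conjunction of constraints and of a naive fixpoint iteration of the immediate consequence operator. The hashable encoding of constraints, the helper `constraint`, the atom names such as `"<>F2"` and the hand-written clause set (the clauses one would expect for the connection example at lines 1 and 2) are all assumptions for illustration. Facts whose constraint is the unsatisfiable symbol are dropped early, a small deviation that does not change what can be reported.

```python
from itertools import product

BOTTOM = None  # the unsatisfiable constraint

def constraint(pos=None, *negs):
    """Build a hashable constraint from dicts: positive part, then negative parts."""
    return (frozenset((pos or {}).items()),
            frozenset(frozenset(n.items()) for n in negs))

TRUE = constraint()  # empty positive part, no negative part

def conj(c1, c2):
    """Conjunction of two constraints, BOTTOM if the positive parts disagree."""
    if c1 is BOTTOM or c2 is BOTTOM:
        return BOTTOM
    (p1, n1), (p2, n2) = c1, c2
    merged = dict(p1)
    for x, v in p2:
        if merged.setdefault(x, v) != v:       # same variable, different values
            return BOTTOM
    return (frozenset(merged.items()), n1 | n2)

def deduce(clauses):
    """Iterate the immediate consequence operator from the empty set to a fixpoint.
    Each clause is (head_atom, list_of_body_atoms, constraint)."""
    facts = set()
    while True:
        new = set()
        for head, body, c0 in clauses:
            choices = [[c for atom, c in facts if atom == b] for b in body]
            for combo in product(*choices):    # one known fact per body atom
                c = c0
                for ci in combo:
                    c = conj(c, ci)
                if c is not BOTTOM:
                    new.add((head, c))
        if new <= facts:
            return facts
        facts |= new

joe = constraint({"X": "Joe"})
clauses = [
    (("F0", 1), [("F1", 1), ("<>F2", 1)], TRUE),   # and branching
    (("F1", 1), [], joe),                          # fact from matching line 1
    (("<>F2", 1), [("F2", 2)], TRUE),              # or branching, first alternative
    (("<>F2", 1), [("<>F2", 2)], TRUE),            # or branching, second alternative
    (("F2", 2), [], joe),                          # fact from matching line 2
]
facts = deduce(clauses)
print((("F0", 1), joe) in facts)
```

This batch iteration recomputes everything at each round and is meant only to mirror the definition; it is not the incremental and-or tree procedure that the text describes.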

The algorithm is correct and complete in the following sense:

Theorem 2.6 (Termination, Soundness, Completeness) Assume $L$ is finite, $s \triangleq |L|$, and $p \triangleq \sum_{j=1}^{m} |F_j|$, where $|F|$ denotes the size of the formula $F$. Then the algorithm of Definition 2.5 terminates in time $O(ps)$.

Let $S$ be the set of clauses generated by the algorithm of Definition 2.5. Then $L, \rho, i \models F_j$ if and only if there is a fact $q_{F_j,i} \mid C$ deduced from $S$, with $C$ satisfiable, such that $\rho \models C$.

Proof. Termination. Take $\mathit{Current}$ and $\mathit{Next}$ to be sets, represented as bit vectors of $p$ bit entries. Indeed, $\mathit{Current}$ and $\mathit{Next}$ will only contain subformulae of $F_1, \ldots, F_m$. Represent $\mathit{Clauses}$ as a list. The set-theoretic operations then take constant time, and $\mathit{match}$ takes time linear in the size of the current record times the size of the matched record pattern. We may assume that we deal with entries from $\mathit{Current}$ in order of decreasing sizes in the loop 4–12: number bit entries corresponding to subformulae in such a way that smaller subformulae get lesser numbers, and start from the highest bits in $\mathit{Current}$. Then each formula in $\mathit{Current}$ will be dealt with exactly once, and there are at most $p$ of them. So the algorithm runs in time $O(ps)$.

Only if. Let us first show that whenever $F$ is a formula that is in $\mathit{Current}$ such that $L, \rho, i \models F$, then there is a fact $q_{F,i} \mid C$ deduced from $S$ such that $\rho \models C$. This is by induction on $(s - i) + |F|$. As the argument is repetitive, let us deal with the case $F = \Diamond F_1 \in \mathit{Current}$. Since $L, \rho, i \models F$, for some $i' > i$, $i' \le s$, it holds $L, \rho, i' \models F_1$. In particular $i+1 \le s$, so $i+1 \in \mathrm{dom}\, L$, and $L, \rho, i+1 \models F$ or $L, \rho, i+1 \models F_1$. So at the next turn of the loop 2–13, by induction hypothesis some fact of the form $q_{F,i+1} \mid C$ or $q_{F_1,i+1} \mid C$ with $\rho \models C$ is deduced from $S$. In the first case, the clause $q_{F,i} \leftarrow q_{F,i+1}$ allows us to deduce $q_{F,i} \mid C$; in the second case, use the clause $q_{F,i} \leftarrow q_{F_1,i+1}$. The conjunctive cases $F = F_1 \wedge F_2$, $F = \neg(F_1 \vee F_2)$, $F = \neg\Diamond F_1$ are dealt with by noticing that if $\rho$ satisfies two constraints $C_1$ and $C_2$, then $\rho$ satisfies $C_1 \wedge C_2$.

Observe that $F_j$ occurs in $\mathit{Current}$ at turn $i$ of the loop 2–13, because of line 3. It follows that if $L, \rho, i \models F_j$, then there is a fact $q_{F_j,i} \mid C$ deduced from $S$, with $C$ satisfiable, such that $\rho \models C$.

If. Let $q_{F,i} \mid C$ be any fact deduced from $S$. Let $k$ be the smallest integer such that it belongs to $T_S^k(\emptyset)$. An easy induction on $k$ shows that for every $\rho \models C$, it holds $L, \rho, i \models F$. In the conjunctive cases, notice that if $\rho$ satisfies $C_1 \wedge C_2$, then $\rho$ satisfies both $C_1$ and $C_2$. $\Box$

While generating constraints is fast, solving them is hard. In fact model-checking is hard:

Theorem 2.7 (NP-Completeness) Model-checking linear-time temporal logic with first-order variables against finite linear models, that is, deciding whether for some $\rho$ and $i$, $j$, it holds $L, \rho, i \models F_j$, where $L$ is finite, is NP-complete.

| $F$ | $\mathit{current}(F, i)$ | $\mathit{next}(F, i)$ | $\mathit{clauses}(F, i)$ |
|---|---|---|---|
| $F_1 \wedge F_2$ | $F_1, F_2$ | $\emptyset$ | $q_{F,i} \leftarrow q_{F_1,i}, q_{F_2,i}$ |
| $F_1 \vee F_2$ | $F_1, F_2$ | $\emptyset$ | $q_{F,i} \leftarrow q_{F_1,i}$ and $q_{F,i} \leftarrow q_{F_2,i}$ |
| $\Diamond F_1$ ($i+1 \in \mathrm{dom}\, L$) | $\emptyset$ | $F_1, F$ | $q_{F,i} \leftarrow q_{F_1,i+1}$ and $q_{F,i} \leftarrow q_{F,i+1}$ |
| $\Diamond F_1$ ($i+1 \notin \mathrm{dom}\, L$) | $\emptyset$ | $\emptyset$ | $\emptyset$ |
| $\neg(F_1 \wedge F_2)$ | $\neg F_1, \neg F_2$ | $\emptyset$ | $q_{F,i} \leftarrow q_{\neg F_1,i}$ and $q_{F,i} \leftarrow q_{\neg F_2,i}$ |
| $\neg(F_1 \vee F_2)$ | $\neg F_1, \neg F_2$ | $\emptyset$ | $q_{F,i} \leftarrow q_{\neg F_1,i}, q_{\neg F_2,i}$ |
| $\neg\Diamond F_1$ ($i+1 \in \mathrm{dom}\, L$) | $\emptyset$ | $\neg F_1, F$ | $q_{F,i} \leftarrow q_{\neg F_1,i+1}, q_{F,i+1}$ |
| $\neg\Diamond F_1$ ($i+1 \notin \mathrm{dom}\, L$) | $\emptyset$ | $\emptyset$ | $q_{F,i}$ |

Figure 1. Expansion rules

1  $\mathit{Current} := \emptyset$; $\mathit{Next} := \emptyset$; $\mathit{Clauses} := \emptyset$; $i := 1$;
2  while $i \in \mathrm{dom}\, L$ {
3    $\mathit{Current} := \mathit{Current} \cup \{F_1, \ldots, F_m\}$;
4    while $\mathit{Current} \neq \emptyset$ {
5      pick $F$ from $\mathit{Current}$; $\mathit{Current} := \mathit{Current} \setminus \{F\}$;
6      if $F$ is a record pattern
7      then { if $\rho' := \mathit{match}(F, L_i)$ succeeds then $\mathit{Clauses} := \mathit{Clauses} \cup \{q_{F,i} \mid \rho'\}$; }
8      else if $F$ is a negated record pattern $\neg F'$
9      then if $\rho' := \mathit{match}(F', L_i)$ succeeds then $\mathit{Clauses} := \mathit{Clauses} \cup \{q_{F,i} \mid \emptyset - \rho'\}$;
10          else $\mathit{Clauses} := \mathit{Clauses} \cup \{q_{F,i}\}$;
11     else { $\mathit{Current} := \mathit{Current} \cup \mathit{current}(F, i)$; (* see Figure 1. *)
12            $\mathit{Next} := \mathit{Next} \cup \mathit{next}(F, i)$; $\mathit{Clauses} := \mathit{Clauses} \cup \mathit{clauses}(F, i)$; } }
13   $i := i + 1$; $\mathit{Current} := \mathit{Next}$; $\mathit{Next} := \emptyset$; }

Figure 2. On-line model-checking algorithm
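The clause-generation pass of Figure 2 can be sketched in Python for the negation-free fragment (record patterns, conjunction, disjunction and strict sometimes). Everything about the encoding is an assumption for illustration: formulae are hashable tuples, a record pattern is `("pat", rows)` with `rows` a tuple of `(label, field_pattern)` pairs, a variable is `("var", name)`, clauses are `(head, body, positive_environment)` triples, and `len(log)` is used to test whether a next line exists, which presumes the whole finite log is available.

```python
import re

def match(rows, record):
    """Partial environment for a record pattern given as (label, pattern) rows, or None."""
    env = {}
    for label, pat in rows:
        if label not in record:
            return None
        value = record[label]
        if isinstance(pat, tuple):                      # variable ("var", name)
            if env.setdefault(pat[1], value) != value:
                return None
        elif re.search(pat, value) is None:             # regex, substring matching
            return None
    return env

def generate_clauses(signatures, log):
    """One pass over the log, producing constrained clauses (head, body, env)."""
    clauses, nxt = [], set()
    for i, record in enumerate(log, start=1):
        # Line 3 and line 13: signatures are re-armed, postponed formulae come due.
        current, nxt, done = set(signatures) | nxt, set(), set()
        while current:
            f = current.pop()
            if f in done:                               # each formula once per line
                continue
            done.add(f)
            op = f[0]
            if op == "pat":
                env = match(f[1], record)
                if env is not None:
                    clauses.append(((f, i), [], env))   # a fact q_{F,i} | env
            elif op == "and":
                current |= {f[1], f[2]}
                clauses.append(((f, i), [(f[1], i), (f[2], i)], {}))
            elif op == "or":
                current |= {f[1], f[2]}
                clauses += [((f, i), [(g, i)], {}) for g in f[1:]]
            elif op == "sometime" and i < len(log):
                nxt |= {f[1], f}                        # postponed to line i + 1
                clauses += [((f, i), [(f[1], i + 1)], {}), ((f, i), [(f, i + 1)], {})]
    return clauses

log = [
    {"op": "connection", "result": "fail", "subject": "Joe"},
    {"op": "connection", "result": "pass", "subject": "Joe"},
    {"op": "exec", "result": "pass", "object": "emacs", "subject": "Joe"},
]
X = ("var", "X")
f1 = ("pat", (("op", "connection"), ("result", "fail"), ("subject", X)))
f2 = ("pat", (("op", "connection"), ("result", "pass"), ("subject", X)))
f0 = ("and", f1, ("sometime", f2))
clauses = generate_clauses([f0], log)
facts = [(head, env) for head, body, env in clauses if not body]
print(len(clauses), facts)
```

The sketch only generates clauses; deducing facts from them and checking the satisfiability of the resulting constraints are separate steps.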

Proof. Note first that, in the propositional case (nofirst-order variables), � is a set of ordinary propositionalHorn clauses, for which deduced facts (the least model)can be generated in linear time [12]. So, without first-order variables, this problem is polynomial-time.

To show that the problem is in NP, guess the restric-tion of > to the free variables of

� � C�"<"<"<C � � . This isclearly of polynomial size. Checking that 5ECF>.CF) � �G��+for some ) , � can be done by running the algorithm ofDefinition 2.5, replacing the generation of constrainedfacts � � ��: 6 � � � either by the generation of the fact� � ��: 6 , if > � � � , or by doing nothing if > �� � � . As thisalgorithm runs in polynomial time, and solving the re-sulting set of propositional clauses plus the goal clauses� � � �&% : 6 is linear-time, the problem is in NP.

The problem is NP-hard, by reduction from 3-SAT [7], the problem of the satisfiability of sets of clauses with 3 literals. Let $S = \{C_1, \ldots, C_m\}$ be a set of clauses with 3 literals, and $x_1, \ldots, x_n$ be the variables occurring in them. Let $M$ be the log with two records, $M_1$ = {a="true"} and $M_2$ = {a="false"}. Produce the signature with one formula {a="true"} $\wedge\ C'_1 \wedge \cdots \wedge C'_m$, where $C'_j$ is obtained from $C_j$ by replacing negative literals $\neg x_i$ by $\Diamond${a=$X_i$} and positive literals $x_i$ by {a=$X_i$}. It is clear that $S$ is satisfiable if and only if the signature matches (at line 1). ∎

Note that this is however much better than the complexity of model-checking linear-time temporal logic against general finite models [19], which is PSPACE-complete. We let the interested reader observe that the model-checking problem is again in NP if we add the $\bigcirc$, $\Box$, and $\mathcal{U}$ operators; and that it is NP-hard as soon as we have at least one of $\Diamond$, or $\bigcirc$, or $\Box$, or $\mathcal{U}$ and negation.

2.3 Lessons Learnt

Although an NP upper bound is much better in theory than a PSPACE one, in practice this might be terribly inefficient: experimental evaluation is called for. The first author implemented the algorithm of Definition 2.5, which was then improved by Xiaobo Li (Dyade). Although it works well in practice, it is fairly slow, and this is not due to NP-completeness. The main problem is the way repeated events are dealt with. Assume you wish to detect matches for some temporal formula of the form $F \wedge \Diamond F$, where $F$ is some record pattern, and assume records $M_1$, $M_{10}$, $M_{20}$, $M_{30}$, ..., match $F$ with the same values of variables. Then our algorithm will report all matches: $F \wedge \Diamond F$ matches at line 1 with the second $F$ matching line 10, also at line 1 with the second $F$ matching line 20, or matching line 30, ..., also at line 10 with the second $F$ matching line 20, or 30, etc.

Most of these matches are redundant, and we would prefer some form of shortest match reporting, where only 1-10, 20-30, possibly also 10-20, but not 1-20 or 1-30, are reported. In log auditing it is not enough to just report the existence of matches, but reporting all matches is too much.
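As a rough illustration of the difference, the following Python sketch enumerates all matches of a two-event pattern against the example lines 1, 10, 20, 30 and contrasts them with consecutive-pair reporting. The function names are hypothetical, and reporting every consecutive pair is only one possible reading of "shortest match reporting".

```python
from itertools import combinations

def all_matches(lines):
    """Every pair (i, j) with i < j of lines matching the same pattern."""
    return list(combinations(sorted(lines), 2))

def shortest_matches(lines):
    """Only pairs of consecutive matching lines (assumed reading of 'shortest')."""
    ordered = sorted(lines)
    return list(zip(ordered, ordered[1:]))

matching_lines = [1, 10, 20, 30]
print(all_matches(matching_lines))       # grows quadratically with the number of matching lines
print(shortest_matches(matching_lines))  # grows linearly
```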

The second lesson learnt is that our choice of the logic of Section 2.1, which is justified by it being standard, is not totally suited to the task of log auditing. In fact, it is both too expressive and not expressive enough.

It is too expressive: formulae such as those used in the proof of Theorem 2.7, involving Boolean combinations of modal formulae of the form $\Diamond F$, are rarely necessary in practice. They are the source of the complication represented by the generation of deduction chain clauses in the algorithm of Definition 2.5, which in turn weighs heavily on the memory and time requirements of the algorithm in practice. In fact, most of the formulae we need are flat, namely of the form $(F_1 \wedge \Diamond G_1) \vee \cdots \vee (F_n \wedge \Diamond G_n)$, where $F_1$, ..., $F_n$ are present formulae, i.e., formulae whose validity can be decided without having to look at future records, and $G_1$, ..., $G_n$ are themselves flat. We shall use the idea of flat formulae to dispense with clauses in Section 3.

Most of the formulae we need don't use negations either; in fact, the astute reader will have noticed that negations have their share of problems, in that the algorithm is not complete on infinite logs in the presence of negation; for example, it never reports that $\neg\Diamond\neg F$ holds at $M_1$ in an infinite log $M$ where $F$ always holds. In fact, we can only deal with eventuality properties in a complete way: we only have access to a finite prefix of the Kripke model, contrary to standard model-checking, where the whole model is available in the form of a finite-state automaton.

It is not expressive enough: first, our formulae cannot count, which is a shame, since counting is one of the basic mechanisms available in every run-of-the-mill log auditing system. That is, we may require to report a match when $F$ occurs at least $n$ times, giving signature $F \wedge \Diamond(F \wedge \Diamond(\cdots \wedge \Diamond F))$ (with $n$ occurrences of $F$) to the algorithm, but this is awkward, inefficient (because of the repeated events problem mentioned above), and the number $n$ has to be given beforehand: the algorithm cannot compute how many times $F$ occurred. Second, it cannot express parity conditions: consider the mail attack of the introduction, where the intruder copies some shell to /usr/spool/mail/root, sets the setuid bit on it, then sends a fake e-mail message to root. We might instead track the setuid bit, so that if the setuid bit is cleared before root receives the message, no alert is reported; however, if the setuid bit is set again, we should again consider reporting an alert; and so on. (This is the attack that Mounji considers in [13], p.74.) This cannot be expressed in regular linear time temporal logic, but it can in an extended logic with new temporal operators described by finite automata atop ordinary operators à la Wolper [22]. This is what we shall do in the sequel; this will also enable us to do counting easily.

3 Refining the Logic for Log Auditing

Let us design our new logic. First, start with Wolper-style temporal operators. Our particular variant is built from finite automata $A$ with transitions labeled by formulae, as follows: the operator $\langle A \rangle_q$, where $q$ is any state of $A$, is such that $\langle A \rangle_q F$ holds at state (record) $M_i$ if and only if either $q$ is a final state of $A$ and $F$ holds at $M_i$, or there is a transition $q \xrightarrow{G} q'$ in $A$ such that $G$ holds at $M_i$ and $\langle A \rangle_{q'} F$ holds at some $M_j$, $j > i$.
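The recursive reading of this operator can be sketched in Python as follows. This is only an illustration under simplifying assumptions that are not in the paper: the log is finite and 0-indexed, records are dictionaries, transition labels are plain predicates without variables, and all names (`holds`, `is_event`, `mail_attack`) are hypothetical.

```python
def holds(automaton, q, final_formula, log, i):
    """Return True if the operator for state q, applied to final_formula, holds at log[i]."""
    # Case 1: q is final and the argument formula holds at the current record.
    if q in automaton["final"] and final_formula(log[i]):
        return True
    # Case 2: some transition q --label--> target matches the current record,
    # and the operator for the target state holds at a strictly later record.
    for label, target in automaton["transitions"].get(q, []):
        if label(log[i]) and any(
            holds(automaton, target, final_formula, log, j)
            for j in range(i + 1, len(log))
        ):
            return True
    return False

def is_event(name):
    """Hypothetical record pattern: matches records whose 'event' field equals name."""
    return lambda record: record.get("event") == name

# Linear automaton q0 --copy--> q1 --setuid--> q2 --send--> q3 (final).
mail_attack = {
    "final": {"q3"},
    "transitions": {
        "q0": [(is_event("copy"), "q1")],
        "q1": [(is_event("setuid"), "q2")],
        "q2": [(is_event("send"), "q3")],
    },
}
always_true = lambda record: True
log = [{"event": "copy"}, {"event": "login"}, {"event": "setuid"},
       {"event": "send"}, {"event": "EOF"}]
print(holds(mail_attack, "q0", always_true, log, 0))
```

Because every recursive call moves to a strictly later record, the recursion terminates on finite logs; the sketch makes no attempt at efficiency.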

Our Wolper-style formulae differ from those of Wolper [22] in at least two respects. First, the transitions of $A$ are not labeled by actions of the Kripke model, but rather by formulae. (Remember that the only action in our linear Kripke models is $i \mapsto i+1$.) Second, going from one state $q$ to a successor state $q'$ means finding some later record $M_j$, $j > i$, where $\langle A \rangle_{q'} F$ holds, not finding a successor record $M_{i+1}$ where it would hold. This parallels the fact that we don't need the $\bigcirc$ operator in log auditing, only the strict $\Diamond$ operator.

[Figure 3: automaton with initial state $q_0$ and, for each $k$ from 1 to $n$, a transition labeled $F_k$ leading into the automaton $A_k$.]

Figure 3. $(F_1 \wedge \Diamond G_1) \vee \cdots \vee (F_n \wedge \Diamond G_n)$

As a first example, note that formulae of the form $(F_1 \wedge \Diamond G_1) \vee \cdots \vee (F_n \wedge \Diamond G_n)$, in the notation of Section 2, correspond to automata of the form shown in Figure 3, where $q_0$ is the initial state, and $A_1$, ..., $A_n$ are automata corresponding to $G_1$, ..., $G_n$ respectively. Motivated by the discussion of Section 2.3, we shall restrict formulae that label transitions to be present formulae, so that $F_1$, ..., $F_n$ above do not involve any nested automata constructions. In fact, we shall even restrict the label formulae $F$ further, so that the set of environments $\rho$ such that $M, \rho, i \models F$ is representable in a simple way, as just a positive constraint; this will simplify the design of the model-checker, by disallowing negations and disjunctions from label formulae. Note that disallowing disjunctions in $F_j$ does not significantly restrict the expressive power of the logic, since labeling a transition of $A$ by $F_1 \vee F_2$ can be simulated by having two transitions, one labeled $F_1$, the other $F_2$.

[Figure 4, panel (a): automaton with initial state $q_0$ and transitions labeled copy(X), setuid(X), clruid(X), send(X).]
(a) A signature for the mail attack with setuid/clruid alternations

[Figure 4, panel (b): automaton with initial state $q_0$, a transition labeled {from=X, to=ξ, ...}, and a transition labeled <<EOF>>.]
(b) Detecting probing attacks

[Figure 4, panel (c): automaton with states $q_0$, $q_1$, $q_2$, transitions labeled launch(X) and exit(X), ε-transitions, and a guard $\nu_{q_1} - \nu_{q_2} > n$.]
(c) Tracking launch/exit events

Figure 4. A few Wolper-style formulae

Wolper-style formulae also allow us to encode loops in formulae. For example, the refined mail attack of Section 2.3 ([13], p.74) can be specified by the formula $\langle A \rangle_{q_0}\,\mathit{true}$, where $A$ is shown in Figure 4(a); we take copy($X$), setuid($X$), clruid($X$), and send($X$) to denote record patterns recognizing copy, setuid, clruid, and mail sending events done by $X$.

In the presence of loops, it becomes important to distinguish between two kinds of first-order variables: rigid variables $X$, $Y$, ..., can only assume a fixed value throughout the sequence of records matched by a given signature (these were the only kind of variables in Section 2), while flexible variables [11] $\xi$, $\zeta$, ..., may assume values that vary from record to record. To illustrate why this is needed, consider a signature for detecting a form of probing attack against a firewall, where the attacker $X$ assumes a unique identity and tries to connect to several machines behind the firewall. We might specify this in the logic of Section 2 by writing, say:

{from=$X$, to=$Y_1$} $\wedge\ \Diamond($ {from=$X$, to=$Y_2$} $\wedge\ \Diamond$ {from=$X$, to=$Y_3$} $)$

to detect cases where $X$ tries to connect to three (possibly identical) machines $Y_1$, $Y_2$, and $Y_3$. A better solution, using Wolper-style operators, is the formula $\langle A \rangle_{q_0}\,\mathit{true}$ where $A$ is shown in Figure 4(b); the final state is circled, and <<EOF>> is a new record pattern that only matches the end of file. While $X$ has to be rigid, so that we detect attacks where the same from field is present, we definitely do not want the to field to be the same, whence the use of the flexible variable $\xi$. (We might just omit the to=$\xi$ row here; flexible variables will be useful in guards, see below, and also for reporting the sequence of their values along matching runs.)

Model-checking a formula $\langle A \rangle_q F$ works by traveling along transitions in the automaton $A$ (while advancing in the log), until a final state is reached, where checking $F$ starts. Counting can now be achieved by a simple trick. With each state $q$ of $A$, create a flexible variable $\nu_q$ that counts the number of times we go through state $q$ in $A$. Then, in the example of Figure 4(b), we may report $\nu_{q_0}$ on reaching the end of the log: for each match, the value of $\nu_{q_0} - 1$ at the end of the log will be the number of times $X$ tried to connect to a machine $\xi$, for each $X$.

This form of counting can also be used to reach other goals. In particular, consider that you wish to detect situations where a user $X$ manages to launch more than $n$ copies of an application whose license only allows for $n$ distinct active copies. Assume that no copy is launched at the start of the log, and we track events launch($X$) (application launch) and exit($X$) (application exit) from this point on. Then $\langle A \rangle_{q_0}\,\mathit{true}$, where $A$ is the automaton of Figure 4(c), does the trick.

This example shows a few new constructions. First, the transition from $q_1$ to $q_0$, and the one from $q_2$ to $q_0$, are $\epsilon$-transitions: once control reaches $q_1$ (resp. $q_2$), it may spontaneously jump to $q_0$, without advancing in the log, contrary to non-$\epsilon$ transitions, which can only be traversed by advancing in the log. The transition from $q_0$ to the final state is also an $\epsilon$-transition, but it is subscripted with a guard $\nu_{q_1} - \nu_{q_2} > n$. (No guard means $\mathit{true}$.) The meaning of this is that the signature only matches provided control went through state $q_1$ more than $n$ times more than through state $q_2$, i.e., that the number of active copies of the application is greater than $n$, a violation of the license.
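The effect of this guard can be mimicked with two plain counters. The Python sketch below is an assumption-laden illustration, not the paper's algorithm: it tracks a single user, assumes launch events lead to state `q1` and exit events to state `q2`, and uses a hypothetical function name.

```python
def guard_lines(events, n):
    """Return the 1-based lines at which the guard nu_q1 - nu_q2 > n holds."""
    counters = {"q1": 0, "q2": 0}   # one counter per control state (assumed mapping)
    lines = []
    for line, event in enumerate(events, start=1):
        if event == "launch":
            counters["q1"] += 1     # control goes through q1
        elif event == "exit":
            counters["q2"] += 1     # control goes through q2
        if counters["q1"] - counters["q2"] > n:
            lines.append(line)      # more than n active copies at this line
    return lines

print(guard_lines(["launch", "launch", "launch", "exit", "launch"], n=2))
```

The sketch lists every line where the guard holds; which of these lines should actually be reported is a separate question.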

Guards are given in a side-effect free language including equality, inequality and pattern-matching tests, plus a few other constructions. We won't describe them in detail, as this is secondary. Similarly, the actual concrete syntax we use for describing Wolper-style formulae is in fact a programming language syntax that compiles to automata, just like programs can be translated to control-flow graphs [1]. The actual concrete syntax is also not essential to this paper, and we won't describe it.

3.1 The Logic

We are now ready to define the syntax and semantics of the logic. Since, in fact, our only formulae will be Wolper-style formulae $\langle A \rangle_q F$, $F$ will also be a Wolper-style formula, and we may simplify the presentation by taking formulae of the restricted form $\langle A \rangle_q\,\mathit{true}$. First, we define the present formulae that we use to label transitions of our automata:

Definition 3.1 (Present Definite Formulae) Let $V_r$, $V_f$, and $V_c$ be pairwise disjoint sets of so-called rigid, flexible, and counter variables. A record pattern $F$ is a list of rows $\mathit{id} = \mathit{pat}$, where $\mathit{id}$ is a label and $\mathit{pat}$ is a field pattern, defined as either a regular expression $r$, a rigid variable $X$, or a flexible variable $\xi$. We assume no two rows in a given record pattern have the same label.

The set $\mathcal{P}$ of present definite formulae $F$, $G$, ..., consists of record patterns, plus the special end-of-file declaration <<EOF>>.

We assume that finite logs $M$ end with a special end-of-file record, and that <<EOF>> only matches this end-of-file record. This does not restrict the generality of our approach, and allows <<EOF>> to be matched at all.

Note that counter variables (the $\nu$ variables) are not allowed in record patterns. We extend the $\mathit{match}$ function of Definition 2.5 to deal with the case of <<EOF>>. It is easy to see that $\mathit{match}$ still succeeds, returning a unique partial environment $\rho$, or fails, deterministically. It is convenient here to define partial environments as triples $(\rho_r, \rho_f, \rho_c)$ mapping rigid, flexible and counter variables respectively to their values. Because values were assumed to be strings, but counters are integers, we also tacitly assume the use of conversion functions between each data type, à la Perl [2]. Other solutions are possible, e.g., defining an algebra of data types; this is however out of the scope of this paper.

Assume a language of guards built on rigid, flexible, and counter variables. As announced earlier, we leave the language unspecified. Its semantics is given by a total computable evaluation function $[\![\cdot]\!]$ such that for any guard $g$ and any partial environment $\rho = (\rho_r, \rho_f, \rho_c)$, $[\![g]\!]\rho$ is a Boolean value. If $g$ labels a transition $q \xrightarrow{a,\,g} q'$, $[\![g]\!]\rho$ will be used to decide whether we can go from $q$ to $q'$.

Definition 3.2 (Wolper-Style Formulae) Let $\epsilon$ be a special symbol, outside of $\mathcal{P}$. A Wolper-style formula $A$ is a tuple $(Q, q_0, Q_f, \delta)$, where $Q \subseteq V_c$ is a finite set of so-called control states, $q_0 \in Q$ is the initial state, $Q_f \subseteq Q$ is the set of final states, and $\delta \subseteq Q \times (\mathcal{P} \cup \{\epsilon\}) \times \mathcal{G} \times Q$ is the transition relation, where $\mathcal{G}$ denotes the set of guards.

Note that every control state is identified with a counter variable: this trick allows us to naturally map each state $q$ to a counter variable $\nu_q$ (i.e., $\nu_q = q$); in implementations, we may of course choose different data structures for states and variables. We write $q \xrightarrow{a,\,g} q'$ when $(q, a, g, q')$ is in $\delta$. In the sequel, we restrict Wolper-style formulae $A$ to those obeying the following Assumptions 3.3, 3.4, and 3.5. This does not restrict the generality of the approach, but failing to make these assumptions would result in definite complications. (The $\epsilon$-determinism Assumption 3.5 is needed for technical reasons in the proof of Theorem 3.11.)

Assumption 3.3 $A$ has no cycle of $\epsilon$-transitions, i.e., it does not contain any sequence of transitions $q_1 \xrightarrow{\epsilon,\,g_1} q_2 \xrightarrow{\epsilon,\,g_2} \cdots \xrightarrow{\epsilon,\,g_k} q_1$.

Assumption 3.4 There is no sequence of $\epsilon$-transitions from the initial state to any final state in $A$.

Assumption 3.5 For every two distinct $\epsilon$-transitions $q \xrightarrow{\epsilon,\,g_1} q_1$ and $q \xrightarrow{\epsilon,\,g_2} q_2$ with the same start state $q$, $g_1$ and $g_2$ cannot both hold, i.e., for no partial environment $\rho$ is it the case that $[\![g_1]\!]\rho = \mathit{true}$ and $[\![g_2]\!]\rho = \mathit{true}$.

To define the semantics of formulae, first define what it means to enrich a partial environment, e.g., by the result of a $\mathit{match}$ operation. On rigid variables, enriching $\rho_r$ by $\rho'_r$ succeeds if and only if $\rho_r$ and $\rho'_r$ agree on $\mathrm{dom}\,\rho_r \cap \mathrm{dom}\,\rho'_r$, and yields $\rho_r \cup \rho'_r$, the partial environment mapping every rigid variable $X$ in $\mathrm{dom}\,\rho_r$ to $\rho_r(X)$, and every rigid variable $X$ in $\mathrm{dom}\,\rho'_r$ to $\rho'_r(X)$. (Enrichment first checks that the values of rigid variables do not change.) On flexible variables, enriching $\rho_f$ by $\rho'_f$ means computing $\rho_f \oplus \rho'_f$, defined as mapping every $\xi \in \mathrm{dom}\,\rho'_f$ to $\rho'_f(\xi)$, and every $\xi \in \mathrm{dom}\,\rho_f \setminus \mathrm{dom}\,\rho'_f$ to $\rho_f(\xi)$. (New values hide old values.)

Definition 3.6 ($\mathit{enrich\text{-}match}$) For every present definite formula $F$, record $r$, and partial environment $\rho = (\rho_r, \rho_f, \rho_c)$, if $\mathit{match}(F, r)$ succeeds and returns $(\rho'_r, \rho'_f, \emptyset)$, and if $\rho_r$ and $\rho'_r$ agree on $\mathrm{dom}\,\rho_r \cap \mathrm{dom}\,\rho'_r$, let $\mathit{enrich\text{-}match}(F, r, \rho) = (\rho_r \cup \rho'_r, \rho_f \oplus \rho'_f, \rho_c)$. Otherwise, $\mathit{enrich\text{-}match}(F, r, \rho)$ is undefined.

Note that $\mathit{match}$ does not involve counter variables.

However, we let $\rho_c[\nu{+}{+}]$ denote the environment that maps $\nu$ to $\rho_c(\nu) + 1$, and every other counter variable $\nu' \in \mathrm{dom}\,\rho_c$ to $\rho_c(\nu')$. (This increments the counter variable $\nu$.) By extension, if $\rho = (\rho_r, \rho_f, \rho_c)$, then we let $\rho[\nu{+}{+}]$ be $(\rho_r, \rho_f, \rho_c[\nu{+}{+}])$.
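Enrichment and counter increment can be sketched with Python dictionaries. This is a minimal illustration under assumptions not fixed by the text: a successful match is passed in as a pair of dictionaries (rigid, flexible), a failed or undefined result is modeled by `None`, a counter that is not yet mapped is taken to start at 0, and the function names are hypothetical.

```python
def enrich_match(matched, env):
    """Enrich env = (rigid, flexible, counters) by a match result, or return None."""
    if matched is None:                      # the match itself failed
        return None
    new_rigid, new_flexible = matched
    rigid, flexible, counters = env
    # Rigid variables must agree wherever both environments define them.
    for var, value in new_rigid.items():
        if var in rigid and rigid[var] != value:
            return None
    # Union on rigid variables; on flexible variables new values hide old ones.
    return ({**rigid, **new_rigid}, {**flexible, **new_flexible}, dict(counters))

def increment(env, state):
    """Return env with the counter attached to `state` incremented by one."""
    rigid, flexible, counters = env
    updated = dict(counters)
    updated[state] = updated.get(state, 0) + 1   # assumed default of 0
    return (rigid, flexible, updated)

env = ({"X": "alice"}, {"xi": "host1"}, {"q0": 1})
print(enrich_match(({"X": "alice"}, {"xi": "host2"}), env))
print(enrich_match(({"X": "bob"}, {}), env))
print(increment(env, "q0"))
```

Both functions return fresh dictionaries instead of mutating their arguments, which is convenient when several branches of a search share the same environment.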

While we might define the semantics of Wolper-style formulae in terms of a relation $M, \rho, i \models A$ as in Definition 2.3, we prefer a more operational definition, where not only the index $i$ of the first record $M_i$ where $A$ matches is explicit, but also all indices where each control state of $A$ is matched. This is useful notably in reporting attacks: the list of records $M_i$ constituting the attack is more useful than the first record alone.

Definition 3.7 (Run) A partial run of a Wolper-style formula $A = (Q, q_0, Q_f, \delta)$ against a log $M$ is a finite sequence $R = (r_1, \ldots, r_m)$ ($m \ge 0$) of tuples $r_j = (q_j, i_j, \rho_j)$, $1 \le j \le m$, each consisting of a state $q_j$ of $A$, an index $i_j \in \mathrm{dom}\, M$ of a record in the log $M$, and a partial environment $\rho_j$, such that, whenever $m \ge 1$:

1. $q_1 = q_0$ (start condition);

2. for every $1 < j \le m$, there is a transition $q_{j-1} \xrightarrow{a,\,g} q_j$ in $A$ such that:

(a) $a$ is a present definite formula $F$, $\rho' = \mathit{enrich\text{-}match}(F, M_{i_j}, \rho_{j-1})$ is defined, $\rho_j = \rho'[q_j{+}{+}]$, $[\![g]\!]\rho_j$ is true (the guard holds), and $i_j > i_{j-1}$ (advancing in the log);

(b) or $a = \epsilon$, $\rho_j = \rho_{j-1}[q_j{+}{+}]$, $[\![g]\!]\rho_j$ is true, and $i_j = i_{j-1}$ (not advancing in the log).

A run, or a complete run, is a run $R$ as above where in addition $m \ge 1$ and $q_m \in Q_f$.

The domain $\mathrm{dom}\, R$ of $R$ is $\{i_1 \le i_2 \le \cdots \le i_m\}$. The span of $\{i_1, i_2, \ldots, i_m\}$, and by extension of $R$, is $\mathrm{span}\, R = [i_1, i_m]$ (the empty set if $m = 0$). The length $|R|$ is $m$. If $m \ge 1$, the beginning of $R$ is $i_1$, its end is $i_m$, its end state is $q_m$, and its end environment $\mathit{end}_\rho(R)$ is $\rho_m$.

We define the strict orderings $<_{\mathrm{lex}}$ and $\prec$ on domains of runs (and, by extension, on runs), as follows: for every domains $D = \{i_1 < i_2 < \cdots < i_m\}$ and $D' = \{i'_1 < i'_2 < \cdots < i'_{m'}\}$, $D <_{\mathrm{lex}} D'$ if and only if $i_1 = i'_1$, $i_2 = i'_2$, ..., $i_{j-1} = i'_{j-1}$, $i_j < i'_j$ for some $j$, $1 \le j \le \min(m, m')$. We let $D \prec D'$ if and only if $[i_1, i_m] \subsetneq [i'_1, i'_{m'}]$, or $[i_1, i_m] = [i'_1, i'_{m'}]$ and $D <_{\mathrm{lex}} D'$. We use the strict ordering $\prec$ to define the notion of shortest run that we were longing for in Section 2.3: a shortest run of $A$ against $M$ is one that is minimal with respect to $\prec$. Our reasons for choosing this particular definition are that, first, it seems to give results that users expect in practice (modulo a few changes that we describe below), and second, it is simple enough to design a model-checking algorithm for this logic that only returns shortest runs. Moreover, shortest runs are optimal in a precise sense (see Theorem 3.8), as far as minimizing redundant information reporting is concerned.

Let us explain why this notion of shortest run seems adequate. First, consider the repeated events problem of Section 2.3, where some record pattern $F$ matches records $M_1$, $M_{10}$, $M_{20}$, $M_{30}$, ..., and we look for matches of $F \wedge \Diamond F$, i.e., the Wolper-style formula $q_0 \xrightarrow{F} q_1 \xrightarrow{F} q_2$ with initial state $q_0$ and set of final states $\{q_2\}$. Since any shortest run has minimal span, the only shortest runs here are $M_1$-$M_{10}$, $M_{10}$-$M_{20}$, $M_{20}$-$M_{30}$, etc. Taking runs with shortest spans only also allows us to report a license violation in the example of Figure 4(c) as soon as the number of active copies exceeds $n$, and only when it does so. Reporting all runs would force us to report at every subsequent record (at least until enough copies have been exited) that the number of active copies still exceeds $n$, which is clearly spurious information.

Our notion of shortest run, however, does not just rest on minimum spans, but also on a lexicographic ordering of domains. To see why such a solution is needed, consider the case of the mail attack of the introduction, detected by the Wolper-style formula $q_0 \xrightarrow{\mathrm{copy}(X)} q_1 \xrightarrow{\mathrm{setuid}(X)} q_2 \xrightarrow{\mathrm{send}(X)} q_3$, with initial state $q_0$ and set of final states $\{q_3\}$. Consider a log $M$ where $M_1$ is copy($a$) for some $a$, both $M_{10}$ and $M_{20}$ are setuid($a$), and $M_{30}$ is send($a$), with no intervening copy($a$) or send($a$) in between. Then there are at least two runs with minimum span $[1, 30]$, one with domain $\{1, 10, 30\}$, the other with domain $\{1, 20, 30\}$. Our definition of shortest runs selects the former, and discards the latter. The choice of one over the other is arbitrary, but a choice has to be made: reporting both runs is redundant.

More subtly, consider the mail attack with setuid/clruid alternations of Figure 4(a), and assume $M_1$ is a copy($a$) event, $M_{20}$ is a setuid($a$) event, $M_{30}$ is a clruid($a$) event, $M_{40}$ is again a setuid($a$) event, and $M_{100}$ is a send($a$) event. Several runs exist for this formula against this log, even with the same span $[1, 100]$: one has domain $\{1, 20, 30, 40, 100\}$, another $\{1, 20, 100\}$, yet another $\{1, 40, 100\}$. Using shortest runs, only the one with domain $\{1, 20, 30, 40, 100\}$, the most informative one, is reported.
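The selection of shortest runs in these examples can be replayed on domains alone. The Python sketch below is an illustration under the assumption that the ordering compares spans by strict inclusion first and breaks ties between equal spans lexicographically; domains are non-empty sorted tuples, and all function names are hypothetical.

```python
def span(domain):
    """Span of a non-empty sorted domain, as a (first, last) pair."""
    return (domain[0], domain[-1])

def lex_less(d1, d2):
    """True if d1 is smaller than d2 at the first position where they differ."""
    for a, b in zip(d1, d2):
        if a != b:
            return a < b
    return False  # one domain is a prefix of the other

def shorter(d1, d2):
    """Strict ordering: smaller span first, then lexicographic tie-break."""
    (b1, e1), (b2, e2) = span(d1), span(d2)
    strictly_inside = b2 <= b1 and e1 <= e2 and (b1, e1) != (b2, e2)
    return strictly_inside or ((b1, e1) == (b2, e2) and lex_less(d1, d2))

def shortest(domains):
    """Domains that are minimal with respect to `shorter`."""
    return [d for d in domains if not any(shorter(o, d) for o in domains)]

print(shortest([(1, 10, 30), (1, 20, 30)]))
print(shortest([(1, 20, 30, 40, 100), (1, 20, 100), (1, 40, 100)]))
print(shortest([(1, 10), (1, 20), (1, 30), (10, 20), (10, 30), (20, 30)]))
```

This brute-force filter compares every pair of domains and is meant only to make the ordering concrete.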

The important properties of shortest runs are given in the following theorem. Part (i) asserts that shortest runs are rare, and are optimal as far as minimizing spurious reports is concerned: although there may be several runs of $A$ against $M$ with beginning $i$, they can only differ by the values of variables; since attacks are characterized by the sequence of records $M_i$ in the log $M$ that describes them, the shortest run semantics reports at most one attack for each formula $A$ and beginning $i$. Part (ii) asserts that shortest runs are complete: no information is lost by only reporting shortest runs, instead of all runs.

Theorem 3.8 (Optimality, Completeness) Let $i \in \mathrm{dom}\, M$, and $A$ a Wolper-style formula. Then:

(i) All shortest runs of $A$ against $M$ with beginning $i$ have the same domain;

(ii) If there is a run of $A$ against $M$ with beginning $i$, then there is a shortest one.

Proof. First show that $\prec$ is total on domains of runs with beginning $i$, i.e., for any $D = \{i_1 < i_2 < \cdots < i_m\}$ and $D' = \{i'_1 < i'_2 < \cdots < i'_{m'}\}$ with $i_1 = i'_1 = i$, then $D = D'$ or $D \prec D'$ or $D' \prec D$. If $i_m < i'_{m'}$ then $D \prec D'$; if $i'_{m'} < i_m$ then $D' \prec D$. So it remains to deal with the case $i_m = i'_{m'}$, where $D$ and $D'$ have the same span. If there is an index $j$ such that $1 \le j \le \min(m, m')$ and $i_j \ne i'_j$, take $j$ to be the smallest; then either $i_j < i'_j$ and therefore $D \prec D'$, or $i_j > i'_j$, so $D' \prec D$. Otherwise, no such $j$ exists, and we claim that $D = D'$. Without loss of generality, assume that $m \le m'$. Then the sequence $i_1 < i_2 < \cdots < i_m$ is a prefix of $i'_1 < i'_2 < \cdots < i'_{m'}$, in particular $i_m = i'_m$. Since $D$ and $D'$ have the same span, $i_m = i'_{m'}$, so $i'_m = i'_{m'}$. If $m < m'$, this would entail $i'_m < i'_{m'}$ since the sequence $D'$ is strictly increasing, which would be a contradiction. So $m = m'$, hence $D = D'$.

So we cannot have two shortest runs of $A$ against $M$ with beginning $i$ and different domains, as one of the domains would be shorter (w.r.t. $\prec$). This proves (i).

Secondly, we claim that $\prec$ is well-founded on domains of runs. Take any infinite decreasing sequence of domains $D_k$, $k \ge 0$: there must be a $k_0$ such that the span of $D_k$ is the same for all $k \ge k_0$. Indeed, the inclusion ordering for spans is well-founded. Without loss of generality, assume that $k_0 = 0$ and the span of every $D_k$ is $[i, i']$. Then $\prec$ is included in the lexicographic product of at most $i' - i + 1$ copies of the well-founded ordering $<$ on integers $\ge i$, which is well-founded. It follows that, if there is a run of $A$ against $M$ with beginning $i$, there is a least one for $\prec$, that is, there is a shortest such run. So (ii) holds. ∎

3.2 Model-Checking

Our model-checking algorithm for the logic is similar to Definition 2.5, except for a few points. First, as desired, we don't need $\mathit{clauses}$ any longer, or any mechanism for solving deduction chains. Second, while $\mathit{current}$ and $\mathit{next}$ used to be lists or sets, this was essentially unimportant (except that using sets allowed us to derive optimal worst-case complexity bounds in the propositional case); here it will be important that $\mathit{current}$ and $\mathit{next}$ are FIFO queues: this is how we implement the shortest run semantics of Section 3.1.

Our operations on queues $Q$ are: testing whether $Q$ is empty, creating a new, empty queue, pushing an element $a$ at the back of $Q$ (enqueue), and popping an element $a$ from the front of $Q$, if $Q$ is not empty (dequeue). The information we keep on queues $\mathit{current}$ and $\mathit{next}$ are tuples $(A, \mathit{pid}, t, R, \mathit{once}, \mathit{retrigger})$, where $A$ is a Wolper-style formula, $\mathit{pid}$ is an integer, $t$ is a transition $q \xrightarrow{F,\,g} q'$ of the automaton $A$ such that $F$ is a present definite formula, $R$ is a partial run of $A$ against $M$ whose end state is $q$ (if its length is $\ge 1$), and $\mathit{once}$ and $\mathit{retrigger}$ are Booleans: $\mathit{once}$ is true if we only want to know whether $F$ holds now (at the current position $M_i$ in the log $M$), and false if we want to know whether $F$ is true now or later; $\mathit{retrigger}$ is used to implement optimizations, for now simply assume that $\mathit{retrigger}$ is true.

Intuitively, the algorithm of Definition 3.9 simulates the execution of a parallel non-deterministic machine, running non-deterministic threads in parallel. For each signature (Wolper-style formula) $A$, and each $i_0 \in \mathrm{dom}\, M$, a new thread is launched to test whether there is a run of $A$ against $M$ with beginning $i_0$, and $\mathit{pid}$ is a unique integer associated with this thread, as in Unix. Contrary to Unix, however, each thread is a non-deterministic machine, which may split at branch points, just like a non-deterministic Turing machine [7]. In particular, the $\mathit{current}$ and $\mathit{next}$ queues may contain several tuples with the same $\mathit{pid}$, corresponding to different branches of the same non-deterministic thread.

The fact that $(A, \mathit{pid}, t, R, \mathit{once}, \mathit{retrigger})$ is in the $\mathit{current}$ queue (at least when $\mathit{once} = \mathit{false}$, which we assume for the moment), with $t = q \xrightarrow{F,\,g} q'$, and that we are looking at record $M_i$ in the log $M$, denotes that thread $\mathit{pid}$ is waiting for some record $M_{i'}$, $i' \ge i$, that would match $F$, and such that $g$ then evaluates to $\mathit{true}$, to advance to control state $q'$. Advancing to state $q'$ means following any sequence of $\epsilon$-transitions (remember that there are no $\epsilon$-cycles by Assumption 3.3, so this must terminate), until we reach non-$\epsilon$-transitions, which are enqueued onto $\mathit{next}$. To do so, the model-checker tests right away whether $M_i$ (the case $i' = i$) matches $F$, whether this makes $g$ true, and if so advances to state $q'$. It also enqueues $(A, \mathit{pid}, t, R, \mathit{once}, \mathit{retrigger})$ onto $\mathit{next}$, to check for later records $M_{i'}$, $i' > i$. This is one instance of a non-deterministic split.

This normal behavior is modified when $\mathit{once}$ is $\mathit{true}$. If so, then the model-checker won't enqueue $(A, \mathit{pid}, t, R, \mathit{once}, \mathit{retrigger})$ onto $\mathit{next}$. This happens in the case when the model-checker tries to find a run of the signature $A$ against $M$ with given beginning $i_0$. Then it really wants to match $F$, where $F$ labels one of the first non-$\epsilon$ transitions of $A$, against $M_{i_0}$, not any later record; so it should set $\mathit{once}$ to $\mathit{true}$.

The purpose of the $\mathit{retrigger}$ Boolean will be explained in Section 3.3. For now, we may notice that $\mathit{retrigger}$ is always $\mathit{true}$ in the following algorithm, and therefore plays no role (yet). The places where we shall modify the algorithm to optimize it are marked with an asterisk in Figure 5, both in $\mathit{Advance}$ and in the main algorithm. The algorithm is as follows, and is parameterized by an attack reporting function $\mathit{Report}$; the function $\mathit{newpid}()$ creates a fresh integer $\mathit{pid}$, while $()$ is the empty partial run and $R :: (q, i, \rho)$ denotes the partial run $R$ with $(q, i, \rho)$ added at the end.

Definition 3.9 (On-line Model-Checking) Let $\mathit{Kill}$ be a global variable containing finite sets of integers.

Define the function $\mathit{Advance}$ (advance to state $q$ in $A$ when at record $M_i$, with environment $\rho$, current partial run $R$, thread $\mathit{pid}$, and Boolean $\mathit{once}$): see Figure 5, top. This updates $\mathit{Kill}$ and the queue parameter $Q$.

Let $A_1, \ldots, A_n$ be Wolper-style formulae (the signature file). Let $q_{0j}$ be the initial state of $A_j$. Model-checking $A_1, \ldots, A_n$ against the log $M$ operates as in Figure 5, bottom.

Lemma 3.10 (Termination) The algorithm of Defini-tion 3.9 terminates on any finite log 5 . More generally,for any ) , � #%$'&�5 , ) reaches the value ) , in line " 3 ofthe algorithm in finite time.

Proof. The � ��� � ����� function terminates because ofAssumption 3.3. Recall that the evaluation function � � � �is total, i.e., terminating. So the loop body "�� – " �,0 termi-nates. Since this loop body only enqueues on the !!9#"2/queue, and dequeues from the ���24 4'9#�8/ queue, eventu-ally ���24 4'9 �8/ will be empty, hence the loop " / – " �,0 ter-minates. The claim follows. 6

An invariant of the algorithm of Definition 3.9 isthat for every entry � � C@,2)1* C�� C � C��H�#�<9'CF4'9</+4 ) ;0;�9<40� in�+�24 4'9 �8/ or ! 9 "2/ , � is a non-empty partial run of

�against 5 , and � � � �1��� �T$ for � equal to the end state

of � . It follows in particular that every � that � ���� ���reports at line / of � ��� � ����� is a valid partial run. More-over, � must be a complete run since its end is a finalstate of

�, because of the test at line 3 of � ��� � ����� . So

only valid runs are reported.The purpose of � ) - - is to remember the , )1* s corre-

sponding to pairs � �#+ CF) , � , such that a run of�#+

withbeginning ) , has already been reported. Note that , )1* sare tested in line "�� , so that once a run with a given , )1*

has been reported, no entry with the same , )+* is ever re-enqueued again. As , )1* s are in one-to-one correspon-dence with pairs � �#+ CF) , � , this algorithm only reports atmost one run for each pair � �#+ CF) , � .

For every partial run � , let � > 6 denote the maxi-mal prefix of � of domain included in � � C ) � . An-other invariant of the algorithm is that: � � � if no runof��+

with beginning ) , has yet been reported, and) � ) , at the start of the loop " / – " � 0 , then for ev-ery run � of

�#+with beginning ) , , there is an entry

� � C@,2)1*2C�� C � C �H�#�B9'CF4'9�/+4 )D;0;�9<40� in �+�24 4'9 �8/ such that ei-ther �H�#�B9 � � - -D= 9 , )N� ) , and � � � > 6 � � , or �H�#�B9 �/+4 �?9 , ) � ) , and � � � > 6 . This is indeed true when) � � , since then the call to � ��� � ����� at line " 3 has en-queued all entries of the form � � CD, )1*2C�� C � CF/+4 �?9'C /+4 �?9 � ,where � ranges over all partial runs of the form� � � � � � + CF) , C �� � C�� �H� CF) , C �� � C<"�"<"<C�� � �8CF) , C �� � � where

� � � � �1 �� �T$ , and certainly any run of�#+

with be-

ginning ) , must start with such a partial run. (Remem-ber that by Assumption 3.4, no sequence of � -transitionsleads from the initial state to any final state.) Then weuse induction on computations, using the easily provedfact that if � � � holds, then after executing the loop " / –" �,0 , if no run of

�#+with beginning ) , has yet been re-

ported, then for every run � of�#+

with beginning ) , ,there is an entry � � CD, )1*2C�� C � C��H�#�B9'C 4'9</+4 )D;0;�9<40� in !!9#"2/such that �H�#�B9 � � - -D= 9 and � � � > 6 . Indeed, if)A� #%$'&�� , then the corresponding entry is added bythe call to � ��� � ����� at line "�� , otherwise is is added byone of the calls to

�����������, at lines " � O , " ��� or " �,0 .

It follows that for every automaton and every beginning position in the domain of the log, at least one run of that automaton with that beginning will be reported, if there is one. Therefore:

Theorem 3.11 (Soundness, Completeness) For each of the automata and each beginning position in the domain of the log such that there is a run of that automaton against the log with that beginning, the algorithm of Definition 3.9 reports exactly one shortest such run.

Proof. Taking into account the above considerations,it only remains to show that the unique reported run isshortest. Because records 586 are processed in increasing) order, only runs of

�#+with beginning ) , with smallest

span are reported. So it remains to show that amongstthose, the reported one is the lexicographically small-est. The (intuitive, but slightly wrong) idea is that forevery two entries � �#+ C@, )+* C�� C � C��H�#�B90CF4'9</+4 )D;0;�9�40� and� ��+ CD, )1*2C��H$ C � $ C��H�#�B9 $1CF4'9</+4 )D;0;�9�4 $ � in ���24 4'9 �8/ at thestart of the loop " / – " �,0 , such that the former occurs be-fore the latter in the queue (i.e., it will be dequeued first),then #%$'& � �&���'� # $ & � $ . Intuitively, this condition ispreserved because dequeued entries in ���24 4'9#�8/ are re-enqueued in the same order onto !!9 "2/ . The subtle point

Figure 5. On-line model-checking

is that additional entries, namely those added by the callto � ��� � ����� at line "BA , preserve the condition, since thiscall is made before we re-enqueue at line " � O . Howeverthe condition #%$'& � ��� �'�:#%$'& � $ does not quite work.

Instead, define the orderings � 6 on domains ! C ! $ V� � C ) 1�� � by !,� 6 !�$ if and only if ! �DC ��� �'�/! $ �DC forall C V � ) CC%FE � , C �� � . For every

� � � , map the�th

entry � ��+ CD, )1*2C�� C =<)D; -.C��H�#�B90CF4'9</+4 )D;0;�9�40� in ���24 4'9 �8/at the start of the loop " / – " �,0 to #%$'& � if �H�#�B9 � � - -@= 9 ,or to

�if �H�#�B9 � /+4 �?9 ; call !

+ 6 E� this domain, where � isthe index of

�#+, and ) , is the beginning of � . We claim

that for all � and ) , , the subsequence of the !+ � 6 �E� with

� � ��$ and ) , � )�$, is strictly increasing in � 6 .We show this by induction on computations. This is

certainly true at the start of the loop " / – " �,0 when ) , � ) ,since the corresponding subsequences with given � and) , are each of length one, and contain the sole domain�. (In particular, the claim holds when ) � � at the

start of this loop.) Then the effect of lines " / – " � 0 is tocopy these domains from ���24 4'9#�8/ to !!9#"2/ , keepingthe order (because we use FIFO queues), possibly re-moving some domains (the killed entries), and possiblyadding a finite number of copies of !

+ 6 E� �<��) at theimmediate left of !

+ 6 E� (using � ��� � ����� at line " A ). Notehowever that, because of Assumption 3.5, the latter op-eration must in fact add at most one copy of !

+ 6 E� � ��) to the immediate left of each !

+ 6 E� . Note also that re-

moving some domains preserves the strict ordering in-variant. So we are left with proving that, given any se-quence of domains * � � 6 *K� � 6A"�"<" � 6 *HG , where* � C *K� C<"�"<"<C *HG7V � �'CF) 1 � � , inserting *48 � ��) immedi-ately at the left of selected *58 ’s, � Q , Q ! , yields a� 6*) � -strictly ordering sequence. It suffices to show that* � � ��) /� 6*) � * � � 6*) � *F��� ��) /� 6*) � *K�$� 6*) � "<"�" � 6*) �*�G ���<) � 6*) � *�G , i.e., that: �@)F� *58 � ��) *� 6 ) � *58 forevery , , � Q ,'Q ! , and that: �@)1) � *48 � 6*) � *�8#) � forevery , , ��QA, � ! .

Let C be any non-empty subset of � ) %��'C %IE � , and �the cardinality of *48 . Then *58 �+��) �JC and *58 �JC agreeon their � first elements (those of *48 ), and their � � %� � th elements are respectively ) and the first element ofC , which is by assumption at least )�% � . So *48 � ��) �C ������� *�8 �KC . Since C is arbitrary, �@) � holds.

Since every non-empty subset of � )#% �'C %IE � is also anon-empty subset of � ) CC%IE � , it is clear that *48 � 6 *58#) �implies *�8 � 6*) � *�8#) � , whence �@) ) � holds.
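Step (i) of this argument can be checked on a small instance. The Python sketch below assumes that domains are represented as sorted tuples of record indices, so that Python's built-in tuple comparison is exactly the lexicographic ordering; the helper name `insert_left` is hypothetical.

```python
def insert_left(domain, i, suffix):
    """Return the two domains compared in step (i), as sorted tuples.

    Assumptions: every element of `domain` is below i, and `suffix` is a
    non-empty set of indices that are all at least i + 1.
    """
    with_i = tuple(sorted(set(domain) | {i} | set(suffix)))
    without_i = tuple(sorted(set(domain) | set(suffix)))
    return with_i, without_i


if __name__ == "__main__":
    with_i, without_i = insert_left((0, 2), 3, (5, 7))
    # Both tuples agree on the elements of `domain`; the next elements are
    # i = 3 and min(suffix) = 5 respectively, so the first tuple is smaller.
    print(with_i, without_i, with_i < without_i)
```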

This finishes to show that at turn ) of the loop " / – " �,3 ,for any fixed � and ) , , the !

+ 6 E� ’s are sorted in strictlyincreasing � 6 order in �+�24 4'9 �8/ . So the first reportedrun is not only of minimum span � ) , C ) � � , but also � 6ML:N � -minimal. But it is easy to see that if ! and !R$ have thesame span � ) , CF) � � and ! � 6ML:N � ! $ , then ! ������� ! $ . Sothe first reported run is shortest. 6

The algorithm is still sound and complete on infinite logs, contrary to Algorithm 2.5: we never had to assume the log finite. The following is also clear, and says that runs are reported as soon as possible:

Theorem 3.12 (On-line) The algorithm of Definition 3.9 is on-line: if there is a run of

��+against 5 with

beginning ) , , and the span of its shortest run is � ) , CF) � � ,then this is reported when reading record 5 � , i.e. when) � ) � in the call to � ��� � ����� at line " A .
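The on-line behaviour can be illustrated by a deliberately simplified Python sketch. It is not the algorithm of Definition 3.9. It assumes that each signature is a non-empty list of predicates on records, with no variables, guards or epsilon-transitions, and that a waiting thread simply skips records that do not satisfy its next predicate. The names `audit`, `current` and `nxt` are hypothetical.

```python
from collections import deque


def audit(log, signatures):
    """Yield (k, beginning, positions) as soon as signature k completes.

    Each record is read exactly once, so `log` may be an unbounded stream.
    """
    current = deque()
    for i, record in enumerate(log):
        # Start one fresh thread per signature, beginning at record i.
        for k in range(len(signatures)):
            current.append((k, i, 0, ()))
        nxt = deque()
        while current:
            k, start, step, positions = current.popleft()
            steps = signatures[k]
            if steps[step](record):
                positions = positions + (i,)
                if step + 1 == len(steps):
                    yield k, start, positions  # reported while reading record i
                else:
                    nxt.append((k, start, step + 1, positions))
            elif step > 0:
                nxt.append((k, start, step, positions))  # keep waiting
            # step == 0 without a match: no run begins at record i
        current = nxt


if __name__ == "__main__":
    sig = [lambda r: r == "login", lambda r: r == "su"]
    print(list(audit(["login", "noise", "su", "login", "su"], [sig])))
```

In this restricted setting a thread always advances at the earliest matching record, so each pair of signature and beginning has a single thread and no kill set is needed; the full algorithm needs one because entries may be re-enqueued.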

3.3 Optimizations

There are two places where we can, and must, improve the efficiency of the algorithm of Definition 3.9.

First, in the branch of the main loop where the guard of the transition evaluates to false, it might even be the case that we can statically determine (i.e., before we run the model-checker) that if the guard is false now, then it will remain false forever. Consider a typical timeout condition: one field date of the records is a monotonically increasing time field, i.e., the date of a record is never greater than the date of any later record (such information can easily be supplied to the model-checker), the label of the transition contains a row binding date to a flexible variable, and the guard bounds this variable by a time constant. The idea is that this transition should only be active while the current time has not reached this constant yet, so attacks extending farther away in time should be discarded. As stated in Definition 3.9, the algorithm won't discard these transitions, because it is not aware that once the guard fails, it must fail forever, i.e., that the guard is an antitonic function (provided Booleans are ordered with false below true), meaning that increasing the values of free variables can only make the guard decrease. Call a flexible variable increasing in a transition label if and only if, whenever there is a row binding it to a field in that label, this field is a known monotonically increasing field. The precise optimization is that we may replace the marked true test in this branch by the negation of a static test on the label and the guard, which returns true only when the guard is an antitonic function of the increasing flexible variables and of the counter variables (which can only increase through time), and is constant as a function of the non-increasing flexible variables. (Note that the guard may depend at will on rigid variables, which do not vary.) For short, say that the guard is antitonic. Antitonicity is easily tested before the algorithm even starts, by using standard abstract interpretation techniques [3].

Second, look at the line of the recursive function that enqueues an entry for a transition that is not fired immediately. This places thread pid in a position where it will wait for some later record at which the transition can be fired. Now assume that the transition may be fired at either of two records, the second occurring strictly later in the log than the first. In general, it might be that, although the transition can be fired at the earlier record, this firing will never lead to a complete run, and we have to wait for the firing at the later record to obtain one. This is why we re-enqueue after a successful firing. However, this is not needed in special cases. For example, assume that the guard is true and the label binds no variable. (The variables bound by a label are those at the right of the equals sign in rows.) Then take any run that we get by firing at the later record. Firing at the earlier record instead does not change the values of variables, since the label binds no variable, hence replacing the later record by the earlier one in this run yields another, strictly shorter run. So it was not necessary to re-enqueue. In this case, we set retrigger to false when enqueuing; i.e., we replace true there by a call to some Boolean function of the label and the guard that returns false in this special case. This argument can be generalized to the case where the label may bind any flexible variable, the rigid variables that it binds are either already in the domain of the current environment (i.e., they cannot change their values because of later calls to the matching function) or are not free in any guard labeling any transition reachable from the target state (i.e., although they may change, they won't influence any later evaluation of a guard), and any reachable guard is antitonic (i.e., evaluating it at the earlier record instead of the later one can only make it truer). If so, this Boolean function returns false, otherwise true.
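The two decisions described in this section can be sketched as small Python predicates. This is a sketch under stated assumptions, not the tests used in the implementation: the function names are hypothetical, the timeout guard is assumed to have the form `date <= deadline`, and only the simple special case of the second optimization (constant-true guard, no bound variable) is covered.

```python
def keep_waiting(record_date, deadline, guard_is_antitonic):
    """First optimization: decide whether a thread stays queued.

    If the timeout guard fails and is known to be antitonic in a
    monotonically increasing date field, it can never hold again,
    so the thread can be discarded.
    """
    guard_holds = record_date <= deadline  # assumed form of the guard
    return guard_holds or not guard_is_antitonic


def needs_retrigger(guard_is_constant_true, binds_variables):
    """Second optimization, simple special case only.

    When the guard is constantly true and the label binds no variable,
    firing at the earliest record is never worse, so the thread need not
    wait for a later firing of the same transition.
    """
    return not (guard_is_constant_true and not binds_variables)


if __name__ == "__main__":
    print(keep_waiting(15, 10, guard_is_antitonic=True))
    print(needs_retrigger(True, False))
```

Both tests depend only on the signature, not on the log, which is why they can be computed once before auditing starts.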

Apart from these two optimizations, we use a few other tricks. The set Kill is regularly purged of pids of threads that have been killed already. Moreover, memoization is used extensively to avoid recomputing either the values of guards or the environments resulting from calls to the matching function, in fact, even resulting from individual row matching operations. Finally, there is no need to propagate whole partial runs in queue entries: changing the definition of environments to map flexible variables to histories, i.e., sequences of values of the variable along the run, and reserving a field name line holding line numbers in the records allows one to just keep one environment from which the whole run can be reconstructed.

4 Conclusion

The algorithm of Definition 3.9 was implemented in the logweaver tool, together with the optimizations of Section 3.3, by the second author. Practical results look promising: for example, on an 800 MHz Pentium III (Coppermine) machine with 256Kb cache running Linux 2.2.14-5.0, simultaneously detecting probing attacks (the same source connects to several destination addresses, at least � O O times, all related connections being collected even beyond the first � O�O ) and spoofing attacks (several sources connect to the same destination address) takes O " 3 / s. on a 942K firewall log consisting of 3026 lines, and 3�" ��A s. on a 9731K firewall log consisting of 31437 lines. In the latter experiment, parsing the log itself already accounts for 0 " O � s., i.e., 54% of the time. Moreover, times are proportional to the size of the log, but less than proportional to the size and number of signatures (because of memoization, which shares computations between signatures). Other small experiments on Linux syslog and messages auditing report negligible auditing times. In most cases, the optimizations of Section 3.3 turned out to be crucial.

We do not claim that Algorithm 3.9 is the final solution to log auditing. There are cases where combinatorial explosion occurs; after all, this is an exponential-time algorithm. (Note that, although the complexity of the shortest runs problem is in NP, we conjecture that it is not NP-complete.) Nonetheless, in all cases until now, combinatorial explosions were due to errors in writing signatures. These errors can be located by inspection of execution traces of the model-checker, which our implementation produces on demand.

All in all, we join A. Mounji [13] in estimating that detecting complex correlations of events is feasible, perhaps even with efficiency comparable to much simpler intrusion detection systems that only do filtering and counting. Our contribution, as we hope to have demonstrated, is that it is even possible to detect complex correlations of events, on-line, using a very declarative style of signatures, expressed in a suitable variant of temporal logic. This temporal logic has a clean semantics and a clear notion of shortest run. This clarity fosters optimizations in the model-checker, ensures optimality with respect to minimizing spurious attack reporting (Theorem 3.8), and at the same time provides for better readability and maintainability of signature files.

Acknowledgments

We thank Xiaobo Li, who improved and maintained the implementation of the model-checker of Section 2, and Stephane Demri for his suggestion that model-checking the logic of Section 2 was NP-complete.

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1985.

[2] T. Christiansen, L. Wall, and J. Orwant. Programming Perl. O’Reilly and Associates, 3rd edition, 2000.

[3] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs. Journal of Logic Programming, 13(2–3):103–179, 1992. Correct version at http://www.dmi.ens.fr/~cousot/COUSOTpapers/JLP92.shtml.

[4] M. Crosbie, B. Dole, T. Ellis, I. Krsul, and E. Spafford. IDIOT – users guide. Technical report, Purdue University, COAST Laboratory, 1996.

[5] M. Ducasse and J.-P. Pouzol. Handling generic intrusion signatures is not trivial. In Recent Advances in Intrusion Detection (RAID) Workshop, 2000.

[6] S. T. Eckmann, G. Vigna, and R. A. Kemmerer. STATL: An attack language for state-based intrusion detection. In ACM Workshop on Intrusion Detection Systems, 2000.

[7] M. R. Garey and D. S. Johnson. Computers and Intractability, a Guide to the Theory of NP-Completeness. W.H. Freeman and Co., 1979.

[8] S. Kumar. Classification and Detection of Computer Intrusions. PhD thesis, Department of Computer Science, Purdue University, West Lafayette, 1995.

[9] U. Lindqvist and P. A. Porras. Detecting computer and network misuse through the production-based expert system toolset (P-BEST). In IEEE Symposium on Security and Privacy, pages 146–161, 1999.

[10] LSV. Systems and Software Verification. Model-Checking Techniques and Tools. Springer Verlag, 2001.

[11] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer Verlag, 1991.

[12] M. Minoux. LTUR: A simplified linear-time unit resolution algorithm for Horn formulae and computer implementation. Information Processing Letters, 29(1):1–12, 1988.

[13] A. Mounji. Languages And Tools for Rule-Based Distributed Intrusion Detection. PhD thesis, FUNDP, Namur, Belgium, 1997.

[14] V. Paxson. BRO: A system for detecting network intruders in real-time. In 7th USENIX Security Symposium, 1998.

[15] P. A. Porras. STAT – a state transition analysis tool for intrusion detection. Master’s thesis, University of California, Santa Barbara, 1992.

[16] M. J. Ranum, K. Landfield, M. Stolarchuk, M. Sienkiewicz, A. Lambeth, and E. Wall. Implementing a generalized tool for network monitoring. In 11th Systems Administration Conference (LISA’97), pages 1–8. USENIX Association, 1997.

[17] M. Roesch. Snort: Lightweight intrusion detection for networks. In 13th Systems Administration Conference (LISA’99), pages 229–238. USENIX Association, 1999.

[18] E. Shapiro. Alternation and the computational complexity of logic programs. Journal of Logic Programming, 1(1):19–33, 1984.

[19] A. Sistla and E. Clarke. The complexity of propositional linear temporal logics. Journal of the ACM, 32(3):733–749, 1985.

[20] H. Spencer. regexp package. Available at http://arglist.com/regex/, 1986.

[21] M. Y. Vardi. An Automata-Theoretic Approach to Linear Temporal Logic, pages 238–266. Springer-Verlag LNCS 1043, 1996.

[22] P. Wolper. Temporal logic can be more expressive. Information and Control, 56(1/2):72–99, 1983.