Intrusion detection in Information Systems using Signature technique based on Workflow mining


April 14, 2014

Christian Engelbert Ngono and Roger Atsa Etoundi

University of Yaounde 1, Faculty of Science, Department of Computer Science

ngonooctavie@gmail, [email protected]

Abstract

Computer security is a global challenge that concerns a series of elements: the physical infrastructure for processing or communication, software (operating systems or applications), data, and users' behavior. If unexpected events, or even hacking, can abruptly stop a company from functioning, the risk of intrusion is the new bogeyman. Although Intrusion Detection Systems (IDS) exist, they are unfortunately not, or no longer, adapted to new attacks. Most of them are implemented with two main intrusion detection techniques, i.e., detection based on anomalies and detection based on intrusion signatures. Each method has its advantages and disadvantages, but from the point of view of efficiency and reliability, the technique based on intrusion signatures is better than the one based on anomalies. Motivated by this observation, security researchers have turned to improving this technique. In this work, we propose an approach for intrusion detection using the signature technique based on workflow mining. Workflow mining allows us to discover signatures, i.e., characteristic patterns. These can be used to distinguish between desirable and undesirable behaviors in an information system through event logs, which contain the events recorded in the information system during the running phase of a workflow. The proposed approach was tested with the event log of the information system of a world leader in professional and consumer health: Philips Healthcare.

Keywords: Intrusion Detection, Signature technique, workflow mining, information system.

1 Introduction and motivations

The growth of IT solutions and services makes information system security crucial [1], because the information system is a key asset of the organization that must be protected. Computer security is a challenge that affects a whole chain of elements: the physical infrastructure for processing or communication, software (operating systems or applications), data, and user behavior. Since the overall level of security is defined by the security level of the weakest link, precautions and countermeasures must be considered with respect to the vulnerabilities specific to the context in which the information system is expected to provide service and support.

No information system is 100% safe [2]. Among the well-known computer security precepts is the one stating that, for a company connected to the Internet, the question today is not whether it will be attacked, but when; a possible solution is to try to push the risk back over time by implementing various means of increasing the security level. If unexpected events, or even hacking, can suddenly bring a business to a standstill, the risk of intrusion is the new bogeyman. An intrusion can be defined as a set of actions designed to compromise the integrity, confidentiality or availability of a resource [1]. To counter the threat of intrusion, companies are increasingly turning towards intrusion detection solutions, whose vast capabilities are touted by the vendors of these products.

Intrusion detection systems have to analyze all or part of the actions performed on the system to detect any malfunctions. There are mainly two types of intrusion detection techniques, namely intrusion detection based on anomalies and intrusion detection based on signatures. Each method has its advantages and disadvantages, and intrusion detection systems usually implement both. From the standpoint of efficiency and reliability, however, the signature-based technique is preferable, and researchers are looking to improve it. Updating profiles and attack signatures, and specifying rules, are nevertheless generally difficult.

Furthermore, intrusion detection systems require more skills than administering the security system itself. Detection systems are usually written for one environment and do not fit the monitored system, while information systems are mostly heterogeneous and used in many ways [3]. Meanwhile, large organizations record relevant business events for process improvement, audits and fraud detection [4]. This analysis may lead to the re-engineering of workflow processes or provide input for the design of new workflows that complete the life cycle of the old workflow. To carry out process control and diagnosis tasks in a workflow, workflow mining is used as a guideline. Its aim is to reverse the process and collect data at runtime to support the design and analysis of a workflow [5]. The information gathered during the execution phase of a workflow is used to derive a model that explains the events recorded in the information system through the system event logs. Traces in these event logs can be classified as desirable or undesirable (e.g., malicious activities or not). This paper therefore shows how, starting from the event logs used in workflow mining, the signature technique can be applied to detect intrusions into the information system. In the context of intrusion detection, the problem remains open, with more complex and important issues, because we must remember that security is never absolute. Security does not set out to reach the absolute or to build a new magic line deemed impassable, but to determine an acceptable level of vulnerability in terms of constraints and objectives, and to monitor failures.

IT now irrigates all of an organization's business processes. A disruption of these services can have serious consequences in terms of lost production, so ensuring the reliability of IT services to allow business continuity is essential. Cameroon is rightly concerned, and the stakes are high [6], as:

• It has accelerated the growth of the market for IT and digital solutions over the past three years, driven in particular by demand from government and business services,

• It has implemented a major program for the computerization of public services over the period 2012-2017 that will impact the market for automation solutions and business intelligence (55% share) in the administration,

• There is a second source of growth in the digital market, with a diverse network of SMEs and small industries seeking performance and competitiveness, and more and more users of computer and digital resources.

To address these stakes, it is important to focus on the security of information systems, an issue that will soon emerge. The main objective of this work is therefore to study how intrusions can be detected in an information system using the signature technique, based on workflow mining.

The remainder of this paper is organized as follows. Section 2 reviews the scientific concepts and the literature on the detection of intrusions into information systems. Section 3 introduces the formalization of the security problem, based on the detection of intrusions into information systems: it presents the formal definition of concepts related to the information system, then the security policy, the normative model and the descriptive model, and finally the intrusion detection model. Section 4 proposes an implementation of this model in a framework called ProM, which contains a set of plugins for analyzing event logs and discovering process models from them. Section 5 concludes this work by recalling the main results and the research perspectives that seem most important for future work.

2 Related Work

Two different things can be searched for in the audit trail. The first is deviant behavior, which corresponds to the behavioral approach implementing anomaly detection techniques: the question is whether a user has behaved in a way that deviates from his habits. We can then deduce either that someone else has taken his place, or that he is trying to attack the system by abusing his rights. In both cases, there is an intrusion. The second thing that can be searched for in the audit trail is an attack signature. This corresponds to the scenario approach, which implements the signature-based technique. This section therefore presents these two techniques and their shortcomings, then introduces the concept of workflow mining and of process mining, one of its fields, which focuses on mining the processes of a workflow, and finally presents the different security solutions for intrusion detection that exist in process mining.

2.1 The anomaly detection technique

Anomaly detection is based on the normal behavior of an actor in an information system: any action that deviates significantly from the normal behavior is considered an intrusion. Intrusion detection based on anomalies is useful for detecting the following types of attack: (1) misuse of protocols and service ports: the characteristics of standard protocols can sometimes be violated or modified by an intruder to tunnel through a firewall, and installing backdoor services on well-known standard ports is another common misuse of service ports; (2) payload-based Denial of Service (DoS): when a malicious intruder crafts an attack using a specially designed IP packet, a DoS may occur on the network bandwidth, on CPU cycles, on memory resources, or on a process or application program, and the impact of the DoS attack appears as an anomaly in the quality of service; (3) Distributed Denial of Service (DDoS), a form of attack that floods the network with a large volume of traffic: sophisticated attack traffic cannot be distinguished from regular network traffic on a per-packet basis, and the attack does not exhibit a specific signature that can be sensed by signature-based mechanisms; (4) buffer overflow, the most common vulnerability exploited by attackers: buffer overflow with shell code execution is the most severe form of this exploit, because a successful attack can lead to the execution of arbitrary code on the victim's information system. Many exploited fields, such as FTP passwords, are supposed to be composed of printable ASCII characters according to the Request For Comments (RFC) standards of the Internet Engineering Task Force (IETF) working groups, so an excessive number of unprintable ASCII characters strongly suggests an anomaly, and shell code embedded in these fields is a sure sign of malice; (5) other natural network failures, such as failures in routers or switches, which can lead to changes in the traffic patterns observed on the network. Such events can be observed as a sudden drop in traffic volume due to broken connections, or as traffic shifting from one link to another as a recovery action. All these changes are noticeable and can be detected as traffic anomalies.

Anomaly detection covers a wide range of techniques, such as protocol anomaly detection, application payload anomaly detection and statistical anomaly-based intrusion detection [7]. Techniques for detecting intrusions based on anomalies detect unusual behavior and can therefore detect symptoms of an attack without knowledge of its specific details. They also help generate information that can be used to define signatures for misuse detectors. However, these techniques generally produce a large number of false alarms due to the unpredictable behavior of users and networks. Moreover, they often require training sets of system event records to characterize normal behavior.
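As an illustration of the statistical flavor of anomaly detection described above, the following Python sketch learns a per-actor baseline from an attack-free training period and flags observations that deviate from it by more than k standard deviations; it also computes the share of unprintable bytes in a field, the heuristic mentioned for FTP passwords. The actor names, daily counts, the threshold k = 3 and all function names are illustrative assumptions, not part of the original work.

```python
import statistics

def build_profile(training_counts):
    """Learn a baseline (mean, population standard deviation) of daily activity
    counts for each actor from an attack-free training period."""
    return {
        actor: (statistics.mean(counts), statistics.pstdev(counts))
        for actor, counts in training_counts.items()
    }

def is_anomalous(profile, actor, count, k=3.0):
    """Flag an observation lying more than k standard deviations from the
    actor's mean; actors never seen during training are flagged too."""
    if actor not in profile:
        return True
    mean, std = profile[actor]
    if std == 0:
        return count != mean
    return abs(count - mean) / std > k

def unprintable_ratio(field: bytes) -> float:
    """Fraction of bytes outside the printable ASCII range 0x20-0x7E."""
    if not field:
        return 0.0
    return sum(1 for b in field if not 0x20 <= b <= 0x7E) / len(field)

# Hypothetical training data: number of events per day for two users.
profile = build_profile({"alice": [10, 12, 11, 9, 13], "bob": [3, 4, 3, 5, 4]})
print(is_anomalous(profile, "alice", 12))           # False
print(is_anomalous(profile, "bob", 40))             # True
print(unprintable_ratio(b"\x90\x90\x90\x31\xc0"))   # 0.8
```

The rigidity of such a baseline also shows where the false alarms mentioned above come from: any legitimate change in a user's habits crosses the threshold just as an attack would.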

2.2 The detection technique based on signatures

The signature-based detection technique, also known as misuse detection, detects intrusions in terms of the characteristics of known attacks or system vulnerabilities. Therefore, any action that conforms to the model of a known attack or vulnerability is considered intrusive. This family covers the techniques that use known intrusion models or system weaknesses to intercept and identify intrusions. The sequence of actions or critical activities, the conditions that compromise the security of an information system, and the damage left by intrusions can be represented by a number of general pattern-recognition models. To perform intrusion detection, researchers in this field use rules to describe attack actions [8], state transition diagrams to model the general system status and access violations [9], and colored Petri nets to represent intrusion signatures as sequences of events on the target system [10]. In signature-based intrusion detection, two techniques are used: (a) expression matching, based on the recognition of expressions, which searches a stream of events for occurrences of specified patterns or signatures; and (b) state transition analysis, which models an attack as a network of states and transitions (corresponding to events), where each observed event is applied to finite state machine instances, each representing an attack scenario, and may cause transitions. Any machine that reaches its final state indicates an attack. This approach allows complex intrusion scenarios to be modeled simply and is able to detect slow or distributed attacks, but it may have difficulty expressing elaborate scenarios. On the basis of these techniques, signature-based intrusion detection can detect attacks without generating the large number of false alarms produced by anomaly detection techniques. It can also help manage the information system rapidly and reliably, diagnose the use of a specific attack tool or technique in order to take corrective action, and follow up security problems on the information system. However, signature-based intrusion detection can only detect known attacks, and its knowledge of new attacks must constantly be updated. Otherwise, newly invented attacks will probably go unnoticed, leading to an unacceptable false negative rate. Based on these techniques, many tools have been developed to address intrusion detection in specific application domains. Despite the popularity of the above techniques and associated tools, considerable work remains to be done on intrusion detection, because no solution handles the problem as a whole: the different techniques defined do not overcome all the difficulties encountered in intrusion detection.
In this work, intrusion detection is treated as an engineering problem. The approach used in this research is intrusion detection based on the discovery of signatures in event logs (or workflow logs) that can be used to explain or predict whether traces in an information system belong to the class of malicious activities or not.
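The two signature-matching techniques described in this subsection, (a) expression matching and (b) state transition analysis, can be sketched in Python as follows. The event names, the brute-force signature and the setuid-shell scenario are hypothetical examples chosen for illustration; real systems use much richer signature languages.

```python
import re

# (a) Expression matching: a signature is a regular expression searched in the
# stream of event names. Hypothetical signature: three or more failed logins
# immediately followed by a successful one.
BRUTE_FORCE = re.compile(r"\blogin_failed( login_failed){2,} login_ok\b")

def match_expression(events, signature=BRUTE_FORCE):
    """True if the signature occurs in the space-joined event stream."""
    return signature.search(" ".join(events)) is not None

# (b) State transition analysis: one finite state machine per attack scenario.
class ScenarioMachine:
    def __init__(self, name, transitions, final_state):
        self.name = name
        self.transitions = transitions  # {(state, event): next_state}
        self.final_state = final_state
        self.state = 0

    def feed(self, event):
        """Apply one event; unrelated events leave the state unchanged, so slow
        attacks interleaved with normal activity are still tracked."""
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state == self.final_state

def run_machines(machines, events):
    """Feed every event to every machine; a machine reaching its final state
    raises an alert naming its scenario."""
    alerts = []
    for event in events:
        for machine in machines:
            if machine.state != machine.final_state and machine.feed(event):
                alerts.append(machine.name)
    return alerts

def setuid_machine():
    # Hypothetical scenario: copy a shell, make it setuid, then execute it.
    return ScenarioMachine(
        "setuid-shell",
        {(0, "copy_shell"): 1, (1, "chmod_setuid"): 2, (2, "exec_shell"): 3},
        final_state=3,
    )

print(match_expression(["login_failed"] * 3 + ["login_ok"]))  # True
events = ["login_ok", "copy_shell", "read_mail", "chmod_setuid", "exec_shell"]
print(run_machines([setuid_machine()], events))  # ['setuid-shell']
```

The unrelated event `read_mail` does not reset the machine, which is what allows state transition analysis to follow attacks spread over time.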

2.3 Workflow mining

The starting point for workflow mining is the workflow log, which contains information about the workflow processes as they are actually executed in the information system. The life cycle of a workflow consists of four phases [5]: (1) workflow design, which builds the workflow model from the information at hand and the objectives, with identified activities and constraints; (2) workflow configuration, which addresses the limitations and characteristics of the workflow management system that result from achieving certain objectives within an organization; (3) workflow adoption, which deals with integrating new features into the workflow according to new requirements identified on the basis of the organization's strategy (in this phase, the information system is consistent with the vision of the organization); and (4) workflow diagnosis, in which the data obtained from execution instances of the flow are analyzed. This analysis can lead to workflow redesign or re-engineering of workflow processes, or provide input for the design of new workflows that complete the life cycle of the old workflow. To carry out process control and diagnosis tasks in a workflow, workflow mining is used as a guideline. The purpose of workflow mining is to reverse the process and collect data at runtime to support the design and analysis of a workflow. The information gathered during the execution phase of a workflow is used to derive a model that explains the events recorded in the information system.

A workflow instance is usually modeled according to the designer's perception, which does not really express what will be done in practice. The same seems to hold in the field of security, where the security policy reflects the perception of the head of the information systems security department rather than what is actually done in the real world. The models are therefore often normative, because they indicate what should be done rather than what is actually done by the workflow process; this type of model is subjective. For a model to be objective, the designer must consider the data produced by the actual execution of workflow instances. These data relate to the activities, to the time constraints and to the resources responsible for their generation. Workflow mining can therefore be very important for intrusion detection, because it reveals what is actually done in the information system by building a model corresponding to the state of the information system in terms of workflow execution. This model can be compared to the perceived model, also known as the prior model, to highlight the differences between the two models.
The prior model is generally based on the perceived security policy, defined by the set of actions that identified users must perform on the basis of given resources. The mined model, by contrast, is computed from the information obtained during the execution phase of the set of operations in the information system. These data are stored in the logs of the information system, which keep track of every event that occurs in the information system for further analysis and decision making. Monitoring events at runtime can also detect differences between the security policy built at the design stage and the actual execution recorded in the logs. Workflow technology is moving towards greater operational flexibility in the management of workflow exceptions, and most exceptions can be considered as intrusions from the point of view of intrusion detection. Users of an information system may therefore deviate from the a priori design of the information security policy; the resulting gap can be monitored using the data stored in the system logs, so as to bring out the difference defined above and detect intrusions. Data mining techniques can certainly be used to create a feedback loop that adapts the workflow model to changing circumstances and, in the meantime, detects imperfections during and after the implementation phase. The purpose of using workflow mining in this work is not only to show how workflow models are constructed from workflow event logs, but to detect intrusions in event logs according to the system's predefined security policy. Since we focus on the analysis of processes in event logs, we rely on the field of workflow mining that supports process discovery, delta analysis and performance analysis of workflows using event logs: process mining.
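The comparison between the prior (perceived) model and the model mined from the logs can be illustrated with a deliberately simple form of delta analysis. In this Python sketch, both models are reduced to directly-follows relations between activities; that representation is our assumption (the text does not fix one), and the activity names and traces are hypothetical.

```python
from collections import Counter

def directly_follows(traces):
    """Count the pairs (a, b) where activity b directly follows a in some trace."""
    pairs = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            pairs[(a, b)] += 1
    return pairs

def delta(prior_pairs, traces):
    """Compare the mined relation with the prior model. Returns the pairs that
    were executed but are not allowed, and the allowed pairs never executed."""
    mined = set(directly_follows(traces))
    return mined - prior_pairs, prior_pairs - mined

# Hypothetical prior model (derived from the security policy) and observed traces.
PRIOR = {("register", "check"), ("check", "approve"), ("approve", "pay")}
TRACES = [["register", "check", "approve", "pay"], ["register", "approve", "pay"]]
print(delta(PRIOR, TRACES))  # ({('register', 'approve')}, set())
```

The pair `("register", "approve")` is the gap between what the policy expects and what was actually executed: the check step was skipped in the second case.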

2.4 Process mining

The main goal of process mining is to use transaction logs to extract interesting information about transactional processes. For this, we assume that it is possible to record events such that each event refers to an activity, a case and an actor (who can start or execute an activity). Events have a timestamp and are always ordered by time of occurrence. The table below presents an example of an event log: 19 events, 5 activities and 6 actors.

Table 1 - An example of an event log (audit trail)

In addition to the information shown in this table, some logs contain more information about the event itself [11]. The events recorded in the log, as shown in the previous table, are the starting point of process mining. Three different perspectives are distinguished: the process perspective (How?), the organizational perspective (Who?) and the case perspective (What?).
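Because every event carries a case, an activity, an actor and a timestamp, the first step of process mining is usually to group events by case and order them in time, which yields one trace per process instance. The Python sketch below does this for a small log; the records are invented for illustration and do not reproduce Table 1.

```python
from collections import defaultdict

# Hypothetical records (case id, activity, actor, ISO 8601 timestamp);
# these are NOT the contents of Table 1.
RECORDS = [
    ("case1", "register", "ann",  "2014-01-02T09:00"),
    ("case2", "register", "ann",  "2014-01-02T09:05"),
    ("case1", "check",    "pete", "2014-01-02T10:00"),
    ("case1", "pay",      "sue",  "2014-01-02T11:30"),
    ("case2", "check",    "mike", "2014-01-02T12:00"),
]

def traces_by_case(records):
    """Group events by case and sort each case by timestamp (ISO 8601 strings in
    a single format sort chronologically), giving one trace per process instance."""
    cases = defaultdict(list)
    for case_id, activity, _actor, timestamp in records:
        cases[case_id].append((timestamp, activity))
    return {case_id: [a for _, a in sorted(evts)] for case_id, evts in cases.items()}

print(traces_by_case(RECORDS))
# {'case1': ['register', 'check', 'pay'], 'case2': ['register', 'check']}
```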


The process perspective: the focus is on the control flow (the sequence of activities). The goal is to find a good characterization of all possible paths, expressed in terms of a Petri net [12] or an event-driven process chain (EPC) [13].

The organizational perspective: this perspective focuses on the "actor" field. Its purpose is to structure the organization by classifying people in terms of roles and organizational units, or to show the relationships between the various stakeholders.
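For the organizational perspective, two simple analyses are counting hand-overs of work between actors within a case, and profiling which activities each actor performs, a starting point for grouping people into roles. This is a minimal Python sketch over a hypothetical log; the record layout (case, activity, actor, timestamp) and the function names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical records (case id, activity, actor, ISO 8601 timestamp).
RECORDS = [
    ("case1", "register", "ann",  "2014-01-02T09:00"),
    ("case2", "register", "ann",  "2014-01-02T09:05"),
    ("case1", "check",    "pete", "2014-01-02T10:00"),
    ("case1", "pay",      "sue",  "2014-01-02T11:30"),
    ("case2", "check",    "mike", "2014-01-02T12:00"),
]

def handover_of_work(records):
    """Count how often work passes from one actor to another between two
    consecutive events of the same case."""
    by_case = defaultdict(list)
    for case_id, _activity, actor, timestamp in records:
        by_case[case_id].append((timestamp, actor))
    handovers = Counter()
    for events in by_case.values():
        actors = [actor for _, actor in sorted(events)]
        for giver, receiver in zip(actors, actors[1:]):
            if giver != receiver:
                handovers[(giver, receiver)] += 1
    return handovers

def activity_profiles(records):
    """Map each actor to the set of activities they perform; actors with
    similar profiles are candidates for the same role."""
    profiles = defaultdict(set)
    for _case, activity, actor, _timestamp in records:
        profiles[actor].add(activity)
    return dict(profiles)

print(dict(handover_of_work(RECORDS)))
print(activity_profiles(RECORDS))
```

Here `pete` and `mike` share the profile `{"check"}`, which would suggest a common role.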

The case perspective: it focuses on the properties of cases. Cases can be characterized by their path through the process, or by the actors involved in them. The figure below illustrates the first two perspectives, using the log of Table 1.

Figure 1: Result of process mining for the process and organizational perspectives.

Research has also applied process mining to intrusion detection. Some of these approaches are presented below.

2.5 Intrusion Detection based on process mining

Intrusion detection in information systems is a new area of research. This work is set in the context of business processes within companies, for which more research has been done on detecting abnormalities in their performance.

[14] uses association rules and classification to detect attacks in TCPdump audit traces. To this end, it builds, during a learning phase (a period without attacks), a repository of frequent normal itemsets. It then runs a real-time algorithm that finds frequent itemsets and compares them to those stored in the repository of normal itemsets. Finally, it uses a previously trained classifier to classify suspicious connections as known attacks, unknown attacks or false alarms. The limitation of this solution in the context of this study is the learning phase, namely the choice of the threshold for mining association rules and of the data set, because it is not easy to distinguish between normal connections and attacks without substantial manual labor.

[15] uses the α-algorithm to discover the acceptable, or normal, business behavior of a process as a workflow net, which is then used to detect abnormal cases. Once this step is done, the "compliance" of each new audit trail can be checked using the "token game". The problem here is to find a correct and complete event log of the system. Moreover, we cannot certify that the workflow produced represents all possible business process execution paths (the problem of flexibility). Two metrics are then proposed to solve the problem of flexibility.
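The "token game" used by [15] to check the compliance of an audit trail can be sketched as follows: the trace is replayed on a workflow net, and a deviation is detected as soon as an activity fires without the tokens it needs, or when the replay does not end in the final marking. The net below is a hypothetical sequential process, and this simplified replay stops at the first deviation instead of computing a fitness measure such as the metrics mentioned above.

```python
from collections import Counter

# Hypothetical workflow net: transition -> (input places, output places).
NET = {
    "register": ({"start"}, {"p1"}),
    "check":    ({"p1"}, {"p2"}),
    "approve":  ({"p2"}, {"p3"}),
    "pay":      ({"p3"}, {"end"}),
}

def replay(net, trace, initial="start", final="end"):
    """Token game: fire the activities of the trace in order. The trace complies
    only if every transition is enabled when it fires and the replay ends with
    exactly one token in the final place."""
    marking = Counter({initial: 1})
    for activity in trace:
        if activity not in net:
            return False  # activity unknown to the model
        inputs, outputs = net[activity]
        if any(marking[p] < 1 for p in inputs):
            return False  # transition not enabled: deviation from the model
        for p in inputs:
            marking[p] -= 1
        for p in outputs:
            marking[p] += 1
    return +marking == Counter({final: 1})  # unary + drops empty places

print(replay(NET, ["register", "check", "approve", "pay"]))  # True
print(replay(NET, ["register", "approve", "pay"]))           # False
```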

[16] addresses the problem of flexibility in the system and proposes to consider an anomaly as a less frequent event. Unlike the α-algorithm approach, it proposes three process mining algorithms (sampling, threshold and incremental) to detect anomalies. However, as the author points out, this method has practical limitations, because these algorithms support only small logs. [17] takes the problem of flexibility into account and offers a five-step approach that supports large logs. However, like the approach proposed in [16], its definition of an anomaly takes into account only the execution path, neglecting activity triggers and data quality.
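The idea of treating an anomaly as a less frequent event can be illustrated at the level of trace variants. This Python sketch flags every trace whose exact activity sequence occurs in less than a given fraction of the log; the threshold and the example log are assumptions, and the sketch is not the sampling, threshold or incremental algorithm of [16].

```python
from collections import Counter

def rare_variants(traces, threshold=0.05):
    """Flag as anomalous every trace whose variant (exact activity sequence)
    accounts for less than `threshold` of all traces in the log."""
    if not traces:
        return []
    variants = Counter(tuple(t) for t in traces)
    total = len(traces)
    return [t for t in traces if variants[tuple(t)] / total < threshold]

# Hypothetical log: nine ordinary cases and one that skips the check step.
log = [["register", "check", "pay"]] * 9 + [["register", "pay"]]
print(rare_variants(log, threshold=0.2))  # [['register', 'pay']]
```

The weakness noted for such approaches is visible here: a rare but legitimate variant is flagged exactly like a malicious one.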

All work on process mining related to intrusion detection has been based on studying the behavior of the information system using the event log; however, as noted in the introduction, this technique is not very reliable or efficient, hence the interest in seeing how signature patterns can be generated from these event logs.

3 Modeling approach of the proposed solution

In this section, we present the model for the problem of intrusion detection by the signature technique in an information system, based on the process mining techniques presented in the previous section. The construction of the model starts with a description of the concepts used, which helps in understanding the relationships between the different concepts that lead to the event logs of the information system. The description is followed by the definition of the normative model of the security policy applied in the information system. The normative model is followed by the definition of the descriptive model associated with the actual use of the event logs of the information system. Finally, the last subsection defines the intrusion detection approach based on the discovery of signature patterns in event logs.

3.1 Definitions of concepts

Several methods have been developed and used in the area of business process modeling. Despite the popularity of some of them, there is no consensus on modeling standards and concepts. However, several perspectives must be taken into account for better management of workflows in a company, such as processes, organization, information, operations and quality of service. The ATSERO method [18] is based on formal modeling and describes several salient concepts inherent in the understanding of these perspectives.

1.1.1 Definition (Business Process)

A business process is a collection of activities or tasks designed to produce a specific output for a client. This implies a strong emphasis on how work is done in a company to deliver a service. In contrast, a process is a specific sequence of activities across time and space, with a beginning and an end, and clearly defined inputs and outputs.

1.1.2 Definition (Workflow)

A workflow is defined as a process (often administrative) of an organization in which tasks, procedures and information are processed or executed successively from one participant to another according to the rules of the business process (a participant may be either a human or a machine).

1.1.3 Definition (Business)

A business is an operating system that delivers a service with a certain quality of service. The system is described in terms of business processes, each of them being implemented by one or more workflows [19]. Formally, an information system, noted SI, is modeled by SI = <BP>, where:

• BP is a set of business processes.

A business process, noted BP, is modeled by BP = <Log>, where:

• Log is a set of event logs, each event log corresponding to a workflow that implements the business process.

1.1.4 Definition (Event log)

An event log is the trace of execution of a workflow: it is a set of cases, which consist of activities characterized by their actor and a timestamp. Formally:

• Case = <Event>, where Event is the set of events and each event refers to an activity,

• Event ⊆ Activity × Author × Timestamp, where Activity is the activity, Author the author (agent) of the activity, and Timestamp its time stamp,

• EventLog = <Case>, where Case is the set of cases of the event log.
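The nested structure defined above (an information system is a set of business processes, each business process a set of event logs, each event log a set of cases, and each case a sequence of events carrying an activity, an author and a timestamp) maps naturally onto nested data types. The Python dataclasses below are one possible encoding; the class and field names are our own choices, not part of the original model, and the `list[...]` annotations assume Python 3.9 or later.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class Event:
    activity: str
    author: str          # actor (agent) who performed the activity
    timestamp: datetime

@dataclass
class Case:
    events: list[Event] = field(default_factory=list)  # ordered by timestamp

@dataclass
class EventLog:
    cases: list[Case] = field(default_factory=list)  # execution trace of one workflow

@dataclass
class BusinessProcess:
    logs: list[EventLog] = field(default_factory=list)  # one log per implementing workflow

@dataclass
class InformationSystem:
    processes: list[BusinessProcess] = field(default_factory=list)

def all_events(si: InformationSystem) -> list[Event]:
    """Walk SI -> business process -> event log -> case -> event."""
    return [e for bp in si.processes for log in bp.logs
            for case in log.cases for e in case.events]

# Hypothetical example: one process, one workflow log, one case with two events.
case = Case([Event("register", "ann", datetime(2014, 1, 2, 9, 0)),
             Event("check", "pete", datetime(2014, 1, 2, 10, 0))])
si = InformationSystem([BusinessProcess([EventLog([case])])])
print([e.activity for e in all_events(si)])  # ['register', 'check']
```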

Intrusion detection aims to find abnormal runtime behavior in the execution of the system. In information systems, where system events are stored in logs (event logs), it is possible to detect intrusions using these logs.

3.2 Security Policy

Security of information systems is generally limited to ensuring access rights to data and system resources, by implementing authentication and control mechanisms. These mechanisms ensure that the users of these resources have only the rights that were granted to them. IT security must, however, be designed so as not to prevent users from developing the practices they need, and to ensure that they can use the information system with confidence. This is why it is first necessary to define a security policy, that is, the set of guidelines an entity follows for its security [1]. In our model, the security policy determines how the traces in event logs can be classified as desirable or undesirable (e.g., malicious or fraudulent behavior). This policy is a set of rules that must be followed by known and authorized entities in the organization. In this work, since intrusion detection is treated as an engineering problem, the security policy is based on the discovery of signatures that can be used to distinguish between desirable and undesirable behavior. The concept of event log is used to identify business events relevant to process improvement, auditing and the detection of intrusions that violate the security rules. When an activity is performed within the organization, the trace of this execution should be retained for further investigation. The concepts used are the event log, the workflow and the business process. Based on these concepts, the security policy SP is modeled as follows:

• PBs: the set of the P business processes of the information system,

• Logs: the set of the P event logs associated with the workflows of the business processes,

• SIs = SI → BPs: a function that gives, for an information system, the set of all its business processes,

• BPs = BP → Logs: a function that gives, for each business process, all the logs of its associated workflows,

• A function that takes an event log as input and returns two subsets: one containing the signature patterns of undesired system activity, and the other containing the signatures of desired activity patterns. There is an intrusion if the set containing the undesired activities is not empty. This function will be formally defined in future work, once the signature discovery phase on the event log is completed.
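The last item above, a function that splits the signatures found in an event log into an undesired and a desired subset and reports an intrusion when the undesired subset is non-empty, is left for future work in the paper. Purely as an illustration, here is a hypothetical Python version that assumes signatures are contiguous sequences of activities; the function names and the example signatures are not from the source.

```python
def contains(trace, pattern):
    """True if `pattern` occurs as a contiguous subsequence of `trace`."""
    n = len(pattern)
    return any(tuple(trace[i:i + n]) == tuple(pattern)
               for i in range(len(trace) - n + 1))

def classify_log(event_log, undesired_signatures, desired_signatures):
    """Return (undesired, desired): the signatures of each kind found in the log."""
    undesired = {s for s in undesired_signatures if any(contains(t, s) for t in event_log)}
    desired = {s for s in desired_signatures if any(contains(t, s) for t in event_log)}
    return undesired, desired

def intrusion(event_log, undesired_signatures, desired_signatures):
    """Intrusion if and only if the undesired subset is not empty."""
    undesired, _ = classify_log(event_log, undesired_signatures, desired_signatures)
    return bool(undesired)

# Hypothetical signatures and event log (each trace is a list of activities).
UNDESIRED = {("export_all", "delete_log")}
DESIRED = {("login", "read")}
LOG = [["login", "read", "logout"], ["login", "export_all", "delete_log", "logout"]]
print(classify_log(LOG, UNDESIRED, DESIRED))
print(intrusion(LOG, UNDESIRED, DESIRED))  # True
```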

Based on the security policy, the normative model of the information system can now be expressed.

3.3 Normative Model of the Information System

Assuming the security policy is applied in the target information system, and without any external action that violates this policy, the information system evolves over time as expected by the various officials of the organization. The normative model, designated by SInor, is formally specified as follows: SInor = (SI, Runs, SP), where:

• SI is the information system in place,

• Runs is the set of workflows executed in the information system,

• SP is the associated security policy.

The normative model falls far short of the reality of the information system. Considering the high number of attacks against information systems worldwide, the normative model of the information system is only a mental view, since security rules are violated through the internal database or by external actions. The actual number of attacks is unknown, because in many organizations managers do not reveal attacks to the public, sometimes because they do not want to upset their customers. Yet this behavior angers customers, and as a result the organization's revenue, which depends on the relationship of trust with its customers, decreases. Most of the time, managers do not even know that their information system has been the source of attacks. Based on these observations about the normative model, the actual situation of the information system is represented by the descriptive model.

3.4 Descriptive security model

This model describes the information system through the actual execution of activities in the system. The descriptive model SIdes is defined as follows: SIdes = (SI, Runs), where:

• SI is the information system,

• Runs is the set of events satisfying the trigger conditions for the execution of tasks.

3.5 The Intrusion Detection model

Our model begins with the signature discovery model, based on [4], and ends with the detection model itself.

3.5.1 Signature discovery model

The signature discovery model shown in Figure 2 discovers patterns that discriminate between different categories of behavior, starting from an event log. Each event refers to an activity or action (e.g., a well-defined step in the process) and is linked to a particular case (e.g., a process instance). Events belonging to a case are ordered. Consequently, cases are represented as traces of events corresponding to "runs" of a process that may be unknown. Events can have all kinds of attributes (e.g., timestamp, resources used, temperature, costs). The proposed signature discovery model in Figure 2 is generic and works for any event log whose cases are labeled (the labels denoting different classes of behavior), with a provision for some of the remaining cases being unlabeled. We now explain the components of our signature discovery model.

Figure 2. Model for signature discovery.

The block depicted in the dashed rectangle is an optional step, to be considered when some of the cases in an event log are unlabeled.

A. Class labeling. When an event log contains unlabeled cases, an important issue to resolve is which labels can be assigned to these cases. Effective ways to automate or semi-automate the labeling must be designed. We propose the use of clustering and/or classification techniques from machine learning, such as k-nearest neighbor and one-class support vector machines (SVM), to help with class labeling. For example:

• If unlabeled instances must be assigned to one of the class labels already present in the event log, then the k-nearest neighbor approach can be used. The basic idea is to determine, for each unlabeled instance, the k closest labeled cases and to assign the majority class of these k instances as the class label of the unlabeled example;

• If the labeled instances in the event log belong to only one class and we are interested in labeling the unlabeled cases into one of two classes, e.g., fraudulent or non-fraudulent as in the case of insurance claims, an interesting approach is the use of a one-class support vector machine. Here, only the instances of one class (e.g., non-fraudulent) are assumed to be labeled. A one-class SVM works on the premise that each negative instance can be negative in its own way, i.e., the distribution of negative instances is unknown. Once a one-class SVM is built on the non-fraudulent cases, every unlabeled instance can be evaluated as belonging to the non-fraudulent class or not, and labeled accordingly, under the assumption that all positive (non-fraudulent) cases are alike while each negative (fraudulent) instance may differ in its own way.

After running this step, all instances of the event log must have a class label.
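The k-nearest neighbor labeling of the first bullet can be sketched in a few lines of Python. This is a minimal illustration, not the ProM plug-in's implementation: it assumes that each case has already been turned into a numeric feature vector (as in step B) and that Euclidean distance is an adequate similarity measure, both of which are our own assumptions. The function names `euclidean` and `knn_label` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_label(labeled: List[Vector], labels: List[str],
              unlabeled: List[Vector], k: int = 3) -> List[str]:
    """Give each unlabeled case the majority label of its k nearest labeled cases."""
    if not 1 <= k <= len(labeled):
        raise ValueError("k must be between 1 and the number of labeled cases")
    assigned = []
    for case in unlabeled:
        # Indices of the k labeled cases closest to this case, nearest first.
        nearest = sorted(range(len(labeled)),
                         key=lambda i: euclidean(labeled[i], case))[:k]
        votes = Counter(labels[i] for i in nearest)
        # On a tie, most_common keeps first-seen order, so the label whose
        # closest neighbor is nearest wins.
        assigned.append(votes.most_common(1)[0][0])
    return assigned


labeled = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = ["normal", "normal", "malicious", "malicious"]
print(knn_label(labeled, labels, [[0.2, 0.5], [5.5, 4.8]], k=3))
# ['normal', 'malicious']
```

An odd k avoids most ties in the two-class case; the one-class SVM variant of the second bullet would replace the vote with a learned boundary around the labeled class.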

After this preprocessing step, we can discover patterns that are specific to each class and that discriminate between classes.

B. Feature extraction and selection. This stage corresponds to the extraction, from an event log, of the features that form the basis of the signature models. Once features are defined, each instance in the event log must be transformed into a vector space, where the elements of the vector correspond to the values of the selected features in that instance. A wide variety of feature types need to be considered, and the choice of feature type largely depends on the nature of the problem and on its manifestation in the event log. Domain knowledge can help in choosing appropriate features. Individual events, sequence features (tandem arrays, maximal repeats and their variants), and alphabet features are recommended. Sequence features are important when the occurrence of a particular sequence of events in the log defines a symptomatic pattern, for example, when an activity is malicious and the resources interacting with it make repeated attempts to obtain a response from it. Such attempts, often manifested in the form of loops, are captured by tandem arrays. Alphabet features are derived from sequence features by relaxing the order of events: sequence features defined over the same set of activities (events) are considered equivalent under an alphabet feature. In addition to the above, features related to other perspectives, such as data (e.g., data objects and their values in each trace), can be adopted. A large number of extracted features leads to the curse of dimensionality; feature selection techniques address this by eliminating irrelevant and redundant features. Techniques ranging from simple filtering, such as the removal of infrequent features, to advanced dimensionality reduction, such as principal component analysis (PCA), can be adopted.

Once feature extraction and selection are done, the event log is transformed into a vector space as shown in Table 1.
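To make the vector-space transformation concrete, the sketch below turns traces into rows of a case-by-feature table. It is deliberately simplified and rests on our own assumptions: a "sequence feature" here is any adjacent repetition αα of a subsequence α of length at most `max_period` (a simplification of the tandem arrays of [4]; maximal repeats are omitted), the alphabet feature keeps only the sorted set of events of α, and feature selection is reduced to dropping features that occur in fewer than `min_support` cases. The names `case_features` and `to_vectors` are hypothetical. The `binary` flag switches between the nominal (presence/absence) and numeric (frequency) representations described in the caption of Table 1.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Feature = Tuple[str, object]


def case_features(trace: Sequence[str], max_period: int = 3) -> Counter:
    """Bag of features for one trace (simplified, see the assumptions above).

    ("event", a)    -> individual event a
    ("seq", alpha)  -> adjacent repetition alpha alpha, len(alpha) <= max_period
    ("alpha", set)  -> the same repetition with event order relaxed (sorted events)
    """
    feats: Counter = Counter(("event", a) for a in trace)
    n = len(trace)
    for p in range(1, max_period + 1):
        for i in range(n - 2 * p + 1):
            alpha = tuple(trace[i:i + p])
            if alpha == tuple(trace[i + p:i + 2 * p]):
                feats[("seq", alpha)] += 1
                feats[("alpha", tuple(sorted(set(alpha))))] += 1
    return feats


def to_vectors(traces: List[Sequence[str]], binary: bool = False,
               min_support: int = 1) -> Tuple[List[Feature], List[List[int]]]:
    """Turn traces into (columns, rows) of a case-by-feature table."""
    per_case = [case_features(t) for t in traces]
    # Case-level support: in how many cases each feature occurs at least once.
    support = Counter(f for feats in per_case for f in feats)
    # Filtering: drop features seen in fewer than min_support cases.
    columns = sorted(f for f, s in support.items() if s >= min_support)
    rows = [[(1 if feats[f] > 0 else 0) if binary else feats[f] for f in columns]
            for feats in per_case]
    return columns, rows


traces = [["a", "b", "b", "c"], ["a", "b", "c", "b", "c", "d"], ["a", "d"]]
columns, rows = to_vectors(traces)
```

Here the loop "b c b c" in the second trace yields the sequence feature `("seq", ("b", "c"))`, while the first trace contributes `("seq", ("b",))`.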

C. Discovering patterns. Given a data set as shown in Table 1, the objective of this step is to discover patterns over features that are highly correlated with the class label (e.g., normal or malicious). We adopt standard data mining techniques, i.e., decision tree learning and association rule mining. These two learning algorithms are chosen mainly for three reasons:

• They are non-parametric, i.e., no specific distribution of the data (the set of input data) is assumed;

Table 1: The cases of a labeled event log are transformed into a vector space based on the selected features $(f_1, f_2, \ldots, f_m)$. One can choose between a nominal (binary) representation, where the value of a feature in a case corresponds to the presence/absence of the feature in that case, and a numeric representation, where the values correspond to the frequency of the feature in the case.

• They generate simple, understandable rules that are easy for domain experts to interpret;

• They can easily handle imbalanced data sets, i.e., data sets where the instances of each class are not approximately equally represented.

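For the extraction of association rules, a special subset called class association rules is adopted, which is an integration of classification rule mining and association rule mining [20], [21]. The results of this step are rules such as:

$$\text{IF } f_{i_1} = v_{i_1} \wedge f_{i_2} = v_{i_2} \wedge \cdots \wedge f_{i_k} = v_{i_k} \text{ THEN class} = c$$

where the $v_{i_j}$ are values of the corresponding features and $c$ is a class label (e.g., normal or malicious).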
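Rules of this form can be scored with the usual support and confidence measures. The sketch below is a brute-force illustration under our own assumptions: each case is a dict of hashable feature values, antecedents have at most `max_len` conditions with at most one condition per feature, and thresholds are given as fractions. It is not the CBA algorithm of [20], which prunes the candidate space far more efficiently, and the names `matches`, `support_confidence` and `mine_cars` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Case = Dict[str, object]  # feature name -> value, e.g. {"f1": 1, "f2": 0}


def matches(antecedent: Case, case: Case) -> bool:
    """True if the case satisfies every condition f = v of the antecedent."""
    return all(case.get(f) == v for f, v in antecedent.items())


def support_confidence(antecedent: Case, label: str, cases: Sequence[Case],
                       labels: Sequence[str]) -> Tuple[float, float]:
    """Support = P(antecedent and label); confidence = P(label | antecedent)."""
    covered = [y for x, y in zip(cases, labels) if matches(antecedent, x)]
    hits = sum(1 for y in covered if y == label)
    support = hits / len(cases) if cases else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def mine_cars(cases, labels, max_len=2, min_support=0.1, min_confidence=0.8):
    """Enumerate rules IF f=v AND ... THEN label passing both thresholds."""
    items = sorted({(f, v) for case in cases for f, v in case.items()}, key=repr)
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if len({f for f, _ in combo}) < k:  # at most one condition per feature
                continue
            antecedent = dict(combo)
            for label in sorted(set(labels)):
                s, c = support_confidence(antecedent, label, cases, labels)
                if s >= min_support and c >= min_confidence:
                    rules.append((antecedent, label, s, c))
    return rules
```

For example, on four cases where `f1 = 1` exactly when the case is malicious, `mine_cars(..., min_support=0.25, min_confidence=1.0)` returns the rule `({"f1": 1}, "malicious", 0.5, 1.0)` among others.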

D. Evaluation. Standard data mining measures, such as the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), and measures derived from them, such as accuracy, sensitivity, specificity, precision, and F1-score, are adopted to evaluate the quality of the discovered signatures. Models with sensitivity and specificity close to 1.0 are preferred. For a given data set, one can construct many classifiers. The differences are mainly due to the choice of parameter values for the learning algorithm (e.g., the split criterion in decision trees, or the minimum support and minimum confidence constraints in association rule mining). An important characteristic of a learned model is its generalization, which refers to the performance of the learned model on unseen examples. If the whole data set is used to learn the signatures, the discovered signatures may overfit: the learned model can perform well on the input data set but poorly on new examples. Therefore, we adopt cross-validation techniques during the learning phase in the above step. Cross-validation is a model selection technique in which the input data set is divided into two subsets, namely a training set and a validation set. The model is learned on the training set and evaluated on the validation set. A particular case of cross-validation is k-fold cross-validation, where the input data set is divided into k subsets, and the model is learned on training data comprising k-1 subsets and validated on the remaining subset. This process is repeated k times, each time with a different split into training and validation data. The cross-validation performance is the average of the results (with respect to measures such as precision) over all splits. We prefer signature models with better cross-validation performance. If the performance is not satisfactory, the settings of the learning algorithm can be changed and the signatures relearned.
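The measures and the k-fold procedure above can be sketched as follows. This is an illustration under our own assumptions: "malicious" is treated as the positive class, folds are contiguous and unshuffled, a 0/0 ratio is mapped to 0.0 by convention, and `fit`/`predict` stand for whatever learner (decision tree, rule set) is being tuned. All function names are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def confusion(actual: Sequence[str], predicted: Sequence[str],
              positive: str = "malicious") -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _div(x: float, y: float) -> float:
    return x / y if y else 0.0  # convention: 0/0 -> 0.0


def measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    precision = _div(tp, tp + fp)
    sensitivity = _div(tp, tp + fn)  # a.k.a. recall, true positive rate
    return {
        "accuracy": _div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": _div(tn, tn + fp),
        "precision": precision,
        "f1": _div(2 * precision * sensitivity, precision + sensitivity),
    }


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists; folds are contiguous."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n))
        yield training, validation
        start += size


def cross_validate(X, y, fit: Callable, predict: Callable, k: int = 5,
                   metric: str = "accuracy") -> float:
    """Average `metric` over the k validation folds."""
    scores = []
    for training, validation in k_fold_indices(len(X), k):
        model = fit([X[i] for i in training], [y[i] for i in training])
        predicted = [predict(model, X[i]) for i in validation]
        actual = [y[i] for i in validation]
        scores.append(measures(*confusion(actual, predicted))[metric])
    return sum(scores) / len(scores)
```

In practice the cases should be shuffled (or stratified by class) before splitting, since event logs are often ordered in time.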

E. Reporting and visualization. The final step of this signature discovery model presents and displays the results. Automated reports listing the signature patterns and performance indicators are generated. In addition to reports, the results can be depicted in pictorial form, such as pie charts and scatter plots. This visualization helps to evaluate the goodness of a set of features. In Figure 3, we can see that the two classes (normal and malicious) are clearly separated, indicating that the feature representation used for the cases is good enough to find discriminatory patterns.

Figure 3: Visualization of the data set using principal components.
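A scatter plot such as Figure 3 can be obtained by projecting the case vectors onto their first two principal components. The NumPy sketch below shows that projection via the singular value decomposition of the centered data; plotting (for instance, coloring each point by its class label) is left out, and the function name `pca_2d` is hypothetical.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X (cases x features) onto the first two principal components.

    Returns the projected points and the fraction of variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    total = np.sum(s ** 2)
    explained = s ** 2 / total if total > 0 else np.zeros_like(s)
    return centered @ vt[:2].T, explained[:2]


# Two "normal" and two "malicious" case vectors.
points, explained = pca_2d([[0, 0, 0], [0, 1, 0], [5, 5, 1], [6, 5, 1]])
```

If the classes fall on opposite sides along the first component, as in this toy example, the chosen features already carry discriminatory information.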

3.5.2 The detection model

In the evaluation phase of the signature discovery model, standard data mining measures, such as the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), and measures derived from them, such as accuracy, sensitivity, specificity, precision and F-score, were selected to evaluate the quality of the discovered signatures. Depending on the field of study, there will always be two subsets at the end of this phase: the set containing the patterns of recorded malicious activities, {False-process}, and the set containing the patterns of recorded normal activities, {True-process}. To define this model, the following definitions are used:

• Signature_Discovery_True: Log → {True-process}: the function that takes as input an event log and returns {True-process};

• Signature_Discovery_False: Log → {False-process}: the function that takes as input an event log and returns {False-process}.

Let the descriptive and normative models of the information system be denoted $SI_{des}$ and $SI_{nor}$, respectively, and the information system itself be denoted $SI$. $SI_{des}$ is said to conform to the normative model, denoted $SI_{des} \models SI_{nor}$, if and only if the following condition is satisfied:

$$\forall Log_1 \in BPs(BP) \wedge BP \in SIs(SI_{des}),\ \forall Log_2 \in BPs(BP) \wedge BP \in SIs(SI_{nor}):\ Signature\_Discovery\_False(Log_1) = \emptyset \wedge Signature\_Discovery\_False(Log_2) = \emptyset$$

Similarly, the descriptive model does not conform to the normative model, denoted $SI_{des} \not\models SI_{nor}$, if and only if $Signature\_Discovery\_False(Log) \neq \emptyset$ for some such log, that is to say, some signatures reveal malicious activity.

Property: $\forall Log \in BPs(BP) \wedge BP \in SIs(SI),\ Signature\_Discovery\_True(Log) \neq \emptyset$.
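The conformance condition can be read as an executable check. In the sketch below, which is our own illustration, a log is a list of traces and `Signature_Discovery_False` is replaced by a hypothetical stand-in that simply reports which known malicious signatures (contiguous runs of events) occur in the log; in the approach described here, those signatures would come from the discovery step of Section 3.5.1. The names `occurs`, `make_signature_discovery_false` and `conforms`, and the event names in the example, are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Trace = Sequence[str]
Log = List[Trace]
Signature = Tuple[str, ...]


def occurs(trace: Trace, signature: Signature) -> bool:
    """True if the signature appears as a contiguous run of events in the trace."""
    m = len(signature)
    return any(tuple(trace[i:i + m]) == signature for i in range(len(trace) - m + 1))


def make_signature_discovery_false(known: Iterable[Signature]) -> Callable[[Log], Set[Signature]]:
    """Hypothetical stand-in for Signature_Discovery_False: report which known
    malicious signatures occur in some trace of the log."""
    signatures = list(known)

    def discover(log: Log) -> Set[Signature]:
        return {sig for sig in signatures for trace in log if occurs(trace, sig)}

    return discover


def conforms(des_logs: Iterable[Log], nor_logs: Iterable[Log],
             discover_false: Callable[[Log], Set[Signature]]) -> bool:
    """SI_des |= SI_nor iff no log of either model yields a malicious signature."""
    return all(not discover_false(log) for log in [*des_logs, *nor_logs])


discover = make_signature_discovery_false([("login_fail", "login_fail", "login_fail")])
normal_log = [["login", "read", "logout"]]
attack_log = [["login_fail", "login_fail", "login_fail", "login"]]
print(conforms([normal_log], [normal_log], discover))              # True
print(conforms([normal_log, attack_log], [normal_log], discover))  # False
```

A single non-empty result is enough to reject conformance, which is why `all(...)` stops at the first log that yields a signature.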

4 Implementation

In this step, the implementation focuses mainly on the first part of this model, namely signature discovery, to test whether the approach actually works for detection. We use the case study specified in [4] in order to obtain the event logs of the information system concerned. To do this, we use the ProM software, which contains converters (e-mail, Staffware, InConcert, SAP, etc.), plug-ins for process mining, plug-ins for workflow analysis, a plug-in for performance testing and a plug-in for social networks [4]. Given an event log where the cases are labeled (indicating different classes of behavior), the software discovers discriminatory patterns that distinguish between different classes of behavior. The software assumes that the label of a case is provided as the value of the attribute "class" in the event log. Fig. 4 illustrates the configuration step for class labeling, while Figures 5 and 6 show the configuration steps for feature extraction/selection and for the learning algorithm, respectively. Fig. 7 shows the results provided by the plug-in.


Figure 4: Configuration step for labeling unlabeled instances in an event log. The plug-in supports two algorithms for class labeling, namely k-nearest neighbor and one-class SVM.

Figure 5: Configuration step for feature extraction and selection. Different types of features are supported, for example sequence and alphabet features (tandem arrays, maximal repeats and their variants) and individual events.

Figure 6: Configuration step for the learning algorithm used to discover signature patterns. Two classes of algorithms are supported, namely decision trees and association rule mining.

To test this plug-in, the event log of the information system of a global leader in professional and consumer health, Philips Healthcare, was used. The log comes from a study, conducted by Jagadeesh Chandra Bose and van der Aalst, on fault diagnosis of X-ray machines installed worldwide, and contains all recorded events. Ideally, Philips Healthcare machines should not malfunction during their lifetime. However, when they do, it is important that these problems are corrected quickly and predictably. The X-ray machines considered in this study are installed worldwide and continuously record all major events (e.g., operating system start-up, warnings, errors). Moreover, problems (customer complaints) and the actions taken to address them are recorded as worksheets. The combination of these two data sources (logs and worksheets) is a rich source of historical service data. The organization has seen an opportunity to improve its system maintenance through log-based diagnosis. Specifically, it is interested in whether the diagnostic value of the recorded system logs can be enhanced by discovering patterns that can be correlated with known problems and/or corrective actions. In this case study, the analysis was limited to the task of finding symptomatic patterns in event logs that can be associated with a malfunction requiring the replacement of parts in an X-ray machine. Parts that can be replaced in the system are called field replaceable units (FRUs).

Figure 7: Results of the "Signature Discovery" plug-in. The plug-in estimates different quality measures for each discovered signature.

5 Conclusion and future work

Ultimately, this research proposes an approach in which workflow mining is used to discover signatures, i.e., characteristic patterns that can be used to distinguish between desirable and undesirable behaviors in the information system, through event logs that contain the events recorded in the information system during the execution phase of a workflow. The objective is to detect intrusions in information systems using the signature technique based on workflow mining.

This paper begins by presenting scientific concepts and a review of the literature on intrusion detection in information systems. It then presents the proposed approach: first, the formal definitions of concepts related to the information system, then the security policy, the descriptive model and the normative model. Finally, the intrusion detection model is proposed and implemented using the ProM framework, which contains a set of plug-ins for analyzing and discovering process models from event logs.

Given that a business process is implemented by one or more workflows, defining multiple workflows that perform the same tasks offers flexibility that can be used to implement degraded modes. For example, to exchange data between two entities of a virtual enterprise, a broadband connection can be used, allowing rapid transmission of data so that employees can work in online transactional mode. To choose the workflow that best fits the context, it is possible to establish a link between processes and resources through a particular formalism (graphs, Petri nets) [19]. Our solution takes this link into account, because the signatures are formalized over events, which correspond to workflow tasks, i.e., sets of activities that use resources; this had not been done so far, since existing detection methods based on the signature technique operate on data. Thus, this solution can provide security across two virtual enterprises [4].

As future work, we plan to conduct case studies on the information systems of large companies nationwide, in order to assess their current state of security and to implement this intrusion detection model.

6 Acknowledgements

This research is supported by the integrated management information system for state employees and payroll of the Ministry of Public Service and Administrative Reform. The authors thank Professor Wil van der Aalst of Eindhoven University of Technology for his valuable advice and for promptly providing us with documents that allowed us to situate our problem and to build the methodological framework of this work.

References

[1] Michel, C. (2003). Langage de description d'attaques pour la détection d'intrusions par corrélation d'évènements ou d'alertes en environnement réseau hétérogène (Doctoral dissertation, Rennes 1).

[2] Dagorn, N. (2006). Détection et prévention d'intrusion : présentation et limites.

[3] Jansen, W., Mell, P., Karygiannis, T., & Marks, D. (2000, June). Mobile Agents in Intrusion Detection and Response. In Proceedings of the 12th Annual Canadian Information Technology Security Symposium, Ottawa, Canada.

[4] Bose, R. P., & van der Aalst, W. M. (2013, April). Discovering signature patterns from event logs. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (pp. 111-118). IEEE.

[5] van der Aalst, W. M., van Dongen, B. F., Herbst, J., Maruster, L., Schimm, G., & Weijters, A. J. M. M. (2003). Workflow mining: A survey of issues and approaches. Data & Knowledge Engineering, 47(2), 237-267.

[6] Bassom, P. (2013, September). Rencontre d'Affaires dans des technologies numériques Cameroun, Gabon, Douala Libreville. http://www.ubifrance.fr

[7] Atsa Etoundi, R., Mboupda Moyo, A., Nkoulou Onanena, G., & Nkondock Mi Bahanag, N. (2013). A Formal Framework for Intrusion Detection within an Information System based on Workflow Audit. International Journal of Computer Applications, 81.

[8] Lunt, T. (1993). Detecting intruders in computer systems. In Proceedings of the 1993 Conference on Auditing and Computer Technology.

[9] Ilgun, K., Kemmerer, R. A., & Porras, P. A. (1995). State transition analysis: A rule-based intrusion detection approach. IEEE Transactions on Software Engineering, 21(3), 181-199.

[10] Kumar, S., & Spafford, E. H. (1995). A software architecture to support misuse intrusion detection.

[11] van der Aalst, W. M., Reijers, H. A., Weijters, A. J., van Dongen, B. F., Alves de Medeiros, A. K., Song, M., & Verbeek, H. M. W. (2007). Business process mining: An industrial application. Information Systems, 32(5), 713-732.

[12] van der Aalst, W. M. (1999). Formalization and verification of event-driven process chains. Information and Software Technology, 41(10), 639-650.

[13] van der Aalst, W. M. (1998). The application of Petri nets to workflow management. Journal of Circuits, Systems, and Computers, 8(01), 21-66.

[14] Barbara, D., Couto, J., Jajodia, S., & Wu, N. (2001). ADAM: Detecting intrusions by data mining. In IEEE Workshop on Information Assurance and Security (pp. 11-16).


[15] Bezerra, F., Wainer, J., & van der Aalst, W. M. P. (2009). Anomaly detection using process mining. Enterprise, Business-Process and Information Systems Modeling (29), 149-161.

[16] Bezerra, F., & Wainer, J. (2008, June). Anomaly detection algorithms in business process logs. In ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems (pp. 11-18).

[17] Rozinat, A., & van der Aalst, W. M. P. (2008, March). Conformance checking of processes based on monitoring real behavior. Information Systems, 64-95.

[18] Etoundi, R. A. ATSERO Method: A Guideline for Business Process and Workflow Modeling within an Enterprise.

[19] Hervé Mathieu, Thierry G. Utilisation des contrats de service pour gérer le système d'information.

[20] Liu, B., Hsu, W., & Ma, Y. (1998). Integrating Classification and Association Rule Mining. In Fourth International Conference on Knowledge Discovery and Data Mining (KDD) (pp. 80-86). The AAAI Press.

[21] Zhou, Q., & Zhao, Y. (2013). The Design and Implementation of Intrusion Detection System based on Data Mining Technology. Research Journal of Applied Sciences, Engineering and Technology, 5(14), 3824-3829.
