
Frequency- and Ordering-Based Similarity Measure for Host-Based Intrusion Detection

Sanjay Rawat1,2, V P Gulati2, and Arun K Pujari1

1 AI Lab, Dept. of Computer and Information Sciences, University of Hyderabad, Hyderabad-500046, INDIA

[email protected]

2 IDRBT

Castle Hills, Road No. 1, Masab Tank, Hyderabad-500057, INDIA

[email protected], [email protected]

Abstract. This paper discusses a new similarity measure for the anomaly-based intrusion detection scheme using sequences of system calls. With the increasing frequency of new attacks, it is getting difficult to update the signature database of a misuse-based intrusion detection system (IDS). Though anomaly-based IDS thus has a very important role to play, its high rate of false positives remains a cause for concern. Our work defines a similarity measure that considers the number of shared system calls, the frequencies of system calls, and the ordering of system calls made by the processes to calculate the similarity between processes. We propose the use of the Kendall Tau distance to calculate similarity in terms of the ordering of system calls in a process. The k nearest neighbor (kNN) classifier is used to categorize a process as either normal or abnormal. The experimental results, obtained on the 1998 DARPA data, are very promising and show that the proposed scheme yields a high detection rate and a low rate of false positives.

Keywords: Intrusion Detection, Anomaly, System Calls, kNN Classifier.

1 Introduction

Intrusions are attempts at compromising the confidentiality and integrity, or bypassing the security mechanisms, of a computer or network. Intrusion detection is the process of monitoring the events in a computer or network and analyzing them for signs of intrusions. With the rapid growth of attacks on computers, intrusion detection systems (IDS), which are software or hardware products that automate this monitoring and analysis process [2], have become a critical component of security architecture.

According to Axelsson [1], an IDS consists of an Audit Collection/Storage Unit, a Processing Unit, and an Alarm/Response Unit. While the Audit Collection/Storage Unit collects the data that is to be analyzed for signs of intrusion, the Processing Unit analyzes the received data using various techniques to zero in

on intrusions. The Alarm/Response Unit triggers an alarm on detecting an intrusion and may execute a defensive action too. Based on the various ways of managing these units, different types of IDS have been proposed in the literature. On the basis of audit collection, there are two types of IDS: network-based systems [16], which collect data, in the form of packets, directly from the network being monitored, and host-based systems [2], which collect data from the host being protected. According to the processing unit, IDS can likewise be classified into two types: misuse-based systems and anomaly-based systems. While the former keeps signatures for known attacks in a database and compares new instances with them to find attacks, the latter learns the normal behavior of the monitored system and then looks for any deviation from it as a sign of intrusion.

In the present work, we propose a novel intrusion detection process for a host-based system. This work draws inspiration from one of the recent proposals by Liao and Vemuri [14] on an anomaly-based intrusion detection system. We observe that any normal execution of a process follows a pattern, and hence the normal behavior of a process can be profiled by a set of predictable sequences of system calls. Any deviation in this sequence of system calls is termed an intrusion in the framework of anomaly-based IDS. The problem of intrusion detection thus boils down to detecting anomalous sequences of system calls, which are measurably different from the normal behavior. We propose a scheme in which we measure the similarity between processes using a similarity measure that considers three main factors: the occurrence of each individual system call, the frequency of this system call in the process, and the position of this call in the process. This proposal adequately considers frequency as well as the ordering of system calls to determine an anomalous process. Adopting Liao's method, we make use of the k nearest neighbor scheme with our new similarity measure, thus gracefully extending the result of Liao and Vemuri [14], which does not consider the ordering of the system calls. We show that the similarity measure proposed by Liao et al may lead to inaccurate conclusions for intrusion detection and that our proposed similarity measure overcomes this flaw. The major contributions of our work are the following.

– We introduce a novel similarity measure for frequency and occurrence.
– After determining the anomalies using these measures, we make use of ordering information to detect the intrusion based on a similarity function on sequences.
– We corroborate claims of a better IDS by experimental analysis.

Section 2 gives a brief survey of anomaly-based schemes to understand the different approaches. We explain the scheme proposed by Liao and Vemuri [14] in section 3. Section 4 presents some background and definitions that are used in the construction of our proposed scheme. Section 5 describes the proposed scheme in detail. Experimental results are shown in section 6. We conclude our work in section 7.

2 Related Work

Anomaly-based IDS has the capability to identify new attacks, as any attack will differ from the normal activity. However, such systems have a very high rate of false positives [2]. Hence, a lot of research is being done in the area of anomaly-based intrusion detection [1]. The pioneering work in the field of anomaly detection by Denning [5] describes a model for detecting computer abuse by monitoring a system's audit records. In this approach, profiles of subjects (users) are learnt, and statistical methods (means and standard deviations) are used to calculate deviations from the normal behavior. Lane et al [12] propose another approach that captures users' behavior. A database of the sequences of UNIX commands that a user normally issues is maintained for each user. Any new command sequence is compared with this database using a similarity measure. Though the scheme gives good results, it is rather difficult to profile all the users, especially in big organizations. Moreover, since the behavioral pattern of new users is not very stable, such models may give a high rate of false positives. Another approach, initiated by Forrest et al [8][9][11], captures the normal behavior of processes, as programs show a stable behavior over a period of time under normal execution. In this approach, short sequences of system calls are used to profile a process. A similar approach is followed by Lee et al [13] too, but they make use of the rule learner RIPPER to form the rules for classification. Artificial neural networks have also been used for anomaly detection [10], owing to their ability to learn behavior and generalize from this learning. In this approach, Ghosh et al use the Leaky Bucket algorithm to capture temporal locality. A new scheme based on the kNN classifier has been proposed by Liao and Vemuri [14][15], in which each process is treated as a document and each system call as a word in that document. The process is converted into a vector, and cosine similarity is used to calculate the similarity among processes. The scheme proposed in this paper also follows a similar approach by using the kNN classifier for the classification of processes.

The following section describes the scheme based on the kNN classifier [14]. We also identify some cases in which Liao's scheme produces wrong conclusions.

3 Scheme Based On k-NN Classifier

An approach based on the kNN classifier is proposed by Liao and Vemuri [14], where the frequencies of system calls used by a program (process), instead of their temporal ordering, are used to define the program's behavior. Their paper presents an analogy between text categorization and intrusion detection, such that each system call is treated as a word and the set of system calls generated by a process as a document. The processes under normal execution (hereafter called normal processes) are collected from the DARPA data [4] and thereafter converted into vectors consisting of the frequencies of the system calls made by them during normal execution. The DARPA data provides processes in BSM (Basic Security Module) format, labeled as normal or abnormal. The complete method of collecting normal processes is described in the section on experimental results. From all the normal processes, a matrix A = [a_ij] is formed, where a_ij denotes the frequency of the i-th system call in the j-th process. In order to categorize a new process P as either normal or abnormal, the process P is first converted into a vector. The kNN classifier then compares it with all the processes Aj in A to determine the k nearest neighbors, by calculating the cosine similarity CS(P, Aj) using the formula given by equation 1:

CS(P, Aj) = (P · Aj) / (||P|| ||Aj||)        (1)

where ||X|| = sqrt(X · X).

The average similarity value of the k nearest neighbors is calculated and a threshold is set. When the average similarity value is above the threshold, the process P is considered normal, and otherwise abnormal.

Since the similarity measure given by equation 1 considers only the frequencies of the system calls appearing in the processes, we observe the following two cases in which it may produce wrong results while calculating similarity. Consider the following two processes P1 and P2.

P1 = open close close close close access access access access, and
P2 = open ioctl mmap pipe access login su su audit audit

The similarity measures of the new process P (given below) to each of P1 and P2, using equation 1, are given by

P = open close ioctl mmap pipe pipe access access login chmod

CS(P, P1) = 0.6048, CS(P, P2) = 0.5714

We observe that there are only three common system calls out of eight between P and P1, and six common system calls out of eight between P and P2. Intuitively, P2 is more similar to P than P1 is, but the similarity measures indicate the contrary. This is due to the frequent occurrence of close and access in P1, and the absence of close in P2. The above example makes it amply clear that, while calculating the similarity score, no weight is accorded to processes having a greater number of common system calls. We believe that if we include a factor in the similarity score that depends on the number of common calls, such results can be avoided.
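These two scores can be checked with a short Python sketch (the cosine_sim helper below is ours, not from the paper; the call sequences are taken verbatim from the example):

```python
from collections import Counter
from math import sqrt

def cosine_sim(p, q):
    """Cosine similarity (eq. 1) between two processes given as lists of system calls."""
    fp, fq = Counter(p), Counter(q)
    dot = sum(fp[c] * fq[c] for c in fp)  # Counter returns 0 for missing calls
    norm = sqrt(sum(v * v for v in fp.values())) * sqrt(sum(v * v for v in fq.values()))
    return dot / norm

P1 = "open close close close close access access access access".split()
P2 = "open ioctl mmap pipe access login su su audit audit".split()
P  = "open close ioctl mmap pipe pipe access access login chmod".split()

print(round(cosine_sim(P, P1), 4))  # 0.6048
print(round(cosine_sim(P, P2), 4))  # 0.5714
```

Despite sharing twice as many distinct calls with P2, the new process P scores higher against P1, reproducing the anomaly described above.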

As noted earlier, during normal execution a process follows a sequence of system calls, and any significant change in the order of appearance of system calls is considered an intrusion. But the following example demonstrates that the scheme proposed by Liao and Vemuri [14] does not capture deviations in the ordering of system calls while calculating the similarity score using equation 1.

P1 = rename login open ioctl su chmod close

P2 = open close su ioctl chmod login rename
P = open close su ioctl chmod pipe pipe

CS(P, P1) = 0.629 and CS(P, P2) = 0.629

It may be observed that in P and P2, up to the fifth position, the ordering of all the system calls is the same, while in P and P1, except for ioctl, all the system calls are at different positions. Thus, in this case, the ordering of system calls is not being taken into consideration while calculating the similarity between processes, although it is a very important factor, especially in the case of intrusion detection.
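The equality of the two scores is easy to verify in code; a sketch (cosine_sim is our helper, not the paper's):

```python
from collections import Counter
from math import sqrt

def cosine_sim(p, q):
    fp, fq = Counter(p), Counter(q)
    dot = sum(fp[c] * fq[c] for c in fp)
    return dot / (sqrt(sum(v * v for v in fp.values())) * sqrt(sum(v * v for v in fq.values())))

P1 = "rename login open ioctl su chmod close".split()
P2 = "open close su ioctl chmod login rename".split()
P  = "open close su ioctl chmod pipe pipe".split()

# P1 and P2 contain the same multiset of calls, so both scores are identical
# even though P2's ordering matches P far more closely.
print(round(cosine_sim(P, P1), 3), round(cosine_sim(P, P2), 3))  # 0.63 0.63
```

Because equation 1 sees only frequency vectors, any permutation of a process produces exactly the same score.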

In our scheme, we define a similarity measure that depends not only on the frequencies of system calls but also on the number of shared system calls and the ordering of system calls. In the following section, we present some preliminary results that are used in the construction of our similarity measure.

4 Preliminary Results

Let S (say, Card(S) = m) be the set of system calls made by all the processes under normal execution. From all the normal processes a matrix A = [a_ij] is formed, where a_ij denotes the frequency of the i-th system call in the j-th process. We also form a matrix B = [b_ij], where b_ij = 1 if the i-th system call is present in the j-th process, and 0 otherwise. Thus each process Pbj ∈ {0, 1}^m can be represented as a column in B. For example:

Let S = {access, audit, chdir, close, creat, exit, fork, ioctl} and two normal processes be P1 = access close ioctl access exit, and P2 = ioctl audit chdir chdir access. Then we have:

        P1  P2                Pb1 Pb2

         2   1                 1   1
         0   1                 0   1
         0   2                 0   1
A =      1   0          B =    1   0
         0   0                 0   0
         1   0                 1   0
         0   0                 0   0
         1   1                 1   1
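A and B can be constructed mechanically from the processes; a minimal sketch (variable names are ours):

```python
from collections import Counter

S = "access audit chdir close creat exit fork ioctl".split()
procs = ["access close ioctl access exit".split(),
         "ioctl audit chdir chdir access".split()]

counts = [Counter(p) for p in procs]
# A[i][j] = frequency of the i-th system call in the j-th process
A = [[c[s] for c in counts] for s in S]
# B[i][j] = 1 if the i-th system call occurs in the j-th process, else 0
B = [[1 if c[s] > 0 else 0 for c in counts] for s in S]

print(A)  # [[2, 1], [0, 1], [0, 2], [1, 0], [0, 0], [1, 0], [0, 0], [1, 1]]
print(B)  # [[1, 1], [0, 1], [0, 1], [1, 0], [0, 0], [1, 0], [0, 0], [1, 1]]
```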

We now define the similarity measures that we use in our scheme to calculate the similarity between processes.

4.1 Binary Similarity Measure

We define a similarity score μ(Pbi, Pbj) between any two processes Pbi and Pbj as follows:

μ(Pbi, Pbj) = [ Σ_{n=1}^{m} (Pbi ∧ Pbj)_n ] / [ Σ_{n=1}^{m} (Pbi ∨ Pbj)_n ]        (2)

It may be noticed that 0 ≤ μ ≤ 1. The value of μ increases when there are more shared system calls between the two processes (due to the numerator), and the value of μ decreases when the number of system calls not shared by both processes exceeds the number of shared ones (due to the denominator).
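A sketch of equation 2 on the binary columns Pb1 and Pb2 from the example in section 4 (the helper name mu is ours):

```python
def mu(pb_i, pb_j):
    """Binary similarity (eq. 2): shared calls over union of calls present."""
    num = sum(a & b for a, b in zip(pb_i, pb_j))
    den = sum(a | b for a, b in zip(pb_i, pb_j))
    return num / den

# Binary columns for P1 and P2 from the section 4 example
Pb1 = [1, 0, 0, 1, 0, 1, 0, 1]
Pb2 = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(mu(Pb1, Pb2), 3))  # 2 shared calls over 6 distinct calls -> 0.333
```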

4.2 Frequency Similarity Measure

Another similarity score, the cosine similarity measure λ(Pi, Pj) between the processes Pi and Pj, where Pi and Pj are obtained from A, is defined as follows [3]:

λ(Pi, Pj) = (Pi · Pj) / (||Pi|| ||Pj||)        (3)

It may be noted that equation 3 represents the same similarity measure as used by Liao and Vemuri [14].

We define a new similarity measurement Sim(Pi, Pj) as follows:

Sim(Pi, Pj) = µ(Pbi, P bj) · λ(Pi, Pj) (4)

The motive behind multiplying μ and λ is that λ(Pi, Pj) measures the similarity based on frequency, while μ(Pbi, Pbj) is a weight associated with Pi and Pj. In other words, μ(Pbi, Pbj) tunes the similarity score λ(Pi, Pj) according to the number of similar and dissimilar system calls between the two processes. Therefore, the similarity measure Sim(Pi, Pj) takes both frequency and the number of shared system calls into consideration while calculating the similarity between two processes.
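On the section 4 example, equations 2–4 give μ = 1/3, λ = 3/7, and hence Sim(P1, P2) = 1/7; a sketch (helper names are ours):

```python
from math import sqrt

def mu(pb_i, pb_j):  # eq. 2
    return sum(a & b for a, b in zip(pb_i, pb_j)) / sum(a | b for a, b in zip(pb_i, pb_j))

def lam(p_i, p_j):  # eq. 3
    dot = sum(a * b for a, b in zip(p_i, p_j))
    return dot / (sqrt(sum(a * a for a in p_i)) * sqrt(sum(b * b for b in p_j)))

def sim(p_i, p_j):  # eq. 4: frequency similarity weighted by shared-call similarity
    pb_i = [1 if a > 0 else 0 for a in p_i]
    pb_j = [1 if b > 0 else 0 for b in p_j]
    return mu(pb_i, pb_j) * lam(p_i, p_j)

# Frequency columns for P1 and P2 from the section 4 example
P1 = [2, 0, 0, 1, 0, 1, 0, 1]
P2 = [1, 1, 2, 0, 0, 0, 0, 1]
print(round(sim(P1, P2), 4))  # (1/3) * (3/7) = 1/7 ≈ 0.1429
```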

4.3 Kendall Tau Distance

As mentioned earlier, the order of the system calls occurring in a process is very important when classifying a process as normal or abnormal. We make use of the Kendall Tau distance [6] to determine similarity in terms of ordering. Given a set S of system calls, a process (ordered set of system calls) t with respect to S is an ordering of a subset M of S, i.e. t = [x1 ≥ x2 ≥ ... ≥ xd], with each system call xi ∈ M, where xi ≥ xj implies that in the process t, system call xi appears before system call xj. If i ∈ S is present in t, then l(i) denotes the position of i. The Kendall Tau distance counts the number of pairwise disagreements between two processes t and m as follows:

K(t, m) = |{(i, j) : i < j, t(i) < t(j), m(i) > m(j), or i > j, t(i) > t(j), m(i) < m(j)}| / C(|M|, 2)        (5)

where C(|M|, 2) = |M|(|M| - 1)/2 is the number of pairs.

In order to calculate the Kendall Tau distance between two processes Pi and Pj, Pi is first converted into P'i, which contains the first occurrence of each system call from Pi. It can be observed that, in practice, it is difficult to find a set M such that each process has system calls only from M. We therefore modify equation 5 to calculate the Kendall distance as follows:

K(Pi, Pj) = |{(r, s) : Condition(W) or Condition(W')}| / (|P'i| |P'i|)        (6)

where Condition(W) = r < s, P'i(r) < P'i(s), P'j(r) > P'j(s), and Condition(W') = r > s, P'i(r) > P'i(s), P'j(r) < P'j(s). All the (r, s) pairs for comparison are obtained from the set of system calls S. In the next section, we present the proposed scheme using the results derived above.
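As a sketch of how equation 6 might be computed: the reduction to first occurrences follows the text, while restricting the pair count to calls common to both processes, and the exact normalization, are our reading of the formula.

```python
def first_occurrences(p):
    """Reduce a process to the first occurrence of each system call (P')."""
    seen, out = set(), []
    for c in p:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out

def kendall_distance(p_i, p_j):
    """Fraction of system-call pairs whose relative order disagrees
    between the two processes (our reading of eq. 6)."""
    a, b = first_occurrences(p_i), first_occurrences(p_j)
    pos_a = {c: k for k, c in enumerate(a)}
    pos_b = {c: k for k, c in enumerate(b)}
    common = [c for c in a if c in pos_b]
    disagreements = sum(
        1
        for x in range(len(common))
        for y in range(x + 1, len(common))
        if (pos_a[common[x]] - pos_a[common[y]]) * (pos_b[common[x]] - pos_b[common[y]]) < 0
    )
    return disagreements / (len(a) * len(a))

# The ordering example from section 3: P2's ordering matches P, P1's does not.
P1 = "rename login open ioctl su chmod close".split()
P2 = "open close su ioctl chmod login rename".split()
P  = "open close su ioctl chmod pipe pipe".split()
print(kendall_distance(P, P1), kendall_distance(P, P2))
```

On the earlier example, the distance to P2 is zero while the distance to P1 is positive, which is exactly the ordering signal that the cosine measure alone misses.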

5 Proposed Scheme

As discussed above, the matrices A = [a_ij] and B = [b_ij] are constructed using the normal processes and the set S. For every new process P, if it contains a system call that is not in S, it is classified as abnormal; otherwise it is first converted into a vector. The binary equivalent Pb of this vector is also calculated. The similarity score λ(P, Pj) is calculated for every normal vector Pj using equation 3; if λ(P, Pj) = 1, P is classified as normal. Otherwise, using equations 2 and 4, the values of μ(P, Pj) and Sim(P, Pj) are calculated. The values of Sim(P, Pj) are sorted in descending order and the k nearest neighbors (the k highest values) are chosen. We calculate the average value (Avg_Sim) of the k nearest neighbors. For each of the k nearest neighbors, we calculate the Kendall Tau distance K(P, Pj) using equation 6, and then calculate the average value (Avg_Dist) of these k Kendall distances. The kNN classifier categorizes the new process P as either normal or abnormal according to the rule given below.

If Avg_Sim > Sim_Threshold and Avg_Dist < Dist_Threshold, classify P as normal; otherwise P is abnormal,

where Sim_Threshold and Dist_Threshold are predefined threshold values for the similarity measure and the Kendall distance respectively. The pseudo code for the proposed scheme is provided in Figure 1.

6 Experimental Results

We use BSM audit logs from the 1998 DARPA data [4] for the training and testing of our algorithm. After analyzing the whole training data, we extract

Given a set of processes and system calls S, form the matrices A = [a_ij] and B = [b_ij]
for each process P in the test data do
    if P has some system call which does not belong to S then
        P is abnormal; exit.
    else
        for each process Aj in the training data A do
            calculate Sim(P, Aj);
            if Sim(P, Aj) equals 1.0 then
                P is normal; exit.
        end do
        find the first k highest values of Sim(P, Aj);
        calculate Avg_Sim for the k nearest neighbors so obtained;
        for each of the k nearest neighbors, calculate the Kendall distance;
        calculate Avg_Dist for the k nearest neighbors;
        if Avg_Sim is greater than Sim_Threshold and
           Avg_Dist is less than Dist_Threshold then
            P is normal;
        else
            P is abnormal;
end do

Fig. 1. Pseudo code of the proposed scheme
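For concreteness, the decision procedure of Figure 1 can be sketched end to end in Python. Everything below is our illustration: the helper names, the data layout (processes as lists of call names), and the threshold values are assumptions, not the paper's implementation.

```python
from collections import Counter
from math import sqrt, isclose

def to_vector(proc, S):
    c = Counter(proc)
    return [c[s] for s in S]

def mu(u, v):  # eq. 2 on the binary forms of two frequency vectors
    return sum((a > 0) & (b > 0) for a, b in zip(u, v)) / sum((a > 0) | (b > 0) for a, b in zip(u, v))

def lam(u, v):  # eq. 3, cosine similarity on frequency vectors
    return sum(a * b for a, b in zip(u, v)) / (
        sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def kendall(p, q):  # pairwise order disagreements, our reading of eq. 6
    fa = list(dict.fromkeys(p)); fb = list(dict.fromkeys(q))
    pa = {c: i for i, c in enumerate(fa)}; pb = {c: i for i, c in enumerate(fb)}
    common = [c for c in fa if c in pb]
    d = sum(1 for i in range(len(common)) for j in range(i + 1, len(common))
            if (pa[common[i]] - pa[common[j]]) * (pb[common[i]] - pb[common[j]]) < 0)
    return d / (len(fa) * len(fa)) if fa else 0.0

def classify(P, training, S, k=5, sim_thr=0.75, dist_thr=0.25):
    """kNN decision rule of Figure 1 (threshold values here are illustrative)."""
    if any(c not in S for c in P):
        return "abnormal"                      # unseen system call
    v = to_vector(P, S)
    scored = []
    for T in training:
        w = to_vector(T, S)
        s = mu(v, w) * lam(v, w)               # eq. 4
        if isclose(s, 1.0):
            return "normal"                    # exact match with a normal process
        scored.append((s, T))
    scored.sort(key=lambda t: t[0], reverse=True)
    nn = scored[:k]
    avg_sim = sum(s for s, _ in nn) / len(nn)
    avg_dist = sum(kendall(P, T) for _, T in nn) / len(nn)
    return "normal" if avg_sim > sim_thr and avg_dist < dist_thr else "abnormal"

# Toy demonstration with a hypothetical call set and training data
S = "open close su ioctl chmod login rename pipe".split()
train = ["open close su ioctl chmod login rename".split()] * 6
print(classify("open close su ioctl chmod login rename".split(), train, S))  # normal
print(classify("open write".split(), train, S))  # abnormal (write not in S)
```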

the 50 unique system calls that appear in the training data. All 50 system calls are shown in Table 1. For each day of data, a separate BSM file is provided, along with a 'BSM List File'. Each line of this file contains information about one session, such as time, service, source IP, and destination IP. A '0' at the end of the line shows that the session is normal, while the presence of a '1' at the end of the line declares the session intrusive. Any process associated with a normal session is considered a normal process. All the intrusive sessions are labeled with the names of the attacks launched during the sessions. We make use of the BSM commands auditreduce and praudit, and a couple of scripts, to extract the data used by our algorithm.

access, audit, auditon, chdir, chmod, chown, close, creat, execve, exit, fchdir, fchown, fcntl, fork, fork1, getaudit, getmsg, ioctl, kill, link, login, logout, lstat, memcntl, mkdir, mmap, munmap, nice, open, pathconf, pipe, putmsg, readlink, rename, rmdir, setaudit, setegid, seteuid, setgid, setgroups, setpgrp, setrlimit, setuid, stat, statvfs, su, sysinfo, unlink, utime, vfork

Table 1. List of 50 unique system calls
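The session labeling described above can be pulled out of a list file with a short script. The exact field layout of the list file is not given here, so the sample lines below are hypothetical; only the trailing 0/1 label convention comes from the text.

```python
def split_sessions(list_file_lines):
    """Partition session lines by their trailing label: '0' = normal,
    anything else (e.g. '1') = intrusive. Field layout is assumed."""
    normal, intrusive = [], []
    for line in list_file_lines:
        fields = line.split()
        if not fields:
            continue
        (normal if fields[-1] == "0" else intrusive).append(line)
    return normal, intrusive

# Hypothetical lines in the spirit of the description (time, service, IPs, label)
lines = ["08:00:01 telnet 192.168.1.5 172.16.112.50 0",
         "08:05:42 ftp    192.168.1.9 172.16.112.50 1"]
norm, intr = split_sessions(lines)
print(len(norm), len(intr))  # 1 1
```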

On analyzing the whole BSM logs (list files) carefully, we locate five days which are free of any type of attack: Tuesday of the third week, Thursday of the fifth week, and Monday, Tuesday, and Wednesday of the seventh week. We choose the first four days for our training data and the fifth for testing on normal data to determine the false positive rate. There is a total of 1937 normal sessions reported in the four days of data. We carefully extract the processes occurring during these days, and our training data set consists of 769 unique processes. There are 412 normal sessions on the fifth day, and we extract 4443 normal processes from these sessions. We use these 4443 normal processes as the testing data. In order to test the detection capability of our method, we incorporate 40 intrusive sessions into our testing data. These sessions cover almost all the types of attacks launched on the victim Solaris machine (in the simulated DARPA setup) during the seven weeks of training data and two weeks of testing data that can be detected using BSM logs. An intrusive session is said to be detected if any of the processes associated with it is classified as abnormal. The detection rate is thus defined as the number of intrusive sessions detected divided by the total number of intrusive sessions. We perform the experiments with different values of k: 5, 10, and 15. Tables 2 and 3 show the results for k = 5 and k = 10 respectively. We have not shown the results for k = 15, as we do not find any significant difference from the case k = 10.

Threshold     False Positive Rate   Detection Rate
0.90, 0.15    0.092                 1.00
0.80, 0.15    0.042                 0.90
0.79, 0.15    0.040                 0.90
0.78, 0.20    0.039                 0.87
0.75, 0.25    0.019                 0.85
0.70, 0.25    0.007                 0.85
0.63, 0.25    0.005                 0.80
0.60, 0.25    0.002                 0.75

Table 2. False Positive Rate vs Detection Rate for k = 5

The first column of the table provides the threshold values for the experiments. The choice of threshold values is made on a trial and error basis. We first calculate the similarity values of a few normal processes and then, according to the observed similarity values, fix a range for the threshold values. After that, different values within that range are chosen to calculate the results. Entries in column two give the false positive rate, which is equal to the number of normal processes detected as abnormal divided by the total number of normal processes. Column three details the detection rate as defined above.

We also experiment with Liao's scheme, because our experimental setup uses different numbers of training and testing processes from those reported in [14]. This makes the comparison meaningful. The results for Liao's scheme are shown in Table 4.

Threshold     False Positive Rate   Detection Rate
0.80, 0.06    0.147                 1.00
0.80, 0.20    0.089                 0.90
0.79, 0.20    0.087                 0.90
0.78, 0.20    0.065                 0.87
0.75, 0.25    0.061                 0.87
0.70, 0.25    0.040                 0.85
0.68, 0.25    0.039                 0.82
0.63, 0.25    0.019                 0.80
0.60, 0.25    0.015                 0.72
0.58, 0.25    0.010                 0.70
0.56, 0.35    0.006                 0.70
0.50, 0.35    0.003                 0.68

Table 3. False Positive Rate vs Detection Rate for k = 10

Threshold   False Positive Rate   Detection Rate
0.992       0.211                 1.00
0.990       0.181                 0.95
0.985       0.129                 0.95
0.980       0.058                 0.92
0.970       0.054                 0.83
0.950       0.035                 0.73
0.930       0.008                 0.70
0.900       0.007                 0.68
0.780       0.002                 0.68

Table 4. False Positive Rate vs Detection Rate for the Liao and Vemuri scheme

Though our scheme performs better than Liao's scheme for both k = 5 and k = 10, we show the comparison only between the ROC curves for k = 5 and Liao's scheme, in Figure 2. An ROC curve is a graph of detection rate against false positive rate; each threshold value yields one detection rate and one false positive rate. The proposed scheme reaches a 100% detection rate with a false positive rate as low as 9% (Table 2), whereas Liao's scheme reaches a 100% detection rate only at 21% false positives (Table 4). It can be seen in Figure 3 that the detection rate is high in both the cases k = 5 and k = 10, but at k = 5 we get a much lower false positive rate than at k = 10. The reason may be that there is high variation among the normal processes in the data; aggregation over k = 10 suppresses this variation, and the kNN classifier may produce wrong results. In our setup, all the experiments have been performed on a PC running Windows 2000 Professional with an Intel-III processor and 256 MB RAM. As our similarity measure is more complex than the one used in Liao's scheme [14], the total time taken by our algorithm to classify a process as normal or abnormal is higher than for Liao's scheme. In order to classify 4443 processes, our scheme takes around 820 seconds, whereas Liao's scheme takes around 235 seconds.

[Figure: ROC curves (detection rate, 0.65 to 1.0, vs false positive rate, 0 to 0.25) for the proposed scheme and Liao's scheme.]

Fig. 2. ROC curves for the proposed scheme and Liao's scheme

[Figure: ROC curves (detection rate, 0.65 to 1.0, vs false positive rate, 0 to 0.16) for the proposed scheme at k = 5 and k = 10.]

Fig. 3. ROC curves for the proposed scheme at k = 5 and k = 10

7 Conclusions and Future Work

All anomaly-based intrusion detection systems work on the assumption that normal activities differ substantially from abnormal activities (intrusions). In the case of IDS models that learn program behavior, these differences may show up in the frequencies of system calls or in the ordering of the system calls used by the processes under normal and abnormal execution. Our scheme considers all these factors while classifying a new process as normal or abnormal. The use of the similarity score on the binary forms of the processes gives weight to processes that contain more shared system calls. By using the Kendall Tau distance, two processes can be compared on the basis of the ordering of the system calls present in them. By combining all these techniques in our scheme, we arrive at very promising results.

In the present study, we use the BSM audit logs from the 1998 DARPA data set. In the future, we intend to test our scheme on real online data. Presently, all system calls are treated equally, but there may be some system calls whose very presence in a process is suspicious. We are trying to identify such patterns for a better intrusion detection system.

References

1. Axelsson S.: Research in Intrusion Detection Systems: A Survey. Technical Report No. 98-17, Dept. of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden. (1999)

2. Bace R., Mell P.: NIST Special Publication on Intrusion Detection Systems. SP 800-31, NIST, Gaithersburg, MD. (2001)

3. Chan Z., Zhu B.: Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm. Technical Report CS-00-22, Dept. of Computer Science, University of Texas-Pan American, Edinburg, TX. (2000)

4. DARPA 1998 Data, MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/data/data_index.html

5. Denning D. E.: An Intrusion-Detection Model. In: Proceedings of the 1986 IEEE Symposium on Security and Privacy (SSP '86). IEEE Computer Society Press. (1990) 118-133

6. Dwork C., Kumar R., Naor M., Sivakumar D.: Rank Aggregation Methods for the Web. In: Proceedings of the Tenth International World Wide Web Conference. (2001) 613-622

7. Eskin E., Arnold A., Prerau M., Portnoy L., Stolfo S.: A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data. In: Barbara D., Jajodia S. (eds.) Applications of Data Mining in Computer Security. Kluwer Academic Publishers. (2002) 77-102

8. Forrest S., Hofmeyr S. A., Somayaji A., Longstaff T. A.: A Sense of Self for Unix Processes. In: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy. Los Alamitos, CA. IEEE Computer Society Press. (1996) 120-128

9. Forrest S., Hofmeyr S. A., Somayaji A.: Computer Immunology. Communications of the ACM. 40(10) (1997) 88-96

10. Ghosh A. K., Schwartzbard A.: A Study in Using Neural Networks for Anomaly and Misuse Detection. In: Proceedings of the 8th USENIX Security Symposium. Aug. 23-26, Washington, DC, USA. (1999) 141-151

11. Hofmeyr S. A., Forrest S., Somayaji A.: Intrusion Detection Using Sequences of System Calls. Journal of Computer Security. 6 (1998) 151-180

12. Lane T., Brodley C. E.: An Application of Machine Learning to Anomaly Detection. In: Proceedings of the 20th National Information Systems Security Conference. Baltimore, MD. (1997) 366-377

13. Lee W., Stolfo S., Chan P.: Learning Patterns from Unix Process Execution Traces for Intrusion Detection. In: Proceedings of the AAAI97 Workshop on AI Methods in Fraud and Risk Management. AAAI Press. (1997) 50-56

14. Liao Y., Vemuri V. R.: Use of K-Nearest Neighbor Classifier for Intrusion Detection. Computers & Security. 21(5) (2002a) 439-448

15. Liao Y., Vemuri V. R.: Using Text Categorization Techniques for Intrusion Detection. In: Proceedings of USENIX Security 2002. San Francisco. (2002b) 51-59

16. Mukherjee B., Heberlein L. T., Levitt K. N.: Network Intrusion Detection. IEEE Network. 8(3) (1994) 26-41