Evolving Hidden Markov Models For Network Anomaly Detection

Juan J. Flores
Division de Estudios de Posgrado
Universidad Michoacana
Morelia, Mexico
[email protected]

Anastacio Antolino
Division de Estudios de Posgrado
Universidad Michoacana
Morelia, Mexico
[email protected]

Juan M. Garcia
Division de Estudios de Posgrado
Universidad Michoacana
Morelia, Mexico
[email protected]

Abstract—This paper reports the results of a system that performs network anomaly detection through the use of Hidden Markov Models (HMMs). The HMMs used to detect anomalies are designed and trained using Genetic Algorithms (GAs). The use of GAs helps automate the use of HMMs, freeing users from the statistical knowledge assumed by software that trains HMMs from data. The number of states, the connections and their weights, and the probability distributions of the states are all determined by the GA. Results are compared to those obtained with the Baum-Welch algorithm, showing that in all the cases we tested the GA outperforms Baum-Welch. The best of the evolved HMMs was used to perform anomaly detection on real network traffic data.

Keywords-HMMs; GAs; Anomaly Detection; Baum-Welch;

I. INTRODUCTION

Security threats to computer systems have increased immensely; they include viruses, denial of service, vulnerability break-ins, etc. While many security mechanisms have been introduced to counter those threats, none can completely prevent all of them. These security threats have raised the importance of anomaly detection [1].

Anomaly detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of anomaly. An anomaly is defined as an attempt to compromise the confidentiality, integrity, or availability of a computer or network, or to bypass its security mechanisms.

Hidden Markov Models (HMMs) are a basic component of speech recognition systems. An HMM is a probabilistic finite-state machine used to model stochastic sequences [2]. HMMs have many applications in signal processing, pattern recognition, and speech and gesture recognition, as well as in anomaly detection.

It is important to mention that in an HMM, the estimation of good model parameters strongly affects the performance of the recognition [3] or detection process. The HMM parameters are determined during an iterative process called “training”. The Baum-Welch algorithm [4] is one method for setting an HMM’s parameters, but it has a drawback: being a gradient-based method, it may converge to a local optimum.

Global search techniques can be used instead. Genetic Algorithms (GAs) are a global optimization technique that can be used to optimize an HMM’s parameters [3].

GAs are a search process based on the laws of natural selection and genetics. They emulate individuals in a natural environment, where the natural selection mechanism makes the stronger individuals the likely winners in the competing environment. Studies of using GAs to train HMMs are relatively rare, particularly in comparison with the large literature on applying GAs to neural network training [2].

In this paper we present the use of GAs for creating and optimizing an HMM in a single step; our system takes a time series as input and produces a trained HMM without any human intervention. The resulting model is then used for anomaly detection in computer network traffic. Finally, we compare the performance of HMMs created with the GA against others trained with the Baum-Welch algorithm.

The remainder of the paper is structured as follows. Section II introduces the relevant HMM theory. Section III describes GAs for readers not familiar with the topic and presents our framework for evolving HMMs, the core of this paper’s proposal. Section IV describes anomaly detection. Section V presents results comparing HMMs evolved with GAs against others trained with the Baum-Welch algorithm. Section VI gives a brief overview of other approaches that use HMMs for anomaly detection or GAs for optimizing an HMM’s parameters. Section VII presents our conclusions and future work.

II. HIDDEN MARKOV MODELS

An HMM is a doubly stochastic process with an underlying stochastic process that is not observable, but can only be observed through another set of stochastic processes that produce the sequence of observed symbols. The best-known application of HMMs is automatic speech recognition, where HMMs are used to characterize the statistical properties of a signal [4], [5].

HMMs have also been used in bioinformatics. Recently, HMMs have been applied to a variety of applications outside of speech recognition, such as handwriting recognition, pattern recognition in molecular biology, and fault detection. Variants and extensions of HMMs appear in econometrics, time series analysis, and signal processing [6].

Figure 1. An HMM example with 4 states

An HMM is formed by a finite number of states connected by transitions (see Figure 1), which can generate an observation sequence according to its transition and initial probabilities. That is, an HMM is represented by three sets of probabilities: the initial state probabilities, the state transition probabilities, and the probability of observing each symbol in each state. A Markov model is hidden because we do not know which state produced each observation.

An HMM is basically defined by these three parameters:

A = {a_ij} is the state transition probability matrix.

B = {b_j(k)} is the emission probability matrix, indicating the probability of a specified symbol being emitted given that the system is in a particular state.

Π = {π_i} is the initial state probability distribution.

An HMM can therefore be defined by the triplet [7]:

λ = {A, B, π}    (1)
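To make Equation 1 concrete, here is a minimal sketch in Python (with made-up numbers, not taken from the paper) of the triplet for a 2-state HMM over 3 discrete observation symbols:

```python
# Illustrative triplet lambda = {A, B, pi} for a toy 2-state, 3-symbol HMM.
import numpy as np

A  = np.array([[0.7, 0.3],       # a_ij: transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],  # b_j(k): P(symbol k | state j)
               [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])        # initial state distribution

# Every row of A and B, and pi itself, must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```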

There are three basic problems of interest that must be solved for HMMs to be useful in real-world applications:

1) Determining the probability that a given observation sequence was generated by λ.

2) Determining the highest-probability state sequence for a given observation sequence and a given λ.

3) Adjusting or re-estimating the model parameters λ.

HMMs address the problems described above within a probabilistic, statistical framework. HMMs offer the advantages of strong and powerful statistical foundations and of being computationally efficient to develop and evaluate, thanks to established training algorithms. One disadvantage of using HMMs is the need for an a priori notion of the model topology: in order to build an HMM we must first decide how many states the model will contain and what transitions between states will be allowed. Once a model structure has been designed, the transition and emission parameters need to be estimated from training data.

The Baum-Welch algorithm is used to train the model parameters. It is an iterative expectation-maximization algorithm that, given an initial parameter configuration, adjusts the model parameters to locally maximize the likelihood of the data. Baum-Welch training suffers from the fact that it finds local optima, and it is thus sensitive to the initial parameter settings. For further information on HMMs, please see [4].
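For readers who want to experiment, the following minimal sketch shows Baum-Welch training of a Gaussian HMM using the third-party hmmlearn library (our choice for illustration, not the authors’ implementation; the observation series here is a random placeholder):

```python
# A minimal sketch (not the authors' code) of Baum-Welch training with hmmlearn.
import numpy as np
from hmmlearn import hmm

observations = np.random.rand(48, 1) * 6000.0  # stand-in for a bandwidth series

model = hmm.GaussianHMM(n_components=5, n_iter=50)  # 5 states, 50 EM iterations
model.fit(observations)             # Baum-Welch (EM) re-estimation of parameters
print(model.score(observations))    # log P(O | lambda) under the trained model
```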

III. GENETIC ALGORITHMS

HMM parameters are determined during an iterative process called “training”. One of the conventional methods for estimating HMM parameter values is the Baum-Welch algorithm [4]. One drawback of this method is that it converges to a local optimum. Global search techniques can be used to optimize an HMM’s parameters; Genetic Algorithms (GAs) are a global optimization technique [8], [3] that can be used for this purpose [3].

A. Basic Concepts

GAs simulate the evolutionary phenomena of nature. The search space is mapped into a genetic space: a possible solution is encoded into a vector (chromosome), and the elements of the chromosome vector are called genes. By continually calculating the fitness of each chromosome, the best chromosome is selected and the optimal solution is obtained [9].

The GA is a robust, general-purpose optimization technique that evolves a population of solutions. It is a search technique that has a representation of the problem states and a set of operations to move through the search space [3].

Selection, crossover, and mutation are the three main operators of a GA, and individuals are the objects upon which those operators work. These components constitute the whole genetic process and give GAs good characteristics that other classic methods lack. The general steps of a GA can be found in [9].
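As a rough illustration of those general steps, a generic GA skeleton might look like the following Python sketch (all callables are placeholders to be supplied by the user; this is not the system described in this paper):

```python
# Generic GA skeleton: selection, crossover, and mutation over a population.
import random

def genetic_algorithm(init_population, fitness, select, crossover, mutate,
                      generations):
    population = init_population()
    for _ in range(generations):
        scored = [(fitness(c), c) for c in population]
        parents = select(scored)                        # build the mating pool
        offspring = [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(len(population))]
        # survivors: best individuals among parents and offspring
        ranked = sorted(scored + [(fitness(c), c) for c in offspring],
                        key=lambda pair: pair[0], reverse=True)
        population = [c for _, c in ranked[:len(population)]]
    return max(population, key=fitness)
```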

B. A Framework For Evolving An HMM

In this subsection we propose a framework for evolving an HMM’s parameters. The evolution process consists of:

• A population of chromosomes. One chromosome is constituted as shown in Table I. Its genes are described below.

Table I. A COMPLETE CHROMOSOME FOR EVOLVING HMMS

Fitness | Size | Transitions | Parameters | Pi

Table II. TRANSITION GENES OF AN HMM’S CHROMOSOME

a11 | ... | a1N
a21 | ... | a2N
... | ... | ...
aN1 | ... | aNN

Table III. PROBABILITY GENES OF AN HMM’S CHROMOSOME

State | Mean (µ) | Variance (σ²)
S1 | µ1 | σ1²
S2 | µ2 | σ2²
... | ... | ...
SN | µN | σN²

Table IV. PI GENES OF AN HMM’S CHROMOSOME

S1 | S2 | ... | SN

1) Fitness. The chromosome’s fitness value, computed by the objective function (see Equation 2):

P(O|λ) = Σ_{i=1}^{N} α_T(i)    (2)

2) Size. The number of states of the HMM.

3) Transitions. This gene holds the transition probabilities among the states of the HMM (see Table II).

4) Parameters. This set of genes contains the mean and variance of the Gaussian distribution of each state in the HMM (see Table III).

5) Pi. The last set of genes represents the probability of state Si being the initial state at time t = 1 (see Table IV).
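To make the encoding concrete, the following Python sketch shows one possible representation of such a chromosome (names and initialization ranges are ours, chosen for illustration; the paper does not specify its implementation):

```python
# Hypothetical chromosome layout mirroring Tables I-IV.
from dataclasses import dataclass
import numpy as np

@dataclass
class Chromosome:
    size: int                # number of HMM states
    transitions: np.ndarray  # size x size matrix of a_ij (Table II)
    means: np.ndarray        # mu_i of each state's Gaussian (Table III)
    variances: np.ndarray    # sigma^2_i of each state's Gaussian (Table III)
    pi: np.ndarray           # initial state probabilities (Table IV)
    fitness: float = 0.0     # P(O | lambda), filled in after evaluation

def random_chromosome(n_states, rng):
    """Random initialization; rows of A and pi are normalized to sum to 1."""
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    pi = rng.random(n_states)
    pi /= pi.sum()
    means = rng.random(n_states) * 6000.0        # illustrative data range
    variances = rng.random(n_states) * 1e3 + 1.0
    return Chromosome(n_states, A, means, variances, pi)
```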

The chromosomes are the encoded form of the potential solutions. Initially, the population is generated randomly and the fitness values of all chromosomes are evaluated by computing the objective function (Equation 2): the forward variable α yields P(O|λ) [4], the probability that the observation sequence was generated by model λ (Equation 1). A sketch of this computation appears after this list.

• After the population pool is initialized, the GA evolution cycle begins. At the beginning of each generation, the mating pool is formed by selecting some chromosomes from the population. This pool of chromosomes is used as the parents for the genetic operations that generate the offspring. The fitness values of the offspring are also evaluated: each offspring is a complete HMM, so the system computes how well the model matches the observation sequence; we compute its probability and assign this value as its fitness. That is, the topology with the maximum likelihood is considered the best topology.

• At the end of the generation, some chromosomes are replaced by the offspring, maintaining a population of constant size across generations and preserving the offspring and parents with the best fitnesses.

• The GA applies a set of evolutionary operators: mutParams (mutates the mean and variance parameter values), mutTransition (mutates the probability values on transition links among states), delTransition (deletes a state transition), addTransition (adds a new state transition), Copy (copies a complete chromosome), addNode (adds a new node to the model), and delNode (deletes a node from the model). The GA builds HMM models ranging from 3 to 15 states. The means and variances are randomly initialized, and evolution is driven by mutation magnitude parameters.

• The generation and genetic operations above are repeated until the cycle reaches the maximum number of generations. By emulating natural selection and genetic operations, this process will hopefully find the best chromosomes, the highly optimized solutions to the problem.
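The fitness computation referenced above (Equation 2) is the forward algorithm. A minimal sketch with Gaussian emissions, reusing the hypothetical Chromosome from the earlier sketch, could be:

```python
# Forward algorithm for Gaussian-emission HMMs: returns P(O | lambda)
# as in Equation 2, P(O | lambda) = sum_i alpha_T(i).
import numpy as np
from scipy.stats import norm

def evaluate_fitness(chrom, observations):
    stds = np.sqrt(chrom.variances)
    b = norm.pdf(observations[0], chrom.means, stds)  # b_i(o_1) per state
    alpha = chrom.pi * b                              # alpha_1(i)
    for o in observations[1:]:                        # induction over t
        b = norm.pdf(o, chrom.means, stds)
        alpha = (alpha @ chrom.transitions) * b       # alpha_t = (alpha A) * b(o_t)
    return float(alpha.sum())                         # Equation 2
```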

Once the GA has finished evolving an HMM, we take the best model of the last generation and use it to determine whether a given observation sequence contains anomalies. The resulting statistical model can examine new observation sequences and determine whether a time series belongs to the model, that is, whether the observations are likely to have been produced by it. If the observations have a low probability of belonging to the model, the sequence possibly contains an anomaly.

Figure 2 shows an HMM evolved with GAs. This HMM was evolved from the observation sequence given by the network bandwidth used in our University.

The HMM generated and optimized by the GA is of great help in detecting statistically anomalous behavior; its function is to detect anomalous behavior and discriminate it from normal behavior. HMMs support statistics-based anomaly detection techniques: we built a statistics-based normal profile and employed statistical tests to determine whether observed activities deviate significantly from the normal model.

IV. ANOMALY DETECTION

Anomaly detection is performed by detecting changes in the patterns of resource utilization or behavior of the system. It may be performed by building a statistical model that contains metrics derived from system operation and flagging as intrusive any observed metrics that deviate from the model in a statistically significant way [1]. In other words, anomaly detection compares a model of “normal” network behavior to the currently observed network behavior. Any behavior that varies from this model is considered a network anomaly, and behavior closely matching the model is “normal”. In general, the normal behavior of a computing system can be characterized by observing its properties over time.

Figure 2. An HMM Evolved by GAs

The problem of detecting anomalies can be viewed as filtering non-permitted deviations of the characteristic properties of the monitored network system [10]. Anomaly detection can also be defined as the process of identifying malicious behavior that targets a network and its resources [11].

Anomaly detection research is carried out in several application domains, such as monitoring business news, epidemic or bioterrorism detection, intrusion detection, hardware fault detection, network alarm monitoring, and fraud detection. The anomaly detection problem involves large volumes of time series data with a significant number of entities and activities. The main goal of anomaly detection is to identify as many interesting and rare events as possible with minimum delay and the minimum number of false alarms [12].

V. RESULTS

This section presents the results of our proposal and compares them with those obtained using the Baum-Welch learning algorithm. It also presents the results of using HMMs for anomaly detection.

In all our experiments we used a time series of network bandwidth usage in our University (Figure 3); the values record the bandwidth used by our department. HMM training used this series of 48 values, which correspond to the bandwidth used, in kbit/s, over two consecutive days. The values fluctuate between 14.625 and 5964.81 kbit/s, which is considered normal behavior. The time series was modified (using random values) to simulate anomalous behavior; in the presence of a given kind of attack, the bandwidth usage increases.

Figure 3. Network Bandwidth with normal and anomalous behavior

Table V. SEVERAL HMMS EVOLVED BY GAS

Num. of States | Population | Num. of Generations | P(O | λ)
5 | 500  | 500 | 7.70688 × 10^-103
5 | 500  | 250 | 7.70688 × 10^-105
6 | 250  | 250 | 2.52091 × 10^-75
6 | 200  | 50  | 7.19032 × 10^-98
8 | 1000 | 500 | 1.19399 × 10^-88

A. HMMs Produced By GAs

Our system evolves HMMs using GAs with different numbers of generations and different population sizes. Table V describes several examples of HMMs evolved with GAs; the last column of Table V shows each model’s probability for the observation sequence. We selected our HMM from these evolved models: we took the model with the highest probability and used it in our anomaly detection tests.

Figure 4 shows how the GA evolves HMMs; each subfigure shows the best individual of the corresponding generation: a) generation 1 produced an HMM with 14 states; b) by generation 5 it has evolved to 13 states; c) at generation 10 the HMM has 12 states; d) at generation 11 it has evolved to 11 states; e) at generation 85 the HMM has 10 states; and f) shows the final evolved HMM with its states and transitions. In the last generations, the GA has already arrived at an optimal number of states and focuses on refining the transition probabilities.

Figure 4. Evolving an HMM with GAs

Regarding our results, we observed the following points:

• The GA does not assume the user possesses any background knowledge about statistical models. The GA does everything: it creates and optimizes an HMM based on the time series data and a probabilistic function to qualify that sequence.

• All this method needs is the time series data, a fitness function, and time to run the learning process.

• The GA builds an HMM with all the required parameters.

• The Baum-Welch algorithm is a good method for learning from time series data, but it converges to a local optimum.

• Another problem with the Baum-Welch algorithm is that users need previous background knowledge about HMM theory and how to build a model.

• The Baum-Welch algorithm’s performance depends heavily on being given good HMM parameters to refine; if we give it bad starting parameters, the method may diverge.

B. Baum-Welch Learning

Figure 5 shows an HMM structure created at random. We took this HMM and trained it with the Baum-Welch algorithm, obtaining a re-estimated HMM. This HMM was trained on the observation sequence given by the bandwidth usage. The HMM obtained after 50 iterations of Baum-Welch training is shown in Figure 6.

Figure 5. HMM Created with Random Values, before Training

Figure 6. HMM of Figure 5 after Training by the Baum-Welch Algorithm

Table VI describes our tests and results. We created several HMMs and re-estimated their parameters using the Baum-Welch algorithm, trying to obtain better probabilities for our models. We started with a 3-state HMM, as shown in the first column of Table VI. The second column, labeled P(O | λ), shows the probability for each model; these probabilities were obtained using the Baum-Welch algorithm. We generated models with 3 to 10 states. In all the tests we performed, we applied the Baum-Welch algorithm for 50 iterations, and our initial models were built with random values as initial parameters. These HMMs yield low probabilities because they represent the complete likelihood of the whole observation sequence: the probability that this sequence was generated by the HMM.

Table VI. SEVERAL HMMS TRAINED WITH THE BAUM-WELCH ALGORITHM

Number of States | P(O | λ)
3  | 1.8920 × 10^-123
4  | 6.44442 × 10^-175
5  | 3.43836 × 10^-137
6  | 1.07894 × 10^-127
7  | 5.90112 × 10^-125
8  | 1.09739 × 10^-119
9  | 1.22676 × 10^-119
10 | 6.55885 × 10^-172

Table VII. RESULTS OF ANOMALY DETECTION USING AN HMM TRAINED WITH THE BAUM-WELCH ALGORITHM

Window Size | % Hits | % False Positives | % False Negatives
3 | 57 | 43 | 0
4 | 60 | 40 | 0
5 | 59 | 41 | 0
6 | 65 | 35 | 0

From our experiments, we found that for its proper usage the Baum-Welch algorithm needs:

• An initial number of states.
• An initial starting point (an estimate of the transition and initial-state probabilities).
• A probability distribution function for each state.

In conclusion, we note that the models we created and trained with the Baum-Welch algorithm did not perform as well as any of the HMM models created with the GA; the probability of the best evolved model is 2.52091 × 10^-75, with 6 states. The Baum-Welch algorithm needs user expertise to create a good initial HMM.

We started our experiments using Jahmm [13] as our primary tool, but we found that this tool had problems computing probabilities on continuous sequences, so we decided to implement the Baum-Welch algorithm in Mathematica [14].

Table VII describes the results of applying HMMs trained with the Baum-Welch algorithm to detect anomalous behavior. Our experimental setup is the following: given an observation sequence, the trained HMM analyzes a time window of data and determines the probability of that sequence having been generated by the model. The time window then moves to the next item in the time series, which is also verified by the HMM. The process continues until the whole test time series has been traversed and tested. The window sizes we tested were 3, 4, 5, and 6. All the probabilities generated by the sliding time window and the HMM were compared against our threshold probability and then analyzed to determine whether an anomaly exists.
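A minimal sketch of this sliding-window procedure (function and parameter names are ours, for illustration only):

```python
# Slide a fixed-size window over the series; flag windows whose probability
# under the model falls below the threshold as anomalous.
def detect_anomalies(window_probability, series, window_size, threshold):
    """window_probability(window) -> P(window | lambda), e.g. the forward
    probability under the trained or evolved HMM."""
    flags = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        flags.append(window_probability(window) < threshold)
    return flags
```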

Figure 7 shows the probabilities obtained by applying the HMM to the sliding-window data for normal and anomalous behaviors. A threshold is set, which helps determine whether anomalous behavior exists in the time series.

For lack of space, we do not report all that information; instead, we report the percentages of hits, false positives, and false negatives. Table VII shows these results.

Table VIII shows the results of using an HMM evolved with the GA for anomaly detection. The experimental setup is the same as described above: we took the same time series, applied the same window sizes, and followed the same steps. All the probabilities generated by the sliding time window and the evolved HMM were also compared against our threshold probability to determine whether an anomaly exists (see Figure 7). We again report the percentages of hits, false positives, and false negatives; Table VIII shows these results.

Figure 7. Threshold between Normal and Abnormal Behaviors

Table VIII. RESULTS OF ANOMALY DETECTION USING AN HMM EVOLVED WITH GAS

Window Size | % Hits | % False Positives | % False Negatives
3 | 89 | 7 | 4
4 | 93 | 2 | 5
5 | 89 | 4 | 7
6 | 84 | 5 | 11

The method we use to analyze and determine possible anomalies compares the probability generated for a time window of the series with the threshold; if this value is lower than the threshold, we flag the window as anomalous behavior.

With an HMM trained with the Baum-Welch algorithm, our experiments report the best result with a window size of 6: 65% correct detections (hits), 35% false positives, and 0% false negatives; the worst result occurred with a window size of 3, with 57% hits, 43% false positives, and 0% false negatives. With an HMM evolved with GAs, the best result occurred with a window size of 4: 93% hits, 2% false positives, and 5% false negatives; the worst result occurred with a window size of 6, with 84% hits, 5% false positives, and 11% false negatives.

VI. RELATED WORK

Determining the best parameters of an HMM is a hard optimization problem, and the performance of the HMM depends largely on these parameters. There is research in this area applying GAs as a search method to find the best parameters of an HMM; in this section we mention some of the related work.

We first mention some work on GAs aimed at HMM parameter optimization. Most of these papers focus on optimizing or selecting a good model for use in speech or gesture recognition.

A. Production of HMMs with GA

Bhuriyakorn et al. [8] review several approaches that are extensively used for HMM topology generation. One algorithm rigidly predefines the connectivity among the states; hence, it only estimates the numbers of states and Gaussian mixtures. Another algorithm, Successive State Splitting, begins by allowing one state transition and grows from one state toward the optimal topology; that is, it grows from a single state to a complete HMM topology. A third algorithm, State Reduction, starts with a fully-connected topology with a predefined maximum number of states and then iteratively removes transitions until the process terminates. They also describe an algorithm that uses multiple paths in a single HMM; the idea is to improve topologies instead of spending significant effort adjusting a single path. Their work focuses on the topology selection problem, and it is important to mention that these algorithms address the task of Thai phoneme recognition only.

Ogawa et al. [15] used a well-known simple GA as a new structure optimization method, in which the dependence between the states and the observations of a Partly Hidden Markov Model (PHMM) is optimally defined for each model using weighted likelihood-ratio maximization.

Won et al. [2] present the use of GAs for evolving HMMs to predict secondary structure information for protein sequences. Their hybrid GA was run on a cluster of computers as a parallel genetic algorithm. The paper also mentions that the topology of the HMM was restricted to biologically meaningful building blocks.

Yang et al. [16] also treat the problem of optimizing HMM parameters. First, they use a GA to narrow the solution space, with Tabu Search (TS) applied as a short-term memory that records the genetic operators used and the offspring generated in a tabu list. The combined GATS algorithm thus guards the GA against premature convergence, because recorded operators and offspring can be rejected with some probability in the future. To improve the convergence speed of GATS-based HMM optimization, the Baum-Welch algorithm (BW) is combined with it, yielding the hybrid GATSBW algorithm, which combines these three elements to train HMMs for continuous speech recognition. GATSBW overcomes the slow convergence of the GATS algorithm and also helps the Baum-Welch algorithm escape from local optima.

B. HMMs used in Anomaly Detection

In this subsection, we provide a brief overview of research on anomaly detection using HMMs and how these systems generally work.

Warrender et al. [17] present a comparison of methods used in intrusion detection systems. Among the methods they analyzed, HMMs were compared to determine how well they perform in anomaly detection. Their conclusion was that the performance of an HMM depends on its initial parameters, and sometimes on the experience of whoever creates it. Training an HMM was hard and time-consuming when the HMM had several states and long observation sequences, but they found that, once trained, an HMM is fast at determining the probability of an observation sequence. They used program data and system calls as observation sequences.

Jha et al. [18] present a statistical anomaly detection algorithm based on Markov chains. The algorithm they propose can be applied directly to intrusion detection by discovering anomalous activities. It uses the sequence of system calls corresponding to a process as the trace of its activity: from a set of normal system-call traces, a statistical model is constructed and then used to build a classifier capable of discriminating between normal and abnormal behavior.

Garcia et al. [19] present an approach to anomaly detection based on the construction, by hand, of an HMM trained on processor workload data. The HMM observes a sequence of processor load measurements; if the probability of the observations having been generated by the model is lower than the estimate obtained with the HMM, the sequence is considered an anomaly.

Khanna et al. [20] discuss an HMM strategy for intrusion detection using a multivariate Gaussian model for the observations, which are then used to predict an attack that exists in the form of a hidden state. They focus on anomaly detection using user and trend profiles; any behavior outside a profile is considered unusual activity and is recorded as an observation. Given these observations, an HMM is used to predict a possible intrusion, based on its hidden state transitions, as the most probable intrusion state sequence. The authors work on an ad hoc architecture, so they need sensors to collect and analyze data. This HMM-based approach (where the HMM is designed by hand) correlates the system observations (usage and activity profiles) and state transitions to predict the most probable intrusion state sequence.

Singh et al. [12] illustrate the capabilities of HMMs, combined with feature-aided tracking, for the detection of asymmetric threats (tactics employed by subjects to carry out attacks on a superior opponent while trying to avoid direct confrontation). Their approach combines a transaction-based probabilistic model with HMMs and feature-aided tracking.

Most of the GA work focuses only on determining the best HMM parameters; other work focuses on applying the resulting HMMs to phonetic or protein sequence research. Moreover, most of the HMMs used for anomaly detection described above were hand-designed; hand design implies implicit knowledge derived from previous probabilistic experience.

This work combines two techniques: GAs for optimizing an HMM’s parameters, and HMMs for anomaly detection. Once an HMM has been evolved by the GA, we apply it to anomaly detection. This approach allows a user to deploy the system without any background in probability or HMM theory.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we outlined a framework for evolving HMMs for anomaly detection. These models, generated and optimized by GAs, provide detection capabilities for anomalous behavior: they are capable of distinguishing between normal and abnormal behaviors. Our experiments show that the evolved Hidden Markov Models perform well on anomaly detection, reaching 93% hits.

All models were produced by means of the GA, without any need for human intervention. The GA determines the size of the HMM, the probability distribution of each state, and the transition probabilities among pairs of states. That is, our process receives the time series and produces an HMM with all its parameters tuned to the point of performing better than those produced by the Baum-Welch algorithm.

Forthcoming experiments include parallelizing the Baum-Welch algorithm. In the near future we hope to compare our method with others, such as the one used by Warrender et al. [17], to determine how good our system is against them. Finally, we also plan to include several variables in a single HMM.

Our work on HMMs is centered on the task of anomaly detection in network bandwidth usage. The models described in this paper are built from time series data of bandwidth usage, and they may be employed by our computer center in the near future.

We expect these improvements to contribute to the development of more accurate models for anomaly detection using HMMs. The next steps are to introduce time into the HMM models and to generalize them to include observations from more than one variable.

Nowadays, completely protecting a network from attacks is a very hard task; even heavily protected networks are sometimes penetrated.

REFERENCES

[1] M. H. Islam and M. Jamil, “Taxonomy of statistical based anomaly detection techniques for intrusion detection,” pp. 270–276, 2005.

[2] K.-J. Won, T. Hamelryck, A. Prugel-Bennett, and A. Krogh, “Evolving hidden Markov models for protein secondary structure prediction,” Evolutionary Computation, IEEE, vol. 1, pp. 1–18, September 2005.

[3] M. Korayem, A. Badr, and I. Farag, “Optimizing hidden Markov models using genetic algorithms and artificial immune systems,” Computing and Information Systems, vol. 11, no. 2, 2007.

[4] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, February 1989.

[5] L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Magazine, 1986.

[6] Y. Bengio, “Markovian models for sequential data,” Statistical Science, 1997.

[7] P. Nicholl, A. Amira, D. Bouchaffra, and R. H. Perrot, “A statistical multiresolution approach for face recognition using structural hidden Markov models,” EURASIP Journal on Advances in Signal Processing, ACM, vol. 2008, p. 13, 2008.

[8] P. Bhuriyakorn, P. Punyabukkana, and A. Suchato, “A genetic algorithm-aided hidden Markov model topology estimation for phoneme recognition of Thai continuous speech,” in Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, 2008, pp. 475–480.

[9] X. Zhang, Y. Wang, and Z. Zhao, “A hybrid speech recognition training method for HMM based on genetic algorithm and Baum-Welch algorithm,” IEEE, pp. 572–572, 2007.

[10] D. Dasgupta and H. Brian, “Mobile security agents for network traffic analysis,” IEEE Transactions on Power Systems, vol. 2, pp. 332–340, 2001.

[11] F. Jemili, M. Zaghdoud, and M. Ben Ahmed, “A framework for an adaptive intrusion detection system using Bayesian network,” Intelligence and Security Informatics, IEEE, 2007.

[12] S. Singh, W. Donat, K. Pattipati, and P. Willet, “Anomaly detection via feature-aided tracking and hidden Markov model,” Aerospace Conference, IEEE, pp. 1–18, March 2007.

[13] J.-M. Francois, Jahmm, October 2009. [Online]. Available: http://www.run.montefiore.ulg.ac.be/~francois/software/jahmm/

[14] S. Wolfram, Mathematica, October 2009. [Online]. Available: http://www.wolfram.com/products/mathematica/index.html

[15] T. Ogawa and T. Kobayashi, “Genetic algorithm based optimization of partly-hidden Markov model structure using discriminative criterion,” IEICE Trans. Inf. & Syst., vol. E89-D, no. 3, March 2006.

[16] F. Yang, C. Zhang, and G. Bai, “A novel genetic algorithm on tabu search for HMM optimization,” Fourth International Conference on Natural Computation, IEEE, 2008.

[17] C. Warrender, S. Forrest, and B. Pearlmutter, “Detecting intrusions using system calls: Alternative data models,” in Security and Privacy, 1999. Proceedings of the 1999 IEEE Symposium on. IEEE, May 1999, pp. 133–145.

[18] S. Jha, K. Tan, and R. A. Maxion, “Markov chains, classifiers, and intrusion detection,” Computer Security Foundations Workshop, pp. 206–219, 2001.

[19] J. M. Garcia, T. Navarrete, and C. Corona Orozco, “Workload hidden Markov model for anomaly detection,” in Proceedings of the International Conference on Security and Cryptography (SECRYPT 2006), pp. 56–59, August 2006.

[20] R. Khanna and H. Liu, “System approach to intrusion detection using hidden Markov model,” IWCMC’06, ACM, July 2006.