Masaryk University
Faculty of Informatics

Network intrusion detection based on statistical protocol identification

Master thesis

Adam Mariš

Brno, Autumn 2015

Declaration

Hereby I declare that this thesis is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Adam Mariš

Advisor: RNDr. Marián Novotný, Ph.D.


Acknowledgement

I would like to thank my supervisor, Marián Novotný, for his helpful comments on this thesis. On a personal note, I would like to thank my family and friends for their support.


Abstract

The aim of this thesis is to analyze an approach to intrusion detection based on statistical protocol identification. Network protocol identification using the SPID algorithm will be analyzed and evaluated with regard to network intrusion detection. Improvements to the algorithm will be proposed and evaluated in an experiment with real traffic, focusing on the detection of network intrusions. A standalone detection/classification system will be proposed and prototyped. The results will be compared with other approaches from the literature.


Keywords

trojan, network, intrusion, detection, genetic algorithm, snort, spid, protocol, security, malware


Contents

1 Introduction
2 Theoretical Background
   2.1 Intrusion Detection Systems
      2.1.1 Conceptual Categorization
         Knowledge-based IDS
         Behavior-based IDS
         Compound Systems
      2.1.2 Categorization based on Subject of Inspection
         Host-based IDS
         Network-based IDS
      2.1.3 Response-based Categorization
      2.1.4 SNORT
   2.2 Traffic Classification
      2.2.1 Supervised Learning
      2.2.2 Unsupervised Learning (Clustering)
      2.2.3 Classification Metrics
         Accuracy
         Precision
         Recall
         F-measure
   2.3 Statistical Protocol Identification (SPID)
      2.3.1 Classification
   2.4 Advanced Persistent Threat
      2.4.1 Cyber Kill Chain
         Reconnaissance
         Weaponization
         Delivery
         Exploit
         C2
         Exfiltration
      2.4.2 Remote Access Trojans
         Poison Ivy
         Dark Comet
         Xtreme RAT
   2.5 Genetic Algorithms
3 Network Intrusion Detection based on SPID
   3.1 Algorithm Structure
      3.1.1 Main
      3.1.2 ProgramManager
      3.1.3 XMLSerializer
      3.1.4 ProtocolManager
      3.1.5 Classifier
         Configuration
         Protocol Fingerprints
         Subprotocols
      3.1.6 Optimizer
         GA1DBinaryOptimizer
         GA2DBinaryOptimizer
         ExhaustiveOptimizer
         Exhaustive2DOptimizer
      3.1.7 Flow
      3.1.8 Metric
      3.1.9 Feature
         Base64Enconding
         ByteVariance
         ControlCharacterFrequency
         ControlCharacterRatio
         DirectionEntropy
         Two/Three/Four-byteHash
         PacketLengthPairsReoccurring
         PayloadSizeHashPairs
         PayloadSizeChanges
         PayloadSizes
   3.2 Evaluation
4 Conclusion
5 References


1 Introduction

The area of network intrusion detection has been a very hot topic for many years, and it will likely remain one for many years to come. Although this statement applies to many security-related fields, where failures usually have a significant impact on lives, property, privacy, etc., the areas dealing with threats spreading through networks will always be on the front pages, because they can theoretically affect every user in the world. With the rise of the Internet of Things [1] in the coming years, there is no doubt that the number of possible attacks will grow, as it already has in recent years [2]. Keeping the number of successful attacks as low as possible is the goal of a large number of security products and mechanisms, such as antiviruses, firewalls, intrusion detection systems, intrusion prevention systems, hardware modules, etc. A system is only as secure as its weakest point, so we must fight threats on many fronts, and it is important that the study and development of each security area receives equal attention.

For our purposes, we picked the area of intrusion detection systems, which deals with attacks on a daily basis. Many approaches to the detection of malicious activities have been introduced to date, yet many of them remained in the experimental phase, most having been proposed in academic papers, and only a few of them are commonly used [3]. There is also a number of proprietary detection methods integrated in enterprise solutions, whose closed nature might be a double-edged sword.

The aim of this thesis is to analyze a novel approach to intrusion detection based on statistical protocol identification. This approach was designed for the classification of regular protocols [15], but we see the potential for its utilization in the field of intrusion detection. To prove it, we propose a prototype of a classifier based on this method and compare its performance with other approaches.

The second chapter of this thesis explains and discusses the theory behind the approaches, technologies, algorithms and attack scenarios used. In the third chapter, we introduce our own intrusion detection system and provide the detection results in comparison with other methods. The fourth chapter summarizes what we achieved, briefly discusses the problems we still face, and presents ideas for improvements.


2 Theoretical Background

2.1 Intrusion Detection Systems

In general, an intrusion detection system (IDS) is a detector that processes information coming from the system that is to be protected. It is designed to dynamically monitor the actions taken in a given environment and decide whether these actions are symptomatic of an attack or constitute legitimate use of the environment [4]. Many approaches and conceptions of IDS systems have been proposed to date, often making such systems difficult to categorize properly.

2.1.1 Conceptual Categorization

With a little simplification, two complementary trends can be seen in intrusion detection: knowledge-based IDS and behavior-based IDS.

Knowledge-based IDS

Knowledge-based detection is also referred to as misuse detection or signature detection. Knowledge-based intrusion-detection techniques apply the knowledge accumulated about specific attacks and system vulnerabilities. The intrusion detection system contains information about these vulnerabilities and looks for attempts to exploit them [4]. In this kind of detection, we can define what constitutes legal or illegal behavior and compare the observed behavior accordingly. These systems are usually programmed with an explicit decision rule, typically a straightforward encoding of what can be expected to be observed in the event of an intrusion. Intrusions can be encoded as a number of different states, each of which has to be present for the intrusion to take place. For state modeling, time series models are often used. The simplest knowledge-based systems are based on string matching, where the presence of an actual attack is determined by the match of a certain substring in the transmitted payload [5]. This method requires deep packet inspection, which is time-consuming and has power and flexibility drawbacks, but gives accurate results. Another common type of knowledge-based system is the expert system, where the attacks are described as a set of


rules. During detection, audit events are translated into facts and the conclusion is based on comparing these facts with the rules [4].

Behavior-based IDS

Behavior-based detection is the opposite of knowledge-based detection in the sense of modeling the events of interest. Anomaly detection systems attempt to model normal behavior, where any abnormal event is considered suspicious and should be thoroughly examined [5]. We assume that an intrusion can be detected by observing a deviation from normal behavior, which will show up when comparing the current activity with the learned model of normal traffic. The advantage of this detection method is the ability to detect novel attacks, which is hardly possible with knowledge-based detection [6]. However, this approach also generates a non-negligible rate of false positives.

In the area of behavior-based systems, we distinguish between self-learning systems and programmed systems. The method of learning in self-learning systems is typically "learn by example", where the system observes traffic for an extended period of time and builds models of behavior. Subsequently analyzed traffic is classified based on the models built during the training phase [5]. A different approach is taken by programmed systems, which require a "teacher" who programs them to detect certain anomalous events. The teacher forms an opinion on what is considered abnormal enough for the system to signal a security violation [5].

Compound Systems

Complementing these two kinds of intrusion detection, there are also compound systems, which mix both described techniques in one detector. The detector operates by detecting the intrusion against the background of the normal traffic of the system [5]. These systems give a better chance of correctly detecting intrusive behavior, since they possess the patterns of intrusions and can relate them to the normal behavior of the system. These systems are usually self-learning, which means they are able to automatically learn what constitutes intrusive and normal behavior for a system by being presented with examples of normal behavior interspersed with intrusive behavior [5]. The examples of intrusive behavior must be flagged as such during the training phase.


They sometimes offer automatic feature selection, where the system operates by automatically determining which observable features are interesting when forming the intrusion detection decision [5].

2.1.2 Categorization based on Subject of Inspection

Based on the subject of inspection, we further classify intrusion detection systems into host-based and network-based intrusion detection systems.

Host-based IDS

Host-based IDS deal with operating system call traces. The intrusions take the form of anomalous subsequences of the traces, which translate to malicious programs, unauthorized behavior and policy violations. The co-occurrence of events is the key factor in differentiating between normal and anomalous behavior. Events belong to a predefined alphabet consisting of individual system calls, e.g. "open", "read", "mmap", etc. [6] Although an easily accessible and non-intrusive source of information is provided, recreating the context of the interleaved events can be difficult, and the validity of log events after the compromise takes place is questionable [6].

Network-based IDS

Network-based IDS deal with detecting intrusions in network data. The main source of events is the traffic between hosts. The IDS can observe all communication between a network attacker and the victim system, which has benefits over host-based IDS, mainly in placing no processing load on the hosts themselves, isolating the stations from the attacker's influence, and the ability to observe network-level events [7]. The main drawbacks are performance issues, as the amount of network data increases dramatically, and the difficulty of telling whether the data streams are reconstructed identically on the monitored hosts and inside the monitor [7]. To overcome the drawbacks of both types of detection, a host network monitoring approach that combines both techniques was adopted by some IDS systems and personal firewalls. Data are observed at all levels of the host's network protocol stack and the event streams observed by the probe are those observed by the system itself [4]. However, the impact on each monitored system is noticeable.


2.1.3 Response-based Categorization

Based on the steps taken by an IDS after the detection of an intrusion, we distinguish passive and active IDS. Most IDS generate an alarm when an intrusion is detected, but usually no further countermeasures are taken to stop the attack [4]. The main reason for such an approach is the fact that these systems currently generate a non-negligible number of false alarms, which would cause denial of service for a number of legitimate users.

2.1.4 SNORT

SNORT is an open-source IDS used for protocol analysis and deep packet inspection against intrusion signatures. The SNORT system processes packet traffic in multiple stages, using a method called analyze-normalized matching [8]. SNORT uses many efficient string matching algorithms to search for intrusion patterns in packet headers and payloads. Signatures are defined as rules that may contain header and content fields. The header part checks the protocol, the source and destination IP addresses, and the port. The content part scans the packet payload for one or more patterns. Rules can also contain negation patterns. Matching patterns are usually in ASCII or HEX format [8].

2.2 Traffic Classification

Traffic classification is often a core part of an IDS, used to detect intrusion patterns or abnormal behavior. Besides the traditional security requirement of a very high rate of successful intrusion detection, there is also a need for high-speed processing due to the continuous increase in network traffic bandwidth. Beyond intrusion detection, traffic classification algorithms are also applied in network management, mainly in quality-of-service control, or in lawful interception [9]. The goals of traffic classification can therefore differ. In some cases only coarse-grained classification is needed, i.e. classifying the traffic into categories such as transaction-oriented, bulk-transfer, or peer-to-peer file sharing. For an IDS, the goal is naturally to achieve finer-grained classification, e.g. to detect the exact application that generates the inspected traffic [11]. The most commonly deployed IP traffic classification techniques are port-based prediction methods and payload-based deep


inspection methods. The objects of classification are usually flows. Flows are defined as 5-tuples of packets that have the following attributes in common:

• protocol

• source port, destination port

• source IP address and destination IP address
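The 5-tuple flow key described above can be sketched as a simple data structure (an illustrative Python sketch; the type and field names are our own, not from the thesis's C# code):

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    """5-tuple identifying a flow: all packets sharing these
    attributes belong to the same flow."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Packets are grouped into flows by using the tuple as a dictionary key.
packet = {"protocol": "tcp", "src_ip": "10.0.0.1", "src_port": 49152,
          "dst_ip": "93.184.216.34", "dst_port": 80}
key = FlowKey(**packet)
flows = {key: [packet]}
```

Because the tuple is hashable, looking up the flow a new packet belongs to is a constant-time dictionary access.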

Port-based classification, the simplest approach, is based on the assumption that most legitimate applications consistently use well-known ports, i.e. ports in the range from 0 to 1023. This classification method suffers from the increasing number of applications using unpredictable port numbers [10].
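The naive port-based method reduces to a table lookup, as in this sketch (the port table here is a hypothetical excerpt, not a full IANA assignment list):

```python
# Hypothetical port-to-application table; real IANA assignments are far broader.
WELL_KNOWN_PORTS = {21: "ftp", 25: "smtp", 53: "dns", 80: "http", 443: "https"}

def classify_by_port(dst_port: int) -> str:
    """Naive port-based classification: trust only the well-known port range."""
    if 0 <= dst_port <= 1023:
        return WELL_KNOWN_PORTS.get(dst_port, "unknown")
    return "unknown"
```

An application tunneling HTTP over port 8080, or malware using an arbitrary high port, defeats this scheme entirely, which is exactly the weakness noted above.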

More advanced classification methods infer the application type by inspecting the full packet content. Some IDS systems of this type are aware of application semantics and look for application-specific data in payloads. This type of packet inspection relies on a classifier which possesses knowledge of the syntax of each application's packet payloads. The main drawback of such an approach is the heavy operational load: monitoring devices have to be repeatedly updated when new applications appear or existing ones change [12].

A more scalable approach is based on recognizing statistical properties in externally observable attributes of the traffic, such as packet lengths, flow directions, character distribution in payloads, etc. Some of these attributes can be easily observed even in encrypted traffic, avoiding the common drawback of deep packet inspection techniques. Some systems use machine-learning algorithms to optimize classification based on statistical patterns. Features by which the inspected IP traffic will be identified and differentiated are defined first. Features are attributes of flows calculated over multiple packets. The machine-learning classifier is then trained to associate sets of features with known traffic classes, and machine-learning algorithms are then applied to classify unknown traffic using the previously learned rules [12]. There are two basic types of learning: supervised and unsupervised.

2.2.1 Supervised Learning

The algorithm is given a collection of sample instances that are manually pre-classified (labeled) into classes. The outcome of such a learning process is


a classification model constructed by examining and generalizing from the provided instances. The goal is to identify the mapping from input features to an output class [12]. Two major phases are compulsory for supervised learning:

Training - the classifier examines the provided data (the training dataset) and constructs a classification model

Testing - the model built during the training phase is used to classify new, unseen instances

2.2.2 Unsupervised Learning (Clustering)

Clustering focuses on finding patterns in the input data. A clusterer groups instances with similar properties by itself; in contrast to classification, there is no need for guidance. These clusters can be exclusive or overlapping, hierarchical or non-hierarchical [12].

2.2.3 Classification Metrics

Classifiers are evaluated by various metrics. These metrics can be used to characterize a classifier's performance from different points of view. The basic variables are:

true positives (TP) percentage of flows belonging to class A that are correctly classified as belonging to class A

true negatives (TN) percentage of flows belonging to other classes that are correctly classified as not belonging to class A

false positives (FP) percentage of flows belonging to other classes that are incorrectly classified as belonging to class A

false negatives (FN) percentage of flows belonging to class A that are incorrectly classified as not belonging to class A

Using these variables, other characterizations can be derived [13].


Accuracy

Accuracy is the basic measure for binary classification. It is defined as the proportion of true results (both true positives and true negatives) among the total number of cases examined [13].

A = (TP + TN) / (TP + FP + TN + FN)

Precision

Precision, sometimes called confidence, is the most used evaluation measure in the areas of machine learning, data mining and information retrieval. It can analogously be called true positive accuracy, being a measure of the accuracy of predicted positives in contrast with the rate of discovery of real positives [14].

P = TP / (TP + FP)

Recall

Recall, sometimes called sensitivity, is the portion of real positive cases that are correctly predicted positive. This measure is highly weighted in the areas of computational linguistics and machine translation [14].

R = TP / (TP + FN)

F-measure

F-measure, also called the F1 score, is the harmonic mean of precision and recall, combining them in the following manner:

F1 = 2 · P · R / (P + R)

In this form, the weight is evenly distributed between precision and recall. There is also a more generalized form, in which the evaluator can put more emphasis on one of the two measures.
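The four measures above can be computed directly from the TP/TN/FP/FN counts; a minimal sketch (the function name is our own):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 80 true positives, 90 true negatives, 10 FP, 20 FN.
a, p, r, f = classification_metrics(80, 90, 10, 20)
```

The generalized form mentioned above is usually written F_beta = (1 + beta^2) · P · R / (beta^2 · P + R), which reduces to F1 for beta = 1.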


2.3 Statistical Protocol Identification (SPID)

SPID is a Statistical Protocol Identification algorithm designed to perform protocol identification based on simple statistical measurements of various protocol attributes [15][16]. The attributes in the SPID-based framework can be defined over arbitrary data that can be extracted from packets and packet flows. It provides a hybrid approach combining elements of deep packet inspection and statistical flow properties. The goal of SPID is to provide fine-grained classification that reliably identifies the specific application layer protocol that generates the inspected network communication session. SPID doesn't require preliminary knowledge of application protocols; it has the ability to automatically deduce properties from training data.

It is a supervised technique, so the algorithm expects pre-classified training data. To add a new protocol into the database of known protocols, one has to gather a certain amount of training data for the given protocol instead of analyzing the protocol to create protocol patterns, which can in some cases be easier. The SPID framework represents protocol model fingerprints in XML format and is designed to be cross-platform and language independent. The proof of concept was written in C# [15]. The key requirements for SPID are:

• small protocol database size

• low time complexity

• early identification of the protocol

• reliable and accurate protocol identification

Real-world data are often gathered and inspected by embedded network devices that usually don't have much computational power and memory, and the amount of data is often extensively large. Therefore, in the real world, efficiency is often valued more than actual accuracy and reliability. To provide QoS to an active session in real time, or to detect a malicious protocol, early identification of the protocol is a very important success factor. And finally, in the area of threat detection, the accuracy and reliability of protocol identification can't fall behind the other factors. In order to be deployed in real-time


Figure 2.1: Example of attribute fingerprint for ByteFrequency feature [16]

traffic classification, SPID provides classification on-the-fly [15]. SPID can identify most protocols with high probability after the first 10 inspected packets of a session [17].

SPID uses protocol models containing a set of attribute fingerprints. Fingerprints are created through frequency analysis of various attributes, such as application data or flow features. The proof-of-concept algorithm contains around 30 attribute meters, which are functions that provide distribution measurements for each attribute [15].

One of the simpler attributes is ByteFrequencyMeter (Figure 2.1), which measures the frequency of occurrence of each character in the application payload. The fingerprint consists of 256 entries with the number of occurrences and the calculated probability of occurrence for the corresponding character. To be specific, attribute fingerprints are represented by two vectors of discrete bins. The first one is a counter vector holding the discrete values and the second is a probability vector. Probability vectors are normalized versions of the counter vectors, with the values in every probability vector summing up to 1.0. The proof-of-concept SPID algorithm uses vectors of length 256 for all attributes; however, the SPID framework doesn't restrict vectors to a fixed length. What is important is that vectors of the same attribute have equal lengths.
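The counter/probability vector pair can be illustrated as follows (a simplified Python sketch of a ByteFrequencyMeter-style fingerprint, not the thesis's original C# implementation):

```python
def byte_frequency_fingerprint(payload: bytes):
    """Build a 256-bin counter vector over payload bytes and its
    normalized probability vector (values summing to 1.0)."""
    counters = [0] * 256
    for b in payload:
        counters[b] += 1
    total = sum(counters)
    probabilities = [c / total for c in counters] if total else counters[:]
    return counters, probabilities

counters, probs = byte_frequency_fingerprint(b"GET / HTTP/1.1\r\n")
```

Each observed packet payload increments the counter bins; the probability vector is recomputed from the counters whenever a comparison is needed.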

Protocol models are constructed after the establishment of a session, e.g. after the TCP handshake. Although the proof-of-concept SPID algorithm supports only TCP sessions, the framework can be used to identify protocols with any session-based communication scheme, i.e. a uni- or bi-directional flow. Every further packet containing application payload is then fed into the attribute meters, which provide measurements that are stored in the session's protocol model. The protocol model is updated for every attribute and the fingerprint counters are incremented. Pre-classified training data are converted to protocol model objects by generating a protocol model for each session; these are then merged by adding the fingerprints of the same protocol and attribute type.
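Merging per-session models into one protocol model then amounts to element-wise addition of counter vectors of the same attribute, followed by renormalization (an illustrative sketch under the representation described above):

```python
def merge_fingerprints(counter_vectors):
    """Merge same-attribute fingerprints by summing the counter
    vectors element-wise, then renormalize to probabilities."""
    length = len(counter_vectors[0])
    merged = [0] * length
    for vec in counter_vectors:
        # Vectors of the same attribute must have equal lengths.
        assert len(vec) == length
        for i, c in enumerate(vec):
            merged[i] += c
    total = sum(merged)
    probs = [c / total for c in merged] if total else merged[:]
    return merged, probs

# Two toy per-session counter vectors merged into one protocol fingerprint.
merged, probs = merge_fingerprints([[1, 0, 3], [2, 2, 0]])
```

Because merging only adds counters, training can be done incrementally, one session at a time.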

It was observed that in order to construct a reliable protocol model, a


number of sessions equal to 10% of the fingerprint vector length is required on the input during the training phase [15].

2.3.1 Classification

Protocol models are classified by comparing the corresponding attribute fingerprints. The comparison is calculated between the probability vectors of protocol models using the Kullback-Leibler divergence (K-L) [18]. The K-L divergence, also known as relative entropy, represents a value expressing how much extra information is needed to describe the values in the observed session by using a code optimized for the known protocol model instead of a code optimized for the session's protocol model. It can also be more simply interpreted as the information gain achieved if P is used instead of Q. The K-L divergence is non-negative, being 0 for identical distributions. In the following equation, P_attr represents the probability vector for a specific attribute of an observed session, whereas Q_attr,prot denotes the probability vector of the same attribute of a known protocol model [15].

KL(P_attr, Q_attr,prot) = Σ_i P_attr(i) · log2( P_attr(i) / Q_attr,prot(i) )    (2.1)

It should also be noted that the term for index i is defined only if

Q_attr,prot(i) = 0 ⇒ P_attr(i) = 0

in which case the term contributes zero to the sum. The K-L divergence is not symmetric in P and Q, meaning KL(P, Q) ≠ KL(Q, P) in general [19]. The best matching protocol is the one with the lowest K-L divergence. The observed session is then classified as the best matching protocol if the K-L divergence also falls below a defined threshold, which lowers the false positive rate. Otherwise the observed flow is classified as unknown [15].
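Equation 2.1 and the threshold rule can be sketched as follows (an illustrative Python sketch; the exact handling of zero bins and the threshold value in the actual SPID code may differ):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(P || Q) in bits.
    Terms with P(i) == 0 contribute nothing; Q(i) == 0 while
    P(i) > 0 makes the divergence infinite (no match)."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        d += pi * math.log2(pi / qi)
    return d

def classify(session_probs, protocol_models, threshold):
    """Pick the known protocol with the lowest divergence, or
    'unknown' if even the best match exceeds the threshold."""
    best, best_div = "unknown", math.inf
    for name, q in protocol_models.items():
        div = kl_divergence(session_probs, q)
        if div < best_div:
            best, best_div = name, div
    return best if best_div <= threshold else "unknown"
```

In a full SPID classifier this comparison is averaged over all attribute fingerprints of a session's protocol model, not just one vector.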

2.4 Advanced Persistent Threat

An Advanced Persistent Threat (APT) is best described as a campaign, i.e. a series of attacks over time [20]. The persistence in APT intrusions is manifested in maintaining a presence on the network and repeatedly attempting to gain entry to areas where a presence is not yet established [21].


2.4.1 Cyber Kill Chain

Each attack can be described in six sequential stages:

Reconnaissance

The first phase of the chain is information collection: learning the internal structure of the target organization, studying the applied security protections and finding vulnerabilities. This is often done by port scanning, system enumeration, browsing websites, pulling down PDFs, etc. [22]

Weaponization

Weaponization is the act of placing a malicious payload into a delivery vehicle, usually using a technique for obfuscation of shellcode, the way an executable is packed into a trojaned document, etc. This phase is not visible to victims and may not happen immediately after reconnaissance. Detection is not always possible [22].

Delivery

In this phase, the malicious payload is delivered to its target, e.g. by an HTTP request containing SQL injection code, an email with a hyperlink to a compromised website, or an infected DOC document sent by email [22].

Exploit

A software, human, or hardware vulnerability is exploited during this stage. The exploit can be single-phase or multi-phase. A single-phase exploit results in the compromised host behaving according to the attacker's wishes directly as a result of the successful execution of the delivered payload. A multi-phase exploit involves the delivery of shellcode whose sole function is to pull down and execute more capable code upon execution [22].

C2

The C2 phase, more descriptively called command-and-control, represents the period after the successful compromise of a system. The


communication back to the adversary must often be established before any potential impact on data can be realized. The chance of detecting the attack is often highest during this phase, and mitigation is usually simple after detection [22].

Exfiltration

This phase happens when the data, which have been the ultimate target of the attack, are actually taken [22].

This chain doesn't have to be a linear flow: some phases may occur in parallel, some earlier phases can be interchanged, or may not occur at all. The chain represents how far along an adversary has progressed in the attack, the corresponding damage, and the investigation that must be performed [22].

2.4.2 Remote Access Trojans

A Trojan horse is a computer program that appears to have a useful function, but also has a hidden and potentially malicious function [23]. Its primary purpose is to trick the user into executing the program or opening the file containing the Trojan horse. In contrast to viruses and worms, Trojan horses do not replicate and always require user intervention to perform their unauthorized activities. A remote access Trojan (RAT), sometimes also called a remote access tool, is a Trojan horse that, when executed, enables some form of remote access to and control of the compromised system by an unauthorized adversary. It is very similar to a backdoor [24].

RATs are very popular mainly in the script-kiddie community, due to their simplicity of use with no requirements for advanced technical knowledge. In spite of that, they remain a keystone of many sophisticated cyber attacks and APT campaigns and still pose a serious threat [20] to security. The difference between most crimeware and RATs is that RATs require live, direct, real-time human interaction by the attacker, which might make the attack a bigger challenge to detect in the case of a cautious and skilled attacker.

In the following sections, we describe selected RATs.


Poison Ivy

Poison Ivy is a freely available, popular RAT targeting Windows operating systems. It provides many features common to most Windows-based RATs, such as remote desktop capturing, key logging, file transfer, system administration, etc. Since its release in 2005, it has become part of several well-known APT campaigns [20].

Poison Ivy allows the attacker to build a customized PIVY server, which is delivered to a victim either integrated as a malicious payload in some container or as a standalone binary, typically using social engineering. Once the server is executed on the victim's side, it connects to the PIVY client running on the attacker's machine, providing the attacker with control over the victim's system [20].

The PIVY server usually divides its code into initialization and maintenance code and networking code. The initialization and maintenance code is injected into the already-running explorer.exe process. Depending on the PIVY server's configuration, the networking code might launch a hidden default Web browser process, inject itself into that process and download the rest of the code and data for other features and functionality from the specified location. All of PIVY's global variables, configuration details and function pointers are stored in a C-style struct data structure, which is also injected into the target processes [20].

The initialization and maintenance code is injected into explorer.exe function by function, each with its own memory region, filling in the function pointers in its struct data structure. If the persistence option is enabled, a watchdog thread is also injected into explorer.exe that automatically restarts the PIVY server if it is unexpectedly terminated by the victim's operating system [20].

Poison Ivy uses its own custom application protocol over TCP. Communication is encrypted using the Camellia cipher with a 256-bit key. The key is derived from a password, the default being "admin". The password is zero-padded to 32 bytes. For initialization, Poison Ivy uses the challenge-response protocol depicted in Figure 2.3. The PIVY server sends 256 bytes of random data to the PIVY client. The PIVY client encrypts these data using the configured key and sends them back to the PIVY server for validation. Most of the data sent is compressed before encryption with the use of the LZNT1 compression algorithm [32]. The protocol sends encrypted data in chunks with a prepended header containing the command ID, the stream ID (multiple streams of data are allowed), the size of the chunk, and other validation info [20].

alert tcp $HOME_NET any -> $EXTERNAL_NET 3460 (msg:"ET TROJAN PoisonIvy Key Exchange with CnC Init"; flow:established,to_server; dsize:256; flowbits:set,ET.Poison1; flowbits:noalert; reference:url,doc.emergingthreats.net/2008380; classtype:trojan-activity; sid:2008380; rev:3;)

alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"ET TROJAN Chorns/PoisonIvy related Backdoor Keep Alive"; flow:established; dsize:12; content:"/AVAILABL/|0d0a|"; reference:url,doc.emergingthreats.net/2010345; reference:md5,9fbd691ffdb797cebe8761006b26b572; classtype:trojan-activity; sid:2010345; rev:4;)

alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"ET TROJAN PoisonIvy.E Keepalive to CnC"; flow:established,to_server; content:"|90 48 5c d5 ec 70 a3 8b 41 72 28 50 ec f6 d5 2a|"; offset:16; depth:16; reference:url,www.threatexpert.com/report.aspx?md5=fc414168a5b4ca074ea6e03f770659ef; classtype:trojan-activity; sid:2013337; rev:4;)

Figure 2.2: SNORT rules for Poison Ivy

There are many SNORT signatures for detection of Poison Ivy communication freely available [38]. Some of them exploit the fact that the PIVY server sends 256 bytes in the first packet, which may lead to a higher occurrence of false positives. Others detect PIVY by "keepalive" packets, and some are configuration-dependent and might be bypassed by a different configuration. Figure 2.2 shows some of them.

Dark Comet

Figure 2.3: Initial communication of Poison Ivy's protocol [20]

Similarly to Poison Ivy, Dark Comet is also very popular among RAT programs, due to the fact that it is freely available with active development, simple to use, and feature-rich. With newer versions, Dark Comet is directed toward more legitimate use, yet it still provides options for bypassing, hiding, and disabling services. As with most RAT programs, it uses a client-server model, with the client side provided with an administration console (Figure 2.4) to manage all incoming connections, allowing full command and control capability over the compromised servers [30].

Dark Comet uses a customized application protocol running over TCP. Communication is encrypted by default, using the RC4 algorithm with a few optimizations introduced by the compiler that add entropy to the pool [31]. It uses a 256-bit key derived from a security password set by the attacker. The default encryption key is #KCMDDC(DC_version)#-890, where (DC_version) denotes the version of Dark Comet (4 for version 4, 42F for version 4.2, 5 for version 5, etc.). If a security password is configured, the password is appended to the default key.

If no active command and control traffic occurs, the network connection is usually maintained with a series of TCP requests [PSH, ACK] containing the word "keepalive" followed by a string of digits.

To bypass a firewall, Dark Comet injects the communication code into a process that is allowed to pass through the firewall, e.g. Internet Explorer. First of all, Internet Explorer is identified, opened in the background, and suspended. Some extra memory is allocated in the process and Dark Comet's code is copied into this new buffer. After this is done, the iexplorer.exe process is resumed.

Figure 2.4: Client-side administration console in Dark Comet

There are a couple of SNORT rules freely available for network detection of Dark Comet's communication protocol [38], shown in Figure 2.5.

Xtreme RAT

XtremeRAT is another popular, freely available RAT tool that appeared in 2010. There are also other, paid, private versions. However, even the free versions provide users with rich functionality, such as a remote shell, file manager, registry manager, audio and video capturing, process manager, keylogging, etc. Traditionally, it comprises two components: a client on the attacker's side and a server on the victim's endpoint. XtremeRAT backdoors maintain and reference configuration data chosen by the attacker at the time of building the malicious executable. Several versions of XtremeRAT store this configuration data under %APPDATA% on Microsoft Windows, encrypted with the RC4 algorithm using a fixed password [39]. This configuration data includes:

• name of the installed backdoor

• directory under which the backdoor is installed

• which process it will inject into

• CnC information (attacker's IP address and port)

• FTP information for sending stolen keystroke data

• mutex name of the master process

• ID and group name used for organizational purposes

alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"ET TROJAN DarkComet-RAT init connection"; flow:from_server,established; dsize:12; content:"|38 45 41 34 41 42 30 35 46 41 37 45|"; flowbits:set,ET.DarkCometJoin; reference:url,www.darkcomet-rat.com; reference:url,anubis.iseclab.org/?action=result&task_id=1a7326f61fef1ecb4ed4fbf3de3f3b8cb&format=txt; classtype:trojan-activity; sid:2013283; rev:4;)

alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"Context Signature: DarkComet-RAT Incoming Keepalive"; flow:from_server,established; content:"KeepAlive"; pcre:"/KeepAlive\|\d{7}/"; classtype:trojan-activity; sid:1000001; rev:3; reference:url,www.contextis.com/research/blog/malware-analysis-dark-comet-rat/;)

Figure 2.5: SNORT rules for Dark Comet

The whole communication is encrypted using the RC4 algorithm as well. Some SNORT rules for detection of XtremeRAT can be found, most of which are based on picking up the CnC beacon (Figure 2.6).

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"ET TROJAN W32/Xtreme.RAT CnC Beacon"; flow:established,to_server; content:".functions"; http_uri; pcre:"/\x2F[0-9]+\x2Efunctions$/U"; classtype:trojan-activity; reference:url,www.malwaresigs.com/2013/01/17/xtreme-rat/; reference:url,www.symantec.com/connect/blogs/w32extrat-syrian-conflict-used-deliver-xtreme-rat; sid:139992; rev:1;)

Figure 2.6: SNORT rule for Xtreme RAT [40]

2.5 Genetic Algorithms

Genetic algorithms are a popular optimization technique employed in many systems due to their robustness: the search is not biased towards a locally optimal solution but possesses the capacity to locate the global optimum [25]. The difference from random sampling algorithms is that genetic algorithms are able to direct the search towards relatively prospective regions of the search space. A genetic algorithm is usually characterized by:

• genetic representation (encoding) of solutions

• population of encoded solutions (also called chromosomes) - mostcommonly encoded as binary strings

• fitness function that evaluates the optimality of solutions

• genetic operators generating new population from existing popu-lation

• control parameters

A genetic algorithm can be viewed as an evolutionary process in which a population of solutions evolves over a sequence of generations. During each generation, the fitness of each solution is evaluated and solutions are selected for reproduction based on their fitness. "Good" solutions, the ones with the highest fitness scores, are selected for reproduction, and "bad" solutions are eliminated [25]. The reproduction consists of applying genetic operators, the most common of which are crossover and mutation.

Crossover is a reproduction technique that takes two parent chromosomes and produces two child chromosomes. The most common method is one-point crossover. Using this method, both parent chromosomes are split into left and right subchromosomes, and a child chromosome gets the left subchromosome of one parent and the right subchromosome of the other parent [26]. Crossover occurs only with some defined probability. When chromosomes aren't subjected to crossover, they remain unmodified.

Mutation involves the modification of the value of each gene of a chromosome with some defined probability. Assuming a binary string is used for encoding, this operation is equivalent to a bit flip on each bit with the defined probability. These random modifications prevent premature convergence of the algorithm to suboptimal solutions [25].

The search terminates when a termination condition is reached, typically when a solution satisfying a minimum criterion has been found or a fixed number of generations has been reached.


3 Network Intrusion Detection based on SPID

In this chapter, we present a prototype of a network IDS based on an optimized SPID approach for classification of malicious protocols, with focus on RAT protocols. We introduce a number of novel protocol features whose selection for classification is optimized using genetic algorithms. We present an enhanced selection approach yielding better classification results. We also present a new method of training complex protocols that is based on recognizing their subprotocols and treating them as different protocols. We implemented this algorithm in C++ using the C++11 standard. For easier reference, we call this algorithm SPID4NID.

The design of our algorithm is inspired by the master thesis of Ján Rusnačko, "Self-optimizing traffic classification framework" [27].

3.1 Algorithm Structure

SPID4NID was designed to be modular and hierarchically structured, as can be seen in Figure 3.1. This way, it is easy to extend with further functionality or to integrate the algorithm as a module into larger projects. For easy usage, it provides a command-line interface with simple syntax. We will break the description of the functionality into sections describing each of the algorithm's classes, in a similar order as the user data flows across these modules.

3.1.1 Main

This module is implemented by the main.cpp source file. It provides an interactive command-line interface with the use of libedit [28], a Berkeley-style licensed command line editor library providing line editing, history, auto-completion, etc. This module parses user input and calls the functions of the ProgramManager class accordingly. It supports the following commands:

train train_dir [--val_dir val_dir] [--log output_log] [--state output_state] [--mode mode]



Figure 3.1: SPID4NID class diagram


Used for training the protocols. It takes one positional argument, train_dir, the path to a directory with pcap files. The names of these files will be used as the names of the protocols.

val_dir is an optional argument and is expected to be the path to a directory with pcap files of corresponding protocols that are used for optimization. Note that the corresponding files must have the same names as the files in train_dir. If no val_dir is given, optimization will not happen.

output_log is an optional argument used for storing the results of training and optimization in a simple text format. If no output_log is given, output will be directed to the standard output.

output_state is an optional argument containing the path to an XML file that is to be newly created. It is used for storing the database of fingerprints created after training, or for storing the generated configuration after training and optimizing. This can be used for later loading of the fingerprints without the need for training, loading the configuration without the need for repeated optimization, or exporting the database or optimized configuration for other purposes. If no output_state is given, no state file will be created.

mode is used to indicate what should be stored: the database of fingerprints (argument 1), the generated configuration (argument 2), or both of them (argument 3), with the first being the default.

validate val_dir [--state output_state] [--log output_log] [--mode mode]

Used for optimization of the classifier's configuration to gain better classification results. This function can be called in addition to the train function if no val_dir was given there. If val_dir was used, this function does not provide any additional functionality. The meaning of the arguments of the validate function is the same as in train.

classify class_dir [--log output_log]

Designed for classification of flows captured in pcap files.

class_dir is a positional argument containing the path to the directory with pcap files which are to be classified. Each flow in each file will be classified independently, either as one of the previously trained protocols or as unknown if no similarity was found between the known protocols and the currently classified protocol.

output_log is the path to the file that will be newly created with the classification results. For each pcap file, all flows will be listed with the 5 most similar trained protocols, the corresponding similarity scores, and the final decision. If no argument is given, the default is standard output.

test test_dir [--log output_log]

Intended for testing the classifier and its current configuration with the standard metrics Accuracy, Precision, Recall, and F1-measure, as described in 2.2.3.

test_dir is a positional argument referring to the path to a directory containing pcap files with protocols the classifier will be tested against. The requirement is that this directory has to contain files named after protocols that were previously trained.

output_log will be a newly created text file with the test results. After classification of each flow from the given directory, overall statistics of the mentioned metrics will be given, so the analyst can see the overall performance of the classifier, not just the metrics for the individual protocols. If no output_log is specified, standard output will be used.

save save_file [--mode mode]

Allows saving either the database of trained protocol fingerprints or the current configuration, i.e. the selection of features. Users can save their trained protocols and later load them without the need for repeated training. This gives an opportunity to create public databases of known malicious protocols that can be shared among users, similar to the databases of rules used by SNORT. Regular users can use SPID4NID only for classification, whereas security companies and researchers with access to malicious traffic could provide such databases along with the optimized configurations. Data are stored in XML format, which can be easily parsed by other programs, so the knowledge can be shared without any restrictions.


save_file is a positional argument with the name of the newly created XML file.

mode is an optional argument used to indicate whether the database of protocol fingerprints (value 1), the configuration (value 2), or both (value 3) should be stored, with the first being the default. Note that this command can be run additionally to save both of them. The structure of the XML file is described in 3.1.3.

load load_file

Responsible for loading an XML file with protocol fingerprints and/or configuration, parsing it, and storing it into the internal structures of the classifier. The function automatically looks for protocol fingerprints and configuration and, if found, configures the Classifier accordingly.

help

Prints a help message for using the program.

3.1.2 ProgramManager

This module represents the first layer of actual functionality related to classification. This class manages the program flow and is implemented by the ProgramManager.h and ProgramManager.cpp files. Its functions call functions of the ProtocolManager and XMLSerializer classes. The Main module calls functions implemented in the ProgramManager class according to the typed commands. This class also provides functionality for testing and calculates the metrics.

3.1.3 XMLSerializer

This class provides functionality for creating XML documents representing the database of protocol fingerprints and the configuration, storing them into XML files, parsing given XML files, and creating the database and configuration out of them. It uses TinyXML-2 [29], a C++ XML parser released under the zlib license. An example of the database of protocol fingerprints along with the configuration can be seen in Figure 3.2.


<program_state>
  <protocol_fingerprints>
    <protocol name="dark_comet2">
      <feature id="0">
        <vector type="counter">
          <v>3</v><v>0</v><v>113</v>...
        </vector>
        <vector type="probability">
          <v>0.0015881419</v><v>0</v><v>0.059820011</v>...
        </vector>
      </feature>
      <feature id="1">...
    </protocol>
    <protocol name="poison_ivy3">...
  </protocol_fingerprints>
  <configuration type="2d" width="30" height="8">
    <v>1</v><v>0</v><v>1</v>...
  </configuration>
</program_state>

Figure 3.2: Example of classifier's state in XML


3.1.4 ProtocolManager

The ProtocolManager class is responsible for managing flows. It stores all current flows, extracts packet information out of pcap files, and feeds it to the functions of the Classifier and Optimizer classes.

3.1.5 Classifier

The Classifier class provides the functionality for the actual training and classification. It stores the array of all features, the metrics, the database of protocol fingerprints, and the configuration.

Configuration

The configuration can be of two types. The first is a one-dimensional binary array with length equal to the number of features. The i-th element of the array indicates whether the i-th feature will be computed or not. The goal is to optimize the selection of features to find the most suitable combination for all the trained protocols that gives the best results in classification. This approach is common for its simplicity; however, it usually isn't efficient and accurate at the same time for a set of very different protocols. One has to consider a trade-off between the speed, which is proportional to the number of selected features, and accuracy.

We propose an approach where each of the protocols has its own binary array of the selected features that describe it best. This method can be more efficient, because the fingerprint for a feature that isn't essential for one protocol, but is essential for other protocols, does not have to be compared with the fingerprint of the given protocol during classification. In most cases it is also more accurate, because when the "to-be-classified" protocol is being compared to a known protocol, only the fingerprints of the features that best describe the given protocol are compared. This results in a two-dimensional array configuration, an example of which can be seen in Figure 3.3.

Protocol Fingerprints

Protocol fingerprints are represented in the same way as in SPID. In SPID4NID, they are implemented using these C++ structures:

typedef struct {
    std::vector<uint> counter_vec;
    std::vector<float> probability_vec;
} attr_fprint;

typedef std::map<std::string, std::map<uint, attr_fprint> > proto_fprints;

protocols     feature 1   feature 2   feature 3   ...
dark_comet1   1           0           1           ...
poison_ivy4   0           0           1           ...
xtreme_rat2   0           1           1           ...
...

Figure 3.3: Example of 2D configuration.

Subprotocols

RAT programs typically offer a great number of functionalities. Our selected RATs, DarkComet, PoisonIvy, and XtremeRAT, are particularly feature-rich when it comes to what the attacker is able to do. We learned from using these RATs for testing purposes that these "protocols" are much more complex and should in fact be viewed as a set of protocols instead of one protocol. There is usually one main "managing" flow, which stays open as long as the attack session is active. For most other functions, DarkComet and XtremeRAT create a new flow, with a new source port on the victim's side, to transfer the requested data.

For example, new flows are created for the remote shell, remote desktop capture, keylogging, file system traversal, etc. All of these features are different in nature, and it is no wonder that these flows have different attributes. Therefore, it would be nonsensical to calculate attribute fingerprints for each flow and then map them into one set of attribute fingerprints belonging to one complex protocol. This approach would effectively destroy most of the attributes specific to the various flows when merging them in the end.

The other extreme would be to treat all flows as separate protocols. This approach would make the database very large and the optimization and classification many times slower.

To deal with this problem, we took a "hybrid" method, implemented by a merge-while-train approach. After a given flow finishes training, the similarity is computed between its attribute fingerprints and the attribute fingerprints of all other flows that finished their training before. The flow with the highest similarity is taken, and if the similarity is higher than a defined threshold, the attribute fingerprints of both flows are merged, effectively creating a new protocol fingerprint. Using this approach, the most similar flows are grouped into the same subprotocol, whereas different flows stay in different subprotocols. This approach is an effective trade-off between classification accuracy and practical usability.

An example of a debug run of the train function can be found in Figure 3.4. dark_comet0 represents the first flow (counting from zero) from the dark_comet.pcap file, the middle integer indicates the ID of the feature (30 in total), and the last floating point number denotes the calculated similarity.

train/poison_ivy.pcap is being read
Similarity among protocols: poison_ivy7
dark_comet0:0:0.220615
dark_comet0:1:0.0188686
...
Average similarity: 0.358982
dark_comet10:0:0.217782
dark_comet10:1:0.0100894
...
Average similarity: 0.403251
poison_ivy5:0:0.9875
poison_ivy5:1:1
...
Average similarity: 0.884729
Maximum: poison_ivy5
Merged with poison_ivy5

Figure 3.4: Debug run of train function


3.1.6 Optimizer

The Optimizer class provides optimization of the configuration based on the results from validation. Attribute fingerprints of trained protocols are compared to attribute fingerprints of flows from the validation directory and their similarity is computed. These similarities between the protocols are then used by the Optimizer to generate the optimal solution.

The goal is to have a high similarity between flows of the same protocol and a low similarity between flows of different protocols. We assume that pcap files of the same protocols have the same names, so we can deduce the protocol of the compared flows from the file names. For each generated configuration, we calculate the similarities between protocols according to which features this configuration selects. We then get a similarity for each pair and calculate an individual score for each configuration. The score is calculated according to a fitness function, a simplified version of which can be seen in Figure 3.5.

The logic behind this fitness function is to reward the solution by raising the score for each true positive hit as well as each true negative hit. If the similarity between two flows of the same protocol is above THRESHOLD, we raise the score proportionally to the calculated similarity: the higher the similarity, the higher the increase. If the flows are not of the same protocol and the similarity is lower than THRESHOLD, i.e. they can't be misclassified as each other, the score is raised proportionally to 1 − similarity: the lower the similarity, the higher the increase.

On the other hand, we also punish the solution in the case of false negatives and false positives in the same way. This method is the most general and is suitable for the majority of cases. Some applications may find it more accurate for their needs to tune the parameters of this function to put more weight on some hit occurrences. However, we kept this function symmetric for the purpose of demonstrating the most general concept.

if proto_name1 ≠ proto_name2 then
    if similarity ≥ THRESHOLD then
        score ← score − similarity
    else
        score ← score + (1 − similarity)
    end if
else
    if similarity ≥ THRESHOLD then
        score ← score + similarity
    else
        score ← score − (1 − similarity)
    end if
end if

Figure 3.5: Fitness function

There are 4 Optimizer subclasses currently implemented:

GA1DBinaryOptimizer

This class provides optimization of the one-dimensional configuration with the use of genetic algorithms. The parameters of the genetic algorithm are set as follows:

• population size of one generation - 200

• number of generations - 4000

• probability of mutation - 0.01

• probability of crossover - 0.9

GA2DBinaryOptimizer

This class provides optimization of the two-dimensional configuration with the use of genetic algorithms. The parameters of the genetic algorithm are set as follows:

• population size of one generation - 300

• number of generations - 6000

• probability of mutation - 0.001

• probability of crossover - 0.9

Both GA1DBinaryOptimizer and GA2DBinaryOptimizer use GAlib [33], a C++ library of genetic algorithm components.


ExhaustiveOptimizer

This class provides optimization of the one-dimensional configuration by exhaustively searching all possible configurations. The configuration with the highest score is selected. This way of optimization consumes a significant amount of CPU time, as the time complexity of this algorithm is in Θ(2^n). Considering that the length of the binary array is 30, the real runtime of this optimization on a common CPU is several hours, whereas the optimization based on genetic algorithms takes seconds.

Exhaustive2DOptimizer

This class provides optimization of the two-dimensional configuration by exhaustively searching all possible configurations. However, it is implemented as a sequence of one-dimensional array optimizations, since with 30 features the full search is intractable for a high number of trained protocols. The asymptotic time complexity is also in Θ(2^n).

The fitness function from Figure 3.5 is used in the same way by all 4 subclasses. We were not able to compare the accuracy of the solutions generated by genetic algorithms against the exhaustive search approach for configurations with 30 features, but for a lower number of features (no more than 20), the resulting scores were very similar.

3.1.7 Flow

Flow is a class possessing only the attributes used by features. Each flow stores an attribute fingerprint for each feature, implemented as a map: std::map<uint, attr_fingerprint>. Among other attributes, there is also a packet buffer storing the last 10 packets, which is regularly cleaned. Other attributes are the number of analyzed packets, an indication whether it is a uni-flow or a bi-flow, flow direction changes, the size of the last packet, etc.

3.1.8 Metric

Although SPID uses the Kullback-Leibler divergence [18] for measuring the difference between the probability vectors of attribute fingerprints, we decided to use the cosine similarity (3.1) because of its symmetry, its outcomes lying in the probability space, and no additional requirements, as opposed to the KL divergence. The Metric class provides a function for computing the similarity between two given attribute fingerprints.

\cos(\theta) = \frac{\sum_{i=1}^{n} A_i \cdot B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \cdot \sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (3.1)

3.1.9 Feature

The Feature class is an abstract class inherited by the specific features used for calculating attributes from the provided packet payloads. It provides two virtual functions:

updateFeature - Takes a Flow object as input, calculates some specific attribute, and updates the attribute fingerprint stored in the Flow object. Note that packet data are retrieved from the packet buffer stored in the Flow object.

mergeFeatureVectors - Takes two attribute fingerprints, merges them, and returns the new attribute fingerprint as output. This function is implemented only by features requiring some specific calculation of probability vectors out of counter vectors. The usual way of calculating the probability vector out of the counter vector is shown in Figure 3.6. The Boolean object variable merge is used to indicate whether the feature implements the mergeFeatureVectors method. If yes, merging of attribute fingerprints during training will be handled by this method; otherwise, the default way of merging based on the standard calculation of probability vectors is used.

SPID4NID implements 30 features in total. The majority of the features were taken from [15][17][27]; some of them may have been slightly modified. Additionally, we implemented 12 features of our own that can be applied in many applications due to their generality, as they use only information obtainable from packets.

Base64Enconding

Base64EncondingFeature simply reads the payload data and, if it detects a character not belonging to the Base64 [34] alphabet, it sets counter_vec[0] to 0 and counter_vec[1] to 1; otherwise, it sets counter_vec[0] to 1 and counter_vec[1] to 0. Base64 encoding is used e.g. in the MIME [35] protocol.

count ← 0
for elem ∈ counter_vec do
    count ← count + elem
end for
for i ∈ seq(probability_vec.length) do
    probability_vec[i] ← counter_vec[i]/count
end for

Figure 3.6: Calculation of probability vector out of counter vector in attribute fingerprint

ByteVariance

ByteVariance computes the statistical variance of all the characters in the packet payloads of the whole flow according to Equation 3.2. The value is then transposed to the probability space ⟨0, 1⟩ by dividing by maxvar (3.3); this value is stored in probability_vec[0], and 1 − var/maxvar is stored in probability_vec[1].

\mathrm{var} = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \qquad (3.2)

\mathrm{maxvar} = \frac{(\max(X) - \min(X))^2}{4} \qquad (3.3)

where X_i corresponds to the ASCII value of the i-th character in the flow, μ denotes the average ASCII value of all characters in the packet payloads of the whole flow, and N is the size of all packet payloads. This feature is closely related to the ByteFrequency feature. Protocols using encryption with no special encoding will have a large variance, whereas traffic that is not encrypted, or is encoded e.g. with Base64, will have a smaller variance.


ControlCharacterFrequency

This feature counts the frequency distribution of control characters only, i.e. non-printing characters [36].

ControlCharacterRatio

This feature is closely related to the ControlCharacterFrequency feature. It calculates the proportion of control characters among all the characters in the payloads. The number of control characters is stored in probability_vec[0] and the number of other characters is stored in probability_vec[1].

DirectionEntropy

The DirectionEntropy feature counts how many packets are sent in each direction. The vectors are of length 2 and the numbers are stored in the same way as in ControlCharacterRatio. In RAT protocols, the number of packets flowing from the server (victim) to the client (attacker) is usually much higher than in the other direction.

Two/Three/Four-byteHash

These 3 features look at pairs/triplets/quadruplets of characters instead of single characters and count their frequencies. These tuples are mapped to a single character value with the use of the XOR operation on all elements of the given tuple. These features are used for looking for certain repeating strings, like HTTP, GET, POST, etc.

PacketLengthPairsReoccurring

This feature looks for consecutive packets with the same length and counts their frequency. This condition is usually met in protocols transferring larger amounts of data, when consecutive packets with sizes hitting the maximum transmission unit (MTU) [37] occur. In RAT protocols, this situation often happens, e.g. when capturing the desktop, recording sound input, etc.


PayloadSizeHashPairs

The concept of this feature is similar to the Two/Three/Four-byteHash features, with the difference that it counts the frequency of hashes of the payload sizes of consecutive packets. The difference from the PacketLengthPairsReoccurring feature is that PacketLengthPairsReoccurring counts only equal sizes of consecutive packets, whereas PayloadSizeHashPairs also counts non-equal sizes of consecutive packets. Hashing is again done by XORing both sizes to produce an index into vectors of length equal to the MTU for Ethernet.
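The contrast with PacketLengthPairsReoccurring can be made explicit in code. The modulo wrap is our own assumption, needed because the XOR of two sizes bounded by the MTU can still exceed the MTU:

```python
def payload_size_hash_pairs(lengths, mtu=1500):
    """Counter vector over XOR hashes of consecutive payload-size pairs.

    Unlike PacketLengthPairsReoccurring, non-equal consecutive sizes
    contribute too; the XOR of the two sizes indexes the counter vector.
    """
    counter_vec = [0] * (mtu + 1)
    for prev, cur in zip(lengths, lengths[1:]):
        counter_vec[(prev ^ cur) % (mtu + 1)] += 1
    return counter_vec
```

Equal consecutive sizes hash to bin 0, while a size change from 100 to 60 hashes to bin 100 XOR 60.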

PayloadSizeChanges

This feature counts the frequency of the difference between the sizes of each pair of consecutive packets. In other words, for every packet it computes the difference between the payload size of the current packet and the payload size of the previous packet, and increments the counter vector bin whose index equals this difference.

PayloadSizes

This feature stores which packet, in order, has what size: for the i-th packet, it stores the size of that packet into counter_vec[i]. With this feature we enrich our knowledge of the positions of packet sizes in the order in which they are transmitted.

3.2 Evaluation

We set up two virtual machines with Windows XP installed to simulate the attacks with the chosen RAT tools and to capture the full communication between attacker and victim. This way we could try a number of possible configurations and inspect their impact on the classification.

In the following, we compare the classification results of our own system, SPID4NID, with the results of SPID and also of SNORT. Since SNORT works in a different way, it is compared and discussed separately.

In the first test (Figure 3.7), we captured three different attacks for all three RATs, but with the same configuration, i.e. the same version of the tool, the same encryption key, etc. The structure of these flows should therefore be the most similar, i.e. the easiest case for classification.

             Accuracy          Recall            Precision
             S4N     SPID      S4N     SPID      S4N     SPID
dark_comet   70%     4.76%     70%     4.76%     100%    100%
poison_ivy   14.28%  14.28%    25%     14.28%    25%     100%
xtreme_rat   40%     10%       40%     10%       10%     100%
OVERALL      55.3%   7.89%     59.09%  7.89%     89.65%  100%

Figure 3.7: Test 1

In Figure 3.7, we can see three common classification metrics calculated for each of the chosen RATs, together with overall statistics calculated over all flows (flow counts are not visible in the table). In the Accuracy and Recall metrics, we can see a major improvement for all three RATs, which is caused by the optimization with GA2DBinaryOptimization. In the Precision metric, SPID always reaches 100% because it produced no false positives, only a small number of true positives and a large number of false negatives. Since RAT tools are very similar in nature, SPID4NID produced some misclassifications, where captured flows belonging to one RAT were classified as belonging to another.

In the second test (Figure 3.8), we wanted to inspect how a change of the security configuration influences the classification. In this test, we used the same training and validation data as in Test 1, and classified the communication of the RAT protocols under a different configuration. This is important to know, because an attacker typically has many configuration options in such tools. Our classifier was again optimized using GA2DBinaryOptimization.

As we expected, we can see a notable overall drop in the numbers. The biggest change was produced by Poison Ivy RAT, which was not detected by either classifier. However, the case of Xtreme RAT is rather surprising, as its numbers even improved over the previous test. The reason might be that during the simulated attack we used the same functions as in the training attacks.

             Accuracy          Recall            Precision
             S4N     SPID      S4N     SPID      S4N     SPID
dark_comet   42.3%   0%        42.3%   0%        100%    0%
poison_ivy   0%      0%        0%      0%        0%      0%
xtreme_rat   50%     25%       50%     25%       86.66%  100%
OVERALL      36.11%  4.16%     38.23%  4.16%     86.66%  100%

Figure 3.8: Test 2

             Accuracy          Recall            Precision
             S4N     SPID      S4N     SPID      S4N     SPID
dark_comet   77.7%   0%        77.7%   0%        100%    0%

Figure 3.9: Test 3

In the third test (Figure 3.9), we wanted to inspect how a change of RAT tool version influences the classification results. RAT tools are developed like any other programs. Developers usually add exploitation functionality and bypassing mechanisms, which may change the structure of the protocols, making yet unknown newer versions of RAT tools harder to detect.

We tested only Dark Comet RAT, due to the difficulty of finding older functioning versions of the RATs. The training set contained data captured from communication of Dark Comet v4.2, and the tested data were captured from communication of Dark Comet v5.3.1.

As we can see in Figure 3.9, SPID4NID succeeded in classifying the majority of flows, as opposed to SPID. Although the tools change, the changes do not influence the overall structure of the protocol enough to make detection much harder.

When comparing with SNORT, we have to change the evaluation methodology. Instead of classifying each of the flows, we consider that a detection happened if at least one alert was triggered by some flow belonging to the protocol of the rule that triggered the alert, since that is how SNORT works. In our classifier model, the equivalent condition is that at least one flow was classified as the given malicious protocol. In the following tests, we apply the rules for each of our chosen RATs given earlier in Figures 2.2, 2.5 and 2.6.
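The SNORT-comparable condition for our flow classifier reduces to a one-liner; the names here are hypothetical, with `flow_labels` standing for the per-flow classification output:

```python
def snort_style_detected(flow_labels, protocol):
    """Protocol-level detection condition for a flow classifier:
    the protocol counts as detected if at least one of its flows
    was classified as that malicious protocol."""
    return any(label == protocol for label in flow_labels)
```

This deliberately ignores how many flows were matched, mirroring SNORT's alert-or-no-alert behaviour.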

             SPID4NID   SPID   SNORT
dark_comet   yes        yes    yes
poison_ivy   yes        yes    yes
xtreme_rat   yes        yes    yes

Figure 3.10: Detection of RATs with default configuration

             SPID4NID   SPID   SNORT
dark_comet   yes        no     no
poison_ivy   no         no     yes
xtreme_rat   yes        yes    no

Figure 3.11: Detection of RATs with modified configuration

Figure 3.10 shows that detecting RATs with the default configuration poses no problem for any of the tested systems.

However, most RATs can easily be configured even by technical newbies. In the next test, we therefore tested RATs that were either configured differently or of a different version than the trained protocols. From Figure 3.11, we can see that SNORT failed to find patterns in the modified RATs of the tested versions in the majority of cases, as did SPID. With the exception of Poison Ivy RAT, SPID4NID was successful also in this test.

Detection is always a trade-off. A good detection system detects most of the malicious protocols, yet it still has to preserve a low false positive rate, otherwise it is useless. We chose a number of application protocols and Windows applications, captured their communication, and classified it with our tested classifiers trained only on our RATs. From Figure 3.12, it can be seen that neither classifier misclassified the regular protocols as malicious. Our SPID4NID system managed to significantly improve the classification rate on our chosen RATs in comparison with SPID, while still keeping the false positive rate on the chosen applications and application protocols very low, in this case zero, just as SPID does.


           Accuracy
           SPID4NID   SPID
iTunes     100%       100%
Origin     100%       100%
Overwolf   100%       100%
SSH        100%       100%
Steam      100%       100%
uTorrent   100%       100%
WinSCP     100%       100%

Figure 3.12: Detection of RATs among regular non-malicious protocols


4 Conclusion

We have proposed a prototype of a hybrid intrusion detection system based on statistical protocol identification. We are not aware of any known, well-functioning system based on the approach we used. The system proved its worth in the number of tests we prepared, where it improved the detection rate of malicious protocols used by remote access trojans in comparison with the other tested types of detection, and it managed to preserve a very low false positive rate.

We designed the classifier to be easily usable and simple for demonstration purposes, yet using advanced kinds of optimization. Our classifier can be easily enriched with knowledge from external sources rather than being trained by the users themselves. This gives an opportunity to create public databases of known malicious protocols in XML format that can be downloaded and easily integrated into the SPID4NID system, similarly to how databases of rules are used in SNORT.

We also introduced a significant number of novel features that proved valuable for some protocols, judging by their selection by the optimization algorithms. We presented a new method of training complex protocols by automatically dividing them into simpler ones. We also suggested an enhanced feature selection method based on a feature selection matrix that improved the classification rate.

In the end, we should note that the system presented here is only a prototype and should by no means be regarded as "real world ready" or error-proof. Our aim was to present an alternative approach to the systems currently used for intrusion detection.

We think that the SPID4NID system still has a long way to go before it is ready for real-world detection. We see its future in self-learning and auto-tuning methods employing machine learning algorithms. SPID4NID still uses a number of fixed parameters, methods and algorithms that could be self-optimized to further improve the detection rate, which, as the tests show, is already fair, so that the system could eventually be deployed in real-world scenarios.


List of Figures

2.1  Example of attribute fingerprint for ByteFrequency feature [16]  12
2.2  SNORT rules for Poison Ivy  17
2.3  Initial communication of Poison Ivy's protocol [20]  18
2.4  Client-side administration console in Dark Comet  19
2.5  SNORT rules for Dark Comet  20
2.6  SNORT rule for Xtreme RAT [40]  21
3.1  SPID4NID class diagram  24
3.2  Example of classifier's state in XML  28
3.3  Example of 2D configuration  30
3.4  Debug run of train function  31
3.5  Fitness function  33
3.6  Calculation of probability vector out of counter vector in attribute fingerprint  36
3.7  Test 1  39
3.8  Test 2  40
3.9  Test 3  40
3.10 Detection of RATs with default configuration  41
3.11 Detection of RATs with modified configuration  41
3.12 Detection of RATs among regular non-malicious protocols  42


References

[1] Feng Xia, Laurence T. Yang, Lizhe Wang, Alexey Vinel. Internet of Things. International Journal of Communication Systems 25, pp. 1101-1102, 2012.

[2] Symantec Internet Security Threat Report, Vol. 20, 2015. Online on https://www4.symantec.com/mktginfo/whitepaper/ISTR/21347932_GA-internet-security-threat-report-volume-20-2015-social_v2.pdf

[3] Di Pietro, Roberto, and Luigi V. Mancini. Intrusion detection systems. Vol. 38. Springer Science & Business Media, 2008.

[4] Hervé Debar, Marc Dacier, Andreas Wespi. Towards a taxonomy of intrusion-detection systems. Computer Networks 31, pages 805-822, 1999.

[5] Stefan Axelsson. Intrusion Detection Systems. Technical Report 99-15, 2000.

[6] Varun Chandola, Arindam Banerjee, Vipin Kumar. Anomaly Detection: A Survey. ACM Computing Surveys, Vol. 41, Issue 3, Article No. 15, 2009.

[7] Theuns Verwoerd, Ray Hunt. Intrusion Detection Techniques and Approaches. Computer Communications, Vol. 25, Issue 15, pp. 1356-1365, 2002.

[8] Tamer AbuHmed, Abedelaziz Mohaisen, DaeHun Nyang. A Survey on Deep Packet Inspection for Intrusion Detection Systems. Magazine of Korea Telecommunication Society, Vol. 24, No. 11, pp. 25-36, 2007.

[9] Jun Zhang, Yang Xiang, Yu Wang, Wanlei Zhou, Yong Xiang, Yong Guan. Network Traffic Classification Using Correlation Information. IEEE Transactions on Parallel and Distributed Systems, Vol. 24, Issue 1, 2012.


[10] T. Karagiannis, A. Broido, N. Brownlee, K. Claffy, M. Faloutsos. Is P2P dying or just hiding? Global Telecommunications Conference, GLOBECOM '04, IEEE Vol. 3, pp. 1532-1538, 2004.

[11] Min Zhang, Wolfgang John, KC Claffy, Nevil Brownlee. State of the art in traffic classification: A research review. PAM Student Workshop, 2009.

[12] Nguyen, Thuy TT, and Grenville Armitage. "A survey of techniques for internet traffic classification using machine learning." Communications Surveys & Tutorials, IEEE 10.4 (2008): 56-76.

[13] Metz, Charles E. "Basic principles of ROC analysis." Seminars in nuclear medicine. Vol. 8. No. 4. WB Saunders, 1978.

[14] Powers, David Martin. "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation." (2011).

[15] E. Hjelmvik. "The SPID algorithm - statistical protocol identification." Technical Report, 2008.

[16] Hjelmvik, Erik, and Wolfgang John. "Statistical protocol identification with SPID: Preliminary results." Swedish National Computer Networking Workshop, 2009.

[17] Köhnen, Christopher, et al. "Enhancements to Statistical Protocol IDentification (SPID) for Self-Organised QoS in LANs." Computer Communications and Networks (ICCCN), 2010 Proceedings of 19th International Conference on. IEEE, 2010.

[18] Polani, Daniel. "Kullback-Leibler divergence." Encyclopedia of Systems Biology. Springer New York, 2013. 1087-1088.

[19] Kullback, Solomon. Information theory and statistics. Courier Corporation, 1968.

[20] Poison Ivy: Assessing Damage and Extracting Intelligence, FireEye Technical Report, 2014. Online on https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-poison-ivy.pdf


[21] Security Intelligence: Defining APT Campaigns, SANS Digital Forensics and Incident Response Blog, 2010. Online on https://digital-forensics.sans.org/blog/2010/06/21/security-intelligence-knowing-enemy/

[22] Security Intelligence: Attacking the Cyber Kill Chain, SANS Digital Forensics and Incident Response Blog, 2009. Online on https://digital-forensics.sans.org/blog/2009/10/14/security-intelligence-attacking-the-kill-chain/

[23] SANS Glossary of security terms

[24] Kienzle, Darrell M., and Matthew C. Elder. "Recent worms: a survey and trends." Proceedings of the 2003 ACM workshop on Rapid malcode. ACM, 2003.

[25] Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms

[26] Srinivas, Mandavilli, and Lalit M. Patnaik. "Adaptive probabilities of crossover and mutation in genetic algorithms." Systems, Man and Cybernetics, IEEE Transactions on 24.4 (1994): 656-667.

[27] Jano

[28] libedit, command line editor library. Online on http://thrysoee.dk/editline/

[29] TinyXML-2, C++ XML parser. Online on http://www.grinninglizard.com/tinyxml2/

[30] Laura Aylward. Malware Analysis - Dark Comet RAT, 2011. Online on http://www.contextis.com/resources/blog/malware-analysis-dark-comet-rat/

[31] Quequero. DarkComet Analysis - Understanding the Trojan used in Syrian Uprising, 2012. Online on resources.infosecinstitute.com/darkcomet-analysis-syria/

[32] LZNT1 Algorithm Details, Microsoft MSDN. Online on https://msdn.microsoft.com/en-us/library/jj665697.aspx


[33] GAlib, genetic algorithms library. Online on http://lancet.mit.edu/ga/

[34] S. Josefsson. The Base16, Base32, and Base64 Data Encodings. RFC 4648, 2006. Online on https://tools.ietf.org/html/rfc4648

[35] N. Freed, N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. RFC 2045, 1996. Online on https://tools.ietf.org/html/rfc2045

[36] Vint Cerf. ASCII format for Network Interchange. RFC 20, 1969. Online on https://tools.ietf.org/html/rfc20

[37] P. Savola. MTU and Fragmentation Issues with In-the-Network Tunneling. RFC 4459, 2006. Online on https://tools.ietf.org/html/rfc4459

[38] Emerging Threats database of SNORT rules. Online on http://rules.emergingthreats.net/open/snort-2.9.0/emerging-all.rules

[39] Nart Villeneuve, James T. Bennett. XtremeRAT: Nuisance or Threat? FireEye Threat Research Blog, 2014. Online on https://www.fireeye.com/blog/threat-research/2014/02/xtremerat-nuisance-or-threat.html

[40] Denbow, Shawn, and Jesse Hertz. "pest control: taming the rats." Matasano Security, www.steptoecyberblog.com/files/2012/11/PEST-CONTROL1.pdf (2012).
