Detection and Mitigation of Botnet Infiltration using Intelligent Swarm Networks
NANYANG TECHNOLOGICAL UNIVERSITY
Detection and Mitigation of Botnet Infiltration using
Intelligent Swarm Networks
Supervisor: A/P Yow Kin Choong
A/P Lee Keok Kee
Examiner: A/P Vinod Achutavarrier Prasad
Student: Lee Wai Seng Jonathan
School of Computer Engineering
2011/2012
NANYANG TECHNOLOGICAL UNIVERSITY
SCE11-0315
Detection and Mitigation of Botnet Infiltration using
Intelligent Swarm Networks
Submitted in Partial Fulfillment of the Requirements
for the Degree of Bachelor of Computer Engineering
of Nanyang Technological University
by
Lee Wai Seng Jonathan
School of Computer Engineering
2011/2012
ABSTRACT
This paper discusses the development of appropriate algorithms to analyze traffic
patterns from internal machines to outside machines. Anomalies such as an
abnormal amount of network traffic, or large differences between request and
response packets, can indicate a botnet command and control attempt. A three-pronged
approach was suggested to neutralize, mitigate and attack the source of attacks,
which was later modified to follow a more assured path. We propose to analyse the
traffic coming from internal nodes to external networks using a threshold and known
flooding signs. Simulation experiments have been carried out with Wireshark, a
network protocol analyzer, and LOIC, a network stress testing tool. Across 6 files and
15 tests, our method achieved an accuracy of 73.33%. Nevertheless, the
accuracy could be estimated more reliably with larger sample sizes.
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my supervisor, Associate Professor
Yow Kin Choong, who has been continuously providing invaluable advice and
direction throughout the process of this project. It would not have been such an
enriching and enjoyable experience without him.
Additionally, I am also appreciative of the efforts of my second supervisor, Associate
Professor Lee Keok Kee, for the assistance and supervision over my report.
Last but not least, I would like to give my thanks and appreciation to Lua Ruiping,
who has given me pointers as well as guidance in the completion of my
experiments.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
1. INTRODUCTION
1.1 Background information
1.2 Objective
1.3 Scope
1.4 Overview
2. REVIEW OF RELATED WORK
2.1 Statistical Anomaly Detection
2.2 Chi-Square Statistic Approach
2.3 Covariance Analysis Method
2.4 Intelligent Fast-Flux Swarm Network
3. PROJECT PLAN AND STRATEGY
3.1 Initial Idea – Three-Pronged Approach (1)
3.1.1 DDoS Mitigation Module
3.1.2 DDoS Neutralizing Module
3.1.3 DDoS Attack Module
3.2 Initial Idea – Three-Pronged Approach (2)
4. IMPLEMENTATION
4.1 Attack Simulation
4.1.1 Experiment Software
4.1.1.1 XAMPP v1.17 VC9
4.1.1.2 No-IP DNS Update Client (DUC) v3.0.4
4.1.1.3 Hive Mind LOIC v1.1.2.1
4.1.1.4 mIRC v7.22
4.1.1.5 Wireshark v1.6.5 (SVN Rev 40429 from /trunk-1.6)
4.1.1.6 Sawmill Enterprise v8.5.5
4.1.1.7 NetWitness Investigator v9.0.5.4
4.1.1.8 WinDump
4.1.2 Experiment Set-Up
4.1.3 Experiment Steps and Details
4.1.4 Experiment Results
4.1.5 Experiment Discussion
4.2 Network Bandwidth Analysis
4.2.1 Analysis Approach and Design
4.2.1.1 Tracing Analysis
4.2.1.2 Statistical Analysis
4.2.1.3 Graphical Analysis
4.3 Main Implementation
4.3.1 Wireshark Filters
4.3.1.1 HTTP flood
4.3.1.2 SYN flood
4.3.1.3 UDP flood
4.3.1.4 ICMP flood
4.3.1.5 Using Wireshark
4.3.2 Outflow.java
4.3.2.1 Outflow.java Input Parameters
4.3.2.2 Outflow.java Methods
4.3.2.3 Outflow.java Output
4.3.3 Testing
4.3.3.1 FileNotFound Exception
4.3.3.2 ArrayIndexOutOfBounds Exception
4.3.3.3 Invalid Arguments
5. RESULTS
6. DISCUSSION
6.1 Observed Results
6.2 Design Limitations
6.2.1 Assumptions
6.2.2 Response time
6.2.3 Logic Limitations
7. CONCLUSION
8. RECOMMENDATIONS
REFERENCES
APPENDIX
1. PROGRAM SOURCE CODE
1.1 Outflow.java
1.2 Calc.java
2. SAMPLE FILES
2.1 Sample filtered log file from Wireshark (.csv)
2.2 Sample results file
LIST OF FIGURES
Figure 1 – XAMPP in operation
Figure 2 – No-IP DUC Client
Figure 3 – Hive Mind LOIC
Figure 4 – mIRC
Figure 5 – Wireshark
Figure 6 – Sawmill Enterprise
Figure 7 – NetWitness Investigator
Figure 8 – WinDump
Figure 9 – Experiment Set-Up
Figure 10 – NetWitness Investigator log of IRC-commanded LOIC
Figure 11 – NetWitness Investigator log of Twitter-commanded LOIC
Figure 12 – NetWitness Investigator demo collection showing ‘Alerts’
Figure 13 – Wireshark > Statistics > IO Graphs
Figure 14 – Wireshark IO Graphs
Figure 15 – Outflow.java in use
Figure 16 – Outflow.java Output
Figure 17 – FileNotFound Exception
Figure 18 – ArrayIndexOutOfBounds Exception
Figure 19 – Invalid Arguments
1. INTRODUCTION
1.1 Background information
In today’s digital age, technology advances at such a rapid rate that new
Information Technology (IT) jargon has to be coined every few years. Terms such
as “botnets”, “zombie computers” and “Denial of Service (DoS) attacks” have
emerged, each of them striking fear into end-users.
Zombie computers are computers that are remotely controlled by a Command and
Control (C&C) server, mainly through IRC ports. Together, they form a botnet, or a
network of ‘bots’. Users of these machines may not even realize that they are part of
one, as all that they will notice is a slight lag in accessing the Internet.
As organized crime took to the digital realm in the last decade, its activities were
mainly attributed to money-making through phishing and blackmail. Botnets
were used in this sense to send spam emails and launch distributed DoS (DDoS)
attacks in an attempt to bring down the targeted organisation’s website. At present,
threats to the cyber world continue to rise as their focus shifts with the times.
The Motion Picture Association of America (MPAA) was brought down by Anonymous
for its anti-piracy stance[1]. Similarly, websites of Mastercard, Visa, Paypal, the CIA,
the Vatican, security firms and many other high-profile targets were taken offline
by the groups Anonymous and LulzSec, either in revenge or to highlight
and demonstrate security flaws in insecure systems[2][3][4].
DDoS attacks are simply DoS attacks on a larger scale, usually from thousands of
zombie computers trying to open connections with the targeted website, flooding it
with requests and causing servers or websites to lose online availability and
serviceability. Low Orbit Ion Cannon (LOIC), an open source network stress testing
tool, was tweaked and used by members of the groups Anonymous and LulzSec in
their recent acts[5].
Companies which require their websites to be highly available would lose a
considerable amount of revenue due to disruptions in their online services. Most of
them would even resort to spending 75% more for extra bandwidth. However,
bandwidth overprovisioning is neither the cheapest nor the most productive solution
to the problem, as it only guards against the worst possible outcome. Recent
statistics indicate that DDoS attacks nowadays range from 1 million
packets per second (Mpps) to almost 5 Mpps. An attack of such magnitude would
easily cripple networks, even those that are well prepared[6].
These trends clearly signify a call for a new and efficient method to respond to such
attacks.
1.2 Objective
This project aims to develop appropriate algorithms to analyze traffic patterns from
internal machines to outside machines. Anomalies in connection duration,
time of day, or the type of information uploaded or downloaded can indicate a botnet
command and control attempt. However, the computing demands of analyzing this
massive amount of data on a single machine make the task infeasible, and hence we
look for a distributed approach to overcome this, namely an intelligent swarm
network.
1.3 Scope
In this project, we explored an appropriate method and algorithm, in conjunction
with the intelligent swarm network infrastructure, to alleviate attacks
as well as to detect unusual network traffic activity with a reasonable measure of
certainty. The method identifies and classifies IP addresses that could be the source
of a probable DDoS attack. This is further assisted by Wireshark, a network
protocol analyser that captures packets and traffic on a network.
A simulated DDoS attack was carried out, and the attack data,
alongside the usual traffic data, was captured for analysis. The captured data was
compared against multiple factors from its input/output (IO) results.
1.4 Overview
The report is divided into four main segments.
The introduction is presented in Chapter 1.
Chapter 2 reviews related works and articles.
Chapters 3, 4 and 5 cover the project strategy, the implementation and the
results.
Chapters 6, 7 and 8 discuss the results of the implementation, the
conclusion and possible improvements that could be
applied in future.
2. REVIEW OF RELATED WORK
Chapter 2 describes a few useful articles that are related to the study and the
contributions of this report.
2.1 Statistical Anomaly Detection
Roland used a statistical anomaly approach in the detection of DDoS attacks [7]. As
anomalies in network traffic can usually be attributed to attacks, statistical anomaly
detection exploits this notion by calculating the deviation of parameters between
current traffic and normal traffic. Header field values from the packets are fitted
to a multinomial distribution. The empirical cumulative distribution function (ECDF)
of the oscillations is calculated around the expected mean as well as for the last N
oscillation values. A two-sample goodness-of-fit test is then applied, and the
difference in area under the two ECDFs is compared to determine whether an anomaly
has occurred.
This detection method is based on the assumption that all anomalies are
attacks on the network. Furthermore, the nominal traffic is also assumed to include
all possible cases of normal traffic, such as the downloading or uploading of files as
well as the spectrum of TCP or UDP ports used. If the second assumption does not
hold, this detection method would produce multiple false positives.
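To make the idea of comparing two ECDFs by area more concrete, the following is a
minimal sketch in Java. It is not Roland's implementation: the class and method names
are hypothetical, the samples are plain arrays of numbers rather than packet header
oscillations, and the decision threshold on the resulting area is left out because the
source does not state one.

```java
import java.util.Arrays;

/** Illustrative sketch only: area between the ECDFs of two samples. */
final class EcdfDistance {

    /** Fraction of values in the sorted sample that are <= x. */
    static double ecdf(double[] sorted, double x) {
        int count = 0;
        while (count < sorted.length && sorted[count] <= x) {
            count++;
        }
        return (double) count / sorted.length;
    }

    /** Integrates |F_a(x) - F_b(x)| over the pooled range of both samples. */
    static double areaBetween(double[] a, double[] b) {
        double[] sa = a.clone();
        double[] sb = b.clone();
        Arrays.sort(sa);
        Arrays.sort(sb);

        // Every step of either ECDF happens at one of the pooled sample points.
        double[] pooled = new double[sa.length + sb.length];
        System.arraycopy(sa, 0, pooled, 0, sa.length);
        System.arraycopy(sb, 0, pooled, sa.length, sb.length);
        Arrays.sort(pooled);

        double area = 0.0;
        for (int i = 0; i + 1 < pooled.length; i++) {
            double width = pooled[i + 1] - pooled[i];
            // Both ECDFs are constant on [pooled[i], pooled[i + 1]).
            area += Math.abs(ecdf(sa, pooled[i]) - ecdf(sb, pooled[i])) * width;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] reference = {1, 2, 3, 4}; // hypothetical "normal" window
        double[] recent = {3, 4, 5, 6};    // hypothetical "last N values" window
        System.out.println(areaBetween(reference, recent));
    }
}
```

An area near zero means the two windows are distributed alike; a larger area suggests
that the recent window has drifted away from the reference window.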
2.2 Chi-Square Statistic Approach
AIDS by Leu and Lin is a distributed security system that uses distributed computing in
the form of mobile agents to delegate analysis tasks to multiple components,
greatly reducing the computation workload of the detection task [8]. Once an
attack is detected, the attack source information is saved into a
database from which a firewall can be updated automatically in real time to forcibly
drop current connections and prevent further communication between the system and
the attacker.
AIDS improves on the chi-square formula used by Feinstein et al. in DoS
detection[9]. Following the original chi-square formula, the authors created two
groups of data comprising the packet frequencies of connections
recorded during the current and previous time periods respectively. The connections in
each group are further segmented into six ranges of varying packet frequencies. The
difference between the two groups is calculated with a chi-square statistic
and used to detect the presence of a DoS or DDoS attack based on the
magnitude of the difference. Leu and Lin further improved the
approach by creating mechanisms to capture the source of the
attack and by increasing the number of ranges used to classify the
various connections. AIDS establishes a baseline profile by collecting normal
packets from subunits of a geographically concentrated unit before it starts its
detection algorithm. Source information such as the IP address, port used, packet
frequencies per day over a range of 7 days and the average per-10-seconds
frequency is recorded in a database for analysis, updated every
10 seconds, and organised into the different ranges, ranked by the
magnitude of packet frequency sent.
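The comparison of two time windows described above can be sketched as follows. This
is only an illustration using the standard Pearson chi-square form, not the exact
formula of Feinstein et al. or of AIDS, which is not reproduced in this report. The
class name, the range boundaries and the handling of empty ranges are all assumptions.

```java
/** Illustrative sketch only: Pearson chi-square between two time windows. */
final class ChiSquareWindows {

    // Hypothetical upper bounds (packets per 10 s) for the first five ranges;
    // anything above the last bound falls into the sixth range.
    static final int[] UPPER_BOUNDS = {10, 50, 100, 500, 1000};

    /** Counts how many connections fall into each of the six ranges. */
    static int[] histogram(int[] packetsPerConnection) {
        int[] bins = new int[UPPER_BOUNDS.length + 1];
        for (int packets : packetsPerConnection) {
            int bin = 0;
            while (bin < UPPER_BOUNDS.length && packets > UPPER_BOUNDS[bin]) {
                bin++;
            }
            bins[bin]++;
        }
        return bins;
    }

    /** Sum over ranges of (current - previous)^2 / previous. */
    static double chiSquare(int[] previous, int[] current) {
        double sum = 0.0;
        for (int i = 0; i < previous.length; i++) {
            if (previous[i] == 0 && current[i] == 0) {
                continue; // empty in both windows: contributes nothing
            }
            double diff = current[i] - previous[i];
            // Assumption: an empty previous range is treated as 1 to avoid
            // dividing by zero.
            sum += diff * diff / Math.max(previous[i], 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] previous = histogram(new int[] {5, 8, 20, 30, 60, 200});
        int[] current = histogram(new int[] {5, 8, 20, 30, 2000, 2000});
        System.out.println(chiSquare(previous, current));
    }
}
```

A statistic of zero means the two windows have identical range counts; the value grows
as connections move between ranges, for example when two connections jump into the
highest range as in the `main` method.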
However, design and logic flaws exist within such a system. Firstly, the
detection assumes that the first contact of every new connection would have
frequencies similar to that of normal traffic. The captured source information would
then be recorded and the connection ranked. Consider a new IP source that
starts a DoS attack right from its first connection and
keeps its attack pattern roughly constant. All the system would
do is log the information and later rank it at the top of the list, since the
frequency of packets sent would be much greater than usual traffic. Detection of a
DoS attack would not occur, since the chi-square value of the packet frequency in the
first 10 seconds would not differ much from that of the next 10 seconds. Detection
would occur only once the attacker has decreased or stopped the slew of
attacks yet remains connected to the system, as the difference between the
chi-square values would by then be significantly greater. If there were more than
one such connection, the system could easily be overloaded and brought down.
Secondly, this system only provides a preventive approach to the DDoS
situation. In the case of an actual compromise, it does not provide a cure, relief or
any other failsafe method to alleviate the situation.
2.3 Covariance Analysis Method
Jin and Yeung proposed the use of multivariate correlation analysis and covariance
matrices to detect DDoS attacks by their characteristic SYN flooding[10]. In such
attacks, the numbers of SYN and FIN flags in the control field of the TCP header
do not match. Changes in correlation between each pair of flags in the TCP header
control field allow the incidence of an anomaly to be detected. This
method gives a clear differentiation between normal and attack traffic;
however, the selection of an appropriate time interval for observation is debatable.
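As a simplified illustration of this idea, the sketch below computes the sample
covariance and Pearson correlation between per-interval SYN and FIN counts. It covers
only one pair of flags rather than the full covariance matrix used by Jin and Yeung,
and the class name, the interval counts and the interpretation of the result are
assumptions made for the example.

```java
/** Illustrative sketch only: covariance and correlation of SYN and FIN counts. */
final class FlagCorrelation {

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample covariance of two equally long series (needs at least 2 points). */
    static double covariance(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - mx) * (y[i] - my);
        }
        return sum / (x.length - 1);
    }

    /** Pearson correlation; evaluates to NaN if either series is constant. */
    static double correlation(double[] x, double[] y) {
        return covariance(x, y) / Math.sqrt(covariance(x, x) * covariance(y, y));
    }

    public static void main(String[] args) {
        // Hypothetical counts per observation interval.
        double[] synNormal = {10, 20, 30, 40};
        double[] finNormal = {9, 19, 29, 39};
        double[] synFlood = {10, 20, 300, 400};
        double[] finFlood = {9, 19, 5, 4};
        System.out.println("normal: " + correlation(synNormal, finNormal));
        System.out.println("flood:  " + correlation(synFlood, finFlood));
    }
}
```

In the normal series the SYN and FIN counts rise and fall together, so their
correlation is close to 1. In the flood series the SYN count climbs while the FIN count
does not, and the correlation drops, which is the kind of change this method looks for.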
2.4 Intelligent Fast-Flux Swarm Network
The intelligent swarm network as proposed by Lua and Yow has the ability to
self-organize in order to react to different situations in the case of an attack[11]. The
swarm network is essentially similar to a botnet, where
multiple bots, or in this case, nodes, are connected to form a network infrastructure
which organizes and transmits messages between clients and servers. Swarm
networks utilize the benefits of distributed computing in order to mitigate the volume
of requests in the event of an attack. As with a zombie computer, the spare
bandwidth of the swarm nodes would be employed for such a purpose.
Swarm networks use the Intelligent Water-Drop algorithm for a large-scale parallel
search to delegate the networking demands. This algorithm searches for optimal
relay routes similar to how water flowing naturally would follow a path of least
resistance. The fast-fluxing hosting technique allows for the rapid shuffling of IP
addresses under a domain host which connects the swarm nodes, servers and clients.
Combining the Intelligent Water-Drop algorithm with fast-fluxing yields a
robust and optimized system that can withstand an overload of connections.
Furthermore, this implementation would be easy to apply, as no alteration to the
clients and servers is needed; IP addresses and domains can simply be registered
together with the swarm network. On the other hand, there is only so much
that the network can withstand. If the number of connections or malicious nodes
were far larger than the number of swarm nodes, the network could still be
overwhelmed.
3. PROJECT PLAN AND STRATEGY
Chapter 3 discusses various methods that give rise to a suitable and effective model
of DDoS protection that consumers can adopt. It expounds more on the purpose of
this report by describing the rationale behind it. It also elucidates the initial ideas as
well as the research conducted preceding the implementation, in order to arrive at an
appropriate recommendation.
3.1 Initial Idea – Three-Pronged Approach (1)
An effective and comprehensive solution to counter the threat of DDoS
attacks would be to adopt a three-pronged approach of mitigating, neutralizing and
attacking the source of packet floods on servers and websites.
The mitigation section alleviates any excessive traffic that cannot be disengaged
or identified.
The neutralizing section is the crucial and primary portion of
this model, as it is where attacks are stopped in their tracks. Great emphasis has been
placed on this area, as is evident from the many detection methods and preventive
measures researched and recommended by the industry.
The last section is the attacking section, which is rarely included
in this kind of model or in other proposals. However, we are bold enough to
incorporate this function as it gives a chance to shut down the source of the attack.
These three focus areas are further explained in the sections below.
3.1.1 DDoS Mitigation Module
To protect against an attack, the first resource needed is
bandwidth, as it acts as a failsafe. As mentioned in section 1.1 of
this report, however, bandwidth overprovisioning is neither the most efficient
nor the most cost-effective solution.
Looking at Anonymous’ attack on the Sony PlayStation Network in April 2011[12], the
DDoS attack was merely a diversion for other sophisticated attacks taking place at
the same time. Learning from such incidents, a method to absorb or cushion
these impacts would be a much more desirable trait to implement in our model.
Therefore, we propose the adoption of Lua and Yow’s Intelligent Swarm Network as
the preferred network infrastructure for absorbing the
impacts of DDoS attacks.
3.1.2 DDoS Neutralizing Module
Based on this design, we also propose the use of a ban list or firewall. Once an attack
is detected, the source IP addresses that are found to be DDoS-ing will be added to
a database which is also fed into the firewall itself. This prevents further
threats from the confirmed and suspected source IP addresses. However, we
acknowledge that all detection methods, be they statistical or correlation-based,
carry a slim chance of reporting false positives. Thus, a CAPTCHA would be
implemented on an authentication page whereby users who have been falsely reported
as sending malicious traffic could regain their connections and remove the bans on
themselves. They would be banned again only if they infringed the detection conditions
once more, which should be unlikely for human users.
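The ban list with a CAPTCHA-based appeal can be pictured with the small sketch below.
It is an in-memory illustration only: the class and method names are hypothetical, the
CAPTCHA check itself is reduced to a boolean supplied by the caller, and no real
firewall or database is involved.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only: an in-memory ban list with a CAPTCHA appeal. */
final class BanList {

    private final Set<String> banned = new HashSet<>();

    /** Called when the detection step flags a source IP address. */
    void ban(String sourceIp) {
        banned.add(sourceIp);
    }

    boolean isBanned(String sourceIp) {
        return banned.contains(sourceIp);
    }

    /**
     * Lifts the ban only if the (hypothetical) CAPTCHA check was passed.
     * Returns true if a ban was actually removed.
     */
    boolean appeal(String sourceIp, boolean captchaPassed) {
        return captchaPassed && banned.remove(sourceIp);
    }

    public static void main(String[] args) {
        BanList list = new BanList();
        list.ban("203.0.113.7"); // documentation-range address, not a real host
        System.out.println(list.appeal("203.0.113.7", false)); // ban stays
        System.out.println(list.appeal("203.0.113.7", true));  // ban lifted
    }
}
```

A source that passes the CAPTCHA is removed from the list, but nothing stops the
detection step from calling `ban` again if the same address reoffends.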
Using the firewall, rules can be added to block specific DDoS tools, such as
LOIC, which was used by the Anonymous group in the recent spate of political
protests. The binary version of LOIC, initially created as a network stress
testing tool by Praetox Technologies, allows the selection of HTTP, UDP
or TCP SYN floods as the attack type, as well as the target port number, package
message, number of threads and the request timeout[13]. A web-based JavaScript version
of LOIC released in 2010 only allowed for HTTP attacks. Notable signatures of
LOIC have been analysed by Montoro, and LOIC traffic can easily be detected when these
behavioural patterns are included in the firewall rules[14].
DDoS attacks have many different variations such as UDP floods, TCP SYN floods,
ICMP floods and HTTP floods. According to the latest statistics, HTTP floods remain
the most-used form of DDoS attack, with UDP floods a distant
second[15]. Even though the Swarm layer would be
effective enough to counter such floods, we should endeavour
to add as many layers of protection as possible for the best defence. One method
that we can deploy is the Covariance Analysis method proposed by Jin
and Yeung, specifically to tackle TCP SYN floods. Similarly, an Anomaly
Detection method would help in identifying threats based on the different protocols.
Using the Statistical Analysis method, a high traffic rate or atypical number of
packets from bots can be identified and filtered. Using the Anomaly Recognition
method, auto-learning of nominal baselines for protocol and source network traffic
can help in the identification and filtering of such malicious activities.
3.1.3 DDoS Attack Module
One special characteristic of this model that is rarely seen is the decision to
include a means of counterattack even in the midst of an attack.
In the early stages of planning, the initial idea was to set aside a small group of
selected swarm nodes to ‘flank’ and counter-DDoS the C&C server such that the
botnet’s instructions for attack to its bots would cease, thereby bringing the attack
to an immediate end. Although this is an effective ‘fight fire with fire’ approach,
several reasons led us to consider a different strategy.
Firstly, an algorithm or set of criteria would be required for selecting swarm
nodes in the swarm network. Clusters have to be formed, and this could be done using
nodes from a common geographical location, a hierarchical cluster or
simply nodes that are handling low bandwidth demands. This would require a
complicated set of rules to account for the many different situations that would arise.
Furthermore, this cluster would have to be temporary and also automated. Focusing
too much on this technical scope would overlook the other concerns underlying the
purpose of forming such a cluster, which are presented in the points that follow.
Secondly, given that the botnet utilizes IP spoofing together with fast-flux
technology, it would be possible but tough to determine the actual
IP address of the C&C server. With fast-fluxing, the IP
addresses would change continuously within a broad range of other
obscure IP addresses. Tracing the actual cyber-location of the server would have to be
done manually and the effort needed would be tedious. By the time the analysis is
complete, the attacks might have already ceased, which would defeat the purpose of the
counter-attacking module. In addition, insisting on bombarding all those IP addresses
would eventually affect innocent individuals whose workstations have been unknowingly
transformed into zombie computers. Although this adopts the ‘sacrifice some for the
greater good’ methodology, these individuals might be genuine users of the
online service provided by the organisation, and harming them could damage their
brand loyalty.
Thirdly, retaliation in a like manner could provoke a greater counteroffensive by the
botnet. Given that hacktivists such as Anonymous are a group of individuals who push
political ideas through cyber crimes, it can be envisaged that a great form of
vengeance would follow the initial assault.
For the above reasons, we conclude that a counter-offensive in like manner is
an undesirable resolution in this approach, and we seek a
simple yet quick way of deducing the source of the botnet attacks that
reduces any possible negative implications. Therefore, we propose an alternative
idea for the attacking phase. In section 3.2, we begin to explore the analysis of
internal swarm nodes which might have a part to play in the botnet attacks.
3.2 Initial Idea – Three-Pronged Approach (2)
Still conforming to the three-pronged strategy as mentioned in section 3.1, the
second strategy proposed a safer method that could be implemented in the attacking
module of the said approach.
A technique could be used to detect the range of fast-fluxed IP addresses used by the
botnet through the analysis of communication between the internal swarm
nodes and the external botnet server. This implies that we have to assume a node in
the swarm network is compromised and that it is itself sending a flood of packets
to its own or other networks.
Therefore, we propose to search for anomalous traffic between a node and external
networks that could involve the node communicating with the server. From this, we
attempt to produce a list of possible IP addresses of the botnet server, together with
an algorithm to classify the probability of the suspected IP being a C&C server. In
turn, the results will be ranked, with this list being sent to a relevant authority to shut
down the source of the attack, namely, the C&C server. This would further reduce any
friction between the botnet and the organisation using this model.
Even though this approach may not achieve the effect we initially had in
mind, i.e. quickly impeding the botnet from sending an overflow of packets, we
believe that it will be beneficial in the long run.
An experiment was conducted to explore the feasibility of the solution. Details of the
simulation will be further explained in Chapter 4.1.
4. IMPLEMENTATION
This chapter describes the process as well as the challenges faced in searching for a
suitable attack approach for the three-pronged model proposed in section 3.2.
4.1 Attack Simulation
In this experiment, we simulate an actual botnet attack on an online server while
gathering information and data on the modes of attack as well as on communication
with the botnet server.
4.1.1 Experiment Software
This section lists the different software that were beneficial to this experiment.
4.1.1.1 XAMPP v1.17 VC9
Figure 1 – XAMPP in operation
XAMPP is a fuss-free Apache distribution containing MySQL, PHP and Perl which
easily facilitates the setting up of an online server.
4.1.1.2 No-IP DNS Update Client (DUC) v3.0.4
Figure 2 – No-IP DUC Client
No-IP is a DNS service provider which offers Dynamic DNS as well as managed
DNS. The Dynamic Update Client allows users to keep their server's names updated
in DNS. This is used in conjunction with XAMPP in order to create an online server
that is accessible through the internet.
4.1.1.3 Hive Mind LOIC v1.1.2.1
Figure 3 – Hive Mind LOIC
LOIC was created by Praetox Technologies for the basic purpose of network stress
testing. It delivers a Denial of Service attack by flooding a target using various network
protocols such as HTTP, TCP and UDP. Hive Mind LOIC, a variation of the original
LOIC, was adapted for centralized control by ‘NewEraCracker’ before the project
was later taken over by ‘Urijah’. The first version included remote control through
Internet Relay Chat (IRC), in which a particular IRC channel could issue commands
through messages. The later revision added RSS control, which allowed commands to be
issued through more modern and easily accessible social media such as Twitter.
4.1.1.4 mIRC v7.22
Figure 4 - mIRC
mIRC is a full-featured Internet Relay Chat client that can be used to communicate
with others on IRC networks around the world, either in group or private discussions.
4.1.1.5 Wireshark v1.6.5 (SVN Rev 40429 from /trunk-1.6)
Figure 5 - Wireshark
Wireshark is a powerful network protocol analyser which allows for the capture of
packet information in network traffic. It screens all packets on a designated interface
such as Ethernet or Wi-Fi. The main feature of Wireshark is the ability to
inspect the raw information of individual packets in a collection of captured data.
4.1.1.6 Sawmill Enterprise v8.5.5
Figure 6 – Sawmill Enterprise
Sawmill is a web-based hierarchical log analysis tool that is able to process almost
any type of log data. It displays reports of statistics and graphs through records saved
into its database. It also allows dynamic segmentation of the reports through its
advanced filtering capabilities.
4.1.1.7 NetWitness Investigator v9.0.5.4
Figure 7 – NetWitness Investigator
NetWitness Investigator is a threat analysis tool that provides an accurate analysis of
raw captured network data. It has the ability to parse packets by using a lexicon of
words, verbs and adjectives.
4.1.1.8 WinDump
Figure 8 - WinDump
WinDump is the Windows version of tcpdump, a command line network analyzer for
UNIX. It captures information using the WinPcap library and drivers. WinDump can
also inspect and diagnose network traffic according to various complex rules.
4.1.2 Experiment Set-Up
The figure below shows the configuration of the various hardware components used.
Figure 9 – Experiment Set-Up
A few workstations would be conducting an attack through Hive Mind LOIC on an
online server. Wireshark will be installed on all workstations as well as on the server
to capture the flow of network traffic to and from the workstations. These logs will
then be parsed using NetWitness, and Sawmill through WinDump, to see if there are
any suspicious indicators while LOIC relies on the IRC or Twitter RSS feeds for
commands.
4.1.3 Experiment Steps and Details
This section explains the detailed steps as to how the experiment was carried out.
1. XAMPP is set up and both Apache and MySQL are running.
2. Static IP address is created.
3. No-IP account created and DNS Update Client is started to ensure that
the server is accessible online.
4. Wireshark is installed on all workstations to start capturing data.
5. An mIRC channel is set up and the following sample command is
used to start the attack.
!lazor targetip=127.0.0.1 message=test_test port=80 method=tcp wait=false random=true start
6. Similarly, we use the same command through Twitter’s RSS feeds.
7. Once a sufficient amount of data is logged, we stop the capturing and
pass it through Sawmill and NetWitness Investigator to detect whether
there are any anomalies.
4.1.4 Experiment Results
No obvious alerts are shown when the logs are parsed using Sawmill and
NetWitness Investigator. Screenshots of the results are displayed below.
Figures 10 and 11 show the logs that are parsed through NetWitness Investigator for
the IRC and Twitter-based DDoS attacks.
Figure 10 – NetWitness Investigator log of IRC-commanded LOIC
Figure 11 - NetWitness Investigator log of Twitter-commanded LOIC
Figure 12 below depicts the alerts shown by NetWitness Investigator in its ‘Demo
Collection’ highlighted in red.
Figure 12 - NetWitness Investigator demo collection showing ‘Alerts’
Screenshots from Sawmill and Wireshark are not posted as the results do not differ
much in the statistics of the reports generated from logs of normal traffic.
4.1.5 Experiment Discussion
The results from section 4.1.4 suggest that not all communication
between the C&C server and the actual bot carrying out the DDoS attacks can be easily
sieved out. Currently, LOIC’s method of extracting understandable instructions can
easily be replicated in any other application such as a basic RSS reader.
Furthermore, the underlying medium connecting these zombie computers may not
simply be a network stress testing tool that was voluntarily engaged by
the user; malware may infect unknowing users and turn their machines into
slaves. Directives for different malware will differ greatly, and we cannot
expect to find common patterns in random ports or in the packet contents
associated with this connection.
There are many other solutions that are currently available in the industry that might
help in seeking out these IP addresses. Network tracing is one good manual
resolution. However, we do not seek to reinvent the wheel, but rather derive
alternative solutions to a similar problem from a dissimilar angle.
Therefore, we propose to analyse the traffic from an internal swarm node to an
external network – not the communication as before, but rather the quantity of data
sent.
4.2 Network Bandwidth Analysis
In this approach, we no longer tackle this subject through the offensive perspective
of counterattacking and the like. Instead, we seek out solutions to prevent bandwidth
drain in both internal and external networks in these two possible scenarios.
Firstly, the internal node is compromised and it spams the internal network with
requests. This might be the more precarious situation as the malware might be spread
through the swarm network or directly through flash drives by unsuspecting
consumers of a node’s services within the same geographical location.
Secondly, the internal node is again compromised, but it
floods external networks instead. As the internal swarm layer utilizes
fast-flux technology to mitigate traffic, external networks that detect an attack from
our internal node might try to ban its IP address, which might lead to a ban of
multiple IPs, assuming the fast-flux works for outgoing traffic as well.
In an actual situation, we propose the analysis of network logs captured by the
network administrator or Internet service provider to compare against a log of
nominal traffic flow with a reasonable threshold of 250% to determine if the
bandwidth exceeds the expected normal flow. However, we simulate this with the
same set up and logs as in section 4.1.2.
4.2.1 Analysis Approach and Design
We use Java as our platform because it is portable and powerful, and has a
well-designed set of APIs coupled with sophisticated just-in-time compilers.
Three different methods were considered for detecting
the presence of anomalously large traffic.
4.2.1.1 Tracing Analysis
The first method considered was to parse the log files using a different algorithm
and to trace suspicious packet streams to verify the intent and the cause of heavy
traffic when it occurs. However, as stated previously in section 4.1.5,
tracing is largely manual in nature since it would be tough to program for
all the significantly different scenarios. Therefore we consider the second
method.
4.2.1.2 Statistical Analysis
In terms of statistics, the closest fit would be the Chi-Square distribution
with its Goodness-of-Fit test. Initially, the Chi-Square test was implemented in the
program: the frequencies of packets at regular intervals were averaged and
classified into different ranges according to the values in the nominal traffic. This
was to assist in deriving the observed and expected values. However, structural
zeros, i.e. cells in the contingency table in which observations can never occur, are
present in the table due to the way it was constructed, so the test could not be
applied reliably. Thus, we come to our final choice.
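To illustrate why cells with an expected count of zero are a problem, the following minimal Java sketch computes the Pearson goodness-of-fit statistic, the sum of (O - E)^2 / E over all cells. It is not taken from Outflow.java or Calc.java; the class name ChiSquareSketch and all counts are hypothetical. Whenever an expected count is zero, the corresponding term divides by zero, so the sketch reports the statistic as undefined.

```java
import java.util.OptionalDouble;

public class ChiSquareSketch {

    /**
     * Pearson goodness-of-fit statistic: sum over cells of (O - E)^2 / E.
     * Returns an empty result when any expected count is zero, because the
     * term for that cell would divide by zero and the statistic is undefined.
     */
    static OptionalDouble chiSquare(double[] observed, double[] expected) {
        if (observed.length != expected.length) {
            throw new IllegalArgumentException("observed and expected differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] == 0.0) {
                return OptionalDouble.empty();
            }
            double diff = observed[i] - expected[i];
            sum += diff * diff / expected[i];
        }
        return OptionalDouble.of(sum);
    }

    public static void main(String[] args) {
        // Hypothetical counts of 10-second periods per frequency range.
        double[] observed = {35, 30, 25, 10};
        double[] expectedNonZero = {40, 30, 20, 10};
        // Nominal traffic that never reaches the higher ranges yields zero cells.
        double[] expectedWithZeros = {100, 0, 0, 0};

        System.out.println("Non-zero expected counts: " + chiSquare(observed, expectedNonZero));
        System.out.println("Expected counts with zeros: " + chiSquare(observed, expectedWithZeros));
    }
}
```

In this sketch the second call has no defined result, which mirrors the situation described above where ranges derived from the nominal traffic leave some cells empty.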
4.2.1.3 Graphical Analysis
The initial idea was to rely on the use of weights for different ranges of frequencies.
However, given a large data file of 10000 seconds and an attack lasting 3 minutes, the
attack accounts for only 1.8% of the capture, which makes this approach rather unfeasible.
The next idea was to find the difference in total area under the current and nominal
data graphs with a threshold difference from the nominal graph. The areas will be
normalized such that both graphs are set to the same reference points before
comparison.
First, all packets from both the nominal and current data files are averaged over
10-second periods.
Secondly, the frequencies will be added up before the total value from the current
data file is normalized by dividing by the total number of such averaged periods,
multiplied by the number of periods in the nominal data file.
Lastly, the current data file’s new value will be checked against the nominal data
file’s value multiplied by the threshold of 250%.
If it is shown to have exceeded, an alert will be printed out to the screen.
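The three steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code of Calc.java: the class and method names are hypothetical, the input is assumed to be one packet count per second, and a trailing partial period is assumed to be dropped.

```java
public class AreaComparisonSketch {

    static final int PERIOD_SECONDS = 10;
    static final double THRESHOLD = 2.5; // the 250% threshold from the text

    /** Averages per-second packet counts over consecutive 10-second periods. */
    static double[] periodAverages(int[] perSecond) {
        // Assumption: seconds that do not fill a whole period are ignored.
        int periods = perSecond.length / PERIOD_SECONDS;
        double[] averages = new double[periods];
        for (int p = 0; p < periods; p++) {
            int sum = 0;
            for (int s = 0; s < PERIOD_SECONDS; s++) {
                sum += perSecond[p * PERIOD_SECONDS + s];
            }
            averages[p] = (double) sum / PERIOD_SECONDS;
        }
        return averages;
    }

    static double total(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    /** True when the normalized current total exceeds the nominal total times the threshold. */
    static boolean exceedsThreshold(int[] nominalPerSecond, int[] currentPerSecond) {
        double[] nominal = periodAverages(nominalPerSecond);
        double[] current = periodAverages(currentPerSecond);
        if (nominal.length == 0 || current.length == 0) {
            throw new IllegalArgumentException("each capture needs at least one full period");
        }
        double nominalTotal = total(nominal);
        // Normalize: divide by the number of current periods, multiply by the nominal count.
        double currentTotal = total(current) / current.length * nominal.length;
        return currentTotal > nominalTotal * THRESHOLD;
    }

    /** Helper that builds a capture with the same packet count every second. */
    static int[] constant(int seconds, int packetsPerSecond) {
        int[] data = new int[seconds];
        java.util.Arrays.fill(data, packetsPerSecond);
        return data;
    }

    public static void main(String[] args) {
        int[] nominal = constant(40, 10);   // hypothetical nominal capture
        int[] heavy = constant(20, 30);     // three times the nominal rate
        if (exceedsThreshold(nominal, heavy)) {
            System.out.println("ALERT: traffic exceeds 250% of nominal");
        }
    }
}
```

Normalizing by the number of periods lets captures of different lengths be compared against the same nominal file.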
Other than the difference of frequencies, the program will also display alternative
calculation results. Below shows two of such results.
Range of values  <100%   100%-150%  150%-200%  200%-250%  250%-500%  500%-1000%  >1000%
Nominal          4536.0  0.0        0.0        0.0        0.0        0.0         0.0
Expected         3132.0  54.0       81.0       108.0      486.0      432.0       243.0
Table 1 – Example Output: Range of Values
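A classification of this kind could be sketched in Java as below. The source does not state which reference value the percentages are taken against, so the sketch simply accepts it as a parameter; the class name, the inclusive upper bounds and the sample values are all assumptions made for illustration.

```java
public class RangeBucketsSketch {

    // Upper bounds of the first six ranges as fractions of the reference value.
    // The seventh range (>1000%) is open-ended.
    static final double[] UPPER_BOUNDS = {1.0, 1.5, 2.0, 2.5, 5.0, 10.0};
    static final String[] LABELS = {
        "<100%", "100%-150%", "150%-200%", "200%-250%",
        "250%-500%", "500%-1000%", ">1000%"
    };

    /**
     * Counts how many averaged values fall into each range.
     * Assumption: a value exactly on a bound is counted in the lower range.
     */
    static int[] classify(double[] averages, double reference) {
        if (reference <= 0.0) {
            throw new IllegalArgumentException("reference must be positive");
        }
        int[] counts = new int[UPPER_BOUNDS.length + 1];
        for (double average : averages) {
            double ratio = average / reference;
            int bucket = UPPER_BOUNDS.length; // default: >1000%
            for (int i = 0; i < UPPER_BOUNDS.length; i++) {
                if (ratio <= UPPER_BOUNDS[i]) {
                    bucket = i;
                    break;
                }
            }
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical 10-second averages and a hypothetical reference of 10 packets/s.
        double[] averages = {5, 10, 12, 18, 24, 40, 90, 150};
        int[] counts = classify(averages, 10.0);
        for (int i = 0; i < counts.length; i++) {
            System.out.println(LABELS[i] + ": " + counts[i]);
        }
    }
}
```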
Another result displays the differences between the averaged numbers of request and
response packets; it is shown only when the HTTP or TCP option is selected.
Nominal 5.5 2.5 0.3 0.7 2.4 5.4 0.3 0.4 0.3 4.8 0.9
Current 12.1 2.3 3.4 3.1 8.9 5.2 6.9 2.0 2.9 2.7 6.2
Table 2 – Example Output: Difference in request and response packets
This helps the user of the program to interpret the data from a different angle, as well
as to support the claim that a possible attack sequence is present.
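As a rough illustration of this second output, the sketch below computes the gap between averaged request and response packets for each period. The source does not say whether the difference is signed, so the sketch assumes an absolute difference; the class name and the sample values are hypothetical.

```java
public class RequestResponseGapSketch {

    /**
     * Absolute difference between averaged request and response counts per period.
     * Assumption: both arrays are aligned by period; extra trailing entries are ignored.
     */
    static double[] gaps(double[] requestAverages, double[] responseAverages) {
        int periods = Math.min(requestAverages.length, responseAverages.length);
        double[] gap = new double[periods];
        for (int i = 0; i < periods; i++) {
            gap[i] = Math.abs(requestAverages[i] - responseAverages[i]);
        }
        return gap;
    }

    static double mean(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical averages: many requests (e.g. SYN) with few responses (e.g. SYN/ACK).
        double[] requests = {12.0, 3.0, 8.0};
        double[] responses = {2.0, 3.0, 10.0};
        double[] gap = gaps(requests, responses);
        for (double g : gap) {
            System.out.println(g);
        }
        System.out.println("Mean gap: " + mean(gap));
    }
}
```

A consistently larger gap in the current file than in the nominal file would be in line with a node that sends requests without reacting to the responses.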
4.3 Main Implementation
This section touches on the methodology of how a log file is filtered and passed into
the program in order to derive a conclusion with substantial probability.
4.3.1 Wireshark Filters
Wireshark's most powerful feature is its vast array of display filters. It allows for one
to drill down to the exact traffic that one would like to see. It forms the basis of many
of Wireshark's other features such as the coloring rules. We shall employ these
display filters in our methodology.
Before searching for the preferred filters, it is imperative to first
recognise why the filters are needed and the main intent of employing them.
One of the most significant features of a DDoS bot’s traffic is that it
always tries to create as many connections as possible within a short time, in order to
bring down the serviceability of a website or server. With this knowledge, we
make use of Wireshark’s filters to pinpoint request packets that are sent by the
node.
The following sections explain the filters used for each type of
flooding.
4.3.1.1 HTTP flood
An HTTP flood attack sends as many requests as possible without waiting for
acknowledgement or port assignment responses. This provides the potential to take
down any single server from any reasonably fast system with a decent connection
simply by running the server out of available response ports, roughly 64
thousand of them per Ethernet port, before the 90-second connection timeouts occur.
Although this would not be as effective on most cloud servers, it could take down a
small intranet or privately hosted site entirely.
Therefore, we make use of the following filters to sieve out packets from and to the
node.
HTTP GET Request
(ip.src == 192.168.0.0/16 and http.request)
HTTP Response
(ip.dst == 192.168.0.0/16 and http.response)
The IP address filter ‘192.168.0.0/16’ is in CIDR notation: it matches on the first 16
bits of the address, so every address of the form 192.168.x.x is included.
4.3.1.2 SYN flood
Detecting SYN flood attacks is usually rather easy: if countless packets with the SYN
flag set arrive in a very short time frame, from either a single IP address
or multiple addresses around the world, it indicates the occurrence of an attack.
Typically those attacks try to flood servers with a rapid series of SYN packets
without ever reacting to the resulting SYN/ACK.
SYN
(tcp.flags.syn == 1 and tcp.flags.ack == 0 and ip.src == 192.168.0.0/16)
SYN/ACK
(tcp.flags.syn == 1 and tcp.flags.ack == 1 and ip.dst == 192.168.0.0/16)
4.3.1.3 UDP flood
UDP is a connectionless protocol and it does not require any connection setup
procedure to transfer data. A UDP flood occurs when an attacker sends a large number
of UDP packets to random ports on the victim system. When the victim system receives a
UDP packet, it determines what application is waiting on the destination port.
When it finds that no application is waiting on the port, it generates an
ICMP destination unreachable packet to the forged source address. If enough
UDP packets are delivered to ports on the victim, the system will go down.
UDP
(udp)
4.3.1.4 ICMP flood
An ICMP flood attack is also known as a ping attack. Large ICMP ping
packets are sent to the server repeatedly so that the server does not have time to
respond to other requests.
ICMP
(icmp)
4.3.1.5 Using Wireshark
This section lists the steps for using Wireshark to aid in the parsing of log files.
1. After a log has been captured, we can navigate to the IO Graphs function as
seen in Figure 13.
Figure 13 – Wireshark > Statistics > IO Graphs
2. From here, we can input the various filters from sections 4.3.1.1 through
4.3.1.4, or no filter at all to compare the raw traffic. However, only up to 5
graphs can be defined, as depicted in Figure 14.
Figure 14 – Wireshark IO Graphs
3. Note that the filters have to be entered in order. For paired filters such as
HTTP and TCP, the ‘HTTP GET Request’ filter, for example, has to be directly
before the ‘HTTP Response’ filter.
4. After loading all the graphs, click on ‘Copy’.
5. Open a ‘New Text Document’ and paste the values into the Notepad (or text
editor of choice).
6. Save the document as a “.csv” file.
7. Repeat for both current and nominal files.
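The saved file can then be loaded into a two-dimensional table of strings, which is the shape of nominalBank and currentBank in the Appendix. The sketch below is not the file read-in method of Outflow.java; it assumes a layout of one header row followed by one row per interval, with the interval in the first column and one column per graph, and it assumes that fields may be wrapped in double quotes. The class name and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IoGraphCsvSketch {

    /**
     * Splits CSV text into rows and cells.
     * Assumptions: cells contain no embedded commas; optional double quotes are stripped.
     */
    static String[][] parse(String csvText) {
        List<String[]> rows = new ArrayList<>();
        for (String line : csvText.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",");
            for (int i = 0; i < cells.length; i++) {
                cells[i] = cells[i].trim().replace("\"", "");
            }
            rows.add(cells);
        }
        return rows.toArray(new String[0][]);
    }

    /** Extracts one graph column as numbers, skipping the header row. */
    static double[] column(String[][] table, int col) {
        double[] values = new double[table.length - 1];
        for (int row = 1; row < table.length; row++) {
            values[row - 1] = Double.parseDouble(table[row][col]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical export with two graphs (for example, HTTP requests and responses).
        String csv = "\"Interval start\",\"Graph 1\",\"Graph 2\"\n0,12,3\n1,15,4\n";
        String[][] table = parse(csv);
        System.out.println("Seconds captured: " + (table.length - 1));
        System.out.println("Graph columns: " + (table[0].length - 1));
        System.out.println("First value of Graph 1: " + column(table, 1)[0]);
    }
}
```

To read from disk instead of a string, the text could be loaded with `new String(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(path)))` before calling `parse`.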
4.3.2 Outflow.java
Outflow.java and Calc.java, its corresponding calculation class, were created to help
in the parsing of these Wireshark logs. They follow the logic described in section
4.2.1.3. This section describes the methods as well as the input and output parameters.
4.3.2.1 Outflow.java Input Parameters
Figure 15 below shows the usage of Outflow.
Figure 15 – Outflow.java in use
The input command to execute Outflow.java is as follows:
java Outflow "nominal.csv" "current.csv" "outputfile" (-h) (-t) (-r) (-u) (-i)
The table below shows the arguments as well as what they signify:
Arg Explanation
-h HTTP GET Requests and HTTP Response packets
-t TCP SYN and TCP SYN/ACK packets
-r Raw (unfiltered) packets
-u UDP packets
-i ICMP packets
Table 3 – Outflow.java Available Arguments
At most 3 of the bracketed options can be used per execution, because
Wireshark allows only 5 graphs to be defined, and HTTP and TCP
already require 2 graphs each for comparison.
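The graph budget can be checked with a few lines of Java. The sketch below mirrors the column counts used in the Appendix (two columns for -h and -t, one for -u, -i and -r), but the class and method names are hypothetical and it is not part of Outflow.java.

```java
public class GraphBudgetSketch {

    static final int MAX_GRAPHS = 5;   // limit of Wireshark's IO Graphs window
    static final int MAX_OPTIONS = 3;  // limit on options per execution

    /** Number of graph columns an option consumes in the exported file. */
    static int columnsFor(String option) {
        switch (option) {
            case "-h": // HTTP GET request + HTTP response
            case "-t": // TCP SYN + TCP SYN/ACK
                return 2;
            case "-u":
            case "-i":
            case "-r":
                return 1;
            default:
                throw new IllegalArgumentException("Invalid argument " + option);
        }
    }

    static int totalColumns(String... options) {
        int total = 0;
        for (String option : options) {
            total += columnsFor(option);
        }
        return total;
    }

    /** True when the options respect both the option limit and the graph limit. */
    static boolean fits(String... options) {
        return options.length >= 1
                && options.length <= MAX_OPTIONS
                && totalColumns(options) <= MAX_GRAPHS;
    }

    public static void main(String[] args) {
        System.out.println("-h -t -u needs " + totalColumns("-h", "-t", "-u") + " graphs");
        System.out.println("fits: " + fits("-h", "-t", "-u"));
    }
}
```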
4.3.2.2 Outflow.java Methods
Below shows the main method of Outflow.java.
The main method was created to take in arguments from the command prompt and
check if the arguments are valid before storing the arguments into variables.
The arguments are passed into the file read in method to create 2 arrays to store the
values. Calc objects are created to store the relevant information of each test (http,
tcp, udp, icmp).
For the corresponding Calc.java, it handles all the calculations of the values that are
passed in. Such methods include calculating the average values in periods of 10
seconds, classifying data into different ranges and finding the weightage of both
current and nominal data before comparing the two based on a given threshold of
250%.
Other methods not included here are the file input and output methods in
Outflow.java, as well as the ‘get’ methods in Calc.java. The rest of the source code
will be included in the Appendix.
4.3.2.3 Outflow.java Output
Figure 15 in section 4.3.2.1 shows the printed output of Outflow.java in addition to
its usage in the command prompt.
Figure 16 – Outflow.java Output
4.3.3 Testing
Numerous exception handlers and checks have been put in place in the program. This
portion seeks to expose any flaws that may remain during execution. We
define the following input as the optimal input in the command line:
java Outflow nom.csv cur.csv out.txt -h -t -u
The following attempts have been made to see if any exceptions fail to be
caught:
1) FileNotFound Exception
2) ArrayIndexOutOfBounds Exception
3) Invalid Arguments
4.3.3.1 FileNotFound Exception
Changing the name of the input file in the input parameters produces the output in
Figure 17. This indicates that the exception has been caught.
Figure 17 – FileNotFound Exception
4.3.3.2 ArrayIndexOutOfBounds Exception
Now, we change the values in the read-in files to see if any exceptions are
caught.
Figure 18 – ArrayIndexOutOfBounds Exception
4.3.3.3 Invalid Arguments
Instead of the optimal input, we supplied both extra arguments and too few arguments;
all of these cases were handled.
Figure 19 - Invalid Arguments
Thus, we conclude that our testing is complete and that no uncaught errors
were found.
5. RESULTS
Filename             Tested   Actual            Type           Time      Max    Nominal  Current   Attack  False
                     Against  Status                           captured  Avg    Weight   Weight    Prob.   Positive?
150212               140212   Normal            HTTP Get       34000     16.3   540.2    881.94    N.A.    No
                                                HTTP Response            16.5
                                                TCP SYN                  6.8    143      383.74    26.84   Yes
                                                TCP SYN/ACK              6.8
                                                UDP                      15.3   3840.77  5223.23   N.A.    No
250212               140212   Victim of attack  HTTP Get       17004     11.7   540.2    348.8     N.A.    No
                                                HTTP Response            12
                                                TCP SYN                  12.1   143      325.11    N.A.    No
                                                TCP SYN/ACK              11
                                                UDP                      15.7   3840.77  1193.75   N.A.    Not sure
260212               140212   Victim of attack  HTTP Get       17713     19.4   540.2    3093.77   57.27   Yes
                                                HTTP Response            18.6
                                                TCP SYN                  12.6   143      2743.18   191.83  Yes
                                                TCP SYN/ACK              12.3
                                                UDP                      284.4  3840.77  61162.55  159.25  Not sure
17032012hivemindRSS  140212   Failed Attacker   HTTP Get       122       0.4    540.2    422.58    57.27   No
                                                HTTP Response            0.9
                                                TCP SYN                  0.4    143      528.23    36.94   No
                                                TCP SYN/ACK              0.4
                                                UDP                      6.5    3840.77  7923.46   N.A.    No
20032012hivemindRSS  140212   Failed Attacker   HTTP Get       32        0      540.2    0         N.A.    No
                                                HTTP Response            0
                                                TCP SYN                  0.1    143      114.45    N.A.    No
                                                TCP SYN/ACK              0.1
                                                UDP                      0.6    3840.77  686.7     N.A.    No
The program has achieved 73.33% accuracy thus far from a total of 6
files with normal traffic and 0 files that are malicious in nature. This suggests that the
logic behind the program is sound and could be considered as a possible solution
for preventing internal machines from inflicting
digital harm on external networks.
6. DISCUSSION
In this chapter, we discuss the results as well as a few limitations that are relevant to
the model that we have proposed.
6.1 Observed Results
While the initial estimate for the accuracy of our program was in the range of 80%
to 98%, results taken from 6 files from multiple computers showed 73.33% accuracy.
The lower score could largely be due to the small sample size.
In order to fully test the capabilities of Outflow.java, real data of at least 100
files would be needed. Since this experiment was merely a simulation, the normal
traffic flow was kept as close as possible to real-world data.
This could imply that the traits used to identify such outgoing threats are worth
considering for use in the industry.
6.2 Design Limitations
Even though we believe that the model that we have recommended is appropriate and
effective in botnet attacks, there are still certain areas that could be looked into to
improve its efficiency or its logic. We expound on three limitations below.
6.2.1 Assumptions
The first and most important assumption that we have to make is that a node within
the swarm layer is compromised and has been infected with malware that allows it to
be part of the zombie network.
The second assumption is that the nominal flow is perfect.
Before that, however, what constitutes a good nominal flow? If this question is not
answered, it is tough to define a perfect flow, let alone attain it. Furthermore, the
traffic flow rates might change with time and events, and new nominal flows have to
be calculated when needed.
6.2.2 Response time
One major drawback of this model is that the network administrator has to manually
capture the packets of different nodes through Wireshark, which are later parsed
individually through the program that has been created. This is tedious and results in a
great bottleneck at the log capture portion instead of the analysis portion.
In addition, such statistical or graphical analysis requires that data be
collected before any form of computation can be carried out. This implies that
when threats or anomalies do occur, their presence will be made known only when
logs of the nodes are examined. This greatly lengthens the response time.
6.2.3 Logic Limitations
Having an irregular sample size for analysis means that complex logic and
algorithms are required in order to deal with such cases. A standardised capture
duration would improve efficiency and allow more intricate, better-fitting algorithms
that produce more accurate results.
‘Pulsing’ zombies are compromised computers that are instructed to launch
intermittent and short-lived floods against victims with the intent of merely slowing them
rather than crashing them. This type of attack, referred to as "degradation-of-service"
rather than "denial-of-service", can be more difficult to detect than regular zombie
invasions and can disrupt and hamper connection to websites for prolonged periods
of time, potentially causing more disruption than concentrated floods. Detection of
degradation-of-service attacks is complicated further by the matter of discerning
whether the attacks really are attacks, or just healthy increases in website traffic.
With the threshold of the program set at 250%, we may not be able to accurately
detect such a small-scale attack. Nonetheless, given that we are employing the use of
swarm networks as the base of our model, mitigation of 250% of normal traffic
would not put much of a dent in the serviceability of the network.
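A small worked example shows how a short pulse is diluted by a long capture. The sketch below assumes, purely for illustration, that the nominal and current captures have the same length and that normal traffic flows at a constant rate; the class name and the tenfold pulse are hypothetical, while the 10000-second capture and 3-minute attack are the figures used in section 4.2.1.3.

```java
public class PulseDilutionSketch {

    /**
     * Ratio of current to nominal total traffic when an attack of attackSeconds
     * at 'multiplier' times the normal rate sits inside a capture of captureSeconds.
     * Assumption: constant normal rate, equal capture lengths.
     */
    static double areaRatio(int captureSeconds, int attackSeconds, double multiplier) {
        double normalSeconds = captureSeconds - attackSeconds;
        // Traffic is measured in units of the normal per-second rate.
        double currentArea = normalSeconds + attackSeconds * multiplier;
        return currentArea / captureSeconds;
    }

    /** Attack intensity needed for the total to reach the given threshold ratio. */
    static double multiplierToReach(double threshold, int captureSeconds, int attackSeconds) {
        // Solve (capture - attack + attack * m) / capture = threshold for m.
        return (threshold * captureSeconds - (captureSeconds - attackSeconds)) / attackSeconds;
    }

    public static void main(String[] args) {
        int capture = 10000; // seconds
        int attack = 180;    // a 3-minute pulse
        System.out.println("Ratio for a 10x pulse: " + areaRatio(capture, attack, 10.0));
        System.out.println("Intensity needed to reach 250%: "
                + multiplierToReach(2.5, capture, attack));
    }
}
```

Under these assumptions a tenfold pulse raises the total to only about 116% of nominal, well under the 250% threshold.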
7. CONCLUSION
In this paper, we presented an effective DoS detection technique. A simulation
experiment with 19 sample traffic files was carried out to test the effectiveness of the
program. The proposed algorithm searches for abnormality in the frequency of
network packets sent and received. The compromised node is analysed by
highlighting certain known traits of HTTP, TCP, UDP and ICMP floods, such as
multiple requests for connection without reacting much to the response given. The
program calculates the difference through the averages of the frequencies of packet
occurrences after Wireshark is used to filter the said packets. The program analyses
this information and produces a result together with a probability that it is accurate.
Additional data is produced to help the user support the argument.
Results have shown that such threats from internal to external machines can be
identified using the said algorithm.
8. RECOMMENDATIONS
This program and model as a whole would be suitable for consumers to adopt. The
efficiency could be improved by creating a GUI for the program, analysing in
batches, or even scripting a command-line tool such as TShark to automate both the
Wireshark filtering and the analysis.
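One way the TShark automation might look is sketched below. It only builds the command line; it relies on TShark's `-r` (read capture file), `-q` (quiet) and `-z io,stat,<interval>[,<filter>...]` options, and reuses the display filters from section 4.3.1. The class name, the capture file name and the idea of driving TShark from Java are assumptions for illustration, and the output of TShark would still need to be parsed into the format that Outflow.java expects.

```java
import java.util.ArrayList;
import java.util.List;

public class TsharkCommandSketch {

    /**
     * Builds a TShark command that prints per-interval packet statistics,
     * one statistics column per display filter.
     * Assumption: the filters contain no commas, since commas separate the fields.
     */
    static List<String> ioStatCommand(String captureFile, int intervalSeconds, String... filters) {
        List<String> command = new ArrayList<>();
        command.add("tshark");
        command.add("-r");
        command.add(captureFile);
        command.add("-q"); // print only the statistics, not every packet
        StringBuilder stat = new StringBuilder("io,stat," + intervalSeconds);
        for (String filter : filters) {
            stat.append(',').append(filter);
        }
        command.add("-z");
        command.add(stat.toString());
        return command;
    }

    public static void main(String[] args) {
        List<String> command = ioStatCommand(
                "current.pcap", // hypothetical capture file
                10,
                "ip.src == 192.168.0.0/16 and http.request",
                "ip.dst == 192.168.0.0/16 and http.response");
        System.out.println(String.join(" ", command));
        // To execute it (requires TShark on the PATH), one could use:
        // new ProcessBuilder(command).inheritIO().start();
    }
}
```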
Possible ways to include backscatter analysis and Chi-Square analysis into the
program and model could be looked into as potential methods to detect and halt
attacks in their tracks.
REFERENCES
[1] ‘4chan Users Crash MPAA Website in Pro-Piracy Protest’. [Online].
Available: http://mashable.com/2010/09/18/4chan-mpaa-ddos-attack/. [Accessed:
29-Mar-2012].
[2] ‘Anonymous retaliates for LulzSec arrests, hacks Panda Security website |
Fox News’. [Online]. Available:
http://www.foxnews.com/scitech/2012/03/07/anonymous-retaliates-for-lulzsec-
arrests-hacks-panda-security-website/. [Accessed: 29-Mar-2012].
[3] ‘Anonymous takes down Vatican website • The Register’. [Online].
Available: http://www.theregister.co.uk/2012/03/08/anonymous_italy_hit_vatican/.
[Accessed: 29-Mar-2012].
[4] ‘Anonymous, WikiLeaks team up | News.com.au’. [Online]. Available:
http://www.news.com.au/breaking-news/anonymous-wikileaks-team-up/story-
e6frfku0-1226283931894. [Accessed: 29-Mar-2012].
[5] ‘LOIC: The Tool Anonymous Is Using to Essentially Turn You Into a
Botnet’. [Online]. Available: http://gizmodo.com/5877719/heres-the-tool-
anonymous-is-tricking-the-internet-into-using. [Accessed: 29-Mar-2012].
[6] Forrester Consulting, DDoS: A Threat You Can't Afford To Ignore, 2009,
available at http://verisigninc.com/assets/whitepaper-ddos-threat-forrester.pdf
[7] Roland Kwitt, A Statistical Anomaly Detection Approach for Detecting Network
Attacks, 2004, available at http://www.6qm.net/workshop/slides/6qm.PDF
[8] F. Y. Leu and I. L. Lin, ‘A DoS/DDoS Attack Detection System Using Chi-
Square Statistic Approach’.
[9] L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred, ‘Statistical
approaches to DDoS attack detection and response’, in DARPA Information
Survivability Conference and Exposition, 2003. Proceedings, 2003, vol. 1, pp. 303–
314.
[10] S. Jin and D. S. Yeung, ‘A covariance analysis model for DDoS attack
detection’, in Communications, 2004 IEEE International Conference on, 2004, vol.
4, pp. 1882–1886.
[11] R. Lua and K. C. Yow, ‘Mitigating DDoS Attacks with Transparent and
Intelligent Fast-Flux Swarm Network’. [Online]. Available:
http://dl.comsoc.org/livepubs/ni/public/2011/jul/lua.html. [Accessed: 29-Mar-2012].
[12] ‘The hacktivist threat: Brazilian bank sites continue to fall victim to DDoS
attacks | Security Bistro’. [Online]. Available:
http://www.securitybistro.com/blog/?p=975. [Accessed: 29-Mar-2012].
[13] ‘Attacks by “Anonymous” WikiLeaks Proponents not Anonymous -
UTpublications’. [Online]. Available: http://doc.utwente.nl/75331/. [Accessed: 29-
Mar-2012].
[14] ‘LOIC DDoS Analysis and Detection - SpiderLabs Anterior’. [Online].
Available: http://blog.spiderlabs.com/2011/01/loic-ddos-analysis-and-detection.html.
[Accessed: 29-Mar-2012].
[15] ‘DDoS attacks in H2 2011 - Securelist’. [Online]. Available:
http://www.securelist.com/en/analysis/204792221/DDoS_attacks_in_H2_2011.
[Accessed: 29-Mar-2012].
APPENDIX
1. PROGRAM SOURCE CODE
1.1 Outflow.java
import java.io.*;
import java.util.*;
import java.lang.*;
public class Outflow
{
/*Initialize all the variables*/
private static String nominalfile;
private static String currentfile;
private static String outputfile;
private static String nominalBank[][];
private static String currentBank[][];
private static Calc analysisTable[];
private static long startTime, endTime;
private static String type[] = new String [3];
private static int fileCol;
private static String choice;
public static void main(String[] args) throws IOException
{
try
{
fileCol=1;
/*Ensure that there is at least 4 and at most 6 parameters*/
if((args.length >= 4) && (args.length <= 6))
{
nominalfile = args[0];
currentfile = args[1];
outputfile = args[2];
for(int arg = 3; arg < args.length; arg++)
{
if (args[arg].equals("-h") || args[arg].equals("-t") ||
args[arg].equals("-u") || args[arg].equals("-i") || args[arg].equals("-r"))
type[arg-3]=args[arg];
else
{
System.out.println("Invalid Argument "+args[arg]);
System.exit(0);
}
}
}
else
{
System.out.println("Invalid Argument...");
System.exit(0);
}
/*Start the time taken for the process*/
startTime = System.nanoTime();
/*Read in the test data and the train data*/
fileReadIn(nominalfile, currentfile);
/*check if file read in is equivalent to number of choices*/
if((nominalBank[0].length-1)<(args.length-
3)||(currentBank[0].length-1)<(args.length-3))
{
System.out.println("Insufficient data...");
System.exit(0);
}
/*Size the table by the number of options given, so that no entry is left null*/
analysisTable = new Calc[args.length - 3];
for(int i = 3; i <args.length; i++)
{
choice = args[i];
Calc cal = new Calc(nominalBank, currentBank, choice,
fileCol);
analysisTable[i-3]=cal;
if (args[i].equals("-h") ||args[i].equals("-t"))
fileCol+=2;
else //If choice is "-u", "-i" or "-r"
fileCol+=1;
}
/*Create output file*/
BufferedWriter outfile = new BufferedWriter(new FileWriter(new
File( outputfile )));
outfile.write("==================================================================================");
outfile.newLine();
outfile.write("Detection of Botnets in Network Traffic Outflow");
outfile.newLine();
outfile.write("==================================================================================");
outfile.newLine();
outfile.write("Capture details of Nominal Packet file:");
outfile.newLine();
outfile.write("Total Time Captured = "+(nominalBank.length-1));
outfile.newLine();
outfile.write("Number of 10 second periods = " +
analysisTable[0].getNomNumPeriod());
outfile.newLine();
outfile.newLine();
outfile.write("Capture details of Current Packet file:");
outfile.newLine();
outfile.write("Total Time Captured = "+(currentBank.length-1));
outfile.newLine();
outfile.write("Number of 10 second periods = " +
analysisTable[0].getCurNumPeriod());
outfile.newLine();
outfile.newLine();
outfile.newLine();
for(int i=0; i< analysisTable.length; i++)
{
outfile.write("----------------------------------------------------------------------------------");
outfile.newLine();
if ((analysisTable[i].getChoice()).equals("-h"))
53
outfile.write("Comparison of HTTP Get Requests and HTTP
Response packets");
else if ((analysisTable[i].getChoice()).equals("-t"))
outfile.write("Comparison of TCP SYN and TCP SYN/ACK
packets");
else if ((analysisTable[i].getChoice()).equals("-u"))
outfile.write("Comparison of UDP packets");
else if ((analysisTable[i].getChoice()).equals("-i"))
outfile.write("Comparison of ICMP packets");
else if ((analysisTable[i].getChoice()).equals("-r"))
outfile.write("Comparison of raw packets");
outfile.newLine();
outfile.write("----------------------------------------------------------------------------------");
outfile.newLine();
outfile.newLine();
/*
outfile.write("average of nominal Data: ");
outfile.newLine();
for(int row = 0; row < analysisTable[0].getNomData().length;
row++)
{
for(int col = 0; col <
analysisTable[0].getNomData()[0].length; col++)
{
outfile.write(analysisTable[0].getNomData()[row][col]+"
");
}
outfile.newLine();
}
*/
outfile.write("Peak average values: ");
outfile.newLine();
outfile.write("Nominal file: ");
outfile.newLine();
if ((analysisTable[i].getChoice()).equals("-h"))
outfile.write("HTTP Get Requests HTTP Response packets");
else if ((analysisTable[i].getChoice()).equals("-t"))
outfile.write("TCP SYN TCP SYN/ACK");
else if ((analysisTable[i].getChoice()).equals("-u"))
outfile.write("UDP");
else if ((analysisTable[i].getChoice()).equals("-i"))
outfile.write("ICMP");
else if ((analysisTable[i].getChoice()).equals("-r"))
outfile.write("Raw");
outfile.newLine();
for(int row = 0; row < (analysisTable[i].getNomMaxAvg()).length; row++)
outfile.write(analysisTable[i].getNomMaxAvg()[row]+" ");
outfile.newLine();
outfile.newLine();
outfile.write("Current file: ");
outfile.newLine();
if ((analysisTable[i].getChoice()).equals("-h"))
outfile.write("HTTP Get Requests HTTP Response packets");
else if ((analysisTable[i].getChoice()).equals("-t"))
outfile.write("TCP SYN TCP SYN/ACK");
else if ((analysisTable[i].getChoice()).equals("-u"))
outfile.write("UDP");
else if ((analysisTable[i].getChoice()).equals("-i"))
outfile.write("ICMP");
else if ((analysisTable[i].getChoice()).equals("-r"))
outfile.write("Raw");
outfile.newLine();
for(int row = 0; row < (analysisTable[i].getCurMaxAvg()).length; row++)
outfile.write(analysisTable[i].getCurMaxAvg()[row]+" ");
outfile.newLine();
outfile.newLine();
if ((analysisTable[i].getChoice()).equals("-h") ||
(analysisTable[i].getChoice()).equals("-t"))
{
outfile.write("Difference in request and response packets");
outfile.newLine();
outfile.write("Nominal: ");
for(int j=0; j<analysisTable[i].getNomDiff().length; j++)
outfile.write(analysisTable[i].getNomDiff()[j]+" ");
outfile.newLine();
outfile.write("Current: ");
for(int j=0; j<analysisTable[i].getCurDiff().length; j++)
outfile.write(analysisTable[i].getCurDiff()[j]+" ");
outfile.newLine();
outfile.newLine();
outfile.newLine();
}
outfile.write("Normalized frequency proportional to nominal file");
outfile.newLine();
outfile.write("Range classes:");
outfile.newLine();
outfile.write(" <=100%, 100%-150%, 150%-200%, 200%-250%, 250%-500%, 500%-1000%, >1000%");
outfile.newLine();
outfile.write("Nominal: ");
for(int col=0;col<analysisTable[i].getRange()[0].length;col++)
outfile.write(analysisTable[i].getRange()[0][col]+" ");
outfile.newLine();
outfile.write("Current: ");
for(int col=0;col<analysisTable[i].getRange()[1].length;col++)
outfile.write(analysisTable[i].getRange()[1][col]+" ");
outfile.newLine();
outfile.newLine();
outfile.newLine();
outfile.write("Weight of Nominal Data = " +
analysisTable[i].getWeightage()[0]);
outfile.newLine();
outfile.write("Weight of Current Data = " +
analysisTable[i].getWeightage()[1]);
outfile.newLine();
outfile.write("Threshold = " +
(analysisTable[i].getThreshold()*100)+"%");
outfile.newLine();
if (analysisTable[i].getWeightage()[1] >
(analysisTable[i].getWeightage()[0]*analysisTable[i].getThreshold()))
outfile.write("Possible attack sequence detected with probability of "
+ analysisTable[i].getProbability()+"%" );
outfile.newLine();
outfile.newLine();
outfile.newLine();
}
System.out.println("Output file saved to: "+outputfile);
outfile.close();
}
catch (FileNotFoundException e)
{
System.err.println("FileNotFoundException: " +
e.getMessage());
}
catch (IOException e)
{
System.err.println( "IOException: " + e.getMessage());
}
/*print out duration of the process*/
endTime = System.nanoTime();
System.out.println("Duration: " + ((endTime - startTime))/1000000
+" msec");
}//end main()
/*Read in the nominal and current data*/
public static void fileReadIn(String nomFileName, String
curFileName)throws IOException
{
try
{
String line;
int nomRowCount = 0;
int nomColCount = 0;
int curRowCount = 0;
int curColCount = 0;
int nomRow = 0, nomCol = 0, curRow = 0, curCol = 0;
/*Count number of lines*/
BufferedReader nominalfile = new BufferedReader (new
FileReader(nomFileName));
while ( (line = nominalfile.readLine()) != null )
{
nomRowCount++;
if (nomRowCount == 1)
{
StringTokenizer nomRowTokenizer = new
StringTokenizer(line, ",");
nomColCount=nomRowTokenizer.countTokens();
}
}
nominalfile.close();
/*Initialise nominalBank*/
nominalBank = new String[nomRowCount][nomColCount];
for(int row = 0; row < nominalBank.length; row++)
{
for(int col = 0; col < nominalBank[row].length; col++)
nominalBank[row][col] = "0";
}
/*Read in the nominal data*/
BufferedReader nominalfile2 = new BufferedReader (new
FileReader(nomFileName));
while (nominalfile2.ready())
{
line = nominalfile2.readLine();
StringTokenizer nomRowTokenizer = new StringTokenizer(line,
",");
/*first token is time label*/
nominalBank[nomRow][nomCol++] = nomRowTokenizer.nextToken();
while(nomRowTokenizer.hasMoreTokens())
{
StringTokenizer nomColTokenizer = new
StringTokenizer(nomRowTokenizer.nextToken(),"\"");
nominalBank[nomRow][nomCol++] =
nomColTokenizer.nextToken();
}
nomRow++;
nomCol = 0;
}
nominalfile2.close();
/*Count number of lines in current file*/
BufferedReader currentfile = new BufferedReader (new
FileReader(curFileName));
while ( (line = currentfile.readLine()) != null )
{
curRowCount++;
if (curRowCount == 1)
{
StringTokenizer curRowTokenizer = new
StringTokenizer(line, ",");
curColCount=curRowTokenizer.countTokens();
}
}
currentfile.close();
/*Initialise currentBank*/
currentBank = new String[curRowCount][curColCount];
for(int row = 0; row < currentBank.length; row++)
{
for(int col = 0; col < currentBank[row].length; col++)
currentBank[row][col] = "0";
}
/*Read in the current data*/
BufferedReader currentfile2 = new BufferedReader (new
FileReader(curFileName));
while (currentfile2.ready())
{
line = currentfile2.readLine();
StringTokenizer curRowTokenizer = new StringTokenizer(line,
",");
/*first token is time label*/
currentBank[curRow][curCol++] = curRowTokenizer.nextToken();
while(curRowTokenizer.hasMoreTokens())
{
StringTokenizer curColTokenizer = new
StringTokenizer(curRowTokenizer.nextToken(),"\"");
currentBank[curRow][curCol++] =
curColTokenizer.nextToken();
}
curRow++;
curCol = 0;
}
currentfile2.close();
}
catch (ArrayIndexOutOfBoundsException e)
{
System.err.println("Caught ArrayIndexOutOfBoundsException: "
+ e.getMessage());
System.exit(0);
}
}//end fileReadIn()
}
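The tokenizing done in fileReadIn() can be illustrated on a single row of the Wireshark export shown in section 2.1. The following is a minimal sketch, not part of the original program: the class CsvRowSketch and its methods are hypothetical names, and it assumes that no field contains an embedded comma.

```java
import java.util.Arrays;

public class CsvRowSketch {
    /** Splits one exported row on commas and strips the double quotes around each field. */
    public static String[] splitRow(String line) {
        String[] fields = line.split(",");
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("\"", "").trim();
        }
        return fields;
    }

    /** Converts every field after the time label into an int packet count. */
    public static int[] counts(String[] fields) {
        int[] result = new int[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            result[i - 1] = Integer.parseInt(fields[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Row copied from the sample log file (interval starting at 4.000 s).
        String row = "\"4.000\",\"25\",\"1\",\"1\",\"0\",\"0\"";
        String[] fields = splitRow(row);
        System.out.println(fields[0]);                       // 4.000
        System.out.println(Arrays.toString(counts(fields))); // [25, 1, 1, 0, 0]
    }
}
```

The first field is the time label and the remaining fields are per-second packet counts, which is the layout that nominalBank and currentBank hold after the header row.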
1.2 Calc.java
import java.io.*;
import java.lang.Math.*;
public class Calc
{
private String[][] nomData;
private String[][] curData;
private double[] nomMaxAvg;
private double[] curMaxAvg;
private double[] nomDiff;
private double[] curDiff;
private int nomNumPeriod;
private int curNumPeriod;
private String choice;
private double[][] range;
private double mean;
private double[] weightage;
private double threshold;
private double probability;
public Calc(String nominal[][], String current[][], String choice, int
column)
{
int k;
this.choice = choice;
if (choice.equals("-h") || choice.equals("-t"))
k = 2;
else //If choice is "-u", "-i" or "-r"
k = 1;
/*Initialising Size of Period array*/
nomNumPeriod = (nominal.length-1)/10;
if ((nominal.length-1)%10 !=0)
nomNumPeriod+=1;
curNumPeriod = (current.length-1)/10;
if ((current.length-1)%10 !=0)
curNumPeriod+=1;
nomData = new String[nomNumPeriod][k];
curData = new String[curNumPeriod][k];
/*Finding average of packet frequencies in 10 sec period*/
int col = 0; //col is the column for nomData, the averages
for(int i = column; i<(k+column); i++) //i is the data column the average is calculated from
{
double sum =0;
/*Calculating averages for Nominal Data*/
for(int row =1; row<nominal.length; row++)
{
sum+=Integer.parseInt(nominal[row][i]);
if (row%10==0)
{
nomData[(row/10)-1][col]=Double.toString(sum/10);
sum=0;
}
else if (row==(nominal.length-1)) //last data row closes a partial period
{
nomData[row/10][col]=Double.toString(sum/((nominal.length-1)%10));
sum=0;
}
}
/*Calculating averages for Current Data*/
for(int row =1; row<current.length; row++)
{
sum+=Integer.parseInt(current[row][i]);
if (row%10==0)
{
curData[(row/10)-1][col]=Double.toString(sum/10);
sum=0;
}
else if (row==(current.length-1)) //last data row closes a partial period
{
curData[row/10][col]=Double.toString(sum/((current.length-1)%10));
sum=0;
}
}
col++;
}
analyse();
}//end of Calc constructor
private void analyse()
{
/*Initializing Nominal and Current differences for HTTP or TCP*/
if (choice.equals("-h") || choice.equals("-t"))
{
nomDiff= new double[nomData.length];
curDiff= new double[curData.length];
for(int i = 0; i<nomData.length; i++)
nomDiff[i]=Double.parseDouble(nomData[i][0])-
Double.parseDouble(nomData[i][1]);
for(int i = 0; i<curData.length; i++)
curDiff[i]=Double.parseDouble(curData[i][0])-
Double.parseDouble(curData[i][1]);
/*Begin bubble sort*/
//sort(nomData,1);
//sort(curData,1);
}
//sort(nomData,0);
//sort(curData,0);
/*Classify and Weightage*/
findMaxAvg();
//Range classes: <100%, 100%, 150%, 200%, 250%, 500%, 1000%
//Initialise Variables
range = new double[2][7];
weightage = new double[2];
weightage[0] = 0.0;
weightage[1] = 0.0;
threshold = 2.5;
for(int i=0;i<range.length;i++)
for(int j=0;j<range[0].length;j++)
range[i][j]=0.0;
//comparing Data (GET values) with Nominal Max Average (GET values)
range[0][0] = nomData.length;
//Find weightage of Nominal Data
for(int i =0;i < nomData.length; i++)
weightage[0]+=Double.parseDouble(nomData[i][0]);
//Find weightage of Current Data & ranges
for(int i =0;i < curData.length; i++)
{
weightage[1]+=Double.parseDouble(curData[i][0]);
if (Double.parseDouble(curData[i][0])>(nomMaxAvg[0]*10))
range[1][6]+=1;
else if (Double.parseDouble(curData[i][0])>(nomMaxAvg[0]*5))
range[1][5]+=1;
else if (Double.parseDouble(curData[i][0])>(nomMaxAvg[0]*2.5))
range[1][4]+=1;
else if (Double.parseDouble(curData[i][0])>(nomMaxAvg[0]*2))
range[1][3]+=1;
else if (Double.parseDouble(curData[i][0])>(nomMaxAvg[0]*1.5))
range[1][2]+=1;
else if (Double.parseDouble(curData[i][0])>(nomMaxAvg[0]))
range[1][1]+=1;
else //at or below the nominal peak average, i.e. the <=100% class
range[1][0]+=1;
}
//Normalizing
for(int i =0;i < range[1].length; i++)
range[1][i]=(range[1][i]/curNumPeriod)*nomNumPeriod;
weightage[1]=(weightage[1]/curNumPeriod)*nomNumPeriod;
//Score reported as "probability": 10 x (current weight / nominal weight); it is not capped at 100
probability = weightage[1]*10/weightage[0];
if (weightage[1]>(weightage[0]*threshold))
{
System.out.println();
if (choice.equals("-h"))
System.out.println("---------HTTP Get Requests and HTTP Response packets---------");
else if (choice.equals("-t"))
System.out.println("----------------TCP SYN and TCP SYN/ACK packets--------------");
else if (choice.equals("-u"))
System.out.println("---------------------------UDP packets-----------------------");
else if (choice.equals("-i"))
System.out.println("--------------------------ICMP packets-----------------------");
else if (choice.equals("-r"))
System.out.println("--------------------------Raw packets-----------------------");
System.out.printf("Possible attack sequence detected with probability of %.2f", probability );
System.out.print("%.");
System.out.println();
System.out.println();
System.out.println();
}
}//end analyse()
/*Bubblesort Method*/
private void sort(String[][] data, int column)
{
boolean swap = true;
while(swap)
{
swap = false;
for (int i=0; i<data.length-1; i++)
{
if(data[i+1][column] != null)
{
if (Double.parseDouble(data[i][column]) <
Double.parseDouble(data[i+1][column]))
{
String temp = "";
temp = data[i][column];
data[i][column] = data[i+1][column];
data[i+1][column] = temp;
swap = true;
}
}
}
}//end while
}//end sort()
//method to find the max values in the nominal and current files
private void findMaxAvg()
{
double max=0;
nomMaxAvg= new double[nomData[0].length];
curMaxAvg= new double[curData[0].length];
/*Initialise nomMaxAvg and curMaxAvg*/
for(int row = 0; row < nomMaxAvg.length; row++)
nomMaxAvg[row]=0;
for(int row = 0; row < curMaxAvg.length; row++)
curMaxAvg[row]=0;
/*Find Max Average value for Nominal traffic*/
for (int nomCol =0; nomCol<nomData[0].length; nomCol++)
{
for (int nomRow =0; nomRow<nomData.length; nomRow++)
{
//find max
if (max < Double.parseDouble(nomData[nomRow][nomCol]))
max = Double.parseDouble(nomData[nomRow][nomCol]);
}
nomMaxAvg[nomCol]=max;
max=0;
}
/*Find Max Average value for Current traffic*/
for (int curCol =0; curCol<curData[0].length; curCol++)
{
for (int curRow =0; curRow<curData.length; curRow++)
{
//find max
if (max < Double.parseDouble(curData[curRow][curCol]))
max = Double.parseDouble(curData[curRow][curCol]);
}
curMaxAvg[curCol]=max;
max=0;
}
}//end findMaxAvg()
/*GET METHODS*/
public String[][] getNomData()
{
return nomData;
}
public String[][] getCurData()
{
return curData;
}
public double[] getNomMaxAvg()
{
return nomMaxAvg;
}
public double[] getCurMaxAvg()
{
return curMaxAvg;
}
public int getNomNumPeriod()
{
return nomNumPeriod;
}
public int getCurNumPeriod()
{
return curNumPeriod;
}
public double[] getNomDiff()
{
return nomDiff;
}
public double[] getCurDiff()
{
return curDiff;
}
public String getChoice()
{
return choice;
}
public double[][] getRange()
{
return range;
}
public double getMean()
{
return mean;
}
public double[] getWeightage()
{
return weightage;
}
public double getThreshold()
{
return threshold;
}
public double getProbability()
{
return probability;
}
}
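The 10-second averaging performed in the Calc constructor can be checked by hand on the "Graph 1" column of the sample log file in section 2.1. The following is a minimal standalone sketch under stated assumptions: PeriodAverageSketch and periodAverages are hypothetical names, the input is assumed to be one count per second with the header row already removed, and a final period shorter than 10 samples is averaged over the samples it actually contains.

```java
import java.util.ArrayList;
import java.util.List;

public class PeriodAverageSketch {
    /** Averages per-second counts over consecutive periods; the last period may be shorter. */
    public static List<Double> periodAverages(int[] counts, int periodLength) {
        List<Double> averages = new ArrayList<>();
        double sum = 0;
        int inPeriod = 0;
        for (int count : counts) {
            sum += count;
            inPeriod++;
            if (inPeriod == periodLength) {
                averages.add(sum / periodLength); // a full period has been collected
                sum = 0;
                inPeriod = 0;
            }
        }
        if (inPeriod > 0) {
            averages.add(sum / inPeriod); // trailing partial period
        }
        return averages;
    }

    public static void main(String[] args) {
        // "Graph 1" values of the 28 rows in the sample log file.
        int[] graph1 = {11, 20, 4, 8, 25, 15, 11, 6, 9, 13,
                        4, 5, 0, 0, 3, 1, 5, 4, 1, 0,
                        2, 2, 0, 0, 6, 11, 1, 2};
        System.out.println(periodAverages(graph1, 10)); // [12.2, 2.3, 3.0]
    }
}
```

The 28 samples produce three periods, which matches the way nomNumPeriod and curNumPeriod are rounded up when the number of rows is not a multiple of 10.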
2. SAMPLE FILES
2.1 Sample filtered log file from Wireshark(.csv)
"Interval start","Graph 1","Graph 2","Graph 3","Graph 4","Graph 5"
"0.000","11","0","0","0","0"
"1.000","20","0","0","0","0"
"2.000","4","0","0","0","0"
"3.000","8","0","0","0","0"
"4.000","25","1","1","0","0"
"5.000","15","0","0","0","0"
"6.000","11","0","0","0","0"
"7.000","6","0","0","0","0"
"8.000","9","1","1","1","0"
"9.000","13","0","0","0","1"
"10.000","4","0","0","0","0"
"11.000","5","0","0","0","0"
"12.000","0","0","0","0","0"
"13.000","0","0","0","0","0"
"14.000","3","0","0","0","0"
"15.000","1","0","0","0","0"
"16.000","5","0","0","0","0"
"17.000","4","0","0","0","0"
"18.000","1","0","0","0","0"
"19.000","0","0","0","0","0"
"20.000","2","0","0","0","0"
"21.000","2","0","0","0","0"
"22.000","0","0","0","0","0"
"23.000","0","0","0","0","0"
"24.000","6","0","0","0","0"
"25.000","11","0","0","0","0"
"26.000","1","0","0","0","0"
"27.000","2","0","0","0","0"
2.2 Sample results file
=====================================================================
Detection of Botnets in Network Traffic Outflow
=====================================================================
Capture details of Nominal Packet file:
Total Time Captured = 45352
Number of 10 second periods = 4536
Capture details of Current Packet file:
Total Time Captured = 1671
Number of 10 second periods = 168
----------------------------------------------------------------------------------
Comparison of HTTP Get Requests and HTTP Response packets
----------------------------------------------------------------------------------
Peak average values:
Nominal file:
HTTP Get Requests HTTP Response packets
71.7 3.6
Current file:
HTTP Get Requests HTTP Response packets
971.5 121.5
Difference in request and response packets
Nominal: 5.5 2.5 0.3 0.7 2.4 5.4 0.3 0.4 0.3 4.8 0.9 1.0 4.8 1.0 2.9 0.3 4.2 0.9 2.3 5.0
Current: 12.100000000000001 2.3 3.4 3.1 8.9 5.2 6.9 2.0 2.9 2.7 6.2 4.3 11.4 5.5 5.6 1.4
22.6 35.7 7.1
Normalized frequency proportional to nominal file
Range classes:
<=100%, 100%-150%, 150%-200%, 200%-250%, 250%-500%, 500%-1000%, >1000%
Nominal: 4536.0 0.0 0.0 0.0 0.0 0.0 0.0
Current: 3402.0 297.0 135.0 27.0 540.0 108.0 27.0
Weight of Nominal Data = 34521.600000000086
Weight of Current Data = 301843.7999999999
Threshold = 250.0%
Possible attack sequence detected with probability of 87.43621384872054%
----------------------------------------------------------------------------------
Comparison of TCP SYN and TCP SYN/ACK packets
----------------------------------------------------------------------------------
Peak average values:
Nominal file:
TCP SYN TCP SYN/ACK
3.5 1.5
Current file:
TCP SYN TCP SYN/ACK
118.5 34.8
Difference in request and response packets
Nominal: 0.3 0.0 0.0 0.0 0.1 0.3 0.0 0.0 0.0 0.3 0.0 -0.1 -0.10000000000000003 -0.1
Current: 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.1 0.0 1.0 0.0 0.0 0.0
0.10000000000000003 2.7
Normalized frequency proportional to nominal file
Range classes:
<=100%, 100%-150%, 150%-200%, 200%-250%, 250%-500%, 500%-1000%, >1000%
Nominal: 4536.0 0.0 0.0 0.0 0.0 0.0 0.0
Current: 3159.0 162.0 81.0 135.0 324.0 405.0 270.0
Weight of Nominal Data = 504.40000000000833
Weight of Current Data = 31538.70000000001
Threshold = 250.0%
Possible attack sequence detected with probability of 625.2716098334554%
----------------------------------------------------------------------------------
Comparison of UDP packets
----------------------------------------------------------------------------------
Peak average values:
Nominal file:
UDP
1.6
Current file:
UDP
34.0
Normalized frequency proportional to nominal file
Range classes:
<=100%, 100%-150%, 150%-200%, 200%-250%, 250%-500%, 500%-1000%, >1000%
Nominal: 4536.0 0.0 0.0 0.0 0.0 0.0 0.0
Current: 3132.0 54.0 81.0 108.0 486.0 432.0 243.0
Weight of Nominal Data = 24.000000000000004
Weight of Current Data = 13410.9
Threshold = 250.0%
Possible attack sequence detected with probability of 5587.874999999999%
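
The figures in this results file can be reproduced from the rules in Calc.analyse(). The following is a minimal sketch rather than the original code: ReportSketch and its method names are hypothetical, while the class boundaries, the threshold of 2.5 and the factor of 10 are taken from the listing in section 1.2.

```java
public class ReportSketch {
    // Lower bounds of classes 1..6, as multiples of the nominal peak average.
    static final double[] BOUNDS = {1.0, 1.5, 2.0, 2.5, 5.0, 10.0};
    static final double THRESHOLD = 2.5; // value assigned in Calc.analyse()

    /** Returns 0 for the <=100% class up to 6 for the >1000% class. */
    public static int rangeClass(double periodAverage, double nominalPeak) {
        int index = 0;
        while (index < BOUNDS.length && periodAverage > nominalPeak * BOUNDS[index]) {
            index++;
        }
        return index;
    }

    /** Scales a current-file total to the number of periods in the nominal capture. */
    public static double normalise(double currentTotal, int curPeriods, int nomPeriods) {
        return (currentTotal / curPeriods) * nomPeriods;
    }

    /** True when the normalised current weight exceeds threshold times the nominal weight. */
    public static boolean exceedsThreshold(double nominalWeight, double currentWeight) {
        return currentWeight > nominalWeight * THRESHOLD;
    }

    /** The value the program prints as "probability". */
    public static double score(double nominalWeight, double currentWeight) {
        return currentWeight * 10 / nominalWeight;
    }

    public static void main(String[] args) {
        // HTTP figures copied from the sample results file.
        System.out.println(rangeClass(971.5, 71.7));             // 6, the >1000% class
        System.out.println(exceedsThreshold(34521.6, 301843.8)); // true
        System.out.println(Math.round(score(34521.6, 301843.8) * 100) / 100.0); // 87.44
    }
}
```

With 4536 nominal periods and 168 current periods the normalisation factor is 27, which is why every entry in the "Current" range rows is a multiple of 27. The TCP and UDP scores above exceed 100, so the printed figure is better read as a scaled ratio of current to nominal traffic than as a probability.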