Detecting SQL Injection Attacks in VoIP using Real-time Deep ...

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Linköping University | Department of Computer and Information Science
Master’s thesis, 30 ECTS | Datateknik

202019 | LIU-IDA/LITH-EX-A--2019/060--SE

Detecting SQL Injection Attacks in VoIP using Real-time Deep Packet Inspection
– Can a Deep Packet Inspection Firewall Detect SQL Injection Attacks on SIP Traffic with Reasonable Performance?
Swedish title: Kan man i realtid upptäcka SQL-injektioner i VoIP med en brandvägg som använder Deep Packet Inspection?

Linus Sjöström

Supervisor : Patrick Lambrix
Examiner : Niklas Carlsson

External supervisor : Jonathan Jogenfors

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Linus Sjöström

Abstract

The use of the Internet has increased over the years, and it is now an integral part of our daily activities, as we often use it for everything from interacting on social media to watching videos online. Phone calls nowadays tend to use Voice over IP (VoIP) rather than the traditional phone networks. As with any other service using the Internet, these calls are vulnerable to attacks. This thesis focuses on one particular attack: SQL injection in the Session Initiation Protocol (SIP), a popular protocol used within VoIP. To find different types of SQL injection, two classifiers are implemented that classify SIP packets as either "valid data" or "SQL injection". The first classifier uses regex to find SQL meta-characters in headers of interest. The second classifier uses naive Bayes with a training data set. These two classifiers are then compared in terms of classification throughput, speed, and accuracy. To evaluate the performance impact of packet sizes and to better understand the classifiers' resilience against an attacker introducing large packets, a test with increasing packet sizes is also presented. The regex classifier is then implemented in an open-source Deep Packet Inspection (DPI) implementation, nDPI, before being evaluated with regard to both throughput and accuracy. The results are in favor of the regex classifier, as it had better accuracy and higher classification throughput. Yet, the naive Bayes classifier works better for new, previously unknown types of SQL injection. The thesis therefore argues that the best choice depends on the scenario; both classifiers have their strengths and weaknesses.

Acknowledgments

First of all, I would like to thank Sectra Communication AB for preparing this Master’s thesis proposal for me. I wanted to do my thesis at Sectra. Also, a big thanks to Jonathan Jogenfors for being a very supportive supervisor and for giving many tips on LaTeX, grammar, and structure. Without him as a supervisor, this thesis would have been very hard to finish.

Another person that I would like to thank is Niklas Carlsson, my examiner. He has taken the time to set up meetings for the milestones and to give feedback, even though he has a tight schedule with other projects and students. Finally, I would like to thank my opponents, Carl Nykvist and Martin Larsson, for their opinions and valuable feedback on my thesis.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions

2 Background
2.1 Real-time Transport Protocol (RTP)
2.2 Real-time Transport Control Protocol (RTCP)
2.3 The RTP bleed Bug
2.4 Inserting jitter in RTP
2.5 Session Initiation Protocol (SIP)
2.6 Registration hijacking
2.7 SQL injection in SIP
2.8 Session Description Protocol (SDP)
2.9 Deep Packet Inspection (DPI)
2.10 nDPI
2.11 Naive Bayes classifier

3 Related Work
3.1 Vulnerability investigation in VoIP and VoLTE
3.2 Performance investigation in DPI

4 Method
4.1 SQL injection
4.2 Data collection
4.3 Naive Bayes classifier
4.4 Regex classifier
4.5 Test setup for Python
4.6 Measurements
4.7 nDPI

5 Results
5.1 Naive Bayes classifier
5.2 Regex classifier
5.3 Regex vs naive Bayes
5.4 Regex classifier in nDPI

6 Discussion
6.1 Results
6.2 Method
6.3 The work in a wider context

7 Conclusion
7.1 Future work

Bibliography

8 Appendix
8.1 Packets generated for testing
8.2 Packets that failed identification

List of Figures

2.1 The packet structure of how the RTP header looks like.
2.2 The packet structure of how the RTCP header looks like.
2.3 Illustration of an attack against the RTP bleed Bug.
2.4 Illustration of SIP initialization.
2.5 Illustration of registration hijacking.
2.6 An example of a firewall using DPI on the router.
5.1 Throughput of regex classifier and naive Bayes classifier for different packet sizes.
5.2 CDF of the classification time of the regex classifier and naive Bayes classifier for 2000 generated packets.
5.3 CCDF of the classification time of the regex classifier and naive Bayes classifier for 2000 generated packets.
5.4 CDF and CCDF for 2000 packets
5.5 CDF of the classification time of the regex classifier and naive Bayes classifier for 2011 generated and special formatted packets
5.6 CCDF of the classification time of the regex classifier and naive Bayes classifier for 2011 generated and special formatted packets
5.7 CDF and CCDF for 2011 packets
5.8 The graph shows the distribution of the classification time of the regex classifier in the DPI on the network trace of the SIP call

List of Tables

5.1 Naive Bayes classifier on generated SQL injections.
5.2 Naive Bayes classifier on generated SQL injections and more complex special cases of SQL injection.
5.3 Regex classifier on generated SQL injections.
5.4 Regex classifier on generated SQL injections and harder special cases of SQL injection.
5.5 The throughputs impact when increasing the size of the packets for regex and naive Bayes.

1 Introduction

The use of the Internet has increased over the years [1], and it is now an integral part of our daily activities, as we often use it for everything from interacting on social media to watching videos online. Even old solutions have been remodeled to work over the Internet instead. One of these popular, modernized solutions is Voice over IP (VoIP). Telephony was previously analog and circuit-switched, but it has now moved to packet switching and VoIP. The use of VoIP has increased substantially. As of 2010 there were 34 million VoIP subscriptions, and as of 2018 there are 118.2 million subscriptions in the United States [2].

A popular protocol within VoIP is the Session Initiation Protocol (SIP). SIP manages, for example, registration, which allows clients to register their current IP address, and the ringing functionality. However, many of these functions leave room for exploits. Just like any other service running over IP, SIP is vulnerable to different types of malicious attacks, such as Denial of Service (DoS) and Man-in-the-Middle (MITM) attacks. In the case of a MITM attack, this means that someone could eavesdrop on a phone call and hear everything that is said. As of 2016, IBM Managed Security Services reported that 51% of all attacks against VoIP targeted SIP [3]. For organizations or authorities that use SIP and need their calls to remain secret because the content is confidential, this high percentage is very troubling.

Sectra Communication AB, where this Master’s thesis was written, is an organization that develops software and hardware solutions to maintain confidentiality, integrity, and availability for different types of security organizations and authorities. In this thesis, we investigate whether it is possible to use Deep Packet Inspection (DPI) to detect SQL injection attacks directed at SIP. Besides detecting the attacks, it is also essential to maintain the quality of the phone call and to avoid opening up new attacks against the new implementation. For this reason, any solution should be efficient and handle large classification volumes. By thoroughly inspecting each packet, DPI allows us to accurately detect which protocol is in use and to inspect the header and the payload for malicious content, while maintaining high throughput and low delays. Therefore, a well-maintained open-source DPI framework called nDPI is used. nDPI is a fork of OpenDPI, an open-source DPI library that is no longer maintained. nDPI is written in C, can reportedly achieve 10-Gigabit throughput on commodity hardware, and can identify over 200 protocols, which made it an excellent candidate for this thesis [4].

1.1 Motivation

Because there are vulnerabilities present in both SIP and the Real-time Transport Protocol (RTP), using them poses a risk for Sectra. However, if a proof-of-concept method can show that at least some of these vulnerabilities can be mitigated using, for example, DPI, it would be an alternative worth applying, as it may help lower the risk of attacks for Sectra.

DPI is very good at identifying protocols [5] and inspecting payloads, which makes it an important tool for classifying packets as either "normal" or containing malicious data. However, going thoroughly through the payload requires a lot of processing power and might introduce processing delays. Because of this, it is relevant to investigate whether DPI is a good candidate for finding malicious attacks against protocols such as SIP and RTP, and whether such classification can be done in real time.

As of 2019, SQL injection is a very common attack [6], making it both important and interesting to investigate. Even if the servers are protected against SQL injection, an identified SQL injection attempt indicates that there is probably a malicious actor within the network trying to exploit the system. Such an identification is then worth highlighting for the system administrator, the active callers, and other relevant persons.

1.2 Aim

This Master’s thesis aims to investigate whether it is possible to detect malicious SQL injection attacks within SIP using DPI in real time. The investigation is mostly tailored to Sectra’s architecture but is also applicable in the wild, as similar architectures are used by others. The DPI-based classification solutions considered must be fast enough to maintain high classification throughput and low processing times, as the quality of the call is essential.

1.3 Research questions

The research questions considered in this thesis are:

1. What are the delays and accuracy of using a regex classifier or naive Bayes classifier to classify a packet as SQL injection?

2. How robust are a regex classifier and a naive Bayes classifier against new attacks, such as DoS?

3. Can a regex classifier or naive Bayes classifier be implemented together with DPI without affecting the

2 Background

In this chapter, background and related work are presented. The background presents attacks, concepts, and protocols that are used in the method. This is to give a further understanding of why it is relevant to understand VoIP, possible exploits, and the method.

2.1 Real-time Transport Protocol (RTP)

RTP is a protocol intended for end-to-end media transfer in real time. Because it is a real-time protocol, it is preferably run on top of the User Datagram Protocol (UDP) to reduce delays while making use of UDP’s multiplexing and checksum services. Using UDP means that some packets might be lost during the transfer, but this can be made largely unnoticeable with a suitable error concealment algorithm. This means that the Transmission Control Protocol (TCP), which would guarantee no packet loss, is not necessary and would only introduce delays. RTP does not rely on the underlying network to deliver packets in sequential order; however, it does rely on the underlying network to deliver them within a reasonable time [7].

Figure 2.1 illustrates the layout of the RTP header. Each field of the header is described below.

Version (V): Represents the version of RTP.

Padding (P): Indicates whether the packet contains one or more additional padding octets. The padding might be required by, for example, an encryption algorithm. If the bit is set, padding is present.

Extension (X): Indicates whether the fixed header is followed by exactly one header extension.

CSRC count (CC): Represents the number of CSRC identifiers that follow the fixed header.

Marker (M): The interpretation is defined by a profile. The marker is intended to allow important events, such as frame boundaries, to be marked in the packet stream.

Figure 2.1: The packet structure of the RTP header.

Payload Type (PT): Represents the format of the payload and how the application should interpret it. The payload type can be changed during a session by an RTP source.

Sequence Number: Identifies the position of the packet in the sequence. The sequence number is incremented by one for each RTP packet sent, so the receiver can use it to detect packet loss when RTP is used over UDP. The sequence number’s initial value should be unpredictable to complicate known-plaintext attacks on encrypted data.

Timestamp: Represents the sampling instant of the first octet in the RTP data packet. To allow synchronization and jitter calculation, the sampling instant must be derived from a clock that increments monotonically and linearly in time.

SSRC identifier: Identifies the synchronization source. The value of this field should be chosen randomly so that no two synchronization sources within the same RTP session have the same SSRC value.

CSRC identifiers: Identify the contributing sources for the payload contained in the given packet. The CSRC count specifies the number of identifiers; if there are more than 15 contributing sources, only 15 can be identified.
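To make the field descriptions above concrete, the following is a minimal sketch of how the fixed RTP header could be decoded in Python. The bit widths (2-bit version, 4-bit CSRC count, 7-bit payload type, 16-bit sequence number, 32-bit timestamp and SSRC) are taken from the RTP specification [7] rather than from the text above, and the names `RtpHeader` and `parse_rtp_header` are chosen for illustration only.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: tuple


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet is too short for the announced CSRC count")
    csrc = struct.unpack(f"!{csrc_count}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=csrc_count,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )


# Hypothetical packet: version 2, no padding, no extension, no CSRC entries.
example_packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678)
header = parse_rtp_header(example_packet)
```

Because the CSRC count is a 4-bit field, it can never exceed 15, which is why at most 15 contributing sources can be listed.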

2.2 Real-time Transport Control Protocol (RTCP)

The Real-time Transport Control Protocol (RTCP) is the sister protocol of RTP. It is used to send out statistics, such as round-trip delay, packet count, and packet loss. It does not transport any media. However, because the statistics are sent to every participant of the session, everyone can adjust their Quality of Service (QoS) parameters, for example by limiting flow, in an attempt to maximize the QoS. RTP tends to use an even-numbered port, and RTCP the next higher odd-numbered port [8]. The Request for Comments (RFC) specifies that 5% of the session bandwidth should be allocated to RTCP and that 25% of the RTCP bandwidth should be dedicated to senders. This is useful when many participants are only receiving and a small percentage are sending, as new participants will then get information about the session faster. It is also recommended that the reporting interval be dynamic, depending on the number of participants, with a minimum of five seconds [7].
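As a small worked example of the bandwidth shares mentioned above, the sketch below splits a session bandwidth into the RTCP share and the portion of it reserved for senders. The 64 kbit/s session bandwidth is an assumed figure for illustration, and `rtcp_bandwidth_split` is a hypothetical helper name.

```python
def rtcp_bandwidth_split(session_bandwidth_bps: float,
                         rtcp_fraction: float = 0.05,
                         sender_fraction: float = 0.25):
    """Return (total RTCP, sender share, receiver share) in bits per second."""
    rtcp_total = session_bandwidth_bps * rtcp_fraction
    senders = rtcp_total * sender_fraction
    receivers = rtcp_total - senders
    return rtcp_total, senders, receivers


# Assumed example: a 64 kbit/s audio session.
total, senders, receivers = rtcp_bandwidth_split(64_000)
print(f"RTCP: {total:.0f} bit/s, senders: {senders:.0f} bit/s, "
      f"receivers: {receivers:.0f} bit/s")
```

For this assumed session, roughly 3200 bit/s would go to RTCP, of which about 800 bit/s is reserved for senders and the remaining 2400 bit/s is shared among receivers.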

RTCP carries five types of control information. The first is the Sender Report (SR), which carries transmission and reception statistics from participants that are active senders. The second is the Receiver Report (RR), which is similar to SR but comes from participants that are not active senders. The third is Source Description Items (SDES), which describes the source and includes the canonical name. The fourth is BYE, which indicates the end of participation and should be the last packet sent. The fifth and final type is APP, which is defined by the application and may vary.

The RTCP header is similar to the RTP header. Figure 2.2 illustrates the layout of the RTCP header.

Figure 2.2: The packet structure of the RTCP header.

Version (V): Represents the version of RTCP, which is the same as in RTP packets.

Padding (P): Indicates whether the packet contains one or more additional padding octets. The padding might be required by, for example, an encryption algorithm. If the bit is set, padding is present.

Reception report Count (RC): Represents the number of reception report blocks within a packet.

Packet Type (PT): Represents the packet type. An RTCP SR packet is identified by the constant 200.

Length: Represents the length of the packet, including the header and any padding. The value is expressed in 32-bit words minus one.

SSRC identifier: Represents the synchronization source identifier for the originator of the packet.
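The RTCP fields listed above can be decoded in much the same way as the RTP header. The sketch below is illustrative only: the field widths and the packet type codes other than 200 (201 for RR, 202 for SDES, 203 for BYE, 204 for APP) come from the RTP specification [7], not from the text above, and `parse_rtcp_header` is a hypothetical name.

```python
import struct

# Packet type codes as defined in the RTP/RTCP specification.
RTCP_PACKET_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_header(packet: bytes) -> dict:
    """Decode the first 8 bytes of an RTCP packet (common header plus SSRC)."""
    if len(packet) < 8:
        raise ValueError("an RTCP packet needs at least 8 header bytes")
    b0, packet_type, length_words, ssrc = struct.unpack("!BBHI", packet[:8])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "report_count": b0 & 0x1F,
        "packet_type": RTCP_PACKET_TYPES.get(packet_type, f"unknown ({packet_type})"),
        # The length field counts 32-bit words minus one.
        "length_bytes": (length_words + 1) * 4,
        "ssrc": ssrc,
    }


# Hypothetical Sender Report without reception report blocks:
# 8 header bytes plus 20 bytes of sender information gives 28 bytes,
# that is, a length field of 6.
sr_header = parse_rtcp_header(struct.pack("!BBHI", 0x80, 200, 6, 0xCAFEBABE))
```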

2.3 The RTP bleed Bug

A severe bug that was found in 2011 is the RTP bleed Bug. It arises when a Network Address Translator (NAT) is present on the local network, because the proxy then cannot rely on the IP address and port delivered by SIP. Instead, many RTP proxies developed a method to learn the IP address and port automatically, by looking at the source IP address and port of incoming RTP packets and using them as the destination for return traffic. The exploit takes advantage of the fact that this learning method does not require any authentication. The attacker can therefore send RTP traffic to the proxy and receive someone else’s proxied RTP traffic. What makes this exploit even more harmful is that the attacker does not need to intercept the traffic between the user and the proxy server, that is, to carry out a MITM attack. The attack can also be seen as DoS, because the proxy will start sending the RTP packets to the attacker and the original caller will stop receiving them. However, this is not always the case, as the proxy might in practice send some packets to the attacker and some to the original caller. In this case, the original caller may notice that the audio sounds choppy, but the call continues. The attack can likely be identified by UDP port scan detection, but this should be seen as a temporary mitigation rather than a fix [9].

Figure 2.3 illustrates how an attacker can exploit the RTP bleed Bug. In this example, neither Bob nor Eve is on the same network as Alice, but both send their RTP packets to the same proxy.

a) A VoIP call is established between Alice and Bob.

b) Alice then starts to send RTP packets to Bob, through the NAT and the proxy. The proxy now applies the learning method to know which IP address to use.

c) Bob now starts sending his RTP packets to Alice.

d) Eve, the attacker, now starts sending RTP packets to Bob through the proxy. The proxy’s learning method becomes unsure of which IP address to use and will switch between Alice and Eve when Bob is trying to send to Alice.
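The steps above can be mimicked with a toy model of a proxy whose learning method simply trusts the source address of the most recent RTP packet. This is a deliberately simplified sketch and not the behavior of any particular proxy implementation; the class name, method names, and addresses are all hypothetical.

```python
class NaiveLearningProxy:
    """Toy RTP proxy that learns the return address without authentication."""

    def __init__(self):
        self.learned_peer = None

    def on_rtp_towards_bob(self, source):
        # Any packet updates the learned address: nothing checks that
        # the sender actually belongs to the call.
        self.learned_peer = source

    def target_for_bobs_packet(self):
        # Bob's RTP packets are relayed to whatever address was learned last.
        return self.learned_peer


ALICE = ("198.51.100.10", 40000)  # hypothetical address behind the NAT
EVE = ("203.0.113.66", 40002)     # hypothetical attacker address


def simulate(senders):
    """Feed packets from the given senders and record where Bob's replies go."""
    proxy = NaiveLearningProxy()
    delivered_to = []
    for source in senders:
        proxy.on_rtp_towards_bob(source)
        delivered_to.append(proxy.target_for_bobs_packet())
    return delivered_to


# Alice talks first, then Eve starts injecting packets in between.
delivered = simulate([ALICE, ALICE, EVE, ALICE, EVE])
```

In this model, Bob's packets alternate between Alice and Eve as soon as Eve starts sending, which corresponds to the choppy audio and the leaked media described above.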

Software known to be affected by this bug includes Asterisk 14.4.0 and RTPproxy 2.2.alpha.20160822 (git) [9].

Figure 2.3: Illustration of an attack against the RTP bleed Bug.

2.4 Inserting jitter in RTP

A form of DoS attack that is hard to detect is sending bogus RTP packets. Adams et al. introduced this form of attack to show that it can pass through an Intrusion Detection System (IDS) and be seen as regular traffic [10]. By sending only a few packets, the attacker avoids having them flagged as DoS packets, and inspecting the payload to identify it as bogus data was too time-consuming. Because of this, the attack could successfully be deployed to lower the quality of a phone call. As each RTP packet contains about 20-60 ms of actual voice data, it was hard to deny the call altogether. However, it was possible to make it difficult for the two parties to hear each other, as there were many interruptions in the call. Over a time span of 1 minute, 3000 regular RTP packets and 857 bogus RTP packets were sent.

Furthermore, in theory, it does not matter whether the data is encrypted or unencrypted, as it cannot be flagged as bogus until it reaches the target, who hears that it is bogus. To flag the data as bogus, a classification of what constitutes valid and invalid data would have to be applied. Even then, the attacker could send data that is valid in itself but not valid in the context of the call.
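As a quick check of the packet counts reported for this attack, the following arithmetic sketch derives the packet rates and the share of bogus traffic. It assumes that the regular packets were sent at a constant rate over the whole minute, which the source does not state explicitly.

```python
def jitter_attack_figures(regular_packets: int, bogus_packets: int, duration_s: float):
    """Derive packet rates and the bogus share from the reported counts."""
    regular_rate = regular_packets / duration_s   # regular packets per second
    ms_per_regular_packet = 1000 / regular_rate   # voice time covered per packet
    bogus_rate = bogus_packets / duration_s       # bogus packets per second
    bogus_share = bogus_packets / (regular_packets + bogus_packets)
    return regular_rate, ms_per_regular_packet, bogus_rate, bogus_share


regular_rate, ms_per_packet, bogus_rate, bogus_share = jitter_attack_figures(3000, 857, 60)
print(f"{regular_rate:.0f} regular packets/s, {ms_per_packet:.0f} ms per packet")
print(f"{bogus_rate:.1f} bogus packets/s, {bogus_share:.1%} of all packets")
```

Under that assumption, the regular stream corresponds to 50 packets per second, or 20 ms of voice per packet, which sits at the lower end of the 20-60 ms range, and a little over one in five packets on the wire was bogus.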

2.5 Session Initiation Protocol (SIP)

SIP is a protocol used for signaling and controlling media communication sessions. It is used by Internet telephony applications for video and voice calls, both in private networks and in Voice over Long-Term Evolution (VoLTE). The protocol also allows new participants to be invited to an already established session. However, the protocol is not complete in and of itself for Internet telephony; rather, it is a component that is used with other protocols to establish a complete VoIP call. One of these protocols is, of course, RTP, which is used to carry the actual voice and video data, but other protocols, such as the Session Description Protocol (SDP), can be used with it as well [11].

SIP includes five aspects of establishing and terminating a session:

Session Setup: The establishment of the parameters of the session at both the calling and the called party.

Session Management: The transfer and termination of sessions, the invocation of services, and the adjustment of parameters.

User Location: Determination of which end system will be used.

User Capabilities: Determination of which media and media parameters will be used.

User Availability: Determination of whether the called party will engage in communications.

The SIP header is similar to the HTTP header in that it is text-based, in contrast to RTP, where the header is represented by bits. SIP also uses an HTTP-like request and response transaction model. Each transaction consists of a request that invokes a method on the server and at least one response. A SIP session starts with a user, for example Alice, sending an INVITE through a proxy to Bob. Just as in HTTP, the header of Alice’s INVITE request contains numerous named header fields that provide further information. The fields of an INVITE include a unique identifier, Alice’s address, the destination address, and the type of session that Alice would like to establish. Listing 2.1 illustrates a typical INVITE from the client.

Listing 2.1: SIP INVITE example [11].

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
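Because SIP messages are text-based, their header fields can be extracted with ordinary string handling. The following is a minimal sketch, assuming one header per line; it ignores folded (multi-line) headers, repeated header names, and the message body, and `parse_sip_request` is a hypothetical helper rather than part of any SIP library.

```python
def parse_sip_request(raw: str):
    """Split a SIP request into method, URI, version, and a header dictionary."""
    lines = raw.strip().splitlines()
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # an empty line separates the headers from the body
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


INVITE_EXAMPLE = """\
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
"""

method, uri, version, headers = parse_sip_request(INVITE_EXAMPLE)
```

A classifier that only looks at "headers of interest" could then operate on individual entries of the resulting dictionary, for example `headers["from"]` or `headers["contact"]`.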

SIP has six core methods, which are used to establish, modify, or tear down a connection. The first method is INVITE, shown in Listing 2.1, which is used to start a session with another client. The second method is BYE, which terminates an ongoing session and can be sent by either client. The third method is REGISTER, which is a request to register a user with the registrar server. The fourth method is CANCEL, which is similar to BYE except that it terminates a session that has not yet been established. The fifth method is ACK, which acknowledges the final response to an INVITE. The sixth and last method is OPTIONS, which is used to query a client or proxy server to discover its current availability and list its capabilities.

Figure 2.4 illustrates the initialization of a SIP call. It is worth mentioning that the SIP registrar and the SIP proxy can be either separate units or the same unit, depending on the implementation of the SIP architecture.

a) Alice (the caller) and Bob send a REGISTER to their corresponding SIP registrar.

b) Alice sends an INVITE to Bob through Alice’s proxy with the intent to start a VoIP call with Bob.

c) Alice’s proxy forwards the INVITE to Bob’s proxy.

d) Bob’s proxy asks the SIP registrar for Bob’s IP address.

e) Bob’s proxy contacts Bob.

f) Bob answers and sends back a 200 OK.

g) Alice sends an ACK, and the session is now established, and RTP packets can be sent to Bob.

Figure 2.4: Illustration of SIP initialization.

In addition to the core methods, there are also extension methods. These methods are not required for a working SIP session but can provide additional services. There are eight extension methods. The first is SUBSCRIBE, which allows a client to be notified whenever a specified event occurs. The second is NOTIFY, which is related to SUBSCRIBE: it notifies the subscriber that a specified event has occurred. The third is PUBLISH, whose purpose is to inform the server about the state of an event. The fourth is REFER, which a client uses to refer another client to a URI. The fifth is INFO, which is used to send call signaling information to another client with which a session has been established. The sixth is UPDATE, which is used to update the session’s state while the session is not yet fully established. The seventh is PRACK, which acknowledges a reliable provisional response. The eighth and last is MESSAGE, which is used to send a message to another client using SIP.

2.6 Registration hijacking

By exploiting the REGISTER method, an attacker can redirect all new incoming packets to himself instead of the legitimate user. This is done by taking advantage of the Contact header. If the attacker sends a modified REGISTER with the same headers, except for the Contact header, where the IP address is changed to the attacker’s IP address, the proxy will forward incoming calls to the attacker, and the caller will believe that this is the correct person. This only works if the legitimate user’s REGISTER is either outraced by repeatedly sending REGISTER requests in a short time frame, deregistered, or prevented from reaching the proxy, which can be achieved by, for example, a DoS attack against the client. After this, the attacker sends its REGISTER with the new IP address. The attacker can now choose either to prevent any packets from being sent to the client or to act as a man in the middle and eavesdrop on the phone call [12]. Figure 2.5 illustrates how the attack works.

Figure 2.5: Illustration of registration hijacking.

) Eve (the attacker) starts by performing a DoS attack against Bob (the client), who is trying to register himself.

a) Alice (the caller) sends a REGISTER to the SIP registrar. Eve does the same, using the parameters that Bob would use, except that the IP address is replaced with Eve’s own.

b) Alice sends an INVITE to Bob through Alice’s proxy, intending to start a VoIP call with Bob.

c) Alice’s proxy performs a domain lookup and then forwards the INVITE to Bob’s proxy.

d) Bob’s proxy asks the SIP registrar for Bob’s IP address but receives Eve’s IP address.

e) Bob’s proxy believes it is contacting Bob but contacts Eve instead.

f) Eve answers and sends back a 200 OK.

g) Alice sends an ACK. The session is now established, and RTP packets can be sent to Eve while Alice thinks she is talking to Bob.
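The core of the attack is that the registrar keeps a single binding per user and hands out whatever contact address was registered last. The following is a minimal sketch of that behavior, not a real SIP registrar: the `Registrar` class, the missing authentication step, and the address `eve.example.net` are assumptions made for illustration.

```python
class Registrar:
    """Toy location service: maps an address-of-record to one contact address."""

    def __init__(self):
        self.bindings = {}

    def register(self, aor, contact):
        # Simplifying assumption: no authentication, the last REGISTER wins.
        self.bindings[aor] = contact

    def lookup(self, aor):
        # Returns None when no binding exists for the address-of-record.
        return self.bindings.get(aor)


registrar = Registrar()

# Bob's own REGISTER never arrives (for example because of a DoS attack),
# so Eve's forged REGISTER with her own contact address is the only binding.
registrar.register("sips:bob@biloxi.example.com", "sips:bob@eve.example.net")

# Bob's proxy resolves Bob's address-of-record and is handed Eve's contact.
print(registrar.lookup("sips:bob@biloxi.example.com"))
```

A real registrar would challenge the REGISTER with digest authentication, which is exactly the step the SQL injection in the next section targets.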

2.7 SQL injection in SIP

The SQL injection attack is the most observed attack [6] and is well known from HTTP, where the server tends to use information from the request to get, insert, update, or delete data in a database. SIP uses a database in a similar way, as it stores user credentials and other data needed to provide value-added services to the end user. The SIP server can, for example, look up a username, contacts, and authorization data with SQL queries. Open-source SIP implementations tend to support MySQL and PostgreSQL and use various tables, most importantly the critical tables Subscriber and Location. Subscriber, which can be seen as the most essential, stores information about the user, such as username, password, and domain. This makes it a prime target for SQL injection, as an attacker might want to authenticate as another person, and the Subscriber table is used during this process [13].

Listing 2.2 shows an example of what a SQL injection can look like during authorization with REGISTER. It exploits the username field by appending an UPDATE query after the username value.

Listing 2.2: SIP REGISTER SQL injection example.

REGISTER sips:ss2.biloxi.example.com SIP/2.0
Via: SIP/2.0/TLS client.biloxi.example.com:5061;branch=z9hG4bKnashd92
Max-Forwards: 70
From: Bob <sips:bob@biloxi.example.com>;tag=ja743ks76zlflH
To: Bob <sips:bob@biloxi.example.com>
Call-ID: 1j9FpLxk3uxtm8tn@biloxi.example.com
CSeq: 2 REGISTER
Contact: <sips:bob@client.biloxi.example.com>
Authorization: Digest username="bob';UPDATE subscriber SET password='simple'
    where username='bob'--", realm="atlanta.example.com"
    nonce="ea9c8e88df84f1cec4341ae6cbe5a359", opaque="",
    uri="sips:ss2.biloxi.example.com",
    response="dfe56131d1958046689d83306477ecc"
Content-Length: 0

A SQL injection attack can be used either to make authentication succeed or to make the SIP proxy unusable (a form of DoS attack). However, SQL injection is today a well-known attack, and most servers are protected against it. This means that SQL injection in and of itself is not a threat to such servers, but if one is detected, it indicates that there might be a malicious user on the network.
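To make the mechanism concrete, the sketch below contrasts a lookup that builds its query by string concatenation with one that uses a bound parameter. It is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL or PostgreSQL, the table layout is reduced to two columns, and the function names are hypothetical. The payload is the username value from Listing 2.2.

```python
import sqlite3


def make_db():
    """Create a tiny stand-in for the subscriber table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriber (username TEXT, password TEXT)")
    conn.execute("INSERT INTO subscriber VALUES ('bob', 'secret')")
    return conn


def lookup_unsafe(conn, username):
    # Vulnerable: the username is pasted into the SQL text, so a quote in the
    # input ends the string literal and the rest is parsed as SQL.
    sql = "SELECT password FROM subscriber WHERE username='" + username + "'"
    conn.executescript(sql)  # executescript runs stacked statements


def lookup_safe(conn, username):
    # Parameterized: the whole input is treated as one value, never as SQL.
    query = "SELECT password FROM subscriber WHERE username=?"
    return conn.execute(query, (username,)).fetchall()


def current_password(conn):
    row = conn.execute(
        "SELECT password FROM subscriber WHERE username='bob'"
    ).fetchone()
    return row[0]


payload = "bob';UPDATE subscriber SET password='simple' where username='bob'--"

unsafe_db = make_db()
lookup_unsafe(unsafe_db, payload)
print("after unsafe lookup:", current_password(unsafe_db))

safe_db = make_db()
lookup_safe(safe_db, payload)
print("after safe lookup:", current_password(safe_db))
```

With the concatenated query the stacked UPDATE runs and Bob's password is overwritten; with the bound parameter the payload is only compared against stored usernames and matches nothing.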

2.8 Session Description Protocol (SDP)

Session Description Protocol (SDP) is a protocol intended to describe multimedia communication sessions. This includes session announcement, session invitation, and parameter negotiation. SDP, like SIP, does not itself carry any media data; instead it describes which media types and related properties are to be used in the session. It can be used as a standalone format or be included in the payload of RTP, SIP, or even HTTP.

SDP is structured as one variable=value pair per line, similar to HTTP headers. There are three types of description in SDP: Session, Time, and Media. Each of these has mandatory and optional values [14]. Each value is described below, and the values must appear in the given order. Variables marked with an asterisk (*) are mandatory.

Session Description

*v The protocol version number.

*o Id, username, network address and version number.

*s The session name.

i A title or short summary.

u URI of the description.


e Email address, and optional name of contacts.

p Phone number, and optional name of contacts.

c Information about the connection.

b Information about bandwidth.

One or more Time descriptions

z Adjustments of time zone.

k Encryption key.

a Session attribute lines.

Zero or more Media descriptions

One or more Time Descriptions have to be included, but there is no such requirement for Media Descriptions. The order in which a Time Description and a Media Description should be formatted is given below.

Time Description

*t Time when the session is active.

r Number of repeat times.

Media Description (optional)

*m Transport address and name of the media.

i Title or summary of the media.

c Information about the connection.

b Information about bandwidth.

k Encryption key.

a Media attribute lines.

What an SDP description looks like can vary a lot depending on the use case. An example of a detailed SDP description is shown in Listing 2.3.

Listing 2.3: SDP description example [14].

v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.example.com/seminars/sdp.pdf
e=j.doe@example.com (Jane Doe)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
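Because every SDP line has the form of a one-letter type, an equals sign, and a value, a description can be read with very little code. The sketch below is illustrative only: the function names are hypothetical, it checks only the fields marked as mandatory in the lists above (v, o, s, t), and it does not validate field order or value syntax.

```python
MANDATORY = ("v", "o", "s", "t")


def parse_sdp(text):
    """Return a list of (type, value) pairs, one per non-empty SDP line."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if sep != "=" or len(key) != 1:
            raise ValueError(f"not a valid SDP line: {line!r}")
        fields.append((key, value))
    return fields


def missing_mandatory(fields):
    """List the mandatory field types that do not occur in the description."""
    present = {key for key, _ in fields}
    return [key for key in MANDATORY if key not in present]


EXAMPLE = """\
v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
"""

fields = parse_sdp(EXAMPLE)
print("missing:", missing_mandatory(fields))
print("media:", [value for key, value in fields if key == "m"])
```

Keeping the fields as an ordered list rather than a dictionary matters here, since types such as m and a may occur several times and their order carries meaning.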


2.9 Deep Packet Inspection (DPI)

DPI is a method for inspecting packets at a given inspection point, for example at a proxy server. Compared to a typical firewall, which only reads the labels or headers of a packet, DPI does precisely what the name states. DPI goes through layer three up to layer seven (the application layer) of the OSI model, including the layer-seven payload. Because of this, it is considered more effective than Shallow Packet Inspection and Medium Depth Packet Inspection, which are methods that do not go through the whole protocol stack. With this approach, it is possible to set up rules that block or allow different types of packets in real time without lengthy delays.

Furthermore, as DPI detects different protocols, it can be used to prioritize different types of packets, such as VoIP packets. It can also protect against certain kinds of threats, such as DoS attacks, buffer overflow attacks, and malware [15].

Figure 2.6 illustrates two clients connected to a router using DPI. Using DPI on the router allows for blocking and analyzing packets, inspecting SSL traffic, opening and closing ports, and warding off SSL sniffing. Moreover, this is done in real time with minimal delay.

Figure 2.6: An example of a firewall using DPI on the router.

DPI can be implemented either as a hardware solution, which allows faster processing, or as a software solution, which makes it more extensible [16]. Which one to use depends on the situation. For an Internet Service Provider (ISP), for example, a hardware solution is best suited, as vast amounts of data need to be processed with minimal delay and high throughput. When high throughput is less of a requirement, a software solution is more suitable, as its rules are easier to change and maintain.

2.10 nDPI

nDPI is an open-source project maintained by ntop. It is a fork of the no longer maintained open-source project OpenDPI. OpenDPI had limitations such as a lack of thread safety, and after identifying a packet as a certain protocol, it continued trying to match it against more protocols rather than returning at the first match. nDPI resolved many of the issues present in OpenDPI and added support for more protocols. For example, OpenDPI did not support more modern protocols such as Skype or encrypted protocols such as HTTPS, which nDPI does support. Because OpenDPI was written in C, nDPI continued on the same path and is also written in C. However, whereas OpenDPI required dissectors to be written in C, nDPI took another approach by allowing a runtime configuration directive to be used instead of changes to the C code.


At most, nDPI needs eight packets to identify a protocol, for example BitTorrent, whose signature is more difficult to classify. However, many protocols require only one packet, such as DNS, SNMP, or NetFlow. nDPI has a high true-positive rate for protocols such as Skype, HTTP, SSH, and DropBox, all of which are above 90%. Other protocols that are harder to identify, such as encrypted BitTorrent and TOR, reach 54.41% and 33.51%, respectively [17]. To identify patterns related to different protocols quickly and accurately, nDPI uses the Aho-Corasick algorithm, a string-searching algorithm with low time complexity [18]. In 2013, it was shown that nDPI had the best accuracy compared to other open-source traffic classifiers, such as PACE, UPC MLA, and Libprotoident [5]. As of February 2019, nDPI is still regularly maintained on GitHub, with over 1400 commits and just under 70 contributors [4].
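The appeal of Aho-Corasick is that it finds all occurrences of many patterns in a single pass over the input. The following is a compact textbook-style sketch of the algorithm in Python, written for illustration; it is not nDPI's C implementation, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure, and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [set()]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)
    # Breadth-first pass: a state's failure link points to the longest proper
    # suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every match, scanning text once."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", automaton)))
```

The failure links are what keep the scan linear in the length of the text: on a mismatch the automaton falls back to the longest suffix that can still lead to a match instead of restarting from the beginning.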

2.11 Naive Bayes classifier

The naive Bayes classifier is a straightforward classifier that assigns class labels to instances represented as vectors of features. The features are assumed to be independent of each other, meaning that x_i and x_j are treated as uncorrelated. In practice, this means that if, for example, a black box with an area of 1 m² is considered to be a shape, a naive Bayes classifier treats each of these features as contributing independently to the probability that this shape is a box. The correlations between the color, the area, and the box itself are not taken into account. The representation of the feature vector can vary. The most common way is to count how many times each word appears in the text. Another approach, which accounts for n-grams, is to express the features as sequences of n words. This means that the text "I am happy" also results in the features "I am" and "am happy".

Naive Bayes is a conditional probability model structured as Equation 2.1, where x = (x_1, ..., x_n) is a feature vector containing independent features and y_k is the class label.

\[ p(y_k \mid x_1, \ldots, x_n) = p(y_k \mid x) \tag{2.1} \]

This can be reformulated by applying Bayes’ theorem and can then be described as the prior multiplied by the likelihood and divided by the evidence:

\[ p(y_k \mid x) = \frac{p(y_k) \cdot p(x \mid y_k)}{p(x)} \tag{2.2} \]

Because the denominator is independent of y and the values of the features in x are given, the denominator is constant. This gives us Equation 2.3:

\[ \frac{p(y_k) \cdot p(x \mid y_k)}{p(x)} \propto p(y_k) \cdot p(x \mid y_k) \tag{2.3} \]

To determine which class an instance most likely belongs to, the classifier selects the class label y_k with the highest probability over all such labels. This can be expressed as Equation 2.4:

\[ \arg\max_{k} \left( p(y_k) \cdot \prod_{i=1}^{n} p(x_i \mid y_k) \right) \tag{2.4} \]

There are two major approaches to use with naive Bayes: the multi-variate Bernoulli model and the multinomial model [19]. In the multi-variate Bernoulli model, the feature vector is binary: each entry represents whether the corresponding word is present in the document or not.

The multi-variate Bernoulli model probability is calculated as described in Equation 2.5, where p_{ki} represents the probability of the class y_k generating word i.

\[ p(x \mid y_k) = \prod_{i=1}^{n} p_{ki}^{x_i} (1 - p_{ki})^{1 - x_i} \tag{2.5} \]

The multinomial model probability is calculated as described in Equation 2.6.

\[ p(x \mid y_k) = \frac{\left( \sum_{i=1}^{n} x_i \right)!}{\prod_{i=1}^{n} x_i!} \prod_{i=1}^{n} p_{ki}^{x_i} \tag{2.6} \]
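Equations 2.4 and 2.5 can be turned into a few lines of code. The sketch below implements a multi-variate Bernoulli naive Bayes classifier from scratch on a toy data set. Two things are assumptions that do not come from the text above: the probabilities p_{ki} are estimated with add-one (Laplace) smoothing so that no probability is exactly 0 or 1, and the products are computed as sums of logarithms to avoid numerical underflow. The training documents and label names are made up for the example.

```python
import math


def train_bernoulli_nb(docs, labels, vocab):
    """Estimate the priors p(y_k) and the word probabilities p_ki."""
    prior, p = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, l in zip(docs, labels) if l == c]
        prior[c] = len(class_docs) / len(docs)
        # Add-one smoothing (assumption): (count + 1) / (documents + 2).
        p[c] = {
            w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
            for w in vocab
        }
    return prior, p


def predict(tokens, prior, p, vocab):
    """Pick the class maximizing log p(y_k) + sum_i log p(x_i | y_k)."""
    present = set(tokens)
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for w in vocab:
            # Equation 2.5: p_ki if the word is present, 1 - p_ki otherwise.
            score += math.log(p[c][w] if w in present else 1 - p[c][w])
        scores[c] = score
    return max(scores, key=scores.get)


docs = [["select", "from"], ["update", "set"], ["bob"], ["alice"]]
labels = ["sqli", "sqli", "valid", "valid"]
vocab = sorted({w for d in docs for w in d})

prior, p = train_bernoulli_nb(docs, labels, vocab)
print(predict(["select", "from", "subscriber"], prior, p, vocab))
print(predict(["bob"], prior, p, vocab))
```

Note that words outside the vocabulary, such as "subscriber" here, are simply ignored, while every vocabulary word contributes to the score whether it is present or absent. This is what distinguishes the Bernoulli model from the multinomial one.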


3 Related Work

This chapter provides an overview of what has been done within the area of this thesis and how it relates to this thesis.

3.1 Vulnerability investigation in VoIP and VoLTE

VoIP, in the form of VoLTE, has vulnerabilities in many forms. Li et al. [20] present some of the weaknesses that are present in VoLTE. These vulnerabilities allow attacks not only on the individual but also on the operator. The attacks presented in the paper mostly exploit how Long Term Evolution (LTE) works, taking advantage of the fact that the mobile operator has special rules for VoIP protocols. Such rules can be, for example, priorities that fulfill QoS requirements for real-time call quality. This can be directly exploited by masquerading other data as VoIP traffic, which gives the attacker higher throughput and lower delay. Other papers [21–23] also bring up this exploit, where the goal is to hide secret information from the operator by imitating common protocols such as Skype, or at least to bypass what the operator has blocked. These papers do, however, have another perspective, as their context is censorship in non-democratic countries rather than an attacker exploiting the rules to achieve better performance.

With their paper, Li et al. aim to confirm the discussed vulnerabilities present in VoLTE and to recommend fixes for each of them. One particular attack, which is based on DoS, is more relevant for this thesis. The attack mutes the victim, and to make this possible the session ID must be acquired. With the session ID, one can forge RTP packets with a correct session ID and inject them into the voice bearer, which mutes both the uplink and the downlink. The fix recommended by Li et al. contains two different approaches. The first, naive approach is simply to abandon high-priority QoS for VoLTE. In the other approach, once traffic is detected as forged, its data volume is accounted for and traced back to the source, and once the volume exceeds a given threshold, the priority is lowered.


Tu et al. also illustrate the attack in which it is possible to mute the victim [24]. However, they also present another vulnerability of VoLTE: it can be exploited to drain the battery at a pace 5-8 times faster than average.

There are also other attacks against VoLTE that allow for free-of-charge data [25]. This is also confirmed by Li et al. in their paper. The attack works by imitating other protocols that the operator offers free of charge. An operator might offer VoLTE calls for free, and by hiding data within a SIP packet that could be present in a VoLTE call, an attacker could send data free of charge.

A very recent paper by Shintre et al. brings up the problem of communication records revealing private relations between two parties [26]. This is achieved, though not guaranteed, by capturing the interaction frequency over time, the existence of mutual friends, the communication mutuality, and information about recent interaction. To some extent, it can be avoided by using private networks, but these are still vulnerable to private communication detection (PCD). Shintre et al. studied the efficacy of PCD for a scenario containing two communicating targets and an eavesdropper in the middle. They developed a mathematical model that represents the calling behavior of the targets and the probing strategy of the eavesdropper. On top of that, they analyzed the efficacy of countermeasures. This included resource randomness, which is supposed to add noise to the attacker’s observation process and reduce the anonymity leakage. Another countermeasure that is relevant for this thesis is the use of a VPN, which is present in VoIP calls in Sectra’s system. This approach may seem reasonable enough, but it is still insufficient for preventing PCD, for two reasons. The first is that the attacker may be within the same VPN. The second is that the attacker may still be able to send probes to and receive responses from the target’s device despite being outside the private network. This means that using a VPN is not sufficient to prevent PCD. They conclude that their approach, using resource randomness, is the best way to avoid PCD due to the introduction of noise.

A popular application to investigate with regard to VoIP is Skype [27–29]. Zhu et al. investigated the privacy of Skype [30]. One of the attacks they investigate is made by collecting Skype call traces from a victim. Application-level features are then extracted to train a Hidden Markov Model (HMM). With this approach, whenever a Skype call trace is captured, the HMM can be used to calculate the likelihood that the call trace originates from the victim on whose data the model was trained.

3.2 Performance investigation in DPI

The need for high performance in DPI does not necessarily arise because many concurrent legitimate packets are sent, but because it makes DPI more robust against DoS and DDoS attacks. A recent paper brings up the processing power limitation of using DPI in software. Jyothi et al. [16] evaluate a hardware solution called Deep Packet Field Extraction Engine (DPFEE) against other commonly used solutions for extracting layer-seven fields. They showed that using DPFEE for hardware offloading can reduce the load on critical system resources by at least 30%. It is also faster than a GPU that takes advantage of 200 parallel instances: it outperformed the GPU by being three times as fast and having 12-1020 times lower latency. However, using GPUs to speed up the typical DPI use case is still relevant, as a GPU is more portable than special-purpose hardware. Increasing the speed of pattern matching on GPUs has been extensively researched [31–35], and these studies confirm that a GPU gives better throughput than a traditional CPU. This has also been confirmed in a more recent paper [36], in which Vasiliadis et al. propose GASPP, a network traffic processing framework designed for modern graphics processors. They make special optimizations to avoid time-consuming context switches in order to achieve lower delays and better throughput. They manage to get a speedup of 16.2 times compared to monolithic GPU-based implementations. However, they still encounter problems with highly time-dependent real-time protocols such as RTP: transferring data to the GPU introduces a delay, so using GPUs to increase throughput remains problematic for real-time protocols. GASPP does, however, hide the complex computations behind the framework and exposes only the data relevant to the programmer, which increases extensibility compared to special-purpose hardware.

An approach more closely related to this thesis is presented by Jamshed et al. [37], who present an IDS called Kargus. Jamshed et al. investigate the cooperation between the CPU and GPU on commodity hardware. As stated earlier, using the GPU introduces delays because of the data transfer [36]. Because of this, Jamshed et al. try to find the packet-size threshold above which transferring the packet to the GPU gives a lower delay. They found that for packets smaller than 82 B (Ethernet frame size), multi-string matching is faster on the CPU, while larger packets perform better when transferred to the GPU. Based on this, they implement Kargus to switch dynamically between the CPU and GPU for different packets when suitable. They compare Kargus with Snort and find that Kargus can achieve up to 33 Gbps for regular traffic and around 9 to 10 Gbps when all traffic is malicious. Snort, however, failed to come near the throughput of Kargus.


4 Method

This chapter describes the method used to carry out the experiment. It presents the SQL injections, data collection, classifiers, test architecture design, hardware, and measurements.

4.1 SQL injection

This section presents the SQL injection attacks and how they can be crafted to make them harder to detect.

4.1.1 Simple SQL injection

To simulate a SQL injection, a simple UPDATE query was inserted during the REGISTER. This is done by modifying the normal network trace: the REGISTER packet of User 1 is changed into a malformed REGISTER that contains the UPDATE SQL query in the username field of the Authorization header, which makes the packet malicious. Listing 4.1 shows precisely how the packet appears to the DPI at application layer seven.

The first line of the message (Listing 4.1) states which method is used, including the server address. It is followed by the Via line, which indicates the location to which the response is to be sent. Max-Forwards specifies the maximum number of hops a request can transit. From contains the originator of the request, and To the recipient of the request. Call-ID is a unique identifier for a series of messages. CSeq serves as an identifier to track the order of transactions, and it also includes a method. Contact contains a SIP or SIPS URI that can be used for contact in subsequent requests. Authorization contains the authentication credentials of a user, as used with HTTP-style authentication.

Listing 4.1: SIP REGISTER SQL injection for User 1.

REGISTER sips:ss2.biloxi.example.com SIP/2.0
Via: SIP/2.0/TLS client.biloxi.example.com:5061;branch=z9hG4bKnashd92
Max-Forwards: 70
From: Bob <sips:bob@biloxi.example.com>;tag=ja743ks76zlflH
To: Bob <sips:bob@biloxi.example.com>
Call-ID: 1j9FpLxk3uxtm8tn@biloxi.example.com
CSeq: 2 REGISTER
Contact: <sips:bob@client.biloxi.example.com>
Authorization: Digest username="user1';UPDATE subscriber SET first_name='hacker'
    where username='bob'--", realm="atlanta.example.com"
    nonce="ea9c8e88df84f1cec4341ae6cbe5a359", opaque="",
    uri="sips:ss2.biloxi.example.com",
    response="dfe56131d1958046689d83306477ecc"
Content-Length: 0
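Since the injection sits inside one field of one header, an inspection step first has to split the message into header fields and pull out the Digest username. The sketch below shows one way to do that in Python; it is a simplified illustration with hypothetical function names, not the parser used by the DPI in this thesis, and it handles only line folding and a plain quoted username.

```python
import re


def parse_sip_headers(message):
    """Split a SIP message into its start line and a dict of header fields."""
    head = message.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    start_line, headers, last = lines[0], {}, None
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and last is not None:
            # Folded continuation line: append it to the previous header.
            headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not header fields
        last = name.strip().lower()
        headers[last] = value.strip()
    return start_line, headers


def digest_username(headers):
    """Return the quoted username of the Authorization header, if any."""
    match = re.search(r'username="([^"]*)"', headers.get("authorization", ""))
    return match.group(1) if match else None


EXAMPLE = """REGISTER sips:ss2.biloxi.example.com SIP/2.0
CSeq: 2 REGISTER
Authorization: Digest username="user1';UPDATE subscriber SET first_name='hacker' where username='bob'--",
 realm="atlanta.example.com"
Content-Length: 0

"""

start_line, headers = parse_sip_headers(EXAMPLE)
print(start_line)
print(digest_username(headers))
```

Header names are lowercased on purpose, since SIP header names are case-insensitive and the listings in this chapter use several spellings.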

4.1.2 Weirdly formatted SQL injection

A SQL injection can be formatted in very unorthodox ways and still be a valid SQL query, for example by applying multiple spaces, switching between lowercase and uppercase, and defining random variables. This must therefore be accounted for in the search for SQL injections. Listing 4.2 illustrates a weirdly formatted SQL injection, where the red text is the SQL injection and the blue text is auto-generated valid data. The SQL injection includes a random number of spaces between operators, keywords, and values. It also switches between uppercase and lowercase characters and uses random values. This makes it tricky for a naive Bayes classifier to classify correctly, as the random values do not match any SQL injection traits from the training data. As also seen in this example, the packet includes an Allow header, which specifies which methods the user agent allows.

Listing 4.2: SIP REGISTER weirdly formatted SQL injection.

REGISTER sip:87.106.136.32:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP 185.250.108.158:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:82H9Xymavbi4f4zvDi8t@87.106.136.32>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:82H9Xymavbi4f4zvDi8t@87.106.136.32>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="';SELECT salt, pass FROM subscriber;DROP TABLE subscriber;
    UpDaTe subscriber seT pass = 'okS5n',username = 'NcAiiWH8GgGIrKjM9nDkyD1Hbms1Oem9LmRC6PqFDrZsPLeSw9'
    WHEre username = 'vu7bTQ4AgsRNtvvjqzJHVtxPV6v2G5NaJ5mym' AnD pass = 'jlZLoxKsHfQxrXDlmEDq7iSeS'
    Or first_name = 'qw7DD1Ufs503EL21crIaseZ' aND salt = 'i9HE5pVZgqYFFcEcuzGjWtK5v80Ee2MRZaq9Oc86aF5TF1sVutKJw1xPZy4'
    oR last_name = '2bE88v74LLprD7fr6M8NLMheo05RV1x7rZc9x0d8Wz4y9gAOxe0aOY' ;
    DROP TABLE subscriber;TRUNCATE TABLE subscriber; ----",
    realm="87.106.136.32",
    nonce="57305a3400000c349f463704c03776fee69173ba18251941",
    uri="sip:87.106.136.32:5060;transport=udp",
    response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:82H9Xymavbi4f4zvDi8t@185.250.108.158:21755;ob>
Expires:1260
Content-Length:0

4.1.3 Complex SQL injection

There might be very complex queries that include recursive functions, loops, and mathematical functions; these SQL queries also have to be identified. Listing 4.3 illustrates a complex SQL query. The query includes recursive functions and mathematical functions and is overall very hard to understand. Using many different functions makes it harder for a classifier to identify the query as SQL injection, as it has less clear traits.

Listing 4.3: SIP REGISTER complex SQL injection.

REGISTER sip:45.78.126.244:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP 101.215.151.122:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:h0VZ2e4t3b@45.78.126.244>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:h0VZ2e4t3b@45.78.126.244>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="'; WITH RECURSIVE q(r, i, rx, ix, g) AS (
    SELECT r::DOUBLE PRECISION * 0.02, i::DOUBLE PRECISION * 0.02,
    .0::DOUBLE PRECISION , .0::DOUBLE PRECISION, 0
    FROM generate_series(-60, 20) r, generate_series(-50, 50) i
    UNION ALL SELECT r, i,
    CASE WHEN abs(rx * rx + ix * ix) <= 2 THEN rx * rx - ix * ix END + r,
    CASE WHEN abs(rx * rx + ix * ix) <= 2 THEN 2 * rx * ix END + i, g + 1
    FROM q WHERE rx IS NOT NULL AND g < 99 )
    SELECT array_to_string(array_agg(s ORDER BY r), '')
    FROM ( SELECT i, r, substring(' .:-=+*#%@', max(g) / 10 + 1, 1) s
    FROM q GROUP BY i, r ) q GROUP BY i ORDER BY i--",
    realm="45.78.126.244",
    nonce="57305a3400000c349f463704c03776fee69173ba18251941",
    uri="sip:45.78.126.244:5060;transport=udp",
    response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:h0VZ2e4t3b@101.215.151.122:21755;ob>
Expires:1260
Content-Length:0

4.1.4 Spaceless SQL injection

SQL queries do not necessarily need spaces between tokens to be executed, as shown in Listing 4.4 and Listing 4.5. By using comments or quotation characters, the spaces in a SQL query can be avoided. This also has to be accounted for, either by removing these characters and replacing them with spaces or by including them as traits of SQL injection. Listing 4.4 uses comments to avoid spaces, and in this case it is easy to simply replace them with spaces. Listing 4.5 uses quotation marks to avoid spaces; here, replacing them with spaces does not work well, as some of the quotation marks are required for setting column values in SQL. Instead, quotation marks can be included as traits of SQL injection.

Listing 4.4: SIP REGISTER spaceless SQL injection using comments.

REGISTER sip:45.78.126.244:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP 101.215.151.122:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:h0VZ2e4t3b@45.78.126.244>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:h0VZ2e4t3b@45.78.126.244>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="';select/**/*/**/from/**/subscriber;--",
    realm="45.78.126.244",
    nonce="57305a3400000c349f463704c03776fee69173ba18251941",
    uri="sip:45.78.126.244:5060;transport=udp",
    response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:h0VZ2e4t3b@101.215.151.122:21755;ob>
Expires:1260
Content-Length:0

Listing 4.5: SIP REGISTER spaceless SQL injection using quotation marks.

REGISTER sip:45.78.126.244:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP 101.215.151.122:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:h0VZ2e4t3b@45.78.126.244>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:h0VZ2e4t3b@45.78.126.244>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="';select'username'from'subscriber';--",
    realm="45.78.126.244",
    nonce="57305a3400000c349f463704c03776fee69173ba18251941",
    uri="sip:45.78.126.244:5060;transport=udp",
    response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:h0VZ2e4t3b@101.215.151.122:21755;ob>
Expires:1260
Content-Length:0
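The comment-based variant in Listing 4.4 can be normalized with a single substitution, as the text suggests. The sketch below is one possible way to do it in Python; the function name is hypothetical and the regular expression is an assumption, not the one used in the thesis. It only handles /* ... */ block comments and leaves quotation marks untouched, so the payload of Listing 4.5 passes through unchanged.

```python
import re


def replace_sql_comments(text):
    """Replace every /* ... */ block comment with a single space."""
    # Non-greedy match so that each comment is replaced on its own.
    return re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)


comment_payload = "';select/**/*/**/from/**/subscriber;--"
quote_payload = "';select'username'from'subscriber';--"

print(replace_sql_comments(comment_payload))
print(replace_sql_comments(quote_payload))
```

After this step the first payload contains ordinary spaces and can be tokenized like any other query, which is why the quotation-mark variant needs the different treatment described above.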


4.1.5 Valid input

There is valid input that may resemble SQL injection. A SIP REGISTER authorization might use a legitimate username that is similar to SQL injection. The DPI should not flag these types of usernames. The list below shows examples of valid usernames that resemble SQL syntax.

• SELECT

• UPDATE

• SELECT_username_FROM_subscriber

• subscriber

• TRUNCATETABLEsubcriber

There is also the SIP method MESSAGE, which can carry a valid message containing SQL code, because the sender might want to send something that represents a SQL injection or just SQL code. The DPI should not classify this as SQL injection. An example of what such a packet can look like is shown in Listing 4.6. Here the SQL code is inserted after the text SQL injection code:.

Listing 4.6: SIP MESSAGE example of a valid text message with SQL code.

MESSAGE sip:df6al4@1.16.124.225:21755 SIP/2.0
Via: SIP/2.0/TCP 205.145.14.248:49168;branch=z9hG4bK776sgdkse
Max-Forwards: 70
From: sip:3adf23@205.145.14.248;tag=49583
To: sip:df6al4@1.16.124.225
Call-ID: D4oCX-qyokk4gujwYz0HaIppd6xOFmgZ
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 54

SQL injection code: ';DROP TABLE PaRIu_BoZaNAWq_dDX;--

4.2 Data collection

The detection mechanism was not applied to live data. Instead, a PCAP file¹ is used. The reason is that the attacks then do not have to be simulated in real time; they can instead be inserted into the PCAP file afterward, which is easier and more precise. However, this may not accurately capture the delays, as they may differ between reading from a file and reading from the network interface. The key point is the difference between running the DPI with and without the implemented classifier, and this difference should be similar in both cases.

The data collected in the PCAP file comes from a real VoIP phone call using SIP and RTP. No exploits or attacks are present in the call; it only contains a pre-recorded voice talking over the phone.

To simulate SQL injections, a Python script is used to generate different types of SQLinjections. These different types of SQL injection are produced by:

• Generating different random tables and rows to use.

• Randomly using different responses and methods with SQL injection on different headers.
    - REGISTER
    - ACK
    - 200 OK
    - 600 Busy Everywhere
    - 180 Ringing

• Randomly switching between different types of SQL queries.
    - SELECT
    - UPDATE
    - DELETE
    - INSERT
    - TRUNCATE
    - DROP

• Randomly switching between lowercase and uppercase.

• Randomly selecting different rows and tables present in the database.

• Randomly applying different numbers of spaces.

• Randomly applying AS statements.

• Randomly using WHERE statements (where applicable) with different variables and random values.

• Randomly applying more queries.

• Randomly generating different IP addresses and usernames.

¹ A file containing captured packets from a network trace.

When generating valid SIP data, everything is done in the same way as for the SQL injection examples, except that the SQL injection itself is left out. Listings 8.1-8.6 (in Section 8.1) illustrate examples of how the data is generated. By replacing each {x} with randomly generated information, a legitimate SIP packet is generated.

• {0} is replaced with a randomly generated IP address.

• {1} is replaced with a randomly generated IP address.

• {2} is replaced with a randomly generated valid username.

• {3} is replaced with a randomly generated valid username.

• {4} is replaced with the SQL injection. If empty, no SQL injection is present in the packet.

With this approach, suitable training and test data are generated. However, the Python script does not support some of the special cases of unusual syntax, such as avoiding spaces or using synonyms and functions.
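The generation steps described in this section can be sketched roughly as follows. This is not the script used in the thesis: the query templates cover only a subset of the listed query types, the table and column names are assumptions, and `HEADER_TEMPLATE` is a hypothetical one-line stand-in for the packet templates in Listings 8.1-8.6 that reuses the placeholder numbering {0}, {2}, and {4}.

```python
import random

KEYWORDS = {"SELECT", "FROM", "UPDATE", "SET", "DELETE", "TRUNCATE", "DROP", "TABLE"}
TEMPLATES = [
    ["SELECT", "{col}", "FROM", "{table}"],
    ["UPDATE", "{table}", "SET", "{col}", "=", "'{val}'"],
    ["DELETE", "FROM", "{table}"],
    ["TRUNCATE", "TABLE", "{table}"],
    ["DROP", "TABLE", "{table}"],
]
TABLES = ["subscriber", "location"]          # assumed table names
COLUMNS = ["username", "password", "domain"]  # assumed column names
HEADER_TEMPLATE = 'Authorization: Digest username="{2}{4}", realm="{0}"'


def random_case(word, rng):
    """Randomly switch each character between lowercase and uppercase."""
    return "".join(rng.choice((ch.lower(), ch.upper())) for ch in word)


def make_injection(rng):
    """Build one injection with random query type, case, and spacing."""
    values = {
        "table": rng.choice(TABLES),
        "col": rng.choice(COLUMNS),
        "val": "%06x" % rng.randrange(16 ** 6),
    }
    parts = []
    for token in rng.choice(TEMPLATES):
        if token in KEYWORDS:
            parts.append(random_case(token, rng))
        else:
            parts.append(token.format(**values))
        parts.append(" " * rng.randint(1, 4))  # random number of spaces
    return "';" + "".join(parts).rstrip() + ";--"


def make_header(rng, injection=""):
    """Fill the template; an empty injection gives a valid header."""
    ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
    user = "user%d" % rng.randint(1, 9999)
    return HEADER_TEMPLATE.format(ip, None, user, None, injection)


rng = random.Random(2019)  # fixed seed so that runs are repeatable
print(make_header(rng))
print(make_header(rng, make_injection(rng)))
```

Seeding the generator is a deliberate choice in this sketch: it makes a training or test set reproducible, which helps when comparing classifiers on the same data.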

4.3 Naive Bayes classifier

A naive Bayes classifier is a suitable approach for identifying SQL injections. This is because SQL queries are formatted very similarly to natural language, and naive Bayes is known to be accurate at classifying natural language. The event model used is Bernoulli naive Bayes, because it is binary: it considers whether a word is present or not. If a word that is unique to SQL injection is present, the packet scores higher. To avoid distinguishing between lowercase and uppercase, all text can be converted to lowercase, since the meaning of the syntax is the same whether or not capital letters are used, and the values of columns in SQL are irrelevant. By generating different types of SQL injection attempts in different SIP methods, naive Bayes gets an extensive feature vector from which it can identify features that are present in SQL injection. Different types of valid SIP methods are also generated, so that there are labeled examples of both classes, SQL injection and valid data. Testing with different numbers of packets showed that using more packets for the training data gave better accuracy. Therefore, 18000 valid packets and 18000 SQL-injected packets were used, with which the classifier scored a high accuracy.

To implement this, Python is used together with the library Scikit-learn, which includes different machine learning algorithms that are easy to use [38]. To represent the feature vector, a CountVectorizer is used, which counts the frequency of words in documents. By default, CountVectorizer uses a tokenizer designed for natural language, which is not suited to SQL: as shown in Section 4.1.4, a SQL query can also be written without spaces. To handle this, a customized tokenizer was designed to split on everything that is not an underscore, letter, or digit. With this approach, each word of the query can be identified even when no spaces are present. One problem with this approach is that the asterisk character will not be seen as part of the SQL query. The character also has to be treated as a delimiter, as it is part of the comment syntax in SQL, /*comment*/.

Another aspect is that the comment syntax -- should be a feature, since it is part of a SQL injection. However, the hyphen character is also the subtraction operator, and to keep the two as independent features, the tokenizer cannot simply split on the hyphen. To solve this, a regex is used to split the string. This regex is quite complex but solves the problems introduced by comments and special characters. The regex used is shown in Listing 4.7. The delimiters themselves are also included in the feature vector, because they are still relevant when classifying the text as SQL injection.

Listing 4.7: The regex query used to identify comments and special characters

(((\/\*)(.|\r\n)*?(\*\/))|[^_a-zA-Z\d])
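A tokenizer built on this regex could look roughly like the sketch below. It is an adaptation, not the thesis code: the pattern is rewritten with a single capturing group, because Python's `re.split` returns every capturing group and the nested groups of Listing 4.7 would otherwise produce duplicate and `None` entries. The function name and the choice to drop whitespace tokens are assumptions.

```python
import re

# Listing 4.7 rewritten with one capturing group (an adaptation): re.split
# then returns each delimiter exactly once instead of once per nested group.
SPLIT_RE = re.compile(r"(/\*(?:.|\r\n)*?\*/|[^_a-zA-Z\d])")


def tokenize(text):
    """Lower-case the text and split it into words and delimiter tokens."""
    parts = SPLIT_RE.split(text.lower())
    # Drop empty strings and pure whitespace (dropping whitespace is an assumption).
    return [p for p in parts if p and not p.isspace()]
```

With this sketch, `tokenize("SELECT*FROM users;--")` yields the words `select`, `from`, `users` together with the delimiters `*`, `;`, `-`, `-`, and a comment such as `/*x y*/` is kept as a single token. Such a function could be passed as the `tokenizer` argument of scikit-learn's `CountVectorizer`, with `binary=True` to obtain presence features for `BernoulliNB`; whether the thesis wired it up exactly this way is not stated.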

The use of machine learning might introduce new problems. Using the naive Bayes classifier adds further delay. Since this only affects the SIP protocol, the delay is not critical, but the throughput should still be high enough not to open up for new DoS attacks. A simple approach to avoid this is to drop SIP packets that are larger, in bytes, than the largest SIP packet observed in the test data. Adding another 100 bytes to this limit gives some margin against false positives. However, the SIP method MESSAGE would have to be exempt from this rule, because a message has an irregular size depending on what the user wants to send. Sectra, however, might want to put a limit on the message size, in which case the approach of dropping packets over a certain size can also be used for MESSAGE.
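The size rule described above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the packet is available as text and that the SIP method is the first word of the request line; responses such as 180 Ringing start with SIP/2.0 and are simply treated as non-MESSAGE packets here.

```python
def size_limit(observed_sizes, margin=100):
    """Largest packet size seen in the test data plus a safety margin, in bytes."""
    return max(observed_sizes) + margin


def should_drop(packet, limit):
    """Return True if a text SIP packet exceeds the limit; MESSAGE is exempt."""
    first_word = packet.split(None, 1)[0] if packet.strip() else ""
    if first_word == "MESSAGE":
        return False
    return len(packet.encode("utf-8")) > limit
```

If Sectra decides to cap message sizes as well, the exemption could be replaced by a second, larger limit for MESSAGE.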

4.4 Regex classifier

Using regex to classify SQL injection introduces problems. It is hard and time-consuming to write a regex that identifies SQL syntax in text. A more suitable approach is to identify SQL meta-characters that are common in a SQL injection, such as --, #, etc. However, because the SIP protocol can contain these special characters in a valid packet, the headers where SQL can be injected have to be identified, and only those headers are inspected rather than the whole packet. The SQL meta-characters that will be used in this case are:

• ’


• -

• =

• ;

The headers of interest, where SQL injection might occur, are

• From

• To

• Authorization

Listing 4.8 shows an example of where SQL injections can occur. They can occur in several headers at once or in just one of them.

Listing 4.8: SIP REGISTER example of where SQL injection can occur.

REGISTER sip:45.78.126.244:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP 101.215.151.122:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:h0VZ2e4t3bSQL INJECTION@45.78.126.244>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:h0VZ2e4t3SQL INJECTIONb@45.78.126.244>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="SQL INJECTION",realm="45.78.126.244SQL INJECTION",nonce="57305a3400000c349f463704c03776fee69173ba18251941",uri="sip:45.78.126.244:5060;transport=udp",response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:h0VZ2e4t3b@101.215.151.122:21755;ob>
Expires:1260
Content-Length:0

For each header of interest, the text is matched against a regex written to identify the specified SQL meta-characters. Listing 4.9 shows the regex used. The regex is presented by Symantec in one of their articles [39].

Listing 4.9: The regex query used to identify SQL meta-characters [39]

/((\%3D)|(=))[^\n]*((\%27)|(\’)|(\-\-)|(\%3B)|(;))/i
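As a rough illustration, the header extraction and the regex of Listing 4.9 might be combined as below. Several details are assumptions rather than the thesis implementation: the function names are hypothetical, the typographic apostrophe from the listing is written as a plain ASCII quote, and for Authorization only the quoted username and realm values are inspected (the two fields marked in Listing 4.8), since a valid Authorization header already contains = and ; in fields such as uri.

```python
import re

# Listing 4.9 in Python syntax: /.../i becomes re.IGNORECASE, and the
# typographic apostrophe from the PDF is written as a plain ASCII quote.
SQL_META = re.compile(r"((%3D)|(=))[^\n]*((%27)|(')|(--)|(%3B)|(;))", re.IGNORECASE)

# Assumption: only these Authorization parameters carry user-controlled text.
AUTH_FIELD = re.compile(r'\b(?:username|realm)="([^"]*)"', re.IGNORECASE)


def inspected_texts(packet):
    """Yield the header texts that are handed to the regex."""
    for line in packet.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if not sep:
            continue
        if name in ("from", "to"):
            yield value
        elif name == "authorization":
            yield from AUTH_FIELD.findall(value)


def is_sql_injection(packet):
    """Return True if any inspected header text matches the meta-character regex."""
    return any(SQL_META.search(text) for text in inspected_texts(packet))
```

Note that the pattern requires an = (or %3D) before the meta-character, so a From value such as `<sip:alice' OR id=1;--@10.0.0.1>` matches, while the ordinary `;tag=abc123` parameter does not. Compiling the patterns once at module level mirrors the remark in Section 4.6.1 that the compile time is only incurred once.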

4.5 Test setup for Python

The tests were performed on a stationary computer using Windows 10 as Operating System (OS). VirtualBox is used on this host to run Ubuntu 16.04. Using VirtualBox introduces some performance degradation; however, it still allows us to look at the difference between using the regex classifier and the naive Bayes classifier. This performance degradation should not be a problem because SIP is not real-time dependent (to some degree at least). A delay of 500 milliseconds per packet is defined as acceptable; however, such a delay can still introduce a new attack, DoS. To account for this aspect, in addition to measuring the time and accuracy of the regex classifier and the naive Bayes classifier, the throughput is measured while increasing the number of characters (bytes) in the packet. The new characters are placed in a header that is also processed by the regex, so that the regex processing time reflects the larger packet; otherwise, the regex would barely be affected. This test only includes the application layer, which is written in a text file as the SIP protocol is text-based. In other words, the PCAP file is not used, but rather text files containing different SIP packets.

4.5.1 Hardware

The hardware used on the stationary computer will impact the time measurements. The computer that performs the test is using:

• CPU: Intel Core i5-4590 @ 3.3 GHz

• RAM: 16 GB

• Hypervisor: Oracle VM VirtualBox 6.0.4

• OS: Windows 10 (Host), Ubuntu 16.04 (Guest)

4.6 Measurements

In this section, the measurements are described. The measurements used are time, throughput, and accuracy.

4.6.1 Classification time

We measure the total time it takes for the program to classify the test data as either SQL injection or valid. For the regex classifier, this includes extracting the headers of interest and running the regex on them. For the naive Bayes classifier, it includes transforming the input to a feature vector and classifying it. The time it takes to train the naive Bayes classifier is not measured, since training is only done once, can be done at system startup, and therefore has no meaning in this context. The regex classifier also has a compile time, which is not measured as it is also only incurred once. Finally, to account for variations in runtime (e.g., other processes might run at the same time as the script), the script is run ten times and an average is taken.

4.6.2 Packet throughput

We define packet throughput as classified packets per second (p/s). This means that the number of classified packets is divided by the time it took to classify them. This throughput is applied in the tests for Python. The throughput for nDPI is defined in bits per second.
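The two measurements can be sketched as a small harness. This is an assumed setup, not the thesis script: `classify` stands for any already trained or compiled classifier function, and `time.perf_counter` is simply one reasonable timer choice.

```python
import time


def mean_classification_time(classify, packets, runs=10):
    """Average wall-clock time in seconds to classify all packets, over several runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        for packet in packets:
            classify(packet)
        total += time.perf_counter() - start
    return total / runs


def throughput(n_packets, seconds):
    """Packet throughput in classified packets per second (p/s)."""
    return n_packets / seconds
```

As a check against Table 5.1, 20000 packets classified in 6.801 s gives `throughput(20000, 6.801)`, roughly 2,941 p/s, which matches the reported value.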

4.6.3 Accuracy

The accuracy is measured by dividing the number of true positives and true negatives by the total number of tests. This describes the percentage of tests that were successful. The accuracy is defined as $a = \frac{t_{tp} + t_{tn}}{t_{tot}}$, where $t_{tot}$ is the total number of tests, $t_{tp}$ is the number of true positives, and $t_{tn}$ is the number of true negatives.
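A short worked example may help here. It reads the second term of the formula as the correctly classified valid packets (true negatives) and plugs in the counts from Table 5.2, where a single valid MESSAGE packet was flagged as SQL injection; the function name is only illustrative.

```python
def accuracy(true_positives, true_negatives, total):
    """Fraction of tests classified correctly."""
    return (true_positives + true_negatives) / total


# Table 5.2, first row: 1007 injections and 1004 valid packets, one valid
# packet misclassified, so 1007 true positives and 1003 true negatives.
row_1 = 100 * accuracy(1007, 1003, 1007 + 1004)

# Table 5.2, second row: 7 injections and 4 valid packets, one misclassified.
row_2 = 100 * accuracy(7, 3, 7 + 4)
```

Rounded to two decimals, `row_1` is 99.95 and `row_2` is 90.91, in line with the accuracy column of Table 5.2.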

4.7 nDPI

nDPI is built only to identify protocols. Therefore, the code has been modified to also support identifying threats in the application layer. nDPI goes through the application layer to determine the protocol. Once the protocol is detected, it checks whether it is SIP and, if so, goes through the packet again to identify possible SQL injection attacks. If the packet is not SIP, it is ignored. The regex classifier is tested when the packet is identified as SIP, to see how much the throughput and delay are affected by also classifying whether SIP packets contain SQL injection. This test uses the same hardware and virtual machine as the test setup for Python. It does, however, include the whole stack from the network trace rather than only the application layer, includes other packets such as RTP, and is written in C.

To test the DPI, all packets are counted, as well as every packet that was not identified by the DPI. This is to see whether there are any SIP packets that nDPI could not identify. Also, eight different types of SQL injections were randomly injected into different kinds of SIP packets. To see what impact the regex classifier implementation has, ten samples are taken to get an average throughput both with and without the regex classifier.


5 Results

In this chapter, the results of the experiments are presented.

5.1 Naive Bayes classifier

Table 5.1 shows the time, throughput, and accuracy when using different combinations of "valid" and SQL-injected SIP packets. The packets used are generated by the Python script described in Chapter 4. The column "SQL injection" shows the number of packets containing an SQL injection, "Valid" shows the number of valid packets, "Classification time" shows the time in seconds it takes to classify the packets as either valid or SQL injection, "Packet throughput" shows the number of classified packets per second (p/s), and "Accuracy" shows the percentage of packets that are classified correctly (true positives and true negatives). As seen in Table 5.1, all packets are correctly classified, giving an accuracy of 100%. Also, an SQL-injected packet takes about 14µs longer to identify than a "valid" packet. The last row illustrates the most common real-world case, as most of the data sent typically would be "valid". It is also clear that increasing the number of packets increases the throughput. Another aspect worth mentioning is that the throughput is higher when only valid packets are used, compared to only SQL injection packets. Comparing the first and second rows, the difference in throughput is (1471 − 1305) = 166 p/s, and comparing the third and fourth rows gives a difference of (2885 − 2587) = 298 p/s.

Table 5.2 includes more complex and special cases of SQL injections, as defined in Section 4.1.2, in addition to the generated SQL injection data, all classified by the naive Bayes classifier. Again, the last row is perhaps the most realistic use case, as most of the data sent in the real world typically would be "valid". As seen in the first row of Table 5.2 and the sixth row of Table 5.1, the timing differs by (0.765 − 0.692) = 0.073 seconds. However, the difference is only eleven packets, so this is a large time difference for so few packets. Table 5.2 also shows that using valid packets results in higher throughput. The last row, which has the highest number of valid packets, has the highest throughput.


Of the tests in Table 5.2, only one packet was misclassified. Listing 5.1 shows the packet that failed. The naive Bayes classifier classifies the text message as SQL injection. However, since it is allowed to send text messages that resemble SQL injection, this classification is wrong.


Naive Bayes classifier

SQL injection   Valid    Classification time (s)   Packet throughput (p/s)   Accuracy
1               0        7.66 × 10⁻⁴               1,305                     100 %
0               1        6.80 × 10⁻⁴               1,471                     100 %
100             0        0.0387                    2,587                     100 %
0               100      0.0347                    2,885                     100 %
100             100      0.0780                    2,563                     100 %
1000            1000     0.692                     2,891                     100 %
10000           10000    6.801                     2,941                     100 %
1               10000    3.160                     3,165                     100 %

Table 5.1: Naive Bayes classifier on generated SQL injections.

Naive Bayes classifier

SQL injection   Valid    Classification time (s)   Packet throughput (p/s)   Accuracy
1007            1004     0.765                     2630                      99.95 %
7               4        0.00460                   2390                      90.91 %
7               10004    3.191                     3138                      99.99 %

Table 5.2: Naive Bayes classifier on generated SQL injections and more complex special cases of SQL injection.

Listing 5.1: SIP MESSAGE that was incorrectly classified by naive Bayes

MESSAGE sip:df6al4@1.16.124.225:21755 SIP/2.0
Via: SIP/2.0/TCP 205.145.14.248:49168;branch=z9hG4bK776sgdkse
Max-Forwards: 70
From: sip:3adf23@205.145.14.248;tag=49583
To: sip:df6al4@1.16.124.225
Call-ID: D4oCX-qyokk4gujwYz0HaIppd6xOFmgZ
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 54

SQL injection code: ’;DROP TABLE PaRIu_BoZaNAWq_dDX;--

5.2 Regex classifier

Table 5.3 presents the result of using the regex classifier with the same generated data that was used for the naive Bayes classifier. Compared to Table 5.1, where the naive Bayes classifier has long delays and low throughput, using regex clearly increases the throughput and decreases the delays. The accuracy of the regex classifier is the same as that of the naive Bayes classifier on the generated data, which is 100% for all test cases. Unlike Tables 5.1 and 5.2, Tables 5.3 and 5.4 do not show any clear difference in throughput between valid and SQL injection packets.

Table 5.4 includes unusual and special cases of SQL injections, as defined in Section 4.1.2, in addition to the generated SQL injection data, all classified by the regex classifier. Again, the last row is perhaps the most realistic use case, as most of the data sent in the real world typically would be "valid". As seen in Table 5.4 and Table 5.3, classifying almost the same amount of data takes a similar amount of time, which is expected. The regex classifier does not inspect the message body and therefore does not classify the SIP method MESSAGE as SQL injection, which the naive Bayes classifier did.

Regex classifier

SQL injection   Valid    Classification time (s)   Packet throughput (p/s)   Accuracy
1               0        1.345 × 10⁻⁵              74323                     100 %
0               1        1.352 × 10⁻⁵              73974                     100 %
100             0        0.00158                   63171                     100 %
0               100      0.00144                   69302                     100 %
100             100      0.00289                   69157                     100 %
1000            1000     0.0268                    74692                     100 %
10000           10000    0.268                     74685                     100 %
1               10000    0.138                     72300                     100 %

Table 5.3: Regex classifier on generated SQL injections.

Regex classifier

SQL injection   Valid    Classification time (s)   Packet throughput (p/s)   Accuracy
1007            1004     0.0265                    76027                     100 %
7               4        1.85 × 10⁻⁴               59338                     100 %
7               10004    0.132                     75856                     100 %

Table 5.4: Regex classifier on generated SQL injections and harder special cases of SQL injection.

5.3 Regex vs naive Bayes

Table 5.5 shows the time and throughput of the naive Bayes classifier and the regex classifier when using larger packets. The length is the number of characters in the packet, the time is given in seconds, and the throughput in packets per second. As seen in Table 5.5, the throughput decrease is almost linear for both classifiers, except for the smaller packets. The regex classifier has a significant decrease between 712 bytes and 1500 bytes, a decrease of 86%. The regex classifier has nearly 45 times better throughput than the naive Bayes classifier at a size of 712 bytes, but at 48000 bytes it is only about three times better.
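The ratios quoted above follow directly from the throughput columns of Table 5.5, as the sketch below shows. The dictionary layout and function names are only illustrative; the numbers are copied from the table.

```python
# Packet size in bytes -> (naive Bayes throughput, regex throughput) in p/s,
# copied from Table 5.5.
TABLE_5_5 = {
    712: (1125, 50595),
    1500: (899, 7049),
    3000: (607, 2902),
    6000: (368, 1370),
    12000: (177, 564),
    24000: (98, 306),
    48000: (52, 149),
}


def regex_speedup(size):
    """How many times higher the regex throughput is than naive Bayes."""
    naive_bayes, regex = TABLE_5_5[size]
    return regex / naive_bayes


def relative_drop(before, after):
    """Relative throughput decrease between two measurements."""
    return (before - after) / before


# Regex throughput drop between 712 and 1500 bytes, as a fraction.
regex_drop = relative_drop(TABLE_5_5[712][1], TABLE_5_5[1500][1])
```

Here `regex_speedup(712)` is close to 45, `regex_speedup(48000)` is just under 3, and `regex_drop` is about 0.86, matching the figures in the text.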

As seen in Figure 5.1, the throughput is not linear when increasing the number of bytes used. This means that increasing the packet size significantly affects the classifiers' throughput. The larger the packets, the smaller the relative difference between the throughput of the regex and naive Bayes classifiers. It is also clear that regex is much faster for the average packet size of 712 bytes, and that the difference becomes less significant as the packet size increases.


Naive Bayes vs Regex

Length (B)   Naive Bayes time (s)   Naive Bayes throughput (p/s)   Regex time (s)   Regex throughput (p/s)
712          8.89 × 10⁻⁴            1125                           1.976 × 10⁻⁵     50595
1500         0.00111                899                            1.42 × 10⁻⁴      7049
3000         0.00165                607                            3.45 × 10⁻⁴      2902
6000         0.00272                368                            7.30 × 10⁻⁴      1370
12000        0.00565                177                            0.00177          564
24000        0.0102                 98                             0.00327          306
48000        0.0191                 52                             0.00672          149

Table 5.5: Impact on throughput when increasing the size of the packets for regex and naive Bayes.

[Plot: throughput (p/s) against size (bytes), both axes with ticks at powers of ten; series: naive Bayes classifier and regex classifier.]

Figure 5.1: Throughput of regex classifier and naive Bayes classifier for different packet sizes.

To better understand the processing time difference observed between the two classifiers, we next plot the distributions of the processing delays. In particular, Figures 5.2 and 5.3 show the Cumulative Distribution Function (CDF) and Complementary Cumulative Distribution Function (CCDF) when using the classifiers on 2000 generated packets. We again note that the regex classifier handles deviations better than the naive Bayes classifier. The regex classifier mostly stays between 10µs and 40µs, with some cases where it goes over 100µs, while the naive Bayes classifier mostly stays between 150µs and 520µs, with some special cases where it can go up to 130ms. We have also run experiments where we included the hard special cases. The corresponding results when also including these specially formatted packets are shown in Figures 5.5 and 5.6. Here it is interesting to note that many of these packets end up contributing to the tail of the distribution, as seen by comparing the CCDFs in Figures 5.3 and 5.6.


[Plot: CDF against classification time (µs), 0 to 500 µs; series: naive Bayes and regex.]

Figure 5.2: CDF of the classification time of the regex classifier and naive Bayes classifier for 2000 generated packets.

[Plot: CCDF against classification time (µs), both axes with ticks at powers of ten; series: naive Bayes and regex.]

Figure 5.3: CCDF of the classification time of the regex classifier and naive Bayes classifier for 2000 generated packets.

Figure 5.4: CDF and CCDF for 2000 packets

[Plot: CDF against classification time (µs), 0 to 500 µs; series: naive Bayes and regex.]

Figure 5.5: CDF of the classification time of the regex classifier and naive Bayes classifier for 2011 generated and specially formatted packets.

[Plot: CCDF against classification time (µs), both axes with ticks at powers of ten; series: naive Bayes and regex.]

Figure 5.6: CCDF of the classification time of the regex classifier and naive Bayes classifier for 2011 generated and specially formatted packets.

Figure 5.7: CDF and CCDF for 2011 packets

5.4 Regex classifier in nDPI

Out of the 13,897 packets evaluated, the DPI failed to identify eight packets. These packets are either the SIP method MESSAGE or the response SIP/2.0 200 OK. It is known from earlier tests of nDPI that it is not able to classify all packets correctly [5]. Listings 8.9–8.16 (from Section 8.2) show the packets that could not be identified by the DPI. This results in nDPI having a total accuracy of 0.9994. Looking only at the SIP packets, of which there were 281, the accuracy is 0.97.

As seen in Listing 8.16, the From header also included a SQL injection. This SQL injection could not be identified by the regex, as the packet was never identified as a SIP packet by the DPI.

The regex classifier in the DPI found seven out of eight SQL injections. The DPI had an average throughput of 4.6 Gbps when used without the implemented regex classifier. When using the regex classifier, the average throughput was 2.01 Gbps. This is a throughput decrease of 55.6% when using the regex classifier.


[Plot: CDF against classification time (µs), 0 to 160 µs; series: regex.]

Figure 5.8: The graph shows the distribution of the classification time of the regex classifierin the DPI on the network trace of the SIP call

The regex classifier in nDPI classifies the packets in a range between 15µs and 150µs when classifying packets from a SIP call, as seen in Figure 5.8. About 75% are classified in 40µs or less. However, we also observe that 5% of cases have a classification time above 80µs. We note that there is a step in the CDF. This is most likely due to the packets that include SQL injection.


6 Discussion

In this chapter, the results and the method are discussed.

6.1 Results

This section relates to the result of the classifiers and nDPI.

6.1.1 The classifiers

As seen in Tables 5.1–5.4, the accuracy is 100% except in the case of Table 5.2, where the naive Bayes classifier produced one false positive. However, we expect that naive Bayes will be better at finding SQL injections, as it looks at the whole packet, whereas the regex only looks in specified headers. This means that if a header that is not inspected by the regex is vulnerable to SQL injection, the regex would not be able to identify the attack. Therefore, using naive Bayes to identify SQL injection is more reliable, as it does not have to take this into account. Overall, it is better to incorrectly flag one packet as SQL injection than to miss one SQL injection. However, as seen in Figure 5.1, the naive Bayes classifier cannot maintain as high a throughput as the regex classifier when the packet size increases. This means that a naive Bayes implementation would be more vulnerable to a DoS attack. The regex classifier and the naive Bayes classifier each have their pros and cons. If one of these classifiers is used in a scenario where DoS attacks are not a threat, we would recommend using the naive Bayes classifier; otherwise, the regex classifier seems more suitable, as it lowers the chances of DoS attacks.

When replicating these tests, the result can differ because the training data is mostly auto-generated by the Python script. However, the training data was regenerated in many experiments without the result changing, so the result is somewhat robust. The timings might also vary depending on the hardware used and on whether the test is run on a host rather than on a guest in VirtualBox, as was done here. However, the relative differences in the timings should be the same.

Looking at Table 5.1 for the naive Bayes classifier, it is interesting that the throughput is much lower when sending only one packet than when sending many packets. It seems like there is a constant delay, which has little effect when using many packets but a significant impact on the throughput when sending a single packet. This does not seem to be a problem for the regex classifier. Looking at Table 5.3, the throughput when switching between one packet and many packets seems to vary somewhat randomly. This is, of course, expected, as the computer's workload is not fixed and background processes affect the run time.

6.1.2 nDPI

As seen from Listings 8.9–8.16, nDPI failed to identify the method MESSAGE and the response SIP/2.0 200 OK. This is interesting, as it seems like it should be straightforward to identify them as SIP. Consequently, a SQL injection included in this method or response could pass through undetected. Unfortunately, the earlier paper that investigated the performance of nDPI included neither SIP nor RTP [17]. It would be interesting to validate the nDPI results against someone else's work.

6.2 Method

The method was designed to test as many forms of SQL injection as possible within the time frame, and to test the regex classifier in nDPI to see what the performance degradation was. In this section, these and other important points of the method are discussed.

6.2.1 Python classifier

The classifiers written in Python could give varying times, depending on when the script was executed. This is because the tests were performed on a virtual machine, with background processes from both the guest and the host OS that might interfere. This variation was reduced by running ten tests with the same parameters and taking the average time. Despite this, the approach still had some flaws: the tests were performed directly after each other, and it would probably give a more reliable result if they were performed with some delay between them.

6.2.2 Regex classifier

Since we did not know exactly what parts of the protocol the SIP server stores in the database (for example, whether it stores IP addresses and usernames), we had to search for typical SQL injections in SIP and make assumptions. The obvious candidates are the username and the IP address. Beyond these, we were not aware of any stored fields, and this is a limitation. The regex classifier does not look through the whole packet but only through the parts of the protocol where we know there are vulnerabilities. If more parts of the protocol are vulnerable to SQL injection, the regex classifier will not detect attacks there.

6.2.3 Training and test data

The data used to train the naive Bayes classifier is auto-generated by the Python script. This means that it covers only some of the many different ways to perform SQL injection. It also does not include all the methods that can be used in SIP. Because of this, there might exist SQL injections, in various SIP methods, that could pass the classifier. To reduce this risk, special cases found by searching the Internet were used in addition to the auto-generated data, but these are limited to what was found. This is also the reason the classifier scored as high as it did. The test data is biased, as it does not include more than was known to us at the time of the implementation.

6.2.4 nDPI

There were supposed to be two implementations in nDPI: one using naive Bayes and one using regex to classify packets. However, no open-source implementation of a Bernoulli naive Bayes classifier for C was found. There was one implementation of a Bernoulli naive Bayes classifier for C++; however, the documentation of how the training data was supposed to be formatted was poor, and we had no experience of linking C++ libraries to C. Because of this and the short time span, we prioritized implementing and testing the regex classifier.

Also, the thesis was planned to include a test of a real scenario, with either real servers or virtual machines, running multiple simultaneous VoIP calls to see what the delays are compared to not using DPI. However, the time span was too narrow, and this scenario could not be tested. It would give a more reliable answer to research question 3, but as of now, we can only speculate from the given result.

6.3 The work in a wider context

Besides using classifiers to detect SQL injection in SIP, our results can be applied to other text-based protocols, such as HTTP. Using naive Bayes would not even require any modification, apart from new training data, to work with HTTP. Using regex would require a new definition of which headers to inspect, as it should not be applied to the whole protocol but only to the specific parts where SQL injection is possible.

6.3.1 Ethics of DPI

As DPI inspects all packets from the network layer up to the application layer, privacy concerns arise. When implementing a system with DPI, the privacy of the users should always be considered. We would not recommend using DPI for any purpose other than protecting the users' security or QoS. For example, using DPI to block or read content is something that we would not recommend for an Internet Service Provider (ISP). Some totalitarian societies can use DPI to block content or for surveillance. In the case of Sectra's use, DPI would be fine, as the users are aware that security is weighted higher than their privacy, since the use case is work-related and not personal. A non-work-related use case where it would be acceptable is to identify the protocol and prioritize it. For example, real-time dependent protocols could be identified and prioritized to achieve higher QoS. DPI can also be used to identify protocols that should be charged at a lower cost. For example, some data services can be offered free of charge. This is, however, questionable. There has been a discussion about network neutrality; for example, in Sweden, Telia wanted to offer Facebook (and other large social media) services free of charge [40], but this would not be fair, as it would benefit Facebook and disadvantage other social media.

7 Conclusion

This thesis aimed to identify threats when using VoIP, narrowing the focus to identifying SQL injections in SIP. Sectra uses VoIP in some of their products, which means they are exposed to these vulnerabilities. The threats present in VoIP were found through a thorough literature study. The risks found can be addressed by using more securely developed versions of SIP and RTP, namely Session Initiation Protocol Secure (SIPS) and Secure Real-time Transport Protocol (SRTP). They include authentication and encryption, which removes many problems. However, if a virus infects one of the devices, authentication and encryption are no longer an obstacle for the attacker, and therefore another approach is necessary to detect the attacks. This is where the suggested DPI with a regex or naive Bayes classifier comes in. It can detect SQL injections in real time and can be modified to also take action, depending on what the system administrator wants to do. Using the regex with the DPI can allow up to approximately 5 Gbps on commodity hardware.

For Sectra, we would recommend implementing a naive Bayes classifier for nDPI, as it can detect SQL injections that we did not know existed. As hardware is not a problem for Sectra, the lower throughput of the naive Bayes classifier should not be a problem. Also, because it only applies to SIP packets, which are not as real-time dependent as, for example, RTP, it should not interfere with the overall quality of the phone calls.

a) What are the delays and accuracy of using regex or naive Bayes to classify a packet as SQL injection?
The delay in an average case is about 0.766 ms for the naive Bayes classifier and about 0.0135 ms for regex. The accuracy is 100 % for regex, and for the naive Bayes classifier only one false positive was found.

b) How robust are regex and naive Bayes against new attacks, such as DoS?
As the packet size increases, the throughput is affected more and more. The naive Bayes classifier has lower throughput but handles the growth in size better, as its throughput does not change as much (in percentage). The throughput of the regex classifier decreases a lot at first, and then, when the packet size is above 12,000 bytes, the decrease starts to become linear. That said, the regex implementation does add a new attack surface, as DoS is easier to perform against the system using nDPI.

c) Can regex and naive Bayes be implemented together with DPI without affecting the overall quality of the VoIP call?
The effect on the overall quality should be unnoticeable in the typical case, where the traffic is not too high, which can be expected at least within a private network. However, this question cannot be answered with substantial evidence as of now, since it has not been tested in a real scenario where multiple calls pass through the system in real time.

7.1 Future work

In this section, future work regarding different types of attacks is presented.

7.1.1 Classifying SQL injection

This thesis covers many aspects of identifying SQL injection in SIP while maintaininglow delay and high throughput. However, there are still some aspects that would needfurther work to strengthen our initial questions whether using DPI with regex or naiveBayes to identify SQL injection is a suitable approach.

• An implementation of naive Bayes in C should be tested to see what the throughput of naive Bayes in nDPI would be.

• To better assess whether a stable VoIP architecture is possible with a regex classifier or naive Bayes classifier in DPI, a real scenario should be tested. For example, one could use virtual machines to set up multiple clients and one SIP server, place VoIP calls, and measure the delays.
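As a possible starting point for the first item, a multinomial naive Bayes classifier is small enough to write without a library, which makes the arithmetic that a C port would need explicit. The sketch below uses token counts with Laplace smoothing. The tokenizer, the class labels and the four training strings are invented for illustration; they are not the features or the data used in this thesis.

```python
import math
import re
from collections import Counter, defaultdict

# Tokens are runs of letters or single punctuation marks; digits are ignored.
TOKEN = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def tokenize(text):
    return [t.lower() for t in TOKEN.findall(text)]


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only.
train = [
    ("sip:alice@10.0.0.1", "benign"),
    ("sip:004615550002@10.0.0.1", "benign"),
    ("sip:alice';select * from users;--@10.0.0.1", "sqli"),
    ("sip:bob' or '1'='1@10.0.0.1", "sqli"),
]
model = NaiveBayes().fit([text for text, _ in train], [label for _, label in train])
print(model.predict("sip:carol@10.0.0.1"))
print(model.predict("sip:x';select name from accounts;--@10.0.0.1"))
```

With this toy data the first URI is labelled benign and the second sqli. Four training strings say nothing about accuracy; the point is only that prediction reduces to table lookups and additions of logarithms, which is straightforward to port to C.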

7.1.2 Classifying bogus data

Besides SQL injection, other attacks are worth investigating. One such problem is identifying whether an encrypted RTP payload contains bogus data or valid data. This is worth investigating because one could send bogus data as a form of DoS attack without being flagged as DoS [10]. Recent papers have studied whether it is possible to classify what type of protocol or service is used on encrypted data [41–44]. With a similar approach, it should be possible to detect bogus data in the encrypted RTP payload by classifying it as bogus or valid. It would also be interesting to see whether QoS and Quality of Experience (QoE) can be maintained while using a classifier on the RTP data.

Bibliography

[1] Global Internet usage. en. Page Version ID: 882648070. Feb. 2019. URL: https://en.wikipedia.org/w/index.php?title=Global_Internet_usage&oldid=882648070 (visited on 02/15/2019).

[2] VoIP telephone lines in the U.S. 2010-2018 | Statistic. en. URL: https://www.statista.com/statistics/615387/voip-telephone-lines-in-the-us/ (visited on 01/31/2019).

[3] Hello, You’ve Been Compromised: Upward Attack Trend Targeting VoIP Protocol SIP. en-US. Nov. 2016. URL: https://securityintelligence.com/hello-youve-been-compromised-upward-attack-trend-targeting-voip-protocol-sip/ (visited on 01/31/2019).

[4] Open Source Deep Packet Inspection Software Toolkit: ntop/nDPI. original-date: 2015-04-19T04:56:52Z. Jan. 2019. URL: https://github.com/ntop/nDPI (visited on 02/01/2019).

[5] S. Alcock and R. Nelson. “Measuring the accuracy of open-source payload-based traffic classifiers using popular Internet applications”. In: Proceedings of the IEEE Conference on Local Computer Networks - Workshops. Oct. 2013, pp. 956–963. DOI: 10.1109/LCNW.2013.6758538.

[6] Web Attack Visualization | Akamai. en-US. URL: https://www.akamai.com/us/en/resources/our-thinking/state-of-the-internet-report/web-attack-visualization.jsp (visited on 02/13/2019).

[7] V. Jacobson, R. Frederick, S. Casner, and H. Schulzrinne. RTP: A Transport Protocol for Real-Time Applications. RFC 3550. RFC Editor, July 2003. URL: https://tools.ietf.org/html/rfc3550 (visited on 01/28/2019).

[8] C. Huitema. Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP). RFC 3605. RFC Editor, Oct. 2003. URL: https://tools.ietf.org/html/rfc3605 (visited on 02/04/2019).

[9] The RTP bleed Bug. URL: https://www.rtpbleed.com/ (visited on 02/05/2019).

[10] M. Adams and M. Kwon. “Vulnerabilities of the Real-Time Transport (RTP) Protocol for Voice over IP (VoIP) Traffic”. In: Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC). Jan. 2009, pp. 1–5. DOI: 10.1109/CCNC.2009.4784756.

[11] E. Schooler, G. Camarillo, M. Handley, J. Peterson, J. Rosenberg, A. Johnston, H. Schulzrinne, and R. Sparks. SIP: Session Initiation Protocol. RFC 3261. RFC Editor, June 2002. URL: https://tools.ietf.org/html/rfc3261 (visited on 01/28/2019).

[12] Two attacks against VoIP | Symantec Connect Community. URL: https://www.symantec.com/connect/articles/two-attacks-against-voip (visited on 02/06/2019).

[13] D. Geneiatakis, T. Dagiuklas, G. Kambourakis, C. Lambrinoudakis, S. Gritzalis, K. S. Ehlert, and D. Sisalem. “Survey of security vulnerabilities in session initiation protocol”. In: IEEE Communications Surveys Tutorials 8.3 (2006), pp. 68–81. ISSN: 1553-877X. DOI: 10.1109/COMST.2006.253270.

[14] M. Handley, C. Perkins, and V. Jacobson. SDP: Session Description Protocol. RFC 4566. RFC Editor, July 2006. URL: https://tools.ietf.org/html/rfc4566 (visited on 01/30/2019).

[15] What is Deep Packet Inspection (DPI) ? | How does it works? URL: https://securebox.comodo.com/ssl-sniffing/deep-packet-inspection/ (visited on 01/29/2019).

[16] V. Jyothi, S. K. Addepalli, and R. Karri. “DPFEE: A High Performance Scalable Pre-Processor for Network Security Systems”. In: IEEE Transactions on Multi-Scale Computing Systems 4.1 (Jan. 2018), pp. 55–68. DOI: 10.1109/TMSCS.2017.2765324.

[17] L. Deri, M. Martinelli, T. Bujlow, and A. Cardigliano. “nDPI: Open-source high-speed deep packet inspection”. In: Proceedings of the International Wireless Communications and Mobile Computing Conference (IWCMC). Aug. 2014, pp. 617–622. DOI: 10.1109/IWCMC.2014.6906427.

[18] Alfred V. Aho and Margaret J. Corasick. “Efficient string matching: an aid to bibliographic search”. In: Communications of the ACM 18.6 (June 1975), pp. 333–340. DOI: 10.1145/360825.360855.

[19] A. McCallum and K. Nigam. “A comparison of event models for Naive Bayes text classification”. In: Proceedings of the AAAI Workshop on Learning for Text Categorization. 1998, pp. 41–48.

[20] C. Li, G. Tu, C. Peng, Z. Yuan, Y. Li, S. Lu, and X. Wang. “Insecurity of Voice Solution VoLTE in LTE Mobile Networks”. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS). Denver, Colorado, USA, 2015, pp. 316–327. DOI: 10.1145/2810103.2813618.

[21] A. Houmansadr, C. Brubaker, and V. Shmatikov. “The Parrot Is Dead: Observing Unobservable Network Communications”. In: Proceedings of the IEEE Symposium on Security and Privacy (SP). Berkeley, CA United States, May 2013, pp. 65–79. DOI: 10.1109/SP.2013.14.

[22] A. Houmansadr, T. Riedl, N. Borisov, and A. Singer. “I Want My Voice to Be Heard: IP over Voice-over-IP for Unobservable Censorship Circumvention”. In: Proceedings of the Network and Distributed System Security Symposium (NDSS). San Diego, CA United States, Feb. 2013.

[23] H. Mohajeri Moghaddam, B. Li, M. Derakhshani, and I. Goldberg. “SkypeMorph: Protocol Obfuscation for Tor Bridges”. In: Proceedings of the ACM Conference on Computer and Communications Security (CCS). Raleigh, North Carolina, USA: ACM, 2012, pp. 97–108. DOI: 10.1145/2382196.2382210. URL: http://doi.acm.org/10.1145/2382196.2382210.

[24] G. Tu, C. Li, C. Peng, and S. Lu. “How voice call technology poses security threats in 4G LTE networks”. In: Proceedings of the IEEE Conference on Communications and Network Security (CNS). Sept. 2015, pp. 442–450. DOI: 10.1109/CNS.2015.7346856.

[25] C. Peng, C. Li, G. Tu, S. Lu, and L. Zhang. “Mobile Data Charging: New Attacks and Countermeasures”. In: Proceedings of the ACM Conference on Computer and Communications Security (CCS). Raleigh, North Carolina, USA: ACM, 2012, pp. 195–204. DOI: 10.1145/2382196.2382220.

[26] S. Shintre, V. Gligor, and J. Barros. “Anonymity Leakage in Private VoIP Networks”. In: IEEE Transactions on Dependable and Secure Computing 15.1 (Jan. 2018), pp. 14–26. DOI: 10.1109/TDSC.2015.2513761.

[27] W. Kho, S. A. Baset, and H. Schulzrinne. “Skype relay calls: Measurements and experiments”. In: Proceedings of the IEEE INFOCOM Workshops. Apr. 2008, pp. 1–6. DOI: 10.1109/INFOCOM.2008.4544646.

[28] S. A. Baset and H. G. Schulzrinne. “An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol”. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM). Apr. 2006, pp. 1–11. DOI: 10.1109/INFOCOM.2006.312.

[29] Saikat Guha, Neil Daswani, and Ravi Jain. An Experimental Study of the Skype Peer-to-Peer VoIP System. 2006.

[30] Y. Zhu, Y. Lu, A. Vikram, and H. Fu. “On Privacy of Skype VoIP Calls”. In: Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM). Nov. 2009, pp. 1–6. DOI: 10.1109/GLOCOM.2009.5425852.

[31] R. Smith, N. Goyal, J. Ormont, K. Sankaralingam, and C. Estan. “Evaluating GPUs for network packet signature matching”. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. Apr. 2009, pp. 175–184. DOI: 10.1109/ISPASS.2009.4919649.

[32] G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P. Markatos, and Sotiris Ioannidis. “Gnort: High performance network intrusion detection using graphics processors”. In: Proceedings of the International Symposium on Recent Advances in Intrusion Detection, pp. 116–134.

[33] G. Vasiliadis, M. Polychronakis, S. Antonatos, E. P. Markatos, and S. Ioannidis. “Regular Expression Matching on Graphics Hardware for Intrusion Detection”. In: Proceedings of the Recent Advances in Intrusion Detection (RAID). Springer, 2009, pp. 265–283.

[34] G. Vasiliadis, M. Polychronakis, and S. Ioannidis. “Parallelization and characterization of pattern matching using GPUs”. In: Proceedings of the IEEE International Symposium on Workload Characterization (IISWC). Nov. 2011, pp. 216–225. DOI: 10.1109/IISWC.2011.6114181.

[35] N. Huang, H. Hung, S. Lai, Y. Chu, and W. Tsai. “A GPU-Based Multiple-Pattern Matching Algorithm for Network Intrusion Detection Systems”. In: Proceedings of the International Conference on Advanced Information Networking and Applications - Workshops (AINA workshops). Mar. 2008, pp. 62–67. DOI: 10.1109/WAINA.2008.145.

[36] G. Vasiliadis, L. Koromilas, M. Polychronakis, and S. Ioannidis. “Design and Implementation of a Stateful Network Packet Processing Framework for GPUs”. In: IEEE/ACM Transactions on Networking 25.1 (Feb. 2017), pp. 610–623. DOI: 10.1109/TNET.2016.2597163.

[37] M. A. Jamshed, J. Lee, S. Moon, I. Yun, D. Kim, S. Lee, Y. Yi, and K. Park. “Kargus: a highly-scalable software-based intrusion detection system”. In: Proceedings of the ACM conference on Computer and communications security (CCS). Raleigh, North Carolina, USA, 2012, p. 317. DOI: 10.1145/2382196.2382232.

[38] scikit-learn: machine learning in Python — scikit-learn 0.21.1 documentation. URL: https://scikit-learn.org/stable/ (visited on 05/16/2019).

[39] Detection of SQL Injection and Cross-site Scripting Attacks | Symantec Connect Community. URL: https://www.symantec.com/connect/articles/detection-sql-injection-and-cross-site-scripting-attacks (visited on 03/25/2019).

[40] Telias kunder fortsätter surfa fritt på sociala medier – och nu på ännu fler! sv. URL: http://press.telia.se/pressreleases/telias-kunder-fortsaetter-surfa-fritt-paa-sociala-medier-och-nu-paa-aennu-fler-1870625 (visited on 06/04/2019).

[41] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret. “Network Traffic Classifier With Convolutional and Recurrent Neural Networks for Internet of Things”. In: IEEE Access 5 (2017), pp. 18042–18050. DOI: 10.1109/ACCESS.2017.2747560.

[42] R. Ding and W. Li. “A hybrid method for service identification of SSL/TLS encrypted traffic”. In: Proceedings of the IEEE International Conference on Computer and Communications (ICCC). Oct. 2016, pp. 250–253. DOI: 10.1109/CompComm.2016.7924703.

[43] Z. Zou, J. Ge, H. Zheng, Y. Wu, C. Han, and Z. Yao. “Encrypted Traffic Classification with a Convolutional Long Short-Term Memory Neural Network”. In: Proceedings of the IEEE International Conference on High Performance Computing and Communications; IEEE International Conference on Smart City; IEEE International Conference on Data Science and Systems (HPCC/SmartCity/DSS). June 2018, pp. 329–334. DOI: 10.1109/HPCC/SmartCity/DSS.2018.00074.

[44] V. Krishnamoorthi, N. Carlsson, E. Halepovic, and E. Petajan. “BUFFEST: Predicting Buffer Conditions and Real-time Requirements of HTTP(S) Adaptive Streaming Clients”. In: Proceedings of the ACM on Multimedia Systems Conference (MMSys). Taipei, Taiwan: ACM, June 2017, pp. 76–87. DOI: 10.1145/3083187.3083193.

8 Appendix

8.1 Packets generated for testing

Below are the packets that were generated to train and test the classification of SQL injection in SIP.

Listing 8.1: SIP REGISTER example of the test data that is generated.

REGISTER sip:{0}:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP {1}:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:{2}@{0}>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:{2}@{0}>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="{3}",realm="{0}{4}",nonce="57305a3400000c349f463704c03776fee69173ba18251941",uri="sip:{0}:5060;transport=udp",response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:{2}@{1}:21755;ob>
Expires:1260
Content-Length:0

Listing 8.2: SIP REGISTER example of the test data that is generated.

REGISTER sip:{0}:5060;transport=udp SIP/2.0
Via:SIP/2.0/UDP {1}:21755;rport;branch=z9hG4bKPjp9M4yIhZ25v1x7q-BjpZ8X.zptgZtruT
From: <sip:{2}@{0}>;tag=DxnOIAxkAZ8JAAPHIcFRSNCG-LBK.UbW
To: <sip:{2}@{0}>
Call-ID:0tPPPDAj0-uPoLRs3J9ovnqzvGfmwJgM
CSeq:1482 REGISTER
Max-Forwards:70
Allow:PRACK,INVITE,ACK,BYE,CANCEL,UPDATE,INFO,SUBSCRIBE,NOTIFY,REFER,MESSAGE,OPTIONS
Authorization:Digest username="{3}{4}",realm="{0}",nonce="57305a3400000c349f463704c03776fee69173ba18251941",uri="sip:{0}:5060;transport=udp",response="8e69861d68f3efad861da5c8f3f994a2"
Contact: <sip:{2}@{1}:21755;ob>
Expires:1260
Content-Length:0

Listing 8.3: SIP ACK example of the test data that is generated.

ACK sip:{3}@{0}:21755;ob SIP/2.0
Via: SIP/2.0/TCP {1}:49168;rport;branch=z9hG4bKPj7hKm-P0ZTtA.3oBe2rnPVgIbL8J1.KIA;alias
Max-Forwards: 70
From: sip:{3}{4}@{1};tag=7EXGGcEjSFlMfAC3VvYYqiv2UOjH2ctY
To: sip:{2}@{1};tag=mw9XpMT23V1tiHwqEmJg8HqorWOV8tz0
Call-ID: D4oCX-qyokk4gujwYz0HaIppd6xOFmgZ
CSeq: 11091 ACK
Route: <sip:{1};transport=tcp;lr;r2=on>
Route: <sip:{1};lr;r2=on>
Content-Length: 0

Listing 8.4: SIP 200 OK example of the test data that is generated.

SIP/2.0 200 OK
Via: SIP/2.0/UDP {0}:5060;received=172.22.0.5;branch=z9hG4bKf678.ee5e6bf7.0;i=62b
Via: SIP/2.0/TCP {1}:49828;rport=49828;received=172.22.0.34;branch=z9hG4bKPjWNyxayvMOgurKFI-iSB7dZuaQ9JGcNvM;alias
Call-ID: JOyaDf0ltX0JdM.q-2Y9xX1EmcC1rAsb
From: <sip:{2}@{1}>;tag=5V8sqVAdd3D1DISRkRSYM0u5imq7GDx2
To: <sip:{3}{4}@{1}>;tag=z9hG4bKf678.ee5e6bf7.0
CSeq: 60214 MESSAGE
Content-Length: 0

Listing 8.5: SIP 600 Busy Everywhere example of the test data that is generated.

SIP/2.0 600 Busy Everywhere
Via: SIP/2.0/UDP {0}:5060;received=172.22.0.5;branch=z9hG4bK6356.7e9cbfc4.0
Via: SIP/2.0/UDP {1}:21755;rport=21755;received=172.22.0.13;branch=z9hG4bKPjpkEDe-wIVm6QpkU9uenF1gT9v-AS1Bon
Record-Route: <sip:{0};lr>
Call-ID: 0-VTBSxifNAZ5m2FuZ3A579xSIfU1DQr
From: <sip:{3}{4}@{0}>;tag=UXBXLgaqXkcjLPrD4AHHC8qqPNT5Idp0
To: <sip:{3}@{0}>;tag=JTo.3tWpzTJGV0KNv96so.WDfqMWKlX3
CSeq: 13386 INVITE
Allow: PRACK, INVITE, ACK, BYE, CANCEL, UPDATE, INFO, SUBSCRIBE, NOTIFY, REFER, MESSAGE, OPTIONS
Content-Length: 0

Listing 8.6: SIP 180 Ringing example of the test data that is generated.

SIP/2.0 180 Ringing
Via: SIP/2.0/UDP {0}:21755;rport=21755;received=172.22.0.13;branch=z9hG4bKPjUdL.7jBqqN6b1JH8CziFQ5hG6MW1zR.m
Record-Route: <sip:{1};lr>
Call-ID: rQXyojNuXMTjgailIfnwBrlvGDglVf1h
From: <sip:{3}{4}@{1}>;tag=wCT26iJ1LGkaHSTb.lp12X81-yzgn5ib
To: <sip:{2}@{1}>;tag=Xd2B6G0EBpHATXr56UeB9s8sH.B.y6Ct
CSeq: 28097 INVITE
Contact: <sip:{2}@{0}:21755;ob>
Allow: PRACK, INVITE, ACK, BYE, CANCEL, UPDATE, INFO, SUBSCRIBE, NOTIFY, REFER, MESSAGE, OPTIONS
Content-Length: 0

Listing 8.7: SIP Message example of the test data that is generated.

MESSAGE sip:{2}@{0}:21755 SIP/2.0
Via: SIP/2.0/TCP {1}:49168;branch=z9hG4bK776sgdkse
Max-Forwards: 70
From: sip:{3}{4}@{1};tag=49583
To: sip:{2}@{0}
Call-ID: D4oCX-qyokk4gujwYz0HaIppd6xOFmgZ
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 12

test message

Listing 8.8: SIP Message example of the test data that is generated.

MESSAGE sip:{2}@{0}:21755 SIP/2.0
Via: SIP/2.0/TCP {1}:49168;branch=z9hG4bK776sgdkse
Max-Forwards: 70
From: sip:{3}@{1};tag=49583
To: sip:{2}{4}@{0}
Call-ID: D4oCX-qyokk4gujwYz0HaIppd6xOFmgZ
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 12

test message
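The listings in this section are templates with positional placeholders {0} to {4}. The sketch below shows how such a template can be expanded with Python's `str.format`. The roles given to the placeholders (server address, client address, two user names, and an injected payload that is empty for benign packets) are inferred from where they appear in the listings and should be read as assumptions. The helper name `fill_template`, the sample values and the CRLF line endings are likewise chosen for illustration.

```python
# MESSAGE template from Listing 8.7; CRLF line endings are an assumption.
MESSAGE_TEMPLATE = (
    "MESSAGE sip:{2}@{0}:21755 SIP/2.0\r\n"
    "Via: SIP/2.0/TCP {1}:49168;branch=z9hG4bK776sgdkse\r\n"
    "Max-Forwards: 70\r\n"
    "From: sip:{3}{4}@{1};tag=49583\r\n"
    "To: sip:{2}@{0}\r\n"
    "Call-ID: D4oCX-qyokk4gujwYz0HaIppd6xOFmgZ\r\n"
    "CSeq: 1 MESSAGE\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "test message"
)


def fill_template(template, server, client, callee, caller, payload=""):
    """Expand {0}..{4}; the role assigned to each placeholder is an assumption."""
    return template.format(server, client, callee, caller, payload)


# A benign packet leaves the payload empty; an attack packet appends it to the caller.
benign = fill_template(
    MESSAGE_TEMPLATE, "172.22.0.5", "172.22.0.34", "004615550002", "004615550022"
)
attack = fill_template(
    MESSAGE_TEMPLATE, "172.22.0.5", "172.22.0.34", "004615550002", "004615550022",
    "';select'asd'from'dsad';",
)
print(benign.splitlines()[3])
print(attack.splitlines()[3])
```

Because the injected string is a separate argument, the same template yields both classes of packet, and moving {4} to another header, as the listings do, changes where the injection lands without touching the generator.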

8.2 Packets that failed identification

This section presents the packets that nDPI could not identify as SIP.

Listing 8.9: SIP MESSAGE where DPI failed to identify protocol.

MESSAGE sip:004615550002@172.22.0.5 SIP/2.0
Via: SIP/2.0/TCP 172.22.0.34:49828;rport;branch=z9hG4bKPj5bTeb5GZKVhEwUPZVcLUHOEHplPnNNif;alias
Max-Forwards: 70
From: <sip:004615550022@172.22.0.5>;tag=lJZLVPzLOk05fPDKadUfClXWB7sG1mE5
To: <sip:004615550002@172.22.0.5>
Call-ID: di-o2d0LrQbixv029CztpvC0LdWlXBB1
CSeq: 37161 MESSAGE
Accept: text/plain, application/im-iscomposing+xml
Content-Type: text/plain
Content-Length: 792

WP/B1YEEAABIT01TTVMxNhHi1BMQsBvbmrxjHrmsla+WqUeygcPhIoGNgiUkqnExpISLqT7yCOnkfyK7SXONf/1erCgVkpde71qcK3MvTkzLUrHLJvGN7siKE9H3hUR6eySFYPUfe0UYOvfOXzq++58573Af0yt/M7K521tWfUxKmxwEsJXTDtnMRASSJKZucET1FxQBEAmX6Sp8Lyc2i3Wv6Q4LW19VCylqk2HUr5aoDDFAjYPJPbE2995qyeof3J6WmAG5Il+CAao7CoivhkAYTa7C3MSHUlBtZYGA5qopxoE7/l4BNmlxrMUBqRIIaeFffPZlwg0ulSc9VDFMVB5cz3qteDON08+osWCqFnjcAFHqhGeVhqckcQ9aOh8lQjniny5luT8ajW77PiryKuSQhiGxBeB1wZs14pD9dnJ8+gMnL58IBdcEezgQdMS2lz13Pg7z78DaWaHYDb4UUzyuYBdSw+ed7xDrsyLIpzGW4sB18DsRZ9+hfAtUmml/Azq2YNpGcDoJwTL3QkmgqNNdAhcxOdLjPMAzRMbz6uI7wtpOLpkVJ7A+BAQlo40B49MSnj5k8c6RSBss+sn9XdvIOS5azodnU+AuHnUP+UQYgYjMWIaXEMJ0gbUVa45SSarc4phpOA2QRLL4pQ+rT3yoNxifP/6qa0ellkmCdUsKntmXtM/IsCuqD2CC4MdbCnE3hfK4E8NnDmUcMHg+skyJ3PIXi9VLoibOUr5SBG5YOyrL57/xpuY6qosTNlVClTLxLpPvAvdR1Jk+q6RR5/F8

Listing 8.10: SIP 200 OK where DPI failed to identify protocol.

SIP/2.0 200 OK
Via: SIP/2.0/TCP 172.22.0.34:49828;rport=49828;received=172.22.0.34;branch=z9hG4bKPj5bTeb5GZKVhEwUPZVcLUHOEHplPnNNif;alias
Call-ID: di-o2d0LrQbixv029CztpvC0LdWlXBB1
From: <sip:004615550022@172.22.0.5>;tag=lJZLVPzLOk05fPDKadUfClXWB7sG1mE5
To: <sip:004615550002@172.22.0.5>;tag=z9hG4bKdd4c.26e2f145.0
CSeq: 37161 MESSAGE
Content-Length: 0

Listing 8.11: SIP MESSAGE where DPI failed to identify protocol.

MESSAGE sip:004615550003@172.22.0.5 SIP/2.0
Via: SIP/2.0/TCP 172.22.0.34:49828;rport;branch=z9hG4bKPjWNyxayvMOgurKFI-iSB7dZuaQ9JGcNvM;alias
Max-Forwards: 70
From: <sip:004615550022@172.22.0.5>;tag=5V8sqVAdd3D1DISRkRSYM0u5imq7GDx2
To: <sip:004615550003@172.22.0.5>
Call-ID: JOyaDf0ltX0JdM.q-2Y9xX1EmcC1rAsb
CSeq: 60214 MESSAGE
Accept: text/plain, application/im-iscomposing+xml
Content-Type: text/plain
Content-Length: 792

WP/B1oEEAABIT01TTVMxNke5E9J66pOY/a6Rs8ViMDSIF/UVdJCTjSWxVmi2TPE3RzqnFHk/

gjUKc9RZ8sAldyPKy7NHM0JPgIn6KjN8ZymHT9iDfLYIiR6C9X5I+unsfKA13n/Ya5QTbFzgfPAlfpP3Fvlf4aPLwIchdmDYzqNzvYjFLkIvScUIoRNPwPu4K7t52yhjqXjj/NOV4k6Ix8K5oafHmmQDVyoox9jeLpck2BmZo5smHiZohcPGqkzjmjHWkpcIK8dYx/3RAiPcRFJTgntCgjuIbGs21UEjVl4sFmNjlzFIax5BmP7w9gNcg56a/rBVfjnNRAqzFXmTEF/v3UTkJ4Q91JO6dj3PJZ9Vl152O9Y1gIesROERWfggWrjJ2SDU1OubrgAZhYBOSgiBxsDqS69jq3k71Yh7fUTxb+yy9+gaXXzXX5uYt9ryz5YY3MuQxfUJWSx6+EDbWub6+PTPOou7nDUp5G0/OyKifU6I69VOK+wajwoBXtchKbZMYTZEgyutlVasbjfutHdmV8zXwL03e50DHA6Idlj6sYO7/73JbojZUCUt8qtmOR1Q1GlsHv1MKH3yhbi58ggexdDy0F8gVKfHxDBy9Qk3kCO5ty+YzsXVHhmSW7Tyc+zoM3rxZ2EvAcnJzVmV65K8CzaLpIUO29sYWgUhggeZlypLsOSTU5/bGxnTjOkFeXnH/g+h9+tA5UQy+qe8EWAEYYKLDGN+md6JEobSJdaug+LoRkjZXvFeB4nWQS6SyqA7k045e/vDTdpTB3FmPGtF

Listing 8.12: SIP 200 OK where DPI failed to identify protocol.

SIP/2.0 200 OK
Via: SIP/2.0/TCP 172.22.0.34:49828;rport=49828;received=172.22.0.34;branch=z9hG4bKPjWNyxayvMOgurKFI-iSB7dZuaQ9JGcNvM;alias
Call-ID: JOyaDf0ltX0JdM.q-2Y9xX1EmcC1rAsb
From: <sip:004615550022@172.22.0.5>;tag=5V8sqVAdd3D1DISRkRSYM0u5imq7GDx2
To: <sip:004615550003@172.22.0.5>;tag=z9hG4bKf678.ee5e6bf7.0
CSeq: 60214 MESSAGE
Content-Length: 0

Listing 8.13: SIP MESSAGE where DPI failed to identify protocol.

MESSAGE sip:004615550003@172.22.0.5 SIP/2.0
Via: SIP/2.0/TCP 172.22.0.14:49684;rport;branch=z9hG4bKPjR.NVZgDQ9IeGgYw17XFgQlDGKZfpBj11;alias
Max-Forwards: 70
From: <sip:004615550002@172.22.0.5>;tag=CUi1uNWSXHNojkG0VUShWaxLf0WMcl4L
To: <sip:004615550003@172.22.0.5>
Call-ID: idPuN4l-o2Ii0C2.E-TNniB3ugHjHubM
CSeq: 42470 MESSAGE
Accept: text/plain, application/im-iscomposing+xml
Content-Type: text/plain
Content-Length: 792

WP/BpoEEAABIT01TTVMxNp048/J5eellW5B7ypD5EKLBVfwYYsdUCJqA8PSGNBI9gtSOtJo6O0ZUNu0m/4l8QLryBUkQiov3UHEoWgHyD6pCkmznGwee/tb8DsZxBDCn20/C7xq6pBV5yH5VkZf7vUsFQZRkIPTh8GvZCe817uPVVaNtGaAEsARRRMJfr8qF+e1UHaz3E4+uQqSfRlPSBnBBNNBlPTgG7hRwY0bZGc4p0b0EAHDW2qZbSw1Ewpy1j40GzUpuNKGDo4jGekBVsMFrlBvMkHMSteS8uQ3j4psSNjhS1oYzwBiJNAzz19+9dk3UXJC2QGYZQlL4Nf8/

MEjeX9xcZa15LAYQiwGHsBzixeX3XMX8Z6KvTIyoh7BL24MZMDsZVnaQ3F2JDdPhZhfmeN3yJZ6nWTFBnZddtOEV4NxkT+sRWlUldK79BLa7y67QHLSLv2PiYnC0aALnH71GJHIqbibudJZBuBQVg9WNNcWavQ2GJw+nYqNWCTUc+05WLNpIMaMOdL5oqj2vRaYmSXjKVB0pcirUWtYx6UzyPcSUQ9COS4NVItxxXiQLKGkl0eo/D8s0oapmAfjXLqQ1g3K9O+nEyHSVdO+7Ad7T1nox79eoFS3Apzx4pPl5Fo4E/yYBoMfJ1lTCTp0/98aFMPYIagDkU8+w527sU4l6LLseyJ6yebwb9GdMdht0d6m2e1ztM8rbyBuX7w7m7qZQK674FBXTcxDXhBF5xAkVO1nEZve9vEvghirXFVDEqmB8jTuy4eKye1/SYlTscTgp

Listing 8.14: SIP 200 OK where DPI failed to identify protocol.

SIP/2.0 200 OK
Via: SIP/2.0/TCP 172.22.0.14:49684;rport=49684;received=172.22.0.14;branch=z9hG4bKPjR.NVZgDQ9IeGgYw17XFgQlDGKZfpBj11;alias
Call-ID: idPuN4l-o2Ii0C2.E-TNniB3ugHjHubM
From: <sip:004615550002@172.22.0.5>;tag=CUi1uNWSXHNojkG0VUShWaxLf0WMcl4L
To: <sip:004615550003@172.22.0.5>;tag=z9hG4bKad26.b0c43d41.0
CSeq: 42470 MESSAGE
Content-Length: 0

Listing 8.15: SIP MESSAGE where DPI failed to identify protocol.

MESSAGE sip:004615550002@172.22.0.5 SIP/2.0
Via: SIP/2.0/TCP 172.22.0.13:49682;rport;branch=z9hG4bKPj048cREFOoUZEKtKVAPI5IkGHzWzaTx0v;alias
Max-Forwards: 70
From: <sip:004615550003@172.22.0.5>;tag=CeRWEV4tC4Ub04wy6ZuxirMs7o.bKroq
To: <sip:004615550002@172.22.0.5>
Call-ID: Y8Yqe8pZyWzAukqyPN6kawlfC86o2c3A
CSeq: 11656 MESSAGE
Accept: text/plain, application/im-iscomposing+xml
Content-Type: text/plain
Content-Length: 792

WP/B0oEEAABIT01TTVMxNm18ZkHHO4rhnxNGp19KQrSJiAT7w9WrfoK2FxXdCqpEoDRWIU2v6tKQDQ178/Ox5Seq/VWadL/AUSogbiVLnMi7CJtRlyXSw7rXCeBPYP3dOm3uGrcdiBMj4fvLi7SXZRA+LvvaYOX1MQt9+8ghKrZA0pgS8u0X+7TQdSUipS3mLRk36rzLRd7rpMvDiqK/hXe4OfV2mntwA8f1+/js4y5Eqhdg+js04UKzxDizsfVKYByWH/YUc3IDujc70M/deodtRNCk+jF5z6coWI+nsnVwKUeXxjOuiFHS9SzpMh8jgUJRoKgm20IbD7dA6ZQBWAcGGxnmu5QlKvFT7QeGJsIQ/k0z7tuO7bNKLPvp6GNpza7KQ1Jc1HkmB2RV/P5QG2I6cCj45CzaP2eeWIyrQeI0lJhp7RProEahPO5XIhjYKKK4O89NJorYkqzR7K2RTF4xYJ8X8AdRsM6Yxf1GdQI5n2HUWsMo4eKqozlhH8G+xkrBCU49BYByJxv5oIP8jy2MgJVLPPKPXcMjbNnO4VXeynpDBOwzPIA6SxXaTKPBujTW6zRWtH6PV2jnL0FCIQ32mtrhr9N9z0PTuU8vH9Jiv+OqyRVfEwozGiflP8Qm/dJKHgWGA5hgLF5uualcYJ/NFc2f6FYY2qf51z3dJMCbfgiEPCuLGiCVNFgAK4XyvtC33gM6BPlYp1f2a8Gmo1WX3fb6rZgToANBEhqRATwswYVRYLg6RnAprpGDqDLHzDFc09kOcK8GXmaG/8IHOomY

Listing 8.16: SIP 200 OK where DPI failed to identify protocol.

SIP/2.0 200 OK
Via:SIP/2.0/TCP 172.22.0.13:49682;rport=49682;received=172.22.0.13;branch=z9hG4bKPj048cREFOoUZEKtKVAPI5IkGHzWzaTx0v;alias
From: <sip:asd';select'asd'from'dsad';@172.22.0.5>;tag=CeRWEV4tC4Ub04wy6ZuxirMs7o.bKroq
To: <sip:004615550002@172.22.0.5>;tag=z9hG4bK26a7.a135ed57.0
Call-ID:Y8Yqe8pZyWzAukqyPN6kawlfC86o2c3A
CSeq:11656 MESSAGE
Content-Length:0
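The From header of Listing 8.16 carries an injected payload. As an illustration of the pattern-based approach, the sketch below extracts the user part of each SIP URI in a header line and flags it when an apostrophe is followed by an SQL keyword. The pattern `SQLI_PATTERN` and the helper names are hypothetical and far simpler than a production rule set; they are not the rules evaluated in this thesis.

```python
import re

# Hypothetical rule: a straight or typographic apostrophe followed, later in the
# same field, by a common SQL keyword. This is NOT the rule set of this thesis.
SQLI_PATTERN = re.compile(
    r"['’].*?\b(select|union|insert|update|delete|drop)\b", re.IGNORECASE
)

# User part of a SIP URI: the text between "sip:" and the last "@" that occurs
# before a ">" or whitespace.
SIP_USER = re.compile(r"sip:(?P<user>[^>\s]*)@")


def sip_users(message):
    """Return the user parts of all SIP URIs found in a message."""
    return [m.group("user") for m in SIP_USER.finditer(message)]


def looks_like_sqli(message):
    """True if any SIP URI user part matches the hypothetical pattern."""
    return any(SQLI_PATTERN.search(user) for user in sip_users(message))


suspicious = "From: <sip:asd';select'asd'from'dsad';@172.22.0.5>;tag=CeRWEV4tC4Ub04wy6ZuxirMs7o.bKroq"
ordinary = "To: <sip:004615550002@172.22.0.5>;tag=z9hG4bK26a7.a135ed57.0"
print(looks_like_sqli(suspicious), looks_like_sqli(ordinary))
```

The From header is flagged and the To header is not. Restricting the search to the user part keeps the pattern away from tags and branch parameters, which contain arbitrary characters and could otherwise produce false positives.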
