Intrusion detection in voice over IP environments

20
Int. J. Inf. Secur. (2009) 8:153–172 DOI 10.1007/s10207-008-0071-0 REGULAR CONTRIBUTION Intrusion detection in voice over IP environments Yu-Sung Wu · Vinita Apte · Saurabh Bagchi · Sachin Garg · Navjot Singh Published online: 16 December 2008 © Springer-Verlag 2008 Abstract In this article, we present the design of an intrusion detection system for voice over IP (VoIP) networks. The first part of our work consists of a simple single- component intrusion detection system called Scidive. In the second part, we extend the design of Scidive and build a distributed and correlation-based intrusion detection system called SpaceDive. We create several attack scenarios and evaluate the accuracy and efficiency of the system in the face of these attacks. To the best of our knowledge, this is the first comprehensive look at the problem of intrusion detection in VoIP systems. It includes treatment of the challenges faced due to the distributed nature of the system, the nature of the VoIP traffic, and the specific kinds of attacks at such systems. Keywords Intrusion detection · Voice over IP system · Cross-protocol detection · Stateful detection · Correlation-based IDS · SIP · RTP Y.-S. Wu and V. Apte contributed equally to the paper. Y.-S. Wu · V. Apte · S. Bagchi (B ) Dependable Computing Systems Lab, School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907, USA e-mail: [email protected] Y.-S. Wu e-mail: [email protected] V. Apte e-mail: [email protected] S. Garg Yahoo Labs, Bangalore, India e-mail: [email protected] N. Singh Avaya Labs, Basking Ridge, USA e-mail: [email protected] 1 Introduction Voice over IP (VoIP) systems are gaining in popularity as the technology for transmitting voice traffic over IP networks. While VoIP technology is set to revolutionize communica- tions, and is already being used by a number of traditional telephone companies to connect their regional offices, on a smaller scale it can also be a useful solution for businesses looking to trim their telephone expenses. As the popular- ity of VoIP systems increases, they are being subjected to different kinds of intrusions, some of which are specific to such systems, and some of which follow a general pattern of attacks against an IP infrastructure. There have been enor- mous strides made in the field of intrusion detection systems (IDS) for different components of the information technol- ogy infrastructure. Some of the IDSs are generic in nature and can be customized with detection rules specific to the environment in which they are deployed (e.g., Snort [7] and Prelude [10]), and some are tools specifically targeted to an environment or to specific classes of intrusions, such as IBM Tivoli Intrusion Manager for MQSeries products [11]. VoIP systems pose several new challenges to IDS design- ers. First, these systems employ multiple protocols for call management and data delivery. Second, the systems are dis- tributed in nature and employ distributed clients, servers, and proxies. Third, the attacks against such systems span a large class, from denial of service to billing fraud. Finally, the sys- tems are heterogeneous and typically under several different administrative domains, e.g., the proxy server may be pro- vided by the service provider and the client managed by the home organization. In this article, we present our work in the design and deployment of an intrusion detection system for VoIP networks. The initial goal of our work was to build a local, single-component VoIP IDS. We call this system Scidive 123

Transcript of Intrusion detection in voice over IP environments

Int. J. Inf. Secur. (2009) 8:153–172DOI 10.1007/s10207-008-0071-0

REGULAR CONTRIBUTION

Intrusion detection in voice over IP environments

Yu-Sung Wu · Vinita Apte · Saurabh Bagchi ·Sachin Garg · Navjot Singh

Published online: 16 December 2008© Springer-Verlag 2008

Abstract In this article, we present the design of anintrusion detection system for voice over IP (VoIP) networks.The first part of our work consists of a simple single-component intrusion detection system called Scidive. In thesecond part, we extend the design of Scidive and build adistributed and correlation-based intrusion detection systemcalled SpaceDive. We create several attack scenarios andevaluate the accuracy and efficiency of the system in the faceof these attacks. To the best of our knowledge, this is the firstcomprehensive look at the problem of intrusion detection inVoIP systems. It includes treatment of the challenges faceddue to the distributed nature of the system, the nature of theVoIP traffic, and the specific kinds of attacks at such systems.

Keywords Intrusion detection · Voice over IP system ·Cross-protocol detection · Stateful detection ·Correlation-based IDS · SIP · RTP

Y.-S. Wu and V. Apte contributed equally to the paper.

Y.-S. Wu · V. Apte · S. Bagchi (B)Dependable Computing Systems Lab,School of Electrical and Computer Engineering,Purdue University, 465 Northwestern Avenue,West Lafayette, IN 47907, USAe-mail: [email protected]

Y.-S. Wue-mail: [email protected]

V. Aptee-mail: [email protected]

S. GargYahoo Labs, Bangalore, Indiae-mail: [email protected]

N. SinghAvaya Labs, Basking Ridge, USAe-mail: [email protected]

1 Introduction

Voice over IP (VoIP) systems are gaining in popularity as thetechnology for transmitting voice traffic over IP networks.While VoIP technology is set to revolutionize communica-tions, and is already being used by a number of traditionaltelephone companies to connect their regional offices, on asmaller scale it can also be a useful solution for businesseslooking to trim their telephone expenses. As the popular-ity of VoIP systems increases, they are being subjected todifferent kinds of intrusions, some of which are specific tosuch systems, and some of which follow a general pattern ofattacks against an IP infrastructure. There have been enor-mous strides made in the field of intrusion detection systems(IDS) for different components of the information technol-ogy infrastructure. Some of the IDSs are generic in natureand can be customized with detection rules specific to theenvironment in which they are deployed (e.g., Snort [7] andPrelude [10]), and some are tools specifically targeted toan environment or to specific classes of intrusions, such asIBM Tivoli Intrusion Manager for MQSeries products [11].VoIP systems pose several new challenges to IDS design-ers. First, these systems employ multiple protocols for callmanagement and data delivery. Second, the systems are dis-tributed in nature and employ distributed clients, servers, andproxies. Third, the attacks against such systems span a largeclass, from denial of service to billing fraud. Finally, the sys-tems are heterogeneous and typically under several differentadministrative domains, e.g., the proxy server may be pro-vided by the service provider and the client managed by thehome organization.

In this article, we present our work in the design anddeployment of an intrusion detection system for VoIPnetworks. The initial goal of our work was to build a local,single-component VoIP IDS. We call this system Scidive

123

154 Y.-S. Wu et al.

(pronounced as Skydive). Scidive is structured to detect dif-ferent classes of intrusions, including, masquerading, denialof service, and media stream-based attacks. It can be installedat multiple points—clients, servers, or proxies, and can,without substantial system customization, be extended fordetecting new classes of attacks. The IDS can handle clientmobility, an important design goal of VoIP protocols suchas SIP, and does not flag false alarms for such situations.Scidive can operate with both classes of protocols that com-pose VoIP systems—call management protocols (CMP), e.g.,the Session Initiation Protocol (SIP) [2], and media deliv-ery protocols (MDP), e.g., the Real Time Transport Proto-col (RTP) [3]. Scidive proposes two critical abstractions forVoIP IDS—stateful detection and cross-protocol detection.Stateful detection denotes the functionality of assemblingstate from multiple packets and using the aggregated statein the rule matching engine. The reassembly functionalityis applicable to packets of both CMP and MDP and can beconfigured to handle packets spread out arbitrarily far apartin time. Some existing IDSs provide support for reassembly,but they are restrictive and applicable only to specific proto-cols. For example, Snort’s stream4 module can reassembleTCP packets that belong to the same session. Cross-proto-col detection denotes the functionality of matching rules thatspan multiple protocols, e.g., detecting a pattern in a SIPpacket followed by one in a succeeding RTP packet followedby one in an RTCP packet. The aggregation across proto-cols can be chained in an arbitrarily long manner and spreadout in time. This abstraction is powerful for VoIP systemsbecause they involve multiple protocols and several attacksare based on sequences that cross protocol boundaries. Thereare very few systems today that support cross-protocol detec-tion. One of the notable ones is WebSTAT [12] for detect-ing attacks against web servers by correlating protocols in avertical stack, e.g., application level (web server) and oper-ating system log. Since VoIP systems use multiple applica-tion layer protocols, horizontal cross-protocol correlation isrequired.

The architecture of Scidive uses a Distiller, throughwhich all incoming network traffic passes and which trans-lates packets into protocol dependent information units calledFootprints. The Footprints that belong to the same sessionare grouped into Trails. The Event Generator maps Foot-prints into Events which are matched by the Rule Match-ing Engine against a Ruleset. According to the stateful andcross-protocol philosophies, the Events can potentially havestate information and encapsulate information from multiplepackets. The rules considered are misuse-based, as opposedto the complementary philosophy of anomaly-based rules.The classic tradeoffs between the two approaches also holdhere—misuse-based detection cannot deal with novel typesof attacks, while anomaly based rules can have large amountof false positives.

Scidive is demonstrated on a sample VoIP system thatcomprises SIP clients and SIP proxy servers. The systemuses SIP Express Router for the proxy and three differentkinds of clients—KDE’s KPhone [13], Microsoft WindowsMessenger [14], and XTen’s X-Lite IP Telephony client [9].The protocols used are SIP for call management and RTP forreal-time audio data transfer. In our experiments, an instanceof Scidive is associated with each client. Four different typesof attacks are simulated on the system and the effectivenessand efficiency of Scidive analyzed.

Voice over IP systems involve multiple components andare distributed in nature. These components are widely spreadand often fall under different administrative domains. SinceScidive uses local, single-component based detection, it can-not detect an attack that manifests itself at multiple compo-nents. Therefore, in the second part of our work, we extendthe design of Scidive and build a distributed and correlation-based intrusion detection system called SpaceDive. Space-Dive also uses the concepts of stateful and cross-protocoldetection. It further proposes the abstraction of correlationbased IDS and provides a rule language to express correlatedrules. Correlation based IDS’s have the goal of clusteringalerts from multiple detectors (in our case, the detectors res-ident on the different VoIP components) to come up with acombined alert or the determination of an attack. The corre-lation may be of information gathered from peer entities orentities at different levels. SpaceDive is demonstrated on asample VoIP system that comprises SIP clients and SIP serv-ers spread over two domains. Moreover, our testbed is in theform of a hierarchy of components that respects administra-tive domains and is thus suitable for deployment in real-worldsettings. Several attack scenarios are created and the accu-racy and the efficiency of the system evaluated with rulesmeant to catch these attacks.

In summary, the contributions and the advantages ofSpaceDive can be specified as follows:

(1) SpaceDive presents the architecture of a hierarchicalcorrelation based IDS that is well suited to detectingattacks in VoIP applications. The ability to match rulesremotely makes the system less prone to DoS attackslaunched against VoIP components or their hosts.

(2) SpaceDive provides a language to specify rules forlocal matching and remote matching. SpaceDive’sarchitecture makes the matching of rules efficient andscalable, both essential features for a VoIP system.

SpaceDive anticipates the growing trend of peer-to-peerVoIP systems and its architecture is well suited to detect-ing intrusions on the basis of information exchanged on apeer-to-peer basis. Additionally, we present a taxonomy ofattacks against VoIP systems. Some notable attacks from thistaxonomy are redirect attacks, toll frauds, call interception

123

Intrusion detection in voice over IP environments 155

attacks, etc. We propose the end-to-end matching paradigmas a powerful approach for detecting a large class of theseattacks. Yet another contribution of this work is our rule lan-guage that extends Snort’s rule language and customizes itfor a VoIP IDS.

The taxonomy of attacks and the rule language can be thestarting point for much needed research and development inVoIP IDS.

1.1 VoIP overview

Voice over IP (VoIP) systems provide facilities for settingup and managing voice communications based on one oftwo main call management protocols: H.323 [1] and SIP.H.323 is the most widely deployed standard in VoIP com-munications, but SIP is increasing in popularity due to itssimplicity and corresponding ease of implementation. Withboth protocols, endpoints or terminals, which may be physi-cal phones (hardphones) or software programs executing ona general-purpose computer (softphones), send and receiveRTP packets that contain encoded voice conversations. Sincevoice calls may be made between IP phones and phones onthe Public-Switched Telephone Network (PSTN), gatewaysoften perform transparent translation between IP and non-IP based networks. Such gateways may implement proto-cols for media gateway management such as MGCP [4] andMEGACO/H.248 [5]. Within an H.323 network, an optionalgatekeeper may be present. The gatekeeper performs severalfunctions including authorizing network access, assisting inmanaging quality of service, and providing address-transla-tion services. Also, multipoint controllers may be present tomanage multipoint conferences between three or more ter-minals or gateways.

SIP networks also include additional types of servers.A proxy server forwards requests, possibly after perform-

ing some processing or translation. A redirect server is usedto support mobile clients and performs address translation foran accepted request and returns the new address to the orig-inator of the request. Both proxy and redirect servers maybe used to implement call forwarding and other similar ser-vices. User agent clients send requests to user agent serversto initiate calls. The user notifies a registrar of his currentlocation to allow others to contact him. The registrar is oftencombined with a proxy or redirect server (Fig. 1).

Suppose that a user Bob wants to call another user Alice.Bob begins by sending a SIP INVITE message to its ownproxy server that contains Alice’s SIP address in the destina-tion field. Bob’s proxy server then does a lookup (this couldbe a DNS lookup, a simple table lookup etc) to locate Alice’sproxy server and then forwards the INVITE to Alice. WhenAlice receives the INVITE and is willing to accept the callfrom Bob, she sends a SIP OK message back to Bob usingthe same path. Once this call setup is complete, Alice and

SIPClient

SIPClientRTP

S I P

S I P

SIPGateway

PSTN

S I P Proxy and Redirect Servers

SIPClient

SIPClientRTP

S I P

S I P

SIPGateway

PSTN

S I P Proxy and Redirect Servers

Fig. 1 SIP-based VoIP environment

Table 1 Sample SIP error messages

301 Moved permanently Redirect

302 Moved temporarily Redirect

403 Forbidden

404 Not Found

408 Request Timeout

480 Temporarily Unavailable

500 Server Internal Error

Bob can exchange voice data encoded in the form of RTPpackets directly.

Both H.323 and SIP provide protocols for call setup, man-agement, and media delivery. Voice is encoded using a nego-tiated codec and delivered using RTP over UDP/IP for bothprotocols. However, call setup and management are han-dled quite differently. H.323 relies on the H.225.0 [15] andH.245 protocols [16], whereas SIP uses a much simpler set ofrequest messages: INVITE, ACK, OPTIONS, BYE, CAN-CEL, and REGISTER. SIP provides a globally reachableaddress to which callees bind using SIP REGISTER method.The INVITE message is used by a user client agent wishing toinitiate a session, which can be responded to with an OK, fol-lowed by an ACK. To tear down a connection, a BYE messageis sent. CANCEL cancels a pending INVITE. The OPTIONSmessage is used to query or change optional parameters ofthe session, such as, encryption. Some common SIP errormessages are described in Table 1.

1.2 VoIP vulnerabilities

One of the main advantages of a VoIP system is the con-vergence of voice and data networks with voice being con-veyed over a data network. While this offers advantages incost and ease of management, the use of the data network ina converged system makes the voice network vulnerable to

123

156 Y.-S. Wu et al.

the same vulnerabilities suffered by the data network. Thisincludes well-known attacks such as denial of service attacksas well as authentication attacks. In addition, a voice networkintroduces potential vulnerabilities related to toll fraud, pri-vacy, and denial of service attacks based on degrading thequality of service of the voice conversation.

A major source of vulnerabilities lies in the protocols usedto set up and manage calls. Both H.323 and SIP transmitpacket headers and payload in clear text without a per-mes-sage integrity check (e.g. digitally signed), which allows anattacker to forge packets that manipulate device and callstates. For example, such forged packets can prematurelyterminate calls, redirect calls, or facilitate toll fraud. Someefforts are currently underway to develop encrypted signal-ing, but no solution has found widespread adoption. Some ofthe security features are built into SIP, such as, end-to-end orhop-by-hop encryption of the SIP traffic. Some other solu-tions rely on lower layer security primitives, such as, IP-Sec. However, a concern regarding use of any of the securityprimitives is the computation load on the end points, someof which are very resource constrained. It also relies on thedeployment and use of keys, which requires the adoption ofa key management infrastructure to which the different VoIPservice providers must agree. This has turned out to be achallenging deployment issue so far.

In addition to vulnerabilities present in the signaling proto-cols, the RTP protocol for media delivery also introduces sev-eral vulnerabilities due to the absence of authentication andencryption. Even if Secure RTP (SRTP) is used, the systemwould still be vulnerable to smart DoS attacks, as the extraencryption-decryption time required for each SRTP packetonly worsens the situation. Each RTP packet header containsa sequence number that allows the recipient to play backvoice packets in the proper order. However, an attacker caneasily inject artificial packets with higher sequence numbersthat will cause the injected packets to be played in place ofthe real packets. Also SPIT (Spam over Internet Telephony)is predicted to become a problem as VoIP systems becomemore widespread.

1.3 Attacks taxonomy

We can classify the possible attacks into two broad catego-ries: (1) generic attacks and (2) VoIP-specific attacks.

1.3.1 Generic attacks

We classify the generic attacks into four categories: Denialof Service (DoS) attacks, Buffer Overflow, Unauthorizedaccess, and Attack on the operating system. These kindsof attacks are well-known in traditional computing environ-ments, but also affect VoIP environments, since they shareseveral protocols, and hence their vulnerabilities.

1.3.2 VoIP-specific attacks

Signal protocol attack. Such attacks exploit vulnerabilitiesin the signaling protocol. For example, the SIP protocol con-tains holes in the subset related to invite messages.Redirect attacks. A redirect attack might change a voicemailaddress or a call forwarding address to the address of a hacker,thereby opening a channel to abuse.Call interception. An unauthorized person could monitor andintercept voice packets, perhaps reading and stealing or cor-rupting them.This is an example of a passive attack and is outside thepurview of SpaceDive.Toll fraud. An unauthorized person could monitor and inter-cept call setup packets, gaining sufficient information toallow her to masquerade as a legitimate user and make fraud-ulent calls.

1.4 Scidive architecture

Voice over IP (VoIP) applications involve multiple protocolsand each of these protocols has its own protocol states thatneed to be well-synchronized in order to keep the VoIP appli-cations running correctly. As the end goal all attacks againstVoIP is to cause disruptions to these applications, we foundthat it would be handy for our IDS to have the capability todetect the disruptions from the protocols and the correspond-ing states. To achieve this, the IDS’s rules must possess theability to express the combination of the protocol and thestates of interest. Most existing IDS solutions, such as Snort,provide only a generic architecture in which the IDS itselfdoes not pay much attention to the application level protocolsand the corresponding state information. Consequently, theuser has to construct detection rules from the ground to cap-ture these application level contexts for performing detection.This creates problems in directly applying these IDS solu-tions effectively to a VoIP system.

The design of the Scidive architecture and its correspond-ing components is meant to facilitate the process of detectingVoIP related attacks by utilizing both the protocols and thestate information. We propose a new IDS methodology inScidive in which each rule could harvest the stateful infor-mation of VoIP sessions for matching malicious patterns—Stateful Detection. It could also harvest packets from dif-ferent protocols involved in a VoIP protocol for detection—Cross Protocol Detection. Note that the SciDive architectureis not specific to the signaling protocol or the media proto-col; it just assumes that there are two distinct protocols forthese two functions. Thus, either H.323 or SIP can be usedfor the signaling protocol. However, in our current imple-mentation, the parser for the rules has been created for SIPand our discussion hereon will assume SIP as the signalingprotocol.

123

Intrusion detection in voice over IP environments 157

2 SCIDIVE components: footprints, trails, events, rules

Figure 3 presents an overview of the SCIDIVE architecture.In SCIDIVE, incoming network flows first pass through theDistiller, which translates packets into protocol dependentinformation units called Footprints. The Distiller is respon-sible for doing IP fragmentation, reassembly, decoding pro-tocols, and finally generating the corresponding Footprints.A Footprint is a protocol dependent information unit, which,for example, could be composed of a SIP message or an RTPpacket. Footprints that belong to the same session are groupedinto Trails. In Fig. 2, we have three Trails that correspond totwo SIP sessions and one RTP session. The Event Generatormaps footprints into a single event. For example, we can maptwo out of order RTP Footprints into an event called RtpJitter.Event Generator is hard-coded and seamlessly coupled withinternal structures for best possible performance. In general,it is just a layer of abstraction, which correlates the infor-mation in footprints and concentrates the information into asingle event. It helps performance by hiding some compu-

INVITE

Ringing

OK

ACK

Conversation over RTP

BYE

OK

INVITE

Ringing

OK

ACK

Conversation over RTP

BYE

OK

Fig. 2 Sample SIP message exchange

Fig. 3 SCIDIVE architecture

tationally expensive matching, e.g., by triggering the rulesetat the moment of interest instead of triggering it upon eachincoming RTP Footprint.

Ruleset is triggered by a sequence of Events, e.g., we candefine a rule for detecting RTP flow [event 1] after a sessionis torn down [event 2]. The matching in the Ruleset is basedon Events that can potentially encapsulate information frommultiple packets and can bear state information. Besides theinformation that Events provide, the Ruleset can also per-form the matching based on crude information directly fromthe Trails in case no suitable Event is available. For example,we might be interested in knowing who prematurely tearsdown the session. To achieve this, we probably need to havea look at the corresponding SIP Footprint to identify the IDand IP address of the originator. This direct access howeverwill cause some degree of inefficiency.

3 Cross-protocol methodology for detection

We propose a powerful abstraction for intrusion detectionsystems in general, and VoIP IDSs in particular, namely,cross-protocol detection. An IDS that uses cross-protocoldetection accesses packets from multiple protocols in a sys-tem to perform its detection. This methodology is suitableto systems that use multiple protocols and where attacksspanning these multiple protocols are possible. There is theimportant design consideration that such access to informa-tion across protocols must be made efficiently.

A VoIP system incorporates multiple protocols. A typi-cal example is the use of SIP to establish a connection, fol-lowed by use of RTP to transfer voice data. Also, RTCP andICMP are used to monitor the health of the connection. VoIPsystems typically have application level software for billingpurposes and therefore may have accounting software and adatabase.

To motivate the need for cross-protocol detection, weintroduce a synthetic example of a billing fraud attack. SinceVoIP systems have been gaining in popularity only of late,there are very few instances of actual attacks in databasessuch as CERT [15] and Bugtraq [16]. In our synthetic sce-nario, the attack is launched by the attacker exploiting a vul-nerability in the SIP proxy. She sends a carefully crafted SIPmessage to fool the proxy into believing the call is initiatedby someone else. The proxy initiates the accounting softwarewith the information about the incorrect source for the call.This allows the attacker to make calls without being charged.Using the cross-protocol methodology for detection, one cancreate a cross-protocol rule to look at the SIP messages, thetransaction messages between the accounting software andthe database, and the RTP flows later on. Specifically, eachof the following three conditions must hold.

123

158 Y.-S. Wu et al.

1. The SIP message should follow the correct format.2. When the accounting software sends out a transaction

to denote a call from user A to user B, check if user Ahas sent a SIP Call Initialization message to user B. Ifuser A has not set up the call with a legitimate SIP CallInitialization message, then this condition will be vio-lated.

3. Check the source/destination IP addresses of the subse-quent RTP flows. Together with information from DNSand SIP Location Servers, we can reconfirm that eachRTP flow has a corresponding legitimate call setup.

In SCIDIVE, cross-protocol detection is achieved throughkeeping multiple trails for different sessions. In our example,we can have (1) a ‘SIP trail’ which tracks all the SIP mes-sages in the session between user A and user B; (2) an ‘RTPtrail’ which tracks all the RTP packets in the session betweenA and B; and (3) an ‘Accounting trail’ which tracks relevantaccounting transactions in this session between A and B.Then, we can define three events based on the three trailscorresponding to the three conditions above. The first eventis “an incorrectly formatted SIP message in the SIP trail”,which could be an indication of an attempt to exploit the vul-nerability in the SIP proxy. The second event is “a transactionin the Accounting trail that has no matching call initializa-tion message in the SIP trail”. The third event is “either thesource or destination IP addresses of the RTP packet with-out a matching address in the SIP packet”. The third eventis specialized to take mobility into account, which will beindicated by a SIP REINVITE message with an update ofstate at the SIP Registrar that maintains location informa-tion. In the Ruleset, we can put a rule called Billing Fraud,which is triggered by a combination of these three events.An advantage of creating a rule based on a sequence of threeevents is improving the accuracy of the alarm because therule is based on three facets of the attack. It is perceivablethat relying solely on Event 1 or Event 3 to signal ‘BillingFraud’ alarm will result in false alarms. Also, bugs or tempo-rary system failures might cause Event 2. Therefore, relyingsolely on Event 2 will possibly give us false alarms.

4 Stateful methodology for detection

A second abstraction useful for VoIP systems in particular isstateful detection. Stateful detection implies building up rel-evant state within a session and across sessions and using thestate in matching for possible attacks. It is important that thestate aggregation be done efficiently so that the technique isapplicable in high throughput systems, such as VoIP systems.

A VoIP system maintains considerable amount of systemstate. The client side maintains state about all the active con-nections—when the connection was initiated, when it can be

torn down, and what the properties of the connection are.The server side also maintains state relevant to billing, suchas the duration of the call. To motivate the need for statefuldetection, we introduce a synthetic example of a DoS attackand a password guessing attack. An unauthorized user cli-ent keeps sending unauthenticated REGISTER requests tobombard the SIP proxy and ignores the 401 UNAUTHO-RIZED reply error message from the SIP proxy. If the userclient keeps sending the same request to the server, it canbe seen as a type of DoS attack on the SIP proxy. Alongwith the UNAUTHORIZED reply message, the proxy sendsa challenge phrase to the client. If the client keeps sendingrequests with different values in the challenge response field,this could be seen as a type of attack that is trying to breakthe authentication key by brute force. In either case, it wouldbe helpful for detection if the system can look at the seriesof user client requests and the subsequent server responses.Since 4XX responses are not uncommon in a normal session,a traditional IDS like Snort with a rule to detect multiple4XX responses may flag a large number of false alarms. Forexample, most user clients send an unauthenticated REG-ISTER request to the server, presuming that the SIP Proxydoes not require authentication. Later, the server sends a 401response along with a challenge phrase to the client to indi-cate that authentication is required. The client should thensend a new REGISTER request to the server along with thecorrect response phrase to continue the registration process.If the IDS does not isolate 4XX error messages from dif-ferent sessions and doesn’t correlate 4XX error messageswith requests, it is likely it will make false verdicts based onunrelated 4XX error messages. In SCIDIVE, Footprints thatbelong to a session are structured and kept in a single trail.Therefore, the history of all the state transitions of each ses-sion can be easily tracked. To handle the two attack scenariosabove, we can set up the following two events: (1) an event“DoS via repeated SIP requests”, which represents contin-uous, alternating SIP requests and 4XX error messages ina particular session and (2) an event “Password guessing”which represents continuous, alternating SIP requests withdifferent challenge responses and 401 Unauthorized replyerror messages in a particular session. Flagging of the twoevents indicates two different kinds of attacks that may havedifferent responses. In the first case, the response may be toidentify the source of the attacker and block her IP addresswhile in the second case, it may be important to take morestringent measures to ensure the security of the system, suchas changing the authentication password to a longer one.

5 Placement of SCIDIVE components

The SCIDIVE architecture has a great deal of flexibility interms of the placement of its components. For example, it ispossible to deploy the SCIDIVE IDS only on the SIP client

123

Intrusion detection in voice over IP environments 159

side for detecting anomalies in the VoIP traffic in and out ofthe client. A more aggressive approach would be to deploy theSCIDIVE IDS on all the components—Clients, SIP Proxy,and Registrar server—in a VoIP system. For example, in theBilling attack scenario outlined earlier in the section “Cross-protocol Methodology for Detection”, we need to deploy theIDS on the SIP Proxy and the Accounting Server to detectEvent 1 and Event 2. Also we need to deploy the IDS closeto both clients to monitor RTP flows to detect Event 3. Itseems a logical path to use multiple SCIDIVE IDS and analert correlation engine to give better detection accuracy. Theactual gains, impacts on the system performance, and costsfor doing so are explored in the next part in the system calledSpaceDive.

5.1 Prototype and experiments

An IDS prototype is built to instantiate the SCIDIVE archi-tecture for VoIP systems. We implement four attacks againstthe VoIP system, instantiate rules in SCIDIVE for detectingthe attacks, and perform analysis of the detection efficiency.For simplicity, the IDS is placed at each client for the exper-iments here. This configuration is shown in Fig. 4 and isreferred to as an End-point based IDS architecture.

5.2 Testbed

Our testbed comprises the following:

• Proxy : SIP Express Router from http://www.iptel.org

Internet

IDS

IDS Protected Area

SIPProxy

Hub Internet

IDS

IDS Protected Area

SIPProxy

Hub

Fig. 4 IDS Engine sits on or close to the end-point

• Clients : Kphone from http://www.wirlab.net; WindowsMessenger from Microsoft; X-Lite from http://www.xten.com

Figure 5 depicts the topology of our testbed. The IDS is con-nected to a hub through which the traffic of Client A can beseen. Although the prototype IDS can also see the traffic ofClient B and the SIP Proxy, it does not look into those trafficfor any purpose, thus mimicking an end-point based IDS.

5.3 Attacks and demonstration of the IDS

We implement four attacks to demonstrate the functional-ity of the IDS. Three of the four attacks are based on thevulnerabilities in the signaling protocol, i.e., SIP. They areBYE attack, Fake Instant Messaging, and Call Hijacking. Thefourth is based on the vulnerabilities in the media transportprotocol, which is the RTP attack. A summary of the attacks

Fig. 5 Testbed with clientphones, proxy servers, networkelements, attacker host and theIDS

123

160 Y.-S. Wu et al.

Table 2 Summary of attacks

Name of attack Protocols involved Cross-protocol or not?;if yes, how?

Stateful or not?; if yes,how?

Rule Snippet

BYE attack SIP, RTP Yes; Detect no RTPtraffic once SIP BYEhas been seen

Yes; Monitor session todetermine when it hasbeen torn down

No RTP traffic shouldbe seen after a SIP BYEfrom a particular useragent.

Fake instant messaging SIP, IP Yes; Check the sourceIP addresses ofincoming IM messages(SIP Message)

No; Check the IP addressesof all the incomingmessages. The addressshould stay the samewithin a time period.

Call Hijacking SIP, RTP Yes; Detect no RTPtraffic once SIPREINVITE has beenseen

Yes; Monitor session todetermine when it hasbeen redirected

No RTP traffic shouldbe seen after a SIP BYEfrom a particular useragent.

RTP Attack RTP, IP Yes; Check the source IPaddress of RTP packets

Yes. Check if thesequence numbers inconsecutive packetsincrease as expected

Check if RTP packetscome from legitimate IPaddress and if thesequence numberincreases appropriately.

is presented in Table 2. Details of the attacks are presentedin the following sections.

5.4 BYE attack

In this attack scenario, we have three SIP User Agents {A, B,and Attacker}. We also have a SIP Proxy for setting up theconnections. The goal of the BYE attack is to prematurelytear down an existing dialog session, which can be viewed asa kind of Denial-of-Service attack. In Fig. 6, SIP UA A andSIP UA B have an ongoing dialog session. Later, Attackersends a faked BYE message to A. After that, A will believethat it is B who wants to tear down the connection by sendingthe BYE message. A will stop its outward RTP flow immedi-ately, while B will continue to send RTP packets to A, sinceB has no notion that the connection should be terminated.

Detect orphan RTP flow. Specifically, RTP flow (the gold line) should stop before Client A sees the ‘BYE’

SIP UA [A]

SIP:enterprise@sam

SIP ProxySam.xxx.com

SIP UA [B]SIP:typhoon@sam

SIP UA [Attacker]

BYE

Detect orphan RTP flow. Specifically, RTP flow (the gold line) should stop before Client A sees the ‘BYE’

Fig. 6 Schematic of BYE attack

In order to detect this attack, we create a rule that detectsorphan RTP flow. Specifically, if it is indeed B who wants tostop the connection, then A should not see the RTP flow fromB after getting the BYE message. Thus, in the IDS, we createa rule which signals an alarm when seeing new RTP Foot-prints in the RTP Trail that corresponds to the flow from Bafter seeing a BYE SIP Footprint. Although the attack itselfoccurs only within the signaling protocol (SIP), our detectionrule spans SIP and RTP and provides an illustration of theimportance of cross-protocol detection.

5.5 Fake instant messaging

In addition to IP Phone Call setup, SIP also supports InstantMessaging. By faking the header of an instant message appro-priately, the attacker can forge a message to A and misleadit into believing the message is from B (Fig. 7). Our rule fordetecting this attack looks at the IP addresses of the mes-sages. Under most circumstances, within a period, messagesfrom B should bear the same source IP address. Therefore,once we find a message from B which carries a different IPaddress, it would be an indication that this message is a fakeone. The rule takes rate of user mobility into account andallows for changes in the IP address according to the maxi-mum rate of user motion. Indeed, this rule is not perfect. Ifthe attacker is able to spoof its IP address, then this rule willnot work. However, based on the Host-based architecture,this is probably the best we can do. This leads to interestsin research on a more ambitious architecture like deployingIDS on both client ends.

123

Intrusion detection in voice over IP environments 161

SIP ProxySam.xxx.com

SIP UA [B]SIP:typhoon@sam

Compare the source IP addresses of all messages and look for discrepancies…(i.e. within a period, all messages from B should bear the same source IP address.)

SIP UA [A]

SIP:enterprise@sam

SIP UA [Attacker]Fake IM –A will believe this message is from B, though it is actually from C

IM

Compare the source IP addresses of all messages and look for discrepancies…(i.e. within a period, all messages from B should bear the same source IP address.)

Fig. 7 Schematic of a Fake Instant Messaging attack

SIP UA [A]SIP:enterprise@sam

SIP ProxySam.xxx.com

SIP UA [B]SIP:typhoon@sam

SIP UA [Attacker]

Re-Invite

Detect orphan RTP flow. Specifically, RTP flow (the gold line) should stop before Client A sees the ‘Re-Invite’

-

Detect orphan RTP flow. Specifically, RTP flow (the gold line) should stop before Client A sees the ‘Re-Invite’

Fig. 8 Schematic of Call Hijacking attack

5.6 Call Hijacking

A direct impact of this attack is that B will experience a con-tinued silence since A is no longer sending its voice data toB. For this part, the Call Hijacking attack can be seen as akind of Denial-of-Service attack. A more subtle impact isthat if the attacker were able to emulate the voice data of B,then after successfully redirecting the call, A would not beable to distinguish between B and the attacker. This couldlead to issues of confidentiality and breach of privacy sincethe attacker will be able to listen to what A is saying (Fig. 8).

To detect this attack, we use a similar approach as for theBYE attack. Intuitively, if the REINVITE message is indeedfrom B, then A should not see any RTP flow from B after that.By detecting orphan flows, we are able to determine whetherit’s a legitimate REINVITE message or a malicious one.

5.7 RTP attack

All the attacks mentioned earlier are signaling based attacks.The RTP attack on the contrary is based on the vulnerabili-ties in media transport. In this attack, the attacker sends RTPpackets whose contents are garbage (both the header andthe payload are filled with random bytes) to one of the per-

SIP UA [A]SIP:enterprise@sam

SIP ProxySam.xxx.com

SIP UA [B]SIP:typhoon@sam

SIP UA [Attacker]

Junk RTP packet

Have sanity check on the RTP flow. (e.g. whether these packets come from the correct IP address and whether the sequence number increases nicely.)

Have sanity check on the RTP flow. (e.g. whether these packets come from the correct IP address and whether the sequence number increases nicely.)

Fig. 9 Schematic of an RTP attack

sons in a dialog. In this example, the attacker sends garbagebytes to A (Fig. 9). Since these garbage packets will corruptthe jitter buffer in the IP Phone client, depending on differ-ent implementations, this attack could result in intermittentvoice conversation or in crashing the client. For example,in our experiment, X-Lite will crash after this attack whilethe effect on Microsoft Messenger is intermittent voice con-versation. This attack also leads to degradation in QoS (jit-ter) and in the voice quality. The rule we use for detectingthis attack is that the sequence number in succeeding RTPpackets should increase regularly. Specifically, if we see twoconsecutive packets whose sequence numbers have a differ-ence greater than 50, the IDS will signal an alarm. The gapbetween sequence numbers can also be seen in a DoS attackfree environment, which can be caused by abrupt networktraffic congestion or high CPU utilization on a phone. Forexample, we have seen cases where the sequence numberjumped by hundreds due to crashed phone clients. However,these are not so common as seen in our testbed, and the num-ber 50 is empirically chosen such that the false positive rateis kept low enough (e.g. < 4% in our experiment). A largerdifference value will further decrease the false positive ratebut can decrease the sensitivity in detecting DoS attacks aswell.

5.8 Performance evaluation

In this section, we comment on the performance of the IDSsystem with respect to three metrics: (1) the detection delay,D, which is defined as the time from when an attack/intrusionis made to the time it is detected, (2) the probability of falsealarm, Pf, which is defined as the probability that the IDSflags an intrusion when none has occurred, and (3) the prob-ability of missed alarm, Pm, which is defined as the probabil-ity that the IDS system does not flag an intrusion when oneoccurs. The goal is to make the reader aware of the variablesthat affect these metrics and give an idea of practical values.

123

162 Y.-S. Wu et al.

Detailed performance evaluation with numerical computa-tion is the subject of ongoing work.

5.8.1 BYE and Call Hijacking attack

In both these attacks, the IDS rule looks at the SIP signalingevent (BYE or REINVITE) and monitors the media streamfollowing this event to detect an intrusion. Specifically, thearrival of an RTP packet at the original RTP port from theoriginal sender flags an intrusion.

5.9 Detection delay

Measuring from the time the SIP message is received, itincludes the time up to the arrival of the RTP packet from theoriginal sender. Obviously, the time depends on the frequencyof RTP packets. A typical period employed is 20 ms. How-ever, the RTP packet arrival depends also on the network con-ditions. Specifically, the delay distribution of packets fromthe sender to the receiver would cause this quantity to change.Figure 10 outlines the various timing variables involved. Letthe time of the last RTP packet arrival be T1 (before the fakeBYE/REINVITE message arrival). Also, without loss of gen-erality, let this message originate from the sender at time 0.Then T1 is the transport delay of this RTP packet. Further,let Tsip be the time of arrival of the SIP message. Let T2 bethe time of arrival of the RTP packet, which originated 20 msafter the previous RTP packet. We assume that the fake SIPmessage was generated after the 1st RTP packet but beforethe 2nd RTP packet. When the SIP message is received, theIDS system starts looking for RTP packets for a total durationof “m” milliseconds. Obviously, T1, T2 and Tsip are randomvariables and m is a fixed value. The detection delay D is afunction of these four parameters as D = T2 − Tsip, whereT2 = 20 + Nrtp and Tsip = Gsip + Nsip. Nrtp and Nsip

Fig. 10 Detection delay

are the random variables associated with the network delayfor each packet. While the second RTP packet is generated20 ms after the origin, the fake SIP message is generatedbetween the two RTP packets with a distribution given byGsip. Therefore, D = 20 + Nrtp − (Gsip − Nsip). Given thedistributions of these random variables, it is possible to com-pute the detection delay distribution. Under the simplest ofassumptions, where the fake SIP message is generated witha uniform distribution in (0, 20), and the network delay isassumed to be independent and identical for all packets, theexpected detection delay is 10 ms, which is half of the RTPpacket generation period.

5.10 Probability of missed alarm

Since the detection depends on monitoring after a SIP mes-sage arrival and since this monitoring interval is necessaryfinite (‘m’ milliseconds), there is a probability that the IDSsystem may not detect the intrusion. For instance, if the subse-quent RTP packet(s) from the original sender are lost (some-where in the network) and no RTP packet arrives within themonitoring interval, then no alarm will be raised. Referringto Fig. 10, the probability is given by:

P f = Pr{T2 > Tsip + m}= Pr{Nrtp − Gsip − Nsip > m − 20}.

5.11 Probability of false alarm

Although rare, it is possible for a valid BYE message toarrive before the RTP packet if, for instance, they take adifferent route, as shown in Fig. 11. In this case, the IDS sys-tem will raise a false alarm. In order to compute this prob-ability, we assume that the sender generated the valid SIP

Fig. 11 Probability of false alarm

123

Intrusion detection in voice over IP environments 163

BYE/REINVITE immediately (with zero delay) after it hassent out the last RTP packet. In that case, the false alarmprobability is given as P f = Pr{Nsip < Nrtp}. If the den-sity function and distribution function of Nrtp and Nsip isassumed to be identical and independent denoted by f N (t)and F N (t) respectively, then the false alarm probability is∫ m

0 FN (t)fN (t) dt.

5.12 SpaceDive

As an extension of Scidive, we propose the design of a sys-tem called SpaceDive to serve as a correlation-based IDS forVoIP systems. The implementation provides an instantiationof the design principles, which are applicable to the design ofany IDS for VoIP applications. Our solution centers on threebasic design principles.

Rule matching engine at the local and remote levels. The sys-tem includes rule matching engines located at the individualVoIP components (local rule matching engine, or RMEL) aswell as remote rule matching engines (RMER). Thus fastmatching of local attack patterns can be done together withcorrelation of attack patterns arising from multiple sources.The placement of the RME’s is flexible as is the generationand transfer of information needed for the RME′

Rs.

Cross protocol and stateful detection. These principles werepresent in our earlier version of the system, SciDive andare maintained in the distributed architecture as well. Torecap, stateful detection denotes the functionality of assem-bling state from multiple packets and using the aggregatedstate in the rule matching engine. Cross protocol detectiondenotes the functionality of matching rules that span multipleprotocols. However, SciDive did not provide any rule lan-guage for specifying these patterns while SpaceDive does.Thus the system takes the rulebase as an input easing the taskof applying SpaceDive to new deployments.

Integration with Snort. SpaceDive needs to examine pack-ets coming in at a host executing a VoIP component. Thepacket rate may be high and therefore fast matching of therules locally and fast processing to generate events for cor-relation are required. The Snort IDS is well known for itsefficiency in examining incoming packets and SpaceDiveleverages the Snort functionality. To avoid performance loss,SpaceDive is built into Snort using part of its low-level func-tionality (examining and processing packets) and adding to it(e.g., to build state to support stateful detection) and buildingcompletely the high level functionality specific to the VoIPenvironment.

The typical process flow for the detection of an attack inSpaceDive is as follows. At the local rule matching engine(RMEL), an incoming packet is sniffed and passed throughlocal rules. These local rules are specified in a language

derived from Snort’s and augmented with constructs to cre-ate state. The match generates an event for the local eventtrail and optionally state associated with the event. An eventparser next parses the local events to optionally generate anetwork event. A network event is synonymous with an eventthat needs to be aggregated with events from other VoIP com-ponents for matching at an RMER. At the RMER, there existsa rulebase for network events specified in a high level lan-guage introduced in this paper. The network events from theindividual RME′

Ls can be either pushed to or pulled by theRMER, depending on the nature of the event and the rulebaseat the RMER.

6 SPACEDIVE design

SpaceDive design hierarchy. The SpaceDive design can bebroken down in two parts—the local-level design and thenetwork-level design. Local-level design involves a singleVoIP component (client, proxy, etc.) and has the local RuleMatching Engine (RMEL). Network–level design takes intoconsideration all the components deployed in one domain oracross multiple domains and the interactions between themand provides the remote Rule Matching Engine (RMER).

Providing separate IDSs at the local level and at the networklevel plays a large part in keeping SpaceDive scalable.

7 SPACEDIVE local level design

Figure 12 shows the SpaceDive components at the locallevel. The components below the dashed line represent an

Rule Matching Engine - (RMER)

Snort Rule Matching Engine

RME ExtensionVoIP State Repository

Snort packet sniffing / reassembly / decoding

SIP / RTP packets parser

OriginalSnortRules

ExtensionRules

Local Event Trail

Network Event Trail

Event Parser

Existing Snort Functionalities

AddedSPACEDIVE

Functionalities

Rule Matching Engine - (RMER)

Snort Rule Matching Engine

RME ExtensionVoIP State RepositoryVoIP State Repository

Snort packet sniffing / reassembly / decoding

SIP / RTP packets parser

SIP / RTP packets parser

OriginalSnortRules

ExtensionRules

ExtensionRules

Local Event TrailLocal Event Trail

Network Event TrailNetwork Event Trail

Event ParserEvent Parser

Existing Snort FunctionalitiesExisting Snort Functionalities

AddedSPACEDIVE

Functionalities

AddedSPACEDIVE

Functionalities

Fig. 12 Local level design

123

164 Y.-S. Wu et al.

instance of SpaceDive installed on each VoIP componentand integrated with Snort. The sniffing module makes use ofthe libpcap library to read packets received over the network.Currently, Snort understands 4 protocols: IP, TCP, UDP, andICMP. We have modified Snort so that it now identifies SIPand RTP packets too.

The State Repository stores the current state of the sys-tem. State comprises the status of an ongoing session—i.e.connecting, established, terminated, etc., the status of a node,e.g., if the node has moved, or the reception of a particulartype of packet (e.g. a SIP BYE message). The Event Trailkeeps track of events, specified using the low level rule lan-guage detailed later in the section titled “Low Level Rule Lan-guage”. The event trail contains events ordered by session ID.For events that have no associated session ID, such as RTPpackets, the timestamp is used for ordering. The ProcessingEngine determines whether a pre-defined event has occurredand records it in the event trail. It also updates the state ofthe rule variables in the State Repository. The Event Parsertakes the event trail as input and generates a trail of “NetworkEvents”. What constitutes a Network Event is specified inthe RM ER, which disseminates the pertinent network eventdefinitions to the local RMEs. The RMER uses the NetworkEvent Trail to correlate events across the different compo-nents of the network.

8 Low level rule language

Since the RMEL is built on Snort, we make use of the Snortrule language to match incoming packets at each VoIP com-ponent to generate the local events. A rule can be brokendown into two basic parts, the rule header and options forthe rule. The rule header contains the action to perform, theprotocol that the rule applies to, and the source and destina-tion addresses and ports. The rule options is used to createa descriptive message to associate with the rule, as well ascheck a variety of other packet attributes by making use ofSnort’s extensive library of plug-ins.

The general form of a local rule is: action proto src_ipsrc_port direction dst_ip dst_port (options).

There are four major categories of rule options:meta-data These options provide information about the

rule but do not have any effect during detectionpayload These options all look for data inside the packet

payload and can be inter-relatednon-payload These options look for non-payload datapost-detection These options are rule specific triggers that

happen after a rule has “fired”.Some important rule options are ‘content’ (allows the user

to set rules that search for specific content in the packet pay-load and trigger response based on that data) and ‘priority’(assigns a severity level to rules).

When a packet comes in, its source and destination IPaddresses and ports are compared to the rules in the ruleset.If any of them are applicable to the packet, then the optionsare compared to the packet. If these comparisons return amatch, then the specified action is taken.

The native rule language of Snort is not well-suited forVoIP stateful or cross-protocol detection. Snort provides lim-ited capability for remembering state both within a VoIP ses-sion for a given protocol (e.g. SIP) and across protocols (e.g.SIP and RTP). To make up for this, we add constructs to theexisting rule language so that it is better-suited for detectingattacks targeted to VoIP environments that span packets ina session and different protocols. Next, we provide a briefdescription of the new constructs.

1. sip. This construct is used to identify a SIP call session,which includes all the SIP messages used in a phone callbetween two parties.

2. var. This construct is used to set the integer value of avariable in case of a rule match. This is used as a way ofkeeping state. The var construct belongs to the ‘options’part of a Snort rule.

3. Event. The event construct is used to create event trails.It tells Snort to record an event when the correspond-ing rule match occurs. An event can be triggered on acombination of rule matches according to the followingconstructs.

I. And/Or/Not—Logical Constructs. These const-ructs are used to trigger an event based on logicalcombinations of rule matches;

II. Before/After—Temporal Constructs. The Beforeand After constructs are used to trigger events basedon a temporal sequence of rule matches.

4. Net_Event. This construct follows the same syntax as‘Event’ except that it is used to represent a network eventas opposed to a local event.

5. Protocol-specific constructs. To detect certain attackswe need to look into specific fields in the header of a pro-tocol. For example, we may need to know the windowof allowable sequence numbers for an RTP packet. Thisleads us to define a construct called ‘seqwin’ that repre-sents the in-range sequence numbers of RTP. This meansthat now SpaceDive needs to parse the VoIP protocols(currently SIP and RTP) in addition to the four protocols(IP, ICMP, TCP, UDP) Snort is currently able to parse.This is more difficult especially for RTP, because RTPpackets do not have a fixed length and they do not con-tain any string identifier to identify them as RTP packets.SpaceDive uses the port numbers negotiated via SIP toidentify an RTP session. Currently, we limit our SIP pars-ing to the session id and the RTP port fields contained inthe SDP header of SIP since these fields suffice for the

123

Intrusion detection in voice over IP environments 165

range of attacks we consider; while in the case of RTP,we parse the entire header.

Sample Rule: Suppose that an attack pattern consists ofpacket A containing the string “DESTROY” followed bypacket B containing the string “ERASE”. Assume that packetA is on port 5,000 and packet B is an RTP packet on port6,000. Then we construct the rule as follows, which will logan event in the local event trail if both packets are found.

var r1; var r2;alert udp any 5,000 -> any any (content:“DESTROY”;

var:r1);alert rtp any 6,000 ->any any (content:“ERASE”; var:r2);event (r1 AND r2);

9 SPACEDIVE network level design

At the network level, SpaceDive views the system as com-posed of multiple VoIP domains, each with its own RMER

(Fig. 13).The RME′

Rs perform remote rule matching from networkevents generated by each RMEL in its domain. Each RMER

comes with a configuration script that gives it the followinginformation—the IP address and hostname of all the clients,servers, and proxies in its domain. The RME uses this con-figuration information for network level rule matching.

We have developed a high-level rule language for specify-ing network level events in the RMEs. At the network levela rule R can be represented as:

R = ((wherei : whati )conni )response (i = 1, . . ., N )

In R,wherei denotes the location of event i, whati denotesthe network level event i, conni is the connective betweenthe events and may be AND, OR, NOT, BEFORE, AFTER,

LAN1

LAN1

RMER

LAN2

LAN2

RMEH

RMELRMEL

RMEL RMEL

RMER

LAN1

LAN1

RMER

LAN2

LAN2

RMEH

RMELRMELRMELRMEL

RMELRMEL RMELRMEL

RMER

Fig. 13 Rule matching engine hierarchy

or NULL. The clause response indicates the response to betaken in the case of a match and is currently limited to analert or dropping a packet. The RME extracts the event spec-ifications (whati ) from the rule pertaining to a particularnode (wherei ) and disseminates them to that node at start-up. These event specifications are then used by the eventparsers in the RME′

Ls to generate network events. For exam-ple, consider a VoIP call established between clients A and B.An attacker C sends a faked SIP BYE message to A to makeit believe that B wants to tear down the connection. Since Bdoes not know of this attack, it will continue sending RTPpackets to A. To detect this attack, we look for an orphanRTP flow at A after the BYE message is received. We canframe the high level rule as:

(clientA:RTP_Flow) AFTER (clientB:BYE_Sent) alert

To make the detection infrastructure scalable, the RMEsmay be arranged in a hierarchy as shown in Fig. 13 correlat-ing information across domains. In this hierarchy, RMEH is ahigher level RME that can look at rules corresponding to mul-tiple administrative domains. Driven by the realization thatdifferent parts of a VoIP system may be owned by differentorganizations, SpaceDive has the capability to accommo-date multiple peer-level RME′

Rs. The policies and the resul-tant rules in each administrative domain may be encapsulatedwithin the RME′

Ls and the RMER without the need to sharethem. A third-party organization may own the RMEH whichmatches for rules that affect multiple organizations.

10 Algorithms in SPACEDIVE

10.1 Local level event generation—efficient matching

At the local level, all rules specified in the rules file are parsedto form an expression tree. For example, consider the ruleevent ((rv1 AND rv2) OR rv3 AND rv4). The correspondingexpression tree for this rule is shown in Fig. 14.

AND

rv4OR

AND rv3

rv1 rv2

AND

rv4True

AND

rv4OR

AND rv3

rv1 rv2

AND

rv4OR

AND rv3

rv1 rv2

AND

rv4True

AND

rv4True

(a) (b)

Fig. 14 Example rule processing at RMEL

123

166 Y.-S. Wu et al.

InternetInternetRMEH

SIP Gateway

SIP Proxy and Registrar

RMER

SIP Gateway

SIP Proxy andRegistrar

RMERDomain 2

Domain 1

InternetInternet

SIP Gateway

SIP Proxy and Registrar

RMER

RME

Domain 1

Fig. 15 SpaceDive testbed

Each variable (rv1, rv2, rv3, rv4) in the rule will have apointer to the root of this expression tree stored with it.Note that a variable may be involved in multiple rules. Inthat case, we store a list of root pointers with eachvariable.

Whenever a variable is set to 1 after a rule match, we needto evaluate the expression tree. For example, in the abovecase, when rv3 is set to 1, we find that the value of the ORexpression (the parent of rv3) becomes TRUE, irrespectiveof the values of rv1 and rv2. So we can simplify the tree struc-ture by rolling up the sub-tree rooted at the OR as shown inFig. 14.

Assuming a total of r rules and an average of v variables ineach rule, the total number of variables at RMEL is V = r.v.

The search operation is O(log V) time using binary search.Let H be the height of the original expression tree before anyroll-up. The worst case for locating the variable in the treeis when the tree is unbalanced with height H = v − 1. Tolocate a variable in the expression tree of height i, we needO(i) operations. In the worst case, we assume that settingeach variable in the tree to 1 decreases the height of the treeby just one, i.e. no tree roll-up is possible at any stage. Thiswould happen for example if all the connectives are AND.

S2 BS1A H

Invite

Invite

Invite

B is busy

Set Busy Transfer

Invite

OK from B

OK from H

OK from H

OK from H

S2 BS1A H

Invite

Invite

Invite

B is busy

Set Busy Transfer

Invite

OK from B

OK from H

OK from H

OK from H

Fig. 16 Call Hijacking

Thus, to evaluate the entire expression we get a complexity

O

(H∑

i=1

(log V + i)

)

= O(H. log V + H2)

= O(v. log V + v2)

= O[v. log v + v. log r + v2].If r < v, then the operation is O(v2).

10.2 Network level event generation

The event parser is loaded with a rules file at start-up bypushing this information from the RMER that tells it whatevents to look for in generating the network events. It thenmatches these rule definitions with the local event trail togenerate network events. The search through the event trailis linear, not logarithmic, since the events are generated atruntime and the sorting to enable binary search is not pos-sible. Consider that the RMER manages MRME′

Ls. Let usassume the number of remote events for each RMEL fol-lows a uniform distribution U(LR, UR). Let the buffer sizefor the local event trail be L . In the worst case, there willbe UR rules, the local event trail buffer is full, and each rulewill have to be checked against the entire buffer. This givesa worst case complexity for the remote event generation asO(U R × L).

Table 3 Rules to detect the CallHijacking attack Component Rule Snippet

A alert sip any any -> any any (var:rv1; content:INVITE;); net_event (rv1;)

B alert sip any any -> any any (var:rv2; content:OK;); net_event (rv2;)

S2 alert sip any any -> any any (var:rv3; content:OK;); net_event (rv3;)

RME (A:SIP_SESSION_ESTB) AND (B:SIP_OK) AND (NOT(S2:SIP_OK)) alert

123

Intrusion detection in voice over IP environments 167

10.3 Processing at RMER

The RMER also generates an expression tree from the highlevel rules. The rules are pointed to by a hash table of size M,

each entry corresponding to an RMEL. Assuming r rules atthe RMER, the number of what clauses (equal to the numberof nodes in the expression tree for one rule) is h = k M/r,where k = 1/2(L R + U R). Therefore the cost of matchingone rule is

O

(h∑

i=1

(c + i)

)

= O(h2).

11 Demonstration and experiments

11.1 Experimental Testbed

Figure 15 shows the layout of our testbed. To realisticallysimulate a VoIP environment, we have built a testbed withtwo domains. This enables us to demonstrate intra domaincalls as well as inter-domain calls. Each domain has a SIPgateway, a proxy server, a registrar server, clients and supportservers like FTP, DNS etc. The SIP clients and servers areequipped with the SpaceDive IDS. We use the SIP ExpressRouter (ser) [8] for the SIP servers. Ser can be configured asa SIP registrar or proxy server. Our SIP clients are Windowsbased and use X-Lite [9]. In the testbed, we have the gateway,registrar and proxy server running on the same machine. Wedeploy RMEH in domain 1 though in practice it can belongto either domain or be in a separate domain altogether.

11.2 Workload

The normal workload consists of the scenario where client 1from domain 1 makes a call to client 2 in domain 2. Whenclient 1 initiates a call, a SIP request is sent to the SIP server(either a proxy or a redirect server) in domain 1 with theaddresses of the caller and the callee. If a proxy server isused, client 1 sends an INVITE request to the proxy server,the proxy server determines the path, and then forwards therequest to client 2. Client 2 responds to the proxy server,which in turn, forwards the response to client 1. The proxyserver forwards the acknowledgments of both parties. A ses-sion is then established between the two clients. The commu-

S2 HS1A B

Invite

Challenge

InviteInvite

Invite with Response to challenge

Challenge

Invite with Response to

challenge

Invite404 Not Found

OK

OKOK

S2 HS1A B

Invite

Challenge

InviteInvite

Invite with Response to challenge

Challenge

Invite with Response to

challenge

Invite404 Not Found

OK

OKOK

Fig. 17 Man in the middle attack

nication between the caller and the callee happens throughRTP packets. The SIP Gateways provide call control.

11.3 Attack scenarios

This section describes several possible attack scenarios on aVoIP system. In all the attack scenarios, S1, S2 are two SIPproxies overseeing two different domains, A and B are legit-imate clients, while H is a malicious client. To compare andcontrast the power of a local IDS (SciDive) and SpaceDive,we give the steps in detecting each attack scenario by the twosystems. Fragments of the relevant rules from SpaceDive arealso given.

At the end of the section, we classify the attack scenariosin two dimensions and summarize the detection in the twosystems.

11.3.1 Call Hijacking

In the Call Hijacking attack, we assume that H is at a place onthe network where she can sniff the traffic from B. A possiblecase is that B and H happen to be using the same networkhub. In the attack, H first sends a busy transfer request toserver S2 such that any call to B will be transferred to Hwhen B is busy. Now A places a call to B. H is able to sniffthe Invite message from S2 to B and responds with a ‘B is

Table 4 Rules to detect the manin the middle attack Component Rule Snippet

S1 alert sip A any -> any any (var:rv1; content:OK;); net_event (rv1;)

S2 alert sip S1 any -> any any (var:rv2; content:OK;) net_event (rv2;)

B alert sip S2 any -> any any (var:rv3; content:OK;) net_event (rv3;)

RME (S1:SIP_OK) AND (S2:SIP_OK) AND (NOT(B:SIP_OK)) alert

123

168 Y.-S. Wu et al.

S2 BS1A H

Invite

Invite

Invite

OK from B

BYE for B

BYE for B

BYE for B

OK from B

OK from B

OK from AOK from A

OK from A

S2 BS1A H

Invite

Invite

Invite

OK from B

BYE for B

BYE for B

BYE for B

OK from B

OK from B

OK from AOK from A

OK from A

Fig. 18 BYE attack

busy’ message back to S2 before B is able to reply to S2(Fig. 16).

S2 sends a new invite message to H, thinking that B is busyand the call should be transferred to H. Although B’s ok mes-sage will eventually go to S2, many server implementationswill regard this as a noisy reply. On Openser, with the defaultconfiguration, this ok message will also be forwarded to A,which is then ignored by the X-Lite client. Thus H hijacks theconversation and can collect confidential information that Awants to pass to B.SpaceDiveDetection: A SpaceDive rule checks if the OKreply from B goes correctly all the way from B to A. Thecorrelation is done across events at B, S2, S1, and A. Thisfalls in the general class of rule called end-to-end matchingin SpaceDive, where the correlation is done across events ateach component in the path. This is a powerful rule class andcan detect many different kinds of attacks (Rules in Table 3).

11.3.2 Man in the middle attack: intercepting outgoing calls

In this attack, we assume H is on the route between S2and B. The goal for H is to intercept outgoing calls fromB. The INVITE messages are authenticated through a chal-lenge-response mechanism. As B places an outgoing call,the attacker H forwards the INVITE messages and the chal-lenge-responses between S2 and B until the authenticationphase is completed (Fig. 17).

Then H fakes a ‘404 Not Found’ message back to B suchthat B thinks A is not present. In effect a call is establishedbetween H and A with H representing itself as B.SciDiveDetection: SciDive is not able to detect this attack,since the message exchanges that happen at A, S1, S2, andB can all be part of a legitimate call signaling process. Fur-thermore, since H is the malicious client, SciDive cannot beplaced on H.SpaceDiveDetection: This can be detected by SpaceDivewith an end-to-end matching rule for the OK message goingcorrectly all the way from A to B through S1 and S2 (Rulesin Table 4).

11.3.3 BYE attack

In this attack, H’s goal is to prematurely tear down a currentcall session between A and B. For the attack, H sends a BYEto A, which will trick A into tearing down the dialog with B.Here, either the session is not secure, or if it is secure, H isable to masquerade as B through some vulnerability in thesystem (Fig. 18).SpaceDiveDetection: This can be detected by SpaceDivewith an end-to-end matching rule for the BYE message fromB → S2 → S1 → A. In this attack scenario, the part ofB → S2 is missing (Rules in Table 5).

11.3.4 Compromised SIP proxy

Assume a hybrid system with SIP based VoIP telephony andtraditional PSTN telephony. The accounting of the relayingcalls is done at the SIP proxy S2. Therefore, only authorizedusers from the VoIP side are able to make calls to a PSTNphone. Here, we assume that the attacker H exploits a vul-nerability at S2, such that by sending a malformed INVITEmessage, she is able to impersonate B and place a call to thePSTN phone (Fig. 19).SciDiveDetection: A SciDive IDS at S2 may use signa-ture based rule checking to decide whether the incomingINVITE message contains malformed data. A weakness ofthis approach is that the signature is very specific and theremay be a large number of ways of exploiting vulnerabilitiesin S2 through malformed INVITE messages.SpaceDiveDetection: In SpaceDive, detection componentsinstalled on both S2 and B check whether B did send out theINVITE message (Rules in Table 6).

Table 5 Rules to detect theBYE attack Component Rule Snippet

S2 alert sip B any -> any any (var:rv1; content:BYE;) net_event (rv1;)

S1 alert sip S2 any -> any any (var:rv2; content:BYE;) net_event (rv2;)

A alert sip S1 any -> any any (var:rv3; content:BYE;) net_event (rv3;)

RME (NOT(S2:SIP_BYE)) AND (S1:SIP_BYE) AND (A:SIP_BYE) alert

123

Intrusion detection in voice over IP environments 169

S2 BS1A H

Invite B

Invite B

OK from B

OK from B

H hacked into S2 and controls S2

S2 BS1A H

Invite B

Invite B

OK from B

OK from B

H hacked into S2 and controls S2

Fig. 19 Compromised SIP proxy

11.3.5 Denial of service (DoS) attack

An attacker can launch a denial of service attack by flood-ing the servers signaling port, flooding the media proxy’slistening port or by flooding the clients media or signalingport.SpaceDiveDetection: SpaceDive can detect a DoS attacktargeted at a client’s media port. To detect a DoS attacklaunched on host A we have the rule as shown in Table 7.

This rule will generate an alert if the RTP sequence num-bers of two consecutive packets in the same session are offby +50 or −50 units. Thus, if one packet has sequence num-ber of 400 and the next one has anything greater than 450 orless than 350, SciDive will flag an alarm. The detection inSciDive follows the same principle though the rule has to behard coded in the system.

In Table 8, we classify the attack scenarios in two dimen-sions—whether secure VoIP protocols can nullify the attackand whether the attack is dependent on position of H vis-à-visA and B. A “Yes” indicator in a column indicates the attack ismore difficult to execute. Finally, the table gives whether theattack scenario can be detected in SpaceDive and Scidive.

Note that all the above attacks are active attacks. Space-Dive cannot detect passive attacks like eavesdropping. Thisis an inherent limitation of the current IDS technology ingeneral.

11.4 Experimental results

11.4.1 Timeline for correlated detection

For this experiment, we use the Man in the Middle Attack todemonstrate the timeline of the correlated detections amongSpaceDive components. The layout of the testbed is shownin Fig. 20. We use an Openser SIP proxy (say, X) and twoX-Lite SIP softphones on A and B. The attacker H, which is asimple home made proxy, relays the traffic between B and X.We have SpaceDive RME′

Ls deployed on A, B, and X, andthe RMER on a separate host. B first tries to make an outboundcall to A and begins by authenticating with X. The authenti-cation process ends with an INVITE message from X to B.Until this point, H faithfully relays all messages exchangedbetween X and B. After this, H immediately sends a ‘404 NotFound’ message back to B and stops relaying any messagebetween B and X. Once A gets the INVITE message, she willreply with an OK message, which should go all the way backto B for establishing the call. However, since H has sent the‘404 Not Found’ message instead of the OK message to B, Bwill never see an OK message and will assume that A is notpresent on the other side. The way we detect this attack is byputting local rules at S0, S1, and S2 such that when an OKmessage is seen, the local detector will send an event to theRMER. The rule at RMER is then to check whether there arethree OK messages from S2, S1, and S0 in sequence withina bounded time window—we use a configurable window of4 ms from the time the first event is received. In this case,S0 does not see the OK message, which will then triggerthe man in the middle detection alert. The timeline for thesedetections is shown in Fig. 21.

11.4.2 Performance of rule matching

In this experiment we compared the rule matching overheadof SpaceDive for different rule types. We tested four cate-gories of rules, each of which involves content matching fora string in the payload.

Type 0 is rule matching in Snort, while Type 1 is a vanillaSnort rule matched in SpaceDive. Type 2 rules use the varconstruct to set the value of a variable in the state repository.Type 3 rules are ones that use the event construct to create alocal event in the event trail. We show cases of using 1, 2, 4,

Table 6 Rules to detect acompromised SIP proxy Component Rule Snippet

S1 alert sip A any -> any any (var:rv1; content:INVITE;) net_event (rv1;)

S2 alert sip S1 any -> any any (var:rv2; content:INVITE;) net_event (rv2;)

B alert sip S2 any -> any any (var:rv3; content:INVITE;) net_event (rv3;)

RME (S1:SIP_INVITE) AND (S2:SIP_INVITE) AND (NOT(B:SIP_INVITE)) alert

123

170 Y.-S. Wu et al.

Table 7 Rule to detect a DoSattack

Rule Snippet

alert rtp any any → A any (seqwin:50; var:rv;); net_event (rv;)

Table 8 Classification of attackscenarios Attack Scenario Mitigated by secure VoIP protocol;

Dependent on position of malicious clientDetected by SciDive;Detected by SpaceDive

Call hijacking YES; YES YES; YES

Man in the middle NO; YES NO; YES

Bye Attack YES; NO YES; YES

Compromised SIP proxy NO; NO NO; YES

Billing fraud YES; NO MAYBE; YES

A B

SIPPROXY

XH

S2

S1S0

RME R

A B

SIPPROXY

XH

S2

S1S0

RME R

A B

SIPPROXY

XH

S2

S1S0

RME R

Fig. 20 Experimental setup

Local event detection at S2

Local event detection at S1 Attack detected

at RME R

Time (ms)

t = 0 t=3.6 t=5.2

Local event detection at S2

Local event detection at S1 Attack detected

at RME R

Time (ms)

t = 0 t=3.6 t=5.2

Fig. 21 Timeline for Remote Rule Matching (times are not drawn toscale)

and 8 variables to construct the event. Figure 22 shows thecomparative performance of the four categories of rules.

As seen from the figure, rule matching for Type 1 rulesincurs negligible overhead. Type 2 rules involve about 8%overhead over Type 0. Type 3 rules with 1 variable fare almostthe same as Type 2 rules. This is quite intuitive, since theexpression tree for one variable is just a single node (the rootof the tree). As expected, the performance cost increases withthe number of variables. For a Type 3 rule with 8 variables,

0

10

20

30

40

50

60

Type

0

Type

1

Type

2

Type

3- 1

varia

ble

Type

3- 2 va

riable

s

Type

3- 4 va

riable

s

Type

3- 8 va

riable

s

Rule Type

Tim

e(µ

s)

Fig. 22 Processing time for classes of SpaceDive rules

the overhead is, on an average about 40%. This results fromthe fact that an expression tree with 8 variables has a depthof at least 3. This increases the time it takes to perform thetree rollup.

11.4.3 Resilience to DoS

DoS attacks are crucial ones in VoIP systems since secureVoIP protocols such as SRTP [6] do not protect against themand the traffic is delay-sensitive. The objective of this experi-ment is to test the resilience of a VoIP client to a DoS attack inthe baseline system and equipped with SpaceDive. A mali-cious client M sends garbage RTP packets to a receiver A con-currently with a legitimate sender client B. Client A requiresservice at the rate of 64 kbps for acceptable voice quality. Thisis used to normalize the quality of service in the presence ofthe client M.

Figure 23 shows the degradation in the quality as M is ableto pump more DoS traffic to A. Since SpaceDive is config-ured to drop garbage RTP packets (from M), A’s degradationin quality is much less steep. However, the processing tomatch the rule for RTP packets causes some degradation.

123

Intrusion detection in voice over IP environments 171

0

0.25

0.5

0.75

1

0 100 200 300 400 500 600

Data rate of malicious node (kbps)

Qu

alit

y (N

orm

aliz

ed)

Quality (without SpaceDive)

Quality (with SpaceDive)

Fig. 23 Resilience to DoS attacks

11.4.4 Detection false positive rates

Here we test the five detection rules in SpaceDive against 4call traces collected over different time periods from a closedattack-free environment consisting of two SIP proxy serv-ers and 192 phone clients. These call traces are numberedv4–v7. In terms of the average call duration, the respec-tive values for the call traces are {v4:2.79, v5:4.53, v6:4.02,v7:5} (unit: minutes). In terms of the average inter-arrivaltime between consecutive phone calls, the values are {v4:3.6,v5:3.71, v6:4.79, v7:5.48} (unit: minutes). The call tracesinclude the network packets collected by tcpdump [17] fromthe system. The detection result is presented in Table 9. Thefalse alarm rate is defined as the number of alarms triggeredover the number of rule instantiations.

Here a detection rule is instantiated when any of the rulecomponents in the corresponding detection rules (Tables 3, 4,5, and 6) is generated by a RMEL. For the DoS Attack Detec-tion rule (Table 7), the rule is instantiated every time a RTPpacket is sniffed and inspected by SpaceDive. The numberof rule instantiations from “Compromised SIP Proxy” is thenumber of calls placed, as the rule is instantiated by the veryfirst INVITE message from a phone call. The number of ruleinstantiations from “Call Hijacking” and “Man in the middle”is the number of connected calls, as the rule is instantiated bythe OK message or the call established state. Since a calleecan be busy or unavailable, the number of rule instantiationsfrom “Call Hijacking” and “Man in the middle” is always lessthan or equal to the number from “Compromised SIP Proxy”.The rule for detecting “BYE Attack” is instantiated by theBYE message and can be seen as the number of connectedphone calls that were torn down gracefully. For the “DoS inRTP Streams” rule, the number of rule instantiations corre-sponds to the number of RTP streams found in the call trace.Since RTP streams are bidirectional, we found roughly 1,000RTP streams out of the call traces each of which consists ofapproximately 500 calls.

Overall, we found that the false positive rates from Space-Dive rules used in detecting the three SIP based attacks arevery low (< 0.5%). The false positive rates from the “DoSin RTP Streams” are slightly higher but still low (< 4%).

The higher false positive rates from the DoS detection rulein Table 7 are understandable since RTP streams carry hugetraffic and jitters can occur normally even when DoS attacks

Table 9 False Positive Rates from each SpaceDive detection rule with respect to four different legitimate call traces

v4 v5 v6 v7

f.p. (%) a.t. r.i. f.p. (%) a.t. r.i. f.p. (%) a.t. r.i. f.p. (%) a.t. r.i.

Call Hijacking 0.00 0 450 0.00 0 450 0.00 0 461 0.00 0 452

Man in the middle 0.00 0 450 0.00 0 450 0.00 0 461 0.00 0 452

BYE 0.23 1 426 0.24 1 423 0.00 0 442 0.23 1 437

Compromised SIP Proxy 0.00 0 526 0.00 0 524 0.00 0 517 0.00 0 515

DoS in RTP Streams 3.18 31 976 3.90 38 974 1.94 19 978 2.79 27 967

a.t. : Number of alerts triggeredr.i. : Number of rule instantiationsf.p. : False Positive rate = (a.t.)/(r.i)

Table 10 Number of falsealarms from default Snort rulesfor the four different legitimatecall traces

v4 v5 v6 v7

[1 : 402 : 7] ICMP Destination Unreachable Port Unreachable 3823 5535 8413 5535

[1 : 1417 : 9] SNMP request udp 51 61 37 61

[1 : 1419 : 9] SNMP trap udp 32 54 73 54

[1 : 1384 : 8] MISC UPnP malformed advertisement 1222 1768 2691 1768

[1 : 527 : 8] BAD-TRAFFIC same SRC/DST 124 224 242 224

[1 : 553 : 7] POLICY FTP anonymous login attempt 0 1 1 1

123

172 Y.-S. Wu et al.

are not present. The false positive rate can be decreased byassigning a higher RTP sequence number gap threshold toflag an alarm. However, a tradeoff is that the number of falsepositives can be undesirably brought down as well.

Table 10 shows the number of alerts with respect to thefour legitimate VoIP call traces from the default Snort rules(v 2.2.0 Build 30). These alerts are all false positives. Here[x:y:z] is the corresponding Snort rule ID and revision num-ber. Compared to the number of alerts seen from SpaceDiverules (Table 9), the false positives from Snort rules are signifi-cantly higher. Thus, unmodified Snort is not suitable for VoIPattack detection since the high alert rates can overwhelm asystem administrator.

12 Conclusion and future work

We have presented the design and implementation of a state-ful and cross-protocol IDS for VoIP systems called Sci-dive. We further extend the design principles of Scidiveto build a distributed and correlation based IDS for VoIPsystems—SpaceDive. SpaceDive places local rule match-ing engines (RMEL) at individual VoIP components and aremote rule matching engine (RMER) for each domain tocorrelate events across components. The RMEL leveragesthe fast packet matching capability of Snort and augmentsSnort’s rule language to perform stateful and cross-proto-col detection. SpaceDive presents a flexible, easy to parse,high level rule language to match network events. Severalattack scenarios are presented and classified to bring out thepower of a correlation based IDS for VoIP. The design isimplemented in a testbed and demonstrated to increase theresilience of a VoIP client to DoS attacks. The cost of the rulematching is also quantified.

In ongoing work, we are looking at techniques to reducefalse alarms from the RME′

Ls through correlation, use of theRMEH to have rules private to an organization, and negoti-ation protocol between the RME’s automatically generated

from the rulebase. Another important area of our ongoingresearch is that of peer-to-peer VoIP systems. Peer-to-peerVoIP systems have a great potential for wide-spread deploy-ment—an example is the popular Skype system. We are alsolooking at ways to detect Spam in VoIP systems. Finally, weare investigating the effectiveness of our system in presenceof secure protocols—for example, secure RTP (SRTP).

References

1. ITU-T: Packet-based multimedia communications systems. Rec-ommendation H.323 February (1998)

2. Handley, M. et al.: SIP: Session Initiation Protocol. RFC 2543,March (1999)

3. Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.: RTP: ATransport Protocol for Real-Time Applications. IETF, RFC 3550,July (2003)

4. Arango, M. et al.: Media Gateway Control Protocol (MGCP) Ver-sion 1.0. RFC 2705 October (1999)

5. Cuervo, F. et al.: Megaco Protocol Version 1.0. RFC 3015, Novem-ber (2000)

6. Baugher, M. et al.: The Secure Real-time Transport Protocol(SRTP). RFC 3711. March (2004)

7. The Snort Intrusion Detection System: http://www.snort.org8. Express Router, S.I.P.: (ser) http://www.iptel.org/ser/9. X-Lite: http://xten.com/index.php?menu=X-Series

10. Prelude Hybrid IDS. Available at: http://www.prelude-ids.org11. Software, I.B.M.: IBM Tivoli Intrusion Manager. Available at:

http://www.ibm.com/software/tivoli/products/intrusionmgr/12. Giovanni Vigna: William Robertson, Vishal Kher, Richard A. Kem-

merer: A stateful intrusion detection system for world-wide webservers. In: Proceedings of the 19th Annual Computer SecurityApplications Conference. Las Vegas, Nevada 8–12 December 2003

13. Debian GNU/Linux: KDE K-Phone. Available at: http://www.wirlab.net/kphone/

14. Microsoft, “MSN Messenger v. 6.1,” Available at: http://www.messenger.msn.com/

15. ITU-T: Call Signaling protocols and media stream packetization forpacket-based multimedia communication systems. Recommenda-tion H.225.0, February (1988)

16. ITU-T: Control protocol for multimedia communication. Recom-mendation H.245, September (1988)

17. tcpdump/libpcap, Available at: http://www.tcpdump.org/

123