
Covert Channel detection in VoIP streams

Gonzalo Garateguy, Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19711, [email protected]
Gonzalo R. Arce, Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19711, [email protected]
Juan Pelaez, U.S. Army Research Laboratory, Adelphi, MD 20783, [email protected]

Abstract—This paper presents two approaches to detect VoIP covert channel communications using compressed versions of the data packets. The approach is based on specialized random projection matrices that take advantage of prior knowledge about the normal traffic structure. The reduction scheme relies on the assumption that normal traffic packets belong to a subspace of smaller dimension or can be enclosed in a convex set. We show that, by incorporating this information in the design of the random projection matrices, the detection of anomalous traffic packets can be performed in the compressed domain with only a slight performance loss with respect to the uncompressed domain. The detection algorithm is validated on real data captured on a test bed designed for that purpose.

I. INTRODUCTION

VoIP is one of the most popular services in IP networks: it is used not only at the consumer level but also for inter- and intra-company communications, and it increasingly replaces traditional analog land lines. With the growth of traffic volume due to VoIP services, the suitability of VoIP for steganographic purposes has become an important threat to network security. Recent studies [1]–[4] have proposed many techniques to disguise steganographic information in media and call-signaling protocols, showing a surprising capacity to exfiltrate large amounts of information. Considering the inevitable convergence of voice, video and data communications in both commercial and tactical environments, new techniques to uncover VoIP covert channels are of high interest to prevent the exfiltration of sensitive information. Among all the protocols used in VoIP communications (e.g. SIP, SDP, RTP, RTCP), media protocols represent the biggest threat to network security: in an average call, media traffic packets account for 99% of the total number of packets transmitted. The method proposed in this paper focuses on the analysis of the RTP media transport protocol, but the same technique can be applied to other signaling and control protocols. Since a typical VoIP call lasts on the order of minutes, considerable amounts of data have to be analyzed to effectively detect potential covert channel communications. In this context an in-depth inspection of the packets becomes a very computationally intensive task; moreover, because voice traffic is sensitive to latency, such analysis can degrade the quality of a VoIP call or even make the communication impractical. Our proposed solution uses compressive sensing techniques to acquire sketches of the data packets and then performs detection and classification in the compressed domain, thus reducing processing time and storage requirements.

Fig. 1. Test bed used to capture the VoIP traffic data: a SIP signaling server running OpenSIPS, SIP clients A and B, and a sniffing station running Wireshark.

We present a procedure to design specialized random matrices, as in [5], [6], that take advantage of prior knowledge about the normal traffic structure. Once the data dimensionality is reduced, classification is performed using support vector machines trained on samples of both anomalous and normal data. We show that classification using support vector machines gives good performance at high compression ratios, consistent with [7].

II. DATA CAPTURE AND FORMATTING

The traffic used to test the detection algorithms was generated in a test bed built for that purpose; a diagram of the setup is depicted in Figure 1. The test bed consists of a signaling server running OpenSIPS [8], two client machines capable of running several types of softphones in both Windows and Linux environments, and a sniffing station used to capture signaling and media traffic. The sniffing station runs the Wireshark software, which includes a module to detect SIP signaling and to identify the individual RTP streams associated with each of the active VoIP calls. The VoIP analysis module of Wireshark makes it possible to extract all the packets associated with a call and save them in the RTPdump format specified in [9]. The client stations are dual-boot machines running Windows


Fig. 2. RTP packet format:

bit offset | 0-1  | 2 | 3 | 4-7 | 8 | 9-15 | 16-31
0          | Ver. | P | X | CC  | M | PT   | Sequence Number
32         | Timestamp
64         | SSRC identifier
96         | CSRC identifiers (optional)
           | RTP header extension (optional)
           | Payload
           | RTP padding | RTP pad count
           | SRTP master key identifier (MKI, optional)
           | Authentication tag

and Linux operating systems; the clients used are the X-Lite softphone on Windows and the Twinkle softphone on Linux. In addition, an exfiltration client is installed on Client A, allowing it to transmit non-voice data to Client B. These packets are considered attacks and are the ones we try to identify.

A. Data matrix formation

The data used in the simulations is prepared as follows. The stream of captured packets is divided into groups, and each group of packets is arranged as a column of the data matrix D. As the size of the packets might change during a single call, we group a number of packets that does not exceed a preset number m of bytes per column. If the number of bytes would exceed this limit, the last packet in the group is mapped onto the following column and the remaining bytes of the current column are filled with 0. In this way the data matrix D always has dimensions m × n, where m is fixed and n can change according to the number of groups formed from the captured packets. The standard fields of the RTP packet headers (see Figure 2) are mapped to the beginning of each column. The remaining bytes in the packet (including the payload) are mapped to one entry each immediately after the headers (see Figure 3), and all these values are normalized to the interval [0, 1] by dividing them by 255. Considering the format of the RTP packets, most of the fields take values in a small set: Ver ∈ {2}, P ∈ {0, 1}, X ∈ {0, 1}, CC ∈ {0, .., 15}. Some fields increase monotonically from one packet to the next, for example the Timestamp and Sequence number fields, and some remain fixed for the whole call, such as the SSRC field. Since the Sequence number, Timestamp and SSRC fields take large values compared with the other fields, only the difference between the present packet and the previous one is stored in the data matrix.
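As a concrete illustration of this packing scheme, the following Python sketch groups packets into columns of at most m bytes, places the header fields at the top of each entry, differences the Sequence number, Timestamp and SSRC fields against the previous packet, and normalizes everything to [0, 1] by dividing by 255. It is our own illustration, not the authors' code; the packet representation and field names are hypothetical.

```python
import numpy as np

# Hypothetical representation: each packet is a dict with the parsed RTP header
# fields and the raw payload bytes.
HEADER_FIELDS = ["ver", "p", "x", "cc", "m", "pt", "seq", "ts", "ssrc"]
DELTA_FIELDS = {"seq", "ts", "ssrc"}       # stored as differences between packets

def build_data_matrix(packets, m):
    """Pack a stream of parsed RTP packets into an m x n data matrix D."""
    columns, col, prev = [], [], None
    for pkt in packets:
        entry = []
        for f in HEADER_FIELDS:
            v = pkt[f]
            if f in DELTA_FIELDS and prev is not None:
                v = v - prev[f]            # difference with the previous packet
            entry.append(v)
        entry.extend(pkt["payload"])       # remaining bytes, including the payload
        if len(col) + len(entry) > m:      # group would exceed m bytes: new column
            col.extend([0] * (m - len(col)))   # pad current column with zeros
            columns.append(col)
            col = []
        col.extend(entry)
        prev = pkt
    if col:
        col.extend([0] * (m - len(col)))
        columns.append(col)
    D = np.array(columns, dtype=float).T   # each column holds one group of packets
    return D / 255.0                       # normalize entries to [0, 1]
```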

B. Normal and anomalous data

The exfiltration attacks were carried out using an attack tool developed by Salare Security [10]. This tool injects the steganographic content into the payload of the RTP packets while keeping the header fields at their typical values. Different types of data were exfiltrated, i.e. JPG images, PDF files and text files. The codec declared in the headers of the exfiltration packets was G.711, which was also the codec used for the normal traffic.

Fig. 3. Mapping of the stream of RTP packets to the m × n data matrix (header rows at the top of each column, payload rows below). nh is the number of fields in the header that are mapped at the beginning of each column, np is the maximum number of bytes per packet allowed, and L is the number of packets per column. If a group of L packets exceeds np bytes, the last one is mapped to the next column.

Several minutes of normal-traffic calls were recorded using speech audio content.

III. DIMENSIONALITY REDUCTION AND SIGNAL SEPARATION

In the proposed algorithm, the data used to classify the traffic is sampled and compressed while taking advantage of prior knowledge about the structure of normal traffic. Random projections have shown a great capacity to capture the fundamental characteristics of signals with considerable structure, allowing filtering, detection and classification to be performed in a dimensionally reduced space. It has been shown that the loss incurred by dimensionality reduction via random projections, with respect to classification in the high-dimensional space, can be bounded with arbitrary precision as long as the dimension of the reduced space and the random matrices are chosen properly [11]–[14]. Moreover, if prior information about the normal behaviour of the signal is known (e.g. a basis of a subspace containing the normal signals), the performance of classification and detection can be further improved. If the data is represented as a vector x ∈ R^N (i.e. one of the columns of our data matrix), the random projections are computed as y = Φx, with Φ a random matrix of size M × N, M ≪ N. In our case we exploit the prior knowledge about the normal traffic structure in the design of the random matrix Φ. We assume two basic models: the first is a subspace or affine space model and the second is a convex set covering model. In the first case we assume that normal traffic belongs to a particular subspace S of smaller dimension than the ambient space R^N, and that anomalous vectors lie in a different subspace or affine space which may have a small intersection with the normal traffic space. In the second case, the model assumes that there exists a convex set that contains only normal traffic vectors and excludes all the anomalous vectors.
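In compact form, with x ∈ R^N a column of the data matrix and y its compressed sketch, the measurement model and the two structural assumptions described above can be summarized as follows (our restatement of the statements above, not an additional assumption):

y = \Phi x, \qquad \Phi \in \mathbb{R}^{M \times N}, \quad M \ll N,

x_{\mathrm{normal}} \in S \subset \mathbb{R}^{N} \ (\dim S \ll N) \qquad \text{or} \qquad x_{\mathrm{normal}} \in C, \ \ x_{\mathrm{anomalous}} \notin C,

where S is a low-dimensional subspace and C is a convex set.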

A. Subspace model

Under the subspace model the random matrix Φ is designed as the composition of two linear transformations: the first projects the data vectors onto S^⊥, and a random matrix is then used to reduce the dimensionality. The orthogonal projection matrix is defined as

P_{S^\perp} = I - B (B^T B)^{-1} B^T,

where B is a matrix whose columns generate the subspace S. The random projection is performed by means of a random matrix G of dimensions M × N, with M < N, such that

g_{i,j} = \begin{cases} +\sqrt{3/M} & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -\sqrt{3/M} & \text{with probability } 1/6. \end{cases} \qquad (1)

The specialized random matrix for the subspace model is then Φ = G·P_{S^⊥}. Matrices of this type are known to satisfy the RIP property if the dimension of the subspace S^⊥ is small enough [15], which ensures that distances are almost preserved in the low-dimensional space. To estimate a basis of the subspace we use a training sample {x_i}_{i=1}^T, with T > N, taken from a segment of normal traffic. Before the estimation, the mean value of the sample is subtracted, x_i ← x_i − µ, where µ = (1/T) Σ_{i=1}^T x_i. We use a variation of MacQueen's algorithm [16] to train a set of N_q points, with N_q ≤ N, that generate the space of normal traffic. The algorithm used is the following (a short code sketch is given after the list).

1) Take N_q points at random from the training sample A = {x_i}_{i=1}^T and denote that set by Q = {z_i}_{i=1}^{N_q}.
2) Initialize the indices j_i = 1 for all i = 1, .., N_q.
3) Take a new point x from A \ Q.
4) Find the z_i that is closest to x in the L1 norm and update it by z_i = (j_i z_i + x)/(j_i + 1).
5) Update the index j_i associated with that z_i by setting j_i = j_i + 1.
6) Repeat steps 3 to 5 until there are no remaining points in A \ Q.
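A minimal Python sketch of this single-pass quantizer follows, assuming the mean-subtracted training vectors are the columns of a NumPy array; variable and function names are ours, not the authors'.

```python
import numpy as np

def macqueen_codebook(X, Nq, rng=None):
    """Single-pass MacQueen-style quantizer.
    X: N x T array whose columns are mean-subtracted training vectors.
    Returns an N x Nq array whose columns approximately generate the normal subspace S."""
    rng = np.random.default_rng(rng)
    T = X.shape[1]
    idx = rng.choice(T, size=Nq, replace=False)        # step 1: random initial codebook Q
    Z = X[:, idx].copy()
    counts = np.ones(Nq)                               # step 2: j_i = 1 for all i
    for t in np.setdiff1d(np.arange(T), idx):          # steps 3-6: one pass over A \ Q
        x = X[:, t]
        i = np.argmin(np.abs(Z - x[:, None]).sum(axis=0))       # closest z_i in L1 norm
        Z[:, i] = (counts[i] * Z[:, i] + x) / (counts[i] + 1)    # running-mean update
        counts[i] += 1
    return Z
```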

This algorithm is similar to the k-means algorithm for vector quantization and allows finding the optimal vector quantizer according to the distribution of the data and a given distance metric. In particular, since our data is contained in a subspace, the quantization points are also contained in that subspace. If the number of quantization vectors N_q equals the dimension of the subspace S and the resulting vectors are linearly independent, they form a basis of the subspace, while if N_q is smaller, the subspace they generate is included in S. Changing the number of vectors N_q therefore allows us to control the dimension of the generated subspace and hence the sparsity of the vectors P_{S^⊥}x. The drawback of choosing N_q smaller than the dimension of the subspace is that some of the normal traffic components will appear in the projection, reducing the maximum achievable compression ratio. A way to estimate the dimension of the subspace is to set N_q = N, calculate the SVD of the matrix Z whose columns are the elements of Q = {z_i}_{i=1}^{N_q}, and then select N_q such that

1 - \frac{\sum_{i=1}^{N_q} \sigma_i(Z)}{\sum_{i=1}^{N} \sigma_i(Z)} \approx 0.001,

where {σ_1(Z), ..., σ_N(Z)} are the singular values of Z. With this value of N_q the clustering algorithm is run again to obtain a generator of S. Orthonormalizing the elements of Q simplifies the computation of the orthogonal projection matrix to P_{S^⊥} = I − BB^T.
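Under the assumptions above (B with orthonormal columns spanning the estimated normal subspace), a minimal sketch of the N_q selection rule, the sparse random matrix of equation (1) and the specialized operator Φ = G·P_{S⊥} could look as follows; it is our own illustration and the function names are hypothetical.

```python
import numpy as np

def sparse_random_matrix(M, N, rng=None):
    """Draw G with entries +sqrt(3/M), 0, -sqrt(3/M) with probabilities 1/6, 2/3, 1/6 (eq. 1)."""
    rng = np.random.default_rng(rng)
    return np.sqrt(3.0 / M) * rng.choice([1.0, 0.0, -1.0], size=(M, N), p=[1/6, 2/3, 1/6])

def choose_nq(Z, tol=1e-3):
    """Smallest Nq with 1 - sum_{i<=Nq} sigma_i(Z) / sum_i sigma_i(Z) <= tol."""
    s = np.linalg.svd(Z, compute_uv=False)
    frac = np.cumsum(s) / s.sum()
    return int(np.searchsorted(frac, 1.0 - tol) + 1)

def subspace_reduction_matrix(B, M, rng=None):
    """Phi = G P_{S_perp}, with P_{S_perp} = I - B B^T for B with orthonormal columns."""
    N = B.shape[0]
    P_perp = np.eye(N) - B @ B.T
    return sparse_random_matrix(M, N, rng) @ P_perp

# Typical use: run the quantizer with Nq = N, set Nq = choose_nq(Z), rerun the
# quantizer, orthonormalize its output (e.g. B, _ = np.linalg.qr(Z)), and form Phi;
# a data column x is then compressed as y = Phi @ x.
```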

B. Convex set covering model

Even though the subspace model appears to be a good model for the type of VoIP traffic tested in our experiments, it may happen that attacks and normal traffic lie in the same, or almost the same, subspace but are still separable. If we can find a convex set that includes most of the normal vectors while keeping the anomalous vectors outside, we can take advantage of it to improve the performance of the classifier in the compressed domain. In the same fashion as we defined the projection onto a subspace, we can define the projection onto the orthogonal complement of the convex set as P_{C^⊥}(x) = x − P_C(x), where P_C(x) = argmin_{y∈C} ||x − y||_2. The calculation of the projection P_C(x) for a general convex set can be very complex; however, for some sets, such as half-planes, balls or ellipsoids, it has a simple closed form. In this work we use a closed elliptical convex set oriented along the principal vectors of a training sample, with scale parameters along those axes given by the singular values of the sample.

From the training sample of normal traffic {x_i}_{i=1}^T we form a matrix A ∈ R^{N×T} whose columns are the elements of the sample. This matrix can be decomposed via the SVD as A = UΣV^T, where U and V are unitary square matrices and Σ is a rectangular matrix with nonzero elements only on the diagonal. The principal directions along which the data is distributed are given by the columns of U, while the scales along each direction are given by the associated singular values σ_i(A). Without loss of generality we can assume that N < T and form the matrix Σ_N by eliminating the last T − N columns of Σ. If we also restrict the matrix V in the same manner to obtain V_N, we have A = UΣ_N V_N^T. Based on this restricted representation, the projection onto the ellipsoid is defined by

P_{E_\rho}(x) = U \Sigma_N^{1/2} P_{B_\rho}\left(\Sigma_N^{-1/2} U^T x\right), \qquad (2)

where the operator P_{B_\rho} is the projection onto the centered ball of radius ρ in R^N,

P_{B_\rho}(x) = \begin{cases} x, & \text{if } \|x\|_2 \le \rho \\ \rho\, x/\|x\|_2, & \text{otherwise.} \end{cases} \qquad (3)


The projection defined in equation (2) is the composition of several transformations. First we rotate the vector x by means of the unitary matrix U^T, which aligns the principal vectors with the coordinate axes, and then we rescale the vector by means of Σ_N^{−1/2}. The composition of these two transformations maps vectors inside the ellipsoid to vectors inside a sphere, so that P_{B_ρ} can be used to find the projection onto a ball, after which the scaling and rotation are inverted. Clearly the size of the ellipsoid is controlled by the parameter ρ. In the following we will see that a correct selection of this parameter is fundamental to the performance of the classifiers as the compression ratio increases.

Since the matrix A is usually low rank, say rank(A) = r < min(N, T), there will be N − r elements that are zero or near zero on the diagonal of Σ_N. This presents a problem in the calculation of Σ_N^{−1/2}, but it can easily be overcome by replacing Σ_N with Σ_N + δI, where δ > 0 is a small value. After calculating the projection onto the orthogonal complement of the convex set, the dimensionality is reduced using a random matrix of the class defined in equation (1). The final dimensionality reduction operator is Φ(x) = G·P_{E_ρ^⊥}(x).
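The following Python sketch, a rough illustration under the assumptions above rather than the authors' implementation, computes the projection onto the ellipsoid of equation (2), the complement projection, and the final reduction operator Φ(x) = G·P_{E_ρ^⊥}(x); function names are ours.

```python
import numpy as np

def sparse_random_matrix(M, N, rng=None):
    """Entries +/- sqrt(3/M) with probability 1/6 each and 0 with probability 2/3 (eq. 1)."""
    rng = np.random.default_rng(rng)
    return np.sqrt(3.0 / M) * rng.choice([1.0, 0.0, -1.0], size=(M, N), p=[1/6, 2/3, 1/6])

def ball_projection(x, rho):
    """Projection onto the centered L2 ball of radius rho (eq. 3)."""
    n = np.linalg.norm(x)
    return x if n <= rho else rho * x / n

def ellipsoid_projection(A, rho, delta=1e-6):
    """Build P_{E_rho} from a training matrix A (N x T, N < T) of normal traffic (eq. 2)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)   # A = U Sigma_N V_N^T
    s = s + delta                                     # regularize near-zero singular values
    def project(x):
        z = ball_projection((U.T @ x) / np.sqrt(s), rho)   # rotate, rescale, project onto ball
        return U @ (np.sqrt(s) * z)                        # undo the scaling and the rotation
    return project

def convex_reduction_operator(A, rho, M, rng=None):
    """Return Phi(.) with Phi(x) = G (x - P_{E_rho}(x)), G drawn as in equation (1)."""
    P_E = ellipsoid_projection(A, rho)
    G = sparse_random_matrix(M, A.shape[0], rng)
    return lambda x: G @ (x - P_E(x))
```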

C. Classification

Our main goal is to determine the feasibility of classifying sketches of the data packets given that we have a priori information about the structure of normal and anomalous traffic. To that end, we employ compressed training samples from normal and anomalous traffic to determine the best separating surface between the two classes. The simplest and fastest classifier is a linear-kernel support vector machine. Even though other kernels might yield better results, we leave this analysis for future work and focus on the basic linear kernel here. Given a training sample of normal traffic {x_i^n}_{i=1}^T and a sample of attacks {x_i^a}_{i=1}^T, both with elements in R^N, we obtain the compressed training samples {y_i^n}_{i=1}^T and {y_i^a}_{i=1}^T by applying the dimensionality reduction operator to the original elements, y_i^n = G P(x_i^n) and y_i^a = G P(x_i^a). Here the operator P(·) refers either to the projection onto the orthogonal subspace S^⊥ or to the projection onto the orthogonal complement of the convex set, C^⊥. Additionally, we associate the labels l_i^n = 1 with the samples in the first group and the labels l_i^a = −1 with the samples in the second group, forming the sets of labeled, compressed training samples {(y_i^n, l_i^n)}_{i=1}^T and {(y_i^a, l_i^a)}_{i=1}^T. The optimal separating hyperplane is given by two parameters, the normal vector to the plane, w, and a bias scalar, b. These parameters are the solution of the following problem:

\begin{aligned}
\underset{w,\,b,\,\xi_i}{\text{minimize}} \quad & \tfrac{1}{2} w^T w + C \sum_i \xi_i \qquad (4) \\
\text{subject to} \quad & l_i^n (w^T y_i^n + b) \ge 1 - \xi_i^n, \quad i = 1, .., T \\
& l_i^a (w^T y_i^a + b) \ge 1 - \xi_i^a, \quad i = 1, .., T \\
& \xi_i^{a,n} \ge 0,
\end{aligned}

which finds the hyperplane with maximal margin and minimal misclassification over the selected training samples.

Fig. 4. Average probability of correct classification versus compression ratio (in %) for the subspace model on the first data set, for different numbers of vectors Nq ∈ {315, 300, 280, 150, 40} used to estimate a basis of the subspace S. Each point was calculated by averaging the results of 10 different classifiers learned using different training samples. The compression ratio is the quotient between the number of rows and the number of columns of the random matrix G used in that series of experiments.

The classification function based on this hyperplane is given by g(x) = sign(⟨w, x⟩ + b).
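As an illustration of this classification step, the sketch below (ours, assuming the compressed samples are available as rows of NumPy arrays) fits the linear soft-margin SVM of equation (4) with scikit-learn and evaluates g(y) = sign(⟨w, y⟩ + b).

```python
import numpy as np
from sklearn.svm import SVC

def train_compressed_classifier(Y_normal, Y_attack, C=1.0):
    """Fit the linear soft-margin SVM of eq. (4) on compressed samples.
    Y_normal, Y_attack: arrays of shape (T, M) whose rows are y = G P(x)."""
    Y = np.vstack([Y_normal, Y_attack])
    labels = np.hstack([np.ones(len(Y_normal)), -np.ones(len(Y_attack))])
    svm = SVC(kernel="linear", C=C).fit(Y, labels)
    return svm.coef_.ravel(), svm.intercept_[0]   # w and b of the separating hyperplane

def classify(y, w, b):
    """g(y) = sign(<w, y> + b): +1 for normal traffic, -1 for an attack."""
    return np.sign(w @ y + b)
```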

IV. EXPERIMENTAL RESULTS

To test the two approaches for compressed classification, we recorded several minutes of conversations in our test bed. The anomalous packets were generated by injecting .pdf, .txt and .jpg files into the payloads of the RTP packets. The data matrices were then generated from these streams as described in Section II-A. The number of rows in the matrix required to store a normal traffic packet is 9 for the header fields and 160 for the payload bytes, while an anomalous packet requires 9 rows for the header and 161 for the payload. Accordingly, we set the number of rows of the data matrices to 340, which suffices to accommodate 2 normal or 2 anomalous packets (2 × (9 + 161) = 340, while 2 normal packets occupy only 2 × (9 + 160) = 338 rows). As a consequence of the different packet lengths, the normal and anomalous data matrices differ in the last 2 rows, with the normal data matrix having lower rank than the anomalous one. For this reason we generate 2 different data sets: the first corresponds to the original data matrices with 340 rows, and the second corresponds to the restriction of these matrices to the first 338 rows. The classification approach based on the subspace model was tested on the first data set, while the convex set covering approach was tested on the second. Both data sets are available for download at http://www.ece.udel.edu/~garategu/CISS2011-data/.

Figure 4 depicts the results of the classification using the subspace model on the first data set. The probability of correct detection for each level of compression was calculated by averaging the correct classification rate of 10 different classifiers trained from 1000 normal and anomalous samples. For each of the classifiers we calculate the rate of correct classification using a sample of 4500 points different from the ones used in the training stage.


Fig. 5. Classification performance (average probability of correct classification versus compression ratio in %) for the subspace model on the second data set, varying the level of compression and the number of vectors Nq ∈ {2, 10, 30, 90, 180} used in the basis estimation.

It can be seen that the classification performance decreases with the compression ratio when the dimension of the estimated subspace is smaller than the true dimension, which is 321 in this case. When the number of vectors in the basis approaches the dimension of the subspace, all the normal vectors are mapped to 0 by P_{S^⊥}, while the anomalous vectors still have components outside the subspace. The good performance achieved here can be attributed to the fact that the subspace assumption clearly holds: normal traffic vectors have their last 2 components equal to 0, while anomalous vectors do not.

For the second data set the problem is more challenging, since it is not obvious that the anomalous and normal vectors belong to different subspaces. Figure 5 shows the results of repeating the experiment of Figure 4, this time using the second data set. We can see that the projection onto the orthogonal subspace actually degrades the average performance of the classifier as Nq approaches the dimensionality of the normal traffic subspace. These results confirm that the subspace model assumption does not hold for this data set. If we use the convex set covering model instead (see Figure 6), the classification accuracy improves, but it depends strongly on the selection of the parameter ρ, which defines the size of the ellipsoid. In this simulation the average probability of correct classification was again calculated by averaging the results of 10 different classifiers for each compression ratio.

V. DISCUSSION AND CONCLUSIONS

We have presented two simple methods for the classification of VoIP data packets that take advantage of knowledge about the structure of normal traffic. The subspace model method has the advantage of being very simple and yields excellent performance when the vectors are clearly separable. It only requires the multiplication of one column of the data matrix by the projection matrix Φ and the evaluation of the discriminative function g(Φx) to label each new data vector.

Fig. 6. Classification performance (average probability of correct classification versus compression ratio in %) for different scales ρ ∈ {0, 0.4, 0.66667, 0.8, 1.0667} of the ellipsoid used to calculate the projections, using the second data set.

The computation of Φ involves the estimation of a basis of the normal subspace, but this operation can be completed offline and repeated over long periods of time to account for variations in the normal traffic structure. On the other hand, if the anomalous traffic packets lie in the same subspace as the normal traffic, or close to it, the model becomes too broad (see Figure 5) and we actually lose separability by using the subspace information. The convex set covering model, on the contrary, is more powerful, but at the same time requires more computations in the high-dimensional space. Even though the matrices UΣ_N^{1/2} and Σ_N^{−1/2}U^T can be pre-computed offline from a training sample of normal traffic, the projection operator requires the computation of the norm of each data vector, a comparison operation and possibly one multiplication of a scalar by a vector before multiplying by the random matrix G. We believe this work shows promising results and demonstrates that the incorporation of prior knowledge makes it possible to compress network traffic data while keeping the relevant information needed for classification, detection or analysis of statistical behaviour. Future directions of this research include the incorporation of non-linear kernels in the support vector machine, which might help to improve the separability between classes. Another possibility for improving separability is simply to increase the dimension of the data vectors: augmenting the columns of the data matrix can be sufficient to separate the subspaces enough that the subspace model can be readily applied.

REFERENCES

[1] T. Takahashi and W. Lee, "An assessment of VoIP covert channel threats," in Security and Privacy in Communications Networks and the Workshops (SecureComm 2007), Third International Conference on, pp. 371–380, 2007.
[2] J. Lubacz, W. Mazurczyk, and K. Szczypiorski, "Vice over IP," IEEE Spectrum, vol. 47, no. 2, pp. 42–47, 2010.
[3] J. Lubacz, W. Mazurczyk, and K. Szczypiorski, "Hiding data in VoIP," in Proceedings of the Army Science Conference (26th), 2008.
[4] W. Mazurczyk and K. Szczypiorski, "Steganography of VoIP streams," in On the Move to Meaningful Internet Systems: OTM 2008, pp. 1001–1018, Springer, 2008.
[5] Z. Wang, J. Paredes, and G. R. Arce, "Adaptive subspace compressed detection of sparse signals," submitted for publication, 2010.
[6] J. Paredes, Z. Wang, G. Arce, and B. Sadler, "Compressive matched subspace detection," European Signal Processing Conference, 2009.
[7] R. Calderbank, S. Jafarpour, and R. Schapire, "Compressed learning: Universal sparse dimensionality reduction and learning in the measurement domain," http://dsp.rice.edu/files/cs/cl.pdf, 2009.
[8] OpenSIPS, available at http://www.opensips.org/.
[9] RTPdump, "Format specification," available at http://www.cs.columbia.edu/irt/software/rtptools/.
[10] Salare Security webpage, http://www.salaresecurity.com/.
[11] M. Davenport, M. Wakin, and R. Baraniuk, "Detection and estimation with compressive measurements," Dept. of ECE, Rice University, Tech. Rep., 2006.
[12] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[13] M. Duarte, M. Davenport, M. Wakin, and R. Baraniuk, "Sparse signal detection from incoherent projections," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 305–308, May 2006.
[14] J. Haupt, R. Castro, R. Nowak, G. Fudge, and A. Yeh, "Compressive sampling for signal classification," Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC '06), pp. 1430–1434, 2006.
[15] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
[16] Q. Du and T.-W. Wong, "Numerical studies of MacQueen's k-means algorithm for computing the centroidal Voronoi tessellations," Computers and Mathematics with Applications, vol. 44, no. 3, pp. 511–523, 2002.
