Using Transcoding for Hidden Communication in IP Telephony

17
1 Using Transcoding for Hidden Communication in IP Telephony Wojciech Mazurczyk, Pawel Szaga, Krzysztof Szczypiorski Warsaw University of Technology, Institute of Telecommunications Warsaw, Poland, 00-665, Nowowiejska 15/19 Abstract. The paper presents a new steganographic method for IP telephony called TranSteg (Transcoding Steganography). Typically, in steganographic communication it is advised for covert data to be compressed in order to limit its size. In TranSteg it is the overt data that is compressed to make space for the steganogram. The main innovation of TranSteg is to, for a chosen voice stream, find a codec that will result in a similar voice quality but smaller voice payload size than the originally selected. Then, the voice stream is transcoded. At this step the original voice payload size is intentionally unaltered and the change of the codec is not indicated. Instead, after placing the transcoded voice payload, the remaining free space is filled with hidden data. TranSteg proof of concept implementation was designed and developed. The obtained experimental results are enclosed in this paper. They prove that the proposed method is feasible and offers a high steganographic bandwidth. TranSteg detection is difficult to perform when performing inspection in a single network localisation. Key words: IP telephony, network steganography, TranSteg, information hiding 1. Introduction Voice over IP (VoIP), or IP telephony, is one of the services of the IP world that is changing the entire telecommunication’s landscape. It is a real-time service, which enables users to make phone calls through data networks that use an IP protocol. An IP telephony connection consists of two phases, in which certain types of traffic are exchanged between the calling parties (Fig. 1). These are: Signalling phase – in this phase signalling protocol messages, e.g. SIP messages (Session Initiation Protocol) [33], are exchanged between the caller and callee. These messages are intended to setup and negotiate the connection parameters between the calling parties. Conversation phase – if the previous phase is successful then the conversation takes place, in the form of two audio streams which are sent bidirectionally. RTP (Real-Time Transport Protocol) [34] is most often utilised for voice data transport, thus packets that carry voice payload are called RTP packets (see Fig. 2). The consecutive RTP packets form an RTP stream. Steganography encompasses various information hiding techniques, whose aim is to embed a secret message (steganogram) into a carrier. Network steganography, to perform hidden communication, utilizes network protocols and/or relationships between them as the carrier for steganograms. Because of its popularity, IP telephony is becoming a natural target for network steganography [22]. Steganographic methods are aimed at hiding of the very existence of the communication, therefore any third-party observers should remain unaware of the presence of the steganographic exchange [35]. Fig. 1: Protocols for SIP-based VoIP In this paper we introduce a new steganographic method – TranSteg (Transcoding Steganography) – which is intended for a broad class of multimedia and real-time applications, but its main foreseen application is IP telephony. TranSteg can also be exploited in other applications or services (like video streaming), wherever a possibility exists to efficiently compress (in a lossy or lossless manner) the overt data. The typical approach to

Transcript of Using Transcoding for Hidden Communication in IP Telephony

1

Using Transcoding for Hidden Communication in IP Telephony

Wojciech Mazurczyk, Paweł Szaga, Krzysztof Szczypiorski

Warsaw University of Technology, Institute of Telecommunications

Warsaw, Poland, 00-665, Nowowiejska 15/19

Abstract. The paper presents a new steganographic method for IP telephony called TranSteg

(Transcoding Steganography). Typically, in steganographic communication it is advised for

covert data to be compressed in order to limit its size. In TranSteg it is the overt data that is

compressed to make space for the steganogram. The main innovation of TranSteg is to, for a

chosen voice stream, find a codec that will result in a similar voice quality but smaller voice

payload size than the originally selected. Then, the voice stream is transcoded. At this step the

original voice payload size is intentionally unaltered and the change of the codec is not

indicated. Instead, after placing the transcoded voice payload, the remaining free space is

filled with hidden data. TranSteg proof of concept implementation was designed and

developed. The obtained experimental results are enclosed in this paper. They prove that the

proposed method is feasible and offers a high steganographic bandwidth. TranSteg detection

is difficult to perform when performing inspection in a single network localisation.

Key words: IP telephony, network steganography, TranSteg, information hiding

1. Introduction

Voice over IP (VoIP), or IP telephony, is one of the services of the IP world that is changing the entire

telecommunication’s landscape. It is a real-time service, which enables users to make phone calls through data

networks that use an IP protocol. An IP telephony connection consists of two phases, in which certain types of

traffic are exchanged between the calling parties (Fig. 1). These are:

• Signalling phase – in this phase signalling protocol messages, e.g. SIP messages (Session Initiation

Protocol) [33], are exchanged between the caller and callee. These messages are intended to setup and

negotiate the connection parameters between the calling parties.

• Conversation phase – if the previous phase is successful then the conversation takes place, in the form of

two audio streams which are sent bidirectionally. RTP (Real-Time Transport Protocol) [34] is most often

utilised for voice data transport, thus packets that carry voice payload are called RTP packets (see Fig. 2).

The consecutive RTP packets form an RTP stream.

Steganography encompasses various information hiding techniques, whose aim is to embed a secret message

(steganogram) into a carrier. Network steganography, to perform hidden communication, utilizes network

protocols and/or relationships between them as the carrier for steganograms. Because of its popularity, IP

telephony is becoming a natural target for network steganography [22]. Steganographic methods are aimed at

hiding of the very existence of the communication, therefore any third-party observers should remain unaware of

the presence of the steganographic exchange [35].

Fig. 1: Protocols for SIP-based VoIP

In this paper we introduce a new steganographic method – TranSteg (Transcoding Steganography) – which is

intended for a broad class of multimedia and real-time applications, but its main foreseen application is IP

telephony. TranSteg can also be exploited in other applications or services (like video streaming), wherever a

possibility exists to efficiently compress (in a lossy or lossless manner) the overt data. The typical approach to

2

steganography is to compress the covert data in order to limit its size (it is reasonable in the context of a limited

steganographic bandwidth). In TranSteg compression of the overt data is used to make space for the

steganogram – the concept is similar like in invertible authentication watermark that was first proposed by

Fridrich et al. [10] for JPEG images. TranSteg bases on the general idea of transcoding (lossy compression) of

the voice data from a higher bit rate codec (and thus greater voice payload size) to a lower bit rate codec (with

smaller voice payload size) with the least possible degradation in voice quality.

The general idea behind TranSteg is as follows (the detailed procedures for different hidden communication

scenarios are described in Sec. 3). RTP packets carrying user voice are inspected and the codec originally used

for speech encoding (here called overt codec) is determined by analysing the PT (Payload Type) field in the RTP

header (marked in Fig. 2).

Fig. 2: RTP packet secured with SRTP protocol

Then, TranSteg finds an appropriate codec for the overt codec, called a covert codec. The application of the

covert codec yields a comparable voice quality but a smaller voice payload size than originally. Next, the voice

stream is transcoded, but the original, large, voice payload size and the codec type indicator are preserved, thus

the PT field is left unchanged. Instead, after placing the transcoded voice of a smaller size inside the original

payload field, the remaining free space is filled with hidden data (see Fig. 3).

It is worth noting that TranSteg can be utilized in different hidden communication scenarios – not only

between the sender and the receiver of the RTP stream (i.e. in an end-to-end manner – see Sec. 3 for details). It

can be also, in particular, utilized for secured VoIP streams e.g. using the most popular SRTP protocol [4]

(Secure RTP) that provides confidentiality and authentication for RTP packets. Fig. 2 illustrates the parts of RTP

packets that are encrypted and authenticated.

Fig. 3: Frame bearing voice payload: without (top) and with (bottom) hidden data inserted by TranSteg

TranSteg, like every steganographic method, can be described by the following set of characteristics: its

steganographic bandwidth, its undetectability and the steganographic cost. The term – steganographic bandwidth

– refers to the amount of secret data can be sent per time unit, when using a particular method. Undetectability is

defined as the inability to detect a steganogram within a certain carrier. The most popular way to detect a

steganogram is to analyse statistical properties of the captured data and compare them with the typical values for

that carrier. Lastly, the steganographic cost characterises the degradation of the carrier caused by the application

of the steganographic method. In the case of TranSteg, this cost can be expressed by means of providing a

measure of conversation quality degradation, induced by transcoding and the introduction of an additional delay.

The contributions of this paper are:

• A detailed presentation of a new VoIP steganographic method,

3

• An analysis of the properties of TranSteg with the aid of a proof of concept implementation: the

steganographic bandwidth, undetectability and the steganographic cost,

• Proposition of potential approaches for TranSteg detection.

The rest of the paper is structured as follows. Section 2 describes related work in VoIP steganography.

Section 3 describes in detail the functioning of TranSteg and its hidden communication scenarios. Section 4

discusses the proof of concept implementation and presents the experimental results. Finally, Section 5

concludes our work.

2. Related Work

IP telephony as a hidden data carrier can be considered a fairly recent discovery. The proposed steganographic

methods stem from two distinctive research origins. Firstly, from the well-established image and audio file

steganography [5] – these methods targeted the digital representation of voice as carrier for hidden data. The

second sphere of influence are the so called covert channels, created in different network protocols [1], [29] (a

good survey on covert channels, by Zander et al., can be found in [43]) – these solutions target specific VoIP

protocol fields (e.g. signalling protocol – SIP, transport protocol – RTP or control protocol – RTCP) or their

behaviour. Presently, steganographic methods that can be utilized in telecommunication networks are jointly

described by the term network steganography, or, specifically, when applied to IP telephony, by the term

steganophony [22].

The first VoIP steganographic methods that utilize the digital voice signal as a hidden data carrier were

proposed by Dittmann et al. in 2005 [7]. Authors had evaluated the existing audio steganography techniques,

with a special focus on the solutions which were suitable for IP telephony. This work was further extended and

published in 2006 in [19]. In [14], an implementation of SteganRTP was described. This tool employed least

significant bits (LSB) of the G.711 codec to carry steganograms. Wang and Wu, in [41], also suggested using the

least significant bits of voice samples to carry secret communication but here, the bits of the steganogram were

coded using a low rate voice codec, like Speex. In [36], Takahashi and Lee proposed a similar approach and

presented its proof of concept implementation – Voice over VoIP (Vo2IP), which can establish a hidden

conversation by embedding compressed voice data into the regular, PCM-based, voice traffic. The authors had

also considered other methods that can be utilized in VoIP steganography, like DSSS (Direct Sequence Spread

Spectrum), FHSS (Frequency-Hopping Spread Spectrum) or Echo hiding. Aoki in [2] proposed a steganographic

method based on the characteristics of PCMU (Pulse Code Modulation), in which the 0-th speech sample can be

represented by two codes due to the overlap. Another LSB-based method was proposed by Tian et al. in [40].

Authors had incorporated the m-sequence technique to eliminate the correlation among secret messages to resist

statistical detection. A similar approach, also LSB-based, relying on adaptive VoIP steganography was presented

by the same authors in [39]; a proof of concept tool - StegTalk – was also developed. In [27] Miao and Huang

presented an adaptive steganography scheme that based on smoothness of the speech block. Such an approach

proved to give better results in terms of voice quality than the LSB-based method.

Utilisation of the VoIP-specific protocols as a steganogram carrier was first proposed by Mazurczyk and

Kotulski in 2006 [23]. The authors proposed using covert channels and watermarking to embed control

information (expressed as different parameters) into VoIP streams. The unused bits in the header fields of IP,

UDP and RTP protocols were utilized to carry the type of parameter and the actual parameter value is embedded

as watermark into the voice data. The parameters are used to bound control information, including data

authentication to the current VoIP data flow. In [25] and [26] Mazurczyk and Szczypiorski described network

steganography methods that can be applied to VoIP: to its signalling protocol – SIP (with SDP), and to its RTP

streams (also with RTCP). They discovered that a combination of information hiding solutions provides a

capacity to covertly transfer about 2000 bits during the signalling phase of a connection and about 2.5 kbit/s

during the conversation phase. In [26], a novel method called LACK (Lost Audio Packets Steganography) was

introduced; it was later described and analysed in [24] and [22]. LACK relies on the modification of both: the

content of the RTP packets, and their time dependencies. This method takes advantage of the fact that, in typical

multimedia communication protocols, like RTP, excessively delayed packets are not used for the reconstruction

of the transmitted data at the receiver, i.e. the packets are considered useless and discarded. Thus, hidden

4

communication is possible by introducing intentional delays to selected RTP packets and substituting the

original payload with a steganogram.

Bai et al. in [3] proposed a covert channel based on the jitter field of the RTCP header. This is performed

two-stage: firstly, statistics of the value of the jitter field in the current network are calculated. Then, the secret

message is modulated into the jitter field according to the previously calculated parameters. The utilization of

such modulation guarantees that the characteristic of the covert channel is similar to that of the overt one. In [6],

Forbes proposed a new RTP-based steganographic method that modifies the timestamp value of the RTP header

to send steganograms. The method’s theoretical maximum steganographic bandwidth is 350 bit/s.

The TranSteg technique presented in this paper is a development of the latter of the discussed groups of

steganographic methods for VoIP, originating from covert channels. Compared to the existing solutions, its main

advantages are a high steganographic bandwidth, low steganographic cost (i.e. little voice quality degradation)

and difficult detection. However the last depends on the utilized hidden communication scenario. All of these

features will be described and analysed in the context of TranSteg in the consecutive sections.

3. Communication Scenarios, Functioning and Detection

The performance of TranSteg depends, most notably, on the characteristics of the pair of codecs (as

mentioned in the Introduction): one used originally to encode user speech – the overt codec, and one utilized for

transcoding – the covert codec. It is worth noting that, depending on the hidden communication scenario,

TranSteg may or may not be able to influence the choice of this codec. It is assumed that it is always possible to

find a covert codec for a given overt one. However, it must be noted, that for very low bit rate codecs, the

steganographic bandwidth shall be limited. In the ideal conditions the covert codec should:

• not degrade considerably user voice quality (caused by the transcoding operation and the introduced

delays), when compared to the quality of the overt codec.

• provide the smallest achievable voice payload size as it results in the most free space in an RTP packet

to convey a steganogram.

If there is a possibility to influence the overt codec (see the hidden communication scenarios below), in an

ideal situation it should:

• result in a largest possible voice payload size to provide, together with the covert codec, the highest

achievable steganographic bandwidth,

• be commonly used to not to raise suspicion.

Taking the above into account, TranSteg’s steganographic bandwidth (SB) can be expressed as:

(3-1)

where PSO denotes the overt codec’s payload size, PSC is the covert codec’s payload size and PNS describes

the number of RTP packets sent during one second.

TranSteg can be utilized in four hidden communication scenarios (Fig. 4). The first scenario (S1 in Fig. 4) is

the most common and typically the most desired: the sender and the receiver conduct a VoIP conversation while

simultaneously exchanging steganograms (end-to-end). The conversation path is identical with the hidden data

path. In the next three scenarios (marked S2-S4 in Fig. 4) only a part of the VoIP end-to-end path is used for

hidden communication. As a result of actions undertaken by intermediate nodes, the sender and/or the receiver

are, in principle, unaware of the steganographic data exchange. The application of TranSteg in IP telephony

connections offers a chance to preserve users’ conversation and simultaneously transfer steganograms. As noted

previously, this is especially important for scenarios S2-S4.

In the abovementioned scenarios it is assumed that potential detection (steganalysis), usually executed by a

warden [9], is not able to audit the speech carried in RTP packets because of the privacy issues related with this

matter. Thus, the presence of a steganogram inside RTP packet payload can remain undiscovered. Other

possibilities of TranSteg detection will be discussed in detail in subsection 3.3.

In the following part of this section TranSteg will be described with reference to the abovementioned scenarios.

The most important factor in this context is whether the Steganogram Sender (SS) is located on the same host as

]/[)( sbitPNPSPSSB SCO ⋅−=

5

the RTP packets’ issuer. Thus, it may be able to control the RTP stream transmitter. Otherwise, when located on

some intermediate network node, it will not be capable of such control.

Fig. 4: Hidden communication scenarios for TranSteg

TranSteg may be also influenced by the utilization of SRTP protocol, which is used to provide the RTP stream

with confidentiality and authentication. As mentioned in the Introduction, securing of the RTP stream does not

necessarily impede the possibility of the exploitation of TranSteg. Such mode of operation may potentially even

increase TranSteg’s undetectability – this effect will be further investigated throughout the following subsections.

3.1 Steganogram Sender controlling an RTP packet transmitter (scenarios S1 & S2) In scenario S1 a steganogram is embedded into an RTP packet and travels along the entire path between the

RTP stream sender and receiver. Thus, there is no need to execute the operation of transcoding. User voice can

be directly encoded with the desired covert codec with the omission of the prior encoding with the overt codec

and thus avoid the whole process of transcoding. Despite this, the RTP stream will appear to have been encoded

with the aid of the overt codec. The voice payload size and PT field in the RTP header shall not be changed. It is

assumed that the SS and SR had agreed prior on the choice of the covert codecs corresponding to different overt

codecs. Such common mapping may, for example, bind an overt codec G.711 with the covert codec G.726, or

Speex 24.6 kbit/s with Speex 8 kbit/s, etc..

Thus, the SS shall perform the following steps for the embedding of a steganogram (Fig. 5):

• Step 1: Set the RTP payload size and modify Payload Type in the RTP header according to the chosen

overt codec. These changes will indicate usage of the overt coding algorithm that will not, actually, be

utilized.

• Step 2: The voice transcoded with the covert codec is inserted into the overt codec’s RTP payload field.

• Step 3: Remaining free space is allocated for the hidden data and filled with a steganogram.

• Step 4: RTP packet is sent to the receiver.

When the modified TranSteg RTP stream reaches the SR, it extracts the voice payload and steganogram from

the consecutive packets. The voice payload is then used for speech reconstruction and the steganogram parts are

concatenated. This preserves the conversation functionality between the SS and SR and simultaneously enables

hidden communication. For a third party observer, even if he/she is able to physically monitor the activity of

both users (e.g. wiretap both locations) it will look like a regular call taking place.

To further mask the presence of TranSteg, SS can utilize the SRTP protocol to perform RTP payload

encryption of both: the voice coded with a covert codec and the steganogram; thus making the detection of

steganography even harder to perform (see Fig. 2).

The hidden communication scenario S1 offers most flexibility, and is advantageous when compared with the

remaining ones, because:

• SS can choose the overt codec and thus influence the resulting steganographic bandwidth.

• The delays introduced by TranSteg to the RTP stream are the smallest in this scenario as there is no time

consumption related with the transcoding (the voice is directly encoded with the covert codec).

• This scenario does not assume any required path of communication that the RTP stream should follow.

6

• To capacitate the exploitation of TranSteg, it is only necessary to modify the IP telephony client.

Notably, the RTP protocol is usually implemented in software, which means it can be easily modified.

No other protocol’s modifications are required (i.e. UDP and frame checksums).

• RTP stream can be, additionally, secured with the aid of the SRTP protocol – this can be utilized to mask

the contents of the transcoded voice data and the steganogram.

Fig. 5: The TranSteg concept – scenario S1 (SS – Steganogram Sender; SR – Steganogram Receiver)

In scenario S2, the main difference when compared with S1, is that the SR is situated at some intermediate

network node. Thus, the IP telephony conversation is performed between the SS (caller) and an unaware of the

steganographic procedure callee. The assumption in this scenario is that the SR is able to intercept and analyze

all RTP packets exchanged between the SS and the callee. The TranSteg procedure for SS remains the same as in

scenario S1. What changes is the behaviour of the SR.

When the tampered RTP stream reaches the SR, it performs the following steps:

• Step 1: It extracts voice payload and the steganogram from the RTP packets.

• Step 2: The voice payload is transcoded from the covert to overt codec and placed once again in

consecutive RTP packets. By performing this task the steganogram is overwritten with user voice data.

• Step 3: Checksums for the lower layer protocols (i.e. the UDP checksum and CRC at the data link if

they had been utilized) are adjusted.

• Step 4: Modified frames with encapsulated RTP packets are sent to the receiver (callee).

If the IP telephony connection is required to be secured with the SRTP protocol it does not impede the

possibility to utilize TranSteg. The session keys used for authentication and encryption are exchanged between

the calling parties before the conversation phase of the call and will be known in advance to the SS. This means

that when the SS initiates an RTP stream, the first RTP packets contain transcoded voice but are intentionally not

encrypted. Instead of a steganogram, they carry cryptographic keys that where negotiated between the SS and

callee. Upon their extraction, the SR is able to encrypt the transcoded voice payload prior to forwarding it to the

RTP packets receiver. Thus, the receiving party will not be aware of the steganographic procedure. Secondly, the

SR will be capable of performing bidirectional hidden communication.

To summarize scenario S2:

• SS can still choose the overt codec and thus influence the resulting steganographic bandwidth.

• The delays introduced by TranSteg to the RTP stream depend on one transcoding operation.

• There is an assumption that SR is on the communication path between the calling parties and is able to

oversee the whole RTP stream.

• TranSteg requires certain protocol modifications in the SR: the RTP and other network protocols (e.g.

the UDP or data link layer protocols).

• Utilization of SRTP between the calling parties is not an obstacle for TranSteg. Analogically to scenario

S1, it can be viewed as means to further mask hidden communication.

3.2 Steganogram Sender Located at an Intermediate Network Node (scenarios S3 & S4) In scenario S3, the assumption is that SS is able to intercept and analyse all RTP packets exchanged between

SR and the callee. SS does not control the RTP packet’s transmitter, thus it cannot pick a suitable overt codec.

7

However, SR is a legitimate (overt) receiver of the RTP stream. Thus it is able to influence the choice of overt

codec by negotiating it during the signalling phase of the call, with the calling party remaining unaware of the

steganographic procedure. The behaviour of the SS is similar to the behaviour of SR in scenario S2 (see Sec. 3.1).

The only difference is that SS is responsible for the transcoding from the overt to covert codec and for

embedding of the steganogram – the remaining steps are the same. Thus, SS behaves as follows:

• Step 1: For an incoming RTP stream it transcodes the user’s voice data from the overt to covert codec.

• Step 2: Transcoded voice payload is placed once again in an RTP packet.

• Step 3: The remaining free space of the RTP payload field is filled with steganogram’s bits (thus the

original voice payload is erased).

• Step 4: Checksums in lower layer protocols (UDP checksum and CRC at the data link) are adjusted.

• Step 5: Modified frames with encapsulated RTP packets are sent to the receiver (SR).

SR’s operation is solely limited to extraction and analysis of the voice payload and steganogram from

consecutive RTP packets (it is the same behaviour as in scenario S1, see Sec. 3.1).

In the presence of SRTP, in this scenario, the use of the TranSteg is not compromised – the conditions and the

solution (cryptographic key’s sharing between the SR and SS by means of TranSteg) is the same as in scenario 2

(see Sec. 3.1).

To summarize, in scenario S3:

• SR is responsible of influencing the choice of the overt codec by negotiating it with the calling party

(unaware of the steganographic procedure).

• The delays introduced to the RTP stream by TranSteg depend on one transcoding operation.

• There is an assumption that SS is on the communication path between the calling parties and is able to

oversee the whole RTP stream.

• TranSteg requires certain protocol modifications in the SS: the RTP and other network protocols (e.g.

the UDP or data link layer protocols).

• Utilization of SRTP between the calling parties is not an obstacle for TranSteg. Analogically to scenario

S1, it can be viewed as means to further mask hidden communication.

In scenario S4 it is assumed that both: SS and SR, are able to intercept and analyze all RTP packets exchanged

between the calling parties. Thus, SS and SR cannot at all influence the choice of the overt codec, because they

are both located at some intermediate network node (Fig. 6). Due to this fact they are bound to rely on the codec

chosen by the overt, non-steganographic, calling parties. This, in particular, can result in low steganographic

bandwidth as the hidden communication parties must adjust the covert codec to the negotiated overt codec. The

most significant advantage of this TranSteg scenario is its potential use of aggregated IP telephony traffic to

transfer steganograms. If both SS and SR have access to more than one VoIP call then the achievable

steganographic bandwidth can be significantly increased, which can compensate for the loss in steganographic

bandwidth caused by the inability to influence the choice of the overt codec.

The behavior of SS and SR is similar – they both perform transcoding: SS from overt to covert, and SR from

covert to overt codecs. Steganogram is exchanged only along the part of the communication path where RTP

stream travels “inside” the network – it never reaches the endpoints. The steps of the TranSteg scenario for SS

are exactly the same as in scenario S3 (see above) and SR follows the logic presented in scenario S2 (see Sec.

3.1). It is also worth noting that, in this scenario, the utilization of SRTP protocol for conversation security

entirely incapacitates the usage of TranSteg.

Fig. 6: The TranSteg concept – scenario S4 (SS – Steganogram Sender; SR – Steganogram Receiver)

8

To summarize scenario S4:

• There is an assumption that both: SS and SR, are on the communication path between the calling parties

and are able to oversee the whole RTP stream.

• Neither SS nor SR can influence the choice of the overt codec, which potentially leads to a decrease in

the steganographic bandwidth.

• There is a possibility to use aggregated VoIP traffic at the path between SS and SR, and thus

significantly increase TranSteg’s steganographic bandwidth.

• Neither SS nor SR are involved in the IP telephony conversation as overt calling parties. Thus, it is

harder to detect hidden communication between the SS and SR comparing to the previously described

scenarios (since neither is an initiator of the overt traffic).

• The delays introduced to the RTP stream are the highest compared with the other presented scenarios

(due to the two transcoding operations).

• TranSteg requires certain protocol modifications in both: SS and SR; these involve modifications to the

RTP and other network protocols (e.g. UDP or data link layer protocols).

• Utilization of SRTP to secure the conversation makes the use of TranSteg impossible.

3.3 TranSteg Detection As mentioned at the beginning of Section 3, we assume that during the TranSteg-based hidden communication

there is a warden executing detection (steganalysis) methods. We further assume that the warden will not be able

to “listen” to the speech carried in RTP packets because of the privacy issues related with this matter. However,

it must be emphasised that, if the SRTP protocol had been used for securing a TranSteg conversation, the warden

will fail to detect the presence of steganograms in the RTP stream, in any of the below-mentioned scenarios

(with the exception for S4).

To perform hidden communication, TranSteg utilizes modifications to the PDUs (Protocol Data Units) as a

carrier – more precisely to the RTP payload field. When compared with other steganographic VoIP methods that,

e.g. influence the order of the RTP packets or the delays between them, TranSteg does not introduce any

irregularities to the RTP stream.

The successful detection of TranSteg mainly depends on:

• the location(s) at which the warden is able to monitor the modified RTP stream.

• the utilized TranSteg scenario (S1-4).

If the warden is capable of inspecting traffic solely in a single network, e.g. in the LAN (Local Area Network)

of the overt transmitter or receiver, then the detection is hard to accomplish. The reason for the above is due to

the fact that an RTP stream at a single traffic inspection point resembles legitimate streams. The remaining cases

will be discussed below.

In scenario S1, there is no change of format of voice payloads during the traversing of the network. Thus, even

if the warden would monitor traffic in different networks – the result would always be the same. Thus, the

chances of TranSteg detection are very limited.

In scenarios S2 and S3 there is one transcoding operation, therefore the modification of the RTP packets’

payload can be detected if the warden is able to probe and compare traffic from two localizations: prior and post

the transcoding.

In scenario S4, there are three possible locations where the warden can inspect RTP traffic. They are marked

as 1, 2 and 3 in Fig. 6. If a warden can monitor traffic in networks: 1 and 2 or 2 and 3 the detection capabilities

are the same for scenarios S2 and S3. In the case when the warden is able to inspect the RTP stream in networks

1 and 3, where the voice format should be the same, then, due to the transcoding operation, some differences can

be noted. This case is further investigated in Section 4. Thus, compared to the others, this scenario is the most

susceptible to detection. Moreover, this scenario potentially can induce the largest voice quality degradation due

to the necessary two transcoding operations.

Communication via TranSteg can be thwarted by certain actions undertaken by the wardens. The method can

be defeated by applying random transcoding to every non-encrypted VoIP connection, to which the warden has

access. Alternatively, only suspicious connections may be subject to transcoding. However, such an approach

would lead to a deterioration of the quality of conversations. What must be emphasised, not only steganographic

calls would be affected – the non-steganographic calls could also be “punished”. In section 4 we provide

9

guidelines for pinpointing suspicious IP telephony connections: we investigate RTP payload byte values’

distribution as a possible indication of TranSteg utilization. It is worth noting that this approach will fail to

succeed if SRTP protocol is applied.

Due to the above, it is necessary to explore other possibilities, which could facilitate the development of an

efficient detection method fulfilling the constraints dictated by the VoIP’s real-time operation constraints. One

promising research direction worth pursuing is the adoption of the method proposed by Wright et al. in [42],

which can be utilized for SRTP encrypted payload. However, this technique is only applicable for variable bit

rate codecs. The authors of this work discovered that the lengths of encrypted RTP packets can be used to

identify phrases spoken within a call. Therefore, if extended, this approach can be applied to deduce the

characteristics of the employed speech codec, which would increase the probability of detection of covert

communication.

The summary and comparison of hidden communication scenarios with respect to TranSteg functioning and

detection (described in Sections 3.1-3.3) is presented in Table I.

Table I Comparison of hidden communication scenarios (S1-4)

Scenario S1 S2 S3 S4

SS/SR must be on the communication path to capture RTP stream - SR SS SS and SR

SS/SR influence the choice of the overt codec SS and SR SS SR -

Number of necessary TranSteg transcoding operations 0 1 1 2

Possibility to use aggregate VoIP traffic - - - +

Neither SS nor SR is a VoIP calling party (harder detection) - - - +

Necessary modifications to lower layer protocols, i.e. UDP

checksum, frame CRC

- + + +

SRTP utilization Masks

TranSteg

Masks

TranSteg

Masks

TranSteg

Prevents

TranSteg

Number of necessary network monitoring/probing localizations for

TranSteg detection - 2 2 2

4. TranSteg Implementation and Experimental Results

TranSteg implementation was developed for the hidden communication scenario presented in Fig. 6, i.e. when

both SS and SR are located at intermediate network nodes. This scenario was chosen because, from the

perspective of the delays introduced by TranSteg, it is a worst case scenario (due to the two transcoding

operations - first at SS and then at SR). Thus, with the aid of the prototype, we want to find out to what extent

TranSteg can degrade voice quality. For any other hidden communication scenario from Fig. 4, the introduced

delays will be lower.

4.1 TranSteg Proof of Concept Implementation

TranSteg proof of concept implementation was developed in C++ on the Linux Ubuntu 10.10 (kernel 2.6.35)

platform. Fig. 7 presents the functional architecture of the TranSteg prototype implementation. It is based on

GTK+ 2.24.4 [12] threads which are used for GUI support, packet capture control and steganogram exchange.

Netfilter [30] framework was also utilized for IP packet manipulation – modules: iptables 1.4.10 and

libnetfilter_queue 0.0.17 were used.

Fig. 7: TranSteg implementation architecture

All packets passing through a host are traversing iptables chains (see Fig. 8). There are three main types of

these chains:

10

• INPUT – chain for incoming (received) packets that are intended for a process running on the local

machine;

• OUTPUT – chain for packets that are being sent from a process running on the local machine;

• FORWARD – chain for packets that are being forwarded through the host (from one network interface

to another).

Fig. 8: IP packet traversing iptables chains

The developed TranSteg implementation enables controlling the iptables module. It is therefore possible to

select from which chain the packets will be taken for modifications. This permits for the recreation of all of the

previously discussed hidden communication scenarios (from S1 to S4) with the aid of the created prototype.

Table II presents which chains should be used for each scenario S1-4.

Table II SS and SR iptables chains for different hidden communication scenarios S1-S4

Scenario SS iptables chain SR iptables chain

S1 OUTPUT INPUT

S2 OUTPUT FORWARD

S3 FORWARD INPUT

S4 FORWARD FORWARD

While traversing the iptables chains, packets are in the kernel space, which is inaccessible from the user space.

Because of that, to perform modifications, packets from selected chains are put to a special QUEUE chain. This

chain does not appear in the normal/usual packet traversing paths (see Fig. 8). Subsequently, the

libnetfilter_queue module is used to send packets one by one to the user space, where TranSteg modifications

take place. This process will be described further. Following the modifications, packets are returned to the

QUEUE and continue traversing the iptables chains.

The TranSteg implementation bases on the G.711 (64 kbit/s) as an overt codec and G.726 (32 kbit/s) as the

covert one. Thus, the transcoding is from G.711 to G.726 at the SS and the inverse is performed at the SR. The

choice of the covert codec is entirely controlled by the SS and SR. Therefore, once the overt codec is known, a

proper covert codec can be selected (e.g. one that does not degrade the voice quality and offers high

steganographic bandwidth). In this TranSteg implementation G.711 codec was chosen as an overt codec because:

• It is the most popular speech codec that is widely used in IP telephony endpoints (both soft- and hard-

phones) - it is simple to implement and does not require any license to be used.

• Generally, most common hard-phones and VoIP hardware devices, like gateways, (e.g. Cisco IP Phones,

Sipura SPA-841, Vonage Phone Adapter, Linksys PAP2 or WRT54GP2) utilize codecs such as G.711,

G.723 or G.729, while most of the popular softphones (e.g. SJPhone, Google Talk or X-lite) often offer,

besides G.711, popular codecs like free Speex or iLBC (a list of VoIP clients and supported codecs can

be found at: http://www.ozvoip.com/voip-codecs/devices/). In the majority of the endpoints G.711 is

chosen as a default codec as it offers high quality – above 4 in MOS (Mean Opinion Score) scale – and

is most likely to guarantee a successful speech codec negotiation during the signalling phase of a

connection. For example, if we want to make a call between a Cisco IP Phone 7960 (which supports

G.711 and G.729) and a popular, free softphone X-lite (which supports G.711, GSM, iLBC and Speex)

successful connection negotiation is only possible with the G.711 codec.

• Whenever it is necessary to setup a call to PSTN (Public Switched Telephone Network) then, to avoid

unnecessary transcoding and provide interoperability, the G.711 codec is frequently utilized.

4.2 Experiment Methodology

The experimental setup used to evaluate TranSteg’s performance is presented in Fig. 9. The experimental

environment was a controlled LAN network, so that no RTP packets were lost or excessively delayed. There we

11

no network-related or endpoint-related interferences. Two hosts (A and B) took part in this experiment, both of

them working under Linux Ubuntu 10.10 (kernel 2.6.35). On host A a soft-phone and TranSteg SS were

launched, while on host B a soft-phone and TranSteg SR (Fig. 9).

Fig. 9: TranSteg experimental test-bed

RTP packets were exchanged between the soft-phones (Ekiga 3.2.7 [8] and Linphone 3.3.2 [21] soft-phones

were used in the tests). Both of them were configured to encode users’ voice with the G.711 codec. After the

generation of RTP packets at the originating user’s soft-phone, they are taken by Netfilter module to the

TranSteg SS application. Here they are being transcoded to G.726 and a steganogram is added. The transcoding

is performed with the aid of Sun’s CCITT implementation published on General Public License. RTP packet

payload coded with G.711 is of the size 160 bytes. Post the transcoding to G.726, voice payload size decreases to

80 bytes. As mentioned earlier, TranSteg intentionally does not change the RTP payload field size after the

transcoding. That is why the rest of the unused 80 bytes can be allocated for a steganogram. In the developed

implementation, the steganogram is inserted into RTP packets from a user selected file. After changing the RTP

payload, the application recalculates UDP checksum. Then the modified packet is returned to the Netfilter

module and sent through outgoing network interface. Ethernet frame recalculation is done automatically before

RTP packets are sent via the network interface. Then, after traversing the network, RTP packets reach host B.

Next, they are redirected by the Netfilter module to TranSteg SR responsible for the extraction of the

steganogram and transcoding the voice payload from G.726 back to G.711. It also inserts the transcoded voice

into RTP payload fields and recalculates UDP checksums. Afterwards, the RTP packets are sent to soft-phone

application for user conversation reconstruction.

Four TranSteg characteristics were measured with the aid of the above testbed (Fig. 10):

• Influence on the call quality (expressed in MOS scale),

• Steganographic bandwidth (expressed in kbit/s),

• Introduced delays (expressed in ms),

• Distribution of the byte values in RTP packets’ payload.

Fig. 10: TranSteg experimental setup

Call quality may be expressed in terms of subjective and objective quality measures. Objective measures are

usually based on algorithms such as the E-Model [15], PAMS or PESQ [17] or others [20], [44]. In our analysis

we shall use the subjective measure MOS (Mean Opinion Score) [16] calculated with the PESQ method. In our

experiments, we used audio recordings from the TIMIT [11] continuous speech corpus - one of the most widely

used corpora in the speech recognition community. Based on these recordings the voice packet payload was

compiled into seven .wav files. Both male and female voices speaking English were used. Each resultant .wav

input file was about 30 seconds long. The adopted coding involved PCM, 8000 Hz sampling, 16 bit sample

representation of monophonic signal. It was ensured that the setup conformed to ITU-T P.862.3 recommendation

[18] requirements which guarantees proper functioning of the PESQ method. Every .wav input file part was then

inserted into the payloads of consecutive RTP packets and sent to the receiver where reassembling into a .wav

file was performed. Then, the original and degraded files were compared with the use of PESQ and the resultant

MOS-LQO (MOS-Listening Quality, Objective) was returned.

12

It was experimentally verified that the average call duration for IP telephony falls in the range of 7-11 minutes

[13]. Thus, the tests for obtaining the values of steganographic bandwidth and latency were obtained for call

duration of 9 minutes. The 9 minute representation was chosen to show how much secret data can be sent in one

direction during a typical IP telephony call. Each experiment was repeated 20 times and the average results are

presented.

Latency was measured with the aid of the MGEN 5.02 tool [37]. To achieve this goal MGEN utilizes NTP

(Network Time Protocol) 4.2.6 [28], implementation v. 4.2.6 [31]. After the synchronization of two hosts: A and

B (Fig. 11) latency measurements were performed.

Fig. 11: Latency measurement

MGEN client and server were running on host A and B respectively. MGEN enables latency measurements by

saving packet sending time in the packet payload. The MGEN client was sending a UDP stream, whose

characteristic was chosen to exactly match RTP stream with G.711 payload (50 packets per second, 160 bytes of

voice payload). The MGEN-generated packets were sent to the Netfilter module and then to TranSteg

Steganogram Sender. In this module all of the TranSteg-related operations i.e. the transcoding of voice from

G.711 do G.726 and the insertion of steganogram were performed, but packet payload was intentionally not

changed. It was necessary to leave the original UDP payload because otherwise it would be impossible to extract

packets’ sending time. Next, packets were sent to host B. There they were transferred by the Netfilter module to

the TranSteg Steganogram Receiver. In the SR all necessary operations i.e. transcoding from G.726 to G.711 and

steganogram extraction were performed, and once again the payload was not changed. Then, when packets

reached the MGEN server it would then generate a report about their sending and receiving time. Basing on the

information from the MGEN report TRPR (TRace Plot Real-time) 2.1b2 tool [38] latency of each packet was

calculated.

4.3 Experimental Results

The obtained experimental results are presented in Tables I, II and in Fig. 12-17. As mentioned in the previous

subsection, TranSteg had been investigated for the hidden communication scenario S4 from Fig. 6.

Steganographic bandwidth for every performed call was identical and equalled 32 kbit/s. The explanation of

this lies in the utilized codecs. For the overt codec – G.711 – the resulting payload is 160 bytes and for the covert

codec – G.726 (32 kbit/s) – it is 80 bytes. Thus, when using 80 bytes for steganogram in every RTP packet (and

50 packets are generated every second of the call), then, during the whole, typical IP telephony connection it is

possible to transfer about 2.2 MB (in each direction). This must be considered as a high steganographic

bandwidth when compared with other existing VoIP steganographic methods.

The voice quality results (Table III) show that the average obtained result 3.83 (in MOS scale), is similar to the

voice quality offered by G.726 (it is about 3.85). Thus, the resulting voice quality is lower than originally but it

is still considered as good – the change is almost imperceptible for an average IP telephony user.

Table III TranSteg voice quality results

Wav file number 1 2 3 4 5 6 7

Reference MOS 4.46

MOS-LQO 4.013 3.908 3.687 3.657 3.709 3.715 4.149

Average MOS-LQO 3.834 (with standard deviation 0.18)

Table IV presents delays introduced by TranSteg. The measured latency difference between calls with and

without TranSteg turned out to be about 0.4 ms. This signifies that TranSteg does not introduce significant

delays that could seriously affect voice quality. The exemplary latency results for VoIP calls with and without

TranSteg are presented in Fig. 12. It is worth noting that latency was measured for the worst case scenario (from

13

the point of view of the introduced delays) where two transcoding operations took place (first at SS, and then the

second at SR). For the other hidden communication scenarios from Fig. 4 the introduced delays would be even

lower. However, it must be emphasised that for a different pair of codecs the results would differ as both codecs:

G.711 and G.726, are of low computational complexity.

Table IV TranSteg latency results

With TranSteg Without TranSteg Difference

Average Latency [ms] 1.24 0.85 0.39

Standard Deviation [ms] 0.32 0.07 -

Fig. 12: Latency results for one exemplary IP telephony connection without (left) and with (right) TranSteg

Distribution of the byte values in RTP packet’s payload was also investigated. This was done to verify how

much the transcoding operations and the addition of the steganogram change the voice payload. This knowledge

can be later utilized to aid the development of TranSteg detection method (see Sec. 3.3).

Fig. 13 presents the byte values’ distribution for G.711 encoded speech before it reaches SS, i.e. prior to

transcoding to G. 726, and after this operation (after leaving SS). As expected, there is a significant difference

between the two presented curves. Thus, if the warden is able to monitor RTP traffic in the two networks (1 and

2 in Fig. 6), then the suspicious IP telephony connections can be discovered.

However, it must be emphasised, that in the other hidden communication scenarios from Fig. 4 (S1-3), when

the SRTP protocol is utilized for conversation security, the byte frequency distribution in RTP payloads before

and after transcoding will be similar (due to SRTP encryption) and, thus, TranSteg will be difficult to detect.

0 50 100 150 200 2500

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

5

Byte values

Fre

qu

en

cy d

istr

ibu

tio

n o

f b

yte

valu

es

Original G.711 encoded speech

Transcoded speech (G.711 -> G.726) with steganogram

Fig. 13: Frequency distribution of byte values for G.711 encoded speech (before reaching SS) and transcoded to G.726

(after leaving SS)

Fig. 14 illustrates frequency distribution of byte values for the case where speech is transcoded at SS to G.726

– prior and post to the embedding of the steganogram. This diagram shows the sole influence of this operation,

without the transcoding, of the voice payload. As can be seen, the two curve shapes are significantly different.

Connection time [s] Connection time [s]

Lat

ency

[s]

Lat

ency

[s]

14

0 50 100 150 200 2500

2

4

6

8

10

12x 10

4

Byte values

Fre

qu

en

cy d

istr

ibu

tio

n o

f b

yte

valu

es

Transcoded speech (G.711 -> G.726) without steganogram

Transcoded speech (G.711 -> G.726) with steganogram

Fig. 14: Frequency distribution of byte values for speech transcoded to G.726 (at SS) without a steganogram and with a

steganogram (after leaving SS)

In Fig. 15 it is presented how this situation can be changed when the same steganogram is compressed using

zip algorithm. In this case, the two curve shapes are very similar. The compression operation leads to the

randomization of the steganogram’s bytes. This leads to an even shift in the observed frequency distribution of

byte values.

0 50 100 150 200 2500

1

2

3

4

5

6x 10

4

Byte values

Fre

qu

en

cy d

istr

ibu

tio

n o

f b

yte

valu

es

Transcoded speech (G.711 -> G.726) without steganogram

Transcoded speech (G.711 -> G.726) with zipped steganogram

Fig. 15: Frequency distribution of byte values for speech transcoded to G.726 (at SS) without steganogram and with zipped

steganogram (after leaving SS)

Finally, let us analyze for an originally G.711 encoded speech i.e. at a point prior to reaching SS and how

frequency distribution of byte values changes past the inverse transcoding to G.711, i.e. after leaving the SR. The

obtained results are illustrated in Fig. 16. Careful analysis of this figure leads to the conclusion that there are

only slight differences between the G.711 encoded speech at the transmitter as compared with the received one

(marked with arrows in Fig. 17). This means that, if the warden is able to monitor RTP traffic in two distinct

localizations: in the overt transmitter’s and the overt receiver’s LANs (Local Area Networks), then TranSteg

utilization could still remain unnoticed. The differences in shapes of the presented curves can be explained, to

some extent, as introduced by the network, e.g. due to packet losses or transmission errors. The latter are present

as the underlying transport protocol in the TCP/IP stack that is used is the unreliable UDP protocol. However,

the presence of major differences can indicate TranSteg utilization.

The presented experimental results confirm the conclusions from Sec. 3.3. A warden capable of monitoring

traffic in more than single network localization is very likely to detect the presence of TranSteg. However, it

must be emphasized that the chosen hidden communication scenario S4 was selected to verify the potential

15

influence of TranSteg on voice quality in a worst case scenario. From the point of view of the required effort for

detection, this scenario is the easiest to defeat. It must be noted, that if any other scenario is utilized, e.g. the S1

scenario, together with SRTP encryption, then the disclosure of TranSteg can be very hard to attain.

0 50 100 150 200 2500

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

5

Byte values

Fre

qu

en

cy d

istr

ibu

tio

n o

f b

yte

valu

es

Original G.711 encoded speech

Transcoded speech (G.726 -> G.711)

Visible

differences

Fig. 16: Frequency distribution of byte values for G.711 encoded speech (before reaching SS) and transcoded to G.711

(after leaving SR)

However, even when considering the scenario S4, there is a simple way to obstruct the detection process,

especially for the situation illustrated in Fig. 16. The proposed solution is to encode steganogram’s bits until the

original and transcoded speech’s curves, at the receiver, are the same. Such an approach will surely limit the

potential available steganographic bandwidth. At the same time, with the original and transcoded byte values’

frequency distribution curves looking exactly the same, the disclosure of hidden communication shall be

impossible.

To summarize, the detection of the TranSteg method is not trivial, especially for the hidden communication

scenarios S1-3. Even for scenario S4 some simple measures can be taken to improve undetectability. Simple

analysis of the frequency distribution of byte values in RTP payload as shown above may not be sufficient. Thus,

as stated in Sec. 3.3, other possibilities leading to the development of an efficient detection method (one that

would fulfil VoIP’s real-time constraints) must be investigated.

5. Conclusions and Future Work

In this paper, a new IP telephony steganographic method, named TranSteg, was introduced. It was described

basing on the possible hidden communication scenarios (S1-4 from Fig. 4). It was shown that the scenario where

steganogram sender and receiver are the original source and final destination of the RTP traffic, respectively, is

the most advantageous from the point of view of the achievable steganographic bandwidth, introduced

steganographic cost and the undetectability of the method.

TranSteg proof of concept implementation was also designed and developed for a worst case scenario, where

the introduced delays are largest. The obtained experimental results, for G.711 as an overt and G.726 as a covert

codec, proved that the proposed method is feasible and offers a high steganographic bandwidth up to 32 kbit/s

while introducing delays lower than 1 ms, and still retaining good voice quality (about 3.8 in MOS scale).

Detection of TranSteg strongly depends on the realized hidden communication scenario and the capabilities of

a warden responsible for network steganography detection (e.g. the locations where it can monitor VoIP traffic).

Generally, TranSteg detection can be difficult to perform, especially, if the SRTP protocol is utilized for

securing RTP streams. Detection is also impeded when the warden is able to inspect traffic only in a single

network localization.

Future work should involve an in depth analysis of speech codec pairs (overt and covert) that would be most

advantageous for TranSteg. The algorithm for the selection of the covert codec will be developed, with the

consideration of assuring an acceptable voice quality, low introduced delays and different VoIP codecs’

16

characteristic features. Moreover, a prototype implementation should be developed for an end-to-end hidden

communication scenario (S1 from Fig. 4) with the SRTP capability. This will allow the pursuing of an efficient,

real-time, TranSteg detection method. On the other hand, to enhance the undetectability of TranSteg the different

mechanisms of spreading the steganogram over the voice data instead of filling them in the end of the payload

will be analysed in more detail. Additionally, detection methods that proved to be successful for digital images

based on identifying double-compression [32] or compression artefacts should be considered and evaluated in

TranSteg context. However, it must be noted that when for TranSteg scenario S1 (Fig. 4) is utilized together with

SRTP voice stream encryption then utilization of such techniques for TranSteg detection will likely fail.

ACKNOWLEDGMENTS

• This work was supported by the Polish Ministry of Science and Higher Education under grants:

504/M/1036/0248/2011 and IP2010 025470.

• The authors would like to thank Elżbieta Zielińska from Warsaw University of Technology (Poland) for helpful

comments and remarks.

REFERENCES

[1] Ahsan, K., Kundur, D.: Practical Data Hiding in TCP/IP. In: Proc. of: Workshop on Multimedia Security at ACM

Multimedia 2002, Juan-les-Pins, France (2002)

[2] Aoki, N.: A Technique of Lossless Steganography for G.711 Telephony Speech, International Conference on

Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP 2008), pp. 608 – 611, Harbin, China,

15-17 August 2008

[3] Bai, L. Y., Huang, Y., Hou, G., Xiao, B.: Covert Channels Based on Jitter Field of the RTCP Header, International

Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2008

[4] Baugher, M., Casner, S., Frederick, R., Jacobson, V.: The Secure Real-time Transport Protocol (SRTP), RFC 3711,

March 2004

[5] Bender, W., Gruhl, D., Morimoto, N., Lu, A.: Techniques for Data Hiding. IBM. System Journal. 35(3,4), 313–336

(1996)

[6] Christopher R. Forbes, A New Covert Channel over RTP, MSc thesis, Rochester Institute of Technology, URL:

https://ritdml.rit.edu/bitstream/handle/1850/12883/CForbesThesis8-21-2009.pdf?sequence=1

[7] Dittmann, J., Hesse, D., Hillert, R.: Steganography and steganalysis in voice-over IP scenarios: operational aspects

and first experiences with a new steganalysis tool set, Proc. of SPIE, Vol. 5681, Security, Steganography, and

Watermarking of Multimedia Contents VII, San Jose, 2005, pp. 607-618

[8] Ekiga soft-phone, URL: http://ekiga.org/

[9] Fisk G., Fisk M., Papadopoulos C., Neil J., Eliminating steganography in Internet traffic with active wardens, 5th

International Workshop on Information Hiding, Lecture Notes in Computer Science, 2578, pp.18–35, 2002.

[10] Fridrich, J., Goljan, M., Du, R.: Invertible Authentication Watermark for JPEG Images, ITCC 2001, Las Vegas,

Nevada, April 2-4, 2001, pp. 223-227

[11] Garofolo, J. S., et al.: TIMIT Acoustic-Phonetic Continuous Speech Corpus Linguistic Data Consortium,

Philadelphia, 1993

[12] GTK+ (GIMP Toolkit) documentation, URL: http://www.gtk.org

[13] Guha, S., Daswani, N., Jain, R.: An experimental study of the Skype peer-to-peer VoIP system. Sixth International

Workshop on Peer-to-Peer Systems (IPTPS), February 2006

[14] I)ruid, Real-time Steganography with RTP, Technical Report, September, 2007 URL:

http://www.uninformed.org/?v=8&a=3&t=pdf

[15] ITU-T, Recommendation G. 107, The E-Model, a computational model for use in transmission planning, 2002

[16] ITU-T, Recommendation. P.800, Methods for subjective determination of transmission quality, 1996

[17] ITU-T, Recommendation. P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-

end speech quality assessment of narrow-band telephone networks and speech codecs, 2001

[18] ITU-T, Recommendation. P.862.3, Application guide for objective quality measurement based on

Recommendations P.862, P.862.1 and P.862.2, November 2007

[19] Krätzer, C., Dittmann J., Vogel, T., Hillert R.: Design and Evaluation of Steganography for Voice-over-IP, In Proc.

of IEEE International Symposium on Circuits and Systems, (ISCAS) 2006, Kos, Greece, 2006

17

[20] Lee, W., Lee M., McGowan, J., Enhancing objective evaluation of speech quality algorithm: current efforts,

limitations and future directions, European Transactions on Telecommunications, Vol. 20, Issue 6, October 2009,

pp. 594–603

[21] Linphone soft-phone, URL: http://www.linphone.org/

[22] Lubacz, J., Mazurczyk, W., Szczypiorski, K.: Vice over IP (invited paper), In: IEEE Spectrum, ISSN: 0018-9235,

February 2010, pp. 40-45

[23] Mazurczyk, W., Kotulski, Z.: New security and control protocol for VoIP based on steganography and digital

watermarking, In Proc. of 5th International Conference on Computer Science - Research and Applications (IBIZA

2006), Poland, Kazimierz Dolny 9-11 February 2006

[24] Mazurczyk, W., Lubacz, J.: LACK – a VoIP Steganographic Method, In: Telecommunication Systems: Modelling,

Analysis, Design and Management, Vol. 45, Numbers 2-3, 2010, ISSN: 1018-4864 (print version), ISSN: 1572-

9451 (electronic version), Springer US, Journal no. 1123

[25] Mazurczyk, W., Szczypiorski, S.: Covert Channels in SIP for VoIP signalling, In: Hamid Jahankhani, Kenneth

Revett, and Dominic Palmer-Brown (Eds.): ICGeS 2008 - Communications in Computer and Information Science

(CCIS) 12, Springer Verlag Berlin Heidelberg, Proc. of 4th International Conference on Global E-security 2008,

London, United Kingdom, 23-25 June 2008, pp. 65-70

[26] Mazurczyk, W., Szczypiorski, S.: Steganography of VoIP Streams, In: Robert Meersman and Zahir Tari (Eds.):

OTM 2008, Part II - Lecture Notes in Computer Science (LNCS) 5332, Springer-Verlag Berlin Heidelberg, Proc.

of OnTheMove Federated Conferences and Workshops: The 3rd International Symposium on Information Security

(IS'08), Monterrey, Mexico, November 9-14, 2008, pp. 1001-1018

[27] Miao, R., Huang, Y.: An Approach of Covert Communication Based on the Adaptive Steganography Scheme on

Voice over IP, Communications, IEEE International Conference on (ICC 2011), 2011

[28] Mills, D., Delaware, U., Martin, J., Burbank, J., Kasch W.: Network Time Protocol Version 4: Protocol and

Algorithms Specification, IETF RFC 5905, June 2010

[29] Murdoch, S., Lewis, S.: Embedding Covert Channels into TCP/IP. Information Hiding, 247–266 (2005)

[30] Netfilter framework documentation, URL: http://www.netfilter.org

[31] NTP: The Network Time Protocol documentation, URL: http://ntp.org

[32] Pevny, T., Fridrich, J.: Detection of Double-Compression for Applications in Steganography, IEEE Transactions

on Information Security and Forensics, 3(2), pp. 247-258, 2008

[33] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.: SIP: Session Initiation Protocol, IETF, RFC 3261,

June 2002

[34] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.: RTP: A Transport Protocol for Real-Time Applications,

IETF, RFC 3550, July 2003

[35] Simmons, G. J., Subliminal channels; past and present. European Transactions on Telecommunications, Vol. 5,

Issue 4, July/August 1994, pp. 459–474.

[36] Takahashi, T., Lee, W.: An Assessment of VoIP Covert Channel Threats. In: Proc. of 3rd

International Conference

on Security and Privacy in Communication Networks (SecureComm 2007), Nice, France (2007)

[37] The Multi-Generator MGEN documentation, URL: http://cs.itd.nrl.navy.mil/work/mgen/

[38] The TRace Plot Real-time (TRPR) documentation, URL: http://pf.itd.nrl.navy.mil/proteantools/trpr.html

[39] Tian, H., Zhou, K., Jiang, H., Liu, J., Huang, Y., Feng D.: An adaptive steganography scheme for voice over IP,

IEEE International Symposium on Circuits and Systems (ISCAS 2009), Taipei, Taiwan, 24-27 May 2009

[40] Tian, H., Zhou, K., Jiang, H., Liu, J., Huang, Y., Feng D.: An M-Sequence Based Steganography Model for Voice

over IP, IEEE International Conference on Communications (ICC 2009), pp. 1-5, Dresden, Germany, 14-18 June

2009

[41] Wang, C., Wu, W., Information Hiding in Real-Time VoIP Streams, Ninth IEEE International Symposium on

Multimedia (ISM 2007), pp. 255 – 262, Taichung, Taiwan, 10-12 Dec. 2007

[42] Wright, C., Ballard, L., Coulls, S., Monrose, F., Masson, G.: Spot me if you can: recovering spoken phrases in

encrypted VoIP conversations. In Proceedings of IEEE Symposium on Security and Privacy, May, 2008

[43] Zander, S., Armitage, G., Branch, P. (2007) A Survey of Covert Channels and Countermeasures in Computer

Network Protocols, IEEE Communications Surveys & Tutorials, 3rd Quarter 2007, Volume: 9, Issue: 3, pp. 44-57,

ISSN: 1553-877X

[44] Zhou, Y., Chan, W.: E-model based comparison of multiple description coding and layered coding in packet

networks, European Transactions on Telecommunications, Vol. 18, Issue 7, November 2007, pp. 661–668