An Empirical Study on the Robustness of a Fragile Watermark for Relational Databases

Ibrahim Kamel, Waheeb Yaqub, Kareem Kamel

Dept. of Electrical and Computer Engineering

University of Sharjah

kamel@sharjah.ac.ae, {waheebyaqub, kamel.kareem}@gmail.com

Abstract: Databases often contain critical information. Unauthorized changes to databases can have serious consequences and may result in significant losses for the organization. This paper presents a viable solution for protecting the integrity of the data stored in relational databases using fragile watermarking. Prior techniques introduce distortions to the watermarked values and thus cannot be applied to all attributes. Our technique protects relational tables by reordering tuples relative to each other according to a secret value (the watermark). This paper presents an empirical study of the effect of the data distribution and other data measures, such as the mean and standard deviation, on the attack detection rate. A study of the cost, in terms of execution time, of the proposed watermark insertion and data verification algorithms is also presented.

I. INTRODUCTION

Information security and integrity are becoming critical

areas of research especially with the recent outburst of

Internet attacks and hostility. Adversaries launch attacks for

many reasons, most importantly for profit. Databases usually

contain critical information like salaries, ownership, land use,

personal information, etc. Unauthorized changes to such

databases might result in significant losses for organizations

and individuals. Recently, the database research community

came to realize the importance of watermarking in protecting

relational databases and particularly databases published on

the web [2] [5] [10] [12].

The state of the art in detecting unauthorized alterations in

relational databases can only detect some of the attacks on

specific types of attributes. These techniques use probabilistic

models. Thus, their detection rate depends on the type and

value of the modification. Moreover, prior database

watermarking techniques are designed to protect attributes

that tolerate distortion to their values.

In this paper, we propose a fragile watermarking technique to protect the integrity of relational databases. The proposed technique does not distort the watermarked attribute. We present an empirical study of the effect of various data distributions and data properties on the robustness of the proposed technique.

In the next section, prior work on watermarking databases is summarized. Section III describes the basic idea behind using the fragile watermarking technique. In Section IV, we describe the watermark insertion algorithm. Section V shows how attacks on the integrity of the data can be detected. Section VI evaluates the effectiveness of the proposed watermarking technique. Then a sensitivity analysis of the effect of the data distribution on the detection rate is introduced. Finally, the execution times of the watermark insertion and data verification algorithms are presented. Concluding remarks and future work are presented in Section VII.

II. RELATED WORK

Recently, the use of watermarking for data integrity and copyright protection has attracted a lot of interest in the database research community [1,2,4,12]. Traditional techniques in watermarking databases simply hide a secret message in some of the attributes in the relation [13][14]. Normally, the watermark insertion introduces distortion to the attribute by changing its original value. Thus these techniques can only be applied to relations with attributes that are not sensitive to small errors, e.g., temperature readings [2].

In this scheme, tuples are first selected for watermark embedding using a secret key. A few bits of some arbitrary attributes in the selected tuples are then modified to embed the watermark bits. This scheme requires the entire database relation to be available at the time of watermark embedding. Attributes like salary, prices, and property coordinates cannot be used to host watermarks. The number of attributes that can be watermarked depends on the nature of the database. Some relations might have few watermark-able attributes, while other relations might not have any attributes that can be watermarked. The detection of malicious alteration is probabilistic and depends on the number of watermark-able attributes. The larger the number of watermark-able attributes, the more secure the database becomes.

Wayner [9] used a technique similar to our proposed technique for hiding text in the order of a list of songs. Unlike our proposed scheme, Wayner's technique is a type of steganography. Thus, it should be as robust as possible to preserve the hidden message even after malicious attacks.

Sion et al. [12] presented a robust watermarking scheme that sorts all the tuples and divides them into mutually exclusive subsets. A single watermark bit is embedded in each subset by modifying the distribution of tuple values. Watermark bits are embedded more than once and an error-control coding scheme is used to recover the embedded bits. This scheme is not suitable for dynamic databases with frequent updates, due to expensive maintenance costs.

Guo et al. [5] proposed a technique for relational database watermarking. Their technique arranges the tuples in groups. For each group of tuples, a two-dimensional grid of watermarks is created. Along one dimension, a set of watermarks is calculated for selected fields across all tuples. Along the other dimension, a watermark is created for each tuple across all the fields. The watermark, which is stored in the least significant bits, is created as a function of the values of the attributes using a secure hash function. An attribute change would affect two intersecting watermarks: a horizontal one (at the tuple level) and a vertical one (at the attribute level) that spans all the tuples in the group. Identifying the affected watermarks would in turn identify the altered attribute. This way attacks can be detected and the victim attribute can be identified. Our proposed technique, on the other hand, protects a potentially important attribute (or a function of multiple attributes) by shuffling the set of tuples in a way that corresponds to the secret watermark.

We presented some preliminary thoughts about the use of tuple shuffling to hide the secret watermark in [7]. In this paper, we present a complete implementation of the watermark insertion and attack verification algorithms and show extensive simulation results on the robustness and the performance of the proposed algorithms.

Watermarking XML documents was studied by Gross-Amblard [3] and Sion et al. [11]. The idea is to hide the watermark in certain values in such a way as to preserve the response to certain queries.

III. PROTECTING THE RELATION USING FRAGILE WATERMARK

Enterprise data is usually stored in relational databases. Even though the relation is normally accessed through the DBMS, an attacker can access the hard disk to change attribute values in the relation either manually or using malicious code. The fact that the database relations and their indexes are disk resident makes them vulnerable to attacks by third-party code even if the DBMS is offline (off-line attack).

Table 1: Symbol table

Symbol    Description
ai        The ith attribute in a relation R(a1, a2, …, aN)
N         Number of attributes in relation R
m         Size of the watermarked group in terms of the number of tuples
W         The value of the watermark in the decimal number system
WF        Watermark in the factorial number system
OR        A sorting order, e.g., ascending

Thus the main objective is to identify unauthorized updates

in relational databases. The proposed method hides the

watermark in the relative order of tuples (records). Tuples are

organized into groups of size m and re-arranged in a way that

corresponds to the value of a secret watermark W. Table 1

lists the most common symbols used in the following

discussion. The ordering is done according to the value of the

sensitive attribute(s) that need to be protected. This re-

arrangement is done relative to a secret initial order, e.g.,

sorting the tuples in ascending order on the attribute value ai.

The watermark W and the initial order are secret and known only to authorized users. Updates to databases, e.g., tuple insertion, deletion, and modification, may require adjustment of the relative order of the tuples within the group to respect the hidden watermark W. Authorized users, who know the value of W and the initial order, can update the relation and adjust the order of the tuples in the group accordingly. On the other hand, malicious alterations will disturb this order. To check the integrity of a tuple t, we need to check that the order of the group that t belongs to conforms to the initial order. To do so, the watermark value W is used to reconstruct the initial order from the current order of the set of tuples.

The algorithm in Fig. 1 shows how the watermark is inserted in a group of tuples. If an organization is interested in protecting the salary attribute, for example, in a payroll system, then each group of m payroll tuples is first sorted according to the initial order. For simplicity, let us assume that the initial order is an ascending numerical order. Then the tuples in the group are shuffled among each other according to the value of the secret watermark W. The result is that tuple j is placed in a location (relative to other tuples in the same group) that corresponds to the value of awj. Suppose that Mallory (the attacker) increased the value of the attribute awj in one of the tuples j in a group of m tuples. The current location of tuple j then no longer corresponds to the value of its attribute awj. Authorized users who know the secret value W and the initial order can then detect the alteration in awj.

IV. WATERMARK INSERTION

As an example, let us consider a payroll system where a

relation stores salary information of all employees in an

organization. Since databases are usually disk resident,

Mallory can use a stand-alone code to illicitly change the data

in the relation. The following are some possible attacks that

Mallory can launch against the database:

Modification attack: increasing or decreasing the value of one or more attributes.

Insertion attack: inserting a tuple that does not exist in the original relation.

Removal attack: removing one or more tuples from the relation.

Recall that our watermarking technique hides the secret

value (watermark) in the relative order of the tuples. Notice

that insertion and deletion attacks would be easier to detect as

they will most likely disturb the order of the tuples in the

group.

To be able to rearrange a set of tuples (that belong to the same group) in a way that corresponds to a specific watermark value, we need to establish a one-to-one mapping from all possible permutations of the tuples to all values of W.
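One way to obtain such a mapping, consistent with the symbol WF in Table 1, is the factorial number system: a group of m tuples has m! permutations, and every integer W with 0 <= W < m! has a unique representation whose ith digit lies between 0 and i. The Python sketch below illustrates the conversion in both directions. The function names are illustrative assumptions and are not taken from the authors' implementation.

```python
from math import factorial


def to_factoradic(w, m):
    """Return digits d[0..m-1] with w == sum(d[i] * i!) and 0 <= d[i] <= i."""
    if not 0 <= w < factorial(m):
        raise ValueError("W must satisfy 0 <= W < m!")
    digits = []
    for radix in range(1, m + 1):
        w, remainder = divmod(w, radix)
        digits.append(remainder)  # digits[i] is the coefficient of i!
    return digits


def from_factoradic(digits):
    """Inverse of to_factoradic: rebuild the decimal watermark value."""
    return sum(d * factorial(i) for i, d in enumerate(digits))
```

For example, with m = 3 the value W = 5 maps to the digits [0, 1, 2], since 5 = 2*2! + 1*1! + 0*0!. Each of the 3! = 6 admissible values of W yields a different digit vector.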

Algorithm Name: Watermark insertion
Input: G is a group of m tuples; watermark WF
Output: GW

For (i = m-1; i > 0; i--)
    // Circular-left-shift the subset G[x,y]
    // by the value WF[i]
    leftCircularShift(WF[i], G[i+1,k])

Where:
    G[x,y] is a subset of G from x to y
    WF[i] is the ith digit in WF

Fig. 1: Algorithm for watermarking a set of m tuples.
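The loop in Fig. 1 can be sketched in Python as follows. The bounds of the shifted subset are an assumption made for this illustration: digit WF[i] rotates the last i+1 tuples of the group, which makes the mapping from digit vectors to permutations one-to-one. The names insert_watermark and rotate_left are hypothetical, and the initial order is taken to be ascending.

```python
def rotate_left(seq, k):
    """Circular left shift of a list by k positions."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def insert_watermark(group, wf, key=None):
    """Sort the group into the initial (ascending) order, then shuffle it
    according to the factorial-base digits wf (0 <= wf[i] <= i)."""
    m = len(group)
    if len(wf) != m or any(not 0 <= d <= i for i, d in enumerate(wf)):
        raise ValueError("wf must hold m digits with 0 <= wf[i] <= i")
    g = sorted(group, key=key)
    for i in range(m - 1, 0, -1):
        start = m - 1 - i  # assumed subset: the last i + 1 tuples
        g[start:] = rotate_left(g[start:], wf[i])
    return g
```

For instance, insert_watermark([20, 10, 30], [0, 1, 2]) first sorts the values to [10, 20, 30], rotates the whole group left by 2 to get [30, 10, 20], and then rotates the last two values by 1, giving [30, 20, 10].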

V. INTEGRITY CHECK ALGORITHM

This section shows how to check the integrity of the data prior to its use. Before retrieving tuple T, the DBMS runs the integrity check algorithm on the group gT of tuples that T belongs to. The integrity check algorithm can either be run occasionally as a separate operation or prior to every read. Since the attacker (Mallory) knows neither the watermark value W nor the initial order, changing data in T would corrupt the existing watermark.

The idea is as follows: given a watermarked group of tuples gT and the secret watermark value W, we can reconstruct the initial order Oi (extraction algorithm). This operation can be thought of as de-watermarking the group of tuples gT. The extraction algorithm is the inverse of the watermark insertion algorithm shown in Fig. 1.

The initial order Oi is the expected order, e.g., ascending order. In this case, the reconstructed group of tuples should be ordered such that the tuple with the lowest salary value appears first, followed by the next lowest, and so on. However, if the order of the tuples does not follow the expected order (according to the value of the salary attribute), then at least one of the salary values in this group of tuples has been attacked (or altered). On the other hand, the converse is not always true; if the entries respect the ascending order of the chosen attribute (e.g., salary), this does not necessarily mean that there were no attacks. A salary value might have been attacked, but the change in value was too small to violate the sorting order.
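The integrity check described above can be sketched in Python as follows. The sketch assumes that insertion applied a circular left shift by WF[i] to the last i+1 tuples of the group, for i = m-1 down to 1 (one possible reading of Fig. 1), and that the initial order is ascending. The function names are hypothetical.

```python
def rotate_right(seq, k):
    """Circular right shift of a list by k positions."""
    k %= len(seq)
    return seq[len(seq) - k:] + seq[:len(seq) - k]


def extract_initial_order(marked, wf):
    """Undo the shuffle: apply the inverse shifts in the reverse order."""
    m = len(marked)
    g = list(marked)
    for i in range(1, m):
        start = m - 1 - i  # assumed subset: the last i + 1 tuples
        g[start:] = rotate_right(g[start:], wf[i])
    return g


def verify_group(marked, wf, key=lambda t: t):
    """Return True if the de-watermarked group is in ascending order."""
    values = [key(t) for t in extract_initial_order(marked, wf)]
    return all(a <= b for a, b in zip(values, values[1:]))
```

With the digits [0, 1, 2], the intact group [30, 20, 10] is restored to [10, 20, 30] and passes the check. Changing 20 to 35 yields [10, 35, 30] after extraction, so the attack is detected, whereas changing 20 to 21 keeps the restored group sorted and goes unnoticed.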

Another factor that affects the detection rate of malicious

attacks is the group size m or the number of watermarked

entries. The experiment section shows that the detection rate

increases as the group size m increases, allowing us to

achieve a 100% detection rate for small attacks with a

reasonable group size.

VI. EXPERIMENTAL EVALUATION

In this section we evaluate the effectiveness of the proposed

watermarking technique. First the robustness of the proposed

technique is evaluated by measuring the success rate in

detecting random attacks. Then a sensitivity analysis on the

effect of data distribution on the detection rate is introduced.

The experiments vary data distribution and other factors like

standard deviation, attack percentage and group size. The

code was written in Java and we ran experiments on data that

were generated using different data distributions. The attacks

are simulated as follows: first a group of tuples is randomly

chosen. A victim tuple is then selected at random. The victim

attribute is increased or decreased by a predefined percentage.

A number of interesting observations were noted.
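The attack simulation described above can be approximated with the following Python sketch; it is not the authors' Java code. It relies on the observation that, because the verifier can undo the shuffle exactly, an attack is detected precisely when the altered value breaks the ascending order of the de-watermarked group, so the shuffle itself is skipped. The normal distribution, the mean of 100, the standard deviation of 10, and the fixed seed are assumptions chosen for illustration.

```python
import random


def detection_rate(m, attack_pct, mean=100.0, std=10.0, trials=500, seed=0):
    """Fraction of simulated attacks that disturb the ascending order."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        # A group in its initial (ascending) order.
        group = sorted(rng.gauss(mean, std) for _ in range(m))
        victim = rng.randrange(m)      # random victim tuple
        sign = rng.choice((1, -1))     # increase or decrease the value
        group[victim] *= 1 + sign * attack_pct / 100.0
        if any(a > b for a, b in zip(group, group[1:])):
            detected += 1
    return detected / trials
```

Calling detection_rate(50, 20.0) and detection_rate(50, 0.1), for example, contrasts a large attack with a very small one on groups of 50 values.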

Fig. 2: detection rate vs. the group size

A. The effect of the group size on the detection rate

Increasing the group size will increase the number of different

values that are watermarked (shuffled) together. This

experiment measures the attack detection rate and shows how

a change in the group size affects the detection rate of the

attack.

In Fig. 2, the y-axis shows the attack detection rate as a function of the group size m (x-axis), which is varied from 5 to 50. For each group size, the victim value is increased by 20%; the experiment is repeated 500 times and the detection rate is averaged. The first observation is that the detection rate is not always 100%. The reason is that the detection technique depends on a change in order. Thus, if the attack is so small that it does not disturb the initial order, the attack will not be detected. The second observation is that the detection rate improves rapidly and reaches 100% as the group size increases. The reason for the rapid increase is that the gap between every two consecutive values in the group decreases as new tuples are added to the group.
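The shrinking gap can be illustrated numerically. The Python sketch below estimates the average distance between consecutive sorted values in a group of m samples. The normal distribution with mean 100 and standard deviation 10, the number of trials, and the function name are assumptions made for this illustration.

```python
import random


def mean_gap(m, mean=100.0, std=10.0, trials=200, seed=0):
    """Average distance between consecutive sorted values in a group of m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        group = sorted(rng.gauss(mean, std) for _ in range(m))
        # The m - 1 consecutive gaps sum to the range of the group.
        total += (group[-1] - group[0]) / (m - 1)
    return total / trials
```

Comparing mean_gap(5), mean_gap(50) and mean_gap(200) shows the average gap falling as the group grows, so a fixed percentage change is more likely to move a value past one of its neighbors.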

B. The effect of the attack percentage on the detection rate

In these experiments we focus only on the protected attribute (e.g., salary). We generate a set of salary values according to a certain data distribution. In this subsection the data distribution is assumed to be normal. This set of experiments measures how an increase in the attack percentage affects the detection rate.

In Fig. 3, the y-axis shows the attack detection rate as a

function of attack percentage (x-axis), which ranges from

0.1% to 15%. Group size m is set to 50. Fig. 3 shows that the

detection rate is low when the attack percentage is small. This

is due to the fact that the change in the attribute value is not

large enough to disturb the initial order (ascending order in

this experiment). As the attack percentage increases, the

detection rate improves. Notice that we can achieve 100%

detection rate when the attack percentage is 21% or more. As

we will see later, a 100% detection rate can be achieved with

a small attack percentage by increasing the group size m.

Fig. 3: detection rate vs. attack percentage

C. Relation between detection rate and standard deviation

This experiment measures how the change in standard

deviation of generated data affects the detection rate. In this

experiment, the group size is set to 50, and the distribution of

data follows a normal distribution. Standard deviation of

generated data ranges between 5 and 70. All of the standard

deviations below 5 also have a detection rate of 100%.

Fig. 4 shows that the detection rate decreases as the standard deviation increases. The reason is that as the standard deviation increases, the separations between consecutive values increase. Thus a malicious attack that alters a victim value might not disturb the initial order.

Fig. 4: standard deviation effect on the detection rate (normal distribution)

D. Required group size to achieve a 100% detection rate

In this experiment we try to find, empirically, the minimum group size needed to reach a certain detection rate for a specific attack percentage. The data follows a normal distribution and the attack percentage ranges from 0.5% to 10%. The standard deviation of the data is set to 10. In Fig. 5, the x-axis shows the group size m while the y-axis measures the detection rate. It is clear that the detection rate drops as the attack percentage decreases. However, one can always achieve a 100% detection rate by sufficiently increasing the group size.

Fig. 5: Group sizes that achieve certain detection rate (x-axis: group size; curves: Atk = 0.5%, 1%, 3%, 5%, 10%)

E. The effect of data distribution on the detection rate

This experiment was carried out to understand how different

data distributions affect the detection rate. Uniform, Normal

and Poisson distributions are used in this comparison.

In Fig. 6, all three distributions have the same standard deviation, set to 10, and the same mean, set to 100. The attack percentage is varied from 0.1% to 10% and the group size m is set to 50. Fig. 6 shows that the normal distribution performs worse than the uniform and Poisson distributions. Because of the bell-shaped curve, about 68% of the elements in a normal distribution lie within one standard deviation of the mean and the rest are spread farther out in the tails. The separations between consecutive values are therefore larger than those of the Poisson or uniform distributions.

Fig. 6: Comparing the detection rate for various distributions of data (x-axis: attack %; curves: Normal, Uniform, Poisson)

F. The effect of the standard deviation on the detection rate

This experiment focuses on finding the minimum value of the group size m if a certain detection rate is required for a specific standard deviation.

In this experiment, the data follows a normal distribution. The standard deviation is varied from 10 to 70, and the attack percentage is kept at 10% throughout this simulation. It is clear that the detection rate decreases as the standard deviation increases. Fig. 7 shows that even though the detection rate decreases with increasing standard deviation, one can achieve a 100% detection rate by increasing the group size.

G. The cost of the proposed watermarking algorithms

This section shows how a change in the group size affects the performance of the database when the watermark is inserted and when data integrity is verified. The experiment in Fig. 8 measures the overheads, in terms of execution time, associated with watermark insertion and verification of data.

Fig. 7: Minimum group size that satisfies a certain detection rate for a certain standard deviation (x-axis: group size; curves: StD = 10, 20, 30, 50, 70)

In Fig. 8, the y-axis shows the execution time of the watermark insertion algorithm ('WM insertion') and the integrity verification algorithm ('Verification') in msec. The x-axis, which is varied from 5 to 200, shows the group size m. For each value of m, a victim tuple is selected at random and the value of the protected attribute is increased by 10% (to simulate an attack). The experiment is repeated 500 times for various victim tuples and the resulting detection rate is averaged.

Fig. 8: Group size effect on the performance in terms of the execution time (x-axis: group size; curves: WM insertion, Verification)

The first observation is that the verification algorithm is

consistently more expensive than the watermark insertion

algorithm. Recall that the watermark insertion algorithm

consists of two main operations: a sorting operation and a

tuple shuffling operation. On the other hand, the verification

algorithm includes the same two operations, namely,

shuffling and sorting in addition to checking the conformance

of the tuples to the initial order.

The difference between the watermark insertion and the verification algorithms is due to the conformance checking operation. In this operation we compare each attribute value (of the protected field, e.g., salary) with its neighbor. The comparison is done to check whether the group's initial order is violated. For a group of m tuples, the algorithm requires m-1 comparisons. The complexity of the conformance checking operation is O(m). The second observation is that both the watermark insertion and verification costs increase monotonically with the group size m. The reason is that larger groups result in more tuples being included in the shuffle and sort processes.
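The linear cost of the conformance check can be seen in the following Python sketch, which counts the neighbor comparisons explicitly. The function name and the early exit on the first violation are assumptions of this illustration. An intact group of m values needs exactly m-1 comparisons.

```python
def check_conformance(values):
    """Return (ok, comparisons) for an ascending-order check of the group."""
    comparisons = 0
    for previous, current in zip(values, values[1:]):
        comparisons += 1
        if previous > current:
            return False, comparisons  # order violated: stop early
    return True, comparisons
```

For an intact group of five values the check returns True after four comparisons, while a group such as [1, 3, 2, 4] is rejected at the second comparison.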

VII. CONCLUSIONS

This paper introduces a distortion-free watermarking technique for protecting the integrity of relational databases. Tuples are organized into groups and shuffled in a way that corresponds to a secret value. Unauthorized users do not know the secret value (watermark), and thus changing the protected attribute would cause the order to be disturbed. As a result, the attack can be detected. In the few cases where the attack (the change in the value) is so small that it does not disturb the order of the group of tuples, the detection algorithm would fail to detect it.

We conducted simulation experiments to measure the

sensitivity of the proposed integrity scheme for various types

of attacks. The experiments show that a 100% detection rate

can always be achieved by increasing the group size m. We

also showed the effects of changing the data distribution on

the detection rate of the proposed technique. Moreover,

experiments that show the cost overhead, in terms of the

execution time, for the watermark insertion algorithm and the

verification algorithm are presented.

In the future, we will study ways to improve the detection

and verification rate of the proposed technique. We will also

study the effect of the length of the secret watermark on the

insertion and detection execution times.

REFERENCES

[1] R. Agrawal, J. Kiernan, R. Srikant and Y. Xu, "Hippocratic Databases", International Conference on Very Large Databases (VLDB), 2002.

[2] R. Agrawal, P. J. Haas and J. Kiernan, "Watermarking relational data: framework, algorithms and analysis", The International Journal VLDB, vol. 12, issue 2, August 2003, pp. 157-169.

[3] I. J. Cox and M. L. Miller, "Electronic watermarking: the first 50 years", Journal on Applied Signal, EURASIP, vol. 2002, issue 2, Feb. 2002, pp. 126-132.

[4] Y. Li, H. Guo, S. Jajodia, "Tamper detection and localization for categorical data using fragile watermarks", in: The 4th International ACM Workshop on Digital Rights Management, October 2004.

[5] H. Guo, Y. Li, A. Liu and S. Jajodia, "A fragile watermarking scheme for detecting malicious modifications of database relations", Information Sciences, vol. 176, issue 10, 22 May 2006, pp. 1350-

[6] I. Kamel, "A schema for protecting the integrity of databases", International Journal of Computers and Security, Elsevier, vol. 28, issue 7, October 2009, pp. 698-709.

[7] I. Kamel and K. Kamel, "Toward Protecting the Integrity of Relational Databases", WorldCIS-2011, London, UK, Feb 2011.

[8] I. Kamel and Q. Albluwi, "A Robust Software Watermarking for Copyright Protection", International Journal of Computers & Security, Elsevier, vol. 28, issue 6, pp. 395-409, 2009.

[9] P. Wayner, "Disappearing Cryptography", Morgan Kaufmann, Second Edition, 2002.

[10] Y. Li, V. Swarup, S. Jajodia, "A robust watermarking scheme for relational data", in: Proceeding of the 13th Workshop on Information Technology and Engineering, December 2003, pp. 195–200.

[11] R. Sion, M. J. Atallah and S. K Prabhakar, "Resilient information hiding for abstract semistructures", International Workshop on Digital Watermarking (IWDW), 2003.

[12] R. Sion, M. Atallah, S. Prabhakar, "Rights protection for relational data", in the Proceeding of the ACM International Conference SIGMOD, pp. 98–109, 2003.

[13] R. Halder, S. Pal, A. Cortesi, "Watermarking Techniques for Relational Databases: Survey, Classification and Comparison", Journal of Universal Computer Science, vol. 16, no. 21 (2010), 3164-3190.

[14] U. Rao, D. Patel, P. Vikani, "Relational Database Watermarking for Ownership Protection", 2nd International Conference on Communication, Computing & Security (ICCCS)-2012.
