Post on 02-May-2023
An Empirical Study on the Robustness of a
Fragile Watermark for Relational
Databases
Ibrahim Kamel Waheeb Yaqub Kareem Kamel
Dept. of Electrical and Computer Engineering
University of Sharjah
kamel@sharjah.ac.ae, {waheebyaqub , kamel.kareem}@gmail.com
Abstract─ Databases most often contain critical information.
Unauthorized changes to databases can have serious
consequences and may result in significant losses for the
organization. This paper presents a viable solution for
protecting the integrity of the data stored in relational databases
using fragile watermarking. Prior techniques introduce
distortions to the watermarked values and thus cannot be
applied to all attributes. Our technique protects relational
tables by reordering tuples relative to each other according to a
secret value (watermark). This paper introduces an empirical
study on the effect of data distribution and other data measures
like the mean and standard deviation on the attack detection
rates. A study on the cost, in terms of the execution time, of the
proposed watermark insertion and data verification algorithms
is also presented.
I. INTRODUCTION
Information security and integrity are becoming critical
areas of research especially with the recent outburst of
Internet attacks and hostility. Adversaries launch attacks for
many reasons, most importantly for profit. Databases usually
contain critical information like salaries, ownership, land use,
personal information, etc. Unauthorized changes to such
databases might result in significant losses for organizations
and individuals. Recently, the database research community
came to realize the importance of watermarking in protecting
relational databases and particularly databases published on
the web [2] [5] [10] [12].
The state of the art in detecting unauthorized alterations in
relational databases can only detect some of the attacks on
specific types of attributes. These techniques use probabilistic
models. Thus, their detection rate depends on the type and
value of the modification. Moreover, prior database
watermarking techniques are designed to protect attributes
that tolerate distortion to their values.
In this paper, we propose a fragile watermarking technique
to protect the integrity of relational databases. The proposed
technique does not distort the watermarked attribute. We
present an empirical study of the effect of various data
distributions and properties on the robustness of the proposed
technique.
In the next section, prior works in watermarking databases
are summarized. Section III describes the basic idea behind
using the fragile watermarking technique. In Section IV, we
describe the watermark insertion algorithm. Section V shows
how the attacks on the integrity of the data can be detected.
Section VI evaluates the effectiveness of the proposed
watermarking technique. Then a sensitivity analysis on the
effect of data distribution on the detection rate is introduced.
Finally, the execution times of the watermark insertion and data
verification algorithms are presented. Concluding remarks and
future works are presented in Section VII.
II. RELATED WORK
Recently, the use of watermarking for data integrity and
copyright protection has attracted a lot of interest in the
databases research community [1,2,4,12]. Traditional
techniques in watermarking databases simply hide a secret
message in some of the attributes in the relation [13][14].
Normally, the watermark insertion introduces distortion to the
attribute by changing its original value. Thus these techniques
can only be applied to relations with attributes that are not
sensitive to small errors, e.g., temperature readings [2].
In such a scheme, a secret key is first used to select the tuples for
watermark embedding. A few bits of some arbitrary attributes
in the selected tuples are then modified to embed the watermark bits.
This scheme requires the entire database relation to be
available at the time of watermark embedding. Attributes like
salary, prices, and property coordinates cannot be used to host
watermarks. The number of attributes that can be watermarked
depends on the nature of the database. Some relations might
have few watermark-able attributes, while other relations
might not have any attributes that can be watermarked. The
detection of malicious alteration is probabilistic and depends
on the number of watermark-able attributes. The larger the
number of watermark-able attributes the more secure the
database becomes.
Wayner [9] used a technique similar to our proposed
technique, for hiding text in the order of a list of songs.
Unlike our proposed scheme, Wayner’s technique is a type of
steganography. Thus, it should be as robust as possible to
preserve the hidden message even after malicious attacks.
Sion et al. [12] presented a robust watermarking scheme that
sorts all the tuples and divides them into mutually exclusive
subsets. A single watermark bit is embedded in each subset
by modifying the distribution of tuple values. Watermark bits
are embedded more than once and an error-control coding
scheme is used to recover the embedded bits. This scheme is
not suitable for dynamic natured databases with frequent
updates, due to expensive maintenance costs.
Guo et al. [5] proposed a technique for relational database
watermarking. The proposed technique arranges the tuples in
groups. For each group of tuples, a two dimensional grid of
watermarks is created. Along one dimension, a set of
watermarks is calculated for selected fields across all tuples.
Along the other dimension a watermark is created for each
tuple across all the fields. The watermark, which is stored in
the least significant bits, is created as a function of the values
of the attributes using a secure hash function. An attribute
change would affect two intersecting watermarks; a
horizontal one (at the tuple level) and a vertical one (at the
attribute level) that spans all the tuples in the group.
Identifying the affected watermarks would in turn identify the
altered attribute. This way attacks can be detected and the
victim attribute can be identified. Our proposed technique, on
the other hand, protects a potentially important attribute (or a
function of multiple attributes) by shuffling the set of tuples
in a way that corresponds to the secret watermark.
We presented some preliminary thoughts about the use of
tuple shuffling to hide the secret watermark in [7]. In this
paper, we present a complete implementation for the watermark
insertion and attack verification algorithms and show
extensive simulation results on the robustness and the
performance of the proposed algorithms.
Watermarking XML documents was studied by Gross-
Amblard [3] and Sion et al. [11]. The idea is to hide the
watermark in certain values in a way as to preserve the
response to certain queries.
III. PROTECTING THE RELATION USING FRAGILE
WATERMARK
Enterprise data is usually stored in relational databases. Even
though the relation will be accessed through the DBMS, the
attacker can access the hard disk to change attribute values in
the relation either manually or using malicious code. The fact
that the database relations and their indexes are disk resident
makes them vulnerable to attacks by third-party code even if
the DBMS is offline (off-line attack).
Table 1: Symbol table
Symbol   Description
ai       The ith attribute in a relation R(a1, a2,…,am)
N        Number of attributes in relation R
M        Size of the watermarked group in terms of the number of tuples
W        The value of the watermark in decimal number system
WF       Watermark in factorial number system
OR       A sorting order, e.g., ascending,
Thus the main objective is to identify unauthorized updates
in relational databases. The proposed method hides the
watermark in the relative order of tuples (records). Tuples are
organized into groups of size m and re-arranged in a way that
corresponds to the value of a secret watermark W. Table 1
lists the most common symbols used in the following
discussion. The ordering is done according to the value of the
sensitive attribute(s) that need to be protected. This re-
arrangement is done relative to a secret initial order, e.g.,
sorting the tuples in ascending order on the attribute value ai.
The watermark W and the initial order are secret and
known only to authorized users. Updates to databases, e.g.,
tuple insertion, deletion, and modification may require
adjustment of the relative order of the tuples within the group
to respect the hidden watermark W. Authorized users, who
know the value of W and initial order can update the relation
and adjust accordingly the order of the tuples in the group.
On the other hand, malicious alterations will disturb this
order. To check the integrity of a tuple t, we need to
check that the order of the group that t belongs to conforms
to the initial order. To do so, the watermark value W is used to
reconstruct the initial order from the current order of the set
of tuples.
The algorithm in Fig. 1 shows how the watermark is
inserted in a group of tuples. If an organization is interested in
protecting the salary attribute, for example, in a payroll
system, then each group of m payroll tuples will be first
sorted according to the initial order. For simplicity, let us
assume that the initial order is an ascending numerical order.
Then the group of tuples is shuffled among each other
according to the value of the secret watermark W. The result
is that tuple j is placed in a location (relative to other tuples in
the same group) that corresponds to the value of awj. Suppose
that Mallory (the attacker) increased the value of the
attribute awj in one of the tuples j in a group of m tuples. The
current location of tuple j then no longer corresponds to the value of
its attribute awj. Authorized users who know the secret value
W and the initial order can then detect the alteration in awj.
IV. WATERMARK INSERTION
As an example, let us consider a payroll system where a
relation stores salary information of all employees in an
organization. Since databases are usually disk resident,
Mallory can use a stand-alone code to illicitly change the data
in the relation. The following are some possible attacks that
Mallory can launch against the database:
Modification attack: increasing or decreasing the value of
one or more attributes.
Insertion attack: insert a non-existing tuple in the original
relation.
Removal attack: remove one or more tuples from the
relation.
Recall that our watermarking technique hides the secret
value (watermark) in the relative order of the tuples. Notice
that insertion and deletion attacks would be easier to detect as
they will most likely disturb the order of the tuples in the
group.
To be able to rearrange a set of tuples (that belong to the
same group) in a way that corresponds to a specific
watermark value, we need to establish a one to one mapping
from all possible permutations of the tuples to all values of W.
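One standard way to obtain such a one-to-one mapping is to write W in the factorial number system (the WF of Table 1), where digit i ranges from 0 to i. The following is a minimal sketch of that conversion, not the authors' implementation; the function names `to_factorial_digits` and `from_factorial_digits` are hypothetical, and the convention that index 0 holds a padding digit of 0 is an assumption made here for convenience.

```python
from math import factorial


def to_factorial_digits(w, m):
    """Write w as sum(d[i] * i!) for i = 1..m-1, with 0 <= d[i] <= i.

    Returns a list of length m; d[0] is always 0 (padding, an assumption
    of this sketch so that d[i] is the digit of weight i!).
    """
    if not 0 <= w < factorial(m):
        raise ValueError("W must lie in [0, m!) to index a permutation of m tuples")
    digits = [0] * m
    for i in range(1, m):
        # The digit of weight i! is the remainder modulo (i + 1).
        w, digits[i] = divmod(w, i + 1)
    return digits


def from_factorial_digits(digits):
    """Inverse conversion: recover the decimal value of the watermark."""
    return sum(d * factorial(i) for i, d in enumerate(digits))


if __name__ == "__main__":
    wf = to_factorial_digits(463, 6)
    print(wf)                         # [0, 1, 0, 1, 4, 3]
    print(from_factorial_digits(wf))  # 463
```

Because there are exactly m! digit vectors of this form, each value of W in [0, m!) can be paired with exactly one permutation of a group of m tuples.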
Algorithm Name: Watermark insertion
Input: G is a group of m tuples; watermark WF
Output: GW
For (i = m-1 ; i > 0 ; i--)
    // Circular-left-shift the subset G[x,y]
    // by the value WF[i]
    leftCircularShift(WF[i], G[i+1,k])
Where:
    G[x,y] is a subset of G from x to y
    WF[i] is the ith digit in WF
Fig. 1: Algorithm for watermarking a set of m tuples.
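The shuffle in Fig. 1 can be sketched in Python as follows. This is one possible reading, not the authors' code: Fig. 1 does not fully specify the bounds of the shifted subset, so the sketch assumes that step i rotates the first i+1 tuples, which makes every digit vector produce a distinct permutation. The names `left_circular_shift` and `insert_watermark` are hypothetical, and the initial order is assumed to be ascending.

```python
def left_circular_shift(items, k):
    """Return a copy of items rotated left by k positions."""
    if not items:
        return []
    k %= len(items)
    return items[k:] + items[:k]


def insert_watermark(group, wf, key=lambda t: t):
    """Shuffle a group of m tuples according to the factorial digits wf.

    wf has length m, with 0 <= wf[i] <= i (wf[0] is unused padding).
    Assumption of this sketch: step i rotates the prefix of size i + 1.
    """
    g = sorted(group, key=key)            # secret initial order (ascending here)
    for i in range(len(g) - 1, 0, -1):    # i = m-1 down to 1, as in Fig. 1
        g[: i + 1] = left_circular_shift(g[: i + 1], wf[i])
    return g


if __name__ == "__main__":
    print(insert_watermark([40, 10, 30, 20], [0, 1, 0, 1]))  # [30, 20, 40, 10]
```

Under this reading, the rotation at step i fixes which tuple ends up at position i, since later steps only touch shorter prefixes; that is why the mapping from digit vectors to permutations is one to one.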
V. INTEGRITY CHECK ALGORITHM
This section shows how to check the integrity of the data
prior to its use. Before retrieving tuple T the DBMS runs the
integrity check algorithm on the group gT of tuples that T
belongs to. The integrity check algorithm can either be run
occasionally as a separate operation or prior to every read.
Since the attacker (Mallory) knows neither the watermark value
W nor the initial order, changing data in T would corrupt the
existing watermark.
The idea is as follows: given a watermarked group of tuples gT
and the secret watermark value W, we can reconstruct the
initial order Oi (Extraction algorithm). This operation can be
thought of as de-watermarking the group of tuples gT. The
extraction algorithm is the inverse of the watermark insertion
algorithm shown in Fig. 1.
The initial order Oi is the expected order, e.g., ascending
order. In this case, the reconstructed group of tuples should be
ordered such that the tuple with the lowest salary value appears
first, followed by the next lowest, and so on. However,
if the order of the tuples does not follow the expected order
(according to the value of the salary attribute), then at least
one of the salary values in this group of tuples has been
attacked (or altered). On the other hand, the converse is not
always true; if the entries respect the ascending order of the
chosen attribute (e.g., salary), this does not necessarily mean
that there were no attacks. A salary value might have been
attacked, but the change in value may be too small to violate the
sorting order.
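The extraction and conformance check described above can be sketched as follows. This is an illustrative sketch under the same assumptions as a prefix-rotation reading of Fig. 1 (step i rotates the first i+1 tuples, initial order ascending); the function names and the sample salaries are hypothetical. Extraction undoes the rotations in reverse order, and the check then needs m-1 neighbor comparisons.

```python
def rotate(items, k):
    """Rotate a list left by k positions (negative k rotates right)."""
    k %= len(items)
    return items[k:] + items[:k]


def insert_watermark(group, wf):
    """Assumed insertion step: sort ascending, then rotate prefixes."""
    g = sorted(group)
    for i in range(len(g) - 1, 0, -1):
        g[: i + 1] = rotate(g[: i + 1], wf[i])
    return g


def extract(group_w, wf):
    """De-watermark: undo the prefix rotations in the opposite order."""
    g = list(group_w)
    for i in range(1, len(g)):
        g[: i + 1] = rotate(g[: i + 1], -wf[i])
    return g


def is_intact(group_w, wf):
    """True if the reconstructed group respects the ascending initial order."""
    g = extract(group_w, wf)
    return all(g[j] <= g[j + 1] for j in range(len(g) - 1))


if __name__ == "__main__":
    wf = [0, 1, 0, 1]                       # secret digits (hypothetical)
    stored = insert_watermark([3000, 3500, 4200, 5000], wf)
    print(stored, is_intact(stored, wf))    # [4200, 3500, 5000, 3000] True

    big = list(stored)
    big[1] = 4375                           # 3500 raised by 25%
    print(is_intact(big, wf))               # False: order is violated

    small = list(stored)
    small[1] = 3850                         # 3500 raised by 10%
    print(is_intact(small, wf))             # True: attack goes undetected
```

The last case illustrates the limitation noted in the text: a change that keeps the value between its two neighbors in the initial order leaves the reconstructed group sorted and therefore passes the check.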
Another factor that affects the detection rate of malicious
attacks is the group size m or the number of watermarked
entries. The experiment section shows that the detection rate
increases as the group size m increases, allowing us to
achieve a 100% detection rate for small attacks with a
reasonable group size.
VI. EXPERIMENTAL EVALUATION
In this section we evaluate the effectiveness of the proposed
watermarking technique. First the robustness of the proposed
technique is evaluated by measuring the success rate in
detecting random attacks. Then a sensitivity analysis on the
effect of data distribution on the detection rate is introduced.
The experiments vary data distribution and other factors like
standard deviation, attack percentage and group size. The
code was written in Java and we ran experiments on data that
were generated using different data distributions. The attacks
are simulated as follows: first a group of tuples is randomly
chosen. A victim tuple is then selected at random. The victim
attribute is increased or decreased by a predefined percentage.
A number of interesting observations were noted.
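The attack simulation described above can be approximated with a short Python sketch (the paper's own code was written in Java and is not shown). Several details are assumptions made here: each trial draws a fresh group from a normal distribution, the defaults of mean 100 and standard deviation 10 are illustrative, and the names `detected` and `detection_rate` are hypothetical. Because de-watermarking returns every tuple to its slot in the initial order, the sketch applies the attack directly to the sorted group and checks whether the ascending order is violated.

```python
import random


def detected(sorted_values, victim, pct):
    """Raise one value by pct percent and test whether the order breaks."""
    tampered = list(sorted_values)
    tampered[victim] *= 1 + pct / 100.0
    return any(tampered[j] > tampered[j + 1] for j in range(len(tampered) - 1))


def detection_rate(m, pct, mean=100.0, std=10.0, trials=500, seed=0):
    """Percentage of simulated attacks that disturb the initial order."""
    rng = random.Random(seed)              # fixed seed for repeatable runs
    hits = 0
    for _ in range(trials):
        values = sorted(rng.gauss(mean, std) for _ in range(m))
        victim = rng.randrange(m)          # victim tuple chosen at random
        hits += detected(values, victim, pct)
    return 100.0 * hits / trials


if __name__ == "__main__":
    for m in (5, 10, 20, 50):
        print(m, detection_rate(m, pct=20))
```

The numbers this sketch prints depend on the assumed parameters and random seed, so they should not be read as a reproduction of the figures in this section.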
Fig. 2: detection rate vs. the group size
A. The effect of the group size on the detection rate
Increasing the group size will increase the number of different
values that are watermarked (shuffled) together. This
experiment measures the attack detection rate and shows how
a change in the group size affects the detection rate of the
attack.
In Fig. 2 the y-axis shows the attack detection rate as a
function of group size m (x-axis), which is varied from 5 to
50. For each group size, the victim value is increased by 20%;
the experiment is repeated 500 times and detection rate is
averaged. The first observation is that the detection rate is not
always 100%. The reason is that the detection technique
depends on the change in order. Thus, if the attack is so small
that it does not disturb the initial order, the attack will
not be detected. The second observation is that the detection
rate improves rapidly and reaches 100% as the group size
increases. The reason for the rapid increase is that the
gap between every two consecutive values in the group
decreases as new tuples are added to the group.
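The role of the gap between consecutive values can be made explicit with a short worked condition. The notation below is introduced here for illustration and is not taken from the paper: the group is assumed to hold positive values in ascending initial order, and the attack raises one value by a fraction p.

```latex
% Assumed notation: v_1 \le v_2 \le \dots \le v_m are the protected values
% of the group in the (ascending) initial order, all positive.
\[
  \text{an increase of } v_k \text{ by a fraction } p \text{ is detected}
  \iff v_k\,(1 + p) > v_{k+1}
  \iff p > \frac{v_{k+1} - v_k}{v_k},
  \qquad 1 \le k < m .
\]
```

Under these assumptions, an attack is caught exactly when it exceeds the relative gap to the next value, so smaller gaps mean that smaller attacks are detected. For a decrease, the analogous condition compares the lowered value with its predecessor.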
B. The effect of the attack percentage on the detection rate
In these experiments we focus only on the protected attribute
(e.g. salary). We generate a set of salary values according to a
certain data distribution. In this subsection the data
distribution is assumed to be normal. This set of experiments
measures how the increase in attack percentage affects the
detection rate.
In Fig. 3, the y-axis shows the attack detection rate as a
function of attack percentage (x-axis), which ranges from
0.1% to 15%. Group size m is set to 50. Fig. 3 shows that the
detection rate is low when the attack percentage is small. This
is due to the fact that the change in the attribute value is not
large enough to disturb the initial order (ascending order in
this experiment). As the attack percentage increases, the
detection rate improves. Notice that we can achieve a 100%
detection rate when the attack percentage is 21% or more. As
we will see later, a 100% detection rate can be achieved with
a small attack percentage by increasing the group size m.
Fig. 3: detection rate vs. attack percentage
C. Relation between detection rate and standard deviation
This experiment measures how the change in standard
deviation of generated data affects the detection rate. In this
experiment, the group size is set to 50, and the distribution of
data follows a normal distribution. Standard deviation of
generated data ranges between 5 and 70. All of the standard
deviations below 5 also have a detection rate of 100%.
Fig. 4 shows that the detection rate decreases as the standard
deviation increases. The reason is that as
the standard deviation increases, the separations between
consecutive values increase. Thus a malicious attack that
alters a victim value might not disturb the initial order.
Fig. 4: standard deviation effect on the detection rate (normal distribution)
D. Required group size to achieve a 100% detection rate
In this experiment we are trying to discover the minimum
empirical value for the group size if a certain detection rate is
required for a specific attack percentage. In this experiment
the data follows a normal distribution and the attack
percentage ranges from 0.5% to 10%. The standard deviation
of the data is set to 10. In Fig. 5 the x-axis shows the group
size m while the y-axis measures the detection rate. It is clear
that the detection rate drops as the attack percentage decreases.
However, one can always achieve a 100% detection rate by
increasing the group size sufficiently.
Fig. 5: Group sizes that achieve certain detection rate
[Fig. 3 plot: Detection Rate vs. Attack %]
[Fig. 4 plot: Detection Rate vs. Standard Deviation]
[Fig. 5 plot: Detection Rate vs. Group Size; series: Atk=0.5%, Atk=1%, Atk=3%, Atk=5%, Atk=10%]
E. The effect of data distribution on the detection rate
This experiment was carried out to understand how different
data distributions affect the detection rate. Uniform, Normal
and Poisson distributions are used in this comparison.
In Fig. 6, all three distributions have the same standard
deviation, set to 10, and the same mean, set to 100. The attack
percentage is varied from 0.1% to 10% and the group size m
is set to 50. Fig. 6 shows that the normal distribution performs
worse than the uniform and Poisson distributions. Because of the
bell-shaped curve, about 68% of the elements in a normal distribution
lie within one standard deviation of the mean, and the rest are spread
one or more standard deviations away from the mean. The
separations between consecutive values are larger than those of
the Poisson or uniform distributions.
Fig. 6: Comparing the detection rate for various distributions of data
F. The effect of the standard deviation on the detection rate
This experiment focuses on finding the minimum value for
the group size m, if a certain detection rate is required for a specific
standard deviation.
In this experiment, the data follows a normal distribution.
The standard deviation is varied from 10 to 70, and the attack
percentage is kept at 10% throughout this simulation. It is clear
that the detection rate decreases as the standard
deviation increases. Fig. 7 shows that, even so, one can achieve
a 100% detection rate by increasing the group size.
G. The cost of the proposed watermarking algorithms
This section shows how the change in group size affects the
performance of the database when the watermark is inserted and
when data integrity is verified. The experiment in Fig. 8
measures the overheads, in terms of the execution time,
associated with watermark insertion and verification of data.
Fig. 7: To find a minimum group size that satisfies certain detection rate with
certain standard deviation
In Fig. 8 the y-axis shows the execution time of the
watermark insertion algorithm ‘WM insertion’ and the
integrity verification algorithm ‘Verification’ in msec. The
x-axis, which is varied from 5 to 200, shows the group size m.
For each value of m, a victim tuple is selected at random and the
value of the protected attribute is increased by 10% (to
simulate an attack). The experiment is repeated 500 times for
various victim tuples and the resulting detection rate is
averaged.
Fig. 8: Group size effect on the performance in terms of the execution time
The first observation is that the verification algorithm is
consistently more expensive than the watermark insertion
algorithm. Recall that the watermark insertion algorithm
consists of two main operations: a sorting operation and a
tuple shuffling operation. On the other hand, the verification
algorithm includes the same two operations, namely,
shuffling and sorting in addition to checking the conformance
of the tuples to the initial order.
The difference between the watermark insertion and the
verification algorithms is due to the conformance checking
operation. In this operation we compare each attribute value
(of the protected field, e.g., salary) with its neighbor. The
comparison is done to check whether the group’s initial order
is violated. For a group of m tuples, the algorithm requires
m-1 comparisons. The complexity of the conformance checking
operation is O(m). The second observation is that both
watermark insertion and verification costs increase
monotonically with the group size m. The reason is that
larger groups result in more tuples being included in the
shuffle and sort processes.
[Fig. 6 plot: Detection Rate vs. Attack %; series: Normal, Uniform, Poisson]
[Fig. 7 plot: Detection Rate vs. Group Size; series: StD=10, StD=20, StD=30, StD=50, StD=70]
[Fig. 8 plot: Time (ms) vs. Group Size; series: WM insertion, Verification]
VII. CONCLUSIONS
This paper introduces a distortion-free watermarking
technique for protecting the integrity of relational databases.
Tuples are organized into groups and shuffled in a way that
corresponds to a secret value. Unauthorized users do not
know the secret value (watermark) and thus changing the
protected attribute would cause the order to be disturbed. As a
result, the attack can be detected. In the few cases where the
attack (the change in the value) is so small that it
does not disturb the order of the group of tuples, the detection
algorithm would fail to detect it.
We conducted simulation experiments to measure the
sensitivity of the proposed integrity scheme for various types
of attacks. The experiments show that a 100% detection rate
can always be achieved by increasing the group size m. We
also showed the effects of changing the data distribution on
the detection rate of the proposed technique. Moreover,
experiments that show the cost overhead, in terms of the
execution time, for the watermark insertion algorithm and the
verification algorithm are presented.
In the future, we will study ways to improve the detection
and verification rate of the proposed technique. We will also
study the effect of the length of the secret watermark on the
insertion and detection execution times.
REFERENCES
[1] R. Agrawal, J. Kiernan, R. Srikant and Y. Xu.
"Hippocratic Databases". International Conference on
Very Large Databases (VLDB) 2002.
[2] R. Agrawal, P.J. Haas and J. Kiernan. “Watermarking
relational data: framework, algorithms and analysis”. The
International Journal VLDB. vol. 12, issue 2, August
2003, pp. 157-169.
[3] I. J. Cox and M. L. Miller, “Electronic watermarking:
the first 50 years,” Journal on Applied Signal,
EURASIP, vol. 2002, issue 2, Feb 02, pp. 126-132.
[4] Y. Li, H. Guo, S. Jajodia, Tamper detection and
localization for categorical data using fragile
watermarks, in: The 4th International ACM Workshop
on Digital Rights Management, October 2004.
[5] H. Guo, Y. Li, A. Liu and S. Jajodia, A fragile
watermarking scheme for detecting malicious
modifications of database relations, Information
Sciences, vol. 176, issue 10, 22 May 2006, Pages 1350-
1378.
[6] I. Kamel, “A schema for protecting the integrity of
databases”, International Journal of Computers and
Security, Elsevier, vol. 28, issue 7, October 2009, Pages
698-709.
[7] I. Kamel and K. Kamel, “Toward Protecting the
Integrity of Relational Databases”, WorldCIS-2011,
London, UK, Feb 2011.
[8] I. Kamel, and Q. Albluwi, A Robust Software
Watermarking for Copyright Protection, International
Journal of Computers & Security, Elsevier, vol. 28,
issue 6, pp. 395-409, 2009.
[9] P. Wayner, “Disappearing Cryptography,” Morgan
Kaufmann, Second Edition, 2002.
[10] Y. Li, V. Swarup, S. Jajodia, A robust watermarking
scheme for relational data, in: Proceeding of the 13th
Workshop on Information Technology and Engineering,
December 2003, pp. 195–200.
[11] R. Sion, M. J. Atallah and S. K. Prabhakar. “Resilient
information hiding for abstract semistructures”.
International Workshop on Digital Watermarking
(IWDW), 2003.
[12] R. Sion, M. Atallah, S. Prabhakar, Rights protection for
relational data, in the Proceeding of the ACM
International Conference SIGMOD, pp. 98–109, 2003.
[13] R. Halder, S. Pal, A. Cortesi, “Watermarking Techniques
for Relational Databases: Survey, Classification and
Comparison”, Journal of Universal Computer Science,
vol. 16, no. 21 (2010), 3164-3190
[14] U. Rao, D. Patel, P. Vikani, “Relational Database
Watermarking for Ownership Protection”, 2nd
International Conference on Communication, Computing
& Security (ICCCS)-2012