Image Replica Detection based on Binary Support Vector Classifier




Image Replica Detection based on Binary Support Vector Classifier

Yannick Maret, Frédéric Dufaux, and Touradj Ebrahimi

Abstract— In this paper, we present a system for image replica detection. More specifically, the technique is based on the extraction of 162 features corresponding to texture, color and gray-level characteristics. These features are then weighted and statistically normalized. To improve training and performance, the dimensionality of the feature space is reduced. Lastly, a decision function is generated to classify the test image as a replica or non-replica of a given reference image. Experimental results show the effectiveness of the proposed system. Target applications include search for copyright infringement (e.g. variations of copyrighted images) and illicit content (e.g. pedophile images).

Index Terms— image replica detection, feature extraction, support vector machine, dimensionality reduction, image search, copyright infringement

I. INTRODUCTION

In this paper, we propose a system to detect image replicas. By replica, we refer not only to a bit-exact copy of a given reference image, but also to modified versions of the image after minor manipulations, malicious or not, as long as these manipulations do not change the perceptual meaning of the image content. In particular, replicas include all variants of the reference image obtained after common image processing manipulations such as compression, filtering, and adjustments of contrast, saturation or colors.

The proposed image replica detection system can be applied to detect copyright infringement by identifying variations of a given copyrighted image. Another application is to discover illicit content, such as child pornography images.

The problem of image replica detection is a particular subset of the more general problem of content-based search and retrieval of multimedia content. In recent years, multimedia search and retrieval has been the subject of extensive research works and standardization activities such as MPEG-7 [1], [2] and the more recent JPSearch [3].

However, the problem of image replica detection has so far been the focus of fewer research efforts.

Two approaches to detect image replicas are watermarking [4] and robust fingerprinting [5]–[7]. Watermarking techniques [4] consist in embedding a signature in the reference image before dissemination. Replicas of the reference image can subsequently be detected by verifying the presence of the watermark. This class of techniques typically achieves high efficiency for the correct classification of replicas and non-replicas. However, it requires modifying the reference image, namely embedding a signature, prior to its distribution. Unfortunately, this is not always possible. For instance, the method is not applicable to already disseminated copyrighted content or in the case of illicit content. Robust fingerprinting techniques [5]–[7] analyze the reference image in order to extract a signature associated with the image content. Replicas are then identified whenever their signatures are close to the one of the reference. This class of techniques is often based on a single feature, for example characteristic points of the Radon transform [5], log-mapping of the Radon transform [6], or intra-scale variances of wavelet coefficients [7]. While it is usually robust, computationally efficient, and suitable for fast database indexing and retrieval, it performs poorly for the accurate classification of replicas and non-replicas.

More recently, techniques for image replica detection have been described in [8], [9]. Ke et al. [8] propose a method based on the extraction of features, referred to as Key Points (KPs), which are stable in the scale-space representation. An image is typically represented by thousands of KPs. Test images are then classified as replicas or non-replicas using locality-sensitive hashing to match their KPs to those of the reference image. More specifically, no distance is directly computed; rather, it is the number of matching KPs which quantifies the similarity between two images. While this approach


achieves very good performance for replica detection, it requires a computationally complex feature extraction step. Qamra et al. [9] propose a different method based on the computation of a perceptual distance function (DPF). More precisely, a DPF is generated for each pair of reference and test images, to measure the similarity between the two. The main idea of the approach is to activate different features for different image pairs. More specifically, only the most similar features are taken into account to compute the distance. While this method achieves good performance, it remains inferior to [8].

In this paper, we introduce a new approach for image replica detection based on our earlier work in [10]–[12]. More precisely, we extract 162 features from each image, representing texture, color and gray-level characteristics. The resulting features are then weighted by the proportion of pixels contributing to each feature, and statistically normalized to ensure that the features are commensurable. In the next step, the dimensionality of the feature space is reduced. In this way, fewer training examples are needed and only features relevant to the given images are kept. Finally, a decision function is built to determine if the test image is a replica of the given reference image. Note that the considered approach uses a replica detector that is specifically fine-tuned to each reference image.

Simulation results show the effectiveness of the proposed system. For instance, for a fixed false positive rate of 1 · 10^-4, we achieve an average false negative rate of 8%. Indeed, our technique significantly outperforms DPF [9] even though we use fewer features. While our performance is not as good as KPs [8], we obtain a speed-up in terms of computational complexity in the range of 2 to 3 orders of magnitude.

This paper is structured as follows. We present an overview of the proposed replica detection system in Sec. II, and a more thorough description of the various algorithmic steps in Sec. III. In order to evaluate the performance of the proposed system, the evaluation methodology is defined in Sec. IV, and experimental results are reported in Sec. V. In Sec. VI, we discuss applications of the system. Finally, we draw conclusions in Sec. VII.

II. OVERVIEW AND PRELIMINARY REMARKS

We first present an overview of the proposed replica detection system. The system consists of

six steps, as shown in Fig. 1a. An outline of each step is provided in Sec. II-A. The method can be decomposed into two distinct parts. The first one, consisting of the steps shown in the upper part of Fig. 1a, is independent from the reference image. Conversely, the second one, comprising the steps shown in the lower part of Fig. 1a, depends on the reference image. Therefore, the latter steps need to be trained. To achieve this, training examples are needed for both replicas and non-replicas, as detailed in Sec. II-B. The training phase is outlined in Fig. 1b. The training performance is assessed using the F-score metric described in Sec. II-C. Notations used throughout this paper are detailed in Sec. II-D.

A. Method Overview

a) Image preprocessing: In the first step the test image is preprocessed. More specifically, the image is resized, and represented in a modified HSI color space. This adds some invariance against common image processing operations, such as resizing and illumination changes.

b) Feature Extraction: Feature extraction maps images into a common space, where comparison is easier. For this purpose global statistics, such as color channels and textures, are extracted from the test image.

c) Weighted Inter-image Differences: In the third step, the test image features are subtracted from those of the reference image, and 'incommensurable' features are penalized. For example, statistics about yellow pixels are incommensurable when the test and reference images contain very different proportions of yellow pixels.

d) Statistical Normalization: In the fourth step the inter-image differences are statistically normalized. In other words, the same importance is given to each feature, independently of its value range.

e) Dimensionality Reduction: In the fifth step the feature dimensionality is reduced. Fewer training examples are needed, and only feature mixtures relevant to the replica detection task are kept.

f) Decision Function: Finally, in the last step a decision function is used to determine if the test image is a replica of the reference image.

B. Training Examples

Examples of replica images can be generated artificially. Indeed, the reference image can be


[Fig. 1a: test image → image preprocessing → features extraction → weighted inter-image difference → statistical norm. → dim. reduction → decision function → replica/non-replica. The reference DB provides the reference image, its features and weights, the normalization constants, the reduction matrix, and the classifier.]

(a) Block diagram for the testing phase. A test image is given to the system, which determines if it is a replica of a given reference image contained in the database. The method can be decomposed into two distinct parts: steps that are independent from the reference image (upper part of the figure), and steps that depend on the reference image (lower part of the figure).

[Fig. 1b: training examples → weighted inter-image difference → statistical norm. → dim. reduction → decision function.]

(b) Block diagram for the training phase. The features f_ref of the reference image, the training examples {f_i}_i, and the corresponding labels {y_i}_i are fed to the training algorithm, which produces the parameters of the reference-image-dependent steps.

Fig. 1: Block diagram of the replica detection system. The system is composed of two phases, namely training and testing.

modified using different operations, resulting in several replicas. In this work, the replicas are generated by the operations listed in Table I. Furthermore, it is possible to have a richer set of training examples by nesting two or more operations to form a new operator known as a composition. However, we assume that an operation cannot be nested more than once in the same composition. For example, a JPEG compression cannot be followed by another JPEG compression with the same or a different quality factor. In this way, 420 replicas of the reference image are synthesized by using up to two nesting levels of compositions.
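To make the composition mechanism concrete, the following minimal Python sketch enumerates compositions of up to two distinct operation types. The operations shown are a small, hypothetical subset of Table I, implemented on numpy arrays; the paper itself applies the full set of image operations (e.g. via ImageMagick) to real images.

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)

    def gamma_correction(img, g):
        return np.clip(img, 0.0, 1.0) ** g

    def add_gaussian_noise(img, sigma):
        return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

    # Each entry carries its operation type so that a type is never
    # nested twice in the same composition.
    OPS = [
        ("gamma", lambda im: gamma_correction(im, 0.8)),
        ("gamma", lambda im: gamma_correction(im, 1.2)),
        ("noise", lambda im: add_gaussian_noise(im, 20 / 255)),
        ("noise", lambda im: add_gaussian_noise(im, 60 / 255)),
        ("flip", lambda im: im[:, ::-1]),
        ("gray", lambda im: im.mean(axis=2, keepdims=True).repeat(3, axis=2)),
    ]

    def generate_replicas(reference):
        # Single operations, then ordered compositions of two distinct types.
        for _, op in OPS:
            yield op(reference)
        for (t1, op1), (t2, op2) in itertools.permutations(OPS, 2):
            if t1 != t2:
                yield op2(op1(reference))

    reference = rng.random((256, 256, 3))
    print(sum(1 for _ in generate_replicas(reference)), "training replicas")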

Examples of non-replica images can be obtained by using a set of images that are known to be unrelated to the reference image. This set can also be enriched by applying operations on its elements. In this study, we only consider the gray-level conversion. It enriches the training set with gray-level images in order to avoid relying too heavily on the color features.

C. Training Metric

The F-score metric F(·) is used to assess the detection performance during the training phase.

TABLE I: Training replicas generation. Image operations and their parameters.

Operations              | Parameters
JPEG compression        | Q = 10, 50
Gaussian noise addition | σ = 20/255, 60/255
Resizing                | s = 0.8, 1.2
Averaging filter        | order = 2, 4
Gamma correction        | γ = 0.8, 1.2
Horizontal flipping     | NA
Gray-level conversion   | NA
Rotation                | θ = 90°, 180°, 270°
Cropping                | keep 50% and 80%
V channel change        | −10% and +10%
S channel change        | −10% and +10%

The F-score is defined as follows [13]:

F(TP, FP, P) = \frac{TP}{P} \cdot \frac{TP}{TP + FP}, \quad (1)

where P is the total number of positive instances, TP is the number of positive instances correctly classified, and FP is the number of negative instances wrongly classified. The first term in the right-hand side of (1) corresponds to the recall. Conversely, the second term represents the precision. The F-score balances these two conflicting properties: precision increases as the number of false positives decreases, and recall increases as the number of false negatives diminishes (which usually means that the number of false positives increases).

Equation (1) can be rewritten as:

F_\rho(r_{fp}, r_{fn}) = \frac{(1 - r_{fn})^2}{1 + \rho \cdot r_{fp} - r_{fn}}, \quad (2)

where r_fp and r_fn are the false positive and false negative rates, and ρ = N/P gives the ratio between the number of negative and positive instances. In the rest of the document, we use the formulation given by (2). One drawback of this metric lies in the ratio ρ between the number of negative and positive instances: it has to be known beforehand.
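For illustration, (2) can be evaluated directly from the two error rates. The small sketch below assumes the reconstruction above, in which (1) is the product of recall and precision.

    def f_score(r_fp, r_fn, rho):
        # F_rho of (2): recall times precision, written in terms of the
        # error rates and of rho = N / P.
        recall = 1.0 - r_fn                          # TP / P
        precision = recall / (recall + rho * r_fp)   # TP / (TP + FP)
        return recall * precision

    print(f_score(0.0, 0.0, rho=1e4))    # a perfect detector scores 1
    print(f_score(1e-4, 0.08, rho=1e4))  # the operating point reported in Sec. V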

D. Notations

Subscripts in Greek letters index vector elements. Subscripts in Roman letters index vectors (or scalars). Training patterns (or examples) are denoted as x_i, with i = 1, . . . , m, where m is the total number of training patterns. During the training phase, a label y_i is assigned to each pattern x_i. A pattern corresponding to a replica is simply called a replica and labeled y_i = +1. Otherwise it is called a non-replica and labeled y_i = −1.

We denote the α-th feature of an image as f_α. All features of an image can be held in a column vector denoted as f.

III. REPLICA DETECTION SYSTEM

We now describe thoroughly the proposed replica detection system. In particular, each step presented in Fig. 1 is detailed, along with the training procedures whenever required.

A. Image Preprocessing

Before extracting features, an image is first cropped such that only 70% of its center region is kept. This introduces a weak robustness to operations such as framing. Then, it is resized such that it contains approximately 2^16 pixels (corresponding to a square image of 256 × 256 pixels), while keeping its original aspect ratio. This introduces a weak form of scale invariance and speeds up feature extraction by reducing the number of pixels to process.

The cropped and scaled image is then represented in a modified Hue Saturation Intensity (HSI) space: the logarithmic Hue, Saturation, and equalized-Intensity space. More specifically, the logarithmic Hue, H_log, is defined as follows [14]:

H_{\log} = \frac{\log R - \log G}{\log R + \log G - 2 \log B}, \quad (3)

where R, G and B are the red, green and blue values of a pixel. The logarithmic Hue has the advantage of being invariant to gamma and brightness changes [14]. The Saturation, S, is the same as for classical HSI [15]:

S = 1 - \frac{3 \min(R, G, B)}{R + G + B}. \quad (4)

By construction, the Saturation is invariant to changes in illumination. Finally, the equalized Illumination, I_equ, is given by

I_{equ} = T\left( \frac{R + G + B}{3} \right), \quad (5)

where T(·) is the global histogram equalization operator [15]. The equalization makes the Intensity mostly invariant to changes of gamma and brightness, as shown in the Appendix.
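A minimal numpy sketch of the conversion (3)-(5) is given below, assuming an RGB image with values in [0, 1]. The epsilon guard for the logarithms and the number of histogram bins are implementation choices, not taken from the paper.

    import numpy as np

    def modified_hsi(rgb, eps=1e-6, bins=256):
        r, g, b = rgb[..., 0] + eps, rgb[..., 1] + eps, rgb[..., 2] + eps
        # Logarithmic Hue (3), invariant to gamma and brightness changes.
        denom = np.log(r) + np.log(g) - 2.0 * np.log(b)
        denom = np.where(np.abs(denom) < eps, eps, denom)
        h_log = (np.log(r) - np.log(g)) / denom
        # Saturation (4), as in classical HSI.
        s = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b)
        # Equalized Intensity (5): global histogram equalization T(.) of the mean.
        i = (r + g + b) / 3.0
        hist, _ = np.histogram(i, bins=bins, range=(0.0, 1.0))
        cdf = hist.cumsum() / hist.sum()
        i_equ = cdf[np.clip((i * bins).astype(int), 0, bins - 1)]
        return h_log, s, i_equ

    rgb = np.random.default_rng(0).random((256, 256, 3))
    h_log, s, i_equ = modified_hsi(rgb)
    print(h_log.shape, s.shape, i_equ.shape)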

B. Features Choice and Extraction

In order to compare the similarity between two images, visual features are extracted. The goal of feature extraction is twofold. First, it maps images onto a common space where they can be more easily compared. Second, it reduces the space dimensionality by keeping only the relevant information.

Several visual features can be envisioned: color, texture, shape, etc. For an extensive survey on general feature extraction, refer to [16]. The choice of features depends on the image type. In the case of the image replica detection problem, it also depends on the type of replicas that are to be detected. For instance, if rotated images are considered, it would make sense to choose at least one feature that is rotation invariant.

The features used in this work are of three types: texture, color and gray-level statistics. They are similar to those used in [9]. The main differences are the added 24 gray-level features, and the absence of 'local' statistics. These features are found to give good results in image retrieval applications [9]. In total, we extract 162 features, as shown in Table II. They are detailed in the following subsections.


TABLE II: Features overview. Lists the types of used features and the number of extracted statistics.

name                             | # features
Gabor, squared coeff. mean       | 30
Gabor, squared coeff. std dev.   | 30
Color, histogram                 | 10
Color, channel mean              | 24
Color, channel std dev.          | 24
Color, spatial distribution      | 20
Gray-level, histogram            | 8
Gray-level, spatial distribution | 16
total                            | 162

1) Texture Features: The texture features are composed of the first and second order statistics of each subband of the Gabor transform. The latter is performed as in [17] on the equalized Illumination. More precisely, the used parameters are 0.75 for the upper center frequency, 0.05 for the lower center frequency, five scales and six orientations. Mean and standard deviation estimates of the squared coefficients are computed for each of the 30 subbands. This results in a total of 30 mean and 30 variance estimates.
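A sketch of this computation using scikit-image's Gabor filter is shown below. The geometric spacing of the center frequencies between 0.05 and 0.75 is our assumption; the paper follows the filter bank design of [17].

    import numpy as np
    from skimage.filters import gabor

    def gabor_features(i_equ, n_scales=5, n_orient=6, f_low=0.05, f_high=0.75):
        freqs = np.geomspace(f_low, f_high, n_scales)  # assumed scale spacing
        means, stds = [], []
        for f in freqs:
            for k in range(n_orient):
                real, imag = gabor(i_equ, frequency=f, theta=k * np.pi / n_orient)
                energy = real ** 2 + imag ** 2         # squared subband coefficients
                means.append(energy.mean())
                stds.append(energy.std())
        return np.array(means + stds)                  # 30 means, then 30 std devs

    i_equ = np.random.default_rng(0).random((128, 128))
    print(gabor_features(i_equ).shape)                 # (60,)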

2) Color Features: The color features are based on the modified HSI color space presented in Sec. III-A. Each pixel in the image is classified into one of ten color classes depending on its position in this space. The classes are the achromatic colors (S = 0) black, gray and white, and the chromatic colors (S > 0) red, orange, yellow, green, cyan, blue and purple. The equalized Illumination is used to classify a pixel into one of the three achromatic classes. The logarithmic Hue is used to classify a pixel into one of the seven chromatic classes.

This is similar to the 'culture' color approach proposed in [9]. In that study, pink and brown are also considered, whereas in our case these two colors are classified as red or orange. Brown and pink have the same Hue as red or orange, but differ in the Intensity and/or Saturation channels. Operations such as saturation or intensity changes are common in image processing, and modify the Intensity and the Saturation channels but not the Hue channel. If brown and pink were considered, red or orange pixels could be transformed into brown or pink pixels, or vice versa. For this reason, we have decided to include brown and pink within the red and orange classes.

a) Color Classes Histogram: A histogram is computed, giving the proportion of each color class in the image. It is normalized such that it sums to one. It comprises 10 values.

b) Channel Statistics: Mean and variance estimates of the equalized Intensity channel are computed for each color class. Mean and variance estimates of the Saturation and logarithmic Hue channels are also computed for each chromatic color class. This results in a total of 24 mean and 24 variance estimates.

c) Spatial Distribution Shape: The shape of the spatial distribution of each color class is computed. This is achieved by computing two shape characteristics for each color class: spreadness and elongation [18], [19]. The first characteristic measures the compactness of the spatial distribution of a color class. The second one reflects how much longer than wide the spatial distribution is. This results in 10 spreadness and 10 elongation measures.
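The exact definitions of spreadness and elongation follow the moment computations of [18], [19], which are not reproduced in the paper. The sketch below is therefore only one plausible moment-based formulation: it characterizes a class mask by the eigenvalues of the covariance of its pixel positions.

    import numpy as np

    def spreadness_elongation(mask):
        ys, xs = np.nonzero(mask)
        if len(xs) < 2:
            return 0.0, 0.0
        pts = np.stack([xs, ys], axis=1).astype(float)
        cov = np.cov(pts, rowvar=False)               # 2x2 covariance of positions
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]  # lam[0] >= lam[1] >= 0
        spreadness = (lam[0] + lam[1]) / len(xs)      # compactness of the class
        elongation = lam[0] / (lam[1] + 1e-12)        # length-to-width ratio
        return spreadness, elongation

    mask = np.zeros((64, 64), dtype=bool)
    mask[30:34, 10:54] = True                         # a thin horizontal stripe
    print(spreadness_elongation(mask))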

3) Gray-Level Features: The gray-level features are based on the equalized Intensity channel of the HSI model. The dynamic range of the image is linearly partitioned into eight bins corresponding to as many classes. Each pixel of the image falls into one of these bins.

The use of gray-level features is important because the color features can be unsuited in some cases. For instance, this can happen when the reference or the test images are gray-level, or when conversion to gray-level is one of the considered operations in the replica detection system.

a) Gray-level Classes Histogram: A gray-level classes histogram is computed, giving the proportion of each of the eight intensity ranges in the image. It is normalized such that it sums to one. It comprises 8 values.

b) Spatial Distribution Shape: As for color, the shape of the spatial distribution of each gray-level class is computed. This results in 8 spreadness and 8 elongation measures.

C. Weighted Inter-Image Differences

Inter-image differences are computed in this step. They are basically the difference between the statistics of the test image and those of the reference image.

The channel statistics and spatial distribution shapes for color classes, and the spatial distribution shapes for gray level, correspond to non-overlapping regions of the images. These regions do not necessarily have the same size for different images. Therefore,


a legitimate question is whether image statistics are comparable when the regions from which they are estimated differ significantly in size.

In the following, a method is proposed that generates small inter-image features when the inter-image class proportions are similar, and large ones when they are dissimilar. More precisely, inter-image statistics corresponding to significantly different region sizes are penalized.

A 'weight' w_α is associated with each feature f_α. The weight gives the proportion of pixels taken into account to compute f_α. For the color features, the weights are the entries of the color histogram. Likewise, the weights corresponding to gray-level features are those of the gray-level histogram. For the remaining features, no weight is used.

We define the weighted inter-image difference δ_α as:

\delta_\alpha = f_\alpha - \bar{f}_\alpha + s_\alpha \cdot \mathrm{sgn}(f_\alpha - \bar{f}_\alpha) \cdot (w_\alpha - \bar{w}_\alpha)^2, \quad (6)

where \bar{f}_\alpha and \bar{w}_\alpha are the α-th feature, respectively weight, of the reference image, and s_α is a non-negative parameter that gives more or less importance to the discrepancy between the weights. The sign function is such that sgn(0) = +1.

The idea behind (6) is as follows. On the one hand, a replica feature is assumed to be close to that of the reference image. Consequently, f_α − \bar{f}_α and w_α − \bar{w}_α are small, resulting in a small δ_α. On the other hand, a non-replica feature has no relation to that of the reference image. Therefore it is less likely that both f_α and w_α are simultaneously close than only f_α or w_α. That is, (6) ensures that δ_α is, with higher probability, smaller for a replica than for a non-replica.
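Equation (6) transcribes directly into code. In the sketch below, f_ref and w_ref play the roles of the barred reference quantities; unweighted features can simply be given w equal to w_ref so that the penalty term vanishes.

    import numpy as np

    def weighted_difference(f, w, f_ref, w_ref, s):
        sign = np.where(f - f_ref >= 0.0, 1.0, -1.0)  # sgn with sgn(0) = +1
        return f - f_ref + s * sign * (w - w_ref) ** 2

    rng = np.random.default_rng(0)
    f_ref, w_ref = rng.random(162), rng.random(162)
    s = np.full(162, 0.5)
    # A replica: features and weights close to those of the reference.
    d_rep = weighted_difference(f_ref + 0.01, w_ref + 0.01, f_ref, w_ref, s)
    # A non-replica: unrelated features and weights give larger |delta|.
    d_non = weighted_difference(rng.random(162), rng.random(162), f_ref, w_ref, s)
    print(np.abs(d_rep).mean(), np.abs(d_non).mean())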

The parameters s_α are found through a training procedure. To achieve this, simple classifiers are considered. There are as many simple classifiers as features α, and the decision functions are given by:

\mathrm{sgn}\left( T_\alpha - |\delta_\alpha(s_\alpha)| \right), \quad (7)

where the T_α are thresholds chosen among the |δ_α(s_α)| corresponding to the replicas. The s_α maximizing the F-score of these classifiers are then chosen. Namely, the s_α are computed as follows:

s_\alpha = \arg\max_{s_\alpha, T_\alpha} F_\rho(r_{fp}, r_{fn}), \quad (8)

where r_fp and r_fn give estimates of the false positive and false negative rates of the simple classifiers using thresholds T_α and parameters s_α. To reduce computation time, the s_α are chosen among the candidates µ_α · 10^k, where µ_α is the average value of |f_α| over all training non-replicas for the α-th feature, and k = −∞, −1, 0, +1, +2.

D. Normalization

The weighted inter-image differences are normalized using a statistical normalization method [20]. More precisely, let µ_α and σ_α be the mean and standard deviation estimates of the α-th inter-image difference over a subset of the training set. Training examples for which any feature is an extremum over the training set are ignored; therefore, outlier examples are not taken into account. The normalized inter-image difference \tilde{δ}_α is then given by:

\tilde{\delta}_\alpha = \frac{\delta_\alpha - \mu_\alpha}{k \cdot \sigma_\alpha}, \quad (9)

where δ_α is the inter-image difference given in (6). By Chebyshev's theorem, at least a fraction 1 − 1/k² of the \tilde{δ}_α are within the interval [−1, 1]. In the following, k is set to 10 so that more than 99% of the features are within [−1, 1]. The features outside this interval are clipped to +1 or −1.

The goal of normalization is to ensure that the feature elements are commensurable. This is especially important for dimensionality reduction.
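The normalization step can be sketched as follows, with k = 10 and clipping to [−1, 1]; the outlier-removal rule drops any training example holding an extremum in some feature, as described above.

    import numpy as np

    def fit_normalizer(deltas, k=10.0):
        # deltas: (n_examples, n_features) training inter-image differences.
        extremal = (deltas == deltas.min(axis=0)) | (deltas == deltas.max(axis=0))
        keep = ~extremal.any(axis=1)                  # ignore outlier examples
        mu = deltas[keep].mean(axis=0)
        sigma = deltas[keep].std(axis=0) + 1e-12
        return mu, k * sigma

    def normalize(delta, mu, k_sigma):
        return np.clip((delta - mu) / k_sigma, -1.0, 1.0)

    rng = np.random.default_rng(0)
    train = rng.normal(size=(400, 162))
    mu, k_sigma = fit_normalizer(train)
    print(normalize(train[0], mu, k_sigma).min())     # always within [-1, 1]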

E. Dimensionality Reduction

The relatively large number of features prevents the direct use of many classification techniques on the weighted inter-image differences. It would require a prohibitively large number of training examples, and the optimization process would probably yield an overtrained decision function due to the 'curse of dimensionality' [21]. For this reason, the dimensionality D of the vector \tilde{\boldsymbol{\delta}} is reduced to d. In our case, the initial dimension D is 162. The dimensionality reduction makes use of the following linear transformation:

x = W_{D \to d} \cdot \tilde{\boldsymbol{\delta}}, \quad (10)

where the d × D matrix W_{D→d} is computed using a slightly modified version of the method proposed in [22]. The method is based on independent component analysis (ICA) [23]. It adds the class information to the feature vector in order to select

MARET et al.: IMAGE REPLICA DETECTION BASED ON BINARY SUPPORT VECTOR CLASSIFIER 7

Require: {x_i}_i, {y_i}_i, d
Ensure: W_{D→d}
  for each example i, set x̃_i = [x_i^T y_i]^T
  apply ICA to {x̃_i}_i → separating matrix W (of size (D+1) × (D+1))
  for each row α of W, set a_α = (1/(D+1)) Σ_{β=1}^{D+1} |w_{αβ}|
  set γ = 0.9 and ∆γ = 0.001
  repeat
    set R = W
    for each element (α, β) of R, set r_{αβ} = 0 if |r_{αβ}| < γ · a_α
    delete the rows α of R for which r_{α(D+1)} · Σ_{β=1}^{D} |r_{αβ}| = 0
    set d̃ = "number of rows in R"
    if d̃ = d then
      delete the (D+1)-th column of R
      set W_{D→d} = R
    else
      set γ = γ + sgn(d̃ − d) · ∆γ
    end if
    if no convergence to d then
      set ∆γ = ∆γ/2
    end if
  until d̃ = d

Fig. 2: Dimensionality reduction algorithm. It is a modified version of [22] that enables setting the number of dimensions d a priori.

the independent components best suited to the binary classification problem. The algorithm of [22] is modified so that the number of dimensions can be set a priori. The method is briefly described in Fig. 2.
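A hedged Python sketch of the Fig. 2 procedure is given below, with scikit-learn's FastICA standing in for the unspecified ICA algorithm. For robustness on arbitrary data, the loop also stops when the step ∆γ underflows, since an exact row count d may not always be reachable; on the paper's data the loop converges to d.

    import numpy as np
    from sklearn.decomposition import FastICA

    def reduction_matrix(X, y, d, gamma=0.9, dgamma=0.001, max_iter=10000):
        Xy = np.hstack([X, y[:, None]])               # append the class label
        ica = FastICA(n_components=Xy.shape[1], random_state=0, max_iter=1000)
        W = ica.fit(Xy).components_                   # separating matrix, (D+1)x(D+1)
        a = np.abs(W).mean(axis=1)                    # per-row mean magnitude
        prev = 0
        for _ in range(max_iter):
            R = np.where(np.abs(W) < gamma * a[:, None], 0.0, W)
            # Keep rows using both the label column and at least one feature.
            keep = (R[:, -1] != 0) & (np.abs(R[:, :-1]).sum(axis=1) > 0)
            R = R[keep]
            sign = int(np.sign(R.shape[0] - d))
            if sign == 0 or dgamma < 1e-9:
                return R[:, :-1]                      # drop the label column
            if prev and sign != prev:
                dgamma /= 2.0                         # refine the threshold step
            gamma += sign * dgamma
            prev = sign
        raise RuntimeError("no convergence")

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 162))
    y = rng.choice([-1.0, 1.0], size=500)
    W_red = reduction_matrix(X, y, d=50)
    print(W_red.shape)                                # close to (50, 162)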

F. Decision Function

The decision function needs to determine whether the vector x corresponds to a replica of the reference image. It is a binary classification problem, where the two classes correspond to replicas and non-replicas, respectively. The goal is to build, using a limited number of training examples, a classifier that generalizes well to novel patterns. Many classification algorithms can be used for this purpose. In our previous works [11], [12], we showed that the Support Vector Machine (SVM) yielded good performance for the replica detection problem.

The basic SVM [24], [25] is a binary classifier that separates two classes with a hyperplane. Furthermore, non-linear kernels allow patterns to be mapped into a space where they can be better discriminated by a hyperplane.

1) Support Vector Machine: We use the ν-parametrization [24], [26] of the SVM, and a radial basis function as kernel. The dual constrained optimization problem is given in (11). In the dual form, the Lagrangian is maximized by optimizing the Lagrange multipliers α_i.

\max_{\boldsymbol{\alpha}} \; -\frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j), \quad (11)

subject to the constraints \sum_{i=1}^{m} \alpha_i y_i = 0, \sum_{i=1}^{m} \alpha_i \geq \nu, and 0 \leq \alpha_i \leq 1/m. In this work, we use a radial basis function kernel given by:

k(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{\sigma^2} \right). \quad (12)

The parameters of this classification technique are ν ∈ [0, 1] and σ ∈ ℝ₊. The parameter ν can be shown to be an upper bound on the fraction of training errors, and a lower bound on the fraction of support vectors [24], [26]. The kernel parameter σ controls the complexity of the decision boundary. The constrained optimization problem given in (11) can be solved by means of standard quadratic programming techniques.

The decision function indicates to which class the test pattern z belongs. It is given by:

f(z) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i \, k(z, x_i) + b \right), \quad (13)

where the constant b is determined by the support vectors. More precisely, b = y_k − \sum_{i=1}^{m} y_i \alpha_i \, k(x_i, x_k), for any x_k such that 0 < α_k < 1/m. The name support vectors stems from the fact that many of the optimized α_i are equal to 0. Hence, only a relatively small fraction of the training patterns defines the decision function.
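In practice, an off-the-shelf ν-SVM implementation can be used; the paper itself relies on libsvm [34]. The sketch below uses scikit-learn's NuSVC on toy reduced patterns; with the kernel convention of (12), sklearn's gamma corresponds to 1/σ².

    import numpy as np
    from sklearn.svm import NuSVC

    rng = np.random.default_rng(0)
    # Toy reduced patterns: replicas (+1) cluster near the origin,
    # non-replicas (-1) are spread out; the dimension mimics d = 50.
    X = np.vstack([rng.normal(0.0, 0.3, (60, 50)), rng.normal(0.0, 1.5, (300, 50))])
    y = np.array([+1] * 60 + [-1] * 300)

    sigma = 3.0
    clf = NuSVC(nu=0.2, kernel="rbf", gamma=1.0 / sigma**2).fit(X, y)
    # decision_function returns the value under the sign in (13).
    z = rng.normal(0.0, 0.3, size=(1, 50))
    print(clf.decision_function(z), clf.predict(z))
    print("support vectors:", clf.support_vectors_.shape[0])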

2) Determination of the Classification Parameters: In the ν-SVM, the kernel parameter σ and the parameter ν are to be determined. More precisely, they need to be set so as to minimize the generalization error, which is the error obtained when testing novel patterns with a trained decision function.

In other words, we want to maximize the F-score F_ρ(r_fp, r_fn), where r_fp is the generalization error for false positives (novel non-replicas classified as replicas), r_fn is the generalization error for false negatives (novel replicas classified as non-replicas), and ρ is the ratio between the number of novel non-replicas and replicas. In the considered application, there are usually many more non-replicas than replicas, so that ρ ≫ 1. Nevertheless, ρ remains a priori


unknown. Moreover, r_fp and r_fn are also unknown and need to be estimated.

Cross-validation is a popular technique for estimating the generalization error. In k-fold cross-validation, the training patterns are randomly split into k mutually exclusive subsets (the folds) of approximately equal size. The SVM decision function is obtained by training on k − 1 of the subsets, and is then tested on the remaining subset. This procedure is repeated k times, with each subset used for testing once. Averaging the test error over the k trials gives an estimate of the expected generalization error. This method was shown to yield a good estimation of the generalization error [27].

In the following, we use a normalized version of the radial basis function kernel, where σ in (12) is replaced by κ · σ. The normalization constant κ is set to the second decile of the distribution of the intra-replica distances within the training set. It ensures that the optimal value of σ is larger than one with high probability.

While ν has an intuitive signification, it is not clear what its optimal value should be [26], [28]. It was shown that twice R, a close upper bound on the expected optimal Bayes risk, is an asymptotically good estimate [28]. However, no such bound can be easily determined a priori.

In this work, good parameters for σ and ν are estimated in two steps: coarse and fine grid searches. In each step, a tenfold cross-validation is carried out for each feasible pair (ν, σ). The pair for which the estimated F-score is the highest is then chosen. The tried pairs are the following (a sketch of this two-stage search is given after the list):

• Coarse search: (σ, ν) for ν = 0.05 · 2^k, k = −4, . . . , 4 and σ = k, k = 1, . . . , 10.
• Fine search: (σ, ν) for ν = ν_1 · (1 + k/6), k = −2, . . . , +2 and σ = σ_1 · (1 + k), k = −2, . . . , +2. Here, ν_1 and σ_1 denote the values determined in the first step.
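The two-stage search can be sketched as follows; scikit-learn's built-in 'f1' scorer stands in for F_ρ of (2), whose ρ-weighting is not directly available, and infeasible ν values are simply scored −1.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import NuSVC

    def cv_score(X, y, sigma, nu):
        clf = NuSVC(nu=nu, kernel="rbf", gamma=1.0 / sigma**2)
        # Ten-fold cross-validation; fits that fail (infeasible nu) score -1.
        return cross_val_score(clf, X, y, cv=10, scoring="f1",
                               error_score=-1.0).mean()

    def grid_search(X, y):
        coarse = [(s, 0.05 * 2.0**k) for k in range(-4, 5) for s in range(1, 11)]
        s1, n1 = max(coarse, key=lambda p: cv_score(X, y, *p))
        fine = [(s1 * (1 + i), n1 * (1 + j / 6.0))
                for i in range(-2, 3) for j in range(-2, 3) if s1 * (1 + i) > 0]
        return max(fine, key=lambda p: cv_score(X, y, *p))

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (60, 50)), rng.normal(0, 1.5, (300, 50))])
    y = np.array([1] * 60 + [-1] * 300)
    print(grid_search(X, y))                  # the chosen (sigma, nu) pair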

IV. EVALUATION METHODOLOGY

A. Test Images

To simulate the performance of the proposed approach, we used the same image database as in [8]¹. It contains 18,785 photographs including (but not limited to) landscapes, animals, constructions, and people. The image sizes and aspect ratios

1 http://www-2.cs.cmu.edu/~yke/retrieval/

vary, for example 900 × 600, 678 × 435, or 640 × 480. They are mostly color images, except for about one thousand images that are gray-level.

For training, we randomly selected 700 images in the database. Among the selected pictures, 200 are randomly chosen to be the reference images, and the remaining ones are used as non-replica examples during the training phase. For each reference image, a replica detector is built as described in Sec. III.

The replica detectors are then tested on the remaining images in the database. This permits estimating the false positive rate for each reference image. The false negative rate is estimated by testing the replica detectors on test replica examples. They are generated by the transforms listed below. These operations are the same as the ones used in [8], [9]. They are implemented using the free software suite ImageMagick². There are twelve categories, as shown hereafter. An example for each of them is depicted in Fig. 3.

• Colorizing. Tint the Red, Green, or Blue channel by 10%;
• Changing contrast. Increase or decrease the contrast using ImageMagick's default parameter;
• Cropping. Crop by 5%, 10%, 20%, or 30%;
• Despeckling. Apply ImageMagick's despeckling operation;
• Downsampling. Downsample by 10%, 20%, 30%, 40%, 50%, 70%, or 90% (without antialiasing filtering);
• Flipping. Flip along the horizontal axis;
• Color Depth. Reduce the color palette to 256 colors;
• Framing. Add an outer frame of 10% of the image size. Four images are produced with different frame colors;
• Rotating. Rotate by 90°, 180°, or 270°;
• Scaling. Scale up by 2, 4, 8 times, and down by 2, 4, 8 times (using an antialiasing filter);
• Changing Saturation. Change the values of the saturation channel by 70%, 80%, 90%, 110%, or 120%;
• Changing Intensity. Change the intensity channel by 80%, 90%, 110%, or 120%.

2 http://www.imagemagick.org


[Fig. 3 images omitted: (a) original; (b) one replica per category.]

Fig. 3: Examples of test replicas. There is one replica example per category; the order used is the same as in the text (left-right, top-down).

B. Evaluation Metrics

In order to evaluate the performance of the proposed system, we measure the tradeoff between the false positive and false negative rates.

The Receiver Operating Characteristic (ROC) curve [13] is often used to represent the tradeoff between error types. In this study, we use a variant of the ROC curve called the Detection Error Tradeoff (DET) curve [29]. In a DET curve, error rates are plotted on both axes, which means that both axes can make use of a logarithmic scale. The interpretation of DET curves is analogous to that of ROC curves: a classifier X is more accurate than a classifier Y when its DET curve is below that of Y.

The DET curve is computed using the values under the sign function in the right-hand side of (13). Every reference image detector produces a DET curve; these are synthesized into a single DET curve by using vertical averaging [13].
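The sketch below illustrates how a DET curve can be obtained from the decision values, and how several curves can be vertically averaged onto a common r_fp grid; the score distributions are synthetic.

    import numpy as np

    def det_curve(scores_pos, scores_neg, n_points=200):
        # Sweep a threshold over the observed score range; a pattern is
        # accepted as replica when its decision value is at least the threshold.
        ts = np.quantile(np.concatenate([scores_pos, scores_neg]),
                         np.linspace(0.0, 1.0, n_points))
        r_fp = np.array([(scores_neg >= t).mean() for t in ts])  # accepted non-replicas
        r_fn = np.array([(scores_pos < t).mean() for t in ts])   # rejected replicas
        return r_fp, r_fn

    rng = np.random.default_rng(0)
    grid = np.logspace(-4, 0, 50)
    curves = []
    for _ in range(5):                          # five synthetic detectors
        pos = rng.normal(1.0, 1.0, 40)          # decision values of replicas
        neg = rng.normal(-2.0, 1.0, 18085)      # decision values of non-replicas
        r_fp, r_fn = det_curve(pos, neg)
        # r_fp decreases with the threshold, so reverse for interpolation.
        curves.append(np.interp(grid, r_fp[::-1], r_fn[::-1]))
    avg_r_fn = np.mean(curves, axis=0)          # vertically averaged DET curve
    print(avg_r_fn[:3])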

TABLE III: Vertically averaged DET curve precision. The precision is obtained by taking into account the number of test examples (18,085 for the non-replicas, and 40 for the replicas), and the number of individual DET curves used for averaging (200).

error type | maximal absolute precision
r_fp       | ±1/18,085 = ±5.5 · 10^-5
r_fn       | ±1/(40 · 200) = ±1.3 · 10^-4

The maximal absolute precision that can be achieved on the vertically averaged DET curve is reported in Table III. It takes into account the number of test replicas (P = 40), the number of test non-replicas (N = 18,085), and the number of individual DET curves (n = 200).

V. RESULTS

In this section, we present experimental results in order to evaluate the performance of the proposed replica detection system. In the following, the used parameters are d = 50 for the dimensionality reduction, and ρ = 10^4 for the F-score parameterization, unless stated otherwise.

A. Influence of the F-Score Metric Parameterization

In this experiment, we explore the effect of possible parameterizations of the F-score metric. The value ρ gives the ratio between the number of expected non-replica instances and that of expected replica instances. In the considered applications, these numbers can hardly be determined a priori. However, we can safely assume that ρ is much larger than one, because there are many more non-replicas than replicas.

Figure 4a shows the vertically averaged DET curve for ρ = 1. Then, Fig. 4b depicts the vertical differences between the averaged DET curves for ρ = {10^2, 10^4, 10^6} and that for ρ = 1. Globally, different values of ρ only slightly influence the results, namely the difference is less than 1%. However, high ρ values favor classifiers with very low false positive rates while keeping reasonable false negative rates.

B. DET Curves Distribution

We now analyse the distribution of DET curves before vertical averaging.


[Fig. 4 plots omitted: (a) average DET curve for ρ = 1 (logarithmic r_fp axis vs. r_fn); (b) vertical differences ∆r_fn to curve (a) for ρ = 1, 10^2, 10^4, 10^6.]

Fig. 4: Influence of ρ. Vertically averaged DET curves for a fixed number of dimensions d = 50, and different values of ρ.

Figure 5 shows the false negative rate histograms for a fixed false positive rate r_fp = 5 · 10^-5 (leftmost value of the curves in Fig. 4) and two different values of ρ, namely 1 and 10^4. On the one hand, the two histograms show that about 50% of the individual classifiers have false negative rates below 5%. Actually, about 25% of the classifiers have no false negatives at all. On the other hand, the two histograms show that a small number of classifiers (respectively 15 and 9) present false negative rates above 50%. Moreover, this proportion can be seen to decrease when ρ increases. The corresponding reference images possess few colors, or are gray-level images. This shows that color-related features are very powerful discriminating features, and that the lack of color variety complicates the replica detection task. The bad classifiers

[Fig. 5 histograms omitted (two series: ρ = 1 and ρ = 10^4; y-axis: count).]

Fig. 5: False negative histograms, for a fixed false positive rate r_fp = 5 · 10^-5 (leftmost value of the curves in Fig. 4), number of dimensions d = 50, and two different values of ρ.

participate heavily in increasing the average false negative rates. When their proportion diminishes, so does the average false negative error, explaining the results obtained in Fig. 4.

C. Gray Level Features

In this trial, the detection performance obtained by using gray-level features is compared to that obtained without them. Figure 6 depicts the performance improvement brought by adding gray-level features. The performance gap widens as the false positive rate decreases. Note that gray-level images are present in both the reference images and the test images. Straightforwardly, making use of gray-level features greatly improves the performance on these images. Moreover, it also increases the performance for color images. Indeed, gray-level features capture information that is missed by the color features, namely the global intensity shape.

A possible improvement is to use a replica detection system to test color images, and another one, which does not make use of color features, to test gray-level images. The drawback of such an approach is that it requires storing two descriptions for each reference image.

D. Weighted Inter-Image Differences

In this experiment, we analyse the advantage of using weighted inter-image differences. Figure 7 illustrates the performance increase due to the weighted inter-image differences. The DET curve


[Fig. 6 plot omitted: DET curves with and without gray-level features.]

Fig. 6: Influence of gray-level features. Vertically averaged DET curves with, and without, using gray-level features.

[Fig. 7 plot omitted: DET curves for 'optimized' s_α and s_α = 0.]

Fig. 7: Influence of the weighted inter-image differences. Vertically averaged DET curves for s_α = 0 and optimized s_α.

with s_α = 0 corresponds to the situation where the differences are not weighted. In this case, (6) becomes δ_α = f_α − \bar{f}_α. The other DET curve corresponds to the case where the s_α have been optimized as described in Sec. III-C. The performance gap increases as the false positive rate diminishes. For low false positive rates, the false replicas are mainly images that are similar to the reference image. For example, if the reference is a picture of a city, most false replicas contain buildings or straight structures. That is, many of the features of the false replicas are close to those of the reference image. In this situation, the use of weighted inter-image differences helps in taking the correct decision.

[Fig. 8 plot omitted: DET curves for no dimensionality reduction (thick line) and d = 10, 20, 30, 40, 50, 60, 120.]

Fig. 8: Influence of the dimensionality reduction. Vertically averaged DET curves for different values of d.

E. Dimensionality Reduction Performance

We now explore the efficiency of the dimensionality reduction step. Figure 8 illustrates the performance obtained with different dimensions d: in general, the larger the number of dimensions, the better the performance. However, the gain is not very important when going over d = 50. The thick line represents the performance achieved without dimensionality reduction. In this case, the performance is mostly equivalent to that of d = 40 for low false positive rates (r_fp < 10^-4) and significantly lower for higher false positive rates. This can be easily explained by the limited number of training examples. Indeed, the number of needed training examples grows as the number of dimensions increases [21], [30].

Another factor explaining the results obtained in Fig. 8 is related to metric considerations. When no dimensionality reduction is applied, the distance used for the kernel computation is proportional to a weighted Euclidean distance, where the weights are given by the inverse of the standard deviation, as shown by (9) and (12). This gives a priori equal weights to each feature. However, there is no reason that every feature carries the same amount of discriminative information. Moreover, the features depend on each other. When dimensionality reduction is applied, the reduced patterns x_i are of unit variance, and theoretically independent [23]. In that case, giving equal weights to every feature makes more sense and gives better results.


TABLE IV: Storage requirements estimation. Real numbers are coded on 16 bits (two bytes). The number of dimensions after dimensionality reduction is 50. In our experiment (using d = 50 and ρ = 10^4), the average number of support vectors is found to be about 85.

name                | size in bytes
image features      | 162 · 2 = 324
image weights       | 162 · 0 = 0
norm. const.        | 2 · 162 · 2 = 648
dim. reduct. matrix | 50 · 162 · 2 = 16,200
support vectors     | 50 · 85 · 2 = 8,500
total               | 25,672 ≈ 25 kilobytes

In fact, recent works have shown that using metrics that are learned from side-information can improve classification results [31], [32]. In the case of the replica detection problem, two kinds of side-information are available: the class (replica or non-replica), and the relative distance to the reference for replicas (for example, a JPEG-compressed image with Q = 10 is farther from the reference than one with Q = 90). Future research will study the improvements brought by using learned metrics in the replica detection problem.

F. Efficiency

The efficiency of the replica detection method is now analysed in terms of storage requirements and computational effort.

A number of parameters are needed to compare a test image to a given reference. Namely, they are the reference image features and weights, the normalization constants, the dimensionality reduction matrix, and the support vectors of the decision function. In the following, we refer to the aforementioned elements as the description of the reference image. The storage requirements are listed in Table IV. On average, about 25 kilobytes are needed to store each description. In other words, one megabyte can hold, on average, up to forty descriptions. This is a negligible amount of memory for today's computers.

Another important aspect is the computational complexity of the method. The proposed method requires a training for each reference image. The training is computationally complex and takes up to fifteen minutes per reference image on a PC with a 2.8 GHz processor and 1 GB of RAM.

TABLE V: Average running time for testing. The experiments were carried out on a PC with a 2.8 GHz processor and 1 GB of RAM.

name                                       | time, s
reference image indep.: preprocessing      | 0.2
reference image indep.: feature extraction | 1.8 (a)
reference image dep.: weighted features    | 48 · 10^-6
reference image dep.: normalization        | 90 · 10^-6
reference image dep.: dim. reduction       | 17 · 10^-6
reference image dep.: decision function    | 50 · 10^-6

(a) 0.1 s when optimized as in [9].

Feature extraction from the replica examples is the most complex part of the training, and takes up to 75% of the running time. Since training can be done offline, its computational complexity is less critical.

The computational complexity of testing is estimated in Table V. Note that, except for the SVM part, the method is implemented in Matlab without any optimization. This incurs longer running times. For instance, the feature extraction time could be reduced to at least 0.1 seconds [9]. In the discussion that follows, we assume an optimized feature extraction step. The preprocessing and feature extraction steps are reference image independent, and take up about 0.3 seconds. The remaining steps are reference image dependent, and take up about 0.2 · 10^-3 seconds. When the reference image database contains fewer than 1,500 images, most of the testing time is spent on the test image preprocessing and the feature extraction. In that case, up to four test images can be processed per second. For larger reference databases, most of the testing time is spent on the reference image dependent steps. The number of test images that can be processed each second then decreases linearly as the number of reference images grows. Future research will concentrate on pruning the reference images, in order to avoid testing them all. That is, only the reference images for which the test image can potentially be a replica are selected. Such methods can reduce the testing time, and have been applied with success in [8], [9].

G. Comparison with Existing Replica Detection Methods

Figure 9 compares the performance of the proposed replica detection system with state-of-the-art techniques [8], [9]. The continuous line corresponds to the vertically averaged DET curve obtained with our system, using d = 50 and ρ = 10^4.


[Fig. 9 plot omitted: DET curves for the proposed method and DPF, and one operating point for KPs.]

Fig. 9: Comparison with other methods. The proposed method is compared against other methods: KPs [8], and DPF [9].

The dashed line represents the performance of a replica detection method based on a Perceptual Distance Function (DPF) [9]. The circle point indicates the performance of a replica detection system based on key points (KPs) [8]. For DPF, the curve is obtained by inspecting the precision-recall curve reported in Fig. 5 of [9], and using N = 20,000 and P = 40. For KPs, the point is computed using the information reported in Tables 1 and 2 of [8], and using N = 18,722 and P = 40.

It can be seen that the proposed method achieves good performance. For instance, an average false negative rate of 8% corresponds to a fixed false positive rate of 1 · 10^-4. On the one hand, our method outperforms DPF for false positive rates below 10^-2. Moreover, it should be noted that the features used in the current work are mainly a subset of those used in DPF: we use 162 features against 298 in the latter study. On the other hand, the proposed method is outperformed by KPs. In our method, most of the wrongly classified replicas (false negative errors) correspond to replicas for which the illumination or the intensity has been changed to a great extent. The KPs method uses features (salient points, or key points [33]) that are invariant to this change but computationally more complex to extract. Indeed, the feature extraction time of KPs is between 1 and 10 seconds per image [8], [9]. This is between 10 and 100 times slower than that of the proposed method (when optimized as in [9]).

VI. APPLICATIONS AND SCENARIOS

The proposed image replica detection system is suitable to detect copyright infringement or to identify illicit content. In this section, we discuss in more detail the scenarios for using the proposed technique in such a task.

In the design of our system, we assume a given database of reference images, and we test input test images against this database in order to identify replicas. Furthermore, we assume that the set of test images can be extremely large, but that the set of reference images is moderate in size. To guarantee a fast testing procedure, we consider a limited number of low-level features, as previously discussed. Note that, in order to handle a large set of reference images, a subset of all the features can be used in a first step in order to quickly drop test images which are classified as non-replicas with high confidence.

In the target applications, the database of reference images can for instance be a collection of copyrighted images or illicit child pornography images. To perform the task of copyright infringement or illicit content detection, several configurations are possible. In one scenario, we consider a proxy server performing network sniffing at nodes of the Internet. The proxy server contains the database of reference images, or more specifically the extracted features, normalization constants, reduction matrix and classifier for each reference. Subsequently, the proxy server processes each incoming image, applying preprocessing and feature extraction, compares it pairwise with each reference image, and finally decides whether it is a replica of one image in the database. In another scenario, we consider a web crawler that searches for replicas on the Internet. Similarly to the previous case, the crawler looks for test images and compares them to a database of reference images in order to identify replicas.

Other applications and scenarios are also possible, although the proposed system may be less suited to them. For instance, the technique can also be used in an image web search engine in order to prune the results of a query by eliminating replicas. Alternatively, it is also possible to build an index of web images, and to check whether a given reference image has replicas in this index.

As is common with classification techniques, a trade-off exists between the false positive and false negative rates. On the one hand, a low false positive


rate is desired whenever a user does not want to be overwhelmed by false positives. On the other hand, a low false negative rate is preferable for a user who wishes to detect all possible replicas. The optimal trade-off is therefore application dependent.

VII. CONCLUSION

In this paper, we have described a technique to classify whether a test image is a replica of a given reference image. We performed experiments on a large database containing 18,785 photographs representing a wide range of content. We were able to detect, on average, 92% of the replicas while achieving a fixed false positive rate of only 1 · 10^-4. Moreover, we showed that using a replica detector that is fine-tuned to each reference image can greatly improve the performance.

Future works include the addition of a pruning step in order to decrease the number of reference images to be tested (presently, all reference images are tested). This can be accomplished by means of tree-based indexing techniques. Another work consists in using additional side information to improve the fine-tuning of the replica detector.

APPENDIX
INVARIANCE OF EQUALIZED ILLUMINATION TO GAMMA AND ILLUMINATION CHANGES

Intensity and gamma changes are modeled as:

g(r) = \alpha r^\gamma, \quad (14)

where r is the intensity of the input image (in the range [0, 1]), and g(r) that of the output image.

Let p_i(w) be the probability density of the pixels of the input image. It follows that the probability density of the output image is given by:

p_o(w) = h'(w) \, p_i(h(w)), \quad (15)

where h(w) = g^{-1}(w) = (w/\alpha)^{1/\gamma} and h'(w) is its first derivative.

The global histogram equalization maps an image with a given probability density p(w) to an image with a uniform probability density. The mapping is given by [15]:

s = T(r) = \int_0^r p(w) \, dw, \quad (16)

where r is the intensity before equalization, and s the one after.

Let s_o = T(r_o) be the equalized intensity after changes of illumination and gamma on r_i. More precisely,

s_o = T(r_o) = T(g(r_i)) = \int_0^{g(r_i)} p_o(w) \, dw \quad (17)
    = \int_0^{g(r_i)} h'(w) \, p_i(h(w)) \, dw \quad (18)
    = \int_0^{h(g(r_i)) = r_i} h'(g(v)) \, p_i(h(g(v))) \, g'(v) \, dv \quad (19)
    = \int_0^{r_i} h'(g(v)) \, p_i(v) \, g'(v) \, dv \quad (20)
    = \int_0^{r_i} \frac{d}{dv}\left[ h(g(v)) \right] p_i(v) \, dv \quad (21)
    = \int_0^{r_i} p_i(v) \, dv = T(r_i) = s_i, \quad (22)

where (19) uses the change of variable w = g(v), and (21) uses the fact that h(g(v)) = v, so that the bracketed derivative equals one.

An image and its versions processed by (14) are thus mapped to the same equalized image by (16). The only fact used above is that an inverse exists for the function g(·). Therefore, the above result can be generalized to any reversible transformation of the image.

Note that it cannot be proved in general that the discrete version of the histogram equalization produces the discrete equivalent of a uniform probability function [15]. However, in practice, discrete histogram equalization yields images that are mostly invariant to gamma and illumination changes.
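The practical invariance can be checked numerically with a few lines; this small demo applies the model (14) and compares the discretely equalized results.

    import numpy as np

    def equalize(img, bins=256):
        hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
        cdf = hist.cumsum() / hist.sum()          # the discrete operator T(.)
        return cdf[np.clip((img * bins).astype(int), 0, bins - 1)]

    rng = np.random.default_rng(0)
    r = rng.random((256, 256))                    # input intensities in [0, 1)
    g = np.clip(0.9 * r**1.3, 0.0, 1.0)           # the model g(r) = alpha * r**gamma
    print(np.abs(equalize(r) - equalize(g)).max())  # small residual in practice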

ACKNOWLEDGEMENTS

The first author is partly supported by the Swiss National Science Foundation ("Multimedia Security", grant number 200021-1018411). The work was partially developed within VISNET³, a European Network of Excellence funded under the European Commission IST FP6 programme. Also, the authors would like to thank the creators of the SVM library libsvm [34], and those of the image processing software suite ImageMagick.

REFERENCES

[1] ISO/IEC JTC1/SC29/WG11 WG11N4358, "Final Draft Int'l Standard 15938-3, Information Technology - Multimedia Content Description Interface - Part 3: Visual," July 2001.

3 http://www.visnet-noe.org


[2] J. M. Martínez, R. Koenen, and F. Pereira, "MPEG-7: The Generic Multimedia Content Description Standard," IEEE Multimedia, vol. 9, no. 2, pp. 78–87, April 2002.

[3] ISO/IEC JTC1/SC29/WG1 WG1N3684, "JPSearch Part 1 TR - System Framework and Components," July 2005.

[4] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proc. of the IEEE, vol. 87, no. 7, pp. 1079–1107, July 1999.

[5] F. Lefebvre, B. Macq, and J.-D. Legat, "RASH: Radon soft hash algorithm," in EURASIP European Signal Processing Conf., France, September 2002.

[6] J. Seo, J. Haitsma, T. Kalker, and C. Yoo, "Affine transform resilient image fingerprinting," in IEEE Int'l Conf. on Acoustics, Speech, and Signal Processing, Hong Kong, April 2003.

[7] R. Venkatesan, S.-M. Koon, M.-H. Jakubowski, and P. Moulin, "Robust image hashing," in IEEE Int'l Conf. on Image Processing, Vancouver, September 2000.

[8] Y. Ke, R. Sukthankar, and L. Huston, "An Efficient Parts-Based Near-Duplicate and Sub-Image Retrieval System," in ACM Int'l Conf. on Multimedia, 2004, pp. 869–876.

[9] A. Qamra, Y. Meng, and E. Y. Chang, "Enhanced Perceptual Distance Functions and Indexing for Image Replica Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 379–391, March 2005.

[10] Y. Maret, G. Molina, and T. Ebrahimi, "Images Identification Based on Equivalence Classes," in Proc. Workshop on Image Analysis for Multimedia Interactive Services, April 2005.

[11] ——, "Identification of Image Variations based on Equivalence Classes," in Proc. SPIE Visual Communications and Image Processing, July 2005.

[12] Y. Maret, F. Dufaux, and T. Ebrahimi, "Image Replica Detection based on Support Vector Classifier," in Proc. SPIE Applications of Digital Image Processing XXVIII, August 2005.

[13] T. Fawcett, "ROC Graphs: Notes and Practical Considerations for Data Mining Researchers," Technical Report HPL-2003-4, 2003.

[14] G. Finlayson and G. Schaefer, "Hue that is Invariant to Brightness and Gamma," in Proc. British Machine Vision Conf., 2001.

[15] R. Gonzalez and R. Woods, Digital Image Processing, 2/E. Prentice-Hall, 2002.

[16] Y. Rui, T. Huang, and S. Chang, "Image retrieval: current techniques, promising directions and open issues," J. of Visual Communication and Image Representation, vol. 10, no. 4, pp. 39–62, April 1999.

[17] B. S. Manjunath and W. Y. Ma, "Texture Features for Browsing and Retrieval of Image Data," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837–842, August 1996.

[18] M.-K. Hu, "Visual Pattern Recognition by Moment Invariants," IEEE Trans. on Information Theory, pp. 179–187, 1962.

[19] J.-L. Leu, "Computing a Shape's Moments from its Boundary," Pattern Recognition, vol. 24, no. 10, pp. 949–957, 1991.

[20] J. Smith and A. Natsev, "Spatial and Feature Normalization for Content Based Retrieval," in IEEE Int'l Conf. Multimedia and Expo, vol. 1, 2002, pp. 193–196.

[21] D. L. Donoho, "High-dimensional data analysis: The curses and blessings of dimensionality," August 1998. [Online]. Available: http://www-stat.stanford.edu/~donoho/

[22] N. Kwak, C.-H. Choi, and C.-Y. Choi, "Feature extraction using ICA," in Proc. Int'l Conf. on Artificial Neural Networks, August 2001, pp. 568–576.

[23] A. Hyvärinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.

[24] B. Schölkopf, A. Smola, R. Williamson, and P. L. Bartlett, "New support vector algorithms," Neural Computation, vol. 12, no. 5, pp. 1207–1245, 2000.

[25] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[26] P.-H. Chen, C.-J. Lin, and B. Schölkopf, "A tutorial on ν-support vector machines," Applied Stochastic Models in Business and Industry, vol. 21, pp. 111–136, 2005.

[27] K. Duan, S. S. Keerthi, and A. N. Poo, "Evaluation of simple performance measures for tuning SVM hyperparameters," Neurocomputing, vol. 51, pp. 41–59, 2003.

[28] I. Steinwart, "On the Optimal Parameter Choice for ν-Support Vector Machines," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1274–1284, October 2003.

[29] A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki, "The DET Curve in Assessment of Detection Task Performance," in Proc. Eurospeech, Greece, 1997, pp. 1895–1898.

[30] R. Duda, P. Hart, and D. Stork, Pattern Classification. Wiley-Interscience, 2000.

[31] E. Xing, A. Ng, M. Jordan, and S. Russell, "Distance Metric Learning with Application to Clustering with Side-Information," in Proc. Advances in Neural Information Processing Systems, 2003, pp. 505–512.

[32] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, "Learning Distance Functions using Equivalence Relations," in Proc. Int'l Conf. on Machine Learning, Washington DC, August 2003.

[33] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[34] C. Chang and C. Lin, LIBSVM: a library for support vector machines, 2001. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm