A robust image fingerprinting system using the Radon transform


ARTICLE IN PRESS

Signal Processing: Image Communication 19 (2004) 325–339

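*Corresponding author.
E-mail addresses: [email protected] (J.S. Seo), jaap.haitsma@philips.com (J. Haitsma), [email protected] (T. Kalker), [email protected] (C.D. Yoo).

0923-5965/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.image.2003.12.001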

A robust image fingerprinting system using the Radon transform

Jin S. Seo (a,*), Jaap Haitsma (b), Ton Kalker (b), Chang D. Yoo (a)

(a) Department of EECS, KAIST, 373-1 Guseong Dong, Yuseong Gu, Daejeon 305-701, South Korea
(b) Philips Research Eindhoven, Prof. Holstlaan 4, Eindhoven 5656AA, The Netherlands

Received 6 May 2003; received in revised form 12 November 2003; accepted 1 December 2003

Abstract

With the ever-increasing use of multimedia content through electronic commerce and on-line services, the problems associated with the protection of intellectual property, the management of large databases and the indexation of content are becoming more prominent. Watermarking has been considered an efficient means of addressing these problems. Although watermarking is a powerful tool, its use raises some issues, such as the modification of the content and its security. In this respect, identifying content based on its own features rather than on a watermark can be an alternative solution. The aim of fingerprinting is to provide fast and reliable methods for content identification. In this paper, we present a new approach for image fingerprinting using the Radon transform to make the fingerprint robust against affine transformations. Since it is quite easy with modern computers to apply affine transformations to audio, image and video, there is an obvious need for affine-transformation-resilient fingerprinting. Experimental results show that the proposed fingerprints are highly robust against most signal processing transformations. Besides robustness, we also address other issues such as pairwise independence, database search efficiency and key dependence of the proposed method.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Fingerprinting; Robust hashing; Robust matching; Content identification; Image indexing

Nomenclature

$H(\cdot)$: hash or fingerprint function
$H_K(\cdot)$: hash or fingerprint function with key $K$
$f(x, y)$: an image signal in spatial coordinates $(x, y)$
$g(s, \theta)$: the Radon transform of an image
$c(\lambda, \theta)$: the normalized auto-correlation of each radial projection
$C(\zeta_\lambda, \zeta_\theta)$: the 2D Fourier transform of the log-mapped $c(\lambda, \theta)$
$P_{FA}$: the false alarm rate
$P_{FR}$: the false rejection rate
$B(n, p)$: binomial distribution for a sequence of $n$ trials with probability $p$
$N(\mu, \sigma)$: normal distribution with mean $\mu$ and standard deviation $\sigma$

1. Introduction

Multimedia fingerprinting (also known as robust hashing) is an emerging research area that is receiving increased attention. Fingerprints are perceptual features or short summaries of a multimedia object. The concept is analogous to cryptographic hash functions, which map arbitrary-length data to a small, fixed number of bits [27]. Although cryptographic hashing is a proven method in message encryption and authentication, it cannot be applied directly to multimedia fingerprinting. Multimedia content often undergoes various manipulations during distribution, including compression, enhancement, geometrical distortions and analog-to-digital conversion, that may preserve its perceptual value. However, cryptographic hash functions are bit sensitive: an alteration of a single bit in the multimedia content will result in a completely different hash value. This renders cryptographic hash functions inapplicable to multimedia objects. As opposed to hash functions for binary messages (or documents), a multimedia fingerprinting function should allow for some modification of the content. To remedy these deficiencies of cryptographic hash functions, we arrive at the notion of multimedia fingerprinting, sometimes referred to as robust hash functions [19,43].

A number of applications of multimedia fingerprinting have been considered [24].

* Filtering technology for file sharing: Filtering refers to active intervention in content distribution. The prime example is Napster. To settle its legal dispute with the music industry, Napster introduced an audio filtering system that restricts copyrighted songs from being downloaded. The demand for filtering mechanisms is growing as more people exchange copyrighted images and video through file sharing services. Multimedia fingerprinting is considered a good candidate for such a filtering mechanism [19,32,36]. Besides copyright protection, the filtering scheme can be used to provide a more refined file sharing service: a service that provides free material, different kinds of premium material (accessible to those with a proper subscription) and forbidden material.

* Broadcast monitoring: Monitoring refers to the tracking of radio, television or web broadcasts for, among others, the purposes of royalty collection, program verification and people metering [19,15,32]. This application is passive in the sense that it has no direct influence on what is being broadcast: the main purpose of the application is to observe and report. A broadcast monitoring system based on fingerprinting consists of several monitoring sites and a central site where the fingerprint server is located. At the monitoring sites, fingerprints are extracted from all the (local) broadcast channels. The central site collects the fingerprints from the monitoring sites. Subsequently the fingerprint server, containing a huge fingerprint database, produces the playlists of the respective broadcast channels.

* Automated indexing of multimedia library: Nowadays many computer users have a multimedia library containing several hundreds, sometimes even thousands, of multimedia files (song, image and video clips). When the files are obtained from different sources, such as ripping from a music CD, scanning of images and downloading from file sharing services, these libraries are often not well organized. By identifying these files with fingerprinting, the files can be automatically labeled with the correct metadata, allowing easy organization based on, for example, artist, music album or genre [18,29,30,42].

* Connected content: Connected content is a general term for consumer applications where the content is somehow connected to additional and supporting information. Audio recognition over a mobile phone [19,41] is the typical example. Imagine the following situation. You are in your car, listening to the radio, and suddenly you hear a song that catches your attention. It is the best new song you have heard for a long time. However, you missed the announcement and you do not recognize the artist. Still, you would like to know more about this music. In this case, you could connect to the audio fingerprinting server through your mobile phone to get the title of the music and the name of the artist [19]. It goes without saying that similar applications can be defined for other types of content.

* Authentication: Powerful and easy-to-use multimedia manipulation software has made it possible to alter digital content. Authentication verifies the originality of the content by detecting malicious manipulations. In multimedia authentication, the key issue is to protect the message conveyed by the content, not the particular representation of the content [33,37]. Content authentication methods can be classified into fingerprint-based [9,17,37] and watermark-based [12,46] approaches. In general, the fingerprint-based approaches are more robust than the watermark-based ones. However, the fingerprint-based approaches need an additional communication channel or storage to check the authenticity of the content; the watermark-based ones do not.

These applications have boosted the interest in multimedia fingerprinting, and a number of image fingerprinting methods have been proposed so far. Schneider and Chang proposed a method that uses the intensity histogram of each image block as a fingerprint to verify the authenticity of an image [37]. Venkatesan et al. proposed a method based on randomized processing [43]. Fridrich proposed a method that projects each image block onto a random pattern and thresholds the result to get fingerprint bits [16]. Lefebvre et al. proposed a fingerprinting method based on the Radon transform [25]. Patents [13,20] and other papers [1,11,28,44] have also been published on this topic. A number of companies [2–4,7] have already realized the business potential of multimedia fingerprinting. Other signs of a growing awareness of the potential of fingerprinting are a European project on audio recognition [35], the announcement of a cooperation between a file-sharing company and a content recognition company [36] and a Call for Information by the IFPI and the RIAA [22].

In this paper, we propose an affine-transformation-resilient image fingerprinting method using the Radon transform [23]. The Radon transform has some useful properties for achieving affine resilience and has proved to be robust against many image processing steps such as sharpening, blurring and compression. Most of the previous methods show limited robustness [16,28,37,43] or require a search [25] over the affine transformations. The proposed method is highly robust against affine transformations without searching for the original orientation, and it achieves the collision-free property with a relatively small number of fingerprint bits (400 bits per image). These properties increase the practical usefulness of the proposed method.

This paper is organized as follows. Section 2 presents the definition and requirements of cryptographic hashing and multimedia fingerprinting. Section 3 describes the proposed affine invariant fingerprinting system. Section 4 evaluates the performance of the proposed method.

2. Overview of multimedia fingerprinting

2.1. Cryptographic hashing vs. multimedia fingerprinting

A cryptographic hash function $H(X)$ maps a (usually large) object $X$ to a (usually small) hash value. It allows two large objects $X$ and $Y$ to be compared by comparing only their respective hash values $H(X)$ and $H(Y)$. Mathematical equality of $H(X)$ and $H(Y)$ implies the equality of $X$ and $Y$ with only a very low probability of error. For a properly designed cryptographic hash function this probability should be $2^{-L}$, where $L$ is the number of bits in the hash value. Using cryptographic hash functions, there is an efficient method to check whether or not a particular data item $X$ is contained in a given large data set $Y = \{Y_i\}$. Instead of storing and comparing with all of the data in $Y$, it is sufficient to store the set of hash values $\{h_i = H(Y_i)\}$ and to compare $H(X)$ with this set of hash values. This method is more efficient because the storage requirements are usually more relaxed and fewer bits have to be compared. The only caveat is that an initial pre-computation is required to compute the hash values $\{h_i\}$. We note that good cryptographic hash functions do indeed exist. The desirable properties of a cryptographic hash function $H(\cdot)$ are as follows [5]:

* One-way hash function: Given $H(X)$, it is hard to find an original object $X$, and given $X$ and $H(X)$, it is hard to find an object $X' (\neq X)$ such that $H(X') = H(X)$.

* Collision-free hash function: It is hard to find two distinct objects $X$ and $Y$ that hash to the same result ($H(X) = H(Y)$).

* Keyed hash function: Without knowledge of the key $K$, it is hard to determine $H_K(X)$ for any object $X$, to find $X$ from $H_K(X)$ and to find two distinct objects $X$ and $Y$ that hash to the same result ($H_K(X) = H_K(Y)$). Given (possibly many) pairs of objects $X_i$ and their hash values $H(X_i)$, it is hard to find the secret key $K$.
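The set-membership check described above, and the bit sensitivity that motivates the rest of this section, can be illustrated with a short sketch. It assumes SHA-256 from the Python standard library as a stand-in for $H(\cdot)$; the data items and variable names are hypothetical and not taken from the paper.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (stand-in for H)."""
    return hashlib.sha256(data).hexdigest()


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical data set Y = {Y_i}; only the hash values h_i are stored.
dataset = [b"first object", b"second object", b"third object"]
stored_hashes = {digest(item) for item in dataset}

query = b"second object"
# Flip the lowest bit of the first byte: a one-bit alteration of the query.
altered = bytes([query[0] ^ 0x01]) + query[1:]

print(digest(query) in stored_hashes)    # the exact copy is found
print(digest(altered) in stored_hashes)  # the one-bit variant is not
print(bit_difference(digest(query), digest(altered)))  # many of the 256 bits differ
```

The lookup compares only fixed-length digests, which is the efficiency argument made above, while the altered query shows why exact hash matching cannot identify content that has been modified even slightly.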

Given the above arguments, one might suppose that cryptographic hash functions are a good tool to identify multimedia content: take a hash function, store hash values for all available content in a large database (costly, but it needs to be done only once) and identify content by hash matching. This method will, however, fail when using classical cryptographic hash functions. For multimedia content we are interested not in mathematical equality but in perceptual equality, since the content often undergoes various quality-preserving manipulations during distribution, including compression, enhancement, geometrical distortions and analog-to-digital conversion. Cryptographic hash functions cannot identify a processed image as the original one. This problem could be mitigated if cryptographic hash functions had a continuous behavior, i.e. if perceptually similar content at least resulted in mathematically similar hash values. However, cryptographic hash functions typically have the opposite property, in the sense that they are bit sensitive: a single bit of difference in the content will result in a completely different hash value.

From these observations we conclude that for multimedia identification we need multimedia fingerprinting (perceptual hashing) functions, that is, functions that (i) map large multimedia objects to a small number of bits, and (ii) map perceptually similar objects to (mathematically) similar fingerprint values [24]. More precisely, for a properly designed fingerprint function $F$, there should be a threshold $T$ such that $|F(A) - F(B)| < T$ if multimedia objects $A$ and $B$ are similar and, with high probability, $|F(A) - F(B)| > T$ if they are dissimilar [19]. An observant reader might wonder why, instead of (ii), we do not require that perceptually similar objects have mathematically equal fingerprint values. The question is valid, but the answer is that such a modeling of perceptual similarity is not possible for reasons of transitivity. To be more precise, it is a known fact that perceptual similarity is not transitive. Perceptual similarity of a pair of objects $A$ and $B$ and of another pair of objects $B$ and $C$ does not necessarily imply the perceptual similarity of objects $A$ and $C$. However, modeling perceptual similarity by equality of fingerprinting functions would lead to a transitive relationship.
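The transitivity argument can be made concrete with a toy example. The sketch below assumes scalar fingerprint values and a threshold of 1.0, both chosen purely for illustration; real fingerprints are bit strings compared by Hamming distance.

```python
def similar(fa: float, fb: float, threshold: float) -> bool:
    """Threshold test |F(A) - F(B)| < T on hypothetical scalar fingerprints."""
    return abs(fa - fb) < threshold


T = 1.0                         # hypothetical threshold
F_A, F_B, F_C = 0.0, 0.8, 1.6   # hypothetical fingerprint values of A, B and C

print(similar(F_A, F_B, T))  # True: A and B are declared similar
print(similar(F_B, F_C, T))  # True: B and C are declared similar
print(similar(F_A, F_C, T))  # False: A and C are not, so the relation is not transitive
```

A threshold test can therefore represent a non-transitive similarity relation, whereas requiring equal fingerprint values would force $A$ and $C$ to match whenever both match $B$.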

2.2. Multimedia fingerprinting concept

Content recognition by comparing features can be categorized into two main classes: methods based on semantic features and methods based on non-semantic features [32]. The former class consists of extracting and representing content by semantically meaningful features. Examples of such features are scene boundaries and color histograms. The latter class uses features that have no direct semantic interpretation but are nonetheless robust with respect to content-quality-preserving transformations. Both kinds of features can be used to establish perceptual equality of an image. However, it should be noted that a feature extraction method for fingerprinting must be quite different from the methods used for retrieval. In retrieval, the features must facilitate searching for images that somehow look similar to the query, or that contain similar objects as the query. In fingerprinting, the requirement is to identify images that are perceptually the same, except for quality differences or the effects of other signal processing. Therefore, the features for fingerprinting need to be far more discriminatory, but they do not necessarily need to be semantic [32]. Moreover, there is no effective method yet to automatically generate good semantic features for an image [21]. For these reasons, non-semantic robust features are widely used in building fingerprints.

Constructing a fingerprinting function is not a trivial task. Given the lack of a proper audio–visual perception model (or perceptual metric) and of an exact discrimination criterion, it is an ill-posed problem to start with. For example, humans do not readily perceive a perceptual difference between an original image and a moderately resized or compressed version. In general, the fingerprinting function needs to have the following properties.

* Robustness (Invariance under perceptual similarity): The fingerprints of degraded versions of an image should be the same as, or at least similar to, the fingerprint of the original image. Robustness refers to the ability to positively identify two perceptually similar objects as similar.

* Pairwise independence (Collision free): If two images are perceptually different, the fingerprints of the two images should be considerably different. For pairwise independence, it is desirable that the fingerprint bits have a uniform distribution and are uncorrelated with each other.

* Database search efficiency: For the commercial applications mentioned in Section 1, fast database search is essential. With a naive approach, every identification of an image in a large database (for example, millions of images) would require millions of comparisons. For any reasonable bit-size of the fingerprints, this would mean impractical access times. Without a proper structure in the fingerprint database, searching and retrieving will easily become impractical [8,10]. However, if the space of fingerprint values has some kind of perceptual ordering structure, search complexity can be considerably reduced [24].
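To see why the naive approach scales poorly, consider a minimal sketch of a linear scan that compares the query fingerprint with every database entry by bit error rate. The 16-bit fingerprints, the threshold of 0.3 and the function names are assumptions made for illustration only.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints stored as integers."""
    return bin(a ^ b).count("1")


def linear_search(query, database, n_bits, threshold):
    """Naive scan: one comparison per database entry, keeping the best match."""
    best = None
    comparisons = 0
    for index, fingerprint in enumerate(database):
        comparisons += 1
        ber = hamming(query, fingerprint) / n_bits
        if ber < threshold and (best is None or ber < best[1]):
            best = (index, ber)
    return best, comparisons


# Hypothetical 16-bit fingerprints; a real database would hold millions of entries.
database = [0b1111000011110000, 0b0000111100001111, 0b1010101010101010]
query = database[1] ^ 0b11  # entry 1 with two bit errors

best, comparisons = linear_search(query, database, n_bits=16, threshold=0.3)
print(best, comparisons)  # (1, 0.125) 3
```

The number of comparisons equals the database size, which is the cost that a perceptual ordering structure in the fingerprint space is meant to avoid.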

3. Proposed image fingerprinting method

The fingerprinting function should be based on robust and perceptually relevant features to meet the requirements in Section 2.2. Our prime objective is to achieve high robustness against affine transformations without losing other desirable properties. Resilience to affine transformations has been one of the main issues in many image processing research areas, such as pattern recognition [31,45] and watermarking [6,26,34,40]. In order to obtain affine resilience, affine invariant features based on the Radon transform are selected for the proposed image fingerprints. The Radon transform has proved to be robust against image processing such as sharpening, blurring, adding noise and compression, and it has some desirable properties with regard to affine transformations. Details of the proposed method are given in the following subsections.

3.1. Radon transform and its properties

The Radon transform represents an image as a collection of projections along various directions. It is widely used in areas ranging from seismology to computer vision. The Radon transform of an image $f(x, y)$, denoted as $g(s, \theta)$, is defined as its line integral along a line inclined at an angle $\theta$ from the y-axis and at a distance $s$ from the origin [33], as shown in Fig. 1. Mathematically, it is written as

$$g(s, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \delta(x \cos\theta + y \sin\theta - s)\, dx\, dy, \tag{1}$$

where $-\infty < s < \infty$, $0 \le \theta < \pi$. The Radon transform $g(s, \theta)$ is the one-dimensional projection of $f(x, y)$ at an angle $\theta$. The Radon transform has the following useful properties for the affine transformations of an image.

(P1) Translation of an image by $(x_0, y_0)$ causes the Radon transform to be translated in the direction of $s$, i.e.,

$$f(x - x_0, y - y_0) \leftrightarrow g(s - x_0 \cos\theta - y_0 \sin\theta, \theta).$$

(P2) Scaling (retaining aspect ratio) of an image by a factor $\rho$ ($\rho > 0$) causes the Radon transform to be scaled through the same factor, i.e.,

$$f(\rho x, \rho y) \leftrightarrow \frac{1}{|\rho|}\, g(\rho s, \theta).$$

(P3) Rotation of an image by an angle $\theta_r$ causes the Radon transform to be shifted by the same amount, i.e.,

$$f(x \cos\theta_r - y \sin\theta_r,\; x \sin\theta_r + y \cos\theta_r) \leftrightarrow g(s, \theta - \theta_r).$$

Fig. 1. Projection integral in the direction $\theta$.

Fig. 2. The Radon transform of: (a) original, (b) translated (64 pixels in x direction), (c) scaled (scaling factor 0.75), (d) rotated (25°); Lena image (size 512 × 512).

Fig. 2 shows the Radon transform of the Lena image. Fig. 2a–d shows the Radon transform of the original, translated (64 pixels in the x direction), scaled (scaling factor 0.75) and rotated (25°) Lena image, respectively. Fig. 2b–d shows that $g(s, \theta)$ is translated by $64 \cos\theta$ pixels in the $s$ direction, scaled by a factor of 0.75 in the $s$ direction and cyclically shifted by 25° in the $\theta$ direction, respectively, with respect to the Radon transform of the original Lena image.
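Properties (P1) and (P3) can be checked numerically on a toy image. The sketch below assumes an image made of a few unit point masses, for which the Radon transform at angle $\theta$ reduces to a set of impulses located at $s = x\cos\theta + y\sin\theta$; the point coordinates and helper names are hypothetical. Rotation is applied here to the point positions themselves, which corresponds to rotating the image content by $\theta_r$.

```python
import math


def project(points, theta):
    """Sorted s-coordinates of unit point masses projected at angle theta."""
    return sorted(x * math.cos(theta) + y * math.sin(theta) for x, y in points)


def translate(points, x0, y0):
    """Move every point by (x0, y0)."""
    return [(x + x0, y + y0) for x, y in points]


def rotate(points, theta_r):
    """Rotate every point counter-clockwise by theta_r about the origin."""
    c, s = math.cos(theta_r), math.sin(theta_r)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def close(a, b):
    """Element-wise comparison of two lists with a small tolerance."""
    return len(a) == len(b) and all(
        math.isclose(u, v, abs_tol=1e-9) for u, v in zip(a, b)
    )


points = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (2.0, 2.0)]  # hypothetical toy image
theta = math.radians(40)
x0, y0 = 64.0, 0.0
theta_r = math.radians(25)

# (P1): translation shifts each projection by x0*cos(theta) + y0*sin(theta).
shift = x0 * math.cos(theta) + y0 * math.sin(theta)
p1_holds = close(project(translate(points, x0, y0), theta),
                 [s + shift for s in project(points, theta)])

# (P3): the projection of the rotated content at theta equals the projection
# of the original content at theta - theta_r.
p3_holds = close(project(rotate(points, theta_r), theta),
                 project(points, theta - theta_r))

print(p1_holds, p3_holds)  # True True
```

With $y_0 = 0$ and $x_0 = 64$, the shift equals $64\cos\theta$, which is the translation described for Fig. 2b.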

3.2. Affine invariant feature extraction

We consider here angle-preserving affine transformations: translation, scaling (retaining aspect ratio) and rotation. By using the above properties of the Radon transform, affine-invariant features are obtained. From (P1), the translation of an image causes translation in the Radon domain, but the amount of translation in each projection is different. For translation invariance, the normalized auto-correlation of each radial projection is calculated as follows:

$$c(\lambda, \theta) = \frac{\int_{-\infty}^{\infty} g(s, \theta)\, g(s - \lambda, \theta)\, ds}{\int_{-\infty}^{\infty} g(s, \theta)\, g(s, \theta)\, ds}. \tag{2}$$
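A discrete analogue of Eq. (2) shows why the auto-correlation removes the translation. The sketch below assumes a single projection sampled on a finite grid and zero outside it; the sample values are hypothetical.

```python
def normalized_autocorrelation(g):
    """Discrete version of Eq. (2) for non-negative lags, with g zero outside its support."""
    n = len(g)
    energy = sum(v * v for v in g)
    return [
        sum(g[s] * g[s - lag] for s in range(lag, n)) / energy
        for lag in range(n)
    ]


# Hypothetical projection and the same projection translated by two samples.
projection = [0, 0, 1, 3, 2, 0, 0, 0]
translated = [0, 0, 0, 0, 1, 3, 2, 0]

c_original = normalized_autocorrelation(projection)
c_translated = normalized_autocorrelation(translated)

print(c_original[:3])
print(c_original == c_translated)  # True: the translation has no effect
```

The normalization by the projection energy makes the value at zero lag equal to 1 regardless of the overall amplitude of the projection.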

Fig. 3. Overview of affine transformation resilient fingerprint extraction.

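By taking the auto-correlation, we get a translation-invariant signal $c(\lambda, \theta)$. Among the affine transformations, scaling and rotation remain in $c(\lambda, \theta)$. Consider the auto-correlation $c(\lambda, \theta)$ of the Radon transform of an original image. From (P2) and (P3), the auto-correlation of the Radon transform of a scaled and rotated image is given as $c'(\lambda, \theta) = c(\rho\lambda, \theta - \theta_r)$, where $\rho$ ($\rho > 0$) and $\theta_r$ are the amounts of scaling and rotation, respectively. To achieve invariance to scaling and rotation, the log mapping and the 2D Fourier transform are used. The log mapping translates the scaling of the signal into a shift. The subsequent Fourier transform translates this shift into a phase change. By the log mapping $\lambda = e^{\mu}$, the signal $c'(\lambda, \theta)$ can be written as

$$c'(\lambda, \theta) = c(\rho\lambda, \theta - \theta_r) = c(\exp[\mu + \log\rho], \theta - \theta_r). \tag{3}$$

Then the log-mapped signal $\tilde{c}'(\mu, \theta)$ is given by

$$\tilde{c}'(\mu, \theta) = \tilde{c}(\mu + \log\rho, \theta - \theta_r). \tag{4}$$

The 2D Fourier transform of the log-mapped signal is written as

$$C'(\zeta_\lambda, \zeta_\theta) = \int_{0}^{\pi} \int_{-\infty}^{\infty} \tilde{c}'(\mu, \theta) \exp[-j\mu\zeta_\lambda - j\theta\zeta_\theta]\, d\mu\, d\theta = \exp[j\zeta_\lambda \log\rho - j\zeta_\theta\theta_r]\, C(\zeta_\lambda, \zeta_\theta). \tag{5}$$

Then the magnitude $|C'(\zeta_\lambda, \zeta_\theta)|$ and phase $\phi'(\zeta_\lambda, \zeta_\theta)$ of the complex signal $C'(\zeta_\lambda, \zeta_\theta)$ are given by

$$|C'(\zeta_\lambda, \zeta_\theta)| = |C(\zeta_\lambda, \zeta_\theta)|, \tag{6}$$

$$\phi'(\zeta_\lambda, \zeta_\theta) = \zeta_\lambda \log\rho - \zeta_\theta\theta_r + \angle C(\zeta_\lambda, \zeta_\theta) = \zeta_\lambda \log\rho - \zeta_\theta\theta_r + \phi(\zeta_\lambda, \zeta_\theta), \tag{7}$$

where $\angle C(\zeta_\lambda, \zeta_\theta)$ is the phase of the complex signal $C(\zeta_\lambda, \zeta_\theta)$.

As shown above, the log mapping translates scaling into a shift, and the subsequent Fourier transform translates the shift into a phase change. By using these properties, we find features that are invariant to scaling and rotation. From Eq. (6), $|C'(\zeta_\lambda, \zeta_\theta)|$ is affine invariant. Since $\zeta_\lambda \log\rho - \zeta_\theta\theta_r$ in Eq. (7) is a linear function of $\zeta_\lambda$ and $\zeta_\theta$, the second derivative of $\phi'(\zeta_\lambda, \zeta_\theta)$ with respect to $\zeta_\lambda$ or $\zeta_\theta$ is also affine invariant.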
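The two steps behind Eqs. (3)–(6) can be checked in one dimension. The sketch below assumes a hypothetical smooth profile in place of $c(\lambda, \theta)$ at a fixed angle, a uniform sampling grid in $\mu$, and a scale factor chosen so that $\log\rho$ is a whole number of samples. The second check uses a cyclic shift, which is an idealization: for sampled data of finite extent the shift along $\mu$ is not exactly cyclic.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)))
        for u in range(n)
    ]


def c(lam):
    """Hypothetical smooth profile standing in for c(lambda, theta) at one angle."""
    return math.exp(-lam) * (1.0 + 0.5 * math.cos(3.0 * lam))


step = 0.1                 # sampling interval along mu (assumed)
k = 4                      # log(rho) = k * step, so rho = exp(0.4)
rho = math.exp(k * step)
mu = [i * step for i in range(-20, 21)]

log_mapped = [c(math.exp(m)) for m in mu]               # samples of c~(mu)
log_mapped_scaled = [c(rho * math.exp(m)) for m in mu]  # samples of c~(mu + log rho)

# Scaling by rho appears as a shift of k samples on the logarithmic grid.
shift_ok = all(
    math.isclose(log_mapped_scaled[i], log_mapped[i + k], rel_tol=1e-9, abs_tol=1e-12)
    for i in range(len(mu) - k)
)

# A cyclic shift changes only the phase of the DFT, not its magnitude.
cyclic = log_mapped[k:] + log_mapped[:k]
magnitude_ok = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(dft_magnitudes(log_mapped), dft_magnitudes(cyclic))
)

print(shift_ok, magnitude_ok)  # True True
```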

3.3. Fingerprint bit extraction

An overview of the proposed fingerprinting method is shown in Fig. 3. To obtain affine resilience, the affine invariant features described in Section 3.2 are used for fingerprint bit extraction. In practice, the energy compaction property of the DFT (Discrete Fourier Transform) ensures that most of the energy of the image is concentrated at the low-frequency coefficients of $C'(\zeta_\lambda, \zeta_\theta)$. We note that a large majority of the Fourier coefficients have relatively small values and therefore contribute little to the total energy of the signal. Based on the energy compaction property, we take the 21 × 21 low-frequency coefficients of $C'(\zeta_\lambda, \zeta_\theta)$ for the filtering, as shown in Fig. 4. In general, the low-frequency coefficients are more robust, and the high-frequency coefficients are more discriminatory. Therefore we should make a trade-off between robust and discriminatory features in choosing the coefficients. Fingerprints from the 21 × 21 low-frequency coefficients showed the best balance between the two. Experimentally it was verified that the sign of the difference between affine invariant features is very robust against many kinds of processing and reduces the correlation between fingerprint bits. The coefficients of $\log|C'(\zeta_\lambda, \zeta_\theta)|$ and $\phi'(\zeta_\lambda, \zeta_\theta)$ are filtered by a simple 2D filter $F_{\zeta_\lambda, \zeta_\theta}$ (along both the $\zeta_\lambda$ and $\zeta_\theta$ axes), of which the kernel equals

$$F_{\zeta_\lambda, \zeta_\theta} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{8}$$

The output of the filter $F_{\zeta_\lambda, \zeta_\theta}$ is invariant to affine transformations, as we have seen in Section 3.2. If a keyed fingerprinting function is required, we can interleave $|C'(\zeta_\lambda, \zeta_\theta)|$ and $\phi'(\zeta_\lambda, \zeta_\theta)$ before filtering, as shown in Fig. 3. The details of the keyed fingerprinting function are described in Section 3.5. Finally the 20 × 20 filter output (only those parts of the filter output that are computed without zero padding) is converted to bits by taking the sign of the resulting value (thresholding). This yields two intermediate fingerprints, one from the magnitude $|C'(\zeta_\lambda, \zeta_\theta)|$ and one from the phase $\phi'(\zeta_\lambda, \zeta_\theta)$. The fingerprint bits are determined by taking the exclusive-or (XOR) of the bits from the magnitude and the bits from the phase. It was experimentally verified that mixing the magnitude and the phase information by XOR improves the pairwise independence (collision-free property). Finally we obtain 20 × 20 fingerprint bits (400 bits per image). We refer to the 20 × 20 fingerprint bits as a fingerprint block and to the 20 bits in each row as a sub-fingerprint. This ordering structure can reduce the database search complexity considerably [19]. The experimental database search result is given in Section 4.2.

Fig. 4. Fingerprint bit extraction.

Fig. 5. (a) Fingerprint of original Lena image, (b) fingerprint of compressed Lena image, (c) the difference between (a) and (b) showing bit errors in black (BER = 0.05).

Fig. 5 shows an example of a fingerprint block (20 subsequent 20-bit sub-fingerprints) extracted with the proposed method from the Lena image. A '1' bit corresponds to a white pixel and a '0' bit to a black pixel. Fig. 5a and b shows a fingerprint block from an original image and from a JPEG compressed (quality factor 10%) version of it, respectively. Ideally these two fingerprints should be identical, but due to the compression some of the bits are erroneous. These bit errors, which are used as the similarity measure, are shown in black in Fig. 5c.
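The filtering, thresholding and XOR steps can be sketched as follows. The example uses pseudo-random arrays in place of the 21 × 21 coefficients of $\log|C'|$ and $\phi'$, treats the array indices as the frequency indices, maps a positive filter output to bit 1, and assumes an unwrapped phase; all of these are assumptions made for illustration, as are the function names.

```python
import math
import random


def filter_2x2(a):
    """Apply the kernel [[-1, 1], [1, -1]] at every position where it fits entirely."""
    rows, cols = len(a), len(a[0])
    return [
        [-a[i][j] + a[i][j + 1] + a[i + 1][j] - a[i + 1][j + 1] for j in range(cols - 1)]
        for i in range(rows - 1)
    ]


def sign_bits(filtered):
    """Threshold the filter output: 1 for positive values, 0 otherwise (assumed convention)."""
    return [[1 if v > 0 else 0 for v in row] for row in filtered]


def xor_blocks(a, b):
    """Bitwise XOR of two equally sized bit blocks."""
    return [[p ^ q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


rng = random.Random(0)
size = 21
# Pseudo-random stand-ins for the low-frequency coefficients.
log_magnitude = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
phase = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]

fingerprint = xor_blocks(sign_bits(filter_2x2(log_magnitude)),
                         sign_bits(filter_2x2(phase)))
print(len(fingerprint), len(fingerprint[0]))  # 20 20

# Adding the linear phase term of Eq. (7) leaves the filter output unchanged.
log_rho, theta_r = math.log(0.75), math.radians(25)
phase_transformed = [
    [phase[i][j] + i * log_rho - j * theta_r for j in range(size)]
    for i in range(size)
]
residual = max(
    abs(u - v)
    for row_u, row_v in zip(filter_2x2(phase), filter_2x2(phase_transformed))
    for u, v in zip(row_u, row_v)
)
print(residual < 1e-9)  # True
```

The kernel takes a difference along both axes, so any term that is linear in the two frequency indices cancels, which is why the scaling and rotation terms in the phase do not affect the filter output.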

3.4. Fingerprint matching

For the fingerprint matching, two images are declared similar if the Hamming distance (bit error rate) between their fingerprints is below a certain threshold $T$. The problem can be formulated as the following hypothesis test using the fingerprinting function $H(\cdot)$:

* $L_0$: Two images $I$ and $I'$ are from the same image if the Hamming distance $D(H(I), H(I'))$ is below a threshold $T$.

* $L_1$: Two images $I$ and $I'$ are from different images if the Hamming distance $D(H(I), H(I'))$ is above a threshold $T$.

For the selection of the threshold $T$, the false alarm rate $P_{FA}$ and the false rejection rate $P_{FR}$ should be considered. The false alarm rate $P_{FA}$ is the probability of declaring different images similar. The false rejection rate $P_{FR}$ is the probability of declaring images derived from the same image dissimilar. In practice, $P_{FR}$ is difficult to analyze since there are plenty of image processing steps of which the exact characteristics are not known. Thus it is common to use $P_{FA}$ for the selection of the threshold $T$.

In order to analyze the choice of the threshold $T$, we assume that the fingerprint extraction process yields random independent and identically distributed (i.i.d.) bits. Then the number of bit errors between the fingerprints from different images will have a binomial distribution $B(n, p)$, where $n$ equals the number of bits extracted and $p$ is the probability that a '0' or '1' bit is extracted. Since $n$ (= 400) is sufficiently large, the binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$. From that, the bit error rate (BER) has a normal distribution with mean $\mu = p$ and standard deviation $\sigma = \sqrt{p(1-p)/n}$. For the ideal case, $p = 0.5$ and thus $\mu = 0.5$ and $\sigma = 0.025$. Through the normal approximation $N(\mu, \sigma)$, the false alarm rate $P_{FA}$ for the BER is given as follows:

$$P_{FA} = \int_{-\infty}^{T} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\mu - T}{\sqrt{2}\,\sigma}\right). \tag{9}$$

For a certain value of $P_{FA}$, the threshold $T$ for the BER can be determined. In the experiments we use $T = 0.3$. Then we arrive at a very low false alarm probability of $\mathrm{erfc}(5.66)/2 = 6.2 \times 10^{-16}$. It means that out of 400 bits there must be fewer than 120 bits in error in order to decide that the fingerprint blocks originate from the same image. It was experimentally verified in Section 4.1 that the fingerprints extracted using the proposed method follow the random i.i.d. case fairly well.

Fig. 6. Nested structure for keyed fingerprinting.
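The numbers quoted for Eq. (9) can be reproduced with the Python standard library. The sketch below evaluates the normal approximation for $n = 400$, $p = 0.5$ and $T = 0.3$, and, as an additional check that is not part of the paper, also sums the exact binomial tail for fewer than 120 bit errors; the function names are hypothetical.

```python
import math


def ber_sigma(n, p=0.5):
    """Standard deviation of the bit error rate for n i.i.d. bits."""
    return math.sqrt(p * (1.0 - p) / n)


def false_alarm_rate(threshold, n, p=0.5):
    """Normal approximation of Eq. (9): 0.5 * erfc((mu - T) / (sqrt(2) * sigma))."""
    mu, sigma = p, ber_sigma(n, p)
    return 0.5 * math.erfc((mu - threshold) / (math.sqrt(2.0) * sigma))


def exact_binomial_tail(n, max_errors):
    """Exact probability of at most max_errors bit errors for fair i.i.d. bits."""
    return sum(math.comb(n, k) for k in range(max_errors + 1)) / 2 ** n


n_bits, T = 400, 0.3
sigma = ber_sigma(n_bits)
p_fa = false_alarm_rate(T, n_bits)
exact = exact_binomial_tail(n_bits, 119)  # fewer than 120 of 400 bits in error

print(sigma)   # close to 0.025
print(p_fa)    # about 6.2e-16, the value quoted for T = 0.3
print(exact)   # exact tail, for comparison with the normal approximation
```

The argument of the complementary error function is $(0.5 - 0.3)/(\sqrt{2} \times 0.025) \approx 5.66$, matching the value used in the text.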

3.5. Keyed fingerprinting function

In some applications (for example, image authentication), the security of the fingerprint extraction algorithm is an issue. More precisely, it is sometimes required that the fingerprint function depends on a key $K$. For two different keys $K_1$ and $K_2$, the fingerprinting function $H$ should have the property that $H_{K_1}(X) \neq H_{K_2}(X)$ for any image $X$. Some general guidelines for keyed cryptographic hash functions are given in [5]. First, the function should use the secret key and every bit of the object iteratively in the hashing process. Second, it should distribute the hash bits uniformly to thwart statistical attacks. These guidelines should also be considered in constructing a keyed fingerprinting function. We construct a keyed fingerprinting function by using the interleaving shown in Fig. 3. The coefficients of $C'(z_l, z_y)$ are randomly interleaved in either the $z_l$ or the $z_y$ direction according to a permutation table, which constitutes the key information. The construction of the permutation table as a function of the key can be found in [38]. After interleaving, the same fingerprint extraction method as in Section 3.3 is used. Interleaving does not have any effect on the affine invariance of the output of the filter $F_{z_l, z_y}$. To get more security, the nested structure shown in Fig. 6 can be used. The fundamental building block of the nested structure, called a round, consists of the fingerprint extraction function and the XOR operation. In each round, fingerprint bits are extracted from the image with a different key. Then the fingerprint bits of the current round are XORed with the fingerprint bits of the previous rounds. Mathematically, this is given as follows:

$$B_p = B_{p-1} \oplus H_{K_p}(X), \qquad (10)$$

where $X$ is the input image, $\oplus$ is the XOR operation, $B_p$ is the $p$th-round fingerprint, $K_p$ is the $p$th-round permutation table and $p = 1, \ldots, P$ ($P$ is a constant positive integer). The process is continued until the $P$th round, and the final fingerprint bits are $B_P$.
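The round structure of Eq. (10) can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: `toy_keyed_fingerprint` is a hypothetical stand-in for $H_K(X)$ (a key-seeded shuffle followed by the sign of neighbouring differences), not the extraction of Section 3.3, and the initial value $B_0$ is taken to be all zeros because the text does not specify it.

```python
import random


def toy_keyed_fingerprint(features, key):
    """Hypothetical stand-in for H_K(X); NOT the paper's extraction method.

    The key seeds a permutation of the feature values, and each bit is the
    sign of the difference between neighbouring permuted values.
    """
    rng = random.Random(key)
    permuted = list(features)
    rng.shuffle(permuted)  # key-dependent interleaving
    return [1 if a - b > 0 else 0 for a, b in zip(permuted, permuted[1:])]


def nested_fingerprint(features, round_keys):
    """Apply Eq. (10): B_p = B_{p-1} XOR H_{K_p}(X) for p = 1, ..., P."""
    bits = [0] * (len(features) - 1)  # B_0 assumed to be all zeros
    for key in round_keys:
        round_bits = toy_keyed_fingerprint(features, key)
        bits = [b ^ h for b, h in zip(bits, round_bits)]
    return bits


# 401 pseudo-random feature values yield a 400-bit toy fingerprint.
feature_rng = random.Random(0)
features = [feature_rng.random() for _ in range(401)]

fp_three_rounds = nested_fingerprint(features, [11, 22, 33])
fp_one_round = nested_fingerprint(features, [11])
print(len(fp_three_rounds), sum(fp_three_rounds))
```

With this choice of $B_0$ a single round reduces to $H_{K_1}(X)$, and repeating the same key in two rounds cancels out, which is one reason each round should use a different key.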

4. Experimental results

To evaluate the proposed method, we tested it on a thousand images that include indoor and outdoor scenes, people, vehicles, sporting events and paintings. Some of the images were taken with a digital camera, and others were gathered from the Internet. The sizes of the images range from $133 \times 209$ to $2560 \times 1920$. Prior to applying the proposed method, we normalize the image by taking its luminance component and resizing it uniformly to either $512 \times M$ or $M \times 512$, where $M$ is smaller than 512. The resized image is filtered by a median filter, which is somewhat effective in correcting small geometric manipulations such as bending, distortion and stretching. The preprocessed image is projected onto $N$ (typically, $N = 512$) radial directions using the Radon transform. The other steps are the same as in Fig. 3.
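As a small illustration of the uniform resizing step, the target dimensions can be computed as follows. The helper name `normalized_size` and the rounding rule for $M$ are assumptions; the paper does not state how $M$ is rounded or which dimension is listed first.

```python
def normalized_size(width: int, height: int, long_side: int = 512):
    """Scale (width, height) uniformly so that the longer side equals long_side.

    The shorter side M is rounded to the nearest integer (an assumption).
    """
    longest = max(width, height)
    new_width = max(1, round(width * long_side / longest))
    new_height = max(1, round(height * long_side / longest))
    return new_width, new_height


# The two extreme image sizes mentioned in the text.
largest = normalized_size(2560, 1920)
smallest = normalized_size(133, 209)
print(largest, smallest)
```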

4.1. Pairwise independence and key dependence

To test pairwise independence, we extracted fingerprints from 1000 images. Thereafter the BER between all possible pairs (499,500) of the fingerprints was calculated. Fig. 7a shows the histogram of the measured BER. All the measured BER values were in the range between 0.37 and 0.62. The histogram shows that the proposed fingerprints follow the ideal random i.i.d. case in Section 3.4 fairly well. The mean of the measured BER was 0.4930, which is close to 0.5, and its standard deviation was 0.0272, which is also close to 0.025. This shows that the proposed method is approximately pairwise independent.

Fig. 7. (a) Histogram of measured BER between the fingerprints from different images, (b) histogram of measured BER between fingerprints of Lena generated with different keys. The dotted line represents the ideal random i.i.d. case $N(0.5, 0.025)$.

To show the validity of the interleaving as a key, we generated 1000 fingerprints from the Lena image using different interleavings. As in the above analysis, the BER between all possible pairs of the fingerprints was calculated. The histogram of the measured BER is shown in Fig. 7b. The mean and standard deviation of the measured BER were 0.5000 and 0.0263, respectively. This result clearly shows that the fingerprint depends strongly on the key information (interleaving). By combining the nested structure in Fig. 6 with interleaving as in Section 3.5, the proposed method can provide an efficient keying scheme. In terms of security, such a strong dependency on the key is significant. Once a key is broken, the user can simply change it, like a password [43], without modifying the overall system.
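The ideal reference against which these measurements are compared can be reproduced numerically. The following sketch draws random 400-bit fingerprints, which is only the i.i.d. model of Section 3.4 and not the paper's image data, and computes the BER over all pairs. The sample size of 300 fingerprints and the seed are arbitrary choices to keep the run short.

```python
import itertools
import random
import statistics

N_BITS = 400  # bits per fingerprint block, as in the paper


def bit_error_rate(fp_a: int, fp_b: int, n_bits: int = N_BITS) -> float:
    """Fraction of differing bits between two fingerprints stored as integers."""
    return bin(fp_a ^ fp_b).count("1") / n_bits


# Ideal i.i.d. fingerprints: every bit is an independent fair coin flip.
rng = random.Random(2004)
fingerprints = [rng.getrandbits(N_BITS) for _ in range(300)]

# BER over all unordered pairs, mirroring the 499,500 pairs of the experiment.
bers = [bit_error_rate(a, b)
        for a, b in itertools.combinations(fingerprints, 2)]

mean_ber = statistics.fmean(bers)
std_ber = statistics.pstdev(bers)
print(f"pairs = {len(bers)}, mean = {mean_ber:.4f}, std = {std_ber:.4f}")
```

The simulated mean and standard deviation should land near 0.5 and 0.025, the values that the measured 0.4930 / 0.0272 and 0.5000 / 0.0263 are compared with.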

4.2. Robustness and database search efficiency

To test the robustness of the proposed method, the original images were subjected to various image processing steps (see [14] for a detailed description of the processing steps) and their respective fingerprint blocks were extracted. The mean, standard deviation and false rejection rate of the BER between the original and the processed image fingerprints are shown in Table 1 for 1000 images. Figs. 8 and 9 show histograms of the measured BER between the original and the processed image fingerprints for the geometric and the non-geometric image processing steps, respectively.

Table 1
Mean, standard deviation (Std) and false rejection rate $P_{FR}$ (with threshold $T = 0.3$) of the measured BER for different kinds of signal degradations for 1000 test images

| Processing | Mean | Std | $P_{FR}$ |
|---|---|---|---|
| JPEG ($Q = 10\%$) | 0.0487 | 0.0350 | 0.002 |
| Gaussian filtering | 0.0206 | 0.0175 | 0 |
| Sharpening filtering | 0.0458 | 0.0317 | 0 |
| Median filtering ($4 \times 4$) | 0.0519 | 0.0277 | 0 |
| Rotation (worst case 45.176°) | 0.1740 | 0.0275 | 0.001 |
| Rotation (90°) | 0.0806 | 0.0185 | 0 |
| Scaling ($r = 0.5$) | 0.0202 | 0.0184 | 0 |
| Scaling ($r = 0.15$) | 0.1146 | 0.0690 | 0.023 |
| Cropping (2%) | 0.1483 | 0.0524 | 0.007 |
| 17 columns and 5 rows removed | 0.1671 | 0.0670 | 0.040 |
| Random bending attack | 0.1928 | 0.0629 | 0.049 |


Fig. 8. Histogram of measured BER for: (a) rotation (45.176°), (b) scaling ($r = 0.15$), (c) cropping (2%), (d) random bending attack.

Fig. 9. Histogram of measured BER for: (a) Gaussian filtering, (b) median filtering, (c) JPEG compression ($Q = 10\%$), (d) 17 columns and 5 rows removed.


Table 2
Mean of the number of hits in the database for different kinds of signal degradations for 1000 test images

| Processing | 1 candidate case ($d = 0$) | 1024 candidates case ($d = 10$) |
|---|---|---|
| JPEG ($Q = 10\%$) | 9.702 (5) | 17.715 (1) |
| Gaussian filtering | 14.018 (1) | 19.353 (0) |
| Sharpening filtering | 9.861 (4) | 17.921 (0) |
| Median filtering ($4 \times 4$) | 8.661 (2) | 17.631 (0) |
| Rotation (worst case 45.176°) | 1.490 (228) | 9.140 (1) |
| Rotation (90°) | 6.622 (0) | 15.077 (0) |
| Scaling ($r = 0.5$) | 14.271 (1) | 19.364 (0) |
| Scaling ($r = 0.15$) | 5.059 (54) | 13.356 (2) |
| Cropping (2%) | 3.123 (96) | 10.825 (0) |
| 17 columns and 5 rows removed | 1.684 (291) | 9.384 (12) |
| Random bending attack | 2.159 (196) | 8.463 (4) |

The number in parentheses refers to the number of images without hits.


The results indicate that the proposed method is highly robust to affine transformations that preserve the aspect ratio (all possible rotation angles and scaling factors $r$ larger than 0.15) and to other image processing steps including compression and various filtering operations.

It is important for any fingerprinting method that it not only results in a low BER but also allows efficient searching. For efficient searching we use an ordering structure consisting of the fingerprint block and the sub-fingerprint, as described in Section 3.3. Rather than searching the database with the full fingerprint block every time, sub-fingerprint matching using a lookup table is more efficient [19]. The simplest assumption is that at least one sub-fingerprint has an exact match in the database. Then the only fingerprints in the database that need to be checked are the ones where one of the 20 sub-fingerprints in the fingerprint block matches perfectly. However, for heavily degraded images this assumption might not always hold: all 20 sub-fingerprints might have several bit errors. In [19] a search algorithm is presented that exploits the fact that each of the 20 sub-fingerprints in a fingerprint block has a list of most probable candidates for being an original sub-fingerprint.

For the proposed method, the sub-fingerprints are obtained by comparing and thresholding the difference of affine invariant features. If the difference is very close to the threshold (in our case the threshold is zero), it is more likely that the bit was received incorrectly. If the difference is much larger than the threshold, the probability of an incorrect bit is low. From this soft-decoding information, a list of most probable candidates for being an original sub-fingerprint is determined. If we want to allow $d$ errors in each sub-fingerprint, the number of candidates will be $2^d$. In our experiments, a list of the 1024 most probable candidates ($d = 10$) was created for each sub-fingerprint. We then have 20 lists of 1024 candidates from the 20 sub-fingerprints in the fingerprint block. From these lists the fingerprint database can be searched very efficiently, and with high probability at least one of the 20 lists of the fingerprint block contains a corresponding original sub-fingerprint.

Fig. 10. Histogram of the number of candidate images which the database search reported for 1000 test images: (a) 1 candidate for each sub-fingerprint ($d = 0$ case), (b) 1024 candidates for each sub-fingerprint ($d = 10$ case).

There are two measures for assessing the fingerprint database search: recall and precision. Recall refers to the rate at which the database search result contains the corresponding fingerprint. Precision refers to the proportion of the database search result that is actually correct. To test the recall rate of the database search, we search the database with the processed images. Table 2 shows the number of lists that contain a corresponding original sub-fingerprint for each processing step. This number is referred to as the number of database hits. Table 2 shows that the number of database hits is sufficient for a successful database search for most of the image processing steps. We recall from [19] that only a single database hit is needed for a successful search. To test the precision of the database search, we search the database with the 1000 original test images that were used for building the database. In terms of precision, it is desirable that the database search reports only one candidate image. Fig. 10 shows the histogram of the number of candidate images which the database search reported. The average number of candidate images was 2.396 and 51.451 for the $d = 0$ and $d = 10$ cases, respectively. We note that the precision is the reciprocal of the number of candidate images. Table 2 and Fig. 10 show that the value of $d$ controls a trade-off between the recall and the precision of the database search.
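The candidate-list search described in this section can be sketched as follows. This is one plausible reading of the soft-decoding idea, assuming that the $2^d$ candidates are obtained by flipping every subset of the $d$ least reliable bits; the 8-bit sub-fingerprint, the function names, the sign convention (bit 1 for a positive difference) and the identifier `image_42` are all hypothetical and chosen only to keep the example small.

```python
def candidate_subfingerprints(soft_values, d):
    """Return 2**d candidate sub-fingerprints from soft-decoding information.

    soft_values are the feature differences before thresholding at zero;
    a small magnitude means the corresponding bit is unreliable.
    """
    hard_bits = [1 if v > 0 else 0 for v in soft_values]
    # Positions of the d differences closest to the threshold (zero).
    least_reliable = sorted(range(len(soft_values)),
                            key=lambda i: abs(soft_values[i]))[:d]
    candidates = []
    for mask in range(2 ** d):  # every subset of the unreliable positions
        bits = list(hard_bits)
        for j, position in enumerate(least_reliable):
            if (mask >> j) & 1:
                bits[position] ^= 1
        candidates.append(tuple(bits))
    return candidates


def search(lookup_table, soft_values, d):
    """Collect database entries whose sub-fingerprint matches any candidate."""
    hits = []
    for candidate in candidate_subfingerprints(soft_values, d):
        hits.extend(lookup_table.get(candidate, []))
    return hits


# Toy database: one stored sub-fingerprint pointing to a hypothetical image id.
lookup_table = {(1, 1, 1, 0, 1, 1, 0, 1): ["image_42"]}

# Query differences of a degraded copy; positions 4 and 1 are near zero.
query = [0.9, -0.05, 0.4, -0.7, 0.02, 0.6, -0.3, 0.8]

hits_d0 = search(lookup_table, query, d=0)  # exact match only
hits_d2 = search(lookup_table, query, d=2)  # 4 candidates
print(hits_d0, hits_d2)
```

In this toy case the hard decision misses the stored entry because one unreliable bit flipped, while the $d = 2$ list recovers it. A larger $d$ raises the chance of a hit (recall) at the cost of more candidate images to verify (precision), which is the trade-off reported in Table 2 and Fig. 10.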

5. Conclusion

For multimedia fingerprinting, extracting features that allow direct access to the relevant distinguishing information is crucial. For a good fingerprinting system, the features should be both fairly discriminative and robust. In this paper, we proposed a robust image fingerprinting method based on the affine invariant differential features of the Radon transform. Fingerprint bits are obtained using a nonlinear operation (taking the sign after thresholding) and random permutation of the affine invariant features. It was experimentally verified that the proposed image fingerprints satisfy the main requirements of fingerprints: robustness under quality-preserving signal processing steps, pairwise independence for different inputs and database search efficiency. A variant of the proposed fingerprint scheme has also been successfully used for extracting speed-change resilient audio fingerprints [39]. Future work includes the extension of the proposed method to video.

Acknowledgements

Jin S. Seo and Chang D. Yoo were supported by Grant No. R01-2003-000-10829-0 from the Basic Research Program of the Korea Science & Engineering Foundation.

References

[1] M. Abdel-Mottaleb, G. Vaithilingam, S. Krishnamachari, Signature-based image identification, in: Proceedings of the SPIE 3845, Multimedia Systems and Applications II, Boston, 1999.

[2] ACNielsen, http://www.acnielsen.com/.

[3] Arbitron, http://www.arbitron.com/.

[4] AudibleMagic, http://www.audiblemagic.com/.

[5] S. Bakhtiari, R. Safavi-Naini, J. Pieprzyk, Cryptographic hash functions: a survey, Department of Computer Science, University of Wollongong, Technical Report 95-09, July 1995.

[6] S. Baudry, P. Nguyen, H. Maitre, Optimal decoding for watermarks subject to geometrical attacks, Signal Processing: Image Communication 18 (4) (April 2003) 297–307.

[7] BDSOnline, http://www.bdsonline.com/.

[8] S. Berchtold, C. Böhm, H. Kriegel, The pyramid technique: towards breaking the curse of dimensionality, in: Proceedings of the ACM SIGMOD 98, Seattle, WA, June 1998.

[9] S. Bhattacharjee, M. Kutter, Compression tolerant image authentication, in: Proceedings of the IEEE ICIP 1998, Chicago, IL, October 1998.

[10] K. Chakrabarti, S. Mehrotra, High dimensional feature indexing using hybrid trees, in: Proceedings of the International Conference on Data Engineering, Sydney, Australia, March 1999.

[11] S.S. Cheung, A. Zakhor, Video similarity detection with video signature clustering, in: Proceedings of the IEEE ICIP 2001, Thessaloniki, Greece, 2001.

[12] J. Eggers, B. Girod, Blind watermarking applied to image authentication, in: Proceedings of the IEEE ICASSP 2001, Salt Lake City, UT, May 2001.

[13] M.D. Ellis, S.M. Dunn, M.W. Fellinger, F.B. Younglove, D.M. James, D.L. Clifton, R.S. Land, Method and apparatus for producing a signature characterizing an interval of a video signal while compensating for picture edge shift, US Patent 5572246, November 1995.

[14] F.A.P. Petitcolas, Watermarking schemes evaluation, IEEE Signal Process. Mag. 17 (5) (September 2000) 58–64.

[15] D. Fragoulis, G. Rousopoulos, T. Panagopoulos, C. Alexiou, C. Papaodysseus, On the automated recognition of seriously distorted musical recordings, IEEE Trans. Signal Process. 49 (4) (April 2001) 536–540.

[16] J. Fridrich, Robust bit extraction from images, in: Proceedings of the IEEE International Conference on Multimedia Computing and Systems (ICMCS), Florence, Italy, June 1999.

[17] G.L. Friedman, The trustworthy digital camera: restoring credibility to the photographic image, IEEE Trans. Consumer Electron. 39 (4) (November 1993) 905–910.

[18] Gracenote, http://www.gracenote.com/.

[19] J.A. Haitsma, T. Kalker, A highly robust audio fingerprinting system, in: Proceedings of the International Conference on Music Information Retrieval (ISMIR) 2002, Paris, October 2002.

[20] A. Hershtik, Monitoring system, Patent Application WO 070869, May 2000.

[21] J. Huang, S.R. Kumar, R. Zabih, An automatic hierarchical image classification scheme, in: Proceedings of the ACM Conference on Multimedia, Bristol, England, September 1998.

[22] IFPI, Call for information on fingerprinting technology, http://www.ifpi.org/.

[23] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.

[24] T. Kalker, J.A. Haitsma, J. Oostveen, Issues with digital watermarking and perceptual hashing, in: Proceedings of the SPIE 4518, Multimedia Systems and Applications IV, November 2001, 189–197.

[25] F. Lefebvre, B. Macq, J.-D. Legat, RASH: Radon soft hash algorithm, in: Proceedings of the European Signal Processing Conference 2002, Toulouse, France, September 2002.

[26] C.-Y. Lin, M. Wu, J.A. Bloom, M.L. Miller, I.J. Cox, Y.-M. Lui, Rotation, scale, and translation resilient public watermarking for images, IEEE Trans. Image Process. 10 (5) (May 2001) 767–782.

[27] A. Menezes, P. van Oorschot, S. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1997.

[28] M.K. Mihcak, R. Venkatesan, New iterative geometric methods for robust perceptual image hashing, in: Proceedings of the ACM Workshop on Security and Privacy in Digital Rights Management, Philadelphia, PA, November 2001.

[29] MusicBrainz, http://www.musicbrainz.org/.

[30] NISO, http://www.niso.org/.

[31] J.-R. Ohm, F. Bunjamin, W. Liebsch, B. Makai, K. Müller, A. Smolic, D. Zier, A set of visual feature descriptors and their combination in a low-level description scheme, Signal Processing: Image Communication 16 (1–2) (September 2000) 157–179.

[32] J. Oostveen, T. Kalker, J.A. Haitsma, Feature extraction and a database strategy for video fingerprinting, in: Proceedings of the International Conference on Visual Information Systems, HsinChu, Taiwan, March 2002.

[33] M.P. Queluz, Authentication of digital images and video: generic models and a new contribution, Signal Processing: Image Communication 16 (5) (January 2001) 461–475.

[34] J. Ruanaidh, T. Pun, Rotation, scale and translation invariant spread spectrum digital image watermarking, Signal Processing 66 (3) (May 1998).

[35] RAA, Recognition and analysis of audio, http://raa.joanneum.ac.at/.

[36] Relatable, http://www.relatable.com/.

[37] M. Schneider, S.-F. Chang, A robust content based digital signature for image authentication, in: Proceedings of the IEEE ICIP 96, Lausanne, Switzerland, October 1996.

[38] R. Sedgewick, Permutation generation methods, ACM Comput. Surv. 9 (2) (June 1977) 137–164.

[39] J.S. Seo, J.A. Haitsma, T. Kalker, Linear speed-change resilient audio fingerprinting, in: Proceedings of the IEEE Benelux Workshop on Model Based Processing and Coding of Audio (MPCA) 2002, Leuven, Belgium, November 2002.

[40] V. Solachidis, I. Pitas, Circularly symmetric watermark embedding in 2-D DFT domain, IEEE Trans. Image Process. 10 (11) (November 2001).

[41] Shazam, http://www.shazamentertainment.com/.

[42] TASI, http://www.tasi.ac.uk/advice/delivering/meta.html/.

[43] R. Venkatesan, S.-M. Koon, M.H. Jakubowski, P. Moulin, Robust image hashing, in: Proceedings of the IEEE ICIP 2000, Vancouver, Canada, September 2000.

[44] H. Wang, F. Guo, D. Feng, J. Jin, A signature for content-based image retrieval using a geometrical transform, in: Proceedings of the Multimedia and Security Workshop at ACM Multimedia, Bristol, UK, 1998, 229–234.

[45] J. Wood, Invariant pattern recognition: a review, Pattern Recognition 29 (1) (January 1996).

[46] M. Wu, B. Liu, Watermarking for image authentication, in: Proceedings of the IEEE ICIP 1998, Chicago, IL, October 1998.