Towards the Semantic Extraction of Digital Signatures for Librarian Image-Identification Purposes

Marios Poulos, George Bokos, and Fotios Vaioulis
Department of Archives and Library Science, Ionian University, Plateia Eleftherias, Palaia Anaktora, Corfu, 49100, Greece. E-mail: {mpoulos, gbokos, fovai}@ionio.gr

In this study we attempt to solve the problem of determining how authentic published images on the Internet are, and to what degree they may be identified by comparison to the original image. The technique proposed aims to serve the new requirements of libraries. One of these is the development of computational tools for the control and preservation of intellectual property such as digital objects, and especially of digital images. For this purpose, this article proposes the use of a serial number extracted using a previously tested semantic-properties method. This method, based on the multilayers of a set of arithmetic points, assures the following two properties: the uniqueness of the final extracted number, and the semantic dependence of this number on the image used as the method’s input. The major advantage of this method is that it can serve as the authentication for a published image or detect partial modifications to a reliable degree. Also, it requires the better of the known hash functions that the digital-signature schemes use, and produces alphanumeric strings for checking authenticity and the degree of similarity between an unknown image and an original image. As an example of a possible application, this article suggests that this method could be incorporated into the well-known DOI system in order to provide a reliable tool for identification and comparison of digital images.

Introduction

The role that libraries play has changed dramatically in the past few years. In particular, libraries must now play the difficult role of providing strategic management, safety, storage, maintenance, and retrieval of digital sources that are distributed over the Internet. Multimedia files, and especially images (because of their large size), present a permanent source of problems for the design of strategies to provide these services, both because these files create storage problems due to the nature of their coding, and because the authenticity check of published images involves particularly heavy demands on the system. In regard to this last problem, no reliable mechanisms for managing the authenticity of images have been developed; in fact, existing computational tools have not solved the problems of plagiarism involved with text files.

For example, digital-signature schemes (Schneier, 1996) check whether or not images are authentic. This is possible because the extracted alphanumeric string (the digital signature) is dependent on the image, and the hash function used produces a completely different and unrelated result string even if only one bit changes from the original image.

The digital-watermarking technique (Kutter, 1998; Kalker, Depovere, Haitsma, & Maes, 1998; Delanay & Macq, 2000; Pereira & Pun, 1999) is considered to be a protective method rather than an authenticity check, because it relies on mechanisms of image-modification prevention that are insufficiently secure. On the other hand, it is rather sophisticated because of the difficulty of the mechanisms used for the coding and decoding of the image; this produces the extra problem of a lack of interoperability.

In this work, a mechanism is proposed for simultaneously checking the authenticity and the degree of similarity between images when the hash functions of the digital-signature schemes fail to do so. This work also proposes a scenario for the management of the process of authentication or detection of degree of violation of images; this method could be adopted as a component of libraries’ strategy for the protection of copyrights of images published on the Web.

The use of a serial-number mechanism should achieve these objectives. This mechanism could be implemented using a well-fitted query mode database such as that used in the study by Wirth and Patton (2000). This is a basic database-querying system that allows the user to retrieve data for a set of objects that match the criteria that the user specifies. Then objects can be selected based on magnitude, redshift, position, serial number, and a comparison of the properties of these objects and any other quantity in the database. Each field can be qualified with a lower bound, an upper bound, or a range, and the criteria in different fields can be combined through AND or OR logic.
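
The query mode described above (per-field bounds combined through AND or OR logic) can be illustrated with a minimal sketch. The `Bound` class, the `select` function, and the field names `pixels` and `depth` are hypothetical names chosen for illustration; they are not taken from the system of Wirth and Patton (2000).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bound:
    """Limits for one field; None means unbounded on that side (hypothetical helper)."""
    field: str
    low: Optional[float] = None
    high: Optional[float] = None

    def matches(self, record: dict) -> bool:
        value = record[self.field]
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


def select(records, criteria, mode="AND"):
    """Keep records that satisfy all criteria (AND) or at least one (OR)."""
    combine = all if mode == "AND" else any
    return [r for r in records if combine(c.matches(r) for c in criteria)]


# Illustrative records only; the values are not results from the article.
records = [
    {"name": "a.jpg", "pixels": 98490, "depth": 564},
    {"name": "b.jpg", "pixels": 786432, "depth": 524},
    {"name": "c.jpg", "pixels": 98490, "depth": 120},
]
criteria = [Bound("pixels", high=100000), Bound("depth", low=500)]
and_hits = [r["name"] for r in select(records, criteria, "AND")]
or_hits = [r["name"] for r in select(records, criteria, "OR")]
```

A range is expressed by setting both `low` and `high` on the same `Bound`.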

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(5):708–718, 2008

Received March 7, 2006; revised May 16, 2007; accepted July 3, 2007

© 2008 ASIS&T • Published online 6 February in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20753


The serial number, called a semantic content identifier (SCI), is made from the pixels of the image. The method that produces the serial number has at its root the multilayers of a set of arithmetic points that represent the values of each image’s pixels. Specifically, the brightness of every pixel in the image is extracted and a matrix of numbers is created. Then, the known onion-peeling algorithm from computational geometry (Dalal, 2004; O’Rourke, Chien, Olson, & Naddor, 1982) is used and the serial number is created. The semantic properties of this algorithm have been proven via its applications in several other areas of research. Poulos, Magkos, Chrissikopoulos, and Alexandris (2003) and Poulos, Papavlasopoulos, and Chrissikopoulos (2004) used this algorithm for fingerprint verification and text categorization, proving that it creates a unique convex polygon, like a digital image, for each input. In addition, the semantic and unique features of the convex polygon have been described in previous studies (e.g., Zunic, 2003), and the possibility of utilizing the onion algorithm for database purposes has been demonstrated by a recent IBM technical report (Castelli, 2001), according to which the onion-peeling procedure is considered to be well suited for searching algorithm problems.

Once created, this serial number, which could be used as a digital signature, can be compared with any other digital signature that comes from the same image, from a similar image, or from a completely different image. The decision of authenticity can then be made by the use of the Hausdorff algorithm, which measures the geometrical differences between two sets of points.

Finally, this method would be best embedded into the well-known DOI system (International DOI Foundation, 2007) as an application scenario. The next section of this article presents more details on this scenario. The choice of DOI as the system for the embedding of the serial number was made because of the large interoperability that it affords to information technologies, and because it can collaborate with the database of CROSSref and the well-known protocol, OpenURL. In this way, the form of exploitation of the algorithm indicated in accordance with this mechanism can be interchanged and transported in a metadata scheme for the identification and checking of the degree of authenticity of digital images that are protected by copyrights from organizations and libraries that have adopted this strategy for the protection and control of the intellectual property of images.

An additional aim of this article is to compare the proposed method with the studies by Zachary, Iyengar, and Barhen (2001) and Shah, Raghavan, Dhatric, and Zhao (2006), which are considered the most representative of the techniques of Web image retrieval. The comparison with these two studies is presented below.

Method

This method is divided into four stages: preprocessing stages 1 and 2, the processing stage, and the metaprocessing stage.

Preprocessing Stage 1

The input image is made suitable for further processing by image enhancement techniques using Matlab 6.5, and is transformed into a matrix of pixels. Specifically, in this stage any image that is available in any of the common image formats (tiff, bmp, jpg, etc.) is transformed into a matrix (a two-dimensional array) of pixels. The brightness of each point in this array is proportional to the value of its pixel. This gives the synthesized image of a bright square on a dark background. This value is often derived from the output of an A/D converter. The matrix of pixels, i.e., the image, will be described as $N \times L$ $m$-bit pixels, where $m$ controls the number of brightness values. Using $m$ bits gives a range of $2^m$ values, ranging from 0 to $2^m - 1$. Thus, the digital image may be denoted by the following compact matrix form:

$$
f(x, y) =
\begin{bmatrix}
f(0, 0) & f(0, 1) & \cdots & f(0, L-1) \\
f(1, 0) & f(1, 1) & \cdots & f(1, L-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(N-1, 0) & f(N-1, 1) & \cdots & f(N-1, L-1)
\end{bmatrix}
\tag{1}
$$

The coordinate vector of the above matrix is

$$P = [f(x, y)] \tag{2}$$

Thus, a vector $P$ of dimension $1 \times (N \times L)$ is constructed, which is then used in the next stage.

Preprocessing Stage 2

The matrix created in the first stage is submitted to a fast Fourier transform. The data that comes from this step is submitted to specific segmentation (into data sets) using computational geometric algorithms implemented via Matlab. Thus, onion layers (convex polygons) are created from these data sets (see Figures 1 and 2).

We considered that if the number of elements of the vector $P$ is $K$, where $K = N \times L$, then the spectral density can be calculated from the coefficients of vector $P$, $\{P_k,\ k = 0, 1, \ldots, K-1\}$, via the Fourier transform:

$$R_f = \sum_{k=0}^{K-1} P_k \, e^{-j 2 \pi f k / K}, \quad f = 0, 1, \ldots, K-1. \tag{3}$$

Finally, the magnitude or spectrum of the Fourier transform is calculated. Then, the vector $S_0$ constitutes the initial point set:

$$S_0 = |R_f| \tag{4}$$

Processing Stage of CGA (Computational Geometry Algorithm)

Proposition. We considered that the set of brightness values for each image contains a convex subset, which has a specific position in relation to the original set. This position may be determined by using a combination of computational-geometric algorithms, known as the onion peeling algorithm, with an overall complexity of O(d*n log n) time.

Implementation. We consider the set of brightness values of an image to be the vector $S$ (Equation 4). The algorithm starts with a finite set of points $S = S_0$ in the plane, and the following iterative process is conducted. Let $S_1$ be the set $S_0 - \partial H(S_0)$ (that is, $S_0$ minus all the points on the boundary of the hull of $S_0$). Similarly, define $S_{i+1} = S_i - \partial H(S_i)$. The process continues until the set has $\le 3$ points. The hulls $H_i = \partial H(S_i)$ are called the layers of the set, and the process of peeling away the layers is called onion peeling, for obvious reasons. Any point on $H_i$ is said to have an onion depth (or just depth) of $i$. Thus, the points on the hull of the original set have a depth of 0.
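
The preprocessing and peeling steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Matlab implementation: the function names are hypothetical, the Fourier transform is written as a direct sum rather than a fast transform, the mapping of the spectrum to plane points as (f, |R_f|) pairs is a guess, points lying strictly inside a hull edge are left for the next layer, and peeling stops once fewer than three points remain.

```python
import cmath


def spectrum_magnitudes(pixels):
    """Flatten an N x L brightness matrix into P and return the magnitudes |R_f|."""
    p = [value for row in pixels for value in row]
    k_total = len(p)
    magnitudes = []
    for f in range(k_total):
        # Direct evaluation of the discrete Fourier sum (quadratic time, not an FFT).
        r_f = sum(p[k] * cmath.exp(-2j * cmath.pi * f * k / k_total)
                  for k in range(k_total))
        magnitudes.append(abs(r_f))
    return magnitudes


def to_points(magnitudes):
    """Assumption: each spectral value becomes the plane point (f, |R_f|)."""
    return [(f, m) for f, m in enumerate(magnitudes)]


def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def onion_layers(points):
    """Peel hulls H_0, H_1, ... and return (layers, leftover points)."""
    remaining = set(points)
    layers = []
    while len(remaining) >= 3:  # assumed stopping rule: too few points for a polygon
        hull = convex_hull(list(remaining))
        layers.append(hull)     # the index of a layer in this list is its onion depth
        remaining -= set(hull)
    return layers, remaining


demo = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (2, 3), (2, 2)]
layers, leftover = onion_layers(demo)
```

In this small example the outer square is peeled first (depth 0), the inner triangle second (depth 1), and the single point (2, 2) is left over.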

Extraction of the Points of the Convex Polygon With Semantic Properties

The latest research has proven statistically that the smallest convex layer in this algorithm carries specific information, because this position gives a geometrical interpretation of the average of the image’s brightness. In other words, the smallest convex polygon (layer) depicts a particular geometrical area in which this average value of the brightness is found.

This feature may be characterized as unique (Dalal, 2004; Poulos et al., 2003; Lee & Kharogonekar, 2003) to each image because the following two conditions are met:

• The selected area layer does not intersect with another layer.
• The particular depth of the smallest layer is variable in each case.

Thus, two variables are extracted from the proposed image-processing method: the area of the smallest onion layer, $S_x$, which is a subset of the original image set $S$ values, where $x$ is the number of amplitudes of the smallest layer, and the depth of this layer. Taking into account the specific features of the aforementioned variables, it is easy to ascertain that these may be used for accurate image identification.

Constructed serial number. We constructed the proposed serial number in the following way: We reserve some places for the number of pixels of the image, for instance, $1024 \times 768 = 786432$ (the first area in Figure 3). Then there are some places for the number of layers of the smallest convex polygon, for example, 524. After that, we have the specific points of the smallest onion layer. These points may be three or more. First we have places for the x values of the points (the third section in Figure 3), and then places for the y values (the fourth section). All of these digits together constitute the proposed serial number for the identification of digital images.
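
The four areas of the serial number can be assembled and taken apart again as in the following sketch. The function names `build_sci` and `parse_sci` are hypothetical, and the use of hyphen-separated integer fields is an assumption based on the way serial numbers are printed in this article.

```python
def build_sci(pixel_count, depth, polygon):
    """Join the four areas: pixel count, layer count, x values, y values."""
    xs = [str(x) for x, _ in polygon]
    ys = [str(y) for _, y in polygon]
    return "-".join([str(pixel_count), str(depth), *xs, *ys])


def parse_sci(sci):
    """Split a hyphenated serial number back into its four areas."""
    fields = [int(part) for part in sci.split("-")]
    pixel_count, depth, coords = fields[0], fields[1], fields[2:]
    # The smallest layer has three or more points, so at least six coordinates.
    if len(coords) < 6 or len(coords) % 2:
        raise ValueError("expected matching x and y lists for 3 or more points")
    half = len(coords) // 2
    polygon = list(zip(coords[:half], coords[half:]))
    return pixel_count, depth, polygon


polygon = [(17100, 180), (17257, 183), (17081, 182), (17100, 180)]
sci = build_sci(98490, 564, polygon)
```

Keeping the hyphens makes the number reversible; once they are dropped, the field boundaries can no longer be recovered from the digits alone.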

Identification Procedure

The algorithm known as the Hausdorff Distance (Atallah, 1983) was adopted for the procedure of comparing the convex polygons’ points. This algorithm was created specifically for the computation of such quantities as the mean Euclidean distance of two convex polygons in Euclidean space.

FIG. 1. Onion layers of a set of points in a theoretical basis.

FIG. 2. Instances of onion layers of a real image.

In the present work, the convex pairs of values (x, y) represent both the smallest onion layer that came from the original image and the smallest onion layer that came from a second image. From the computation of the distance between these two sets of points it emerges whether, and to what degree, the second image is identical to the original one.

Specifically, the use of this algorithm can be illustrated as follows:

Given two sets of points $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, the Hausdorff Distance is defined as $H(A, B) = \max(h(A, B), h(B, A))$, where

$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$$

and $\lVert a - b \rVert$ is any metric between the points $a(x_1, y_1)$ and $b(x_2, y_2)$, such as the Euclidean distance.

In this case, if $h(A, B) = 0$, and the number of pixels and all the points of the smallest polygon (Figure 3) of the two images that are compared are the same, then the two images are identical. Otherwise, the degree of similarity between the two images is decided by the value of Hausdorff distance. A typical indicator of high similarity is decided empirically to be any value equal to or less than 0.1: $h(A, B) \le 0.1$, and in all other cases, in which the Hausdorff distance is bigger than 0.1, the images are considered different.

Computation of Complexity: Comparison With Other Methods

In this section we will calculate the complexity of the procedure used in our method, and then we will compare the complexity of our method with that of a well-known, accurate method based on a similar philosophy (Shah et al., 2006). In our case, the query-image procedure is focused on the calculation of the complexity of the Hausdorff distance extracted by two simple convex polygons that represent two images. This complexity has been calculated as $O(n_1 + m_1)$ time (Normand & Bouillot, 1998). It must be noted that this number is bounded according to this condition: $e \ge n_1 \ge 3$ and $e \ge m_1 \ge 3$, where, in practice, in the worst case $e \approx 100$. For $N$ images compared with one, this complexity is of order $O(N(n_1 + m_i))$.

On the other hand, the method in Shah et al. (2006) focuses on a new approach of the content-based image retrieval (CBIR) problem, which is based on the combination of two methods: (a) the newly proposed similarity-preserving space transformation method, which enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of this retrieval system. The total optimal complexity of this method is calculated by the authors as $O(NL^2)$ time, where $N$ is the number of images in the database and $L$ is the number of low-level features. By conducting a rigorous comparison between the two methods, we concluded that the value $L$ corresponds with the number $n_1 + m_i$ (feature extraction) of our method. Then, the present study yields a significantly smaller complexity, $O(NL)$, in comparison to the other method, which yields an optimal complexity of $O(NL^2)$.
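
To make the gap between the two orders concrete, a back-of-the-envelope comparison may help. The values $N = 10^4$ and $L = 20$ below are illustrative assumptions only, not figures from either study, and constant factors are ignored.

```latex
\begin{align*}
N L   &= 10^{4} \times 20     = 2 \times 10^{5} \quad \text{(order of the proposed method)} \\
N L^2 &= 10^{4} \times 20^{2} = 4 \times 10^{6} \quad \text{(order reported by Shah et al., 2006)} \\
\frac{N L^2}{N L} &= L = 20
\end{align*}
```

Under these assumptions the ratio between the two orders equals $L$ itself, so the advantage grows with the number of features compared per image.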

Furthermore, in a recent significant study (Zachary et al., 2001) the authors focused on the real-valued representation of color, which is based on the information theoretic concept of entropy. A theoretical presentation of image entropy is accompanied by a practical description of the merits and limitations of image entropy compared to color histograms (Zachary et al.). However, the authors of this study state that information theory and entropy have not received adequate research attention in the image-interpretation and pattern-recognition fields. Areas for future work (Wirth & Patton, 2000) include information-theoretic descriptions of the other visual features of images, including spatial and geometric features. Taking this into account, this article may be characterized as an implemented aim of the study by Wirth and Patton (2000) because our method creates geometric pattern-recognition features, including spatial ones such as the convex hulls (Guillot, Estoup, Mortier, & Cosson, 2005).

Empirical Observations

In order to test our method, we selected some images and created their serial numbers. We then made some modifications to these images, creating new images. These modifications involved changing the file format from jpeg to bitmap and tiff, and inserting a text box (with the following text: “http://www.ionio.gr/~mpoulos”), a tiny picture, or a small black box into the images. Following this, we paired the images and computed the Hausdorff distance for each pair.

Specifically, there were three starting images: Sample 1.jpg, which is a yellow futuristic car design, Sample 2.jpg, which is a real car on a street, and Sample 3.jpg, which is a landscape. For each of these three images, five new images were created. The first (Sample 1.1.jpg) was created by inserting an image of a tiny orange car somewhere in the original image. The second (Sample 1.2.jpg) was created by inserting a text box somewhere in the original image. The third (Sample 1.3.jpg) was created by inserting a black box somewhere in the original image. The fourth (Sample 1.4.tif) was created by changing the file format of the original image to tiff. The fifth (Sample 1.5.bmp) was created by changing the file format of the original image to bitmap. In all, there were 18 images and serial numbers. The serial number of image Sample 1.jpg, for example, is 98490-564-17100-17257-17081-17100-180-183-182-180; it can be seen in Figure 4 with the numbers separated into the four areas.

FIG. 3. The schematic representation of the proposed serial number.

Then, $18 \times 3 = 54$ pairs of images were created in order for the Hausdorff distance to be computed. From the results of these computations, which are presented in Table 1, it is clear that the Hausdorff distance is insignificant or zero for the images that were created by modification from the original ones, and is about one or greater when the two images are different.

Having in mind the conditions of identification and similarity, presented in the previous section, it emerges that the images that are absolutely identical are the ones that were compared with themselves and the ones that resulted from changing the file format. In each of these cases the Hausdorff distance is equal to zero. In the cases in which the comparison is between two images with minimal changes, the Hausdorff distance is much smaller than 0.1. Finally, in the cases in which the compared images are entirely different, the Hausdorff distance is significantly larger than 0.1.
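
The comparison and the three-way decision used in these experiments can be sketched as follows. This is a brute-force illustration with hypothetical function names, not the linear-time convex-polygon algorithm of Atallah (1983); the scale of the point coordinates (and hence the meaning of the 0.1 threshold) is an assumption, since the article does not state how the polygon points are normalized.

```python
import math


def directed_hausdorff(a, b):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def classify(pixels_a, polygon_a, pixels_b, polygon_b, threshold=0.1):
    """Apply the decision rule: identical, similar (distance <= 0.1), or different."""
    distance = hausdorff(polygon_a, polygon_b)
    if distance == 0 and pixels_a == pixels_b:
        return "identical", distance
    if distance <= threshold:
        return "similar", distance
    return "different", distance


# Illustrative polygons only; these are not values from Table 1.
original = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
edited = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.05)]
unrelated = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

same = classify(98490, original, 98490, original)
near = classify(98490, original, 98490, edited)
far = classify(98490, original, 98490, unrelated)
```

The brute-force form costs O(nm) per pair, which is acceptable for polygons of a few points; a production system would presumably use the linear-time method cited above.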

Example Application Scenario for the Algorithm

As mentioned in the introduction, the proposed serial number (SCI) could be applied in the construction of the unique Digital Object Identifier (DOI) for an image.

DOI is a system for identifying, in a unique way, digital objects in the digital environment. A DOI is an alphanumeric string, e.g. “10.1000/182”, which can be assigned, as an identifier, to any digital entity, such as a text document, for use in digital networks.

However, a DOI resource metadata declaration (RMD) is a message designed specifically for metadata exchange between registration agencies (RAs; International DOI Foundation, 2006b). The format may also be used for input of service metadata, but it is not intended as a replacement for other domain or service-specific schemes. An RMD is in the form of an XML document that conforms to an XML Schema (xsd). All its elements and allowed values are mapped into the indecs Data Dictionary (iDD). In our case, this mechanism could be used to adapt the SCI model in the RMD form (see Figure 5). In greater detail, our extension is positioned in the current DOI metadata model as a mediator between the DOI resources and the RMD flows.

The DOI provides current information, including the place on the Internet where the entity can be found. The DOI is paired with the object’s electronic address, or URL, in an updateable central directory, and is published in place of the URL in order to allow the content to move to another location without the need to change the link itself.

The basic form of the DOI has the following syntax: 10.xxxx/sssss. This sequence contains two sections: the prefix 10.xxxx, which involves the identification of the publishing authority, the “10.” indicating that this string is a DOI alphanumeric string; and the suffix, which provides the capability for any meaningless or meaningful identification string to be added. The publishing authority usually adds the strings that identify the object in the suffix.

As an example application of the proposed serial number of an image, the SCI may play the role of the suffix of the image’s DOI. For the Sample 1.jpg, which has SCI equal to 98490-578-17380-17392-17481-17278-17380-177-177-177-180-177, the DOI could be 10.xxxx/SCI984905781738017392174811727817380177177177180177, with the existing model of an RMD description of an image resource in the DOI model (International DOI Foundation, 2006b) enclosing the Hausdorff distance (named as SCI Criterion) represented as

Resource
Identifier [“URI”]
Subtype DOI
SCIIdentifier [“98490-578-17380-17392-17481-17278-17380-177-177-177-180-177”]
Subtype DOI
SCI Criterion [“0 ≤ n ≤ 1”] <<Which is the Hausdorff Criterion>>
Subtype DOI
Name [“1”]
Subtype [“title”]
Category [jpeg]
Subtype [MimeType]

with a resulting DOI of:

10.xxxx/SCI984905781738017392174811727817380177177177180177
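
The construction of such a DOI from an SCI, and the split of a DOI into prefix and suffix, can be sketched as below. The function names are hypothetical, the prefix "10.xxxx" is the placeholder used in this article rather than a registered prefix, and the rule of dropping the hyphens and prepending "SCI" is inferred from the example above.

```python
def sci_to_doi(prefix, sci):
    """Build a DOI whose suffix is 'SCI' followed by the serial number digits."""
    if not prefix.startswith("10."):
        raise ValueError("a DOI prefix starts with '10.'")
    return f"{prefix}/SCI{sci.replace('-', '')}"


def split_doi(doi):
    """Return (prefix, suffix); the first slash separates the two sections."""
    prefix, suffix = doi.split("/", 1)
    if not prefix.startswith("10."):
        raise ValueError("not a DOI string")
    return prefix, suffix


sci = "98490-578-17380-17392-17481-17278-17380-177-177-177-180-177"
doi = sci_to_doi("10.xxxx", sci)
prefix, suffix = split_doi(doi)
```

Because the hyphens are dropped in the suffix, the field boundaries cannot be recovered from the DOI alone; the hyphenated form kept in the SCIIdentifier element would be needed to rebuild the polygon points.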

FIG. 4. The serial number of image Sample 1.jpg.

TABLE 1. The identification procedure for 15 images.


Considering the image-description limitations that DOI has, we embrace the idea that the SCI extension as an extra field of the existing DOI mechanism is a valuable improvement of RMD descriptors in the standard design (International DOI Foundation, 2006a). With respect to the existing interfaces, we consider the application of the proposed extension to be an improvement of DOI image retrieval for more sophisticated cases (e.g., digital rights management). In such a case the proposed serial number as part of the DOI suffix may play a meaningful role, since apart from being a unique identifier, it can also be used to check the authenticity of the image it is assigned to.

Library-Retrieval Case Study

The proposed method could be applied in three virtual scenarios of image retrieval using realistic presuppositions. In particular, we considered a number of DOI database SCI model images, which are registered in a database named “Multimedia Ionian Library.”

FIG. 5. The existing interface of DOI resource metadata declaration (RMD) using the SCI model of the standard DOI resource metadata flow (image from International DOI Foundation, 2006b).

1. Scenario of control identification of an image. In this case we considered an image, for example 1.1.jpg, published with a DOI number 10.xxxx/SCI984905781738017392174811727817380177177177180177 in a Web location. The identification of the source using the SCI identifier could be carried out using the Hausdorff distance algorithm, setting as input criterion data $n = 0$. Then the procedure for either identification ($n = 0$) or for similarity degree $0 \le n \le 1$ between two images could be implemented via a similar philosophy, such as the query mode (see Introduction) searching mechanism (see Figure 6).

2. Scenario of collection of similar images. In this case, we extended the procedure for a selection of images that obtained a particular degree of Hausdorff distance similarity, $0 \le n \le 10^{-2}$. Thus, a user could retrieve similar images from the multimedia Ionian library following the same procedure used in the previous scenario (see Figure 7).

3. Scenario using an OpenURL resolver. Finally, in a more extended scenario (for multimedia database sources; see Figure 8) a serial number is embedded into the suffix of the DOI system. This solution was adopted because the DOI cooperates perfectly with such metadata information services as the CROSSREF and the protocol OpenURL. Specifically, an OpenURL consists of a base URL, which addresses the user’s institutional link server, and a query string, which contains the data of the particular entry, typically in the form of key-value pairs. For example, an OpenURL query that utilizes the SCI Identifier could be as follows: http://resolver.example.ionio.gr/cgi?content=image&SCI=984905781738017392174811727817380177177177180177&title=1&format=jpg
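
The three scenarios above can be sketched together in a few lines. The in-memory `database`, the function names, and the sample polygons are hypothetical stand-ins for the “Multimedia Ionian Library”; only the base URL and the query keys are taken from the example above.

```python
import math
from urllib.parse import urlencode


def hausdorff(a, b):
    """Brute-force Hausdorff distance between two point sets."""
    def directed(p_set, q_set):
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(a, b), directed(b, a))


def retrieve(database, query_polygon, max_distance):
    """Names of registered images whose smallest layer lies within max_distance."""
    return [name for name, polygon in database.items()
            if hausdorff(polygon, query_polygon) <= max_distance]


def build_openurl(base_url, sci_digits, title, fmt):
    """Append the key-value pairs of the entry to the resolver's base URL."""
    query = urlencode({"content": "image", "SCI": sci_digits,
                       "title": title, "format": fmt})
    return f"{base_url}?{query}"


# Illustrative registrations only; not data from the article.
database = {
    "original.jpg": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "edited.jpg": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.005)],
    "other.jpg": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
query = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

exact = retrieve(database, query, 0.0)      # scenario 1: n = 0
similar = retrieve(database, query, 1e-2)   # scenario 2: n up to 10^-2
url = build_openurl(                        # scenario 3: OpenURL query
    "http://resolver.example.ionio.gr/cgi",
    "984905781738017392174811727817380177177177180177",
    "1",
    "jpg",
)
```

A linear scan is used here for clarity; a real registry would presumably index the serial numbers through the query mode database mentioned in the Introduction.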

Conclusions

This study’s major objective was to try to construct a serial number which, under certain circumstances, could be inserted into the information-management strategies of information organizations and big, cooperative libraries as a mechanism for the identification and control of intellectual property such as electronically published material. The existing techniques, and especially digital-signature schemes, fulfill only the first (the identification) part of the objective.

This work presents the adjustment of a computational geometric algorithm for the semantic representation of the information from an image’s pixels in a well-chosen serial number. The idea for this construction came from a test of the onion-peeling algorithm in other areas of image processing, such as the identification of humans by fingerprint. The aim of this application is to construct a serial number to identify a copyright-protected published image even if its file format has changed from one type to another. Furthermore, this application aims to provide a satisfactory measure of similarity to other images created from the original by a small pixel alteration, and to detect and automatically reject images that are not related to the original.
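For readers unfamiliar with onion peeling, the generic step can be sketched as follows: compute the convex hull of a point set, remove the hull vertices, and repeat on the remaining points. The sketch uses Andrew's monotone-chain hull as a stand-in. How the image's pixels are mapped to points, and how the resulting layers are turned into the serial number, are not shown here, and the sample points are purely hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def onion_layers(points):
    """Peel convex hulls off the point set until no points remain."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


# Hypothetical sample: two nested squares around a single centre point.
sample = [(0, 0), (4, 0), (4, 4), (0, 4),
          (1, 1), (3, 1), (3, 3), (1, 3),
          (2, 2)]
layers = onion_layers(sample)
for depth, layer in enumerate(layers, start=1):
    print(depth, layer)
```

Points that lie strictly inside a hull edge are not treated as vertices in this sketch, so they are peeled off in a later layer.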

For a realistic implementation of the proposed serial number, we created a small database with three images transformed from jpeg format to tiff and bmp. Also, some pixels of the original image were changed by the addition of small objects covering a tiny area. The changes were done to address a hypothetical scenario of image violation. In total, 18 images were created, six from each of the three thematic sections (3 × 6 = 18), which were formed into comparison pairs, and 3 × 18 = 54 comparisons were performed (see Table 1).

FIG. 6. An application scenario of image retrieval using a DOI descriptor.

FIG. 7. An application scenario of similar-image retrieval using a DOI descriptor.

This study adopted the Hausdorff distance, which has been applied to solving problems involving the identification of convex polygons in Euclidean space, for its preliminary identification decisions. The present work’s experimental results show that in cases of absolute identification the Hausdorff distance is zero. The images made by a slight alteration of pixels, which would be suspected of violating a copyright, yielded Hausdorff distances of less than 0.1. Entirely different images produced Hausdorff distances much greater than 0.1.
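The decision rule described above can be illustrated with a minimal sketch, assuming that each image has already been reduced to a finite set of 2-D points. The brute-force distance below is a generic implementation, not the linear-time algorithm for convex polygons cited in the references. The point sets are hypothetical, and treating every distance at or above 0.1 as "unrelated" is a simplifying assumption, since the text reports only that unrelated images scored much greater than 0.1.

```python
from math import dist  # available in Python 3.8 and later


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Threshold taken from the text; the handling of values at or above it
# is an assumption of this sketch.
ALTERATION_THRESHOLD = 0.1


def classify(distance, threshold=ALTERATION_THRESHOLD):
    """Map a Hausdorff distance to one of the three outcomes in the text."""
    if distance == 0:
        return "identical"
    if distance < threshold:
        return "suspected alteration"
    return "unrelated"


# Hypothetical point sets standing in for three images.
original = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
altered = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.05), (0.0, 1.0)]
different = [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (5.0, 6.0)]

for name, candidate in [("original", original),
                        ("altered", altered),
                        ("different", different)]:
    d = hausdorff(original, candidate)
    print(name, round(d, 3), classify(d))
```

The brute-force comparison is quadratic in the number of points, which is acceptable for small point sets such as these.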

Further research, therefore, is needed to test the method experimentally with more images and comparisons in order to evaluate its validity with a larger sample base, and to examine the utilization of DOI as a metadata platform cooperation model with such other information networks as neural networks. The ultimate goal is the adoption of the proposed model as a strategic component for organizations and libraries to use for the identification and control of the authenticity of electronically published images on the Web.

Finally, this method may be characterized as an accurate method that produces a very good degree of complexity compared to other methods (see Shah et al., 2006). Furthermore, the method may be characterized as novel and helpful because of the algorithms used and the sophisticated schema based on this technique, which are in agreement with recent studies (Zachary et al., 2001; Shah et al., 2006).

References

Atallah, M.J. (1983). A linear time algorithm for the Hausdorff distance between convex polygons. Information Processing Letters, 17, 207–209.

Castelli, V. (2001). Multidimensional indexing structures for content-based retrieval. IBM Research Report RC 22208 (98723). Retrieved January 8, 2008, from <http://domino.watson.ibm.com/library/cyberdig.nsf/papers/3F5C03FC99D24C3885256AE6005FF278/$File/RC22208.pdf>

Dalal, K. (2004). Counting the onion. Random Structures and Algorithms, 24(2), 155–165.

Delanay, D., & Macq, B. (2000). Generalized 2-D cyclic patterns for secret watermark generation. Proceedings of the IEEE Conference on Image Processing, 2, 77–79.

Guillot, G., Estoup, A., Mortier, F., & Cosson, J.F. (2005). A spatial statistical model for landscape genetics. Genetics, 170(3), 1261–1280.

International DOI Foundation (2007). The digital object identifier system. Retrieved January 8, 2008, from http://www.doi.org

International DOI Foundation (2006a). Appendix 5: DOI resource metadata declaration. Retrieved January 8, 2008, from http://doi.org/handbook_2000/appendix_5.pdf

International DOI Foundation (2006b). Resource metadata declaration (RMD) for metadata interchange. DOI Handbook. Retrieved January 8, 2008, from http://doi.org/handbook_2000/metadata.html#4.3.3

FIG. 8. An application scenario of image retrieval using OpenURL technology.

Kalker, T., Depovere, G., Haitsma, J., & Maes, M. (1998). A video watermarking system for broadcast monitoring. Proceedings of SPIE: Symposium on Electronic Imaging, 3657, 103–112.

Kutter, M. (1998). Watermarking resistance to translation, rotation, and scaling. Proceedings of SPIE: Multimedia Systems and Applications, 3528, 423–431.

Lee, J.-W., & Khargonekar, P.P. A convex optimization-based nonlinear filtering algorithm with applications to real-time sensing for patterned wafers. IEEE Transactions on Automatic Control, 48(2), 224–235.

Normand, G., & Bouillot, M. (1998). Hausdorff distance between convex polygons. Computational Geometry Web Project. Retrieved January 8, 2008, from http://www.cgrl.cs.mcgill.ca/godfried/teaching/cg-projects/98/normand/main.html

O’Rourke, J., Chien, J., Olson, C., & Naddor, T. (1982). A new linear algorithm for intersecting convex polygons. Computer Graphics and Image Processing, 19(4), 384–391.

Pereira, S., & Pun, T. (1999). Fast robust template matching for affine resistant image watermarks. In A. Pfitzmann (Ed.), Lecture Notes in Computer Science: Vol. 1768. Proceedings of the Third International Workshop on Information Hiding (pp. 200–210). London: Springer-Verlag.

Poulos, M., Magkos, E., Chrissikopoulos, V., & Alexandris, N. (2003). Secure fingerprint verification based on image processing segmentation using computational geometry algorithms. Proceedings of the IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (pp. 308–312). Anaheim, CA: ACTA Press.

Poulos, M., Papavlasopoulos, S., & Chrissikopoulos, V. A text categorization technique based on a numerical conversion of a symbolic expression and an onion layers algorithm. Journal of Digital Information, 6(1), Article 276. Retrieved January 8, 2008, from http://jodi.tamu.edu/Articles/v06/i01/Poulos/

Schneier, B. (1996). Applied cryptography: Protocols, algorithms, and source code in C (2nd ed.). New York: John Wiley & Sons.

Shah, B., Raghavan, V., Dhatric, P., & Zhao, X. (2006). A cluster-based approach for efficient content-based image retrieval using a similarity-preserving space transformation method. Journal of the American Society for Information Science and Technology, 57(12), 1694–1707.

Wirth, G.D., & Patton, D.R. (2000). VERDI: A Web database system for redshift surveys. In N. Manset, C. Veillet, & D. Crabtree (Eds.), ASP Conference Proceedings: Vol. 216. Astronomical data analysis software and systems IX (p. 251). San Francisco: Astronomical Society of the Pacific.

Zachary, J., Iyengar, S.S., & Barhen, J. (2001). Content-based image retrieval and information theory: A general approach. Journal of the American Society for Information Science and Technology, 52(10), 840–852.

Zunic, J.D. (2003). A characterization of discretized polygonal convex regions by discrete moments. In A. Sanfeliu & J. Ruiz-Shulcloper (Eds.), Lecture Notes in Computer Science: Vol. 2905. Progress in Pattern Recognition, Speech, and Image Analysis: Eighth Iberoamerican Congress on Pattern Recognition (CIARP 2003) (pp. 529–536). Berlin: Springer-Verlag.