What's beyond query by example?


Nozha Boujemaa — Julien Fauqueur — Valérie Gouet

N° 5068

December 2003


Theme 3: Human-machine interaction, images, data, knowledge
Projet IMEDIA
Research report n° 5068, December 2003, 22 pages

Abstract: Over the last ten years, the crucial problem of information retrieval in multimedia documents has boosted research in visual appearance indexing and content-based retrieval. In the early research years, the concept of "query by visual example" (QBVE) was proposed and shown to be relevant for visual information retrieval. QBVE alone, however, cannot satisfy the full variety of visual search usage requirements. In this paper, we focus on two major approaches that correspond to two different retrieval paradigms. First, we present the partial visual query, which ignores the background of the images and lets the user express his visual interest directly, without a relevance feedback mechanism. The second retrieval paradigm consists in searching for the user's mental image when no starting visual example is available. This new approach relies on the unsupervised generation of a visual thesaurus from which queries by logical composition of region categories can be performed; this query paradigm is closely related to that of text retrieval. Mental image search is a challenging and promising direction for retrieval by visual content in the forthcoming years, since it allows rich and varied modes of user expression and interaction with the search engine.

Key-words: query by visual example, partial queries, regions of interest, points of interest, mental image, logical composition of region categories, visual thesaurus of regions, user expression

Résumé: Over the last decade, the fundamental problem of information retrieval in multimedia databases has driven work on content-based visual indexing and retrieval. The "query by visual example" paradigm was proposed initially and proved relevant for visual information retrieval, but it offers only a limited answer to the variety of existing needs. In this article, we consider two approaches corresponding to two distinct retrieval paradigms. On the one hand, we present a partial query paradigm that ignores the image background and allows direct designation of the relevant image parts, without a relevance feedback mechanism. The second paradigm enables search from a mental image of the target and requires no initial example image. This new approach rests on the unsupervised generation of a visual thesaurus from which queries can be formulated by logical composition of region categories; it is similar to the text retrieval paradigm. Mental image search is a promising direction for content-based visual retrieval insofar as it offers rich and varied modes of interaction with the search engine.

Mots-clés: query by example, partial queries, regions of interest, points of interest, mental image, logical composition of region categories, visual thesaurus of regions, user expression

1 Problem statement

The amount of available multimedia documents has steadily increased in recent years, and with it the need for efficient organization and retrieval of this information. Simple arrangement of items with immediate lookup is no longer sufficient in a world more interested in content than in the description tags found in most archives. These growing needs have boosted research in content-based image retrieval (CBIR), a task previously achieved through textual annotation.

In the last decade, the concept of "query by visual example" (QBVE) was introduced [27] [10] and demonstrated the relevance of low-level visual information retrieval. This approach consists in sending the entire image (i.e., its computed low-level signatures) to the search engine as a visual query. Hence, besides human-produced metadata (textual annotation), which usually carries semantic information, machine-produced metadata related to the physical content (low-level features) becomes available as a support for information retrieval. Major institutions, in industry as well as in academic and public research groups, have investigated this field.

In the case of QBVE (in the sense of a full-image query), retrieval results express an overall global visual similarity, hence an approximate similarity. Two images may contain different components ("objects") with different shapes and appearances and yet remain globally similar. For some visual queries, this leads to discrepancies between the user's intention or target and the retrieved results. Starting from these observations, our community has discussed the concept of the "semantic gap" through different approaches. This also amounts to the observation that the QBVE paradigm cannot satisfy the multiple visual search requirements.

There are several ways to deal with the semantic gap. A prior task is to optimize the fidelity of the physical-content descriptors (image signatures) to the visual appearance of the images. The objective of this preliminary step is to bridge what we call the "numerical gap". To minimize the numerical gap, we have to develop efficient image signatures (compact and visually consistent, e.g. [7]). Weak visual retrieval results due to the numerical gap are often confusingly attributed to the semantic gap. We believe that richer user-system interaction lets the user express his preferences and focus on his semantic visual-content target.

Beyond QBVE, rich user expression comes in a variety of forms:

1. allow the user to indicate his satisfaction (or dissatisfaction) with the system's retrieval results, a method commonly called "relevance feedback"; the user's reaction expresses a subjective preference and can therefore compensate for the semantic gap between visual appearance and user intention,

2. provide a precise visual query formulation that lets the user select precisely his region of interest and discard image parts that are not representative of his visual target,

3. provide a mechanism to search for the user's mental image when no starting image example is available. Part of this work will be published in [5].

Besides, combined image and text indexing and retrieval approaches are of great interest for semantic gap reduction and are heavily investigated; however, they are beyond the scope of this paper.

The discussions of the first and second items are closely related. Let us, however, linger over the following question: relevance feedback, what for? What are the objective and the use of this mechanism?

Relevance feedback was the first form of user expression commonly investigated for semantic gap reduction. It was mostly used to compensate for all mismatches between user query and retrieval results. We strongly believe that this mechanism should be reserved for subjective or semantic user preference, or for concept search. Consider, for example, the search for the concept "Cézanne paintings" in an image database of masterpieces. For a given Cézanne painting query, the search engine could return as similar some masterpieces that are not Cézanne's, and conversely. Through relevance feedback, the user helps the system find out which visual appearance matters [6]. Here the criterion is user satisfaction.

We notice that the use of relevance feedback has sometimes overstepped its objectives and abilities. One frequent mistake is the following: in an animal database, we look for images that depict "tiger in a forest", "tiger in a savannah", "tiger in a desert". The target is to retrieve tiger images among the other animals, independently of the nature of the image background. For this family of queries, it is more appropriate to let the user select the tiger image surface than to perform feedback on the entire image through several iterations to help the system understand the user's visual target. When the user points at the object of interest with a partial query, the background image signature is treated as noise.

The third item concerns the mental search paradigm. In this case, no starting image (or region) example is available to the user; there is only a mental image arising from an event or a context memory. The objective is then: how could we build a search engine to reach this mental image? This question is related to the "page zero" problem [9] (which images should be shown first to the user to formulate the query?). There is no unique approach to a solution. In one of the earliest approaches ([21] and [15]), the system suggests two (or a list of) possible target images (entire images) to the user, who indicates which images are more likely close to his mental image. The objective is to minimize the number of iterations needed to locate the target. Note that in this case the system questions the user, not the contrary. Apart from this kind of statistical model, there is another approach to mental image search.

In this paper, we present and discuss two local image signatures that provide local image description and allow partial querying. These local methods explicitly reduce the semantic gap, as they integrate more appropriate user interaction with the search engine

within a precise query formulation. In the second part of the paper, we present a new visual retrieval paradigm based on target image search over a visual thesaurus formed by image region categories. This paradigm lets the user express his preference by logical composition of image parts over this visual thesaurus, to reach a mental image when no starting image example is available.

2 Partial visual selection for precise query

When the user is interested in retrieving only similar parts or objects of an image, a local selection must be considered. The idea consists first in localizing features of interest in the image, and second in characterizing the obtained primitives with local descriptors. Some applications place weak requirements on partial retrieval precision, while others impose hard constraints on the partial patch configuration, such as photometric and/or geometric properties.

In this context, two main classes of primitives are relevant to characterize parts of an image: region segmentation and description, for homogeneous-area similarity search [11], and point-of-interest detection and description [16], for heterogeneous areas. We first present these two approaches and then discuss their differences and complementarities for partial query formulation.

2.1 Regions of Interest

We present here the approach we developed in the IMEDIA group to perform region-based visual queries. The key idea is to consider a rough image segmentation together with a fine visual appearance description. Indeed, we argue that image retrieval requires neither high precision nor sub-pixel contour detection. Consider again the tiger retrieval example: it is sufficient to catch a piece of tiger texture surface to retrieve images that contain tigers (or tiger-like surfaces); we do not need to segment legs or other contour details precisely. In return, we compensate for the roughness of this object extraction with a fine, adaptive description of region appearance. We consider this choice a reasonable compromise, since an optimal and generic segmentation method is a utopia.

2.1.1 Region extraction

Detected regions should encompass a certain visual diversity to be visually characteristic, using a coarse segmentation. We want to stay above a too fine level of spatial and feature detail. This coarseness makes regions a complementary approach to points of interest, which rather characterize high spatial frequencies.

Figure 1: Examples of segmented images. Small discarded regions are shown in gray. More examples of segmented images are available online [1].

The adopted segmentation approach, proposed in [11], is unsupervised and fast. It is based on the clustering of Local Distributions of Quantized Colors (LDQCs). Extracted in pixel neighborhoods, this primitive captures the local color variability.

LDQCs are extracted as follows. For a more compact representation of colors in LDQC histograms, a color quantization of the image is first performed: pixels in the Luv space are grouped using the Competitive Agglomeration clustering algorithm (CA, see [14]), which has the major advantage of automatically determining the number of clusters. We then slide a window over the pixels and evaluate the corresponding local distribution over the quantized color set; to each window (or neighborhood) corresponds an LDQC. All extracted LDQC distributions are clustered using CA and the Color Quadratic Form Distance [19]. Regions are defined as connected pixels in image space that carry the same LDQC cluster tag. A Region Adjacency Graph is generated to merge or discard small regions and improve the spatial coherence of detected regions. We refer the reader to [11] or [13] for more details on this segmentation scheme. Figure 1 illustrates some examples of our coarse segmentation.

On a standard 2 GHz PC, segmenting a 500x400 image takes 1.9 seconds on average. On an 11,479-image database, 5.2 regions per image are extracted on average.

This region extraction scheme is employed for both region-based retrieval schemes: retrieval by example region and retrieval by logical composition of region categories (see section 3).
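To make the pipeline concrete, the following is a minimal sketch of LDQC extraction and region labeling. It is not the paper's implementation: k-means with fixed cluster counts stands in for Competitive Agglomeration, plain Euclidean distance for the Color Quadratic Form Distance, and the window size, cluster counts and `image_luv` input are illustrative assumptions; connected-component analysis and the Region Adjacency Graph post-processing are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def ldqc_segment(image_luv, n_colors=8, n_region_clusters=6, win=8):
    """Coarse segmentation by clustering Local Distributions of Quantized Colors.

    image_luv: (H, W, 3) float array in Luv space (illustrative input).
    NOTE: k-means stands in for Competitive Agglomeration, and Euclidean
    distance for the Color Quadratic Form Distance used in the paper.
    """
    h, w, _ = image_luv.shape
    pixels = image_luv.reshape(-1, 3)

    # 1) Quantize colors: cluster pixels into a small palette.
    quant = KMeans(n_clusters=n_colors, n_init=4).fit(pixels)
    labels = quant.labels_.reshape(h, w)

    # 2) For each window, build the local histogram over quantized colors (the LDQC).
    ldqcs, coords = [], []
    for y in np.arange(0, h - win, win):
        for x in np.arange(0, w - win, win):
            patch = labels[y:y + win, x:x + win]
            hist = np.bincount(patch.ravel(), minlength=n_colors).astype(float)
            ldqcs.append(hist / hist.sum())
            coords.append((y, x))

    # 3) Cluster the LDQCs; windows with the same cluster tag form region seeds.
    region = KMeans(n_clusters=n_region_clusters, n_init=4).fit(np.array(ldqcs))

    # 4) Paint window tags back into a coarse label map (small-region merging
    #    via a Region Adjacency Graph is omitted in this sketch).
    out = np.zeros((h, w), dtype=int)
    for tag, (y, x) in zip(region.labels_, coords):
        out[y:y + win, x:x + win] = tag
    return out
```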

2.1.2 Region description and retrieval

We now suppose that all images have been segmented. Given a region in an image, the user may want to find other similar regions in the database, regardless of the background. Regions may correspond to salient areas such as a sky, a field or a road, or may roughly correspond to an "object" such as a car, a face, an animal...

The problem consists in comparing the visual appearance of a query region to all regions in the database. The key idea of our approach, detailed in [11] and [13], relies on the extraction of coarse regions that are compared using a fine visual description, such that regions are specific against each other in the database. Existing color descriptors for regions, as in VisualSeek [28], Blobworld [8] and Netra [22], are based on a fixed palette of approximately 200 colors (between 166 and 256) to represent the entire color space. While such an approximate color representation is well suited to describing and retrieving images by their global content using a traditional color histogram, regions, which are by construction more homogeneous and more numerous, require a finer color representation to be distinguished from one another. In [11], the color variability region descriptor ADCS (Adaptive Distribution of Color Shades) is proposed. It is based on the distribution of each region over a fine, adaptive color binning. A region index consists of the set of color shades specific to each region and their corresponding color populations. Color shades are determined for each region by a fine color quantization of the region's pixels using CA in the Luv color space. Any color from the full color space can serve as a color shade; no fixed color palette is involved. Compared to the usual region color histogram, ADCS is more compact and more accurate in representing a region's color variability.

Since color shades are specific to each region, measuring similarity between two ADCS descriptors requires an adapted distribution distance. The Generalized Form of the Color Quadratic Distance [13] provides an efficient way to compare two color distributions whatever their respective color binnings.
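To illustrate how two distributions over different color binnings can be compared, here is a sketch of a generalized quadratic-form distance in the spirit of [13]; it is a generic reconstruction rather than the exact formulation of the paper, and the similarity function between shades (linear in Luv distance, with an assumed maximum distance `d_max`) is our own choice.

```python
import numpy as np

def shade_similarity(shades_a, shades_b, d_max=100.0):
    """Similarity matrix between two sets of Luv color shades:
    s(i, j) = 1 - ||c_i - c_j|| / d_max, clipped to [0, 1] (assumed form)."""
    diff = shades_a[:, None, :] - shades_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return np.clip(1.0 - dist / d_max, 0.0, 1.0)

def generalized_quadratic_distance(shades_f, weights_f, shades_g, weights_g):
    """Quadratic-form distance between two adaptively binned color
    distributions F and G (a sketch in the spirit of [13]):
        d^2 = f^T A_ff f + g^T A_gg g - 2 f^T A_fg g
    where each A is a shade-similarity matrix."""
    a_ff = shade_similarity(shades_f, shades_f)
    a_gg = shade_similarity(shades_g, shades_g)
    a_fg = shade_similarity(shades_f, shades_g)
    d2 = (weights_f @ a_ff @ weights_f
          + weights_g @ a_gg @ weights_g
          - 2.0 * weights_f @ a_fg @ weights_g)
    return float(max(d2, 0.0))  # guard against tiny negative round-off

# Two regions described by (color shades, populations) pairs:
f_shades = np.array([[60.0, 10.0, 20.0], [55.0, 12.0, 25.0]])
f_w = np.array([0.7, 0.3])
g_shades = np.array([[58.0, 11.0, 22.0]])
g_w = np.array([1.0])
print(generalized_quadratic_distance(f_shades, f_w, g_shades, g_w))
```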

Figure 2: Retrieval results from the top-left lavender region. Retrieved regions are outlined in white. Color (ADCS descriptor), area and position are the region descriptors involved in the search. Other regions of lavender are retrieved although they have various textures.

Depending on the type of searched regions, additional geometrical features may be combined with ADCS. For instance, a query initiated by an example snow region may return regions of pale sky if they have similar gray/white distributions; by constraining the position of the searched region to the lower part of the image, one can retrieve snow regions more efficiently. On the other hand, to search for regions of vegetation, position and area may not be particularly discriminant. In our Ikona platform [4], we let the user interactively set relative weights between ADCS (see [11]), position and area in the region comparison process, depending on the type of searched regions. Figure 2 illustrates a query on a lavender region using the combination of ADCS, area and position. Retrieved regions are similar to the query region with respect to both photometric and geometric features.

Indexing all regions in a 400x500 image takes 0.8 seconds on average. Retrieving regions similar to a given example region from a photostock database of 11,479 images with 56,374 regions takes at most 0.8 seconds on a 2 GHz PC.
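The interactive weighting described above amounts to a convex combination of normalized per-feature distances. A minimal sketch, in which the weight values and the region representation are assumptions and `adcs_distance` stands for a color distance such as the one sketched above:

```python
def region_distance(query, candidate, adcs_distance,
                    w_color=0.6, w_position=0.3, w_area=0.1):
    """Convex combination of feature distances, as set interactively in
    Ikona (weights here are illustrative, distances assumed normalized).

    query/candidate: dicts with keys 'shades', 'weights',
    'position' (x, y in [0, 1]) and 'area' (fraction of the image)."""
    d_color = adcs_distance(query['shades'], query['weights'],
                            candidate['shades'], candidate['weights'])
    d_pos = ((query['position'][0] - candidate['position'][0]) ** 2
             + (query['position'][1] - candidate['position'][1]) ** 2) ** 0.5
    d_area = abs(query['area'] - candidate['area'])
    return w_color * d_color + w_position * d_pos + w_area * d_area
```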

2.2 Points of Interest

In this section, we present an alternative method to perform partial queries with harder requirements on the visual properties (photometric or geometric) of the retrieved patch. The section is structured as follows: section 2.2.1 describes the local color descriptor used for sub-image or object retrieval, which is based on points of interest; section 2.2.2 presents our point-based image retrieval system built on this local description; finally, two practical scenarios are presented in section 2.2.3.

2.2.1 A local color image description

Figure 3: Example of Harris Color Point extraction.

When applied to image retrieval, image matching based on points of interest needs points with excellent repeatability, i.e. points that can be extracted from images with the same accuracy under various conditions such as viewpoint or illumination changes. Many point extractors exist for gray-value images, for example [20] [23], and only one for color images [24]. It has been demonstrated [18] that the color operator best fits the required repeatability. We use it to extract points over the whole image during the indexing step.

In a second step, it is necessary to describe the points in a feature space, which is a function of the photometric information around the point. Some approaches exist for gray-value images [3] [25] [26]. For color images, the HCP solution (Harris Color Points) proposed in [16] consists in a characterization based on a combination of Hilbert's differential invariants, which is invariant to image translation and rotation, robust to scale change (with a multi-scale approach), to illumination changes (if images or features are locally normalized), and robust to image coding and compression when low orders are considered [17]. At first order, for RGB images, we obtain the following features at point $\vec{x}$:

$$\vec{v}(\vec{x}, \sigma) = \left( R,\ \|\nabla R\|^2,\ G,\ \|\nabla G\|^2,\ B,\ \|\nabla B\|^2,\ \nabla R \cdot \nabla G,\ \nabla R \cdot \nabla B \right)^T \qquad (1)$$

where $\sigma$ represents the size of the Gaussian smoothing applied during the derivative computation.

The similarity measure employed is the Mahalanobis distance $\delta^2$, which takes into account the different magnitudes of the components and includes a noise model. Such a feature space will be denoted $(V, \delta^2)$ in the paper; its size depends on the order considered for the invariant computation.
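A sketch of computing the first-order invariant vector of equation (1) with Gaussian-derivative filters; the smoothing scale, the per-band derivative scheme and the externally supplied inverse covariance for the Mahalanobis distance are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hilbert_invariants_order1(image_rgb, x, y, sigma=2.0):
    """First-order differential invariant vector at pixel (y, x), eq. (1):
    (R, |gradR|^2, G, |gradG|^2, B, |gradB|^2, gradR.gradG, gradR.gradB).

    image_rgb: (H, W, 3) float array; sigma: Gaussian derivative scale."""
    feats = {}
    for i, band in enumerate("RGB"):
        chan = image_rgb[..., i].astype(float)
        # Gaussian-smoothed value and first derivatives of the band.
        s = gaussian_filter(chan, sigma)
        dy = gaussian_filter(chan, sigma, order=(1, 0))
        dx = gaussian_filter(chan, sigma, order=(0, 1))
        feats[band] = (s[y, x], np.array([dx[y, x], dy[y, x]]))
    r, gr = feats["R"]
    g, gg = feats["G"]
    b, gb = feats["B"]
    return np.array([r, gr @ gr, g, gg @ gg, b, gb @ gb, gr @ gg, gr @ gb])

def mahalanobis2(v1, v2, cov_inv):
    """Squared Mahalanobis distance between two invariant vectors;
    cov_inv is the inverse covariance of the features (the noise model)."""
    d = v1 - v2
    return float(d @ cov_inv @ d)
```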

2.2.2 Retrieval strategy

We present in this section the strategy we adopted to retrieve the images most similar to the query; it consists of a voting algorithm.

With the local descriptors presented above, an image is represented by a set of $n$ points $\{p_i\}$ characterized in a feature space $(V, \delta^2)$. In this context, building an index for a database $\{I_j\}$ of $N$ images consists in computing a set of $n \times N$ descriptors in $(V, \delta^2)$. Searching for an image, or part of an image, in the indexed database comes down to finding, in this space, the points closest to the query points.

Let $\{q_i\}$ be the set of query points. The closest points $\{r_i\}$ to the query points $\{q_i\}$ receive scores that are a function of the distance between the pairs $(r_i, q_i)$. A vote is computed for each image by combining the scores of the matches $(r_i, q_i)$ that involve that image. The images most similar to the query are those with the best votes. The complexity of the query can be efficiently reduced by organizing the indexes in multidimensional structures.

In addition, a semi-local geometric characterization that considers the spatial relations between neighboring points of the same image can be added to enrich the photometric description [24].
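A minimal sketch of this voting step, under simplifying assumptions: brute-force nearest-neighbor search instead of a multidimensional index structure, Euclidean distance as a stand-in for the Mahalanobis distance, and a score decreasing linearly with match distance.

```python
import numpy as np
from collections import defaultdict

def vote(query_points, db_points, db_image_ids, k=5, d_max=1.0):
    """Rank database images by accumulated match scores.

    query_points: (Q, D) query descriptors; db_points: (P, D) database
    descriptors; db_image_ids: (P,) image id of each database point.
    Brute-force search and the linear score 1 - d/d_max are assumptions."""
    scores = defaultdict(float)
    for q in query_points:
        d = np.linalg.norm(db_points - q, axis=1)  # Euclidean stand-in
        for idx in np.argsort(d)[:k]:              # k best matches of q
            s = max(0.0, 1.0 - d[idx] / d_max)     # closer match, higher score
            scores[db_image_ids[idx]] += s
    return sorted(scores.items(), key=lambda kv: -kv[1])
```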

2.2.3 Examples of retrieval

We present in this section two typical usage scenarios for the local image description presented above.

Pattern retrieval: Digital art images are becoming widely available for sharing cultural heritage material. Moreover, one can find art resources across the web with online browsing facilities (e.g. online museum collections, companies' websites¹ that propose online galleries of antiques for sale, etc.). In the best cases, the user can interrogate the database using keywords. Other websites²·³, dedicated to stolen works of art, inventory items reported stolen to the police; although they sometimes include images, they do not allow any kind of visual interrogation.

We present one practical interrogation scenario on such databases, based on the content-based indexing and retrieval approach of the previous section. The database used for the experiments contains 1077 color images of antiques⁴ with different viewpoints, illumination conditions and partial occlusions.

Let us suppose that a collector owns a vase (see the left picture) and would like to acquire other items from the same collection. If he does not know that this piece is a Greco-Roman porcelain urn characterized by scrolling acanthus leaves, it will not be easy for him to exploit keyword-based search tools. The local content-based approach presented here performs this task in a better way, as illustrated in figure 4. This result shows the interface of our CBIR system Ikona [4], here running with the HCP descriptor. The first image on the upper left shows the query area defined manually by the user; only the points of interest contained in this window are used during the retrieval step. The retrieval results are then presented by decreasing order of vote. The database effectively contains two other objects characterized by these particular blue leaves and belonging to the query series.

Figure 4: An example of pattern retrieval: interactive query selection by the user, represented by the blue rectangular area in the top-left image. Random view of the antiques image database content (left). The partial query results, by decreasing similarity value (right).

Background retrieval in different contexts: In the scenario of figure 5 below, we want to retrieve the images of a TV series that contain a particular background selected by the user. We focused on the upper-left part of the image, which partially shows something like a wine storeroom. The retrieval was performed on this particular region, described by about thirty interest points, as shown on the second image of the figure. The matching results were sorted by decreasing image score, and the best ones are presented on the second row of the figure. The query area has been retrieved in five images involving the same room but with different characters. The resulting images differ from the query image in global shape and color and in viewpoint, and present some occlusion. Global indexing approaches would naturally not give interesting results for this class of query. Approaches based on region segmentation would not allow the user to query this part, since it consists of small regions that are not easily detectable.

The scenario presented here is currently used by the French Judicial Police as an investigation aid based on image similarity retrieval. Other examples of sub-image retrieval with points of interest can be seen at [2].

¹ http://www.faccents.com/
² http://www.gazette-drouot.com/vols.html
³ http://www.gendarmerie.defense.gouv.fr/judiciaire/
⁴ Images presented in this section are provided by the "French Accents" company (http://www.faccents.com/).


Figure 5: A scenario of background retrieval with different foreground visual information.

2.3 Discussion of local image signature typology

In sections 2.1 and 2.2 we proposed two approaches to characterize and retrieve image parts: regions and points of interest. Both let the user select an image area and retrieve images that contain a similar area, but they differ substantially in visual representation and usage, and their complementarity is an asset for satisfying different partial query scenarios.

In the first case, a region is defined as an area of connected pixels with similar local description (the LDQC), computed in an off-line, unsupervised phase. Regions are coarse and correspond to dominant, homogeneous areas in images. Their description takes into account all pixels of the region by measuring their color distribution (with ADCS): it is a statistical description, i.e. the smaller and fewer the details, the lower their contribution to the region description. The statistical nature of the description allows an approximate search based on the overall appearance of detected regions.

In the second case, points are detected with the HCP detector and exist only at very small sites (a few pixels wide) that present high photometric variability; uniform image sites are therefore ignored. These points are then described by Hilbert differential invariants, which characterize this photometric variability. The user selection encompasses the set of detected points, which become the query points. The voting scheme matches them with the best candidate points in each image; areas of interest are thus defined interactively. The pointwise matching scheme and the flexibility of the area definition make point retrieval computationally expensive.

In figure 6, regions and points were detected on the same three images. We notice that in uniform parts (sky, snow) and smooth parts (green blurred background) no points are detected, while segmentation easily detects these parts. Conversely, points appear on large textured parts (grape stack, vegetation) and on details (wine bottles).


Figure 6: Local features: region segmentation and point-of-interest extraction on a particular scene. From left to right: original images, segmented images, images with region boundaries superimposed on the originals, extracted points of interest. While points of interest detect image sites of high spatial frequency, regions are expected to detect large areas that are homogeneous with respect to a photometric primitive.


The two objective differences between the approaches are the flexibility of area definition, in favor of the point approach, and a much lower retrieval cost, in favor of the region approach.

Other differences depend on the nature of the searched areas. If the user's focus is on large, homogeneous areas, the region approach should be preferred, since detected points will be non-existent or will correspond to noise. Conversely, to search for very small areas with characteristic details, the region approach is irrelevant: segmentation may not have detected such small areas, and even if it has, their description won't be relevant (see the wine cellar scenario). Typically, a precise search (i.e. with a fine level of visual detail) on salient details should be carried out with points. The usage of each approach should thus be motivated by the user's target precision requirements. Otherwise, if the nature of the searched areas is not clearly defined in advance, the region approach should be used because of its faster response. Point-based retrieval is more precise but much more costly at the same time. Our ongoing research work concerns the scalability of this approach.

3 Mental image search by region composition from a visual thesaurus

Be they global or partial, existing CBIR systems all require a starting image or region example to perform a search. This approach is well suited to visual comparison between a given example and entries in the database, i.e. to answering queries such as: "find images/regions in the database similar to this image/region". But very often the user doesn't have an example image to start the search. When the target image is only a mental image in the user's mind, the prior search for an example with which to perform the actual query by example is tedious, especially in the case of a multiple-region query.

A new framework presented in [12] differs completely from this paradigm in both the query and the retrieval processes. Images are retrieved by logical composition of region categories. Categories of similar regions are generated by an off-line, unsupervised process, and a "photometric region thesaurus" of the database is derived from these categories. From this thesaurus, in the query interface, the user can select the types of regions that should and should not appear in the mental target image. This allows retrieving images very quickly from logical queries as complex as: "find images composed of regions of these types and no regions of those types".

3.1 Categorization and range query in the region feature space

The database structure is based on the following principle. Images are first segmented by classification of LDQCs, as described in section 2.1. All extracted regions in the database are indexed by a visual descriptor. We define the region categories (denoted $C_1, \ldots, C_P$) as the clusters of regions with similar visual features; they are the basis of the definition of similar regions in the retrieval phase. Here we choose to characterize regions by their

mean color, such that regions from the same category have similar mean color. It is important to note that other visual cues could be used, such as color distribution, texture, position, area or some specific descriptor; if a texture descriptor were used instead of mean color, each category would be expected to group regions of similar texture. Despite the simplicity of the mean color description, we will see that it is sufficient to form generic categories. Region mean colors are determined in the Luv space, chosen for its perceptual uniformity. We cannot make a priori assumptions concerning the well-definedness of region clusters for an arbitrary database, but an intra-category visual coherence can always be guaranteed by setting a fine clustering granularity.

Region categories are formed by grouping the region descriptors with CA at a fine granularity. For each region category, its representative region is defined as the region closest to its prototype. Representative regions are used to identify each category in the query interface. Since similarity between regions will be defined, at a first level, as membership in the same category, a fine clustering granularity ensures the retrieval of very similar regions (hence high retrieval precision). At a second level, we also consider as similar the regions in close categories (called "neighbor categories"), to also allow high recall. This key idea allows achieving range queries in the region feature space. A neighbor category of a category $C_q$ with prototype $p_q$ is defined as a category $C_j$ whose prototype $p_j$ satisfies $\|p_q - p_j\|_{L_2} < \gamma$, for a given range radius threshold $\gamma$ (adjusted at query time). We call $N_\gamma(C_q)$ the set of neighbor categories of a category $C_q$. See figure 7 for an illustration of the definition of neighbors using the radius. Note that thanks to this range query scheme, the search is less dependent on the partition of the database into categories, since all close categories are considered together.

Figure 7: Range radius and neighbor categories: A and B are two categories. Neighbor categories are drawn with thick contours. Prototypes are identified by crosses. The gray disks cover the neighbor categories. A higher radius (top) or a lower radius (bottom) includes more or fewer neighbor categories in the definition of the type of searched regions.

The combination of homogeneous region categories with the integration of neighbor categories is the key choice in the definition of the range query scheme.
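A minimal sketch of the neighbor-category test, assuming category prototypes are stored as an array of Luv mean colors indexed by category label:

```python
import numpy as np

def neighbor_categories(prototypes, q, gamma):
    """Return the labels of the neighbor categories N_gamma(C_q):
    categories whose prototype lies within L2 distance gamma of
    category q's prototype (q itself is included by this definition).

    prototypes: (P, 3) array of category prototypes in Luv (assumed layout)."""
    dists = np.linalg.norm(prototypes - prototypes[q], axis=1)
    return set(np.flatnonzero(dists < gamma))
```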

3.2 Image Retrieval by Composition

From this point on, regions are no longer considered individually: each is fully identified with the category it belongs to. With the help of the category representative regions (see figure 8), the user selects Positive Query Categories (PQCs) and Negative Query Categories (NQCs). The PQCs are the user-selected categories of regions that should appear in retrieved images; they are denoted $\{C_{PQ_1}, \ldots, C_{PQ_M}\}$. The NQCs are the user-selected categories of regions that should not appear in retrieved images; they are denoted $\{C_{NQ_1}, \ldots, C_{NQ_R}\}$. In its most complex form, a query composition is the formulation: "find images composed of regions in these PQCs and no region from those NQCs". It is expressed as the list of PQC labels $\{pq_1, \ldots, pq_M\}$ and NQC labels $\{nq_1, \ldots, nq_R\}$.

Performing a query composition first requires retrieving the images that contain a region from a single PQC category, say $C_{pq}$. For a given category $C_{pq}$, we define $IC(C_{pq})$ as the set of images containing at least one region belonging to category $C_{pq}$. To expand this search into a range query, we also take into account the neighbor categories of $C_{pq}$ by defining the relevant images as those that have a region from category $C_{pq}$ or from any of its neighbors:

$$\bigcup_{C \in N_\gamma(C_{pq})} IC(C) \qquad (2)$$

The range radius threshold $\gamma$ is set in the user interface.

To extend the query to all $M$ PQCs $C_{pq_1}, \ldots, C_{pq_M}$, we search for images that have a region in $C_{pq_1}$ or its neighbors, and ..., and a region in $C_{pq_M}$ or its neighbors. The set $S_Q$ of images satisfying this multiple query is written as:

$$S_Q = \bigcap_{i=1}^{M} \Bigg[ \bigcup_{C \in N_\gamma(C_{pq_i})} IC(C) \Bigg] \qquad (3)$$

Then, to also satisfy the negative query, we must determine the images that contain a region from any of the $R$ NQCs $C_{nq_1}, \ldots, C_{nq_R}$. As before, neighbor categories should also be taken into account, so the set $S_{NQ}$ of images containing the NQCs is written as:

$$S_{NQ} = \bigcup_{i=1}^{R} \Bigg[ \bigcup_{C \in N_\gamma(C_{nq_i})} IC(C) \Bigg] \qquad (4)$$

The set $S_{result}$ of retrieved images that have regions in the different PQCs and no regions in the NQCs is expressed as the set subtraction of $S_Q$ and $S_{NQ}$:

$$S_{result} = S_Q \setminus S_{NQ} \qquad (5)$$

This set $S_{result}$ constitutes the final set of relevant images.

The unions, intersections and subtractions in the expression of $S_{result}$ are directly equivalent to formulating the query with logical operators, as illustrated in figure 10: or between neighbors (in the expressions of $S_Q$ and $S_{NQ}$), and between query categories (also in $S_Q$ and $S_{NQ}$), and not for the negative query categories (in the expression of $S_{result}$).

To evaluate the expression of $S_{result}$, the brute-force approach would consist in testing, for each image in the database, whether it contains regions belonging to the PQCs (and their neighbors) but no region from the NQCs (and their neighbors). Instead, to reduce this number of tests dramatically in a simple way, we use the fact that $S_{result}$ is expressed as intersections and subtractions of image sets. The idea is to initialize $S_{result}$ with one of the image sets and then discard the images that don't belong to the other image sets. This initialization avoids testing each image of the database individually: we directly start off with a set of potentially relevant images. $S_{result}$ is gradually reduced as follows:

1. initialize $S_{result}$ as the set $\bigcup_{C \in N_\gamma(C_{pq_1})} IC(C)$;

2. discard the images in $S_{result}$ that do not belong to any of the other union categories ($i = 2, \ldots, M$) to obtain the intersections of $S_Q$. At this point, $S_{result} = S_Q$;

3. to perform the subtraction of $S_{NQ}$ from $S_{result}$, discard from $S_{result}$ the images that belong to the negative-query union categories ($i = 1, \ldots, R$). We get $S_{result} = S_Q \setminus S_{NQ}$.

Gradually, $S_{result}$ is reduced from $\bigcup_{C \in N_\gamma(C_{pq_1})} IC(C)$ to $S_Q \setminus S_{NQ}$. With this approach, as we will see in the next section, a significant fraction of the database is not accessed at all.

This retrieval scheme is easily implemented using three association tables, which map between categories, neighbor categories and images (a sketch follows below). It is important to note that at retrieval time we do not deal with the regions themselves but only with images and region category labels, so we never individually access the large number of regions in the database. The search process is very fast since it only involves elementary operations on integers, unlike classic search approaches, which require distance computations between multidimensional feature vectors.
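Here is a minimal sketch of this gradual-reduction evaluation, using Python dictionaries of sets for the association tables; the table layout is an assumption, and `neighbor_categories` from section 3.1 could be used to precompute the neighbor table.

```python
def compose_query(pqcs, nqcs, images_of_category, neighbors_of_category):
    """Evaluate S_result = S_Q minus S_NQ by gradual reduction (steps 1-3).

    images_of_category: dict label -> set of image ids (the IC table).
    neighbors_of_category: dict label -> set of labels (N_gamma, including
    the category itself). Both tables are precomputed off-line."""
    def ic_union(label):  # eq. (2): images of the category or its neighbors
        return set().union(*(images_of_category.get(c, set())
                             for c in neighbors_of_category[label]))

    # Step 1: initialize with the first positive category's image set.
    result = ic_union(pqcs[0])
    # Step 2: keep only images present for every other PQC (eq. (3)).
    for label in pqcs[1:]:
        result &= ic_union(label)
    # Step 3: discard images containing any NQC region (eqs. (4)-(5)).
    for label in nqcs:
        result -= ic_union(label)
    return result
```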

3.3 Results and query interaction

On a database of 9,995 images from Corel Photostock⁵, 50,220 regions were extracted by segmentation. From these regions, 91 categories were generated by grouping their mean colors.

In the query interface (see figure 8), each category of regions is represented by its representative region, defined as the region whose index is closest to the category prototype. Figure 8 shows part of the 91 representative regions. The set of category representatives provides an overview of the types of regions; this set defines the photometric region thesaurus of the database. Any region category can be selected through its representative region to indicate that this type of region should or should not appear in the mental target image. Given the set of selected categories (see figure 9), the query is translated by the system into a logical expression of composition of region categories (see figure 10).

Figure 8: In the query interface, the user can select each of the 91 category representatives as a PQC or an NQC. A tick in a green box indicates that this type of region should appear in the mental target image; a tick in a red box indicates that the type should not appear. This interface constitutes the "photometric region thesaurus".

Figure 9: A query example: the user wants images with a blue region and a gray region but no green region. The range radius γ is set to 15.

Figure 10: The query of figure 9 is translated by the system into a logical expression of composition of region categories, with the neighbor categories corresponding to each query category.

⁵ Corel Photostock: http://www.corel.com

Consider an example of query composition. In the Corel database, to search for cityscapes, the user will look for images with a building, some sky, and no vegetation. This can be translated into the following query composition: "images composed of a gray region and a blue region but no green region". Figures 9 and 10 illustrate this query, and figure 11 (cropped screenshot) shows the relevant set of images retrieved for this query among the 9,995. In these images, gray regions match buildings, monuments or rocks, and blue regions essentially match sky. The system also shows the images that were rejected due to the presence of a green region (see figure 12). It is interesting to notice that, for this cityscape query, the rejected images correspond to landscapes (figure 12). We observed that visual semantics arises from the logical composition expressed by the user.
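With the `compose_query` sketch from section 3.2, this cityscape query would reduce to a call like the following (the category labels and table contents are invented for illustration):

```python
# Hypothetical association tables for a toy 4-image database.
images_of_category = {
    "gray":  {1, 2, 3},
    "blue":  {1, 2, 4},
    "green": {2, 4},
}
# With a small enough gamma, each category is its own only neighbor.
neighbors_of_category = {c: {c} for c in images_of_category}

# "gray region AND blue region AND NOT green region":
result = compose_query(["gray", "blue"], ["green"],
                       images_of_category, neighbors_of_category)
print(result)  # {1}: the only image with gray and blue regions but no green one
```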

Images retrieved with such a simple system do satisfy the constraints of region composition. Its performance relies on the ability of the segmentation scheme to correctly detect salient regions and on the nature of the descriptor (from which regions are grouped), which should be chosen with respect to the application domain. Note that category and range radius selection can help the user interactively improve retrieval performance depending on his satisfaction.

The retrieval scheme relies solely on accesses to three association tables; no numerical feature distance is involved at this step (only in the off-line grouping phase). Retrieval is hence very fast: 0.03 seconds for the most complex queries on a standard 498 MHz PC. Besides, note that the search in the database is exhaustive neither in terms of images, nor of region categories, nor of regions, because the database organization provides direct access to the set of potentially relevant regions.

Figure 11: Retrieved images for the "cityscape" query: "gray region and blue region and not green regions". Images do contain a blue and a gray region but no green region.

Figure 12: Images rejected from the "cityscape" query due to the presence of a green region. Rejected images turn out to depict natural landscapes.

Viewing categories as labels of similar regions, which are the constituting units of the database images, this indexing can be considered symbolic rather than numerical. The following comparison with text retrieval can be made:

- image → document
- region → term
- region category → concept
- neighbor categories → similar/synonymous concepts
- set of region categories → thesaurus
- query by logical composition → Google-like query (http://www.google.com)

3.4 Summary of the new paradigm

This new paradigm allows performing logical composition of region categories: the system retrieves images from query compositions like "find images with regions of these types and none of those types". The originality of this approach lies in the grouping of similar regions into categories, and it has the following advantages:

- no starting example region required


- query by image composition using region categories
- natural region range query by interactive definition of neighbor categories
- efficient indexing and very fast image retrieval

Although a very simple color region feature is used, the constraint of composition in retrieved images seems to express some underlying "visual semantics" in images.

This framework is very simple and general. It can lead to further developments such as a more perceptual arrangement of categories in the query interface, the integration of other region descriptors, the handling of the spatial layout of categories, a hierarchical region categorization to handle very large databases, and the association of visual category prototypes with a textual ontology to allow visual-semantic search.

4 Conclusion

The most classical visual query paradigm, QBVE, where the visual query concerns the whole image, was useful to prove the feasibility of information retrieval by visual content. That being established, we can address everyday user requirements beyond the simple QBVE query formulation.

We have presented some new user visual query formulation mechanisms. In all cases, these mechanisms allow the user a precise visual target selection and the expression of the visual target image composition, including his preferences, through logical composition. The last case concerns visual information retrieval when no starting image example is available, only a mental image in the user's mind. The main goal is to fit the semantic target of the user ever more closely; hence, the user interacts more and more with the system by means of new query paradigms.

Ongoing work deals with scalability issues (particularly for the point-of-interest description) and with the combination of different local image signatures (regions handling spatial layout, and point features). Hybrid text and image indexing is also heavily investigated to help bridge the semantic gap, particularly by connecting a textual ontology with the region prototypes in the visual thesaurus.

References

[1] http://www-rocq.inria.fr/~fauqueur/ADCS/
[2] http://www-rocq.inria.fr/~gouet/HCP/
[3] A. Baumberg. Reliable feature matching across widely separated views. IEEE Computer Vision and Pattern Recognition (CVPR), pages 774-781, 2000.
[4] N. Boujemaa, J. Fauqueur, M. Ferecatu, F. Fleuret, V. Gouet, B. Le Saux, and H. Sahbi. Ikona: Interactive generic and specific image retrieval. International Workshop on Multimedia Content-Based Indexing and Retrieval (MMCBIR'2001), Rocquencourt, France, pages 25-28, 2001.
[5] N. Boujemaa, J. Fauqueur, and V. Gouet. What's beyond query by example? To appear in Trends and Advances in Content-Based Image and Video Retrieval, L. Shapiro, H.P. Kriegel, R. Veltkamp (eds.). LNCS, Springer Verlag, 2004.
[6] N. Boujemaa, V. Gouet, and M. Ferecatu. Approximate search vs. precise search by visual content in cultural heritage image databases. Invited paper, MIR workshop in conjunction with ACM Multimedia, Juan-les-Pins, France, 2002.
[7] C. Vertan and N. Boujemaa. Upgrading color distributions for image retrieval: can we do better? Proc. of the International Conference on Visual Information Systems (VIS'00), pages 178-188, 2-4 Nov. 2000.
[8] C. Carson et al. Blobworld: A system for region-based image indexing and retrieval. Proc. of the International Conference on Visual Information Systems, LNCS vol. 1614, pages 509-517, 1999.
[9] M. La Cascia, S. Sethi, and S. Sclaroff. Combining textual and visual cues for content-based image retrieval on the world wide web. IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'98), June 1998.
[10] A. Del Bimbo. Visual Information Retrieval. Morgan Kaufmann, San Francisco, CA, 1999.
[11] J. Fauqueur and N. Boujemaa. Region-based retrieval: Coarse segmentation with fine signature. IEEE International Conference on Image Processing (ICIP), pages 609-612, 2002.
[12] J. Fauqueur and N. Boujemaa. New image retrieval paradigm: logical composition of region categories. IEEE International Conference on Image Processing (ICIP), pages 601-604, 2003.
[13] J. Fauqueur and N. Boujemaa. Region-based image retrieval: Fast coarse segmentation and fine color description. Journal of Visual Languages and Computing (JVLC), special issue on Visual Information Systems, 15(1):69-95, 2004.
[14] H. Frigui and R. Krishnapuram. Clustering by competitive agglomeration. Pattern Recognition, 30(7):1109-1119, 1997.
[15] D. Geman and R. Moquet. A stochastic model for image retrieval. Congrès Francophone de Reconnaissance des Formes et Intelligence Artificielle (RFIA), Paris, pages 173-180, 2000.
[16] V. Gouet and N. Boujemaa. Object-based queries using color points of interest. IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'01), 2001.
[17] V. Gouet and N. Boujemaa. On the robustness of color points of interest for image retrieval. IEEE International Conference on Image Processing (ICIP), Rochester, USA, 2002.
[18] V. Gouet, P. Montesinos, R. Deriche, and D. Pelé. Evaluation de détecteurs de points d'intérêt pour la couleur. Congrès Francophone de Reconnaissance des Formes et Intelligence Artificielle (RFIA), Paris, 2000.
[19] J. Hafner, H. Sawhney, et al. Efficient color histogram indexing for quadratic form distance functions. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 17(7):729-736, July 1995.
[20] C. Harris and M. Stephens. A combined corner and edge detector. Alvey Vision Conference, pages 147-151, 1988.
[21] I. J. Cox, M. L. Miller, and T. P. Minka. The Bayesian image retrieval system, PicHunter: Theory, implementation and psychological experiments. IEEE Transactions on Image Processing, 9(1):20-37, 2000.
[22] W. Y. Ma and B. S. Manjunath. NeTra: A toolbox for navigating large image databases. Multimedia Systems, 7(3):184-198, 1999.
[23] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. International Conference on Computer Vision (ICCV), pages 525-531, 2001.
[24] P. Montesinos, V. Gouet, and R. Deriche. Differential invariants for color images. Proceedings of the 14th International Conference on Pattern Recognition (ICPR'98), Brisbane, Australia, 1998.
[25] F. Schaffalitzky and A. Zisserman. Multi-view matching for unordered image sets. European Conference on Computer Vision (ECCV), pages 414-431, 2002.
[26] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 19(5):530-535, 1997.
[27] A. Smeulders, M. Worring, and S. Santini. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(12), 2000.
[28] J. R. Smith and S. F. Chang. VisualSEEk: A fully automated content-based image query system. ACM Multimedia Conference, Boston, MA, USA, pages 87-98, 1997.

Contents

1 Problem statement
2 Partial visual selection for precise query
  2.1 Regions of Interest
    2.1.1 Region extraction
    2.1.2 Region description and retrieval
  2.2 Points of Interest
    2.2.1 A local color image description
    2.2.2 Retrieval strategy
    2.2.3 Examples of retrieval
  2.3 Discussion of local image signature typology
3 Mental image search by region composition from a visual thesaurus
  3.1 Categorization and range query in the region feature space
  3.2 Image Retrieval by Composition
  3.3 Results and query interaction
  3.4 Summary of the new paradigm
4 Conclusion
