
Leveraging Non-Relevant Images To Enhance Image Retrieval Performance

Ashwin T. V.
IBM India Research Lab, Indian Institute of Technology, New Delhi, India
vashwin@in.ibm.com

Rahul Gupta
IBM India Research Lab, Indian Institute of Technology, New Delhi, India
rahulgup@in.ibm.com

Sugata Ghosal
IBM India Research Lab, Indian Institute of Technology, New Delhi, India
gsugata@in.ibm.com

ABSTRACT

Inherent subjectivity in a user's perception of an image has motivated the use of relevance feedback (RF) in the image retrieval process. RF techniques interactively determine the user's desired output or query concept based on the user's relevance judgments on a set of images. Either a parametric or a non-parametric model for the user's query concept is typically employed. The parameters of the assumed model are estimated and refined based on the user's relevance feedback. Parametric models offer higher robustness in the estimation of parameters when the size of the user's feedback is small. Non-parametric models permit easy incorporation of non-relevant images. Most parametric model based RF algorithms use only relevant images to refine the parameters of a distance metric. Consequently, images in the database close to the non-relevant images continue to be retrieved in further iterations.

In this paper we propose a robust technique that utilizes non-relevant images to efficiently discover the relevant search region. A similarity metric, estimated using the relevant images, is then used to rank and retrieve database images in the relevant region. A decision surface is determined to split the feature space into relevant and non-relevant regions. The decision surface is composed of hyperplanes, each of which is normal to the minimum distance vector from a non-relevant point to the convex hull of the relevant points. Experimental results demonstrate significant improvement in retrieval performance in the small feedback size scenario over two well established RF algorithms.

Keywords

Image retrieval, relevance feedback, similarity search, non-relevant judgment, ellipsoid query processing

1. INTRODUCTION

Relevance feedback (RF) based image retrieval algorithms employ a learning algorithm, or the learner, to estimate the user's query concept given the user's relevance judgments on certain proposed images. The design of an effective learning algorithm is crucial for the success of an RF based retrieval system. The learner utilizes a learning model of the user's query concept. The parameters of the model are estimated and refined based on the user's relevance feedback.

Existing learning models can broadly be classified into two categories: (a) parametric models and (b) non-parametric models. Though both kinds of models have parameters to be estimated, they can be distinguished by the influence each parameter has on the model's behavior in the feature space. The parameters of a parametric model have a global influence in the feature space, i.e., a change in any parameter influences the structure of the model over the entire feature space. In contrast, parameters in a non-parametric model have a markedly local influence, i.e., modifying a parameter changes the model's structure in only a small portion of the feature space. With large and representative training data, non-parametric methods outperform parametric methods and hence are regularly employed in pattern classification applications. Further, non-parametric methods permit easy incorporation of non-relevant images. However, online image retrieval applications differ from pattern classification applications in that a human user cannot be expected to label a large number of images. Hence, the ability of the learning algorithm to estimate the user's query concept at points in the feature space far from the training points is essential. Since parametric models aggregate all information available in the user's RF into a few parameters, rather than distribute the information across the feature space, parametric models can be far more robustly estimated even when the size of the RF is small. Based on these observations, in this paper we propose a robust means of incorporating non-relevant images (or negative examples) in a parametric model.

1.1 Summary of our contributions

Our key contribution in this paper is the use of non-relevant judgments to delineate the relevant region in the feature space, thereby ensuring that the relevant region does not contain any non-relevant images. A similarity metric estimated using the relevant images is then used to rank and retrieve database images in the relevant region. This leads to a significant improvement (up to 20%) in the precision of the retrieval process, as demonstrated in our experiments. The partitioning of the feature space is achieved by using a piecewise linear decision manifold that separates the relevant and non-relevant images. Each of the hyperplanes constituting the decision surface is normal to the minimum distance vector from a non-relevant point to the convex hull of the relevant points. Our algorithm robustly estimates the hyperplanes that constitute the decision manifold even when the size of feedback is small. The relevant region is obtained as the intersection of halfspaces and hence forms a convex subset of the feature space. This ensures that we can use a quadratic distance metric (described in Section 2) to rank and retrieve database images inside the relevant region.

The rest of the paper is organized as follows. In Section 2, we analyze several existing parametric and non-parametric techniques. In Section 3, we formulate and describe the solution of the query learning problem to estimate a parametric model using both relevant and non-relevant images. Section 4 presents details of our proposed algorithm. In Section 5 we report experimental results showing that our proposed algorithm performs significantly better than MARS [1] and Support Vector Machine (SVM) [2] based RF algorithms. Finally, we present our conclusions in Section 6.

2. RELATED WORK

In this section, we briefly summarize some of the recently proposed RF algorithms. In parametric model based RF algorithms, the quadratic distance metric given by

$$D(\vec{x}, \vec{q}, Q) = (\vec{x} - \vec{q})^T Q (\vec{x} - \vec{q}) \qquad (1)$$

and its corresponding probability density function, the multivariate normal density function given by

$$p(\vec{x} \mid \vec{q}, Q) = \frac{|Q|^{1/2}}{(2\pi)^{d/2}} \exp\!\left(-\tfrac{1}{2} (\vec{x} - \vec{q})^T Q (\vec{x} - \vec{q})\right) \qquad (2)$$

are commonly used. The parameters $\vec{q}$ (query center) and $Q$ (inverse of the cross-correlation matrix) are estimated from the relevant images. MARS [1], one of the earliest RF systems, uses a diagonal $Q$. Each element of the diagonal determines the weight assigned to a particular feature in the distance metric. These weights are computed as the inverse of the variance of the feature among all relevant images, $Q_{ii} = 1/\sigma_i^2$. The estimated query center $\vec{q}_{new}$ is given by

$$\vec{q}_{new} = \alpha\, \vec{q}_{prev} + \frac{\beta}{|G|} \sum_{\vec{x}_i \in G} \vec{x}_i - \frac{\gamma}{|B|} \sum_{\vec{y}_i \in B} \vec{y}_i \qquad (3)$$

where $G$, $B$ represent the sets of relevant and non-relevant images, respectively. The heuristically designed expression for the query center $\vec{q}$ is not robust in practice. In MindReader [3], Ishikawa et al. used a full matrix $Q$ and formulated the estimation of the parameters as an optimization problem. The parameters of the optimal solution correspond to the Maximum Likelihood Estimate (MLE) for the multivariate normal distribution given the set of relevant images. Non-relevant images are not used in the estimation process. Rui and Huang [4] proposed a two-level representation scheme for the $Q$ matrix to better cope with the singularity issues due to a small number of relevant images. Non-relevant images were again not considered in their formulation. Aksoy and Haralick [5] use two normal distributions, $p_{rel}(\vec{x}|\vec{q}_r, Q_r)$ and $p_{nrel}(\vec{x}|\vec{q}_n, Q_n)$, to model the distributions of relevant and non-relevant images. A likelihood score $L$ is computed for each image in the database as

$$L(\vec{x}) = \frac{p_{rel}(\vec{x} \mid \vec{q}_r, Q_r)}{p_{nrel}(\vec{x} \mid \vec{q}_n, Q_n)} \qquad (4)$$

[Figure 1: Example profile of the likelihood score L in Equation 4 for a 1-D case. The plot shows the probability densities of relevant and non-relevant examples ($p_{rel}$, $p_{nrel}$) and the log-likelihood score $\log(p_{rel}) - \log(p_{nrel})$.]

Images having high likelihood scores are displayed to the user. It is easy to see that even though the probabilities obtained through maximum likelihood estimates are well behaved, the likelihood score behaves unpredictably owing to the probability term in the denominator. In Figure 1, we illustrate, for a simple 1-D case, the profile of the log-likelihood score. It is clear from the figure that though the likelihood score preserves the distribution of non-relevant images, the distribution of relevant images is lost. Hence points in feature space with high log-likelihood scores may be quite distant from the relevant examples.
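This behavior is easy to reproduce numerically. Below is a minimal 1-D sketch of the log form of Equation 4; the means and unit variances are illustrative values of our choosing, not parameters from [5]:

```python
import numpy as np
from scipy.stats import norm

# 1-D illustration of Equation 4: relevant examples modelled by a
# Gaussian at 0, non-relevant examples by a Gaussian at 4.
p_rel = norm(loc=0.0, scale=1.0)
p_nrel = norm(loc=4.0, scale=1.0)

for x in [-10.0, 0.0, 4.0]:
    score = p_rel.logpdf(x) - p_nrel.logpdf(x)  # log-likelihood score
    print(f"x = {x:6.1f}   score = {score:6.1f}")
# Scores: 48.0, 8.0, -8.0. The point x = -10 is far from every relevant
# example, yet it receives by far the highest score: the denominator
# p_nrel shrinks faster than p_rel as we move away from the non-relevant
# side, so the ratio grows without bound.
```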

In the machine learning literature, decision trees [6] are routinely used to achieve a partitioning of the feature space. Inducing a decision tree on the relevant and non-relevant images may result in multiple disconnected relevant regions. MacArthur et al. [7] use a single Euclidean distance metric to rank and retrieve images from multiple relevant regions. Estimating a similarity metric independently in each relevant region would be difficult given the small number of relevant judgments in each of the relevant regions.

Among the recent non-parametric techniques, Wu et al. [8] (D-EM algorithm) use an expectation maximization (EM) step to label a few unlabelled examples in the database and hence increase the size of the training set. The EM step is followed by a Multiple Discriminant Analysis (MDA) step on the expanded training data to obtain the optimal directions for projection. The parameter estimates of the EM step after a few iterations are used to classify and rank images in the database. The computation time for the D-EM algorithm is of major concern since it involves repeated EM steps on images in the database. Kernel based learning algorithms [9] have recently received a lot of attention for solving classification problems. The basic idea of a kernel based approach is to map the feature vectors in the input space into a higher dimensional space via the kernel operator. Zhou and Huang [10] have adopted a kernel-based approach to their Biased Discriminant Analysis algorithm to perform non-linear discrimination in the input feature space. They report significant improvement in classification performance using a relatively large number (100) of relevant examples. Support Vector Machines (SVM) [2] have emerged as the most promising of the kernel based classification algorithms. In the context of RF, Tong and Chang [11] use SVM to classify the relevant images from the non-relevant images. They use an active learning based strategy to select the most informative images to solicit the user's feedback. Though it is theoretically possible to compute the optimal set of images to request feedback from the user, the authors observe that the computational requirements are exponential in the number of images shown [11]. To overcome this, they propose that images close to the classification boundary be shown to the user. For a comprehensive review of RF, readers are referred to [12].

Table 1: Symbols and their definitions.

$D = \{\vec{x}_1, \vec{x}_2, \ldots\}$ : Feature vectors in the dataset.
$d$ : Feature space dimension.
$G = \{\vec{g}_1, \vec{g}_2, \ldots\}$ : Feature vectors of images marked relevant by the user.
$\vec{v}$ : Goodness scores assigned by the user to relevant images. If the user only identifies relevant images, set $\vec{v} = \vec{1}$.
$B = \{\vec{b}_1, \vec{b}_2, \ldots\}$ : Feature vectors of images marked non-relevant by the user.
$k$ : Number of images displayed to the user in each iteration to solicit further feedback.
$\vec{q}_{curr}$, $\vec{q}_{opt}$ : Query center for the current iteration; optimal solution with the current set of feedback.
$Q_{curr}$, $Q_{opt}$ : Inverse covariance matrix for the current iteration; optimal solution with the current set of feedback.
$D(\vec{x}, \vec{q}, Q)$ : Quadratic distance function with parameters $(\vec{q}, Q)$, defined in Equation 5.

3. PROBLEM FORMULATION

Relevance judgments provided by the user at each iteration of relevance feedback constitute a relevant set of images $G$ and a non-relevant set of images $B$. If the user provides different degrees of desirability for the relevant images, then this information is available as a vector of goodness scores $\vec{v}$. If the user only marks relevant images, then the goodness scores for all relevant images are set to 1. Symbols used in this paper and their definitions are listed in Table 1.

Let $\vec{x}$ and $\vec{q}$ represent the feature vectors corresponding to an image in the database and the estimated center of the query, respectively. A quadratic distance function to measure the distance of $\vec{x}$ from $\vec{q}$ is defined as

$$D(\vec{x}, \vec{q}, Q) = (\vec{x} - \vec{q})^T Q (\vec{x} - \vec{q}) \qquad (5)$$
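For concreteness, Equation 5 is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def quadratic_distance(x, q, Q):
    """Quadratic distance of Equation 5. Q = I gives plain Euclidean
    distance; a diagonal Q gives the weighted metric of Equation 22."""
    diff = x - q
    return diff @ Q @ diff
```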

MindReader [3] estimates the parameters $(\vec{q}, Q)$ to minimize the total distance of images in the relevant set $G$. This can be written as

$$\min_{\vec{q}, Q} \sum_{\vec{x}_i \in G} v_i D(\vec{x}_i, \vec{q}, Q) \qquad (6)$$

Subject to:

$$\det(Q) = 1. \qquad (7)$$

Consider the ellipsoid E defined by the estimated parameters $(\vec{q}_{opt}, Q_{opt})$ and "radius" equal to the largest value of the $D_{opt}$ distance for relevant images. The images displayed to the user to obtain the next set of RF are obtained by expanding E till k images are enclosed inside the ellipsoid. However, during this expansion, the exclusion of non-relevant images is not guaranteed.

To ensure that non-relevant images are not retrieved, we formulate a new optimization problem with additional constraints as follows (abbreviating $D(\vec{x}, \vec{q}, Q)$ as $D(\vec{x})$):

$$\min_{\vec{q}, Q} \sum_{\vec{x}_i \in G} v_i D(\vec{x}_i) \qquad (8)$$

Subject to:

$$\forall \vec{x} \in G, \; D(\vec{x}) \le c \qquad (9)$$

$$\forall \vec{x} \in B, \; D(\vec{x}) > c \qquad (10)$$

$$|\{\vec{x} : \vec{x} \in D, \; D(\vec{x}) \le c\}| \ge k \qquad (11)$$

$$c > 0 \qquad (12)$$

$$\det(Q) = 1. \qquad (13)$$

Let the optimal solution be $(\vec{q}_{opt}, Q_{opt})$ and the optimal distance function with parameters $(\vec{q}_{opt}, Q_{opt})$ be $D_{opt}$. Consider the ellipsoid E defined by $(\vec{q}_{opt}, Q_{opt})$ with radius $c_{opt}$. Equations 9 and 10 ensure that E partitions the feature space with all relevant images inside and all non-relevant images outside. Equation 11 ensures that E encloses at least k images to present to the user. Equations 10 and 11 together ensure that non-relevant images are not shown to the user in the next iteration. The minimization of the distances of relevant images ensures that relevant images are ranked higher than other images in the database.

Although the above formulation suffices to utilize non-relevant information effectively, it involves solving a quadratic optimization problem with quadratic constraints, so a straightforward solution of Equation 8 is difficult to obtain. Given the quadratic nature of the constraints, the feasible region for the parameters $(\vec{q}_{opt}, Q_{opt})$ is likely not convex, leading to numerous local minima. Also, Equation 11 involves images from the database; when this constraint is expanded, an additional constraint is generated for each image in the database. This makes the problem expensive to solve in its current form.

To decrease the computation required, we simplify the above formulation as follows. The problem is first split into two independent subproblems:

Subproblem 1. Find a decision boundary that separates the relevant from the non-relevant images. The boundary should be sufficiently close to the non-relevant images to maximize the size of the relevant region.

Subproblem 2. Find a distance function that minimizes the total distance of relevant images.

We approximate the decision boundary by a piecewise linear manifold, i.e., the decision surface is composed of a number of hyperplanes. This reduces computation time and also allows the convexity constraint for the relevant region to be easily incorporated. The second subproblem is the same as MindReader's formulation and hence their results hold in this case.

It is easy to see that the convex hull (CH) of the relevant images is one of the many possible piecewise linear manifolds that satisfy Equations 9 and 10. To satisfy Equation 11 we need to "expand" the convex hull so as to obtain k images inside. Note that during expansion we must also ensure that Equations 9 and 10 hold for the new manifold. This process is not efficient since images from the database need to be accessed in each expansion step. We address these issues and describe our proposed solution in the next section.

[Figure 2: Proposed feature space partitioning technique described in Section 3.1 (Figure 4 presents more examples). (a) Illustration of feature space partitioning using one non-relevant image $\vec{b}_i$: the separating hyperplane $H_i$, placed a small distance $\epsilon > 0$ from $\vec{b}_i$, is normal to the vector from $\vec{b}_i$ to its closest point $\vec{p}_i$ in the convex hull (CH) of the relevant images $\{\vec{g}_i, \ldots\}$; the relevant region lies in the positive halfspace $H_i^+$ and the non-relevant region in $H_i^-$. (b) 2-D example; unshaded and shaded regions represent the estimated relevant and non-relevant feature space partitions.]

3.1 Proposed solution

Rather than using an incremental scheme to refine the decision manifold (discussed in Section 3), we create a manifold that maximizes the size of the relevant region. For each non-relevant image $\vec{b}_i$, we create a hyperplane $H_i$ normal to the shortest distance vector from $\vec{b}_i$ to CH. $H_i$ is positioned a small distance ($\epsilon > 0$) from $\vec{b}_i$. Figure 2(a) illustrates an example. The positive halfspace $H_i^+$ of each such $H_i$ contains the relevant images and the negative halfspace $H_i^-$ contains the non-relevant image. The relevant region R is obtained as the intersection of all positive halfspaces $H_i^+$. The unshaded region in Figure 2(b) represents the relevant region. A distance metric D estimated using only the relevant images is then used to present the top k database images that belong to R. Actual computation of CH is not necessary; the point in CH closest to $\vec{b}_i$ is obtained by solving an optimization problem (Equation 14). Note that in some cases R may not satisfy Equation 11, i.e., there are fewer than k images in R. Since this result is the best possible with a linear manifold, we present a set of size less than k to the user.

4. PROPOSED RF ALGORITHM

In this section we describe the proposed RF algorithm. As described in Section 3.1, the proposed algorithm has two independent steps. The first step is that of obtaining a manifold that separates the relevant from the non-relevant images, thereby partitioning the feature space into relevant and non-relevant regions. This step is detailed in Section 4.1. The second step, described in Section 4.2, involves using relevant images to estimate a distance metric. The final step involves ranking and retrieval of images from the database to present to the user and is detailed in Section 4.3.

4.1 Partitioning the feature space

The proposed method separates the relevant and non-relevant images with a piecewise linear manifold. Each of the non-relevant images is used to create a hyperplane as mentioned in Section 3.1 and illustrated in Figure 2(a). For each $\vec{b}_i \in B$ ($\vec{b}_i$ is the feature vector associated with the $i$th non-relevant image), the closest point $\vec{p}_i$ in the convex hull CH of the relevant points is computed as follows. Let $G = [\vec{g}_1, \ldots, \vec{g}_{|G|}]$, where the columns of $G$ represent the feature vectors associated with the relevant images in $G$. The vector $\vec{p}_i$ can be written as a linear combination of the relevant points as $\vec{p}_i = G\vec{\lambda}$, where $\vec{\lambda} = [\lambda_1, \ldots, \lambda_{|G|}]^T$ and $\sum_j \lambda_j = 1$. The computation of $\vec{p}_i$ can hence be formulated as

$$\min_{\vec{\lambda}} |G\vec{\lambda} - \vec{b}_i|^2 \qquad (14)$$

subject to

$$\sum_{j=1}^{|G|} \lambda_j = 1 \qquad (15)$$

$$\forall j, \; \lambda_j \ge 0 \qquad (16)$$

Equation 14 is a convex quadratic problem with linear constraints. We use the reduced gradient method outlined in Section 7 to obtain the optimal value of $\vec{\lambda}$. The closest point on the hull, $\vec{p}_i$, is then obtained using $\vec{\lambda}_{opt}$. The corresponding hyperplane $H_i$ can be represented as in Equation 17,

$$H_i = \{ \vec{x} : (\vec{p}_i - \vec{b}_i) \cdot (\vec{x} - \vec{b}_i) = \epsilon \} \qquad (17)$$

where $\vec{a} \cdot \vec{b} = \frac{\vec{a}^T \vec{b}}{|\vec{a}|\,|\vec{b}|}$.

$$H_i^+ = \{ \vec{x} : (\vec{p}_i - \vec{b}_i) \cdot (\vec{x} - \vec{b}_i) > \epsilon \} \qquad (18)$$

$$H_i^- = \{ \vec{x} : (\vec{p}_i - \vec{b}_i) \cdot (\vec{x} - \vec{b}_i) < \epsilon \} \qquad (19)$$

Here, $\epsilon$ is a small positive constant. In cases where the non-relevant point $\vec{b}_i$ lies inside CH, the closest point $\vec{p}_i$ equals $\vec{b}_i$; no hyperplane is constructed in such cases. Each hyperplane $H_i$ partitions the feature space into a positive halfspace ($H_i^+$, Equation 18) containing the relevant images and a negative halfspace ($H_i^-$, Equation 19) containing the non-relevant point $\vec{b}_i$. The intersection of the positive halfspaces $H_i^+$ defines the relevant region and the union of the negative halfspaces $H_i^-$ defines the non-relevant region.
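A compact sketch of this construction is given below. It uses SciPy's SLSQP solver for Equations 14-16 in place of the reduced gradient method of Section 7, and the normalized dot product of Equation 17; treat it as an illustration rather than the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def separating_hyperplane(G, b_i):
    """Hyperplane H_i for one non-relevant image b_i (Eqs. 14-17).
    Columns of G are the relevant feature vectors. Returns the unit
    normal p_i - b_i, or None when b_i lies inside the convex hull."""
    n = G.shape[1]
    res = minimize(
        lambda lam: np.sum((G @ lam - b_i) ** 2),   # Eq. 14
        x0=np.full(n, 1.0 / n),                     # centroid is feasible
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,                    # Eq. 16
        constraints=[{"type": "eq",
                      "fun": lambda lam: lam.sum() - 1.0}],  # Eq. 15
    )
    p_i = G @ res.x                                 # closest point in CH
    normal = p_i - b_i
    if np.linalg.norm(normal) < 1e-9:
        return None                                 # b_i inside CH: skip
    return normal / np.linalg.norm(normal)

def in_relevant_region(x, normals, b_points, eps=1e-3):
    """True iff x lies in every positive halfspace H_i+ (Eq. 18),
    using the normalized dot product of Equation 17."""
    for n_i, b_i in zip(normals, b_points):
        v = x - b_i
        if n_i @ v / max(np.linalg.norm(v), 1e-12) <= eps:
            return False
    return True
```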

4.2 Estimating the similarity metric

The parameters of the distance metric in Equation 5 are estimated using only the relevant images ($G$) along with their associated goodness scores ($\vec{v}$). The parameters $(\vec{q}, Q)$ are estimated independently of the feature space partitions. This permits the usage of any scheme like MindReader [3] or MARS [1] to estimate the new set of parameters.

Input: G, B, the sets of relevant and non-relevant images.
Output: Next set of k images to solicit the user's feedback.

for each $\vec{b}_i \in B$:
    find $\vec{p}_i$ by solving Eq. 14
    define $H_i$ by $(\vec{p}_i - \vec{b}_i) \cdot (\vec{x} - \vec{b}_i) = \epsilon$   (Eq. 17)
compute ($\vec{q}_{new}$, $Q_{new}$; Eq. 20) or ($\vec{q}_{new}$, $\vec{w}_{new}$; Eq. 23)
for each $\vec{x}_i \in D$:
    if $(\vec{p}_j - \vec{b}_j) \cdot (\vec{x}_i - \vec{b}_j) > \epsilon$ for all $\vec{b}_j \in B$:
        /* $\vec{x}_i$ lies in the relevant region */
        $Dist_i = D(\vec{x}_i, \vec{q}_{new}, Q_{new})$ (Eq. 5), or $Dist_i = D_w(\vec{x}_i, \vec{q}_{new}, \vec{w}_{new})$ (Eq. 22)
    else:
        /* $\vec{x}_i$ lies in the non-relevant region */
        $Dist_i = \infty$
R = the 5k images in D having the smallest Dist values
return k randomly sampled images from R

Figure 3: Algorithm to retrieve k relevant images given the user's current RF. Details are presented in Section 4.3.

The parameter estimates for the distance function in Equation 5 used by MindReader [3] are as follows:

$$\vec{q}_{new} = \frac{\sum_{\vec{x}_i \in G} v_i \vec{x}_i}{\sum_{\vec{x}_i \in G} v_i} \quad \text{and} \quad Q_{new} = \det(C)^{1/d}\, C^{-1} \qquad (20)$$

where $C$ is the weighted covariance matrix of the relevant images, given by

$$C = [c_{jk}], \qquad c_{jk} = \sum_{\vec{x}_i \in G} v_i (x_{ij} - q_j)(x_{ik} - q_k) \qquad (21)$$
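As a sketch, the MindReader estimates of Equations 20-21 in NumPy (assuming enough relevant images that C is non-singular):

```python
import numpy as np

def mindreader_estimates(G_feats, v):
    """Equations 20-21: weighted query center and det-normalized
    inverse weighted covariance. G_feats holds one relevant image
    per row; v holds the goodness scores."""
    q = (v[:, None] * G_feats).sum(axis=0) / v.sum()      # Eq. 20, center
    X = G_feats - q
    C = (v[:, None] * X).T @ X                            # Eq. 21
    d = G_feats.shape[1]
    Q = np.linalg.det(C) ** (1.0 / d) * np.linalg.inv(C)  # Eq. 20
    return q, Q
```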

In our implementation of MARS [1], we use a weighted Euclidean distance metric $D_w$ given by

$$D_w(\vec{x}, \vec{q}, \vec{w}) = \sum_{i=1}^{d} w_i (x_i - q_i)^2 \qquad (22)$$

The estimates for the query center $\vec{q}_{new}$ and feature weights $\vec{w}_{new}$ are given by

$$\vec{q}_{new} = \frac{1}{|G|} \sum_{\vec{x}_i \in G} \vec{x}_i \quad \text{and} \quad w_j^{new} = \frac{1}{\sigma_j^2}, \; \forall j = 1, \ldots, d \qquad (23)$$

where $\sigma_j^2$ is the variance of the $j$th feature over all images in $G$. To stabilize the estimate for the feature weights $\vec{w}$ (useful when the number of relevant images is small), we add a history term,

$$w_j^{new} = (1 - \delta) \frac{1}{\sigma_j^2} + \delta\, w_j^{prev} \qquad (24)$$

We choose $\delta = 0.1$; $w_j^{prev}$ is initialized to 1 in the first iteration.
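The corresponding MARS estimates (Equations 23-24) are a few lines of NumPy; the variance floor is our own guard against zero-variance features:

```python
import numpy as np

def mars_estimates(G_feats, w_prev, delta=0.1):
    """Equations 23-24: query center as the mean of the relevant
    vectors, feature weights as history-smoothed inverse variances."""
    q_new = G_feats.mean(axis=0)                   # Eq. 23
    var = np.maximum(G_feats.var(axis=0), 1e-12)   # guard, ours
    w_new = (1.0 - delta) / var + delta * w_prev   # Eq. 24
    return q_new, w_new
```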

4.3 Ranking and retrieval

The sets of relevant and non-relevant images $G$, $B$ constitute the user's relevance feedback in a particular RF iteration. The sets of relevant and non-relevant images may be accumulated over RF iterations. The proposed RF algorithm uses $G$, $B$ to partition the feature space into relevant and non-relevant regions and also to estimate a similarity metric. Figure 3 illustrates one iteration of the retrieval process. Given a feature vector $\vec{x}$ representing an image in the database, it is determined whether $\vec{x}$ belongs to the relevant region by checking if $\vec{x}$ lies in the positive halfspace with respect to each hyperplane $H_i$. For vectors in the relevant region, distances are computed using the estimated distance function, either $D(\cdot, \vec{q}_{new}, Q_{new})$ (Equation 5) or $D_w(\cdot, \vec{q}_{new}, \vec{w}_{new})$ (Equation 22). The set of images displayed to solicit further feedback from the user is obtained by randomly sampling k images from among the 5k images in the database with the smallest distances. This ensures that some amount of variability is present among the images displayed to the user. This eases the task of labelling for the user and also improves the generalization ability of the learning algorithm. Retrieval accuracy is measured by computing the number of relevant images among a fixed number of top ranked images.
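Putting the pieces together, one retrieval round of Figure 3 can be sketched as follows (the partition test and the distance function are passed in as callables):

```python
import numpy as np

def retrieval_round(D_feats, distance_fn, in_relevant_fn, k, rng=None):
    """One round of Figure 3: rank database vectors inside the
    relevant region, then sample k of the 5k best for display."""
    rng = rng or np.random.default_rng()
    dists = np.array([distance_fn(x) if in_relevant_fn(x) else np.inf
                      for x in D_feats])
    top = np.argsort(dists)[: 5 * k]     # 5k smallest distances
    return rng.choice(top, size=min(k, len(top)), replace=False)
```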

5. EXPERIMENTS

We use synthetic and real datasets to demonstrate the performance of the proposed RF algorithm. The experiments demonstrate that our algorithm effectively uses the non-relevant images to restrict the search space, leading to significant improvement in retrieval accuracy. As discussed in Section 4.2, by changing the structure of the Q matrix of the quadratic distance function we obtain different distance metrics. In practice there is a tradeoff between increased flexibility and the robustness with which the parameters can be estimated. We chose MARS over the MindReader metric as the number of parameters to be estimated is an order of magnitude smaller and hence can be more robustly estimated when there are few relevant images. To demonstrate that it is still feasible to use MindReader for small dimensions, we use the generalized ellipsoid distance metric for experiments with the synthetic 2-D dataset.

5.1 Results for Synthetic 2-D dataset

The synthetic 2-D dataset, shown in Figure 4(a), is used to graphically illustrate the proposed RF technique based on feature space partitioning (Figure 3), and to compare it with an SVM based RF technique. The dataset contains approximately 100 relevant points and 900 non-relevant points. Figures 4(b)-4(e) illustrate the feature space partitions produced by the proposed algorithm after 1, 2, 3 and 8 RF iterations (rounds). The similarity metric estimated from the relevant points is also shown. This metric is used to rank the points in the relevant region; the top 15 points (along with feedback provided in previous rounds) are used as RF for the next round. For comparison, the partitions produced by SVM with an RBF kernel are shown in Figures 4(g)-4(j). Note that in the case of SVM, an explicit similarity metric is not needed since distance from the hyperplane serves as the measure of relevance. The grades of relevance values for the SVM algorithm are not shown for clarity.

For both algorithms, the feedback provided in the first round is obtained by labelling the points returned by a nearest neighbor search with a Euclidean distance metric. In the first iteration of RF, both algorithms (Figures 4(b) and 4(g)) have similar search space partitions. The proposed algorithm is able to significantly refine the relevant region in the next round. The new relevant region well approximates the desired relevant region.

[Figure 4: Comparison of results for the synthetic 2-D dataset (Section 5.1). Unshaded and shaded regions represent the estimated relevant and non-relevant feature space partitions. (a) Synthetic 2-D dataset: relevant class, 116 points; non-relevant class, 884 points; starting point for RF. (b)-(e) Proposed algorithm after relevance feedback iterations 1, 2, 3 and 8, showing relevant and non-relevant points, the desired and estimated metrics, and the separating planes. (f) Precision in the top 100 with 15 feedback per iteration, comparing MindReader, the proposed algorithm with the MindReader metric, and SVM; the precision values are for experiments with one starting point (displayed in Figure 4(a)). (g)-(j) SVM after relevance feedback iterations 1, 2, 4 and 8.]

The SVM algorithm with the Gaussian kernel is only able to achieve local modifications to the classification boundary after each round of feedback, leading to an undulating classification boundary in Figure 4(h). Comparing the relevant regions after 7 rounds of feedback, the proposed algorithm is able to closely approximate the desired ellipse, whereas the SVM algorithm has captured only a fraction of the desired ellipse. Figure 4(f) also substantiates the significant improvement in precision achieved by the proposed algorithm. The dip in precision for the proposed algorithm after three rounds of feedback is corrected as more points become available.

5.2 Experiments with Corel image dataset

As an example of a real-world content based retrieval application, we tested our algorithm with the Corel image dataset [13]. In Section 5.2.1, we describe the image features used. The creation of a dataset to demonstrate retrieval performance is discussed in Section 5.2.2. In Section 5.2.3, details of the RF algorithms used in the experiments are described. Section 5.2.4 describes the experimental setup and Section 5.2.5 describes performance evaluation. The results of the retrieval experiments and their analysis are presented in Section 5.2.6.

5.2.1 Feature extraction

The objective of feature extraction is to obtain a succinct representation of an image in the form of a feature vector. A feature extractor is designed to capture human perception. An effective algorithm to automatically determine the right class of features for a given image database is not known; hence database specific feature extractors are typically used. For images containing textures, where capturing repetitive patterns is essential, multi-resolution techniques employing wavelets are widely used [4], [11]. In image databases where repetitive patterns are not visible (images of landscapes, city scenes, nature), histograms representing the distribution of color are used. Among the recent developments, Tong and Chang [11] extract histograms at different resolutions to capture the spatial color distribution. Deng et al. [14] propose a region based scheme wherein the dominant colors for each image segment are obtained through a clustering process; a vector composed of the dominant colors forms the feature vector. Huang et al. [15] adapt color correlograms to enable image subregion querying without requiring explicit segmentation.

Since the primary emphasis of this paper is the relevance feedback algorithm, we use a simple but effective feature extraction procedure based on computing color histograms in the HSV color space. The bin centers for the histogram are distributed as follows: 12 centers are spaced at equal angular intervals with V = 0.8 and S = 0.5 to capture different hue values, and 2 centers are located at the black and white color locations. Hence, the feature vector for each image is a 14-dimensional vector. Each element of the feature vector represents the fraction of pixels in an image belonging to a particular bin of the histogram. Weighted Euclidean distance (Equation 22) is used as the distance metric to compare two images.
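A sketch of this 14-bin extractor follows. The paper does not spell out the pixel-to-bin assignment rule, so nearest-center assignment in the HSV cone is our assumption:

```python
import numpy as np

def hsv_histogram(hsv_pixels):
    """14-dimensional color feature of Section 5.2.1. Input is an
    (N, 3) array with H in [0, 2*pi), S and V in [0, 1]. Centers:
    12 hues at equal angles with S = 0.5, V = 0.8, plus black and
    white. Returns per-bin pixel fractions."""
    hues = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
    centers = np.vstack([
        np.stack([0.5 * np.cos(hues), 0.5 * np.sin(hues),
                  np.full(12, 0.8)], axis=1),      # 12 hue centers
        [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],        # black, white
    ])
    # embed pixels in the HSV cone so hue distance wraps correctly
    pts = np.stack([hsv_pixels[:, 1] * np.cos(hsv_pixels[:, 0]),
                    hsv_pixels[:, 1] * np.sin(hsv_pixels[:, 0]),
                    hsv_pixels[:, 2]], axis=1)
    nearest = np.argmin(((pts[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return np.bincount(nearest, minlength=14) / len(pts)
```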

5.2.2 Dataset creation

The Corel image catalog has 700 image categories (classes) with approximately 100 images in each category, comprising a total of about 68,000 images. Given the low dimensional image representation scheme described in the previous section, it is not possible to ensure that all classes will be effectively captured, i.e., there is considerable overlap between different classes in the 14-dimensional feature space. Also, the running time required to search and rank all images is prohibitive. Hence, we choose a subset of 50 classes to constitute the dataset. To ensure a fair and meaningful comparison of the performance of various RF algorithms, we employ an iterative procedure to choose the appropriate subset of image categories. The objective is to select the subset of image categories that maximizes the per-class precision with weighted Euclidean distance as the ranking function (for each image class, the parameters of the distance metric are estimated using all images in the class).

We employ a greedy procedure to iteratively reduce the size of the dataset. After each iteration, a fraction of the image categories having the highest precision is retained. The final 50-category dataset consists of 4811 images. The feature vectors are zero centered and scaled to have unit variance along all dimensions.
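The greedy reduction can be sketched as below; the fraction retained per round (here one half) is our assumption, since the paper only says "a fraction":

```python
def greedy_class_selection(classes, precision_of, target=50, keep=0.5):
    """Section 5.2.2: repeatedly keep the best-scoring fraction of
    categories until `target` classes remain. `precision_of` returns
    {class: per-class precision} for the surviving set."""
    while len(classes) > target:
        scores = precision_of(classes)          # re-score survivors
        classes = sorted(classes, key=lambda c: -scores[c])
        classes = classes[: max(target, int(len(classes) * keep))]
    return classes
```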

5.2.3 Relevance feedback algorithms used in experiments

We empirically evaluate the performance of the proposed algorithm and compare it with the results of traditionally used RF algorithms. The details of the three RF algorithms we compare are as follows.

1. MARS: We use the MARS [1] algorithm as a representative parametric RF algorithm. We choose MARS over MindReader as the individual feature variances for MARS can be estimated more robustly than the covariance matrix for MindReader. We use Equation 23 to estimate the query center and Equation 24 to estimate the feature weights.

2. SVM: Among the non-parametric methods we chose Support Vector Machines (SVM), which have recently met with significant success in many real-world learning tasks. Reliable implementations of SVM are readily available [16]. We use the radial basis function (RBF) kernel, given by $K(\vec{x}, \vec{y}) = \exp\!\left(-\frac{(\vec{x} - \vec{y})^T (\vec{x} - \vec{y})}{2\sigma^2}\right)$, with the unbiased hyperplane option in SVMlight [17]. The value of the kernel width parameter $\gamma = \frac{1}{2\sigma^2}$ is empirically chosen to give the best performance over all classes. The average precision of SVM for different values of $\gamma$ is shown in Figure 5. $\gamma = 1$ achieves the best performance on our dataset; we use $\gamma = 1$ in our remaining experiments. Distance from the separating plane is used as the ranking function. Feature vectors with larger distances are considered more relevant than those close to the separating plane (for a feature vector to be considered for ranking, it must lie in the relevant halfspace). A sketch of this baseline appears after this list.

In SVMactive [11], Tong and Chang propose that the images close to the separating plane be used for soliciting the user's feedback. Images farthest from the separating plane are retrieved to measure precision. They report improved performance with a 15-category dataset (1920 images) and a 144-dimensional feature space. Images in the first iteration are obtained through random sampling from the database (in our setting, in the first iteration, RF from the user is solicited on the k nearest neighbors of the initial query center). With a 2-D example, we observed that the k images closest to the separating plane are frequently clustered along small portions of the classification boundary. We suspect this is one possible reason for SVMactive's poor performance in Figure 5.

3. Proposed Algorithm: We employ the proposed algorithm (Figure 3) to estimate the relevant region. The weighted Euclidean distance metric with the parameter estimates given in Equations 23 and 24 is used to rank and retrieve images in the relevant region.

[Figure 5: Average precision (top 100) over 50 classes (4811 images) for the SVM algorithm with different values of $\gamma = \frac{1}{2\sigma^2}$ ($\gamma$ = 0.01, 0.1, 1, 10, and SVMactive with $\gamma$ = 1), plotted against the iteration of relevance feedback, with 15 relevance feedback per iteration. Details of SVM and SVMactive are presented in Section 5.2.3.]
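The SVM baseline above used SVMlight with an unbiased hyperplane; as a rough stand-in, the same ranking rule looks as follows with scikit-learn (which fits a biased hyperplane, so results would differ slightly):

```python
import numpy as np
from sklearn.svm import SVC

def svm_rank(X_fb, y_fb, D_feats, gamma=1.0):
    """SVM baseline of Section 5.2.3: fit an RBF-kernel SVM on the
    feedback (y = +1 relevant, -1 non-relevant) and rank database
    vectors by signed distance from the separating plane; only the
    relevant halfspace is eligible for ranking."""
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_fb, y_fb)
    scores = clf.decision_function(D_feats)
    scores[scores <= 0.0] = -np.inf   # drop the non-relevant halfspace
    return np.argsort(-scores)        # most relevant first
```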

5.2.4 Experimental setup

Our retrieval experiments are essentially incremental classification experiments. Given an image class from the dataset, the objective is to measure the accuracy of the results returned (the fraction of returned results belonging to the desired class) by the RF algorithm as more feedback is provided in successive iterations (rounds) of RF. For the first iteration of RF, an image from the desired class is picked as the starting point. A Euclidean distance metric centered at the starting point is used as the ranking function, to ensure that all algorithms start with the same initial feedback. The RF provided in later iterations is obtained as follows. The RF algorithm returns images in the database in decreasing order of relevance. A fixed number of images is randomly chosen from among the top ranked 100 images. Each image is labelled as relevant or non-relevant depending on whether it belongs to the desired class or not. Feature vectors corresponding to these images along with their relevance labels serve as RF for the next iteration. The images chosen for feedback are fresh images (i.e., images that have not previously been used as feedback), and the feedback (relevant and non-relevant) is accumulated over successive iterations.
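The protocol is summarized by the sketch below (function names are ours; `rank_fn` wraps whichever RF algorithm is being tested):

```python
import numpy as np

def rf_experiment(D_feats, labels, target, start, rank_fn,
                  rounds=7, fb_size=15, pool=100, rng=None):
    """Incremental-classification protocol of Section 5.2.4: round 1
    ranks by Euclidean distance from the starting image; each round
    labels fresh images sampled from the top `pool` and accumulates
    them as feedback for `rank_fn(G_idx, B_idx)`."""
    rng = rng or np.random.default_rng()
    # the starting image counts as relevant feedback (our assumption)
    G_idx, B_idx, used = [start], [], {start}
    ranking = np.argsort(np.linalg.norm(D_feats - D_feats[start], axis=1))
    for _ in range(rounds):
        fresh = [i for i in ranking[:pool] if i not in used]
        for i in rng.choice(fresh, size=min(fb_size, len(fresh)),
                            replace=False):
            (G_idx if labels[i] == target else B_idx).append(int(i))
            used.add(int(i))
        ranking = rank_fn(G_idx, B_idx)   # ranking for the next round
    return ranking
```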

5.2.5 Measurement of performance

Precision in the top 100 retrieved images (each class contains approximately 100 images) is used as the performance metric. For all RF algorithms, images once marked non-relevant are not retrieved in further iterations even if they are ranked high (this is typically the case with MARS). By construction, SVM and the proposed algorithm will not rank non-relevant images high. This ensures that the measured difference in precision captures the true generalization ability of the learning algorithm. The objective of an RF algorithm is to achieve higher precision with a small number of RF iterations. The size of feedback provided in each iteration is an important parameter, since a human user cannot be expected to label a large number of images. In the next section, we present the results of the experiments to demonstrate the significant improvement in precision achieved by the proposed algorithm over both SVM and MARS for different sizes of feedback.
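Precision itself is a one-liner:

```python
def precision_at(ranking, labels, target, n=100):
    """Fraction of the top-n ranked images that belong to the
    desired class (the top-100 metric used throughout Section 5)."""
    return sum(labels[i] == target for i in ranking[:n]) / n
```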

5.2.6 Results and Discussions

[Figure 6: Precision for different RF algorithms (Section 5.2.3) using the Corel dataset with different sizes of feedback. (a)-(d) plot precision in the top 100 retrieved images at successive rounds of RF with 5, 10, 15 and 20 feedback per iteration; (e)-(h) compare precision measured with an increasing number of retrieved images at the 7th round of RF for the same feedback sizes. Curves show the mean and mean +/- standard deviation for SVM, MARS and the proposed algorithm. Statistics are extracted from 4811 samples (Section 5.2.6).]

Figures 6(a)-6(d) plot the precision measured in the top 100 retrieved images for the three RF algorithms with feedback sizes ranging from 5 to 20 images per iteration. The results represent the statistics for precision extracted from experiments with 50 image classes. Given an image class, experiments are conducted with each image in the class as the starting point for the RF process. The statistics (mean and standard deviation) have hence been extracted from 4811 samples. Table 2 compares the precision values at the 7th RF iteration for the three RF algorithms with different sizes of feedback. Figures 6(e)-6(h) compare the precision values for an increasing number of retrieved images and correspond to the precision-recall curves popularly used in the literature.

Considering Figure 6 and Table 2, it is clear that the proposed algorithm consistently outperforms both the SVM and MARS algorithms for (a) different sizes of feedback and (b) different numbers of images retrieved (i.e., different values of recall). MARS achieves slightly higher precision than SVM in all plots. This confirms that for small feedback sizes, a distance based parametric model is more effective in approximating the desired concept than a kernel based non-parametric model.

Figure 7 presents per-class precision results with 20 feedback per iteration after 7 rounds of RF. Since the number of images in a class is small (approximately 100), we use percentiles to represent the true distribution of precision values. Table 3 summarizes the results for all classes.

MARS shows better performance than the proposed algorithm in the case of Owls, Prehistoric World, African Wildlife and Dinosaur Illustrations; yet the difference in mean precision and percentile values is small. SVM shows the best performance for Religious Stained Glass and Beautiful Roses; the proposed algorithm comes a close second with significantly higher precision than MARS. In the case of Classic Aircraft, Arabian Horses, Studio Models, Fitness, Crystallography, Divers & Diving, Polo, New Guinea, Cards, WWII Planes, Surf's Up, Surfing, Art Portraits, Cuisine and Office Interiors, the proposed algorithm performs significantly better than MARS and SVM; in many cases, the 0.25 percentile value for the proposed algorithm exceeds the best precision value of both MARS and SVM.

Table 2: Mean ± standard deviation of precision (top 100) for different sizes of feedback k, with 7 RF rounds (from Figures 6(a)-6(d)).

  RF Algo     k = 5        k = 10       k = 15       k = 20
  SVM         0.40±0.23    0.46±0.24    0.51±0.24    0.54±0.24
  MARS        0.43±0.21    0.49±0.21    0.54±0.21    0.57±0.20
  Proposed    0.50±0.20    0.58±0.19    0.64±0.19    0.68±0.18

Table 3: Consolidated performance comparison for all classes with 20 feedback per iteration and 7 RF rounds (Figure 7 and Table 2).

  RF algorithm   Classes with highest mean precision   Precision, all classes (mean ± st. dev.)
  SVM            2                                     0.54 ± 0.24
  MARS           4                                     0.57 ± 0.20
  Proposed       44                                    0.68 ± 0.18

6. CONCLUSION

We have proposed a novel technique incorporating negative relevance judgments to improve the accuracy of adaptable similarity based image retrieval. Improved performance of the proposed technique over MARS and SVM based RF algorithms is demonstrated with a large number of experiments. The experiments also demonstrate that the proposed algorithm is more robust and handles small feedback sizes better than traditional algorithms. Parametric techniques proposed in the past using both relevant and non-relevant judgments frequently lead to unfavorable performance, because incompatible information conveyed by the relevance and non-relevance judgments is combined to derive the ranking function. Instead, we have proposed a two-step approach, where non-relevant images in conjunction with relevant images are used to define the feasible search space. The ranking function, estimated using only the relevant images, is used to retrieve the top k matches from inside the feasible search region. This enables the search to explicitly move away from the non-relevant region, while keeping close to the relevant region. Note that our proposed method does not depend on database-specific parameter tuning. Moreover, it is usable on top of existing schemes, e.g., MindReader and MARS.

In this study we have used a relatively low dimensional feature space (14 dimensions). We are currently analyzing the scalability of the proposed algorithm to higher dimensional feature spaces. An indexing scheme capable of efficiently retrieving feature vectors in the database satisfying a set of linear constraints would considerably speed up the execution of the proposed retrieval algorithm. Our ongoing work also includes dimensionality reduction to achieve higher precision in a high-dimensional feature space with a small feedback size.

7. REDUCED GRADIENT ALGORITHM

Consider the minimization problem:

$$\min_{\vec{\lambda}} (G\vec{\lambda} - \vec{b})^T (G\vec{\lambda} - \vec{b}) \qquad (25)$$

Subject to: $\vec{e}^{\,T} \vec{\lambda} = 1$ and $\lambda_i \ge 0, \forall i$ (26)

where $\vec{e} = [1 \ldots 1]^T$. The problem can be rewritten as

$$\min_{\vec{\lambda}} \frac{1}{2} \vec{\lambda}^T D \vec{\lambda} - H^T \vec{\lambda} \qquad (27)$$

Subject to: $\vec{e}^{\,T} \vec{\lambda} = 1$ and $\lambda_i \ge 0, \forall i$ (28)

where $D = 2 G^T G$ and $H = 2\, \vec{b}^{\,T} G$. Let $n = |G|$. Specifying any $n-1$ of the $\lambda_i$'s uniquely determines the value of the $n$th $\lambda$ (using Equation 28). Hence we split $\vec{\lambda}$ into $(\lambda_y, \vec{\lambda}_z)$, where $\vec{\lambda}_z$ is an independent $(n-1)$-sized vector and $\lambda_y$ is a scalar dependent on $\vec{\lambda}_z$. $\lambda_y$ is chosen to be one of the strictly positive components of $\vec{\lambda}$. See [18] for a detailed description of the algorithm. The problem now reduces to:

$$\min_{\lambda_y, \vec{\lambda}_z} \frac{1}{2} \vec{\lambda}^T D \vec{\lambda} - H^T \vec{\lambda} \qquad (29)$$

Subject to: $\lambda_y + \vec{e}^{\,T} \vec{\lambda}_z = 1$, $\lambda_y \ge 0$, $\vec{\lambda}_z \ge 0$ (30)

We now use a modified steepest descent method using the reduced gradient. The reduced gradient at a point $\vec{\lambda} = (\lambda_y, \vec{\lambda}_z)$ is obtained as

$$r = \nabla_{\vec{\lambda}_z} f(\lambda_y, \vec{\lambda}_z) - \nabla_{\lambda_y} f(\lambda_y, \vec{\lambda}_z)\, B^{-1} C \qquad (31)$$

where $B$ and $C$ are the coefficients of $\lambda_y$ and $\vec{\lambda}_z$ in the equality constraint of Equation 30, i.e., $B = 1$ and $C = \vec{e}^{\,T}$, following [18]. The centroid of the relevant points, whose corresponding $\vec{\lambda}$ is given by $\vec{\lambda}_{initial} = [\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}]^T$, can be used as a feasible starting point for the iterative process. One iteration of the reduced gradient method is as follows:

1. Compute $r(\vec{\lambda})$ using Equation 31.

2. Let
$$\Delta\lambda_{z_i} = \begin{cases} -r_{z_i} & \text{if } r_{z_i} < 0 \text{ or } \lambda_{z_i} > 0 \\ 0 & \text{otherwise} \end{cases}$$
If $\Delta\vec{\lambda}_z = 0$, then return the current $\vec{\lambda}$ as the solution; else find $\Delta\lambda_y = -B^{-1} C \Delta\vec{\lambda}_z$.

3. Find $\alpha_1, \alpha_2, \alpha_3$ so that
$$\alpha_1 = \max\{\alpha : \lambda_y + \alpha \Delta\lambda_y \ge 0\}$$
$$\alpha_2 = \max\{\alpha : \vec{\lambda}_z + \alpha \Delta\vec{\lambda}_z \ge 0\}$$
$$\alpha_3 = \min\{\alpha_1, \alpha_2, \alpha'\}, \quad \text{where } \alpha' = -\frac{\Delta\vec{\lambda}^T (D\vec{\lambda} - H)}{\Delta\vec{\lambda}^T D \Delta\vec{\lambda}}$$
Set $\vec{\lambda} = \vec{\lambda} + \alpha_3 \Delta\vec{\lambda}$.

4. If $\alpha_3 < \alpha_1$ then go to step 2; else incorporate $\lambda_y$ into $\vec{\lambda}_z$ and mark one of the strictly positive $\lambda_z$'s as $\lambda_y$.
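A compact NumPy transcription of this procedure is given below. It merges the $\alpha_1$/$\alpha_2$ feasibility bounds into a single step and recomputes the reduced gradient on every pass, so treat it as a sketch of steps 1-4 rather than a literal implementation:

```python
import numpy as np

def closest_point_in_hull(G, b, tol=1e-9, max_iter=500):
    """Reduced-gradient solver for Equations 25-26:
    min |G @ lam - b|^2  s.t.  sum(lam) = 1, lam >= 0.
    Columns of G are the relevant feature vectors."""
    n = G.shape[1]
    D = 2.0 * G.T @ G                      # Eq. 27 quadratic term
    H = 2.0 * G.T @ b                      # Eq. 27 linear term
    lam = np.full(n, 1.0 / n)              # feasible start: centroid
    y = 0                                  # dependent (basic) variable

    for _ in range(max_iter):
        grad = D @ lam - H
        z = np.array([i for i in range(n) if i != y])
        r = grad[z] - grad[y]              # Eq. 31 with B = 1, C = e^T
        d_z = np.where((r < 0) | (lam[z] > tol), -r, 0.0)  # step 2
        if np.all(np.abs(d_z) < tol):
            break                          # reduced gradient vanished
        d = np.zeros(n)
        d[z] = d_z
        d[y] = -d_z.sum()                  # preserves sum(lam) = 1
        neg = d < -tol                     # step 3: feasibility bound
        a_feas = np.min(-lam[neg] / d[neg]) if neg.any() else np.inf
        curv = d @ D @ d
        a_star = -(d @ grad) / curv if curv > tol else a_feas
        lam = np.maximum(lam + min(a_feas, a_star) * d, 0.0)
        if lam[y] < tol:                   # step 4: swap basic variable
            y = int(np.argmax(lam))

    return G @ lam, lam
```

In Section 4.1 this routine would supply the closest point $\vec{p}_i$ (and the weights $\vec{\lambda}_{opt}$) for each non-relevant image $\vec{b}_i$.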

8. REFERENCES

[1] Y. Rui, T. S. Huang, and S. Mehrotra. Content-based image retrieval with relevance feedback in MARS. Proc. Int'l Conf. on Image Processing, 1997.
[2] C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167, 1998.
[3] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databases through multiple examples. Proc. of the 24th VLDB Conf., New York, 1998.
[4] Y. Rui and T. S. Huang. Optimizing learning in image retrieval. Proc. IEEE Conf. Computer Vision and Pattern Recognition, South Carolina, June 2000.
[5] S. Aksoy and R. M. Haralick. Probabilistic vs. geometric similarity measure for image retrieval. Proc. IEEE Conf. Computer Vision and Pattern Recognition, South Carolina, June 2000.
[6] J. Quinlan. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[7] S. D. MacArthur, C. E. Brodley, and C. Shyu. Relevance feedback using decision trees in content-based image retrieval. Proc. IEEE Workshop CBAIVL, South Carolina, June 12, 2000.
[8] Y. Wu, Q. Tian, and T. S. Huang. Discriminant EM algorithm with application to image retrieval. Proc. IEEE Conf. Computer Vision and Pattern Recognition, South Carolina, June 2000.
[9] K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf. An introduction to kernel based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181-201, 2001.
[10] X. S. Zhou and T. S. Huang. Small sample learning during multimedia retrieval using BiasMap. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Hawaii, December 2001.
[11] S. Tong and E. Chang. Support vector machine active learning for image retrieval. Proc. ACM Multimedia, Ottawa, Canada, September 2001.
[12] X. S. Zhou and T. S. Huang. Exploring the nature and variants of relevance feedback. Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, Hawaii, December 2001.
[13] Corel Corporation. Corel Stock Photos on CD-ROM. http://www.corel.com/products/clipartandphotos/photos/photolib.htm

[Figure 7: Per-class precision (top 100) at relevance feedback iteration 7 with 20 feedback per iteration, shown for svm, MARS and the proposed algorithm (prop) on each of the 50 classes as percentile bars (0.01-0.26, 0.26-0.50, 0.50-0.75, 0.75-0.99) and means. Per-class precision statistics are extracted from approximately 100 RF experiments per class (Section 5.2.6). A percentile bar of 0.26-0.50 represents that in 26% of cases the precision values are below the lower end of the bar and in 50% of cases the precision values are below the upper end of the bar.]

[14] Y. Deng, B. S. Manjunath, C. Kenney, M. S. Moore, and H. Shin. An efficient color representation for image retrieval. IEEE Transactions on Image Processing, 10(1):140-147, 2001.
[15] J. Huang, S. R. Kumar, M. Mitra, W. Zhu, and R. Zabih. Image indexing using color correlograms. Proc. IEEE Conf. Computer Vision, 1998.
[16] T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[17] T. Joachims. SVMlight. http://svmlight.joachims.org/
[18] D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley, 1973.