Context Sensitive Tag Expansion with Information Inference


Hongyun Cai, Zi Huang, Jie Shao, and Xue Li

School of Information Technology and Electrical Engineering, The University of Queensland, Australia

caihongyun@gmail.com, {huang,jshao,xueli}@itee.uq.edu.au

Abstract. The exponential explosion of web image data on the Internet has been witnessed over the last few years. The precise labeling of these images is crucial to effective image retrieval. However, most existing image tagging methods discover correlations from tag co-occurrence relationships, which limits the scope of expanded tags. In this paper, we study how to build a new information inference model over image tag datasets for more effective and complete tag expansion. Specifically, the proposed approach uses a modified Hyperspace Analogue to Language (HAL) model instead of association rules or Latent Dirichlet Allocation to mine the correlations between image tags. It takes advantage of context sensitive information inference to overcome the limitation of tag co-occurrence based methods. The strength of this approach lies in its ability to generate additional tags that are relevant to a target image but may have weak co-occurrence relationships with the image's existing tags. We demonstrate the effectiveness of this proposal with extensive experiments on a large Flickr image dataset.

1 Introduction

With the rapid growth in popularity of various kinds of tagging systems [10,13], tags occupy an important position in many different areas (e.g., [1,2,3]). For instance, image tagging is the crux of text-based image navigation and searching systems [8], and the performance of an image retrieval engine is largely related to the accuracy of image tags. However, manually tagging images is time-consuming. A study on Flickr [3] has shown that, although users are generally willing to provide tags, most images carry only 1∼3 user-annotated tags [15]. Thus, automatic image tag expansion has recently become a fundamental research topic.

Currently, most semantic image tag expansion methods apply association rules [11] or other techniques such as Latent Dirichlet Allocation (LDA) [12] to mine image tags. Deriving one tag from another mainly relies on the co-occurrence of the two individual tags, which limits the candidate expanded tags to a small scope (the high-frequency entries of the tag co-occurrence lists); consequently, some hidden strong relationships between tags with low co-occurrence frequency are lost.

S.-g. Lee et al. (Eds.): DASFAA 2012, Part I, LNCS 7238, pp. 440–454, 2012. © Springer-Verlag Berlin Heidelberg 2012


To overcome this problem, in this paper we present a context sensitive tag expansion method built on an information inference model. Our proposed method consists of two main components: an offline tag mining process and an online tag expansion process. In the offline process, unweighted Hyperspace Analogue to Language (HAL) models are constructed on the image tags. In the online process, given an image with a few tags (seed tags), the concept combination of these seed tags is computed in the form of high-dimensional HAL vectors. Then the relevance between the combined concept and each word in the dataset vocabulary is calculated based on three factors: co-occurrence relationship, inverse vector frequency, and contextual similarity. Finally, the top K most relevant words are picked as the expanded tags for the target image. The most important characteristic of this approach is its ability to generate additional tags that are relevant to a target image but may have very low co-occurrence frequencies with its seed tags. The proposed approach uses a modified HAL model to infer image tags instead of association rules or LDA, taking advantage of information inference to overcome the limitation of tag co-occurrence based similarity metrics. The approach would benefit image sharing websites such as Flickr, where it can assist users in providing tags at upload time by giving them a list of candidates to choose from.

The remainder of this paper is organized as follows. Related work on tag expansion and information inference is reviewed in Section 2, followed by a description of the HAL model and information flow in Section 3. Our improved degree algorithm for the information flow model and the proposed tag expansion approach are presented in Section 4. In Section 5 we present our evaluation results, comparing the three proposed models and two other state-of-the-art algorithms in tag expansion and information inference. Finally, we summarize the work in Section 6.

2 Related Work

The main objective of our work is to improve the process of image tag expansion by exploiting information inference. In this section, we review research related to image tag expansion and information inference.

2.1 Image Tag Expansion

Image tag expansion is a process that automatically expands extra descriptive tags from one or more existing tags of an image, in accordance with expansion rules generated by an antecedent tag mining process. These expanded tags can be used for personalized tag recommendation to the users of a tagging system [13,10], or for collective tag recommendation, which acts as a query expansion to improve the efficiency of tag-based image retrieval [8].

Tag mining can be based on semantic similarity [15,20], visual similarity [14,21], or both [18,19]. We focus our review on the first category because our proposal does not include analysis of low-level image visual features. Sigurbjörnsson and van Zwol [15] formed and analyzed a characterization of Flickr users' tagging behaviors, and presented four different tag recommendation strategies using global asymmetric tag co-occurrence metrics to mine tag relationships. Two tag aggregation strategies (voting and summing), along with a promotion function, were proposed to construct an efficient tag recommendation system without introducing tag-class specific heuristics. In [20], Xu et al. also obtained semantic similarity between tags based on tag co-occurrence. To improve the performance of tag recommendation, an authority score based on historical tagging behavior was assigned to each user; co-occurring tags assigned by one user were rewarded according to the authority score, while those assigned by different users were penalized. Heymann et al. [11] introduced a method of mining association rules in the market-basket model for tag recommendation. In [12], rather than association rules, Latent Dirichlet Allocation was investigated for collective tag recommendation. Latent topics were elicited from the collaborative tagging efforts of multiple users, thus solving the cold start problem of tagging new resources with only a few tags. However, the semantic similarity metrics of these methods mainly rely on tag co-occurrence, so they all suffer from one problem: tightly related tags do not necessarily have a strong co-occurrence relationship. This is especially the case when two tags are semantically related but have never occurred together. Such information between tags is lost by the traditional tag mining process.

Our work focuses on employing concepts from information inference in tag mining, to overcome the limitations of association rule based tagging algorithms and to discover hidden tag relationships that improve the accuracy of recommended tags.

2.2 Information Inference

Inferential information content was first formalized by Barwise and Seligman [5] in 1997, followed by a cognitive model combining three levels of processing, including the symbolic level [9]. Based on these two definitions, Song and Bruza [16] focused on the information inferences which "can be drawn on the basis of words seen in the context of other words under the proviso that such inferences correlate with corresponding human information inference", and thus proposed context sensitive information inference. A representational model of semantic memory named Hyperspace Analogue to Language (HAL) was introduced, along with a specifically designed heuristic concept combination. Furthermore, an HAL-based information flow was defined and evaluated by the effectiveness of query models.

This HAL construction model is designed for analyzing document corpora, so it is not well suited to images, which carry only a limited number of tags each. To build a HAL model on image tags, we modify the HAL construction algorithm in this work. The technical details are discussed in the next section.


3 Preliminaries

3.1 Conceptual Space Construction

The Hyperspace Analogue to Language (HAL) model [7,6] is a computational model that represents a conceptual space by high-dimensional vectors built from a given corpus. As in the Vector Space Model (VSM), each dimension in the conceptual space corresponds to a word in a vocabulary of N words. In the HAL model, the whole corpus is scanned by a fixed-size sliding window, from which a co-occurrence matrix is created to represent the relationships between the words within the window. The strength of the relationship between two words is inversely proportional to the distance between them. After scanning the entire corpus, an accumulated co-occurrence matrix over all words is obtained. Since the words in a sliding window can be considered a certain context, the HAL model represents an accumulation of experience of the contexts in which words appear. A word c is eventually represented as a normalized weighted vector c = <wc,1, ..., wc,N>, where wc,i (i = 1..N) is the strength of association between the word c and the i-th word in the vocabulary, computed from the global co-occurrence matrix. The higher wc,i, the more frequently the word c appears with the i-th word in the same context (i.e., the same sliding window).

3.2 Concept Combination

Words in different contexts may carry different concepts. For example, given the word "penguin", if the surrounding words are "ocean", "Antarctica", etc., "penguin" has a strong association with "creature" or "bird". On the other hand, if the surrounding words are "book", "publishing", etc., "penguin" most likely indicates the publisher Penguin Books. Given a group of words, they trigger each other to reveal new concepts. A concept can be a real word, like penguin, or a virtual word defined by a group of concepts. Concept combination is basically a vector operation, first introduced in [16].

Definition 1 (Concept Combination). Given two concepts c1 = <wc1,1, ..., wc1,N> and c2 = <wc2,1, ..., wc2,N>, the new concept derived by combining c1 and c2 is denoted as c1 ⊕ c2 = <wc1⊕c2,1, ..., wc1⊕c2,N>. The weight of the k-th dimension in the new concept, wc1⊕c2,k, is computed as follows:

wc1⊕c2,k = (l1 + l1 × wc1,k / maxi=1..N wc1,i + l2 + l2 × wc2,k / maxi=1..N wc2,i) × α    (1)

l1, l2 ∈ (0.0, 1.0] and l1 > l2,  α ≥ 1.0

where α is a multiplier to emphasize the weights of dimensions appearing in both c1 and c2. If both wc1,k and wc2,k are greater than a threshold θ, α is set to a value > 1.0; otherwise α = 1. The symbol "⊕" stands for the combination operation.


The parameters used in concept combination are set as l1 = 0.5, l2 = 0.3, α = 2, and θ = 0 in [16] to achieve the best performance.

Concept combination on a group of concepts can be derived by recursively applying the combination operation on two concepts, such as c1 ⊕ c2 ⊕ c3 = (c1 ⊕ c2) ⊕ c3.

3.3 Information Inference

Given a group of words {c1, ..., cm}, the resulting combined concept is denoted ⊕ci, which is usually a virtual word represented by a weighted vector. It is important to find the real words in the vocabulary that are strongly associated with the concept ⊕ci. An HAL-based information flow is used in [16] for this information inference.

Definition 2 (HAL-Based Information Flow). Given a combined concept ⊕ci, it implies cj, iff degree(⊕ci ⊢ cj) ≥ λ, where

degree(⊕ci ⊢ cj) = Σk∈Q(⊕ci)∩Q(cj) w⊕ci,k / Σk∈Q(⊕ci) w⊕ci,k    (2)

Q(c) denotes the set of dimensions of c whose weight is greater than θ, and λ is a threshold value predefined by the user.

degree(⊕ci ⊢ cj) reflects the ratio of intersecting dimensions of ⊕ci and cj. The underlying idea of Definition 2 is that if the majority of the most important properties of ⊕ci appear in cj, then cj has a strong association with c1, ..., cm. By calculating and ranking the degrees between a combined concept and the real words of the vocabulary, we can return the topmost ranked words as a recommendation. For example, given the words "penguin", "book", and "publishing", "publisher" is the most relevant word, with the largest degree with penguin⊕book⊕publishing. Two further examples are river⊕clean-up ⇒ <flood, garbage, recover> and river⊕sunshine ⇒ <holiday, sky>.
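A minimal sketch of this degree computation, with concept vectors represented as sparse tag→weight dicts (the representation and function name are our assumptions, not the authors' code):

```python
def degree(ci, cj, theta=0.0):
    """degree(ci -> cj) from Equation 2: the share of ci's
    quality-dimension weight that also falls on cj's quality dimensions."""
    q_ci = {k for k, w in ci.items() if w > theta}
    q_cj = {k for k, w in cj.items() if w > theta}
    total = sum(ci[k] for k in q_ci)
    if total == 0:
        return 0.0
    return sum(ci[k] for k in q_ci & q_cj) / total

# all of ci's weight lies on dimensions shared with cj, so the degree is 1.0
d = degree({"sky": 0.6, "sun": 0.4}, {"sky": 0.9, "sun": 0.1, "sea": 0.3})
```

Note the asymmetry: the sums run over the weights of ⊕ci only, so degree(ci ⊢ cj) and degree(cj ⊢ ci) generally differ.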

4 Context Sensitive Tag Expansion

Tagging has been playing an important role in web information retrieval by annotating various web sources. Taking image tagging as an example, to facilitate efficient semantic based image search, it is crucial to assign meaningful descriptors (tags) to images. However, manual image tagging is extremely labor intensive, and the performance of content-based automatic image tagging is unsatisfactory due to the "semantic gap" problem. Thus, efficiently recommending accurate and meaningful tags to users is critical. Here, we propose a context sensitive model for tag expansion: given a few seed tags, a group of new tags is expanded by analyzing the underlying concepts of the seed tags and discovering the information inference. A list of notations used in this paper is shown in Table 1.


Table 1. Notations

Notation              Description
c, ci, cj             individual concept
⊕                     concept combination operator
⊕ci                   a combined concept
N                     the size of the vocabulary
<wc,1, ..., wc,N>     weighted vector of concept c

4.1 Image Conceptual Space

The general idea of HAL is to represent the concept of a word by statistics of its appearance within the context of other concepts. Traditionally, HAL is used in text information retrieval and the HAL space is built on documents. In this paper, we aim to construct a conceptual space on images based on their tags. Flickr images are usually associated with a set of user-defined tags. We consider each image as a document, where each tag of the image is a word. In the HAL model, a fixed-size sliding window scans the whole corpus; the words within the window are assumed to have stronger relationships with each other than with words outside the window. However, all the tags of an image are supposed to describe its characteristics or semantics, which means they all have strong relationships with each other to some extent. Thus, a flexible-size sliding window is applied, which scans the corpus image by image. When scanning an image, the size of the sliding window is exactly the number of associated tags; since the number of tags differs from image to image, the window size is flexible. After scanning the entire image set, the global co-occurrence matrix of tags is constructed by accumulating the local co-occurrence matrices built on each window. Based on the global co-occurrence matrix, the weighted vectors of tags can be generated. The concept combination of two or more tags is calculated according to Equation (1) and implemented in Algorithm 1. Example 1 illustrates the details step by step.

Example 1 (Image Conceptual Space). Given three images I1, I2, and I3 associated with tags <river, fish, sunshine, kids, birds>, <river, fish, holiday>, and <river, fish, birds> respectively, the global co-occurrence matrix built on I1, I2, and I3 is shown in Table 2. The dimensions of the image conceptual space constructed on I1, I2, and I3 are "river", "fish", "sunshine", "kids", "birds", and "holiday". The weighted concept vector of "river" is

river = <fish:0.75, sunshine:0.25, kids:0.25, birds:0.50, holiday:0.25>

Take the dimension "fish" as an example. The weighted value on dimension "fish" of "river" is calculated as 3/√(3² + 1² + 1² + 2² + 1²) = 3/4 = 0.75.
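Under our reading of this construction (each image acts as one flexible-size window in which every tag pair co-occurs once, and each row of the matrix is L2-normalized; the function names are ours), Example 1 can be reproduced as:

```python
import math
from collections import defaultdict
from itertools import combinations

def image_cooccurrence(images):
    """One flexible-size window per image: every pair of tags in the
    same image adds 1 to the global co-occurrence matrix."""
    cooc = defaultdict(lambda: defaultdict(int))
    for tags in images:
        for a, b in combinations(tags, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc

def concept_vector(cooc, tag):
    """L2-normalized weighted vector of a tag, as in Example 1."""
    row = cooc[tag]
    norm = math.sqrt(sum(v * v for v in row.values()))
    return {k: v / norm for k, v in row.items()}

images = [["river", "fish", "sunshine", "kids", "birds"],
          ["river", "fish", "holiday"],
          ["river", "fish", "birds"]]
river = concept_vector(image_cooccurrence(images), "river")
# river["fish"] = 3/4 = 0.75, matching Example 1
```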

4.2 Tag Expansion with Information Inference

Given an image associated with a small number of seed tags, we propose to expand them into a group of meaningful tags by discovering the information


Algorithm 1. Concept Combination

Input: c1 = <wc1,1, wc1,2, . . . , wc1,N> (concept vector), c2 = <wc2,1, wc2,2, . . . , wc2,N> (concept vector), l1 (rescale parameter for concept c1), l2 (rescale parameter for concept c2), θ (threshold value for quality dimensions of a concept), α (emphasis multiplier).
Output: c (combined concept of c1 and c2).
Description:
1:  for each weight wc1,i in c1 do
2:    wc1,i = l1 + (l1 × wc1,i)/maxk(wc1,k);
3:  end for
4:  for each weight wc2,i in c2 do
5:    wc2,i = l2 + (l2 × wc2,i)/maxk(wc2,k);
6:  end for
7:  for each dimension i in c1 do
8:    if wc1,i > θ and wc2,i > θ then
9:      wc1,i = α × wc1,i; wc2,i = α × wc2,i;
10:   end if
11: end for
12: for each dimension i in c do
13:   wc,i = wc1,i + wc2,i;
14: end for
15: for each dimension i in c do
16:   wc,i = wc,i / √(Σk=1..N (wc,k)²);
17: end for
18: return c
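A sketch of Algorithm 1 over sparse weight dicts (the representation, the function name, and the choice to test the pre-rescaling weights against θ in the emphasis step are our assumptions):

```python
import math

def combine(c1, c2, l1=0.5, l2=0.3, theta=0.0, alpha=2.0):
    """Concept combination (Algorithm 1 / Equation 1), then L2-normalize."""
    m1, m2 = max(c1.values()), max(c2.values())
    # lines 1-6: rescale each concept's weights
    r1 = {k: l1 + (l1 * w) / m1 for k, w in c1.items()}
    r2 = {k: l2 + (l2 * w) / m2 for k, w in c2.items()}
    c = {}
    for k in set(r1) | set(r2):
        w1, w2 = r1.get(k, 0.0), r2.get(k, 0.0)
        # lines 7-11: emphasize dimensions exceeding theta in both concepts
        if c1.get(k, 0.0) > theta and c2.get(k, 0.0) > theta:
            w1, w2 = alpha * w1, alpha * w2
        c[k] = w1 + w2                                   # lines 12-14
    norm = math.sqrt(sum(w * w for w in c.values()))     # lines 15-17
    return {k: w / norm for k, w in c.items()}

combined = combine({"river": 2.0, "fish": 1.0}, {"river": 1.0, "holiday": 1.0})
```

The dimension "river", present in both concepts, is emphasized by α and dominates the combined vector.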

inference between them. Each tag stands for a concept, which can be represented by a weighted vector in the image conceptual space. Three inference models are proposed for tag expansion via information flow and various weighting schemes.

The highly weighted dimensions in the concept vector of a tag c are the tags that frequently co-occur with c in the corpus; the larger the weight, the stronger the association. However, some generic tags appear frequently throughout the collection, resulting in high weights in most tag vectors. Motivated by the concept of Inverse Document Frequency (IDF) in text mining, [17] proposed the Inverse Vector Frequency (IVF) to measure the information carried by a word, which is formally defined as:

IVF(c) = log((N + 0.5)/n) / log(N + 1)    (3)

where N is the total number of tags in the vocabulary and n is the number of tags co-occurring with c in the same image. The higher the occurrence frequency, the less information c carries. By considering both the specific contribution to a concept and the general information a tag carries, a TF/IVF inference model is designed.
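Equation 3 is a one-liner in practice; the vocabulary size below is taken from the paper's dataset (14,785 unique tags), while the example values of n are our own:

```python
import math

def ivf(N, n):
    """Inverse Vector Frequency (Equation 3): N is the vocabulary size,
    n is the number of tags co-occurring with the given tag."""
    return math.log((N + 0.5) / n) / math.log(N + 1)

# a tag co-occurring with few others carries more information
rare, generic = ivf(14785, 10), ivf(14785, 5000)
```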


Table 2. Co-occurrence Matrix

          river  fish  sunshine  kids  birds  holiday
river       -     3       1       1     2       1
fish        3     -       1       1     2       1
sunshine    1     1       -       1     1       0
kids        1     1       1       -     1       0
birds       2     2       1       1     -       0
holiday     1     1       0       0     0       -

Definition 3 (TF/IVF Inference Model). Given a combined concept ⊕ci and the j-th tag cj in the vocabulary, ⊕ci implies cj, iff

w⊕ci,j × IVF(cj) > λ    (4)

where w⊕ci,j stands for the contribution (i.e., the strength of the association) of cj to ⊕ci, and λ is a threshold value. If w⊕ci,j = 0, w⊕ci,j is set to the minimum positive weight of ⊕ci.

This model considers only the co-occurrence of ⊕ci and cj; the context information is not taken into account. In some cases w⊕ci,j is relatively small, meaning cj rarely appears with ⊕ci, so according to Definition 3 cj will most likely not be derived from ⊕ci. However, if the context of cj's appearances is similar to the context of ⊕ci, an implicit correlation is expected to exist between them, as evidenced in [17,7]. The IVF-HAL model [17] is proposed to capture the context information and discover the implicit relationship between words appearing in similar contexts.

Definition 4 (IVF-HAL Inference Model). Given a combined concept ⊕ci, it implies cj, iff

IVF(cj) × degree(⊕ci ⊢ cj) ≥ λ    (5)

where degree(⊕ci ⊢ cj) is defined in Equation 2 and λ is a threshold value predefined by the user.

The IVF-HAL inference model is an upgraded version of the HAL-based inference model introduced in Definition 2. The original weighting scheme of HAL is frequency biased: high-frequency words always obtain high weights in any concept vector, even though they may not be very informative. By introducing IVF, the effect of highly frequent words is decreased. The experimental results also confirm that the IVF-HAL inference model is superior to the original HAL model.

However, the IVF-HAL model considers the context information only, while the absolute contribution of cj to ⊕ci is neglected. To discover both the explicit and implicit relationships between concepts for accurate information inference, a novel inference model is proposed that takes into account not only the information carried by cj via the TF/IVF weighting scheme, but also the closeness of the contexts in which cj and ⊕ci appear.


Definition 5 (TF/IVF Context Sensitive Inference Model). Given a combined concept ⊕ci, it implies cj, iff

w⊕ci,j × IVF(cj) × degree(⊕ci ⊢ cj) ≥ λ    (6)

where degree(⊕ci ⊢ cj) is defined in Equation 2 and λ is a threshold value predefined by the user. If w⊕ci,j = 0, w⊕ci,j is set to the minimum positive weight of ⊕ci.
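Putting Equations 2, 3, and 6 together, the TF/IVF CS score of a candidate tag can be sketched as follows (the sparse-dict representation, helper names, and example numbers are our assumptions; N = 14,785 is the paper's vocabulary size):

```python
import math

def tfivf_cs_score(combined, cand, cand_vec, N, n_cand, theta=0.0):
    """Score candidate tag `cand` against the combined seed concept:
    w * IVF * degree (Equation 6). `combined` and `cand_vec` are sparse
    tag -> weight dicts; n_cand is the number of tags co-occurring with
    `cand`."""
    w = combined.get(cand, 0.0)
    if w == 0.0:  # back off to the minimum positive weight of the combined concept
        w = min(v for v in combined.values() if v > 0)
    ivf = math.log((N + 0.5) / n_cand) / math.log(N + 1)           # Equation 3
    q_c = {k for k, v in combined.items() if v > theta}
    q_j = {k for k, v in cand_vec.items() if v > theta}
    deg = sum(combined[k] for k in q_c & q_j) / sum(combined[k] for k in q_c)  # Eq. 2
    return w * ivf * deg

score = tfivf_cs_score({"sky": 0.6, "sun": 0.3, "sea": 0.1},
                       "sun", {"sky": 0.5, "sun": 0.5},
                       N=14785, n_cand=200)
```

Ranking all vocabulary tags by this score and keeping the top K yields the expanded tag list.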

Fig. 1. An explanation of the context relationship between two concepts

This model leverages the advantages of both the TF/IVF weighting scheme and the context sensitive information flow. When we conduct information inference on ⊕ci and cj, there are four possible situations.

1. ⊕ci and cj rarely co-occur and the contexts of their appearances are different. In this case both w⊕ci,j and degree(⊕ci ⊢ cj) are very low, so cj will not be inferred from ⊕ci.

2. ⊕ci and cj are highly associated, resulting in a high w⊕ci,j and a high degree(⊕ci ⊢ cj). Thus cj will be derived from ⊕ci.

3. ⊕ci and cj rarely co-occur but the contexts of their appearances are similar (dashed area in Figure 1(a)), resulting in a low w⊕ci,j but a high degree(⊕ci ⊢ cj). If we applied the TF/IVF model, cj could not be derived due to the low w⊕ci,j. By involving context similarity in the proposed model, a high degree(⊕ci ⊢ cj) compensates for the information loss caused by the low co-occurrence.

4. ⊕ci frequently co-occurs with cj, but degree(⊕ci ⊢ cj) is very low (Figure 1(b)). This means ⊕ci is a quite narrow concept contained by cj. For example, "cat" (⊕ci) appears frequently with "animal" (cj), while the context in which "animal" appears is much broader than that of "cat" (shadow area in Figure 1(b)). With the IVF-HAL model, the information inference between "cat" and "animal" cannot proceed, because IVF-HAL overemphasizes the effect of degree(⊕ci ⊢ cj), which is balanced by w⊕ci,j in our proposed model.


A comprehensive performance study of the above three inference models is given in the following section. The experimental results confirm the significant superiority of the proposed TF/IVF context sensitive model.

5 Experiments

In this section, we evaluate our three inference models and two other existing methods for tag expansion on a collection of real-life images downloaded from Flickr. A comprehensive performance study of the different models is provided.

5.1 Experimental Setup

Image Dataset. The image corpus is constructed by performing keyword searches on Flickr with "water", "nature", "outdoor", "river", and "environment". A total of 97,622 different images¹ with 669,470 tags are collected after stemming and removing stop words (e.g., "the"), camera models (e.g., "Canon"), numbers, and other insignificant words. The number of unique tags is 14,785. A separate testing dataset is also collected from Flickr on the same keywords, with time stamps from 10/30/2008 to 04/01/2009. It is hard to obtain real tagging ground truth; we assume that images with a larger number of tags are more likely to be well tagged, and use their tags as the ground truth in the evaluation. We therefore remove images containing fewer than 5 tags. After this preprocessing, the testing set contains 4,419 different images.

Evaluation Strategy. For each testing image, its first 1 to 4 tags are picked as seed tags, based on which tag expansion is performed with the different methods. We choose the first few tags as seeds because we assume Flickr users tend to input the most closely relevant tags first. Take the images in Figure 2 as an example. The initial user-defined tags of image (a) are "blue", "light", "clouds", "sky", etc. Setting the seed tags to the first two, "blue" and "light", the different models give different top-3 recommended tags as the expansion of the seed tags. For image (b), we use 3 seed tags, and the top 5 recommended tags are used for expansion. We take the initial user-defined tags as ground truth, from which precision can be calculated for performance evaluation. For image (a), our model provides all five tags correctly compared with the user-defined tags, giving precision@5 = 100%, while precision(IVF-HAL)@5 = 40% and precision(AR)@5 = 80%. The top K = 8, 10, 15, and 20 tags recommended by the different models are evaluated by comparing the corresponding precisions (i.e., precision@K).
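The precision@K used above can be sketched as follows (a small helper of ours, not the authors' evaluation code; exact synonym handling is ignored here):

```python
def precision_at_k(recommended, ground_truth, k):
    """Fraction of the top-k recommended tags that appear in the ground truth."""
    truth = set(ground_truth)
    return sum(1 for t in recommended[:k] if t in truth) / k

# 2 of the top 3 recommendations appear in the ground truth
p = precision_at_k(["sky", "sun", "beach"], ["sky", "cloud", "sun"], k=3)
```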

5.2 Experimental Results

In Figure 3 we show comprehensive experimental results for two widely used methods, association rule mining and language modeling, and the three inference models proposed in this paper.

1 Uploaded to Flickr from 04/30/2009 to 09/15/2011.


Fig. 2. An example of tag expansion

(a) User-defined tags: blue, light, clouds, sky, nature, ocean, islands, water, sun, peace, quiet
    2 seed tags: blue, light
    Our model: blue, light, sky, sun, cloud
    IVF-HAL model: blue, light, sunset, color, red
    Association rule: blue, light, nature, water, beach

(b) User-defined tags: autumn, bridge, tree, fall, water, river, leaves, nature, river, reflection
    3 seed tags: autumn, bridge, tree
    Our model: autumn, bridge, tree, fall, leave, river, sky, forest
    IVF-HAL model: autumn, bridge, tree, city, leave, fall, forest, wood
    Association rule: autumn, bridge, tree, green, water, flower, nature, sky

Association Rules (AR). Association rules have been investigated by Heymann et al. [11]. The general idea is that if two tags are often used together for image tagging, an explicit strong relationship exists between them. In the experiment, min_sup is set to 0.00488, yielding 262,555 frequent item sets. Unlike the other four methods, which rank the whole vocabulary according to a specified similarity measure and select the top K tags as results, the recommended tags of AR are derived from strong association rules, so the AR model may be unable to recommend any tags for a given set of seed tags. As observed in Figure 3, as the number of seed tags increases, the precision of AR drops when required to recommend 15 or 20 tags; it is difficult for the AR model to discover a large group of tags that all have strong associations with each other.

Language Modeling. Tag expansion is to some extent similar to query expansion in information retrieval. Thus, the method of query expansion with term relationships via language modeling [4] is also implemented and compared in our experiment. Its performance is the worst among all compared methods, which indicates that this query expansion method is not well suited to tags. In [4], query expansion is only used to expand the query sentence, which is then used as the index to select appropriate documents; besides the query model, there is also a document model in their setting. Moreover, the objective of their query expansion is to produce better document search results. Since we only take the query expansion part of their method (not the entire algorithm), the suboptimal performance is understandable.

Fig. 3. Comparisons of precision (precision@8, @10, @15, and @20 versus the number of seed tags, for the LM, IVF-HAL, AR, TF/IVF, and TF/IVF CS models)

TF/IVF Inference Model. Surprisingly, this method has relatively good performance, especially when no more than 10 tags are expanded and more than 2 seed tags are given. In this method, high-frequency tags are smoothed by IVF, and tightly related tags are prioritized by the weight of the tag in the concept vector of the combined concept. However, a critical problem remains: if two tags never co-occur but have very similar contexts, they are unlikely to be recommended by the TF/IVF model. This calls for a method that takes both co-occurrence and context similarity into consideration, which is exactly the motivation of our work.

IVF-HAL Inference Model. Based on the original HAL-based information flow [16], an improved information flow algorithm is proposed and used in the IVF-HAL inference model [17] to overcome the problem caused by highly frequent yet less informative words. The degree of each tag is ranked in descending order, and the top K tags are returned as expanded tags. Compared with the original HAL model², IVF-HAL successfully decreases the weights of generic high-frequency words when calculating the degree between the combined concept of the seed tags and the candidate tags. As observed in Figure 3, the precision of the IVF-HAL model is not as good as expected compared with the TF/IVF model. A possible reason is that the image corpus we generated for conceptual space construction is not comprehensive: it contains only a few topics according to our keyword settings, so the case of tags with low co-occurrence frequency yet high context similarity is uncommon and the advantage of the IVF-HAL model cannot show. However, with two seed tags, the precision of the IVF-HAL model is always better than that of the TF/IVF model. The parameters used in the model are set as l1 = 0.5, l2 = 0.3, α = 2, and θ = 0.

TF/IVF Context Sensitive Inference Model. The proposed TF/IVF CS model takes three aspects into account: TF increases the rank of highly co-occurring tags, IVF alleviates the effect of high-frequency tags, and degree adds the similarity of contexts as a further judging standard. As confirmed by the experimental results in Figure 3, the proposed model significantly outperforms the other models. Given two seed tags and K = 8, the relative precision improvements of the TF/IVF CS model over AR, LM, TF/IVF, and IVF-HAL are 35.4%, 113%, 12%, and 23.5% respectively. Given three seed tags and K = 8, the relative improvements are 45.9%, 102%, 5.5%, and 64%. Compared with the TF/IVF model, the effect of degree (i.e., context information) is shown by the precision improvement. The parameters used in the model are set as l1 = 0.5, l2 = 0.3, α = 2, and θ = 0.

5.3 User-Involved Assessment

We also perform a manual assessment experiment to further evaluate AR, the IVF-HAL model, and the TF/IVF CS model. 20 people were involved in evaluating the performance of the different tagging methods on 300 Flickr images (randomly picked from our test image set). The test images, along with ten ranked recommended tags, were presented to the assessors, who picked the tags they considered appropriate descriptions of the corresponding image. The experimental results are shown in Figure 4, where we present the precisions of the three models at various K values (K = 4..10); the number of seed tags is 3. We observe that the TF/IVF CS model remains superior to the other methods. Another interesting observation is that the precision evaluated by people is higher than when Flickr image tags are used as ground truth. The reason is that image tagging is highly subjective and the tags used for an image may be synonyms. Take image (a) in Figure 2 as an example: the blue water shown in the image can be annotated as "ocean" or "sea". In our example the user chose "ocean" as a tag rather than "sea"; once a tag expansion model recommends "sea" to the assessors, it will most likely be accepted as a correct tag.

2 Performance comparison between the HAL model and the IVF-HAL model is not shown in this paper due to the page limit.


[Figure: precision vs. K (K = 4..10) for the TF/IVF CS, IVF-HAL and AR models with 3 seed tags; precision axis from 0 to 0.7]

Fig. 4. Comparison of precision with human judgement

6 Conclusion

In this paper, we propose a new tag expansion method that utilizes an information inference model for more effective and complete tag expansion. Instead of using association rules or latent Dirichlet allocation, the proposed approach extends the Hyperspace Analogue to Language (HAL) model to infer image tags, taking advantage of information inference for context sensitive tag expansion to overcome the limitations of tag co-occurrence based methods. This approach is able to discover implicit relationships among different tags by analyzing the context of tag appearance. We demonstrate the effectiveness of this proposal with extensive experiments on a large Flickr image dataset, compared with several existing methods.

References

1. Citeulike, http://www.citeulike.org

2. Delicious, http://www.delicious.com

3. Flickr, http://www.flickr.com

4. Bai, J., Song, D., Bruza, P., Nie, J.-Y., Cao, G.: Query expansion using term relationships in language models for information retrieval. In: CIKM, pp. 688–695 (2005)

5. Barwise, J., Seligman, J.: Information Flow: The Logic of Distributed Systems. Cambridge University Press (1997)

6. Burgess, C., Livesay, K., Lund, K.: Explorations in context space: Words, sentences, discourse. Discourse Processes 25(2/3), 211–257 (1998)

7. Burgess, C., Lund, K.: Modeling parsing constraints with high-dimensional context space. Language and Cognitive Processes 12(2/3), 177–210 (1997)

8. Datta, R., Ge, W., Li, J., Wang, J.Z.: Toward bridging the annotation-retrieval gap in image search. IEEE MultiMedia 14(3), 24–35 (2007)

9. Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press (2000)


10. Golder, S.A., Huberman, B.A.: Usage patterns of collaborative tagging systems. J. Information Science 32(2), 198–208 (2006)

11. Heymann, P., Ramage, D., Garcia-Molina, H.: Social tag prediction. In: SIGIR, pp. 531–538 (2008)

12. Krestel, R., Fankhauser, P., Nejdl, W.: Latent Dirichlet allocation for tag recommendation. In: Proceedings of the 2009 ACM Conference on Recommender Systems, pp. 61–68 (2009)

13. Marlow, C., Naaman, M., Boyd, D., Davis, M.: HT06, tagging paper, taxonomy, Flickr, academic article, to read. In: Proceedings of the 17th ACM Conference on Hypertext and Hypermedia, pp. 31–40 (2006)

14. Moxley, E., Mei, T., Manjunath, B.S.: Video annotation through search and graph reinforcement mining. IEEE Transactions on Multimedia 12(3), 184–193 (2010)

15. Sigurbjörnsson, B., van Zwol, R.: Flickr tag recommendation based on collective knowledge. In: WWW, pp. 327–336 (2008)

16. Song, D., Bruza, P.: Towards context sensitive information inference. JASIST 54(4), 321–334 (2003)

17. Song, D., Bruza, P., Cole, R.: Concept learning and information inferencing on a high dimensional semantic space. In: Proceedings of the ACM SIGIR 2004 Workshop on Mathematical/Formal Methods in Information Retrieval (2004)

18. Wang, C., Jing, F., Zhang, L., Zhang, H.: Image annotation refinement using random walk with restarts. In: ACM Multimedia, pp. 647–650 (2006)

19. Wu, L., Yang, L., Yu, N., Hua, X.-S.: Learning to tag. In: WWW, pp. 361–370 (2009)

20. Xu, Z., Fu, Y., Mao, J., Su, D.: Towards the semantic web: Collaborative tag suggestions. In: Proceedings of the Collaborative Web Tagging Workshop at the WWW 2006 (2006)

21. Yang, Y., Huang, Z., Shen, H.T., Zhou, X.: Mining multi-tag association for image tagging. World Wide Web 14(2), 133–156 (2011)