Cross-tagging for personalized open social networking

Avaré Stewart, Ernesto Diaz-Aviles, Wolfgang Nejdl
L3S Research Center / University of Hannover
Appelstr. 9A, Hannover, Germany 30167
{stewart, diaz, nejdl}@L3S.de

Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme
University of Hildesheim
Marienburger Platz 22, Hildesheim, Germany 31141
{marinho, nanopoulos, schmidt-thieme}@ismll.uni-hildesheim.de

ABSTRACT
The Social Web is successfully established and poised for continued growth. Web 2.0 applications such as blogs, bookmarking, music, photo and video sharing systems are among the most popular, and all of them incorporate a social aspect, i.e., users can easily share information with other users. But because these applications are diverse and serve different aims, the Social Web is ironically divided. Blog users who write about music, for example, could benefit from users registered in other social systems operating within the same domain, such as a social radio station. Although these sites are two different and disconnected systems, offering distinct services to their users, the fact that their domains are compatible could benefit users of both systems with interesting and multi-faceted information. In this paper we propose to automatically establish social links between distinct social systems through cross-tagging, i.e., enriching a social system with the tags of other similar social system(s). Since tags are known for increasing the prediction quality of recommender systems (RS), we propose to quantitatively evaluate the extent to which users can benefit from cross-tagging by measuring the impact of different cross-tagging approaches on tag-aware RS for personalized resource recommendations. We conduct experiments on real-world data sets and empirically show the effectiveness of our approaches.

Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information Filtering; K.4 [Computer and Society]: General

General Terms: Algorithms, Performance, Experimentation

Keywords: Social Media, Recommender Systems, Web 2.0, Tags

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
HT'09, June 29–July 1, 2009, Torino, Italy.
Copyright 2009 ACM 978-1-60558-486-7/09/06 ...$5.00.

1. INTRODUCTION
The social networking phenomenon has attracted many millions of users and has resulted in a proliferation of sites. These sites intentionally seek to distinguish themselves by a set of community practices (social activities) and by what they offer members. However, given the sheer number of sites, there is often redundancy or overlap with respect to the type of media, resources or topics to which the sites are devoted. Although overlap exists, it is untapped to the benefit of those who actually constitute the social networking ecosystem: the result is a Social Networking Divide.

The momentum is swinging in favor of truly Open Social Networking (OSN), where data can be ported across various sites: Google (see footnote 1), MySpace [15] and Facebook [12]. These sites seek to establish de facto standards to handle issues related to the portability and interoperability of data, personal identities, and social graphs. Recent advances toward a more open social networking paradigm are also prevalent in the Semantic Web community and in cross-folksonomy platforms where the user's multiple identities are consolidated [22].

These efforts support, but do not address, how the social practices in one community may be exploited to support those in another, comparable social community when the underlying resources, but not necessarily the users, in the different social systems are the same.

Consider the scenario of an emerging OSN platform targeted towards linking open data for the purpose of facilitating information finding of tagged resources in the music domain [18]. In such a system, there are different types of (distinct) users. Bloggers, for example, write text about artists, tracks, albums or music videos, and taggers tag audio tracks. A registered tagger can greatly benefit from tags for improving browsing, searching and personalized recommendations, in contrast, for example, to registered non-tagging bloggers. For such environments, we propose Cross-Tagging: an approach by which the experience of a non-folksonomy user, such as a music blogger, is personalized by exploiting the tag assertions made by users of folksonomies (Figure 1). The user-tag-resource relations in the tag community are exploited by mapping common resources and inferring similarities between different users in the blog social community. By considering an Open Social Network from this perspective, the social activities in one site are exploited to support the discovery of new interrelationships within another.

1 http://www.google.com/friendconnect/


Figure 1: Cross-Tagging applied to distributed Web 2.0 applications operating over the same domain.

The contribution of this work lies in devising an open social networking framework based on two tag-aware recommender components:

1. First, we cast the Cross-Tagging problem as a tag recommendation problem, where tags from one social system are recommended in order to automatically annotate resources in another social system.

2. Second, state-of-the-art tag-aware resource recommenders exploit these annotations in order to recommend high-quality personalized resources to the users. This component is also used as an evaluator for measuring the quality of the tags generated by the first component, i.e., the higher the quality of the recommended tags, the better the performance of a tag-aware resource recommender that is based on them.

This paper is organized as follows: in Section 2, we present the related work on unified tag and user profile paradigms. In Section 3, we present our cross-tagging approach, beginning with the terminology. We then lay the foundation for our approach by discussing alternative tag recommendation algorithms: first an unpersonalized one and then a personalized one that is based on collaborative filtering (CF). The section concludes by describing the tag-aware recommendation algorithm we use. Section 4 describes a recommendation-based evaluation for cross-tagging. Finally, in Section 5, we present conclusions and future work.

2. RELATED WORK
In this section we present related work along two axes: (i) Unified Tag Spaces and (ii) Unified Profiles for Recommendations.

Unified Tag Spaces
Tagging has proved to be an intuitive and flexible Web 2.0 mechanism to facilitate search [1, 26], navigation (e.g., tag clouds) and recommendations [24], but syntactic and semantic differences across tagging systems make it difficult to exploit their latent information in a more versatile manner. For this reason, tags have been exploited to create a unified tag space where multiple information sources from comparable, but different domains are combined.

The “cross-media” system [2] offers a personalized search across disparate media types including video, images and social bookmarks. By relying upon external APIs, this system creates ranked lists of thumbnail items having tags that match the user's input. Personalization is achieved by either restricting the search space to the resources uploaded by the user, or ranking search results based on resources the user picks from result sets. Similarly, View Completion [20] uses collaborative tags to heuristically complete missing or inadequate feature sets (or views). The basic premise underlying view completion is that, for many tasks, combining multiple information sources yields significantly better results than using a single one alone. Views are used in this case since blogs are not typically available on collaborative tagging websites, and as such the tags provided by bloggers suffer from the vocabulary problem and cannot be adequately used as a shared index. Finally, in [17], the goal is to provide seamless navigation between tag spaces, while the work presented in [6] merges the areas of formal concept analysis and association rule mining to discover shared conceptualizations that are hidden in folksonomies.

These tag unification systems focus on media resources of various types, but in our case we are also interested in a unification paradigm that includes blogs. Bloggers can tag blog posts, but typically cannot directly associate a tag with a specific resource (i.e., a song, album or artist) that is being written about within the blog. Furthermore, the tags associated with the blog post may not be appropriate for all instances of the entities appearing in the entire post.

Unified Profiles for Recommendations
In recommender systems, Cross-System Personalization [25, 14, 13, 16] is a body of work that enables personal information to be shared across different systems. The focus has been on adequate representations of dependencies between the user's profiles, to support a unified representation in the different systems. These approaches are ego-centric in that they assume that the same user exists across different systems and that the user is interested in an aggregated view of their profile or social networking information. This is not the case in our view of an open social networking environment, where users are assumed to be similar (in some way), but have distinct digital identities.

Tags have been viewed as an explicit, personalized (tag-aware) annotation of a person [23, 24, 11, 7]. Conversely, tags have also been viewed as unpersonalized, i.e., as properties of a resource [5, 1]: in such cases, tag assignments are associated with a resource and are considered the same for all users. Moreover, within a single social system, tags have been integrated into the recommendation task for the purpose of recommending resources other than the tags themselves [4, 11]. Our cross-tagging approach exploits tag-aware recommender systems in two ways: first, to automatically annotate resources across multiple social systems; second, to exploit these annotations to provide users with personalized recommendations of resources of interest to them.

3. CROSS-TAGGING IN MUSIC DOMAIN
In this section we present our approach to Cross-Tagging in the music domain, in the context of a plausible application scenario for improving the user experience: an Open Social Network Recommender System.

For the purpose of this work, we use Blogger.com, a blog site, and Last.fm, a social music site, and present a uni-directional recommendation process: we first establish a mapping between the sites and then provide an enrichment via that mapping. We discuss each aspect in turn.

3.1 Terminology and Problem Definition
Before explaining our approach for crossing tags between different social systems, we first present some terminology about the concepts discussed in this paper. Similarly to [6], we define a folksonomy as a four-tuple, $F := (U, T, R, Y)$, where:

• $U$, $T$ and $R$ are finite sets, whose elements are called users, tags and resources, respectively, and

• $Y$ is a ternary relation between them, i.e., $Y \subseteq U \times T \times R$, whose elements are called tag assignments.

For convenience we define the Blogger.com system as a folksonomy $F_b$ where the set of tags initially contains only one element, i.e., $T_b := \{\text{default}\}$. We will use the subscripts $b$ and $l$ to distinguish the corresponding users, tags, resources and tag assignments from Blogger.com and Last.fm, respectively. Assuming that all resources present in Blogger.com are also present in Last.fm, a trivial way to cross tags between these two systems is to make a join with respect to the overlapping resources, i.e.,

$$Y_b^{new} := \pi_{u_b, t_l, r_b}\bigl(\sigma_{r_b = r_l}(Y_b \times Y_l)\bigr) \qquad (1)$$

where $\sigma$ and $\pi$ are the relational algebra operators for selection and projection.

First, from the Cartesian product $Y_b \times Y_l$, the tuples with equal resources in both sites are selected, and then the projection is taken over the Blogger.com users, the Last.fm tags, and the common resource elements. Given this mapping, the question is which of these tags are useful, if at all, to the Blogger.com users.
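The join in Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming that tag assignments are held in memory as sets of (user, tag, resource) triples; the function name `cross_join` and the toy data are hypothetical and do not come from the paper.

```python
def cross_join(Y_b, Y_l):
    """Sketch of Equation 1: select pairs with r_b = r_l, then project onto (u_b, t_l, r_b)."""
    # Index the Last.fm tags by resource so the Cartesian product is never materialized.
    tags_by_resource = {}
    for _, t_l, r_l in Y_l:
        tags_by_resource.setdefault(r_l, set()).add(t_l)
    # Every blog (user, resource) pair inherits all Last.fm tags of the shared resource.
    return {
        (u_b, t_l, r_b)
        for u_b, _, r_b in Y_b
        for t_l in tags_by_resource.get(r_b, ())
    }


# Toy data (hypothetical): blog assignments carry only the placeholder tag "default".
Y_b = {("paul", "default", "dont_cry"), ("paul", "default", "zito")}
Y_l = {
    ("jack", "rock", "dont_cry"),
    ("john", "hard rock", "dont_cry"),
    ("mary", "pop", "another_track"),
}
Y_new = cross_join(Y_b, Y_l)
```

On this toy input, the blog user inherits both Last.fm tags of the shared track and nothing for the track that has no Last.fm tags, which is exactly why a ranking step is needed afterwards.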

The problem can then be described as: (i) how to find the most appropriate Last.fm tags for a given $(u_b, r_b)$ pair, and (ii) how to measure to what extent these tags are indeed appropriate. We discuss our approaches to (i) and (ii) in subsections 3.2 and 3.3, respectively.

3.2 Cross-Tagging Approaches
Note that subproblem (i) can easily be cast as a tag recommendation problem, i.e., given a particular (user, resource) pair, one wants to suggest a certain number of tags that respect some previously defined criterion [7]. We therefore identified and selected two tag recommendation algorithms, introduced in [11, 7], that can easily be applied to the problem at hand. The first one corresponds to an unpersonalized approach and the second one is a personalized tag recommender based on collaborative filtering (CF). We discuss both approaches in the following subsections.

Unpersonalized Cross-Tagging
In this approach, the most popular tags used to annotate the active resource are recommended, i.e.:

$$T(u_b, r_b) := \operatorname*{argmax}^{n}_{t_l \in T_l} \, |Y_{r_b, t_l}| \qquad (2)$$

where $Y_{r_b, t_l} := Y_l \cap (U_l \times \{t_l\} \times \{r_b\})$. Note, however, that unlike in [11, 7], where the recommender algorithms operate on single data sets, here the tags assigned to the resource in question belong to users that are not necessarily in the Blogger.com system. The assumption behind this strategy is that collective knowledge about a given domain should hold across different systems, as long as these systems operate over the same domain. This method will serve as a baseline for our experiments.
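A minimal sketch of the most-popular baseline in Equation 2 follows, again assuming that Last.fm tag assignments are a set of (user, tag, resource) triples. The function name and the alphabetical tie-breaking rule are assumptions made for illustration; the paper does not say how ties are resolved.

```python
from collections import Counter


def most_popular_tags(Y_l, r_b, n):
    """Sketch of Equation 2: the n tags assigned to resource r_b by the most Last.fm users."""
    # Y_l is a set, so each (user, tag, resource) triple is counted once,
    # which makes the count equal to |Y_{r_b, t_l}|.
    counts = Counter(t for (_, t, r) in Y_l if r == r_b)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


# Toy data (hypothetical).
Y_l = {
    ("jack", "rock", "dont_cry"),
    ("john", "rock", "dont_cry"),
    ("ann", "rock", "dont_cry"),
    ("jack", "hard rock", "dont_cry"),
    ("john", "hard rock", "dont_cry"),
    ("ann", "ballad", "dont_cry"),
    ("ann", "pop", "another_track"),
}
top_tags = most_popular_tags(Y_l, "dont_cry", 2)
```

The blog user does not appear in the function signature at all, which makes the unpersonalized nature of this baseline explicit.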

Personalized Cross-Tagging
For this approach, a personalized tag recommender based on collaborative filtering (CF) is used. The idea is to first compute a neighborhood of Last.fm users based on how similar their profiles are to the Blogger.com user profiles. Next, the tags of the neighborhood that were used for the active resource are weighted, aggregated and sorted by decreasing weight (see Equation 4). The assumption behind this strategy is that users who share similar resources also share similar tags. We are able to show through our experiments (see Section 4) that this strategy indeed proves to be more beneficial to the users than the unpersonalized one.

Note that in CF, for $m$ users and $n$ resources (in our case music tracks), the user profiles are represented in a user-resource matrix $X \in \{0, 1\}^{m \times n}$. Each user profile is in turn represented as a row vector of $X$:

$$X := [\vec{x}_1, \ldots, \vec{x}_m]^T \quad \text{with} \quad \vec{x}_u := [x_{u,1}, \ldots, x_{u,n}], \quad \text{for } u := 1, \ldots, m,$$

where $x_{u,r} := 1$ indicates that user $u$ co-occurred with resource $r$. Also note that, since we do not have explicit feedback in the form of numerical ratings, the user-resource matrix is binary: 1 denotes that a certain user co-occurred with a particular resource, and 0 otherwise. In our case, we have two user-resource matrices, one for Blogger.com and one for Last.fm. The well-known cosine similarity measure was used for computing the $k$ most similar Last.fm users for a particular Blogger.com user, i.e.,

$$\mathrm{sim}(\vec{x}_l, \vec{x}_b) := \frac{\langle \vec{x}_l, \vec{x}_b \rangle}{\|\vec{x}_l\| \, \|\vec{x}_b\|}.$$

The best $k$ neighbors of $u_b$ in Last.fm are then computed as follows:

$$N^k_{u_b} := \operatorname*{argmax}^{k}_{u_l \in U_l} \, \mathrm{sim}(\vec{x}_l, \vec{x}_b) \qquad (3)$$

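After that, the set $T(u_b, r_b)$ of $n$ recommended tags for a given $(u_b, r_b)$ pair, and some $n \in \mathbb{N}$, is computed as follows:

$$T(u_b, r_b) := \operatorname*{argmax}^{n}_{t_l \in T_l} \sum_{u_l \in N^k_{u_b}} \mathrm{sim}(\vec{x}_b, \vec{x}_l) \, \delta(u_l, t_l, r_b) \qquad (4)$$

where $\delta(u_l, t_l, r_b) := 1$ if $(u_l, t_l, r_b) \in Y_l$ and 0 otherwise.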
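Equations 3 and 4 can be sketched together as follows. This is an illustrative sketch under stated assumptions: binary profiles are represented as Python sets of resource identifiers (equivalent to the rows of the binary user-resource matrix), neighbors with zero similarity are dropped, and ties are broken alphabetically. All names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two binary profiles given as sets of resources."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the overlap and each norm is sqrt(set size).
    return len(a & b) / (math.sqrt(len(a)) * math.sqrt(len(b)))


def personalized_tags(profile_b, r_b, profiles_l, Y_l, k, n):
    """Sketch of Equations 3 and 4 for one (u_b, r_b) pair."""
    sims = {u: cosine(profile_b, p) for u, p in profiles_l.items()}
    # Equation 3: the k most similar Last.fm users (zero-similarity users are skipped).
    ranked_users = sorted(sims, key=lambda u: (-sims[u], u))
    neighbours = {u for u in ranked_users[:k] if sims[u] > 0}
    # Equation 4: sum neighbour similarities over the tags they gave to r_b.
    scores = defaultdict(float)
    for u_l, t_l, r_l in Y_l:
        if r_l == r_b and u_l in neighbours:
            scores[t_l] += sims[u_l]
    ranked_tags = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked_tags[:n]]


# Toy data (hypothetical).
paul = {"dont_cry", "zito"}
profiles_l = {
    "jack": {"dont_cry", "zito"},
    "john": {"dont_cry", "zito", "track_x"},
    "ann": {"dont_cry", "track_a", "track_b", "track_c"},
}
Y_l = {
    ("jack", "guns n roses", "dont_cry"),
    ("john", "guns n roses", "dont_cry"),
    ("jack", "hard rock", "dont_cry"),
    ("ann", "rock", "dont_cry"),
    ("john", "rock", "zito"),
}
tags = personalized_tags(paul, "dont_cry", profiles_l, Y_l, k=2, n=2)
```

With k = 2, only the two closest profiles contribute, so a tag used solely by a distant user is never suggested, however popular it may be overall.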

Example. Consider the blog user Paul shown in Figure 2. The profile of this user is composed of two songs, i.e., Don't Cry and Z.I.TO. Suppose we want to recommend tags for the pair (Paul, Don't Cry) through the cross-tagging approaches introduced in Section 3.2.

For the unpersonalized one, we just need to count which tags were used most often for this song (see Equation 2). If we restrict the number of tags to be recommended to 2, we would have the tags Rock and Hard Rock (see upper part of Figure 2b). Note that through this approach, each user would receive the same tag recommendations independently of his/her profile.

For the personalized approach, we first find the users having the most similar profiles to Paul (see Equation 3), in this case Jack and John (see Figure 2a), and aggregate the tags that these “best neighbors” assigned to the resource for which we want to recommend tags (see Equation 4). This would lead, in this particular case, to Guns N' Roses and Hard Rock (see bottom part of Figure 2b). Note that in this case, different users may receive different tag recommendations for the same resource, since the recommendations are based on the individual profiles of the users and thus reflect their personal interests.

Figure 2: (a) A blog user profile (top) and Last.fm users (bottom). (b) Output of unpersonalized cross-tagging (top) and output of personalized cross-tagging (bottom).

3.3 Personalized Recommendations based on Tensor Approximation
The tags that are generated by the cross-tagging process can be exploited for providing personalized recommendations of resources. This section describes the tag-aware recommendation algorithm that we use for this purpose. The motivation for using a recommendation algorithm is twofold: (a) Recommended resources significantly improve the blog users' everyday experience, by allowing them to easily locate resources and by addressing the “information overload” problem that emerges in large blogs. (b) The task of recommendation constitutes a suitable evaluation framework for cross-tagging strategies: the higher the quality of the generated tags (i.e., tags that better reflect the personalized aspect of users for the corresponding resources), the better the performance of a tag-aware recommendation algorithm that is based on them. A similar idea is used in [10], where the authors used ontology-aware recommender systems to indirectly measure the quality of a given set of ontologies.

As pointed out in Section 3.1, cross-tagging results in a set of triples of the form $\langle u, t, r \rangle$, denoting that user $u$ tagged resource $r$ with the tag $t$. We model the set of all triples with a 3-order tensor (3-dimensional array) $\mathcal{A} = (a_{u,t,r}) \in \mathbb{R}^{N_U \times N_T \times N_R}$, where $N_U$, $N_T$, $N_R$ are the total numbers of users (first mode of $\mathcal{A}$), tags (second mode of $\mathcal{A}$), and resources (third mode of $\mathcal{A}$), respectively. For each $u, t, r$ ($1 \le u \le N_U$, $1 \le t \le N_T$, $1 \le r \le N_R$) for which there exists a triple $\langle u, t, r \rangle$, we set the corresponding element $a_{u,t,r}$ equal to 1, whereas all other elements of $\mathcal{A}$ are set to 0. Thus, $\mathcal{A}$ is, in general, a sparse tensor.
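The tensor construction described above can be sketched with NumPy. This is a toy illustration that uses a dense array for readability; at the scale of the real data a sparse representation would be needed. The function name, the index maps and the sample triples are hypothetical.

```python
import numpy as np


def build_tensor(triples):
    """Build a binary users x tags x resources tensor from (user, tag, resource) triples."""
    users = sorted({u for u, _, _ in triples})
    tags = sorted({t for _, t, _ in triples})
    resources = sorted({r for _, _, r in triples})
    u_idx = {u: i for i, u in enumerate(users)}
    t_idx = {t: i for i, t in enumerate(tags)}
    r_idx = {r: i for i, r in enumerate(resources)}
    # Mode order follows the text: users (first), tags (second), resources (third).
    A = np.zeros((len(users), len(tags), len(resources)))
    for u, t, r in triples:
        A[u_idx[u], t_idx[t], r_idx[r]] = 1.0
    return A, users, tags, resources


# Toy enriched-blog triples (hypothetical).
triples = {
    ("paul", "hard rock", "dont_cry"),
    ("paul", "guns n roses", "dont_cry"),
    ("maria", "hard rock", "zito"),
}
A, users, tags, resources = build_tensor(triples)
```

The returned index lists let a position in the tensor be mapped back to the user, tag and resource it stands for once predictions have been computed.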

By modeling the triples with a tensor, we are able to exploit the underlying latent semantic structure in $\mathcal{A}$ formed by multi-way correlations between users, tags, and resources. This can be attained using a recommendation algorithm that is based on tensor reduction, which has been proposed in [21]. With this algorithm we can effectively detect multi-way correlations, leading to improved performance, which is empirically confirmed by our experimental results.

First, we decompose $\mathcal{A}$ using the Tucker decomposition, which is the multi-dimensional analog of SVD for tensors [8]. The decomposition of $\mathcal{A}$ is expressed in Equation 5. $U \in \mathbb{R}^{N_U \times N_U}$, $T \in \mathbb{R}^{N_T \times N_T}$, $R \in \mathbb{R}^{N_R \times N_R}$ are orthonormal matrices corresponding to the dominant singular vectors per mode. $\mathcal{S}$ is the core tensor that contains the singular values; thus it has the same number of dimensions as $\mathcal{A}$ and the property of all-orthogonality (see footnote 2). The symbol $\times_i$ denotes the $i$-mode multiplication between a tensor and a matrix.

$$\mathcal{A} = \mathcal{S} \times_1 U \times_2 T \times_3 R \qquad (5)$$

After decomposing $\mathcal{A}$, we truncate the matrices $U$, $T$, $R$ and the core tensor $\mathcal{S}$ by maintaining only the highest $D$ singular values and the corresponding singular vectors per mode (henceforth, $D$ denotes the fraction, e.g., 0.7, of the maintained values divided by the original number of values). This produces the truncated matrices $U_D \in \mathbb{R}^{N_U \times D}$, $T_D \in \mathbb{R}^{N_T \times D}$, $R_D \in \mathbb{R}^{N_R \times D}$, and the truncated core tensor $\mathcal{S}_D \in \mathbb{R}^{D \times D \times D}$.

2 Unlike the SVD of 2-order tensors, i.e., matrices, however, $\mathcal{S}$ is not diagonal.

Using truncation we can approximate $\mathcal{A}$ with the reconstructed tensor $\hat{\mathcal{A}} \in \mathbb{R}^{N_U \times N_T \times N_R}$, as expressed in Equation 6 and illustrated in Figure 3 (see footnote 3).

$$\hat{\mathcal{A}} := \mathcal{S}_D \times_1 U_D \times_2 T_D \times_3 R_D \qquad (6)$$

Figure 3: Visualization of the Tensor Approximation method.

The reconstructed tensor $\hat{\mathcal{A}}$ is not sparse. The value of each element $\hat{a}_{u,t,r}$ in $\hat{\mathcal{A}}$ predicts the association among user $u$, tag $t$, and resource $r$ (the higher the value, the stronger the association). In particular, all the non-zero elements of $\hat{\mathcal{A}}$ represent quadruplets of the form $\langle u, t, r, p \rangle$, with $p$ expressing the likeliness that $u$ will tag $r$ as $t$. Therefore, resources can be recommended to a user $u$ for a particular tag $t$ according to the weights associated with the quadruplets that contain the $(u, t)$ pair. If we want to recommend $N$ resources to $u$ for $t$, then we select the $N$ ones with the highest corresponding $p$ value.
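The decompose, truncate and reconstruct steps of Equations 5 and 6 can be illustrated with a truncated higher-order SVD (HOSVD), which is one common way to compute a Tucker decomposition. The sketch below is an assumption-laden illustration, not the algorithm of [21]: it works on a small dense array, takes absolute ranks per mode rather than the fraction $D$, and all function names are hypothetical.

```python
import numpy as np


def unfold(A, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)


def truncated_hosvd(A, ranks):
    """Keep the leading singular vectors per mode and rebuild the approximation."""
    factors = []
    for mode, rank in enumerate(ranks):
        left, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(left[:, :rank])
    U, T, R = factors
    # Core tensor: A multiplied by the transposed factor matrix along each mode.
    core = np.einsum("utr,ua,tb,rc->abc", A, U, T, R)
    # Reconstruction as in Equation 6: core x1 U x2 T x3 R.
    A_hat = np.einsum("abc,ua,tb,rc->utr", core, U, T, R)
    return A_hat, core, factors


def recommend(A_hat, u, t, N):
    """Indices of the N resources with the highest predicted weight for the pair (u, t)."""
    return [int(i) for i in np.argsort(-A_hat[u, t, :])[:N]]


# Toy binary tensor (hypothetical): 3 users x 4 tags x 5 resources.
rng = np.random.default_rng(0)
A = (rng.random((3, 4, 5)) < 0.3).astype(float)
A_full, _, _ = truncated_hosvd(A, (3, 4, 5))  # no truncation
A_hat, core, _ = truncated_hosvd(A, (2, 2, 2))  # truncated approximation
top = recommend(A_hat, u=0, t=1, N=2)
```

Without truncation the reconstruction reproduces the input, whereas the truncated version is generally dense, which is what allows unseen (user, tag, resource) combinations to receive non-zero scores.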

4. EXPERIMENTS
This section first describes the test data sets and evaluation metric, and then reports the experimental results.

4.1 Data Set
For Cross-Tagging, we used two data sets: the first consists of personal music blogs from Blogger.com, one of the most popular blog sites, whereas the second consists of tagged tracks from Last.fm, a radio and music community website and one of the largest social music platforms. The details of each data set are presented in this section and summarized in Table 1.

Blogger.com Data
The raw music blogs were collected by experimentally selecting seed bloggers using several music directories (see footnote 4) and limiting the bloggers selected to the genre of pop and rock music. The blogroll for each seed was traversed, fanning out in breadth-first order and going three levels deep in the blogroll hierarchy. For each of the pages, the HTML was stripped; no stemming or stop word removal was done. The total number of bloggers was $|U_{Blogger.com}| = 6{,}620$. Once the bloggers' pages were collected, profiles were built by parsing the tracks in the blogs, relying upon a dictionary of tracks gathered from the Billboard Music Dictionary (see footnote 5).

Table 1: Data Sets Summary

| | Blogger.com (data collected) | Last.fm (data collected) | Enriched Blog (data used in the experiments†) |
|---|---|---|---|
| \|U\|, Users | 6,620 | 44,143 | 3,827 |
| \|R\|, Resources (i.e., tracks) | 17,372 | 17,372 | 1,323 |
| \|T\|, Tags | 0 | 4,903 | 422 |
| \|Y\|, Tag-Assignments | 0 | 254,388 | 32,900 |

† For our experiments, we only considered the $(u_b, r_b)$ pairs for which a personalized top-10 tag recommendation could be generated.

3 Due to the sparsity of $\mathcal{A}$, its decomposition and approximation can be performed efficiently following the approach of Sun and Kolda [9].
4 http://www.musicblogscatalog.com/, http://yocheckthisjam.com/music-blog-directory/, http://www.blogged.com/directory/entertainment/music/rock, http://www.blogcatalog.com/directory/music/rock

Billboard Music Dictionary
To extract the blogger profiles, a dictionary was constructed from Billboard, a magazine devoted to the music industry which monitors the most popular songs and albums in various categories on a weekly basis. The data was obtained via license for a specified period of time. The artist and track dictionary was constructed from the Billboard Hot 100, and the Billboard 200 survey was used for constructing the album dictionary.

Some filtering was required for the dictionary entries, particularly for concatenating tokens within track names such as “and”, “featuring”, dashes or parentheses. Concatenating tokens were removed and each artist was then made a separate dictionary entry. This process led to duplicate dictionary entries, so an additional filter to remove duplicates was subsequently applied. Some forty-two domain-specific stop words that describe variations of a track or album, such as “Radio Edit”, “Main Version”, or “Original Version”, were also applied. All dictionary entries were converted to lower case.

The approximate dictionary matching implementation of the Aho-Corasick algorithm available in LingPipe (see footnote 6) was used with the tolerance set to zero. It was chosen for its performance with respect to the size of the dictionary, which in our case consisted of 51,226 distinct tracks.
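The dictionary clean-up and zero-tolerance matching described above can be illustrated with a deliberately naive Python sketch. It is a stand-in, not the LingPipe Aho-Corasick matcher: it scans the dictionary entry by entry, whereas Aho-Corasick finds all entries in a single pass over the text. The three stop phrases are the examples named in the text; the function names and sample strings are hypothetical.

```python
import re

# Three of the domain-specific stop phrases named in the text (the full list is not given).
STOP_PHRASES = ("radio edit", "main version", "original version")


def _clean(text):
    """Lower-case and collapse everything except letters, digits and apostrophes to spaces."""
    return " ".join(re.sub(r"[^a-z0-9']+", " ", text.lower()).split())


def normalize_title(title):
    """Normalize a track title and strip version markers such as 'Radio Edit'."""
    cleaned = _clean(title)
    for phrase in STOP_PHRASES:
        cleaned = cleaned.replace(phrase, " ")
    return " ".join(cleaned.split())


def build_dictionary(titles):
    """Normalized, de-duplicated dictionary entries (empty strings are dropped)."""
    return {normalize_title(t) for t in titles} - {""}


def match_tracks(text, dictionary):
    """Exact whole-word matching of dictionary entries in a blog text (zero tolerance)."""
    padded = " " + _clean(text) + " "
    return {entry for entry in dictionary if " " + entry + " " in padded}


# Hypothetical dictionary entries and blog text.
dictionary = build_dictionary(
    ["Don't Cry (Radio Edit)", "Don't Cry", "November Rain - Main Version"]
)
found = match_tracks("Yesterday I listened to Don't Cry, twice.", dictionary)
```

Normalizing the titles and the blog text with the same `_clean` step is what makes exact matching workable, and it also shows why the duplicate filter is needed: two raw titles collapse to the same entry here.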

Last.fm Data
A total of 17,372 unique tracks for both Blogger.com and Last.fm were collected. Also, for Last.fm, a total of 44,143 users, 4,903 tags, and 254,388 user-resource-tag triples were obtained.

Enriched Blog Data
For our experiments, we only considered the $(u_b, r_b)$ pairs for which a personalized top-10 tag recommendation could be generated. Note that it is not always the case that a given neighborhood will have the resource for which we want to recommend tags. This yielded 3,827 users, 1,323 resources, 422 distinct tags and 32,900 triples.

5 http://www.billboard.com/bbcom/index.jsp
6 http://alias-i.com/lingpipe/

Figure 4: Experimental Results, panels (a) to (d). The recoverable panels plot Recall@5 on the y-axis (ticks from 0 to 0.7) with curves for P and MP: one panel varies n (ticks 1, 5, 10) at NN=30, D=0.5, and one varies NN (ticks 10, 30, 50, 100) at n=5, D=0.5.

4.2 Experimental Results

Protocol and Evaluation Metric
For measuring the recommendation quality, we used the recall measure (see footnote 7). We examine the Tensor Approximation algorithm having as input either a folksonomy created through “Personalized” Cross-Tagging (for convenience denoted as P) or through “Unpersonalized” Cross-Tagging (denoted as MP for “most popular”). In order to investigate the overall impact of Cross-Tagging, we also compare the tag-aware recommender with the classic plain CF without tags, denoted here as UB for user-based CF [19]. The evaluation was performed with the Allbut1 [3] protocol, i.e., for each user one resource was randomly hidden and used for testing, while the remaining ones were used for training.
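The all-but-one protocol and the recall measure can be sketched as follows. This is a minimal sketch under stated assumptions: profiles are sets of resources per user, users with fewer than two resources are skipped (the paper does not say how such users are handled), and `recommend` stands for any hypothetical callable that returns a list of recommended resources.

```python
import random


def all_but_one_split(profiles, seed=0):
    """For each user, hide one randomly chosen resource and keep the rest for training."""
    rng = random.Random(seed)
    train, hidden = {}, {}
    for user, resources in profiles.items():
        if len(resources) < 2:
            continue  # assumption: nothing would remain for training
        held_out = rng.choice(sorted(resources))
        hidden[user] = held_out
        train[user] = set(resources) - {held_out}
    return train, hidden


def recall_at_n(recommend, train, hidden, n):
    """Fraction of users whose hidden resource appears among the top-n recommendations."""
    if not hidden:
        return 0.0
    hits = sum(
        1 for user, item in hidden.items() if item in recommend(user, train, n)
    )
    return hits / len(hidden)


# Toy profiles (hypothetical).
profiles = {
    "paul": {"dont_cry", "zito", "track_x"},
    "jack": {"dont_cry", "zito"},
    "solo": {"dont_cry"},
}
train, hidden = all_but_one_split(profiles)
```

Because exactly one resource is hidden per user, the recall for a user is either 0 or 1, and the reported value is simply the hit rate over all test users.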

Parameters
The parameters we considered in our experiments are the following: n is the number of suggested tags (default value n = 5), NN is the neighborhood size used in the personalized Cross-Tagging approach (default value NN = 30), and D is the fraction of maintained dimensions per mode during the tensor reduction (default value D = 0.5). For UB we examined various user-neighborhood sizes and report results for the best value, equal to 50.

7 With a fixed number of recommendations, precision is just the same as recall up to a multiplicative constant, and thereby there is no need to evaluate precision.

Results

Figure 4a depicts the recall of P, MP, and UB for a varying number of recommended resources. As expected, the recall of all methods increases with an increasing number of recommendations. UB is clearly outperformed by P and MP, which confirms our assumption that tags can carry valuable information. Moreover, P attains better recall than MP, especially for numbers of recommended resources that are reasonable for real-world applications. The reason lies in the better personalization during the process of Cross-Tagging. To simplify the presentation, we henceforth omit results for UB, as it is consistently outperformed by the other two methods.

To gain further insight into the performance of P and MP, we proceed to examine the impact of the parameters. Figure 4b depicts recall@5 (i.e., recall when 5 resources are recommended) while varying n, the number of tags. When n is low, the performance of both P and MP drops, as there is not enough margin left to personalize the Cross-Tagging process. However, after a point, considering more tags does not add much to the performance. This indicates that a reasonable number of tags per post is enough both to attain good recommendation quality and to speed up the Cross-Tagging process. Note that in all cases P performs better than MP.

Next, we examine the impact of NN on P (note that MP is independent of NN). Figure 4c depicts recall@5 for varying NN. Evidently, when NN is high (e.g., 100), the personalization process is not effective, as very “poor” neighbors are also considered, and the performance of P decreases. On the other hand, when the value of NN allows for the identification of more similar and consistent neighbors, Cross-Tagging proves to be effective.
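To make the role of NN concrete, the following sketch selects the NN most similar folksonomy users for a blog user and aggregates their tags for one resource. It is only an illustration under stated assumptions: cosine similarity over resource sets, similarity-weighted tag voting, and all function and variable names are choices made here, not the definitions used in the experiments.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two resource sets (binary profiles)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def top_neighbors(
    target_items: Set[str], profiles: Dict[str, Set[str]], nn: int
) -> List[Tuple[str, float]]:
    """Return up to nn folksonomy users with positive similarity, most similar first."""
    scored = [(user, cosine(target_items, items)) for user, items in profiles.items()]
    scored = [(user, sim) for user, sim in scored if sim > 0.0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:nn]


def cross_tag(
    resource: str,
    neighbors: List[Tuple[str, float]],
    tag_assignments: Dict[Tuple[str, str], Set[str]],
    n: int,
) -> List[str]:
    """Rank tags that neighbors assigned to the resource, weighted by similarity."""
    scores: Dict[str, float] = defaultdict(float)
    for user, sim in neighbors:
        for tag in tag_assignments.get((user, resource), ()):
            scores[tag] += sim
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in ranked[:n]]
```

In this sketch a larger `nn` admits neighbors with lower similarity, whose votes can dilute the ranking, and `cross_tag` returns an empty list when no selected neighbor has tagged the resource.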

Finally, we examine the impact of D on P. Figure 4d depicts recall@5 for varying D. When D is high, noise in the data is not effectively filtered out. In contrast, when D is low, information is lost. For a reasonable range of D values, however, P performs well and has the nice property of not being too sensitive to D.
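The effect of D can be illustrated with a truncated higher-order SVD that keeps a fraction D of the dimensions in each mode of the user-resource-tag tensor. This is a generic sketch using NumPy; whether it matches the exact tensor reduction used in the experiments is an assumption, and the function names and the rounding rule for the per-mode rank are choices made here.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Matricize the tensor so that the given mode indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Multiply the tensor by a matrix along one mode."""
    moved = np.moveaxis(tensor, mode, 0)
    rest = moved.shape[1:]
    result = matrix @ moved.reshape(moved.shape[0], -1)
    return np.moveaxis(result.reshape((matrix.shape[0],) + rest), 0, mode)


def truncated_hosvd(tensor: np.ndarray, d: float):
    """Keep a fraction d of the leading singular vectors in every mode."""
    if not 0.0 < d <= 1.0:
        raise ValueError("d must be in (0, 1]")
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        # Assumed rounding rule: nearest integer, at least one dimension kept.
        rank = min(max(1, int(round(d * tensor.shape[mode]))), u.shape[1])
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto the kept subspace
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)  # map back to the original size
    return approx, core, factors
```

With `d` close to 1 the approximation reproduces the input, noise included; with a small `d` the reconstruction discards more of the original structure, which mirrors the trade-off described above.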

4.3 Discussion

Simplifying assumptions were made when constructing profiles. A more robust approach, which applies confidence measures to profiles based on a combination of statistical, semantic and syntactic approaches, would be needed to disambiguate the named entities extracted from the music blog text. For example, light-weight “semantic association networks” derived from collocated terms in music blogs, or the mining of frequent item sets, could be useful in identifying prominent pairs of resources. Profiles which exhibit less prominent patterns could be weighted with lower confidence. Syntactic approaches could involve part-of-speech analysis applied to the music blog text. Terms whose part of speech is unlikely for a music entity can be weighted with a lower confidence.

5. CONCLUSION AND FUTURE WORK

In this work we introduced Cross-Tagging in the music domain, an approach to Open Social Networking where the experience of a non-folksonomy music blogger is personalized by exploiting the tag assertions made by folksonomy users in Last.fm. Our Cross-Tagging approach exploits tag-aware recommender systems by first automatically annotating resources across multiple social systems and then modeling resource triples with a tri-dimensional array (i.e., tensor), to exploit the underlying latent semantic structure formed by multi-way correlations between them.

We evaluated the approach with a recommendation algorithm based on a tensor reduction algorithm that can effectively detect multi-way correlations. We found that, compared to classic collaborative filtering without tags and to an unpersonalized cross-tagging system, personalized cross-tagging achieved better personalization as measured by recall.

These results suggest that the social practices in one community can be exploited to support those in another, comparable social community when the underlying resources are the same but the users are not. This brings the open initiative community one step closer to closing the gap in the social network divide.

In future work, we plan to explore the use of more robust profiles, which apply confidence measures for disambiguating the named entities extracted from the music blog text. We also plan to extend the approach to bi-directional recommendation, to support both mutual and dual enrichment for each social site in the Cross-Tagging ecosystem.

Acknowledgments

This work was funded in part by the European Project PHAROS (IST Contract No. 045035), by the Programme Alβan, the European Union Programme of High Level Scholarships for Latin America, scholarship no. E07D400591SV, by CNPq, an institution of the Brazilian Government for scientific and technologic development, and by the X-Media project (www.xmedia-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978.

6. REFERENCES

[1] K. Bischoff, C. S. Firan, W. Nejdl, and R. Paiu. Can all tags be used for search? In CIKM ’08: Proceedings of the 17th Conference on Information and Knowledge Management. To appear. ACM, 2008.

[2] M. Braun, K. Dellschaft, T. Franz, D. Hering, P. Jungen, H. Metzler, E. Muller, A. Rostilov, and C. Saathoff. Personalized search and exploration with mytag. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 1031–1032, New York, NY, USA, 2008. ACM.

[3] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43–52. Morgan Kaufmann, 1998.

[4] C. S. Firan, W. Nejdl, and R. Paiu. The benefit of using tag-based profiles. In LA-WEB ’07: Proceedings of the 2007 Latin American Web Conference, pages 32–41, Washington, DC, USA, 2007. IEEE Computer Society.

[5] C. Hanser and B. Berendt. Tags are not metadata, but “just more content” - to some people. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), 2007.

[6] R. Jaschke, A. Hotho, C. Schmitz, B. Ganter, and G. Stumme. Discovering shared conceptualizations in folksonomies. Web Semant., 6(1):38–53, 2008.

[7] R. Jaschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in social bookmarking systems. AI Communications, pages 231–247, 2008.

[8] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review. To appear (accepted June 2008).

[9] T. G. Kolda and J. Sun. Scalable tensor decompositions for multi-aspect data mining. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 2008.

[10] L. B. Marinho, K. Buza, and L. Schmidt-Thieme. Folksonomy-based collabulary learning. In International Semantic Web Conference (ISWC 08). Springer, 2008.


[11] L. B. Marinho and L. Schmidt-Thieme. Collaborative tag recommendations. In Proceedings of 31st Annual Conference of the Gesellschaft für Klassifikation (GfKl), Freiburg. Springer, 2007.

[12] C. McCarthy. Myspace announces “Data Availability” project with yahoo, ebay, photobucket, twitter. http://news.cnet.com/8301-13577_3-9939286-36.html, 2008.

[13] B. Mehta and T. Hofmann. Cross system personalization by learning manifold alignments. In KI 2006: Advances in Artificial Intelligence, volume 4314/2007, pages 244–259. Springer Berlin / Heidelberg, 2006.

[14] B. Mehta, T. Hofmann, and P. Fankhauser. Cross system personalization by factor analysis. In ITWP Workshop at AAAI 2006. AAAI Press, 2006.

[15] D. Morin. Announcing facebook connect. http://developers.facebook.com/news.php?blog=1&story=108, 2008.

[16] C. Niederee, A. Stewart, B. Mehta, and M. Hemmje. A multi-dimensional, unified user model for cross-system personalization. In Proceedings of Workshop On Environments For Personalized Information Access at Advanced Visual Interfaces, May 2004.

[17] S. Oldenburg. Comparative studies of social classification systems using rss feeds. In J. Cordeiro, J. Filipe, and S. Hammoudi, editors, WEBIST (2), pages 394–403. INSTICC Press, 2008.

[18] R. Paiu, L. Chen, C. S. Firan, and W. Nejdl. Pharos - personalizing users’ experience in audio-visual online spaces. In PersDB, pages 40–47, 2008.

[19] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: An open architecture for collaborative filtering of netnews. In Proc. of ACM 1994 Conference on Computer Supported Cooperative Work, pages 175–186, Chapel Hill, North Carolina, 1994. ACM.

[20] Shankara Bhargava Subramanya. View Completation And Collaborative Tagging In Blogosphere. Master’s thesis, Arizona State University, July 2008.

[21] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In 2nd ACM Conference in Recommender Systems (RecSys 08), pages 43–50, Lausanne, Switzerland, 2008.

[22] M. Szomszor, H. Alani, I. Cantador, K. O’Hara, and N. Shadbolt. Semantic modelling of user interests based on cross-folksonomy analysis. In International Semantic Web Conference, volume 5318 of Lecture Notes in Computer Science, pages 632–648. Springer, 2008.

[23] M. Szomszor, H. Alani, I. Cantador, K. O’Hara, and N. Shadbolt. Semantic modelling of user interests based on cross-folksonomy analysis. In International Semantic Web Conference, pages 632–648, 2008.

[24] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In SAC ’08: Proceedings of the 2008 ACM symposium on Applied computing, pages 1995–1999, New York, NY, USA, 2008. ACM.

[25] C. Wang, Y. Zhang, and F. Zhang. User modeling for cross system personalization in digital libraries. Information Technologies and Applications in Education, 2007. ISITAE ’07. First IEEE International Symposium on, pages 238–243, Nov. 2007.

[26] J. Wang and B. D. Davison. Explorations in tag suggestion and query expansion. In SSM ’08: Proceeding of the 2008 ACM workshop on Search in social media, pages 43–50, New York, NY, USA, 2008. ACM.
