Collaborator Recommendation for Isolated Researchers


Tin Huynh∗, Atsuhiro Takasu†, Tomonari Masada‡, Kiem Hoang∗

∗University of Information Technology, Linh Trung Ward, Thu Duc Dist, HCM City, Vietnam
†National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430 Japan
‡Nagasaki University, 1-14 Bunkyo-machi, Nagasaki-shi, Nagasaki, 852-8521 Japan

Email: [email protected], [email protected], [email protected], [email protected]

Abstract—Successful research collaborations may facilitate major outcomes in science and their applications. Thus, identifying effective collaborators may be a key factor that affects success. However, it is very difficult to identify potential collaborators and it is particularly difficult for young researchers who have less knowledge about other researchers and experts in their research domain. This study introduces and defines the problem of collaborator recommendation for ‘isolated’ researchers who have no links with others in coauthor networks. Existing approaches such as link-based and content-based methods may not be suitable for isolated researchers because of their lack of links and content information. Thus, we propose a new approach that uses additional information as new features to make recommendations, i.e., the strength of the relationship between organizations, the importance rating, and the activity scores of researchers. We also propose a new method for evaluating the quality of collaborator recommendations. We performed experiments by crawling publications from the Microsoft Academic Search website. The metadata were extracted from these publications, including the year, authors, organizational affiliations of authors, citations, and references. The metadata from publications between 2001 and 2005 were used as the training data while those from 2006 to 2011 were used for validation. The experimental results demonstrated the effectiveness and efficiency of our proposed approach.

Keywords-Coauthor Network, Collaboration Quality, Collaborator Recommendation, ‘Isolated’ Researcher.

I. INTRODUCTION

The numbers of researchers and their collaborative relationships have been increasing rapidly. This makes it challenging to analyze coauthor networks for predictive or recommendation purposes. In particular, the identification of potential collaborators is a key factor that may affect the success of research projects. Students and junior researchers may need to find collaborators during their studies or research. However, they may have little knowledge about the researchers and experts in their research community and so may encounter problems identifying potential collaborators. Therefore, it would be useful to have access to a collaborator recommendation system that calculates and generates a list of potential collaborators automatically.

To address this problem, the most effective methods that have been proposed are link based and content based [1][2][3][4]. Using coauthor networks, link-based methods are powerful approaches for predicting links and making researcher recommendations. Unfortunately, many students and junior researchers have no links in coauthor networks (and are thus ‘isolated’ researchers) and so such methods are not suitable in these cases (Figure 1). The aim of this study was to develop new methods to address the collaborator recommendation problem. Our proposed system can provide a list of potential candidates for collaboration where the inputs are one or several researchers who lack coauthorship information and have very few publications (content information).

Figure 1. Existing vertex-based similarity methods can make recommendations for non-isolated researchers (dotted lines) but they are not suitable for isolated researchers (around the question mark)

The key contribution of our method is the use of additional information to facilitate researcher recommendation if there is a lack of coauthorship and content information (isolated researchers). The additional types of information used to make recommendations in our proposed method are as follows: (1) content information based on a few available publications or input keywords, (2) the strength of the relationships between organizations where researchers have worked, (3) the importance rating, and (4) the activity levels of recommended researchers. These types of additional information are used as features to calculate the similarities between isolated researchers and potential collaborators. Finally, a ranked list of potential candidates is generated. The accuracy of coauthorship prediction was used to evaluate the performance of our collaborator recommendation method for isolated researchers. A new method was also developed to evaluate the quality of coauthorship.

2014 28th International Conference on Advanced Information Networking and Applications Workshops

978-1-4799-2652-7/14 $31.00 © 2014 IEEE

DOI 10.1109/WAINA.2014.105

The key contributions of this study are summarized as follows.

• The introduction and formalization of the collaborator recommendation problem for isolated researchers. The facilitation of collaborator recommendation for research students and junior researchers because the currently available methods are unsuitable for these groups.

• The construction of an experimental dataset by crawling the Microsoft Academic Search website¹.

• The use of additional information as a set of new features to enable collaborator recommendation for isolated researchers.

• A new method for evaluating collaborator recommendations based on coauthorship quality.

The remainder of this paper is organized as follows. In Section II, we present and discuss related work. Section III provides a formal definition of the problem. Section IV describes the proposed features and methods in detail. The dataset, experimental results, and discussion are provided in Section V. We conclude the paper and suggest future research in Section VI.

II. RELATED WORK

Owing to their great commercial value, recommendation systems have been used widely in industry, especially for e-commerce, such as product recommendation at Amazon [5] and movie recommendation by MovieLens [6]. However, the utilization of recommendation systems for academic research has not received much attention [7]. Within the academic domain, one significant problem that has attracted many studies in recent years is collaborator recommendation.

Chen et al. introduced CollabSeer, which is an open system that recommends potential research collaborators to scholars and scientists [1]. CollabSeer considers the structure of a coauthor network and an author’s research interests to make collaborator recommendations. A different list of collaborators is suggested to each user after considering their position in the coauthor network structure. Huynh et al. proposed new methods for calculating the similarity of vertices in the coauthor network by taking trend information into account when considering the relational strength of different authors [8]. In another study, Lopes et al. proposed an innovative approach for recommending new collaborators and for intensifying existing collaborations [9]. They considered the semantic issues involved in the relationships between researchers in different areas, as well as structural issues, by analyzing existing relationships between researchers. Tang et al. defined the problem of cross-domain collaboration recommendation [10]. They proposed methods for ranking and recommending potential collaborators by modeling cross-domain topics. These previous studies are the most relevant to our research. However, none of them considered collaborator recommendation for isolated researchers.

¹ http://academic.research.microsoft.com/

In addition, much research related to this problem has been conducted in the domains of expert matching and finding. Balog and de Rijke introduced a novel expert-finding method for use when a small number of example experts is available, where the system returned similar experts [3]. They defined, compared, and evaluated a number of methods for representing experts, and investigated how the size of the initial sample set affected performance. Hofmann et al. explored the role of contextual factors when finding similar experts [11]. They extended content-based expert-finding approaches using known contextual factors with effects on human expert finding. Gollapalli et al. reviewed and evaluated several techniques for representing expertise profiles based on the available evidence and proposed models for computing the similarity between two profiles [4]. In another study, Tang et al. studied the problem of expertise matching with various constraints and applied their proposed method to conference paper reviewer suggestion [12]. In general, previous studies of expert matching and finding have used similarity-based methods for collaborator recommendation.

Our method is also relevant to the coauthor link prediction problem and the discovery of missing links in complex networks [2][13]. Predicting future coauthors is not a trivial task. Content-based and link-based methods are efficient for link prediction and most current research is focused on evaluating the performance of proposed methods based on the quantity or accuracy of coauthor link prediction [9][10]. However, collaborator recommendation is different from coauthor link prediction and current methods do not select the recommendation results based on the quality of collaborations.

In summary, current methods have two deficiencies: (1) there is a lack of collaborator recommendation methods for isolated researchers, and (2) the quality of the recommended collaborations is not considered.

III. PROBLEM DEFINITION

We define isolated researchers as those without coauthor links in a coauthor network (Figure 1).

A set of isolated researchers R′ ⊆ R (set of all researchers) and their available information can be represented as follows.

• R = {r}: set of all researchers
• R′ = {r′}: set of isolated researchers
• P = {p}: set of all available publications
• N = {R, E}: a coauthor network (E ⊆ R × R, set of edges in a network representing coauthorships)
• O = {o}: list of organizations where researchers have worked
• CiNet: citation network of researchers
• m_org(r) = {o ∈ O | r ∈ o}: set of organizations to which researcher r belongs


For each given isolated researcher r′ ∈ R′, our task is to generate a ranked list of potential candidates who could become coauthors of r′.
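The definition of an isolated researcher can be illustrated with a minimal sketch. It assumes that each publication is given simply as a list of author names; the function names `build_coauthor_edges` and `isolated_researchers` and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_coauthor_edges(publications):
    """Return the edge set E of the coauthor network N = {R, E}.

    `publications` is a list of author lists (an assumed input format).
    Each pair of distinct authors on the same paper yields one undirected edge.
    """
    edges = set()
    for authors in publications:
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges


def isolated_researchers(publications):
    """Return R', the researchers who appear in no coauthor edge."""
    researchers = {a for authors in publications for a in authors}
    linked = {r for edge in build_coauthor_edges(publications) for r in edge}
    return researchers - linked


# Toy data: carol and erin only have single-author papers.
pubs = [["alice", "bob"], ["carol"], ["bob", "dave"], ["erin"], ["erin"]]
print(sorted(isolated_researchers(pubs)))  # ['carol', 'erin']
```

In this sketch a researcher with several single-author papers is still isolated, because isolation depends only on the absence of coauthor edges.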

IV. OUR APPROACH

For isolated researchers with no existing coauthorships in the current coauthor network, link-based methods are clearly unsuitable because of the lack of coauthorship information. In addition, content-based methods are unsuitable because only a few publications or keywords are available to represent the interests of isolated researchers. To address this situation, we propose the use of additional information as new features to predict coauthorships using a support vector machine (SVM).

First, we explain the assumptions and the feature set used for coauthorship prediction with the SVM, and we describe how these features can be computed. Next, we present our new method for evaluating the quality of coauthorships, which is very useful for collaborator recommendation.

A. Feature Set

1) Content Similarity: Isolated researchers have no coauthor links but a few publications or keywords can still represent their research interests. Thus, we can calculate the similarity of two different researchers based on this information as a baseline to compare with the other proposed features. To obtain this feature, we use the term frequency-inverse document frequency (TF-IDF) [14] to calculate the similarity. The feature vectors w_r and w_r′ of the terms used in the papers published by researchers r and r′, respectively, are constructed. These feature vectors are normalized based on the TF-IDF. The content similarity is calculated using the cosine similarity of w_r and w_r′ as follows.

$$\mathrm{ContentSim}(r, r') = \frac{w_r \cdot w_{r'}}{\lVert w_r \rVert \, \lVert w_{r'} \rVert} \tag{1}$$
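A minimal sketch of Eq. (1) is given below. It assumes that the text of each researcher has already been tokenized into a list of terms, and it uses one common smoothed TF-IDF weighting; the paper does not state which TF-IDF variant was used, so the weighting, the function names `tfidf_vectors` and `content_sim`, and the toy data are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors from {researcher: list of terms}.

    Assumed weighting: tf = relative term frequency,
    idf = log((1 + n_docs) / (1 + doc_freq)) + 1 (a smoothed variant).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def content_sim(w_r, w_q):
    """Cosine similarity of two sparse term-weight vectors, as in Eq. (1)."""
    dot = sum(weight * w_q.get(term, 0.0) for term, weight in w_r.items())
    norm_r = math.sqrt(sum(weight * weight for weight in w_r.values()))
    norm_q = math.sqrt(sum(weight * weight for weight in w_q.values()))
    if norm_r == 0.0 or norm_q == 0.0:
        return 0.0  # no content available for at least one researcher
    return dot / (norm_r * norm_q)


docs = {
    "r": ["graph", "mining", "graph"],
    "r_isolated": ["graph", "mining"],
    "other": ["protein", "folding"],
}
vectors = tfidf_vectors(docs)
print(content_sim(vectors["r_isolated"], vectors["r"]))
print(content_sim(vectors["r_isolated"], vectors["other"]))
```

Researchers who share no terms obtain a similarity of zero, which is one reason why a content-only feature may be weak when an isolated researcher has very little text.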

2) Organization Similarity: We assume that potential collaborations may occur between organizations that have existing collaborative relationships. Thus, organizations with stronger relationships are more likely to have new collaborations. To calculate the strength of the relationship between organizations, the collaborative network of these organizations is constructed based on the coauthorships of researchers who have worked for these organizations (Figure 2). Using this collaborative network, we can compute the strength of the relationship (OrgRS(r, r′)) between the organizations where two researchers r and r′ have worked.

In the network, p is a single path from o to o′ that passes through the nodes o ≡ o_1, o_2, ..., o_k ≡ o′. The weight of the edge from o_i to o_{i+1} is

$$w(o_i, o_{i+1}) = P(o_i, o_{i+1}) = \frac{\mathrm{Coll\_Num}(o_i, o_{i+1})}{\mathrm{Total\_Coll\_Num}(o_i)} \tag{2}$$

where
- Coll_Num(o_i, o_{i+1}): number of coauthorships of researchers in o_i and o_{i+1}.
- Total_Coll_Num(o_i): total number of coauthorships of o_i with other organizations.

Figure 2. A collaborative network of various organizations, which was constructed and quantified based on the coauthorships of researchers

We can use basic measures from graph theory and biostatistics to calculate the strength of the relationship between the organizations o, o′ (OrgRS(o, o′)) in the collaborative network, as follows:

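$$\mathrm{OrgRS}(o, o') = \sum_{i=1}^{m} \mathrm{Path\_Weight}_{p_i}(o, o') \tag{3}$$

where
- p_1, ..., p_m: all non-circular directed paths between o and o′.
- $\mathrm{Path\_Weight}_{p}(o, o') = \prod_{i=1}^{k-1} w(o_i, o_{i+1})$, i.e., one factor for each edge of the path p through o_1, ..., o_k.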

In general, the organization similarity of two researchers r and r′ is computed based on the strength of the relationship between the organizations o_r and o_r′ where r and r′ have worked.

$$\mathrm{OrgRS}(r, r') = \mathrm{OrgRS}(o_r, o_{r'}) \tag{4}$$
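Equations (2) and (3) can be sketched as follows. The sketch assumes that the collaboration counts between organizations are given as a dictionary keyed by unordered organization pairs (each pair listed once). The `max_hops` cutoff is an assumption added here to keep the enumeration of paths tractable; the paper sums over all non-circular paths. The names `edge_weights` and `org_rs` are hypothetical.

```python
from collections import defaultdict


def edge_weights(coll_num):
    """Directed weights w[a][b] = Coll_Num(a, b) / Total_Coll_Num(a), Eq. (2).

    `coll_num` maps an organization pair (a, b), listed once, to the number
    of coauthorships between researchers of a and b (assumed input format).
    """
    totals = defaultdict(int)
    for (a, b), count in coll_num.items():
        totals[a] += count
        totals[b] += count
    weights = defaultdict(dict)
    for (a, b), count in coll_num.items():
        weights[a][b] = count / totals[a]
        weights[b][a] = count / totals[b]
    return weights


def org_rs(weights, src, dst, max_hops=4):
    """Sum of path weights over non-circular paths from src to dst, Eq. (3).

    Paths longer than `max_hops` edges are ignored (an assumption of this
    sketch, not part of the definition in the text).
    """
    def walk(node, weight, visited, hops):
        if node == dst:
            return weight
        if hops == max_hops:
            return 0.0
        total = 0.0
        for nxt, w_edge in weights.get(node, {}).items():
            if nxt not in visited:
                total += walk(nxt, weight * w_edge, visited | {nxt}, hops + 1)
        return total

    return walk(src, 1.0, {src}, 0)


coll_num = {("A", "B"): 3, ("B", "C"): 1, ("A", "C"): 1}
w = edge_weights(coll_num)
# Paths A->C (0.25) and A->B->C (0.75 * 0.25) contribute to OrgRS(A, C).
print(org_rs(w, "A", "C"))
```

Because the weights are normalized by the total of the source organization, the measure is directed: OrgRS(A, C) and OrgRS(C, A) generally differ.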

3) Researcher Importance Rating: The importance rating of a researcher is another key factor that determines how many new collaborators they might have. The importance rating can be determined by performing a random-walk-with-restart (RWR) [15] in the coauthor network or the citation network. In our experiment, the citation network was used to calculate the importance rating of researchers, I.Rate(r_i):

$$\mathrm{I.Rate}(r_i) = \frac{1-d}{N} + d \left( \sum_{r_j \,\mathrm{LinkTo}\, r_i} \frac{\mathrm{I.Rate}(r_j)}{|\mathrm{OutLink}(r_j)|} + \sum_{r_j \text{ has no out-links}} \frac{\mathrm{I.Rate}(r_j)}{N} \right) \tag{5}$$

where
- N: total number of researchers in the coauthor network
- |OutLink(r)|: number of out-links of r
- d: damping factor (d is usually set to 0.85).

4) Researcher Activity: In general, researchers prefer to make new collaborations with active researchers. Therefore, we can assume that more active researchers will have more new collaborators. Thus, it is necessary to quantify a score that represents the activity degree of a researcher r (ActiveScore(r)). Our proposed method for quantifying the researcher activity level is as follows:

$$f_{active}(r) = \sum_{\substack{[t_i,\, t_{i+1}] \\ i = 0..n}} N(r)[t_i, t_{i+1}] \cdot \exp(-\delta(t)) \tag{6}$$

where
- t_0: current year
- t_n: the year when researcher r had their first publication
- N(r)[t_i, t_{i+1}]: number of publications by researcher r during the period [t_i, t_{i+1}]
- δ(t) = t_0 − t_i: number of years from year t_i until the current year t_0
- t_{i+1} = t_i + period (period: a sliding window of time. Various values can be tested experimentally, e.g., 1, 2, or 3 years).
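The activity function of Eq. (6) can be sketched as below. The sketch makes several assumptions that the text leaves open: publication years are integers, the windows are closed year ranges that run backward from the current year to the year of the first publication, and the decay δ is measured from the most recent year of each window. The function name `f_active` mirrors the symbol in Eq. (6); its signature is hypothetical.

```python
import math


def f_active(pub_years, current_year, period=1):
    """Time-decayed publication count in the spirit of Eq. (6).

    pub_years: list of integer publication years of one researcher.
    period: window length in years (e.g., 1, 2, or 3).
    """
    if not pub_years:
        return 0.0
    first_year = min(pub_years)
    score = 0.0
    window_end = current_year
    while window_end >= first_year:
        window_start = window_end - period + 1
        # N(r)[t_i, t_{i+1}]: publications that fall inside this window.
        n_pubs = sum(1 for y in pub_years if window_end >= y >= window_start)
        delta = current_year - window_end  # years elapsed since this window
        score += n_pubs * math.exp(-delta)
        window_end -= period
    return score


# Two papers in the current year and one paper two years earlier.
print(f_active([2005, 2005, 2003], current_year=2005))
```

With the exponential decay, a paper from the current year counts fully, whereas a paper from two years earlier contributes only exp(−2) ≈ 0.135, so recent output dominates the score.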

Normalization: to scale the activity score (ActiveScore(r)) of researchers to [0, 1], we normalize the values as follows:

$$\mathrm{ActiveScore}(r) = \frac{f_{active}(r) - \min_{r_i \in R} f_{active}(r_i)}{\max_{r_i \in R} f_{active}(r_i) - \min_{r_i \in R} f_{active}(r_i)} \tag{7}$$
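Returning to the importance rating, Eq. (5) has the form of a PageRank-style update in which the rating of researchers without out-links is redistributed uniformly. A minimal iterative sketch is given below. It assumes that the citation network is supplied as a dictionary mapping each researcher to the list of distinct researchers they cite; the function name `importance_rating`, the iteration limit, and the tolerance are assumptions and do not describe the authors' implementation.

```python
def importance_rating(out_links, d=0.85, iters=100, tol=1e-10):
    """Iterate Eq. (5) until the ratings stop changing.

    out_links: {researcher: list of distinct researchers they cite}.
    Returns {researcher: I.Rate}; the ratings sum to 1.
    """
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rate = {r: 1.0 / n for r in nodes}
    for _ in range(iters):
        # Second sum of Eq. (5): researchers that have no out-links.
        dangling = sum(rate[r] for r in nodes if not out_links.get(r))
        new = {r: (1 - d) / n + d * dangling / n for r in nodes}
        # First sum of Eq. (5): each citing researcher shares its rating.
        for src, targets in out_links.items():
            if targets:
                share = d * rate[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        converged = tol > sum(abs(new[r] - rate[r]) for r in nodes)
        rate = new
        if converged:
            break
    return rate


citations = {"a": ["c"], "b": ["c"], "c": []}
ratings = importance_rating(citations)
print(max(ratings, key=ratings.get))  # c
```

In this toy network, researcher c is cited by both a and b and therefore receives the highest rating.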

B. Collaboration Quality

To make collaborator recommendations, we evaluate the potential collaborators based not only on the likelihood of collaborations in the near future, but also on the quality of these collaborations. Thus, we need to quantify the quality of collaborations that may occur in the near future. One key factor that may characterize the quality of potential collaborators is the number of possible coauthored papers in the future.

Thus, we propose a new method for quantifying the quality of coauthorships based on the number of coauthored papers. We assume that a collaboration is better than others if it can generate more publications in the future. This means that the system should highly recommend potential collaborators who can work effectively with the isolated researcher and can produce as many publications as possible in the future. Therefore, the quality of the topN recommended collaborators can be calculated using Eq. (8), where
• N: the number of top-ranked potential researchers recommended to isolated researchers
• decision_value(i): decision value returned by the SVM. This value is used to rank the top N potential collaborators.
• Coll_Num(r′, r): number of papers coauthored by an isolated researcher r′ and a recommended researcher r.
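The quality score of Eq. (8) can be sketched as a weighted sum over the top N ranked pairs. The sketch assumes that each candidate pair is represented by its SVM decision value and by the number of papers the pair actually coauthored in the test period; the function name `collaboration_quality` and the toy values are hypothetical.

```python
def collaboration_quality(candidates, top_n):
    """Eq. (8): sum of decision_value(i) * Coll_Num(r', r) over the top N.

    candidates: list of (decision_value, coll_num) tuples, one per
    (isolated researcher, recommended researcher) pair.
    """
    ranked = sorted(candidates, key=lambda pair: pair[0], reverse=True)
    return sum(decision * coll_num for decision, coll_num in ranked[:top_n])


# (decision value, number of coauthored papers) for four candidate pairs.
candidates = [(1.5, 2), (0.8, 0), (0.3, 4), (-0.2, 1)]
print(collaboration_quality(candidates, top_n=2))  # 3.0
```

A confidently recommended pair that never coauthors a paper contributes nothing, whereas a highly ranked pair with many joint papers raises the score, which is the behavior the measure is meant to reward.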

V. EXPERIMENTAL EVALUATION

In this section, we present an evaluation of the use of the proposed features for coauthorship prediction based on a large dataset, which we crawled from the Microsoft Academic Search website.

A. Construction of the Experimental Dataset

To our knowledge, no standard datasets exist for evaluating academic researcher recommendations, especially for isolated researchers. To prepare our experimental data, we extracted information related to publications in the field of computer science using the Microsoft Academic Search website. The dataset used in our experiment comprised information related to researchers and their publications during 2001–2011, i.e., 1,266,790 publications and 807,005 different authors.

Publications from 2001–2005 were used as the training set and those from 2006–2011 were used as the test set. Therefore, authors with publications in the period [2001, 2005] were used to build the training network (G0) and authors with publications in the period [2006, 2011] were used to build the test network (G1).

The isolated researchers extracted for our experimental evaluation lacked coauthor links in G0 but had some coauthor links in G1. In total, 23,651 isolated researchers were extracted from G0, whereas only 1,491 isolated researchers with new publications and new coauthor links were in G1. Due to the limited computing resources of our server, we randomly selected 300 isolated researchers for our experiment.

We extracted all of the coauthor links of the 300 isolated researchers with others in G1 (positive instances). In total, 1,263 positive instances were extracted. To evaluate the accuracy of link prediction using binary classification, negative instances, i.e., isolated researchers paired with other researchers with whom they had no coauthor links in G1, were extracted randomly. In total, 1,263 negative instances were also extracted to balance the experimental dataset. Next, 50% of the positive instances (631) and 50% of the negative instances (631) were used for training. The remaining instances were used for testing. The feature vectors of all the instances were calculated before using the SVM to perform coauthor link prediction.
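The construction of a balanced set of positive and negative instances, followed by a 50% split, can be sketched as follows. This is only an illustration on toy data: the function names `build_instances` and `split_half`, the fixed random seeds, and the exhaustive enumeration of candidate pairs (which would not scale to the full dataset) are assumptions and not the procedure used by the authors.

```python
import random


def build_instances(isolated, researchers, g1_edges, seed=0):
    """Return (positives, negatives) as lists of (isolated, other) pairs.

    positives: pairs that are linked in the test network G1.
    negatives: an equally sized random sample of unlinked pairs.
    """
    rng = random.Random(seed)
    linked = {frozenset(edge) for edge in g1_edges}
    pairs = [
        (r_iso, r)
        for r_iso in sorted(isolated)
        for r in sorted(researchers)
        # Skip self-pairs and count a pair of two isolated researchers once.
        if r != r_iso and (r not in isolated or r > r_iso)
    ]
    positives = [p for p in pairs if frozenset(p) in linked]
    unlinked = [p for p in pairs if frozenset(p) not in linked]
    negatives = rng.sample(unlinked, min(len(positives), len(unlinked)))
    return positives, negatives


def split_half(instances, seed=0):
    """Shuffle and split into a training half and a test remainder."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


isolated = {"x", "y"}
researchers = ["a", "b", "c", "x", "y"]
g1_edges = [("x", "a"), ("y", "b"), ("x", "b")]
positives, negatives = build_instances(isolated, researchers, g1_edges)
train_pos, test_pos = split_half(positives)
print(len(positives), len(negatives), len(train_pos), len(test_pos))
```

With an odd number of instances, the integer division places the smaller half in the training set, which matches the 631 training instances out of 1,263 reported above.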

B. Evaluation Method

To evaluate the performance of collaborator recommendation for isolated researchers, we considered the accuracy (quantity) and quality of the predicted collaborators.

Quantity. To quantitatively evaluate the suitability of the proposed features for collaborator recommendation, we used the accuracy of coauthorship prediction for pairs of researchers, i.e., an isolated researcher and a recommended coauthor. We used the precision, recall, F-measure, and average precision (AP) as evaluation metrics. If the system predicted that a pair of researchers would be coauthors and a coauthorship occurred subsequently, the system was regarded as having made a correct recommendation, otherwise the system made an incorrect recommendation.

$$\mathrm{Collaboration\_Quality}(TopN) = \sum_{i=1}^{N} \mathrm{decision\_value}(i) \cdot \mathrm{Coll\_Num}(r', r) \tag{8}$$

Table I
ACCURACY OF RECOMMENDATIONS WITH DIFFERENT FEATURES

| Features | Precision | Recall | Average Precision |
|---|---|---|---|
| ContentSim (Baseline) | 0.5113 | 0.7896 | 0.5328 |
| ContentSim, OrgRS | 0.9079 | 0.4367 | 0.8039 |
| ContentSim, OrgRS, I.Rate | 0.9079 | 0.4367 | 0.8039 |
| ContentSim, OrgRS, I.Rate, ActiveScore | 0.8792 | 0.4953 | 0.8122 |
| OrgRS | 0.9133 | 0.4335 | 0.8048 |
| OrgRS, I.Rate | 0.9133 | 0.4335 | 0.8042 |
| OrgRS, I.Rate, ActiveScore | 0.8864 | 0.4446 | 0.8113 |

Figure 3. Average precision (AP) of recommendations using different features
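The evaluation metrics named in the Quantity paragraph can be sketched as follows. The paper does not give formulas for them, so the sketch uses the usual definitions: precision, recall, and F-measure on binary predictions, and average precision computed over the instances ranked by decreasing score. The function names and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F-measure for the positive (coauthor) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true coauthor pair."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


y_true = [1, 0, 1, 1, 0]             # 1 = the pair became coauthors
scores = [0.9, 0.8, 0.7, 0.2, 0.1]   # e.g., classifier decision values
y_pred = [1 if s > 0.5 else 0 for s in scores]
print(precision_recall_f1(y_true, y_pred))
print(average_precision(y_true, scores))
```

Precision and recall depend on the decision threshold, whereas average precision depends only on the ranking, which makes it a natural summary for a ranked recommendation list.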

Quality. Collaborator recommendation is different from collaborator prediction because it requires not only a correct recommendation but also a good-quality recommendation. To evaluate the quality of coauthorship collaborations, we used the method proposed in Section IV-B.

C. Experimental Results and Discussion

The objective of coauthorship prediction is to make recommendations. Thus, we analyzed the accuracy and quality of the classified positive instances, rather than the negative instances. Tables I and II show the experimental results in terms of the quantity and quality of the collaborator recommendations after we added the new features sequentially.

Quantity. The feature ContentSim was not sufficient to distinguish positive and negative instances. ContentSim could not determine whether pairs of researchers would be coauthors. The AP when using ContentSim alone was 0.5328 (Table I).

The addition of the feature OrgRS had a significant effect because the AP increased from 0.5328 to 0.8039. However, adding other new features, i.e., the I.Rate and ActiveScore of recommended researchers, had no significant effects on the accuracy of recommendations (Table I). A visualization of this comparison is shown in Figure 3.

Table II
QUALITY OF THE TOPN RECOMMENDATIONS USING DIFFERENT FEATURES

Columns T@10 to T@50 report the Quality of Collaborations.

| Features | T@10 | T@20 | T@30 | T@40 | T@50 |
|---|---|---|---|---|---|
| ContentSim (Baseline) | 19.64 | 36.89 | 50.50 | 65.68 | 95.30 |
| ContentSim, OrgRS | 74.22 | 140.30 | 213.95 | 275.75 | 398.82 |
| ContentSim, OrgRS, I.Rate | 74.23 | 140.31 | 213.95 | 275.75 | 398.80 |
| ContentSim, OrgRS, I.Rate, ActiveScore | 102.04 | 175.56 | 233.97 | 292.16 | 446.19 |
| OrgRS | 91.57 | 154.52 | 221.18 | 278.60 | 370.69 |
| OrgRS, I.Rate | 91.56 | 154.52 | 221.18 | 278.59 | 370.67 |
| OrgRS, I.Rate, ActiveScore | 178.75 | 349.76 | 469.04 | 585.74 | 662.89 |

Figure 4. Quality of topN recommendations with different features

Quality. In terms of the quality of the topN recommendations, the two features OrgRS and ActiveScore had significant effects. The addition of the OrgRS feature increased the score for the “Quality of Collaborations” from 19.64 to 74.22 with the Top10 recommendations, and from 95.30 to 398.82 with the Top50 recommendations.

In addition, the exclusion of the ContentSim feature from the full feature set increased the score for the “Quality of Collaborations” from 102.04 to 178.75 with the Top10 recommendations, and from 446.19 to 662.89 with the Top50 recommendations (Table II).

Using our dataset, experiments showed that the I.Rate of recommended researchers had virtually no effect on the accuracy or the quality of the recommended collaborators (Tables I and II). A visual comparison of the quality of the recommendations is shown in Figure 4.


VI. CONCLUSIONS AND FUTURE WORK

In this study, we developed a solution to the problem of collaborator recommendation for isolated researchers. We formally defined the problem and presented a method that suggests potential collaborators to isolated researchers with no links in coauthor networks. We used additional information as new features to facilitate recommendation, i.e., OrgRS, I.Rate, and ActiveScore. We also developed a new method for evaluating the quality of recommended collaborators. The experimental results demonstrated the effectiveness and efficiency of our proposed approach.

Most current methods use the accuracy of collaborator prediction to evaluate collaborator recommendations. In our future work, we will continue to develop new methods for evaluating the quality of collaborator recommendations. Other factors that may characterize the quality of potential collaborators will be studied: (1) the length of the collaborative period, and (2) the possibility of expanding to new coauthorships from the recommended coauthor. We will consider additional information such as meetings between researchers at conferences where they can develop new collaborations. Finally, we will investigate new methods for collaborator recommendation for isolated researchers.

REFERENCES

[1] H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles, “Collabseer: a search engine for collaboration discovery,” in Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, ser. JCDL ’11. New York, NY, USA: ACM, 2011, pp. 231–240.

[2] H.-H. Chen, L. Gou, X. L. Zhang, and C. L. Giles, “Discovering missing links in networks using vertex similarity measures,” in Proceedings of the 27th Annual ACM Symposium on Applied Computing, ser. SAC ’12. New York, NY, USA: ACM, 2012, pp. 138–143.

[3] K. Balog and M. de Rijke, “Finding similar experts,” in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR ’07. New York, NY, USA: ACM, 2007, pp. 821–822.

[4] S. D. Gollapalli, P. Mitra, and C. L. Giles, “Similar researcher search in academic environments,” in Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries, ser. JCDL ’12. New York, NY, USA: ACM, 2012, pp. 167–170.

[5] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,” IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, Jan. 2003.

[6] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and J. Riedl, “Movielens unplugged: Experiences with an occasionally connected recommender system,” in Proceedings of the 8th International Conference on Intelligent User Interfaces, ser. IUI ’03. New York, NY, USA: ACM, 2003, pp. 263–266.

[7] C. Pan and W. Li, “Research paper recommendation with topic analysis,” in Computer Design and Applications (ICCDA), 2010 International Conference. IEEE, 2010, pp. 264–268.

[8] T. Huynh, K. Hoang, and D. Lam, “Trend based vertex similarity for academic collaboration recommendation,” in ICCCI, 2013, pp. 11–20.

[9] G. R. Lopes, M. M. Moro, L. K. Wives, and J. P. M. De Oliveira, “Collaboration recommendation on academic social networks,” in Proceedings of the 2010 international conference on Advances in conceptual modeling: applications and challenges, ser. ER’10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 190–199.

[10] J. Tang, S. Wu, J. Sun, and H. Su, “Cross-domain collaboration recommendation,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’12. New York, NY, USA: ACM, 2012, pp. 1285–1293.

[11] K. Hofmann, K. Balog, T. Bogers, and M. de Rijke, “Contextual factors for finding similar experts,” J. Am. Soc. Inf. Sci. Technol., vol. 61, no. 5, pp. 994–1014, May 2010.

[12] W. Tang, J. Tang, T. Lei, C. Tan, B. Gao, and T. Li, “On optimization of expertise matching with various constraints,” Neurocomput., vol. 76, no. 1, pp. 71–83, Jan. 2012.

[13] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han, “Co-author relationship prediction in heterogeneous bibliographic networks,” in Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, ser. ASONAM ’11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 121–128.

[14] R. A. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the Sixth International Conference on Data Mining, ser. ICDM ’06. Washington, DC, USA: IEEE Computer Society, 2006, pp. 613–622.
