Yakub Sebastian_Visit_NTU_201405016v4


Information Technology

Monash University Malaysia is jointly owned by Monash University and the Jeffrey Cheah Foundation

Uncovering hidden connections in scientific literature: From the informatics and complexity science perspectives

Yakub Sebastian

16th May 2014

16th May 2014 Research presentation at Nanyang Technological University

Agenda

1 Hidden connections

2 Literature based discovery

3 Cluster link prediction

4 Collaboration and feedback


‘The return of the prodigal son’ by Rembrandt (Wood, J. 2012. Euresis Journal. 2, 5-7)

Source: http://uploads6.wikipaintings.org/images/rembrandt/the-return-of-the-prodigal-son-1669.jpg


Look more closely

Source: http://upload.wikimedia.org/wikipedia/commons/8/8d/Rembrandt_Harmensz._van_Rijn_-_The_Return_of_the_Prodigal_Son_-_detail_son.jpg

Female hand (mercy)

Male hand (justice)


Hidden Connections

‘The whole is greater than the sum of its parts’ – Aristotle (?)


Hidden Connections

Conference: Hidden Connections 3 - 5 March 2014, Complexity Program, NTU

Brian Uzzi

“… the highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations.” (Uzzi, B. et al. 2013. Science. 342, 6157, 468-472)

Novelty → the pairing of two conventional ideas that have never been put together before. (Uzzi, B. 2014. Complexity Program Annual Conference: Hidden Connections. 3-5 Mar, Nanyang Technological University, Singapore)


Hidden Connections

Association: ‘The forgotten half of scientific thinking’ – Marten Scheffer

“… thinking has two complementary modes: roughly, association versus reasoning … We systematically underestimate the role of the first …”

“How can we feed the associative machine in our brain with potential elements for such unexpected links? This is a tantalizing problem, because if the connection should be unexpected one cannot plan for it.”

(Scheffer, M. 2014. PNAS. 111, 17, 6119)


Agenda

1 Hidden connections

2 Literature based discovery

3 Cluster link prediction

4 Collaboration and feedback


Literature Based Discovery

Literature Based Discovery (LBD) uses computational algorithms to discover potential hidden connections between previously disconnected sets of literature. (Smalheiser, N. 2012. JASIST. 63, 2, 218-224)


Literature Based Discovery

A finding very similar to Uzzi’s result was reported by a scientometrician in 2012.

Novel connections established by [Watts, D.J. and Strogatz, S.H. 1998. Nature. 393, 6684, 440-442].

“An article that introduces novel connections between clusters of co-cited references is likely to subsequently become highly cited.” (Chen, C. 2012. JASIST. 63, 3, 431-449)

Brian Uzzi’s unawareness of Chen’s work is itself an example of a hidden connection! (personal communication)


Literature Based Discovery

Example 1.

Fish Oil and Raynaud’s Syndrome hidden connection (Swanson, D.R. 1986. Perspectives in Biology and Medicine. 30, 1, 7-18)

These literatures are:

A. Non-interactive

B. Complementary

{A} → Fish oil disrupts blood viscosity

{C} → Blood viscosity causes Raynaud’s Syndrome

Illustration: Torvik, V.I. and Smalheiser, N. 2007. Bioinformatics. 23, 13, 1658-1665.
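The fish oil example can be read as a simple bridging search: one literature links a starting term to intermediate terms, a second literature links a target term to intermediate terms, and any shared intermediate suggests a candidate hidden connection. The following is a minimal sketch of that idea only. The function name `abc_candidates`, the dictionary layout, and the toy terms other than fish oil, blood viscosity, and Raynaud’s syndrome are illustrative assumptions, not the procedure used in the cited papers.

```python
from collections import defaultdict


def abc_candidates(a_links, c_links):
    """Pair starting terms with target terms through shared intermediate terms.

    a_links: {starting term: set of intermediate terms from its literature}
    c_links: {target term: set of intermediate terms from its literature}
    Returns {(starting term, target term): sorted list of shared intermediates}.
    """
    # Index the target literature by intermediate term for fast lookup.
    intermediate_to_targets = defaultdict(set)
    for target, intermediates in c_links.items():
        for term in intermediates:
            intermediate_to_targets[term].add(target)

    # Collect every intermediate term that bridges a starting term to a target.
    bridges = defaultdict(set)
    for start, intermediates in a_links.items():
        for term in intermediates:
            for target in intermediate_to_targets.get(term, ()):
                bridges[(start, target)].add(term)
    return {pair: sorted(terms) for pair, terms in bridges.items()}


# Toy data for illustration only (hypothetical, hand-picked term sets).
a_links = {"fish oil": {"blood viscosity", "platelet aggregation"}}
c_links = {"Raynaud's syndrome": {"blood viscosity", "vasoconstriction"}}
result = abc_candidates(a_links, c_links)
print(result)
```

In practice the hard part is not the join itself but ranking the many candidate bridges, since most shared terms are uninformative.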


Literature Based Discovery

Example 2.

(Swanson, D.R. and Smalheiser, N. 1997. Artificial Intelligence. 91, 2, 183-203)


Literature Based Discovery

Content

Structure


Agenda

1 Hidden connections

2 Literature based discovery

3 Cluster link prediction

4 Collaboration and feedback


Cluster Link Prediction

pre-discovery (1900 – 1985)


Cluster Link Prediction

post-discovery (1900 – 1986)

Novel inter-cluster links


Cluster Link Prediction

Observations

A. Potential hidden connections between disparate scientific fields might be found among non-overlapping clusters that:

do not have existing links, but

whose member nodes exhibit a high propensity to converge.

B. The linking of these clusters involves the novel pairing of conventional ideas that have never been put together before.

C. As demonstrated in Swanson’s case, such a novel pairing can result in a scientific breakthrough.
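The first condition in observation A, clusters with no existing links between them, can be checked mechanically once a clustering is available. The sketch below is a minimal illustration under stated assumptions: the function name `unlinked_cluster_pairs`, the `membership` mapping, and the toy node and cluster labels are all hypothetical, and the second condition (propensity to converge) is not modelled here.

```python
from itertools import combinations


def unlinked_cluster_pairs(membership, edges):
    """List cluster pairs that have no edge running between them.

    membership: {node: cluster label}, one cluster per node (non-overlapping)
    edges: iterable of undirected (node, node) pairs
    """
    # Record which pairs of distinct clusters are already connected.
    linked = set()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv:
            linked.add(frozenset((cu, cv)))

    # Every remaining pair of clusters is a candidate for a future link.
    clusters = sorted(set(membership.values()))
    return [
        (a, b)
        for a, b in combinations(clusters, 2)
        if frozenset((a, b)) not in linked
    ]


# Toy network for illustration only.
membership = {"n1": "A", "n2": "A", "n3": "B", "n4": "C"}
edges = [("n1", "n2"), ("n2", "n3")]
pairs = unlinked_cluster_pairs(membership, edges)
print(pairs)
```

The number of candidate pairs grows quadratically with the number of clusters, which is one reason a further scoring step is needed to rank them.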


Cluster Link Prediction

Conjectures

A. A search for hidden connections in literature can be re-formulated as a cluster link prediction problem.

B. One may better predict inter-cluster link formation using a combination of (a) content-based analysis (semantic) and (b) structural analysis.

C. Inter-cluster links may emerge as a result of the dynamics in the complex systems of citation networks. This exposes the cluster link prediction problem to a whole range of methods and tools in complexity science. (Newman, M.E.J. 2001. PNAS. 98, 2, 404-409)


Cluster Link Prediction

Bibliographic coupling (shared references) network during the pre-discovery period (1900 – 1985).
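In a bibliographic coupling network, two papers are connected when they cite at least one common reference, and the number of shared references is a natural edge weight. A minimal sketch of that construction follows. The function name `bibliographic_coupling` and the toy paper and reference identifiers are assumptions for illustration, not the data or tooling behind the network shown in the talk.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Build weighted coupling edges from per-paper reference sets.

    references: {paper id: set of cited reference ids}
    Returns {(paper, paper): number of shared references}, omitting
    pairs that share nothing.
    """
    edges = {}
    for p, q in combinations(sorted(references), 2):
        shared = len(references[p] & references[q])
        if shared > 0:
            edges[(p, q)] = shared
    return edges


# Toy reference lists for illustration only.
references = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3", "R4"},
    "P3": {"R5"},
}
coupling = bibliographic_coupling(references)
print(coupling)
```

This pairwise loop is quadratic in the number of papers; for a large corpus an inverted index from reference to citing papers would avoid comparing papers that share nothing.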


Cluster Link Prediction

Research questions

RQ1: How do we group scientific papers into clusters of distinct research areas?

a. Many existing algorithms

b. Performance

c. Ground truth

RQ2: How do we predict the future formation of links between nodes in previously disconnected clusters?

a. Features

b. Algorithm

c. Interestingness (not every inter-cluster link means a scientific breakthrough)


Cluster Link Prediction

Research questions

RQ1: How do we group scientific papers into clusters of distinct research areas?

Earlier works:

Chen, P. and Redner, S. 2010. Journal of Informetrics. 4, 3, 278-290. (physics)

Waltman, L. and van Eck, N.J. 2012. JASIST. 63, 12, 2378-2392.

Chen, C. 2012. JASIST. 63, 3, 431-449. (scientometrics)

Boyack, K.W. and Klavans, R. 2014. JASIST. 65, 4, 670-685.

Community detection algorithms. (Fortunato, S. 2010. Physics Reports. 486, 3, 75-174)

Evaluation. (Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117)
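When a ground-truth partition is available, a common way to score a community detection result is normalized mutual information (NMI), the measure used in the Lancichinetti and Fortunato comparison. The sketch below is a plain implementation for non-overlapping partitions, normalized by the sum of the two entropies. The function name and the toy label lists are illustrative assumptions, and whether this exact variant was used for the results in this talk is not stated in the slides.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions given as per-node label lists.

    Both lists must be the same length and index the same nodes.
    Returns a value near 1.0 for identical partitions (up to relabelling)
    and 0.0 for statistically independent ones.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal cluster sizes.
    mutual = sum(
        (nij / n) * log(nij * n / (count_a[a] * count_b[b]))
        for (a, b), nij in joint.items()
    )
    # Shannon entropy of each partition.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if h_a + h_b == 0:
        return 1.0  # both partitions place every node in a single cluster
    return 2 * mutual / (h_a + h_b)


# Toy partitions for illustration only.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
print(normalized_mutual_information(truth, relabelled))
print(normalized_mutual_information(truth, unrelated))
```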


(Newman, M.E.J. and Girvan, M. 2004. Phys. Rev. E. 69, 2, 026113)


[Figure: panels (1) (2) (3) (4)]

Algorithms 1, 2: Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117

Algorithm 3: Waltman, L. and van Eck, N.J. 2012. JASIST. 63, 12, 2378-2392

Algorithm 4: Traag, V.A., Van Dooren, P. and Nesterov, Y. 2011. Phys. Rev. E. 84, 1, 016114


Consistent with the result reported in (Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117).


(Park, J. and Newman, M.E.J. 2005. J. Stat. Mech. Theory. Exp. 10, P10014)


[Figure: panels (1) (2) (3) (4)]


Again, consistent with the result reported in (Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117).


Cluster Link Prediction

Future work

A. Apply Infomap (Rosvall, M. and Bergstrom, C.T. 2008. PNAS. 105, 4, 1118-1123) to the citation data sets of the American Physical Society¹

Size: > 450,000 articles

Years: 1893 – 2010

Coverage: Physical Review Letters, Physical Review, Reviews of Modern Physics

¹ https://publish.aps.org/datasets

B. Evaluate cluster quality.

Ground truth? Suitable metrics?
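When no ground truth exists, one candidate metric is the modularity Q of Newman and Girvan (2004), which compares the fraction of edges inside clusters with the fraction expected if edges were placed at random with the same degrees. The following is a minimal sketch for an undirected, unweighted network with at least one edge and no self-loops. The function name and the toy network are assumptions for illustration, and modularity is offered here only as one possible answer to the open question above.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition.

    edges: list of undirected (node, node) pairs, no self-loops, non-empty
    membership: {node: cluster label}
    Q = sum over clusters of (internal edges / m) - (cluster degree / 2m)^2
    """
    m = len(edges)
    internal = {}  # number of edges with both ends inside each cluster
    degree = {}    # total degree of the nodes in each cluster

    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1

    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


# Toy network for illustration only: two triangles joined by one edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
membership = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, membership)
print(q)
```

Modularity has a known resolution limit on large networks, so it is usually read alongside other indicators rather than on its own.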


Cluster Link Prediction

Future work

C. RQ2: How do we predict the future formation of links between nodes in previously disconnected clusters?

Latent Domain Similarity (LDS)

Assumption: Different literatures may have been published separately in seemingly unrelated fields. They may nevertheless share many similar domains that were previously unknown to researchers in each field (i.e. latent).

Goal: To explore whether these shared latent domains correlate with the probability that previously disconnected clusters will form future citation links with each other.
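The slides do not define how Latent Domain Similarity would be computed. One possible proxy, assumed here purely for illustration, is to describe each cluster by a vector of topic proportions (for example from a topic model) and compare two clusters by cosine similarity. The function name, the vector representation, and the toy numbers below are all hypothetical.

```python
from math import sqrt


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length numeric vectors.

    Returns 0.0 if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = sqrt(sum(x * x for x in p))
    norm_q = sqrt(sum(y * y for y in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical topic proportions for three clusters (illustration only).
cluster_a = [0.7, 0.2, 0.1]
cluster_b = [0.6, 0.3, 0.1]
cluster_c = [0.0, 0.1, 0.9]
print(cosine_similarity(cluster_a, cluster_b))
print(cosine_similarity(cluster_a, cluster_c))
```

Under this assumption, the stated goal would amount to testing whether cluster pairs with higher similarity scores are more likely to acquire citation links in a later time window.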


Cluster Link Prediction

Future work

Topic modeling (Blei, D.M., Ng, A.Y. and Jordan, M.I. 2003. J. Mach. Learn. Res. 3, 993-1022)

Approach: content analysis (conventional) + structural analysis

Recent example: Lancichinetti, A., Sirer, M.I., Wang, J.X., Acuna, D., Körding, K. and Amaral, L.A.N. 2014. arXiv:1402.0422v1

Evaluation benchmark (?)

PRL Milestone papers (1958-2008), including 40 Nobel Prize papers.


Agenda

1 Hidden connections

2 Literature based discovery

3 Cluster link prediction

4 Collaboration and feedback


Collaboration and feedback

The main purpose of the current visit.

Contributions to the Complexity Program @ NTU:

A new dimension to the current complexity studies

Potential shared publications

Benefits to my research:

Developing a new complexity science-oriented LBD method

Access to expertise and resources in complexity science and physics


Conference: Hidden Connections 3 - 5 March 2014, Complexity Program, NTU

Brian Uzzi

“… the highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations.” (Uzzi, B. et al. 2013. Science. 342, 6157, 468-472)

Novelty → the pairing of two conventional ideas that have never been put together before. (Uzzi, B. 2014. Complexity Program Annual Conference: Hidden Connections. 3-5 Mar, Nanyang Technological University, Singapore)


Thank you

Yakub Sebastian

PhD Candidate

School of Information Technology

Monash University Malaysia

Jalan Lagoon Selatan

46150 Bandar Sunway

Petaling Jaya, Selangor, Malaysia

Email: [email protected]