Kernels for Link Prediction with Latent Feature Models


Transcript of Kernels for Link Prediction with Latent Feature Models


Kernels for Link Prediction with Latent Feature Models

Canh Hao Nguyen & Hiroshi Mamitsuka

Bioinformatics Center, Institute of Chemical Research, Kyoto University

{canhhao,mami}@kuicr.kyoto-u.ac.jp


Outline

1 Background: Link Prediction on Networks; Network Structures

2 Latent Feature Models: Motivation; Latent Feature Models

3 Link Kernels: Kernels from Latent Feature Models; Node Kernels; Link Kernels

4 Experiments: Latent Feature Assumption; Time Complexity; Link Prediction Results

5 Conclusion


Link Prediction on Networks

A formal way of representing relations between entities.

Found in social networks, Collaborative Filtering, Systems Biology.

A link is a relation between two entities.

Problem: predicting new links given a network.

Information used: nodes' information or network structures.



Assumption of Link Prediction

[Figure: example network with links (P1, P2), (P2, P4), (P3, P2), (P3, P5), (P1, P5)]

Fundamental assumption: links are independently and identically distributed.

BUT... networks have structures.


Network Structures

Networks have been shown to have structures: scale-free, similarity, bipartite, clique, motif, ...

Similarity networks: social networks, metabolic networks, ...

Bipartite networks: Collaborative Filtering, protein-ligand bindings, ...

Latent feature model based networks: social networks, PPI, gene regulatory (GR) networks, ...


Objective

To model network structures with latent features.

To derive latent feature models for non-similarity networks.

To approximate the model using kernel frameworks.

To learn the model optimally and efficiently.

... and to apply it to the link prediction problem in biological networks.

[Figure: (a) Input, (b) Learning, (c) Output]


Outline

1 Background

2 Latent Feature Models: Motivation; Latent Feature Models

3 Link Kernels

4 Experiments

5 Conclusion


Biological Motivation

Protein-protein interactions are based on domain-domain interactions.

Protein docking is based on shape complementarity.

These features (domain, shape) determine PPI ability.


Latent Feature Models of Links

A node in the graph contains some (latent) features (F).

Some pairs of features interact (W).

Nodes that contain such pairs link to each other (a toy construction is sketched after the figure below).

A = F × W × F^T

[Figure: latent feature vector, feature interaction matrix, latent feature matrix, and the resulting feature interactions]
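
To make the model concrete, here is a minimal toy construction of A = F × W × F^T in Python; the sizes n and d, the feature density, the particular W, and the thresholding step are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy instance of the latent feature model A = F W F^T.
rng = np.random.default_rng(42)
n, d = 16, 3                                  # 16 nodes, 3 latent features
F = (rng.random((n, d)) < 0.3).astype(int)    # binary latent feature matrix
W = np.array([[0, 1, 0],                      # which feature pairs interact
              [1, 0, 1],
              [0, 1, 0]])
A = ((F @ W @ F.T) > 0).astype(int)           # nodes link iff they carry an
np.fill_diagonal(A, 0)                        # interacting feature pair
```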


Inferring the Models

Given the adjacency matrix A, use Indian Buffet Processes (IBP) and Gibbs sampling:

Sample F from some distribution.

Infer W and repeat the process.

Not globally optimal.

Very time consuming: scales only to small graphs (< 500 nodes). A simplified sketch of this style of inference follows.
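
For intuition only, below is a highly simplified, finite-feature sketch of this style of inference; the actual IBP model is nonparametric (the number of features grows with the data), and the prior, the proposal step, and the update order here are illustrative assumptions. Even this toy version pays a full node-likelihood evaluation per feature flip, which is why such samplers are slow on large graphs.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loglik_node(A, F, W, i):
    # Log-likelihood of all links/non-links touching node i, under
    # A_ij ~ Bernoulli(sigmoid(F_i W F_j^T)).
    p = np.clip(sigmoid(F[i] @ W @ F.T), 1e-12, 1 - 1e-12)
    return float(np.sum(A[i] * np.log(p) + (1 - A[i]) * np.log(1 - p)))

def loglik_total(A, F, W):
    # Symmetric pairs are counted twice, so halve (fine for a sketch).
    return sum(loglik_node(A, F, W, i) for i in range(A.shape[0])) / 2.0

def gibbs_sweep(A, F, W, feat_prior=0.2, step=0.1):
    n, k = F.shape
    # 1. Resample each binary feature entry from its full conditional.
    for i in range(n):
        for l in range(k):
            logp = np.empty(2)
            for v in (0, 1):
                F[i, l] = v
                logp[v] = (np.log(feat_prior if v else 1 - feat_prior)
                           + loglik_node(A, F, W, i))
            logp -= logp.max()
            F[i, l] = int(rng.random() < np.exp(logp[1]) / np.exp(logp).sum())
    # 2. Random-walk Metropolis update for each interaction weight.
    for a in range(k):
        for b in range(k):
            old, cur = W[a, b], loglik_total(A, F, W)
            W[a, b] = old + step * rng.normal()
            if np.log(rng.random()) > loglik_total(A, F, W) - cur:
                W[a, b] = old  # reject the proposal
    return F, W

# One sweep over all variables: F, W = gibbs_sweep(A, F, W.astype(float))
```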


Outline

1 Background

2 Latent Feature Models

3 Link Kernels: Kernels from Latent Feature Models; Node Kernels; Link Kernels

4 Experiments

5 Conclusion


Latent Feature Models in Kernels

Kernels to encode similarity

High similarity within the same class.

Low similarity otherwise.

[Figure: example network with links (P1, P2), (P2, P4), (P3, P2), (P3, P5), (P1, P5)]

Link similarity with latent features:

Links are similar if their nodes are similar.

Nodes are similar if they share many latent features.

Ideal Kernels: for any diagonal, nonnegative D ∈ R^{d×d},

K*(D) = F × D × F^T

Any node kernel should be close to an ideal kernel (a small sketch follows).
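
Written out, the ideal kernel family is straightforward once F is known; the sketch below only fixes notation, since F is latent in practice, and the function name is illustrative.

```python
import numpy as np

def ideal_kernel(F, d_weights):
    # K*(D) = F diag(d_weights) F^T for a diagonal, nonnegative D.
    d_weights = np.asarray(d_weights, dtype=float)
    assert np.all(d_weights >= 0), "D must be nonnegative"
    return F @ np.diag(d_weights) @ F.T

# e.g. with the toy F from earlier: ideal_kernel(F, [1.0, 0.5, 2.0])
```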


Node Kernels

Idea: nodes with more common latent features are more similar.

Observation: nodes with more common latent features have similar connectivity patterns.

We use the connectivity pattern to define node similarity:

K_n = norm(A²) (diagonally normalized; a sketch follows this list).

In sparse networks, this kernel K_n approximates ideal kernels.

PPI networks are sparse (> 80% of nodes in yeast, and all in fruit fly, have degree ≤ 10).
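
A minimal sketch of the node kernel, reading norm(A²) as the usual diagonal (cosine-style) normalization so that diagonal entries become 1; the exact normalization in the paper may differ.

```python
import numpy as np

def node_kernel(A):
    # K_n = norm(A^2): (A^2)_ij counts common neighbors of nodes i and j,
    # then normalize so that K_ij -> K_ij / sqrt(K_ii K_jj).
    K = A @ A
    d = np.sqrt(np.clip(np.diag(K).astype(float), 1e-12, None))
    return K / np.outer(d, d)
```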


Kernels on Links

Node similarities are encoded in the node kernel K_n.

We use pairwise kernels to encode link similarity.

If corresponding nodes are similar, links are similar.

K({a, b}, {c, d}) = K_n(a, c) · K_n(b, d) + K_n(a, d) · K_n(b, c)

Feature-wise: if a_i, b_j are features of a and b, then a_i·b_j + a_j·b_i is a feature of {a, b} (see the sketch below).
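
A direct sketch of this pairwise link kernel; the pair indices and the Gram-matrix helper are illustrative, and for training one would evaluate it over all candidate node pairs.

```python
import numpy as np

def link_kernel(Kn, ab, cd):
    # K({a,b},{c,d}) = Kn(a,c) Kn(b,d) + Kn(a,d) Kn(b,c)
    a, b = ab
    c, d = cd
    return Kn[a, c] * Kn[b, d] + Kn[a, d] * Kn[b, c]

def link_gram(Kn, pairs):
    # Gram matrix over a list of candidate node pairs.
    m = len(pairs)
    G = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            G[i, j] = link_kernel(Kn, pairs[i], pairs[j])
    return G
```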


Demonstration

[Figure: network model and incomplete observation (left); nodes-and-links vs. non-links classes in the link kernel feature space (right)]


Properties (K_n similar to ideal kernels)

Positivity: (K_n)_ij is positive for a pair of nodes (i, j) if they share a common feature. If there is a shared feature according to the generative model, K_n should recognize it.

Monotonicity: the more features a pair of nodes shares, the higher the kernel value. K_n respects monotonicity like an ideal kernel: more shared features, higher kernel values.


Properties (K_n similar to ideal kernels), cont.

Recall that K_n = F D F^T = F (W E W^T) F^T with E = F^T F, while ideal kernels are K*(D̂) = F D̂ F^T.

Ideal Condition: K_n is an ideal kernel iff all row vectors of W F^T are uncorrelated.

On Sparse Networks: if the model is sparse in the sense that (∑_{i≠j} |W_il W_jk E_lk|^p)^{1/p} ≤ δ, then K_n is close to an ideal kernel in the sense that ‖D − D̂‖_p ≤ δ (a numeric sketch follows).
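
As a numeric sketch of this claim (ignoring the normalization step): with E = F^T F, the unnormalized K_n has D = W E W^T in place of a diagonal matrix, so the off-diagonal p-norm of D measures the distance to the ideal family. The exact index accounting in the bound is an assumption of this reading.

```python
import numpy as np

def offdiag_pnorm(F, W, p=2.0):
    # With E = F^T F and D = W E W^T, K_n = F D F^T; K_n is ideal
    # iff D is diagonal, so the off-diagonal p-norm of D measures
    # how far K_n is from the ideal kernel family.
    E = F.T @ F
    D = W @ E @ W.T
    off = D - np.diag(np.diag(D))
    return np.sum(np.abs(off) ** p) ** (1.0 / p)
```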


Outline

1 Background

2 Latent Feature Models

3 Link Kernels

4 Experiments: Latent Feature Assumption; Time Complexity; Link Prediction Results

5 Conclusion


Experiments and Data

Input: adjacency matrix A.

90/10 train/test splits.

Learn SVMs on the kernels (a sketch follows this list).

Show: running times, assumption comparison, prediction AUCs on PPI networks.

Compare to using nodes' attributes: sequence kernels.

PPI networks of yeast and fruit fly: the largest networks available.

Extracted from DIP databases (hand-curated, only direct interactions).

We used only the connected part of the networks.

We filtered out nodes with degree less than m.

For m = 1, network sizes: 4762 proteins in yeast and 6644 proteins in fruit fly.
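
A sketch of the evaluation protocol just described, assuming scikit-learn with a precomputed Gram matrix over candidate links; function and variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def evaluate_split(G, y, seed=0):
    # G: Gram matrix over candidate links, y: 1 = link, 0 = non-link.
    idx = np.arange(len(y))
    tr, te = train_test_split(idx, test_size=0.1, stratify=y,
                              random_state=seed)
    clf = SVC(kernel="precomputed")
    clf.fit(G[np.ix_(tr, tr)], y[tr])           # train on 90% of pairs
    scores = clf.decision_function(G[np.ix_(te, tr)])
    return roc_auc_score(y[te], scores)         # AUC on held-out 10%
```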


On the Latent Feature Model Assumption

Reasonability of the latent feature assumption as opposed to the similarity assumption, tested with nearest neighbor classifiers.

[Figure: AUC vs. minimum degree of nodes for LK-NN and SimNN]

The latent feature assumption is more suitable than the similarity assumption.


Execution Time of Our Method vs. IBP

Our method scales to this size of data, as opposed to explicit feature generation methods.

[Figure: left, time to train one model (seconds) vs. minimum degree of nodes, our method on all subnetworks; right, model log-likelihood convergence over time (seconds), IBP on the smallest subnetwork]

Our method takes minutes on data of this size; IBP scales only to the smallest subnetworks, taking many hours.


Link Prediction Performance on yeast PPI networks

[Figure: AUC vs. minimum degree of nodes for Linear, Gaussian, and IBP]

Our method gives very high performance compared to IBP.

The results are significantly higher than random, meaning that the networks have significant structures.

Using sequence (spectrum kernels) to predict links in these networks: 0.71 ± 0.008.


Link Prediction Performance on fruit fly PPI networks

[Figure: AUC vs. minimum degree of nodes for Linear, Gaussian, and IBP]

Our method gives higher results than IBP.

The networks have significant structures.

Using sequence (spectrum kernels) to predict links in these networks: 0.65 ± 0.016.


Conclusion

We argued for using network structures to predict links.

We proposed a latent feature model to describe network structures.

We used kernels to encode the model implicitly and to train it efficiently and optimally.

Our method scaled to real data of PPI networks, unlike IBP.

Our method gave high performance in predicting PPI.

Our method is efficient and effective for latent feature models of networks.
