A quantum Jensen–Shannon graph kernel for unattributed graphs




Lu Bai a,*, Luca Rossi b,**, Andrea Torsello c, Edwin R. Hancock a,1

a Department of Computer Science, The University of York, York YO10 5DD, UK
b School of Computer Science, University of Birmingham, UK
c Department of Environmental Science, Informatics and Statistics, Ca' Foscari University of Venice, Italy

Article info

Article history:
Received 8 September 2013
Received in revised form 20 March 2014
Accepted 21 March 2014

Keywords:
Graph kernels
Continuous-time quantum walk
Quantum state
Quantum Jensen–Shannon divergence

Abstract

In this paper we use the quantum Jensen–Shannon divergence as a means of measuring the information theoretic dissimilarity of graphs and thus develop a novel graph kernel. In quantum mechanics, the quantum Jensen–Shannon divergence can be used to measure the dissimilarity of quantum systems specified in terms of their density matrices. We commence by computing the density matrix associated with a continuous-time quantum walk over each graph being compared. In particular, we adopt the closed form solution of the density matrix introduced in Rossi et al. (2013) [27,28] to reduce the computational complexity and to avoid the cumbersome task of simulating the quantum walk evolution explicitly. Next, we compare the mixed states represented by the density matrices using the quantum Jensen–Shannon divergence. With the quantum states for a pair of graphs described by their density matrices to hand, the quantum graph kernel between the pair of graphs is defined using the quantum Jensen–Shannon divergence between the graph density matrices. We evaluate the performance of our kernel on several standard graph datasets from both bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed quantum graph kernel.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Structural representations have been used for over 30 years in pattern recognition due to their representational power. However, the increased descriptiveness comes at the cost of a greater difficulty in applying standard techniques to them, as these usually require data to reside in a vector space. The famous kernel trick [1] allows the focus to be shifted from the vectorial representation of data, which now becomes implicit, to a similarity representation. This allows standard learning techniques to be applied to data for which no obvious vectorial representation exists. For this reason, in recent years pattern recognition has witnessed an increasing interest in structural learning using graph kernels.

1.1. Literature review

1.1.1. Graph kernels

One of the most influential works on structural kernels was the generic R-convolution kernel proposed by Haussler [2]. Here graph kernels are computed by comparing the similarity of each of the decompositions of the two graphs. Depending on how the graphs are decomposed, we obtain different kernels. Generally speaking, most R-convolution kernels count the number of isomorphic substructures in the two graphs. Kashima et al. [3] compute the kernel by decomposing the graph into random walks, while Borgwardt et al. [4] have proposed a kernel based on shortest paths. Here, the similarity is determined by counting the numbers of pairs of shortest paths of the same length in a pair of graphs. Shervashidze et al. [5] have developed a subtree kernel on subtrees of limited size. They compute the number of subtrees shared between two graphs using the Weisfeiler–Lehman graph invariant. Aziz et al. [6] have defined a backtrackless kernel on cycles of limited length. They compute the kernel value by counting the numbers of pairs of cycles of the same length in a pair of graphs. Costa and Grave [7] have defined a so-called neighborhood subgraph pairwise distance kernel by counting the number of pairs of isomorphic neighborhood subgraphs. Recently, Kriege et al. [8] counted the number of isomorphisms between pairs of subgraphs, while Neumann et al. [9] have introduced the concept of propagation kernels to handle partially labeled graphs through the use of continuous-valued vertex attributes.

One drawback of these kernels is that they neglect the locational information for the substructures in a graph. In other words, the similarity does not depend on the relationships between substructures. As a consequence, these kernels cannot establish


* Corresponding author. Tel.: +86 13084142051, +44 7429399030.
** Corresponding author. Tel.: +44 1214144766.
E-mail addresses: bailu69@hotmail.com, lu@cs.york.ac.uk (L. Bai), lrossi@cs.bham.ac.uk, blextar@gmail.com (L. Rossi), torsello@dsi.unive.it (A. Torsello), erh@cs.york.ac.uk (E.R. Hancock).
1 He was supported by a Royal Society Wolfson Research Merit Award.

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028


reliable structural correspondences between the substructures. This limits the precision of the resulting similarity measure. To overcome this problem, Fröhlich et al. [10] introduced alternative optimal assignment kernels. Here each pair of structures is aligned before comparison. However, the introduction of the alignment step results in a kernel that is not positive definite in general [11]. The problem results from the fact that alignments are not in general transitive. In other words, if $\sigma$ is the vertex-alignment between graph A and graph B, and $\pi$ is the alignment between graph B and graph C, in general we cannot guarantee that the alignment between graph A and graph C is $\pi \circ \sigma$. On the other hand, when the alignments are transitive, there is a common simultaneous alignment of all the graphs. Under this alignment, the optimal assignment kernel is simply the sum over all the vertex/edge kernels, and this is positive definite since it is the sum of separate positive definite kernels. While, lacking positive definiteness, the optimal assignment kernels cannot be guaranteed to represent an implicit embedding into a Hilbert space, they have nonetheless been proven to be very effective in classifying structures. Another example of alignment-based kernels is the edit-distance-based kernel introduced by Neuhaus and Bunke [12]. Here the alignments obtained from graph-edit distance are used to guide random walks on the structures being compared.

An attractive alternative way to measure the similarity of a pair of graphs is to use the mutual information and compute the classical Jensen–Shannon divergence [13]. In information theory, the Jensen–Shannon divergence is a dissimilarity measure between probability distributions. It is symmetric, always well defined, and bounded [13]. Bai and Hancock [14] have used the divergence to define a Jensen–Shannon kernel for graphs. Here the kernel between a pair of graphs is computed using a nonextensive entropy measure in terms of the classical Jensen–Shannon divergence between probability distributions over graphs. For a graph, the elements of the probability distribution are computed from the degrees of the corresponding vertices. The entropy associated with the probability distribution of an individual graph can thus be directly computed, without the need of decomposing the graph into substructures. Hence, unlike the aforementioned graph kernels, the Jensen–Shannon kernel between a pair of graphs avoids the computationally burdensome task of determining the similarities between all pairs of substructures. Unfortunately, the required composite entropy for the Jensen–Shannon kernel is computed from a product graph formed by a pair of graphs, and reflects no correspondence information between pairs of vertices. As a result, the Jensen–Shannon graph kernel lacks correspondence information between the probability distributions over the graphs, and thus cannot precisely reflect the similarity between graphs.

There has recently been an increasing interest in quantum computing because of the potential speed-ups over classical algorithms. Examples include Grover's polynomially faster search algorithm [15] and Shor's exponentially faster factorization algorithm [16]. Furthermore, quantum algorithms also offer us a richer structure than their classical counterparts, since they use qubits rather than bits as the basic representational unit [32].

1.1.2. Quantum computation

Quantum systems differ from their classical counterparts since they add the possibility of state entanglement to the classical statistical mixture of classical systems. This results in an exponential increase of the dimensionality of the state-space, which is at the basis of the quantum speedup. Pure quantum states are represented as entries in a complex Hilbert space, while potentially mixed quantum states are represented through the density matrix. Mixed states can then be compared by examining their density matrices. One convenient way to do this is to use the quantum Jensen–Shannon divergence, first introduced by Majtey et al. [13,18]. Unlike the classical Jensen–Shannon divergence, which is defined as a similarity measure between probability distributions, the quantum Jensen–Shannon divergence is defined as a distance measure between mixed quantum states described by density matrices. Moreover, it can be used to measure both the degree of mixing and the entanglement [13,18].

In this paper we are interested in computing a kernel between pairs of graphs using the quantum Jensen–Shannon divergence between two mixed states representing the evolution of continuous-time quantum walks on the graphs. The continuous-time quantum walk has been introduced as the natural quantum analogue of the classical random walk by Farhi and Gutmann [19], and has been widely used in the design of novel quantum algorithms. Ambainis et al. [20,21] were among the first to generalize random walks on finite graphs to the quantum world. Furthermore, they have explored the application of quantum walks on graphs to a number of problems [22]. Childs et al. [23,24] have explored the difference between quantum and classical random walks, and then exploited the increased power of quantum walks as a general model of computation.

Similar to the classical random walk on a graph, the state space of the continuous-time quantum walk is the set of vertices of the graph. However, unlike the classical random walk, where the state vector is real-valued and the evolution is governed by a doubly stochastic matrix, the state vector of the continuous-time quantum walk is complex-valued and its evolution is governed by a time-varying unitary matrix. The continuous-time quantum walk possesses a number of interesting properties which are not exhibited by the classical random walk. For instance, the continuous-time quantum walk is reversible and non-ergodic, and does not have a limiting distribution. As a result, the continuous-time quantum walk offers us an elegant way to design quantum algorithms on graphs. For further details on quantum computation and quantum algorithms, we refer the reader to the textbook in [32].

There have recently been several attempts to define quantum kernels using the continuous-time quantum walk. For instance, Bai et al. [26] have introduced a novel graph kernel where the similarity between two graphs is defined in terms of the similarity between two quantum walks evolving on the two graphs. The basic idea here is to associate with each graph a mixed quantum state representing the time evolution of a quantum walk. The kernel between the walks is defined as the divergence between the corresponding density operators. However, this quantum divergence measure requires the computation of an additional mixed state, where the system has equal probability of being in each of the two original quantum states. Unless this quantum kernel takes into account the correspondences between the vertices of the two graphs, it can be shown that it is not permutation invariant. Rossi et al. [27,28] have attempted to overcome this problem by allowing the two quantum walks to evolve on the union of the two graphs. This exploits the relation between the interferences of the continuous-time quantum walks and the symmetries in the structures being compared. Since both the mixed states are defined on the same structure, this quantum kernel addresses the shortcoming of permutation variance. However, the resulting approach is less intuitive, thus making it particularly hard to prove the positive semidefiniteness of the kernel. Moreover, the union graph is established by roughly connecting all vertex pairs of the two graphs, and thus also lacks vertex correspondence information. As a result, this kernel cannot reflect the precise similarity between the two graphs.

1.2. Contributions

The aim of this paper is to overcome the shortcomings of the existing graph kernels by defining a novel quantum kernel. To this



end, we develop the work in [26] further. We intend to solve the problems of permutation variance and missing vertex correspondence by introducing an additional alignment step, and thus compute an aligned mixed density matrix. The goal of the alignment is to provide vertex correspondences and a lower bound on the divergence over permutations of the vertices/quantum states. However, we approximate the alignment that minimizes the divergence with that which minimizes the Frobenius norm between the density operators. This latter problem is solved using Umeyama's graph matching method [29]. Since the aligned density matrix is computed by taking into account the vertex correspondences between the two graphs, the density matrix reflects the locational correspondences between the quantum walks over the vertices of the two graphs. As a result, our new quantum kernel can not only overcome the shortcomings of missing vertex correspondence and permutation variance that arise with the existing kernels from the classical/quantum Jensen–Shannon divergence, but also overcome the shortcoming of neglecting locational correspondence between substructures that arises in the existing R-convolution kernels. Furthermore, in contrast to what is done in previous assignment-based kernels, we do not align the structures directly. Rather, we align the density matrices, which are a special case of a Laplacian family signature [30], well known to provide a more robust representation for correspondence estimation. Finally, we adopt the closed form solution of the density matrix introduced in [27,28] to reduce the computational complexity and to avoid the cumbersome task of explicitly simulating the time evolution of the quantum walk. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed graph kernel, providing both a higher classification accuracy than the unaligned kernel and a performance comparable or superior to other state-of-the-art graph kernels.

1.3. Paper outline

In Section 2 we introduce the quantum mechanical formalism that will be used to develop the ideas proposed in the paper. We describe how to compute a density matrix for the mixed quantum state of the continuous-time quantum walk on a graph. With the mixed state density matrix to hand, we compute the von Neumann entropy of the continuous-time quantum walk on the graph. In Section 3 we compute a graph kernel using the quantum Jensen–Shannon divergence between the density matrices for a pair of graphs. In Section 4 we provide an experimental evaluation which demonstrates the effectiveness of our quantum kernel. In Section 5 we conclude the paper and suggest future research directions.

2. Quantum mechanical background

In this section we introduce the quantum mechanical formalism to be used in this work. We commence by reviewing the concept of a continuous-time quantum walk on a graph. We then describe how to establish a density matrix for the mixed quantum state of the graph through a quantum walk. With the density matrix of the mixed quantum state to hand, we show how to compute a von Neumann entropy for a graph.

2.1. The continuous-time quantum walk

The continuous-time quantum walk is a natural quantum analogue of the classical random walk [19,31]. Classical random walks can be used to model a diffusion process on a graph, and have proven to be a useful tool in the analysis of graph structure. In general, diffusion is modeled as a Markovian process defined over the vertices of the graph, where the transitions take place on the edges. More formally, let $G = (V, E)$ be an undirected graph, where $V$ is a set of $n$ vertices and $E \subseteq V \times V$ is a set of edges. The state vector for the walk at time $t$ is a probability distribution over $V$, i.e., a vector $p_t \in \mathbb{R}^n$ whose $u$th entry gives the probability that the walk is at vertex $u$ at time $t$. Let $A$ denote the symmetric adjacency matrix of the undirected graph $G$, and let $D$ be the diagonal degree matrix with elements $d_u = \sum_{v=1}^{n} A(u, v)$, where $d_u$ is the degree of the vertex $u$. Then the continuous-time random walk on $G$ will evolve according to the equation

$$p_t = e^{-Lt} p_0, \qquad (1)$$

where $L = D - A$ is the graph Laplacian.

As in the classical case, the continuous-time quantum walk is defined as a dynamical process over the vertices of the graph. However, in contrast to the classical case, where the state vector is constrained to lie in a probability space, in the quantum case the state of the system is defined through a vector of complex amplitudes over the vertex set $V$. The squared norm of the amplitudes sums to unity over the vertices of the graph, with no restriction on their sign or complex phase. This in turn allows both destructive and constructive interference to take place between the complex amplitudes. Moreover, in the quantum case the evolution of the walk is governed by a complex-valued unitary matrix, whereas the dynamics of the classical random walk is governed by a stochastic matrix. Hence the evolution of the quantum walk is reversible, which implies that quantum walks are non-ergodic and do not possess a limiting distribution. As a result, the behaviour of classical and quantum walks differs significantly. In particular, the quantum walk possesses a number of interesting properties not exhibited in the classical case. One notable consequence of the interference properties of quantum walks is that when a quantum walker backtracks on an edge, it does so with the opposite phase. This gives rise to destructive interference and reduces the problem of tottering. In fact, the faster hitting time observed in quantum walks on the line is a consequence of the destructive interference of backtracking paths [31].
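The classical diffusion of Eq. (1) can be sketched numerically. The 4-vertex path graph, the evolution time, and the delta initial distribution below are hypothetical illustrations (not taken from the paper); `scipy` is assumed to be available.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical toy graph: a path on 4 vertices.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                       # graph Laplacian L = D - A

p0 = np.zeros(4)
p0[0] = 1.0                     # walk starts at vertex 0
t = 1.5                         # arbitrary evolution time
pt = expm(-L * t) @ p0          # Eq. (1): p_t = e^{-Lt} p_0

# e^{-Lt} is a stochastic semigroup: mass is conserved and stays nonnegative.
assert np.isclose(pt.sum(), 1.0)
assert pt.min() > -1e-12
```

As $t$ grows, `pt` approaches the uniform distribution, illustrating that the classical walk (unlike its quantum counterpart) has a limiting distribution.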

The Dirac notation, also known as bra-ket notation, is a standard notation used for describing quantum states. We call pure state a quantum state that can be described using a single ket vector $|a\rangle$. In the context of the paper, a ket vector is a complex-valued column vector of unit Euclidean length in an $n$-dimensional Hilbert space. Its conjugate transpose is a bra (row) vector, denoted as $\langle a|$. As a result, the inner product between two states $|a\rangle$ and $|b\rangle$ is written as $\langle a | b \rangle$, while their outer product is $|a\rangle\langle b|$. Using the Dirac notation, we denote the basis state corresponding to the walk being at vertex $u \in V$ as $|u\rangle$. In other words, $|u\rangle$ is the vector which is equal to 1 in the position corresponding to the vertex $u$ and is zero everywhere else. A general state of the walk is then expressed as a complex linear combination of these orthonormal basis states, such that the state vector of the walk at time $t$ is defined as

$$|\psi_t\rangle = \sum_{u \in V} \alpha_u(t) |u\rangle, \qquad (2)$$

where the amplitude $\alpha_u(t) \in \mathbb{C}$ is subject to the constraint $\sum_{u \in V} \alpha_u(t) \alpha_u^*(t) = 1$ for all $t \in \mathbb{R}^+$, where $\alpha_u^*(t)$ is the complex conjugate of $\alpha_u(t)$.

At each instant of time, the probability of the walk being at a particular vertex of the graph $G(V, E)$ is given by the squared norm of the amplitude associated with $|\psi_t\rangle$. More formally, let $X_t$ be a random variable giving the location of the walk at time $t$.



The probability of the walk being at vertex $u \in V$ at time $t$ is given by

$$\Pr(X_t = u) = \alpha_u(t) \alpha_u^*(t). \qquad (3)$$

More generally, a quantum walker represented by $|\psi_t\rangle$ can be observed to be in any state $|\phi\rangle \in \mathbb{C}^n$, and not just in the states $\{|u\rangle\}_{u \in V}$ corresponding to the graph vertices. Given an orthonormal basis $|\phi_1\rangle, \ldots, |\phi_n\rangle$ of $\mathbb{C}^n$, the quantum walker $|\psi_t\rangle$ is observed in state $|\phi_i\rangle$ with probability $\langle \psi_t | \phi_i \rangle \langle \phi_i | \psi_t \rangle$. After being observed in state $|\phi_i\rangle$, the new state of the quantum walker is $|\phi_i\rangle$. Further, given the nature of quantum observation, only orthonormal states can be distinguished by a single observation.

In quantum mechanics, the Schrödinger equation is a partial differential equation that describes how quantum systems evolve with time. In the non-relativistic case it assumes the form

$$i\hbar \frac{\partial}{\partial t} |\psi\rangle = H |\psi\rangle, \qquad (4)$$

where the Hamiltonian operator $H$ accounts for the total energy of a system. In the case of a particle moving in an empty space with zero potential energy, this reduces to the Laplacian operator $\nabla^2$. Here we take the Hamiltonian of the system to be the graph Laplacian matrix $L$. However, any Hermitian operator encoding the structure of the graph can be chosen as an alternative. The evolution of the walk at time $t$ is then given by the Schrödinger equation

$$\frac{\partial}{\partial t} |\psi_t\rangle = -iL |\psi_t\rangle, \qquad (5)$$

where the Laplacian $L$ plays the role of the system Hamiltonian. Given an initial state $|\psi_0\rangle$, the Schrödinger equation has an exponential solution which can be used to determine the state of the walk at a time $t$, giving

$$|\psi_t\rangle = e^{-iLt} |\psi_0\rangle, \qquad (6)$$

where $U_t = e^{-iLt}$ is a unitary matrix governing the evolution of the walk. To simulate the evolution of the quantum walk, it is customary to rewrite Eq. (6) in terms of the spectral decomposition $L = \Phi \Lambda \Phi^T$ of the Laplacian matrix $L$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|})$ is a diagonal matrix with the ordered eigenvalues as elements ($\lambda_1 < \lambda_2 < \cdots < \lambda_{|V|}$) and $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_{|V|})$ is a matrix with the corresponding ordered orthonormal eigenvectors as columns. Hence, Eq. (6) can be rewritten as

$$|\psi_t\rangle = \Phi e^{-i\Lambda t} \Phi^T |\psi_0\rangle. \qquad (7)$$

In this work we let $\alpha_u(0)$ be proportional to $d_u$, and using the constraints applied to the amplitudes we get

$$\alpha_u(0) = \sqrt{\frac{d_u}{\sum_{v} d_v}}. \qquad (8)$$

Note that there is no particular indication as to how to choose the initial state. However, choosing a uniform distribution over the vertices of $G$ would result in the system remaining stationary, since the initial state vector would be an eigenvector of the Laplacian of $G$, and thus a stationary state of the walk.
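Eqs. (7) and (8), together with the vertex probabilities of Eq. (3), can be sketched as follows. The 4-vertex path graph and the evolution time are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical toy graph: a path on 4 vertices.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A

# Eq. (8): initial amplitudes proportional to the vertex degrees.
psi0 = np.sqrt(d / d.sum())

# Eq. (7): |psi_t> = Phi e^{-i Lambda t} Phi^T |psi_0>,
# with the eigenvectors of L as the columns of Phi.
lam, Phi = np.linalg.eigh(L)
t = 2.0                                          # arbitrary evolution time
psi_t = Phi @ (np.exp(-1j * lam * t) * (Phi.T @ psi0))

# Eq. (3): the probability of each vertex is the squared amplitude modulus.
probs = np.abs(psi_t) ** 2
assert np.isclose(probs.sum(), 1.0)   # unitary evolution preserves the norm
```

Note that the amplitudes in `psi_t` are genuinely complex, while the observable quantities in `probs` remain a valid probability distribution at every $t$.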

2.2. A density matrix from the mixed state

While a pure state can be naturally described using a single ket vector, in general a quantum system can be in a mixed state, i.e., a statistical ensemble of pure quantum states $|\psi_i\rangle$, each with probability $p_i$. The density operator (or density matrix) of such a system is defined as

$$\rho = \sum_i p_i |\psi_i\rangle \langle \psi_i|. \qquad (9)$$

Density operators are positive unit-trace matrices directly linked with the observables of the (mixed) quantum system. Let $O$ be an observable, i.e., a Hermitian operator acting on the quantum states and providing a measurement. Without loss of generality, we have $O = \sum_{i=1}^{m} v_i P_i$, where $P_i$ is the orthogonal projector onto the $i$th observation basis and $v_i$ is the measurement value when the quantum state is observed to be in this observation basis.

The expectation value of the measurement over a mixed state can be calculated from the density matrix $\rho$:

$$\langle O \rangle = \mathrm{tr}(\rho O), \qquad (10)$$

where $\mathrm{tr}$ is the trace operator. Similarly, the observation probability on the $i$th observation basis can be expressed in terms of the density matrix $\rho$ as

$$\Pr(X = i) = \mathrm{tr}(\rho P_i). \qquad (11)$$

Finally, after the measurement, the corresponding density operator will be

$$\rho' = \sum_{i=1}^{m} P_i \rho P_i. \qquad (12)$$
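A minimal numerical sketch of Eqs. (10)–(12), assuming a hypothetical 2-dimensional mixed state and a ±1-valued observable measured in the standard basis:

```python
import numpy as np

# Hypothetical density matrix: positive semidefinite with unit trace.
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])

# Observable O = sum_i v_i P_i with projectors P_i = |i><i| and values +1, -1.
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
v = [1.0, -1.0]
O = sum(vi * Pi for vi, Pi in zip(v, P))

expectation = np.trace(rho @ O)            # Eq. (10): <O> = tr(rho O)
prob_0 = np.trace(rho @ P[0])              # Eq. (11): Pr(X = 0) = tr(rho P_0)
rho_post = sum(Pi @ rho @ Pi for Pi in P)  # Eq. (12): post-measurement state

assert np.isclose(expectation, 0.7 - 0.3)  # diagonal entries weight the values
assert np.isclose(prob_0, 0.7)
assert np.isclose(np.trace(rho_post), 1.0) # measurement kills the off-diagonal
                                           # coherences but preserves the trace
```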

For a graph $G(V, E)$, let $|\psi_t\rangle$ denote the state corresponding to a continuous-time quantum walk that has evolved from time $t = 0$ to time $t = T$. We define the time-averaged density matrix $\rho_G^T$ for $G(V, E)$ as

$$\rho_G^T = \frac{1}{T} \int_0^T |\psi_t\rangle \langle \psi_t| \, dt, \qquad (13)$$

which can be expressed in terms of the initial state $|\psi_0\rangle$ as

$$\rho_G^T = \frac{1}{T} \int_0^T e^{-iLt} |\psi_0\rangle \langle \psi_0| e^{iLt} \, dt. \qquad (14)$$

In other words, $\rho_G^T$ describes a system which has an equal probability of being in any of the pure states defined by the evolution of the quantum walk from $t = 0$ to $t = T$. Using the spectral decomposition of the Laplacian, we can rewrite the previous equation as

$$\rho_G^T = \frac{1}{T} \int_0^T \Phi e^{-i\Lambda t} \Phi^T |\psi_0\rangle \langle \psi_0| \Phi e^{i\Lambda t} \Phi^T \, dt. \qquad (15)$$

Let $\phi_{xy}$ denote the $(x, y)$th element of the matrix of eigenvectors $\Phi$ of the Laplacian. The $(r, c)$th element of $\rho_G^T$ can be computed as

$$\rho_G^T(r, c) = \frac{1}{T} \int_0^T \left( \sum_{k=1}^{n} \sum_{l=1}^{n} \phi_{rk} e^{-i\lambda_k t} \phi_{lk} \psi_{0l} \right) \left( \sum_{x=1}^{n} \sum_{y=1}^{n} \psi_{0x} \phi_{xy} e^{i\lambda_y t} \phi_{cy} \right) dt. \qquad (16)$$

Let $\bar{\psi}_k = \sum_l \phi_{lk} \psi_{0l}$ and $\bar{\psi}_y = \sum_x \phi_{xy} \psi_{0x}$, where $\psi_{0l}$ denotes the $l$th element of $|\psi_0\rangle$. Then

$$\rho_G^T(r, c) = \frac{1}{T} \int_0^T \left( \sum_{k=1}^{n} \phi_{rk} e^{-i\lambda_k t} \bar{\psi}_k \right) \left( \sum_{y=1}^{n} \phi_{cy} e^{i\lambda_y t} \bar{\psi}_y \right) dt, \qquad (17)$$

which can be rewritten as

$$\rho_G^T(r, c) = \sum_{k=1}^{n} \sum_{y=1}^{n} \phi_{rk} \phi_{cy} \bar{\psi}_k \bar{\psi}_y \frac{1}{T} \int_0^T e^{i(\lambda_y - \lambda_k)t} \, dt. \qquad (18)$$

If we let $T \to \infty$, Eq. (18) further simplifies to

$$\rho_G^{\infty}(r, c) = \sum_{\lambda \in \tilde{\Lambda}} \sum_{x \in B_\lambda} \sum_{y \in B_\lambda} \phi_{rx} \phi_{cy} \bar{\psi}_x \bar{\psi}_y, \qquad (19)$$

where $\tilde{\Lambda}$ is the set of distinct eigenvalues of the Laplacian matrix $L$ and $B_\lambda$ is a basis of the eigenspace associated with $\lambda$. As a consequence, computing the time-averaged density matrix relies on computing the eigendecomposition of $G(V, E)$, and hence has time complexity $O(|V|^3)$. Note that in the remainder of this paper, we will simplify our notation by referring to the time-averaged density matrix as the density matrix associated with a graph.
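Eq. (19) can equivalently be computed by projecting the initial state onto each distinct eigenspace, since the cross terms between distinct eigenvalues average to zero as $T \to \infty$. The sketch below assumes a hypothetical 4-vertex path graph and the degree-based initial state of Eq. (8).

```python
import numpy as np

# Hypothetical toy graph: a path on 4 vertices, with Eq. (8) initial state.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A
psi0 = np.sqrt(d / d.sum())

lam, Phi = np.linalg.eigh(L)               # O(|V|^3) eigendecomposition
rho = np.zeros((4, 4))
for lam_distinct in np.unique(np.round(lam, 10)):
    idx = np.isclose(lam, lam_distinct)
    P_lam = Phi[:, idx] @ Phi[:, idx].T    # projector onto the eigenspace
    proj = P_lam @ psi0
    rho += np.outer(proj, proj)            # Eq. (19): only within-eigenspace
                                           # terms survive the time average

assert np.isclose(np.trace(rho), 1.0)      # rho is a valid density matrix
assert np.allclose(rho, rho.T)
```

The unit trace follows because the eigenspace projections of $|\psi_0\rangle$ are orthogonal and their squared norms sum to $\||\psi_0\rangle\|^2 = 1$.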



Moreover, we will assume that ρ_G^T is computed for T → ∞ unless otherwise stated, and we will denote it as ρ_G.
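The closed form of Eq. (19) is straightforward to implement. Below is a minimal NumPy sketch; the degree-proportional initial state |ψ_0⟩ is one plausible choice and not fixed by this section, and `average_density_matrix` is a hypothetical helper name.

```python
import numpy as np

def average_density_matrix(A):
    """Infinite-time-average density matrix of Eq. (19) for a graph with
    adjacency matrix A, via the eigendecomposition of the Laplacian."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                      # Laplacian, the walk Hamiltonian
    lam, Phi = np.linalg.eigh(L)            # columns of Phi: orthonormal eigenvectors
    psi0 = d / np.linalg.norm(d)            # assumed degree-proportional |psi_0>
    psi_bar = Phi.T @ psi0                  # amplitudes in the eigenvector basis
    rho = np.zeros_like(L, dtype=float)
    # sum over eigenspaces: cross terms between distinct eigenvalues average out
    for lam_val in np.unique(np.round(lam, 10)):
        idx = np.isclose(lam, lam_val)      # basis B_lambda of the eigenspace
        v = Phi[:, idx] @ psi_bar[idx]      # projection of |psi_0> onto it
        rho += np.outer(v, v)
    return rho
```

The result is symmetric, has unit trace, and commutes with the Laplacian, as expected of a time-averaged quantum state.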

2.3. The von Neumann entropy of a graph

In quantum mechanics, the von Neumann entropy [32] H_N of a mixture is defined in terms of the trace and logarithm of the density operator ρ:

H_N = -\operatorname{tr}(\rho \log \rho) = -\sum_i \xi_i \ln \xi_i \qquad (20)

where ξ_1, …, ξ_n are the eigenvalues of ρ. Note that if the quantum system is a pure state |ψ_i⟩ with probability p_i = 1, then the von Neumann entropy H_N(ρ) = −tr(ρ log ρ) is zero. On the other hand, a mixed state generally has a non-zero von Neumann entropy associated with its density operator.

Here we propose to associate with each graph the von Neumann entropy of the density matrix computed as in Eq. (19). Consider a graph G(V, E); its von Neumann entropy is

H_N(\rho_G) = -\operatorname{tr}(\rho_G \log \rho_G) = -\sum_{j=1}^{|V|} \lambda_j^G \log \lambda_j^G \qquad (21)

where λ_1^G, …, λ_j^G, …, λ_{|V|}^G are the eigenvalues of ρ_G.
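Eq. (21) reduces to an eigenvalue computation. A minimal sketch (with `von_neumann_entropy` as a hypothetical helper name, and the usual convention 0 log 0 = 0):

```python
import numpy as np

def von_neumann_entropy(rho):
    """H_N(rho) = -sum_j xi_j log xi_j over the spectrum of rho (Eq. (21))."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > 1e-12]          # drop (numerically) zero eigenvalues: 0 log 0 = 0
    return float(-np.sum(xi * np.log(xi)))
```

As noted above, a pure state gives zero entropy, while the maximally mixed state on n vertices gives the maximum value log n.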

2.4. Quantum Jensen–Shannon divergence

The classical Jensen–Shannon divergence is a non-extensive mutual information measure defined between probability distributions over potentially structured data. It is related to the Shannon entropy of the two distributions [33]. Consider two (discrete) probability distributions P = (p_1, …, p_a, …, p_A) and Q = (q_1, …, q_b, …, q_B); then the classical Jensen–Shannon divergence between P and Q is defined as

D_{JS}(P, Q) = H_S\left(\frac{P + Q}{2}\right) - \frac{1}{2} H_S(P) - \frac{1}{2} H_S(Q) \qquad (22)

where H_S(P) = -\sum_{a=1}^{A} p_a \log p_a is the Shannon entropy of distribution P. The classical Jensen–Shannon divergence is always well defined, symmetric, negative definite, and bounded, i.e., 0 ≤ D_JS ≤ 1.

The quantum Jensen–Shannon divergence has recently been developed as a generalization of the classical Jensen–Shannon divergence to quantum states by Lamberti et al. [13]. Given two density operators ρ and σ, the quantum Jensen–Shannon divergence between them is defined as

D_{QJS}(\rho, \sigma) = H_N\left(\frac{\rho + \sigma}{2}\right) - \frac{1}{2} H_N(\rho) - \frac{1}{2} H_N(\sigma) \qquad (23)

The quantum Jensen–Shannon divergence is always well defined, symmetric, negative definite, and bounded, i.e., 0 ≤ D_QJS ≤ 1 [13].
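Eq. (23) is the quantum analogue of Eq. (22), with von Neumann entropies of density matrices in place of Shannon entropies. A minimal sketch (in nats; `qjsd` and `_h` are hypothetical helper names):

```python
import numpy as np

def _h(rho):
    """Von Neumann entropy (Eq. (20)) from the eigenvalues of rho."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > 1e-12]
    return -np.sum(xi * np.log(xi))

def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence of Eq. (23)."""
    return _h((rho + sigma) / 2) - 0.5 * _h(rho) - 0.5 * _h(sigma)
```

For two orthogonal pure states the divergence attains its maximum (ln 2 in nats), and it vanishes when the two states coincide.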

Additionally, the quantum Jensen–Shannon divergence verifies a number of properties which are required for a good distinguishability measure between quantum states [13,18]. This is a problem of key importance in quantum computation, and it requires the definition of a suitable distance measure. Wootters [34] was the first to introduce the concept of statistical distance between pure quantum states. His work is based on extending a distance over the space of probability distributions to the Hilbert space of pure quantum states. Similarly, the relative entropy [35] can be seen as a generalization of the information theoretic Kullback–Leibler divergence. However, the relative entropy is unbounded, it is not symmetric, and it does not satisfy the triangle inequality.

The square root of the QJSD, on the other hand, is bounded, it is a distance, and, as proved by Lamberti et al. [13], it satisfies the triangle inequality. This is formally proved for pure states, while for mixed states there is only empirical evidence. Some alternatives to the QJSD are described in the literature, most notably the Bures distance [36]. Both the Bures distance and the QJSD require the full density matrices to be computed. However, the QJSD turns out to be faster to compute. On the one hand, the Bures distance involves taking the square root of matrices, usually computed through matrix diagonalization, which has time complexity O(n^3), where n is the number of vertices in the graph. On the other hand, the QJSD simply requires computing the eigenvalues of the density matrices, which can be done in O(n^2).

3. A quantum Jensen–Shannon graph kernel

In this section, we propose a novel graph kernel based on the quantum Jensen–Shannon divergence between continuous-time quantum walks on different graphs. Suppose that the graphs under consideration are represented by a set {G_1, …, G_a, …, G_b, …, G_N}. We consider the continuous-time quantum walks on a pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b), whose mixed state density matrices ρ_a and ρ_b can be computed using Eq. (18) or Eq. (19). With the density matrices ρ_a and ρ_b to hand, we compute the quantum Jensen–Shannon divergence D_QJS(ρ_a, ρ_b) using Eq. (23). However, the result depends on the relative order of the vertices in the two graphs, and we need a way to describe the mixed quantum state given by the walk on both graphs. To this end, we consider two strategies, which we refer to as the unaligned density matrix and the aligned density matrix. In the first case, the density matrices are added maintaining the vertex order of the data. This results in the unaligned kernel

k_{QJSU}(G_a, G_b) = \exp(-\mu\, D_{QJS}(\rho_a, \rho_b)) \qquad (24)

where μ is a decay factor in the interval 0 < μ ≤ 1, and

D_{QJS}(\rho_a, \rho_b) = H_N\left(\frac{\rho_a + \rho_b}{2}\right) - \frac{1}{2} H_N(\rho_a) - \frac{1}{2} H_N(\rho_b) \qquad (25)

is the quantum Jensen–Shannon divergence between the mixed state density matrices ρ_a and ρ_b of G_a and G_b, and H_N(·) is the von Neumann entropy of a graph density matrix defined in Eq. (21). Here μ is used to ensure that large divergence values do not dominate the kernel value, and in this work we use μ = 1. The proposed kernel can be viewed as a similarity measure between the quantum states associated with a pair of graphs. Unfortunately, since the unaligned density matrix ignores the vertex correspondence information for a pair of graphs, the kernel is not invariant to permutations of the vertex order. Instead, it relies only on the global properties and long-range interactions between vertices to differentiate between structures.

In the second case the mixing is performed after the densitymatrices are optimally aligned thus computing a lower bound ofthe divergence over the set Σ of state permutations This results inthe aligned kernel

k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\left(-\mu\, D_{QJS}\left(\rho_a, Q \rho_b Q^T\right)\right) = \exp\left(-\mu \min_{Q \in \Sigma} D_{QJS}\left(\rho_a, Q \rho_b Q^T\right)\right) \qquad (26)

Here the alignment step adds correspondence information, allowing for a more fine-grained discrimination between structures.

In both cases, the density matrix of smaller size is padded to the size of the larger one before computing the divergence. Note that this is equivalent to padding the adjacency matrix of the smaller graph to the same size as the larger one, since both these operations simply increase the dimension of the kernel (null space) of the density matrix.
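The unaligned kernel of Eq. (24), including the zero-padding step just described, can be sketched as follows (`k_qjsu` is a hypothetical helper name; the von Neumann entropy is inlined for self-containment):

```python
import numpy as np

def k_qjsu(rho_a, rho_b, mu=1.0):
    """Unaligned kernel of Eq. (24): zero-pad the smaller density matrix
    to the size of the larger one, then exponentiate the negated divergence."""
    n = max(rho_a.shape[0], rho_b.shape[0])
    pad = lambda r: np.pad(r, ((0, n - r.shape[0]), (0, n - r.shape[0])))
    ra, rb = pad(rho_a), pad(rho_b)

    def h(rho):                              # von Neumann entropy (Eq. (21))
        xi = np.linalg.eigvalsh(rho)
        xi = xi[xi > 1e-12]
        return -np.sum(xi * np.log(xi))

    d_qjs = h((ra + rb) / 2) - 0.5 * h(ra) - 0.5 * h(rb)
    return float(np.exp(-mu * d_qjs))
```

Since the divergence is non-negative and zero only for identical states, the kernel value lies in (0, 1] and equals 1 when the two density matrices coincide.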


3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state (ρ_a + ρ_b)/2. However, this is a computationally hard task, and so we perform a two-step approximation to the problem. First, we approximate the solution by minimizing the Frobenius (L2) distance between the matrices rather than the von Neumann entropy. Second, we find an approximate solution to the L2 problem using Umeyama's method [29].

The rationale behind this approximate method is that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

L \rho_a = \left( \sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda P_\lambda^T \right) \left( \sum_{\lambda \in \tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T \right) = \sum_{\lambda \in \tilde{\Lambda}} P_\lambda\, \lambda \rho_0\, P_\lambda^T = \left( \sum_{\lambda \in \tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T \right) \left( \sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda P_\lambda^T \right) = \rho_a L \qquad (27)

where ρ_0 denotes the density matrix for G at time t = 0 and P_\lambda = \sum_{k=1}^{\mu(\lambda)} \phi_{\lambda k} \phi_{\lambda k}^T is the projection operator onto the subspace spanned by the μ(λ) eigenvectors ϕ_λk associated with the eigenvalue λ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix L = ΦΛΦ^T, if we express the density matrix in the eigenvector basis given by Φ, then it assumes a block diagonal form, where each block corresponds to an eigenspace of L associated with a single eigenvalue. As a consequence, if L has all eigenvalues distinct, then ρ_a will have the same eigenvectors as L. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity O(\sum_{\lambda \in \tilde{\Lambda}} \mu(\lambda)^2), where μ(λ) denotes the multiplicity of the eigenvalue λ.

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the Quantum Jensen–Shannon divergence is negative definite, then the unaligned Quantum Jensen–Shannon kernel is positive definite.

Proof. First note that, for the computation of the Quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus, we can consider the computation of the kernel matrix over N graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38], the authors show that if s(G_a, G_b) is a conditionally positive semidefinite kernel, i.e., −s(G_a, G_b) is a negative definite kernel, then k_s = exp(λ s(G_a, G_b)) is positive semidefinite for any λ > 0.

Hence, if the Quantum Jensen–Shannon divergence D_QJS(G_a, G_b) is negative definite, then −D_QJS(G_a, G_b) is conditionally positive semidefinite, and the Quantum Jensen–Shannon kernel k_QJS(G_a, G_b) = exp(−μ D_QJS(G_a, G_b)) is positive definite.

Note that this proof is conditional on the negative definiteness of the Quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently, this is only a conjecture, proved to be true for pure quantum states, and for which there is strong empirical evidence [39].

In general, for the optimal alignment kernel [10,11], we cannot guarantee positive definiteness for the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do, however, have a similar result when using any (possibly non-optimal) set of alignments that satisfies transitivity.

Corollary 3.2. If the Quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned Quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data.

Another important property of a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering. First, it might depend on the order in which the different graphs are presented. Second, it might depend on the order in which the vertices are presented within each graph.

Since the kernel only depends on second-order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will simply result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the Quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second, thus resulting in a kernel that is independent of the order of the graphs but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned Quantum Jensen–Shannon kernel is independent of both graph and vertex order.

Proof. Recall that, given two graphs G_a and G_b with mixed state density matrices ρ_a and ρ_b, the aligned kernel is

k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\left(-\mu\, D_{QJS}\left(\rho_a, Q \rho_b Q^T\right)\right) \qquad (28)

As a result, the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

H_N\left(\frac{\rho_a + Q \rho_b Q^T}{2}\right) = H_N\left(Q\, \frac{Q^T \rho_a Q + \rho_b}{2}\, Q^T\right) = H_N\left(\frac{Q^T \rho_a Q + \rho_b}{2}\right) \qquad (29)

From the above, D_QJS(ρ_a, Qρ_bQ^T) = D_QJS(Q^Tρ_aQ, ρ_b) = D_QJS(ρ_b, Q^Tρ_aQ).

Hence,

k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\left(-\mu\, D_{QJS}\left(\rho_a, Q \rho_b Q^T\right)\right) = \max_{Q \in \Sigma} \exp\left(-\mu\, D_{QJS}\left(\rho_b, Q \rho_a Q^T\right)\right) = k_{QJSA}(G_b, G_a) \qquad (30)

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus, the kernel is independent of the order of the graphs.
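The invariance underlying Eq. (29) is easy to check numerically. The sketch below verifies, for random density matrices and a random permutation, that the divergence is unchanged when the permutation is moved from one argument to the other:

```python
import numpy as np

def h(rho):
    """Von Neumann entropy from the spectrum of rho."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > 1e-12]
    return -np.sum(xi * np.log(xi))

def qjsd(r, s):
    """Quantum Jensen-Shannon divergence (Eq. (23))."""
    return h((r + s) / 2) - 0.5 * h(r) - 0.5 * h(s)

rng = np.random.default_rng(1)

def random_density(n):
    M = rng.normal(size=(n, n))
    rho = M @ M.T                            # symmetric positive semidefinite
    return rho / np.trace(rho)               # unit trace

rho_a, rho_b = random_density(5), random_density(5)
Q = np.eye(5)[rng.permutation(5)]            # random permutation matrix

# Eq. (29): the entropy is invariant to similarity transforms, so the
# divergence can be evaluated with the permutation on either side.
lhs = qjsd(rho_a, Q @ rho_b @ Q.T)
rhs = qjsd(Q.T @ rho_a @ Q, rho_b)
assert abs(lhs - rhs) < 1e-8
```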

The independence of vertex ordering again derives directly from the optimality of the value of the divergence. Let G'_b be a vertex-permuted version of G_b, such that its mixing matrix is ρ'_b = Tρ_bT^T for a permutation matrix T ∈ Σ, and assume that k_QJSA(G_a, G'_b) ≠ k_QJSA(G_a, G_b). Without loss of generality, we can assume k_QJSA(G_a, G'_b) > k_QJSA(G_a, G_b). Let

Q^* = \arg\max_{Q \in \Sigma} \exp\left(-\mu\, D_{QJS}\left(\rho_a, Q \rho'_b Q^T\right)\right) \qquad (31)


then

\exp\left(-\mu\, D_{QJS}\left(\rho_a, Q^* \rho'_b Q^{*T}\right)\right) = \exp\left(-\mu\, D_{QJS}\left(\rho_a, Q^* T \rho_b T^T Q^{*T}\right)\right) > \max_{Q \in \Sigma} \exp\left(-\mu\, D_{QJS}\left(\rho_a, Q \rho_b Q^T\right)\right) \qquad (32)

leading to a contradiction, since Q^*T ∈ Σ.

Note that these properties depend on the optimality of the alignment with respect to the Quantum Jensen–Shannon divergence. We approximate the optimal alignment by the minimization of the Frobenius (L2) norm between the density matrices, which is in turn approximated using Umeyama's algorithm, since the resulting matching problem is still, in general, NP-hard. However, in the proof the optimality was used only as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order-invariant kernel, even if this is suboptimal.

Eq. (29) does not depend on the choice of optimal alignment, and so it would still hold under L2 optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If permutations Q_1 and Q_2 differ by the combination of symmetries present in the two graphs, i.e., Q_1 = SQ_2R with S ∈ Aut(G_a) and R ∈ Aut(G_b), then clearly they will have the same divergence, since ρ_a = Sρ_aS^T and ρ_b = Rρ_bR^T, but this might not characterize all the L2-optimal permutations. Nonetheless, we have the following.

Theorem 3.4. If the L2-optimal permutation is unique modulo isomorphisms of the two graphs, then the L2 approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value over all the optimizers.

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.

3.3. Complexity analysis

For a graph dataset containing N graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on N graphs.

Input: A graph dataset G = {G_1, …, G_a, …, G_b, …, G_N} with N test graphs.
Output: An N × N kernel matrix K_QJS.
1. Compute the density matrix of each graph: for each graph G_a(V_a, E_a), compute the density matrix evolving from time 0 to time T, using the definition in Section 2.2.
2. Compute the mixed density matrix for each pair of graphs: for each pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b) with density matrices ρ_a and ρ_b, compute (ρ_a + ρ_b)/2 using either the unaligned or the aligned procedure.
3. Compute the kernel matrix for the N graphs: for each pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b), compute the (G_a, G_b)-th element of K_QJS as k_QJS(G_a, G_b), using Eq. (24) or Eq. (26).

For N graphs, each of which has n vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity O(Nn^3); and (b) the computation of the aligned and unaligned density matrices for all pairs of graphs, which has time complexities O(N^2 n^4) and O(N^2 n), respectively. (c) As a result, the time complexities of the quantum kernel using the aligned and unaligned density matrices are O(Nn^3 + N^2 n^4) and O(Nn^3 + N^2 n), respectively.
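The unaligned variant of Algorithm 1 can be sketched end-to-end as follows. The degree-proportional initial state is an assumption (as in the earlier sketch), `qjs_kernel_matrix` is a hypothetical helper name, and the aligned variant would insert a permutation step before mixing.

```python
import numpy as np

def qjs_kernel_matrix(graphs, mu=1.0):
    """Unaligned QJS kernel matrix (Algorithm 1) for a list of adjacency
    matrices, with density matrices computed via Eq. (19)."""
    def density(A):                          # step 1: per-graph density matrix
        d = A.sum(axis=1)
        lam, Phi = np.linalg.eigh(np.diag(d) - A)
        psi = Phi.T @ (d / np.linalg.norm(d))
        rho = np.zeros_like(A, dtype=float)
        for v in np.unique(np.round(lam, 10)):
            idx = np.isclose(lam, v)
            u = Phi[:, idx] @ psi[idx]
            rho += np.outer(u, u)
        return rho

    def h(rho):                              # von Neumann entropy
        xi = np.linalg.eigvalsh(rho)
        xi = xi[xi > 1e-12]
        return -np.sum(xi * np.log(xi))

    rhos = [density(np.asarray(A, float)) for A in graphs]
    n = max(r.shape[0] for r in rhos)        # pad all to the largest size
    rhos = [np.pad(r, ((0, n - r.shape[0]),) * 2) for r in rhos]
    N = len(rhos)
    K = np.empty((N, N))
    for a in range(N):                       # steps 2-3: pairwise kernel values
        for b in range(N):
            d = h((rhos[a] + rhos[b]) / 2) - 0.5 * h(rhos[a]) - 0.5 * h(rhos[b])
            K[a, b] = np.exp(-mu * d)
    return K
```

The resulting matrix is symmetric with unit diagonal, since the divergence of a state with itself is zero.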

4. Experimental evaluation

In this section, we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time t varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

4.1. Graph datasets

We show the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision. These datasets are MUTAG, PPIs, PTC(MR), COIL5 and Shock. Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and aims to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex (4 PPIs) and Thermotoga (4 PPIs), from Aquifex aeolicus and Thermotoga maritima respectively; Gram-Positive (52 PPIs), from Staphylococcus aureus; Cyanobacteria (73 PPIs), from Anabaena variabilis; and Proteobacteria (40 PPIs), from Acidovorax avenae. There is an additional class (Acidobacteria, 46 PPIs) which is more controversial in terms of bacterial evolution since its discovery. Here we select the Proteobacteria (40 PPIs) and Acidobacteria (46 PPIs) classes as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

Table 1. Information on the graph-based datasets.

Datasets         MUTAG   PPIs     PTC     COIL5    Shock
Max # vertices   28      232      109     241      33
Min # vertices   10      3        2       72       4
Avg # vertices   17.93   109.60   25.60   144.90   13.16
# graphs         188     86       344     360      150
# classes        2       2        2       5        10

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D


objects. In our experiments, we use the images of the first five objects. For each of these objects, we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs based on the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluations of the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment, we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, …, 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t, we can compute the von Neumann entropy of the corresponding density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets separately. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here, the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup
We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random-walk graph kernel (RWGK) [3]. For our quantum kernel, we let t → ∞. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), …, k(10)) with different subtree heights

Fig. 1. Evaluations of the von Neumann entropy with time t (x-axis: the time t; y-axis: the mean entropy value per class): (a) on the MUTAG dataset (mutagenic aromatic vs. mutagenic heteroaromatic); (b) on the PPIs dataset (Proteobacteria 40 PPIs vs. Acidobacteria 46 PPIs); (c) on the PTC(MR) dataset (first vs. second class); and (d) on the COIL5 dataset (box, onion, ship, tomato and bottle objects).


h (h = 1, 2, …, 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times, and we report the average classification accuracies (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrices of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-core processor (i.e., an i5-3210M). Note that for the WL kernel, the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all 10 matrices (see [5] for details).
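As a sketch of this evaluation protocol, the snippet below uses scikit-learn's C-SVM (which accepts precomputed kernel matrices) in place of the LIBSVM interface used in the paper; `evaluate_kernel` is a hypothetical helper, and the per-dataset parameter tuning and the 10 repetitions are omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_kernel(K, y, folds=10):
    """Cross-validated accuracy of a C-SVM on a precomputed N x N kernel
    matrix K with class labels y; returns (mean accuracy, std)."""
    clf = SVC(kernel="precomputed", C=1.0)   # C would be tuned per dataset
    scores = cross_val_score(clf, K, y, cv=folds)
    return scores.mean(), scores.std()
```

Because the kernel is precomputed, cross-validation slices both the rows and the columns of K, so the same matrix serves for every train/test split.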

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies of all the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel, and outperform those of the other

Table 2. Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG         PPIs          PTC(MR)       COIL5         Shock
QJSU       82.72 ± .44   69.50 ± 1.20  56.70 ± .49   70.11 ± .61   40.60 ± .92
QJSA       82.83 ± .50   73.37 ± 1.04  57.39 ± .46   69.75 ± .58   42.80 ± .86
WL         82.05 ± .57   78.50 ± 1.40  56.05 ± .51   33.16 ± 1.01  36.40 ± 1.00
SPGK       83.38 ± .81   61.12 ± 1.09  56.55 ± .53   69.66 ± .52   37.88 ± .93
JSGK       83.11 ± .80   57.87 ± 1.36  57.29 ± .41   69.13 ± .79   21.73 ± .76
BRWK       77.50 ± .75   53.50 ± 1.47  53.97 ± .31   14.63 ± .21   0.33 ± .37
RWGK       80.77 ± .72   55.00 ± .88   55.91 ± .37   20.80 ± .47   2.26 ± 1.01

Table 3. Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG    PPIs      PTC       COIL5     Shock
QJSU       20″      59″       1′46″     18′20″    14″
QJSA       1′30″    23′25″    16′40″    8h29′     32″
WL         4″       13″       11″       1′5″      3″
SPGK       1″       7″        1″        31″       1″
JSGK       1″       1″        1″        1″        1″
BRWK       2″       14′20″    3″        16′46″    8″
RWGK       46″      1′7″      2′35″     19′40″    23″

Fig. 2. Runtime evaluation (runtime in seconds): (a) QJSA with graph size n; (b) QJSU with graph size n; (c) QJSA with data size N; and (d) QJSU with data size N.


kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies of all the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with, or outperforms, that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSU quantum kernel achieves the highest accuracy, and the accuracy of our QJSA quantum kernel exceeds that of the remaining kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform, or are competitive with, the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is generally better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is generally faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection, we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the graphs involved.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n, and (b) the graph dataset size N.

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel; (b) log runtime vs. 3 log(n) using the QJSU kernel; (c) log runtime vs. 2 log(N) using the QJSA kernel; and (d) log runtime vs. 2 log(N) using the QJSU kernel.


More specifically, we vary n = {10, 20, …, 300} and N = {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs in each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-core processor (i.e., an i5-3210M).

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N, we also plot the log runtime versus 3 log(n) and 2 log(N), respectively: Fig. 3(a) and (c) shows the plots obtained with the QJSA kernel, while Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3, there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales approximately quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computations for evaluating the vertex correspondences.
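The scaling exponents claimed above can be checked empirically by fitting the slope of the corresponding log–log curve. A minimal sketch in Python; the runtime array below is illustrative stand-in data, not the paper's measurements:

```python
import numpy as np

# Hypothetical runtimes (seconds) for increasing graph size n; in a real
# test these would come from timing the kernel computation itself.
n = np.array([50, 100, 150, 200, 250, 300], dtype=float)
runtime = 1e-7 * n**3  # stand-in data exhibiting cubic scaling

# Fit log(runtime) = slope * log(n) + intercept; the slope estimates the
# scaling exponent (close to 3 for cubic, close to 2 for quadratic behaviour).
slope, intercept = np.polyfit(np.log(n), np.log(runtime), 1)
```

The same fit applied to log runtime versus log N would be expected to yield a slope close to 2.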

5. Conclusion

In this paper, we have developed a novel graph kernel by using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix for the mixed state, we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.
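The quantities named above are straightforward to compute from the density matrices. A minimal sketch; the exponential form of the kernel at the end is one plausible choice for illustration, not necessarily the exact kernel function defined in Section 3:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho log rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]  # 0 log 0 is taken to be 0
    return -np.sum(w * np.log(w))

def qjsd(rho1, rho2):
    """Quantum Jensen-Shannon divergence between two density matrices."""
    return von_neumann_entropy((rho1 + rho2) / 2) - \
        (von_neumann_entropy(rho1) + von_neumann_entropy(rho2)) / 2

def kernel(rho1, rho2):
    # Illustrative kernel: one plausible monotone function of the divergence.
    return np.exp(-qjsd(rho1, rho2))
```

For identical density matrices the divergence vanishes and the illustrative kernel attains its maximum value of 1.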

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel by using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for the constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.
[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite W*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure–activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the BSc and MSc degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, PR China. He is currently pursuing the PhD degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximation reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his BSc, MSc and PhD in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his PhD in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings and has been in the program committees of various international conferences and workshops. He was a co-chair of the edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice in 2009. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the BSc, PhD and DSc degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007 and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and the Statistical, Syntactical and Structural Pattern Recognition workshop in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.


reliable structural correspondences between the substructures. This limits the precision of the resulting similarity measure. To overcome this problem, Fröhlich et al. [10] introduced alternative optimal assignment kernels, where each pair of structures is aligned before comparison. However, the introduction of the alignment step results in a kernel that is not positive definite in general [11]. The problem stems from the fact that alignments are not in general transitive. In other words, if s is the vertex-alignment between graph A and graph B, and π is the alignment between graph B and graph C, in general we cannot guarantee that the alignment between graph A and graph C is the composition π∘s. On the other hand, when the alignments are transitive, there is a common simultaneous alignment of all the graphs. Under this alignment, the optimal assignment kernel is simply the sum over all the vertex/edge kernels, and this is positive definite since it is a sum of separate positive definite kernels. While, lacking positive definiteness, the optimal assignment kernels cannot be guaranteed to represent an implicit embedding into a Hilbert space, they have nonetheless proven to be very effective in classifying structures. Another example of alignment-based kernels is the edit-distance-based kernel introduced by Neuhaus and Bunke [12], where the alignments obtained from graph-edit distance are used to guide random walks on the structures being compared.

An attractive alternative way to measure the similarity of a pair of graphs is to use the mutual information and compute the classical Jensen–Shannon divergence [13]. In information theory, the Jensen–Shannon divergence is a dissimilarity measure between probability distributions. It is symmetric, always well defined and bounded [13]. Bai and Hancock [14] have used the divergence to define a Jensen–Shannon kernel for graphs. Here the kernel between a pair of graphs is computed using a nonextensive entropy measure in terms of the classical Jensen–Shannon divergence between probability distributions over the graphs. For a graph, the elements of the probability distribution are computed from the degrees of the corresponding vertices. The entropy associated with the probability distribution of an individual graph can thus be computed directly, without the need to decompose the graph into substructures. Hence, unlike the aforementioned graph kernels, the Jensen–Shannon kernel between a pair of graphs avoids the computationally burdensome task of determining the similarities between all pairs of substructures. Unfortunately, the required composite entropy for the Jensen–Shannon kernel is computed from a product graph formed by a pair of graphs and reflects no correspondence information between pairs of vertices. As a result, the Jensen–Shannon graph kernel lacks correspondence information between the probability distributions over the graphs and thus cannot precisely reflect the similarity between graphs.

There has recently been an increasing interest in quantum computing because of the potential speed-ups over classical algorithms. Examples include Grover's polynomially faster search algorithm [15] and Shor's exponentially faster factorization algorithm [16]. Furthermore, quantum algorithms also offer us a richer structure than their classical counterparts, since they use qubits rather than bits as the basic representational unit [32].

1.1.2. Quantum computation

Quantum systems differ from their classical counterparts since they add the possibility of state entanglement to the classical statistical mixture of classical systems, which results in an exponential increase of the dimensionality of the state-space that is at the basis of the quantum speedup. Pure quantum states are represented as vectors in a complex Hilbert space, while potentially mixed quantum states are represented through the density matrix. Mixed states can then be compared by examining their density matrices. One convenient way to do this is to use the quantum Jensen–Shannon divergence, first introduced by Majtey et al. [13,18]. Unlike the classical Jensen–Shannon divergence, which is defined as a similarity measure between probability distributions, the quantum Jensen–Shannon divergence is defined as a distance measure between mixed quantum states described by density matrices. Moreover, it can be used to measure both the degree of mixing and the entanglement [13,18].

In this paper, we are interested in computing a kernel between pairs of graphs using the quantum Jensen–Shannon divergence between two mixed states representing the evolution of continuous-time quantum walks on the graphs. The continuous-time quantum walk was introduced as the natural quantum analogue of the classical random walk by Farhi and Gutmann [19] and has been widely used in the design of novel quantum algorithms. Ambainis et al. [20,21] were among the first to generalize random walks on finite graphs to the quantum world. Furthermore, they have explored the application of quantum walks on graphs to a number of problems [22]. Childs et al. [23,24] have explored the difference between quantum and classical random walks, and then exploited the increased power of the quantum walk as a general model of computation.

Similar to the classical random walk on a graph, the state space of the continuous-time quantum walk is the set of vertices of the graph. However, unlike the classical random walk, where the state vector is real-valued and the evolution is governed by a doubly stochastic matrix, the state vector of the continuous-time quantum walk is complex-valued and its evolution is governed by a time-varying unitary matrix. The continuous-time quantum walk possesses a number of interesting properties which are not exhibited by the classical random walk. For instance, the continuous-time quantum walk is reversible and non-ergodic, and does not have a limiting distribution. As a result, the continuous-time quantum walk offers us an elegant way to design quantum algorithms on graphs. For further details on quantum computation and quantum algorithms, we refer the reader to the textbook [32].

There have recently been several attempts to define quantum kernels using the continuous-time quantum walk. For instance, Bai et al. [26] have introduced a novel graph kernel where the similarity between two graphs is defined in terms of the similarity between two quantum walks evolving on the two graphs. The basic idea here is to associate with each graph a mixed quantum state representing the time evolution of a quantum walk. The kernel between the walks is defined as the divergence between the corresponding density operators. However, this quantum divergence measure requires the computation of an additional mixed state where the system has equal probability of being in each of the two original quantum states. Since this quantum kernel does not take into account the correspondences between the vertices of the two graphs, it can be shown that it is not permutation invariant. Rossi et al. [27,28] have attempted to overcome this problem by allowing the two quantum walks to evolve on the union of the two graphs. This exploits the relation between the interferences of the continuous-time quantum walks and the symmetries in the structures being compared. Since both mixed states are defined on the same structure, this quantum kernel addresses the shortcoming of permutation variance. However, the resulting approach is less intuitive, thus making it particularly hard to prove the positive semidefiniteness of the kernel. Moreover, the union graph is established by roughly connecting all vertex pairs of the two graphs, and thus also lacks vertex correspondence information. As a result, this kernel cannot reflect the precise similarity between the two graphs.

1.2. Contributions

The aim of this paper is to overcome the shortcomings of the existing graph kernels by defining a novel quantum kernel. To this


end, we develop the work in [26] further. We intend to solve the problems of permutation variance and missing vertex correspondence by introducing an additional alignment step, and thus compute an aligned mixed density matrix. The goal of the alignment is to provide vertex correspondences and a lower bound on the divergence over permutations of the vertices/quantum states. However, we approximate the alignment that minimizes the divergence with that which minimizes the Frobenius norm between the density operators. This latter problem is solved using Umeyama's graph matching method [29]. Since the aligned density matrix is computed by taking into account the vertex correspondences between the two graphs, the density matrix reflects the locational correspondences between the quantum walks over the vertices of the two graphs. As a result, our new quantum kernel can not only overcome the shortcomings of missing vertex correspondence and permutation variance that arise with the existing kernels from the classical/quantum Jensen–Shannon divergence, but also overcome the shortcoming of neglecting locational correspondence between substructures that arises in the existing R-convolution kernels. Furthermore, in contrast to what is done in previous assignment-based kernels, we do not align the structures directly. Rather, we align the density matrices, which are a special case of a Laplacian family signature [30], well known to provide a more robust representation for correspondence estimation. Finally, we adopt the closed form solution of the density matrix introduced in [27,28] to reduce the computational complexity and to avoid the cumbersome task of explicitly simulating the time evolution of the quantum walk. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed graph kernel, providing both a higher classification accuracy than the unaligned kernel and a performance comparable or superior to other state-of-the-art graph kernels.
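The alignment step just described can be sketched with Umeyama's spectral heuristic. The following is a minimal illustration for real symmetric density matrices; using the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) to extract a permutation from the spectral correlation matrix is a common implementation choice, not one prescribed by the text:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_alignment(rho1, rho2):
    """Approximate the permutation P minimizing ||P rho1 P^T - rho2||_F
    via Umeyama's eigendecomposition heuristic."""
    # Eigenvectors (columns) of the two symmetric density matrices.
    _, U1 = np.linalg.eigh(rho1)
    _, U2 = np.linalg.eigh(rho2)
    # Correlation between absolute eigenvector components; the absolute
    # value removes the sign ambiguity of the eigenvectors.
    Q = np.abs(U2) @ np.abs(U1).T
    # Extract the best one-to-one vertex assignment (maximize Q).
    rows, cols = linear_sum_assignment(-Q)
    n = rho1.shape[0]
    P = np.zeros((n, n))
    P[rows, cols] = 1.0
    return P  # rho1 aligned to rho2 is P @ rho1 @ P.T
```

On a pair of density matrices that differ only by a vertex permutation, the heuristic generically recovers that permutation exactly.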

1.3. Paper outline

In Section 2, we introduce the quantum mechanical formalism that will be used to develop the ideas proposed in the paper. We describe how to compute a density matrix for the mixed quantum state of the continuous-time quantum walk on a graph. With the mixed state density matrix to hand, we compute the von Neumann entropy of the continuous-time quantum walk on the graph. In Section 3, we compute a graph kernel using the quantum Jensen–Shannon divergence between the density matrices for a pair of graphs. In Section 4, we provide experimental evaluations which demonstrate the effectiveness of our quantum kernel. In Section 5, we conclude the paper and suggest future research directions.

2. Quantum mechanical background

In this section, we introduce the quantum mechanical formalism to be used in this work. We commence by reviewing the concept of a continuous-time quantum walk on a graph. We then describe how to establish a density matrix for the mixed quantum state of the graph through a quantum walk. With the density matrix of the mixed quantum state to hand, we show how to compute a von Neumann entropy for a graph.

2.1. The continuous-time quantum walk

The continuous-time quantum walk is a natural quantum analogue of the classical random walk [19,31]. Classical random walks can be used to model a diffusion process on a graph and have proven to be a useful tool in the analysis of graph structure. In general, diffusion is modeled as a Markovian process defined over the vertices of the graph, where the transitions take place on the edges. More formally, let $G = (V, E)$ be an undirected graph, where $V$ is a set of $n$ vertices and $E \subseteq V \times V$ is a set of edges. The state vector for the walk at time $t$ is a probability distribution over $V$, i.e., a vector $p_t \in \mathbb{R}^n$ whose $u$th entry gives the probability that the walk is at vertex $u$ at time $t$. Let $A$ denote the symmetric adjacency matrix of the undirected graph $G$, and let $D$ be the diagonal degree matrix with elements $d_u = \sum_{v=1}^{n} A(u,v)$, where $d_u$ is the degree of the vertex $u$. Then the continuous-time random walk on $G$ will evolve according to the equation

$$p_t = e^{-Lt} p_0, \quad (1)$$

where $L = D - A$ is the graph Laplacian.

As in the classical case, the continuous-time quantum walk is defined as a dynamical process over the vertices of the graph. However, in contrast to the classical case, where the state vector is constrained to lie in a probability space, in the quantum case the state of the system is defined through a vector of complex amplitudes over the vertex set $V$. The squared norm of the amplitudes sums to unity over the vertices of the graph, with no restriction on their sign or complex phase. This in turn allows both destructive and constructive interference to take place between the complex amplitudes. Moreover, in the quantum case, the evolution of the walk is governed by a complex valued unitary matrix, whereas the dynamics of the classical random walk is governed by a stochastic matrix. Hence the evolution of the quantum walk is reversible, which implies that quantum walks are non-ergodic and do not possess a limiting distribution. As a result, the behaviour of classical and quantum walks differs significantly. In particular, the quantum walk possesses a number of interesting properties not exhibited in the classical case. One notable consequence of the interference properties of quantum walks is that when a quantum walker backtracks on an edge, it does so with the opposite phase. This gives rise to destructive interference and reduces the problem of tottering. In fact, the faster hitting time observed in quantum walks on the line is a consequence of the destructive interference of backtracking paths [31].
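The classical diffusion of Eq. (1) can be simulated directly with a matrix exponential. A minimal sketch on a hypothetical 4-vertex path graph:

```python
import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a 4-vertex path graph (an illustrative example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))  # diagonal degree matrix
L = D - A                   # graph Laplacian L = D - A

p0 = np.array([1.0, 0.0, 0.0, 0.0])  # walk starts at vertex 0
pt = expm(-L * 0.5) @ p0             # Eq. (1) at t = 0.5
```

Because the rows and columns of $L$ sum to zero, $e^{-Lt}$ preserves the total probability, so `pt` remains a valid distribution at every $t$.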

The Dirac notation, also known as bra-ket notation, is a standard notation used for describing quantum states. We call pure state a quantum state that can be described using a single ket vector $|a\rangle$. In the context of this paper, a ket vector is a complex valued column vector of unit Euclidean length in an $n$-dimensional Hilbert space. Its conjugate transpose is a bra (row) vector, denoted as $\langle a|$. As a result, the inner product between two states $|a\rangle$ and $|b\rangle$ is written as $\langle a|b\rangle$, while their outer product is $|a\rangle\langle b|$. Using the Dirac notation, we denote the basis state corresponding to the walk being at vertex $u \in V$ as $|u\rangle$. In other words, $|u\rangle$ is the vector which is equal to 1 in the position corresponding to the vertex $u$ and is zero everywhere else. A general state of the walk is then expressed as a complex linear combination of these orthonormal basis states, such that the state vector of the walk at time $t$ is defined as

$$|\psi_t\rangle = \sum_{u \in V} \alpha_u(t) |u\rangle, \quad (2)$$

where the amplitude $\alpha_u(t) \in \mathbb{C}$ is subject to the constraint $\sum_{u \in V} \alpha_u(t) \alpha_u^*(t) = 1$ for all $u \in V$, $t \in \mathbb{R}^+$, where $\alpha_u^*(t)$ is the complex conjugate of $\alpha_u(t)$.

At each instant of time, the probability of the walk being at a particular vertex of the graph $G(V, E)$ is given by the squared norm of the amplitude associated with $|\psi_t\rangle$. More formally, let $X_t$ be a random variable giving the location of the walk at time $t$.


The probability of the walk being at vertex $u \in V$ at time $t$ is given by

$$\Pr(X_t = u) = \alpha_u(t) \alpha_u^*(t). \quad (3)$$

More generally, a quantum walker represented by $|\psi_t\rangle$ can be observed to be in any state $|\phi\rangle \in \mathbb{C}^n$, and not just in the states $\{|u\rangle\}_{u \in V}$ corresponding to the graph vertices. Given an orthonormal basis $|\phi_1\rangle, \ldots, |\phi_n\rangle$ of $\mathbb{C}^n$, the quantum walker $|\psi_t\rangle$ is observed in state $|\phi_i\rangle$ with probability $|\langle \phi_i | \psi_t \rangle|^2$. After being observed in state $|\phi_i\rangle$, the new state of the quantum walker is $|\phi_i\rangle$. Further, given the nature of quantum observation, only orthonormal states can be distinguished by a single observation.

In quantum mechanics, the Schrödinger equation is a partial differential equation that describes how quantum systems evolve with time. In the non-relativistic case, it assumes the form

$$i\hbar \frac{\partial}{\partial t} |\psi\rangle = \mathcal{H} |\psi\rangle, \quad (4)$$

where the Hamiltonian operator $\mathcal{H}$ accounts for the total energy of the system. In the case of a particle moving in an empty space with zero potential energy, this reduces to the Laplacian operator $\nabla^2$. Here we take the Hamiltonian of the system to be the graph Laplacian matrix $L$; however, any Hermitian operator encoding the structure of the graph can be chosen as an alternative. The evolution of the walk at time $t$ is then given by the Schrödinger equation

$$\frac{\partial}{\partial t} |\psi_t\rangle = -iL |\psi_t\rangle, \quad (5)$$

where the Laplacian $L$ plays the role of the system Hamiltonian. Given an initial state $|\psi_0\rangle$, the Schrödinger equation has an exponential solution which can be used to determine the state of the walk at a time $t$, giving

$$|\psi_t\rangle = e^{-iLt} |\psi_0\rangle, \quad (6)$$

where $U_t = e^{-iLt}$ is a unitary matrix governing the evolution of the walk. To simulate the evolution of the quantum walk, it is customary to rewrite Eq. (6) in terms of the spectral decomposition $L = \Phi^T \Lambda \Phi$ of the Laplacian matrix $L$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|})$ is a diagonal matrix with the ordered eigenvalues as elements ($\lambda_1 < \lambda_2 < \cdots < \lambda_{|V|}$) and $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_{|V|})$ is a matrix with the corresponding ordered orthonormal eigenvectors as columns. Hence, Eq. (6) can be rewritten as

$$|\psi_t\rangle = \Phi^T e^{-i\Lambda t} \Phi |\psi_0\rangle. \quad (7)$$

In this work, we let $\alpha_u(0)$ be proportional to $d_u$, and using the constraints applied to the amplitudes we get

$$\alpha_u(0) = \sqrt{\frac{d_u}{\sum_v d_v}}. \quad (8)$$

Note that there is no particular indication as to how to choose the initial state. However, choosing a uniform distribution over the vertices of $G$ would result in the system remaining stationary, since the initial state vector would be an eigenvector of the Laplacian of $G$ and thus a stationary state of the walk.
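Under the choices above (Laplacian as Hamiltonian, degree-weighted initial amplitudes), Eqs. (6)-(8) can be sketched as follows; the 4-vertex path graph is an illustrative example. Note that NumPy's `eigh` returns eigenvectors as columns, so the reconstruction reads `Phi @ ... @ Phi.T`:

```python
import numpy as np

# Adjacency matrix of an illustrative 4-vertex path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A  # graph Laplacian, used as the Hamiltonian

# Degree-weighted initial amplitudes, Eq. (8).
psi0 = np.sqrt(d / d.sum())

# Spectral decomposition of L (eigenvectors as columns).
lam, Phi = np.linalg.eigh(L)

def walk_state(t):
    """State of the walk at time t, cf. Eqs. (6)-(7)."""
    return Phi @ (np.exp(-1j * lam * t) * (Phi.T @ psi0))

psi_t = walk_state(1.5)
probs = np.abs(psi_t) ** 2  # vertex occupation probabilities, Eq. (3)
```

Since the evolution is unitary, `probs` sums to 1 at every time $t$.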

2.2. A density matrix from the mixed state

While a pure state can be naturally described using a single ket vector, in general a quantum system can be in a mixed state, i.e., a statistical ensemble of pure quantum states $|\psi_i\rangle$, each with probability $p_i$. The density operator (or density matrix) of such a system is defined as

$$\rho = \sum_i p_i |\psi_i\rangle \langle \psi_i|. \quad (9)$$

Density operators are positive unit-trace matrices directly linked with the observables of the (mixed) quantum system. Let $O$ be an observable, i.e., a Hermitian operator acting on the quantum states and providing a measurement. Without loss of generality, we have $O = \sum_{i=1}^{m} v_i P_i$, where $P_i$ is the orthogonal projector onto the $i$th observation basis and $v_i$ is the measurement value when the quantum state is observed to be in this observation basis.

The expectation value of the measurement over a mixed state can be calculated from the density matrix $\rho$:

$$\langle O \rangle = \mathrm{tr}(\rho O), \quad (10)$$

where $\mathrm{tr}$ is the trace operator. Similarly, the observation probability on the $i$th observation basis can be expressed in terms of the density matrix $\rho$ as

$$\Pr(X = i) = \mathrm{tr}(\rho P_i). \quad (11)$$

Finally, after the measurement, the corresponding density operator will be

$$\rho' = \sum_{i=1}^{m} P_i \rho P_i. \quad (12)$$
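Eqs. (9)-(12) translate directly into matrix operations. A small self-contained sketch with a hypothetical two-state ensemble and a two-outcome observable:

```python
import numpy as np

# A mixed state: ensemble of two pure states with probabilities 0.7 / 0.3.
psi1 = np.array([1.0, 0.0])
psi2 = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.7 * np.outer(psi1, psi1) + 0.3 * np.outer(psi2, psi2)  # Eq. (9)

# An observable O = sum_i v_i P_i in the computational basis.
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # orthogonal projectors
v = [+1.0, -1.0]                                # measurement values
O = v[0] * P[0] + v[1] * P[1]

expectation = np.trace(rho @ O)              # Eq. (10)
probs = [np.trace(rho @ Pi) for Pi in P]     # Eq. (11)
rho_after = sum(Pi @ rho @ Pi for Pi in P)   # Eq. (12)
```

The observation probabilities sum to 1, and the post-measurement operator of Eq. (12) is again a unit-trace density matrix.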

For a graph $G(V,E)$, let $|\psi_t\rangle$ denote the state corresponding to a continuous-time quantum walk that has evolved from time $t=0$ to time $t=T$. We define the time-averaged density matrix $\rho_G^T$ for $G(V,E)$ as

\[ \rho_G^T = \frac{1}{T}\int_0^T |\psi_t\rangle\langle\psi_t|\, dt, \tag{13} \]

which can be expressed in terms of the initial state $|\psi_0\rangle$ as

\[ \rho_G^T = \frac{1}{T}\int_0^T e^{-iLt}\,|\psi_0\rangle\langle\psi_0|\, e^{iLt}\, dt. \tag{14} \]

In other words, $\rho_G^T$ describes a system which has an equal probability of being in any of the pure states defined by the evolution of the quantum walk from $t=0$ to $t=T$. Using the spectral decomposition of the Laplacian, we can rewrite the previous equation as

\[ \rho_G^T = \frac{1}{T}\int_0^T \Phi^T e^{-i\Lambda t}\Phi\, |\psi_0\rangle\langle\psi_0|\, \Phi^T e^{i\Lambda t}\Phi\, dt. \tag{15} \]

Let $\phi_{xy}$ denote the $(x,y)$-th element of the matrix of eigenvectors $\Phi$ of the Laplacian. The $(r,c)$-th element of $\rho_G^T$ can be computed as

\[ \rho_G^T(r,c) = \frac{1}{T}\int_0^T \Bigg(\sum_{k=1}^{n}\sum_{l=1}^{n} \phi_{rk}\, e^{-i\lambda_k t}\, \phi_{lk}\, \psi_{0l}\Bigg) \Bigg(\sum_{x=1}^{n}\sum_{y=1}^{n} \psi_{0x}\, \phi_{xy}\, e^{i\lambda_y t}\, \phi_{cy}\Bigg)\, dt. \tag{16} \]

Let $\bar{\psi}_k = \sum_l \phi_{lk}\psi_{0l}$ and $\bar{\psi}_y = \sum_x \phi_{xy}\psi_{0x}$, where $\psi_{0l}$ denotes the $l$-th element of $|\psi_0\rangle$. Then

\[ \rho_G^T(r,c) = \frac{1}{T}\int_0^T \Bigg(\sum_{k=1}^{n} \phi_{rk}\, e^{-i\lambda_k t}\, \bar{\psi}_k\Bigg) \Bigg(\sum_{y=1}^{n} \phi_{cy}\, e^{i\lambda_y t}\, \bar{\psi}_y\Bigg)\, dt, \tag{17} \]

which can be rewritten as

\[ \rho_G^T(r,c) = \sum_{k=1}^{n}\sum_{y=1}^{n} \phi_{rk}\,\phi_{cy}\,\bar{\psi}_k\,\bar{\psi}_y\, \frac{1}{T}\int_0^T e^{i(\lambda_y - \lambda_k)t}\, dt. \tag{18} \]

If we let $T\to\infty$, Eq. (18) further simplifies to

\[ \rho_G^{\infty}(r,c) = \sum_{\lambda\in\tilde{\Lambda}} \sum_{x\in B_\lambda} \sum_{y\in B_\lambda} \phi_{rx}\,\phi_{cy}\,\bar{\psi}_x\,\bar{\psi}_y, \tag{19} \]

where $\tilde{\Lambda}$ is the set of distinct eigenvalues of the Laplacian matrix $L$, and $B_\lambda$ is a basis of the eigenspace associated with $\lambda$. As a consequence, computing the time-averaged density matrix relies on computing the eigendecomposition of $G(V,E)$, and hence has time complexity $O(|V|^3)$. Note that, in the remainder of this paper, we will simplify our notation by referring to the time-averaged density matrix as the density matrix associated with a graph.
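Under the same assumptions (degree-proportional initial state, graph Laplacian $L = D - A$), the infinite-time average of Eq. (19) can be sketched as follows; the eigenspace-grouping tolerance `tol` is our numerical choice, not from the paper:

```python
import numpy as np

def density_matrix_infty(A, tol=1e-9):
    """Infinite-time averaged density matrix of a continuous-time
    quantum walk (Eq. (19)), from the spectrum of the graph Laplacian."""
    d = A.sum(axis=1).astype(float)
    L = np.diag(d) - A                    # graph Laplacian
    lam, Phi = np.linalg.eigh(L)          # L = Phi diag(lam) Phi^T, lam ascending
    psi0 = np.sqrt(d / d.sum())           # degree-proportional start, Eq. (8)
    psibar = Phi.T @ psi0                 # amplitudes in the eigenbasis
    rho = np.zeros_like(L)
    i = 0
    while i < len(lam):                   # group (numerically) equal eigenvalues
        j = i
        while j < len(lam) and lam[j] - lam[i] < tol:
            j += 1
        v = Phi[:, i:j] @ psibar[i:j]     # projection of the walk onto this eigenspace
        rho += np.outer(v, v)
        i = j
    return rho
```

The result is symmetric, positive semidefinite and has unit trace, as a density matrix must.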

L. Bai et al. / Pattern Recognition (2014)

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

Moreover, we will assume that $\rho_G^T$ is computed for $T\to\infty$, unless otherwise stated, and we will denote it as $\rho_G$.

2.3. The von Neumann entropy of a graph

In quantum mechanics, the von Neumann entropy [32] $H_N$ of a mixture is defined in terms of the trace and logarithm of the density operator $\rho$:

\[ H_N = -\mathrm{tr}(\rho \log \rho) = -\sum_i \xi_i \ln \xi_i, \tag{20} \]

where $\xi_1, \ldots, \xi_n$ are the eigenvalues of $\rho$. Note that if the quantum system is a pure state $|\psi_i\rangle$ with probability $p_i = 1$, then the von Neumann entropy $H_N(\rho) = -\mathrm{tr}(\rho \log \rho)$ is zero. On the other hand, a mixed state generally has a non-zero von Neumann entropy associated with its density operator.

Here we propose to associate to each graph the von Neumann entropy of the density matrix computed as in Eq. (19). Consider a graph $G(V,E)$; its von Neumann entropy is

\[ H_N(\rho_G) = -\mathrm{tr}(\rho_G \log \rho_G) = -\sum_{j=1}^{|V|} \lambda_j^G \log \lambda_j^G, \tag{21} \]

where $\lambda_1^G, \ldots, \lambda_j^G, \ldots, \lambda_{|V|}^G$ are the eigenvalues of $\rho_G$.
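For example, the entropy of Eq. (21) can be computed directly from the spectrum of $\rho$ (a sketch using natural logarithms; the cutoff `eps` for discarding numerically zero eigenvalues is our choice):

```python
import numpy as np

def von_neumann_entropy(rho, eps=1e-12):
    """H_N(rho) = -sum_j xi_j log xi_j over the eigenvalues xi_j of rho,
    as in Eq. (21); zero eigenvalues contribute nothing (0 log 0 := 0)."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]
    return float(-np.sum(xi * np.log(xi)))
```

A pure state has zero entropy, while the maximally mixed state on $n$ levels has entropy $\ln n$.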

2.4. Quantum Jensen–Shannon divergence

The classical Jensen–Shannon divergence is a non-extensive mutual information measure defined between probability distributions over potentially structured data. It is related to the Shannon entropy of the two distributions [33]. Consider two (discrete) probability distributions $P = (p_1,\ldots,p_a,\ldots,p_A)$ and $Q = (q_1,\ldots,q_b,\ldots,q_B)$; then the classical Jensen–Shannon divergence between $P$ and $Q$ is defined as

\[ D_{JS}(P,Q) = H_S\!\left(\frac{P+Q}{2}\right) - \frac{1}{2}H_S(P) - \frac{1}{2}H_S(Q), \tag{22} \]

where $H_S(P) = -\sum_{a=1}^{A} p_a \log p_a$ is the Shannon entropy of distribution $P$. The classical Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., $0 \le D_{JS} \le 1$.

The quantum Jensen–Shannon divergence has recently been developed as a generalization of the classical Jensen–Shannon divergence to quantum states by Lamberti et al. [13]. Given two density operators $\rho$ and $\sigma$, the quantum Jensen–Shannon divergence between them is defined as

\[ D_{QJS}(\rho,\sigma) = H_N\!\left(\frac{\rho+\sigma}{2}\right) - \frac{1}{2}H_N(\rho) - \frac{1}{2}H_N(\sigma). \tag{23} \]

The quantum Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., $0 \le D_{QJS} \le 1$ [13].
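A direct transcription of Eq. (23), with base-2 logarithms so that the divergence lies in $[0,1]$ as stated above (a minimal sketch; the helper names are ours):

```python
import numpy as np

def entropy2(rho, eps=1e-12):
    """Von Neumann entropy of a density matrix, with base-2 logarithms."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]
    return float(-np.sum(xi * np.log2(xi)))

def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence of two density matrices, Eq. (23)."""
    return entropy2(0.5 * (rho + sigma)) - 0.5 * entropy2(rho) - 0.5 * entropy2(sigma)
```

Identical states give divergence 0, while two orthogonal pure states attain the upper bound 1.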

Additionally, the quantum Jensen–Shannon divergence verifies a number of properties which are required for a good distinguishability measure between quantum states [13,18]. This is a problem of key importance in quantum computation, and it requires the definition of a suitable distance measure. Wootters [34] was the first to introduce the concept of statistical distance between pure quantum states. His work is based on extending a distance over the space of probability distributions to the Hilbert space of pure quantum states. Similarly, the relative entropy [35] can be seen as a generalization of the information theoretic Kullback–Leibler divergence. However, the relative entropy is unbounded, it is not symmetric, and it does not satisfy the triangle inequality.

The square root of the QJSD, on the other hand, is bounded, it is a distance and, as proved by Lamberti et al. [13], it satisfies the triangle inequality. This is formally proved for pure states, while for mixed states there is only empirical evidence. Some alternatives to the QJSD are described in the literature, most notably the Bures distance [36]. Both the Bures distance and the QJSD require the full density matrices to be computed. However, the QJSD turns out to be faster to compute. On the one hand, the Bures distance involves taking the square root of matrices, usually computed through matrix diagonalization, which has time complexity $O(n^3)$, where $n$ is the number of vertices in the graph. On the other hand, the QJSD simply requires computing the eigenvalues of the density matrices, which can be done in $O(n^2)$.

3. A quantum Jensen–Shannon graph kernel

In this section, we propose a novel graph kernel based on the quantum Jensen–Shannon divergence between continuous-time quantum walks on different graphs. Suppose that the graphs under consideration are represented by a set $\{G_1,\ldots,G_a,\ldots,G_b,\ldots,G_N\}$. We consider the continuous-time quantum walks on a pair of graphs $G_a(V_a,E_a)$ and $G_b(V_b,E_b)$, whose mixed state density matrices $\rho_a$ and $\rho_b$ can be computed using Eq. (18) or Eq. (19). With the density matrices $\rho_a$ and $\rho_b$ to hand, we compute the quantum Jensen–Shannon divergence $D_{QJS}(\rho_a,\rho_b)$ using Eq. (23). However, the result depends on the relative order of the vertices in the two graphs, and we need a way to describe the mixed quantum state given by the walk on both graphs. To this end, we consider two strategies, which we refer to as the unaligned density matrix and the aligned density matrix. In the first case, the density matrices are added maintaining the vertex order of the data. This results in the unaligned kernel

\[ k_{QJSU}(G_a,G_b) = \exp\!\big(-\mu\, D_{QJS}(\rho_a,\rho_b)\big), \tag{24} \]

where $\mu$ is a decay factor in the interval $0 < \mu \le 1$ and

\[ D_{QJS}(\rho_a,\rho_b) = H_N\!\left(\frac{\rho_a+\rho_b}{2}\right) - \frac{1}{2}H_N(\rho_a) - \frac{1}{2}H_N(\rho_b) \tag{25} \]

is the quantum Jensen–Shannon divergence between the mixed state density matrices $\rho_a$ and $\rho_b$ of $G_a$ and $G_b$, and $H_N(\cdot)$ is the von Neumann entropy of a graph density matrix defined in Eq. (21). Here, $\mu$ is used to ensure that large divergence values do not dominate the kernel value; in this work we use $\mu = 1$. The proposed kernel can be viewed as a similarity measure between the quantum states associated with a pair of graphs. Unfortunately, since the unaligned density matrix ignores the vertex correspondence information for a pair of graphs, it is not invariant to permutations of the vertex order. Instead, it relies only on the global properties and long-range interactions between vertices to differentiate between structures.

In the second case, the mixing is performed after the density matrices are optimally aligned, thus computing a lower bound of the divergence over the set $\Sigma$ of state permutations. This results in the aligned kernel

\[ k_{QJSA}(G_a,G_b) = \max_{Q\in\Sigma}\, \exp\!\big(-\mu\, D_{QJS}(\rho_a, Q\rho_b Q^T)\big) = \exp\!\Big(-\mu \min_{Q\in\Sigma} D_{QJS}(\rho_a, Q\rho_b Q^T)\Big). \tag{26} \]

Here, the alignment step adds correspondence information, allowing for a more fine-grained discrimination between structures.

In both cases, the density matrix of smaller size is padded to the size of the larger one before computing the divergence. Note that this is equivalent to padding the adjacency matrix of the smaller graph to the same size as the larger one, since both these operations simply increase the dimension of the kernel (null space) of the density matrix.
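The padding step and the unaligned kernel of Eq. (24) can be sketched end-to-end as follows (a self-contained sketch with $\mu = 1$ as used in this work; the helper names are ours):

```python
import numpy as np

def _entropy(rho, eps=1e-12):
    """Base-2 von Neumann entropy from the eigenvalues of rho."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]
    return float(-np.sum(xi * np.log2(xi)))

def pad(rho, n):
    """Zero-pad a density matrix to n x n.  This only enlarges the null
    space, so the trace and the non-zero spectrum are unchanged."""
    out = np.zeros((n, n))
    m = rho.shape[0]
    out[:m, :m] = rho
    return out

def k_qjsu(rho_a, rho_b, mu=1.0):
    """Unaligned kernel of Eq. (24): exp(-mu * D_QJS(rho_a, rho_b))."""
    n = max(rho_a.shape[0], rho_b.shape[0])
    ra, rb = pad(rho_a, n), pad(rho_b, n)
    d = _entropy(0.5 * (ra + rb)) - 0.5 * _entropy(ra) - 0.5 * _entropy(rb)
    return float(np.exp(-mu * d))
```

A graph compared with itself gives kernel value 1, and all values lie in $(0, 1]$.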


3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state $(\rho_a+\rho_b)/2$. However, this is a computationally hard task, and so we perform a two-step approximation to the problem. First, we approximate the solution by minimizing the Frobenius (L2) distance between the matrices, rather than the von Neumann entropy. Second, we find an approximate solution to the L2 problem using Umeyama's method [29].

The rationale behind this approximate method is given by the fact that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

\[ L\rho_a = \Bigg(\sum_{\lambda\in\tilde{\Lambda}} \lambda P_\lambda P_\lambda^T\Bigg) \Bigg(\sum_{\lambda\in\tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T\Bigg) = \sum_{\lambda\in\tilde{\Lambda}} P_\lambda\, \lambda\, \rho_0\, P_\lambda^T = \Bigg(\sum_{\lambda\in\tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T\Bigg) \Bigg(\sum_{\lambda\in\tilde{\Lambda}} \lambda P_\lambda P_\lambda^T\Bigg) = \rho_a L, \tag{27} \]

where $\rho_0$ denotes the density matrix for $G$ at time $t=0$ and $P_\lambda = \sum_{k=1}^{\mu(\lambda)} \phi_{\lambda k}\phi_{\lambda k}^T$ is the projection operator onto the subspace spanned by the $\mu(\lambda)$ eigenvectors $\phi_{\lambda k}$ associated with the eigenvalue $\lambda$ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix $L = \Phi\Lambda\Phi^T$, if we express the density matrix in the eigenvector basis given by $\Phi$, then it assumes a block diagonal form, where each block corresponds to an eigenspace of $L$ associated with a single eigenvalue. As a consequence, if $L$ has all eigenvalues distinct, then $\rho_a$ will have the same spectrum as $L$. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity $O\big(\sum_{\lambda\in\tilde{\Lambda}} \mu(\lambda)^2\big)$, where $\mu(\lambda)$ denotes the multiplicity of the eigenvalue $\lambda$.
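A plausible sketch of the Umeyama-style spectral alignment used here is the absolute-eigenvector heuristic with a Hungarian assignment (SciPy is assumed; ties from degenerate spectra are not handled, and the function name is ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_permutation(rho_a, rho_b):
    """Umeyama-style approximation of the permutation Q minimising the
    Frobenius distance ||rho_a - Q rho_b Q^T||.

    Rows of |U| act as spectral signatures of the vertices; the Hungarian
    algorithm then picks the vertex-to-vertex assignment maximising the
    total signature similarity."""
    _, Ua = np.linalg.eigh(rho_a)
    _, Ub = np.linalg.eigh(rho_b)
    S = np.abs(Ua) @ np.abs(Ub).T          # vertex signature similarities
    rows, cols = linear_sum_assignment(S, maximize=True)
    Q = np.zeros_like(rho_a)
    Q[rows, cols] = 1.0                    # Q rho_b Q^T re-orders rho_b toward rho_a
    return Q
```

For two matrices that are exact permutations of one another with non-degenerate spectra, this recovers the permutation despite eigenvector sign ambiguities.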

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the quantum Jensen–Shannon divergence is negative definite, then the unaligned quantum Jensen–Shannon kernel is positive definite.

Proof. First, note that for the computation of the quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus, we can consider the computation of the kernel matrix over $N$ graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38] the authors show that if $s(G_a,G_b)$ is a conditionally positive semidefinite kernel, i.e., $-s(G_a,G_b)$ is a negative definite kernel, then $k_s = \exp(\lambda s(G_a,G_b))$ is positive semidefinite for any $\lambda > 0$.

Hence, if the quantum Jensen–Shannon divergence $D_{QJS}(G_a,G_b)$ is negative definite, then $-D_{QJS}(G_a,G_b)$ is conditionally positive semidefinite, and the quantum Jensen–Shannon kernel $k_{QJS}(G_a,G_b) = \exp(-\mu D_{QJS}(G_a,G_b))$ is positive definite. □

Note that this proof is conditioned on the negative definiteness of the quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently, this is only a conjecture, proved to be true for pure quantum states, and for which there is strong empirical evidence [39].

In general, for the optimal alignment kernel [10,11], we cannot guarantee positive definiteness for the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do have, however, a similar result when using any (possibly non-optimal) set of alignments that satisfy transitivity.

Corollary 3.2. If the quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data. □

Another important property of a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering. First, it might depend on the order in which different graphs are presented. Second, it might depend on the order in which the vertices are presented in each graph.

Since the kernel only depends on second order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will just result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second condition, thus resulting in a kernel that is independent of the order of the graphs but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned quantum Jensen–Shannon kernel is independent of both the graph order and the vertex order.

Proof. Recall that, given two graphs $G_a$ and $G_b$ with mixed state density matrices $\rho_a$ and $\rho_b$, the aligned kernel is

\[ k_{QJSA}(G_a,G_b) = \max_{Q\in\Sigma}\, \exp\!\big(-\mu\, D_{QJS}(\rho_a, Q\rho_b Q^T)\big). \tag{28} \]

As a result, the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

\[ H_N\!\left(\frac{\rho_a + Q\rho_b Q^T}{2}\right) = H_N\!\left(Q\,\frac{Q^T\rho_a Q + \rho_b}{2}\,Q^T\right) = H_N\!\left(\frac{Q^T\rho_a Q + \rho_b}{2}\right). \tag{29} \]

From the above, $D_{QJS}(\rho_a, Q\rho_b Q^T) = D_{QJS}(Q^T\rho_a Q, \rho_b) = D_{QJS}(\rho_b, Q^T\rho_a Q)$.

Hence,

\[ k_{QJSA}(G_a,G_b) = \max_{Q\in\Sigma}\, \exp\!\big(-\mu D_{QJS}(\rho_a, Q\rho_b Q^T)\big) = \max_{Q\in\Sigma}\, \exp\!\big(-\mu D_{QJS}(\rho_b, Q\rho_a Q^T)\big) = k_{QJSA}(G_b,G_a). \tag{30} \]

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus, the kernel is independent of the order of the graphs.

The independence of the vertex ordering again derives directly from the optimality of the value of the divergence. Let $G'_b$ be a vertex-permuted version of $G_b$, such that its mixing matrix is $\rho'_b = T\rho_b T^T$ for a permutation matrix $T\in\Sigma$, and assume that $k_{QJSA}(G_a,G'_b) \ne k_{QJSA}(G_a,G_b)$. Without loss of generality, we can assume $k_{QJSA}(G_a,G'_b) > k_{QJSA}(G_a,G_b)$. Let

\[ Q^* = \arg\max_{Q\in\Sigma}\, \exp\!\big(-\mu\, D_{QJS}(\rho_a, Q\rho'_b Q^T)\big), \tag{31} \]


then

\[ \exp\!\big(-\mu\, D_{QJS}(\rho_a, Q^*\rho'_b Q^{*T})\big) = \exp\!\big(-\mu\, D_{QJS}(\rho_a, Q^*T\rho_b T^T Q^{*T})\big) > \max_{Q\in\Sigma}\, \exp\!\big(-\mu\, D_{QJS}(\rho_a, Q\rho_b Q^T)\big), \tag{32} \]

leading to a contradiction. □

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by the minimization of the Frobenius (L2) norm between the density matrices. This, in turn, is approximated using Umeyama's algorithm, since the resulting matching problem is still, in general, NP-hard. However, in the proof the optimality was used as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order-invariant kernel, even if this is suboptimal.

Eq. (29) does not depend on the choice of optimal alignment, and so it would still hold under L2 optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If permutations $Q_1$ and $Q_2$ differ by the combination of symmetries present in the two graphs, i.e., $Q_1 = SQ_2R$ with $S\in\mathrm{Aut}(G_a)$ and $R\in\mathrm{Aut}(G_b)$, then clearly they will have the same divergence, since $\rho_a = S\rho_a S^T$ and $\rho_b = R\rho_b R^T$; but that might not characterize all the L2 optimal permutations. Nonetheless, we have the following.

Theorem 3.4. If the L2 optimal permutation is unique modulo isomorphisms of the two graphs, then the L2 approximation is independent of the order of the graphs.

Proof. Derives directly from the uniqueness of the kernel value for all the optimizers. □

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.

3.3. Complexity analysis

For a graph dataset containing $N$ graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on $N$ graphs.

Input: A graph dataset $G = \{G_1,\ldots,G_a,\ldots,G_b,\ldots,G_N\}$ with $N$ test graphs.
Output: An $N \times N$ kernel matrix $K_{QJS}$.
1. Computing the density matrices of the graphs: for each graph $G_a(V_a,E_a)$, compute the density matrix of the walk evolving from time 0 to time $T$, using the definition in Section 2.2.
2. Computing the mixed density matrix for each pair of graphs: for each pair of graphs $G_a(V_a,E_a)$ and $G_b(V_b,E_b)$ with density matrices $\rho_a$ and $\rho_b$, compute $(\rho_a+\rho_b)/2$ using the unaligned or the aligned procedure.
3. Computing the kernel matrix for the $N$ graphs: compute the kernel matrix $K_{QJS}$; for each pair of graphs $G_a(V_a,E_a)$ and $G_b(V_b,E_b)$, compute the $(G_a,G_b)$-th element of $K_{QJS}$ as $k_{QJS}(G_a,G_b)$, using Eq. (24) or Eq. (26).

For $N$ graphs, each of which has $n$ vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity $O(Nn^3)$; (b) the computation of the aligned and the unaligned density matrices for all pairs of graphs, which has time complexities $O(N^2n^4)$ and $O(N^2n)$, respectively; (c) as a result, the overall time complexities of the quantum kernel using the aligned and the unaligned density matrices are $O(Nn^3+N^2n^4)$ and $O(Nn^3+N^2n)$, respectively.
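Putting the pieces together, the unaligned variant of Algorithm 1 can be sketched as a self-contained NumPy routine (all function names are ours; $\mu = 1$ as used in this work):

```python
import numpy as np

def density_matrix(A, tol=1e-9):
    """Infinite-time averaged density matrix (Eq. (19)) with the
    degree-proportional initial state of Eq. (8)."""
    d = A.sum(axis=1).astype(float)
    L = np.diag(d) - A
    lam, Phi = np.linalg.eigh(L)
    psibar = Phi.T @ np.sqrt(d / d.sum())
    rho = np.zeros_like(L)
    i = 0
    while i < len(lam):                   # one projector per distinct eigenvalue
        j = i
        while j < len(lam) and lam[j] - lam[i] < tol:
            j += 1
        v = Phi[:, i:j] @ psibar[i:j]
        rho += np.outer(v, v)
        i = j
    return rho

def entropy(rho, eps=1e-12):
    """Base-2 von Neumann entropy (Eq. (21))."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]
    return float(-np.sum(xi * np.log2(xi)))

def qjsu_kernel_matrix(graphs, mu=1.0):
    """Step 1: density matrices (padded to a common size);
    steps 2-3: pairwise QJSD and kernel values (Eq. (24))."""
    n = max(A.shape[0] for A in graphs)
    rhos = []
    for A in graphs:
        r = np.zeros((n, n))
        m = A.shape[0]
        r[:m, :m] = density_matrix(A)
        rhos.append(r)
    N = len(graphs)
    K = np.zeros((N, N))
    for a in range(N):
        for b in range(a, N):
            d = entropy(0.5 * (rhos[a] + rhos[b])) \
                - 0.5 * entropy(rhos[a]) - 0.5 * entropy(rhos[b])
            K[a, b] = K[b, a] = np.exp(-mu * d)
    return K
```

The resulting kernel matrix is symmetric with a unit diagonal, since the divergence of a graph with itself is zero.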

4. Experimental evaluation

In this section, we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time $t$ varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

4.1. Graph datasets

We show the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision. These datasets include MUTAG, PPIs, PTC(MR), COIL5 and Shock. Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and the aim is to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex4 and Thermotoga4 (PPIs from Aquifex aeolicus and Thermotoga maritima, respectively), Gram-Positive52 (PPIs from Staphylococcus aureus), Cyanobacteria73 (PPIs from Anabaena variabilis) and Proteobacteria40 (PPIs from Acidovorax avenae). There is an additional class (Acidobacteria46 PPIs) which is more controversial in terms of the bacterial evolution since they were discovered. Here, we select the Proteobacteria40 PPIs and the Acidobacteria46 PPIs as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

Table 1. Information on the graph-based datasets.

Datasets      MUTAG  PPIs    PTC    COIL5   Shock
Max vertices  28     232     109    241     33
Min vertices  10     3       2      72      4
Avg vertices  17.93  109.60  25.60  144.90  13.16
# graphs      188    86      344    360     150
# classes     2      2       2      5       10

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D objects. In our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs based on the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluations on the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time $t$ varies. In this experiment, we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with $t = 0, 1, 2, \ldots, 50$. As the walk evolves from time $t=0$ to time $t=50$, we compute the corresponding density matrix using Eq. (15). Thus, at each time $t$, we can compute the von Neumann entropy of the corresponding density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets separately. The x-axis shows the time $t$ from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here, the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup
We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random walk graph kernel (RWGK) [3]. For our quantum kernel, we let $t\to\infty$. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism test to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., $k^{(1)}, k^{(2)}, \ldots, k^{(10)}$) with different subtree heights

[Fig. 1: four panels of mean von Neumann entropy versus time $t$ (0–50), one curve per class: (a) MUTAG (mutagenic aromatic / mutagenic heteroaromatic), (b) PPIs (Proteobacteria40 / Acidobacteria46), (c) PTC(MR) (first class / second class), (d) COIL5 (box / onion / ship / tomato / bottle objects).]

Fig. 1. Evaluations on the von Neumann entropy with time t: (a) on the MUTAG dataset, (b) on the PPIs dataset, (c) on the PTC(MR) dataset and (d) on the COIL5 dataset.


$h$ ($h = 1, 2, \ldots, 10$), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times, and we report the average classification accuracies (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrices of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210m). Note that, for the WL kernel, the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies for all of the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet, the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel, and outperform those of the other

Table 2. Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG         PPIs           PTC(MR)       COIL5          Shock

QJSU      82.72 ± .44   69.50 ± 1.20   56.70 ± .49   70.11 ± .61    40.60 ± .92
QJSA      82.83 ± .50   73.37 ± 1.04   57.39 ± .46   69.75 ± .58    42.80 ± .86
WL        82.05 ± .57   78.50 ± 1.40   56.05 ± .51   33.16 ± 1.01   36.40 ± 1.00
SPGK      83.38 ± .81   61.12 ± 1.09   56.55 ± .53   69.66 ± .52    37.88 ± .93
JSGK      83.11 ± .80   57.87 ± 1.36   57.29 ± .41   69.13 ± .79    21.73 ± .76
BRWK      77.50 ± .75   53.50 ± 1.47   53.97 ± .31   14.63 ± .21    0.33 ± .37
RWGK      80.77 ± .72   55.00 ± .88    55.91 ± .37   20.80 ± .47    2.26 ± 1.01

Table 3. Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG   PPIs     PTC      COIL5    Shock

QJSU      20″     59″      1′46″    18′20″   14″
QJSA      1′30″   23′25″   16′40″   8h29′    32″
WL        4″      13″      11″      1′05″    3″
SPGK      1″      7″       1″       31″      1″
JSGK      1″      1″       1″       1″       1″
BRWK      2″      14′20″   3″       16′46″   8″
RWGK      46″     1′07″    2′35″    19′40″   23″

[Fig. 2: four panels of runtime in seconds, versus graph size $n$ (panels a, b) and data size $N$ (panels c, d).]

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N.


kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies for all of the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with, or outperforms, that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform, or are competitive with, the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to the vertex order.

In terms of runtime, all the kernels complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but higher than that of the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

44 Computational evaluation

In this subsection we evaluate the relationship between thecomputational overheads (ie the CPU runtime) of our kernel andthe structural complexity or the number of the associated graphs

4.4.1. Experimental setup

We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n and (b) the graph dataset size N.

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel, (b) log runtime vs. 3 log(n) using the QJSU kernel, (c) log runtime vs. 2 log(N) using the QJSA kernel, and (d) log runtime vs. 2 log(N) using the QJSU kernel.

L. Bai et al. / Pattern Recognition ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

More specifically, we vary n = {10, 20, …, 300} and N = {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs from each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-core processor (i.e., an i5-3210M).

4.4.2. Experimental results

Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the results obtained with the QJSA kernel, and Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales approximately quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.
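As a quick sanity check on this kind of argument, polynomial scaling appears as a straight line when the log runtime is regressed on the log parameter, which is why Fig. 3 plots log runtime against 3 log(n) and 2 log(N). A minimal sketch with synthetic cubic timings (illustrative data of our own, not the measurements behind Fig. 2):

```python
import numpy as np

# Synthetic runtimes following t(n) = c * n^3 (illustration only):
# regressing log t on log n should recover a slope close to 3.
n = np.array([10.0, 20.0, 40.0, 80.0, 160.0, 300.0])
runtime = 1e-6 * n ** 3
slope = np.polyfit(np.log(n), np.log(runtime), 1)[0]
```

The same fit on measured timings gives an empirical estimate of the scaling exponent.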

Furthermore, we observe that the QJSU kernel is slightly more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computation for evaluating the vertex correspondences.

5. Conclusion

In this paper, we have developed a novel graph kernel using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix for the mixed state, we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for their constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: International Conference on Machine Learning, 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of ACM Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of ACM Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.


[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite W*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of Structural, Syntactic, and Statistical Pattern Recognition – Joint IAPR International Workshop (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the BSc and MSc degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, PR China. He is currently pursuing the PhD degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing, and approximate reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his BSc, MSc, and PhD in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science, and computer vision.

Andrea Torsello received his PhD in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently, he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings and has been on the program committees of various international conferences and workshops. He was a co-chair of the 2009 edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice. Since November 2007, he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the BSc, PhD, and DSc degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007, and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology, and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and for Statistical, Syntactical and Structural Pattern Recognition in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.


end, we develop the work in [26] further. We intend to solve the problems of permutation variance and missing vertex correspondence by introducing an additional alignment step, and thus compute an aligned mixed density matrix. The goal of the alignment is to provide vertex correspondences and a lower bound on the divergence over permutations of the vertices/quantum states. However, we approximate the alignment that minimizes the divergence with the one that minimizes the Frobenius norm between the density operators. This latter problem is solved using Umeyama's graph matching method [29]. Since the aligned density matrix is computed by taking into account the vertex correspondences between the two graphs, the density matrix reflects the locational correspondences between the quantum walks over the vertices of the two graphs. As a result, our new quantum kernel can not only overcome the shortcomings of missing vertex correspondence and permutation variance that arise with the existing kernels from the classical/quantum Jensen–Shannon divergence, but also overcome the shortcoming of neglecting the locational correspondence between substructures that arises in the existing R-convolution kernels. Furthermore, in contrast to what is done in previous assignment-based kernels, we do not align the structures directly. Rather, we align the density matrices, which are a special case of a Laplacian family signature [30], well known to provide a more robust representation for correspondence estimation. Finally, we adopt the closed-form solution of the density matrix introduced in [27,28] to reduce the computational complexity and to avoid the cumbersome task of explicitly simulating the time evolution of the quantum walk. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed graph kernel, which provides both a higher classification accuracy than the unaligned kernel and a performance comparable or superior to other state-of-the-art graph kernels.

1.3. Paper outline

In Section 2 we introduce the quantum mechanical formalism that will be used to develop the ideas proposed in the paper. We describe how to compute a density matrix for the mixed quantum state of the continuous-time quantum walk on a graph. With the mixed-state density matrix to hand, we compute the von Neumann entropy of the continuous-time quantum walk on the graph. In Section 3 we compute a graph kernel using the quantum Jensen–Shannon divergence between the density matrices for a pair of graphs. In Section 4 we provide experimental evaluations which demonstrate the effectiveness of our quantum kernel. In Section 5 we conclude the paper and suggest future research directions.

2. Quantum mechanical background

In this section we introduce the quantum mechanical formalism to be used in this work. We commence by reviewing the concept of a continuous-time quantum walk on a graph. We then describe how to establish a density matrix for the mixed quantum state of the graph through a quantum walk. With the density matrix of the mixed quantum state to hand, we show how to compute the von Neumann entropy of a graph.

2.1. The continuous-time quantum walk

The continuous-time quantum walk is a natural quantum analogue of the classical random walk [19,31]. Classical random walks can be used to model a diffusion process on a graph and have proven to be a useful tool in the analysis of graph structure. In general, diffusion is modeled as a Markovian process defined over the vertices of the graph, where the transitions take place on the edges. More formally, let $G = (V, E)$ be an undirected graph, where $V$ is a set of $n$ vertices and $E \subseteq V \times V$ is a set of edges. The state vector for the walk at time $t$ is a probability distribution over $V$, i.e., a vector $p_t \in \mathbb{R}^n$ whose $u$th entry gives the probability that the walk is at vertex $u$ at time $t$. Let $A$ denote the symmetric adjacency matrix of the undirected graph $G$, and let $D$ be the diagonal degree matrix with elements $d_u = \sum_{v=1}^{n} A(u,v)$, where $d_u$ is the degree of the vertex $u$. Then the continuous-time random walk on $G$ will evolve according to the equation

$$ p_t = e^{-Lt} p_0 \quad (1) $$

where $L = D - A$ is the graph Laplacian.

As in the classical case, the continuous-time quantum walk is defined as a dynamical process over the vertices of the graph. However, in contrast to the classical case, where the state vector is constrained to lie in a probability space, in the quantum case the state of the system is defined through a vector of complex amplitudes over the vertex set $V$. The squared norms of the amplitudes sum to unity over the vertices of the graph, with no restriction on their sign or complex phase. This in turn allows both destructive and constructive interference to take place between the complex amplitudes. Moreover, in the quantum case the evolution of the walk is governed by a complex-valued unitary matrix, whereas the dynamics of the classical random walk is governed by a stochastic matrix. Hence, the evolution of the quantum walk is reversible, which implies that quantum walks are non-ergodic and do not possess a limiting distribution. As a result, the behaviour of classical and quantum walks differs significantly. In particular, the quantum walk possesses a number of interesting properties not exhibited in the classical case. One notable consequence of the interference properties of quantum walks is that when a quantum walker backtracks on an edge, it does so with the opposite phase. This gives rise to destructive interference and reduces the problem of tottering. In fact, the faster hitting time observed in quantum walks on the line is a consequence of the destructive interference of backtracking paths [31].
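The classical diffusion of Eq. (1) takes only a few lines to simulate; a minimal sketch on an illustrative 4-cycle of our own choosing (not one of the paper's datasets):

```python
import numpy as np
from scipy.linalg import expm

# Continuous-time classical random walk, Eq. (1): p_t = exp(-Lt) p_0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # 4-cycle adjacency matrix
D = np.diag(A.sum(axis=1))
L = D - A                                   # graph Laplacian L = D - A
p0 = np.array([1.0, 0.0, 0.0, 0.0])         # walker starts at vertex 0
p_t = expm(-L * 0.5) @ p0                   # distribution at t = 0.5
```

Because $e^{-Lt}$ is the heat kernel, `p_t` remains a valid probability distribution for all $t$.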

The Dirac notation, also known as bra-ket notation, is a standard notation used for describing quantum states. We call a quantum state that can be described using a single ket vector $|a\rangle$ a pure state. In the context of this paper, a ket vector is a complex-valued column vector of unit Euclidean length in an $n$-dimensional Hilbert space. Its conjugate transpose is a bra (row) vector, denoted as $\langle a|$. As a result, the inner product between two states $|a\rangle$ and $|b\rangle$ is written as $\langle a|b\rangle$, while their outer product is $|a\rangle\langle b|$. Using the Dirac notation, we denote the basis state corresponding to the walk being at vertex $u \in V$ as $|u\rangle$. In other words, $|u\rangle$ is the vector which is equal to 1 in the position corresponding to the vertex $u$ and zero everywhere else. A general state of the walk is then expressed as a complex linear combination of these orthonormal basis states, such that the state vector of the walk at time $t$ is defined as

$$ |\psi_t\rangle = \sum_{u \in V} \alpha_u(t) |u\rangle \quad (2) $$

where the amplitude $\alpha_u(t) \in \mathbb{C}$ is subject to the constraint $\sum_{u \in V} \alpha_u(t) \alpha_u^*(t) = 1$ for all $t \in \mathbb{R}^+$, where $\alpha_u^*(t)$ is the complex conjugate of $\alpha_u(t)$.

At each instant of time, the probability of the walk being at a particular vertex of the graph $G(V,E)$ is given by the squared norm of the amplitude associated with $|\psi_t\rangle$. More formally, let $X_t$ be a random variable giving the location of the walk at time $t$.


The probability of the walk being at vertex $u \in V$ at time $t$ is given by

$$ \Pr(X_t = u) = \alpha_u(t) \alpha_u^*(t) \quad (3) $$

More generally, a quantum walker represented by $|\psi_t\rangle$ can be observed to be in any state $|\phi\rangle \in \mathbb{C}^n$, and not just in the states $\{|u\rangle\}_{u \in V}$ corresponding to the graph vertices. Given an orthonormal basis $|\phi_1\rangle, \ldots, |\phi_n\rangle$ of $\mathbb{C}^n$, the quantum walker $|\psi_t\rangle$ is observed in state $|\phi_i\rangle$ with probability $|\langle \phi_i | \psi_t \rangle|^2$. After being observed in state $|\phi_i\rangle$, the new state of the quantum walker is $|\phi_i\rangle$. Further, given the nature of quantum observation, only orthonormal states can be distinguished by a single observation.

In quantum mechanics, the Schrödinger equation is a partial differential equation that describes how quantum systems evolve with time. In the non-relativistic case, it assumes the form

$$ i\hbar \frac{\partial}{\partial t} |\psi\rangle = H |\psi\rangle \quad (4) $$

where the Hamiltonian operator $H$ accounts for the total energy of the system. In the case of a particle moving in empty space with zero potential energy, this reduces to the Laplacian operator $\nabla^2$. Here we take the Hamiltonian of the system to be the graph Laplacian matrix $L$; however, any Hermitian operator encoding the structure of the graph can be chosen as an alternative. The evolution of the walk at time $t$ is then given by the Schrödinger equation

$$ \frac{\partial}{\partial t} |\psi_t\rangle = -iL |\psi_t\rangle \quad (5) $$

where the Laplacian $L$ plays the role of the system Hamiltonian. Given an initial state $|\psi_0\rangle$, the Schrödinger equation has an exponential solution which can be used to determine the state of the walk at time $t$, giving

$$ |\psi_t\rangle = e^{-iLt} |\psi_0\rangle \quad (6) $$

where $U_t = e^{-iLt}$ is a unitary matrix governing the evolution of the walk. To simulate the evolution of the quantum walk, it is customary to rewrite Eq. (6) in terms of the spectral decomposition $L = \Phi^T \Lambda \Phi$ of the Laplacian matrix $L$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|})$ is a diagonal matrix with the ordered eigenvalues as elements ($\lambda_1 < \lambda_2 < \cdots < \lambda_{|V|}$) and $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_{|V|})$ is a matrix with the corresponding ordered orthonormal eigenvectors as columns. Hence, Eq. (6) can be rewritten as

$$ |\psi_t\rangle = \Phi^T e^{-i\Lambda t} \Phi |\psi_0\rangle \quad (7) $$

In this work, we let $\alpha_u(0)$ be proportional to $d_u$, and using the constraints applied to the amplitudes we get

$$ \alpha_u(0) = \sqrt{\frac{d_u}{\sum_{v} d_v}} \quad (8) $$

Note that there is no particular indication as to how to choose the initial state. However, choosing a uniform distribution over the vertices of $G$ would result in the system remaining stationary, since the initial state vector would be an eigenvector of the Laplacian of $G$, and thus a stationary state of the walk.
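Eqs. (6)–(8) can be simulated directly from the Laplacian eigendecomposition. A minimal sketch on a small path graph (a toy example of our own); note that NumPy's `eigh` returns eigenvectors as the columns of `Phi`, so the propagator is applied as `Phi @ diag(exp(-i*lam*t)) @ Phi.T`:

```python
import numpy as np

# Continuous-time quantum walk, Eqs. (6)-(8), on a 4-vertex path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A
lam, Phi = np.linalg.eigh(L)          # eigenvalues and eigenvectors of L

psi0 = np.sqrt(d / d.sum())           # initial amplitudes, Eq. (8)
t = 1.2
# Apply the unitary propagator e^{-iLt} in the eigenbasis, Eq. (7):
psi_t = Phi @ (np.exp(-1j * lam * t) * (Phi.T @ psi0))

prob = np.abs(psi_t) ** 2             # vertex probabilities, Eq. (3)
```

Since the evolution is unitary, `prob` sums to one at every time $t$.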

2.2. A density matrix from the mixed state

While a pure state can be naturally described using a single ket vector, in general a quantum system can be in a mixed state, i.e., a statistical ensemble of pure quantum states $|\psi_i\rangle$, each with probability $p_i$. The density operator (or density matrix) of such a system is defined as

$$ \rho = \sum_i p_i |\psi_i\rangle \langle \psi_i| \quad (9) $$

Density operators are positive semi-definite, unit-trace matrices directly linked with the observables of the (mixed) quantum system.

Let $O$ be an observable, i.e., a Hermitian operator acting on the quantum states and providing a measurement. Without loss of generality, we have $O = \sum_{i=1}^{m} v_i P_i$, where $P_i$ is the orthogonal projector onto the $i$th observation basis and $v_i$ is the measurement value when the quantum state is observed to be in this observation basis.

The expectation value of the measurement over a mixed state can be calculated from the density matrix $\rho$:

$$ \langle O \rangle = \mathrm{tr}(\rho O) \quad (10) $$

where $\mathrm{tr}$ is the trace operator. Similarly, the observation probability on the $i$th observation basis can be expressed in terms of the density matrix $\rho$ as

$$ \Pr(X = i) = \mathrm{tr}(\rho P_i) \quad (11) $$

Finally, after the measurement, the corresponding density operator will be

$$ \rho' = \sum_{i=1}^{m} P_i \rho P_i \quad (12) $$
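Eqs. (10)–(12) translate directly into matrix arithmetic; a toy two-dimensional example of our own (a diagonal density matrix measured in the computational basis):

```python
import numpy as np

# Measurement on a mixed state, Eqs. (10)-(12).
rho = np.array([[0.7, 0.0],
                [0.0, 0.3]])                 # density matrix, trace 1
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])      # projector onto basis state 0
P1 = np.eye(2) - P0                          # projector onto basis state 1
O = 1.0 * P0 + (-1.0) * P1                   # observable with outcomes +1, -1

expectation = np.trace(rho @ O)              # <O> = tr(rho O),   Eq. (10)
p0 = np.trace(rho @ P0)                      # Pr(X = 0) = tr(rho P0), Eq. (11)
rho_after = P0 @ rho @ P0 + P1 @ rho @ P1    # post-measurement state, Eq. (12)
```

Here the expectation is $0.7 \cdot (+1) + 0.3 \cdot (-1) = 0.4$, and the post-measurement state keeps unit trace.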

For a graph $G(V,E)$, let $|\psi_t\rangle$ denote the state corresponding to a continuous-time quantum walk that has evolved from time $t = 0$ to time $t = T$. We define the time-averaged density matrix $\rho_G^T$ for $G(V,E)$ as

$$ \rho_G^T = \frac{1}{T} \int_0^T |\psi_t\rangle \langle \psi_t| \, dt \quad (13) $$

which can be expressed in terms of the initial state $|\psi_0\rangle$ as

$$ \rho_G^T = \frac{1}{T} \int_0^T e^{-iLt} |\psi_0\rangle \langle \psi_0| e^{iLt} \, dt \quad (14) $$

In other words, $\rho_G^T$ describes a system which has an equal probability of being in any of the pure states defined by the evolution of the quantum walk from $t = 0$ to $t = T$. Using the spectral decomposition of the Laplacian, we can rewrite the previous equation as

$$ \rho_G^T = \frac{1}{T} \int_0^T \Phi^T e^{-i\Lambda t} \Phi |\psi_0\rangle \langle \psi_0| \Phi^T e^{i\Lambda t} \Phi \, dt \quad (15) $$

Let $\phi_{xy}$ denote the $(x,y)$th element of the matrix of eigenvectors $\Phi$ of the Laplacian. The $(r,c)$th element of $\rho_G^T$ can be computed as

$$ \rho_G^T(r,c) = \frac{1}{T} \int_0^T \left( \sum_{k=1}^{n} \sum_{l=1}^{n} \phi_{rk} e^{-i\lambda_k t} \phi_{lk} \psi_{0l} \right) \left( \sum_{x=1}^{n} \sum_{y=1}^{n} \psi_{0x} \phi_{xy} e^{i\lambda_y t} \phi_{cy} \right) dt \quad (16) $$

Let $\bar{\psi}_k = \sum_l \phi_{lk} \psi_{0l}$ and $\bar{\psi}_y = \sum_x \phi_{xy} \psi_{0x}$, where $\psi_{0l}$ denotes the $l$th element of $|\psi_0\rangle$. Then

$$ \rho_G^T(r,c) = \frac{1}{T} \int_0^T \left( \sum_{k=1}^{n} \phi_{rk} e^{-i\lambda_k t} \bar{\psi}_k \right) \left( \sum_{y=1}^{n} \phi_{cy} e^{i\lambda_y t} \bar{\psi}_y \right) dt \quad (17) $$

which can be rewritten as

$$ \rho_G^T(r,c) = \sum_{k=1}^{n} \sum_{y=1}^{n} \phi_{rk} \phi_{cy} \bar{\psi}_k \bar{\psi}_y \, \frac{1}{T} \int_0^T e^{i(\lambda_y - \lambda_k)t} \, dt \quad (18) $$

If we let $T \to \infty$, Eq. (18) further simplifies to

$$ \rho_G^\infty(r,c) = \sum_{\lambda \in \tilde{\Lambda}} \sum_{x \in B_\lambda} \sum_{y \in B_\lambda} \phi_{rx} \phi_{cy} \bar{\psi}_x \bar{\psi}_y \quad (19) $$

where $\tilde{\Lambda}$ is the set of distinct eigenvalues of the Laplacian matrix $L$ and $B_\lambda$ is a basis of the eigenspace associated with $\lambda$. As a consequence, computing the time-averaged density matrix relies on computing the eigendecomposition of $G(V,E)$ and hence has time complexity $O(|V|^3)$. Note that in the remainder of this paper we will simplify our notation by referring to the time-averaged density matrix as the density matrix associated with a graph.


Moreover, we will assume that $\rho_G^T$ is computed for $T \to \infty$, unless otherwise stated, and we will denote it as $\rho_G$.
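The closed form of Eq. (19) amounts to projecting $|\psi_0\rangle$ onto each distinct eigenspace of $L$ and summing the outer products of the projections (cross terms between different eigenvalues average to zero). A hedged sketch, where the function name and the triangle-graph example are our own:

```python
import numpy as np

# Long-time average density matrix of a CTQW, Eq. (19).
def density_matrix(A):
    d = A.sum(axis=1)
    L = np.diag(d) - A
    lam, Phi = np.linalg.eigh(L)
    psi0 = np.sqrt(d / d.sum())            # initial state, Eq. (8)
    c = Phi.T @ psi0                       # coefficients in the eigenbasis
    rho = np.zeros((len(d), len(d)))
    for ev in np.unique(np.round(lam, 10)):        # distinct eigenvalues
        idx = np.where(np.isclose(lam, ev))[0]
        v = Phi[:, idx] @ c[idx]           # projection of psi0 onto eigenspace
        rho += np.outer(v, v)              # per-eigenspace outer product
    return rho

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # triangle
rho = density_matrix(A)
```

The result is symmetric, positive semi-definite, and has unit trace, as a density matrix must.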

2.3. The von Neumann entropy of a graph

In quantum mechanics, the von Neumann entropy [32] $H_N$ of a mixture is defined in terms of the trace and logarithm of the density operator $\rho$:

$$ H_N = -\mathrm{tr}(\rho \log \rho) = -\sum_i \xi_i \ln \xi_i \quad (20) $$

where $\xi_1, \ldots, \xi_n$ are the eigenvalues of $\rho$. Note that if the quantum system is a pure state $|\psi_i\rangle$ with probability $p_i = 1$, then the von Neumann entropy $H_N(\rho) = -\mathrm{tr}(\rho \log \rho)$ is zero. On the other hand, a mixed state generally has a non-zero von Neumann entropy associated with its density operator.

Here we propose to associate with each graph the von Neumann entropy of the density matrix computed as in Eq. (19). Consider a graph $G(V,E)$; its von Neumann entropy is

$$ H_N(\rho_G) = -\mathrm{tr}(\rho_G \log \rho_G) = -\sum_{j=1}^{|V|} \lambda_j^G \log \lambda_j^G \quad (21) $$

where $\lambda_1^G, \ldots, \lambda_j^G, \ldots, \lambda_{|V|}^G$ are the eigenvalues of $\rho_G$.
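Eqs. (20) and (21) translate into a few lines of code; a small sketch (the helper name is ours) checking the two limiting cases mentioned above, a pure state and a maximally mixed state:

```python
import numpy as np

# Von Neumann entropy, Eq. (21): H_N(rho) = -sum_j xi_j ln xi_j,
# over the eigenvalues xi_j of the density matrix rho.
def von_neumann_entropy(rho):
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > 1e-12]                # 0 * log(0) is taken to be 0
    return float(-np.sum(xi * np.log(xi)))

pure = np.outer([1.0, 0.0], [1.0, 0.0])   # pure state: zero entropy
mixed = np.eye(2) / 2                     # maximally mixed: entropy ln(2)
```
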

2.4. Quantum Jensen–Shannon divergence

The classical Jensen–Shannon divergence is a non-extensive mutual information measure defined between probability distributions over potentially structured data, and is related to the Shannon entropy of the two distributions [33]. Consider two (discrete) probability distributions $P = (p_1, \ldots, p_a, \ldots, p_A)$ and $Q = (q_1, \ldots, q_b, \ldots, q_B)$; the classical Jensen–Shannon divergence between $P$ and $Q$ is defined as

$$ D_{JS}(P, Q) = H_S\left(\frac{P + Q}{2}\right) - \frac{1}{2} H_S(P) - \frac{1}{2} H_S(Q) \quad (22) $$

where $H_S(P) = -\sum_{a=1}^{A} p_a \log p_a$ is the Shannon entropy of the distribution $P$. The classical Jensen–Shannon divergence is always well defined, symmetric, negative definite, and bounded, i.e., $0 \le D_{JS} \le 1$.

The quantum Jensen–Shannon divergence has recently been developed as a generalization of the classical Jensen–Shannon divergence to quantum states by Lamberti et al. [13]. Given two density operators $\rho$ and $\sigma$, the quantum Jensen–Shannon divergence between them is defined as

$$ D_{QJS}(\rho, \sigma) = H_N\left(\frac{\rho + \sigma}{2}\right) - \frac{1}{2} H_N(\rho) - \frac{1}{2} H_N(\sigma) \quad (23) $$

The quantum Jensen–Shannon divergence is always well defined, symmetric, negative definite, and bounded, i.e., $0 \le D_{QJS} \le 1$ [13].

Additionally, the quantum Jensen–Shannon divergence satisfies a number of properties which are required of a good distinguishability measure between quantum states [13,18]. This is a problem of key importance in quantum computation, and it requires the definition of a suitable distance measure. Wootters [34] was the first to introduce the concept of statistical distance between pure quantum states. His work is based on extending a distance over the space of probability distributions to the Hilbert space of pure quantum states. Similarly, the relative entropy [35] can be seen as a generalization of the information-theoretic Kullback–Leibler divergence. However, the relative entropy is unbounded, it is not symmetric, and it does not satisfy the triangle inequality.

The square root of the QJSD, on the other hand, is bounded and is a distance; as proved by Lamberti et al. [13], it satisfies the triangle inequality. This is formally proved for pure states, while for mixed states there is only empirical evidence. Some alternatives to the QJSD are described in the literature, most notably the Bures distance [36]. Both the Bures distance and the QJSD require the full density matrices to be computed. However, the QJSD turns out to be faster to compute: the Bures distance involves taking the square root of matrices, usually computed through matrix diagonalization, which has time complexity $O(n^3)$, where $n$ is the number of vertices in the graph, whereas the QJSD simply requires computing the eigenvalues of the density matrices.

3. A quantum Jensen–Shannon graph kernel

In this section we propose a novel graph kernel based on thequantum JensenndashShannon divergence between continuous-timequantumwalks on different graphs Suppose that the graphs underconsideration are represented by a set fGhellipGahellipGbhellipGNg weconsider the continuous-time quantum walks on a pair of graphsGaethVa EaTHORN and GbethVb EbTHORN whose mixed state density matrices ρaand ρb can be computed using Eq (18) or Eq (19) With the densitymatrices ρa and ρb to hand we compute the quantum JensenndashShannon divergence DQJSethρa ρbTHORN using Eq (23) However the resultdepends on the relative order of the vertices in the two graphsand we need a way to describe the mixed quantum state given bythe walk on both graphs To this end we consider two strategiesWe refer to these as the unaligned density matrix and the aligneddensity matrix In the first case the density matrices are addedmaintaining the vertex order of the data This results in theunaligned kernel

$k_{QJSU}(G_a, G_b) = \exp\left(-\mu D_{QJS}(\rho_a, \rho_b)\right)$   (24)

where μ is a decay factor in the interval 0 < μ ≤ 1 and

$D_{QJS}(\rho_a, \rho_b) = H_N\left(\frac{\rho_a + \rho_b}{2}\right) - \frac{1}{2}H_N(\rho_a) - \frac{1}{2}H_N(\rho_b)$   (25)

is the quantum Jensen–Shannon divergence between the mixed state density matrices ρ_a and ρ_b of G_a and G_b, and H_N(·) is the von Neumann entropy of a graph density matrix, as defined in Eq. (21). Here μ is used to ensure that large divergence values do not dominate the kernel value; in this work we use μ = 1. The proposed kernel can be viewed as a similarity measure between the quantum states associated with a pair of graphs. Unfortunately, since the unaligned density matrix ignores the vertex correspondence information between a pair of graphs, the resulting kernel is not invariant to permutations of the vertex order. Instead, it relies only on the global properties and long-range interactions between vertices to differentiate between structures.
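The computation in Eqs. (24) and (25) can be sketched as follows, assuming the density matrices are already to hand. This is an illustrative sketch rather than the authors' implementation; the von Neumann entropy is obtained from the eigenvalues of the density matrix, and the smaller matrix is zero-padded as described below.

```python
import numpy as np

def von_neumann_entropy(rho):
    """H_N(rho) = -sum_i lambda_i log2(lambda_i) over the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]          # 0 log 0 is taken to be 0
    return float(-np.sum(lam * np.log2(lam)))

def qjsd(rho_a, rho_b):
    """Quantum Jensen-Shannon divergence, Eq. (25)."""
    return von_neumann_entropy((rho_a + rho_b) / 2) \
        - 0.5 * von_neumann_entropy(rho_a) - 0.5 * von_neumann_entropy(rho_b)

def k_qjsu(rho_a, rho_b, mu=1.0):
    """Unaligned kernel, Eq. (24); the smaller matrix is zero-padded first."""
    n = max(rho_a.shape[0], rho_b.shape[0])
    pad = lambda r: np.pad(r, (0, n - r.shape[0]))
    return float(np.exp(-mu * qjsd(pad(rho_a), pad(rho_b))))
```

For identical states the divergence vanishes and the kernel value is 1, while for orthogonal pure states the divergence attains its maximum of 1 bit.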

In the second case, the mixing is performed after the density matrices are optimally aligned, thus computing a lower bound of the divergence over the set Σ of state permutations. This results in the aligned kernel

$k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\left(-\mu D_{QJS}(\rho_a, Q\rho_b Q^T)\right) = \exp\left(-\mu \min_{Q \in \Sigma} D_{QJS}(\rho_a, Q\rho_b Q^T)\right)$   (26)

Here the alignment step adds correspondence information, allowing for a more fine-grained discrimination between structures.

In both cases, the density matrix of smaller size is padded to the size of the larger one before computing the divergence. Note that this is equivalent to padding the adjacency matrix of the smaller graph to the same size as the larger one, since both these operations simply increase the dimension of the kernel of the density matrix.

L. Bai et al. / Pattern Recognition ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state (ρ_a + ρ_b)/2. However, this is a computationally hard task, and so we perform a two-step approximation to the problem. First, we approximate the solution by minimizing the Frobenius (L2) distance between the matrices rather than the von Neumann entropy. Second, we find an approximate solution to the L2 problem using Umeyama's method [29].
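Umeyama's spectral heuristic for the L2 alignment step can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the permutation is recovered from the correlation |U_a||U_b|ᵀ of the absolute eigenvector matrices, and the final assignment is done here with a simple greedy matching in place of the optimal (Hungarian) assignment used in the original method.

```python
import numpy as np

def umeyama_alignment(rho_a, rho_b):
    """Approximate the permutation Q minimizing ||rho_a - Q rho_b Q^T||_F
    (Umeyama, 1988) for equal-size symmetric matrices. The assignment step
    is a greedy stand-in for the optimal (Hungarian) assignment."""
    _, u_a = np.linalg.eigh(rho_a)
    _, u_b = np.linalg.eigh(rho_b)
    corr = np.abs(u_a) @ np.abs(u_b).T      # vertex-to-vertex affinity
    n = corr.shape[0]
    q = np.zeros((n, n))
    work = corr.copy()
    for _ in range(n):                      # greedily pick the strongest match
        i, j = np.unravel_index(np.argmax(work), work.shape)
        q[i, j] = 1.0
        work[i, :] = -np.inf                # row and column are now used up
        work[:, j] = -np.inf
    return q                                # Q rho_b Q^T is aligned to rho_a
```

When the two matrices differ only by a vertex permutation and the eigenvalues are distinct, this recovers the permutation exactly; in general it is only a heuristic, consistent with the approximation discussion below.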

The rationale behind this approximate method is that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

$L\rho_a = \left(\sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda\right)\left(\sum_{\lambda \in \tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T\right) = \sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda \rho_0 P_\lambda^T = \left(\sum_{\lambda \in \tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T\right)\left(\sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda\right) = \rho_a L$   (27)

where ρ₀ denotes the density matrix for G at time t = 0 and $P_\lambda = \sum_{k=1}^{\mu(\lambda)} \phi_{\lambda k}\phi_{\lambda k}^T$ is the projection operator onto the subspace spanned by the μ(λ) eigenvectors φ_{λk} associated with the eigenvalue λ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix L = ΦΛΦᵀ, if we express the density matrix in the eigenvector basis given by Φ, then it assumes a block diagonal form, where each block corresponds to the eigenspace of L associated with a single eigenvalue. As a consequence, if all the eigenvalues of L are distinct, then ρ_a is diagonal in this basis and shares its eigenvectors with L. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity of O(Σ_{λ∈Λ̃} μ(λ)²), where μ(λ) denotes the multiplicity of the eigenvalue λ.
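The commutation property in Eq. (27) can be checked numerically. The sketch below is an illustration only: the graph (a 4-vertex cycle, chosen because its Laplacian has a repeated eigenvalue) and the initial state ρ₀ are hypothetical; the projectors P_λ are built from the Laplacian eigenvectors grouped by eigenvalue.

```python
import numpy as np

# Laplacian of the 4-vertex cycle graph C4, which has a repeated eigenvalue
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# hypothetical initial pure state rho_0 = |psi_0><psi_0|
psi0 = np.array([1.0, 2.0, 3.0, 4.0])
psi0 /= np.linalg.norm(psi0)
rho0 = np.outer(psi0, psi0)

# group the eigenvectors of L by (rounded) eigenvalue, building each projector P_lambda
lam, phi = np.linalg.eigh(L)
projectors = {}
for k, l in enumerate(np.round(lam, 8)):
    projectors[l] = projectors.get(l, 0.0) + np.outer(phi[:, k], phi[:, k])

# rho_a = sum_lambda P_lambda rho_0 P_lambda commutes with L, as in Eq. (27)
rho_a = sum(P @ rho0 @ P for P in projectors.values())
assert np.allclose(L @ rho_a, rho_a @ L)
```

In the eigenvector basis of L the matrix ρ_a is block diagonal, with one block per distinct eigenvalue, which is what makes the subsequent spectral alignment cheap.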

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the quantum Jensen–Shannon divergence is negative definite, then the unaligned quantum Jensen–Shannon kernel is positive definite.

Proof. First, note that for the computation of the quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus, we can consider the computation of the kernel matrix over N graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38] the authors show that if −s(G_a, G_b) is a conditionally positive semidefinite kernel, i.e., s(G_a, G_b) is a negative definite kernel, then k_s = exp(−λ s(G_a, G_b)) is positive semidefinite for any λ > 0.

Hence, if the quantum Jensen–Shannon divergence D_QJS(G_a, G_b) is negative definite, then −D_QJS(G_a, G_b) is conditionally positive semidefinite, and the quantum Jensen–Shannon kernel k_QJS(G_a, G_b) = exp(−μ D_QJS(G_a, G_b)) is positive definite. □

Note that this proof is conditioned on the negative definiteness of the quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently this is only a conjecture, proved to be true for pure quantum states, and for which there is strong empirical evidence [39].
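Since the negative definiteness of the divergence is only conjectured, positive semidefiniteness of the kernel can at least be checked empirically on a given set of states by inspecting the spectrum of the Gram matrix. The sketch below does this for randomly generated mixed states of equal size (an illustrative check, not part of the authors' experiments).

```python
import numpy as np

def entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def qjsd(ra, rb):
    return entropy((ra + rb) / 2) - 0.5 * entropy(ra) - 0.5 * entropy(rb)

rng = np.random.default_rng(1)

def random_density(n=5):
    """Random full-rank mixed state: rho = X X^T / tr(X X^T)."""
    x = rng.normal(size=(n, n))
    s = x @ x.T
    return s / np.trace(s)

states = [random_density() for _ in range(10)]
K = np.array([[np.exp(-qjsd(ra, rb)) for rb in states] for ra in states])
min_eig = np.linalg.eigvalsh(K).min()   # empirically >= 0 up to rounding error
```

A nonnegative smallest eigenvalue on such samples is consistent with, but of course does not prove, the conjectured positive definiteness.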

In general, for the optimal assignment kernel [10,11], we cannot guarantee positive definiteness for the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do have, however, a similar result when using any (possibly non-optimal) set of alignments that satisfy transitivity.

Corollary 3.2. If the quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data. □

Another important property for a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering. First, it might depend on the order in which the different graphs are presented. Second, it might depend on the order in which the vertices are presented within each graph.

Since the kernel only depends on second order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will simply result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second condition, thus resulting in a kernel that is independent of the order of the graphs but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned quantum Jensen–Shannon kernel is independent of both the graph order and the vertex order.

Proof. Recall that, given two graphs G_a and G_b with mixed state density matrices ρ_a and ρ_b, the aligned kernel is

$k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\left(-\mu D_{QJS}(\rho_a, Q\rho_b Q^T)\right)$   (28)

As a result, the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

$H_N\left(\frac{\rho_a + Q\rho_b Q^T}{2}\right) = H_N\left(Q\,\frac{Q^T\rho_a Q + \rho_b}{2}\,Q^T\right) = H_N\left(\frac{Q^T\rho_a Q + \rho_b}{2}\right)$   (29)

From the above, $D_{QJS}(\rho_a, Q\rho_b Q^T) = D_{QJS}(Q^T\rho_a Q, \rho_b) = D_{QJS}(\rho_b, Q^T\rho_a Q)$.

Hence

$k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\left(-\mu D_{QJS}(\rho_a, Q\rho_b Q^T)\right) = \max_{Q \in \Sigma} \exp\left(-\mu D_{QJS}(\rho_b, Q\rho_a Q^T)\right) = k_{QJSA}(G_b, G_a)$   (30)

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus the kernel is independent of the order of the graphs.

The independence of the vertex ordering again derives directly from the optimality of the value of the divergence. Let G′_b be a vertex-permuted version of G_b, such that its mixing matrix is ρ′_b = Tρ_bTᵀ for a permutation matrix T ∈ Σ, and assume that k_QJSA(G_a, G′_b) ≠ k_QJSA(G_a, G_b). Without loss of generality, we can assume k_QJSA(G_a, G′_b) > k_QJSA(G_a, G_b). Let

$Q^* = \arg\max_{Q \in \Sigma} \exp\left(-\mu D_{QJS}(\rho_a, Q\rho'_b Q^T)\right)$   (31)



then

$\exp\left(-\mu D_{QJS}(\rho_a, Q^*\rho'_b Q^{*T})\right) = \exp\left(-\mu D_{QJS}(\rho_a, Q^*T\rho_b T^T Q^{*T})\right) > \max_{Q \in \Sigma} \exp\left(-\mu D_{QJS}(\rho_a, Q\rho_b Q^T)\right)$   (32)

leading to a contradiction. □

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by the minimization of the Frobenius (L2) norm between the density matrices. This, in turn, is approximated using Umeyama's algorithm, since the resulting matching problem is still, in general, NP-hard. However, in the proof the optimality was used as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order invariant kernel, even if this is suboptimal.

Eq. (29) does not depend on the choice of the optimal alignment, and so it would still hold under L2 optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If the permutations Q₁ and Q₂ differ by a combination of the symmetries present in the two graphs, i.e., Q₁ = SQ₂R with S ∈ Aut(G_a) and R ∈ Aut(G_b), then clearly they will have the same divergence, since ρ_a = Sρ_aSᵀ and ρ_b = Rρ_bRᵀ, but this might not characterize all the L2 optimal permutations. Nonetheless, we have the following.

Theorem 3.4. If the L2 optimal permutation is unique modulo isomorphisms of the two graphs, then the L2 approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value over all the optimizers. □

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.

3.3. Complexity analysis

For a graph dataset containing N graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on N graphs.

Input: A graph dataset G = {G₁, ..., G_a, ..., G_b, ..., G_N} with N test graphs.
Output: An N × N kernel matrix K_QJS.
1. Compute the density matrix of each graph: for each graph G_a(V_a, E_a), compute the density matrix evolving from time 0 to time T, using Eq. (19).
2. Compute the mixed density matrix for each pair of graphs: for each pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b) with density matrices ρ_a and ρ_b, compute (ρ_a + ρ_b)/2 using the unaligned or the aligned procedure.
3. Compute the kernel matrix for the N graphs: compute the kernel matrix K_QJS, whose (G_a, G_b)-th element is k_QJS(G_a, G_b), using Eq. (24) or Eq. (26).

For N graphs, each of which has n vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1.

These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity O(Nn³); (b) the computation of the aligned and the unaligned density matrices for all pairs of graphs, which has time complexities O(N²n⁴) and O(N²n), respectively; (c) as a result, the overall time complexities of the quantum kernel using the aligned and the unaligned density matrices are O(Nn³ + N²n⁴) and O(Nn³ + N²n), respectively.
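The structure of Algorithm 1 for the unaligned variant can be sketched as follows, organised so that the O(n³) eigendecomposition and the per-graph entropies are computed once per graph rather than once per pair. The `density_matrix` routine below is a hypothetical stand-in for Eq. (19), which lies outside this excerpt: it builds the time-averaged state Σ_λ P_λ ρ₀ P_λ from Eq. (27), with initial amplitudes taken proportional to the vertex degrees (an assumption).

```python
import numpy as np

def entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def density_matrix(adj):
    """Illustrative stand-in for step 1 of Algorithm 1 (not Eq. (19) itself):
    the time-averaged density matrix sum_lambda P_lambda rho_0 P_lambda over
    the Laplacian eigenspaces, with degree-proportional initial amplitudes."""
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj
    lam, phi = np.linalg.eigh(L)            # O(n^3), done once per graph
    psi0 = deg / np.linalg.norm(deg)
    rho0 = np.outer(psi0, psi0)
    rho = np.zeros_like(rho0)
    vals = np.round(lam, 8)
    for l in np.unique(vals):
        P = phi[:, vals == l] @ phi[:, vals == l].T   # projector P_lambda
        rho += P @ rho0 @ P
    return rho

def kernel_matrix(adjs, mu=1.0):
    """Steps 2-3 of Algorithm 1 for the unaligned kernel."""
    n = max(a.shape[0] for a in adjs)
    pad = lambda r: np.pad(r, (0, n - r.shape[0]))    # zero-pad to common size
    rhos = [pad(density_matrix(a)) for a in adjs]
    ents = [entropy(r) for r in rhos]                 # cache H_N(rho_a)
    N = len(adjs)
    K = np.empty((N, N))
    for a in range(N):
        for b in range(a, N):
            d = entropy((rhos[a] + rhos[b]) / 2) - 0.5 * (ents[a] + ents[b])
            K[a, b] = K[b, a] = np.exp(-mu * d)
    return K
```

Caching the per-graph entropies keeps the pairwise loop dominated by the single entropy evaluation of the mixture, matching the complexity breakdown above.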

4. Experimental evaluation

In this section we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time t varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

4.1. Graph datasets

We show the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision: MUTAG, PPIs, PTC(MR), COIL5 and Shock.

Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and aims to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex4 and Thermotoga4 (PPIs from Aquifex aeolicus and Thermotoga maritima, respectively), Gram-Positive52 (PPIs from Staphylococcus aureus), Cyanobacteria73 (PPIs from Anabaena variabilis) and Proteobacteria40 (PPIs from Acidovorax avenae). There is an additional class (Acidobacteria46 PPIs) which is more controversial in terms of the bacterial evolution since its members were discovered more recently. Here we select the Proteobacteria40 PPIs and the Acidobacteria46 PPIs as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D

Table 1
Information on the graph-based datasets.

Datasets       MUTAG   PPIs     PTC     COIL5    Shock
Max vertices   28      232      109     241      33
Min vertices   10      3        2       72       4
Ave vertices   17.93   109.60   25.60   144.90   13.16
# graphs       188     86       344     360      150
# classes      2       2        2       5        10



objects. In our experiments we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs using the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are thus 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluation of the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, ..., 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t, we can compute the von Neumann entropy of the density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets, separately. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup
We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random walk graph kernel (RWGK) [3]. For our quantum kernel we let t → ∞. For the Weisfeiler–Lehman subtree kernel we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), ..., k(10)) with different subtree heights

Fig. 1. Evaluation of the von Neumann entropy with time t: (a) on the MUTAG dataset (mutagenic aromatic vs. mutagenic heteroaromatic), (b) on the PPIs dataset (Proteobacteria40 PPIs vs. Acidobacteria46 PPIs), (c) on the PTC(MR) dataset (first vs. second class) and (d) on the COIL5 dataset (box, onion, ship, tomato and bottle objects). Each panel plots the mean entropy value (y-axis) against the time t from 0 to 50 (x-axis), with one curve per class.



h (h = 1, 2, ..., 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times and report the average classification accuracies (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrices of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210M). Note that for the WL kernel, the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies for all of the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel and outperform those of the other

Table 2
Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG         PPIs          PTC(MR)       COIL5         Shock
QJSU       82.72 ± .44   69.50 ± 1.20  56.70 ± .49   70.11 ± .61   40.60 ± .92
QJSA       82.83 ± .50   73.37 ± 1.04  57.39 ± .46   69.75 ± .58   42.80 ± .86
WL         82.05 ± .57   78.50 ± 1.40  56.05 ± .51   33.16 ± 1.01  36.40 ± 1.00
SPGK       83.38 ± .81   61.12 ± 1.09  56.55 ± .53   69.66 ± .52   37.88 ± .93
JSGK       83.11 ± .80   57.87 ± 1.36  57.29 ± .41   69.13 ± .79   21.73 ± .76
BRWK       77.50 ± .75   53.50 ± 1.47  53.97 ± .31   14.63 ± .21   0.33 ± .37
RWGK       80.77 ± .72   55.00 ± .88   55.91 ± .37   20.80 ± .47   2.26 ± 1.01

Table 3
Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG    PPIs      PTC      COIL5     Shock
QJSU       20″      59″       10′46″   180′20″   14″
QJSA       10′30″   230′25″   160′40″  8h29′     32″
WL         4″       13″       11″      1′05″     3″
SPGK       1″       7″        1″       31″       1″
JSGK       1″       1″        1″       1″        1″
BRWK       2″       140′20″   3″       160′46″   8″
RWGK       46″      1′07″     20′35″   190′40″   23″

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N. Each panel plots the runtime in seconds against the graph size n or the dataset size N.



kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies for all of the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with or outperforms that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform or are competitive with the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the graphs involved.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n and (b) the graph dataset size N.

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel, (b) log runtime vs. 3 log(n) using the QJSU kernel, (c) log runtime vs. 2 log(N) using the QJSA kernel and (d) log runtime vs. 2 log(N) using the QJSU kernel.



More specifically, we vary n = {10, 20, ..., 300} and N = {1, 2, ..., 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs in each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210M).

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N, we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the results obtained with the QJSA kernel, while Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.

Furthermore, we observe that the QJSU kernel is slightly more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computation for evaluating the vertex correspondences.

5. Conclusion

In this paper we have developed a novel graph kernel using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix of the mixed state we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations of hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementation of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for constructive discussion and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.

L. Bai et al. / Pattern Recognition ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 11

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite W*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterizing graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure–activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received the BSc and MSc degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, P.R. China. He is currently pursuing the PhD degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximate reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his BSc, MSc and PhD in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his PhD in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings and has been on the program committees of various international conferences and workshops. He was a co-chair of the 2009 edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the BSc, PhD and DSc degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, an outstanding paper award from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007 and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and for the Statistical, Syntactical and Structural Pattern Recognition workshop in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and at Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.


The probability of the walk being at vertex $u \in V$ at time $t$ is given by

$$\Pr(X^t = u) = \alpha_u(t)\,\alpha_u^*(t) \qquad (3)$$

More generally, a quantum walker represented by $|\psi_t\rangle$ can be observed in any state $|\phi\rangle \in \mathbb{C}^n$, not just in the states $|u\rangle$, $u \in V$, corresponding to the graph vertices. Given an orthonormal basis $|\phi_1\rangle, \ldots, |\phi_n\rangle$ of $\mathbb{C}^n$, the quantum walker $|\psi_t\rangle$ is observed in state $|\phi_i\rangle$ with probability $|\langle \phi_i | \psi_t \rangle|^2$. After being observed in state $|\phi_i\rangle$, the new state of the quantum walker is $|\phi_i\rangle$. Furthermore, given the nature of quantum observation, only orthonormal states can be distinguished by a single observation.

In quantum mechanics, the Schrödinger equation is a partial differential equation that describes how quantum systems evolve with time. In the non-relativistic case, it assumes the form

$$i\hbar \frac{\partial}{\partial t} |\psi\rangle = \mathcal{H} |\psi\rangle \qquad (4)$$

where the Hamiltonian operator $\mathcal{H}$ accounts for the total energy of the system. In the case of a particle moving in empty space with zero potential energy, the Hamiltonian reduces to the Laplacian operator $\nabla^2$. Here we take the Hamiltonian of the system to be the graph Laplacian matrix $L$, although any Hermitian operator encoding the structure of the graph could be chosen as an alternative. The evolution of the walk at time $t$ is then given by the Schrödinger equation

$$\frac{\partial}{\partial t} |\psi_t\rangle = -iL |\psi_t\rangle \qquad (5)$$

where the Laplacian $L$ plays the role of the system Hamiltonian. Given an initial state $|\psi_0\rangle$, the Schrödinger equation has an exponential solution which determines the state of the walk at time $t$:

$$|\psi_t\rangle = e^{-iLt} |\psi_0\rangle \qquad (6)$$

where $U_t = e^{-iLt}$ is a unitary matrix governing the evolution of the walk. To simulate the evolution of the quantum walk, it is customary to rewrite Eq. (6) in terms of the spectral decomposition $L = \Phi \Lambda \Phi^T$ of the Laplacian matrix $L$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|})$ is the diagonal matrix of ordered eigenvalues ($\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{|V|}$) and $\Phi = (\phi_1 \,|\, \phi_2 \,|\, \cdots \,|\, \phi_{|V|})$ is the matrix with the corresponding orthonormal eigenvectors as columns. Hence Eq. (6) can be rewritten as

$$|\psi_t\rangle = \Phi e^{-i\Lambda t} \Phi^T |\psi_0\rangle \qquad (7)$$

In this work, we let $\alpha_u(0)$ be proportional to the degree $d_u$ and, using the normalization constraint on the amplitudes, we get

$$\alpha_u(0) = \sqrt{\frac{d_u}{\sum_{v \in V} d_v}} \qquad (8)$$

Note that there is no canonical choice of the initial state. However, choosing a uniform distribution over the vertices of $G$ would leave the system stationary, since the initial state vector would be an eigenvector of the Laplacian of $G$ and thus a stationary state of the walk.
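The evolution in Eqs. (6)–(8) amounts to a few lines of linear algebra. The sketch below is our illustration (not the authors' code), using the path graph on three vertices as an example:

```python
import numpy as np

def ctqw_state(L, psi0, t):
    """|psi_t> = exp(-iLt)|psi_0>, via the spectral decomposition of L (Eq. (7))."""
    lam, Phi = np.linalg.eigh(L)              # eigenvalues and orthonormal eigenvectors
    return Phi @ (np.exp(-1j * lam * t) * (Phi.T @ psi0))

# Laplacian of the path graph on 3 vertices.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
d = np.diag(L)                                # vertex degrees
psi0 = np.sqrt(d / d.sum())                   # degree-proportional amplitudes, Eq. (8)
psi_t = ctqw_state(L, psi0, t=1.0)
prob = np.abs(psi_t) ** 2                     # observation probabilities, Eq. (3)
```

Since $U_t$ is unitary, the probabilities sum to one at every $t$.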

2.2. A density matrix from the mixed state

While a pure state can be naturally described by a single ket vector, in general a quantum system can be in a mixed state, i.e., a statistical ensemble of pure quantum states $|\psi_i\rangle$, each with probability $p_i$. The density operator (or density matrix) of such a system is defined as

$$\rho = \sum_i p_i |\psi_i\rangle \langle\psi_i| \qquad (9)$$

Density operators are positive semidefinite, unit-trace matrices directly linked with the observables of the (mixed) quantum system.

Let $O$ be an observable, i.e., a Hermitian operator acting on the quantum states and providing a measurement. Without loss of generality, we can write $O = \sum_{i=1}^m v_i P_i$, where $P_i$ is the orthogonal projector onto the $i$-th observation basis and $v_i$ is the measurement value when the quantum state is observed in this basis.

The expectation value of the measurement over a mixed state can be calculated from the density matrix $\rho$:

$$\langle O \rangle = \mathrm{tr}(\rho O) \qquad (10)$$

where $\mathrm{tr}$ is the trace operator. Similarly, the probability of observing the system in the $i$-th observation basis can be expressed in terms of the density matrix $\rho$ as

$$\Pr(X = i) = \mathrm{tr}(\rho P_i) \qquad (11)$$

Finally, after the measurement, the corresponding density operator becomes

$$\rho' = \sum_{i=1}^m P_i \rho P_i \qquad (12)$$
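As a toy illustration of Eqs. (10)–(12) (a sketch of ours, not taken from the paper), consider a two-state system measured in the computational basis:

```python
import numpy as np

# Mixed state: |0> with probability 3/4, |1> with probability 1/4 (Eq. (9)).
rho = 0.75 * np.outer([1.0, 0.0], [1.0, 0.0]) + 0.25 * np.outer([0.0, 1.0], [0.0, 1.0])

# Observable O = sum_i v_i P_i with measurement values +1 and -1.
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
O = (+1.0) * P0 + (-1.0) * P1

exp_val = np.trace(rho @ O)                   # Eq. (10): <O> = tr(rho O)
p0 = np.trace(rho @ P0)                       # Eq. (11): Pr(X = 0)
rho_after = P0 @ rho @ P0 + P1 @ rho @ P1     # Eq. (12): post-measurement state
```

Here $\langle O \rangle = 0.75 - 0.25 = 0.5$, and since $\rho$ is already diagonal in the measurement basis it is left unchanged by Eq. (12).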

For a graph $G(V,E)$, let $|\psi_t\rangle$ denote the state corresponding to a continuous-time quantum walk that has evolved from time $t = 0$ to time $t = T$. We define the time-averaged density matrix $\rho_G^T$ for $G(V,E)$ as

$$\rho_G^T = \frac{1}{T} \int_0^T |\psi_t\rangle \langle\psi_t| \, dt \qquad (13)$$

which can be expressed in terms of the initial state $|\psi_0\rangle$ as

$$\rho_G^T = \frac{1}{T} \int_0^T e^{-iLt} |\psi_0\rangle \langle\psi_0| e^{iLt} \, dt \qquad (14)$$

In other words, $\rho_G^T$ describes a system which has an equal probability of being in any of the pure states defined by the evolution of the quantum walk from $t = 0$ to $t = T$. Using the spectral decomposition of the Laplacian, we can rewrite the previous equation as

$$\rho_G^T = \frac{1}{T} \int_0^T \Phi e^{-i\Lambda t} \Phi^T |\psi_0\rangle \langle\psi_0| \Phi e^{i\Lambda t} \Phi^T \, dt \qquad (15)$$

Let $\phi_{xy}$ denote the $(x,y)$-th element of the matrix of eigenvectors $\Phi$ of the Laplacian. The $(r,c)$-th element of $\rho_G^T$ can be computed as

$$\rho_G^T(r,c) = \frac{1}{T}\int_0^T \left(\sum_{k=1}^n \sum_{l=1}^n \phi_{rk}\, e^{-i\lambda_k t}\, \phi_{lk}\, \psi_{0l}\right)\left(\sum_{x=1}^n \sum_{y=1}^n \psi_{0x}\, \phi_{xy}\, e^{i\lambda_y t}\, \phi_{cy}\right) dt \qquad (16)$$

Let $\bar{\psi}_k = \sum_l \phi_{lk}\psi_{0l}$ and $\bar{\psi}_y = \sum_x \phi_{xy}\psi_{0x}$, where $\psi_{0l}$ denotes the $l$-th element of $|\psi_0\rangle$. Then

$$\rho_G^T(r,c) = \frac{1}{T}\int_0^T \left(\sum_{k=1}^n \phi_{rk}\, e^{-i\lambda_k t}\, \bar{\psi}_k\right)\left(\sum_{y=1}^n \phi_{cy}\, e^{i\lambda_y t}\, \bar{\psi}_y\right) dt \qquad (17)$$

which can be rewritten as

$$\rho_G^T(r,c) = \sum_{k=1}^n \sum_{y=1}^n \phi_{rk}\,\phi_{cy}\,\bar{\psi}_k\,\bar{\psi}_y\, \frac{1}{T}\int_0^T e^{i(\lambda_y - \lambda_k)t}\, dt \qquad (18)$$

If we let $T \to \infty$, Eq. (18) further simplifies to

$$\rho_G^\infty(r,c) = \sum_{\lambda \in \tilde{\Lambda}}\ \sum_{x \in B_\lambda}\ \sum_{y \in B_\lambda} \phi_{rx}\,\phi_{cy}\,\bar{\psi}_x\,\bar{\psi}_y \qquad (19)$$

where $\tilde{\Lambda}$ is the set of distinct eigenvalues of the Laplacian matrix $L$ and $B_\lambda$ indexes a basis of the eigenspace associated with $\lambda$. As a consequence, computing the time-averaged density matrix relies on computing the eigendecomposition of the Laplacian of $G(V,E)$, and hence has time complexity $O(|V|^3)$. Note that in the remainder of this paper we will simplify our notation by referring to the time-averaged density matrix simply as the density matrix associated with a graph.


Moreover, we will assume that $\rho_G^T$ is computed for $T \to \infty$ unless otherwise stated, and we will denote it as $\rho_G$.
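A direct way to obtain $\rho_G$ in the $T \to \infty$ limit follows from the observation that Eq. (19) is equivalent to summing $P_\lambda \rho_0 P_\lambda$ over the eigenspace projectors $P_\lambda$ of the Laplacian. The sketch below is our illustration (not the authors' implementation), grouping near-equal eigenvalues into eigenspaces:

```python
import numpy as np

def density_matrix(L, tol=1e-9):
    """Large-time average of the walk's density matrix (Eq. (19)):
    rho = sum over distinct eigenvalues lambda of P_lambda rho_0 P_lambda."""
    lam, Phi = np.linalg.eigh(L)              # ascending eigenvalues
    d = np.diag(L)
    psi0 = np.sqrt(d / d.sum())               # initial amplitudes, Eq. (8)
    rho0 = np.outer(psi0, psi0)               # pure initial state |psi_0><psi_0|
    rho = np.zeros_like(rho0)
    i = 0
    while i < len(lam):                       # group (near-)equal eigenvalues
        j = i
        while j + 1 < len(lam) and abs(lam[j + 1] - lam[i]) < tol:
            j += 1
        P = Phi[:, i:j + 1] @ Phi[:, i:j + 1].T   # eigenspace projector
        rho += P @ rho0 @ P
        i = j + 1
    return rho

# Triangle graph: its Laplacian has eigenvalues 0, 3, 3.
L_tri = np.array([[ 2., -1., -1.],
                  [-1.,  2., -1.],
                  [-1., -1.,  2.]])
rho = density_matrix(L_tri)
```

The result is a symmetric, unit-trace, positive semidefinite matrix which, consistently with Eq. (27), commutes with $L$.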

2.3. The von Neumann entropy of a graph

In quantum mechanics, the von Neumann entropy [32] $H_N$ of a mixture is defined in terms of the trace and logarithm of the density operator $\rho$:

$$H_N = -\mathrm{tr}(\rho \log \rho) = -\sum_i \xi_i \ln \xi_i \qquad (20)$$

where $\xi_1, \ldots, \xi_n$ are the eigenvalues of $\rho$. Note that if the quantum system is a pure state $|\psi_i\rangle$ with probability $p_i = 1$, then the von Neumann entropy $H_N(\rho) = -\mathrm{tr}(\rho \log \rho)$ is zero. On the other hand, a mixed state generally has a non-zero von Neumann entropy associated with its density operator.

Here we propose to associate with each graph the von Neumann entropy of the density matrix computed as in Eq. (19). For a graph $G(V,E)$, its von Neumann entropy is

$$H_N(\rho_G) = -\mathrm{tr}(\rho_G \log \rho_G) = -\sum_{j=1}^{|V|} \lambda_j^G \log \lambda_j^G \qquad (21)$$

where $\lambda_1^G, \ldots, \lambda_j^G, \ldots, \lambda_{|V|}^G$ are the eigenvalues of $\rho_G$.
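Eq. (21) only requires the eigenvalues of the density matrix. A minimal sketch, using the usual convention $0 \log 0 = 0$:

```python
import numpy as np

def von_neumann_entropy(rho, eps=1e-12):
    """H_N(rho) = -sum_j xi_j log xi_j over the eigenvalues xi_j of rho (Eq. (21))."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]                         # 0 log 0 = 0: drop null eigenvalues
    return float(-np.sum(xi * np.log(xi)))

n = 4
pure = np.zeros((n, n)); pure[0, 0] = 1.0     # a pure state: zero entropy
mixed = np.eye(n) / n                         # maximally mixed state: entropy log n
```

As expected, the pure state has zero entropy while the maximally mixed state attains the maximum $\log n$.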

2.4. Quantum Jensen–Shannon divergence

The classical Jensen–Shannon divergence is a non-extensive mutual information measure defined between probability distributions over potentially structured data, and is related to the Shannon entropy of the two distributions [33]. Consider two (discrete) probability distributions $P = (p_1, \ldots, p_a, \ldots, p_A)$ and $Q = (q_1, \ldots, q_b, \ldots, q_B)$; the classical Jensen–Shannon divergence between $P$ and $Q$ is defined as

$$D_{JS}(P,Q) = H_S\!\left(\frac{P+Q}{2}\right) - \frac{1}{2} H_S(P) - \frac{1}{2} H_S(Q) \qquad (22)$$

where $H_S(P) = -\sum_{a=1}^A p_a \log p_a$ is the Shannon entropy of the distribution $P$. The classical Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., $0 \le D_{JS} \le 1$.
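With base-2 logarithms, which give the $0 \le D_{JS} \le 1$ bound above, Eq. (22) reads as follows (an illustrative sketch of ours):

```python
import numpy as np

def shannon(P):
    """Shannon entropy H_S(P) = -sum_a p_a log2 p_a, in bits."""
    P = np.asarray(P, dtype=float)
    P = P[P > 0]                              # convention: 0 log 0 = 0
    return float(-np.sum(P * np.log2(P)))

def jsd(P, Q):
    """Classical Jensen-Shannon divergence, Eq. (22)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return shannon((P + Q) / 2) - shannon(P) / 2 - shannon(Q) / 2
```

Identical distributions give divergence 0, while distributions with disjoint supports attain the upper bound 1.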

The quantum Jensen–Shannon divergence has recently been developed as a generalization of the classical Jensen–Shannon divergence to quantum states by Lamberti et al. [13]. Given two density operators $\rho$ and $\sigma$, the quantum Jensen–Shannon divergence between them is defined as

$$D_{QJS}(\rho,\sigma) = H_N\!\left(\frac{\rho+\sigma}{2}\right) - \frac{1}{2} H_N(\rho) - \frac{1}{2} H_N(\sigma) \qquad (23)$$

The quantum Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., $0 \le D_{QJS} \le 1$ [13].

Additionally, the quantum Jensen–Shannon divergence satisfies a number of properties required of a good distinguishability measure between quantum states [13,18]. Distinguishability is a problem of key importance in quantum computation, and it requires the definition of a suitable distance measure. Wootters [34] was the first to introduce the concept of statistical distance between pure quantum states; his work is based on extending a distance over the space of probability distributions to the Hilbert space of pure quantum states. Similarly, the relative entropy [35] can be seen as a generalization of the information theoretic Kullback–Leibler divergence. However, the relative entropy is unbounded, is not symmetric and does not satisfy the triangle inequality.

The square root of the QJSD, on the other hand, is bounded, is a distance and, as proved by Lamberti et al. [13], satisfies the triangle inequality. This is formally proved for pure states, while for mixed states there is only empirical evidence. Some alternatives to the QJSD are described in the literature, most notably the Bures distance [36]. Both the Bures distance and the QJSD require the full density matrices to be computed; however, the QJSD turns out to be faster to compute. On the one hand, the Bures distance involves taking the square root of matrices, usually computed through matrix diagonalization, which has time complexity $O(n^3)$, where $n$ is the number of vertices in the graph. On the other hand, the QJSD simply requires computing the eigenvalues of the density matrices, which can be done in $O(n^2)$.
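Computing Eq. (23) therefore amounts to three eigenvalue computations. A sketch of ours, using base-2 logarithms so that the bound $0 \le D_{QJS} \le 1$ holds:

```python
import numpy as np

def h_n(rho, eps=1e-12):
    """von Neumann entropy of rho, in bits."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]
    return float(-np.sum(xi * np.log2(xi)))

def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence, Eq. (23)."""
    return h_n((rho + sigma) / 2) - h_n(rho) / 2 - h_n(sigma) / 2

# Two orthogonal pure states are maximally distinguishable: D_QJS = 1.
rho = np.diag([1.0, 0.0])
sigma = np.diag([0.0, 1.0])
```

For identical states the divergence vanishes, while orthogonal pure states attain the maximum value of 1.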

3. A quantum Jensen–Shannon graph kernel

In this section, we propose a novel graph kernel based on the quantum Jensen–Shannon divergence between continuous-time quantum walks on different graphs. Suppose the graphs under consideration belong to a set $\{G_1, \ldots, G_a, \ldots, G_b, \ldots, G_N\}$. We consider the continuous-time quantum walks on a pair of graphs $G_a(V_a, E_a)$ and $G_b(V_b, E_b)$, whose mixed-state density matrices $\rho_a$ and $\rho_b$ can be computed using Eq. (18) or Eq. (19). With the density matrices $\rho_a$ and $\rho_b$ to hand, we compute the quantum Jensen–Shannon divergence $D_{QJS}(\rho_a, \rho_b)$ using Eq. (23). However, the result depends on the relative order of the vertices in the two graphs, and we need a way to describe the mixed quantum state given by the walk on both graphs. To this end, we consider two strategies, which we refer to as the unaligned and the aligned density matrix. In the first case, the density matrices are added maintaining the vertex order of the data. This results in the unaligned kernel

$$k_{QJSU}(G_a, G_b) = \exp(-\mu D_{QJS}(\rho_a, \rho_b)) \qquad (24)$$

where $\mu$ is a decay factor in the interval $0 < \mu \le 1$ and

$$D_{QJS}(\rho_a, \rho_b) = H_N\!\left(\frac{\rho_a + \rho_b}{2}\right) - \frac{1}{2} H_N(\rho_a) - \frac{1}{2} H_N(\rho_b) \qquad (25)$$

is the quantum Jensen–Shannon divergence between the mixed-state density matrices $\rho_a$ and $\rho_b$ of $G_a$ and $G_b$, and $H_N(\cdot)$ is the von Neumann entropy of a graph density matrix defined in Eq. (21). Here $\mu$ is used to ensure that large divergence values do not dominate the kernel value; in this work we use $\mu = 1$. The proposed kernel can be viewed as a similarity measure between the quantum states associated with a pair of graphs. Unfortunately, since the unaligned density matrix ignores the vertex correspondence information for a pair of graphs, the resulting kernel is not invariant to vertex permutations. Instead, it relies only on global properties and long-range interactions between vertices to differentiate between structures.

In the second case, the mixing is performed after the density matrices are optimally aligned, thus computing a lower bound of the divergence over the set $\Sigma$ of state permutations. This results in the aligned kernel

$$k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\!\left(-\mu D_{QJS}\!\left(\rho_a, Q \rho_b Q^T\right)\right) = \exp\!\left(-\mu \min_{Q \in \Sigma} D_{QJS}\!\left(\rho_a, Q \rho_b Q^T\right)\right) \qquad (26)$$

Here, the alignment step adds correspondence information, allowing for a more fine-grained discrimination between structures.

In both cases, the density matrix of smaller size is padded with zeros to the size of the larger one before computing the divergence. Note that this is equivalent to padding the adjacency matrix of the smaller graph to the size of the larger one, since both operations simply increase the dimension of the null space of the density matrix.


3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state $(\rho_a + \rho_b)/2$. However, this is a computationally hard task, so we perform a two-step approximation. First, we approximate the solution by minimizing the Frobenius ($L_2$) distance between the matrices, rather than the von Neumann entropy. Second, we find an approximate solution to the $L_2$ problem using Umeyama's method [29].

The rationale behind this approximation is that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

$$L\rho_a = \left(\sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda P_\lambda^T\right)\left(\sum_{\lambda \in \tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T\right) = \sum_{\lambda \in \tilde{\Lambda}} P_\lambda \lambda \rho_0 P_\lambda^T = \left(\sum_{\lambda \in \tilde{\Lambda}} P_\lambda \rho_0 P_\lambda^T\right)\left(\sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda P_\lambda^T\right) = \rho_a L \qquad (27)$$

where $\rho_0$ denotes the density matrix for $G$ at time $t = 0$ and $P_\lambda = \sum_{k=1}^{\mu(\lambda)} \phi_{\lambda k} \phi_{\lambda k}^T$ is the projection operator onto the subspace spanned by the $\mu(\lambda)$ eigenvectors $\phi_{\lambda k}$ associated with the eigenvalue $\lambda$ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix $L = \Phi \Lambda \Phi^T$, if we express the density matrix in the eigenvector basis given by $\Phi$, then it assumes a block-diagonal form, where each block corresponds to an eigenspace of $L$ associated with a single eigenvalue. As a consequence, if all the eigenvalues of $L$ are distinct, then $\rho_a$ shares the same eigenvectors as $L$. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity of $O(\sum_{\lambda \in \tilde{\Lambda}} \mu(\lambda)^2)$, where $\mu(\lambda)$ denotes the multiplicity of the eigenvalue $\lambda$.
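The spectral alignment can be sketched as follows. This is our simplified illustration of Umeyama-style matching, not the authors' implementation: it scores vertex pairs through the product of the absolute eigenvector matrices and then extracts a permutation greedily, where the original method solves the resulting assignment problem optimally (e.g. with the Hungarian algorithm):

```python
import numpy as np

def spectral_align(Ma, Mb):
    """Permutation matrix Q aligning Mb to Ma, in the spirit of Umeyama [29]:
    score matrix S = |Phi_b||Phi_a|^T, then a greedy assignment (a stand-in
    for the optimal Hungarian step of the original method)."""
    _, Pa = np.linalg.eigh(Ma)
    _, Pb = np.linalg.eigh(Mb)
    S = np.abs(Pb) @ np.abs(Pa).T             # S[i, j]: affinity of vertex i of Gb to j of Ga
    n = S.shape[0]
    Q = np.zeros((n, n))
    free = list(range(n))
    for i in range(n):
        j = max(free, key=lambda c: S[i, c])  # best still-unassigned column
        Q[i, j] = 1.0
        free.remove(j)
    return Q                                  # Q.T @ Mb @ Q re-expresses Mb in Ma's order

# Sanity check: aligning a generic symmetric matrix to itself gives the identity.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
M = A + A.T
Q = spectral_align(M, M)
```

The absolute values remove the sign ambiguity of the eigenvectors; as discussed above, degenerate eigenvalues would require handling each eigenspace block separately.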

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the quantum Jensen–Shannon divergence is negative definite, then the unaligned quantum Jensen–Shannon kernel is positive definite.

Proof. First, note that for the computation of the quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus, we can consider the computation of the kernel matrix over $N$ graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38], the authors show that if $s(G_a, G_b)$ is a conditionally positive semidefinite kernel, i.e., $-s(G_a, G_b)$ is a negative definite kernel, then $k_s = \exp(\lambda s(G_a, G_b))$ is positive semidefinite for any $\lambda > 0$.

Hence, if the quantum Jensen–Shannon divergence $D_{QJS}(G_a, G_b)$ is negative definite, then $-D_{QJS}(G_a, G_b)$ is conditionally positive semidefinite, and the quantum Jensen–Shannon kernel $k_{QJS}(G_a, G_b) = \exp(-\mu D_{QJS}(G_a, G_b))$ is positive definite. □

Note that this proof is conditioned on the negative definiteness of the quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently, this is only a conjecture, proved to be true for pure quantum states and supported by strong empirical evidence in the general case [39].

In general, as for the optimal assignment kernel [10,11], we cannot guarantee positive definiteness of the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do, however, have a similar result when using any (possibly non-optimal) set of alignments that satisfies transitivity.

Corollary 3.2. If the quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data. □

Another important property of a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering. First, it might depend on the order in which the different graphs are presented. Second, it might depend on the order in which the vertices are presented within each graph.

Since the kernel only depends on second-order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will simply result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second condition. The result is a kernel that is independent of the order of the graphs, but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned quantum Jensen–Shannon kernel is independent of both the graph order and the vertex order.

Proof. Recall that, given two graphs $G_a$ and $G_b$ with mixed-state density matrices $\rho_a$ and $\rho_b$, the aligned kernel is

$$k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp\!\left(-\mu D_{QJS}\!\left(\rho_a, Q \rho_b Q^T\right)\right) \qquad (28)$$

As a result, the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

$$H_N\!\left(\frac{\rho_a + Q \rho_b Q^T}{2}\right) = H_N\!\left(Q \frac{Q^T \rho_a Q + \rho_b}{2} Q^T\right) = H_N\!\left(\frac{Q^T \rho_a Q + \rho_b}{2}\right) \qquad (29)$$

From the above, $D_{QJS}(\rho_a, Q\rho_b Q^T) = D_{QJS}(Q^T\rho_a Q, \rho_b) = D_{QJS}(\rho_b, Q^T\rho_a Q)$. Hence

$$k_{QJSA}(G_a, G_b) = \max_{Q \in \Sigma} \exp(-\mu D_{QJS}(\rho_a, Q\rho_b Q^T)) = \max_{Q \in \Sigma} \exp(-\mu D_{QJS}(\rho_b, Q\rho_a Q^T)) = k_{QJSA}(G_b, G_a) \qquad (30)$$

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus, the kernel is independent of the order of the graphs.

The independence of the vertex ordering again derives directly from the optimality of the value of the divergence. Let $G_b'$ be a vertex-permuted version of $G_b$, such that its density matrix is $\rho_b' = T \rho_b T^T$ for a permutation matrix $T \in \Sigma$, and assume that $k_{QJSA}(G_a, G_b') \ne k_{QJSA}(G_a, G_b)$. Without loss of generality, we can assume $k_{QJSA}(G_a, G_b') > k_{QJSA}(G_a, G_b)$. Let

$$Q^* = \operatorname*{argmax}_{Q \in \Sigma} \exp(-\mu D_{QJS}(\rho_a, Q \rho_b' Q^T)) \qquad (31)$$


then

$$\exp(-\mu D_{QJS}(\rho_a, Q^* \rho_b' Q^{*T})) = \exp(-\mu D_{QJS}(\rho_a, Q^* T \rho_b T^T Q^{*T})) > \max_{Q \in \Sigma} \exp(-\mu D_{QJS}(\rho_a, Q \rho_b Q^T)) \qquad (32)$$

which, since $Q^* T \in \Sigma$, leads to a contradiction. □

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by minimizing the Frobenius ($L_2$) norm between the density matrices; since the resulting matching problem is still, in general, NP-hard, this minimization is in turn approximated using Umeyama's algorithm. However, in the proof, optimality was used only as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order-invariant kernel, even if it is suboptimal.

Eq. (29) does not depend on the choice of the optimal alignment, and so it would still hold under $L_2$ optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If two permutations $Q_1$ and $Q_2$ differ by a combination of the symmetries present in the two graphs, i.e., $Q_1 = S Q_2 R$ with $S \in \mathrm{Aut}(G_a)$ and $R \in \mathrm{Aut}(G_b)$, then clearly they will yield the same divergence, since $\rho_a = S \rho_a S^T$ and $\rho_b = R \rho_b R^T$; but this might not characterize all the $L_2$-optimal permutations. Nonetheless, we have the following:

Theorem 3.4. If the $L_2$-optimal permutation is unique modulo isomorphisms of the two graphs, then the $L_2$ approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value for all the optimizers. □

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this problem is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.

3.3. Complexity analysis

For a dataset of $N$ graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on $N$ graphs.

Input: A graph dataset $G = \{G_1, \ldots, G_a, \ldots, G_b, \ldots, G_N\}$ with $N$ graphs.
Output: An $N \times N$ kernel matrix $K_{QJS}$.

1. Compute the density matrix of each graph. For each graph $G_a(V_a, E_a)$, compute the density matrix of the walk evolving from time 0 to time $T$, as described in Section 2.2.
2. Compute the mixed density matrix for each pair of graphs. For each pair of graphs $G_a(V_a, E_a)$ and $G_b(V_b, E_b)$ with density matrices $\rho_a$ and $\rho_b$, compute $(\rho_a + \rho_b)/2$ using either the unaligned or the aligned procedure.
3. Compute the kernel matrix for the $N$ graphs. For each pair of graphs $G_a(V_a, E_a)$ and $G_b(V_b, E_b)$, compute the $(G_a, G_b)$-th element of $K_{QJS}$ as $k_{QJS}(G_a, G_b)$, using Eq. (24) or Eq. (26).

For $N$ graphs, each with $n$ vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of graphs is determined by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph which, based on the definition in Section 2.2, has time complexity $O(Nn^3)$; (b) the computation of the aligned and unaligned density matrices for all pairs of graphs, which has time complexity $O(N^2 n^4)$ and $O(N^2 n)$, respectively. (c) As a result, the overall time complexities of the quantum kernel using the aligned and unaligned density matrices are $O(Nn^3 + N^2 n^4)$ and $O(Nn^3 + N^2 n)$, respectively.
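Putting the pieces together, the unaligned kernel of Algorithm 1 can be sketched end to end. This is an illustrative reimplementation of ours under the paper's definitions (base-2 entropies, $\mu = 1$), not the authors' code:

```python
import numpy as np

def density_matrix(L, tol=1e-9):
    """T -> infinity density matrix of the walk on L (Eq. (19))."""
    lam, Phi = np.linalg.eigh(L)
    d = np.diag(L)
    psi0 = np.sqrt(d / d.sum())                     # Eq. (8)
    rho0 = np.outer(psi0, psi0)
    rho = np.zeros_like(rho0)
    i = 0
    while i < len(lam):                             # sum over eigenspace projectors
        j = i
        while j + 1 < len(lam) and abs(lam[j + 1] - lam[i]) < tol:
            j += 1
        P = Phi[:, i:j + 1] @ Phi[:, i:j + 1].T
        rho += P @ rho0 @ P
        i = j + 1
    return rho

def entropy(rho, eps=1e-12):
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]
    return float(-np.sum(xi * np.log2(xi)))

def qjs_kernel_matrix(laplacians, mu=1.0):
    """Unaligned quantum Jensen-Shannon kernel matrix (Algorithm 1, Eq. (24))."""
    n = max(L.shape[0] for L in laplacians)
    rhos = []
    for L in laplacians:
        rho = np.zeros((n, n))
        m = L.shape[0]
        rho[:m, :m] = density_matrix(L)             # zero-pad to the largest graph
        rhos.append(rho)
    K = np.zeros((len(rhos), len(rhos)))
    for a, ra in enumerate(rhos):
        for b, rb in enumerate(rhos):
            D = entropy((ra + rb) / 2) - (entropy(ra) + entropy(rb)) / 2
            K[a, b] = np.exp(-mu * D)
    return K

# Example: a path and a triangle on three vertices.
L_path = np.array([[ 1., -1.,  0.], [-1.,  2., -1.], [ 0., -1.,  1.]])
L_tri  = np.array([[ 2., -1., -1.], [-1.,  2., -1.], [-1., -1.,  2.]])
K = qjs_kernel_matrix([L_path, L_tri])
```

The diagonal entries equal 1 (zero self-divergence), the matrix is symmetric, and all entries lie in $(0, 1]$ since $0 \le D_{QJS} \le 1$.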

4. Experimental evaluation

In this section, we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time $t$ varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

41 Graph datasets

We show the performance of our quantum JensenndashShannongraph kernel on five standard graph based datasets from bioinfor-matics and computer vision These datasets include MUTAG PPIsPTC(MR) COIL5 and Shock

Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and the aim is to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex (4 PPIs) and Thermotoga (4 PPIs), from Aquifex aelicus and Thermotoga maritima, respectively; Gram-Positive (52 PPIs), from Staphylococcus aureus; Cyanobacteria (73 PPIs), from Anabaena variabilis; and Proteobacteria (40 PPIs), from Acidovorax avenae. There is an additional class (Acidobacteria, 46 PPIs) whose placement in the bacterial evolution order has been more controversial since its discovery. Here we select the Proteobacteria (40 PPIs) and Acidobacteria (46 PPIs) as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D

Table 1. Information on the graph-based datasets.

Datasets        MUTAG   PPIs     PTC     COIL5    Shock
Max vertices    28      232      109     241      33
Min vertices    10      3        2       72       4
Ave vertices    17.93   109.60   25.60   144.90   13.16
# graphs        188     86       344     360      150
# classes       2       2        2       5        10

L. Bai et al. / Pattern Recognition ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

objects. In our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs with the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluation of the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, …, 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t, we can compute the von Neumann entropy of the corresponding density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets, respectively. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here, the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.
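The entropy-versus-time curves can be reproduced in outline as follows. This is a hedged sketch of ours, not the authors' code: Eq. (15) is not reproduced here, so the time-averaged density matrix is approximated by a discrete average of the pure-state projectors of the walk, with the adjacency matrix as Hamiltonian and a uniform initial state (both our assumptions).

```python
import numpy as np

def avg_density_matrix(A, T, step=1.0, psi0=None):
    """Discretised stand-in for the time-averaged density matrix of Eq. (15):
    average the pure-state projectors of the walk from t = 0 to t = T."""
    n = A.shape[0]
    psi0 = np.ones(n) / np.sqrt(n) if psi0 is None else psi0
    lam, Phi = np.linalg.eigh(A)          # spectrum of the assumed Hamiltonian
    coeffs = Phi.T @ psi0
    rho = np.zeros((n, n), dtype=complex)
    ts = np.arange(0.0, T + step, step)
    for t in ts:
        psi_t = Phi @ (np.exp(-1j * lam * t) * coeffs)  # |psi_t> = e^{-iHt}|psi_0>
        rho += np.outer(psi_t, psi_t.conj())
    return rho / len(ts)

def entropy(rho):
    xi = np.linalg.eigvalsh(rho)          # real eigenvalues of a Hermitian matrix
    xi = xi[xi > 1e-12]
    return float(-np.sum(xi * np.log(xi)))

# Entropy curve for a 3-vertex path graph, sampled over t as in the experiment.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
curve = [entropy(avg_density_matrix(A, t)) for t in range(0, 51, 10)]
```

At t = 0 the walk is still in its pure initial state, so the entropy starts at zero and grows as the time average mixes the state, which is the qualitative behaviour reported in Fig. 1.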

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup
We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random-walk graph kernel (RWGK) [3]. For our quantum kernel, we let t → ∞. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), …, k(10)) with different subtree heights

[Fig. 1 about here. Four panels plot the mean von Neumann entropy value against the time t (from 0 to 50): (a) MUTAG, with curves for the mutagenic aromatic and mutagenic heteroaromatic classes; (b) PPIs, with curves for the Proteobacteria (40 PPIs) and Acidobacteria (46 PPIs) classes; (c) PTC, with curves for the first and second (MR) classes; and (d) COIL5, with curves for the box, onion, ship, tomato and bottle objects.]

Fig. 1. Evaluation of the von Neumann entropy with time t: (a) on the MUTAG dataset, (b) on the PPIs dataset, (c) on the PTC(MR) dataset and (d) on the COIL5 dataset.


h (h = 1, 2, …, 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times and report the average classification accuracy (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrix of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210M). Note that for the WL kernel, the classification accuracies are the average accuracies over all 10 matrices, and the runtime refers to that required for computing all 10 matrices (see [5] for details).
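The evaluation protocol can be sketched as follows. We stress that this is not the paper's code: LIBSVM's C-SVM is replaced by a simple kernel nearest-neighbour rule so that the 10-fold protocol stays self-contained, and the fold logic is a plain random split.

```python
import numpy as np

def ten_fold_accuracy(K, y, seed=0):
    """10-fold cross-validation on a precomputed kernel matrix K.
    Returns the mean accuracy and its standard error over the folds."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, 10)
    accs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # stand-in classifier: predict the label of the most similar training graph
        nearest = train[np.argmax(K[np.ix_(fold, train)], axis=1)]
        accs.append(np.mean(y[nearest] == y[fold]))
    accs = np.asarray(accs)
    return float(accs.mean()), float(accs.std() / np.sqrt(len(accs)))
```

With the paper's actual setup, one would instead train a C-SVM per fold on the precomputed kernel (e.g., LIBSVM) and optimize C separately for each dataset.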

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies of all the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel and outperform those of the other

Table 2. Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG       PPIs         PTC(MR)     COIL5        Shock
QJSU      82.72±.44   69.50±1.20   56.70±.49   70.11±.61    40.60±.92
QJSA      82.83±.50   73.37±1.04   57.39±.46   69.75±.58    42.80±.86
WL        82.05±.57   78.50±1.40   56.05±.51   33.16±1.01   36.40±1.00
SPGK      83.38±.81   61.12±1.09   56.55±.53   69.66±.52    37.88±.93
JSGK      83.11±.80   57.87±1.36   57.29±.41   69.13±.79    21.73±.76
BRWK      77.50±.75   53.50±1.47   53.97±.31   14.63±.21    0.33±.37
RWGK      80.77±.72   55.00±.88    55.91±.37   20.80±.47    2.26±1.01

Table 3. Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG   PPIs     PTC      COIL5    Shock
QJSU      20″     59″      1′46″    18′20″   14″
QJSA      1′30″   23′25″   16′40″   8h29′    32″
WL        4″      13″      11″      1′5″     3″
SPGK      1″      7″       1″       31″      1″
JSGK      1″      1″       1″       1″       1″
BRWK      2″      14′20″   3″       16′46″   8″
RWGK      46″     1′7″     2′35″    19′40″   23″

[Fig. 2 about here. Four panels plot the runtime in seconds against the graph size n (panels (a) and (b), n up to 300) and against the data size N (panels (c) and (d), N up to 100).]

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N.


kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies of all the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with or outperforms that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform or are competitive with the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not permutation invariant to the vertex order.

In terms of runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection, we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the associated graphs.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n and (b) the graph dataset size N.

[Fig. 3 about here. Four panels plot the log runtime against 3 log(n) (panels (a) and (b)) and against 2 log(N) (panels (c) and (d)).]

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel, (b) log runtime vs. 3 log(n) using the QJSU kernel, (c) log runtime vs. 2 log(N) using the QJSA kernel and (d) log runtime vs. 2 log(N) using the QJSU kernel.


More specifically, we vary n = {10, 20, …, 300} and N = {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs in each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210M).

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the results obtained with the QJSA kernel, while Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3, there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.

Furthermore, we observe that the QJSU kernel is somewhat more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computation to evaluate the vertex correspondences.

5. Conclusion

In this paper, we have developed a novel graph kernel using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix of the mixed state, we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations of hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementation of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for their constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of ACM Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of ACM Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.
[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite w*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure–activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the B.Sc. and M.Sc. degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, P.R. China. He is currently pursuing the Ph.D. degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximation reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his B.Sc., M.Sc. and Ph.D. in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his Ph.D. in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently, he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings and has been on the program committees of various international conferences and workshops. He was a co-chair of the 2009 edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the B.Sc., Ph.D. and D.Sc. degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007 and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and for Statistical, Syntactical and Structural Pattern Recognition in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.


Moreover, we will assume that ρ_G^T is computed for T → ∞ unless otherwise stated, and we will denote it as ρ_G.

2.3. The von Neumann entropy of a graph

In quantum mechanics, the von Neumann entropy [32] H_N of a mixture is defined in terms of the trace and logarithm of the density operator ρ:

H_N = −tr(ρ log ρ) = −∑_i ξ_i ln ξ_i,    (20)

where ξ_1, …, ξ_n are the eigenvalues of ρ. Note that if the quantum system is a pure state |ψ_i⟩ with probability p_i = 1, then the von Neumann entropy H_N(ρ) = −tr(ρ log ρ) is zero. On the other hand, a mixed state generally has a non-zero von Neumann entropy associated with its density operator.

Here we propose to associate with each graph the von Neumann entropy of the density matrix computed as in Eq. (19). Consider a graph G(V, E); its von Neumann entropy is

H_N(ρ_G) = −tr(ρ_G log ρ_G) = −∑_{j=1}^{|V|} λ_j^G log λ_j^G,    (21)

where λ_1^G, …, λ_j^G, …, λ_|V|^G are the eigenvalues of ρ_G.
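Computed from the spectrum, the entropy of Eq. (21) takes only a few lines. The sketch below is ours (it uses natural logarithms) and also checks the two boundary cases: a pure state has zero entropy, while the maximally mixed state on n vertices attains log n.

```python
import numpy as np

def von_neumann_entropy(rho):
    """H_N(rho) = -tr(rho log rho), evaluated through the eigenvalues of rho."""
    xi = np.linalg.eigvalsh(rho)     # eigenvalues of the density matrix
    xi = xi[xi > 1e-12]              # drop (numerically) zero ones: 0 log 0 = 0
    return float(-np.sum(xi * np.log(xi)))

psi = np.array([1.0, 0.0, 0.0])
pure = np.outer(psi, psi)            # a pure state |psi><psi|: entropy 0
mixed = np.eye(3) / 3                # maximally mixed on 3 vertices: entropy log 3
```

Working with the eigenvalues directly avoids forming the matrix logarithm of ρ.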

2.4. Quantum Jensen–Shannon divergence

The classical Jensen–Shannon divergence is a non-extensive mutual information measure defined between probability distributions over potentially structured data, and is related to the Shannon entropy of the two distributions [33]. Consider two (discrete) probability distributions P = (p_1, …, p_a, …, p_A) and Q = (q_1, …, q_b, …, q_B); then the classical Jensen–Shannon divergence between P and Q is defined as

D_JS(P, Q) = H_S((P + Q)/2) − (1/2) H_S(P) − (1/2) H_S(Q),    (22)

where H_S(P) = −∑_{a=1}^{A} p_a log p_a is the Shannon entropy of the distribution P. The classical Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., 0 ≤ D_JS ≤ 1.

The quantum Jensen–Shannon divergence has recently been developed as a generalization of the classical Jensen–Shannon divergence to quantum states by Lamberti et al. [13]. Given two density operators ρ and σ, the quantum Jensen–Shannon divergence between them is defined as

D_QJS(ρ, σ) = H_N((ρ + σ)/2) − (1/2) H_N(ρ) − (1/2) H_N(σ).    (23)

The quantum Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., 0 ≤ D_QJS ≤ 1 [13].

Additionally, the quantum Jensen–Shannon divergence satisfies a number of properties which are required of a good distinguishability measure between quantum states [13,18]. This is a problem of key importance in quantum computation, and it requires the definition of a suitable distance measure. Wootters [34] was the first to introduce the concept of statistical distance between pure quantum states. His work is based on extending a distance over the space of probability distributions to the Hilbert space of pure quantum states. Similarly, the relative entropy [35] can be seen as a generalization of the information theoretic Kullback–Leibler divergence. However, the relative entropy is unbounded, it is not symmetric, and it does not satisfy the triangle inequality.

The square root of the QJSD, on the other hand, is bounded, it is a distance and, as proved by Lamberti et al. [13], it satisfies the triangle inequality. This is formally proved for pure states, while for mixed states there is only empirical evidence. Some alternatives to the QJSD are described in the literature, most notably the Bures distance [36]. Both the Bures distance and the QJSD require the full density matrices to be computed. However, the QJSD turns out to be faster to compute. On the one hand, the Bures distance involves taking the square root of matrices, usually computed through matrix diagonalization, which has time complexity O(n³), where n is the number of vertices in the graph. On the other hand, the QJSD simply requires computing the eigenvalues of the density matrices, which can be done in O(n²).

3. A quantum Jensen–Shannon graph kernel

In this section, we propose a novel graph kernel based on the quantum Jensen–Shannon divergence between continuous-time quantum walks on different graphs. Suppose that the graphs under consideration are represented by a set {G_1, …, G_a, …, G_b, …, G_N}. We consider the continuous-time quantum walks on a pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b), whose mixed state density matrices ρ_a and ρ_b can be computed using Eq. (18) or Eq. (19). With the density matrices ρ_a and ρ_b to hand, we compute the quantum Jensen–Shannon divergence D_QJS(ρ_a, ρ_b) using Eq. (23). However, the result depends on the relative order of the vertices in the two graphs, and we need a way to describe the mixed quantum state given by the walk on both graphs. To this end, we consider two strategies, which we refer to as the unaligned density matrix and the aligned density matrix. In the first case, the density matrices are added maintaining the vertex order of the data. This results in the unaligned kernel

k_QJSU(G_a, G_b) = exp(−μ D_QJS(ρ_a, ρ_b)),    (24)

where μ is a decay factor in the interval 0 < μ ≤ 1 and

D_QJS(ρ_a, ρ_b) = H_N((ρ_a + ρ_b)/2) − (1/2) H_N(ρ_a) − (1/2) H_N(ρ_b)    (25)

is the quantum Jensen–Shannon divergence between the mixed state density matrices ρ_a and ρ_b of G_a and G_b, and H_N(·) is the von Neumann entropy of a graph density matrix as defined in Eq. (21). Here μ is used to ensure that large divergence values do not come to dominate the kernel value; in this work we use μ = 1. The proposed kernel can be viewed as a similarity measure between the quantum states associated with a pair of graphs. Unfortunately, since the unaligned density matrix ignores the vertex correspondence information for a pair of graphs, it is not permutation invariant to the vertex order. Instead, it relies only on the global properties and long-range interactions between vertices to differentiate between structures.
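Eqs. (24) and (25) translate directly into code. The following is a small sketch of ours (using natural logarithms, so the upper bound of the divergence is ln 2 rather than 1):

```python
import numpy as np

def von_neumann_entropy(rho):
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > 1e-12]
    return float(-np.sum(xi * np.log(xi)))

def qjsd(rho_a, rho_b):
    """Quantum Jensen-Shannon divergence between two density matrices, Eq. (25)."""
    return (von_neumann_entropy((rho_a + rho_b) / 2)
            - 0.5 * von_neumann_entropy(rho_a)
            - 0.5 * von_neumann_entropy(rho_b))

def k_qjsu(rho_a, rho_b, mu=1.0):
    """Unaligned quantum Jensen-Shannon kernel, Eq. (24)."""
    return float(np.exp(-mu * qjsd(rho_a, rho_b)))

# Identical states: divergence 0, so the kernel attains its maximum value 1.
rho = np.diag([0.5, 0.5, 0.0])
# Orthogonal pure states: the divergence attains its upper bound (ln 2 here).
p, q = np.diag([1.0, 0.0, 0.0]), np.diag([0.0, 1.0, 0.0])
```

Since D_QJS is nonnegative and symmetric, the resulting kernel values lie in (0, 1] and the kernel matrix is symmetric.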

In the second case, the mixing is performed after the density matrices are optimally aligned, thus computing a lower bound of the divergence over the set Σ of state permutations. This results in the aligned kernel

k_QJSA(Ga, Gb) = max_{Q∈Σ} exp(−μ D_QJS(ρ_a, Q ρ_b Q^T)) = exp(−μ min_{Q∈Σ} D_QJS(ρ_a, Q ρ_b Q^T))   (26)

Here the alignment step adds correspondence information, allowing for a more fine-grained discrimination between structures.

In both cases, the density matrix of smaller size is padded to the size of the larger one before computing the divergence. Note that this is equivalent to padding the adjacency matrix of the smaller graph to the same size as the larger one, since both these operations simply increase the dimension of the kernel (null space) of the density matrix.
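The padding step can be sketched as follows (a minimal sketch; `pad_density_matrix` is a hypothetical helper name). Since the added rows and columns are zero, the extra eigenvalues are zero, so neither the trace nor the von Neumann entropy changes:

```python
import numpy as np

def pad_density_matrix(rho, n):
    # embed rho as the top-left block of an n x n zero matrix
    m = rho.shape[0]
    out = np.zeros((n, n))
    out[:m, :m] = rho
    return out
```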

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state (ρ_a + ρ_b)/2. However, this is a computationally hard task, and so we perform a two-step approximation to the problem. First, we approximate the solution by minimizing the Frobenius (L2) distance between the matrices, rather than the von Neumann entropy. Second, we find an approximate solution to the L2 problem using Umeyama's method [29].

The rationale behind this approximate method is given by the fact that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

L ρ_a = (Σ_{λ∈Λ̃} λ P_λ P_λ^T)(Σ_{λ∈Λ̃} P_λ ρ_0 P_λ^T) = Σ_{λ∈Λ̃} P_λ λ ρ_0 P_λ^T = (Σ_{λ∈Λ̃} P_λ ρ_0 P_λ^T)(Σ_{λ∈Λ̃} λ P_λ P_λ^T) = ρ_a L   (27)

where ρ_0 denotes the density matrix for G at time t = 0, and P_λ = Σ_{k=1}^{μ(λ)} φ_λk φ_λk^T is the projection operator onto the subspace spanned by the μ(λ) eigenvectors φ_λk associated with the eigenvalue λ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix L = Φ Λ Φ^T, if we express the density matrix in the eigenvector basis given by Φ, then it assumes a block diagonal form, where each block corresponds to an eigenspace of L associated with a single eigenvalue. As a consequence, if L has all distinct eigenvalues, then ρ_a will have the same eigenvectors as L. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity O(Σ_{λ∈Λ̃} μ(λ)²), where μ(λ) denotes the multiplicity of the eigenvalue λ.
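A rough numpy-only sketch of the spectral alignment step is given below. Umeyama's method [29] scores vertex correspondences through the absolute eigenvector matrices and then solves a linear assignment problem; here a greedy row-by-row assignment stands in for the Hungarian algorithm, so this is an illustrative approximation rather than the exact procedure, and `spectral_alignment` is a hypothetical helper name:

```python
import numpy as np

def spectral_alignment(rho_a, rho_b):
    # Score vertex pairs by the similarity of their absolute eigenvector
    # components (Umeyama-style); eigh returns eigenvalues in ascending
    # order, so matching columns correspond to matching eigenvalues.
    _, ua = np.linalg.eigh(rho_a)
    _, ub = np.linalg.eigh(rho_b)
    score = np.abs(ua) @ np.abs(ub).T
    n = score.shape[0]
    q = np.zeros((n, n))
    used = set()
    for i in range(n):
        # greedy stand-in for the Hungarian algorithm
        j = max((j for j in range(n) if j not in used),
                key=lambda j: score[i, j])
        q[i, j] = 1.0
        used.add(j)
    return q  # permutation matrix Q with Q rho_b Q^T ~ rho_a
```

On a generic matrix with distinct eigenvalues, permuting the vertices and re-aligning recovers the original density matrix exactly.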

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the quantum Jensen–Shannon divergence is negative definite, then the unaligned quantum Jensen–Shannon kernel is positive definite.

Proof. First note that, for the computation of the quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus we can consider the computation of the kernel matrix over N graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38] the authors show that if s(Ga, Gb) is a conditionally positive semidefinite kernel, i.e., −s(Ga, Gb) is a negative definite kernel, then k_s = exp(λ s(Ga, Gb)) is positive semidefinite for any λ > 0.

Hence, if the quantum Jensen–Shannon divergence D_QJS(Ga, Gb) is negative definite, then −D_QJS(Ga, Gb) is conditionally positive semidefinite, and the quantum Jensen–Shannon kernel k_QJS(Ga, Gb) = exp(−μ D_QJS(Ga, Gb)) is positive definite.

Note that this proof is conditioned on the negative definiteness of the quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently this is only a conjecture, which has been proved to be true on pure quantum states and for which there is strong empirical evidence [39].
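In the same empirical spirit, one can check numerically that kernel matrices built from Eq. (24) on random mixed states are positive semidefinite; a small sketch (illustrative only — it probes the conjecture, it does not prove it):

```python
import numpy as np

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def qjsd(ra, rb):
    return vn_entropy((ra + rb) / 2) - 0.5 * vn_entropy(ra) - 0.5 * vn_entropy(rb)

def random_density_matrix(n, rng):
    a = rng.normal(size=(n, n))
    rho = a @ a.T                      # symmetric positive semidefinite
    return rho / np.trace(rho)         # unit trace

def min_kernel_eigenvalue(rhos, mu=1.0):
    # smallest eigenvalue of the kernel matrix K[i, j] = exp(-mu * QJSD)
    k = np.array([[np.exp(-mu * qjsd(ra, rb)) for rb in rhos] for ra in rhos])
    return float(np.linalg.eigvalsh(k).min())
```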

In general, for the optimal alignment kernel [10,11], we cannot guarantee positive definiteness of the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do, however, have a similar result when using any (possibly non-optimal) set of alignments that satisfy transitivity.

Corollary 3.2. If the quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data.

Another important property of a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering. First, it might depend on the order in which the different graphs are presented. Second, it might depend on the order in which the vertices are presented in each graph.

Since the kernel only depends on second order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will simply result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second condition, thus resulting in a kernel that is independent of the order of the graphs but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned quantum Jensen–Shannon kernel is independent of both the graph order and the vertex order.

Proof. Recall that, given two graphs Ga and Gb with mixed state density matrices ρ_a and ρ_b, the aligned kernel is

k_QJSA(Ga, Gb) = max_{Q∈Σ} exp(−μ D_QJS(ρ_a, Q ρ_b Q^T))   (28)

As a result, the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

H_N((ρ_a + Q ρ_b Q^T)/2) = H_N(Q ((Q^T ρ_a Q + ρ_b)/2) Q^T) = H_N((Q^T ρ_a Q + ρ_b)/2)   (29)

From the above, D_QJS(ρ_a, Q ρ_b Q^T) = D_QJS(Q^T ρ_a Q, ρ_b) = D_QJS(ρ_b, Q^T ρ_a Q).

Hence

k_QJSA(Ga, Gb) = max_{Q∈Σ} exp(−μ D_QJS(ρ_a, Q ρ_b Q^T)) = max_{Q∈Σ} exp(−μ D_QJS(ρ_b, Q ρ_a Q^T)) = k_QJSA(Gb, Ga)   (30)

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus the kernel is independent of the order of the graphs.
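The invariance argument of Eqs. (29) and (30) is easy to verify numerically; a quick sketch on random mixed states (illustrative only):

```python
import numpy as np

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def qjsd(ra, rb):
    return vn_entropy((ra + rb) / 2) - 0.5 * vn_entropy(ra) - 0.5 * vn_entropy(rb)

rng = np.random.default_rng(7)
a = rng.normal(size=(5, 5))
b = rng.normal(size=(5, 5))
rho_a = a @ a.T / np.trace(a @ a.T)    # random unit-trace mixed states
rho_b = b @ b.T / np.trace(b @ b.T)
q = np.eye(5)[rng.permutation(5)]      # random permutation matrix Q

# D(rho_a, Q rho_b Q^T) = D(Q^T rho_a Q, rho_b), as in Eq. (29)
lhs = qjsd(rho_a, q @ rho_b @ q.T)
rhs = qjsd(q.T @ rho_a @ q, rho_b)
```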

The independence of the vertex ordering again derives directly from the optimality of the value of the divergence. Let G′_b be a vertex-permuted version of Gb, such that its mixing matrix is ρ′_b = T ρ_b T^T for a permutation matrix T ∈ Σ, and assume that k_QJSA(Ga, G′_b) ≠ k_QJSA(Ga, Gb). Without loss of generality, we can assume k_QJSA(Ga, G′_b) > k_QJSA(Ga, Gb). Let

Q* = argmax_{Q∈Σ} exp(−μ D_QJS(ρ_a, Q ρ′_b Q^T))   (31)


then

exp(−μ D_QJS(ρ_a, Q* ρ′_b Q*^T)) = exp(−μ D_QJS(ρ_a, Q* T ρ_b T^T Q*^T)) > max_{Q∈Σ} exp(−μ D_QJS(ρ_a, Q ρ_b Q^T))   (32)

leading to a contradiction

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by the minimization of the Frobenius (L2) norm between the density matrices. This in turn is approximated using Umeyama's algorithm, since the resulting matching problem is still, in general, NP-hard. However, in the proof the optimality was used as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order invariant kernel, even if this is suboptimal.

Eq. (29) does not depend on the choice of optimal alignment, and so it would still hold under L2 optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If permutations Q1 and Q2 differ by the combination of symmetries present in the two graphs, i.e., Q1 = S Q2 R with S ∈ Aut(Ga) and R ∈ Aut(Gb), then clearly they will have the same divergence, since ρ_a = S ρ_a S^T and ρ_b = R ρ_b R^T, but this might not characterize all the L2-optimal permutations. Nonetheless, we have the following.

Theorem 3.4. If the L2-optimal permutation is unique modulo isomorphisms of the two graphs, then the L2 approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value for all the optimizers.

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.

3.3. Complexity analysis

For a graph dataset containing N graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on N graphs.

Input: A graph dataset G = {G1, …, Ga, …, Gb, …, GN} with N test graphs.
Output: An N × N kernel matrix K_QJS.
1. Compute the density matrix of each graph: for each graph Ga(Va, Ea), compute the density matrix evolving from time 0 to time T using Eq. (19).
2. Compute the density matrix for each pair of graphs: for each pair of graphs Ga(Va, Ea) and Gb(Vb, Eb) with density matrices ρ_a and ρ_b, compute (ρ_a + ρ_b)/2 using either the unaligned or the aligned procedure.
3. Compute the kernel matrix for the N graphs: for each pair of graphs Ga(Va, Ea) and Gb(Vb, Eb), compute the (Ga, Gb)-th element of K_QJS as k_QJS(Ga, Gb) using Eq. (24) or Eq. (26).

For N graphs, each of which has n vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity O(N n³); (b) the computation of the aligned and the unaligned density matrices for all pairs of graphs, which has time complexities O(N² n⁴) and O(N² n), respectively; and (c), as a result, the time complexities of the quantum kernel using the aligned and unaligned density matrices are O(N n³ + N² n⁴) and O(N n³ + N² n), respectively.
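Putting Algorithm 1 together for the unaligned kernel, a compact end-to-end sketch is given below. The density matrix uses the large-time mixture implied by Eq. (27) (one projector per Laplacian eigenspace), and the initial amplitudes are taken proportional to the vertex degrees — an illustrative assumption, since the paper's Eqs. (18) and (19) are defined earlier in the text:

```python
import numpy as np

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def density_matrix(adj):
    # rho = sum_lambda P_lambda rho_0 P_lambda^T: average the initial state
    # over the eigenspaces of the Laplacian (the large-time mixture)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj                 # Laplacian as the Hamiltonian
    evals, evecs = np.linalg.eigh(lap)
    psi0 = deg / np.linalg.norm(deg)         # assumed degree-weighted start
    rho0 = np.outer(psi0, psi0)
    rho = np.zeros((n, n))
    for lam in np.unique(np.round(evals, 9)):
        sel = np.isclose(evals, lam, atol=1e-9)
        p = evecs[:, sel] @ evecs[:, sel].T  # projector onto the eigenspace
        rho += p @ rho0 @ p
    return rho

def kernel_matrix(adjs, mu=1.0):
    # pad every density matrix to the largest graph, then apply Eq. (24)
    n_max = max(a.shape[0] for a in adjs)
    rhos = []
    for a in adjs:
        rho = np.zeros((n_max, n_max))
        rho[:a.shape[0], :a.shape[0]] = density_matrix(a)
        rhos.append(rho)
    m = len(rhos)
    k = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            d = vn_entropy((rhos[i] + rhos[j]) / 2) \
                - 0.5 * vn_entropy(rhos[i]) - 0.5 * vn_entropy(rhos[j])
            k[i, j] = np.exp(-mu * d)
    return k
```

The resulting matrix is symmetric with unit diagonal, since the divergence of a state with itself is zero.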

4. Experimental evaluation

In this section we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time t varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

4.1. Graph datasets

We demonstrate the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision. These datasets include MUTAG, PPIs, PTC(MR), COIL5 and Shock.

Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and the aim is to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex (4 PPIs) and Thermotoga (4 PPIs), from Aquifex aeolicus and Thermotoga maritima respectively; Gram-Positive (52 PPIs), from Staphylococcus aureus; Cyanobacteria (73 PPIs), from Anabaena variabilis; and Proteobacteria (40 PPIs), from Acidovorax avenae. There is an additional class (Acidobacteria, 46 PPIs), which has been more controversial in terms of the bacterial evolution order since its discovery. Here we select the Proteobacteria (40 PPIs) and Acidobacteria (46 PPIs) classes as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D

Table 1. Information on the graph-based datasets.

Datasets        MUTAG   PPIs    PTC     COIL5   Shock
Max # vertices  28      232     109     241     33
Min # vertices  10      3       2       72      4
Avg # vertices  17.93   109.60  25.60   144.90  13.16
# graphs        188     86      344     360     150
# classes       2       2       2       5       10


objects. In our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs based on the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are thus 72 graphs, one from each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluation of the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, …, 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t, we can compute the von Neumann entropy of the corresponding density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets separately. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.
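The entropy-versus-time trajectory can be reproduced in outline as follows. This sketch numerically averages the pure-state walk over [0, t] (the paper's Eq. (15) is not reproduced in this section), and assumes initial amplitudes proportional to the vertex degrees — at t = 0 the state is pure and the entropy is zero, and it grows as the walk mixes:

```python
import numpy as np

def avg_density_matrix(adj, t, steps=200):
    # approximate rho_bar(t) = (1/t) * integral_0^t |psi_s><psi_s| ds
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj                     # Laplacian as Hamiltonian
    evals, evecs = np.linalg.eigh(lap)
    psi0 = deg / np.linalg.norm(deg)             # assumed initial amplitudes
    c0 = evecs.T @ psi0                          # coefficients in eigenbasis
    rho = np.zeros((n, n), dtype=complex)
    for s in np.linspace(0.0, t, steps):
        psi = evecs @ (np.exp(-1j * evals * s) * c0)   # psi_s = e^{-iLs} psi_0
        rho += np.outer(psi, psi.conj())
    return rho / steps

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho).real
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())
```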

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup

We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random-walk graph kernel [3]. For our quantum kernel, we decide to let t → ∞. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism as 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), …, k(10)) with different subtree heights

Fig. 1. Evaluation of the von Neumann entropy with time t: (a) on the MUTAG dataset (mutagenic aromatic vs. mutagenic heteroaromatic), (b) on the PPIs dataset (Proteobacteria 40 PPIs vs. Acidobacteria 46 PPIs), (c) on the PTC(MR) dataset (first vs. second class) and (d) on the COIL5 dataset (box, onion, ship, tomato and bottle objects). In each panel, the x-axis is the time t from 0 to 50 and the y-axis is the mean entropy value per class.


h (h = 1, 2, …, 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times, and we report the average classification accuracies (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrices of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., an i5-3210M). Note that for the WL kernel the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).
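The per-class 90%/10% splitting can be sketched as follows (pure Python, illustrative only; `stratified_folds` is a hypothetical helper, not part of LIBSVM):

```python
import random

def stratified_folds(labels, n_folds=10, seed=0):
    # group sample indices by class, shuffle each class, then deal the
    # indices round-robin into n_folds test buckets, so every fold holds
    # roughly 10% of each class and the rest can be used for training
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % n_folds].append(i)
    return folds
```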

4.3.2. Results and discussion

(a) On the MUTAG dataset, the accuracies for all of the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel, and outperform those of the other

Table 2. Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG        PPIs          PTC(MR)      COIL5         Shock
QJSU      82.72 ± .44  69.50 ± 1.20  56.70 ± .49  70.11 ± .61   40.60 ± .92
QJSA      82.83 ± .50  73.37 ± 1.04  57.39 ± .46  69.75 ± .58   42.80 ± .86
WL        82.05 ± .57  78.50 ± 1.40  56.05 ± .51  33.16 ± 1.01  36.40 ± 1.00
SPGK      83.38 ± .81  61.12 ± 1.09  56.55 ± .53  69.66 ± .52   37.88 ± .93
JSGK      83.11 ± .80  57.87 ± 1.36  57.29 ± .41  69.13 ± .79   21.73 ± .76
BRWK      77.50 ± .75  53.50 ± 1.47  53.97 ± .31  14.63 ± .21   0.33 ± .37
RWGK      80.77 ± .72  55.00 ± .88   55.91 ± .37  20.80 ± .47   2.26 ± 1.01

Table 3. Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG   PPIs     PTC      COIL5    Shock
QJSU      20″     59″      10′46″   180′20″  14″
QJSA      10′30″  230′25″  160′40″  8h29′    32″
WL        4″      13″      11″      1′05″    3″
SPGK      1″      7″       1″       31″      1″
JSGK      1″      1″       1″       1″       1″
BRWK      2″      140′20″  3″       160′46″  8″
RWGK      46″     1′07″    20′35″   190′40″  23″

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N. In each panel, the y-axis is the runtime in seconds.


kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies for all of the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with, or outperforms, those of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds those of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds those of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform, or are competitive with, the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the graphs involved.

4.4.1. Experimental setup

We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n, and (b) the graph dataset size N.

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel, (b) log runtime vs. 3 log(n) using the QJSU kernel, (c) log runtime vs. 2 log(N) using the QJSA kernel and (d) log runtime vs. 2 log(N) using the QJSU kernel.


More specifically, we vary n = {10, 20, …, 300} and N = {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs in each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., an i5-3210M).

4.4.2. Experimental results

Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N, we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the results obtained with the QJSA kernel, and Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.
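Such scaling claims can be checked by fitting the slope of log runtime against log size; a small sketch (`scaling_exponent` is an illustrative helper, not code from the paper):

```python
import numpy as np

def scaling_exponent(sizes, runtimes):
    # slope of log(runtime) vs log(size): the empirical polynomial degree
    logx = np.log(np.asarray(sizes, dtype=float))
    logy = np.log(np.asarray(runtimes, dtype=float))
    slope, _ = np.polyfit(logx, logy, 1)
    return float(slope)
```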

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computation for evaluating the vertex correspondences.

5. Conclusion

In this paper, we have developed a novel graph kernel using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure, and we showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix of the mixed state we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.

[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: International Conference on Machine Learning, 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.

[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of ACM Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of ACM Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.

[26] L Bai ER Hancock A Torsello L Rossi A quantum JensenndashShannon graphkernel using the continuous-time quantum walk in Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR) 2013 pp 121ndash131

[27] L Rossi A Torsello ER Hancock A continuous-time quantum walk kernel forunattributed graphs in Proceedings of Graph-Based Representations inPattern Recognition (GbRPR) 2013 pp 101ndash110

[28] L Rossi A Torsello ER Hancock Attributed graph similarity from thequantum JensenndashShannon divergence in Proceedings of Similarity-BasedPattern Recognition (SIMBAD) 2013 pp 204ndash218

[29] S Umeyama An eigendecomposition approach to weighted graph matchingproblems IEEE Trans Pattern Anal Mach Intell 10 (5) (1988) 695ndash703

[30] Nan Hu Leonidas Guibas Spectral Descriptors for Graph MatchingarXiv13041572 2013

[31] J Kempe Quantum random walks an introductory overview Contemp Phys44 (2003) 307ndash327

[32] M Nielsen I Chuang Quantum Computation and Quantum InformationCambridge University Press 2010

L. Bai et al. / Pattern Recognition ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite w*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the BSc and MSc degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, PR China. He is currently pursuing the PhD degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximate reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his BSc, MSc and PhD in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his PhD in computer science from the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches, and game-theoretic models. Recently he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings, and has been on the program committees of various international conferences and workshops. He was a co-chair of the edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice in 2009. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the BSc, PhD and DSc degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007 and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology, and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and the Statistical, Syntactical and Structural Pattern Recognition workshop in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.


3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state (ρ_a + ρ_b)/2. However, this is a computationally hard task, and so we perform a two-step approximation to the problem. First, we approximate the solution by minimizing the Frobenius (L2) distance between the matrices, rather than the von Neumann entropy. Second, we find an approximate solution to the L2 problem using Umeyama's method [29].

The rationale behind this approximate method is that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

L ρ_a = ( Σ_{λ∈Λ̃} λ P_λ P_λ^T ) ( Σ_{λ∈Λ̃} P_λ ρ_0 P_λ^T )
      = Σ_{λ∈Λ̃} λ P_λ ρ_0 P_λ^T
      = ( Σ_{λ∈Λ̃} P_λ ρ_0 P_λ^T ) ( Σ_{λ∈Λ̃} λ P_λ P_λ^T )
      = ρ_a L,    (27)

where ρ_0 denotes the density matrix of G at time t = 0, and P_λ = Σ_{k=1}^{μ(λ)} φ_{λk} φ_{λk}^T is the projection operator onto the subspace spanned by the μ(λ) eigenvectors φ_{λk} associated with the eigenvalue λ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix L = Φ Λ Φ^T, if we express the density matrix in the eigenvector basis given by Φ, then it assumes a block diagonal form, where each block corresponds to the eigenspace of L associated with a single eigenvalue. As a consequence, if L has all distinct eigenvalues, then ρ_a shares its eigenvectors with L. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity O(Σ_{λ∈Λ̃} μ(λ)²), where μ(λ) denotes the multiplicity of the eigenvalue λ.
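As a concrete check of Eq. (27), the following NumPy sketch builds the large-time average density matrix of a continuous-time quantum walk on a small graph and verifies that it commutes with the Laplacian Hamiltonian. The degree-proportional initial state is an illustrative assumption of this sketch, not prescribed by the derivation above.

```python
import numpy as np

# Path graph on 4 vertices: Hamiltonian L = D - A (graph Laplacian).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
lam, Phi = np.linalg.eigh(L)          # eigenvalues ascending, orthonormal Phi

# Initial pure state |psi_0> with amplitudes proportional to vertex degree
# (an assumption), and its density matrix rho_0 at t = 0.
psi0 = A.sum(1) / np.linalg.norm(A.sum(1))
rho0 = np.outer(psi0, psi0)

# Large-time average state: rho = sum_lambda P_lambda rho_0 P_lambda,
# where P_lambda projects onto the eigenspace of eigenvalue lambda.
rho = np.zeros_like(rho0)
i = 0
while i < len(lam):
    j = i
    while j < len(lam) and lam[j] - lam[i] < 1e-8:
        j += 1                        # group repeated eigenvalues
    P = Phi[:, i:j] @ Phi[:, i:j].T   # projector onto one eigenspace
    rho += P @ rho0 @ P
    i = j

print(np.allclose(L @ rho, rho @ L))  # Eq. (27): rho commutes with L -> True
print(np.isclose(np.trace(rho), 1.0)) # rho is still a valid density matrix -> True
```

Because ρ commutes with L, it is block diagonal in the eigenbasis of L, which is exactly what makes Umeyama's spectral approach cheap to apply here.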

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the quantum Jensen–Shannon divergence is negative definite, then the unaligned quantum Jensen–Shannon kernel is positive definite.

Proof. First note that, for the computation of the quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus, we can consider the computation of the kernel matrix over N graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38] the authors show that if s(G_a, G_b) is a conditionally positive semidefinite kernel, i.e., −s(G_a, G_b) is a negative definite kernel, then k_s = exp(λ s(G_a, G_b)) is positive semidefinite for any λ > 0.

Hence, if the quantum Jensen–Shannon divergence D_QJS(G_a, G_b) is negative definite, then −D_QJS(G_a, G_b) is conditionally positive semidefinite, and the quantum Jensen–Shannon kernel k_QJS(G_a, G_b) = exp(−μ D_QJS(G_a, G_b)) is positive definite. □

Note that this proof is conditioned on the negative definiteness of the quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently this is only a conjecture, which has been proved true on pure quantum states and for which there is strong empirical evidence [39].
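The construction used in this proof can be sketched in a few lines: compute the quantum Jensen–Shannon divergence of two density matrices from their von Neumann entropies, then exponentiate to obtain the (unaligned) kernel value. The 2×2 density matrices below are toy stand-ins, and μ is the decay parameter of the exponential.

```python
import numpy as np

def von_neumann_entropy(rho):
    """H_N(rho) = -sum_i nu_i log2(nu_i) over the nonzero eigenvalues nu_i."""
    nu = np.linalg.eigvalsh(rho)
    nu = nu[nu > 1e-12]
    return float(-(nu * np.log2(nu)).sum())

def qjs_kernel(rho_a, rho_b, mu=1.0):
    """Unaligned kernel k = exp(-mu * D_QJS(rho_a, rho_b))."""
    d = von_neumann_entropy((rho_a + rho_b) / 2) \
        - 0.5 * (von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b))
    return float(np.exp(-mu * d))

# Toy density matrices: symmetric, positive semidefinite, unit trace.
rho_a = np.array([[0.7, 0.0], [0.0, 0.3]])
rho_b = np.array([[0.4, 0.1], [0.1, 0.6]])

print(qjs_kernel(rho_a, rho_a))              # identical states: divergence 0, kernel 1
print(0.0 < qjs_kernel(rho_a, rho_b) < 1.0)  # True for distinct states
```

Since the divergence is non-negative and vanishes only for identical states, the kernel value lies in (0, 1] and equals 1 on the diagonal of the kernel matrix.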

In general, for the optimal assignment kernel [10,11], we cannot guarantee positive definiteness for the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do, however, have a similar result when using any (possibly non-optimal) set of alignments that satisfies transitivity.

Corollary 3.2. If the quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data. □

Another important property for a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering. First, it might depend on the order in which the different graphs are presented. Second, it might depend on the order in which the vertices are presented within each graph.

Since the kernel only depends on second order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will simply result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second condition. This results in a kernel that is independent of the order of the graphs, but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned quantum Jensen–Shannon kernel is independent of both the graph and the vertex order.

Proof. Recall that, given two graphs G_a and G_b with mixed state density matrices ρ_a and ρ_b, the aligned kernel is

k_QJSA(G_a, G_b) = max_{Q∈Σ} exp( −μ D_QJS(ρ_a, Q ρ_b Q^T) ).    (28)

As a result, the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

H_N( (ρ_a + Q ρ_b Q^T)/2 ) = H_N( Q ((Q^T ρ_a Q + ρ_b)/2) Q^T ) = H_N( (Q^T ρ_a Q + ρ_b)/2 ).    (29)

From the above, D_QJS(ρ_a, Q ρ_b Q^T) = D_QJS(Q^T ρ_a Q, ρ_b) = D_QJS(ρ_b, Q^T ρ_a Q).

Hence

k_QJSA(G_a, G_b) = max_{Q∈Σ} exp( −μ D_QJS(ρ_a, Q ρ_b Q^T) )
                = max_{Q∈Σ} exp( −μ D_QJS(ρ_b, Q ρ_a Q^T) ) = k_QJSA(G_b, G_a).    (30)

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus the kernel is independent of the order of the graphs.

The independence of the vertex ordering again derives directly from the optimality of the value of the divergence. Let G′_b be a vertex-permuted version of G_b, such that its mixing matrix is ρ′_b = T ρ_b T^T for a permutation matrix T ∈ Σ, and assume that k_QJSA(G_a, G′_b) ≠ k_QJSA(G_a, G_b). Without loss of generality, we can assume k_QJSA(G_a, G′_b) > k_QJSA(G_a, G_b). Let

Q* = argmax_{Q∈Σ} exp( −μ D_QJS(ρ_a, Q ρ′_b Q^T) );    (31)


then

exp( −μ D_QJS(ρ_a, Q* ρ′_b Q*^T) ) = exp( −μ D_QJS(ρ_a, Q* T ρ_b T^T Q*^T) )
                                   > max_{Q∈Σ} exp( −μ D_QJS(ρ_a, Q ρ_b Q^T) ),    (32)

leading to a contradiction, since Q* T ∈ Σ. □

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by the minimization of the Frobenius (L2) norm between the density matrices, and this is in turn approximated using Umeyama's algorithm, since the resulting matching problem is still, in general, NP-hard. However, in the proof, optimality was used only as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order-invariant kernel, even if this is suboptimal.

Eq. (29) does not depend on the choice of the optimal alignment, and so it would still hold under L2 optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If two permutations Q_1 and Q_2 differ by a combination of the symmetries present in the two graphs, i.e., Q_1 = S Q_2 R with S ∈ Aut(G_a) and R ∈ Aut(G_b), then clearly they will have the same divergence, since ρ_a = S ρ_a S^T and ρ_b = R ρ_b R^T; but this might not characterize all the L2-optimal permutations. Nonetheless, we have the following.

Theorem 3.4. If the L2-optimal permutation is unique modulo isomorphisms of the two graphs, then the L2 approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value for all the optimizers. □

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.
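To make the approximation step concrete, here is a sketch of an Umeyama-style spectral alignment: a permutation approximately minimizing the Frobenius distance between two density matrices is obtained by matching the rows of the absolute eigenvector matrices with the Hungarian algorithm. This is a simplified reading of the approach in [29] using SciPy's assignment solver, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_align(rho_a, rho_b):
    """Permutation Q approximately minimizing ||rho_a - Q rho_b Q^T||_F."""
    _, Ua = np.linalg.eigh(rho_a)
    _, Ub = np.linalg.eigh(rho_b)
    S = np.abs(Ua) @ np.abs(Ub).T          # vertex-to-vertex similarity
    r, c = linear_sum_assignment(-S)       # Hungarian algorithm, maximizing
    Q = np.zeros_like(S)
    Q[r, c] = 1.0                          # permutation matrix
    return Q

# Sanity check: rho_b is a vertex-permuted copy of rho_a, so an optimal
# alignment should recover rho_a exactly.
rng = np.random.default_rng(0)
M = rng.random((5, 5))
rho_a = M @ M.T
rho_a /= np.trace(rho_a)                   # unit-trace PSD "density matrix"
T = np.eye(5)[rng.permutation(5)]          # random permutation matrix
rho_b = T @ rho_a @ T.T

Q = umeyama_align(rho_a, rho_b)
print(np.allclose(Q @ rho_b @ Q.T, rho_a))
```

Taking absolute values of the eigenvector matrices removes the sign ambiguity of the eigendecomposition; as the discussion above notes, the heuristic is not guaranteed to find the true Frobenius minimizer on graphs with symmetries or repeated eigenvalues.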

3.3. Complexity analysis

For a graph dataset consisting of N graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on N graphs.

Input: A graph dataset G = {G_1, ..., G_a, ..., G_b, ..., G_N} with N test graphs.
Output: An N × N kernel matrix K_QJS.
1. Computing the density matrix of each graph: for each graph G_a(V_a, E_a), compute the density matrix of the walk evolving from time 0 to time T, using the definition in Section 2.2.
2. Computing the mixed density matrix for each pair of graphs: for each pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b), with density matrices ρ_a and ρ_b, compute (ρ_a + ρ_b)/2 using either the unaligned or the aligned procedure.
3. Computing the kernel matrix for the N graphs: for each pair of graphs G_a(V_a, E_a) and G_b(V_b, E_b), compute the (G_a, G_b)-th element of K_QJS as k_QJS(G_a, G_b), using Eq. (24) or Eq. (26).

For N graphs, each of which has n vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity O(N n^3); (b) the computation of the aligned or the unaligned density matrices for all pairs of graphs, which has time complexity O(N^2 n^4) and O(N^2 n), respectively; and (c) the evaluation of the kernel matrix. As a result, the time complexities of the quantum kernel using the aligned and the unaligned density matrices are O(N n^3 + N^2 n^4) and O(N n^3 + N^2 n), respectively.
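The steps of Algorithm 1 for the unaligned variant can be sketched compactly as follows. The degree-proportional initial state and the large-time average density matrix are assumptions of this sketch, standing in for the paper's Eq. (15), and the toy input graphs are illustrative.

```python
import numpy as np

def vn_entropy(rho):
    # von Neumann entropy over the nonzero eigenvalues
    nu = np.linalg.eigvalsh(rho)
    nu = nu[nu > 1e-12]
    return float(-(nu * np.log2(nu)).sum())

def avg_density_matrix(A):
    """Step 1: large-time average density matrix of the quantum walk."""
    L = np.diag(A.sum(1)) - A
    lam, Phi = np.linalg.eigh(L)
    psi0 = A.sum(1) / np.linalg.norm(A.sum(1))   # degree-weighted start (assumed)
    rho0 = np.outer(psi0, psi0)
    rho, i = np.zeros_like(rho0), 0
    while i < len(lam):                          # one projector per eigenspace
        j = i
        while j < len(lam) and lam[j] - lam[i] < 1e-8:
            j += 1
        P = Phi[:, i:j] @ Phi[:, i:j].T
        rho += P @ rho0 @ P
        i = j
    return rho

def pad(rho, n):
    out = np.zeros((n, n))
    out[:rho.shape[0], :rho.shape[0]] = rho      # zero-pad to a common size
    return out

def qjs_kernel_matrix(adjs, mu=1.0):
    n = max(len(A) for A in adjs)
    rhos = [pad(avg_density_matrix(A), n) for A in adjs]
    K = np.empty((len(adjs), len(adjs)))
    for a, ra in enumerate(rhos):                # steps 2-3: mix and compare
        for b, rb in enumerate(rhos):
            d = vn_entropy((ra + rb) / 2) - 0.5 * (vn_entropy(ra) + vn_entropy(rb))
            K[a, b] = np.exp(-mu * d)
    return K

# Triangle, path P3 and star S4 as toy input graphs.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
p3  = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
s4  = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], float)
K = qjs_kernel_matrix([tri, p3, s4])
print(np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))  # True True
```

The double loop over graph pairs is what produces the O(N^2) factor in the complexity analysis above; the per-graph eigendecomposition gives the O(N n^3) term.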

4. Experimental evaluation

In this section, we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time t varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

4.1. Graph datasets

We show the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision. These datasets are MUTAG, PPIs, PTC(MR), COIL5 and Shock. Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and aims to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex4 and Thermotoga4, PPIs from Aquifex aeolicus and Thermotoga maritima, respectively; Gram-Positive52, PPIs from Staphylococcus aureus; Cyanobacteria73, PPIs from Anabaena variabilis; and Proteobacteria40, PPIs from Acidovorax avenae. There is an additional class (Acidobacteria46 PPIs) which is more controversial in terms of the bacterial evolution order. Here we select the Proteobacteria40 PPIs and the Acidobacteria46 PPIs as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D objects. In our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs based on the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph of the Voronoi polygons. Associated with each object there are thus 72 graphs, one per object view. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Table 1
Information on the graph-based datasets.

Datasets       MUTAG   PPIs     PTC     COIL5    Shock
Max vertices   28      232      109     241      33
Min vertices   10      3        2       72       4
Ave vertices   17.93   109.60   25.60   144.90   13.16
# graphs       188     86       344     360      150
# classes      2       2        2       5        10

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluation of the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, ..., 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t we can compute the von Neumann entropy of the density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets separately. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here, the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.
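The qualitative behaviour in these plots can be reproduced with a closed-form time average of the walk's density matrix, evaluated in the eigenbasis of the Laplacian Hamiltonian. The degree-proportional initial state and the averaging definition below are assumptions of this sketch, standing in for the paper's Eq. (15).

```python
import numpy as np

def vn_entropy(rho):
    nu = np.linalg.eigvalsh(rho)
    nu = nu[nu > 1e-12]
    return float(-(nu * np.log2(nu)).sum())

def time_averaged_rho(A, T):
    """rho_T = (1/T) * integral_0^T |psi_t><psi_t| dt, in closed form
    in the eigenbasis of the Laplacian Hamiltonian L = D - A."""
    L = np.diag(A.sum(1)) - A
    lam, Phi = np.linalg.eigh(L)
    c = Phi.T @ (A.sum(1) / np.linalg.norm(A.sum(1)))  # amplitudes in eigenbasis
    delta = lam[:, None] - lam[None, :]                # eigenvalue differences
    F = np.ones_like(delta, dtype=complex)
    nz = np.abs(delta) > 1e-12
    F[nz] = (np.exp(-1j * delta[nz] * T) - 1) / (-1j * delta[nz] * T)
    return Phi @ (np.outer(c, c) * F) @ Phi.T

star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]], float)
for T in (1.0, 5.0, 10.0, 50.0):
    print(T, round(vn_entropy(time_averaged_rho(star, T)), 4))
# The entropy starts near 0 (an almost pure state) and approaches the
# large-time mixed-state value as the averaging window T grows.
```

As the off-diagonal factors F decay like 1/T, the state dephases towards a mixture over the Laplacian eigenspaces, which is why the entropy curves in Fig. 1 flatten out at large t.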

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup
We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random walk graph kernel (RWGK) [3]. For our quantum kernel, we let t → ∞. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), ..., k(10)) with different subtree heights

Fig. 1. Evaluation of the von Neumann entropy with time t: (a) on the MUTAG dataset, (b) on the PPIs dataset, (c) on the PTC(MR) dataset and (d) on the COIL5 dataset. In each panel, the x-axis is the time t (from 0 to 50), the y-axis is the mean von Neumann entropy, and the different curves correspond to the different graph classes in the dataset.


h (h = 1, 2, ..., 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times and report the average classification accuracy (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrix of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210m). Note that for the WL kernel the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).
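This evaluation protocol can be reproduced with any C-SVM that accepts a precomputed kernel; the snippet below uses scikit-learn's SVC (a wrapper around LIBSVM, standing in for the authors' setup) on a toy kernel matrix and labels, since the actual kernel matrices are dataset-specific.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Toy stand-ins: 40 "graphs" with binary labels and a Gaussian kernel matrix.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=40)
X = rng.normal(size=(40, 5)) + 2.0 * y[:, None]
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# 10-fold cross-validation on the precomputed kernel matrix.
clf = SVC(kernel="precomputed", C=1.0)
scores = cross_val_score(clf, K, y, cv=10)   # rows/columns of K index the samples
print(f"{scores.mean():.2f} +/- {scores.std():.2f}")
```

With `kernel="precomputed"`, scikit-learn slices the kernel matrix itself for each fold, so no explicit feature vectors are ever needed; this is exactly what makes kernel methods applicable to graphs.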

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies of all the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet, the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel, and outperform those of the other

Table 2
Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG         PPIs           PTC(MR)       COIL5          Shock
QJSU       82.72 ± .44   69.50 ± 1.20   56.70 ± .49   70.11 ± .61    40.60 ± .92
QJSA       82.83 ± .50   73.37 ± 1.04   57.39 ± .46   69.75 ± .58    42.80 ± .86
WL         82.05 ± .57   78.50 ± 1.40   56.05 ± .51   33.16 ± 1.01   36.40 ± 1.00
SPGK       83.38 ± .81   61.12 ± 1.09   56.55 ± .53   69.66 ± .52    37.88 ± .93
JSGK       83.11 ± .80   57.87 ± 1.36   57.29 ± .41   69.13 ± .79    21.73 ± .76
BRWK       77.50 ± .75   53.50 ± 1.47   53.97 ± .31   14.63 ± .21    0.33 ± .37
RWGK       80.77 ± .72   55.00 ± .88    55.91 ± .37   20.80 ± .47    2.26 ± 1.01

Table 3
Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG   PPIs     PTC      COIL5    Shock
QJSU       20″     59″      1′46″    18′20″   14″
QJSA       1′30″   23′25″   16′40″   8h29′    32″
WL         4″      13″      11″      1′5″     3″
SPGK       1″      7″       1″       31″      1″
JSGK       1″      1″       1″       1″       1″
BRWK       2″      14′20″   3″       16′46″   8″
RWGK       46″     1′7″     2′35″    19′40″   23″

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N. In each panel, the y-axis is the runtime in seconds.


kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies of all the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with, or outperforms, that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform, or are competitive with, the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is higher than that of the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection, we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the graphs involved.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n, and (b) the graph dataset size N.

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel, (b) log runtime vs. 3 log(n) using the QJSU kernel, (c) log runtime vs. 2 log(N) using the QJSA kernel and (d) log runtime vs. 2 log(N) using the QJSU kernel.


More specifically, we vary n = {10, 20, ..., 300} and N = {1, 2, ..., 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrix of each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210m).

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N we also plot the log runtime versus 3 log(n) and 2 log(N), respectively: Fig. 3(a) and (c) shows the plots obtained with the QJSA kernel, while Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3, there is an approximately linear relationship between the log runtime and the corresponding log parameter (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the QJSA and QJSU quantum Jensen–Shannon graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales approximately quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computation to evaluate the vertex correspondences.

5 Conclusion

In this paper we have developed a novel graph kernel by usingthe quantum JensenndashShannon divergence and the continuous-time quantum walk on a graph Given a graph we evolved acontinuous-time quantum walk on its structure and we showedhow to associate a mixed quantum state with the vertices of thegraph From the density matrix for the mixed state we computedthe von Neumann entropy With the von Neumann entropies tohand the kernel between a pair of graphs was defined as afunction of the quantum JensenndashShannon divergence betweenthe corresponding density matrices Experiments on several stan-dard datasets demonstrate the effectiveness of the proposed graphkernel

Our future work includes extending the quantum graph kernelto hypergraphs Bai et al [44] have developed a hypergraph kernelby using the classical JensenndashShannon divergence and Ren et al[45] have explored the use of discrete-time quantum walks ondirected line graphs which can be used as structural representa-tions for hypergraphs It would thus be interesting to extend theseworks by using the quantum JensenndashShannon divergence tocompare the quantumwalks on the directed line graphs associatedwith a pair of hypergraphs

Conflict of interest statement

None declared

Acknowledgements

Edwin R Hancock is supported by a Royal Society WolfsonResearch Merit Award

We thank Prof Karsten Borgwardt and Dr Nino Shervashidzefor providing the Matlab implementation for the various graphkernel methods and Dr Geng Li for providing the graph datasetsWe also thank Prof Horst Bunke and Dr Peng Ren for theconstructive discussion and suggestions




then
\[
\exp\!\left(-\mu D_{QJS}\!\left(\rho_a,\, Q_n \rho_b' Q_n^T\right)\right)
= \exp\!\left(-\mu D_{QJS}\!\left(\rho_a,\, Q_n T \rho_b T^T Q_n^T\right)\right)
> \max_{Q \in \Sigma} \exp\!\left(-\mu D_{QJS}\!\left(\rho_a,\, Q \rho_b Q^T\right)\right) \qquad (32)
\]
leading to a contradiction.

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by minimizing the Frobenius (L2) norm between the density matrices; since the resulting matching problem is still, in general, NP-hard, this minimization is in turn approximated using Umeyama's algorithm. However, in the proof the optimality was used only as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order-invariant kernel, even if it is suboptimal.

Eq. (29) does not depend on the choice of the optimal alignment, and so it would still hold under L2 optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If the permutations Q1 and Q2 differ by a combination of the symmetries present in the two graphs, i.e., Q1 = S Q2 R with S ∈ Aut(Ga) and R ∈ Aut(Gb), then clearly they will have the same divergence, since ρa = S ρa S^T and ρb = R ρb R^T; but this might not characterize all the L2-optimal permutations. Nonetheless, we have the following.

Theorem 3.4. If the L2-optimal permutation is unique modulo isomorphisms of the two graphs, then the L2 approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value for all the optimizers. □

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.
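As an illustration, Umeyama's spectral heuristic [29] for approximating the Frobenius-optimal permutation can be sketched as follows. This is a minimal sketch rather than the paper's implementation; in particular, solving the final assignment step with `scipy.optimize.linear_sum_assignment` is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_alignment(A, B):
    """Approximate the permutation matrix Q minimizing ||A - Q B Q^T||_F
    for symmetric matrices A and B (Umeyama's eigendecomposition heuristic)."""
    _, Ua = np.linalg.eigh(A)
    _, Ub = np.linalg.eigh(B)
    # Vertex-to-vertex similarity built from the moduli of the eigenvector
    # components; the moduli remove the eigenvector sign ambiguity.
    M = np.abs(Ua) @ np.abs(Ub).T
    row, col = linear_sum_assignment(-M)   # maximize the total similarity
    Q = np.zeros_like(A, dtype=float)
    Q[row, col] = 1.0                      # Q maps vertex col[i] of B to vertex i of A
    return Q
```

When A and B are actually isomorphic and have simple spectra, the heuristic recovers an exact matching; in general it only approximates the Frobenius minimizer, which is exactly why the induced kernel value is not uniquely identified.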

3.3. Complexity analysis

For a graph dataset having N graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on N graphs.

Input: A graph dataset G = {G1, …, Ga, …, Gb, …, GN} with N test graphs.
Output: An N × N kernel matrix K_QJS.
1. Compute the density matrices of the graphs: for each graph Ga(Va, Ea), compute the density matrix evolving from time 0 to time T, using the definition in Section 2.2.
2. Compute the mixed density matrix for each pair of graphs: for each pair of graphs Ga(Va, Ea) and Gb(Vb, Eb) with density matrices ρa and ρb, compute (ρa + ρb)/2 using the unaligned or the aligned procedure.
3. Compute the kernel matrix for the N graphs: for each pair of graphs Ga(Va, Ea) and Gb(Vb, Eb), compute the (Ga, Gb)-th element of K_QJS as k_QJS(Ga, Gb), using Eq. (24) or Eq. (26).

For N graphs, each of which has n vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity O(Nn³); (b) the computation of the aligned and the unaligned density matrices for all pairs of graphs, which has time complexities O(N²n⁴) and O(N²n), respectively; (c) as a result, the time complexities of the quantum kernel using the aligned and the unaligned density matrices are O(Nn³ + N²n⁴) and O(Nn³ + N²n), respectively.
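To make steps 2 and 3 of Algorithm 1 concrete, the following sketch computes the unaligned kernel matrix from a list of precomputed density matrices. The exponential form k = exp(−μ·D_QJS) is an assumption used here for illustration, standing in for the kernel definitions of Eqs. (24) and (26) given earlier in the paper.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                 # drop numerically zero eigenvalues
    return float(-(w * np.log2(w)).sum())

def qjsd(rho_a, rho_b):
    """Quantum Jensen-Shannon divergence between two density matrices."""
    mix = 0.5 * (rho_a + rho_b)      # step 2 of Algorithm 1
    return von_neumann_entropy(mix) - 0.5 * (
        von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b))

def qjs_kernel_matrix(rhos, mu=1.0):
    """Step 3: the N x N kernel matrix, one entry per pair (the O(N^2) loop)."""
    N = len(rhos)
    K = np.eye(N)                    # qjsd(rho, rho) = 0, so the diagonal is 1
    for a in range(N):
        for b in range(a + 1, N):
            K[a, b] = K[b, a] = np.exp(-mu * qjsd(rhos[a], rhos[b]))
    return K
```

Note that this naive sketch spends an eigendecomposition on each pair for the mixture entropy; the two per-graph entropies can be precomputed once per graph.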

4. Experimental evaluation

In this section we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices with time t. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

4.1. Graph datasets

We show the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision. These datasets include MUTAG, PPIs, PTC(MR), COIL5 and Shock.

Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and aims to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex4 and Thermotoga4 PPIs from Aquifex aeolicus and Thermotoga maritima, respectively; Gram-Positive52 PPIs from Staphylococcus aureus; Cyanobacteria73 PPIs from Anabaena variabilis; and Proteobacteria40 PPIs from Acidovorax avenae. There is an additional class (Acidobacteria46 PPIs) which is more controversial in terms of the bacterial evolution since its members were discovered more recently. Here we select the Proteobacteria40 PPIs and the Acidobacteria46 PPIs as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D objects; in our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs using the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph of the Voronoi polygons. Associated with each object there are thus 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Table 1
Information on the graph-based datasets.

Datasets       MUTAG   PPIs     PTC     COIL5    Shock
Max vertices   28      232      109     241      33
Min vertices   10      3        2       72       4
Ave. vertices  17.93   109.60   25.60   144.90   13.16
# graphs       188     86       344     360      150
# classes      2       2        2       5        10

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluations on the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, …, 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t, we can compute the von Neumann entropy of the corresponding density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets separately. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here, the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.
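The entropy-versus-time experiment above can be reproduced in miniature as follows. The sketch uses the adjacency matrix as the walk Hamiltonian and takes the time-averaged density matrix over [0, T]; the square-root-of-degree initial amplitudes are an assumption of this sketch rather than the paper's exact initialisation.

```python
import numpy as np

def avg_density_matrix(A, T, steps=200):
    """Time-averaged density matrix of a continuous-time quantum walk on a
    graph with adjacency matrix A: rho(T) ~ (1/T) * integral of |psi_t><psi_t| dt."""
    d = A.sum(axis=1)
    psi0 = np.sqrt(d / d.sum())                 # assumed initial amplitudes
    lam, U = np.linalg.eigh(A.astype(float))
    c = U.T @ psi0                              # psi0 in the eigenbasis of H = A
    rho = np.zeros(A.shape, dtype=complex)
    for t in np.linspace(0.0, T, steps):
        psi = U @ (np.exp(-1j * lam * t) * c)   # psi_t = e^{-iHt} psi0
        rho += np.outer(psi, psi.conj())
    return rho / steps

def entropy(rho):
    """Von Neumann entropy of a density matrix, in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

# Entropy curve for a path graph on four vertices
path4 = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
curve = [entropy(avg_density_matrix(path4, T)) for T in (1.0, 10.0, 50.0)]
```

At T → 0 the averaged state is pure and the entropy is zero; as T grows the off-diagonal terms dephase and the entropy rises towards that of the fully dephased mixture, which is what makes the curves in Fig. 1 informative about graph structure.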

[Figure 1: four panels plotting the mean von Neumann entropy (y-axis) against the time t from 0 to 50 (x-axis), with one curve per class in each dataset.]

Fig. 1. Evaluations on the von Neumann entropy with time t: (a) on the MUTAG dataset; (b) on the PPIs dataset; (c) on the PTC(MR) dataset; and (d) on the COIL5 dataset.

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup
We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random-walk graph kernel (RWGK) [3]. For our quantum kernel, we decide to let t → ∞. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), …, k(10)) with different subtree heights h (h = 1, 2, …, 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times, and we report the average classification accuracy (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrices of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210m). Note that, for the WL kernel, the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).
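As a minimal stand-in for this evaluation protocol, the following sketch runs a 10-fold cross-validation directly on a precomputed kernel matrix. It uses a 1-nearest-neighbour classifier in the kernel-induced metric instead of LIBSVM's C-SVM, purely to keep the example self-contained; `kernel_knn_cv` and its inputs are illustrative names, not part of the paper's code.

```python
import numpy as np

def kernel_knn_cv(K, y, folds=10, seed=0):
    """10-fold cross-validated accuracy of a 1-NN classifier that uses only
    kernel values, via the induced distance d(a,b)^2 = K[a,a] + K[b,b] - 2K[a,b]."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    accuracies = []
    for f in range(folds):
        test = idx[f::folds]                      # every folds-th shuffled sample
        train = np.setdiff1d(idx, test)
        diag = np.diag(K)
        d2 = diag[test, None] + diag[None, train] - 2 * K[np.ix_(test, train)]
        pred = y[train[np.argmin(d2, axis=1)]]    # label of the nearest neighbour
        accuracies.append(float(np.mean(pred == y[test])))
    return float(np.mean(accuracies))
```

With a precomputed graph-kernel matrix K and class labels y, `kernel_knn_cv(K, y)` returns the mean accuracy over the 10 folds; a C-SVM with a precomputed kernel would slot into the same loop.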

Table 2
Accuracy comparisons (in % ± standard errors) on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG        PPIs          PTC(MR)      COIL5         Shock
QJSU      82.72 ± .44  69.50 ± 1.20  56.70 ± .49  70.11 ± .61   40.60 ± .92
QJSA      82.83 ± .50  73.37 ± 1.04  57.39 ± .46  69.75 ± .58   42.80 ± .86
WL        82.05 ± .57  78.50 ± 1.40  56.05 ± .51  33.16 ± 1.01  36.40 ± 1.00
SPGK      83.38 ± .81  61.12 ± 1.09  56.55 ± .53  69.66 ± .52   37.88 ± .93
JSGK      83.11 ± .80  57.87 ± 1.36  57.29 ± .41  69.13 ± .79   21.73 ± .76
BRWK      77.50 ± .75  53.50 ± 1.47  53.97 ± .31  14.63 ± .21   0.33 ± .37
RWGK      80.77 ± .72  55.00 ± .88   55.91 ± .37  20.80 ± .47   2.26 ± 1.01

Table 3
Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision (h: hours; ′: minutes; ″: seconds).

Datasets  MUTAG   PPIs      PTC      COIL5     Shock
QJSU      20″     59″       10′46″   180′20″   14″
QJSA      10′30″  230′25″   160′40″  8h29′     32″
WL        4″      13″       11″      1′05″     3″
SPGK      1″      7″        1″       31″       1″
JSGK      1″      1″        1″       1″        1″
BRWK      2″      140′20″   3″       160′46″   8″
RWGK      46″     1′07″     20′35″   190′40″   23″

[Figure 2: four panels plotting the runtime in seconds against the graph size n (panels a, b) and the data size N (panels c, d).]

Fig. 2. Runtime evaluation: (a) QJSA with graph size n; (b) QJSU with graph size n; (c) QJSA with data size N; and (d) QJSU with data size N.

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies for all of the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel, and outperform those of the other kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies for all of the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy. The accuracy of our QJSU quantum kernel is competitive with, or outperforms, that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSU quantum kernel achieves the highest accuracy, and the accuracy of our QJSA quantum kernel exceeds that of the remaining kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy. The accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform, or are competitive with, the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is usually better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix, computed through Umeyama's matching method, can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of the runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the associated graphs.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n, and (b) the graph dataset size N.

[Figure 3: four panels plotting the log runtime against 3 log(n) (panels a, b) and 2 log(N) (panels c, d); each shows an approximately linear trend.]

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel; (b) log runtime vs. 3 log(n) using the QJSU kernel; (c) log runtime vs. 2 log(N) using the QJSA kernel; and (d) log runtime vs. 2 log(N) using the QJSU kernel.


More specifically, we vary n ∈ {10, 20, …, 300} and N ∈ {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices. We report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices. We report the runtime for computing the kernel matrices of the graphs from each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210m).

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively. Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N, we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the results obtained with the QJSA kernel, while Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we obtain the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the QJSA and QJSU quantum Jensen–Shannon graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.

Furthermore we observe that the QJSU kernel is a little moreefficient than the QJSA kernel The reason for this is that the QJSAkernel requires extra computations for evaluating the vertexcorrespondences

5 Conclusion

In this paper we have developed a novel graph kernel by usingthe quantum JensenndashShannon divergence and the continuous-time quantum walk on a graph Given a graph we evolved acontinuous-time quantum walk on its structure and we showedhow to associate a mixed quantum state with the vertices of thegraph From the density matrix for the mixed state we computedthe von Neumann entropy With the von Neumann entropies tohand the kernel between a pair of graphs was defined as afunction of the quantum JensenndashShannon divergence betweenthe corresponding density matrices Experiments on several stan-dard datasets demonstrate the effectiveness of the proposed graphkernel

Our future work includes extending the quantum graph kernelto hypergraphs Bai et al [44] have developed a hypergraph kernelby using the classical JensenndashShannon divergence and Ren et al[45] have explored the use of discrete-time quantum walks ondirected line graphs which can be used as structural representa-tions for hypergraphs It would thus be interesting to extend theseworks by using the quantum JensenndashShannon divergence tocompare the quantumwalks on the directed line graphs associatedwith a pair of hypergraphs

Conflict of interest statement

None declared

Acknowledgements

Edwin R Hancock is supported by a Royal Society WolfsonResearch Merit Award

We thank Prof Karsten Borgwardt and Dr Nino Shervashidzefor providing the Matlab implementation for the various graphkernel methods and Dr Geng Li for providing the graph datasetsWe also thank Prof Horst Bunke and Dr Peng Ren for theconstructive discussion and suggestions

References

[1] B Scholkopf A Smola Learning with Kernels MIT Press 2002[2] D Haussler Convolution Kernels on Discrete Structures Technical Report UCS-

CRL-99-10 Santa Cruz CA USA 1999[3] H Kashima K Tsuda A Inokuchi Marginalized kernels between labeled

graphs in Proceedings of International Conference on Machine Learning(ICML) 2003 pp 321ndash328

[4] KM Borgwardt HP Kriegel Shortest-path kernels on graphs in Proceedingsof the IEEE International Conference on Data Mining (ICDM) 2005 pp 74ndash81

[5] N Shervashidze P Schweitzer EJ van Leeuwen K Mehlhorn KM BorgwardtWeisfeilerndashLehman graph kernels J Mach Learn Res 1 (2010) 1ndash48

[6] F Aziz RC Wilson ER Hancock Backtrackless walks on a graph IEEE TransNeural Netw Learn Syst 24 (2013) 977ndash989

[7] F Costa KD Grave Fast neighborhood subgraph pairwise distance kernel inProceedings of International Conference on Machine Learning (ICML) 2010pp 255ndash262

[8] Kriege Nils Mutzel Petra Subgraph matching Kernels for attributed graphs inProceedings of International Conference on Machine Learning (ICML) 2012

[9] Neumann Marion Novi Patricia Roman Garnett Kristian Kersting Efficientgraph kernels by randomization Machine Learning and Knowledge Discoveryin Databases Springer Berlin Heidelberg (2012) 378ndash393

[10] Holger Frohlich Jorg K Wegner Florian Sieker Andreas Zell Optimal assign-ment kernels for attributed molecular graphs in International Conference onMachine Learning 2005 pp 225ndash232

[11] Jean-Philippe Vert The Optimal Assignment Kernel is Not Positive DefinitearXiv08014061 2008

[12] Michel Neuhaus Horst Bunke Edit distance-based kernel functions forstructural pattern classification Pattern Recognit 39 (10) (2006) 1852ndash1863

[13] P Lamberti A Majtey A Borras M Casas A Plastino Metric character of thequantum JensenndashShannon divergence Phys Rev A 77 (2008) 052311

[14] L Bai ER Hancock Graph kernels from the JensenndashShannon divergence JMath Imaging Vis 47 (2013) 60ndash69

[15] LK Grover A fast quantum mechanical algorithm for database search inProceedings of the ACM Symposium on the Theory of Computing (STOC) 1996pp 212ndash219

[16] PW Shor Polynomial-time algorithms for prime factorization and discretelogarithms on a quantum computer SIAM J Comput 26 (1997) 1484ndash1509

[18] A Majtey P Lamberti D Prato JensenndashShannon divergence as a measure ofdistinguishability between mixed quantum states Phys Rev A 72 (2005)052310

[19] E Farhi S Gutmann Quantum computation and decision trees Phys Rev A 58(1998) 915

[20] D Aharonov A Ambainis J Kempe U Vazirani Quantumwalks on graphs inProceedings of ACM Theory of Computing (STOC) 2001 pp 50ndash59

[21] A Ambainis E Bach A Nayak A Vishwanath J Watrous One-dimensionalquantum walks in Proceedings of ACM Theory of Computing (STOC) 2001pp 60ndash69

[22] A Ambainis Quantum walks and their algorithmic applications Int JQuantum Inf 1 (2003) 507ndash518

[23] AM Childs E Farhi S Gutmann An example of the difference betweenquantum and classical random walks Quantum Inf Process 1 (2002) 35ndash53

[24] AM Childs Universal computation by quantum walk Phys Rev Lett 102 (18)(2009) 180501

[26] L Bai ER Hancock A Torsello L Rossi A quantum JensenndashShannon graphkernel using the continuous-time quantum walk in Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR) 2013 pp 121ndash131

[27] L Rossi A Torsello ER Hancock A continuous-time quantum walk kernel forunattributed graphs in Proceedings of Graph-Based Representations inPattern Recognition (GbRPR) 2013 pp 101ndash110

[28] L Rossi A Torsello ER Hancock Attributed graph similarity from thequantum JensenndashShannon divergence in Proceedings of Similarity-BasedPattern Recognition (SIMBAD) 2013 pp 204ndash218

[29] S Umeyama An eigendecomposition approach to weighted graph matchingproblems IEEE Trans Pattern Anal Mach Intell 10 (5) (1988) 695ndash703

[30] Nan Hu Leonidas Guibas Spectral Descriptors for Graph MatchingarXiv13041572 2013

[31] J Kempe Quantum random walks an introductory overview Contemp Phys44 (2003) 307ndash327

[32] M Nielsen I Chuang Quantum Computation and Quantum InformationCambridge University Press 2010

L Bai et al Pattern Recognition ∎ (∎∎∎∎) ∎∎∎ndash∎∎∎ 11

Please cite this article as L Bai et al A quantum JensenndashShannon graph kernel for unattributed graphs Pattern Recognition (2014)httpdxdoiorg101016jpatcog201403028i

[33] AF Martins NA Smith EP Xing PM Aguiar MA Figueiredo Nonextensiveinformation theoretic kernels on measures J Mach Learn Res 10 (2009)935ndash975

[34] WK Wootters Statistical distance and Hilbert space Phys Rev D 23 (2)(1981) 357

[35] G Lindblad Entropy information and quantum measurements CommunMath Phys 33 (4) (1973) 305ndash322

[36] D Bures An extension of Kakutanis theorem on infinite product measures tothe tensor product of semifinite Wn-algebras Trans Am Math Soc 135(1969) 199ndash212

[37] L Rossi A Torsello ER Hancock R Wilson Characterising graph symmetriesthrough quantum JensenndashShannon divergence Phys Rev E 88 (3) (2013)032806

[38] R Konder J Lafferty Diffusion kernels on graphs and other discrete inputspaces in Proceedings of International Conference on Machine Learning(ICML) 2002 pp 315ndash322

[39] B Jop P Harremos Properties of classical and quantum JensenndashShannondivergence Phys Rev A 79 (5) (2009) 052311

[40] AK Debnath RL Lopez de Compadre G Debnath AJ Shusterman C HanschStructure-activity relationship of mutagenic aromatic and heteroaromaticnitro compounds correlation with molecular orbital energies and hydropho-bicity J Med Chem 34 (1991) 786ndash797

[41] F Escolano ER Hancock MA Lozano Heat diffusion thermodynamic depthcomplexity of networks Phys Rev E 85 (2012) 036206

[42] G Li M Semerci B Yener MJ Zaki Effective graph classification based ontopological and label attributes Stat Anal Data Mining 5 (2012) 265ndash283

[43] C-C Chang C-J Lin LIBSVM A Library for Support Vector Machines Softwareavailable at langhttpwwwcsientuedutwcjlinlibsvmrang 2011

[44] L Bai ER Hancock P Ren A JensenndashShannon kernel for hypergraphs inProceedings of Structural Syntactic and Statistical Pattern Recognition-JointIAPR International Workshop (SSPRampSPR) 2012 pp 79ndash88

[45] P Ren T Aleksic D Emms RC Wilson ER Hancock Quantum walks Iharazeta functions and cospectrality in regular graphs Quantum Inf Process 10(2011) 405ndash417

Lu Bai received both the BSc and MSc degrees from Faculty of Information Technology Macau University of Science and Technology Macau SAR PR China He is currentlypursuing the PhD degree in the University of York York UK He is a member of British Machine Vision Association and Society for Pattern Recognition (BMVA) His currentresearch interests include structural pattern recognition machine learning soft computing and approximation reasoning especially in kernel methods and complexityanalysis on (hyper)graphs and networks

Luca Rossi received his BSc MSc and PhD in computer science from Ca Foscari University of Venice Italy He is now a Postdoctoral Research Fellow at the School ofComputer Science of the University of Birmingham His current research interests are in the areas of graph-based pattern recognition machine learning network science andcomputer vision

Andrea Torsello received his PhD in computer science at the University of York UK and is currently an assistant professor at Ca Foscari University of Venice Italy Hisresearch interests are in the areas of computer vision and pattern recognition in particular the interplay between stochastic and structural approaches and game-theoreticmodels Recently he co-edited a special issue of Pattern Recognition on ldquoSimilarity-based pattern recognitionrdquo He has published around 40 technical papers in refereedjournals and conference proceedings and has been in the program committees of various international conferences and workshops He was a co-chair of the edition of GbR awell-established IAPR workshop on Graph-based methods in Pattern Recognition held in Venice in 2009 Since November 2007 he holds a visiting research position at theInformation Technology Faculty of Monash University Australia


objects. In our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image, we first extract corner points using the Harris detector, and then establish Delaunay graphs based on the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints are very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

4.2. Evaluation of the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph we allow a continuous-time quantum walk to evolve with t = 0, 1, 2, ..., 50. As the walk evolves from time t = 0 to time t = 50, we compute the corresponding density matrix using Eq. (15). Thus, at each time t we can compute the von Neumann entropy of the density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets, respectively. The x-axis shows the time t from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.
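The quantity evaluated above can be sketched in a few lines. The sketch below (Python with NumPy, whereas the paper's experiments used Matlab) computes a discretized time-averaged density matrix for a continuous-time quantum walk and its von Neumann entropy. The choice of the adjacency matrix as the Hamiltonian and of degree-proportional initial amplitudes follows one common convention and is an assumption here, as is the discrete average over t = 0, ..., T standing in for the closed-form expression of Eq. (15).

```python
import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = -Tr(rho log2 rho), evaluated via the eigenvalues of rho
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]          # discard numerically zero eigenvalues
    return float(-np.sum(eigs * np.log2(eigs)))

def ctqw_density_matrix(A, T=50):
    """Discretized time-averaged density matrix of a continuous-time
    quantum walk over t = 0, 1, ..., T. Assumptions: Hamiltonian H = A,
    initial amplitudes proportional to the vertex degrees."""
    n = A.shape[0]
    degrees = A.sum(axis=1)
    psi0 = degrees / np.linalg.norm(degrees)
    evals, evecs = np.linalg.eigh(A.astype(float))  # spectral decomposition of H
    rho = np.zeros((n, n), dtype=complex)
    for t in range(T + 1):
        # |psi_t> = exp(-iHt)|psi_0>, computed in the eigenbasis of H
        psi_t = evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))
        rho += np.outer(psi_t, psi_t.conj())
    return rho / (T + 1)
```

Because the walk is unitary, each snapshot has unit norm, so the averaged matrix has unit trace and its entropy lies between 0 and log2(n).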

4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup

We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random-walk graph kernel (RWGK) [3]. For our quantum kernel we let t → ∞. For the Weisfeiler–Lehman subtree kernel we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., k(1), k(2), ..., k(10)) with different subtree heights


Fig. 1. Evaluation of the von Neumann entropy with time t: (a) on the MUTAG dataset; (b) on the PPIs dataset; (c) on the PTC(MR) dataset; and (d) on the COIL5 dataset.


h (h = 1, 2, ..., 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.
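To fix ideas, the divergence at the heart of the QJSU and QJSA kernels can be sketched compactly. The definition below is the standard quantum Jensen–Shannon divergence, QJSD(ρ1, ρ2) = S((ρ1 + ρ2)/2) − (S(ρ1) + S(ρ2))/2; zero-padding density matrices of different sizes and exponentiating the negative divergence to obtain a kernel value are illustrative assumptions, not necessarily the exact construction used in the paper.

```python
import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = -Tr(rho log2 rho), via the eigenvalues of the Hermitian rho
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]
    return float(-np.sum(eigs * np.log2(eigs)))

def qjsd(rho1, rho2):
    """Quantum Jensen-Shannon divergence between two density matrices."""
    n = max(rho1.shape[0], rho2.shape[0])
    pad = lambda r: np.pad(r, ((0, n - r.shape[0]), (0, n - r.shape[0])))
    r1, r2 = pad(rho1), pad(rho2)      # zero-pad to a common size (assumption)
    mixture = 0.5 * (r1 + r2)
    return von_neumann_entropy(mixture) - 0.5 * (
        von_neumann_entropy(r1) + von_neumann_entropy(r2))

def qjs_kernel(rho1, rho2):
    # One plausible way to turn the divergence into a similarity (assumption)
    return float(np.exp(-qjsd(rho1, rho2)))
```

The divergence is 0 for identical states and reaches 1 bit for orthogonal pure states, so the exponentiated kernel value stays in (0, 1].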

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times and report the average classification accuracies (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrices of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-core processor (i.e., i5-3210M). Note that for the WL kernel the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).
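With a precomputed kernel matrix, the 10-fold protocol above amounts to repeatedly slicing the matrix into train-vs-train and test-vs-train blocks before handing them to the C-SVM. A minimal sketch of that slicing (the fold logic only; the SVM itself would come from LIBSVM or a similar library, and the function name is ours):

```python
import numpy as np

def kfold_precomputed(K, y, n_folds=10, seed=0):
    """Yield (K_train, y_train, K_test, y_test) splits of a precomputed
    kernel matrix K. K_train holds train-vs-train kernel values and
    K_test holds test-vs-train values, the layout a precomputed-kernel
    SVM expects."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield (K[np.ix_(train, train)], y[train],
               K[np.ix_(test, train)], y[test])
```

Each fold uses 90% of the samples for training and 10% for testing, matching the protocol described above.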

4.3.2. Results and discussion

(a) On the MUTAG dataset, the accuracies of all the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel, and outperform those of the other

Table 2. Accuracy comparison (± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG          PPIs           PTC(MR)        COIL5          Shock
QJSU       82.72 ± .44    69.50 ± 1.20   56.70 ± .49    70.11 ± .61    40.60 ± .92
QJSA       82.83 ± .50    73.37 ± 1.04   57.39 ± .46    69.75 ± .58    42.80 ± .86
WL         82.05 ± .57    78.50 ± 1.40   56.05 ± .51    33.16 ± 1.01   36.40 ± 1.00
SPGK       83.38 ± .81    61.12 ± 1.09   56.55 ± .53    69.66 ± .52    37.88 ± .93
JSGK       83.11 ± .80    57.87 ± 1.36   57.29 ± .41    69.13 ± .79    21.73 ± .76
BRWK       77.50 ± .75    53.50 ± 1.47   53.97 ± .31    14.63 ± .21    0.33 ± .37
RWGK       80.77 ± .72    55.00 ± .88    55.91 ± .37    20.80 ± .47    2.26 ± 1.01

Table 3. Runtime comparison on graph datasets abstracted from bioinformatics and computer vision.

Datasets   MUTAG    PPIs      PTC       COIL5     Shock
QJSU       20″      59″       1′46″     18′20″    14″
QJSA       1′30″    23′25″    16′40″    8h29′     32″
WL         4″       13″       11″       1′5″      3″
SPGK       1″       7″        1″        31″       1″
JSGK       1″       1″        1″        1″        1″
BRWK       2″       14′20″    3″        16′46″    8″
RWGK       46″      1′7″      2′35″     19′40″    23″


Fig. 2. Runtime evaluation: (a) QJSA with graph size n; (b) QJSU with graph size n; (c) QJSA with data size N; and (d) QJSU with data size N.


kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies of all the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with or outperforms that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel outperforms that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel again achieves the highest accuracy, and the accuracy of our QJSU quantum kernel outperforms that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform or are competitive with the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix computed through Umeyama's matching method can reflect the precise correspondence information between pairs of vertices in the two graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of runtime, all the kernels can be computed in polynomial time on all the datasets. Our QJSU kernel is slower than the WL, SPGK and JSGK kernels, but faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). Our QJSA kernel is slower than all of the alternative kernels. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and either the structural complexity of the graphs or the number of graphs in the dataset.

4.4.1. Experimental setup

We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n and (b) the graph dataset size N.


Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel; (b) log runtime vs. 3 log(n) using the QJSU kernel; (c) log runtime vs. 2 log(N) using the QJSA kernel; and (d) log runtime vs. 2 log(N) using the QJSU kernel.


More specifically, we vary n = {10, 20, ..., 300} and N = {1, 2, ..., 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs from each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-core processor (i.e., i5-3210M).
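The timing protocol above can be sketched as follows (in Python rather than the Matlab used in the paper; the Erdős-Rényi generator and the function names are our assumptions, since the paper only states that the graphs are randomly generated):

```python
import time
import numpy as np

def random_adjacency(n, p=0.5, seed=0):
    # Symmetric 0/1 adjacency matrix of an Erdos-Renyi-style random graph
    rng = np.random.default_rng(seed)
    upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
    return upper + upper.T

def kernel_runtimes(kernel_fn, sizes):
    """CPU runtime of kernel_fn on one pair of random graphs per size."""
    runtimes = []
    for n in sizes:
        A1, A2 = random_adjacency(n, seed=1), random_adjacency(n, seed=2)
        start = time.process_time()
        kernel_fn(A1, A2)
        runtimes.append(time.process_time() - start)
    return runtimes
```

The same loop, run once over growing n and once over a growing number of graphs N, produces the two runtime curves reported in Fig. 2.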

4.4.2. Experimental results

Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the plots obtained with the QJSA kernel, and Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we obtain the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.
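The cubic and quadratic trends read off from Fig. 3 can be checked numerically by fitting the slope of log runtime against log size: a slope near 3 (respectively 2) confirms cubic (respectively quadratic) scaling. A small sketch (the helper name is ours):

```python
import numpy as np

def scaling_exponent(sizes, runtimes):
    """Least-squares slope of log(runtime) versus log(size).
    For runtime ~ c * size**k the fitted slope approaches k."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(runtimes), 1)
    return float(slope)
```

Applied to synthetic runtimes proportional to n**3 or N**2, the fitted slope recovers the exponent exactly, which is the linear relationship seen in the log-log plots of Fig. 3.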

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computations for evaluating the vertex correspondences.

5. Conclusion

In this paper we have developed a novel graph kernel by using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix for the mixed state we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel by using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for the constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCS-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.


[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite W*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure–activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the BSc and MSc degrees from Faculty of Information Technology Macau University of Science and Technology Macau SAR PR China He is currentlypursuing the PhD degree in the University of York York UK He is a member of British Machine Vision Association and Society for Pattern Recognition (BMVA) His currentresearch interests include structural pattern recognition machine learning soft computing and approximation reasoning especially in kernel methods and complexityanalysis on (hyper)graphs and networks

Luca Rossi received his BSc MSc and PhD in computer science from Ca Foscari University of Venice Italy He is now a Postdoctoral Research Fellow at the School ofComputer Science of the University of Birmingham His current research interests are in the areas of graph-based pattern recognition machine learning network science andcomputer vision

Andrea Torsello received his PhD in computer science at the University of York UK and is currently an assistant professor at Ca Foscari University of Venice Italy Hisresearch interests are in the areas of computer vision and pattern recognition in particular the interplay between stochastic and structural approaches and game-theoreticmodels Recently he co-edited a special issue of Pattern Recognition on ldquoSimilarity-based pattern recognitionrdquo He has published around 40 technical papers in refereedjournals and conference proceedings and has been in the program committees of various international conferences and workshops He was a co-chair of the edition of GbR awell-established IAPR workshop on Graph-based methods in Pattern Recognition held in Venice in 2009 Since November 2007 he holds a visiting research position at theInformation Technology Faculty of Monash University Australia

Edwin R Hancock received the BSc PhD and DSc degrees from the University of Durham Durham UK He is now a professor of computer vision in the Department ofComputer Science University of York York UK He has published nearly 150 journal articles and 550 conference papers He was the recipient of a Royal Society WolfsonResearch Merit Award in 2009 He has been a member of the editorial board of the IEEE Transactions on Pattern Analysis and Machine Intelligence Pattern RecognitionComputer Vision and Image Understanding and Image and Vision Computing His awards include the Pattern Recognition Society Medal in 1991 outstanding paper awardsfrom the Pattern Recognition Journal in 1997 and the best conference best paper awards from the Computer Analysis of Images and Patterns Conference in 2001 the AsianConference on Computer Vision in 2002 the International Conference on Pattern Recognition (ICPR) in 2006 British Machine Vision Conference (BMVC) in 2007 and theInternational Conference on Image Analysis and Processing in 2009 He is a Fellow of the International Association for Pattern Recognition the Institute of Physics theInstitute of Engineering and Technology and the British Computer Society He was appointed as the founding Editor-in-Chief of the Institute of Engineering amp TechnologyComputer Vision Journal in 2006 He was a General Chair for BMVC in 1994 and the Statistical Syntactical and Structural Pattern Recognition in 2010 Track Chair for ICPR in2004 and Area Chair at the European Conference on Computer Vision in 2006 and the Computer Vision and Pattern Recognition in 2008 He established the energyminimization methods in the Computer Vision and Pattern Recognition Workshop Series in 1997

L Bai et al Pattern Recognition ∎ (∎∎∎∎) ∎∎∎ndash∎∎∎12

Please cite this article as L Bai et al A quantum JensenndashShannon graph kernel for unattributed graphs Pattern Recognition (2014)httpdxdoiorg101016jpatcog201403028i

h (h = 1, 2, …, 10), respectively. Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform a 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times and report the average classification accuracies (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrix of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., an i5-3210m). Note that, for the WL kernel, the classification accuracies are the average accuracies over all the 10 matrices, and the runtime refers to that required for computing all the 10 matrices (see [5] for details).
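The evaluation protocol above can be sketched as follows. This is a minimal illustration (not the authors' code), assuming NumPy; the actual C-SVM call (e.g., LIBSVM in precomputed-kernel mode) is elided. The key detail when working with a precomputed kernel matrix K is that each fold trains on the train×train submatrix and tests on the test×train submatrix.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds, drawing ~1/n_folds of
    each class per fold (the paper's 90%/10% per-class protocol)."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for c in np.unique(labels):
        for i, j in enumerate(rng.permutation(np.where(labels == c)[0])):
            folds[i % n_folds].append(j)
    return [np.array(sorted(f)) for f in folds]

# Toy data: 100 samples, 2 balanced classes, and a dummy kernel matrix.
y = np.array([0] * 50 + [1] * 50)
K = np.eye(100)                      # stand-in for a precomputed graph kernel
folds = stratified_folds(y)
test = folds[0]                      # 10% held out
train = np.concatenate(folds[1:])    # 90% for training
K_train = K[np.ix_(train, train)]    # fed to the SVM as the training kernel
K_test = K[np.ix_(test, train)]      # rows: test samples, cols: training samples
```

In practice the fold loop rotates over all 10 folds and the whole procedure is repeated 10 times with different shuffles before averaging the accuracies.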

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies for all of the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel and outperform those of the other

Table 2
Accuracy comparisons (± standard errors) on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG        PPIs         PTC(MR)     COIL5        Shock
QJSU      82.72±.44    69.50±1.20   56.70±.49   70.11±.61    40.60±.92
QJSA      82.83±.50    73.37±1.04   57.39±.46   69.75±.58    42.80±.86
WL        82.05±.57    78.50±1.40   56.05±.51   33.16±1.01   36.40±1.00
SPGK      83.38±.81    61.12±1.09   56.55±.53   69.66±.52    37.88±.93
JSGK      83.11±.80    57.87±1.36   57.29±.41   69.13±.79    21.73±.76
BRWK      77.50±.75    53.50±1.47   53.97±.31   14.63±.21    0.33±.37
RWGK      80.77±.72    55.00±.88    55.91±.37   20.80±.47    2.26±1.01

Table 3
Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG   PPIs     PTC     COIL5    Shock
QJSU      20″     59″      1′46″   18′20″   14″
QJSA      1′30″   23′25″   16′40″  8h29′    32″
WL        4″      13″      11″     1′05″    3″
SPGK      1″      7″       1″      31″      1″
JSGK      1″      1″       1″      1″       1″
BRWK      2″      14′20″   3″      16′46″   8″
RWGK      46″     1′07″    2′35″   19′40″   23″

[Fig. 2 appears here: four panels plotting the runtime in seconds against the graph size n (panels a, b) and the data size N (panels c, d).]

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N.

kernels. (b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies for all of the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with, or outperforms, that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform, or are competitive with, the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels using the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the graph structure, is successful in capturing the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix computed through Umeyama's matching method can reflect the precise correspondence information between pairs of vertices in the graphs, while the unaligned density matrix ignores the correspondence information and is not invariant to permutations of the vertex order.

In terms of runtime, all the kernels can complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any alternative kernel. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

4.4. Computational evaluation

In this subsection, we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and the structural complexity or the number of the associated graphs.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n and (b) the graph dataset size N.

[Fig. 3 appears here: four panels plotting the log runtime against 3 log(n) (panels a, b) and 2 log(N) (panels c, d).]

Fig. 3. Log runtime evaluation: (a) log runtime vs. 3 log(n) using the QJSA kernel, (b) log runtime vs. 3 log(n) using the QJSU kernel, (c) log runtime vs. 2 log(N) using the QJSA kernel and (d) log runtime vs. 2 log(N) using the QJSU kernel.

More specifically, we vary n ∈ {10, 20, …, 300} and N ∈ {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs from each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., an i5-3210m).
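Random test graphs of the kind used above can be generated, for instance, as symmetric adjacency matrices of Erdős–Rényi-style random graphs. The paper does not specify the generation model, so the edge probability p in this sketch is an illustrative assumption.

```python
import numpy as np

def random_graph(n, p=0.5, seed=0):
    """Adjacency matrix of an undirected random graph on n vertices:
    each edge is included independently with probability p (assumed model)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)   # keep only i < j entries
    return (upper | upper.T).astype(int)           # symmetrize, no self-loops

A = random_graph(50)   # one test graph of the size used in Section 4.4
```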

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N, we also plot the log runtime versus 3 log(n) and 2 log(N), respectively. Fig. 3(a) and (c) shows the plots obtained with the QJSA kernel, and Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.
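The cubic scaling claimed above can be checked by fitting a straight line to the runtimes in log-log coordinates, as in Fig. 3: a slope close to 3 against log(n) indicates t(n) ∝ n³. The timings below are synthetic stand-ins, not the paper's measurements.

```python
import numpy as np

# Synthetic timings for a kernel whose cost grows as n^3.
n = np.array([50.0, 100.0, 150.0, 200.0, 250.0, 300.0])
t = 1e-6 * n ** 3                                  # stand-in for measured runtimes

# Fit log t = slope * log n + intercept; slope ~ 3 confirms cubic scaling.
slope, intercept = np.polyfit(np.log(n), np.log(t), 1)
```

The same check against log(N), with kernel-matrix construction time as t, would yield a slope close to 2, matching the quadratic growth in the dataset size.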

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computation for evaluating the vertex correspondences.

5. Conclusion

In this paper, we have developed a novel graph kernel using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix of the mixed state we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.
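For concreteness, the divergence underpinning the kernel can be computed directly from a pair of density matrices. The sketch below uses the standard definitions QJSD(ρ, σ) = S((ρ + σ)/2) − (S(ρ) + S(σ))/2 with the von Neumann entropy S(ρ) = −Tr(ρ log₂ ρ); it illustrates the quantities involved and is not the authors' implementation.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                  # drop numerically-zero eigenvalues
    return float(-np.sum(w * np.log2(w)))

def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence between two density matrices."""
    mix = von_neumann_entropy((rho + sigma) / 2)
    return mix - 0.5 * (von_neumann_entropy(rho) + von_neumann_entropy(sigma))

# Two commuting 2x2 density matrices (diagonal, so this reduces to the
# classical Jensen-Shannon divergence of the eigenvalue distributions).
rho = np.diag([1.0, 0.0])
sigma = np.diag([0.5, 0.5])
d = qjsd(rho, sigma)
```

The divergence is symmetric, vanishes only when the two states coincide, and is bounded above by 1, which is what makes it usable as the basis of a similarity measure between graphs.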

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.

[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite W*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksic, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the B.Sc. and M.Sc. degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, P.R. China. He is currently pursuing the Ph.D. degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximation reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his B.Sc., M.Sc. and Ph.D. in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his Ph.D. in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings and has been in the program committees of various international conferences and workshops. He was a co-chair of the 2009 edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the B.Sc., Ph.D. and D.Sc. degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007 and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and for Statistical, Syntactical and Structural Pattern Recognition in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.


[18] A Majtey P Lamberti D Prato JensenndashShannon divergence as a measure ofdistinguishability between mixed quantum states Phys Rev A 72 (2005)052310

[19] E Farhi S Gutmann Quantum computation and decision trees Phys Rev A 58(1998) 915

[20] D Aharonov A Ambainis J Kempe U Vazirani Quantumwalks on graphs inProceedings of ACM Theory of Computing (STOC) 2001 pp 50ndash59

[21] A Ambainis E Bach A Nayak A Vishwanath J Watrous One-dimensionalquantum walks in Proceedings of ACM Theory of Computing (STOC) 2001pp 60ndash69

[22] A Ambainis Quantum walks and their algorithmic applications Int JQuantum Inf 1 (2003) 507ndash518

[23] AM Childs E Farhi S Gutmann An example of the difference betweenquantum and classical random walks Quantum Inf Process 1 (2002) 35ndash53

[24] AM Childs Universal computation by quantum walk Phys Rev Lett 102 (18)(2009) 180501

[26] L Bai ER Hancock A Torsello L Rossi A quantum JensenndashShannon graphkernel using the continuous-time quantum walk in Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR) 2013 pp 121ndash131

[27] L Rossi A Torsello ER Hancock A continuous-time quantum walk kernel forunattributed graphs in Proceedings of Graph-Based Representations inPattern Recognition (GbRPR) 2013 pp 101ndash110

[28] L Rossi A Torsello ER Hancock Attributed graph similarity from thequantum JensenndashShannon divergence in Proceedings of Similarity-BasedPattern Recognition (SIMBAD) 2013 pp 204ndash218

[29] S Umeyama An eigendecomposition approach to weighted graph matchingproblems IEEE Trans Pattern Anal Mach Intell 10 (5) (1988) 695ndash703

[30] Nan Hu Leonidas Guibas Spectral Descriptors for Graph MatchingarXiv13041572 2013

[31] J Kempe Quantum random walks an introductory overview Contemp Phys44 (2003) 307ndash327

[32] M Nielsen I Chuang Quantum Computation and Quantum InformationCambridge University Press 2010

L Bai et al Pattern Recognition ∎ (∎∎∎∎) ∎∎∎ndash∎∎∎ 11

Please cite this article as L Bai et al A quantum JensenndashShannon graph kernel for unattributed graphs Pattern Recognition (2014)httpdxdoiorg101016jpatcog201403028i

[33] AF Martins NA Smith EP Xing PM Aguiar MA Figueiredo Nonextensiveinformation theoretic kernels on measures J Mach Learn Res 10 (2009)935ndash975

[34] WK Wootters Statistical distance and Hilbert space Phys Rev D 23 (2)(1981) 357

[35] G Lindblad Entropy information and quantum measurements CommunMath Phys 33 (4) (1973) 305ndash322

[36] D Bures An extension of Kakutanis theorem on infinite product measures tothe tensor product of semifinite Wn-algebras Trans Am Math Soc 135(1969) 199ndash212

[37] L Rossi A Torsello ER Hancock R Wilson Characterising graph symmetriesthrough quantum JensenndashShannon divergence Phys Rev E 88 (3) (2013)032806

[38] R Konder J Lafferty Diffusion kernels on graphs and other discrete inputspaces in Proceedings of International Conference on Machine Learning(ICML) 2002 pp 315ndash322

[39] B Jop P Harremos Properties of classical and quantum JensenndashShannondivergence Phys Rev A 79 (5) (2009) 052311

[40] AK Debnath RL Lopez de Compadre G Debnath AJ Shusterman C HanschStructure-activity relationship of mutagenic aromatic and heteroaromaticnitro compounds correlation with molecular orbital energies and hydropho-bicity J Med Chem 34 (1991) 786ndash797

[41] F Escolano ER Hancock MA Lozano Heat diffusion thermodynamic depthcomplexity of networks Phys Rev E 85 (2012) 036206

[42] G Li M Semerci B Yener MJ Zaki Effective graph classification based ontopological and label attributes Stat Anal Data Mining 5 (2012) 265ndash283

[43] C-C Chang C-J Lin LIBSVM A Library for Support Vector Machines Softwareavailable at langhttpwwwcsientuedutwcjlinlibsvmrang 2011

[44] L Bai ER Hancock P Ren A JensenndashShannon kernel for hypergraphs inProceedings of Structural Syntactic and Statistical Pattern Recognition-JointIAPR International Workshop (SSPRampSPR) 2012 pp 79ndash88

[45] P Ren T Aleksic D Emms RC Wilson ER Hancock Quantum walks Iharazeta functions and cospectrality in regular graphs Quantum Inf Process 10(2011) 405ndash417

Lu Bai received both the BSc and MSc degrees from Faculty of Information Technology Macau University of Science and Technology Macau SAR PR China He is currentlypursuing the PhD degree in the University of York York UK He is a member of British Machine Vision Association and Society for Pattern Recognition (BMVA) His currentresearch interests include structural pattern recognition machine learning soft computing and approximation reasoning especially in kernel methods and complexityanalysis on (hyper)graphs and networks

Luca Rossi received his BSc MSc and PhD in computer science from Ca Foscari University of Venice Italy He is now a Postdoctoral Research Fellow at the School ofComputer Science of the University of Birmingham His current research interests are in the areas of graph-based pattern recognition machine learning network science andcomputer vision

Andrea Torsello received his PhD in computer science at the University of York UK and is currently an assistant professor at Ca Foscari University of Venice Italy Hisresearch interests are in the areas of computer vision and pattern recognition in particular the interplay between stochastic and structural approaches and game-theoreticmodels Recently he co-edited a special issue of Pattern Recognition on ldquoSimilarity-based pattern recognitionrdquo He has published around 40 technical papers in refereedjournals and conference proceedings and has been in the program committees of various international conferences and workshops He was a co-chair of the edition of GbR awell-established IAPR workshop on Graph-based methods in Pattern Recognition held in Venice in 2009 Since November 2007 he holds a visiting research position at theInformation Technology Faculty of Monash University Australia

Edwin R Hancock received the BSc PhD and DSc degrees from the University of Durham Durham UK He is now a professor of computer vision in the Department ofComputer Science University of York York UK He has published nearly 150 journal articles and 550 conference papers He was the recipient of a Royal Society WolfsonResearch Merit Award in 2009 He has been a member of the editorial board of the IEEE Transactions on Pattern Analysis and Machine Intelligence Pattern RecognitionComputer Vision and Image Understanding and Image and Vision Computing His awards include the Pattern Recognition Society Medal in 1991 outstanding paper awardsfrom the Pattern Recognition Journal in 1997 and the best conference best paper awards from the Computer Analysis of Images and Patterns Conference in 2001 the AsianConference on Computer Vision in 2002 the International Conference on Pattern Recognition (ICPR) in 2006 British Machine Vision Conference (BMVC) in 2007 and theInternational Conference on Image Analysis and Processing in 2009 He is a Fellow of the International Association for Pattern Recognition the Institute of Physics theInstitute of Engineering and Technology and the British Computer Society He was appointed as the founding Editor-in-Chief of the Institute of Engineering amp TechnologyComputer Vision Journal in 2006 He was a General Chair for BMVC in 1994 and the Statistical Syntactical and Structural Pattern Recognition in 2010 Track Chair for ICPR in2004 and Area Chair at the European Conference on Computer Vision in 2006 and the Computer Vision and Pattern Recognition in 2008 He established the energyminimization methods in the Computer Vision and Pattern Recognition Workshop Series in 1997

L Bai et al Pattern Recognition ∎ (∎∎∎∎) ∎∎∎ndash∎∎∎12

Please cite this article as L Bai et al A quantum JensenndashShannon graph kernel for unattributed graphs Pattern Recognition (2014)httpdxdoiorg101016jpatcog201403028i

More specifically, we vary n = {10, 20, …, 300} and N = {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices, and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrices of the graphs from each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-core processor (i.e., an i5-3210M).

4.4.2. Experimental results

Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N we also plot the log runtime versus 3 log(n) and 2 log(N), respectively: Fig. 3(a) and (c) shows the results obtained with the QJSA kernel, and Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log(n) and 2 log(N)).
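The linearity check can be reproduced by a least-squares fit of the log runtime against log n, where the fitted slope estimates the polynomial order. A minimal sketch on synthetic, exactly cubic timings (hypothetical numbers standing in for the measured QJSA/QJSU runtimes):

```python
import numpy as np

# Hypothetical runtimes following an exact cubic law t = c * n^3,
# standing in for the measured kernel-computation timings.
n = np.array([10, 20, 50, 100, 200, 300], dtype=float)
t = 1e-7 * n ** 3

# Fit log t = slope * log n + intercept; a slope near 3 corresponds to
# the approximately cubic scaling observed in the runtime plots.
slope, intercept = np.polyfit(np.log(n), np.log(t), 1)
print(round(slope, 2))  # -> 3.0
```

On the real (noisy) timings the fitted slope would only be approximately 3, which is exactly the linear trend the log-log plots make visible.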

From Figs. 2 and 3 we draw the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computations for evaluating the vertex correspondences.

5. Conclusion

In this paper, we have developed a novel graph kernel by using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix for the mixed state, we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.
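As a concrete illustration of this pipeline (quantum walk → density matrix → von Neumann entropy → divergence), the sketch below computes a quantum Jensen–Shannon similarity between two adjacency matrices. It is a simplified stand-in for the paper's method, not its exact definition: the Hamiltonian is assumed to be the adjacency matrix, the initial state is uniform (the paper derives the initial amplitudes from vertex degrees), both graphs are assumed to have the same number of vertices, and the divergence is mapped to a similarity via exp(−QJSD), which is one plausible choice.

```python
import numpy as np

def avg_density_matrix(A):
    # Long-time average of the walk's density matrix, using the closed form
    # rho_bar = sum_k P_k rho(0) P_k over the eigenprojectors P_k of H.
    # Assumptions: Hamiltonian H = adjacency matrix, uniform initial state.
    n = A.shape[0]
    evals, evecs = np.linalg.eigh(A)
    psi0 = np.ones(n) / np.sqrt(n)
    rho0 = np.outer(psi0, psi0)
    rho = np.zeros((n, n))
    groups = np.round(evals, 8)          # group (near-)degenerate eigenvalues
    for lam in np.unique(groups):
        P = evecs[:, groups == lam]
        Pk = P @ P.T                     # projector onto the eigenspace of lam
        rho += Pk @ rho0 @ Pk
    return rho

def von_neumann_entropy(rho):
    # S(rho) = -sum_i lambda_i log2(lambda_i) over the eigenvalues of rho.
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]               # drop numerically zero eigenvalues
    return float(-np.sum(lam * np.log2(lam)))

def qjs_kernel(A1, A2):
    # QJSD(rho, sigma) = S((rho+sigma)/2) - (S(rho) + S(sigma)) / 2,
    # turned into a similarity via exp(-QJSD) (an assumed choice).
    rho, sigma = avg_density_matrix(A1), avg_density_matrix(A2)
    d = von_neumann_entropy((rho + sigma) / 2) \
        - 0.5 * (von_neumann_entropy(rho) + von_neumann_entropy(sigma))
    return float(np.exp(-d))

P3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
K3 = np.ones((3, 3)) - np.eye(3)                               # triangle
print(qjs_kernel(P3, P3))  # identical graphs: divergence 0, kernel value 1.0
```

For identical graphs the divergence vanishes and the kernel value is 1; for distinct graphs it lies strictly between 0 and 1. The O(n³) eigendecomposition in `avg_density_matrix` is also what makes the cubic runtime scaling reported above plausible.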

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel by using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for their constructive discussions and suggestions.

References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.
[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCS-CRL-99-10, Santa Cruz, CA, USA, 1999.
[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.
[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.
[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.
[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.
[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.
[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.
[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.
[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.
[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.
[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.
[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.
[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.
[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.
[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.
[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.
[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.
[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 2001, pp. 50–59.
[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of the ACM Symposium on Theory of Computing (STOC), 2001, pp. 60–69.
[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.
[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.
[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.
[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.
[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.
[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.
[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.
[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.
[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.
[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.


[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.
[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.
[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.
[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite w*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.
[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.
[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.
[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.
[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure–activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.
[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.
[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.
[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.
[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.
[45] P. Ren, T. Aleksić, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the B.Sc. and M.Sc. degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, P.R. China. He is currently pursuing the Ph.D. degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximation reasoning, especially in kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his B.Sc., M.Sc. and Ph.D. in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his Ph.D. in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition". He has published around 40 technical papers in refereed journals and conference proceedings, and has been on the program committees of various international conferences and workshops. He was a co-chair of the edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice in 2009. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the B.Sc., Ph.D. and D.Sc. degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best conference paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007 and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and the Statistical, Syntactical and Structural Pattern Recognition workshop in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.

Please cite this article as: L. Bai, et al., A quantum Jensen–Shannon graph kernel for unattributed graphs, Pattern Recognition (2014), http://dx.doi.org/10.1016/j.patcog.2014.03.028
