An Edit Distance between Quotiented Trees

DOI: 10.1007/s00453-002-1002-5

Algorithmica (2003) 36: 1–39
© 2003 Springer-Verlag New York Inc.

Pascal Ferraro1 and Christophe Godin1

Abstract. In this paper we propose a dynamic programming algorithm to compare two quotiented trees using a constrained edit distance. A quotiented tree is a tree defined with an additional equivalence relation on vertices and such that the quotient graph is also a tree. The core of the method relies on an adaptation of an algorithm recently proposed by Zhang for comparing unordered rooted trees. This method is currently being used in plant architecture modelling to quantify different types of variability between plants represented by quotiented trees.

Key Words. Quotiented tree, Edit distance matching, Dynamic programming.

1. Introduction. In the early seventies, Wagner and Fischer proposed an algorithm which computes the distance between two strings of characters as the minimum cost sequence of elementary operations needed to transform one of the strings into the other [1]. Using two strings of characters $A$ and $B$ of respective lengths $N_A$ and $N_B$, a set of elementary operators on strings, called edit operations, and a cost associated with each edit operation, Wagner and Fischer defined a distance between two strings as the cost of the sequence of edit operations that transforms $A$ into $B$ with minimum cost. The Wagner and Fischer distance makes use of the dynamic programming principle to achieve an algorithm with quadratic complexity, i.e. in $O(N_A \cdot N_B)$. Selkow [2], then Tai [3] and Lu [4], generalized this approach, based on edit operations, to define and compute metrics on labelled ordered trees. These algorithms have been used over recent decades in computer science and in various applied fields, such as evolutionary biology [5], chemistry [6] and molecular biology [7].
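The string edit distance recalled above can be illustrated by a minimal Python sketch. The function name, the unit substitution cost and the constant insertion/deletion cost are assumptions made for this illustration; the paper only requires that a cost be attached to each edit operation.

```python
def string_edit_distance(a, b, sub_cost=lambda x, y: 0 if x == y else 1, indel_cost=1):
    """Minimum total cost of edit operations turning string a into string b."""
    n_a, n_b = len(a), len(b)
    # dist[i][j] = cost of transforming the prefix a[:i] into the prefix b[:j]
    dist = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for i in range(1, n_a + 1):
        dist[i][0] = i * indel_cost          # delete the first i characters of a
    for j in range(1, n_b + 1):
        dist[0][j] = j * indel_cost          # insert the first j characters of b
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                         # deletion
                dist[i][j - 1] + indel_cost,                         # insertion
                dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),   # substitution
            )
    return dist[n_a][n_b]
```

The two nested loops fill a table of $(N_A + 1) \times (N_B + 1)$ cells with constant work per cell, which is where the $O(N_A \cdot N_B)$ bound comes from.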

Zhang [8] extended these dynamic programming-based algorithms to define a distance for unordered labelled trees. In unordered trees, no ordering is considered for the set of sons of any vertex. This algorithm has recently been applied in plant modelling applications to compute a distance between individual plants whose topology is represented by unordered trees [9]. However, to take account of the multiscale nature of plant structures [10], plants are currently represented by quotiented trees [10]. A quotiented tree is a tree with an equivalence relation defined on the set of vertices, and such that the resulting quotient graph is also a tree. A quotiented tree can thus be considered as a self-similar structure represented by trees on two different scales.

In this paper we define a distance between quotiented trees based on the computation of an optimal sequence of edit operations that preserves equivalence relations on tree vertices. In Section 2 basic definitions concerning trees and quotiented trees are introduced. In Section 3 we first recall how sequences of edit operations can be modelled using mappings between tree vertices [3]. Zhang's algorithm is then presented and reformulated in terms of recursive relations between sets of mappings in order to prepare its extension to quotiented trees, carried out in Section 4. The properties of mappings between quotiented trees that preserve equivalence relations, called valid edit distance mappings, are then studied and lead to new recursive equations. As in Zhang's algorithm, we show that these equations contain terms that can be computed as particular minimum cost maximum flow problems. Finally, a dynamic programming algorithm that computes a structural distance between two quotiented trees in polynomial time is presented.

1 UMR CNRS-INRA-CIRAD Botanique et Bio-informatique de l’Architecture des Plantes TA 40/PS2, Boulevard de la Lironde, 34398 Montpellier Cedex 5, France. {ferraro,godin}@cirad.fr.

Received August 1, 2000; revised December 20, 2001. Communicated by H. N. Gabow. Online publication January 13, 2003.

2. Definitions and Notations. A finite directed graph (or simply a graph) is a pair $(V, E)$ where $V$ denotes a finite set of vertices and $E \subseteq V \times V$ denotes a finite set of edges. The number of vertices of a graph $G$ is denoted by $|G|$. If $e = (x, y)$ is an edge in $E$, $x$ and $y$ are incident with $e$. Vertex $x$ is called a father of $y$ and $y$ is a son of $x$. The set of sons of a vertex $v$ is denoted by $\mathrm{son}[v]$, its size is $n_v$, and $\deg(G) = \max_{v \in V} \{n_v\}$. For every $k$ in $\{1, \ldots, n_v\}$, $v_k$ denotes a son of $v$. A path (resp. a chain) from $x_1$ to $x_n$ is a sequence of vertices $(x_1, x_2, \ldots, x_n)$ such that for any two consecutive vertices $\{x_i, x_{i+1}\}$ of the sequence, $(x_i, x_{i+1})$ is an edge (resp. either $(x_i, x_{i+1})$ or $(x_{i+1}, x_i)$ is an edge). Vertex $v$ is an ancestor of vertex $w$, and reciprocally $w$ is a descendant of $v$, if a path exists from $v$ to $w$. The set of descendants of $v$ is denoted by $V[v]$ and contains $v$ itself. A cycle is a non-empty path from one vertex to itself. A graph with no cycle is called a directed acyclic graph. A sub-graph of a graph $G = (V, E)$ is a graph $G' = (V', E')$ such that $V' \subseteq V$ and $E' \subseteq E$. This is denoted by $G' \subseteq G$. Two vertices of a graph are connected if a chain exists between them. A graph is connected if any pair of vertices are connected. The connected components of a graph are the maximal (for graph inclusion) connected sub-graphs of this graph.

The ancestor relationship on a directed acyclic graph is a partial ordering relation on the set of vertices denoted by $\leq$. A tree is a connected graph such that there exists a unique vertex, called the root, which has no father, and any vertex different from the root is the son of exactly one vertex. A tree $T$ rooted in $v$ is denoted by $T[v]$. A tree contains no cycle. In a tree the set of common ancestors of any two vertices $x$ and $y$ contains at least the root vertex and is a totally ordered set (with respect to the ancestor relationship). The maximum element of this set is called the greatest common ancestor and is denoted by $x \wedge y$. If $S$ is any set of vertices of a tree, $\bigwedge_{x \in S}$ denotes the greatest common ancestor of all the vertices in $S$. The graph $\theta = (\emptyset, \emptyset)$ is called the empty tree. An unordered tree is a tree for which no ordering distinction is made among the sons of any vertex. A sub-tree is a connected sub-graph of a tree. If $x$ is any vertex of tree $T[v]$, $T[x] = (V[x], E[x])$ denotes the maximum sub-tree of $T[v]$ rooted in $x$. A forest is a graph whose connected components are trees. If $x$ is any vertex of tree $T[v]$, $F[x]$ denotes the forest rooted in $x$, i.e. obtained from $T[x]$ by removing the root $x$ and all the edges incident with $x$. A forest rooted in a vertex is thus defined as a set of trees and a given vertex $x$; therefore a tree and a forest are never equal, even if $x$ has only one son. Suppose that $x$ is the only son of $y$; then by definition $T[x] \neq F[y]$ and $T[x] \subset F[y]$. In the following the term forest is used to designate a rooted forest. The set of all sub-trees and forests rooted in a vertex of $T[v]$ is denoted by $\mathcal{S}(v) = \{S[x] \mid x \in V[v] \text{ and } S[x] = T[x] \text{ or } S[x] = F[x]\}$.
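The notions of descendant set $V[x]$, rooted forest $F[x]$ and greatest common ancestor $x \wedge y$ can be made concrete with a small Python sketch. The representation of a tree as a dictionary mapping each vertex to the list of its sons, and all function names, are assumptions chosen for this illustration only.

```python
def descendants(sons, x):
    """Return V[x]: x together with every vertex reachable from x."""
    found, stack = set(), [x]
    while stack:
        v = stack.pop()
        found.add(v)
        stack.extend(sons.get(v, []))
    return found


def forest_vertices(sons, x):
    """Vertex set of the rooted forest F[x]: the sub-tree T[x] without its root x."""
    return descendants(sons, x) - {x}


def greatest_common_ancestor(sons, root, x, y):
    """Return x ^ y, the deepest vertex that is an ancestor of both x and y."""
    current = root
    while True:
        for s in sons.get(current, []):
            below = descendants(sons, s)
            if x in below and y in below:
                current = s      # a deeper common ancestor exists, keep descending
                break
        else:
            return current       # no son is a common ancestor: current is x ^ y
```

Because $V[x]$ contains $x$ itself, a vertex counts as its own ancestor here, so `greatest_common_ancestor` returns `x` when `x` is an ancestor of `y`.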

A labelled graph is a graph $(V, E)$ together with a mapping $\alpha$ which associates a label from a finite (or infinite) set of labels $\Sigma = \{a, b, c, \ldots\}$ with each vertex in $V$. We assume in what follows that a distance $d$ is defined on $\Sigma$. $d$ enables us to define a distance between any two vertices $x$ and $y$ of labelled graphs: $d(x, y) = d(\alpha(x), \alpha(y))$.

A quotiented graph $H$ is a triple $(G, W, \pi)$ where $G = (V, E)$ is a directed graph called the support of $H$, $W$ is a set of vertices and $\pi$ is a surjective mapping from $V$ to $W$. For any vertex $x$ in $V$, the vertex $\pi(x)$ is called the complex of $x$ and reciprocally $x$ is a component of $\pi(x)$. $\pi^{-1}(z)$ denotes the set of components of a vertex $z$ of $W$ and if $x$ is a vertex of $V$, $\Pi(x)$ denotes the set $\pi^{-1}(\pi(x))$ of components of $\pi(x)$. The size of $\Pi(x)$ is denoted by $|\Pi(x)|$ and $\deg_\pi(H) = \max_{x \in V} \{|\Pi(x)|\}$. The function $\pi$ induces a partition $\Pi_H$ on $V$: $\Pi_H = \{\pi^{-1}(z) \mid z \in W\}$. The quotient graph $Q(H)$ associated with $H$ is the graph $(W, E_\pi)$ such that

$$\forall (x, y) \in E, \quad (\pi(x), \pi(y)) \in E_\pi \iff \pi(x) \neq \pi(y).$$

Quotiented graphs whose support and quotient graphs are trees are called quotiented trees. Let $H = (G, W, \pi)$ be a quotiented graph with support graph $G = (V, E)$ which is either a tree or a forest. Let $x \in V$, then $H[x]$ denotes the quotiented graph $(G[x], W[\pi(x)], \pi_{/x})$ where $G[x]$ is the sub-tree or the forest of $G$ rooted in $x$, $W[\pi(x)]$ is the set of vertices of the sub-tree of $Q(H)$ rooted in $\pi(x)$ and $\pi_{/x}$ is the restriction of $\pi$ to $V[x]$. If $G[x]$ is a tree, $H[x]$ is a quotiented tree.
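As an illustration of the definition of $Q(H)$, the following Python sketch builds the edge set $E_\pi$ from the support edges and the mapping $\pi$, and tests whether a graph is a tree in the sense of Section 2. The dictionary representation of $\pi$ and both function names are assumptions of this sketch; the empty tree $\theta$ is not handled.

```python
def quotient_edges(edges, pi):
    """Edge set of Q(H): one edge (pi(x), pi(y)) per support edge joining two distinct complexes."""
    return {(pi[x], pi[y]) for x, y in edges if pi[x] != pi[y]}


def is_tree(vertices, edges):
    """True if there is a unique root, every other vertex has exactly one father,
    and every vertex can be reached from the root."""
    fathers = {v: [] for v in vertices}
    sons = {v: [] for v in vertices}
    for x, y in edges:
        fathers[y].append(x)
        sons[x].append(y)
    roots = [v for v in vertices if not fathers[v]]
    if len(roots) != 1 or any(len(f) > 1 for f in fathers.values()):
        return False
    seen, stack = set(), [roots[0]]
    while stack:
        v = stack.pop()
        seen.add(v)
        stack.extend(sons[v])
    return seen == set(vertices)
```

A support tree equipped with a mapping $\pi$ is then a quotiented tree exactly when `is_tree` also accepts the vertex set $W$ with the edges returned by `quotient_edges`.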

3. Distance between Unordered Tree Graphs

3.1. Edit Distance Mappings. The tree-to-tree correction problem [11] consists in determining the distance between two trees measured by the minimum cost of the sequence of edit operations needed to transform one tree into the other. Based on definitions established by Wagner and Fischer [1], Tai [3] and Selkow [2], Zhang [12], [8] uses three edit operations: substitution, deletion and insertion:

• Substituting a vertex $x$ means changing the label of $x$.
• Deleting a vertex $x$ means making the sons of $x$ the sons of the father of $x$ and removing $x$.
• Inserting a vertex $x$ means that $x$ becomes a son of a vertex $y$ and a subset of sons of $y$ become the set of sons of $x$ (insertion is the complement of deletion).
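The three edit operations listed above can be sketched in Python on a tree stored as a dictionary of sons plus a dictionary of labels. This representation and the function names are assumptions of the sketch, and deleting the root is deliberately not handled.

```python
def substitute(labels, x, new_label):
    """Substitution: return a copy of the labelling in which x carries new_label."""
    result = dict(labels)
    result[x] = new_label
    return result


def delete_vertex(sons, x):
    """Deletion: the sons of x become sons of the father of x, and x is removed."""
    father = next(f for f, children in sons.items() if x in children)
    result = {v: list(children) for v, children in sons.items() if v != x}
    result[father] = [s for s in result[father] if s != x] + list(sons.get(x, []))
    return result


def insert_vertex(sons, y, x, adopted):
    """Insertion: x becomes a son of y and takes over the subset `adopted` of the sons of y."""
    result = {v: list(children) for v, children in sons.items()}
    kept = [s for s in result.get(y, []) if s not in adopted]
    result[y] = kept + [x]
    result[x] = list(adopted)
    return result
```

Each function returns a new dictionary rather than modifying its argument, so a sequence of edit operations can be replayed from the same starting tree.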

In order to characterize the effect of a sequence of edit operations on a tree, Tai [3] introduced a structure called edit distance mapping (EDM). An EDM from a tree $T_1[v]$ to a tree $T_2[w]$ is a partial mapping from $V_1[v]$ to $V_2[w]$, based on the notion of a trace between sequences [1]. Intuitively, an EDM is a description of how a sequence of edit operations transforms $T_1[v]$ into $T_2[w]$, ignoring the order in which the edit operations are applied. The relation between edit operations and EDMs is made explicit in [3] and [9].

DEFINITION 1. Let $T_1[v] = (V_1[v], E_1[v])$ and $T_2[w] = (V_2[w], E_2[w])$ be two trees, then an EDM $M$ from $T_1[v]$ to $T_2[w]$ is a set of ordered pairs of vertices $(z, t)$ of $V_1[v] \times V_2[w]$.

We recall that $T_1[v]$ and $T_2[w]$ are trees respectively rooted in $v$ and $w$. The same definition is used to define an EDM from a forest $F_1[v]$ to a forest $F_2[w]$. The set of EDMs from $T_1[v]$ to $T_2[w]$ is denoted by $\mathrm{EDM}(v, w)$.

Let $M$ be an EDM of $\mathrm{EDM}(v, w)$. By convention, in any pair $(z, t)$ of $M$, $z$ is called an image of $t$ by $M$ and reciprocally $t$ is called an image of $z$ by $M$. Similarly, if $z$ does not appear in a pair of $M$, we say that $z$ has no image.

Let $x$ be a vertex of $T_1[v]$ and let $y$ be a vertex of $T_2[w]$. $M_{x/y}$ (resp. $M_{y/x}$) denotes the set of vertices of $T_1[x]$ (resp. $T_2[y]$) which have an image by $M$ in $T_2[y]$ (resp. $T_1[x]$), and $\bar{M}_{x/y}$ (resp. $\bar{M}_{y/x}$) denotes its complement:

$$M_{x/y} = \{z_1 \in V_1[x] \mid \exists z_2 \in V_2[y];\ (z_1, z_2) \in M\},$$
$$M_{y/x} = \{z_2 \in V_2[y] \mid \exists z_1 \in V_1[x];\ (z_1, z_2) \in M\},$$
$$\bar{M}_{x/y} = V_1[x] \setminus M_{x/y},$$
$$\bar{M}_{y/x} = V_2[y] \setminus M_{y/x}.$$

$M_{x/w}$ and $M_{y/v}$ will be denoted by $M_x$ and $M_y$ when no confusion is possible.

Let $\hat{x}$ be the greatest common ancestor of the vertices of $T_2[w]$ which have an image in $T_1[x]$:

$$\hat{x} = \bigwedge_{y \in M_{w/x}} \{y\}.$$

Similarly, for any $y$ in $T_2[w]$, $\hat{y}$ defines a vertex in $T_1[v]$. Note that when $M_{x/w}$ is empty, $\hat{x}$ is not defined. Furthermore, note that if $\hat{x}$ exists, $\hat{x}$ is in $T_2[w]$ while $x$ is in $T_1[v]$, and that $\hat{x}$ is not necessarily an image of a vertex in $T_1[x]$. This function can be used to associate a mapping $\mathbf{M}_{12}$ from $\mathcal{S}_1(v)$ to $\mathcal{S}_2(w)$ with any $M$ in $\mathrm{EDM}(v, w)$.

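DEFINITION 2. Let $M$ be an EDM from $T_1[v]$ to $T_2[w]$. $\mathbf{M}_{12}$ is a mapping from $\mathcal{S}_1(v)$ to $\mathcal{S}_2(w)$ such that

$$\mathbf{M}_{12}: \mathcal{S}_1(v) \to \mathcal{S}_2(w), \qquad S_1[x] \mapsto \mathbf{M}_{12}(S_1[x]) = \begin{cases} \theta & \text{if } M_{x/w} = \emptyset, \\ T_2[\hat{x}] & \text{if } \hat{x} \in M_w, \\ F_2[\hat{x}] & \text{otherwise.} \end{cases}$$

Note that in this definition when $\hat{x}$ is not defined, then the image of $S_1[x]$ is the empty tree.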

Fig. 1. Definition of $\mathbf{M}$: $S_1[v] \mapsto \mathbf{M}(S_1[v])$. (a) and (b) illustrate the image of one tree $T_1$; (c) and (d) illustrate the image of one forest $F_1$. (a) The image of $T_1$ is a sub-tree of $T_2$. (b) The image of $T_1$ is a forest included in $T_2$; note that the root of $T_1$ has no image by $M$. (c) The image of $F_1$ is a tree included in $F_2$; note that vertices of $F_1$ which have an image by $M$ belong to one tree of the forest $F_1$. (d) The image of $F_1$ is a forest of $F_2$. Colour convention used throughout the paper: vertices (or trees) represented in black have an image by $M$; those in grey may or may not have an image by $M$; and those in white do not have any image.

Symmetrically, a mapping $\mathbf{M}_{21}$ can be defined from $\mathcal{S}_2(w)$ to $\mathcal{S}_1(v)$. When no confusion is possible, $\mathbf{M}_{12}$ and $\mathbf{M}_{21}$ are simply denoted by $\mathbf{M}$. Figure 1 illustrates the image by $\mathbf{M}$ of a tree and a forest. $\mathbf{M}$ gives a high-level interpretation of EDMs: whereas $M$ expresses a relationship between vertices, $\mathbf{M}$ expresses a corresponding relationship between trees (or forests) rooted in these vertices. $\mathbf{M}$ has the following important property:

PROPOSITION 1. $\mathbf{M}$ is an increasing mapping, i.e. for any $x$ and $y$ in $V_1[v]$, and for any $S_1[x]$ and $S_1[y]$ in $\mathcal{S}_1(v)$,

$$S_1[x] \subseteq S_1[y] \Rightarrow \mathbf{M}(S_1[x]) \subseteq \mathbf{M}(S_1[y]).$$

The converse of Proposition 1, which is not true in general (Figure 1), is true with additional assumptions (see Section 3.4). $\mathbf{M}$ enables us to work at the graph level, i.e. to express the algorithm properties in terms of relations between trees (or forests), while the original formulation of Zhang's algorithm was performed at the vertex level. This formulation will be used in what follows to extend the original comparison algorithm to quotiented trees.

3.2. Cost of EDMs. According to the definition of the elementary cost between vertices, a cost is assigned to each EDM $M$ from $T_1[v]$ to $T_2[w]$:

$$\gamma(M) = \sum_{(x, y) \in M} d(x, y).$$

A first dissimilarity measure can be defined as the minimum cost of an EDM from $T_1[v]$ to $T_2[w]$. However, this dissimilarity does not take account of vertices which have no image, and two trees of different sizes could thus be considered as similar. To account for these vertices, the cost of sets $\bar{M}_v$ and $\bar{M}_w$ can be added to the cost of an EDM. We define a symbol $\lambda$ not in $\Sigma$ and extend the distance $d$ so that $d$ is a distance over $\Sigma \cup \{\lambda\}$. The cost of inserting or deleting a vertex $x$ is denoted $d(x, \lambda)$ and is defined as $d(\alpha(x), \lambda)$.

The cost $\Gamma_{v,w}(M)$ of an EDM $M$ from $T_1[v]$ to $T_2[w]$ is thus defined as

$$\Gamma_{v,w}(M) = \gamma(M) + \gamma(\bar{M}_v) + \gamma(\bar{M}_w).$$

A dissimilarity measure between a tree $T_1[v]$ and a tree $T_2[w]$ can thus be defined as the minimum cost of an EDM from $T_1[v]$ to $T_2[w]$:

$$D(T_1[v], T_2[w]) = \min_{M \in \mathrm{EDM}(v, w)} \{\Gamma_{v,w}(M)\}.$$

When $d$ is a distance, $D$ is also a distance [13].
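The cost of an EDM, i.e. the cost of its pairs plus the cost of the vertices left without an image on either side, can be computed as in the following Python sketch. The argument layout, the use of `None` to stand for the symbol $\lambda$ and the function name are assumptions of this illustration.

```python
def edm_cost(mapping, vertices1, vertices2, label1, label2, d, LAMBDA=None):
    """Total cost of an EDM given as a set of pairs (x, y).

    vertices1, vertices2: vertex sets of the two trees;
    label1, label2: dictionaries vertex -> label;
    d: distance on labels extended to the symbol LAMBDA.
    """
    # Cost of the pairs of the mapping (substitutions)
    substitutions = sum(d(label1[x], label2[y]) for x, y in mapping)
    mapped1 = {x for x, _ in mapping}
    mapped2 = {y for _, y in mapping}
    # Vertices of the first tree without an image are deleted
    deletions = sum(d(label1[x], LAMBDA) for x in vertices1 - mapped1)
    # Vertices of the second tree without an image are inserted
    insertions = sum(d(LAMBDA, label2[y]) for y in vertices2 - mapped2)
    return substitutions + deletions + insertions
```

With a unit distance on labels, an EDM that pairs identically labelled vertices and leaves one vertex of the first tree unmapped therefore has cost 1.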

3.3. Valid EDMs. In the following we consider an analogous dissimilarity measure, restricted to EDMs preserving structural properties of the mapped trees. These are called valid EDMs:

DEFINITION 3 (Valid EDM). Let $T_1[v] = (V_1[v], E_1[v])$ and $T_2[w] = (V_2[w], E_2[w])$ be two trees. A valid EDM $M$ from $T_1[v]$ to $T_2[w]$ is a set of ordered pairs of vertices $(x, y) \in V_1[v] \times V_2[w]$ satisfying the constraints

$$\forall (x_1, x_2), (y_1, y_2) \in M,$$
$$x_1 = y_1 \iff x_2 = y_2, \qquad (1)$$
$$x_1 \leq y_1 \iff x_2 \leq y_2. \qquad (2)$$

Sets of valid EDMs are denoted by $\mathcal{T}(v, w)$. A valid EDM between two rooted forests $F_1[v]$ and $F_2[w]$ is defined in a similar manner. The set of valid EDMs from $F_1[v]$ to $F_2[w]$ is denoted by $\mathcal{F}(v, w)$.

A new dissimilarity measure between two tree graphs $T_1[v]$ and $T_2[w]$ can be defined as an optimization problem:

PROBLEM 1. Find $\Gamma_{v,w}(M)$ minimum, such that $M$ is a valid EDM from $T_1[v]$ to $T_2[w]$ satisfying constraints (1) and (2):

$$D(T_1[v], T_2[w]) = \min_{M \in \mathcal{T}(v, w)} \{\Gamma_{v,w}(M)\}.$$

Zhang [14] and Kilpeläinen and Mannila [15] showed that, for two trees, this definition of valid mapping leads to an NP-complete problem. To alleviate this difficulty, an algorithm which solves this problem in polynomial time has been proposed by Tai [3] for ordered trees by introducing a new constraint which preserves the order. The corresponding dissimilarity measure was shown to be a distance [3]. In the case of unordered trees, Zhang proposed considering a new constraint in the definition of a valid EDM [12], [8] based on an initial idea proposed by Tanaka and Tanaka [16] for ordered trees: two separate sub-trees of one tree should be mapped onto two separate sub-trees of the other tree. Zhang extended this idea from ordered to unordered trees and changed the definition of a valid EDM as follows:

DEFINITION 4. Let $T_1[v] = (V_1[v], E_1[v])$ and $T_2[w] = (V_2[w], E_2[w])$ be two trees. A valid EDM $M$ from $T_1[v]$ to $T_2[w]$ is a set of ordered pairs of vertices $(x, y) \in V_1[v] \times V_2[w]$ satisfying constraints (1), (2) and

$$\forall (x_1, x_2), (y_1, y_2), (z_1, z_2) \in M, \quad x_1 \wedge y_1 < z_1 \iff x_2 \wedge y_2 < z_2. \qquad (3)$$

The dissimilarity measure between two unordered trees $T_1[v]$ and $T_2[w]$ is defined as

$$D(T_1[v], T_2[w]) = \min_{M \in \mathcal{T}(v, w)} \{\Gamma_{v,w}(M)\}.$$

Zhang showed that the dissimilarity measure $D(T_1[v], T_2[w])$ is actually a distance [8] and proposed a polynomial-time algorithm to solve the following new optimization problem.

PROBLEM 2. Find $\Gamma_{v,w}(M)$ minimum, such that $M$ is a valid EDM from $T_1[v]$ to $T_2[w]$ satisfying constraints (1)–(3).
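Constraints (1)–(3) can be checked directly on a candidate set of pairs, as in the following Python sketch. It is a brute-force test over all triples of pairs, intended only to make the constraints concrete. Each tree is assumed to be given by a dictionary mapping every non-root vertex to its father, and the function names are hypothetical.

```python
def ancestors(father, x):
    """Chain x, father(x), ..., root; x itself is included."""
    chain = [x]
    while father.get(x) is not None:
        x = father[x]
        chain.append(x)
    return chain


def is_ancestor(father, a, b):
    """True if a <= b, i.e. a is an ancestor of b (possibly b itself)."""
    return a in ancestors(father, b)


def gca(father, x, y):
    """Greatest common ancestor of x and y."""
    above_x = set(ancestors(father, x))
    return next(a for a in ancestors(father, y) if a in above_x)


def is_valid_edm(mapping, father1, father2):
    """Check constraints (1), (2) and (3) on a set of pairs (vertex of T1, vertex of T2)."""
    pairs = list(mapping)
    for x1, x2 in pairs:
        for y1, y2 in pairs:
            if (x1 == y1) != (x2 == y2):                                          # (1)
                return False
            if is_ancestor(father1, x1, y1) != is_ancestor(father2, x2, y2):     # (2)
                return False
            g1, g2 = gca(father1, x1, y1), gca(father2, x2, y2)
            for z1, z2 in pairs:
                strict1 = g1 != z1 and is_ancestor(father1, g1, z1)
                strict2 = g2 != z2 and is_ancestor(father2, g2, z2)
                if strict1 != strict2:                                            # (3)
                    return False
    return True
```

For instance, mapping the leaves $c$, $d$, $b$ of the tree $r(a(c, d), b)$ onto the leaves $c'$, $d'$, $b'$ of $r'(c', e'(d', b'))$ satisfies (1) and (2) but violates (3), since the two separate sub-trees containing $\{c, d\}$ and $\{b\}$ are not mapped onto separate sub-trees.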

3.4. Properties of Valid EDMs. In this section we consider a valid EDM $M$ (according to Definition 4) from $S_1[v]$ to $S_2[w]$, where $S_1[v]$ and $S_2[w]$ are both either a tree or a forest. We show several properties of valid EDMs that enable us to derive the basic algorithm for comparing unordered trees [12] using this new formulation.

3.4.1. Properties of $\mathbf{M}$. For valid EDMs, Proposition 1 on $\mathbf{M}$ can be extended as follows:

PROPOSITION 2. For any valid EDM $M$,

$$\forall (S_1[x], S_1[y]) \in \mathcal{S}_1(v) \times \mathcal{S}_1(v), \quad S_1[x] \subseteq S_1[y] \iff \mathbf{M}(S_1[x]) \subseteq \mathbf{M}(S_1[y]).$$

Note that the three constraints used to define valid EDMs (Definition 4) are necessary for the equivalence to hold.

PROPOSITION 3. $M$ is a valid EDM from $S_1[v]$ to $S_2[w]$ if and only if $M$ satisfies one and only one of these five assertions:

1. $\mathbf{M}(S_1[v]) = \theta$ and $\mathbf{M}(S_2[w]) = \theta$.
2. $\mathbf{M}(S_1[v]) \neq \theta$ and $\mathbf{M}(S_2[w]) \neq \theta$, then:
(a) $\mathbf{M}(S_1[v]) \subset S_2[w]$ and $\mathbf{M}(S_2[w]) = S_1[v]$;
(b) $\mathbf{M}(S_1[v]) = S_2[w]$ and $\mathbf{M}(S_2[w]) \subset S_1[v]$;
(c) $\mathbf{M}(S_1[v]) = S_2[w]$ and $\mathbf{M}(S_2[w]) = S_1[v]$;
(d) $\mathbf{M}(S_1[v]) \subset S_2[w]$ and $\mathbf{M}(S_2[w]) \subset S_1[v]$.

Fig. 2. Partition of the set of valid EDMs from $T_1[v]$ to $T_2[w]$ using $\mathbf{M}$; $\mathcal{T}(v, w)_{\theta,\theta}$ is not represented. General form of EDM in (a) $\mathcal{T}(v, w)_{\subset,=}$, (b) $\mathcal{T}(v, w)_{=,\subset}$, (c) $\mathcal{T}(v, w)_{=,=}$ and (d) $\mathcal{T}(v, w)_{\subset,\subset}$. (d1) $\mathcal{F}(v, w)_{\subset,=}$, (d2) $\mathcal{F}(v, w)_{=,\subset}$, (d3) $\mathcal{F}(v, w)_{=,=}$ and (d4) $\mathcal{F}(v, w)_{\subset,\subset}$.

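This proposition can be used to solve Problem 2 recursively [8]. This is achieved by applying the dynamic programming principle (e.g. [17]) to the computation of the optimal valid EDM. To express the recursive nature of the optimality principle, the set $\mathcal{M}(v, w)$ can be split into subsets (Figure 2) as follows:

1. $\mathbf{M}(S_1[v]) = \theta$ and $\mathbf{M}(S_2[w]) = \theta$:
• $\mathcal{M}(v, w)_{\theta,\theta} = \{M \in \mathcal{M}(v, w) \mid \mathbf{M}(S_1[v]) = \theta \text{ and } \mathbf{M}(S_2[w]) = \theta\}$.

2. $\mathbf{M}(S_1[v]) \neq \theta$ and $\mathbf{M}(S_2[w]) \neq \theta$, then:
• $\mathcal{M}(v, w)_{\subset,=} = \{M \in \mathcal{M}(v, w) \mid \mathbf{M}(S_1[v]) \subset S_2[w] \text{ and } \mathbf{M}(S_2[w]) = S_1[v]\}$;
• $\mathcal{M}(v, w)_{=,\subset} = \{M \in \mathcal{M}(v, w) \mid \mathbf{M}(S_1[v]) = S_2[w] \text{ and } \mathbf{M}(S_2[w]) \subset S_1[v]\}$;
• $\mathcal{M}(v, w)_{=,=} = \{M \in \mathcal{M}(v, w) \mid \mathbf{M}(S_1[v]) = S_2[w] \text{ and } \mathbf{M}(S_2[w]) = S_1[v]\}$;
• $\mathcal{M}(v, w)_{\subset,\subset} = \{M \in \mathcal{M}(v, w) \mid \mathbf{M}(S_1[v]) \subset S_2[w] \text{ and } \mathbf{M}(S_2[w]) \subset S_1[v]\}$.

These subsets form a partition of $\mathcal{M}(v, w)$. In these definitions, $\mathcal{M}(v, w)$ represents either $\mathcal{T}(v, w)$ or $\mathcal{F}(v, w)$. For example, $\mathcal{T}(v, w)_{=,=}$ is the set of valid EDMs $M$ from $T_1[v]$ to $T_2[w]$ such that $\mathbf{M}(T_1[v]) = T_2[w]$ and $\mathbf{M}(T_2[w]) = T_1[v]$; $\mathcal{F}(v, w)_{=,\subset}$ is the set of valid EDMs $M$ from $F_1[v]$ to $F_2[w]$ such that $\mathbf{M}(F_1[v]) = F_2[w]$ and $\mathbf{M}(F_2[w]) \subset F_1[v]$.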

PROPOSITION 4. For any valid EDM $M$ from $T_1[v]$ to $T_2[w]$, $M$ is in $\mathcal{T}(v, w)_{=,=}$ if and only if

$$(v, w) \in M.$$

For any valid EDM $M$ from $T_1[v]$ to $T_2[w]$ in $\mathcal{T}(v, w)$, if $(v, w)$ is in $M$, $M \setminus \{(v, w)\}$ is denoted by $M^*$. Furthermore, if $M$ is in $\mathcal{T}(v, w)_{=,=}$, $\mathbf{M}^*(T_1[v]) \subset T_2[w]$ and $\mathbf{M}^*(T_2[w]) \subset T_1[v]$ and then, according to the previous proposition, $M^* \in \mathcal{F}(v, w)$.

PROPOSITION 5. For any valid EDM from $T_1[v]$ to $T_2[w]$, $M^*$ is in $\mathcal{T}(v, w)_{\subset,\subset}$ if and only if $M^*$ is a valid EDM from $F_1[v]$ to $F_2[w]$:

$$\mathcal{T}(v, w)_{\subset,\subset} = \mathcal{F}(v, w).$$

EDMs in $\mathcal{R}(v, w) = \mathcal{F}(v, w)_{=,=} \cup \mathcal{F}(v, w)_{\subset,\subset}$ have an additional remarkable property.

PROPOSITION 6. For any $M$ in $\mathcal{R}(v, w)$ and for any $v_i$ son of $v$ such that $\mathbf{M}(T_1[v_i]) \neq \theta$, there exists a unique $w_j$, son of $w$, such that $\mathbf{M}(T_1[v_i]) \subseteq T_2[w_j]$. For any son $v_k$ of $v$, if $\mathbf{M}(T_1[v_k]) \neq \theta$ and $\mathbf{M}(T_1[v_k]) \subseteq T_2[w_j]$, then $v_k = v_i$.

For any EDM from $\mathcal{R}(v, w)$, the image of any tree of $F_1[v]$ is either the empty tree or included in a tree of $F_2[w]$. This means that vertices from a tree $T_1[v_i]$ can only be mapped onto vertices of one tree $T_2[w_j]$ and reciprocally. An EDM of $\mathcal{R}(v, w)$ thus defines a mapping between trees of $F_1[v]$ and trees of $F_2[w]$, and is called a restricted EDM [8].

A matching of a graph is any subset of its edges such that no two members of the subset are adjacent [18]. We define a bipartite graph $G(v, w) = (V, E)$, where $V$ represents $\mathrm{son}[v] \cup \mathrm{son}[w]$ and $E$ is $\mathrm{son}[v] \times \mathrm{son}[w]$. The set of all possible matchings on this graph is denoted by $\mathcal{K}(v, w)$.

3.4.2. Recursive expression of EDM sets. Proposition 3 can be directly expressed in terms of valid EDMs and reveals the different cases used by Zhang to establish the recurrence relations between EDM sets.

PROPOSITION 7. Let $M$ be a valid EDM:

1. If $M$ is in $\mathcal{T}(v, w)$, then $M$ satisfies one and only one of the following assertions:
(a) $\exists w_k \in \mathrm{son}[w]$ such that $M \in \mathcal{T}(v, w_k)$;
(b) $\exists v_k \in \mathrm{son}[v]$ such that $M \in \mathcal{T}(v_k, w)$;
(c) $(v, w) \in M$ and $M^* \in \mathcal{F}(v, w)$;
(d) $M \in \mathcal{F}(v, w)$;
(e) $M = \emptyset$.

2. If $M$ is in $\mathcal{F}(v, w)$, then $M$ satisfies one and only one of the following assertions:
(a) $\exists w_k \in \mathrm{son}[w]$ such that $M \in \mathcal{F}(v, w_k)$;
(b) $\exists v_k \in \mathrm{son}[v]$ such that $M \in \mathcal{F}(v_k, w)$;
(c) $M \in \mathcal{R}(v, w)$;
(d) $M = \emptyset$.

Cases 1(e) and 2(d) do not appear in the original formulation since they represent limit cases. However, these limit cases will be exploited in the extension of the algorithm discussed in the next section.

The equivalence of Propositions 3 and 7 shows that the original formulation of Zhang's algorithm can be expressed in terms of the properties of $\mathbf{M}$. The new formulation introduced in Proposition 3 is more compact than the original formulation since it does not need to make a distinction between forests and trees (as in Proposition 7).

3.5. Recursive Expression of the Distance between Unordered Trees. Proposition 7 on valid EDMs can be used to compute recursively the cost of a valid EDM with minimum cost.

THEOREM 1 [12], [8]. $D(T_1[v], T_2[w])$ and $D(F_1[v], F_2[w])$ can be computed recursively:

1. Initialization:

$$D(\theta, \theta) = 0,$$
$$D(F_1[v], \theta) = \sum_{v_k \in \mathrm{son}[v]} D(T_1[v_k], \theta), \qquad D(T_1[v], \theta) = D(F_1[v], \theta) + d(v, \lambda),$$
$$D(\theta, F_2[w]) = \sum_{w_k \in \mathrm{son}[w]} D(\theta, T_2[w_k]), \qquad D(\theta, T_2[w]) = D(\theta, F_2[w]) + d(\lambda, w).$$

2. Distance between trees:

$$D(T_1[v], T_2[w]) = \min \begin{cases} D(\theta, T_2[w]) + \min_{w_k \in \mathrm{son}[w]} \{D(T_1[v], T_2[w_k]) - D(\theta, T_2[w_k])\}, \\ D(T_1[v], \theta) + \min_{v_k \in \mathrm{son}[v]} \{D(T_1[v_k], T_2[w]) - D(T_1[v_k], \theta)\}, \\ D(F_1[v], F_2[w]) + d(v, w). \end{cases}$$

3. Distance between forests:

$$D(F_1[v], F_2[w]) = \min \begin{cases} D(\theta, F_2[w]) + \min_{w_k \in \mathrm{son}[w]} \{D(F_1[v], F_2[w_k]) - D(\theta, F_2[w_k])\}, \\ D(F_1[v], \theta) + \min_{v_k \in \mathrm{son}[v]} \{D(F_1[v_k], F_2[w]) - D(F_1[v_k], \theta)\}, \\ \min_{M \in \mathcal{R}(v, w)} \{\gamma(M)\}. \end{cases}$$

Zhang [12] models the computation of $\min_{M \in \mathcal{R}(v, w)} \{\gamma(M)\}$ as a minimum cost maximum flow problem, which mainly determines the overall complexity of the final algorithm. The cost of optimal restricted EDMs is studied in Section 4.4. The complexity of this algorithm is

$$O(|T_1| \times |T_2| \times (\deg(T_1) + \deg(T_2)) \times \log_2(\deg(T_1) + \deg(T_2))).$$
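The recursion of Theorem 1 can be sketched in Python as follows. This is an illustrative sketch rather than the algorithm analysed in the paper: the restricted-EDM term is obtained here by exhaustive enumeration of the matchings between the sons of $v$ and the sons of $w$ instead of a minimum cost maximum flow computation, so the complexity bound above does not apply and the code is only suitable for small degrees. The tree representation (dictionaries of sons and labels), the use of `None` for $\lambda$ and all function names are assumptions.

```python
from functools import lru_cache


def constrained_distance(sons1, label1, root1, sons2, label2, root2, d, LAMBDA=None):
    """Distance D(T1[root1], T2[root2]) following the recursion of Theorem 1."""
    def kids1(v):
        return tuple(sons1.get(v, ()))

    def kids2(w):
        return tuple(sons2.get(w, ()))

    @lru_cache(maxsize=None)
    def del_forest(v):                       # D(F1[v], theta)
        return sum(del_tree(k) for k in kids1(v))

    @lru_cache(maxsize=None)
    def del_tree(v):                         # D(T1[v], theta)
        return del_forest(v) + d(label1[v], LAMBDA)

    @lru_cache(maxsize=None)
    def ins_forest(w):                       # D(theta, F2[w])
        return sum(ins_tree(k) for k in kids2(w))

    @lru_cache(maxsize=None)
    def ins_tree(w):                         # D(theta, T2[w])
        return ins_forest(w) + d(LAMBDA, label2[w])

    def restricted(a, b):
        """Best matching between the trees rooted in a and those rooted in b (brute force)."""
        def best(i, remaining):
            if i == len(a):
                return sum(ins_tree(k) for k in remaining)
            cost = del_tree(a[i]) + best(i + 1, remaining)        # a[i] left unmatched
            for k in remaining:                                   # a[i] matched with k
                cost = min(cost, tree_dist(a[i], k) + best(i + 1, remaining - {k}))
            return cost
        return best(0, frozenset(b))

    @lru_cache(maxsize=None)
    def forest_dist(v, w):                   # D(F1[v], F2[w])
        a, b = kids1(v), kids2(w)
        if not a:
            return ins_forest(w)
        if not b:
            return del_forest(v)
        return min(
            ins_forest(w) + min(forest_dist(v, k) - ins_forest(k) for k in b),
            del_forest(v) + min(forest_dist(k, w) - del_forest(k) for k in a),
            restricted(a, b),
        )

    @lru_cache(maxsize=None)
    def tree_dist(v, w):                     # D(T1[v], T2[w])
        options = [forest_dist(v, w) + d(label1[v], label2[w])]
        if kids2(w):
            options.append(ins_tree(w) + min(tree_dist(v, k) - ins_tree(k) for k in kids2(w)))
        if kids1(v):
            options.append(del_tree(v) + min(tree_dist(k, w) - del_tree(k) for k in kids1(v)))
        return min(options)

    return tree_dist(root1, root2)
```

Memoising `tree_dist` and `forest_dist` on pairs of vertices mirrors the dynamic programming table of size $|T_1| \times |T_2|$; replacing `restricted` by a minimum cost flow or assignment solver would be needed to approach the stated complexity.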

4. Distance between Quotiented Trees

4.1. Valid EDMs. We consider two quotiented trees $G_1 = (T_1, W_1, \pi_1)$ and $G_2 = (T_2, W_2, \pi_2)$ such that the roots of $T_1$ and $T_2$ are respectively $v$ and $w$ (if no confusion is possible, $\pi_1$ and $\pi_2$ are denoted by $\pi$). Let $M$ be an EDM from $T_1[v]$ to $T_2[w]$. $M$ induces an EDM from tree $Q(G_1)$ to tree $Q(G_2)$, called the quotient EDM, denoted by $Q(M)$, composed of pairs of vertices in $W_1 \times W_2$ and defined as

$$(a, b) \in Q(M) \iff \exists (z, t) \in M \text{ such that } \begin{cases} \pi(z) = a, \\ \pi(t) = b. \end{cases}$$

In the following, $Q(M)$ will be denoted by $N$.

Let $x$ be a vertex of $T_1[v]$ (resp. $y$ a vertex of $T_2[w]$). $Q(M_x)$ (resp. $Q(M_y)$) denotes the set of vertices of the quotient graph $Q(G_1[x])$ (resp. $Q(G_2[y])$) which have an image by $Q(M)$:

$$Q(M_x) = \{\pi(z) \mid z \in M_x\}.$$

It should be noted that if $M$ denotes a valid EDM from $T_1[v]$ to $T_2[w]$, $Q(M)$ is not necessarily a valid EDM from tree $Q(G_1)$ to tree $Q(G_2)$ (see Figure 3).

DEFINITION 5 (Valid EDMs on Quotiented Trees). Let $G_1[v] = (T_1, W_1, \pi_1)$ and $G_2[w] = (T_2, W_2, \pi_2)$ be two quotiented trees. A valid EDM $M$ from $G_1[v]$ to $G_2[w]$ is a valid EDM from $T_1[v]$ to $T_2[w]$ such that $Q(M)$ is also a valid EDM from $Q(G_1)$ to $Q(G_2)$.

The set of valid mappings from $T_1[\pi(v)]$ to $T_2[\pi(w)]$ is denoted by $\mathcal{T}(\pi(v), \pi(w))$. Thus by definition $M$ is a valid EDM from $G_1[v]$ to $G_2[w]$ if, and only if, $M$ is in $\mathcal{T}(v, w)$ and $Q(M)$ is in $\mathcal{T}(\pi(v), \pi(w))$. The set of valid EDMs from $G_1[v]$ to $G_2[w]$ is denoted by $\mathcal{G}(v, w)$.
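The quotient EDM $Q(M)$ is obtained by projecting each pair of $M$ through $\pi$. The Python sketch below computes it and tests only the one-to-one constraint (1) on the result; checking the full validity of $Q(M)$ would also require constraints (2) and (3) on the quotient trees. The dictionary representation of $\pi_1$, $\pi_2$ and the function names are assumptions of this illustration.

```python
def quotient_edm(mapping, pi1, pi2):
    """Q(M): the set of pairs (pi1(z), pi2(t)) for (z, t) in M."""
    return {(pi1[z], pi2[t]) for z, t in mapping}


def is_one_to_one(pairs):
    """Constraint (1): no vertex appears in two different pairs, on either side."""
    lefts = [a for a, _ in pairs]
    rights = [b for _, b in pairs]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)
```

When two components of the same complex are mapped onto components of two different complexes, `quotient_edm` yields two pairs sharing their first element, so the support mapping may be valid while its quotient is not.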

As in unordered tree comparison, we need to consider a set of valid EDMs between the quotiented trees $G_1[v]$ and $G_2[w]$ in which $v$ and $w$ do not have any image by the EDM, i.e. EDMs between forests $F_1[v]$ and $F_2[w]$. This set will be denoted by $\mathcal{H}(v, w)$:

$$\mathcal{H}(v, w) = \mathcal{G}(v, w) \cap \mathcal{F}(v, w).$$

A dissimilarity measure between quotiented trees is then defined by the following optimization problem.

PROBLEM 3. Find $\Gamma_{v,w}(M)$ minimum, such that $M$ is a valid EDM from $G_1[v]$ to $G_2[w]$ satisfying Definition 5:

$$D(G_1[v], G_2[w]) = \min_{M \in \mathcal{G}(v, w)} \{\Gamma_{v,w}(M)\}.$$

Fig. 3. (a), (b) Valid EDMs for support tree comparison that are not valid for quotient tree comparison, since in (a) the one-to-one correspondence is not satisfied on the quotient graph and in (b) the ancestor relationship is not satisfied on the quotient graph. (c) A valid EDM between quotiented trees, i.e. the EDMs between both support and quotient graphs are valid.

LEMMA 1. D is a distance.

4.2. Properties of Valid EDMs. According to the properties described in Section 3.4, the efficient computation of a distance between unordered trees relies on the possibility of applying the dynamic programming principle using recursive relations between the set of EDMs $\mathcal{T}(v, w)$ and the sets $\{\mathcal{T}(v, w_k), \mathcal{T}(v_i, w), \mathcal{T}(v_i, w_k), \mathcal{F}(v, w)\}$ where $v_i$ and $w_k$ are respectively sons of $v$ and $w$. In a similar fashion, we wish to determine a recursive expression between the set $\mathcal{G}(v, w)$ and $\{\mathcal{G}(v, w_k), \mathcal{G}(v_i, w), \mathcal{G}(v_i, w_k), \mathcal{H}(v, w)\}$ of valid EDMs on quotiented trees which will enable us to solve Problem 3 efficiently using a dynamic programming-based algorithm.

Fig. 4. Different partitions of the set $\mathcal{T}(v, w)$ of valid EDMs from $T_1[v]$ to $T_2[w]$. $\mathcal{T}(v, w)$ is first decomposed into five subsets ($\mathcal{T}(v, w)_{\theta,\theta}$ is not represented here) depending on the images of $T_1[v]$ and $T_2[w]$ by $\mathbf{M}$. The subset corresponding to $\mathcal{T}(v, w)_{\subset,\subset}$ is then decomposed (1) into five new subsets ($\mathcal{F}(v, w)_{\theta,\theta}$ is not represented here). Each subset of this partition is then decomposed (2) into five new subsets depending on the image of $T_1[\pi(v)]$ and $T_2[\pi(w)]$ by $\mathbf{N}$. Finally, the subsets labelled d are decomposed into four new subsets depending on the image of $F_1[\pi(v)]$ and $F_2[\pi(w)]$ by $\mathbf{N}$.

In what follows we will show that such recursive expressions can be obtained by defining an adequate partition of $\mathcal{G}(v, w)$. This partition is based on a two-level scheme (Figure 4): a first partition of $\mathcal{G}(v, w)$ is made, based on partitioning $\mathcal{T}(v, w)$ into subsets $\mathcal{T}(v, w)_{=,=}$, $\mathcal{T}(v, w)_{\subset,=}$, $\mathcal{T}(v, w)_{=,\subset}$, $\mathcal{T}(v, w)_{\subset,\subset} = \mathcal{F}(v, w)$ and $\mathcal{T}(v, w)_{\theta,\theta}$ (Figure 4.1; $\mathcal{T}(v, w)_{\theta,\theta}$ is not represented in the figure). Note that, as explained in the previous section, $\mathcal{F}(v, w)$ is further decomposed into $\mathcal{F}(v, w)_{=,=}$, $\mathcal{F}(v, w)_{\subset,=}$, $\mathcal{F}(v, w)_{=,\subset}$, $\mathcal{F}(v, w)_{\subset,\subset}$ and $\mathcal{F}(v, w)_{\theta,\theta}$ (Figure 4.2; $\mathcal{F}(v, w)_{\theta,\theta}$ is not represented in the figure). Then, at a second level, each set of the resulting partition is itself decomposed into a partition based on configurations of valid EDMs on quotient graphs, i.e. partitions of $\mathcal{T}(\pi(v), \pi(w))$ and $\mathcal{F}(\pi(v), \pi(w))$. Each set is decomposed into five new subsets depending on the image of $T_1[\pi(v)]$ and $T_2[\pi(w)]$ by $\mathbf{N}$ (Figure 4.3):

a: $\mathbf{N}(T_1[\pi(v)]) \subset T_2[\pi(w)]$ and $\mathbf{N}(T_2[\pi(w)]) = T_1[\pi(v)]$;
b: $\mathbf{N}(T_1[\pi(v)]) = T_2[\pi(w)]$ and $\mathbf{N}(T_2[\pi(w)]) \subset T_1[\pi(v)]$;
c: $\mathbf{N}(T_1[\pi(v)]) = T_2[\pi(w)]$ and $\mathbf{N}(T_2[\pi(w)]) = T_1[\pi(v)]$;
d: $\mathbf{N}(T_1[\pi(v)]) \subset T_2[\pi(w)]$ and $\mathbf{N}(T_2[\pi(w)]) \subset T_1[\pi(v)]$.

Note that the case corresponding to $\mathbf{N}(T_1[\pi(v)]) = \theta$ and $\mathbf{N}(T_2[\pi(w)]) = \theta$ is not represented in the figure.

Finally, the subsets corresponding to case d (labelled d in Figure 4.3) are further decomposed into five new subsets depending on the image of $F_1[\pi(v)]$ and $F_2[\pi(w)]$ by $\mathbf{N}$ (Figure 4.4):

e: $\mathbf{N}(F_1[\pi(v)]) \subset F_2[\pi(w)]$ and $\mathbf{N}(F_2[\pi(w)]) = F_1[\pi(v)]$;
f: $\mathbf{N}(F_1[\pi(v)]) = F_2[\pi(w)]$ and $\mathbf{N}(F_2[\pi(w)]) \subset F_1[\pi(v)]$;
g: $\mathbf{N}(F_1[\pi(v)]) = F_2[\pi(w)]$ and $\mathbf{N}(F_2[\pi(w)]) = F_1[\pi(v)]$;
h: $\mathbf{N}(F_1[\pi(v)]) \subset F_2[\pi(w)]$ and $\mathbf{N}(F_2[\pi(w)]) \subset F_1[\pi(v)]$.

Note that the case corresponding to $\mathbf{N}(F_1[\pi(v)]) = \theta$ and $\mathbf{N}(F_2[\pi(w)]) = \theta$ is not represented in the figure.

Hence the combinatorics of the different configurations of interest of valid EDMs, both at the microscopic level (tree level) and at the macroscopic level (quotient tree level), results in a partition of G(v, w) into 51 subsets. To establish recursive relations between these subsets, we study their properties in the next section.

4.2.1. Properties of N. In the following, if no confusion is possible, G(v, w) and H(v, w) are both denoted by M(v, w). If M belongs to M(v, w), N = Q(M) is a valid EDM from S1[π(v)] to S2[π(w)]. Then, according to Proposition 3, M belongs to one and only one of the following sets, depending on the respective images of S1[π(v)] and S2[π(w)] by N:

• [M(v, w)]_{⊂,=} = {M ∈ M(v, w) | N ∈ T(π(v), π(w))_{⊂,=}} is the set of valid EDMs of M(v, w) such that Q(M) is in T(π(v), π(w))_{⊂,=}; this means that S1[π(v)] has an image included in S2[π(w)] and the image of S2[π(w)] is S1[π(v)].

Similarly:

• [M(v, w)]_{=,⊂} = {M ∈ M(v, w) | N ∈ T(π(v), π(w))_{=,⊂}};
• [M(v, w)]_{=,=} = {M ∈ M(v, w) | N ∈ T(π(v), π(w))_{=,=}};
• [M(v, w)]_{⊂,⊂} = {M ∈ M(v, w) | N ∈ T(π(v), π(w))_{⊂,⊂}};
• [M(v, w)]_{θ,θ} = {M ∈ M(v, w) | N ∈ T(π(v), π(w))_{θ,θ}}.

N\{(π(v), π(w))} is denoted by N∗. Note here that according to Proposition 4, for any valid EDM N from T1[π(v)] to T2[π(w)], N is in T(π(v), π(w))_{=,=} if and only


if (π(v), π(w)) ∈ N and if N∗ is a valid EDM in F(π(v), π(w)). Similarly, N is in T(π(v), π(w))_{⊂,⊂} if and only if N = N∗ is a valid EDM in F(π(v), π(w)). Therefore, to express recursive relations based on sets T(π(v), π(w))_{=,=} and T(π(v), π(w))_{⊂,⊂}, we need to consider partitions of [M(v, w)]_{=,=} and [M(v, w)]_{⊂,⊂} based respectively on the membership of N∗ or N in the different subsets of F(π(v), π(w)):

• ([M(v, w)]_{=,=})_{⊂,=} = {M ∈ [M(v, w)]_{=,=} | N∗ ∈ F(π(v), π(w))_{⊂,=}};
• ([M(v, w)]_{⊂,⊂})_{⊂,=} = {M ∈ [M(v, w)]_{⊂,⊂} | N∗ ∈ F(π(v), π(w))_{⊂,=}}.

Note that in the case of [M(v, w)]_{⊂,⊂}, N∗ = N. Sets ([M(v, w)]_{=,=})_{=,⊂}, ([M(v, w)]_{=,=})_{=,=}, ([M(v, w)]_{=,=})_{⊂,⊂}, ([M(v, w)]_{=,=})_{θ,θ}, ([M(v, w)]_{⊂,⊂})_{=,⊂}, ([M(v, w)]_{⊂,⊂})_{=,=}, ([M(v, w)]_{⊂,⊂})_{⊂,⊂} and ([M(v, w)]_{⊂,⊂})_{θ,θ} are defined similarly. The different types of EDMs corresponding to the partition of [M(v, w)]_{=,=} are represented graphically in Figure 5.

To find recursive relations between these sets for a pair of vertices (v, w), we need to study how such sets can be computed from similar sets associated with the descendants of v and w. To compute recursively M in M(v, w) from the Mi's in M(xi, yi), where xi and yi are descendants of v and w, respectively, we need to study two kinds of situation: either M is identical to one of the Mi's (Lemma 2) or M is a union of the Mi's (Lemma 3).


Fig. 5. Partition of the set [H(v, w)]_{=,=} of valid EDMs from F1[v] to F2[w] using M such that M(T1[π(v)]) = T2[π(w)] and M(T2[π(w)]) = T1[π(v)]. General form of EDM in (a) ([H(v, w)]_{=,=})_{=,⊂}, (b) ([H(v, w)]_{=,=})_{⊂,=}, (c) ([H(v, w)]_{=,=})_{=,=}, (d) ([H(v, w)]_{=,=})_{⊂,⊂}, and (e) ([H(v, w)]_{=,=})_{θ,θ}.


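LEMMA 2. Let x and y be descendants of v and w, respectively. For any valid EDM M in M(x, y), M is a valid EDM in M(v, w) such that

\[
N(S_1[\pi(x)]) = \theta \;\Rightarrow\; N(S_1[\pi(v)]) = \theta, \tag{4}
\]

\[
N(S_1[\pi(x)]) \subset S_2[\pi(y)] \;\Rightarrow\; N(S_1[\pi(v)]) \subset S_2[\pi(w)], \tag{5}
\]

\[
N(S_1[\pi(x)]) = S_2[\pi(y)] \;\Rightarrow\;
\begin{cases}
N(S_1[\pi(v)]) = S_2[\pi(w)] & \text{if } \pi(y) = \pi(w),\\
N(S_1[\pi(v)]) \subset S_2[\pi(w)] & \text{otherwise.}
\end{cases} \tag{6}
\]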

Here it should be recalled that S represents either a tree or a forest and that x and y can be respectively equal to v and w. By symmetry, the same proposition holds if the roles of S1 and S2 are inverted. This proposition can be used to compute the sets of the partition of M(v, w), i.e. [M(v, w)]_{⊂,=}, [M(v, w)]_{=,⊂}, [M(v, w)]_{=,=}, [M(v, w)]_{⊂,⊂}, from those of the partition of M(x, y).

For example, if M denotes a valid EDM in [M(x, y)]_{⊂,=}, by definition N = Q(M) is in T(π(x), π(y))_{⊂,=}, which means that N(T1[π(x)]) ⊂ T2[π(y)] and N(T2[π(y)]) = T1[π(x)]. Then according to Lemma 2, M is a valid EDM in M(v, w) such that:

• if π(x) = π(v), then N(T1[π(v)]) ⊂ T2[π(w)] (from (5)) and N(T2[π(w)]) = T1[π(v)] (from (6)), which means that N is in T(π(v), π(w))_{⊂,=}.

• Otherwise N(T1[π(v)]) ⊂ T2[π(w)] (from (5)) and N(T2[π(w)]) ⊂ T1[π(v)] (from (6)), which means that N is in T(π(v), π(w))_{⊂,⊂}.

In other words, M is in [M(v, w)]_{⊂,=} if π(x) = π(v), otherwise M is in [M(v, w)]_{⊂,⊂}. Proposition 8 details similar relationships between sets [M(x, y)]_{α,β} and [M(v, w)]_{α′,β′}, where α, β, α′ and β′ are in {⊂, =, θ}.

PROPOSITION 8. Let x and y be descendants of v and w, respectively. For any valid EDM M in [M(x, y)]_{α,β}, M is a valid EDM in [M(v, w)]_{α′,β′} as detailed in Table 1.
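The lookup described by Lemma 2 and Table 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the string labels `"sub"`, `"eq"` and `"theta"` (standing for ⊂, = and θ) and the function name `lift_class` are assumptions introduced here.

```python
# Classes are pairs (a, b) with a, b in {"sub", "eq", "theta"}; these labels
# are illustrative stand-ins for the subscripts ⊂, = and θ of the paper.

def lift_class(cls, same_complex_x, same_complex_y):
    """Class in [M(v, w)] of an EDM whose class in [M(x, y)] is `cls`.

    same_complex_x is True when pi(x) == pi(v);
    same_complex_y is True when pi(y) == pi(w).
    """
    if cls == ("theta", "theta"):
        return cls  # relation (4): an empty image stays empty
    a, b = cls
    # Relation (6): the image of S1 remains equal to S2[pi(w)] only when
    # pi(y) == pi(w); otherwise it becomes a strict inclusion.
    if a == "eq" and not same_complex_y:
        a = "sub"
    # Symmetric statement with the roles of S1 and S2 inverted.
    if b == "eq" and not same_complex_x:
        b = "sub"
    return (a, b)


# Second row of Table 1: pi(x) = pi(v) and pi(y) != pi(w).
print(lift_class(("eq", "eq"), True, False))
```

A strict inclusion is never promoted back to an equality, which is why every entry of the last row of Table 1 (other than the θ,θ column) collapses to the ⊂,⊂ class.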

In Section 4.4 we shall need to analyse sets [M(v, w)]_{=,=} and [M(v, w)]_{⊂,⊂} further to derive a complete and sound recursive expression of M(v, w). To achieve this, Lemma 2 can also be applied to derive inclusions between sets of the partition of [M(x, y)]_{⊂,⊂} (resp. [M(x, y)]_{=,=}), i.e. ([M(x, y)]_{⊂,⊂})_{=,⊂}, ([M(x, y)]_{⊂,⊂})_{⊂,=}, ([M(x, y)]_{⊂,⊂})_{=,=}, . . . (resp. ([M(x, y)]_{=,=})_{=,⊂}, ([M(x, y)]_{=,=})_{⊂,=}, . . . ), and those of the partition of [M(v, w)]_{=,=} and [M(v, w)]_{⊂,⊂}. For example, consider

Table 1. Membership of a valid EDM of M(x, y) in M(v, w), depending on π(x) and π(y).

π(x)   | π(y)   | [M(x, y)]_{⊂,=} | [M(x, y)]_{=,⊂} | [M(x, y)]_{=,=} | [M(x, y)]_{⊂,⊂} | [M(x, y)]_{θ,θ}
= π(v) | = π(w) | [M(v, w)]_{⊂,=} | [M(v, w)]_{=,⊂} | [M(v, w)]_{=,=} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{θ,θ}
= π(v) | ≠ π(w) | [M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{θ,θ}
≠ π(v) | = π(w) | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{=,⊂} | [M(v, w)]_{=,⊂} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{θ,θ}
≠ π(v) | ≠ π(w) | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{θ,θ}


Table 2. Membership of a valid EDM of [M(x, y)]_{⊂,⊂} in M(v, w), depending on π(x) and π(y).

π(x)   | π(y)   | ([M(x, y)]_{⊂,⊂})_{⊂,=} | ([M(x, y)]_{⊂,⊂})_{=,⊂} | ([M(x, y)]_{⊂,⊂})_{=,=} | ([M(x, y)]_{⊂,⊂})_{⊂,⊂} | ([M(x, y)]_{⊂,⊂})_{θ,θ}
= π(v) | = π(w) | ([M(v, w)]_{⊂,⊂})_{⊂,=} | ([M(v, w)]_{⊂,⊂})_{=,⊂} | ([M(v, w)]_{⊂,⊂})_{=,=} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ∅
= π(v) | ≠ π(w) | ([M(v, w)]_{⊂,⊂})_{⊂,=} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,=} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ∅
≠ π(v) | = π(w) | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{=,⊂} | ([M(v, w)]_{⊂,⊂})_{=,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ∅
≠ π(v) | ≠ π(w) | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ∅

M in ([M(x, y)]_{⊂,⊂})_{⊂,=}. By definition N∗ is in F(π(x), π(y))_{⊂,=}, then according to Lemma 2, M is a valid EDM in M(v, w) such that:

• if π(x) = π(v), then N is in F(π(v), π(w))_{⊂,=};
• otherwise N is in F(π(v), π(w))_{⊂,⊂}.

Finally, M is in ([M(v, w)]_{⊂,⊂})_{⊂,=} if π(x) = π(v), otherwise M is in ([M(v, w)]_{⊂,⊂})_{⊂,⊂}.

All these results are summarised by the following propositions.

PROPOSITION 9. Let x and y be descendants of v and w, respectively. For any valid EDM M in [M(x, y)]_{⊂,⊂}, M is a valid EDM in one set of the partition of [M(v, w)]_{⊂,⊂}, as detailed in Table 2.

PROPOSITION 10. Let x and y be descendants of v and w, respectively. For any valid EDM M in [M(x, y)]_{=,=}, M is a valid EDM in M(v, w) as detailed in Table 3.

In the same way, the propositions below give the relations between an EDM M in M(v, w) and EDMs Mi in M(xi, yi), where xi and yi are descendants of v and w, when M can be considered as a union of the Mi's.

LEMMA 3. Let v1, v2, . . . , v_{nv} be the sons of v and let w1, w2, . . . , w_{nw} be the sons of w. Consider n EDMs (Mk)_{k∈{1..n}} in M(v_{pk}, w_{qk}) such that for any Mi and Mj, i ≠ j if and only if v_{pi} ≠ v_{pj} and w_{qi} ≠ w_{qj}. Let M be the union ⋃_{k∈{1..n}} {Mk}:

\[
N \in T(\pi(v), \pi(w))_{\subset,=} \;\Leftrightarrow\; \exists i \mid N_i \in T(\pi(v), \pi(w))_{\subset,=}
\ \text{and}\ \forall j \neq i,\ N_j \in T(\pi(v), \pi(w))_{\theta,\theta},
\]

Table 3. Membership of a valid EDM of [M(x, y)]_{=,=} in M(v, w), depending on π(x) and π(y).

π(x)   | π(y)   | ([M(x, y)]_{=,=})_{⊂,=} | ([M(x, y)]_{=,=})_{=,⊂} | ([M(x, y)]_{=,=})_{=,=} | ([M(x, y)]_{=,=})_{⊂,⊂} | ([M(x, y)]_{=,=})_{θ,θ}
= π(v) | = π(w) | ([M(v, w)]_{=,=})_{⊂,=} | ([M(v, w)]_{=,=})_{=,⊂} | ([M(v, w)]_{=,=})_{=,=} | ([M(v, w)]_{=,=})_{⊂,⊂} | ([M(v, w)]_{=,=})_{θ,θ}
= π(v) | ≠ π(w) | [M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,=}
≠ π(v) | = π(w) | [M(v, w)]_{=,⊂} | [M(v, w)]_{=,⊂} | [M(v, w)]_{=,⊂} | [M(v, w)]_{=,⊂} | [M(v, w)]_{=,⊂}
≠ π(v) | ≠ π(w) | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{θ,θ}


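\[
N \in T(\pi(v), \pi(w))_{=,=} \;\Leftrightarrow\; \exists i \mid N_i \in T(\pi(v), \pi(w))_{=,=}
\ \text{and}\ \forall j \neq i,\
\begin{cases}
N_j \in T(\pi(v), \pi(w))_{=,=}, \text{ or}\\
N_j \in T(\pi(v), \pi(w))_{\subset,\subset}, \text{ or}\\
N_j \in T(\pi(v), \pi(w))_{\theta,\theta},
\end{cases}
\]

\[
N \in T(\pi(v), \pi(w))_{\subset,\subset} \;\Leftrightarrow\; \exists i \mid N_i \in T(\pi(v), \pi(w))_{\subset,\subset}
\ \text{and}\ \forall j \neq i,\
\begin{cases}
N_j \in T(\pi(v), \pi(w))_{\subset,\subset}, \text{ or}\\
N_j \in T(\pi(v), \pi(w))_{\theta,\theta},
\end{cases}
\]

\[
N^* \in F(\pi(v), \pi(w))_{\subset,=} \;\Leftrightarrow\; \exists i \mid N^*_i \in F(\pi(v), \pi(w))_{\subset,=}
\ \text{and}\ \forall j \neq i,\ N^*_j \in F(\pi(v), \pi(w))_{\theta,\theta},
\]

\[
N^* \in F(\pi(v), \pi(w))_{=,=} \;\Leftrightarrow\; \exists i \mid N^*_i \in F(\pi(v), \pi(w))_{=,=}
\ \text{and}\ \forall j \neq i,\
\begin{cases}
N^*_j \in F(\pi(v), \pi(w))_{=,=}, \text{ or}\\
N^*_j \in F(\pi(v), \pi(w))_{\subset,\subset}, \text{ or}\\
N^*_j \in F(\pi(v), \pi(w))_{\theta,\theta},
\end{cases}
\]

\[
N^* \in F(\pi(v), \pi(w))_{\subset,\subset} \;\Leftrightarrow\; \exists i \mid N^*_i \in F(\pi(v), \pi(w))_{\subset,\subset}
\ \text{and}\ \forall j \neq i,\
\begin{cases}
N^*_j \in F(\pi(v), \pi(w))_{\subset,\subset}, \text{ or}\\
N^*_j \in F(\pi(v), \pi(w))_{\theta,\theta},
\end{cases}
\]

where N∗i denotes the valid EDM Ni\{(π(v_{pi}), π(w_{qi}))}.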

According to this proposition it is possible to find relationships between the sets of the partition of M(v_{pi}, w_{qi}), i.e. [M(v_{pi}, w_{qi})]_{⊂,=}, [M(v_{pi}, w_{qi})]_{=,⊂}, [M(v_{pi}, w_{qi})]_{=,=}, [M(v_{pi}, w_{qi})]_{⊂,⊂}, and those of the partition of M(v, w). For example, consider n valid EDMs M1, M2, . . . and Mn in M(v_{p1}, w_{q1}), M(v_{p2}, w_{q2}), . . . and M(v_{pn}, w_{qn}), respectively. If there exists i in {1..n} such that Mi denotes a valid EDM in [M(x, y)]_{⊂,=}, where π(x) = π(v) and π(y) = π(w), then according to Lemma 2, Ni is in T(π(v), π(w))_{⊂,=}. As established in Lemma 3, N = ⋃_{k∈{1..n}} {Nk} is in T(π(v), π(w))_{⊂,=} if and only if for any j ≠ i, Nj ∈ T(π(v), π(w))_{θ,θ}. In other words, M is a valid EDM in [M(v, w)]_{⊂,=} if and only if, for any j ≠ i, Mj is in [M(x, y)]_{θ,θ}.

The next proposition summarises results which allow us to determine the union of sets (Mk)_{k∈{1..n}} in the other cases.

Let v1, v2, . . . , v_{nv} be the sons of v and let w1, w2, . . . , w_{nw} be the sons of w. Consider n EDMs (Mk)_{k∈{1..n}} in M(v_{pk}, w_{qk}) such that for any Mi and Mj, i ≠ j if and only if v_{pi} ≠ v_{pj} and w_{qi} ≠ w_{qj}. Let M be the union ⋃_{k∈{1..n}} {Mk}, then M is in A(v, w) if there exists Mi in B(v, w) and for any Mj, j ≠ i, Mj is in C(v, w), where A(v, w), B(v, w) and C(v, w) are sets of valid EDMs as detailed in Table 4.
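As a rough illustration of this union rule, the sketch below classifies a union from the classes of its components, using only the five first-level rows of Table 4. The labels `"sub"`, `"eq"`, `"theta"` and the function name `union_class` are assumptions made for this example; the second-level sets such as ([M(v, w)]_{=,=})_{⊂,=} are not covered.

```python
THETA = ("theta", "theta")

# For each target class A: the class B required of the necessary component Mi
# is A itself, and the value lists the classes C allowed for the other Mj.
ALLOWED_OTHERS = {
    ("sub", "eq"): {THETA},
    ("eq", "sub"): {THETA},
    ("eq", "eq"): {("eq", "eq"), ("sub", "sub"), THETA},
    ("sub", "sub"): {("sub", "sub"), THETA},
    THETA: {THETA},
}


def union_class(classes):
    """Class of the union of EDMs whose individual classes are `classes`.

    Returns None when no first-level row of Table 4 applies.
    """
    for target, others in ALLOWED_OTHERS.items():
        for i, cls in enumerate(classes):
            if cls != target:
                continue
            # Mi is the necessary component; every other Mj must be allowed.
            if all(c in others for j, c in enumerate(classes) if j != i):
                return target
    return None


print(union_class([("eq", "eq"), ("sub", "sub"), THETA]))
```

The function returns `None` for combinations such as one ⊂,= component together with a ⊂,⊂ component, for which none of these rows yields a class.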

4.2.2. Recursive expression of EDM sets. These results are used to determine the sets of the partition of G(v, w) and H(v, w).


Table 4. Membership of a union of valid EDMs M = ⋃_{k∈{1..n}} {Mk} in M(v, w). Mi is a necessary EDM, while Mj is not a necessary EDM. A cell composed of several sets denotes the union of these sets.

M ∈ A(v, w) | Mi ∈ B(v, w) | Mj ∈ C(v, w)
[M(v, w)]_{⊂,=} | [M(v, w)]_{⊂,=} | [M(v, w)]_{θ,θ}
[M(v, w)]_{=,⊂} | [M(v, w)]_{=,⊂} | [M(v, w)]_{θ,θ}
[M(v, w)]_{=,=} | [M(v, w)]_{=,=} | [M(v, w)]_{=,=}, [M(v, w)]_{⊂,⊂}, [M(v, w)]_{θ,θ}
([M(v, w)]_{=,=})_{⊂,=} | ([M(v, w)]_{=,=})_{⊂,=} | ([M(v, w)]_{=,=})_{θ,θ}
([M(v, w)]_{=,=})_{=,⊂} | ([M(v, w)]_{=,=})_{=,⊂} | ([M(v, w)]_{=,=})_{θ,θ}
([M(v, w)]_{=,=})_{=,=} | ([M(v, w)]_{=,=})_{=,=} | ([M(v, w)]_{=,=})_{=,=}, ([M(v, w)]_{⊂,⊂})_{=,=}, ([M(v, w)]_{=,=})_{θ,θ}
([M(v, w)]_{=,=})_{⊂,⊂} | ([M(v, w)]_{=,=})_{⊂,⊂} | ([M(v, w)]_{=,=})_{⊂,⊂}, ([M(v, w)]_{=,=})_{θ,θ}
([M(v, w)]_{=,=})_{θ,θ} | ([M(v, w)]_{=,=})_{θ,θ} | ([M(v, w)]_{=,=})_{θ,θ}
[M(v, w)]_{⊂,⊂} | [M(v, w)]_{⊂,⊂} | [M(v, w)]_{⊂,⊂}, [M(v, w)]_{θ,θ}
([M(v, w)]_{⊂,⊂})_{⊂,=} | ([M(v, w)]_{⊂,⊂})_{⊂,=} | ([M(v, w)]_{⊂,⊂})_{θ,θ}
([M(v, w)]_{⊂,⊂})_{=,⊂} | ([M(v, w)]_{⊂,⊂})_{=,⊂} | ([M(v, w)]_{⊂,⊂})_{θ,θ}
([M(v, w)]_{⊂,⊂})_{=,=} | ([M(v, w)]_{⊂,⊂})_{=,=} | ([M(v, w)]_{⊂,⊂})_{=,=}, ([M(v, w)]_{⊂,⊂})_{θ,θ}
([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂} | ([M(v, w)]_{⊂,⊂})_{⊂,⊂}, ([M(v, w)]_{⊂,⊂})_{θ,θ}
([M(v, w)]_{⊂,⊂})_{θ,θ} | ([M(v, w)]_{⊂,⊂})_{θ,θ} | ([M(v, w)]_{⊂,⊂})_{θ,θ}
[M(v, w)]_{θ,θ} | [M(v, w)]_{θ,θ} | [M(v, w)]_{θ,θ}

PROPOSITION 11. Let M be a valid EDM from T1[v] to T2[w] of [G(v, w)]_{⊂,=}, then M satisfies one and only one of the following assertions:

1. ∃wk ∈ son[w] such that:
   • π(wk) ≠ π(w) and M ∈ [G(v, wk)]_{⊂,=} ∪ [G(v, wk)]_{=,=};
   • π(wk) = π(w) and M ∈ [G(v, wk)]_{⊂,=}.
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ [G(vk, w)]_{⊂,=}.
3. M ∈ [H(v, w)]_{⊂,=}.
4. M = ∅.

SKETCH OF THE PROOF. The proof of this proposition is based on Proposition 7 and Lemmas 2 and 3. Let M be a valid EDM of [G(v, w)]_{⊂,=}, then according to Proposition 7, M satisfies one and only one of the five assertions:

1. ∃wk ∈ son[w] such that M ∈ G(v, wk);
2. ∃vk ∈ son[v] such that M ∈ G(vk, w);
3. (v, w) ∈ M and M∗ ∈ H(v, w);
4. M ∈ H(v, w);
5. M = ∅.

Suppose that there exists wk, a son of w, such that M is in G(v, wk). As established by Proposition 2, if M is in [G(v, wk)]_{⊂,=}, then M is in [G(v, w)]_{⊂,=} (corresponds to cases 1(a) and 2(a) of Table 1, i.e. first line, first column and second line, first column); if M is in [G(v, wk)]_{=,=}, then M is in [G(v, w)]_{⊂,=} if π(wk) ≠ π(w) (corresponds to case 1(c) of Table 1). In the other case, M is in [G(v, wk)]_{=,⊂} ∪ [G(v, wk)]_{⊂,⊂}, and necessarily M is not in [G(v, w)]_{⊂,=} (corresponds to cases 1(b), 2(b), 1(d) and 2(d) of Table 1). The other assertions are determined using the same scheme.


The detailed proof is given in the Appendix. This recursive expression of EDM sets is used to derive a recursive expression of the distance between two quotiented trees.

4.3. Recursive Expression of the Distance between Quotiented Trees. We respectively denote by [D(S1[v], S2[w])]_{⊂,=}, [D(S1[v], S2[w])]_{=,⊂}, [D(S1[v], S2[w])]_{=,=} and [D(S1[v], S2[w])]_{⊂,⊂} the minimum cost �v,w(M) of EDMs M of sets [M(v, w)]_{⊂,=}, [M(v, w)]_{=,⊂}, [M(v, w)]_{=,=} and [M(v, w)]_{⊂,⊂}.

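THEOREM 2. D(G1[v], G2[w]) can be computed recursively:

1. Initialization:

\[
\begin{aligned}
D(\theta, \theta) &= 0,\\
D(H_1[v], \theta) &= \sum_{v_k \in \mathrm{son}[v]} D(G_1[v_k], \theta),\\
D(G_1[v], \theta) &= D(H_1[v], \theta) + d(v, \lambda),\\
D(\theta, H_2[w]) &= \sum_{w_k \in \mathrm{son}[w]} D(\theta, G_2[w_k]),\\
D(\theta, G_2[w]) &= D(\theta, H_2[w]) + d(\lambda, w).
\end{aligned}
\]

2. Computation of the distance between quotiented trees:

\[
D(G_1[v], G_2[w]) = \min
\begin{cases}
D(\theta, G_2[w]) + \min_{w_k \in \mathrm{son}[w]} \{ D(G_1[v], G_2[w_k]) - D(\theta, G_2[w_k]) \},\\
D(G_1[v], \theta) + \min_{v_k \in \mathrm{son}[v]} \{ D(G_1[v_k], G_2[w]) - D(G_1[v_k], \theta) \},\\
D(H_1[v], H_2[w]) + d(v, \lambda) + d(\lambda, w),\\
[D(H_1[v], H_2[w])]_{=,=} + d(v, w),\\
[D(H_1[v], H_2[w])]_{\subset,\subset} + d(v, w).
\end{cases}
\]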
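The initialization step and the final minimum of Theorem 2 can be sketched as follows. This is a minimal illustration under stated assumptions: trees are given as a dictionary `children` mapping a vertex to the list of its sons, `d_del(v)` stands for d(v, λ), and the restricted forest distances passed to `top_level_min` are assumed to have been computed elsewhere. All function and parameter names are hypothetical.

```python
INF = float("inf")


def deletion_costs(children, root, d_del):
    """Return (D_H, D_G) with D_H[v] = D(H1[v], θ) and D_G[v] = D(G1[v], θ)."""
    D_H, D_G = {}, {}

    def visit(v):
        sons = children.get(v, [])
        for c in sons:
            visit(c)                       # sons first, then the parent
        D_H[v] = sum(D_G[c] for c in sons)  # cost of deleting the forest
        D_G[v] = D_H[v] + d_del(v)          # plus deleting the root itself

    visit(root)
    return D_H, D_G


def top_level_min(D_empty_G2w, D_G1v_empty, via_sons_of_w, via_sons_of_v,
                  D_HH, D_HH_eq_eq, D_HH_sub_sub, d_v_del, d_w_ins, d_vw):
    """Minimum of the five terms of step 2 of Theorem 2.

    via_sons_of_w: list of pairs (D(G1[v], G2[wk]), D(θ, G2[wk])).
    via_sons_of_v: list of pairs (D(G1[vk], G2[w]), D(G1[vk], θ)).
    """
    term1 = D_empty_G2w + min((a - b for a, b in via_sons_of_w), default=INF)
    term2 = D_G1v_empty + min((a - b for a, b in via_sons_of_v), default=INF)
    term3 = D_HH + d_v_del + d_w_ins
    term4 = D_HH_eq_eq + d_vw
    term5 = D_HH_sub_sub + d_vw
    return min(term1, term2, term3, term4, term5)


# Toy tree a -> (b, c), b -> (d), with unit deletion cost.
tree = {"a": ["b", "c"], "b": ["d"]}
D_H, D_G = deletion_costs(tree, "a", lambda v: 1)
print(D_H["a"], D_G["a"])
```

The symmetric insertion costs D(θ, H2[w]) and D(θ, G2[w]) follow from the same function applied to the second tree with d(λ, w) as the local cost.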

The recursive relation for computing the partial distance which appears in the previous equation is given, with proof, in the Appendix. These results are summarized in Figure 6 by a dependency graph showing how quantities are recursively dependent one upon the other in the computation of D(G1[v], G2[w]). This graph shows that the computation of D(G1[v], G2[w]) ultimately relies on the computation of special restricted EDMs, detailed in the following section.

4.4. Restricted EDMs with Minimum Cost. From the definition of a restricted EDM (Proposition 6) the problem of finding the restricted EDMs with minimum cost is related to the minimum cost bipartite matching problem. However, we have to compute several optimal restricted EDMs depending on the image of π(v) and π(w). For each different case, we give a method for computing the optimal EDM, based on the modelling of Zhang [12], [8] as a minimum cost flow problem.


Fig. 6. Dependency graphs of the Zhang algorithm (a) and the algorithm for comparing quotiented trees (b). In each graph an arrow from node A to node B means that quantities appearing in node A can be expressed in terms of the quantities in node B. Solid arrows correspond to non-recursive equations between the quantities of nodes A and B while dashed arrows correspond to recursive equations.

4.4.1. Modelling as a minimum cost flow problem. If nv = nw, this is exactly the minimum cost bipartite matching problem. If nv ≠ nw, we have to consider the extra trees in one of the forests. Suppose that nv > nw. One way to solve this problem is to add nv − nw null trees to F2[w] and then use a bipartite matching. However, this results in redundant computation. We can reduce this problem directly to the minimum cost maximum flow problem by adding only one null tree to F2[w].

Given two forests F1[v] and F2[w], we assume that nv > nw. Let I = {v1, v2, . . . , v_{nv}} and J = {w1, w2, . . . , w_{nw}}, where vk, 1 ≤ k ≤ nv, represents the tree T1[vk] and wk, 1 ≤ k ≤ nw, represents the tree T2[wk]. We construct a graph R = (S, A) as follows:

• vertex set: S = {s, t, e} ∪ I ∪ J, where s is the source, t is the sink and e represents a null tree;

• edge set:

\[
A = \Big\{ \bigcup_{v_k \in \mathrm{son}(v)} (s, v_k) \;\cup \bigcup_{v_k \in \mathrm{son}(v)} (v_k, e) \;\cup \bigcup_{w_k \in \mathrm{son}(w)} (w_k, t) \;\cup \bigcup_{v_k \in \mathrm{son}(v)} \Big( \bigcup_{w_l \in \mathrm{son}(w)} (v_k, w_l) \Big) \Big\}.
\]

All the edges have capacity one, except (e, t) whose capacity is nv − nw.

R is a network with integer capacities and the maximum flow is f∗ = nv = max{nv, nw}.
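The construction of R and the resulting minimum cost maximum flow can be sketched in plain Python. The solver below is a generic successive-shortest-path method with Bellman-Ford style relaxation, chosen for brevity; it is not the algorithm of [19]. The edge (e, t), which the capacity rule refers to, is added explicitly, and the placement of deletion costs on the edges (vk, e) follows the cost list given later in this section. The names `build_network` and `min_cost_max_flow`, and the toy costs, are assumptions.

```python
from collections import deque


def build_network(nv, nw, cost, del_cost):
    """Edges (u, v, capacity, cost) of R, assuming nv > nw.

    cost[k][l] is attached to (v_k, w_l); del_cost[k] to (v_k, e).
    """
    edges = []
    for k in range(nv):
        edges.append(("s", ("v", k), 1, 0))
        edges.append((("v", k), "e", 1, del_cost[k]))
        for l in range(nw):
            edges.append((("v", k), ("w", l), 1, cost[k][l]))
    for l in range(nw):
        edges.append((("w", l), "t", 1, 0))
    edges.append(("e", "t", nv - nw, 0))  # the single null tree
    return edges


def min_cost_max_flow(edges, s="s", t="t"):
    """Successive shortest augmenting paths; returns (flow, total_cost)."""
    graph, to, cap, cst = {}, [], [], []
    for u, v, c, w in edges:
        # Forward edge at an even index, its reverse at the next odd index.
        graph.setdefault(u, []).append(len(to))
        to.append(v); cap.append(c); cst.append(w)
        graph.setdefault(v, []).append(len(to))
        to.append(u); cap.append(0); cst.append(-w)
    flow = total = 0
    while True:
        dist, prev = {s: 0}, {}
        queue, queued = deque([s]), {s}
        while queue:
            u = queue.popleft()
            queued.discard(u)
            for idx in graph.get(u, []):
                v = to[idx]
                if cap[idx] > 0 and dist[u] + cst[idx] < dist.get(v, float("inf")):
                    dist[v], prev[v] = dist[u] + cst[idx], idx
                    if v not in queued:
                        queue.append(v)
                        queued.add(v)
        if t not in dist:
            return flow, total
        push, v = float("inf"), t
        while v != s:                      # bottleneck along the path
            push = min(push, cap[prev[v]])
            v = to[prev[v] ^ 1]
        v = t
        while v != s:                      # augment along the path
            cap[prev[v]] -= push
            cap[prev[v] ^ 1] += push
            v = to[prev[v] ^ 1]
        flow += push
        total += push * dist[t]


edges = build_network(3, 2, [[1, 4], [2, 1], [3, 3]], [2, 2, 2])
print(min_cost_max_flow(edges))  # (3, 4)
```

In this toy instance the flow saturates all three edges leaving the source, matching f∗ = nv: two subtrees are matched and one is routed to the null tree.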


Fig. 7. Reduction of the search for an optimal restricted EDM to the minimum cost flow problem. (a) Quotiented trees for which an optimal restricted EDM is sought. (b) Representation of the optimal restricted EDM problem as a minimum cost flow problem to find a restricted EDM in [R(v, w)]_{⊂,⊂}.

In the original graph proposed by Zhang [12], [8], the cost D(T1[vk], T2[wl]) is attached to each edge (vk, wl). However, finding an optimal matching in this case does not ensure that constraints (2) and (3) are satisfied for the quotient EDM. The subsections below show how we can modify the cost of edges to compute the optimal restricted EDM of [R(v, w)]_{⊂,=}, [R(v, w)]_{=,⊂}, [R(v, w)]_{=,=} and [R(v, w)]_{⊂,⊂}. A representation of the network is given in Figures 7 and 8.

According to Proposition 6, for any restricted EDM M, there exists a partition R of M and a matching K of K(v, w). We show in Propositions 26–32 that there exists one and only one element Mi of R and a pair (vk, wl) ∈ K such that Mi is a valid EDM in G(vk, wl), and that for any other element Mj ≠ Mi of the partition R there exists a pair (vp, wq) ≠ (vk, wl) such that Mj is a valid EDM in G(vp, wq), Mi and Mj each belonging to a particular set of valid EDMs of the partition. The results of these propositions are summarized in Tables 5 and 6 and detailed in the Appendix. Note that according to these propositions, if no Mi satisfies the condition (the particular set of valid EDMs of G(vk, wl) is empty), then [R(v, w)]_{α,β} is necessarily empty.


Table 5. For any M in [R(v, w)]_{α,β}, this table gives the memberships of Mi. An empty cell denotes the empty set. A cell composed of several sets denotes the union of these sets.

Mi ∈ | π(vk) = π(v) | π(vk) = π(v) | π(vk) ≠ π(v) | π(vk) ≠ π(v)

M ∈ | π(wl) = π(w) | π(wl) ≠ π(w) | π(wl) = π(w) | π(wl) ≠ π(w)

[R(v, w)]⊂,= [G(vk, wl)]⊂,= [G(vk, wl)]⊂,=[G(vk, wl)]=,=

[R(v, w)]=,⊂ [G(vk, wl)]=,⊂ [G(vk, wl)]=,⊂[G(vk, wl)]=,=

([R(v, w)]=,=)⊂,= ([G(vk, wl)]=,=)⊂,= ([G(vk, wl)]=,=)⊂,=([G(vk, wl)]=,=)=,=

([R(v, w)]=,=)=,⊂ ([G(vk, wl)]=,=)=,⊂ ([G(vk, wl)]=,=)=,⊂([G(vk, wl)]=,=)=,=

([R(v, w)]=,=)=,= ([G(vk, wl)]=,=)=,=([R(v, w)]=,=)⊂,⊂ ([G(vk, wl)]=,=)⊂,⊂([R(v, w)]⊂,⊂)⊂,= ([G(vk, wl)]⊂,⊂)⊂,= ([G(vk, wl)]⊂,⊂)⊂,=

([G(vk, wl)]⊂,⊂)=,=([R(v, w)]⊂,⊂)=,⊂ ([G(vk, wl)]⊂,⊂)=,⊂ ([G(vk, wl)]⊂,⊂)=,⊂

([G(vk, wl)]⊂,⊂)=,=([R(v, w)]⊂,⊂)=,= ([G(vk, wl)]⊂,⊂)=,= [G(vp, wq)]=,⊂ [G(vp, wq)]⊂,= G(vp, wq)

([G(vp, wq)]⊂,⊂)⊂,⊂ ([G(vp, wq)]⊂,⊂)⊂,= ([G(vp, wq)]⊂,⊂)=,⊂([G(vp, wq)]⊂,⊂)⊂,⊂ ([G(vp, wq)]⊂,⊂)⊂,⊂

([R(v, w)]⊂,⊂)⊂,⊂ ([G(vk, wl)]⊂,⊂)⊂,⊂ [G(vp, wq)]=,⊂ [G(vp, wq)]⊂,= G(vp, wq)([G(vp, wq)]⊂,⊂)⊂,= ([G(vp, wq)]⊂,⊂)=,⊂([G(vp, wq)]⊂,⊂)⊂,⊂ ([G(vp, wq)]⊂,⊂)⊂,⊂

Table 6. For any M in [R(v, w)]_{α,β}, this table gives the memberships of Mj. An empty cell denotes the empty set. A cell composed of several sets denotes the union of these sets.

Mj ∈ | π(vk) = π(v) | π(vk) = π(v) | π(vk) ≠ π(v) | π(vk) ≠ π(v)

M ∈ | π(wl) = π(w) | π(wl) ≠ π(w) | π(wl) = π(w) | π(wl) ≠ π(w)

[R(v, w)]⊂,=[R(v, w)]=,⊂

([R(v, w)]=,=)⊂,= ([G(vp, wq)]=,=)θ,θ

([R(v, w)]=,=)=,⊂ ([G(vp, wq)]=,=)θ,θ

([R(v, w)]=,=)=,= ([G(vp, wq)]=,=)=,= [G(vp, wq)]=,⊂ [G(vp, wq)]⊂,= G(vp, wq)([G(vp, wq)]=,=)⊂,⊂ [G(vp, wq)]⊂,⊂ [G(vp, wq)]⊂,⊂([G(vp, wq)]⊂,⊂)=,=([G(vp, wq)]⊂,⊂)⊂,⊂([G(vp, wq)]=,=)θ,θ

([R(v, w)]=,=)⊂,⊂ ([G(vp, wq)]=,=)θ,θ

([R(v, w)]⊂,⊂)⊂,=([R(v, w)]⊂,⊂)=,⊂([R(v, w)]⊂,⊂)=,= ([G(vp, wq)]⊂,⊂)=,= [G(vp, wq)]=,⊂ [G(vp, wq)]⊂,= G(vp, wq)

([G(vp, wq)]⊂,⊂)⊂,⊂ ([G(vp, wq)]⊂,⊂)⊂,= ([G(vp, wq)]⊂,⊂)=,⊂([G(vp, wq)]⊂,⊂)⊂,⊂ ([G(vp, wq)]⊂,⊂)⊂,⊂

([R(v, w)]⊂,⊂)⊂,⊂


Fig. 8. Reduction of the search for an optimal restricted EDM to the minimum cost flow problem in [R(v, w)]_{=,=}. In this case there must exist an edge between {w1, w2} (in π(w)) and {v1, v2, v3} (in π(v)). (a) This edge reaches w1. (b) This edge reaches w2.

According to these results, a cost is attached to each edge of the previous network. The following list gives the cost assigned to each edge of the flow network for computing ([R(v, w)]_{⊂,⊂})_{=,=}:

• For any vk and wl such that π(vk) = π(v) and π(wl) = π(w),

γ(vk, wl) = min{([D(T1[vk], T2[wl])]_{⊂,⊂})_{=,=}, ([D(T1[vk], T2[wl])]_{⊂,⊂})_{⊂,⊂}}.

• For any vk and wl such that π(vk) = π(v) and π(wl) ≠ π(w),

γ(vk, wl) = min{[D(T1[vk], T2[wl])]_{=,⊂}, ([D(T1[vk], T2[wl])]_{⊂,⊂})_{⊂,⊂}, ([D(T1[vk], T2[wl])]_{⊂,⊂})_{=,⊂}}.

• For any vk and wl such that π(vk) ≠ π(v) and π(wl) = π(w),

γ(vk, wl) = min{[D(T1[vk], T2[wl])]_{⊂,=}, ([D(T1[vk], T2[wl])]_{⊂,⊂})_{⊂,⊂}, ([D(T1[vk], T2[wl])]_{⊂,⊂})_{⊂,=}}.

• For any vk and wl such that π(vk) ≠ π(v) and π(wl) ≠ π(w),

γ(vk, wl) = D(T1[vk], T2[wl]).

• For any vk in son(v),

γ(vk, θ1) = D(T1[vk], θ1).

• The other edge costs are null.
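The four cases for γ(vk, wl) listed above translate directly into a small selection function. In this sketch the partial distances for the pair (T1[vk], T2[wl]) are assumed to be available in a dictionary `D`; the key strings are illustrative labels chosen for this example, not notation from the paper.

```python
def gamma(same_v, same_w, D):
    """Cost of edge (vk, wl) when computing ([R(v, w)]_{⊂,⊂})_{=,=}.

    same_v is True when pi(vk) == pi(v); same_w when pi(wl) == pi(w).
    D maps illustrative labels to partial distances between T1[vk] and T2[wl].
    """
    if same_v and same_w:
        return min(D["(sub,sub)eq,eq"], D["(sub,sub)sub,sub"])
    if same_v and not same_w:
        return min(D["eq,sub"], D["(sub,sub)sub,sub"], D["(sub,sub)eq,sub"])
    if not same_v and same_w:
        return min(D["sub,eq"], D["(sub,sub)sub,sub"], D["(sub,sub)sub,eq"])
    # Both subtrees lie outside the complexes of v and w: unconstrained case.
    return D["total"]


# Hypothetical partial distances for one pair of subtrees.
D = {
    "(sub,sub)eq,eq": 7, "(sub,sub)sub,sub": 5,
    "(sub,sub)eq,sub": 6, "(sub,sub)sub,eq": 4,
    "eq,sub": 3, "sub,eq": 8, "total": 2,
}
print(gamma(True, False, D))
```

These costs would then be placed on the (vk, wl) edges of the network R before solving the minimum cost maximum flow problem.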

4.5. Algorithm and Complexity. The following algorithm computes a distance between two quotiented trees:

Input: G1 and G2.
Output: D(G1[x], G2[y]) for any x ∈ G1 and y ∈ G2.

D(θ, θ) = 0.
For v ∈ G1,
  For w ∈ G2,
    D(θ, θ) = 0,
    D(H1[v], θ) = Σ_{vk ∈ son[v]} D(G1[vk], θ),
    D(G1[v], θ) = D(H1[v], θ) + d(v, λ),
    D(θ, H2[w]) = Σ_{wk ∈ son[w]} D(θ, G2[wk]),
    D(θ, G2[w]) = D(θ, H2[w]) + d(λ, w),

    computation of
      [D(H1[v], H2[w])]_{⊂,=},
      [D(H1[v], H2[w])]_{=,⊂},
      [D(H1[v], H2[w])]_{=,=},
      [D(H1[v], H2[w])]_{⊂,⊂},
      [D(H1[v], H2[w])]_{∅,∅},
    (according to Propositions 22 and 23),

    and then computation of
      [D(G1[v], G2[w])]_{⊂,=},
      [D(G1[v], G2[w])]_{=,⊂},
      [D(G1[v], G2[w])]_{=,=},
      [D(G1[v], G2[w])]_{⊂,⊂},
      [D(G1[v], G2[w])]_{∅,∅},

    D(G1[v], G2[w]) = min {
      D(θ, G2[w]) + min_{wk ∈ son[w]} {D(G1[v], G2[wk]) − D(θ, G2[wk])},
      D(G1[v], θ) + min_{vk ∈ son[v]} {D(G1[vk], G2[w]) − D(G1[vk], θ)},
      D(H1[v], H2[w]) + d(v, λ) + d(λ, w),
      [D(H1[v], H2[w])]_{=,=} + d(v, w),
      [D(H1[v], H2[w])]_{⊂,⊂} + d(v, w)
    }.
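The listing does not state the order in which the pairs (v, w) are visited. Since D(G1[v], G2[w]) depends on values for the sons of v and w, one natural reading, assumed here rather than taken from the text, is a postorder traversal of both trees. The sketch below only builds such a schedule; `children1`, `children2` and the function names are hypothetical.

```python
def postorder(children, root):
    """Vertices of the tree rooted at `root`, every son before its parent."""
    order, stack = [], [(root, False)]
    while stack:
        v, expanded = stack.pop()
        if expanded:
            order.append(v)
        else:
            stack.append((v, True))
            for c in reversed(children.get(v, [])):
                stack.append((c, False))
    return order


def pair_schedule(children1, root1, children2, root2):
    """Order of the pairs (v, w) such that the pairs (v, wk) and (vk, w),
    for sons wk of w and vk of v, are scheduled before (v, w)."""
    order2 = postorder(children2, root2)
    return [(v, w) for v in postorder(children1, root1) for w in order2]


g1 = {"a": ["b", "c"], "b": ["d"]}
g2 = {"x": ["y"]}
print(postorder(g1, "a"))       # ['d', 'b', 'c', 'a']
print(pair_schedule(g1, "a", g2, "x")[:2])
```

With this schedule every quantity appearing on the right-hand side of the recursion is already tabulated when the pair (v, w) is processed.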


At one step of the recursion, i.e. for given v and w, the computation of terms 1 and 2 in D(G1[v], G2[w]) takes a time proportional to nv + nw. The computation of terms D(H1[v], H2[w]), [D(H1[v], H2[w])]_{=,=}, [D(H1[v], H2[w])]_{⊂,⊂} relies on the computation of minima, which also takes a time proportional to nv + nw, and on the computation of costs of restricted EDMs.

The computation of the cost of a restricted EDM in [R(v, w)]_{⊂,⊂} uses a graph with integer capacities, non-negative edge costs and maximum flow f∗ = nv + nw. The complexity of finding the minimum cost maximum flow for such a graph, using the improvement proposed by Tarjan [19], is O(m × |f∗| × log2(n)) where m is the number of edges and n is the number of vertices. Here, n = nv + nw + 4 and m = nv × nw + 2nv + 2nw + 3; therefore the complexity is O(nv × nw × (nv + nw) × log2(nv + nw)). The case of [R(v, w)]_{=,=} is similar: as discussed in Proposition 29, a total number of min{|�(v)|, |�(w)|} flow graphs are used, where the number of edges is m = (nv − 1) × nw + 2nv + 2nw + 3 + |�(w)| if |�(w)| ≥ |�(v)| and m = nv × (nw − 1) + 2nv + 2nw + 3 + |�(v)| otherwise. The total complexity of the minimum cost maximum flow computation (here split into several sub-graphs) is O(min{|�(v)|, |�(w)|} × nv × nw × (nv + nw) × log2(nv + nw)).

The overall complexity of the algorithm is thus

\[
O\big( |T_1| \times |T_2| \times (\deg(T_1) + \deg(T_2)) \times \min\{\deg_{\pi_1}(T_1), \deg_{\pi_2}(T_2)\} \times \log_2(\deg(T_1) + \deg(T_2)) \big).
\]

5. Conclusion. In this paper we have extended an algorithm to compute a distance between unordered trees [8] and thus we have defined a distance between quotiented trees. The resulting algorithm computes this distance recursively in polynomial time, using the dynamic programming principle. The main source of complexity is a bipartite matching problem which occurs when comparing the forests rooted at two given vertices. We adapted a minimum cost maximum flow algorithm to take account of constraints derived from the quotiented structures. The final algorithm has the same complexity as Zhang's algorithm, multiplied by a factor min{deg_{π1}(T1), deg_{π2}(T2)} which expresses the mean number of components of a complex (i.e. of a macro-constituent).

This work is part of a project to develop computer tools for studying plant architecture [20], [21]. The proposed algorithm is currently integrated within tools dealing with the quantitative evaluation of plant similarity [9]. This algorithm opens new perspectives for the comparison of plant architectures by considering extensions of the algorithm to multiscale tree graphs [10], [9].

Appendix. Proof of Propositions. Note that G(v, w) ⊆ T(v, w), and then G(v, w) can be partitioned, according to Proposition 3, into sets (G(v, w))_{⊂,=}, (G(v, w))_{=,⊂}, (G(v, w))_{=,=}, (G(v, w))_{⊂,⊂} and (G(v, w))_{θ,θ}.


PROOF OF PROPOSITION 1. For any S1[x] and S1[y] in S1(v) such that S1[x] ⊆ S1[y], by definition, y ≤ x, then

M_{w/y} = M_{w/x} ∪ {z2 ∈ V1[w] | ∃z1 ∈ V1[y]\V1[x]; (z1, z2) ∈ M}

and thus M_{w/x} ⊆ M_{w/y}. Finally, y ≤ x gives

M(S1[x]) ⊆ M(S1[y]).

PROOF OF PROPOSITION 2. Let M be a valid EDM from S1[v] to S2[w] such that M(S1[x]) ⊆ M(S1[y]). Following definition 2 of M, there are two cases depending on the image of S1[y]:

1. M(S1[y]) is a tree T2[t2]: there exists a vertex t1 in S1[y] such that (t1, t2) ∈ M. If M(S1[x]) ≠ θ, then for any vertex z2 in M(S1[x]), such that z2 has an image z1 in S1[x], z2 is in M(S1[y]), that is, in T2[t2]. According to constraint (2):

t2 ≤ z2 ⇔ t1 ≤ z1.

Thus, S1[x] ⊆ S1[y].

2. M(S1[y]) is a forest F2[t1 ∧ u1], where t1 and u1 are two vertices in S1[y] such that t1 and u1 have respectively an image t2 and u2 by M. If M(S1[x]) ≠ θ, then for any vertex z2 in M(S1[x]), such that z2 has an image z1 in S1[x], z2 is in M(S1[y]), that is, in F2[t2]. According to constraint (3):

t2 ∧ u1 < z2 ⇔ t1 ∧ u1 < z1.

Thus, S1[x] ⊆ S1[y].

The converse follows from Proposition 1.

PROOF OF PROPOSITION 3.

1. If M = ∅, then M(S1[v]) = θ and M(S2[w]) = θ;
2. else, M ≠ ∅ and then necessarily M(S1[v]) ≠ θ and M(S2[w]) ≠ θ.

According to Proposition 2, M(S1[v]) ⊆ M(S2[w]) and M(S2[w]) ⊆ M(S1[v]), thus there are four cases:

(a) M(S1[v]) ⊂ S2[w] and M(S2[w]) = S1[v];
(b) M(S1[v]) = S2[w] and M(S2[w]) ⊂ S1[v];
(c) M(S1[v]) = S2[w] and M(S2[w]) = S1[v];
(d) M(S1[v]) ⊂ S2[w] and M(S2[w]) ⊂ S1[v].

PROOF OF PROPOSITION 4. Let M be a valid EDM from T1[v] to T2[w]. M is in T(v, w)_{=,=} if and only if M(T1[v]) = T2[w] and M(T2[w]) = T1[v]. Then by the definition of M, v and w both have an image by M. By condition (2) of a valid EDM, v and w are necessarily the image of each other.

PROOF OF PROPOSITION 5. Obvious following the definition of a valid EDM from F1[v] to F2[w].


PROOF OF PROPOSITION 6. See [8].

PROOF OF PROPOSITION 7. This proposition is decomposed into two parts:

1. Let M be a valid EDM from T1[v] to T2[w]. According to Proposition 3 applied to trees:
• either M(T1[v]) ≠ θ and M(T2[w]) ≠ θ and
  (a) M(T1[v]) ⊂ T2[w] and M(T2[w]) = T1[v], then there exists wk, a son of w, such that M(T1[v]) ⊆ T2[wk]. If there is a son wl of w such that M(T2[wl]) ≠ θ, then necessarily M(T1[v]) = T2[w] (a contradiction with our hypothesis), thus for any son wl ≠ wk of w, M(T2[wl]) = θ and M(T2[wk]) ≠ θ, M ∈ T(v, wk);
  (b) M(T1[v]) = T2[w] and M(T2[w]) ⊂ T1[v], similar to the previous case;
  (c) M(T1[v]) = T2[w] and M(T2[w]) = T1[v], then Proposition 4 can be applied and M∗ ∈ F(v, w);
  (d) M(S1[v]) ⊂ S2[w] and M(S2[w]) ⊂ S1[v], then Proposition 5 can be applied and M ∈ F(v, w);
• or M(T1[v]) = θ and M(T2[w]) = θ and then M = ∅.

2. Let M be a valid EDM from F1[v] to F2[w]. According to Proposition 3 applied to forests:
• either M(F1[v]) ≠ θ and M(F2[w]) ≠ θ and
  (a) M(F1[v]) ⊂ F2[w] and M(F2[w]) = F1[v], then there exists wk, a son of w, such that M(F1[v]) ⊆ F2[wk]. If there is a son wl of w such that M(F2[wl]) ≠ θ, then necessarily M(F1[v]) = F2[w] (a contradiction with our hypothesis), thus for any son wl ≠ wk of w, M(F2[wl]) = θ and M(F2[wk]) ≠ θ, M ∈ F(v, wk);
  (b) M(F1[v]) = F2[w] and M(F2[w]) ⊂ F1[v], similar to the previous case;
  (c) M(F1[v]) = F2[w] and M(F2[w]) = F1[v], or M(F1[v]) ⊂ F2[w] and M(F2[w]) ⊂ F1[v], then according to Proposition 6, M ∈ R(v, w);
• or M(F1[v]) = θ and M(F2[w]) = θ and then M = ∅.

PROOF OF THEOREM 1. This theorem was first proved by Zhang in [12] and [8].

PROOF OF LEMMA 1. The proof is similar to the proof of Theorem 2 proposed by Zhang in [8]. A composition of EDMs can be defined as follows. Let M1 be a valid EDM from T1 to T2 and let M2 be a valid EDM from T2 to T3, then

M1 ◦ M2 = {(x, y) | ∃z s.t. (x, z) ∈ M1 and (z, y) ∈ M2}.

This definition is exactly similar to the definition proposed by Zhang. To prove that M1 ◦ M2 is a valid EDM, we can use the same scheme given by Zhang in his Lemma 2 [8]. To check conditions (4)–(6), we thus just need to choose three pairs (x1, z1), (x2, z2) and (x3, z3) in M1 ◦ M2 such that π(x1), π(x2), π(x3) are different. Since M1 and M2 are valid EDMs, conditions (4) and (5) are obviously verified. Condition (6) is also verified if we apply the proof of Zhang to the complexes π(x1), π(x2), π(x3). Furthermore, the proof of Zhang to show γ(M1 ◦ M2) ≤ γ(M1) + γ(M2) can also be directly applied to quotiented tree graphs. Thus the same results can be applied in that case.
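The composition used in this proof can be written out as a short Python function, under the assumption that a mapping is represented as a set of vertex pairs. The function name `compose` and the sample mappings are illustrative only.

```python
def compose(M1, M2):
    """M1 ∘ M2 = {(x, y) | there is z with (x, z) in M1 and (z, y) in M2}."""
    images = {}
    for z, y in M2:
        images.setdefault(z, set()).add(y)
    return {(x, y) for x, z in M1 for y in images.get(z, ())}


M1 = {(1, "a"), (2, "b")}      # mapping from T1 to T2
M2 = {("a", "X"), ("c", "Y")}  # mapping from T2 to T3
print(compose(M1, M2))         # {(1, 'X')}
```

Vertex 2 of T1 is dropped because its image b has no image in T3; in edit terms it is a vertex mapped by M1 whose image is then deleted by M2.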


Now to show that D is a distance we need to prove the following relations:

1. D(T1[v], T1[v]) = 0;
2. D(T1[v], T2[w]) = D(T2[w], T1[v]);
3. D(T1[v], T3[x]) ≤ D(T1[v], T2[w]) + D(T2[w], T3[x]).

Relations (1) and (2) are direct consequences of the definition of valid EDMs. Furthermore, relation (3) is a result obtained from the definition of the composition between EDMs (see Theorem 2 of [8]).

PROOF OF LEMMA 2. For any x and y in S1[v] × S2[w], let M be a valid EDM from S1[x] to S2[y], then M is a valid EDM from S1[v] to S2[w]. Furthermore, N(S1[π(v)]) = N(S1[π(x)]) and N(S2[π(w)]) = N(S2[π(y)]). According to Proposition 3, there are three cases depending on the position of S1[π(x)] with respect to S2[π(y)]:

1. N(S1[π(x)]) = θ, then, necessarily, N(S1[π(v)]) = θ;
2. N(S1[π(x)]) ⊂ S2[π(y)], then, necessarily, N(S1[π(v)]) ⊂ S2[π(y)] ⊆ S2[π(w)] ⇒ N(S1[π(v)]) ⊂ S2[π(w)];
3. N(S1[π(x)]) = S2[π(y)], then, necessarily, N(S1[π(v)]) = S2[π(y)] and if π(y) = π(w), then S2[π(y)] = S2[π(w)] ⇒ N(S1[π(v)]) = S2[π(w)]. Else π(y) > π(w), then S2[π(y)] ⊂ S2[π(w)] ⇒ N(S1[π(v)]) ⊂ S2[π(w)].

PROOF OF PROPOSITION 8. Obvious from Lemma 2.

PROOF OF PROPOSITION 9. Obvious from Lemma 2.

PROOF OF PROPOSITION 10. Obvious from Lemma 2.

PROOF OF LEMMA 3. For any x and y in S1[v] × S2[w], and for any valid EDM M in M(x, y), N(S1(π(v))) = N(S1(π(x))). Furthermore, π(w) ≤ π(y) and then, according to Proposition 2, S2(π(y)) ⊆ S2(π(w)), thus if N(S1(π1(x))) ⊂ S2(π2(y)), then N(S1(π1(v))) ⊂ S2(π2(w)).

If π(y) = π(w), then S2(π(y)) = S2(π(w)) and if N(S1(π1(x))) = S2(π2(y)), then N(S1(π1(v))) = S2(π2(w)).

Otherwise, S2(π(y)) ⊂ S2(π(w)), N(S1(π(x))) = S2(π(y)) ⊂ S2(π(w)) ⇒ N(S1(π1(v))) ⊂ S2(π2(w)).

Let M1 ∈ M(x, y) such that N1(S1(π(v))) ⊂ S2(π(w)) and N1(S2(π(w))) = S1(π(v)), then there exists y′ ≥ y such that π(y′) ≠ π(w) and N1(S1(π(v))) ⊆ S2(π(y′)) and N1(S2(π(y′))) = S1(π(v)). Let M2 ∈ M(z, t) and let M be the union M1 ∪ M2. Note that M1 ∩ M2 = ∅, then the image of S2(π(y′)) by N is given by N1, N(S2(π(y′))) = N1(S2(π(y′))). According to the previous result, N(S2(π(y′))) = S1(π(v)) ⇒ N(S1(π(v))) ⊆ S2(π(y′)). Let z′ be a descendant of z such that π(z′) ≠ π(v) and N(S1(π(z′))) ≠ θ. N is an increasing function, S1(π(z′)) ⊂ S1(π(v)) ⇔ N(S1(π(z′))) ⊆ N(S1(π(v))) ⊆ S2(π(y′)), however, N(S1(π(z′))) = N2(S1(π(z′))) ⊆ S2(π(t′)) where t′ ≥ t such that π(t′) ≠ π(w), or N(S1(π(z′))) = N2(S1(π(z′))) ⊆ S2(π(w)). In both cases N(S1(π(z′))) ⊈ S2(π(y′)), this is a contradiction, thus N(S1(π(z′))) = θ and N2(S1(π(z′))) = θ. Finally, M = M1 and then N(S1(π(v))) ⊂ S2(π(w)) and N(S2(π(w))) = S1(π(v)), this means M ∈ M⊂,=(v, w).

Following previous results, M2 is not a valid EDM from S1[z] to S2[t] such that N2(S1(π(v))) ⊂ S2(π(w)) and N2(S2(π(w))) = S1(π(v)) (and conversely). M2 must satisfy one of the previous properties.

Reciprocal: obvious.

PROOF OF PROPOSITION 11. The proof of this proposition is based on Proposition 7 and Lemmas 2 and 3. Let M be a valid EDM of [G(v, w)]⊂,=, then according to Proposition 7 (G(v, w) ⊆ T(v, w)), M satisfies one and only one of the five assertions:

1. ∃wk ∈ son[w] such that M ∈ G(v, wk). As established by Proposition 2, M is in [G(v, w)]⊂,= if and only if M is in [G(v, wk)]⊂,= (corresponds to cases 1(a) and 2(a) of Table 2), or if M is in [G(v, wk)]=,= and π(wk) ≠ π(w) (corresponds to case 1(c) of the table). In the other cases, M is in [G(v, wk)]=,⊂ ∪ [G(v, wk)]⊂,⊂, and necessarily M is not in [G(v, w)]⊂,= (corresponds to cases 1(b), 2(b), 1(d) and 2(d) of the table).

2. ∃vk ∈ son[v] such that M ∈ G(vk, w). As established by Proposition 2, M is in [G(v, w)]⊂,= if and only if M is in [G(vk, w)]⊂,= and π(vk) = π(v).

3. (v, w) ∈ M and M∗ ∈ H(v, w), then N(T1(π(v))) = T2(π(w)) and N(T2(π(w))) = T1(π(v)), and then M is not in [G(v, w)]⊂,=.

4. M∗ ∈ H(v, w), and necessarily M is in [G(v, w)]⊂,= if and only if M∗ is in [H(v, w)]⊂,=.

5. M = ∅.

[G(v, w)]=,⊂ is determined symmetrically from [G(v, w)]⊂,=.

PROPOSITION 12. Let M be a valid EDM from G1[v] to G2[w] of [G(v, w)]=,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that π(wk) = π(w) and M ∈ [G(v, wk)]=,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ [G(vk, w)]=,=;
3. (v, w) ∈ M and M∗ ∈ [H(v, w)]=,= ∪ [H(v, w)]⊂,⊂;
4. M ∈ [H(v, w)]=,=;
5. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 13. Let M be a valid EDM from G1[v] to G2[w] of ([G(v, w)]=,=)⊂,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that π(wk) = π(w) and M ∈ ([G(v, wk)]=,=)⊂,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ ([G(vk, w)]=,=)⊂,=;
3. (v, w) ∈ M and M∗ ∈ ([H(v, w)]=,=)⊂,= ∪ ([H(v, w)]⊂,⊂)⊂,=;
4. M ∈ ([H(v, w)]=,=)⊂,=;
5. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.


([G(v, w)]=,=)=,⊂ is determined symmetrically from ([G(v, w)]=,=)⊂,=. In the following, ([G(v, w)]=,=)=,= denotes ([G(v, w)]=,=)=,= ∪ ([G(v, w)]=,=)⊂,⊂.

PROPOSITION 14. Let M be a valid EDM from G1[v] to G2[w] of ([G(v, w)]=,=)=,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that π(wk) = π(w) and M ∈ ([G(v, wk)]=,=)=,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ ([G(vk, w)]=,=)=,=;
3. (v, w) ∈ M and M∗ ∈ ([H(v, w)]=,=)=,= ∪ ([H(v, w)]⊂,⊂)=,=;
4. M ∈ ([H(v, w)]=,=)=,=;
5. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 15. Let M be a valid EDM from G1[v] to G2[w] of [G(v, w)]⊂,⊂, then M satisfies one of the following assertions:

1. M ∈ [H(v, w)]⊂,⊂;
2. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 16. Let M be a valid EDM from G1[v] to G2[w] of ([G(v, w)]⊂,⊂)⊂,=, then M satisfies one of the following assertions:

1. M ∈ ([H(v, w)]⊂,⊂)⊂,=;
2. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

([G(v, w)]⊂,⊂)=,⊂ is determined symmetrically from ([G(v, w)]⊂,⊂)⊂,=. In the following, ([G(v, w)]⊂,⊂)=,= denotes ([G(v, w)]⊂,⊂)=,= ∪ ([G(v, w)]⊂,⊂)⊂,⊂.

PROPOSITION 17. Let M be a valid EDM from G1[v] to G2[w] of ([G(v, w)]⊂,⊂)=,=, then M satisfies one of the following assertions:

1. M ∈ ([H(v, w)]⊂,⊂)=,=;
2. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 18. Let M be a valid EDM from H1[v] to H2[w] of [H(v, w)]⊂,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that:
   (a) π(wk) ≠ π(w) and M ∈ [H(v, wk)]⊂,= ∪ [H(v, wk)]=,=, or
   (b) π(wk) = π(w) and M ∈ [H(v, wk)]⊂,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ [H(vk, w)]⊂,=;
3. M ∈ [R(v, w)]⊂,=;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

[H(v, w)]=,⊂ is determined symmetrically from [H(v, w)]⊂,=.

PROPOSITION 19. Let M be a valid EDM from H1[v] to H2[w] of [H(v, w)]=,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that π(wk) = π(w) and M ∈ [H(v, wk)]=,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ [H(vk, w)]=,=;
3. M ∈ [R(v, w)]=,=;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 20. Let M be a valid EDM from H1[v] to H2[w] of ([H(v, w)]=,=)⊂,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that π(wk) = π(w) and M ∈ ([H(v, wk)]=,=)⊂,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ ([H(vk, w)]=,=)⊂,=;
3. M ∈ ([R(v, w)]=,=)⊂,=;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 21. Let M be a valid EDM from H1[v] to H2[w] of ([H(v, w)]=,=)=,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that π(wk) = π(w) and M ∈ ([H(v, wk)]=,=)=,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ ([H(vk, w)]=,=)=,=;
3. M ∈ ([R(v, w)]=,=)=,=;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 22. Let M be a valid EDM from H1[v] to H2[w] of [H(v, w)]⊂,⊂, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that:
   (a) M ∈ [H(v, wk)]⊂,⊂, or
   (b) π(wk) ≠ π(w) and M ∈ [H(v, wk)]=,⊂;
2. ∃vk ∈ son[v] such that:
   (a) M ∈ [H(vk, w)]⊂,⊂, or
   (b) π(vk) ≠ π(v) and M ∈ [H(vk, w)]⊂,=;
3. M ∈ [R(v, w)]⊂,⊂;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 23. Let M be a valid EDM from H1[v] to H2[w] of ([H(v, w)]⊂,⊂)⊂,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that:
   (a) π(wk) = π(w) and M ∈ ([H(v, wk)]⊂,⊂)⊂,=, or
   (b) π(wk) ≠ π(w) and M ∈ ([H(v, wk)]⊂,⊂)⊂,= ∪ ([H(v, wk)]⊂,⊂)=,=;
2. ∃vk ∈ son[v] such that π(vk) = π(v) and M ∈ ([H(vk, w)]⊂,⊂)⊂,=;
3. M ∈ ([R(v, w)]⊂,⊂)⊂,=;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

PROPOSITION 24. Let M be a valid EDM from H1[v] to H2[w] of ([H(v, w)]⊂,⊂)=,=, then M satisfies one of the following assertions:

1. ∃wk ∈ son[w] such that:
   (a) π(wk) = π(w) and M ∈ ([H(v, wk)]⊂,⊂)=,=, or
   (b) π(wk) ≠ π(w) and M ∈ ([H(v, wk)]⊂,⊂)=,⊂ ∪ [H(v, wk)]=,⊂;
2. ∃vk ∈ son[v] such that:
   (a) π(vk) = π(v) and M ∈ ([H(vk, w)]⊂,⊂)=,=, or
   (b) π(vk) ≠ π(v) and M ∈ ([H(vk, w)]⊂,⊂)⊂,= ∪ [H(vk, w)]⊂,=;
3. M ∈ ([R(v, w)]⊂,⊂)=,=;
4. M = ∅.

PROOF. The proof is similar to the proof of Proposition 11.

THEOREM 3. Let v1, v2, ..., vn be the sons of v and let w1, w2, ..., wp be the sons of w, then

\[
D(G_1[v], G_2[w]) = \min
\begin{cases}
D(\theta, G_2[w]) + \min_{w_k \in \mathrm{son}[w]} \{ D(G_1[v], G_2[w_k]) - D(\theta, G_2[w_k]) \}, \\
D(G_1[v], \theta) + \min_{v_k \in \mathrm{son}[v]} \{ D(G_1[v_k], G_2[w]) - D(G_1[v_k], \theta) \}, \\
[D(H_1[v], H_2[w])]_{\subset,=} + d(v, \lambda) + d(\lambda, w), \\
[D(H_1[v], H_2[w])]_{=,\subset} + d(v, \lambda) + d(\lambda, w), \\
[D(H_1[v], H_2[w])]_{\subset,\subset} + d(v, w), \\
[D(H_1[v], H_2[w])]_{=,=} + d(v, w).
\end{cases}
\]
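The way the six candidates of Theorem 3 are combined can be sketched in Python as follows. This is only an illustration under stated assumptions: all sub-distances are taken as already computed and passed in as plain numbers or dictionaries, the function and parameter names and the class labels ("sub,eq", "eq,sub", "sub,sub", "eq,eq") are hypothetical, and d(v, λ), d(λ, w), d(v, w) are read as the local deletion, insertion and substitution costs.

```python
import math


def best_child_term(whole, mapped, alone):
    """whole + min_k (mapped[k] - alone[k]); +inf when there is no son."""
    return whole + min((mapped[k] - alone[k] for k in mapped), default=math.inf)


def theorem3_min(ins_G2w, del_G1v,      # D(θ, G2[w]) and D(G1[v], θ)
                 d_v_wk, ins_G2wk,      # D(G1[v], G2[wk]) and D(θ, G2[wk]) per son wk
                 d_vk_w, del_G1vk,      # D(G1[vk], G2[w]) and D(G1[vk], θ) per son vk
                 h,                     # class label -> [D(H1[v], H2[w])] for that class
                 del_v, ins_w, sub_vw): # d(v, λ), d(λ, w), d(v, w)
    """Minimum over the six candidate terms listed in Theorem 3."""
    return min(
        best_child_term(ins_G2w, d_v_wk, ins_G2wk),
        best_child_term(del_G1v, d_vk_w, del_G1vk),
        h["sub,eq"] + del_v + ins_w,
        h["eq,sub"] + del_v + ins_w,
        h["sub,sub"] + sub_vw,
        h["eq,eq"] + sub_vw,
    )


# Placeholder values, chosen only to exercise the function.
result = theorem3_min(
    ins_G2w=5.0, del_G1v=4.0,
    d_v_wk={"w1": 3.0, "w2": 6.0}, ins_G2wk={"w1": 2.0, "w2": 2.0},
    d_vk_w={"v1": 4.0}, del_G1vk={"v1": 3.0},
    h={"sub,eq": 2.0, "eq,sub": 3.0, "sub,sub": 1.0, "eq,eq": 2.5},
    del_v=1.0, ins_w=1.0, sub_vw=0.5,
)
```

With these placeholder values the six candidates are 6.0, 5.0, 4.0, 5.0, 1.5 and 3.0, so the fifth term is selected. In a full implementation the inputs would come from the tables filled bottom-up by the dynamic programme.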


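LEMMA 4. Let v1, v2, ..., vn be the sons of v and let w1, w2, ..., wp be the sons of w, then

\[
[D(H_1[v], H_2[w])]_{\subset,=} = \min
\begin{cases}
D(H_1[v], \theta) + \min_{w_k \in \mathrm{son}[w]} \{ [D(H_1[v], H_2[w_k])]_{\subset,=} - D(\theta, H_2[w_k]) \}, \\
D(H_1[v], \theta) + \min_{w_k \mid \pi(w_k) \neq \pi(w)} \{ [D(H_1[v], H_2[w_k])]_{=,=} - D(\theta, H_2[w_k]) \}, \\
D(\theta, H_2[w]) + \min_{v_k \mid \pi(v_k) \neq \pi(v)} \{ [D(H_1[v_k], H_2[w])]_{\subset,=} - D(H_1[v_k], \theta) \}, \\
\min_{R \in ([R(v,w)]_{=,=})_{\subset,=}} \{ \text{�}_{v,w}(R) \}.
\end{cases}
\]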
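The minima restricted to sons lying inside or outside the complex of their father, which appear from Lemma 4 onwards, can be sketched in Python as follows. The sketch assumes the quotient map π is available as a dictionary from vertices to complex labels; the helper names and all numeric values are hypothetical.

```python
import math


def split_sons_by_complex(parent, sons, pi):
    """Split sons into those in the parent's complex and those in another one."""
    same = [s for s in sons if pi[s] == pi[parent]]
    other = [s for s in sons if pi[s] != pi[parent]]
    return same, other


def constrained_term(whole, candidates, mapped, alone):
    """whole + min over candidates of (mapped[s] - alone[s]); +inf if none."""
    return whole + min((mapped[s] - alone[s] for s in candidates), default=math.inf)


# Hypothetical quotient map: vertex -> complex label.
pi = {"w": "A", "w1": "A", "w2": "B", "w3": "B"}
same, other = split_sons_by_complex("w", ["w1", "w2", "w3"], pi)

mapped = {"w1": 4.0, "w2": 2.0, "w3": 5.0}  # placeholder class-restricted distances
alone = {"w1": 1.0, "w2": 1.5, "w3": 1.0}   # placeholder D(θ, H2[wk]) values
term_same = constrained_term(6.0, same, mapped, alone)
term_other = constrained_term(6.0, other, mapped, alone)
term_none = constrained_term(6.0, [], mapped, alone)
```

Returning an infinite cost when no son satisfies the constraint lets the enclosing minimum ignore that candidate without special-casing it.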

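LEMMA 5. Let v1, v2, ..., vn be the sons of v and let w1, w2, ..., wp be the sons of w, then

\[
[D(H_1[v], H_2[w])]_{=,=} = \min
\begin{cases}
([D(H_1[v], H_2[w])]_{=,=})_{\subset,=}, \\
([D(H_1[v], H_2[w])]_{=,=})_{=,\subset}, \\
([D(H_1[v], H_2[w])]_{=,=})_{=,=}, \\
([D(H_1[v], H_2[w])]_{=,=})_{\theta,\theta},
\end{cases}
\]

\[
([D(H_1[v], H_2[w])]_{=,=})_{\subset,=} = \min
\begin{cases}
D(\theta, H_2[w]) + \min_{w_k \mid \pi(w_k) = \pi(w)} \{ ([D(H_1[v], H_2[w_k])]_{=,=})_{\subset,=} - D(\theta, H_2[w_k]) \}, \\
D(H_1[v], \theta) + \min_{v_k \mid \pi(v_k) = \pi(v)} \{ ([D(H_1[v_k], H_2[w])]_{=,=})_{\subset,=} - D(H_1[v_k], \theta) \}, \\
\min_{R \in ([R(v,w)]_{=,=})_{\subset,=}} \{ \text{�}_{v,w}(R) \},
\end{cases}
\]

\[
([D(H_1[v], H_2[w])]_{=,=})_{=,=} = \min
\begin{cases}
D(\theta, H_2[w]) + \min_{w_k \mid \pi(w_k) = \pi(w)} \{ ([D(H_1[v], H_2[w_k])]_{=,=})_{=,=} - D(\theta, H_2[w_k]) \}, \\
D(H_1[v], \theta) + \min_{v_k \mid \pi(v_k) = \pi(v)} \{ ([D(H_1[v_k], H_2[w])]_{=,=})_{=,=} - D(H_1[v_k], \theta) \}, \\
\min_{R \in ([R(v,w)]_{=,=})_{=,=}} \{ \text{�}_{v,w}(R) \},
\end{cases}
\]

\[
([D(H_1[v], H_2[w])]_{=,=})_{\theta,\theta} = \min
\begin{cases}
D(\theta, H_2[w]) + \min_{w_k \mid \pi(w_k) = \pi(w)} \{ ([D(H_1[v], H_2[w_k])]_{=,=})_{\theta,\theta} - D(\theta, H_2[w_k]) \}, \\
D(H_1[v], \theta) + \min_{v_k \mid \pi(v_k) = \pi(v)} \{ ([D(H_1[v_k], H_2[w])]_{=,=})_{\theta,\theta} - D(H_1[v_k], \theta) \}, \\
\min_{R \in ([R(v,w)]_{=,=})_{\theta,\theta}} \{ \text{�}_{v,w}(R) \}.
\end{cases}
\]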


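LEMMA 6. Let v1, v2, ..., vn be the sons of v and let w1, w2, ..., wp be the sons of w, then

\[
[D(H_1[v], H_2[w])]_{\subset,\subset} = \min
\begin{cases}
([D(H_1[v], H_2[w])]_{\subset,\subset})_{\subset,=}, \\
([D(H_1[v], H_2[w])]_{\subset,\subset})_{=,\subset}, \\
([D(H_1[v], H_2[w])]_{\subset,\subset})_{=,=},
\end{cases}
\]

\[
([D(H_1[v], H_2[w])]_{\subset,\subset})_{\subset,=} = \min
\begin{cases}
D(\theta, H_2[w]) + \min_{w_k \in \mathrm{son}[w]} \{ ([D(H_1[v], H_2[w_k])]_{\subset,\subset})_{\subset,=} - D(\theta, H_2[w_k]) \}, \\
D(\theta, H_2[w]) + \min_{\pi(w_k) \neq \pi(w)} \{ ([D(H_1[v], H_2[w_k])]_{\subset,\subset})_{=,=} - D(\theta, H_2[w_k]) \}, \\
D(H_1[v], \theta) + \min_{\pi(v_k) \neq \pi(v)} \{ ([D(H_1[v_k], H_2[w])]_{\subset,\subset})_{\subset,=} - D(H_1[v_k], \theta) \}, \\
\min_{R \in ([R(v,w)]_{\subset,\subset})_{\subset,=}} \{ \text{�}_{v,w}(R) \},
\end{cases}
\]

\[
([D(H_1[v], H_2[w])]_{\subset,\subset})_{=,=} = \min
\begin{cases}
D(\theta, H_2[w]) + \min_{\pi(w_k) = \pi(w)} \{ ([D(H_1[v], H_2[w_k])]_{\subset,\subset})_{=,=} - D(\theta, H_2[w_k]) \}, \\
D(\theta, H_2[w]) + \min_{\pi(w_k) \neq \pi(w)} \{ ([D(H_1[v], H_2[w_k])]_{\subset,\subset})_{=,\subset} - D(\theta, H_2[w_k]) \}, \\
D(\theta, H_2[w]) + \min_{\pi(w_k) \neq \pi(w)} \{ [D(H_1[v], H_2[w_k])]_{=,\subset} - D(\theta, H_2[w_k]) \}, \\
D(H_1[v], \theta) + \min_{\pi(v_k) = \pi(v)} \{ ([D(H_1[v_k], H_2[w])]_{\subset,\subset})_{=,=} - D(H_1[v_k], \theta) \}, \\
D(H_1[v], \theta) + \min_{\pi(v_k) \neq \pi(v)} \{ ([D(H_1[v_k], H_2[w])]_{\subset,\subset})_{\subset,=} - D(H_1[v_k], \theta) \}, \\
D(H_1[v], \theta) + \min_{\pi(v_k) \neq \pi(v)} \{ [D(H_1[v_k], H_2[w])]_{\subset,\subset} - D(H_1[v_k], \theta) \}, \\
\min_{R \in ([R(v,w)]_{\subset,\subset})_{=,=}} \{ \text{�}_{v,w}(R) \}.
\end{cases}
\]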

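LEMMA 7. Let v1, v2, ..., vn be the sons of v and let w1, w2, ..., wp be the sons of w, then

\[
[D(G_1[v], G_2[w])]_{\subset,=} = \min
\begin{cases}
D(\theta, G_2[w]) + \min_{w_k \in \mathrm{son}[w]} \{ [D(G_1[v], G_2[w_k])]_{\subset,=} - D(\theta, G_2[w_k]) \}, \\
D(\theta, G_2[w]) + \min_{w_k \mid \pi(w_k) \neq \pi(w)} \{ [D(G_1[v], G_2[w_k])]_{=,=} - D(\theta, G_2[w_k]) \}, \\
[D(H_1[v], H_2[w])]_{\subset,=} + d(v, \lambda) + d(\lambda, w),
\end{cases}
\]

\[
[D(G_1[v], G_2[w])]_{=,=} = \min
\begin{cases}
D(\theta, G_2[w]) + \min_{w_k \mid \pi(w_k) = \pi(w)} \{ [D(G_1[v], G_2[w_k])]_{=,=} - D(\theta, G_2[w_k]) \}, \\
D(G_1[v], \theta) + \min_{v_k \mid \pi(v_k) = \pi(v)} \{ [D(G_1[v_k], G_2[w])]_{=,=} - D(G_1[v_k], \theta) \}, \\
[D(H_1[v], H_2[w])]_{=,=} + d(v, w), \\
[D(H_1[v], H_2[w])]_{\subset,\subset} + d(v, w),
\end{cases}
\]

\[
([D(G_1[v], G_2[w])]_{=,=})_{\subset,=} = \min
\begin{cases}
D(\theta, G_2[w]) + \min_{w_k \mid \pi(w_k) = \pi(w)} \{ ([D(G_1[v], G_2[w_k])]_{=,=})_{\subset,=} - D(\theta, G_2[w_k]) \}, \\
D(G_1[v], \theta) + \min_{v_k \mid \pi(v_k) = \pi(v)} \{ ([D(G_1[v_k], G_2[w])]_{=,=})_{\subset,=} - D(G_1[v_k], \theta) \}, \\
([D(H_1[v], H_2[w])]_{=,=})_{\subset,=} + d(v, w), \\
([D(H_1[v], H_2[w])]_{\subset,\subset})_{\subset,=} + d(v, w),
\end{cases}
\]

\[
([D(G_1[v], G_2[w])]_{=,=})_{=,=} = \min
\begin{cases}
D(\theta, G_2[w]) + \min_{w_k \mid \pi(w_k) = \pi(w)} \{ ([D(G_1[v], G_2[w_k])]_{=,=})_{=,=} - D(\theta, G_2[w_k]) \}, \\
D(G_1[v], \theta) + \min_{v_k \mid \pi(v_k) = \pi(v)} \{ ([D(G_1[v_k], G_2[w])]_{=,=})_{=,=} - D(G_1[v_k], \theta) \}, \\
([D(H_1[v], H_2[w])]_{=,=})_{=,=} + d(v, w), \\
([D(H_1[v], H_2[w])]_{\subset,\subset})_{=,=} + d(v, w),
\end{cases}
\]

\[
[D(G_1[v], G_2[w])]_{\subset,\subset} = [D(H_1[v], H_2[w])]_{\subset,\subset} + d(v, \lambda) + d(\lambda, w),
\]

\[
([D(G_1[v], G_2[w])]_{\subset,\subset})_{\subset,=} = ([D(H_1[v], H_2[w])]_{\subset,\subset})_{\subset,=} + d(v, \lambda) + d(\lambda, w),
\]

\[
([D(G_1[v], G_2[w])]_{\subset,\subset})_{=,=} = ([D(H_1[v], H_2[w])]_{\subset,\subset})_{=,=} + d(v, \lambda) + d(\lambda, w).
\]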

Proofs of the following propositions are a direct consequence of Propositions 8–10.

PROPOSITION 25. For any restricted EDM M in [R(v, w)]⊂,=, there exist vk and wl, sons of v and w, such that:

• π(vk) = π(v), π(wl) = π(w) and M ∈ [G(vk, wl)]⊂,=, or
• π(vk) = π(v), π(wl) ≠ π(w) and M ∈ [G(vk, wl)]⊂,= ∪ [G(vk, wl)]=,=.

[R(v, w)]=,⊂ is determined by symmetry.

PROPOSITION 26. For any restricted EDM M in ([R(v, w)]⊂,⊂)⊂,=, there exist vk and wl, sons of v and w, such that:

• π(vk) = π(v), π(wl) = π(w) and M ∈ ([G(vk, wl)]⊂,⊂)⊂,=, or
• π(vk) = π(v), π(wl) ≠ π(w) and M ∈ ([G(vk, wl)]⊂,⊂)⊂,= ∪ ([G(vk, wl)]⊂,⊂)=,=.

([R(v, w)]⊂,⊂)=,⊂ is determined by symmetry.

PROPOSITION 27. For any restricted EDM M in ([R(v, w)]⊂,⊂)=,=, there exists a partition R of M and a matching K of K(v, w) such that there exists M′ in ([G(vk, wl)]⊂,⊂)=,= with π(vk) = π(v), π(wl) = π(w) and for any element M′′ of the partition R there exists a pair (vp, wq) ∈ K:

• if π(vp) = π(v) and π(wq) = π(w), then M′′ ∈ ([G(vp, wq)]⊂,⊂)=,= ∪ ([G(vp, wq)]⊂,⊂)⊂,⊂;
• if π(vp) = π(v) and π(wq) ≠ π(w), then M′′ ∈ [G(vp, wq)]=,⊂ ∪ ([G(vp, wq)]⊂,⊂)⊂,⊂ ∪ ([G(vp, wq)]⊂,⊂)=,⊂;
• if π(vp) ≠ π(v) and π(wq) = π(w), then M′′ ∈ [G(vp, wq)]⊂,= ∪ ([G(vp, wq)]⊂,⊂)⊂,⊂ ∪ ([G(vp, wq)]⊂,⊂)⊂,=;
• if π(vp) ≠ π(v) and π(wq) ≠ π(w), then M′′ ∈ G(vp, wq).

PROPOSITION 28. For any restricted EDM M in ([R(v, w)]⊂,⊂)⊂,⊂, there exists a partition R of M and a matching K of K(v, w) such that for any element M′ of the partition R there exists a pair (vp, wq) ∈ K such that:

• if π(vp) = π(v) and π(wq) = π(w), then M′ ∈ ([G(vp, wq)]⊂,⊂)⊂,⊂;
• if π(vp) = π(v) and π(wq) ≠ π(w), then M′ ∈ [G(vp, wq)]=,⊂ ∪ ([G(vp, wq)]⊂,⊂)⊂,⊂ ∪ ([G(vp, wq)]⊂,⊂)=,⊂;
• if π(vp) ≠ π(v) and π(wq) = π(w), then M′ ∈ [G(vp, wq)]⊂,= ∪ ([G(vp, wq)]⊂,⊂)⊂,⊂ ∪ ([G(vp, wq)]⊂,⊂)⊂,=;
• if π(vp) ≠ π(v) and π(wq) ≠ π(w), then M′ ∈ G(vp, wq).

PROPOSITION 29. For any restricted EDM M in ([R(v, w)]=,=)⊂,=, there exist vk and wl, sons of v and w, such that π(vk) = π(v), π(wl) = π(w) and M ∈ ([G(vk, wl)]=,=)⊂,=.

([R(v, w)]=,=)=,⊂ is determined by symmetry.

PROPOSITION 30. For any restricted EDM M in ([R(v, w)]=,=)⊂,⊂, there exists a partition R of M and a matching K of K(v, w) such that there exists an element M′ of R and a pair (vk, wl) ∈ K such that π(vk) = π(v), π(wl) = π(w) and M′ ∈ ([G(vk, wl)]=,=)⊂,⊂ and for any element M′′ of the partition R there exists a pair (vp, wq) ∈ K satisfying one of these two properties:

• if π(vp) = π(v) and π(wq) = π(w), then M′′ ∈ ([G(vp, wq)]=,=)θ,θ;
• otherwise, M′′ = ∅.

PROPOSITION 31. For any restricted EDM M in ([R(v, w)]=,=)θ,θ, there exists a partition R of M and a matching K of K(v, w) such that there exists an element M′ of R and a pair (vk, wl) ∈ K such that π(vk) = π(v), π(wl) = π(w) and M′ ∈ ([G(vk, wl)]=,=)θ,θ and for any element M′′ of the partition R there exists a pair (vp, wq) ∈ K satisfying one of these two properties:

• if π(vp) = π(v) and π(wq) = π(w), then M′′ ∈ ([G(vp, wq)]=,=)θ,θ;
• otherwise, M′′ = ∅.

PROPOSITION 32. For any restricted EDM M in ([R(v, w)]=,=)=,=, there exists a partition R of M and a matching K of K(v, w) such that there exists an element M′ of R and a pair (vk, wl) ∈ K such that π(vk) = π(v), π(wl) = π(w) and M′ ∈ ([G(vk, wl)]=,=)=,= and for any element M′′ of the partition R there exists a pair (vp, wq) ∈ K satisfying one of these four properties:

• if π(vp) = π(v) and π(wq) = π(w), then M′′ ∈ ([G(vp, wq)]=,=)=,= ∪ ([G(vp, wq)]=,=)⊂,⊂ ∪ ([G(vp, wq)]=,=)θ,θ ∪ ([G(vp, wq)]⊂,⊂)=,= ∪ ([G(vp, wq)]⊂,⊂)⊂,⊂;
• if π(vp) = π(v) and π(wq) ≠ π(w), then M′′ ∈ [G(vp, wq)]⊂,= ∪ [G(vp, wq)]⊂,⊂;
• if π(vp) ≠ π(v) and π(wq) = π(w), then M′′ ∈ [G(vp, wq)]=,⊂ ∪ [G(vp, wq)]⊂,⊂;
• M′′ ∈ G(vp, wq).
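Propositions 27–32 describe a restricted EDM through a matching K between the sons of v and the sons of w. As a rough illustration only, the following Python sketch enumerates all such matchings by brute force and returns the cheapest one. It is not the procedure of the paper: the cost tables are hypothetical placeholders, a pair forbidden by the class constraints is assumed to be encoded by an infinite cost, and an unmatched son is assumed to pay a deletion or insertion cost.

```python
import math
from itertools import permutations


def min_cost_son_matching(del_cost, ins_cost, pair_cost):
    """Brute-force minimum cost over all matchings between two lists of sons.

    del_cost[i]     : cost paid when son i of v stays unmatched
    ins_cost[j]     : cost paid when son j of w stays unmatched
    pair_cost[i][j] : cost of pairing son i of v with son j of w
                      (math.inf encodes a forbidden pair)
    """
    n, p = len(del_cost), len(ins_cost)
    # Pad the targets with None so that any son of v may stay unmatched.
    targets = list(range(p)) + [None] * n
    best = math.inf
    for assignment in set(permutations(targets, n)):
        used = {j for j in assignment if j is not None}
        cost = sum(ins_cost[j] for j in range(p) if j not in used)
        for i, j in enumerate(assignment):
            cost += del_cost[i] if j is None else pair_cost[i][j]
        best = min(best, cost)
    return best


# Placeholder costs for two sons of v and three sons of w.
best = min_cost_son_matching(
    del_cost=[3.0, 2.0],
    ins_cost=[4.0, 1.0, 2.0],
    pair_cost=[[1.0, 5.0, math.inf],
               [6.0, math.inf, 0.5]],
)
```

The enumeration is exponential in the number of sons and is only meant to make the role of the matching concrete; a polynomial assignment or flow algorithm would replace it in practice.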

References

[1] R. A. Wagner and M. J. Fisher, The string-to-string correction problem, Journal of the Association for Computing Machinery, vol. 21, pp. 168–173, 1974.
[2] S. M. Selkow, The tree-to-tree editing problem, Information Processing Letters, vol. 6, no. 6, pp. 184–186, 1977.
[3] K.-C. Tai, The tree-to-tree correction problem, Journal of the Association for Computing Machinery, vol. 26, no. 3, pp. 422–433, 1979.
[4] S.-Y. Lu, A tree-to-tree distance and its application to cluster analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, pp. 219–224, 1979.
[5] M. Farach and M. Thorup, Sparse dynamic programming for evolutionary-tree comparison, SIAM Journal of Computing, vol. 26, no. 1, pp. 210–230, 1997.
[6] Y. Takahashi, Y. Satoh, H. Suzuki, and S. Sasaki, Recognition of largest common structural fragment among a variety of chemical structures, Analytical Science, vol. 3, pp. 23–28, 1987.
[7] B. A. Shapiro and K. Zhang, Comparing multiple RNA secondary structures using trees comparisons, Cabios, vol. 6, pp. 309–318, 1990.
[8] K. Zhang, A constrained edit distance between unordered labeled trees, Algorithmica, vol. 15, pp. 205–222, 1996.
[9] P. Ferraro and C. Godin, A distance measure between plant architectures, Annals of Forest Science, vol. 57, pp. 445–461, June 2000.
[10] C. Godin and Y. Caraglio, A multiscale model of plant topological structures, Journal of Theoretical Biology, vol. 191, pp. 1–46, 1998.
[11] A. Ohmori and E. Tanaka, A unified view on tree metrics, Syntactic and Structural Pattern Recognition, vol. 45, pp. 85–100, 1988.
[12] K. Zhang, A new editing-based distance between unordered trees, in Combinatorial Pattern Matching, 4th Annual Symposium (Padova, Italy), pp. 254–265, 1993.
[13] P. Ferraro, Proprietes des alignements valides entre arborescences quotientees, Technical Report, Cirad, Plant Modelling Program, Montpellier, 2000.
[14] K. Zhang and T. Jiang, Some max snp-hard results concerning unordered labeled trees, Information Processing Letters, vol. 49, pp. 249–254, 1994.
[15] P. Kilpeläinen and H. Mannila, The tree inclusion problem, in Proceedings of the International Joint Conference on the Theory and Practice of Software, vol. 1, pp. 202–214, 1991.
[16] E. Tanaka and K. Tanaka, The tree-to-tree editing problem, International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 221–240, 1988.
[17] M. Minoux, Mathematical Programming. Theory and Algorithms, Wiley-Interscience, New York, 1986.
[18] C. Berge, Hypergraphs, Elsevier, Amsterdam, 1989.
[19] R. E. Tarjan, Data Structures and Network Algorithms, CBMS–NSF Regional Conference Series in Applied Mathematics, CBMS, Washington, DC, 1983.
[20] C. Godin, E. Costes, and Y. Caraglio, Exploring plant topological structure with the amapmod software: an outline, Silva Fennica, vol. 31, pp. 355–366, 1997.
[21] C. Godin, Y. Guedon, and E. Costes, Exploration of a plant architecture database with the amapmod software illustrated on an apple tree hybrid family, Agronomie, vol. 19, pp. 163–184, March–May 1999.