A P system and a constructive membrane-inspired DNA algorithm for solving the Maximum Clique Problem

A

amtacl©

K

1

capcAptttp

f

(

0

BioSystems 90 (2007) 687–697

Available online at www.sciencedirect.com

A P system and a constructive membrane-inspired DNAalgorithm for solving the Maximum Clique Problem

Marc Garcıa-Arnau a,∗, Daniel Manrique a, Alfonso Rodrıguez-Paton a, Petr Sosık a,b

a Departamento Inteligencia Artificial, Universidad Politecnica de Madrid (UPM), Boadilla del Monte s/n, 28660 Madrid, Spainb Institute of Computer Science, Faculty of Philosophy and Science, Silesian University, Bezrucovo nam. 13, 74601 Opava, Czech Republic

Received 2 April 2006; received in revised form 22 December 2006; accepted 20 February 2007

bstract

We present a P system with replicated rewriting to solve the Maximum Clique Problem for a graph. Strings representing cliquesre built gradually. This involves the use of inhibitors that control the space of all generated solutions to the problem. Calculating theaximum clique for a graph is a highly relevant issue not only on purely computational grounds, but also because of its relationship

o fundamental problems in genomics. We propose to implement the designed P system by means of a DNA algorithm. This
lgorithm is then compared with two standard papers that addressed the same problem and its DNA implementation in the past. Thisomparison is carried out on the basis of a series of computational and physical parameters. Our solution features a significantlyower cost in terms of time, the number and size of strands, as well as the simplicity of the biological implementation.
2007 Elsevier Ireland Ltd. All rights reserved.

comput
eywords: Membrane computing; Maximum Clique Problem; DNA
. Introduction

Natural computing is the name given to the disciplineovering a number of areas working with unconventionalnd naturally inspired computational models. One of therimary fields making up this discipline is biomolecularomputing, which came into being in 1994 after Leonarddleman’s famous experiment (Adleman, 1994). In hisaper, Adleman used DNA molecules for the first timeo solve a complex computational problem: the Hamil-
onian Path Problem. A year later Lipton generalized theechniques that Adleman had used, proposing a com-utational model that solved the Satisfiability Problem
∗ Corresponding author. Tel.: +34 91 336 69 07;ax: +34 91 352 48 19.

E-mail address: [email protected]. Garcıa-Arnau).

303-2647/$ – see front matter © 2007 Elsevier Ireland Ltd. All rights reservdoi:10.1016/j.biosystems.2007.02.005

ing; Constructive approach; NP-complete problem

for logical formulas, SAT (Lipton, 1995). Since then,many papers exploring the construction of new mod-els and examining the use of molecules as support forcomputation have been published. The following docu-ments are good references for researchers in this area:Paun (1998), Hagiya (1999), Amos (1999) and Paun etal. (1998). Moreover, two excellent references of cur-rent work in this discipline are Benenson et al. (2004)and Seelig et al. (2006). Another of the major milestonesin the short history of natural computing was, unques-tionably, the emergence of membrane computing (alsoknown as P systems), introduced by Gheorghe Paun in1998 (Paun, 2000). In his seminal article “Computingwith Membranes”, Paun presented a new abstract com-putational model inspired by the structure and behaviour
of living cells. As result of that early work, a sizeablegroup of researchers were seduced by membrane com-puting and a lot of related literature has seen the lightsince then (Calude and Paun, 2004).
ed.

mailto:[email protected]

dx.doi.org/10.1016/j.biosystems.2007.02.005

BioSys
688 M. Garcıa-Arnau et al. /
Many of these papers addressing both membrane andDNA computing have tackled computationally difficultproblems (class NP-complete) (Ledesma et al., 2004,2005). NP-complete problems have two prominent fea-tures: (1) there are as yet no polynomial algorithms tosolve them, and (2) all their “yes” instances are certifiedto be verifiable efficiently (Garey and Johnson, 1979).Although, thanks to the parallel and distributed capabil-ities of these new paradigms, it has been possible to solveNP-complete problems in polynomial time, researchershave not managed to limit the exponential growth ofsome of the other resources involved in problem solving,such as the number of strings or molecules or even thenumber of hardware components needed. Therefore, agreat deal of effort (especially in molecular computing)has gone into optimizing the exponential consumptionof these resources. Several strategies have been pro-posed for this purpose: first, the adaptation of especiallyspace-efficient classical algorithms to a DNA scenario,as was done for 3SAT in Ogihara (1996). Second, thecreation of new constructive algorithms conscientiouslydesigned to optimize the number of DNA strings, as inthe solution of the maximum independent set problemin Bach et al. (1996) (which nevertheless involves somepre-processing work on a conventional computer). Third,the use of strategies that reduce the number of biologicaloperations needed and the resulting risk of error, as inthe work of Manca and Zandron (2001) to solve the SAT.Fourth, the application of dynamic DNA programmingtechniques, as demonstrated in Baum and Boneh (1996)to solve the Knapsack Problem or even the creation ofcomputational models based on a destructive strategyover RNA strands as in Cukras et al. (1999) for solv-ing the Knight Problem. After analysing these and otherpapers, it is clear that, when dealing with an NP-completeproblem, a compromise has to be reached between spaceoptimization and algorithm time, without overlookingthe number and complexity of the operations involved.Indeed, space-inefficient linear-time algorithms soonplace limitations on the size of the instances that they canmanage to solve, whereas algorithms that mind spaceefficiency can turn out to be relatively slow or com-plex because of the type of operations they use, whichincreases the likelihood of processing errors.

Based on these points, we use membrane computingin this article to solve a standard NP-complete prob-lem: calculating the maximum clique for a graph. Thisproblem is very interesting for several reasons. First, it
is related to the Common Algorithmic Problem (CAP)defined by Tom Head in Head et al. (1999). In this article,Head defended that the solution of many NP-completeproblems fits one and the same algorithmic mould. Later,
tems 90 (2007) 687–697

this problem was further examined in Perez-Jimenez andRomero-Campero (2005), and a family of recognizer Psystems with active membranes was used to solve it. TheCAP definition can be easily mapped to the MaximumClique Problem for a graph. Indeed, it has been foundthat there is a series of NP-complete Constraint Satis-faction Problems (CSP) that can be solved by findingthe maximum clique of the complementary graph of theproblem constraints graph. Apart from its purely compu-tational appeal, the Maximum Clique Problem is also ofnotable biological interest. Indeed, important problemshave recently been identified within the field of systemsbiology that call for the solution of the Maximum CliqueProblem, the maximum independent set problem or cal-culating all the maximal cliques for a graph. Some ofthese problems are compiled and stated in Butenko andWilhelm (2005), including matching three-dimensionalmolecular structures, protein docking or genome rear-rangements and mapping genome data.

The remainder of the article is organized as it fol-lows. Section 2 presents the definition of the MaximumClique Problem, reviews a number of papers that haveaddressed the problem to date and proposes a P systemwith replicated rewriting and inhibitors to solve the prob-lem. The proposed P system adopts a problem-solvingconstructive strategy, quite different from the brute forcestrategies usually used in both membrane computing andmany biomolecular algorithms. In Section 3, we pro-pose the implementation of this P system using a DNAalgorithm. Being thus implemented, the efficiency of theproposed solution can be compared in Section 4 with thatof two standard papers that tackled the same problemearlier in Ouyang et al. (1997) and Head et al. (1999).A number of computational and physical parameters aredefined to make this comparison. Finally, Section 5 setsout the final remarks.

2. A P system with replicated rewriting andinhibitors for solving the Maximum CliqueProblem

2.1. The Maximum Clique Problem and relatedwork

Let G = (V, E) be a graph with n nodes. A clique in Gis defined as a subset V′ ⊂ V such that each two verticesin V′ are connected by an arc in E. The Maximum CliqueProblem then involves finding the biggest subset V′ of
totally connected nodes in the graph.
Over the last decade, many papers have tackled thisproblem from the natural computing paradigm point ofview. Indeed, a DNA computing model that uses the gen-

BioSyst

eoipCrtpfBlePpswo(ssemttZco

iai

2i

mbcrmcrn

oaNectb

M. Garcıa-Arnau et al. /

ration of integer combinations and a series of biologicalperations to solve NP-complete problems is presentedn Amos et al. (1996). Among other algorithms pro-osed in that paper, there is one that solves the Maximumlique Problem for a graph. However, a DNA bioalgo-

ithm to solve this problem was implemented for the firstime in Ouyang et al. (1997). Since this ground-breakingaper, many others have addressed the same problemrom different viewpoints. For example, the work ofack et al. (1999) focuses on the relation between evo-

utionary computing and DNA computing, proposing anvolutionary DNA approach to the Maximum Cliqueroblem. Later, a parallel algorithm using fluids dis-lacement in a three-dimensional microfluidic system toolve the Maximum Clique Problem in a six-vertex graphas presented in Chiu et al. (2001). Almost simultane-usly another microflow reactor was used in McCaskill2001) to solve the same problem using a brute forcetrategy and codifying each possible subgraph as a DNAtrand. An aqueous algorithm was proposed in Headt al. (1999) to solve the same instance of the Maxi-um Clique Problem as in Ouyang et al. (1997), but this

ime employing a new model using plasmids to storehe algorithm string information. Finally, the work ofimmermann (2002) is an example of use of the DNAomputing sticker model to get all the cliques of size Kf a graph.

In the following, we present a new proposal for solv-ng the Maximum Clique Problem of a graph, using

constructive P system with replicated rewriting andnhibitors.

.2. A P system with replicated rewriting andnhibitors

P systems with replicated rewriting are defined asembrane systems whose basic objects are not sym-

ols but structured elements, like strings. These systemsontain multisets of strings that are processed usingeplicated rewriting rules, that is, rules that, apart fromodifying strings, can increase the number of their

opies. Each such rule consists of n ≥ 1 subrules. When aule is applied to a string, the string is first replicated intocopies and then each subrule is applied to one copy.Usually, both P systems with replicated rewriting and

ther types of P systems (for example P systems withctive membranes) adopt brute force strategies to solveP-complete problems. Hence, these systems use differ-

nt tools (rewriting rules with replication or membranereation, respectively) to generate the space of all solu-ions to the combinatorial problem and then select theest one (Krishna and Rama, 2001; Zandron et al., 2000).

ems 90 (2007) 687–697 689

For a problem like calculating the maximum cliquefor a graph with n nodes, this would mean having togenerate first the 2n possible cliques and then select thebiggest one. However, it is fairly clear that not all the2n possible cliques of a graph will exist in most cases.Indeed, for any graph G, only those cliques that do notcontain two nodes linked by an arc in its complementarygraph G′ are valid, since we know that any pair of nodeslinked in G′ will not be connected in G. If that comple-mentary graph is G′ = (V, E′), then E′ is what we willterm the constraints set.

In order to make use of the valuable information con-tained in that complementary graph G′, we define setsof inhibitors for the rules of our P system. The idea ofusing promoters or inhibitors in P systems rules has aclear biological inspiration (Bottoni et al., 2002; Ionescuand Sburlan, 2004). Thus, a rule modelling a biologicalreaction can or cannot take place in the presence of cer-tain enzymatic proteins. In our case, for each rule Ri ofthe P system, we define a set of inhibitors Ui containingthose symbols aj in presence of which the rule cannot beapplied.

Hence, the P system proposed here uses the informa-tion from that constraints set E′ to gradually build onlythose cliques of G that are valid, thereby optimizing thenumber of strings needed to solve the problem. Given agraph G = ({a1, . . ., an}, E), we formally define this Psystem as a construct:

Π = (V, μ, M1, . . . , Mn, R1, . . . , Rn) where

V = {a1, d |1 ≤ i ≤ n} ,

μ = [n· · ·[2[1]1]2· · ·]n,M1 = {d}, Mi = {λ} with 2 ≤ i ≤ n,

Ri = {dw → (dwai, out)¬Ui||(dw, out)|

w ∈ {aj|1 ≤ j ≤ i − 1}∗, 0 ≤ |w| ≤ i − 1}with Ui = {aj ∈ V |{ai, aj} ∈ E′}, 1 ≤ i ≤ n

Thus, the alphabet of the P system consists of n + 1elements: d, an auxiliary symbol, and a1, . . ., an, thenodes of the graph. Furthermore, the defined P systemΠ has a membrane structure μ composed of n embed-ded membranes. Each membrane contains a set of rulesRi with replication. Each rule is composed of a coupleof subrules ri,a: dw → (dwai, out)¬Ui

and ri,b: dw →(dw, out). Initially, the innermost membrane 1 containsa string with a single symbol d (M = {d}), whereas the
1
other membranes are empty (Mi = {λ}, 2 ≤ i ≤ n). Thesystem starts to work by applying the two rewriting sub-rules of membrane 1 to that string d. At the end of thatfirst step, two strings are created and sent to the next

BioSys
membrane. The process is repeated in each membraneapplying their respective rules Ri so that, step by step,all the valid cliques of the graph are generated. Notethat the subrules ri,b are always applicable to all strings,whereas the subrules ri,a are only applicable if the stringsto be rewritten do not contain any of the symbols inthe inhibitor set Ui. Each of these inhibitor sets con-tains the aj linked by an arc in G′ to the node ai, thatis, Ui = {aj ∈ V|{ai, aj}∈ E′}. When this happens, theonly applicable rule is ri,b, and there is no replicationin this case. In step n, then, the system outputs a lan-guage consisting of all the valid cliques for the graphG, the biggest of which is the solution to the MaximumClique Problem for that graph. In Section 3 we proposean implementation of Π by means of a DNA algorithmthat uses a constructive strategy to simulate the P systembehaviour.

2.3. Extracting only the maximum size cliques

The P system proposed in the previous section gener-ates simultaneously strings representing all the possiblecliques of the graph G. When implemented in the DNAframework as described in the next sections, the stringsare represented by DNA strands. Therefore, using thetechnique called electrophoresis, we can easily separatethe longest strands representing maximal cliques. Onecan ask, however, how to separate the strings represent-ing the maximal cliques also by the P system itself, toprovide the solution of the Maximum Clique Problem.To achieve this, we enrich the structure of the P systemΠ as follows:

Π = (V ′, μ′, M1, . . . , Mn+1, R′1, . . . , R

′n+1) where

V ′ = {ai, dj |1 ≤ i ≤ n, 0 ≤ j ≤ n },μ′ = [n+1· · ·[2[1]1]2· · ·]n+1,

M1 = {d0}, Mi = {λ} with 2 ≤ i ≤ n + 1,

R′i = {dkw → (dk+1wai, out)¬Ui

||(dkw, out)|w ∈{aj|1 ≤ j ≤ i − 1}∗, |w| = k, 0 ≤ k ≤ i − 1}with Ui = {aj ∈ V |{ai, aj} ∈ E′}, 1 ≤ i ≤ n

R′n+1 = {dkw → (dk+1w, here)|w ∈ {aj|1 ≤ j ≤ n}∗,

0 ≤ |w| ≤ n, 0 ≤ k ≤ n − 1}∪{dnw → (dnw, out)|w ∈ {aj|1 ≤ j ≤ n}∗,0 ≤ |w| ≤ n}.

The function of the P system Π ′ is quite similar tothat of Π, with the following difference. Each producedstring bears an explicit information about the size of theclique it represents, encoded in the symbol dk. After n

tems 90 (2007) 687–697

steps, the strings representing all cliques are collected inthe membrane n + 1. From now on, the system outputs ineach step only strings of a certain size, in the decreasingorder. Therefore, in step n + 1 the strings representingcliques of size n are sent out (if there are any), in stepn + 2 the cliques of size n − 1 and so on. The first outputfrom the system represents the maximal cliques. Onecan easily increase, by additional symbols and rules, theintervals between sending out solutions of different sizes.Hence there can be enough time to separate the first (andmaximal) solution.

3. A constructive membrane-inspired DNAalgorithm for solving the Maximum CliqueProblem

3.1. Computational model

The computational benefits of using DNA come fromits complementarity. DNA strands, formed by repeat-ing four types of nitrogen bases {A, C, G, T}, have thenatural property of pairing, meaning that two comple-mentary strands oriented in opposing directions pair toform a double helix structure. This is known as DNA’ssecondary structure and was discovered by Watson andCrick in 1953. Chemical properties of DNA guaranteethat adenine (A) can only pair with thymine (T), whereasguanine (G) can only join up with cytosine (C).

Lipton’s computational model or the test tube modelcan be used to solve generic computational problems bydefining the problem strings as binary words constructedover the {A, C, G, T} alphabet. In the algorithms basedon this model, the initial tube is composed of the binarywords of n bits that encode the space of all problemsolutions. Additionally, this model defines a series ofbiological operations that are used to operate on the ini-tial contents of that test tube until the correct solution isfinally reached.

In the model that we propose to implement the Psystem described in Section 2.2, the problem solutionis represented in a manner more reminiscent of whatAdleman did, that is, as a word over a finite alphabet,determined by the problem domain. Quite contrary tothe stipulations of Adleman’s and Lipton’s models, theinitial test tube is empty in our case, that is, does not con-tain the space of all the possible solutions of the problem.This way the problem strings (in this case valid cliquesfor the graph) are built gradually. This point is conceptu-
ally much closer to how the proposed P system operates,gradually generating the strings from the initial symbold. Apart from these differences, the biological opera-tions used to implement the P system are similar to the

BioSyst

ofaE

1

2

3

4

56

ateogaomtoaFtctAa

Fo


nes defined in Adleman’s and Lipton’s models. Theirunction is to allow problem strings to be built gradu-lly, removing any that are invalid as they are detected.xpressly, the following operations on DNA are used:

. Merge (t1, t2, t3): Mixes two different multisets ofstrings (tubes) t1 and t2 to form a single multiset int3.

. Replicate (t1, t2, t3): Replicates the contents of a mul-tiset of strings (tube) t1 into two multisets t2 and t3such that each of t2 and t3 contains one copy of theoriginal content of t1, which is emptied.

. Separate (t1, a, t2): Given a tube or multiset of stringst1 and a substring a, we extract from t1 all the stringscontaining a, and a tube t2 is generated with all theextracted strings.

. Append (t1, a): Given a tube t1, append the string ato the end of all the strings the tube contains. If t1 isempty, then the result of the operation is just {a}.

. Delete (t1): Given a tube t1, delete its contents.

. MeasureStrings (t1): Measure the strings of tube t1 toidentify the biggest.

Following the design instructions set out in Ouyang etl. (1997), the use of 20-base pair long strands (20-mer)o encode each node ai of the problem domain is consid-red sufficient for the proposed examples. The last modelperation, MeasureStrings, can be implemented usingel electrophoresis. The Merge and Delete operationsre simple manipulations on test tubes. The Replicateperation can be perfomed via one cycle of the poly-erase chain reaction (PCR). To assure that both t2 and

3 contain one copy of each original strand, we can markne primer of each pair (using, e.g. magnetic beads) andfter performing PCR we separate all the marked strands.inally, a series of operators are needed to implement

he Separate and Append operations (Fig. 1). Specifi-
ally, n separation operators (complementary strands ofhe n elements ai) are needed for the Separate operation.s regards the Append operation, the strand encoding
i can be considered to be composed of a prefix and

ig. 1. DNA strands for coding domain elements, append operators and sepperator; (c) element ai append operators.

ems 90 (2007) 687–697 691

a suffix of 10-mer each, called p(ai) and s(ai). Then,each element ai needs exactly i − 1 append operatorsof the form 5′-s(aj)p(ai)-3′, 1 ≤ j < i. Therefore, thealgorithm needs a total of k operators of this type, with:

k =n−1∑

i=1

i

3.2. The membrane-inspired DNA algorithm

In this section, we present the DNA algorithm basedon the structure and behaviour of the P system with repli-cated rewriting and inhibitors proposed in Section 2.2.As discussed earlier, this algorithm does not use binarywords of constant length, but variable-sized strings on aproblem domain-equivalent alphabet. In the case of theMaximum Clique Problem, this domain is the set V ofnodes of the graph. Therefore, a clique for the graph isencoded by means of a word that contains only the nodesthat belong to that clique, thereby ruling out the repre-sentation of those nodes that are not part of that clique.The implementation of our P system in terms of a DNA-based algorithm in the computational model set out inthe last section is based on the following points:

1. The different membranes of the P system constitutethe physical space that delimits the scope of the repli-cated rewriting operations on problem strings in eachiteration. Therefore, this number of membranes isdirectly related to the number of iterations neededby the DNA algorithm. Specifically, the algorithmmakes n iterations, one for each P system membrane.

2. As it possesses rules Ri with replication (ri,a, ri,b),the P system is able to multiply its number of stringsat every step, thereby generating the exponentially
increasing space needed to solve NP-complete prob-lems in linear time. From the viewpoint of a DNAalgorithm, this replication property is equivalent tothe capability of dividing the strands space into
aration operators: (a) domain element ai; (b) element ai separation

BioSys

this, we chose exactly the same instance of the prob-lem as published in the papers Ouyang et al. (1997) andHead et al. (1999). Fig. 2 shows the graph G and itscomplementary G′.


several subsets, enabling different operations to beperformed on each one. The Replicate operation isresponsible for this task. The maximum number ofstrings that can be generated after a rule has beenapplied is called degree of replication. This degreeof replication is then directly related to the numberof subsets (tubes) into which the DNA strands spacecan be replicated. As the degree of replication of ourP system is two (each rule Ri generates at most twostrings in each step), the Replicate operation has togenerate no more than two subsets t2 and t3 from aninput set t1 (Replicate (t1, t2, t3)).

3. The subrule ri,a of each P system rule Ri carries outthe rewriting operation that adds a new node ai to the

strings. However, this operation is only performed onvalid strings, that is, strings that contain no node aj

such that {ai, aj}∈ E′, with (i > j). From the viewpointof a DNA algorithm, it takes more than one operationto evaluate this condition. First of all, the algorithmloops through the constraints set E′ in search of anynodes aj, with the aim of separating and deleting allthe strands containing such a node. Once this processis complete, the node ai is appended to the remainingstrings. Therefore, the P system rewriting rules aredirectly related to the Separate, Delete and Appendoperations of the DNA computing model describedin the last section.

4. After applying the P system rules to a set of strings ina particular membrane, the resulting strings are sentto the next system membrane. Therefore, the region
defined by that receptor membrane is responsible forgrouping all the strings generated in the last systemstep. From the viewpoint of the DNA algorithm, thereneeds to be an operation that can be used to cluster the
tems 90 (2007) 687–697

different subsets of strands. In the model used, thistask is carried out by the Merge (t1, t2, t3) operation,which, of course, should be able to group as manysubsets of strings as the Replicate (t1, t2, t3) operationcan generate.

5. Finally, the DNA algorithm makes use of the Mea-sureStrings (t1) operation to determine the problemsolution (composed of the biggest string exiting theskin membrane of the system in the last step n).

In the following, we present the DNA algorithm,working in linear time with respect to the number ofnodes |V|, that implements the P system Π proposed tosolve the Maximum Clique Problem for a graph:

3.3. Algorithm execution

In the following, we detail an execution of the algo-rithm presented in the above section (Table 1). To do

Fig. 2. Graphs G and G′. The Maximum Clique of G is {2345} andits constraints set is E′ = {{0, 2}, {0, 5}, {1, 5}, {1, 3}}.

M. Garcıa-Arnau et al. / BioSystems 90 (2007) 687–697 693

Table 1Algorithm execution for the graph illustrated in Fig. 2

4

tahpdctga

ala(in

. Comparison

In this section, we compare the algorithm proposed inhis paper with those described in Ouyang et al. (1997)nd Head et al. (1999). In this comparison, we look atow the three algorithms work on two different exam-les. The first example is exactly the same as the oneescribed in the last section. The second is a modifi-ation of the first in which the maximum clique is stillhe same, but the size of the constraints set E′ is big-er. The next two paragraphs describe how the other twolgorithms work.

In their paper, Ouyang et al. (1997) propose the use ofbrute force strategy to solve the Maximum Clique Prob-

em. Hence, the cliques in this “brute force algorithm”
re represented as binary strings of n pairs of elementsposition, value). The position component in each pairndicates the respective node, whereas the value compo-ent contains one or zero depending on whether or not
this node is in the clique. The sequence of nucleotides(from the {A, C, G, T} alphabet) selected to representthe position and value components is designed such thata DNA strand representing an invalid clique is gener-ated with a particular restriction enzyme recognitionsequence. In this way, once the space of all possible prob-lem solutions (2n cliques) has been generated, a seriesof restriction enzymes are applied to break the strandsencoding all these invalid cliques. Hence, these strandsare disabled and are not multiplied in successive PCRcycles applied to the working set. As a result, gel elec-trophoresis can be applied in the last step to measurethe strands in the set and identify the optimum problemsolution.

The problem is tackled in quite a different way in
Head et al. (1999). This paper defines a new modelcalled “aqueous computing” which proposes the use ofplasmid (circular DNA strings that are approximately3000 base pairs (bp) long) to represent binary-encoded

694 M. Garcıa-Arnau et al. / BioSystems 90 (2007) 687–697

Table 2Comparison between the “brute force algorithm”, the “aqueous algorithm” and the “constructive algorithm” for the graph illustrated in Fig. 2, withV = {0, 1, 2, 3, 4, 5} and E′ = {{0, 2}, {0, 5}, {1, 5}, {1, 3}}

Brute force algorithm Aqueous algorithm Constructive algorithm

No. of iterations 4 4 6No. of different molecules per iterationa 48, 40, 32, 26 2, 4, 7, 12 1, 3, 5, 8, 17, 25Volume of the no. of copies of the

problem-solving molecule after thelast iterationb

1/26 1/16 1/25

No. of operations performedc 8 8 4Mean string size per iterationd 173.3 175 177.5 179.6 179 182 185.3 188.3 20 26.7 28 30 38.8 42.4Total mean string size 176.3 185.2 31No. of restriction enzymes needed 6 6 0

a The brute force algorithm starts with the space of all possible solutions and deletes strands until it gets all the strands that encode valid cliques.The constructive algorithm starts from an empty space and gradually constructs the set of all valid cliques (an empty clique is not generated, as thestring that represents it does not physically exist). Finally, the aqueous algorithm solves the problem without having generated all possible moleculetypes (cliques) for the graph in this case.

b The different generated molecule types occupy the same volume in both the brute force algorithm and the constructive algorithm. Therefore,this is calculated as (1/no. different molecule types). With respect to the aqueous algorithm, even though it generates a smaller number of moleculetypes, some of these are generated several times repeatedly, which upsets the balance between the proportions of the each molecule type.

c Although all three algorithms perform different operations, this point has accounted for the number of operations that are considered biologicallymore complex for each one. Therefore, we have counted the number of cut operations using restriction enzymes for the brute force algorithm, each

ation opred in awhere t

Reset on a station for the aqueous algorithm, and the number of separd Even though each copy of one and the same molecule type is sto

shown in the table refers to the MSC region of that plasmid, which is

cliques. The information about the clique is actually tobe found in a 175 bp subsegment of the plasmid, calledMCS (multiple cloning site). That subsegment is fur-ther divided into n regions called stations. Each stationrepresents a node of the graph and is associated witha particular restriction enzyme. Additionally, the modelhas an operation, called Reset(k), which can be used toset the value of station k to 0. This operation consistsof three basic steps: (1) linearize the plasmid by cuttingat station k using its associated restriction enzyme, (2)extend the 3′ ends with polymerase to produce a lin-ear molecule with blunt ends and (3) apply ligase tomake that blunt-ended linear molecule circular again.This produces a plasmid in which the size of stationk has been increased by 4 bp. Therefore, it no longerencodes the recognition region of its associated restric-tion enzyme. Hence, the “aqueous algorithm” works asfollows. Initially, all the molecule stations are set to 1,indicating that all the nodes are present in the clique.Then, a step is carried out for every arc {ai, aj} that is inthe complementary graph G′. The strings space is splitinto two subsets t1 and t2 in each step. The molecules in t1are subject to the Reset(ai) operation and the moleculesin t to the Reset(a ) operation. Finally, the contents
2 j
of t1 and t2 are poured into a single set. At the endof the algorithm, the number of 1s in the molecules iscounted to determine the maximum clique that solves theproblem.

erations for the constructive algorithm.plasmid of about 3000 bp in the aqueous algorithm, the size that is

he information about the clique encoding that molecule is stored.

We have selected a number of computational andphysical parameters with the aim of being able to makethe intended comparison. The goal is to characterize howthe three algorithms work well enough to show up thestrengths and weaknesses of each one. Table 2 presentsthis comparison based on the following parameters:

1. No. of iterations: This indicates how many steps ittakes for the algorithm to finish.

2. No. of different molecules per iteration: This indi-cates the number of physically different molecules(not copies of the same molecule) that the algorithmgenerates after each iteration.

3. Volume of the no. of copies of the problem-solvingmolecule after the last iteration: This parameter mea-sures the total volume taken up by the copies of theproblem-solving molecule at the end of the algorithm.If all the algorithms are considered to work on a con-stant volume of molecules equal to 1, the volumetaken up by the copies of each one molecule willfall as they operate and generate new molecule types.This parameter is directly related to error probabil-ity in the biological operations. If the volume of theproblem-solving string drops substantially, there will
be a higher risk of losing the problem-solving stringduring some biological manipulation.
4. No. of operations performed: This indicates the totalnumber of biological operations carried out by the

M. Garcıa-Arnau et al. / BioSystems 90 (2007) 687–697 695

Table 3Comparison between the brute force algorithm, the aqueous algorithm and the constructive algorithm for the graph illustrated in Fig. 3, with V = {0,1, 2, 3, 4, 5} and E′ = {{0, 1}, {0, 2}, {0, 3}, {0, 5}, {1, 3}, {1, 4}, {1, 5}}

Brute force algorithm Aqueous algorithm Constructive algorithm

No. of iterations 7 7 6No. of different molecules per

iterationa48, 40, 36, 34, 26, 22, 20 2, 4, 8, 16, 13, 22, 20 1, 2, 4, 6, 11, 19

Volume of the no. of copiesof the problem-solvingmolecule after the lastiteration

1/20 1/64 1/19

No. of operations performed 14 14 7Mean string size per iteration 173.3 175 175.5 175.6 178.8 180.5 181 179 182 184.5 186.8 188.5 191.2 191.4 20 20 25 26.7 32.7 40Total mean string size 177.1 186.2 27.4N

possibl

5

6

7

bapftb

Fb3

o. of restriction enzymesneeded

6

a In this case, the proposed aqueous algorithm does generate all the

algorithm. This parameter needs further explanation,as the type and complexity of the operations used ineach algorithm may differ significantly.

. Mean string size per iteration: This parameter is cal-culated for each iteration as the weighted mean of thesize of each molecule type divided by the proportionof its number of copies. Size is measured in terms ofnumber of bases or base pairs (bp).

. Total mean string size: This measures the mean stringsize in each of the algorithms. It is calculated as thearithmetic mean of the mean sizes of all the iterations.

. No. of restriction enzymes needed: This indicates thetotal amount of restriction enzymes that the algorithmrequires to solve a particular problem instance.

Some of the values reported in Table 2 (referring therute force algorithm and the aqueous algorithm) werelready reported explicitly or implicitly in the compared
apers (e.g. the number of iterations, the number of dif-erent molecules per iteration, the size of molecules orhe restriction enzymes needed). All other values haveeen calculated after tracing the execution of the dif-
ig. 3. Graphs G and G′. The maximum clique of G is still {2345},ut its constraints set is now E′ = {{0, 1}, {0, 2}, {0, 3}, {0, 5}, {1,}, {1, 4}, {1, 5}}.

6 0

e cliques on the graph.

ferent algorithms for the corresponding instances of theproblem.

In the following, we present a second example to com-plete the comparison. The selected graph in this case(Fig. 3) has the same set of nodes as the last one V = {0, 1,2, 3, 4, 5} and its maximum clique is still {2345}. How-ever, the size of the constraints set has been increased bythree, and is now E′ = {{0, 1}, {0, 2}, {0, 3}, {0, 5}, {1,3}, {1, 4}, {1, 5}}. Table 3 shows the results of applyingeach algorithm to this new graph.

5. Conclusions

In this paper, we present a P system with replicatedrewriting to solve the Maximum Clique Problem for agraph. The system uses inhibitors to prevent the gener-ation of all the solutions space. This problem is highlyrelevant not only at the purely computational level, butalso because of its relation to a number of problemsunderlying genomics. Although this problem has beentackled on numerous occasions in the DNA computingliterature, the same cannot be said of membrane com-puting. With the aim of being able to comparativelyevaluate the efficacy and efficiency of the designed sys-tem with that of other researchers, we have proposed animplementation of this P system using a DNA algorithm.Hence, we have been able to compare the algorithm withtwo standard works (brute force algorithm and aqueousalgorithm) that solved this problem earlier. A series ofcomputational and physical parameters have been usedto carry out this comparison.

Looking at the results, the constructive strategy usedappears to have a number of advantages. First, the num-ber of iterations of the brute force algorithm and aqueousalgorithm match the size of the constraints set E′, that is,

BioSys
the number of arcs of the complementary graph G′. As ateach algorithm iteration, the working space necessarilyhas to be split into two to apply any individual oper-ation, the maximum theoretical size of E′ could be asbig as (log2 1018 ≈ 60), assuming an initial set of 1018

DNA strands. In our constructive algorithm, however,each of the constraints in E′ matches a separation oper-ation, whereas it is the size of the set of nodes of G thatdetermines the number of iterations. This would allowour algorithm to tackle problems on any graph of up to60 nodes in linear time, irrespective of the size of theconstraints set E′.

Secondly, the fact that the only basic operation thatour algorithm uses on DNA is parallel overlap assembly(POA) helps to increase the effectiveness and simplic-ity of its biological implementation. Additionally, theconstructive strategy allows substantially smaller-sizedmolecules than those in the compared works to be usedthroughout the algorithm. This optimization of the meanstrand size used is a positive improvement, as it can helpto reduce the number of manipulation errors.

Another benefit of the constructive algorithm stemsfrom the non-use of restriction enzymes. Both the bruteforce algorithm and the aqueous algorithm call for theuse of a number of restriction enzymes that increaseslinearly with the number of nodes of G. The fact thatour algorithm has no need of these enzymes, on the onehand, lifts the numerical limits that they place on thedesign of the problem-solving molecules and, on theother, does away with the complications derived fromthe appearance of occasional mutations in their recogni-tion region. The constructive algorithm trades the use ofthese enzymes for select and append operators.

All the results presented in this paper are based on atheoretical model. However, the operations required toimplement this algorithm have already been carried outin the laboratory many times.

Acknowledgements

This research has been partially funded by the Span-ish Ministry of Science and Education under projectsTIC2002-04220-C03-03 (co-financed by FEDER funds)and DEP2005-00232-C03-03, by the Ramon y CajalProgram of the Spanish Ministry of Science and Tech-nology, and by the Czech Science Foundation, grant201/06/0567.

References

Adleman, L.M., 1994. Molecular computation of solutions to combi-natorial problems. Science 266, 1021–1024.

tems 90 (2007) 687–697

Amos, M., Gibbons, A., Hodgson, D., 1996. Error-resistant imple-mentation of DNA computations. In: Proceedings of the SecondAnnual Meeting on DNA Based Computers, Princeton University,pp. 87–101.

Amos, M., 1999. Theoretical and experimental DNA computation.Bull. Eur. Assoc. Theor. Comput. Sci. 67, 125–138.

Bach, E., Condon, A., Glaser, E., Tanguay, C., 1996. DNA models andalgorithms for NP-complete problems. In: Proceedings of the 11thIEEE Conference on Computational Complexity, pp. 290–300.

Back, T., Kok, J.N., Rozenberg, G., 1999. Evolutionary computa-tion as a paradigm for DNA-based computing. In: Landweber, L.,Winfree, E., Lipton, R., Freeland, S. (Eds.), Proceedings of theDIMACS Workshop on Evolution as Computation. Princeton, NJ,pp. 67–88.

Baum, E.B., Boneh, D., 1996. Running dynamic programming algo-rithms on a DNA computer. In: Proceedings of the Second AnnualMeeting on DNA Based Computers, Princeton University, pp.141–147.

Benenson, Y., Gil, B., Ben-dor, U., Adar, R., Shapiro, E., 2004. Anautonomous molecular computer for logical control of gene expres-sion. Nature 429, 423–429.

Bottoni, P., Martin-Vide, C., Paun, G., Rozenberg, G., 2002. Membranesystems with promoters/inhibitors. Acta Inform. 38, 695–720.

Butenko, S., Wilhelm, W., 2005. Clique-detection Models in Com-putational Biochemistry and Genomics. Department of IndustrialEngineering Texas AM University, College Station, TX.

Calude, C.S., Paun, G., 2004. Computing with Cells and Atoms: AfterFive Years. Centre for Discrete Mathematics and Theoretical Com-puter Science, CDMTCS-246 Research Report Series.

Chiu, D.T., Pezzoli, E., Wu, H., Stroock, A.D., Whitesides, G.M.,2001. Using three-dimensional microfluidic networks for solvingcomputationally hard problems. Proc. Natl. Acad. Sci. U.S.A. 98,2961–2966.

Cukras, A.R., Faulhammer, D., Lipton, R.J., Landweber, L.F., 1999.Chess games: a model for RNA based computation. Biosystems52, 35–45.

Garey, M.R., Johnson, D.S., 1979. Computers and Intractability. AGuide to the Theory of NP-completeness. W.H. Freeman, SanFrancisco.

Hagiya, M., 1999. Perspectives on molecular computing. New Generat.Comput. 17, 131–151.

Head, T., Yamamura, M., Gal, S., 1999. Aqueous computing: writingon molecules. In: Proceedings of Congress on Evolutionary Com-putation, IEEE Service Center, Piscataway, NJ, pp. 1006–1010.

Ionescu, M., Sburlan, D., 2004. On P systems with promot-ers/inhibitors. J. Univ. Comput. Sci. 10, 581–599.

Krishna, S.N., Rama, R., 2001. P systems with replicated rewriting. J.Automata, Lang. Comb. 6, 345–350.

Ledesma, L., Pazos, J., Rodrıguez-Paton, A., 2004. A DNA algorithmfor the Hamiltonian Path Problem. Using microfluidic systems.In: Lecture Notes in Computer Science 2959. Springer-Verlag, pp.289–296.

Ledesma, L., Manrique, D., Rodriguez-Paton, A., 2005. A tissue Psystem and a DNA microfluidic device for solving the shortestcommon superstring problem. Soft Comput. 9, 679–685.

Lipton, R.J., 1995. DNA solution of hard computational problems.Science 268, 542–545.

Manca, V., Zandron, C., 2001. A clause string DNA algorithm for SAT.In: Lecture Notes in Computer Science 2340. Springer-Verlag, pp.172–181.

McCaskill, J.S., 2001. Optically programming DNA computing inmicroflow reactors. Biosystems 59, 125–138.

BioSyst

O

O

P

P

P


gihara, M., 1996. Breadth first search 3-SAT algorithms for DNAcomputers. Technical Report 629. University of Rochester, NY.

uyang, Q., Kaplan, Peter, D., Liu, S., Libchaber, A., 1997. DNAsolution of the maximal clique problem. Science 278, 446–449.

aun, G., 1998. Biomolecular Computing. Theory and Experiment.
Springer-Verlag.
aun, G., 2000. Computing with membranes. J. Comput. Syst. Sci. 61,108–143.

aun, G., Rozenberg, G., Salomaa, A., 1998. DNA Computing. NewComputing Paradigms. Springer-Verlag.

ems 90 (2007) 687–697 697

Perez-Jimenez, M.J., Romero-Campero, F.J., 2005. Attacking the com-mon algorithmic problem by recognizer P systems. In: LectureNotes in Computer Science 3354. Springer-Verlag, pp. 304–315.

Seelig, G., Soloveichik, D., Yu Zhang, D., Winfree, E., 2006. Enzyme-free nucleic acid logic circuits. Science 314, 1585–1588.

Zandron, C., Ferretti, C., Mauri, G., 2000. Solving NP complete prob-
lems using P systems with active membranes. In: Antoniou, I.,Calude, C.S., Dinneen, M.J. (Eds.), Unconventional Models ofComputation. Springer-Verlag, London, pp. 289–301.
Zimmermann, K.H., 2002. Efficient DNA sticker algorithms for NP-complete graph problems. Comput. Phys. Commun. 144, 297–309.

A P system and a constructive membrane-inspired DNA algorithm for solving the Maximum Clique Problem

Documents

Transcript of A P system and a constructive membrane-inspired DNA algorithm for solving the Maximum Clique Problem