ASP-G: an ASP-based method for finding attractors in genetic regulatory networks

7
BIOINFORMATICS Vol. 00 no. 00 2014 Pages 1–7 ASP-G: An ASP-based method for finding attractors in Genetic Regulatory Networks Mushthofa Mushthofa 1 , Gustavo Torres 1 , Yves Van de Peer 2,3,4 , Kathleen Marchal 3,5,6 * and Martine De Cock 1,7 * 1 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281 (S9), 9000 Ghent, Belgium. 2 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Ghent, Belgium. 3 Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Ghent, Belgium. 4 Genomics Research Institute (GRI), University of Pretoria, Private bag X20, Pretoria, 0028, South Africa 5 Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark, Arenberg 20, 3001 Leuven, Belgium. 6 Dept. of Information Technology, IMinds, Ghent University, Zuiderpoort Office Park, Gaston Crommenlaan 8, B-9050 Ghent-Ledeberg, Belgium. 7 Center for Web and Data Science, Institute of Technology, University of Washington Tacoma, 1900 Commerce Street, Tacoma WA-98402, USA. Received on XXXXX; revised on XXXXX; accepted on XXXXX Associate Editor: XXXXXXX ABSTRACT Motivation: Boolean network models are suitable to simulate gene regulatory networks (GRNs) in the absence of detailed kinetic information. However, reducing the biological reality implies making assumptions on how genes interact (interaction rules) and how their state is updated during the simulation (update scheme). The exact choice of the assumptions largely determines the outcome of the simulations. In most cases, however, the biologically correct assumptions are unknown. An ideal simulation thus implies testing different rules and schemes to determine those that best capture an observed biological phenomenon. This is not trivial, since most current methods to simulate Boolean network models of GRNs and to compute their attractors impose specific assumptions that cannot be easily altered, as they are built into the system. Results: To allow for a more flexible simulation framework, we developed ASP-G. We show the correctness of ASP-G in simulating Boolean network models and obtaining attractors under different assumptions by successfully recapitulating the detection of attractors of previously published studies. We also provide an example of how performing simulation of network models under different settings help determine the assumptions under which a certain conclusion holds. The main added value of ASP-G is in its modularity and declarativity, making it more flexible and less error-prone than traditional approaches. The declarative nature of ASP-G comes at the expense of being slower than the more dedicated systems but still achieves a good efficiency w.r.t. computational time. Availability: The source code of ASP-G is available at: http://www.psb.ugent.be/˜mumus/ASPG/asp-g.zip. * to whom correspondence should be addressed Contact: [email protected], [email protected] 1 INTRODUCTION Gene Regulatory Networks (GRNs) consist of genes, proteins and other regulatory molecules that undergo complex and dynamic interactions which drive gene expression, and ultimately, complex cellular behaviour. To be able to understand and predict this behaviour, various mathematical models have been developed that describe the dynamics of these GRNs. Different model formalisms have been used, as reviewed in De Jong (2002). One of the earliest models to describe GRNs are Boolean network models (Kauffman (1993)). Boolean network models are attractive because of their simplicity (Shmulevich et al. (2002)): by reducing the complexity of GRNs to qualitative logical models, Boolean network models are able to cope with the largely incomplete kinetic information of biological networks. Despite their highly simplified representation of biological reality, Boolean network models were shown to still grasp the important dynamic properties of GRNs, such as the networks’ attractors. An attractor represents a stable set of states towards which the transiently changing gene expression values converge to. Evolving towards an attractor thus corresponds to reaching a specific developmental stage (cell types, development stages of cells, etc) or functional mode (De Jong and Page (2008), Kauffman (1993)), and the analysis of attractors in a regulatory network thus hints towards the functional modes of the regulatory network (De Jong and Page (2008)). c Oxford University Press 2014. 1

Transcript of ASP-G: an ASP-based method for finding attractors in genetic regulatory networks

BIOINFORMATICS Vol. 00 no. 00 2014Pages 1–7

ASP-G: An ASP-based method for finding attractors inGenetic Regulatory NetworksMushthofa Mushthofa 1, Gustavo Torres 1, Yves Van de Peer 2,3,4, KathleenMarchal 3,5,6 ∗ and Martine De Cock 1,7 ∗

1Department of Applied Mathematics, Computer Science and Statistics, Ghent University,Krijgslaan 281 (S9), 9000 Ghent, Belgium.2Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Ghent, Belgium.3Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927,9052 Ghent, Belgium.4Genomics Research Institute (GRI), University of Pretoria, Private bag X20, Pretoria, 0028, SouthAfrica5Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark, Arenberg 20, 3001Leuven, Belgium.6Dept. of Information Technology, IMinds, Ghent University, Zuiderpoort Office Park, GastonCrommenlaan 8, B-9050 Ghent-Ledeberg, Belgium.7Center for Web and Data Science, Institute of Technology, University of Washington Tacoma, 1900Commerce Street, Tacoma WA-98402, USA.Received on XXXXX; revised on XXXXX; accepted on XXXXX

Associate Editor: XXXXXXX

ABSTRACTMotivation: Boolean network models are suitable to simulate generegulatory networks (GRNs) in the absence of detailed kineticinformation. However, reducing the biological reality implies makingassumptions on how genes interact (interaction rules) and howtheir state is updated during the simulation (update scheme). Theexact choice of the assumptions largely determines the outcomeof the simulations. In most cases, however, the biologically correctassumptions are unknown. An ideal simulation thus implies testingdifferent rules and schemes to determine those that best capturean observed biological phenomenon. This is not trivial, since mostcurrent methods to simulate Boolean network models of GRNs andto compute their attractors impose specific assumptions that cannotbe easily altered, as they are built into the system.Results: To allow for a more flexible simulation framework, wedeveloped ASP-G. We show the correctness of ASP-G in simulatingBoolean network models and obtaining attractors under differentassumptions by successfully recapitulating the detection of attractorsof previously published studies. We also provide an example ofhow performing simulation of network models under different settingshelp determine the assumptions under which a certain conclusionholds. The main added value of ASP-G is in its modularity anddeclarativity, making it more flexible and less error-prone thantraditional approaches. The declarative nature of ASP-G comes atthe expense of being slower than the more dedicated systems butstill achieves a good efficiency w.r.t. computational time.Availability: The source code of ASP-G is available at:http://www.psb.ugent.be/˜mumus/ASPG/asp-g.zip.

∗to whom correspondence should be addressed

Contact: [email protected], [email protected]

1 INTRODUCTIONGene Regulatory Networks (GRNs) consist of genes, proteins andother regulatory molecules that undergo complex and dynamicinteractions which drive gene expression, and ultimately, complexcellular behaviour. To be able to understand and predict thisbehaviour, various mathematical models have been developed thatdescribe the dynamics of these GRNs. Different model formalismshave been used, as reviewed in De Jong (2002). One of the earliestmodels to describe GRNs are Boolean network models (Kauffman(1993)). Boolean network models are attractive because of theirsimplicity (Shmulevich et al. (2002)): by reducing the complexityof GRNs to qualitative logical models, Boolean network modelsare able to cope with the largely incomplete kinetic information ofbiological networks. Despite their highly simplified representationof biological reality, Boolean network models were shown to stillgrasp the important dynamic properties of GRNs, such as thenetworks’ attractors. An attractor represents a stable set of statestowards which the transiently changing gene expression valuesconverge to. Evolving towards an attractor thus corresponds toreaching a specific developmental stage (cell types, developmentstages of cells, etc) or functional mode (De Jong and Page (2008),Kauffman (1993)), and the analysis of attractors in a regulatorynetwork thus hints towards the functional modes of the regulatorynetwork (De Jong and Page (2008)).

c© Oxford University Press 2014. 1

Mushthofa et al

Current knowledge regarding GRNs are generally incomplete(Rottger et al. (2012)). Comparing simulated with observedattractors (states, e.g. developmental stages) of a certain networkmodel can thus aid in evaluating existing network models and/orpredict missing information in the current knowledge. For instance,Mendoza and Alvarez-Buylla (1998), Mendoza et al. (1999) andmore recently, Espinosa-Soto et al. (2004) and Sanchez-Corraleset al. (2010) studied flower development in Arabidopsis thalianausing Boolean network models of which the network attractorscorresponded to stable gene expression levels during the differentstages of flower development. These models helped predictingmutant phenotypes and the existence of a yet-uncharacterized geneinvolved in the transition from the non-flowering to the floweringstate. Li et al. (2004) and Davidich and Bornholdt (2008) used aBoolean network model and its steady states to describe the differentstages in yeast cell cycle, where the stages of the cycle correspond tothe strong attractors of the network. Kaufman et al. (1985) explainedthe various states of the immune system with Boolean networkmodels and Sanchez and Thieffry (2001), Albert and Othmer (2003)and Gonzalez et al. (2008) used Boolean network models andtheir attractors to describe the cellular development of Drosophilamelanogaster.

Key to simulating GRNs with Boolean network models is thechoice of the proper assumptions. These assumptions refer to theactivation rules and update scheme. Activation rules determine theway the activation state of each gene depends on the activationstates of its interactors in the previous transition step. The updatescheme determines how these activation states are updated, i.e,.either synchronously or asynchronously. The exact choice of theseassumptio ns largely determines the number and characteristics ofthe attractors. As in most cases, the true biological activation rulesand update scheme are not known, one should be able to easily testdifferent activation rules and schemes, as this allows to have an ideaon the conditions under which the simulated network model wouldbe able to capture an observed biological phenomena (boundaryconditions).

Several computational tools have been developed to performthe computation of attractors in Boolean network models. Garget al. (2007) developed genYsis, which uses techniques involvingBinary Decision Diagrams (BDD) to computate attractors. Arellanoet al. (2011) used techniques based on Temporal Logic modelchecking in Antelope. Ay et al. (2009) used state-space pruning andrandomized state-space traversal methods to improve the scalabilityof the attractor computation. Dubrova and Teslenko (2011) used aBoolean Satisfiability (SAT) solver, typically used for combinatorialmodeling and problem solving, to compute attractors of GRNs andobtained a better computational time and space efficiency comparedto the BDD-based approach. More recently, Zheng et al. (2013)developed geneFatt based on the Reduced Order BDD (ROBDD)data structure which further improves the efficiency of the attractorcomputation.

Most of the above-mentioned systems to simulate Booleannetwork models in principle can perform simulations with differentassumptions. However, changing these assumptions would requirequite tedious modifications on these systems. For instance, usingthe SAT-approach (Dubrova and Teslenko (2011)), modifying thestructure of the network and the updating rules would requireupdating the truth tables in the cnet format.

To allow for a more flexible simulation framework we developedASP-G which makes use of the declarative-programming paradigmAnswer Set Programming (ASP) (Lifschitz (2008)). The declarativenature of ASP allows one to specify and modify the domain-specificlogic (here the definition of the network interactions, activationrules and update schemes) required to represent and solve thecomputational problem at hand (here dynamical modeling andattractor calculation) in an intuitive and modular way (Eiter et al.(2009)). To illustrate the flexibility of our approach, we appliedit to calculate attractors of previously published Boolean networkmodels of GRNs of different sizes and complexity, and differentsimulation assumptions. By trying to mimic previous results underdiverse settings, we can show that the main advantage of ourapproach consists of making the modeling more flexible and lesserror prone, and therefore helps delineate the boundary conditionsunder which the biological conclusions based on simulations ofBoolean network models are valid. At the same time, we alsoshow that, with the use of fast and efficient ASP solvers, thecomputational efficiency of our method is in the same range asthat of the most efficient dedicated methods for the simulation ofBoolean network models and the calculation of their attractors.

2 METHODS AND MODELS2.1 Boolean network modeling of GRNsIn our work, we adopt the definition of Boolean networks asdescribed in Kauffman (1993): a Boolean network model consistsof network elements (nodes, here representing structural andregulatory genes/proteins) which can either be active (ON) orinactive (OFF) and interactions between these elements (edges,which represent the directed regulatory interactions between thegenes). We define two types of regulatory interactions betweeninteracting nodes i.e. activation (up-regulation) and inhibition(down-regulation). The activation state of a certain node at a certaintime step is determined by a logical function of the activation statesat the previous time step of its interactors (where the interactors of anode are defined as the incoming edges of a certain network node).

Formally, a Boolean network model G(V, F ) is defined by a set ofnodes V = {x1, . . . , xn} and a list of Boolean functions F = (f1,. . . , fn). Each xi ∈ {0, 1}, i = 1, . . . , n is a binary variable and itsvalue at time t + 1 can be determined by the values of some othernodes xj1(i), xj2(i), . . . , xjki

(i) at time t by means of a Booleanfunction fi ∈ F . That is, there are ki nodes assigned to xi thatdetermine its state. The activation state of every node xi changesover time according to

xi(t+ 1) = fi(xj1(i) (t) , xj2(i) (t) , . . . , xjki

(i) (t))

A state s of a Boolean network is an assignment of {0, 1} to eachnode xi. Its successor state is the state resulting of applying fi toeach node xi. The dynamics of the network consists of transitionsbetween network states. An example of a Boolean network is givenin Figure 1. This network has 3 nodes, denoted by x1, x2 and x3,and interactions between these nodes, represented by the edges. Thedynamics of the network can be described using a State TransitionGraph (STG) as given in the supplementary Figure S1.

Attractors in a Boolean network model are defined as in Garget al. (2007) and Ay et al. (2009).

2

ASP-G

x1

x2

x3

Fig. 1: A Boolean network model with three genes. Edges witharrowed tips are activating interactions and edges with blunt tipsare repressing (inhibiting) links.

Fig. 2: Architecture of ASP-G.

DEFINITION 1. Let S be a set of states of a Boolean networkmodel. S is an attractor if and only if the following conditions aresatisfied:

1. The set of the successor states of all the states in S is equal toS.

2. For each si ∈ S, once it is visited, the probability of revisitingsi in a finite number of state transitions is equal to 1.

2.2 Representing Boolean network models of GRNsand computing their attractors using ASP

ASP-G frameworkOur framework for modeling GRNs and computing attractors, calledASP-G, uses Answer Set Programming (ASP). ASP is a declarativeprograming paradigm (Lifschitz (2008)), which is typically usedto solve combinatorial search problems (Eiter et al. (2009)). Thearchitecture of ASP-G is shown in Figure 2. ASP-G consists offour main modules/parts of the system: the network description, theupdate scheme, the activation rules and the attractor computationalgorithm. The following describes each of these modules.

Network description moduleThis module contains the description of the structure of the network,encoded in a set of facts provided by the user. Because ofthe declarative nature of ASP, such a description can be written

succinctly in an intuitive format. For example, the following set offacts is used to describe the network depicted in Figure 1:

gene(x1). gene(x2). gene(x3).

activates(x1, x1). activates(x1, x2). activates(x1, x3).

activates(x2, x1). activates(x2, x3). inhibits(x3, x1).

activates(x3, x2). inhibits(x3, x3).

To compare the declarative specification of the network with themore classically-used SAT notation Dubrova and Teslenko (2011),we give the network specification corresponding to the Booleannetwork depicted in in Figure 1 in cnet format, as follows:

.n 1 3 1 2 3 .n 2 2 1 3 .n 3 3 1 2 3--1 0 1- 1 --1 01-0 1 -1 1 1-0 1-10 1 00 0 -10 1000 0 000 0

Each gene in the network is written with its label, e.g. .n 1 fornode x1, followed by the number of regulators and then their labels,e.g. (3 1 2 3). Next a truth table follows that determines thebehavior of the network (e.g. --1 0 in the truth table for x1 meansthat whenever x3 is active then x1 will be inhibited irrespective ofthe values of x1 and x2).

Activation rulesThis module determines the activation rules used to update theactivation state of each gene at each transition step, based on theintended assumption by the user. We implemented two frequentlyused activation rules:

1. A gene will be active in a subsequent time step t+ 1 if at timestep t at least one active interactor is an activator and no activeinteractors that act as inhibitors are present.

2. A gene will be active in a subsequent time step t + 1 ifthere are more active activators than active inhibitors amongits interactors at time step t.

In ASP-G this first type of activation rule is encoded as

r+1 : active(X,T ) ← A > 0, I = 0,#act(X,A, T − 1),

#inh(X, I, T − 1), gene(X),

T > 0.

r+2 : inhibited(X,T ) ← not active(X,T ), gene(X),

T > 0.

The second type of activation rule is encoded as

r∗1 : active(X,T ) ← A− I > 0,#act(X,A, T − 1),

#inh(X, I, T − 1), gene(X),

T > 0.

r∗2 : inhibited(X,T ) ← I −A ≥ 0,#act(X,A, T − 1),

#inh(X, I, T − 1), gene(X),

T > 0.

3

Mushthofa et al

To illustrate the flexibility of ASP to express nearly any possibleactivation rule, we implemented a more specific type of rules inwhich the activation of a certain gene is expressed as a free-formgene-specific Boolean function of all the interactors of that gene,such as the one used in Sanchez-Corrales et al. (2010) and Gonzalezet al. (2008). For example, the activation rule of a gene G1 might beexpressed as a Boolean function of the form:

G1 = G1 and (not G2 or not G3)

where G2 and G3 are two other genes involved in the network. Inthis case, G1 has a self-activating interaction, while G2 and G3act as inhibitors for G1. To accommodate such an activation rulein ASP-G, we first convert the Boolean function into a DisjunctiveNormal Form (DNF). For example, the activation rule given aboveis rewritten into the following:

G1 = G1 and not G2

G1 = G1 and not G3

and then encoded in ASP as follows:

active(G1, T ) ← T > 0, active(G1, T − 1),

inhibited(G2, T − 1)

active(G1, T ) ← T > 0, active(G1, T − 1),

inhibited(G3, T − 1)

Update schemeTwo update schemes were adopted in ASP-G: synchronous andasynchronous updates. In the synchronous update scheme, weassume that all genes in the network are updated simultaneouslyper evaluated time step. This implies that the transitions betweenthe network states are deterministic, i.e., for every state visited overtime, only one possible successor state exists. In the asynchronousupdate, no assumptions of synchronicity is made, and genes maybe updated at different time steps. Therefore, transitions are non-deterministic: there may be several possible next state after a certaintransition. As illustrated in the supplementary Figures S1 and S2,an asynchronous update scheme can result in a drastically differentSTG and leads to different attractors.

Both the synchronous and the asynchronous updating wereimplemented in ASP-G. For the synchronous update scheme, we usethe following rules to generate initial activation states of the genes:

active(X, 0) ← gene(X),not inhibited(X, 0).

inhibited(X, 0) ← gene(X),not active(X, 0).

and then use the following rules (in relation with the activation rulesdescribed previously) to determine the activation state of each geneat each time step t:

active(X, t) ← active(X, t− 1),not inhibited(X, t).

inhibited(X, t) ← inhibited(X, t− 1),not active(X, t).

For the asynchronous update scheme, we need to add thefollowing rules, in addition to the previously described rules:

changed(X,T ) ← active(X,T ), inhibited(X,T − 1),

gene(X), T > 0.

changed(X,T ) ← inhibited(X,T ), active(X,T − 1),

gene(X), T > 0.

← #changed(N,X, T ), N ≥ 2,

gene(X), T > 0.

To obtain a better performance in the asynchronous case, we applythe STG reduction technique, as explained in the supplementaryFigure S2.

Computing the attractorsThe attractor computation in ASP-G is performed by Algorithm 1,which is based on the algorithm by Dubrova and Teslenko (2011).The main idea used in Algorithm 1 is to identify attractors bylooking at identical states in transition paths of certain lengths in theSTG of the network. Since there are exponentially many possiblestates in an STG (in relation to the number of nodes in the network),explicit enumeration of all the states in an STG is unfeasiblefor larger networks. ASP-G avoids this explicit enumeration byimplicitly simulating the dynamics of the network. Furthermore,once a state is identified as part of an attractor, it can be removedfrom the STG to prune the search space. An illustration on howthe algorithm works is given in supplementary Figure S4. Pathgeneration and state removal are being done using ASP rules andASP constraints, respectively, as shown in Figure S5. To furtherincrease the efficiency of the computation, we use the incrementalASP approach described in Gebser et al. (2008b) and the claspsolver from the Potassco ASP suite (Gebser et al. (2011b)).

3 RESULTS3.1 ASP-G: a novel framework for the simulation and

attractor computation of GRNs with Booleannetwork models

Simulating Boolean network models implies that activation ruleshave to be defined to decide how a gene is activated by itsinteractors. The exact choice of the assumptions largely determinesthe outcome and the number and characteristics of the attractors.Supplementary Figures S1-S3 show that, for the example Booleannetwork in Figure 1, the choices of activation rules and updatescheme can result in a different network behavior, and thus, differentsets of attractors. As it is often not known in advance whichassumptions best match the biological reality of the modeled GRN,testing different assumptions is advisable.

To have a generic framework that allows testing differentassumptions, ASP-G implements three different activation rules: thefirst one adopted by Li et al. (2004), Ay et al. (2009) and Davidichand Bornholdt (2008) in which a gene is considered to be activatedif the majority of its active incoming interactors (interactors thatare themselves active) has an activating role. Otherwise, the genewill become inactive. This is referred to as the r∗ rule in ASP-G. A second one adopted by Pedicini et al. (2010) and Garg et al.

4

ASP-G

Algorithm 1 Algorithm to compute attractors in ASP-G

1: {P is the ASP program with the rules of the network}2: k = n3: attractor is found = False4: attractors = ∅5: while ASP finds a path of length k as an answer set in P do6: {s = 〈s1, s2, . . . , sk〉 is the path found}7: for j = k − 1 to 1 do8: if sj = sk then9: attractors = attractors

⋃{〈sj+1, . . . , sk〉}

10: attractor is found = True11: {The attractors already found are forbidden in P}12: for s in {sk, ..., sj+1} do13: {The states are added as constraints for the next path}14: P = P

⋃{← active(X1, T ), inhibited(X2, T ), . . . |Xi ∈

s}15: end for16: break17: end if18: end for19: if attractor is found then20: attractor is found = False21: else22: k = 2 · k23: end if24: end while25: return attractors

(2007) assumes that a gene is activated only when there is at leastone active activator amongst its active incoming interactors and noinhibitor. This is referred to as the r+ rule in ASP-G. In addition,we implemented a third more detailed activation rule in whichthe activation for each gene is expressed as a free-form Booleanfunction of its activators and inhibitors. This type of rules bettergrasps the complexity of true biological interactions and includesmore detailed information on the specifics of the interactions. Itwas used in for instance Sanchez-Corrales et al. (2010), Albert andOthmer (2003) and Gonzalez et al. (2008).

Related to the update scheme, a choice has to be made betweenupdating the network elements simultaneously (synchronously)versus at different points (asynchronously). Earlier work byThomas (1973) and Kauffman (1993) assumed synchronicity intheir modeling, mainly because of computational efficiency reasons.However, the assumption of synchronicity was challenged inThomas (1991) and Harvey and Bossomaier (1997), who arguedthat, for many biological systems, assuming an asynchronousupdate scheme is more realistic. Subsequent work on Booleannetwork models (Naldi et al. (2007),Garg et al. (2007), Ay et al.(2009), Garg et al. (2008) and Gonzalez et al. (2008)) mainlyapplied asynchronous update schemes. Therefore, in ASP-G weimplemented both the synchronous and the asynchronous updatescheme.

3.2 Simulation ResultsTo test the correctness of ASP-G in simulating Boolean networkmodels and computing attractors, we applied ASP-G on previouslypublished Boolean network models of GRNs and compared the

obtained attractors with the originally published ones. Like anyother formalism to find attractors in Boolean network models, ASP-G detects exhaustively all attractors of the network. Therefore, acorrect result corresponds to an exact match in the attractor setidentified by ASP-G and that present in the reference publications.

First we tested ASP-G on the relatively small GRNs involved inbudding and fission yeast cell cycle analyzed in Li et al. (2004),Ay et al. (2009) and Davidich and Bornholdt (2008), as well ason the network model of the GRN from the Th cell differentiationdescribed in Garg et al. (2007). We used the activation rules andupdate schemes that were also applied in the original studies, exceptfor the data set of Garg et al. (2007), in which we applied alsothe synchronous update scheme in addition to the originally appliedasynchronous one (see Table 1).

As expected for each of these results, the attractors found byASP-G match exactly the ones found by the reference papers (datanot shown). The table also shows that ASP-G performs relativelyfast for these small networks. We also observe that the attractors inthe synchronous case often coincide with those in the asynchronouscase. This is due to the fact that simple attractors (i.e., attractorsthat have only one state) are more commonly found, and thatthey are shared between synchronous and asynchronous updateschemes. Only for the fission yeast cell network under the r∗

activation rules there is a difference in the number of synchronousattractors (13) and asynchronous attractors (15). The fact that thereare more attractors in the asynchronous case might seem counter-intuitive, since the reduced STG used to calculate these attractorscontains at most as many nodes as the synchronous STG, andoften less. However, the original STG for the asynchronous casetypically contains more edges than in the synchronous case, andthese additional edges can account for more attractors.

To show the flexibility of ASP-G in expressing different typesof activation rules, we also encoded the Boolean network modelof the GRN involved in A. thaliana flower development originallydescribed in Sanchez-Corrales et al. (2010) with our ASP-Gframework. This network model consists of 13 genes and usesgene specific update rules as described in Section 2.2 Activationrules. As in the original publication, we applied a synchronousupdate scheme. ASP-G correctly recapitulated all 10 attractors ofthe network as described in the original paper Sanchez-Corraleset al. (2010). The computation took only 0.479 seconds.

To test ASP-G on a larger network, we use the network datafrom the T-helper (Th) cell differentiation described by Pediciniet al. (2010). The purpose of the study was to find evidencesupporting (or contradicting) the traditional view that the genesinvolved in the regulation of the two types of Th cells, Th1 andTh2, have counter-regulatory interaction. Similar to what has beendone in Pedicini et al. (2010), we first computed the attractorsof the original network in the presence of all genes and thensearched for attractors in different single-gene knock-out networksin silico to test the effect of knocking out intracellular genestowards the attractors of the network. To show the added valueof using different simulation assumptions, we also performed thecomputations using an asynchronous update mechanism, as opposedto only synchronous update scheme, as performed by Pedicini et al.(2010). The results are presented in the supplementary Table S6.

In terms of computational efficiency, the result shows that ASP-Gis able to perform relatively well for the moderately-sized Th cellnetwork. For the asynchronous case, attractor computation required

5

Mushthofa et al

Table 1. ASP-G results and running times for common GRNs found in the literature. Network: describes the original network model. Genes: number of genespresent in the network; Attractors: number of detected attractors; Update mechanism: synchrounous versus asynchrounous updating was used as describedin the methods section; Activation rules: r∗ activation rules indicates that a gene becomes active when it has more active activating interactors than activeinhibiting ones, while the r+ activation rules indicate that a gene becomes active if it has at least one active activating gene and no inhibiting ones. Time:running time on a Dell Latitude D820 notebook.

Network Reference Genes Attractors Update Mechanism Activation Rules TimeCommon Yeast cell cycle Li et al. (2004) 11 7 Synchronous r∗ 1.507

Ay et al. (2009) 11 7 Asynchronous r∗ 0.134Fission Yeast cell cycle Davidich and Bornholdt (2008) 10 13 Synchronous r∗ 1.653

Ay et al. (2009) 10 15 Asynchronous r∗ 0.371Th cell differentiation / 23 3 Synchronous r+ 0.270

Garg et al. (2007) 23 3 Asynchronous r+ 0.206

0.6 seconds on average. For the synchronous case where moreattractors were found, the computation times took 71.9 seconds onaverage. When applying the synchronous update scheme as usedin Pedicini et al. (2010), ASP-G reproduced the four attractors:Th0, Th1, Th2 and ThX, as in the original paper. However, whentrying to reproduce the attractors in the gene knock-out setting,we found discrepancies with the results reported by Pedicini et al.(2010). These discrepancies were caused by mistakes in the SAT-based truth table used in the original publication. SupplementaryTable S6 shows the corrected results for completeness. Note thatthese mistakes do not affect the biological conclusions made in theoriginal publication. However, it illustrates that specifying largernetworks with rather complicated behavior becomes cumbersomeand error prone in paradigms like SAT, whereas this is much lessthe case for a declarative approach such as ASP-G.

The existence of the ThX attractor and the observed pattern ofgene activities in the attractors of the knock-out networks caused theauthors of Pedicini et al. (2010) to conclude that the active genes inTh1 and Th2 cells do not play a counter-regulatory roles towardseach other, contrary to what is traditionally believed. However,when using the asynchronous update scheme, we noted that theattractor ThX is no longer obtained. A similar pattern occurs forthe knock-out networks, where the use of the asynchronous updatescheme drastically changes the set of detected attractors comparedto those detected using a synchronous update scheme: the numberof attractors found with the asynchronous update scheme for theknock-out networks ranges only between 2-5, whereas the numberof attractors found in the synchronous case ranged between 286-1154. This finding suggests that the occurrence of the ThX attractorand the existence of a large number of attractors in the knock-outnetworks as found in Pedicini et al. (2010) are only valid under thesynchronous update scheme. It thus defines the boundary conditionsunder which the conclusions of Pedicini et al. (2010) are valid andhighlights the relevance of performing modeling under differentscenarios, as offered by ASP-G, to put biological conclusions inperspective.

4 CONCLUSIONIn this paper, we presented ASP-G, a modular system tosimulate Boolean network models of GRNs and to subsequentlycompute their attractors. ASP-G is based on the declarative ASPprogramming paradigm which has already been previously appliedin the context of biological network data analysis and modeling (see,

e.g., Gebser et al. (2008a), Dworschak et al. (2008), Gebser et al.(2010b), Gebser et al. (2010a) and Corblin et al. (2012)). Recently,Inoue (2011) showed in a theoretical comparison between Booleannetworks and the underlying semantics of ASP, that a strongmathematical relation exists between the attractors/steady states ofBoolean networks and the notion of stable models commonly usedin ASP. We built upon this earlier result in our proposed method,ASP-G.

The main added value of ASP-G is in its declarativity andmodularity: it allows users to easily test different update schemesand activation rules when simulating the dynamics of their Booleannetwork model by selecting and modifying the appropriate modules.In addition, the fact that ASP-G is based on a declarative languagemakes it less error prone than other approaches such as SAT whichdepend on the definition of difficult to interpret and tedious toconstruct truth tables. Using an underlying declarative programmingparadigm also makes ASP-G easily extendable to other parametersettings. Decoupling the problem definition from its solution thusallows for a greater flexibility compared to other ad hoc systemssuch as genYsis (Garg et al. (2007)) and geneFAtt (Zheng et al.(2013)), where assumptions such as update scheme and activationrules are already built into the system.

We showed the correctness of ASP-G in simulating Booleannetwork models and obtaining attractors under different assumptionsby successfully recapitulating the detection of attractors ofpreviously published studies. Relying on a modular and flexibledeclarative programming paradigm definitely comes at the expenseof being slower than the more dedicated systems to computeattractors, such as genYsis (Garg et al. (2007)) and geneFAtt(Zheng et al. (2013)). However, in terms of computationalefficiency, ASP-G proved to be quite fast (for small networks i.e. upto 23 genes computations are below a second, for larger networksi.e. up to 51 genes, the longest computation time took under 4.5minutes). Also, ongoing research in ASP solvers (Gebser et al.(2011a)) will make it possible for ASP to reach a point where itoutperforms other logic paradigms. This is definitely the case whencomparing ASP with Binary (or multiple) Decision Diagrams (Lee(1959)) used to calculate attractors in Boolean Networks models forGRNs (Arellano et al. (2011), Naldi et al. (2007)) as they sufferfrom memory explosion when the size of the network starts tobecome large (Clarke et al. (2001)).

For larger-sized networks, any exhaustive method will face achallenge, as the state space of the network increases exponentiallyw.r.t. the number of nodes in the network. When dealing with such

6

ASP-G

larger networks, methods that avoid an exhaustive search as in Ayet al. (2009) might become more suitable under these conditions.Conclusively, ASP-G is tailored to simulate Boolean networkmodels of GRNs and to compute attractors in a diagnostic mode,where one wants to test different update schemes and activationrules in order to find the setting that best matches experimental dataor to correctly delineate the boundary conditions under which thebiological conclusions based on these simulations are valid.

ACKNOWLEDGEMENTSFunding: This work was supported by the Ghent UniversityMultidisciplinary Research Partnership “Bioinformatics: fromnucleotides to networks” and the Interuniversity Attraction PolesProgramme [IUAP P6/25], initiated by the Belgian State, SciencePolicy Office (BioMaGNet) and by the IWT: SBO-NEMOA; FWO:G.0428.13N fund.

REFERENCESAlbert, R. and Othmer, H. G. (2003). The topology of the regulatory interactions

predicts the expression pattern of the segment polarity genes in drosophilamelanogaster. Journal of theoretical biology, 223(1), 1–18.

Arellano, G., Argil, J., Azpeitia, E., Benitez, M., Carrillo, M., Gongora, P.,Rosenblueth, D., and Alvarez-Buylla, E. (2011). ”antelope”: a hybrid-logic modelchecker for branching-time boolean grn analysis. BMC Bioinformatics, 12(1), 490.

Ay, F., Xu, F., and Kahveci, T. (2009). Scalable steady state analysis of booleanbiological regulatory networks. PLoS ONE, 4(12), e7992.

Clarke, E., Grumberg, O., Jha, S., Lu, Y., and Veith, H. (2001). Progress on the stateexplosion problem in model checking. In Informatics, pages 176–194. Springer.

Corblin, F., Fanchon, E., Trilling, L., Chaouiya, C., and Thieffry, D. (2012). Automaticinference of regulatory and dynamical properties from incomplete gene interactionand expression data. In Information Processign in Cells and Tissues, pages 25–30.Springer.

Davidich, M. I. and Bornholdt, S. (2008). Boolean network model predicts cell cyclesequence of fission yeast. PloS one, 3(2), e1672.

De Jong, H. (2002). Modeling and simulation of genetic regulatory systems: a literaturereview. Journal of computational biology, 9(1), 67–103.

De Jong, H. and Page, M. (2008). Search for steady states of piecewise-lineardifferential equation models of genetic regulatory networks. Computational Biologyand Bioinformatics, IEEE/ACM Transactions on, 5(2), 208–222.

Dubrova, E. and Teslenko, M. (2011). A sat-based algorithm for finding attractors insynchronous boolean networks. IEEE/ACM Transactions on Computational Biologyand Bioinformatics (TCBB), 8(5), 1393–1399.

Dworschak, S., Grell, S., Nikiforova, V. J., Schaub, T., and Selbig, J. (2008). Modelingbiological networks by action languages via answer set programming. Constraints,13(1-2), 21–65.

Eiter, T., Ianni, G., and Krennwallner, T. (2009). Answer set programming: A primer.In Reasoning Web. Semantic Technologies for Information Systems, pages 40–110.Springer.

Espinosa-Soto, C., Padilla-Longoria, P., and Alvarez-Buylla, E. R. (2004). A generegulatory network model for cell-fate determination during arabidopsis thalianaflower development that is robust and recovers experimental gene expressionprofiles. The Plant Cell Online, 16(11), 2923–2939.

Garg, A., Xenarios, I., Mendoza, L., and DeMicheli, G. (2007). An efficient methodfor dynamic analysis of gene regulatory networks and in silico gene perturbationexperiments. In Research in Computational Molecular Biology, pages 62–76.Springer.

Garg, A., Di Cara, A., Xenarios, I., Mendoza, L., and De Micheli, G.(2008). Synchronous versus asynchronous modeling of gene regulatory networks.Bioinformatics, 24(17), 1917–1925.

Gebser, M., Schaub, T., Thiele, S., Usadel, B., and Veber, P. (2008a). Detectinginconsistencies in large biological networks with answer set programming. In Logic

Programming, pages 130–144. Springer.Gebser, M., Kaminski, R., Kaufmann, B., Ostrowski, M., Schaub, T., and Thiele, S.

(2008b). Engineering an incremental asp solver. In Logic Programming, pages

190–205. Springer.Gebser, M., Konig, A., Schaub, T., Thiele, S., and Veber, P. (2010a). The bioasp library:

Asp solutions for systems biology. In Tools with Artificial Intelligence (ICTAI), 201022nd IEEE International Conference on, volume 1, pages 383–389. IEEE.

Gebser, M., Guziolowski, C., Ivanchev, M., Schaub, T., Siegel, A., Thiele, S., andVeber, P. (2010b). Repair and prediction (under inconsistency) in large biologicalnetworks with answer set programming. In KR.

Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T., and Schnor, B. (2011a).Cluster-based asp solving with claspar. In Logic Programming and NonmonotonicReasoning, pages 364–369. Springer.

Gebser, M., Kaufmann, B., Kaminski, R., Ostrowski, M., Schaub, T., and Schneider, M.(2011b). Potassco: The potsdam answer set solving collection. AI Communications,24(2), 107–124.

Gonzalez, A., Chaouiya, C., and Thieffry, D. (2008). Logical modelling of the roleof the hh pathway in the patterning of the drosophila wing disc. Bioinformatics,24(16), i234–i240.

Harvey, I. and Bossomaier, T. (1997). Time out of joint: Attractors in asynchronousrandom boolean networks. In Proceedings of the Fourth European Conference onArtificial Life, pages 67–75. MIT Press, Cambridge.

Inoue, K. (2011). Logic programming for boolean networks. In Proceedings ofthe Twenty-Second international joint conference on Artificial Intelligence-VolumeVolume Two, pages 924–930. AAAI Press.

Kauffman, S. A. (1993). The origins of order: Self-organization and selection inevolution. Oxford university press.

Kaufman, M., Urbain, J., and Thomas, R. (1985). Towards a logical analysis of theimmune response. Journal of theoretical biology, 114(4), 527–561.

Lee, C.-Y. (1959). Representation of switching circuits by binary-decision programs.Bell System Technical Journal, 38(4), 985–999.

Li, F., Long, T., Lu, Y., Ouyang, Q., and Tang, C. (2004). The yeast cell-cycle networkis robustly designed. Proceedings of the National Academy of Sciences of the UnitedStates of America, 101(14), 4781–4786.

Lifschitz, V. (2008). What is answer set programming?. In AAAI, volume 8, pages1594–1597.

Mendoza, L. and Alvarez-Buylla, E. R. (1998). Dynamics of the genetic regulatorynetwork for arabidopsis thaliana flower morphogenesis. Journal of theoreticalbiology, 193(2), 307–319.

Mendoza, L., Thieffry, D., and Alvarez-Buylla, E. R. (1999). Genetic control of flowermorphogenesis in arabidopsis thaliana: a logical analysis. Bioinformatics, 15(7),593–606.

Naldi, A., Thieffry, D., and Chaouiya, C. (2007). Decision diagrams for therepresentation and analysis of logical models of genetic networks. In ComputationalMethods in Systems Biology, pages 233–247. Springer.

Pedicini, M., Barrenas, F., Clancy, T., Castiglione, F., Hovig, E., Kanduri, K., Santoni,D., and Benson, M. (2010). Combining network modeling and gene expressionmicroarray analysis to explore the dynamics of th1 and th2 cell regulation. PLoScomputational biology, 6(12), e1001032.

Rottger, R., Ruckert, U., Taubert, J., and Baumbach, J. (2012). How little do weactually know? on the size of gene regulatory networks. Computational Biologyand Bioinformatics, IEEE/ACM Transactions on, 9(5), 1293–1300.

Sanchez, L. and Thieffry, D. (2001). A logical analysis of the drosophila gap-genesystem. Journal of theoretical Biology, 211(2), 115–141.

Sanchez-Corrales, Y.-E., Alvarez-Buylla, E. R., and Mendoza, L. (2010). Thearabidopsis thaliana flower organ specification gene regulatory network determinesa robust differentiation process. Journal of theoretical biology, 264(3), 971–983.

Shmulevich, I., Dougherty, E. R., and Zhang, W. (2002). From boolean to probabilisticboolean networks as models of genetic regulatory networks. Proceedings of theIEEE, 90(11), 1778–1792.

Thomas, R. (1973). Boolean formalization of genetic control circuits. Journal oftheoretical biology, 42(3), 563–585.

Thomas, R. (1991). Regulatory networks seen as asynchronous automata: a logicaldescription. Journal of Theoretical Biology, 153(1), 1–23.

Zheng, D., Yang, G., Li, X., Wang, Z., Liu, F., and He, L. (2013). An efficient algorithmfor computing attractors of synchronous and asynchronous boolean networks. PloSone, 8(4), e60593.

7