Dynamics Preserving Size Reduction Mappings for Probabilistic Boolean Networks

13
2310 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007 Dynamics Preserving Size Reduction Mappings for Probabilistic Boolean Networks Ivan Ivanov, Ranadip Pal, and Edward R. Dougherty Abstract—A probabilistic Boolean network (PBN) is a discrete network composed of a family of Boolean networks such that at each time instant the transition of the PBN is governed by one of its constituent networks. A random variable determines when the governing Boolean network switches and, when a switch oc- curs, a new constituent network is chosen according to a selec- tion distribution to govern the PBN transitions until another switch is called for. As nonlinear models of genetic regulatory networks, PBNs incorporate the indeterminacy owing to latent variables ex- ternal to the model that have biological interaction with genes in the model. Besides being used to model biologically phenomena, such as cellular state dynamics and the switch-like behavior of cer- tain genes, PBNs have served as the main model for the application of intervention methods, including optimal control strategies, to fa- vorably effect system dynamics. An obstacle in applying PBNs to large-scale networks is the computational complexity of the model. It is sometimes necessary to construct computationally tractable subnetworks while still carrying sufficient structure for the appli- cation at hand. Hence, there is a need for size reducing mappings. This paper proposes a reduction strategy that preserves the dy- namical structure of the network, a crucial requirement for the development of intervention strategies based on control theory. In particular, we focus on the following two issues when deleting a gene from the network: 1) maintaining the same number of con- stituent Boolean networks, and 2) preserving, in an optimal sense, the attractor structure, the relative sizes of the basins of attraction, and the level structures of the state transition diagrams of the con- stituent Boolean networks. Preservation of the attractor structure is critical because the attractors of a PBN determine its steady-state behavior. Index Terms—Complexity, reduction mappings, models of Genomic regulation, probabilistic Boolean networks. I. INTRODUCTION A SALIENT problem in genomic research is to understand the manner in which genes interact in an integrative and holistic way within a given genome. Modeling the dynamics of the expression profiles of those genes over time is one of the most important approaches in uncovering fundamental cell regulatory mechanisms and constitutes a key area of genomic Manuscript received July 5, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Elias S. Manolakos. This work was supported in part by the National Science Founda- tion under ECS0355227 and CCF-0514644, by the National Cancer Institute under CA90301, and by the Translational Genomics Research Institute. I. Ivanov is with the Department of Veterinary Physiology and Pharma- cology, Texas A&M University, College Station, TX 77843-4466 USA (e-mail: [email protected]). R. Pal is with the Department of Electrical Engineering, Texas A&M Univer- sity, College Station, TX 77843-4466 USA (e-mail: [email protected]). E. R. Dougherty is with the Department of Electrical Engineering, Texas A&M University, College Station, TX 77843-4466 USA and also with the Computational Biology Division, Translational Genomic Research Institute, Phoenix, AZ 85004 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2006.890929 signal processing. Many different gene regulatory network models have been proposed, ranging from fully deterministic and discrete, e.g., Boolean networks (BNs), to fully stochastic and continuous, e.g., stochastic differential equations, [4]. The availability of experimental data and the specific properties of the distribution of the gene expression profiles in microarray experiments suggest that discrete dynamical systems offer a reasonable compromise between the model complexity and its descriptive power [7]. Among deterministic dynamical systems, perhaps, the most attention has been given to the BN model. In this model, gene expression is quantized to only two levels: ON and OFF. The expression level (state) of a gene is functionally related via a logical rule to the expression states of some other genes. The Boolean model has yielded insights into the overall behavior of large genetic networks [5], [8]–[10]. This paper concerns a stochastic extension of the Boolean model, probabilistic Boolean networks (PBNs). PBNs share the appealing rule-based properties of BNs, are robust to uncer- tainty both in the data and model selection, and can be studied in the probabilistic context of Markov chains [12]. PBNs can be used to model biologically meaningful phenomena, such as cellular state dynamics, the switch-like behavior of certain genes, stability, and hysteresis of the cell regulatory system [2], [12], [17], they offer the potential for discovering optimal intervention strategies based on genetic regulatory dynamics [2], [14], and they have theoretically grounded relationships with Bayesian networks [13]. An obstacle in applying PBN to large-scale networks is the computational complexity of the model. It is sometimes neces- sary to construct computationally tractable subnetworks while still carrying sufficient structure for the application at hand. Hence, there is a need for size reducing mappings and algo- rithms between PBNs. The initial effort at constructing such mappings put the major emphasis on preserving the original probability structure of the network [3]; here, we focus on reduc- tion strategies that preserve the dynamical structure. Our moti- vation is related to the biological interpretation of the model. If a model were to represent a real gene regulatory network, then reducing the size of the network should produce similar dynam- ical behavior. Reducing the size of a PBN while preserving its dynamical behavior, has important implications in developing suitable intervention strategies based on control theory. We ap- proach dynamical behavior in terms of the structure of the at- tractors and the basins of attraction of the network under con- sideration. The differences between the dynamical structure of the original PBN and the reduced network will depend essen- tially on which genes are removed from the original network. The rest of this paper is organized as follows. Section II gives the definition of a PBN and defines several important character- 1053-587X/$25.00 © 2007 IEEE

Transcript of Dynamics Preserving Size Reduction Mappings for Probabilistic Boolean Networks

2310 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Dynamics Preserving Size Reduction Mappings forProbabilistic Boolean Networks

Ivan Ivanov, Ranadip Pal, and Edward R. Dougherty

Abstract—A probabilistic Boolean network (PBN) is a discretenetwork composed of a family of Boolean networks such that ateach time instant the transition of the PBN is governed by oneof its constituent networks. A random variable determines whenthe governing Boolean network switches and, when a switch oc-curs, a new constituent network is chosen according to a selec-tion distribution to govern the PBN transitions until another switchis called for. As nonlinear models of genetic regulatory networks,PBNs incorporate the indeterminacy owing to latent variables ex-ternal to the model that have biological interaction with genes inthe model. Besides being used to model biologically phenomena,such as cellular state dynamics and the switch-like behavior of cer-tain genes, PBNs have served as the main model for the applicationof intervention methods, including optimal control strategies, to fa-vorably effect system dynamics. An obstacle in applying PBNs tolarge-scale networks is the computational complexity of the model.It is sometimes necessary to construct computationally tractablesubnetworks while still carrying sufficient structure for the appli-cation at hand. Hence, there is a need for size reducing mappings.This paper proposes a reduction strategy that preserves the dy-namical structure of the network, a crucial requirement for thedevelopment of intervention strategies based on control theory. Inparticular, we focus on the following two issues when deleting agene from the network: 1) maintaining the same number of con-stituent Boolean networks, and 2) preserving, in an optimal sense,the attractor structure, the relative sizes of the basins of attraction,and the level structures of the state transition diagrams of the con-stituent Boolean networks. Preservation of the attractor structureis critical because the attractors of a PBN determine its steady-statebehavior.

Index Terms—Complexity, reduction mappings, models ofGenomic regulation, probabilistic Boolean networks.

I. INTRODUCTION

ASALIENT problem in genomic research is to understandthe manner in which genes interact in an integrative and

holistic way within a given genome. Modeling the dynamicsof the expression profiles of those genes over time is one ofthe most important approaches in uncovering fundamental cellregulatory mechanisms and constitutes a key area of genomic

Manuscript received July 5, 2005. The associate editor coordinating thereview of this manuscript and approving it for publication was Dr. Elias S.Manolakos. This work was supported in part by the National Science Founda-tion under ECS0355227 and CCF-0514644, by the National Cancer Instituteunder CA90301, and by the Translational Genomics Research Institute.

I. Ivanov is with the Department of Veterinary Physiology and Pharma-cology, Texas A&M University, College Station, TX 77843-4466 USA (e-mail:[email protected]).

R. Pal is with the Department of Electrical Engineering, Texas A&M Univer-sity, College Station, TX 77843-4466 USA (e-mail: [email protected]).

E. R. Dougherty is with the Department of Electrical Engineering, TexasA&M University, College Station, TX 77843-4466 USA and also with theComputational Biology Division, Translational Genomic Research Institute,Phoenix, AZ 85004 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSP.2006.890929

signal processing. Many different gene regulatory networkmodels have been proposed, ranging from fully deterministicand discrete, e.g., Boolean networks (BNs), to fully stochasticand continuous, e.g., stochastic differential equations, [4]. Theavailability of experimental data and the specific properties ofthe distribution of the gene expression profiles in microarrayexperiments suggest that discrete dynamical systems offer areasonable compromise between the model complexity and itsdescriptive power [7]. Among deterministic dynamical systems,perhaps, the most attention has been given to the BN model. Inthis model, gene expression is quantized to only two levels: ON

and OFF. The expression level (state) of a gene is functionallyrelated via a logical rule to the expression states of some othergenes. The Boolean model has yielded insights into the overallbehavior of large genetic networks [5], [8]–[10].

This paper concerns a stochastic extension of the Booleanmodel, probabilistic Boolean networks (PBNs). PBNs share theappealing rule-based properties of BNs, are robust to uncer-tainty both in the data and model selection, and can be studiedin the probabilistic context of Markov chains [12]. PBNs canbe used to model biologically meaningful phenomena, suchas cellular state dynamics, the switch-like behavior of certaingenes, stability, and hysteresis of the cell regulatory system[2], [12], [17], they offer the potential for discovering optimalintervention strategies based on genetic regulatory dynamics[2], [14], and they have theoretically grounded relationshipswith Bayesian networks [13].

An obstacle in applying PBN to large-scale networks is thecomputational complexity of the model. It is sometimes neces-sary to construct computationally tractable subnetworks whilestill carrying sufficient structure for the application at hand.Hence, there is a need for size reducing mappings and algo-rithms between PBNs. The initial effort at constructing suchmappings put the major emphasis on preserving the originalprobability structure of the network [3]; here, we focus on reduc-tion strategies that preserve the dynamical structure. Our moti-vation is related to the biological interpretation of the model. Ifa model were to represent a real gene regulatory network, thenreducing the size of the network should produce similar dynam-ical behavior. Reducing the size of a PBN while preserving itsdynamical behavior, has important implications in developingsuitable intervention strategies based on control theory. We ap-proach dynamical behavior in terms of the structure of the at-tractors and the basins of attraction of the network under con-sideration. The differences between the dynamical structure ofthe original PBN and the reduced network will depend essen-tially on which genes are removed from the original network.

The rest of this paper is organized as follows. Section II givesthe definition of a PBN and defines several important character-

1053-587X/$25.00 © 2007 IEEE

IVANOV et al.: DYNAMICS PRESERVING SIZE REDUCTION MAPPINGS FOR PROBABILISTIC BOOLEAN NETWORKS 2311

istics of the model. Section III introduces the projection map-ping and the reduction mapping. Section IV describes a newalgorithm for reducing the size of a PBN while preserving itsdynamical structure. Section V compares the performance ofthe mappings given in the preceding sections on a synthetic ex-ample of a PBN. Section VI applies the algorithm developed inSection IV to a real data set, and Section VII concludes with adiscussion about advantages and disadvantages of the mappingsconsidered in this paper.

II. DEFINITIONS AND BASIC PROPERTIES

Following [16], we define a PBN as atriple, where , isa set of nodes (genes), ,

, is a list of realizations or network func-tions of , and is a list of selection proba-bilities for the corresponding realizations of . Each noderepresents the state (expression) of the gene , wheremeans that gene is not expressed and means that itis expressed. The th component of each network function

gives the rule of regulation for the gene in the context ofthat realization of , and is called predictor for that gene. Up-dating the states of all genes in the network is done synchro-nously according to the components of the currently used net-work function, and then the process is repeated. The choice ofwhich network function to apply is governed by a selectionprocedure. Specifically, at each time point a random decision ismade as to whether to switch the network function for the nexttransition, with a probability of a switch being a system pa-rameter. If a decision is made to switch the network function,then a new function is chosen from among all of the possiblerealizations of , according to their selection probabili-ties . In other words, each network function representsa deterministic BN and the PBN behaves as a fixed BN until arandom decision (with probability ) is made to change the net-work function according to the probabilities fromamong . In addition to the network switching andselection in the PBN model there is mechanism which modelsrandom gene mutations, i.e., at each time point there is a prob-ability of any gene changing its value uniformly randomlyamong the other possible values in . In the special case whenthe selection probabilities of the individual componentsare given, one can compute, under the assumption that each genepredictor is chosen independently of the other predictors, the se-lection probability for the realization of by the formula

(1)

and, in this case, the PBN is called independent.The state space of the network together

with the set of network functions, in conjunction with transitionsbetween the states and network functions, determine a Markovchain. The random perturbation makes the Markov chain er-godic, meaning that it has the possibility of reaching any statefrom another state and that it possesses a long-run (steady-state)distribution.

As a particular realization of evolves in time, it will even-tually enter a fixed state, or a set of states, through which it willcontinue to cycle. In the first case, the state is called a singletonor fixed-point attractor, whereas, in the second case it is calleda cyclic attractor. The attractor structure of is determinedby the particular combination of singleton and cyclic attractors,and by the lengths of the attractor cycles. The attractors that thenetwork may enter depend on the initial state. All initial statesthat eventually produce a given attractor constitute the basin ofthat attractor. The attractors represent the fixed points of the dy-namical system that capture its longterm behavior. The numberof transitions needed to return to a given state in an attractor iscalled the cycle length. Attractors may be used to characterize acell’s phenotype [11]. The attractors of a PBN are the attractorsof its constituent BNs. It is important to emphasize that eventhough each BN has the same state space as the PBN it par-ticipates in, the organization (dynamics) of the state space mightbe quite different for different BNs. The dynamics of the statespace in the th BN are governed by the th network function

, and is represented by its state transition diagram: a directedgraph with the states in as the set of its ver-tices, and with edges connecting the state withthe state if . The state transition diagram ispartitioned into level sets, where the level set is composed ofall of the states of the network that transition to one of the at-tractor states in exactly transitions. One can think of the set ofattractor states as the level set . Then nonattractor states com-pose the transient states of the network. The transient states arepartitioned according to the attractor cycles because each tran-sient state begins a sequence of transitions that eventually endsup in a unique attractor cycle. The attractor cycles are mutuallydisjoint, which in turn implies that the basins of different attrac-tors are disjoint. Given any transient state, it belongs to a uniquebasin and unique level. The dynamics of the state space in theentire PBN is represented by the collection of the state transi-tion diagrams , where the decision about switching from onestate transition diagram to another one is based on the modelparameters , , and . One should also notice thatthe deterministic state transition diagrams can be combinedinto a single probabilistic state transition diagram which rep-resents the Markov chain associated with the PBN.

III. PROJECTION AND REDUCTION MAPPINGS

Projection mappings of independent PBNs are defined in [3].They are introduced as an attempt to reduce the complexity ofa PBN while maintaining consistency with the original prob-ability structure of the PBN. Here, we provide the definition ofthe projection mapping for the general case of a PBN. The basicprojection is a mapping that transforms the given PBNinto a new one with the same parameters and , and such thatthe number of genes is reduced by one, i.e., the gene in theoriginal network is deleted. Without loss of generality, we mayassume that the deleted gene is . Thus, for a PBN

2312 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Every predictor , , generates two predic-

tors and according to the rule

in (2)

Thus, every network function for determines newnetwork functions for by combining ’s and ’s in allof the possible ways for every fixed and .The new network functions have their corresponding selectionprobabilities given by the formula

(3)

where is the number of the components of the new networkfunction that are coming as , and ,is the marginal probability for the gene to have values 0 or 1,computed using the steady/stationary state probability distribu-tion of the original PBN . For example, the new network func-tion has its selection probability equal to

. When two or more of thenetwork functions for happen to be identical their selectionprobabilities combine in a natural way. One should notice thatwhile the projection mapping preserves the probability structureof , it exponentially increases the number of the BNs that form

. In this sense the projection mapping does not achieve thegoal of reducing the complexity of the original PBN.

A different kind of size-reducing mapping (which also pre-serves the parameters and of the original PBN) is the re-duction mapping of a PBN, see [6] for the special case of anindependent PBN. It is important to point out that this mappingmight not preserve the probability structure of the original PBN.

To better understand the motivation and the definition ofthe reduction mapping, we consider the following portionof the probabilistic state transition diagram for the orig-inal PBN containing the states ,

, , and, where , , , are the

corresponding transition probabilities. If one “deletes” the nodethis diagram collapses to where and

are the corresponding states, and

If we formally define the reduction mapping acting on a PBNby “deleting” one gene

and require that for the new PBN with network func-tions having the same selection probabilities (Fig. 1) as theircounterparts in , , , we can see that in orderto maximally preserve the probability structure of the originalPBN, the probabilistic state transition diagram for must havetransition probabilities closely matching the transition probabil-ities of the “collapsed” state transition diagram previouslydescribed. This goal is achieved by an optimization procedure

Fig. 1. Original state transition diagram.

Fig. 2. Collapsed state transition diagram.

which for every fixed network function , fromcombines the predictors and (Fig. 2) to form the new

predictor , . A detailed discussion about theconstruction of in the special case of an independent PBNis given in [6], and the construction carries on with no changesfor the case of a general PBN. One can immediately notice thatthe reduction mapping reduces the complexity of the originalPBN not only by “deleting” one gene but also by not increasingthe number of BNs that comprise the reduced PBN. Here, wewant to point out the difference in defining the reduction andthe projection mappings. While the projection is based on theprobability distribution of a single gene, the reduction mappingis defined using the probability distribution of the entire collec-tion of states of the given PBN which allows for the optimizationprocedure given in [6]. In both cases though, there is no controlover the changes in the dynamics/state transition diagrams of theBNs comprising the original PBN. In addition, both mappingsrely on knowledge about the steady/stationary state distributionof the original PBN.

IV. SIZE REDUCING DYNAMICS PRESERVING MAPPINGS

In this section, we describe dynamics induced reduction(DIRE) for PBN: a new algorithm for reducing the size of agiven PBN. The idea is to delete genes from the PBN while pre-serving the dynamics of the constituent BNs and keeping theirnumber unchanged. We follow a procedure that collapses thestate transitional diagram for each individual BN in a mannersimilar to the example preceding the definition of the reductionmapping (Figs. 1 and 2). States in a given BN are aggregatedtogether if they differ only in the expression value of the genethat has been deleted from the network. It is important to noticefrom the onset that the design of DIRE ensures control over thedamage to the state transition diagram of every BN. In this waynot only the stationary/steady-state distribution of the reducedPBN is similar to the one produced by the previously describedreduction mapping but in addition, the structure of the statetransition diagram is taken into account which is the mainadvantage of the newly proposed algorithm. Several importantissues must be addressed before describing the algorithm ingreater detail. Assuming that the real genetic regulatory systembeing modeled by a PBN switches between different wiringsfor the genes participating in the regulation, depending onfactors either external or internal to the cell, then the deletionof a gene has to be properly interpreted as follows.

IVANOV et al.: DYNAMICS PRESERVING SIZE REDUCTION MAPPINGS FOR PROBABILISTIC BOOLEAN NETWORKS 2313

Fig. 3. State transition diagram of the BN.

1) First, it is desirable that deleting a gene from a givenregulatory network does not increase the number of con-stituent BNs.

2) Second, it is important that the dynamics of a real geneticregulatory system do not strongly depend on how manyof the genes participating in the regulation are observ-able or not. Thus, preserving properties of a PBN, suchas its attractor structure, the relative sizes of the basinsof attraction, as well as the level structure of the statetransition diagrams of the BNs comprising the PBN, isa major goal of size reduction.

To appreciate the second goal, we consider an example of BNon three genes whose state transition diagram is given in Fig. 3.Suppose that the gene corresponding to the right-most digit inthe binary representation of the states is to be deleted. If we tryto collapse the state transition diagram, we notice that, with re-spect to the attractor structure of the original BN, merging theleaf node ( ) and the attractor node ( ) can be done in twovery different ways: either the merging happens towards the at-tractor state or it happens towards the leaf state. In the first case,the attractor state is preserved in the reduced BN as ( ), and thebasin of attraction of the attractor ( ) in the original networklooses one leaf. In the second case, the attractor structure of thereduced network differs significantly to that of the original BN,the only remaining attractor being the reduced state ( ). Thus,if we consider the attractor structure of a BN as a representationof important biological characteristics of the real genomic reg-ulatory system, then the merging of the states ( ) and ( )should be done towards the attractor state. At the same time, wepoint out that those two states are the only states in the originalstate transition diagram that create a possibility of essentially al-tering the attractor structure of the original BN. The rest of thestates will merge within the basin of attraction of the attractorstate ( ). Keeping all of this in mind, we give the followingdefinition.

Definition 1: A state is called an inconsistency point withrespect to gene if and only if the state in the state transition

diagram that differs from only in the value of belongs to adifferent basin of attraction than the basin of attraction belongsto.

DIRE: Input—a PBN. Output—a reduced PBN.

for {every BN in the PBN} do

{label each attractor and its corresponding basin, andlabel each leaf node in the state transition diagram};

{label each inconsistency point};

merge {states with respect to the gene being deleted,filling in at the same time the truth tableof the reduced BN};

end;

Notice that the complexity of the algorithm is determined bythe number of genes, and by the number of the BNs com-prising the PBN. Because the individual BNs model the dif-ferent possible wirings of the real genomic regulatory network,one can assume that . Biological considera-tions suggest that the constant is not larger than 100, andin all practical applications . Therefore, the com-plexity of the algorithm depends on the number ofstates, on the size of the truth table for the reduced BNs that arebuilt up in the merge part of the algorithm, and on the numberof visits to the states of the original BNs which is also part ofthe same procedure. The size of the truth table of the reducedBNs is which leads to a complexity estimateof . To analyze the average number of visits tothe states of a BN, we assume that its state transition diagram isa single-rooted well-balanced tree on vertices. Such an as-sumption is motivated by the understanding that if a PBN wereto model a real genomic regulatory system then that PBN shouldhave relatively short attractor cycles with the majority of thembeing singleton ones, [11]. With this assumption in mind, onecan observe that the number of state visits is approximately# leaves # levels in the state transition diagram. The factor 2comes from the observation that each state on the path from aleaf to the corresponding attractor could be visited twice withinthe merge procedure. The number of levels gives the averagelength of such a path, and for the well-balanced single-rootedtree, we have # levels and # leaves .Thus, we obtain a complexity estimate of for the visitsto the states of the state transition diagram of one of the originalBNs. Next, we discuss the merge part of the algorithm. Whenmerging/collapsing the states that differ only by the value of thegene being “deleted” one needs to take special care of the fol-lowing two issues:

1) controlling the damage to the attractor structure whileproperly handling the points of inconsistency;

2) merging states without creating cycles within the basins ofattraction.

In order to describe the merging part of the algorithm, weneed a few more definitions.

2314 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Fig. 4. General flowchart.

Fig. 5. Process state procedure.

Definition 2: The states and in a BN state transition dia-gram are called dual states with respect to a given gene in theBN iff and differ only in the value of that gene .

Definition 3: The state produced after deleting the variablerepresenting the gene in a state from the state transitiondiagram of the original BN is called the reduced state withrespect to and is denoted by .

Definition 4: Assume that the attractor cycles andthe singleton attractors of a BN are indexed. If de-notes the th attractor cycle/singleton attractor then

pairwise distinct orif is a singleton.

Definition 5: If the states and are dual states with respectto the gene in a BN, then the one that determines the position ofthe reduced state in the state transition diagram of the reducedBN is called absorbing and is denoted by .

The flowchart of the algorithm is given on Figs. 4–8. Noticethat the flowchart Fig. 8 combines with the one on Fig. 5 to formthe process basins procedure, and also with the flowchart onFig. 6 to form the process attractors procedure.

The last part of the algorithm, form attractors procedure,forms an attractor cycle which is a random permutation of thereduced versions of the states that are not points of inconsistencyin the corresponding attractor cycle of the original BN.

Before formally defining the algorithm, we provide an ex-ample to show how the proposed algorithm works in the simplecase of the BN on three genes shown on the left-hand side ofFig. 9. Assume that the gene corresponding to the right-mostdigit in the states of the BN is to be “deleted.” The labeling ofthe states in the state transition diagram is produced by the firstpart of the algorithm given on the general flowchart, Fig. 4. For

example the leaf node ( ) has a label (1,I,L) indicating that:first, it belongs to the basin of the attractor 1; second, that it isan inconsistency point; and, last, that it is a leaf node. One cannotice that the states ( ) and ( ) are the only inconsistencypoints. When processing basins, we assume that the leafs of theBN are ordered as ( ),( ),( ). We point out that differentorderings of the leafs could produce different state transition di-agrams of the reduced BN. With the given leaf ordering, thefirst states processed and marked are ( ) and ( ), and

. Because there is a transition , the transi-tion is introduced in the state transition diagramof the reduced BN. The next available nonattractor state on thepath from ( ) towards the attractor, labeled as 1, is ( ), andits dual state is the leaf ( ). Thus, , the algorithmmarks those states and introduces a transition inthe state transition diagram of the reduced BN. The reason forintroducing such a transition is that the state . Be-cause all of the nonattractor states on the path from the first leafto the attractor 1 have been visited, the next input for the processstate procedure is the second leaf ( ) that happens to be aninconsistency point. Therefore, the point of inconsistency pro-cedure is invoked, which results in marking the state ( ), be-cause its dual state ( ) is an attractor state. The next availablestate on the path from ( ) to the attractor 1 is ( ). Since itsdual is the attractor state ( ) the state ( ) is marked by thealgorithm, and with that all of the basins of attraction for theBN have been processed. The attractor structure of the BN inconsideration is relatively simple, and process attractors pro-cedure removes the inconsistency label from ( ) because itsdual state ( ) is not an attractor state. Finally, form attractorsprocedure uses the singleton attractors ( ) and ( ) from the

IVANOV et al.: DYNAMICS PRESERVING SIZE REDUCTION MAPPINGS FOR PROBABILISTIC BOOLEAN NETWORKS 2315

Fig. 6. Process attractors procedure (one pass of the for loop).

Fig. 7. Form the set S (one pass of the for loop in the form attractors procedure).

original BN to create the singleton attractors ( ) and ( ) inthe reduced BN. Thus, we arrive at the state transition diagramof the reduced BN given on the right-hand side of Fig. 9.

The following is the formal description of the proposed algo-rithm. Recall that refers to the absorbing state, and standsfor the set of reduced with respect to the gene attractor statesfrom the attractor cycle/singleton attractor of a BN that par-ticipates in the original PBN.

Merge while “deleting” the gene procedure: Input—a BN.Output—a reduced BN.

for {every leaf node in the BN} do

:= process state ;

while do

:= process state ;

end;

end;

process attractors;

form attractors.

Remark: The for loop in the merge procedure ensures that all ofthe nonattractor states of a BN have been marked and the nonat-tractor transitions of the reduced BN have been established. Theactual marking and the new transitions establishing is accom-plished by successive calls to process state procedure. The pro-cedure returns a state such that an attractor state has beenreached if and only if .

Process state procedure: Input—a state . Output—astate .

if { is not marked}

if {the dual of the state is not an attractor state}

if { and belong to the same basin of attraction}

then

mark {the states and }; ;

{fill in the truth table of the reduced BN byintroducing a transition from into , where

is a path from to in the state transitiondiagram of the original BN, and is the closeston the path towards the attractor, state to that isneither a point of inconsistency nor a marked state,or if all of the states on that path are either marked

2316 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Fig. 8. Point of inconsistency procedure.

Fig. 9. Example.

or points of inconsistency is selected to be thefirst attractor state on that path};

else point of inconsistency ;

else { and mark };

if {the state from the transition is a nonattractor state} ;

return .

Remark: Notice that the assignment in necessary in ordernot to increase the number of level sets of the state transitiondiagram of the BN being reduced. An example on the comple-mentary web site http://gsp.tamu.edu/Publications/dire.htm il-lustrates how the other possible assignment for the absorbing

state can lead to a reduced BN with more level sets thanthe original BN.

Process attractors procedure: Input—the attractors of a BN.Output—a labeling of the attractors.

for {every attractor node in the BN} do

if { is an inconsistency point and is an attractornode}

then

point of inconsistency ( , );

if { is an inconsistency point and is not an attractornode}

IVANOV et al.: DYNAMICS PRESERVING SIZE REDUCTION MAPPINGS FOR PROBABILISTIC BOOLEAN NETWORKS 2317

then {remove the inconsistency mark from }

end;

return.

Form attractors procedure: Input—the attractors of a BNtogether with the labeling. Output—the attractors of thereduced BN.

for {every attractor cycle in the BN} do

{form the set };

{fill in the truth table of the reduced BN by introducinga cycle connecting the reduced states in in the orderof any one of their possible permutations}

end;

return.

The next procedure takes care of the points of inconsistency insuch a way that:

1) If and are both nonattractor states then the relativeweight of a basin of attraction in the original BN is changedby one state only and with probability of 1/2.

2) If one of and is an attractor state then that state becomesthe absorbing state.

3) If both of the states are attractor states the singleton attrac-tors are favored and are labeled so that the form attractorsprocedure preserves them in the state transition diagram ofthe reduced BN.

Point of inconsistency procedure: Input—states and. Output—a labeling for the states of a BN and transitions

for the states in the basins of the state transition diagram ofthe reduced BN.

if { and are both non-attractor states}

mark {the states and };

{uniformly randomly decide if or };

{fill in the truth table of the reduced BN by introducing atransition from into , as described in the process stateprocedure};

if {either or is an attractor state}

mark {the nonattractor state};

if { and are both attractor states}

case 1 { and are either both singleton or nonsingletonattractors}:

{uniformly randomly decide if or };

{remove the inconsistency mark from };

case 2 {exactly one of the dual states is a singleton attractor}:

{ := singleton attractor};

{remove the inconsistency mark from }.

By the way it is constructed the proposed algorithm achievesthe goals stated at the beginning of this section in 1) and 2),while taking care of the points of inconsistency. Specifically:

1) The algorithm does not increase the number of the BNscomprising the original PBN, and in the special caseswhen some of the reduced BNs are identical DIRE pro-duces a reduced PBN with a lesser number of partici-pating BNs compared to that of the original PBN. That isachieved by combining the identical BNs, and assigningto the only remaining copy a selection probability thatequals to the sum of the individual selection probabili-ties of those BNs.

2) One can immediately notice that merging and markingof states in the basins of attraction does not create cyclesin the basins of attraction of the reduced BNs, the algo-rithm never increases the number of the level sets presentin the state transition diagram of the original PBN, andthe points of inconsistency are treated in a way that con-trols the damage to the state transition diagram of thecorresponding BN. The algorithm changes the relativeweight of any basin of attraction, e.g., point of incon-sistency procedure, only in the case when a point of in-consistency is moved from that basin to another basinof attraction, and even in those cases the change of theweight is just by one state and with a probability of 1/2.Moreover, the algorithm favors the original structure ofthe attractors of the BN being reduced. The only casewhen the number of the attractors or the structure of anattractor could be changed is when a state in that at-tractor is a point of inconsistency. If its dual state isnot part of an attractor, then is merged into . Other-wise, the two attractor states and are merged so thatthe singleton attractor is always favored or merging isdone uniformly randomly if none of the two states repre-sents a singleton attractor. There is one exceptional casewhen all of the states in an attractor cycle are points ofinconsistency, and then the attractor states in that cyclecould be merged towards other attractor cycles with aprobability of 1/2. The intuition behind this treatment ofthe attractors and their basins is that the presence of la-tent variables should not affect in a significant way theobserved dynamical behavior of the regulatory systemmodeled by the PBN. One should notice that every tran-sition introduced in the state transition diagram of a re-duced BN is equivalent to filling in the row in its truthtable that is labeled by the reduced state . Here, weassume that the truth table of the BN has its rows la-beled by the states (ordered as binary numbers) and itscolumns labeled by the individual genes. Marking thepairs of dual states ensures that every row in the truthtable is visited only once. Finally, the basins and the at-tractors are handled in separate consecutive steps of the

2318 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Fig. 10. Original BN3.

algorithm, and forming of the attractor cycles of the re-duced BN is performed as the last step. All of this en-sures that an attractor cycle of the reduced BN can beformed in any one of the possible ways if there are

reduced states participating in that cycle, and in doingso none of the previously introduced state transitions be-tween the states in the basins or from state in the basinto an attractor state will change.

To illustrate how the new algorithm outperforms, in the senseof 1) and 2), the reduction mapping, we produce, Fig. 10, thestate transition diagrams of one of the BNs comprising thePBN generated from the melanoma data set to be discussed inSection VI, and the state transition diagrams resulting from ap-plying the reduction mapping, Fig. 11, and the newly proposedalgorithm, Fig. 12, to that particular BN. One can notice thechange of the attractor structure under the reduction mapping.In contrast, the attractor structure of the original BN is wellpreserved after applying the new algorithm. The change inthe level structure of the original BN after applying the twodifferent mappings should be noticed as well. While the newalgorithm does not increase the number (four) of original levels,the reduction mapping introduces an extra level in the statetransition diagram of the reduced BN. In essence, the proposedalgorithm attempts to preserve the flow of information betweenstates lying on the same path of the state transition diagram, andit handles the exceptions created by the inconsistencies arisingfrom deletion of a gene by considering it to be a latent variable.In Section V, we apply the algorithm to a synthetic exampleof a PBN, and compare the results to the results after applying

the projection and the reduction mappings to the same PBN.The following two important features of the newly proposedmapping deserve special attention.

1) The mapping is multivalued. This property comes intwo flavors: Different state transition diagrams of areduced BN could be produced by the different possibleorderings of the leafs in the corresponding originalBN. In addition, an attractor cycle in a reduced BNcan be formed in possible ways by permuting thestates comprising it (here, is the number of the statesforming that attractor cycle). The multivalued feature ofthe proposed algorithm allows for natural optimizationprocedures based on criteria such as the least damageto the state transition diagram of each BN or the min-imal distance from the probability distribution of the“collapsed” state transition diagram.

2) The algorithm performing the mapping does not requireknowledge about the steady-state distribution of theoriginal PBN. The design of the algorithm ensures therelative preservation of the structure of the attractorsets and their corresponding basins, which in its turnhelps in obtaining a reduced PBN whose steady-statedistribution is close to the desired one.

V. SYNTHETIC EXAMPLE AND COMPARISON

In order to clearly illustrate the differences between the threedifferent size-reducing mappings, we consider the followingsynthetic example.

IVANOV et al.: DYNAMICS PRESERVING SIZE REDUCTION MAPPINGS FOR PROBABILISTIC BOOLEAN NETWORKS 2319

Fig. 11. BN3 after the reduction mapping.

Fig. 12. BN3 reduced using the new algorithm.

2320 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Fig. 13. BNs comprising the PBN from example 1.

Fig. 14. BNs comprising ~A .

Example 1: Let be a PBN with parametersand , and consisting of three genes

and the set of network functions with selec-tion probabilities , . The four BNsparticipating in the dynamics of are given on Fig. 13.

Applying the projection mapping (with respect to )produces the PBN composed of 16 BNs: an expo-

nential increase of the model complexity. While the probabilitystructure of is preserved in , some BNs comprising it havetheir state transition diagrams far from the dynamical structureof the original four BNs.

The reduction mapping (with respect to the same gene)produces the PBN comprised of two BNs but some

of the important structure present in the state transition diagramsof BN3 and BN4 is lost, as Fig. 14 shows.

In comparison, the newly proposed algorithm produces (afterdeleting the third gene) the PBN , where not only the numberof participating BNs is the same as in , but as one can see fromFig. 15, the dynamical structure of each of the original BNs ispreserved to a greater extent.

Notice that the reduced PBNs produced by either thereduction mapping or by the newly proposed algorithmhave exactly the same stationary-state distribution, namely,{(0,0.25),(1,0),(2,0),(3,0.75)} (the states of the network beinggiven by their decimal representations). Moreover, if one se-lects the gene “flipping” probability to be a positive one, then

Fig. 15. BNs comprising A .

the PBN induces an ergodic Markov chain, and the previousstationary-state distribution is also its steady-state distribution.

VI. MELANOMA APPLICATION

The gene-expression profiles used in this study result fromthe study of 31 malignant melanoma samples [1]. For the study,messenger RNA was isolated directly from melanoma biopsies,and fluorescent cDNA from the message was prepared and hy-bridized to a microarray containing probes for 8150 cDNAs(representing 6971 unique genes). The seven genes WNT5A,pirin, S100P, RET1, MART1, HADHB, and STC2 used herefor the model were chosen from a set of 587 genes from themelanoma data set that were subjected to an analysis of theirability to cross predict each other’s state in a multivariate set-ting ([12]). The gene expression profiles were binarized and aPBN comprised of four BNs was constructed following themethod from [18]. The parameters for were set asand . Fig. 16 shows the steady-state distributions ofthe PBNs and produced by deleting WNT5A using thereduction mapping and the newly proposed algorithm, respec-tively, and the probability distribution defined on the samestate space and resulting from the collapsing procedure given inSection III. The MSE between and the steady-state distribu-tion of is 0.01369498, whereas the MSE between and thesteady-state distribution of is 0.0242964. We note that the

IVANOV et al.: DYNAMICS PRESERVING SIZE REDUCTION MAPPINGS FOR PROBABILISTIC BOOLEAN NETWORKS 2321

Fig. 16. Steady-state distributions.

newly proposed algorithm was applied without any optimiza-tion over all of the possible leafs in the state transition diagramsof the BNs, which if applied can reduce the MSE error between

and even further.

VII. CONCLUSION

The new algorithm proposed in this paper has the followingimportant advantages compared to the previously introduced al-gorithms for reducing the size/complexity of a given PBN.

1) It preserves the dynamical structure of the original PBNin the following sense: controlling the damage to the at-tractor structure, no increase of the number of the levelsets in the basins of the attractors, and preserving thenumber of the BNs that constitute the PBN. Not only theattractor structure is preserved but there are no spuriousattractors being generated in the reduced network. Thisis not the case for the projection and reduction mappings.The synthetic example from Section V shows that essen-tial attractor structure could be lost while applying thereduction mapping, or spurious attractors could be cre-ated by the projection mapping. The melanoma data setexample shows that the reduction mapping can also pro-duce spurious attractors in some of the BNs comprisingthe reduced PBN. The proposed algorithm does not in-crease the number of level sets present in the state tran-sition diagram of the original PBN, which is not true forthe reduction mapping.

2) According to the discussion about goal 2) in Section IV,the proposed mapping does not introduce changes in thenumber of the attractors nor in the length of the attractorcycles, unless there are points of inconsistency that to-gether with their dual states are also attractor statesas well. In addition, the same discussion explains whythere will be very little change, in average, of the rela-tive sizes of the basins of the attractors. All of this to-gether with the observation that the number of the BNs,their selection probabilities, and the parameters andremain the same for the original and the reduced PBNimplies that, with the exception of the degenerate cases,where there are relatively large number of attractor statesthat are points of inconsistency and whose dual states arealso attractor states, the steady-state distribution of thereduced PBN produced by the proposed mapping willmatch closely the steady-state distribution of the original

PBN in the sense described in Section III. Moreover, thenew algorithm does not require any prior information forthe steady-state distribution of the original PBN as in thecase of the projection and the reduction mappings. Westudied DIRE performance on synthetically generateddata: constrained PBN generated using the algorithmproposed in [15] and randomly generated PBN. The re-sults of this study can be found on the complementaryweb site http://gsp.tamu.edu/Publications/dire.htm. Theresults support our conclusions in the following sense.First, none of the randomly generated PBNs had oneof the previously mentioned degenerate attractor struc-tures. Second, DIRE performed as expected on all of thesynthetically generated PBNs.

3) Finally, the notion of point of inconsistency creates theopportunity for evaluating the importance of a partic-ular gene for the network in question. This could leadto new methods for gene ranking, as well as optionsfor applying control-based optimization procedures forshifting the steady-state distribution of a given PBN intoa desired one: a goal that has immediate implications fortherapeutic intervention.

REFERENCES

[1] M. Bittner, P. Meltzer, Y. Chen, Y. Jiang, E. Seftor, M. Hendrix,M. Radmacher, R. Simon, Z. Yakhini, A. Ben-Dor, N. Sampas,E. Dougherty, E. Wang, F. Marincola, C. Gooden, J. Lueders, A.Glatfelter, P. Pollock, J. Carpten, E. Gillanders, D. Leja, K. Dietrich,C. Beaudry, M. Berens, D. Alberts, and V. Sondak, “Molecularclassification of cutaneous malignant melanoma by gene expressionprofiling,” Nature, vol. 406, no. 6795, pp. 536–540, 2000.

[2] A. Datta, A. Choudhary, M. Bittner, and E. Dougherty, “External con-trol in Markovian genetic regulatory networks,” Mach. Learn., vol. 52,no. 1–2, pp. 169–191, Jul. 2003.

[3] E. Dougherty and I. Shmulevich, “Mappings between probabilisticBoolean networks,” Signal Process., vol. 84, no. 4, pp. 799–809, 2003.

[4] H. de Jong, “Modeling and simulation of genetic regulatory systems:A literature review,” J. Comput. Bio., vol. 9, pp. 67–103, 2002.

[5] K. Glass and S. Kauffman, “The logical analysis of continuous, non-linear biochemical control networks,” Theoretical Bio., vol. 39, pp.103–129, 1973.

[6] I. Ivanov and E. Dougherty, “Reduction mappings between prob-abilistic Boolean networks,” in Proc. EURASIP JASP, 2004, pp.125–131.

[7] I. Ivanov and E. Dougherty, “Modeling genetic regulatory networks:Continuous or descrete?,” J. Bio. Syst., vol. 14, no. 2, pp. 219–229,2006.

[8] S. Kauffman, “Metabolic stability and epigenesist in randomly con-structed genetic nets,” J. Theor. Bio., vol. 22, pp. 437–467, 1969.

[9] S. Kauffman, “Homeostasis and differentiation in random genetic con-trol networks,” Nature, vol. 224, pp. 177–178, 1969.

[10] S. Kauffman, “The large scale structure and dynamics of geneticcontrol circuits: An ensemble approach,” J. Theor. Bio., vol. 24, pp.167–190, 1974.

[11] S. Kauffman, The Origins of Order: Self-Organization and Selectionin Evolution. New York: Oxford Univ. Press, 1993.

[12] S. Kim, H. Li, Y. Chen, N. Cao, E. R. Dougherty, M. L. Bittner, andE. B. Suh, “Can Markov chain models mimic biological regulation?,”J. Bio. Syst., vol. 10, pp. 337–357, 2002.

[13] H. Lähdesmäki, “Relationships between probabilistic Boolean net-works and dynamic Bayesian networks as models of gene regulatorynetworks,” presented at the Workshop Discrete Models Genetic Regu-latory Netw., College Station, TX, 2003.

[14] R. Pal, A. Datta, A. Fornace, M. Bittner, and E. Dougherty, “Booleanrelationships among genes responsive to ionizing radiation in the NCI60 ACDS,” Bioinformatics, vol. 21, no. 8, pp. 1542–1549, Apr. 2005.

[15] R. Pal, I. Ivanov, A. Datta, and E. Dougherty, “Generating Booleannetworks with a prescribed attractor structure,” Bioinformatics, vol. 21,no. 21, pp. 4021–4025, Nov. 2005.

2322 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

[16] I. Shmulevich, E. Dougherty, S. Kim, and W. Zhang, “ProbabilisticBoolean networks: A rule-based uncertainty model for gene regulatorynetworks,” Bioinformatics, vol. 18, pp. 261–274, 2002.

[17] I. Shmulevich, E. Dougherty, and W. Zhang, “From Boolean to prob-abilistic Boolean networks as models of genetic regulatory networks,”Proc. IEEE, vol. 90, no. 11, pp. 1778–1792, Nov. 2002.

[18] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bitner, and E. Dougherty,“A Bayesian connectivity-based approach to constructing proba-bilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp.2918–2927, Nov. 2004.

Ivan Ivanov received the Ph.D. degree in mathe-matics from the University of South Florida, Tampa.

He is currently an Assistant Professor in theDepartment of Veterinary Physiology and Pharma-cology, Texas A&M University, College Station. Hiscurrent research is focused on genomic signal pro-cessing, and in particular on modeling the genomicregulatory mechanisms and on mappings reducingthe complexity of the models of genomic regulatorynetworks.

Ranadip Pal received the B. Tech. degree in elec-tronics and electrical communication engineeringfrom the Indian Institute of Technology, Kharagpur,India, in 2002, and the M.S. degree in electricalengineering from Texas A&M University, CollegeStation, in 2004, where he is currently pursuing thePh.D. degree in electrical and computer engineering.

His research interests include computationalbiology, genomic signal processing, and control ofgenetic regulatory networks.

Edward R. Dougherty received the Ph.D. degree inmathematics from Rutgers University, Piscataway,NJ, and the M.S. degree in computer science fromStevens Institute of Technology, Hoboken, NJ.

He is currently a Professor in the Departmentof Electrical Engineering, and the Director of theGenomic Signal Processing Laboratory, Texas A&MUniversity, College Station, and Director of theComputational Biology Division of the TranslationalGenomics Research Institute, Phoenix, AZ. He hascontributed extensively to the statistical design of

nonlinear operators for image processing and the consequent application ofpattern recognition theory to nonlinear image processing. His current researchinterests focus on genomic signal processing, with the central goal being tomodel genomic regulatory mechanisms for the purposes of diagnosis andtherapy. He is author of 12 books, editor of five others, and author of morethan 190 journal papers. He has served as Editor of the Journal of ElectronicImaging for six years.

Dr. Dougherty was a recipient of the SPIE President’s Award. He is a fellowof the SPIE.