On designing multicore-aware simulators for systems biology endowed with on-line statistics

15
Research Article On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics Marco Aldinucci, 1 Cristina Calcagno, 1,2 Mario Coppo, 1 Ferruccio Damiani, 1 Maurizio Drocco, 1 Eva Sciacca, 1 Salvatore Spinella, 1 Massimo Torquati, 3 and Angelo Troina 1 1 Department of Computer Science, University of Torino, 10149 Torino, Italy 2 Department of Life Sciences and Systems Biology, University of Torino, 10123 Torino, Italy 3 Department of Computer Science, University of Pisa, 56127 Pisa, Italy Correspondence should be addressed to Angelo Troina; [email protected] Received 19 February 2014; Revised 16 May 2014; Accepted 18 May 2014; Published 22 June 2014 Academic Editor: Horacio P´ erez-S´ anchez Copyright © 2014 Marco Aldinucci et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. e paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. e simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. e exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the soſtware designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. 1. Introduction is paper presents a critical rethinking of the parallelization of biological computational tools in the light of multicore platforms, which nowadays equip all scientific laboratories. We will focus on the features that are required to derive an efficient simulator of stochastic processes considering, in particular, performance and easy engineering. is latter aspect will be of crucial importance for next generation of biological tools that will be largely designed by bioinformat- ics scientists, who are likely to be more interested in the accurate modeling of natural phenomena rather than on the synchronisation protocols required to build efficient tools on multicore platforms. e stochastic simulation of biological systems is an increasingly popular technique in bioinformatics as either an alternative or a complementary tool to ordinary differential equations (ODEs). is trend, starting from Gillespie’s sem- inal work [1], has been supported by a growing number of formalisms aiming to describe stochastic models of biological systems [27]. e stochastic modeling approach is computation- ally much more expensive than ODEs. Nevertheless, this approach is quite attractive for its superior ability to describe transient and multistable behaviors of biological systems. In particular it allows studying rare or divergent behaviors, spikes, and discriminate families of possible behaviors that Hindawi Publishing Corporation BioMed Research International Volume 2014, Article ID 207041, 14 pages http://dx.doi.org/10.1155/2014/207041

Transcript of On designing multicore-aware simulators for systems biology endowed with on-line statistics

Research ArticleOn Designing Multicore-Aware Simulators for Systems BiologyEndowed with OnLine Statistics

Marco Aldinucci1 Cristina Calcagno12 Mario Coppo1

Ferruccio Damiani1 Maurizio Drocco1 Eva Sciacca1 Salvatore Spinella1

Massimo Torquati3 and Angelo Troina1

1 Department of Computer Science University of Torino 10149 Torino Italy2 Department of Life Sciences and Systems Biology University of Torino 10123 Torino Italy3 Department of Computer Science University of Pisa 56127 Pisa Italy

Correspondence should be addressed to Angelo Troina troinadiunitoit

Received 19 February 2014 Revised 16 May 2014 Accepted 18 May 2014 Published 22 June 2014

Academic Editor Horacio Perez-Sanchez

Copyright copy 2014 Marco Aldinucci et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

The paper arguments are on enabling methodologies for the design of a fully parallel online interactive tool aiming to supportthe bioinformatics scientists In particular the features of these methodologies supported by the FastFlow parallel programmingframework are shown on a simulation tool to perform themodeling the tuning and the sensitivity analysis of stochastic biologicalmodels A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should beanalysed by statistic and datamining tools In the considered approach the two stages are pipelined in such away that the simulationstage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial resultThe simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biologicalsystems behavior on a multicore platform and representative proof-of-concept biological systems The exploited methodologiesinclude pattern-based parallel programming and data streaming that provide key features to the software designers such asperformance portability and efficient in-memory (big) data management and movement Two paradigmatic classes of biologicalsystems exhibiting multistable and oscillatory behavior are used as a testbed

1 Introduction

This paper presents a critical rethinking of the parallelizationof biological computational tools in the light of multicoreplatforms which nowadays equip all scientific laboratories

We will focus on the features that are required to derivean efficient simulator of stochastic processes consideringin particular performance and easy engineering This latteraspect will be of crucial importance for next generation ofbiological tools that will be largely designed by bioinformat-ics scientists who are likely to be more interested in theaccurate modeling of natural phenomena rather than on thesynchronisation protocols required to build efficient tools onmulticore platforms

The stochastic simulation of biological systems is anincreasingly popular technique in bioinformatics as either analternative or a complementary tool to ordinary differentialequations (ODEs) This trend starting from Gillespiersquos sem-inal work [1] has been supported by a growing number offormalisms aiming to describe stochasticmodels of biologicalsystems [2ndash7]

The stochastic modeling approach is computation-ally much more expensive than ODEs Nevertheless thisapproach is quite attractive for its superior ability to describetransient and multistable behaviors of biological systemsIn particular it allows studying rare or divergent behaviorsspikes and discriminate families of possible behaviors that

Hindawi Publishing CorporationBioMed Research InternationalVolume 2014 Article ID 207041 14 pageshttpdxdoiorg1011552014207041

2 BioMed Research International

are typically hidden in the averaged behavior described byODEs

The high computational cost of stochastic simulation canalso be very annoying in the tuning of biochemical models inwhich some quantitative parameters are unknown or scarcelyknown and need a set of tests to be fixed

This has led in the last two decades to a numberof attempts to accelerate them up using several kind oftechniques such as approximate simulation algorithms andparallel computing [8 9] In this work this latter approach istaken into account

Since stochastic simulations rely onMonte Carlomethodmany independent simulation instances should be computedand analysed to achieve statistically meaningful results Thecomputation of these independent instances has been tradi-tionally exploited in an embarrassingly parallel fashion exe-cuting a partition of the instances on different machinesThisapproach naturally couples with the distributed executionof a batch of tasks that require large infrastructures (eggrids and clouds) and suffers from slow time-to-solutionas each experiment requires to enqueue the simulations inshared environment deploy initial data simulate the modelgather results from a distributed environment postprocessthem (often sequentially) then eventually access resultsThisprocess is typically repeated several times to fine tune theinitial conditions and simulation parameters

This approach when transferred on multicore platformwhich nowadays equips the large majority of computingplatforms fails shortly to produce actual application speedupespecially for IO and memory-bound applications since allthe cores usually share the same memory and IO subsystemIndeed the simulation of biological systems produces a largeamount of data which can be regarded as streams of dataresulting from the on-going simulations The managementof these streams is not trivial on multicore platforms asthe memory bandwidth cannot usually sustain a continuousflux of data coming from all the cores at the same time Arelated aspect concerns the filtering and the analysis of rawresults which require the merging of data obtained fromdifferent simulation instances and possibly their statisticaldescription or mining with data reduction techniques Evenin a distributed computing environment this phase is oftendemoted to a secondary aspect in the computation andtreated with offline postprocessing tools frequently not evendisclosed in performance results

This approach is no longer practical especially on multi-core platforms because of a number of reasons

(i) the ever-increasing size of produced data burdens onthe main weaknesses of multicore platforms that ismemory bandwidth and core synchronisations

(ii) the ldquosequentialisationrdquo of simulation and analysisphases slow down the design-to-result process whichis particularly annoying during the tuning of thebiological model especially in the cases where thesimulation outcome could show very soon an incor-rect behavior

(iii) the design of the simulator is often specifically opti-mised for a specific parallel platform eithermulticoreor distributed (or not optimised at all)

Since the frequency and size of data moved across sim-ulation workflows strictly depend on the required accuracythe simulation and analysis of biological systems at high-precision happen to be a serious issue on modern shared-memory multicore platforms Indeed it involves the mergingof results from different simulation instances and possiblytheir statistical description or mining with data reductiontechniques

The design of simulation-analysis workflow encompass-ing a parallel simulator stage and a parallel data analysisstage is presented along with its experimental validationThe two stages are pipelined in such a way that at eachobserved simulation time 119905

119894 the simulation stage stream out

the partial results of all simulation trajectories (aligned at119905119894) to the analysis stage that immediately produces a partial

result Analysis stage which can be equipped with user-defined statistic and mining operators works on sliding datawindow and does not require keeping inmemory the full dataset with both performance and response time benefits

The advocated design methodology is validated on inter-esting classes of biological problems for which the classicalmodelization via ODE is quite problematic if not impossiblewhile a stochastic model can be more proficuous As we shallsee despite the whole simulation-analysis being performedon partial data that is on temporal sliding window ofsimulation results the dynamics of the system is effectivelyapproximated and can be shown to bioinformatics scientistduring a simulation

The technical challenges for the parallelisation of simu-lation and analysis stages and their pipelining are discussed(among these the exploitation of high-level pattern-basedparallel programming approaches to decrease design andimplementation time) Eventually the proposed approach canbe used as a fully reusable methodology that can be exploitedin the design or parallelisation of other tools for systemsbiology

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executing the sim-ulation and online analysis workflow onmulticore platformsIn this respect paradigmatic examples of two challengingclasses of biological systems that is bistablemultistable andoscillatory systems are discussed The key behavior of thesesystems are represented by way of different classes of onlineanalysis tools introduced in the previous section respectivelystatistical description clustering and frequency detection

To perform these experiments a simulator for the calcu-lus of wrapped compartments (CWC) built with the abovetechnology will be used as test-bed CWC [10] is a recentlyproposed formalism for the representation of biologicalsystems CWC extends the known stochastic simulators byadding a nested structure of labeled compartments delimitedby membranes However to better focus on the proposedmethodology and make the paper self-contained we willwrite all examples of the paper in the basic subset of CWCin which biochemical reactions are denoted in a standard

BioMed Research International 3

chemical notation We only remark that a distinguishedfeature of CWC which will be used in this paper is thepossibility of associating each reaction with an arbitrary ratefunction depending on the overall content of the ambientin which the reaction takes place This allows to tailor thereaction rates on the specific characteristics of the systemas for instance when representing nonlinear reactions asMichaelis-Menten kinetics

2 Methods

The methods for the stochastic simulation of biologicalsystems including Gillespiersquos algorithm [1] are typicallybased on theMonte Carlomethod An individual simulationwhich tracks the state of the system at each time-step iscalled trajectory Many thousands of independent trajectoriesmight be needed to get a representative picture of how thesystem behaves on the whole This behavior springs from thecollective analysis of trajectories which is typically carriedout by way of statistical or data mining estimators

Thanks to their independence the different instancesneeded to simulate a biological model that can be easilycomputed in an embarrassingly parallel fashion However thecomplete simulation workflow needed to derive simulationresult including additional phases such as the dispatchingand scheduling of simulations result gathering trajectorydata assembling and analysis phases which exhibit datadependencies (thus requires communications andor syn-chronisations) Often to simplify the design of the simulationtool these phases are neither parallelized nor considered inthe performance evaluation As a matter of a fact a parallelsimulation is often considered an ldquoembarrassingly parallelrdquoproblem whereas it is if data distribution gathering filteringand analysis are not considered as part of the whole simula-tion workflowThese phases often (questionably) consideredas preprocessing and postprocessing phases may result tobe as expensive as the simulation itself Moreover the fullsequentialization of the phases inhibits the early detection ofbadly tuned simulations and makes the model tuning spiralannoyingly slow

As an example a simulation of theHIV diffusion problem(computed using the StochKit toolkit for 4 years of simulationtime) may easily produce over 5GBytes of raw data perinstance [11] As clear the data size is 119899-folded when 119899

instances are considered Eventually this data should begathered and often reduced to a single trajectory via statisticalmethods or analysed with data mining methods that canbe much more time expensive to be figured out than barestatistical estimators

These potential performance flaws are further exacer-bated inmulticore andmany-core platformsThese platformsdo not exhibit the same degree of replication of hardwareresources that can be found in distributed environments andeven independent processes actually compete for the samehardware resources within the single platform such as mainand secondarymemory the performances of which representthe real challenge of the forthcoming parallel programmingmodels (aka memory wall problem) While simulation is

substantially a CPU-bound problem on distributed platformit may become prevalently an IO-bound problem on amulticore platform due to the need to store and postprocessmany trajectories The finer the observed simulation time-step the strongest the computational problem is characterizedas IO-bound

To be effective stochastic methods in systems biologyrequire many trajectories with a fine grain resolution inorder to make observable deviant trajectories peaks highvariance of results and multistable behaviors which oftenrepresent the real nature of the phenomena that is not wellcaptured by traditional approaches such as ODEs Theseevents are often not immediate to detect in the bulk of grosssimulation results Several techniques for analyzing such datafor example principal components analysis linear modelingand canonical correlation analysis have been proposed It canbe imagined that next generation software tools for naturalsciences should be able to perform this kind of processing inpipeline with the running data source as a partially or totallyonline process because

(i) it will be needed to manage an ever-increasingamount of experimental data either coming frommeasurement or simulation and

(ii) it will substantially improve the overall experimentalworkflow by providing the natural scientists with analmost real-time feedback enabling the early tuningor sweeping of the experimental parameters andthusscientific productivity

The flexibility given by the possibility of running manydifferent analysis modules in parallel is of particular interestas in many biological case studies the searched pattern inexperimental results is unknown and might require to trydifferent kinds of analysis since modules can be swept inparallel

The parallel analysis of the system dynamics (eg alongtime) is more challenging since online data processingrequires statistic and data mining operators to work onstreamed data and in general random access to data is guar-anteed only within a limited window of the whole datasetwhile already accessed data can be only stored in synthesizedform When data description techniques requiring to accessthe whole data set in random order cannot be used onlinedata description and mining requires novel algorithms Theextensive study of these algorithms is an emerging topic indata discovery community and is beyond the scope of thiswork

We advocate the parallelization of both simulations andanalysis by pipelining them in a two-stage workflow whereboth stages are also parallel We also advocate the high-leveldesign of the whole workflow to enhance both productivityand efficiency on different platforms (ie performance porta-bility)

21 A Parallel Simulation-Analysis Workflow for CWC Theintrinsic complexity in the parallelization of the single stephas traditionally led to the exploitation of parallelism in the

4 BioMed Research International

Simeng

Gat

her

Simeng

Disp

tach

Incomplete simulation tasks (with load balancing)

Farm

Alig

nmen

t of

traj

ecto

ries

Gen

erat

ion

of

simul

atio

n ta

sks Stat

eng

Gat

her

Disp

tach

Farm

Gen

erat

ion

ofsli

ding

win

dow

sof

traj

ecto

riesRaw

simulationresults Display of

results

graphicaluser

interface

MeanVariance

simulationresults

Filtered

stateng

Simulation pipeline Analysis pipeline

Start new simulations steer and terminate running simulations

Main pipeline

K-means

middot middot middot

middot middot middot

Figure 1 CWC simulator with online parallel analysis architecture

computation of independent instances of the same simula-tion which should anyway be computed to achieve statisticalconvergence of simulated trajectories (as in all Monte Carlomethods) The problem is well understood it has beenexploited in the last two decades inmany different flavors anddistributed computing environments from clusters to grid toclouds [8 11ndash16] Notwithstanding that the problem has beenoften approached either neglecting to consider the cost ofanalysis or assuming that output data has a negligible size thisis not likely to happen in this and next generations biologicalsimulations

The previous considerations have led to the design of asimulation tool that includes both parallel simulation anddata analysis in a single workflowThese phases are pipelinedrather than sequential To make it possible it is needed thatall the logical phases of the process (ie data distributionparallel simulation result gathering parallel trajectory dataassembling and analysis) can be effectively pipelined Thisimplies that all phases can effectively work on data streamsIdeally an efficient and portable implementation of thesimulation-analysis workflow should be able to representstreams as first-class concept and provide the designerswith high-level programming constructs to manage themefficiently (also in multicore platforms)

To date application programming for bioinformatics(and other sciences) has not embraced much more than low-level communication and synchronization libraries In thehierarchy of abstractions it is only slightly above togglingabsolute binary in the front panel of the machine Theadvent and the pervasiveness of multicore platforms arepushing parallel programming outside its historical nicheNext generation software should be designed not only to beefficient on these platforms but also to be developed andtuned with high-productivity and reduced time-to-marketThis is particularly important when parallel computing servesas a tool for other sciences since nonexpert designers shouldbe able to experiment different algorithmic solutions for bothsimulations and data analysis

Attempts to reduce the programming effort by raisingthe level of abstraction through the provision of parallelprogramming frameworks date back at least three decadesand have resulted in a number of significant contribu-tions Notable among these is the skeletal approach [17]

(aka pattern-based parallel programming) which appearsto become increasingly popular after being revamped byseveral successful parallel programming frameworks [18ndash20]Skeletons (aka patterns) capture common parallel program-ming paradigms (eg Map Reduce MapReduce PipelineFarm and DivideampConquer) and make them available to theprogrammer as high-level programming constructs equippedwith well-defined functional and extrafunctional semantics[21] Some of them are specifically designed tomanage streamas first-class objects such as pipeline farm (aka master-worker pattern) and loop patterns

A particularly efficient implementation of the describedpatterns is provided by the FastFlow parallel programmingframework [22] FastFlow is a C++ open-source templatelibrary aimed at simplifying the development of efficientapplications for multicore platforms FastFlow eases thedevelopment and guaranteed runtime efficiency by raisingthe abstraction level of the design phase thus providingdevelopers with a set of optimised parallel programmingpatterns [22 23] Run-time efficiency is mainly achievedby way of a lock-less implementation of patterns exhibitinga speed edge against other popular parallel programmingframeworks (also see performance comparisons in [24])The FastFlow implementation of the pipeline loop andfarm patterns is exploited to design the CWC simulationworkflow The effectiveness of the FastFlow framework inhigh-frequency synchronizations (ie fine-grained tasks) ata high-level of abstraction is the key features to devisea portable and efficient implementation of the simulationworkflow which is CWC simulator with online parallelanalysis architecture composed by a three-stage pipelinesimulation analysis and display of results The former twostages are in turn pipelines of other stages whereas the displayof results is realized by way of a graphical user interface(GUI) The big picture of the simulation workflow is shownin Figure 1 In the picture all the grey boxes as well as all thecode needed for synchronization anddata streaming (double-headed arrows) are automatically generated by the FastFlowframework The implementation of the whole software actu-ally consists in declaring the structure of the workflow interms of FastFlow objects (ie farm and pipelines) and filling

BioMed Research International 5

white boxes with sequential code All data is passed by usingtheir references in such a way that no data copies in memoryare needed

211 The Simulation Pipeline The simulation pipeline asshown in Figure 1 is composed by three main stages (i)generation of simulation tasks (ii) farm of simulation enginesand (iii) alignment of trajectories

The input of the simulation pipeline (from the GUI ora file) is the model to be simulated and the parameters ofthe simulation The output is a stream of simulation resultseach of them being an array holding a point for each ofthe trajectories of all (independent) simulations aligned at agiven simulation time Actually each array represents a cutat a given simulation time of the whole dataset of resultsThis does not necessarily represent the current status (at agiven point in wall-clock time) of all running simulationsBy their very nature stochastic processes exhibit an irregularbehavior in space and time since different simulations maycover the same simulation timespan followingmanydifferentrandomly chosen paths and number of iterations Thereforeparallelization tools should support the dynamic and activebalancing of workload across the involved cores This mainlymotivates the structure of the simulation pipeline The firststage generates a number of independent simulation taskseach of them wrapped in a C++ object These objects arepassed to the farm of simulation engines which dispatchthem (on-demand) to a number of simulation engines (simeng) Each simulation engine brings forward a simulationfor a given simulation time (simulation quantum) thenreschedules back the simulation along the feedback channelSimulation results produced in this quantum are streamedtoward the next stage which sorts out all the received resultsand aligns them according to simulation time Once allsimulation tasks overcome a given simulation time an arrayof results is produced and streamed to the analysis pipeline

In this process the farm schedule prioritizes ldquoslowrdquosimulation tasks in such a way that the simulation tasksproceed with simulation times the more aligned as possibleThis solves the load balancing problem by keeping all sim-ulation engines always busy and reduces to the minimumthe transient storage of incomplete results thus reducing theshared memory traffic

212 The Analysis Pipeline By design each snapshot at agiven simulation time of all simulation trajectories (ie anarray of simulation results) can be analyzed immediatelyand independently (thus concurrently) on each other Forexample the mean and variance (as well as other statisticalestimators) can be immediately computed and streamed outto the display stage More complex analyses that is theones aimed to understand system dynamics have furtherrequirements In the most general case they require theaccess to the whole dataset Unfortunately this can be hardlydone with a fully online process In many cases howeverit is possible to derive reasonable approximations of theseanalyses from a sliding window of the whole dataset (egfor trajectory clustering) For this the stream incoming in

the analysis pipeline is passed through a stage that creates astream of (partially overlapping) sliding windows of trajec-tories cuts Each sliding window can be eventually processedin parallel and therefore is dispatched to a farm of statisticengines Results are collected and reordered (ie gathered)and streamed toward the user interface and the permanentstorage

The analysis pipeline is provided with three families ofpredefined estimators covering the most common instru-ments of statistical analyses Thanks to high-level modulardesign of the simulation pipeline it can be easily extendedwith new filters Current filters included in the system are asfollows

(1) Mean Standard Deviation and Quantiles These standardstatistical estimators are typically used to evaluate bothqualitatively and quantitatively the behavior of stable systemsand the reliability of the stochastic models used for theirsimulation Quantiles are also often useful to approximatethe distribution of simulation trajectories over time as it per-forms a histogram which summarizes the involved quantitieswithout the effects of long-tailored asymmetric distributionor outliers In fact in those cases descriptive statistics couldnot underline a central tendency

(2) Trajectory Clustering The clustering of trajectories helpsthe analysis of biological systems exhibiting a multistablebehavior Each cluster can automatically separate and dis-tinguish different cases which can be eventually analyzed bystatistical estimators Concentrations of elements in a giveninstant from all simulations are numerically filtered fromstochastic noise and the global trends are extrapolated fromclusters In this work we employed two clustering techniquesK-means [25] and quality threshold (QT) [26] clusteringThe clustering procedure collects the filtered data containedinto a sliding time window Δ

119882centered in the current

data point 119909119894

equiv 119891(119905119894) where 119905

119894equiv 1199050

+ 119894Δ119878(where Δ

119878

is a constant sampling time) for all simulation trajectoriesand the extrapolated forecast point 119909119864

119894 referred to the local

trend using the information of the Savitzky-Golay filter[27] that is a low-pass filter suitable for data smoothingThe main idea underneath Savitzky-Golay filtering is to findfilter coefficients 119888

119899that preserve higher moments that is

to approximate the underlying function within the movingwindow not by a constant (whose estimate is the average)but by a polynomial of higher order This schema also allowsthe computation of numerical derivatives considering thecoefficient of the derived polynomial

(3) PeakFrequency Detection Many processes in livingorganisms are oscillatory For these kinds of systems theanalysis must be focused on the recurrence of phenomenafor instance concentration spikes or peaks of biologicalquantities which also make it possible to determine thefrequency of occurrence of a given phenomenon The peakdetection is basically performed by way of the analysis ofthe local maximum in a continuous curve which is inturn detected through the analysis of the derivatives of thecurve estimated by the Savitzky-Golay filter From the period

6 BioMed Research International

Figure 2 Screenshots of the simulation tool interface

between successive peaks the frequency of the related eventis then inferred

The usage examples of each family of filters will bediscussed in the ldquoresultsrdquo section

213 The Graphical User Interface The CWC simulation-analysis pipeline is wrapped in a back-end tool that can besteered either via a command line tool or via a graphicaluser interface This makes it possible to design the biologicalmodel to run simulations and analysis and to view partialresults during the run Also the front-end makes it possibleto control the simulation workflow from a remote machineTwo screenshots of the graphical front-end are reported inFigure 2

3 Experiments and Results

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executingthe simulation and online analysis workflow on multicoreplatforms In this respect paradigmatic examples of twochallenging classes of biological systems are discussed thatis bistablemultistable and oscillatory systems The keybehavior of these systems is studied by way of the differentclasses of online analysis tools introduced in the previoussection in particular statistical description clustering andpeakfrequency detection

31 Expressivity and Effectiveness

311 Multistable Biological System (Schlogl Model) One ofthe most studied examples of bistability is the Schlogl model[28] The simplicity of this network makes it an idealprototype to show the effectiveness of the online clusteringtechniques on the filtered trajectories in the presence ofbimodalityThe set of reaction rules modeling this system are

119860 119860003

997891997888rarr 119860 119860 119860 119860 119860 11986000001

997891997888rarr 119860 119860

119861200

997891997888rarr 119861 119860 11986035

997891997888rarr ∙

(1)

where the rules are decoratedwith the kinetic constants of thecorresponding reactions and all reaction rates are evaluatedaccording to the mass action law

The number of molecules of the species 119861 is kept constant(buffered) while at equilibrium the species 119860 displays anoise-induced switching between the two stable steady states(see Figure 3) This case is paradigmatic to show that simplemean and standard deviation are not significant to summa-rize the overall behavior and the mean is not representativeof any simulation trajectory

Figure 3 shows the resulting clusters computed onlineusing K-means on the Schlogl model for species 119860 over 480

stochastic simulations starting with the term 119861 250 lowast 119860 Thelines width of theK-means plot is proportional to each clustersize

312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863

+1198621198682 leads the lysogenic

stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863

minus1198621198682 The notation 119863

+1198621198682119863

minus1198621198682 models the

phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

2 BioMed Research International

are typically hidden in the averaged behavior described byODEs

The high computational cost of stochastic simulation canalso be very annoying in the tuning of biochemical models inwhich some quantitative parameters are unknown or scarcelyknown and need a set of tests to be fixed

This has led in the last two decades to a numberof attempts to accelerate them up using several kind oftechniques such as approximate simulation algorithms andparallel computing [8 9] In this work this latter approach istaken into account

Since stochastic simulations rely onMonte Carlomethodmany independent simulation instances should be computedand analysed to achieve statistically meaningful results Thecomputation of these independent instances has been tradi-tionally exploited in an embarrassingly parallel fashion exe-cuting a partition of the instances on different machinesThisapproach naturally couples with the distributed executionof a batch of tasks that require large infrastructures (eggrids and clouds) and suffers from slow time-to-solutionas each experiment requires to enqueue the simulations inshared environment deploy initial data simulate the modelgather results from a distributed environment postprocessthem (often sequentially) then eventually access resultsThisprocess is typically repeated several times to fine tune theinitial conditions and simulation parameters

This approach when transferred on multicore platformwhich nowadays equips the large majority of computingplatforms fails shortly to produce actual application speedupespecially for IO and memory-bound applications since allthe cores usually share the same memory and IO subsystemIndeed the simulation of biological systems produces a largeamount of data which can be regarded as streams of dataresulting from the on-going simulations The managementof these streams is not trivial on multicore platforms asthe memory bandwidth cannot usually sustain a continuousflux of data coming from all the cores at the same time Arelated aspect concerns the filtering and the analysis of rawresults which require the merging of data obtained fromdifferent simulation instances and possibly their statisticaldescription or mining with data reduction techniques Evenin a distributed computing environment this phase is oftendemoted to a secondary aspect in the computation andtreated with offline postprocessing tools frequently not evendisclosed in performance results

This approach is no longer practical especially on multi-core platforms because of a number of reasons

(i) the ever-increasing size of produced data burdens onthe main weaknesses of multicore platforms that ismemory bandwidth and core synchronisations

(ii) the ldquosequentialisationrdquo of simulation and analysisphases slow down the design-to-result process whichis particularly annoying during the tuning of thebiological model especially in the cases where thesimulation outcome could show very soon an incor-rect behavior

(iii) the design of the simulator is often specifically opti-mised for a specific parallel platform eithermulticoreor distributed (or not optimised at all)

Since the frequency and size of data moved across sim-ulation workflows strictly depend on the required accuracythe simulation and analysis of biological systems at high-precision happen to be a serious issue on modern shared-memory multicore platforms Indeed it involves the mergingof results from different simulation instances and possiblytheir statistical description or mining with data reductiontechniques

The design of simulation-analysis workflow encompass-ing a parallel simulator stage and a parallel data analysisstage is presented along with its experimental validationThe two stages are pipelined in such a way that at eachobserved simulation time 119905

119894 the simulation stage stream out

the partial results of all simulation trajectories (aligned at119905119894) to the analysis stage that immediately produces a partial

result Analysis stage which can be equipped with user-defined statistic and mining operators works on sliding datawindow and does not require keeping inmemory the full dataset with both performance and response time benefits

The advocated design methodology is validated on inter-esting classes of biological problems for which the classicalmodelization via ODE is quite problematic if not impossiblewhile a stochastic model can be more proficuous As we shallsee despite the whole simulation-analysis being performedon partial data that is on temporal sliding window ofsimulation results the dynamics of the system is effectivelyapproximated and can be shown to bioinformatics scientistduring a simulation

The technical challenges for the parallelisation of simu-lation and analysis stages and their pipelining are discussed(among these the exploitation of high-level pattern-basedparallel programming approaches to decrease design andimplementation time) Eventually the proposed approach canbe used as a fully reusable methodology that can be exploitedin the design or parallelisation of other tools for systemsbiology

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executing the sim-ulation and online analysis workflow onmulticore platformsIn this respect paradigmatic examples of two challengingclasses of biological systems that is bistablemultistable andoscillatory systems are discussed The key behavior of thesesystems are represented by way of different classes of onlineanalysis tools introduced in the previous section respectivelystatistical description clustering and frequency detection

To perform these experiments a simulator for the calcu-lus of wrapped compartments (CWC) built with the abovetechnology will be used as test-bed CWC [10] is a recentlyproposed formalism for the representation of biologicalsystems CWC extends the known stochastic simulators byadding a nested structure of labeled compartments delimitedby membranes However to better focus on the proposedmethodology and make the paper self-contained we willwrite all examples of the paper in the basic subset of CWCin which biochemical reactions are denoted in a standard

BioMed Research International 3

chemical notation We only remark that a distinguishedfeature of CWC which will be used in this paper is thepossibility of associating each reaction with an arbitrary ratefunction depending on the overall content of the ambientin which the reaction takes place This allows to tailor thereaction rates on the specific characteristics of the systemas for instance when representing nonlinear reactions asMichaelis-Menten kinetics

2 Methods

The methods for the stochastic simulation of biologicalsystems including Gillespiersquos algorithm [1] are typicallybased on theMonte Carlomethod An individual simulationwhich tracks the state of the system at each time-step iscalled trajectory Many thousands of independent trajectoriesmight be needed to get a representative picture of how thesystem behaves on the whole This behavior springs from thecollective analysis of trajectories which is typically carriedout by way of statistical or data mining estimators

Thanks to their independence the different instancesneeded to simulate a biological model that can be easilycomputed in an embarrassingly parallel fashion However thecomplete simulation workflow needed to derive simulationresult including additional phases such as the dispatchingand scheduling of simulations result gathering trajectorydata assembling and analysis phases which exhibit datadependencies (thus requires communications andor syn-chronisations) Often to simplify the design of the simulationtool these phases are neither parallelized nor considered inthe performance evaluation As a matter of a fact a parallelsimulation is often considered an ldquoembarrassingly parallelrdquoproblem whereas it is if data distribution gathering filteringand analysis are not considered as part of the whole simula-tion workflowThese phases often (questionably) consideredas preprocessing and postprocessing phases may result tobe as expensive as the simulation itself Moreover the fullsequentialization of the phases inhibits the early detection ofbadly tuned simulations and makes the model tuning spiralannoyingly slow

As an example a simulation of theHIV diffusion problem(computed using the StochKit toolkit for 4 years of simulationtime) may easily produce over 5GBytes of raw data perinstance [11] As clear the data size is 119899-folded when 119899

instances are considered Eventually this data should begathered and often reduced to a single trajectory via statisticalmethods or analysed with data mining methods that canbe much more time expensive to be figured out than barestatistical estimators

These potential performance flaws are further exacer-bated inmulticore andmany-core platformsThese platformsdo not exhibit the same degree of replication of hardwareresources that can be found in distributed environments andeven independent processes actually compete for the samehardware resources within the single platform such as mainand secondarymemory the performances of which representthe real challenge of the forthcoming parallel programmingmodels (aka memory wall problem) While simulation is

substantially a CPU-bound problem on distributed platformit may become prevalently an IO-bound problem on amulticore platform due to the need to store and postprocessmany trajectories The finer the observed simulation time-step the strongest the computational problem is characterizedas IO-bound

To be effective stochastic methods in systems biologyrequire many trajectories with a fine grain resolution inorder to make observable deviant trajectories peaks highvariance of results and multistable behaviors which oftenrepresent the real nature of the phenomena that is not wellcaptured by traditional approaches such as ODEs Theseevents are often not immediate to detect in the bulk of grosssimulation results Several techniques for analyzing such datafor example principal components analysis linear modelingand canonical correlation analysis have been proposed It canbe imagined that next generation software tools for naturalsciences should be able to perform this kind of processing inpipeline with the running data source as a partially or totallyonline process because

(i) it will be needed to manage an ever-increasingamount of experimental data either coming frommeasurement or simulation and

(ii) it will substantially improve the overall experimentalworkflow by providing the natural scientists with analmost real-time feedback enabling the early tuningor sweeping of the experimental parameters andthusscientific productivity

The flexibility given by the possibility of running manydifferent analysis modules in parallel is of particular interestas in many biological case studies the searched pattern inexperimental results is unknown and might require to trydifferent kinds of analysis since modules can be swept inparallel

The parallel analysis of the system dynamics (eg alongtime) is more challenging since online data processingrequires statistic and data mining operators to work onstreamed data and in general random access to data is guar-anteed only within a limited window of the whole datasetwhile already accessed data can be only stored in synthesizedform When data description techniques requiring to accessthe whole data set in random order cannot be used onlinedata description and mining requires novel algorithms Theextensive study of these algorithms is an emerging topic indata discovery community and is beyond the scope of thiswork

We advocate the parallelization of both simulations andanalysis by pipelining them in a two-stage workflow whereboth stages are also parallel We also advocate the high-leveldesign of the whole workflow to enhance both productivityand efficiency on different platforms (ie performance porta-bility)

21 A Parallel Simulation-Analysis Workflow for CWC Theintrinsic complexity in the parallelization of the single stephas traditionally led to the exploitation of parallelism in the

4 BioMed Research International

Simeng

Gat

her

Simeng

Disp

tach

Incomplete simulation tasks (with load balancing)

Farm

Alig

nmen

t of

traj

ecto

ries

Gen

erat

ion

of

simul

atio

n ta

sks Stat

eng

Gat

her

Disp

tach

Farm

Gen

erat

ion

ofsli

ding

win

dow

sof

traj

ecto

riesRaw

simulationresults Display of

results

graphicaluser

interface

MeanVariance

simulationresults

Filtered

stateng

Simulation pipeline Analysis pipeline

Start new simulations steer and terminate running simulations

Main pipeline

K-means

middot middot middot

middot middot middot

Figure 1 CWC simulator with online parallel analysis architecture

computation of independent instances of the same simula-tion which should anyway be computed to achieve statisticalconvergence of simulated trajectories (as in all Monte Carlomethods) The problem is well understood it has beenexploited in the last two decades inmany different flavors anddistributed computing environments from clusters to grid toclouds [8 11ndash16] Notwithstanding that the problem has beenoften approached either neglecting to consider the cost ofanalysis or assuming that output data has a negligible size thisis not likely to happen in this and next generations biologicalsimulations

The previous considerations have led to the design of asimulation tool that includes both parallel simulation anddata analysis in a single workflowThese phases are pipelinedrather than sequential To make it possible it is needed thatall the logical phases of the process (ie data distributionparallel simulation result gathering parallel trajectory dataassembling and analysis) can be effectively pipelined Thisimplies that all phases can effectively work on data streamsIdeally an efficient and portable implementation of thesimulation-analysis workflow should be able to representstreams as first-class concept and provide the designerswith high-level programming constructs to manage themefficiently (also in multicore platforms)

To date application programming for bioinformatics(and other sciences) has not embraced much more than low-level communication and synchronization libraries In thehierarchy of abstractions it is only slightly above togglingabsolute binary in the front panel of the machine Theadvent and the pervasiveness of multicore platforms arepushing parallel programming outside its historical nicheNext generation software should be designed not only to beefficient on these platforms but also to be developed andtuned with high-productivity and reduced time-to-marketThis is particularly important when parallel computing servesas a tool for other sciences since nonexpert designers shouldbe able to experiment different algorithmic solutions for bothsimulations and data analysis

Attempts to reduce the programming effort by raisingthe level of abstraction through the provision of parallelprogramming frameworks date back at least three decadesand have resulted in a number of significant contribu-tions Notable among these is the skeletal approach [17]

(aka pattern-based parallel programming) which appearsto become increasingly popular after being revamped byseveral successful parallel programming frameworks [18ndash20]Skeletons (aka patterns) capture common parallel program-ming paradigms (eg Map Reduce MapReduce PipelineFarm and DivideampConquer) and make them available to theprogrammer as high-level programming constructs equippedwith well-defined functional and extrafunctional semantics[21] Some of them are specifically designed tomanage streamas first-class objects such as pipeline farm (aka master-worker pattern) and loop patterns

A particularly efficient implementation of the describedpatterns is provided by the FastFlow parallel programmingframework [22] FastFlow is a C++ open-source templatelibrary aimed at simplifying the development of efficientapplications for multicore platforms FastFlow eases thedevelopment and guaranteed runtime efficiency by raisingthe abstraction level of the design phase thus providingdevelopers with a set of optimised parallel programmingpatterns [22 23] Run-time efficiency is mainly achievedby way of a lock-less implementation of patterns exhibitinga speed edge against other popular parallel programmingframeworks (also see performance comparisons in [24])The FastFlow implementation of the pipeline loop andfarm patterns is exploited to design the CWC simulationworkflow The effectiveness of the FastFlow framework inhigh-frequency synchronizations (ie fine-grained tasks) ata high-level of abstraction is the key features to devisea portable and efficient implementation of the simulationworkflow which is CWC simulator with online parallelanalysis architecture composed by a three-stage pipelinesimulation analysis and display of results The former twostages are in turn pipelines of other stages whereas the displayof results is realized by way of a graphical user interface(GUI) The big picture of the simulation workflow is shownin Figure 1 In the picture all the grey boxes as well as all thecode needed for synchronization anddata streaming (double-headed arrows) are automatically generated by the FastFlowframework The implementation of the whole software actu-ally consists in declaring the structure of the workflow interms of FastFlow objects (ie farm and pipelines) and filling

BioMed Research International 5

white boxes with sequential code All data is passed by usingtheir references in such a way that no data copies in memoryare needed

211 The Simulation Pipeline The simulation pipeline asshown in Figure 1 is composed by three main stages (i)generation of simulation tasks (ii) farm of simulation enginesand (iii) alignment of trajectories

The input of the simulation pipeline (from the GUI ora file) is the model to be simulated and the parameters ofthe simulation The output is a stream of simulation resultseach of them being an array holding a point for each ofthe trajectories of all (independent) simulations aligned at agiven simulation time Actually each array represents a cutat a given simulation time of the whole dataset of resultsThis does not necessarily represent the current status (at agiven point in wall-clock time) of all running simulationsBy their very nature stochastic processes exhibit an irregularbehavior in space and time since different simulations maycover the same simulation timespan followingmanydifferentrandomly chosen paths and number of iterations Thereforeparallelization tools should support the dynamic and activebalancing of workload across the involved cores This mainlymotivates the structure of the simulation pipeline The firststage generates a number of independent simulation taskseach of them wrapped in a C++ object These objects arepassed to the farm of simulation engines which dispatchthem (on-demand) to a number of simulation engines (simeng) Each simulation engine brings forward a simulationfor a given simulation time (simulation quantum) thenreschedules back the simulation along the feedback channelSimulation results produced in this quantum are streamedtoward the next stage which sorts out all the received resultsand aligns them according to simulation time Once allsimulation tasks overcome a given simulation time an arrayof results is produced and streamed to the analysis pipeline

In this process the farm schedule prioritizes ldquoslowrdquosimulation tasks in such a way that the simulation tasksproceed with simulation times the more aligned as possibleThis solves the load balancing problem by keeping all sim-ulation engines always busy and reduces to the minimumthe transient storage of incomplete results thus reducing theshared memory traffic

212 The Analysis Pipeline By design each snapshot at agiven simulation time of all simulation trajectories (ie anarray of simulation results) can be analyzed immediatelyand independently (thus concurrently) on each other Forexample the mean and variance (as well as other statisticalestimators) can be immediately computed and streamed outto the display stage More complex analyses that is theones aimed to understand system dynamics have furtherrequirements In the most general case they require theaccess to the whole dataset Unfortunately this can be hardlydone with a fully online process In many cases howeverit is possible to derive reasonable approximations of theseanalyses from a sliding window of the whole dataset (egfor trajectory clustering) For this the stream incoming in

the analysis pipeline is passed through a stage that creates astream of (partially overlapping) sliding windows of trajec-tories cuts Each sliding window can be eventually processedin parallel and therefore is dispatched to a farm of statisticengines Results are collected and reordered (ie gathered)and streamed toward the user interface and the permanentstorage

The analysis pipeline is provided with three families ofpredefined estimators covering the most common instru-ments of statistical analyses Thanks to high-level modulardesign of the simulation pipeline it can be easily extendedwith new filters Current filters included in the system are asfollows

(1) Mean Standard Deviation and Quantiles These standardstatistical estimators are typically used to evaluate bothqualitatively and quantitatively the behavior of stable systemsand the reliability of the stochastic models used for theirsimulation Quantiles are also often useful to approximatethe distribution of simulation trajectories over time as it per-forms a histogram which summarizes the involved quantitieswithout the effects of long-tailored asymmetric distributionor outliers In fact in those cases descriptive statistics couldnot underline a central tendency

(2) Trajectory Clustering The clustering of trajectories helpsthe analysis of biological systems exhibiting a multistablebehavior Each cluster can automatically separate and dis-tinguish different cases which can be eventually analyzed bystatistical estimators Concentrations of elements in a giveninstant from all simulations are numerically filtered fromstochastic noise and the global trends are extrapolated fromclusters In this work we employed two clustering techniquesK-means [25] and quality threshold (QT) [26] clusteringThe clustering procedure collects the filtered data containedinto a sliding time window Δ

119882centered in the current

data point 119909119894

equiv 119891(119905119894) where 119905

119894equiv 1199050

+ 119894Δ119878(where Δ

119878

is a constant sampling time) for all simulation trajectoriesand the extrapolated forecast point 119909119864

119894 referred to the local

trend using the information of the Savitzky-Golay filter[27] that is a low-pass filter suitable for data smoothingThe main idea underneath Savitzky-Golay filtering is to findfilter coefficients 119888

119899that preserve higher moments that is

to approximate the underlying function within the movingwindow not by a constant (whose estimate is the average)but by a polynomial of higher order This schema also allowsthe computation of numerical derivatives considering thecoefficient of the derived polynomial

(3) PeakFrequency Detection Many processes in livingorganisms are oscillatory For these kinds of systems theanalysis must be focused on the recurrence of phenomenafor instance concentration spikes or peaks of biologicalquantities which also make it possible to determine thefrequency of occurrence of a given phenomenon The peakdetection is basically performed by way of the analysis ofthe local maximum in a continuous curve which is inturn detected through the analysis of the derivatives of thecurve estimated by the Savitzky-Golay filter From the period

6 BioMed Research International

Figure 2 Screenshots of the simulation tool interface

between successive peaks the frequency of the related eventis then inferred

The usage examples of each family of filters will bediscussed in the ldquoresultsrdquo section

213 The Graphical User Interface The CWC simulation-analysis pipeline is wrapped in a back-end tool that can besteered either via a command line tool or via a graphicaluser interface This makes it possible to design the biologicalmodel to run simulations and analysis and to view partialresults during the run Also the front-end makes it possibleto control the simulation workflow from a remote machineTwo screenshots of the graphical front-end are reported inFigure 2

3 Experiments and Results

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executingthe simulation and online analysis workflow on multicoreplatforms In this respect paradigmatic examples of twochallenging classes of biological systems are discussed thatis bistablemultistable and oscillatory systems The keybehavior of these systems is studied by way of the differentclasses of online analysis tools introduced in the previoussection in particular statistical description clustering andpeakfrequency detection

31 Expressivity and Effectiveness

311 Multistable Biological System (Schlogl Model) One ofthe most studied examples of bistability is the Schlogl model[28] The simplicity of this network makes it an idealprototype to show the effectiveness of the online clusteringtechniques on the filtered trajectories in the presence ofbimodalityThe set of reaction rules modeling this system are

119860 119860003

997891997888rarr 119860 119860 119860 119860 119860 11986000001

997891997888rarr 119860 119860

119861200

997891997888rarr 119861 119860 11986035

997891997888rarr ∙

(1)

where the rules are decoratedwith the kinetic constants of thecorresponding reactions and all reaction rates are evaluatedaccording to the mass action law

The number of molecules of the species 119861 is kept constant(buffered) while at equilibrium the species 119860 displays anoise-induced switching between the two stable steady states(see Figure 3) This case is paradigmatic to show that simplemean and standard deviation are not significant to summa-rize the overall behavior and the mean is not representativeof any simulation trajectory

Figure 3 shows the resulting clusters computed onlineusing K-means on the Schlogl model for species 119860 over 480

stochastic simulations starting with the term 119861 250 lowast 119860 Thelines width of theK-means plot is proportional to each clustersize

312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863

+1198621198682 leads the lysogenic

stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863

minus1198621198682 The notation 119863

+1198621198682119863

minus1198621198682 models the

phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 3

chemical notation We only remark that a distinguishedfeature of CWC which will be used in this paper is thepossibility of associating each reaction with an arbitrary ratefunction depending on the overall content of the ambientin which the reaction takes place This allows to tailor thereaction rates on the specific characteristics of the systemas for instance when representing nonlinear reactions asMichaelis-Menten kinetics

2 Methods

The methods for the stochastic simulation of biologicalsystems including Gillespiersquos algorithm [1] are typicallybased on theMonte Carlomethod An individual simulationwhich tracks the state of the system at each time-step iscalled trajectory Many thousands of independent trajectoriesmight be needed to get a representative picture of how thesystem behaves on the whole This behavior springs from thecollective analysis of trajectories which is typically carriedout by way of statistical or data mining estimators

Thanks to their independence the different instancesneeded to simulate a biological model that can be easilycomputed in an embarrassingly parallel fashion However thecomplete simulation workflow needed to derive simulationresult including additional phases such as the dispatchingand scheduling of simulations result gathering trajectorydata assembling and analysis phases which exhibit datadependencies (thus requires communications andor syn-chronisations) Often to simplify the design of the simulationtool these phases are neither parallelized nor considered inthe performance evaluation As a matter of a fact a parallelsimulation is often considered an ldquoembarrassingly parallelrdquoproblem whereas it is if data distribution gathering filteringand analysis are not considered as part of the whole simula-tion workflowThese phases often (questionably) consideredas preprocessing and postprocessing phases may result tobe as expensive as the simulation itself Moreover the fullsequentialization of the phases inhibits the early detection ofbadly tuned simulations and makes the model tuning spiralannoyingly slow

As an example a simulation of theHIV diffusion problem(computed using the StochKit toolkit for 4 years of simulationtime) may easily produce over 5GBytes of raw data perinstance [11] As clear the data size is 119899-folded when 119899

instances are considered Eventually this data should begathered and often reduced to a single trajectory via statisticalmethods or analysed with data mining methods that canbe much more time expensive to be figured out than barestatistical estimators

These potential performance flaws are further exacer-bated inmulticore andmany-core platformsThese platformsdo not exhibit the same degree of replication of hardwareresources that can be found in distributed environments andeven independent processes actually compete for the samehardware resources within the single platform such as mainand secondarymemory the performances of which representthe real challenge of the forthcoming parallel programmingmodels (aka memory wall problem) While simulation is

substantially a CPU-bound problem on distributed platformit may become prevalently an IO-bound problem on amulticore platform due to the need to store and postprocessmany trajectories The finer the observed simulation time-step the strongest the computational problem is characterizedas IO-bound

To be effective stochastic methods in systems biologyrequire many trajectories with a fine grain resolution inorder to make observable deviant trajectories peaks highvariance of results and multistable behaviors which oftenrepresent the real nature of the phenomena that is not wellcaptured by traditional approaches such as ODEs Theseevents are often not immediate to detect in the bulk of grosssimulation results Several techniques for analyzing such datafor example principal components analysis linear modelingand canonical correlation analysis have been proposed It canbe imagined that next generation software tools for naturalsciences should be able to perform this kind of processing inpipeline with the running data source as a partially or totallyonline process because

(i) it will be needed to manage an ever-increasingamount of experimental data either coming frommeasurement or simulation and

(ii) it will substantially improve the overall experimentalworkflow by providing the natural scientists with analmost real-time feedback enabling the early tuningor sweeping of the experimental parameters andthusscientific productivity

The flexibility given by the possibility of running manydifferent analysis modules in parallel is of particular interestas in many biological case studies the searched pattern inexperimental results is unknown and might require to trydifferent kinds of analysis since modules can be swept inparallel

The parallel analysis of the system dynamics (eg alongtime) is more challenging since online data processingrequires statistic and data mining operators to work onstreamed data and in general random access to data is guar-anteed only within a limited window of the whole datasetwhile already accessed data can be only stored in synthesizedform When data description techniques requiring to accessthe whole data set in random order cannot be used onlinedata description and mining requires novel algorithms Theextensive study of these algorithms is an emerging topic indata discovery community and is beyond the scope of thiswork

We advocate the parallelization of both simulations andanalysis by pipelining them in a two-stage workflow whereboth stages are also parallel We also advocate the high-leveldesign of the whole workflow to enhance both productivityand efficiency on different platforms (ie performance porta-bility)

21 A Parallel Simulation-Analysis Workflow for CWC Theintrinsic complexity in the parallelization of the single stephas traditionally led to the exploitation of parallelism in the

4 BioMed Research International

Simeng

Gat

her

Simeng

Disp

tach

Incomplete simulation tasks (with load balancing)

Farm

Alig

nmen

t of

traj

ecto

ries

Gen

erat

ion

of

simul

atio

n ta

sks Stat

eng

Gat

her

Disp

tach

Farm

Gen

erat

ion

ofsli

ding

win

dow

sof

traj

ecto

riesRaw

simulationresults Display of

results

graphicaluser

interface

MeanVariance

simulationresults

Filtered

stateng

Simulation pipeline Analysis pipeline

Start new simulations steer and terminate running simulations

Main pipeline

K-means

middot middot middot

middot middot middot

Figure 1 CWC simulator with online parallel analysis architecture

computation of independent instances of the same simula-tion which should anyway be computed to achieve statisticalconvergence of simulated trajectories (as in all Monte Carlomethods) The problem is well understood it has beenexploited in the last two decades inmany different flavors anddistributed computing environments from clusters to grid toclouds [8 11ndash16] Notwithstanding that the problem has beenoften approached either neglecting to consider the cost ofanalysis or assuming that output data has a negligible size thisis not likely to happen in this and next generations biologicalsimulations

The previous considerations have led to the design of asimulation tool that includes both parallel simulation anddata analysis in a single workflowThese phases are pipelinedrather than sequential To make it possible it is needed thatall the logical phases of the process (ie data distributionparallel simulation result gathering parallel trajectory dataassembling and analysis) can be effectively pipelined Thisimplies that all phases can effectively work on data streamsIdeally an efficient and portable implementation of thesimulation-analysis workflow should be able to representstreams as first-class concept and provide the designerswith high-level programming constructs to manage themefficiently (also in multicore platforms)

To date application programming for bioinformatics(and other sciences) has not embraced much more than low-level communication and synchronization libraries In thehierarchy of abstractions it is only slightly above togglingabsolute binary in the front panel of the machine Theadvent and the pervasiveness of multicore platforms arepushing parallel programming outside its historical nicheNext generation software should be designed not only to beefficient on these platforms but also to be developed andtuned with high-productivity and reduced time-to-marketThis is particularly important when parallel computing servesas a tool for other sciences since nonexpert designers shouldbe able to experiment different algorithmic solutions for bothsimulations and data analysis

Attempts to reduce the programming effort by raisingthe level of abstraction through the provision of parallelprogramming frameworks date back at least three decadesand have resulted in a number of significant contribu-tions Notable among these is the skeletal approach [17]

(aka pattern-based parallel programming) which appearsto become increasingly popular after being revamped byseveral successful parallel programming frameworks [18ndash20]Skeletons (aka patterns) capture common parallel program-ming paradigms (eg Map Reduce MapReduce PipelineFarm and DivideampConquer) and make them available to theprogrammer as high-level programming constructs equippedwith well-defined functional and extrafunctional semantics[21] Some of them are specifically designed tomanage streamas first-class objects such as pipeline farm (aka master-worker pattern) and loop patterns

A particularly efficient implementation of the describedpatterns is provided by the FastFlow parallel programmingframework [22] FastFlow is a C++ open-source templatelibrary aimed at simplifying the development of efficientapplications for multicore platforms FastFlow eases thedevelopment and guaranteed runtime efficiency by raisingthe abstraction level of the design phase thus providingdevelopers with a set of optimised parallel programmingpatterns [22 23] Run-time efficiency is mainly achievedby way of a lock-less implementation of patterns exhibitinga speed edge against other popular parallel programmingframeworks (also see performance comparisons in [24])The FastFlow implementation of the pipeline loop andfarm patterns is exploited to design the CWC simulationworkflow The effectiveness of the FastFlow framework inhigh-frequency synchronizations (ie fine-grained tasks) ata high-level of abstraction is the key features to devisea portable and efficient implementation of the simulationworkflow which is CWC simulator with online parallelanalysis architecture composed by a three-stage pipelinesimulation analysis and display of results The former twostages are in turn pipelines of other stages whereas the displayof results is realized by way of a graphical user interface(GUI) The big picture of the simulation workflow is shownin Figure 1 In the picture all the grey boxes as well as all thecode needed for synchronization anddata streaming (double-headed arrows) are automatically generated by the FastFlowframework The implementation of the whole software actu-ally consists in declaring the structure of the workflow interms of FastFlow objects (ie farm and pipelines) and filling

BioMed Research International 5

white boxes with sequential code All data is passed by usingtheir references in such a way that no data copies in memoryare needed

211 The Simulation Pipeline The simulation pipeline asshown in Figure 1 is composed by three main stages (i)generation of simulation tasks (ii) farm of simulation enginesand (iii) alignment of trajectories

The input of the simulation pipeline (from the GUI ora file) is the model to be simulated and the parameters ofthe simulation The output is a stream of simulation resultseach of them being an array holding a point for each ofthe trajectories of all (independent) simulations aligned at agiven simulation time Actually each array represents a cutat a given simulation time of the whole dataset of resultsThis does not necessarily represent the current status (at agiven point in wall-clock time) of all running simulationsBy their very nature stochastic processes exhibit an irregularbehavior in space and time since different simulations maycover the same simulation timespan followingmanydifferentrandomly chosen paths and number of iterations Thereforeparallelization tools should support the dynamic and activebalancing of workload across the involved cores This mainlymotivates the structure of the simulation pipeline The firststage generates a number of independent simulation taskseach of them wrapped in a C++ object These objects arepassed to the farm of simulation engines which dispatchthem (on-demand) to a number of simulation engines (simeng) Each simulation engine brings forward a simulationfor a given simulation time (simulation quantum) thenreschedules back the simulation along the feedback channelSimulation results produced in this quantum are streamedtoward the next stage which sorts out all the received resultsand aligns them according to simulation time Once allsimulation tasks overcome a given simulation time an arrayof results is produced and streamed to the analysis pipeline

In this process the farm schedule prioritizes ldquoslowrdquosimulation tasks in such a way that the simulation tasksproceed with simulation times the more aligned as possibleThis solves the load balancing problem by keeping all sim-ulation engines always busy and reduces to the minimumthe transient storage of incomplete results thus reducing theshared memory traffic

212 The Analysis Pipeline By design each snapshot at agiven simulation time of all simulation trajectories (ie anarray of simulation results) can be analyzed immediatelyand independently (thus concurrently) on each other Forexample the mean and variance (as well as other statisticalestimators) can be immediately computed and streamed outto the display stage More complex analyses that is theones aimed to understand system dynamics have furtherrequirements In the most general case they require theaccess to the whole dataset Unfortunately this can be hardlydone with a fully online process In many cases howeverit is possible to derive reasonable approximations of theseanalyses from a sliding window of the whole dataset (egfor trajectory clustering) For this the stream incoming in

the analysis pipeline is passed through a stage that creates astream of (partially overlapping) sliding windows of trajec-tories cuts Each sliding window can be eventually processedin parallel and therefore is dispatched to a farm of statisticengines Results are collected and reordered (ie gathered)and streamed toward the user interface and the permanentstorage

The analysis pipeline is provided with three families ofpredefined estimators covering the most common instru-ments of statistical analyses Thanks to high-level modulardesign of the simulation pipeline it can be easily extendedwith new filters Current filters included in the system are asfollows

(1) Mean Standard Deviation and Quantiles These standardstatistical estimators are typically used to evaluate bothqualitatively and quantitatively the behavior of stable systemsand the reliability of the stochastic models used for theirsimulation Quantiles are also often useful to approximatethe distribution of simulation trajectories over time as it per-forms a histogram which summarizes the involved quantitieswithout the effects of long-tailored asymmetric distributionor outliers In fact in those cases descriptive statistics couldnot underline a central tendency

(2) Trajectory Clustering The clustering of trajectories helpsthe analysis of biological systems exhibiting a multistablebehavior Each cluster can automatically separate and dis-tinguish different cases which can be eventually analyzed bystatistical estimators Concentrations of elements in a giveninstant from all simulations are numerically filtered fromstochastic noise and the global trends are extrapolated fromclusters In this work we employed two clustering techniquesK-means [25] and quality threshold (QT) [26] clusteringThe clustering procedure collects the filtered data containedinto a sliding time window Δ

119882centered in the current

data point 119909119894

equiv 119891(119905119894) where 119905

119894equiv 1199050

+ 119894Δ119878(where Δ

119878

is a constant sampling time) for all simulation trajectoriesand the extrapolated forecast point 119909119864

119894 referred to the local

trend using the information of the Savitzky-Golay filter[27] that is a low-pass filter suitable for data smoothingThe main idea underneath Savitzky-Golay filtering is to findfilter coefficients 119888

119899that preserve higher moments that is

to approximate the underlying function within the movingwindow not by a constant (whose estimate is the average)but by a polynomial of higher order This schema also allowsthe computation of numerical derivatives considering thecoefficient of the derived polynomial

(3) PeakFrequency Detection Many processes in livingorganisms are oscillatory For these kinds of systems theanalysis must be focused on the recurrence of phenomenafor instance concentration spikes or peaks of biologicalquantities which also make it possible to determine thefrequency of occurrence of a given phenomenon The peakdetection is basically performed by way of the analysis ofthe local maximum in a continuous curve which is inturn detected through the analysis of the derivatives of thecurve estimated by the Savitzky-Golay filter From the period

6 BioMed Research International

Figure 2 Screenshots of the simulation tool interface

between successive peaks the frequency of the related eventis then inferred

The usage examples of each family of filters will bediscussed in the ldquoresultsrdquo section

213 The Graphical User Interface The CWC simulation-analysis pipeline is wrapped in a back-end tool that can besteered either via a command line tool or via a graphicaluser interface This makes it possible to design the biologicalmodel to run simulations and analysis and to view partialresults during the run Also the front-end makes it possibleto control the simulation workflow from a remote machineTwo screenshots of the graphical front-end are reported inFigure 2

3 Experiments and Results

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executingthe simulation and online analysis workflow on multicoreplatforms In this respect paradigmatic examples of twochallenging classes of biological systems are discussed thatis bistablemultistable and oscillatory systems The keybehavior of these systems is studied by way of the differentclasses of online analysis tools introduced in the previoussection in particular statistical description clustering andpeakfrequency detection

31 Expressivity and Effectiveness

311 Multistable Biological System (Schlogl Model) One ofthe most studied examples of bistability is the Schlogl model[28] The simplicity of this network makes it an idealprototype to show the effectiveness of the online clusteringtechniques on the filtered trajectories in the presence ofbimodalityThe set of reaction rules modeling this system are

119860 119860003

997891997888rarr 119860 119860 119860 119860 119860 11986000001

997891997888rarr 119860 119860

119861200

997891997888rarr 119861 119860 11986035

997891997888rarr ∙

(1)

where the rules are decoratedwith the kinetic constants of thecorresponding reactions and all reaction rates are evaluatedaccording to the mass action law

The number of molecules of the species 119861 is kept constant(buffered) while at equilibrium the species 119860 displays anoise-induced switching between the two stable steady states(see Figure 3) This case is paradigmatic to show that simplemean and standard deviation are not significant to summa-rize the overall behavior and the mean is not representativeof any simulation trajectory

Figure 3 shows the resulting clusters computed onlineusing K-means on the Schlogl model for species 119860 over 480

stochastic simulations starting with the term 119861 250 lowast 119860 Thelines width of theK-means plot is proportional to each clustersize

312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863

+1198621198682 leads the lysogenic

stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863

minus1198621198682 The notation 119863

+1198621198682119863

minus1198621198682 models the

phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

4 BioMed Research International

Simeng

Gat

her

Simeng

Disp

tach

Incomplete simulation tasks (with load balancing)

Farm

Alig

nmen

t of

traj

ecto

ries

Gen

erat

ion

of

simul

atio

n ta

sks Stat

eng

Gat

her

Disp

tach

Farm

Gen

erat

ion

ofsli

ding

win

dow

sof

traj

ecto

riesRaw

simulationresults Display of

results

graphicaluser

interface

MeanVariance

simulationresults

Filtered

stateng

Simulation pipeline Analysis pipeline

Start new simulations steer and terminate running simulations

Main pipeline

K-means

middot middot middot

middot middot middot

Figure 1 CWC simulator with online parallel analysis architecture

computation of independent instances of the same simula-tion which should anyway be computed to achieve statisticalconvergence of simulated trajectories (as in all Monte Carlomethods) The problem is well understood it has beenexploited in the last two decades inmany different flavors anddistributed computing environments from clusters to grid toclouds [8 11ndash16] Notwithstanding that the problem has beenoften approached either neglecting to consider the cost ofanalysis or assuming that output data has a negligible size thisis not likely to happen in this and next generations biologicalsimulations

The previous considerations have led to the design of asimulation tool that includes both parallel simulation anddata analysis in a single workflowThese phases are pipelinedrather than sequential To make it possible it is needed thatall the logical phases of the process (ie data distributionparallel simulation result gathering parallel trajectory dataassembling and analysis) can be effectively pipelined Thisimplies that all phases can effectively work on data streamsIdeally an efficient and portable implementation of thesimulation-analysis workflow should be able to representstreams as first-class concept and provide the designerswith high-level programming constructs to manage themefficiently (also in multicore platforms)

To date application programming for bioinformatics(and other sciences) has not embraced much more than low-level communication and synchronization libraries In thehierarchy of abstractions it is only slightly above togglingabsolute binary in the front panel of the machine Theadvent and the pervasiveness of multicore platforms arepushing parallel programming outside its historical nicheNext generation software should be designed not only to beefficient on these platforms but also to be developed andtuned with high-productivity and reduced time-to-marketThis is particularly important when parallel computing servesas a tool for other sciences since nonexpert designers shouldbe able to experiment different algorithmic solutions for bothsimulations and data analysis

Attempts to reduce the programming effort by raisingthe level of abstraction through the provision of parallelprogramming frameworks date back at least three decadesand have resulted in a number of significant contribu-tions Notable among these is the skeletal approach [17]

(aka pattern-based parallel programming) which appearsto become increasingly popular after being revamped byseveral successful parallel programming frameworks [18ndash20]Skeletons (aka patterns) capture common parallel program-ming paradigms (eg Map Reduce MapReduce PipelineFarm and DivideampConquer) and make them available to theprogrammer as high-level programming constructs equippedwith well-defined functional and extrafunctional semantics[21] Some of them are specifically designed tomanage streamas first-class objects such as pipeline farm (aka master-worker pattern) and loop patterns

A particularly efficient implementation of the describedpatterns is provided by the FastFlow parallel programmingframework [22] FastFlow is a C++ open-source templatelibrary aimed at simplifying the development of efficientapplications for multicore platforms FastFlow eases thedevelopment and guaranteed runtime efficiency by raisingthe abstraction level of the design phase thus providingdevelopers with a set of optimised parallel programmingpatterns [22 23] Run-time efficiency is mainly achievedby way of a lock-less implementation of patterns exhibitinga speed edge against other popular parallel programmingframeworks (also see performance comparisons in [24])The FastFlow implementation of the pipeline loop andfarm patterns is exploited to design the CWC simulationworkflow The effectiveness of the FastFlow framework inhigh-frequency synchronizations (ie fine-grained tasks) ata high-level of abstraction is the key features to devisea portable and efficient implementation of the simulationworkflow which is CWC simulator with online parallelanalysis architecture composed by a three-stage pipelinesimulation analysis and display of results The former twostages are in turn pipelines of other stages whereas the displayof results is realized by way of a graphical user interface(GUI) The big picture of the simulation workflow is shownin Figure 1 In the picture all the grey boxes as well as all thecode needed for synchronization anddata streaming (double-headed arrows) are automatically generated by the FastFlowframework The implementation of the whole software actu-ally consists in declaring the structure of the workflow interms of FastFlow objects (ie farm and pipelines) and filling

BioMed Research International 5

white boxes with sequential code All data is passed by usingtheir references in such a way that no data copies in memoryare needed

211 The Simulation Pipeline The simulation pipeline asshown in Figure 1 is composed by three main stages (i)generation of simulation tasks (ii) farm of simulation enginesand (iii) alignment of trajectories

The input of the simulation pipeline (from the GUI ora file) is the model to be simulated and the parameters ofthe simulation The output is a stream of simulation resultseach of them being an array holding a point for each ofthe trajectories of all (independent) simulations aligned at agiven simulation time Actually each array represents a cutat a given simulation time of the whole dataset of resultsThis does not necessarily represent the current status (at agiven point in wall-clock time) of all running simulationsBy their very nature stochastic processes exhibit an irregularbehavior in space and time since different simulations maycover the same simulation timespan followingmanydifferentrandomly chosen paths and number of iterations Thereforeparallelization tools should support the dynamic and activebalancing of workload across the involved cores This mainlymotivates the structure of the simulation pipeline The firststage generates a number of independent simulation taskseach of them wrapped in a C++ object These objects arepassed to the farm of simulation engines which dispatchthem (on-demand) to a number of simulation engines (simeng) Each simulation engine brings forward a simulationfor a given simulation time (simulation quantum) thenreschedules back the simulation along the feedback channelSimulation results produced in this quantum are streamedtoward the next stage which sorts out all the received resultsand aligns them according to simulation time Once allsimulation tasks overcome a given simulation time an arrayof results is produced and streamed to the analysis pipeline

In this process the farm schedule prioritizes ldquoslowrdquosimulation tasks in such a way that the simulation tasksproceed with simulation times the more aligned as possibleThis solves the load balancing problem by keeping all sim-ulation engines always busy and reduces to the minimumthe transient storage of incomplete results thus reducing theshared memory traffic

212 The Analysis Pipeline By design each snapshot at agiven simulation time of all simulation trajectories (ie anarray of simulation results) can be analyzed immediatelyand independently (thus concurrently) on each other Forexample the mean and variance (as well as other statisticalestimators) can be immediately computed and streamed outto the display stage More complex analyses that is theones aimed to understand system dynamics have furtherrequirements In the most general case they require theaccess to the whole dataset Unfortunately this can be hardlydone with a fully online process In many cases howeverit is possible to derive reasonable approximations of theseanalyses from a sliding window of the whole dataset (egfor trajectory clustering) For this the stream incoming in

the analysis pipeline is passed through a stage that creates astream of (partially overlapping) sliding windows of trajec-tories cuts Each sliding window can be eventually processedin parallel and therefore is dispatched to a farm of statisticengines Results are collected and reordered (ie gathered)and streamed toward the user interface and the permanentstorage

The analysis pipeline is provided with three families ofpredefined estimators covering the most common instru-ments of statistical analyses Thanks to high-level modulardesign of the simulation pipeline it can be easily extendedwith new filters Current filters included in the system are asfollows

(1) Mean Standard Deviation and Quantiles These standardstatistical estimators are typically used to evaluate bothqualitatively and quantitatively the behavior of stable systemsand the reliability of the stochastic models used for theirsimulation Quantiles are also often useful to approximatethe distribution of simulation trajectories over time as it per-forms a histogram which summarizes the involved quantitieswithout the effects of long-tailored asymmetric distributionor outliers In fact in those cases descriptive statistics couldnot underline a central tendency

(2) Trajectory Clustering The clustering of trajectories helpsthe analysis of biological systems exhibiting a multistablebehavior Each cluster can automatically separate and dis-tinguish different cases which can be eventually analyzed bystatistical estimators Concentrations of elements in a giveninstant from all simulations are numerically filtered fromstochastic noise and the global trends are extrapolated fromclusters In this work we employed two clustering techniquesK-means [25] and quality threshold (QT) [26] clusteringThe clustering procedure collects the filtered data containedinto a sliding time window Δ

119882centered in the current

data point 119909119894

equiv 119891(119905119894) where 119905

119894equiv 1199050

+ 119894Δ119878(where Δ

119878

is a constant sampling time) for all simulation trajectoriesand the extrapolated forecast point 119909119864

119894 referred to the local

trend using the information of the Savitzky-Golay filter[27] that is a low-pass filter suitable for data smoothingThe main idea underneath Savitzky-Golay filtering is to findfilter coefficients 119888

119899that preserve higher moments that is

to approximate the underlying function within the movingwindow not by a constant (whose estimate is the average)but by a polynomial of higher order This schema also allowsthe computation of numerical derivatives considering thecoefficient of the derived polynomial

(3) PeakFrequency Detection Many processes in livingorganisms are oscillatory For these kinds of systems theanalysis must be focused on the recurrence of phenomenafor instance concentration spikes or peaks of biologicalquantities which also make it possible to determine thefrequency of occurrence of a given phenomenon The peakdetection is basically performed by way of the analysis ofthe local maximum in a continuous curve which is inturn detected through the analysis of the derivatives of thecurve estimated by the Savitzky-Golay filter From the period

6 BioMed Research International

Figure 2 Screenshots of the simulation tool interface

between successive peaks the frequency of the related eventis then inferred

The usage examples of each family of filters will bediscussed in the ldquoresultsrdquo section

213 The Graphical User Interface The CWC simulation-analysis pipeline is wrapped in a back-end tool that can besteered either via a command line tool or via a graphicaluser interface This makes it possible to design the biologicalmodel to run simulations and analysis and to view partialresults during the run Also the front-end makes it possibleto control the simulation workflow from a remote machineTwo screenshots of the graphical front-end are reported inFigure 2

3 Experiments and Results

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executingthe simulation and online analysis workflow on multicoreplatforms In this respect paradigmatic examples of twochallenging classes of biological systems are discussed thatis bistablemultistable and oscillatory systems The keybehavior of these systems is studied by way of the differentclasses of online analysis tools introduced in the previoussection in particular statistical description clustering andpeakfrequency detection

31 Expressivity and Effectiveness

311 Multistable Biological System (Schlogl Model) One ofthe most studied examples of bistability is the Schlogl model[28] The simplicity of this network makes it an idealprototype to show the effectiveness of the online clusteringtechniques on the filtered trajectories in the presence ofbimodalityThe set of reaction rules modeling this system are

119860 119860003

997891997888rarr 119860 119860 119860 119860 119860 11986000001

997891997888rarr 119860 119860

119861200

997891997888rarr 119861 119860 11986035

997891997888rarr ∙

(1)

where the rules are decoratedwith the kinetic constants of thecorresponding reactions and all reaction rates are evaluatedaccording to the mass action law

The number of molecules of the species 119861 is kept constant(buffered) while at equilibrium the species 119860 displays anoise-induced switching between the two stable steady states(see Figure 3) This case is paradigmatic to show that simplemean and standard deviation are not significant to summa-rize the overall behavior and the mean is not representativeof any simulation trajectory

Figure 3 shows the resulting clusters computed onlineusing K-means on the Schlogl model for species 119860 over 480

stochastic simulations starting with the term 119861 250 lowast 119860 Thelines width of theK-means plot is proportional to each clustersize

312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863

+1198621198682 leads the lysogenic

stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863

minus1198621198682 The notation 119863

+1198621198682119863

minus1198621198682 models the

phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 5

white boxes with sequential code All data is passed by usingtheir references in such a way that no data copies in memoryare needed

211 The Simulation Pipeline The simulation pipeline asshown in Figure 1 is composed by three main stages (i)generation of simulation tasks (ii) farm of simulation enginesand (iii) alignment of trajectories

The input of the simulation pipeline (from the GUI ora file) is the model to be simulated and the parameters ofthe simulation The output is a stream of simulation resultseach of them being an array holding a point for each ofthe trajectories of all (independent) simulations aligned at agiven simulation time Actually each array represents a cutat a given simulation time of the whole dataset of resultsThis does not necessarily represent the current status (at agiven point in wall-clock time) of all running simulationsBy their very nature stochastic processes exhibit an irregularbehavior in space and time since different simulations maycover the same simulation timespan followingmanydifferentrandomly chosen paths and number of iterations Thereforeparallelization tools should support the dynamic and activebalancing of workload across the involved cores This mainlymotivates the structure of the simulation pipeline The firststage generates a number of independent simulation taskseach of them wrapped in a C++ object These objects arepassed to the farm of simulation engines which dispatchthem (on-demand) to a number of simulation engines (simeng) Each simulation engine brings forward a simulationfor a given simulation time (simulation quantum) thenreschedules back the simulation along the feedback channelSimulation results produced in this quantum are streamedtoward the next stage which sorts out all the received resultsand aligns them according to simulation time Once allsimulation tasks overcome a given simulation time an arrayof results is produced and streamed to the analysis pipeline

In this process the farm schedule prioritizes ldquoslowrdquosimulation tasks in such a way that the simulation tasksproceed with simulation times the more aligned as possibleThis solves the load balancing problem by keeping all sim-ulation engines always busy and reduces to the minimumthe transient storage of incomplete results thus reducing theshared memory traffic

212 The Analysis Pipeline By design each snapshot at agiven simulation time of all simulation trajectories (ie anarray of simulation results) can be analyzed immediatelyand independently (thus concurrently) on each other Forexample the mean and variance (as well as other statisticalestimators) can be immediately computed and streamed outto the display stage More complex analyses that is theones aimed to understand system dynamics have furtherrequirements In the most general case they require theaccess to the whole dataset Unfortunately this can be hardlydone with a fully online process In many cases howeverit is possible to derive reasonable approximations of theseanalyses from a sliding window of the whole dataset (egfor trajectory clustering) For this the stream incoming in

the analysis pipeline is passed through a stage that creates astream of (partially overlapping) sliding windows of trajec-tories cuts Each sliding window can be eventually processedin parallel and therefore is dispatched to a farm of statisticengines Results are collected and reordered (ie gathered)and streamed toward the user interface and the permanentstorage

The analysis pipeline is provided with three families ofpredefined estimators covering the most common instru-ments of statistical analyses Thanks to high-level modulardesign of the simulation pipeline it can be easily extendedwith new filters Current filters included in the system are asfollows

(1) Mean Standard Deviation and Quantiles These standardstatistical estimators are typically used to evaluate bothqualitatively and quantitatively the behavior of stable systemsand the reliability of the stochastic models used for theirsimulation Quantiles are also often useful to approximatethe distribution of simulation trajectories over time as it per-forms a histogram which summarizes the involved quantitieswithout the effects of long-tailored asymmetric distributionor outliers In fact in those cases descriptive statistics couldnot underline a central tendency

(2) Trajectory Clustering The clustering of trajectories helpsthe analysis of biological systems exhibiting a multistablebehavior Each cluster can automatically separate and dis-tinguish different cases which can be eventually analyzed bystatistical estimators Concentrations of elements in a giveninstant from all simulations are numerically filtered fromstochastic noise and the global trends are extrapolated fromclusters In this work we employed two clustering techniquesK-means [25] and quality threshold (QT) [26] clusteringThe clustering procedure collects the filtered data containedinto a sliding time window Δ

119882centered in the current

data point 119909119894

equiv 119891(119905119894) where 119905

119894equiv 1199050

+ 119894Δ119878(where Δ

119878

is a constant sampling time) for all simulation trajectoriesand the extrapolated forecast point 119909119864

119894 referred to the local

trend using the information of the Savitzky-Golay filter[27] that is a low-pass filter suitable for data smoothingThe main idea underneath Savitzky-Golay filtering is to findfilter coefficients 119888

119899that preserve higher moments that is

to approximate the underlying function within the movingwindow not by a constant (whose estimate is the average)but by a polynomial of higher order This schema also allowsthe computation of numerical derivatives considering thecoefficient of the derived polynomial

(3) PeakFrequency Detection Many processes in livingorganisms are oscillatory For these kinds of systems theanalysis must be focused on the recurrence of phenomenafor instance concentration spikes or peaks of biologicalquantities which also make it possible to determine thefrequency of occurrence of a given phenomenon The peakdetection is basically performed by way of the analysis ofthe local maximum in a continuous curve which is inturn detected through the analysis of the derivatives of thecurve estimated by the Savitzky-Golay filter From the period

6 BioMed Research International

Figure 2 Screenshots of the simulation tool interface

between successive peaks the frequency of the related eventis then inferred

The usage examples of each family of filters will bediscussed in the ldquoresultsrdquo section

213 The Graphical User Interface The CWC simulation-analysis pipeline is wrapped in a back-end tool that can besteered either via a command line tool or via a graphicaluser interface This makes it possible to design the biologicalmodel to run simulations and analysis and to view partialresults during the run Also the front-end makes it possibleto control the simulation workflow from a remote machineTwo screenshots of the graphical front-end are reported inFigure 2

3 Experiments and Results

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executingthe simulation and online analysis workflow on multicoreplatforms In this respect paradigmatic examples of twochallenging classes of biological systems are discussed thatis bistablemultistable and oscillatory systems The keybehavior of these systems is studied by way of the differentclasses of online analysis tools introduced in the previoussection in particular statistical description clustering andpeakfrequency detection

31 Expressivity and Effectiveness

311 Multistable Biological System (Schlogl Model) One ofthe most studied examples of bistability is the Schlogl model[28] The simplicity of this network makes it an idealprototype to show the effectiveness of the online clusteringtechniques on the filtered trajectories in the presence ofbimodalityThe set of reaction rules modeling this system are

119860 119860003

997891997888rarr 119860 119860 119860 119860 119860 11986000001

997891997888rarr 119860 119860

119861200

997891997888rarr 119861 119860 11986035

997891997888rarr ∙

(1)

where the rules are decoratedwith the kinetic constants of thecorresponding reactions and all reaction rates are evaluatedaccording to the mass action law

The number of molecules of the species 119861 is kept constant(buffered) while at equilibrium the species 119860 displays anoise-induced switching between the two stable steady states(see Figure 3) This case is paradigmatic to show that simplemean and standard deviation are not significant to summa-rize the overall behavior and the mean is not representativeof any simulation trajectory

Figure 3 shows the resulting clusters computed onlineusing K-means on the Schlogl model for species 119860 over 480

stochastic simulations starting with the term 119861 250 lowast 119860 Thelines width of theK-means plot is proportional to each clustersize

312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863

+1198621198682 leads the lysogenic

stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863

minus1198621198682 The notation 119863

+1198621198682119863

minus1198621198682 models the

phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

6 BioMed Research International

Figure 2 Screenshots of the simulation tool interface

between successive peaks the frequency of the related eventis then inferred

The usage examples of each family of filters will bediscussed in the ldquoresultsrdquo section

213 The Graphical User Interface The CWC simulation-analysis pipeline is wrapped in a back-end tool that can besteered either via a command line tool or via a graphicaluser interface This makes it possible to design the biologicalmodel to run simulations and analysis and to view partialresults during the run Also the front-end makes it possibleto control the simulation workflow from a remote machineTwo screenshots of the graphical front-end are reported inFigure 2

3 Experiments and Results

The evaluation of the integrated approach will be focusedon the efficiency and speedup of the tool in executingthe simulation and online analysis workflow on multicoreplatforms In this respect paradigmatic examples of twochallenging classes of biological systems are discussed thatis bistablemultistable and oscillatory systems The keybehavior of these systems is studied by way of the differentclasses of online analysis tools introduced in the previoussection in particular statistical description clustering andpeakfrequency detection

31 Expressivity and Effectiveness

311 Multistable Biological System (Schlogl Model) One ofthe most studied examples of bistability is the Schlogl model[28] The simplicity of this network makes it an idealprototype to show the effectiveness of the online clusteringtechniques on the filtered trajectories in the presence ofbimodalityThe set of reaction rules modeling this system are

119860 119860003

997891997888rarr 119860 119860 119860 119860 119860 11986000001

997891997888rarr 119860 119860

119861200

997891997888rarr 119861 119860 11986035

997891997888rarr ∙

(1)

where the rules are decoratedwith the kinetic constants of thecorresponding reactions and all reaction rates are evaluatedaccording to the mass action law

The number of molecules of the species 119861 is kept constant(buffered) while at equilibrium the species 119860 displays anoise-induced switching between the two stable steady states(see Figure 3) This case is paradigmatic to show that simplemean and standard deviation are not significant to summa-rize the overall behavior and the mean is not representativeof any simulation trajectory

Figure 3 shows the resulting clusters computed onlineusing K-means on the Schlogl model for species 119860 over 480

stochastic simulations starting with the term 119861 250 lowast 119860 Thelines width of theK-means plot is proportional to each clustersize

312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863

+1198621198682 leads the lysogenic

stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863

minus1198621198682 The notation 119863

+1198621198682119863

minus1198621198682 models the

phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 7

500

250

0

500

250

0

500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200 00 25 50 75 100 125 150 175 200

DeviationPo

pula

tion

Popu

latio

n

Popu

latio

n

Time

Time Time

00 25 50 75 100 125 150 175 200

Time

A

Mean A

750

500

250

0

Popu

latio

n 750

A0

A1

A0

A1

21 24

A A

K-means A

(a)

00 25 50 75 100 125 150 175 200

500

250

0

Popu

latio

n500

250

0

00 25 50 75 100 125 150 175 200

00 25 50 75 100 125 150 175 200

Popu

latio

n

Time

Time

Time

00 25 50 75 100 125 150 175 200

Time

500

250

0

Popu

latio

n

Popu

latio

n750

500

250

0

750

21 24

A A

A

Mean A

A0

A1

Deviation K-means A

A0

A

A

(b)

Figure 3 Simulation results on the Schlogl model The figures report the mean and standard deviation two exemplificative raw simulationtrajectories and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b)

promoting site with a global repressing effect) The reactionrules in this system are

119862119868 119862119868005

997891997888rarr 1198621198682 119862119868205

997891997888rarr 119862119868 119862119868

1198621198682 1198630026

997891997888rarr 119863+1198621198682 119863

+11986211986820026

997891997888rarr 1198621198682 119863

1198621198682 1198630026

997891997888rarr 119863minus1198621198682 119863

minus11986211986820026

997891997888rarr 1198621198682 119863

119863+1198621198682 1198621198682

013

997891997888rarr 119863+1198621198682119863

minus1198621198682

119863+1198621198682119863

minus1198621198682013

997891997888rarr 119863+1198621198682 1198621198682

119863+1198621198682 119875

40

997891997888rarr 119863+1198621198682 119875 1198621198682 1198621198682 119862119868

00007

997891997888rarr ∙

(2)

where 119875 represents the RNA polymerase assumed here tobe constant and two proteins per mRNA transcript wereconsidered In this model the stochastic time trajectories of119862119868 switch between two stable equilibria if the noise amplitudeis sufficient to drive the trajectories occasionally out of the

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

8 BioMed Research International

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120

Num

ber o

f mol

ecul

es (C

I)

Time (min)

(a) 120582-phage model simulation

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100 120

QT

cluste

rs w

ith tr

ends

(CI)

Time (min)

(b) QT clustering

Figure 4 Simulation results on the 120582-phagemodel (a) Reports (approximatively) the 480 raw trajectories and (b) shows the online clusteringresults using QT

basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))

Figure 4(b) shows the resulting clusters (gray circles)computed online using QT on the 120582-phage model for species119862119868 over 1200 stochastic simulations starting with the term10 lowast 119862119868 119863 119875 Circles diameters are proportional to eachcluster size and arrows display the local trends of the clusteredtrajectories

K-means is suitable for stable switch systems wherethe number of clusters and their tendencies are known inadvance in the other cases QT although more computation-ally expensive can build accurate partitions of trajectoriesgiving evidence of instabilities with a dynamic number ofclusters

313 Oscillatory System (Circadian Oscillations of Neu-rospora) We examine here the theoretical model for circa-dian oscillations based on transcriptional regulation of thefrequency (frq) gene in the fungus Neurospora The modelrelies on the feedback exerted on the expression of the frqgene by its protein product FRQ sdot FRQin represents the FRQprotein inside the nucleus In this model sustained rhythmicvariations in protein andmRNA (119872) levels occur in the formof limit cycle oscillations [31] In describing this system weexploit a feature of CWC which allows computing the rateof some reaction with an ad hoc function used to representnonstandard kinetics The reaction rules modeling this caseare

FRQin119891FRQin(119905)997891997888rarr FRQin 119872

11987205

997891997888rarr 119872 FRQ

119872

119891119872

997891997888rarr ∙ ⊤ FRQ119891119889

997891997888rarr ∙

FRQ 05997891997888rarr FRQin

FRQin06

997891997888rarr FRQ

(3)

The model is based on the negative feedback exerted bythe protein FRQ on the transcription of the frq gene the rateof gene expression is enhanced by light The model includesgene transcription in the nucleus accumulation of the cor-responding mRNA in the cytosol with the associated proteinsynthesis protein transport into and out of the nucleus andregulation of gene expression by the nuclear form of the FRQprotein The function 119891FRQ(119905) = V

119904(119905)(119870119899

119868(FRQ119899 + 119870

119899

119868))

denotes the rate of frq transcription where FRQ denotes thenumber of FRQelements at themoment inwhich the reactiontakes place The parameter V

119904(119905) defined by

V119904 (119905)

= 160 when 2119899119879 le 119905 lt (2119899 + 1) 119879

200 when (2119899 + 1) 119879 le 119905 lt (2119899 + 2) 119879(119899 ge 0)

(4)

increases in light conditions of the current time of thesimulation where 119879 represents the period of the dark-lightphases The constant 119870

119868is related to the threshold beyond

which nuclear FRQ represses frq transcription the Hillcoefficient 119899 characterizes the degree of cooperativity of therepression process In the functions the name of an atomindicates its multiplicity The mRNA degradation is givenby the Michaelis rate function 119891

119872= V119898(119872(119870

119872+ 119872))

The FRQ degradation is given by the Michaelis rate function119891119889= V119889(FRQ(119870

119889+ FRQ)) where V

119889is the maximum rate

of FRQ degradation and theMichaelis constant related to thisprocess is 119870

119889

As in [31] wemodeled the oscillations under two differentconditions (i) constant dark conditionand (ii) alternate lightand dark phases Following [31] the values of the parametersare set as V

119898= 505 V

119889= 140 119896

119904= 05 119896

1= 05 119896

2= 06

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 9

0

200

400

600

800

1000

1200

0 12 24 36 48 60 72140

160

180

200

220an

d FR

Q-m

RNA

(M)

Time (h)

Tota

l (F

t) a

nd n

ucle

ar (F

n) F

RQ

Max

ium

um F

RQ tr

ansc

riptio

n ra

te

Ft

Fn

M

Vs

Vs

(nm

ol(

hlowast1

00

))(a) Cytosolic FRQ oscillations in lightdark conditions

20

21

22

23

24

25

26

0 24 48 72 96 120 144 168 192 216 240

FRQ

-mRN

A ti

me p

erio

d (h

)

Time (h)

Dark onlyAlternate 12 h-dark12 h-light

(b) Peaks frequency of the cytosolic FRQ

Figure 5 Simulation results of the cytosolic FRQ protein of the Neurosporamodel

119870119898

= 50119870119868= 100119870

119889= 13 and 119899 = 4 Concentrations have

been made discrete by scaling 1 nM to 100 atomic elementsIn the constant dark condition parameter V

119904is equal to

160 in the alternate condition V119904is equal to 160 during the

dark phase and to 200 during the light phase Figure 5(a)shows an extract of a single stochastic simulation of thecircadian oscillations in the darklight alternate conditionplotting the number of FRQ proteins within the nucleus thetotal number of FRQ proteins in the cell and the number ofmRNA molecules leading the synthesis of FRQ Figure 5(b)shows the outcome of the peak detection tool which is ableto summarize the frequency of the peak events over timeThe plot results after capturing the peaks in the curve of thecytosolic mRNA for the FRQ protein synthesis Measuringthe distance between two consecutive peaks we compute theperiod of each oscillation and then plot the moving averageover 5000 simulations of the local periods In the constantdark condition the circadian period is close to 21 and a halfhours but increases producing damping oscillations with aperiod of approximately 24 hours in the darklight alternatecondition

32 Performances All reported experiments were run onan Intel workstation equipped with 4 eight-core E7-4820Nehalem (32 cores 64 contexts) 20GHz with 18MB L3cache and 64GBytes of main memory with Linux x86 64

The analysis pipeline is configured with 3 statistic enginesexecutingmean standard deviation quantilesK-means QTand frequency detection filters For each experiment the totalnumber of FastFlow nodes that is the boxes depicted withsolid lines in Figure 1 is

(sim eng) + (stat eng) + (other nodes)

+ (FastFlow support nodes) = (sim eng) + 3 + 3 + 4

(5)

where other nodes are ldquogeneration if simulation tasksrdquo ldquoalign-ment of trajectoriesrdquo and ldquogeneration if sliding windows oftrajectoriesrdquo nodes whereas FastFlow support nodes are thetwo couples of dispatch-gather nodes in Figure 1 Each nodein the FastFlow run-time support is implemented by a POSIX(portable operating system interface for uniX) thread using anonblocking execution model

As we shall see the number of statistical engines hasbeen chosen according to a simple but effective performancemodel which is made possible by the high-level approach ofthe design According to the samemodel themost interestingsensitivity analysis under performance viewpoint concernsthe number of simulation engines

As a case study we consider the simulation workflow forthe transcriptional regulation of the Neurospora

Figure 6 shows the speedup obtained for the wholeworkflow on varying the number of concurrent simulationengines where the simulation points (or samples) per trajec-tory is set to be 10

4 and 106 simulation points

The speedup on the total execution time achieved inthe former case (Figure 6(a)) scales ideally with respect tothe number of simulation engines whereas a performancepenalty is paid in the latter case (Figure 6(b)) for the highestdegree of parallelism and number of produced trajectories

The very same speedup behaviour is achieved for othertest cases and it is worth a detailed discussion For eachperformance experiment all the runs are executed by fixingrandom seeds Thus given a set of simulation parametersit can be verified that each stochastic simulation of a singletrajectory requires exactly the same number of iterations andthe simulated time progress identically across random walksirrespectively of the number of simulationstatistic enginesand observed simulation points which can be considereda (synchronized) sampling at fixed simulation times oftrajectories These observations imply the following

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

10 BioMed Research International

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(a)

5

10

15

20

25

30

5 10 15 20 25 30

Spee

dup

Number of simulation engines (sim eng)

Ideal240 trajectories

480 trajectories1200 trajectories

(b)

Figure 6 Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines with 3 statisticalengines for different number of trajectories each of them counting 104 points (a) and 105 points (b)

(i) The parallelism strategy does not break determinismand reproducibility of results (correctness)

(ii) As reflected in the speedup results the design of thesimulator ensures effective load balancing and lowsynchronization overheads

(iii) The efficiency of parallel executions depends on theorder of magnitude of the observed simulation pointsand by the number of produced trajectories

This latter point specifically exploits the working hypoth-esis stochastic methods are particularly informative whenused to simulate the model at high resolution that is highnumber of samples and trajectories In this case the mainbottleneck of the simulation software is data movement andmanagement since the computationdata-movement ratiomay easily reach the limits of modern multicore platforms

In multicore platforms ldquoobservingrdquo the phenomena isa key issue in the simulation-analysis workflow as the fre-quency of observation determines both the quality of resultsand inversely the overall speedup As shown in Figure 6 andTable 1 the proposed design and implementation effectivelycope with this trade-off and succeed to exploit high ratesof data movement Thanks to merging many independenttrajectories happening in the simulation pipeline the outputsize and thus the required disk throughput is greatlyreduced (unless the storage of raw simulation results happen-ing among the two pipelines is requested by the user)

The proposed simulation architecture is not only fast butalso highly predictable in term of performance This latteraspect is mainly due to the high-level structured design [32]The whole workflow is a pipeline of two pipelines (ie apipeline) whose performance can be modeled by means ofthe service time (119879119904) of each stage 119878

119894 In particular

119879119904 (pipeline (1198781 119878119896)) = max 119879119904 (119878

1) 119879119904 (119878

119896) (6)

Table 1 Performance on 1200 simulation instances of the Neu-rospora model (Intel 32 core platform)

Single trajectory information Overall data(20 sim eng 3 stat eng)

Number ofsamples Interarrival time Throughput Output size

104 2586 120583s 270MBs 8240MB

105 278 120583s 2859MBs 82398MB

106 23268 ns 30386MBs 824GB

where 119879119904(119878119894) models the average interdeparture time of

stream items of the stage 119894 of the pipeline which actuallymatches the average computation time of 119878

119894to produce one

stream item In turn some of the stages are farm whichexploit 119899 independent replicas of a (sequential or parallel)worker for example simulation and static engines Its servicetime can be modeled as

119879119904 (farm (119882 119899)) =119879119904 (119882)

119899 (7)

Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899

for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10

5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 11

two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works

However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories

4 Discussion and Related Works

In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models

Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]

The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects

The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of

biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms

DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]

StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime

In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses

Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency

An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])

A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2

This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

12 BioMed Research International

Table 2 Biological simulation tools comparison

Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing

partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10

5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm

5 Conclusions

In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability

We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new

Concerning the parallelization problem we propose afully parallel simulation-analysis tool that copes with several

challenging problems inter alia close to ideal speedup andefficiency coupled with low code development effort capa-bility to manage high data throughput thus enhancing theprecision of simulated models and interactivity and reduceddata size produced with respect to classic simulation-analysissequential execution

Many of these results are achieved by means of the Fast-Flow programming framework that provides a high-level ofabstraction parallel programming methodology that exemptthe programmer fromdirectmanagement of synchronizationand orchestration of concurrent activities FastFlow provideslow synchronization overhead and performance predictabil-ity which makes it possible to design and tune a complexautobalancing fully online simulation-analysis workflow thatcan produce data at a very high frequency and filter themin memory before being stored out-of-core avoiding the diskbottleneck

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPLlicense [23 49]

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors wish to thank M Mazumder and E Macchiaof Etica Srl for the simulator GUI implementation Thiswork has been supported by the BioBITs project (Converg-ing technologies 2007 area Biotechnology-ICT) RegionePiemonte Italy and by Paraphrase Project (FWP7EC-STREPno 288570)

References

[1] D TGillespie ldquoExact stochastic simulation of coupled chemicalreactionsrdquo Journal of Physical Chemistry vol 81 no 25 pp2340ndash2361 1977

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 13

[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001

[3] G Paun Membrane Computing An Introduction Springer2002

[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006

[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009

[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008

[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008

[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998

[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004

[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012

[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011

[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011

[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006

[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011

[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008

[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009

[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989

[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008

[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel

computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009

[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007

[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012

[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-

Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010

[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979

[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999

[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964

[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972

[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998

[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000

[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999

[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999

[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004

[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

14 BioMed Research International

[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005

[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo

[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009

[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001

[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005

[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010

[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml

[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010

[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010

[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011

[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010

[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011

[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011

[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012

[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology