Research Article
On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics
Marco Aldinucci,1 Cristina Calcagno,1,2 Mario Coppo,1 Ferruccio Damiani,1 Maurizio Drocco,1 Eva Sciacca,1 Salvatore Spinella,1 Massimo Torquati,3 and Angelo Troina1
1 Department of Computer Science, University of Torino, 10149 Torino, Italy
2 Department of Life Sciences and Systems Biology, University of Torino, 10123 Torino, Italy
3 Department of Computer Science, University of Pisa, 56127 Pisa, Italy
Correspondence should be addressed to Angelo Troina; troina@di.unito.it
Received 19 February 2014; Revised 16 May 2014; Accepted 18 May 2014; Published 22 June 2014
Academic Editor: Horacio Perez-Sanchez
Copyright © 2014 Marco Aldinucci et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, turning into big data that should be analysed by statistic and data mining tools. In the considered approach, the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing the behavior of biological systems on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.
1. Introduction
This paper presents a critical rethinking of the parallelization of biological computational tools in the light of multicore platforms, which nowadays equip all scientific laboratories.
We will focus on the features that are required to derive an efficient simulator of stochastic processes, considering in particular performance and ease of engineering. The latter aspect will be of crucial importance for the next generation of biological tools, which will be largely designed by bioinformatics scientists, who are likely to be more interested in the accurate modeling of natural phenomena than in the synchronisation protocols required to build efficient tools on multicore platforms.
The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics, as either an alternative or a complementary tool to ordinary differential equations (ODEs). This trend, starting from Gillespie's seminal work [1], has been supported by a growing number of formalisms aiming to describe stochastic models of biological systems [2–7].
The stochastic modeling approach is computationally much more expensive than ODEs. Nevertheless, this approach is quite attractive for its superior ability to describe transient and multistable behaviors of biological systems. In particular, it allows studying rare or divergent behaviors and spikes, and discriminating families of possible behaviors that are typically hidden in the averaged behavior described by ODEs.
Hindawi Publishing Corporation, BioMed Research International, Volume 2014, Article ID 207041, 14 pages, http://dx.doi.org/10.1155/2014/207041
The high computational cost of stochastic simulation is also a serious hindrance in the tuning of biochemical models in which some quantitative parameters are unknown or scarcely known and need a set of tests to be fixed.
Over the last two decades, this has led to a number of attempts to accelerate stochastic simulations using several kinds of techniques, such as approximate simulation algorithms and parallel computing [8, 9]. This work takes the latter approach.
Since stochastic simulations rely on the Monte Carlo method, many independent simulation instances should be computed and analysed to achieve statistically meaningful results. The computation of these independent instances has been traditionally exploited in an embarrassingly parallel fashion, executing a partition of the instances on different machines. This approach naturally couples with the distributed execution of a batch of tasks, which requires large infrastructures (e.g., grids and clouds) and suffers from slow time-to-solution, as each experiment requires enqueuing the simulations in a shared environment, deploying initial data, simulating the model, gathering results from a distributed environment, postprocessing them (often sequentially), and eventually accessing the results. This process is typically repeated several times to fine-tune the initial conditions and simulation parameters.
This approach, when transferred to multicore platforms, which nowadays make up the large majority of computing platforms, quickly fails to produce actual application speedup, especially for I/O-bound and memory-bound applications, since all the cores usually share the same memory and I/O subsystem. Indeed, the simulation of biological systems produces a large amount of data, which can be regarded as streams of data resulting from the ongoing simulations. The management of these streams is not trivial on multicore platforms, as the memory bandwidth cannot usually sustain a continuous flux of data coming from all the cores at the same time. A related aspect concerns the filtering and the analysis of raw results, which require the merging of data obtained from different simulation instances and possibly their statistical description or mining with data reduction techniques. Even in a distributed computing environment, this phase is often demoted to a secondary aspect of the computation and treated with offline postprocessing tools, frequently not even disclosed in performance results.
This approach is no longer practical, especially on multicore platforms, for a number of reasons:
(i) the ever-increasing size of produced data burdens the main weaknesses of multicore platforms, that is, memory bandwidth and core synchronisations;
(ii) the "sequentialisation" of simulation and analysis phases slows down the design-to-result process, which is particularly annoying during the tuning of the biological model, especially in the cases where the simulation outcome could show an incorrect behavior very early;
(iii) the design of the simulator is often specifically optimised for a specific parallel platform, either multicore or distributed (or not optimised at all).
Since the frequency and size of data moved across simulation workflows strictly depend on the required accuracy, the simulation and analysis of biological systems at high precision happen to be a serious issue on modern shared-memory multicore platforms. Indeed, they involve the merging of results from different simulation instances and possibly their statistical description or mining with data reduction techniques.
The design of a simulation-analysis workflow encompassing a parallel simulator stage and a parallel data analysis stage is presented along with its experimental validation. The two stages are pipelined in such a way that, at each observed simulation time $t_i$, the simulation stage streams out the partial results of all simulation trajectories (aligned at $t_i$) to the analysis stage, which immediately produces a partial result. The analysis stage, which can be equipped with user-defined statistic and mining operators, works on a sliding data window and does not require keeping the full dataset in memory, with both performance and response time benefits.
The advocated design methodology is validated on interesting classes of biological problems for which the classical modeling via ODEs is quite problematic, if not impossible, while a stochastic model can be more fruitful. As we shall see, despite the whole simulation-analysis being performed on partial data, that is, on a temporal sliding window of simulation results, the dynamics of the system is effectively approximated and can be shown to bioinformatics scientists during a simulation.
The technical challenges for the parallelisation of simulation and analysis stages and their pipelining are discussed (among these, the exploitation of high-level pattern-based parallel programming approaches to decrease design and implementation time). Eventually, the proposed approach can be used as a fully reusable methodology that can be exploited in the design or parallelisation of other tools for systems biology.
The evaluation of the integrated approach will be focused on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems, that is, bistable/multistable and oscillatory systems, are discussed. The key behavior of these systems is represented by way of the different classes of online analysis tools described in Section 2, respectively, statistical description, clustering, and frequency detection.
To perform these experiments, a simulator for the calculus of wrapped compartments (CWC) built with the above technology will be used as a test-bed. CWC [10] is a recently proposed formalism for the representation of biological systems. CWC extends the known stochastic simulators by adding a nested structure of labeled compartments delimited by membranes. However, to better focus on the proposed methodology and make the paper self-contained, we will write all examples of the paper in the basic subset of CWC, in which biochemical reactions are denoted in a standard chemical notation. We only remark that a distinguishing feature of CWC, which will be used in this paper, is the possibility of associating each reaction with an arbitrary rate function depending on the overall content of the ambient in which the reaction takes place. This allows tailoring the reaction rates to the specific characteristics of the system, as, for instance, when representing nonlinear reactions such as Michaelis-Menten kinetics.
2. Methods
The methods for the stochastic simulation of biological systems, including Gillespie's algorithm [1], are typically based on the Monte Carlo method. An individual simulation, which tracks the state of the system at each time-step, is called a trajectory. Many thousands of independent trajectories might be needed to get a representative picture of how the system behaves on the whole. This behavior springs from the collective analysis of trajectories, which is typically carried out by way of statistical or data mining estimators.
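As an illustration of what a single trajectory is, the following is a minimal Python sketch of Gillespie's direct method. It is not the CWC simulator: the function name `gillespie`, the data layout, and the toy birth-death model are assumptions made only for this example.

```python
import math
import random

def gillespie(state, reactions, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method.

    state     -- dict mapping species name to molecule count
    reactions -- list of (propensity_fn, change) pairs; propensity_fn(state)
                 returns a rate >= 0, change maps species to count deltas
    Returns the trajectory as a list of (time, state snapshot) pairs.
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        rates = [fn(state) for fn, _ in reactions]
        total = sum(rates)
        if total <= 0.0:              # no reaction can fire any more
            break
        # The waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose one reaction with probability proportional to its rate.
        threshold = rng.random() * total
        acc = 0.0
        for rate, (_, change) in zip(rates, reactions):
            acc += rate
            if threshold < acc:
                break
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory

# Hypothetical toy model: constant production and first-order decay of X.
toy_reactions = [
    (lambda s: 10.0, {"X": +1}),
    (lambda s: 0.1 * s["X"], {"X": -1}),
]
# Independent trajectories differ only in their random stream.
runs = [gillespie({"X": 0}, toy_reactions, 5.0, random.Random(seed))
        for seed in range(20)]
print(len(runs), "trajectories; final X of the first:", runs[0][-1][1]["X"])
```

Each call is independent of the others, which is why the instances can be computed in parallel; the collective analysis over `runs` is the part that introduces data dependencies.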
Thanks to their independence, the different instances needed to simulate a biological model can be easily computed in an embarrassingly parallel fashion. However, the complete simulation workflow needed to derive the simulation result includes additional phases, such as the dispatching and scheduling of simulations, result gathering, trajectory data assembling, and analysis, which exhibit data dependencies (and thus require communications and/or synchronisations). Often, to simplify the design of the simulation tool, these phases are neither parallelized nor considered in the performance evaluation. As a matter of fact, a parallel simulation is often considered an "embarrassingly parallel" problem, whereas it is so only if data distribution, gathering, filtering, and analysis are not considered as part of the whole simulation workflow. These phases, often (questionably) considered as preprocessing and postprocessing phases, may turn out to be as expensive as the simulation itself. Moreover, the full sequentialization of the phases inhibits the early detection of badly tuned simulations and makes the model tuning spiral annoyingly slow.
As an example, a simulation of the HIV diffusion problem (computed using the StochKit toolkit for 4 years of simulation time) may easily produce over 5 GBytes of raw data per instance [11]. Clearly, the data size grows $n$-fold when $n$ instances are considered. Eventually, this data should be gathered and often reduced to a single trajectory via statistical methods, or analysed with data mining methods, which can be much more time-consuming to compute than bare statistical estimators.
These potential performance flaws are further exacerbated on multicore and many-core platforms. These platforms do not exhibit the same degree of replication of hardware resources that can be found in distributed environments, and even independent processes actually compete for the same hardware resources within the single platform, such as main and secondary memory, the performance of which represents the real challenge for the forthcoming parallel programming models (aka the memory wall problem). While simulation is substantially a CPU-bound problem on a distributed platform, it may become prevalently an I/O-bound problem on a multicore platform due to the need to store and postprocess many trajectories. The finer the observed simulation time-step, the more strongly the computational problem is characterized as I/O-bound.
To be effective, stochastic methods in systems biology require many trajectories with a fine-grained resolution in order to make observable deviant trajectories, peaks, high variance of results, and multistable behaviors, which often represent the real nature of phenomena that are not well captured by traditional approaches such as ODEs. These events are often not immediate to detect in the bulk of gross simulation results. Several techniques for analyzing such data, for example, principal components analysis, linear modeling, and canonical correlation analysis, have been proposed. It can be imagined that next generation software tools for natural sciences should be able to perform this kind of processing in pipeline with the running data source, as a partially or totally online process, because
(i) it will be needed to manage an ever-increasing amount of experimental data, either coming from measurement or simulation, and
(ii) it will substantially improve the overall experimental workflow by providing the natural scientists with an almost real-time feedback, enabling the early tuning or sweeping of the experimental parameters, and thus scientific productivity.
The flexibility given by the possibility of running many different analysis modules in parallel is of particular interest, as in many biological case studies the searched pattern in experimental results is unknown and might require trying different kinds of analysis; the modules can be swept in parallel.
The parallel analysis of the system dynamics (e.g., along time) is more challenging, since online data processing requires statistic and data mining operators to work on streamed data: in general, random access to data is guaranteed only within a limited window of the whole dataset, while already accessed data can be stored only in synthesized form. Since data description techniques that require access to the whole dataset in random order cannot be used, online data description and mining require novel algorithms. The extensive study of these algorithms is an emerging topic in the data discovery community and is beyond the scope of this work.
We advocate the parallelization of both simulations and analysis by pipelining them in a two-stage workflow where both stages are also parallel. We also advocate the high-level design of the whole workflow to enhance both productivity and efficiency on different platforms (i.e., performance portability).
2.1. A Parallel Simulation-Analysis Workflow for CWC. The intrinsic complexity in the parallelization of the single step has traditionally led to the exploitation of parallelism in the
[Figure 1 diagram. Main pipeline: simulation pipeline, analysis pipeline, and display of results (graphical user interface). Simulation pipeline: generation of simulation tasks; farm (dispatch, sim eng ..., gather) with feedback of incomplete simulation tasks (with load balancing); alignment of trajectories, producing raw simulation results. Analysis pipeline: generation of sliding windows of trajectories; farm (dispatch, stat eng: mean, variance, K-means, ...; gather), producing filtered simulation results. Feedback from the interface: start new simulations, steer and terminate running simulations.]
Figure 1: CWC simulator with online parallel analysis: architecture.
computation of independent instances of the same simulation, which should anyway be computed to achieve statistical convergence of simulated trajectories (as in all Monte Carlo methods). The problem is well understood; it has been exploited in the last two decades in many different flavors and distributed computing environments, from clusters to grids to clouds [8, 11–16]. However, the problem has often been approached either neglecting the cost of analysis or assuming that output data has a negligible size; this is not likely to hold in this and the next generations of biological simulations.
The previous considerations have led to the design of a simulation tool that includes both parallel simulation and data analysis in a single workflow. These phases are pipelined rather than sequential. To make this possible, all the logical phases of the process (i.e., data distribution, parallel simulation, result gathering, parallel trajectory data assembling, and analysis) must be effectively pipelined. This implies that all phases can effectively work on data streams. Ideally, an efficient and portable implementation of the simulation-analysis workflow should be able to represent streams as a first-class concept and provide the designers with high-level programming constructs to manage them efficiently (also on multicore platforms).
To date, application programming for bioinformatics (and other sciences) has not embraced much more than low-level communication and synchronization libraries. In the hierarchy of abstractions, it is only slightly above toggling absolute binary in the front panel of the machine. The advent and the pervasiveness of multicore platforms are pushing parallel programming outside its historical niche. Next generation software should be designed not only to be efficient on these platforms but also to be developed and tuned with high productivity and reduced time-to-market. This is particularly important when parallel computing serves as a tool for other sciences, since nonexpert designers should be able to experiment with different algorithmic solutions for both simulations and data analysis.
Attempts to reduce the programming effort by raising the level of abstraction through the provision of parallel programming frameworks date back at least three decades and have resulted in a number of significant contributions. Notable among these is the skeletal approach [17] (aka pattern-based parallel programming), which appears to be becoming increasingly popular after being revamped by several successful parallel programming frameworks [18–20]. Skeletons (aka patterns) capture common parallel programming paradigms (e.g., Map, Reduce, MapReduce, Pipeline, Farm, and Divide&Conquer) and make them available to the programmer as high-level programming constructs equipped with well-defined functional and extrafunctional semantics [21]. Some of them are specifically designed to manage streams as first-class objects, such as the pipeline, farm (aka master-worker pattern), and loop patterns.
A particularly efficient implementation of the described patterns is provided by the FastFlow parallel programming framework [22]. FastFlow is a C++ open-source template library aimed at simplifying the development of efficient applications for multicore platforms. FastFlow eases development and guarantees run-time efficiency by raising the abstraction level of the design phase, thus providing developers with a set of optimised parallel programming patterns [22, 23]. Run-time efficiency is mainly achieved by way of a lock-less implementation of patterns, exhibiting a speed edge against other popular parallel programming frameworks (also see performance comparisons in [24]). The FastFlow implementation of the pipeline, loop, and farm patterns is exploited to design the CWC simulation workflow. The effectiveness of the FastFlow framework in high-frequency synchronizations (i.e., fine-grained tasks) at a high level of abstraction is the key feature to devise a portable and efficient implementation of the simulation workflow, that is, the CWC simulator with online parallel analysis, whose architecture is composed of a three-stage pipeline: simulation, analysis, and display of results. The former two stages are in turn pipelines of other stages, whereas the display of results is realized by way of a graphical user interface (GUI). The big picture of the simulation workflow is shown in Figure 1. In the picture, all the grey boxes, as well as all the code needed for synchronization and data streaming (double-headed arrows), are automatically generated by the FastFlow framework. The implementation of the whole software actually consists in declaring the structure of the workflow in terms of FastFlow objects (i.e., farms and pipelines) and filling white boxes with sequential code. All data is passed by reference, in such a way that no data copies in memory are needed.
2.1.1. The Simulation Pipeline. The simulation pipeline, as shown in Figure 1, is composed of three main stages: (i) generation of simulation tasks, (ii) farm of simulation engines, and (iii) alignment of trajectories.
The input of the simulation pipeline (from the GUI or a file) is the model to be simulated and the parameters of the simulation. The output is a stream of simulation results, each of them being an array holding a point for each of the trajectories of all (independent) simulations, aligned at a given simulation time. Actually, each array represents a cut, at a given simulation time, of the whole dataset of results. This does not necessarily represent the current status (at a given point in wall-clock time) of all running simulations. By their very nature, stochastic processes exhibit an irregular behavior in space and time, since different simulations may cover the same simulation timespan following many different randomly chosen paths and numbers of iterations. Therefore, parallelization tools should support the dynamic and active balancing of workload across the involved cores. This mainly motivates the structure of the simulation pipeline. The first stage generates a number of independent simulation tasks, each of them wrapped in a C++ object. These objects are passed to the farm of simulation engines, which dispatches them (on demand) to a number of simulation engines (sim eng). Each simulation engine brings forward a simulation for a given simulation time (simulation quantum) and then reschedules the simulation back along the feedback channel. Simulation results produced in this quantum are streamed toward the next stage, which sorts out all the received results and aligns them according to simulation time. Once all simulation tasks have passed a given simulation time, an array of results is produced and streamed to the analysis pipeline.
In this process, the farm scheduling prioritizes "slow" simulation tasks in such a way that the simulation tasks proceed with simulation times that are as aligned as possible. This solves the load balancing problem by keeping all simulation engines always busy and reduces to the minimum the transient storage of incomplete results, thus reducing the shared memory traffic.
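The interplay of simulation quantum, feedback channel, slow-task priority, and alignment can be sketched sequentially in Python. This is not FastFlow code and involves no real parallelism; the name `aligned_cuts`, the use of a min-heap as scheduler, and the measurement of the quantum in samples are assumptions made for the illustration.

```python
import heapq
import random

def aligned_cuts(n_tasks, n_samples, quantum, step_fn, seed=0):
    """Yield (sample index, [one value per task]) cuts in time order.

    Sequential stand-in for the farm with feedback: the task that is
    furthest behind is always advanced next, for `quantum` samples.
    step_fn(value, rng) is a hypothetical placeholder for one sampling
    step of a real simulation engine.
    """
    rngs = [random.Random(seed + i) for i in range(n_tasks)]
    values = [0.0] * n_tasks
    heap = [(0, i) for i in range(n_tasks)]    # (samples done, task id)
    heapq.heapify(heap)
    pending = {}                               # sample index -> {task: value}
    while heap:
        done, task = heapq.heappop(heap)       # slowest task first
        stop = min(done + quantum, n_samples)
        for k in range(done, stop):
            values[task] = step_fn(values[task], rngs[task])
            slot = pending.setdefault(k, {})
            slot[task] = values[task]
            if len(slot) == n_tasks:           # every task has passed sample k
                del pending[k]                 # transient storage is released
                yield k, [slot[i] for i in range(n_tasks)]
        if stop < n_samples:
            heapq.heappush(heap, (stop, task)) # feedback: reschedule the task

# Toy engine: a symmetric random walk, one step per sample.
walk = lambda value, rng: value + rng.choice((-1.0, 1.0))
for k, cut in aligned_cuts(n_tasks=4, n_samples=6, quantum=2, step_fn=walk):
    print(k, cut)
```

Because the slowest task is always served first, no task runs more than one quantum ahead of the others, so `pending` never holds more than about one quantum of incomplete cuts.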
2.1.2. The Analysis Pipeline. By design, each snapshot at a given simulation time of all simulation trajectories (i.e., an array of simulation results) can be analyzed immediately and independently of (and thus concurrently with) the others. For example, the mean and variance (as well as other statistical estimators) can be immediately computed and streamed out to the display stage. More complex analyses, that is, the ones aimed at understanding system dynamics, have further requirements. In the most general case, they require access to the whole dataset. Unfortunately, this can hardly be done with a fully online process. In many cases, however, it is possible to derive reasonable approximations of these analyses from a sliding window of the whole dataset (e.g., for trajectory clustering). For this, the stream incoming in the analysis pipeline is passed through a stage that creates a stream of (partially overlapping) sliding windows of trajectory cuts. Each sliding window can then be processed in parallel and is therefore dispatched to a farm of statistic engines. Results are collected and reordered (i.e., gathered) and streamed toward the user interface and the permanent storage.
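A minimal Python sketch of the two ideas in this paragraph, per-cut estimators and partially overlapping windows over the stream of cuts, is given below. The names `sliding_windows` and `describe_cut` and the `size`/`stride` parameters are assumptions for the example; the paper does not state how the window overlap is configured.

```python
import statistics
from collections import deque

def sliding_windows(cuts, size, stride):
    """Yield tuples of `size` consecutive cuts, one every `stride` cuts.

    With stride < size, consecutive windows partially overlap. Only the
    last `size` cuts are kept in memory, never the whole dataset.
    """
    if not 0 < stride <= size:
        raise ValueError("stride must be in 1..size")
    window = deque(maxlen=size)
    since_last = 0
    for cut in cuts:
        window.append(cut)
        since_last += 1
        if len(window) == size and since_last >= stride:
            yield tuple(window)
            since_last = 0

def describe_cut(cut):
    """Estimators that need a single cut only (requires >= 2 values)."""
    q1, median, q3 = statistics.quantiles(cut, n=4)
    return {
        "mean": statistics.fmean(cut),
        "variance": statistics.variance(cut),
        "quartiles": (q1, median, q3),
    }

# Toy stream: each cut holds one value per trajectory at one sampling time.
toy_cuts = [[float(t + i) for i in range(4)] for t in range(7)]
for window in sliding_windows(toy_cuts, size=3, stride=2):
    centre = window[len(window) // 2]
    print(describe_cut(centre))
```

Each window is self-contained, so in a farm every window could be handed to a different statistic engine.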
The analysis pipeline is provided with three families of predefined estimators covering the most common instruments of statistical analysis. Thanks to the high-level modular design of the simulation pipeline, it can be easily extended with new filters. The filters currently included in the system are as follows.
(1) Mean, Standard Deviation, and Quantiles. These standard statistical estimators are typically used to evaluate, both qualitatively and quantitatively, the behavior of stable systems and the reliability of the stochastic models used for their simulation. Quantiles are also often useful to approximate the distribution of simulation trajectories over time, as they act as a histogram that summarizes the involved quantities without the effects of long-tailed asymmetric distributions or outliers. In fact, in those cases, descriptive statistics may fail to reveal a central tendency.
(2) Trajectory Clustering. The clustering of trajectories helps the analysis of biological systems exhibiting a multistable behavior. Each cluster can automatically separate and distinguish different cases, which can then be analyzed by statistical estimators. Concentrations of elements in a given instant from all simulations are numerically filtered from stochastic noise and the global trends are extrapolated from clusters. In this work, we employed two clustering techniques: K-means [25] and quality threshold (QT) [26] clustering. The clustering procedure collects the filtered data contained in a sliding time window $\Delta_W$ centered on the current data point $x_i \equiv f(t_i)$, where $t_i \equiv t_0 + i\Delta_S$ (and $\Delta_S$ is a constant sampling time), for all simulation trajectories, and the extrapolated forecast point $x^E_i$, referred to the local trend, using the information of the Savitzky-Golay filter [27], that is, a low-pass filter suitable for data smoothing. The main idea underneath Savitzky-Golay filtering is to find filter coefficients $c_n$ that preserve higher moments, that is, to approximate the underlying function within the moving window not by a constant (whose estimate is the average) but by a polynomial of higher order. This schema also allows the computation of numerical derivatives by considering the coefficients of the derived polynomial.
Figure 2: Screenshots of the simulation tool interface.
(3) Peak/Frequency Detection. Many processes in living organisms are oscillatory. For these kinds of systems, the analysis must be focused on the recurrence of phenomena, for instance, concentration spikes or peaks of biological quantities, which also makes it possible to determine the frequency of occurrence of a given phenomenon. The peak detection is basically performed by way of the analysis of the local maxima in a continuous curve, which are in turn detected through the analysis of the derivatives of the curve estimated by the Savitzky-Golay filter. From the period between successive peaks, the frequency of the related event is then inferred.
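The Savitzky-Golay smoothing and the derivative-based peak detection described for filters (2) and (3) can be sketched as follows. The window length (5 points) and polynomial order (quadratic) are assumptions chosen because their coefficients are standard; the paper does not state which ones the tool uses. The function names and the linear interpolation of the zero crossing are likewise illustrative.

```python
import math

def savgol5(samples, dt):
    """5-point quadratic Savitzky-Golay smoothing and first derivative.

    Returns (smoothed, derivative) for the interior points: entry j of
    each list refers to samples[j + 2].
    """
    smoothed, derivative = [], []
    for i in range(2, len(samples) - 2):
        a, b, c, d, e = samples[i - 2:i + 3]
        smoothed.append((-3 * a + 12 * b + 17 * c + 12 * d - 3 * e) / 35)
        derivative.append((-2 * a - b + d + 2 * e) / (10 * dt))
    return smoothed, derivative

def peak_times(samples, dt):
    """Times at which the estimated derivative crosses zero downwards."""
    _, deriv = savgol5(samples, dt)
    peaks = []
    for j in range(1, len(deriv)):
        before, after = deriv[j - 1], deriv[j]
        if before > 0 and after <= 0:
            frac = before / (before - after)   # linear interpolation
            peaks.append((j + 1 + frac) * dt)  # deriv[j] sits at samples[j + 2]
    return peaks

def frequency(peaks):
    """Inverse of the mean period between successive peaks (None if < 2)."""
    if len(peaks) < 2:
        return None
    return (len(peaks) - 1) / (peaks[-1] - peaks[0])

# Toy oscillation with a known frequency of 0.5 events per time unit.
dt = 0.05
signal = [math.sin(2 * math.pi * 0.5 * k * dt) for k in range(201)]
print(frequency(peak_times(signal, dt)))
```

Both filters only need the samples inside a short moving window, which is what makes them usable on a stream of sliding windows rather than on the full dataset.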
Usage examples of each family of filters will be discussed in the "results" section.
2.1.3. The Graphical User Interface. The CWC simulation-analysis pipeline is wrapped in a back-end tool that can be steered either via a command line tool or via a graphical user interface. This makes it possible to design the biological model, to run simulations and analysis, and to view partial results during the run. Also, the front-end makes it possible to control the simulation workflow from a remote machine. Two screenshots of the graphical front-end are reported in Figure 2.
3. Experiments and Results
The evaluation of the integrated approach will be focused on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems are discussed, that is, bistable/multistable and oscillatory systems. The key behavior of these systems is studied by way of the different classes of online analysis tools introduced in the previous section, in particular, statistical description, clustering, and peak/frequency detection.
3.1. Expressivity and Effectiveness
3.1.1. Multistable Biological System (Schlögl Model). One of the most studied examples of bistability is the Schlögl model [28]. The simplicity of this network makes it an ideal prototype to show the effectiveness of the online clustering techniques on the filtered trajectories in the presence of bimodality. The set of reaction rules modeling this system is
\[
\begin{aligned}
A\;A &\overset{0.03}{\longmapsto} A\;A\;A, & A\;A\;A &\overset{0.0001}{\longmapsto} A\;A,\\
B &\overset{200}{\longmapsto} B\;A, & A &\overset{3.5}{\longmapsto} \bullet,
\end{aligned}
\tag{1}
\]
where the rules are decorated with the kinetic constants of the corresponding reactions and all reaction rates are evaluated according to the mass action law.
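To make the mass action evaluation concrete, the sketch below computes the rate of each rule in (1) for a given number of copies of $A$ and $B$. It assumes the usual stochastic reading of mass action, in which the kinetic constant is multiplied by the number of distinct reactant combinations; whether the CWC simulator counts matches in exactly this way is an assumption, and the function names are illustrative.

```python
def schlogl_propensities(a, b=1):
    """Rates of the four rules in (1) for `a` copies of A and `b` of B."""
    return [
        0.03 * a * (a - 1) / 2,               # A A   -> A A A (pairs of A)
        0.0001 * a * (a - 1) * (a - 2) / 6,   # A A A -> A A   (triples of A)
        200.0 * b,                            # B     -> B A
        3.5 * a,                              # A     -> (degradation)
    ]

def net_drift(a, b=1):
    """Rate of producing A minus rate of consuming A."""
    grow, shrink, create, degrade = schlogl_propensities(a, b)
    return grow + create - shrink - degrade

# The sign of the drift changes three times: A tends to grow when scarce,
# to shrink at intermediate counts, to grow again, and to shrink when abundant.
for a in (50, 150, 400, 700):
    print(a, net_drift(a))
```

Under these assumptions the drift is positive at 50, negative at 150, positive at 400, and negative at 700 copies of $A$, which is consistent with two stable states separated by an unstable one.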
The number of molecules of the species $B$ is kept constant (buffered), while at equilibrium the species $A$ displays a noise-induced switching between the two stable steady states (see Figure 3). This case is paradigmatic to show that the simple mean and standard deviation are not sufficient to summarize the overall behavior, and the mean is not representative of any simulation trajectory.
Figure 3 shows the resulting clusters computed online using K-means on the Schlögl model for species $A$ over 480 stochastic simulations starting with the term $B\ 250 * A$. The line width in the K-means plot is proportional to each cluster size.
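The reason why clustering is more informative than the mean on a bimodal system can be illustrated with a deliberately simplified sketch: plain K-means (Lloyd's iterations) on the scalar values of a single cut. The tool clusters filtered trajectories over a sliding window, so this one-dimensional version, its deterministic initialisation, and the sample values are assumptions made only for the example.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain K-means on scalars; returns (centroids, cluster sizes).

    Assumes 1 <= k <= len(values) and iterations >= 1.
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:      # assignments are stable
            break
        centroids = updated
    return centroids, [len(c) for c in clusters]

# Hypothetical cut of a bistable system: values gather around two levels.
cut = [80.0, 85.0, 90.0, 88.0, 560.0, 570.0, 565.0]
centroids, sizes = kmeans_1d(cut, k=2)
print("mean:", sum(cut) / len(cut))
print("centroids:", centroids, "sizes:", sizes)
```

The overall mean falls between the two groups, where no trajectory actually lies, while the two centroids and their sizes describe both stable states.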
3.1.2. Multistable System (Bacteriophage λ Life Cycle). One of the best studied examples of multistability in genetic systems is the bacteriophage λ life cycle [29]. This process involves two different biological entities delimited by membranes: the phage and the bacterium. Lambda phage is a virus consisting of a head, containing a double-stranded linear DNA, and a tail. The phage recognizes and binds to its host, Escherichia coli, causing the DNA in the head of the phage to be ejected through the tail into the cytoplasm of the bacterial cell. After this, it can enter into one of two alternative stages called lysogeny and lysis. The lysogenic stage is a dormant stage in which the phage DNA is inserted into the host DNA and passively reproduces with the host. The only protein expressed in this phase is the λ repressor $CI$. When the host becomes stressed, the phage is more likely to go into lysis, in which case it reproduces more phages, kills the host, and spreads to other bacteria. The decision between lysis and lysogeny can be thought of as a switching mechanism. A simplified model for the bacteriophage was proposed in [30]. In this model, the gene cI expresses the λ repressor ($CI$), which dimerises ($CI_2$) and binds to DNA ($D$) as a transcription factor at either of the two binding sites. The binding of the transcription factor to the site enhancing the transcription of $CI$ (positive feedback) is represented by $D^{+}CI_2$. The phagic DNA in state $D^{+}CI_2$ leads to the lysogenic stage. The binding of the transcription factor to the site repressing the transcription of $CI$ (negative feedback) is denoted by $D^{-}CI_2$. The notation $D^{+}CI_2D^{-}CI_2$ models the phagic DNA when both sites are bound ($CI_2$ can bind to the repressing site also when another $CI_2$ dimer is bound to the
[Figure 3 plots, panels (a) and (b): Mean A, Deviation, two raw trajectories (A0 and A1), and K-means A; axes: Population versus Time.]

Figure 3: Simulation results on the Schlögl model. The figures report the mean and standard deviation, two exemplificative raw simulation trajectories, and the clustering results using K-means, during the simulation runs (a) and at the end of the simulation runs (b).
promoting site, with a global repressing effect). The reaction rules in this system are

$$
\begin{aligned}
CI\;CI &\xrightarrow{0.05} CI_2, & CI_2 &\xrightarrow{0.5} CI\;CI,\\
CI_2\;D &\xrightarrow{0.026} D^{+}CI_2, & D^{+}CI_2 &\xrightarrow{0.026} CI_2\;D,\\
CI_2\;D &\xrightarrow{0.026} D^{-}CI_2, & D^{-}CI_2 &\xrightarrow{0.026} CI_2\;D,\\
D^{+}CI_2\;CI_2 &\xrightarrow{0.13} D^{+}CI_2D^{-}CI_2, & D^{+}CI_2D^{-}CI_2 &\xrightarrow{0.13} D^{+}CI_2\;CI_2,\\
D^{+}CI_2\;P &\xrightarrow{40} D^{+}CI_2\;P\;CI_2\;CI_2, & CI &\xrightarrow{0.0007} \bullet,
\end{aligned}
\tag{2}
$$
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the
[Figure 4 plots: (a) λ-phage model simulation, number of molecules (CI) versus time (min); (b) QT clustering, QT clusters with trends (CI) versus time (min).]

Figure 4: Simulation results on the λ-phage model. (a) reports (approximately) the 480 raw trajectories and (b) shows the online clustering results using QT.
basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).
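How such stochastic trajectories arise can be sketched with the Gillespie direct method. The sketch below is restricted to the dimerisation pair of rules in (2), $CI\;CI \to CI_2$ at rate 0.05 and $CI_2 \to CI\;CI$ at rate 0.5, starting from 10 copies of $CI$. It assumes mass-action propensities (rate constant times the number of distinct reactant combinations); it is not the CWC simulator, which matches rules against compartmentalised terms.

```python
import math
import random

# Each reaction: (rate constant, reactant counts, product counts).
REACTIONS = [
    (0.05, {"CI": 2}, {"CI2": 1}),   # CI CI -> CI2
    (0.5, {"CI2": 1}, {"CI": 2}),    # CI2 -> CI CI
]

def propensity(rate, reactants, state):
    """Rate constant times the number of distinct reactant combinations."""
    a = rate
    for species, needed in reactants.items():
        a *= math.comb(state[species], needed)
    return a

def gillespie(initial, reactions, t_end, rng):
    """Direct-method SSA; returns the (time, state) pairs after each event."""
    state = dict(initial)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        props = [propensity(rate, lhs, state) for rate, lhs, _ in reactions]
        total = sum(props)
        if total == 0.0:
            break                      # no reaction can fire any more
        t += rng.expovariate(total)    # exponential waiting time
        if t > t_end:
            break
        # Choose a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        acc, chosen = 0.0, None
        for reaction, a in zip(reactions, props):
            if a == 0.0:
                continue
            chosen = reaction
            acc += a
            if threshold < acc:
                break
        _, lhs, rhs = chosen
        for species, n in lhs.items():
            state[species] -= n
        for species, n in rhs.items():
            state[species] += n
        trajectory.append((t, dict(state)))
    return trajectory

traj = gillespie({"CI": 10, "CI2": 0}, REACTIONS, t_end=120.0, rng=random.Random(1))
print(len(traj), "events; final state:", traj[-1][1])
```

Because these two rules only convert monomers into dimers and back, the quantity $CI + 2\,CI_2$ is conserved along the trajectory, which gives a simple sanity check for the sketch.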
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations starting with the term $10 * CI\;D\;P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
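The contrast with K-means can be illustrated by a minimal quality-threshold (QT) sketch on scalar values, in the spirit of [26]: only a maximum cluster diameter is given, and the number of clusters follows from the data. The function name `qt_clustering`, the input values, and the threshold are illustrative assumptions, not the filter used in the workflow.

```python
def qt_clustering(values, max_diameter):
    """Quality-threshold clustering of scalar values.

    Repeatedly grows one candidate cluster per remaining point, keeps the
    largest candidate, removes its members, and starts again.
    """
    remaining = list(values)
    clusters = []
    while remaining:
        best = []
        for i, seed in enumerate(remaining):
            members = [seed]
            pool = remaining[:i] + remaining[i + 1:]
            lo = hi = seed
            while pool:
                # Point whose inclusion yields the smallest cluster diameter.
                j = min(range(len(pool)),
                        key=lambda idx: max(hi, pool[idx]) - min(lo, pool[idx]))
                new_lo, new_hi = min(lo, pool[j]), max(hi, pool[j])
                if new_hi - new_lo > max_diameter:
                    break
                members.append(pool.pop(j))
                lo, hi = new_lo, new_hi
            if len(members) > len(best):
                best = members
        clusters.append(best)
        for v in best:
            remaining.remove(v)
    return clusters

# Hypothetical cut of eight trajectories at one sampled time.
cut = [5, 6, 7, 8, 30, 31, 33, 60]
clusters = qt_clustering(cut, max_diameter=5)
print([sorted(c) for c in clusters])
```

The candidate-per-point search is what makes QT more expensive than K-means: every extraction step rebuilds a candidate cluster around each remaining point.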
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. FRQin represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system we exploit a feature of CWC which allows computing the rate of a reaction with an ad hoc function, used to represent nonstandard kinetics. The reaction rules modeling this case are

$$
\begin{aligned}
\mathrm{FRQin} &\xrightarrow{f_{\mathrm{FRQin}}(t)} \mathrm{FRQin}\;M, & M &\xrightarrow{0.5} M\;\mathrm{FRQ},\\
M &\xrightarrow{f_M} \bullet, & \top\;\mathrm{FRQ} &\xrightarrow{f_d} \bullet,\\
\mathrm{FRQ} &\xrightarrow{0.5} \mathrm{FRQin}, & \mathrm{FRQin} &\xrightarrow{0.6} \mathrm{FRQ}.
\end{aligned}
\tag{3}
$$
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}}(t) = v_s(t)\,\bigl(K_I^n / (\mathrm{FRQ}^n + K_I^n)\bigr)$ denotes the rate of frq transcription, where FRQ denotes the number of FRQ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by

$$
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T,
\end{cases}
\qquad (n \ge 0)
\tag{4}
$$

increases in light conditions, depending on the current time of the simulation, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m\,\bigl(M / (K_M + M)\bigr)$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d\,\bigl(\mathrm{FRQ} / (K_d + \mathrm{FRQ})\bigr)$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
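The ad hoc rate functions can be written down directly. The sketch below is a plain transcription of the formulas above, under two assumptions: the default parameter values are those of the scaled parameter set used in this section (1 nM scaled to 100 atomic elements), and the phase length `period` is taken to be 12 hours, as in the alternate 12 h dark/12 h light condition. The function and argument names are illustrative and do not come from the CWC simulator.

```python
def v_s(t, period=12.0, dark=160.0, light=200.0):
    """Equation (4): dark value on [2nT, (2n+1)T), light value on [(2n+1)T, (2n+2)T)."""
    return dark if int(t // period) % 2 == 0 else light

def f_frq(t, frq, period=12.0, k_i=100.0, n=4):
    """Hill-type rate of frq transcription, repressed by the FRQ count `frq`."""
    return v_s(t, period) * k_i ** n / (frq ** n + k_i ** n)

def f_m(m, v_m=50.5, k_m=50.0):
    """Michaelis rate of mRNA degradation for `m` mRNA molecules."""
    return v_m * m / (k_m + m)

def f_d(frq, v_d=140.0, k_d=13.0):
    """Michaelis rate of FRQ degradation for `frq` FRQ molecules."""
    return v_d * frq / (k_d + frq)

# Transcription is unrepressed with no FRQ and halved when FRQ equals K_I.
print(f_frq(0.0, 0), f_frq(0.0, 100), f_frq(12.0, 100))
# Michaelis functions reach half their maximum at the Michaelis constant.
print(f_m(50), f_d(13))
```

In a Gillespie-style simulator these values would play the role of the propensities of the corresponding rules, re-evaluated after every event since they depend on the current counts and on the simulated time.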
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$,
[Figure 5 plots: (a) cytosolic FRQ oscillations in light/dark conditions, showing total (Ft) and nuclear (Fn) FRQ and FRQ-mRNA (M) together with the maximum FRQ transcription rate Vs (nmol/(h*100)) versus time (h); (b) peaks frequency of the cytosolic FRQ, showing the FRQ-mRNA time period (h) versus time (h) for the dark only and the alternate 12 h dark/12 h light conditions.]

Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model.
$K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which summarizes the frequency of the peak events over time. The plot is obtained by capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. By measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average, over 5000 simulations, of the local periods. In the constant dark condition the circadian period is close to 21.5 hours, whereas in the dark/light alternate condition it increases, producing damped oscillations with a period of approximately 24 hours.
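The period measurement described above can be sketched as follows. The sketch assumes a clean, noise-free sampled curve: the input is a synthetic sinusoid with a 21.5 hour period sampled every half hour, not simulator output. A real stochastic trajectory would need smoothing before local maxima are meaningful, and the names `peak_times`, `local_periods`, and `moving_average` are illustrative.

```python
import math

def peak_times(times, values):
    """Times of the local maxima of a sampled curve."""
    return [times[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]

def local_periods(peaks):
    """Distances between consecutive peaks."""
    return [later - earlier for earlier, later in zip(peaks, peaks[1:])]

def moving_average(xs, window):
    """Simple moving average over a fixed-size window."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

# Synthetic stand-in for the mRNA curve: period 21.5 h, sampled every 0.5 h.
times = [0.5 * i for i in range(481)]
values = [math.sin(2 * math.pi * t / 21.5) for t in times]

peaks = peak_times(times, values)
periods = local_periods(peaks)
smoothed = moving_average(periods, window=3)
print(len(peaks), "peaks; local periods:", periods[:3])
```

The resolution of each local period is bounded by the sampling step, which is one reason why the number of observed simulation points matters for the quality of this analysis.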
3.2. Performances. All reported experiments were run on an Intel workstation equipped with four eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

$$
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
$$
where other nodes are the "generation of simulation tasks", "alignment of trajectories", and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for Unix) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.
As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.
The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and of the observed simulation points, which can be considered a (synchronized) sampling of trajectories at fixed simulation times. These observations imply the following.
[Figure 6 plots, panels (a) and (b): speedup versus number of simulation engines (sim eng), with curves for ideal speedup and for 240, 480, and 1200 trajectories.]

Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b).
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
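Point (i) can be illustrated with a small sketch, under the assumption that each trajectory owns a private random generator seeded from its index. A toy random walk stands in for a stochastic trajectory, and Python threads stand in for simulation engines; the sketch only demonstrates reproducibility across different degrees of parallelism, not performance, and it is unrelated to the FastFlow implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate(seed, steps=1000):
    """Toy random walk standing in for one stochastic trajectory."""
    rng = random.Random(seed)      # private generator: no state shared across workers
    position, samples = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        samples.append(position)
    return samples

def run_all(seeds, workers):
    """Run one trajectory per seed on a pool of `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))   # results come back in seed order

seeds = list(range(8))
same = run_all(seeds, workers=1) == run_all(seeds, workers=4)
print("identical results with 1 and 4 workers:", same)
```

The design choice that matters is that no random state is shared between workers, so the scheduling order cannot influence any trajectory.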
This latter point relates specifically to the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories that happens in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the storage of the raw simulation results flowing between the two pipelines is requested by the user).
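The reduction in output size can be illustrated with a minimal streaming sketch. It assumes that samples arrive as (point index, value) pairs, interleaved across trajectories, and folds them into a running mean and standard deviation per sampled point using Welford's update, so that the raw values never need to be stored. The class and function names are illustrative and do not correspond to the filters of the analysis pipeline.

```python
import math

class RunningStats:
    """Welford's online mean and variance over a stream of numbers."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0   # sum of squared deviations from the running mean

    def push(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def stddev(self):
        """Sample standard deviation (0.0 with fewer than two values)."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

def reduce_stream(sample_stream, n_points):
    """Fold a stream of (point index, value) pairs into per-point statistics."""
    stats = [RunningStats() for _ in range(n_points)]
    for point, value in sample_stream:
        stats[point].push(value)
    return [(s.mean, s.stddev()) for s in stats]

# Hypothetical stream: 3 trajectories, 2 sampled points each, arriving interleaved.
stream = [(0, 10.0), (0, 12.0), (1, 7.0), (0, 14.0), (1, 9.0), (1, 11.0)]
summary = reduce_stream(stream, n_points=2)
print(summary)
```

Whatever the number of trajectories, the summary holds two numbers per sampled point, which is the sense in which online reduction lowers the disk throughput required.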
The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline), whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

$$
T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
$$
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). The first two columns report single-trajectory information; the last two report overall data (20 sim eng, 3 stat eng).

| Number of samples | Interarrival time | Throughput | Output size |
|---|---|---|---|
| $10^4$ | 25.86 μs | 2.70 MB/s | 82.40 MB |
| $10^5$ | 2.78 μs | 28.59 MB/s | 823.98 MB |
| $10^6$ | 232.68 ns | 303.86 MB/s | 8.24 GB |
where $T_s(S_i)$ models the average interdeparture time of stream items of stage $i$ of the pipeline, which actually matches the average computation time needed by $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, the simulation and statistic engines. The service time of a farm can be modeled as

$$
T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
$$
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of about 5.8 s (about 120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 = 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, agrees with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the range $n = [1 \cdots 32]$. The primary reasons for the slight performance drop at the right end of the plots are that more virtual cores (i.e., hyperthread contexts) than physical cores are used and that memory traffic increases for high numbers of trajectories. Furthermore, the performance analysis highlights that the bottleneck of the architecture for high-throughput problems is the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
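The tuning argument above amounts to a few lines of arithmetic, sketched here with the per-trajectory timings quoted in the text. The helper names and the rounding to the nearest integer number of workers are illustrative choices; the model itself is just equations (6) and (7).

```python
def pipeline_service_time(stage_times):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)

def farm_service_time(worker_time, n_workers):
    """Equation (7): n replicas divide the service time of one worker."""
    return worker_time / n_workers

# Sequential timings per trajectory (seconds), Neurospora test case with 1e5 samples.
ts = {"generation": 0.0, "sim_eng": 5.3, "alignment": 0.11,
      "windows": 0.02, "stat_eng": 0.33}

# Slowest stage that cannot be replicated in a farm.
slowest = pipeline_service_time(
    [ts["generation"], ts["alignment"], ts["windows"]])

# Size each farm so that its service time matches the slowest sequential stage.
n_sim = round(ts["sim_eng"] / slowest)
n_stat = round(ts["stat_eng"] / slowest)

tuned = pipeline_service_time([
    ts["generation"],
    farm_service_time(ts["sim_eng"], n_sim),
    ts["alignment"],
    ts["windows"],
    farm_service_time(ts["stat_eng"], n_stat),
])
speedup_bound = sum(ts.values()) / slowest

print(n_sim, n_stat, round(tuned, 3), round(speedup_bound, 1))
```

With the unrounded sum of the stage times (5.76 s) the bound comes out slightly below the figure obtained from the rounded total of 5.8 s used in the text; the difference is only due to rounding.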
However, this simple reasoning does not apply when a large number of trajectories are modeled. In fact, in such cases the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. Such an effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
4. Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π-calculus models.
Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].
The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses online data analysis, and is thus designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is instead performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14] a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease the speed and efficiency.
An efficient parallelization of Gillespie's SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).
A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.
This paper does not provide computational comparisons with other systems. Actually, a comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative, for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is
Table 2: Biological simulation tools comparison.

| Tool | Calculus | Simulation schema | Parallelism | Data analysis |
|---|---|---|---|---|
| SCWC | CWC | Gillespie | FastFlow | Online statistics |
| SPiM | π-calculus | Gillespie | None | None |
| Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None |
| BioPEPA | Process algebra | ODE, Gillespie | None | None |
| Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None |
| DiVinE | Model checker | ODE | MPI | None |
| StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing |
| StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing |
| StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics |
| Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing |
| Li and Petzold's | Reaction model | Gillespie | GPGPU | None |
| StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing |
partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
5. Conclusions
In this paper we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistic and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia, close to ideal speedup and efficiency coupled with low code development effort; the capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and a reduced size of the produced data with respect to the classic sequential execution of simulation and analysis.
Many of these results are achieved by means of the FastFlow programming framework, which provides a high-level parallel programming methodology that exempts the programmer from the direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlögl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift für Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, “Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material),” Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, “Bio-PEPA: an extension of the process algebra PEPA for biochemical networks,” Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop “From Biology to Concurrency and back (FBTC 2007)”.
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, “The Bio-PEPA tool suite,” in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST ’09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, “Engineering design optimization using a swarm with an intelligent information sharing among individuals,” Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani, et al., “Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes,” Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Šafránek, “High-performance analysis of biological systems dynamics with the DiVinE model checker,” Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, “StochKit: stochastic simulation kit web page,” 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, “Rule-based spatial modeling with diffusing, geometrically constrained molecules,” BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, “Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit,” International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, “STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB,” Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, “Processes in space,” in Programs, Proofs, Processes, F. Ferreira, B. Löwe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo, et al., “A spatial calculus of wrapped compartments,” in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC ’11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani, et al., “Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments,” in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod ’11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui, et al., “A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU,” in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW ’12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
are typically hidden in the averaged behavior described by ODEs.
The high computational cost of stochastic simulation can also severely hamper the tuning of biochemical models in which some quantitative parameters are unknown or scarcely known and need a set of tests to be fixed.
This has led, in the last two decades, to a number of attempts to accelerate stochastic simulations using several kinds of techniques, such as approximate simulation algorithms and parallel computing [8, 9]. In this work, the latter approach is taken into account.
Since stochastic simulations rely on the Monte Carlo method, many independent simulation instances should be computed and analysed to achieve statistically meaningful results. The computation of these independent instances has traditionally been exploited in an embarrassingly parallel fashion, executing a partition of the instances on different machines. This approach naturally couples with the distributed execution of a batch of tasks, which requires large infrastructures (e.g., grids and clouds) and suffers from slow time-to-solution, as each experiment requires enqueuing the simulations in a shared environment, deploying initial data, simulating the model, gathering results from a distributed environment, postprocessing them (often sequentially), and eventually accessing the results. This process is typically repeated several times to fine-tune the initial conditions and simulation parameters.
This approach, when transferred to multicore platforms, which nowadays equip the large majority of computing systems, falls short of producing actual application speedup, especially for I/O-bound and memory-bound applications, since all the cores usually share the same memory and I/O subsystem. Indeed, the simulation of biological systems produces a large amount of data, which can be regarded as streams of data resulting from the ongoing simulations. The management of these streams is not trivial on multicore platforms, as the memory bandwidth cannot usually sustain a continuous flux of data coming from all the cores at the same time. A related aspect concerns the filtering and the analysis of raw results, which require the merging of data obtained from different simulation instances and possibly their statistical description or mining with data reduction techniques. Even in a distributed computing environment, this phase is often demoted to a secondary aspect of the computation and treated with offline postprocessing tools, frequently not even disclosed in performance results.
This approach is no longer practical, especially on multicore platforms, for a number of reasons:
(i) the ever-increasing size of produced data burdens the main weaknesses of multicore platforms, that is, memory bandwidth and core synchronisations;
(ii) the “sequentialisation” of simulation and analysis phases slows down the design-to-result process, which is particularly annoying during the tuning of the biological model, especially in the cases where the simulation outcome could show an incorrect behavior very soon;
(iii) the design of the simulator is often specifically optimised for a specific parallel platform, either multicore or distributed (or not optimised at all).
Since the frequency and size of data moved across simulation workflows strictly depend on the required accuracy, the simulation and analysis of biological systems at high precision happen to be a serious issue on modern shared-memory multicore platforms. Indeed, it involves the merging of results from different simulation instances and possibly their statistical description or mining with data reduction techniques.
The design of a simulation-analysis workflow encompassing a parallel simulator stage and a parallel data analysis stage is presented, along with its experimental validation. The two stages are pipelined in such a way that, at each observed simulation time $t_i$, the simulation stage streams out the partial results of all simulation trajectories (aligned at $t_i$) to the analysis stage, which immediately produces a partial result. The analysis stage, which can be equipped with user-defined statistical and mining operators, works on a sliding data window and does not require keeping the full dataset in memory, with both performance and response time benefits.
The advocated design methodology is validated on interesting classes of biological problems for which the classical modeling via ODEs is quite problematic, if not impossible, while a stochastic model can be more fruitful. As we shall see, despite the whole simulation-analysis being performed on partial data, that is, on a temporal sliding window of simulation results, the dynamics of the system is effectively approximated and can be shown to the bioinformatics scientist during a simulation.
The technical challenges for the parallelisation of the simulation and analysis stages and their pipelining are discussed (among these, the exploitation of high-level pattern-based parallel programming approaches to decrease design and implementation time). Eventually, the proposed approach can be used as a fully reusable methodology that can be exploited in the design or parallelisation of other tools for systems biology.
The evaluation of the integrated approach will be focused on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems, that is, bistable/multistable and oscillatory systems, are discussed. The key behavior of these systems is represented by way of different classes of online analysis tools introduced in the previous section, respectively, statistical description, clustering, and frequency detection.
To perform these experiments, a simulator for the calculus of wrapped compartments (CWC), built with the above technology, will be used as a test bed. CWC [10] is a recently proposed formalism for the representation of biological systems. CWC extends the known stochastic simulators by adding a nested structure of labeled compartments delimited by membranes. However, to better focus on the proposed methodology and make the paper self-contained, we will write all examples of the paper in the basic subset of CWC in which biochemical reactions are denoted in a standard chemical notation. We only remark that a distinguishing feature of CWC, which will be used in this paper, is the possibility of associating each reaction with an arbitrary rate function depending on the overall content of the ambient in which the reaction takes place. This allows tailoring the reaction rates to the specific characteristics of the system, as, for instance, when representing nonlinear reactions such as Michaelis-Menten kinetics.
2. Methods
The methods for the stochastic simulation of biological systems, including Gillespie’s algorithm [1], are typically based on the Monte Carlo method. An individual simulation, which tracks the state of the system at each time step, is called a trajectory. Many thousands of independent trajectories might be needed to get a representative picture of how the system behaves on the whole. This behavior springs from the collective analysis of trajectories, which is typically carried out by way of statistical or data mining estimators.
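To make the notion of a trajectory concrete, the following is a minimal C++ sketch of one step of Gillespie's direct method. The names `State`, `Reaction`, and `gillespie_step`, as well as the data layout, are assumptions made for this illustration; they are not taken from the CWC simulator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical types for illustration; not the CWC simulator's data structures.
using State = std::vector<long>;  // molecule counts, one entry per species

struct Reaction {
    std::function<double(const State&)> propensity;  // rate in the current state
    std::vector<long> delta;                         // change applied to each species
};

// Performs one step of Gillespie's direct method.
// Returns false (leaving state and time untouched) when no reaction can fire.
template <class Rng>
bool gillespie_step(State& x, double& t, const std::vector<Reaction>& rs, Rng& rng) {
    std::vector<double> a(rs.size());
    double a0 = 0.0;
    for (std::size_t j = 0; j < rs.size(); ++j) {
        a[j] = rs[j].propensity(x);
        assert(a[j] >= 0.0);
        a0 += a[j];
    }
    if (a0 <= 0.0) return false;

    std::uniform_real_distribution<double> u(0.0, 1.0);
    // The waiting time is exponentially distributed with rate a0.
    t += -std::log(1.0 - u(rng)) / a0;

    // Choose reaction j with probability a[j] / a0, skipping disabled reactions.
    double target = u(rng) * a0, acc = 0.0;
    std::size_t chosen = rs.size();
    for (std::size_t j = 0; j < rs.size(); ++j) {
        if (a[j] <= 0.0) continue;
        chosen = j;
        acc += a[j];
        if (target < acc) break;
    }
    assert(chosen < rs.size());
    for (std::size_t s = 0; s < x.size(); ++s) x[s] += rs[chosen].delta[s];
    return true;
}
```

Calling `gillespie_step` repeatedly and recording the state at the observed simulation times yields one trajectory; independent trajectories only differ in the seed of the random number generator.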
Thanks to their independence, the different instances needed to simulate a biological model can be easily computed in an embarrassingly parallel fashion. However, the complete simulation workflow needed to derive simulation results includes additional phases, such as the dispatching and scheduling of simulations, result gathering, trajectory data assembling, and analysis, which exhibit data dependencies (and thus require communications and/or synchronisations). Often, to simplify the design of the simulation tool, these phases are neither parallelized nor considered in the performance evaluation. As a matter of fact, a parallel simulation is often considered an “embarrassingly parallel” problem, whereas it is so only if data distribution, gathering, filtering, and analysis are not considered as part of the whole simulation workflow. These phases, often (questionably) considered as preprocessing and postprocessing phases, may turn out to be as expensive as the simulation itself. Moreover, the full sequentialization of the phases inhibits the early detection of badly tuned simulations and makes the model tuning cycle annoyingly slow.
As an example, a simulation of the HIV diffusion problem (computed using the StochKit toolkit for 4 years of simulation time) may easily produce over 5 GBytes of raw data per instance [11]. Clearly, the data size is multiplied by $n$ when $n$ instances are considered. Eventually, this data should be gathered and often reduced to a single trajectory via statistical methods, or analysed with data mining methods, which can be much more time-consuming to compute than bare statistical estimators.
These potential performance flaws are further exacerbated on multicore and many-core platforms. These platforms do not exhibit the same degree of replication of hardware resources that can be found in distributed environments, and even independent processes actually compete for the same hardware resources within the single platform, such as main and secondary memory, the performance of which represents the real challenge for the forthcoming parallel programming models (aka the memory wall problem). While simulation is substantially a CPU-bound problem on a distributed platform, it may become prevalently an I/O-bound problem on a multicore platform due to the need to store and postprocess many trajectories. The finer the observed simulation time step, the more strongly the computational problem is characterized as I/O-bound.
To be effective, stochastic methods in systems biology require many trajectories with a fine-grained resolution in order to make observable deviant trajectories, peaks, high variance of results, and multistable behaviors, which often represent the real nature of phenomena that are not well captured by traditional approaches, such as ODEs. These events are often not immediate to detect in the bulk of gross simulation results. Several techniques for analyzing such data, for example, principal components analysis, linear modeling, and canonical correlation analysis, have been proposed. It can be imagined that next generation software tools for natural sciences should be able to perform this kind of processing in pipeline with the running data source, as a partially or totally online process, because
(i) it will be needed to manage an ever-increasing amount of experimental data, either coming from measurement or simulation, and
(ii) it will substantially improve the overall experimental workflow by providing natural scientists with an almost real-time feedback, enabling the early tuning or sweeping of the experimental parameters and thus improving scientific productivity.
The flexibility given by the possibility of running many different analysis modules in parallel is of particular interest, as in many biological case studies the searched pattern in experimental results is unknown and might require trying different kinds of analysis, which can be swept in parallel.
The parallel analysis of the system dynamics (e.g., along time) is more challenging, since online data processing requires statistical and data mining operators to work on streamed data and, in general, random access to data is guaranteed only within a limited window of the whole dataset, while already accessed data can be stored only in synthesized form. When data description techniques requiring access to the whole dataset in random order cannot be used, online data description and mining require novel algorithms. The extensive study of these algorithms is an emerging topic in the data discovery community and is beyond the scope of this work.
We advocate the parallelization of both simulations and analysis by pipelining them in a two-stage workflow where both stages are also parallel. We also advocate the high-level design of the whole workflow to enhance both productivity and efficiency on different platforms (i.e., performance portability).
2.1. A Parallel Simulation-Analysis Workflow for CWC. The intrinsic complexity in the parallelization of the single step has traditionally led to the exploitation of parallelism in the computation of independent instances of the same simulation, which should anyway be computed to achieve statistical convergence of simulated trajectories (as in all Monte Carlo methods). The problem is well understood; it has been exploited in the last two decades in many different flavors and distributed computing environments, from clusters to grids to clouds [8, 11–16]. Although the problem has often been approached either neglecting the cost of analysis or assuming that output data has a negligible size, this is not likely to hold in this and the next generations of biological simulations.
[Figure 1 diagram: the main pipeline comprises a simulation pipeline (generation of simulation tasks; farm of simulation engines “sim eng” with dispatch and gather, where incomplete simulation tasks are fed back with load balancing; alignment of trajectories), which streams raw simulation results to an analysis pipeline (generation of sliding windows of trajectories; farm of statistic engines “stat eng”, e.g., mean, variance, K-means, with dispatch and gather), which streams filtered simulation results to the display of results (graphical user interface). From the user interface it is possible to start new simulations and to steer and terminate running simulations.]
Figure 1: CWC simulator with online parallel analysis architecture.
The previous considerations have led to the design of a simulation tool that includes both parallel simulation and data analysis in a single workflow. These phases are pipelined rather than sequential. To make this possible, all the logical phases of the process (i.e., data distribution, parallel simulation, result gathering, parallel trajectory data assembling, and analysis) must be effectively pipelined. This implies that all phases can effectively work on data streams. Ideally, an efficient and portable implementation of the simulation-analysis workflow should be able to represent streams as a first-class concept and provide the designers with high-level programming constructs to manage them efficiently (also on multicore platforms).
To date, application programming for bioinformatics (and other sciences) has not embraced much more than low-level communication and synchronization libraries. In the hierarchy of abstractions, it is only slightly above toggling absolute binary in the front panel of the machine. The advent and the pervasiveness of multicore platforms are pushing parallel programming outside its historical niche. Next generation software should be designed not only to be efficient on these platforms but also to be developed and tuned with high productivity and reduced time-to-market. This is particularly important when parallel computing serves as a tool for other sciences, since nonexpert designers should be able to experiment with different algorithmic solutions for both simulations and data analysis.
Attempts to reduce the programming effort by raising the level of abstraction through the provision of parallel programming frameworks date back at least three decades and have resulted in a number of significant contributions. Notable among these is the skeletal approach [17] (aka pattern-based parallel programming), which appears to be becoming increasingly popular after being revamped by several successful parallel programming frameworks [18–20]. Skeletons (aka patterns) capture common parallel programming paradigms (e.g., Map, Reduce, MapReduce, Pipeline, Farm, and Divide&Conquer) and make them available to the programmer as high-level programming constructs equipped with well-defined functional and extrafunctional semantics [21]. Some of them are specifically designed to manage streams as first-class objects, such as the pipeline, farm (aka master-worker pattern), and loop patterns.
A particularly efficient implementation of the described patterns is provided by the FastFlow parallel programming framework [22]. FastFlow is a C++ open-source template library aimed at simplifying the development of efficient applications for multicore platforms. FastFlow eases development and guarantees run-time efficiency by raising the abstraction level of the design phase, thus providing developers with a set of optimised parallel programming patterns [22, 23]. Run-time efficiency is mainly achieved by way of a lock-less implementation of patterns, exhibiting a speed edge against other popular parallel programming frameworks (also see performance comparisons in [24]). The FastFlow implementation of the pipeline, loop, and farm patterns is exploited to design the CWC simulation workflow. The effectiveness of the FastFlow framework in high-frequency synchronizations (i.e., fine-grained tasks) at a high level of abstraction is the key feature to devise a portable and efficient implementation of the simulation workflow. The CWC simulator with online parallel analysis is architected as a three-stage pipeline: simulation, analysis, and display of results. The former two stages are in turn pipelines of other stages, whereas the display of results is realized by way of a graphical user interface (GUI). The big picture of the simulation workflow is shown in Figure 1. In the picture, all the grey boxes, as well as all the code needed for synchronization and data streaming (double-headed arrows), are automatically generated by the FastFlow framework. The implementation of the whole software actually consists in declaring the structure of the workflow in terms of FastFlow objects (i.e., farms and pipelines) and filling white boxes with sequential code. All data is passed by reference, in such a way that no data copies in memory are needed.
2.1.1. The Simulation Pipeline. The simulation pipeline, as shown in Figure 1, is composed of three main stages: (i) generation of simulation tasks, (ii) farm of simulation engines, and (iii) alignment of trajectories.
The input of the simulation pipeline (from the GUI or a file) is the model to be simulated and the parameters of the simulation. The output is a stream of simulation results, each of them being an array holding a point for each of the trajectories of all (independent) simulations, aligned at a given simulation time. Actually, each array represents a cut, at a given simulation time, of the whole dataset of results. This does not necessarily represent the current status (at a given point in wall-clock time) of all running simulations. By their very nature, stochastic processes exhibit an irregular behavior in space and time, since different simulations may cover the same simulation timespan following many different randomly chosen paths and numbers of iterations. Therefore, parallelization tools should support the dynamic and active balancing of workload across the involved cores. This mainly motivates the structure of the simulation pipeline. The first stage generates a number of independent simulation tasks, each of them wrapped in a C++ object. These objects are passed to the farm of simulation engines, which dispatches them (on demand) to a number of simulation engines (sim eng). Each simulation engine brings forward a simulation for a given amount of simulation time (the simulation quantum) and then reschedules the simulation back along the feedback channel. Simulation results produced in this quantum are streamed toward the next stage, which sorts out all the received results and aligns them according to simulation time. Once all simulation tasks have passed a given simulation time, an array of results is produced and streamed to the analysis pipeline.
In this process, the farm scheduler prioritizes “slow” simulation tasks in such a way that the simulation tasks proceed with simulation times as aligned as possible. This solves the load balancing problem by keeping all simulation engines always busy and reduces to the minimum the transient storage of incomplete results, thus reducing the shared memory traffic.
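The slowest-first policy with a simulation quantum can be sketched sequentially as follows. This is only an illustration under stated assumptions: `SimTask`, `SlowestFirst`, and `schedule` are hypothetical names, the `advance` callback stands in for a simulation engine, and the concurrent dispatching performed by the actual farm is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical task record: a trajectory advanced in fixed simulation quanta.
struct SimTask {
    std::size_t id;
    double sim_time;  // simulation time reached so far
};

// Orders the priority queue so that the task with the smallest simulation
// time (the "slowest" one) is served first.
struct SlowestFirst {
    bool operator()(const SimTask& a, const SimTask& b) const {
        return a.sim_time > b.sim_time;
    }
};

// Advances all tasks up to `end_time`, one quantum at a time, always picking
// the task that lags behind. `advance(id, from, to)` stands in for a
// simulation engine. Returns the order in which task ids were scheduled.
std::vector<std::size_t> schedule(
    std::size_t n_tasks, double quantum, double end_time,
    const std::function<void(std::size_t, double, double)>& advance) {
    assert(quantum > 0.0);
    std::priority_queue<SimTask, std::vector<SimTask>, SlowestFirst> ready;
    for (std::size_t id = 0; id < n_tasks; ++id) ready.push({id, 0.0});

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        SimTask task = ready.top();
        ready.pop();
        if (task.sim_time >= end_time) continue;  // completed: not rescheduled
        double next = task.sim_time + quantum;
        if (next > end_time) next = end_time;
        advance(task.id, task.sim_time, next);
        order.push_back(task.id);
        task.sim_time = next;
        ready.push(task);  // feedback channel: reschedule the incomplete task
    }
    return order;
}
```

With this policy the simulation times of any two tasks never differ by more than one quantum, which is what bounds the amount of incomplete results the alignment stage has to buffer.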
2.1.2. The Analysis Pipeline. By design, each snapshot at a given simulation time of all simulation trajectories (i.e., an array of simulation results) can be analyzed immediately and independently of (and thus concurrently with) the others. For example, the mean and variance (as well as other statistical estimators) can be immediately computed and streamed out to the display stage. More complex analyses, that is, the ones aimed at understanding system dynamics, have further requirements. In the most general case, they require access to the whole dataset. Unfortunately, this can hardly be done with a fully online process. In many cases, however, it is possible to derive reasonable approximations of these analyses from a sliding window of the whole dataset (e.g., for trajectory clustering). For this, the stream incoming into the analysis pipeline is passed through a stage that creates a stream of (partially overlapping) sliding windows of trajectory cuts. Each sliding window can then be processed in parallel and is therefore dispatched to a farm of statistic engines. Results are collected and reordered (i.e., gathered) and streamed toward the user interface and the permanent storage.
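The two ingredients described above, a per-cut estimator and the generation of overlapping windows, might look as follows in C++. The names `Cut`, `Moments`, `moments`, and `SlidingWindows` are hypothetical, and the choice of Welford's update and of the population variance is an assumption of this sketch rather than a detail stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using Cut = std::vector<double>;  // one value per trajectory at a given simulation time

struct Moments {
    double mean;
    double variance;
};

// Mean and (population) variance of one cut, computed in a single pass
// with Welford's update.
Moments moments(const Cut& cut) {
    assert(!cut.empty());
    double mean = 0.0, m2 = 0.0;
    std::size_t n = 0;
    for (double v : cut) {
        ++n;
        double d = v - mean;
        mean += d / n;
        m2 += d * (v - mean);
    }
    return {mean, m2 / n};
}

// Turns a stream of cuts into a stream of overlapping windows of `width`
// consecutive cuts; a window is emitted as soon as it is full.
class SlidingWindows {
public:
    explicit SlidingWindows(std::size_t width) : width_(width) { assert(width > 0); }

    // Feeds one cut; returns true and fills `window` when a full window is ready.
    bool push(Cut cut, std::vector<Cut>& window) {
        buf_.push_back(std::move(cut));
        if (buf_.size() > width_) buf_.pop_front();
        if (buf_.size() < width_) return false;
        window.assign(buf_.begin(), buf_.end());
        return true;
    }

private:
    std::size_t width_;
    std::deque<Cut> buf_;
};
```

Only the last `width` cuts are retained, so memory use is independent of the length of the simulation; each emitted window could then be handed to a separate statistic engine.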
The analysis pipeline is provided with three families of predefined estimators covering the most common instruments of statistical analysis. Thanks to the high-level modular design of the simulation pipeline, it can be easily extended with new filters. Current filters included in the system are as follows.
(1) Mean, Standard Deviation, and Quantiles. These standard statistical estimators are typically used to evaluate, both qualitatively and quantitatively, the behavior of stable systems and the reliability of the stochastic models used for their simulation. Quantiles are also often useful to approximate the distribution of simulation trajectories over time, as they act as a histogram that summarizes the involved quantities without the effects of long-tailed asymmetric distributions or outliers. In fact, in those cases, descriptive statistics might not highlight a central tendency.
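As a small illustration of why quantiles are robust to outliers, the following sketch computes a quantile of one cut by linear interpolation between order statistics. The function name `quantile` and the interpolation rule are assumptions of this example; the text does not specify which quantile estimator the tool uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the q-quantile (0 <= q <= 1) of the values in one cut, using linear
// interpolation between the two closest order statistics. The cut is taken
// by value because the function sorts its own copy.
double quantile(std::vector<double> cut, double q) {
    assert(!cut.empty() && q >= 0.0 && q <= 1.0);
    std::sort(cut.begin(), cut.end());
    double pos = q * (cut.size() - 1);
    std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    std::size_t hi = std::min(lo + 1, cut.size() - 1);
    double frac = pos - lo;
    return cut[lo] + frac * (cut[hi] - cut[lo]);
}
```

For the cut {1, 2, 3, 4, 100} the median computed this way is 3, whereas the mean is 22: a single outlying trajectory shifts the mean but leaves the median untouched.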
(2) Trajectory Clustering. The clustering of trajectories helps the analysis of biological systems exhibiting a multistable behavior. Each cluster can automatically separate and distinguish different cases, which can then be analyzed by statistical estimators. Concentrations of elements at a given instant from all simulations are numerically filtered from stochastic noise, and the global trends are extrapolated from clusters. In this work, we employed two clustering techniques: K-means [25] and quality threshold (QT) [26] clustering. The clustering procedure collects the filtered data contained in a sliding time window $\Delta_W$ centered on the current data point $x_i \equiv f(t_i)$, where $t_i \equiv t_0 + i\Delta_S$ (and $\Delta_S$ is a constant sampling time), for all simulation trajectories, together with the extrapolated forecast point $x_i^E$, referred to the local trend, using the information of the Savitzky-Golay filter [27], that is, a low-pass filter suitable for data smoothing. The main idea underlying Savitzky-Golay filtering is to find filter coefficients $c_n$ that preserve higher moments, that is, to approximate the underlying function within the moving window not by a constant (whose estimate is the average) but by a polynomial of higher order. This scheme also allows the computation of numerical derivatives by considering the coefficients of the derived polynomial.
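The filtering step can be illustrated with the classic 5-point Savitzky-Golay coefficients for a quadratic fit, which give both the smoothed value and the first derivative at the window center. This is a sketch under assumptions: the window length, the polynomial order, the names `LocalTrend`, `savitzky_golay`, and `forecast`, and in particular the linear form of the forecast point are choices made here for illustration, not details given in the text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Classic 5-point Savitzky-Golay coefficients for a quadratic fit.
constexpr std::array<double, 5> kSmooth = {-3.0 / 35, 12.0 / 35, 17.0 / 35, 12.0 / 35, -3.0 / 35};
constexpr std::array<double, 5> kDeriv = {-0.2, -0.1, 0.0, 0.1, 0.2};

struct LocalTrend {
    double value;  // smoothed x_i
    double slope;  // estimated derivative at t_i
};

// Applies both filters to the 5 samples centered on index i (2 <= i < size - 2).
// `dt` is the constant sampling time (Delta_S in the text).
LocalTrend savitzky_golay(const std::vector<double>& x, std::size_t i, double dt) {
    assert(i >= 2 && i + 2 < x.size() && dt > 0.0);
    LocalTrend r{0.0, 0.0};
    for (std::size_t k = 0; k < 5; ++k) {
        r.value += kSmooth[k] * x[i - 2 + k];
        r.slope += kDeriv[k] * x[i - 2 + k];
    }
    r.slope /= dt;
    return r;
}

// Assumed form of the forecast point: linear extrapolation of the local trend
// over a time `horizon`.
double forecast(const LocalTrend& r, double horizon) {
    return r.value + r.slope * horizon;
}
```

Because the fit is quadratic, a noise-free parabola is reproduced exactly: both the smoothed value and the slope coincide with the analytic ones, while a plain moving average would bias the value near curved regions.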
Figure 2: Screenshots of the simulation tool interface.
(3) Peak/Frequency Detection. Many processes in living organisms are oscillatory. For these kinds of systems, the analysis must be focused on the recurrence of phenomena, for instance, concentration spikes or peaks of biological quantities, which also make it possible to determine the frequency of occurrence of a given phenomenon. The peak detection is basically performed by way of the analysis of the local maxima in a continuous curve, which are in turn detected through the analysis of the derivatives of the curve estimated by the Savitzky-Golay filter. From the period between successive peaks, the frequency of the related event is then inferred.
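A minimal sketch of this procedure is given below: the derivative is estimated with a 5-point Savitzky-Golay filter, a peak is reported where the estimate changes sign from positive to non-positive, and the frequency is taken as the inverse of the mean period between peaks. The function names and the 5-point window are assumptions of this example, and no thresholding against noise is included.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 5-point Savitzky-Golay first-derivative estimate at index i (per sample).
double sg_slope(const std::vector<double>& x, std::size_t i) {
    assert(i >= 2 && i + 2 < x.size());
    return (-2.0 * x[i - 2] - x[i - 1] + x[i + 1] + 2.0 * x[i + 2]) / 10.0;
}

// Returns the sample indices where the estimated derivative changes sign
// from positive to non-positive, i.e., candidate local maxima.
std::vector<std::size_t> find_peaks(const std::vector<double>& x) {
    std::vector<std::size_t> peaks;
    if (x.size() < 6) return peaks;
    double prev = sg_slope(x, 2);
    for (std::size_t i = 3; i + 2 < x.size(); ++i) {
        double cur = sg_slope(x, i);
        if (prev > 0.0 && cur <= 0.0) peaks.push_back(i);
        prev = cur;
    }
    return peaks;
}

// Frequency inferred from the mean period between successive peaks;
// `dt` is the sampling time. Returns 0 when fewer than two peaks exist.
double peak_frequency(const std::vector<std::size_t>& peaks, double dt) {
    if (peaks.size() < 2) return 0.0;
    double span = (peaks.back() - peaks.front()) * dt;
    return (peaks.size() - 1) / span;
}
```

Since only the five samples around the current point are needed, the detector fits naturally in a sliding-window setting and can run while the simulation is still producing data.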
The usage examples of each family of filters will be discussed in the “results” section.
2.1.3. The Graphical User Interface. The CWC simulation-analysis pipeline is wrapped in a back-end tool that can be steered either via a command line tool or via a graphical user interface. This makes it possible to design the biological model, to run simulations and analysis, and to view partial results during the run. Also, the front-end makes it possible to control the simulation workflow from a remote machine. Two screenshots of the graphical front-end are reported in Figure 2.
3. Experiments and Results
The evaluation of the integrated approach will be focused on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems are discussed, that is, bistable/multistable and oscillatory systems. The key behavior of these systems is studied by way of the different classes of online analysis tools introduced in the previous section, in particular, statistical description, clustering, and peak/frequency detection.
3.1. Expressivity and Effectiveness
3.1.1. Multistable Biological System (Schlögl Model). One of the most studied examples of bistability is the Schlögl model [28]. The simplicity of this network makes it an ideal prototype to show the effectiveness of the online clustering techniques on the filtered trajectories in the presence of bimodality. The set of reaction rules modeling this system is
$$
\begin{aligned}
A\ A &\xrightarrow{0.03} A\ A\ A, &\qquad A\ A\ A &\xrightarrow{0.0001} A\ A,\\
B &\xrightarrow{200} B\ A, &\qquad A &\xrightarrow{3.5} \bullet,
\end{aligned}
\tag{1}
$$
where the rules are decorated with the kinetic constants of the corresponding reactions and all reaction rates are evaluated according to the mass action law.
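To see how the rules in (1) translate into reaction rates, the following C++ sketch evaluates the four propensities and the resulting expected net change of $A$. It assumes that mass action counts the distinct combinations of reactant molecules (so that the rule with two reactants $A\ A$ is weighted by the number of pairs); this counting convention and the function names are assumptions of the example and should be checked against the CWC definition.

```cpp
#include <array>
#include <cassert>

// Propensities of the four rules in (1) for a state with `a` molecules of A
// and `b` molecules of B, assuming that mass action counts the distinct
// combinations of reactant molecules (an assumption of this sketch).
std::array<double, 4> schlogl_propensities(long a, long b) {
    assert(a >= 0 && b >= 0);
    double pairs = a * (a - 1) / 2.0;              // ways to pick A A
    double triples = a * (a - 1) * (a - 2) / 6.0;  // ways to pick A A A
    return {0.03 * pairs, 0.0001 * triples, 200.0 * b, 3.5 * a};
}

// Expected net change of A per unit time: the first and third rules create
// one A, the second and fourth rules remove one A.
double schlogl_drift(long a, long b) {
    std::array<double, 4> p = schlogl_propensities(a, b);
    return p[0] - p[1] + p[2] - p[3];
}
```

Under these assumptions and with a single molecule of $B$, the drift is positive at 50 molecules of $A$, negative at 100, positive again at 400, and negative at 600, which is consistent with two attracting regions separated by an unstable one.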
The number of molecules of the species $B$ is kept constant (buffered), while, at equilibrium, the species $A$ displays a noise-induced switching between the two stable steady states (see Figure 3). This case is paradigmatic to show that simple mean and standard deviation are not significant to summarize the overall behavior and that the mean is not representative of any simulation trajectory.
Figure 3 shows the resulting clusters computed online using K-means on the Schlögl model for species $A$ over 480 stochastic simulations starting with the term $B\ 250 * A$. The line width in the K-means plot is proportional to each cluster size.
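The idea of separating a bimodal population can be illustrated with a scalar version of Lloyd's K-means applied to the values of one cut. This is a deliberately simplified sketch: the tool clusters filtered trajectory data over a sliding window, whereas here each trajectory contributes a single number, and the names `Clustering` and `kmeans_1d` as well as the evenly spaced initialization are assumptions of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Clustering {
    std::vector<double> centroids;
    std::vector<std::size_t> sizes;  // number of trajectories per cluster
};

// Lloyd's K-means on scalar values (e.g., the population of A in every
// trajectory at one simulation time). Centroids start evenly spaced
// between the minimum and maximum value.
Clustering kmeans_1d(const std::vector<double>& values, std::size_t k, int max_iter = 100) {
    assert(!values.empty() && k > 0);
    auto mm = std::minmax_element(values.begin(), values.end());
    double lo = *mm.first, hi = *mm.second;
    Clustering c{std::vector<double>(k), std::vector<std::size_t>(k)};
    for (std::size_t j = 0; j < k; ++j)
        c.centroids[j] = lo + (hi - lo) * (j + 0.5) / k;

    for (int it = 0; it < max_iter; ++it) {
        std::vector<double> sum(k, 0.0);
        std::fill(c.sizes.begin(), c.sizes.end(), std::size_t{0});
        for (double v : values) {  // assignment step: nearest centroid
            std::size_t best = 0;
            for (std::size_t j = 1; j < k; ++j)
                if (std::fabs(v - c.centroids[j]) < std::fabs(v - c.centroids[best])) best = j;
            sum[best] += v;
            ++c.sizes[best];
        }
        bool moved = false;
        for (std::size_t j = 0; j < k; ++j) {  // update step: cluster means
            if (c.sizes[j] == 0) continue;     // keep the centroid of an empty cluster
            double m = sum[j] / c.sizes[j];
            if (m != c.centroids[j]) moved = true;
            c.centroids[j] = m;
        }
        if (!moved) break;
    }
    return c;
}
```

The `sizes` field corresponds to the quantity that the K-means plot encodes as line width, while each centroid summarizes one of the stable states far better than the global mean, which may fall between them.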
[Figure 3 plots, panels (a) and (b): population of species $A$ versus time, showing the mean and standard deviation of $A$, two raw trajectories of $A$, and the K-means clusters $A_0$ and $A_1$.]
Figure 3: Simulation results on the Schlögl model. The figures report the mean and standard deviation, two exemplificative raw simulation trajectories, and the clustering results using K-means, during the simulation runs (a) and at the end of the simulation runs (b).
3.1.2. Multistable System (Bacteriophage λ Life Cycle). One of the best studied examples of multistability in genetic systems is the bacteriophage λ life cycle [29]. This process involves two different biological entities delimited by membranes: the phage and the bacterium. Lambda phage is a virus consisting of a head, containing a double-stranded linear DNA, and a tail. The phage recognizes and binds to its host, Escherichia coli, causing the DNA in the head of the phage to be ejected through the tail into the cytoplasm of the bacterial cell. After this, it can enter into one of two alternative stages called lysogeny and lysis. The lysogenic stage is a dormant stage in which the phage DNA is inserted into the host DNA and passively reproduces with the host. The only protein expressed in this phase is the λ repressor $CI$. When the host becomes stressed, the phage is more likely to go into lysis, in which case it reproduces more phages, kills the host, and spreads to other bacteria. The decision between lysis and lysogeny can be thought of as a switching mechanism. A simplified model for the bacteriophage was proposed in [30]. In this model, the gene cI expresses the λ repressor ($CI$), which dimerises ($CI_2$) and binds to DNA ($D$) as a transcription factor at either of the two binding sites. The binding of the transcription factor to the site enhancing the transcription of $CI$ (positive feedback) is represented by $D^{+}CI_2$. The phagic DNA in state $D^{+}CI_2$ leads to the lysogenic stage. The binding of the transcription factor to the site repressing the transcription of $CI$ (negative feedback) is denoted by $D^{-}CI_2$. The notation $D^{+}CI_2D^{-}CI_2$ models the phagic DNA when both sites are bound ($CI_2$ can bind to the repressing site also when another $CI_2$ dimer is bound to the promoting site, with a global repressing effect). The reaction rules in this system are
\begin{equation}
\begin{aligned}
CI\ CI &\overset{0.05}{\longmapsto} CI_2, &\qquad CI_2 &\overset{0.5}{\longmapsto} CI\ CI,\\
CI_2\ D &\overset{0.026}{\longmapsto} D^{+}CI_2, &\qquad D^{+}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
CI_2\ D &\overset{0.026}{\longmapsto} D^{-}CI_2, &\qquad D^{-}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
D^{+}CI_2\ CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2D^{-}CI_2, &\qquad D^{+}CI_2D^{-}CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2\ CI_2,\\
D^{+}CI_2\ P &\overset{40}{\longmapsto} D^{+}CI_2\ P\ CI_2\ CI_2, &\qquad CI &\overset{0.0007}{\longmapsto} \bullet
\end{aligned}
\tag{2}
\end{equation}
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model, the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).

Figure 4: Simulation results on the λ-phage model. (a) λ-phage model simulation: reports (approximately) the 480 raw trajectories, as number of molecules ($CI$) versus time (min). (b) QT clustering: shows the online clustering results using QT, as QT clusters with trends ($CI$) versus time (min).
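How such stochastic trajectories are generated can be sketched with Gillespie's direct method. The sketch below is not the CWC simulator: it handles only flat multisets of atoms (no compartments), it is applied only to the reversible dimerisation rules of (2) with rates 0.05 and 0.5, and the function names, the mass-action propensity based on reactant combinations, the initial state of 10 $CI$ monomers, and the seed are assumptions made for illustration.

```python
import math
import random

def propensity(rate, reactants, state):
    """Mass-action propensity: rate times the number of distinct reactant combinations."""
    a = rate
    for species, needed in reactants.items():
        a *= math.comb(state.get(species, 0), needed)
    return a

def gillespie(state, reactions, t_end, rng):
    """Gillespie's direct method; returns the list of (time, state) after each jump."""
    t, state = 0.0, dict(state)
    history = [(t, dict(state))]
    while True:
        props = [propensity(rate, lhs, state) for rate, lhs, _ in reactions]
        total = sum(props)
        if total == 0.0:            # no reaction can fire any more
            break
        t += rng.expovariate(total) # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose one reaction with probability proportional to its propensity.
        pick, acc = rng.random() * total, 0.0
        for (rate, lhs, rhs), a in zip(reactions, props):
            acc += a
            if pick < acc:
                break
        for species, count in lhs.items():
            state[species] -= count
        for species, count in rhs.items():
            state[species] = state.get(species, 0) + count
        history.append((t, dict(state)))
    return history

# Dimerisation subset of (2): CI CI -> CI2 (0.05) and CI2 -> CI CI (0.5).
reactions = [
    (0.05, {"CI": 2}, {"CI2": 1}),
    (0.5, {"CI2": 1}, {"CI": 2}),
]
history = gillespie({"CI": 10, "CI2": 0}, reactions, t_end=50.0, rng=random.Random(1))
print(len(history), "events; final state:", history[-1][1])
```

Each call with a different seed yields one independent trajectory; the simulation stage of the workflow runs many of them and streams their samples to the analysis stage.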
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations starting with the term $10 * CI\ \ D\ \ P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.

K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
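The difference with K-means can be made concrete with a sketch of quality threshold (QT) clustering in the spirit of [26]. It is a simplified illustration, not the filter used in the workflow: it clusters scalar values rather than trajectories, uses max minus min as the cluster diameter, and the function name `qt_cluster` and the sample data are hypothetical.

```python
def qt_cluster(points, max_diameter):
    """Quality-threshold clustering of scalar values.

    For every point a candidate cluster is grown greedily, always adding the
    point that keeps the diameter smallest, until the threshold would be
    exceeded. The largest candidate becomes a cluster, its points are
    removed, and the procedure repeats on what is left.
    """
    def diameter(values):
        return max(values) - min(values)

    remaining = list(points)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            pool = list(remaining)
            pool.remove(seed)
            while pool:
                nxt = min(pool, key=lambda p: diameter(candidate + [p]))
                if diameter(candidate + [nxt]) > max_diameter:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        clusters.append(sorted(best))
        for p in best:
            remaining.remove(p)
    return clusters

# Hypothetical populations of one species across six trajectories at one time.
clusters = qt_cluster([1, 2, 3, 10, 11, 30], max_diameter=3)
print(clusters)
```

Only the diameter threshold is fixed in advance; the number of clusters follows from the data, at the price of a cost that grows much faster with the number of points than that of K-means.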
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. FRQin represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC which allows computing the rate of some reactions with an ad hoc function, used to represent nonstandard kinetics. The reaction rules modeling this case are
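\begin{equation}
\begin{aligned}
\mathrm{FRQin} &\overset{f_{\mathrm{FRQin}}(t)}{\longmapsto} \mathrm{FRQin}\ M, &\qquad M &\overset{0.5}{\longmapsto} M\ \mathrm{FRQ},\\
M &\overset{f_M}{\longmapsto} \bullet, &\qquad \mathrm{FRQ} &\overset{f_d}{\longmapsto} \bullet,\\
\mathrm{FRQ} &\overset{0.5}{\longmapsto} \mathrm{FRQin}, &\qquad \mathrm{FRQin} &\overset{0.6}{\longmapsto} \mathrm{FRQ}
\end{aligned}
\tag{3}
\end{equation}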
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQin}}(t) = V_s(t)\,\dfrac{K_I^{n}}{\mathrm{FRQin}^{n} + K_I^{n}}$ denotes the rate of frq transcription, where FRQin denotes the number of FRQin elements at the moment in which the reaction takes place. The parameter $V_s(t)$, defined by

\begin{equation}
V_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T,
\end{cases}
\qquad (n \ge 0)
\tag{4}
\end{equation}

increases in light conditions; $t$ is the current time of the simulation and $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = V_m\,\dfrac{M}{K_M + M}$. The FRQ degradation is given by the Michaelis rate function $f_d = V_d\,\dfrac{\mathrm{FRQ}}{K_d + \mathrm{FRQ}}$, where $V_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
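The ad hoc rate functions can be written down directly. The following is a minimal sketch, not code taken from the CWC simulator: the function names and argument order are assumptions, the values 160 and 200 come from (4), and the defaults $K_I = 100$ and $n = 4$ as well as the pair $V_d = 140$, $K_d = 13$ used in the example are the parameter values adopted for this model.

```python
def v_s(t, period, dark=160.0, light=200.0):
    """Maximum transcription rate of (4): `dark` on even phases, `light` on odd ones."""
    phase = int(t // period)          # index of the current phase of length `period`
    return dark if phase % 2 == 0 else light

def f_transcription(t, frq_count, period, k_i=100.0, n=4):
    """Hill-type repression of frq transcription by the FRQ atom it reads."""
    return v_s(t, period) * k_i ** n / (frq_count ** n + k_i ** n)

def michaelis(v_max, k, count):
    """Michaelis rate function used for both mRNA and FRQ degradation."""
    return v_max * count / (k + count)

# With 12 h phases, the rate switches from 160 to 200 at t = 12 and back at t = 24.
print(v_s(0.0, 12.0), v_s(12.0, 12.0), v_s(24.0, 12.0))
# Transcription is halved when the repressor count equals the threshold K_I.
print(f_transcription(0.0, 100, 12.0))
# FRQ degradation runs at half its maximum rate when FRQ equals K_d.
print(michaelis(140.0, 13.0, 13))
```

Because these rates depend on the current time and on atom multiplicities in a nonlinear way, they are passed to the simulator as functions rather than as constant rates.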
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $V_m = 50.5$, $V_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$, $K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $V_s$ is equal to 160; in the alternate condition, $V_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which is able to summarize the frequency of the peak events over time. The plot results from capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average, over 5000 simulations, of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damping oscillations, to a period of approximately 24 hours in the dark/light alternate condition.

Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model. (a) Cytosolic FRQ oscillations in light/dark conditions: total ($F_t$) and nuclear ($F_n$) FRQ and FRQ-mRNA ($M$), together with the maximum FRQ transcription rate $V_s$ (nmol/(h*100)), versus time (h). (b) Peaks frequency of the cytosolic FRQ: FRQ-mRNA time period (h) versus time (h), for the dark-only and the alternate 12 h-dark/12 h-light conditions.
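The period measurement described above can be sketched as follows. This is an illustration under simplifying assumptions, not the peak detection filter of the workflow: the input is a noise-free cosine with a 21.5 h period sampled every 0.5 h (a real stochastic trajectory would need smoothing first), the moving average is taken along a single signal rather than across simulations, and all function names are hypothetical.

```python
import math

def local_peaks(times, values):
    """Times of the local maxima of a sampled signal (endpoints excluded)."""
    return [times[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]

def periods(peak_times):
    """Distance between each pair of consecutive peaks."""
    return [b - a for a, b in zip(peak_times, peak_times[1:])]

def moving_average(xs, window):
    """Average of every run of `window` consecutive values."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

# Assumed test signal: 240 h of an oscillation with a period of 21.5 h.
times = [i * 0.5 for i in range(481)]
signal = [math.cos(2 * math.pi * t / 21.5) for t in times]

peak_times = local_peaks(times, signal)
local_periods = periods(peak_times)
smoothed = moving_average(local_periods, window=3)
print(len(peak_times), "peaks; first periods:", local_periods[:3])
```

On a stochastic trajectory the local periods fluctuate from one oscillation to the next, which is why an average over many simulations is plotted in Figure 5(b).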
3.2. Performances

All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.

The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

\begin{equation}
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
\end{equation}

where other nodes are the “generation of simulation tasks,” “alignment of trajectories,” and “generation of sliding windows of trajectories” nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for uniX) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.

As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.

Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.

The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b). (Axes: speedup versus number of simulation engines (sim eng); series: ideal, 240 trajectories, 480 trajectories, and 1200 trajectories.)

The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and of the observed simulation points, which can be considered a (synchronized) sampling, at fixed simulation times, of trajectories. These observations imply the following.

(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).

(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.

(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically exploits the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
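The reproducibility property in point (i) can be illustrated with a sketch in which every trajectory owns a private random generator seeded with a fixed value. This is not FastFlow code: Python threads stand in for the farm of simulation engines, a ±1 random walk stands in for a stochastic simulation, and the names `simulate` and `run_farm` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate(seed, steps=1000):
    """One trajectory driven only by its own seeded generator."""
    rng = random.Random(seed)   # private generator: no state shared among workers
    x, path = 0, []
    for _ in range(steps):
        x += rng.choice((-1, 1))
        path.append(x)
    return path

def run_farm(seeds, workers):
    """Runs one trajectory per seed on a pool of `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the order of the seeds, whatever the
        # order in which the workers complete them.
        return list(pool.map(simulate, seeds))

sequential = run_farm(range(8), workers=1)
parallel = run_farm(range(8), workers=4)
print("identical results:", sequential == parallel)
```

Since each trajectory depends only on its seed, changing the number of workers changes the schedule but not the results, which is what makes runs with different parallelism degrees comparable.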
In multicore platforms, “observing” the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories happening in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the storage of raw simulation results happening among the two pipelines is requested by the user).
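Why merging trajectories reduces the output can be seen in a sketch of an online reduction. It assumes that the samples of all trajectories taken at the same simulation time arrive together as one list (a "cut") and keeps only a running mean and variance computed with Welford's update; the class name, the cut layout, and the sample values are assumptions, and the actual statistic engines compute more than these two quantities.

```python
import statistics

class RunningStats:
    """Welford's online update of mean and sample variance."""
    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

def reduce_cuts(cuts):
    """Consumes a stream of cuts and yields one (mean, variance) pair per cut."""
    for cut in cuts:
        stats = RunningStats()
        for value in cut:
            stats.push(value)
        yield stats.mean, stats.variance

# Hypothetical stream: three sampling times, four trajectories each.
cuts = [[3.0, 5.0, 8.0, 13.0], [4.0, 4.0, 9.0, 15.0], [2.0, 6.0, 7.0, 17.0]]
summary = list(reduce_cuts(cuts))
print(summary[0], statistics.fmean(cuts[0]), statistics.variance(cuts[0]))
```

Each cut of many samples is replaced by a constant number of values, so the amount of data leaving the pipeline no longer grows with the number of trajectories.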
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform).

Single trajectory information             Overall data (20 sim eng, 3 stat eng)
Number of samples   Interarrival time     Throughput    Output size
$10^4$              2586 μs               270 MB/s      8240 MB
$10^5$              278 μs                2859 MB/s     82398 MB
$10^6$              23268 ns              30386 MB/s    824 GB

The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline) whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

\begin{equation}
T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
\end{equation}

where $T_s(S_i)$ models the average interdeparture time of stream items of the stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, the simulation and statistic engines. The service time of a farm can be modeled as

\begin{equation}
T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
\end{equation}
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of ∼5.8 s (∼120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, is consistent with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1 \cdots 32]$ range. The primary reasons for the slight performance drop at the right end of the plots are that more virtual cores (i.e., hyperthread contexts) than physical cores are used and that memory traffic increases for high numbers of trajectories. Furthermore, the performance analysis highlights that the bottleneck of the architecture for high-throughput problems is in the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
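The tuning procedure based on (6) and (7) can be replayed numerically. The sketch below uses the per-trajectory timings quoted for the Neurospora test case with $10^5$ samples; the function names are hypothetical, worker counts are rounded to the nearest integer as in the text, and, like the model itself, the sketch ignores synchronization overheads and memory bandwidth limits.

```python
def pipeline_service_time(stage_times):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)

def farm_service_time(worker_time, workers):
    """Equation (7): a farm of n replicas divides the worker service time by n."""
    return worker_time / workers

def workers_to_match(worker_time, target_time):
    """Number of replicas bringing a farm close to the target service time."""
    return max(1, round(worker_time / target_time))

# Sequential service times per trajectory, in seconds.
seq = {"generation": 0.0, "sim_eng": 5.3, "alignment": 0.11,
       "windows_generation": 0.02, "stat_eng": 0.33}

# Slowest stage that cannot be replicated: the alignment of trajectories.
slowest = max(seq["generation"], seq["alignment"], seq["windows_generation"])

n_sim = workers_to_match(seq["sim_eng"], slowest)
n_stat = workers_to_match(seq["stat_eng"], slowest)

tuned = pipeline_service_time([
    seq["generation"],
    farm_service_time(seq["sim_eng"], n_sim),
    seq["alignment"],
    seq["windows_generation"],
    farm_service_time(seq["stat_eng"], n_stat),
])
speedup_bound = sum(seq.values()) / slowest

print(f"sim eng workers: {n_sim}, stat eng workers: {n_stat}")
print(f"tuned service time: {tuned:.3f} s, speedup bound: {speedup_bound:.1f}")
```

The bound computed from the exact sum of the stage times (5.76 s) is slightly lower than the value obtained in the text from the rounded total of ∼5.8 s.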
However, this simple reasoning does not apply when a large number of trajectories are modeled. In fact, in such cases the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. Such an effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
4. Discussion and Related Works

In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π-calculus models.

Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model-checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].

The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses the online data analysis, and thus it is designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three aspects.

The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.

DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].

StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented within StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.

In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.

The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease the speed efficiency.

An efficient parallelization of Gillespie’s SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).

A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.
Table 2: Biological simulation tools comparison.

Tool              Calculus         Simulation schema                        Parallelism   Data analysis
SCWC              CWC              Gillespie                                FastFlow      Online statistics
SPiM              π-calculus       Gillespie                                None          None
Dizzy             Reaction model   Gillespie, Gibson-Bruck, Tau-Leap, ODE   None          None
BioPEPA           Process algebra  ODE, Gillespie                           None          None
Cellware          Reaction model   Gillespie, Gibson-Bruck, ODE             None          None
DiVinE            Model checker    ODE                                      MPI           None
StochKit          Reaction model   Gillespie, Tau-leaping                   MPI           Postprocessing
StochKit2         Reaction model   Gillespie, Tau-leaping                   Multithread   Postprocessing
StochKit-FF       Reaction model   Gillespie, Tau-leaping                   FastFlow      Online statistics
Hy3S              Reaction model   Gibson-Bruck, Hybrid                     MPI           Postprocessing
Li and Petzold’s  Reaction model   Gillespie                                GPGPU         None
StochSimGPU       Reaction model   Gillespie, Gibson-Bruck, Li              GPGPU         Postprocessing

This paper does not provide computational comparisons with other systems. Actually, a comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative, for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie’s algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie’s algorithm.
5. Conclusions

In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.

We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistic and mining tools that can be executed in parallel and while model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.

Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia, close-to-ideal speedup and efficiency coupled with low code development effort; capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and reduced data size produced with respect to classic simulation-analysis sequential execution.

Many of these results are achieved by means of the FastFlow programming framework, which provides a high-level-of-abstraction parallel programming methodology that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.

The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPL license [23, 49].
Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, “Exact stochastic simulation of coupled chemical reactions,” Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., “Hybrid modeling and simulation of biomolecular networks,” in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC ’01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, “A calculus of looping sequences for modelling microbiological systems,” Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, “Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways,” Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, “Stochastic Bigraphs,” Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, “On process rate semantics,” Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, “Performance models for discrete event systems with synchronisations: formalisms and analysis techniques,” in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, “Modeling and simulation of biological systems with stochasticity,” In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., “Simulation techniques for the calculus of wrapped compartments,” Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, “StochKit-FF: efficient systems biology on multicore architectures,” in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, “On designing multicore-aware simulators for biological systems,” in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, “Multiscale Hy3S: hybrid stochastic simulation for supercomputers,” BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, “Computational study of noise in a large signal transduction network,” BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, “On parallel stochastic simulation of diffusive systems,” in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, “Efficient parallel statistical model checking of biochemical networks,” in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC ’09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., “A view of the parallel computing landscape,” Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, “Skeleton-based parallel programming: functional and parallel semantics in a single shot,” Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, “An efficient unbounded lock-free queue for multi-core systems,” in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, “Efficient Smith-Waterman on multi-core with FastFlow,” in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, “A k-means clustering algorithm,” Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, “Exploring expression data: identification and analysis of coexpressed genes,” Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlögl, “Chemical reaction models for non-equilibrium phase transitions,” Zeitschrift für Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, “Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells,” Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, “Noise-based switches and amplifiers for gene expression,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, “Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora,” Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, “Stream parallel skeleton optimization,” in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS ’99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, “A correct abstract machine for the stochastic pi-calculus,” in Concurrent Models in Molecular Biology (BIOCONCUR ’04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, “Efficient, correct simulation of biological processes in the stochastic Pi-calculus,” in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB ’07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)".
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani, et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo, et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani, et al., "Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments," in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod '11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui, et al., "A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU," in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
chemical notation. We only remark that a distinguishing feature of CWC, which is used in this paper, is the possibility of associating each reaction with an arbitrary rate function that depends on the overall content of the ambient in which the reaction takes place. This makes it possible to tailor the reaction rates to the specific characteristics of the system, for instance, when representing nonlinear reactions such as Michaelis-Menten kinetics.
2. Methods
The methods for the stochastic simulation of biological systems, including Gillespie's algorithm [1], are typically based on the Monte Carlo method. An individual simulation, which tracks the state of the system at each time step, is called a trajectory. Many thousands of independent trajectories might be needed to get a representative picture of how the system behaves as a whole. This behavior emerges from the collective analysis of the trajectories, which is typically carried out by way of statistical or data mining estimators.
Thanks to their independence, the different instances needed to simulate a biological model can be easily computed in an embarrassingly parallel fashion. However, the complete workflow needed to derive simulation results includes additional phases, such as the dispatching and scheduling of simulations, result gathering, trajectory data assembling, and analysis, which exhibit data dependencies (and thus require communication and/or synchronisation). Often, to simplify the design of the simulation tool, these phases are neither parallelized nor considered in the performance evaluation. As a matter of fact, a parallel simulation is often considered an "embarrassingly parallel" problem, whereas this holds only if data distribution, gathering, filtering, and analysis are not considered part of the whole simulation workflow. These phases, often (questionably) regarded as preprocessing and postprocessing, may turn out to be as expensive as the simulation itself. Moreover, the full sequentialization of the phases inhibits the early detection of badly tuned simulations and makes the model tuning spiral annoyingly slow.
As an example, a simulation of the HIV diffusion problem (computed using the StochKit toolkit for 4 years of simulation time) may easily produce over 5 GBytes of raw data per instance [11]. Clearly, the data size grows $n$-fold when $n$ instances are considered. Eventually, this data should be gathered and often reduced to a single trajectory via statistical methods, or analysed with data mining methods, which can be much more time-consuming to compute than bare statistical estimators.
These potential performance flaws are further exacerbated on multicore and many-core platforms. These platforms do not exhibit the same degree of replication of hardware resources that can be found in distributed environments, and even independent processes actually compete for the same hardware resources within the single platform, such as main and secondary memory, the performance of which represents the real challenge for the forthcoming parallel programming models (a.k.a. the memory wall problem). While simulation is substantially a CPU-bound problem on a distributed platform, it may become prevalently an I/O-bound problem on a multicore platform, due to the need to store and postprocess many trajectories. The finer the observed simulation time step, the more strongly the computational problem is characterized as I/O-bound.
To be effective, stochastic methods in systems biology require many trajectories with a fine-grain resolution in order to make observable deviant trajectories, peaks, high variance of results, and multistable behaviors, which often represent the real nature of phenomena that are not well captured by traditional approaches such as ODEs. These events are often not immediate to detect in the bulk of raw simulation results. Several techniques for analyzing such data, for example, principal component analysis, linear modeling, and canonical correlation analysis, have been proposed. It can be imagined that next-generation software tools for natural sciences should be able to perform this kind of processing in pipeline with the running data source, as a partially or totally online process, because
(i) it will be needed to manage an ever-increasing amount of experimental data, coming either from measurement or from simulation, and
(ii) it will substantially improve the overall experimental workflow by providing natural scientists with almost real-time feedback, enabling the early tuning or sweeping of the experimental parameters and thus improving scientific productivity.
The flexibility given by the possibility of running many different analysis modules in parallel is of particular interest, since in many biological case studies the pattern sought in the experimental results is unknown, and finding it might require trying different kinds of analysis, which is practical because the modules can be swept in parallel.
The parallel analysis of the system dynamics (e.g., along time) is more challenging, since online data processing requires statistical and data mining operators to work on streamed data. In general, random access to data is guaranteed only within a limited window of the whole dataset, while already accessed data can only be stored in synthesized form. When data description techniques that require access to the whole dataset in random order cannot be used, online data description and mining require novel algorithms. The extensive study of these algorithms is an emerging topic in the data discovery community and is beyond the scope of this work.
We advocate the parallelization of both simulation and analysis by pipelining them in a two-stage workflow in which both stages are themselves parallel. We also advocate the high-level design of the whole workflow to enhance both productivity and efficiency on different platforms (i.e., performance portability).
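The two-stage organisation can be illustrated with a minimal sketch based on Python generators. This is not the FastFlow implementation: the stages below run sequentially, the random-walk dynamics stand in for a real stochastic simulator, and the names `simulate_cuts`, `analyse`, and `run` are hypothetical. The sketch only shows that the analysis stage can consume partial results as soon as the simulation stage produces them.

```python
import random
import statistics


def simulate_cuts(n_trajectories, n_steps, seed=0):
    """Simulation stage: yield one cut (all trajectories at one time step) at a time."""
    rng = random.Random(seed)
    state = [0.0] * n_trajectories
    for step in range(n_steps):
        # Toy dynamics (an assumption): each trajectory is an independent random walk.
        state = [x + rng.gauss(0.0, 1.0) for x in state]
        yield step, list(state)


def analyse(cuts):
    """Analysis stage: consume cuts as they arrive and emit partial results."""
    for step, values in cuts:
        yield step, statistics.fmean(values), statistics.pvariance(values)


def run(n_trajectories=100, n_steps=5):
    # The analysis generator pulls cuts lazily, so the full dataset is never stored.
    return list(analyse(simulate_cuts(n_trajectories, n_steps)))
```

Each element returned by `run` is a `(step, mean, variance)` triple that could be displayed while later steps are still being simulated.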
2.1. A Parallel Simulation-Analysis Workflow for CWC. The intrinsic complexity in the parallelization of the single step has traditionally led to the exploitation of parallelism in the computation of independent instances of the same simulation, which should anyway be computed to achieve statistical convergence of simulated trajectories (as in all Monte Carlo methods). The problem is well understood; it has been exploited in the last two decades in many different flavors and distributed computing environments, from clusters to grids to clouds [8, 11–16]. However, the problem has often been approached either neglecting the cost of analysis or assuming that output data has a negligible size; this is not likely to hold for the current and next generations of biological simulations.
Figure 1: CWC simulator with online parallel analysis architecture. [Diagram labels: main pipeline; simulation pipeline (generation of simulation tasks; farm of simulation engines "sim eng" with dispatch and gather; incomplete simulation tasks fed back with load balancing; alignment of trajectories); analysis pipeline (generation of sliding windows of trajectories; farm of statistical engines "stat eng" with dispatch and gather; mean, variance, K-means); raw simulation results; filtered simulation results; display of results in the graphical user interface; start new simulations, steer and terminate running simulations.]
The previous considerations have led to the design of a simulation tool that includes both parallel simulation and data analysis in a single workflow. These phases are pipelined rather than sequential. To make this possible, all the logical phases of the process (i.e., data distribution, parallel simulation, result gathering, parallel trajectory data assembling, and analysis) must be effectively pipelined. This implies that all phases can effectively work on data streams. Ideally, an efficient and portable implementation of the simulation-analysis workflow should be able to represent streams as a first-class concept and provide designers with high-level programming constructs to manage them efficiently (also on multicore platforms).
To date, application programming for bioinformatics (and other sciences) has not embraced much more than low-level communication and synchronization libraries. In the hierarchy of abstractions, this is only slightly above toggling absolute binary on the front panel of the machine. The advent and the pervasiveness of multicore platforms are pushing parallel programming outside its historical niche. Next-generation software should be designed not only to be efficient on these platforms but also to be developed and tuned with high productivity and reduced time to market. This is particularly important when parallel computing serves as a tool for other sciences, since nonexpert designers should be able to experiment with different algorithmic solutions for both simulation and data analysis.
Attempts to reduce the programming effort by raising the level of abstraction through the provision of parallel programming frameworks date back at least three decades and have resulted in a number of significant contributions. Notable among these is the skeletal approach [17] (a.k.a. pattern-based parallel programming), which appears to be becoming increasingly popular after being revamped by several successful parallel programming frameworks [18–20]. Skeletons (a.k.a. patterns) capture common parallel programming paradigms (e.g., Map, Reduce, MapReduce, Pipeline, Farm, and Divide&Conquer) and make them available to the programmer as high-level programming constructs equipped with well-defined functional and extrafunctional semantics [21]. Some of them, such as the pipeline, farm (a.k.a. master-worker), and loop patterns, are specifically designed to manage streams as first-class objects.
A particularly efficient implementation of the described patterns is provided by the FastFlow parallel programming framework [22]. FastFlow is a C++ open-source template library aimed at simplifying the development of efficient applications for multicore platforms. FastFlow eases development and guarantees runtime efficiency by raising the abstraction level of the design phase, thus providing developers with a set of optimised parallel programming patterns [22, 23]. Runtime efficiency is mainly achieved by way of a lock-less implementation of the patterns, which exhibits a speed edge against other popular parallel programming frameworks (see also the performance comparisons in [24]). The FastFlow implementation of the pipeline, loop, and farm patterns is exploited to design the CWC simulation workflow. The effectiveness of the FastFlow framework in high-frequency synchronizations (i.e., fine-grained tasks) at a high level of abstraction is the key feature for devising a portable and efficient implementation of the simulation workflow, that is, the CWC simulator with online parallel analysis, whose architecture is composed of a three-stage pipeline: simulation, analysis, and display of results. The former two stages are in turn pipelines of other stages, whereas the display of results is realized by way of a graphical user interface (GUI). The big picture of the simulation workflow is shown in Figure 1. In the picture, all the grey boxes, as well as all the code needed for synchronization and data streaming (double-headed arrows), are automatically generated by the FastFlow framework. The implementation of the whole software actually consists in declaring the structure of the workflow in terms of FastFlow objects (i.e., farms and pipelines) and filling the white boxes with sequential code. All data is passed by reference, so that no data copies in memory are needed.
2.1.1. The Simulation Pipeline. The simulation pipeline, as shown in Figure 1, is composed of three main stages: (i) generation of simulation tasks, (ii) farm of simulation engines, and (iii) alignment of trajectories.
The input of the simulation pipeline (from the GUI or a file) is the model to be simulated and the parameters of the simulation. The output is a stream of simulation results, each of them being an array holding a point for each of the trajectories of all (independent) simulations, aligned at a given simulation time. Each array thus represents a cut, at a given simulation time, of the whole dataset of results. This does not necessarily represent the current status (at a given point in wall-clock time) of all running simulations. By their very nature, stochastic processes exhibit an irregular behavior in space and time, since different simulations may cover the same simulation timespan following many different randomly chosen paths and numbers of iterations. Therefore, parallelization tools should support the dynamic and active balancing of the workload across the involved cores. This mainly motivates the structure of the simulation pipeline. The first stage generates a number of independent simulation tasks, each of them wrapped in a C++ object. These objects are passed to the farm of simulation engines, which dispatches them (on demand) to a number of simulation engines (sim eng). Each simulation engine brings forward a simulation for a given amount of simulation time (the simulation quantum) and then reschedules the simulation back along the feedback channel. Simulation results produced in this quantum are streamed toward the next stage, which sorts out all the received results and aligns them according to simulation time. Once all simulation tasks have passed a given simulation time, an array of results is produced and streamed to the analysis pipeline.
In this process, the farm scheduling policy prioritizes "slow" simulation tasks in such a way that the simulation tasks proceed with simulation times that are as aligned as possible. This solves the load balancing problem by keeping all simulation engines always busy, and it reduces to a minimum the transient storage of incomplete results, thus reducing the shared memory traffic.
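The scheduling and alignment policy described above can be sketched as follows. This is a sequential, illustrative model only: it assumes that simulation time is measured in integer sample indices, that each task is a toy random walk, and that the function name `run_farm` and its parameters are hypothetical. A priority queue always serves the task that is furthest behind, each task advances by one quantum and is fed back, and a cut is emitted as soon as every task has produced the corresponding sample.

```python
import heapq
import random


def run_farm(n_sims, n_samples, quantum, seed=0):
    """Sequential sketch of slowest-first scheduling plus trajectory alignment."""
    rng = random.Random(seed)
    # Priority queue keyed on the simulation time reached (here: samples produced).
    ready = [(0, sim_id) for sim_id in range(n_sims)]
    heapq.heapify(ready)
    state = [0.0] * n_sims   # toy model state (assumption): one random walk per task
    pending = {}             # sample index -> {sim_id: value} for incomplete cuts
    cuts = []                # aligned cuts, emitted in simulation-time order
    max_pending = 0          # peak number of incomplete cuts held in memory
    while ready:
        done, sim_id = heapq.heappop(ready)        # slowest task first
        stop = min(done + quantum, n_samples)
        for k in range(done, stop):                # run one simulation quantum
            state[sim_id] += rng.gauss(0.0, 1.0)
            row = pending.setdefault(k, {})
            row[sim_id] = state[sim_id]
            if len(row) == n_sims:                 # every task has passed sample k
                del pending[k]
                cuts.append((k, [row[i] for i in range(n_sims)]))
        max_pending = max(max_pending, len(pending))
        if stop < n_samples:
            heapq.heappush(ready, (stop, sim_id))  # feedback channel
    return cuts, max_pending
```

In this toy setting the number of incomplete cuts never exceeds the quantum length, which illustrates why prioritizing slow tasks keeps the transient storage small.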
2.1.2. The Analysis Pipeline. By design, each snapshot at a given simulation time of all simulation trajectories (i.e., an array of simulation results) can be analyzed immediately and independently (thus concurrently) of the others. For example, the mean and variance (as well as other statistical estimators) can be immediately computed and streamed out to the display stage. More complex analyses, that is, the ones aimed at understanding system dynamics, have further requirements. In the most general case, they require access to the whole dataset. Unfortunately, this can hardly be done with a fully online process. In many cases, however, it is possible to derive reasonable approximations of these analyses from a sliding window over the whole dataset (e.g., for trajectory clustering). For this purpose, the stream entering the analysis pipeline is passed through a stage that creates a stream of (partially overlapping) sliding windows of trajectory cuts. Each sliding window can then be processed in parallel and is therefore dispatched to a farm of statistical engines. Results are collected and reordered (i.e., gathered) and streamed toward the user interface and the permanent storage.
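The window-generation stage can be sketched as a small generator over the stream of cuts. The function name `sliding_windows` and the `stride` parameter (how many new cuts arrive between two consecutive windows) are assumptions made for illustration; the paper does not specify them.

```python
from collections import deque


def sliding_windows(cuts, width, stride=1):
    """Yield overlapping windows (lists of `width` consecutive cuts) from a stream."""
    window = deque(maxlen=width)   # keeps only the most recent `width` cuts
    since_last = 0                 # cuts received since the last emitted window
    for cut in cuts:
        window.append(cut)
        since_last += 1
        if len(window) == width and since_last >= stride:
            since_last = 0
            yield list(window)     # copy, so each window can be processed independently
```

Because each yielded window is an independent list, windows could be handed to different workers without further coordination.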
The analysis pipeline is provided with three families of predefined estimators covering the most common instruments of statistical analysis. Thanks to the high-level modular design of the simulation pipeline, the analysis pipeline can be easily extended with new filters. The filters currently included in the system are as follows.
(1) Mean, Standard Deviation, and Quantiles. These standard statistical estimators are typically used to evaluate, both qualitatively and quantitatively, the behavior of stable systems and the reliability of the stochastic models used for their simulation. Quantiles are also often useful to approximate the distribution of simulation trajectories over time, as they act as a histogram that summarizes the involved quantities without the effects of long-tailed asymmetric distributions or outliers. In those cases, in fact, descriptive statistics such as the mean may fail to capture a central tendency.
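As a minimal sketch of this family of estimators, the following function summarizes one cut with the Python standard library. The function name `summarize_cut` and the choice of quartiles with the "inclusive" interpolation method are assumptions made for the example, not details taken from the tool.

```python
import statistics


def summarize_cut(values):
    """Return mean, standard deviation, and quartiles of one cut of trajectories."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),   # population standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
    }
```

For a bimodal cut such as `[80] * 5 + [560] * 5`, the mean falls between the two groups and matches no trajectory, whereas the first and third quartiles land on the two groups.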
(2) Trajectory Clustering. The clustering of trajectories helps the analysis of biological systems exhibiting multistable behavior. Each cluster can automatically separate and distinguish different cases, which can then be analyzed by statistical estimators. Concentrations of elements at a given instant from all simulations are numerically filtered from stochastic noise, and the global trends are extrapolated from the clusters. In this work, we employed two clustering techniques: K-means [25] and quality threshold (QT) [26] clustering. The clustering procedure collects the filtered data contained in a sliding time window $\Delta_W$ centered on the current data point $x_i \equiv f(t_i)$, where $t_i \equiv t_0 + i\Delta_S$ (and $\Delta_S$ is a constant sampling time), for all simulation trajectories, together with the extrapolated forecast point $x_i^E$, which refers to the local trend and uses the information of the Savitzky-Golay filter [27], that is, a low-pass filter suitable for data smoothing. The main idea underlying Savitzky-Golay filtering is to find filter coefficients $c_n$ that preserve higher moments, that is, to approximate the underlying function within the moving window not by a constant (whose estimate is the average) but by a polynomial of higher order. This scheme also allows the computation of numerical derivatives by considering the coefficients of the derived polynomial.
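The idea of Savitzky-Golay filtering can be made concrete with the classical 5-point, quadratic-fit case, whose convolution coefficients are tabulated in [27]: $(-3, 12, 17, 12, -3)/35$ for smoothing and $(-2, -1, 0, 1, 2)/(10\,\Delta_S)$ for the first derivative. The window size, the polynomial order, and the function name `savgol5` are assumptions chosen for illustration; the settings used by the tool are not stated here.

```python
SMOOTH_5 = (-3, 12, 17, 12, -3)   # smoothing coefficients, to be divided by 35
DERIV_5 = (-2, -1, 0, 1, 2)       # derivative coefficients, to be divided by 10 * dt


def savgol5(samples, dt=1.0):
    """Return (smoothed, derivative) for the interior points of an evenly sampled series."""
    smoothed, derivative = [], []
    for i in range(2, len(samples) - 2):
        window = samples[i - 2:i + 3]
        smoothed.append(sum(c * y for c, y in zip(SMOOTH_5, window)) / 35.0)
        derivative.append(sum(c * y for c, y in zip(DERIV_5, window)) / (10.0 * dt))
    return smoothed, derivative
```

Because the window is fitted by a quadratic rather than by a constant, a noise-free parabola passes through the filter unchanged and its derivative is recovered exactly at the interior points.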
(3) Peak/Frequency Detection. Many processes in living organisms are oscillatory. For these kinds of systems, the analysis must focus on the recurrence of phenomena, for instance, concentration spikes or peaks of biological quantities, which also makes it possible to determine the frequency of occurrence of a given phenomenon. Peak detection is basically performed by locating the local maxima of a continuous curve, which are in turn detected through the analysis of the derivatives of the curve estimated by the Savitzky-Golay filter. From the period between successive peaks, the frequency of the related event is then inferred.
Figure 2: Screenshots of the simulation tool interface.
Usage examples for each family of filters are discussed in the "Experiments and Results" section.
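The peak/frequency detection described in item (3) can be sketched as follows, under stated assumptions: the derivative is estimated with the 5-point quadratic Savitzky-Golay coefficients, a peak is a positive-to-nonpositive sign change of that derivative, and the peak time is refined by linear interpolation. The function names and the interpolation step are illustrative choices, not the tool's documented procedure.

```python
import math


def sg_derivative(samples, dt):
    """First derivative at interior points (5-point quadratic Savitzky-Golay)."""
    coeffs = (-2, -1, 0, 1, 2)
    return [sum(c * y for c, y in zip(coeffs, samples[i - 2:i + 3])) / (10.0 * dt)
            for i in range(2, len(samples) - 2)]


def peak_times(samples, dt=1.0):
    """Times of local maxima: derivative changes sign from positive to nonpositive."""
    d = sg_derivative(samples, dt)
    peaks = []
    for k in range(1, len(d)):
        if d[k - 1] > 0.0 and d[k] <= 0.0:
            frac = d[k - 1] / (d[k - 1] - d[k])   # linear interpolation of the zero
            # d[k - 1] belongs to sample index k + 1 (the filter drops two border samples).
            peaks.append((k + 1 + frac) * dt)
    return peaks


def periods(peaks):
    """Distances between consecutive peaks; the frequency is the reciprocal."""
    return [b - a for a, b in zip(peaks, peaks[1:])]
```

For example, `periods(peak_times([math.sin(2 * math.pi * t / 24.0) for t in range(80)]))` yields values close to 24 for this noise-free sine wave sampled once per time unit.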
2.1.3. The Graphical User Interface. The CWC simulation-analysis pipeline is wrapped in a back-end tool that can be steered either via a command-line tool or via a graphical user interface. This makes it possible to design the biological model, to run simulations and analyses, and to view partial results during the run. The front-end also makes it possible to control the simulation workflow from a remote machine. Two screenshots of the graphical front-end are reported in Figure 2.
3. Experiments and Results
The evaluation of the integrated approach focuses on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems are discussed, that is, bistable/multistable and oscillatory systems. The key behavior of these systems is studied by way of the different classes of online analysis tools introduced in the previous section, in particular statistical description, clustering, and peak/frequency detection.
3.1. Expressivity and Effectiveness
3.1.1. Multistable Biological System (Schlögl Model). One of the most studied examples of bistability is the Schlögl model [28]. The simplicity of this network makes it an ideal prototype to show the effectiveness of the online clustering techniques on the filtered trajectories in the presence of bimodality. The set of reaction rules modeling this system is
$$A\ A \overset{0.03}{\longmapsto} A\ A\ A, \qquad A\ A\ A \overset{0.0001}{\longmapsto} A\ A, \qquad B \overset{200}{\longmapsto} B\ A, \qquad A \overset{3.5}{\longmapsto} \bullet, \tag{1}$$
where the rules are decorated with the kinetic constants of the corresponding reactions, and all reaction rates are evaluated according to the mass action law.
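To illustrate how rule set (1) gives rise to bistability, the following sketch evaluates mass-action propensities and runs Gillespie's direct method on species $A$ with a single buffered $B$. Two points are assumptions rather than statements from the text: the kinetic constants are read as 0.03, 0.0001, 200, and 3.5, and the mass action law is applied by counting the distinct combinations of reactant atoms. The names `propensities`, `drift`, and `gillespie` are hypothetical.

```python
import random

# Kinetic constants as read from rule set (1) (assumed values).
K1, K2, K3, K4 = 0.03, 0.0001, 200.0, 3.5
STOICHIOMETRY = (+1, -1, +1, -1)   # change in the number of A for each rule


def propensities(a, b=1):
    """Mass-action propensities, counting combinations of reactant atoms (assumption)."""
    return (
        K1 * a * (a - 1) / 2.0,            # A A   -> A A A
        K2 * a * (a - 1) * (a - 2) / 6.0,  # A A A -> A A
        K3 * b,                            # B     -> B A  (B is buffered)
        K4 * a,                            # A     -> (nothing)
    )


def drift(a):
    """Expected net change of A per unit time in state a."""
    return sum(s * p for s, p in zip(STOICHIOMETRY, propensities(a)))


def gillespie(a0=250, t_end=1.0, seed=0):
    """Gillespie's direct method for the number of A molecules."""
    rng = random.Random(seed)
    t, a = 0.0, a0
    trajectory = [(t, a)]
    while True:
        props = propensities(a)
        total = sum(props)               # always positive, since K3 * b > 0
        t += rng.expovariate(total)      # exponentially distributed waiting time
        if t > t_end:
            return trajectory
        r = rng.random() * total         # pick a rule with probability prop / total
        acc = 0.0
        for s, p in zip(STOICHIOMETRY, props):
            acc += p
            if r < acc:
                a += s
                break
        trajectory.append((t, a))
```

Under these assumptions the drift is negative at $A = 100$, close to zero at $A = 250$, and positive at $A = 400$, so trajectories started at 250 can be driven by noise toward either a low or a high steady state.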
The number of molecules of the species $B$ is kept constant (buffered), while at equilibrium the species $A$ displays a noise-induced switching between the two stable steady states (see Figure 3). This case is paradigmatic in showing that the simple mean and standard deviation are not adequate to summarize the overall behavior, since the mean is not representative of any simulation trajectory.
Figure 3 shows the resulting clusters computed online using K-means on the Schlögl model for species $A$ over 480 stochastic simulations starting with the term $B\ 250 * A$. The line width in the K-means plot is proportional to the size of each cluster.
Figure 3: Simulation results on the Schlögl model. The figures report the mean and standard deviation, two illustrative raw simulation trajectories, and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b). [Plot panels: $A$, Mean $A$, Deviation, and K-means $A$ (series $A_0$, $A_1$); axes: population versus time.]
3.1.2. Multistable System (Bacteriophage λ Life Cycle). One of the best studied examples of multistability in genetic systems is the bacteriophage λ life cycle [29]. This process involves two different biological entities delimited by membranes: the phage and the bacterium. Lambda phage is a virus consisting of a head, containing a double-stranded linear DNA, and a tail. The phage recognizes and binds to its host, Escherichia coli, causing the DNA in the head of the phage to be ejected through the tail into the cytoplasm of the bacterial cell. After this, it can enter one of two alternative stages, called lysogeny and lysis. The lysogenic stage is a dormant stage in which the phage DNA is inserted into the host DNA and passively reproduces with the host. The only protein expressed in this phase is the λ repressor $CI$. When the host becomes stressed, the phage is more likely to go into lysis, in which case it produces more phages, kills the host, and spreads to other bacteria. The decision between lysis and lysogeny can be thought of as a switching mechanism. A simplified model for the bacteriophage was proposed in [30]. In that model, the gene cI expresses the λ repressor ($CI$), which dimerises ($CI_2$) and binds to DNA ($D$) as a transcription factor at either of the two binding sites. The binding of the transcription factor to the site enhancing the transcription of $CI$ (positive feedback) is represented by $D{+}CI_2$. The phage DNA in state $D{+}CI_2$ leads to the lysogenic stage. The binding of the transcription factor to the site repressing the transcription of $CI$ (negative feedback) is denoted by $D{-}CI_2$. The notation $D{+}CI_2D{-}CI_2$ models the phage DNA when both sites are bound ($CI_2$ can bind to the repressing site also when another $CI_2$ dimer is bound to the promoting site, with a global repressing effect). The reaction rules in this system are
$$\begin{aligned}
CI\ CI &\overset{0.05}{\longmapsto} CI_2, & CI_2 &\overset{0.5}{\longmapsto} CI\ CI,\\
CI_2\ D &\overset{0.026}{\longmapsto} D{+}CI_2, & D{+}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
CI_2\ D &\overset{0.026}{\longmapsto} D{-}CI_2, & D{-}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
D{+}CI_2\ CI_2 &\overset{0.13}{\longmapsto} D{+}CI_2D{-}CI_2, & D{+}CI_2D{-}CI_2 &\overset{0.13}{\longmapsto} D{+}CI_2\ CI_2,\\
D{+}CI_2\ P &\overset{40}{\longmapsto} D{+}CI_2\ P\ CI_2\ CI_2, & CI &\overset{0.0007}{\longmapsto} \bullet,
\end{aligned} \tag{2}$$
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model, the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).
Figure 4: Simulation results on the λ-phage model. (a) reports (approximately) the 480 raw trajectories, and (b) shows the online clustering results using QT. [Plot axes: (a) number of molecules ($CI$) versus time (min); (b) QT clusters with trends ($CI$) versus time (min).]
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations starting with the term $10 * CI\ D\ P$. Circle diameters are proportional to the size of each cluster, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance. In the other cases, QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
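The property that QT does not need the number of clusters in advance can be seen in a small one-dimensional sketch of the quality threshold procedure of [26]: every point seeds a candidate cluster that grows greedily while its diameter stays within a threshold, the largest candidate is kept, its points are removed, and the procedure repeats. The function name `qt_cluster`, the use of scalar values, and the diameter defined as maximum minus minimum are simplifying assumptions; the tool's actual distance measure on trajectories is not described here.

```python
def qt_cluster(values, max_diameter):
    """Quality-threshold clustering of scalar values (illustrative, cubic-time sketch)."""
    remaining = list(values)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            others = list(remaining)
            others.remove(seed)
            lo = hi = seed
            while others:
                # Choose the point whose addition gives the smallest cluster diameter.
                nxt = min(others, key=lambda v: max(hi, v) - min(lo, v))
                new_lo, new_hi = min(lo, nxt), max(hi, nxt)
                if new_hi - new_lo > max_diameter:
                    break                      # quality threshold would be exceeded
                candidate.append(nxt)
                others.remove(nxt)
                lo, hi = new_lo, new_hi
            if len(candidate) > len(best):
                best = candidate               # keep the largest candidate cluster
        clusters.append(sorted(best))
        for v in best:
            remaining.remove(v)
    return clusters
```

With a threshold of 10, the values `[80, 82, 85, 560, 565, 300]` split into three clusters, including a singleton for the isolated value, without the number of clusters being specified.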
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. $\mathrm{FRQ_{in}}$ represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC that allows computing the rate of a reaction with an ad hoc function used to represent nonstandard kinetics. The reaction rules modeling this case are
$$\begin{aligned}
\mathrm{FRQ_{in}} &\overset{f_{\mathrm{FRQ_{in}}}(t)}{\longmapsto} \mathrm{FRQ_{in}}\ M, & M &\overset{0.5}{\longmapsto} M\ \mathrm{FRQ},\\
M &\overset{f_M}{\longmapsto} \bullet, & \mathrm{FRQ} &\overset{f_d}{\longmapsto} \bullet,\\
\mathrm{FRQ} &\overset{0.5}{\longmapsto} \mathrm{FRQ_{in}}, & \mathrm{FRQ_{in}} &\overset{0.6}{\longmapsto} \mathrm{FRQ}.
\end{aligned} \tag{3}$$
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}}(t) = v_s(t)\,\bigl(K_I^n / (\mathrm{FRQ}^n + K_I^n)\bigr)$ denotes the rate of frq transcription, where FRQ denotes the number of FRQ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by

\[
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T, \\
200 & \text{when } (2n+1)T \le t < (2n+2)T
\end{cases}
\qquad (n \ge 0),
\tag{4}
\]

increases in light conditions of the current time of the simulation, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m\,(M / (K_M + M))$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d\,(\mathrm{FRQ} / (K_d + \mathrm{FRQ}))$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
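The ad hoc rate functions described above translate almost literally into code. The following is a minimal sketch, assuming Python as a neutral notation; the function and argument names are hypothetical and are not part of the CWC simulator. The values 160 and 200 come from equation (4); every other parameter is left as an explicit argument.

```python
def v_s(t, period_T, dark=160.0, light=200.0):
    """Maximum transcription rate of equation (4): dark phase on even
    half-cycles [2nT, (2n+1)T), light phase on odd ones."""
    phase = int(t // period_T)
    return dark if phase % 2 == 0 else light


def f_frq(t, frq, period_T, K_I, n):
    """Hill-type repression of frq transcription; `frq` is the current
    multiplicity of the FRQ atom."""
    return v_s(t, period_T) * K_I**n / (frq**n + K_I**n)


def f_m(m, v_m, K_M):
    """Michaelis rate of mRNA degradation."""
    return v_m * m / (K_M + m)


def f_d(frq, v_d, K_d):
    """Michaelis rate of FRQ degradation."""
    return v_d * frq / (K_d + frq)


# With no repressor the transcription rate equals v_s(t); with FRQ equal
# to K_I it is halved, whatever the Hill coefficient n.
print(f_frq(0.0, 0, period_T=12.0, K_I=100.0, n=4))
print(f_frq(0.0, 100, period_T=12.0, K_I=100.0, n=4))
```

The half-period of 12 used in the usage lines is an assumption made for illustration, as is the choice to pass the multiplicity as a plain number rather than reading it from a simulator state.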
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$, $K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which is able to summarize the frequency of the peak events over time. The plot is obtained by capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average over 5000 simulations of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damping oscillations with a period of approximately 24 hours, in the dark/light alternate condition.

Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model. (a) Cytosolic FRQ oscillations in light/dark conditions: total (Ft) and nuclear (Fn) FRQ and FRQ-mRNA (M), together with the maximum FRQ transcription rate Vs (nmol/(h*100)), against time (h). (b) Peaks frequency of the cytosolic FRQ: FRQ-mRNA time period (h) against time (h), for the dark-only and the alternate 12 h dark/12 h light conditions.
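The period measurement described above (peaks, distance between consecutive peaks, average over simulations) can be sketched as follows. This is an illustration only: the windowed local-maximum test stands in for the actual peak detection filter, the plain per-cycle mean stands in for the moving average, and all names are hypothetical.

```python
import math


def peak_times(times, values, half_window=2):
    """Times of strict local maxima within +/- half_window samples."""
    peaks = []
    for i in range(half_window, len(values) - half_window):
        window = values[i - half_window:i + half_window + 1]
        if values[i] == max(window) and window.count(values[i]) == 1:
            peaks.append(times[i])
    return peaks


def local_periods(peaks):
    """Distance between each pair of consecutive peaks."""
    return [later - earlier for earlier, later in zip(peaks, peaks[1:])]


def mean_periods(all_periods):
    """Average the k-th local period over the trajectories that have one."""
    longest = max(len(p) for p in all_periods)
    means = []
    for k in range(longest):
        kth = [p[k] for p in all_periods if len(p) > k]
        means.append(sum(kth) / len(kth))
    return means


# Synthetic stand-in for a simulated mRNA curve: a noiseless oscillation
# with a 24 h period, sampled once per hour for four days.
times = list(range(0, 97))
values = [math.sin(2 * math.pi * t / 24) for t in times]
print(local_periods(peak_times(times, values)))
```

On real stochastic trajectories the curve would need smoothing before the local-maximum test, otherwise noise produces spurious peaks; the half-window width plays that role only crudely here.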
3.2. Performances. All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

\[
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
\]

where other nodes are the "generation of simulation tasks," "alignment of trajectories," and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for uniX) thread using a nonblocking execution model.
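As a quick check of the node count in equation (5), the following sketch (the function name is hypothetical) computes the number of threads for a few degrees of parallelism of the simulation farm.

```python
def fastflow_nodes(sim_eng, stat_eng=3, other_nodes=3, support_nodes=4):
    """Total FastFlow nodes (one thread each), following equation (5)."""
    return sim_eng + stat_eng + other_nodes + support_nodes


for sim_eng in (1, 16, 32):
    print(sim_eng, fastflow_nodes(sim_eng))
```

With 32 simulation engines the workflow runs 42 threads, which exceeds the 32 physical cores of the test platform but not its 64 hardware contexts.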
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.
As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.
The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and of the observed simulation points, which can be considered a (synchronized) sampling at fixed simulation times of trajectories. These observations imply the following.
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines (sim eng), with 3 statistical engines, for different numbers of trajectories (240, 480, and 1200, compared with the ideal speedup), each of them counting $10^4$ points (a) and $10^5$ points (b).
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
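Point (i) can be illustrated with a toy experiment: if every trajectory owns a private random generator seeded from its identifier, the results do not depend on how many workers execute the trajectories. The sketch below is an assumption-laden stand-in, not the FastFlow implementation: it uses a simple random walk instead of a CWC model and Python threads instead of FastFlow workers, and all names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(trajectory_id, steps=100, base_seed=12345):
    """Toy stochastic trajectory with a private, per-trajectory generator."""
    rng = random.Random(base_seed + trajectory_id)
    t, x = 0.0, 0
    samples = []
    for _ in range(steps):
        t += rng.expovariate(1.0)            # exponential waiting time
        x += 1 if rng.random() < 0.5 else -1  # one reaction fires
        samples.append((t, x))
    return samples


def run(n_trajectories, n_workers):
    """Run all trajectories on a pool of n_workers threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(simulate, range(n_trajectories)))


# Same trajectories, whatever the degree of parallelism.
print(run(8, 1) == run(8, 4))
```

The design choice that matters is that no generator is shared among workers: a shared generator would make each trajectory depend on the scheduling order.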
This latter point is directly tied to the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories, which happens in the simulation pipeline, the output size and thus the required disk throughput are greatly reduced (unless the user requests the storage of raw simulation results, which happens between the two pipelines).
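The reduction in output size comes from summarizing, at each sampling time, the values of all trajectories instead of storing them. A minimal sketch of such an in-memory reduction is given below, assuming Welford's one-pass update for mean and variance; the class and function names are hypothetical and the actual statistic engines may be organized differently.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def reduce_cuts(cuts):
    """For each cut (the values of all trajectories at one sampling time)
    yield a (mean, variance) pair, so raw values need not be stored."""
    for values in cuts:
        stats = RunningStats()
        for x in values:
            stats.push(x)
        yield stats.mean, stats.variance


# Three sampling times, four trajectories each: 12 raw values in,
# 3 (mean, variance) pairs out.
cuts = [[10, 12, 11, 9], [20, 22, 19, 23], [15, 15, 14, 16]]
for mean, var in reduce_cuts(cuts):
    print(round(mean, 2), round(var, 2))
```

The summary grows with the number of sampling times only, not with the number of trajectories, which is the property the paragraph above relies on.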
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). The first two columns report single trajectory information; the last two report overall data (20 sim eng, 3 stat eng).

| Number of samples | Interarrival time | Throughput | Output size |
|---|---|---|---|
| $10^4$ | 2586 μs | 270 MB/s | 8240 MB |
| $10^5$ | 278 μs | 2859 MB/s | 82398 MB |
| $10^6$ | 23268 ns | 30386 MB/s | 824 GB |

The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline) whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

\[
T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
\]

where $T_s(S_i)$ models the average interdeparture time of stream items of the stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, the simulation and statistic engines. The service time of a farm can be modeled as

\[
T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
\]
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of ∼5.8 s (∼120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 = 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, is consistent with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1 \cdots 32]$ range. The slight performance drop at the right end of the plots is primarily due to the fact that more virtual cores (i.e., hyperthread contexts) than physical cores are used and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights the bottleneck of the architecture for high-throughput problems, which is the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
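The tuning procedure above is simple enough to be written down directly. The following sketch applies equations (6) and (7) to the per-trajectory timings quoted for the $10^5$ samples test case; the function and variable names are hypothetical, and rounding the worker count to the nearest integer is an assumption that matches the ≈48 quoted in the text.

```python
def pipeline_service_time(stage_times):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)


def farm_service_time(worker_time, n_workers):
    """Equation (7): n replicas divide the service time of one worker."""
    return worker_time / n_workers


# Per-trajectory timings in seconds (10^5 samples test case).
sequential = {"generation": 0.0, "alignment": 0.11, "windows generation": 0.02}
farm_workers = {"sim eng": 5.3, "stat eng": 0.33}

# The slowest sequential stage cannot be sped up by adding workers.
bottleneck = pipeline_service_time(sequential.values())

# Size each farm so that its service time matches the bottleneck.
workers = {name: round(t / bottleneck) for name, t in farm_workers.items()}
tuned = [farm_service_time(t, workers[name]) for name, t in farm_workers.items()]

# Upper bound: total sequential time over the bottleneck service time.
total = sum(sequential.values()) + sum(farm_workers.values())
speedup_bound = total / bottleneck

print(workers)
print(round(pipeline_service_time(list(sequential.values()) + tuned), 3))
print(round(speedup_bound, 1))
```

The bound computed from the unrounded total of 5.76 s is slightly above 52, in line with the ≈53 obtained in the text from the rounded total of ∼5.8 s.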
However, this simple reasoning does not apply when a large number of trajectories is modeled. In fact, in such cases the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. This effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
4. Discussion and Related Works

In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π-calculus models.
Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].
The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses online data analysis and is thus designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease efficiency.
An efficient parallelization of Gillespie's SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).
A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.
This paper does not provide computational comparisons with other systems. Actually, a comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative, for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.

Table 2: Biological simulation tools comparison.

| Tool | Calculus | Simulation schema | Parallelism | Data analysis |
|---|---|---|---|---|
| SCWC | CWC | Gillespie | FastFlow | Online statistics |
| SPiM | π-calculus | Gillespie | None | None |
| Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None |
| BioPEPA | Process algebra | ODE, Gillespie | None | None |
| Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None |
| DiVinE | Model checker | ODE | MPI | None |
| StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing |
| StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing |
| StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics |
| Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing |
| Li and Petzold's | Reaction model | Gillespie | GPGPU | None |
| StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing |
5. Conclusions

In this paper we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistic and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia, close to ideal speedup and efficiency coupled with low code development effort, the capability to manage high data throughput (thus enhancing the precision of simulated models and interactivity), and reduced data size with respect to the classic sequential execution of simulation followed by analysis.
Many of these results are achieved by means of the FastFlow programming framework, which provides a high-level parallel programming methodology that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interestsregarding the publication of this paper
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: Hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlogl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift fur Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)."
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., "Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments," in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod '11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., "A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU," in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
Figure 1: CWC simulator with online parallel analysis architecture. The main pipeline is composed of a simulation pipeline (generation of simulation tasks; a farm of sim eng workers between dispatch and gather nodes, with a feedback of incomplete simulation tasks for load balancing; alignment of trajectories) and an analysis pipeline (generation of sliding windows of trajectories; a farm of stat eng workers between dispatch and gather nodes, computing, e.g., mean, variance, and K-means). Raw simulation results flow from the simulation pipeline to the analysis pipeline, and filtered simulation results are displayed in the graphical user interface, from which the user can start new simulations and steer and terminate running simulations.
computation of independent instances of the same simulation, which should anyway be computed to achieve statistical convergence of simulated trajectories (as in all Monte Carlo methods). The problem is well understood; it has been exploited in the last two decades in many different flavors and distributed computing environments, from clusters to grids to clouds [8, 11–16]. However, the problem has often been approached either by neglecting the cost of analysis or by assuming that output data has a negligible size; neither assumption is likely to hold in this and the next generations of biological simulations.
The previous considerations have led to the design of a simulation tool that includes both parallel simulation and data analysis in a single workflow. These phases are pipelined rather than sequential. To make this possible, all the logical phases of the process (i.e., data distribution, parallel simulation, result gathering, parallel trajectory data assembling, and analysis) must be effectively pipelined. This implies that all phases can effectively work on data streams. Ideally, an efficient and portable implementation of the simulation-analysis workflow should be able to represent streams as a first-class concept and provide the designers with high-level programming constructs to manage them efficiently (also on multicore platforms).
To date, application programming for bioinformatics (and other sciences) has not embraced much more than low-level communication and synchronization libraries. In the hierarchy of abstractions, it is only slightly above toggling absolute binary in the front panel of the machine. The advent and the pervasiveness of multicore platforms are pushing parallel programming outside its historical niche. Next generation software should be designed not only to be efficient on these platforms but also to be developed and tuned with high productivity and reduced time-to-market. This is particularly important when parallel computing serves as a tool for other sciences, since nonexpert designers should be able to experiment with different algorithmic solutions for both simulations and data analysis.
Attempts to reduce the programming effort by raising the level of abstraction through the provision of parallel programming frameworks date back at least three decades and have resulted in a number of significant contributions. Notable among these is the skeletal approach [17] (aka pattern-based parallel programming), which appears to be becoming increasingly popular after being revamped by several successful parallel programming frameworks [18–20]. Skeletons (aka patterns) capture common parallel programming paradigms (e.g., Map, Reduce, MapReduce, Pipeline, Farm, and Divide&Conquer) and make them available to the programmer as high-level programming constructs equipped with well-defined functional and extrafunctional semantics [21]. Some of them are specifically designed to manage streams as first-class objects, such as the pipeline, farm (aka master-worker pattern), and loop patterns.
A particularly efficient implementation of the described patterns is provided by the FastFlow parallel programming framework [22]. FastFlow is a C++ open-source template library aimed at simplifying the development of efficient applications for multicore platforms. FastFlow eases development and guarantees run-time efficiency by raising the abstraction level of the design phase, thus providing developers with a set of optimised parallel programming patterns [22, 23]. Run-time efficiency is mainly achieved by way of a lock-less implementation of patterns, which exhibits a speed edge against other popular parallel programming frameworks (see also the performance comparisons in [24]). The FastFlow implementation of the pipeline, loop, and farm patterns is exploited to design the CWC simulation workflow. The effectiveness of the FastFlow framework in high-frequency synchronizations (i.e., fine-grained tasks) at a high level of abstraction is the key feature to devise a portable and efficient implementation of the simulation workflow, that is, the CWC simulator with online parallel analysis, whose architecture is composed of a three-stage pipeline: simulation, analysis, and display of results. The former two stages are in turn pipelines of other stages, whereas the display of results is realized by way of a graphical user interface (GUI). The big picture of the simulation workflow is shown in Figure 1. In the picture, all the grey boxes, as well as all the code needed for synchronization and data streaming (double-headed arrows), are automatically generated by the FastFlow framework. The implementation of the whole software actually consists in declaring the structure of the workflow in terms of FastFlow objects (i.e., farms and pipelines) and filling white boxes with sequential code. All data is passed by reference, in such a way that no data copies in memory are needed.
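To make the "declare the structure, fill the boxes with sequential code" idea concrete, the following minimal sketch chains three stages as Python generators. It is only an illustration of streaming between stages: it does not use FastFlow (a C++ library), it runs sequentially, and the function names and the toy trajectory values are assumptions introduced here.

```python
from statistics import mean

def simulation_stage(n_trajectories, n_cuts):
    """Toy stand-in for the simulation pipeline: one cut per sampling time."""
    for t in range(n_cuts):
        # Hypothetical deterministic "trajectories": value = trajectory index + t.
        yield t, [traj + t for traj in range(n_trajectories)]

def analysis_stage(cuts):
    """Consumes cuts as they arrive and streams out one partial result per cut."""
    for t, values in cuts:
        yield t, mean(values)

def display_stage(results):
    """Last stage: collects what a user interface would draw."""
    return list(results)

# The workflow is declared by composing the stages; items flow one at a time.
shown = display_stage(analysis_stage(simulation_stage(n_trajectories=4, n_cuts=3)))
```

Because generators are lazy, each cut is analyzed as soon as it is produced, which mirrors the streaming behaviour described above (without the parallelism that a real pipeline pattern would add).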
2.1.1. The Simulation Pipeline. The simulation pipeline, as shown in Figure 1, is composed of three main stages: (i) generation of simulation tasks, (ii) farm of simulation engines, and (iii) alignment of trajectories.
The input of the simulation pipeline (from the GUI or a file) is the model to be simulated and the parameters of the simulation. The output is a stream of simulation results, each of them being an array holding a point for each of the trajectories of all (independent) simulations, aligned at a given simulation time. Each array thus represents a cut, at a given simulation time, of the whole dataset of results. This does not necessarily represent the current status (at a given point in wall-clock time) of all running simulations. By their very nature, stochastic processes exhibit an irregular behavior in space and time, since different simulations may cover the same simulation timespan following many different randomly chosen paths and numbers of iterations. Therefore, parallelization tools should support the dynamic and active balancing of workload across the involved cores. This mainly motivates the structure of the simulation pipeline. The first stage generates a number of independent simulation tasks, each of them wrapped in a C++ object. These objects are passed to the farm of simulation engines, which dispatches them (on demand) to a number of simulation engines (sim eng). Each simulation engine brings forward a simulation for a given simulation time (simulation quantum) and then reschedules the simulation back along the feedback channel. Simulation results produced in this quantum are streamed toward the next stage, which sorts out all the received results and aligns them according to simulation time. Once all simulation tasks have passed a given simulation time, an array of results is produced and streamed to the analysis pipeline.
In this process, the farm scheduler prioritizes “slow” simulation tasks in such a way that the simulation tasks proceed with simulation times that are as aligned as possible. This solves the load balancing problem by keeping all simulation engines busy and minimizes the transient storage of incomplete results, thus reducing the shared memory traffic.
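The scheduling policy just described (advance a task by one quantum, send it back along the feedback channel, and always resume the task that is furthest behind) can be sketched with a priority queue. This is a sequential illustration under stated assumptions: the function `run_farm`, its parameters, and the use of a heap are introduced here and are not taken from the simulator's code.

```python
import heapq

def run_farm(n_tasks, t_end, quantum):
    """Advance all tasks to t_end in quanta, resuming the slowest task first."""
    times = {task_id: 0.0 for task_id in range(n_tasks)}
    # Heap of (simulation time reached, task id): the slowest task is on top.
    ready = [(0.0, task_id) for task_id in range(n_tasks)]
    heapq.heapify(ready)
    max_spread = 0.0
    while ready:
        sim_time, task_id = heapq.heappop(ready)
        # Stand-in for one simulation quantum executed by a simulation engine.
        sim_time = min(t_end, sim_time + quantum)
        times[task_id] = sim_time
        if sim_time < t_end:
            heapq.heappush(ready, (sim_time, task_id))  # feedback channel
        # Largest misalignment observed between the fastest and slowest task.
        max_spread = max(max_spread, max(times.values()) - min(times.values()))
    return times, max_spread

times, max_spread = run_farm(n_tasks=5, t_end=2.0, quantum=0.5)
```

With this policy the tasks never drift apart by more than one quantum, which is what keeps the number of incomplete cuts waiting in the alignment stage small.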
2.1.2. The Analysis Pipeline. By design, each snapshot at a given simulation time of all simulation trajectories (i.e., an array of simulation results) can be analyzed immediately and independently of (thus concurrently with) the others. For example, the mean and variance (as well as other statistical estimators) can be immediately computed and streamed out to the display stage. More complex analyses, that is, the ones aimed at understanding system dynamics, have further requirements. In the most general case, they require access to the whole dataset. Unfortunately, this can hardly be done with a fully online process. In many cases, however, it is possible to derive reasonable approximations of these analyses from a sliding window of the whole dataset (e.g., for trajectory clustering). For this, the stream incoming into the analysis pipeline is passed through a stage that creates a stream of (partially overlapping) sliding windows of trajectory cuts. Each sliding window can be processed in parallel and is therefore dispatched to a farm of statistic engines. Results are collected and reordered (i.e., gathered) and streamed toward the user interface and the permanent storage.
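The two kinds of analysis described above can be sketched as follows: per-cut estimators that need only one cut, and a stage that turns the stream of cuts into overlapping windows. This is a minimal sketch; the helper names are hypothetical, and the windows here are trailing windows of the last `width` cuts, which is a simplifying assumption rather than the windowing used by the tool.

```python
from collections import deque
from statistics import mean, pvariance

def cut_statistics(cut):
    """Mean and (population) variance of one cut, computable as soon as it arrives."""
    return mean(cut), pvariance(cut)

def sliding_windows(cuts, width):
    """Yield partially overlapping windows made of the last `width` cuts."""
    window = deque(maxlen=width)
    for cut in cuts:
        window.append(cut)
        if len(window) == width:
            yield list(window)

# Four cuts of two trajectories each (toy values).
cuts = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]]
stats = [cut_statistics(cut) for cut in cuts]
windows = list(sliding_windows(cuts, width=3))
```

Each yielded window is an independent unit of work, so windows can be handed to different statistic engines and the results reordered afterwards.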
The analysis pipeline is provided with three families of predefined estimators covering the most common instruments of statistical analysis. Thanks to the high-level modular design of the simulation pipeline, it can be easily extended with new filters. The filters currently included in the system are as follows.
(1) Mean, Standard Deviation, and Quantiles. These standard statistical estimators are typically used to evaluate, both qualitatively and quantitatively, the behavior of stable systems and the reliability of the stochastic models used for their simulation. Quantiles are also often useful to approximate the distribution of simulation trajectories over time, as they yield a histogram that summarizes the involved quantities without the effects of long-tailed asymmetric distributions or outliers. In those cases, descriptive statistics may fail to reveal a central tendency.
(2) Trajectory Clustering. The clustering of trajectories helps the analysis of biological systems exhibiting a multistable behavior. Each cluster can automatically separate and distinguish different cases, which can then be analyzed by statistical estimators. Concentrations of elements at a given instant from all simulations are numerically filtered from stochastic noise, and the global trends are extrapolated from clusters. In this work we employed two clustering techniques: K-means [25] and quality threshold (QT) [26] clustering. The clustering procedure collects the filtered data contained in a sliding time window $\Delta_W$ centered on the current data point $x_i \equiv f(t_i)$, where $t_i \equiv t_0 + i\Delta_S$ (and $\Delta_S$ is a constant sampling time), for all simulation trajectories, together with the extrapolated forecast point $x_i^E$, which refers to the local trend and is obtained using the information of the Savitzky-Golay filter [27], a low-pass filter suitable for data smoothing. The main idea underlying Savitzky-Golay filtering is to find filter coefficients $c_n$ that preserve higher moments, that is, to approximate the underlying function within the moving window not by a constant (whose estimate is the average) but by a polynomial of higher order. This scheme also allows the computation of numerical derivatives by considering the coefficients of the derived polynomial.
(3) Peak/Frequency Detection. Many processes in living organisms are oscillatory. For these kinds of systems, the analysis must be focused on the recurrence of phenomena, for instance, concentration spikes or peaks of biological quantities, which also make it possible to determine the frequency of occurrence of a given phenomenon. The peak detection is basically performed by way of the analysis of the local maxima in a continuous curve, which are in turn detected through the analysis of the derivatives of the curve estimated by the Savitzky-Golay filter. From the period between successive peaks, the frequency of the related event is then inferred.

Figure 2: Screenshots of the simulation tool interface.
The usage examples of each family of filters will be discussed in the “results” section.
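The peak/frequency detection of item (3) can be sketched as follows. The sketch assumes a 5-point Savitzky-Golay first-derivative filter obtained from a quadratic least-squares fit, whose coefficients are (-2, -1, 0, 1, 2)/(10·Δ) for sampling step Δ; the window size, the function names, and the sinusoidal test signal are choices made here for illustration, not details taken from the tool.

```python
import math

# 5-point Savitzky-Golay first-derivative coefficients (quadratic fit).
DERIV = (-2, -1, 0, 1, 2)

def savgol_derivative(values, dt):
    """Derivative estimate at interior samples 2 .. len(values) - 3."""
    return [sum(c * values[i + k - 2] for k, c in enumerate(DERIV)) / (10 * dt)
            for i in range(2, len(values) - 2)]

def peak_times(values, dt):
    """Times where the estimated derivative changes from positive to non-positive."""
    deriv = savgol_derivative(values, dt)
    peaks = []
    for j in range(1, len(deriv)):
        if deriv[j - 1] > 0 >= deriv[j]:
            # deriv[j] belongs to sample j + 2: the first sample at or after the crossing.
            peaks.append((j + 2) * dt)
    return peaks

def mean_period(peaks):
    """Average distance between consecutive peaks."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)

dt = 0.5
signal = [math.sin(2 * math.pi * (i * dt) / 24.0) for i in range(200)]
peaks = peak_times(signal, dt)
period = mean_period(peaks)
```

On this noise-free signal with a 24-unit period the recovered period is within one sampling step of 24; on stochastic trajectories a wider window would normally be needed to suppress spurious sign changes.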
2.1.3. The Graphical User Interface. The CWC simulation-analysis pipeline is wrapped in a back-end tool that can be steered either via a command line tool or via a graphical user interface. This makes it possible to design the biological model, to run simulations and analysis, and to view partial results during the run. Also, the front-end makes it possible to control the simulation workflow from a remote machine. Two screenshots of the graphical front-end are reported in Figure 2.
3. Experiments and Results
The evaluation of the integrated approach focuses on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems are discussed, that is, bistable/multistable and oscillatory systems. The key behavior of these systems is studied by way of the different classes of online analysis tools introduced in the previous section, in particular, statistical description, clustering, and peak/frequency detection.
3.1. Expressivity and Effectiveness
3.1.1. Multistable Biological System (Schlögl Model). One of the most studied examples of bistability is the Schlögl model [28]. The simplicity of this network makes it an ideal prototype to show the effectiveness of the online clustering techniques on the filtered trajectories in the presence of bimodality. The set of reaction rules modeling this system is
$$
\begin{aligned}
A\;A &\overset{0.03}{\longmapsto} A\;A\;A, & A\;A\;A &\overset{0.0001}{\longmapsto} A\;A,\\
B &\overset{200}{\longmapsto} B\;A, & A &\overset{3.5}{\longmapsto} \bullet
\end{aligned}
\tag{1}
$$
where the rules are decorated with the kinetic constants of the corresponding reactions and all reaction rates are evaluated according to the mass action law.
The number of molecules of the species $B$ is kept constant (buffered), while at equilibrium the species $A$ displays a noise-induced switching between the two stable steady states (see Figure 3). This case is paradigmatic in showing that the simple mean and standard deviation are not adequate to summarize the overall behavior: the mean is not representative of any simulation trajectory.
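A single stochastic trajectory of this model can be produced with a direct-method Gillespie simulation, sketched below. Two points are assumptions made for the sketch and are not stated in the text: the mass action rate of a rule with repeated reactants is taken as the kinetic constant times the number of distinct reactant combinations (binomial counting), and the single buffered $B$ is held fixed at multiplicity 1. The function names are hypothetical.

```python
import random

# Net change in the number of A molecules for each rule, in the order
# A A -> A A A,  A A A -> A A,  B -> B A,  A -> (nothing).
CHANGES = (+1, -1, +1, -1)

def propensities(a, b=1):
    """Mass action rates; reactant combinations are counted as binomials (assumption)."""
    return [
        0.03 * a * (a - 1) / 2,
        0.0001 * a * (a - 1) * (a - 2) / 6,
        200 * b,
        3.5 * a,
    ]

def simulate(a0, t_end, rng):
    """Direct-method stochastic simulation of the number of A molecules."""
    t, a = 0.0, a0
    trajectory = [(t, a)]
    while True:
        rates = propensities(a)
        t += rng.expovariate(sum(rates))  # waiting time to the next reaction
        if t >= t_end:
            return trajectory
        rule = rng.choices(range(4), weights=rates)[0]  # which rule fires
        a += CHANGES[rule]
        trajectory.append((t, a))

trajectory = simulate(a0=250, t_end=1.0, rng=random.Random(1))
```

Running many such trajectories with different seeds and cutting them at common simulation times gives the kind of dataset on which the mean alone is uninformative, since trajectories settle around one of two distinct levels.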
Figure 3 shows the resulting clusters computed online using K-means on the Schlögl model for species $A$ over 480 stochastic simulations starting with the term $B\ 250 \ast A$. The line width in the K-means plot is proportional to each cluster size.
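As an illustration of how K-means separates a bimodal cut into two groups, the following minimal one-dimensional sketch applies Lloyd's iteration to the values of one cut. The sample values, the initial centroids, and the function name are hypothetical and chosen only to mimic two well-separated levels.

```python
def kmeans_1d(points, centroids, iterations=20):
    """Plain Lloyd iteration on scalar values with a fixed number of clusters."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
    # `clusters` holds the assignment computed in the last iteration.
    return centroids, clusters

# One cut of a toy bimodal population (values are illustrative only).
cut = [80, 85, 90, 560, 565, 570]
centroids, clusters = kmeans_1d(cut, centroids=[0.0, 750.0])
sizes = [len(c) for c in clusters]
```

The cluster sizes are the quantity that a plot like Figure 3 would map to line width, while the centroids give the level of each branch.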
Figure 3: Simulation results on the Schlögl model. The figures report the mean and standard deviation, two exemplificative raw simulation trajectories, and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b). [Plots of population versus time.]

3.1.2. Multistable System (Bacteriophage λ Life Cycle). One of the best studied examples of multistability in genetic systems is the bacteriophage λ life cycle [29]. This process involves two different biological entities delimited by membranes: the phage and the bacterium. Lambda phage is a virus consisting of a head, containing a double-stranded linear DNA, and a tail. The phage recognizes and binds to its host, Escherichia coli, causing the DNA in the head of the phage to be ejected through the tail into the cytoplasm of the bacterial cell. After this, it can enter one of two alternative stages, called lysogeny and lysis. The lysogenic stage is a dormant stage in which the phage DNA is inserted into the host DNA and passively reproduces with the host. The only protein expressed in this phase is the λ repressor $CI$. When the host becomes stressed, the phage is more likely to go into lysis, in which case it reproduces more phages, kills the host, and spreads to other bacteria. The decision between lysis and lysogeny can be thought of as a switching mechanism. A simplified model for the bacteriophage was proposed in [30]. In that model, the gene cI expresses the λ repressor ($CI$), which dimerises ($CI_2$) and binds to DNA ($D$) as a transcription factor at either of the two binding sites. The binding of the transcription factor to the site enhancing the transcription of $CI$ (positive feedback) is represented by $D^{+}CI_2$. The phagic DNA in state $D^{+}CI_2$ leads to the lysogenic stage. The binding of the transcription factor to the site repressing the transcription of $CI$ (negative feedback) is denoted by $D^{-}CI_2$. The notation $D^{+}CI_2D^{-}CI_2$ models the phagic DNA when both sites are bound ($CI_2$ can bind to the repressing site also when another $CI_2$ dimer is bound to the promoting site, with a global repressing effect). The reaction rules in this system are
$$
\begin{aligned}
CI\;CI &\overset{0.05}{\longmapsto} CI_2, & CI_2 &\overset{0.5}{\longmapsto} CI\;CI,\\
CI_2\;D &\overset{0.026}{\longmapsto} D^{+}CI_2, & D^{+}CI_2 &\overset{0.026}{\longmapsto} CI_2\;D,\\
CI_2\;D &\overset{0.026}{\longmapsto} D^{-}CI_2, & D^{-}CI_2 &\overset{0.026}{\longmapsto} CI_2\;D,\\
D^{+}CI_2\;CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2D^{-}CI_2, & D^{+}CI_2D^{-}CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2\;CI_2,\\
D^{+}CI_2\;P &\overset{40}{\longmapsto} D^{+}CI_2\;P\;CI_2\;CI_2, & CI &\overset{0.0007}{\longmapsto} \bullet
\end{aligned}
\tag{2}
$$
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model, the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).

Figure 4: Simulation results on the λ-phage model. (a) reports (approximately) the 480 raw trajectories and (b) shows the online clustering results using QT. [Panel (a), λ-phage model simulation: number of molecules ($CI$) versus time (min). Panel (b), QT clustering: QT clusters with trends ($CI$) versus time (min).]
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations starting with the term $10 \ast CI\ D\ P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases, QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
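The difference from K-means can be seen in a minimal sketch of quality threshold clustering on scalar values: the user fixes a maximum cluster diameter instead of the number of clusters. The sketch follows the usual greedy QT scheme (grow a candidate cluster around every point, keep the largest, remove its points, repeat); the function name, the sample values, and the threshold are assumptions made here.

```python
def qt_cluster(points, max_diameter):
    """Greedy quality-threshold clustering of scalar values."""
    remaining = list(points)
    clusters = []
    while remaining:
        best = []
        for i, seed in enumerate(remaining):
            candidate = [seed]
            pool = remaining[:i] + remaining[i + 1:]
            while pool:
                # Add the point that keeps the candidate's diameter smallest.
                nxt = min(pool, key=lambda p: max(candidate + [p]) - min(candidate + [p]))
                if max(candidate + [nxt]) - min(candidate + [nxt]) > max_diameter:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        clusters.append(sorted(best))
        for p in best:
            remaining.remove(p)
    return clusters

# One toy cut: the number of clusters is not specified in advance.
clusters = qt_cluster([5, 6, 7, 40, 42, 100], max_diameter=5)
```

Every point starts a candidate cluster in every round, which is why this scheme costs more than K-means, as noted above; in exchange, an isolated value such as 100 ends up in its own cluster instead of distorting a centroid.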
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. $\mathrm{FRQ}_{in}$ represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC which allows computing the rate of some reactions with an ad hoc function used to represent nonstandard kinetics. The reaction rules modeling this case are
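$$
\begin{aligned}
\mathrm{FRQ}_{in} &\overset{f_{\mathrm{FRQ}_{in}}(t)}{\longmapsto} \mathrm{FRQ}_{in}\;M, & M &\overset{0.5}{\longmapsto} M\;\mathrm{FRQ},\\
M &\overset{f_M}{\longmapsto} \bullet, & \top\;\mathrm{FRQ} &\overset{f_d}{\longmapsto} \bullet,\\
\mathrm{FRQ} &\overset{0.5}{\longmapsto} \mathrm{FRQ}_{in}, & \mathrm{FRQ}_{in} &\overset{0.6}{\longmapsto} \mathrm{FRQ}
\end{aligned}
\tag{3}
$$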
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}}(t) = v_s(t)\,(K_I^n/(\mathrm{FRQ}^n + K_I^n))$ denotes the rate of frq transcription, where FRQ denotes the number of FRQ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by

$$
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T
\end{cases}
\qquad (n \ge 0),
\tag{4}
$$

increases in light conditions at the current time of the simulation, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m\,(M/(K_M + M))$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d\,(\mathrm{FRQ}/(K_d + \mathrm{FRQ}))$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
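The ad hoc rate functions described above can be written down directly. The sketch below uses the values quoted in this section for the dark and light transcription rates (160 and 200), $K_I = 100$, $n = 4$, $v_d = 140$, and $K_d = 13$; the function and argument names are hypothetical, and passing the repressor count as a plain argument is a simplification made here.

```python
def v_s(t, period, dark=160.0, light=200.0):
    """Equation (4): dark rate on [2nT, (2n+1)T), light rate on [(2n+1)T, (2n+2)T)."""
    return dark if int(t // period) % 2 == 0 else light

def f_frq(t, frq, period, k_i=100.0, n=4):
    """Hill-type repression of frq transcription by `frq` repressor molecules."""
    return v_s(t, period) * k_i ** n / (frq ** n + k_i ** n)

def michaelis(v_max, k, x):
    """Michaelis rate function v_max * x / (k + x), used for f_M and f_d."""
    return v_max * x / (k + x)

# With T = 12 h the phases alternate dark, light, dark, ...
rates = [v_s(t, period=12) for t in (0, 12, 24)]
# At frq == K_I the transcription rate is half of its unrepressed value.
half_repressed = f_frq(0.0, frq=100, period=12)
# At x == K_d the degradation rate is half of its maximum v_d.
half_degradation = michaelis(140.0, 13.0, 13.0)
```

These functions would be evaluated with the current multiplicities each time the propensity of the corresponding rule is recomputed, in place of a constant kinetic rate.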
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$, $K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which is able to summarize the frequency of the peak events over time. The plot is obtained by capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. By measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average, over 5000 simulations, of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damping oscillations with a period of approximately 24 hours, in the dark/light alternate condition.

Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model. [Panel (a), cytosolic FRQ oscillations in light/dark conditions: total ($F_t$) and nuclear ($F_n$) FRQ and FRQ-mRNA ($M$), together with the maximum FRQ transcription rate $v_s$, versus time (h). Panel (b), peaks frequency of the cytosolic FRQ: FRQ-mRNA time period (h) versus time (h), for the dark-only and the alternate 12 h-dark/12 h-light conditions.]
3.2. Performances. All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.

The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

$$
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
$$

where other nodes are the “generation of simulation tasks,” “alignment of trajectories,” and “generation of sliding windows of trajectories” nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for Unix) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.

As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.

The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines (sim eng), with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b). [Curves: ideal, 240 trajectories, 480 trajectories, and 1200 trajectories.]

The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and of the observed simulation points, which can be considered a (synchronized) sampling of trajectories at fixed simulation times. These observations imply the following.

(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically concerns the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, “observing” the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the user requests the storage of the raw simulation results exchanged between the two pipelines).
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform).

Single trajectory information            Overall data (20 sim eng, 3 stat eng)
Number of samples   Interarrival time    Throughput    Output size
10^4                2586 μs              270 MB/s      8240 MB
10^5                278 μs               2859 MB/s     82398 MB
10^6                23268 ns             30386 MB/s    824 GB

The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline) whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

$$
T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
$$

where $T_s(S_i)$ models the average interdeparture time of stream items of stage $i$ of the pipeline, which actually matches the average computation time needed by $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker $W$, for example, simulation and statistic engines. The service time of a farm can be modeled as

$$
T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
$$
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of $\sim$5.8 s ($\sim$120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained from the total execution time and the service time of the slowest stage; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions of both pipeline and farm parallelism. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, agrees with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1 \cdots 32]$ range. The slight performance drop at the right end of the plots is primarily due to the use of more virtual cores (i.e., hyperthread contexts) than physical cores and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights that the bottleneck of the architecture for high-throughput problems is the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
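The tuning procedure of this paragraph is simple arithmetic on equations (6) and (7) and can be replayed in a few lines. The per-trajectory timings are the ones quoted in the text; the function and variable names are introduced here for illustration.

```python
def pipeline_service_time(stage_times):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)

def farm_service_time(worker_time, n_workers):
    """Equation (7): n replicas of a worker divide its service time by n."""
    return worker_time / n_workers

# Sequential per-trajectory timings quoted in the text (seconds).
ts = {"generation": 0.0, "sim_eng": 5.3, "alignment": 0.11,
      "windows": 0.02, "stat_eng": 0.33}

# Slowest stage that cannot be replicated (the sequential stages).
bottleneck = max(ts["generation"], ts["alignment"], ts["windows"])

# Workers needed for each farm to match the bottleneck.
n_sim = round(ts["sim_eng"] / bottleneck)
n_stat = round(ts["stat_eng"] / bottleneck)

# Service time of the tuned workflow and the resulting speedup bound.
tuned = pipeline_service_time([
    ts["generation"],
    farm_service_time(ts["sim_eng"], n_sim),
    ts["alignment"],
    ts["windows"],
    farm_service_time(ts["stat_eng"], n_stat),
])
max_speedup = sum(ts.values()) / bottleneck
```

With the unrounded total of 5.76 s the bound evaluates to about 52.4; the text rounds the total to 5.8 s and therefore reports approximately 53.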
However, this simple reasoning does not apply when a large number of trajectories is modeled. In such cases, the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. This effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
4. Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic $\pi$-calculus models.
Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].
The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses online data analysis, and thus it is designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14] a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease the speed efficiency.
An efficient parallelization of Gillespie’s SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).
A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2
This paper does not provide computational comparisons with other systems. Actually, a comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie’s algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie’s algorithm.

Table 2: Biological simulation tools comparison.

| Tool | Calculus | Simulation schema | Parallelism | Data analysis |
| SCWC | CWC | Gillespie | FastFlow | Online statistics |
| SPiM | $\pi$-calculus | Gillespie | None | None |
| Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None |
| BioPEPA | Process algebra | ODE, Gillespie | None | None |
| Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None |
| DiVinE | Model checker | ODE | MPI | None |
| StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing |
| StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing |
| StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics |
| Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing |
| Li and Petzold’s | Reaction model | Gillespie | GPGPU | None |
| StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing |
5. Conclusions
In this paper we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistical and mining tools that can be executed in parallel while model simulation is still ongoing. In this respect our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia, close to ideal speedup and efficiency coupled with low code development effort, the capability to manage high data throughput (thus enhancing the precision of simulated models and interactivity), and a reduced size of the data produced with respect to classic sequential simulation-analysis execution.
Many of these results are achieved by means of the FastFlow programming framework, which provides a parallel programming methodology with a high level of abstraction that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interestsregarding the publication of this paper
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, “Exact stochastic simulation of coupled chemical reactions,” Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., “Hybrid modeling and simulation of biomolecular networks,” in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC ’01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, “A calculus of looping sequences for modelling microbiological systems,” Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, “Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways,” Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, “Stochastic Bigraphs,” Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, “On process rate semantics,” Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, “Performance models for discrete event systems with synchronisations: formalisms and analysis techniques,” in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, “Modeling and simulation of biological systems with stochasticity,” In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., “Simulation techniques for the calculus of wrapped compartments,” Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, “StochKit-FF: efficient systems biology on multicore architectures,” in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, “On designing multicore-aware simulators for biological systems,” in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, “Multiscale Hy3S: hybrid stochastic simulation for supercomputers,” BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, “Computational study of noise in a large signal transduction network,” BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, “On parallel stochastic simulation of diffusive systems,” in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, “Efficient parallel statistical model checking of biochemical networks,” in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC ’09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., “A view of the parallel computing landscape,” Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, “Skeleton-based parallel programming: functional and parallel semantics in a single shot,” Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, “An efficient unbounded lock-free queue for multi-core systems,” in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, “Efficient Smith-Waterman on multi-core with FastFlow,” in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, “A k-means clustering algorithm,” Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, “Exploring expression data: identification and analysis of coexpressed genes,” Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlögl, “Chemical reaction models for non-equilibrium phase transitions,” Zeitschrift für Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, “Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells,” Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, “Noise-based switches and amplifiers for gene expression,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, “Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora,” Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, “Stream parallel skeleton optimization,” in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS ’99), pp. 955–962, IASTED, ACTA press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, “A correct abstract machine for the stochastic pi-calculus,” in Concurrent Models in Molecular Biology (BIOCONCUR ’04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, “Efficient, correct simulation of biological processes in the stochastic Pi-calculus,” in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB ’07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, “Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material),” Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, “Bio-PEPA: an extension of the process algebra PEPA for biochemical networks,” Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop “From Biology to Concurrency and back (FBTC 2007)”.
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, “The Bio-PEPA tool suite,” in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST ’09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, “Engineering design optimization using a swarm with an intelligent information sharing among individuals,” Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., “Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes,” Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, “High-performance analysis of biological systems dynamics with the DiVinE model checker,” Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, “StochKit: stochastic simulation kit web page,” 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, “Rule-based spatial modeling with diffusing, geometrically constrained molecules,” BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, “Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit,” International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, “STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB,” Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, “Processes in space,” in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., “A spatial calculus of wrapped compartments,” in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC ’11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., “Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments,” in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod ’11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., “A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU,” in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW ’12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator.
white boxes with sequential code. All data are passed by reference, so that no data copies in memory are needed.
2.1.1. The Simulation Pipeline. The simulation pipeline, as shown in Figure 1, is composed of three main stages: (i) generation of simulation tasks, (ii) farm of simulation engines, and (iii) alignment of trajectories.
The input of the simulation pipeline (from the GUI or a file) is the model to be simulated and the parameters of the simulation. The output is a stream of simulation results, each of them being an array holding a point for each of the trajectories of all (independent) simulations, aligned at a given simulation time. Actually, each array represents a cut, at a given simulation time, of the whole dataset of results. This does not necessarily represent the current status (at a given point in wall-clock time) of all running simulations. By their very nature, stochastic processes exhibit an irregular behavior in space and time, since different simulations may cover the same simulation timespan following many different randomly chosen paths and numbers of iterations. Therefore, parallelization tools should support the dynamic and active balancing of workload across the involved cores. This mainly motivates the structure of the simulation pipeline. The first stage generates a number of independent simulation tasks, each of them wrapped in a C++ object. These objects are passed to the farm of simulation engines, which dispatches them (on demand) to a number of simulation engines (sim eng). Each simulation engine brings forward a simulation for a given simulation time (simulation quantum) and then reschedules the simulation back along the feedback channel. Simulation results produced in this quantum are streamed toward the next stage, which sorts out all the received results and aligns them according to simulation time. Once all simulation tasks have passed a given simulation time, an array of results is produced and streamed to the analysis pipeline.
In this process, the farm scheduling prioritizes “slow” simulation tasks in such a way that the simulation tasks proceed with simulation times that are as aligned as possible. This solves the load balancing problem by keeping all simulation engines always busy and reduces to a minimum the transient storage of incomplete results, thus reducing the shared memory traffic.
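The scheduling and alignment logic described above can be illustrated with a small single-threaded Python sketch. It is not the FastFlow implementation: the class and function names are hypothetical, the toy jump process merely stands in for a real simulation engine, and the farm is emulated by a priority queue that always resumes the task with the smallest simulation time (the “slow” one). Each task advances by one quantum, emits samples on a fixed time grid, and is rescheduled until it reaches the end time; the aligner releases a cut as soon as every trajectory has delivered its point for that sampling instant.

```python
import heapq
import random


class SimTask:
    """Toy stand-in for one simulation: a +1/-1 jump process."""

    def __init__(self, task_id, seed, rate=5.0):
        self.task_id = task_id
        self.rng = random.Random(seed)
        self.rate = rate
        self.time = 0.0          # simulation time reached so far
        self.value = 0           # current state of the toy process
        self.next_sample = 0     # index of the next sampling instant to emit
        self.next_event = self.rng.expovariate(rate)


def run_quantum(task, quantum, sample_dt, end_time):
    """Advance a task by one quantum; return its (sample_index, value) pairs."""
    limit = min(task.time + quantum, end_time)
    samples = []
    while task.next_sample * sample_dt <= limit:
        t_sample = task.next_sample * sample_dt
        # Fire every event that happens before this sampling instant.
        while task.next_event <= t_sample:
            task.value += task.rng.choice((-1, 1))
            task.next_event += task.rng.expovariate(task.rate)
        samples.append((task.next_sample, task.value))
        task.next_sample += 1
    task.time = limit
    return samples


def simulate(n_tasks, quantum, sample_dt, end_time, seed=0):
    """Return the stream of aligned cuts as (simulation_time, values) pairs."""
    tasks = [SimTask(i, seed + i) for i in range(n_tasks)]
    ready = [(task.time, task.task_id) for task in tasks]
    heapq.heapify(ready)         # the slowest task is always served first
    pending = {}                 # sample index -> {task_id: value}
    cuts = []
    while ready:
        _, task_id = heapq.heappop(ready)
        task = tasks[task_id]
        for index, value in run_quantum(task, quantum, sample_dt, end_time):
            row = pending.setdefault(index, {})
            row[task_id] = value
            if len(row) == n_tasks:          # every trajectory has this point
                cuts.append((index * sample_dt,
                             [row[i] for i in range(n_tasks)]))
                del pending[index]
        if task.time < end_time:             # feedback: reschedule the task
            heapq.heappush(ready, (task.time, task_id))
    return cuts


for sim_time, values in simulate(n_tasks=4, quantum=1.0, sample_dt=0.5, end_time=2.0):
    print(sim_time, values)
```

Because the slowest task is always resumed first, at most a few sampling instants are incomplete at any moment, which is the property the text relies on to limit the transient storage of partial results.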
2.1.2. The Analysis Pipeline. By design, each snapshot at a given simulation time of all simulation trajectories (i.e., an array of simulation results) can be analyzed immediately and independently (thus concurrently) of the others. For example, the mean and variance (as well as other statistical estimators) can be immediately computed and streamed out to the display stage. More complex analyses, that is, the ones aimed at understanding system dynamics, have further requirements. In the most general case, they require access to the whole dataset. Unfortunately, this can hardly be done with a fully online process. In many cases, however, it is possible to derive reasonable approximations of these analyses from a sliding window of the whole dataset (e.g., for trajectory clustering). For this, the stream entering the analysis pipeline is passed through a stage that creates a stream of (partially overlapping) sliding windows of trajectory cuts. Each sliding window can then be processed in parallel and is therefore dispatched to a farm of statistic engines. Results are collected and reordered (i.e., gathered) and streamed toward the user interface and the permanent storage.
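As a rough illustration of this stage, the Python sketch below turns a stream of cuts into partially overlapping sliding windows and computes simple per-cut estimators. The function names, the window width, and the stride are hypothetical choices made for the example; the text does not specify them, and the actual tool distributes the windows to a farm of statistic engines rather than processing them in a loop.

```python
from collections import deque
import statistics


def cut_statistics(cut):
    """Mean, standard deviation, and quartiles of one cut (one simulation time)."""
    mean = statistics.fmean(cut)
    std = statistics.pstdev(cut)
    quartiles = statistics.quantiles(cut, n=4)
    return mean, std, quartiles


def sliding_windows(cuts, width, stride):
    """Yield partially overlapping windows of `width` consecutive cuts."""
    window = deque(maxlen=width)
    for count, cut in enumerate(cuts, start=1):
        window.append(cut)
        # Emit once the window is full, then every `stride` cuts.
        if count >= width and (count - width) % stride == 0:
            yield list(window)


# Toy stream: six cuts, each holding one point for each of four trajectories.
stream = [[i + 1, i + 2, i + 3, i + 4] for i in range(6)]
for window in sliding_windows(stream, width=4, stride=2):
    print([cut_statistics(cut)[0] for cut in window])
```

Each window is independent of the others, so in a parallel setting every yielded window could be handed to a different worker.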
The analysis pipeline is provided with three families of predefined estimators covering the most common instruments of statistical analysis. Thanks to the high-level modular design of the simulation pipeline, it can be easily extended with new filters. The filters currently included in the system are as follows.
(1) Mean, Standard Deviation, and Quantiles. These standard statistical estimators are typically used to evaluate, both qualitatively and quantitatively, the behavior of stable systems and the reliability of the stochastic models used for their simulation. Quantiles are also often useful to approximate the distribution of simulation trajectories over time, as they provide a histogram-like summary of the involved quantities without the effects of long-tailed asymmetric distributions or outliers. In fact, in those cases descriptive statistics may fail to reveal a central tendency.
(2) Trajectory Clustering. The clustering of trajectories helps the analysis of biological systems exhibiting a multistable behavior. Each cluster can automatically separate and distinguish different cases, which can eventually be analyzed by statistical estimators. Concentrations of elements in a given instant from all simulations are numerically filtered from stochastic noise, and the global trends are extrapolated from clusters. In this work we employed two clustering techniques: K-means [25] and quality threshold (QT) [26] clustering. The clustering procedure collects the filtered data contained in a sliding time window $\Delta_W$ centered on the current data point $x_i \equiv f(t_i)$, where $t_i \equiv t_0 + i\Delta_S$ (where $\Delta_S$ is a constant sampling time), for all simulation trajectories, and the extrapolated forecast point $x^E_i$ referred to the local trend, using the information of the Savitzky-Golay filter [27], that is, a low-pass filter suitable for data smoothing. The main idea underneath Savitzky-Golay filtering is to find filter coefficients $c_n$ that preserve higher moments, that is, to approximate the underlying function within the moving window not by a constant (whose estimate is the average) but by a polynomial of higher order. This schema also allows the computation of numerical derivatives, considering the coefficients of the derived polynomial.
(3) Peak/Frequency Detection. Many processes in living organisms are oscillatory. For these kinds of systems, the analysis must be focused on the recurrence of phenomena, for instance, concentration spikes or peaks of biological quantities, which also make it possible to determine the frequency of occurrence of a given phenomenon. The peak detection is basically performed by way of the analysis of the local maxima in a continuous curve, which are in turn detected through the analysis of the derivatives of the curve estimated by the Savitzky-Golay filter. From the period between successive peaks, the frequency of the related event is then inferred.

Figure 2: Screenshots of the simulation tool interface.
The usage examples of each family of filters will be discussed in the “results” section.
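The smoothing and peak detection steps described in items (2) and (3) can be sketched as follows. This is a minimal Python illustration under stated assumptions: it fits a local quadratic over a symmetric window (the polynomial order and the window half-width are not given in the text and are chosen here only for the example), the function names are hypothetical, and a peak is taken to be a point where the estimated derivative changes from positive to non-positive.

```python
import math


def savgol_quadratic(values, half_width):
    """Least-squares quadratic fit on a symmetric moving window.

    Returns the smoothed values and the per-sample derivatives for the
    samples that have a full window around them.
    """
    offsets = range(-half_width, half_width + 1)
    n = 2 * half_width + 1
    s2 = sum(k * k for k in offsets)
    s4 = sum(k ** 4 for k in offsets)
    denom = n * s4 - s2 * s2
    smooth, deriv = [], []
    for i in range(half_width, len(values) - half_width):
        window = values[i - half_width:i + half_width + 1]
        sum_y = sum(window)
        sum_ky = sum(k * y for k, y in zip(offsets, window))
        sum_k2y = sum(k * k * y for k, y in zip(offsets, window))
        smooth.append((s4 * sum_y - s2 * sum_k2y) / denom)  # fitted value at the centre
        deriv.append(sum_ky / s2)                           # fitted slope at the centre
    return smooth, deriv


def detect_peaks(values, half_width):
    """Indices (in the original series) where the estimated slope turns downward."""
    _, deriv = savgol_quadratic(values, half_width)
    return [j + half_width for j in range(1, len(deriv))
            if deriv[j - 1] > 0 and deriv[j] <= 0]


def frequency(peaks, sample_dt):
    """Frequency inferred from the mean period between successive peaks."""
    if len(peaks) < 2:
        return None
    mean_period = (peaks[-1] - peaks[0]) * sample_dt / (len(peaks) - 1)
    return 1.0 / mean_period


# Toy oscillation with period 2.0, sampled every 0.05 time units.
sample_dt = 0.05
signal = [math.sin(2 * math.pi * i * sample_dt / 2.0) for i in range(201)]
peaks = detect_peaks(signal, half_width=2)
print(peaks, frequency(peaks, sample_dt))
```

For a half-width of 2, the smoothing weights of this fit reduce to the classic five-point coefficients (-3, 12, 17, 12, -3)/35; dividing the per-sample slope by the sampling time gives the derivative with respect to simulation time.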
2.1.3. The Graphical User Interface. The CWC simulation-analysis pipeline is wrapped in a back-end tool that can be steered either via a command line tool or via a graphical user interface. This makes it possible to design the biological model, to run simulations and analysis, and to view partial results during the run. Also, the front-end makes it possible to control the simulation workflow from a remote machine. Two screenshots of the graphical front-end are reported in Figure 2.
3. Experiments and Results
The evaluation of the integrated approach will be focused on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems are discussed, that is, bistable/multistable and oscillatory systems. The key behavior of these systems is studied by way of the different classes of online analysis tools introduced in the previous section, in particular, statistical description, clustering, and peak/frequency detection.
3.1. Expressivity and Effectiveness
3.1.1. Multistable Biological System (Schlögl Model). One of the most studied examples of bistability is the Schlögl model [28]. The simplicity of this network makes it an ideal prototype to show the effectiveness of the online clustering techniques on the filtered trajectories in the presence of bimodality. The set of reaction rules modeling this system is

$$A\ A \overset{0.03}{\longmapsto} A\ A\ A, \qquad A\ A\ A \overset{0.0001}{\longmapsto} A\ A,$$
$$B \overset{200}{\longmapsto} B\ A, \qquad A \overset{3.5}{\longmapsto} \bullet, \qquad (1)$$
where the rules are decorated with the kinetic constants of the corresponding reactions and all reaction rates are evaluated according to the mass action law.

The number of molecules of the species $B$ is kept constant (buffered), while at equilibrium the species $A$ displays a noise-induced switching between the two stable steady states (see Figure 3). This case is paradigmatic to show that simple mean and standard deviation are not significant to summarize the overall behavior and that the mean is not representative of any simulation trajectory.

Figure 3 shows the resulting clusters computed online using K-means on the Schlögl model for species $A$ over 480 stochastic simulations starting with the term $B\ 250 * A$. The line width in the K-means plot is proportional to each cluster size.
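To make the reaction rules in (1) concrete, the following Python sketch runs Gillespie’s direct method on a flat (compartment-free) version of the model. It is not the CWC simulator: the function names are hypothetical, and the propensities assume the usual combinatorial form of the mass action law (for example, $A(A-1)/2$ pairs for the rule with two reactant copies of $A$), which is an assumption about how the rates in (1) are to be read. The buffered species $B$ is folded into the constant propensity 200.

```python
import random

# Net change in the number of A molecules for each rule in (1).
STATE_CHANGE = [+1, -1, +1, -1]


def propensities(a):
    """Mass-action propensities of the four rules for `a` molecules of A."""
    return [
        0.03 * a * (a - 1) / 2,                # A A   -> A A A
        0.0001 * a * (a - 1) * (a - 2) / 6,    # A A A -> A A
        200.0,                                 # B     -> B A (B is buffered)
        3.5 * a,                               # A     -> (degradation)
    ]


def gillespie(a0, t_end, rng):
    """Direct-method SSA; returns the event times and the values of A."""
    t, a = 0.0, a0
    times, values = [t], [a]
    while True:
        props = propensities(a)
        total = sum(props)                     # always >= 200, never zero
        t += rng.expovariate(total)            # time to the next reaction
        if t > t_end:
            break
        threshold = rng.random() * total       # choose which rule fires
        acc = 0.0
        for change, p in zip(STATE_CHANGE, props):
            acc += p
            if threshold < acc:
                break
        a += change
        times.append(t)
        values.append(a)
    return times, values


# A few independent trajectories starting from 250 molecules of A.
for seed in range(4):
    _, values = gillespie(250, 1.0, random.Random(seed))
    print("trajectory", seed, "final A =", values[-1])
```

Running many such trajectories for longer simulation times and collecting their values on a common time grid yields the kind of cuts on which the clustering of Figure 3 operates.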
312 Multistable System (Bacteriophage 120582 Life Cycle) One ofthe best studied examples of multistability in genetic systemsis the bacteriophage 120582 life cycle [29] This process involvestwo different biological entities delimited by membranes thephage and the bacterium Lambda phage is a virus consistingof a head containing a double-stranded linear DNA and atail The phage recognizes and binds to its host EscherichiaColi causing the DNA in the head of the phage to be ejectedthrough the tail into the cytoplasm of the bacterial cell Afterthis it can enter into one of two alternative stages calledlysogeny and lysis The lysogenic stage is a dormant stagein which the phage DNA is inserted into the host DNAand passively reproduces with the host The only proteinexpressed in this phase is the 120582 repressor 119862119868 When thehost becomes stressed the phage is more likely to go intolysis in which case it reproduces more phages kills the hostand spreads to other bacteria The decision between lysisand lysogeny can be thought of as a switching mechanismA simplified model for the bacteriophage was proposed in[30] In their model the gene cI expresses the 120582 repressor(119862119868) which dimerises (1198621198682) and binds to DNA (119863) as atranscription factor at either of the two binding sites Thebinding of the transcription factor to the site enhancingthe transcription of 119862119868 (positive feedback) is represented by119863+1198621198682 The phagic DNA in state 119863
+1198621198682 leads the lysogenic
stage The binding of the transcription factor to the siterepressing the transcription of 119862119868 (negative feedback) isdenoted by 119863
minus1198621198682 The notation 119863
+1198621198682119863
minus1198621198682 models the
phagic DNA when both sites are bound (1198621198682 can bind to therepressing site also when another 1198621198682 dimer is bound to the
BioMed Research International 7
Figure 3: Simulation results on the Schlogl model. The figures report the mean and standard deviation, two illustrative raw simulation trajectories, and the clustering results using K-means, during the simulation runs (a) and at the end of the simulation runs (b). [Plots omitted: population of species A versus time; panels "Mean A", "Deviation", raw trajectories "A0" and "A1", and "K-means A".]
promoting site, with a global repressing effect). The reaction rules in this system are

$$
\begin{aligned}
CI\ CI &\overset{0.05}{\longmapsto} CI_2, &\qquad CI_2 &\overset{0.5}{\longmapsto} CI\ CI,\\
CI_2\ D &\overset{0.026}{\longmapsto} D^{+}CI_2, &\qquad D^{+}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
CI_2\ D &\overset{0.026}{\longmapsto} D^{-}CI_2, &\qquad D^{-}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
D^{+}CI_2\ CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2D^{-}CI_2, &\qquad D^{+}CI_2D^{-}CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2\ CI_2,\\
D^{+}CI_2\ P &\overset{40}{\longmapsto} D^{+}CI_2\ P\ CI_2\ CI_2, &\qquad CI &\overset{0.0007}{\longmapsto} \bullet
\end{aligned}
\tag{2}
$$
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model, the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the
Figure 4: Simulation results on the $\lambda$-phage model. (a) reports (approximately) the 480 raw trajectories and (b) shows the online clustering results using QT. [Plots omitted: (a) $\lambda$-phage model simulation, number of molecules ($CI$) versus time (min); (b) QT clustering, QT clusters with trends ($CI$) versus time (min).]
basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).
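The noise-driven switching described above can be explored with a direct-method Gillespie simulation. The following is only a minimal Python sketch, not the CWC simulator: it assumes a flat (compartment-free) reading of rules (2), mass-action propensities, and the starting term of 10 CI, one D, and one P. The species names (`Dp`, `Dm`, `DpDm`), the function names, and the rate constants as transcribed here are illustrative assumptions and should be checked against the original article.

```python
import random
from math import comb

# Rules (2) as (reactants, rate, products). Illustrative species names:
# "Dp"   = DNA with CI2 bound to the enhancing site,
# "Dm"   = DNA with CI2 bound to the repressing site,
# "DpDm" = DNA with both sites bound.
RULES = [
    ({"CI": 2}, 0.05, {"CI2": 1}),
    ({"CI2": 1}, 0.5, {"CI": 2}),
    ({"CI2": 1, "D": 1}, 0.026, {"Dp": 1}),
    ({"Dp": 1}, 0.026, {"CI2": 1, "D": 1}),
    ({"CI2": 1, "D": 1}, 0.026, {"Dm": 1}),
    ({"Dm": 1}, 0.026, {"CI2": 1, "D": 1}),
    ({"Dp": 1, "CI2": 1}, 0.13, {"DpDm": 1}),
    ({"DpDm": 1}, 0.13, {"Dp": 1, "CI2": 1}),
    ({"Dp": 1, "P": 1}, 40.0, {"Dp": 1, "P": 1, "CI2": 2}),
    ({"CI": 1}, 0.0007, {}),
]

INITIAL = {"CI": 10, "D": 1, "P": 1}


def propensity(state, reactants, rate):
    """Mass-action propensity: rate times the number of reactant combinations."""
    a = rate
    for species, k in reactants.items():
        a *= comb(state.get(species, 0), k)
    return a


def gillespie(initial, rules, t_end, rng, max_steps=10_000):
    """Return a list of (time, state) snapshots for one stochastic trajectory."""
    state = dict(initial)
    t = 0.0
    trajectory = [(t, dict(state))]
    for _ in range(max_steps):
        props = [propensity(state, r, k) for r, k, _ in rules]
        total = sum(props)
        if total == 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        # Pick the next reaction with probability proportional to its propensity.
        idx = rng.choices(range(len(rules)), weights=props)[0]
        reactants, _, products = rules[idx]
        for species, k in reactants.items():
            state[species] -= k
        for species, k in products.items():
            state[species] = state.get(species, 0) + k
        trajectory.append((t, dict(state)))
    return trajectory


if __name__ == "__main__":
    run = gillespie(INITIAL, RULES, t_end=120.0, rng=random.Random(1))
    print(len(run), "events; final CI count:", run[-1][1].get("CI", 0))
```

Fixing the seed of each trajectory, as done here through `random.Random(1)`, makes a run reproducible regardless of how many trajectories are simulated side by side.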
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the $\lambda$-phage model for species $CI$ over 1200 stochastic simulations, starting with the term $10 * CI\ D\ P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
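To make the contrast with K-means concrete, the sketch below applies the basic quality-threshold (QT) procedure of [26] to scalar values, for example the $CI$ populations of all trajectories at one sampled time. It is a hedged illustration rather than the online implementation used in the workflow: the function name `qt_clusters`, the use of the value range as cluster diameter, and the threshold are assumptions. Note that the number of clusters is an output, not an input.

```python
def qt_clusters(values, max_diameter):
    """Quality-threshold clustering of scalar values.

    Repeatedly grows one candidate cluster per seed point, keeps the largest
    candidate whose diameter (max - min) stays within max_diameter, removes
    its points, and starts again on the remaining points.
    """
    remaining = list(values)
    clusters = []
    while remaining:
        best = []
        for i, seed in enumerate(remaining):
            candidate = [seed]
            lo = hi = seed
            others = remaining[:i] + remaining[i + 1:]
            while others:
                # Choose the point that enlarges the diameter the least.
                j = min(
                    range(len(others)),
                    key=lambda k: max(hi, others[k]) - min(lo, others[k]),
                )
                new_lo, new_hi = min(lo, others[j]), max(hi, others[j])
                if new_hi - new_lo > max_diameter:
                    break  # any further point would violate the threshold
                lo, hi = new_lo, new_hi
                candidate.append(others.pop(j))
            if len(candidate) > len(best):
                best = candidate
        clusters.append(sorted(best))
        for v in best:
            remaining.remove(v)
    return clusters


if __name__ == "__main__":
    populations = [1, 2, 3, 10, 11, 30]
    print(qt_clusters(populations, max_diameter=3))
```

The nested search over seeds is what makes QT more expensive than K-means, which matches the trade-off stated above.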
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. FRQin represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC which allows computing the rate of some reaction with an ad hoc function, used to represent nonstandard kinetics. The reaction rules modeling this case are

$$
\begin{aligned}
\mathrm{FRQin} &\overset{f_{\mathrm{FRQin}}(t)}{\longmapsto} \mathrm{FRQin}\ M, &\qquad M &\overset{0.5}{\longmapsto} M\ \mathrm{FRQ},\\
M &\overset{f_M}{\longmapsto} \bullet, &\qquad \top\ \mathrm{FRQ} &\overset{f_d}{\longmapsto} \bullet,\\
\mathrm{FRQ} &\overset{0.5}{\longmapsto} \mathrm{FRQin}, &\qquad \mathrm{FRQin} &\overset{0.6}{\longmapsto} \mathrm{FRQ}.
\end{aligned}
\tag{3}
$$
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}}(t) = v_s(t)\,(K_I^n/(\mathrm{FRQ}^n + K_I^n))$ denotes the rate of frq transcription, where FRQ denotes the number of FRQ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by

$$
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T,
\end{cases}
\qquad (n \ge 0),
\tag{4}
$$

increases under light conditions at the current time of the simulation, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m(M/(K_M + M))$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d(\mathrm{FRQ}/(K_d + \mathrm{FRQ}))$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
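The three rate functions and the light-dependent parameter of (4) translate directly into code. The Python sketch below is illustrative only: the function names are made up, the 12-hour phase length is taken from the 12 h-dark/12 h-light condition of Figure 5, and the default parameter values are an assumed reading of the values quoted in this section following [31] ($v_m = 50.5$, $v_d = 140$, $K_M = 50$, $K_I = 100$, $K_d = 13$, $n = 4$).

```python
def v_s(t, T=12.0, dark=160.0, light=200.0):
    """Equation (4): dark value on even phases, light value on odd phases."""
    phase = int(t // T)  # index of the current phase of length T
    return dark if phase % 2 == 0 else light


def f_frq(t, frq, K_I=100.0, n=4, T=12.0):
    """Hill-type repression of frq transcription by FRQ."""
    return v_s(t, T) * (K_I ** n / (frq ** n + K_I ** n))


def f_m(m, v_m=50.5, K_M=50.0):
    """Michaelis rate function for mRNA degradation."""
    return v_m * (m / (K_M + m))


def f_d(frq, v_d=140.0, K_d=13.0):
    """Michaelis rate function for FRQ degradation."""
    return v_d * (frq / (K_d + frq))


if __name__ == "__main__":
    for hour in (0.0, 6.0, 12.0, 18.0, 24.0):
        print(hour, v_s(hour), f_frq(hour, frq=100.0))
```

Each function takes the current multiplicity of an atom as its argument, in line with the convention that the name of an atom in a rate function stands for its multiplicity.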
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$,
Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model. [Plots omitted: (a) cytosolic FRQ oscillations in light/dark conditions, showing total (Ft) and nuclear (Fn) FRQ and FRQ-mRNA (M) together with the maximum FRQ transcription rate Vs (nmol/(h*100)), versus time (h); (b) peaks frequency of the cytosolic FRQ, showing the FRQ-mRNA time period (h) versus time (h) for the dark-only and the alternate 12 h-dark/12 h-light conditions.]
$K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which summarizes the frequency of the peak events over time. The plot is obtained by capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average over 5000 simulations of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damped oscillations, to approximately 24 hours in the dark/light alternate condition.
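The period measurement just described (detect peaks, take the distance between consecutive peaks, then average) can be sketched as follows. This is a simplified illustration, not the peak detection tool of the workflow: it assumes an already smoothed, regularly sampled signal, and the function names and the height threshold are hypothetical. A raw stochastic trajectory would need smoothing first, otherwise noise produces spurious local maxima.

```python
def find_peaks(times, values, min_height):
    """Return the times of local maxima that reach at least min_height."""
    peaks = []
    for i in range(1, len(values) - 1):
        is_local_max = values[i - 1] < values[i] >= values[i + 1]
        if is_local_max and values[i] >= min_height:
            peaks.append(times[i])
    return peaks


def periods(peak_times):
    """Distances between consecutive peaks, that is, the local periods."""
    return [b - a for a, b in zip(peak_times, peak_times[1:])]


def moving_average(xs, window):
    """Simple moving average over a fixed-size window."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


if __name__ == "__main__":
    # Synthetic triangular signal with a 24-hour period, sampled every hour.
    hours = list(range(97))
    signal = [12 - abs((h % 24) - 12) for h in hours]
    local_periods = periods(find_peaks(hours, signal, min_height=6))
    print(local_periods, moving_average(local_periods, window=2))
```

In the workflow the averaging runs across simulations at each time; here it runs along a single signal only to keep the example short.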
3.2. Performances. All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

$$
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
$$

where other nodes are the "generation of simulation tasks," "alignment of trajectories," and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for uniX) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.

As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.

Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$ simulation points.

The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.

The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and observed simulation points, which can be considered a (synchronized) sampling of trajectories at fixed simulation times. These observations imply the following.
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b). [Plots omitted: speedup versus number of simulation engines (sim eng); curves for the ideal speedup and for 240, 480, and 1200 trajectories.]
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically exploits the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the user requests the storage of the raw simulation results exchanged between the two pipelines).
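One reason a streaming analysis stage can reduce the output size is that summary statistics can be updated one sample at a time, without retaining the raw trajectories. The sketch below uses Welford's online algorithm for the mean and standard deviation of the values observed at one aligned simulation time. It is an illustrative assumption about how such a filter could be written, not the code of the statistic engines; the class and method names are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm for mean and standard deviation."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def std(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    # One value per trajectory, all sampled at the same simulation time.
    for population in [2, 4, 4, 4, 5, 5, 7, 9]:
        stats.push(population)
    print(stats.count, stats.mean, stats.std())
```

Memory use is constant per observed time point, whatever the number of trajectories folded in.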
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). Number of samples and interarrival time are single-trajectory information; throughput and output size are overall data (20 sim eng, 3 stat eng).

| Number of samples | Interarrival time | Throughput | Output size |
|---|---|---|---|
| $10^4$ | 25.86 $\mu$s | 2.70 MB/s | 82.40 MB |
| $10^5$ | 2.78 $\mu$s | 28.59 MB/s | 823.98 MB |
| $10^6$ | 232.68 ns | 303.86 MB/s | 8.24 GB |

The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline) whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

$$
T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
$$

where $T_s(S_i)$ models the average interdeparture time of stream items of the stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, simulation and statistic engines. The service time of a farm can be modeled as

$$
T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
$$
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of $\sim$5.8 s ($\sim$120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, agrees with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1 \cdots 32]$ range. The slight performance drop at the right end of the plots is primarily due to the use of more virtual cores (i.e., hyperthread contexts) than physical cores and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights the bottleneck of the architecture for high-throughput problems, which is the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
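The tuning arithmetic above follows mechanically from (6) and (7) and can be reproduced in a few lines. The Python sketch below uses the per-trajectory timings quoted in the text; the function and variable names are illustrative, and rounding to the nearest integer is an assumption chosen to match the worker counts quoted above (rounding up would be needed to guarantee that a farm is never slower than the bottleneck).

```python
def pipeline_service_time(stage_times):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)


def farm_service_time(worker_time, n_workers):
    """Equation (7): n independent replicas divide the service time by n."""
    return worker_time / n_workers


def workers_to_match(worker_time, target_time):
    """Number of farm workers that brings a stage close to target_time."""
    return round(worker_time / target_time)


# Sequential per-trajectory timings (seconds), Neurospora, 10^5 samples.
SEQUENTIAL = {"generation": 0.0, "alignment": 0.11, "windows_generation": 0.02}
FARMED = {"sim_eng": 5.3, "stat_eng": 0.33}

bottleneck = pipeline_service_time(SEQUENTIAL.values())
n_sim = workers_to_match(FARMED["sim_eng"], bottleneck)
n_stat = workers_to_match(FARMED["stat_eng"], bottleneck)

tuned_service_time = pipeline_service_time(
    list(SEQUENTIAL.values())
    + [
        farm_service_time(FARMED["sim_eng"], n_sim),
        farm_service_time(FARMED["stat_eng"], n_stat),
    ]
)
total_sequential = sum(SEQUENTIAL.values()) + sum(FARMED.values())
speedup_bound = total_sequential / bottleneck

if __name__ == "__main__":
    print("sim eng workers:", n_sim, "stat eng workers:", n_stat)
    print("tuned service time (s):", tuned_service_time)
    print("speedup upper bound:", speedup_bound)
```

With the unrounded total of 5.76 s the bound comes out slightly below the figure obtained from the rounded total of about 5.8 s used in the text.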
However, this simple reasoning does not apply when a large number of trajectories are modeled. In fact, in such cases the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. This effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.

4. Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic $\pi$-calculus models.

Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model-checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].

The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses online data analysis and is thus designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.

DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].

StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.

The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease the speed efficiency.

An efficient parallelization of Gillespie's SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).

A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.
Table 2: Biological simulation tools comparison.

| Tool | Calculus | Simulation schema | Parallelism | Data analysis |
|---|---|---|---|---|
| SCWC | CWC | Gillespie | FastFlow | Online statistics |
| SPiM | $\pi$-calculus | Gillespie | None | None |
| Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None |
| BioPEPA | Process algebra | ODE, Gillespie | None | None |
| Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None |
| DiVinE | Model checker | ODE | MPI | None |
| StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing |
| StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing |
| StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics |
| Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing |
| Li and Petzold's | Reaction model | Gillespie | GPGPU | None |
| StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing |

This paper does not provide computational comparisons with other systems. Actually, a comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
5. Conclusions

In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.

We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistic and mining tools that can be executed in parallel while model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.

Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia: close to ideal speedup and efficiency coupled with low code development effort; capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and reduced size of the produced data with respect to classic sequential simulation-analysis execution.

Many of these results are achieved by means of the FastFlow programming framework, which provides a high-level parallel programming methodology that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.

The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area: Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlogl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift fur Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage $\lambda$-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)".
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011
[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012
[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator
Figure 2: Screenshots of the simulation tool interface.
between successive peaks, the frequency of the related event is then inferred.
The usage examples of each family of filters will be discussed in the "results" section.
2.1.3. The Graphical User Interface. The CWC simulation-analysis pipeline is wrapped in a back-end tool that can be steered either via a command line tool or via a graphical user interface. This makes it possible to design the biological model, to run simulations and analysis, and to view partial results during the run. Also, the front-end makes it possible to control the simulation workflow from a remote machine. Two screenshots of the graphical front-end are reported in Figure 2.
3. Experiments and Results
The evaluation of the integrated approach will be focused on the efficiency and speedup of the tool in executing the simulation and online analysis workflow on multicore platforms. In this respect, paradigmatic examples of two challenging classes of biological systems are discussed, that is, bistable/multistable and oscillatory systems. The key behavior of these systems is studied by way of the different classes of online analysis tools introduced in the previous section, in particular, statistical description, clustering, and peak/frequency detection.
3.1. Expressivity and Effectiveness
3.1.1. Multistable Biological System (Schlögl Model). One of the most studied examples of bistability is the Schlögl model [28]. The simplicity of this network makes it an ideal prototype to show the effectiveness of the online clustering techniques on the filtered trajectories in the presence of bimodality. The set of reaction rules modeling this system is
\[
\begin{aligned}
A\ A &\xrightarrow{0.03} A\ A\ A, & A\ A\ A &\xrightarrow{0.0001} A\ A,\\
B &\xrightarrow{200} B\ A, & A &\xrightarrow{3.5} \bullet,
\end{aligned}
\tag{1}
\]
where the rules are decorated with the kinetic constants of the corresponding reactions and all reaction rates are evaluated according to the mass action law.
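To make the mass action reading of rule set (1) concrete, the following is a minimal sketch of Gillespie's direct method applied to these four rules. It is not the CWC simulator: the function names are hypothetical, the rate constants are taken as we read them from (1) (0.03, 0.0001, 200, and 3.5), and B is assumed to be buffered at a single copy, so its propensity is constant.

```python
import random

# Rate constants as read from rule set (1); B is buffered, so it stays at 1.
K_GROW, K_SHRINK, K_FEED, K_DECAY = 0.03, 0.0001, 200.0, 3.5

# Net change in the number of A for each rule, in the order used below.
DELTA_A = [+1, -1, +1, -1]


def propensities(a, b=1):
    """Mass-action propensities of the four rules, given a copies of A."""
    return [
        K_GROW * a * (a - 1) / 2,              # A A   -> A A A
        K_SHRINK * a * (a - 1) * (a - 2) / 6,  # A A A -> A A
        K_FEED * b,                            # B     -> B A
        K_DECAY * a,                           # A     -> (nothing)
    ]


def simulate(a0=250, t_end=10.0, seed=0):
    """Direct-method SSA; returns the list of (time, A) jump points."""
    rng = random.Random(seed)
    t, a = 0.0, a0
    trajectory = [(t, a)]
    while True:
        props = propensities(a)
        total = sum(props)           # at least K_FEED, so never zero
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            break
        rule = rng.choices(range(4), weights=props)[0]
        a += DELTA_A[rule]
        trajectory.append((t, a))
    return trajectory
```

Calling `simulate(seed=k)` for different values of `k` yields independent trajectories; fixing the seed reproduces a trajectory exactly, which is the property the performance experiments rely on.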
The number of molecules of the species $B$ is kept constant (buffered), while at equilibrium the species $A$ displays a noise-induced switching between the two stable steady states (see Figure 3). This case is paradigmatic to show that simple mean and standard deviation are not significant to summarize the overall behavior, and the mean is not representative of any simulation trajectory.
Figure 3 shows the resulting clusters computed online using K-means on the Schlögl model for species $A$ over 480 stochastic simulations starting with the term $B\ 250 * A$. The line width in the K-means plot is proportional to each cluster size.
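The point about the mean can be illustrated with a small sketch. The sample values below are made up for illustration (they are not simulation output), and the one-dimensional Lloyd iteration is a generic K-means, not the filter implemented in the tool.

```python
from statistics import mean


def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm in one dimension; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Made-up population of A at one sampling time across 8 trajectories.
samples = [95, 102, 98, 105, 510, 495, 505, 490]
overall_mean = mean(samples)
centroids, clusters = kmeans_1d(samples, centroids=[min(samples), max(samples)])
```

Here `overall_mean` is 300, a value that no trajectory is close to, whereas the two centroids (100 and 500) each summarize one group of four trajectories. The cluster sizes are what the line widths in a plot such as Figure 3 would encode.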
Figure 3: Simulation results on the Schlögl model. The figures report the mean and standard deviation, two exemplificative raw simulation trajectories, and the clustering results using K-means during the simulation runs (a) and at the end of the simulation runs (b). All panels plot the population of $A$ against time.
3.1.2. Multistable System (Bacteriophage λ Life Cycle). One of the best studied examples of multistability in genetic systems is the bacteriophage λ life cycle [29]. This process involves two different biological entities delimited by membranes: the phage and the bacterium. Lambda phage is a virus consisting of a head, containing a double-stranded linear DNA, and a tail. The phage recognizes and binds to its host, Escherichia coli, causing the DNA in the head of the phage to be ejected through the tail into the cytoplasm of the bacterial cell. After this, it can enter into one of two alternative stages, called lysogeny and lysis. The lysogenic stage is a dormant stage in which the phage DNA is inserted into the host DNA and passively reproduces with the host. The only protein expressed in this phase is the λ repressor $CI$. When the host becomes stressed, the phage is more likely to go into lysis, in which case it reproduces more phages, kills the host, and spreads to other bacteria. The decision between lysis and lysogeny can be thought of as a switching mechanism. A simplified model for the bacteriophage was proposed in [30]. In their model, the gene cI expresses the λ repressor ($CI$), which dimerises ($CI_2$) and binds to DNA ($D$) as a transcription factor at either of the two binding sites. The binding of the transcription factor to the site enhancing the transcription of $CI$ (positive feedback) is represented by $D^{+}CI_2$. The phagic DNA in state $D^{+}CI_2$ leads to the lysogenic stage. The binding of the transcription factor to the site repressing the transcription of $CI$ (negative feedback) is denoted by $D^{-}CI_2$. The notation $D^{+}CI_2D^{-}CI_2$ models the phagic DNA when both sites are bound ($CI_2$ can bind to the repressing site also when another $CI_2$ dimer is bound to the promoting site, with a global repressing effect). The reaction rules in this system are
\[
\begin{aligned}
CI\ CI &\xrightarrow{0.05} CI_2, & CI_2 &\xrightarrow{0.5} CI\ CI,\\
CI_2\ D &\xrightarrow{0.026} D^{+}CI_2, & D^{+}CI_2 &\xrightarrow{0.026} CI_2\ D,\\
CI_2\ D &\xrightarrow{0.026} D^{-}CI_2, & D^{-}CI_2 &\xrightarrow{0.026} CI_2\ D,\\
D^{+}CI_2\ CI_2 &\xrightarrow{0.13} D^{+}CI_2D^{-}CI_2, & D^{+}CI_2D^{-}CI_2 &\xrightarrow{0.13} D^{+}CI_2\ CI_2,\\
D^{+}CI_2\ P &\xrightarrow{40} D^{+}CI_2\ P\ CI_2\ CI_2, & CI &\xrightarrow{0.0007} \bullet,
\end{aligned}
\tag{2}
\]
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model, the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).
Figure 4: Simulation results on the λ-phage model. (a) λ-phage model simulation: reports (approximately) the 480 raw trajectories, as number of molecules ($CI$) against time (min). (b) QT clustering: shows the online clustering results using QT, as QT clusters with trends ($CI$) against time (min).
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations starting with the term $10 * CI\ D\ P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases, QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
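The contrast with K-means can be sketched as follows, assuming that QT denotes quality-threshold clustering: clusters are grown greedily under a maximum-diameter constraint, so their number is an outcome rather than an input. The function names and the sample points are hypothetical, and this one-dimensional version is only an illustration, not the filter used in the tool.

```python
def diameter(cluster):
    """Diameter of a 1-D cluster: distance between its extreme points."""
    return max(cluster) - min(cluster)


def qt_cluster(points, max_diameter):
    """Quality-threshold clustering of 1-D points; returns a list of clusters."""
    remaining = list(points)
    clusters = []
    while remaining:
        best = []
        # Grow one candidate cluster around every remaining point.
        for i, seed in enumerate(remaining):
            candidate = [seed]
            pool = remaining[:i] + remaining[i + 1:]
            while pool:
                # Add the point that keeps the candidate diameter smallest.
                nxt = min(pool, key=lambda p: diameter(candidate + [p]))
                if diameter(candidate + [nxt]) > max_diameter:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        # Keep the largest candidate and repeat on the points left over.
        clusters.append(sorted(best))
        for p in best:
            remaining.remove(p)
    return clusters


# Made-up CI counts at one sampling time.
found = qt_cluster([10, 12, 11, 40, 42, 70], max_diameter=5)
```

With these points `found` is `[[10, 11, 12], [40, 42], [70]]`: three clusters emerge without being requested. The nested loops also show where the extra cost comes from, since every remaining point seeds a candidate at every round.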
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. $\mathrm{FRQ}_{in}$ represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC which allows computing the rate of some reaction with an ad hoc function, used to represent nonstandard kinetics. The reaction rules modeling this case are
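\[
\begin{aligned}
\mathrm{FRQ}_{in} &\xrightarrow{f_{\mathrm{FRQ}_{in}}(t)} \mathrm{FRQ}_{in}\ M, & M &\xrightarrow{0.5} M\ \mathrm{FRQ},\\
M &\xrightarrow{f_M} \bullet, & \top\ \mathrm{FRQ} &\xrightarrow{f_d} \bullet,\\
\mathrm{FRQ} &\xrightarrow{0.5} \mathrm{FRQ}_{in}, & \mathrm{FRQ}_{in} &\xrightarrow{0.6} \mathrm{FRQ}.
\end{aligned}
\tag{3}
\]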
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}_{in}}(t) = v_s(t)\,\bigl(K_I^n / (\mathrm{FRQ}_{in}^n + K_I^n)\bigr)$ denotes the rate of frq transcription, where $\mathrm{FRQ}_{in}$ denotes the number of $\mathrm{FRQ}_{in}$ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by
\[
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T,
\end{cases}
\qquad (n \ge 0),
\tag{4}
\]
increases in light conditions of the current time of the simulation, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m\,(M / (K_M + M))$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d\,(\mathrm{FRQ} / (K_d + \mathrm{FRQ}))$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
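The three ad hoc rate functions and the light-dependent parameter can be written down directly. The sketch below is only an illustration with hypothetical function names; it uses the parameter values the text gives for this model, as we read them after the 1 nM to 100 elements scaling, and it assumes a phase length of $T = 12$ hours, taken from the 12 h dark/12 h light setting.

```python
# Parameter values as we read them from the text (scaled units).
V_M, V_D = 50.5, 140.0
K_M, K_I, K_D = 50.0, 100.0, 13.0
HILL_N = 4
T_PHASE = 12.0  # assumed length in hours of each dark or light phase


def v_s(t, alternate=True):
    """Maximum transcription rate: 160 in the dark, 200 in the light."""
    if not alternate:
        return 160.0                 # constant dark condition
    phase = int(t // T_PHASE)        # even phases are dark, odd are light
    return 160.0 if phase % 2 == 0 else 200.0


def f_frq_in(frq_in, t, alternate=True):
    """Hill-type repression of frq transcription by nuclear FRQ."""
    return v_s(t, alternate) * K_I**HILL_N / (frq_in**HILL_N + K_I**HILL_N)


def f_m(m):
    """Michaelis-type degradation rate of the mRNA."""
    return V_M * m / (K_M + m)


def f_d(frq):
    """Michaelis-type degradation rate of cytosolic FRQ."""
    return V_D * frq / (K_D + frq)
```

Each function returns half of its maximum rate when its argument equals the corresponding constant: `f_frq_in(100, 0)` is 80.0, `f_m(50)` is 25.25, and `f_d(13)` is 70.0. A simulator would evaluate these functions on the current multiplicities whenever it computes the propensities of the rules in (3).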
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$, $K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which is able to summarize the frequency of the peak events over time. The plot results after capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average over 5000 simulations of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damping oscillations with a period of approximately 24 hours, in the dark/light alternate condition.
Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model. (a) Cytosolic FRQ oscillations in light/dark conditions: total ($F_t$) and nuclear ($F_n$) FRQ and FRQ-mRNA ($M$), together with the maximum FRQ transcription rate $V_s$ (nmol/(h∗100)), against time (h). (b) Peaks frequency of the cytosolic FRQ: FRQ-mRNA time period (h) against time (h), for the dark-only and the alternate 12 h-dark/12 h-light conditions.
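The period measurement described above can be sketched as follows. This is a simplified illustration with hypothetical function names: it detects strict local maxima on a noise-free synthetic signal, whereas a stochastic trajectory would need smoothing or thresholding before peaks can be identified reliably.

```python
import math


def peak_times(times, values):
    """Times of strict local maxima in a sampled signal."""
    return [times[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def local_periods(times, values):
    """Distances between consecutive peaks, one per oscillation."""
    peaks = peak_times(times, values)
    return [later - earlier for earlier, later in zip(peaks, peaks[1:])]


# Synthetic signal sampled once per hour over four days, period 24 h.
hours = list(range(0, 97))
signal = [math.cos(2 * math.pi * t / 24) for t in hours]
periods = local_periods(hours, signal)
```

For this signal the interior peaks fall at 24, 48, and 72 hours, so `periods` is `[24, 24]`. Averaging such local periods across trajectories at each time gives a curve of the kind shown in Figure 5(b).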
3.2. Performances. All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is
\[
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
\]
where other nodes are the "generation of simulation tasks," "alignment of trajectories," and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for uniX) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis under the performance viewpoint concerns the number of simulation engines.
As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.
The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against number of simulation engines (sim eng) with 3 statistical engines, for different numbers of trajectories (240, 480, and 1200, compared with the ideal speedup), each of them counting $10^4$ points (a) and $10^5$ points (b).
The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and observed simulation points, which can be considered a (synchronized) sampling at fixed simulation times of trajectories. These observations imply the following.
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
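Point (i) can be illustrated with a small sketch. It is not FastFlow code: a standard thread pool stands in for the farm of simulation engines, a toy random walk stands in for a trajectory, and all names are hypothetical. The only idea carried over from the text is that each trajectory owns its random seed, so its outcome cannot depend on how many workers there are or on how work is scheduled.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trajectory(seed, steps=1000):
    """Toy random walk standing in for one stochastic trajectory."""
    rng = random.Random(seed)  # private generator, no shared state
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position


def run_farm(seeds, workers):
    """Run all trajectories on a pool; map() yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_trajectory, seeds))


seeds = range(16)
sequential = [run_trajectory(s) for s in seeds]
```

Comparing `run_farm(seeds, 1)`, `run_farm(seeds, 4)`, and `sequential` gives three identical lists. A design in which workers shared one generator would lose this property, because the interleaving of draws would then depend on scheduling.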
This latter point specifically exploits the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories happening in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the storage of raw simulation results happening among the two pipelines is requested by the user).
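Why merging trajectories shrinks the output can be seen in a minimal sketch of an online reduction. The code below uses Welford's running mean and variance as a stand-in for the statistical filters; this is our choice for illustration and not necessarily what the tool implements, and the class and function names are hypothetical.

```python
import math


class RunningStats:
    """Welford's online mean/variance; keeps three numbers, not the samples."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation (0.0 with fewer than two values)."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def reduce_cut(values):
    """Reduce the values of all trajectories at one sampling time."""
    stats = RunningStats()
    for v in values:      # values may arrive one at a time from a stream
        stats.push(v)
    return stats.mean, stats.std()


# Made-up values of one species at one sampling time across 8 trajectories.
cut_mean, cut_std = reduce_cut([2, 4, 4, 4, 5, 5, 7, 9])
```

However many trajectories contribute to a sampling time, the reduced record has a fixed size (here a mean of 5 and a standard deviation of about 2.14), so the volume written to disk grows with the number of samples but not with the number of trajectories.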
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32 core platform). Number of samples and interarrival time are single trajectory information; throughput and output size are overall data (20 sim eng, 3 stat eng).
Number of samples | Interarrival time | Throughput | Output size
$10^4$ | 25.86 μs | 2.70 MB/s | 824.0 MB
$10^5$ | 2.78 μs | 28.59 MB/s | 8239.8 MB
$10^6$ | 232.68 ns | 303.86 MB/s | 82.4 GB
The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline), whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,
\[
T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
\]
where $T_s(S_i)$ models the average interdeparture time of stream items of the stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, the simulation and statistical engines. The service time of a farm can be modeled as
\[
T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
\]
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora with $10^5$ samples test case, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of ∼5.8 s (∼120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, agrees with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1, \ldots, 32]$ range. The slight performance drop at the right end of the plots is primarily due to the fact that more virtual cores (i.e., hyperthread contexts) than physical cores are used and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights the bottleneck of the architecture for high throughput problems, which is in the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
However, this simple reasoning does not apply when a large number of trajectories are modeled. In fact, in such cases, the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. Such an effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
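The tuning arithmetic of the service time model can be reproduced in a few lines. The sketch below encodes equations (6) and (7) with hypothetical function names and the per-trajectory timings quoted for the $10^5$ samples test case, as we read them; like the model itself, it ignores synchronization overheads and memory bandwidth limits.

```python
def pipeline_service_time(stage_times):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)


def farm_service_time(worker_time, n_workers):
    """Equation (7): n replicas divide the service time of one worker."""
    return worker_time / n_workers


# Per-trajectory sequential timings in seconds, as quoted in the text.
T_GEN, T_SIM, T_ALIGN, T_WIN, T_STAT = 0.0, 5.3, 0.11, 0.02, 0.33

# The slowest stage that is not replicated in a farm sets the target.
bottleneck = max(T_GEN, T_ALIGN, T_WIN)

sim_workers = round(T_SIM / bottleneck)    # 48
stat_workers = round(T_STAT / bottleneck)  # 3

total = T_GEN + T_SIM + T_ALIGN + T_WIN + T_STAT  # about 5.76 s
speedup_bound = total / bottleneck                # about 52.4

tuned = pipeline_service_time([
    T_GEN,
    farm_service_time(T_SIM, sim_workers),
    T_ALIGN,
    T_WIN,
    farm_service_time(T_STAT, stat_workers),
])
```

The text rounds the total to 5.8 s, which gives the quoted bound of about 53. With 48 and 3 workers, the tuned pipeline has a service time within a fraction of a millisecond of the 0.11 s alignment stage, so adding further workers to either farm would not help until that stage is itself parallelized.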
4. Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π calculus models.
Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model-checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].
The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses the online data analysis; thus it is designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three aspects.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented within StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease the speed efficiency.
An efficient parallelization of Gillespie's SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework that can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).
A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.
Table 2: Biological simulation tools comparison.
Tool | Calculus | Simulation schema | Parallelism | Data analysis
SCWC | CWC | Gillespie | FastFlow | Online statistics
SPiM | π-calculus | Gillespie | None | None
Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None
BioPEPA | Process algebra | ODE, Gillespie | None | None
Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None
DiVinE | Model checker | ODE | MPI | None
StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing
StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing
StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics
Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing
Li and Petzold's | Reaction model | Gillespie | GPGPU | None
StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing
This paper does not provide computational comparisons with other systems. Actually, a comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative, for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
5 Conclusions
In this paper we focused on a methodology to accelerate thesimulation of stochastic models and analysis of simulationresults on multicore platforms We advocate a fully parallelsimulation-analysis workflow as way to accelerate the sim-ulation of stochastic model to improve the likely to detectunknown system behaviors and to shorten the model tuningprocess thus improving tool usability
We applied our methodology on a simulator for thecalculus of wrapped compartments (CWC) a formal frame-work for modeling biological systems and their stochasticbehavior Even if we focused here on the integration ofthe simulationanalysis phases we demonstrate its efficiencyand effectiveness by modeling three simple but paradigmaticexamples of biological systems which are representative ofdifferent system dynamics that is multistable and oscillatoryAll considered cases are representative of the kind of behav-iors for which stochastic simulations have an expressivityedge onto classic ODEs The ability of detecting interestingbut unknown system behaviors is enhanced by the possibilityof plugging in the workflow a set of user-defined statistic andmining tools that can be executed in parallel and while modelsimulation is still ongoing In this respect our approach is asfar as we know completely new
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia, close to ideal speedup and efficiency coupled with low code development effort; capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and reduced size of the produced data with respect to classic sequential simulation-analysis execution.
Many of these results are achieved by means of the FastFlow programming framework, which provides a parallel programming methodology with a high level of abstraction that exempts the programmer from the direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area: Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: Hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlögl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift für Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)".
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., "Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments," in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod '11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., "A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU," in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
[Figure 3, panels (a) and (b). Panel titles: Mean A, Deviation, A, and K-means A; series labeled A0 and A1; axes: Population against Time.]

Figure 3: Simulation results on the Schlögl model. The figures report the mean and standard deviation, two exemplificative raw simulation trajectories, and the clustering results using K-means, during the simulation runs (a) and at the end of the simulation runs (b).
promoting site with a global repressing effect). The reaction rules in this system are

\begin{equation}
\begin{aligned}
CI\ CI &\overset{0.05}{\longmapsto} CI_2, & CI_2 &\overset{0.5}{\longmapsto} CI\ CI,\\
CI_2\ D &\overset{0.026}{\longmapsto} D^{+}CI_2, & D^{+}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
CI_2\ D &\overset{0.026}{\longmapsto} D^{-}CI_2, & D^{-}CI_2 &\overset{0.026}{\longmapsto} CI_2\ D,\\
D^{+}CI_2\ CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2D^{-}CI_2, & D^{+}CI_2D^{-}CI_2 &\overset{0.13}{\longmapsto} D^{+}CI_2\ CI_2,\\
D^{+}CI_2\ P &\overset{40}{\longmapsto} D^{+}CI_2\ P\ CI_2\ CI_2, & CI &\overset{0.0007}{\longmapsto} \bullet,
\end{aligned}
\tag{2}
\end{equation}
where $P$ represents the RNA polymerase, assumed here to be constant, and two proteins per mRNA transcript were considered. In this model, the stochastic time trajectories of $CI$ switch between two stable equilibria if the noise amplitude is sufficient to drive the trajectories occasionally out of the basin of attraction of one equilibrium into the basin of attraction of the other equilibrium (see Figure 4(a)).

[Figure 4. Panel (a), "λ-phage model simulation": Number of molecules (CI) against Time (min). Panel (b), "QT clustering": QT clusters with trends (CI) against Time (min).]

Figure 4: Simulation results on the λ-phage model. (a) reports (approximately) the 480 raw trajectories and (b) shows the online clustering results using QT.
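To make the stochastic reading of rules such as those in (2) concrete, the following is a minimal sketch of Gillespie's direct method restricted to the two dimerization rules ($CI\ CI \mapsto CI_2$ at rate 0.05 and $CI_2 \mapsto CI\ CI$ at rate 0.5). It is not the CWC simulator: the function name `simulate_dimerization` is hypothetical, and the propensity of the first rule is assumed to be the rate constant times the number of distinct pairs of CI atoms.

```python
import random

def simulate_dimerization(n_ci=10, t_end=120.0, seed=42):
    """Gillespie direct method on CI CI -> CI2 (0.05) and CI2 -> CI CI (0.5)."""
    rng = random.Random(seed)  # private generator: a fixed seed gives a fixed trajectory
    t, ci, ci2 = 0.0, n_ci, 0
    trajectory = [(t, ci, ci2)]
    while True:
        # Assumed propensities: rate constant times number of reactant combinations.
        a_dimerize = 0.05 * ci * (ci - 1) / 2
        a_split = 0.5 * ci2
        a_total = a_dimerize + a_split
        if a_total == 0.0:
            break  # no rule is applicable
        t += rng.expovariate(a_total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose the rule with probability proportional to its propensity.
        if rng.random() * a_total < a_dimerize:
            ci, ci2 = ci - 2, ci2 + 1
        else:
            ci, ci2 = ci + 2, ci2 - 1
        trajectory.append((t, ci, ci2))
    return trajectory

first_run = simulate_dimerization(seed=7)
second_run = simulate_dimerization(seed=7)
```

With the same seed the two runs produce identical lists of events, and the quantity CI + 2 CI2 stays equal to the initial number of CI atoms along the whole trajectory.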
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations, starting with the term $10 * CI\ D\ P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases, QT, although computationally more expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
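The difference from K-means can be illustrated with a small sketch of quality-threshold (QT) clustering in the spirit of [26], applied to one-dimensional values such as the population of one species across trajectories at a fixed simulation time. This is an illustrative, unoptimized version under stated assumptions (the function name `qt_cluster`, the one-dimensional setting, and the use of max minus min as cluster diameter are choices made here, not taken from the simulator).

```python
def qt_cluster(values, max_diameter):
    """Quality-threshold clustering of 1-D values; diameter = max - min."""
    remaining = list(values)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            # Grow a candidate cluster around each remaining point.
            candidate = [seed]
            pool = list(remaining)
            pool.remove(seed)
            while pool:
                lo, hi = min(candidate), max(candidate)
                # Next point: the one that yields the smallest resulting diameter.
                nxt = min(pool, key=lambda v: max(hi, v) - min(lo, v))
                if max(hi, nxt) - min(lo, nxt) > max_diameter:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        # Keep the largest candidate and repeat on the points left over.
        clusters.append(sorted(best))
        for v in best:
            remaining.remove(v)
    return clusters

clusters = qt_cluster([1, 2, 3, 50, 51, 100], max_diameter=5)
```

Only the maximum cluster diameter is given as input; the number of clusters (three in this example) follows from the data, which is the property the text refers to as a dynamic number of clusters.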
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. $\mathrm{FRQ}_{in}$ represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC which allows computing the rate of some reactions with an ad hoc function, used to represent nonstandard kinetics. The reaction rules modeling this case are

\begin{equation}
\begin{aligned}
\mathrm{FRQ}_{in} &\overset{f_{\mathrm{FRQ}_{in}}(t)}{\longmapsto} \mathrm{FRQ}_{in}\ M, & M &\overset{0.5}{\longmapsto} M\ \mathrm{FRQ},\\
M &\overset{f_M}{\longmapsto} \bullet, & \top\ \mathrm{FRQ} &\overset{f_d}{\longmapsto} \bullet,\\
\mathrm{FRQ} &\overset{0.5}{\longmapsto} \mathrm{FRQ}_{in}, & \mathrm{FRQ}_{in} &\overset{0.6}{\longmapsto} \mathrm{FRQ}.
\end{aligned}
\tag{3}
\end{equation}
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}}(t) = v_s(t)\,(K_I^n/(\mathrm{FRQ}^n + K_I^n))$ denotes the rate of frq transcription, where FRQ denotes the number of FRQ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by

\begin{equation}
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T,
\end{cases}
\quad (n \ge 0),
\tag{4}
\end{equation}

increases in light conditions according to the current time of the simulation, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m\,(M/(K_M + M))$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d\,(\mathrm{FRQ}/(K_d + \mathrm{FRQ}))$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
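The ad hoc rate functions just described can be written down directly. The sketch below is only an illustration of the formulas as printed: the Python names (`v_s`, `f_frq`, `michaelis`) and the argument order are choices made here, and the argument `frq` stands for the multiplicity of the atom that the rule reads at the moment the reaction fires.

```python
def v_s(t, period, dark=160.0, light=200.0):
    """Piecewise-constant rate of (4): dark phase first, then light, alternating."""
    phase = int(t // period)  # index of the current dark/light phase (t >= 0)
    return dark if phase % 2 == 0 else light

def f_frq(t, frq, period, k_i, hill):
    """Hill-type repression: v_s(t) * K_I^n / (FRQ^n + K_I^n)."""
    return v_s(t, period) * k_i**hill / (frq**hill + k_i**hill)

def michaelis(v_max, k, x):
    """Michaelis rate v_max * x / (k + x); the same shape serves f_M and f_d."""
    return v_max * x / (k + x)

# Example evaluation, assuming 12 h phases and the values K_I = 100, n = 4,
# v_d = 140, K_d = 13 reported in this section.
transcription_unrepressed = f_frq(0.0, 0, 12.0, 100, 4)
transcription_at_threshold = f_frq(0.0, 100, 12.0, 100, 4)
degradation_at_kd = michaelis(140.0, 13.0, 13.0)
```

When the repressor count equals $K_I$ the transcription rate is half of $v_s(t)$, and when the substrate count equals the Michaelis constant the degradation rate is half of its maximum, which is a quick sanity check on both formulas.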
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$, $K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which is able to summarize the frequency of the peak events over time. The plot results from capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average, over 5000 simulations, of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damping oscillations with a period of approximately 24 hours, in the dark/light alternate condition.

[Figure 5. Panel (a), "Cytosolic FRQ oscillations in light/dark conditions": total (Ft) and nuclear (Fn) FRQ and FRQ-mRNA (M), together with the maximum FRQ transcription rate Vs on a second axis, against Time (h). Panel (b), "Peaks frequency of the cytosolic FRQ": FRQ-mRNA time period (h) against Time (h), for the dark-only and the alternate 12 h-dark/12 h-light conditions.]

Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model.
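The period measurement described above (detect peaks, take the distance between consecutive peaks, average) can be sketched as follows. This is a simplified illustration, not the peak detection filter of the analysis pipeline: it assumes an already smooth curve and uses plain strict local maxima, and all function names are hypothetical.

```python
import math

def local_peaks(times, values):
    """Times of strict local maxima of a sampled curve (endpoints excluded)."""
    return [times[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

def periods(peak_times):
    """Distance between consecutive peaks: one local period per oscillation."""
    return [later - earlier for earlier, later in zip(peak_times, peak_times[1:])]

def moving_average(xs, window):
    """Simple moving average over a sliding window of fixed length."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

# Synthetic, noise-free oscillation with a 24 h period, sampled every hour.
times = list(range(0, 97))
values = [math.sin(2 * math.pi * t / 24) for t in times]
peak_times = local_peaks(times, values)
local_periods = periods(peak_times)
smoothed = moving_average(local_periods, 2)
```

On stochastic trajectories the raw curve is noisy, so a smoothing step before peak detection would normally be needed; that step is left out of this sketch.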
3.2. Performances. All reported experiments were run on an Intel workstation equipped with four eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

\begin{equation}
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
\end{equation}

where other nodes are the "generation of simulation tasks," "alignment of trajectories," and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for UNIX) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.
As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.
The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
[Figure 6, panels (a) and (b): Speedup against Number of simulation engines (sim eng), with curves for the ideal speedup and for 240, 480, and 1200 trajectories.]

Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b).

The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and of the observed simulation points, which can be considered a (synchronized) sampling of trajectories at fixed simulation times. These observations imply the following.

(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically exploits the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories, which happens in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the storage of the raw simulation results passing between the two pipelines is requested by the user).
The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline), whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

\begin{equation}
T_s(\text{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
\end{equation}

where $T_s(S_i)$ models the average interdeparture time of stream items of stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, simulation and statistic engines. The service time of a farm can be modeled as

\begin{equation}
T_s(\text{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
\end{equation}

Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of ∼5.8 s (∼120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage, 0.11 s. For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, adheres to the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1 \cdots 32]$ range. The primary reasons for the slight performance drop at the right end of the plots are that more virtual cores (i.e., hyperthread contexts) than physical cores are used and that memory traffic increases for high numbers of trajectories. Furthermore, the performance analysis highlights the bottleneck of the architecture for high throughput problems, which is in the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.

Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). The first two columns report single trajectory information; the last two report overall data (20 sim eng, 3 stat eng).

| Number of samples | Interarrival time | Throughput | Output size |
| $10^4$ | 2586 μs | 270 MB/s | 8240 MB |
| $10^5$ | 278 μs | 2859 MB/s | 82398 MB |
| $10^6$ | 23268 ns | 30386 MB/s | 824 GB |
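The tuning argument above is plain arithmetic on the service-time model of (6) and (7), and can be reproduced with a few lines. The sketch below uses the per-trajectory timings quoted in the text; the function names are hypothetical, and rounding to the nearest integer is an assumption made here to mirror the approximate values reported.

```python
def pipeline_service_time(stage_times):
    """Eq. (6): a pipeline is as slow as its slowest stage."""
    return max(stage_times)

def farm_service_time(worker_time, n_workers):
    """Eq. (7): n independent replicas divide the worker service time by n."""
    return worker_time / n_workers

def workers_to_match(worker_time, target_time):
    """Number of replicas that brings a farm close to a target service time."""
    return round(worker_time / target_time)

# Per-trajectory timings quoted in the text (seconds, 10^5 samples).
T_SIM, T_ALIGN, T_WINDOWS, T_STAT = 5.3, 0.11, 0.02, 0.33
T_TOTAL = 5.8  # rounded sequential total quoted in the text

slowest_sequential = max(T_ALIGN, T_WINDOWS)  # stages that are not replicated
n_sim = workers_to_match(T_SIM, slowest_sequential)
n_stat = workers_to_match(T_STAT, slowest_sequential)
tuned = pipeline_service_time([
    farm_service_time(T_SIM, n_sim), T_ALIGN, T_WINDOWS,
    farm_service_time(T_STAT, n_stat),
])
speedup_bound = T_TOTAL / slowest_sequential
print(n_sim, n_stat, round(speedup_bound))  # prints: 48 3 53
```

With 48 simulation engines the tuned pipeline service time is about 0.11 s, so adding further simulation engines would not help until the alignment stage itself is parallelized.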
However, this simple reasoning does not apply when a large number of trajectories is modeled. In fact, in such cases, the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. Such an effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
4. Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π-calculus models.
Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model-checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].
The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses the online data analysis; thus it is designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm and, in its second version, it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented within StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication connection problems of the computing clusters could decrease the speed efficiency.
An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])
A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2
Table 2: Biological simulation tools comparison.
Tool | Calculus | Simulation schema | Parallelism | Data analysis
SCWC | CWC | Gillespie | FastFlow | Online statistics
SPiM | π-calculus | Gillespie | None | None
Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None
BioPEPA | Process algebra | ODE, Gillespie | None | None
Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None
DiVinE | Model checker | ODE | MPI | None
StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing
StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing
StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics
Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing
Li and Petzold's | Reaction model | Gillespie | GPGPU | None
StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing

This paper does not provide computational comparisons with other systems. A comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
5. Conclusions
In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistic and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia, close to ideal speedup and efficiency coupled with low code development effort; capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and reduced size of the produced data with respect to classic sequential simulation-analysis execution.
Many of these results are achieved by means of the FastFlow programming framework, which provides a parallel programming methodology with a high level of abstraction that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which makes it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area: Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlogl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift fur Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)."
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., "Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments," in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod '11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., "A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU," in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
[Figure 4, panel (a): λ-phage model simulation; x-axis: Time (min); y-axis: Number of molecules (CI). Panel (b): QT clustering; x-axis: Time (min); y-axis: QT clusters with trends (CI).]
Figure 4: Simulation results on the λ-phage model. (a) reports (approximately) the 480 raw trajectories and (b) shows the online clustering results using QT.
basin of attraction of one equilibrium into the basin ofattraction of the other equilibrium (see Figure 4(a))
Figure 4(b) shows the resulting clusters (gray circles) computed online using QT on the λ-phage model for species $CI$ over 1200 stochastic simulations starting with the term $10 \ast CI\; D\; P$. Circle diameters are proportional to each cluster size, and arrows display the local trends of the clustered trajectories.
K-means is suitable for stable switch systems, where the number of clusters and their tendencies are known in advance; in the other cases, QT, although more computationally expensive, can build accurate partitions of trajectories, giving evidence of instabilities with a dynamic number of clusters.
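The contrast between the two clustering filters can be illustrated with a minimal Python sketch of quality-threshold (QT) clustering on scalar samples, for instance the values of one species across trajectories at a fixed sampling time. This is not the simulator's implementation: the function name `qt_cluster`, the `max_diameter` parameter, and the sample values are hypothetical, and the sketch assumes one-dimensional data. It shows why the number of clusters does not have to be fixed in advance.

```python
def qt_cluster(values, max_diameter):
    """Quality-threshold clustering of scalar samples (illustrative sketch).

    For every remaining point a candidate cluster is grown greedily, always
    adding the point that keeps the diameter smallest, until the diameter
    would exceed max_diameter. The largest candidate is kept, its points are
    removed, and the procedure restarts on what is left.
    """
    remaining = list(range(len(values)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            lo = hi = values[seed]
            pool = [i for i in remaining if i != seed]
            while pool:
                # Point whose inclusion yields the smallest cluster diameter.
                nxt = min(pool, key=lambda i: max(hi, values[i]) - min(lo, values[i]))
                new_lo, new_hi = min(lo, values[nxt]), max(hi, values[nxt])
                if new_hi - new_lo > max_diameter:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
                lo, hi = new_lo, new_hi
            if len(candidate) > len(best):
                best = candidate
        clusters.append(sorted(values[i] for i in best))
        remaining = [i for i in remaining if i not in best]
    return clusters


# Hypothetical molecule counts observed at one sampling time.
clusters = qt_cluster([1, 2, 3, 10, 11, 30], max_diameter=3)
```

The repeated candidate construction is what makes QT more expensive than K-means, while the threshold on the diameter, rather than a preset number of centroids, lets the number of clusters follow the data.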
3.1.3. Oscillatory System (Circadian Oscillations of Neurospora). We examine here the theoretical model for circadian oscillations based on transcriptional regulation of the frequency (frq) gene in the fungus Neurospora. The model relies on the feedback exerted on the expression of the frq gene by its protein product FRQ. FRQin represents the FRQ protein inside the nucleus. In this model, sustained rhythmic variations in protein and mRNA ($M$) levels occur in the form of limit cycle oscillations [31]. In describing this system, we exploit a feature of CWC which allows computing the rate of some reactions with an ad hoc function used to represent nonstandard kinetics. The reaction rules modeling this case are
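$$
\begin{aligned}
\mathrm{FRQin} &\overset{f_{\mathrm{FRQin}}(t)}{\longmapsto} \mathrm{FRQin}\; M, & M &\overset{0.5}{\longmapsto} M\; \mathrm{FRQ},\\
M &\overset{f_M}{\longmapsto} \bullet, & \top\ \mathrm{FRQ} &\overset{f_d}{\longmapsto} \bullet,\\
\mathrm{FRQ} &\overset{0.5}{\longmapsto} \mathrm{FRQin}, & \mathrm{FRQin} &\overset{0.6}{\longmapsto} \mathrm{FRQ}.
\end{aligned}
\tag{3}
$$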
The model is based on the negative feedback exerted by the protein FRQ on the transcription of the frq gene; the rate of gene expression is enhanced by light. The model includes gene transcription in the nucleus, accumulation of the corresponding mRNA in the cytosol with the associated protein synthesis, protein transport into and out of the nucleus, and regulation of gene expression by the nuclear form of the FRQ protein. The function $f_{\mathrm{FRQ}}(t) = v_s(t)\,(K_I^n/(\mathrm{FRQ}^n + K_I^n))$ denotes the rate of frq transcription, where FRQ denotes the number of FRQ elements at the moment in which the reaction takes place. The parameter $v_s(t)$, defined by

$$
v_s(t) =
\begin{cases}
160 & \text{when } 2nT \le t < (2n+1)T,\\
200 & \text{when } (2n+1)T \le t < (2n+2)T,
\end{cases}
\qquad (n \ge 0)
\tag{4}
$$

depends on the current simulation time and increases in light conditions, where $T$ represents the period of the dark-light phases. The constant $K_I$ is related to the threshold beyond which nuclear FRQ represses frq transcription; the Hill coefficient $n$ characterizes the degree of cooperativity of the repression process. In the functions, the name of an atom indicates its multiplicity. The mRNA degradation is given by the Michaelis rate function $f_M = v_m\,(M/(K_M + M))$. The FRQ degradation is given by the Michaelis rate function $f_d = v_d\,(\mathrm{FRQ}/(K_d + \mathrm{FRQ}))$, where $v_d$ is the maximum rate of FRQ degradation and the Michaelis constant related to this process is $K_d$.
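As a rough illustration of how such ad hoc rate functions could be coded, the following Python sketch evaluates the light-dependent maximum transcription rate, the Hill-type repression term, and the Michaelis rate shared by the two degradation reactions. The function names and the 12-hour phase length used in the example call are assumptions made for illustration; they are not taken from the CWC simulator.

```python
def v_s(t, period, dark=160.0, light=200.0):
    """Maximum transcription rate: dark value on even phases, light value on odd ones."""
    phase = int(t // period)          # index of the current phase of length `period`
    return dark if phase % 2 == 0 else light


def f_frq(t, frq, k_i, n, period):
    """Hill-type repression term: v_s(t) * K_I^n / (FRQ^n + K_I^n)."""
    return v_s(t, period) * k_i ** n / (frq ** n + k_i ** n)


def michaelis(v_max, k, x):
    """Michaelis rate v_max * x / (k + x), the shape shared by f_M and f_d."""
    return v_max * x / (k + x)


# Illustrative call (assumed 12 h phases): when the FRQ count equals K_I,
# the transcription rate is half of the current v_s.
rate = f_frq(t=6.0, frq=100, k_i=100, n=4, period=12.0)
```

Because the rate depends on the simulation time through `v_s`, it cannot be expressed as a constant kinetic coefficient, which is the kind of nonstandard kinetics the ad hoc functions are meant to capture.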
As in [31], we modeled the oscillations under two different conditions: (i) constant dark condition and (ii) alternate light and dark phases. Following [31], the values of the parameters are set as $v_m = 50.5$, $v_d = 140$, $k_s = 0.5$, $k_1 = 0.5$, $k_2 = 0.6$, $K_M = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which is able to summarize the frequency of the peak events over time. The plot results after capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average over 5000 simulations of the local periods. In the constant dark condition, the circadian period is close to 21 and a half hours, but it increases, producing damping oscillations with a period of approximately 24 hours, in the dark/light alternate condition.

[Figure 5, panel (a): Cytosolic FRQ oscillations in light/dark conditions; x-axis: Time (h); left y-axis: Total (Ft) and nuclear (Fn) FRQ and FRQ-mRNA (M); right y-axis: Maximum FRQ transcription rate Vs (nmol/(h*100)); curves: Ft, Fn, M, Vs. Panel (b): Peaks frequency of the cytosolic FRQ; x-axis: Time (h); y-axis: FRQ-mRNA time period (h); curves: Dark only, Alternate 12 h-dark/12 h-light.]
Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model.
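The period estimation described above (detect peaks, measure the distance between consecutive peaks, then smooth with a moving average) can be sketched in Python as follows. This is a simplified, single-trajectory illustration on a noise-free synthetic curve; the function names are hypothetical, and a real stochastic trajectory would presumably need smoothing before peak detection, which the sketch does not attempt.

```python
import math


def peak_times(times, values):
    """Times of the strict local maxima of a sampled curve (endpoints excluded)."""
    return [times[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def periods(peaks):
    """Distances between consecutive peaks, that is, the local oscillation periods."""
    return [b - a for a, b in zip(peaks, peaks[1:])]


def moving_average(xs, window):
    """Plain moving average over `window` consecutive values."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


# Synthetic stand-in for an mRNA curve: a cosine with a 24 h period, sampled hourly.
times = list(range(0, 97))
values = [math.cos(2 * math.pi * t / 24) for t in times]

local_periods = periods(peak_times(times, values))
smoothed = moving_average(local_periods, 2)
```

In the workflow described in the text the average is taken over the local periods of many simulations rather than within a single curve; the sketch only shows the per-trajectory steps.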
3.2. Performances. All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

$$
(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4,
\tag{5}
$$

where other nodes are the "generation of simulation tasks," "alignment of trajectories," and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for uniX) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.
As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow on varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.
The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and observed simulation points, which can be considered a (synchronized) sampling at fixed simulation times of trajectories. These observations imply the following.
[Figure 6, panels (a) and (b): x-axis: Number of simulation engines (sim eng); y-axis: Speedup; curves: Ideal, 240 trajectories, 480 trajectories, 1200 trajectories.]
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b).
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically exploits the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories happening in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the storage of raw simulation results happening between the two pipelines is requested by the user).
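To make the idea of reducing the output by merging trajectories concrete, the following Python sketch keeps only a running mean and variance per sampling time while samples from many trajectories stream in, so that the raw trajectories never need to be stored. It uses Welford's online algorithm, which is a standard technique chosen here for illustration; the paper does not state which update formula the statistic engines use, and the names `RunningStats` and `reduce_stream` are hypothetical.

```python
class RunningStats:
    """Welford's online mean and variance for the samples of one sampling time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (0.0 until at least two samples have been seen)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def reduce_stream(samples, n_points):
    """Fold a stream of (time_index, value) pairs into per-time (mean, variance)."""
    stats = [RunningStats() for _ in range(n_points)]
    for k, x in samples:
        stats[k].push(x)
    return [(s.mean, s.variance()) for s in stats]


# Three hypothetical trajectories, each sampled at two fixed simulation times.
trajectories = [[1, 2], [3, 4], [5, 6]]
stream = ((k, x) for traj in trajectories for k, x in enumerate(traj))
summary = reduce_stream(stream, n_points=2)
```

The memory footprint of the summary grows with the number of sampling times only, not with the number of trajectories, which is the property that relieves the disk throughput.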
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). The first two columns report single trajectory information; the last two report overall data (20 sim eng, 3 stat eng).
Number of samples | Interarrival time | Throughput | Output size
$10^4$ | 2586 μs | 270 MB/s | 8240 MB
$10^5$ | 278 μs | 2859 MB/s | 82398 MB
$10^6$ | 23268 ns | 30386 MB/s | 824 GB

The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline), whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

$$
T_s(\text{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\},
\tag{6}
$$

where $T_s(S_i)$ models the average interdeparture time of stream items of the stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, simulation and statistic engines. The service time of a farm can be modeled as

$$
T_s(\text{farm}(W, n)) = \frac{T_s(W)}{n}.
\tag{7}
$$
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora test case with $10^5$ samples, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of $\sim 5.8$ s ($\sim 120$ minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus, their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained using the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $5.8/0.11 \approx 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, is consistent with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1 \cdots 32]$ range. The slight performance drop at the right end of the plots is primarily due to the fact that more virtual cores (i.e., hyperthread contexts) than physical cores are used and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights the bottleneck of the architecture for high throughput problems, which is in the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.
However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories
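The service-time model of the pipeline and farm patterns, together with the worked Neurospora example, can be reproduced with a few lines of Python. The sketch reads the per-trajectory timings as 5.3 s, 0.11 s, 0.02 s, and 0.33 s (with generation taken as 0), which is an assumption about the intended decimal values; the function and variable names are hypothetical. As in the text, synchronization overheads and memory bandwidth limits are ignored.

```python
def pipeline_service_time(stage_times):
    """Service time of a pipeline: that of its slowest stage."""
    return max(stage_times)


def farm_service_time(worker_time, n_workers):
    """Service time of a farm of n_workers replicas of the same worker."""
    return worker_time / n_workers


# Per-trajectory timings of the sequential code, in seconds (assumed reading).
sequential = {"generation": 0.0, "alignment": 0.11, "windows generation": 0.02}
farmed = {"sim eng": 5.3, "stat eng": 0.33}

# The farms are sized to match the slowest stage that cannot be replicated.
slowest = pipeline_service_time(sequential.values())
workers = {name: round(t / slowest) for name, t in farmed.items()}

# Service time of the tuned workflow and the resulting speedup upper bound.
tuned = list(sequential.values()) + [
    farm_service_time(t, workers[name]) for name, t in farmed.items()
]
total = sum(sequential.values()) + sum(farmed.values())
speedup_bound = total / slowest
```

With these inputs the sketch suggests 48 simulation workers and 3 statistic workers. The unrounded total of about 5.76 s gives a bound slightly above 52, whereas rounding the total to 5.8 s, as done in the text, gives approximately 53.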
4 Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π-calculus models.
Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].
The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses online data analysis and is thus designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is instead performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication problems of the computing clusters could decrease speed and efficiency.
An efficient parallelization of Gillespie's SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).
A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.

Table 2: Biological simulation tools comparison.
Tool | Calculus | Simulation schema | Parallelism | Data analysis
SCWC | CWC | Gillespie | FastFlow | Online statistics
SPiM | π-calculus | Gillespie | None | None
Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None
BioPEPA | Process algebra | ODE, Gillespie | None | None
Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None
DiVinE | Model checker | ODE | MPI | None
StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing
StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing
StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics
Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing
Li and Petzold's | Reaction model | Gillespie | GPGPU | None
StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing

This paper does not provide computational comparisons with other systems. A comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative for the following reasons: (1) the computational cost of the simulations presented in this paper is mainly dependent on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
5. Conclusions
In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even though we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistical and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia: close-to-ideal speedup and efficiency coupled with low code development effort; the capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and a reduced size of the produced data with respect to the classic sequential execution of simulation and analysis.
Many of these results are achieved by means of the FastFlow programming framework, which provides a high-level parallel programming methodology that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging Technologies 2007, area: Biotechnology-ICT), Regione Piemonte, Italy, and by the ParaPhrase project (FP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: Hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlogl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift fur Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)".
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., "Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments," in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod '11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., "A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU," in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
[Figure 5, panel (a): total (Ft) and nuclear (Fn) FRQ and FRQ-mRNA (M) versus time (h), together with the maximum FRQ transcription rate Vs (nmol/(h*100)). Panel (b): FRQ-mRNA time period (h) versus time (h) for the dark-only and the alternate 12 h-dark/12 h-light conditions.]
Figure 5: Simulation results of the cytosolic FRQ protein of the Neurospora model. (a) Cytosolic FRQ oscillations in light/dark conditions. (b) Peaks frequency of the cytosolic FRQ.
$K_m = 50$, $K_I = 100$, $K_d = 13$, and $n = 4$. Concentrations have been made discrete by scaling 1 nM to 100 atomic elements. In the constant dark condition, parameter $v_s$ is equal to 160; in the alternate condition, $v_s$ is equal to 160 during the dark phase and to 200 during the light phase. Figure 5(a) shows an extract of a single stochastic simulation of the circadian oscillations in the dark/light alternate condition, plotting the number of FRQ proteins within the nucleus, the total number of FRQ proteins in the cell, and the number of mRNA molecules leading the synthesis of FRQ. Figure 5(b) shows the outcome of the peak detection tool, which summarizes the frequency of the peak events over time. The plot is obtained by capturing the peaks in the curve of the cytosolic mRNA for the FRQ protein synthesis. Measuring the distance between two consecutive peaks, we compute the period of each oscillation and then plot the moving average, over 5000 simulations, of the local periods. In the constant dark condition the circadian period is close to 21 and a half hours, but it increases, producing damped oscillations with a period of approximately 24 hours, in the dark/light alternate condition.
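The period measurement described above (detect peaks, take the distance between consecutive peaks, then average the local periods) can be illustrated with a minimal Python sketch. It is not the peak detection tool of the paper: the function names are hypothetical, the input is a synthetic noiseless oscillation rather than a simulated mRNA curve, and the moving average runs over consecutive periods of one trajectory instead of over many simulations. Noisy trajectories would presumably need smoothing before peak detection.

```python
import math


def find_peaks(samples):
    """Indices of local maxima (higher than the left neighbour, not lower than the right)."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i - 1] < samples[i] >= samples[i + 1]]


def local_periods(times, samples):
    """Time distance between each pair of consecutive peaks."""
    peaks = find_peaks(samples)
    return [times[b] - times[a] for a, b in zip(peaks, peaks[1:])]


def moving_average(values, window):
    """Plain moving average over a fixed-size window."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Synthetic stand-in for one trajectory: a 24 h oscillation sampled every 0.5 h for 240 h.
times = [0.5 * k for k in range(480)]
signal = [math.sin(2 * math.pi * t / 24.0) for t in times]

periods = local_periods(times, signal)
smoothed = moving_average(periods, 3)
print(periods[:3], smoothed[:3])
```

On this synthetic input every local period equals the 24 h period used to generate the signal, which is a quick sanity check before applying the same functions to stochastic data.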
3.2. Performances. All reported experiments were run on an Intel workstation equipped with 4 eight-core E7-4820 Nehalem processors (32 cores, 64 contexts) at 2.0 GHz, with 18 MB L3 cache and 64 GBytes of main memory, running Linux x86_64.
The analysis pipeline is configured with 3 statistic engines executing mean, standard deviation, quantiles, K-means, QT, and frequency detection filters. For each experiment, the total number of FastFlow nodes, that is, the boxes depicted with solid lines in Figure 1, is

$$(\text{sim eng}) + (\text{stat eng}) + (\text{other nodes}) + (\text{FastFlow support nodes}) = (\text{sim eng}) + 3 + 3 + 4, \qquad (5)$$

where other nodes are the "generation of simulation tasks," "alignment of trajectories," and "generation of sliding windows of trajectories" nodes, whereas FastFlow support nodes are the two couples of dispatch-gather nodes in Figure 1. Each node in the FastFlow run-time support is implemented by a POSIX (portable operating system interface for Unix) thread using a nonblocking execution model.
As we shall see, the number of statistical engines has been chosen according to a simple but effective performance model, which is made possible by the high-level approach of the design. According to the same model, the most interesting sensitivity analysis from the performance viewpoint concerns the number of simulation engines.
As a case study, we consider the simulation workflow for the transcriptional regulation of the Neurospora.
Figure 6 shows the speedup obtained for the whole workflow when varying the number of concurrent simulation engines, where the number of simulation points (or samples) per trajectory is set to $10^4$ and $10^6$.
The speedup on the total execution time achieved in the former case (Figure 6(a)) scales ideally with respect to the number of simulation engines, whereas a performance penalty is paid in the latter case (Figure 6(b)) for the highest degree of parallelism and number of produced trajectories.
The very same speedup behaviour is achieved for other test cases, and it is worth a detailed discussion. For each performance experiment, all the runs are executed by fixing random seeds. Thus, given a set of simulation parameters, it can be verified that each stochastic simulation of a single trajectory requires exactly the same number of iterations and that the simulated time progresses identically across random walks, irrespective of the number of simulation/statistic engines and of the observed simulation points, which can be considered a (synchronized) sampling, at fixed simulation times, of trajectories. These observations imply the following.
[Figure 6, panels (a) and (b): speedup versus number of simulation engines (sim eng), showing the ideal speedup and the curves for 240, 480, and 1200 trajectories.]
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b).
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically exploits the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the storage of raw simulation results exchanged between the two pipelines is requested by the user).
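Point (i) above, reproducibility irrespective of the number of engines, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the simulator's code: each trajectory is a simple random walk standing in for a stochastic simulation, the function names are hypothetical, and Python threads replace the farm workers. The key idea shown is that each trajectory owns a generator seeded from its own identifier, so the result does not depend on which worker runs it or on how many workers there are.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(seed, steps=1000):
    """Toy random walk standing in for one stochastic trajectory."""
    rng = random.Random(seed)  # private generator: no state shared between workers
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position


def run(seeds, workers):
    """Run all trajectories on a pool of the given size, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))  # map returns results in input order


seeds = list(range(16))
one_worker = run(seeds, 1)
eight_workers = run(seeds, 8)
print(one_worker == eight_workers)
```

Sharing a single global generator among workers would instead make each trajectory depend on the scheduling order, which is the failure mode the per-trajectory seeding avoids.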
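The reduction in output size obtained by merging trajectories can be sketched as a streaming computation: for each sampling time, the values of all trajectories are folded into a few summary numbers, so what is written out no longer grows with the number of trajectories. The Python sketch below uses Welford's online algorithm for mean and variance. It is an illustration only: the class and function names are hypothetical, and the actual statistic engines also compute other filters not shown here.

```python
class RunningStats:
    """Welford's online mean/variance: constant memory per sampling time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 in the denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def reduce_aligned(aligned):
    """Turn a stream of aligned samples into a stream of (n, mean, variance).

    `aligned` yields, for each sampling time, the values of all trajectories
    at that time (a hypothetical stand-in for the output of the alignment stage).
    """
    for values in aligned:
        stats = RunningStats()
        for v in values:
            stats.push(v)
        yield stats.n, stats.mean, stats.variance


# Two sampling times, eight toy trajectories each.
summary = list(reduce_aligned([[2, 4, 4, 4, 5, 5, 7, 9], [1, 1, 1, 1, 1, 1, 1, 1]]))
print(summary)
```

Each input row of eight values becomes one triple, and the same triple size would result from 1200 trajectories, which is why the reduced stream needs far less disk throughput than the raw one.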
The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline), whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

$$T_s(\text{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\}, \qquad (6)$$

where $T_s(S_i)$ models the average interdeparture time of stream items of stage $i$ of the pipeline, which matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, the simulation and statistic engines. The service time of a farm can be modeled as

$$T_s(\text{farm}(W, n)) = \frac{T_s(W)}{n}. \qquad (7)$$

Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). The first two columns report single trajectory information; the last two report overall data (20 sim eng, 3 stat eng).
Number of samples | Interarrival time | Throughput | Output size
$10^4$ | 25.86 μs | 2.70 MB/s | 82.40 MB
$10^5$ | 2.78 μs | 28.59 MB/s | 823.98 MB
$10^6$ | 232.68 ns | 303.86 MB/s | 8.24 GB
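Equations (6) and (7) compose naturally, which a short Python sketch can make concrete. Everything here is illustrative: the function names are hypothetical, the grouping of stages into a simulation pipeline and an analysis pipeline is an assumption about the workflow layout, and the service times are made-up round numbers chosen only to show how the predicted service time stops improving once the farm is faster than the slowest sequential stage.

```python
def farm(worker_ts, n):
    """Equation (7): service time of a farm of n replicas of a worker."""
    return worker_ts / n


def pipeline(*stage_ts):
    """Equation (6): a pipeline is as slow as its slowest stage."""
    return max(stage_ts)


def workflow_ts(n_sim, n_stat, ts):
    """Service time of a pipeline of two pipelines (assumed layout)."""
    simulation = pipeline(ts["generation"], farm(ts["sim_eng"], n_sim), ts["alignment"])
    analysis = pipeline(ts["windows"], farm(ts["stat_eng"], n_stat))
    return pipeline(simulation, analysis)


# Made-up service times in seconds, for illustration only.
TS = {"generation": 0.0, "sim_eng": 8.0, "alignment": 0.5,
      "windows": 0.1, "stat_eng": 1.0}

for n_sim in (4, 16, 32):
    print(n_sim, workflow_ts(n_sim, 2, TS))
```

With these numbers the predicted service time drops from 2.0 s with 4 simulation workers to 0.5 s with 16, and stays at 0.5 s with 32, because the sequential alignment stage has become the bottleneck.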
Given the service time of each sequential stage, for example, measured in the sequential code, these equations can also be used to tune the optimal number of workers $n$ for any new simulation problem and to understand its upper bound in terms of speedup. As an example, in the Neurospora with $10^5$ samples test case, the sequential code exhibits the following timing per trajectory: $T_s(\text{generation}) \sim 0$, $T_s(\text{sim eng}) = 5.3$ s, $T_s(\text{alignment}) = 0.11$ s, $T_s(\text{windows generation}) = 0.02$ s, and $T_s(\text{stat eng}) = 0.33$ s, with a total execution time for each trajectory of about 5.8 s (about 120 minutes for 1200 trajectories). Among those, sim eng and stat eng are used within a farm; thus their service time can be reduced by increasing the number of workers. Therefore, the maximum performance and efficiency of the whole workflow are reached when the two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained from the total execution time and the slowest stage service time; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions from both pipeline and farm. The analysis, despite being approximate since it does not include synchronization overheads and memory bandwidth limits, agrees with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the $n = [1, \ldots, 32]$ range. The slight performance drop at the right end of the plots is primarily due to the use of more virtual cores (i.e., hyperthread contexts) than physical cores and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights that the bottleneck of the architecture for high-throughput problems is the alignment of trajectories stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is left as future work.
However, this simple reasoning does not apply when a large number of trajectories is modeled. In such cases, the main architectural bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. This effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
4 Discussion and Related Works
In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models
Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]
The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.
DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is instead performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
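To make the notion of online trajectory reduction concrete, the sketch below folds a stream of samples into per-sample-point mean and variance without storing the raw trajectories. It is an illustration under assumptions: it uses Welford's online algorithm, which is a standard technique and not necessarily the one adopted by StochKit-FF or by the CWC simulator, and the stream format (pairs of sample index and value) is hypothetical.

```python
class RunningStats:
    """Welford's online algorithm for mean and sample variance."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def reduce_online(sample_stream, num_samples):
    """Fold (sample_index, value) pairs coming from many trajectories
    into one RunningStats object per sample point."""
    stats = [RunningStats() for _ in range(num_samples)]
    for sample_index, value in sample_stream:
        stats[sample_index].update(value)
    return stats


# Two trajectories observed at two sample points, interleaved as a stream.
stream = [(0, 10.0), (1, 12.0), (0, 14.0), (1, 8.0)]
stats = reduce_online(stream, num_samples=2)
print(stats[0].mean, stats[0].variance)  # 12.0 8.0
print(stats[1].mean, stats[1].variance)  # 10.0 8.0
```

Memory use depends only on the number of sample points, not on the number of trajectories, which is what allows the reduction to run while simulation is still in progress.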
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve those kinds of analyses.
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication and connection problems of the computing clusters could decrease speed and efficiency.
An efficient parallelization of Gillespie's SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed; segmentation of threads and online alignment of outputs seem difficult to achieve owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).
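The per-trajectory parallelism mentioned above relies on the fact that each run of Gillespie's SSA is independent of the others. The following is a minimal sketch of the direct method on a flat reaction model, with one random generator per trajectory; it is neither the CWC simulation algorithm nor the GPGPU code of [43], and the reaction encoding and the decay example are hypothetical.

```python
import math
import random


def gillespie(state, reactions, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method.

    state:     dict mapping species name to molecule count
    reactions: list of (propensity_function, {species: change}) pairs
    Returns a list of (time, state snapshot) pairs.
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate(state) for rate, _ in reactions]
        total = sum(propensities)
        if 0.0 >= total:
            break  # no reaction can fire any more
        # Exponentially distributed waiting time with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for propensity, (_, change) in zip(propensities, reactions):
            cumulative += propensity
            if cumulative > threshold:
                break
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical example: decay of species A with rate constant 0.5.
decay = [(lambda s: 0.5 * s["A"], {"A": -1})]
# Independent trajectories differ only in the seed of their generator,
# so each one could be assigned to a distinct worker.
trajectories = [gillespie({"A": 50}, decay, 10.0, random.Random(seed))
                for seed in range(4)]
```

Because no state is shared among trajectories, the runs can be distributed freely; the difficulty discussed in the text lies in aligning and analysing their outputs online, not in the simulation itself.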
A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2
Table 2: Biological simulation tools comparison.

| Tool | Calculus | Simulation schema | Parallelism | Data analysis |
|---|---|---|---|---|
| SCWC | CWC | Gillespie | FastFlow | Online statistics |
| SPiM | π-calculus | Gillespie | None | None |
| Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None |
| BioPEPA | Process algebra | ODE, Gillespie | None | None |
| Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None |
| DiVinE | Model checker | ODE | MPI | None |
| StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing |
| StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing |
| StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics |
| Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing |
| Li and Petzold's | Reaction model | Gillespie | GPGPU | None |
| StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing |

This paper does not provide computational comparisons with other systems. A comparative analysis of the performance of the presented framework with respect to other simulators would not be particularly informative for the following reasons: (1) the computational cost of the simulations presented in this paper mainly depends on the parameters of the output sampling (time needed to write on disk); (2) the performance of our system is partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with $10^5$ samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive when compared with stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
5 Conclusions
In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistical and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia: close-to-ideal speedup and efficiency coupled with low code development effort; the capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and a reduced size of the produced data with respect to the classic sequential simulation-analysis execution.
Many of these results are achieved by means of the FastFlow programming framework, which provides a parallel programming methodology at a high level of abstraction that exempts the programmer from the direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interestsregarding the publication of this paper
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area: Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., "Hybrid modeling and simulation of biomolecular networks," in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC '01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Păun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, "A calculus of looping sequences for modelling microbiological systems," Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, "Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways," Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, "Stochastic Bigraphs," Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, "On process rate semantics," Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, "Performance models for discrete event systems with synchronisations: formalisms and analysis techniques," in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, "Modeling and simulation of biological systems with stochasticity," In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., "Simulation techniques for the calculus of wrapped compartments," Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Liò, A. Sorathiya, and M. Torquati, "StochKit-FF: efficient systems biology on multicore architectures," in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, "On designing multicore-aware simulators for biological systems," in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, "Multiscale Hy3S: Hybrid stochastic simulation for supercomputers," BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, "Computational study of noise in a large signal transduction network," BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematté and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, "Efficient parallel statistical model checking of biochemical networks," in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC '09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, "Skeleton-based parallel programming: functional and parallel semantics in a single shot," Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, "An efficient unbounded lock-free queue for multi-core systems," in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, "Efficient Smith-Waterman on multi-core with FastFlow," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: identification and analysis of coexpressed genes," Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlögl, "Chemical reaction models for non-equilibrium phase transitions," Zeitschrift für Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, "Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells," Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, "Noise-based switches and amplifiers for gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora," Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, "Stream parallel skeleton optimization," in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS '99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, "A correct abstract machine for the stochastic pi-calculus," in Concurrent Models in Molecular Biology (BIOCONCUR '04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, "Efficient, correct simulation of biological processes in the stochastic Pi-calculus," in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB '07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, "Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material)," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, "Bio-PEPA: an extension of the process algebra PEPA for biochemical networks," Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop "From Biology to Concurrency and back (FBTC 2007)".
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, "The Bio-PEPA tool suite," in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST '09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, "Engineering design optimization using a swarm with an intelligent information sharing among individuals," Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., "Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes," Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Šafránek, "High-performance analysis of biological systems dynamics with the DiVinE model checker," Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, "StochKit: stochastic simulation kit web page," 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, "Rule-based spatial modeling with diffusing, geometrically constrained molecules," BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, "Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit," International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, "STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB," Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, "Processes in space," in Programs, Proofs, Processes, F. Ferreira, B. Löwe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., "A spatial calculus of wrapped compartments," in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC '11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., "Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments," in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod '11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., "A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU," in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW '12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
[Figure 6: two plots, (a) and (b). x-axis: number of simulation engines (sim eng); y-axis: speedup. Curves: ideal, 240 trajectories, 480 trajectories, 1200 trajectories.]
Figure 6: Speedup of the workflow of the Neurospora model on the Intel platform against the number of simulation engines, with 3 statistical engines, for different numbers of trajectories, each of them counting $10^4$ points (a) and $10^5$ points (b).
(i) The parallelism strategy does not break determinism and reproducibility of results (correctness).
(ii) As reflected in the speedup results, the design of the simulator ensures effective load balancing and low synchronization overheads.
(iii) The efficiency of parallel executions depends on the order of magnitude of the observed simulation points and on the number of produced trajectories.
This latter point specifically relates to the working hypothesis: stochastic methods are particularly informative when used to simulate the model at high resolution, that is, with a high number of samples and trajectories. In this case, the main bottleneck of the simulation software is data movement and management, since the computation/data-movement ratio may easily reach the limits of modern multicore platforms.
In multicore platforms, "observing" the phenomena is a key issue in the simulation-analysis workflow, as the frequency of observation determines both the quality of results and, inversely, the overall speedup. As shown in Figure 6 and Table 1, the proposed design and implementation effectively cope with this trade-off and succeed in exploiting high rates of data movement. Thanks to the merging of many independent trajectories in the simulation pipeline, the output size, and thus the required disk throughput, is greatly reduced (unless the user requests the storage of the raw simulation results exchanged between the two pipelines).
Table 1: Performance on 1200 simulation instances of the Neurospora model (Intel 32-core platform). The header groups the columns into single trajectory information and overall data (20 sim eng, 3 stat eng).

| Number of samples | Interarrival time | Throughput | Output size |
|---|---|---|---|
| $10^4$ | 25.86 μs | 2.70 MB/s | 82.40 MB |
| $10^5$ | 2.78 μs | 28.59 MB/s | 823.98 MB |
| $10^6$ | 232.68 ns | 303.86 MB/s | 8.24 GB |

The proposed simulation architecture is not only fast but also highly predictable in terms of performance. This latter aspect is mainly due to the high-level structured design [32]. The whole workflow is a pipeline of two pipelines (i.e., a pipeline) whose performance can be modeled by means of the service time ($T_s$) of each stage $S_i$. In particular,

$$T_s(\mathrm{pipeline}(S_1, \ldots, S_k)) = \max\{T_s(S_1), \ldots, T_s(S_k)\}, \tag{6}$$

where $T_s(S_i)$ models the average interdeparture time of stream items of stage $i$ of the pipeline, which actually matches the average computation time of $S_i$ to produce one stream item. In turn, some of the stages are farms, which exploit $n$ independent replicas of a (sequential or parallel) worker, for example, the simulation and statistical engines. The service time of a farm can be modeled as

$$T_s(\mathrm{farm}(W, n)) = \frac{T_s(W)}{n}. \tag{7}$$
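Equations (6) and (7) compose recursively, so the service time of a nested workflow can be evaluated by walking its structure. The sketch below is only an illustration of the model: the class names are hypothetical, and the grouping of stages into the two pipelines is an assumption based on the stage names used in the text, with the 20 simulation engines and 3 statistical engines of Table 1.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Seq:
    """A sequential stage with a measured service time (seconds per item)."""
    service_time: float


@dataclass(frozen=True)
class Farm:
    """n independent replicas of a worker, as in equation (7)."""
    worker: "Node"
    n: int


@dataclass(frozen=True)
class Pipeline:
    """Stages connected in sequence, as in equation (6)."""
    stages: Tuple["Node", ...]


Node = Union[Seq, Farm, Pipeline]


def service_time(node: Node) -> float:
    """Evaluate T_s for a nested pipeline/farm structure."""
    if isinstance(node, Seq):
        return node.service_time
    if isinstance(node, Farm):
        return service_time(node.worker) / node.n
    return max(service_time(stage) for stage in node.stages)


# Assumed layout: generation, sim eng farm, and alignment in the simulation
# pipeline; windows generation and stat eng farm in the analysis pipeline.
simulation = Pipeline((Seq(0.0), Farm(Seq(5.3), 20), Seq(0.11)))
analysis = Pipeline((Seq(0.02), Farm(Seq(0.33), 3)))
workflow = Pipeline((simulation, analysis))
print(service_time(workflow))  # about 0.265 s: the sim eng farm dominates
```

With 20 simulation engines the farm is still slower than the alignment stage, which is consistent with the speedup continuing to grow as engines are added.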
Given the service time of each sequential stage forexample measured in the sequential code these equationscan be also used to tune the optimal number of workers 119899
for any new simulation problem and to understand its upperbound in term of speedup As an example in the Neurosporawith 10
5 samples test case the sequential code exhibits thefollowing timing per trajectory 119879119904(generation)sim0 119879119904(simeng) = 53 s119879119904(alignment) = 011 s119879119904(windows generation) =002 s and119879119904(stat eng) = 033 s with a total execution time foreach trajectory of sim58 s (sim120 minutes for 1200 trajectories)Among those sim eng and stat eng are used within a farmthus their service time can be reduced by increasing thenumber of workers Therefore the maximum performanceand efficiency of the whole workflow are reached when the
BioMed Research International 11
two farms are tuned to match the service time of the slowestsequential stage that is the alignment stage 011 s For thisthe farm in the simulation pipeline should be configuredwith119899 = 53011 asymp 48 workers whereas the farm in the analysispipeline with 119899 = 033011 = 3workersThe overall speedupupper bound can be obtained using the total executiontime and the slowest stage service time that is maximumspeedup achievable for this test case is asymp58011 = 53 whichincludes the contributes from both pipeline and farm Theanalysis despite being approximated since it does not includesynchronization overheads and memory bandwidth limitsis adherent of results depicted in the left plot of Figure 6The speedup linearly grows with the number of simulationengines in the 119899 = [1 sdot sdot sdot 32] rangeThe primary reasons of theslight performance drop in the right end of the plots is due tothe fact that more virtual cores (ie hyperthread contexts)than physical core are used and the increased memorytraffic for high numbers of trajectories Furthermore theperformance analysis highlights that the bottleneck of thearchitecture for high throughput problems which is in thealignment of trajectory stage Its parallelization which canbe addressed by pipelining simulation engines and a partialalignment stage within the farm is among future works
However this simple reasoning does not apply when abig number of trajectories are modeled In fact in such casesthe main architecture bottleneck when using a high numberof simulation engines is the memory bandwidth limit of theunderlying platform Such effect can be seen in the right plotof Figure 6 for the case of 1200 trajectories
4 Discussion and Related Works
In the field of biological modeling tools such as SPiM [3334] and Dizzy [35] have been used to capture first orderapproximations to system dynamics using a combination ofstochastic simulation and differential equation approxima-tion SPiM has long been the standard tool for simulatingstochastic 120587 calculus models
Bio-PEPA [36] is a timed process algebra designed forthe description of biological phenomena and their analysisthrough quantitative methods such as stochastic simulationand probabilistic model-checking Two software tools areavailable for modeling with Bio-PEPA the Bio-PEPA Work-bench and the Bio-PEPA Eclipse Plugin [37]
The parallelization of stochastic simulators has beenextensively studied in the last two decades Many of theseefforts focus on distributed architectures Our work differsfrom these efforts in three aspects (see discussion below)(1) it addresses multicore-specific parallelization issues (2)it advocates a general parallelization schema rather thana specific simulator and (3) it addresses the online dataanalysis thus it is designed to manage large streams of dataTo the best of our knowledge many related works cover someof these aspects but few of them address all three aspects
The Swarm algorithm [38] which is well suited forbiochemical pathway optimisation has been used in a dis-tributed environment An example is Grid Cellware [39] agrid-based modeling and simulation tool for the analysis of
biological pathways that offers an integrated environment forseveralmathematical representations ranging from stochasticto deterministic algorithms
DiVinE is a general distributed verification environmentmeant to support the development of distributed enumerativemodel checking algorithms including probabilistic analysisfeatures used for biological systems analysis [40]
StochKit [41] is a C++ stochastic simulation frameworkAmong othermethods it implements the Gillespie algorithmand in its second version it targets multicore platforms it istherefore similar to our work It does not implement onlinetrajectory reduction that is performed in a postprocessingphase A first form of online reduction of simulation trajec-tories has been experimented within StochKit-FF [11] whichis an extension of StochKit using the FastFlow runtime
In [14] a parallel computing platform has been employedto simulate a large biochemical network in hundreds differentcellular volumes using Gillespie stochastic simulation algo-rithm on multiple processors Parallel computing techniquesmade it possible to run massive simulations in reasonablecomputational times However the analysis of the simulationresults to characterize the intrinsic noise of the networkhas been done as a postprocessing step We believe ourparallelization framework could further improve those kindsof analyses
Hy3S software package [13] that includes hybrid stochas-tic simulation algorithms and SRSim [42] that performs rule-based spatial modeling are both embarrassingly parallelizedby way of the MPI (message passing interface) library Inthis case high latencies and communication connectionproblems of the computing clusters could decrease the speedefficiency
An efficient parallelization of Gillespiersquos SSA on GPGPUhas been presented by Li and Petzold [43] where paral-lelization is obtained by distributing the computation ofeach trajectory to a distinct unit Online data analysis hasnot been addressed segmentation of threads and onlinealignment of outputs seem difficult to achieve owing tothe sharing restrictions imposed by GPGPU StochSimGPU[44] exploits GPGPU for parallel stochastic simulations ofbiological systems The tool allows computing averages andhistograms of the molecular populations across the sampledrealizations on the GPGPUThe tool leverages on a GPGPU-accelerated version of the MATLAB framewosrk that canbe hardly compared in flexibility and performance witha C++ implementation A GPGPU implementation of theCWC simulator is actually under development In particularGPGPU exploitation appears to be particularly suitable forthe analysis of spatial models (see [45ndash48])
A schematic comparison of the main features of thebiological simulation tools cited above is reported in Table 2
This paper does not provide computational comparisonswith other systems Actually a comparative analysis of theperformance of the presented framework with respect toother simulators would not be particularly informative forthe following reasons (1) the computational cost of thesimulations presented in this paper is mainly dependenton the parameters of the output sampling (time neededto write on disk) (2) the performance of our system is
12 BioMed Research International
Table 2 Biological simulation tools comparison
Tool Calculus Simulation schema Parallelism Data analysisSCWC CWC Gillespie FastFlow Online statisticsSPiM 120587-calculus Gillespie None NoneDizzy Reaction model Gillespie Gibson-Bruck Tau-Leap ODE None NoneBioPEPA Process algebra ODE Gillespie None NoneCellware Reaction model Gillespie Gibson-Bruck ODE None NoneDiVinE Model checker ODE MPI NoneStochKit Reaction model Gillespie Tau-leaping MPI PostprocessingStochKit2 Reaction model Gillespie Tau-leaping Multithread PostprocessingStochKit-FF Reaction model Gillespie Tau-leaping FastFlow Online statisticsHy3S Reaction model Gibson-Bruck Hybrid MPI PostprocessingLi and Petzoldrsquos Reaction model Gillespie GPGPU NoneStochSimGPU Reaction model Gillespie Gibson-Bruck Li GPGPU Postprocessing
partially affected by the computational cost of the run timestatistics which the other simulators do not provide (thisaccounts eg of about 6 of the total simulation cost in theNeurospora example with 10
5 samples) (3) CWC presentedin this paper in a simplified form for the sake of readabilityuses a pattern matching algorithm in a compartmentalisedsetting (this makes it more expressive but computationallymore expensive when compared with stochastic simulatorsimplementing the Gillespiersquos algorithm in a flat scenario) and(4) another feature of CWC is the use of complex functionsfor computing the rate of reactions So for example themodel for the analysis of circadian oscillations of Neurosporacannot be directly simulated by Gillespiersquos algorithm
5 Conclusions
In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.

We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even if we focused here on the integration of the simulation and analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistical and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia: close-to-ideal speedup and efficiency coupled with low code development effort; the capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and a reduced size of the produced data with respect to the classic sequential execution of simulation and analysis.
Many of these results are achieved by means of the FastFlow programming framework, which provides a parallel programming methodology with a high level of abstraction that exempts the programmer from the direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
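The streaming organization summarized above can be illustrated with a minimal sketch. It does not use the FastFlow API: plain Python threads and a bounded queue stand in for the pipeline, and the names `simulation_stage`, `analysis_stage`, and `run_pipeline`, as well as the synthetic sample values, are hypothetical and chosen only for illustration. The simulation stage emits one "cut" (the values of all trajectories at one sample time) at a time, and the analysis stage reduces each cut in memory as soon as it arrives, so only the reduced result would need to be stored.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)


def simulation_stage(out_q, n_samples, n_trajectories):
    """Emit one cut per sample time; values are synthetic placeholders."""
    for k in range(n_samples):
        cut = [float(k * (i + 1)) for i in range(n_trajectories)]
        out_q.put((k, cut))  # blocks when the bounded queue is full
    out_q.put(SENTINEL)


def analysis_stage(in_q, results):
    """Reduce each cut to its mean while the simulation is still running."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        k, cut = item
        results.append((k, sum(cut) / len(cut)))


def run_pipeline(n_samples, n_trajectories, capacity=4):
    """Run the two stages concurrently, connected by a bounded queue."""
    q = queue.Queue(maxsize=capacity)
    results = []
    producer = threading.Thread(
        target=simulation_stage, args=(q, n_samples, n_trajectories))
    consumer = threading.Thread(target=analysis_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(3, 2))
```

The bounded queue gives a simple form of backpressure: the producer cannot run arbitrarily far ahead of the consumer, so the raw data held in memory stays limited.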
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, “Exact stochastic simulation of coupled chemical reactions,” Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.

[2] R. Alur, C. Belta, F. Ivancic et al., “Hybrid modeling and simulation of biomolecular networks,” in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC ’01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.

[3] G. Păun, Membrane Computing: An Introduction, Springer, 2002.

[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, “A calculus of looping sequences for modelling microbiological systems,” Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.

[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, “Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways,” Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.

[6] J. Krivine, R. Milner, and A. Troina, “Stochastic Bigraphs,” Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.

[7] L. Cardelli, “On process rate semantics,” Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.

[8] A. Ferscha, “Performance models for discrete event systems with synchronisations: formalisms and analysis techniques,” in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.

[9] T. C. Meng, S. Somani, and P. Dhar, “Modeling and simulation of biological systems with stochasticity,” In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.

[10] M. Coppo, F. Damiani, M. Drocco et al., “Simulation techniques for the calculus of wrapped compartments,” Theoretical Computer Science, vol. 431, pp. 75–95, 2012.

[11] M. Aldinucci, A. Bracciali, P. Liò, A. Sorathiya, and M. Torquati, “StochKit-FF: efficient systems biology on multicore architectures,” in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.

[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, “On designing multicore-aware simulators for biological systems,” in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing (PDP ’11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, “Multiscale Hy3S: Hybrid stochastic simulation for supercomputers,” BMC Bioinformatics, vol. 7, article 93, 2006.

[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, “Computational study of noise in a large signal transduction network,” BMC Bioinformatics, vol. 12, article 252, 2011.

[15] L. Dematte and T. Mazza, “On parallel stochastic simulation of diffusive systems,” in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.

[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, “Efficient parallel statistical model checking of biochemical networks,” in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC ’09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.

[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.

[18] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.

[19] Intel Corp., Threading Building Blocks, 2011.

[20] K. Asanovic, R. Bodik, J. Demmel et al., “A view of the parallel computing landscape,” Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.

[21] M. Aldinucci and M. Danelutto, “Skeleton-based parallel programming: functional and parallel semantics in a single shot,” Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.

[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, “An efficient unbounded lock-free queue for multi-core systems,” in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.

[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net/.

[24] M. Aldinucci, M. Meneghin, and M. Torquati, “Efficient Smith-Waterman on multi-core with FastFlow,” in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, “A k-means clustering algorithm,” Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.

[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, “Exploring expression data: identification and analysis of coexpressed genes,” Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.

[27] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.

[28] F. Schlögl, “Chemical reaction models for non-equilibrium phase transitions,” Zeitschrift für Physik, vol. 253, no. 2, pp. 147–161, 1972.

[29] A. Arkin, J. Ross, and H. H. McAdams, “Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells,” Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.

[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, “Noise-based switches and amplifiers for gene expression,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.

[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, “Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora,” Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.

[32] M. Aldinucci and M. Danelutto, “Stream parallel skeleton optimization,” in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS ’99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.

[33] A. Phillips and L. Cardelli, “A correct abstract machine for the stochastic pi-calculus,” in Concurrent Models in Molecular Biology (BIOCONCUR ’04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.

[34] A. Phillips and L. Cardelli, “Efficient, correct simulation of biological processes in the stochastic Pi-calculus,” in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB ’07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, “Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material),” Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.

[36] F. Ciocchetta and J. Hillston, “Bio-PEPA: an extension of the process algebra PEPA for biochemical networks,” Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop “From Biology to Concurrency and back (FBTC 2007)”.

[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, “The Bio-PEPA tool suite,” in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST ’09), pp. 309–310, Budapest, Hungary, September 2009.

[38] T. Ray and P. Saini, “Engineering design optimization using a swarm with an intelligent information sharing among individuals,” Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.

[39] P. K. Dhar, T. C. Meng, S. Somani et al., “Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes,” Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.

[40] J. Barnat, L. Brim, and D. Safranek, “High-performance analysis of biological systems dynamics with the DiVinE model checker,” Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.

[41] L. R. Petzold, “StochKit: stochastic simulation kit web page,” 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.

[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, “Rule-based spatial modeling with diffusing, geometrically constrained molecules,” BMC Bioinformatics, vol. 11, article 307, 2010.

[43] H. Li and L. Petzold, “Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit,” International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.

[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, “STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB,” Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.

[45] L. Cardelli and P. Gardner, “Processes in space,” in Programs, Proofs, Processes, F. Ferreira, B. Löwe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.

[46] L. Bioglio, C. Calcagno, M. Coppo et al., “A spatial calculus of wrapped compartments,” in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC ’11), Fontainebleau, France, 2011.

[47] C. Calcagno, M. Coppo, F. Damiani et al., “Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments,” in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod ’11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.

[48] D. Chen, L. Wang, D. Cui et al., “A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU,” in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW ’12), pp. 1971–1978, May 2012.

[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator/.
two farms are tuned to match the service time of the slowest sequential stage, that is, the alignment stage (0.11 s). For this, the farm in the simulation pipeline should be configured with $n = 5.3/0.11 \approx 48$ workers, whereas the farm in the analysis pipeline should be configured with $n = 0.33/0.11 = 3$ workers. The overall speedup upper bound can be obtained from the total execution time and the service time of the slowest stage; that is, the maximum speedup achievable for this test case is $\approx 5.8/0.11 \approx 53$, which includes the contributions of both pipeline and farm. The analysis, despite being approximate (it does not include synchronization overheads and memory bandwidth limits), is consistent with the results depicted in the left plot of Figure 6. The speedup grows linearly with the number of simulation engines in the range $n = 1, \ldots, 32$. The slight performance drop at the right end of the plots is primarily due to the use of more virtual cores (i.e., hyperthread contexts) than physical cores and to the increased memory traffic for high numbers of trajectories. Furthermore, the performance analysis highlights that the bottleneck of the architecture for high-throughput problems is the trajectory alignment stage. Its parallelization, which can be addressed by pipelining simulation engines and a partial alignment stage within the farm, is among future works.

However, this simple reasoning does not apply when a large number of trajectories is modeled. In fact, in such cases, the main architecture bottleneck when using a high number of simulation engines is the memory bandwidth limit of the underlying platform. This effect can be seen in the right plot of Figure 6 for the case of 1200 trajectories.
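The sizing argument above reduces to two divisions, which the following sketch reproduces. The function names `farm_workers` and `speedup_upper_bound` are hypothetical, and the timings (5.3 s, 0.33 s, 0.11 s, and 5.8 s) are taken as read from the text. The worker count is rounded to the nearest integer, as the text does; a more conservative sizing would round up instead. As noted above, the model ignores synchronization overheads and memory bandwidth limits.

```python
def farm_workers(stage_time, target_service_time):
    """Workers needed for a farm to approach the target service time.

    Assumes ideal scaling: n workers on a stage that takes `stage_time`
    yield a service time of stage_time / n.
    """
    return max(1, round(stage_time / target_service_time))


def speedup_upper_bound(total_time, slowest_service_time):
    """Upper bound on speedup, limited by the slowest sequential stage."""
    return total_time / slowest_service_time


if __name__ == "__main__":
    alignment = 0.11  # service time of the slowest sequential stage (s)
    print("simulation farm workers:", farm_workers(5.3, alignment))
    print("analysis farm workers:", farm_workers(0.33, alignment))
    print("speedup bound: %.1f" % speedup_upper_bound(5.8, alignment))
```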
4 Discussion and Related Works
In the field of biological modeling, tools such as SPiM [33, 34] and Dizzy [35] have been used to capture first-order approximations to system dynamics using a combination of stochastic simulation and differential equation approximation. SPiM has long been the standard tool for simulating stochastic π-calculus models.

Bio-PEPA [36] is a timed process algebra designed for the description of biological phenomena and their analysis through quantitative methods such as stochastic simulation and probabilistic model checking. Two software tools are available for modeling with Bio-PEPA: the Bio-PEPA Workbench and the Bio-PEPA Eclipse Plugin [37].

The parallelization of stochastic simulators has been extensively studied in the last two decades. Many of these efforts focus on distributed architectures. Our work differs from these efforts in three aspects (see the discussion below): (1) it addresses multicore-specific parallelization issues; (2) it advocates a general parallelization schema rather than a specific simulator; and (3) it addresses online data analysis and is thus designed to manage large streams of data. To the best of our knowledge, many related works cover some of these aspects, but few of them address all three.
The Swarm algorithm [38], which is well suited for biochemical pathway optimisation, has been used in a distributed environment. An example is Grid Cellware [39], a grid-based modeling and simulation tool for the analysis of biological pathways that offers an integrated environment for several mathematical representations, ranging from stochastic to deterministic algorithms.

DiVinE is a general distributed verification environment meant to support the development of distributed enumerative model checking algorithms, including probabilistic analysis features used for biological systems analysis [40].
StochKit [41] is a C++ stochastic simulation framework. Among other methods, it implements the Gillespie algorithm, and in its second version it targets multicore platforms; it is therefore similar to our work. It does not implement online trajectory reduction, which is instead performed in a postprocessing phase. A first form of online reduction of simulation trajectories has been experimented with in StochKit-FF [11], which is an extension of StochKit using the FastFlow runtime.
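To make the notion of online trajectory reduction concrete, the sketch below computes the mean and sample variance of each aligned cut (the values of all trajectories at one sample time) in a single pass, using Welford's update, so that the raw trajectories never need to be stored. This is an illustration under stated assumptions and not the reduction implemented in StochKit-FF or in the CWC simulator; the names `cut_statistics` and `online_statistics` are hypothetical.

```python
def cut_statistics(cut):
    """Single-pass mean and sample variance of one cut (Welford's update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in cut:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance


def online_statistics(cuts):
    """Lazily reduce a stream of cuts; each cut is discarded after use."""
    for cut in cuts:
        yield cut_statistics(cut)


if __name__ == "__main__":
    # Two sample times, three trajectories each (synthetic values).
    stream = iter([[10, 12, 14], [20, 20, 20]])
    for mean, variance in online_statistics(stream):
        print(mean, variance)
```

Because `online_statistics` is a generator, it can consume cuts as they are produced, which mirrors the idea of analysing partial results while the simulation is still running.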
In [14], a parallel computing platform has been employed to simulate a large biochemical network in hundreds of different cellular volumes using the Gillespie stochastic simulation algorithm on multiple processors. Parallel computing techniques made it possible to run massive simulations in reasonable computational times. However, the analysis of the simulation results to characterize the intrinsic noise of the network has been done as a postprocessing step. We believe our parallelization framework could further improve this kind of analysis.
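For readers unfamiliar with the Gillespie stochastic simulation algorithm mentioned above, a minimal sketch of its direct method follows, applied to a toy birth-death process (creation at constant rate `k_birth`, degradation at rate `k_death * x`). The model, the parameter values, and the function name `gillespie_birth_death` are assumptions made for illustration and do not come from the cited works. Independent trajectories are obtained by giving each run its own seeded random generator, which is what makes this kind of workload easy to distribute over many processors.

```python
import random


def gillespie_birth_death(x0, k_birth, k_death, t_end, rng):
    """Direct-method SSA for a birth-death process; returns [(time, count)]."""
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_birth = k_birth        # propensity of the creation reaction
        a_death = k_death * x    # propensity of the degradation reaction
        a_total = a_birth + a_death
        if a_total == 0.0:
            break                # no reaction can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory


if __name__ == "__main__":
    # Independent trajectories, one seeded generator per trajectory.
    runs = [gillespie_birth_death(10, 5.0, 0.5, 20.0, random.Random(seed))
            for seed in range(4)]
    for run in runs:
        print(len(run), "events, final count", run[-1][1])
```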
The Hy3S software package [13], which includes hybrid stochastic simulation algorithms, and SRSim [42], which performs rule-based spatial modeling, are both embarrassingly parallelized by way of the MPI (message passing interface) library. In this case, high latencies and communication problems of the computing clusters could decrease the speedup and efficiency.

An efficient parallelization of Gillespie’s SSA on GPGPU has been presented by Li and Petzold [43], where parallelization is obtained by distributing the computation of each trajectory to a distinct unit. Online data analysis has not been addressed: segmentation of threads and online alignment of outputs seem difficult to achieve, owing to the sharing restrictions imposed by GPGPU. StochSimGPU [44] exploits GPGPU for parallel stochastic simulations of biological systems. The tool allows computing averages and histograms of the molecular populations across the sampled realizations on the GPGPU. The tool leverages a GPGPU-accelerated version of the MATLAB framework, which can hardly be compared in flexibility and performance with a C++ implementation. A GPGPU implementation of the CWC simulator is currently under development. In particular, GPGPU exploitation appears to be particularly suitable for the analysis of spatial models (see [45–48]).

A schematic comparison of the main features of the biological simulation tools cited above is reported in Table 2.
[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011
[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012
[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator
Table 2: Biological simulation tools comparison.

Tool | Calculus | Simulation schema | Parallelism | Data analysis
SCWC | CWC | Gillespie | FastFlow | Online statistics
SPiM | π-calculus | Gillespie | None | None
Dizzy | Reaction model | Gillespie, Gibson-Bruck, Tau-Leap, ODE | None | None
BioPEPA | Process algebra | ODE, Gillespie | None | None
Cellware | Reaction model | Gillespie, Gibson-Bruck, ODE | None | None
DiVinE | Model checker | ODE | MPI | None
StochKit | Reaction model | Gillespie, Tau-leaping | MPI | Postprocessing
StochKit2 | Reaction model | Gillespie, Tau-leaping | Multithread | Postprocessing
StochKit-FF | Reaction model | Gillespie, Tau-leaping | FastFlow | Online statistics
Hy3S | Reaction model | Gibson-Bruck, Hybrid | MPI | Postprocessing
Li and Petzold's | Reaction model | Gillespie | GPGPU | None
StochSimGPU | Reaction model | Gillespie, Gibson-Bruck, Li | GPGPU | Postprocessing
partially affected by the computational cost of the run-time statistics, which the other simulators do not provide (this accounts, e.g., for about 6% of the total simulation cost in the Neurospora example with 10^5 samples); (3) CWC, presented in this paper in a simplified form for the sake of readability, uses a pattern matching algorithm in a compartmentalised setting (this makes it more expressive but computationally more expensive than stochastic simulators implementing Gillespie's algorithm in a flat scenario); and (4) another feature of CWC is the use of complex functions for computing the rate of reactions. So, for example, the model for the analysis of circadian oscillations of Neurospora cannot be directly simulated by Gillespie's algorithm.
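To make the remark on rate functions concrete, the following is a minimal sketch of a stochastic simulation loop in which every propensity is an arbitrary function of the current state rather than a mass-action product. It is a generic illustration and not the CWC simulator's code: the names `gillespie` and `hill_production`, the single species `P`, and all constants are assumptions chosen for the example.

```python
import random


def gillespie(state, reactions, t_end, rng):
    """Run one trajectory of a Gillespie-style loop.

    `reactions` is a list of (rate_fn, update) pairs: rate_fn(state) returns
    a non-negative propensity, update maps species names to integer changes.
    Both are illustrative structures, not the CWC data model.
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate_fn(state) for rate_fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        acc = 0.0
        chosen = reactions[-1][1]  # fallback against floating-point round-off
        for (_, update), a in zip(reactions, propensities):
            acc += a
            if threshold < acc:
                chosen = update
                break
        for species, delta in chosen.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


def hill_production(s, vmax=10.0, k=20.0, n=4):
    """Hypothetical Hill-type repression: production slows as P accumulates."""
    return vmax * k**n / (k**n + s["P"] ** n)


reactions = [
    (hill_production, {"P": +1}),         # non-mass-action production
    (lambda s: 0.1 * s["P"], {"P": -1}),  # linear degradation
]

trajectory = gillespie({"P": 0}, reactions, t_end=50.0, rng=random.Random(42))
```

Passing the rate as a callable keeps the loop unchanged when a model needs saturating or inhibited kinetics; the cost is one function evaluation per reaction at every step.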
5 Conclusions
In this paper, we focused on a methodology to accelerate the simulation of stochastic models and the analysis of simulation results on multicore platforms. We advocate a fully parallel simulation-analysis workflow as a way to accelerate the simulation of stochastic models, to improve the likelihood of detecting unknown system behaviors, and to shorten the model tuning process, thus improving tool usability.
We applied our methodology to a simulator for the calculus of wrapped compartments (CWC), a formal framework for modeling biological systems and their stochastic behavior. Even though we focused here on the integration of the simulation/analysis phases, we demonstrated its efficiency and effectiveness by modeling three simple but paradigmatic examples of biological systems, which are representative of different system dynamics, that is, multistable and oscillatory. All considered cases are representative of the kind of behaviors for which stochastic simulations have an expressivity edge over classic ODEs. The ability to detect interesting but unknown system behaviors is enhanced by the possibility of plugging into the workflow a set of user-defined statistical and mining tools that can be executed in parallel while the model simulation is still ongoing. In this respect, our approach is, as far as we know, completely new.
Concerning the parallelization problem, we propose a fully parallel simulation-analysis tool that copes with several challenging problems, inter alia: close-to-ideal speedup and efficiency coupled with low code development effort; capability to manage high data throughput, thus enhancing the precision of simulated models and interactivity; and reduced size of the data produced with respect to classic sequential simulation-analysis execution.
Many of these results are achieved by means of the FastFlow programming framework, which provides a parallel programming methodology with a high level of abstraction that exempts the programmer from direct management of synchronization and orchestration of concurrent activities. FastFlow provides low synchronization overhead and performance predictability, which make it possible to design and tune a complex, autobalancing, fully online simulation-analysis workflow that can produce data at a very high frequency and filter them in memory before they are stored out-of-core, avoiding the disk bottleneck.
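The idea of filtering simulation output in memory before it is stored can be sketched with a two-stage streaming pipeline. This is a sequential illustration built on Python generators, not FastFlow code and not the CWC workflow itself: the toy random-walk "simulation", the names `RunningStats`, `simulation_stage`, and `analysis_stage`, and the choice of Welford's online mean/variance as the statistic are all assumptions made for the example.

```python
import random


class RunningStats:
    """Welford's online mean and variance, using constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until two values have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def simulation_stage(n_trajectories, n_samples, rng):
    """Yield (sample_index, values), one value per trajectory at that sample.

    A toy random walk stands in for the real stochastic simulation.
    """
    values = [0.0] * n_trajectories
    for k in range(n_samples):
        values = [v + rng.gauss(0.0, 1.0) for v in values]
        yield k, values


def analysis_stage(cuts):
    """Reduce each cut across trajectories to (index, mean, variance)."""
    for k, values in cuts:
        stats = RunningStats()
        for v in values:
            stats.push(v)
        yield k, stats.mean, stats.variance


# Only the per-sample summaries are kept; raw trajectories are never stored.
summaries = list(analysis_stage(simulation_stage(1000, 20, random.Random(7))))
```

Because the analysis stage consumes each cut as soon as it is produced, the stored output grows with the number of samples rather than with samples times trajectories, which is the data-size reduction the conclusions refer to.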
The FastFlow framework and the CWC simulation-analysis workflow are open source software under the LGPL license [23, 49].
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank M. Mazumder and E. Macchia of Etica Srl for the simulator GUI implementation. This work has been supported by the BioBITs project (Converging technologies 2007, area: Biotechnology-ICT), Regione Piemonte, Italy, and by the Paraphrase Project (FWP7 EC-STREP no. 288570).
References
[1] D. T. Gillespie, “Exact stochastic simulation of coupled chemical reactions,” Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340–2361, 1977.
[2] R. Alur, C. Belta, F. Ivancic et al., “Hybrid modeling and simulation of biomolecular networks,” in Proceedings of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC ’01), vol. 2034 of Lecture Notes in Computer Science, pp. 19–32, Springer, Rome, Italy, 2001.
[3] G. Paun, Membrane Computing: An Introduction, Springer, 2002.
[4] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, and A. Troina, “A calculus of looping sequences for modelling microbiological systems,” Fundamenta Informaticae, vol. 72, no. 1–3, pp. 21–35, 2006.
[5] R. Barbuti, A. Maggiolo-Schettini, P. Milazzo, P. Tiberi, and A. Troina, “Stochastic calculus of looping sequences for the modelling and simulation of cellular pathways,” Transactions on Computational Systems Biology, vol. 5121, pp. 86–113, 2009.
[6] J. Krivine, R. Milner, and A. Troina, “Stochastic Bigraphs,” Electronic Notes in Theoretical Computer Science, vol. 218, no. C, pp. 73–96, 2008.
[7] L. Cardelli, “On process rate semantics,” Theoretical Computer Science, vol. 391, no. 3, pp. 190–215, 2008.
[8] A. Ferscha, “Performance models for discrete event systems with synchronisations: formalisms and analysis techniques,” in VII-Simulation, vol. 2 of MATCH Advanced Schools, Editorial Kronos, Zaragoza, Spain, 1998.
[9] T. C. Meng, S. Somani, and P. Dhar, “Modeling and simulation of biological systems with stochasticity,” In Silico Biology, vol. 4, no. 3, pp. 293–309, 2004.
[10] M. Coppo, F. Damiani, M. Drocco et al., “Simulation techniques for the calculus of wrapped compartments,” Theoretical Computer Science, vol. 431, pp. 75–95, 2012.
[11] M. Aldinucci, A. Bracciali, P. Lio, A. Sorathiya, and M. Torquati, “StochKit-FF: efficient systems biology on multicore architectures,” in Euro-Par 2010 Parallel Processing Workshops, vol. 6586 of Lecture Notes in Computer Science, pp. 167–175, Springer, Ischia, Italy, 2011.
[12] M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, “On designing multicore-aware simulators for biological systems,” in Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’11), pp. 318–325, Ayia Napa, Cyprus, February 2011.
[13] H. Salis, V. Sotiropoulos, and Y. N. Kaznessis, “Multiscale Hy3S: hybrid stochastic simulation for supercomputers,” BMC Bioinformatics, vol. 7, article 93, 2006.
[14] J. Intosalmi, T. Manninen, K. Ruohonen, and M.-L. Linne, “Computational study of noise in a large signal transduction network,” BMC Bioinformatics, vol. 12, article 252, 2011.
[15] L. Dematte and T. Mazza, “On parallel stochastic simulation of diffusive systems,” in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds., vol. 5307 of Lecture Notes in Computer Science, pp. 191–210, Springer, 2008.
[16] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi, “Efficient parallel statistical model checking of biochemical networks,” in Proceedings of the 8th International Workshop on Parallel and Distributed Methods in Verification (PDMC ’09), L. Brim and J. van de Pol, Eds., vol. 14 of Electronic Proceedings in Theoretical Computer Science, pp. 47–61, Eindhoven, The Netherlands, 2009.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[19] Intel Corp., Threading Building Blocks, 2011.
[20] K. Asanovic, R. Bodik, J. Demmel et al., “A view of the parallel computing landscape,” Communications of the ACM, vol. 52, no. 10, pp. 56–67, 2009.
[21] M. Aldinucci and M. Danelutto, “Skeleton-based parallel programming: functional and parallel semantics in a single shot,” Computer Languages, Systems and Structures, vol. 33, no. 3-4, pp. 179–192, 2007.
[22] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, “An efficient unbounded lock-free queue for multi-core systems,” in Programming Multi-Core and Many-Core Computing Systems, Parallel and Distributed Computing, vol. 7484 of Lecture Notes in Computer Science, chapter 13, pp. 662–673, Wiley, 2012.
[23] FastFlow, 2009, http://mc-fastflow.sourceforge.net.
[24] M. Aldinucci, M. Meneghin, and M. Torquati, “Efficient Smith-Waterman on multi-core with FastFlow,” in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’10), pp. 195–199, Pisa, Italy, February 2010.
[25] J. A. Hartigan and M. A. Wong, “A k-means clustering algorithm,” Journal of the Royal Statistical Society C, vol. 28, no. 1, pp. 100–108, 1979.
[26] L. J. Heyer, S. Kruglyak, and S. Yooseph, “Exploring expression data: identification and analysis of coexpressed genes,” Genome Research, vol. 9, no. 11, pp. 1106–1115, 1999.
[27] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] F. Schlogl, “Chemical reaction models for non-equilibrium phase transitions,” Zeitschrift fur Physik, vol. 253, no. 2, pp. 147–161, 1972.
[29] A. Arkin, J. Ross, and H. H. McAdams, “Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells,” Genetics, vol. 149, no. 4, pp. 1633–1648, 1998.
[30] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, “Noise-based switches and amplifiers for gene expression,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 5, pp. 2075–2080, 2000.
[31] J.-C. Leloup, D. Gonze, and A. Goldbeter, “Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora,” Journal of Biological Rhythms, vol. 14, no. 6, pp. 433–448, 1999.
[32] M. Aldinucci and M. Danelutto, “Stream parallel skeleton optimization,” in Proceedings of the International Euromicro Conference on Parallel and Distributed Computing and Systems (PDCS ’99), pp. 955–962, IASTED, ACTA Press, Cambridge, Mass, USA, November 1999.
[33] A. Phillips and L. Cardelli, “A correct abstract machine for the stochastic pi-calculus,” in Concurrent Models in Molecular Biology (BIOCONCUR ’04), Electronic Notes in Theoretical Computer Science, Elsevier, London, UK, 2004.
[34] A. Phillips and L. Cardelli, “Efficient, correct simulation of biological processes in the stochastic Pi-calculus,” in Proceedings of the International Conference on Computational Methods in Systems Biology (CMSB ’07), vol. 4695 of Lecture Notes in Computer Science, pp. 184–199, Springer, 2007.
[35] S. Ramsey, D. Orrell, and H. Bolouri, “Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material),” Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, “Bio-PEPA: an extension of the process algebra PEPA for biochemical networks,” Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop “From Biology to Concurrency and back (FBTC 2007)”.
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, “The Bio-PEPA tool suite,” in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST ’09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, “Engineering design optimization using a swarm with an intelligent information sharing among individuals,” Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., “Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes,” Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, “High-performance analysis of biological systems dynamics with the DiVinE model checker,” Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, “StochKit: stochastic simulation kit web page,” 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, “Rule-based spatial modeling with diffusing, geometrically constrained molecules,” BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, “Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit,” International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, “STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB,” Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, “Processes in space,” in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., “A spatial calculus of wrapped compartments,” in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC ’11), Fontainebleau, France, 2011.
[47] C. Calcagno, M. Coppo, F. Damiani et al., “Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments,” in Proceedings of the 3rd International Workshop on Computational Models for Cell Processes (CompMod ’11), vol. 67 of Electronic Proceedings in Theoretical Computer Science, pp. 3–18, Aachen, Germany, 2011.
[48] D. Chen, L. Wang, D. Cui et al., “A massively parallel approach for nonlinear interdependency analysis of multivariate signals with GPGPU,” in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW ’12), pp. 1971–1978, May 2012.
[49] CWC Simulator project, 2010, http://sourceforge.net/projects/cwcsimulator.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
Hindawi Publishing Corporationhttpwwwhindawicom
GenomicsInternational Journal of
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 13
[2] R Alur C Belta F Ivancic et al ldquoHybrid modeling andsimulation of biomolecular networksrdquo in Proceedings of the 4thInternational Workshop on Hybrid Systems Computation andControl (HSCC rsquo01) vol 2034 of Lecture Notes in ComputerScience pp 19ndash32 Springer Rome Italy 2001
[3] G Paun Membrane Computing An Introduction Springer2002
[4] R Barbuti A Maggiolo-Schettini P Milazzo and A TroinaldquoA calculus of looping sequences for modelling microbiologicalsystemsrdquo Fundamenta Informaticae vol 72 no 1-3 pp 21ndash352006
[5] R Barbuti A Maggiolo-Schettini P Milazzo P Tiberi andA Troina ldquoStochastic calculus of looping sequences for themodelling and simulation of cellular pathwaysrdquoTransactions onComputational Systems Biology vol 5121 pp 86ndash113 2009
[6] J Krivine R Milner and A Troina ldquoStochastic BigraphsrdquoElectronic Notes inTheoretical Computer Science vol 218 no Cpp 73ndash96 2008
[7] L Cardelli ldquoOn process rate semanticsrdquo Theoretical ComputerScience vol 391 no 3 pp 190ndash215 2008
[8] A Ferscha ldquoPerformance models for discrete event systemswith synchronisations formalisms and analysis techniquesrdquo inVII-Simulation vol 2 of MATCH Advanced Schools EditorialKronos Zaragoza Spain 1998
[9] T C Meng S Somani and P Dhar ldquoModeling and simulationof biological systems with stochasticityrdquo In Silico Biology vol 4no 3 pp 293ndash309 2004
[10] M Coppo F DamianiMDrocco et al ldquoSimulation techniquesfor the calculus of wrapped compartmentsrdquo Theoretical Com-puter Science vol 431 pp 75ndash95 2012
[11] M Aldinucci A Bracciali P Lio A Sorathiya andM TorquatildquoStochKit-FF efficient systems biology on multicore architec-turesrdquo in Euro-Par 2010 Parallel ProcessingWorkshops vol 6586of Lecture Notes in Computer Science pp 167ndash175 SpringerIschia Italy 2011
[12] M Aldinucci M Coppo F Damiani M Drocco M Torquatiand A Troina ldquoOn designing multicore-aware simulators forbiological systemsrdquo in Proceedings of the 19th InternationalEuromicro Conference on Parallel Distributed and Network-Based Processing (PDP rsquo11) pp 318ndash325 Ayia Napa CyprusFebruary 2011
[13] H Salis V Sotiropoulos and Y N Kaznessis ldquoMultiscaleHy3S Hybrid stochastic simulation for supercomputersrdquo BMCBioinformatics vol 7 article 93 2006
[14] J Intosalmi T Manninen K Ruohonen and M-L LinneldquoComputational study of noise in a large signal transductionnetworkrdquo BMC Bioinformatics vol 12 article 252 2011
[15] L Dematte and T Mazza ldquoOn parallel stochastic simulationof diffusive systemsrdquo in Computational Methods in SystemsBiology M Heiner and A M Uhrmacher Eds vol 5307 ofLecture Notes in Computer Science pp 191ndash210 Springer 2008
[16] P Ballarini M Forlin T Mazza and D Prandi ldquoEfficientparallel statistical model checking of biochemical networksrdquo inProceedings of the 8th International Workshop on Parallel andDistributed Methods in Verification (PDMC rsquo09) L Brim and Jvan de Pol Eds vol 14 of Electronic Proceedings in TheoreticalComputer Science pp 47ndash61 Eindhoven The Netherlands2009
[17] M Cole Algorithmic Skeletons Structured Management ofParallel Computations Research Monographs in Parallel andDistributed Computing Pitman 1989
[18] J Dean and S Ghemawat ldquoMapReduce Simplified data pro-cessing on large clustersrdquo Communications of the ACM vol 51no 1 pp 107ndash113 2008
[19] Intel CorpThreading Building Blocks 2011[20] K Asanovic R Bodik J Demmel et al ldquoA view of the parallel
computing landscaperdquoCommunications of the ACM vol 52 no10 pp 56ndash67 2009
[21] M Aldinucci and M Danelutto ldquoSkeleton-based parallel pro-gramming functional and parallel semantics in a single shotrdquoComputer Languages Systems and Structures vol 33 no 3-4pp 179ndash192 2007
[22] MAldinucciMDanelutto P KilpatrickMMeneghin andMTorquati ldquoAn efficient unbounded lock-free queue for multi-core systemsrdquo in Programming Multi-Core and Many-CoreComputing Systems Parallel and Distributed Computing vol7484 of Lecture Notes in Computer Science chapter 13 pp 662ndash673 Wiley 2012
[23] FastFlow 2009 httpmc-fastflowsourceforgenet[24] M Aldinucci M Meneghin andM Torquati ldquoEfficient Smith-
Waterman on multi-core with FastFlowrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 195ndash199 Pisa Italy February2010
[25] J A Hartigan and M A Wong ldquoA k-means clustering algo-rithmrdquo Journal of the Royal Statistical Society C vol 28 no 1pp 100ndash108 1979
[26] L J Heyer S Kruglyak and S Yooseph ldquoExploring expressiondata identification and analysis of coexpressed genesrdquo GenomeResearch vol 9 no 11 pp 1106ndash1115 1999
[27] A Savitzky and M J E Golay ldquoSmoothing and differentiationof data by simplified least squares proceduresrdquo AnalyticalChemistry vol 36 no 8 pp 1627ndash1639 1964
[28] F Schlogl ldquoChemical reaction models for non-equilibriumphase transitionsrdquo Zeitschrift fur Physik vol 253 no 2 pp 147ndash161 1972
[29] A Arkin J Ross and H H McAdams ldquoStochastic kineticanalysis of developmental pathway bifurcation in phage 120582-infected Escherichia coli cellsrdquoGenetics vol 149 no 4 pp 1633ndash1648 1998
[30] J Hasty J Pradines M Dolnik and J J Collins ldquoNoise-basedswitches and amplifiers for gene expressionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 97 no 5 pp 2075ndash2080 2000
[31] J-C Leloup D Gonze and A Goldbeter ldquoLimit cycle modelsfor circadian rhythms based on transcriptional regulation inDrosophila andNeurosporardquo Journal of Biological Rhythms vol14 no 6 pp 433ndash448 1999
[32] M Aldinucci and M Danelutto ldquoStream parallel skeletonoptimizationrdquo in Proceedings of the International EuromicroConference on Parallel and Distributed Computing and Systems(PDCS rsquo99) pp 955ndash962 IASTED ACTA press CambridgeMass USA November 1999
[33] A Phillips and L Cardelli ldquoA correct abstract machine forthe stochastic pi-calculusrdquo in Concurrent Models in MolecularBiology (BIOCONCUR rsquo04) Electronic Notes in TheoreticalComputer Science Elsevier London UK 2004
[34] A Phillips and L Cardelli ldquoEfficient correct simulation ofbiological processes in the stochastic Pi-calculusrdquo inProceedingsof theInternational Conference on Computational Methods inSystems Biology (CMSB rsquo07) vol 4695 of Lecture Notes inComputer Science pp 184ndash199 Springer 2007
14 BioMed Research International
[35] S Ramsey D Orrell and H Bolouri ldquoDizzy stochastic simula-tion of large-scale genetic regulatory networks (supplementarymaterial)rdquo Journal of Bioinformatics andComputational Biologyvol 3 no 2 pp 437ndash454 2005
[36] F Ciocchetta and J Hillston ldquoBio-PEPA an extension of theprocess algebra PEPA for biochemical networksrdquo ElectronicNotes in Theoretical Computer Science vol 194 no 3 pp 103ndash117 2008 Proceedings of the First Workshop ldquoFrom Biology toConcurrency and back (FBTC 2007)rdquo
[37] F Ciocchetta A Duguid S Gilmore M L Guerriero and JHillston ldquoThe Bio-PEPA tool suiterdquo in Proceedings of the 6thInternational Conference on the Quantitative Evaluation of Sys-tems (QEST rsquo09) pp 309ndash310 Budapest Hungary September2009
[38] T Ray and P Saini ldquoEngineering design optimization using aswarm with an intelligent information sharing among individ-ualsrdquo EngineeringOptimization vol 33 no 6 pp 735ndash748 2001
[39] P K Dhar T C Meng S Somani et al ldquoGrid Cellware thefirst grid-enabled tool for modelling and simulating cellularprocessesrdquo Bioinformatics vol 21 no 7 pp 1284ndash1287 2005
[40] J Barnat L Brim andD Safranek ldquoHigh-performance analysisof biological systems dynamicswith theDiVinEmodel checkerrdquoBriefings in Bioinformatics vol 11 no 3 pp 301ndash312 2010
[41] L R Petzold ldquoStochKit stochastic simulation kit web pagerdquo2009 httpwwwengineeringucsbedusimcseStochKitindexhtml
[42] G Gruenert B Ibrahim T Lenser M Lohel T Hinze andP Dittrich ldquoRule-based spatial modeling with diffusing geo-metrically constrained moleculesrdquo BMC Bioinformatics vol 11article 307 2010
[43] H Li and L Petzold ldquoEfficient parallelization of the stochasticsimulation algorithm for chemically reacting systems on thegraphics processing unitrdquo International Journal of High Perfor-mance Computing Applications vol 24 no 2 pp 107ndash116 2010
[44] G Klingbeil R Erban M Giles and P K MainildquoSTOCHSIMGPU parallel stochastic simulation for thesystems biology toolbox 2 for MATLABrdquo Bioinformatics vol27 no 8 pp 1170ndash1171 2011
[45] L Cardelli and P Gardner ldquoProcesses in spacerdquo in ProgramsProofs Processes F Ferreira B Lowe E Mayordomo and L MGomes Eds vol 6158 of Lecture Notes in Computer Science pp78ndash87 Springer 2010
[46] L Bioglio C Calcagno M Coppo et al ldquoA spatial calculus ofwrapped compartmentsrdquo in Proceedings of the 5th Workshop onMembrane Computing and Biologically Inspired Process Calculi(MeCBIC rsquo11) Fontainebleau France 2011
[47] C Calcagno M Coppo F Damiani et al ldquoModelling spatialinteractions in the arbuscular mycorrhizal symbiosis usingthe calculus of wrapped compartmentsrdquo in Proceedings of the3rd International Workshop on Computational Models for CellProcesses (CompMod rsquo11) vol 67 of Electronic Proceedings inTheoretical Computer Science pp 3ndash18 Aachen Germany 2011
[48] D Chen L Wang D Cui et al ldquoA massively parallel approachfor nonlinear interdependency analysis of multivariate sig-nals with GPGPUrdquo in Proceedings of the 26th IEEE Interna-tional Parallel and Distributed Processing SymposiumWorkshops(IPDPSW rsquo12) pp 1971ndash1978 May 2012
[49] CWC Simulator project 2010 httpsourceforgenetprojectscwcsimulator
[35] S. Ramsey, D. Orrell, and H. Bolouri, “Dizzy: stochastic simulation of large-scale genetic regulatory networks (supplementary material),” Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 437–454, 2005.
[36] F. Ciocchetta and J. Hillston, “Bio-PEPA: an extension of the process algebra PEPA for biochemical networks,” Electronic Notes in Theoretical Computer Science, vol. 194, no. 3, pp. 103–117, 2008, Proceedings of the First Workshop “From Biology to Concurrency and back (FBTC 2007)”.
[37] F. Ciocchetta, A. Duguid, S. Gilmore, M. L. Guerriero, and J. Hillston, “The Bio-PEPA tool suite,” in Proceedings of the 6th International Conference on the Quantitative Evaluation of Systems (QEST ’09), pp. 309–310, Budapest, Hungary, September 2009.
[38] T. Ray and P. Saini, “Engineering design optimization using a swarm with an intelligent information sharing among individuals,” Engineering Optimization, vol. 33, no. 6, pp. 735–748, 2001.
[39] P. K. Dhar, T. C. Meng, S. Somani et al., “Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes,” Bioinformatics, vol. 21, no. 7, pp. 1284–1287, 2005.
[40] J. Barnat, L. Brim, and D. Safranek, “High-performance analysis of biological systems dynamics with the DiVinE model checker,” Briefings in Bioinformatics, vol. 11, no. 3, pp. 301–312, 2010.
[41] L. R. Petzold, “StochKit: stochastic simulation kit web page,” 2009, http://www.engineering.ucsb.edu/~cse/StochKit/index.html.
[42] G. Gruenert, B. Ibrahim, T. Lenser, M. Lohel, T. Hinze, and P. Dittrich, “Rule-based spatial modeling with diffusing, geometrically constrained molecules,” BMC Bioinformatics, vol. 11, article 307, 2010.
[43] H. Li and L. Petzold, “Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit,” International Journal of High Performance Computing Applications, vol. 24, no. 2, pp. 107–116, 2010.
[44] G. Klingbeil, R. Erban, M. Giles, and P. K. Maini, “STOCHSIMGPU: parallel stochastic simulation for the systems biology toolbox 2 for MATLAB,” Bioinformatics, vol. 27, no. 8, pp. 1170–1171, 2011.
[45] L. Cardelli and P. Gardner, “Processes in space,” in Programs, Proofs, Processes, F. Ferreira, B. Lowe, E. Mayordomo, and L. M. Gomes, Eds., vol. 6158 of Lecture Notes in Computer Science, pp. 78–87, Springer, 2010.
[46] L. Bioglio, C. Calcagno, M. Coppo et al., “A spatial calculus of wrapped compartments,” in Proceedings of the 5th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC ’11), Fontainebleau, France, 2011.