Emulating biology: the virtual living organism

Gergely Bándi¹ and Jeremy J. Ramsden¹,²,*

¹ Cranfield University, Bedfordshire, MK43 0AL, UK
² Collegium Basilea (Institute of Advanced Study), 4002 Basel, Switzerland

© 2011 Collegium Basilea & AMSI
Journal of Biological Physics and Chemistry 11 (2011) 97–106
Received 11 August 2011; accepted 30 September 2011

Rapid prototyping tools exist in many fields of science and engineering, but not so much in biology, especially not general tools that can handle the diversity and complexity of the many spatial and temporal scales in nature. In this paper we introduce and describe an abstract tool that emulates biology and can create rapid prototypes of many diverse biological objects. It is anticipated that it can be used to provide support for theories, create “virtual” experiments that would be impossible to undertake or physically measure, and emulate in vivo experimentation in a highly cost-effective fashion.

* Corresponding author. E-mail: [email protected]

¹ In our usage, “simulation” means explicitly mimicking as much detail about a system as is known, whereas “emulation” means mimicking the external behaviour of the system (functional representation). Evidently the degree of abstraction is much greater in the latter compared with the former. A simulation attempts to be as accurate as possible even in terms of concrete details.

1. INTRODUCTION

Validating a concept in the biological sciences is not an easy task, since it involves experiments that can be hard to set up and outputs that can be hard to measure. To avoid these problems, researchers often use simulations that can more or less accurately predict the results of a real-world experiment.¹ Many such simulations, corresponding to the many fields and parts of biology, have been run. They are mostly realized with custom-built software that uses mathematical formulae specific to the problem at hand. One evident major weakness of such simulations is scale. It is usually easy to simulate a very local phenomenon with today’s computational capacity, but that capacity is quickly saturated as soon as the problem expands to a global scale, such as one that includes both fine intracellular parts and global organismal parts. Such global problems will likely exceed computational capacity if tackled with the same software and numerical accuracy as the local simulations; hence greater abstraction is required to enable the global ones to be completed.

Biology works on many scales, both spatial and temporal. An organism is made up of organs, organs consist of tissues, and tissues have cells inside them, to name just a few. Some phenomena take milliseconds or less to occur; others can take years to finish. In order to be able to investigate complex processes of nature, all of these scales have to be represented in the simulation; hence the goal is a multiscale simulation, and to create one a basic unit has to be identified as a starting point, from which all the scales are represented to the required extent.

When the goal is to simulate from the cellular level (or even below) to the organism level, the “middle–out” approach is a popular choice; see Walker & Southgate (2009) for examples. It considers cells as the basic unit of life and, correspondingly, makes them the basic unit of the simulation. The majority of these simulations use one of two tools to achieve their goals: cellular automata or multiagent systems.

Cellular automata (CA) are clearly useful tools for many problems, but by definition each cell only has a local view and effect; this can be a major restriction if, for example, part of the simulation can be made more abstract by skipping some local causes and making an effect more global. Also, CA have a strong spatial character that is rather rigid and therefore makes them harder to apply to some problems. It is not an absolute necessity, but most CA are discretely timed and synchronous, which makes it easier to obtain accurate results but makes the simulation much slower, as even unaffected cells need to be updated at every iteration.

In contrast, the independent agents of multiagent systems are better suited to multiscale simulations, as they are asynchronous by default and can be hierarchically or spatially organized (e.g., the agent-based, spatially organized multiscale simulation of inflammation (An, 2008)). There is no generally accepted definition of multiagent systems, but there are certain attributes of agents used in most of the working definitions. Wooldridge and Jennings’ (1995) definition of an agent within such a system includes the following: autonomy, social ability, reactivity and pro-activeness. These properties do indeed apply to biological cells and, hence, are useful if emulating nature is the goal. There is, however, another property, namely strict locality in the ability of agents to sense their environment, which, albeit being true to nature (as no cell in a multicellular



system has a truly global view (Panait & Luke, 2005)), will doubtless make some desirable features and optimizations impossible or suboptimal. In the interests of being as widely optimal as possible, which should be one of the main goals of a generic emulation system, agents with global views will therefore be allowed in the virtual living organism (i.e., the novel framework we present in this paper). In every other sense, the system described here is a multiagent system.

Abstraction is a requirement for crossing the complexity barrier. Even a degree of dynamic abstraction might be necessary. Think of a simulation of the effect of an anticancer drug or therapy that, in league with the immune system, can prevent the growth of a tumour or metastasis. For this simulation, as a minimum, a tumour inside a tissue and a working immune system would be required. If the effect-mechanism uses the circulation of the lymphatic system, that might also be a necessity. Other parts of the organism are not required; even those parts simulated do not need to have all their properties included. As a result, we arrive at a system where the details, the “how” part, are mostly unimportant; the “what” part, the essence of what is happening inside, is what counts. Such a problem is better suited to an emulation rather than a simulation.¹

The emulation consists of one or more black boxes, each of which mimics a certain function. At some levels the system is detailed enough to emulate with sufficient accuracy to draw useful conclusions. This type of emulation is well suited to conceptual validation. It can help with behavioural analysis by demonstrating the effects in a more complex system, and it can generate statistics of various properties. It can be used to determine the validity of biological concepts that could in principle (but with much more trouble and expense) be proven in local or laboratory environments. The emulator can then be quickly extended to complex (simulated in vivo) environments, far more readily than a real experiment.

One of the main barriers to the realization of such numerical models (emulations) is the effort needed to program a general-purpose computer to run them (even more effort would be needed to construct a dedicated computer). The ability to rapidly create these emulations would enable the user to swiftly test numerous theories and variants with changing core details. Simulations necessarily have more rigid approaches because of the requirement for accuracy of detail and many other constraints, while every feature of a good rapid prototyping emulator tool can be changed without much effort. If the emulator has interchangeable parts at many levels, a library can be created, shared and reused; for example, test organisms or organs can be customized or extended.

Our solution for creating the desired rapid prototyping tool is the virtual living organism. An introduction to the basic design principles and associated definitions can be found in our previous paper (Bándi & Ramsden, 2010). A key feature (required for reusability) is to have well-designed interfaces so that the parts (i.e., the blocks used to build up the organism) can be exchanged at many levels and still be compatible with the entire system, without making any limiting restrictions.

In the remainder of this paper we introduce the virtual living organism (VLO) library: first its hierarchical structure, then the means of communication between cells; some more advanced functions, such as the creation process, self-replication and optimization of cells, and death, are also described.

2. THE VLO LIBRARY

2.1 System hierarchy

The VLO library is an implementation of a general-purpose, middle–out model of a cell-based virtual living organism. The system does all the work at the cellular level. Every cell is a separate program thread (an individual line of execution within a program) on its own. The number of threads in the virtual organism equals the number of cells plus one (a traditional program thread starts the first cell, which is responsible for the creation of the organism). This may seem excessive, as in reality the maximum number of concurrently running threads is the number of CPU cores in the computer, but it seems obvious that this is nature’s way, and it is the direction in which computer technology is going, with increasing numbers of processor cores and distributed systems. A thread-pooling system could easily be integrated into the virtual organism to better fit the task at hand.

The cell is also an object in the traditional programming sense, as it has its own set of variables and functions. All the cells are derived from a class named Cell. This parent class has all the default functionality that most cells use during their lifetime already implemented. Each cell has a special purpose in “life”, which will often be a small particular task that correlates with the limited functionality of an individual biological cell. Of course, if required, each cell can be as complex as the programmer makes it.
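The thread-per-cell design described above can be sketched as follows. Only the class name Cell comes from the paper; every other identifier (insertJob, processJob, AdderCell, and the use of an integer as a stand-in for the opaque job structure) is our own illustrative choice, not the VLO library’s actual API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <queue>
#include <thread>

// Minimal sketch of the thread-per-cell idea: each cell owns its own
// line of execution and a queue of incoming jobs.
class Cell {
public:
    virtual ~Cell() { stop(); }          // callers should stop() before the
                                         // derived object is destroyed
    void start() { worker = std::thread([this] { run(); }); }
    void stop() {
        alive = false;
        if (worker.joinable()) worker.join();
    }
    void insertJob(int job) {            // an int stands in for the opaque job
        std::lock_guard<std::mutex> lk(m);
        jobs.push(job);
    }
protected:
    virtual void processJob(int job) = 0; // each cell type has its own small task
private:
    void run() {                          // the cell's individual line of execution
        while (alive) {
            int job = 0;
            bool have = false;
            {
                std::lock_guard<std::mutex> lk(m);
                if (!jobs.empty()) { job = jobs.front(); jobs.pop(); have = true; }
            }
            if (have) processJob(job);
            else std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    std::thread worker;
    std::atomic<bool> alive{true};
    std::mutex m;
    std::queue<int> jobs;
};

// A derived cell with one small particular task: summing the jobs it receives.
class AdderCell : public Cell {
public:
    std::atomic<int> total{0};
protected:
    void processJob(int job) override { total += job; }
};
```

A thread-pooling variant, as the text notes, would only change how run() is scheduled, not this interface.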

While cells correspond to active physical (biological) entities, all the higher levels are merely logical; that is, without a thread and only having “services” and shared data. Services can be thought of as callable functions that implement the particular uses that are best achieved at the particular level.

The next level above cells is the tissue. Every cell inside a multicellular organism is inside a tissue (Figure 1).


This connexion is realized in both directions: every cell has a pointer to its “parent” tissue, and every tissue has a chain of links that point to its cells. A tissue consists of cells that work closely together. In a traditional programming sense, this can be thought of as a bigger function that calls smaller functions. In this case, there is a main cell that more or less corresponds to the bigger function; this cell uses the “services” of the smaller cells, and those might use other, even smaller ones to fulfil their tasks. Cells communicate their needs and results by giving out jobs to other types of cells. Details of this can be found in Section 2.2. Cells also have the ability to self-replicate; details of this ability can be found in Section 2.4. With this information at hand, the use of this cell-based structure starts to show some major advantages: as in nature, the cellular structure can achieve a self-balancing effect in terms of population, as there can be an optimal number of a certain type of cell: fewer would be insufficient for completing the task at hand and more would be a waste of resources. For example, blood cells are organized so that the more of them there are, the better the result they can achieve (up to a point); hence, when needed, more blood cells are created (in nature). Conversely, if a cell is too slow, it creates a bottleneck and brings the whole system out of balance.² In a typical computer, the number of processor cores (and thus the number of physically concurrent operations) is very limited; hence, without any extra prioritizing, a similar result can be achieved only by creating enough copies of a cell type that the number of the slow cells is increased relative to the rest of the organism. This is a possible way of self-optimization to eliminate bottlenecks. For example, if there are two copies of type A cells in an organism of 100 cells that are all working at the same time, in theory it can be expected that about 2% of the processing power of the computer will be allocated to type A cells (in practice the allocation may depend on other factors as well). If each non-A cell receives roughly equal numbers of jobs per second, but the type A cells receive five times more, then even if every other cell can keep up with the workload, type A will soon be behind schedule. But if it could multiply its copies to 11 out of the then 109 cells, it would use about 10% of the processing power of the whole system, which would compensate for the increased workload.
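The arithmetic in this argument is easy to check. The sketch below (our own code, with illustrative names) reproduces the example: 2 copies among 100 cells get about 2% of the CPU under roughly fair scheduling, and growing to 11 copies of the then 109 cells yields about 10%, the fivefold share needed to absorb a fivefold workload.

```cpp
#include <cassert>
#include <cmath>

// Under roughly fair scheduling, a cell type's share of CPU time is
// approximately its share of the running threads.
double cpuShare(int copies, int totalCells) {
    return static_cast<double>(copies) / totalCells;
}

// Copies needed so that the type's share becomes `factor` times its
// original share; the organism grows as the copies are added.
// Solves n / (n + others) = factor * original / (original + others) for n.
int copiesNeeded(int original, int others, double factor) {
    double target = factor * original / (original + others);
    return static_cast<int>(std::ceil(target * others / (1.0 - target)));
}
```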

In essence, a tissue consists of sets of different types of cells and all their job queues (where all the incoming jobs are waiting). Aside from this, it has functions that help create and annihilate cells inside the tissue, functions that return global information such as the number of cells inside a type group or details of the jobs in the queue, as well as any needed maintenance and secure use of all accessible stored information. Tissues also support functions used mainly by special types of cells, like a cell-moving function used by blood cells to be able to move between tissues and organs.

Tissues are stored in organs. Every organ has a main tissue that realizes the main function of that organ, and several support tissues for maintenance and any (predefined) service required by any cell in the organ to function normally. Every tissue has a pointer to its organ, and every organ has a pointer to its sole main tissue and a set of pointers to any support tissues. This results in a hierarchical tree structure.
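A minimal sketch of this pointer structure; only the level names (cell, tissue, organ, organism) come from the paper, and all field names are our own.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Tissue;
struct Organ;

// Every cell points back to its "parent" tissue.
struct CellRecord { Tissue* parent = nullptr; };

struct Tissue {
    Organ* organ = nullptr;                          // pointer to the owning organ
    std::vector<std::unique_ptr<CellRecord>> cells;  // chain of links to its cells
};

struct Organ {
    std::unique_ptr<Tissue> mainTissue;              // exactly one main tissue
    std::vector<std::unique_ptr<Tissue>> supportTissues;
};

struct Organism { std::vector<std::unique_ptr<Organ>> organs; };
```

Following the pointers in either direction walks the hierarchical tree the text describes.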

Organs are necessary structural elements. Cells usually work closely together, and a tissue is enough to facilitate this. But there might be times when cells access indirect services, like a globally synchronized variable, which needs to be actively updated; this service must be performed by a cell. The synchronizing cell (whose function is uniquely synchronization) is strategically different from the other working cells (because its function is to maintain the environment), so it is not logical to put it in the same tissue. This is why we need the support tissues; they have all the maintenance and support services that make the work of the other cells possible. On a strategic or functional level, an organ can be thought of as not much more than its main tissue extended with the necessary, but most of the time unobtrusive, services.

The topmost structural level is the organism. An organism consists of organs (Figure 2). An organism is a process from the operating system’s point of view.

² Homeostasis is of course strongly operating; for example, the liver of most adult humans is considered to have a consistent size, but there is adaptation according to need: the heart may become larger and more powerful if required.

Figure 1. Relationship between tissues and cells. Standard UML (2010) notation is used, showing that there is at most one tissue for every cell.


Figure 2. Structure of the virtual living organism (UML notation). Organisms consist of multiple organs, but every organ is inside one organism. Organs have exactly one main tissue and can have several support tissues. Each tissue consists of at least one cell. The relationship between organism, organs and tissues is called composition (i.e., they all have the same lifespan), while the relationship between tissues and cells is called aggregation (i.e., their lifespans are independent of each other) (UML, 2010).

2.2 Communications

In nature, there are many types of communication between the cells inside an organism. The vast majority of this communication is done via chemical signalling (Campbell & Reece, 2007). The virtual organism has the same types of signals as we see in biology. For the different communication types, see Table 1.

Table 1. Communication types and their properties.

Communication type   Target and mode of transportationᵃ
Endocrine            Cell in distant target tissue, reached using the circulatory system
Paracrine            Neighbouring cell within tissue or organ
Autocrine            Same cell, transported outside of the cell
Intracrine           Same cell, transported within the cell

ᵃ From Nussey & Whitehead (2001).

In the VLO, the “chemical” used for communication is the job data structure. Intracrine signals do not leave the cell where they were created, and are used for conveying information between the parts inside a cell; since the smallest part simulated inside this virtual organism is the cell, at present we do not explicitly need to use this type of signalling, and can think of it as being represented by the use of variables inside the program of the cell.

An autocrine signal acts on the same cell that created it, so this is realized by sending a job for the same type of cell that created it (through its own tissue). This is widely used as part of the death (apoptosis) process, where a signal is embedded into the job structure, and when a cell processes this job from the queue, it re-emits the signal so that other cells of the same type can process it too.

Paracrine signalling is used to reach any type of nearby cell (Figure 3). In biology the intercellular substance inside the tissue is used to convey the information. In the virtual organism this is implemented as a direct way for a cell to reach other cells inside the same tissue. The cell inserts a job for any kind of cell inside the tissue by using the job-inserting service of the tissue directly. This service securely inserts the new job into the target cell’s job queue. This is expected to be the most widely used method of signalling inside a typical virtual organism, as it is optimized to be very fast.
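A sketch of what such a job-inserting service might look like, with a mutex standing in for the “secure” insertion; all identifiers besides the level name Tissue are illustrative, not the library’s actual API.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <queue>
#include <string>

// The tissue keeps one job queue per cell type; paracrine signalling is a
// direct, locked insertion into the target type's queue.
class Tissue {
public:
    // Securely insert a job for the target cell type inside this tissue.
    void insertJob(const std::string& targetType, const std::string& job) {
        std::lock_guard<std::mutex> lk(m);
        queues[targetType].push(job);
    }
    // Cells of `type` pop their next job from here; returns false when empty.
    bool takeJob(const std::string& type, std::string& out) {
        std::lock_guard<std::mutex> lk(m);
        auto it = queues.find(type);
        if (it == queues.end() || it->second.empty()) return false;
        out = it->second.front();
        it->second.pop();
        return true;
    }
private:
    std::mutex m;
    std::map<std::string, std::queue<std::string>> queues; // one queue per cell type
};
```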

When the target cell type is outside the tissue of the message sender, the endocrine signalling system is used. This makes use of the vast circulatory system inside an organism, which can essentially reach all of the tissues. This system has two layers: the first connects all the organs, and it is implemented as a pointer in every organ that points to the next one (Figure 4). The second layer circulates inside the organ, starting with the main tissue of the organ, as it is expected that the majority of the communication inside the organism will concern cells inside the main tissue of the organ. Inside every tissue, there is a pointer similar to the one in the organs, pointing to the next tissue of the second layer.

The circulatory system is not to be used by just any type of cell; it is almost exclusively used by the blood cells, which can be given a job with details of the target and the job structure as the message. The address must include the target cell type, and can include the name of the organ and/or the name of the tissue that has the cell inside it. If there is a specific address, then the blood cell can skip an organ that is not the target by not going through its inner link, and can detect whether it went to the address and the target was not there (i.e., the target tissue did not recognize the target cell type). When the blood cell finds the target cell type, either inside the specific tissue or organ (when given), or otherwise after going round the circulatory system and


Figure 3. Autocrine and paracrine signalling (UML notation), showing that an autocrine signal inserts the job into the job queue of the sender’s own cell type via its tissue, and a paracrine signal inserts the job into the recipient’s job queue inside the same tissue.

Figure 4. The circulatory system (UML notation). The two-level circulatory system consists of an inner level that starts at the main tissue of the organ and goes around in a predefined order, and an outer level that goes around the organs.


searching for the target cell type (by “asking” each tissue if it has that type of cell), it inserts the message as a job into the target and continues its work.
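The routing walk can be sketched as follows, flattening the pointer rings into indexed containers for brevity; every identifier is illustrative rather than the library’s actual API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// The outer ring links the organs; the inner ring visits each organ's
// tissues starting at the main tissue (tissues[0] here).
struct Tissue {
    std::string name;
    std::set<std::string> cellTypes;   // which cell types live here
    std::vector<std::string> jobs;     // the tissue's (flattened) job queues
};

struct Organ {
    std::string name;
    std::vector<Tissue> tissues;       // tissues[0] is the main tissue
};

// A blood cell's trip: deliver one job to the first tissue housing the
// target cell type, starting from organ `start` and going round at most once.
bool deliver(std::vector<Organ>& ring, std::size_t start,
             const std::string& targetType, const std::string& job) {
    for (std::size_t i = 0; i < ring.size(); ++i) {
        Organ& organ = ring[(start + i) % ring.size()];   // outer ring
        for (Tissue& t : organ.tissues) {                 // inner ring
            if (t.cellTypes.count(targetType)) {          // "ask" the tissue
                t.jobs.push_back(job);
                return true;
            }
        }
    }
    return false;   // went all the way round without finding the type
}
```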

In nature, most of the chemical signals travel as dissolved molecules inside the bloodstream. In this emulation, there is no physics simulation embedded, so no passive entities (like dissolved molecules) can move by themselves; only active entities (like blood cells) can interact with them. As long as this inaccuracy does not cause a problem in a given emulation task, the system is rather similar to what actually happens at a high level in nature. As messages appear inside the tissue fluid, they are indeed like dissolved molecules. In the VLO these molecules get into the bloodstream automatically, by a function in the tissue called by a cell. As they travel around the circulatory system, they selectively enter tissues and bind to the target cell by being put into its work queue. It is easy to see the accuracy of this model if we imagine a copy of the BloodCell type named DissolvedMolecule: it could do the same routines as the blood cells and use the circulatory system in the same way. This would be closer to nature, but having the BloodCell undertake the transport of all molecules makes the system easier to use.

The present implementation has two types of blood cell: one is called BloodCell, used as described above; the other is called SpecialBloodCell, used to distribute information to all cells of the same type inside the organism, mimicking a similar process in nature (Valitutti et al., 1995). This is done by going around the circulatory system, giving out the job to every job queue that corresponds to the target cell type, and going on until it reaches the starting point, when it gives a job to the originating cell notifying it of the completion of the job. This can be used for synchronizing certain events or values inside the whole organism, like a signal that is given to cells at the beginning to notify them that it is safe to start because a consistent environment has been set up. The SpecialBloodCell has an option to wait at every step for a new job with which to continue its journey. This allows for gathering data from all the cells of the same type. It is intended to be used with an empty set or similar type that is extended at every step. This is used, for example, to gather all the type names of cells present in the organism after birth.
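The two behaviours described above, broadcasting to every matching queue and gathering data step by step, might be sketched like this (the circulatory ring is flattened to a vector of tissues; apart from the behaviours themselves, every name is illustrative).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Tissue {
    std::set<std::string> cellTypes;
    std::vector<std::string> jobs;
};

// Broadcast: one round trip, giving the job to every tissue that hosts the
// target type. Returns how many tissues received it (the "completion" info
// reported back to the originating cell).
int broadcast(std::vector<Tissue>& ring, const std::string& targetType,
              const std::string& job) {
    int delivered = 0;
    for (Tissue& t : ring)
        if (t.cellTypes.count(targetType)) { t.jobs.push_back(job); ++delivered; }
    return delivered;
}

// Gather: start with an empty set and extend it at every step of the trip;
// here it collects all the cell type names present in the organism.
std::set<std::string> gatherCellTypes(const std::vector<Tissue>& ring) {
    std::set<std::string> all;
    for (const Tissue& t : ring) all.insert(t.cellTypes.begin(), t.cellTypes.end());
    return all;
}
```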

Job structures are completely opaque, meaning that only the creator knows the content; they can be created by calling the jobOrder method of the target cell type with all the data that the cell needs for doing its job. This returns the opaque structure, which can be sent with any of the communication methods described in this section.

There is a method of returning the results of a job that is implemented here for the sake of simplicity, and not necessarily as an accurate representation of a real biological process. Inside the job order, it is possible to insert a special type of pointer that contains the address to which the results should be sent directly. The sender can then wait until the results are back and continue its work using them. This mechanism is not a problem in terms of biological accuracy, as it need not be used; the same result can be achieved if the sender cell is split into two separate types of cell, one doing the work only before sending the job, and the other doing the work only after receiving the results. This version is biologically more accurate and achieves the same result, but is harder to follow and handle; hence, as a simplification, the special pointer can be used when the inaccuracy does not pose a problem.
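One way to render this special results pointer in modern C++ is with a std::promise embedded in the job order; the worker writes through it and the sender waits on the matching future. This is our own interpretation of the mechanism, not the paper’s implementation.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <thread>

// The job order carries the data plus the "special pointer" telling the
// worker where the results should be sent directly.
struct Job {
    std::string data;
    std::promise<int>* resultSlot;
};

// Stands in for the receiving cell's thread: here the "work" is just
// measuring the payload.
void workerCell(Job job) {
    job.resultSlot->set_value(static_cast<int>(job.data.size()));
}

// The sender embeds the slot in the job, then waits until the results are
// back before continuing its work.
int sendAndWait(const std::string& data) {
    std::promise<int> slot;
    std::future<int> result = slot.get_future();
    std::thread worker(workerCell, Job{data, &slot});
    int value = result.get();   // blocks until the worker sets the value
    worker.join();
    return value;
}
```

The biologically more accurate two-cell variant mentioned above would simply omit the slot and send the results back as a fresh job.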

2.3 Creation (Begetting)

Creation (begetting) here means how the virtual organism comes into existence. It is the process that begins with the user starting a program and ends with a virtual organism in existence on the same computer. This process is partly like birth in biology, but unlike complex biological beings, which are first in an immature state for a significant proportion of their lives, the virtual organism should become fully operational in as short a time as possible. This means that unless it is a requirement for a simulation to have accurate birth and maturing processes, the emulation will benefit from a creation process optimized for speed.

For building the structure of the organism, there are two possibilities: a top–down or a bottom–up approach. The former would mean that first we create the organism, then all the organs and tissues, and finally the cells within them. Most of the time this is the approach used in programming, since it means that programmers can create self-contained structures; for example, an organism that automatically creates the tissues it needs, whose tissues in turn create the cells they need. While this would be quite useful, and, since everything above the cells is almost purely structural, the biological equivalent would be more or less physically plausible, this is not how it is done in nature, and it would be hard to extend to a more accurate emulation. So the approach to implement must be a bottom–up one, or rather a hybrid approach that starts with a bottom-to-top part and still makes it possible to create self-contained structures.

The first step must be to put one cell into existence (i.e., as if by divine fiat), and this must be a special kind of cell whose main purpose is to create a virtual biological structure. This cell is called a DivineCell. When it is created from the program, it does not have any structure around it, so at this stage we speak of a single-cell


organism. The first thing this cell has to do is to create itsown immediate environment. It creates a tissue, anorgan, an organism and sets up the correct relations.Then it is time to create the other custom organs. Nowthese can be self-contained, meaning that when aspecific organ is created, it automatically creates andsets up all the tissues inside it. And as the tissues arecreated, a specialized DivineCell is automatically createdinside them that will be responsible for setting up andmaintaining its own part of the organism (meaning atissue and whatever is inside it) (Figure 5). However if

these independent DivineCells would immediately startcreating all kinds of worker cells, these worker cellswould try to start using services of other parts that mightnot exist at this point, for example because those cellsthat these services rely upon do not exist yet. This bringsthe need of a synchronization process after creating thecells, but before making them active. This synchroniza-tion process has several steps and relies heavily on theSpecialBloodCell introduced in the communicationssection (§2.2) to gather and distribute data, and to sendout signals for time synchronization.
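The two-phase bootstrap described above (create the whole structure first, activate worker cells only afterwards) can be sketched roughly as follows. This is a minimal illustration: apart from the DivineCell concept and the tissue/organ/organism hierarchy taken from the text, all names and the Event-based barrier are assumptions, not the library's actual API.

```python
# Illustrative sketch: a DivineCell bootstraps its environment bottom-up
# and only releases the activation barrier once every part exists.
import threading

class Tissue:
    def __init__(self, name):
        self.name = name
        self.cells = []

class Organ:
    def __init__(self, name):
        self.name = name
        self.tissues = []

class Organism:
    def __init__(self):
        self.organs = []

class DivineCell:
    def __init__(self):
        self.ready = threading.Event()   # creation/activation barrier

    def bootstrap(self, organ_names):
        # Step 1: create the immediate environment and wire up relations.
        organism = Organism()
        for name in organ_names:
            organ = Organ(name)                  # organs are self-contained:
            organ.tissues.append(Tissue(name))   # each creates its tissues
            organism.organs.append(organ)
        # Step 2: only now may worker cells start using each other's services.
        self.ready.set()
        return organism

dc = DivineCell()
org = dc.bootstrap(["heart", "liver"])
assert dc.ready.is_set() and len(org.organs) == 2
```

In a threaded setting, worker cells would call `dc.ready.wait()` before their first job, which is one simple way to realize the synchronization step.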

Figure 5. The creation process. (a) The main DivineCell is created (physical level); (b) the organs are created (structural level); (c) the tissue DivineCells are created (physical level); (d) the tissues and other cells are created (structural and physical levels).

After creation, the DivineCells act as a sort of registry for the cells they are responsible for. This is needed for two reasons. First, as opposed to nature, the virtual organism's death process must end in a consistent state, meaning that there should not be any data loss and resources must be freed. This means that when a cell signals that it is required by the user or by the main task of the organism to shut down (die), it is not acceptable to kill all the cells where they stand; they must be informed of the shutdown request and one must wait for them to reach a consistent state (see §2.5 for details). This requires a registry of all cells in order to be able to wait for all of them before the main process is ended. The second reason why a registry is needed is that when a cell dies because it is not needed any more (see details in the self-replication section, §2.4), it cannot fully delete itself and free up all the resources it used. It is the DivineCell's responsibility to finish the death process of the cells it is responsible for.
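The registry role can be sketched as below. The class and method names are illustrative assumptions; the point is only the pattern: cells are recorded at creation, informed of shutdown, waited upon, and then cleaned up by their DivineCell.

```python
# Hypothetical sketch of the registry described above: the DivineCell
# records every cell it creates so that, at shutdown, it can wait for
# each one to reach a consistent state and then free its resources.
class Cell:
    def __init__(self, name):
        self.name = name
        self.consistent = False

    def finish_work(self):
        # flush queues, persist state, etc., then declare consistency
        self.consistent = True

class DivineCell:
    def __init__(self):
        self.registry = []

    def create_cell(self, name):
        cell = Cell(name)
        self.registry.append(cell)    # registered at birth
        return cell

    def shut_down(self):
        for cell in self.registry:    # inform each cell, then wait for it
            cell.finish_work()
        assert all(c.consistent for c in self.registry)
        self.registry.clear()         # finish the death process for them
```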

2.4 Self-replication and optimization

One very important property of a biological system is its capacity for self-replication at the cellular level. This counts as an optimization, as more cells can mean better performance, but it also means more resource consumption. An increased number of any particular cell type can result in increased performance of the function this cell provides in the system. This comes about by increasing the overall percentage of this particular cell type relative to all the cells in the system, which thus gets proportionally more resources (e.g., CPU time) than before. Conveniently, and without any additional work, the increased resource consumption applies here as well, since every new thread increases memory consumption as well as the overhead caused by the increased amount of thread switching for the CPU. Though this is a negative effect, it makes the emulation of biology somewhat more accurate.



The implementation of self-replication can be customized for every cell, although there is a basic built-in routine for general use. In principle, a cell needs to replicate itself when the number of jobs in its queue is growing constantly; this means that it cannot keep up with the volume of jobs it receives. It should also make sure not to overshoot: if it replicated itself too many times, the cells would finish their jobs very quickly and demand for them would fall, so many of them would need to die out in order not to take up resources unnecessarily. If the demand were lower than the combined productivity of the highest number of cells, the result would be a fluctuation in the cells' numbers, which is a waste of resources. Another problem arises from the fact that within a single type of cell there is a decentralization of strategic control: all the cells are trying to optimize the whole collective without much coördination, just with the use of a few shared variables.
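The built-in heuristic described above might be sketched as a simple decision rule (the function and thresholds here are illustrative assumptions, not the library's actual routine): replicate while the queue keeps growing, let surplus cells die when demand collapses.

```python
# Hedged sketch of the replication heuristic: a cell replicates when its
# job queue grows monotonically, and dies off when the queue runs dry.
def decide(queue_len_history, population, max_population):
    """Return 'replicate', 'die' or 'stay' for one cell of a given type."""
    # "growing constantly": every sample larger than the previous one
    growing = len(queue_len_history) > 1 and all(
        b > a for a, b in zip(queue_len_history, queue_len_history[1:])
    )
    if growing and population < max_population:
        return "replicate"
    if queue_len_history[-1] == 0 and population > 1:
        return "die"      # shrink rather than hold idle resources
    return "stay"
```

A damping term (e.g., requiring several consecutive empty samples before dying) would reduce the number fluctuation the text warns about; that refinement is omitted here for brevity.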

The optimization function only runs if enough time has elapsed, and this interval increases with the number of cells of the same type, thus reducing failed attempts, since only one cell of a given type may work on self-replication at any one time in order to avoid unnecessary complications. The self-replication process also takes into account the approximate number of cells globally inside the organism, so as not to exceed the physical limitations of the hardware; this is just as well, since the same tendency can be observed in nature because of the finiteness of resources. As a result of how memory allocation works in multithreaded systems, the number of cells inside a 32-bit system should not exceed about 1000, and in a 64-bit system about 10000. These approximate values represent the current test system running on a Windows machine. If the task at hand requires more cells than these numbers, thread pooling, which effectively erases these limits, should be implemented. A possible alternative is to extend one virtual organism across a distributed system, where modified blood cells could easily travel across the network.
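The two throttling rules just described (a back-off interval that grows with the population of the same cell type, and a global population cap reflecting the hardware limit) could be sketched as follows. The linear back-off and all constants are illustrative assumptions; only the 1000/10000 figures come from the text.

```python
# Hedged sketch of the self-replication throttles described above.
def may_optimize(now, last_run, same_type_count, base_interval=1.0):
    """Gate the optimization routine: the waiting interval grows with the
    number of same-type cells, reducing contention and failed attempts."""
    return now - last_run >= base_interval * same_type_count

def under_global_cap(total_cells, bits=32):
    """Approximate hardware-imposed population limit (per the text:
    roughly 1000 cells on 32-bit systems, 10000 on 64-bit)."""
    cap = 1000 if bits == 32 else 10000
    return total_cells < cap
```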

2.5 Death

Death in a virtual organism has a somewhat different meaning than in biology, because of our expectations for it. In nature, death can be thought of as the event when the living system leaves its dynamical equilibrium, its consistent state of being alive. It is easy to simulate the event of death: we only need to cut the virtual system off from its resources (e.g., terminate the main process containing all the threads, or just terminate one or more threads and not worry about any consequences). The only problem is that with computer programs we demand that they are always in a consistent state, especially before termination. In practice this can mean saving data to a persistent location, writing out cached information to disk (think of logging data in the form of jobs waiting in the job queue of the blood cell in a tissue, waiting to be delivered to a logger cell somewhere else), or simply waiting to process the last of the data when a data factory cell (i.e., one that only generates data and does not process any inputs) receives the death signal after it has finished creating all the data to process. In its details, this form of termination works more like a hibernation process in nature, in which preparations need to be made for switching to an alternative state of unproductive consistency. At the same time it differs from hibernation in that hibernation has a semipersistent internal state that remains unchanged after coming out of hibernation, whereas a virtual organism is reborn after expiration and, even if it uses a set of persistently stored information, it is essentially a new specimen with no part of the customized inner state of the expired specimen.
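The consistency requirement on death amounts to a drain-before-exit discipline, sketched below (class and method names are assumptions for illustration): on receiving the death signal a cell stops accepting new jobs but finishes everything already queued, as in the logger-job example above.

```python
# Illustrative sketch of a "consistent" cell death: refuse new work,
# drain the pending job queue, only then count as dead.
from collections import deque

class WorkerCell:
    def __init__(self):
        self.queue = deque()
        self.processed = []
        self.alive = True

    def submit(self, job):
        if self.alive:               # a dying cell accepts no new jobs
            self.queue.append(job)

    def receive_death_signal(self):
        self.alive = False
        while self.queue:            # ...but drains what is still pending
            self.processed.append(self.queue.popleft())
```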

As a result of the consistency requirement, the death process must have multiple stages, an approach somewhat similar to the one operating systems use for shutting down. The first step is the signalling of the shutdown request, which originates from a cell. This cell can emit the signal either as a result of a user request or because of the logic programmed into it. The request must be sent to a centralized location, preferably to the main DivineCell via the local DivineCell inside the cell's tissue. When the request arrives at its destination, it is transformed into a shutdown request. The shutdown request is a hidden flag inside all the job structures (packaged inside a dummy job request), and can be handled as a job by any of the cells inside the system. This shutdown request is then propagated to all the DivineCells with the use of a SpecialBloodCell, whence it is further propagated to all the cells inside the organism (Figure 6). At this point, a system-wide synchronization effort is required, working closely with all the registries, to make certain that there is no state in which one kind of cell is terminated while another is waiting for a response from it, not knowing that the job request is inside a queue that can no longer be processed. After all the cells are dead, and the DivineCells have reported this fact back to the main DivineCell, all the DivineCells can be killed and the process terminated.
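The propagation path can be sketched as a small class hierarchy (all names here are illustrative, modelled loosely on Figure 6 rather than taken from the library): the main DivineCell wraps the hidden shutdown flag in a dummy job, the tissue DivineCells broadcast it, and each reports back once its cells are dead.

```python
# Hedged sketch of the staged shutdown: main DivineCell -> tissue
# DivineCells -> cells, with death reported back up the hierarchy.
class Cell:
    def __init__(self):
        self.alive = True

    def handle(self, job):
        if job.get("shutdown"):      # the hidden flag inside a dummy job
            self.alive = False       # reach a consistent state, then die

class TissueDivineCell:
    def __init__(self, cells):
        self.cells = cells

    def broadcast(self, job):
        for cell in self.cells:
            cell.handle(job)
        # report whether every cell in this tissue is now dead
        return all(not c.alive for c in self.cells)

class MainDivineCell:
    def __init__(self, tissue_dcs):
        self.tissue_dcs = tissue_dcs

    def request_shutdown(self):
        job = {"dummy": True, "shutdown": True}
        reports = [dc.broadcast(job) for dc in self.tissue_dcs]
        return all(reports)          # only then may the process terminate
```

A real implementation would interleave this with the registry synchronization described above, so that no cell dies while another still awaits its response.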

3. CRITIQUE

Possible problems to which we should be alert include:

• It is unclear to what extent structure shall have primacy over behaviour of the model. Taking a more abstract route to modelling means losing details and possibly accuracy; high precision is not required for many tasks and can be traded off for other benefits.



• Unrealistic structures in the model represent a threat to accuracy. The model is created from a very abstract starting point and is refined as needed by the task to which it is applied. The framework is designed to make it easy to exchange parts within the library, or even to change the core concepts if necessary; for example, if a certain structure is revealed as overly abstract and, hence, might be considered unrealistic.

• The conceptual model is overly influenced by software emphasis. General abstractions try to be as open as possible, and thus result in a more accurate system in no way distorted by the views of any other discipline. In the present case, however, the goal is to create a model of biology in a software environment (as in computer-assisted simulation), and to make this creation process as fast as possible (i.e., rapid prototyping); it is intended to occupy the middle ground between computer science and biology. This goal does require a certain level of software emphasis to be applied. The places where this can be allowed, however, can be decided on a case-by-case basis, as abstraction should be at the greatest level allowed by the task. Certain concepts and parts in the present version, such as the DivineCell and the process of creation of the organism, might seem to owe more to software technology than to biology, but they are not indispensable and all of them can be developed further or completely replaced.

4. CONCLUSIONS

The VLO library constitutes a highly effective rapid prototyping system: creating a virtual version of cancer took one of us less than a week of design and coding (full details in Bándi & Ramsden, to be published). The cancer example also shows that parts of an already existing emulation can easily be reused; the organism that was the base of the virtual cancer was one of the organisms used to test the functionality of the library.

The virtual cancer example also shows how a biological process can be imitated with a new type of software algorithm that generates (or modifies) parts of itself, somewhat similar to genetic programming but with different principles.

The realization of such an emulation brings the emulator closer to software technologies than current simulations are. This closeness should enable clearer communication of ideas between biologists and software engineers, hence enabling closer interdisciplinary coöperation and helping both biology and software engineering to draw new ideas and inspiration from the other discipline. When creating the basic functions of cells and other entities it becomes apparent that some of the requirements of biology and software technology are not far from each other, and solutions inside the two domains are often based on the same principles, even if they are not exactly equivalent. This should make it possible to find solutions to current big problems by recreating the problem in the proposed common environment and giving it to a professional of one or the other discipline (or both) who can understand it and apply his or her own knowledge to it (knowledge capture). Many biologically inspired approaches are already being used to optimize engineering systems (Banzhaf et al., 2006) and at the same time their development has also been very beneficial to software engineering.

Figure 6. Path of the shutdown request. The shutdown signal originates at a cell (lower right), then it travels (1) to the local DivineCell (DC) and is transferred to the main DivineCell (2). There it is transformed into another type of shutdown signal that is distributed to all the tissue DivineCells (3) and further distributed by them to all the cells for which they are responsible (4).

ACKNOWLEDGMENT

The authors gratefully acknowledge financial support from the Cranfield Innovative Manufacturing Research Centre (IMRC) (funded by the UK Engineering and Physical Sciences Research Council) through the project "Design optimisation of Micro/Nano Systems Biomimetically" (IMRC-135).

REFERENCES

UML (2010). Retrieved 12 October 2010, from the IBM Corporation website: http://www.ibm.com/developerworks/rational/library/content/RationalEdge/sep04/bell/

An, G. (2008). Introduction of an agent-based multi-scale modular architecture for dynamic knowledge representation of acute inflammation. Theoretical Biology and Medical Modelling 5, 11.

Bándi, G. and Ramsden, J.J. (2010). Biological programming. J. Biol. Phys. Chem. 10, 39–43.

Banzhaf, W., Beslon, G., Christensen, S., Foster, J.A., Képès, F., Lefort, V., Miller, J.F., Radman, M. and Ramsden, J.J. (2006). From artificial evolution to computational evolution: a research agenda. Nature Reviews Genetics 7, 729–735.

Campbell, N.A. and Reece, J.B. (2007). Biology (8th ed.). San Francisco: Pearson Education.

Nussey, S. and Whitehead, S. (2001). Endocrinology—An Integrated Approach. Oxford: BIOS Scientific Publishers.

Panait, L. and Luke, S. (2005). Cooperative multi-agent learning: the state of the art. Autonomous Agents and Multi-Agent Systems 11, 387–434.

Valitutti, S., Müller, S., Cella, M., Padovan, E. and Lanzavecchia, A. (1995). Serial triggering of many T-cell receptors by a few peptide-MHC complexes. Nature 375, 148–151.

Walker, D.C. and Southgate, J. (2009). The virtual cell candidate co-ordinator for 'middle-out' modelling of biological systems. Briefings in Bioinformatics 10, 450–461.

Wooldridge, M. and Jennings, N.R. (1995). Intelligent agents: theory and practice. Knowledge Engng Rev. 10, 115–152.