Network Biology, 2015, Vol. 5, Iss. 1

46
Network Biology Vol. 5, No. 1, 1 March 2015 International Academy of Ecology and Environmental Sciences

Transcript of Network Biology, 2015, Vol. 5, Iss. 1

Network Biology

Vol. 5, No. 1, 1 March 2015

International Academy of Ecology and Environmental Sciences

Network Biology ISSN 2220-8879 ∣ CODEN NBEICS Volume 5, Number 1, 1 March 2015 Editor-in-Chief WenJun Zhang Sun Yat-sen University, China International Academy of Ecology and Environmental Sciences, Hong Kong E-mail: [email protected], [email protected] Editorial Board Ronaldo Angelini (The Federal University of Rio Grande do Norte, Brazil) Sudin Bhattacharya (The Hamner Institutes for Health Sciences, USA) Andre Bianconi (Sao Paulo State University (Unesp), Brazil) Danail Bonchev (Virginia Commonwealth University, USA) Graeme Boswell (University of Glamorgan, UK) Jake Chen (Indiana University-Purdue University Indianapolis, USA) Ming Chen (Zhejiang University, China) Daniela Cianelli (University of Naples Parthenope, Italy) Kurt Fellenberg (Technische Universitaet Muenchen, Germany) Alessandro Ferrarini (University of Parma, Italy) Vadim Fraifeld (Ben-Gurion University of the Negev, Israel) Alberto de la Fuente (CRS4, Italy) Pietro Hiram Guzzi (University Magna Graecia of Catanzaro, Italy) Yongqun He (University of Michigan, USA) Shruti Jain (Jaypee University of Information Technology, India) Sarath Chandra Janga (University of Illinois at Urbana-Champaign, USA) Istvan Karsai (East Tennessee State University, USA) Caner Kazanci (University of Georgia, USA) Vladimir Krivtsov (Heriot-Watt University, UK) Miguel ángel Medina (Universidad de Málaga, Spain) Lev V. Nedorezov (Russian Academy of Sciences, Russia) Alexandre Ferreira Ramos (University of Sao Paulo, Brazil) Santanu Ray (Visva Bharati University, India) Dimitrios Roukos(Ioannina University School of Medicine, Greece) Ronald Taylor (Pacific Northwest National Laboratory,U.S. Dept of Energy, USA) Ezio Venturino (Universita’ di Torino, Italy) Jason Jianhua Xuan (Virginia Polytechnic Institute and State University, USA) Ming Zhan (National Institute on Aging, NIH, USA) TianShou Zhou (Sun Yat-Sen University, China) Editorial Office: [email protected]

Publisher: International Academy of Ecology and Environmental Sciences

Address: Unit 3, 6/F., Kam Hon Industrial Building, 8 Wang Kwun Road, Kowloon Bay, Hong Kong

Tel: 00852-2138 6086 Fax: 00852-3069 1955 Website: http://www.iaees.org/ E-mail: [email protected]

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

Article

A comparative analysis on computational methods for fitting an ERGM

to biological network data

 

Sudipta Saha1, Munni Begum2

1Dalla Lana School of Public Health, University of Toronto, Toronto, ONM5S 2J7, Canada 2Department of Mathematical Sciences, Ball State University, Muncie, IN47306, USA

E-mail: [email protected], [email protected]

 

Received 16 October 2014; Accepted 25 November 2014; Published online 1 March 2015

 

 

Abstract

Exponential random graph models (ERGM) based on graph theory are useful in studying global biological

network structure using its local properties. However, computational methods for fitting such models are

sensitive to the type, structure and the number of the local features of a network under study. In this paper, we

compared computational methods for fitting an ERGM with local features of different types and structures. Two

commonly used methods, such as the Markov Chain Monte Carlo Maximum Likelihood Estimation and the

Maximum Pseudo Likelihood Estimation are considered for estimating the coefficients of network attributes.

We compared the estimates of observed network to our random simulated network using both methods under

ERGM. The motivation was to ascertain the extent to which an observed network would deviate from a

randomly simulated network if the physical numbers of attributes were approximately same. Cut-off points of

some common attributes of interest for different order of nodes were determined through simulations. We

implemented our method to a known regulatory network database of Escherichia coli (E. coli).

Keywords biological networks; regulatory networks; exponential random graph models; Monte Carlo

maximum likelihood estimation; maximum pseudo likelihood estimation; E. coli.

 

 

 

 

 

1 Introduction

Over the last decade, there has been a growing interest in the study of biological interaction networks at the

macro and micro molecular levels (Zhang 2012). Identifying basic structural relationships among micro

components is the main goal in the field of systems biology (Li and Zhang, 2013). A formal basis for handling

such complex networks includes computational tools to support the modelling and simulation through methods

developed in mathematical biology and bioinformatics. Since 1960s, with some notable precursors in the

Network Biology     ISSN 2220­8879   URL: http://www.iaees.org/publications/journals/nb/online­version.asp RSS: http://www.iaees.org/publications/journals/nb/rss.xml E­mail: [email protected] Editor­in­Chief: WenJun Zhang Publisher: International Academy of Ecology and Environmental Sciences 

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

preceding decades, a variety of mathematical formalisms have been proposed to describe this kind of complex

networking. During the last few years, modelling efforts targeted several distinct types of networks at the

molecular level, such as gene regulatory networks (Pavlopoulos et al., 2011; Mason and Verwoerd, 2007),

metabolic networks (Ideker et al., 2001), signal transduction networks (Stock, 1990) or protein-protein

interaction networks (Pavlopoulos et al., 2011), transcription regulatory networks (Begum et al., 2014).

Networks of interactions that are not restricted to a cell (intercellular communications) or take place at an

altogether different level of detail (immunological networks, ecological networks) are also of immense interest.

In this paper, we considered a transcription regulatory network for the model organism Escherichia coli

K-12 (E. coli) from RegulonDB (Salgado et al., 2006) version 7.4 (http://regulondb.ccg.unam.mx/). The

RegulonDB contains information on transcription initiation and the regulatory network of E. coli. Downloadable

experimental datasets are available on the regulatory network interactions RegulonDB. The transcription factor

(TF) - transcription factor (TF) interaction network data are considered in this work. A transcriptional unit is

defined as a set of one or more genes within an operon transcribed as a set through the utilization of a single

promoter. In the original dataset (represented as a table) of E.coli in the RegulonDB website, there are four

columns. The first column is the name of the Transcription Factor (TF), the second column is TF regulated by TF,

third column is Regulatory effect of the TF on the regulated gene (+ activator, - repressor, +- dual, ? unknown)

and the fourth column is the evidence of support of the existence of the regulatory interaction. The first two

columns are considered and it created the TF-TF interaction network. The observed TF-TF network, which is a

directed network with loops, is given in Fig. 1.

Fig. 1 Observed TF-TF network.

Each vertex is a TF and an edge between two TFs represents a regulation. An edge from a TF to another TF

represents that the first TF regulates the second. We explored this observed network and counted the number of

several network attributes i.e. edge, triangle and stars. In this observed network, there are 387 edges, 114

triangles, twenty 3-ostars, thirty-four 3-istars, ten 5-ostars, and nine 5-istars and the network has two big

clusters and several small clusters. The basic definitions of some network attributes are given below,

Edges or arcs: This term adds one network statistic that is equal to the number of edges in the network. For

undirected networks, an edge is same ask-star (1) [see below] whereas for directed networks, an edge represents

both ostar (1) and istar (1) (Morris et al., 2008).

Triangles: This term adds one statistic to the model that is equal to the number of triangles in the network. For an

undirected network, a triangle is defined to be any set , , , , , of three edges. For a directed network,

2

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

a triangle is defined as any set of three edges and either or (Morris et al.,

2008).

k-star: This term adds one statistic when there exists ties between one node and k number of other nodes. For a

directed network the star statistics are replaced by outgoing stars (k-ostar) and incoming stars (k-istar) (Morris et

al., 2008).

2 Exponential Random Graph Model (ERGM)

An Exponential Random Graph Model (ERGM) models the probability distribution (mass function / density

function) for a given class of graphs. Given an observed graph and a set of local features of that graph, the

probability distribution of the graph is estimated. The distribution provides a concise summary of the class of

graphs to which the observed graph belongs, i.e. the probability distribution can be used to calculate the

probability that any given graph is drawn from the same distribution as the observed graph (Fronczak, 2012;

Robins et al., 2007; Saul and Filkov, 2007; Wasserman and Pattison, 1996).

ERGMs represent the generative process of tie formation in networks with two basic types of processes

namely dyadic dependence and dyadic independence. A dyad refers to a pair of nodes and the relations between

them. Dyadic dependent processes are those in which the state of one dyad depends stochastically on the state of

other dyads. Dyadic independent processes exhibit no direct dependence among dyads. This distinction between

these two types of processes affects the specification, estimation, and behaviour of ERGMs. Models with only

dyadic independent terms have a likelihood function that simplifies to a form that can be maximized using

standard logistic regression models. In contrast, models for processes with dyadic dependence require

computationally intensive estimation and imply complex forms of feedback and global dependence that

confound both intuition and estimation (Handcock et al., 2003; Hunter and Handcock, 2006).

Although an ERGM presents a flexible means to model complex networks, the likelihood function for

parameter estimation involves a mathematically intractable normalizing constant. ERGMs generalize the

Markov random graph models (Frank and Strauss, 1986), and edge and dyadic independence models. Several

statistical computational methods had been proposed to address this difficulty in parameter estimation in an

ERGM. These are the Markov chain Monte Carlo maximum likelihood estimation (MCMCMLE) method and

the Maximum pseudo likelihood estimation (MPLE) method (Handcock et al., 2003; Robins et al., 2007;

Snijders, 2002). We briefly discuss the general ERGM, which is also known as model, to layout the

theoretical background of such models.

The general log-linear form of model is expressed as,

exp ′

                                                                           1

here is a vector of model parameters, is a vector of network statistics, and . is a normalizing constant

which is hard to compute for large networks. In order to simplify the estimation process of the model parameters,

the log-linear model form of the model can be re-expressed as a logit model. In particular, as per

(Wasserman and Pattison, 1996),  denotes an adjacency matrix where a tie from is forced to be present.

That is ,   1 . denotes an adjacency matrix where a tie from is forced to be

absent. That is ,   0 . Finally, denotes an adjacency matrix with complement relation

for the tie from  . That is, ,   , . The model in Equation (1) can be turned to a

logistic regression model by considering a set of binary random variables , where 1 implying a tie

from     as follows.

log1|X

0|X  ′                             2

3

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

  ′                                                 3

Here is defined as vector of network statistic , is the vector of network statistic

and is the vector of difference statistics obtained from the network statistics . when the variable

changes from 1 to 0. The model in Equation (3) is referred to as the   model for single binary relation.

One can work with either the log-linear form of model given in Equation (1) or the logit form given in

equation (3).

3 Computational Methods

There are two methods commonly used to estimate the maximum likelihood fit to exponential random graph

models. These are the maximum pseudo-likelihood estimation (MPLE) and the Markov chain Monte Carlo

maximum likelihood estimation (MCMCMLE) (Handcock et al., 2003; Robins et al., 2007; Snijders, 2002). The

pseudo likelihood function is simply the product of the probabilities of with each probability conditional on

the rest of the data. The method avoids the technical difficulty inherent in the maximum likelihood approach.

The maximum pseudo likelihood estimator (MPLE) for an ERGM, which maximize the pseudo likelihood, may

easily be found (at least in principle) by using logistic regression as a computational device. However, when the

ERGM in question is not a dyadic independence model, the statistical properties of pseudo likelihood estimators

for a network are not well understood (Hunter and Handcock, 2006).

Monte Carlo maximum likelihood estimation (MCMCMLE) is preferred for dyadic dependentp models.

The MCMCMLE of the parameter vector is obtained by maximizing the approximate likelihood. The

MCMCML estimation algorithm is implemented to the software package statnet (Handcock et al., 2003) under

the statistical computational environment R. We use these two packages statnet and ergm (Handcock et al., 2008)

to fit the exponential random model given in equation (1).

4 Simulation Study

We conduct a simulation study for generating random network under varying conditions. We choose conditions

by assigning different number of nodes and network statistics. The primary reason behind conducting the

simulation is to determine the cut-off points for different number of nodes for specific attributes and also to

compare our simulated models with an observed model. For the comparison part, we create two networks by

imposing the same number of network attributes to the models and then compare the results of estimates with the

TF-TF interaction network of E. coli by fitting ERGM.

We consider various network statistics such as arc, stars, and triangles. A k-star is defined where there exist

ties between one node and k number of other nodes. For a directed network the star statistics are replaced by

outgoing stars (k-ostar) and incoming stars (k-istar). In particular arc, 5-ostar, 5-istar, 6-ostar, 6-istar, and

triangles are considered as our network attributes. We physically impose these attributes into the simulated

network by keeping approximately the same number of attributes as the observed network. We also observe that

if we simulate triangles, ostars, istars, and arcs are automatically created. We randomly assign these attributes

to the simulated networks for different number of nodes (n=20, 50, 100) and determine the conditions for these

statistics to become insignificant. The cut-off points for single attributes and for a combination of attributes are

assessed. However, due to the convergence issues, we were unable to obtain the cut-off points for some cases. A

cut-off point is defined as the value where network attributes become significant to insignificant and vice versa.

The rationale is that if the biological network behaves almost the same as the random network, then if we have an

observed network with different number of nodes, we can determine up to which point (approximately) certain

statistics become insignificant.

4

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

We explore the TF-TF interaction network of E. coli from the RegulonDB database and found that there are

ten5-ostars, nine5-istars, ten6-ostars, eight6-istars and 114 triangles. The network contains 175 nodes with

density 0.012. Once we determine the number of attributes in the observed network, then we mimic this network

and randomly simulate two networks. Then we consider different combinations of attributes (ostars, istars and

triangles) and fit the models by ERGM. We fit the same models for the observed data by using ERGM and then

compare the estimates of ERGM for both MCMCMLE and MPLE method.

We begin with networks with small number of nodes and move toward networks with higher number of

nodes. With only 20 nodes, we consider reasonably smaller magnitude of network attributes such as arcs,

3-ostars, 3-istars and triangles as our attributes of interest and then fit the models with ERGM to get the

estimates and also to determine the cut-off points. We start with smaller number of attributes, two3-ostars, two

3-istars, and two triangles. We increment each attribute one at a time to determine the cut-off points. Simulated

network with 77 triangles, twelve3-istars and fourteen3-ostars is presented in Fig. 2.

Fig. 2 Simulated network for n=20.

Next we increased the number of nodes to 50 and 100. A summary of the simulated networks with nodes 20,

50, and 100including the cut-off points for each network statistics is presented in Table 1.

Table 1 Summary of simulation studies for different numbers of nodes.

a% of n (apps) means that the lower cut-offs are the percentage of n (i.e. node). For example, for n=20, lower cut-off of 3-Ostar is 7 which is 35% of n=20.

For n=20 For n=50 For n=100

Trian-

gles 3-Ostar 3-Istar

Trian-gl

es 3-Ostar 3-Istar

Trian-gle

s 3-Ostar 3-Istar

Lower cut-offs - 7 6 - - 3 - 4 5

% ofn (apps)a - 35% 30% - - 6% - 4% 5%

Higher cut-offs 76-80 17 17 - 35 - - 64 64

% ofn (apps) 390% 85% 85% - 70% - - 64% 64%

5

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

It is to be noted that the cut-off points for 3-ostar and 3-istar are quite similar, although we could not find

any conclusive answer when the number of nodes is 50. For 3-ostar and 3-istar, we can say that, the cut-off

points spread out with the increase in the number of nodes. That is if we move toward higher number of nodes,

the lower cut-off points become smaller and the higher cut-off points become smaller. For n=20, the total spread

of insignificant region is close to (85-35) = 50% and which is approximately 60% for n=100. For triangles,

cut-off points should be bigger than the number of nodes n. In summary, we can say that, for network data if we

increase the order of the nodes, the spread of the insignificant region gradually becomes larger for any specific

attributes. To determine the exact percentage of cut-off points, we have to do similar study for different other

nodes, and then we can generalize the idea.

4.1 Comparisons of results under simulation schemes

In our observed TF-TF model, we have 175 nodes, 114 triangles, ten 5-ostars, nine 5-istars, ten 6-ostars and

eight 5-istars. An R-script is written to count the number of attributes in the model. Then we randomly simulate

two different network models to compare the estimates of these network attributes with the observed network. In

both cases, we have very close estimates of network attributes from the simulated models compared to the actual

model. In Fig. 3 and 4, we represent the observed TF-TF network with and without loops.

Fig. 3 Observed TF-TF network with looping. Fig. 4 Observed TF-TF network without looping.

The estimates of attributes of the observed network are presented in Table 2.

Table 2 Estimates of the observed network.

Network

Attributes

MCMCMLE

Estimates MPLE Estimates

Edges -5.3500647 -5.35

Triangle 0.9355000 9.355e-01

5-Ostar 0.0003851 1.564e-05

5-Istar 0.0022043 2.204e-01

6-Ostar 7.797e-05 1.676e-06

6-Istar 1.034e-01 1.034e-03

6

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

To compare the estimates of network attributes between the observed and simulated network, we randomly

simulate two networks by imposing the same number of attributes as TF-TF, one with 5-ostars, 5-istars, and

triangles (network-1) and the other with same number of 6-ostars, 6-istars, and triangles (network-2). The

summary table of the number of network attributes is presented in Table 8.It is to be noted that we found very

similar estimates for the common network attributes edge and triangle from network-1 and network-2. In Tables

3 and 4, we presented the estimates of the observed and simulated networks for both MCMCMLE and MPLE

methods.

Table 3 Estimates from observed versus simulated networks with MCMC MLE.

Network

Attributes

Estimates from

observed networks

Estimates from

simulated networks

Edges -5.3500647 -5.73286

Triangle 0.9355000 2.90743

5-Ostar 0.0003851 -0.01720

5-Istar 0.0022043 -0.08434

6-Ostar 7.797e-05 -1.342e-01

6-Istar 1.034e-01 -8.702e-04

 

Table 4 Estimates from observed versus simulated networks with MPLE.

Network

Attributes

Estimates from

observed networks

Estimates from

simulated networks

Edges -5.35 -5.675632

Triangle 9.355e-01 2.905757

5-Ostar 1.564e-05 -0.016141

5-Istar 2.204e-01 -0.083797

6-Ostar 1.676e-06 -0.1343137

6-Istar 1.034e-03 -0.0006957

From Tables 3 and 4, we conclude that except triangles the rest of the estimates of network attributes are

very close for both MCMCMLE and MPLE method. Therefore, from the biological point of view, if the

observed network is available and the numbers of certain network attributes are known, then it behaves almost

same as the random model for most of the cases. However, to generalize the case we need more experiment and

more exploration among higher order of species. The simulated networks (1 & 2)are presented in Figs 5 and 6.

7

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

Fig. 5 Simulated network-1. Fig. 6 Simulated network-2.

From this experiment, we observe that if we want to simulate a biological data, then one way would be to

explore the observed data and count the number of statistics that we are interested and then physically impose the

number of statistic and then compare. There are several other ways to simulate network models using several

packages on R. The simplest one is to take the density of the observed model and simulate it using binomial

distribution. Also, once a model is fitted by using ERGM package, it can be simulated from the fitted model.

ERGM takes the estimates of the network attributes and simulates a similar type of model. However, in such a

case the physical number of attributes differs substantially. Again, we can also simulate networks by using

Erdos-Renyi model. The comparison of networks obtained using different simulation approaches is presented in

the following section.

4.2 Comparison over simulation methods

In this section, we simulate several networks by the existing simulation schemes. We simulated a network by

using Erdos-Renyi modelling scheme where we consider 175 nodes to create similarity with our observed

TF-TF network and then consider the density of the TF-TF model. The summary of the estimates that we obtain

under different approaches, are provided in Tables 5, 6, and 7 (for both MCMCMLE and MPLE).

Table 5 Estimates from Erdos-Renyi model

Network

Attributes

MCMCMLE

Estimates

MPLE

Estimates

Edges -4.438846 -4.42969

Triangle -0.058951 -0.06431

5-Ostar -0.007336 -0.00546

5-Istar -0.120974 -0.15166

6-Ostar -0.01302 -0.01366

6-Istar -0.75202 -0.84257

 

8

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

 

Table 6 Estimates from Binomial simulated model

Network

Attributes

MCMCMLE

Estimates

MPLE

Estimates

Edges -4.33561 -4.30465

Triangle -0.06395 -0.08794

5-Ostar -0.01032 -0.04404

5-Istar -0.07059 -0.07491

6-Ostar -0.10063 -0.25368

6-Istar -0.31184 -0.29370

Table 7 Estimates from fitted ERGM models

Network

Attributes

MCMCMLE

Estimates

MPLE

Estimates

Edges -5.3318479 -5.332e+00

Triangle 0.7194116 7.194e-01

5-Ostar 0.0001207 6.297e-06

5-Istar 0.0016440 1.644e-03

6-Ostar 1.484e-05 5.333e-07

6-Istar -5.887e-02 -5.887e-02

We notice that as long as we consider the same network, estimates of certain attributes are always similar.

Although some of the estimates we obtain in this simulation study are very close, the physical numbers of

statistics differ substantially. As the simulation scheme takes the fitted estimates into account, the physical

number of different attributes should be close to the observed model. It is important since the exact numbers of

network statistics might have a significant influence on the overall process. The simulated networks using

Erdos-Renyi modelling scheme, binomial density, and ergm package in R are presented in Figs 7, 8, and 9.

Fig. 7 Simulated from Erdos-Renyi model.

9

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

Fig. 8 Simulated network using binomial probability.

 

Fig. 9 Simulated network from fitted ERGM model.

The numbers of network attributes for different simulation models are presented in Table 8.

Table 8 Summary table of estimates observed versus simulated networks.

Network

Attributes

Observed

TF-TF

network

Our

Simulated

network

Simulation

using density

Erdos-Renyi

simulation

ERGM fitted

simulation

Edges 263 327 377 375 247

Triangle 114 115 12 9 82

5-Ostar 10 10 17 6 7

5-Istar 9 9 12 8 2

6-Ostar 10 10 6 2 1

6-Istar 8 8 3 1 3

10

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

From Table 8, we can see that the network attributes are different under all the simulation schemes. In our

process as we are physically imposing the attributes, it is very close to the observed model. The only difference

in the attributes is for the triangles which differ by just 1. From Table 8, we can say that, in terms of number, the

ERGM simulated network generates close result. However, the numbers of triangles substantially differ from the

original observed model. For the simple binomial simulation, the edges do not even come close and the other

attributes also significantly differ. We find similar characteristic for Erdos-Renyimodelling scheme. The reason

behind this could be that both the binomial and Erdos-Renyi consider the density only while simulation. Thus,

the number of attributes along with the edges is very close. However, other attributes such as the number of

5-istars or 5-ostars are not very close. In our random simulation, we emphasize on the number of attributes

because a biological process is a very complicated process. A single edge might have significant influence over

the entire process. Therefore, for biological simulation, we should always keep in mind the physical number of

attributes that we are interested in.

 

5 Conclusions

The number of commonly used network attributes such as k-istar, k-ostar and triangles in the TF-TF regulatory

network of E. coli is determined. These networks attributes statistically serve as the significant local structures

for the E. coli regulatory network. An observed regulatory network of the model organism E. coli was exploredin

terms of finding statisticallysignificant local structure in this study. Simulation of two network models,

network-1 and network-2, and comparison of the estimates of the observed and simulated models are presented.

In both cases, the estimates we obtain are very similar with the observed TF-TF network except for triangles.

Networks simulated using existing methods are compared in terms of these estimates as well. At the end, our

models provide close results and same number of network attributes, which is very important for biological

network data. Therefore, it can concluded that for theE. coil regulatory network, the network can be reproduced

by taking the counts for different attributes, and the simulated network will behave as the observed network.

Simulation of different networks with different number of nodes and network attributes were performed. The

cut-off points were determined for a number of attributes at which point specific attributes become significant to

insignificant, or vice versa. We observed that for smaller numbers of network attributes, the estimates usually

become significant. If the number of attributes increases in a given model, the attributes become insignificant.

We also observe that the models in ERGM do not always converge. Addressing the convergence issue would

be a desirable upgrade for the computational method. For the several models considered, convergence failure

occurred while estimating parameters for any of the methods. For example, for our observed network, the model

with edges, 4-istars, 4-ostars and triangles did not converge. Also, due to the convergence issue, cut-off points

could not be determined for several network attributes. In addition, computation for networks with self loops

demonstrates convergence problems. Therefore, while the ERGM provides flexible methodology, these issues

remain in need of further analysis.

References 

Begum M, Bagga J, Blakey A, Saha S. 2014. Network motif identification and structure detection with graphical

models. Network Biology, 14(4): 155-169

Frank O, Strauss D. 1986. Markov graphs. Journal of the American Statistical Association, 81(395): 832-842

Fronczak A. 2012. Exponential Random Graph Models. ArXiv e-prints. Available at

http://arxiv.org/pdf/1210.7828.pdf

11

Network Biology, 2015, 5(1): 1-12 

  IAEES www.iaees.org

Handcock MS, Hunter DR, Butts CT, et al. 2003. Statnet: Software Tools for the Statistical Modeling of

Network data. Statnet Project. Available at http://statnetproject.org/

Handcock MS, Hunter DR, Butts CT, et al. 2008. ergm: A package to fit, simulate and diagnose

exponential-family models for networks. Journal of Statistical Software, 24. Available at

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743438/

Hunter DR, Handcock MS. 2006. Inference in curved exponential family models for networks. Journal of

Computational and Graphical Statistics, 15(3): 565-583

Ideker T, Thorsson V, Ranish JA, et al. 2001. Integrated genomic and proteomic analysis of a systematically

perturbed metabolic network. Science, 292(5518): 929-934

Li JR, Zhang WJ. 2013. Identification of crucial metabolites/reactions in tumor signaling networks. Network

Biology, 3(4): 121-132

Mason O, Verwoerd M. 2007. Graph theory and networks in biology. IET Systems Biology, 1(2): 89-119

Morris M, Handcock MS, Hunter DR. 2008. Specification of exponential-family random graph models: Terms

and computational aspects. Journal of Statistical Software, 24(4): 1-24

Pavlopoulos GA, Secrier M, Moschopoulos CN, et al. 2011. Using graph theory to analyze biological networks.

BioData Mining, 4(10): 1-27

RobinsGL,Pattison PE,Kalish Y, LusherD. 2007. An introduction to exponential random graph (p*) models for

social networks. Social Networks, 29(2): 173-191

Salgado H, Gama-Castro S, Peralta-Gil M, et al. 2006. RegulonDB (version 5.0): Escherichia coli K-12

transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Research,

34(1): 394-397

Saul ZM, Filkov V. 2007. Exploring biological network structure using exponential random graph models.

Bioinformatics, 23(19): 2604-2611

Snijders TAB. 2002. Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.Journal of

Social Structure, 3: 1-40.

Stock JB, Stock AM, Mottonen JM. 1990. Signal Transduction in Bacteria. Europe PubMed Central, 344(6265):

395-400

Wasserman S, PattisonPE. 1996. Logit models and logistic regressions for social networks: I. An introduction to

Markov graphs and p*. Psychometrika, 61(3): 401-425

Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,

Singapore

12

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Article

Determination of keystone species in CSM food web: A topological

analysis of network structure LiQin Jiang1, WenJun Zhang1,2 1School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China 2International Academy of Ecology and Environmental Sciences, Hong Kong

E-mail: [email protected], [email protected]

Received 20 August 2014; Accepted 28 September 2014; Published online 1 March 2015

 

Abstract

The importance of a species is correlated with its topological properties in a food web. Studies of keystone

species provide the valuable theory and evidence for conservation ecology, biodiversity, habitat management,

as well as the dynamics and stability of the ecosystem. Comparing with biological experiments, network

methods based on topological structure possess particular advantage in the identification of keystone species.

In present study, we quantified the relative importance of species in Carpinteria Salt Marsh food web by

analyzing five centrality indices. The results showed that there were large differences in rankings species in

terms of different centrality indices. Moreover, the correlation analysis of those centralities was studied in

order to enhance the identifying ability of keystone species. The results showed that the combination of degree

centrality and closeness centrality could better identify keystone species, and the keystone species in the CSM

food web were identified as, Stictodora hancocki, small cyathocotylid, Pygidiopsoides spindalis,

Phocitremoides ovale and Parorchis acanthus.

Key words keystone species; topological parameters; centrality indices; biological networks.

1 Introduction

Food webs are complex ecological networks describing trophic relationships between species in a certain area

(Pimm, 1982; Belgrano et al., 2005; Arii et al., 2007). If the entire food web is treated as a graph, the nodes in

the graph represent different species (individuals) in the ecosystem and the edges denote the interactions

between species (individuals). As a kind of network, food webs provide a new way to study communities

(Albert and Barabasi, 2002; Newman, 2003). To some extent, such a network is a formalized description for

complex relationships between species within the system.

The concept of keystone species originated from the thought that species diversity of an ecosystem was

controlled by the predators in the food chains, and they affected many other creatures in the ecosystem.

Network Biology     ISSN 2220­8879   URL: http://www.iaees.org/publications/journals/nb/online­version.asp RSS: http://www.iaees.org/publications/journals/nb/rss.xml E­mail: [email protected] Editor­in­Chief: WenJun Zhang Publisher: International Academy of Ecology and Environmental Sciences 

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Keystone species refer to those that biomass is disproportionate with its impact on the environment, and the

extinction of keystone species may lead to the collapse of communities (Paine, 1969; Mills et al., 1993;

Springer et al., 2003). The concept of keystone species means that an ecological community is not just a

simple collection of species (Mouquet, 2013). As a result, the ecologically important species might not

necessarily be the rare species conservation biologists always believed (Simberloff, 1998), because rare

species are associated with the little biomass and abundance of species, and the importance of species is a kind

of functional properties of the network. Therefore, the traditional protection pattern for rare species should be

gradually transformed into the maintenance of keystone species (Wilson, 1987).

Keystone species strongly affect species richness and ecosystem dynamics (Piraino et al., 2002), so the

research of keystone species is an important area for predicting and maintaining the stability of ecosystem

(Naeem and Li, 1997; Tilman, 2000). Definition of keystone species emphasizes the functional advantages of

species in the ecosystem, and whether a species is a keystone species depends upon if it has a consistent effect

in ecological function (Power et al., 1996), namely its sensitivity to environmental changes, such as

competition, drought, floods and other ecological processes. In the past, researchers used many field

experimental methods to study keystone species, but they mainly focused on the impact of changes in the

abundance of a species on the other species (Paine, 1992; Wootton, 1994; Berlow, 1999). The main

identification methods include control simulation method (Paine, 1995; Bai, 2011), equivalent advantage

method (Khanina, 1998; Ji, 2002), competitive advantage method (Yeaton, 1988; Bond, 1989), the relative

importance of species interactions method (Tanner and Hughes, 1994), community importance index method

(Power et al., 1996), keystone index method (Jordán et al., 1999) and functional importance index method

(Hurlbert, 1997). However, these methods mainly concentrated on a few species. Thus researchers need to do

an assessment of the interactions between species in the community before the experiment, in order to

determine species not important or interesting. So these methods are obvious subjective and produce certain

mistake on identifying keystone species (Wootton 1994; Bustamante et al., 1995). Furthermore, monitoring

species reaction to changes in the external environment through the above experimental methods requires that

experimenters have a high professional quality. And because of the longer experimental time span, greater cost

(Ernest and Brown, 2001), as well as other factors during the experiment, they are only suitable for

semi-artificial or simple controllable ecosystems. It is more difficult to judge whether a species is a keystone

species based on certain characteristics of species (Menge et al., 1994). So far, we don’t have a perfect and

universally applicable method to identify keystone species.

Research of keystone species has evolved from the initial direct experimental methods to network/software

analysis. For example, Libralato et al. (2006) analyzed keystone indicators of functional groups of a species or

a group of species in food web model through the ecosystem modeling (the Ecopath with Ecosim, EwE), and

then ranked the level of the key indicators to obtain the keystone species. Jordán et al. (2008) pointed out that

there were at least two methods to quantitatively assess the importance of species in communities. One was the

structural importance of network analysis and another for the functional importance of network analysis. So

they calculated the structural importance and the functional importance of species in the food web in Prince

William Sound by CosBiLaB Graph software, and evaluated the advantages and disadvantages of the two

methods. They believed that the combination of these two methods in the future would be the most important

way to research dynamic mechanism. Kuang and Zhang (2011) analyzed the topological properties of the food

web in Carpinteria Salt Marsh and found that parasites played a very important role in the food web, and the

addition of parasites in the food web would change some properties and greatly increase the complexity of the

food web. Therefore, the relationship between keystone species and topological characteristics can provide an

effective method to understand and describe the topological structures, dynamic characteristics and the

14

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

complexity of functions between species within the food web. And it also can provide valuable theory and

evidence for conservation ecology, biodiversity, habitat management, as well as the dynamics and stability of

the ecosystem.

Nevertheless, so far we lack of effective methods to identify keystone species and quantify their relative

importance, so the quantitative assessment of species importance in the food web is becoming increasingly

important and urgent (Paine, 1966; Power et al, 1996; Jordán, 2008). In recent years, there have been some

major discoveries about the topological properties of complex systems (Strogatz, 2001; Albert and Barabási,

2002; Newman, 2003), and these also affect the definition and identification of keystone species. For example,

the highly connected species were found to have more important influence on sustainability of food webs

(Soulé and Simberloff, 1986), which promoted the generation of the concept of degree. Degree of nodes thus

become the most widely used topological parameter to measure the keystone species (Dunne et al., 2002a).

Degree refers to the direct impacts between species (Callaway et al., 2000; West, 2001; Zhang, 2011, 2012a,

2012b, 2012c, 2012d). However, indirect impacts between species are also important (Wooton, 1994; Huang,

et al., 2008). For example, Darwin (1859) described the influence of cats on the clovers. Although indirect

effects of chemical and behavioral studies may be difficult to quantify, some indirect impacts of network links

have been proposed (Ulanowicz and Puccia, 1990; Patten, 1991). Thus the concept of centrality is proposed to

address this problem. Centrality focuses on the indirect effects between species. The impacts of food webs are

generally spread through indirect ways, so it may require detailed research and quantitative description on the

effective range of indirect interactions from a specific point to the entire network (Jordán, 2001). In other

words, it is necessary to determine how relevant these species are in the food web (Yodzis, 2000; Williams et

al., 2002). The concept of centrality stemmed from the social network analysis (Wasserman and Faust, 1994),

namely the ability of a node communicates with other nodes or the intimacy of a node with the others (Go'mez

et al., 2003). These have resulted in a series of topological parameters relating to the relative importance of a

node, such as degree centrality, betweenness centrality, closeness centrality, clustering coefficient centrality,

eigenvector centrality and information centrality, etc. In present paper, we used various methods to detect and

quantify relative importance of species in a famous food web, CSM (Carpinteria Salt Marsh) food web,

reported by Lafferty et al. (2006a, 2006b, 2008), and further studied the correlation between topological

parameters of the food web, aiming to evaluate the effectiveness of various methods in quantifying relative

importance of species and detecting the keystone species in the food webs.

2 Materials and Methods

2.1 Data source

Data were collected from the food web, Carpinteria Salt Marsh, California, reported by Lafferty et al. (2006a,

2006b, 2008) (http: //www.nceas.ucsb.edu/interactionweb/html/carpinteria.html). CSM food web includes four

sub-webs, predator-prey sub-web, predator-parasite sub-web, parasite-host sub-web, and parasite-parasite

sub-web.

2.2 Methods

2.2.1 Pajek software

Pajek is a software platform for network analysis, which contains various methods/algorithms/models on

analysis of topological properties.

2.2.2 Centrality measures

Centrality indices are used to measure impact and importance of nodes in a network. The most commonly used

centrality indices are degree centrality, betweenness centrality, closeness centrality, clustering coefficient

15

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

centrality and eigenvector centrality (Navia et al., 2010; Zhang, 2012a, b).

(1) Degree centrality (DC)

DC is the simplest measure which considers the degree of a node (species) only. The degree of species i is:

Di=Din,i + Dout,i, where Din,i: number of prey species of species i, and Dout,i: number of predator species of

species i. The degree of species was calculated by Net/Partitions/DC/All in Pajek.

(2) Betweenness centrality (BC)

BC is calculated by the following formula

BCi =2∑j≤k gjk(i)/gjk /[(N-1)(N-2)]

where i≠j≠k, gjk: the shortest path between species j and k, gjk(i): number of the shortest paths containing

species i, N: total number of species in the food web. A greater BCi means that the effect of losing species i will

promptly disperse across the food web (Zhang, 2012a, b).

(3) Closeness centrality (CC)

CCi refers to the mean shortest path of species i

CCi=(N-1)/∑j=1N dij

where i≠j, dij is the length of the shortest path between species i and j. A greater CCi means a more importance

of species i.

In Pajek, we use Net/Vector/Centrality/Closeness/All and Net/Vector/Centrality/Betweenness to calculate

BC (Wasserman and Faust, 1994).

(4) Clustering coefficient centrality (CU)

Clustering coefficient centrality denotes the ratio of the actual edges Ei of node i connected with its neighbors

divided by the most possible edges Di(Di-1)/2 between them (Watts and Strogatz, 1998). In other words, it

refers to the ratio of the directly connected neighboring pairs divided by all the neighboring pairs in the

neighboring points of the node, that is

CUi=2Ei/ [Di(Di-1)]

It measures how close the current node is to its neighboring nodes. The averag clustering coefficient of all

nodes is the clustering coefficient of the entire network. Obviously, the clustering coefficient of a network is

weighted by the clustering coefficient of all nodes whose degree must be at least 2. 0≤CU≤1; if CU=0, all

nodes in the network are isolated, and if CU=1, the network is fully connected. Furthermore, studies have

shown that clustering coefficient is related to network modularity. Clustering coefficient of the entire network

reflects the overall trend of all the nodes gathering into a module (Eisenberg and Levanon, 2003; Ravasz et al,

2002).

(5) Eigenvector centrality (EC)

Eigenvector centrality is the dominant eigenvector of the adjacency matrix A of the network (Bonacich, 1987),

i.e., the extent of a node connected to the node with the highest eigenvector centrality. In the word of social

networks, a person tends to occupy the central place more likely if he (she) has contacted more people in the

center position. Eigenvector centrality reflects the prestige and status of nodes. This measure tries to find the

keystone node in the entire network rather than in the local structure. Here, eigenvector is e, and λe=Ae, where

A is the adjacency matrix of a food web. Therefore, the EC of node i is

ECi= e1 (i)

where e1 is the eigenvector corresponding to the maximum eigenvalue λ1. A greater value of ECi means a

greater number of the neighboring nodes connected with node i, and it indicates that the node is in the core

16

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

position.

3 Results

3.1 Degree centrality

As shown in Fig. 1, 2 and Table 1, the species with the greater DC values in the full CSM food web are largely

consistent with that in the predator-parasite sub-web, parasite-parasite sub-web and parasite-host sub-web. And

these species are substantially parasites. The species with the maximum DC value in the predator-prey

sub-web is Pachygrapsus crassipes, and the species with the forth DC value is Willet. Although DC values of

the two species are larger, they are slightly lower than nine parasite species, such as Mesostephanus

appendiculatoides, etc. In addition, the basal species, Marine detritus, is of greater importance also.

Fig. 1 Results of degree centrality for the four sub-webs of CSM food web (upper left: predator-prey sub-web; upper right:

predator-parasite sub-web; bottom right: parasite-parasite sub-web; bottom left: parasite-host sub-web). The numbers in

parentheses are total links (degree, or incoming degree + outgoing degree) and the numbers outside parentheses are species ID

codes. The ID codes of different sub-webs are different from the original species.

17

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Fig. 2 Results of degree centrality for the full CSM food web. The numbers in parentheses are total links (degree, or incoming

degree + outgoing degree) and the numbers outside parentheses are species ID codes.

Table 1 Species with greater DC values in the full CSM food web and four sub-webs.

Predator-prey

sub-web

Predator-parasite

sub-web

Parasite-parasite

sub-web

Parasite-host sub-web Full CSM food web

ID Species ID Species ID Species ID Species ID Species

56 Pachygrapsus

crassipes

90 Culex

tarsalis

118 Mesostephanus

appendiculatoid

es

117 Stictodora

hancocki

118 Mesostephanus

appendiculatoid

es

46 Hemigrapsus

oregonensis

89 Aedes

taeniorhynchus

115 Renicola

cerithidicola

114 Phocitremoides

ovale

117 Stictodora

hancocki

47 Fundulus

parvipinnis

98 Plasmodium

107 Renicola

buchanani

119 Pygidiopsoides

spindalis

116 Small

cyathocotylid

57 Willet 117 Stictodora

hancocki

120 Microphallid 1

116 Small

cyathocotylid

119 Pygidiopsoides

spindalis

43 Cleavlandia

ios

119 Pygidiopsoides

spindalis

116 Small

cyathocotylid

118 Mesostephanus

appendiculatoid

es

114 Phocitremoides

ovale

73 Gillycthys

mirabilis

116 Small

cyathocotylid

110 Large

xiphideocercaria

111 Parorchis

acanthus

111 Parorchis

acanthus

33 Macoma

nasuta

114 Phocitremoides

ovale

109 Catatropis

johnstoni

113 Cloacitrema

michiganensis

113 Cloacitrema

michiganensis

18 Anisogammar 111 Parorchis 105 Probolocoryphe 104 Himasthla 105 Probolocoryphe

18

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

us

confervicolus

acanthus uca rhigedana uca

1 Marine

detritus

118 Mesostephanus

appendiculatoid

es

119 Pygidiopsoides

spindalis

105 Probolocoryphe

uca

108 Acanthoparyphi

um sp.

38 Geonemertes 113 Cloacitrema

michiganensis

117 Stictodora

hancocki

108 Acanthoparyphi

um sp.

56 Pachygrapsus

crassipes

31 Mosquito larva 57 Willet

3.2 Betweenness centrality

As illustrated in Fig. 3 and 4, the BC values of all nodes in the predator-parasite sub-web and parasite-host

sub-web are 0, because these species do not locate between other species in the network. But the radius of

Mesostephanus appendiculatoides in the parasite-parasite sub-web is very obvious, indicating that some

species in the parasite-parasite sub-web need to go through Mesostephanus appendiculatoides. Once this

species is removed, all the interaction chains will collapse and largely destruct the whole sub-web. From Table

2, the BC values of the top four species in the CSM food web are identical with that in the predator-prey

sub-web, while some parasites with larger DC values, such as Mesostephanus appendiculatoides, etc., whose

BC values are lower than that of some free-living species, such as Hemigrapsus oregonensis. It indicates that

the nutritional flow of free-living species in the food web has a greater effect than parasites.

Fig. 3 Results of betweenness centrality for the full CSM food web. The numbers in parentheses are betweenness centralities and

the numbers outside parentheses are species ID codes.

19

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Fig. 4 Results of betweenness centrality for the four sub-webs of CSM food web (upper left: predator-prey sub-web; upper right:

predator-parasite sub-web; bottom right: parasite-parasite sub-web; bottom left: parasite-host sub-web). The numbers in

parentheses are betweenness centralities and the numbers outside parentheses are species ID codes. The ID codes of different

sub-webs are different from the original species. The size of the node relates to the value of BC; the greater BC is, the bigger the

node radius is. The species ID codes of different sub-webs are different from the original species, and the magnification of each

figure is different.

20

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Table 2 Species with greater BC values in the full CSM food web and four sub-webs.

Predator-prey

sub-web

Predator-parasit

e sub-web

Parasite-parasite sub-web Parasite-host

sub-web

Full CSM food web

ID Species ID Species ID Species ID Species ID Species

46 Hemigrapsus

oregonensis

118 Mesostephanusap

pendiculatoides

46 Hemigrapsusore

gonensis

56 Pachygrapsus

crassipes

106 Himasthla species

B

56 Pachygrapsuscr

assipes

47 Fundulusparv

ipinnis

109 Catatropisjohnsto

ni

47 Fundulusparvipi

nnis

73 Gillycthys

mirabilis

111 Parorchis

acanthus

73 Gillycthys

mirabilis

72 Leptocottusar

matus

115 Renicola

cerithidicola

83 Triakis

semifasciata

38 Geonemertes 105 Probolocoryphe

uca

72 Leptocottus

armatus

43 Cleavlandiaio

s

110 Large

xiphideocercaria

57 Willet

48 Western

Sandpiper

116 Small

cyathocotylid

108 Acanthoparyphi

um sp.

50 Least

Sandpiper

120 Microphallid 1 52 Dowitcher

18 Anisogammar

usconfervicol

us

113 Cloacitrema

michiganensis

11 Phoronid

115 Renicola

cerithidicola

106 Himasthla

species B

118 Mesostephanus

appendiculatoid

es

116 Small

cyathocotylid

117 Stictodora

hancocki

111 Parorchis

acanthus

119 Pygidiopsoides

spindalis

120 Microphallid 1

3.3 Closeness centrality

CC values of species in food webs increases with the increase of species richness and completeness of food

web. Connection between species in the full CSM food web is closer than the other four sub-webs (Fig. 5 and

6, Table 3). Combined with Table 2, the species with the maximum CC value is Pachygrapsus crassipes

(species 56) in the full CSM food web, and it is also the greatest in the predator-prey sub-web, indicating it is

closer than other species in food web. The species with the tenth CC value is Fundulus parvipinnis (species 47)

21

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

in the full CSM food web, but it is the third in the predator-prey sub-web, just following behind Pachygrapsus

crassipes and Hemigrapsus oregonensis.

Fig. 5 Results of closeness centrality for the four sub-webs of CSM food web (upper left: predator-prey sub-web; upper right:

predator-parasite sub-web; bottom right: parasite-parasite sub-web; bottom left: parasite-host sub-web). The numbers in

parentheses are closeness centralities and the numbers outside parentheses are species ID codes. Species ID codes of different

sub-webs are different from the original species.

22

  IAEES

Fig. 6 Res

numbers o

Predator-p

sub-web

ID Sp

56 Pa

scr

46 He

ore

47 Fu

vip

73 Gi

mir

18 An

rus

olu

38 Ge

33 Ma

ta

72 Lep

rm

1 Ma

det

sults of closenes

outside parenthe

Tab

prey P

s

ecies ID

achygrapsu

rassipes

5

emigrapsus

egonensis

5

unduluspar

pinnis

5

llycthys

rabilis

5

nisogamma

sconfervic

us

6

eonemertes 1

acomanasu 6

ptocottusa

matus

6

arine

tritus

6

ss centrality for

eses are species

ble 3 Species w

Predator-parasit

ub-web

D Species

52 Dowitch

57 Willet

58 Black-b

d Plover

59 Californ

Gull

69 Clapper

117 Stictodo

hancock

62 Marbled

Godwit

63 Ring-bil

gull

64 Western

Gull

Network

r the full CSM f

s ID codes.

with greater CC

te Parasite

ID

her 116

107

ellie

r

109

nia 115

rail 120

ora

ki

105

d 118

lled 106

n 108

k Biology, 2015

food web. The n

values in the fu

e-parasite sub-w

Species

Small

cyathocotyli

Renicola

buchanani

Catatropis

johnstoni

Renicola

cerithidicola

Microphallid

Probolocory

uca

Mesostephan

appendicula

es

Himasthla

species B

Acanthopary

um sp.

5, 5(1): 13-33

numbers in pare

ull CSM food w

web Parasite

ID

id

119

116

114

a

117

d 1 118

yphe 111

nus

atoid

83

72

yphi 57

entheses are clo

web and four su

e-host sub-web

Species

Pygidiopsoides

spindalis

Small

cyathocotylid

Phocitremoide

ovale

Stictodora

hancocki

Mesostephanus

appendiculatoi

es

Parorchis

acanthus

Triakis

semifasciata

Leptocottus

armatus

Willet

w

oseness centrali

ub-webs.

Full CSM

ID Sp

s 56 Pa

cr

117 St

ha

es 116 Sm

cy

119 Py

sp

s

id

114 Ph

ov

111 Pa

ac

113 Cl

m

118 M

ap

es

108 Ac

um

www.iaees.org

ities and the

M food web

pecies

achygrapsus

rassipes

tictodora

ancocki

mall

yathocotylid

ygidiopsoides

pindalis

hocitremoides

vale

arorchis

canthus

loacitrema

ichiganensis

Mesostephanus

ppendiculatoid

s

canthoparyphi

m sp.

23

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

57 Willet 65 Bonaparte's

Gull

117 Stictodora

hancocki

113 Cloacitremamic

higanensis

47 Fundulus

parvipinnis

9 Oligochaete 111 Parorchis

acanthus

113 Cloacitrema

michiganensis

73 Gillycthys

mirabilis

11 Phoronid 114 Phocitremoi

des ovale

114 Phocitremoides

ovale

116 Small

cyathocotyli

d

119 Pygidiopsoides

spindalis

119 Pygidiopsoi

des

spindalis

111 Parorchis

acanthus

111 Parorchis

acanthus

3.4 Clustering coefficient centrality

CU values of predator-parasite sub-web and parasite-host sub-web appear in two patterns: one for the degree

values of some nodes are less than 2, and the CU values of these nodes are 999999998 in the Pajek; another for

the neighboring nodes of one node are less than 2, and the CU values of these nodes are 0. From Table 4, we

can find that the CU rankings of nodes in the full CSM food web and predator-prey sub-web are really

different.

Table 4 Species with greater CU values in the full CSM food web and four sub-webs.

Predator-prey

sub-web

Predator-parasite

sub-web

Parasite-parasite sub-web Parasite-host

sub-web

Full CSM food web

ID Species ID Species ID Species ID Species ID Species

60 Whimbrel 104 Himasthla

rhigedana

25 Cerithidea

californica

81 Pied Billed

Grebe

106 Himasthla species

B

109 Catatropis

johnstoni

38 Geonemertes 108 Acanthoparyphium

sp.

70 Cooper's Hawk

78 Black-crown

ed Night

heron

111 Parorchis

acanthus

34 Protothaca

61 Mew Gull 113 Cloacitrema

michiganensis

110 Large

xiphideocercaria

63 Ring-billed

gull

103 Euhaplorchis

californiensis

35 Tagelus spp.

64 Western Gull 114 Phocitremoides

ovale

106 Himasthla

species B

65 Bonaparte's

Gull

117 Stictodora

hancocki

71 Northern Harrier

36 Cryptomya 119 Pygidiopsoides

spindalis

115 Renicola

cerithidicola

77 Snowy Egret 105 Probolocoryphe

uca

103 Euhaplorchis

californiensis

68 Bufflehead

107 Renicola

buchanani

107 Renicola

buchanani

24

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

3.5 Eigenvector centrality

Species with greater EC values in the full CSM food web are largely consistent with that in the predator-prey

sub-web (Table 5; Fig. 7, 8). Species with greater EC values in the full CSM food web and predator-prey

sub-web are free-living species, rather than parasites. Willet (species ID 57) has the largest EC value.

Otherwise, species with larger EC values in predator-parasite sub-web and parasite-parasite sub-web are

parasites.

Fig.7 Results of eigenvector centrality for the full CSM food web. The numbers in parentheses are eigenvector centralities and

the numbers outside parentheses are species ID codes.

25

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Fig. 8 Results of eigenvector centrality for the four sub-webs of CSM food web (upper left: predator-prey sub-web; upper right:

predator-parasite sub-web; bottom right: parasite-parasite sub-web; bottom left: parasite-host sub-web). The numbers in

parentheses are eigenvector centralities and the numbers outside parentheses are species ID codes. Species ID codes of different

sub-webs are different from the original species.

Table 5 Species with greater eigenvector values in the full CSM food web and four sub-webs.

Predator-prey

sub-web

Predator-parasite sub-web Parasite-parasite

sub-web

Parasite-host

sub-web

Full CSM food web

ID Species ID Species ID Species ID Species ID Species

57 Willet 98 Plasmodium 111 Parorchis

acanthus

83 Triakis

semifasciata

57 Willet

58 Black-bellied

Plover

90 Culex tarsalis 106 Himasthla

species B

72 Leptocottus

armatus

52 Dowitcher

56 Pachygrapsu

s

crassipes

89 Aedestaeniorhync

hus

104 Himasthla

Rhigedana

73 Gillycthys

mirabilis

58 Black-bellied

Plover

52 Dowitcher 116 Small

cyathocotylid

113 Cloacitrema

michiganensis

57 Willet 72 Leptocottus

armatus

62 Marbled

Godwit

117 Stictodora

hancocki

108 Acanthoparyphi

um sp.

52 Dowitcher 73 Gillycthys

mirabilis

48 Western

Sandpiper

119 Pygidiopsoides

spindalis

119 Pygidiopsoides

Spindalis

58 Black-bellied

Plover

56 Pachygrapsus

crassipes

46 Hemigrapsus

oregonensis

114 Phocitremoides

ovale

117 Stictodora

Hancocki

77 Snowy Egret 83 Triakis

semifasciata

50 Least

Sandpiper

118 Mesostephanus

Appendiculatoides

114 Phocitremoides

Ovale

78 Black-crowne

d Night heron

67 Surf Scoter 

59 California

Gull

111 Parorchis

acanthus

103 Euhaplorchis

californiensis

81 Pied Billed

Grebe

50 Least Sandpiper

47 Fundulus

parvipinnis

113 Cloacitrema

michiganensis

118 Mesostephanus

Appendiculatoid

es

69 Clapper rail 69 Clapper rail

26

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

3.6 Analysis of DC, BC, CC, CU and EC

According to Table 6, the change of species ranking with CU is larger: the top ten species are totally different

with species ranking by remaining four indices. The DC and CC analysis in the full CSM food web (species ID

No. 1 to No. 128) showed that the parasites are more important than free-living species, while reverse results

were obtained from BC and EC analysis. The more important parasites calculated from DC and CC analysis

are Stictodora hancocki, small cyathocotylid, Pygidiopsoides spindalis, Phocitremoides ovale and Parorchis

acanthus (species No. 117, 116, 119, 114, and 111, respectively). Species ranking by BC, DC and CC in the

full CSM food web (species ID No. 1 to No. 83) are basically consistent with the species in the predator-prey

sub-web, and the relative important species are Pachygrapsus crassipes, Hemigrapsus oregonensis and

Fundulus parvipinnis(species ID No.56, 46, and 47, respectively). These results show that parasites in the full

CSM food web do not change the relative importance of free-living species, but increase the DC value of

free-living species.

Table 6 The top ten species (ID codes) ranking by DC, BC, CC, CU and EC in the full CSM food web and predator-prey sub-web, respectively.

DC BC CC CU EC

Full CSM food

web (Species

ID No.1 to No.

128)

118 46 56 25 57

117 56 117 109 52

116 73 116 70 58

119 83 119 34 72

114 47 114 110 73

111 72 111 35 56

113 57 113 106 83

105 108 118 71 67

108 52 108 115 50

56 113 47 103 69

Full CSM food

web (Species

ID No.1 to No.

83)

56 46 56 25 57

57 56 47 70 52

52 73 46 34 58

47 83 57 35 72

73 47 73 71 73

58 72 72 43 56

68 57 52 19 83

50 52 58 16 67

72 75 43 12 50

46 74 83 23 69

Predator-prey

sub-web

(Species No.1

to No. 83)

56 56 56 60 57

46 46 46 81 58

47 47 47 38 56

43 73 73 78 52

57 72 18 61 62

73 38 38 63 48

33 43 33 64 46

18 48 72 65 50

1 50 1 36 59

38 18 57 77 47

27

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

3.7 Pearson correlation of five topological indices

As can seen from Table 7, the Pearson’s correlations of DC and CC are the largest in the full CSM food web

and predator-prey sub-web (0.917 and 0.877, respectively), so DC and CC are strong correlated. DC mainly

measures the importance of a node in the local scope, and thus denotes the self-correlation of the node. CC is a

measure of the ability of one node for controlling the other nodes, and denotes the centralization extent of a

node. Therefore, DC and CC analysis synthesizes the importance of a node locally and globally. Table 6

demonstrates that the keystone species in the CSM food web are Stictodora hancocki, small cyathocotylid,

Pygidiopsoides spindalis, Phocitremoides ovale and Parorchis acanthus (species ID No. 117, 116, 119, 114,

and 111, respectively).

Table 7 Pearson’s correlation coefficients of five topological indices.

Pearson’s correlation coefficient analysis DC BC CC CU EC

DC Full CSM food web 1.000 0.773 0.917 0.483 0.800

predator-prey sub-web 1.000 0.789 0.877 0.053 0.498

BC Full CSM food web 0.773 1.000 0.754 0.338 0.625

predator-prey sub-web 0.789 1.000 0.595 -0.032 0.402

CC Full CSM food web 0.917 0.754 1.000 0.525 0.695

predator-prey sub-web 0.877 0.595 1.000 0.360 0.478

CU Full CSM food web 0.483 0.338 0.525 1.000 0.307

predator-prey sub-web 0.053 -0.032 0.360 1.000 0.205

EC Full CSM food web 0.800 0.625 0.695 0.307 1.000

predator-prey sub-web 0.498 0.402 0.478 0.205 1.000

3.8 Efficiency analysis of the full CSM food web

Table 8 indicates the changes of topological properties after removing different keystone species from the full

CSM food web. The major topological changes before and after removing keystone species include

(1) Number of top species and basal species does not change. The top species are not necessarily the

keystone species of the food web.

(2) Number of links and cycles reduces significantly. It means that the keystone species play an

important role in the food web. There are less cycles between predators and preys due to the removal

of parasites.

(3) Number of total links and maximum links, and link density and connectance decreases respectively.

(4) The maximum chain length did not change significantly.

Compared with the results of removing important species, the changes of the full food web are not

significant in terms of all indices.

In conclusion, the topological structure of the full food web changed significantly after removing the

keystone species, which further validates the results achieved previously.

28

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Table 8 Comparison of topological properties of the full CSM food web with removed different keystone species.

Removed

species

No.117

Removed

species

No.116

Removed

species

No.119

Removed

species

No.114

Removed

species

No.111

Removed

species

No.56

Full CSM

food web

Number of

species, S

127 127 127 127 127 127 128

Number of

links, L

2197 2197 2198 2199 2205 2212 2290

Number of

top species, T

3 3 3 3 3 3 3

Number of

intermediate

species, I

116 116 116 116 116 116 117

Number of

basal species,

B

8 8 8 8 8 8 8

Number of

Chain cycles

71142 70472 71111 71526 74331 80450 85214

Link density,

L/S

17.299 17.299 17.307 17.315 17.362 17.417 17.891

Connectance,

L/S2

0.13621 0.13621 0.13628 0.13634 0.13671 0.13714 0.13977

Mean

connectance,

D

34.598 34.598 34.614 34.630 34.724 34.835 35.781

Maximum

chain length

No.1-5: 3

No.6: 5

No.7-8: 4

No.1-5, 7:

3

No.6: 5

No.8: 4

No.1-5,

7: 3

No.6: 5

No.8: 4

No.1-5,

7: 3

No.6: 5

No.8: 4

No.1-5,

7: 3

No.6: 5

No.8: 4

No.1,3-5,7:

3

No.2,6,8: 4

No.1-5,7:

3

No.6: 5

No.8: 4

4 Discussion

Since the concept of keystone species was first proposed by Paine (1969), the importance of them for

conservation biology has been widely studied. However, due to the limitations of field experimental methods

and the temporal and spatial variation (Menge et al., 1994; Paine, 1995; Estes et al., 1998), more and more

researchers questioned the original concept of keystone species, and have developed various definitions of

keystone species (Mills et al., 1993; Bond, 2001; Davic, 2003). So far, quantitative methods to identify

keystone species remain to be little (Menge et al., 1994; Bond, 2001).

The traditional definitions of keystone species closely related to the richness and biomass of species,

however, the definitions can be considered by combining the topological importance (Jordán et al., 1999,

Jordán et al., 2003). Although the definitions of keystone species from network perspective and traditional

definition are not fully consistent, they provide a quantitative and complementary view for the importance of

species, and stress that the network theory and species conservation practices are highly correlated (Memmott,

1999; Dunne et al., 2002a). The identification of keystone species in food webs using network analysis

depends on the topological characteristics of the network. In present study, we calculated the five centrality

indices of nodes in the full CSM food web and its four sub-webs, and found that species rankings using

different centrality indices were different. Species importance ranking by the degree centrality and

betweenness centrality is based on their direct connection in the network. Degree considers the direct impact of

29

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

a species with its neighboring species directly connected. Betweenness centrality represents the influence of a

species in the "communication" process. On the other hand, closeness centrality, clustering coefficient

centrality and eigenvector centrality take the influence of a species in the global network into consideration

(Borgatti, 2005). In all of these indices, the importance of a species in the global or local network is equally

important, so the different rankings using different centrality indices should be taken as the comprehensive

measure of different topological properties, which are likely relevant to the direct target analysis of theoretical

ecology and conservation ecology (Estrada, 2007).

Studies have indicated that there is a significant correlation between different topological parameters of a

complex network (Wutchy and Stadler, 2003). Our results showed that DC and CC correlated significantly.

Thus the combined use of DC and CC can better reflect the importance ranking of species in the global and

local network.

Power et al. (1996) proposed that a quantitative and predictive generalization is a primary task for

identifying keystone species. Research on complex networks will give us new thoughts and methods to further

understand ecosystems (Abrams et al., 1996; Yodzis, 2001; Piraino et al., 2002). In this article, we identify

keystone species by only using Pajek software, so the analytical method may be more unitary and lack of

comparative study statistically. More methods, as Ecosim networks (Dunne et al., 2002b; Jordán et al., 2008),

CosBiLaB Graph software (Jordán et al., 2008), etc., are suggested using in the future. In addition, we have

used the conventional definition, i.e., taxonomical species, and simplify the life stages of species. In the further

studies, we may distinguish species in different life stages and then integrate their relationship.

Acknowledgment

We thank Mr. WenJin Chen for his pre-treatment on part data in this article.

References

Abrams PA, Menge BA, Mittelbach GG, et al. 1996. The Role of Indirect Effects in Food Webs. In: Integration

of Patterns and Dynamics. 371-395, Chapman and Hall, USA

Albert R, Barabási AL. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics. 74:

47-97

Arii K, Derome R, Parrott L. 2007. Examining the potential effects of species aggregation on the network

structure of food webs. Bulletin of Mathematical Biology, 69: 119-133

Bai KS, Gao RH, et al. 2011. Study on relationship between the keystone species of Larixgmelinii forest and

rhododendron plant. Journal of Inner Mongolia Agricultural University, 32(2): 31-37

Beigrano A, Seharler UM, Dunne J, Ulanowicz RE. 2005. Aquatic Food Webs: An Ecosystem Approach.

Oxford University Press, New York, USA

Berlow EL. 1999. Strong effects of weak interactions in ecological communities. Nature, 398: 330-334

Bonaeich P. 1987. Power and centrality: a family of measures. American Journal of Sociology, 92: 1170-1182

Bond WJ. 1989. The tortoise and the hare: ecology of angiosperm dominance and gymnosperm persistence.

Biological Journal of the Linnean Society, 36(3): 227-249

Bond WJ. 2001. Keystone species - hunting the shark? Science, 292: 63-64

Borgatti SP. 2005. Centrality and network flow. Social Networks, 27: 55-71

Bustamante RH, Branch GM, Eekhout S. 1995. Maintenance of an exceptional intertidal grazer biomass in

South Africa-subsidy by subtidal kelps. Ecology, 76: 2314-2329

30

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Callaway DS, Newman ME, et al. 2000. Network robustness and fragility: Percolation on random graphs.

Physical Review Letters, 85(25): 5468-5471

Davic RD. 2003.Linking keystone species and functional groups: a new operational definition of the keystone

species concept. Conservation Ecology, 7(1)

Dunne JA, Williams RJ, Martinez ND. 2002a. Network structure and biodiversity loss in food webs:

robustness increases with connectance. Ecology Letters, 5:558-567

Dunne JA, Williams RJ, Martinez ND, 2002b. Food-web structure and network theory: The role of

connectance and size. Proceeding of the National Academy of Sciences of the United States of America,

99: 12917-12922

Eisenberg E, Levanon EY. 2003. Preferential attachment in the Protein network evolution. Physical Review

Letters, 91: 1-4

Ernest SKM, Brown JH. 2001. Delayed compensation for missing keystone species by colonization. Science,

292: 101-104

Estes JA, Tinker MT, Williams TM, Doak DF. 1998. Killer whale predation on sea otters linking oceanic and

nearshore ecosystems. Science, 282: 473-476

Estrada E. 2007. Characterization of topological keystone species Local, global and ‘‘meso-scale’’ centralities

in food webs. Ecological Complexity, 4: 48-57

Go´mez D, González-Arangüena E, et al. 2003. Centrality and power in social networks: a game theoretic

approach. Mathematical Social Sciences, 46: 27-54

Huang HB, Yang LM, et al. 2008. Identification technique of essential nodes in protein networks based on

combined parameters. Acta Automatica Sinica, 34(11):1388-1395

Hurlbert SH. 1997. Functional importance vs keystoneness: reformulating some questions in theoretical

biocenology. Australian Journal of Ecology, 22(4): 369-382

Ji LZ, Liu ZG, et al. 2002. Effect on cones picking in broad-leaved pinuskoraiensis forest in Changbai

Mountain. Chinese Journal of Ecology, 21(3): 39-42

Jordán F. 2001. Trophic fields. Community Ecology, 2: 181-185

Jordán F, Liu WC, van Veen JF. 2003. Quantifying the importance of species and their interactions in a

host-parasitoid community. Community Ecology, 4: 79-88

Jordán F, Okey TA, Bauer B, Libralato S. 2008. Identifying important species: a comparison of structural and

functional indices. Ecological Modelling, 216: 75-80

Jordán F, Takács-Sánta A, Molnár I. 1999.A reliability for theoretical quest for keystones. Oikos, 86: 453-462

Khanina L. 1998. Determining keystone species. Conservation Ecology, 2(2): R2

Kuang WP, Zhang WJ. 2011. Some effects of parasitism on food web structure: a topological analysis.

Network Biology, 1(3-4): 171-185

Lafferty KD, Allesina S, Arim M, et al. 2008. Parasites in food webs: the ultimate missing links. Ecology

Letters, 11(6): 533-546

Lafferty KD, Dobson AP, Kuris AM. 2006a. Parasites dominate food web links. Proceedings of the National

Academy of Sciences of the United States of America, 103(30): 11211-11216

Lafferty KD, Hechinger RF, Shaw JC, et al. 2006b. Food webs and parasites in a salt marsh ecosystem. In:

Disease Ecology: Community Structure and Pathogen Dynamics (Collinge S, Ray C, eds). 119-134,

Oxford University Press, UK

Libralato S, Christensen V, Pauly D. 2006. A method for identifying keystone species in food web models.

Ecological Modelling, 1(195): 153-157

Memmott J. 1999. The structure of a plant-pollinator food web. Ecology Letters, 2: 276-280

31

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Menge BA, Berlow EL, et al. 1994. The keystone species concept: variation in interaction strength in a rocky

intertidal habitat. Ecological Monographs, 64: 249-286

Mills LS, Soule ME, Doak DF. 1993. The keystone-species concept in ecology and conservation. Bioscience,

43: 219-224

Mouquet N, Gravel D, Massol F, Calcagno V. 2013. Extending the concept of keystone species to communities

and ecosystems. Ecology Letters, 16(1): 1-8

Naeem S, Li S. 1997. Biodiversity enhances ecosystem reliability. Nature, 390: 507-509

Navia AF, Cortes E, Mejia-Falla PA. 2010. Topological analysis of the ecological importance of elasmobranch

fishes: A food web study on the Gulf of Tortugas, Colombia. Ecological Modelling, 221(24): 2918-2926

Newman ME. 2003. The structure and function of complex networks. Siam Review, 45: 167-256

Paine RT. 1966. Food web complexity and species diversity. American Naturalist, 100: 65-75

Paine RT. 1969. A note on trophic complexity and community stability. American Naturalist, 103: 91-93

Paine RT. 1992. Food-web analysis through field measurement of per capita interaction strength. Nature, 355:

73-75

Paine RT. 1995. A conversation on refining the concept of keystone species. Conservation Biology, 9(4):

962-964

Patten BC. 1991. Network ecology: indirect determination of the life-environment relationship in ecosystems.

In: Theoretical Studies of Ecosystems: The Network Perspective. 288-351, Cambridge University, UK

Pimm SL. 1982. Food Webs. Chapman & Hall, London, UK

Piraino S, Fanelli G, Boero F. 2002. Variability of species’ roles in marine communities: change of paradigms

for conservation priorities. Marine Biology, 140: 1067-1074

Power ME, Tilman D, Estes JA, et al. 1996. Challenges in the quest for keystones. BioScience, 46: 609-620

Ravasz E, Somera AL, Mongru DA, et al. 2002. Hierarehieal organization of modularity in metabolic networks.

Science, 297: 1551-1555

Simberloff D. 1998. Flagships, umbrellas, and keystones: is single-species management passé in the landscape

area? Biological Conservation, 83: 247-257

Springer AM, Estes JA, van Vliet GB, et al. 2003. Sequential megafaunal collapse in the North Pacific Ocean:

an ongoing legacy of industrial whaling? Proceeding of the National Academy of Sciences of the United

States of America, 100: 12223–12228

Strogatz SH. 2001. Exploring complex networks. Nature, 410: 268-275

Soulé ME, Simberloff D. 1986. What do genetic sand ecology tell us about the design of nature reserve.

Biological Conservation, 35(1): 19-40

Tanner JE, Hughes TP. 1994. Species coexistence, key stone species, and succession: a sensitivity analysis.

Ecology, 75(8): 2204-2219

Tilman D. 2000.Causes, consequences and ethics of biodiversity. Nature, 405: 208-211

Ulanowicz RE, Puccia CJ. 1990. Mixed trophic impacts in ecosystems. Coenoses, 5: 7-16

Wassermann S, Faust K. 1994. Social Network Analysis. Cambridge University Press, USA

Watts D, Strogatz SH. 1998. Collective dynamics of “small-world” networks. Nature, 393: 440-442

West DB. 2001. Introduction to Graph Theory. Prentice Hall, USA

Williams RJ, Berlow EL, Dunne JA, et al. 2002. Two degrees of separation in complex food webs. Proceeding

of the National Academy of Sciences of the United States of America, 99: 12913-12916

Wilson EO. 1987. The little things that run the world. Conservation Biology, 1: 344-346

Wootton JT. 1994. Predicting direct and indirect effects: an integrated approach using experiments and path

analysis. Ecology, 75: 151-165

32

Network Biology, 2015, 5(1): 13-33 

  IAEES www.iaees.org

Wutchy S, Stadler PF. 2003. Centers of complex networks. Journal of Theoretical Biology, 223: 45-53

Yeaton RI. 1988. Porcupines, fires and the dynamics of the tree layer of the Burkeaafricana savanna. Journal of

Ecology. 76(4): 1017-1029

Yodzis P. 2000. Diffuse effects in food webs. Ecology, 81: 261-266

Yodzis P. 2001. Must top predators be culled for the sake of fisheries? Trends in Ecology Evolution, 16: 78-84

Zhang WJ. 2011. Constructing ecological interaction networks by correlation analysis: hints from community

sampling. Network Biology, 1(2): 81-98

Zhang WJ. 2012a. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,

Singapore

Zhang WJ. 2012b. How to construct the statistic network? An association network of herbaceous plants

constructed from field sampling. Network Biology, 2(2): 57-68

Zhang WJ. 2012c. Modeling community succession and assembly: A novel method for network evolution.

Network Biology, 2(2): 69-78

Zhang WJ. 2012d. Several mathematical methods for identifying crucial nodes in networks. Network Biology,

2(4): 121-126

33

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

Article

Evolutionary Network Control also holds for nonlinear networks:

Ruling the Lotka-Volterra model Alessandro Ferrarini Department of Evolutionary and Functional Biology, University of Parma, Via G. Saragat 4, I-43100 Parma, Italy

E-mail: [email protected], [email protected]

Received 27 October 2014; Accepted 30 November 2014; Published 1 March 2015

Abstract

The proof of our understanding of ecological and biological systems is measured by our skill to rule them, i.e.

to channelize them towards a desired state. Control is a cardinal issue in most complex systems, but because a

general theory to apply it in a quantitative manner has been absent so far, little was known about how we can

rule weighted, directed networks that represent the most common configuration of real systems. To this

purpose, Evolutionary Network Control (ENC) has been developed as a theoretical and methodological

framework aimed to the control of ecological and biological networks by coupling network dynamics and

evolutionary modelling. ENC is a tools to address controllability for arbitrary network topologies and sizes.

ENC has proven to cover several topics of network control, e.g. a) the global control from inside and b) from

outside, c) the local (step-by-step) control, and the computation of: d) control success, e) feasibility, and f)

degree of uncertainty. Taken together, these results indicate that many aspects of controllability can be

explored exactly and analytically for arbitrary networks, opening new avenues to deepening our understanding

of complex systems. As yet, I have applied ENC only to linear ecological and biological networks. In this

work, I show that ENC also holds for any kind of nonlinear networks, and provide an applicative example

based on the nonlinear, widely-used, Lotka-Volterra model.

Keywords Evolutionary Network Control; genetic algorithms; global dynamics; nonlinear networks; predator-

prey model; sensitivity analysis; stochastic simulations.

1 Introduction

I have recently introduced a methodological framework, named Evolutionary Network Control (ENC;

Ferrarini, 2013a; Ferrarini, 2013b), so that ecological and biological networks can be controlled from the

outside (Ferrarini, 2013a) but also through the use of endogenous controllers (Ferrarini, 2013b), by coupling

Network Biology   ISSN 2220­8879  URL: http://www.iaees.org/publications/journals/nb/online­version.asp RSS: http://www.iaees.org/publications/journals/nb/rss.xml E­mail: [email protected] Editor­in­Chief: WenJun Zhang Publisher: International Academy of Ecology and Environmental Sciences 

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

network dynamics and evolutionary modelling (Holland, 1975; Goldberg, 1989). The endogenous control

requires that the network is optimized at the beginning of its dynamics, by acting upon nodes, edges or both, so

that it will inertially go to the desired state. On the contrary, the exogenous control requires that one or more

exogenous controllers act upon the network at each time step (Ferrarini, 2011a; Ferrarini, 2011b).

In another paper (Ferrarini, 2013c), I have examined a further issue: how reliable is the achieved solution?

This is an important question, because it’s not assured that while managing a network-like system we are able

to impose to nodes and edges exactly the optimized values we would need in order to achieve the desired

control. In order to face this topic, I have coined a 3-parts (network dynamics - genetic optimization -

stochastic simulations) solution.

I have further proposed a solution to the choice of the most feasible solution to network control by

introducing the concepts of control success and feasibility (Ferrarini, 2013d).

Later, I have faced another pivotal question, i.e. how to locally (step-by-step) drive ecological and biological

networks so that also intermediate steps (and not only the final state) are under our control (Ferrarini, 2014).

The ratio behind this question is that intermediate dynamics could potentially go below or above critical

ecological or biological thresholds, hence invalidating the final global control. To this purpose, I have

introduced a solution to the complete (local + global) control of ecological and biological networks by making

use of an intermediate control function.

As yet, I have applied ENC only to linear ecological and biological networks. While it is not goal of this

paper to discuss the implications of the ENC of ecological and biological networks, I show here that ENC also

holds for any kind of nonlinear networks, and provide an applicative example based on the nonlinear, widely-

used, Lotka-Volterra model (Lotka, 1925; Volterra, 1926).

2 Subduing the Lotka-Volterra Model via ENC: Mathematical Formulation

It’s given a generic ecological (or biological) dynamical system with n interacting actors

( , )dS

S tdt

(1)

where Si is the amount (e.g., number of individuals, total biomass, density, covered surface etc…) of the

generic i-th actor.

If we also consider inputs (e.g. species reintroductions) and outputs (e.g. hunting) from-to outside, we must

write

( , ) ( ) ( )dS

S t I t O tdt

(2)

with initial values

0 1 2 nS =<S (0), S (0)...S (0)>

(3)

and co-domain limits

1min 1 1max

nmin nmax

S ( ) S

...

S ( ) Sn

S t

t

S t

(4)

35

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

The Lotka-Volterra equations (Lotka, 1925; Volterra, 1926), otherwise known as the predator-prey

equations, are a combination of two first-order, non-linear, differential equations frequently used to describe

the dynamics of biological systems with two interacting species, one as a prey and the other as a predator.

The Lotka-Volterra model makes five assumptions about the environment and the dynamics of the two

interacting species: 1) the prey population finds food at any times; 2) the food supply for the predator depends

on the size of the prey population; 3) the rate of change of each population is proportional to its size; 4) while

interacting, the environment does not change; 5) predators have unbounded appetency. As differential

equations are used, the solution is deterministic and continuous so that the generations of both the predator and

prey persistently overlap.

The nonlinear Lotka-Volterra model with logistic grow of the prey S1 is a particular case of eq. (1) and it

reads as follows

1 11 1 2

21 2 2

(1 )dS S

S S SdtdS

S S Sdt

(5)

with initial values

0 1 2S =<S (0), S (0)>

(6)

and co-domain limits

1min 1 1max

2min 2 2max

S ( ) S

S ( ) S

S tt

S t

(7)

In order to get the global control of such model, ENC can act upon the previous Lotka-Volterra model as

follows

1 11 1 2

21 2 2

0 1 2

(1 )

S =<S (0), S (0)>

dS SS S S

dt

dSS S S

dt

(8)

with steady values at

1 2 0dS dS

dt dt (9)

where the tilde symbol means that the ENC is active over such actors by controlling equation parameters and

initial values.

The control equations in (8) are able to globally drive any ecological and biological network, and the

nonlinear Lotka-Volterra as a particular case, to the desired final state with an uncertainty degree that can be

calculated as proposed in Ferrarini (2013c).

36

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

1 11 1 2

21 2 2

0 1 2

(1 )

S =<S (0), S (0)>

dS SS S S

dt

dSS S S

dt

(10)

where the underscores represent 1%, 5% or 10% uncertainties about the optimized parameters. Thus for

example:

0.99* 1.01*

0.95* 1.05*

0.90* 1.10

*

or

or

(11)

If we stochastically vary n times (e.g. 10,000 times) the parameters that have been optimized via ENC, we can

compute how many times such uncertainty makes the optimization procedure useless. Hence, uncertainty

about network control can be computed as in Ferrarini (2013c)

% 100*k

Un

(12)

where k is the number of stochastic simulations acting upon the optimized parameters that make the

optimization procedure useless (i.e. the goal of the optimization procedure is not reached).

In order to apply the ENC framework, the software Control-Lab (Ferrarini, 2013e) has been developed using

Visual Basic (Balena, 2001; Pattinson, 1998).

3 An Applicative Example

Let’s consider the Lotka-Volterra system of eq. (5) with the following parameters and constants:

(13)

Fig. 1 shows its dynamical behaviour. Figure 2 depicts its phase plot.

37

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

Fig. 1 Time plot of the above-depicted nonlinear Lotka-Volterra dynamical system.

Fig. 2 Phase plot of the above-depicted nonlinear Lotka-Volterra dynamical system.

38

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

0

50

100

150

200

250

300

0 10 20 30

S1 S2S1(0) 100

S2(0) 10

a 4.451731398

b 0.035693864

k 500

c 1.005687682

d 3.570078492

The previous nonlinear system goes at the steady state with S1= 80.00 and S2= 67.20.

Let’s suppose we want that both the prey and the predator go to equilibrium with values close to 100. Fig. 3

shows the optimized solution (S1= 99.45 and S2= 99.91) detected via ENC by acting upon alpha, beta, gamma

and delta. The steady state happens at t= 42.15.

Fig. 3 The ruled Lotka-Volterra system with equilibrium values S1= 99.45 and S2= 99.91. The optimized parameters detected via Evolutionary Network Control are indicated above the two curves. The solution has been found through Control-Lab (Ferrarini, 2013e).

After 10,000 simulations (1% uncertainty) on the optimized parameters (alpha, beta, gamma and delta), we

achieve the results of Table 1. It is evident that the solution depicted in Fig. 3 is robust with regard to a 1%

sensitivity analysis, as mean and median values of the 10,000 simulations are very close to the desired solution

S1= S2= 100.

Table 1 Sensitivity analysis (1% uncertainty) on the optimized parameters of Fig. 3.

Statistics S1 S2

min 96.73 97.98

max 102.38 101.72

mean 99.52 99.92

median 99.57 99.92

std. dev. 1.02 0.74

39

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

0

50

100

150

200

250

300

0 10 20 30 40 50 60 70 80 90 100

S1 S2S1(0) 100

S2(0) 10

a 3.614049898

b 0.064913992

k 500

c 1.086066854

d 3.592714077

Now let’s suppose we want that both the prey and the predator go to equilibrium with values close to 50. Fig.

4 shows the optimized solution (S1= 50.96 and S2= 50.00) found via ENC by acting upon alpha, beta, gamma

and delta. The steady state happens at t= 74.48.

Fig. 4 Ruled Lotka-Volterra system with equilibrium values S1= 50.96 and S2= 50.00. The optimized parameters detected via Evolutionary Network Control are indicated above the two curves. The solution has been found through the software Control-Lab (Ferrarini, 2013e).

After 10,000 simulations (5% uncertainty) on the optimized parameters (alpha, beta, gamma and delta), we

achieve the results of Table 2. It is clear that the solution depicted in Fig. 4 is robust with regard to a 5%

sensitivity analysis, as mean and median values of the 10,000 simulations are very close to the desired solution

S1= S2= 50.

Table 2 Sensitivity analysis (5% uncertainty) on the optimized parameters of Fig. 4.

Statistics S1 S2

min 44.55 45.50

max 58.45 55.12

mean 50.94 49.95

median 50.85 49.91

std. dev. 2.52 1.95

40

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

0

50

100

150

200

250

300

0 10 20 30 40 50 60 70 80 90 100

S1 S2S1(0) 100

S2(0) 10

a 3.69942185

b 0.093481399

k 500

c 0.551459224

d 2.830463157

Last, let’s suppose we require that the prey and the predator go to equilibrium with S1+S2= 90. Fig. 5 shows

the optimized solution (S1= 54.92 and S2= 35.23) detected via ENC by acting upon alpha, beta, gamma and

delta. The steady state happens at t= 95.00.

Fig. 5 Ruled Lotka-Volterra system with equilibrium values S1= 54.92 and S2= 35.23 and sum equal to 90.15. The optimized parameters detected via Evolutionary Network Control are indicated above the two curves. The solution has been found through the software Control-Lab (Ferrarini, 2013e).

After 10,000 simulations (10% uncertainty) on the optimized parameters (alpha, beta, gamma and delta), we

achieve the results of Table 3. The solution depicted in Fig. 5 is robust with regard to a 10% sensitivity

analysis, as mean and median values of the 10,000 simulations are approximately equal to the desired solution

S1+S2= 90.

Table 3 Sensitivity analysis (10% uncertainty) on the optimized parameters of Fig. 5.

Statistics S1 S2

min 42.12 29.52

max 71.29 42.14

mean 55.22 35.27

median 54.91 35.13

std. dev. 5.66 2.65

Any other kind of network control is feasible using ENC, including the control of Lotka-Volterra models

with n>2 actors. ENC can also be employed to impose early (or late) stability to Lotka-Volterra models in

particular, but also to any arbitrary ecological and biological networks more in general (Ferrarini, 2015).

41

Network Biology, 2015, 5(1): 34-42

IAEES www.iaees.org

4 Conclusions

The control of ecological and biological networks has unlimited applications: a) neutralize damages to

ecological and biological networks, b) safeguard rare and endangered species, c) manage ecological systems at

the least possible cost, d) counteract the impacts of climate change.

While in previous papers I have showed how to globally and locally rule linear ecological and biological

networks, here I have showed that Evolutionary Network Control is on top of the control of nonlinear

networks as well.

References

Balena F. 2001. Programming Microsoft Visual Basic 6.0. Microsoft Press, Redmond, WA, USA

Ferrarini A. 2011a. Some thoughts on the controllability of network systems. Network Biology, 1 (3-4): 186-

188

Ferrarini A. 2011b. Some steps forward in semi-quantitative network modelling. Network Biology, 1(1): 72-78

Ferrarini A. 2013a. Exogenous control of biological and ecological systems through evolutionary modelling.

Proceedings of the International Academy of Ecology and Environmental Sciences, 3(3): 257-265

Ferrarini A. 2013b. Controlling ecological and biological networks via evolutionary modelling. Network

Biology, 3(3): 97-105

Ferrarini A. 2013c. Computing the uncertainty associated with the control of ecological and biological systems.

Computational Ecology and Software, 3(3): 74-80

Ferrarini A. 2013d. Networks control: introducing the degree of success and feasibility. Network Biology,

3(4): 115-120

Ferrarini A. 2013e. Control-Lab 5.0: a software for ruling Quantitative Ecological Networks using Ecological

Network Control Manual, 137 pages

Ferrarini A. 2014. Local and global control of ecological and biological networks. Network Biology, 4(1): 21-

30

Ferrarini A. 2015. Imposing early stability to ecological and biological networks through Evolutionary

Network Control. Proceedings of the International Academy of Ecology and Environmental Sciences, 5(1)

(in press)

Goldberg DE. 1989. Genetic Algorithms in Search Optimization and Machine Learning. Addison-Wesley,

Reading, USA

Holland JH. 1975. Adaptation in natural and artificial systems: an introductory analysis with applications to

biology, control and artificial intelligence. University of Michigan Press, Ann Arbor, USA

Lotka AJ. 1925. Elements of Physical Biology. Williams & Wilkins Co., Baltimore, USA

Pattison T. 1998. Programming Distributed Applications with COM and Microsoft Visual Basic 6.0.

Microsoft Press, Redmond, WA, USA

Volterra V. 1926. Variazioni e fluttuazioni del numero d'individui in specie animali conviventi. Memoriale

Accademia Nazionale dei Lincei di Roma, 2: 31-113

42

Network Biology

The Network Biology (ISSN 2220-8879; CODEN NBEICS) is an open access, peer/open reviewed online journal that considers scientific articles in all different areas of network biology. It is the transactions of the International Society of Network Biology.It dedicates to the latest advances in network biology. The goal of this journal is to keep a record of the state-of-the-art research and promote the research work in these fast moving areas. The topics to be covered by Network Biology include, but are not limited to:

Theories, algorithms and programs of network analysis Innovations and applications of biological networks Ecological networks, food webs and natural equilibrium Co-evolution, co-extinction, biodiversity conservation Metabolic networks, protein-protein interaction networks, biochemical reaction networks,

gene networks, transcriptional regulatory networks, cell cycle networks, phylogenetic networks, network motifs

Physiological networks Network regulation of metabolic processes, human diseases and ecological systems Social networks, epidemiological networks System complexity, self-organized systems, emergence of biological systems, agent-based

modeling, individual-based modeling, neural network modeling, and other network-based modeling, etc.

We are also interested in short communications that clearly address a specific issue or completely

present a new ecological network, food web, or metabolic or gene network, etc.

Authors can submit their works to the email box of this journal, [email protected] and(or)

[email protected]. All manuscripts submitted to Network Biology must be previously unpublished

and may not be considered for publication elsewhere at any time during review period of this journal.

In addition to free submissions from authors around the world, special issues are also accepted. The

organizer of a special issue can collect submissions (yielded from a research project, a research group,

etc.) on a specific topic, or submissions of a conference for publication of special issue.

Editorial Office: [email protected]

Publisher: International Academy of Ecology and Environmental Sciences

Address: Unit 3, 6/F., Kam Hon Industrial Building, 8 Wang Kwun Road, Kowloon Bay, Hong Kong

Tel: 00852-2138 6086

Fax: 00852-3069 1955

E-mail: [email protected]

Network Biology ISSN 2220-8879 ∣ CODEN NBEICS

Volume 5, Number 1, 1 March 2015

Articles

A comparative analysis on computational methods for fitting an ERGM

to biological network data

Sudipta Saha, Munni Begum 1-12

Determination of keystone species in CSM food web: A topological

analysis of network structure

LiQin Jiang, WenJun Zhang 13-33

Evolutionary Network Control also holds for nonlinear networks:

Ruling the Lotka-Volterra model

Alessandro Ferrarini 34-42

IAEES http://www.iaees.org/