Large-Scale Analysis of Plasmid Relationships through Gene-Sharing Networks
© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Large scale analysis of plasmid relationships through gene sharing networks
Research article
Manu Tamminen1, Marko Virta1, Renato Fani2, Marco Fondi2*
1Department of Food and Environmental Sciences, P.O. Box 56, 00014 University of Helsinki,
Finland
2Lab. of Microbial and Molecular Evolution, Dept. of Evolutionary Biology, Via Romana
17-19, University of Florence, Italy
* Corresponding author:
Marco Fondi
Lab. of Microbial and Molecular Evolution, Dept. of Evolutionary Biology
Via Romana 17-19, University of Florence
Italy
E-mail [email protected]
Tel. +39 0552288248
Fax. +39 055 2288250
Running head: Analysis of plasmids through gene sharing networks
Keywords: horizontal gene transfer, antibiotic resistance, plasmid, network
MBE Advance Access published November 29, 2011. Downloaded from http://mbe.oxfordjournals.org/ at Biblioteca di Scienze, Università degli studi di Firenze on January 11, 2012.
Abstract
Plasmids are vessels of genetic exchange in microbial communities. They are known to
transfer between different host organisms and acquire diverse genetic elements from
chromosomes and/or other plasmids. Therefore, they constitute an important element in
microbial evolution by rapidly disseminating various genetic properties among different
communities. A paradigmatic example of this is the dissemination of antibiotic resistance
genes that has resulted in the emergence of multiresistant pathogenic bacterial strains. To
globally analyze the evolutionary dynamics of plasmids, we built a large graph in which 2343
plasmids (nodes) are connected according to the proteins they share. The analysis
of this gene sharing network revealed an overall coherence between network clustering and
the phylogenetic classes of the corresponding micro-organisms, likely resulting from genetic
barriers to horizontal gene transfer between distant phylogenetic groups. Habitat was not a
crucial factor in clustering, as plasmids from organisms inhabiting different environments
were often found embedded in the same cluster. Analyses of network metrics revealed a
statistically significant correlation between plasmid mobility and centrality within the
network, supporting the observation that mobile plasmids are particularly important
in spreading genes in microbial communities. Finally, our study reveals an extensive (and
previously undescribed) sharing of antibiotic resistance genes between Actinobacteria and
Gammaproteobacteria, suggesting that the former might represent an important reservoir of
antibiotic resistance genes for the latter.
Introduction
Plasmids are paradigmatic examples of the network-like structure of microbial evolution
(Brilli et al. 2008). Indeed, they are among the most important players in the evolution of
prokaryotes because they can be transferred between microorganisms, thus representing
natural vectors for the transfer of genes and the functions they code for (Norman, Hansen,
Sorensen 2009). Accordingly, they often provide a basis for genomic rearrangements via
homologous recombination, facilitating the loss and/or acquisition of genes during these
events, which may eventually lead to horizontal gene transfer (HGT). As a consequence,
plasmids possess a mosaic structure with collections of functional genetic modules, each of
which likely possesses an independent phylogenetic history, organized into a stable and
self-replicating entity (Osborn et al. 2000; Toussaint, Merlin 2002; Bosi, Fani, Fondi 2011).
Importantly, these functional blocks often embed genes that might have a great impact on the
metabolic functions of the host cell, providing additional traits that can be accumulated
without altering the gene content of the bacterial chromosome (Fondi et al. 2010). Plasmids
are actually involved in many accessory functions and constitute, together with "not
essential" chromosomal regions, what is referred to as the "dispensable genome" in the
microbial pan-genome concept (Medini et al. 2005). This, in turn, can include genes for
ecologically important traits such as antibiotic resistance (Crosa, Luttropp, Falkow 1975),
pathogen virulence (Hacker, Kaper 2000), symbiotic nitrogen fixation (van Rhijn,
Vanderleyden 1995) and the production of allelopathic bacteriocins (Riley, Gordon 1999).
Among these processes, pathogenesis and antibiotic resistance are those that have been
primarily explored up to now. Indeed, it has been shown that the presence of plasmids can be
strictly linked to the emergence of pathogenic lineages within a given taxonomic unit
(Reynaud et al. 2008; Le Roux et al. 2010). In parallel, in terms of antibiotic resistance,
plasmids serve a central role as the vehicles for resistance gene capture and subsequent
spread (Bennett 2008; Fondi, Fani 2010). Dissemination of these features represents one of
the most important effects of ‘bacterial sex’, from both an evolutionary and ecological
viewpoint (Kohiyama et al. 2003). In this context, plasmid mobility represents an essential
parameter of microorganisms’ fitness and it might also be a key element in understanding
the epidemiology of these plasmid-carried traits (Smillie et al. 2010). However, despite
their clear biological relevance, the pathways followed by plasmids during their evolutionary
history remain largely obscure.
Nowadays the use of massive plasmid sequencing as a routine laboratory technique (Schluter
et al. 2008), together with the development of bioinformatics tools enabling the visualization
of sequence homology relationships through similarity networks (Vlasblom et al. 2006; Brilli
et al. 2008), can greatly speed up studies of gene mobility among plasmids. Furthermore,
thanks to the expansion of network-oriented representations of sequence similarity
relationships (Lima-Mendez, Toussaint, Leplae 2007; Brilli et al. 2008; Dagan, Martin 2009;
Dagan et al. 2010; Fondi et al. 2010; Fondi, Fani 2010; Halary et al. 2010), graph theory
measures have been applied to better describe gene flow across diverse microbial
communities, paving the way to large scale comparative analyses adopting bioinformatics
strategies. In more detail, by adopting a gene sharing network approach, Dagan et al. (Dagan,
Artzy-Randrup, Martin 2008) reported the construction and the analysis of graphs capturing
both vertical and lateral components of evolutionary history among 539,723 genes distributed
across 181 sequenced prokaryotic genomes. The same authors estimated that an impressive
amount (almost 80% on average) of the gene content of each analyzed genome was involved
in lateral gene transfer at some point in evolution. More recently, Halary et al. (Halary et al.
2010) applied mathematical studies of the centralities of a network embedding 119,381
homologous DNA families. They demonstrated that plasmids, and not viruses, are likely the
key vectors of genetic exchange between bacterial chromosomes. Moreover, their results also
supported a disconnected yet highly structured network of genetic diversity, revealing the
existence of multiple “genetic worlds”. From the analysis of the same network, the same
authors also inferred that DNA pools mostly circulate between vehicles (i.e. plasmids, phages
and chromosomes) of the same type. Finally, Lima-Mendez et al. (2008) represented
relationships across the phage population as a weighted graph where nodes represented
phages and edges represented phage–phage similarities in terms of gene content. Their
approach succeeded in capturing the pervasive mosaicism of phage genomes, indicating the
importance of horizontal gene exchange in their evolution and also proving to be a promising
tool for predicting lifestyles of individual phages from sequence data.
By applying a computational, network-oriented pipeline we have analyzed the evolutionary
relationships among 2343 microbial plasmids in order to explore the role of each of them
within the reticulate evolutionary dynamics of this class of mobile genetic elements.
Moreover, we focused on the proteins involved in two main biological processes,
namely antibiotic resistance and pathogenesis, as well as on plasmid features that might be
involved in shaping the overall network of plasmid-mediated HGT (e.g. plasmid mobility).
The data obtained provide clues toward a systemic interpretation of the overall
behaviour of plasmids within bacterial evolution and in the spreading of some key biological
features such as antibiotic resistance and virulence.
Methods
Dataset assembly
All the available complete plasmid sequences (in GenBank format) were downloaded from
NCBI using the EFetch interface (as of July 24, 2010). In total, 2343 plasmids (102772
ORFs) were retrieved, and a complete table including all their main features (size,
taxonomy, accession codes etc.) is available as Supplemental Information S1. Moreover, two
different subsets of sequences were created starting from the whole plasmid sequence
dataset. First, we created a set of plasmid-encoded proteins involved in the
process of antibiotic resistance. This was done using each of the retrieved sequences as a query
in a BLAST (Altschul et al. 1997) search against the Antibiotic Resistance DataBase (ARDB)
(Liu, Pop 2009) with the following parameters: e-value 1e-20, minimum alignment length 50
amino acids (aa), that is, a degree of amino acid sequence identity sufficiently high to retrieve
all the proteins that should perform a function related to antibiotic resistance (Friedberg 2006;
Fondi, Fani 2010). In this way, a set of 2678 sequences putatively associated with antibiotic
resistance (AR) was retrieved (see Supplemental Information S2 for the complete list of
accession codes of the proteins used in this work). These sequences belonged to 501 different
plasmids.
The same strategy, with the same parameters, was applied when searching for virulence-related
proteins (virulence factors, VF) within the whole plasmid sequence dataset. In this
case the probed database was the Virulence Factor DataBase (VFDB) (Chen et al. 2005; Yang
et al. 2008) and a set of 7840 sequences was retrieved from this BLAST search (belonging to
615 plasmids). Again, all the information about these sequences is available as Supplemental
Information S3.
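The two cut-offs quoted above (e-value 1e-20, minimum alignment length 50 aa) can be illustrated with a minimal sketch. It assumes, which the text does not state, that the BLAST results are available in the standard 12-column tabular layout; the function name `filter_hits` and the example records are hypothetical and are not the authors' scripts.

```python
import csv
import io

E_VALUE_MAX = 1e-20  # e-value cut-off quoted in the text
MIN_ALIGN_LEN = 50   # minimum alignment length (amino acids) quoted in the text


def filter_hits(handle, e_value_max=E_VALUE_MAX, min_len=MIN_ALIGN_LEN):
    """Yield (query, subject, identity, length, evalue) for hits passing both cut-offs.

    Assumed column order (standard BLAST tabular output):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, length, evalue = float(row[2]), int(row[3]), float(row[10])
        if evalue <= e_value_max and length >= min_len:
            yield query, subject, identity, length, evalue


# Hypothetical records: only the first passes both thresholds.
example = io.StringIO(
    "protA\tARDB_1\t98.0\t120\t2\t0\t1\t120\t1\t120\t1e-60\t250\n"
    "protB\tARDB_2\t45.0\t40\t22\t0\t1\t40\t1\t40\t1e-25\t90\n"     # alignment too short
    "protC\tARDB_3\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-10\t70\n"  # e-value too high
)
hits = list(filter_hits(example))
print(hits)
```

The same filter would apply unchanged to the VFDB search, since the text states that identical parameters were used.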
Network construction
The network construction workflow described in this paragraph has been applied to each of
the three assembled datasets, i.e. the one embedding all retrieved plasmid sequences
(hereinafter referred to as the “all sequences network”), the one embedding the antibiotic
resistance-related sequences (the “resistance network”) and the one embedding virulence
factor-related sequences (the “virulence network”).
In detail, each sequence dataset was used in an all-against-all BLAST search
(Altschul et al. 1997) run on the Murska parallel computing cluster (Center for Scientific
Computing, Espoo, Finland). The BLAST output was parsed to include matches at two
different identity thresholds (70% and 95%) using ad hoc Python scripts.
Two parsed files were obtained, one embedding those sequences sharing at least 70%
sequence identity and another one embedding sequences sharing at least 95% identity.
Similarly to (Dagan, Artzy-Randrup, Martin 2008) and, later, to (Halary et al. 2010), this
allows the resulting networks to be interpreted under a molecular clock–based assumption, i.e.
under the hypothesis that proteins with the highest percentages of identity were likely to be
more recently shared than the ones with less identity. In the present context, proteins with
95% identity were considered more recently shared than those with 70%.
Subsequently each of these parsed BLAST outputs was transformed into a gene sharing
network and visualized using the Gephi visualization program (Bastian, Heymann, Jacomy 2009).
In this network, each node represents a single plasmid and two different
plasmids are linked on the basis of their shared protein content. In particular, sharing is
defined by a BLAST match between two reading frames longer than 300 bp with at least 95% or 70%
amino acid identity, respectively, and therefore represents an absolute measure. To investigate
the dynamics of plasmids among bacterial cells, we applied a further filter to each of the
obtained graphs, retaining only those edges that represent at least five shared proteins and
discarding all the connections linking plasmids with fewer shared proteins.
Similarly, to investigate the dynamics of individual genes or small gene clusters among the
plasmid population, we applied a filter to retain only those edges that represent the sharing
of fewer than 5 genes. Altogether, we obtained 8 different networks (70% and 95% identity
values for all sequences with at least or fewer than 5 shared genes, and for sequences related to AR
or VF). The Gephi-formatted network files are available as Supplemental Information S4.
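As an illustration of the edge definition and the five-gene filter described above, the following sketch turns a list of protein-level matches into weighted plasmid-plasmid edges. The counting rule (one shared gene per distinct matching protein pair, reciprocal hits counted once), the function names and the toy identifiers are assumptions made for the example; the 300 bp length filter is omitted for brevity.

```python
from collections import defaultdict


def shared_protein_counts(hits, protein_to_plasmid, min_identity):
    """Count, for each plasmid pair, the distinct protein pairs matching at >= min_identity."""
    pairs = defaultdict(set)
    for query, subject, identity in hits:
        p, q = protein_to_plasmid[query], protein_to_plasmid[subject]
        if p == q or identity < min_identity:
            continue  # ignore within-plasmid matches and hits below the threshold
        edge = tuple(sorted((p, q)))
        pairs[edge].add(frozenset((query, subject)))  # A-B and B-A count once
    return {edge: len(matches) for edge, matches in pairs.items()}


def split_by_weight(counts, threshold=5):
    """Separate edges with >= threshold shared proteins from those with fewer."""
    heavy = {e: n for e, n in counts.items() if n >= threshold}
    light = {e: n for e, n in counts.items() if n < threshold}
    return heavy, light


# Hypothetical data: P1 and P2 share five proteins at 96% identity
# (reported in both directions); P1 and P3 share one protein at 75%.
hits = [(f"P1_g{i}", f"P2_g{i}", 96.0) for i in range(5)]
hits += [(f"P2_g{i}", f"P1_g{i}", 96.0) for i in range(5)]
hits += [("P1_g0", "P3_g0", 75.0)]
protein_to_plasmid = {name: name.split("_")[0] for h in hits for name in h[:2]}

counts70 = shared_protein_counts(hits, protein_to_plasmid, 70.0)
counts95 = shared_protein_counts(hits, protein_to_plasmid, 95.0)
heavy70, light70 = split_by_weight(counts70)
print(counts70, counts95, heavy70, light70)
```

Running the same two functions at both identity thresholds and on the AR and VF subsets would give the family of networks listed in the text.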
Permutation tests
To evaluate the statistical significance of observed preferential gene flows (see below), we
randomly permuted the phylogenetic affiliation of the nodes 10000 times, while keeping
intact the original degree of each node within the network (randomization with node degree
conservation, see (Brohee et al. 2008)). A p-value was then obtained by counting the number
of times the randomly assembled networks returned a number of links greater (or lower) than
the observed one and dividing this number by the total number of permutations
performed.
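The label-permutation procedure can be sketched as follows. Shuffling the class labels over a fixed set of edges leaves every node's degree untouched, which is one simple way to obtain the degree conservation mentioned above. This is an illustrative sketch rather than the authors' code: the function names, the toy network and the one-sided "greater than or equal" counting rule are assumptions.

```python
import random


def count_links(edges, labels, class_a, class_b):
    """Number of edges joining a class_a node to a class_b node."""
    wanted = {class_a, class_b}
    return sum(1 for u, v in edges if {labels[u], labels[v]} == wanted)


def permutation_p_value(edges, labels, class_a, class_b, n_perm=10000, seed=0):
    """One-sided p-value: fraction of label shuffles with at least the observed link count."""
    rng = random.Random(seed)
    observed = count_links(edges, labels, class_a, class_b)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # permute affiliations; the edge list (and degrees) stay fixed
        shuffled = dict(zip(nodes, values))
        if count_links(edges, shuffled, class_a, class_b) >= observed:
            at_least += 1
    return observed, at_least / n_perm


# Hypothetical toy network: every edge links a "Gamma" plasmid to an "Actino" plasmid.
labels = {f"g{i}": "Gamma" for i in range(4)}
labels.update({f"a{i}": "Actino" for i in range(4)})
edges = [(f"g{i}", f"a{i}") for i in range(4)]

observed, p = permutation_p_value(edges, labels, "Gamma", "Actino", n_perm=2000)
print(observed, p)
```

For depletion rather than enrichment of links, the comparison in the loop would be reversed (`<=` instead of `>=`).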
Estimation of plasmid mobility
Genes related to plasmid mobility were identified by BLAST analysis (with
the following parameters: e-value 1e-20, minimum alignment length 50 aa) of the
plasmid-encoded amino acid sequences against a tra and mob gene dataset retrieved from the ACLAME
database (http://aclame.ulb.ac.be/, (Leplae et al. 2004)). Since tra and mob genes are
generally associated with plasmid mobility and conjugation, we defined a plasmid as mobile if
it contained one or more mob or tra genes [a similar approach was recently adopted by
(Smillie et al. 2010)].
Network centralities, statistics and visualization
Network centrality values for network nodes were calculated using the igraph package in R
(Csardi 2006). Network clustering was estimated using the Louvain algorithm implemented in
Gephi (Blondel et al. 2008) by maximizing modularity and minimizing the number of clusters.
All statistical tests to investigate the differences in degree and betweenness distributions and
GC% content were performed using the base statistics tools in R (R Development Core Team
2010; http://www.r-project.org/). Data plotting was performed using the ggplot2 package of R
(Wickham 2009). All other statistical analyses were performed using in-house developed Perl
and Python scripts. Visualization of network clustering and gene sharing as an ideogram was
performed using Circos (Krzywinski et al. 2009).
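The quantity that Louvain clustering seeks to maximize, modularity, can be computed for any given partition with a few lines of code. The sketch below uses the standard definition for an unweighted, undirected graph, Q = sum over communities of [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c and d_c the summed degree of its nodes. It is not Gephi's implementation (which may additionally use edge weights or a resolution parameter), and the function name and toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected simple graph."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two disjoint triangles.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(edges, two_clusters)
q_merged = modularity(edges, one_cluster)
print(q_split, q_merged)
```

Placing each triangle in its own cluster scores higher than lumping all nodes together, which is the kind of comparison a modularity-maximizing algorithm makes repeatedly.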
Estimation of the phylogenetic distances of gene sharing
The 16S rRNA sequences for plasmid hosts were downloaded from the Ribosomal Database
Project (Cole et al. 2007; Cole et al. 2009). The 16S rRNA sequences were aligned using the
NAST aligner provided by Greengenes (DeSantis et al. 2006). The distance matrix of the
phylogenetic distances was calculated using Phylip (Felsenstein 1989).
Estimation of phylogenetic coherence in major network clusters
The Conclustador algorithm (Leigh et al. 2011) was applied to analyze the congruence of
phylogenetic trees reconstructed from the sequences of the genes shared by plasmids
belonging to the same cluster in a network. Gene families responsible for the connections
among the different plasmids were extracted from the 70% and 95% networks and aligned
using the Muscle software (Edgar 2004). Then, for each plasmid cluster, the resulting multiple
sequence alignments were used as input for phylogenetic coherence analysis, adopting
the Conclustador algorithm (Leigh et al. 2011). Finally, SplitsTree4 (Huson, Bryant 2006) was
used to visualise the phylogenetic information both in each single group identified by
Conclustador and in all the groups at once (which are, together, responsible for the plasmid
interconnections shown in the networks of Figure 1). In both cases, supernetworks were
inferred using data available from single gene phylogenetic analyses performed with the RAxML
tool with 1000 bootstrap replications.
Conclustador works properly only when the analyzed datasets are not too fragmented, i.e.
when about 80% of the overall taxa dataset is present in each multiple alignment; hence, not all
the identified plasmid clusters could be reliably analyzed. Accordingly, only the major
clusters in the 70% and 95% networks were analyzed (namely clusters 961, 993, 1144 and
1238 for the 70% network and 961, 993 and 1144 for the 95% network). Interestingly, the
widespread fragmentation found for most of the clusters in the dataset might be due to a high
heterogeneity of the same clusters that, in turn, might mirror a high level of horizontal transfer
of their embedded genes.
Results and discussion
Gene sharing networks
Gene sharing between plasmids was visualized as a network where the plasmids are
represented as vertices (or nodes) and gene sharing as edges (or links). Altogether 8 networks
were constructed based on 70% and 95% identity between the amino acid sequences and
different edge criteria, such as the number of genes shared (at least 5 or fewer than 5), or
sharing antibiotic resistance or virulence genes (Supplemental Information S6). The
identity-based criterion introduced for defining links allows the resulting networks to be interpreted under a
molecular clock–based assumption, i.e. under the hypothesis that sequences with the highest
percentages of identity (e.g. 95%) were likely to be more recently exchanged than the ones
with less identity (e.g. 70%) [see for example (Halary et al. 2010)]. Data for the networks
accounting for the sharing of 5 or more genes are reported in Figure 1a and b. Overall, the
plasmid network of all sequences at the 70% identity threshold (Figure 1b) exhibits one major
connected component, some minor connected components and a large number of
disconnected plasmids (see below). The main connected component of the network of all
genes (the central one in Figure 1b) embeds plasmids mainly belonging to the Proteobacteria
phylum (particularly from the Gamma, Alpha, and Beta subdivisions). Interestingly, this
component also contains plasmids from Actinobacteria. A similar trend is observed in the
case of the 95% identity threshold network (Figure 1a) although, as might be expected, in this
case the main connected component of the network is smaller. The only phylogenetically
uniform major component is represented by plasmids from Borrelia burgdorferi
(Spirochaetes, yellow nodes of Figure 1a and b).
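The decomposition described above (one major connected component, some minor ones, and disconnected plasmids) corresponds to a standard connected-components computation. A minimal sketch with a breadth-first search follows; the function name and the seven-plasmid toy network are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:  # breadth-first traversal from the start node
            v = queue.popleft()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy network: a four-plasmid chain, a pair, and one isolated plasmid.
nodes = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p5", "p6")]

components = connected_components(nodes, edges)
sizes = [len(c) for c in components]
disconnected = [c[0] for c in components if len(c) == 1]
print(sizes, disconnected)
```

Passing the full node list, not just the nodes that appear in edges, is what allows plasmids without any retained edge to be reported as singletons.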
In order to investigate the relationships between the taxonomy of the represented microorganisms
and the evolutionary interconnections of their plasmids, we performed network clustering
using the Louvain algorithm implemented in Gephi [see Methods, (Blondel et al. 2008)] and
compared the obtained plasmid groups with the phylogenetic and habitat affiliations of their
host cells. The network clusters embedding multiple phyla and/or habitats for the 70%
and 95% networks of all sequences are presented in Figure 2. According to the network
clustering analysis, the network clusters more typically embed members from different
habitats than from different phylogenetic orders. Hence, it appears that phylogenetic distance
is a greater barrier to gene sharing than having a different habitat. This is likely due to limited
horizontal gene transfer across phylogenetic classes that could result from, for example,
restriction or incompatible replication systems [as reviewed in (Thomas, Nielsen 2005)].
Moreover, these observations are consistent with findings from microbial ecology and
previous in silico analyses (Baquero, Martinez, Canton 2008; Fondi, Fani 2010) and suggest
that there is a (more or less) high degree of mixing of microbes between unrelated
environments.
Gene sharing across phylogenetic classes implies at least one past HGT event, and is therefore
simple to detect. However, HGT could also be commonplace within phylogenetic classes. To
investigate this, all the major network clusters (including those reported in Figure 2) were
analyzed using the Conclustador package to infer phylogenetically congruent and incongruent
gene families. Overall, the data obtained (provided as Supplemental Information S5) revealed a
high level of incongruence among the analyzed clusters. Indeed, in the 70% network Conclustador identified 8, 4,
2 and 3 different groups within the major plasmid clusters 961, 993, 1144 and 1238,
respectively. Similarly, in the 95% network 6, 4 and 2 distinct phylogenetic groups were retrieved for clusters 961,
993 and 1144. The construction of phylogenetic networks of the sequences
embedded in the groups identified by Conclustador revealed, in most cases, high levels of
inter-species reticulation. Overall, these data suggest the presence of potentially abundant HGT
at lower taxonomical levels than those reported in Figures 1 and 2.
Furthermore, in order to shed some light on the putative functions encoded by the shared
genes, we performed a COG-based functional annotation of the sequences embedded in each
plasmid cluster. The data obtained (also reported in Supplemental material S5) revealed that most
of the sequences responsible for the plasmid interconnections encode proteins involved in
DNA transposition and recombination. This is not surprising since these functions are
strongly linked to the process of HGT and, consequently, to plasmids. Nevertheless, as shown
in Supplemental material S5, other genes are shared among the different plasmids embedded
in the same cluster and, importantly, their encoded functions are not directly related to the
process of HGT itself. This suggests that other functions, probably related to more complex
phenotypes, are shared by the different plasmids, including for example genes involved in
transcription, inorganic ion transport and metabolism, and cell motility (the three most
abundant functional categories of plasmid cluster 961, see Supplemental material S5).
To study the sharing of resistance and virulence genes, the same procedure of network
construction was applied to the antibiotic resistance and virulence factor sequence datasets.
Results of these analyses for networks built with the 70% identity criterion are shown in Supplemental
material S6. Overall, the topology of both networks appeared to be similar to that of the 70% and 95%
networks of all sequences, although some differences can be identified. Indeed, concerning
the antibiotic resistance network, the Proteobacterial plasmids do not form a single
component, but two different major components can now be identified, one embedding
Gammaproteobacterial and Actinobacterial plasmids and the other one embedding Beta- and
Alphaproteobacterial sequences. This suggests that plasmids belonging to the latter taxonomic
units are not preferential transfer partners of antibiotic resistance genes for
Gammaproteobacteria representatives. Conversely, in the virulence network, Proteobacterial
plasmids form the major connected component of the graph (Supplemental material S6),
revealing an intense sharing of virulence-related genes among microorganisms belonging to
this taxonomic unit. Although some remarkable exceptions of plasmids acting as bridges
connecting otherwise separate groups do exist (see below), the other clusters of the virulence
network are overall coherent with the phylogenetic class affiliation (although intense gene
sharing might be present within these groups of plasmids, as shown by the previous phylogenetic
coherence analysis).
Network features and taxonomy
In order to globally analyze the evolutionary relationships underlying the plasmid
populations, we applied graph theory measures to the gene sharing networks. In particular, the
networks were analyzed for node degree and betweenness. Degree is defined as the number of
connections a node has to other nodes. In the present context, a plasmid with a high degree is
a plasmid that shares genes with a large number of other plasmids. Betweenness is a centrality
measure that is defined as the frequency with which a node lies on the shortest path between two
other network nodes. In this context a plasmid with a high betweenness can transfer genes to
many other plasmids in the network with a low number of gene transfer events and, in other
words, can function as a bridge between otherwise disconnected regions of the network.
Accordingly, we computed centrality measures across the network for all the classes of
prokaryotes present in the dataset. Results are provided in Figure 3, whose analysis revealed a
positive correlation between degree and betweenness that has also been observed by Halary et
al. (2010). However, in the network some nodes showed a much higher betweenness than most
nodes of the same degree (see below). Such outliers, characterized by a low degree but a high
betweenness, are especially important in any given network, as they can be seen as bridges
between smaller, more connected parts of the network (Halary et al. 2010).
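The two measures defined above, and the notion of a low-degree, high-betweenness bridge, can be illustrated with a small self-contained sketch. The study itself computed centralities with igraph in R; the code below instead implements Brandes' algorithm for unweighted graphs in plain Python, reports raw (unnormalized) betweenness, and uses a hypothetical seven-node network in which node "g" joins two triangles.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def betweenness(adj):
    """Raw shortest-path betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


# Hypothetical network: triangles (a, b, c) and (d, e, f) joined only through g.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "g"), ("g", "d")]
adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
print(deg["g"], btw["g"], deg["a"], btw["a"])
```

Node "g" has the same degree as "a" but lies on every shortest path between the two triangles, so it is the kind of outlier the paragraph above describes.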
Tables 1 and 2 report the highest degree and betweenness values, respectively, for individual
plasmids in the 70% and 95% identity networks of all sequences. The analysis of Table 1
reveals that all the plasmids possessing the highest values of degree belong to the Gamma
subdivision of Proteobacteria. This result can be easily explained by the oversampling of
plasmids from this class of bacteria. Indeed, the plasmid data used in this study were
gathered unsystematically from several unrelated sources and are highly biased toward human
pathogenic organisms (mostly Gammaproteobacteria) (Wu et al. 2009). In this context, it is
likely that more detailed studies of individual environments would reveal several gene sharing
events between various phylogenetic groups that are not represented in the current data set.
Nevertheless, a detailed inspection of high-degree plasmids gave further support to previous
observations based on single plasmid sequence data. For example, plasmid pU302L
(see Table 1) from Salmonella enterica subsp. enterica serovar Typhimurium has already been
described as possessing a mosaic pattern of sequence homology with other plasmids (Chen et
al. 2007), suggesting, in turn, that this plasmid acquired resistance genes from a variety of
enteric bacteria (Chen et al. 2007). Notably, the fact that this plasmid has the highest degree
in the 95% network indicates that it acquired foreign genetic material from
very closely related microorganisms and/or very recently in time. Similarly, most of the other
plasmids listed in Table 1 possess a well documented history of HGT events [see for
example p1658/97 (Zienkiewicz et al. 2007; Yi et al. 2010) and pKF3-140 (Yi et al. 2010)].
High betweenness nodes (plasmids) span a larger taxonomic spectrum, suggesting that
this centrality measure is less affected by sampling biases. Indeed the plasmids with the highest
betweenness values belong to diverse phylogenetic classes, including Bacilli, Lactobacilli and
Gamma-, Beta-, and Alphaproteobacterial representatives. As in the case of high degree
plasmids, the mosaic-like structure of high-betweenness plasmids has been described before, for
example for pCoo from Escherichia coli (Froehlich et al. 2005) and pGO1 from
Staphylococcus aureus (Caryl, O'Neill 2009). Hence, although the overall plasmid clustering
seems to agree with the taxonomic classification of their source microorganisms, some plasmids
compact the overall network, residing in the path between plasmids that otherwise would
remain disconnected (Halary et al. 2010). Importantly, some of the plasmids that were found
to possess high degree/betweenness values (Tables 1 and 2) were the same that were found to
be central in other gene sharing network analyses performed by Halary et al. (2010) (namely,
plasmids pOU7519, pU302L from Salmonella representatives, p1658/97, pIP1206 from
Escherichia coli, pKPN5 from Klebsiella pneumoniae, pVEF3 from Enterococcus faecium,
pSK41 from Staphylococcus aureus, pGdh442 from Lactococcus lactis and pTEF1 from
Enterococcus faecalis V583), thus confirming the key role of these DNA molecules in the
flow of genetic material among different microorganisms. In our opinion these plasmids
represent key players from an evolutionary viewpoint, contributing to the spreading of
potentially clinically relevant genetic determinants within the whole bacterial mobilome.
Several plasmids (1159 for the 70% identity network of all genes and 1369 for the 95%
identity network) in the data set shared fewer than five genes with any other plasmid and
therefore did not belong to any connected component. The taxonomic composition of this
disconnected portion of the network is presented in Figure 4. Statistical randomization
testing (as described in Methods) was performed to evaluate the effect of sampling bias on the
frequency distribution. Most of the phylogenetic classes possessed between 2% and 5% of
disconnected plasmids, the only exception being represented by Gammaproteobacteria
(almost 15% of disconnected plasmids). For most classes the number of disconnected
plasmids was higher than expected by random shuffling of the networks.
Dynamics of genes in the plasmid population
In the previous sections we mainly analyzed networks in which two plasmids were connected
if they shared at least five genes, which certainly underestimates the real amount of gene
transfer among plasmids. To analyze the possible dynamics of gene transfer among plasmids
in greater detail, we built gene sharing networks that take into
account the sharing of one to four genes between two given plasmids. Such
networks were constructed with the same computational strategy used for the >= 5 networks
(see Methods) and are reported in Supplemental Material S7, together with the taxonomic
distribution of singlets and the cross-taxa interconnections. Overall, the < 5 networks embedded
a number of links (11458 and 5136 for the 70% and 95% identity thresholds,
respectively) comparable to that of the >= 5 networks (12444 and 6777 for the 70% and 95% identity
thresholds, respectively), suggesting extensive exchange of single genes (or of
relatively small gene sets) among the different plasmids.
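The distinction between the >= 5 and < 5 networks can be sketched as follows. This is a minimal illustration, assuming that each plasmid has already been reduced to a set of gene-family identifiers at a given identity threshold; the function names, the input dictionary and the toy family identifiers are hypothetical and do not reproduce the pipeline described in Methods.

```python
from itertools import combinations


def gene_sharing_links(plasmid_genes, min_shared=1, max_shared=None):
    """Return {(plasmid_a, plasmid_b): n_shared} for pairs within the bounds.

    plasmid_genes maps a plasmid name to the set of gene-family identifiers
    it carries (an assumed, pre-computed input).
    """
    links = {}
    for a, b in combinations(sorted(plasmid_genes), 2):
        n_shared = len(plasmid_genes[a] & plasmid_genes[b])
        if n_shared < min_shared:
            continue
        if max_shared is not None and n_shared > max_shared:
            continue
        links[(a, b)] = n_shared
    return links


def singlets(plasmid_genes, links):
    """Plasmids that take part in no link of the given network."""
    linked = {p for pair in links for p in pair}
    return sorted(set(plasmid_genes) - linked)


# Hypothetical toy data: integers stand for gene families.
toy_plasmids = {
    "pA": {1, 2, 3, 4, 5, 6},
    "pB": {1, 2, 3, 4, 5, 9},
    "pC": {1, 20},
}
high = gene_sharing_links(toy_plasmids, min_shared=5)               # ">= 5" network
low = gene_sharing_links(toy_plasmids, min_shared=1, max_shared=4)  # "< 5" network
```

Here `pC` is a singlet in the `high` network but is linked to both other plasmids in the `low` network, which mirrors how lowering the threshold brings single-gene exchanges into view.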
Louvain clustering of the < 5 networks, although producing a large fraction of taxonomically
highly coherent groups, resulted in slightly more heterogeneous plasmid clustering than
the clustering obtained from the >= 5 networks (Figure 2b and c). This suggests that
the transfer of single genes or small groups of genes bypasses taxonomic barriers
more frequently than the movement of larger sets of genes. In agreement with the
previous congruency analysis, a deeper analysis of the phylogenetic coherence (adopting the
coherence analysis pipeline described in Methods) of the gene families within the major
network clusters revealed a high amount of incongruency (data not shown). Hence, according
to the overall body of data presented here, it appears that the sharing of relatively small gene
sets is more abundant and spans larger phylogenetic distances than transfers of larger
sets of genes, although most of this genetic exchange still happens within the
boundaries of microbial phylogenetic classes.
Network comparison
To explore the differences among the networks, we computed Pearson product-moment
correlation coefficients between the betweenness and degree values of each node (i.e. plasmid)
(Figure 5). The data revealed a low positive correlation between betweenness and degree
in each of the networks, independently of the nucleic acid identity threshold and/or the
functions shared among the different plasmids (virulence or antibiotic resistance genes). R2
values range between 0.25 and 0.36 for the 70% networks and are slightly higher for the 95%
networks (ranging from 0.29 to 0.44). Accordingly, node degree does not explain all the
variation in node betweenness, regardless of the timing of the gene transfer(s) (70% vs. 95%
thresholds) and/or the functions that are transferred (virulence vs. antibiotic resistance
determinants); the values are most likely determined by the mobile nature of plasmids
themselves.
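As a reminder of how such R2 values are obtained, the sketch below computes the Pearson product-moment coefficient r for paired degree and betweenness values and squares it. The function name and the toy numbers are assumptions for illustration; they are not data from the networks analysed here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values: one node has a high betweenness that its degree
# only partly accounts for.
toy_degree = [1, 2, 3, 4]
toy_betweenness = [0, 0, 0, 10]
r_squared = pearson_r(toy_degree, toy_betweenness) ** 2
```

An R2 well below 1, as in this toy case, means that degree alone leaves a large share of the variation in betweenness unexplained.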
Analysis of mobilizable and conjugative plasmids
Conjugative plasmids have been defined as “vessels” of the communal gene pool (Norman,
Hansen, Sorensen 2009). Indeed, this class of plasmids possesses the ability to “visit”
different cells and, in principle, to undergo genetic rearrangements (such as homologous
recombination) with other plasmids and/or other informative molecules (phage genomes and
chromosomes). For this reason, conjugative plasmids might be expected to occupy a more
central position within the overall plasmid gene sharing network than those that are
not mobilizable. To test this hypothesis, all the tra- and mob-like sequences of the plasmids
were eliminated from the networks and the centrality measures of conjugative/mobilizable
plasmids were evaluated. Plasmid mobility was estimated from the number of mob
and tra genes that each plasmid harbors (an approach similar to that adopted by Smillie et al. (2010) and
described in Methods). The relationship between mobility and the network
measures was investigated by comparing the distributions of the centrality measures between
mobile and non-mobile plasmids. These distributions are presented in
Figure 6; the centrality measures are significantly higher for mobilizable plasmids in the networks
of all genes and of resistance genes (p-values from Mann-Whitney tests are given in Figure 6).
Therefore, the presence of mob or tra genes significantly promotes the gene sharing measures
in the networks of all genes and antibiotic resistance genes. This suggests that plasmid
mobility is an important mechanism for spreading various genetic traits within the plasmid
community, including antibiotic resistance genes. This fully agrees with the central role
inferred for conjugative plasmids in the context of bacterial evolution (Norman, Hansen,
Sorensen 2009) and gives further support to the idea that these particular plasmids act as
vessels of the communal gene pool. It also indicates that the high incidence of high degree
and betweenness values in certain phylogenetic classes (such as Gammaproteobacteria) does
not result only from their over-representation in the current data set but is also affected by
the genetic properties of their plasmids.
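The comparison of centrality distributions between mobile and non-mobile plasmids relies on the Mann-Whitney test. The sketch below is a minimal illustration of that test, using midranks for ties and a two-sided p-value from the normal approximation without continuity correction; statistical packages may instead use exact p-values for small samples, so results can differ slightly. The function name and the toy values are assumptions, not data from Figure 6.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic of sample x, two-sided p-value).

    Uses midranks for ties and the normal approximation with tie correction,
    which is only appropriate for reasonably large samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2.0))


# Hypothetical toy degrees for mobile and non-mobile plasmids.
mobile_degree = [5, 6, 7]
non_mobile_degree = [1, 2, 3]
u_stat, p_value = mann_whitney_u(mobile_degree, non_mobile_degree)
```

With every mobile value above every non-mobile value, U reaches its maximum of n1 * n2; with real data the samples overlap and U lies between 0 and that maximum.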
Gene sharing over phylogenetic classes
The importance of plasmids within the complex microbial evolutionary network also resides
in their capability to connect microbes separated by a (more or less) long phylogenetic distance
and to overcome the various barriers to horizontal gene transfer (Thomas, Nielsen 2005). The
occurrence of gene sharing over phylogenetic classes was enumerated and is visualized in
Figure 7.
Interestingly, some connections in the network span very large phylogenetic distances.
For example, we found connections linking Alphaproteobacteria and Cyanobacteria, in
particular plasmid pCC7120beta from Nostoc sp. PCC 7120 with plasmid pBBta01 from
Bradyrhizobium sp. BTAi1, and pCC7120gamma from Nostoc sp. PCC 7120 with plasmid
pNGR234b from Sinorhizobium fredii NGR234. These connections suggest the presence of
HGT among microorganisms inhabiting very different ecological niches (multiple and
host-associated for Cyanobacteria and Alphaproteobacteria, respectively), involving genes linked
to important functions such as copper transport and transcriptional regulation, respectively.
Remarkably, inter-kingdom transfers (involving chemotaxis-related genes) were also
observed: this is the case, for example, of the connections linking plasmid pH308197_258 from
Bacillus cereus H3081.97 to plasmid pHmuk01 from Halomicrobium mukohataei DSM
12286. In this case too, the microorganisms belong to likely unrelated habitats (multiple and
specialized, respectively).
However, because the number of inter-class connections is likely strongly affected by
sampling biases, we investigated the significance of the observed
inter-class connections by random permutation of the original network, as
described in Methods. In the 70% identity network, inter-class links included connections
between more closely related microorganisms (e.g. connections between Alpha-, Beta- and
Gammaproteobacteria and between Bacilli and Lactobacilli) as well as connections between
more distantly related microorganisms (i.e. Actinobacteria and Betaproteobacteria,
Actinobacteria and Gammaproteobacteria, Alphaproteobacteria and Deinococci). However,
some closely related microorganisms possessed fewer connections than expected
by chance (e.g. Alphaproteobacteria and Gammaproteobacteria, p-value<1e-4),
possibly indicating a genetic incompatibility between these groups (Thomas, Nielsen 2005). As
might be expected, in the 95% network the number of observed connections
decreased and mainly closely related taxonomic groups remained interconnected:
Bacilli-Lactobacilli and Betaproteobacteria-Gammaproteobacteria were over-represented
(p-value<1e-4), whereas Alphaproteobacteria-Gammaproteobacteria and Bacilli-Gammaproteobacteria
were under-represented (p-value<1e-4). Notably, the connection between the distantly related
Gammaproteobacteria and Actinobacteria also remained strong.
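A permutation test of this kind can be sketched as follows. The exact randomization scheme is the one given in Methods; the version below is only an assumed null model in which the class labels are shuffled among the plasmids while the links stay fixed. The function names, the toy network and the class labels are hypothetical.

```python
import random


def count_links(edges, labels, class_a, class_b):
    """Number of links whose two endpoints carry exactly the classes asked for."""
    wanted = {class_a, class_b}
    return sum(1 for u, v in edges if {labels[u], labels[v]} == wanted)


def permutation_test(edges, labels, class_a, class_b, n_perm=10000, seed=0):
    """Return (observed count, p for over-representation, p for under-representation).

    Assumed null model: class labels are shuffled among nodes, topology is kept.
    """
    rng = random.Random(seed)
    observed = count_links(edges, labels, class_a, class_b)
    nodes = list(labels)
    classes = [labels[node] for node in nodes]
    at_least = 0
    at_most = 0
    for _ in range(n_perm):
        rng.shuffle(classes)
        shuffled = dict(zip(nodes, classes))
        count = count_links(edges, shuffled, class_a, class_b)
        if count >= observed:
            at_least += 1
        if count <= observed:
            at_most += 1
    # Add-one correction keeps the empirical p-values strictly positive.
    p_over = (at_least + 1) / (n_perm + 1)
    p_under = (at_most + 1) / (n_perm + 1)
    return observed, p_over, p_under


# Hypothetical toy network: every "Gamma" plasmid linked to every "Actino" plasmid.
toy_labels = {"g%d" % i: "Gamma" for i in range(4)}
toy_labels.update({"a%d" % i: "Actino" for i in range(4)})
toy_edges = [("g%d" % i, "a%d" % j) for i in range(4) for j in range(4)]
obs, p_over, p_under = permutation_test(
    toy_edges, toy_labels, "Gamma", "Actino", n_perm=2000, seed=0
)
```

A small `p_over` marks a class pair as sharing more links than the shuffled labels would produce, and a small `p_under` marks it as sharing fewer; the smallest p-value that can be reported is bounded by 1 / (n_perm + 1).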
As noted in the case of gene transfers among phylogenetically incoherent groups (see
Supplemental Material S5), the majority of shared genes code for functions that are related to
the process of HGT itself and generally belong to the L category of the COG annotation (Figure 8).
Nevertheless, other functions are also exchanged, as indicated by gene sharing (Figure 8),
underlining the key role of plasmids in spreading important biological traits throughout the
whole microbial world.
Gene transfer between Actinobacteridae and Gammaproteobacteria
According to the results presented in Figure 7, the gene sharing between Actinobacteria and
Gammaproteobacteria spans one of the longest phylogenetic distances within our
networks (Supplemental Information S8) and appears to be crucial in transferring antibiotic
resistance genes. Furthermore, most of the shared genes are at least 95% similar and
therefore, according to the molecular clock hypothesis, the transfer between these classes has
occurred recently. For this reason, we further analyzed this apparently preferential gene
flow.
To better characterize the gene sharing between Actinobacteria and Gammaproteobacteria, we
selected representative plasmids with a high number of genes shared between
Gammaproteobacteria and Actinobacteria and visualized them as circular ideograms showing
resistance-, conjugation- and transposition-related genes and gene sharing events (Figure 9).
The analysis of Figure 9 revealed that the antibiotic resistance genes move between the
plasmids by transposition, as most of the links connecting Actinobacteria and
Gammaproteobacteria fall in plasmid regions embedding antibiotic resistance and/or
transposition-related genes. These results indicate the presence of a clinically important gene
flow between representatives of these microbial groups, although they do not indicate the
direction of these gene transfers (i.e. from Actinobacteria to Gammaproteobacteria or
vice versa). To shed some light on this point, we investigated the composition of the involved
plasmids under the assumption that, if the HGT events are recent (as suggested by the high
amino acid identity), the transferred genes are expected to have a GC content closer to that of the
donor plasmid than to that of the recipient (Karlin 2001). Hence, the GC content of the
Actinobacterial and Gammaproteobacterial plasmids and genes was calculated and compared
(Supplemental Information S9). The Actinobacterial plasmid GC content (mean GC fraction 0.56 from 7
plasmids) was significantly higher (p-value = 9.4e-3 according to a Mann-Whitney test) than
the Gammaproteobacterial GC content (mean GC fraction 0.51 from 95 plasmids). Moreover, GC
contents were calculated for the individual transferred genes and compared to those of the plasmids.
According to Mann-Whitney tests, the transferred genes have a significantly different GC
content from the Gammaproteobacterial plasmids (p = 7.0e-15) but are not significantly
different from the Actinobacterial plasmids (p = 0.42). Accordingly, the whole body of data
presented in this section suggests that the direction of gene transfer is very likely from
Actinobacteria to Gammaproteobacteria. This is consistent with the knowledge that some
Actinobacteria are natural producers of antibiotic compounds and, therefore, a potential
source of antibiotic resistance genes for human pathogens (Wright 2007; Miao, Davies 2010).
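The GC-content reasoning can be illustrated with a minimal sketch that computes the GC fraction of a sequence and reports which of two groups of plasmids has the closer mean GC content. This is a simplification made for the example: the study compared distributions with Mann-Whitney tests, whereas the nearest-mean rule below is only a hypothetical shortcut, and all sequences and function names are invented.

```python
def gc_content(seq):
    """GC fraction of a nucleotide sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


def mean_gc(seqs):
    """Mean GC fraction over a collection of sequences."""
    values = [gc_content(s) for s in seqs]
    return sum(values) / len(values)


def closer_group(gene_seq, group_a, group_b):
    """Return 'a' or 'b' depending on which group mean GC is nearer to the gene."""
    gene_gc = gc_content(gene_seq)
    dist_a = abs(gene_gc - mean_gc(group_a))
    dist_b = abs(gene_gc - mean_gc(group_b))
    return "a" if dist_a <= dist_b else "b"


# Hypothetical toy sequences standing in for plasmids and a transferred gene.
high_gc_plasmids = ["GGCCGGCCAT", "GCGCGCGCAA"]  # GC fraction 0.8 each
low_gc_plasmids = ["ATATATGCAT", "AATTAATTGC"]   # GC fraction 0.2 each
transferred_gene = "GGCCGCATGC"                  # GC fraction 0.8
likely_donor = closer_group(transferred_gene, high_gc_plasmids, low_gc_plasmids)
```

Under the assumption stated in the text, a recently transferred gene that still resembles the high-GC group points to that group as the more likely donor.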
Conclusions
The use of gene sharing networks as a tool to investigate microbial evolutionary relationships
is rapidly expanding, especially for studying the non-tree-like structures that can sometimes
arise in evolution (Dagan, Artzy-Randrup, Martin 2008; Halary et al. 2010). The power of
such an approach is demonstrated here by revealing the relationships between biological
properties (e.g. plasmid mobility) and network properties (e.g. plasmid centrality) in the gene
sharing network. Moreover, the approach applied here also revealed extensive antibiotic
resistance gene sharing between Actinobacterial and Gammaproteobacterial plasmids,
suggesting a potential source of the antibiotic resistance genes that might have led to the recent
emergence of antibiotic multiresistance in pathogenic organisms.
The plasmid sequences analysed in this study were gathered in a non-systematic manner from
different sequencing projects; their sampling is therefore random and likely biased towards
human pathogenic organisms. The bioinformatic workflow described here would be best
suited to single genomic sequence data sets obtained from specifically selected
environments. We expect such data sets to become available as DNA sequencing costs
decrease and genome sequencing from single cells becomes a routine approach
(Stepanauskas, Sieracki 2007; Rodrigue et al. 2009). The proposed approach could then be
used to investigate whether the functional categories of transferred genes reflect the
different selective patterns present in the given environment(s). Therefore, obtaining single
genome data sets from multiple different environments would permit evaluation and
comparison of gene sharing patterns in response to different environmental conditions.
Table 1
Individual plasmids with highest degree measures observed in the gene sharing networks of
all genes.
Accession Number | Microorganism | Plasmid name | Degree | N. of tra/mob genes | Conjugative (c) or mobilizable (m)
70% Network
NC_010119 | Salmonella enterica subsp. enterica serovar Choleraesuis | pOU7519 | 268 | 17 | c
NC_006856 | Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 254 | 17 | c
NC_011964 | Escherichia coli | pAPEC-O103-ColBM | 253 | 8 | c
NC_013951 | Klebsiella pneumoniae | pKF3-140 | 247 | 9 | c
NC_013728 | Escherichia coli O26:H- | pO26-CRL | 243 | 21 | c
NC_010488 | Escherichia coli SMS-3-5 | pSMS35_130 | 242 | 13 | c
NC_011092 | Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 241 | 17 | c
NC_006816 | Salmonella enterica subsp. enterica serovar Typhimurium | pU302L | 240 | 17 | c
NC_013122 | Escherichia coli | pEK499 | 231 | 15 | c
NC_013437 | Salmonella enterica subsp. enterica serovar Typhimurium | pSLT-BT | 225 | 4 | c
95% Network
NC_006816 | Salmonella enterica subsp. enterica serovar Typhimurium | pU302S | 192 | 16 | c
NC_010488 | Escherichia coli SMS-3-5 | pSMS35_130 | 188 | 13 | c
NC_006856 | Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 187 | 17 | c
NC_013951 | Klebsiella pneumoniae | pKF3-140 | 186 | 9 | c
NC_011964 | Escherichia coli | pAPEC-O103-ColBM | 184 | 8 | c
NC_010119 | Salmonella enterica subsp. enterica serovar Choleraesuis | pOU7519 | 171 | 17 |
NC_013728 | Escherichia coli O26:H- | pO26-CRL | 168 | 21 | c
NC_013122 | Escherichia coli | pEK499 | 166 | 15 | c
NC_011092 | Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 165 | 17 | c
NC_004998 | Escherichia coli | p1658/97 | 157 | 11 | c
Table 2
Individual plasmids with highest betweenness measures observed in the gene sharing
networks of all genes.
Accession Number | Microorganism | Plasmid name | Betweenness | N. of tra/mob genes | Conjugative (c) or mobilizable (m)
70% Network
NC_007635 | Escherichia coli | pCoo | 8050 | 10 | c/m
NC_006663 | Staphylococcus epidermidis RP62A | pSERP | 6329 | 3 | m
NC_007974 | Cupriavidus metallidurans CH34 | megaplasmid | 6067 | 14 | c
NC_011092 | Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 5800 | 17 | c
NC_010558 | Escherichia coli 1520 | pIP1206 | 5750 | 16 | c
NC_009651 | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | pKPN5 | 5641 | 11 |
NC_011339 | Bacillus cereus H3081.97 | pH308197_258 | 5507 | 2 | m
NC_011655 | Bacillus cereus AH187 | pAH187_270 | 5330 | 7 | c/m
NC_012586 | Rhizobium sp. NGR234 | pNGR234b | 5271 | 88 | c
NC_010980 | Enterococcus faecium | pVEF3 | 4700 | 4 | m
95% Network
NC_011092 | Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 38781 | 17 | c
NC_005024 | Staphylococcus aureus | pSK41 | 29020 | 7 | c
NC_012547 | Staphylococcus aureus | pGO1 | 29020 | 9 | c
NC_010378 | Escherichia coli | pOLA52 | 21221 | 3 | c
NC_005054 | Staphylococcus aureus | pLW043 | 19209 | 6 | c
NC_009435 | Lactococcus lactis | pGdh442 | 18216 | 7 | m
NC_004669 | Enterococcus faecalis V583 | pTEF1 | 15617 | 8 |
NC_008381 | Rhizobium leguminosarum bv. viciae 3841 | pRL10 | 15030 | 27 | c
NC_013121 | Escherichia coli | pEK516 | 13724 | 11 | c
NC_005327 | Escherichia coli | pC15-1a | 13073 | 9 | c
NC_011996 | Macrococcus caseolyticus JCSC5402 | pMCCL2 | 12981 | 4 | m
Figure 1:
The gene sharing between plasmids presented as matrices (A) and networks (B) at both 70%
and 95% criteria. In network figures, plasmids are represented by the nodes (node size is
proportional to the plasmid size) and the shared genes by the links. At least five shared genes
are required to establish a link.
[Figure 1 panels: A) 70% network, 95% network; B) 70% network, 95% network. Legend: Actinobacteria, Alphaproteobacteria, Gammaproteobacteria, Bacilli, Lactobacilli, Spirochaetes, Betaproteobacteria.]
Figure 2:
The major phylogenetic groups, their habitats and their clustering in A) the 70% and the 95%
networks for >= 5 networks and in 70% and the 95% networks for < 5 networks, B) and C),
respectively. The clusters that were subjected to Conclustador analysis have been indicated. In
D) the amount of inter- and intraphylum and inter- and intraclass clustering in the networks is
reported for both < 5 (low) and >= 5 (high) networks. The clustering of the network has been
determined using the Louvain algorithm implemented in Gephi (see Methods).
[Figure 2 panels A) to D). Legend: Acaryochloris, Actinobacteria, Alphaproteobacteria, Bacilli, Betaproteobacteria, Chroobacteria, Deinococci, Deltaproteobacteria, Gammaproteobacteria, Halobacteria, Lactobacilli. Habitat codes: M - Multiple habitats, T - Terrestrial, A - Aquatic, H - Host-associated, S - Specialized. Labelled clusters: Cluster 993, Cluster 961, Cluster 1144. Panel D: counts (axis 0 to 7000) for the High and Low networks at 70% and 95%, split into Inter-class Inter-habitat, Inter-class Intra-habitat, Intra-class Inter-habitat and Intra-class Intra-habitat.]
Figure 3:
Dependency of plasmid betweenness from plasmid degree for different phylogenetic classes
according to Pearson’s product moment correlation coefficient.
[Figure 3: scatter plots of Betweenness (y axis) against Degree (x axis) for the 70% and 95% networks, one panel per class: Actinobacteria, Alphaproteobacteria, Bacilli, Betaproteobacteria, Gammaproteobacteria, Lactobacilli, Spirochaetes, Unassigned. Panel annotations in extraction order: r2 = 0.35, p-value < 2.2e-16; r2 = 0.40, p-value < 2.2e-16; r2 = 0.46, p-value < 2.2e-16; r2 = 0.52, p-value < 2.2e-16; r2 = 0.39, p-value < 2.2e-16; r2 = 0.28, p-value = 8.9e-16; r2 = 0.18, p-value = 6.6e-14; r2 = 0.35, p-value = 1.8e-6; r2 = 0.81, p-value < 2.2e-16; r2 = 0.17, p-value = 1.4e-10; r2 = 0.36, p-value < 2.2e-16; r2 = 0.67, p-value < 2.2e-16; r2 = 0.13, p-value < 2.2e-16; r2 = 0.33, p-value < 2.2e-16; r2 = 0.1, p-value = 8.2e-2; r2 = 0.41, p-value = 9.0e-8.]
Figure 4:
The phylogenetic class distribution of the disconnected plasmids in the data set. A plus sign
(+) is used to mark the inter-class transfers that were more abundant than expected by random
assignment of the transfer events between plasmids (permutation test, p-value < 1e-4). A
minus sign (-) is used to mark the inter-class transfers that were less abundant than expected
by random assignment of the transfer events between plasmids (permutation test, p-value <
1e-4).
[Figure 4: Percentage (axis ticks 5%, 10%, 15%) per class in three panels (General, Resistance, Virulence) at 70% and 95% identity. Classes: Acaryochloris, Actinobacteridae, Alphaproteobacteria, Aquificales, Archaeoglobi, Bacillales, Bacteria, Bacteroidia, Bangiophyceae, Betaproteobacteria, Chlamydiales, Chlorobia, Chroococcales, Clostridia, Cytophagia, Deferribacterales, Deinococci, Deltaproteobacteria, Dikarya, Epsilonproteobacteria, Erysipelotrichi, Flavobacteria, Florideophyceae, Fusobacteriales, Gammaproteobacteria, Halobacteria, Herpetosiphonales, Lactobacillales, Methanobacteria, Methanococci, Methanomicrobia, Mollicutes, Mycetozoa, Nitrospirales, Nostocales, Oscillatoriales, Planctomycetacia, Satellite Nucleic Acids, Schizopyrenida, Sphingobacteria, Spirochaetales, Streptophyta, Thermococci, Thermomicrobiales, Thermoplasmata, Thermoprotei, Thermotogales, Unassigned.]
Figure 5.
Dependency of plasmid betweenness from plasmid degree for the major networks built in this
work according to Pearson’s product moment correlation coefficient. Networks of < 5 and >=
5 connections are indicated as Low and High, respectively.
[Figure 5: scatter plots of Betweenness (y axis) against Degree (x axis); panels High and Low.]
!!!!!! !!! !
!
!!!! !! !
!
!!!
!!
!!!!
! !! !!! !!!!!!!! !!!
!!!!! !
!
! !!
! !!!
!! !! ! !!!
!! !!!!
!
!
!
!
!!
!
!
!
!!
!
!
! !!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!!
!
!!!!!!!!! ! ! !!! ! !!! ! !!!!!!!!!!!! !!!!!!! ! !!!!!! !!!!!!!!!!! !!!!!!!!!! !!!!!!!!!!! !! !!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! !!!! !!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20 40 60 80 100
Resistance
!!!! !! ! !!! !! !!!! !!! !! ! !!! !! !! !!!!! !!! ! !!! !! !!!! !!!! !!
!
!!!! !!!!!
!!!
!
!!
!
!!
!
!
! !!!!
! !!!
!!
!
!
! ! !! !
!! !!
!
!! !!
! !!! !!! !!! !!
! !
!
!
!
!!!! !!!!! !!!!!!!!
!
! !!!!
!
!!!!!!!!!! !! !!! !!!! !!! !! !!!!! !!! !!! !
!! ! !!!! !!!!!
!
!!
!!
!!!! ! !!!!!
!!!! !
!
! !!! !!! ! !!!
!
!
!! !!!
!
!! !!! !!
!
!!!!
!
!!
!!!
!
!! !! !!!!!
!!! !!
!
!
!
! !! !!! ! !!!! !!
!
!
!
!! !! !! !!
! !!
! !!!!
!
!! !! !!!
! !! !!
!
!
!
!
!!! !! ! !! !
!!
!!
!
!!! !!!! ! !! !!!!!
!!! !!
!! ! ! !!! !!!!! ! !! !!!!! !!!! !! !!!!!!!!!!! !!!!! !!!!!!!!!!!!!!!!!!!!!
!!!!!! !!! !!!! ! !! !! ! !!! !! ! !!!!! !!!!!! !!!!! !!
!
!
!
!!!
! !!!!
!
!
!
!!
!
!!
!
!
!!
!
!
! !!
!
! !
!
!
! !!!
!
! !!!!! !
!
!!!
!
!
!!! !
!
! !
!
!
!
!!!! !!!!! !!!!!!!!
!
! !!!!
!
!!!!!!! ! !! !!!!!
! !!!!!!
!!!!!!
!
! ! !!! !!!!!
!
!
!
!
!!!
!!!! !!!
! !
!
!
!
!! !!!!
!
!
!
!
!! !!!
!
!! !! !!
!
!!
!!
!
!
!
!
!
!! !
!!
!
!
!! !
!
!
!
!!
! !!!! !!!
!
!
!
!
!! !!!!! !
!
! !!
!
!
!
!! !!
!
!!
!!
!
!!
!
!!!
!
! !
!
!
!!
!
!! !!!! !!
! !! !!
!
!!
!
!! !!!!! ! !! !!!!!!! !!!!! !!!!!!!!!!!!!!! !!!!!! !!!!!!!!!!!!!!!
20 40 60 80 100120
Virulence
!! ! !!!! ! ! !!!!!!
! !!! !!! !!!
!! !!! !! !!
!
! !! !!! !!!!! !!!
!!!!!!! ! !!! !! ! !!!
!!! !
!
! !!
!
! !!!!
!
! !!! !
!
! !!! ! ! !! !! ! !!! !!! !!!
! !!
!
! !! ! !!
! !!!!!!! !
!
!! !
!
!
!
!!!! ! !!
!
!
!! !! !!
!
!
!
!!!
!
! ! !!!!! !!!! !
!
!
!
! !!! !
!
! !!!
!
!!!!!
!
!!
! !!
!
!!! !! !
!
!
!!
!! !!!!!!!
!
!
!!!! !!
!
!
!
!! !!! !
!
!!!
! !!!
!
!!! !!!!
!
!
! ! !!!! !!!
!! !!! !! !!
!
!
!!
!!! !! !! ! !! !
!
! !!! !!! !! !! !!
!
!
!!!! !!!! !!!!!
! !!
!!
!
!!
!!! ! !!!!! !!!!!!!!!!!! !
!
!!!!!!!
! !
!
! !!!!
!
!! !! ! !!! !!
!!
!
!! !
!
!!!!!
!! !
!
!
!
!!
!
!!
!
!!!
! !
!!!
!!!
!
!!! !!!!!!! ! !!!! ! !!! ! !!! !! !!!! !!!!!! !!!!!!! ! !!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!
!! !! !! !!!
!!
!
!!!! ! !! !!!! !! !! !!!!!!! !!!!!!! !!!!! ! !! !!! !!!
!!! !!! !
!
!!!!!
!!!! !! !!! !!!!!! !! !!!
!
!!! ! !!!!! !! !!!!
!
!
! ! !!
!!
!!!!!!! !!!!!! !! !!!!! ! !!
! ! !!!!! !!!
!! !! !! !!!!
!!! !!!
!!
!!! !!
!
! ! !!!! !! !
! !!!!! !!
!
!!! ! !!
!
!!!!! !!!!
! !!
!!!! ! !! !!!
!!!!! !!! !!
!!!!! !! !!!
!!!! !! !! !!!!!!!!!!
!
!
!!!!!!!!!!!! ! !!!!
!!!
!!!! !!
!!
! !! !! !!!!!!! !!! !! ! !!!!!!! ! !!!! ! !! !!!!!! !!!!!!!! ! !!!! !!!!!!!!!! !!!!! !!!!!!!!!!!!!!!!! !!!!! !!!!!! ! !!!!!
10 20 30 40 50 60 70
7095
r2 = 0.25 p-value < 2.2e-16
r2 = 0.44 p-value < 2.2e-16
r2 = 0.36 p-value < 2.2e-16
r2 = 0.31 p-value < 2.2e-16
r2 = 0.25 p-value < 2.2e-16
r2 = 0.32 p-value < 2.2e-16
r2 = 0.35 p-value < 2.2e-16
r2 = 0.30 p-value < 2.2e-16
at Biblioteca di Scienze, U
niversit? degli studi di Firenze on January 11, 2012http://m
be.oxfordjournals.org/D
ownloaded from
Figure 6.
The relationship between the network centrality measures and plasmid mobility. Mobile plasmids are significantly more central in the networks of all genes and of resistance genes, as indicated by the p-values (calculated with Mann-Whitney tests) embedded in the figure.
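The kind of comparison named in the caption can be illustrated with a minimal pure-Python sketch. It assumes a two-sided Mann-Whitney U test using the normal approximation with tie correction and no continuity correction. The function name `mann_whitney_u` and the degree values are hypothetical illustration data, not values or code from the study.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    values = [v for v, _ in pooled]

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value


# Hypothetical node degrees for mobile and non-mobile plasmids.
mobile_degree = [10, 11, 12, 13, 14]
non_mobile_degree = [1, 2, 3, 4, 5]
u_stat, p = mann_whitney_u(mobile_degree, non_mobile_degree)
print(u_stat, p)
```

For small samples an exact test is usually preferable to the normal approximation used here; the approximation keeps the sketch short and dependency-free.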
[Figure 6 panels: All sequences, Resistance, Virulence. x-axis: Mobility (Mobile, Non-mobile); y-axis labels: Degree, Betweenness; legend: Similarity 70%, 95%. Embedded p-values, in extraction order (70% / 95%): 8.8e-16 / 4.8e-6; 5.2e-3 / 1.7e-3; 6.2e-1 / 2.1e-1; 3.3e-19 / 1.5e-6; 7.9e-3 / 4.9e-2; 1.7e-1 / 2.9e-1.]
Figure 7
The frequency of inter-class gene transfer events in the networks. A plus sign (+) marks inter-class transfers that were more abundant than expected under random assignment of the transfer events between plasmids (permutation test, p-value < 1e-4). A minus sign (-) marks inter-class transfers that were less abundant than expected under the same random assignment (permutation test, p-value < 1e-4).
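A permutation test of this kind can be sketched as below. The null model is an assumption made for illustration: each transfer event is reassigned to a random pair of distinct plasmids, which is one reading of "random assignment of the transfer events between plasmids" and may differ from the procedure used in the study. The function name, plasmid identifiers and class labels are hypothetical.

```python
import random


def interclass_permutation_test(edges, plasmid_class, class_pair,
                                n_perm=10000, seed=0):
    """Test whether transfers between two classes are over- or under-represented.

    edges: list of (plasmid_a, plasmid_b) transfer events.
    plasmid_class: dict mapping plasmid id -> taxonomic class.
    class_pair: the two (different) classes of interest.
    Returns (observed count, p-value for "more than expected",
    p-value for "less than expected").
    """
    rng = random.Random(seed)
    plasmids = sorted(plasmid_class)
    target = frozenset(class_pair)

    def count(edge_list):
        # Count events whose endpoints fall in exactly the two target classes.
        return sum(
            1 for a, b in edge_list
            if frozenset((plasmid_class[a], plasmid_class[b])) == target
        )

    observed = count(edges)
    at_least = 0
    at_most = 0
    for _ in range(n_perm):
        # Assumed null model: every event lands on a random pair of plasmids.
        shuffled = [tuple(rng.sample(plasmids, 2)) for _ in edges]
        c = count(shuffled)
        if c >= observed:
            at_least += 1
        if c <= observed:
            at_most += 1
    # Add-one correction keeps the estimated p-values strictly positive.
    p_more = (at_least + 1) / (n_perm + 1)
    p_less = (at_most + 1) / (n_perm + 1)
    return observed, p_more, p_less


# Hypothetical toy data: 4 plasmids in class A, 4 in class B, 12 in class C.
classes = {f"p{i}": "A" for i in range(1, 5)}
classes.update({f"p{i}": "B" for i in range(5, 9)})
classes.update({f"p{i}": "C" for i in range(9, 21)})
events = [(f"p{i}", f"p{j}") for i in range(1, 5) for j in range(5, 9)]
print(interclass_permutation_test(events, classes, ("A", "B"), n_perm=2000))
```

Resolving p-values below 1e-4, as reported in the caption, would require at least 10^4 permutations, since the smallest attainable estimate is 1 / (n_perm + 1).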
[Figure 7 panels: All sequences, Resistance, Virulence. Value axis: Frequency (0 to 600); category axis: pairs of taxonomic classes, from Acaryochloris-Chroococcales to Nostocales-Thermomicrobiales; legend: Identity 70%, 95%.]
Figure 8
COG functional annotation of the genes shared by the plasmids belonging to the different taxonomical classes of the dataset.
[Figure 8: frequency (0 to 400) of COG functional categories for pairs of taxonomic classes at 70% and 95% identity. COG legend: [C] Energy production and conversion; [D] Cell cycle control, cell division, chromosome partitioning; [E] Amino acid transport and metabolism; [F] Nucleotide transport and metabolism; [G] Carbohydrate transport and metabolism; [H] Coenzyme transport and metabolism; [I] Lipid transport and metabolism; [J] Translation, ribosomal structure and biogenesis; [K] Transcription; [L] Replication, recombination and repair; [M] Cell wall/membrane/envelope biogenesis; [N] Cell motility; [O] Posttranslational modification, protein turnover, chaperones; [P] Inorganic ion transport and metabolism; [Q] Secondary metabolites biosynthesis, transport and catabolism; [R] General function prediction only; [S] Function unknown; [T] Signal transduction mechanisms; no functional class.]
Figure 9
An ideogram of gene transfers between Actinobacterial plasmids (accession numbers NC_004939, NC_004945 and NC_014167) and Gammaproteobacterial plasmids (accession numbers NC_006816, NC_009141, NC_009651, NC_010488, NC_010886 and NC_011092). Gene transfer events are marked by the curves in the middle of the ideogram. GC content is plotted on the outer side of each plasmid molecule wherever it is above the average GC content of that plasmid. Genes related to resistance, conjugation and transposition are marked as lines on the outer, middle and innermost rings, respectively, on the inner side of the plasmid ring.
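The GC-content track described in the caption can be approximated with the following minimal sketch. The window size, the use of non-overlapping windows and the function names are assumptions made for illustration; the caption does not state how the GC content was windowed.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windows_above_average(seq, window=1000, step=1000):
    """Return the plasmid-wide GC fraction and the windows that exceed it.

    Each hit is (window start, window GC fraction). A trailing partial
    window shorter than `window` is ignored.
    """
    mean_gc = gc_fraction(seq)
    hits = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc > mean_gc:
            hits.append((start, gc))
    return mean_gc, hits


# Hypothetical toy sequence: an AT-rich half followed by a GC-rich half.
toy_plasmid = "AT" * 5 + "GC" * 5
print(windows_above_average(toy_plasmid, window=10, step=10))
```

Regions whose GC content departs from the plasmid average are a common, though not conclusive, indicator of horizontally acquired DNA, which is why only above-average windows are kept here.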
Supplemental Information:
Supplemental Information S1: Detailed information for the complete sequence dataset used in this work.
Supplemental Information S2: Complete antibiotic resistance related sequence dataset used in this work.
Supplemental Information S3: Complete virulence factors related sequence dataset used in this work.
Supplemental Information S4: The overall Gephi-formatted networks built in this work.
Supplemental Information S5: Phylogenetic coherence for 70% and 95% network clusters and COG functional categories of genes shared within the major clusters. For each plasmid cluster we report a) the phylogenetic network built with Splitstree using as input all the different ML phylogenetic trees obtained from the alignments of all the gene families shared by the different plasmids included in the cluster, b) the phylogenetic networks obtained using as input the gene families belonging to coherent groups as assessed by Conclustador and c) the COG functional annotation of the shared sequences.
Supplemental Information S6: Networks built with antibiotic resistance (A and B) and virulence (C and D) related sequences at 70% and 95% identity thresholds.
Supplemental Information S7: Networks built lowering the threshold of gene sharing to between 1 and 5 genes at A) 70% and B) 95% identity thresholds. In C) and D) the phylogenetic class distribution of the disconnected plasmids and their inter-taxa connections are also reported.
Supplemental Information S8: Phylogenetic distances of gene sharing between Actinobacterial and Gammaproteobacterial plasmids.
Supplemental Information S9: GC contents of Actinobacterial and Gammaproteobacterial plasmids and the transferred genes.

Acknowledgements
The study was financially supported by the Academy of Finland (grant number 129873) and the Finnish Graduate School in Environmental Science and Technology (EnSTe). MF is financed by a post-doctoral grant from “Fondazione Adriano Buzzati-Traverso”. The authors would like to thank Kimmo Mattila for his kind assistance in parallel BLAST analyses.