
TITLE

An updated phylogeny of the Alphaproteobacteria reveals that the parasitic Rickettsiales and Holosporales have independent origins

AUTHORS

Sergio A. Muñoz-Gómez1,2, Sebastian Hess1,2, Gertraud Burger3, B. Franz Lang3, Edward Susko2,4, Claudio H. Slamovits1,2*, and Andrew J. Roger1,2*

AUTHOR AFFILIATIONS

1 Department of Biochemistry and Molecular Biology; Dalhousie University; Halifax, Nova Scotia, B3H 4R2; Canada.

2 Centre for Comparative Genomics and Evolutionary Bioinformatics; Dalhousie University; Halifax, Nova Scotia, B3H 4R2; Canada.

3 Department of Biochemistry, Robert-Cedergren Center in Bioinformatics and Genomics, Université de Montréal, Montreal, Quebec, Canada.

4 Department of Mathematics and Statistics; Dalhousie University; Halifax, Nova Scotia, B3H 4R2; Canada.

*Correspondence to: Department of Biochemistry and Molecular Biology; Dalhousie University; Halifax, Nova Scotia, B3H 4R2; Canada; 1 902 494 2881, [email protected]

ABSTRACT

The Alphaproteobacteria is an extraordinarily diverse and ancient group of bacteria. Previous attempts to infer its deep phylogeny have been plagued with methodological artefacts. To overcome this, we analyzed a dataset of 200 single-copy and conserved genes and employed diverse strategies to reduce compositional artefacts. Such strategies include using novel dataset-specific profile mixture models and recoding schemes, and removing sites, genes and taxa that are compositionally biased. We show that the Rickettsiales and Holosporales (both groups of intracellular parasites of eukaryotes) are not sisters to each other, but instead, the Holosporales has a derived position within the Rhodospirillales. A synthesis of our results also leads to an updated proposal for the higher-level taxonomy of the Alphaproteobacteria. Our robust consensus phylogeny will serve as a framework for future studies that aim to place mitochondria, and novel environmental diversity, within the Alphaproteobacteria.

KEYWORDS: Holosporaceae, Holosporales, mitochondria, origin, Finniella inopinata, Stachyamoeba, Peranema, photosynthesis, Rhodospirillales, Azospirillaceae, Rhodovibriaceae.

INTRODUCTION

The Alphaproteobacteria is an extraordinarily diverse and disparate group of bacteria and is well known to most biologists for also encompassing the mitochondrial lineage (Williams, Sobral, and Dickerman 2007; Roger, Muñoz-Gómez, and Kamikawa 2017). The Alphaproteobacteria has massively diversified since its origin, giving rise to, for example, some of the most abundant (e.g., Pelagibacter ubique) and metabolically versatile (e.g., Rhodobacter sphaeroides) cells on Earth (Giovannoni 2017; Madigan, Jung, and Madigan 2009). The basic structure of the tree of the Alphaproteobacteria has largely been inferred through the analyses of 16S rRNA genes and several conserved proteins (Garrity 2005; Lee et al. 2005; Rosenberg et al. 2014; Fitzpatrick, Creevey, and McInerney 2006; Williams, Sobral, and Dickerman 2007; Brindefalk et al. 2011; Georgiades et al. 2011; Thrash et al. 2011; Luo 2015). Today, eight major orders are well recognized, namely the Caulobacterales, Rhizobiales, Rhodobacterales, Pelagibacterales, Sphingomonadales, Rhodospirillales, Holosporales and Rickettsiales (the latter two formerly grouped into the Rickettsiales sensu lato), and their interrelationships have also recently become better understood (Viklund, Ettema, and Andersson 2012; Viklund et al. 2013; Rodríguez-Ezpeleta and Embley 2012; Wang and Wu 2014, 2015). These eight orders were grouped into two subclasses by Ferla et al. (2013): the subclass Rickettsiidae comprising the orders Rickettsiales and Pelagibacterales, and the subclass Caulobacteridae comprising all other orders.

The great diversity of the Alphaproteobacteria itself presents a challenge to deciphering the deepest divergences within the group. Such diversity encompasses a broad spectrum of genome (nucleotide) and proteome (amino acid) compositions (e.g., the A+T%-rich Pelagibacterales versus the G+C%-rich Acetobacteraceae) and molecular evolutionary rates (e.g., the fast-evolving Pelagibacterales, Rickettsiales or Holosporales versus many slow-evolving species in the Rhodospirillales) (Ettema and Andersson 2009). This diversity may lead to pervasive artefacts when inferring the phylogeny of the Alphaproteobacteria, e.g., long-branch attraction (LBA) between the Rickettsiales and Pelagibacterales, especially when including mitochondria (Rodríguez-Ezpeleta and Embley 2012; Viklund, Ettema, and Andersson 2012; Viklund et al. 2013; Luo 2015). Moreover, there are still important unknowns about the deep phylogeny of the Alphaproteobacteria (Williams, Sobral, and Dickerman 2007; Ferla et al. 2013), for example, the divergence order among the Rhizobiales, Rhodobacterales and Caulobacterales (Williams, Sobral, and Dickerman 2007), the monophyly of the Pelagibacterales (Viklund et al. 2013) and the Rhodospirillales (Ferla et al. 2013), and the precise placement of the Rickettsiales and its relationship to the Holosporales (Wang and Wu 2015; Martijn et al. 2018).

Systematic errors stemming from the use of over-simplified evolutionary models (which often fit complex data poorly because, for example, they do not account for compositional heterogeneity across sites or branches) are perhaps the major confounding and limiting factor to inferring deep evolutionary relationships; the number of taxa and genes (or sites) can also be important factors. Previous multi-gene tree studies of the Alphaproteobacteria were compromised by at least one of these problems, namely, simpler or less realistic evolutionary models (because more realistic models were not available at the time; e.g., Williams, Sobral, and Dickerman 2007 used the simple WAG+Γ4 model that cannot account for compositional heterogeneity across sites), poor or uneven taxon sampling (because the focus was too narrow or few genomes were available; e.g., Williams, Sobral, and Dickerman 2007 had very few rhodospirillaleans and no holosporaleans; Georgiades et al. 2011 included only 42 alphaproteobacteria with only one pelagibacteralean) or a small number of genes (because the focus was mitochondria; e.g., Rodríguez-Ezpeleta and Embley 2012 used 24 genes; Wang and Wu 2015 relied on 29 genes; Martijn et al. 2018 also used 24 genes; or because only a small set of 28 compositionally homogeneous genes was used, e.g., Luo 2015). The most recent study on the phylogeny of the Alphaproteobacteria, and mitochondria, attempted to counter systematic errors (or phylogenetic artefacts) by reducing amino acid compositional heterogeneity (Martijn et al. 2018). Even though some deep relationships were not robustly resolved, these analyses suggested that the Pelagibacterales, Rickettsiales and Holosporales, which have compositionally-biased genomes, are not each other’s closest relatives (Martijn et al. 2018). A resolved and robust phylogeny of the Alphaproteobacteria is fundamental to addressing questions such as how streamlined bacteria, intracellular parasitic bacteria, or mitochondria evolved from their alphaproteobacterial ancestors. Therefore, a systematic study of the different biases affecting the phylogeny of the Alphaproteobacteria, and its underlying data, is much needed.

Here, we revised the phylogeny of the Alphaproteobacteria by using a large dataset of 200 conserved single-copy genes and employing carefully designed strategies aimed at alleviating phylogenetic artefacts. We found that amino acid compositional heterogeneity, and more generally long-branch attraction, were major confounding factors in estimating phylogenies of the Alphaproteobacteria. In order to counter these biases, we used novel dataset-specific profile mixture models and recoding schemes (both specifically designed to ameliorate compositional heterogeneity), and removed sites, genes and taxa that were compositionally biased. We also present three draft genomes for endosymbiotic alphaproteobacteria belonging to the Rickettsiales and Holosporales: (1) an undescribed midichloriacean endosymbiont of Peranema trichophorum, (2) an undescribed rickettsiacean endosymbiont of Stachyamoeba lipophora, and (3) the holosporalean ‘Candidatus Finniella inopinata’, an endosymbiont of the rhizarian amoeboflagellate Viridiraptor invadens (Hess, Suthaus, and Melkonian 2015). Our results provide the first strong evidence that the Holosporales is unrelated to the Rickettsiales and originated instead from within the Rhodospirillales. We incorporate these and other insights regarding the deep phylogeny of the Alphaproteobacteria into an updated taxonomy.

RESULTS

The genomes and phylogenetic positions of three novel endosymbiotic alphaproteobacteria (Rickettsiales and Holosporales)

We sequenced the genomes of the novel holosporalean ‘Candidatus Finniella inopinata’, an endosymbiont of the rhizarian amoeboflagellate Viridiraptor invadens (Hess, Suthaus, and Melkonian 2015), and two undescribed rickettsialeans, one associated with the heterolobosean amoeba Stachyamoeba lipophora and the other with the euglenoid flagellate Peranema trichophorum. The three genomes are small with a reduced gene number and high A+T% content, strongly suggesting an endosymbiotic lifestyle (Table 1). Comparisons of their rRNA genes show that these genomes are truly novel, being considerably divergent from other described alphaproteobacteria. As of February 2018, the closest 16S rRNA gene to that of the Stachyamoeba-associated rickettsialean belongs to Rickettsia massiliae str. AZT80, with only 88% identity. On the other hand, the closest 16S rRNA gene to that of the Peranema-associated rickettsialean belongs to an endosymbiont of Acanthamoeba sp. UWC8, which is only 92% identical. Phylogenetic analyses of both the 16S rRNA gene and a dataset that comprises 200 single-copy conserved marker genes (see below) confirm that each species belongs to different families and orders within the Alphaproteobacteria (Supplementary file 1 and Figure 2-figure supplement 1). ‘Candidatus Finniella inopinata’ belongs to the recently described ‘Candidatus Paracaedibacteraceae’ in the Holosporales (Hess, Suthaus, and Melkonian 2015), whereas the Stachyamoeba-associated rickettsialean belongs to the Rickettsiaceae, and the Peranema-associated rickettsialean belongs to the ‘Candidatus Midichloriaceae’, in the Rickettsiales.

Table 1. Genome features for the three novel rickettsialeans sequenced in this study. See Supplementary file 1 as well.

Species            ‘Candidatus Finniella inopinata’    Stachyamoeba-associated rickettsialean    Peranema-associated rickettsialean
Genome size        1,792,168 bp                        1,738,386 bp                              1,375,759 bp
N50                174,737 bp                          1,738,386 bp                              28,559 bp
Contig number      28                                  1                                         125
Gene number (1)    1,741                               1,588                                     1,223
A+T% content       56.58%                              67.01%                                    59.13%
Family             Paracaedibacteraceae                Rickettsiaceae                            ‘Candidatus Midichloriaceae’
Order              Holosporales                        Rickettsiales                             Rickettsiales
Completeness (2)   94.96%                              97.12% (=100%)                            92.08%
Redundancy (2)     0.0%                                0.0%                                      2.1%

(1) As predicted by Prokka v.1.13 (rRNA genes were searched with BLAST).

(2) As estimated by Anvi’o v.2.4.0 using the Campbell et al., 2013 marker gene set.
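As a reading aid for the N50 row of Table 1, the following minimal sketch shows the usual definition of N50, assuming a plain list of contig lengths as input. The function name `n50` and the fragmented example lengths are hypothetical and are not taken from the study's assembly pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together cover at least half of the total assembly size."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk through contigs from longest to shortest until half is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# A single-contig assembly: N50 equals the genome size, as in Table 1
# for the Stachyamoeba-associated rickettsialean.
single_contig = n50([1738386])

# Hypothetical fragmented assembly (lengths invented for illustration).
fragmented = n50([50, 40, 30, 20, 10])
```

For a one-contig assembly the N50 is simply the genome size, which is why the two values coincide in the second column of the table.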


Compositional heterogeneity appears to be a major confounding factor affecting phylogenetic inference of the Alphaproteobacteria

The average-linkage clustering of amino acid compositions shows that the Rickettsiales, Pelagibacterales (together with alphaproteobacterium sp. HIMB59) and Holosporales are clearly distinct from other alphaproteobacteria. This indicates that these three taxa have divergent proteome amino acid compositions (Figure 1A). These taxa also have the lowest GARP:FIMNKY amino acid ratios of all the Alphaproteobacteria (Figure 1A), with the Pelagibacterales (including alphaproteobacterium sp. HIMB59) being the most divergent, followed by the Rickettsiales and then the Holosporales. GARP amino acids are encoded by G+C%-rich codons, whereas FIMNKY amino acids are encoded by A+T%-rich codons; proteomes that have low GARP:FIMNKY ratios are therefore compositionally biased and come from A+T%-rich genomes. Such biased amino acid compositions appear to be the consequence of genome nucleotide compositions that are strongly biased towards high A+T%: a scatter plot of genome G+C% and proteome GARP:FIMNKY ratios shows a similar clustering of the Rickettsiales, Pelagibacterales (including alphaproteobacterium sp. HIMB59) and Holosporales (Figure 1B). This compositional similarity in the proteomes of the Rickettsiales, Pelagibacterales (plus alphaproteobacterium sp. HIMB59) and Holosporales, which also turn out to be the longest-branching alphaproteobacterial groups in previously published phylogenies (e.g., Wang and Wu 2015), could be the outcome of either a shared evolutionary history (i.e., the groups are most closely related to one another) or, alternatively, evolutionary convergence (e.g., because of similar lifestyles or evolutionary trends toward small cell and genome sizes).
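The GARP:FIMNKY ratio discussed above can be illustrated with a short sketch. It assumes that the ratio is the pooled count of G, A, R and P residues divided by the pooled count of F, I, M, N, K and Y residues across a set of protein sequences; the exact counting procedure used in the study is not stated in this section, and the function name `garp_fimnky_ratio` is hypothetical.

```python
GARP = frozenset("GARP")      # amino acids encoded by G+C-rich codons
FIMNKY = frozenset("FIMNKY")  # amino acids encoded by A+T-rich codons


def garp_fimnky_ratio(proteins):
    """Pooled GARP:FIMNKY ratio over an iterable of protein sequences.

    Assumption: residues are pooled over all sequences before dividing;
    gaps and any other characters are ignored.
    """
    garp = 0
    fimnky = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in GARP:
                garp += 1
            elif aa in FIMNKY:
                fimnky += 1
    if fimnky == 0:
        raise ValueError("no FIMNKY residues found; ratio is undefined")
    return garp / fimnky


# Toy sequences invented for illustration only.
balanced = garp_fimnky_ratio(["GARPFIMNKY"])       # 4 GARP vs 6 FIMNKY
at_biased = garp_fimnky_ratio(["KKNNIIYG", "FM-"])  # 1 GARP vs 9 FIMNKY
```

Under this definition, a proteome from an A+T-rich genome yields a low ratio, which is the pattern reported for the Pelagibacterales, Rickettsiales and Holosporales.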


Figure 1. Compositional heterogeneity in the Alphaproteobacteria is a major factor that confounds phylogenetic inference. There are great disparities in genome G+C% content and amino acid composition between the Rickettsiales, Pelagibacterales (including alphaproteobacterium sp. HIMB59) and Holosporales and all other alphaproteobacteria. A. A UPGMA (average-linkage) clustering of amino acid compositions (based on the 200 gene set for the Alphaproteobacteria) shows that the Rickettsiales (brown), Pelagibacterales (maroon), and Holosporales (light blue) all have very similar proteome amino acid compositions. At the tips of the tree, GARP:FIMNKY amino acid ratio values are shown as bars. B. A scatterplot depicting the strong correlation between G+C% (nucleotide compositions) and GARP:FIMNKY ratios (amino acid composition) for the 120 taxa in the Alphaproteobacteria (and outgroup) shows a similar clustering of the Rickettsiales, Pelagibacterales (including alphaproteobacterium sp. HIMB59) and Holosporales.
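The clustering in Figure 1A operates on per-taxon amino acid compositions. The sketch below shows one way such inputs could be prepared: a 20-state frequency vector per taxon and a pairwise distance between vectors. The Euclidean distance is an assumption made here for illustration, since this section does not state which distance was used, and the names `composition` and `composition_distance` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def composition(proteins):
    """Relative frequencies of the 20 amino acids, pooled over sequences.
    Gaps and ambiguous characters are ignored."""
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_distance(freqs_a, freqs_b):
    """Euclidean distance between two composition vectors (an assumed
    distance measure, chosen only for this illustration)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b)))


# Toy proteomes invented for illustration only.
taxon_x = composition(["AAGG", "PR"])
taxon_y = composition(["KKNN", "IY"])
distance_xy = composition_distance(taxon_x, taxon_y)
```

A matrix of such pairwise distances is the kind of input that an average-linkage (UPGMA) procedure would then cluster.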


As a first step to discriminate between these two alternatives, we used maximum likelihood to estimate a tree on a dataset that comprised 200 single-copy and rarely laterally-transferred marker genes for the Alphaproteobacteria (as determined by Phyla-AMPHORA; see Methods for more details; Wang and Wu 2013) under the site-heterogeneous model LG+PMSF(ES60)+F+R6. The resulting tree united the Rickettsiales, Pelagibacterales (with alphaproteobacterium sp. HIMB59 at its base) and Holosporales in a fully supported clade (Figure 2A; see Figure 2-figure supplement 1 for labeled trees). The clustering of these three groups is suggestive of a phylogenetic artefact (e.g., long-branch attraction or LBA); indeed, such a pattern resembles the one seen in the tree of proteome amino acid compositions (see Figure 1A). This is because the three groups have the longest branches in the Alphaproteobacteria tree and have compositionally-biased and fast-evolving genomes (see Figure 2). If evolutionary convergence in amino acid compositions is confounding phylogenetic inference for the Alphaproteobacteria, methods aimed at reducing compositional heterogeneity might disrupt the clustering of the Rickettsiales, Pelagibacterales and Holosporales.

To further test whether the clustering of the Rickettsiales, Pelagibacterales and Holosporales is real or artefactual, we used several different strategies to reduce the compositional heterogeneity of our dataset (see Figure 2-figure supplement 2 for the diverse strategies employed). When removing the 50% most compositionally-biased (heterogeneous) sites according to ɀ (a novel metric that measures amino acid compositional disparity at a site; see Methods), the clustering between the Rickettsiales, Pelagibacterales, alphaproteobacterium sp. HIMB59 and Holosporales is disrupted (Figure 2B; see also Figure 2-figure supplement 3). The new, more derived placements for the Pelagibacterales, alphaproteobacterium sp. HIMB59 and Holosporales are well supported (further described below), and support tends to increase as compositionally-biased sites are removed (Supplementary file 2-table S1). Furthermore, when each of these long-branching and compositionally-biased taxa is analyzed in isolation (i.e., in the absence of the others), and compositional heterogeneity is further decreased, new phylogenetic patterns emerge that are incompatible, or in conflict, with their clustering (Figure 2-figure supplement 4 and Figure 3-figure supplement 1-5). Various strategies to reduce compositional heterogeneity, such as removing the most compositionally-biased sites, recoding the data into reduced character-state alphabets, or using only the most compositionally homogeneous genes, converge to very similar phylogenetic patterns for the Alphaproteobacteria in which the clustering of the Rickettsiales, Pelagibacterales, alphaproteobacterium sp. HIMB59 and Holosporales is disrupted; the Pelagibacterales, alphaproteobacterium sp. HIMB59 and Holosporales have much more derived phylogenetic placements (e.g., Figure 3, Figure 2-figure supplement 4 and Figure 3-figure supplement 1-5). On the other hand, removing fast-evolving sites does not disrupt the clustering of these three long-branching groups (Supplementary file 2-table S2), suggesting that high evolutionary rates per site are not a major confounding factor when inferring the phylogeny of the Alphaproteobacteria.
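The site-removal strategy described above can be sketched as a simple filter over alignment columns. The sketch assumes that a per-site bias score (such as ɀ, which is defined in the Methods and not reproduced here) has already been computed; the scores are taken as input rather than calculated, and the names `n_sites_to_remove` and `remove_biased_sites` as well as the toy alignment are hypothetical.

```python
def n_sites_to_remove(n_sites, fraction):
    """Number of alignment columns to drop for a given fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return round(n_sites * fraction)


def remove_biased_sites(alignment, site_scores, fraction):
    """Drop the `fraction` of columns with the highest bias scores.

    alignment   -- dict mapping taxon name to aligned sequence
    site_scores -- one precomputed bias score per column (higher = more biased)
    """
    n_sites = len(site_scores)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence must have one score per column")
    n_remove = n_sites_to_remove(n_sites, fraction)
    # Rank columns from most to least biased and mark the top ones.
    ranked = sorted(range(n_sites), key=lambda i: site_scores[i], reverse=True)
    dropped = set(ranked[:n_remove])
    kept = [i for i in range(n_sites) if i not in dropped]
    return {taxon: "".join(seq[i] for i in kept)
            for taxon, seq in alignment.items()}


# Toy alignment and scores invented for illustration only.
toy = {"taxon_a": "MKVL", "taxon_b": "MRIL"}
filtered = remove_biased_sites(toy, [0.1, 0.9, 0.5, 0.7], 0.5)
```

In this toy case the two highest-scoring columns (the second and fourth) are dropped, leaving the first and third columns in their original order.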


Figure 2. Decreasing compositional heterogeneity by removing compositionally-biased sites disrupts the clustering of the Rickettsiales, Pelagibacterales (including alphaproteobacterium sp. HIMB59) and Holosporales. All branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A maximum-likelihood tree inferred under the LG+PMSF(ES60)+F+R6 model and from the untreated dataset, which is highly compositionally heterogeneous. The three long-branching orders that have similar amino acid compositions, the Rickettsiales, Pelagibacterales (including alphaproteobacterium sp. HIMB59) and Holosporales, form a clade. B. A maximum-likelihood tree inferred under the LG+PMSF(ES60)+F+R6 model and from a dataset whose compositional heterogeneity has been decreased by removing 50% of the most biased sites according to ɀ. In this phylogeny the clustering of the Rickettsiales, Pelagibacterales and Holosporales is disrupted. The Pelagibacterales is sister to the Rhodobacterales, Caulobacterales and Rhizobiales. The Holosporales, and alphaproteobacterium sp. HIMB59, become sister to the Rhodospirillales. The Rickettsiales remains as the sister to the Caulobacteridae. See Figure 2-figure supplement 1 for taxon names. See Figure 2-figure supplement 3 for the Bayesian consensus trees inferred in PhyloBayes MPI v1.7 under the CAT-Poisson+Γ4 model. See also Figure 2-figure supplement 2 and 4-7.

The Holosporales is unrelated to the Rickettsiales and is instead most likely derived within the Rhodospirillales

The Holosporales has traditionally been considered part of the Rickettsiales sensu lato because it appears as sister to the Rickettsiales in many trees (e.g., Hess, Suthaus, and Melkonian 2015; Montagna et al. 2013; Santos and Massard 2014). It is exclusively composed of endosymbiotic bacteria living within diverse eukaryotes, and such a lifestyle is shared with all other members of the Rickettsiales (with the possible exception of a recently reported ectosymbiotic rickettsialean; see Castelli et al. 2018). When we decrease, and then account for, compositional heterogeneity, we recover tree topologies in which the Holosporales moves away from the Rickettsiales (e.g., Figure 2B, Figure 2-figure supplement 4B and 4D). For example, the Holosporales becomes sister to all free-living alphaproteobacteria (the Caulobacteridae) when only the 40 most homogeneous genes are used (Figure 2-figure supplement 4D) or when 10% of the most compositionally-biased sites are removed (Supplementary file 2-table S1). When compositional heterogeneity is further decreased by removing 50% of the most compositionally-biased sites, the Holosporales becomes sister to the Rhodospirillales (Figure 2B and Supplementary file 2-table S1; and see also Figure 2-figure supplement 4B).

Similarly, when the long-branching and compositionally-biased Rickettsiales, Pelagibacterales, and alphaproteobacterium sp. HIMB59 (plus the extremely long-branching genera Holospora and ‘Candidatus Hepatobacter’) are removed, after compositional heterogeneity had been decreased through site removal, the Holosporales moves to a much more derived position well within the Rhodospirillales (Figure 3A, Figure 3-figure supplement 1B, 1C and Figure 3-figure supplement 6). If the very compositionally-biased and fast-evolving Holospora and ‘Candidatus Hepatobacter’ are left in, the Holosporales is pulled away from its derived position and the whole clade moves closer to the base of the tree (Figure 3-figure supplement 7). The same pattern in which the Holosporales is derived within the Rhodospirillales is seen when these same taxa are removed, and the data are then recoded into four- or six-character states (Figure 3B, Figure 3-figure supplement 6 and Figure 3-figure supplement 8). Specifically, the Holosporales now consistently branches as sister to a subgroup of rhodospirillaleans that includes, among others, the epibiotic predator Micavibrio aeruginosavorus and the purple nonsulfur bacterium Rhodocista centenaria (the Azospirillaceae, see below) (Figure 3). This new placement of the Holosporales has nearly full support under both maximum likelihood (>95% UFBoot; see Figure 3) and Bayesian inference (>0.95 posterior probability; see Figure 3-figure supplement 6). Thus, three different analyses independently converge to the same pattern and support a derived origin of the Holosporales within the Rhodospirillales: (1) removal of compositionally-biased sites (Figure 3A), (2) data recoding into four-character states using the dataset-specific scheme S4 (Figure 3B and Figure 3-figure supplement 7), and (3) data recoding into six-character states using the dataset-specific scheme S6 (Figure 3-figure supplement 8); each of these strategies had to be combined with the removal of the Pelagibacterales, alphaproteobacterium sp. HIMB59, and Rickettsiales to recover this phylogenetic position for the Holosporales.


Figure 3. The Holosporales (renamed and lowered in rank to the Holosporaceae family here) branches in a derived position within the Rhodospirillales when compositional heterogeneity is reduced and the long-branching and compositionally-biased Rickettsiales, Pelagibacterales, and alphaproteobacterium sp. HIMB59 are removed. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A maximum-likelihood tree, inferred under the LG+PMSF(ES60)+F+R6 model, to place the Holosporaceae in the absence of the Rickettsiales, Pelagibacterales, and alphaproteobacterium sp. HIMB59 and when compositional heterogeneity has been decreased by removing 50% of the most biased sites. The Holosporaceae is sister to the Azospirillaceae fam. nov. within the Rhodospirillales. B. A maximum-likelihood tree, inferred under the GTR+ES60S4+F+R6 model, to place the Holosporaceae in the absence of the Rickettsiales, Pelagibacterales, and alphaproteobacterium sp. HIMB59, and when the data have been recoded into a four-character state alphabet (the dataset-specific recoding scheme S4: ARNDQEILKSTV GHY CMFP W) to reduce compositional heterogeneity. This phylogeny shows a pattern that matches that inferred when compositional heterogeneity has been alleviated through site removal. See Figure 3-figure supplement 6 for the Bayesian consensus trees inferred in PhyloBayes MPI v1.7 under the CAT-Poisson+Γ4 model. See also Figure 3-figure supplement 1-5 and 7-8.
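The four-state recoding named in the caption (scheme S4: ARNDQEILKSTV GHY CMFP W) amounts to replacing each amino acid with the label of the group it belongs to. The sketch below illustrates that mapping; the numeric state labels 0 to 3, the handling of gaps and ambiguous characters, and the function names are assumptions made for illustration and may differ from what the phylogenetic software used in the study expects.

```python
# The four groups of the dataset-specific S4 scheme, as given in the caption.
S4_GROUPS = ("ARNDQEILKSTV", "GHY", "CMFP", "W")


def build_recoding_table(groups):
    """Map each amino acid to the label of its group ('0', '1', ...).
    The numeric labels are arbitrary and chosen for illustration."""
    table = {}
    for state, group in enumerate(groups):
        for aa in group:
            table[aa] = str(state)
    return table


def recode(sequence, table, missing="-"):
    """Recode an aligned protein sequence; gaps and any character not in
    the table are written as `missing` (an assumed convention)."""
    return "".join(table.get(aa, missing) for aa in sequence.upper())


S4_TABLE = build_recoding_table(S4_GROUPS)

# Toy sequence invented for illustration only.
recoded = recode("MKV-WGC", S4_TABLE)
```

Because many substitutions now fall within a single state, the recoded alignment carries less compositional signal, at the cost of discarding some phylogenetic information.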


A fourth independent analysis further supports a derived placement of the Holosporales nested within the Rhodospirillales. Bayesian inference using the CAT-Poisson+Γ4 model, on a dataset whose compositional heterogeneity had been decreased by removing 50% of the most compositionally-biased sites but for which no taxon had been removed, also recovered the Holosporales as sister to the Azospirillaceae (see Figure 2-figure supplement 3).

The Rhodospirillales is a diverse order and comprises five well-supported families

The Rhodospirillales is an ancient and highly diversified group, but unfortunately this is rarely obvious from published phylogenies because most studies only include a few species for this order (Williams, Sobral, and Dickerman 2007; Georgiades et al. 2011; Ferla et al. 2013). We have included a total of 31 Rhodospirillales taxa to better cover its diversity. Such broad sampling reveals trees with five clear subgroups within the Rhodospirillales that are well-supported in most of our analyses (e.g., Figure 2B and 3). First is the Acetobacteraceae, which comprises acetic acid (e.g., Acetobacter oboediens), acidophilic (e.g., Acidisphaera rubrifaciens), and photosynthesizing (bacteriochlorophyll-containing; e.g., Rubritepida flocculans) bacteria. The Acetobacteraceae is strongly supported and relatively divergent from all other families within the Rhodospirillales. Sister to the Acetobacteraceae is another subgroup that comprises many photosynthesizing bacteria, including the type species for the Rhodospirillales, Rhodospirillum rubrum, as well as the magnetotactic bacterial genera Magnetospirillum, Magnetovibrio and Magnetospira (Figure 3). This subgroup best corresponds to the poorly defined and paraphyletic Rhodospirillaceae family. We amend the Rhodospirillaceae taxon and restrict it to the clade most closely related to the Acetobacteraceae. As described above, when artefacts are accounted for, the Holosporales most likely branches within the Rhodospirillales and therefore we suggest the Holosporales sensu Szokoli et al. (2016) be lowered in rank to the family Holosporaceae (containing e.g., Caedibacter sp. 37-49 and ‘Candidatus Paracaedibacter symbiosus’), which is sister to the Azospirillaceae (Figure 3). The Azospirillaceae fam. nov. contains the purple bacterium Rhodocista centenaria and the epibiotic (neither periplasmic nor intracellular) predator Micavibrio aeruginosavorus, among others. The Holosporaceae and the Azospirillaceae clades appear to be sister to the Rhodovibriaceae fam. nov. (Figure 3), a well-supported group that comprises the purple nonsulfur bacterium Rhodovibrio salinarum, the aerobic heterotroph Kiloniella laminariae, and the marine bacterioplankter ‘Candidatus Puniceispirillum marinum’ (or the SAR116 clade). Each of these subgroups and their interrelationships (with the exception of the Holosporaceae, which branches within the Rhodospirillales only after compositional heterogeneity is countered) are strongly supported in nearly all of our analyses (e.g., see Figure 2B and 3).

The Geminicoccaceae might be sister to all other free-living alphaproteobacteria (the Caulobacteridae)

The Geminicoccaceae is a recently proposed family within the Rhodospirillales (Proença et al. 2017). It is currently represented by only two genera, Geminicoccus and Arboricoccus (Foesel et al. 2007; Proença et al. 2017). In most of our trees, however, Tistrella mobilis is sister to Geminicoccus roseus with full statistical support (e.g., Figure 2B and 3A, but see Figure 3-figure supplement 6 for an exception) and we therefore consider it to be part of the Geminicoccaceae. Interestingly, the Geminicoccaceae tends to have two alternative stable positions in our analyses, either as sister to all other families of the Rhodospirillales (e.g., Figure 2A and 3A), or as sister to all other orders of the Caulobacteridae (i.e., representing the most basal lineage of free-living alphaproteobacteria; Figure 2B and Figure 2-figure supplement 3B, Figure 3B and Figure 3-figure supplement 6, or Figure 2-figure supplement 4B, Figure 3-figure supplement 1C, Figure 3-figure supplement 2B-D, Figure 3-figure supplement 3B, 3D, and Figure 3-figure supplement 5C). Our analyses designed to alleviate compositional heterogeneity, specifically site removal and recoding (without taxon removal), favor the latter position for the Geminicoccaceae (Figure 2B and 3B). Moreover, as compositionally-biased sites are progressively removed, support for the affiliation of the Geminicoccaceae with the Rhodospirillales decreases, and after 50% of the sites have been removed, the Geminicoccaceae emerges as sister to all other free-living alphaproteobacteria with strong support (>95% UFBoot; Supplementary file 2-table S1). In further agreement with this trend, the much simpler model LG4X places the Geminicoccaceae in a derived position as sister to the Acetobacteraceae (Figure 2-figure supplement 5), but as model complexity increases, and compositional heterogeneity is reduced, the Geminicoccaceae moves closer to the base of the Alphaproteobacteria (Figure 2A and 3A). Such a placement suggests that the Geminicoccaceae may be a novel and independent order-level lineage in the Alphaproteobacteria. However, because of the uncertainty in our results, we opt here for conservatively keeping the Geminicoccaceae as the sixth family of the Rhodospirillales (Figure 3A).

Other deep relationships in the Alphaproteobacteria (Pelagibacterales, Rickettsiales, alphaproteobacterium sp. HIMB59)

The clustering of the Pelagibacterales (formerly the SAR11 clade) with the Rickettsiales 380

and Holosporales is more easily disrupted than that of the Holosporales, whether or not long-branching (or compositionally-biased) taxa are removed to control for compositional attractions. The removal of compositionally-biased sites (from 30%

on; 16,320 out of 54,400 sites; see Supplementary file 2-table S1, Figure 2B, Figure 2-384

figure supplement 3B and Figure 3-figure supplement 4B), data recoding into four-385

character states (Figure 3-figure supplement 4C), and a set of the most compositionally 386

homogeneous genes (Figure 3-figure supplement 4D), all support a derived placement 387

of the Pelagibacterales as sister to the Rhodobacterales, Caulobacterales and 388

Rhizobiales. Attempts to account for compositional heterogeneity both across sites 389

(e.g., Rodríguez-Ezpeleta and Embley 2012; Viklund, Ettema, and Andersson 2012; 390

Viklund et al. 2013, and Martijn et al. 2018) and taxa (e.g., Luo et al. 2013; Luo 2015) 391

tend to disrupt the potentially artefactual clustering of the Pelagibacterales and the 392

Rickettsiales (in contrast to the studies of e.g., Williams, Sobral, and Dickerman 2007; 393

Thrash et al. 2011; and Georgiades et al. 2011 that did not account for compositional 394

heterogeneity). The Caulobacterales is sister to the Rhizobiales, and the 395

Rhodobacterales sister to both (e.g., Figure 2B and 3). This is consistent throughout 396

most of our results and such interrelationships become very robustly supported as 397

compositional heterogeneity is increasingly alleviated (Supplementary file 2-table S1). 398

The placement of the Rickettsiales as sister to the Caulobacteridae (i.e., all other 399


alphaproteobacteria) remains stable across different analyses (see Supplementary file 400

2-table S1, and also Figure 2B and Figure 3-figure supplement 2); this is also true when 401

the other long-branching taxa, the Pelagibacterales, alphaproteobacterium sp. HIMB59 402

and Holosporales, and even the Beta- and Gammaproteobacteria outgroup, are removed

(see Figure 3-figure supplement 2 and Figure 3-figure supplement 3). Yet, the 404

interrelationships inside the Rickettsiales order remain uncertain; the ‘Candidatus 405

Midichloriaceae’ becomes sister to the Anaplasmataceae when fast-evolving sites are 406

removed (Supplementary file 2-table S2), but to the Rickettsiaceae when 407

compositionally-biased sites are removed (Supplementary file 2-table S1). The 408

placement of alphaproteobacterium sp. HIMB59 is uncertain (e.g., see Figure 2 and 409

Figure 2-figure supplement 3, and Figure 2-figure supplement 4 and Figure 3-figure 410

supplement 5; in contrast to Grote et al. 2012); taxon-removal analyses suggest that 411

alphaproteobacterium sp. HIMB59 is sister to the Caulobacteridae (Figure 3-figure 412

supplement 5), but the inclusion of any other long-branching group immediately 413

destabilizes this position (e.g., see Figure 2 and Figure 2-figure supplement 2, and 414

Figure 2-figure supplement 4). This is consistent with previous reports that suggest that 415

alphaproteobacterium sp. HIMB59 is not closely related to the Pelagibacterales (Viklund 416

et al. 2013; Martijn et al. 2018). 417

DISCUSSION

We have employed a diverse set of strategies to investigate the phylogenetic signal 419

contained within 200 genes for the Alphaproteobacteria. Specifically, such strategies 420

were primarily aimed at reducing amino acid compositional heterogeneity among taxa—421

a phenomenon that permeates our dataset (Figure 1). Compositional heterogeneity is a 422

clear violation of the phylogenetic models used in our, and previous, analyses, and 423

known to cause phylogenetic artefacts (Foster 2004). In the absence of more 424

sophisticated models for inferring deep phylogeny (i.e., those that best fit complex data), 425

the only way to counter artefacts caused by compositional heterogeneity is by removing 426

compositionally-biased sites or taxa, or recoding amino acids into reduced alphabets 427

(e.g., see Susko and Roger 2007; Heiss et al. 2018; Viklund, Ettema, and Andersson 428

2012). A combination of these strategies reveals that the Rickettsiales sensu lato (i.e., 429


the Rickettsiales and Holosporales) is polyphyletic. Our analyses suggest that the 430

Holosporales is derived within the Rhodospirillales, and that therefore this taxon should 431

be lowered in rank and renamed the Holosporaceae family (see Figure 2B and 3). The 432

same methods suggest that the Rhodospirillales might indeed be a paraphyletic order 433

and that the Geminicoccaceae could be a separate lineage that is sister to the 434

Caulobacteridae (e.g., Figure 2B). These two results, combined with our broader 435

sampling, reorganize the internal phylogenetic structure of the Rhodospirillales and 436

show that its diversity can be grouped into at least five well-supported major families 437

(Figure 3). 438

In 16S rRNA gene trees, the Holosporales has most often been allied to the 439

Rickettsiales (Montagna et al. 2013; Hess, Suthaus, and Melkonian 2015). The 440

apparent diversity of this group has quickly increased in recent years as more and more 441

intracellular bacteria living within protists have been described (e.g., Hess, Suthaus, and 442

Melkonian 2015; Szokoli et al. 2016; Eschbach et al. 2009; Boscaro et al. 2013). An 443

endosymbiotic lifestyle is shared by all members of the Holosporales and is also shared 444

with all those that belong to the Rickettsiales. Thus, it had been reasonable to accept 445

their shared ancestry as suggested by some 16S rRNA gene trees (e.g., Montagna et 446

al. 2013; Santos and Massard 2014; Hess, Suthaus, and Melkonian 2015). Apparent 447

strong support for the monophyly of the Rickettsiales and the Holosporales recently 448

came from some multi-gene trees by Wang and Wu (2014, 2015) who expanded 449

sampling for the Holosporales. However, an alternative placement for the Holosporales 450

as sister to the Caulobacteridae has been reported by Ferla et al. (2013) based on 451

rRNA genes, by Georgiades et al. (2011) based on 65 genes, by Schulz et al. (2015)

based on 139 genes, as well as by Wang and Wu (2015) based on 26, 29, or 200 genes 453

(see the supplementary information in Wang and Wu, 2015). This placement was 454

acknowledged by Szokoli et al. (2016), who formally established the order Holosporales. Most recently, Martijn et al. (2018), who used strategies to reduce compositional heterogeneity, recovered, similarly to Wang and Wu (2015), a number of placements for the Holosporales within the Alphaproteobacteria; however, these different placements for the Holosporales were poorly supported. Here we provide

strong evidence for the hypothesis that the Holosporales is not related to the 460


Rickettsiales, as suggested earlier (Georgiades et al. 2011; Ferla et al. 2013; Szokoli et 461

al. 2016). The Rickettsiales sensu lato is polyphyletic. We show that the Holosporales is 462

artefactually attracted to the Rickettsiales (e.g., Figure 2A), but as compositional bias is increasingly alleviated (through site removal and recoding), it moves further away from them (Figure 2B). The Holosporales is placed within the Rhodospirillales as sister to

the family Azospirillaceae (Figure 3). The similar lifestyles of the Holosporales and 466

Rickettsiales, as well as other features like the presence of an ATP/ADP translocase 467

(Wang and Wu 2014), are therefore likely the outcome of convergent evolution. 468

A derived origin of the Holosporales has important implications for understanding the 469

origin of mitochondria and the nature of their ancestor. Wang and Wu (2014, 2015) 470

proposed that mitochondria are phylogenetically embedded within the Rickettsiales 471

sensu lato. In their trees, mitochondria were sister to a clade formed by the 472

Rickettsiaceae, Anaplasmataceae and ‘Candidatus Midichloriaceae’, and the 473

Holosporales was itself sister to all of them. This phylogenetic placement for 474

mitochondria suggested that the ancestor of mitochondria was an intracellular parasite 475

(Wang and Wu, 2014). But if the Holosporales is a derived group of rhodospirillaleans 476

as we show here (see Figure 3), then the argument that mitochondria necessarily 477

evolved from parasitic alphaproteobacteria no longer holds. While the sisterhood of 478

mitochondria and the Rickettsiales sensu stricto is still a possibility, such a relationship 479

does not imply that the two groups shared a parasitic common ancestor (i.e., a parasitic 480

ancestry for mitochondria). The most recent analyses done by Martijn et al. (2018) 481

suggest that mitochondria are sister to all known alphaproteobacteria, also suggesting 482

their non-parasitic ancestry. Our study, and that of Martijn et al., thus complement each 483

other and support the view that mitochondria most likely evolved from ancestral free-484

living alphaproteobacteria (contra Sassera et al. 2011; Wang and Wu 2014, 2015). 485

The order Rhodospirillales is quite diverse and includes many purple nonsulfur bacteria 486

as well as all magnetotactic bacteria within the Alphaproteobacteria. The 487

Rhodospirillales is sister to all other orders in the Caulobacteridae, and has historically 488

been subdivided into two families: the Rhodospirillaceae and the Acetobacteraceae. 489

Recently, a new family, the Geminicoccaceae, was established for the Rhodospirillales 490


(Proença et al. 2017). However, some of our analyses suggest that the 491

Geminicoccaceae might be sister to all other Caulobacteridae (e.g., Figure 2B and 3B). 492

This phylogenetic pattern, therefore, suggests that the Rhodospirillales may be a 493

paraphyletic order. The placement of the Geminicoccaceae as sister to the 494

Caulobacteridae needs to be further tested once more sequenced diversity for this 495

group becomes available; if it were to be confirmed, the Geminicoccaceae should be 496

elevated to the order level. Whereas the Acetobacteraceae is phylogenetically well-497

defined, there has been considerable uncertainty about the Rhodospirillaceae (e.g., 498

Ferla et al., 2013), primarily because of poor sampling and a lack of resolution provided 499

by the 16S rRNA gene. We subdivide the Rhodospirillaceae sensu lato into three 500

subgroups (Figure 3). We restrict the Rhodospirillaceae sensu stricto to the subgroup 501

that is sister to the Acetobacteraceae (Figure 3). The other two subgroups are the

Rhodovibriaceae and the Azospirillaceae; the latter is sister to the Holosporaceae 503

(Figure 3). 504

Based on our fairly robust phylogenetic patterns, we have updated the higher-level 505

taxonomy of the Alphaproteobacteria (Figure 4). We exclude the Magnetococcales from 506

the Alphaproteobacteria class because of its divergent nature (e.g., see Figure 1 in 507

Esser, Martin, and Dagan 2007 which shows that many of Magnetococcus’ genes are 508

more similar to those of beta-, and gammaproteobacteria). In agreement with its 509

intermediate phylogenetic placement, we endorse the Magnetococcia class as 510

proposed by Parks et al. (2018). At the highest level we define the Alphaproteobacteria 511

class as comprising two subclasses sensu Ferla et al. (2013), the Rickettsidae and the 512

Caulobacteridae. The former contains the Rickettsiales, and the latter contains all other 513

orders, which are primarily and ancestrally free-living alphaproteobacteria. The order 514

Rickettsiales comprises three families as previously defined, the Rickettsiaceae, the 515

Anaplasmataceae, and the ‘Candidatus Midichloriaceae’. On the other hand, the 516

Caulobacteridae is composed of seven phylogenetically well-supported orders: the 517

Rhodospirillales, Sneathiellales, Sphingomonadales, Pelagibacterales, 518

Rhodobacterales, Caulobacterales and Rhizobiales. Among the many species claimed 519

to represent new order-level lineages on the basis of 16S rRNA gene trees (Cho and 520

Giovannoni 2003; Kwon et al. 2005; Kurahashi et al. 2008; Wiese et al. 2009; Harbison 521


et al. 2017), only Sneathiella deserves order-level status (Kurahashi et al. 2008), since 522

all others have derived placements in our trees and those published by others (Williams 523

et al. 2012; Bazylinski et al. 2013; Venkata Ramana et al. 2013; Harbison et al. 2017). 524

The Rhodospirillales order comprises six families, three of which are new, namely the 525

Holosporaceae, Azospirillaceae and Rhodovibriaceae (Figure 4). This new higher-level 526

classification of the Alphaproteobacteria updates and expands those presented by Ferla 527

et al. (2013), the ‘Bergey’s Manual of Systematics of Archaea and Bacteria’ (Garrity 528

2005; Whitman 2015), and ‘The Prokaryotes’ (Rosenberg et al. 2014). The classification 529

scheme proposed here could be partly harmonized with that recently proposed by Parks 530

et al. (2018) by elevating the six families within the Rhodospirillales to the order level; the phylograms by Parks et al. (2018), however, are in conflict with those shown here

and many of their proposed taxa are as well. 533

Figure 4. A higher-level classification scheme for the Alphaproteobacteria and the Magnetococcia classes within the Proteobacteria, and the Rickettsiales and Rhodospirillales orders within the Alphaproteobacteria.

CONCLUSIONS

We employed a combination of methods to decrease compositional heterogeneity in 540

order to disrupt artefacts that arise when inferring the phylogeny of the 541

Alphaproteobacteria. This is an example of the complex nature of the historical signal 542

contained in modern genomes and the limitations of our current evolutionary models to 543

capture these signals. A robust phylogeny of the Alphaproteobacteria is a precondition 544

for placing the mitochondrial lineage. This is because including mitochondria certainly 545

exacerbates the already strong biases in the data, and therefore represents an additional source of artefacts in phylogenetic inference (as seen in Wang and Wu, 2015 where

the Holosporales is attracted by both mitochondria and the Rickettsiales). The robust 548

phylogenetic framework developed here will serve as a reference for future studies that 549

aim to place mitochondria and novel not-yet-cultured environmental diversity within the 550

Alphaproteobacteria. 551

TAXON DESCRIPTIONS

Rickettsidae emend. (Alphaproteobacteria) Rickettsia is the type genus of the subclass. 553

The Rickettsidae subclass is here emended by redefining its circumscription so it

remains monophyletic by excluding the Pelagibacterales order. The emended 555

Rickettsidae subclass within the Alphaproteobacteria class is defined based on 556

phylogenetic analyses of 200 genes which are predominantly single-copy and vertically-inherited (unlikely laterally transferred) when compositional heterogeneity was

decreased by site removal or recoding. Phylogenetic (node-based) definition: the least 559

inclusive clade containing Anaplasma phagocytophilum HZ, Rickettsia typhi Wilmington, 560

and ‘Candidatus Midichloria mitochondrii’ IricVA. The Rickettsidae does not include: 561

Pelagibacter sp. HIMB058, ‘Candidatus Pelagibacter sp.’ IMCC9063, 562

alphaproteobacterium sp. HIMB59, Caedibacter sp. 37-49, ‘Candidatus Nucleicultrix 563

amoebiphila’ FS5, ‘Candidatus Finniella lucida’, Holospora obtusa F1, Sneathiella 564

glossodoripedis JCM 23214, Sphingomonas wittichii, and Brevundimonas subvibrioides 565

ATCC 15264. 566
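A node-based definition of this kind can be read as an operation on a rooted tree: find the smallest clade that contains every specifier, then confirm that none of the excluded taxa fall inside it. The sketch below illustrates that reading only. The tree is a toy topology with abbreviated, hypothetical leaf labels, not one of the trees inferred in this study, and the function names are assumptions made for illustration.

```python
# Toy rooted tree as nested tuples; leaves are strings.
# The topology and the abbreviated labels are illustrative assumptions only.
toy_tree = (
    ("outgroup_1", "outgroup_2"),
    (
        (("Anaplasma", "Midichloria"), "Rickettsia"),
        (("Holospora", "Azospirillum"), ("Pelagibacter", "Caulobacter")),
    ),
)


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def least_inclusive_clade(node, specifiers):
    """Return the leaf set of the smallest clade containing all specifiers."""
    specifiers = set(specifiers)
    if not specifiers <= leaves(node):
        raise ValueError("tree does not contain all specifiers")
    children = [] if isinstance(node, str) else node
    for child in children:
        # Descend while a single child still holds every specifier.
        if specifiers <= leaves(child):
            return least_inclusive_clade(child, specifiers)
    return leaves(node)


clade = least_inclusive_clade(toy_tree, ["Anaplasma", "Rickettsia", "Midichloria"])
excluded = {"Pelagibacter", "Holospora", "Caulobacter"}

print(sorted(clade))
print("excluded taxa fall outside the clade:", clade.isdisjoint(excluded))
```

On a real tree the same check would be run with the full strain designations listed in each description; the exclusion list then acts as a safeguard against a topology in which the specifiers no longer form the intended group.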

Caulobacteridae emend. (Alphaproteobacteria) Caulobacter is the type genus of the 567

subclass. The Caulobacteridae subclass is here emended by redefining its


circumscription so it remains monophyletic by including the Pelagibacterales order. The 569

emended Caulobacteridae subclass within the Alphaproteobacteria class is defined 570

based on phylogenetic analyses of 200 genes which are predominantly single-copy and 571

vertically-inherited (unlikely laterally transferred) when compositional heterogeneity

was decreased by site removal or recoding. Phylogenetic (node-based) definition: the 573

least inclusive clade containing Pelagibacter sp. HIMB058, ‘Candidatus Pelagibacter 574

sp.’ IMCC9063, alphaproteobacterium sp. HIMB59, Caedibacter sp. 37-49, ‘Candidatus 575

Nucleicultrix amoebiphila’ FS5, ‘Candidatus Finniella lucida’, Holospora obtusa F1, 576

Sneathiella glossodoripedis JCM 23214, Sphingomonas wittichii, and Brevundimonas 577

subvibrioides ATCC 15264. The Caulobacteridae does not include: Anaplasma 578

phagocytophilum HZ, Rickettsia typhi Wilmington, and ‘Candidatus Midichloria 579

mitochondrii’ IricVA.

Azospirillaceae fam. nov. (Rhodospirillales, Alphaproteobacteria) Azospirillum is the 581

type genus of the family. This new family within the Rhodospirillales order is defined 582

based on phylogenetic analyses of 200 genes which are predominantly single-copy and 583

vertically-inherited (unlikely laterally transferred). Phylogenetic (node-based) definition: 584

the least inclusive clade containing Micavibrio aeruginoavorus ARL-13, Rhodocista 585

centenaria SW, and Inquilinus limosus DSM 16000. The Azospirillaceae does not 586

include: Rhodovibrio salinarum DSM 9154, ‘Candidatus Puniceispirillum marinum’ IMCC 587

1322, Rhodospirillum rubrum ATCC 11170, Terasakiella pusilla DSM 6293, Acidiphilium 588

angustum ATCC 49957, and Elioraea tepidiphila DSM 17972. 589

Rhodovibriaceae fam. nov. (Rhodospirillales, Alphaproteobacteria) Rhodovibrio is the 590

type genus of the family. This new family within the Rhodospirillales order is defined 591

based on phylogenetic analyses of 200 genes which are predominantly single-copy and 592

vertically-inherited (unlikely laterally transferred). Phylogenetic (node-based) definition: 593

the least inclusive clade containing Rhodovibrio salinarum DSM 9154, Kiloniella 594

laminariae DSM 19542, Oceanibaculum indicum P24, Thalassobaculum salexigens 595

DSM 19539 and ‘Candidatus Puniceispirillum marinum’ IMCC 1322. The 596

Rhodovibriaceae does not include: Rhodospirillum rubrum ATCC 11170, Terasakiella


pusilla DSM 6293, Rhodocista centenaria SW, Micavibrio aeruginoavorus ARL-13, 598

Acidiphilium angustum ATCC 49957, and Elioraea tepidiphila DSM 17972. 599

Rhodospirillaceae emend. (Rhodospirillales, Alphaproteobacteria) Rhodospirillum is the 600

type genus of the family. The Rhodospirillaceae family is here emended by redefining its

circumscription so it remains monophyletic. The emended Rhodospirillaceae family 602

within the Rhodospirillales order is defined based on phylogenetic analyses of 200 603

genes which are predominantly single-copy and vertically-inherited (unlikely laterally 604

transferred). Phylogenetic (node-based) definition: the least inclusive clade containing 605

Rhodospirillum rubrum ATCC 11170, Roseospirillum parvum 930l, Magnetospirillum 606

magneticum AMB-1 and Terasakiella pusilla DSM 6293. The Rhodospirillaceae does 607

not include: Rhodocista centenaria SW, Micavibrio aeruginoavorus ARL-13, ‘Candidatus 608

Puniceispirillum marinum’ IMCC 1322, Rhodovibrio salinarum DSM 9154, Elioraea 609

tepidiphila DSM 17972, and Acidiphilium angustum ATCC 49957. 610

Holosporaceae (Rhodospirillales, Alphaproteobacteria) Holospora is the type genus of 611

the family. The Holosporaceae family as defined here has the same taxon 612

circumscription as the Holosporales order sensu Szokoli et al. (2016), but it is here

lowered to the family level and placed within the Rhodospirillales order. The new family 614

rank-level for this group is based on the phylogenetic analysis of 200 genes, which are 615

predominantly single-copy and vertically-inherited (unlikely laterally transferred), when 616

compositional heterogeneity was decreased by site removal or recoding (and coupled

to the removal of the long-branching taxa Pelagibacterales and Rickettsiales). The 618

family contains three subfamilies (lowered in rank from a former family level) and one 619

formally-undescribed clade, namely, the Holosporodeae, ‘Candidatus Paracaedibacteriodeae’, ‘Candidatus Hepatincolodeae’, and the Caedibacter-Nucleicultrix clade.

METHODS

Genome sequencing

Cultures of Viridiraptor invadens strain Virl02, the host of ‘Candidatus Finniella 625

inopinata’, were grown on the filamentous green alga Zygnema pseudogedeanum strain 626


CCAC 0199 as described in Hess and Melkonian (2013). Once the algal food was

depleted, Viridiraptor cells were harvested by filtration through a cell strainer (mesh size 628

40 µm to remove algal cell walls) and centrifugation (~1,000 g for 15 min). For short-629

read sequencing, DNA extraction of total gDNA was carried out with the ZR 630

Fungal/Bacterial DNA MicroPrep Kit (Zymo Research) using a BIO101/Savant FastPrep 631

FP120 high-speed bead beater and 20 µL of proteinase K (20 mg/mL). A sequencing

library was made using the NEBNext Ultra II DNA Library Prep Kit (New England 633

Biolabs). Paired-end DNA sequencing libraries were sequenced with an Illumina MiSeq 634

instrument (Dalhousie University; Canada). (number of reads=3,006,282, read 635

length=150 bp). For long-read sequencing, DNA extraction was performed using a 636

CTAB and phenol-chloroform method. Total gDNA was further cleaned through a 637

QIAGEN Genomic-Tip 20/G. A sequencing library was made using the Nanopore 638

Ligation Sequencing Kit 1D (SQK-LSK108). Sequencing was done on a portable 639

MinION instrument (Oxford Nanopore Technologies). (total bases=191,942,801 bp, 640

number of reads=73,926, longest read=32,236 bp, mean read length=2,596 bp, mean 641

read quality=9.4). 642

Peranema trichophorum strain CCAP 1260/1B was obtained from the Culture Collection 643

of Algae and Protozoa (CCAP, Oban, Scotland) and grown in liquid Knop media plus 644

egg yolk crystals. Total gDNA was extracted following Lang and Burger (2007). A 645

paired-end sequencing library was made using a TruSeq DNA Library Prep Kit 646

(Illumina). DNA sequencing libraries were sequenced with an Illumina MiSeq instrument 647

(Genome Quebec Innovation Centre; Canada). (number of reads=4,157,475, read 648

length=300 bp). 649

Stachyamoeba lipophora strain ATCC 50324 cells feeding on Escherichia coli were 650

harvested and then broken up with pestle and mortar in the presence of glass beads (< 651

450 µm diameter). Total gDNA was extracted using the QIAGEN Genomic G20 Kit. A 652

paired-end sequencing library was made using a TruSeq DNA Library Prep Kit 653

(Illumina). DNA sequencing libraries were sequenced with an Illumina MiSeq instrument 654

(Genome Quebec Innovation Centre; Canada). (number of reads=35,605,415, read 655

length=100 bp). 656


Genome assembly and annotation

Short sequencing reads produced in an Illumina MiSeq from Viridiraptor invadens, 658

Peranema trichophorum, and Stachyamoeba lipophora were first assessed with 659

FASTQC v0.11.6 and then, based on its reports, trimmed with Trimmomatic v0.32 660

(Bolger, Lohse, and Usadel 2014) using the options: HEADCROP:16 LEADING:30 661

TRAILING:30 MINLEN:36. Illumina adapters were similarly removed with Trimmomatic 662

v0.32 using the option ILLUMINACLIP. Long sequencing reads produced in a Nanopore 663

MinION instrument from Viridiraptor invadens were basecalled with Albacore v2.1.7, 664

adapters were removed with Porechop v0.2.3, lambda phage reads were removed with 665

NanoLyse v0.5.1, quality filtering was done with NanoFilt v2.0.0 (with the options ‘--666

headcrop 50 -q 8 -l 1000’), and identity filtering against the high-quality short Illumina 667

reads was done with Filtlong v0.2.0 (and the options ‘--keep_percent 90 --trim --split 500 668

--length_weight 10 --min_length 1000’). Statistics were calculated throughout the read 669

processing workflow with NanoStat v0.8.1 and NanoPlot v1.9.1. A hybrid co-assembly 670

of both processed Illumina short reads and Nanopore long reads from Viridiraptor 671

invadens was done with SPAdes v3.6.2 (Bankevich et al. 2012). Assemblies of the 672

Illumina short reads from Peranema trichophorum and Stachyamoeba lipophora were

separately done with SPAdes v3.6.2 (Bankevich et al. 2012). The resulting assemblies 674

for both Viridiraptor invadens and Peranema trichophorum were later separately 675

processed with the Anvi’o v2.4.0 pipeline (Eren et al. 2015) and refined genome bins 676

corresponding to ‘Candidatus Finniella inopinata’ and the Peranema-associated 677

rickettsialean were isolated primarily based on tetranucleotide sequence composition 678

and taxonomic affiliation of its contigs. A single contig corresponding to the genome of 679

the Stachyamoeba-associated rickettsialean was obtained from its assembly and this 680

was circularized by collapsing the overlapping ends of the contig. Gene prediction and genome annotation were carried out with Prokka v1.13 (see Table 1).

Dataset assembly (taxon and gene selection)

The selection of 120 taxa was largely based on the phylogenetically diverse set of 684

alphaproteobacteria determined by Wang and Wu (2015). To this set of taxa, recently 685

sequenced and divergent unaffiliated alphaproteobacteria were added, as well as those 686


claimed to constitute novel order-level taxa. Some other groups, like the 687

Pelagibacterales, Rhodospirillales and the Holosporales, were expanded to better 688

represent their diversity. A set of four betaproteobacteria and four 689

gammaproteobacteria were used as the outgroup (see Figure 2-figure supplement 6 for

taxon names; see Supplementary file 2-table S3 for accession numbers). 691

A set of 200 gene markers (54,400 sites; 9.03% missing data, see Figure 2-figure 692

supplement 6) defined by Phyla-AMPHORA was used (Wang and Wu 2013). The genes 693

are single-copy and predominantly vertically-inherited as assessed by congruence 694

among them (Wang and Wu 2013). In brief, Phyla-AMPHORA searches for each 695

marker gene using a profile Hidden Markov Model (HMM), then aligns the best hits to 696

the profile HMM using hmmalign of the HMMER suite, and then trims the alignments 697

using pre-computed quality scores (the mask) previously generated using the 698

probabilistic masking program ZORRO (Wu, Chatterji, and Eisen 2012; Wang and Wu 699

2013). Phylogenetic trees for each marker gene were inferred from the trimmed multiple 700

alignments in IQ-TREE v1.5.5 (Minh, Nguyen, and von Haeseler 2013; Nguyen et al. 701

2015) under the LG4X+F model. Single-gene trees were examined

individually to remove distant paralogues, contaminants or laterally transferred genes. 703

All this was done before concatenating the single-gene alignments into a supermatrix 704

with SequenceMatrix v1.8 (Vaidya, Lohman, and Meier 2011). Another smaller dataset of 40 compositionally-homogeneous genes (5,570 sites; 5.98% missing data) was built by selecting the least compositionally heterogeneous genes from the larger 200 gene set according to compositional homogeneity tests performed in P4 (Foster 2004; see Supplementary file 2-table S4 for a list of the 40 most compositionally homogeneous

genes). This was done as an alternative way to overcome the strong compositional 710

heterogeneity observed in datasets for the Alphaproteobacteria with a broad selection of 711

taxa. In brief, the P4 tests rely on simulations based on a provided tree (here inferred for 712

each gene under the model LG4X+F in IQ-TREE) and a model (LG+F+G4 available in 713

P4) to obtain proper null distributions to which to compare the $\chi^2$ statistic. Most standard tests for compositional homogeneity (those that do not rely on simulating the

data on a given tree) ignore correlation due to phylogenetic relatedness, and can suffer 716

from a high probability of false negatives (Foster 2004). 717


Variations of our full set were made to specifically assess the placement of each long-718

branching and compositionally-biased group individually. In other words, each group 719

with comparatively long branches (the Rickettsiales, Pelagibacterales, Holosporales, 720

and alphaproteobacterium sp. HIMB59) was analyzed in isolation, i.e., in the absence of 721

other long-branching and compositionally-biased taxa. This was done with the purpose 722

of reducing the potential artefactual attraction among these groups. Taxon removal was 723

done in addition to compositionally-biased site removal and data recoding into reduced 724

character-state alphabets (for a summary of the different methodological strategies 725

employed see Figure 2-figure supplement 2). 726

Removal of compositionally-biased and fast-evolving sites

As an effort to reduce artefacts in phylogenetic inference from our dataset (which might 728

stem from extreme divergence in the evolution of the Alphaproteobacteria), we removed 729

sites estimated to be highly compositionally heterogeneous or fast evolving. The 730

compositional heterogeneity of a site was estimated by using a metric intended to 731

measure the degree of disparity between the most %AT-rich taxa and all others. Taxa 732

were ordered from lowest to highest proteome GARP:FIMNKY ratios; ‘GARP’ amino 733

acids are encoded by %GC-rich codons, whereas ‘FIMNKY’ amino acids are encoded 734

by %AT-rich codons. The resulting plot was visually inspected and a GARP:FIMNKY 735

ratio cutoff of 1.06 (which represented a discontinuity or gap in the distribution which 736

separated the long-branching and compositionally-biased taxa Pelagibacterales, 737

Holosporales and Rickettsiales from all others) was chosen to divide the dataset into 738

low GARP:FIMNKY (or %AT-rich) and higher GARP:FIMNKY (or ‘GC-rich’) taxa (Figure

2-figure supplement 7). Next, we determined the degree of compositional bias per site 740

(ɀ) for the frequencies of both FIMNKY and GARP amino acids between the %AT-rich 741

and all other (‘GC-rich’) alphaproteobacteria. To calculate this metric for each site the 742

following formula was used: 743

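$$ɀ = \left(\pi_{\mathrm{FIMNKY}}^{\%\text{AT-rich}} - \pi_{\mathrm{FIMNKY}}^{\%\text{GC-rich}}\right) + \left(\pi_{\mathrm{GARP}}^{\%\text{GC-rich}} - \pi_{\mathrm{GARP}}^{\%\text{AT-rich}}\right)$$

where $\pi_{\mathrm{FIMNKY}}$ and $\pi_{\mathrm{GARP}}$ are the sum of the frequencies for FIMNKY and GARP amino acids at a site, respectively, for either ‘%AT-rich’ or ‘%GC-rich’ taxa. According to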

this metric, higher values measure a greater disparity between %AT-rich 746


alphaproteobacteria and all others; a measure of compositional heterogeneity or bias 747

per site. The most compositionally heterogeneous sites according to ɀ were 748

progressively removed using the software SiteStripper (Verbruggen 2018) in increments 749

of 10%. We also progressively removed the fastest evolving sites in increments of 10%. 750

Conditional mean site rates were estimated under the LG+C60+F+R6 model in IQ-751

TREE v1.5.5 using the ‘-wsr’ flag (Nguyen et al. 2015). 752
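The site-removal procedure described above can be sketched in a few lines of Python. This is not the pipeline used in the study, where site removal was done with SiteStripper. It is a minimal illustration under several assumptions: the alignment is a toy dictionary of equal-length strings with hypothetical taxon names, gap and ambiguity characters are simply ignored when computing frequencies, and the function names are invented for this example. Only the GARP and FIMNKY amino acid classes, the 1.06 ratio cutoff, and the form of the per-site metric are taken from the text.

```python
from collections import Counter

FIMNKY = set("FIMNKY")   # amino acids encoded by AT-rich codons
GARP = set("GARP")       # amino acids encoded by GC-rich codons
RATIO_CUTOFF = 1.06      # GARP:FIMNKY cutoff reported in the text


def garp_fimnky_ratio(sequence):
    """GARP:FIMNKY ratio for one taxon's concatenated sequence."""
    counts = Counter(sequence)
    garp = sum(counts[aa] for aa in GARP)
    fimnky = sum(counts[aa] for aa in FIMNKY)
    return garp / fimnky if fimnky else float("inf")


def class_frequency(column, residues):
    """Fraction of observed (non-gap) characters that belong to `residues`."""
    observed = [c for c in column if c not in "-?X"]  # assumption: skip gaps
    if not observed:
        return 0.0
    return sum(c in residues for c in observed) / len(observed)


def site_bias(alignment, at_rich, site):
    """Per-site bias: FIMNKY excess in AT-rich taxa plus GARP excess in the rest."""
    at_col = [seq[site] for name, seq in alignment.items() if name in at_rich]
    gc_col = [seq[site] for name, seq in alignment.items() if name not in at_rich]
    return ((class_frequency(at_col, FIMNKY) - class_frequency(gc_col, FIMNKY))
            + (class_frequency(gc_col, GARP) - class_frequency(at_col, GARP)))


def strip_top_sites(alignment, scores, fraction):
    """Remove the given fraction of sites with the highest scores."""
    n_remove = int(round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:n_remove])
    return {name: "".join(c for i, c in enumerate(seq) if i not in drop)
            for name, seq in alignment.items()}


# Toy alignment with hypothetical taxon names.
alignment = {
    "taxonA": "KGIAF",
    "taxonB": "NGKAF",
    "taxonC": "GGRAF",
    "taxonD": "AGPAY",
}
at_rich = {name for name, seq in alignment.items()
           if garp_fimnky_ratio(seq) < RATIO_CUTOFF}
n_sites = len(next(iter(alignment.values())))
scores = [site_bias(alignment, at_rich, i) for i in range(n_sites)]
stripped = strip_top_sites(alignment, scores, 0.4)

print(sorted(at_rich), scores, stripped)
```

The same `strip_top_sites` helper could be applied to per-site rate estimates instead of bias scores to mimic the progressive removal of the fastest-evolving sites, with the fraction stepped up in increments of 0.1.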

Data recoding

Our datasets were recoded into four- and six-character state amino acid alphabets 754

using dataset-specific recoding schemes aimed at minimizing compositional 755

heterogeneity in the data (Susko and Roger 2007). The program minmax-chisq, 756

which implements the methods of Susko and Roger (2007), was used to find the best 757

recoding schemes—please see Figure 3, Figure 2-figure supplement 4 and Figure 3-758

figure supplement 1-6, and Figure 3-figure supplement 8 legends for the specific 759

recoding schemes used for each dataset. The approach uses the chi-squared (X2) 760

statistic for a test of homogeneity of frequencies as a criterion function for determining 761

the best recoding schemes. Let 𝜋𝑖 denote the frequency of bin 𝑖 for the recoding 762

scheme currently under consideration. For instance, suppose the amino acids were 763

recoded into four bins, 764

RNCM EHIPTWV ADQLKS GFY. 765

Then 𝜋4would be the frequency with which the amino acids G, F or Y were observed. 766

Let 𝜋𝑖𝑠 be the frequency of bin 𝑖 for the 𝑠th taxa. Then the X2 statistic for the null 767

hypothesis that the frequencies are constant, over taxa, against the unrestricted 768

hypothesis is 769

$$t_s = \sum_i \frac{(\pi_{is} - \pi_i)^2}{\pi_i}$$

The $X^2$ statistic provides a measure of how different the frequencies for the $s$th taxon are from the average frequencies. The maximum $t_s$ over $s$ is taken as an overall measure of how heterogeneous the frequencies are for a given recoding scheme. The minmax-chisq program searches through recoding schemes, moving amino acids from one bin to another, to try to minimize $\max_s t_s$ (Susko and Roger 2007).
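The criterion evaluated for one candidate recoding scheme can be sketched in Python as below. This is only an illustration of the statistic, not the minmax-chisq program, and it does not include the search over schemes. The function names are hypothetical, and taking the average frequencies as those pooled over all taxa is an assumption of this sketch.

```python
from collections import Counter


def bin_frequencies(seq, scheme):
    """Frequency of each bin of `scheme` (a list of strings) in one sequence."""
    lookup = {aa: i for i, members in enumerate(scheme) for aa in members}
    counts = Counter(lookup[aa] for aa in seq if aa in lookup)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no recodable amino acids")
    return [counts[i] / total for i in range(len(scheme))]


def max_heterogeneity(alignment, scheme):
    """Return (max over taxa of t_s, dict of t_s per taxon).

    t_s = sum_i (pi_is - pi_i)^2 / pi_i, with pi_i taken from the residues
    pooled over all taxa (an assumption of this sketch).
    """
    pooled = bin_frequencies("".join(alignment.values()), scheme)
    t = {}
    for name, seq in alignment.items():
        freqs = bin_frequencies(seq, scheme)
        # Bins that are empty in the pooled data contribute nothing.
        t[name] = sum((f - p) ** 2 / p for f, p in zip(freqs, pooled) if p > 0)
    return max(t.values()), t


S4 = ["RNCM", "EHIPTWV", "ADQLKS", "GFY"]  # the example scheme from the text
worst, per_taxon = max_heterogeneity({"a": "RRRR", "b": "GGGG"}, S4)
```

A search procedure would call `max_heterogeneity` on each candidate scheme and retain the one with the smallest returned maximum.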

Phylogenetic inference

Phylogenies were primarily inferred under the maximum-likelihood framework using IQ-TREE v1.5.5 (Minh, Nguyen, and von Haeseler 2013; Nguyen et al. 2015). ModelFinder in IQ-TREE v1.5.5 (Kalyaanamoorthy et al. 2017) was used to assess, on a maximum-likelihood tree, which empirical amino acid matrix (e.g., JTT, WAG, or LG) best fits our full dataset of 120 taxa and 200 conserved single-copy marker genes (see Supplementary file 2-table S5 and Supplementary file 2-table S6). We first inferred guide trees (for a PMSF analysis) under the LG+C60+F+R6 model, which comprises the LG empirical matrix, empirical frequencies estimated from the data (F), six rate categories of the FreeRate model to account for rate heterogeneity across sites (R6), and a mixture model with 60 amino acid profiles (C60) to account for compositional heterogeneity across sites. Because the computational power and time required to properly explore the whole tree space were too high for such a large dataset and complex model, constrained tree searches were employed to obtain these initial guide trees (see Figure 2-figure supplement 6 for the constraint tree). Many shallow nodes were constrained if they received maximum UFBoot and SH-aLRT support in an LG+PMSF(C60)+F+R6 analysis. All deep nodes, which are those relevant to the questions addressed here, were left unconstrained (Figure 2-figure supplement 6). The guide trees were then used together with a dataset-specific mixture model, ES60, to estimate site-specific amino acid profiles, or a PMSF (Posterior Mean Site Frequency Profiles) model, that best account for compositional heterogeneity across sites (Wang et al. 2018). The dataset-specific empirical mixture model ES60 also has 60 categories but, unlike the general C60, was estimated directly from our large dataset of 200 genes and 120 alphaproteobacteria (and outgroup) using the methods described in Susko, Lincker, and Roger (2018). ModelFinder (Kalyaanamoorthy et al. 2017) suggests that LG+ES60+F+R6 is the best-fitting model; the R6 component, however, considerably increases the computational burden (see Supplementary file 2-table S6 and Supplementary file 2-table S7). Final trees were inferred using the LG+PMSF(ES60)+F+R6 model and a fully unconstrained tree search. Those datasets that produced the most novel topologies under maximum likelihood were further analyzed under a Bayesian framework using PhyloBayes MPI v1.7 and the CAT-Poisson+Γ4 model (Lartillot and Philippe 2004; Lartillot, Lepage, and Blanquart 2009). This model allows for a very large number of classes to account for compositional heterogeneity across sites and, unlike the more complex CAT-GTR+Γ4 model, allows convergence between MCMC chains to be achieved more easily. PhyloBayes MCMC chains were run for at least 10,000 cycles, until convergence between the chains was achieved and the largest discrepancy (i.e., the maxdiff parameter) was ≤ 0.4 (except for the untreated dataset analyzed in Figure 2-figure supplement 3A; see Supplementary file 2-table S8 for several summary statistics for each PhyloBayes MCMC chain, including discrepancy and effective sample size values). A consensus tree was generated from two PhyloBayes MCMC chains using a burn-in of 500 trees and sub-sampling every 10 trees.
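To make the convergence criterion concrete, the following Python sketch computes a maxdiff-style statistic, understood here as the largest difference in bipartition frequencies between two chains after burn-in and sub-sampling. It is a simplified illustration and not the PhyloBayes code. Each sampled tree is assumed to have been reduced beforehand to a set of bipartitions, and the function names are hypothetical.

```python
from collections import Counter


def bipartition_freqs(samples, burnin=500, every=10):
    """Frequency of each bipartition among the retained trees of one chain.

    `samples` is a list of trees, each given as a set of bipartitions
    (frozensets of taxon names on one side of a branch).
    """
    kept = samples[burnin::every]  # discard burn-in, then thin
    if not kept:
        raise ValueError("no trees left after burn-in and sub-sampling")
    counts = Counter(bp for tree in kept for bp in tree)
    return {bp: n / len(kept) for bp, n in counts.items()}


def maxdiff(chain_a, chain_b, burnin=500, every=10):
    """Largest absolute difference in bipartition frequency between two chains."""
    fa = bipartition_freqs(chain_a, burnin, every)
    fb = bipartition_freqs(chain_b, burnin, every)
    return max(
        (abs(fa.get(bp, 0.0) - fb.get(bp, 0.0)) for bp in set(fa) | set(fb)),
        default=0.0,
    )
```

The defaults mirror the burn-in of 500 trees and the sub-sampling of every 10 trees stated above; a returned value at or below the chosen threshold would be read as acceptable agreement between the chains.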

Datasets recoded into four-character state alphabets were analyzed using IQ-TREE v1.5.5 and the model GTR+ES60S4+F+R6. ES60S4 is an adaptation of the dataset-specific empirical mixture model ES60 to four-character states. It is obtained by adding the frequencies of the amino acids that belong to each bin in the dataset-specific four-character state scheme S4 (see Data recoding for details). Datasets recoded into six-character state alphabets were analyzed using PhyloBayes MPI v1.7 and the CAT-Poisson+Γ4 model. Maximum-likelihood analyses with a six-state recoding scheme could not be performed because IQ-TREE currently only supports amino acid datasets recoded into four-character states.
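The collapse of one 20-state frequency profile into a four-state profile, as used to derive ES60S4 from ES60, can be sketched as follows. The function name and the dictionary representation of a profile are assumptions for illustration; the actual ES60 profiles are not reproduced here.

```python
AMINO_ACIDS = frozenset("ARNDCQEGHILKMFPSTWYV")


def collapse_profile(profile, scheme):
    """Sum a 20-state profile (dict: amino acid -> frequency) into bins.

    `scheme` is a list of strings, one per bin, that together must contain
    each of the 20 amino acids exactly once.
    """
    members = "".join(scheme)
    if len(members) != 20 or set(members) != AMINO_ACIDS:
        raise ValueError("scheme must place each amino acid in exactly one bin")
    return [sum(profile[aa] for aa in bin_members) for bin_members in scheme]


S4 = ["RNCM", "EHIPTWV", "ADQLKS", "GFY"]  # example scheme from Data recoding
uniform = {aa: 0.05 for aa in AMINO_ACIDS}  # toy profile, not an ES60 class
collapsed = collapse_profile(uniform, S4)
```

Applying the same function to each of the 60 mixture classes would yield a 60-class, four-state mixture; because the bins partition the alphabet, each collapsed profile still sums to one.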

Other analyses

The 16S rRNA genes of ‘Candidatus Finniella inopinata’ and of the presumed endosymbionts of Peranema trichophorum and Stachyamoeba lipophora were identified with the RNAmmer 1.2 server and BLAST searches. A set of 16S rRNA genes from diverse rickettsialeans and holosporaleans, with other alphaproteobacteria as outgroup, was retrieved from NCBI GenBank. The selection was based on Hess et al. (2016), Szokoli et al. (2016) and Wang and Wu (2015). Environmental sequences for uncultured and undescribed rickettsialeans were retrieved by keeping the 50 best hits from a BLAST search of our three novel 16S rRNA genes against the NCBI GenBank non-redundant (nr) database. The sequences were aligned with the SILVA aligner SINA v1.2.11, and all-gap sites were subsequently removed. Phylogenetic analyses of this alignment were performed in IQ-TREE v1.5.5 using the GTR+F+R8 model.
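The removal of all-gap sites can be illustrated with a short Python sketch. The function name is hypothetical, and treating both '-' and '.' as gap symbols is an assumption; the tool actually used for this step is not stated in the text.

```python
def drop_all_gap_columns(alignment, gap_chars="-."):
    """Remove columns in which every sequence has a gap symbol.

    `alignment` is a dict mapping sequence names to equal-length strings.
    """
    seqs = list(alignment.values())
    keep = [
        j
        for j in range(len(seqs[0]))
        if any(seq[j] not in gap_chars for seq in seqs)
    ]
    return {name: "".join(seq[j] for j in keep) for name, seq in alignment.items()}
```

Columns that contain a residue in at least one sequence are retained, so the relative alignment of the remaining positions is unchanged.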

A UPGMA (average-linkage) clustering of amino acid compositions based on the 200-gene set for the Alphaproteobacteria was built in MEGA 7 (Kumar, Stecher, and Tamura 2016) from a matrix of Euclidean distances between the amino acid compositions of sequences, exported from the phylogenetic software P4 (Foster 2004; http://p4.nhm.ac.uk/index.html).
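The clustering step can be sketched in plain Python: amino acid compositions are computed per taxon, Euclidean distances are taken between them, and clusters are merged by average linkage (UPGMA). This is a stand-in written for illustration and does not reproduce MEGA 7 or P4. All function names are hypothetical, and the tree is returned as nested tuples without branch lengths.

```python
import math
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Relative frequency of each of the 20 amino acids in `seq`."""
    residues = [aa for aa in seq if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(names, dist):
    """Average-linkage clustering; `dist` maps frozenset({a, b}) to a distance."""
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for c in sizes:
            # Size-weighted mean equals the average over all taxon pairs.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


def composition_tree(sequences):
    """Cluster taxa (dict: name -> concatenated protein sequence) by composition."""
    comps = {name: composition(seq) for name, seq in sequences.items()}
    dist = {
        frozenset((a, b)): euclidean(comps[a], comps[b])
        for a, b in combinations(comps, 2)
    }
    return upgma(list(comps), dist)
```

For example, `composition_tree({"x": "AAAA", "y": "AAAG", "z": "GGGG"})` groups `x` with `y` before joining `z`, because their compositions are the closest.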

Data availability

The genomes of ‘Candidatus Finniella inopinata’, the endosymbiont of Peranema trichophorum strain CCAP 1260/1B and the endosymbiont of Stachyamoeba lipophora strain ATCC 50324 were deposited in NCBI GenBank under BioProject PRJNA501864. Raw sequencing reads were deposited in the NCBI SRA archive under the same BioProject (PRJNA501864). Multi-gene datasets as well as the phylogenetic trees inferred in this study were deposited at Mendeley Data under the DOI http://dx.doi.org/10.17632/75m68dxd83.1.

ACKNOWLEDGEMENTS

Sergio A. Muñoz-Gómez is supported by a Killam Predoctoral Scholarship and a Nova Scotia Graduate Scholarship. This work was supported by Natural Sciences and Engineering Research Council (NSERC) Discovery Grants 2016-06792 to A.J.R., RGPIN/05754-2015 to C.H.S., RGPIN/05286-2014 to G.B., and RGPIN-2017-05411 to B.F.L. We thank Jon Jerlström Hultqvist and Gina Filloramo (both at Dalhousie University) for advice on long-read sequencing with the Nanopore MinION. We also thank Camilo A. Calderón-Acevedo for advice about taxonomic issues, and Franziska Szokoli for reading and commenting on a late version of this manuscript. Bruce Curtis (Dalhousie University) and Peter G. Foster (Natural History Museum of London) kindly provided technical help with bioinformatics and with the software P4, respectively. We thank Joanny Roy, Georgette Kiethega, Matus Valach, and Shona Teijeiro (all at the Université de Montréal), and Drahomira Faktora (University of South Bohemia), for help with the culturing, DNA preparation and sequencing of the endosymbiont of Peranema trichophorum. Some of the genome data used in this study were produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/) in collaboration with the user community.

COMPETING INTERESTS

No competing interests declared by the authors.

REFERENCES

Bankevich, Anton, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, et al. 2012. “SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing.” Journal of Computational Biology 19 (5): 455–77. https://doi.org/10.1089/cmb.2012.0021.

Bazylinski, Dennis A., Timothy J. Williams, Christopher T. Lefèvre, Denis Trubitsyn, Jiasong Fang, Terrence J. Beveridge, Bruce M. Moskowitz, et al. 2013. “Magnetovibrio Blakemorei Gen. Nov., Sp. Nov., a Magnetotactic Bacterium (Alphaproteobacteria: Rhodospirillaceae) Isolated from a Salt Marsh.” International Journal of Systematic and Evolutionary Microbiology 63 (5): 1824–33. https://doi.org/10.1099/ijs.0.044453-0.

Bolger, Anthony M., Marc Lohse, and Bjoern Usadel. 2014. “Trimmomatic: A Flexible Trimmer for Illumina Sequence Data.” Bioinformatics 30 (15): 2114–20. https://doi.org/10.1093/bioinformatics/btu170.

Boscaro, Vittorio, Sergei I. Fokin, Martina Schrallhammer, Michael Schweikert, and Giulio Petroni. 2013. “Revised Systematics of Holospora-like Bacteria and Characterization of ‘Candidatus Gortzia Infectiva’, a Novel Macronuclear Symbiont of Paramecium Jenningsi.” Microbial Ecology 65 (1): 255–67. https://doi.org/10.1007/s00248-012-0110-2.

Brindefalk, Björn, Thijs J. G. Ettema, Johan Viklund, Mikael Thollesson, and Siv G. E. Andersson. 2011. “A Phylometagenomic Exploration of Oceanic Alphaproteobacteria Reveals Mitochondrial Relatives Unrelated to the SAR11 Clade.” PLoS ONE 6 (9): e24457. https://doi.org/10.1371/journal.pone.0024457.

Castelli, Michele, Elena Sabaneyeva, Olivia Lanzoni, Natalia Lebedeva, Anna Maria Floriano, Stefano Gaiarsa, Konstantin Benken, et al. 2018. “The Extracellular Association of the Bacterium ‘Candidatus Deianiraea Vastatrix’ with the Ciliate Paramecium Suggests an Alternative Scenario for the Evolution of Rickettsiales.” BioRxiv, November, 479196. https://doi.org/10.1101/479196.

Cho, Jang-Cheon, and Stephen J. Giovannoni. 2003. “Parvularcula Bermudensis Gen. Nov., Sp. Nov., a Marine Bacterium That Forms a Deep Branch in the Alpha-Proteobacteria.” International Journal of Systematic and Evolutionary Microbiology 53 (Pt 4): 1031–36. https://doi.org/10.1099/ijs.0.02566-0.

Eren, A. Murat, Özcan C. Esen, Christopher Quince, Joseph H. Vineis, Hilary G. Morrison, Mitchell L. Sogin, and Tom O. Delmont. 2015. “Anvi’o: An Advanced Analysis and Visualization Platform for ‘omics Data.” PeerJ 3 (October): e1319. https://doi.org/10.7717/peerj.1319.

Eschbach, Erik, Martin Pfannkuchen, Michael Schweikert, Denja Drutschmann, Franz Brümmer, Sergei Fokin, Wolfgang Ludwig, and Hans-Dieter Görtz. 2009. “‘Candidatus Paraholospora Nucleivisitans’, an Intracellular Bacterium in Paramecium Sexaurelia Shuttles between the Cytoplasm and the Nucleus of Its Host.” Systematic and Applied Microbiology 32 (7): 490–500. https://doi.org/10.1016/j.syapm.2009.07.004.

Esser, Christian, William Martin, and Tal Dagan. 2007. “The Origin of Mitochondria in Light of a Fluid Prokaryotic Chromosome Model.” Biology Letters 3 (2): 180–84. https://doi.org/10.1098/rsbl.2006.0582.

Ettema, Thijs J. G., and Siv G. E. Andersson. 2009. “The Alpha-Proteobacteria: The Darwin Finches of the Bacterial World.” Biology Letters 5 (3): 429–32. https://doi.org/10.1098/rsbl.2008.0793.

Ferla, Matteo P., J. Cameron Thrash, Stephen J. Giovannoni, and Wayne M. Patrick. 2013. “New RRNA Gene-Based Phylogenies of the Alphaproteobacteria Provide Perspective on Major Groups, Mitochondrial Ancestry and Phylogenetic Instability.” PLoS ONE 8 (12): e83383. https://doi.org/10.1371/journal.pone.0083383.

Fitzpatrick, David A., Christopher J. Creevey, and James O. McInerney. 2006. “Genome Phylogenies Indicate a Meaningful Alpha-Proteobacterial Phylogeny and Support a Grouping of the Mitochondria with the Rickettsiales.” Molecular Biology and Evolution 23 (1): 74–85. https://doi.org/10.1093/molbev/msj009.

Foesel, Bärbel U., Anita S. Gössner, Harold L. Drake, and Andreas Schramm. 2007. “Geminicoccus Roseus Gen. Nov., Sp. Nov., an Aerobic Phototrophic Alphaproteobacterium Isolated from a Marine Aquaculture Biofilter.” Systematic and Applied Microbiology 30 (8): 581–86. https://doi.org/10.1016/j.syapm.2007.05.005.

Foster, Peter G. 2004. “Modeling Compositional Heterogeneity.” Systematic Biology 53 (3): 485–95.

Garrity, George, ed. 2005. Bergey’s Manual of Systematic Bacteriology: Volume 2: The Proteobacteria. 2nd ed. Springer US. //www.springer.com/gp/book/9780387950402.

Georgiades, Kalliopi, Mohammed-Amine Madoui, Phuong Le, Catherine Robert, and Didier Raoult. 2011. “Phylogenomic Analysis of Odyssella Thessalonicensis Fortifies the Common Origin of Rickettsiales, Pelagibacter Ubique and Reclimonas Americana Mitochondrion.” PLoS ONE 6 (9): e24857. https://doi.org/10.1371/journal.pone.0024857.

Giovannoni, Stephen J. 2017. “SAR11 Bacteria: The Most Abundant Plankton in the Oceans.” Annual Review of Marine Science 9 (January): 231–55. https://doi.org/10.1146/annurev-marine-010814-015934.

Grote, Jana, J. Cameron Thrash, Megan J. Huggett, Zachary C. Landry, Paul Carini, Stephen J. Giovannoni, and Michael S. Rappé. 2012. “Streamlining and Core Genome Conservation among Highly Divergent Members of the SAR11 Clade.” MBio 3 (5): e00252-12. https://doi.org/10.1128/mBio.00252-12.

Harbison, Austin B., Laiken E. Price, Michael D. Flythe, and Suzanna L. Bräuer. 2017. “Micropepsis Pineolensis Gen. Nov., Sp. Nov., a Mildly Acidophilic Alphaproteobacterium Isolated from a Poor Fen, and Proposal of Micropepsaceae Fam. Nov. within Micropepsales Ord. Nov.” International Journal of Systematic and Evolutionary Microbiology 67 (4): 839–44. https://doi.org/10.1099/ijsem.0.001681.

Heiss, Aaron A., Martin Kolisko, Fleming Ekelund, Matthew W. Brown, Andrew J. Roger, and Alastair G. B. Simpson. 2018. “Combined Morphological and Phylogenomic Re-Examination of Malawimonads, a Critical Taxon for Inferring the Evolutionary History of Eukaryotes.” Royal Society Open Science 5 (4): 171707. https://doi.org/10.1098/rsos.171707.


Hess, Sebastian, and Michael Melkonian. 2013. “The Mystery of Clade X: Orciraptor Gen. Nov. and Viridiraptor Gen. Nov. Are Highly Specialised, Algivorous Amoeboflagellates (Glissomonadida, Cercozoa).” Protist 164 (5): 706–47. https://doi.org/10.1016/j.protis.2013.07.003.

Hess, Sebastian, Andreas Suthaus, and Michael Melkonian. 2015. “‘Candidatus Finniella’ (Rickettsiales, Alphaproteobacteria), Novel Endosymbionts of Viridiraptorid Amoeboflagellates (Cercozoa, Rhizaria).” Applied and Environmental Microbiology 82 (2): 659–70. https://doi.org/10.1128/AEM.02680-15.

Kalyaanamoorthy, Subha, Bui Quang Minh, Thomas K. F. Wong, Arndt von Haeseler, and Lars S. Jermiin. 2017. “ModelFinder: Fast Model Selection for Accurate Phylogenetic Estimates.” Nature Methods 14 (6): 587–89. https://doi.org/10.1038/nmeth.4285.

Kumar, Sudhir, Glen Stecher, and Koichiro Tamura. 2016. “MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets.” Molecular Biology and Evolution 33 (7): 1870–74. https://doi.org/10.1093/molbev/msw054.

Kurahashi, Midori, Yukiyo Fukunaga, Shigeaki Harayama, and Akira Yokota. 2008. “Sneathiella Glossodoripedis Sp. Nov., a Marine Alphaproteobacterium Isolated from the Nudibranch Glossodoris Cincta, and Proposal of Sneathiellales Ord. Nov. and Sneathiellaceae Fam. Nov.” International Journal of Systematic and Evolutionary Microbiology 58 (Pt 3): 548–52. https://doi.org/10.1099/ijs.0.65328-0.

Kwon, Kae Kyoung, Hee-Soon Lee, Sung Hyun Yang, and Sang-Jin Kim. 2005. “Kordiimonas Gwangyangensis Gen. Nov., Sp. Nov., a Marine Bacterium Isolated from Marine Sediments That Forms a Distinct Phyletic Lineage (Kordiimonadales Ord. Nov.) in the ‘Alphaproteobacteria.’” International Journal of Systematic and Evolutionary Microbiology 55 (5): 2033–37. https://doi.org/10.1099/ijs.0.63684-0.

Lang, B. Franz, and Gertraud Burger. 2007. “Purification of Mitochondrial and Plastid DNA.” Nature Protocols 2 (3): 652–60. https://doi.org/10.1038/nprot.2007.58.

Lartillot, Nicolas, Thomas Lepage, and Samuel Blanquart. 2009. “PhyloBayes 3: A Bayesian Software Package for Phylogenetic Reconstruction and Molecular Dating.” Bioinformatics 25 (17): 2286–88. https://doi.org/10.1093/bioinformatics/btp368.

Lartillot, Nicolas, and Hervé Philippe. 2004. “A Bayesian Mixture Model for Across-Site Heterogeneities in the Amino-Acid Replacement Process.” Molecular Biology and Evolution 21 (6): 1095–1109. https://doi.org/10.1093/molbev/msh112.

Lee, Kyung-Bum, Chi-Te Liu, Yojiro Anzai, Hongik Kim, Toshihiro Aono, and Hiroshi Oyaizu. 2005. “The Hierarchical System of the ‘Alphaproteobacteria’: Description of Hyphomonadaceae Fam. Nov., Xanthobacteraceae Fam. Nov. and Erythrobacteraceae Fam. Nov.” International Journal of Systematic and Evolutionary Microbiology 55 (Pt 5): 1907–19. https://doi.org/10.1099/ijs.0.63663-0.

Luo, Haiwei. 2015. “Evolutionary Origin of a Streamlined Marine Bacterioplankton Lineage.” The ISME Journal 9 (6): 1423–33. https://doi.org/10.1038/ismej.2014.227.

Luo, Haiwei, Miklós Csűros, Austin L. Hughes, and Mary Ann Moran. 2013. “Evolution of Divergent Life History Strategies in Marine Alphaproteobacteria.” MBio 4 (4): e00373-13. https://doi.org/10.1128/mBio.00373-13.

Madigan, Michael T., Deborah O. Jung, and Michael T. Madigan. 2009. “An Overview of Purple Bacteria: Systematics, Physiology, and Habitats.” In The Purple Phototrophic Bacteria, 1–15. Advances in Photosynthesis and Respiration. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8815-5_1.

Martijn, Joran, Julian Vosseberg, Lionel Guy, Pierre Offre, and Thijs J. G. Ettema. 2018. “Deep Mitochondrial Origin Outside the Sampled Alphaproteobacteria.” Nature 557 (7703): 101–5. https://doi.org/10.1038/s41586-018-0059-5.


Minh, Bui Quang, Minh Anh Thi Nguyen, and Arndt von Haeseler. 2013. “Ultrafast Approximation for Phylogenetic Bootstrap.” Molecular Biology and Evolution 30 (5): 1188–95. https://doi.org/10.1093/molbev/mst024.

Montagna, Matteo, Davide Sassera, Sara Epis, Chiara Bazzocchi, Claudia Vannini, Nathan Lo, Luciano Sacchi, Takema Fukatsu, Giulio Petroni, and Claudio Bandi. 2013. “‘Candidatus Midichloriaceae’ Fam. Nov. (Rickettsiales), an Ecologically Widespread Clade of Intracellular Alphaproteobacteria.” Applied and Environmental Microbiology 79 (10): 3241–48. https://doi.org/10.1128/AEM.03971-12.

Nguyen, Lam-Tung, Heiko A. Schmidt, Arndt von Haeseler, and Bui Quang Minh. 2015. “IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies.” Molecular Biology and Evolution 32 (1): 268–74. https://doi.org/10.1093/molbev/msu300.

Parks, Donovan H., Maria Chuvochina, David W. Waite, Christian Rinke, Adam Skarshewski, Pierre-Alain Chaumeil, and Philip Hugenholtz. 2018. “A Standardized Bacterial Taxonomy Based on Genome Phylogeny Substantially Revises the Tree of Life.” Nature Biotechnology 36 (10): 996–1004. https://doi.org/10.1038/nbt.4229.

Proença, Diogo N., William B. Whitman, Neha Varghese, Nicole Shapiro, Tanja Woyke, Nikos C. Kyrpides, and Paula V. Morais. 2017. “Arboriscoccus Pini Gen. Nov., Sp. Nov., an Endophyte from a Pine Tree of the Class Alphaproteobacteria, Emended Description of Geminicoccus Roseus, and Proposal of Geminicoccaceae Fam. Nov.” Systematic and Applied Microbiology, December. https://doi.org/10.1016/j.syapm.2017.11.006.

Rodríguez-Ezpeleta, Naiara, and T. Martin Embley. 2012. “The SAR11 Group of Alpha-Proteobacteria Is Not Related to the Origin of Mitochondria.” PLoS ONE 7 (1): e30520. https://doi.org/10.1371/journal.pone.0030520.

Roger, Andrew J., Sergio A. Muñoz-Gómez, and Ryoma Kamikawa. 2017. “The Origin and Diversification of Mitochondria.” Current Biology 27 (21): R1177–92. https://doi.org/10.1016/j.cub.2017.09.015.

Rosenberg, Eugene, Edward F. DeLong, Stephen Lory, Erko Stackebrandt, and Fabiano Thompson, eds. 2014. The Prokaryotes: Alphaproteobacteria and Betaproteobacteria. 4th ed. Berlin Heidelberg: Springer-Verlag. //www.springer.com/gp/book/9783642301964.

Santos, Huarrisson Azevedo, and Carlos Luiz Massard. 2014. “The Family Holosporaceae.” In The Prokaryotes: Alphaproteobacteria and Betaproteobacteria, edited by Eugene Rosenberg, Edward F. DeLong, Stephen Lory, Erko Stackebrandt, and Fabiano Thompson, 237–46. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-30197-1_264.

Sassera, Davide, Nathan Lo, Sara Epis, Giuseppe D’Auria, Matteo Montagna, Francesco Comandatore, David Horner, et al. 2011. “Phylogenomic Evidence for the Presence of a Flagellum and Cbb3 Oxidase in the Free-Living Mitochondrial Ancestor.” Molecular Biology and Evolution 28 (12): 3285–96. https://doi.org/10.1093/molbev/msr159.

Susko, Edward, Léa Lincker, and Andrew J. Roger. 2018. “Accelerated Estimation of Frequency Classes in Site-Heterogeneous Profile Mixture Models.” Molecular Biology and Evolution may026. https://doi.org/10.1093/molbev/msy026.

Susko, Edward, and Andrew J. Roger. 2007. “On Reduced Amino Acid Alphabets for Phylogenetic Inference.” Molecular Biology and Evolution 24 (9): 2139–50. https://doi.org/10.1093/molbev/msm144.

Szokoli, Franziska, Michele Castelli, Elena Sabaneyeva, Martina Schrallhammer, Sascha Krenek, Thomas G. Doak, Thomas U. Berendonk, and Giulio Petroni. 2016. “Disentangling the Taxonomy of Rickettsiales and Description of Two Novel Symbionts (‘Candidatus Bealeia Paramacronuclearis’ and ‘Candidatus Fokinia Cryptica’) Sharing the Cytoplasm of the Ciliate Protist Paramecium Biaurelia.” Applied and Environmental Microbiology 82 (24): 7236–47. https://doi.org/10.1128/AEM.02284-16.

Thrash, J. Cameron, Alex Boyd, Megan J. Huggett, Jana Grote, Paul Carini, Ryan J. Yoder, Barbara Robbertse, Joseph W. Spatafora, Michael S. Rappé, and Stephen J. Giovannoni. 2011. “Phylogenomic Evidence for a Common Ancestor of Mitochondria and the SAR11 Clade.” Scientific Reports 1 (June): 13. https://doi.org/10.1038/srep00013.

Vaidya, Gaurav, David J. Lohman, and Rudolf Meier. 2011. “SequenceMatrix: Concatenation Software for the Fast Assembly of Multi-gene Datasets with Character Set and Codon Information.” Cladistics 27 (2): 171–80. https://doi.org/10.1111/j.1096-0031.2010.00329.x.

Venkata Ramana, V., S. Kalyana Chakravarthy, E. V. V. Ramaprasad, V. Thiel, J. F. Imhoff, Ch Sasikala, and Ch V. Ramana. 2013. “Emended Description of the Genus Rhodothalassium Imhoff et Al., 1998 and Proposal of Rhodothalassiaceae Fam. Nov. and Rhodothalassiales Ord. Nov.” Systematic and Applied Microbiology 36 (1): 28–32. https://doi.org/10.1016/j.syapm.2012.09.003.

Verbruggen, Heroen. 2018. “SiteStripper Version 1.03.” http://www.phycoweb.net/software.

Viklund, Johan, Thijs J. G. Ettema, and Siv G. E. Andersson. 2012. “Independent Genome Reduction and Phylogenetic Reclassification of the Oceanic SAR11 Clade.” Molecular Biology and Evolution 29 (2): 599–615. https://doi.org/10.1093/molbev/msr203.

Viklund, Johan, Joran Martijn, Thijs J. G. Ettema, and Siv G. E. Andersson. 2013. “Comparative and Phylogenomic Evidence That the Alphaproteobacterium HIMB59 Is Not a Member of the Oceanic SAR11 Clade.” PLoS ONE 8 (11): e78858. https://doi.org/10.1371/journal.pone.0078858.

Wang, Huai-Chun, Bui Quang Minh, Edward Susko, and Andrew J. Roger. 2018. “Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation.” Systematic Biology 67 (2): 216–35. https://doi.org/10.1093/sysbio/syx068.

Wang, Zhang, and Martin Wu. 2013. “A Phylum-Level Bacterial Phylogenetic Marker Database.” Molecular Biology and Evolution 30 (6): 1258–62. https://doi.org/10.1093/molbev/mst059.

Wang, Zhang, and Martin Wu. 2014. “Phylogenomic Reconstruction Indicates Mitochondrial Ancestor Was an Energy Parasite.” PLoS ONE 9 (10): e110685. https://doi.org/10.1371/journal.pone.0110685.

Wang, Zhang, and Martin Wu. 2015. “An Integrated Phylogenomic Approach toward Pinpointing the Origin of Mitochondria.” Scientific Reports 5 (January): 7949. https://doi.org/10.1038/srep07949.

Whitman, William B., ed. 2015. Bergey’s Manual of Systematics of Archaea and Bacteria. Wiley. https://onlinelibrary.wiley.com/doi/book/10.1002/9781118960608.

Wiese, Jutta, Vera Thiel, Andrea Gärtner, Rolf Schmaljohann, and Johannes F. Imhoff. 2009. “Kiloniella Laminariae Gen. Nov., Sp. Nov., an Alphaproteobacterium from the Marine Macroalga Laminaria Saccharina.” International Journal of Systematic and Evolutionary Microbiology 59 (2): 350–56. https://doi.org/10.1099/ijs.0.001651-0.

Williams, Kelly P., Bruno W. Sobral, and Allan W. Dickerman. 2007. “A Robust Species Tree for the Alphaproteobacteria.” Journal of Bacteriology 189 (13): 4578–86. https://doi.org/10.1128/JB.00269-07.

Williams, Timothy J., Christopher T. Lefèvre, Weidong Zhao, Terry J. Beveridge, and Dennis A. Bazylinski. 2012. “Magnetospira Thiophila Gen. Nov., Sp. Nov., a Marine Magnetotactic Bacterium That Represents a Novel Lineage within the Rhodospirillaceae (Alphaproteobacteria).” International Journal of Systematic and Evolutionary Microbiology 62 (10): 2443–50. https://doi.org/10.1099/ijs.0.037697-0.

Wu, Martin, Sourav Chatterji, and Jonathan A. Eisen. 2012. “Accounting for Alignment Uncertainty in Phylogenomics.” PLoS ONE 7 (1): e30288. https://doi.org/10.1371/journal.pone.0030288.


SUPPLEMENTARY FIGURE LEGENDS

Figure 2-figure supplement 1. A labeled version of Figure 2 showing taxon names. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated.

Figure 2-figure supplement 2. A diagram of the strategies and phylogenetic analyses employed in this study.

Figure 2-figure supplement 3. Bayesian consensus trees inferred with PhyloBayes MPI v1.7 and the CAT-Poisson+Γ4 model. Branch support values are 1.0 posterior probabilities unless annotated. A. Bayesian consensus tree inferred from the full dataset, which is highly compositionally heterogeneous. B. Bayesian consensus tree inferred from a dataset whose compositional heterogeneity has been decreased by removing 50% of the most biased sites according to ɀ. See Figs. 2A and 2B for the most likely trees inferred in IQ-TREE v1.5.5 under the LG+PMSF(C60)+F+R6 model.

Figure 2-figure supplement 4. Maximum-likelihood trees to assess the placements of the Holosporales, Rickettsiales, Pelagibacterales and alphaproteobacterium sp. HIMB59 when all four groups are included. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A tree that results from the analysis of the untreated dataset. B. A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. C. A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: RNCM EHIPTWV ADQLKS GFY). D. A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.

Figure 3-figure supplement 1. Maximum-likelihood trees to assess the placement of the Holosporales in the absence of the Rickettsiales, Pelagibacterales and alphaproteobacterium sp. HIMB59. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A tree that results from the analysis of the untreated dataset. B. A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. C. A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: ARNDQEILKSTV GHY CMFP W). D. A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.

Figure 3-figure supplement 2. Maximum-likelihood trees to assess the placement of the Rickettsiales in the absence of the Holosporales, Pelagibacterales, and alphaproteobacterium sp. HIMB59. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A tree that results from the analysis of the untreated dataset. B. A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. C. A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: PY RNMF GHLKTW ADCQEISV). D. A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.

Figure 3-figure supplement 3. Maximum-likelihood trees to assess the placement of 1148

the Rickettsiales in the absence of the Holosporales, Pelagibacterales, 1149

alphaproteobacterium sp. HIMB59 and the Beta-, and Gammaproteobacteria outgroup. 1150

Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A 1151

tree that results from the analysis of the untreated dataset. B. A tree that results from 1152

the analysis of a dataset from which the 50% most compositionally-biased sites have 1153

been removed. C. A tree that results from the analysis of a dataset that has been 1154

recoded into the four-character state recoding scheme S4 (recoding scheme: RNMF 1155

GHLKTW ADCQEISV PY). D. A tree that results from the analysis of a dataset that only 1156

comprises the 40 most compositionally homogeneous genes. Branch support values 1157

are 100% SH-aLRT and 100% UFBoot unless annotated. 1158

Figure 3-figure supplement 4. Maximum-likelihood trees to assess the placement of the Pelagibacterales in the absence of the Holosporales, Rickettsiales and alphaproteobacterium sp. HIMB59. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A tree that results from the analysis of the untreated dataset. B. A tree that results from the analysis of a dataset from which the 50% most compositionally-biased sites have been removed. C. A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: EGIV ARNDQHKMPSY LFT CW). D. A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.

Figure 3-figure supplement 5. Maximum-likelihood trees to assess the placement of alphaproteobacterium sp. HIMB59 in the absence of the Holosporales, Rickettsiales and Pelagibacterales. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. A. A tree that results from the analysis of the untreated dataset. B. A tree that results from the analysis of a dataset from which the 50% most compositionally-biased sites have been removed. C. A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: RLKMT ANDQEIPSV CW GHFY). D. A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.

Figure 3-figure supplement 6. Bayesian consensus trees inferred with PhyloBayes MPI v1.7 and the CAT-Poisson+Γ4 model. Branch support values are 1.0 posterior probabilities unless annotated. A. Bayesian consensus tree inferred to place the Holosporales in the absence of the Rickettsiales and the Pelagibacterales and when compositional heterogeneity has been decreased by removing the 50% most biased sites according to ɀ. B. Bayesian consensus tree inferred to place the Holosporales in the absence of the Rickettsiales and the Pelagibacterales and when the data have been recoded into a four-character state alphabet (the dataset-specific recoding scheme S4: ARNDQEILKSTV GHY CMFP W) to reduce compositional heterogeneity. See Figs. 2A and 2B for the most likely trees inferred in IQ-TREE v1.5.5 under the LG+PMSF(C60)+F+R6 and GTR+ES60S4+F+R6 models, respectively.

Figure 3-figure supplement 7. Maximum-likelihood trees to assess the placement of the Holosporales when the fast-evolving Holospora and ‘Candidatus Hepatobacter’ are also included in the absence of the Rickettsiales, Pelagibacterales and alphaproteobacterium sp. HIMB59. Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated.

Figure 3-figure supplement 8. Bayesian consensus tree inferred to place the Holosporales in the absence of the Pelagibacterales, alphaproteobacterium sp. HIMB59, and Rickettsiales, and when the data have been recoded into a six-character state alphabet (the dataset-specific recoding scheme S6: AQEHISV RKMT PY DCLF NG W) to reduce compositional heterogeneity. Branch support values are 1.0 posterior probabilities unless annotated.

Figure 2-figure supplement 5. Maximum-likelihood tree from the untreated dataset, from which no taxon has been removed, analyzed under the simpler LG4X model. In this tree, derived from an analysis using a model that does not account for compositional heterogeneity across sites, the Geminicoccaceae has a more derived placement within the Rhodospirillales as sister to the Acetobacteraceae.

Figure 2-figure supplement 6. Constraint tree used for IQ-TREE analyses, labeled with taxon names and the degree of missing data per taxon. Magnetococcales in gray; Rickettsiales in brown; Pelagibacterales in maroon; Holosporales in light blue; Rhizobiales in green; Caulobacterales in orange; Rhodobacterales in red; Sneathiellales in pink; Rhodospirillales in purple; Beta- and Gammaproteobacteria in black.

Figure 2-figure supplement 7. GARP:FIMNKY ratios across the proteomes of the 120 alphaproteobacteria and outgroup used in this study.
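A GARP:FIMNKY ratio of the kind plotted in this supplement can be sketched as follows. The sketch assumes the ratio is the pooled count of G, A, R and P residues divided by the pooled count of F, I, M, N, K and Y residues across all proteins of a proteome; the function name and the toy sequences are hypothetical.

```python
from collections import Counter

GARP = set("GARP")      # residues favoured in G+C-rich genomes
FIMNKY = set("FIMNKY")  # residues favoured in A+T-rich genomes


def garp_fimnky_ratio(proteins):
    """Return the GARP:FIMNKY ratio pooled over an iterable of protein sequences."""
    counts = Counter()
    for seq in proteins:
        counts.update(seq.upper())
    garp = sum(counts[aa] for aa in GARP)
    fimnky = sum(counts[aa] for aa in FIMNKY)
    if fimnky == 0:
        raise ValueError("no F, I, M, N, K or Y residues found; ratio is undefined")
    return garp / fimnky


# Toy proteomes (hypothetical sequences, for illustration only).
balanced = garp_fimnky_ratio(["GARPGA", "FIMNKY"])   # 6 GARP / 6 FIMNKY
at_biased = garp_fimnky_ratio(["MKKIFNY", "GA"])     # 2 GARP / 7 FIMNKY
```

Under this definition, values below 1 indicate a proteome enriched in FIMNKY residues and values above 1 indicate enrichment in GARP residues.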


SUPPLEMENTARY FILES

Supplementary file 1. A 16S rRNA gene maximum-likelihood tree of the Rickettsiales and Holosporales that phylogenetically places the three endosymbionts whose genomes were sequenced in this study: (1) ‘Candidatus Finniella inopinata’ endosymbiont of Viridiraptor invadens strain Virl02, (2) an alphaproteobacterium associated with Peranema trichophorum strain CCAP 1260/1B, and (3) an alphaproteobacterium associated with Stachyamoeba lipophora strain ATCC 50324. Branch support values are SH-aLRT and UFBoot.


Supplementary file 2-table S1. Ultrafast bootstrap (UFBoot) variation for several clades discussed in this study as compositionally-biased sites, according to ɀ, are progressively removed in steps of 10%.

Supplementary file 2-table S2. Ultrafast bootstrap (UFBoot) variation for several clades discussed in this study as the fastest sites are progressively removed in steps of 10%.
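The progressive site removal summarized in the two tables above can be sketched generically. The sketch assumes a per-site score is already available (a compositional-bias score or an evolutionary rate estimate); how those scores are computed is not shown here, and the function name, toy alignment and toy scores are hypothetical.

```python
def remove_top_scoring_sites(alignment, site_scores, percent):
    """Drop the `percent` % highest-scoring columns; surviving columns keep their order."""
    n_sites = len(site_scores)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence must have one residue per scored site")
    n_remove = n_sites * percent // 100
    # Rank column indices from highest to lowest score and mark the top ones for removal.
    ranked = sorted(range(n_sites), key=lambda i: site_scores[i], reverse=True)
    removed = set(ranked[:n_remove])
    keep = [i for i in range(n_sites) if i not in removed]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment and per-site scores (hypothetical values).
toy_alignment = {"taxonA": "ACDEFGHIKL", "taxonB": "ACDEYGHIKV"}
toy_scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.5, 0.0]

# One reduced alignment per removal step: 0%, 10%, ..., 90%.
series = {pct: remove_top_scoring_sites(toy_alignment, toy_scores, pct)
          for pct in range(0, 100, 10)}
```

Each reduced alignment in `series` would then be analyzed separately, so that support for a clade can be tracked as a function of the percentage of sites removed.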

Supplementary file 2-table S3. GenBank assembly accession numbers for the 120 alphaproteobacterial and outgroup genomes used in this study.

Supplementary file 2-table S4. A list of the least compositionally heterogeneous genes out of the 200 single-copy and vertically-inherited genes used in this study.

Supplementary file 2-table S5. Model fit of amino acid replacement matrices as components of simple models that do not account for compositional heterogeneity across sites. Models are ordered from lowest to highest BIC. -LnL: log-likelihood; df: degrees of freedom or number of free parameters; AIC: Akaike information criterion; AICc: corrected Akaike information criterion; BIC: Bayesian information criterion.

Supplementary file 2-table S6. Model fit of amino acid replacement matrices as components of complex models that account for compositional heterogeneity across sites. Models are ordered from lowest to highest BIC. -LnL: log-likelihood; df: degrees of freedom or number of free parameters; AIC: Akaike information criterion; AICc: corrected Akaike information criterion; BIC: Bayesian information criterion.

Supplementary file 2-table S7. Model fit of LG+ES60+F for which the model component that accounts for rate heterogeneity across sites varies. Models are ordered from lowest to highest BIC. -LnL: log-likelihood; df: degrees of freedom or number of free parameters; AIC: Akaike information criterion; AICc: corrected Akaike information criterion; BIC: Bayesian information criterion.
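The three criteria named in the model-fit tables above follow standard definitions, sketched here for reference. The sketch assumes the sample size is the number of alignment sites; the function name and the example log-likelihoods and parameter counts are hypothetical and are not taken from the tables.

```python
import math


def information_criteria(log_likelihood, n_params, n_sites):
    """Return AIC, AICc and BIC from a log-likelihood, parameter count and sample size.

    `log_likelihood` is lnL itself (the negative of a "-LnL" table column).
    """
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc requires n_sites > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    bic = n_params * math.log(n_sites) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical models, for illustration only.
candidates = {
    "model_A": information_criteria(-1000.0, 10, 500),
    "model_B": information_criteria(-990.0, 25, 500),
}

# Order models from lowest (best) to highest BIC, as the tables do.
ranked = sorted(candidates, key=lambda name: candidates[name]["BIC"])
```

In this toy comparison the richer model has the higher likelihood but is ranked second, because the BIC penalty grows with the number of free parameters.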

Supplementary file 2-table S8. Several summary statistics for the PhyloBayes MCMC chains run for each analysis under the CAT-Poisson+Γ4 model.

[Figure residue: a scatter plot of G+C % against the GARP:FIMNKY ratio, with a legend distinguishing the Rickettsiales, the Pelagibacterales + alphaproteobacterium sp. HIMB59, the Holosporales, and other Alphaproteobacteria; and two-panel (A, B) phylogenetic trees annotated with SH-aLRT/UFBoot branch support values.]

Class 1. Alphaproteobacteria Garrity et al. 2005
    Subclass 1. Rickettsidae Ferla et al. 2013 emend. Muñoz-Gómez et al. 2018
        Order 1. Rickettsiales Gieszczykiewicz 1939 emend. Dumler et al. 2001
            Family 1. Anaplasmataceae Philip 1957
            Family 2. 'Candidatus Midichloriaceae' Montagna et al. 2013
            Family 3. Rickettsiaceae Pinkerton 1936
    Subclass 2. Caulobacteridae Ferla et al. 2013 emend. Muñoz-Gómez et al. 2018
        Order 1. Rhodospirillales Pfennig and Truper 1971 emend. Muñoz-Gómez et al. 2018
            Family 1. Acetobacteraceae ex Henrici 1939 Gillis and De Ley 1980
            Family 2. Rhodospirillaceae Pfennig and Truper 1971 emend. Muñoz-Gómez et al. 2018
            Family 3. Azospirillaceae fam. nov. Muñoz-Gómez et al. 2018
            Family 4. Holosporaceae Szokoli et al. 2016
            Family 5. Rhodovibriaceae fam. nov. Muñoz-Gómez et al. 2018
            Family 6. Geminicoccaceae Proença et al. 2017
        Order 2. Sneathiellales Kurahashi et al. 2008
        Order 3. Sphingomonadales Yabuuchi et al. 1990
        Order 4. Pelagibacterales Grote et al. 2012
        Order 5. Rhodobacterales Garrity et al. 2006
        Order 6. Caulobacterales Henrici and Johnson 1935
        Order 7. Rhizobiales Kuykendall 2006
Class 2. Magnetococcia Parks et al. 2018
    Order 1. Magnetococcales Bazylinski et al. 2013


[Figure residue: workflow diagram. Box labels: "120 taxa & 200 genes dataset"; "Site removal in steps of 10%" ("by compositional bias", "by rate"); "Recoding into S4"; "Set of 40 most compositionally homogeneous genes"; "Taxon removal"; "Phylogenetic analyses".]


[Figure residue: plot of the GARP:FIMNKY ratio (axis range 0 to 2.5) for each taxon in the dataset; axis labels: "GARP:FIMNKY ratio" and "Taxon".]

0.3

99.7/100

97/96

0.2

77.3/83

99.9/99

98.8/99

100/100

99.4/99 70.2/79

99.9/100

99.9/100

99.7/97

100/99

92.5/87

99.7/99

100/99

0.2

3.1/64

39.4/87

66.8/87

99.9/10077.8/95

84.3/92

99.1/99

98.5/99

63.8/90

98/99

99.7/99

73.8/89

99.7/100

99.8/99

96.8/96

96.7/98

0.2

99.2/100

61.7/88

91.3/96

99.9/100

98.9/100

98.9/99

99/100

99.3/99

99.8/100

Rhizobiales

Caulobacterales

Rhodobacterales

Sphingomonadales

Sneathiellales

Rhodospirillales

Geminicoccaceae

Holosporales

Magnetococcales

Beta- & Gammaproteobacteria

Rhizobiales

Caulobacterales

Rhodobacterales

Sphingomonadales

Sneathiellales

Rhodospirillales

Geminicoccaceae

Holosporales

Magnetococcales

Beta- & Gammaproteobacteria

Rhizobiales

Caulobacterales

Rhodobacterales

Sphingomonadales

Sneathiellales

Rhodospirillales

Geminicoccaceae

Holosporales

Magnetococcales

Beta- & Gammaproteobacteria

Rhizobiales

Caulobacterales

Rhodobacterales

Sphingomonadales

Sneathiellales

Rhodospirillales

Geminicoccaceae

Holosporales

Magnetococcales

Beta- & Gammaproteobacteria

A. B.

C. D.

[Figure, panels A to D: phylogenetic trees with paired branch support values. Clades labeled in each panel: Rhizobiales, Caulobacterales, Rhodobacterales, Sphingomonadales, Sneathiellales, Rhodospirillales, Geminicoccaceae, Rickettsiales, Magnetococcales, Beta- & Gammaproteobacteria.]

[Figure, panels A to D: phylogenetic trees with paired branch support values. Clades labeled in each panel: Rhizobiales, Caulobacterales, Rhodobacterales, Sphingomonadales, Sneathiellales, Rhodospirillales, Geminicoccaceae, Magnetococcales, Rickettsiales.]

[Figure, panels A to D: phylogenetic trees with paired branch support values. Clades labeled in each panel: Rhizobiales, Caulobacterales, Rhodobacterales, Sphingomonadales, Sneathiellales, Rhodospirillales, Geminicoccaceae, Pelagibacterales, Magnetococcales, Beta- & Gammaproteobacteria.]

[Figure, panels A to D: phylogenetic trees with paired branch support values, each showing the placement of alphaproteobacterium sp. HIMB59. Clades labeled in each panel: Rhizobiales, Caulobacterales, Rhodobacterales, Sphingomonadales, Sneathiellales, Rhodospirillales, Geminicoccaceae, Magnetococcales, Beta- & Gammaproteobacteria.]

[Figure, panels A and B: phylogenetic trees with single branch support values between 0 and 1. Clades labeled in each panel: Rhizobiales, Caulobacterales, Rhodobacterales, Sphingomonadales, Sneathiellales, Rhodospirillales, Geminicoccaceae, Magnetococcales, Holosporales, Beta- & Gammaproteobacteria.]

[Figure: two phylogenetic trees, the first with paired branch support values and the second with single support values between 0 and 1. Individually labeled taxa: Holospora undulata HU1, Candidatus Hepatobacter penaei, Holospora obtusa F1. Clades labeled in each tree: Rhizobiales, Caulobacterales, Rhodobacterales, Sphingomonadales, Sneathiellales, Rhodospirillales, Geminicoccaceae, Magnetococcales, Holosporales, Beta- & Gammaproteobacteria.]