Error, signal, and the placement of Ctenophora sister to all other animals
-
Upload
independent -
Category
Documents
-
view
1 -
download
0
Transcript of Error, signal, and the placement of Ctenophora sister to all other animals
Error, signal, and the placement of Ctenophora sisterto all other animalsNathan V. Whelana,1, Kevin M. Kocotb, Leonid L. Morozc, and Kenneth M. Halanycha
aMolette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL 36849;bSchool of Biological Sciences, University of Queensland, St. Lucia, QLD 4101, Australia; and cDepartment of Neuroscience and McKnight Brain Institute,University of Florida, Gainesville, FL 32610
Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved March 24, 2015 (received for review February 18, 2015)
Elucidating relationships among early animal lineages has beendifficult, and recent phylogenomic analyses place Ctenophora sisterto all other extant animals, contrary to the traditional view of Poriferaas the earliest-branching animal lineage. To date, phylogenetic supportfor either ctenophores or sponges as sister to other animals has beenlimited and inconsistent among studies. Lack of agreement amongphylogenomic analyses using different data and methods obscureshow complex traits, such as epithelia, neurons, andmuscles evolved. Aconsensus view of animal evolution will not be accepted until datasetsand methods converge on a single hypothesis of early metazoanrelationships and putative sources of systematic error (e.g., long-branch attraction, compositional bias, poormodel choice) are assessed.Here, we investigate possible causes of systematic error by expandingtaxon sampling with eight novel transcriptomes, strictly enforcingorthology inference criteria, and progressively examining potentialcauses of systematic error while using both maximum-likelihood withrobust data partitioning and Bayesian inference with a site-heteroge-neous model. We identified ribosomal protein genes as possessinga conflicting signal compared with other genes, which caused somepast studies to infer ctenophores and cnidarians as sister. Importantly,biases resulting from elevated compositional heterogeneity or ele-vated substitution rates are ruled out. Placement of ctenophores assister to all other animals, and sponge monophyly, are strongly sup-ported under multiple analyses, herein.
phylogenomics | Metazoa | Ctenophora | Porifera | Cnidaria
Resolving relationships among extant lineages at the base ofthe metazoan tree is integral to understanding evolution of
complex animal traits, including nervous systems and gastrula-tion. Historically, sponges and placozoans, both of which haverelatively simple body plans and lack neurons, have been con-sidered to diverge from other animals earlier than ctenophores,cnidarians, and bilaterians (1). Phylogenomic studies have resultedin controversial hypotheses placing either Placozoa (Fig. 1A) (2),ctenophores (ctenophore-sister hypothesis) (Fig. 1B) (3–7), or aclade of ctenophores and sponges (Fig. 1C) (6) as sister to allremaining animals. Others (7–10) have claimed nontraditionalfindings resulted from systematic error and argued for traditionalplacement of sponges as sister to all remaining animals (Eumetazoa,or Porifera-sister hypothesis) (Fig. 1D) and a sister relationshipbetween ctenophores and cnidarians (Coelenterata) (Fig. 1D).Limited statistical support for various hypotheses and conflictamong, and even within studies, has undermined confidence inour understanding of early animal evolution. Basal metazoanrelationships must be resolved with greater consistency before aconsensus viewpoint is widely accepted.Long-branch attraction (LBA) (11), which occurs when two
divergent lineages are artificially inferred as related because ofsubstitutional saturation (11), is perhaps the most often evokedexplanation for controversial or spurious phylogenetic results (7–10, 12, 13). Additional sources of systematic error include poortaxon or character sampling (10, 14, 15), large amounts of missingdata (16), and model misspecification (16, 17). Such errors havebeen implicated as influencing the position of ctenophores inmetazoan phylogeny studies (5–8). For example, Ryan et al. (6)
recovered a sister relationship between sponges and ctenophoresin analyses where taxa with high amounts of missing data wereexcluded, and support for this was highest in Bayesian inferencewith the CAT (17) substitution model. However, ctenophoreswere recovered as sister to all other extant animals in maximum-likelihood analyses with greater taxon sampling (Bayesian in-ference never converged for datasets with more than 19 taxa).The CAT model is a site-heterogeneous model that may handleLBA artifacts better than site-homogeneous substitution modelslike GTR (18). Therefore, LBA plausibly influenced the phylo-genetic position of ctenophores in analyses of Ryan et al. (6) thatrecovered ctenophores-sister. In Moroz et al. (5), strong nodalsupport for ctenophores sister to all other extant animals dis-appeared when both the strictest orthology criteria were enforcedand ctenophore taxon sampling increased. Thus, further consid-eration of systematic error influencing phylogeny reconstructionat the base of the animal tree is desirable.The ctenophore-sister hypothesis has challenged our under-
standing of early metazoan evolution, but given conflicting results(2–10), this and other hypotheses must be carefully scrutinized.Ideally, if robust datasets are assembled and causes of systematicerror are accounted for, different datasets and analytical methodswill converge on a single phylogenetic hypothesis (19). However,practical barriers exist in assembling robust datasets free of sys-tematic error. For example, phylogenomic datasets are proneto missing data given the incomplete nature of transcriptome andeven genome sequences, and orthology determination among
Significance
Traditional interpretation of animal phylogeny suggests traits,such as mesoderm, muscles, and neurons, evolved only oncegiven the assumed placement of sponges as sister to all otheranimals. In contrast, placement of ctenophores as the firstbranching animal lineage raises the possibility of multiple originsof many complex traits considered important for animal di-versification and success. We consider sources of potential errorand increase taxon sampling to find a single, statistically robustplacement of ctenophores as our most distant animal relatives,contrary to the traditional understanding of animal phylogeny.Furthermore, ribosomal protein genes are identified as creatingconflict in signal that caused some past studies to recover a sisterrelationship between ctenophores and cnidarians.
Author contributions: N.V.W., K.M.K., L.L.M., and K.M.H. designed research; N.V.W. andK.M.K. performed research; N.V.W. and K.M.K. analyzed data; and N.V.W., K.M.K., L.L.M.,and K.M.H. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
Data deposition: The sequence reported in this paper has been deposited in the NCBISequence Read Archive, www.ncbi.nlm.nih.gov/sra (accession no. PRJNA278284). Tran-scriptome assemblies, phylogenetic datasets, and an annotation file were deposited tofigshare, figshare.com (doi: 10.6084/m9.figshare.1334306).1To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1503453112/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1503453112 PNAS | May 5, 2015 | vol. 112 | no. 18 | 5773–5778
EVOLU
TION
distantly related species can be difficult (20, 21). Computationallimitations of complex phylogenetic methods can also preventusing what may be the best theoretical phylogenetic method.Nevertheless, both data quality and appropriate methods shouldbe emphasized if deep relationships of any organismal group areto be robustly resolved.Here, we have assembled a more comprehensive phyloge-
nomic dataset of metazoan lineages that branched early in ani-mal evolution than previous studies to alleviate taxon samplingconcerns. We have sequenced transcriptomes of eight additionalspecies and used other deeply sequenced publicly available tran-scriptomes (including some not used in past studies). Additionally,we use a number of data-filtering steps to explore the sensitivity ofthese results to potential sources of error. This process includesstrict orthology determination, removal of taxa and genes thatmay cause LBA, and removal of heterogeneous genes that maycause model misspecification. Regardless of how data were fil-tered, all maximum-likelihood analyses with model partitioningand all Bayesian inference analyses using a site-heterogeneousmodel recover ctenophores as sister to all other animals withstrong support. We identify overreliance on ribosomal proteingenes in some datasets (7, 9) as the source of incongruence amongprevious phylogenomic studies. We also find strong support forsponge monophyly in contrast to previous reports (7, 22–24).
ResultsDatasets and Accounting for Biases. Orthology filtering of tran-scriptome and genome data from 76 species resulted in 251orthologous groups (OGs) and 81,008 aligned amino acid sites(Tables S1 and S2). TreSpEx (25) further identified 83 “certain”paralogs (i.e., sequences TreSpEx classified as high-confidenceparalogs) from 10 OGs. TreSpEx also identified 2,684 “un-certain” paralogs (i.e., sequences TreSpEx classified as possibly,
but not definitively, paralogous) from 104 OGs. Datasets withcertain and both certain and uncertain paralogs pruned werestarting points for progressively filtering other causes of system-atic error. Overall, 25 hierarchical datasets that had progressivelyfewer characters, but controlled for more potential causes ofsystematic errors, were assembled (Fig. 2 and Table S2) (alldatasets have been deposited on figshare, doi 10.6084/m9.fig-share.1334306). The percentage of gene occupancy and missingdata ranged from 70–82% and 35–44%, respectively (Table S2).Other than progressive data filtering, differences between data-sets analyzed here and those used in previous studies of basalmetazoan relationships (3–10) are increased character samplingand less missing data compared with some studies and increasednonbilaterian taxon sampling. In contrast to Nosenko et al. (7)and Philippe et al. (9), both of which relied heavily on ribosomalproteins (i.e., 52% and 71%, respectively), our dataset did notcontain a large representation of any one gene class (e.g., only 8of 250 were ribosomal protein genes) (SI Methods).
Ctenophores Sister to Other Extant Animals andMonophyletic Sponges.All maximum-likelihood analyses with outgroups resulted in to-pologies with strong support [≥ 97% bootstrap support (BS)] forctenophores sister to all other extant animal lineages (datasets1–21 in Figs. 2 and 3 and Figs. S1–S4). Importantly, Phylobayes(26) analyses using the CAT-GTR+Γ model also resulted inphylogenies with Ctenophora sister to all other animals with 100%posterior probability (PP), and 100% PP for sponge monophyly(datasets 6 and 16 in Figs. 2 and 3 and Fig. S5 B and C). Inferredrelationships among major sponge lineages (i.e., Demospongiae +Hexactinellida sister to Calcarea + Homoscleromorpha) wereconsistent with morphology (27, 28) and most other molecularanalyses that have recovered monophyletic sponges (datasets 1–27in Figs. 2 and 3 and Figs. S1–S5) (4, 5, 8, 9, 28–30). Alternativehypotheses of basal animal relationships (Fig. 1) were rejected byevery phylogenetic analysis as measured by the approximatelyunbiased (AU) (31) test (P ≤ 0.001) (Table 1). Overall, our resultsoverwhelmingly reject alternative hypotheses to ctenophores sisterto all other extant animals (Table 1).Ribosomal protein genes can have conflicting signal with most
other genes (32). The datasets of Philippe et al. (9) and Nosenkoet al. (7), which recovered cnidarians and ctenophores sister, hadhigh proportions of ribosomal protein genes (67 of 128 and 87 of122 genes, respectively). Nosenko et al. (7) analyzed a datasetwithout ribosomal protein genes and recovered ctenophores sisterto all other animals, but a similar analysis has not been done forthe original Philippe et al. (9) dataset. If certain topologies arerecovered only with the use of a high proportion of one group ofgenes (e.g., ribosomal protein genes), this may indicate a phylo-genetic signal that conflicts with the true evolutionary history. Assuch, we analyzed the Philippe et al. (9) dataset with ribosomalproteins removed (67 of 128) using maximum-likelihood andBayesian inference, and neither reconstruction placed ctenophoresand cnidarians sister as in the original study (Fig. S5 D and E).
Fig. 1. Phylogenetic hypotheses from previous molecular studies. (A) Placozoa-sister hypothesis (2). (B) Ctenophora-sister hypothesis (3–7). Placozoa was notincluded in some studies that found support for this topology. (C) Ctenophora +Porifera-sister hypothesis (6). (D) Traditional Porifera-sister hypothesis (7–10).
Full dataset
“Certain” paralogsremoved
“Uncertain” & “certain”paralogs removed
Taxa with high LB scores removed
Heterogeneousgenes removed
Lowest halfRCFV genes
Taxa with high LBscores removed
Heterogeneousgenes removed
Lowest halfRCFV genes
Genes with high LB scores removed
Genes with high LBscores removed
Outgroups removed
Slowest evolving half
Outgroupsremoved
Taxa with high LB scores removed
Genes with high LBscores removed
Genes with high LBscores removed
Outgroupsremoved
Slowest evolving half
Outgroupsremoved
Taxa with high LB scores removed
1
2
3
4
8
9
5, 22 6 7 10
11, 23
15, 24 16 17 20
21, 25
14 19
13 18
12
Fig. 2. Hierarchy of datasets with progressive datafiltering. Numbers associated with each datasets arereferences throughout the text. Datasets 5, 11, 15,21 have Choanoflagellates as an outgroup, anddatasets 22, 23, 24, 25 have all outgroups removed.RAxML analyses were used for each dataset, andshaded datasets were also analyzed with Phylo-Bayes. Data matrix statistics (e.g., number of sites,percentage of missing data, and so forth) for eachdataset can be found in Table S2.
5774 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al.
The maximum-likelihood analysis recovered strong support forctenophores as sister to all other metazoan lineages (BS = 93)(Fig. S5D). However, Bayesian inference (Fig. S5E) recoveredsponges as sister to all other metazoans, but support for this andother deep nodes were low (PP ≤ 90).
Systematic Biases and Their Effect on Phylogenetic Inference. Long-branch (LB) scores (28), a measurement for identifying taxa andOGs that could cause LBA, were calculated for each species andOG with TreSpEx (25). In total, we identified six “long-branched”taxa, all nonmetazoans (Fig. S6A and Table S2), and 28 OGswith high LB scores compared with other OGs (Fig. S6 B and C).We found complete congruence in relationships among basalmetazoan phyla in trees inferred with (datasets 1, 2, 8, 12, and 18in Fig. 2) and without (datasets 3–7, 9–11, 13–17, 19–21, and 22–25 in Fig. 2) taxa and genes that had high LB scores, and nodalsupport for critical nodes showed little variation among analyses(Fig. 3 and Figs. S1–S5). Removing OGs with high amino acidcompositional heterogeneity (datasets 7–11, 17–21, 23, and 25 inFig. 2) also had no effect on branching order (Fig. 3 and Figs. S2A–E, S3 E and F, S4 A–E, and S5A). Topologies inferred withonly the slowest evolving half of OGs assembled here (datasets 6and 16 in Fig. 2) (i.e., least saturated and least prone to homo-plasy; see Fig. S7 for saturation plots) recovered high support forctenophores sister to all other animals and sponge monophylywith both maximum-likelihood (BS = 100) (Fig. 3 and Figs. S1Fand S3D) and Bayesian inference using the CAT-GTR model(PP = 1) (Fig. 3 and Fig. S5 B and C). Importantly, our datasetsof the slowest evolving half of OGs were of a broad range of
protein classes (SI Methods; figshare), rather than consisting of amajority of ribosomal proteins (7, 9).Inaccurate orthology assignment can also introduce systematic
error into phylogenomic analyses. Although relationships amongbasal lineages were unaffected, removal of paralogs as identifiedby TreSpEx appeared to have the greatest effect on support forsome critical nodes. For example, most topologies with bothcertain and uncertain paralogs removed had strong support forsponge monophyly (i.e., ≥ 95% BS) (datasets 12–14 and 18–20 inFigs. 2 and 3 and Figs. S2F, S3 A, B, and F, and S4 A and B), butfour analyses with only certain paralogs removed recovered lowsupport (< 90% BS) for sponge monophyly (datasets 5, 7, 9, and10 in Figs. 2 and 3 and Figs. S1E and S2 A, D, and E).Because outgroup sampling has the potential to influence
rooting of the animal tree, we explored outgroup sampling aswell. When all outgroups except two choanoflagellates were re-moved (datasets 5, 11, 15, and 21 in Fig. 2), inferred nonbilaterianrelationships were identical as in analyses we performed with fulloutgroup sampling (datasets 5, 11, 15, and 21 in Figs. 2 and 3 andFigs. S1E, S2E, S3C, and S4C), but support for sponge mono-phyly decreased. In these analyses the leaf-stability indices forhomoscleromorph and calcareous sponges were less than 0.94,but in all other analyses they were greater than 0.97 (Fig. S5 F andG). Regardless, when choanoflagellates were the only outgroup,ctenophores were still recovered as the deepest split within theanimal tree with 100% BS support. Analyses with all outgrouptaxa removed (datasets 22–25 in Fig. 2) recovered identical re-lationships among major metazoan lineages as other analyses(Figs. S4 D–F and S5A). However, we observed low support forrelationships among ctenophores, sponges, and placozoans inthese analyses. This resulted from the long placozoan branchbeing attracted to ctenophores in the absence of outgroup taxa asindicated by bootstrap tree topologies and leaf-stability index forTrichoplax of less than 0.92, whereas leaf-stability indices weregreater than 0.99 in all other analyses (Fig. S5 F and G).
DiscussionPlacement of Ctenophores Sister to all Remaining Animals Is NotSensitive to Systematic Errors. Every analysis conducted hereinstrongly supported the ctenophore-sister hypothesis (Fig. 3 andTable 1). A major hurdle to wide acceptance of ctenophores assister to other animals has been that different analyses haveyielded conflicting hypotheses of early animal phylogeny (2–9).Sensitivity to the selected model of molecular evolution has beenespecially problematic (2–9). In contrast, both maximum-likeli-hood analyses using data partitioning and Bayesian analysesusing the CAT-GTR model of our datasets resulted in identicalbranching patterns among ctenophores, sponges, placozoans,cnidarians, and bilaterians. Past critiques of studies that foundctenophores to be sister to all other animals have emphasized theCAT model as the most appropriate model for deep phyloge-nomics because it is an infinite mixture model that accounts forsite-heterogeneity (7, 8, 29). Notably, when the CAT-GTR modelwas used here (datasets 6 and 16 in Fig. 2), we recovered cteno-phores-sister to all other metazoans (Fig. 3 and Fig. S5 B and C).The argument for LBA (7–10) or saturated datasets (7, 8) as the
reason past studies found ctenophores to be sister to all other an-imals seems to have been overstated. The recovered position ofctenophores was identical in analyses with (datasets 1, 2, 8, 12, and18 in Fig. 2 and Figs. S1 A and B, S2 B and F, and S3F) and without(datasets 3–7, 9–11, 13–17, and 19–25 in Fig. 2, and Figs. S1 C–F, S2A and C–E, S3 A–E, S4, and S5 A–C) taxa and genes with high LBscores, and analyses with the slowest evolving genes (datasets 6 and16 in Fig. 2 and Fig. S7) also recovered ctenophores sister to allother animals (Fig. 3 and Figs. S1F, S3D, and S5 B and C). Fur-thermore, despite the long internal branch leading to the cteno-phore clade, the position of this lineage did not change in anyanalysis including those when outgroups were removed (datasets 5,11, 15, 21, and 22–25 in Fig. 2 and Figs. S1E, S2E, S3C, and S4 C–F). If this branch was being artificially attracted toward outgroups,then employment of different outgroup schemes would be expected
Aurelia aurita
Hormathia digitata
Rossella fibulata
Beroe abyssicola
Drosophila melanogaster
Craseoa lathetica
Petromyzon marinus
Hydra oligactis
Spongilla alba
Strongylocentrotus purpuratus
Abylopsis tetragona
Ministeria vibrans
Danio rerio
Pleurobrachia bachei
Lottia gigantea
Eunicella verrucosa
Hydra viridissima
Tethya wilhelma
Mortierella verticillata
Nanomia bijuga
Sycon coactum
Oscarella carmela
Vallicula sp.
Spizellomyces punctatus
Mnemiopsis leidyi
Aiptasia pallida
Allomyces macrogynus
Mertensiidae sp.
Petrosia ficiformisPseudospongosorites suberitoides
Rhizopus oryzae
Amoebidium parasiticum
Capitella teleta
Crella elegans
Monosiga ovata
Acropora digitifera
Aphrocallistes vastus
Bolinopsis infundibulum
Capsaspora owczarzakiSalpingoeca rosetta
Tubulanus polymorphus
Agalma elegans
Corticium candelabrum
Hyalonema populiferum
Hydra vulgaris
Euplokamis dunlapae
Lithobius forficatus
Bolocera tuediae
Pleurobrachia sp.
Amphimedon queenslandica
Hemithiris psittacea
Ircinia fasciculata
Dryodora glandiformis
Kirkpatrickia variolosa
Sphaeroforma arctica
Daphnia pulex
Homo sapiens
Platygyra carnosa
Ephydatia muelleriChondrilla nucula
Saccoglossus kowalevskii
Periphylla periphylla
Trichoplax adhaerens
Sympagella nux
Latrunculia apicalis
Coeloplana astericola
Physalia physalis
Sycon ciliatum
Nematostella vectensis
Priapulus caudatus
96
95
61
9965
94
79
0.2
Beroe abyssicolaa
Pleurobrachia bacheiVallicula sp.
Pl
Mnemiopsis leidyii
Mertensiidae sp.M i t
Bolinopsis infundibulumM t iid
Euplokamis dunlapae
Pleurobrachia ssp.Dryodora glandifoormisB b i lb
p
Coeloplana asterricolaV lli l
pp
99
Rosse
Spongilla albaTethya wilhelma
S
Sycon coactum
Oscarella carmela
Petrosia ficiformisPseudospongosorites su
Crella elegans
Aphroc
Corticium candelabrum
H
Amphimedon queenAi fi if i
p
Ircinia fasciculata
Kirkpatrickia variolosagg
Ephydatia muellerip gp g
Chondrilla nuculayy
p
SympaA hh
Latrunculia apicalisC ll l
Sycon ciliatumll l
96
61
y gTrichoplax adhaerens
Aurelia aurita
Hormathia digitata
Craseoa lathetica
Hydra oligactis
Abylopsis tetragonaAl th ti
Eunicella verrucosa
Hydra viridissima
Nanomia bijuga
Aiptasia pallidaH thi di ithi
Acropora digitifera
Agalma elegans
Hydra vulgarisH d li tiH d li
yy
Bolocera tuediaegg
Platygyr
Periphylla periphyllaAb l ib l
Physalia physalisH d i idi id i idi
Nematostella vectensis
94
ella fifibulata
uberittoides
callisttes vastus
Hyaloonema populiferum
nslanndica
agellaa nuxlli tl t
raa carnosa
Fig. 3. Reconstructed maximum-likelihood topology of metazoan relation-ships inferred with dataset 10. Maximum likelihood and Bayesian topologiesinferred with other datasets (Fig. 2) have identical basal branching patterns(Figs. S1–S5). Nodes are supported with 100% bootstrap support unless oth-erwise noted. Support, as inferred from each dataset (Fig. 2), for nodes cov-ered by black boxes are in Table 1.
Whelan et al. PNAS | May 5, 2015 | vol. 112 | no. 18 | 5775
EVOLU
TION
to result in different ctenophore placement. Maximum-likelihoodand Bayesian inference using the CAT-GTR model of the leastsaturated datasets (datasets 6 and 16 in Fig. 2 and Fig. S7) re-covered identical basal relationships as our other analyses (Fig. 3and Figs. S1F, S3D, and S5 B and C), also indicating homoplasy andmodel choice did not bias results. Given the consistency among ouranalyses that were designed to have different levels of potentialbiases, we conclude that the ctenophore-sister hypothesis is robustto systematic errors.Rather than focusing on long branches, fast evolving genes, or
model misspecification as influencing the position of cteno-phores, the individual genes underlying datasets that resulted in
a sister relationship between ctenophores and cnidarians (7, 9)should be the focus of identifying problems with phylogeneticreconstruction. A benefit of phylogenomic datasets is that mul-tiple gene classes and many parts of the genome are analyzed. Assuch, phylogenomic datasets should not rely too heavily on asingle gene class. Past molecular studies that found support forCoelenterata and the Porifera-sister (7, 9) hypotheses appear tohave been strongly affected by a disproportionate reliance (i.e., >50%) on ribosomal protein genes. Nosenko et al. (7) and Philipeet al. (9) found support for ctenophores sister to cnidarians, butNosenko et al. (7) recovered ctenophores sister to all other ani-mals when ribosomal proteins were excluded. Ribosomal protein
Table 1. AU test P values for alternative hypotheses of animal relationships and support for ctenophora-sister and sponge monophyly
Dataset (no.) Porifera-sister Coelenterata Placozoa-sister Porifera + Ctenophora Ctenophora-sisterPorifera
monophyly
Full dataset (1) 2.E-04 5.E-06 1.E-76 2.E-05 100 BS 95 BS“Certain” paralogs
removed (2)2.E-04 1.E-05 3.E-03 5.E-79 100 BS 90 BS
Taxa with high LB scoresremoved (3)
6.E-06 2.E-06 1.E-62 2.E-07 100 BS 94 BS
Genes with high LB scoresremoved (4)
1.E-02 2.E-04 2.E-32 3.E-47 100 BS 94 BS
Choanoflagellate-onlyoutgroup (5)
4.E-45 1.E-04 1.E-05 6.E-104 100 BS 68 BS
All outgroups removed (22) N/A 3.E-55 N/A N/A N/A 84 BSSlowest evolving half
of genes (6)2.E-78 2.E-11 2.E-04 1.E-05 100 BS/100 PP 95 BS/100 PP
Genes with lowest half ofRCFV values (7)
3.E-06 2.E-57 9.E-47 2.E-03 100 BS 53 BS
Heterogeneous genesremoved (8)
7.E-52 2.E-49 9.E-05 3.E-05 100 BS 90 BS
Genes with high LB scoresremoved (9)
9.E-41 9.E-05 2.E-04 2.E-66 99 BS 90 BS
Taxa with high LB scoresremoved (10)
7.E-22 5.E-61 9.E-06 7.E-06 100 BS 88 BS
Choanoflagellate-onlyoutgroup (11)
1.E-03 3.E-36 1.E-02 6.E-104 92 BS 82 BS
All outgroups removed (23) N/A 2.E-04 N/A N/A N/A 92 BS“Certain” and “uncertain”
paralogs (12)6.E-39 1.E-05 4.E-44 4.E-05 100 BS 99 BS
Taxa with high LB scoresremoved (13)
6.E-05 8.E-06 6.E-30 8.E-09 100 BS 99 BS
Genes with high LB scoresremoved (14)
8.E-105 9.E-05 1.E-45 5.E-102 99 BS 97 BS
Choanoflagellate-onlyoutgroup (15)
5.E-59 5.E-39 7.E-05 1.E-72 100 BS 66 BS
All outgroups removed (24) N/A 2.E-07 N/A N/A N/A 92 BSSlowest evolving half of
genes (16)1.E-04 1.E-52 1.E-11 1.E-07 100 BS/100 PP 94 BS/100 PP
Genes with lowest half ofRCFV values (17)
5.E-09 1.E-68 5.E-07 3.E-72 100 BS 93 BS
Heterogeneous genesremoved (18)
7.E-07 4.E-10 6.E-77 8.E-09 99 BS 100 BS
Genes with high LB scoresremoved (19)
2.E-46 1.E-08 2.E-04 1.E-57 100 BS 96 BS
Taxa with high LB scoresremoved (20)
1.E-03 3.E-92 1.E-66 3.E-113 100 BS 100 BS
Choanoflagellate-onlyoutgroup (21)
2.E-61 1.E-31 2.E-04 4.E-40 100 BS 77 BS
All outgroups removed (25) N/A 9.E-05 N/A N/A N/A 96 BSPhilippe et al. (7) Maximum
likelihood0.001 0.010 0.008 0.075 93 BS 42 BS
Philippe et al. (7) Bayesianinference
— — — — N/A 99 PP
Dataset numbers are as in Fig. 2.
5776 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al.
datasets from these studies are less saturated than datasets as-sembled here based on linear regression of patristic distanceversus uncorrected genetic distance (Fig. S7) (7–9, 25). This lowermutational saturation has been the primary rational for empha-sizing ribosomal genes when inferring deep animal relationship(7). However, standard measurements of sequence saturation (7,8, 25) average across the length of the sequence. Thus, a sequencewith a few variable, highly saturated sites may appear less satu-rated than a sequence with numerous variable sites but less sat-uration per site. Furthermore, extremely low mutation rates canindicate selection and result in too little phylogenetic information,both of which could lead to the inference of incorrect relation-ships (33). Our maximum-likelihood analysis of Philipe et al.’s (9)dataset with ribosomal genes removed recovered support forctenophores as sister to all other animals (Fig. S5D). Basal re-lationships were poorly resolved in the Bayesian analysis, whichmay be a result of too few characters, but ctenophores and cni-darians were not recovered as sister (Fig. S5E). Notably, a studyfocused only on myxozoan cnidarians (34) used the same matrixas Philippe et al. (9), but added two highly divergent cnidariansand recovered ctenophores sister to all other animals. Ribosomalprotein genes have previously been identified as a potentialsource of phylogenetic error (32), and the above indicates thatctenophores sister to cnidarians as in Philippe et al. (9) wascaused by either limited cnidarian taxon sampling, misleadingsignal in ribosomal genes, or both. More work is needed to assesssaturation, possible convergent evolution, and selective pressuresof ribosomal proteins in ctenophores, sponges, placozoans, andcnidarians. However, differences in topologies when ribosomalproteins are included or excluded strongly imply a misleadingsignal in ribosomal protein genes. Put simply, it appears highlyimprobable that all genes other than ribosomal protein genescould be recovering an incorrect phylogeny.
Sponges Are Monophyletic. Sponge monophyly, although less con-troversial than the phylogenetic positions of ctenophores, cnidar-ians, and placozoans remains an important question as severalstudies have supported sponge paraphyly (7, 22–24). In regards toinferring the characteristics of the metazoan ancestor, spongeparaphyly, coupled with sponges being at the base of the metazoanphylogeny, is an attractive hypothesis that implies the metazoanancestor was sponge-like. However, sponge monophyly was re-covered in all of our analyses and best supported when the strictestorthology criteria were applied with TreSpEx (Fig. 3 and Table 1),which also removes sequences resulting from sample contamina-tion (e.g., endosymbionts).This observation suggests that spuriousparalogs or sequence contamination may have been a source oferror when sponges were found paraphyletic, but datasets thathave recovered sponge paraphyly were also much smaller thanthose analyzed here (e.g., refs. 7, 22–24). Sponge monophyly andthe well-supported ctenophore-sister hypothesis complicates in-ferring the ancestral condition of metazoan and other majormetazoan groups (e.g., Placozoa + Cnidaria + Bilateria) becausemany sponge characteristics are likely apomorphic traits. Ourrobust support of sponge monophyly agrees with morphology(35) and most other large molecular datasets (5, 6, 8–10).
ConclusionsFor more than a century, sponges were traditionally consideredsister to all other extant metazoans because unlike ctenophores,cnidarians, and bilaterians, they lack true tissues and body sym-metry (36, 37). Sponges also possess choanocyte cells that aresimilar in morphology to choanoflagellates, the sister group tometazoans (36). However, Mah et al. (38) found that homologybetween choanoflagellates and sponge choanocytes is not as de-finitive as previously assumed. Similarly, some authors have arguedthat Placozoans are sister to all other animals because they lackneural and muscular systems and also share similarities in mito-chondrial genome size with choanoflagellaes (2, 39). A commontheme of these two hypotheses is the placement of morphologicallysimple animals near the base of the animal tree, but complexity is
not a good proxy for metazoan evolution (40). Challenges tolong-held viewpoints of morphological complexity and assumedimprobability of convergent evolution must not be dismissedsimply because they seem unlikely at face value, especially con-sidering a growing body of evidence that supports convergentevolution of many animal traits including neurons (5, 6, 41–43).Furthermore, the Porifera-sister hypothesis lacks critical evalu-ation and homology of many characters in these taxa still needthorough analysis (44). Overall, findings presented here robustlysupport ctenophores as sister to all remaining animals.
MethodsTaxon and Character Sampling. Taxon sampling included previously availabledata and eight new transcriptomes from two choanoflagellates, three glasssponges (Hexactinellida), two demosponges, and a deep-sea cnidarian (Scy-phozoa) (Table S1). Briefly, RNA was extracted, reverse-transcribed, and am-plified using the SMART kit (Clontech), and sequenced on an Illumina HiSeq(SI Methods). Raw or assembled transcriptome data for 68 additional specieswere retrieved from public databases (Table S1).
Raw Illumina transcriptome data were digitally normalized using nor-malize-by-median.py (45) with a k-mer size of 20, a desired coverage of 30,and four hash tables with a lower bound of 2.5 × 109. Normalized Illuminareads were assembled using default parameters in Trinity v20131110 (46).Raw 454 transcriptome data were assembled with Newbler (47).
Orthology Determination and Data Filtering. Putative orthologs were de-termined for each species using HaMStR v13.2 (48) using the model organismcore ortholog set. OGs, determined by HaMStR, were further processed us-ing a custom pipeline that filtered OGs with too much missing data (i.e., OGswith less than 37 species), aligned sequences, and filtered potential paralogs(https://github.com/kmkocot) (SI Methods).
TreSpEx (25), which requires individual gene trees for each OG, was usedto identify putative paralogs and exogenous contamination missed by ourinitial orthology inference approach. Gene trees were inferred with RAxMLv.8.0.2 (49) with 100 rapid bootstrap replicates followed by a full maximum-likelihood inference; each tree was inferred with the LG+Γmodel, which wasby far the most common best-fitting model when the complete dataset waspartitioned (see below). Paralogs in the initial dataset were detected withTreSpEx using the automated BLAST method and the prepackaged Capitellateleta and Helobdella robusta BLAST databases. This method identified twoclasses of paralogs: “certain” or sequences that are high-confidence paralogs,and “uncertain” or sequences that are potential paralogs. From the initial,251 gene dataset, we created one dataset by removing certain paralogs andanother dataset with both certain and uncertain paralogs pruned (Fig. 2);after pruning, OGs with fewer than 37 taxa were removed. LB scores (25)were calculated for each taxon and OG with TreSpEx. Following Struck (25),these values were plotted in R (Fig. S6) (50), and outliers were identified astaxa or genes that could cause LBA artifacts. After removal of taxa and genes,each OG was ranked by evolutionary rate with a custom python script (https://github.com/nathanwhelan) following Telford et al. (51). Datasets with onlythe slowest half of remaining genes were then generated to assess if fastevolving homoplasious genes were biasing inferences (Fig. 2).
We used BaCoCa (52) and two metrics (χ2-test of heterogeneity and rel-ative composition frequency variability; RCFV) (53) to identify genes withamino acid compositional heterogeneity. Some datasets were further fil-tered by removing non-choanoflagellate outgroups and all outgroup taxa todetermine if outgroup choice affects inferred relationships. Saturation ofeach filtered dataset with full outgroup sampling was explored with TreSpExand plotted in R (Fig. S7) to provide a further metric to compare datasets.
Phylogenetics. In addition to removing compositionally heterogeneous genesfrom some datasets, two approaches were used to handle site-heterogeneity:(i) partitioning schemes for each dataset and associated protein substitutionmodels were determined using the relaxed clustering method in PartitonFinder(54) with 20% clustering and the corrected Akaike information criterion; (ii) asite-heterogeneous mixture model, CAT-GTR+Γ, was used in PhyloBayes (26).Maximum-likelihood topologies were inferred with RAxML using partitions asindicated by ParitionFinder, associated best-fit substitution models, and thegamma parameter to model rate-heterogeneity. Nodal support was measuredwith 100 fast bootstrap replicates. Phylobayes analyses were run with two chainsuntil themaxdiff statistic between chains was below 0.3 as measured by bpcomp(26). Convergence was also assessed with tracecomp (26) to ensure each pa-rameter had a maximum discrepancy between chains of less than 0.3 and aneffective sample size of at least 50. Computational demands and convergence
Whelan et al. PNAS | May 5, 2015 | vol. 112 | no. 18 | 5777
EVOLU
TION
issues prevented us from using the CAT-GTR+Γ model for most datasets.Therefore, Bayesian phylogenies are only reported for the two analyses of theslowest evolving half of OGs. Leaf-stability indices (55) for each taxon weremeasured in PhyUtility (56) to identify potentially unstable taxa in each dataset.
Maximum-likelihood and Bayesian inference trees were also inferred fromthe Philippe et al. (9) dataset with ribosomal proteins removed (i.e., 67 of 128genes) to determine if a single gene class-biased phylogenetic inference.Ribosomal proteins were filtered from the original dataset following datamatrix annotations (9) and the matrix was split into individual genes formodel testing using a custom R script (https://github.com/nathanwhelan).
The AU test (31) was used to determine if a priori hypotheses of basalmetazoan relationships could be rejected (Fig. 1). Topological constraintswere enforced in RAxML and the most likely tree given this constraint was
inferred with the same partitioning scheme and models used for unconstrainedphylogenetic inference. Per site log-likelihoods for trees were calculated in RAxMLand AU tests were performed in Consel (57).
ACKNOWLEDGMENTS. We thank members of the Molette Biology Labora-tory for Environmental and Climate Change Studies at Auburn University forhelp with bioinformatics and data collection, especially Damien Waits. Thiswork was made possible in part by a grant of high-performance computingresources and technical support from the Alabama Supercomputer Authorityand was supported by the US National Aeronautics and Space Administra-tion (Grant NASA-NNX13AJ31G) and in part by National Science Foundation(Grant 1146575). This is Molette Biology Laboratory Contribution 36 andAuburn University Marine Biology Program Contribution 128.
1. DohrmannM, Wörheide G (2013) Novel scenarios of early animal evolution—Is it timeto rewrite textbooks? Integr Comp Biol 53(3):503–511.
2. Dellaporta SL, et al. (2006) Mitochondrial genome of Trichoplax adhaerens supportsplacozoa as the basal lower metazoan phylum. Proc Natl Acad Sci USA 103(23):8751–8756.
3. Dunn CW, et al. (2008) Broad phylogenomic sampling improves resolution of theanimal tree of life. Nature 452(7188):745–749.
4. Hejnol A, et al. (2009) Assessing the root of bilaterian animals with scalable phylo-genomic models. Proc Biol Sci 276(1802):4261–4270.
5. Moroz LL, et al. (2014) The ctenophore genome and the evolutionary origins of neuralsystems. Nature 510(7503):109–114.
6. Ryan JF, et al.; NISC Comparative Sequencing Program (2013) The genome of thectenophore Mnemiopsis leidyi and its implications for cell type evolution. Science342(6164):1242592.
7. Nosenko T, et al. (2013) Deep metazoan phylogeny: When different genes tell dif-ferent stories. Mol Phylogenet Evol 67(1):223–233.
8. Philippe H, et al. (2011) Resolving difficult phylogenetic questions: Why more se-quences are not enough. PLoS Biol 9(3):e1000602.
9. Philippe H, et al. (2009) Phylogenomics revives traditional views on deep animal re-lationships. Curr Biol 19(8):706–712.
10. Pick KS, et al. (2010) Improved phylogenomic taxon sampling noticeably affectsnonbilaterian relationships. Mol Biol Evol 27(9):1983–1987.
11. Felsenstein J (1978) Cases in which parsimony and compatability methods will bepositively misleading. Syst Zool 27(4):401–410.
12. Boussau B, et al. (2014) Strepsiptera, phylogenomics and the long branch attractionproblem. PLoS ONE 9(10):e107709.
13. Straub SCK, et al. (2014) Phylogenetic signal detection from an ancient rapid radia-tion: Effects of noise reduction, long-branch attraction, and model selection in crownclade Apocynaceae. Mol Phylogenet Evol 80:169–185.
14. Heath TA, Hedtke SM, Hillis DM (2008) Taxon sampling and the accuracy of phylo-genetic analyses. J Syst Evol 46(3):239–257.
15. Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: The beginning ofincongruence? Trends Genet 22(4):225–231.
16. Roure B, Baurain D, Philippe H (2013) Impact of missing data on phylogenies inferredfrom empirical phylogenomic data sets. Mol Biol Evol 30(1):197–214.
17. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attractionartefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol7(Suppl 1):S4.
18. Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNAsequences. Lect Math Life Sci 17:57–86.
19. Philippe H, Delsuc F, Brinkmann H, Lartillot N (2005) Phylogenomics. Annu Rev EcolEvol Syst 36:541–562.
20. Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of ortho-logs inference projects and methods. PLOS Comput Biol 5(1):e1000262.
21. Gabaldón T (2008) Large-scale assignment of orthology: Back to phylogenetics? Ge-nome Biol 9(10):235.
22. Borchiellini C, et al. (2001) Sponge paraphyly and the origin of Metazoa. J Evol Biol14(1):171–179.
23. Sperling EA, Pisani D, Peterson KJ (2007) Poriferan paraphyly and its implications forPrecambrian paleobiology. Geol Soc Lond Spec Publ 286:355–368.
24. Sperling EA, Peterson KJ, Pisani D (2009) Phylogenetic-signal dissection of nuclearhousekeeping genes supports the paraphyly of sponges and the monophyly of Eu-metazoa. Mol Biol Evol 26(10):2261–2274.
25. Struck TH (2014) TreSpEx-detection of misleading signal in phylogenetic reconstruc-tions based on tree information. Evol Bioinform Online 10:51–67.
26. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: Phylogenetic re-construction with infinite mixtures of profiles in a parallel environment. Syst Biol62(4):611–615.
27. van Soest RWM (1984) Deficient Merlia normani Kirkpatrick, 1908, from the Curacaoreefs, with a discussion on the phylogenetic interpretation of sclerosponges. ContribZool 54(2):211–219.
28. Gazave E, et al. (2012) No longer Demospongiae: Homoscleromorpha formal nomi-nation as a fourth class of Porifera. Hydrobiologia 687(1):3–10.
29. Dohrmann M, Janussen D, Reitner J, Collins AG, Wörheide G (2008) Phylogeny andevolution of glass sponges (Porifera, Hexactinellida). Syst Biol 57(3):388–405.
30. Voigt O, Adamski M, Sluzek K, Adamska M (2014) Calcareous sponge genomes revealcomplex evolution of α-carbonic anhydrases and two key biomineralization enzymes.BMC Evol Biol 14(1):230.
31. Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection.Syst Biol 51(3):492–508.
32. Bleidorn C, et al. (2009) On the phylogenetic position of Myzostomida: Can 77 genesget it wrong? BMC Evol Biol 9(1):150.
33. Edwards SV (2009) Natural selection and phylogenetic analysis. Proc Natl Acad Sci USA106(22):8799–8800.
34. Nesnidal MP, Helmkampf M, Bruchhaus I, El-Matbouli M, Hausdorf B (2013) Agent ofwhirling disease meets orphan worm: phylogenomic analyses firmly place Myxozoa inCnidaria. PLoS ONE 8(1):e54576.
35. Ax P (1996) Multicellular Animals: A New Approach to the Phylogenetic Order inNature (Springer, Berlin).
36. Nielsen C (2008) Six major steps in animal evolution: Are we derived sponge larvae?Evol Dev 10(2):241–257.
37. Srivastava M, et al. (2010) The Amphimedon queenslandica genome and the evolu-tion of animal complexity. Nature 466(7307):720–726.
38. Mah JL, Christensen-Dalsgaard KK, Leys SP (2014) Choanoflagellate and choanocytecollar-flagellar systems and the assumption of homology. Evol Dev 16(1):25–37.
39. Osigus H-J, Eitel M, Bernt M, Donath A, Schierwater B (2013) Mitogenomics at thebase of Metazoa. Mol Phylogenet Evol 69(2):339–351.
40. Halanych KM (1999) Metazoan phylogeny and the shifting comparative framework.Recent Developments in Comparative Endocrinology andNeurobiology, eds Roubos EW,Wendelaar-Bonga SE, Vaudry H, De Loof A (Shaker, Maastrict, The Netherlands), pp 3–7.
41. Dunn CW, Giribet G, Edgecombe GD, Hejnol A (2014) Animal phylogeny and itsevolutionary implications. Annu Rev Ecol Evol Syst 45:371–395.
42. Liebeskind BJ, Hillis DM, Zakon HH (2015) Convergence of ion channel genomecontent in early animal evolution. Proc Natl Acad Sci USA 112(8):E846–E851.
43. Moroz LL (2015) Convergent evolution of neural systems in ctenophores. J Exp Biol218(Pt 4):598–611.
44. Halanych KM (2015) The ctenophore lineage is older than sponges? That cannot beright! Or can it? J Exp Biol 218(Pt 4):592–597.
45. Brown T, Howe C, Zhang A, Pyrkosz Q, Brom AB (2012) A reference-free algorithm forcomputational normalization of shotgun sequencing data. arXiv:1203.4802.
46. Haas BJ, et al. (2013) De novo transcript sequence reconstruction from RNA-seq usingthe Trinity platform for reference generation and analysis. Nat Protoc 8(8):1494–1512.
47. Margulies M, et al. (2005) Genome sequencing in microfabricated high-density pi-colitre reactors. Nature 437(7057):376–380.
48. Ebersberger I, Strauss S, von Haeseler A (2009) HaMStR: Profile hidden markov modelbased search for orthologs in ESTs. BMC Evol Biol 9:157.
49. Stamatakis A (2014) RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313.
50. R Core Development Team (2014) R: A Language and Environment for StatisticalComputing (R Foundation for Statistical Computing, Vienna, Austria).
51. Telford MJ, et al. (2013) Phylogenomic analysis of echinoderm class relationshipssupports Asterozoa. Proc Biol Sci 281(1786):20140479.
52. Kück P, Struck TH (2014) BaCoCa—A heuristic software tool for the parallel assess-ment of sequence biases in hundreds of gene and taxon partitions. Mol PhylogenetEvol 70:94–98.
53. Zhong M, et al. (2011) Detecting the symplesiomorphy trap: A multigene phyloge-netic analysis of terebelliform annelids. BMC Evol Biol 11(1):369.
54. Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A (2014) Selecting optimal par-titioning schemes for phylogenomic datasets. BMC Evol Biol 14:82.
55. Thorley JL, Wilkinson M (1999) Testing the phylogenetic stability of early tetrapods.J Theor Biol 200(3):343–344.
56. Smith SA, Dunn CW (2008) Phyutility: A phyloinformatics tool for trees, alignmentsand molecular data. Bioinformatics 24(5):715–716.
57. Shimodaira H, Hasegawa M (2001) CONSEL: For assessing the confidence of phylo-genetic tree selection. Bioinformatics 17(12):1246–1247.
5778 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al.
Supporting InformationWhelan et al. 10.1073/pnas.1503453112SI MethodsTranscriptome Sequencing. Frozen aliquots of Acanthoeca specta-bilis (ATCC PRA-103) and Salpingoeca pyxidium (ATCC 50929)cultures were purchased from ATCC and a 10-μL aliquot of eachculture was used for RNA extraction with the RNAqueous MicroRNA extraction kit (Life Technologies). Notably, Acanthoecaspectabilis was grown in a xenic culture with the bacteria Klebsiellapneumoniae (ATCC 700831) and Enterobacter aerogenes (ATCC13048) and Salpingoeca pyxidium was grown in a xenic culturewith Klebsiella pneumoniae (ATCC 700831). Tissue cuttings weretaken from the apical portion of the sponges Latrunculia apicalis,Kirpatrickia variolosa, Hyalonema populiferum, and Rossella fibulata.One “orb” of the unusual hexactinellid Sympagella nux was takenfor RNA extractions. A tissue clip was taken from the margin ofthe bell of the deep-sea cnidarian Periphylla periphylla. RNAwas extracted from all sponges and P. periphylla with TRIzol(invitrogen). The Qiagen RNeasy kit and on-column DNase di-gestion was used to purify TRIzol extracted RNA. cDNA librariesof all eight taxa were constructed with the SMART cDNA libraryconstruction kit (Clontech Laboratories). cDNA library constructionfollowed manufacturer’s instructions except that the provided 3′ oligowas replaced with the Cap-Trsa-CV oligo following Meyer et al. (1).The Advantage 2 PCR system (Clontech) was used to amplify full-length cDNA using a minimum number of PCR cycles (∼17–21).Amplified cDNA was shipped to Hudson Alpha Institute for Bio-technology in Huntsville, AL for Illumina library preparation and 2 ×100-base pair (A. spectabilis, S. pyxidium, K. variolosa, L. apicalis) or2 × 150-base pair (H. populiferum, R. fibulata, S. nux, P. periphylla)paired-end sequencing on an Illumina HiSEq. 2500 using ∼one-eighth lane per taxon.
Initial Orthology Inference Procedure. Before any filtering steps, allsequences were backed up and new line characters were removedfrom interleaved sequences with nentferner.pl (https://github.com/mptrsen/HaMStRad). All sequences shorter than 50 bpwere then removed. Any gene with fewer than 37 taxa (∼47%)
was also removed from the dataset to minimize missing data.HaMStR sometimes pulls two identical sequences from a singletaxon for any given gene so the script uniqHaplo.pl (raven.iab.alaska.edu/∼ntakebay/teaching/programming/perl-scripts/uniqHaplo.pl) wasused to remove duplicate sequences. As an initial data-filteringstep before sequence alignment, if an ambiguous amino acid (i.e.,an X) was present in the first 20 or last 20 amino acids then the 5′ or3′ end, respectively, was trimmed to this X as it may indicate se-quencing errors.Sequences were aligned using MAFFT (2) with the “auto” and
“localpair” parameters and 1,000 maximum iterations. MAFFToutputs sequences in an interleaved format so nentferner.pl wasagain used to remove newline characters from sequences. Un-alignable regions were then removed with aliscore (3) and alicut(4). Alignment columns with only gaps were subsequently re-moved, and any gene with an alignment less than 50-bp longafter trimming was removed. For each gene, a custom javascript(https://github.com/kmkocot) then removed any sequence thatdid not overlap other sequences by at least 20 amino acids tominimize the number of end gaps in alignments that would likelynot add to information content in phylogenetic inference. Fi-nally, any gene that had fewer than 37 taxa after alignment andtrimming was removed to further minimize missing data.The last step removed putative paralogs with PhyloTreePrunner
(5), which is a tree-based method. Gene trees for each OG wereinferred using FastTreeMP (6) with the “slow” and “gamma”parameters. Gene trees and corresponding OGs were input intoPhyloTreePrunner and inferred paralogs were removed fromeach OG. Alignments of each OG were concatenated usingFASconCAT v1.0 (7). An automated wrapper script for thisprocedure is available from https://github.com/kmkocot.
OG Annotation. To ensure no one gene class dominated ourmatrices, OGs were annotated with PFAM domains (8) using theTrinotate pipeline (trinotate.github.io). An annotation spread-sheet was placed on figshare (doi: 10.6084/m9.figshare.1334306)
1. Meyer E, et al. (2009) Sequencing and de novo analysis of a coral larval transcriptomeusing 454 GSFlx. BMC Genomics 10:219.
2. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7:Improvements in performance and usability. Mol Biol Evol 30(4):772–780.
3. Misof B, Misof K (2009) A Monte Carlo approach successfully identifies randomness inmultiple sequence alignments: A more objective means of data exclusion. Syst Biol58(1):21–34.
4. Kück P (2009) ALICUT: A Perlscript Which Cuts ALISCORE Identified RSS (Department ofBioinformatics, Zoologisches Forschungsmuseum, Bonn, Germany).
5. Kocot KM, Citarella MR, Moroz LL, Halanych KM (2013) PhyloTreePruner: A phyloge-netic tree-based approach for selection of orthologous sequences for phylogenomics.Evol Bioinform Online 9(3940):429–435.
6. Price MN, Dehal PS, Arkin AP (2010) FastTree 2—Approximately maximum-likelihoodtrees for large alignments. PLoS ONE 5(3):e9490.
7. Kück P, Meusemann K (2010) FASconCAT: Convenient handling of data matrices. MolPhylogenet Evol 56(3):1115–1118.
8. Finn RD, et al. (2014) Pfam: the protein families database. Nucleic Acids Res42(Database issue):D222–D230.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 1 of 11
Saccoglossus kowalevskii
Drosophila melanogaster
Homo sapiens
Corticium candelabrum
Physalia physalis
Eunicella verrucosa
Ministeria vibrans
Oscarella carmela
Strongylocentrotus purpuratus
Sycon coactum
Hyalonema populiferum
Mortierella verticillata
Salpingoeca pyxidium
Ephydatia muelleri
Aphrocallistes vastus
Acanthoeca spectabilis
Tubulanus polymorphus
Petromyzon marinus
Aurelia aurita
Amoebidium parasiticum
Hydra viridissima
Capitella teleta
Danio rerio
Dryodora glandiformis
Latrunculia apicalis
Neurospora crassa
Sympagella nux
Bolinopsis infundibulum
Mnemiopsis leidyi
Sphaeroforma arctica
Beroe abyssicola
Hemithiris psittacea
Hydra vulgaris
Ircinia fasciculata
Platygyra carnosa
Salpingoeca rosetta
Euplokamis dunlapae
Pleurobrachia sp.
Hormathia digitata
Petrosia ficiformis
Spizellomyces punctatus
Monosiga ovata
Allomyces macrogynus
Abylopsis tetragona
Lottia gigantea
Bolocera tuediae
Sycon ciliatum
Vallicula sp.
Trichoplax adhaerens
Rhizopus oryzae
Aspergillus fumigatus
Nanomia bijuga
Acropora digitifera
Nematostella vectensis
Tethya wilhelmaPseudospongosorites suberitoides
Monosiga brevicollis
Rossella fibulata
Pleurobrachia bachei
Spongilla alba
Mertensiidae sp.
Coeloplana astericola
Craseoa lathetica
Periphylla periphylla
Daphnia pulex
Crella elegans
Agalma elegans
Kirkpatrickia variolosa
Saccharomyces cerevisiae
Aiptasia pallida
Amphimedon queenslandica
Capsaspora owczarzaki
Hydra oligactis
Chondrilla nucula
Lithobius forficatusPriapulus caudatus
99
99
44
79
63
70
95
36
0.4
Hyalonema populiferum
Monosiga brevicollis
Eunicella verrucosa
Pseudospongosorites suberitoides
Ephydatia muelleri
Spizellomyces punctatus
Salpingoeca pyxidium
Bolocera tuediae
Latrunculia apicalis
Sympagella nux
Mnemiopsis leidyi
Agalma elegansHydra vulgaris
Mertensiidae sp.
Trichoplax adhaerens
Sycon ciliatum
Pleurobrachia sp.
Aurelia aurita
Craseoa lathetica
Lottia gigantea
Euplokamis dunlapae
Amoebidium parasiticum
Oscarella carmela
Salpingoeca rosetta
Aspergillus fumigatus
Acanthoeca spectabilis
Kirkpatrickia variolosa
Corticium candelabrum
Vallicula sp.
Crella elegans
Acropora digitifera
Physalia physalis
Chondrilla nucula
Coeloplana astericola
Nematostella vectensis
Rossella fibulata
Ircinia fasciculata
Sycon coactum
Ministeria vibrans
Allomyces macrogynus
Hemithiris psittacea
Drosophila melanogaster
Saccoglossus kowalevskii
Pleurobrachia bachei
Amphimedon queenslandica
Petromyzon marinus
Aphrocallistes vastus
Monosiga ovata
Hormathia digitata
Hydra viridissima
Sphaeroforma arctica
Strongylocentrotus purpuratus
Spongilla alba
Priapulus caudatus
Tubulanus polymorphus
Nanomia bijuga
Periphylla periphylla
Platygyra carnosa
Saccharomyces cerevisiae
Homo sapiens
Daphnia pulex
Beroe abyssicola
Neurospora crassa
Capsaspora owczarzaki
Mortierella verticillataRhizopus oryzae
Abylopsis tetragona
Bolinopsis infundibulum
Aiptasia pallida
Petrosia ficiformis
Danio rerio
Tethya wilhelma
Capitella teleta
Lithobius forficatus
Hydra oligactis
Dryodora glandiformis
99
55
39
90
99
55
77
99
79
0.4
Ministeria vibrans
Aphrocallistes vastus
Platygyra carnosa
Spizellomyces punctatus
Sympagella nux
Acropora digitifera
Capsaspora owczarzaki
Homo sapiens
Mertensiidae sp.
Hydra oligactis
Hyalonema populiferum
Lottia gigantea
Ephydatia muelleri
Pleurobrachia bachei
Chondrilla nuculaPetrosia ficiformis
Sycon coactum
Pleurobrachia sp.
Rhizopus oryzae
Abylopsis tetragona
Sphaeroforma arctica
Priapulus caudatus
Physalia physalis
Oscarella carmela
Amphimedon queenslandica
Hemithiris psittacea
Dryodora glandiformis
Sycon ciliatum
Agalma elegans
Monosiga ovata
Daphnia pulex
Beroe abyssicola
Saccoglossus kowalevskii
Corticium candelabrum
Latrunculia apicalis
Nanomia bijuga
Allomyces macrogynus
Bolinopsis infundibulum
Bolocera tuediae
Danio rerio
Hormathia digitata
Aurelia aurita
Ircinia fasciculata
Coeloplana astericola
Strongylocentrotus purpuratus
Capitella teleta
Hydra vulgaris
Mnemiopsis leidyi
Euplokamis dunlapae
Lithobius forficatus
Trichoplax adhaerens
Rossella fibulata
Hydra viridissima
Tubulanus polymorphus
Aiptasia pallida
Amoebidium parasiticum
Nematostella vectensis
Salpingoeca rosetta
Petromyzon marinus
Vallicula sp.
Spongilla alba
Crella elegans
Periphylla periphylla
Tethya wilhelma
Mortierella verticillata
Eunicella verrucosa
Craseoa lathetica
Pseudospongosorites suberitoides
Kirkpatrickia variolosa
Drosophila melanogaster
94
52
90
82
90
98
0.2
Tubulanus polymorphus
Pleurobrachia sp.
Hydra vulgaris
Abylopsis tetragona
Agalma elegans
Aiptasia pallida
Coeloplana astericola
Corticium candelabrum
Euplokamis dunlapae
Nematostella vectensis
Ircinia fasciculata
Hydra viridissima
Vallicula sp.
Acropora digitiferaAurelia aurita
Amphimedon queenslandica
Strongylocentrotus purpuratus
Amoebidium parasiticum
Mortierella verticillata
Ministeria vibrans
Hyalonema populiferum
Aphrocallistes vastus
Hormathia digitata
Bolinopsis infundibulum
Oscarella carmela
Ephydatia muelleri
Beroe abyssicola
Eunicella verrucosa
Drosophila melanogaster
Trichoplax adhaerens
Allomyces macrogynus
Pseudospongosorites suberitoides
Petrosia ficiformis
Periphylla periphylla
Danio rerio
Rhizopus oryzae
Tethya wilhelma
Kirkpatrickia variolosa
Lithobius forficatus
Homo sapiens
Petromyzon marinus
Sympagella nuxRossella fibulata
Sphaeroforma arctica
Dryodora glandiformis
Latrunculia apicalis
Hemithiris psittacea
Capitella teleta
Daphnia pulex
Hydra oligactis
Platygyra carnosa
Nanomia bijuga
Spongilla alba
Pleurobrachia bachei
Bolocera tuediae
Sycon ciliatum
Salpingoeca rosetta
Priapulus caudatus
Capsaspora owczarzaki
Sycon coactum
Mertensiidae sp.
Lottia gigantea
Craseoa lathetica
Chondrilla nucula
Spizellomyces punctatus
Saccoglossus kowalevskii
Monosiga ovata
Physalia physalis
Mnemiopsis leidyi
Crella elegans
82
64
86
82
94
0.2
Drosophila melanogaster
Lithobius forficatus
Hemithiris psittacea
Agalma elegans
Trichoplax adhaerensHyalonema populiferum
Lottia gigantea
Abylopsis tetragona
Mnemiopsis leidyi
Daphnia pulex
Danio rerio
Nematostella vectensis
Hydra oligactis
Amphimedon queenslandica
Crella elegans
Chondrilla nucula
Acropora digitifera
Saccoglossus kowalevskii
Tethya wilhelma
Beroe abyssicola
Priapulus caudatus
Sympagella nux
Pseudospongosorites suberitoides
Eunicella verrucosa
Spongilla alba
Bolocera tuediae
Euplokamis dunlapae
Ircinia fasciculata
Pleurobrachia bachei
Nanomia bijuga
Tubulanus polymorphus
Bolinopsis infundibulum
Monosiga ovata
Aurelia aurita
Hydra viridissima
Platygyra carnosa
Hormathia digitata
Mertensiidae sp.
Capitella teleta
Dryodora glandiformis
Salpingoeca rosetta
Ephydatia muelleri
Pleurobrachia sp.
Aiptasia pallida
Latrunculia apicalis
Hydra vulgaris
Physalia physalis
Periphylla periphylla
Corticium candelabrum
Sycon coactum
Kirkpatrickia variolosa
Petromyzon marinus
Rossella fibulata
Homo sapiens
Vallicula sp.
Craseoa lathetica
Petrosia ficiformis
Oscarella carmela
Strongylocentrotus purpuratus
Coeloplana astericola
Aphrocallistes vastus
Sycon ciliatum
53
68
99
99
89
99
0.2Monosiga ovata
Crella elegans
Daphnia pulex
Sycon ciliatum
Saccoglossus kowalevskii
Trichoplax adhaerens
Pleurobrachia sp.
Capsaspora owczarzaki
Nematostella vectensis
Latrunculia apicalis
Sphaeroforma arctica
Hydra oligactis
Hydra viridissima
Physalia physalis
Ministeria vibrans
Spongilla alba
Drosophila melanogasterLithobius forficatus
Rhizopus oryzae
Petrosia ficiformisPseudospongosorites suberitoides
Amoebidium parasiticum
Bolinopsis infundibulum
Tubulanus polymorphus
Sympagella nux
Hydra vulgaris
Kirkpatrickia variolosa
Rossella fibulata
Strongylocentrotus purpuratus
Euplokamis dunlapae
Petromyzon marinusHomo sapiens
Dryodora glandiformis
Aurelia aurita
Salpingoeca rosetta
Ircinia fasciculata
Nanomia bijuga
Hormathia digitata
Allomyces macrogynus
Hyalonema populiferum
Bolocera tuediae
Mertensiidae sp.
Aphrocallistes vastus
Platygyra carnosa
Craseoa lathetica
Coeloplana astericola
Mortierella verticillata
Capitella teleta
Aiptasia pallida
Tethya wilhelma
Mnemiopsis leidyi
Acropora digitifera
Corticium candelabrum
Abylopsis tetragona
Priapulus caudatus
Oscarella carmela
Hemithiris psittacea
Ephydatia muelleri
Vallicula sp.
Beroe abyssicola
Agalma elegans
Pleurobrachia bachei
Chondrilla nucula
Amphimedon queenslandica
Danio rerio
Lottia gigantea
Periphylla periphylla
Sycon coactum
Spizellomyces punctatus
Eunicella verrucosa
71
44
95
65
48
57
65
97
83
58
99
98
99
0.2
A B C
D E F
Fig. S1. Maximum-likelihood topologies. (A) Full dataset (Fig. 2, dataset 1). (B) Dataset with certain paralogs removed (Fig. 2, dataset 2). (C) Dataset with certain paralogs and taxa with high LB scores removed (Fig. 2, dataset 3). (D) Dataset with certain paralogs and taxa and gene with high LB scores removed (Fig. 2, dataset 4). (E) Dataset with certain paralogs, taxa and genes withhigh LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 5). (F) Dataset with slowest evolving half of genes after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 6). Bootstrap values for each node are 100% unless otherwise noted.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 2 of 11
Chondrilla nucula
Pleurobrachia sp.
Agalma elegans
Pseudospongosorites suberitoides
Lithobius forficatus
Dryodora glandiformis
Hydra vulgaris
Sympagella nux
Daphnia pulex
Trichoplax adhaerens
Mertensiidae sp.
Monosiga ovata
Capitella teleta
Crella elegans
Abylopsis tetragona
Hormathia digitata
Saccoglossus kowalevskii
Bolinopsis infundibulum
Ministeria vibrans
Tubulanus polymorphus
Spongilla alba
Periphylla periphyllaAurelia aurita
Lottia gigantea
Amphimedon queenslandica
Mortierella verticillata
Platygyra carnosa
Physalia physalis
Tethya wilhelma
Aphrocallistes vastus
Aiptasia pallida
Rossella fibulata
Spizellomyces punctatus
Bolocera tuediae
Euplokamis dunlapae
Corticium candelabrum
Ircinia fasciculata
Mnemiopsis leidyi
Nanomia bijuga
Amoebidium parasiticum
Vallicula sp.
Kirkpatrickia variolosa
Beroe abyssicola
Hemithiris psittacea
Hydra oligactis
Petrosia ficiformis
Sphaeroforma arctica
Petromyzon marinus
Pleurobrachia bachei
Sycon ciliatum
Hydra viridissima
Priapulus caudatus
Oscarella carmela
Drosophila melanogaster
Strongylocentrotus purpuratus
Latrunculia apicalis
Coeloplana astericola
Acropora digitifera
Sycon coactum
Allomyces macrogynus
Eunicella verrucosa
Danio rerio
Craseoa lathetica
Rhizopus oryzae
Capsaspora owczarzaki
Homo sapiens
Hyalonema populiferum
Ephydatia muelleri
Salpingoeca rosetta
Nematostella vectensis
85
91
83
85
94
99
88
55
99
98
99
99
93
85
98
53
0.2
Monosiga brevicollis
Tethya wilhelma
Pleurobrachia sp.Coeloplana astericola
Pleurobrachia bachei
Spongilla alba
Eunicella verrucosa
Homo sapiens
Corticium candelabrum
Sycon ciliatum
Mertensiidae sp.
Priapulus caudatus
Salpingoeca rosetta
Periphylla periphylla
Dryodora glandiformis
Rhizopus oryzae
Strongylocentrotus purpuratus
Bolinopsis infundibulum
Nematostella vectensis
Nanomia bijuga
Daphnia pulex
Platygyra carnosa
Hemithiris psittacea
Salpingoeca pyxidium
Latrunculia apicalis
Saccharomyces cerevisiae
Mortierella verticillata
Ephydatia muelleri
Capitella teleta
Neurospora crassa
Beroe abyssicola
Acropora digitifera
Kirkpatrickia variolosa
Hydra oligactisHydra vulgaris
Sphaeroforma arctica
Lottia gigantea
Bolocera tuediae
Lithobius forficatus
Allomyces macrogynus
Rossella fibulataSympagella nux
Agalma elegans
Saccoglossus kowalevskii
Oscarella carmela
Abylopsis tetragona
Amoebidium parasiticum
Aurelia aurita
Chondrilla nucula
Petromyzon marinus
Ministeria vibrans
Sycon coactum
Tubulanus polymorphus
Aspergillus fumigatus
Euplokamis dunlapae
Ircinia fasciculata
Crella elegans
Craseoa lathetica
Petrosia ficiformis
Hydra viridissima
Aspergillus fumigatus
Monosiga ovata
Hyalonema populiferum
Mnemiopsis leidyi
Spizellomyces punctatus
Amphimedon queenslandica
Aphrocallistes vastus
Danio rerio
Hormathia digitata
Vallicula sp.
Pseudospongosorites suberitoides
Trichoplax adhaerens
Physalia physalis
Capsaspora owczarzaki
Aiptasia pallida
Drosophila melanogaster
30
99
90
55
54
66
88
83
0.4
Aurelia aurita
Spongilla alba
Kirkpatrickia variolosa
Mnemiopsis leidyi
Abylopsis tetragona
Rhizopus oryzae
Craseoa lathetica
Sphaeroforma arctica
Allomyces macrogynus
Lottia gigantea
Bolinopsis infundibulum
Salpingoeca rosetta
Agalma elegans
Hydra viridissima
Salpingoeca pyxidium
Petromyzon marinus
Bolocera tuediae
Ircinia fasciculata
Amoebidium parasiticum
Homo sapiens
Hydra oligactis
Capsaspora owczarzaki
Danio rerio
Drosophila melanogaster
Latrunculiaapicalis
Acanthoeca spectabilis
Trichoplax adhaerens
Hemithiris psittacea
Monosiga ovata
Beroe abyssicola
Saccoglossus kowalevskii
Nanomia bijuga
Euplokamis dunlapae
Dryodora glandiformis
Mortierella verticillata
Tubulanus polymorphus
Oscarella carmela
Pseudospongosorites suberitoides
Neurospora crassa
Chondrilla nucula
Amphimedon queenslandica
Corticium candelabrum
Sympagella nux
Pleurobrachia bachei
Platygyra carnosa
Hormathia digitata
Crella elegans
Strongylocentrotus purpuratus
Petrosia ficiformis
Spizellomyces punctatus
Monosiga brevicollis
Hyalonema populiferum
Vallicula sp.
Ministeria vibrans
Mertensiidae sp.
Hydra vulgaris
Coeloplana astericola
Lithobius forficatus
Tethya wilhelma
Acropora digitifera
Aiptasia pallida
Capitella teleta
Physalia physalis
Periphylla periphylla
Sycon ciliatum
Aphrocallistes vastusRossella fibulata
Aspergillus fumigatus
Sycon coactum
Saccharomyces cerevisiae
Daphnia pulex
Eunicella verrucosa
Ephydatia muelleri
Priapulus caudatus
Nematostella vectensis
Pleurobrachia sp.
30
98
97
99
63
74
90
83
99
55
97
0.3
Aurelia aurita
Hormathia digitata
Rossella fibulata
Beroe abyssicola
Drosophila melanogaster
Craseoa lathetica
Petromyzon marinus
Hydra oligactis
Spongilla alba
Strongylocentrotus purpuratus
Abylopsis tetragona
Ministeria vibrans
Danio rerio
Pleurobrachia bachei
Lottia gigantea
Eunicella verrucosa
Hydra viridissima
Tethya wilhelma
Mortierella verticillata
Nanomia bijuga
Sycon coactum
Oscarella carmela
Vallicula sp.
Spizellomyces punctatus
Mnemiopsis leidyi
Aiptasia pallida
Allomyces macrogynus
Mertensiidae sp.
Petrosia ficiformisPseudospongosorites suberitoides
Rhizopus oryzae
Amoebidium parasiticum
Capitella teleta
Crella elegans
Monosiga ovata
Acropora digitifera
Aphrocallistes vastus
Bolinopsis infundibulum
Capsaspora owczarzakiSalpingoeca rosetta
Tubulanus polymorphus
Agalma elegans
Corticium candelabrum
Hyalonema populiferum
Hydra vulgaris
Euplokamis dunlapae
Lithobius forficatus
Bolocera tuediae
Pleurobrachia sp.
Amphimedon queenslandica
Hemithiris psittacea
Ircinia fasciculata
Dryodora glandiformis
Kirkpatrickia variolosa
Sphaeroforma arctica
Daphnia pulex
Homo sapiens
Platygyra carnosa
Ephydatia muelleriChondrilla nucula
Saccoglossus kowalevskii
Periphylla periphylla
Trichoplax adhaerens
Sympagella nux
Latrunculia apicalis
Coeloplana astericola
Physalia physalis
Sycon ciliatum
Nematostella vectensis
Priapulus caudatus
96
95
88
61
9965
94
79
0.2
Ephydatia muelleri
Physalia physalis
Pseudospongosorites suberitoides
Saccoglossus kowalevskii
Trichoplax adhaerens
Corticium candelabrum
Craseoa lathetica
Chondrilla nucula
Agalma elegans
Amphimedon queenslandica
Oscarella carmelaSycon coactum
Priapulus caudatus
Abylopsis tetragona
Crella elegans
Danio rerio
Hormathia digitata
Salpingoeca rosetta
Sycon ciliatum
Vallicula sp.
Capitella teleta
Pleurobrachia bacheiPleurobrachia sp.
Coeloplana astericola
Ircinia fasciculata
Dryodora glandiformis
Hemithiris psittaceaPetromyzon marinus
Eunicella verrucosaAcropora digitifera
Hydra viridissima
Platygyra carnosa
Bolocera tuediae
Latrunculia apicalis
Bolinopsis infundibulum
Drosophila melanogaster
Nanomia bijuga
Aphrocallistes vastus
Beroe abyssicola
Hydra oligactis
Aiptasia pallida
Lithobius forficatus
Monosiga ovata
Petrosia ficiformis
Nematostella vectensis
Homo sapiens
Mertensiidae sp.
Daphnia pulex
Kirkpatrickia variolosa
Strongylocentrotus purpuratus
Mnemiopsis leidyi
Tubulanus polymorphus
Tethya wilhelma
Aurelia aurita
Hyalonema populiferum
Rossella fibulata
Spongilla alba
Periphylla periphylla
Euplokamis dunlapae
Sympagella nux
Hydra vulgaris
Lottia gigantea
89
82
99
99
98
60
0.2
Capitella teleta
Pleurobrachia sp.
Sympagella nux
Aspergillus fumigatus
Sycon coactum
Homo sapiens
Ministeria vibrans
Agalma elegans
Lottia gigantea
Corticium candelabrum
Spongilla alba
Physalia physalis
Neurospora crassa
Hyalonema populiferum
Mortierella verticillata
Craseoa lathetica
Hemithiris psittacea
Allomyces macrogynus
Aphrocallistes vastus
Eunicella verrucosa
Abylopsis tetragona
Coeloplana astericola
Latrunculia apicalis
Rhizopus oryzae
Hydra viridissima
Euplokamis dunlapae
Trichoplax adhaerens
Bolinopsis infundibulum
Ephydatia muelleri
Monosiga brevicollis
Salpingoeca pyxidium
Hydra oligactis
Amphimedon queenslandica
Sphaeroforma arctica
Aiptasia pallidaHormathia digitata
Saccharomyces cerevisiae
Bolocera tuediae
Sycon ciliatum
Ircinia fasciculata
Acanthoeca spectabilis
Priapulus caudatus
Amoebidium parasiticum
Spizellomyces punctatus
Nanomia bijuga
Nematostella vectensis
Periphylla periphylla
Petromyzon marinus
Rossella fibulata
Beroe abyssicola
Drosophila melanogaster
Tubulanus polymorphus
Oscarella carmela
Acropora digitiferaPlatygyra carnosa
Lithobius forficatus
Mnemiopsis leidyi
Hydra vulgaris
Monosiga ovata
Pseudospongosorites suberitoides
Daphnia pulex
Strongylocentrotus purpuratus
Salpingoeca rosetta
Mertensiidae sp.
Danio rerio
Aurelia aurita
Dryodora glandiformis
Tethya wilhelma
Vallicula sp.
Chondrilla nucula
Kirkpatrickia variolosa
Petrosia ficiformis
Capsaspora owczarzaki
Saccoglossus kowalevskii
Crella elegans
Pleurobrachia bachei
50
56
59
91
54
6
99
99
80
99
0.4
A B C
D E F
Fig. S2. Maximum-likelihood topologies. (A) Dataset with only genes with the lowest 50% of RCFV values after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 7). (B) Dataset with certain paralogs and heterogeneous genes removed (Fig. 2, dataset 8). (C) Dataset with certain paralogs, heterogeneous genes, and genes with high LB scoresremoved (Fig. 2, dataset 9). (D) Dataset with certain paralogs, heterogeneous genes, and taxa and genes with high LB scores removed (Fig. 2, dataset 10). (E) Dataset with certain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 11). (F) Dataset with certain and uncertain paralogs removed (Fig.2, dataset 12). Bootstrap values for each node are 100% unless otherwise noted.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 3 of 11
Platygyra carnosa
Tubulanus polymorphus
Chondrilla nucula
Strongylocentrotus purpuratus
Bolocera tuediae
Crella elegans
Sycon coactum
Nematostella vectensis
Latrunculia apicalis
Spongilla alba
Saccoglossus kowalevskii
Craseoa lathetica
Hyalonema populiferum
Physalia physalis
Ircinia fasciculata
Capsaspora owczarzaki
Bolinopsis infundibulum
Lithobius forficatus
Salpingoeca rosetta
Hydra vulgarisAgalma elegans
Nanomia bijuga
Vallicula sp.
Rossella fibulata
Pleurobrachia bachei
Mertensiidae sp.
Priapulus caudatus
Monosiga ovata
Rhizopus oryzae
Hemithiris psittacea
Corticium candelabrum
Coeloplana astericola
Tethya wilhelma
Beroe abyssicola
Trichoplax adhaerens
Oscarella carmela
Spizellomyces punctatus
Euplokamis dunlapae
Periphylla periphylla
Allomyces macrogynus
Aurelia aurita
Abylopsis tetragona
Ministeria vibrans
Eunicella verrucosa
Petromyzon marinusDanio rerio
Hydra oligactisHydra viridissima
Pseudospongosorites suberitoides
Mortierella verticillata
Sympagella nux
Acropora digitifera
Homo sapiens
Aphrocallistes vastus
Sphaeroforma arctica
Mnemiopsis leidyi
Ephydatia muelleri
Capitella teleta
Drosophila melanogaster
Kirkpatrickia variolosa
Amoebidium parasiticum
Sycon ciliatum
Daphnia pulex
Dryodora glandiformisPleurobrachia sp.
Lottia gigantea
Aiptasia pallidaHormathia digitata
Amphimedon queenslandicaPetrosia ficiformis
93
59
93
99
99
80
95
99
0.2
Craseoa lathetica
Coeloplana astericola
Monosiga ovata
Platygyra carnosaEunicella verrucosa
Amoebidium parasiticum
Dryodora glandiformis
Vallicula sp.
Nanomia bijuga
Mnemiopsis leidyi
Capsaspora owczarzaki
Aphrocallistes vastus
Salpingoeca rosetta
Petromyzon marinus
Tubulanus polymorphus
Crella elegans
Pseudospongosorites suberitoides
Mortierella verticillata
Kirkpatrickia variolosa
Abylopsis tetragona
Chondrilla nucula
Sphaeroforma arctica
Agalma elegans
Sycon coactum
Aurelia aurita
Daphnia pulex
Corticium candelabrum
Hydra oligactis
Danio rerio
Ircinia fasciculata
Lithobius forficatus
Capitella teleta
Bolinopsis infundibulum
Sycon ciliatum
Mertensiidae sp.
Oscarella carmela
Priapulus caudatus
Lottia gigantea
Hydra viridissima
Petrosia ficiformis
Sympagella nuxRossella fibulata
Hormathia digitata
Nematostella vectensis
Latrunculia apicalis
Hemithiris psittacea
Periphylla periphylla
Beroe abyssicola
Ephydatia muelleri
Ministeria vibrans
Spizellomyces punctatus
Saccoglossus kowalevskii
Tethya wilhelma
Rhizopus oryzaeAllomyces macrogynus
Physalia physalis
Hydra vulgaris
Drosophila melanogaster
Strongylocentrotus purpuratus
Spongilla alba
Amphimedon queenslandica
Pleurobrachia sp.
Acropora digitifera
Euplokamis dunlapae
Bolocera tuediae
Trichoplax adhaerens
Homo sapiens
Pleurobrachia bachei
Aiptasia pallida
Hyalonema populiferum
91
99
76
99
97
99
76
73
63
0.2
Petrosia ficiformis
Lithobius forficatus
Rossella fibulata
Priapulus caudatus
Periphylla periphylla
Amphimedon queenslandica
Saccoglossus kowalevskii
Hyalonema populiferum
Oscarella carmela
Bolocera tuediae
Daphnia pulex
Spongilla alba
Vallicula sp.
Pleurobrachia sp.
Acropora digitifera
Hydra viridissima
Crella elegans
Capitella teleta
Drosophila melanogaster
Eunicella verrucosa
Aphrocallistes vastus
Trichoplax adhaerens
Latrunculia apicalis
Danio rerio
Pleurobrachia bachei
Euplokamis dunlapae
Ircinia fasciculata
Lottia gigantea
Tethya wilhelma
Coeloplana astericola
Bolinopsis infundibulum
Monosiga ovata
Aiptasia pallida
Homo sapiens
Nematostella vectensis
Abylopsis tetragona
Beroe abyssicola
Hemithiris psittacea
Corticium candelabrum
Nanomia bijuga
Physalia physalis
Hormathia digitata
Craseoa lathetica
Sycon coactum
Dryodora glandiformis
Chondrilla nucula
Strongylocentrotus purpuratus
Kirkpatrickia variolosa
Aurelia aurita
Mnemiopsis leidyi
Ephydatia muelleri
Hydra vulgaris
Sycon ciliatum
Salpingoeca rosetta
Tubulanus polymorphus
Mertensiidae sp.
Agalma elegans
Pseudospongosorites suberitoides
Petromyzon marinus
Sympagella nux
Hydra oligactis
Platygyra carnosa
66
87
78
51
99
98
98
98
0.2
Amphimedon queenslandica
Pleurobrachia sp.
Hydra oligactisHydra vulgaris
Beroe abyssicola
Saccoglossus kowalevskii
Amoebidium parasiticum
Hormathia digitata
Danio rerio
Sphaeroforma arctica
Sycon ciliatum
Aurelia auritaPeriphylla periphylla
Aphrocallistes vastusRossella fibulata
Allomyces macrogynus
Dryodora glandiformis
Pleurobrachia bachei
Spizellomyces punctatus
Spongilla alba
Rhizopus oryzae
Mertensiidae sp.
Hyalonema populiferum
Monosiga ovata
Petromyzon marinus
Drosophila melanogaster
Bolinopsis infundibulum
Ministeria vibrans
Ircinia fasciculata
Nematostella vectensis
Sycon coactum
Kirkpatrickia variolosa
Latrunculia apicalis
Lithobius forficatus
Hemithiris psittacea
Physalia physalis
Platygyra carnosa
Coeloplana astericola
Nanomia bijuga
Hydra viridissima
Corticium candelabrum
Ephydatia muelleri
Tethya wilhelma
Lottia gigantea
Acropora digitifera
Abylopsis tetragona
Pseudospongosorites suberitoides
Strongylocentrotus purpuratus
Mortierella verticillata
Aiptasia pallida
Agalma elegans
Petrosia ficiformis
Craseoa lathetica
Capitella teleta
Mnemiopsis leidyi
Daphnia pulex
Priapulus caudatus
Euplokamis dunlapae
Homo sapiens
Salpingoeca rosetta
Oscarella carmela
Crella elegans
Eunicella verrucosa
Tubulanus polymorphus
Bolocera tuediae
Chondrilla nucula
Capsaspora owczarzaki
Sympagella nux
Vallicula sp.
Trichoplax adhaerens
94
86
69
78
87
62
83
98
64
97
60
62
99
84
0.2
Homo sapiens
Latrunculia apicalis
Aphrocallistes vastus
Ministeria vibrans
Petrosia ficiformis
Capitella teleta
Sycon ciliatum
Pseudospongosorites suberitoides
Petromyzon marinus
Hemithiris psittacea
Coeloplana astericola
Strongylocentrotus purpuratus
Beroe abyssicolaMnemiopsis leidyi
Drosophila melanogaster
Corticium candelabrum
Vallicula sp.
Aiptasia pallidaHormathia digitata
Priapulus caudatus
Ircinia fasciculata
Nematostella vectensis
Kirkpatrickia variolosa
Ephydatia muelleri
Bolocera tuediae
Oscarella carmela
Amoebidium parasiticum
Tubulanus polymorphus
Mertensiidae sp.
Sympagella nux
Acropora digitifera
Sphaeroforma arctica
Hyalonema populiferum
Crella elegans
Salpingoeca rosetta
Abylopsis tetragona
Rossella fibulata
Amphimedon queenslandica
Mortierella verticillata
Daphnia pulex
Spizellomyces punctatus
Dryodora glandiformis
Pleurobrachia bachei
Aurelia aurita
Platygyra carnosa
Pleurobrachia sp.
Monosiga ovata
Allomyces macrogynus
Capsaspora owczarzaki
Bolinopsis infundibulum
Nanomia bijuga
Trichoplax adhaerens
Euplokamis dunlapae
Danio rerio
Saccoglossus kowalevskii
Sycon coactum
Eunicella verrucosa
Periphylla periphylla
Spongilla alba
Craseoa lathetica
Lottia gigantea
Tethya wilhelma
Rhizopus oryzae
Hydra viridissima
Agalma elegans
Lithobius forficatus
Chondrilla nucula
Physalia physalis
Hydra oligactisHydra vulgaris
75
93
9975
40
99
78
45
73
8791
0.2
Agalma elegans
Ministeria vibrans
Saccoglossus kowalevskii
Pleurobrachia sp.
Mortierella verticillata
Sphaeroforma arctica
Abylopsis tetragona
Ircinia fasciculata
Crella elegans
Craseoa lathetica
Sympagella nux
Amphimedon queenslandica
Pleurobrachia bachei
Hormathia digitata
Coeloplana astericola
Monosiga brevicollis
Aphrocallistes vastus
Ephydatia muelleri
Mnemiopsis leidyi
Sycon coactum
Kirkpatrickia variolosa
Euplokamis dunlapae
Lottia gigantea
Chondrilla nucula
Hemithiris psittacea
Hydra vulgaris
Aspergillus fumigatus
Amoebidium parasiticum
Aurelia aurita
Neurospora crassa
Bolocera tuediae
Dryodora glandiformis
Capitella teleta
Hyalonema populiferum
Nematostella vectensis
Lithobius forficatus
Latrunculia apicalis
Petromyzon marinus
Monosiga ovata
Daphnia pulex
Periphylla periphylla
Spizellomyces punctatus
Priapulus caudatus
Platygyra carnosa
Bolinopsis infundibulum
Tubulanus polymorphus
Spongilla alba
Danio rerio
Physalia physalis
Nanomia bijuga
Allomyces macrogynus
Aiptasia pallida
Rhizopus oryzae
Sycon ciliatum
Saccharomyces cerevisiae
Eunicella verrucosa
Acropora digitifera
Oscarella carmela
Capsaspora owczarzaki
Acanthoeca spectabilis
Salpingoeca rosetta
Petrosia ficiformis
Tethya wilhelma
Rossella fibulata
Strongylocentrotus purpuratus
Corticium candelabrum
Vallicula sp.
Homo sapiens
Trichoplax adhaerens
Hydra viridissima
Pseudospongosorites suberitoides
Beroe abyssicola
Drosophila melanogaster
Hydra oligactis
Salpingoeca pyxidium
Mertensiidae sp.
99
99
9999
71
99
66
97
1
59
32
88
99
99
88
99
99
59
87
0.4
A B C
D E F
Fig. S3. Maximum-likelihood topologies. (A) Dataset with certain and uncertain paralogs and taxa with high LB scores removed (Fig. 2, dataset 13). (B) Dataset with certain and uncertain paralogs and taxa and gene with high LB scores removed (Fig. 2, dataset 14). (C) Dataset with certain and uncertain paralogs, taxa and genes with high LB scores, and all outgroups exceptchoanoflagellates removed (Fig. 2, dataset 15). (D) Dataset with slowest evolving half of genes after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 16). (E) Dataset with only genes with the lowest 50% of RCFV values after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 17).(F) Dataset with certain and uncertain paralogs and heterogeneous genes removed (Fig. 2, dataset 18). Bootstrap values for each node are 100% unless otherwise noted.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 4 of 11
Saccharomyces cerevisiae
Salpingoeca pyxidium
Agalma elegans
Hydra oligactis
Neurospora crassa
Tubulanus polymorphus
Capitella teletaLottia gigantea
Monosiga brevicollis
Spizellomyces punctatus
Bolinopsis infundibulum
Aurelia aurita
Pleurobrachia bachei
Bolocera tuediae
Ircinia fasciculata
Capsaspora owczarzaki
Chondrilla nucula
Oscarella carmela
Amphimedon queenslandica
Latrunculia apicalis
Priapulus caudatus
Ministeria vibrans
Periphylla periphylla
Mnemiopsis leidyi
Pseudospongosorites suberitoides
Platygyra carnosa
Hormathia digitata
Rossella fibulata
Trichoplax adhaerens
Vallicula sp.Monosiga ovata
Kirkpatrickia variolosa
Strongylocentrotus purpuratus
Hydra vulgaris
Acanthoeca spectabilis
Dryodora glandiformis
Tethya wilhelma
Lithobius forficatusDaphnia pulex
Crella elegans
Hydra viridissima
Rhizopus oryzae
Ephydatia muelleri
Sycon coactumSycon ciliatum
Spongilla alba
Hemithiris psittacea
Drosophila melanogaster
Eunicella verrucosa
Mortierella verticillata
Nanomia bijuga
Sphaeroforma arctica
Danio rerio
Hyalonema populiferum
Saccoglossus kowalevskii
Homo sapiens
Aiptasia pallida
Petromyzon marinus
Petrosia ficiformis
Acropora digitifera
Craseoa lathetica
Aphrocallistes vastus
Amoebidium parasiticum
Sympagella nux
Aspergillus fumigatus
Beroe abyssicola
Nematostella vectensis
Abylopsis tetragona
Allomyces macrogynus
Euplokamis dunlapae
Pleurobrachia sp.
Physalia physalis
Corticium candelabrum
Salpingoeca rosetta
Coeloplana astericola
Mertensiidae sp.
96
93
57
78
93
96
70
68
53
74
99
84
0.3
Tubulanus polymorphus
Hyalonema populiferum
Mnemiopsis leidyi
Rhizopus oryzae
Tethya wilhelma
Spizellomyces punctatus
Sphaeroforma arctica
Sympagella nuxAphrocallistes vastus
Capsaspora owczarzaki
Daphnia pulex
Sycon ciliatum
Aiptasia pallida
Nanomia bijuga
Strongylocentrotus purpuratus
Vallicula sp.
Hemithiris psittacea
Pleurobrachia bachei
Lithobius forficatus
Periphylla periphylla
Platygyra carnosa
Hydra vulgaris
Hormathia digitata
Hydra oligactis
Mertensiidae sp.
Salpingoeca rosetta
Trichoplax adhaerens
Euplokamis dunlapae
Nematostella vectensis
Homo sapiensDanio rerio
Ircinia fasciculata
Corticium candelabrum
Allomyces macrogynus
Physalia physalis
Agalma elegans
Latrunculia apicalis
Beroe abyssicola
Eunicella verrucosa
Pleurobrachia sp.
Petrosia ficiformis
Crella elegans
Amoebidium parasiticum
Chondrilla nucula
Monosiga ovata
Pseudospongosorites suberitoides
Sycon coactum
Aurelia aurita
Kirkpatrickia variolosa
Rossella fibulata
Ephydatia muelleri
Dryodora glandiformis
Abylopsis tetragona
Bolinopsis infundibulum
Petromyzon marinus
Spongilla alba
Priapulus caudatus
Ministeria vibrans
Lottia gigantea
Drosophila melanogaster
Acropora digitifera
Amphimedon queenslandica
Oscarella carmela
Coeloplana astericola
Saccoglossus kowalevskii
Mortierella verticillata
Bolocera tuediae
Capitella teleta
Craseoa lathetica
Hydra viridissima
70
9870
85
79
79
96
99
0.2
Euplokamis dunlapae
Petrosia ficiformis
Lottia giganteaCapitella teleta
Oscarella carmela
Mnemiopsis leidyi
Kirkpatrickia variolosa
Salpingoeca rosetta
Vallicula sp.
Agalma elegans
Amphimedon queenslandica
Tethya wilhelma
Sympagella nux
Coeloplana astericola
Aiptasia pallida
Daphnia pulex
Latrunculia apicalis
Monosiga ovata
Hemithiris psittacea
Saccoglossus kowalevskii
Mertensiidae sp.
Pleurobrachia sp.
Hydra viridissima
Danio rerio
Hydra vulgaris
Hyalonema populiferum
Pleurobrachia bachei
Physalia physalis
Dryodora glandiformis
Bolocera tuediae
Rossella fibulata
Hormathia digitata
Crella elegans
Nanomia bijuga
Priapulus caudatus
Periphylla periphylla
Eunicella verrucosa
Platygyra carnosa
Corticium candelabrum
Spongilla alba
Aurelia aurita
Bolinopsis infundibulum
Tubulanus polymorphus
Trichoplax adhaerens
Hydra oligactisAbylopsis tetragona
Ephydatia muelleriPseudospongosorites suberitoides
Ircinia fasciculata
Beroe abyssicola
Homo sapiens
Lithobius forficatus
Craseoa lathetica
Drosophila melanogaster
Petromyzon marinus
Chondrilla nucula
Strongylocentrotus purpuratus
Sycon ciliatum
Nematostella vectensis
Aphrocallistes vastus
Sycon coactum
Acropora digitifera
99
99
9873
77
87
97
99
54
97
0.2
Capitella teleta
Acropora digitifera
Physalia physalis
Homo sapiens
Craseoa lathetica
Lithobius forficatus
Periphylla periphyllaTrichoplax adhaerens
Aiptasia pallida
Corticium candelabrum
Bolocera tuediae
Priapulus caudatus
Bolinopsis infundibulum
Daphnia pulex
Beroe abyssicola
Aurelia aurita
Coeloplana astericola
Euplokamis dunlapae
Petrosia ficiformis
Lottia gigantea
Vallicula sp.
Drosophila melanogaster
Sycon ciliatum
Sympagella nux
Spongilla alba
Petromyzon marinus
Oscarella carmela
Sycon coactum
Pleurobrachia bachei
Hormathia digitata
Pleurobrachia sp.
Platygyra carnosa
Chondrilla nucula
Saccoglossus kowalevskii
Abylopsis tetragona
Amphimedon queenslandica
Hydra vulgaris
Ephydatia muelleri
Tethya wilhelma
Eunicella verrucosa
Hydra viridissima
Latrunculia apicalis
Crella elegans
Rossella fibulata
Hyalonema populiferum
Strongylocentrotus purpuratus
Nematostella vectensis
Hydra oligactis
Hemithiris psittacea
Aphrocallistes vastus
Pseudospongosorites suberitoides
Ircinia fasciculata
Agalma elegans
Dryodora glandiformis
Mertensiidae sp.
Nanomia bijuga
Kirkpatrickia variolosa
Mnemiopsis leidyi
Tubulanus polymorphus
Danio rerio
71
99
4484
97
86
0.2
Latrunculia apicalis
Corticium candelabrum
Mertensiidae sp.
Strongylocentrotus purpuratus
Capitella teleta
Platygyra carnosa
Agalma elegans
Nematostella vectensis
Sycon ciliatum
Pseudospongosorites suberitoides
Tethya wilhelma
Craseoa lathetica
Tubulanus polymorphus
Amphimedon queenslandica
Ircinia fasciculata
Pleurobrachia sp.
Daphnia pulex
Hyalonema populiferum
Lithobius forficatus
Physalia physalis
Bolocera tuediae
Dryodora glandiformis
Aurelia aurita
Spongilla alba
Euplokamis dunlapae
Danio rerio
Hydra oligactis
Saccoglossus kowalevskii
Nanomia bijuga
Vallicula sp.
Bolinopsis infundibulum
Petrosia ficiformis
Beroe abyssicola
Acropora digitifera
Hydra vulgarisHydra viridissima
Homo sapiens
Eunicella verrucosa
Oscarella carmela
Petromyzon marinus
Ephydatia muelleri
Mnemiopsis leidyi
Abylopsis tetragona
Chondrilla nucula
Drosophila melanogaster
Hemithiris psittacea
Kirkpatrickia variolosa
Lottia gigantea
Trichoplax adhaerens
Pleurobrachia bachei
Hormathia digitata
Crella elegans
Priapulus caudatus
Sycon coactum
Sympagella nux
Periphylla periphylla
Aphrocallistes vastus
Coeloplana astericola
Aiptasia pallida
Rossella fibulata
99
95
92
69
57
99
88
0.2
Trichoplax adhaerens
Platygyra carnosa
Lottia gigantea
Aurelia auritaAgalma elegans
Oscarella carmela
Capitella teleta
Daphnia pulex
Latrunculia apicalis
Priapulus caudatus
Abylopsis tetragona
Bolocera tuediae
Hydra viridissima
Pleurobrachia sp.
Dryodora glandiformis
Hormathia digitata
Corticium candelabrum
Chondrilla nucula
Sycon coactum
Craseoa lathetica
Mertensiidae sp.Bolinopsis infundibulum
Ircinia fasciculata
Spongilla alba
Hydra vulgaris
Amphimedon queenslandica
Aphrocallistes vastus
Hemithiris psittacea
Pleurobrachia bachei
Strongylocentrotus purpuratus
Beroe abyssicola
Danio rerio
Hydra oligactis
Homo sapiens
Hyalonema populiferum
Periphylla periphylla
Physalia physalis
Nanomia bijuga
Coeloplana astericola
Pseudospongosorites suberitoides
Petrosia ficiformis
Petromyzon marinus
Saccoglossus kowalevskii
Vallicula sp.
Mnemiopsis leidyi
Rossella fibulata
Nematostella vectensis
Sycon ciliatum
Kirkpatrickia variolosa
Tethya wilhelma
Ephydatia muelleri
Sympagella nux
Tubulanus polymorphus
Drosophila melanogaster
Lithobius forficatus
Euplokamis dunlapae
Acropora digitiferaEunicella verrucosa
Aiptasia pallida
Crella elegans
98
98
86
63
99
51
99
99
81
92
0.2
A B C
D E F
Fig. S4. Maximum-likelihood topologies. (A) Dataset with certain and uncertain paralogs, heterogeneous genes, and genes with high LB scores removed (Fig. 2, dataset 19). (B) Dataset with certain and uncertain paralogs, heterogeneous genes, and taxa and genes with high LB scores removed (Fig. 2, dataset 20). (C) Dataset with certain and uncertain” paralogs, heterogeneousgenes, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 21). (D) Dataset with certain paralogs, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 22). (E) Dataset with certain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 23). (F)Dataset with certain and uncertain paralogs, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 24). Bootstrap values for each node are 100% unless otherwise noted.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 5 of 11
Periphylla periphylla
Latrunculia apicalis
Aphrocallistes vastus
Eunicella verrucosa
Dryodora glandiformis
Danio rerioHomo sapiens
Chondrilla nucula
Euplokamis dunlapae
Mnemiopsis leidyi
Petrosia ficiformis
Sycon ciliatum
Corticium candelabrum
Rossella fibulata
Hyalonema populiferum
Sympagella nux
Vallicula sp.
Nematostella vectensis
Hydra vulgaris
Bolinopsis infundibulum
Kirkpatrickia variolosa
Acropora digitifera
Ircinia fasciculata
Mertensiidae sp.
Hydra viridissima
Sycon coactum
Aiptasia pallida
Lottia gigantea
Agalma elegans
Hormathia digitata
Pleurobrachia sp.
Craseoa lathetica
Lithobius forficatus
Tubulanus polymorphus
Amphimedon queenslandica
Strongylocentrotus purpuratus
Physalia physalis
Hemithiris psittacea
Drosophila melanogaster
Ephydatia muelleri
Capitella teleta
Hydra oligactis
Aurelia aurita
Beroe abyssicola
Saccoglossus kowalevskii
Abylopsis tetragona
Coeloplana astericola
Tethya wilhelma
Priapulus caudatus
Nanomia bijuga
Platygyra carnosa
Spongilla alba
Petromyzon marinus
Oscarella carmela
Pleurobrachia bachei
Trichoplax adhaerens
Daphnia pulex
Bolocera tuediae
Crella elegans
Pseudospongosorites suberitoides
96
99
92
85
61
99
98
98
55
0.2
Craseoa lathetica
Euplokamis dunlapae
Acropora digitifera
Spongilla alba
Homo sapiens
Lithobius forficatus
Agalma elegans
Dryodora glandiformis
Petrosia ficiformis
Beroe abyssicola
Coeloplana astericola
Petromyzon marinus
Hemithiris psittacea
Periphylla periphylla
Daphnia pulex
Nematostella vectensis
Danio rerio
Mnemiopsis leidyi
Aurelia aurita
Crella elegans
Tethya wilhelma
Chondrilla nucula
Sphaeroforma arctica
Hydra vulgaris
Lottia gigantea
Bolocera tuediae
Salpingoeca rosetta
Saccoglossus kowalevskii
Capitella teleta
Sycon coactum
Strongylocentrotus purpuratus
Hydra oligactis
Aiptasia pallida
Physalia physalis
Aphrocallistes vastus
Pleurobrachia bachei
Mertensiidae sp.
Hormathia digitata
Tubulanus polymorphus
Sympagella nux
Drosophila melanogaster
Mortierella verticillata
Kirkpatrickia variolosa
Ministeria vibransCapsaspora owczarzaki
Hyalonema populiferum
Oscarella carmela
Sycon ciliatum
Rossella fibulata
Spizellomyces punctatus
Hydra viridissima
Monosiga ovata
Abylopsis tetragona
Amoebidium parasiticum
Allomyces macrogynus
Bolinopsis infundibulum
Eunicella verrucosa
Pseudospongosorites suberitoides
Rhizopus oryzae
Latrunculia apicalis
Nanomia bijuga
Amphimedon queenslandica
Platygyra carnosa
Priapulus caudatus
Corticium candelabrum
Pleurobrachia sp.
Trichoplax adhaerens
Ircinia fasciculata
Ephydatia muelleri
Vallicula sp.
0.83
0.87
0.98
0.69
0.3
Aurelia aurita
Ircinia fasciculata
Homo sapiens
Nematostella vectensis
Spizellomyces punctatus
Daphnia pulex
Coeloplana astericola
Sympagella nux
Amphimedon queenslandica
Priapulus caudatus
Acropora digitifera
Mertensiidae sp.
Aiptasia pallida
Tubulanus polymorphus
Allomyces macrogynus
Hydra viridissima
Platygyra carnosa
Capsaspora owczarzaki
Aphrocallistes vastus
Sphaeroforma arctica
Trichoplax adhaerens
Mnemiopsis leidyi
Drosophila melanogaster
Euplokamis dunlapae
Rossella fibulata
Craseoa lathetica
Agalma elegans
Spongilla alba
Latrunculia apicalis
Amoebidium parasiticumMinisteria vibrans
Bolocera tuediae
Sycon ciliatum
Pleurobrachia sp.
Vallicula sp.
Capitella teleta
Periphylla periphylla
Kirkpatrickia variolosaCrella elegans
Salpingoeca rosetta
Physalia physalis
Tethya wilhelma
Saccoglossus kowalevskii
Hydra vulgaris
Oscarella carmelaSycon coactum
Corticium candelabrum
Petrosia ficiformis
Rhizopus oryzae
Danio rerio
Petromyzon marinus
Mortierella verticillata
Ephydatia muelleri
Pleurobrachia bachei
Lottia gigantea
Abylopsis tetragona
Pseudospongosorites suberitoides
Hydra oligactis
Dryodora glandiformis
Eunicella verrucosa
Beroe abyssicola
Hyalonema populiferum
Chondrilla nucula
Lithobius forficatus
Bolinopsis infundibulum
Nanomia bijuga
Hormathia digitata
Hemithiris psittacea
Monosiga ovata
Strongylocentrotus purpuratus
0.99
0.84
0.68
0.66
0.3
Euprymna
Rhizopus orizae
Monosiga ovata
Nematostella vectensis
Oscarella
Sphaeroforma arctica
Spizellomyces punctatus
Mnemiopsis leiydi
Sycon raphanus
Hydra magnipapillata
Branchiostoma
Amoebidium parasiticum
Monosiga brevicollis
Montastrae faveolata
Aplysia californica
Heterochone calyx
Batrachochochytrium dendrobatidis
Oopsacas minuta
Leucetta chagosensis
Scutigera coleoptera
Ephydatia
Phycomyces blakesleeanus
Ciona intestinalis
Acropora
Euperipatoides
Petromyzon marinus
Metridium senile
Cyanea capillata
Molgula tectiformis
Pedicellina cernua
Ixodes scapularis
Amphimedon queenslandica
Podocoryne carnea
Nasonia vitripennis
Danio rerio
Crassostrea
Saccoglossus kowalevskii
Daphnia
Strongylocentrotus purpuratus
Helobdella
Capitella sp.
Hydractini echinata
Metridium senile
Allomyces macrogynus
Proterospongia sp.
Carteriospongia foliascens
Tubifex tubifex
Pediculus humanus
Pleurobracia pileus
Anoplodactylus eroticus
Capsaspora owczarzaki
Xenoturbella bocki
Suberites
Trichoplax adhaerens
Clytia hemisphaerica
29
47
99
89
5055
43
43
72
42
93
95
57
76
77
100
86
85
93
99
99
80
81
78
86
98
0.07
Mnemiopsis leiydi
Amoebidium parasiticum
Trichoplax adhaerens
Euprymna
Heterochone calyx
Sphaeroforma arctica
Sycon raphanus
Ciona intestinalis
Metrinsiidae sp.
Montastrae faveolata
Clytia hemisphaerica
Tubifex tubifex
Metridium senile
Scutigera coleoptera
Xenoturbella bocki
Podocoryne carnea
Capitella sp.
Pleurobrachia pileus
Proterospongia sp.
Oopsacas minuta
Capsaspora owczarzaki
Ephydatia
Strongylocentrotus purpuratus
Allomyces macrogynus
Rhizopus orizae
Anoplodactylus eroticus
Danio rerio
Carteriospongia foliascens
Branchiostoma
Aplysia californica
Petromyzon marinus
Monosiga brevicollis
Pediculus humanus
Crassostrea
Molgula tectigormis
Euperipatoides
Oscarella
Amphimedon queenslandica
Pedicellin cernua
Hydra magnipapillata
Daphnia
Nematostella vectensis
Cyanea capillata
Leucetta chagosensis
Saccoglossus kowalevskii
Batrachochytrium dendrobatidis
Acropora
Hydractini echinata
Spizellomyces punctatus
Nasonia vitripennis
Suberites
Monosiga ovata
Ixodes scapularis
Helobdella
Phycomyces blakesleeanus
0.70
0.62
0.90
0.99
0.94
0.96
0.61
0.90
0.55
0.99
0.99
0.2
A B C
D E
Den
sity
0
20
40
60
80
100
120
0.97 0.98 0.99 1.00
Den
sity
0
10
20
30
40
50
0.92 0.94 0.96 0.98 1.00Average Leaf Stability Index
CalcareaHomoscleromorpha
Trichoplax adhaerens
F
G
Fig. S5. (A) Maximum-likelihood topology of dataset with certain and uncertain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 25). (B) Bayesian inference topology of dataset of slowest evolving half of genes after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 6). (C) Bayesianinference topology of dataset of slowest evolving half of genes after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 16). (D) Maximum-likelihood phylogeny of Philippe et al. (1) dataset with ribosomal protein genes removed. Chimeric taxa are labeled only by their genus name as in the original publication. (E) Bayesianinference topology of Philippe et al. (1) dataset with ribosomal protein genes removed. Chimeric taxa are labeled only by their genus name as in the original publication. Bootstrap values or posterior probabilities for each node are 100% unless otherwise noted. (F) Density plot of average leaf stability indices for each taxon across all datasets. (G) Density plot of average leaf stabilityindices for each taxon in datasets without outgroups.
1. Philippe H, et al. (2009) Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19(8):706–712.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 6 of 11
6 taxa
10 genes
18 genes
A
B
C
Fig. S6. Density plots of (A) average LB scores for each taxon, (B) average upper quartile LB score for each OG, and (C) SD of LB score for each OG. Shadedareas include taxa or genes that were considered to have “high” LB scores and were trimmed from respective datasets (see Fig. 2).
Whelan et al. www.pnas.org/cgi/content/short/1503453112 7 of 11
−0.2 0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
−0.2 0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
Dataset 3 Dataset 4
Dataset 7 Dataset 6
Dataset 2 Dataset 8
Dataset 10 Dataset 9
Dataset 13 Dataset 14
Dataset 17 Dataset 16
Dataset 12 Dataset 18
Dataset 20Dataset 19
−0.2 0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
2.5
Den
sity
R of Linear Regression2
R of Linear Regression2
Nosenko Ribosomal Nosenko Non-Ribosomal
Philippe et al. Non-RibosomalPhilippe et al. Ribosomal
0.0 0.1 0.2 0.3 0.4
01
23
45
6 Dataset 2 Dataset 8
Dataset 10 Dataset 9
0.0 0.1 0.2 0.3
01
23
45
6
Dataset 12 Dataset 18
Dataset 20Dataset 19
Slope of Linear Regression
Slope of Linear Regression
−0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6
01
23
45
Den
sity
Nosenko Ribosomal Nosenko Non-Ribosomal
Philippe et al. Non-RibosomalPhilippe et al. Ribosomal
−0.1 0.0 0.1 0.2 0.3 0.4
Den
sity
01
23
45
6Dataset 3 Dataset 4
Dataset 7Dataset 6
0.0 0.1 0.2 0.3 0.4
Dataset 13 Dataset 14
Dataset 17 Dataset 16
Den
sity
01
23
45
6
−0.2 0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
Den
sity
−0.2 0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
Den
sity
Fig. S7. Density plots of gene specific slope and R2 of linear regression between patristic and uncorrected pairwise distances for each dataset. Datasets with higher values for each metric are less saturated. Dataset numbers are referenced as in Fig. 2.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 8 of 11
Table S1. Raw data for each species used in orthology searches
SpeciesSequence
typeNo. of reads, transcriptsor predicted proteins
Source orcollection locality NCBI or other accession
FungiAllomyces macrogynus Whole genome 19,446 proteins Broad Institute www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlAspergillus fumigatus† Whole genome 11,485 proteins InParanoid databaseMortierella verticillata Whole genome 12,569 proteins Broad www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlNeurospora crassa† Whole genome 9,822 proteins InParanoid databaseRhizopus oryzae Whole genome 16,971 proteins InParanoid databaseSaccharomyces cerevisiae* Whole genome 6,590 proteins InParanoid databaseSpizellomyces punctatus Whole genome 9,424 proteins Broad Institute www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlIchthyosporeaAmoebidium parasiticum Illumina 105,237 transcripts Broad Institute www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlCapsaspora owczarzaki Whole genome 101,23 proteins Broad Institute www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlMinisteria vibrans Illumina 19,953,296 NCBI SRA SRR343051Sphaeroforma arctica Whole genome 18,730 proteins Broad Institute www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlChoanoflagellataAcanthoeca spectabilis* Illumina 41,408,758 reads ATCC SRR1915695Monosiga brevicollis* Whole genome 10336 proteins Joint Genome InstituteMonosiga ovata Sanger 29,495 sequences dbESTSalpingoeca pyxidium* Illumina 34,177,888 reads ATCC SRR1915694Salpingoeca rosetta Whole genome 11,731 proteins Broad Institute www.broadinstitute.org/annotation/genome/
multicellularity_project/MultiHome.htmlPlacozoaTrichoplax adhaerans Whole genome 11,179 proteins Joint Genome Institute
CnidariaAcropora digitifera Whole genome 33,366 proteins OIST Marine genomics
UnitAbylopsis tetragona Illumina 43150352 reads NCBI SRA SRR871525Bolocera tuediae 454 546,903 reads NCBI SRA SRR504347Agalma elegans Illumina 107,996,364 reads NCBI SRA SRR871526Nanomia bijuga Illumina 105,066,886 reads NCBI SRA SRR871527Aiptasia pallida Illumina 536616844 reads NCBI SRA SRR696721; SRR696732; SRR696745Aurelia aurita
(strain Roscoff)454 1,099,012 reads NCBI SRA SRR040475; SRR040476; SRR040477;
SRR040478; SRR040479Eunicella verrucosa Illumina 70,071,835 reads NCBI SRA SRR1324944; SRR1324945Hydra oligactis 454 1,017,333 reads NCBI SRA SRR040466; SRR040467; SRR040468;
SRR040469Hydra viridissima 454 1,048,810 reads NCBI SRA SRR040470; SRR040471; SRR040472;
SRR040473Hormathia digitata 454 546,846 reads NCBI SRA SRR504348Hydra vulgaris Sanger 167,982 sequences dbESTNematostella vectensis Whole genome 27,273 proteins Joint Genome InstitutePeriphylla periphylla Illumina 44, 443,566 45°55.33N 125°05.47 W SRR1915828Physalia physalis Illumina 72,963,546 reads NCBI SRA SRR871528Platygyra carnosus Illumina 72,963,546 reads NCBI SRA SRR402974; SRR402975Craseoa lathetica Illumina 76,466,398 reads NCBI SRA SRR871529
PoriferaAmphimedon
queenslandicaSanger 63,542 sequences dbEST
Aphrocallistes vastus Illumina preassembled Evolution & ResearchArchive
Chondrilla nucula Illumina preassembled Dataverse Network (1)Corticium candelabrum Illumina preassembled Dataverse Network (1)Crella elegans Illumina preassembled Dryad doi: 10.5061/dryad.50dc6/3Ephydatia muelleri Illumina 156,515,380 reads NCBI SRA SRR1041944Hyalonema populiferum Illumina 62,031,394 reads 43°55.376N 127°24.65W SRR1916923Ircinia fasciculata Illumina preassembled Dataverse Network (1)Kirkpatrickia variolosa Illumina 59,234,424 reads 77°54.228S 170°57.915E SRR1916957
Whelan et al. www.pnas.org/cgi/content/short/1503453112 9 of 11
Table S1. Cont.
SpeciesSequence
typeNo. of reads, transcriptsor predicted proteins
Source orcollection locality NCBI or other accession
Latrunculia apicalis Illumina 49,432,908 reads 82°54.228S 175°57.915 E SRR1915755Oscarella carmela Illumina Preassembled www.compagen.orgPetrosia ficiformis Illumina preassembled Dataverse Network (1)Pseudospongosorites
suberitoidesIllumina preassembled Dataverse Network (1)
Rossella fibulata Illumina 79,607,548 reads 76°14.716S 174°30.247E SRR1915835Spongilla alba Illumina preassembled Dataverse Network (1)Sycon coactum Illumina preassembled Dataverse Network (1)Sympagella nux Illumina 30,065,060 reads 28°10.26N 89°56.80W SRR1916581Tethya wilhelma 454 442,949 reads NCBI SRA ERR216193
CtenophoraBeroe abyssicola Illumina 45,444,644 reads NCBI SRA SRR777787Bolinopsis infundibulum Illumina 43,377,170 reads NCBI SRA SRR786491Coeloplana astericola Illumina 41,685,220 reads NCBI SRA SRR786490Dryodora glandiformis Illumina 41,269,166 reads NCBI SRA SRR777788Euplokamis dunlapae Illumina 68,302,698 reads NCBI SRA SRR777663Mertensiidae sp. Illumina 47,454,246 reads NCBI SRA SRR786492Mnemiopsis leidyi Illumina 49,782,900 reads NCBI SRA SRR789900Pleurobrachia bachei Whole genome 80,151 proteins neurobase.rc.ufl.edu/
pleurobrachiaPleurobrachia sp. Illumina 50626422 reads NCBI SRA SRR789901Vallicula sp. Illumina 49,090,372 reads NCBI SRA SRR786489
BilateriaHomo sapiens Whole genome N/A HaMStR core orthologsDanio rerio Whole genome 54,390 proteins EnSEMBLPetromyzon_marinus Whole genome 12,444 proteins EnSEMBLSaccoglossus kowalevskii Sanger 202,190 sequences dbESTStrongylocentrotus
purpuratusWhole genome 28,474 proteins InParanoid
Capitella teleta Whole genome 32,415 proteins Joint Genome InstituteHemithiris psittacea Illumina 60,730,022 reads NCBI SRA SRR1611556Lottia gigantea Whole genome 22,899 proteins Joint Genome InstituteTubulanus polymorphus Illumina 39,262,732 reads NCBI SRA SRR1611583Daphnia pulex Whole genome 30,137 proteins Joint Genome InstituteDrosophila melanogaster Whole genome N/A HaMStR core orthologsPriapulus caudatus Illumina 57,331,982 reads NCBI SRA SRR1611567Lithobius forficatus Illumina 57,098,550 reads NCBI SRA SRR1159752
Taxa in bold were sequenced in this study.*Taxa identified as possibly causing LBA based on LB scores.
1. Riesgo A, Farrar N, Windsor PJ, Giribet G, Leys SP (2014) The analysis of eight transcriptomes from all Poriferan classes reveals surprising genetic complexity in sponges. Mol Biol Evol31(5):1102–1120.
Whelan et al. www.pnas.org/cgi/content/short/1503453112 10 of 11
Table S2. Data statistics for each phylogenetic dataset
Dataset (no.) Taxa Genes Sites Gene occupancy, %Missing data
(including gaps), %
Full dataset (1) 76 251 81,008 75 40Certain paralogs removed (2) 76 250 80,735 75 40Taxa with high LB scores removed (3) 70 250 80,735 75 41Genes with high LB scores removed (4) 70 231 73,323 75 41All outgroups except Choanoflagellates removed (5) 62 231 73,323 73 44All outgroups removed (6) 60 231 73,323 73 44Slowest evolving half of genes (7) 70 115 33,403 80 35Genes with lowest half of RCFV values (8) 70 115 41,933 81 43Heterogeneous genes removed (9) 76 228 66,571 76 38Genes with high LB scores removed (10) 76 210 59,733 75 38Taxa with high LB scores removed (11) 70 210 59,733 82 38All outgroups except Choanoflagellates removed (12) 62 210 59,733 74 40All outgroups removed (13) 60 210 59,733 75 40Certain and uncertain paralogs removed (14) 76 209 60,768 72 41Taxa with high LB scores removed (15) 70 209 60,768 78 42Genes with high LB scores removed (16) 70 191 54,193 78 42All outgroups except Choanoflagellates removed (17) 62 191 54,193 70 44All outgroups removed (18) 60 191 54,193 71 44Slowest evolving half of genes (19) 70 89 23,680 78 35Genes with lowest half of RCFV values (20) 70 96 33,069 80 40Heterogeneous genes removed (21) 76 195 52,543 73 39Genes with high LB scores removed (22) 76 178 46,542 73 39Taxa with high LB scores removed (23) 70 178 46,542 79 34All outgroups except Choanoflagellates removed (24) 62 178 46,542 71 41All outgroups removed (25) 60 178 46,542 72 41
Datasets nested in others have the same data filtered as corresponding upper-level datasets (see Fig. 2).
Whelan et al. www.pnas.org/cgi/content/short/1503453112 11 of 11