Error, signal, and the placement of Ctenophora sister to all other animals

Nathan V. Whelan a,1, Kevin M. Kocot b, Leonid L. Moroz c, and Kenneth M. Halanych a

a Molette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL 36849; b School of Biological Sciences, University of Queensland, St. Lucia, QLD 4101, Australia; and c Department of Neuroscience and McKnight Brain Institute, University of Florida, Gainesville, FL 32610

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved March 24, 2015 (received for review February 18, 2015)

PNAS | May 5, 2015 | vol. 112 | no. 18 | 5773–5778 | EVOLUTION | www.pnas.org/cgi/doi/10.1073/pnas.1503453112

Elucidating relationships among early animal lineages has been difficult, and recent phylogenomic analyses place Ctenophora sister to all other extant animals, contrary to the traditional view of Porifera as the earliest-branching animal lineage. To date, phylogenetic support for either ctenophores or sponges as sister to other animals has been limited and inconsistent among studies. Lack of agreement among phylogenomic analyses using different data and methods obscures how complex traits, such as epithelia, neurons, and muscles evolved. A consensus view of animal evolution will not be accepted until datasets and methods converge on a single hypothesis of early metazoan relationships and putative sources of systematic error (e.g., long-branch attraction, compositional bias, poor model choice) are assessed. Here, we investigate possible causes of systematic error by expanding taxon sampling with eight novel transcriptomes, strictly enforcing orthology inference criteria, and progressively examining potential causes of systematic error while using both maximum-likelihood with robust data partitioning and Bayesian inference with a site-heterogeneous model. We identified ribosomal protein genes as possessing a conflicting signal compared with other genes, which caused some past studies to infer ctenophores and cnidarians as sister. Importantly, biases resulting from elevated compositional heterogeneity or elevated substitution rates are ruled out. Placement of ctenophores as sister to all other animals, and sponge monophyly, are strongly supported under multiple analyses, herein.

phylogenomics | Metazoa | Ctenophora | Porifera | Cnidaria

Significance

Traditional interpretation of animal phylogeny suggests traits, such as mesoderm, muscles, and neurons, evolved only once given the assumed placement of sponges as sister to all other animals. In contrast, placement of ctenophores as the first branching animal lineage raises the possibility of multiple origins of many complex traits considered important for animal diversification and success. We consider sources of potential error and increase taxon sampling to find a single, statistically robust placement of ctenophores as our most distant animal relatives, contrary to the traditional understanding of animal phylogeny. Furthermore, ribosomal protein genes are identified as creating conflict in signal that caused some past studies to recover a sister relationship between ctenophores and cnidarians.

Author contributions: N.V.W., K.M.K., L.L.M., and K.M.H. designed research; N.V.W. and K.M.K. performed research; N.V.W. and K.M.K. analyzed data; and N.V.W., K.M.K., L.L.M., and K.M.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The sequence reported in this paper has been deposited in the NCBI Sequence Read Archive, www.ncbi.nlm.nih.gov/sra (accession no. PRJNA278284). Transcriptome assemblies, phylogenetic datasets, and an annotation file were deposited to figshare, figshare.com (doi: 10.6084/m9.figshare.1334306).

1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1503453112/-/DCSupplemental.

Resolving relationships among extant lineages at the base of the metazoan tree is integral to understanding evolution of complex animal traits, including nervous systems and gastrulation. Historically, sponges and placozoans, both of which have relatively simple body plans and lack neurons, have been considered to diverge from other animals earlier than ctenophores, cnidarians, and bilaterians (1). Phylogenomic studies have resulted in controversial hypotheses placing either Placozoa (Fig. 1A) (2), ctenophores (ctenophore-sister hypothesis) (Fig. 1B) (3–7), or a clade of ctenophores and sponges (Fig. 1C) (6) as sister to all remaining animals. Others (7–10) have claimed nontraditional findings resulted from systematic error and argued for traditional placement of sponges as sister to all remaining animals (Eumetazoa, or Porifera-sister hypothesis) (Fig. 1D) and a sister relationship between ctenophores and cnidarians (Coelenterata) (Fig. 1D). Limited statistical support for various hypotheses and conflict among, and even within studies, has undermined confidence in our understanding of early animal evolution. Basal metazoan relationships must be resolved with greater consistency before a consensus viewpoint is widely accepted.

Long-branch attraction (LBA) (11), which occurs when two divergent lineages are artificially inferred as related because of substitutional saturation (11), is perhaps the most often evoked explanation for controversial or spurious phylogenetic results (7–10, 12, 13). Additional sources of systematic error include poor taxon or character sampling (10, 14, 15), large amounts of missing data (16), and model misspecification (16, 17). Such errors have been implicated as influencing the position of ctenophores in metazoan phylogeny studies (5–8). For example, Ryan et al. (6) recovered a sister relationship between sponges and ctenophores in analyses where taxa with high amounts of missing data were excluded, and support for this was highest in Bayesian inference with the CAT (17) substitution model. However, ctenophores were recovered as sister to all other extant animals in maximum-likelihood analyses with greater taxon sampling (Bayesian inference never converged for datasets with more than 19 taxa). The CAT model is a site-heterogeneous model that may handle LBA artifacts better than site-homogeneous substitution models like GTR (18). Therefore, LBA plausibly influenced the phylogenetic position of ctenophores in analyses of Ryan et al. (6) that recovered ctenophores-sister. In Moroz et al. (5), strong nodal support for ctenophores sister to all other extant animals disappeared when both the strictest orthology criteria were enforced and ctenophore taxon sampling increased. Thus, further consideration of systematic error influencing phylogeny reconstruction at the base of the animal tree is desirable.

The ctenophore-sister hypothesis has challenged our understanding of early metazoan evolution, but given conflicting results (2–10), this and other hypotheses must be carefully scrutinized. Ideally, if robust datasets are assembled and causes of systematic error are accounted for, different datasets and analytical methods will converge on a single phylogenetic hypothesis (19). However, practical barriers exist in assembling robust datasets free of systematic error. For example, phylogenomic datasets are prone to missing data given the incomplete nature of transcriptome and even genome sequences, and orthology determination among


distantly related species can be difficult (20, 21). Computational limitations of complex phylogenetic methods can also prevent using what may be the best theoretical phylogenetic method. Nevertheless, both data quality and appropriate methods should be emphasized if deep relationships of any organismal group are to be robustly resolved.

Here, we have assembled a more comprehensive phylogenomic dataset of metazoan lineages that branched early in animal evolution than previous studies to alleviate taxon sampling concerns. We have sequenced transcriptomes of eight additional species and used other deeply sequenced publicly available transcriptomes (including some not used in past studies). Additionally, we use a number of data-filtering steps to explore the sensitivity of these results to potential sources of error. This process includes strict orthology determination, removal of taxa and genes that may cause LBA, and removal of heterogeneous genes that may cause model misspecification. Regardless of how data were filtered, all maximum-likelihood analyses with model partitioning and all Bayesian inference analyses using a site-heterogeneous model recover ctenophores as sister to all other animals with strong support. We identify overreliance on ribosomal protein genes in some datasets (7, 9) as the source of incongruence among previous phylogenomic studies. We also find strong support for sponge monophyly in contrast to previous reports (7, 22–24).

Results

Datasets and Accounting for Biases. Orthology filtering of transcriptome and genome data from 76 species resulted in 251 orthologous groups (OGs) and 81,008 aligned amino acid sites (Tables S1 and S2). TreSpEx (25) further identified 83 “certain” paralogs (i.e., sequences TreSpEx classified as high-confidence paralogs) from 10 OGs. TreSpEx also identified 2,684 “uncertain” paralogs (i.e., sequences TreSpEx classified as possibly, but not definitively, paralogous) from 104 OGs. Datasets with certain paralogs pruned, and with both certain and uncertain paralogs pruned, were starting points for progressively filtering other causes of systematic error. Overall, 25 hierarchical datasets that had progressively fewer characters, but controlled for more potential causes of systematic errors, were assembled (Fig. 2 and Table S2) (all datasets have been deposited on figshare, doi: 10.6084/m9.figshare.1334306). The percentage of gene occupancy and missing data ranged from 70–82% and 35–44%, respectively (Table S2). Other than progressive data filtering, differences between datasets analyzed here and those used in previous studies of basal metazoan relationships (3–10) are increased character sampling and less missing data compared with some studies and increased nonbilaterian taxon sampling. In contrast to Philippe et al. (9) and Nosenko et al. (7), both of which relied heavily on ribosomal proteins (i.e., 52% and 71%, respectively), our dataset did not contain a large representation of any one gene class (e.g., only 8 of 250 were ribosomal protein genes) (SI Methods).
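The two matrix statistics quoted above (gene occupancy and missing data) can be illustrated with a minimal Python sketch. The definitions used here are assumptions for illustration, not taken from the paper: gene occupancy is treated as the share of taxon-by-gene cells that contain a sequence, and missing data as the share of alignment characters that are gaps or unknowns. The function names, the toy inputs, and the choice of `-` and `?` as missing-data symbols are all hypothetical.

```python
from typing import Dict, List

# Assumption: '-' and '?' mark missing characters in the alignment.
MISSING_SYMBOLS = set("-?")


def gene_occupancy(presence: Dict[str, List[bool]]) -> float:
    """Percentage of taxon-by-gene cells in which a sequence is present."""
    cells = [flag for flags in presence.values() for flag in flags]
    return 100.0 * sum(cells) / len(cells)


def missing_data(alignment: Dict[str, str]) -> float:
    """Percentage of alignment characters that are gaps or unknowns."""
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(ch in MISSING_SYMBOLS
                  for seq in alignment.values() for ch in seq)
    return 100.0 * missing / total


# Toy inputs (hypothetical): two taxa, four genes, four aligned sites.
toy_presence = {
    "taxon_A": [True, True, False, True],
    "taxon_B": [True, False, False, True],
}
toy_alignment = {
    "taxon_A": "MK-L",
    "taxon_B": "M??L",
}

occupancy_pct = gene_occupancy(toy_presence)   # 5 of 8 cells filled
missing_pct = missing_data(toy_alignment)      # 3 of 8 characters missing
```

Because the two measures count different things (sampled genes versus aligned characters), they need not sum to 100%, which is consistent with the ranges reported in Table S2.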

Ctenophores Sister to Other Extant Animals and Monophyletic Sponges. All maximum-likelihood analyses with outgroups resulted in topologies with strong support [≥ 97% bootstrap support (BS)] for ctenophores sister to all other extant animal lineages (datasets 1–21 in Figs. 2 and 3 and Figs. S1–S4). Importantly, PhyloBayes (26) analyses using the CAT-GTR+Γ model also resulted in phylogenies with Ctenophora sister to all other animals with 100% posterior probability (PP), and 100% PP for sponge monophyly (datasets 6 and 16 in Figs. 2 and 3 and Fig. S5 B and C). Inferred relationships among major sponge lineages (i.e., Demospongiae + Hexactinellida sister to Calcarea + Homoscleromorpha) were consistent with morphology (27, 28) and most other molecular analyses that have recovered monophyletic sponges (datasets 1–27 in Figs. 2 and 3 and Figs. S1–S5) (4, 5, 8, 9, 28–30). Alternative hypotheses of basal animal relationships (Fig. 1) were rejected by every phylogenetic analysis as measured by the approximately unbiased (AU) (31) test (P ≤ 0.001) (Table 1). Overall, our results overwhelmingly reject alternative hypotheses to ctenophores sister to all other extant animals (Table 1).

Ribosomal protein genes can have conflicting signal with most other genes (32). The datasets of Philippe et al. (9) and Nosenko et al. (7), which recovered cnidarians and ctenophores sister, had high proportions of ribosomal protein genes (67 of 128 and 87 of 122 genes, respectively). Nosenko et al. (7) analyzed a dataset without ribosomal protein genes and recovered ctenophores sister to all other animals, but a similar analysis has not been done for the original Philippe et al. (9) dataset. If certain topologies are recovered only with the use of a high proportion of one group of genes (e.g., ribosomal protein genes), this may indicate a phylogenetic signal that conflicts with the true evolutionary history. As such, we analyzed the Philippe et al. (9) dataset with ribosomal proteins removed (67 of 128) using maximum-likelihood and Bayesian inference, and neither reconstruction placed ctenophores and cnidarians sister as in the original study (Fig. S5 D and E).

Fig. 1. Phylogenetic hypotheses from previous molecular studies. (A) Placozoa-sister hypothesis (2). (B) Ctenophora-sister hypothesis (3–7). Placozoa was not included in some studies that found support for this topology. (C) Ctenophora + Porifera-sister hypothesis (6). (D) Traditional Porifera-sister hypothesis (7–10).

Fig. 2. Hierarchy of datasets with progressive data filtering. Numbers associated with each dataset are referenced throughout the text. Datasets 5, 11, 15, and 21 have choanoflagellates as an outgroup, and datasets 22, 23, 24, and 25 have all outgroups removed. RAxML analyses were used for each dataset, and shaded datasets were also analyzed with PhyloBayes. Data matrix statistics (e.g., number of sites, percentage of missing data, and so forth) for each dataset can be found in Table S2.

Filtering steps labeled in the figure: full dataset; “certain” paralogs removed; “uncertain” and “certain” paralogs removed; taxa with high LB scores removed; genes with high LB scores removed; slowest evolving half; heterogeneous genes removed; lowest half RCFV genes; outgroups removed.


The maximum-likelihood analysis recovered strong support for ctenophores as sister to all other metazoan lineages (BS = 93) (Fig. S5D). However, Bayesian inference (Fig. S5E) recovered sponges as sister to all other metazoans, but support for this and other deep nodes were low (PP ≤ 90).
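The ribosomal protein shares discussed in this section follow directly from the gene counts given in the text. The short Python sketch below only redoes that arithmetic; the dictionary layout and names are illustrative choices, not part of the study.

```python
# Ribosomal protein genes as (ribosomal, total) counts, as quoted in the text.
gene_counts = {
    "Philippe et al. (9)": (67, 128),
    "Nosenko et al. (7)": (87, 122),
    "this study": (8, 250),
}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


# Percentage of each dataset made up of ribosomal protein genes.
ribosomal_share = {
    name: round(percent(ribosomal, total), 1)
    for name, (ribosomal, total) in gene_counts.items()
}

for name, share in ribosomal_share.items():
    print(f"{name}: {share}% ribosomal protein genes")
```

The counts give roughly 52% for the 128-gene dataset and 71% for the 122-gene dataset, against about 3% for the dataset assembled here.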

Systematic Biases and Their Effect on Phylogenetic Inference. Long-branch (LB) scores (28), a measurement for identifying taxa and OGs that could cause LBA, were calculated for each species and OG with TreSpEx (25). In total, we identified six “long-branched” taxa, all nonmetazoans (Fig. S6A and Table S2), and 28 OGs with high LB scores compared with other OGs (Fig. S6 B and C). We found complete congruence in relationships among basal metazoan phyla in trees inferred with (datasets 1, 2, 8, 12, and 18 in Fig. 2) and without (datasets 3–7, 9–11, 13–17, 19–21, and 22–25 in Fig. 2) taxa and genes that had high LB scores, and nodal support for critical nodes showed little variation among analyses (Fig. 3 and Figs. S1–S5). Removing OGs with high amino acid compositional heterogeneity (datasets 7–11, 17–21, 23, and 25 in Fig. 2) also had no effect on branching order (Fig. 3 and Figs. S2 A–E, S3 E and F, S4 A–E, and S5A). Topologies inferred with only the slowest evolving half of OGs assembled here (datasets 6 and 16 in Fig. 2) (i.e., least saturated and least prone to homoplasy; see Fig. S7 for saturation plots) recovered high support for ctenophores sister to all other animals and sponge monophyly with both maximum-likelihood (BS = 100) (Fig. 3 and Figs. S1F and S3D) and Bayesian inference using the CAT-GTR model (PP = 1) (Fig. 3 and Fig. S5 B and C). Importantly, our datasets of the slowest evolving half of OGs were of a broad range of protein classes (SI Methods; figshare), rather than consisting of a majority of ribosomal proteins (7, 9).

Inaccurate orthology assignment can also introduce systematic error into phylogenomic analyses. Although relationships among basal lineages were unaffected, removal of paralogs as identified by TreSpEx appeared to have the greatest effect on support for some critical nodes. For example, most topologies with both certain and uncertain paralogs removed had strong support for sponge monophyly (i.e., ≥ 95% BS) (datasets 12–14 and 18–20 in Figs. 2 and 3 and Figs. S2F, S3 A, B, and F, and S4 A and B), but four analyses with only certain paralogs removed recovered low support (< 90% BS) for sponge monophyly (datasets 5, 7, 9, and 10 in Figs. 2 and 3 and Figs. S1E and S2 A, D, and E).

Because outgroup sampling has the potential to influence rooting of the animal tree, we explored outgroup sampling as well. When all outgroups except two choanoflagellates were removed (datasets 5, 11, 15, and 21 in Fig. 2), inferred nonbilaterian relationships were identical to those in analyses we performed with full outgroup sampling (datasets 5, 11, 15, and 21 in Figs. 2 and 3 and Figs. S1E, S2E, S3C, and S4C), but support for sponge monophyly decreased. In these analyses the leaf-stability indices for homoscleromorph and calcareous sponges were less than 0.94, but in all other analyses they were greater than 0.97 (Fig. S5 F and G). Regardless, when choanoflagellates were the only outgroup, ctenophores were still recovered as the deepest split within the animal tree with 100% BS support. Analyses with all outgroup taxa removed (datasets 22–25 in Fig. 2) recovered the same relationships among major metazoan lineages as other analyses (Figs. S4 D–F and S5A). However, we observed low support for relationships among ctenophores, sponges, and placozoans in these analyses. This resulted from the long placozoan branch being attracted to ctenophores in the absence of outgroup taxa, as indicated by bootstrap tree topologies and a leaf-stability index for Trichoplax of less than 0.92, whereas leaf-stability indices were greater than 0.99 in all other analyses (Fig. S5 F and G).
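The LB scores used in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TreSpEx implementation: it assumes the LB score of a taxon is the percentage deviation of its mean pairwise patristic distance from the mean pairwise distance over all taxa, and it takes a precomputed distance matrix as input. The function name, the toy taxa, and the toy distances are hypothetical, and the text gives no numeric cutoff for a "high" score.

```python
from typing import Dict


def lb_scores(dist: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Per-taxon LB score from a symmetric patristic-distance matrix.

    Assumed definition: 100 * (mean distance of taxon to all others
    / mean distance over all taxon pairs - 1).
    """
    taxa = sorted(dist)
    n = len(taxa)
    # Mean distance from each taxon to every other taxon.
    taxon_mean = {
        t: sum(dist[t][u] for u in taxa if u != t) / (n - 1) for t in taxa
    }
    # Mean distance over all unordered pairs of taxa.
    pair_total = sum(
        dist[a][b] for i, a in enumerate(taxa) for b in taxa[i + 1:]
    )
    overall_mean = pair_total / (n * (n - 1) / 2)
    return {t: 100.0 * (taxon_mean[t] / overall_mean - 1.0) for t in taxa}


# Toy distances (hypothetical): taxon C sits on a long branch.
toy_dist = {
    "A": {"A": 0.0, "B": 0.2, "C": 1.0},
    "B": {"A": 0.2, "B": 0.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "C": 0.0},
}

toy_scores = lb_scores(toy_dist)
# C scores above zero (longer than average); A and B score below zero.
```

Under this definition the scores sum to zero across taxa, so a score is only meaningful relative to the other taxa in the same tree.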

Discussion

Placement of Ctenophores Sister to all Remaining Animals Is Not Sensitive to Systematic Errors. Every analysis conducted herein strongly supported the ctenophore-sister hypothesis (Fig. 3 and Table 1). A major hurdle to wide acceptance of ctenophores as sister to other animals has been that different analyses have yielded conflicting hypotheses of early animal phylogeny (2–9). Sensitivity to the selected model of molecular evolution has been especially problematic (2–9). In contrast, both maximum-likelihood analyses using data partitioning and Bayesian analyses using the CAT-GTR model of our datasets resulted in identical branching patterns among ctenophores, sponges, placozoans, cnidarians, and bilaterians. Past critiques of studies that found ctenophores to be sister to all other animals have emphasized the CAT model as the most appropriate model for deep phylogenomics because it is an infinite mixture model that accounts for site-heterogeneity (7, 8, 29). Notably, when the CAT-GTR model was used here (datasets 6 and 16 in Fig. 2), we recovered ctenophores-sister to all other metazoans (Fig. 3 and Fig. S5 B and C).

The argument for LBA (7–10) or saturated datasets (7, 8) as the reason past studies found ctenophores to be sister to all other animals seems to have been overstated. The recovered position of ctenophores was identical in analyses with (datasets 1, 2, 8, 12, and 18 in Fig. 2 and Figs. S1 A and B, S2 B and F, and S3F) and without (datasets 3–7, 9–11, 13–17, and 19–25 in Fig. 2, and Figs. S1 C–F, S2 A and C–E, S3 A–E, S4, and S5 A–C) taxa and genes with high LB scores, and analyses with the slowest evolving genes (datasets 6 and 16 in Fig. 2 and Fig. S7) also recovered ctenophores sister to all other animals (Fig. 3 and Figs. S1F, S3D, and S5 B and C). Furthermore, despite the long internal branch leading to the ctenophore clade, the position of this lineage did not change in any analysis including those when outgroups were removed (datasets 5, 11, 15, 21, and 22–25 in Fig. 2 and Figs. S1E, S2E, S3C, and S4 C–F). If this branch was being artificially attracted toward outgroups, then employment of different outgroup schemes would be expected

Fig. 3. Reconstructed maximum-likelihood topology of metazoan relationships inferred with dataset 10. Maximum likelihood and Bayesian topologies inferred with other datasets (Fig. 2) have identical basal branching patterns (Figs. S1–S5). Nodes are supported with 100% bootstrap support unless otherwise noted. Support, as inferred from each dataset (Fig. 2), for nodes covered by black boxes are in Table 1.


to result in different ctenophore placement. Maximum-likelihood and Bayesian inference using the CAT-GTR model of the least saturated datasets (datasets 6 and 16 in Fig. 2 and Fig. S7) recovered identical basal relationships as our other analyses (Fig. 3 and Figs. S1F, S3D, and S5 B and C), also indicating homoplasy and model choice did not bias results. Given the consistency among our analyses that were designed to have different levels of potential biases, we conclude that the ctenophore-sister hypothesis is robust to systematic errors.

Rather than focusing on long branches, fast evolving genes, or model misspecification as influencing the position of ctenophores, the individual genes underlying datasets that resulted in a sister relationship between ctenophores and cnidarians (7, 9) should be the focus of identifying problems with phylogenetic reconstruction. A benefit of phylogenomic datasets is that multiple gene classes and many parts of the genome are analyzed. As such, phylogenomic datasets should not rely too heavily on a single gene class. Past molecular studies that found support for Coelenterata and the Porifera-sister (7, 9) hypotheses appear to have been strongly affected by a disproportionate reliance (i.e., > 50%) on ribosomal protein genes. Nosenko et al. (7) and Philippe et al. (9) found support for ctenophores sister to cnidarians, but Nosenko et al. (7) recovered ctenophores sister to all other animals when ribosomal proteins were excluded. Ribosomal protein

Table 1. AU test P values for alternative hypotheses of animal relationships and support for Ctenophora-sister and sponge monophyly

Dataset (no.)                                 Porifera-sister  Coelenterata  Placozoa-sister  Porifera + Ctenophora  Ctenophora-sister  Porifera monophyly
Full dataset (1)                              2.E-04           5.E-06        1.E-76           2.E-05                 100 BS             95 BS
“Certain” paralogs removed (2)                2.E-04           1.E-05        3.E-03           5.E-79                 100 BS             90 BS
Taxa with high LB scores removed (3)          6.E-06           2.E-06        1.E-62           2.E-07                 100 BS             94 BS
Genes with high LB scores removed (4)         1.E-02           2.E-04        2.E-32           3.E-47                 100 BS             94 BS
Choanoflagellate-only outgroup (5)            4.E-45           1.E-04        1.E-05           6.E-104                100 BS             68 BS
All outgroups removed (22)                    N/A              3.E-55        N/A              N/A                    N/A                84 BS
Slowest evolving half of genes (6)            2.E-78           2.E-11        2.E-04           1.E-05                 100 BS/100 PP      95 BS/100 PP
Genes with lowest half of RCFV values (7)     3.E-06           2.E-57        9.E-47           2.E-03                 100 BS             53 BS
Heterogeneous genes removed (8)               7.E-52           2.E-49        9.E-05           3.E-05                 100 BS             90 BS
Genes with high LB scores removed (9)         9.E-41           9.E-05        2.E-04           2.E-66                 99 BS              90 BS
Taxa with high LB scores removed (10)         7.E-22           5.E-61        9.E-06           7.E-06                 100 BS             88 BS
Choanoflagellate-only outgroup (11)           1.E-03           3.E-36        1.E-02           6.E-104                92 BS              82 BS
All outgroups removed (23)                    N/A              2.E-04        N/A              N/A                    N/A                92 BS
“Certain” and “uncertain” paralogs (12)       6.E-39           1.E-05        4.E-44           4.E-05                 100 BS             99 BS
Taxa with high LB scores removed (13)         6.E-05           8.E-06        6.E-30           8.E-09                 100 BS             99 BS
Genes with high LB scores removed (14)        8.E-105          9.E-05        1.E-45           5.E-102                99 BS              97 BS
Choanoflagellate-only outgroup (15)           5.E-59           5.E-39        7.E-05           1.E-72                 100 BS             66 BS
All outgroups removed (24)                    N/A              2.E-07        N/A              N/A                    N/A                92 BS
Slowest evolving half of genes (16)           1.E-04           1.E-52        1.E-11           1.E-07                 100 BS/100 PP      94 BS/100 PP
Genes with lowest half of RCFV values (17)    5.E-09           1.E-68        5.E-07           3.E-72                 100 BS             93 BS
Heterogeneous genes removed (18)              7.E-07           4.E-10        6.E-77           8.E-09                 99 BS              100 BS
Genes with high LB scores removed (19)        2.E-46           1.E-08        2.E-04           1.E-57                 100 BS             96 BS
Taxa with high LB scores removed (20)         1.E-03           3.E-92        1.E-66           3.E-113                100 BS             100 BS
Choanoflagellate-only outgroup (21)           2.E-61           1.E-31        2.E-04           4.E-40                 100 BS             77 BS
All outgroups removed (25)                    N/A              9.E-05        N/A              N/A                    N/A                96 BS
Philippe et al. (9) maximum likelihood        0.001            0.010         0.008            0.075                  93 BS              42 BS
Philippe et al. (9) Bayesian inference        —                —             —                —                      N/A                99 PP

Dataset numbers are as in Fig. 2.

5776 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al.

datasets from these studies are less saturated than datasets assembled here based on linear regression of patristic distance versus uncorrected genetic distance (Fig. S7) (7–9, 25). This lower mutational saturation has been the primary rationale for emphasizing ribosomal genes when inferring deep animal relationships (7). However, standard measurements of sequence saturation (7, 8, 25) average across the length of the sequence. Thus, a sequence with a few variable, highly saturated sites may appear less saturated than a sequence with numerous variable sites but less saturation per site. Furthermore, extremely low mutation rates can indicate selection and result in too little phylogenetic information, both of which could lead to the inference of incorrect relationships (33). Our maximum-likelihood analysis of Philippe et al.’s (9) dataset with ribosomal genes removed recovered support for ctenophores as sister to all other animals (Fig. S5D). Basal relationships were poorly resolved in the Bayesian analysis, which may be a result of too few characters, but ctenophores and cnidarians were not recovered as sister (Fig. S5E). Notably, a study focused only on myxozoan cnidarians (34) used the same matrix as Philippe et al. (9), but added two highly divergent cnidarians and recovered ctenophores sister to all other animals. Ribosomal protein genes have previously been identified as a potential source of phylogenetic error (32), and the above indicates that ctenophores sister to cnidarians as in Philippe et al. (9) was caused by either limited cnidarian taxon sampling, misleading signal in ribosomal genes, or both. More work is needed to assess saturation, possible convergent evolution, and selective pressures of ribosomal proteins in ctenophores, sponges, placozoans, and cnidarians. However, differences in topologies when ribosomal proteins are included or excluded strongly imply a misleading signal in ribosomal protein genes. Put simply, it appears highly improbable that all genes other than ribosomal protein genes could be recovering an incorrect phylogeny.
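To illustrate the regression-based saturation measure discussed above, the following minimal Python sketch fits an ordinary least-squares line of uncorrected distance against patristic distance for a set of taxon pairs. It is not the TreSpEx implementation; the function name `saturation_slope` and the input layout (two parallel lists, one value per taxon pair) are assumptions made for the example. A slope and R² near 1 would suggest little saturation, whereas lower values would suggest that observed differences underestimate the changes implied by the tree.

```python
def saturation_slope(patristic, uncorrected):
    """Return (slope, r_squared) of an OLS fit: uncorrected ~ patristic.

    patristic   -- tree-based (patristic) distances, one per taxon pair
    uncorrected -- uncorrected (p) distances for the same pairs, same order
    """
    n = len(patristic)
    if n != len(uncorrected) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")

    mean_x = sum(patristic) / n
    mean_y = sum(uncorrected) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in patristic)
    syy = sum((y - mean_y) ** 2 for y in uncorrected)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(patristic, uncorrected))
    if sxx == 0:
        raise ValueError("patristic distances are all identical")

    slope = sxy / sxx
    # R^2 is undefined when the response has no variance; report 0.0 then.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared
```

Because the fit pools all sites of an alignment into one distance per taxon pair, it shares the limitation noted above: a few highly saturated sites can be masked by many invariant ones.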

Sponges Are Monophyletic. Sponge monophyly, although less controversial than the phylogenetic positions of ctenophores, cnidarians, and placozoans, remains an important question as several studies have supported sponge paraphyly (7, 22–24). With regard to inferring the characteristics of the metazoan ancestor, sponge paraphyly, coupled with sponges being at the base of the metazoan phylogeny, is an attractive hypothesis that implies the metazoan ancestor was sponge-like. However, sponge monophyly was recovered in all of our analyses and best supported when the strictest orthology criteria were applied with TreSpEx (Fig. 3 and Table 1), which also removes sequences resulting from sample contamination (e.g., endosymbionts). This observation suggests that spurious paralogs or sequence contamination may have been a source of error when sponges were found paraphyletic, but datasets that have recovered sponge paraphyly were also much smaller than those analyzed here (e.g., refs. 7, 22–24). Sponge monophyly and the well-supported ctenophore-sister hypothesis complicate inferring the ancestral condition of metazoans and other major metazoan groups (e.g., Placozoa + Cnidaria + Bilateria) because many sponge characteristics are likely apomorphic traits. Our robust support of sponge monophyly agrees with morphology (35) and most other large molecular datasets (5, 6, 8–10).

Conclusions
For more than a century, sponges were traditionally considered sister to all other extant metazoans because, unlike ctenophores, cnidarians, and bilaterians, they lack true tissues and body symmetry (36, 37). Sponges also possess choanocyte cells that are similar in morphology to choanoflagellates, the sister group to metazoans (36). However, Mah et al. (38) found that homology between choanoflagellates and sponge choanocytes is not as definitive as previously assumed. Similarly, some authors have argued that placozoans are sister to all other animals because they lack neural and muscular systems and also share similarities in mitochondrial genome size with choanoflagellates (2, 39). A common theme of these two hypotheses is the placement of morphologically simple animals near the base of the animal tree, but complexity is not a good proxy for metazoan evolution (40). Challenges to long-held viewpoints of morphological complexity and assumed improbability of convergent evolution must not be dismissed simply because they seem unlikely at face value, especially considering a growing body of evidence that supports convergent evolution of many animal traits including neurons (5, 6, 41–43). Furthermore, the Porifera-sister hypothesis lacks critical evaluation, and homology of many characters in these taxa still needs thorough analysis (44). Overall, findings presented here robustly support ctenophores as sister to all remaining animals.

Methods
Taxon and Character Sampling. Taxon sampling included previously available data and eight new transcriptomes from two choanoflagellates, three glass sponges (Hexactinellida), two demosponges, and a deep-sea cnidarian (Scyphozoa) (Table S1). Briefly, RNA was extracted, reverse-transcribed, and amplified using the SMART kit (Clontech), and sequenced on an Illumina HiSeq (SI Methods). Raw or assembled transcriptome data for 68 additional species were retrieved from public databases (Table S1).

Raw Illumina transcriptome data were digitally normalized using normalize-by-median.py (45) with a k-mer size of 20, a desired coverage of 30, and four hash tables with a lower bound of 2.5 × 10^9. Normalized Illumina reads were assembled using default parameters in Trinity v20131110 (46). Raw 454 transcriptome data were assembled with Newbler (47).
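The idea behind digital normalization can be sketched in a few lines of Python. This is a simplified illustration rather than the normalize-by-median.py code: it keeps exact k-mer counts in a `Counter`, whereas the real tool uses fixed-size probabilistic hash tables (the "four hash tables" above). The function name `digital_normalize` and its defaults are assumptions chosen to mirror the k-mer size of 20 and desired coverage of 30 stated in the text.

```python
from collections import Counter
from statistics import median


def digital_normalize(reads, k=20, coverage=30):
    """Keep a read only if the median count of its k-mers is below `coverage`.

    reads -- iterable of nucleotide strings, processed in the given order
    Returns the list of retained reads.
    """
    counts = Counter()   # exact k-mer counts (assumption: fits in memory)
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue     # read shorter than k: no coverage estimate possible
        # The median k-mer count estimates how often this region was seen.
        if median(counts[kmer] for kmer in kmers) < coverage:
            kept.append(read)
            counts.update(kmers)   # only retained reads add to the counts
    return kept
```

Reads from highly expressed transcripts are discarded once their region has reached the target coverage, which reduces the data passed to the assembler while retaining reads from low-abundance transcripts.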

Orthology Determination and Data Filtering. Putative orthologs were determined for each species using HaMStR v13.2 (48) with the model organism core ortholog set. OGs determined by HaMStR were further processed using a custom pipeline that filtered OGs with too much missing data (i.e., OGs with fewer than 37 species), aligned sequences, and filtered potential paralogs (https://github.com/kmkocot) (SI Methods).

TreSpEx (25), which requires individual gene trees for each OG, was used to identify putative paralogs and exogenous contamination missed by our initial orthology inference approach. Gene trees were inferred with RAxML v.8.0.2 (49) with 100 rapid bootstrap replicates followed by a full maximum-likelihood inference; each tree was inferred with the LG+Γ model, which was by far the most common best-fitting model when the complete dataset was partitioned (see below). Paralogs in the initial dataset were detected with TreSpEx using the automated BLAST method and the prepackaged Capitella teleta and Helobdella robusta BLAST databases. This method identified two classes of paralogs: “certain,” or sequences that are high-confidence paralogs, and “uncertain,” or sequences that are potential paralogs. From the initial 251-gene dataset, we created one dataset by removing certain paralogs and another dataset with both certain and uncertain paralogs pruned (Fig. 2); after pruning, OGs with fewer than 37 taxa were removed. LB scores (25) were calculated for each taxon and OG with TreSpEx. Following Struck (25), these values were plotted in R (Fig. S6) (50), and outliers were identified as taxa or genes that could cause LBA artifacts. After removal of taxa and genes, each OG was ranked by evolutionary rate with a custom Python script (https://github.com/nathanwhelan) following Telford et al. (51). Datasets with only the slowest half of remaining genes were then generated to assess if fast evolving homoplasious genes were biasing inferences (Fig. 2).
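Two of the filtering steps above, the taxon-occupancy cutoff and the selection of the slowest-evolving half of genes, reduce to simple operations once per-OG summaries are available. The sketch below is illustrative only and is not the custom script cited in the text: the function names, the dictionary inputs, and the assumption that a per-OG rate estimate has already been computed by some other means are all hypothetical.

```python
def filter_by_occupancy(og_taxa, min_taxa=37):
    """Return the OG names that contain at least `min_taxa` taxa.

    og_taxa -- dict mapping OG name -> collection of taxon names present
    """
    return [og for og, taxa in og_taxa.items() if len(set(taxa)) >= min_taxa]


def slowest_half(og_rates):
    """Return the slowest-evolving half of OGs, ordered from slow to fast.

    og_rates -- dict mapping OG name -> precomputed evolutionary-rate estimate
                (how the rate is estimated is outside this sketch)
    With an odd number of OGs the half is rounded down.
    """
    ranked = sorted(og_rates, key=og_rates.get)
    return ranked[:len(ranked) // 2]
```

For example, `slowest_half({"OG1": 0.9, "OG2": 0.1, "OG3": 0.5, "OG4": 0.3})` keeps `OG2` and `OG4`, the two OGs with the lowest rate estimates.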

We used BaCoCa (52) and two metrics (χ² test of heterogeneity and relative composition frequency variability; RCFV) (53) to identify genes with amino acid compositional heterogeneity. Some datasets were further filtered by removing non-choanoflagellate outgroups and all outgroup taxa to determine if outgroup choice affects inferred relationships. Saturation of each filtered dataset with full outgroup sampling was explored with TreSpEx and plotted in R (Fig. S7) to provide a further metric to compare datasets.
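As an illustration of the RCFV metric, the following Python sketch sums, over all taxa and all 20 amino acids, the absolute deviation of each taxon's amino acid frequency from the across-taxon mean frequency, and divides by the number of taxa. This follows the definition of RCFV as commonly stated for ref. 53; it is not BaCoCa's code, and the function name `rcfv` and the choice to ignore gaps and ambiguous residues are assumptions of this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rcfv(alignment):
    """Relative composition frequency variability of one protein alignment.

    alignment -- dict mapping taxon name -> aligned amino acid sequence
    Gaps and ambiguous residues are ignored (an assumption of this sketch).
    """
    freqs = {}
    for taxon, seq in alignment.items():
        counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            continue  # taxon has no unambiguous residues in this gene
        freqs[taxon] = {aa: counts[aa] / total for aa in AMINO_ACIDS}

    n = len(freqs)
    if n == 0:
        raise ValueError("alignment contains no unambiguous residues")

    # Mean frequency of each amino acid across taxa.
    mean = {aa: sum(f[aa] for f in freqs.values()) / n for aa in AMINO_ACIDS}

    # Sum of absolute deviations from the mean, scaled by the taxon count.
    return sum(abs(f[aa] - mean[aa])
               for f in freqs.values() for aa in AMINO_ACIDS) / n
```

A value of 0 means every taxon has the same composition; larger values indicate greater compositional heterogeneity, so genes can be ranked by this value and the lower half retained.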

Phylogenetics. In addition to removing compositionally heterogeneous genes from some datasets, two approaches were used to handle site-heterogeneity: (i) partitioning schemes for each dataset and associated protein substitution models were determined using the relaxed clustering method in PartitionFinder (54) with 20% clustering and the corrected Akaike information criterion; (ii) a site-heterogeneous mixture model, CAT-GTR+Γ, was used in PhyloBayes (26). Maximum-likelihood topologies were inferred with RAxML using partitions as indicated by PartitionFinder, associated best-fit substitution models, and the gamma parameter to model rate-heterogeneity. Nodal support was measured with 100 fast bootstrap replicates. PhyloBayes analyses were run with two chains until the maxdiff statistic between chains was below 0.3 as measured by bpcomp (26). Convergence was also assessed with tracecomp (26) to ensure each parameter had a maximum discrepancy between chains of less than 0.3 and an effective sample size of at least 50. Computational demands and convergence issues prevented us from using the CAT-GTR+Γ model for most datasets. Therefore, Bayesian phylogenies are only reported for the two analyses of the slowest evolving half of OGs. Leaf-stability indices (55) for each taxon were measured in Phyutility (56) to identify potentially unstable taxa in each dataset.
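The convergence criteria stated above can be expressed as a small check. The sketch assumes the maxdiff value and the per-parameter discrepancy and effective sample size have already been read from the bpcomp and tracecomp outputs; parsing those files is not shown, and the function name `chains_converged` and the input layout are hypothetical.

```python
def chains_converged(maxdiff, trace_stats, max_discrepancy=0.3, min_ess=50):
    """Apply the thresholds described in the text to two-chain summaries.

    maxdiff     -- maximum bipartition frequency difference between chains
    trace_stats -- dict mapping parameter name -> (discrepancy, effective size)
    Returns True only if maxdiff and every parameter pass the thresholds.
    """
    if maxdiff >= max_discrepancy:
        return False
    return all(discrepancy < max_discrepancy and ess >= min_ess
               for discrepancy, ess in trace_stats.values())
```

A run would be treated as converged only when this returns True, for example `chains_converged(0.12, {"loglik": (0.05, 210.0), "alpha": (0.20, 75.0)})`.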

Maximum-likelihood and Bayesian inference trees were also inferred from the Philippe et al. (9) dataset with ribosomal proteins removed (i.e., 67 of 128 genes) to determine if a single gene class biased phylogenetic inference. Ribosomal proteins were filtered from the original dataset following data matrix annotations (9) and the matrix was split into individual genes for model testing using a custom R script (https://github.com/nathanwhelan).

The AU test (31) was used to determine if a priori hypotheses of basal metazoan relationships could be rejected (Fig. 1). Topological constraints were enforced in RAxML and the most likely tree given this constraint was inferred with the same partitioning scheme and models used for unconstrained phylogenetic inference. Per site log-likelihoods for trees were calculated in RAxML and AU tests were performed in Consel (57).

ACKNOWLEDGMENTS. We thank members of the Molette Biology Laboratory for Environmental and Climate Change Studies at Auburn University for help with bioinformatics and data collection, especially Damien Waits. This work was made possible in part by a grant of high-performance computing resources and technical support from the Alabama Supercomputer Authority and was supported by the US National Aeronautics and Space Administration (Grant NASA-NNX13AJ31G) and in part by the National Science Foundation (Grant 1146575). This is Molette Biology Laboratory Contribution 36 and Auburn University Marine Biology Program Contribution 128.

1. Dohrmann M, Wörheide G (2013) Novel scenarios of early animal evolution—Is it time to rewrite textbooks? Integr Comp Biol 53(3):503–511.

2. Dellaporta SL, et al. (2006) Mitochondrial genome of Trichoplax adhaerens supports placozoa as the basal lower metazoan phylum. Proc Natl Acad Sci USA 103(23):8751–8756.

3. Dunn CW, et al. (2008) Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452(7188):745–749.

4. Hejnol A, et al. (2009) Assessing the root of bilaterian animals with scalable phylogenomic methods. Proc Biol Sci 276(1677):4261–4270.

5. Moroz LL, et al. (2014) The ctenophore genome and the evolutionary origins of neural systems. Nature 510(7503):109–114.

6. Ryan JF, et al.; NISC Comparative Sequencing Program (2013) The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342(6164):1242592.

7. Nosenko T, et al. (2013) Deep metazoan phylogeny: When different genes tell different stories. Mol Phylogenet Evol 67(1):223–233.

8. Philippe H, et al. (2011) Resolving difficult phylogenetic questions: Why more sequences are not enough. PLoS Biol 9(3):e1000602.

9. Philippe H, et al. (2009) Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19(8):706–712.

10. Pick KS, et al. (2010) Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships. Mol Biol Evol 27(9):1983–1987.

11. Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27(4):401–410.

12. Boussau B, et al. (2014) Strepsiptera, phylogenomics and the long branch attraction problem. PLoS ONE 9(10):e107709.

13. Straub SCK, et al. (2014) Phylogenetic signal detection from an ancient rapid radiation: Effects of noise reduction, long-branch attraction, and model selection in crown clade Apocynaceae. Mol Phylogenet Evol 80:169–185.

14. Heath TA, Hedtke SM, Hillis DM (2008) Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol 46(3):239–257.

15. Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: The beginning of incongruence? Trends Genet 22(4):225–231.

16. Roure B, Baurain D, Philippe H (2013) Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol 30(1):197–214.

17. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 7(Suppl 1):S4.

18. Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci 17:57–86.

19. Philippe H, Delsuc F, Brinkmann H, Lartillot N (2005) Phylogenomics. Annu Rev Ecol Evol Syst 36:541–562.

20. Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of orthologs inference projects and methods. PLOS Comput Biol 5(1):e1000262.

21. Gabaldón T (2008) Large-scale assignment of orthology: Back to phylogenetics? Genome Biol 9(10):235.

22. Borchiellini C, et al. (2001) Sponge paraphyly and the origin of Metazoa. J Evol Biol 14(1):171–179.

23. Sperling EA, Pisani D, Peterson KJ (2007) Poriferan paraphyly and its implications for Precambrian paleobiology. Geol Soc Lond Spec Publ 286:355–368.

24. Sperling EA, Peterson KJ, Pisani D (2009) Phylogenetic-signal dissection of nuclear housekeeping genes supports the paraphyly of sponges and the monophyly of Eumetazoa. Mol Biol Evol 26(10):2261–2274.

25. Struck TH (2014) TreSpEx-detection of misleading signal in phylogenetic reconstructions based on tree information. Evol Bioinform Online 10:51–67.

26. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: Phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst Biol 62(4):611–615.

27. van Soest RWM (1984) Deficient Merlia normani Kirkpatrick, 1908, from the Curacao reefs, with a discussion on the phylogenetic interpretation of sclerosponges. Contrib Zool 54(2):211–219.

28. Gazave E, et al. (2012) No longer Demospongiae: Homoscleromorpha formal nomination as a fourth class of Porifera. Hydrobiologia 687(1):3–10.

29. Dohrmann M, Janussen D, Reitner J, Collins AG, Wörheide G (2008) Phylogeny and evolution of glass sponges (Porifera, Hexactinellida). Syst Biol 57(3):388–405.

30. Voigt O, Adamski M, Sluzek K, Adamska M (2014) Calcareous sponge genomes reveal complex evolution of α-carbonic anhydrases and two key biomineralization enzymes. BMC Evol Biol 14(1):230.

31. Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection. Syst Biol 51(3):492–508.

32. Bleidorn C, et al. (2009) On the phylogenetic position of Myzostomida: Can 77 genes get it wrong? BMC Evol Biol 9(1):150.

33. Edwards SV (2009) Natural selection and phylogenetic analysis. Proc Natl Acad Sci USA 106(22):8799–8800.

34. Nesnidal MP, Helmkampf M, Bruchhaus I, El-Matbouli M, Hausdorf B (2013) Agent of whirling disease meets orphan worm: phylogenomic analyses firmly place Myxozoa in Cnidaria. PLoS ONE 8(1):e54576.

35. Ax P (1996) Multicellular Animals: A New Approach to the Phylogenetic Order in Nature (Springer, Berlin).

36. Nielsen C (2008) Six major steps in animal evolution: Are we derived sponge larvae? Evol Dev 10(2):241–257.

37. Srivastava M, et al. (2010) The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466(7307):720–726.

38. Mah JL, Christensen-Dalsgaard KK, Leys SP (2014) Choanoflagellate and choanocyte collar-flagellar systems and the assumption of homology. Evol Dev 16(1):25–37.

39. Osigus H-J, Eitel M, Bernt M, Donath A, Schierwater B (2013) Mitogenomics at the base of Metazoa. Mol Phylogenet Evol 69(2):339–351.

40. Halanych KM (1999) Metazoan phylogeny and the shifting comparative framework. Recent Developments in Comparative Endocrinology and Neurobiology, eds Roubos EW, Wendelaar-Bonga SE, Vaudry H, De Loof A (Shaker, Maastricht, The Netherlands), pp 3–7.

41. Dunn CW, Giribet G, Edgecombe GD, Hejnol A (2014) Animal phylogeny and its evolutionary implications. Annu Rev Ecol Evol Syst 45:371–395.

42. Liebeskind BJ, Hillis DM, Zakon HH (2015) Convergence of ion channel genome content in early animal evolution. Proc Natl Acad Sci USA 112(8):E846–E851.

43. Moroz LL (2015) Convergent evolution of neural systems in ctenophores. J Exp Biol 218(Pt 4):598–611.

44. Halanych KM (2015) The ctenophore lineage is older than sponges? That cannot be right! Or can it? J Exp Biol 218(Pt 4):592–597.

45. Brown T, Howe C, Zhang A, Pyrkosz Q, Brom AB (2012) A reference-free algorithm for computational normalization of shotgun sequencing data. arXiv:1203.4802.

46. Haas BJ, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8(8):1494–1512.

47. Margulies M, et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057):376–380.

48. Ebersberger I, Strauss S, von Haeseler A (2009) HaMStR: Profile hidden markov model based search for orthologs in ESTs. BMC Evol Biol 9:157.

49. Stamatakis A (2014) RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313.

50. R Core Development Team (2014) R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria).

51. Telford MJ, et al. (2014) Phylogenomic analysis of echinoderm class relationships supports Asterozoa. Proc Biol Sci 281(1786):20140479.

52. Kück P, Struck TH (2014) BaCoCa—A heuristic software tool for the parallel assessment of sequence biases in hundreds of gene and taxon partitions. Mol Phylogenet Evol 70:94–98.

53. Zhong M, et al. (2011) Detecting the symplesiomorphy trap: A multigene phylogenetic analysis of terebelliform annelids. BMC Evol Biol 11(1):369.

54. Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A (2014) Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol Biol 14:82.

55. Thorley JL, Wilkinson M (1999) Testing the phylogenetic stability of early tetrapods. J Theor Biol 200(3):343–344.

56. Smith SA, Dunn CW (2008) Phyutility: A phyloinformatics tool for trees, alignments and molecular data. Bioinformatics 24(5):715–716.

57. Shimodaira H, Hasegawa M (2001) CONSEL: For assessing the confidence of phylogenetic tree selection. Bioinformatics 17(12):1246–1247.


Supporting Information
Whelan et al. 10.1073/pnas.1503453112
SI Methods
Transcriptome Sequencing. Frozen aliquots of Acanthoeca spectabilis (ATCC PRA-103) and Salpingoeca pyxidium (ATCC 50929) cultures were purchased from ATCC and a 10-μL aliquot of each culture was used for RNA extraction with the RNAqueous Micro RNA extraction kit (Life Technologies). Notably, Acanthoeca spectabilis was grown in a xenic culture with the bacteria Klebsiella pneumoniae (ATCC 700831) and Enterobacter aerogenes (ATCC 13048) and Salpingoeca pyxidium was grown in a xenic culture with Klebsiella pneumoniae (ATCC 700831). Tissue cuttings were taken from the apical portion of the sponges Latrunculia apicalis, Kirkpatrickia variolosa, Hyalonema populiferum, and Rossella fibulata. One “orb” of the unusual hexactinellid Sympagella nux was taken for RNA extractions. A tissue clip was taken from the margin of the bell of the deep-sea cnidarian Periphylla periphylla. RNA was extracted from all sponges and P. periphylla with TRIzol (Invitrogen). The Qiagen RNeasy kit and on-column DNase digestion were used to purify TRIzol-extracted RNA. cDNA libraries of all eight taxa were constructed with the SMART cDNA library construction kit (Clontech Laboratories). cDNA library construction followed the manufacturer’s instructions except that the provided 3′ oligo was replaced with the Cap-Trsa-CV oligo following Meyer et al. (1). The Advantage 2 PCR system (Clontech) was used to amplify full-length cDNA using a minimum number of PCR cycles (∼17–21). Amplified cDNA was shipped to Hudson Alpha Institute for Biotechnology in Huntsville, AL for Illumina library preparation and 2 × 100-base pair (A. spectabilis, S. pyxidium, K. variolosa, L. apicalis) or 2 × 150-base pair (H. populiferum, R. fibulata, S. nux, P. periphylla) paired-end sequencing on an Illumina HiSeq 2500 using approximately one-eighth of a lane per taxon.

Initial Orthology Inference Procedure. Before any filtering steps, all sequences were backed up and newline characters were removed from interleaved sequences with nentferner.pl (https://github.com/mptrsen/HaMStRad). All sequences shorter than 50 bp were then removed. Any gene with fewer than 37 taxa (∼47%) was also removed from the dataset to minimize missing data. HaMStR sometimes pulls two identical sequences from a single taxon for any given gene, so the script uniqHaplo.pl (raven.iab.alaska.edu/~ntakebay/teaching/programming/perl-scripts/uniqHaplo.pl) was used to remove duplicate sequences. As an initial data-filtering step before sequence alignment, if an ambiguous amino acid (i.e., an X) was present in the first 20 or last 20 amino acids, then the 5′ or 3′ end, respectively, was trimmed to this X as it may indicate sequencing errors.

Sequences were aligned using MAFFT (2) with the “auto” and “localpair” parameters and 1,000 maximum iterations. MAFFT outputs sequences in an interleaved format, so nentferner.pl was again used to remove newline characters from sequences. Unalignable regions were then removed with aliscore (3) and alicut (4). Alignment columns with only gaps were subsequently removed, and any gene with an alignment less than 50 bp long after trimming was removed. For each gene, a custom javascript (https://github.com/kmkocot) then removed any sequence that did not overlap other sequences by at least 20 amino acids to minimize the number of end gaps in alignments that would likely not add to information content in phylogenetic inference. Finally, any gene that had fewer than 37 taxa after alignment and trimming was removed to further minimize missing data.

The last step removed putative paralogs with PhyloTreePruner (5), which is a tree-based method. Gene trees for each OG were inferred using FastTreeMP (6) with the “slow” and “gamma” parameters. Gene trees and corresponding OGs were input into PhyloTreePruner and inferred paralogs were removed from each OG. Alignments of each OG were concatenated using FASconCAT v1.0 (7). An automated wrapper script for this procedure is available from https://github.com/kmkocot.
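Two of the sequence-cleaning steps described in this section, trimming sequence ends at an ambiguous residue and dropping gap-only alignment columns, can be sketched as follows. This is a hedged illustration, not the pipeline's code: the function names are hypothetical, and two details the text leaves open are decided here by assumption, namely that the X itself is removed along with the end it marks and that, when several X residues fall within a window, trimming extends to the innermost one.

```python
def trim_ambiguous_ends(seq, window=20):
    """Trim the start/end of an unaligned protein sequence at an X residue.

    If an X occurs within the first `window` residues, everything up to and
    including the innermost such X is removed; the same is then done for an
    X within the last `window` residues of what remains.
    """
    i = seq[:window].rfind("X")          # innermost X in the leading window
    if i != -1:
        seq = seq[i + 1:]
    tail_start = max(len(seq) - window, 0)
    j = seq.find("X", tail_start)        # innermost X in the trailing window
    if j != -1:
        seq = seq[:j]
    return seq


def drop_all_gap_columns(alignment):
    """Remove alignment columns in which every sequence has a gap ('-').

    alignment -- dict mapping sequence name -> aligned sequence (equal lengths)
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(length) if any(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep)
            for name, s in alignment.items()}
```

An X that lies outside both windows is left in place, which matches the description above of the step as an end-trimming filter rather than a general masking of ambiguous residues.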

OG Annotation. To ensure no one gene class dominated our matrices, OGs were annotated with PFAM domains (8) using the Trinotate pipeline (trinotate.github.io). An annotation spreadsheet was placed on figshare (doi: 10.6084/m9.figshare.1334306).

1. Meyer E, et al. (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics 10:219.

2. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol 30(4):772–780.

3. Misof B, Misof K (2009) A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: A more objective means of data exclusion. Syst Biol 58(1):21–34.

4. Kück P (2009) ALICUT: A Perlscript Which Cuts ALISCORE Identified RSS (Department of Bioinformatics, Zoologisches Forschungsmuseum, Bonn, Germany).

5. Kocot KM, Citarella MR, Moroz LL, Halanych KM (2013) PhyloTreePruner: A phylogenetic tree-based approach for selection of orthologous sequences for phylogenomics. Evol Bioinform Online 9(3940):429–435.

6. Price MN, Dehal PS, Arkin AP (2010) FastTree 2—Approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3):e9490.

7. Kück P, Meusemann K (2010) FASconCAT: Convenient handling of data matrices. Mol Phylogenet Evol 56(3):1115–1118.

8. Finn RD, et al. (2014) Pfam: the protein families database. Nucleic Acids Res 42(Database issue):D222–D230.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 1 of 11

[Figure residue: tip labels (species names), bootstrap support values, and scale bars (0.4, 0.2) from supporting-information phylogenetic tree panels; the tree topologies are not recoverable from the extracted text.]

Pseudospongosorites suberitoides

Eunicella verrucosa

Spongilla alba

Bolocera tuediae

Euplokamis dunlapae

Ircinia fasciculata

Pleurobrachia bachei

Nanomia bijuga

Tubulanus polymorphus

Bolinopsis infundibulum

Monosiga ovata

Aurelia aurita

Hydra viridissima

Platygyra carnosa

Hormathia digitata

Mertensiidae sp.

Capitella teleta

Dryodora glandiformis

Salpingoeca rosetta

Ephydatia muelleri

Pleurobrachia sp.

Aiptasia pallida

Latrunculia apicalis

Hydra vulgaris

Physalia physalis

Periphylla periphylla

Corticium candelabrum

Sycon coactum

Kirkpatrickia variolosa

Petromyzon marinus

Rossella fibulata

Homo sapiens

Vallicula sp.

Craseoa lathetica

Petrosia ficiformis

Oscarella carmela

Strongylocentrotus purpuratus

Coeloplana astericola

Aphrocallistes vastus

Sycon ciliatum

53

68

99

99

89

99

0.2Monosiga ovata

Crella elegans

Daphnia pulex

Sycon ciliatum

Saccoglossus kowalevskii

Trichoplax adhaerens

Pleurobrachia sp.

Capsaspora owczarzaki

Nematostella vectensis

Latrunculia apicalis

Sphaeroforma arctica

Hydra oligactis

Hydra viridissima

Physalia physalis

Ministeria vibrans

Spongilla alba

Drosophila melanogasterLithobius forficatus

Rhizopus oryzae

Petrosia ficiformisPseudospongosorites suberitoides

Amoebidium parasiticum

Bolinopsis infundibulum

Tubulanus polymorphus

Sympagella nux

Hydra vulgaris

Kirkpatrickia variolosa

Rossella fibulata

Strongylocentrotus purpuratus

Euplokamis dunlapae

Petromyzon marinusHomo sapiens

Dryodora glandiformis

Aurelia aurita

Salpingoeca rosetta

Ircinia fasciculata

Nanomia bijuga

Hormathia digitata

Allomyces macrogynus

Hyalonema populiferum

Bolocera tuediae

Mertensiidae sp.

Aphrocallistes vastus

Platygyra carnosa

Craseoa lathetica

Coeloplana astericola

Mortierella verticillata

Capitella teleta

Aiptasia pallida

Tethya wilhelma

Mnemiopsis leidyi

Acropora digitifera

Corticium candelabrum

Abylopsis tetragona

Priapulus caudatus

Oscarella carmela

Hemithiris psittacea

Ephydatia muelleri

Vallicula sp.

Beroe abyssicola

Agalma elegans

Pleurobrachia bachei

Chondrilla nucula

Amphimedon queenslandica

Danio rerio

Lottia gigantea

Periphylla periphylla

Sycon coactum

Spizellomyces punctatus

Eunicella verrucosa

71

44

95

65

48

57

65

97

83

58

99

98

99

0.2

A B C

D E F

Fig. S1. Maximum-likelihood topologies. (A) Full dataset (Fig. 2, dataset 1). (B) Dataset with certain paralogs removed (Fig. 2, dataset 2). (C) Dataset with certain paralogs and taxa with high LB scores removed (Fig. 2, dataset 3). (D) Dataset with certain paralogs and taxa and genes with high LB scores removed (Fig. 2, dataset 4). (E) Dataset with certain paralogs, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 5). (F) Dataset with slowest-evolving half of genes after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 6). Bootstrap values for each node are 100% unless otherwise noted.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 2 of 11


Fig. S2. Maximum-likelihood topologies. (A) Dataset with only genes with the lowest 50% of RCFV values after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 7). (B) Dataset with certain paralogs and heterogeneous genes removed (Fig. 2, dataset 8). (C) Dataset with certain paralogs, heterogeneous genes, and genes with high LB scores removed (Fig. 2, dataset 9). (D) Dataset with certain paralogs, heterogeneous genes, and taxa and genes with high LB scores removed (Fig. 2, dataset 10). (E) Dataset with certain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 11). (F) Dataset with certain and uncertain paralogs removed (Fig. 2, dataset 12). Bootstrap values for each node are 100% unless otherwise noted.



Fig. S3. Maximum-likelihood topologies. (A) Dataset with certain and uncertain paralogs and taxa with high LB scores removed (Fig. 2, dataset 13). (B) Dataset with certain and uncertain paralogs and taxa and genes with high LB scores removed (Fig. 2, dataset 14). (C) Dataset with certain and uncertain paralogs, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 15). (D) Dataset with slowest-evolving half of genes after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 16). (E) Dataset with only genes with the lowest 50% of RCFV values after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 17). (F) Dataset with certain and uncertain paralogs and heterogeneous genes removed (Fig. 2, dataset 18). Bootstrap values for each node are 100% unless otherwise noted.



Fig. S4. Maximum-likelihood topologies. (A) Dataset with certain and uncertain paralogs, heterogeneous genes, and genes with high LB scores removed (Fig. 2, dataset 19). (B) Dataset with certain and uncertain paralogs, heterogeneous genes, and taxa and genes with high LB scores removed (Fig. 2, dataset 20). (C) Dataset with certain and uncertain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 21). (D) Dataset with certain paralogs, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 22). (E) Dataset with certain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 23). (F) Dataset with certain and uncertain paralogs, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 24). Bootstrap values for each node are 100% unless otherwise noted.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 5 of 11

[Fig. S5, panels A-G: panels A-E are tree images; panels F and G are density plots (x axis: average leaf stability index; y axis: density). Tip labels, support values, scale bars, and axis ticks are not reproduced here. Annotations appearing near the density plots: Calcarea, Homoscleromorpha, Trichoplax adhaerens.]

Fig. S5. (A) Maximum-likelihood topology of dataset with certain and uncertain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 25). (B) Bayesian inference topology of dataset of slowest evolving half of genes after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 6). (C) Bayesian inference topology of dataset of slowest evolving half of genes after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 16). (D) Maximum-likelihood phylogeny of Philippe et al. (1) dataset with ribosomal protein genes removed. Chimeric taxa are labeled only by their genus name as in the original publication. (E) Bayesian inference topology of Philippe et al. (1) dataset with ribosomal protein genes removed. Chimeric taxa are labeled only by their genus name as in the original publication. Bootstrap values or posterior probabilities for each node are 100% unless otherwise noted. (F) Density plot of average leaf stability indices for each taxon across all datasets. (G) Density plot of average leaf stability indices for each taxon in datasets without outgroups.

1. Philippe H, et al. (2009) Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19(8):706–712.


[Fig. S6, panels A-C: density plots. Annotations on the shaded areas: 6 taxa, 10 genes, 18 genes.]

Fig. S6. Density plots of (A) average LB scores for each taxon, (B) average upper quartile LB score for each OG, and (C) SD of LB score for each OG. Shaded areas include taxa or genes that were considered to have “high” LB scores and were trimmed from respective datasets (see Fig. 2).
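The three quantities plotted in Fig. S6 can be illustrated with a minimal sketch. It assumes the commonly used definition of a taxon-specific LB score (the taxon's mean patristic distance to all other taxa, relative to the mean over all taxon pairs, expressed as a percentage deviation); this section does not spell out the formula, so treat the definition as an assumption. The function names `lb_scores` and `og_summary`, the "top quarter of values" reading of "upper quartile", and the use of a population SD are illustrative choices, not the authors' implementation.

```python
from statistics import mean, pstdev


def lb_scores(dist):
    """Per-taxon LB scores (in %) from pairwise patristic distances.

    dist maps frozenset({taxon_a, taxon_b}) -> patristic distance on one tree.
    Assumed definition: (mean distance of taxon to all others /
    mean distance over all pairs - 1) * 100.
    """
    taxa = sorted({t for pair in dist for t in pair})
    overall = mean(dist.values())
    scores = {}
    for taxon in taxa:
        own = mean(d for pair, d in dist.items() if taxon in pair)
        scores[taxon] = (own / overall - 1.0) * 100.0
    return scores


def og_summary(scores):
    """Summaries for one orthology group (OG): mean of the top quarter
    of LB scores and the (population) SD of all LB scores."""
    vals = sorted(scores.values())
    k = max(1, len(vals) // 4)  # size of the top quarter, at least one value
    return mean(vals[-k:]), pstdev(vals)


# Toy tree: A, B, C are close to each other; D sits on a long branch.
toy = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.2,
    frozenset(("A", "D")): 1.0,
    frozenset(("B", "D")): 1.0,
    frozenset(("C", "D")): 1.0,
}
scores = lb_scores(toy)
upper_mean, sd = og_summary(scores)
```

In the toy example the long-branched taxon D receives a positive score and the other taxa negative scores; a cutoff for "high" would then be read off the density plot, as the shaded areas in the figure suggest.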


[Fig. S7: density plots of the slope and R² of linear regression, one curve per dataset. Panels group datasets 3, 4, 6, and 7; datasets 2, 8, 9, and 10; datasets 13, 14, 16, and 17; datasets 12, 18, 19, and 20; and the ribosomal and non-ribosomal partitions of the Nosenko and Philippe et al. datasets. Axes: x, slope or R² of linear regression; y, density. Axis ticks are not reproduced here.]

Fig. S7. Density plots of gene-specific slope and R² of linear regression between patristic and uncorrected pairwise distances for each dataset. Datasets with higher values for each metric are less saturated. Dataset numbers are referenced as in Fig. 2.
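The saturation metrics in Fig. S7 can be sketched as follows. This is a minimal illustration, assuming that uncorrected p-distances (y) are regressed on patristic distances (x) for all taxon pairs of one gene; the orientation of the axes, the treatment of gaps, and the helper names `p_distance` and `slope_and_r2` are assumptions made here, not details given in this section.

```python
def p_distance(seq_a, seq_b, skip="-?"):
    """Uncorrected pairwise distance: proportion of differing sites,
    ignoring columns where either sequence has a gap or missing character."""
    compared = 0
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a in skip or b in skip:
            continue
        compared += 1
        diffs += a != b
    return diffs / compared if compared else float("nan")


def slope_and_r2(x, y):
    """Ordinary least-squares slope and R^2 for y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy else float("nan")
    return slope, r2


# Hypothetical distances for four taxon pairs of one gene.
patristic = [0.1, 0.2, 0.4, 0.8]
unsaturated = [0.1, 0.2, 0.4, 0.8]   # observed differences track tree distances
saturated = [0.1, 0.18, 0.30, 0.40]  # observed differences level off

slope_ok, r2_ok = slope_and_r2(patristic, unsaturated)
slope_sat, r2_sat = slope_and_r2(patristic, saturated)
```

When multiple substitutions at the same site hide changes, observed differences stop growing with tree distance, so the slope falls below that of the unsaturated case. That is the sense in which higher values indicate less saturation.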


Table S1. Raw data for each species used in orthology searches

Species | Sequence type | No. of reads, transcripts, or predicted proteins | Source or collection locality | NCBI or other accession

Fungi
Allomyces macrogynus | Whole genome | 19,446 proteins | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html
Aspergillus fumigatus† | Whole genome | 11,485 proteins | InParanoid database |
Mortierella verticillata | Whole genome | 12,569 proteins | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html
Neurospora crassa† | Whole genome | 9,822 proteins | InParanoid database |
Rhizopus oryzae | Whole genome | 16,971 proteins | InParanoid database |
Saccharomyces cerevisiae* | Whole genome | 6,590 proteins | InParanoid database |
Spizellomyces punctatus | Whole genome | 9,424 proteins | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html

Ichthyosporea
Amoebidium parasiticum | Illumina | 105,237 transcripts | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html
Capsaspora owczarzaki | Whole genome | 10,123 proteins | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html
Ministeria vibrans | Illumina | 19,953,296 | NCBI SRA | SRR343051
Sphaeroforma arctica | Whole genome | 18,730 proteins | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html

Choanoflagellata
Acanthoeca spectabilis* | Illumina | 41,408,758 reads | ATCC | SRR1915695
Monosiga brevicollis* | Whole genome | 10,336 proteins | Joint Genome Institute |
Monosiga ovata | Sanger | 29,495 sequences | dbEST |
Salpingoeca pyxidium* | Illumina | 34,177,888 reads | ATCC | SRR1915694
Salpingoeca rosetta | Whole genome | 11,731 proteins | Broad Institute | www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html

Placozoa
Trichoplax adhaerens | Whole genome | 11,179 proteins | Joint Genome Institute |

Cnidaria
Acropora digitifera | Whole genome | 33,366 proteins | OIST Marine Genomics Unit |
Abylopsis tetragona | Illumina | 43,150,352 reads | NCBI SRA | SRR871525
Bolocera tuediae | 454 | 546,903 reads | NCBI SRA | SRR504347
Agalma elegans | Illumina | 107,996,364 reads | NCBI SRA | SRR871526
Nanomia bijuga | Illumina | 105,066,886 reads | NCBI SRA | SRR871527
Aiptasia pallida | Illumina | 536,616,844 reads | NCBI SRA | SRR696721; SRR696732; SRR696745
Aurelia aurita (strain Roscoff) | 454 | 1,099,012 reads | NCBI SRA | SRR040475; SRR040476; SRR040477; SRR040478; SRR040479
Eunicella verrucosa | Illumina | 70,071,835 reads | NCBI SRA | SRR1324944; SRR1324945
Hydra oligactis | 454 | 1,017,333 reads | NCBI SRA | SRR040466; SRR040467; SRR040468; SRR040469
Hydra viridissima | 454 | 1,048,810 reads | NCBI SRA | SRR040470; SRR040471; SRR040472; SRR040473
Hormathia digitata | 454 | 546,846 reads | NCBI SRA | SRR504348
Hydra vulgaris | Sanger | 167,982 sequences | dbEST |
Nematostella vectensis | Whole genome | 27,273 proteins | Joint Genome Institute |
Periphylla periphylla | Illumina | 44,443,566 | 45°55.33N 125°05.47W | SRR1915828
Physalia physalis | Illumina | 72,963,546 reads | NCBI SRA | SRR871528
Platygyra carnosus | Illumina | 72,963,546 reads | NCBI SRA | SRR402974; SRR402975
Craseoa lathetica | Illumina | 76,466,398 reads | NCBI SRA | SRR871529

Porifera
Amphimedon queenslandica | Sanger | 63,542 sequences | dbEST |
Aphrocallistes vastus | Illumina | preassembled | Evolution & Research Archive |
Chondrilla nucula | Illumina | preassembled | Dataverse Network (1) |
Corticium candelabrum | Illumina | preassembled | Dataverse Network (1) |
Crella elegans | Illumina | preassembled | Dryad | doi: 10.5061/dryad.50dc6/3
Ephydatia muelleri | Illumina | 156,515,380 reads | NCBI SRA | SRR1041944
Hyalonema populiferum | Illumina | 62,031,394 reads | 43°55.376N 127°24.65W | SRR1916923
Ircinia fasciculata | Illumina | preassembled | Dataverse Network (1) |
Kirkpatrickia variolosa | Illumina | 59,234,424 reads | 77°54.228S 170°57.915E | SRR1916957


Table S1. Cont.

Species | Sequence type | No. of reads, transcripts, or predicted proteins | Source or collection locality | NCBI or other accession

Porifera (continued)
Latrunculia apicalis | Illumina | 49,432,908 reads | 82°54.228S 175°57.915E | SRR1915755
Oscarella carmela | Illumina | preassembled | www.compagen.org |
Petrosia ficiformis | Illumina | preassembled | Dataverse Network (1) |
Pseudospongosorites suberitoides | Illumina | preassembled | Dataverse Network (1) |
Rossella fibulata | Illumina | 79,607,548 reads | 76°14.716S 174°30.247E | SRR1915835
Spongilla alba | Illumina | preassembled | Dataverse Network (1) |
Sycon coactum | Illumina | preassembled | Dataverse Network (1) |
Sympagella nux | Illumina | 30,065,060 reads | 28°10.26N 89°56.80W | SRR1916581
Tethya wilhelma | 454 | 442,949 reads | NCBI SRA | ERR216193

Ctenophora
Beroe abyssicola | Illumina | 45,444,644 reads | NCBI SRA | SRR777787
Bolinopsis infundibulum | Illumina | 43,377,170 reads | NCBI SRA | SRR786491
Coeloplana astericola | Illumina | 41,685,220 reads | NCBI SRA | SRR786490
Dryodora glandiformis | Illumina | 41,269,166 reads | NCBI SRA | SRR777788
Euplokamis dunlapae | Illumina | 68,302,698 reads | NCBI SRA | SRR777663
Mertensiidae sp. | Illumina | 47,454,246 reads | NCBI SRA | SRR786492
Mnemiopsis leidyi | Illumina | 49,782,900 reads | NCBI SRA | SRR789900
Pleurobrachia bachei | Whole genome | 80,151 proteins | neurobase.rc.ufl.edu/pleurobrachia |
Pleurobrachia sp. | Illumina | 50,626,422 reads | NCBI SRA | SRR789901
Vallicula sp. | Illumina | 49,090,372 reads | NCBI SRA | SRR786489

Bilateria
Homo sapiens | Whole genome | N/A | HaMStR core orthologs |
Danio rerio | Whole genome | 54,390 proteins | EnSEMBL |
Petromyzon marinus | Whole genome | 12,444 proteins | EnSEMBL |
Saccoglossus kowalevskii | Sanger | 202,190 sequences | dbEST |
Strongylocentrotus purpuratus | Whole genome | 28,474 proteins | InParanoid |
Capitella teleta | Whole genome | 32,415 proteins | Joint Genome Institute |
Hemithiris psittacea | Illumina | 60,730,022 reads | NCBI SRA | SRR1611556
Lottia gigantea | Whole genome | 22,899 proteins | Joint Genome Institute |
Tubulanus polymorphus | Illumina | 39,262,732 reads | NCBI SRA | SRR1611583
Daphnia pulex | Whole genome | 30,137 proteins | Joint Genome Institute |
Drosophila melanogaster | Whole genome | N/A | HaMStR core orthologs |
Priapulus caudatus | Illumina | 57,331,982 reads | NCBI SRA | SRR1611567
Lithobius forficatus | Illumina | 57,098,550 reads | NCBI SRA | SRR1159752

Taxa in bold were sequenced in this study.
*Taxa identified as possibly causing LBA based on LB scores.

1. Riesgo A, Farrar N, Windsor PJ, Giribet G, Leys SP (2014) The analysis of eight transcriptomes from all Poriferan classes reveals surprising genetic complexity in sponges. Mol Biol Evol 31(5):1102–1120.


Table S2. Data statistics for each phylogenetic dataset

Dataset (no.) | Taxa | Genes | Sites | Gene occupancy, % | Missing data (including gaps), %
Full dataset (1) | 76 | 251 | 81,008 | 75 | 40
Certain paralogs removed (2) | 76 | 250 | 80,735 | 75 | 40
Taxa with high LB scores removed (3) | 70 | 250 | 80,735 | 75 | 41
Genes with high LB scores removed (4) | 70 | 231 | 73,323 | 75 | 41
All outgroups except Choanoflagellates removed (5) | 62 | 231 | 73,323 | 73 | 44
All outgroups removed (6) | 60 | 231 | 73,323 | 73 | 44
Slowest evolving half of genes (7) | 70 | 115 | 33,403 | 80 | 35
Genes with lowest half of RCFV values (8) | 70 | 115 | 41,933 | 81 | 43
Heterogeneous genes removed (9) | 76 | 228 | 66,571 | 76 | 38
Genes with high LB scores removed (10) | 76 | 210 | 59,733 | 75 | 38
Taxa with high LB scores removed (11) | 70 | 210 | 59,733 | 82 | 38
All outgroups except Choanoflagellates removed (12) | 62 | 210 | 59,733 | 74 | 40
All outgroups removed (13) | 60 | 210 | 59,733 | 75 | 40
Certain and uncertain paralogs removed (14) | 76 | 209 | 60,768 | 72 | 41
Taxa with high LB scores removed (15) | 70 | 209 | 60,768 | 78 | 42
Genes with high LB scores removed (16) | 70 | 191 | 54,193 | 78 | 42
All outgroups except Choanoflagellates removed (17) | 62 | 191 | 54,193 | 70 | 44
All outgroups removed (18) | 60 | 191 | 54,193 | 71 | 44
Slowest evolving half of genes (19) | 70 | 89 | 23,680 | 78 | 35
Genes with lowest half of RCFV values (20) | 70 | 96 | 33,069 | 80 | 40
Heterogeneous genes removed (21) | 76 | 195 | 52,543 | 73 | 39
Genes with high LB scores removed (22) | 76 | 178 | 46,542 | 73 | 39
Taxa with high LB scores removed (23) | 70 | 178 | 46,542 | 79 | 34
All outgroups except Choanoflagellates removed (24) | 62 | 178 | 46,542 | 71 | 41
All outgroups removed (25) | 60 | 178 | 46,542 | 72 | 41

Datasets nested in others have the same data filtered as corresponding upper-level datasets (see Fig. 2).
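For readers who want to recompute the summary columns of Table S2 for their own matrices, a minimal sketch follows. It assumes that gene occupancy is the percentage of taxon-by-gene cells in which the taxon has a sequence, and that missing data is the percentage of characters in the concatenated matrix that are gaps, missing symbols, or belong to an absent taxon. These definitions are not stated in this section, and the function name `matrix_stats` and the input layout are hypothetical.

```python
def matrix_stats(genes, taxa, missing="-?"):
    """Summarize a set of per-gene alignments.

    genes: list of dicts, one per gene, mapping taxon -> aligned sequence
           (all sequences within a gene have the same length).
    taxa:  list of all taxon names in the dataset.
    Returns (total sites, gene occupancy %, missing data % including gaps).
    """
    present_cells = 0
    total_chars = 0
    missing_chars = 0
    sites = 0
    for alignment in genes:
        length = len(next(iter(alignment.values())))
        sites += length
        for taxon in taxa:
            total_chars += length
            seq = alignment.get(taxon)
            if seq is None:
                # Taxon lacks this gene: the whole block counts as missing.
                missing_chars += length
            else:
                present_cells += 1
                missing_chars += sum(ch in missing for ch in seq)
    occupancy = 100.0 * present_cells / (len(genes) * len(taxa))
    missing_pct = 100.0 * missing_chars / total_chars
    return sites, occupancy, missing_pct


# Toy dataset: two taxa, two genes; taxon B lacks the second gene.
toy_taxa = ["A", "B"]
toy_genes = [
    {"A": "MK-L", "B": "MKAL"},
    {"A": "GG"},
]
sites, occupancy, missing_pct = matrix_stats(toy_genes, toy_taxa)
```

For the toy dataset this gives 6 sites, 75% gene occupancy, and 25% missing data.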
