SHORT COMMUNICATION
Single-cell genomics shedding light on marine Thaumarchaeota diversification
Haiwei Luo1, Bradley B Tolar1, Brandon K Swan2, Chuanlun L Zhang1,3, Ramunas Stepanauskas2, Mary Ann Moran1 and James T Hollibaugh1
1Department of Marine Sciences, University of Georgia, Athens, GA, USA; 2Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA and 3State Key Laboratory of Marine Geology, Tongji University, Shanghai, China
Previous studies based on analysis of amoA, 16S ribosomal RNA or accA gene sequences have established that marine Thaumarchaeota fall into two phylogenetically distinct groups corresponding to shallow- and deep-water clades, but it is not clear how water depth interacts with other environmental factors, including light, temperature and location, to affect this pattern of diversification. Earlier studies focused on single-gene distributions were not able to link phylogenetic structure to other aspects of functional adaptation. Here, we analyzed the genome content of 46 uncultivated single Thaumarchaeota cells sampled from epi- and mesopelagic waters of subtropical, temperate and polar oceans. Phylogenomic analysis showed that populations diverged by depth, as expected, and that mesopelagic populations from different locations were well mixed. Functional analysis showed that some traits, including putative DNA photolyase and catalase genes that may be related to adaptive mechanisms to reduce light-induced damage, were found exclusively in members of the epipelagic clade. Our analysis of partial genomes has thus confirmed the depth differentiation of Thaumarchaeota populations observed previously, consistent with the distribution of putative mechanisms to reduce light-induced damage in shallow- and deep-water populations.
The ISME Journal (2014) 8, 732–736; doi:10.1038/ismej.2013.202; published online 7 November 2013
Subject Category: Integrated genomics and post-genomics approaches in microbial ecology
Keywords: marine group I Archaea; Thaumarchaeota; single-cell genomics; nitrification; ammonia oxidation
Thaumarchaeota are abundant in both marine and terrestrial environments and have a significant role in the global nitrogen cycle (Leininger et al., 2006; Wuchter et al., 2006). Oceanic Thaumarchaeota are distributed throughout the water column, with their relative abundance increasing with depth to up to 50% of mesopelagic prokaryotic cells (Karner et al., 2001). Cultivation of marine Thaumarchaeota has proven to be difficult. Only one strain, Nitrosopumilus maritimus SCM1 (Konneke et al., 2005), has been successfully brought into pure culture, while several others have been cultivated as enrichment cultures (Blainey et al., 2011; Mosier et al., 2012a,b; Park et al., 2012a,b). Therefore, our understanding of their diversity and ecological function has been gained largely through culture-independent approaches. Using a few marker genes (16S ribosomal RNA, amoA, accA), many studies have shown that marine Thaumarchaeota fall into shallow- and deep-water phylogenetic clades (Francis et al., 2005; Hallam et al., 2006; Nicol et al., 2011). In addition to depth, light, temperature, latitude and dissolved oxygen have been identified as important correlates of Thaumarchaeota diversity and distributions in the ocean (Prosser and Nicol, 2008; Biller et al., 2012). These single-gene-based analyses have outlined the phylogenetic distribution and diversity of marine Thaumarchaeota, but have not provided insights into the adaptive mechanisms giving rise to specific ecotypes.
We turned to single amplified genomes (SAGs) in order to link additional metabolic capabilities to taxonomy and to reconstruct phylogeny using characters sampled across the genome. Forty-six single cells related to Thaumarchaeota populations in a variety of marine environments (Supplementary Figure S1) were obtained from epi- and mesopelagic waters of the Southern Ocean, temperate north Atlantic, subtropical north Pacific and south Atlantic (Supplementary Table S1). Genomes of these cells were recovered with variable success, with a mean of 32% (±12%) relative to the Nitrosopumilus maritimus SCM1 genome
Correspondence: H Luo or JT Hollibaugh, Department of Marine Sciences, University of Georgia, Athens, GA 30602, USA. E-mail: hluo2006@gmail.com or aquadoc@uga.edu
Received 15 August 2013; revised 27 September 2013; accepted 6 October 2013; published online 7 November 2013
The ISME Journal (2014) 8, 732–736
© 2014 International Society for Microbial Ecology All rights reserved 1751-7362/14
www.nature.com/ismej
(1.65 Mbp). A maximum likelihood phylogenomic tree constructed using a concatenated amino-acid sequence of 97 single-copy orthologous genes (Supplementary Table S2) with 27 041 sites strongly supported separating the SAGs into two phylogenetically coherent groups corresponding to epi- and mesopelagic clades (Figure 1). The within-clade average nucleotide identity (ANI) is 89.0% (±8.9%) and 86.8% (±5.4%) for epi- and mesopelagic clades, respectively, whereas the between-clade ANI is 75.4% (±4.1%). When the composite genomes from enrichment cultures or from metagenomic assembly were included in the phylogenomic analysis, the surface water SAGs appeared to be evolutionarily separated from all the cultured marine Thaumarchaeota (Supplementary Figure S2). Among the eight Antarctic SAGs sampled, three were obtained from 80 m, a depth associated with the Winter Water (WW) water mass (Church et al., 2003), whereas the remaining five were sampled from 400 m in the Circumpolar Deep Water (CDW) water mass. Although separated by only ~300-m depth, the SAGs from these two water masses were confidently assigned to the epi- and mesopelagic clades, respectively (Figure 1). Conversely, among the mesopelagic cells collected 1000s of km apart from CDW and subtropical waters, SAGs were intermingled within the mesopelagic clade (Figure 1).
Previous studies have shown that marine Thaumarchaeota are inhibited by light (Merbt et al., 2012) and suggested that sensitivity to photoinhibition might be a key factor determining their depth distribution (Mincer et al., 2007; Church et al., 2010; Hu et al., 2011; Merbt et al., 2012). Light was also implicated as the factor determining the seasonal dynamics of Thaumarchaeota in the Southern Ocean, where they are abundant during winter but nearly absent in summer (Kalanetra et al., 2009). Yet no direct evidence from field populations has been reported to support these hypotheses. Our analyses of genomes from uncultivated Thaumarchaeota showed that a homolog to deoxyribodipyrimidine photolyase, a key gene in the pathway to repair ultraviolet-induced DNA damage (Goosen and Moolenaar, 2008), was present in a SAG sampled from Gulf of Maine surface water (1-m depth), but absent from all of the 42 mesopelagic SAGs (Supplementary Table S3). The occurrence of putative DNA photolyases in surface water Thaumarchaeota was reinforced by conducting a rigorous search for DNA photolyase genes associated with Thaumarchaeota reads in the Global Ocean Survey (GOS) surface water metagenomes. Our identification of seven putative Thaumarchaeota photolyase genes in the GOS data (Figure 2a) is consistent with the hypothesis that light is an important factor structuring marine Thaumarchaeota populations by depth, and suggests that surface water members have evolved effective mechanisms to cope with ultraviolet-induced DNA damage. It remains unknown, however, whether the process of ammonia oxidation is indeed subject to photoinhibition, as has been suggested previously (Mincer et al., 2007). DNA photolyase was not found in the other three epipelagic clade SAGs that were associated with the Antarctic WW water mass and collected from 80 m, which is inconclusive because of the low recovery of genome sequence from these cells (coverage <35% relative to N. maritimus SCM1).
[Figure 1 image: phylogenomic tree with tip labels for the 46 SAGs and Nitrosopumilus maritimus, bootstrap values at nodes, and a scale bar of 0.05. Legend: 770 m, South Atlantic Gyre; 800 m, North Pacific Gyre; 400 m, Antarctic Circumpolar Deep Water; 80 m, Antarctic Winter Water; Surface, Gulf of Maine. Shaded groups: Mesopelagic clade; Epipelagic clade.]
Figure 1 Maximum likelihood phylogenomic analysis of 47 genomes of the marine Thaumarchaeota. The tree was constructed using the RAxML v7.3.0 software (Stamatakis, 2006) using a concatenated amino-acid sequence of 97 genes with 27 041 sites, with a data partition model determined by the PartitionFinder software (Lanfear et al., 2012). Values at the nodes show the number of times the clade defined by that node appeared in the 100 bootstrapped data sets. Two Crenarchaeota outgroup species are not shown. Details of tree construction can be found in Supplementary Material. The epi- and mesopelagic clades are indicated by shading. Single-cell genomes from different water masses/locations/depths are marked with different colors as identified in the legend inset.
Diversification of marine Thaumarchaeota
H Luo et al
Members of the epi- and mesopelagic clades also appear to differ in their capabilities for reducing oxidative stress. Genes encoding superoxide dismutase, which catalyzes the dismutation of superoxide into oxygen and hydrogen peroxide, are equally abundant in epi- and mesopelagic clades (Supplementary Table S3; χ² test; P > 0.05). Hydrogen peroxide is subsequently converted to water by the enzymes peroxiredoxin (also known as alkyl hydroperoxide reductase) and catalase. Although peroxiredoxin gene families occur with comparable frequency in both clades (Supplementary Table S3; χ² test; P > 0.05), a gene with high homology to catalase was found exclusively in two WW SAGs of the epipelagic clade (Supplementary Table S3). A key difference between these two types of antioxidant enzymes is that peroxiredoxin is 100- to 1000-fold less efficient than catalase; the latter becomes crucial once the former is saturated with hydrogen peroxide (Parsonage et al., 2008). Further, there is evidence that catalase is critical in minimizing ultraviolet-induced oxidative damage in bacteria (Costa et al., 2010). Phylogenetic analysis suggested that this gene was acquired through horizontal gene transfer (Figure 2b), which is further substantiated by its absence, as determined by homology search, from the genomes of all seven cultured marine Thaumarchaeota sequenced to date, all of which are members of the epipelagic ecotype (Supplementary Figure S2). These results are consistent with microbes in epipelagic waters experiencing a stronger oxidative stress because of photochemical and photosynthetic production of reactive oxygen species compared with those in deep water where biological activity is the single source of superoxide (Diaz et al., 2013).
When gene functions were assigned more broadly using either COG (Tatusov et al., 1997) or arCOG (Wolf et al., 2012) categories, we found that the genome content of the epipelagic clade was significantly different from the mesopelagic clade (χ² test; P < 0.001), with the signal transduction functional category significantly enriched in the epipelagic clade (Rodriguez-Brito et al., 2006). The ability of generalist marine bacteria to respond to a changing environment has been attributed to differences in the sophistication of regulatory machinery (Lauro et al., 2009; Luo et al., 2013), and this reasoning may apply to selection pressures operating on epipelagic Thaumarchaeota compared with those inhabiting the more stable mesopelagic waters. By contrast, we found a higher abundance of urease genes in SAGs from the mesopelagic compared with the epipelagic clade (Supplementary Table S3), consistent with a recent report of the depth distribution of Thaumarchaeota urease genes in polar oceans (Alonso-Saez et al., 2012).
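The χ² comparisons of gene-family frequency between clades reported above can be illustrated with a minimal Python sketch. It assumes a simple 2 × 2 table (clade by gene present/absent) and a Pearson test without continuity correction; the exact test configuration used in the study is not specified here, and the counts in the example are hypothetical placeholders, not values from Supplementary Table S3.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]], e.g. rows = clades, columns = gene present/absent.
    Returns (statistic, p_value) for 1 degree of freedom.
    Every row and column total must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > statistic) = erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for illustration only (not data from this study).
example_table = [[10, 20], [20, 10]]
stat, p = chi_square_2x2(example_table)
print(f"chi2 = {stat:.3f}, P = {p:.4f}")
```

With very small groups (for example, a clade represented by only a few SAGs) expected counts become small and the χ² approximation is less reliable, so an exact test may be preferable in practice.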
In conclusion, differentiation of epi- and mesopelagic Thaumarchaeota populations first detected by analysis of single genes (Francis et al., 2005; Hallam et al., 2006) is supported by phylogenomic analysis of partial genomes retrieved from uncultivated cells. The exclusive presence of putative DNA photolyase and catalase in SAGs from the epipelagic is strong evidence that light or light-driven photochemistry is a major factor structuring marine Thaumarchaeota by depth.
Figure 2 Bayesian phylogenetic tree of (a) photolyase and (b) catalase amino-acid sequences. The trees were constructed using the MrBayes v3.1.2 software (Ronquist and Huelsenbeck, 2003) using the WAG + G4 model. Values at the nodes are posterior probabilities of the internal branches. Details of tree construction can be found in Supplementary Material. The distinct phylogenetic groups (Thaumarchaeota, Euryarchaeota, Crenarchaeota, Bacteria) are indicated by shading. The trees consist of sequences from single cells (filled star), reference taxa with sequence id (NCBI gi/accession/locus tag) given in parenthesis, and homologs in GOS metagenomes with sequence id in the format of ‘JCVI_READ_XXXX’.
Conflict of Interest
The authors declare no conflict of interest.
Acknowledgements
We thank the Georgia Advanced Computing Resource Center at the University of Georgia for providing computational resources. This research was funded by grants from NSF (OPP 08-38996; OCE-1232982, EF-826924 and OCE-821374) and the Gordon and Betty Moore Foundation, with sequencing support from the US Department of Energy Joint Genome Institute Community Supported Program grants 2010-77 and 2011-387.
References
Alonso-Saez L, Waller AS, Mende DR, Bakker K, Farnelid H, Yager PL et al. (2012). Role for urea in nitrification by polar marine Archaea. Proc Natl Acad Sci USA 109: 17989–17994.
Biller SJ, Mosier AC, Wells GF, Francis CA. (2012). Global biodiversity of aquatic ammonia-oxidizing Archaea is partitioned by habitat. Front Microbiol 3: 252.
Blainey PC, Mosier AC, Potanina A, Francis CA, Quake SR. (2011). Genome of a low-salinity ammonia-oxidizing archaeon determined by single-cell and metagenomic analysis. PLoS One 6: e16626.
Church MJ, Wai B, Karl DM, DeLong EF. (2010). Abundances of crenarchaeal amoA genes and transcripts in the Pacific Ocean. Environ Microbiol 12: 679–688.
Church MJ, DeLong EF, Ducklow HW, Karner MB, Preston CM, Karl DM. (2003). Abundance and distribution of planktonic Archaea and Bacteria in the waters west of the Antarctic Peninsula. Limnol Oceanogr 48: 1893–1902.
Costa CS, Pezzoni M, Fernandez RO, Pizarro RA. (2010). Role of the quorum sensing mechanism in the response of Pseudomonas aeruginosa to lethal and sublethal UVA irradiation. Photochem Photobiol 86: 1334–1342.
Diaz JM, Hansel CM, Voelker BM, Mendes CM, Andeer PF, Zhang T. (2013). Widespread production of extracellular superoxide by heterotrophic bacteria. Science 340: 1223–1226.
Francis CA, Roberts KJ, Beman JM, Santoro AE, Oakley BB. (2005). Ubiquity and diversity of ammonia-oxidizing Archaea in water columns and sediments of the ocean. Proc Natl Acad Sci USA 102: 14683–14688.
Goosen N, Moolenaar GF. (2008). Repair of UV damage in bacteria. DNA Repair 7: 353–379.
Hallam SJ, Mincer TJ, Schleper C, Preston CM, Roberts K, Richardson PM et al. (2006). Pathways of carbon assimilation and ammonia oxidation suggested by environmental genomic analyses of marine Crenarchaeota. PLoS Biol 4: e95.
Hu A, Jiao N, Zhang R, Yang Z. (2011). Niche partitioning of marine group I Crenarchaeota in the euphotic and upper mesopelagic zones of the East China Sea. Appl Environ Microbiol 77: 7469–7478.
Kalanetra KM, Bano N, Hollibaugh JT. (2009). Ammonia-oxidizing Archaea in the Arctic Ocean and Antarctic coastal waters. Environ Microbiol 11: 2434–2445.
Karner MB, DeLong EF, Karl DM. (2001). Archaeal dominance in the mesopelagic zone of the Pacific Ocean. Nature 409: 507–510.
Konneke M, Bernhard AE, de la Torre JR, Walker CB, Waterbury JB, Stahl DA. (2005). Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 437: 543–546.
Lanfear R, Calcott B, Ho SYW, Guindon S. (2012). PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol 29: 1695–1701.
Lauro FM, McDougald D, Thomas T, Williams TJ, Egan S, Rice S et al. (2009). The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci USA 106: 15527–15533.
Leininger S, Urich T, Schloter M, Schwark L, Qi J, Nicol GW et al. (2006). Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 442: 806–809.
Luo H, Csűrös M, Hughes AL, Moran MA. (2013). Evolution of divergent life history strategies in marine Alphaproteobacteria. mBio 4: e00373–13.
Merbt SN, Stahl DA, Casamayor EO, Martí E, Nicol GW, Prosser JI. (2012). Differential photoinhibition of bacterial and archaeal ammonia oxidation. FEMS Microbiol Lett 327: 41–46.
Mincer TJ, Church MJ, Taylor LT, Preston C, Karl DM, DeLong EF. (2007). Quantitative distribution of presumptive archaeal and bacterial nitrifiers in Monterey Bay and the North Pacific Subtropical Gyre. Environ Microbiol 9: 1162–1175.
Mosier AC, Allen EE, Kim M, Ferriera S, Francis CA. (2012a). Genome sequence of ‘Candidatus Nitrosoarchaeum limnia’ BG20, a low-salinity ammonia-oxidizing archaeon from the San Francisco Bay estuary. J Bacteriol 194: 2119–2120.
Mosier AC, Allen EE, Kim M, Ferriera S, Francis CA. (2012b). Genome sequence of ‘Candidatus Nitrosopumilus salaria’ BD31, an ammonia-oxidizing archaeon from the San Francisco Bay estuary. J Bacteriol 194: 2121–2122.
Nicol GW, Leininger S, Schleper C. (2011). Distribution and activity of ammonia-oxidizing Archaea in natural environments. In Ward BB, Arp DJ, Klotz MJ (eds) Nitrification. ASM Press: Washington, DC, pp 157–178.
Park S-J, Kim J-G, Jung M-Y, Kim S-J, Cha I-T, Kwon K et al. (2012a). Draft genome sequence of an ammonia-oxidizing Archaeon, ‘Candidatus Nitrosopumilus koreensis’ AR1, from marine sediment. J Bacteriol 194: 6940–6941.
Park S-J, Kim J-G, Jung M-Y, Kim S-J, Cha I-T, Ghai R et al. (2012b). Draft genome sequence of an ammonia-oxidizing Archaeon, ‘Candidatus Nitrosopumilus sediminis’ AR2, from Svalbard in the Arctic Circle. J Bacteriol 194: 6948–6949.
Parsonage D, Karplus PA, Poole LB. (2008). Substrate specificity and redox potential of AhpC, a bacterial peroxiredoxin. Proc Natl Acad Sci USA 105: 8209–8214.
Prosser JI, Nicol GW. (2008). Relative contributions of Archaea and bacteria to aerobic ammonia oxidation in the environment. Environ Microbiol 10: 2931–2941.
Rodriguez-Brito B, Rohwer F, Edwards R. (2006). An application of statistics to comparative metagenomics. BMC Bioinformatics 7: 162.
Ronquist F, Huelsenbeck JP. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Tatusov RL, Koonin EV, Lipman DJ. (1997). A genomic perspective on protein families. Science 278: 631–637.
Wolf Y, Makarova K, Yutin N, Koonin E. (2012). Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer. Biol Direct 7: 46.
Wuchter C, Abbas B, Coolen MJL, Herfort L, van Bleijswijk J, Timmers P et al. (2006). Archaeal nitrification in the ocean. Proc Natl Acad Sci USA 103: 12317–12322.
Supplementary Information accompanies this paper on The ISME Journal website (http://www.nature.com/ismej)
Supplemental Material:
Single Cell Genomes of Marine Thaumarchaeota Reveal Insights into Population
Differentiation by Depth
Haiwei Luo, Bradley B. Tolar, Brandon K. Swan, Chuanlun L. Zhang, Ramunas Stepanauskas,
Mary Ann Moran, James T. Hollibaugh
Supplemental Methods
Single cell sample collection and construction of single amplified genome (SAG) libraries
Water samples for single cell analyses were collected and replicate, 1 mL subsamples
were cryopreserved with 6% glycine betaine (Sigma) and stored at –80 ºC (Cleland et al., 2004).
Prior to cell sorting, samples with prokaryote cell abundances above 5 × 10⁵ mL⁻¹ were diluted
10x with filter-sterilized field samples and screened through a 70 µm mesh-size cell strainer
(BD). For heterotrophic prokaryote detection, diluted subsamples (1-3 mL) were incubated for
10-120 min with SYTO-9 DNA stain (5 µM; Invitrogen). Cell sorting was performed with a
MoFlo™ (Beckman Coulter) flow cytometer using a 488 nm argon laser for excitation, a 70 µm
nozzle orifice and a CyClone™ robotic arm for droplet deposition into microplates. The
cytometer was triggered on side scatter. The “single 1 drop” mode was used for maximal sort
purity. Prokaryote cells were separated from eukaryotes, viruses, and detritus based on SYTO-9
fluorescence (proxy to nucleic acid content) and light side scatter (proxy to particle size) (del
Giorgio et al., 1996). Synechococcus cells were excluded, based on their autofluorescence
signal. Target cells were deposited into 384-well plates containing 600 nL per well of either a)
1x TE buffer or b) prepGEM™ Bacteria (Zygem) reaction mix and stored at –80 ºC until further
processing. Of the 384 wells, 315 were dedicated for single cells, 66 were used as negative
controls (no droplet deposited) and 3 received 10 cells each (positive controls).
The accuracy of droplet deposition was determined by depositing 10 µm fluorescent
beads into 384-well plates and then microscopically verifying the
presence of beads in the plate wells. Of the 2-3 plates examined each sort day, with one bead
deposited per well, fewer than 2% of wells were found to contain no bead and 0.4% to contain
more than one bead. The latter is most likely caused by co-deposition of two beads attached to
each other, which at certain orientation may have similar optical properties to single beads.
Cells sorted into TE buffer were lysed and their DNA was denatured using cold
KOH (Raghunathan et al., 2005). Genomic DNA from the lysed cells was amplified using
multiple displacement amplification (MDA) (Dean et al., 2002; Raghunathan et al., 2005) in 10
µL final volume. The MDA reactions contained 2 U/µL Repliphi polymerase (Epicentre), 1x
reaction buffer (Epicentre), 0.4 mM each dNTP (Epicentre), 2 mM DTT (Epicentre), 50 mM
phosphorylated random hexamers (IDT) and 1 µM SYTO-9 (Invitrogen) (all final
concentration). The MDA reactions were run at 30 °C for 12-16 h, then inactivated by a 15 min
incubation at 65 °C. Amplified genomic DNA was stored at -80 °C until further processing. We
refer to the MDA products originating from individual cells as single amplified genomes
(SAGs).
Prior to cell sorting, the instrument and the workspace were decontaminated for DNA as
previously described (Stepanauskas and Sieracki, 2007). High molecular weight DNA
contaminants were removed from all MDA reagents by a UV treatment in Stratalinker
(Stratagene) (Woyke et al., 2011). During UV treatment, reagents were placed on ice to avoid
overheating. An empirical optimization of the UV exposure was performed to remove all
detectable contaminants without inactivating the reaction. Cell sorting and MDA setup were
performed in a HEPA-filtered environment. As a quality control, the kinetics of all MDA
reactions was monitored by measuring the SYTO-9 fluorescence using either LightCycler 480
(Roche) or FLUOstar Omega (BMG). The critical point (Cp) was determined for each MDA
reaction as the time required to produce half of the maximal fluorescence. The Cp is inversely
correlated to the amount of DNA template (Zhang and Fang, 2006).
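
The Cp definition above (time to reach half of the maximal fluorescence) can be sketched in a few lines of Python. This is a minimal illustration, not the instrument software's algorithm: it assumes raw, unsmoothed readings with no baseline subtraction and uses linear interpolation between the two readings that bracket the half-maximum. The function name and the example values are hypothetical.

```python
def critical_point(times, fluorescence):
    """Return the time at which fluorescence first reaches half of its maximum.

    Linear interpolation is applied between the two readings that bracket
    the half-maximum. `times` and `fluorescence` must have equal length.
    """
    if len(times) != len(fluorescence) or len(times) == 0:
        raise ValueError("times and fluorescence must be non-empty and equal length")
    half_max = max(fluorescence) / 2.0
    for i, value in enumerate(fluorescence):
        if value >= half_max:
            if i == 0:
                # The first reading is already at or above half-maximum.
                return float(times[0])
            t0, t1 = times[i - 1], times[i]
            f0, f1 = fluorescence[i - 1], value
            # f0 < half_max <= f1, so the denominator is strictly positive.
            return t0 + (half_max - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("fluorescence never reaches half of its maximum")


# Hypothetical readings (hours, arbitrary fluorescence units), for illustration only.
hours = [0, 1, 2, 3, 4]
signal = [0.0, 0.0, 2.0, 8.0, 10.0]
print(critical_point(hours, signal))
```

A reaction that starts with more template reaches half-maximum sooner, which is why a lower Cp indicates more starting DNA.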
PCR screening of SAG libraries
MDA products were diluted 50-fold in TE buffer and 500 nL aliquots of diluted MDA
product served as the template DNA in 5 µL final volume real-time PCR screens. All PCR
reactions were performed using LightCycler 480 SYBR Green I Master Mix (Roche) and the
Roche LightCycler® 480 II real-time thermal cycler. PCR amplification of Archaeal SSU rRNA
from SAGs was done using primers Arch_344F (ACG GGG YGC AGC AGG CGC GA) and
Arch_915R (GTG CTC CCC CGC CAA TTC CT) (Lane et al. 1991). Forward (5´–
GTAAAACGACGGCCAGT–3´) and reverse (5´–CAGGAAACAGCTATGACC–3´) M13
sequencing primers were added to the 5´ ends of each target primer pair to aid direct sequencing
of PCR products. All PCR reactions were run for 40 cycles at the appropriate annealing
temperature, followed by melting curve analysis performed as follows: 95°C for 5 s, 52°C for 1
min, and a continuous temperature ramp (0.11°C/s) from 52 to 97°C. Real-time PCR kinetics and
amplicon melting curves served as proxies for detecting SAGs positive for target genes. New,
20 µL PCR reactions were set up for all PCR-positive SAGs and amplicons were sequenced
from both ends using Sanger technology by Beckman Coulter Genomics.
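
The primer design described above can be written out explicitly. The short Python sketch below joins the M13 sequencing tails to the 5´ ends of the archaeal SSU rRNA primers and expands the degenerate base in Arch_344F. All four sequences are copied from the text; the pairing of the M13 forward tail with the forward primer and the M13 reverse tail with the reverse primer is an assumption, and the helper names are hypothetical.

```python
from itertools import product

# Sequences copied from the text above, with spaces removed.
M13_FORWARD = "GTAAAACGACGGCCAGT"
M13_REVERSE = "CAGGAAACAGCTATGACC"
ARCH_344F = "ACGGGGYGCAGCAGGCGCGA"
ARCH_915R = "GTGCTCCCCCGCCAATTCCT"

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def add_m13_tail(tail, primer):
    """Place the M13 tail at the 5' end of a target-specific primer."""
    return tail + primer


def expand_degenerate(primer):
    """List every non-degenerate sequence encoded by an IUPAC primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]


# Assumed pairing: forward tail with forward primer, reverse with reverse.
forward_tailed = add_m13_tail(M13_FORWARD, ARCH_344F)
reverse_tailed = add_m13_tail(M13_REVERSE, ARCH_915R)
print(forward_tailed)
print(reverse_tailed)
print(expand_degenerate(ARCH_344F))
```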
Single cell sorting, whole genome amplification, real-time PCR screens and PCR product
sequence analyses were performed at the Bigelow Laboratory Single Cell Genomics Center
following protocols described on their web site (www.bigelow.org/scgc). Antarctic SAGs were
also screened at the University of Georgia for the presence of Archaeal amoA genes using
primers and qPCR conditions described in Francis et al. (2005) and Wuchter et al. (2006). PCR
products were sequenced at the Georgia Genomics Facility to verify amplification of the target
gene.
SAG sequencing and analysis
A total of 46 Thaumarchaeota SAGs were chosen for whole genome sequencing based on
multiple displacement amplification (MDA) kinetics, presence of metabolic genes from PCR
screening and geographic location of the sampling site. Three approaches were used for
sequencing marine Thaumarchaeota SAGs: 1) A combination of Illumina and 454 shotgun
sequencing (AAA007-O23), or Illumina only (AB-661-I02, AB-661-L21, AB-661-M19, AB-
663-F14, AB-663-G14, AB-663-N18, AB-663-O07, AB-663-P07, AAA160-J20, AAA001-
A19), as described in Swan et al. (2011); 2) a combination of Illumina and PacBio long read
sequence data (AAA007-N19, AAA288-I14, and AAA288-J14) as described in Martinez-Garcia
et al. (2012) and assembled using Velvet-SC (Chitsaz et al., 2011) and PBcR (Koren et al.,
2012); and 3) 454 shotgun sequencing of Nextera-prepared libraries followed by dual assembly
with Newbler v2.4 and Geneious Pro v.5.5.6 (Drummond et al., 2011) (all remaining SAGs; total
of 32). For each of these 32 SAGs, raw 454 sequences were trimmed in Geneious Pro v5.5.6 and
any remaining transposons were removed using TagCleaner v0.11 (Schmieder et al., 2010).
Sequences were then assembled separately in Newbler v.2.4 (Roche) using default settings and
Geneious using the high-sensitivity setting. The Newbler-assembled sequences were imported
into Geneious and co-assembled with both the Geneious-assembled contigs and the unused
reads. The dual assembled contigs and all other contigs longer than 300 bp were pooled and
annotated. Nextera-prepared sequencing libraries were generated using the Roche Titanium-
Compatible kit with MDA product as the input DNA, following the manufacturer’s instructions
(Adey et al., 2010). A total of 32 Nextera sequencing libraries constructed from SAGs were
barcoded and sequenced (454 FLX Titanium chemistry) on 1/2 microtiter plate. Whole-genome
sequence data for all Thaumarchaeota SAGs are available in IMG under accession numbers
listed in Supplementary Table S1.
SAG whole genome sequence quality control
Each raw sequence data set was screened against all finished bacterial and archaeal
genome sequences (downloaded from NCBI) and the human genome to identify potential
contamination in the sample. Reads were mapped against reference genomes with bwa version
0.5.9 (Li and Durbin, 2009) using default parameters (96% identity threshold). None of the
libraries showed significant contamination. Additionally, gene sequences of the final assemblies
(see below) were compared against the GenBank nr database by BLASTX and taxonomically
classified using MEGAN (57).
To further verify the absence of contaminating sequences in the assemblies, tetramer
frequencies were extracted from all scaffolds using two alternative settings: 1) sliding window of
1000 bp and 100 bp step size and 2) sliding window of 5000 bp and 500 bp step size. Reverse-
complementary tetramers were combined and the frequencies represented as a N×136 feature
matrix, where N is the number of windows and each column of the matrix corresponds to the
frequency of one of the 136 possible tetramers. Principal component analysis (PCA) was then
used to extract the most important components of this high dimensional feature matrix. The
analysis produced a unimodal distribution along the first four PCs for the majority of SAGs,
suggesting homogeneous DNA sources. Scaffolds representing extremes on the first four PCs
were identified and manually examined for their closest TBLASTX hits against the NCBI nt
database.
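
The tetramer feature matrix described above can be sketched as follows. Of the 256 possible tetramers, 16 are their own reverse complement, so combining each tetramer with its reverse complement leaves (256 − 16)/2 + 16 = 136 features. The Python sketch below is a minimal illustration of this step only; the function names are hypothetical, windows containing ambiguous bases simply skip the affected tetramers (an assumption), and the PCA step would follow using a numerical library of choice.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a tetramer and its reverse complement by one shared key."""
    return min(kmer, reverse_complement(kmer))


# 136 canonical tetramers once reverse complements are combined.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)
INDEX = {kmer: i for i, kmer in enumerate(CANONICAL_TETRAMERS)}


def window_frequencies(sequence, window=1000, step=100):
    """Return an N x 136 list of tetramer frequencies, one row per window."""
    sequence = sequence.upper()
    rows = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        counts = [0] * len(CANONICAL_TETRAMERS)
        total = 0
        for i in range(len(chunk) - 3):
            idx = INDEX.get(canonical(chunk[i:i + 4]))
            if idx is None:
                continue  # tetramer contains a non-ACGT character
            counts[idx] += 1
            total += 1
        rows.append([c / total if total else 0.0 for c in counts])
    return rows


# Toy sequence for illustration; real input would be one scaffold.
matrix = window_frequencies("ACGT" * 300, window=1000, step=100)
print(len(matrix), len(matrix[0]))
```

The second setting mentioned in the text corresponds to calling the same function with `window=5000` and `step=500`.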
SAG annotation
The gene modeling program Prodigal (http://prodigal.ornl.gov/) was run on the draft
single cell genomes, using default settings that permit overlapping genes and using ATG, GTG,
and TTG as potential starts. The resulting protein translations were compared to the GenBank
non-redundant database (NR), the Swiss-Prot/TrEMBL, Pfam, TIGRFam, Interpro, KEGG, and
COGs databases using BLASTP or HMMER. From these results, product assignments were
made. Initial criteria for automated functional assignment set priority based on TIGRFam, Pfam,
COG, Interpro profiles, pairwise BLAST versus Swiss-Prot/TrEMBL, and KO groups. The
annotation was imported into the Joint Genome Institute Integrated Microbial Genomes (IMG;
http://img.jgi.doe.gov/cgi-bin/pub/main.cgi) (Markowitz et al., 2010).
Phylogenomic tree construction
We compiled two data sets for phylogenomic analyses of marine Thaumarchaeota. The
first data set used sequence data from 46 single cell genomes and the single cultured isolate
Nitrosopumilus maritimus SCM1 genome. The second set included all 8 published composite
Thaumarchaeota genomes in addition to N. maritimus and the 46 single cell genomes used in the
first compilation. These composite genomes are Candidatus Nitrosoarchaeum koreensis MY1,
Candidatus Nitrosoarchaeum limnia SFB1, Candidatus Cenarchaeum symbiosum A, Candidatus
Nitrosoarchaeum limnia BG20, Candidatus Nitrosopumilus salaria BD31, Candidatus
Nitrosopumilus koreensis AR1, Candidatus Nitrosopumilus sediminis AR2, and Candidatus
Nitrososphaera gargensis Ga9.2. Genome sequences from 2 Crenarchaeota, Pyrobaculum
islandicum DSM 4184 and Sulfolobus acidocaldarius DSM 639, were included as outgroups.
These two data sets were analyzed separately, because it is not clear how composite genomes
may affect the phylogenomic reconstruction.
The two data sets were processed in an identical way based on the following procedure.
Orthologous gene families were identified using the OrthoMCL software (Li et al., 2003).
Inparalog copies in a gene family were discarded, and gene members assigned to different COGs
were also discarded. For the remaining single copy orthologous families, only those found in
genomes from at least 25 (in the first data set) or 33 (in the second data set) Thaumarchaeota and
1 outgroup member were retained. This resulted in retention of 97 (in the first data set) or 83 (in
the second data set) gene families. Members in each gene family were aligned at the amino acid
level using MAFFT (Katoh et al., 2005), and the alignments were trimmed using trimAl
(Capella-Gutiérrez et al., 2009) with the options “-automated1 -resoverlap 0.55 -seqoverlap
60”. The trimmed alignments were then concatenated, with missing sequences treated as gaps.
To account for heterogeneity in the evolutionary processes among different genes, we applied a
data partition model during phylogenetic construction using the RAxML v7.3.0 software
(Stamatakis, 2006). The PartitionFinder software (Lanfear et al., 2012) grouped the 97 proteins
into 16 partitions and grouped the 83 proteins into 14 partitions, respectively, and estimated the
best-fit substitution matrix for each partition using a maximum likelihood framework. Gamma
distribution of rate variation was also applied in the RAxML analysis. Genomes obtained from
single cells have many missing genes, and taxa with insufficient phylogenetic signal may become
rogues that take uncertain positions in a phylogenetic tree. We applied the RogueNaRok software
(Aberer et al., 2013) and identified one rogue, SCGC AAA008-M23. Another RAxML
phylogenomic tree was constructed with this genome excluded, but the bootstrap support for
unresolved branches was only slightly improved compared to the original tree. Therefore, only
the original RAxML tree containing sequences from all SAGs is presented. Orthologous protein
sequences are available upon request.
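
The family filtering and concatenation steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the dictionary layout and the toy sequences are hypothetical, and the alignments are assumed to be already trimmed. The thresholds reported above (at least 25 or 33 Thaumarchaeota and 1 outgroup member) would be passed as min_ingroup and min_outgroup.

```python
def filter_families(families, ingroup, outgroup, min_ingroup, min_outgroup=1):
    """Keep families found in enough ingroup and outgroup genomes.

    families maps a family id to {genome name: aligned sequence}; each genome
    is assumed to contribute at most one (single-copy) member per family.
    """
    kept = {}
    for fam_id, members in families.items():
        n_in = sum(1 for g in members if g in ingroup)
        n_out = sum(1 for g in members if g in outgroup)
        if n_in >= min_ingroup and n_out >= min_outgroup:
            kept[fam_id] = members
    return kept


def concatenate(families, genomes):
    """Concatenate per-family alignments; absent genomes are padded with gaps."""
    parts = {g: [] for g in genomes}
    for fam_id in sorted(families):
        members = families[fam_id]
        length = len(next(iter(members.values())))  # alignment width
        for g in genomes:
            parts[g].append(members.get(g, "-" * length))
    return {g: "".join(p) for g, p in parts.items()}


# Toy data (hypothetical genomes A, B and one outgroup).
toy = {
    "fam1": {"A": "MKV", "B": "MRV", "out": "MKI"},
    "fam2": {"A": "GG", "out": "GA"},   # missing in B
    "fam3": {"A": "TT", "B": "TS"},     # no outgroup member, so dropped
}
kept = filter_families(toy, ingroup={"A", "B"}, outgroup={"out"}, min_ingroup=1)
matrix = concatenate(kept, ["A", "B", "out"])
```

In this toy case genome B lacks fam2, so its row in the concatenated matrix carries two gap characters at those positions.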
Comparative analysis of genome content
All of the predicted amino acid sequences from the 46 SAGs and Nitrosopumilus
maritimus SCM1 were clustered into orthologous gene families using the OrthoMCL software
(Li et al., 2003). The occurrence rate of each family was then calculated separately for the
4 epipelagic clade SAGs and the 42 mesopelagic clade SAGs. Ecologically relevant gene
families that had a higher occurrence rate in one clade than in the other were identified;
examples are listed in Table S3.
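
As an illustration of the occurrence-rate calculation, the following Python sketch computes, for each family, the fraction of SAGs in each clade that carry it. The function name and the SAG labels (epi1, meso1, ...) are placeholders rather than real identifiers; the two example families use the counts reported in Table S3 (OR3671: 1 of 4 epipelagic and 0 of 42 mesopelagic SAGs; OR1967: 0 of 4 and 10 of 42).

```python
def occurrence_rates(family_members, epi_sags, meso_sags):
    """Return {family id: (epipelagic rate, mesopelagic rate)}.

    family_members maps a family id to the genomes in which it was found.
    """
    rates = {}
    for fam_id, genomes in family_members.items():
        present = set(genomes)
        n_epi = len(present & epi_sags)
        n_meso = len(present & meso_sags)
        rates[fam_id] = (n_epi / len(epi_sags), n_meso / len(meso_sags))
    return rates


# Placeholder SAG labels; only the clade sizes (4 and 42) come from the text.
epi = {f"epi{i}" for i in range(1, 5)}
meso = {f"meso{i}" for i in range(1, 43)}
members = {
    "OR3671": ["epi1"],                            # photolyase
    "OR1967": [f"meso{i}" for i in range(1, 11)],  # urea active transporter
}
rates = occurrence_rates(members, epi, meso)
```

Because the SAGs are partial genomes, absence of a family from a SAG does not demonstrate absence from the cell, so rates of this kind are best read as lower bounds.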
Analysis of photolyase and catalase
Inferred amino acid sequences closely related to homologs of photolyase and catalase
were identified in the Global Ocean Survey (GOS) metagenomic database using a three-step
procedure. First, the GOS DNA read sequences were translated to amino acid sequences in
all 6 reading frames. Peptide fragments with at least 60 amino acids were retained. Next, the
photolyase and catalase amino acid sequences identified in the SAGs were used as query
sequences to search against GOS using the BLASTp program. The criteria to retain GOS hits for
further analyses were similarity scores ≥60, alignment lengths ≥100, and bit scores ≥100 for
photolyase, and similarity scores ≥75, alignment lengths ≥310, and bit scores ≥500 for
catalase. These parameter values were estimated based on preliminary phylogenetic analyses
showing that sequences recovered using more relaxed criteria were not related to
Thaumarchaeota. Finally, the GOS DNA sequences corresponding to the retained photolyase and
catalase peptide fragments were extracted and searched against the NCBI non-redundant database
using the BLASTx program to confirm that these GOS reads encoded photolyase or catalase.
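
The first step of this procedure, six-frame translation with a minimum peptide length, can be sketched in Python as follows. This is an illustration only, not the code used in the study. It assumes the standard genetic code and assumes that peptide fragments are delimited by stop codons; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_peptides(dna, min_len=60):
    """Return stop-delimited peptides of at least min_len residues, all 6 frames."""
    dna = dna.upper()
    revcomp = dna.translate(COMPLEMENT)[::-1]
    peptides = []
    for strand in (dna, revcomp):
        for offset in range(3):
            protein = translate(strand[offset:])
            peptides.extend(p for p in protein.split("*") if len(p) >= min_len)
    return peptides


# Short toy read; min_len is lowered only so that the example returns something.
toy_peptides = six_frame_peptides("ATGGCCTAA", min_len=2)
```

With the default min_len of 60, the same toy read yields no peptides, which mirrors how short or stop-rich frames drop out before the BLASTp search.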
Phylogenetic analysis of the retrieved photolyase and catalase sequences followed an
identical procedure. Because the homologous sequences are very divergent, 7 alignment methods
were used and compared to better account for alignment uncertainty: ClustalW
(Larkin et al., 2007), MAFFT (Katoh et al., 2005), MUSCLE (Edgar, 2004), T-Coffee
(Notredame et al., 2000), DIALIGN (Morgenstern, 2004), Kalign (Lassmann and Sonnhammer,
2005), and OPAL (Wheeler and Kececioglu, 2007). The quality of the alignments was
compared using the trimAl software (Capella-Gutiérrez et al., 2009); the best alignment was
selected according to the consistency score calculated by trimAl. Next, the amino acid
substitution model was determined using the ProtTest v3 software (Darriba et al., 2011). A
phylogenetic tree was constructed using the MrBayes v3.1.2 software (Ronquist and
Huelsenbeck, 2003). One cold and three heated Markov chain Monte Carlo (MCMC) chains
were run for 1,000,000 generations with trees sampled every 100 generations. Two independent
runs of MCMC were performed. The first 25% of the samples from each run were discarded as ‘burn-in’. A 50%
majority-rule consensus tree was constructed from the post-burn-in trees. The average standard
deviation of split frequencies reached <0.01, indicative of convergence.
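
The burn-in and consensus steps can be illustrated with a small Python sketch. It does not reproduce MrBayes internals: trees are represented here simply as sets of splits (each split a frozenset of taxon names), the function names are hypothetical, and pooling the two runs before counting is an assumption. With sampling every 100 generations over 1,000,000 generations, each run yields roughly 10,000 sampled trees, of which about 2,500 fall in the 25% burn-in.

```python
from collections import Counter


def post_burnin(samples, burnin_fraction=0.25):
    """Drop the first fraction of sampled trees from one run."""
    n_discard = int(len(samples) * burnin_fraction)
    return samples[n_discard:]


def majority_rule_splits(runs, burnin_fraction=0.25, threshold=0.5):
    """Return {split: frequency} for splits above the threshold.

    Frequencies are computed over the pooled post-burn-in trees of all runs.
    """
    counts = Counter()
    total = 0
    for samples in runs:
        kept = post_burnin(samples, burnin_fraction)
        total += len(kept)
        for tree in kept:
            counts.update(tree)  # one count per split present in the tree
    return {s: n / total for s, n in counts.items() if n / total > threshold}


# Toy example with two competing splits among hypothetical taxa A-D.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
run1 = [{ac}, {ab}, {ab}, {ab}]
run2 = [{ac}, {ab}, {ac}, {ab}]
consensus = majority_rule_splits([run1, run2])
```

In the toy example the first tree of each run is discarded, split {A, B} is found in 5 of the 6 remaining trees and is retained, and split {A, C} falls below the 50% threshold.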
Supplemental Figure Legends
Figure S1. Maximum likelihood phylogenetic tree of marine Thaumarchaeota 16S rRNA
genes. The tree was constructed using the RAxML v7.3.0 software with the GTR substitution
model with Gamma distributed rate heterogeneity among sites. Values at the nodes show the
number of times the clade defined by that node appeared in the 100 bootstrapped datasets.
Bootstrap values below 50 are not shown. The taxa included in the tree are the Thaumarchaeota
SAGs, cultures, and a few environmental sequences, with 5 soil and hot spring sequences as
outgroups. The epi- and mesopelagic clades are indicated by shading.
Figure S2. Maximum likelihood phylogenomic analysis of 55 Thaumarchaeota genomes.
The tree was constructed using the RAxML v7.3.0 software from a concatenated amino acid
sequence of 83 genes with 24,061 sites, with a data partition model determined by the
PartitionFinder software. Values at the nodes show the number of times the clade defined by that
node appeared in the 100 bootstrapped datasets. Two Crenarchaeota outgroup species are not
shown. Details of tree construction can be found in Supplemental Material. The epi- and
mesopelagic clades are indicated by shading. Single cell genomes from different water
masses/locations/depths are marked with different colors as identified in the legend inset.
Supplemental Table Legends
Table S1. Accession numbers and environmental characteristics of the 46 marine
Thaumarchaeota single-cell amplified genomes used in this study.
Table S2. COG annotations of the 97 proteins used for phylogenomic analysis.
Table S3. Examples of gene families distributed in the epi- and mesopelagic clades of
marine Thaumarchaeota.
References
Aberer, A.J., Krompass, D., and Stamatakis, A. (2013). Pruning rogue taxa
improves phylogenetic accuracy: an efficient algorithm and webservice. Syst Biol 62:
162-166.
Adey A, Morrison H, Asan, Xun X, Kitzman J, Turner E et al. (2010). Rapid,
low-input, low-bias construction of shotgun fragment libraries by high-density in vitro
transposition. Genome Biology 11: R119.
Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. (2009). trimAl: a
tool for automated alignment trimming in large-scale phylogenetic analyses.
Bioinformatics 25: 1972-1973.
Chitsaz H, Yee-Greenbaum JL, Tesler G, Lombardo M-J, Dupont CL, Badger JH
et al. (2011). Efficient de novo assembly of single-cell bacterial genomes from short-read
data sets. Nat Biotechnol 29: 915–921.
Cleland D, Krader P, McCree C, Tang J, Emerson D (2004). Glycine betaine as a
cryoprotectant for prokaryotes. Journal of Microbiological Methods 58: 31-38.
Darriba, D., Taboada, G.L., Doallo, R., and Posada, D. (2011). ProtTest 3: fast
selection of best-fit models of protein evolution. Bioinformatics 27: 1164-1165.
Dean FB, Hosono S, Fang L, Wu X, Faruqi AF, Bray-Ward P et al. (2002).
Comprehensive human genome amplification using multiple displacement amplification.
Proceedings of the National Academy of Sciences of the United States of America 99:
5261-5266.
del Giorgio PA, Bird DF, Prairie YT, Planas D (1996). Flow cytometric
determination of bacterial abundance in lake plankton with the green nucleic acid stain
SYTO 13. Limnol Oceanogr 41: 783–789.
Drummond AJ, Ashton B, Buxton S, Cheung M, Cooper A, Duran C et al.
(2011). Geneious v5.4, Available from http://www.geneious.com/.
Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy
and high throughput. Nucleic Acids Res 32: 1792-1797.
Francis, C. A., K. J. Roberts, J. M. Beman, A. E. Santoro and B. B. Oakley
(2005). "Ubiquity and diversity of ammonia-oxidizing Archaea in water columns and
sediments of the ocean." Proceedings of the National Academy of Sciences of the USA
102(41): 14683-14688.
Katoh, K., Kuma, K.-i., Toh, H., and Miyata, T. (2005). MAFFT version 5:
improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33: 511-518.
Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G et al.
(2012). Hybrid error correction and de novo assembly of single-molecule sequencing
reads. Nat Biotechnol 30: 693–700.
Lane, D. J. 1991. 16S/23S rRNA sequencing. In E. Stackebrandt and M.
Goodfellow (ed.), Nucleic acid techniques in bacterial systematics. John Wiley,
Chichester, UK.
Lanfear, R., Calcott, B., Ho, S.Y.W., and Guindon, S. (2012). PartitionFinder:
combined selection of partitioning schemes and substitution models for phylogenetic
analyses. Mol Biol Evol 29: 1695-1701.
Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A.,
McWilliam, H. et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23:
2947-2948.
Lassmann, T., and Sonnhammer, E. (2005). Kalign - an accurate and fast multiple
sequence alignment algorithm. BMC Bioinformatics 6: 298.
Li H, Durbin R (2009). Fast and accurate short read alignment with Burrows–
Wheeler transform. Bioinformatics 25: 1754–1760.
Li, L., Stoeckert, C.J., and Roos, D.S. (2003). OrthoMCL: identification of
ortholog groups for eukaryotic genomes. Genome Res 13: 2178-2189.
Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y et al.
(2010). The integrated microbial genomes system: an expanding comparative analysis
resource. Nucleic Acids Res 38: D382–D390.
Martinez-Garcia M, Brazel DM, Swan BK, Arnosti C, Chain PSG, Reitenga KG
et al. (2012). Capturing single cell genomes of active polysaccharide degraders: An
unexpected contribution of Verrucomicrobia. PLoS ONE 7: e35314.
Morgenstern, B. (2004). DIALIGN: multiple DNA and protein sequence
alignment at BiBiServ. Nucleic Acids Res 32: W33-W36.
Notredame, C., Higgins, D., and Heringa, J. (2000). T-coffee: a novel method for
fast and accurate multiple sequence alignment. J Mol Biol 302: 205-217.
Raghunathan A, Ferguson HR, Jr., Bornarth CJ, Song W, Driscoll M, Lasken RS
(2005). Genomic DNA amplification from a single bacterium. Applied and
Environmental Microbiology 71: 3342-3347.
Ronquist, F., and Huelsenbeck, J.P. (2003). MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics 19: 1572-1574.
Schmieder R, Lim YW, Rohwer F, Edwards R (2010). TagCleaner: Identification
and removal of tag sequences from genomic and metagenomic datasets. BMC
Bioinformatics 11: 341.
Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:
2688-2690.
Stepanauskas R, Sieracki ME (2007). Matching phylogeny and metabolism in the
uncultured marine bacteria, one cell at a time. Proceedings of the National Academy of
Sciences 104: 9052-9057.
Swan BK, Martinez-Garcia M, Preston CM, Sczyrba A, Woyke T, Lamy D et al.
(2011). Potential for chemolithoautotrophy among ubiquitous bacteria lineages in the
dark ocean. Science 333: 1296–1300.
Wheeler, T.J., and Kececioglu, J.D. (2007). Multiple alignment by aligning
alignments. Bioinformatics 23: i559-i568.
Woyke T, Sczyrba A, Lee J, Rinke C, Tighe D, Clingenpeel S et al. (2011).
Decontamination of MDA reagents for single cell whole genome amplification. PLoS
ONE 6: e26161.
Wuchter, C., B. Abbas, M. J. L. Coolen, L. Herfort, J. van Bleijswijk, P.
Timmers, M. Strous, E. Teira, G. J. Herndl, J. J. Middelburg, S. Schouten and J. S.
Sinninghe Damste (2006). "Archaeal nitrification in the ocean." Proceedings of the
National Academy of Sciences of the USA 103(33): 12317-12322.
Zhang T, Fang H (2006). Applications of real-time polymerase chain reaction for
quantification of microorganisms in environmental samples. Applied Microbiology and
Biotechnology 70: 281-289.
[Figure S1, Luo et al.: 16S rRNA gene tree image not reproduced. Scale bar: 0.04. Clade labels: Mesopelagic clade; Epipelagic clade.]
[Figure S2, Luo et al.: phylogenomic tree image not reproduced. Scale bar: 0.1. Clade labels: Mesopelagic clade; Epipelagic clade. Color legend: 770 m, South Atlantic Gyre; 800 m, North Pacific Gyre; 400 m, Antarctic Circumpolar Deep Water; 80 m, Antarctic Winter Water; Surface, Gulf of Maine.]
Table S1. Characteristics of 46 marine Thaumarchaeota single-cell amplified genomes.
SAG ID (SCGC-)  IMG Taxon ID  Location¹  Latitude       Longitude      Date       Depth (m)  Temperature (°C)  Salinity (PSU)  Assembly size² (Mbp)  No. contigs  No. genes
AB-661-I02      2524023096    WW         64°24.16′S     64°55.9′W      11Jan2011  80         -0.51             33.93           0.52                  70           685
AB-661-L21      2524023084    WW         64°24.16′S     64°55.9′W      11Jan2011  80         -0.51             33.93           0.51                  51           646
AB-661-M19      2524023095    WW         64°24.16′S     64°55.9′W      11Jan2011  80         -0.51             33.93           0.58                  57           740
AB-663-F14      2523533633    CDW        64°24.16′S     64°55.9′W      11Jan2011  400        1.4               34.64           0.57                  107          776
AB-663-G14      2524023085    CDW        64°24.16′S     64°55.9′W      11Jan2011  400        1.4               34.64           0.58                  78           755
AB-663-N18      2524023093    CDW        64°24.16′S     64°55.9′W      11Jan2011  400        1.4               34.64           0.36                  53           450
AB-663-O07      2524023092    CDW        64°24.16′S     64°55.9′W      11Jan2011  400        1.4               34.64           0.48                  37           598
AB-663-P07      2524023094    CDW        64°24.16′S     64°55.9′W      11Jan2011  400        1.4               34.64           0.82                  38           1010
AAA160-J20      2529292698    GOM        43°50′39.87″N  69°38′27.49″W  16Sep2009  1          22.3              30              0.56                  99           726
AAA007-O23      2527291500    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            1.13                  67           1393
AAA001-A19      2513237067    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.76                  223          1114
AAA007-N19      2513237068    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.89                  194          1295
AAA007-C21      2524023106    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.28                  185          451
AAA007-E15      2524023107    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.45                  244          704
AAA007-G17      2524023108    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.34                  189          581
AAA007-M20      2524023109    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.30                  253          503
AAA007-N23      2524023110    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.17                  123          298
AAA008-E02      2524023111    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.53                  283          838
AAA008-E15      2524023112    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.46                  236          706
AAA008-E17      2524023113    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.43                  257          730
AAA008-G03      2524023114    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.53                  236          852
AAA008-M21      2524023115    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.36                  197          606
AAA008-M23      2524023116    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.35                  171          558
AAA008-N07      2524023117    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.28                  163          489
AAA008-O05      2524023118    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.35                  150          557
AAA008-O18      2524023119    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.39                  234          653
AAA008-P02      2524023120    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.44                  195          735
AAA008-P23      2524023121    SA         12°29′41.4″S   4°59′55.2″W    01Dec2007  800        4.8               34.5            0.50                  230          801
AAA288-I14      2513237066    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            1.06                  153          1534
AAA288-J14      2513237065    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.68                  253          1073
AAA288-C17      2524023122    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.59                  269          953
AAA288-D03      2524023123    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.53                  335          880
AAA288-D22      2524023124    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.55                  234          859
AAA288-E09      2524023125    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.55                  300          887
AAA288-G05      2524023126    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.71                  242          1063
AAA288-K02      2524023127    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.66                  319          1053
AAA288-K05      2524023128    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.49                  279          803
AAA288-K20      2524023097    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.57                  288          888
AAA288-M04      2524023098    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.56                  327          917
AAA288-M23      2524023099    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.51                  320          847
AAA288-N15      2524023100    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.27                  170          500
AAA288-N23      2524023101    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.49                  248          768
AAA288-O17      2524023102    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.13                  127          249
AAA288-O22      2524023103    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.37                  178          621
AAA288-P02      2524023104    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.86                  307          1274
AAA288-P18      2524023105    NP         22°45′N        158°00′W       09Sep2009  770        4.7               34.3            0.68                  335          1091
¹WW (Antarctic Winter Water), CDW (Antarctic Circumpolar Deep Water), GOM (Gulf of Maine), SA (South Atlantic gyre), NP (North Pacific gyre).
²The cultured reference strain Nitrosopumilus maritimus SCM1 has a genome size of 1.65 Mbp.
Table S2. COG annotation of the 97 proteins used for phylogenomic analysis.
COG id Biological function
COG0541 Signal recognition particle GTPase
COG2511 Archaeal Glu-tRNAGln amidotransferase subunit E (contains GAD domain)
COG0252 L-asparaginase/archaeal Glu-tRNAGln amidotransferase subunit D
COG5257 Translation initiation factor 2, gamma subunit (eIF-2gamma; GTPase)
COG0152 Phosphoribosylaminoimidazolesuccinocarboxamide (SAICAR) synthase
COG1797 Cobyrinic acid a,c-diamide synthase
COG2138 Uncharacterized conserved protein
COG2082 Precorrin isomerase
COG0096 Ribosomal protein S8
COG0113 Delta-aminolevulinic acid dehydratase
COG1881 Phospholipid-binding protein
COG2109 ATP:corrinoid adenosyltransferase
COG0468 RecA/RadA recombinase
COG0126 3-phosphoglycerate kinase
COG0520 Selenocysteine lyase
COG1093 Translation initiation factor 2, alpha subunit (eIF-2alpha)
COG0615 Cytidylyltransferase
COG0097 Ribosomal protein L6P/L9E
COG0128 5-enolpyruvylshikimate-3-phosphate synthase
COG2260 Predicted Zn-ribbon RNA-binding protein
COG0093 Ribosomal protein L14
COG0396 ABC-type transport system involved in Fe-S cluster assembly, ATPase component
COG0090 Ribosomal protein L2
COG0092 Ribosomal protein S3
COG1471 Ribosomal protein S4E
COG0034 Glutamine phosphoribosylpyrophosphate amidotransferase
COG1339 Transcriptional regulator of a riboflavin/FAD biosynthetic operon
COG2090 Uncharacterized protein conserved in archaea
COG1675 Transcription initiation factor IIE, alpha subunit
COG0048 Ribosomal protein S12
COG2125 Ribosomal protein S6E (S10)
COG0225 Peptide methionine sulfoxide reductase
COG0504 CTP synthase (UTP-ammonia lyase)
COG0169 Shikimate 5-dehydrogenase
COG1303 Uncharacterized protein conserved in archaea
COG2139 Ribosomal protein L21E
COG1324 Uncharacterized protein involved in tolerance to divalent cations
COG2262 GTPases
COG0088 Ribosomal protein L4
COG0100 Ribosomal protein S11
COG0010 Arginase/agmatinase/formimionoglutamate hydrolase, arginase family
COG1958 Small nuclear ribonucleoprotein (snRNP) homolog
COG1646 Predicted phosphate-binding enzymes, TIM-barrel fold
COG0189 Glutathione synthase/Ribosomal protein S6 modification enzyme (glutaminyl transferase)
COG4830 Ribosomal protein S26
COG1547 Uncharacterized conserved protein
COG1258 Predicted pseudouridylate synthase
COG0265 Trypsin-like serine proteases, typically periplasmic, contain C-terminal PDZ domain
COG1412 Uncharacterized proteins of PilT N-term./Vapc superfamily
COG0199 Ribosomal protein S14
COG0287 Prephenate dehydrogenase
COG1798 Diphthamide biosynthesis methyltransferase
COG0094 Ribosomal protein L5
COG1903 Cobalamin biosynthesis protein CbiD
COG2890 Methylase of polypeptide chain release factors
COG2073 Cobalamin biosynthesis protein CbiG
COG1867 N2,N2-dimethylguanosine tRNA methyltransferase
COG2875 Precorrin-4 methylase
COG0667 Predicted oxidoreductases (related to aryl-alcohol dehydrogenases)
COG1491 Predicted RNA-binding protein
COG2429 Uncharacterized conserved protein
COG1985 Pyrimidine reductase, riboflavin biosynthesis
COG0358 DNA primase (bacterial type)
COG0057 Glyceraldehyde-3-phosphate dehydrogenase/erythrose-4-phosphate dehydrogenase
COG1024 Enoyl-CoA hydratase/carnithine racemase
COG1537 Predicted RNA-binding proteins
COG0671 Membrane-associated phospholipid phosphatase
COG4221 Short-chain alcohol dehydrogenase of unknown specificity
COG0087 Ribosomal protein L3
COG0186 Ribosomal protein S17
COG1460 Uncharacterized protein conserved in archaea
COG0091 Ribosomal protein L22
COG0644 Dehydrogenases (flavoproteins)
COG1587 Uroporphyrinogen-III synthase
COG0049 Ribosomal protein S7
COG0030 Dimethyladenosine transferase (rRNA methylation)
COG2241 Precorrin-6B methylase 1
COG0185 Ribosomal protein S19
COG0863 DNA modification methylase
COG1180 Pyruvate-formate lyase-activating enzyme
COG0195 Transcription elongation factor
COG3253 Uncharacterized conserved protein
COG1254 Acylphosphatases
COG1378 Predicted transcriptional regulators
COG1439 Predicted nucleic acid-binding protein, consists of a PIN domain and a Zn-ribbon module
COG1522 Transcriptional regulators
COG1964 Predicted Fe-S oxidoreductases
COG1599 Single-stranded DNA-binding replication protein A (RPA), large (70 kD) subunit and related ssDNA-binding proteins
COG1940 Transcriptional regulator/sugar kinase
COG0054 Riboflavin synthase beta-chain
COG0255 Ribosomal protein L29
COG2242 Precorrin-6B methylase 2
COG1409 Predicted phosphohydrolases
COG1382 Prefoldin, chaperonin cofactor
COG1703 Putative periplasmic protein kinase ArgK and related GTPases of G3E family
COG3185 4-hydroxyphenylpyruvate dioxygenase and related hemolysins
COG0805 Sec-independent protein secretion pathway component TatC
Table S3. Examples of gene families distributed in the epi- and mesopelagic clades of marine
Thaumarchaeota.
Family id¹  Gene                                                          Epipelagic SAGs (N=4)  Nmar² (N=1)  Mesopelagic SAGs (N=42)
OR3550      urease subunit alpha                                          0                      0            2
OR3005      urease subunit beta                                           0                      0            3
OR3010      urease accessory protein UreD                                 0                      0            3
OR2375      urease accessory protein                                      0                      0            7
OR2177      urease accessory protein                                      0                      0            8
OR2326      urea amidohydrolase subunit gamma                             0                      0            8
OR2294      urease accessory protein                                      0                      0            8
OR2270      urease accessory protein UreD                                 0                      0            9
OR1967      urea active transporter                                       0                      0            10
OR2098      urease subunit alpha                                          0                      0            10
OR3671      Deoxyribodipyrimidine photolyase                              1                      0            0
OR3847      twin-arginine translocation pathway signal                    2                      0            0
OR2460      universal stress protein                                      3                      1            0
OR3848      catalase/peroxidase HPI                                       2                      0            0
OR2257      ammonia monooxygenase, subunit A                              2                      1            6
OR2022      ammonium transporter                                          2                      1            8
OR1861      superoxide dismutase                                          3                      1            11
OR2115      ammonia monooxygenase operon-associated hypothetical protein  2                      1            9
OR2077      ammonia monooxygenase subunit B                               2                      1            10
OR1607      blue (type1) copper domain-containing protein                 2                      1            17
OR1653      multicopper oxidase type 3                                    2                      1            17
OR1554      ammonia monooxygenase/methane monooxygenase subunit C         2                      1            19
OR1042      DSBA oxidoreductase                                           3                      1            27
OR2116      peroxiredoxin                                                 2                      1            9
OR1393      peroxiredoxin                                                 1                      1            22
OR1993      peroxiredoxin                                                 0                      1            13
OR1659      peroxiredoxin                                                 0                      1            18
OR2405      peroxiredoxin                                                 0                      0            5
OR1050      blue (type1) copper domain-containing protein                 3                      1            30
OR1688      blue (type1) copper domain-containing protein                 1                      1            18
OR1569      blue (type1) copper domain-containing protein                 0                      1            18
¹These are orthologous gene families identified by the OrthoMCL software; the family id is
arbitrary.
²Nmar: Nitrosopumilus maritimus SCM1, the only marine Thaumarchaeota strain in pure culture.