Plastid Genome Sequence of the Cryptophyte Alga Rhodomonas salina CCMP1319: Lateral Transfer of Putative DNA Replication Machinery and a Test of Chromist Plastid Phylogeny

Hameed Khan,* Natalie Parks,* Catherine Kozera,† Bruce A. Curtis,† Byron J. Parsons,† Sharen Bowman,† and John M. Archibald*

*Genome Atlantic and the Canadian Institute for Advanced Research, Program in Evolutionary Biology, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada; and †The Atlantic Genome Centre, Halifax, Nova Scotia, Canada

Cryptophytes are a group of unicellular algae with chlorophyll c–containing plastids derived from the uptake of a secondary (i.e., eukaryotic) endosymbiont. Biochemical and molecular data indicate that cryptophyte plastids are derived from red algae, yet the question of whether or not cryptophytes acquired their red algal plastids independent of those in heterokont, haptophyte, and dinoflagellate algae is of long-standing debate. To better understand the origin and evolution of the cryptophyte plastid, we have sequenced the plastid genome of Rhodomonas salina CCMP1319: at 135,854 bp, it is the largest secondary plastid genome characterized thus far. It also possesses interesting features not seen in the distantly related cryptophyte Guillardia theta or in other red secondary plastids, including pseudogenes, introns, and a bacterial-derived gene for the tau/gamma subunit of DNA polymerase III (dnaX), the first time putative DNA replication machinery has been found encoded in any plastid genome. Phylogenetic analyses indicate that dnaX was acquired by lateral gene transfer (LGT) in an ancestor of Rhodomonas, most likely from a firmicute bacterium. A phylogenomic survey revealed no additional cases of LGT, beyond a noncyanobacterial type rpl36 gene similar to that recently characterized in other cryptophytes and haptophytes. Rigorous concatenated analysis of 45 proteins encoded in 15 complete plastid genomes produced trees in which the heterokont, haptophyte, and cryptophyte (i.e., chromist) plastids were monophyletic, and heterokonts and haptophytes were each other’s closest relatives. However, statistical support for chromist monophyly disappears when amino acids are recoded according to their chemical properties in order to minimize the impact of composition bias, and a significant fraction of the concatenate appears consistent with a sister-group relationship between cryptophyte and haptophyte plastids.

Introduction

Plastids (chloroplasts) evolved from free-living cyanobacteria within the confines of a nonphotosynthetic host eukaryote (Gray and Doolittle 1982). The “primary” endosymbiotic origin of plastids is widely believed to have occurred only once (Moreira, Le Guyader, and Philippe 2000; Palmer 2003; Keeling 2004; Rodriguez-Ezpeleta et al. 2005), and 3 modern-day eukaryotic lineages, namely red algae, green algae (including land plants), and glaucophytes, harbor plastids stemming directly from this landmark event (Palmer 2003; Keeling 2004). More recently, the plastids of red and green algae have spread laterally across the eukaryotic tree via “secondary endosymbiosis,” i.e., the uptake of a photosynthetic eukaryote by an unrelated nonphotosynthetic host (Delwiche 1999; Archibald and Keeling 2002, 2005; Palmer 2003; Keeling 2004). Unlike primary plastids, which are surrounded by 2 membranes, secondary plastids are characterized by the presence of 3 or 4 bounding membranes, a feature that complicates the import of nucleus-encoded, plastid-targeted proteins in secondary plastid-containing organisms (Cavalier-Smith 1999; McFadden 1999; Soll and Schleiff 2004). Some of the most abundant and ecologically significant eukaryotic phototrophs on Earth acquired their plastids secondarily, yet many of the details surrounding the pattern and process of secondary endosymbiosis remain unclear.

Determining the number of endosymbioses that gave rise to the known spectrum of red algal–derived plastids has been particularly challenging, in large part due to the tremendous morphological, biochemical, and molecular diversity exhibited by the organisms bearing them. These include heterokonts (e.g., diatoms and kelp), haptophytes, dinoflagellates, and (probably) apicomplexan parasites, the latter group containing a highly derived nonphotosynthetic organelle whose precise origin has been difficult to discern (reviewed by Delwiche 1999; Archibald and Keeling 2002, 2005; Palmer 2003; Keeling 2004). The cryptophytes are a ubiquitous group of flagellated unicells that also possess secondary plastids of red algal origin. Together with the chlorarachniophytes, an unrelated group of algae with green algal secondary plastids (Ishida, Green, and Cavalier-Smith 1999; Keeling 2004; Gilson et al. 2006; Rogers et al. 2007), cryptophytes are unlike other secondary plastid–containing algae in that the primary endosymbiont nucleus (the “nucleomorph”) persists in the remnant cytosol of the engulfed algal cell between the inner and outer pairs of plastid membranes (Gilson, Maier, and McFadden 1997; Archibald 2007). Nucleomorph (Douglas et al. 2001) and plastid (Douglas and Penny 1999) genome sequence data from the model cryptophyte Guillardia theta convincingly show a red algal ancestry for the cryptophyte plastid, although the precise relationship between cryptophyte plastids and other red secondary plastids is not known.

Based on the unique membrane topology of their plastids and the shared presence of the photosynthetic pigment chlorophyll c2, Cavalier-Smith (1982, 1986) placed the cryptophytes, heterokonts, and haptophytes together in the kingdom Chromista, hypothesizing that photosynthesis evolved in these organisms only once as a result of a single endosymbiosis in their common ancestor. More specifically, cryptophytes have been argued to be the deepest branching of the 3 groups, with the haptophytes and heterokonts having diverged from one another more recently (Cavalier-Smith 2003). The so-called “chromalveolate” hypothesis goes 1 step further in postulating that the endosymbiosis that gave rise to the chromist plastid occurred even earlier, in a common ancestor these organisms shared with dinoflagellates and apicomplexans (Cavalier-Smith 1999, 2004). The main rationale behind this idea is to invoke the fewest number of secondary endosymbioses possible, given that each plastid acquisition requires extensive nucleus-to-nucleus gene transfers and the evolution of a protein targeting system (Cavalier-Smith 1999; McFadden 1999). However, inferring the minimum number of secondary endosymbioses has important implications for how we interpret the evolutionary history of a significant fraction of eukaryotic biodiversity. This is because many dinoflagellates and heterokonts are aplastidic and/or nonphotosynthetic, and the ciliates, which together with dinoflagellates and apicomplexans comprise the alveolates, are an entirely nonphotosynthetic lineage. If correct, the chromalveolate hypothesis demands that all dinoflagellates, heterokonts, and ciliates evolved from plastid-bearing ancestors (for discussion see Bachvaroff, Sanchez Puerta, and Delwiche 2005; Bodyl 2005). The recently sequenced macronuclear genome of the ciliate Tetrahymena thermophila provided no evidence for a photosynthetic ancestry in ciliates (Eisen et al. 2006), although plastid-associated genes were found in the genome of the aplastidic heterokont pathogen Phytophthora (Tyler et al. 2006), suggesting that outright organelle loss is possible.

Key words: cryptophytes, plastid, chromists, chromalveolates, secondary endosymbiosis.

Email: [email protected].

Mol. Biol. Evol. 24(8):1832–1842. 2007
doi:10.1093/molbev/msm101
Advance Access publication May 23, 2007

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Downloaded from http://mbe.oxfordjournals.org/ by guest on August 8, 2015

Early molecular data brought to bear on the origin(s) of chromist plastids failed to provide support for their common ancestry. Phylogenies of RuBisCO and plastid small subunit ribosomal DNA (SSU rDNA) placed cryptophytes, heterokonts, and haptophytes as independent lineages within red algae (e.g., Medlin et al. 1995; Daugbjerg and Andersen 1997; Oliveira and Bhattacharya 2000) and were interpreted as evidence that their plastids were acquired independently of one another. Early nuclear SSU rDNA phylogenies failed to unite the host components of these lineages (Bhattacharya, Helmchen, and Melkonian 1995), consistent with separate plastid origins. More recent analyses using larger concatenated datasets have provided evidence both for (Yoon et al. 2002; Rogers et al. 2007) and against (Martin et al. 2002) the hypothesis that chromist plastids are monophyletic. Analyses of endosymbiotic replacements involving nucleus-encoded, plastid-targeted proteins have provided strong evidence in favor of chromist and chromalveolate monophyly (Fast et al. 2001; Harper and Keeling 2003; Patron, Rogers, and Keeling 2004), although the significance of such replacements has been questioned (Bodyl 2005).

A consistent trend in molecular phylogenies that do recover chromist monophyly is the basal position of cryptophytes with respect to the other chlorophyll c-containing lineages (Yoon et al. 2002; Bachvaroff, Sanchez Puerta, and Delwiche 2005; Rogers et al. 2007). In contrast, a recent comprehensive survey of plastid genomes revealed an extremely rare lateral gene transfer (LGT) involving a noncyanobacterial rpl36 gene in the plastid genome of cryptophytes and haptophytes but not heterokonts, suggesting that the former 2 lineages could be each other’s closest relatives (Rice and Palmer 2006). On balance, the evolutionary history of chromist plastids is controversial, and attempts to resolve the issue have been hampered by the limited amount of genome sequence data available from diverse members of the 3 groups. Combined with the considerable evolutionary distance between chromist plastids and those of red algae, such limited taxon sampling has made phylogenetic inferences especially prone to artifact.

Here we present the completely sequenced plastid genome of the cryptophyte Rhodomonas salina CCMP1319 and analyze it in the context of 15 other genomes, including 5 other chromist plastids and 4 red algae. The R. salina genome is unique in containing a laterally transferred dnaX gene encoding the tau/gamma subunit of bacterial DNA polymerase III and, unlike previously sequenced red secondary plastids, contains pseudogenes and introns. Concatenated phylogenies inferred from a dataset of 45 plastid proteins and subsets thereof provide insight into the complex nature of the phylogenetic signal in support of a common ancestry of haptophyte and heterokont plastids to the apparent exclusion of cryptophytes.

Materials and Methods

Genome Sequencing, Assembly, and Annotation

Six fosmid library clones with inserts derived from the R. salina plastid genome were identified in a previous study (Khan et al. 2007). These were subcloned and sequenced to ~8× coverage using ET terminator chemistry and MegaBace capillary DNA sequencers as described therein. Based on synteny shared with the plastid genome of G. theta (Douglas and Penny 1999), exact match and degenerate primers were designed to genomic regions between psaB and rpl13, and long-range PCR was used to amplify the remaining regions of the R. salina genome. High-fidelity Taq polymerase was used, and PCR products were purified using the MinElute Gel Extraction Kit (Qiagen Sciences, Valencia, CA). PCR products were cloned using the Topo TA Cloning Kit (Invitrogen) according to the manufacturer’s protocol, and at least 3 independent clones were sequenced per amplicon. Plasmids were isolated using the Fastplasmid Mini Kit (Eppendorf AG, Hamburg, Germany), sequenced using the CEQ Dye Terminator Cycle Sequencing (DTCS) kit (Beckman Coulter, Inc., Fullerton, CA), and run on Beckman Coulter CEQ8000 capillary DNA sequencers. Sequences were assembled into contigs using Staden (Staden 1996). Protein genes were annotated using NCBI ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and BLASTX and BLASTN searches at NCBI (http://www.ncbi.nlm.nih.gov/). Ribosomal RNA genes were annotated via comparison to previously published rRNA sequences, and tRNA genes were identified using tRNAScan (http://www.genetics.wustl.edu/eddy/tRNAscan-SE/). The circular genome map was constructed using CIRDNA (http://bioweb.pasteur.fr/seqanal/interfaces/cirdna.html). The complete R. salina genome is available from GenBank (EF508371).
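The ORF-scanning step of annotation can be illustrated with a minimal Python sketch. This is not the NCBI ORF-finder algorithm: it assumes forward-strand scanning only, the three standard stop codons, and ATG or GTG as start codons (GTG is included only because GTG starts are reported for this genome). The function name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
# Illustrative ORF scan; names and thresholds are assumptions, not the authors' pipeline.
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # GTG is an assumed alternative start codon


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the start codon, end is exclusive and
    includes the stop codon, and min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # open an ORF at the first start codon seen
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning
    return orfs
```

A full annotation would also scan the reverse complement and confirm each candidate by similarity search, as was done here with BLASTX and BLASTN.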

dnaX Amplification, Cloning, and Sequencing

Cryptophyte cell cultures were obtained from the Provasoli-Guillard National Center for Culture of Marine Phytoplankton (CCMP) and the Roscoff Culture Collection (RCC) and grown under conditions described previously (Khan et al. 2007). Based on the R. salina dnaX sequence and the groEL and psb28 genes flanking it, degenerate PCR primers were designed to amplify dnaX-coding regions from additional Rhodomonas species (R. baltica RCC350, R. salina CCMP1170, and Rhodomonas sp. CCMP1178). PCR products were cloned and sequenced as described above.
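Degenerate primers of this kind are mixtures of related oligonucleotides, conventionally written with IUPAC ambiguity codes. The Python sketch below is illustrative only: the actual primer sequences are not given here, so the example primer and the function names `degeneracy` and `expand` are hypothetical. It counts and lists the individual sequences that a degenerate primer represents.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(p) for p in product(*choices)]


# Hypothetical 8-mer, not a primer from this study
example = "GGNATHAC"
print(degeneracy(example), expand(example)[:3])
```

Degeneracy grows multiplicatively with each ambiguous position, which is one reason such primers are usually anchored in conserved flanking genes.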

Phylogenetic Analyses

A concatenated dataset of 45 plastid-encoded protein sequences common among 15 algae and 2 cyanobacteria (Synechocystis sp. PCC6803 and Prochlorococcus marinus subsp. pastoris str. CCMP1986) was aligned using ClustalX (Chenna et al. 2003) and MacClade 4.06 (Maddison and Maddison 2003). The 45 genes included in the analysis were atpA, atpB, atpE, atpF, atpH, petA, petB, petD, petG, psaA, psaB, psaC, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbN, psbT, rpl2, rpl14, rpl16, rpl20, rpoA, rpoB, rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps18, rps19, ycf3, ycf4, and psbZ. PhyML 2.4 (Guindon and Gascuel 2003) and IQPNNI (Vinh le and Von Haeseler 2004) were used to perform maximum likelihood (ML) analysis on individual proteins and on the concatenated set of 9,081 amino acid positions using the cpREV, WAG, and JTT amino acid substitution matrices with a gamma distribution approximated by 4 or 8 categories to model site rate heterogeneity. Statistical support for individual branches was examined by bootstrapping (100 replicates). To reduce systematic errors associated with saturation and homoplasy, the fastest-evolving sites were determined using Tree-Puzzle version 5.2 (Strimmer and von Haeseler 1996). Trees were constructed from the concatenated alignment with sites corresponding to rate categories 6, 7, and 8 removed, recognizing that site rate assignments can vary slightly depending on the tree topology considered. Bayesian analyses were performed using PhyloBayes with the site-heterogeneous CAT model as described in Lartillot, Brinkmann, and Philippe (2007) using 4 gamma-distributed rate categories. Invariant sites were removed prior to analysis. We also used Tree-Puzzle to perform a site-by-site likelihood analysis of all 9,081 amino acid positions in the full dataset given 2 tree topologies, one in which cryptophytes and haptophytes were each other’s closest relatives, the other where haptophytes and heterokonts shared a common branch. Differences in log likelihood were plotted in order to assess support for the 2 trees. These and other trees were compared to one another using the site likelihoods calculated above and “approximately unbiased” (AU) tests of significance. AU tests were performed using CONSEL 0.1 (Shimodaira and Hasegawa 2001) with default scaling and replicate values. Protein sequences were “recoded” according to Hrdy et al. (2004). Hydrophobic (MILV) and aromatic (FYW) amino acids were combined into a single category, and cysteine was coded as “missing data.” This allowed use of PAUP (Swofford 2002) to perform ML analyses on the recoded alignment using the general-time-reversible (GTR) matrix with 4 rate categories. The frequency of recoded amino acids, as well as the proportion of invariable sites parameter, was estimated from the data.
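The recoding step can be sketched in Python as follows. Only two details come from the description above: hydrophobic (MILV) and aromatic (FYW) residues share one category, and cysteine is treated as missing data. The remaining three groups are assumed here to be the standard Dayhoff chemical classes (AGPST, DENQ, HKR), and the use of A, C, G, and T as state labels is likewise an assumption about how a 4-state alphabet might be presented to software that expects nucleotide data. The function name `recode` is hypothetical.

```python
# Assumed 4-state recoding; only the MILV+FYW merge and C -> missing come from the text.
GROUPS = {
    "A": "AGPST",    # small, neutral residues (assumed Dayhoff class)
    "C": "DENQ",     # acidic residues and their amides (assumed Dayhoff class)
    "G": "HKR",      # basic residues (assumed Dayhoff class)
    "T": "MILVFYW",  # hydrophobic plus aromatic, merged as described
}

# Map each amino acid to its state label; cysteine is deliberately absent.
RECODE = {aa: state for state, aas in GROUPS.items() for aa in aas}


def recode(protein):
    """Recode an aligned protein sequence into a 4-state alphabet."""
    out = []
    for aa in protein.upper():
        if aa == "-":
            out.append("-")  # keep alignment gaps as gaps
        else:
            out.append(RECODE.get(aa, "?"))  # cysteine and unknowns -> missing
    return "".join(out)


print(recode("MCK-A"))
```

Collapsing 20 amino acids into 4 chemically defined states discards information, but it lets a general-time-reversible model with estimated state frequencies absorb much of the compositional variation among lineages.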

The R. salina plastid-encoded dnaX protein was used as a query to identify and retrieve a diverse set of dnaX and replication factor C proteins from public databases. Sequences were aligned using MacClade 4.06 (Maddison and Maddison 2003), manually adjusted, and analyzed using PhyML and IQPNNI as described above. The dnaX-only alignment contained 72 sequences and 240 unambiguously aligned residues. AU tests were performed as described above.

Results and Discussion

Rhodomonas salina Plastid Genome Sequence

The complete R. salina plastid genome is 135,854 bp (fig. 1), the largest secondary plastid genome characterized thus far. The overall G+C content is ~34%, and like other chromist and red algal plastid genomes (Kowallik et al. 1995; Douglas and Penny 1999; Ohta et al. 2003; Puerta, Bachvaroff, and Delwiche 2005), the R. salina genome possesses highly similar inverted rDNA cistrons with a G+C content of ~50%. The R. salina genome is predicted to encode 183 genes, including rRNAs and 31 tRNA genes. The tRNA (M) CAU (not present in the cryptophyte G. theta) is found in R. salina, and tRNA (S) GGA is substituted with tRNA (S) GCU in G. theta. The ochre termination codon TAA is used in R. salina 79% of the time, with amber (TAG) and opal (TGA) codons being used 18% and 3% of the time, respectively. In 9 cases a valine start codon (GTG) is used rather than methionine (chlI, rbcS, rpl23, rpl24, rps8, rps13, ycf27, ORF99, and dnaB). Five instances of overlapping genes are found in R. salina, many of which are also found in other chromist plastid genomes. The psbD-psbC overlap found in R. salina exists in all sequenced chromist genomes, although the amount of overlap varies. Overlaps involving atpD-atpF and rpl4-rpl23 are common to heterokonts and cryptophytes, but not haptophytes. Single nucleotide overlaps between rpl16-rpl29 and orf142-orf146 are present in R. salina, the former otherwise being found only in G. theta.
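Summary statistics such as G+C content and stop codon usage are simple tallies over the genome sequence and its annotated coding sequences. The Python sketch below shows one way to compute them; it assumes the coding sequences are supplied as in-frame strings that end with their stop codon, and the function names are hypothetical. The toy sequences are made up and are not R. salina data.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def stop_codon_usage(cds_list):
    """Fraction of coding sequences ending in each standard stop codon."""
    stops = ("TAA", "TAG", "TGA")
    counts = Counter(cds[-3:].upper() for cds in cds_list)
    total = sum(counts[s] for s in stops)
    return {s: counts[s] / total for s in stops} if total else {}


# Toy input for illustration only
toy_cds = ["ATGAAATAA", "GTGCCCTAA", "ATGGGGTAG", "ATGTTTTGA"]
print(gc_content("".join(toy_cds)), stop_codon_usage(toy_cds))
```

Applied to the annotated GenBank record, tallies of this kind would be expected to reproduce the reported proportions, subject to how pseudogenes and hypothetical ORFs are counted.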

Two very short regions of the R. salina genome could not be sequenced despite repeated attempts using varied sequencing parameters and progressively smaller subcloned fragments as template. These areas presumably correspond to regions forming extensive secondary structure. Using restriction enzyme digestion, the sizes of the gaps were determined to be ~10 bp (between rps10-petF) and ~100 bp (between ftsH25-ycf33). Given their small size and the fact that all the genes present in G. theta (except for hypothetical ORFs) are accounted for in R. salina, it is highly unlikely that they harbor additional genes.

Overall, the R. salina plastid genome shows a high degree of synteny with the previously sequenced genome of the cryptophyte G. theta (Douglas and Penny 1999). There are 12 genes present in R. salina that are absent in G. theta, and 8 genes (mostly hypothetical ORFs) are present in G. theta and not in R. salina (supplementary table 1). HlpA, a histone-like protein previously found only in the G. theta plastid genome, is also in R. salina and the red algae Cyanidioschyzon merolae (Ohta et al. 2003) and Galdieria sulphuraria, but absent in haptophytes and heterokonts and nucleus-encoded in apicomplexans (Hall et al. 2002; Nierman et al. 2005; Pain et al. 2005). Additional genes present in cryptophytes but absent in heterokonts and/or haptophytes include cpeB, ilvB, ilvH, and infB (supplementary table 1). Almost all of the genes encoded in the genome of the haptophyte Emiliania huxleyi (Puerta, Bachvaroff, and Delwiche 2005) are present in the R. salina and G. theta genomes, while several genes present in heterokont plastids are not found in haptophytes and cryptophytes.

An intriguing feature of the R. salina plastid genome is the presence of 2 group II introns, neither of which is present in G. theta (Douglas and Penny 1999) or in any other chromist plastid (Oudot-Le Secq et al. 2007). A previous study identified a putative “twintron” in the groEL gene in another Rhodomonas species (Maier et al. 1995), although the R. salina groEL gene presented here lacks an intron. One of the R. salina introns resides within the photosystem gene psbN, while the second intron, located between ycf37 and ycf12 (fig. 1), appears to be degenerate (the evolution of the R. salina introns will be presented in detail elsewhere). Surprisingly, the R. salina plastid genome encodes remnants of 3 subunits of light-independent protochlorophyllide reductase (LIPOR; ΨchlL, ΨchlN, ΨchlB; fig. 1), an enzyme involved in the light-independent synthesis of chlorophyll (Shi and Shi 2006). These genes are not found in the G. theta plastid genome nor in any secondary plastids of red algal origin, but are present in the red alga Porphyra purpurea (Reith and Munholland 1995), cyanobacteria (Kaneko et al. 1996), glaucophytes (Stirewalt 1995), and numerous organisms belonging to the green algal lineage, including land plants (Martin et al. 1998, 2002) (supplementary table 1). A comparison of the salient features of the R. salina plastid genome with those of other chromists and red algae is presented in supplementary table 1.

Lateral Gene Transfer

The most unexpected finding in the R. salina plastid genome is the presence of a gene with strong similarity to dnaX, which encodes the tau/gamma components of bacterial DNA polymerase III (Blinkova et al. 1993; Dallmann and McHenry 1995). Although plastid DNA polymerases have been purified and characterized enzymatically (e.g., Gaikwad, Hop, and Mukherjee 2002), the process of plastid DNA replication is very poorly understood, and the enzymes directly involved are invariably nucleus-encoded. The R. salina gene represents the first instance of a putative DNA polymerase enzyme encoded in plastid DNA. Using degenerate PCR primers designed to the R. salina dnaX and to 2 genes flanking it (groEL and psb28; fig. 1), we successfully amplified dnaX from several additional Rhodomonas plastid genomes, including species CCMP1178, one of the deepest branching lineages in the Rhodomonas cluster (Lane et al. 2006).

FIG. 1. Plastid genome map of Rhodomonas salina CCMP1319. Genes present on the outside of the circle are transcribed clockwise. Annotated genes are colored according to the functional categories shown in the inset box.

Phylogenetic analysis reveals that the RhodomonasdnaX genes were acquired by LGT. Analysis of dnaX pro-teins together with their closest eukaryotic homologs, i.e.,components of replication factor C (Waga and Stillman1998), clearly indicates that Rhodomonas dnaX is derivedfrom the former and not the latter (fig. 2A). A more focusedanalysis of a larger set of dnaX homologs in isolation (fig.2B) failed to definitively identify the donor of the gene, al-though in most of our analyses the Rhodomonas sequencesbranch (with weak to moderate support) with a subset offirmicutes, i.e., the parasitic mycoplasmas and related or-ganisms. Phylogenies of the genes flanking dnaX clearly

place R. salina within the chromist/red algal clade (datanot shown), indicating that the LGT solely involved dnaX.Significantly, Rhodomonas dnaX appears distantly relatedto homologs in cyanobacteria, eliminating the (remote) pos-sibility that the gene is an ancestral plastid gene. It is nev-ertheless intriguing that with most phylogenetic methodsthe cyanobacterial dnaX homologs branch modestly withthe only other eukaryotic sequences in the dnaX cluster,the so-called ‘‘STICHEL’’ proteins of Oryza sativa andArabidopsis thaliana. These nucleus-encoded, apparentlyland plant-specific proteins are ;800 amino acids largerthan dnaX and are not believed to be involved in DNA rep-lication, but are instead thought to play a role in plant cellmorphogenesis (Ilgenfritz et al. 2003). Putative nuclear lo-calization signals suggest that STICHEL is targeted to thenucleus, and not plastid-localized (Ilgenfritz et al. 2003).AU tests of significance further support the idea that Rho-domonas dnaX is not specifically related to cyanobacterialdnaX or to plant STICHEL proteins; such tests reject alter-nate trees in which the Rhodomonas sequences are placedsister to cyanobacteria (P , 0.05) and STICHEL (P ,0.05), as well as next to the Chlorobiales/Bacteroidetesclade (P , 0.01). Interestingly, a variety of other positionsfor Rhodomonas dnaX within the noncyanobacterial/plantportion of the tree (e.g., alternate placements within and be-tween Firmicutes and Proteobacteria) are not significantly

FIG. 2.—Lateral gene transfer of dnaX in Rhodomonas. (A) PhyML phylogeny of dnaX proteins from 4 Rhodomonas species in the context ofa range of bacterial dnaX proteins rooted with eukaryotic replication factor C homologs. Cryptophyte plastid-encoded dnaX homologs are highlighted.(B) Phylogeny constructed from an alignment of 72 dnaX homologs covering the known breadth of bacterial diversity including cyanobacterialsequences. The STICHEL proteins found in Arabidopsis and Oryza are also present (see text). Bootstrap values are shown where .50%. Scale barsindicate inferred number of substitutions per amino acid site.

1836 Khan et al.

by guest on August 8, 2015

http://mbe.oxfordjournals.org/

Dow

nloaded from

worse than the tree shown in figure 2B. It is thus possible that the relationship between the Rhodomonas dnaX sequences and those of mycoplasmas is the result of long-branch attraction or a shared compositional bias. The role of dnaX in the Rhodomonas plastid is not known, but an obvious possibility is that it functions in DNA replication, perhaps interacting with or replacing some of the nucleus-encoded proteins that must be imported into the organelle. Experiments to better understand the distribution of dnaX in cryptophytes are currently underway.

LGT is increasingly recognized as a major factor in plant mitochondrial genome evolution (e.g., Bergthorsson et al. 2003, 2004; Davis and Wurdack 2004), yet LGTs involving plastid genes appear to be exceedingly rare. Indeed, a recent comprehensive survey of 204 plastid-encoded genes by Rice and Palmer (2006) revealed only 1 convincing case of LGT in addition to the previously characterized proteobacterial-derived large and small subunits of RuBisCO in red algal plastids and their derivatives (Delwiche and Palmer 1996). In this instance, a noncyanobacterial type rpl36 gene was identified in cryptophytes and haptophytes (Rice and Palmer 2006); the simplest interpretation is that the gene replaced the canonical cyanobacterial homolog in a common ancestor of cryptophytes and haptophytes after they diverged from heterokonts. The dnaX transfer presented here represents only the third plastid LGT to be documented and the second directly involving cryptophyte algae. The significance of this observation is not clear, although differences in the apparent frequency of mitochondrial and plastid LGTs have been linked to the presence/absence of DNA uptake systems (Rice and Palmer 2006). It is possible that cryptophyte plastids and, more generally, those of chromists and red algae are atypical in this regard. While single-gene phylogenies of all analyzable R. salina plastid proteins revealed no additional convincing cases of LGT beyond dnaX and rpl36 (data not shown), the presence of multiple apparently unrelated mobile genetic elements in different genomic contexts in the plastids of Rhodomonas species (Archibald Lab, unpublished) is consistent with the notion of enhanced DNA uptake in cryptophyte plastids.

Concatenated Protein Phylogenies and Chromist Plastid Evolution

While phylogenetic studies have clearly shown that cryptophyte, heterokont, and haptophyte plastids are derived from red algae (reviewed by Bhattacharya, Yoon, and Hackett 2003; Palmer 2003; Keeling 2004; Archibald and Keeling 2005), single- and multi-gene analyses of plastid proteins have produced conflicting results regarding their origin(s). With complete plastid genome sequences now available from 2 cryptophytes (this study; Douglas and Penny 1999), 3 heterokonts (Kowallik et al. 1995; Oudot-Le Secq et al. 2007), a haptophyte (Puerta, Bachvaroff, and Delwiche 2005) and 4 red algae (Reith and Munholland 1995; Glockner, Rosenthal, and Valentin 2000; Ohta et al. 2003; Hagopian et al. 2004), we assembled a concatenate of 45 proteins encoded in 15 plastid genomes as well as homologs from 2 cyanobacteria (9,081 amino acids in total with no missing data). This dataset (and derivatives thereof) was subjected to a battery of phylogenetic analyses aimed at rigorously testing relationships within and between chromists and red algae. Due to the limited coding capacity of their plastid genomes (see Bachvaroff, Sanchez Puerta, and Delwiche 2005 and references therein), sequences from chlorophyll-c-containing dinoflagellates were not included.
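As an illustration of what assembling a concatenate with no missing data involves, the following minimal Python sketch joins per-protein alignments into a single supermatrix and refuses to proceed if any taxon is absent from any protein. The function name, the dictionary-based alignment format, and the toy sequences are assumptions made for this example; they are not taken from the study.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-protein alignments (taxon -> aligned sequence) into one supermatrix."""
    taxa = set(alignments[0])
    for i, aln in enumerate(alignments):
        # Every protein must be present for every taxon: no missing data.
        if set(aln) != taxa:
            raise ValueError(f"alignment {i} has a different taxon set")
        # Rows of one alignment must all have the same number of columns.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"alignment {i} has rows of unequal length")
    return {t: "".join(aln[t] for aln in alignments) for t in sorted(taxa)}


# Toy data (hypothetical sequences, for illustration only).
protein_1 = {"Rhodomonas": "MTA", "Guillardia": "MTA", "Emiliania": "MSA"}
protein_2 = {"Rhodomonas": "KDYR", "Guillardia": "KDYK", "Emiliania": "KEYR"}
supermatrix = concatenate_alignments([protein_1, protein_2])
print(supermatrix["Rhodomonas"])  # MTAKDYR
```

Checking the taxon set up front is one simple way to guarantee that every row of the concatenate has the same length, which is what "no missing data" amounts to here.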

The 45-protein dataset was first analyzed using PhyML (Guindon and Gascuel 2003) and IQPNNI (Vinh le and Von Haeseler 2004), 2 methods that explore tree space using maximum likelihood (ML), as well as PhyloBayes, a method employing a site-heterogeneous model to account for position-specific characteristics of protein evolution (Lartillot and Philippe 2004; Lartillot, Brinkmann, and Philippe 2007). All 3 methods recovered a monophyletic chromist assemblage with high support (fig. 3A, Supplementary figs. S1A–C). Heterokonts and haptophytes consistently branched as each other’s closest relatives, as has been observed in recent analyses (Yoon et al. 2002; Rogers et al. 2007), and the chromists branched with the Gracilaria/Porphyra clade of red algae to the exclusion of the cyanidiales (fig. 3A). More generally, the position of the glaucophyte Cyanophora paradoxa relative to green algae/land plants and red algae/chromists varied depending on the method used, branching weakly with green plastids in the PhyloBayes tree. Removal of the fastest-evolving sites (Materials and Methods) had no effect on the resulting tree topologies, although statistical support for chromist monophyly decreased from 93% to 77% with PhyML (fig. 3A). Significantly, the improved chromist taxon sampling used here produced trees largely unaffected by the inclusion of proteins involved in transcription and translation in the full concatenate, unlike previous results (Martin et al. 1998; Hagopian et al. 2004). Anomalous results were, however, obtained in analyses of transcription and translation proteins on their own. For example, PhyloBayes analysis of a 16-protein transcription/translation-only dataset resulted in a topology in which cryptophytes and the haptophyte Emiliania huxleyi branched as sister taxa with a posterior probability of 0.97 (Supplementary fig. S1N; see below). Exclusion of the cyanidiales, which represent long branches in our trees and have been shown to be problematic in concatenated analyses of nuclear genes (Rodriguez-Ezpeleta et al. 2005), did not impact the relative branching order of the 3 chromist groups or support for chromist monophyly (Supplementary figs. S1Q–S1T).
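The removal of fast-evolving sites can be pictured as a column filter on the alignment. The sketch below assumes that each site has already been assigned to a discrete rate category by some rate-estimation step, and that the categories treated as "fast" are supplied by the user. The choice of categories 7 and 8 in the toy call is an assumption for illustration; the exact procedure is described in Materials and Methods and is not reproduced here.

```python
from typing import Dict, Iterable, List


def drop_fast_sites(
    alignment: Dict[str, str],
    site_categories: List[int],
    fast_categories: Iterable[int],
) -> Dict[str, str]:
    """Return a copy of the alignment without columns in the given rate categories."""
    fast = set(fast_categories)
    for taxon, seq in alignment.items():
        if len(seq) != len(site_categories):
            raise ValueError(f"{taxon}: sequence length does not match category list")
    # Indices of the columns to retain.
    keep = [i for i, cat in enumerate(site_categories) if cat not in fast]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


# Toy alignment with six columns and hypothetical rate categories (1 = slowest).
toy_alignment = {"taxonA": "MKTAYL", "taxonB": "MKSAFL"}
toy_categories = [1, 3, 8, 2, 7, 5]
filtered = drop_fast_sites(toy_alignment, toy_categories, fast_categories={7, 8})
print(filtered["taxonA"])  # MKAL
```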

We next tested the impact of protein composition bias on our concatenated phylogenies by recoding the amino acids into 4 groups with biochemically similar properties as done by Hrdy et al. (2004), coding cysteine as missing data and combining hydrophobic (MILV) and aromatic (FYW) amino acids as a single category (Rodriguez-Ezpeleta et al. 2007). Analysis of the 45-protein recoded dataset produced strikingly different results (fig. 3B) from those described above. While the chromists and red algae still branched together, support for chromist monophyly disintegrated. Cryptophytes and haptophytes branched weakly as sister groups, with the heterokonts branching outside a cluster of these taxa together with Porphyra and Gracilaria. When the fastest-evolving sites were removed from the recoded dataset, chromist monophyly was once again recovered, although support was still weak (Supplementary fig. S1H). A consistent observation in both the standard and recoded analyses was the long branches observed in the heterokont sequences as well as those of the cyanidiales.
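A recoding of this kind can be sketched in a few lines of Python. The text specifies only that cysteine is coded as missing data and that the hydrophobic (MILV) and aromatic (FYW) residues form a single category; the other three groups used below (AGPST, DENQ, HKR) are the familiar Dayhoff-style classes and are an assumption of this example, as are the digit labels and the function name.

```python
# Four recoding groups. Only the merged MILV+FYW group and the treatment of
# cysteine come from the text; the remaining three groups are assumed.
GROUPS = {
    "MILVFYW": "0",  # hydrophobic + aromatic, combined as in the text
    "AGPST": "1",    # assumed: small residues
    "DENQ": "2",     # assumed: acidic residues and their amides
    "HKR": "3",      # assumed: basic residues
}
RECODE = {aa: code for residues, code in GROUPS.items() for aa in residues}
RECODE["C"] = "?"  # cysteine coded as missing data


def recode(sequence: str) -> str:
    """Recode an aligned protein sequence; gaps are kept, unknowns become '?'."""
    return "".join(
        RECODE.get(aa, "-" if aa == "-" else "?") for aa in sequence.upper()
    )


print(recode("MCKD-A"))  # 0?32-1
```

Collapsing 20 states into 4 discards information, which may be one reason support values drop after recoding, but it also removes much of the signal that differs between taxa only because of amino acid composition.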

The dramatic differences seen in analyses performed with and without amino acid recoding led us to explore the impact of protein composition bias further. Specifically, we calculated GARP/FYMINK ratios (Foster and Hickey 1999) for each protein and each taxon and assessed the between-taxon variation for each. The results indicate that a subset of the proteins in our concatenate exhibit significant variation in GARP/FYMINK ratios, with the cyanidiales, heterokonts, and single haptophyte being the outliers in most cases (Supplementary fig. S2A). Photosynthesis- and nonphotosynthesis-related proteins were distributed more or less randomly when proteins were ranked from lowest to highest in terms of standard deviation (SD). We performed analyses on subsets of the concatenate: 9 proteins with SD > 0.4, the remaining 36 proteins, and 21 proteins with SD < 0.2 (Supplementary figs. S3A–S3L). Interestingly, the proteins with the most inter-taxon variation in GARP/FYMINK ratio usually produced topologies incongruent with the full concatenate, often with cryptophytes and haptophytes as sister taxa and with the heterokonts branching elsewhere in the tree (e.g., Supplementary fig. S2B). PhyloBayes was the only method that still recovered chromist monophyly when the most compositionally varied proteins were analyzed in isolation (Supplementary fig. S3B). This is consistent with the fact that the method appears resistant to systematic artifacts such as long-branch attraction (Lartillot, Brinkmann, and Philippe 2007).
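The GARP/FYMINK ratio is the number of G, A, R, and P residues divided by the number of F, Y, M, I, N, and K residues in a sequence. A minimal Python sketch of the per-taxon ratio and its between-taxon SD for one protein is given below. Whether the study used the sample or the population SD is not stated, so the use of the sample SD here is an assumption, as are the function names and toy sequences.

```python
from statistics import stdev
from typing import Dict


def garp_fymink_ratio(sequence: str) -> float:
    """Count of G, A, R, P residues divided by count of F, Y, M, I, N, K residues."""
    seq = sequence.upper()
    garp = sum(seq.count(aa) for aa in "GARP")
    fymink = sum(seq.count(aa) for aa in "FYMINK")
    if fymink == 0:
        raise ValueError("sequence contains no FYMINK residues")
    return garp / fymink


def between_taxon_sd(protein_alignment: Dict[str, str]) -> float:
    """Sample SD of the GARP/FYMINK ratio across taxa for one protein (assumed estimator)."""
    return stdev(garp_fymink_ratio(seq) for seq in protein_alignment.values())


# Toy protein alignment (hypothetical sequences; gap characters are simply not counted).
toy_protein = {"taxonA": "GARPFY", "taxonB": "GAFYMI"}
print(garp_fymink_ratio(toy_protein["taxonA"]))  # 2.0
print(garp_fymink_ratio(toy_protein["taxonB"]))  # 0.5
sd = between_taxon_sd(toy_protein)
```

Proteins could then be ranked by this SD and binned using the thresholds given in the text (SD > 0.4 and SD < 0.2).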

Rice and Palmer (2006) recently described the presence of an LGT-derived noncyanobacterial-type rpl36 gene in the plastids of cryptophytes and haptophytes, which was interpreted as evidence for the sisterhood of these 2 groups to the exclusion of heterokonts and alveolates. This relationship is inconsistent with the bulk of the plastid phylogenies presented in this study and elsewhere (Yoon et al. 2002; Hagopian et al. 2004; Rogers et al. 2007). Nevertheless, the sisterhood of cryptophytes and haptophytes is observed in some of our analyses (e.g., fig. 3B, Supplementary fig. S1N) and recent phylogenies of nucleus-encoded proteins have provided support for a specific relationship between these 2 groups (Harper, Waanders, and Keeling 2005; Hackett et al. 2007; Patron, Inagaki, and Keeling 2007). We used AU tests to further assess the relative branching order within and between chromists and red algae using the PhyloBayes tree shown in figure 3A as a reference. From a set of 315 alternate trees, only 3 topologies were not rejected at P < 0.05. Significantly, these include the tree shown in figure 3B and 2 trees in which the red algae G. tenuistipitata and P. purpurea were placed as sister to the cryptophytes and the haptophytes and heterokonts were monophyletic. Our genome-wide single-gene/protein analyses of the R. salina genome revealed that, of the 103 proteins analyzed, a full 38% produced phylogenies in which cryptophytes and haptophytes were sister taxa, consistent with the results of Rice and Palmer (2006). Statistical support for this relationship was highly variable (<20%–100%; average = 48%), as expected from single-gene phylogenies inferred from anciently diverged sequences.

In an effort to shed light on the apparently contradictory scenarios for the evolutionary relationship amongst the 3 chromist lineages, we performed a site-by-site likelihood analysis of all 9,081 amino acids present in the 45-protein concatenate under 2 different topologies, one in which haptophytes and heterokonts were each other’s closest relatives (i.e., the topology shown in fig. 3A), the other where cryptophytes and haptophytes branch together, as predicted by the rpl36 LGT (Rice and Palmer 2006). The differences in log likelihood (ΔlnL) for each site were then plotted and ordered according to their inferred evolutionary rate (fig. 4). In total, 5,413 sites support a haptophyte-heterokont

FIG. 3. Phylogeny of chromist plastids. (A) PhyloBayes tree constructed from a concatenate of 45 proteins (9,081 amino acids). PhyloBayes (PB) posterior probabilities are provided for all nodes, while PB and PhyML (PML) support is provided for significant nodes for all sites and with the fastest-evolving sites removed. (B) Protein maximum likelihood tree constructed using the full 45-protein dataset with amino acids recoded according to their biochemical characteristics (see text). Bootstrap values are provided. Scale bars indicate inferred number of substitutions per amino acid site.

relationship, while 3,668 sites favor a cryptophyte-haptophyte relationship (fig. 4; the majority of the sites in the alignment [3,370] were invariant with very small ΔlnL values and a negligible impact on support for one topology over the other). When considering sites belonging to site rate categories 2–8, 1,994 sites support a haptophyte-cryptophyte relationship, while 3,718 sites support a haptophyte-heterokont relationship. This result is consistent with the full-dataset concatenated phylogenies (fig. 3A, Supplementary figs. S1 and S2) that support a haptophyte-heterokont relationship. The only 2 groups of amino acid sites that support a haptophyte-cryptophyte relationship over heterokonts-haptophytes fall into site rate categories 7 and 8, i.e., the fast-evolving sites. As expected, removal of the fast-evolving sites from the full-concatenated dataset resulted in increased bootstrap support for the monophyly of haptophytes and heterokonts (Supplementary fig. S1), but decreased support for the node uniting chromists. The majority of the sites that support a haptophyte-heterokont relationship in site rate categories 2–8 have ΔlnL values between 0 and −0.1 (3,063 sites), the same being the case for the haptophyte+cryptophyte relationship (1,342 sites between 0 and +0.1). Unexpectedly, the total number of sites with more extreme ΔlnL values (i.e., > +0.1 or < −0.1) in support of the 2 topologies is approximately equal: 652 sites for haptophytes-cryptophytes, 655 for heterokonts-haptophytes (fig. 4). This indicates that the high statistical support for the haptophyte-heterokont relationship seen in the concatenated phylogenies is the result of the combined signal from a proportionately large number of amino acid positions (3,063) that do not strongly discriminate between the 2 topologies. The total number of sites that strongly support haptophytes+heterokonts versus cryptophytes+haptophytes is very similar.
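The tallies above can be reproduced in outline with a short Python sketch that bins per-site ΔlnL values by sign and magnitude. The sign convention used here (positive values favor haptophytes+cryptophytes, negative values favor haptophytes+heterokonts) is inferred from the ranges quoted in the text; the function name, labels, and toy values are assumptions for illustration. The final lines simply check that the reported weak and strong counts add up to the reported totals for site rate categories 2–8.

```python
from collections import Counter
from typing import Iterable


def tally_site_support(delta_lnl: Iterable[float], threshold: float = 0.1) -> Counter:
    """Bin per-site log-likelihood differences by favored topology and strength."""
    counts: Counter = Counter()
    for d in delta_lnl:
        if d == 0:
            counts[("neither", "none")] += 1
            continue
        # Assumed sign convention: positive favors haptophytes+cryptophytes.
        topology = "hapto+crypto" if d > 0 else "hapto+hetero"
        strength = "strong" if abs(d) > threshold else "weak"
        counts[(topology, strength)] += 1
    return counts


# Toy per-site values (hypothetical, for illustration only).
toy_counts = tally_site_support([0.05, -0.02, 0.30, -0.50, -0.08, 0.0])

# Counts reported in the text for site rate categories 2-8.
reported = {
    ("hapto+hetero", "weak"): 3063,
    ("hapto+hetero", "strong"): 655,
    ("hapto+crypto", "weak"): 1342,
    ("hapto+crypto", "strong"): 652,
}
hetero_total = sum(n for (topo, _), n in reported.items() if topo == "hapto+hetero")
crypto_total = sum(n for (topo, _), n in reported.items() if topo == "hapto+crypto")
print(hetero_total, crypto_total)  # 3718 1994
```

The near-equal strong counts (655 versus 652) against very unequal weak counts (3,063 versus 1,342) make the point of the paragraph concrete: the preference for haptophytes+heterokonts rests on many weakly informative sites.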

As discussed above, 38% of the individual proteins analyzed in this study showed cryptophytes and haptophytes as each other’s closest relatives. In order to identify individual proteins in support of the alternative topologies shown in figure 4, we performed a likelihood analysis of each of the 45 proteins analyzed in the full dataset in isolation. Consistent with the results described above, and a similar analysis performed by Rice and Palmer (2006), 28 out of 45 proteins (62%) supported a haptophyte-heterokont relationship (fig. S4). Nine genes/proteins

FIG. 4. Site-by-site maximum likelihood analysis of all 9,081 residues in the 45-protein dataset given 2 competing tree topologies. The trees differ only in the relative branching order of cryptophytes, haptophytes and heterokonts. Differences in site log likelihoods given the 2 trees (ΔlnL) are plotted and ranked according to the topology they support and their inferred rate of evolution. Only 3 sites in the alignment fell into site rate category 1 and possessed very small ΔlnL values: they are therefore included in the set of invariant sites in this graph.

(20%) possessed ΔlnL values in the range of +0.033 to −0.05 (atpF, petG, rpl20, rps18, psbN, psbF, psbK, psbL, rps11), indicating that they support neither topology very strongly. With the exception of rpl20, all of the above-mentioned proteins have relatively high GARP/FYMINK ratios (Supplementary fig. S2A), and most are less than 100 amino acids in size. On the one extreme, the photosystem genes psbB, psaA, and petD strongly support haptophytes+heterokonts, as was shown to be the case previously (Yoon et al. 2002). On the other, rps14, rpoB, and atpB strongly favor cryptophytes+haptophytes.

Conclusion

We have completely sequenced the plastid genome of the cryptophyte alga Rhodomonas salina and identified the first known case of a gene for a putative DNA polymerase subunit (dnaX) encoded in plastid DNA. Phylogenies constructed from a large concatenated set of plastid proteins provide support for the monophyly of chromist plastids and consistently favor a relationship between heterokonts and haptophytes to the exclusion of cryptophytes. However, we have also shown that support for chromist plastid monophyly and heterokont-haptophyte sisterhood vanishes when the data are recoded to minimize the impact of amino acid composition bias, leaving open the possibility that an as yet undetermined systematic bias is responsible for the strongly supported trees presented herein. We can thus neither confirm nor refute a specific relationship between the plastids of cryptophytes and haptophytes as suggested by the shared presence of an LGT-derived rpl36 gene (Rice and Palmer 2006) and recent large-scale analyses of nuclear genes (Hackett et al. 2007; Patron, Inagaki, and Keeling 2007). Additional plastid genome sequences from diverse haptophytes, heterokonts, and cryptophytes will be required to further break up the long terminal branches that presently characterize chromist plastid phylogenies. It is also important to consider that sequences from the fourth chlorophyll c–containing algal lineage, the dinoflagellates, were not included in our analysis, and complete plastid genomes are currently only available from a small fraction of the known diversity of red algae (Yoon et al. 2006). The impact of these missing data on the relative branching order of the 3 chromist lineages and, more generally, support for chromist monophyly is presently unclear. Together with trees inferred from concatenated nuclear gene sequences from chromists, dinoflagellates, and apicomplexans, analysis of a further expanded plastid protein dataset should make it possible to assess the congruence between the host cell and plastid components of these complex organisms and definitively test the hypothesis of a single endosymbiotic origin of plastids in chromist and chromalveolate taxa (Cavalier-Smith 1999).

Supplementary Material

Supplementary figures and table are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We are grateful to Marie-Pierre Oudot-Le Secq and Beverly R. Green for providing access to the T. pseudonana and P. tricornutum plastid genomes before public release and to Jessica Leigh, Andrew Roger, Susan Douglas, and Gabino Sanchez Perez for help with analyses and for comments on the manuscript. Three anonymous reviewers are also thanked for helpful suggestions. This work was supported by Genome Atlantic and an operating grant awarded to J.M.A. from NSERC. J.M.A. is a Scholar of the Canadian Institute for Advanced Research, Program in Evolutionary Biology.

Literature Cited

Archibald JM. 2007. Nucleomorph genomes: structure, function, origin and evolution. Bioessays. 4:392–402.

Archibald JM, Keeling PJ. 2002. Recycled plastids: a green movement in eukaryotic evolution. Trends Genet. 18:577–584.

Archibald JM, Keeling PJ. 2005. On the origin and evolution of plastids. In: Sapp J, editor. Microbial phylogeny and evolution. New York: Oxford University Press. p. 238–260.

Bachvaroff TR, Sanchez Puerta MV, Delwiche CF. 2005. Chlorophyll c-containing plastid relationships based on analyses of a multigene data set with all four chromalveolate lineages. Mol Biol Evol. 22:1772–1782.

Bergthorsson U, Adams KL, Thomason B, Palmer JD. 2003. Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature. 424:197–201.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Bhattacharya D, Helmchen T, Melkonian M. 1995. Molecular evolutionary analyses of nuclear-encoded small subunit ribosomal RNA identify an independent rhizopod lineage containing the Euglyphidae and the Chlorarachniophyta. J Eukaryot Microbiol. 42:64–68.

Bhattacharya D, Yoon HS, Hackett JD. 2003. Photosynthetic eukaryotes unite: endosymbiosis connects the dots. Bioessays. 26:50–60.

Blinkova A, Hervas C, Stukenberg PT, Onrust R, O’Donnell ME, Walker JR. 1993. The Escherichia coli DNA polymerase III holoenzyme contains both products of the dnaX gene, tau and gamma, but only tau is essential. J Bacteriol. 175:6018–6027.

Bodyl A. 2005. Do plastid-related characters support the chromalveolate hypothesis? J Phycol. 41:712–719.

Cavalier-Smith T. 1982. The origins of plastids. Biol J Linn Soc. 17:289–306.

Cavalier-Smith T. 1986. The kingdom Chromista: origin and systematics. Progr Phycol Res. 4:309–347.

Cavalier-Smith T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J Eukaryot Microbiol. 46:347–366.

Cavalier-Smith T. 2003. Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae). Philos Trans R Soc Lond B Biol Sci. 358:109–133 discussion 133–104.

Cavalier-Smith T. 2004. Chromalveolate diversity and cell megaevolution: interplay of membranes, genomes and cytoskeleton. In: Hirt RP, Horner D, editors. Organelles, genomes and eukaryotic evolution. London: Taylor and Francis. p. 75–108.

Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31:3497–3500.

Dallmann HG, McHenry CS. 1995. DnaX complex of Escherichia coli DNA polymerase III holoenzyme. Physical characterization of the DnaX subunits and complexes. J Biol Chem. 270:29563–29569.

Daugbjerg N, Andersen RA. 1997. Phylogenetic analyses of the rbcL sequences from haptophytes and heterokont algae suggest their chloroplasts are unrelated. Mol Biol Evol. 14:1242–1251.

Davis CC, Wurdack KJ. 2004. Host-to-parasite gene transfer in flowering plants: phylogenetic evidence from Malpighiales. Science. 305:676–678.

Delwiche CF. 1999. Tracing the thread of plastid diversity through the tapestry of life. Am Nat. 154(Suppl.):S164–S177.

Delwiche CF, Palmer JD. 1996. Rampant horizontal transfer and duplication of rubisco genes in eubacteria and plastids. Mol Biol Evol. 13:873–882.

Douglas SE, Penny SL. 1999. The plastid genome from the cryptomonad alga, Guillardia theta: complete sequence and conserved synteny groups confirm its common ancestry with red algae. J Mol Evol. 48:236–244.

Douglas SE, Zauner S, Fraunholz M, Beaton M, Penny S, Deng L, Wu X, Reith M, Cavalier-Smith T, Maier UG. 2001. The highly reduced genome of an enslaved algal nucleus. Nature. 410:1091–1096.

Eisen JA, Coyne RS, Wu M, et al. (53 co-authors). 2006. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol. 4:e286.

Fast NM, Kissinger JC, Roos DS, Keeling PJ. 2001. Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol Biol Evol. 18:418–426.

Foster PG, Hickey DA. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J Mol Evol. 48:284–290.

Gaikwad A, Hop DV, Mukherjee SK. 2002. A 70-kDa chloroplast DNA polymerase from pea (Pisum sativum) that shows high processivity and displays moderate fidelity. Mol Genet Genomics. 267:45–56.

Gilson PR, Maier UG, McFadden GI. 1997. Size isn’t everything: lessons in genetic miniaturisation from nucleomorphs. Curr Opin Genet Dev. 7:800–806.

Gilson PR, Su V, Slamovits CH, Reith ME, Keeling PJ, McFadden GI. 2006. From the Cover: Complete nucleotide sequence of the chlorarachniophyte nucleomorph: Nature’s smallest nucleus. Proc Natl Acad Sci USA. 103:9566–9571.

Glockner G, Rosenthal A, Valentin K. 2000. The structure and gene repertoire of an ancient red algal plastid genome. J Mol Evol. 51:382–390.

Gray MW, Doolittle WF. 1982. Has the endosymbiont hypothesis been proven? Microbiol Rev. 46:1–42.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Hackett JD, Yoon HS, Li S, Reyes-Prieto A, Rummele SE, Bhattacharya D. 2007. Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of ‘Rhizaria’ with chromalveolates. Mol Biol Evol. doi:10.1093/molbev/msm089.

Hagopian JC, Reis M, Kitajima JP, Bhattacharya D, de Oliveira MC. 2004. Comparative analysis of the complete plastid genome sequence of the red alga Gracilaria tenuistipitata var. liui provides insights into the evolution of rhodoplasts and their relationship to other plastids. J Mol Evol. 59:464–477.

Hall N, Pain A, Berriman M, et al. (80 co-authors). 2002. Sequence of Plasmodium falciparum chromosomes 1, 3–9 and 13. Nature. 419:527–531.

Harper JT, Keeling PJ. 2003. Nucleus-encoded, plastid-targeted glyceraldehyde-3-phosphate dehydrogenase (GAPDH) indicates a single origin for chromalveolate plastids. Mol Biol Evol. 20:1730–1735.

Harper JT, Waanders E, Keeling PJ. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int J Syst Evol Microbiol. 55:487–496.

Hrdy I, Hirt RP, Dolezal P, Bardonova L, Foster PG, Tachezy J, Embley TM. 2004. Trichomonas hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I. Nature. 432:618–622.

Ilgenfritz H, Bouyer D, Schnittger A, Mathur J, Kirik V, Schwab B, Chua NH, Jurgens G, Hulskamp M. 2003. The Arabidopsis STICHEL gene is a regulator of trichome branch number and encodes a novel protein. Plant Physiol. 131:643–655.

Ishida K, Green BR, Cavalier-Smith T. 1999. Diversification of a chimaeric algal group, the chlorarachniophytes: phylogeny of nuclear and nucleomorph small-subunit rRNA genes. Mol Biol Evol. 16:321–331.

Kaneko T, Sato S, Kotani H, et al. (24 co-authors). 1996. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions (supplement). DNA Res. 3:185–209.

Keeling PJ. 2004. Diversity and evolutionary history of plastids and their hosts. Am J Bot. 91:1481–1493.

Khan H, Kozera C, Curtis BA, Bussey JT, Theophilou S, Bowman S, Archibald JM. 2007. Retrotransposons and tandem repeat sequences in the nuclear genomes of cryptomonad algae. J Mol Evol. 64:223–236.

Kowallik KV, Stoebe B, Schaffran I, Kroth-Pancic P, Freier U. 1995. The chloroplast genome of a chlorophyll a+c-containing alga, Odontella sinensis. Plant Mol Biol Rep. 13:336–342.

Kowallik KV, Stoebe B, Schaffran I, Frier U. 1995. The chloroplast genome of chlorophyll a+c-containing alga Odontella sinensis. Plant Mol Biol Rep. 13:336–342.

Lane CE, Khan H, MacKinnon M, Fong A, Theophilou S, Archibald JM. 2006. Insight into the diversity and evolution of the cryptomonad nucleomorph genome. Mol Biol Evol. 23:856–865.

Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol. 7:S4.

Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 21:1095–1109.

Maddison W, Maddison D. 2003. MacClade. Sinauer Associates: Sunderland, Massachusetts.

Maier UG, Rensing SA, Igloi GL, Maerz M. 1995. Twintrons are not unique to the Euglena chloroplast genome: structure and evolution of a plastome cpn60 gene from a cryptomonad. Mol Gen Genet. 246:128–131.

Martin W, Rujan T, Richly E, Hansen A, Cornelsen S, Lins T, Leister D, Stoebe B, Hasegawa M, Penny D. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA. 99:12246–12251.

Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV. 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature. 393:162–165.

McFadden GI. 1999. Plastids and protein targeting. J Eukaryot Microbiol. 46:339–346.

Medlin LK, Cooper A, Hill C, Wrieden S, Wellbrock U. 1995. Phylogenetic position of the Chromista plastids based on small subunit rRNA coding regions. Curr Genet. 28:560–565.

Moreira D, Le Guyader H, Philippe H. 2000. The origin of red algae and the evolution of chloroplasts. Nature. 405:69–72.

Nierman WC, Pain A, Anderson MJ, et al. (98 co-authors). 2005. Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature. 438:1151–1156.

Ohta N, Matsuzaki M, Misumi O, Miyagishima SY, Nozaki H, Tanaka K, Shin IT, Kohara Y, Kuroiwa T. 2003. Complete sequence and analysis of the plastid genome of the unicellular red alga Cyanidioschyzon merolae. DNA Res. 10:67–77.

Oliveira MC, Bhattacharya D. 2000. Phylogeny of the Bangiophycidae (Rhodophyta) and the secondary endosymbiotic origin of algal plastids. Am J Bot. 87:482–492.

Oudot-Le Secq M-P, Grimwood J, Shapiro H, Armbrust EV, Bowler C, Green BR. 2007. Chloroplast genomes of the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana: comparison with other plastid genomes of the red lineage. Mol Genet Genomics. 277:427–439.

Pain A, Renauld H, Berriman M, et al. (50 co-authors). 2005. Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science. 309:131–133.

Palmer JD. 2003. The symbiotic birth and spread of plastids: how many times and whodunnit? J Phycol. 39:4–11.

Patron NJ, Inagaki Y, Keeling PJ. 2007. Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr Biol. 17:887–891.

Patron NJ, Rogers MB, Keeling PJ. 2004. Gene replacement of fructose-1,6-bisphosphate aldolase supports the hypothesis of a single photosynthetic ancestor of chromalveolates. Eukaryot Cell. 3:1169–1175.

Puerta MV, Bachvaroff TR, Delwiche CF. 2005. The complete plastid genome sequence of the haptophyte Emiliania huxleyi: a comparison to other plastid genomes. DNA Res. 12:151–156.

Reith ME, Munholland J. 1995. Complete nucleotide sequence of the Porphyra purpurea chloroplast genome. Plant Mol Biol Rep. 13:333–335.

Rice DW, Palmer JD. 2006. An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4:31.

Rodriguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Loffelhardt W, Bohnert HJ, Philippe H, Lang BF. 2005. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol. 15:1325–1330.

Rodriguez-Ezpeleta N, Philippe H, Brinkmann H, Becker B, Melkonian M. 2007. Phylogenetic analyses of nuclear, mitochondrial and plastid multi-gene datasets support the placement of Mesostigma in the Streptophyta. Mol Biol Evol. 24:723–731.

Rogers MB, Gilson PR, Su V, McFadden GI, Keeling PJ. 2007. The complete chloroplast genome of the chlorarachniophyte Bigelowiella natans: evidence for independent origins of chlorarachniophyte and euglenid secondary endosymbionts. Mol Biol Evol. 24:54–62.

Shi C, Shi X. 2006. Characterization of three genes encoding the subunits of light-independent protochlorophyllide reductase in Chlorella protothecoides CS-41. Biotechnol Prog. 22:1050–1055.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 17:1246–1247.

Soll J, Schleiff E. 2004. Protein import into chloroplasts. Nat Rev Mol Cell Biol. 5:198–208.

Staden R. 1996. The Staden sequence analysis package. Mol Biotechnol. 5:233–241.

Stirewalt V, Michalowski CB, Loffelhardt W, Bohnert HJ, Bryant DA. 1995. Nucleotide sequence of the cyanelle genome from Cyanophora paradoxa. Plant Mol Biol Rep. 13:327–332.

Strimmer K, von Haeseler A. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol. 13:964–969.

Swofford DL. 2002. Phylogenetic analysis using parsimony (*and other methods), version 4.0B10 PPC. Sunderland, MA: Sinauer Associates.

Tyler BM, Tripathy S, Zhang X, et al. (53 co-authors). 2006. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science. 313:1261–1266.

Vinh le S, Von Haeseler A. 2004. IQPNNI: moving fast through tree space and stopping in time. Mol Biol Evol. 21:1565–1571.

Waga S, Stillman B. 1998. The DNA replication fork in eukaryotic cells. Annu Rev Biochem. 67:721–751.

Yoon HS, Hackett JD, Pinto G, Bhattacharya D. 2002. The single, ancient origin of chromist plastids. Proc Natl Acad Sci USA. 99:15507–15512.

Yoon HS, Muller KM, Sheath RG, Ott FD, Bhattacharya D. 2006. Defining the major lineages of red algae (Rhodophyta). J Phycol. 42:482–492.

Geoffrey McFadden, Associate Editor

Accepted May 17, 2007
