Improving the analysis of dinoflagellate phylogeny based on rDNA

Protist, Vol. 156, 269-286, August 2005
http://www.elsevier.de/protis
Published online date 3 August 2005

ORIGINAL PAPER

Shauna Murray a,1,2, Mårten Flø Jørgensen b, Simon Y.W. Ho c, David J. Patterson a,d, and Lars S. Jermiin a,e

a The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
b Department of Phycology, Biological Institute, University of Copenhagen, Nørre Farigmagsgade 2D, 1353 Copenhagen K, Denmark
c Henry Wellcome Ancient Biomolecules Centre, Department of Zoology, University of Oxford, Oxford OX1 3PS, United Kingdom
d The Josephine Bay Paul Center in Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, Mass 02543, USA
e Sydney University Biological Informatics and Technology Centre (SUBIT), Medical Foundation Building K25, University of Sydney, NSW 2006, Australia

Submitted August 18, 2004; Accepted May 19, 2005
Monitoring Editor: Karen A. Steidinger

Phylogenetic studies of dinoflagellates are often conducted using rDNA sequences. In analyses to date, the monophyly of some of the major lineages of dinoflagellates remains to be demonstrated. There are several reasons for this uncertainty, one of which may be the use of models of evolution that may not closely fit the data. We constructed and examined alignments of SSU and partial LSU rRNA along with a concatenated alignment of the two molecules. The alignments showed several characteristics that may confound phylogeny reconstruction: paired helix (stem) regions that contain non-independently evolving sites, high levels of compositional heterogeneity among some of the sequences, high levels of incompatibility (homoplasy), and rate heterogeneity among sites. Taking into account these confounding factors, we analysed the data and found that the Gonyaulacales, a well-supported clade, may be the most recently diverged order. Other supported orders were, in the analysis based on SSU, the Suessiales and the Dinophysiales; however, the Gymnodiniales and Prorocentrales appeared to be polyphyletic. The Peridiniales without Heterocapsa species appeared as a monophyletic group in the analysis based on LSU; however, the support was low. The concatenated alignment did not provide a better phylogenetic resolution than the single gene alignments.

© 2005 Elsevier GmbH. All rights reserved.

Key words: Bayesian analysis; Dinophyceae; large subunit rRNA; phylogeny; small subunit rRNA.

1 Corresponding author; fax 61 2 93514119; e-mail [email protected] (S. Murray)
2 Present address: School of Biological Sciences A08, University of Sydney, NSW 2006, Australia

doi:10.1016/j.protis.2005.05.003


Introduction

Dinoflagellates are a diverse group of ~4,000 species of single-celled protists, occupying marine and freshwater habitats and exploiting autotrophic, heterotrophic, mixotrophic, parasitic, and symbiotic modes of nutrition. Despite many phylogenetic studies, the evolutionary history of the major dinoflagellate lineages remains uncertain. There are several reasons for this. Genes from fewer than 150 species have so far been sequenced, with a bias towards the photosynthetic taxa commonly available in culture (Fensome et al. 1999). Most estimates of their phylogeny have been based on a single gene or gene region, which may or may not contain a large number of phylogenetically informative sites, and they rest implicitly on the assumption that the inferred phylogeny adequately represents the genome’s ‘true’ evolutionary history. However, phylogenetic estimates are biased not only by the choice and amount of phylogenetic data. Other factors that may confound estimates of phylogeny are becoming increasingly well known (see, for example, Sanderson and Schaffer 2002). Factors that appear to be most critical in assessments of dinoflagellate phylogeny are: non-independence of the evolution of sites, whereby nucleotide substitutions at some sites in a gene may be correlated with those at other sites within the gene (Hillis and Dixon 1991; Jow et al. 2002; Smith et al. 2004); saturation of nucleotide substitutions, which involves repeated nucleotide substitutions at a given site within a gene (Ho and Jermiin 2004); different substitution rates in different lineages, which can result in the grouping of divergent lineages irrespective of their evolutionary history (Felsenstein 1978); unequal rates of substitution at different sites within a gene (Lockhart et al. 1996; Yang 1993); and compositional heterogeneity, which may group taxa due to the similarity of their nucleotide or amino acid contents rather than to their shared evolutionary history (for a brief review, see Jermiin et al. 2004).

Most phylogenetic studies of dinoflagellates have used parts of the ribosomal DNA (rDNA) array, which codes for ribosomal RNA, to infer evolutionary history. This includes the small subunit rDNA (SSU rDNA or 18/16S) (e.g. Edvardsen et al. 2003; Grzebyk et al. 1998; John et al. 2003; Litaker et al. 1999; López-García et al. 2001; McNally et al. 1994; Saldarriaga et al. 2001, 2003a, 2004; Saunders et al. 1997), the large subunit rDNA (LSU rDNA or 28/26S), including the D1-D6 domains (Daugbjerg et al. 2000; de Salas et al. 2003; Flø Jørgensen et al. 2004a,b; Zardoya et al. 1995), internal transcribed spacer regions (ITS) (LaJeunesse 2001; Shao et al. 2004), and the D8 domain (Lenaers et al. 1991). Recently, two studies have also inferred phylogenies based on genes encoding α- and β-tubulin, heat-shock protein 90 (hsp90), and actin for a few dinoflagellate taxa (Leander and Keeling 2004; Saldarriaga et al. 2003b). Plastid genes have not been used to infer the phylogeny of dinoflagellates for several reasons: they are only present in about 50% of taxa, and in those species, they may have been lost and gained on multiple occasions (Saldarriaga et al. 2001; Schnepf and Elbrächter 1999) and they now occur in mini-circles (Zhang et al. 1999). Cladistic studies based on matrices of morphological, ultrastructural, life-history, or other characters have rarely been performed (Flø Jørgensen et al. 2004a; Roberts 1991; Roberts and Roberts 1991).

There is considerable variation among the trees inferred in previous studies (Daugbjerg et al. 2000; Edvardsen et al. 2003; Flø Jørgensen et al. 2004b; Litaker et al. 1999; Saldarriaga et al. 2001, 2003a, 2004; Saunders et al. 1997; Takishita et al. 2002), causing residual doubt about the evolutionary relationships among many dinoflagellates. In order to differentiate between these phylogenetic trees, it is necessary to determine whether the substitution models used to infer the trees were appropriate to these studies. The evolution of rDNA is constrained by selective forces that differ from those constraining the evolution of protein-coding DNA. RNA molecules possess a highly conserved secondary structure, consisting of loop and stem (helix) regions, and substitutions on one strand of a helix are correlated with compensatory substitutions on the other strand in order to maintain the helical structure (Smith et al. 2004; Tillier and Collins 1998). Most phylogenetic programs assume that each site in a molecule evolves independently of the other sites, but this assumption is clearly violated by stem regions of RNA molecules (Hillis and Dixon 1991; Jow et al. 2002; Smith et al. 2004).
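As a minimal sketch of how paired stem sites can be handled as single units, the Python fragment below matches the two strands of each helix and recodes every pair as one character. It assumes a dot-bracket structure mask and a seven-state coding (six Watson-Crick or wobble pairs plus one mismatch class, here labelled MM); the function names and the mask format are illustrative assumptions and are not taken from any particular phylogenetic program.

```python
# Seven-state coding assumed here: six canonical or wobble pairs, else "MM".
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def split_stems_and_loops(structure):
    """Return (sorted list of paired column indices, list of unpaired columns)."""
    stack, pairs, loops = [], [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure mask")
            pairs.append((stack.pop(), pos))
        else:
            loops.append(pos)  # any other symbol is treated as a loop column
    if stack:
        raise ValueError("unbalanced structure mask")
    return sorted(pairs), loops


def pair_state(left, right):
    """Recode two paired nucleotides as one character state."""
    pair = (left + right).upper().replace("T", "U")
    return pair if pair in CANONICAL_PAIRS else "MM"


def encode_stems(sequence, structure):
    """One state per helix pair, so a compensatory change is counted once."""
    pairs, _ = split_stems_and_loops(structure)
    return [pair_state(sequence[i], sequence[j]) for i, j in pairs]
```

With this coding, a G-C pair that changes to A-U through two compensatory substitutions is scored as a single change of state in the paired character, not as two independent events.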

Concatenated alignments of several genes may help in resolving the phylogeny in cases where single gene phylogenies provide insufficient resolution (Gontcharov et al. 2004). The greater number of nucleotide sites that can be reliably aligned in combined analyses can lead to a more strongly supported topology (Gontcharov et al. 2004; Winchell et al. 2002). However, the analysis of multi-gene data sets may be difficult because the genes may have evolved under different selective constraints (Gontcharov et al. 2004; Pupko et al. 2002; Seo et al. 2005; Yang 1996a). It is important that specific sets of model parameters can be applied to the different genes (Gontcharov et al. 2004; Pupko et al. 2002; Seo et al. 2005), perhaps in conjunction with multi-objective optimisation (Poladian and Jermiin 2004, 2005).

In order to improve phylogenetic estimates of the evolution of the major groups of dinoflagellates, we have examined the characteristics of alignments of SSU and partial LSU rDNA genes, and a concatenation of the SSU and LSU alignments, with respect to compositional heterogeneity, substitutional saturation, and phylogenetic incompatibility among sites. We also sequenced part of the LSU rDNA gene from Oxyrrhis marina, a species thought to be either a sister group to dinoflagellates or a very primitive dinoflagellate (“pre-dinoflagellate” sensu Fensome et al. 1993) in order to improve the taxon sampling. Based on results from these exploratory surveys, the data were analysed phylogenetically using methods that account appropriately for confounding factors. Finally, we discuss some additional issues that need to be resolved in order to best estimate the phylogeny of this unusual group of organisms.

Results

The Assumption of Compositional Stationarity

Nucleotides encoding the stem and loop regions of the SSU and LSU rRNA were analysed independently in order to determine whether different regions of the alignments might have evolved under different conditions. The sequences were compared in a pairwise manner, producing n × (n − 1)/2 z scores (where n is the number of sequences in the four alignments), and histograms illustrating the z score distributions were produced for the stem and loop regions of (i) the SSU rRNA (Fig. 1A,B) and (ii) the LSU rRNA (Fig. 1C,D).

Figure 1. z-score distributions obtained from pairwise comparisons of nucleotides coding for the: (A) stem regions of the SSU rRNA; (B) loop regions of the SSU rRNA; (C) stem regions of the LSU rRNA; and (D) loop regions of the LSU rRNA. Each histogram shows the number of z scores found within the different categories (6-5, 5-4, 4-3, etc.). For more information, see the main text.

Figure 2. Compatibility plot, based on 730 parsimoniously informative sites, of the SSU rRNA alignment. A similar compatibility plot was obtained for the LSU rRNA alignment (based on 594 parsimoniously informative sites) (not shown here).

If a pair of orthologous sequences has evolved under stationary conditions over infinite time, then the expected z score is 0.0 ± 1.0; for smaller amounts of evolutionary time it is less than that value. Notwithstanding the fact that each histogram in Figure 1 is a summary of comparisons among many related sequences and that the results therefore cannot be interpreted in a strict statistical sense, the histograms indicate that some of the sequences have evolved under non-stationary conditions, which is implied by the preponderance of large z scores in each histogram. This suggests that the phylogenetic assumption of compositional stationarity is violated and that there is an elevated chance of errors if the sequences are analysed using phylogenetic methods that assume compositional stationarity.
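The z score used here is defined in the Methods and is not reproduced in this section. As a stand-in that illustrates the same kind of all-against-all screen, the sketch below computes Bowker's matched-pairs statistic of symmetry for every pair of aligned sequences; under stationary and reversible evolution the 4 × 4 table of aligned nucleotides is expected to be symmetric, so large values hint at compositional drift. The function names are hypothetical, and the statistic is a substitute for, not a reconstruction of, the z score.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def divergence_matrix(seq_a, seq_b):
    """4 x 4 counts of aligned nucleotides (rows: seq_a, columns: seq_b)."""
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq_a.upper().replace("U", "T"),
                    seq_b.upper().replace("U", "T")):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
    return counts


def bowker_statistic(seq_a, seq_b):
    """Sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji); 0 for a symmetric table."""
    counts = divergence_matrix(seq_a, seq_b)
    stat = 0.0
    for i in range(4):
        for j in range(i + 1, 4):
            total = counts[i][j] + counts[j][i]
            if total:
                stat += (counts[i][j] - counts[j][i]) ** 2 / total
    return stat


def pairwise_statistics(alignment):
    """alignment: dict name -> sequence. Returns n(n - 1)/2 scores keyed by name pair."""
    return {(a, b): bowker_statistic(alignment[a], alignment[b])
            for a, b in combinations(sorted(alignment), 2)}
```

For an alignment of n sequences the returned dictionary has n(n − 1)/2 entries, which is the number of comparisons summarised in each histogram of Figure 1.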

We examined the z scores individually to determine whether the high z scores involved comparisons of a small number of sequences; if so, then removal of the ‘offending’ sequences could be considered. The SSU rDNA sequences of Cryptosporidium parvum, and to a lesser extent Gonyaulax spinifera and Fragilidium subglobosum, produced the high z scores when compared to those of the other taxa, so we removed the sequences of these species from the alignments. The LSU rDNA sequences of C. parvum, and to a lesser extent Amphidinium herdmanii and A. steinii, produced the high z scores when compared to those of the other species, so we removed the LSU rDNA sequence of C. parvum from the alignment. We decided initially not to remove the two Amphidinium sequences from the alignment, because A. herdmanii is also present in the SSU alignment and a comparison of the two phylogenies may be significant. As there was an elevated chance of phylogenetic errors (Jermiin et al. 2004) in regard to the position of these two species, we interpreted the position of these species in the resultant phylogenies with caution.
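The screening step described above can be sketched as a simple tally: for each sequence, count how many of its pairwise scores exceed a chosen cut-off, then inspect the sequences at the top of the list. The cut-off, the function name, and the sequence labels below are hypothetical, and the decision to remove a sequence remains a judgement call, as the treatment of the two Amphidinium sequences shows.

```python
from collections import Counter


def rank_offenders(pair_scores, threshold):
    """Count, per sequence, the pairwise scores above `threshold`.

    pair_scores: dict mapping (name_a, name_b) -> score.
    Returns (name, count) tuples, most frequently implicated first.
    """
    hits = Counter()
    for (name_a, name_b), score in pair_scores.items():
        if score > threshold:
            hits[name_a] += 1
            hits[name_b] += 1
    return hits.most_common()
```

A sequence that appears in most of the high-scoring pairs is a candidate for removal, whereas high scores spread evenly over many sequences would point to a problem that pruning alone cannot solve.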

Phylogenetic Compatibility among Sites

The compatibility plots (Fig. 2) highlighted two important features of the alignments of the SSU rRNA and LSU rRNA sequences. Firstly, they showed that the incompatible sites are more common than the compatible sites (overall compatibility is 0.484 for the SSU rRNA and 0.349 for the LSU rRNA); therefore, we conclude that a large number of multiple substitutions have occurred during the evolution of these sequences, implying that the use of phylogenetic methods that appropriately consider multiple substitutions at the same site is necessary.
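To make the notion of pairwise compatibility concrete, the sketch below applies a standard criterion for two unordered characters: the sites are compatible when the graph linking the states of one site to the co-occurring states of the other contains no cycle (for two-state sites this reduces to the four-gamete test). This is an illustration under that assumption; the function names are hypothetical, and the program used to draw Figure 2 may treat gaps or ambiguous bases differently.

```python
from itertools import combinations


def sites_compatible(site_x, site_y):
    """True if the state-intersection graph of two alignment columns is acyclic."""
    parent = {}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # One edge per distinct combination of states observed in the same taxon.
    edges = {(x, y) for x, y in zip(site_x.upper(), site_y.upper())
             if x in "ACGT" and y in "ACGT"}
    for x, y in edges:
        a, b = ("x", x), ("y", y)
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this edge would close a cycle
        parent[root_a] = root_b
    return True


def overall_compatibility(sites):
    """Fraction of site pairs that are compatible (sites: list of column strings)."""
    pairs = list(combinations(sites, 2))
    return sum(sites_compatible(x, y) for x, y in pairs) / len(pairs)
```

A value below 0.5, as reported for both alignments, means that most pairs of informative sites cannot be explained on a single tree without at least one repeated or reversed substitution.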

Secondly, the black and white pixels are clearly distributed in a significantly non-random manner in the SSU rRNA (neighbour similarity score = 0.655; P < 0.0001) and the LSU rRNA (neighbour similarity score = 0.662; P < 0.0001) alignments, implying that the sites in the alignments of these data cannot have evolved independently. This supports the idea that the stem-forming sites in these two rRNA genes may co-evolve, and that different regions of these genes have evolved at different rates.

Optimal Substitution Models

Substitution models to analyse the data were chosen based on the above results, with the additional aim of avoiding over-parameterisation. For the stem regions, the RNA7D model (Tillier and Collins 1998), based on paired nucleotides, including invariant sites and five Γ-rate categories, was used (Table 3). Three rate parameters are used in this model (Table 3), where 1 = a rate of single transitions, 2 = a rate of double transversions, 3 = a mismatch rate. The optimal model for the loop regions was determined to be a general time reversible (REV) model with invariant sites and five Γ-rate categories. For the concatenated alignment, a separate RNA7D model was used for each of the two stem categories (LSU and SSU), and a separate REV model was used for each of the two loop categories (LSU and SSU).

The optimum parameters for the substitution models, as determined using mcmcphase, are shown in Table 3. The optimum Γ shape parameters for all analyses were found to be low (Table 3), suggesting that considerable among-site rate heterogeneity has occurred, and highlighting the importance of the use of a Γ-distribution model in these analyses.

Trees

Phylogenetic trees with posterior probability (PP) values are shown in Figures 3-5. In general it was found that several of the internal edges in all analyses were very short, with the consequence that parts of the topology, particularly some internal edges, were not well supported. The concatenated alignment (Fig. 5) did not appear to show higher levels of support for internal edges than the single gene alignments (Figs 3,4). The D1-D6 regions of the LSU rDNA had a substitution rate that was 4-8% faster than that of the complete SSU rDNA for those taxa studied, based on a comparison of the total edge lengths from Tetrahymena pyriformis to Heterocapsa triquetra, Akashiwo sanguinea, Polarella glacialis, and Peridinium willei.

Figure 3. Phylogenetic tree based on LSU rRNA alignment, inferred using Bayesian analysis, with branch lengths estimated using maximum likelihood. Log likelihood = −13780.8879. Support values are Posterior Probabilities (PP), and values less than 50% are not shown. For more information, see the main text.

Figure 4. Phylogenetic tree based on SSU rRNA alignment, inferred using Bayesian analysis, with branch lengths estimated using maximum likelihood. Log likelihood = −18601.0167. Support values are Posterior Probabilities (PP), and values less than 50% are not shown. For more information, see the main text.

The dinoflagellates were found to be a monophyletic group with PP values of 94% in the SSU analysis, 100% in the combined analysis, and 63% in the LSU analysis (Figs 3-5). The sister group to the main dinoflagellate clade was found to be Oxyrrhis marina in the LSU analysis (Fig. 3), with a PP support of 70%, and a clade comprising the three alveolates Toxoplasma gondii, Cyclospora cercopitheci, and Eimeria tenella, with a PP support of 71%, in the analysis based on SSU (Fig. 4) (note that O. marina was excluded from this analysis). Of the dinoflagellates included in the analyses, the earliest diverging were found to be Noctiluca scintillans in the SSU analysis, Akashiwo sanguinea in the LSU analysis, and Karlodinium microum in the combined analysis.
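Posterior probabilities of this kind are usually estimated as the proportion of sampled trees that contain a given clade. The sketch below shows that bookkeeping for a toy sample, together with the 50% reporting cut-off used in the figures. The representation of each sampled tree as a set of clades, the function names, and the single-letter taxon labels are assumptions made for illustration.

```python
from collections import Counter


def clade_support(sampled_trees):
    """Proportion of sampled trees containing each clade.

    sampled_trees: list of iterables of frozensets, one frozenset of taxon
    names per clade present in that tree.
    """
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))  # count each clade once per tree
    total = len(sampled_trees)
    return {clade: n / total for clade, n in counts.items()}


def reportable(support, cutoff=0.5):
    """Keep only clades whose support reaches the cut-off (50% by default)."""
    return {clade: pp for clade, pp in support.items() if pp >= cutoff}
```

In practice the sample would come from the post-burn-in portion of an MCMC run, and the clade sets would be read from the sampled tree files.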

Within the dinoflagellates, the order that was the most consistently supported clade in all analyses was the Gonyaulacales, which was found to be monophyletic with PP support of 98%, 97%, and 100% in LSU, SSU, and combined analyses, respectively (Figs 3-5). The sister group relationship of the Gonyaulacales differed among the analyses, and was only supported by a PP of more than 50% in the SSU analysis, where the sister group consisted of a clade of the species ‘Amphidinium asymmetricum’ and ‘Amphidinium britannicum’. We consider these two species to be unrelated to the genus Amphidinium, and most likely closely related to the species of Togula included in the LSU analysis (see Flø Jørgensen et al. 2004b). Other clades that were well supported in all analyses were the genera Gymnodinium (PP support of 100%, 51%, and 100% in the LSU, SSU, and combined analyses, respectively) and Karenia (PP support of 100% in all analyses). However, the order Gymnodiniales was not found to form a monophyletic clade in any analysis.

Figure 5. Phylogenetic tree based on the combined alignment, inferred using Bayesian analysis, with branch lengths estimated using maximum likelihood. Log likelihood = −17650.8779. Support values are Posterior Probabilities (PP), and values less than 50% are not shown. The numbers on some branches refer to known character states that are exclusive to a particular clade and present in every member of that clade, based on a cladistic study of 39 characters (Flø Jørgensen et al. 2004a). Character no. 6 is tabulation, with the state gonyaulacoid; character no. 14 is apical groove direction, with the state, anticlockwise encircling the apex; and character no. 3 is epicone deflection, with the state, deflected to the left.

In the SSU analysis, other orders that were supported as monophyletic clades were the Dinophysiales (Fig. 4, PP support of 100%) and the Suessiales, represented by Polarella and Symbiodinium (Fig. 4, PP support of 100%). These clades could not be assessed on the basis of the LSU analysis, as only one species was included. The monophyly of the Prorocentrales could also only be assessed based on the SSU analysis, as the LSU included only closely related planktonic species. The species of Prorocentrum included were found to be polyphyletic (Fig. 4).
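Whether a named group is monophyletic on a given rooted tree can be checked mechanically: the group is monophyletic if some subtree contains exactly its members and nothing else. The sketch below does this for a tree written as nested tuples. The representation, the function names, and the placeholder taxon labels are assumptions for illustration; the toy topology is invented and does not correspond to any tree in this study.

```python
def collect_clades(tree, clades):
    """Add the leaf set of every subtree to `clades`; return this subtree's leaves."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(
            *(collect_clades(child, clades) for child in tree))
    clades.add(leaves)
    return leaves


def is_monophyletic(tree, group):
    """True if `group` equals the leaf set of some subtree of the rooted tree."""
    clades = set()
    collect_clades(tree, clades)
    return frozenset(group) in clades


# Invented five-taxon topology with placeholder labels.
TOY_TREE = ((("A", "B"), "C"), ("D", "E"))
```

Because the test depends on where the tree is rooted, the choice of outgroup matters when the check is applied to real trees.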

In the LSU analysis, species of the Peridiniales, including Pfiesteria piscicida, but excluding the two Heterocapsa species, were found to form a monophyletic clade; however, the PP support was less than 50% (Fig. 3). This grouping was not supported in the SSU tree.

A few taxa appeared on long branches relative to the rest of the tree, most notably O. marina in the LSU analysis. As the presence or absence of this species did not affect the rest of the phylogeny in further analyses, this was not considered to be problematic. Analyses of the concatenated and the SSU alignments both with and without O. marina showed that this species appeared on very long branches as a sister group to either Polarella or Amyloodinium, two otherwise very different species, and its presence affected the overall phylogeny (results not shown). Analyses of the SSU and combined alignments without this species were therefore preferred.

Character states that were exclusive to particular clades and known to be present in every member of that clade are shown on the combined analysis (Fig. 5). The clade of Gonyaulacales was defined by the character tabulation, with the state gonyaulacoid; the clade of Gymnodinium species was defined by the character apical groove direction, with the state, anticlockwise encircling the apex; and the clade of Amphidinium species was defined by the character of epicone deflection, with the state, deflected to the left.

Discussion

Dinoflagellate rDNA alignments were found to have several features that required careful choice of phylogenetic analysis methods in order to avoid misleading results, due to the use of an inappropriate model of evolution. These were, firstly, the fact that the stem regions consisted of paired sites that did not appear to evolve independently and needed to be analysed as one unit, to avoid “double counting”; secondly, a relatively high level of compositional heterogeneity, suggesting that taxa for inclusion in analyses should be screened and chosen carefully, or another method robust to compositional differences should be used; thirdly, high levels of incompatibility, indicating that multiple substitutions have occurred at many sites in the alignment, which, in turn, implies that those sites are not suitable for analysis with methods assuming low levels of multiple substitutions or low evolutionary rates, such as maximum parsimony or uncorrected distances (Ho and Jermiin 2004); and finally, high neighbour similarity scores and low Γ shape parameters, indicating considerable among-site rate heterogeneity (Yang 1996b), suggesting that a Γ model should be used to account for this.
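The link between a low Γ shape parameter and strong among-site rate heterogeneity can be seen by computing the relative rates of five equally probable rate categories. The sketch below approximates those category rates by simulation with the standard library, which is a swapped-in shortcut: programs normally obtain them analytically from the Γ distribution. The function name, the number of draws, and the seed are arbitrary assumptions.

```python
import random


def discrete_gamma_rates(alpha, categories=5, draws=50_000, seed=1):
    """Approximate mean relative rate of each equal-probability Γ category.

    Rates are drawn from a gamma distribution with shape `alpha` and mean 1,
    sorted, cut into `categories` equal-sized bins, and averaged per bin.
    `draws` should be a multiple of `categories`.
    """
    rng = random.Random(seed)
    sample = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(draws))
    size = draws // categories
    means = [sum(sample[k * size:(k + 1) * size]) / size
             for k in range(categories)]
    overall = sum(means) / categories
    return [m / overall for m in means]  # rescaled so the mean rate is 1
```

With a small shape value most sites fall into categories with rates near zero while a few evolve many times faster than average; with a large shape value the five rates cluster around 1, which approaches the equal-rates assumption.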

Many of the internal edges were very short in the phylogenies found here, as well as those found in previous studies (Edvardsen et al. 2003; Flø Jørgensen et al. 2004a; Leander and Keeling 2004; Saldarriaga et al. 2001; Saunders et al. 1997). If, as is indicated by this study, the true dinoflagellate phylogeny contains many short internal edges, possibly indicating a rapid radiation of taxa (Philippe et al. 1994), then it is particularly important that the most suitable evolutionary model be applied. The use of an inappropriate evolutionary model is more likely to lead to erroneous conclusions in cases in which internal edges are very short (Jermiin et al. 2004). The choice of the evolutionary model may have a considerable impact on the topology of the inferred phylogeny in the case of dinoflagellates.

Comparison of Phylogenies Generated here with Previous Phylogenies

The phylogenies inferred in this study using the two different genes differed from one another; however, they also had elements in common with each other and with previous studies (Edvardsen et al. 2003; Grzebyk et al. 1998; John et al. 2003; Litaker et al. 1999; López-García et al. 2001; McNally et al. 1994; Saldarriaga et al. 2001, 2003a, 2004; Saunders et al. 1997). The finding of high support for the monophyly of dinoflagellates is consistent with studies based on rDNA and protein-coding genes, as well as our understanding based on traditional morphological methods (Edvardsen et al. 2003; Fensome et al. 1993; Flø Jørgensen et al. 2004a; Leander and Keeling 2004; Saldarriaga et al. 2001; Saunders et al. 1997). In the LSU analysis, Oxyrrhis marina was found to be either the closest sister taxon to dinoflagellates of the taxa included, or the most basal dinoflagellate of those included. In phylogenies recently generated using hsp90, actin, and α- and β-tubulin, the position of O. marina was interpreted to be a sister group to dinoflagellates, rather than the most basal dinoflagellate of those included (Leander and Keeling 2004; Saldarriaga et al. 2003b). Oxyrrhis marina possesses an SSU rDNA sequence that differs substantially from related taxa. Its evolutionary relationships cannot be resolved on the basis of this SSU rDNA sequence, as has also been previously found (Saldarriaga et al. 2003a), and it is therefore not recommended for use in future studies.

In the LSU analysis, no members of the parasitic Blastodiniales or other purportedly basal dinoflagellates were included in the analysis, and A. sanguinea was found to be a well-supported sister group to all other included dinoflagellates. The apparently basal position of A. sanguinea needs to be confirmed in analyses including a greater number of LSU sequences of other probable basal dinoflagellates, as it was not supported by the results of the SSU or combined rDNA analysis.

A common element in all phylogenies was that the Gonyaulacales were a well-supported clade. SSU rDNA sequence data only were available for Crypthecodinium cohnii, and in this analysis it appeared to be a sister group to Haplozoon, albeit on a relatively long branch (Fig. 4). However, its morphological characteristics suggest that it is most likely a member of the Gonyaulacales (Fensome et al. 1993; Saldarriaga et al. 2001). It has also branched inside the clade of Gonyaulacales in other analyses based on SSU rDNA that included a greater number of species of Gonyaulacales (Edvardsen et al. 2003; Saldarriaga et al. 2001); therefore, its position outside of the Gonyaulacales may indicate that taxon sampling in this study was not sufficient to resolve its position. Further gene sequence data, for example the LSU rDNA of this species, may need to be collected before its position can be determined with greater certainty.

The Gymnodiniales was not found to be monophyletic in any analysis, in common with previous phylogenetic studies using rDNA (Daugbjerg et al. 2000; Edvardsen et al. 2003; Flø Jørgensen et al. 2004a,b; Litaker et al. 1999; Saldarriaga et al. 2001, 2003a; Saunders et al. 1997). The genera of Gymnodiniales included in the analyses were generally monophyletic; Gymnodinium sensu stricto (see Daugbjerg et al. 2000), Karenia and Togula all appeared to be monophyletic with some support. Amphidinium sensu stricto (see Flø Jørgensen et al. 2004a) was monophyletic in the analysis based on LSU; however, A. herdmanii did not branch within the clade of Amphidinium species in the analysis based on SSU, but instead was sister group to the Karenia species with a PP support of 92%. This finding is unexpected, and we have no explanation for it. In general, the unarmoured state appears to have evolved multiple times within the dinoflagellates (Saldarriaga et al. 2001).

The Peridiniales has not been recovered as a monophyletic group in previous analyses based on SSU rDNA, and it has been considered as probably polyphyletic, despite morphological synapomorphies existing for the group (Fensome et al. 1993). In the analysis based on LSU rDNA, the Peridiniales, without the genus Heterocapsa, appeared to be monophyletic, but the support was less than 50%. The clade of species of Heterocapsa (and Cachonina, in the SSU rDNA analysis) was found to diverge relatively early in the dinoflagellate lineage in all analyses, and to be a well-supported clade. Heterocapsa species possess body scales (Pennick and Clarke 1977), a morphological synapomorphy that is relatively unusual among the dinoflagellates; the trait may support its disparate position in comparison to other Peridiniales.

Very few of the clades found in the combined analysis were supported by an exclusive character state that was known to be present in every member of the clade, based on a previously published matrix of morphological, ultrastructural, and life-history characters (Flø Jørgensen et al. 2004a). However, the clades that were supported by such a character, the Gonyaulacales, and the genera Gymnodinium and Amphidinium, were also among the few clades supported by 100% PP support. Cladistic analysis of dinoflagellates has been rarely performed and character matrices are relatively undeveloped as yet, so it may be expected that further development will lead to more valuable insights.

Comparison Between the Phylogenies Based on the Different Alignments

It has been argued that combined analyses of several genes can be superior to single-gene analyses because they contain more phylogenetic information, and because certain taxa may have a fast evolutionary rate in one gene but not in another, and so the use of both genes may mitigate the detrimental effect of a single divergent sequence (Gontcharov et al. 2004). In our analysis, the multi-gene approach did not offer more resolution than the single-gene trees. This may be due to the limited taxon set in the combined analysis. In this study, the use of a concatenated alignment did not resolve the position of O. marina (results not shown) and this species was therefore removed from this analysis.

In this study, it was found that the D1-D6 regions of the LSU rDNA had a substitution rate that was 4-8% faster than that of the complete SSU rDNA. This difference is only marginal; however, it may indicate that the D1-D6 region of the LSU rDNA gene is slightly more useful for resolving genus and species level divergence than the SSU rDNA gene. Studies comparing phylogenies based on the LSU gene to those generated using the SSU gene in other groups of taxa, such as stramenopiles and vertebrates, also found that the LSU gene appears to evolve faster, and this, combined with its additional length, was thought to give rise to improved phylogenies in cases where investigations did not intend to explore deep divergences, for example, relationships between different phyla (Van der Auwera and de Wachter 1998; Winchell et al. 2002).

Remaining Problems: Taxon Sampling and Taxon Identification

A problem that remains for estimating the phylogeny of dinoflagellates based on molecular genetic data is that the variety of taxa that have been sequenced is still limited, with a bias towards photosynthetic species available in culture. Phylogenies generated using SSU sequences have more thorough taxon sampling than those generated using LSU: complete or partial SSU sequences of species of about 45 dinoflagellate genera were available in GenBank (as of July 2004), as opposed to LSU sequences of species of about 25 dinoflagellate genera. However, there are about 210 genera of living dinoflagellates, with several new genera described each year in recent years (see, for example, http://microscope.mbl.edu). Despite more than half of all dinoflagellates being heterotrophic or mixotrophic (Gaines and Elbrächter 1987), genera including heterotrophic or mixotrophic species that have been sequenced constitute only 10 of the 45 genera represented by SSU sequences available in GenBank and 6 of the 25 genera represented by LSU sequences available in GenBank. This is due to the difficulty in culturing heterotrophic or mixotrophic species (Fensome et al. 1999; Saldarriaga et al. 2001). Culturing autotrophic dinoflagellates is also not always possible and requires a large time investment.

Techniques for the isolation and amplification of DNA from single cells or small numbers of cells have been developed recently (Bolch 2001; Edvardsen et al. 2003; Flø Jørgensen et al. 2004a; López-García et al. 2001; Marin et al. 2001; Saldarriaga et al. 2001), enabling sequences to be determined directly from environmental samples. This may be the only method available to obtain molecular genetic data from the many essentially unculturable dinoflagellate taxa; however, its disadvantage is that the reference organism is no longer available to determine or verify organism identity, ultrastructural, physiological, ecological, or other features, or to sequence other genes. The possibility that DNA sequences, freely available via GenBank, may have erroneous species names attached to them based on mis-identifications will become greater with the use of these methods. The use of sequences from mis-identified taxa in phylogenetic inference could cause considerable confusion. A related problem is that several sequences available in GenBank that may be close relatives of dinoflagellates or basal dinoflagellates, and so particularly important in understanding dinoflagellate phylogeny, have no name attached to them. For example, studies based on partial SSU sequencing of PCR amplifications directly from 0.2 µm filtered deep water environmental samples found many unique and highly variable sequences of two clades that appeared to be sister groups between dinoflagellates and perkinsids (López-García et al. 2001; Moon-van der Staay et al. 2001). Despite their apparent significance, the usefulness of such sequences in phylogenies is limited when it is not known whether they represent new or already described taxa, and if so, which.

In taxonomy, the species is founded on a type, and studies that require some degree of certainty about the identity of an organism usually involve having a voucher or reference specimen. For molecular genetic data to be most useful, it should also be tied to an organism name and a method of verifying that identification. This information is provided by cultures freely available in culture collections, but could also be provided by another method, such as documenting the actual cell or cells used in the PCR reaction, as has been done in some studies (e.g., Edvardsen et al. 2003; Takano and Horiguchi 2004).

Conclusion

There are several factors that may cause ambiguity in studies of dinoflagellate evolution based on comparative molecular genetic sequences. In this study, we have focused on the implications of the aspects of dinoflagellate rDNA for phylogenetic analyses. Dinoflagellate rDNA has different rates of evolution in different regions, containing stem regions with sites that do not evolve independently; it may be significantly compositionally heterogeneous and contains a high level of homoplasy. These factors are particularly important if, as indicated by the results of this study, the edge lengths in the ‘true’ dinoflagellate phylogeny are short. Other causes of uncertainty in inferring dinoflagellate phylogenies based on molecular genetic data remain. Ideally, taxa used in phylogenies should reflect the diversity of the group, including taxa thought to be basal or unusual. Methods for verifying taxon identity are also important. Improved methods are being developed, and should in the future help to resolve some of these issues hindering our current understanding of the evolution of dinoflagellates.

Table 1. GenBank accession numbers for LSU sequences used in the analysis

Species                         GenBank accession number
Akashiwo sanguinea              AF260397
Alexandrium catenella           AF200667
Alexandrium fundyense           AF200666
Alexandrium tamarense           AF200668
Amphidinium carterae            AY460578
Amphidinium herdmanii           AY455675
Amphidinium incoloratum         AY455677
Amphidinium operculatum         AY460591
Amphidinium semilunatum         AY455678
Amphidinium steinii             AY455673
Ceratium lineatum               AF260391
Ceratium tripos                 AF260389
Cryptosporidium parvum          AF040725
Dinophysis norvegica            AY571375
Eimeria tenella                 AF026388
Euplotes aediculatus            AF223571
Gonyaulax spinifera             AF260388
Gymnodinium catenatum           AF200672
Gymnodinium chlorophorum        AF200669
Gymnodinium fuscum              AF200676
Gymnodinium pellucidum          AY455681
Heterocapsa rotundata           AF260400
Heterocapsa triquetra           AF260401
Karenia brevis                  AF200677
Karenia mikimotoi               AF200682
Karlodinium micrum              AF200675
Peridiniella catenata           AF260398
Peridinium bipes                AF260385
Peridinium pseudolaeve          AF260395
Peridinium willei               AF260384
Pfiesteria piscicida            AY112746
Polarella glacialis             AY036081
Prorocentrum cordatum           AF260379
Prorocentrum mexicanum          AF260378
Prorocentrum micans             AF260377
Protoceratium reticulatum       AF260386
Scrippsiella trochoidea         AF260393
Tetrahymena pyriformis          X01533
Togula jolla                    AY568559
Togula britannica               AY455679
Toxoplasma gondii               X75429
Woloszynskia pseudopalustris    AF260402

Methods

Initially, we surveyed the SSU and LSU rDNA sequences of dinoflagellates available in GenBank with the aims of constructing an alignment with the maximum taxonomic diversity and of including representative species of the main orders of dinoflagellates where possible. Sequences that were substantially shorter than the majority (e.g., containing only the D1–D2 regions of the LSU rDNA) were excluded, and some sequences were excluded from the SSU data set so that it would not be too large for the available computing power. In order to improve the taxon sampling at the base of the tree, we also sequenced the LSU rDNA of Oxyrrhis marina. This species was obtained from the North-Eastern Pacific Culture Collection (strain NEPCC1) and the DNA was extracted using a CTAB method, as described in Flø Jørgensen et al. (2004a). An approximately 1300 bp fragment of the LSU rDNA was amplified using the primers and PCR reaction mix described in Flø Jørgensen et al. (2004a) (GenBank accession no. AY460596). In total, 43 taxa were included in the alignment of LSU rDNA and 55 taxa were included in the alignment of SSU rDNA (Tables 1 and 2). We used two ciliates (Tetrahymena pyriformis, Euplotes aediculatus) and three apicomplexans (Cryptosporidium parvum, Toxoplasma gondii, Eimeria tenella) to root the phylogeny inferred using LSU rDNA, and two ciliates (Tetrahymena pyriformis, Paramecium bursaria), four apicomplexans (Cryptosporidium parvum, Toxoplasma gondii, Eimeria tenella, Cyclospora cercopitheci), and a perkinsid (Perkinsus marinus) to root the phylogeny inferred using SSU rDNA. Due to the taxonomic discordance between the two data sets, the concatenated data only included 27 species, including the three outgroup taxa (Cryptosporidium parvum, Toxoplasma gondii, Eimeria tenella).

Table 2. GenBank accession numbers for SSU sequences used in the analysis

Species                           GenBank accession number
‘Amphidinium asymmetricum’        AF274250
‘Amphidinium corpulentum’         AF274252
Adenoides eludens                 AF274249
Akashiwo sanguinea                AF276818
Alexandrium tamarense             AF022191
Alexandrium catenella             AJ535392
Amphidinium carterae              AF009217
Amphidinium gibbosum              L13719
Amphidinium herdmanii             AF274253
Amphidinium semilunatum           AF274256
Amyloodinium ocellatum            AF080096
Cachonina hallii                  AF033865
Ceratium fusus                    AF022153
Ceratocorys horrida               AF022154
Crypthecodinium cohnii            M64245
Cryptosporidium parvum            AF093494
Cyclospora cercopitheci           AF111184
Dinophysis acuminata              AJ506972
Dinophysis norvegica              AF239261
Eimeria tenella                   U67121
Fragilidium subglobosum           AF033869
Glenodiniopsis steinii            AF274257
Gloeodinium viscum                L13716
Gonyaulax spinifera               AF022155
Gymnodinium catenatum             AF022193
Gymnodinium fuscum                AF022194
Halostylodinium arenarium         AB036837
Haplozoon axiothellae             AF274264
Heterocapsa triquetra             AF022198
Karenia brevis                    AF274259
Karenia mikimotoi                 AF022195
Karlodinium micrum                AF172712
Kryptoperidinium foliaceum        AF274268
Lepidodinium viride               AF022199
Lessardia elongata                AF521100
Lingulodinium polyedrum           AF274269
Noctiluca scintillans             AF022200
Oxyrrhis marina                   AF482425
Paramecium bursaria               AF100314
Pentapharsodinium tyrrhenicum     AF022201
Peridinium willei                 AF274272
Perkinsus marinus                 X75762
Pfiesteria piscicida              AF080098
Polarella glacialis               AF09983
Prorocentrum lima                 Y16235
Prorocentrum mexicanum            Y16232
Prorocentrum cordatum             Y16238
Prorocentrum micans               M14649
Protoceratium reticulatum         AF274273
Pyrocystis noctiluca              AF022156
Roscoffia capitata                AF521101
Scrippsiella nutricula            U52357
Symbiodinium microadriaticum      M88521
Tetrahymena pyriformis            X56171
Thoracosphaera elongata           AF274278
Toxoplasma gondii                 L31520

As the SSU rDNA of Oxyrrhis marina was very different from those of the other taxa, we also examined alignments without this species to determine its effect on the inferred topology; if its presence affected the topology, then analyses without the species were preferred. In the case of the species Akashiwo sanguinea, Amphidinium carterae, Pfiesteria piscicida, and Alexandrium catenella, the concatenated alignment consisted of sequences from two different strains representing the LSU rDNA and the SSU rDNA, respectively. This probably makes the data set invalid for detailed species level studies; however, as this study focused on relationships at a higher level, this was not believed to pose a significant problem.

The secondary structures of the D1–D6 region of the LSU rRNA and the SSU rRNA of Cryptosporidium parvum, Euplotes aediculatus, Eimeria tenella (outgroup taxa), and Prorocentrum micans (a dinoflagellate) were obtained from the European Ribosomal Database (Wuyts et al. 2001). The secondary structure of the rRNA from other dinoflagellates was assumed to be equivalent to that of the homologous sites in Prorocentrum micans. The information on the secondary structure was manually converted into a format appropriate for the programs used to analyse the data. As the secondary structure of the hypervariable D2 region of the LSU rDNA is unreliable, and as this region was difficult to align with certainty, it was not used further in analyses. Sequences were aligned by eye using the secondary structure as a reference, producing alignments of 1,233 (LSU rDNA), 1,850 (SSU rDNA), and 3,001 (concatenated alignment) nucleotides. Stem and loop regions were sometimes analysed separately in order to obtain information about confounding factors (see below). The alignments, with secondary structure information included, are available on request.

Prior to the phylogenetic analyses, the alignments were surveyed for evidence of phylogenetically confounding factors. In particular, we were concerned about violation of the phylogenetic assumption of compositional stationarity and lack of independence among some of the sites in the alignments (mainly those forming helices in the RNA molecules). When compositional differences are found among nucleotide sequences, it implies that they have evolved under different conditions. Most phylogenetic methods assume that the sequences have evolved under the same substitution model and are likely to err if this assumption is violated by the data. Following a review of methods to detect violation of the phylogenetic assumption of compositional stationarity (Jermiin et al. 2004), we employed Bowker’s (1948) matched-pairs test of symmetry to identify sequences violating the assumption of compositional stationarity: those sequences found to be in violation of this assumption could then be removed. The degree of violation of the assumption of compositional stationarity is expressed in z scores (i.e., the standard Normal value that is equivalent to a χ² value, given the degrees of freedom); negative values of z imply that sequences have evolved under stationary conditions; high positive values of z (e.g., ≥4) imply that sequences have evolved under non-stationary conditions.
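The matched-pairs test of symmetry can be sketched for a single pair of aligned sequences as follows. This is a minimal illustration, not the implementation used in this study: the function names are hypothetical, gaps and ambiguity codes are simply skipped, and the conversion of the χ² value to a z score uses the Wilson-Hilferty normal approximation, which is an assumption made here because the text does not state how the conversion was performed.

```python
import math
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def bowker_test(seq1, seq2):
    """Return (chi-square statistic, degrees of freedom) for Bowker's
    matched-pairs test of symmetry on two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Divergence matrix: counts[(i, j)] = sites with base i in seq1, j in seq2.
    counts = Counter(
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in BASES and b in BASES  # skip gaps and ambiguity codes
    )
    stat, df = 0.0, 0
    for i, j in combinations(BASES, 2):
        n_ij, n_ji = counts[(i, j)], counts[(j, i)]
        if n_ij + n_ji > 0:  # only informative off-diagonal pairs contribute
            stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
            df += 1
    return stat, df


def approx_z(stat, df):
    """Wilson-Hilferty normal approximation to a chi-square value
    (an assumed conversion, chosen to avoid external dependencies)."""
    if df == 0:
        return float("nan")  # no off-diagonal differences to test
    spread = 2.0 / (9.0 * df)
    return ((stat / df) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)


# Toy example: four A->C differences and no C->A differences.
stat, df = bowker_test("AAAAAAAA", "CCCCAAAA")
z = approx_z(stat, df)
```

In practice the test is applied to every pair of sequences in an alignment, and sequences that repeatedly yield high z scores are candidates for removal.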

When orthologous sequences evolve from their common ancestor over a considerable period of time, they are likely to contain sites that have undergone multiple nucleotide replacements. When such sites are present in an alignment, the phylogenetic signals at different sites may have become incompatible. Thus, the simplest phylogenetic explanation for the variation at one site, among all the sequences in an alignment, may differ from the simplest phylogenetic explanation for the variation at another site. Such sites are then said to have phylogenetically incompatible signals. We determined the level of compatibility in the alignments by using the program reticulate (Jakobsen and Easteal 1996). It produces a compatibility plot, in which each pixel in the plot represents a pairwise comparison. White pixels correspond to compatible sites and black pixels correspond to incompatible sites. The program also calculates an overall compatibility score between 1.0 and 0.0, where 1.0 indicates that all the sites are compatible, and 0.0 indicates that none of the sites are compatible. The compatibility plot is a graphical representation of the phylogenetic compatibility of sites in an alignment. Assuming that the sequences have evolved from a common ancestor without any reticulation, the compatibility score is an approximate measure of the amount of multiple substitutions that may have occurred since the taxa evolved from their last common ancestor. Compatibility plots are also useful for identifying sections of an alignment that have higher amounts of multiple substitutions than other sections. The program reticulate also calculates a neighbour similarity score that measures the degree of compatibility in adjacent columns (or rows) in the compatibility plot. This shows whether the incompatible sites are distributed randomly along the length of the alignment or are clumped within certain areas.
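The idea of pairwise site compatibility can be illustrated with a short sketch. It is restricted to sites with exactly two states and uses the four-gamete test (two such sites are incompatible when all four combinations of states occur); this restriction and the function names are assumptions made for illustration and should not be read as a description of how reticulate is implemented.

```python
from itertools import combinations


def sites_compatible(col_a, col_b):
    """Four-gamete test for two binary alignment columns: the sites are
    compatible unless all four state combinations are observed."""
    return len(set(zip(col_a, col_b))) < 4


def compatibility_score(alignment):
    """Fraction of compatible pairs among the binary sites of an alignment
    (1.0 = all pairs compatible, 0.0 = none compatible)."""
    columns = list(zip(*alignment))  # transpose rows (taxa) into sites
    binary_sites = [col for col in columns if len(set(col)) == 2]
    pairs = list(combinations(binary_sites, 2))
    if not pairs:
        return 1.0  # nothing to compare, so nothing is incompatible
    compatible = sum(sites_compatible(a, b) for a, b in pairs)
    return compatible / len(pairs)


# Two sites showing all four combinations (AA, AC, CA, CC) are incompatible.
conflicting = compatibility_score(["AA", "AC", "CA", "CC"])
# Two sites that split the taxa identically are compatible.
congruent = compatibility_score(["AA", "AA", "CC", "CC"])
```

A full compatibility plot would store the result of `sites_compatible` for every pair of sites as a matrix, which is what makes clumping of incompatible sites along the alignment visible.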

PHASE is a package of programs developed for phylogenetic studies of RNA (Jow and Gowri-Shankar 2003). PHASE 1.2 provides a wide range of substitutional models, such as the Jukes-Cantor model, general time reversible model, two-parameter models for recoded nucleotides, as well as models for stem regions of RNA that treat paired sites as the fundamental unit rather than single sites. In principle, there are 16 possible pairs of sites that can be formed with four bases, but in practice, only six occur frequently (AU, GU, GC, UA, UG, CG) (Jow et al. 2002), and the remaining mismatch pairs can be grouped together in a seventh category (here shown as AA). Models also allow for a proportion of sites to be invariant and for rate variation across sites to be modelled using a Γ distribution. Prior to analyses using PHASE, the optimal substitutional models for the loop regions of the rDNA data were determined using the program MODELTEST (Posada and Crandall 1998), which calculates the log likelihood of 56 different substitution models and uses hierarchical likelihood ratio tests (hLRTs) and the Akaike information criterion (AIC) to determine the best model for analysing the alignment.
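The grouping of the 16 possible base pairs into seven states can be sketched as follows. The function names and the representation of the secondary structure as a list of index pairs are hypothetical and do not reflect the PHASE input format; the sketch also pools any pair containing a gap or ambiguity code with the mismatches, which a real analysis would likely handle separately.

```python
FREQUENT_PAIRS = {"AU", "GU", "GC", "UA", "UG", "CG"}
MISMATCH = "AA"  # label used in the text for the pooled mismatch category


def recode_pair(five_prime, three_prime):
    """Map one stem pair to one of seven states: the six frequent pairs
    or the pooled mismatch category."""
    pair = (five_prime + three_prime).upper().replace("T", "U")
    return pair if pair in FREQUENT_PAIRS else MISMATCH


def recode_stems(sequence, paired_positions):
    """Recode every paired position (i, j) of a sequence into a
    seven-state character; positions are 0-based indices."""
    return [recode_pair(sequence[i], sequence[j]) for i, j in paired_positions]


# A five-base hairpin whose first and last bases pair (hypothetical data).
states = recode_stems("GAAAC", [(0, 4)])
```

Treating the pair rather than the single site as the unit is what allows compensatory changes (for example GC to AU) to be modelled as one event instead of two independent substitutions.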

The program mcmcphase, contained in the PHASE package, uses a Bayesian approach to infer phylogenetic trees. Using a Markov Chain Monte Carlo (MCMC) approach, it allows for a computationally efficient characterization of large trees and parameter spaces that would normally be intractable using a maximum-likelihood analysis (for a more detailed description of the differences, see Felsenstein 2004). Furthermore, Bayesian inference produces posterior probabilities for each clade in the tree, which denote the probability of each clade given the data, the model, and the priors. The priors used in mcmcphase analyses are simple factorised priors (for more information see Jow and Gowri-Shankar 2003). For each MCMC analysis, the burn-in period covered 50,000 cycles, after which trees were sampled every 20 steps from the following 2,000,000 generations. The chains were checked for stationarity by visual inspection of log-likelihood plots. A majority-rule consensus tree was computed from the 100,000 trees sampled this way, using the program consense from the PHYLIP package (Felsenstein 1993).
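How clade support is summarised from the sampled trees can be sketched as below. Each tree is represented, purely for illustration, as a set of clades, with each clade a frozenset of taxon names; this representation and the function names are assumptions and do not correspond to the mcmcphase output or to the internals of consense.

```python
from collections import Counter

# Sampling every 20 steps over 2,000,000 generations gives the number of
# trees stated in the text.
SAMPLED_TREES = 2_000_000 // 20


def clade_support(sampled_trees):
    """Return the fraction of sampled trees that contain each clade."""
    counts = Counter(clade for tree in sampled_trees for clade in tree)
    total = len(sampled_trees)
    return {clade: count / total for clade, count in counts.items()}


def majority_rule(sampled_trees, threshold=0.5):
    """Keep only clades found in more than `threshold` of the trees."""
    support = clade_support(sampled_trees)
    return {clade: s for clade, s in support.items() if s > threshold}


# Four hypothetical sampled trees over taxa A, B, C.
trees = [
    {frozenset("AB")},
    {frozenset("AB")},
    {frozenset("AC")},
    {frozenset("AB")},
]
consensus = majority_rule(trees)
```

The support values retained in the consensus are the posterior probabilities reported on the tree, so a clade present in every sampled tree corresponds to 100% PP.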

The edge lengths of the consensus tree were estimated using the optimise program of PHASE 1.2 (Jow and Gowri-Shankar 2003). This program uses maximum likelihood to estimate the edge lengths, using the same parameters, respectively, as were previously used in the MCMC analysis (Table 3).

Character states from a cladistic analysis of 39 dinoflagellate ultrastructural, morphological, and life-history features (Flø Jørgensen et al. 2004a) were superimposed onto the combined phylogeny when a character state was found to be exclusive to a particular clade and present in every member of that clade. Descriptions of characters and a discussion of each character are given in Flø Jørgensen et al. (2004a) and at http://www.bi.ku.dk/staff/nielsd/morphologyrefs.htm.

Table 3. Models and parameters for models for the three data sets.

Alignment and model           Frequencies        Rate ratios(a)   Γ shape         Proportion of     Parsimoniously
                                                                  parameter(b)    invariant sites   informative sites

LSU
  REV model (loop regions)    F(A) = 0.33423     1 = 0.58858      0.438820        0.168606          293
                              F(C) = 0.15021     2 = 0.52786
                              F(G) = 0.26520     3 = 0.57247
                              F(T) = 0.25036     4 = 5.57449
                                                 5 = 0.66508
  RNA7D model (stem regions)  F(AU) = 0.16411    1 = 4.45893      1.101158        0.105628          310
                              F(CG) = 0.15770    2 = 0.08547
                              F(GC) = 0.18807    3 = 2.41156
                              F(GU) = 0.09038
                              F(UA) = 0.16971
                              F(UG) = 0.07473
                              F(AA) = 0.15531

SSU
  REV model (loop regions)    F(A) = 0.32374     1 = 0.48362      0.484733        0.229407          374
                              F(C) = 0.17494     2 = 0.35534
                              F(G) = 0.20455     3 = 0.37590
                              F(T) = 0.29677     4 = 2.28686
                                                 5 = 0.41246
  RNA7D model (stem regions)  F(AU) = 0.16271    1 = 3.36125      0.509519        0.143693          379
                              F(CG) = 0.17980    2 = 0.08576
                              F(GC) = 0.12964    3 = 1.36588
                              F(GU) = 0.07862
                              F(UA) = 0.17351
                              F(UG) = 0.09354
                              F(AA) = 0.18219

LSU+SSU
  REV model                   F(A) = 0.35083     1 = 0.89260      0.428391        0.224594
                              F(C) = 0.13493     2 = 0.39977
                              F(G) = 0.25649     3 = 0.92182
                              F(T) = 0.25775     4 = 6.76566
                                                 5 = 0.89320
  RNA7D model 1               F(AU) = 0.15924    1 = 8.39316      1.355542        0.246910
                              F(CG) = 0.19195    2 = 0.07724
                              F(GC) = 0.18842    3 = 3.60041
                              F(GU) = 0.07863
                              F(UA) = 0.18854
                              F(UG) = 0.06797
                              F(AA) = 0.12526
  REV model 2                 F(A) = 0.33175     1 = 0.59853      0.513818        0.332096
                              F(C) = 0.15254     2 = 0.37018
                              F(G) = 0.21928     3 = 0.59488
                              F(T) = 0.29642     4 = 3.05373
                                                 5 = 0.50737
  RNA7D model 2               F(AU) = 0.16549    1 = 4.75430      0.647844        0.147278
                              F(CG) = 0.18008    2 = 0.06436
                              F(GC) = 0.17250    3 = 1.89574
                              F(GU) = 0.09065
                              F(UA) = 0.15011
                              F(UG) = 0.08428
                              F(AA) = 0.15689

(a) For an explanation see text.
(b) The Γ shape parameter is used to model rate variation among sites.

Acknowledgements

We thank Vivek Gowri-Shankar for helpful advice on the use of PHASE, and Y. Fukuyo and Y. Nagahama for providing infrastructure during the writing of this manuscript. We thank the Australian Biological Resources Study and Japan Society for the Promotion of Science for partly supporting this work. This is research paper #010 from SUBIT.

References

Bolch CJS (2001) PCR protocols for genetic identification of dinoflagellates directly from single cysts and plankton cells. Phycologia 40: 162–167

Bowker AH (1948) A test for symmetry in contingency tables. J Am Stat Assoc 43: 572–574

Daugbjerg N, Hansen G, Larsen J, Moestrup Ø (2000) Phylogeny of some of the major genera of dinoflagellates based on ultrastructure and partial LSU rDNA sequence data including the erection of three new genera of unarmoured dinoflagellates. Phycologia 39: 302–317

de Salas M, Bolch CJS, Botes L, Nash G, Wright SW, Hallegraeff GM (2003) Takayama gen. nov. (Gymnodiniales, Dinophyceae), a new genus of unarmoured dinoflagellates with sigmoid apical grooves, including the description of two new species. J Phycol 39: 1233–1246

Edvardsen B, Shalchian-Tabrizi K, Jakobsen KS, Medlin L, Dahl E, Brubak S, Paasche E (2003) Genetic variability and molecular phylogeny of Dinophysis species (Dinophyceae) from Norwegian waters inferred from single cell analyses of rDNA. J Phycol 39: 395–408

Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27: 401–410

Felsenstein J (1993) PHYLIP (Phylogeny inference package). Seattle, University of Washington

Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates, Inc, Sunderland, USA 664pp


Fensome RA, Taylor FJR, Norris G, Sarjeant WAS, Wharton DI, Williams GL (1993) A Classification of Living and Fossil Dinoflagellates. Micropaleontology special publication number 7, American Museum of Natural History 351pp

Fensome RA, Saldarriaga JF, Taylor FJR (1999) Dinoflagellate phylogeny revisited: reconciling morphological and molecular based phylogenies. Grana 38: 66–80

Flø Jørgensen M, Murray S, Daugbjerg N (2004a) Amphidinium revisited I: redefinition of Amphidinium (Dinophyceae) based on cladistic and molecular phylogenetic analyses. J Phycol 40: 351–365

Flø Jørgensen M, Murray S, Daugbjerg N (2004b) A new genus of athecate interstitial dinoflagellates, Togula gen. nov., previously encompassed within Amphidinium sensu lato – inferred from light and electron microscopy and phylogenetic analyses of partial LSU rDNA sequences. Phycol Res 52: 284–299

Gaines G, Elbrächter M (1987) Heterotrophic Nutrition. In Taylor FJR (ed) The Biology of Dinoflagellates. Blackwell Scientific Publications, Oxford, pp 224–268

Gontcharov AA, Marin B, Melkonian M (2004) Are combined analyses better than single gene phylogenies? A case study using SSU rDNA and rbcL sequence comparisons in the Zygnematophyceae (Streptophyta). Mol Biol Evol 21: 612–624

Grzebyk D, Sako Y, Berland B (1998) Phylogenetic analysis of nine species of Prorocentrum (Dinophyceae) inferred from 18S ribosomal DNA sequences, morphological comparisons, and description of Prorocentrum panamensis, sp nov. J Phycol 34: 1055–1068

Hillis DM, Dixon MT (1991) Ribosomal DNA: molecular evolution and phylogenetic inference. Q Rev Biol 66: 411–453

Ho SYW, Jermiin LS (2004) Tracing the decay of the historical signal in biological sequence data. Syst Biol 53: 623–637

Jakobsen IB, Easteal S (1996) A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. CABIOS 12: 291–295

Jermiin LS, Ho SYW, Ababneh F, Robinson J, Larkum AWD (2004) The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst Biol 53: 638–643

John U, Fensome RA, Medlin LK (2003) The application of a molecular clock based on molecular sequences and the fossil record to explain biogeographic distributions within the Alexandrium tamarense ‘species complex’ (Dinophyceae). Mol Biol Evol 20: 1015–1027

Jow H, Gowri-Shankar V (2003) PHASE: a Software Package for Phylogenetics and Sequence Evolution. University of Manchester, Manchester, UK 47pp

Jow H, Hudelot C, Rattray M, Higgs PG (2002) Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol Biol Evol 19: 1591–1601

LaJeunesse T (2001) Investigating the biodiversity, ecology and phylogeny of endosymbiotic dinoflagellates in the genus Symbiodinium using the ITS region. J Phycol 37: 866–880

Leander BS, Keeling PJ (2004) Early evolutionary history of dinoflagellates and apicomplexans (alveolata) as inferred from HSP90 and actin phylogenies. J Phycol 40: 341–350

Lenaers G, Scholin C, Bhaud Y, Saint-Hilaire D, Herzog M (1991) A molecular phylogeny of dinoflagellate protists (Pyrrophyta) inferred from the sequences of 24S rRNA divergent domains D1 and D8. J Mol Evol 32: 53–63

Litaker RW, Tester PA, Colorni A, Levy MG, Noga EJ (1999) The phylogenetic relationship of Pfiesteria piscicida, cryptoperidiniopsoid sp, Amyloodinium ocellatum and a Pfiesteria-like dinoflagellate to other dinoflagellates and apicomplexans. J Phycol 35: 1379–1389

Lockhart PJ, Larkum AWD, Steel MA, Waddell PJ, Penny D (1996) Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc Natl Acad Sci USA 93: 1930–1934

López-García P, Rodríguez-Valera F, Pedrós-Alió C, Moreira D (2001) Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature 409: 603–607

McNally K, Govind NS, Thome PE, Trench RK (1994) Small-subunit ribosomal DNA sequence analyses and a reconstruction of the inferred phylogeny among symbiotic dinoflagellates (Pyrrophyta). J Phycol 30: 316–329

Marin I, Aguilera A, Reguera B, Abad JP (2001) Preparation of DNA suitable for PCR amplification from fresh or fixed single dinoflagellate cells. BioTechniques 30: 88–93

Moon-van der Staay S, de Wachter R, Vaulot D (2001) Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature 409: 607–610

Pennick NC, Clarke KJ (1977) The occurrence of scales in the peridinian dinoflagellate Heterocapsa triquetra (Ehrenb) Stein. Br Phycol J 12: 63–66

Philippe H, Chenuil A, Adoutte A (1994) Can the Cambrian explosion be inferred through molecular phylogeny? Development 1994: 15–25

Poladian L, Jermiin LS (2004) What might evolutionary algorithms (EA) and multi-objective optimisation (MOO) contribute to phylogenetics and the total evidence debate. Genetic and Evolutionary Computation Conference Proceedings (GECCO-2004), Seattle, Washington USA.

Poladian L, Jermiin LS (2005) Evolutionary multi-objective optimisation algorithms and phylogenetic inference with multiple data sets. Soft Computing (Special Issue on Soft Computing in Bioinformatics), in press

Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817–818

Pupko T, Huchon D, Cao Y, Okada N, Hasegawa M (2002) Combining multiple data sets in a likelihood analysis: which models are the best? Mol Biol Evol 19: 2294–2307

Roberts KR (1991) The Flagellar Apparatus and Cytoskeleton of Dinoflagellates: Organisation and Use in Systematics. In Patterson DJ, Larsen J (eds) The Biology of Free-Living Heterotrophic Flagellates. Clarendon Press, Oxford, pp 285–302

Roberts KR, Roberts JE (1991) The flagellar apparatus and cytoskeleton of the dinoflagellates – A comparative analysis. Protoplasma 164: 105–122

Saldarriaga JF, Leander BS, Taylor FJR, Keeling P (2003a) Lessardia elongata gen et sp nov (Dinoflagellata, Peridiniales, Podolampaceae) and the taxonomic position of the genus Roscoffia. J Phycol 39: 368–378

Saldarriaga JF, Taylor FJR, Keeling PJ, Cavalier-Smith T (2001) Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. J Mol Evol 53: 204–213

Saldarriaga JF, McEwen ML, Fast NM, Taylor FJR, Keeling PJ (2003b) Multiple protein phylogenies show that Oxyrrhis marina and Perkinsus marinus are early branches of the dinoflagellate lineage. Int J Syst Evol Microbiol 53: 355–365

Saldarriaga JF, Taylor FJR, Cavalier-Smith T, Menden-Deuer S, Keeling PJ (2004) Molecular data and the evolutionary history of dinoflagellates. Europ J Protistol 40: 85–111

Sanderson MJ, Schaffer HB (2002) Troubleshooting molecular phylogenetic analyses. Annu Rev Ecol Syst 33: 49–72

Saunders GW, Hill DRA, Sexton JP, Andersen RA (1997) Small-subunit ribosomal RNA sequences from selected dinoflagellates: testing classical evolutionary hypotheses with molecular systematic methods. Plant Syst Evol (Suppl) 11: 237–259

Schnepf E, Elbrächter M (1999) Dinophyte chloroplasts and phylogeny – a review. Grana 38: 81–97

Seo T-K, Kishino H, Thorne JL (2005) Incorporating gene specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data. Proc Natl Acad Sci USA 102: 4436–4441

Shao P, Chen Y-Q, Zhou H, Yuan J, Qu L-H, Zhao D, Lin Y-S (2004) Genetic variability in Gymnodiniaceae ITS regions: implications for species identification and phylogenetic analysis. Mar Biol 144: 215–224

Smith AD, Lui TWH, Tillier ERM (2004) Empirical models for substitution in ribosomal RNA. Mol Biol Evol 21: 419–427

Takano Y, Horiguchi T (2004) Surface ultrastructure and molecular phylogenetics of four unarmoured heterotrophic dinoflagellates, including the type species of the genus Gyrodinium (Dinophyceae). Phycol Res 52: 107–116

Takishita K, Koike K, Maruyama T, Ogata T (2002) Molecular evidence for plastid robbery (kleptoplastidy) in Dinophysis, a dinoflagellate causing diarrhetic shellfish poisoning. Protist 153: 293–302

Tillier ERM, Collins RA (1998) High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics 148: 1993–2002

Van der Auwera G, de Wachter R (1998) Structure of the large subunit rDNA from a diatom, and comparison between small and large subunit ribosomal RNA for studying stramenopile evolution. J Eukaryot Microbiol 45: 521–527

Winchell CJ, Sullivan J, Cameron CB, Swalla BJ, Mallatt J (2002) Evaluating hypotheses of deuterostome phylogeny and chordate evolution with new LSU and SSU ribosomal DNA data. Mol Biol Evol 19: 762–776


Wuyts J, De Rijk P, van de Peer Y, Winkelmans T, De Wachter R (2001) The European Large Subunit Ribosomal RNA database. Nucleic Acids Res 29: 175–177

Yang Z (1993) Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol Biol Evol 10: 1396–1401

Yang Z (1996a) Maximum-likelihood models for combined analyses of multiple sequence data. J Mol Evol 42: 587–596

Yang Z (1996b) Among-site rate variation and its impact on phylogenetic analysis. Trends Ecol Evol 11: 367–372

Zardoya R, Costas E, López-Rodas V, Garrido-Pertierra A, Bautista JM (1995) Revised dinoflagellate phylogeny inferred from molecular analysis of large subunit ribosomal RNA gene sequences. J Mol Evol 41: 637–645

Zhang Z, Green BR, Cavalier-Smith T (1999) Single gene circles in dinoflagellate chloroplast genomes. Nature 400: 155–159