
Evolution of the trnF(GAA) Gene in Arabidopsis Relatives and the Brassicaceae Family: Monophyletic Origin and Subsequent Diversification of a Plastidic Pseudogene

Marcus A. Koch,* Christoph Dobeš,* Michaela Matschinger,* Walter Bleeker,† Johannes Vogel,‡ Markus Kiefer,* and Thomas Mitchell-Olds§

*Heidelberg Institute of Plant Science, Heidelberg University, Heidelberg, Germany; †Department of Systematic Botany, University of Osnabrück, Germany; ‡Botany Department, Natural History Museum, London, United Kingdom; and §Max Planck Institute for Chemical Ecology, Jena, Germany

Recently, we used the 5′-trnL(UAA)–trnF(GAA) region of the chloroplast DNA for phylogeographic reconstructions and phylogenetic analysis among the genera Arabidopsis, Boechera, Rorippa, Nasturtium, and Cardamine. Although extensive gene duplications are rare in the chloroplast genomes of higher plants, within these taxa the anticodon domain of the trnF(GAA) gene exhibits extensive gene duplications, with one to eight tandemly repeated copies in close 5′ proximity to the functional gene. Interestingly, even in Arabidopsis thaliana we found six putative pseudogenic copies of the functional trnF gene within the 5′-intergenic trnL-trnF spacer. A reexamination of trnL(UAA)-trnF(GAA) regions from numerous published phylogenetic studies among halimolobine, cardaminoid, and other cruciferous taxa not only revealed extensive trnF gene duplications but also favored the hypothesis of a single origin of trnF pseudogene formation during evolution of the Brassicaceae family 16–21 MYA. Conserved sequence motifs from this tandemly repeated region are codistributed nonrandomly throughout the plastome, and we found some similarities with a DNA sequence duplication in the rps7 gene and its adjacent spacer. Our results demonstrate the potential evolutionary dynamics of a plastidic region generally regarded as highly conserved and probably cotranscribed and, as shown here for several genera among cruciferous plants, greatly characterized by parallel gains and losses of duplicated trnF copies.

Introduction

Among plant systematic and phylogeographic studies the chloroplast genome is widely used and generally accepted as an excellent source for molecular information (Olmstead and Palmer 1994; Newton et al. 1999; Hewitt 2001). There are several reasons for this. First, the uniparental inheritance (maternally in most angiosperms, paternally in gymnosperms; Reboud and Zeyl 1994) ensures orthology of sequences. Biparental inheritance is a rare exception (Johnson and Palmer 1989). Second, even within an individual the possibility of recombination between genomes from individual plastids is extremely low, and there are only a few studies describing the occurrence of multimeric chloroplast DNA (cpDNA) genomes or interchromosomal cpDNA recombination (Govindaraju, Dancik, and Wagner 1989; Dally and Second 1990). Third, dramatic changes in gene content and structure only occurred after the chloroplast genome entered a eukaryotic cell via primary endocytobiosis (Martin et al. 1998), whereas land plant plastomes are highly conserved (Goremykin et al. 2003; Kelch, Driskell, and Mishler 2004). However, some studies indicated that the chloroplast genome in higher plants still has the potential for evolutionary changes, as indicated by a radically reduced “minimal plastid” genome (parasitic Epifagus: Wolfe, Morden, and Palmer 1992) or possible DNA recombination (lodgepole pine: Marshall, Newton, and Ritland 2001).

A summary of structural mutations in the chloroplast genome is provided by Vijverberg and Bachmann (1999), and it has been concluded that most structural mutations concern indels <10 bp. These microstructural changes have been shown to be extremely useful even in resolving deep phylogenies (Graham et al. 2000; Löhne and Borsch 2005) and have been analyzed in more detail in the chloroplast genome of Silene (Ingvarsson, Ribstein, and Taylor 2003). Structural mutations such as gene duplications among higher plant plastomes are rarely described. Those examples involve tRNA genes (e.g., Hipkens et al. 1995; Vijverberg and Bachmann 1999; Drábková et al. 2004), rpl2 and rpl23 (Bowman, Barker, and Dyer 1988), psbA (Lidholm, Szmidt, and Gustafsson 1991), and psaM (Wakasugi et al. 1994). An overview of losses of chloroplast genes in angiosperms is provided by Millen et al. (2001), and it seems obvious that most of the duplications can be manifested only in rearranged chloroplast genomes, such as those of the grasses, legumes, and conifers. Interestingly, evolutionary dynamics of the chloroplast genomes such as rearrangements and nucleotide substitution rates greatly depend on such large-scale rearrangement, for example, the loss of one copy of the inverted repeat (IR) (Palmer and Thompson 1981, 1982; Perry and Wolfe 2002). Consequently, one of the few reports of pseudogenes came from Vigna angularis (legume family) and describes a ycf2 gene duplication (Perry et al. 2002).

One of the most widely used plastidic molecular markers in plant systematics and phylogeography has been the trnT-trnF region, ever since Taberlet et al. (1991) introduced universal primers to amplify the region comprising the trnT(UGU) gene, the trnL(UAA) gene including a group I intron, the trnF(GAA) gene, and the corresponding two spacers. Interestingly, this region not only provided phylogenetic signal to resolve deep angiosperm phylogeny (e.g., Borsch et al. 2003) but also revealed extensive haplotype variation to elaborate speciation processes on the population level (e.g., Dobeš, Mitchell-Olds, and Koch 2004). The trnL-trnF genes are cotranscribed (Kanno and Hirai 1993), and therefore it can be assumed that intron as well as spacer regions are of functional importance.

Key words: Brassicaceae, trnF(GAA), pseudogenes, phylogeny, gene duplication.

E-mail: [email protected].

Mol. Biol. Evol. 22(4):1032–1043. 2005
doi:10.1093/molbev/msi092
Advance Access publication February 2, 2005

Molecular Biology and Evolution vol. 22 no. 4 © Society for Molecular Biology and Evolution 2005; all rights reserved.


The trnL group I intron resembles an ancestral intron type, which can be traced back to a single cyanobacterial endosymbiosis, and this region has been analyzed intensively (Kuhsel, Strickland, and Palmer 1990; Xu et al. 1990; Cech et al. 1992; Paquin et al. 1997; Besendahl et al. 2000; Costa, Paulstrud, and Lindblatt 2002). However, less effort has been made to understand the function and evolution of the trnL(UAA)-trnF(GAA) spacer region (Bakker et al. 2000; Hamilton, Braverman, and Soria-Hernanz 2003) and to analyze the relevance of putative promoter elements and mutational hotspots with few structural constraints for evolution and phylogenetic reconstructions (Borsch et al. 2003; Quandt et al. 2004).

It is remarkable that the only examples of trnF gene copy number variation outside the Brassicaceae have been reported from Microseris and Uropappus (Vijverberg and Bachmann 1999) and from Taraxacum (Wittzell 1999), all members of the tribe Lactuceae in the Asteraceae family, and from Juncus and Luzula in the Juncaceae family (Drábková et al. 2004). It has been concluded that polymorphic pseudogenes are not subject to purifying selection in Taraxacum, and in the closely related genera Youngia and Crepis, no pseudogenes have been observed (Wittzell 1999). This finding raises the question of whether an initial pseudogene formation could have occurred within a particular lineage of Asteraceae, followed by parallel losses and elimination of particular pseudogene copies. This remains uncertain, although Vijverberg and Bachmann (1999) already concluded that an initial duplication must have occurred in an ancestor of the genera under study and that the duplication is rather ancient. The repetitive nature of the pseudogenes is substantiated by interspersed 4-bp (AATA) motifs in Taraxacum (Wittzell 1999), and a mechanism for the generation of pseudogenes via interchromosomal recombination and intrachromosomal duplications has been proposed (Vijverberg and Bachmann 1999).

In this study we investigated the trnF(GAA) gene and its evolution in cruciferous plants. Recently, we detected extensive trnF(GAA) pseudogene formation among the cruciferous genera Cardaminopsis (meanwhile integrated into a newly defined genus Arabidopsis [O’Kane and Al-Shehbaz 1997, 2003]) and Boechera (Koch, Dobeš, and Matschinger 2003; Dobeš, Mitchell-Olds, and Koch 2004). Therefore, herein we aim (1) to reconstruct the evolutionary history of trnF pseudogenes in Brassicaceae with special emphasis on the genus Arabidopsis (which is the most diverse model known so far), (2) to analyze mutational patterns and sequence motifs in the spacer region that might provide insights into the mechanism of pseudogene formation, and (3) to evaluate the positional occurrence of complete or partial trnF pseudogenes in angiosperm chloroplast genomes in order to assess whether they form transposable elements within the plastome, to demonstrate their taxonomic distribution, and to comment on their phylogenetic utility.

Materials and Methods

The DNA sequence data of the trnL(UAA)-trnF(GAA) region were obtained from two different sources. Most of the sequences have been selected from previously published studies on crucifer systematics and evolution (table 1). All these sequences had already been deposited in GenBank, and we only refer to the corresponding publication. A second source is a large-scale study of approximately 750 sequenced accessions focusing on the phylogeography of the genus Arabidopsis, and in this study we present some selected sequences (Matschinger and Koch 2003) representing most of the variation in trnF copy number (table 1). In addition, we submitted sequence data of numerous taxa from the genera Draba and Arabis to GenBank (unpublished data, AF134196–AF134278). However, none of those sequences contained any pseudogene.

Detailed protocols of DNA isolation, polymerase chain reaction, and DNA sequencing are given in Dobeš, Mitchell-Olds, and Koch (2004), and the methods used follow standard procedures.

For halimolobine Brassicaceae and several out-groups we used the trnL(UAA)-trnF(GAA) alignment provided by Bailey, Price, and Doyle (2002) as an example to demonstrate pseudogene copy number distribution in the context of a published and robust phylogeny.

Additionally, we selected trnL-F spacer regions from numerous cruciferous taxa and several species from the order Capparales as out-groups (table 1) to cover as many genera as possible. A National Center for Biotechnology Information GenBank search (using the ENTREZ gateway and “keywords trnF and Brassicaceae” at http://www.ncbi.nlm.nih.gov/entrez/) resulted in 726 sequences. However, only a few publications and studies account for more than 99% of these sequences (Draba, Erophila, Tomostima, Cusickiella, and related taxa: Koch and Al-Shehbaz [2002] [78 sequences]; Draba, Schivereckia, Arabis: Koch unpublished [82 sequences: AF134196–AF134278]; Lepidium, Cardaria, Hymenolobus, Pritzelago, Hornungia and related taxa: Mummenhoff, Brüggemann, and Bowman [2001] [82 sequences]; Lepidium: Lee, Mummenhoff, and Bowman [2002] [58 sequences]; selection of taxa from the order Capparales: Hall, Sytsma, and Iltis [2002] [57 sequences in total, 11 from Brassicaceae]; halimolobine Brassicaceae: Bailey, Price, and Doyle [2002] [47 sequences]; Rorippa and Nasturtium: Bleeker, Weber-Sparenberg, and Hurka [2002] [35 sequences]; Rorippa: Bleeker and Hurka [2001] [76 sequences, characterizing haplotypes from 359 individuals]; Cardamine, Rorippa: Bleeker et al. [2002] [24 sequences]; Cardamine: Lihová et al. [2004] [76 sequences]; selection of taxa from tribe Brassiceae: Yang et al. [2002] [12 sequences]; Brassica relatives and Diplotaxis: Lanner [1998] [15 sequences]; Noccaea, Raparia, and Microthlaspi: Koch and Al-Shehbaz [2004] [26 sequences]; Boechera, Cusickiella and related taxa: Dobeš, Mitchell-Olds, and Koch [2004] [103 sequences, characterizing haplotypes from 654 accessions]). The taxa are summarized in table 1.
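The coverage statement above can be checked with simple arithmetic by tallying the per-study sequence counts quoted in the text against the 726 GenBank hits. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original study, and it assumes that only the 11 Brassicaceae sequences of Hall, Sytsma, and Iltis (2002) are counted toward the keyword search.

```python
# Sequence counts per study, as quoted in the paragraph above.
# Assumption: for Hall, Sytsma, and Iltis (2002) only the 11 Brassicaceae
# sequences (of 57 in total) are counted.
SEQUENCES_PER_STUDY = {
    "Koch and Al-Shehbaz 2002": 78,
    "Koch unpublished": 82,
    "Mummenhoff et al. 2001": 82,
    "Lee et al. 2002": 58,
    "Hall et al. 2002 (Brassicaceae only)": 11,
    "Bailey et al. 2002": 47,
    "Bleeker, Weber-Sparenberg, and Hurka 2002": 35,
    "Bleeker and Hurka 2001": 76,
    "Bleeker et al. 2002": 24,
    "Lihova et al. 2004": 76,
    "Yang et al. 2002": 12,
    "Lanner 1998": 15,
    "Koch and Al-Shehbaz 2004": 26,
    "Dobes et al. 2004": 103,
}

TOTAL_GENBANK_HITS = 726  # result of the ENTREZ keyword search

covered = sum(SEQUENCES_PER_STUDY.values())
fraction = covered / TOTAL_GENBANK_HITS

print(f"{covered} of {TOTAL_GENBANK_HITS} sequences ({100 * fraction:.1f}%)")
```

Under this assumption the listed studies account for 725 of the 726 sequences, which is consistent with the statement that they comprise more than 99% of the hits.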

TrnF(GAA) Pseudogene Recognition and Copy Number in Arabidopsis thaliana

We used the GenBank accession of the chloroplast genome of Arabidopsis thaliana (AP000423) to select the corresponding trnL(UAA)-trnF(GAA) region (bp positions 46894–48247; with 46894–46928 and 47441–47490 for the two exons of trnL, and 48175–48247 for the trnF gene). Initially, we screened this region using the central anticodon domain of the trnF gene (fig. 1). After the recognition of six pseudogenic copies of this anticodon domain (region D) in the trnL-F spacer region, an alignment of these multiplicated copies and their flanking sequences (regions A–C, E) was done manually (fig. 2a). A blast search in the whole chloroplast genome of A. thaliana using these anticodon copies as query revealed the trnF gene as the only possible source of the pseudogenes.
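The positions quoted above follow the GenBank convention of 1-based, inclusive coordinates. As a minimal sketch (Python; the helper names `feature_length` and `extract` are hypothetical and not taken from the original study), the feature lengths implied by these coordinates, and the way such an interval maps onto a 0-based string slice, can be written as:

```python
# Feature coordinates as quoted in the text (GenBank AP000423),
# interpreted as 1-based, inclusive positions.
FEATURES = {
    "trnL exon 1": (46894, 46928),
    "trnL exon 2": (47441, 47490),
    "trnF gene": (48175, 48247),
    "analyzed region": (46894, 48247),
}


def feature_length(start, end):
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


def extract(sequence, start, end):
    """Return the 1-based, inclusive interval from a Python string."""
    return sequence[start - 1:end]


LENGTHS = {name: feature_length(*pos) for name, pos in FEATURES.items()}

for name, length in LENGTHS.items():
    print(f"{name}: {length} bp")
```

With this convention the two trnL exons sum to 85 bp and the trnF gene spans 73 bp. The `extract` helper would be applied to the full plastome sequence, which is not reproduced here.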

In order to obtain information about additional co-occurrence of sequences similar to the flanking regions A–C in the chloroplast genome, we searched for exact matches of highly conserved 7- to 8-bp fragments (fig. 2a) within the A. thaliana chloroplast genome.
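An exact-match search of this kind can be sketched as follows. This is an illustration only, written in Python with a toy sequence and a made-up 5-bp motif; it is not the procedure or the motifs used in the original study. It assumes that both strands are of interest and reports 1-based start positions on the forward strand.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_exact(genome, motif):
    """Return sorted (position, strand) pairs for all exact matches.

    Positions are 1-based starts on the forward strand; overlapping
    matches are reported. A palindromic motif is reported on both strands.
    """
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        start = genome.find(query)
        while start != -1:
            hits.append((start + 1, strand))
            start = genome.find(query, start + 1)  # step by 1 to allow overlaps
    return sorted(hits)


# Toy example (hypothetical sequence and motif)
toy_genome = "TTGACCAATTGGTCAA"
print(find_exact(toy_genome, "GACCA"))
```

Stepping the search by one position rather than by the motif length keeps overlapping occurrences, which matters for short motifs in repetitive regions.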

Regions A–C were also used for subsequent blast searches against the whole chloroplast genome to identify similarly modularized DNA sequences.

TrnF Pseudogenes in the Genus Arabidopsis-Cardaminopsis

A selected number of 11 sequences from the trnL(UAA)-trnF(GAA) region of different Arabidopsis species (the former genus Cardaminopsis, for details refer to O’Kane and Al-Shehbaz 1997) are presented here (table 1). The recognition of duplicated sequences has been performed taking advantage of the results from A. thaliana (fig. 2a), and a corresponding alignment has been generated manually (fig. 1, Supplementary Material online). For a deeper understanding of copy number evolution within the genus Arabidopsis, we separated each single pseudogene copy from each Arabidopsis sequence, aligned them accordingly (fig. 2b), and performed a phylogenetic analysis using a parsimony approach (PAUP4.0b10, Swofford 2000) with the heuristic search settings using the tree-bisection-reconnection option and using the option

Table 1
Distribution of trnF Pseudogenes Among Cruciferous Plants and Number of Multiplicated trnF Anticodon Domains

Plant material investigated herein (taxon, accession or haplotype: origin; number of pseudogenes, GenBank accession)
  Arabis soyeri    0 pseudogenes, AY134271
  Arabis alpina    0 pseudogenes, AY034180
  Cardaminopsis halleri (Card0022): Austria    6 pseudogenes, AY665575
  Cardaminopsis arenosa (Haplotype20): Slovakia    5 pseudogenes, AY665576
  Cardaminopsis arenosa (Card0207): Austria    5 pseudogenes, AY665577
  Cardaminopsis petrogena (Haplotype44): Romania    5 pseudogenes, AY665578
  Cardaminopsis petraea (Card0280): Scotland    3 pseudogenes, AY665579
  Cardaminopsis croatica (967): Croatia    3 pseudogenes, AY665580
  Cardaminopsis arenosa (Card0189): Austria    6 pseudogenes, AY665581
  Cardaminopsis petraea (Card0154): Austria    1 pseudogene, AY665582
  Cardaminopsis petraea (Haplotype32): Austria    4 pseudogenes, AY665583
  Cardaminopsis arenosa (Haplotype0199): Austria    7 pseudogenes, AY665584
  Cardaminopsis halleri (Card0086): Romania    7 pseudogenes, AY665585

Yang et al. (2002)
  Brassica (6 taxa)    0 pseudogenes
  Raphanus sativus    0 pseudogenes
  Sinapis alba    0 pseudogenes
  Lepidium virginicum    2 pseudogenes

Lanner (1998)
  Brassica (14 taxa)    0 pseudogenes
  Diplotaxis erucoides    0 pseudogenes

Bailey, Price, and Doyle (2002)
  Sphaerocardamum (8 taxa)    1–2 pseudogenes
  Pennellia (2 taxa)    2 pseudogenes
  Nerisyrenia linearifolia    1 pseudogene
  Mancoa (4 taxa)    2–3 pseudogenes
  Lesquerella fendleri    1 pseudogene
  Halimolobus (11 taxa)    1–3 pseudogenes
  Cusickiella douglasii    1 pseudogene
  Capsella bursa-pastoris    5 pseudogenes
  Arabis tricornuta    1 pseudogene
  Boechera stricta    1 pseudogene
  Lepidium campestre    3 pseudogenes

Dobeš, Mitchell-Olds, and Koch (2004)
  Boechera (3 taxa)    1–3 pseudogenes

Matschinger and Koch (2003)
  Cardaminopsis, Arabidopsis (6 species)    1–8 pseudogenes

Hall, Sytsma, and Iltis (2002)
  Aethionema (2 taxa)    0 pseudogenes
  Iberis (2 taxa)    0 pseudogenes
  Stanleya pinnata    0 pseudogenes
  Barbarea vulgaris    (min. 1 pseudogene)
  Capsella bursa-pastoris    (min. 1 pseudogene)
  Nasturtium officinale    (min. 1 pseudogene)
  Sisymbrium altissimum    0 pseudogenes
  Thlaspi arvense    0 pseudogenes
  Thellungiella salsuginea    0 pseudogenes
  All other taxa from the order Capparales    0 pseudogenes

Mummenhoff, Brüggemann, and Bowman (2001)
  Hymenolobus procurrens    2 pseudogenes
  Hornungia petraea    1 pseudogene
  Pritzelago alpina    1 pseudogene
  Cardaria (3 taxa)    2 pseudogenes
  Lepidium (70 taxa)    1–4 pseudogenes

Lee, Mummenhoff, and Bowman (2002)
  Lepidium (43 taxa)    1–4 pseudogenes

Bleeker et al. (2002)
  Rorippa (2 taxa)    2–4 pseudogenes
  Cardamine (15 taxa)    (min.) 1–4 pseudogenes

Bleeker and Hurka (2001)
  Rorippa (3 taxa)    2–4 pseudogenes

Bleeker, Weber-Sparenberg, and Hurka (2002)
  Rorippa (24 taxa)    1–5 pseudogenes
  Nasturtium (2 taxa)    2 pseudogenes

Lihová et al. (2004)
  Cardamine (22 taxa)    (min.) 2–6 pseudogenes

Koch and Al-Shehbaz (2002)
  Draba (66 taxa)    0 pseudogenes
  Erophila (2 taxa)    0 pseudogenes
  Tomostima (4 taxa)    0 pseudogenes
  Cusickiella (2 taxa)    1 pseudogene

Koch and Al-Shehbaz (2004)
  Noccaea (24 taxa)    0 pseudogenes
  Microthlaspi perfoliatum    0 pseudogenes
  Raparia bulbosa    0 pseudogenes

NOTE. Either the GenBank accession code of the plant material investigated herein is provided or the corresponding phylogenetic study has been cited.


GAPMODE = MISSING. No additional gap coding has been performed (e.g., as binary character) to minimize a bias caused by the alignment of the multiple sequences itself. The bootstrap option of PAUP (1,000 replicates) was used to assess relative support in the unweighted parsimony analysis.
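The two ideas in this paragraph, treating gaps as missing data and bootstrapping over alignment columns, can be illustrated with a minimal sketch. The code below is Python and is not PAUP; the function names and the toy alignment are hypothetical, and the tree search that PAUP performs on each replicate is deliberately left out.

```python
import random


def gaps_as_missing(alignment):
    """Recode gap characters ('-') as missing data ('?')."""
    return {name: seq.replace("-", "?") for name, seq in alignment.items()}


def bootstrap_columns(alignment, rng):
    """Draw one bootstrap replicate by resampling columns with replacement.

    The replicate has the same taxa and the same number of columns as the
    input; each column is copied intact from a randomly chosen position.
    """
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical data, three taxa, five columns)
toy = {"copy_I": "AC-GT", "copy_II": "ACTGT", "copy_III": "A--GA"}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_columns(gaps_as_missing(toy), rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

In a full analysis each of the 1,000 replicates would be subjected to the same parsimony search, and the support for a clade would be the proportion of replicate trees that contain it.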

Halimolobine Brassicaceae

The anticodon domain sequence from A. thaliana has also been used to recognize duplicated copies in a recently published study on systematics of halimolobine Brassicaceae (Bailey, Price, and Doyle 2002). We used the original alignment to demonstrate anticodon domain copy number distribution and its correspondence to the published phylogenetic hypothesis, which is based not only on the trnL-F region but is also supported by sequence data of the internal transcribed spacers of nuclear ribosomal DNA (ITS 1 and ITS 2) and the pistillata intron.

Comparisons Within the Brassicaceae Family

For all trnL-trnF sequences as summarized in table 1 we analyzed (1) the occurrence of duplicated sequences of the anticodon domain of the trnF(GAA) gene and (2) the occurrence of the several motifs (A–C, E) as characterized in A. thaliana (fig. 2a). The results were compared with a phylogeny of the whole family. The most comprehensive available phylogeny based on a multiple data set (internal transcribed spacers 1 and 2 of nuclear ribosomal DNA, maturase K, alcohol dehydrogenase, and chalcone synthase; with several genes missing for numerous taxa in various combinations) has been published recently (Koch 2003). However, a more robust phylogenetic framework with fewer taxa but clocklike evolution of the corresponding molecular markers (nuclear-encoded chalcone synthase and alcohol dehydrogenase, and plastidic maturase K) has been elaborated (Koch, Haubold, and Mitchell-Olds 2000, 2001), and these phylogenetic frameworks with their corresponding divergence time estimates have been used to show a time frame for the first occurrence of trnF gene duplications. For methods of calibrating the molecular clock and computing divergence time estimates refer to Koch, Haubold, and Mitchell-Olds (2000, 2001).

FIG. 1.—Nucleotide sequence and secondary structure of the Arabidopsis thaliana trnF gene. Secondary structuring of the DNA is indicated by symbols f–j and f′–j′.

FIG. 2.—(a) Nucleotide sequence of the Arabidopsis thaliana trnL (second exon), the trnL-F intergenic spacer region, and part of the trnF gene. The duplicated copies have been scored from I to VIII, and the different regions have been named A–E. Copies IV–VI refer to the copies found in the other Arabidopsis species investigated, but they are not, or only partially, present in A. thaliana (fig. 1 and (b) of this figure). *, # search motifs and matches for a blast search within the whole plastome (refer to table 2). + search motifs against the whole plastome (refer to table 2). (b) Alignment of eight types of the trnF pseudogene copies demonstrating single nucleotide and indel polymorphism observed in the trnL-F region of Arabidopsis-Cardaminopsis. The results of a phylogenetic analysis based on this alignment are shown in figure 5. Designation of the several regions (A–E) follows (a) of this figure.

Results

TrnF Pseudogenes in A. thaliana

In A. thaliana we characterized six multiple sequences in total within the 668-bp trnL(UAA)-trnF(GAA) intergenic spacer with the trnF(GAA) anticodon domain as the most highly conserved element (fig. 2a). These duplicated sequences have been enumerated as copies I to VIII. Copies IV–VI refer to the copies found in the other Arabidopsis species investigated, but they are not, or only partially, present in A. thaliana (fig. 2b, fig. 1, Supplementary Material online).

However, the neighboring regions of the anticodon-like sequence (indicated as regions A–C, E) showed only low similarity to the different regions of the trnF(GAA) gene (acceptor stem, D domain, and T domain [fig. 1]), with the exception of 3–6 base pairs at the 5′- and 3′-flanking regions of the D and T domains (fig. 2a).

Of particular interest is a common AGTA motif and its modifications (ATTA, AGGA, CGTA, GGTA), which is frequently found at the 5′ end of the different duplicated regions A–C (fig. 2a).
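Each of the listed modifications differs from AGTA by a single substitution, so the motif family can be described as all 4-mers within Hamming distance 1 of AGTA that were actually observed. The following minimal sketch (Python; the function names and the toy sequence are hypothetical and not from the original study) verifies this and shows how such a one-mismatch scan could be written.

```python
CORE = "AGTA"
VARIANTS = ["ATTA", "AGGA", "CGTA", "GGTA"]  # modifications listed in the text


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_near(seq, core, max_mismatch=1):
    """Return (1-based position, match, mismatches) for every window of
    len(core) that differs from core at no more than max_mismatch sites."""
    k = len(core)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, core)
        if d <= max_mismatch:
            hits.append((i + 1, window, d))
    return hits


DISTANCES = {v: hamming(CORE, v) for v in VARIANTS}
print(DISTANCES)

# Toy sequence (hypothetical) containing the core and one variant
print(find_near("CCAGTACCATTACC", CORE))
```

Note that a one-mismatch scan with a 4-bp core is very permissive in a real plastome, so hits of this kind would only be meaningful in combination with their position relative to regions A–C.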

Multiple trnF Pseudogene Copies in Cardaminopsis-Arabidopsis

The twelve different trnL-F spacer sequences of the several Arabidopsis species revealed 2–8 (table 1) pseudogenic copies among the different species (figs. 1 and 2, Supplementary Material online). The most similar copy to the trnF gene is pseudogene no. VII (fig. 5a and b and table 3). However, this copy is not present in two Cardaminopsis haplotypes analyzed herein (fig. 1, Supplementary Material online). Such losses of a particular pseudogene were found for all of the eight pseudogene copies in one or another accession. It has been shown recently that A. thaliana and the remaining representatives of the newly defined genus Arabidopsis (former genus Cardaminopsis) diverged from each other roughly 5.8 MYA (Koch, Haubold, and Mitchell-Olds 2001), which provides a good time frame for the evolution of the several copies differing in their modularized structure of regions A–E (fig. 2a).

A parsimony analysis using all pseudogene copies separately provides some more detailed evidence for their evolutionary history (fig. 3a and b). Copies I, VI, VII, and VIII of A. thaliana cluster together with the corresponding copies of Cardaminopsis (see alignment in fig. 1, Supplementary Material online), indicating that these copies had existed prior to the split of the two phylogenetic lineages 5.8 MYA. This is also supported by the fact that at least copies I, VII, and VIII have identical or very similar gap positions (not included in our analysis). The only exception is copy VI, of which only part of the A. thaliana sequence (regions A′, C, D, and E, refer to fig. 2a) is homologous to the Cardaminopsis copy VI sequences. Cardaminopsis copy VI served as the source for copy V type 2. The parsimony analysis also indicated that copies II, III, and IV of Cardaminopsis most likely evolved independently from the most similar copy VIII. Copy V type 1 is identical to copy V type 2 in its structural alignment and gap information; however, phylogenetic analysis based on single nucleotide polymorphisms placed this copy close to copy IV from Cardaminopsis. This is best explained by two independent duplication events.

Consequently, A. thaliana copies II and III evolved independently from Cardaminopsis copies II and III. A schematic summary of pseudogene copy evolution based on parsimony analysis is provided in figure 5c.

FIG. 3.—TrnF pseudogene copy number evolution among Arabidopsis species (Arabidopsis thaliana and former Cardaminopsis). For details of the alignment refer to figure 2b. (a) 50% Majority Rule Consensus Tree based on regions C, D, E only (tree length 44, consistency index [CI] 0.62, 88 trees). (b) 50% Majority Rule Consensus Tree based on the entire region A–E (tree length 82, CI 0.64, 1,000 trees). (c) Hypothetical model of successive copy number evolution in A. thaliana and members of the former genus Cardaminopsis.


Occurrence and Distribution of Pseudogenes Among Cruciferous Plants

Duplicated copies of the trnF(GAA) anticodon domain have been detected in numerous genera of the mustard family, and a summary is given in table 1. We obtained the original alignments from most of the studies listed in table 1, and we were able to search for the duplicated regions directly within these alignments. These searches revealed several findings. The alignment of a phylogenetic study of the order Capparales (Hall, Sytsma, and Iltis 2002) revealed that a few sequences (Nasturtium, Barbarea, and Capsella) ended at the 3′ end with the first trnF(GAA) pseudogene copy, and the authors did not provide the entire sequence of the trnL-F spacer region including the “true” trnF(GAA) gene. However, this had no effect on their conclusions and results on Capparales systematics. In our study we could only estimate a minimum number of pseudogene copies from these three sequences. Fortunately, all these species have been included in other studies and have been analyzed on a broader scale (e.g., Bailey, Price, and Doyle 2002; Bleeker et al. 2002). A similar situation has been found in Cardamine (Lihová et al. 2004). The alignment of the spacer region ended with a pseudogene and not with the trnF(GAA) gene as indicated, and here we also provide a minimum number of pseudogenes for the taxa analyzed.

In many other cases we found no duplicated anticodon domains (e.g., Draba, Arabis, Noccaea, and others; table 1). Interestingly, in all cases lacking pseudogenes we also did not find any of the repetitive motifs B, C, and E in the corresponding trnL(UAA)-trnF(GAA) spacer region. However, the prominent motif A/A′ is always present in close 5′ proximity to the functional trnF gene.

The distribution of the pseudogenic trnF(GAA) tandem repeats is fully congruent with previously published phylogenies (fig. 4) of the Brassicaceae family (Koch 2003; summarized in Koch, Al-Shehbaz, and Mummenhoff 2003), and it is evident that a first pseudogene copy evolved only once at the base of a highly supported monophyletic lineage (fig. 4). This is the first time that a reliable marker (molecular or morphological) has been described that separates this taxonomically notoriously difficult family with a relatively deep split in time of approximately 18.5 MYA (mean estimate of matK and chs from node A, fig. 4) to 16 MYA (mean estimate of matK and chs from node B, fig. 4). Divergence time estimates have been redrawn from previous investigations (Koch, Haubold, and Mitchell-Olds 2001).

However, taxa without pseudogenes remain paraphyletic with respect to the pseudogene-carrying taxa.
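The parsimony reasoning behind a single origin can be illustrated with a minimal sketch of Fitch parsimony for one binary character (pseudogene present or absent). The tree topologies and taxon labels below are hypothetical placeholders and are not taken from figure 4. The sketch only shows that a carrier clade nested within a grade of non-carriers requires one change, whereas a scattered distribution requires more.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns (possible ancestral states, minimum number of state changes).
    """
    if isinstance(tree, str):              # leaf: use its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:                             # children agree: no extra change
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical taxa: 1 = pseudogene present, 0 = pseudogene absent.
states = {
    "lacking_1": 0, "lacking_2": 0,
    "carrier_1": 1, "carrier_2": 1, "carrier_3": 1,
}

# Carriers form one clade nested within a paraphyletic grade of non-carriers.
nested = ("lacking_1", ("lacking_2", ("carrier_1", ("carrier_2", "carrier_3"))))
# Carriers and non-carriers are interleaved (hypothetical alternative).
scattered = (("carrier_1", "lacking_1"), ("carrier_2", "lacking_2"))

_, changes_nested = fitch(nested, states)
_, changes_scattered = fitch(scattered, states)
print(changes_nested, changes_scattered)
```

Fitch parsimony counts unordered changes, so it does not by itself distinguish a gain from a loss.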

The Example ‘‘Halimolobine’’ Brassicaceae

The analysis of the alignment provided by Bailey, Price, and Doyle (2002) revealed varying numbers of pseudogenic anticodon domains, from 1 to 6 (fig. 5). Enumeration of pseudogene anticodon copies followed their occurrence within the alignment and has been adapted to the copy enumeration I to VIII in A. thaliana and Cardaminopsis. This analysis demonstrates that all copies (except for copy 1) either have arisen independently several times or have been lost several times in parallel throughout their evolution. Interestingly, the different regions A, A′, B, and C are present at the 5′ end of the first pseudogene copy in all taxa carrying the pseudogenes. A comparison with taxa that do not carry a pseudogene copy demonstrates that among all cruciferous taxa analyzed herein only region A is highly conserved, and regions B, A′, and C are missing in species that do not carry pseudogenes (data not shown).

TrnF(GAA) Pseudogene Evolution in Angiosperms

We screened the trnL-trnF alignment of Borsch et al. (2003), which covers all major groups of angiosperms, for trnF pseudogene (or partial anticodon domain) insertions, and none of these taxa contained any duplications. In addition, we screened this alignment for the regions A, B, and C that occur in all cruciferous taxa showing anticodon domain duplications. However, none of these regions could be identified with significant sequence identity among the noncruciferous taxa analyzed by Borsch et al. (2003).

This is also true for those Asteraceae (Microseris, Uropappus, Taraxacum) that represent the only examples of trnF gene copy number variation outside the Brassicaceae (Vijverberg and Bachmann 1999; Wittzell 1999). However, in these cases the entire trnF gene has been duplicated, which is in sharp contrast to the Brassicaceae, with extensive duplication of the trnF anticodon domain only.

Discussion

TrnF Pseudogene Characterization in Cruciferous Plants

The trnF(GAA) pseudogenes from cruciferous plants are quite different from those characterized in Microseris and Uropappus (Vijverberg and Bachmann 1999). In these species the whole gene including both acceptor stem regions has been tandemly duplicated, with a sequence identity to the original trnF gene varying from 88% to 99%. A similar situation was found in Taraxacum (Wittzell 1999), with a sequence identity ranging from 80% to 92%. In contrast, in A. thaliana several different repetitive motifs occur (A–C, E), as indicated in figure 2a, which are not part of the functional trnF gene. It is notable that these motifs are also conserved among a variety of different taxa exclusively characterized by anticodon domain duplications, as shown by the halimolobine species data set (fig. 2, Supplementary Material online, e.g., alignment positions 966–1072). The majority of these motifs are not found in trnL-F spacer regions of cruciferous plants lacking such duplications. The only exception is the 5′ region A/A′. This motif of 22 bp (fig. 2a) is present in all trnL-F spacer regions in closest proximity to the functional trnF gene. BLAST searches for regions A, B, and C against the whole chloroplast genome sequence of A. thaliana (AP000423) revealed no significant hits, with the exception of parts of region A matching parts of the rps7 gene and its neighboring trnV-rps7 spacer (table 2 and fig. 2a). Interestingly, the situation changes when we select shorter motifs (7–8 bp) from regions A, B, and C to search for identical motifs throughout the A. thaliana chloroplast genome (table 2 and fig. 2a). As expected because of the shorter search strings, the number of hits increased greatly. It is also obvious from the summary scores in table 2 that the single hits are randomly distributed all over the plastome, which in this case is 154,478 bp in size. However, the three selected 7- to 8-bp motifs revealed a significant nonrandom clustering. Out of 25 hits in total, 13 co-occur in similar regions of the plastome, a finding for which we have no explanation so far. From these results we can conclude that (1) the flanking sequence regions A–C of the trnF(GAA) anticodon domain are unique and have not been simply transferred from other regions of the plastome and (2) the occurrence of region A in all cruciferous taxa regardless of any anticodon domain duplication provides evidence that the duplicated sequences resulted from rearrangement of the trnF gene and its neighboring areas. However, the finding that the only significant matches of the BLAST search concern region A (table 2) and, moreover, that these matches are found in a coding gene (rps7) and its neighboring spacer region (spacer trnV-rps7) might indicate that a sequence like that of region A drove the first corresponding duplication.
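The short-motif search described above can be sketched as an exact string search on both strands, followed by a simple proximity test for co-occurrence. This is an illustration only: the toy sequence, the function names, and the window size are assumptions, and the sketch does not reproduce the procedure or the co-occurrence criterion underlying table 2. Only the three motif sequences are taken from table 2.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (case-insensitive input)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def find_motif_hits(genome, motif):
    """Sorted (1-based start, strand) pairs of exact matches on both strands."""
    genome, motif = genome.lower(), motif.lower()
    targets = [(motif, "+")]
    rc = reverse_complement(motif)
    if rc != motif:                        # avoid double counting palindromes
        targets.append((rc, "-"))
    hits = []
    for pattern, strand in targets:
        start = genome.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = genome.find(pattern, start + 1)   # allow overlapping hits
    return sorted(hits)


def co_occurring(hits_by_motif, window):
    """Hits that have a hit of a different motif within `window` bp."""
    flat = [(name, pos) for name, found in hits_by_motif.items()
            for pos, _strand in found]
    clustered = [
        (name, pos)
        for name, pos in flat
        if any(other != name and abs(pos - other_pos) <= window
               for other, other_pos in flat)
    ]
    return sorted(clustered, key=lambda hit: hit[1])


# Motif sequences from table 2; everything else below is hypothetical.
MOTIFS = {"A": "atacttc", "B": "agtagatt", "C": "catagctt"}
toy_plastome = ("cc" + "atacttc" + "gg" + "agtagatt" + "c" * 40
                + reverse_complement("catagctt") + "gg")

hits = {name: find_motif_hits(toy_plastome, seq) for name, seq in MOTIFS.items()}
print(hits)
print(co_occurring(hits, window=20))       # window size is an assumption
```

On a real plastome sequence the same functions could be applied to the full sequence string, with the window chosen to match whatever definition of "similar regions" is adopted.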

FIG. 4.—Phylogenetic relationships among cruciferous plants based on chs and matK sequence data (redrawn from Koch, Haubold, and Mitchell-Olds 2001). Some genera have been added according to Koch (2003), and their phylogenetic position is indicated by a dotted line. Filled circles indicate taxa with trnF pseudogenes. Taxa marked with open circles have been shown to contain no pseudogenes.


A comparison of the distribution of trnF anticodon duplications among cruciferous plants implies a single origin of an initial duplication within a monophyletic lineage (fig. 4). The dates of divergence between nodes A (approximately 18 MYA) and B (approximately 16 MYA) provide time estimates for this origin (Koch, Haubold, and Mitchell-Olds 2000, 2001). The phylogenetic hypothesis shown in figure 4 comprises only a limited set of taxa. However, the distribution of anticodon duplications among cruciferous plants is also fully consistent with a large-scale phylogeny provided recently (Koch 2003; not shown here).

The consistent co-occurrence of flanking regions with duplicated anticodon domains can be studied in some more detail using the halimolobine crucifer data set as an example (Bailey, Price, and Doyle 2002). A. thaliana anticodon pseudogene copy 1 (fig. 2a) is found in all species included in this study (fig. 5; cf. fig. 2, Supplementary Material online: alignment positions 966–1100). Pairwise sequence identity of this pseudogene copy 1 among the different species is always higher than its identity to the original trnF gene (data not shown), which also provides good evidence for the monophyletic origin of the first pseudogene copy.
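The identity comparison used in this argument can be sketched as follows. The three aligned sequences are short hypothetical strings, not data from the alignment, and the convention of skipping every column that contains a gap is an assumption; other gap treatments would give slightly different values.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical bases over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:           # skip columns containing a gap
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return matches / compared


# Hypothetical aligned fragments (illustration only).
copy1_species_1 = "GATTCAAGTC-CTCTATC"
copy1_species_2 = "GATTCAAGTCACTCTGTC"
functional_trnF = "GATTGAAGTC-CCCTAAC"

between_species = pairwise_identity(copy1_species_1, copy1_species_2)
to_functional = pairwise_identity(copy1_species_1, functional_trnF)
print(round(between_species, 3), round(to_functional, 3))
```

A pattern in which copies of the same pseudogene from different species are more similar to one another than to the functional gene is what would be expected if the copy arose once and was inherited.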

In addition to the duplicated pseudogenes (anticodon domains) and the neighboring regions A–E, we were also able to characterize promoter elements that show high similarity to a putative sigma70-type bacterial promoter motif (−35 TTGACA/−10 GAGGAT) (Quandt et al. 2004). In a comprehensive study across land plants, this motif has been found consistently (Quandt et al. 2004), and it has been speculated that this promoter represents the original trnF(GAA) gene promoter.

FIG. 5.—Phylogenetic relationships among “halimolobine” crucifers and out-group taxa as published by Bailey, Price, and Doyle (2002). The occurrence of multiplied trnF anticodon domains is indicated.


However, it has been concluded that the trnF(GAA) gene is cotranscribed with trnL(UAA) (Kanno and Hirai 1993), and consequently the −35 TTGACA/−10 GAGGAT promoter motif in front of the trnF(GAA) gene should be nonfunctional. Our data largely support this conclusion because the −10 element and the −35 element are present in several trnL-F spacer sequences of the genus Arabidopsis (fig. 1, Supplementary Material online: positions 198–203 and 900–905), and all duplications are inserted between these two elements. Consequently, it is hard to believe that they are still functional.
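The spacing argument can be made explicit with a small calculation. The two position ranges are the alignment positions quoted above. The 16 to 19 bp spacer range is a general textbook figure for bacterial sigma70-type promoters and is an assumption here, not a value from this study. Because alignment columns include gaps, the distance in any individual sequence will be shorter than the column count.

```python
def gap_between(first, second):
    """Number of alignment columns strictly between two inclusive (start, end) ranges."""
    (a_start, a_end), (b_start, b_end) = sorted([first, second])
    return max(0, b_start - a_end - 1)


# Alignment positions of the two promoter elements as quoted in the text.
ELEMENT_1 = (198, 203)
ELEMENT_2 = (900, 905)

# Assumption: typical spacer length between -35 and -10 elements of
# bacterial sigma70-type promoters (general knowledge, not from this study).
TYPICAL_SPACER = range(16, 20)

spacer = gap_between(ELEMENT_1, ELEMENT_2)
compatible = spacer in TYPICAL_SPACER
print(spacer, compatible)
```

Under these assumptions the two elements are separated by several hundred alignment columns, far more than the spacing usually associated with a working promoter of this type.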

TrnF Pseudogene Copy Number Evolution: The Genus Arabidopsis

We can only speculate about the mode of origin of the first pseudogene copy, which dates back roughly 17 MYA. However, the example from Cardaminopsis-Arabidopsis provides some more detailed insights into the dynamics of subsequent copy number evolution (figs. 3a–c). In all eight cases the newly arisen copy was placed between already existing pseudogenes (fig. 3c), and they did not move further downstream of the 5′ end of copy I. The parsimony analysis did not recover all groups with high bootstrap support, but tree topologies are congruent when different proportions of the total pseudogene region are selected (fig. 3a vs. fig. 3b), which might indicate that in most cases the total region has been subjected to several duplication events. However, we cannot exclude additional recombination events, and the example of copy V might indicate such a situation: relative position and gap structure are totally conserved between both copies (fig. 2b), but parsimony analysis does not recognize them as orthologues (fig. 3a and b).

Similarly, genetic distances are not always congruent with our hypothesis of trnF pseudogene evolution (table 3). One might expect that if we regard copy I as the ancestral type, this copy must show the highest sequence distance when compared to the original functional gene. This is not the case, and copies VI and VIII show significantly higher distance values than copies I or VII. Our sequence distance values provide a mutation rate for regions C–E (fig. 2a) varying between 2.4 × 10⁻⁸ and 3.8 × 10⁻⁸ mutations/site/year. However, these values exceed the normal mutation rate of the entire trnL-intron–trnL-F spacer region by a factor of 20 (3.6 × 10⁻⁹ to 7.7 × 10⁻⁹, e.g., calculated in Mummenhoff et al. 2004), which can be at least partly explained by an increase of the mutation rate of single nucleotides by structural mutations such as recombination resulting in new copies. From our data it might also be speculated that the highly conserved 5′ region of the first pseudogene copy (as well as of the 3′ part and, by selection, the trnF gene) might be the consequence of these regions not being prone to recombination, in contrast to the region in between.

However, further research is needed to understand the underlying evolutionary mechanisms.
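The comparison of copy-specific distances can be retraced directly from the values in table 3. The sketch below only ranks the published means; the dictionary layout and variable names are illustrative choices, and no new analysis is implied.

```python
# Mean pairwise distance to the functional trnF gene and standard deviation,
# transcribed from table 3 (regions C-E).
TABLE_3 = {
    "I": (0.160, 0.012),
    "II": (0.195, 0.029),
    "III": (0.138, 0.010),
    "IV": (0.206, 0.005),
    "V": (0.205, 0.030),
    "VI": (0.222, 0.016),
    "VII": (0.133, 0.026),
    "VIII": (0.214, 0.048),
}

# Rank copies from most to least divergent from the functional gene.
ranked = sorted(TABLE_3, key=lambda copy: TABLE_3[copy][0], reverse=True)

# Copies whose mean distance exceeds that of copy I, the presumed oldest copy.
more_divergent_than_I = [c for c in ranked if TABLE_3[c][0] > TABLE_3["I"][0]]

print(ranked)
print(more_divergent_than_I)
```

If copy I were simply the oldest copy evolving at the same rate as the others, it would be expected at the top of this ranking, which is not what the table shows.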

Phylogenetic Utility of trnF Pseudogenes

It has to be mentioned here that the evolutionary history of the Brassicaceae on a family-wide scale is still poorly understood (Koch 2003; Koch, Al-Shehbaz, and Mummenhoff 2003). The most important conclusion of

Table 2
Distribution of Short DNA Motifs A, B, C (Refer to Fig. 2a) Within the Arabidopsis thaliana Chloroplast Genome (the Position Is Given in bp)

Motifᵃ                  Match Search (7–8 bp)             Blast Search (Total Region A 22 bp, 13 and 12 bp)ᵇ
Region A atacttc:ᵃ      Spacer trnR-atpA     9811ᶜ
                        Spacer trnC-ycf6     27858
                        Spacer ycf6-psbM     28181
                        Intron ycf3          43127ᵈ
                        Spacer trnF          47704ᵉ
                        psbG                 49632
                        atpB                 52836
                        Intron clpP          71591
                        petD                 77448ᵉ
                        rpl22                83910
                        Intron rpl2          85065ᵉ
                        ycf3                 89331ᵉ
                                             92400ᵉ
                        Intron trnI          103405
                        Gene psaC            117334
                        Gene ndhI            119555
                        Spacer ndhB-trnL     144208
                                                          Spacer trnV–rps7     138589, 138295
                                                                               100354, 100060
                                                          rps7 gene            140925, 97724
Region B agtagatt:ᵃ     Intron petD          76681ᵉ
                        Intron rpl2          84775ᵉ
                        Intron atpF          11944
                        Spacer trnT-psbD     31791
                        Intron ycf3          43216ᵈ
Region C catagctt:ᵃ     atpA                 10334ᶜ
                        ycf3                 43692ᵈ
                        Spacer trnN-trnR     130189ᶜ

ᵃ Sequence of the three motifs from region A–C. The position is indicated in figure 2a.
ᵇ The position is indicated in figure 2a.
ᶜ Similar matching of regions A and C.
ᵈ Similar matching of regions A–C.
ᵉ Similar matching of regions A and B.

Table 3
Simple Pairwise P Value (Mean) and Standard Deviation (Sequence Distance, PAUP* 4.0b10) of the Different trnF Copies (Region C–E, Refer to Fig. 2a and b) Compared to the Original trnF Gene Among the Twelve Arabidopsis Accessions Investigated Herein

Copy I          Copy II         Copy III        Copy IV         Copy V          Copy VI         Copy VII        Copy VIII
0.160 (0.012)   0.195 (0.029)   0.138 (0.010)   0.206 (0.005)   0.205 (0.030)   0.222 (0.016)   0.133 (0.026)   0.214 (0.048)


the various phylogenetic studies published so far is that traditional classification schemes based on morphology, embryology, or cytology often do not reflect phylogenetic relationships, depending on the taxonomical level considered. The occurrence of trnF pseudogenes among cruciferous plants is the first character defining a significant split in the deep Brassicaceae phylogeny roughly 16–18 MYA. The corresponding clade comprises taxa from various artificially designed tribes (Sisymbrieae, Arabideae, Lepidieae) as defined by traditional taxonomists such as Janchen (1942), Schulz (1936), or Hayek (1911). Future molecular studies might substantiate our findings on a family-wide scale and help clarify the systematic situation in the mustard family, as has been done on the basis of structural mutations in the chloroplast genome in various families (Asteraceae: Jansen and Palmer 1987; Fabaceae: Bruneau, Doyle, and Palmer 1990; Doyle, Lavin, and Bruneau 1992; Poaceae: Doyle et al. 1992; Doyle, Doyle, and Palmer 1995; and reviewed by D. E. Soltis and P. S. Soltis 1998).

Acknowledgments

This work was supported by grants from the Austrian Science Foundation (FWF; GEN-15609 and GEN-14463) and the German Science Foundation (DFG; Ko-2302/1-1) to M.K. We also thank all authors who provided us with their original DNA sequence alignments.

Literature Cited

Bailey, C. D., R. A. Price, and J. J. Doyle. 2002. Systematics of the halimolobine Brassicaceae: evidence from three loci and morphology. Syst. Bot. 27:318–332.

Bakker, F. T., A. Culham, R. Gomez-Martinez, J. Carvalho, J. Compton, R. Dawtrey, and M. Gibby. 2000. Pattern of nucleotide substitution in angiosperm cpDNA trnL(UAA)-trnF(GAA) regions. Mol. Biol. Evol. 17:1146–1155.

Besendahl, A., Y.-L. Qiu, J. Lee, J. D. Palmer, and D. Bhattacharya. 2000. The cyanobacterial origin and vertical transmission of the plastid tRNALeu group-I-intron. Curr. Genet. 37:12–23.

Bleeker, W., A. Franzke, K. Pollmann, A. H. D. Brown, and H. Hurka. 2002. Phylogeny and biogeography of Southern Hemisphere high-mountain Cardamine species (Brassicaceae). Aust. Syst. Bot. 15:575–581.

Bleeker, W., and H. Hurka. 2001. Introgressive hybridization in Rorippa (Brassicaceae): gene flow and its consequences in natural and anthropogenic habitats. Mol. Ecol. 10:2013–2022.

Bleeker, W., C. Weber-Sparenberg, and H. Hurka. 2002. Chloroplast DNA variation and biogeography in the genus Rorippa Scop. (Brassicaceae). Plant Biol. 4:104–111.

Borsch, T., K. W. Hilu, D. Quandt, V. Wilde, C. Neinhuis, and W. Barthlott. 2003. Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J. Evol. Biol. 16:558–576.

Bowman, C. M., R. F. Barker, and T. A. Dyer. 1988. The location and possible evolutionary significance of small dispersed repeats in wheat ctDNA. Curr. Genet. 10:931–941.

Bruneau, A., J. J. Doyle, and J. D. Palmer. 1990. A chloroplast DNA structural mutation as a subtribal character in the Phaseoleae (Leguminosae). Syst. Bot. 15:378–386.

Cech, T. R., D. Herschlag, J. A. Piccirilli, and A. M. Pyle. 1992. RNA catalysis by a group I ribozyme: developing a model for transition state stabilization. J. Biol. Chem. 267:17479–17482.

Costa, J. L., P. Paulstrud, and P. Lindblatt. 2002. The cyanobacterial tRNALeu(UAA) intron: evolutionary patterns in a genetic marker. Mol. Biol. Evol. 19:850–857.

Dally, A. M., and G. Second. 1990. Chloroplast DNA diversity in wild and cultivated species of rice (Genus Oryza, section Oryza). Cladistic-mutation and genetic-distance analysis. Theor. Appl. Genet. 80:209–222.

Dobes, C., T. Mitchell-Olds, and M. Koch. 2004. Extensive chloroplast haplotype variation indicates Pleistocene hybridization and radiation of North American Arabis drummondii, A. ×divaricarpa, and A. holboellii (Brassicaceae). Mol. Ecol. 13:349–370.

Doyle, J. J., J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson. 1992. Chloroplast DNA inversions and the origin of the grass family (Poaceae). Proc. Natl. Acad. Sci. USA 89:7722–7726.

Doyle, J. J., J. L. Doyle, and J. D. Palmer. 1995. Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20:272–294.

Doyle, J. J., M. Lavin, and A. Bruneau. 1992. Contributions of molecular data to papillionoid legume systematics. Pp. 223–251 in P. S. Soltis, D. E. Soltis, and J. J. Doyle, eds. Molecular systematics of plants. Chapman and Hall, New York.

Drabkova, L., J. Kirschner, C. Vlcek, and V. Pacek. 2004. TrnL-trnF intergenic spacer and trnL intron define major clades within Luzula and Juncus (Juncaceae): importance of structural mutations. J. Mol. Evol. 59:1–10.

Goremykin, V. V., K. I. Hirsch-Ernst, S. Wolfl, and F. H. Hellwig. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20:1499–1505.

Govindaraju, D. R., B. R. Dancik, and D. B. Wagner. 1989. Novel chloroplast DNA polymorphism in a sympatric region of two pines. J. Evol. Biol. 2:49–59.

Graham, S. W., P. A. Reeves, C. E. Burns, and R. G. Olmstead. 2000. Microstructural changes in non-coding DNA: interpretation, evolution and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161:S83–S96.

Hall, J. C., K. J. Sytsma, and H. H. Iltis. 2002. Phylogeny of Capparaceae and Brassicaceae based on chloroplast sequence data. Am. J. Bot. 89:1826–1842.

Hamilton, M. B., J. M. Braverman, and D. F. Soria-Hernanz. 2003. Patterns and relative rates of nucleotide and insertion/deletion evolution at six chloroplast intergenic regions in New World species of the Lecythidaceae. Mol. Biol. Evol. 20:1710–1721.

Hayek, A. 1911. Entwurf eines Cruciferensystems auf phylogenetischer Grundlage. Beih. Bot. Centralbl. 27:127–335.

Hewitt, G. M. 2001. Speciation, hybrid zones and phylogeography—or seeing genes in space and time. Mol. Ecol. 10:537–549.

Hipkens, V. D., K. A. Marshall, D. B. Neale, W. H. Rottmann, and S. H. Strauss. 1995. A mutation hotspot in the chloroplast genome of a conifer (Douglas fir: Pseudotsuga) is caused by variability in the number of direct repeats from a partially duplicated tRNA gene. Curr. Genet. 27:527–579.

Ingvarsson, P. K., S. Ribstein, and D. R. Taylor. 2003. Molecular evolution of insertions and deletions in the chloroplast genome of Silene. Mol. Biol. Evol. 20:1737–1740.

Janchen, E. 1942. Das System der Cruciferen. Osterr. Bot. Z. 91:1–28.

Jansen, R. K., and J. D. Palmer. 1987. A chloroplast DNA inversion marks an ancient evolutionary split in the sunflower family (Asteraceae). Proc. Natl. Acad. Sci. USA 84:5818–5822.

Johnson, L. B., and J. D. Palmer. 1989. Heteroplasmy of chloroplast DNA in Medicago. Plant Mol. Biol. 12:3–11.

Kanno, A., and A. Hirai. 1993. A transcription map of the chloroplast genome from rice (Oryza sativa). Curr. Genet. 23:166–174.


Kelch, D. G., A. Driskell, and B. Mishler. 2004. Inferring phylogeny using genomic characters: a case study using land plant plastomes. Monogr. Syst. Bot. Mo. Bot. Gard. 98:3–12.

Koch, M. 2003. Molecular phylogenetics, evolution and population biology in the Brassicaceae. Pp. 1–35 in A. K. Sharma and A. Sharma, eds. Plant genome: biodiversity and evolution, Vol. 1. Phanerogams. Science Publishers, Inc., Enfield, N.H.

Koch, M., and I. A. Al-Shehbaz. 2002. Molecular data indicate complex intra- and intercontinental differentiation of American Draba (Brassicaceae). Ann. Mo. Bot. Gard. 89:88–109.

Koch, M., and I. A. Al-Shehbaz. 2004. Taxonomic and phylogenetic evaluation of the American “Thlaspi” species: identity and relationship to the Eurasian genus Noccaea (Brassicaceae). Syst. Bot. 29:375–384.

Koch, M., I. A. Al-Shehbaz, and K. Mummenhoff. 2003. Molecular systematics, evolution and population biology in the mustard family (Brassicaceae). Ann. Mo. Bot. Gard. 90:151–171.

Koch, M., C. Dobes, and M. Matschinger. 2003. The trnF(GAA) gene in cruciferous plants: extensive duplication, variation in copy number and parallel evolution. Palmarum Hortus Francofurtensis 7:54.

Koch, M., B. Haubold, and T. Mitchell-Olds. 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17:1483–1498.

Koch, M., B. Haubold, and T. Mitchell-Olds. 2001. Molecular systematics of the Cruciferae: evidence from coding plastome matK and nuclear CHS sequences. Am. J. Bot. 88:534–544.

Kuhsel, M. G., R. Strickland, and J. D. Palmer. 1990. An ancient group I intron shared by eubacteria and chloroplasts. Science 250:1570–1573.

Lanner, C. 1998. Relationships of wild Brassica species with chromosome number 2n = 18, based on comparison of the DNA sequence of the chloroplast intergenic region between trnL(UAA) and trnF(GAA). Can. J. Bot. 76:228–237.

Lee, J.-Y., K. Mummenhoff, and J. L. Bowman. 2002. Allopolyploidization and evolution of species with reduced floral structures in Lepidium L. (Brassicaceae). Proc. Natl. Acad. Sci. USA 99:16835–16840.

Lidholm, J., A. Szmidt, and P. Gustafsson. 1991. Duplication of the psbA gene in the chloroplast genome of two Pinus species. Mol. Gen. Genet. 226:345–352.

Lihova, J., J. Fuertes-Aguilar, K. Marhold, and G. Nieto-Feliner. 2004. Origin of the disjunct tetraploid Cardamine amporitana (Brassicaceae) assessed with nuclear and chloroplast DNA sequence data. Am. J. Bot. 91:1231–1242.

Lohne, C., and T. Borsch. 2005. Molecular evolution and phylogenetic utility of the petD group II intron: a case study in basal angiosperms. Mol. Biol. Evol. 22:1–16.

Marshall, H. D., C. Newton, and K. Ritland. 2001. Sequence-repeat polymorphisms exhibit the signature of recombination in lodgepole pine chloroplast DNA. Mol. Biol. Evol. 18:2136–2138.

Martin, W., B. Stoebe, V. Goremykin, S. Hansmann, M. Hasegawa, and K. V. Kowallik. 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:161–165.

Matschinger, M., and M. Koch. 2003. Molecular systematics, phylobiogeography and evolution of the genus Cardaminopsis Hayek (Brassicaceae), the closest relatives of the model plant Arabidopsis thaliana (L.) Heynh. Palmarum Hortus Francofurtensis 7:196.

Millen, R. S., R. G. Olmstead, K. L. Adams et al. (12 co-authors). 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658.

Mummenhoff, K., H. Bruggemann, and J. Bowman. 2001. Chloroplast DNA phylogeny and biogeography of the genus Lepidium (Brassicaceae). Am. J. Bot. 88:2051–2063.

Mummenhoff, K., P. Linder, N. Friesen, J. L. Bowman, J.-Y. Lee, and A. Franzke. 2004. Molecular evidence for bicontinental hybridogenous genomic constitution in Lepidium sensu stricto (Brassicaceae) species from Australia and New Zealand. Am. J. Bot. 91:254–261.

Newton, A. C., T. R. Allnutt, A. C. M. Gillies, A. J. Lowe, and R. A. Ennos. 1999. Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends Ecol. Evol. 14:140–145.

O’Kane, S. L. Jr., and I. A. Al-Shehbaz. 1997. A synopsis of Arabidopsis (Brassicaceae). Novon 7:323–327.

O’Kane, S., and I. A. Al-Shehbaz. 2003. Phylogenetic position and generic limits of Arabidopsis (Brassicaceae) based on sequences of nuclear ribosomal DNA. Ann. Mo. Bot. Gard. 90:603–612.

Olmstead, R. G., and J. D. Palmer. 1994. Chloroplast DNA systematics: a review of methods and data analysis. Am. J. Bot. 81:1205–1224.

Palmer, J. D., and W. F. Thompson. 1981. Rearrangements in the chloroplast genomes of mung bean and pea. Proc. Natl. Acad. Sci. USA 78:5533–5537.

Palmer, J. D., and W. F. Thompson. 1982. Chloroplast DNA rearrangements are more frequent when a large inverted repeat sequence is lost. Cell 29:537–550.

Paquin, B., S. D. Kathe, S. A. Nierzwicki-Bauer, and D. A. Shub. 1997. Origin and evolution of group I introns in cyanobacterial tRNA genes. J. Bacteriol. 179:6798–6806.

Perry, A. S., S. Brennan, D. J. Murphy, T. A. Kavanagh, and K. H. Wolfe. 2002. Evolutionary re-organization of a large operon in Adzuki bean chloroplast DNA caused by inverted repeat movement. DNA Res. 9:157–162.

Perry, A. S., and K. H. Wolfe. 2002. Nucleotide substitution rates in legume chloroplast DNA depend on the presence of the inverted repeat. J. Mol. Evol. 55:501–508.

Quandt, D., K. Muller, M. Stech, J.-P. Frahm, W. Frey, K. W. Hilu, and T. Borsch. 2004. Molecular evolution of the chloroplast trnL-F region in land plants. Monogr. Syst. Bot. Missouri Bot. Gard. 98:13–37.

Reboud, X., and C. Zeyl. 1994. Organelle inheritance in plants. Heredity 72:132–140.

Schulz, O. E. 1936. Cruciferae. Pp. 227–658 in A. Engler and K. Prantl, eds. Die naturlichen Pflanzenfamilien, Vol. 17B. Verlag von Wilhelm Engelmann, Leipzig.

Soltis, D. E., and P. S. Soltis. 1998. Choosing an approach and appropriate gene for phylogenetic analysis. Pp. 1–42 in D. E. Soltis, P. S. Soltis, and J. J. Doyle, eds. Molecular systematics of plants II. DNA sequencing. Kluwer Academic Publishers, London.

Swofford, D. L. 2000. PAUP* 4.0b10. Sinauer Associates, Sunderland, Mass.

Taberlet, P., L. Gielly, and G. Pautou. 1991. Universal primers for amplification of three non-coding chloroplast regions. Plant Mol. Biol. 17:1105–1109.

Vijverberg, K., and K. Bachmann. 1999. Molecular evolution of a tandemly repeated trnF(GAA) gene in the chloroplast genome of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analysis. Mol. Biol. Evol. 16:1329–1340.

Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sigiura. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91:9794–9798.


Wittzell, H. 1999. Chloroplast DNA variation and reticulate evolution in sexual and apomictic sections of dandelions. Mol. Ecol. 8:2023–2035.

Wolfe, K. H., C. W. Morden, and J. D. Palmer. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc. Natl. Acad. Sci. USA 89:10648–10652.

Xu, M.-Q., S. D. Kathe, H. Goodrich-Blair, S. A. Nierzwicki-Bauer, and D. A. Shub. 1990. Bacterial origin of a chloroplast intron: conserved self-splicing group-I introns in cyanobacteria. Science 250:1566–1570.

Yang, Y.-W., P.-Y. Tai, Y. Chen, and W.-H. Li. 2002. A study of the phylogeny of Brassica rapa, B. nigra, Raphanus sativus, and their related genera using noncoding regions of the chloroplast DNA. Mol. Phylogenet. Evol. 23:268–275.

Spencer V. Muse, Associate Editor

Accepted January 7, 2005
