Evolution of the plastid ribosomal RNA operon in a nongreen parasitic plant: Accelerated sequence...


Plant Molecular Biology 18: 1037-1048, 1992. © 1992 Kluwer Academic Publishers. Printed in Belgium.

Evolution of the plastid ribosomal RNA operon in a nongreen parasitic plant: Accelerated sequence evolution, altered promoter structure, and tRNA pseudogenes

Kenneth H. Wolfe 1, Deborah S. Katz-Downie 2, Clifford W. Morden 3 and Jeffrey D. Palmer*
Department of Biology, Indiana University, Bloomington, IN 47405, USA (*author for correspondence); Present addresses: 1 Department of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland; 2 Department of Plant Biology, University of Illinois, Urbana, IL 61801, USA; 3 Departments of Botany and E.E.C.B., University of Hawaii, Honolulu, HI 96822, USA

Received 8 November 1991; accepted in revised form 6 December 1991

Key words: molecular evolution, plastid DNA, promoter, pseudogene, ribosomal RNA gene, transfer RNA gene

Abstract

The nucleotide sequence of a 7.4 kb region containing the entire plastid ribosomal RNA operon of the nongreen parasitic plant Epifagus virginiana has been determined. Analysis of the sequence indicates that all four rRNA genes are intact and almost certainly functional. In contrast, the split genes for tRNA-Ile and tRNA-Ala present in the 16S-23S rRNA spacer region have become pseudogenes, and deletion upstream of the 16S rRNA gene has removed a tRNA-Val gene and most of the promoter region for the rRNA operon. The rate of nucleotide substitution in 16S and 23S rRNAs is several times higher in Epifagus than in tobacco, a related photosynthetic plant. Possible reasons for this, including relaxed translational constraints, are discussed.

Introduction

The chloroplast genomes of all land plants investigated have an identical gene organization around the ribosomal RNA locus: four tRNA and four rRNA genes are arranged in the order trnV-16S-trnI-trnA-23S-4.5S-5S-trnR. Complete nucleotide sequences of this locus have been reported for a bryophyte (Marchantia polymorpha), two monocots (maize and rice) and two dicots (tobacco and pea) [9, 15, 20, 21, 29, 32, 33, 35], with partial sequences known from many others [5]. In all cases the genes for tRNA-Ile(GAU) (trnI) and tRNA-Ala(UGC) (trnA) are interrupted by group II introns of 700-1000 nucleotides [20, 27]. The rRNA genes are cotranscribed as a precursor RNA of about 8 kb [1, 21, 38]. The in vivo start site of the primary transcript lies between trnV and the 16S gene. The site of transcription termination is less certain: trnR may or may not be included in the cotranscribed unit [5].

The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number X62099.

The two tRNA genes located between the 16S and 23S genes in land plant plastid DNAs (ptDNAs) are also found in equivalent positions in plastids of green, red and chlorophyll a/c algae [5, 18, 24]. Furthermore, the same pair of tRNA genes is found at the same location in some of the rRNA operons of Escherichia coli, Bacillus subtilis and the cyanobacterium Synechococcus [5], and thus this organization appears to predate the endosymbiotic origin of plastids. The trnI and trnA genes contain introns only in land plants and in one closely related lineage of green algae, leading to the conclusion that these introns were inserted into uninterrupted tRNA genes relatively recently (i.e., subsequent to the diversification of the major groups of green algae) [25]. This pair of introns is also unique among plastid introns in having extensive sequence similarity to each other (approximately 50% sequence identity, depending on the species and sequence alignment used), and so the two introns appear to be the products of an ancient duplication event. In this regard it is interesting that they interrupt the tRNA cloverleaves at the same position (two nucleotides downstream of the anticodon).

The 71 kb plastid genome of the parasitic nonphotosynthetic flowering plant Epifagus virginiana (beechdrops; family Orobanchaceae) contains barely a third of the genetic information of the 150 kb ptDNAs of green plants [7]. Among the missing genes are virtually all photosynthesis and chlororespiratory genes. However, transcripts of rRNAs and ribosomal protein mRNAs have been detected by northern analysis, which indicates that the Epifagus genome is still functional in transcription and probably also translation [7]. This implies that some genes in angiosperm ptDNA may have a role in a metabolic activity of plastids that is not related to photosynthesis or chlororespiration. On the other hand, sequence analysis of two regions of Epifagus ptDNA has shown that it lacks some other ribosomal protein genes and tRNA genes, in addition to photosynthetic and chlororespiratory genes [28, 48]. This might suggest that the plastid is incapable of translation, or at least that its translation apparatus is significantly different from that of photosynthetic plants.

Because it is unlikely that any translation could occur in Epifagus plastids without functional rRNAs, we have sequenced the rRNA operon from Epifagus ptDNA to determine whether the rRNA genes are intact. We report that the rRNA genes are highly similar to those of photosynthetic species and hence are likely functional, although their rate of nucleotide substitution is accelerated as compared to tobacco. However, the split trnI and trnA loci have become pseudogenes and the trnV gene upstream of the 16S gene has been deleted completely. This deleted region also encompasses part of the former promoter of the rRNA operon.

Materials and methods

Total DNA from above-ground tissue of Epifagus virginiana collected in Washtenaw Co. (Michigan) was extracted by the CTAB method [8]. A library of Sau3AI partial digestion products was made in the vector DASH II (Stratagene) and screened using purified tobacco ptDNA as a probe in plaque hybridizations. A set of overlapping phage clones covering most of the inverted repeat region of the Epifagus plastid genome was obtained in this screen (see also [48]). Hind III fragments mapping to the rRNA locus [7] were subcloned into the pBluescript vector. Single-stranded templates of nested exonuclease III deletion series clones [13] were sequenced by the dideoxy chain-termination method. Custom oligonucleotides were used as necessary to complete the sequence on both strands, and parts of the rRNA locus were sequenced using internal primers generously provided by Drs J. Manhart, N. Pace and E. Zimmer.

The sequence reported here is contiguous with that of the Epifagus plastid small single-copy region (accession number X61368 [48]). The latter sequence also includes the gene for trnR(ACG), located downstream of the 5S rRNA gene. We have not sequenced across four Hind III sites used in cloning (positions 4786, 5058, 6159 and 7390), but in each case at least 20 bp on either side of the site are almost identical between Epifagus and tobacco [33], so we are confident that no small fragments have been overlooked. The first three of these sites are within the 23S rRNA gene, and the fourth is between the 5S gene and trnR.
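The check described above (locating the Hind III sites and comparing the flanking sequence between two species) can be sketched in a few lines. This is a minimal illustration, not the procedure the authors used: the helper names `find_sites` and `flank` are hypothetical, the toy sequence is invented, and only the Hind III recognition sequence AAGCTT is taken as given.

```python
HINDIII_SITE = "AAGCTT"  # Hind III recognition sequence


def find_sites(sequence, site=HINDIII_SITE):
    """Return the 1-based start position of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = sequence.find(site, start + 1)
    return positions


def flank(sequence, position, site=HINDIII_SITE, width=20):
    """Return up to `width` bases on each side of a site starting at 1-based `position`."""
    left_end = position - 1
    right_start = left_end + len(site)
    left = sequence[max(0, left_end - width):left_end]
    right = sequence[right_start:right_start + width]
    return left, right


# Hypothetical toy sequence, not Epifagus data.
toy = "CCAAGCTTGGGAAGCTTAA"
print(find_sites(toy))
print(flank(toy, 3, width=2))
```

Comparing the two tuples returned by `flank` for homologous sites in two species would give the kind of flanking-sequence comparison the text relies on.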

Results

Intact rRNA genes

The four rRNA genes found in ptDNA of photosynthetic land plants are also present in the 7395 bp region of Epifagus virginiana ptDNA, and their sizes and spacing are similar to what has been found in other species. Comparison of the Epifagus genes to those of tobacco (a photosynthetic species in the same subclass, Asteridae) reveals very high sequence identities (97-99%) and very few length mutations (Table 1). Wimpee et al. [44] have recently reported the sequence of the plastid rRNA genes from Conopholis americana, a nonphotosynthetic parasite closely related to Epifagus in the family Orobanchaceae. The 16S and 23S sequences from Epifagus both have 96.8% identity to their Conopholis homologues. Remarkably, this is no higher than their similarity to tobacco (Table 1).
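The identities quoted from Table 1 can be checked against the reported gene lengths and numbers of nucleotide differences. The sketch below assumes that identity is computed as the fraction of unchanged positions over the shorter of the two gene lengths; the paper does not state the denominator, so this is an assumption, and the function name `percent_identity` is hypothetical.

```python
# (Epifagus length, tobacco length, nucleotide differences), as reported in Table 1.
TABLE1 = {
    "16S": (1492, 1489, 41),
    "23S": (2804, 2810, 85),
    "4.5S": (103, 103, 1),
    "5S": (121, 121, 3),
}


def percent_identity(len_a, len_b, differences):
    """Percent identity, assuming the shorter gene length is the number of compared sites."""
    compared = min(len_a, len_b)
    return 100.0 * (1 - differences / compared)


for gene, (len_epifagus, len_tobacco, diffs) in TABLE1.items():
    print(gene, round(percent_identity(len_epifagus, len_tobacco, diffs), 1))
```

Under this assumption the rounded values agree with the "% identity" row of Table 1; a different choice of denominator (for example the full alignment length including gaps) would shift the 16S and 23S values slightly.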

The Epifagus sequences are compatible with the secondary structures proposed for tobacco 16S and 23S rRNAs [12, 37]. Many of the nucleotide substitutions within base-paired regions are nondisruptive (i.e., either compensatory pairs of changes or G:C ↔ G:U and G:U ↔ A:U substitutions), and all of the length mutations occur within (or adjacent to) unpaired regions in the secondary structure models. The most substantial difference between the two species is a deletion of nine nucleotides around position 2160 in the Epifagus 23S rRNA. This deletion reduces an 8 bp stem and 4 bp loop found in other angiosperm plastid and eubacterial 23S rRNAs [12] to a 3 bp stem and 5 bp loop in Epifagus. This stem-loop structure is also aberrant in the bryophyte Marchantia and is completely absent from the equivalent location in mitochondrial and eukaryotic rRNAs [12], so its exact size is unlikely to be critical to rRNA function.

Table 1. Comparison between Epifagus and tobacco plastid rRNA genes.

                                    16S     23S     4.5S    5S
Length: Epifagus                    1492    2804    103     121
Length: tobacco                     1489    2810    103     121
% identity                          97.2    97.0    99.0    97.5
Number of nucleotide differences    41      85      1       3
Number of gaps in alignment         2       8       0       0
Spacer length: Epifagus             2009    97      188     233 a
Spacer length: tobacco              2078    101     256     257 a

a Distance to trnR.

The 4.5S rRNAs of Epifagus and tobacco are identical except for the first nucleotide. The 5S rRNAs differ at only three positions, all located in helix II in the secondary structure model [41]. Two of these changes are compensatory and the third is a G:C → G:U substitution. The intactness and high degree of sequence conservation of all four rRNA genes, together with evidence for transcription of the operon [7], leads us to conclude that the rRNAs of Epifagus plastids are probably functional.
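The distinction drawn above between nondisruptive and disruptive substitutions in base-paired regions can be made concrete with a small classifier. This is an illustrative sketch only: the function names and category labels are hypothetical, and the pairing rules are limited to Watson-Crick pairs plus the G:U wobble pair mentioned in the text.

```python
# Base pairs treated as stable in an rRNA stem: Watson-Crick pairs plus G:U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """True if bases x and y can form a Watson-Crick or G:U pair."""
    return (x, y) in PAIRS


def classify(before, after):
    """Classify a change at one stem position from pair `before` to pair `after`."""
    if before == after:
        return "unchanged"
    if not can_pair(*before):
        return "unpaired in reference"
    if not can_pair(*after):
        return "disruptive"
    changed = sum(b != a for b, a in zip(before, after))
    # Two changed bases that still pair are a compensatory pair of changes.
    return "compensatory" if changed == 2 else "single nondisruptive"


print(classify(("G", "C"), ("G", "U")))  # one base changed, pairing kept
print(classify(("G", "C"), ("A", "U")))  # both bases changed, pairing kept
print(classify(("G", "C"), ("G", "A")))  # pairing lost
```

Applied to the three 5S rRNA differences described above, two would fall in the "compensatory" category and one (G:C → G:U) in the "single nondisruptive" category.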

The spacers between the rRNA genes are of similar length in Epifagus and tobacco, with the exception of the 4.5S-5S spacer, which is 188 bp in Epifagus but 256 bp in tobacco (Table 1). Almost half of this size difference is due to deletion of one copy of a 32 bp direct repeat found in tobacco and spinach but not other species (data not shown; [1]).

Accelerated evolution of the 16S and 23S rRNA genes

The 16S and 23S rRNA sequences from Epifagus were aligned with those from other land plant plastid genomes, and phylogenetic trees were constructed by parsimony analysis (Fig. 1). The reliability of branching orders was assessed using 100 bootstrap replications. Trees were also constructed by the neighbor-joining method [31] and gave essentially the same results (not shown).

Fig. 1. Phylogenetic trees for plastid (a) 16S and (b) 23S rRNA sequences produced by parsimony analysis. The trees were produced using the exhaustive search option of PAUP (version 3.0q; D. Swofford, Illinois Natural History Survey). The numbers of nucleotide substitutions assigned to each branch are indicated. Bootstrap values (100 replications) are shown in parentheses for all internal branches. The trees were rooted using the Marchantia sequences [29]. Lineages in which the inverted repeat (IR) region has been deleted are indicated by ΔIR. Data sources: Conopholis americana [44], tobacco [33], pea [35], Vicia faba ([11]; J. Gao, pers. comm.), soybean [42], Brassica napus [10], maize [9, 32], rice [15].
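The bootstrap procedure mentioned above resamples alignment columns with replacement and asks how often a grouping is recovered. The following is a minimal sketch for the simplest case of four taxa, where parsimony reduces to counting informative site patterns for the three possible unrooted topologies. It is not a reimplementation of PAUP; the taxon labels, toy alignment and function names are all hypothetical.

```python
import random
from collections import Counter


def informative_support(columns):
    """Count parsimony-informative columns favouring each four-taxon topology.

    Each column is a 4-character string ordered (A, B, C, D).
    """
    support = Counter({"AB|CD": 0, "AC|BD": 0, "AD|BC": 0})
    for a, b, c, d in columns:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return support


def best_topology(columns):
    """Most parsimonious topology; ties are broken by name order for reproducibility."""
    support = informative_support(columns)
    return max(sorted(support), key=lambda name: support[name])


def bootstrap(columns, replicates=100, seed=0):
    """Percentage of bootstrap replicates in which each topology is preferred."""
    rng = random.Random(seed)
    n = len(columns)
    tally = Counter()
    for _ in range(replicates):
        sample = [columns[rng.randrange(n)] for _ in range(n)]
        tally[best_topology(sample)] += 1
    return {name: 100.0 * count / replicates for name, count in tally.items()}


# Hypothetical toy alignment (taxa A-D), not data from the paper.
alignment = {
    "A": "AAGGCTTACAA",
    "B": "AAGGCTTACAC",
    "C": "ACGTCATACGA",
    "D": "ACGTCATACGC",
}
columns = ["".join(col) for col in zip(*alignment.values())]
print(informative_support(columns))
print(bootstrap(columns, replicates=100))
```

With many taxa the topology search is far more expensive, which is why dedicated programs are used in practice; the resampling step itself is unchanged.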

Parsimony analysis of the available land plant 16S rRNA sequences resulted in four trees of equal length that differed in the branching order among four families of dicots (as represented by the Orobanchaceae, the legumes, tobacco and Brassica). These dicots are normally placed in three subclasses: Asteridae (tobacco and the parasites), Rosidae (the legumes) and Dilleniidae (Brassica). The tree shown in Fig. 1a was chosen over the others because it is the only one in which the Asteridae form a monophyletic group, though the proportion of bootstrap replicates supporting this group is very low (16%). The branch lengths differ dramatically among the lineages. For example, all but one of the 41 nucleotide substitutions between Epifagus and tobacco are assigned to the Epifagus lineage. Similarly, the rates of molecular evolution in the 16S genes of pea and Vicia faba are greater than that in soybean, which in turn is greater than those in Brassica and tobacco.

Analysis of the 23S rRNA led to a single shortest tree with similar branch length variation (Fig. 1b), though fewer sequences have been reported. The lineage leading to Epifagus is in this case six times longer than that for tobacco, as compared to the fortyfold difference for the 16S rRNA.
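The fortyfold figure follows directly from the 16S result reported above: 40 of the 41 substitutions fall on the Epifagus lineage and 1 on the tobacco lineage. The sketch below shows one simple way such lineage assignments can be made for a pair of sequences and an outgroup, by attributing a difference to whichever lineage departs from the outgroup state. This is a simplified three-taxon illustration rather than the full parsimony analysis; the sequences and the function name `lineage_changes` are hypothetical.

```python
def lineage_changes(seq_a, seq_b, outgroup):
    """Count sites where only seq_a, or only seq_b, differs from the outgroup."""
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    on_a = on_b = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if a != b:
            if b == o:
                on_a += 1  # seq_b keeps the outgroup state, so seq_a changed
            elif a == o:
                on_b += 1  # seq_a keeps the outgroup state, so seq_b changed
            # sites where all three differ are left unassigned
    return on_a, on_b


# Hypothetical aligned toy sequences, not data from the paper.
outgroup = "ACGTACGTACGT"
tobacco_like = "ACGTACGTACGA"
parasite_like = "TCGAACCTACGT"
on_parasite, on_tobacco = lineage_changes(parasite_like, tobacco_like, outgroup)
print(on_parasite, on_tobacco)

# Ratio implied by the reported 16S counts: 41 substitutions, all but one on Epifagus.
ratio_16s = (41 - 1) / 1
print(ratio_16s)
```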

Pseudogenes for tRNA-Ile and tRNA-Ala and their introns

The 16S and 23S rRNA genes are separated by 2009 bp in Epifagus (Fig. 2), a distance slightly smaller than the range (2.1-2.4 kb) reported for photosynthetic plants (summarized in [5]), though substantially larger than that in Conopholis [44]. This spacer region normally contains the split genes for tRNA-Ile(GAU) and tRNA-Ala(UGC), but these are clearly pseudogenes in Epifagus: both exons of both genes contain length mutations that preclude the formation of normal cloverleaf secondary structures (Fig. 3).

Fig. 2. Alignment of Epifagus and tobacco ptDNA sequences in parts of the rRNA locus. Only differences in the tobacco sequence [33] are shown; dashes indicate insertions/deletions. Coding regions were inferred by comparison to the tobacco sequence and are boxed. The site of initiation mapped for the 3'-rps12 transcript in tobacco [14] is also boxed. Arrows mark the counterparts of in vivo start sites of rRNA precursor transcription mapped in four species [2, 19, 34, 36]; the mapping in duckweed is approximate [19]. The -35 and -10 elements of the putative P1 and P2 promoters [5], the spinach CDF2 protein binding site [2], and the possible trnV transcript start site regions (PE3, PE4) mapped in maize [34] are marked. Letters in brackets indicate the five intronic insertions/deletions described in the text. The Epifagus sequence shown is part of the 7395 bp that includes complete sequences of the four rRNA genes (Fig. 4).

Fig. 3. Cloverleaf representations of the Epifagus tRNA-Ile (GAU) and tRNA-Ala (UGC) pseudogenes. Large dots indicate bases deleted relative to tobacco, and asterisks show nucleotide substitutions.

Exon 1 of trnI contains a near-perfect tandem quadruplication of the sequence CTATT normally present only once, and the last 6 bp of the exon (including the anticodon) have been deleted along with the first 13 bp of the intron (Figs. 2, 3). Exon 2 of trnI is the most intact of the four tRNA exons but contains a single-nucleotide deletion (Figs. 2, 3). trnA has suffered a 5 bp insertion in exon 1 and two 6 bp deletions in exon 2, as well as deletion of the 3' intron-exon junction. These length mutations, together with several point mutations that disrupt base-pairing and the deletion of two splice sites, clearly indicate that these tRNA genes cannot be functional.
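A tandem multiplication of a short motif such as CTATT can be detected by looking for the longest uninterrupted run of copies. The sketch below does this with a regular expression; the function name and the toy sequences are hypothetical, and because it matches exact copies only, a near-perfect repeat such as the one described above would be reported as shorter runs unless the matching were relaxed.

```python
import re


def max_tandem_copies(sequence, motif):
    """Return the largest number of exact, back-to-back copies of `motif` in `sequence`."""
    runs = re.finditer(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run.group(0)) // len(motif) for run in runs), default=0)


# Hypothetical toy sequences, not the Epifagus or tobacco exon sequences.
single_copy = "GGGCTATTAGC"
quadruplicated = "GGG" + "CTATT" * 4 + "AGC"

print(max_tandem_copies(single_copy, "CTATT"))
print(max_tandem_copies(quadruplicated, "CTATT"))
```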

The complete nucleotide sequence of the E. vir- giniana plastid ribosomal RNA locus is shown in Fig. 4.

The intron sequences lack some highly conserved sequence domains found in group II introns (marked by letters in square brackets in Fig. 2; [27]) and are most unlikely to be spliced. In trnI, a deletion (marked [A]) crossing the 5' splice junction has removed the IBS1 element normally present in exon 1. Loop ID3, which contains the EBS1 element that should pair with IBS1, is abnormally large due to a 10 bp insertion [B] next to EBS1. These changes in the IBS1 and EBS1 regions do not appear to be mutually compensatory. Another deletion [C] in the trnI intron has removed part of stem I TM and EBS2.

The greatest region of length difference between the Epifagus and tobacco trnI introns [D] is, ironically, the result of an apparently benign deletion of about 220 bp that occurred in the tobacco lineage. This inference can be made because the length difference occurs entirely within an unpaired sequence in the loop of domain III, a loop that differs greatly in length among group II introns [27], and because the sequence present in Epifagus is very similar to domain III loop sequences from trnI introns in maize, rice and soybean [4, 15, 20]. Comparison with other published sequences [26, 29, 33, 35] reveals that this loop has contracted in size on at least three independent occasions during dicot ptDNA evolution. Its presence in monocots (as well as in Marchantia) indicates that the large domain III loop was present in the ancestor of dicots, but it has been lost: (i) from pea ptDNA subsequent to its divergence from the soybean lineage (both are in the family Leguminosae); (ii) from tobacco subsequent to its divergence from Epifagus (subclass Asteridae); and (iii) from spinach subsequent to the monocot/dicot divergence.

A deletion of 167 bp [E] from the Epifagus trnA intron has removed much of domains IC and ID, including EBS2. The endpoints of this deletion correspond to a perfect 12 bp direct repeat (TGGGGAAGAGGA) in tobacco, only one copy of which is retained in Epifagus (Fig. 2). Domains IV, V and VI in the Epifagus trnA intron are also divergent and have sustained several short deletions; the deletion in domain VI extends over the 3' splice site.
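A deletion whose endpoints coincide with a direct repeat removes one repeat copy plus everything between the two copies, leaving a single copy behind. The sketch below illustrates this with the 12 bp repeat named in the text. The flanking and spacer sequences are hypothetical filler; the spacer is set to 155 bp purely so that the toy deletion comes to 12 + 155 = 167 bp, and the function name is likewise an assumption.

```python
REPEAT = "TGGGGAAGAGGA"  # 12 bp direct repeat reported for the tobacco trnA intron


def delete_between_repeats(sequence, repeat=REPEAT):
    """Remove one repeat copy and the sequence between the first two copies.

    Returns the deletion product and the number of bases removed.
    """
    first = sequence.find(repeat)
    second = sequence.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("sequence does not contain two copies of the repeat")
    deleted = second - first
    product = sequence[:first] + sequence[second:]
    return product, deleted


# Hypothetical construct: invented flanks and a 155 bp filler spacer.
spacer = "ACGTC" * 31
ancestral = "CCAT" + REPEAT + spacer + REPEAT + "TAAAG"

product, deleted = delete_between_repeats(ancestral)
print(deleted)
print(product)
```

The size of such a deletion equals the distance between the start positions of the two repeat copies, which is why the product retains exactly one copy.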

Deletion of trnV and part of the former 16S rRNA promoter

Upstream of the 16S gene in Epifagus we have identified the 3' part of the trans-spliced gene rps12 (Fig. 2 and data not shown). The trnV gene normally found between these two genes has been lost completely as part of a deletion of 168 bp (Fig. 2). This region may have been deleted in a single event, or could be the result of a series of small deletions. In vivo start sites for transcription of the 8 kb rRNA precursor molecule have been mapped to between trnV and the 16S gene in four angiosperm species and are indicated in Fig. 2.

[Fig. 4. Nucleotide sequence panel for the Epifagus virginiana plastid rRNA locus; labelled regions: 16S rRNA, trnI, trnA, 23S rRNA, 4.5S rRNA, 5S rRNA.]

ATACCACTCT GGAAGAGCTA GAATTCTAAC AAGGTAACGG AGGCGTGCAA AGGTTTCCTC CAGGGACGAA AGTCGGCCTT AGTGATCCGA AAGAGCTCAC ATCGACGGGA AGGTTTGGCA GGTACGTGAG CTC-GGTTCAG AACGTCGTGA GGAAGGACGC ACCTCTGGTG TACCAGTTGT CCACCCCAAG ATGAGTGCTC TCCTattccg agaagttt<g agaattcaag aTAAGGTCAC CCTAACAGAC CGGTAGACTT GAACcttgtt aaaatgctqc cccccttcta tccaagggat ttcatqgttc gaTATTCTGG TGTCCTAC~ZC AGGTCCTGCG GAAAAATAGC TCGACGCCAG

CTTGTGTCAG GACCTAC~ 6300 GGGCCGGGCG GAGATTGGCC 6400 CGGTACCGAG TGGAAGGGCC 6500 CCTCGATGTC C-CTCTTCGCC 6600 GACAGTTTGG TCCATATCCG 6700 CGTGCCCACG GTAAACGCTG 6808 acttccccag agcttcggta 6900 GGCGAGACGA GCCGTTTATC 7000 cctacatgac ctgataaatt 7100 gtaaggacat aggattttgg 7200 GTATAGGAAC CACACCAATC 7300 GATgataaaa agc[t 7395

Fig. 4. Complete nucleotide sequence of the Epifagus virginiana plastid ribosomal RNA locus.

(The sequences in this region from all the species studied are sufficiently similar to those of tobacco and Epifagus that the homologues of sites mapped in other species can be identified unambiguously in Fig. 2.) The start sites in pea, maize and duckweed (Spirodela oligorhiza) lie within a few nucleotides of one another, while that in spinach is about 20 bp further upstream.

The promoter of the 8 kb rRNA precursor transcript is, in photosynthetic plants, one of the strongest of all plastid promoters. Two prokaryotic-type promoter sequences (termed P1 and P2) have been identified upstream of the 16S gene in several species [5] (Fig. 2). Both of these promoters function in vitro with E. coli RNA polymerase, but the relevance of this to the in vivo situation is unclear; in spinach a homologous chloroplast RNA polymerase extract selects a different transcription initiation site than does the E. coli polymerase [2]. Baeza et al. [2] have recently mapped a protein-binding site (termed CDF2) in spinach to the region just upstream of the in vivo transcription start site, though the small size of the proteins bound suggested that they were not the polymerase itself. In Epifagus the trnV deletion extends through all of the P1 promoter, the CDF2 site, the -35 element of P2, and the site of transcription initiation in spinach (Fig. 2). The -10 element of P2 and the initiation site used in three other species have been retained in the parasite, but there is no strong candidate for a new -35 region for P2 at an appropriate distance upstream.
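The search for a replacement -35 element at an appropriate distance upstream of a retained -10 element can be illustrated with a simple consensus scan. The sketch below is not the procedure used in this study. It assumes the E. coli-type consensus hexamers TTGACA (-35) and TATAAT (-10), a spacer of 16 to 19 bp and a tolerance of two mismatches per hexamer; the function names and the toy sequence are hypothetical.

```python
# Illustrative sketch: scan a DNA string for bacterial-type promoter candidates.
# The consensus hexamers, spacer range and mismatch tolerance are assumptions,
# not values taken from the paper.

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mm=2, spacers=range(16, 20)):
    """Return (pos35, pos10, spacer) tuples (0-based) for candidate promoters."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        # Skip windows that are too far from the -35 consensus.
        if mismatches(seq[i:i + 6], MINUS35) > max_mm:
            continue
        for gap in spacers:
            j = i + 6 + gap  # start of the putative -10 hexamer
            if j + 6 > len(seq):
                break
            if mismatches(seq[j:j + 6], MINUS10) <= max_mm:
                hits.append((i, j, gap))
    return hits


# Toy sequence: a perfect -35 and -10 separated by 17 bp.
toy = "GGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGG"
# Same -10 element, but with the -35 region replaced by unrelated sequence.
toy_deleted = "GGG" + "C" * 23 + "TATAAT" + "GGG"

print(find_promoters(toy))
print(find_promoters(toy_deleted))
```

In the second toy sequence the -10 hexamer is intact but no -35 candidate lies within the allowed spacing, which mirrors the situation described for P2 in Epifagus.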

One possibility is that the homologue of the promoter of trnV in other species is used as the promoter of the rRNA operon in Epifagus. trnV is normally transcribed separately from the rRNA operon but its start site has not been mapped precisely. Primer extension experiments by Strittmatter et al. [34] identified two pre-tRNA-Val species in maize whose 5' ends mapped to the regions indicated as PE3 and PE4 in Fig. 2. This suggests that the promoter of trnV is upstream of this area, at least in maize. This region is quite well conserved among Epifagus, tobacco and maize (Fig. 2 and data not shown), so the former trnV promoter may still be active in Epifagus.

Discussion

Organization of the rRNA locus in Epifagus ptDNA

The rRNA sequences from Epifagus are highly similar to those of photosynthetic plants and there is no reason to suspect that they are not functional. Transcripts of the 16S and 23S rRNAs of Epifagus have been detected by northern analysis ([7]; C.W. dePamphilis and J.D. Palmer, unpublished results), and transcription of 16S and 23S rRNAs in the related nongreen parasite Conopholis has been reported by Wimpee et al. [43]. The largest deletion in the Epifagus rRNA genes (which total 4.5 kb) is only nine nucleotides, whereas in the 2.0 kb 16S-23S spacer region there are ten deletions of this size or larger (Fig. 2). This strongly suggests that natural selection has acted to conserve the rRNA sequences in Epifagus ptDNA, which would not occur if these genes were nonfunctional. Furthermore, the conservation of intact rRNA genes between Epifagus and Conopholis [44], despite divergence of the intergenic spacers, indicates that selection on rRNA sequences has continued to operate following the loss of photosynthesis in the Orobanchaceae lineage.
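The contrast between genes and spacer can be put on a per-kilobase basis with a short calculation. The sketch below uses only the lengths and counts quoted above and assumes, for illustration, that the rRNA genes contain exactly one deletion of nine nucleotides or more (the text states only that the largest deletion there is nine nucleotides); the function name is hypothetical.

```python
# Back-of-the-envelope comparison using the figures quoted in the text.
# Assumption: exactly one deletion of >= 9 nt in the rRNA genes.


def deletions_per_kb(n_deletions, length_kb):
    """Density of large deletions per kilobase of sequence."""
    return n_deletions / length_kb


genes = deletions_per_kb(1, 4.5)    # rRNA genes, 4.5 kb in total
spacer = deletions_per_kb(10, 2.0)  # 16S-23S spacer, 2.0 kb

print(f"rRNA genes: {genes:.2f} large deletions per kb")
print(f"spacer:     {spacer:.2f} large deletions per kb")
print(f"ratio:      {spacer / genes:.1f}-fold")
```

Under this assumption large deletions are roughly twenty-fold denser in the spacer than in the genes.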

Three tRNA genes (trnV(GAC), trnI(GAU) and trnA(UGC)) within or flanking the rRNA operon in Epifagus are either missing or clearly pseudogenes, whereas all four of the expected rRNA genes are present in an intact state. The fourth tRNA gene in the locus, trnR(ACG), is located downstream of the 5S gene and appears to be intact [48]. The trnI and trnA genes are also apparently nonfunctional in Conopholis, where the 16S-23S spacer has been reduced to only 398 bp [44]. In other regions of the Epifagus genome [28, 48] some tRNA genes have been retained intact whereas others have become pseudogenes or have been deleted entirely, even though all 61 sense codons are used in Epifagus protein genes in the same proportions as in chloroplast genomes ([28]; unpublished data). We have hypothesized that some tRNAs must be imported from the cytoplasm to effect translation [28]. A detailed characterization of the set of tRNA genes retained in the Epifagus plastid genome will be presented elsewhere.
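A statement that all 61 sense codons are used rests on a tally of codons across the protein-coding genes. A minimal sketch of such a tally is given here; it assumes the standard genetic code stop codons (TAA, TAG, TGA) and in-frame input, and the function names and toy open reading frame are hypothetical rather than data from Epifagus.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed here)
SENSE_CODONS = {"".join(c) for c in product("ACGT", repeat=3)} - STOP_CODONS


def codon_usage(orf):
    """Count sense codons in an in-frame coding sequence (DNA or RNA letters)."""
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    codons = (orf[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in SENSE_CODONS)


def unused_sense_codons(orfs):
    """Return the sorted sense codons never seen in any of the given ORFs."""
    total = Counter()
    for orf in orfs:
        total.update(codon_usage(orf))
    return sorted(SENSE_CODONS - set(total))


# Hypothetical toy ORF: ATG GCT GCT TAA
toy_orf = "ATGGCTGCTTAA"
usage = codon_usage(toy_orf)
print(usage)
print(len(SENSE_CODONS), "sense codons;", len(unused_sense_codons([toy_orf])), "unused")
```

Applied to a complete set of protein genes, an empty result from `unused_sense_codons` would correspond to the observation that every sense codon is used.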

Sequences corresponding to the promoter of the primary rRNA transcript in photosynthetic species have been partially deleted in Epifagus ptDNA (Fig. 2). We have made similar observations concerning the promoters of trnE and rps2 in Epifagus, where -35 and -10 elements found in other species have been deleted [28]. The Epifagus plastid genome lacks the four bacterial-type RNA polymerase subunit genes (rpo genes) that are present in other ptDNAs ([28]; unpublished results) and which are known to be functional [16, 17, 23]. This has led us to propose that all transcription in Epifagus plastids must rely on the second RNA polymerase activity tentatively identified in photosynthetic chloroplasts (reviewed in [3]), which is probably the product of nuclear genes. Little is known about this second polymerase, but it is possible that its requirements for promoter sequence recognition differ from those of the better-characterized (plastid-encoded) polymerase.

Processing of the primary rRNA transcript is complex: the precursor must be cleaved in at least eight places to yield the four mature rRNAs, in addition to the splicing of two introns and a further four cleavages to produce the spacer tRNAs in photosynthetic plants. The pathway by which the primary transcript matures has not been studied in depth in any plastid system, but experiments in spinach [26] and maize [6] indicate that at least some cleavage of the primary transcript precedes intron splicing. If this is the case, the predicted failure to splice the trnI and trnA introns in Epifagus would be unlikely to have any effect on the maturation of rRNAs. On the other hand, some potential secondary structures formed by base-pairing between sequences upstream and downstream of the mature 16S rRNA in tobacco and maize have been suggested to be important for rRNA processing [34, 40]. Much of this structure cannot be formed in Epifagus due to a 61 bp deletion just downstream of the 16S gene (Fig. 2), and this might result in a difference in processing between Epifagus and other species. This idea is supported by the observation that although the 16S rRNAs of Epifagus and tobacco are predicted to be almost identical in size (Table 1), their electrophoretic mobilities are slightly different [7]. In contrast, the less extensive base-pairing of sequences upstream of the 23S rRNA with those downstream of the 4.5S rRNA proposed for tobacco [39] can also occur in Epifagus.

Rates of molecular evolution of the rRNAs

The rates of molecular evolution of the Epifagus 16S and 23S rRNAs are much greater than those in tobacco, and somewhat greater than in monocots and some other dicots (Fig. 1). Similar branch length differences were seen in the other three shortest 16S rRNA trees obtained by parsimony analysis and in those produced by the neighbor-joining method (data not shown). At least three separate factors may contribute to the branch length variation seen in these trees. First, the removal of the requirement to translate photosynthetic and chlororespiratory genes may result in a relaxation of natural selection on the whole plastid ribosome in Epifagus: the only direct selection on the ribosome is to translate the half-dozen or fewer retained open reading frames whose products are not themselves part of the gene expression apparatus. The accelerated rate of amino acid substitution seen in two Epifagus ribosomal protein genes examined [28] and the complete absence of at least three others [28, 48] are consistent with this notion of 'sloppy' ribosomes.

Second, the difference in rate between Epifagus and tobacco may be due in part to a slowdown in the rate of evolution of the tobacco rRNAs. In phylogenetic trees drawn for 16S and 23S rRNAs, as well as for many protein genes, the branch length leading to tobacco is consistently shorter than that leading to monocots [47]. The reasons for these rate differences between lineages of photosynthetic plants are not understood but may reflect underlying mutation rate differences. Significant rate differences among lineages of ptDNA have also been identified through restriction mapping studies (e.g. [30]).

Third, the deletion of one copy of the inverted repeat (IR) region of ptDNA in some of the species considered in the 16S tree (Fig. 1a) may further complicate the issue because these repeated sequences (which contain the rRNA genes) have a fourfold lower mutation rate than single-copy plastid sequences [45, 46]. The inverted repeat has been lost independently in the lineage leading to the legumes pea and Vicia faba [22] and in Conopholis (S.R. Downie, C.W. dePamphilis and J.D. Palmer, unpublished results) (Fig. 1a). Pea and Vicia faba have an accelerated rate of 16S rRNA evolution as compared to soybean (an IR-containing legume), as would be expected if the mutation rate in the rRNA operon increased once it became single-copy. In the Orobanchaceae, Conopholis shows a faster rate of evolution than Epifagus in the 23S gene, but the branch length difference for the 16S gene is only marginal (Fig. 1).
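Differences in branch length of the kind discussed in this section are often examined with a relative-rate comparison, in which the distances from two ingroup sequences to a common outgroup are compared. The sketch below is an illustration of that idea only and is not the parsimony or neighbor-joining analysis used for Fig. 1. It assumes pre-aligned sequences of equal length and the Jukes-Cantor correction; the function names and toy sequences are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site); requires p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def relative_rate(seq_a, seq_b, outgroup):
    """K(A, outgroup) - K(B, outgroup): positive when lineage A has changed more."""
    k_a = jukes_cantor(p_distance(seq_a, outgroup))
    k_b = jukes_cantor(p_distance(seq_b, outgroup))
    return k_a - k_b


# Hypothetical toy alignment (20 sites): 'fast' differs from the outgroup at
# four sites, 'slow' at one site.
outgroup = "ACGTACGTACGTACGTACGT"
slow = "ACGTACGTACGTACGTACGA"
fast = "TCGTAGGTACCTACGTACGA"

print(p_distance(fast, outgroup), p_distance(slow, outgroup))
print(relative_rate(fast, slow, outgroup) > 0)
```

A positive difference indicates that the first lineage has accumulated more substitutions since the two ingroup sequences diverged; assessing its significance would require a variance estimate, which is omitted here.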

Our emerging view of the Epifagus plastid ribosome is one of typical rRNAs associated with only some of the ribosomal proteins encoded by photosynthetic plastids. At least three other ribosomal proteins are not encoded by the Epifagus plastid genome and are either absent from the ribosome or have been replaced by proteins encoded by another genome (most likely nuclear) [28, 48]. The function of this ribosome is to translate the mRNAs encoding ribosomal proteins and other proteins involved in gene expression, as well as the few other proteins that are not involved in gene expression. This is achieved without a full set of plastid-encoded tRNAs. Mutations in genes for ribosomal proteins, rRNAs and tRNAs will be tolerated provided that sufficient quantities of the non-gene-expression proteins are made with sufficient accuracy. This reduction of selective constraints may have permitted the Epifagus plastid ribosome to evolve at an accelerated rate, in terms of both the loss of certain components and the sequences of those components still present.

Acknowledgements

We thank Stephanie Eros for assistance in DNA sequencing, Drs Jim Manhart, Norm Pace and Liz Zimmer for oligonucleotides, and Drs Chuck Wimpee and Gao Jiaguo for making data available prior to publication. This study was supported by grants from the NIH (GM 35087) to J.D.P. and the Alfred P. Sloan Foundation (90-3-5) to K.H.W.

References

1. Audren H, Bisanz-Seyer C, Briat J-F, Mache R: Structure and transcription of the 5S rRNA gene from spinach chloroplasts. Curr Genet 12:263-269 (1987).

2. Baeza L, Bertrand A, Mache R, Lerbs-Mache S: Characterization of a protein binding sequence in the promoter region of the 16S rRNA gene of the spinach chloroplast genome. Nucl Acids Res 19:3577-3581 (1991).

3. Bogorad L: Replication and transcription of plastid DNA. In: Bogorad L, Vasil IK (eds) Molecular Biology of Plastids (vol 7A of Vasil IK (ed-in-chief), Cell Culture and Somatic Cell Genetics of Plants), pp. 93-124. Academic Press, San Diego (1991).

4. de Lanversin G, Pillay DTN: Primary structure and sequence organization of the 16S-23S spacer in the ribosomal operon of soybean (Glycine max L.) chloroplast DNA. Theor Appl Genet 76:443-448 (1988).

5. Delp G, Koessel H: rRNAs and rRNA genes of plastids. In: Bogorad L, Vasil IK (eds) Molecular Biology of Plastids (vol 7A of Vasil IK (ed-in-chief), Cell Culture and Somatic Cell Genetics of Plants), pp. 139-167. Academic Press, San Diego (1991).

6. Delp G, Igloi GL, Koessel H: Identification of in vivo processing intermediates and of splice junctions of tRNAs from maize chloroplasts by amplification with the polymerase chain reaction. Nucl Acids Res 19:713-716 (1991).

7. dePamphilis CW, Palmer JD: Loss of photosynthetic and chlororespiratory genes from the plastid genome of a parasitic flowering plant. Nature 348:337-339 (1990).

8. Doyle JJ, Doyle JL: A rapid isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 19:11-15 (1987).

9. Edwards K, Koessel H: The rRNA operon from Zea mays chloroplasts: nucleotide sequence of 23S rDNA and its homology with E. coli 23S rDNA. Nucl Acids Res 9: 2853-2869 (1981).

10. Gao J, Wang X, Wang Q: The complete nucleotide sequence of Brassica napus chloroplast 16S rRNA gene. Acta Genet Sin 16:263-268 (1989).

11. Gao J, Wang X, Wang Q, Tan J: Analysis of the primary structure of the leader sequence of the 16S rRNA gene from Vicia faba. Science in China, ser B 33:592-598 (1990).

12. Gutell RR, Fox GE: A compilation of large subunit RNA sequences presented in a structural format. Nucl Acids Res 16:r175-r269 (1988).

13. Henikoff S: Unidirectional digestion with exonuclease III in DNA sequence analysis. Meth Enzymol 155:156-165 (1987).

14. Hildebrand M, Hallick RB, Passavant CW, Bourque DP: Trans-splicing in chloroplasts: the rps12 locus of Nicotiana tabacum. Proc Natl Acad Sci USA 85:372-376 (1988).

15. Hiratsuka J, Shimada H, Whittier R, Ishibashi T, Sakamoto M, Mori M, Kondo C, Honji Y, Sun C-R, Meng B-Y, Li Y-Q, Kanno A, Nishizawa Y, Hirai A, Shinozaki K, Sugiura M: The complete nucleotide sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217:185-194 (1989).

16. Hu J, Bogorad L: Maize chloroplast RNA polymerase: The 180-, 120-, and 38-kilodalton polypeptides are encoded in chloroplast genes. Proc Natl Acad Sci USA 87:1531-1535 (1990).

17. Hu J, Troxler RF, Bogorad L: Maize chloroplast RNA polymerase: The 78-kilodalton polypeptide is encoded by the plastid rpoC1 gene. Nucl Acids Res 19:3431-3434 (1991).

18. Janssen I, Mucke H, Loeffelhardt W, Bohnert HJ: The central part of the cyanelle rDNA unit of Cyanophora paradoxa: sequence comparisons with chloroplasts and cyanobacteria. Plant Mol Biol 9:479-484 (1987).

19. Keus RJA, Dekker AF, van Roon MA, Groot GSP: The nucleotide sequence of the regions flanking the genes coding for 23S, 16S and 4.5S ribosomal RNA on chloroplast DNA from Spirodela oligorhiza. Nucl Acids Res 11:6465-6474 (1984).

20. Koch W, Edwards K, Koessel H: Sequencing of the 16S-23S spacer in a ribosomal RNA operon of Zea mays chloroplast DNA reveals two split tRNA genes. Cell 25:203-213 (1981).

21. Koessel H, Edwards K, Koch W, Langridge P, Schiefermayr E, Schwarz Zs, Strittmatter G, Zenke G: Structural and functional analysis of an rRNA operon and its flanking tRNA genes from Zea mays chloroplasts. Nucl Acids Res Symp Ser 11:117-120 (1982).

22. Lavin M, Doyle JJ, Palmer JD: Evolutionary significance of the loss of the chloroplast-DNA inverted repeat in the Leguminosae subfamily Papilionoideae. Evolution 44:390-402 (1990).

23. Little MC, Hallick RB: Chloroplast rpoA, rpoB, and rpoC genes specify at least three components of a chloroplast DNA-dependent RNA polymerase active in tRNA and mRNA transcription. J Biol Chem 263:14302-14307 (1988).

24. Maid U, Zetsche K: Structural features of the plastid ribosomal RNA operons of two red algae: Antithamnion sp. and Cyanidium caldarium. Plant Mol Biol 16:537-546 (1991).

25. Manhart JR, Palmer JD: The gain of two chloroplast tRNA introns marks the green algal ancestors of land plants. Nature 345:268-270 (1990).

26. Massenet O, Martinez P, Seyer P, Briat J-F: Sequence organization of the chloroplast ribosomal spacer of Spinacia oleracea including the 3' end of the 16S rRNA and the 5' end of the 23S rRNA. Plant Mol Biol 10: 53-63 (1987).

27. Michel F, Umesono K, Ozeki H: Comparative and functional anatomy of group II catalytic introns - a review. Gene 82:5-30 (1989).

28. Morden CW, Wolfe KH, dePamphilis CW, Palmer JD: Plastid translation and transcription genes in a non-photosynthetic plant: intact, missing and pseudo genes. EMBO J 10:3281-3288 (1991).

29. Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H: Chloroplast gene organization deduced from complete nucleotide sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572-574 (1986).

30. Palmer JD, Jansen RK, Michaels HJ, Chase MW, Manhart JR: Chloroplast DNA variation and plant phylogeny. Ann Missouri Bot Gard 75:1180-1206 (1988).

31. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425 (1987).

32. Schwarz Zs, Koessel H: The primary structure of 16S rDNA from Zea mays chloroplast is homologous to E. coli 16S rRNA. Nature 283:739-742 (1980).

33. Shinozaki K, Ohme M, Tanaka T, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, Ohto C, Torazawa K, Meng BY, Sugita M, Deno H, Kamogashira T, Yamada K, Kusuda J, Takaiwa F, Kato A, Tohdoh N, Shimada H, Sugiura M: The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J 5:2043-2049 (1986).

34. Strittmatter G, Gozdzicka-Jozefiak A, Koessel H: Identification of an rRNA operon promoter from Zea mays chloroplasts which excludes the proximal tRNA-Val(GAC) from the primary transcript. EMBO J 4:599-604 (1985).

35. Stummann BM, Lehmbeck J, Bookjans G, Henningsen KW: Nucleotide sequence of the single ribosomal RNA operon of pea chloroplast DNA. Physiol Plant 72:139-146 (1988).

36. Sun E, Wu B-W, Tewari KK: In vitro analysis of the pea chloroplast 16S rRNA gene promoter. Mol Cell Biol 9: 5650-5659 (1989).

37. Svab Z, Maliga P: Mutation proximal to the tRNA binding region of the Nicotiana plastid 16S rRNA confers resistance to spectinomycin. Mol Gen Genet 228:316-319 (1991).

38. Takaiwa F, Sugiura M: Nucleotide sequence of the 16S-23S spacer region in an rRNA gene cluster from tobacco chloroplast DNA. Nucl Acids Res 10:2665-2676 (1982).

39. Takaiwa F, Sugiura M: The complete nucleotide sequence of a 23-S rRNA gene from tobacco chloroplasts. Eur J Biochem 124:13-19 (1982).

40. Tohdoh N, Sugiura M: The complete nucleotide sequence of a 16S ribosomal RNA gene from tobacco chloroplasts. Gene 17:213-218 (1982).

41. Toukifimpa R, Romby P, Rozier C, Ehresmann C, Ehresmann B, Mache R: Characterization and footprint analysis of two 5S rRNA binding proteins from spinach chloroplast ribosomes. Biochemistry 28:5840-5846 (1989).

42. von Allmen J-M, Stutz E: The soybean chloroplast genome: nucleotide sequence of a region containing tRNA-Val(GAC) and 16S rRNA gene. Nucl Acids Res 16:1200 (1988).

43. Wimpee CF, Wrobel RL, Garvin DK: A divergent plastid genome in Conopholis americana, an achlorophyllous parasitic plant. Plant Mol Biol 17:161-166 (1991).

44. Wimpee CF, Morgan R, Wrobel R: An aberrant plastid ribosomal RNA gene cluster in the root parasite Conopholis americana. Plant Mol Biol, in press.

45. Wolfe KH: Protein-coding genes in chloroplast DNA: compilation of nucleotide sequences, data base entries, and rates of molecular evolution. In: Bogorad L, Vasil IK (eds) The Photosynthetic Apparatus: Molecular Biology and Operation (vol 7B of Vasil IK (ed-in-chief), Cell Culture and Somatic Cell Genetics of Plants), pp. 467-482. Academic Press, San Diego (1991).

46. Wolfe KH, Li W-H, Sharp PM: Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA 84:9054-9058 (1987).

47. Wolfe KH, Gouy M, Yang Y-W, Sharp PM, Li W-H: Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA 86:6201-6205 (1989).

48. Wolfe KH, Morden CW, Palmer JD: Small single-copy region of plastid DNA in the non-photosynthetic angiosperm Epifagus virginiana contains only two genes: differences among dicots, monocots and bryophytes in gene organization at a non-bioenergetic locus. J Mol Biol, in press.