Extreme heterogeneity of Ty1-copia group retrotransposons in plants


Mol Gen Genet (1992) 231:233-242

© Springer-Verlag 1992

Andrew J. Flavell 1, Donald B. Smith 1,*, and Amar Kumar 2

1 Department of Biochemistry, The University, Dundee, DD1 4HN, Scotland, UK
2 Cell and Molecular Genetics Department, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK

Summary. We have used the polymerase chain reaction to analyse Ty1-copia group retrotransposons of flowering plants. All eight species studied contain reverse transcriptase fragments from Ty1-copia group retrotransposons. Sequence analysis of 31 subcloned fragments from potato reveals that each is different from the others, with predicted amino acid divergences between individual fragments ranging from 5% to 75%. Such sequence heterogeneity within a single species contrasts strongly with the limited diversity seen in such retrotransposons in yeast and Drosophila. The fragments from the other seven plant species examined are also heterogeneous, both within and between species, showing that this is a general property of this transposon group in plants. Phylogenetic analysis of all these sequences reveals that many of them fall into subgroups which span species boundaries, such that the closest homologue of one sequence is often from a different species. We suggest that both vertical transmission of Ty1-copia group retrotransposons within plant lineages and horizontal transmission between different species have played roles in the evolution of Ty1-copia group retrotransposons in flowering plants.

Key words: copia - retrotransposon - Plant - Solanum tuberosum

Introduction

Retrovirus-related transposable elements are among the most common and widespread types of eukaryotic transposon (Doolittle et al. 1989). They include the vertebrate retroviruses (Varmus and Brown 1989) and the retrotransposons of Drosophila and Saccharomyces cerevisiae (Bingham and Zachar 1989; Boeke 1989). The best-studied retrotransposons, the copia element of Drosophila and the Ty1 and Ty2 elements of S. cerevisiae, are members of a distinct group, the Ty1-copia group, other members of which exist in plants (Voytas and Ausubel 1988; Grandbastien et al. 1989; Camirand and Brisson 1990). The major difference between Ty1-copia group retrotransposons and retroviruses is the presence in retroviruses of an env gene, which enables them to enter and exit their host cells (reviewed by Boeke 1989; Bingham and Zachar 1989; Varmus and Brown 1989). Other properties that set the Ty1-copia group apart from all other retrovirus-related transposable elements include the unique gene order gag-int-rt-RNaseH and characteristic sequence elements in several of these genes, which are conserved between the different transposons in this group. Each copy of any Ty1-copia group retrotransposon is typically more than 95% identical to any other copy (Mount and Rubin 1985; Emori et al. 1985; Boeke 1989; Konieczny et al. 1990, 1991). In contrast, sequence similarities between different retrotransposons in this group can be so low that only the better-conserved genes, such as the rt (reverse transcriptase) gene, are recognisably similar (Doolittle et al. 1989; Xiong and Eickbush 1990).

Offprint requests to: A. Flavell
* Present address: Department of Molecular Biology and Genetics, University of Guelph, Guelph, Ontario N1G 2W1, Canada

There has been considerable interest in the evolution of retrovirus-related transposable elements. The presence of retrotransposons in primitive eukaryotes and their simpler genetic constitution suggest that they may have been the progenitors of retroviruses (Temin 1980; Doolittle et al. 1989). The alternative possibility is that retrotransposons evolved from retroviruses by loss of the env gene. It has also been postulated that the existence of the Ty1-copia group in evolutionarily distant but ecologically close species is a consequence of horizontal genetic transfer of this class of transposon between these species (Yuki et al. 1986; Xiong and Eickbush 1990; Doolittle et al. 1989; Konieczny et al. 1991). However, experiments to look directly for horizontal transmission of Ty retrotransposons between yeast strains have been unsuccessful (Garfinkel et al. 1985).


One way to approach these questions experimentally is to look at the phylogenetic relationships between the members of a retrotransposon group in a variety of related organisms. This study was therefore initiated to address these issues and to obtain basic information about the organization and evolution of the Ty1-copia group in flowering plants. We have used the polymerase chain reaction (PCR) to isolate a portion of the reverse transcriptase gene of Ty1-copia group members from a variety of angiosperms.

We show here that each of the eight species we have screened contains Ty1-copia group retrotransposons. However, in marked contrast to previously described Ty1-copia group retrotransposons of Drosophila and S. cerevisiae, these novel sequences do not fall into a small number of homogeneous groups. Rather, each plant species possesses a heterogeneous collection of interrelated sequences that fall into many subgroups. In the potato, this heterogeneity is so extreme that no two out of 35 sequences examined are the same. We propose that some feature or features peculiar to the genomes of flowering plants promote this heterogeneity.

We also find that the majority of the sequence subgroups found in the potato are present in closely related plant species, suggesting that the evolution of this retrotransposon group in the flowering plants has been influenced by vertical transmission within evolving plant lineages. Lastly, we find some cases where very distantly related plants carry similar reverse transcriptase sequences, suggesting that horizontal transmission of Ty1-copia retrotransposons between plant species has also occurred.

Materials and methods

Plant materials. Plants of the following species were grown from seed in glasshouses: potato (Solanum tuberosum) cv. Désirée and cv. Pentland Squire; capsicum (Capsicum annuum) cv. Bellboy; tomato (Lycopersicon esculentum) cv. Rutgers; tobacco (Nicotiana tabacum) cv. Xanthi; petunia (Petunia hybrida) cv. nana compacta; thorn apple (Datura stramonium); pea (Pisum sativum) cv. Onward; barley (Hordeum vulgare) cv. Tyne.

Molecular analysis. Plant DNAs were isolated either by the method of Saghai-Maroof et al. (1984) or by the following simple method. A small piece of leaf tissue was ground under liquid nitrogen in a microfuge tube and then boiled for 10 min in water. The mixture was subjected to phenol-chloroform extraction, followed by phenol extraction and ethanol precipitation. The crude DNA pellet was redissolved in 20 µl of 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, and 1 µl was used for PCR reactions. Primer oligonucleotides were synthesized using an Applied Biosystems 381A synthesizer. PCR reactions were performed using Cetus reagents and a Hybaid machine programmed for 94 °C for 1 min, followed by 25 or 30 cycles, each comprising 1 min at 94 °C, 1 min at 45 °C and 1 min at 72 °C, with a final elongation at 72 °C for 7 min. Reaction products were separated on 1.5% agarose gels and visualized under UV illumination. DNA fragments were eluted from gel slices by centrifugation through Costar Spin-X columns and purified by phenol-chloroform extraction followed by ethanol precipitation. Prior to cloning, the fragments were treated with T4 polynucleotide kinase followed by Klenow polymerase to create blunt, 5'-phosphorylated ends, which were subcloned into SmaI-digested, phosphatase-treated M13mp18. Nucleotide sequences of single-stranded M13 DNAs were determined by the dideoxy method using Sequenase enzyme (US Biochemicals) and the protocol supplied. The genomic potato λ library (a gift from Professor Lothar Willmitzer, Institut für Genbiologische Forschung, Berlin) was probed at low stringency (0.1 × SSC, 37 °C) with M8, M159 and M166 probes, and at higher stringency with a probe for the patatin gene (present in more than 40 copies per tetraploid genome; Mignery et al. 1988) as a control for the number of genomes screened.
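Because the patatin probe is hybridized to the same library, it acts as a yardstick for how many genome equivalents were screened. A minimal sketch of that normalization, assuming that positive plaque counts scale linearly with copy number for both probes; the function name and the plaque counts are hypothetical, not values from this study. Since patatin is given only as "more than 40 copies", plugging in 40 tends to underestimate the probed family's copy number.

```python
def estimate_copy_number(probe_hits: int, control_hits: int,
                         control_copies: float = 40.0) -> float:
    """Estimate copies per genome of a probed sequence family.

    Hypothetical helper: scales the control gene's copy number by the ratio of
    positive plaques obtained with each probe on the same set of filters.
    """
    if control_hits <= 0:
        raise ValueError("the control probe must detect at least one plaque")
    return control_copies * probe_hits / control_hits


# Hypothetical plaque counts, for illustration only (not data from this study).
estimate = estimate_copy_number(probe_hits=30, control_hits=60)
print(f"~{estimate:.0f} copies per tetraploid genome")  # ~20 copies per tetraploid genome
```

The linear-scaling assumption ignores differences in probe length, hybridization efficiency and the number of copies carried per λ clone, so such an estimate is only a rough order of magnitude.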

Computer analysis. Computing was carried out on the VAX facilities of the University of Leicester and the SERC Computing Centre, Daresbury, UK. Plant nucleotide sequences were first aligned relative to the corresponding region of the Tnt1 retrotransposon (Grandbastien et al. 1989) using the GAP programme of the UWGCG Sequence Analysis Package (Devereux et al. 1984). Frameshifts were corrected by insertion of a non-specific nucleotide, and the resulting peptide sequences were cross-compared using the CLUSTAL sequence analysis programme (Higgins and Sharp 1988) or the PAPA set of programmes (Feng and Doolittle 1990; Doolittle and Feng 1990). CLUSTAL compares each sequence with each of the others and generates a similarity matrix. The most similar pair of sequences in this matrix is aligned and replaced by a consensus, and the computation cycle is repeated until all of the sequences are aligned relative to each other, preserving all introduced gaps. This method is similar to the first step of the PAPA package. A relationship tree is generated from the CLUSTAL similarity scores produced during the alignment. CLUSTAL has theoretical limitations that argue against its ability to generate accurate phylogenetic relationships, but it nevertheless generates a relationship tree from the retrovirus data used by Doolittle et al. (1989) that is topologically virtually identical to the phylogenetic tree deduced by these authors (28 out of 30 branch points in the two analyses are identical; data not shown). We therefore used this tree as a broad guide for running the TREE and PAPA3 programmes of Feng and Doolittle. These programmes are both capable of generating credible phylogenetic trees but cannot simultaneously handle the number of sequences used here. Any discrepancies between the trees generated by TREE and PAPA3 were resolved by submitting the topology suggested by one programme to analysis by the other. In such cases, the tree with the lowest standard deviation (Doolittle and Feng 1990) was adopted.
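The progressive strategy described above starts from a pairwise similarity matrix and repeatedly joins the most similar sequences or groups. The sketch below illustrates only the guide-tree part of that idea, under simplifying assumptions: the peptides are already aligned to equal length, similarity is plain percent identity over ungapped columns, and groups are joined by average linkage (UPGMA-style). It is not CLUSTAL's actual scoring scheme, and the function names and toy peptides are hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither peptide has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def guide_tree(seqs: dict[str, str]) -> str:
    """Join the most similar groups first (average linkage); return a Newick-style topology."""
    groups = {name: [name] for name in seqs}  # group label -> member sequence names
    sim = {frozenset(p): identity(seqs[p[0]], seqs[p[1]]) for p in combinations(seqs, 2)}

    def group_sim(g1: str, g2: str) -> float:
        # Average pairwise identity between the members of the two groups.
        scores = [sim[frozenset((m, n))] for m in groups[g1] for n in groups[g2]]
        return sum(scores) / len(scores)

    while len(groups) > 1:
        g1, g2 = max(combinations(groups, 2), key=lambda p: group_sim(*p))
        groups[f"({g1},{g2})"] = groups.pop(g1) + groups.pop(g2)
    return next(iter(groups))


# Toy pre-aligned peptides (hypothetical, not sequences from this study).
toy = {"A": "TAFLHGKE", "B": "TAFLHGKD", "C": "NVFLNGRE", "D": "NVFLNGRD"}
print(guide_tree(toy))  # ((A,B),(C,D))
```

The closely similar pairs join first and only then are the two groups merged, which is the sense in which such a tree can serve as a "broad guide" rather than a rigorous phylogeny.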

Results

Multiple Ty1-copia group sequences in potato

To detect Ty1-copia group members specifically in plant genomes by PCR, we searched the sequences of the Ty1-copia group for a pair of priming regions that would allow detection of plant Ty1-copia group members while excluding all other retroelements, including other retrotransposon families. We decided to use the regions shown in Fig. 1. The upstream priming region we chose is completely conserved in the two plant Ty1-copia group members, Ta1 and Tnt1, whose sequences had been reported when the study was initiated (Voytas and Ausubel 1988; Grandbastien et al. 1989). It is less well conserved in other Ty1-copia group members, including the Tst1 retrotransposon of potato, and it is almost undetectable in all other reverse transcriptases and retrotransposons (Fig. 1; see Xiong and Eickbush 1990). The downstream primer region encodes the YVDD peptide motif, which is the most strongly conserved sequence in the reverse transcriptase protein.


A
5' ACNGCNTTPyPyTNCAPyGG 3'   upstream primer    (T A F L H G)
3' ATPuCANCTPuCTPuTACPuA 5'  downstream primer  (Y V D D M L)

B
Upstream  Downstream  Element  Species
copia-Ty1 family (about 75 residues separate the two regions in Tnt1)
TAFLHG    YVDDMLI     Tnt1     N. tabacum
NVFLNG    YVDDIIL     Tst1     S. tuberosum
TAFLNG    YVDDVVI     copia    Drosophila
TAYLNS    VYVDDLI     1731     Drosophila
SAYLYA    FVDDMVL     Ty1      S. cerevisiae
Other retroelements
DAYFSV    YMDDLYV     HIV1     Human
DAFFCL    YVDDLLL     MoMLV    Mouse
DCFFSI    YMDDLLL     RSV      Chicken
SGYHQI    YVDDVII     gypsy    Drosophila
KAYLHV    YLDDLLI     DIRS     Dictyostelium

Fig. 1A, B. Oligonucleotide primers used in this study and corresponding regions in Ty1-copia group retrotransposons. A Nucleotide and peptide sequences of the primers used. N indicates any base; Pu, purine bases (A and G); Py, pyrimidine bases (C and T). B Amino acid sequence homologies between different retroelement peptide sequences and the priming regions. Sequences are from Xiong and Eickbush (1990) and Camirand and Brisson (1990).
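The primer notation of Fig. 1A can be checked mechanically. The sketch below is a set of hypothetical helpers, not anything used in the study: it expands the N, Pu and Py symbols into base sets, counts how many distinct oligonucleotides each degenerate mixture contains, and complements the downstream primer (written 3' to 5' beneath the coding strand in Fig. 1A) to recover the coding-strand reading TAY GTN GAY GAY ATG YT, i.e. Y V D D M L. The 17-nt target matched at the end is a made-up sequence encoding TAFLHG, not a published element.

```python
import re
from math import prod

# Base sets for the symbols used in Fig. 1 (N = any base, Pu = A/G, Py = C/T).
BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "Pu": "AG", "Py": "CT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "Pu": "Py", "Py": "Pu"}


def tokenize(primer: str) -> list[str]:
    """Split a primer into positions, treating 'Pu' and 'Py' as single symbols."""
    return re.findall(r"Pu|Py|[ACGTN]", primer)


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the degenerate primer mixture."""
    return prod(len(BASES[t]) for t in tokenize(primer))


def to_regex(primer: str) -> re.Pattern:
    """Regular expression matching any concrete sequence in the mixture."""
    return re.compile("".join(f"[{BASES[t]}]" for t in tokenize(primer)))


def sense_from_antisense(primer_3to5: str) -> str:
    """Complement a primer written 3'->5' to read the coding strand 5'->3'."""
    return "".join(COMPLEMENT[t] for t in tokenize(primer_3to5))


UPSTREAM = "ACNGCNTTPyPyTNCAPyGG"     # 5'->3', encodes T A F L H G
DOWNSTREAM = "ATPuCANCTPuCTPuTACPuA"  # written 3'->5', pairs with Y V D D M L

print(degeneracy(UPSTREAM), degeneracy(DOWNSTREAM))  # 512 64
print(sense_from_antisense(DOWNSTREAM))  # TAPyGTNGAPyGAPyATGPyT
# Hypothetical target: ACA GCA TTT CTT CAT GG reads T A F L H G(ly)
print(bool(to_regex(UPSTREAM).fullmatch("ACAGCATTTCTTCATGG")))  # True
```

A degeneracy of a few hundred is what lets a single upstream primer anneal to many diverged copies that still encode TAFLHG, which is relevant to interpreting the heterogeneity of the amplified products.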

[Fig. 2 alignment not reproduced. Panels: M143 subgroup; M200 subgroup; M41 subgroup; M166 subgroup; M36 subgroup; M168 subgroup; other copia-Ty1 family retrotransposons (copia, Tnt1); non-copia-Ty1 family retrotransposons (HEPB, 412, DIRS).]

Fig. 2. Alignment of predicted peptide sequences of potato retrotransposon subclones and other retrotransposons. Potato-derived sequences are placed in subgroups according to sequence similarity. Amino acids shared by more than 50% of the sequences are shaded. The sequences of the Tst1, Tnt1, copia, 412 and DIRS retrotransposon fragments and the hepatitis B virus (HEPB) fragment are also shown (Xiong and Eickbush 1990; Camirand and Brisson 1990). X indicates a -1 nucleotide frameshift; asterisks denote nonsense codons. The clones are derived from the Pentland Squire (P), Désirée (D) and Kennebec (K) cultivars or from a λ library derived from an unknown cultivar. Clones containing in-frame frameshifts or nonsense codons are also indicated in the left column by F and S, respectively. A 51-bp insert in the hepatitis B virus sequence is denoted by a filled square.


[Fig. 3 tree not reproduced; horizontal axis: distance units, 0 to 40.]

Fig. 3. Phylogenetic tree of predicted peptide sequences of Ty1-copia group members of potato. Divergences in distance units (Feng and Doolittle 1987) are indicated by horizontal branch lengths; vertical lengths have no significance.

We first chose potato (Solanum tuberosum) as a source of DNA for analysis. PCRs on DNA isolated from two potato cultivars, Pentland Squire and Désirée, both generated only fragments of the expected size of approximately 260 base pairs (data not shown). Analysis of fragment subclones showed, surprisingly, that no two sequences were the same (see below), though all were obviously related to the corresponding region of the Tnt1 retrotransposon.

To obtain an approximate estimate of the number of different Ty1-copia group member sequences represented among the potato PCR fragments, we determined the sequences of 28 potato PCR subclones, plus 2 fragments isolated from a potato genomic λ library by screening with one particular PCR subclone (M8; see Fig. 2). To determine the sequence similarities within this set, we converted the nucleotide data into peptide sequences and compared each subclone sequence with the entire set, using three multiple alignment computer programmes (see Materials and methods). We also included the corresponding region of the Tst1 potato retrotransposon (Camirand and Brisson 1990) in this analysis. All 30 sequences determined by us were obviously derived from Ty1-copia group retrotransposons, as judged by their sequence similarities with copia and Tnt1 and their lack of similarity with the closest relatives of the Ty1-copia retrotransposon group, which are the hepadnavirus group and the 412 and DIRS retrotransposons (Xiong and Eickbush 1990; Fig. 2). However, no two of the 31 potato-derived sequences in Fig. 2 are identical. Therefore, the PCR band contains many more than 30 different sequences characteristic of the reverse transcriptase gene of Ty1-copia group retrotransposons.
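The step from "no two clones are identical" to "many more than 30 different sequences" can be made semi-quantitative with a birthday-problem argument. As a sketch under a strong simplifying assumption that is ours, not the authors': if the PCR product contained K equally abundant sequence types and clones were sampled at random, the probability that n clones are all distinct is the product of (1 - i/K) for i = 0 to n-1. Unequal abundances make repeats more likely, so the smallest K giving a non-negligible probability (here an arbitrary 5%) is a conservative lower bound. The example uses only the 28 PCR subclones, since the two λ clones were selected with the M8 probe rather than sampled at random; the function names are hypothetical.

```python
import math


def prob_all_distinct(n_clones: int, n_types: int) -> float:
    """P(n random clones are all different) if n_types types are equally abundant."""
    if n_clones > n_types:
        return 0.0
    # Sum logs to avoid underflow: P = prod_{i=0}^{n-1} (1 - i/K)
    return math.exp(sum(math.log1p(-i / n_types) for i in range(n_clones)))


def min_types(n_clones: int, alpha: float = 0.05) -> int:
    """Smallest K for which seeing no repeats among n_clones has probability >= alpha."""
    k = max(n_clones, 1)
    while prob_all_distinct(n_clones, k) < alpha:
        k += 1
    return k


print(min_types(28))  # lower bound on distinct types under the equal-abundance assumption
```

Because real PCR pools are skewed by template abundance and primer match, the true number of distinct sequences consistent with 28 all-different clones is expected to be at least this large.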

It was immediately obvious that some of the potato sequences in Fig. 2 are very similar to each other. To get a better idea of the overall spectrum of similarities, the aligned sequences were used to generate a phylogenetic tree, which is shown in Fig. 3. Many of the sequences fall into subgroups, and the sequences in Fig. 2 have been arranged to reflect these groupings. The most closely related pair of sequences are the two λ library recombinants, M200 and M210, and these form the M200 subgroup in Fig. 2, together with M8, the probe used to isolate them. These three sequences are approximately 30% divergent in their amino acid sequences from the subgroup containing M143 (the M143 subgroup), which itself comprises five sequences with between 6% and 14% amino acid divergence between the various members. Four other subgroups are discernible, each named after the top sequence of the subgroup in Fig. 2. The previously determined Tst1 sequence (Camirand and Brisson 1990) is a distant relative of the M166 subgroup. In addition, several sequences show more than 50% amino acid divergence from every other subclone (Fig. 2); these are placed in Fig. 2 near the subgroups they most resemble.

These data show that there is a very high degree of sequence heterogeneity among Ty1-copia group members in the potato genome, with respect to both the number of different types of retrotransposon sequence and the degree of sequence divergence among many of these fragments. For instance, the least similar pair of clones, M8 and M34, are 75% divergent. This exceeds the divergence between the copia and 1731 retrotransposons of Drosophila (59%). Even the most similar sequences, M200 and M210, show greater heterogeneity (4.6%) than do the different copies of the copia or Ty1 retrotransposons of D. melanogaster and S. cerevisiae, respectively, which generally show less than 2% predicted amino acid divergence between individual copies (Mount and Rubin 1985; Emori et al. 1985; Flavell et al. 1981; Boeke 1989).
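The divergence figures quoted here are percentages of differing residues between aligned peptides. A minimal p-distance sketch, assuming gapped columns are simply skipped; the authors' exact treatment of gaps and frameshift-corrected positions is not stated here, so this illustrates the measure rather than reproducing their calculation. The function name is hypothetical, and the toy comparison uses the six-residue upstream motifs of Tnt1 and Tst1 from Fig. 1B.

```python
def percent_divergence(a: str, b: str, gap: str = "-") -> float:
    """Percentage of aligned, ungapped positions at which two peptides differ (p-distance)."""
    if len(a) != len(b):
        raise ValueError("peptides must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * sum(x != y for x, y in compared) / len(compared)


# Upstream motifs of Tnt1 and Tst1 (Fig. 1B): T/N, A/V and H/N differ.
print(percent_divergence("TAFLHG", "NVFLNG"))  # 50.0
```

A raw p-distance saturates for highly diverged pairs because it does not correct for multiple substitutions at the same site, so values such as 75% should be read as a minimum measure of divergence.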

Because these data were so unexpected, we wondered whether they derived from PCR artefacts. The possibility that these sequences derive from cross-contamination with exogenous DNAs and are not actually present in the potato genome is very remote, for the following reasons. First, no amplified fragments were found in any PCR controls run without added DNA (data not shown). Second, we had already isolated homologues of M8 (M200 and M210; see above) from a potato genomic library, and we also isolated M159 and M166 homologues (data not shown). Lastly, we detected, on a Southern blot of restriction-digested potato DNA, sequences homologous to the M166 fragment (data not shown). Therefore, all three subgroups that we tested are present in the potato genome.

[Fig. 4 alignment not reproduced. Panels: Solanaceae dicots (capsicum, tomato, tobacco, petunia, datura); other dicots (pea); monocots (barley); other organisms (Dm88, 1731, copia, Ty1).]

Fig. 4. Alignment of Ty1-copia group retrotransposon subclones from plant species other than potato and previously studied group members from other classes of organism. ? indicates ambiguous sequence; x = +1 nt frameshift. For explanation of other symbols refer to Fig. 2.

Another possibility was that our PCRs always generate heterogeneous fragments. To exclude this possibility, we amplified 10⁶-fold dilutions of four different potato PCR clones, using the same oligonucleotide primers shown in Fig. 1, and determined the sequences of two independent subclones of each clone. In no case did we find differences within any pair of subclones, apart from the use of different priming oligonucleotides. We also amplified Drosophila DNA with the same primers and obtained a band, which was subcloned and subjected to sequence analysis. Five out of 8 subclones are identical, two differ from these, each by a single nucleotide substitution, and the last subclone has a 4-basepair deletion relative to the five identical sequences (data not shown). Therefore, our experimental approach does not, by itself, generate the observed heterogeneity. Incidentally, this Drosophila sequence, Dm88, is a previously unreported member of the Ty1-copia group (Fig. 4). Fragments derived from the copia and 1731 retrotransposons were not found in the Drosophila amplification, thus validating our choice of the upstream oligonucleotide primer, which cannot recognize the diverged upstream regions of these genetic elements (see Fig. 1).

To get a very rough estimate of the total copy number of one of the sequence subgroups in the potato genome, we screened a potato genomic library with the M159 subclone under conditions which would pick up members of the M41 subgroup only (see Materials and methods). This showed that there are more than 60 members of the M41 subgroup per tetraploid genome, whereas our PCR subcloning yielded 5 members, all of which were distinct sequences, out of the 30 clones analysed (Fig. 3). If this result is typical of the other sequences in Fig. 2, there are very approximately 360 (60 × 30/5) Ty1-copia group members in the potato genome, and the copy number of any Ty1-copia retrotransposon with effectively the same sequence is quite low (typically less than 10). We also performed Southern analysis with the M159, M166 and M8 subclones as probes, again at a hybridisation stringency that would only pick up these subgroups. This confirmed that the total copy numbers of the retrotransposons in each of these subgroups were probably as high as for the M41 subgroup (data not shown).
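The extrapolation in the preceding paragraph is simple proportional scaling. The Python sketch below reproduces it under the text's assumption that the M41 subgroup is typical, plus one further assumption of ours: that all subgroups are amplified and cloned with equal efficiency. The function name is hypothetical, introduced only for illustration.

```python
def extrapolate_total_copies(subgroup_copies: float,
                             subgroup_clones: int,
                             total_clones: int) -> float:
    """Scale one subgroup's library-screen copy number by that subgroup's
    share of the PCR subclones (hypothetical helper, for illustration)."""
    if subgroup_clones <= 0 or total_clones < subgroup_clones:
        raise ValueError("need 0 < subgroup_clones <= total_clones")
    # Assumption: every subgroup is amplified and cloned equally efficiently,
    # so its share of PCR clones equals its share of genomic copies.
    return subgroup_copies * total_clones / subgroup_clones


# Figures from the text: >60 M41-subgroup copies per tetraploid genome,
# 5 M41-subgroup members among 30 analysed PCR clones.
estimate = extrapolate_total_copies(subgroup_copies=60, subgroup_clones=5,
                                    total_clones=30)
print(estimate)  # 360.0
```

Because 60 is a lower bound from the library screen, the result is best read as an order-of-magnitude figure rather than a count.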

Ty1-copia group sequence heterogeneity in other plant species

We conclude from the above control experiments that the heterogeneity seen in our subclones is not artefactual but rather reflects the situation for the potato Ty1-copia group members. We next asked whether this variation also exists in plant genomes other than that of potato. DNAs were isolated from a variety of species, including several members of the Solanaceae family (of which potato and tobacco are members), together with a more diverged dicot species, Pisum sativum (pea), and a monocot species, Hordeum vulgare (barley). All of these species generated a PCR band with the oligonucleotide primers used for potato (data not shown). The results of sequence analysis of subclones derived from these fragments are shown in Fig. 4, and the derived phylogenetic tree for all the plant fragments in this study, together with the corresponding regions of the published plant Ty1-copia group retrotransposons Ta1-Ta10, Tnt1 and Tst1, is shown in Fig. 5.

Fig. 5. Phylogenetic tree of predicted peptide sequences of plant members of the Ty1-copia group. Divergences in distance units (Feng and Doolittle 1987) are indicated by horizontal branch lengths; the vertical lengths have no significance. Ta1-Ta10 are from Arabidopsis thaliana (Konieczny et al. 1991), and the divergences within Ta1-Ta6 and Ta8-Ta10 are shown by boxes. The sequences designated M, Bar, Cap, Dat, Pet, Pis, Tom and Tnt are derived from potato, barley, capsicum, Datura stramonium, petunia, pea, tomato and tobacco respectively (see Figs. 2 and 4 and Materials and methods). The tree is rooted to the Ty1 element
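Branch lengths in Figs. 5 and 6 are given in Feng and Doolittle (1987) distance units. As we understand that method, an alignment score is first normalised against self-comparison and shuffled-sequence scores, S_eff = (S_real - S_rand)/(S_ident - S_rand), and then converted to a distance D = -100 ln(S_eff). The Python sketch below implements that conversion under this assumption; the scores in the usage line are hypothetical, not values from this study, and the formula should be checked against the original paper before use.

```python
import math


def feng_doolittle_distance(s_real: float, s_self_a: float,
                            s_self_b: float, s_rand: float) -> float:
    """Distance from alignment scores, assuming the Feng-Doolittle form
    D = -100 * ln(S_eff).

    s_real   -- score of the pairwise alignment of sequences A and B
    s_self_a -- score of A aligned with itself (likewise s_self_b for B)
    s_rand   -- mean score of alignments of shuffled versions of A and B
    """
    s_ident = (s_self_a + s_self_b) / 2.0              # mean self-score
    s_eff = (s_real - s_rand) / (s_ident - s_rand)     # 1 = identical, 0 = random
    if not 0.0 < s_eff <= 1.0:
        raise ValueError("effective similarity must lie in (0, 1]")
    return -100.0 * math.log(s_eff)


# Hypothetical scores, chosen only to illustrate the scale of the units.
d = feng_doolittle_distance(s_real=300, s_self_a=420, s_self_b=400, s_rand=100)
print(round(d, 1))
```

Identical sequences give D = 0, and D grows without bound as the real score approaches the random expectation.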

Each plant species which we have studied contains heterogeneous reverse transcriptase fragments which are characteristic of Ty1-copia group retrotransposons. Several of these fragments fall into the groups identified for the potato sequences; for instance, the Cap21 fragment, from Capsicum annuum, is a member of the M143 subgroup, and the previously described Tnt1 sequence from Nicotiana tabacum (Grandbastien et al. 1989) is a member of the M200 subgroup (Fig. 5). In fact, every potato subgroup in Fig. 3, with the exception of the M41 and M166 subgroups, is shown in Fig. 5 to be present in other plant species. Other fragment sequences, such as Dat183 and Dat188 (from Datura stramonium), fall into the gaps between subgroups. The Ta1-Ta6 elements of Arabidopsis thaliana (Konieczny et al. 1991) form a new, relatively homogeneous subgroup which is related to Ta7 and to Tnt14 from tobacco, and the Ta8-Ta10 elements are related to the M35 potato sequence.

A comparison of Figs. 3 and 5 shows that, in general, adding these other reverse transcriptase sequences to the analysis increases the complexity of branching in the existing tree without introducing significantly longer branches. This means that the most diverged clones in the potato are as divergent from each other as are randomly chosen reverse transcriptase subclones from the nine plant species in this study.

Another interesting conclusion that can be drawn from the data in Fig. 5 is that several pairs of plant species which are far diverged from each other carry reverse transcriptase fragments which are reasonably well conserved. For instance, the Bar30 sequence is the closest relative of the M147 potato fragment that we have found, sharing 57% predicted amino acid identity with it (Fig. 5). We will return to the implications of these observations for models of Ty1-copia group evolution in the Discussion.
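Percentages such as the 57% shared by Bar30 and M147 are identities between aligned peptide fragments. The exact gap convention used for these figures is not stated in this section, so the Python sketch below assumes one common choice: gapped columns are skipped and identity is taken over the remaining compared positions. The sequences in the usage line are a hypothetical toy alignment, not the real M147 or Bar30 peptides.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned peptide sequences.

    Assumption: columns where either sequence has a gap are skipped, and
    the denominator is the number of ungapped, compared columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue                      # skip gapped columns
        compared += 1
        identical += (x == y)             # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (not the published sequences).
toy = percent_identity("YVDDMLI-TG", "YVDDLLVATG")
print(f"identity {toy:.1f}%, divergence {100 - toy:.1f}%")
```

Divergence is taken here as 100 minus percent identity, which is also an assumption about how the percentages in the text relate to one another.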

Incidentally, out of the three tobacco PCR subclones which we isolated, one was identical to the corresponding region of Tnt1, suggesting that this sequence is strongly represented in the tobacco genome. Indeed, probing Southern blots of potato and tobacco DNAs with the M8 subclone at low stringency shows that the M200 subgroup (of which M8 and Tnt1 are members) is present in higher copy number in tobacco than in potato (data not shown). This is presumably because of the preponderance of Tnt1 elements in tobacco (Grandbastien et al. 1989). However, Tnt1 is by no means the only Ty1-copia group sequence in the tobacco genome, because both of the other tobacco-derived subclones (Tnt14 and Tnt15) are only about 60% identical to it at the amino acid level (Fig. 4).

Finally, we compared a representative sample of potato Ty1-copia group members, containing single examples of each of the subgroups in Fig. 2 plus several ungrouped sequences, with Ty1-copia group members from Drosophila and yeast (Fig. 6). We have used a star representation for this tree because it shows more clearly the branch length distances between each sequence and all of the others. Here again, inclusion of Drosophila sequences in the potato sequence analysis does not introduce significantly longer branch lengths. For instance, M168 is less divergent from copia (62%) than it is from M147 (66%). This means that the diversity of Ty1-copia sequences within the potato genome is comparable to that between the potato and Drosophila genomes. The converse is also true; namely, Dm88, from Drosophila, is more similar to M35 than it is to copia. Only the Ty element of yeast is significantly more diverged from these sequences than they are from each other.

Fig. 6. Phylogenetic tree for selected Ty1-copia group members. Divergences in distance units (Feng and Doolittle 1987) are indicated by branch lengths
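The comparisons above amount to asking, for a given fragment, which sequence is its nearest relative and whether that relative comes from another species. The Python sketch below does this lookup over the few pairwise divergences quoted in the text only; it is a deliberately partial table, not the full distance matrix behind Figs. 5 and 6, and the Bar30/M147 value assumes divergence = 100 minus the quoted 57% identity.

```python
# Pairwise percent divergences quoted in the text (partial table only).
DIVERGENCE = {
    frozenset({"M168", "copia"}): 62.0,
    frozenset({"M168", "M147"}): 66.0,
    frozenset({"M147", "Bar30"}): 100.0 - 57.0,  # assumed 100 - % identity
    frozenset({"copia", "1731"}): 59.0,
    frozenset({"M143", "M44"}): 60.0,
}

SPECIES = {"M168": "potato", "M147": "potato", "M143": "potato",
           "M44": "potato", "Bar30": "barley",
           "copia": "Drosophila", "1731": "Drosophila"}


def nearest_listed_relative(name: str) -> tuple[str, float]:
    """Closest sequence to `name` among the pairs present in DIVERGENCE."""
    candidates = [(next(iter(pair - {name})), d)
                  for pair, d in DIVERGENCE.items() if name in pair]
    if not candidates:
        raise KeyError(f"no divergence values listed for {name}")
    return min(candidates, key=lambda item: item[1])


for seq in ("M168", "M147"):
    other, d = nearest_listed_relative(seq)
    print(f"{seq} ({SPECIES[seq]}): nearest listed = {other} "
          f"({SPECIES[other]}), {d:.0f}% divergent")
```

With these quoted values, both potato fragments have their nearest listed relative in a different organism, which is the pattern the text describes.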

Discussion

Members of the Ty1-copia group of retrotransposons have been found in a diverse collection of eukaryotes, including fungi, insects and plants. This study was designed to use the PCR technique to study the evolution of the Ty1-copia group in flowering plants. By choosing one PCR priming region which is characteristic of only Ty1-copia group retrotransposons and another which is shared among the majority of reverse transcriptases, we have been able to amplify fragments related to the reverse transcriptase of the Ty1-copia group from a wide variety of flowering plant species.

We have shown that these fragments comprise a very heterogeneous collection of reverse transcriptase sequences. Because this heterogeneity was unexpected, we performed control experiments to eliminate the possibility of artefact in our results. These show that our sequences are not the result of exogenous contamination and, in three out of three cases tested, are derived from the potato genome. We have also shown that our PCR amplifications generate homogeneous fragments when the templates themselves are homogeneous.

Extreme heterogeneity of Ty1-copia group retrotransposons in plants

The data presented here show that there are more types of Ty1-copia group retrotransposon in plants than were known to exist in all eukaryotes prior to this study. The isolation of 31 different potato sequences from 31 randomly picked subclones suggests that there are many more which we have failed to isolate. We estimate that the total number of different Ty1-copia group retrotransposon sequences in the potato tetraploid genome is greater than 360 (see Results). Furthermore, not all Ty1-copia group retrotransposons would have been amplified from the potato genome by our choice of PCR primers; the Tst1 retrotransposon, for instance, would be undetectable (Camirand and Brisson 1990), owing to its lack of a TAFLHG primer region (Fig. 1).

We have sampled far fewer reverse transcriptase sequences from plant species other than potato, but those that we picked are all distinct, both within and between species (Figs. 4 and 5). This suggests that these seven plant genomes also carry a highly heterogeneous collection of Ty1-copia group retrotransposons.

Why are plant Ty1-copia group retrotransposons so heterogeneous?

The great complexity of our potato-derived sequences is in strong contrast to the situation in Drosophila and Saccharomyces cerevisiae. Both these organisms contain a small number of homogeneous Ty1-copia group members (Mount and Rubin 1985; Emori et al. 1985; Flavell et al. 1981; Boeke 1989). The degree of divergence between different Ty1-copia group members in Drosophila, such as copia and 1731 (59%), is somewhat less than the maximum divergence between different potato sequences but is similar to that between the members of some different potato subgroups (for instance, M143 and M44 are 60% divergent). However, every retrotransposon group in Drosophila and S. cerevisiae is much more homogeneous than the plant sequences described here, with approximately 1% divergence being observed between group members. The closest parallel to the complexity seen here is found in Arabidopsis thaliana, which contains approximately ten different Ta elements which fall into two subgroups (Konieczny et al. 1991; Fig. 5). This degree of heterogeneity is considerably less than that described here for the potato, where 31 subclones comprise 31 different sequences, falling into six subgroups and seven ungrouped sequences. However, it exceeds the heterogeneity seen in D. melanogaster, which has more Ty1-copia group elements per genome than does A. thaliana. It thus seems that plant Ty1-copia group retrotransposons are inherently more susceptible to sequence variation than their counterparts in Drosophila.

The great variation within the Ty1-copia group retrotransposons in plant genomes is presumably due to features which are shared by these genomes but absent from those of D. melanogaster and S. cerevisiae. One feature which may contribute to this variation is the distinctive strategy adopted by plants for determination of their germlines. Plants contain meristematic cells which are capable of differentiating into somatic tissue or reproductive organs throughout the life cycle. This means that the cells which give rise to new organisms have gone through many somatic cell divisions. Mutations during somatic divisions affect primarily the cells in which they occur, because plants are tolerant to individual cell damage and only a small proportion of somatic divisions eventually give rise to germ cells, which must be tested by selection at the organismal level. Animals, on the other hand, complete the determination of the germ line early in embryogenesis, and their germ cells may be better protected against mutation.

A further possible cause of this variation may be the great tolerance that plant genomes possess to chromosomal alterations (Walbot and Cullis 1985). In natural populations of plants, polyploidy and supernumerary chromosomes are very common and have little or no impact on organismal survival. The great variation in the retrotransposons resident in plants may reflect this tolerance. Another possibility is that molecular evolution proceeds at a faster rate in plants than in animals, through inherently different error rates of DNA replication and repair.

An additional contributory factor to the extreme heterogeneity of potato Ty1-copia group members may be their copy number. The total genetic divergence between any transposons in a population is proportional to their copy number (Charlesworth 1986). There are hundreds of Ty1-copia group retrotransposons in the genome of tobacco (Grandbastien et al. 1989), and our data suggest that there are at least 360 in the potato, whereas the copia and Ty elements are each present at approximately 10-60 copies per haploid genome (Potter et al. 1979; Boeke 1989). In mammals, high-copy-number retroelements, such as the human retrovirus-related sequence (HuRRS-K) group (Shih et al. 1989) and the A-particle retrotransposons of the mouse (Kuff and Lueders 1988), consist of heterogeneous sequences. Copy number cannot be the only determinant of sequence heterogeneity of these elements in flowering plants, however, since the Ta elements of Arabidopsis thaliana are both heterogeneous and present in low copy number.

In summary, plant genomes seem to be inherently more susceptible to the generation of sequence diversity in Ty1-copia group retrotransposons than are the genomes of Drosophila and S. cerevisiae. An additional factor promoting heterogeneity may be the copy number of the retrotransposon, and this feature may be partly responsible for the extreme heterogeneity seen for the potato.

The majority of potato Ty1-copia group retrotransposons are defective

Approximately 40% of the Ty1-copia group retrotransposon fragments in Figs. 2 and 4 bear translational stop codons. Errors generated by PCR might account for some of these, but the reported error rate of PCR (Gyllensten 1989) and our control amplifications (see Results) indicate that one base pair per subclone, or less, is altered by PCR. As only 3 of the 64 codons are stop codons, fewer than 1 in 20 base substitutions would generate translational stop codons. Therefore, only a small minority of the stop codons in our sequences can be artefactual. These subclones comprise only about 6% of the open reading frames of the retrotransposons to which they belong. Therefore, if stop codons are randomly distributed within the protein-coding regions, the majority of these regions (our calculations suggest greater than 99%) contain nonsense codons. However, our data are also consistent with the existence of a significant population of active transposons superimposed upon a background of defective elements, each carrying a large number of translational discontinuities.
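The argument in the preceding paragraph reduces to two small calculations, sketched below in Python. Following the text's reasoning, 3/64 is used as a rough bound on the chance that a substitution creates a stop codon. For the second step we assume that stop codons fall independently and uniformly, so that an ORF of which each fragment covers about 6% behaves like 1/0.06 (about 17) independent fragment-sized windows, each free of stops with probability 0.6. The function names are hypothetical.

```python
def stop_codon_fraction(n_stop: int = 3, n_codons: int = 64) -> float:
    """Share of all codons that are stops; the text uses this as a rough
    bound on how often a random substitution could create a stop."""
    return n_stop / n_codons


def prob_orf_stop_free(frac_fragments_with_stop: float,
                       fragment_share_of_orf: float) -> float:
    """P(an entire ORF carries no stop codon), assuming stops fall
    independently and uniformly, so the ORF behaves like
    1 / fragment_share_of_orf independent fragment-sized windows."""
    windows = 1.0 / fragment_share_of_orf
    return (1.0 - frac_fragments_with_stop) ** windows


frac = stop_codon_fraction()                 # 3/64 = 0.046875, below 1/20
p_clean = prob_orf_stop_free(0.40, 0.06)     # ~40% of fragments carry stops
print(f"stop-codon fraction: {frac:.4f}")
print(f"P(ORF has at least one stop) = {1 - p_clean:.4f}")
```

Under these assumptions only about 2 in 10,000 full-length ORFs would be free of stop codons, consistent with the text's "greater than 99%".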

The Ta retrotransposons of Arabidopsis thaliana also appear to be defective (Konieczny et al. 1991). However, it is important to note that the presence of the active Tnt1 retrotransposon of tobacco (Grandbastien et al. 1989) shows that not all plant Ty1-copia group members are dead relics of ancient active retrotransposons. Active retrotransposons might provide helper functions that enable defective elements to transpose. However, the relatively low copy number of each distinct element (see Results) implies that the replication rate of these elements via retrotransposition is low in the potato. We have not yet performed the experiments that would resolve this question.

Horizontal and vertical transmission of retrotransposons

The second major conclusion from our studies is that Ty1-copia group retrotransposons of different species can in some cases be placed together in phylogenetic subgroups. There are two possible explanations for this. The first postulates that the diversification of these transposons commenced before the separation of the species concerned, and that the descendants of these ancient Ty1-copia elements have been transmitted vertically in these lineages. The second possibility is that Ty1-copia group retrotransposons may have been transferred horizontally between these species in the more recent past. These two possibilities are not mutually exclusive.

If vertical transmission were a major contributor to the spectrum of Ty1-copia group members in plants, we would expect closely related species to contain closely related retrotransposon homologues, whereas distantly related species should, in general, contain more distantly related sequences. This does seem to be the case in Fig. 5, where sequences which fall into the potato subgroups of Fig. 3 tend to come from members of the Solanaceae family. This suggests that these subgroups arose in a Solanaceae progenitor and have persisted and diversified in the different descendant species.

The problem with a model which uses only vertical transmission to explain our data is that it cannot account for the fact that the Ty1-copia group sequences of barley and potato, which diverged from each other approximately 200 million years ago (Wolfe et al. 1989), are quite similar. The most similar pair (M147 and Bar30) share 57% predicted amino acid identity, yet reverse transcriptases are among the most variable proteins known, with only one amino acid out of 77 in this region being invariant amongst all known reverse transcriptases (Xiong and Eickbush 1990). Added to this is the fact that reverse transcription introduces errors at rates of the order of 10⁴- to 10⁶-fold higher than those associated with normal DNA replication. Inactive elements would diverge at substantially the rate of pseudogenes, which is far lower than that for active retroelements but is still very rapid compared to nuclear genes which are under active selection (Doolittle et al. 1989). It therefore seems unlikely that these retrotransposons could persist in diverging monocot and dicot genomes for periods in excess of 10⁸ years and still remain recognisably similar. We conclude that horizontal transmission probably has played a role in the evolution of at least some of these sequences.
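To make the rate argument concrete, the Python sketch below converts the 57% identity between M147 and Bar30 into a Poisson-corrected distance, d = -ln(p), and divides it by the total branch time separating them. This is our illustration, not the authors' calculation: it assumes purely vertical descent from a common ancestor about 200 million years ago (so 2 × 200 million years of independent evolution) and equal substitution rates at all sites. The code only performs the arithmetic; whether the implied rate is plausible for retroelements is the question the text raises.

```python
import math


def poisson_distance(identity: float) -> float:
    """Poisson-corrected amino acid distance (substitutions per site),
    d = -ln(p_identical). Assumes equal rates at all sites."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in (0, 1]")
    return -math.log(identity)


def implied_rate(identity: float, divergence_years: float) -> float:
    """Substitutions per site per year if two lineages have each evolved
    independently for `divergence_years` since a common ancestor."""
    return poisson_distance(identity) / (2.0 * divergence_years)


d = poisson_distance(0.57)              # about 0.56 substitutions per site
r = implied_rate(0.57, 200e6)           # about 1.4e-9 per site per year
print(f"d = {d:.2f}, rate = {r:.1e} per site per year")
```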

Others have suggested that retrotransposons in general, and copia and Ty1 in particular, may have been transmitted horizontally between their hosts, bearing in mind their close relationship to retroviruses (Yuki et al. 1986; Doolittle et al. 1989; Xiong and Eickbush 1990; Konieczny et al. 1991). However, an experiment to test directly for horizontal transmission of the Ty1 element within yeast cocultures showed no evidence for such a process (Garfinkel et al. 1985). A potential barrier to horizontal transmission of Ty1-copia retrotransposons is their lack of an envelope glycoprotein gene (env), which is absolutely required for infectivity of the related retroviruses. These genetic elements would therefore have to rely upon helper vectors, such as viruses (Miller and Miller 1982), to shuttle them between species. Nevertheless, there are a great number of cases where horizontal transmission needs to be invoked to explain the present-day distribution and sequence similarities of Ty1-copia group retrotransposons. They are found not only in organisms with close ecological or evolutionary relationships but also in the genomes of S. cerevisiae, the slime mould Physarum polycephalum (Rothnie et al. 1991), many Drosophila species (Martin et al. 1983), mosquitoes (A. Warren, personal communication) and a fish (A. Flavell and D. Smith, in preparation). In this report in particular, we have shown that potato shares Ty1-copia group subgroups with all other tested Solanaceae plants, and with barley. There are also more distantly related, but recognisably similar, retrotransposon sequences in Arabidopsis and pea (Fig. 5). If all these shared sequences are the result of horizontal transmission, then this must have been a very promiscuous process, and it should be possible to demonstrate whether it is still continuing at the present time.

In conclusion, we suggest that both vertical transmission and horizontal transmission have played roles in the evolution of Ty1-copia group retrotransposons in flowering plants. Determining the relative contributions of each process to the present-day spectrum of this group in these organisms will require a more detailed analysis of these sequences in a variety of species, together with experiments to look for direct evidence of horizontal transmission.

Acknowledgements. We thank Jane Cooke for excellent technical assistance, Andy Leigh-Brown for helpful discussions and Michel Caboche and Marie-Ange Grandbastien for providing Tnt1 probes. We are also grateful to the staff at the Daresbury computer centre for helpful advice and Alastair Murchie for oligonucleotide synthesis. This research and D.B.S. were supported by the Project Grant PG94/505 under the AFRC Plant Molecular Biology Initiative.

References

Bingham PM, Zachar Z (1989) Retrotransposons and the FB element from Drosophila melanogaster. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington DC, pp 485-502

Boeke J (1989) Transposable elements in Saccharomyces cerevisiae. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington DC, pp 335-374

Camirand A, Brisson N (1990) The complete nucleotide sequence of the Tst1 retrotransposon of potato. Nucleic Acids Res 18:4929

Charlesworth B (1986) Genetic divergence between transposable elements. Genet Res 48:111-118

Devereux J, Haeberli P, Smithies O (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12:387-395

Doolittle RF, Feng DF, Johnson MS, McClure MA (1989) Origins and evolutionary relationships of retroviruses. Quart Rev Biol 64:1-29

Doolittle RF, Feng DF (1990) Nearest neighbour procedure for relating progressively aligned amino acid sequences. Meth Enzymol 183:659-669

Emori Y, Shiba T, Kanaya S, Inouye S, Yuki S, Saigo K (1985) Determination of the nucleotide sequences of copia and copia-related RNA in Drosophila VLP. Nature 315:773-776

Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351-360

Feng DF, Doolittle RF (1990) Progressive alignment and phylogenetic tree construction of protein sequences. Meth Enzymol 183:375-387

Flavell AJ, Levis R, Simon MA, Rubin GM (1981) The 5' termini of RNAs encoded by the transposable element copia. Nucleic Acids Res 9:6279-6291

Garfinkel DJ, Boeke JD, Fink GR (1985) Ty element transposition: reverse transcriptase and virus-like particles. Cell 42:507-517

Grandbastien M-A, Spielmann A, Caboche M (1989) Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337:376-380

Gyllensten U (1989) Direct sequencing of in vitro amplified DNA. In: Erlich HA (ed) PCR Technology. Stockton Press, New York, p 54

Higgins DG, Sharp PM (1988) CLUSTAL: a package for performing multiple sequence alignments on a microcomputer. Gene 73:237-244

Konieczny A, Voytas DF, Ausubel FM (1990) Retrotransposable elements in Arabidopsis thaliana. In: Plant Gene Transfer. Alan R. Liss, pp 65-70


Konieczny A, Voytas DF, Cummings MP, Ausubel FM (1991) A superfamily of Arabidopsis thaliana retrotransposons. Genetics 127:801-809

Kuff EL, Lueders KK (1988) The intracisternal A-particle gene family: structural and functional aspects. Adv Cancer Res 51:183-276

Mignery GA, Pikaard CS, Park WD (1988) Molecular characterization of the patatin multigene family of potato. Gene 62:27-44

Miller DW, Miller LK (1982) A virus mutant with the insertion of a copia-like transposable element. Nature 299:562-564

Mount SM, Rubin GM (1985) Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. Mol Cell Biol 5:1630-1638

Potter SS, Brorein WJ, Dunsmuir P, Rubin GM (1979) Transposition of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell 17:415-427

Rothnie HM, McCurrach KJ, Glover LA, Hardman N (1991) Retrotransposon-like nature of Tp1 elements: implications for the organization of highly repetitive, hypermethylated DNA in the genome of Physarum polycephalum. Nucleic Acids Res 19:279-286

Saghai-Maroof MA, Soliman KM, Jorgensen RA, Allard RW (1984) Ribosomal DNA spacer length polymorphism in barley: Mendelian inheritance, chromosomal location and population dynamics. Proc Natl Acad Sci USA 81:8014-8018

Shih A, Misra R, Rush MG (1989) Detection of multiple novel reverse transcriptase coding sequences in human nucleic acids: relation to primate retroviruses. J Virol 63:64-75

Temin HM (1980) Origin of retroviruses from cellular moveable genetic elements. Cell 21:599-600

Varmus HE, Brown P (1989) Retroviruses. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington DC, pp 53-108

Voytas DF, Ausubel FM (1988) A copia-like transposable element family in Arabidopsis thaliana. Nature 336:242-244

Walbot V, Cullis CA (1985) Rapid genomic change in higher plants. Annu Rev Plant Physiol 36:367-396

Wolfe KH, Gouy M, Yang Y-W, Sharp PM, Li W-H (1989) Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA 86:6201-6205

Xiong Y, Eickbush TH (1990) Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9:3353-3362

Yuki S, Ishimaru S, Inouye S, Saigo K (1986) Identification of genes for reverse transcriptase-like enzymes in two Drosophila retrotransposons, 412 and gypsy; a rapid detection method of reverse transcriptase genes using YXDD box probes. Nucleic Acids Res 14:3017-3030

Communicated by J. Finnegan