Analysis of the genome structure of tobacco rattle virus strain PSG

13
Volume 14 Number 5 1986 Nucleic Acids Research Analysis of the genome structur of tobacco rattle virus strain PSG Ben J.C.Cornelissen, Huub J.M.Linthorst, Frans Th.Brederode and John F.Bol* Department of Biochemistry, State University of Leiden, PO Box 9505, 2300 RA Leiden, The Netherlands Received 24 January 1986; Accepted 11 February 1986 ABSTRACT The sequence of the 3'-terminal 2077 nucleotides of genomic RNA 1 and the complete sequence of genomic RNA 2 of tobacco rattle virus (TRV, strain PSG) has been deduced. RNA 2 (1905 nucleotides) contains a single open reading frame for the viral coat protein (209 amino acids), flanked by 5'- and 3'-noncoding regions of 570 and 708 nucleotides, respectively. A subgenomic RNA (RNA 4) was found to lack the 5'-terminal 474 nucleotides of RNA 2 and is the putative messenger for coat protein. The deduced RNA 1 sequence contains the 3'-terminal part of a reading frame that probably corresponds to the TRV 170K protein and reading frames for a 29K protein and a 16K protein. Proteins encoded by the first two reading frames show significant amino acid sequence homology with corresponding proteins encoded by tobacco mosaic virus. Subgenomic RNAs 3 (1.6 kb) and 5 (0.7 kb) were identified as the putative messengers for the 29K and 16K proteins, respectively. At their 3'-termini all PSG-RNAs have an identical sequence of 497 nucleotides; at the 5'-termini homology is limited to 5 to 10 bases. INTRODUCTION A division of tobraviruses into three separate clusters has been pro- posed(1). These are represented by strains of tobacco rattle virus (TRV, serotype I-II), the CAM-strain of TRV (serotype III, also called pepper ringspot virus), and strains of pea early-browning virus (PEBV). Hybridiza- tion experiments using complementary DNA (cDNA) copies of virus RNA showed extensive homology between viruses within one cluster but not between viruses from different clusters (1). The genome of tobraviruses consists of two RNA molecules. The longer genome segment, RNA 1, has a length of approximately 6300 nucleotides in all strains. The length of RNA 2, however, differs from strain to strain and ranges from 1800 to 4000 nucleotides (2). Different strains of the TRV-cluster (serotype I-II) share extensive sequences in RNA 1 but show much diversity in the sequence of their RNA 2 (1). In vitro translation experiments demonstrate that RNA 1 directs the synthesis of a 120K protein and a 170K protein (3-5). The 170K protein is C IRL Press Umited, Oxford, England. Nucleic Acids Research Volume 14 Number 5 1986 2157

Transcript of Analysis of the genome structure of tobacco rattle virus strain PSG

Volume 14 Number 5 1986 Nucleic Acids Research

Analysis of the genome structur of tobacco rattle virus strain PSG

Ben J.C.Cornelissen, Huub J.M.Linthorst, Frans Th.Brederode and John F.Bol*

Department of Biochemistry, State University of Leiden, PO Box 9505, 2300 RA Leiden, TheNetherlands

Received 24 January 1986; Accepted 11 February 1986

ABSTRACTThe sequence of the 3'-terminal 2077 nucleotides of genomic RNA 1

and the complete sequence of genomic RNA 2 of tobacco rattle virus (TRV,strain PSG) has been deduced. RNA 2 (1905 nucleotides) contains a singleopen reading frame for the viral coat protein (209 amino acids), flankedby 5'- and 3'-noncoding regions of 570 and 708 nucleotides, respectively.A subgenomic RNA (RNA 4) was found to lack the 5'-terminal 474 nucleotidesof RNA 2 and is the putative messenger for coat protein. The deduced RNA 1sequence contains the 3'-terminal part of a reading frame that probablycorresponds to the TRV 170K protein and reading frames for a 29K proteinand a 16K protein. Proteins encoded by the first two reading frames showsignificant amino acid sequence homology with corresponding proteins encodedby tobacco mosaic virus. Subgenomic RNAs 3 (1.6 kb) and 5 (0.7 kb) wereidentified as the putative messengers for the 29K and 16K proteins,respectively. At their 3'-termini all PSG-RNAs have an identical sequenceof 497 nucleotides; at the 5'-termini homology is limited to 5 to 10 bases.

INTRODUCTIONA division of tobraviruses into three separate clusters has been pro-

posed(1). These are represented by strains of tobacco rattle virus (TRV,serotype I-II), the CAM-strain of TRV (serotype III, also called pepperringspot virus), and strains of pea early-browning virus (PEBV). Hybridiza-tion experiments using complementary DNA (cDNA) copies of virus RNA showedextensive homology between viruses within one cluster but not between virusesfrom different clusters (1). The genome of tobraviruses consists of two RNAmolecules. The longer genome segment, RNA 1, has a length of approximately6300 nucleotides in all strains. The length of RNA 2, however, differs fromstrain to strain and ranges from 1800 to 4000 nucleotides (2). Differentstrains of the TRV-cluster (serotype I-II) share extensive sequences in RNA 1but show much diversity in the sequence of their RNA 2 (1).

In vitro translation experiments demonstrate that RNA 1 directs thesynthesis of a 120K protein and a 170K protein (3-5). The 170K protein is

C IRL Press Umited, Oxford, England.

Nucleic Acids ResearchVolume 14 Number 5 1986

2157

Nucleic Acids Research

probably produced by readthrough translation of the 120K cistron (5). RNA 2translates into viral coat protein (3,4). In addition to RNAs 1 and 2, asubgenomic RNA 3 of about 1.6 kb has been identified in several strains,which directs the synthesis of a 30K protein. There is a controversy in theliterature about whether the 30K-cistron should be assicned to RNA 1 or RNA 2(5,6).

To obtain further insight in the genome structure of TRV we have clonedcDNA of the RNAs of strains PSG and TCM, isolated in The Netherlands frompotato and tulip, respectively (7). RNA 2 of strain TCM (3.5 kb) is about1600 nucleotides longer than RNA 2 of strain PSG (1.9 kb). Here we report the3'-terminal 2077 nucleotides of PSG-RNA 1 and the complete sequence of RNA 2(1905 nucleotides). The RNA 1 sequence contains three open reading framesencoding the C-terminal region of the 170K protein, a 29K protein and a 16Kprotein, respectively. The amino acid sequences of these proteins were com-pared to corresponding proteins encoded by the tobacco mosaic virus (TMV)genome. RNA 2 was found to contain only one significant open reading frameencoding the capsid protein. The structure of PSG-RNA 2 was compared to theCAM-RNA 2 sequence that was published recently (8). Sequencing of a subge-nomic RNA (RNA 4, 1431 nucleotides) showed it to be derived from PSG-RNA 2.Two other subgenomic RNAs (RNAs 3 and 5) were identified that are probablyinvolved in the expression of the 29K and 16K proteins.

MATERIALS AND METHODSVirus strains. A Dutch TRV strain from potato (PSG) was a gift from Dr.

J.A. de Bokx (Wageningen, The Netherlands). Inocula of two local TRV strainsfrom tulip (TCM and TAK) were kindly provided by Dr. C.J. Asjes (Lisse, TheNetherlands). Strains ORY, SYM and CAM were obtained from Drs. B.D. Harrisonand D.J. Robinson (Dundee, Scotland). A Dutch isolate of pea early-browningvirus (PEBV) was from Dr. L. Bos (Wageningen, The Netherlands).

Purification of viraZ nucZeoprotein and RNA. Virus was purified fromSamsun NN tobacco according to Huttinga (9) and sedimented in sucrosegradients. Fractions containing approximately equal amounts of long and shortparticles were collected. RNA was extracted with phenol/chloroform (1:1) at650C, from purified virus suspensions that had been incubated in 1% SDS for15 min at 37°C. Ethanol precipitated RNA was dissolved in 20 mM tris-HCl,pH 7.6, 0.1 mM EDTA.

Synthesis and cloning of double-stranded cDNA. RNA was polyadenylatedusing ATP:RNA adenyltransferase (Bethesda Research Laboratories). This RNA

2158

Nucleic Acids Research

was copied into DNA by the method of Gubler and Hoffman (10): the firststrand was synthesized with reverse transcriptase (Life Sciences Inc.) usingoligo(dT) as a primer. The second strand was synthesized with a combinationof the enzymes E. coli DNA polymerase I, RNase H and E. coZi ligase (all fromP-L Biochemicals). Double-stranded cDNA was tailed with dCTP and annealed to

Pst I-cut, G-tailed pUC 9 or pBR 322 as described (11). Transformation of

E. coZi and isolation of plasmid DNA were according to Pagert and Ehrlich(12) and to Birnboim and Doly (13), respectively. Clones were characterizedby restriction enzyme and Northern blot analyses.

DNA sequencing. cDNA inserts or restriction fragments thereof, were

subcloned into the mp and tg derivatives of M13 (14,15) and sequenced by thedideoxy chain termination method (16) using (a-35S)dATP (17).

RNA sequencing. RNAs 1, 2 and 4 of TRV strain PSG were decapped withtobacco acid pyrophosphatase (kindly provided by Dr. L. Pinck, Strasbourg,France) and dephosphorylated with bacterial alkaline phosphatase (BethesdaResearch Laboratories). After 5'-labeling with (y-32P)ATP and T4 polynucleo-tide kinase (Bethesda Research Laboratories), the RNAs were separated byelectrophoresis in 1.5% low melting point agarose, partially digested withnuclease P1 (Boehringer) and analysed by the wandering spot technique as

described previously (18).Northern bZotting and hybridization. Samples of 0.5 ug of TRV-RNA were

denatured with glyoxal (19), electrophoresed in 1.5% agarose gels and trans-ferred to Biodyne membranes (Pall Ltd., Portsmouth, U.K.) according to Thomas(20). The hybridization of the blots to cDNA probes, labelled by nick trans-lation (21), was performed as described previously (22).

RESULTSAnatysis of TRV RNAs

Figure 1 shows a Northern blot of RNAs of PEBV, strain CAM and five strainsof the TRV cluster (TCM, TAK, PSG, SYM and ORY). In panel A the blot isprobed with a cDNA clone corresponding to the 3'-terminal 440 nucleotides ofTCM-RNA 1. This clone hybridizes to RNAs 1 and 2 of a number of TRV strains(7). At least five RNAs were consistently found with this probe in prepa-rations of TCM, TAK, PSG, SYM and ORY. The genomic RNA 1 and the subgenomicRNAs 3 and 5 are of the same length in all strains; the length of the

genomic RNA 2 and the subgenomic RNA 4 differs from strain to strain (theirposition is indicated by two small bars in Figure 1A). The size differencebetween RNA 2 and 4 is about 500 nucleotides in all strains. In addition to

2159

Nucleic Acids Research

:-~~ ~~~~ C 0-t_ 0

4,,

Figure 1. Northern blot of RNAs extracted from virus preparations of TRV-strains TCM, TAK, PSG, SYMA! ORY, CAM and pea early-browning (PEB) virus.The blot was probed with 3 P-cDNA corresponding to (A) the 3'-terminal 440nucleotides of TCM-RNA 1, and (B) nucleotides 1193 to 2077 from the 3'-endof PSG RNA 1. The positioii of RNAs 1, 3 and 5 in all strains is indicatedin the margin; the position of RNAs 2 and 4 varies in each strain and isindicated by two small bars.

RNAs 1 to 5 some strainscontain RNAmolecules that have no clear counterpartin other strains. In panel B of Figure 1 the blot was probed with cDNAcorresponding to nucleotides 1193-2077 from the 3'-end of PSG-RNA 1. Thisprobe does not hybridize to RNAs 2, 4 and 5 but it does hybridize to RNAs 1and 3. Thus RNA 3 is derived from RNA 1. Moreover, the results of Figure 1confirm that there is extensive sequence homology between the 3'-terminal2 kb region of RNA 1 of the five strains from the TRV cluster, whereas nohomology is found with strains PEBV and CAM.

The data of Figure 1A show that there is considerable sequence homologyat the 3'-termini of TRV-RNAs 1 and 2 andtherespective subgenomic RNAs. Thesequence of the 5'-terminal 16 nucleotides of PSG-RNAs 1, 2 and 4 was deter-mined with the wandering spot method and is listed in Table 1. In all threeRNAs the cap-structure (2) is followed by the sequence AUAAA---; in RNAs 1and 2 the 5'-homology is extended to 10 nucleotides.3'-TerminaZ nucZeotide seqcuence of PSG-RNA 1

The sequence of the 3'-terminal 2077 nucleotides of PSG-RNA 1 was

2160

Nucleic Acids Research

Table 1: Confirmed and putative 5'-terminal sequences of TRV-RNAs

-5 -1 +1 +10Confirmed 5'-termini PSG-RNA 1 AUAAAACAUUUCAAUC--

PSG-RNA 2 AUAAAACAUUGCACCU--PSG-RNA 4 AUGGC AUAAAUAAACUGUUUG--CAM-RNA 2 AAAAUUUUCAGAAUGU--

Putative 5'-termini PSG-RNA 3 AUGGAAGC AUAUUAAGAGUUUUAC--PSG-RNA 5 AUGC AUAAAGAAAUUUAUUG--CAM-RNA 4 AUGC AUAAUUAUACUGAUUG--

Homologous nucleotides are underlined. Homologous genomic sequencespreceding the 5'-termini of the subgenomic RNAs are given. Sequencesof strain CAM are from Bergh et al. (8).

deduced from a single cDNA clone. Details of the sequencing procedure areavailable on request. Comparison with the 3'-terminal sequence of PSG-RNA 2(see below) indicated that the 3'-terminal 15 nucleotides are missing in thisclone. The sequenced region of PSG-RNA 1 contains three open reading frameswhich are schematically represented in Figure 2; for comparison also thegenome structure of TMV (23) is given.

Figure 3 shows the nucleotide sequence together with the amino acidsequences corresponding to the three open reading frames. The first readingframe probably corresponds to the C-terminal 179 amino acids of the TRV 170K

126 K 183K 30K CP

TMV - RNA 6395 nt

120 K 170K 29K 16K

z -T------ r-IZ PSG - RNA 1 ±6300 nt

CP

,; I_ PSG - RNA 2 1905 nt

CP:F_ '1 PSG - RNA 4 1431 nt

Figure 2. Schematic representation of the genetic information encoded in RNAs1, 2 and 4 of TRV-strain PSG. Regions of the RNAs that have been sequencedare indicated by solid bars; the open box at the end of these bars representsthe 3'-terminal sequence of 497 nucleotides that is identical in the threeRNAs. The location of the cistrons encoding the 120K protein, 170K protein,29K protein, 16K protein and coat protein (CP) is indicated. For comparison,the genome structure of TMV (23) is included.

2161

Nucleic Acids Research

G A HAUGGCGCACAUC

10L P L EUACCAUUAGAGA

100P K L ACAAAGUUGGCUA

190S C Y i;CGUGUUACGAGL

280Y 1 S LACAUCUCGCUGA

370K G D SAAGGUGAUUCUC

460

AAGGAAUUGAAC550

V T L KGGUUACCUUGAM

640K Y F E

GMGUACUUUCA730

Y K M ICUAUAAAAUGAL

820I Q I D

UAUUCAMUCGA910

C S V ACUGUUCAGUUGC

1000V P V C

CGUUCCGGUCUG1090

E L N SAGMCUGAAUAG

1180T I E A

GACGAUUGAGGC1270

N K K NUAAUMGAAAAA

1360G H E I

UGGUCACGAGAC1450

N C G WUAAUUGUGGAUG

1540K R V E

UAAACGCGUUGA1630

N S K NUAAUUCAAGAA

1720D Q L F

ULGAUCMUUGUU1810

UCUUACCAUAGG1900

AGAGAUUAUUGC1990

UACGCCC2077

L V P f K S 1;:UGGUACCAACAAAAAGUGG

20 30K A V M V T Y,AAGCAGUCAUGGUUACGUA

110 120T K W N F E CLCUAAGUGGAAUUUCGAGUG

200 210F V P G KIUUGUGCCAGAUCCAGUAAA

290 300N D S N R A LAUGAUUCCMUAGAGCUCU

380 390V H A L C A L;UUCAUGCGCUUUGUGCGCU

470 480

;CCGGCUCAGGUUGAUUGGA560 570

K K T F E VLGMGMGACUUUCGMGUU

650 6601 R R R E T VLCAGAAGMGAGAAACUGUC

740 750K S Y A F L

IUAAAUCUUACGCGUUUCUG830 840

L L D S R LUCUGUUGGAUUCGAGACUU

920 930L Q Y K V E Y:GCAGUACMGGUUGAAUAC

1010 1020:D G T Y P F;UGACGGUACAUACCCUUUC

1100 1110S D Y I E G

UUCGGAUUACAUUGAAGGC1190 1200

K Y D G P Y:GAAGUAUGACGGUCCUUAC

1280 1290[S S *,CUCGUCUUAAUGCAUAAG

1370 1380C S I G H A

:GUGUAGUAUCGGUCAUGCU1460 1470

f F V C I I ICUUUGUCUGUAUUAUUAUC

1550 1560A R N R E I

AGCAAGAAAUCGAGAGAUC1640 1650

[ S K K K F K,CUCUAAGMGAAAUUCAAA

1730 1740'V F *UGUUUMUUGAUUUUAUUUU

1820 1830

GAAACGGACUUUGUUUGUA1910 1920

;GGGUGAGUAAGUACUUUU2000 2010

;A

.0

;C.

A

IU

IA

D A D T Y N A N S D R T L C A L L S KGAUGCUGAUACUUACAAUGCAAAUUCAGACAGAACGCUUUGUGCGCUCUUGUCUGAAU

40 50 60 70 80 90G G U U S L I A F P R G T Q F V D P CGGAGGAGAUGACUCACUGAUUGCGUUUCCCAGGGGAACACMUUUGUUGAUCCAUGUC

130 140 150 160 170 180K I F K Y D V P M F C G K F L L K T SAAGAUCUUUAAGUAUGAUGUCCCGAUGUUUUGUGGGAAGUUCUUGCUUAAGACGUCAU

220 230 240 250 260 270V L T K L E K K S I K D V Q H 1. A E 1GUUCUGACGAAGUUGGGGAAAAAGAGCAUAAAAGAUGUACMCAUUUAGCUGAGAUUU

310 320 330 340 350 360G N Y M V V S K L S E S V S D R Y L Y

GGAAAUUAUAUGGUGGUGUCCAAACUGUCCGAAUCUGUUUCAGAUCGGUAUUUGUACA400 410 420 430 440 450

W K H I K S F T A L C T L L P R R K GUGGAAGCAUAUUMGAGUUUUACGGCUCUGUGUACUUUACUUCCGAGACGAAMGGAU

10 500 510 520 530 540M E D K S L

AGAAGGCACAGCGGGCUGUGUCAMCUUUUACGACUGGUGAGAUGGAAGACAAGUCAUU580 590 600 610 620 630

S K F S N L G A I E L F V D G R R K R PIUCAAAAUUUUCAMUCUAGGGGCCAUUGAGUUGUUUGUAGACGGUAGGAGGMGAGACC

670 680 690 700 710 720L N H V G G K K S L H K L O V F D Q R D

:CUAAAUCAUGUUGGUGGAMGAAGAGUGAACAUAAGUUAGACGUUUUCGAUCAMGGGA760 770 780 790 800 810

K I V G V Q L V V T S H L P A D T P G FAAGAUAGUGGGUGUACAACUAGUUGUAACAUCACAUCUACCUGCAGAUACACCUGGGUU

850 860 870 880 890 900T E K R K K G K T I Q R F K A R A C D N

'ACUGAGAAAGAMAGAAGGGAMGACUAUUCAGAGAUUCAMGCUCGAGCUUGCGAUAA940 950 960 970 980 990

S I S T Q E N V L D V W K V G C I S E GAGUAUUUCUACACAGGAGMUGUGCUUGAUGUCUGGMGGUGGGUUGUAUCUCUGAGGG

1030 1040 1050 1060 1070 1080S I E V S L I W V A T D S T R R L N V E

AGUAUCGMGUGUCGCUAAUUUGGGUUGCUACUGAUUCGACUAGGCGUCUUAAUGUGGA1120 1130 1140 1150 1160 1170

D F T D Q E V F G E F H S L K Q V E M KGAUUUCACCGAUCMGAGGUUUUCGGUGAGUUCAUGUCUUUGAAACMGUGGAGAUGAA

1210 1220 1230 1240 1250 1260R P A T T R P K S L L S S E D V K R A S

AGACCAGCUACUACUAGACCUAMUCAUUGUUGUCMGUGAAGAUGUUAAGAGAGCGUC1300 1310 1320 1330 1340 1350

M T C V L K G C V N E V T V LAAAUUUAUUGUCAAUAUGACGUGUGUACUCMGGGUUGUGUGMUGMGUCACUGUUCU

1390 1400 1410 1420 1430 1440N K L R K Q V A D M V G V T R R C A E N

AACMAUUGCGAAAGCAAGUUGCUGACAUGGUUGGUGUCACACCUAGGUGCGCGGMAA1480 1490 1500 1510 1520 1530

N D F T F D V Y N C C G R S H L E K C RAMUGAUUUUACUUUUGAUUGJUAUMUUGUUGUGGCCGUAGCCACCUUGAAAAGUGUCG

1570 1590 1600 1610 1620W K Q I R R I Q A E S S S A T R K K S H

UGGAMCAGAUUCGACGAAUUCMGCUGMAAGCUCGUCUGCGACACGUMAAAGUCUCA1660 1670 1680 1690 1700 1710

E D R E F G A P K R F L R D D V P L G I.GAGGACAGAGAAUUUGGGGCACCMAAAGAUUUUUAAGAGAUGAUGUUCCUUUGGGMU

1750 1760 1770 1780 1790 1800

AAMUUGUUAUCUGUUUCUGUGUAUAGACUGUUUGAGAUUGGCGUUUGGCCGACUCAUUG1840 1850 1860 1870 1880 1890

,UUGUUAUUUUAUUUGUAUUUUAUUMAAMUUCUCAAUGAUCUGMAMAGCUUCGCGGCUA1930 1940 1950 1960 1970 1980

IAMGUGAUGAUGGUUACAAAGGCAAAAGGGGUAAAACCCCUCGCCUACGUMGCGUUAU2020 2030 2040 2050 2060 2070

Figure 3. Sequence of the 3'-terminal 2077 nucleotides of PSG-RNA 1. Theamino acid sequence deduced from the open reading frames for the C-terminusof the 170K protein (nucleotides 1 to 539), the 29K protein (nucleotides 614to 1369) and the 16K protein (nucleotides 1397 to 1819) are given. Thearrows at positions 490 and 1376 indicate the putative 5'-termini of RNAs 3and 5, respectively. The asterisk marks the beginning of the 3'-terminalsequence of 497 nucleotides that is identical in PSG-RNAs 1 and 2.

2162

Nucleic Acids Research

protein (see discussion). The second reading frame encodes a protein of 252amino acids with a molecular weight of 28,793, hereafter referred to as 29Kprotein. The third reading frame corresponds to a protein of 141 amino acids

with a molecular weight of 16,278 (16K protein). The two intercistronic

regions are 75 and 27 nucleotides long, repectively; the length of the 3'-terminal noncoding region is 258 nucleotides.

Complete nucZeotide sequence of PSG-RNA 2

Three overlapping cDNA clones were used to deduce the sequence of

nucleotides 10 to 1905 of PSG-RNA 2. Sequencing of 3'-labeled RNA confirmedthat the 3'-terminal sequence of the RNA was represented in one of the clones(Van Belkum et aZ., manuscript in preparation). The missing 5'-terminal 9

nucleotides were deduced by sequencing 5'-labeled RNA 2 (Table 1). The 5'-terminal sequence was confirmed by reverse transcription of RNA 2, primedby a deoxyoligonucleotide (kindly provided by Dr. J.H. van Boom, Leiden)complementary to nucleotides 286 to 305 of RNA 2.

Figure 4 shows the complete sequence of the 1905, nucleotides of PSG-RNA

2. The 3'-terminal 497 nucleotides are exactly identical to the 3'-terminalsequence of PSG-RNA 1. (We assume that the homology also holds for the 15

nucleotides that were missing in the RNA 1 specific clone.) Because of this

homology, the reading frame for the C-terminal 79 amino acids of the RNA 1

encoded 16K protein is also present in RNA 2. Inspection of the sequence that

is unique to RNA 2 reveals only one significant open reading frame for a

protein of 209 amino acids (Mr 22,856), flanked by 5'- and 3'-noncodingregions of 570 and 708 nucleotides, respectively. Three observations support

the conclusion that this reading frame encodes the capsid protein. (a) When

a DNA fragment corresponding to nucleotides 528 to 1815 of RNA 2 was inserted

in pSP65 and transcribed with SP6-polymerase (24), translation of the trans-

cript in a reticulocyte cell free system yielded a product that comigratedwith PSG coat protein (result not shown). (b) The amino acid composition of

the RNA 2 encoded protein is identical to the composition reported for a

"Dutch isolate" of TRV (25). (c) The PSG-RNA 2 encoded protein showsconsiderable sequence homology to the capsid protein of strain CAM (seeDiscussion).

Two different sets of direct repeats have been reported to occur in the

leader sequence of CAM-RNA 2 (8). Such repeats are absent in PSG-RNA 2. The

leader sequence of PSG-RNA 2 contains 8 AUG-codons. The sequence of the sub-

genomic RNA 4 starts at position 475 in RNA 2, just downstream the eighth

2163

Nucleic Acids Research

AUAAAACAUUCCACCUUUGGCUGUCGCCCCUGGCUGGGGUAUGUCUUUGAACGCAGUACAAUGUGCUAAUUGACAAGUUGOAGAACGCGG10 20 30 40 50 60 70 80 90

UAGAACGUACUUAUCCGACAGGCCUUUAUCCCUCUUCCUGACCAGGUUUUUGUCAGUGUAUCAUGUUGUUUUGAACUAUCCAACUUAGUA100 110 120 130 140 150 160 170 180

CCGGAAUGGGAAAGUGAUUGGUGUGCUUAUCUUCGAUAUGAUGCUUUGAAUUUUGCAUAGUAGGAAGUUAGAAAAGAAACUCUUGUCUUC190 200 210 220 230 240 250 260 270

UCAAACAAGUAAAACCUGAGACGUGUUAACUACGAAAGUGUCCAUUCAAAAUAUCAUGAACGAACGUAGUUUGUUUGUGGUUACCAAAAA280 290 300 310 320 330 340 350 360

CGAUAAGAACACCUUUAAGGUUUUCUUUACGCAAGUGUUCGCCAGAACACUGGGGUUUUGUCAGUUUCUUUAGAGAAAACUGACUAAGUU370 380 390 400 410 420 430 440 450

IJCUAAUGUUAUCAUUAGAGAUGGCAUAAAUAAACUGUUUGUGUCUGCUGAUMGAUCAUUUUUACUUUGACAGUUAGCUUUGCUGAACUA460 470 - 480 490 500 510 520 530 540

M G D M Y D E Q F D K A G G P A D L M DCUGGUUACUGAAUCACUUACGCUAACUAACAUGGGUGACAUGUACGAUGAGCMUUCGAUAAGGCGGGAGGGCCUGCAGAUUUGAUGGAU

550 560 570 580 590 600 610 620 630D S W V E S T A W K D L L K K L H S V K F A L Q S G R D E I

GACUCAUGGGUUGAGUCUACAGCUUGGAMAGAUCUUUUAAAGAAACUGCAUAGCGUAAAAUUUGCGCUACAGUCUGGUAGAGAUGAGAUC640 650 660 670 680 690 700 710 720

T G L L T T L S R Q C P Y S P Y E Q F P E R K V Y F L L D SACCGGUUUGCUGACCACUCUCAGUAGACAGUGUCCUUAUUCGCCGUACGAACAGUUUCCUGAAAGAAAGGUUUACUUUUUGUUAGACUCA

730 740 750 760 770 780 790 800 810R A N N A ,, V I Q A S % F K R R A D E K N A V A G V T N

CGUGCUAAUAACGCUCUCGGUGUUAUUCAGAACGCUUCUGCCUUCAAGAGGAGAGCCGAUGAGAAGAAUGCUGUAGCUGGUGUUACAAAU820 830 840 850 860 870 880 890 900

I P A N P N T T V T T N (0G S C T Fr r K A N T S S T L E E DAUACCAGCCAAUCCAAAUACCACGGUCACUACAAAUCAAGGUAGUACUACUACUACGAAGGCGAACACAAGCUCGACUUUGGAGGAGGAU

910 920 930 940 950 960 970 980 990L Y T Y Y K F D D A S T T F d K S L T S L E N M Q L K S Y YCUGUAUACUUAUUACAAAUUCGAUGACGCCUCGACAACAUUUCAUAAAUCUCUGACGUCGUUGGAAAAUAUGCMCUGMGAGUUAUUAU

1000 1010 1020 1030 1040 1050 1060 1070 1080R R N F E K N F G V K F G S A S T P A S G G S G A T P P P A

CGAAGAMUUUUGAGAAGMCUUUGGUGUCAAAUUUGGUAGCGCGUCGACUCCGGCCUCGGGGGGAAGUGGUGCMCACCACCUCCUGCG1090 1100 1110 1120 1130 1140 1150 1160 1170

S G G A V R P N P *.\i, 20CGGGUJCUGUGCGUCCUMUCCUUGAUGUCGUCAAUCAACCUUUMGGGACCUUGUGAAAUUCMGGGGCGGGUGUCGCCAGM

1180 1190 1200 1210 1220 1230 1240 1250 1260

AUCACCGAUACUACUAGUUGUAUCAAAACAAMCUAAUCACAACUUGCAUUUUACUAGUGCAUGUUGAAUUCCUGUGGMGUCAGGAGGU1270 1280 1290 1300 1310 1320 1330 1340 1350

GGAUGUUACCAUAAACAUUAAUGUCGCAGGUGGCAUUUUAAAAUMGACCUGAUACGAUGUAUAAUUGUUGUGGCCGUAGCCACCUUGM1360 1370 1380 1390 1400 1420 1430 1440

AAGUGUCGUAAACGCGUUGAAGCAAGAAAUCGAGAGAUCUGGAAACAGAUUCGACGAAUUCAAGCUGAAAGCUCGUCUGCGACACGUAM1450 1460 1470 1480 1490 1500 1510 1520 1530

AAGUCUCAUAAUUCAAAGAACUCUAAGAAGAAAUUCAAAGAGGACAGAGMUUUGGGGCACCMAAAGAUUUUUMGAGAUGAUGUUCCU1540 1550 1560 1570 1580 1590 1600 1610 1620

UUGGGAAUUGAUCAAUUGUUUGUUUUUUGAUUUUAUUUUAAAUUGUUAUCUGUUUCUGUGUAUAGACUGUUUGAGAUUGGCGUUUGGCCG1630 1640 1650 1660 1670 1680 1690 1700 1710

ACUCAUUGUCUUACCAUAGGGAAACGGACUUUGUUUGUAUUGUUAUUUUAUUUGUAUUUUAUUAAAAUUCUCAAUGAUCUGMAAAGCUU1720 L730 1740 1750 1760 1770 1780 1790 1800

CGCGGCUAAGAGAUUAUUGGGGGGUGAGUAAGUACUUUUAAAGUGAUGAOGUUACAAAGGCAAAAGGGGUAAAACCCCUCGCCUACGUA1810 1820 1830 1840 1850 1860 1870 1880 1890

AGCGUUAUUACGCCC1900

Figure 4. Complete nucleotide sequence of PSG-RNA 2 and amino acid sequencededuced from the coat protein cistron (nucleotides 571 to 1197). The arrowat position 475 indicates the 5'-terminus of RNA 4; the asterisk at position1409 marks the beginning of the 3'-terminal sequence of 497 nucleotides thatis identical in PSG-RNAs 1 and 2.

AUG-codon. Thus, the sequence of 96 nucleotides preceding the coat proteincistron in RNA 4 is devoid of AUG-codons. The length of RNA 4 is 1431 nucleo-tides; its relationship to RNA 2 is illustrated in Figure 2.

2164

Nucleic Acids Research

A10 20 30 40 50 60 70 80 90

PSG -GAHLVPTKSGDADTYNANSDRTLCALLSELPLEKAVMVTYGGDDSL IAFPRGTQFVDPCPKLATKWNFECKIFKYDVPMFCGKFLLKTSTMV -TCIWYQRKSGDVTTFIGNTVI IAACLASMLPMEKI IKGAFCGDDSLLYFPKGCEFPDVQHSANLMWNFEAKLFKKQYGYFCGRYVIHHD

xxxx x x x x xx xx xxxxx xx x x x xxxx x xx xxx

100 119 120 130 140 150 160 170 180PSG SCYEFVPDPVKVLTKLGKKSIKDVQHLAEIYISLNDSNRALGNYMVVSKLSESVSDRYLYKGDSVHALCALWKHIKSFTALCTLLPRRKGTMV RGCIVYYDPLKL ISKLGAKHI KDWEHLEEFRRSLCDVAVSLNNCAYYTQLDDAVWEVHKTAPPGSFVYKSLVKYLSDKVLFRSLFIDGSSC

xx x xxx x xxx xx x xx x x x x x x x x

B10

PSG MEDKSLVTLK

20 30 40 50 60 70 80 90 100PSG KKTFEVSKFSNLGAIELFVDGRRKRPKYFHRRRETVLNHVGGKKSEHKLDVFDQRDYKMIKSYAFLKIVGVQLVVTSHLPADTPGFIQIDTMV MALVVKGKVNINEFIDLTKMEKILPSMFTPVKSVMCSKVDKIMVHENESLSEVNLLKGVKL IDSTYVCLAGLVVTGEWNLPDNCRGGVS

x x x x x x x xxxx

110 120 130 140 150 160 170 180 190PSG LLDSRLTEKRKKGKTIQRFKARACSNCSVAQYKVEYSISTQENVLDVWKVGCISEGVPVCDGTYPFSIEVSL IWVATDSTRRLNVEELNSTMV VCLVDKRMERADEATLGSYYTAAAKKRFQFKVVPNYAITTQDAMKNVWQVLVNIRNVKMSAGFCPLSLEFVSVCIVYRNNIKLGLREKIT

x x x x x xx xx x x x x x x x x

200 210 220 230 240 250 260 270PSG SDY IEGDFTDQEVFGEFMSLKQVEMKTIEAKYDGPYRPATTRPKSLLSSEDVKRASNKKNSSTMV NVRDGGPMELTEEVVDEFMEDVPMSIRLAKFRSRTGKKSDVRKGKNSSNDRSVPN KNYRNVKDFGGMSFKKNNLIDDDSEATVAESDSF

x x x x x x

C10 20 30 40 50 60 70 80 90

PSG MGDMYDEQFD-KAGGPADLMDDSWVESTAWKDLLKKLHSVKFALQSGRDEITGLLTTLSRQCPYSPYEQFPERKVYFLLDSRMNALGVICAM MAMYDDEFDTKAS ---DLTFSPWVEVENWKDVTTRLRAIKFALQADRDKIPGVLSDLKTNCPYSAFKRFPDKSLYSVLSKEAVIAVAQI

xxx xx xx xx xxx xxx x xxxxx xx x x x x xxxx xx x x x x x

100 110 120 130 140 150 160 170 180PSG QNASAFKRRADEKNAVAG---- VTNIPANPNTTVTTNQGSTTTTKANTS-STLEEDLYTYYKFDDASTTFHKSLTSLENMQLKSYYRRNCAM QSASGFKRRADEKNAVSGLVSVTPTQISQSASSSAATPVGLATVKPPRESDSAFQEDTFSYAKFDDASTAFHKALAYLEGLSLRPTYRRK

xxxxxxxxxxxxxx x x x x x x x x xx xxxxxxxxxxx x xx x xxx

190 200 210 220PSG FEKNFGVKFGSASTPASGGSGATPPPASGGAVRPNPCAM FEKDMNVKWGGSGSAPSGAPAGGSSGSAPPTSGSSGSGAAPTPPPNP

xxx xx x xx

Figure 5. Alignment of amino acid sequences of (A) the C-terminal region ofPSG 170K and TMV 183K proteins, (B) PSG 29K and TMV 30K proteins, and (C)PSG coat protein and CAM coat protein. Identical residues are indicated byan "x' below the sequence; underlined residues in (A) are part of the sequencethat is conserved in all RNA-dependent RNA-polymerases (26).

DISCUSSIONThe similarities in the organization and expression of genetic informa-

tion in TRV-RNA 1 and TMV-RNA prompted us to compare the amino acidsequences deduced from corresponding reading frames. Figure 5A shows analignment of the C-terminal sequences of the TRV-strain PSG 170K protein andthe TMV 183K protein. Identical residues are found at 55 positions of the 179amino acids that are compared. In addition there are a number of conservedamino acid changes. The underlined residues between positions 10 and 45constitute the consensus sequence that can be found in all proteins with a

(putative) role in RNA-dependent RNA-synthesis (26). They occur at the same

2165

Nucleic Acids Research

position in the TRV 170K and TMV 183K proteins. Like the 183K protein, the126K protein shows homology to tricorna- and alpha-virus proteins with aputative role in RNA replication (27,28). By analogy, we assume that theTRV 120K and 170K proteins are both involved in viral RNA synthesis.

Figure 5B shows that the homology between TRV strain PSG 29K proteinand TMV 30K protein is relatively low. However, significant local homologiesexist at position 83-86 and between positions 135 and 170. Studies withmutants indicate that the TMV 30K protein has a role in cell-to-cell trans-

port of the virus (29). The conceptthat the TRV 120K and 170K proteins are

responsible for RNA replication, whereas the 29K protein performs a transportfunction correlates well with the observation that TRV-RNA 1 is able toreplicate systemically in intact plants in the absence of RNA 2 (2).

The position of the 16K cistron in TRV-RNA 1 is similar to the locationof the coat protein cistron in TMV-RNA. It could be the remnant of a defectivecoat protein cistron. However, no significant sequence homology between the16K protein and either TMV or PSG coat protein was observed. The observationthat the 16K cistron is conserved in TRV strain TCM (Angenent et at., manu-

script in preparation) suggests that theencodedprotein has a function invirus multiplication. RNA 5 is probably the messenger involved in theexpression of this function. The results of the Northern blot experimentindicate that RNA 5 is 3'-coterminal with either RNA 1 or RNA 2; its estima-ted length of 700 nucleotides would map its 5'-end just upstream of the 16Kcistron in RNA 1. If RNA 5 corresponded to the 3'-terminal 700 nucleotides ofRNA 2 it would contain no meaningful information. RNA 3 was shown to be 3'-coterminal with RNA 1 (Figure 1). This would locate its 5'-end just upstreamof the 29K cistron. Probably, TRV-RNA 3 is functionally equivalent to I2-RNA,the messenger for the TMV 30K protein (30). The finding that TRV strains TCM,TAK, SYM and ORY contain RNA 3 and 5 molecules that are comparable in size tothe PSG RNAs, suggests that the genome structure of RNA 1 of these strains issimilar.

The length and genome organization of RNA 2 of strain PSG (1905nucleotides) and the CAM strain (1799 nucleotides)(8), are remarkably similar.Figure 5C shows that the amino acid sequence homology between the coatproteins of the respective strains is about 40%. A major difference betweenthe amino acid composition of the two proteins is the presence of 21 threonineresidues in PSG coat protein whereas only 14 threonine residues are presentin the coat protein of CAM. A cluster of 10 threonine residues is presentbetween positions 104 and 144 of the PSG sequence in Figure 5C, in a region

2166

Nucleic Acids Research

where the homology between the two proteins is relatively low.At least five TRV strains were found to contain an RNA 4 molecule that

is 400 to 500 nucleotides shorter than the genomic RNA 2 (Figure 1A). Thisindicates that although in these strains the length of RNA 2 is quite variable,the coat protein cistron is located at a fixed position with respect to the 5'end. In TCII-RNA 2 the coat protein cistron was found to initiate at 542nucleotides from the 5'-end (Angenent et al., manuscript in preparation). Asfar as is known the subgenomic RNA 4 is not replicated by the RNA 1 inducedreplicase. This indicates that the 5'-terminal sequence of RNA 2, that is

absent in RNA 4, contains signals that are essential to replication. Probablythese signals interfere with translation, thus creating the need for the

synthesis of a subgenomic coat protein messenger. The 5'-terminal sequenceAUAAAACAUU-- that is identical in PSG-RNAs 1 and 2 may reflect (part of) a

replicase recognition signal in the corresponding minus-strand RNAs; the 5'-

terminal sequence AUAAA-- of RNA 4 may reflect (part of) an internalinitiation site for the replicase in minus-strand RNA 2. The sequence AUAAAis also found 21 nucleotides upstream of the 16K cistron in RNA 1 (position1376, arrow in Figure 3). Initiation of transcription of minus-strand RNA 1at this position would generate an RNA molecule of 702 nucleotides, close to

the estimated length of RNA 5. This putative initiation site is preceded bythe sequence AUGC similar to the sequence AUGGC that is found upstream of theinitiation site in RNA 2. Although less conserved, a comparable sequence isfound 124 nucleotides upstream of the cistron for the 29K protein (position490, arrow in Figure 3). The use of this site would produce an RNA 3 moleculeof 1587 nucleotides. The putative 5'-terminal sequences of PSG-RNAs 3 and 5are also listed in Table 1. The 5'-terminal sequence of CAM-RNA 2 (8) isrich in A and U but is not identical to that of the PSG-RNAs. However, around

position 486 in the leader sequence of this RNA the sequence AUGC/AUAA isfound which probably represents the 5'-end of CAM-RNA 4 (Table 1).

At the 3'-end PSG-RNA 4 is 100% homologous to PSG-RNA 1 for a length of

497 nucleotides. Our studies on TCM-RNA 2 show that in this RNA molecule the

3'-terminal homology with RNA 1 continues for another 601 nucleotides(Angenent et al., manuscript in preparation). The sequence of the 3'-terminal1098 nucl eotides of TCM-RNA 2 shows a 94% homol ogy wi th the correspondi ngregion of PSG-RNA 1. The available data indicate that the RNA 1 molecules of

strains of the TRV serotype I-II cluster are closely related by sequence and

that the RNA 2 molecules of these strains show a 3'-terminal sequencehomology to RNA 1 for various lengths. Because of this homology the complete

2167

Nucleic Acids Research

16K cistron and part of the 29K cistron are present in TCM-RNA 2. It is quitepossible that in strain TCM the 16K protein is expressed from both RNA 1 andRNA 2. TCM-RNA 2 is about 1600 nucleotides longer than PSG-RNA 2. Of theseadditional nucleotides 601 contribute to the increased homology with RNA 1;the others are located in between the coat protein cistron and the 3'-termi-nal homologous region (Angenent et al., manuscript in preparation).

The 3'-terminal homologous sequence in TRV-RNAs 1 and 2 may be involvedin encapsidation and/or replication of the viral RNAs. The observation thatthe length of this homologous region can vary among different strainssuggests that only part of this sequence is required for one or both of thesefunctions. It is therefor remarkable that the PSG-RNAs are identical over asequence of 497 nucleotides. This 100% homology is not found when differentstrains are compared, e.g. strains PSG and TCM. In vitro, pseudorecombinantsare readily obtained between strains of TRV (2). The observed homology withina given strain indicates that either recombination does not occur frequentlyin the field or that once a pseudorecombi nant with heterol ogous 3 -termi nihas been formed a mechanism comes into play that corrects the sequencedifferences, e.g. by recombination. A 100% identity has been reported for the3'-terminal 459 nucleotides of the CAM strain RNAs 1 and 2. Comparison of theCAM and PSG sequences shows an 80% homology for the 3'-terminal 44 nucleo-.tides. The observation that CAM-RNA 1 lacks detectable sequence homology withRNA 1 of other TRV strains, exept for the 3'-terminal 44 nucleotides,separates strain CAM from the serotype I-II cluster.

ACKNOWLEDGMENTSThanks are due to Dr. T. Kartasova and Mr. E. Modderman for their help

with sequencing. Mr. C. Cuperus is greatfully acknowledged for isolation ofPSG and TCM isolates. We are indebted to Drs. S. Bergh and A. Siegel forsending us their manuscript prior to publication. This work was sponsered inpart by the Netherlands Foundation for Chemical Research (S.O.N.) withfinancial aid from the Netherlands Organization for the Advancement of PureResearch (Z.W.O.).

*To whom correspondence should be addressed

REFERENCES1. Robinson, D.J. and Harrison, B.D. (1985) J. gen. Virol. 66, 171-176.2. Harrison, B.D. and Robinson, D.J. (1978) Adv. Virus Res. 23, 25-77.3. Mayo, M.A., Fritsch, C. and Hirth, L. (1976) Virology 69, 408-415.

2168

Nucleic Acids Research

4. Fritsch, C., Mayo, M.A. and Hirth, L. (1977) Virology 77, 722-732.5. Pelham, H.R.B. (1979) Virology 97, 256-265.6. Bisaro, D. and Siegel, A. (1980) Virology 107, 194-201.7. Linthorst, H.J.M. and Bol, J.F. (1986) in: "Developments and Applica-

tions in virus testing". Eds. R.A.C. Jones and L. Torrance. Associationof Applied Biologists. in press.

8. Bergh, S.T., Koziel, M.G., Huang, S-C., Thomas, R.A., Gilley, D.P. andSiegel, A. (1985) Nucleic Acids Res. 13, 8507-8518.

9. Huttinga, H. (1972) Ph. D. Thesis, University of Wageningen.10. Gubler, U. and Hoffman, B.J. (1983) Gene 25, 263-269.11. Cornelissen, B.J.C., Brederode, F.Th., Moormann, R.J.M. and Bol, J.F.

(1983) Nucleic Acids Res. 11, 1253-1265.12. Pagert, M. and Ehrlich, S.D. (1979) Gene 6, 23-28.13. Birnboim, H.C. and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.14. Messing, J. (1983) Methods Enzymol. 101, 20-78.15. Kieny, M.P., Lathe, R. and Lecocq, J.P. (1983) Gene 26, 91-99.16. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci.

USA 74, 5463-5467.17. Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Proc. Natl. Acad. Sci.

USA 80, 3963-3965.18. Koper-Zwarthoff, E.C., Lockard, R.E., Alzner-De Weerd, B., RajBhandary,

U.L. and Bol, J.F. (1977) Proc.Natl. Acad. Sci. USA 74, 5504-5508.19. McMaster, G.K. and Carmichael, G.C. (1977) Proc. Natl. Acad. Sci. USA

76, 4835-4838.20. Thomas, P.S. (1980) Proc. Natl. Acad. Sci. USA 77, 5201-5205.21. Maniatis, T., Fritsch, E.F. and Sambrook,J. (1982) Molecular Cloning,

Cold Spring Harbor Laboratory.22. Wahl, G.M., Stern, M. and Stark, G.R. (1979) Proc. Natl. Acad. Sci. USA

76, 3683-3687.23. Goulet, P., Lomonosoff, G.P., Butler, P.J. G., Akam, M.E., Gait, M.J.

and Karn, J. (1982) Proc. Natl. Acad. Sci. USA 79, 5818-5822.24. Melton, D.A., Krieg, P.A., Rebagliati, M.R., Maniatis, T., Zinn, K. and

Green, M.R. (1984) Nucleic Acids Res. 12, 7035-7056.25. Offord, and Harris, (1965) in: Proc. 2nd FEBS-meeting, 216-217.26. Kamer, G. and Argos, P. (1984) NucleicAcids Res.-12, 7269-7282.27. Haseloff, J., Goelet, P., Zimmern, D., Ahlquist, P., Dasgupta, R. and

Kaesberg, P. (1984) Proc. Natl. Acad. Sci. USA 81, 4358-4362.28. Cornelissen, B.J.C. and Bol., J.F. (1984) Plant Mol. Biol. 3, 379-384.29. Leonard, D.A. and Zaitlin, M. (1982) Virology 117, 416-424.30. Hirth, L. and Richards, K.E. (1981) Adv. Virus Res. 26, 145-199.

2169