Volume 14 Number 5 1986 Nucleic Acids Research
Analysis of the genome structure of tobacco rattle virus strain PSG
Ben J.C.Cornelissen, Huub J.M.Linthorst, Frans Th.Brederode and John F.Bol*
Department of Biochemistry, State University of Leiden, PO Box 9505, 2300 RA Leiden, The Netherlands
Received 24 January 1986; Accepted 11 February 1986
ABSTRACT
The sequence of the 3'-terminal 2077 nucleotides of genomic RNA 1 and the complete sequence of genomic RNA 2 of tobacco rattle virus (TRV, strain PSG) have been deduced. RNA 2 (1905 nucleotides) contains a single open reading frame for the viral coat protein (209 amino acids), flanked by 5'- and 3'-noncoding regions of 570 and 708 nucleotides, respectively. A subgenomic RNA (RNA 4) was found to lack the 5'-terminal 474 nucleotides of RNA 2 and is the putative messenger for coat protein. The deduced RNA 1 sequence contains the 3'-terminal part of a reading frame that probably corresponds to the TRV 170K protein and reading frames for a 29K protein and a 16K protein. Proteins encoded by the first two reading frames show significant amino acid sequence homology with corresponding proteins encoded by tobacco mosaic virus. Subgenomic RNAs 3 (1.6 kb) and 5 (0.7 kb) were identified as the putative messengers for the 29K and 16K proteins, respectively. At their 3'-termini all PSG-RNAs have an identical sequence of 497 nucleotides; at the 5'-termini homology is limited to 5 to 10 bases.
INTRODUCTION
A division of tobraviruses into three separate clusters has been proposed (1). These are represented by strains of tobacco rattle virus (TRV, serotype I-II), the CAM-strain of TRV (serotype III, also called pepper ringspot virus), and strains of pea early-browning virus (PEBV). Hybridization experiments using complementary DNA (cDNA) copies of virus RNA showed extensive homology between viruses within one cluster but not between viruses from different clusters (1). The genome of tobraviruses consists of two RNA molecules. The longer genome segment, RNA 1, has a length of approximately 6300 nucleotides in all strains. The length of RNA 2, however, differs from strain to strain and ranges from 1800 to 4000 nucleotides (2). Different strains of the TRV-cluster (serotype I-II) share extensive sequences in RNA 1 but show much diversity in the sequence of their RNA 2 (1).
In vitro translation experiments demonstrate that RNA 1 directs the synthesis of a 120K protein and a 170K protein (3-5). The 170K protein is probably produced by readthrough translation of the 120K cistron (5). RNA 2 translates into viral coat protein (3,4). In addition to RNAs 1 and 2, a subgenomic RNA 3 of about 1.6 kb has been identified in several strains, which directs the synthesis of a 30K protein. There is a controversy in the literature about whether the 30K-cistron should be assigned to RNA 1 or RNA 2 (5,6).
© IRL Press Limited, Oxford, England.
To obtain further insight into the genome structure of TRV we have cloned cDNA of the RNAs of strains PSG and TCM, isolated in The Netherlands from potato and tulip, respectively (7). RNA 2 of strain TCM (3.5 kb) is about 1600 nucleotides longer than RNA 2 of strain PSG (1.9 kb). Here we report the 3'-terminal 2077 nucleotides of PSG-RNA 1 and the complete sequence of RNA 2 (1905 nucleotides). The RNA 1 sequence contains three open reading frames encoding the C-terminal region of the 170K protein, a 29K protein and a 16K protein, respectively. The amino acid sequences of these proteins were compared to corresponding proteins encoded by the tobacco mosaic virus (TMV) genome. RNA 2 was found to contain only one significant open reading frame encoding the capsid protein. The structure of PSG-RNA 2 was compared to the CAM-RNA 2 sequence that was published recently (8). Sequencing of a subgenomic RNA (RNA 4, 1431 nucleotides) showed it to be derived from PSG-RNA 2. Two other subgenomic RNAs (RNAs 3 and 5) were identified that are probably involved in the expression of the 29K and 16K proteins.
MATERIALS AND METHODS
Virus strains. A Dutch TRV strain from potato (PSG) was a gift from Dr. J.A. de Bokx (Wageningen, The Netherlands). Inocula of two local TRV strains from tulip (TCM and TAK) were kindly provided by Dr. C.J. Asjes (Lisse, The Netherlands). Strains ORY, SYM and CAM were obtained from Drs. B.D. Harrison and D.J. Robinson (Dundee, Scotland). A Dutch isolate of pea early-browning virus (PEBV) was from Dr. L. Bos (Wageningen, The Netherlands).
Purification of viral nucleoprotein and RNA. Virus was purified from Samsun NN tobacco according to Huttinga (9) and sedimented in sucrose gradients. Fractions containing approximately equal amounts of long and short particles were collected. RNA was extracted with phenol/chloroform (1:1) at 65°C, from purified virus suspensions that had been incubated in 1% SDS for 15 min at 37°C. Ethanol-precipitated RNA was dissolved in 20 mM Tris-HCl, pH 7.6, 0.1 mM EDTA.
Synthesis and cloning of double-stranded cDNA. RNA was polyadenylated using ATP:RNA adenyltransferase (Bethesda Research Laboratories). This RNA was copied into DNA by the method of Gubler and Hoffman (10): the first strand was synthesized with reverse transcriptase (Life Sciences Inc.) using oligo(dT) as a primer. The second strand was synthesized with a combination of the enzymes E. coli DNA polymerase I, RNase H and E. coli ligase (all from P-L Biochemicals). Double-stranded cDNA was tailed with dCTP and annealed to Pst I-cut, G-tailed pUC 9 or pBR 322 as described (11). Transformation of E. coli and isolation of plasmid DNA were according to Dagert and Ehrlich (12) and to Birnboim and Doly (13), respectively. Clones were characterized by restriction enzyme and Northern blot analyses.
DNA sequencing. cDNA inserts, or restriction fragments thereof, were subcloned into the mp and tg derivatives of M13 (14,15) and sequenced by the dideoxy chain termination method (16) using (α-35S)dATP (17).
RNA sequencing. RNAs 1, 2 and 4 of TRV strain PSG were decapped with tobacco acid pyrophosphatase (kindly provided by Dr. L. Pinck, Strasbourg, France) and dephosphorylated with bacterial alkaline phosphatase (Bethesda Research Laboratories). After 5'-labeling with (γ-32P)ATP and T4 polynucleotide kinase (Bethesda Research Laboratories), the RNAs were separated by electrophoresis in 1.5% low melting point agarose, partially digested with nuclease P1 (Boehringer) and analysed by the wandering spot technique as described previously (18).
Northern blotting and hybridization. Samples of 0.5 µg of TRV-RNA were denatured with glyoxal (19), electrophoresed in 1.5% agarose gels and transferred to Biodyne membranes (Pall Ltd., Portsmouth, U.K.) according to Thomas (20). The hybridization of the blots to cDNA probes, labelled by nick translation (21), was performed as described previously (22).
RESULTS
Analysis of TRV RNAs
Figure 1 shows a Northern blot of RNAs of PEBV, strain CAM and five strains of the TRV cluster (TCM, TAK, PSG, SYM and ORY). In panel A the blot is probed with a cDNA clone corresponding to the 3'-terminal 440 nucleotides of TCM-RNA 1. This clone hybridizes to RNAs 1 and 2 of a number of TRV strains (7). At least five RNAs were consistently found with this probe in preparations of TCM, TAK, PSG, SYM and ORY. The genomic RNA 1 and the subgenomic RNAs 3 and 5 are of the same length in all strains; the length of the genomic RNA 2 and the subgenomic RNA 4 differs from strain to strain (their position is indicated by two small bars in Figure 1A). The size difference between RNA 2 and 4 is about 500 nucleotides in all strains. In addition to RNAs 1 to 5 some strains contain RNA molecules that have no clear counterpart in other strains. In panel B of Figure 1 the blot was probed with cDNA corresponding to nucleotides 1193-2077 from the 3'-end of PSG-RNA 1. This probe does not hybridize to RNAs 2, 4 and 5 but it does hybridize to RNAs 1 and 3. Thus RNA 3 is derived from RNA 1. Moreover, the results of Figure 1 confirm that there is extensive sequence homology between the 3'-terminal 2 kb region of RNA 1 of the five strains from the TRV cluster, whereas no homology is found with strains PEBV and CAM.

Figure 1. Northern blot of RNAs extracted from virus preparations of TRV-strains TCM, TAK, PSG, SYM, ORY, CAM and pea early-browning (PEB) virus. The blot was probed with 32P-cDNA corresponding to (A) the 3'-terminal 440 nucleotides of TCM-RNA 1, and (B) nucleotides 1193 to 2077 from the 3'-end of PSG RNA 1. The position of RNAs 1, 3 and 5 in all strains is indicated in the margin; the position of RNAs 2 and 4 varies in each strain and is indicated by two small bars.
The data of Figure 1A show that there is considerable sequence homology at the 3'-termini of TRV-RNAs 1 and 2 and the respective subgenomic RNAs. The sequence of the 5'-terminal 16 nucleotides of PSG-RNAs 1, 2 and 4 was determined with the wandering spot method and is listed in Table 1. In all three RNAs the cap-structure (2) is followed by the sequence AUAAA---; in RNAs 1 and 2 the 5'-homology is extended to 10 nucleotides.
3'-Terminal nucleotide sequence of PSG-RNA 1
The sequence of the 3'-terminal 2077 nucleotides of PSG-RNA 1 was deduced from a single cDNA clone. Details of the sequencing procedure are available on request. Comparison with the 3'-terminal sequence of PSG-RNA 2 (see below) indicated that the 3'-terminal 15 nucleotides are missing in this clone. The sequenced region of PSG-RNA 1 contains three open reading frames which are schematically represented in Figure 2; for comparison also the genome structure of TMV (23) is given.

Table 1: Confirmed and putative 5'-terminal sequences of TRV-RNAs

                                      -5  -1 +1      +10
Confirmed 5'-termini  PSG-RNA 1              AUAAAACAUUUCAAUC--
                      PSG-RNA 2              AUAAAACAUUGCACCU--
                      PSG-RNA 4        AUGGC AUAAAUAAACUGUUUG--
                      CAM-RNA 2              AAAAUUUUCAGAAUGU--
Putative 5'-termini   PSG-RNA 3     AUGGAAGC AUAUUAAGAGUUUUAC--
                      PSG-RNA 5         AUGC AUAAAGAAAUUUAUUG--
                      CAM-RNA 4         AUGC AUAAUUAUACUGAUUG--

Homologous nucleotides are underlined. Homologous genomic sequences preceding the 5'-termini of the subgenomic RNAs are given. Sequences of strain CAM are from Bergh et al. (8).
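The extent of the 5'-terminal homology quoted for the confirmed PSG termini can be checked directly from the sequences in Table 1. The following is a minimal Python sketch, not part of the original analysis; the dictionary and function names are illustrative assumptions, and the sequences are copied as they appear in the table.

```python
from os.path import commonprefix

# First 16 nucleotides of each RNA as listed in Table 1.
FIVE_PRIME_TERMINI = {
    "PSG-RNA 1": "AUAAAACAUUUCAAUC",
    "PSG-RNA 2": "AUAAAACAUUGCACCU",
    "PSG-RNA 4": "AUAAAUAAACUGUUUG",
}


def shared_five_prime(names):
    """Return the longest 5'-terminal stretch common to the named RNAs."""
    # commonprefix compares the strings character by character.
    return commonprefix([FIVE_PRIME_TERMINI[name] for name in names])


genomic = shared_five_prime(["PSG-RNA 1", "PSG-RNA 2"])
all_three = shared_five_prime(list(FIVE_PRIME_TERMINI))

print(genomic, len(genomic))      # shared by the two genomic RNAs
print(all_three, len(all_three))  # shared by RNAs 1, 2 and 4
```

With the table as printed, the two genomic RNAs share their first 10 nucleotides and all three RNAs share AUAAA, in line with the statement that 5'-terminal homology is limited to 5 to 10 bases.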
Figure 3 shows the nucleotide sequence together with the amino acid sequences corresponding to the three open reading frames. The first reading frame probably corresponds to the C-terminal 179 amino acids of the TRV 170K
[Figure 2 diagram: TMV-RNA, 6395 nt (126K, 183K, 30K, CP); PSG-RNA 1, ±6300 nt (120K, 170K, 29K, 16K); PSG-RNA 2, 1905 nt (CP); PSG-RNA 4, 1431 nt (CP)]
Figure 2. Schematic representation of the genetic information encoded in RNAs 1, 2 and 4 of TRV-strain PSG. Regions of the RNAs that have been sequenced are indicated by solid bars; the open box at the end of these bars represents the 3'-terminal sequence of 497 nucleotides that is identical in the three RNAs. The location of the cistrons encoding the 120K protein, 170K protein, 29K protein, 16K protein and coat protein (CP) is indicated. For comparison, the genome structure of TMV (23) is included.
[Figure 3 sequence listing: nucleotides 1 to 2077 of the 3'-terminal region of PSG-RNA 1, numbered every 10 nucleotides, with the deduced amino acid sequences printed above the corresponding codons]
Figure 3. Sequence of the 3'-terminal 2077 nucleotides of PSG-RNA 1. The amino acid sequence deduced from the open reading frames for the C-terminus of the 170K protein (nucleotides 1 to 539), the 29K protein (nucleotides 614 to 1369) and the 16K protein (nucleotides 1397 to 1819) are given. The arrows at positions 490 and 1376 indicate the putative 5'-termini of RNAs 3 and 5, respectively. The asterisk marks the beginning of the 3'-terminal sequence of 497 nucleotides that is identical in PSG-RNAs 1 and 2.
protein (see Discussion). The second reading frame encodes a protein of 252 amino acids with a molecular weight of 28,793, hereafter referred to as 29K protein. The third reading frame corresponds to a protein of 141 amino acids with a molecular weight of 16,278 (16K protein). The two intercistronic regions are 75 and 27 nucleotides long, respectively; the length of the 3'-terminal noncoding region is 258 nucleotides.
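The protein lengths and spacer lengths quoted above follow from the reading-frame coordinates given in the legend of Figure 3. The short Python sketch below reproduces that arithmetic for the 29K and 16K cistrons; the variable and function names are illustrative assumptions, and the coordinates are taken as 1-based and inclusive.

```python
# Coordinates (1-based, inclusive) as given in the legend of Figure 3.
SEQUENCED_LENGTH = 2077
ORFS = {"29K": (614, 1369), "16K": (1397, 1819)}


def codon_count(start, end):
    """Number of codons in a reading frame spanning start..end inclusive."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("reading frame length is not a multiple of three")
    return span // 3


protein_lengths = {name: codon_count(*coords) for name, coords in ORFS.items()}

# Nucleotides between the last codon of the 29K frame and the 16K start codon.
spacer_29k_16k = ORFS["16K"][0] - ORFS["29K"][1] - 1

# Nucleotides downstream of the last codon of the 16K frame.
three_prime_noncoding = SEQUENCED_LENGTH - ORFS["16K"][1]

print(protein_lengths)        # amino acids per reading frame
print(spacer_29k_16k)         # second intercistronic region
print(three_prime_noncoding)  # 3'-terminal noncoding region
```

Under these assumptions the frames correspond to 252 and 141 amino acids, the second intercistronic region to 27 nucleotides and the 3'-terminal noncoding region to 258 nucleotides; both spacer figures therefore include the termination codon of the preceding cistron.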
Complete nucleotide sequence of PSG-RNA 2
Three overlapping cDNA clones were used to deduce the sequence of nucleotides 10 to 1905 of PSG-RNA 2. Sequencing of 3'-labeled RNA confirmed that the 3'-terminal sequence of the RNA was represented in one of the clones (Van Belkum et al., manuscript in preparation). The missing 5'-terminal 9 nucleotides were deduced by sequencing 5'-labeled RNA 2 (Table 1). The 5'-terminal sequence was confirmed by reverse transcription of RNA 2, primed by a deoxyoligonucleotide (kindly provided by Dr. J.H. van Boom, Leiden) complementary to nucleotides 286 to 305 of RNA 2.
Figure 4 shows the complete sequence of the 1905 nucleotides of PSG-RNA 2.
The 3'-terminal 497 nucleotides are exactly identical to the 3'-terminal sequence of PSG-RNA 1. (We assume that the homology also holds for the 15
nucleotides that were missing in the RNA 1 specific clone.) Because of this
homology, the reading frame for the C-terminal 79 amino acids of the RNA 1
encoded 16K protein is also present in RNA 2. Inspection of the sequence that
is unique to RNA 2 reveals only one significant open reading frame for a
protein of 209 amino acids (Mr 22,856), flanked by 5'- and 3'-noncoding regions of 570 and 708 nucleotides, respectively. Three observations support
the conclusion that this reading frame encodes the capsid protein. (a) When
a DNA fragment corresponding to nucleotides 528 to 1815 of RNA 2 was inserted
in pSP65 and transcribed with SP6-polymerase (24), translation of the transcript in a reticulocyte cell free system yielded a product that comigrated with PSG coat protein (result not shown). (b) The amino acid composition of the RNA 2 encoded protein is identical to the composition reported for a "Dutch isolate" of TRV (25). (c) The PSG-RNA 2 encoded protein shows considerable sequence homology to the capsid protein of strain CAM (see Discussion).
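The deduced coat protein sequence is obtained by translating the reading frame codon by codon. As an illustration only, the Python sketch below builds the standard genetic code and translates two short stretches copied from Figure 4: the first seven codons of the coat protein cistron and the last four codons together with the UGA termination codon. The helper names are assumptions, not part of the original work.

```python
from itertools import product

# Standard genetic code, codons ordered with U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string in frame 1 until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # termination codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Start of the coat protein cistron (from nucleotide 571 in Figure 4).
cistron_start = "AUGGGUGACAUGUACGAUGAG"
# Last four codons of the cistron followed by the UGA stop codon.
cistron_end = "CGUCCUAAUCCUUGAUGU"

print(translate(cistron_start))
print(translate(cistron_end))
```

The first fragment translates to MGDMYDE and the second to RPNP, matching the N- and C-terminal residues of the PSG coat protein shown in Figures 4 and 5C.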
Two different sets of direct repeats have been reported to occur in the
leader sequence of CAM-RNA 2 (8). Such repeats are absent in PSG-RNA 2. The
leader sequence of PSG-RNA 2 contains 8 AUG-codons. The sequence of the subgenomic RNA 4 starts at position 475 in RNA 2, just downstream of the eighth
AUAAAACAUUCCACCUUUGGCUGUCGCCCCUGGCUGGGGUAUGUCUUUGAACGCAGUACAAUGUGCUAAUUGACAAGUUGOAGAACGCGG
        10        20        30        40        50        60        70        80        90

UAGAACGUACUUAUCCGACAGGCCUUUAUCCCUCUUCCUGACCAGGUUUUUGUCAGUGUAUCAUGUUGUUUUGAACUAUCCAACUUAGUA
       100       110       120       130       140       150       160       170       180

CCGGAAUGGGAAAGUGAUUGGUGUGCUUAUCUUCGAUAUGAUGCUUUGAAUUUUGCAUAGUAGGAAGUUAGAAAAGAAACUCUUGUCUUC
       190       200       210       220       230       240       250       260       270

UCAAACAAGUAAAACCUGAGACGUGUUAACUACGAAAGUGUCCAUUCAAAAUAUCAUGAACGAACGUAGUUUGUUUGUGGUUACCAAAAA
       280       290       300       310       320       330       340       350       360

CGAUAAGAACACCUUUAAGGUUUUCUUUACGCAAGUGUUCGCCAGAACACUGGGGUUUUGUCAGUUUCUUUAGAGAAAACUGACUAAGUU
       370       380       390       400       410       420       430       440       450

                        ↓
UCUAAUGUUAUCAUUAGAGAUGGCAUAAAUAAACUGUUUGUGUCUGCUGAUAAGAUCAUUUUUACUUUGACAGUUAGCUUUGCUGAACUA
       460       470       480       490       500       510       520       530       540

                               M  G  D  M  Y  D  E  Q  F  D  K  A  G  G  P  A  D  L  M  D
CUGGUUACUGAAUCACUUACGCUAACUAACAUGGGUGACAUGUACGAUGAGCAAUUCGAUAAGGCGGGAGGGCCUGCAGAUUUGAUGGAU
       550       560       570       580       590       600       610       620       630

 D  S  W  V  E  S  T  A  W  K  D  L  L  K  K  L  H  S  V  K  F  A  L  Q  S  G  R  D  E  I
GACUCAUGGGUUGAGUCUACAGCUUGGAAAGAUCUUUUAAAGAAACUGCAUAGCGUAAAAUUUGCGCUACAGUCUGGUAGAGAUGAGAUC
       640       650       660       670       680       690       700       710       720

 T  G  L  L  T  T  L  S  R  Q  C  P  Y  S  P  Y  E  Q  F  P  E  R  K  V  Y  F  L  L  D  S
ACCGGUUUGCUGACCACUCUCAGUAGACAGUGUCCUUAUUCGCCGUACGAACAGUUUCCUGAAAGAAAGGUUUACUUUUUGUUAGACUCA
       730       740       750       760       770       780       790       800       810

 R  A  N  N  A  L  G  V  I  Q  N  A  S  A  F  K  R  R  A  D  E  K  N  A  V  A  G  V  T  N
CGUGCUAAUAACGCUCUCGGUGUUAUUCAGAACGCUUCUGCCUUCAAGAGGAGAGCCGAUGAGAAGAAUGCUGUAGCUGGUGUUACAAAU
       820       830       840       850       860       870       880       890       900

 I  P  A  N  P  N  T  T  V  T  T  N  Q  G  S  T  T  T  T  K  A  N  T  S  S  T  L  E  E  D
AUACCAGCCAAUCCAAAUACCACGGUCACUACAAAUCAAGGUAGUACUACUACUACGAAGGCGAACACAAGCUCGACUUUGGAGGAGGAU
       910       920       930       940       950       960       970       980       990

 L  Y  T  Y  Y  K  F  D  D  A  S  T  T  F  H  K  S  L  T  S  L  E  N  M  Q  L  K  S  Y  Y
CUGUAUACUUAUUACAAAUUCGAUGACGCCUCGACAACAUUUCAUAAAUCUCUGACGUCGUUGGAAAAUAUGCAACUGAAGAGUUAUUAU
      1000      1010      1020      1030      1040      1050      1060      1070      1080

 R  R  N  F  E  K  N  F  G  V  K  F  G  S  A  S  T  P  A  S  G  G  S  G  A  T  P  P  P  A
CGAAGAAAUUUUGAGAAGAACUUUGGUGUCAAAUUUGGUAGCGCGUCGACUCCGGCCUCGGGGGGAAGUGGUGCAACACCACCUCCUGCG
      1090      1100      1110      1120      1130      1140      1150      1160      1170

 S  G  G  A  V  R  P  N  P  *
CGGGUJCUGUGCGUCCUMUCCUUGAUGUCGUCAAUCAACCUUUMGGGACCUUGUGAAAUUCMGGGGCGGGUGUCGCCAGM
      1180      1190      1200      1210      1220      1230      1240      1250      1260

AUCACCGAUACUACUAGUUGUAUCAAAACAAAACUAAUCACAACUUGCAUUUUACUAGUGCAUGUUGAAUUCCUGUGGAAGUCAGGAGGU
      1270      1280      1290      1300      1310      1320      1330      1340      1350

                                                          *
GGAUGUUACCAUAAACAUUAAUGUCGCAGGUGGCAUUUUAAAAUAAGACCUGAUACGAUGUAUAAUUGUUGUGGCCGUAGCCACCUUGAA
      1360      1370      1380      1390      1400      1410      1420      1430      1440

AAGUGUCGUAAACGCGUUGAAGCAAGAAAUCGAGAGAUCUGGAAACAGAUUCGACGAAUUCAAGCUGAAAGCUCGUCUGCGACACGUAAA
      1450      1460      1470      1480      1490      1500      1510      1520      1530

AAGUCUCAUAAUUCAAAGAACUCUAAGAAGAAAUUCAAAGAGGACAGAGAAUUUGGGGCACCAAAAAGAUUUUUAAGAGAUGAUGUUCCU
      1540      1550      1560      1570      1580      1590      1600      1610      1620

UUGGGAAUUGAUCAAUUGUUUGUUUUUUGAUUUUAUUUUAAAUUGUUAUCUGUUUCUGUGUAUAGACUGUUUGAGAUUGGCGUUUGGCCG
      1630      1640      1650      1660      1670      1680      1690      1700      1710

ACUCAUUGUCUUACCAUAGGGAAACGGACUUUGUUUGUAUUGUUAUUUUAUUUGUAUUUUAUUAAAAUUCUCAAUGAUCUGAAAAAGCUU
      1720      1730      1740      1750      1760      1770      1780      1790      1800

CGCGGCUAAGAGAUUAUUGGGGGGUGAGUAAGUACUUUUAAAGUGAUGAUGGUUACAAAGGCAAAAGGGGUAAAACCCCUCGCCUACGUA
      1810      1820      1830      1840      1850      1860      1870      1880      1890

AGCGUUAUUACGCCC
      1900
Figure 4. Complete nucleotide sequence of PSG-RNA 2 and amino acid sequence deduced from the coat protein cistron (nucleotides 571 to 1197). The arrow at position 475 indicates the 5'-terminus of RNA 4; the asterisk at position 1409 marks the beginning of the 3'-terminal sequence of 497 nucleotides that is identical in PSG-RNAs 1 and 2.
AUG-codon. Thus, the sequence of 96 nucleotides preceding the coat protein cistron in RNA 4 is devoid of AUG-codons. The length of RNA 4 is 1431 nucleotides; its relationship to RNA 2 is illustrated in Figure 2.
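The coordinates quoted for RNA 2 and RNA 4 are mutually consistent, as the following worked arithmetic shows. It is a small Python sketch using only figures stated in the text; the constant names are illustrative assumptions and positions are 1-based.

```python
# Figures stated in the text for PSG-RNA 2.
RNA2_LENGTH = 1905
LEADER_LENGTH = 570      # 5'-noncoding region
COAT_PROTEIN_AA = 209    # amino acids in the coat protein
RNA4_START = 475         # 5'-terminus of RNA 4 within RNA 2
SHARED_3_PRIME = 497     # 3'-terminal sequence identical to RNA 1

cistron_start = LEADER_LENGTH + 1
cistron_end = LEADER_LENGTH + 3 * COAT_PROTEIN_AA

# The 3'-noncoding region as quoted starts at the termination codon.
trailer_length = RNA2_LENGTH - cistron_end

rna4_length = RNA2_LENGTH - RNA4_START + 1
rna4_leader = cistron_start - RNA4_START
shared_region_start = RNA2_LENGTH - SHARED_3_PRIME + 1

print(cistron_start, cistron_end)  # coat protein cistron
print(trailer_length)              # 3'-noncoding region
print(rna4_length, rna4_leader)    # RNA 4 length and its leader
print(shared_region_start)         # first shared 3'-terminal position
```

These values reproduce the cistron coordinates 571 to 1197, the 708-nucleotide 3'-noncoding region, the 1431-nucleotide RNA 4 with its 96-nucleotide leader, and position 1409 for the start of the shared 3'-terminal sequence.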
A
            10        20        30        40        50        60        70        80        90
PSG -GAHLVPTKSGDADTYNANSDRTLCALLSELPLEKAVMVTYGGDDSLIAFPRGTQFVDPCPKLATKWNFECKIFKYDVPMFCGKFLLKTS
TMV -TCIWYQRKSGDVTTFIGNTVIIAACLASMLPMEKIIKGAFCGDDSLLYFPKGCEFPDVQHSANLMWNFEAKLFKKQYGYFCGRYVIHHD
            xxxx  x   x       x x xx xx       xxxxx  xx x  x x        xxxx x xx     xxx

           100       110       120       130       140       150       160       170       180
PSG SCYEFVPDPVKVLTKLGKKSIKDVQHLAEIYISLNDSNRALGNYMVVSKLSESVSDRYLYKGDSVHALCALWKHIKSFTALCTLLPRRKG
TMV RGCIVYYDPLKLISKLGAKHIKDWEHLEEFRRSLCDVAVSLNNCAYYTQLDDAVWEVHKTAPPGSFVYKSLVKYLSDKVLFRSLFIDGSSC
           xx x   xxx x xxx  xx x   xx x    x x      x   x                x x          x

B
            10
PSG MEDKSLVTLK

            20        30        40        50        60        70        80        90       100
PSG KKTFEVSKFSNLGAIELFVDGRRKRPKYFHRRRETVLNHVGGKKSEHKLDVFDQRDYKMIKSYAFLKIVGVQLVVTSHLPADTPGFIQID
TMV  MALVVKGKVNINEFIDLTKMEKILPSMFTPVKSVMCSKVDKIMVHENESLSEVNLLKGVKLIDSTYVCLAGLVVTGEWNLPDNCRGGVS
         x    x              x  x          x                 x  x           xxxx

           110       120       130       140       150       160       170       180       190
PSG LLDSRLTEKRKKGKTIQRFKARACDNCSVAQYKVEYSISTQENVLDVWKVGCISEGVPVCDGTYPFSIEVSLIWVATDSTRRLNVEELNS
TMV VCLVDKRMERADEATLGSYYTAAAKKRFQFKVVPNYAITTQDAMKNVWQVLVNIRNVKMSAGFCPLSLEFVSVCIVYRNNIKLGLREKIT
             x    x       x            x x xx     xx x      x    x  x x x             x   x

           200       210       220       230       240       250       260       270
PSG SDYIEGDFTDQEVFGEFMSLKQVEMKTIEAKYDGPYRPATTRPKSLLSSEDVKRASNKKNSS
TMV NVRDGGPMELTEEVVDEFMEDVPMSIRLAKFRSRTGKKSDVRKGKNSSNDRSVPNKNYRNVKDFGGMSFKKNNLIDDDSEATVAESDSF
         x     x                             x     x        x  x

C
            10        20        30        40        50        60        70        80        90
PSG MGDMYDEQFD-KAGGPADLMDDSWVESTAWKDLLKKLHSVKFALQSGRDEITGLLTTLSRQCPYSPYEQFPERKVYFLLDSRANNALGVI
CAM  MAMYDDEFDTKAS---DLTFSPWVEVENWKDVTTRLRAIKFALQADRDKIPGVLSDLKTNCPYSAFKRFPDKSLYSVLSKEAVIAVAQI
       xxx  xx xx    xx    xxx   xxx    x   xxxxx  xx x x x  x   xxxx    xx    x  x   x  x   x

           100       110       120       130       140       150       160       170       180
PSG QNASAFKRRADEKNAVAG-----VTNIPANPNTTVTTNQGSTTTTKANTS-STLEEDLYTYYKFDDASTTFHKSLTSLENMQLKSYYRRN
CAM QSASGFKRRADEKNAVSGLVSVTPTQISQSASSSAATPVGLATVKPPRESDSAFQEDTFSYAKFDDASTAFHKALAYLEGLSLRPTYRRK
    x xx xxxxxxxxxxx x      x x         x  x  x      x x   xx   x xxxxxxx xxx x  xx   x   xxx

           190       200       210       220
PSG FEKNFGVKFGSASTPASGGSGATPPPASGGAVRPNP
CAM FEKDMNVKWGGSGSAPSGAPAGGSSGSAPPTSGSSGSGAAPTPPPNP
    xxx   xx x      xx
Figure 5. Alignment of amino acid sequences of (A) the C-terminal region of PSG 170K and TMV 183K proteins, (B) PSG 29K and TMV 30K proteins, and (C) PSG coat protein and CAM coat protein. Identical residues are indicated by an "x" below the sequence; underlined residues in (A) are part of the sequence that is conserved in all RNA-dependent RNA-polymerases (26).
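The identity markers in Figure 5 can be derived mechanically from two aligned sequences. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, gaps are taken to be written as "-", and the input is the last block of the coat protein alignment in panel C (positions 181 onwards).

```python
def identity_marks(seq_a, seq_b, gap="-"):
    """Return an 'x' for each aligned column where both residues are identical."""
    marks = []
    # zip stops at the end of the shorter sequence, so overhangs are unmarked.
    for residue_a, residue_b in zip(seq_a, seq_b):
        same = residue_a == residue_b and residue_a != gap
        marks.append("x" if same else " ")
    return "".join(marks)


# Last block of the PSG / CAM coat protein alignment (Figure 5C).
psg = "FEKNFGVKFGSASTPASGGSGATPPPASGGAVRPNP"
cam = "FEKDMNVKWGGSGSAPSGAPAGGSSGSAPPTSGSSGSGAAPTPPPNP"

marks = identity_marks(psg, cam)
print(psg)
print(cam)
print(marks)
print(marks.count("x"), "identical positions")
```

For this block the function marks eight identical positions (FEK, VK, G and SG). Dividing the total number of marks in a panel by the number of compared residues gives the percentage homology quoted in the Discussion.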
DISCUSSION
The similarities in the organization and expression of genetic information in TRV-RNA 1 and TMV-RNA prompted us to compare the amino acid sequences deduced from corresponding reading frames. Figure 5A shows an alignment of the C-terminal sequences of the TRV-strain PSG 170K protein and the TMV 183K protein. Identical residues are found at 55 positions of the 179 amino acids that are compared. In addition there are a number of conserved amino acid changes. The underlined residues between positions 10 and 45 constitute the consensus sequence that can be found in all proteins with a (putative) role in RNA-dependent RNA-synthesis (26). They occur at the same position in the TRV 170K and TMV 183K proteins. Like the 183K protein, the 126K protein shows homology to tricorna- and alpha-virus proteins with a putative role in RNA replication (27,28). By analogy, we assume that the TRV 120K and 170K proteins are both involved in viral RNA synthesis.
Figure 5B shows that the homology between TRV strain PSG 29K protein and TMV 30K protein is relatively low. However, significant local homologies exist at position 83-86 and between positions 135 and 170. Studies with mutants indicate that the TMV 30K protein has a role in cell-to-cell transport of the virus (29). The concept that the TRV 120K and 170K proteins are responsible for RNA replication, whereas the 29K protein performs a transport function, correlates well with the observation that TRV-RNA 1 is able to replicate systemically in intact plants in the absence of RNA 2 (2).
The position of the 16K cistron in TRV-RNA 1 is similar to the location of the coat protein cistron in TMV-RNA. It could be the remnant of a defective coat protein cistron. However, no significant sequence homology between the 16K protein and either TMV or PSG coat protein was observed. The observation that the 16K cistron is conserved in TRV strain TCM (Angenent et al., manuscript in preparation) suggests that the encoded protein has a function in virus multiplication. RNA 5 is probably the messenger involved in the expression of this function. The results of the Northern blot experiment indicate that RNA 5 is 3'-coterminal with either RNA 1 or RNA 2; its estimated length of 700 nucleotides would map its 5'-end just upstream of the 16K cistron in RNA 1. If RNA 5 corresponded to the 3'-terminal 700 nucleotides of RNA 2 it would contain no meaningful information. RNA 3 was shown to be 3'-coterminal with RNA 1 (Figure 1). This would locate its 5'-end just upstream of the 29K cistron. Probably, TRV-RNA 3 is functionally equivalent to I2-RNA, the messenger for the TMV 30K protein (30). The finding that TRV strains TCM, TAK, SYM and ORY contain RNA 3 and 5 molecules that are comparable in size to the PSG RNAs suggests that the genome structure of RNA 1 of these strains is similar.
The length and genome organization of RNA 2 of strain PSG (1905 nucleotides) and the CAM strain (1799 nucleotides) (8) are remarkably similar. Figure 5C shows that the amino acid sequence homology between the coat proteins of the respective strains is about 40%. A major difference between the amino acid composition of the two proteins is the presence of 21 threonine residues in PSG coat protein whereas only 14 threonine residues are present in the coat protein of CAM. A cluster of 10 threonine residues is present between positions 104 and 144 of the PSG sequence in Figure 5C, in a region where the homology between the two proteins is relatively low.
At least five TRV strains were found to contain an RNA 4 molecule that is 400 to 500 nucleotides shorter than the genomic RNA 2 (Figure 1A). This indicates that although in these strains the length of RNA 2 is quite variable, the coat protein cistron is located at a fixed position with respect to the 5'-end. In TCM-RNA 2 the coat protein cistron was found to initiate at 542 nucleotides from the 5'-end (Angenent et al., manuscript in preparation). As far as is known the subgenomic RNA 4 is not replicated by the RNA 1 induced replicase. This indicates that the 5'-terminal sequence of RNA 2, that is absent in RNA 4, contains signals that are essential to replication. Probably these signals interfere with translation, thus creating the need for the synthesis of a subgenomic coat protein messenger. The 5'-terminal sequence AUAAAACAUU-- that is identical in PSG-RNAs 1 and 2 may reflect (part of) a replicase recognition signal in the corresponding minus-strand RNAs; the 5'-terminal sequence AUAAA-- of RNA 4 may reflect (part of) an internal initiation site for the replicase in minus-strand RNA 2. The sequence AUAAA is also found 21 nucleotides upstream of the 16K cistron in RNA 1 (position 1376, arrow in Figure 3). Initiation of transcription of minus-strand RNA 1 at this position would generate an RNA molecule of 702 nucleotides, close to the estimated length of RNA 5. This putative initiation site is preceded by the sequence AUGC, similar to the sequence AUGGC that is found upstream of the initiation site in RNA 2. Although less conserved, a comparable sequence is found 124 nucleotides upstream of the cistron for the 29K protein (position 490, arrow in Figure 3). The use of this site would produce an RNA 3 molecule of 1587 nucleotides. The putative 5'-terminal sequences of PSG-RNAs 3 and 5 are also listed in Table 1. The 5'-terminal sequence of CAM-RNA 2 (8) is rich in A and U but is not identical to that of the PSG-RNAs. However, around position 486 in the leader sequence of this RNA the sequence AUGC/AUAA is found which probably represents the 5'-end of CAM-RNA 4 (Table 1).
At the 3'-end PSG-RNA 4 is 100% homologous to PSG-RNA 1 for a length of 497 nucleotides. Our studies on TCM-RNA 2 show that in this RNA molecule the 3'-terminal homology with RNA 1 continues for another 601 nucleotides (Angenent et al., manuscript in preparation). The sequence of the 3'-terminal 1098 nucleotides of TCM-RNA 2 shows a 94% homology with the corresponding region of PSG-RNA 1. The available data indicate that the RNA 1 molecules of strains of the TRV serotype I-II cluster are closely related by sequence and that the RNA 2 molecules of these strains show a 3'-terminal sequence homology to RNA 1 for various lengths. Because of this homology the complete 16K cistron and part of the 29K cistron are present in TCM-RNA 2. It is quite possible that in strain TCM the 16K protein is expressed from both RNA 1 and RNA 2. TCM-RNA 2 is about 1600 nucleotides longer than PSG-RNA 2. Of these additional nucleotides 601 contribute to the increased homology with RNA 1; the others are located in between the coat protein cistron and the 3'-terminal homologous region (Angenent et al., manuscript in preparation).
The 3'-terminal homologous sequence in TRV-RNAs 1 and 2 may be involved in encapsidation and/or replication of the viral RNAs. The observation that the length of this homologous region can vary among different strains suggests that only part of this sequence is required for one or both of these functions. It is therefore remarkable that the PSG-RNAs are identical over a sequence of 497 nucleotides. This 100% homology is not found when different strains are compared, e.g. strains PSG and TCM. In vitro, pseudorecombinants are readily obtained between strains of TRV (2). The observed homology within a given strain indicates that either recombination does not occur frequently in the field or that once a pseudorecombinant with heterologous 3'-termini has been formed a mechanism comes into play that corrects the sequence differences, e.g. by recombination. A 100% identity has been reported for the 3'-terminal 459 nucleotides of the CAM strain RNAs 1 and 2. Comparison of the CAM and PSG sequences shows an 80% homology for the 3'-terminal 44 nucleotides. The observation that CAM-RNA 1 lacks detectable sequence homology with RNA 1 of other TRV strains, except for the 3'-terminal 44 nucleotides, separates strain CAM from the serotype I-II cluster.
ACKNOWLEDGMENTS
Thanks are due to Dr. T. Kartasova and Mr. E. Modderman for their help with sequencing. Mr. C. Cuperus is gratefully acknowledged for isolation of PSG and TCM isolates. We are indebted to Drs. S. Bergh and A. Siegel for sending us their manuscript prior to publication. This work was sponsored in part by the Netherlands Foundation for Chemical Research (S.O.N.) with financial aid from the Netherlands Organization for the Advancement of Pure Research (Z.W.O.).
*To whom correspondence should be addressed
REFERENCES
1. Robinson, D.J. and Harrison, B.D. (1985) J. gen. Virol. 66, 171-176.
2. Harrison, B.D. and Robinson, D.J. (1978) Adv. Virus Res. 23, 25-77.
3. Mayo, M.A., Fritsch, C. and Hirth, L. (1976) Virology 69, 408-415.
4. Fritsch, C., Mayo, M.A. and Hirth, L. (1977) Virology 77, 722-732.
5. Pelham, H.R.B. (1979) Virology 97, 256-265.
6. Bisaro, D. and Siegel, A. (1980) Virology 107, 194-201.
7. Linthorst, H.J.M. and Bol, J.F. (1986) in: "Developments and Applications in virus testing". Eds. R.A.C. Jones and L. Torrance. Association of Applied Biologists. in press.
8. Bergh, S.T., Koziel, M.G., Huang, S-C., Thomas, R.A., Gilley, D.P. and Siegel, A. (1985) Nucleic Acids Res. 13, 8507-8518.
9. Huttinga, H. (1972) Ph. D. Thesis, University of Wageningen.
10. Gubler, U. and Hoffman, B.J. (1983) Gene 25, 263-269.
11. Cornelissen, B.J.C., Brederode, F.Th., Moormann, R.J.M. and Bol, J.F. (1983) Nucleic Acids Res. 11, 1253-1265.
12. Dagert, M. and Ehrlich, S.D. (1979) Gene 6, 23-28.
13. Birnboim, H.C. and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
14. Messing, J. (1983) Methods Enzymol. 101, 20-78.
15. Kieny, M.P., Lathe, R. and Lecocq, J.P. (1983) Gene 26, 91-99.
16. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
17. Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Proc. Natl. Acad. Sci. USA 80, 3963-3965.
18. Koper-Zwarthoff, E.C., Lockard, R.E., Alzner-De Weerd, B., RajBhandary, U.L. and Bol, J.F. (1977) Proc. Natl. Acad. Sci. USA 74, 5504-5508.
19. McMaster, G.K. and Carmichael, G.C. (1977) Proc. Natl. Acad. Sci. USA 76, 4835-4838.
20. Thomas, P.S. (1980) Proc. Natl. Acad. Sci. USA 77, 5201-5205.
21. Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning, Cold Spring Harbor Laboratory.
22. Wahl, G.M., Stern, M. and Stark, G.R. (1979) Proc. Natl. Acad. Sci. USA 76, 3683-3687.
23. Goelet, P., Lomonosoff, G.P., Butler, P.J.G., Akam, M.E., Gait, M.J. and Karn, J. (1982) Proc. Natl. Acad. Sci. USA 79, 5818-5822.
24. Melton, D.A., Krieg, P.A., Rebagliati, M.R., Maniatis, T., Zinn, K. and Green, M.R. (1984) Nucleic Acids Res. 12, 7035-7056.
25. Offord, and Harris, (1965) in: Proc. 2nd FEBS-meeting, 216-217.
26. Kamer, G. and Argos, P. (1984) Nucleic Acids Res. 12, 7269-7282.
27. Haseloff, J., Goelet, P., Zimmern, D., Ahlquist, P., Dasgupta, R. and Kaesberg, P. (1984) Proc. Natl. Acad. Sci. USA 81, 4358-4362.
28. Cornelissen, B.J.C. and Bol, J.F. (1984) Plant Mol. Biol. 3, 379-384.
29. Leonard, D.A. and Zaitlin, M. (1982) Virology 117, 416-424.
30. Hirth, L. and Richards, K.E. (1981) Adv. Virus Res. 26, 145-199.