Nucleotide sequence of the 3' terminal region of plum pox potyvirus RNA

17
Virus Research, 10 (1988) 325-342 Elsevier 325 VRR 00423 Nucleotide sequence of the 3’ terminal region of plum pox potyvirus RNA Sonia Lain, Jost Luis Riechmann, Enrique MCndez ’ and Juan Antonio Garcia Centro de Biologia Molecular (CSIC-UAM), Universidad Aukinoma, Canto Blanco, 28049 Madrid Spain and I Servicio de Endocrinologia, Centro Ram& y Cajal, Carretera de Colmenar Viejo Km 9.100, 28034 Madrid (Spain) (Accepted for publication 26 February 1988) The nucleotide sequence of the 3’ terminal 2854 nucleotides of plum pox virus RNA has been determined. There is an open reading frame of 2634 nucleotides upstream from a 220 nucleotide 3’ non-coding region followed by a long poly(A) tail. The gene coding for the capsid protein has been mapped adjacent to the 3’ non-coding region. The predicted capsid protein is originated by a Gln-Ala cleavage and consists of 330 amino acids; the central and carboxy terminal, but not the amino terminal regions present high sequence similarity with other potyviral capsid proteins. A truncated capsid protein which is present in purified preparations of plum pox virus is originated by a Gln-Thr cleavage and contains the 260 carboxy terminal amino acids of the capsid protein. The predicted plum pox virus RNA-de- pendent RNA polymerase, 518 amino acids long, has been localized at the 5’ end of the open reading frame. Potyvirus; Capsid proteins; RNA replicases Introduction The monopartite genomes of members of the potyvirus group consists of single- stranded, positive-sense RNA molecules which are 3’ polyadenylated (Hari et al., Correspondence to: S. Lain, Centro de Biologia (CSIC-UAM) Universidad Autbnoma, Canto Blanco, 28049 Madrid, Spain. 0168-1702/88/$03.50 0 1988 Elsevier Science Publishers B.V. (Biomedical Division)

Transcript of Nucleotide sequence of the 3' terminal region of plum pox potyvirus RNA

Virus Research, 10 (1988) 325-342

Elsevier 325

VRR 00423

Nucleotide sequence of the 3’ terminal region of plum pox potyvirus RNA

Sonia Lain, Jost Luis Riechmann, Enrique MCndez ’ and Juan Antonio Garcia

Centro de Biologia Molecular (CSIC-UAM), Universidad Aukinoma, Canto Blanco, 28049 Madrid Spain and I Servicio de Endocrinologia, Centro Ram& y Cajal, Carretera de Colmenar Viejo Km 9.100,

28034 Madrid (Spain)

(Accepted for publication 26 February 1988)

The nucleotide sequence of the 3’ terminal 2854 nucleotides of plum pox virus RNA has been determined. There is an open reading frame of 2634 nucleotides upstream from a 220 nucleotide 3’ non-coding region followed by a long poly(A) tail. The gene coding for the capsid protein has been mapped adjacent to the 3’ non-coding region. The predicted capsid protein is originated by a Gln-Ala cleavage and consists of 330 amino acids; the central and carboxy terminal, but not the amino terminal regions present high sequence similarity with other potyviral capsid proteins. A truncated capsid protein which is present in purified preparations of plum pox virus is originated by a Gln-Thr cleavage and contains the 260 carboxy terminal amino acids of the capsid protein. The predicted plum pox virus RNA-de- pendent RNA polymerase, 518 amino acids long, has been localized at the 5’ end of the open reading frame.

Potyvirus; Capsid proteins; RNA replicases

Introduction

The monopartite genomes of members of the potyvirus group consists of single- stranded, positive-sense RNA molecules which are 3’ polyadenylated (Hari et al.,

Correspondence to: S. Lain, Centro de Biologia (CSIC-UAM) Universidad Autbnoma, Canto Blanco, 28049 Madrid, Spain.

0168-1702/88/$03.50 0 1988 Elsevier Science Publishers B.V. (Biomedical Division)

326

1979) and contain a 5’ terminal genome linked protein (Hari, 1981; Siaw et al., 1985). The complete nucleotide sequence of tobacco vein mottling virus (TVMV) (Domier et al., 1986) and tobacco etch virus (TEV) (Allison et al., 1986) RNAs, and in vitro translation results (Vance and Beachy, 1984) indicate that the monopartite genome of potyviruses is probably expressed through the production of large polyproteins which are proteolytically cleaved to produce the functional proteins. Six virus-encoded polypeptides have been found to be associated with potyviral infections, and the cistron for each of them, with the exception of the 5’ terminal genome linked protein, has been mapped in the potyviral genome (Hellmann et al., 1986; Domier et al., 1987). Potential cistrons capable of encoding two additional polypeptides are also present in the potyvirus RNA but no such proteins have yet been identified in vivo (Hellman et al., 1986; Domier et al., 1987).

We have determined the nucleotide sequence of the 3’ terminal 2854 nucleotides of plum pox potyvirus (PPV) RNA. The cistrons for the capsid protein and the presumed viral RNA-dependent RNA polymerase have been localized and the predicted amino acid sequences have been analysed.

Materials and Methods

PPV RNA preparation

PPV, Rankovik strain, (obtained from Dr. D.Z. Maat) was propagated in Nicotiana cleuelandii. Virus was purified from infected leaves and harvested 20-30 days after inoculation. Leaf tissue was homogenized with a mixture of 2 vol(2 ml/g of fresh tissue) of 0.18 M McIlvain’s citric acid-phosphate buffer, pH7, containing 0.2% thioglycolic acid, 0.01 M sodium diethyldithiocarbamate and 0.5 M urea, and l/3 vol of chloroform. The homogenate was centrifuged for 10 min at 6000 X g and then the supematant was centrifuged for 1.5 h at 57500 x g. The pellet was resuspended in l/2 vol of 0.01 M citric acid-phosphate buffer containing 0.2% 2-mercaptoethanol and 1 M urea, and the two centrifugation steps were repeated. The pellet was resuspended in 0.1 vol of 0.1 M sodium borate buffer, pH 8.2, containing 0.01 M EDTA, clarified by low speed centrifugation, layered over an 8 ml cushion of 20% sucrose in the same buffer and centrifuged at 57500 x g for 2 h. The pellet was resuspended in 0.02 vol of borate-EDTA buffer, layered on top of a lo-40% sucrose gradient in the same buffer and centrifuged at 75000 X g for 1.5 h. Virus fractions were diluted 1: 1 with borate-EDTA buffer and centrifuged for 3 h at 57500 x g. Purified virus was resuspended in 50 mM sodium borate buffer, pH 8.2. All the purification steps were carried out at 4 o C. Virus yield was about l-4 mg per 100 g of fresh leaves.

RNA was isolated from purified virus preparations at l-5 mg/ml adjusted to 0.01 M Tris-HCl, pH8,O.l M NaCl, 0.01 M EDTA, 2% SDS, by phenol-chloroform extraction (Zimmem, 1975). Poly(A)+ RNA was selected by binding to oligo dT-cellulose essentially as described (Maniatis et al., 1982).

327

Synthesis and cloning of cDNA

First strand of complementary DNA was synthesized in reaction mixtures containing 50 mM Tris-HCl, pH8.3, 10 mM MgCl,, 5 mM dithiothreitol, 7.5 rg/ml oligo(dT),,_,,, 0.5 mM each of dATP, dCTP, dGTP, d’ITP, 100 mM KCl, 1000 U/ml RNasin, 135 U/ml reverse transcriptase (Anglian Biotechnology) and 50 pg/ml PPV RNA, during 2 h at 37” C in the presence of 100 pCi/ml of [a- 32 P]dATP. After phenol-chloroform extraction and chromatography on Sep- hadex G50, the synthesis of the second strand was performed using E. coli DNA polymerase I, RNase H and E. coli DNA ligase as described (Lapeyre and Amalric, 1985). Double-stranded cDNA, purified by extraction with phenol-chloroform and chromatography on Sephadex G50, was digested with the indicated restriction endonucleases, ligated to EcoRI or Hind111 linearized plasmid pUC19 (Yanish-Per- ron et al., 1985) and used to transform competent E. coli JM109 cells (Yanish-Per- ron et al., 1985). Clones containing recombinant plasmids were selected by colony hybridization (Grunstein and Hogness, 1975) using first strand [ 32 P]cDNA as probe, and characterized by restriction enzyme mapping and Southern blot analysis (South- em, 1975). Plasmid pPPV1 was originated from an illegitimate ligation of the 3’ terminal Hind111 fragment to Hind111 linearized pUC19, maintaining only the first nucleotide of the restriction site at the insertion point of the poly(A) tail.

Nucleotide sequence

Restriction fragments of the cloned cDNA were inserted into the poly-linker region of either Ml3 mp18 or Ml3 mp19 (Norrander et al., 1983). Single-stranded templates were isolated and sequenced by the dideoxy chain termination method (Sanger et al., 1977) using the universal primer (New England Biolabs) or synthetic oligodeoxynucleotide primers. Direct RNA sequence analysis of the 3’ end of the molecule was performed by a modified chain termination method using d(T,,GTC) as primer and reverse transcriptase (Zimmem and Kaesberg, 1978). The oligodeoxynucleotide primers were prepared using an Applied Biosystems Model 381A DNA Synthesizer and purified as described (Strauss et al., 1986).

Analysis of the PPV coat protein

The coat protein was obtained from purified virus by phenol-chloroform extrac- tion (Zimmem, 1975) and acetone precipitation of the organic phases. The intact and truncated capsid proteins were separated by size exclusion high performance liquid chromatography (HPLC) on a TSK 3000 SWG column, equilibrated and eluted with 0.1 M ammonium acetate. Protein preparations were analyzed on 15% SDS-polyacrylamide gels (Laemmli, 1970). Purified protein was hydrolyzed with 0.1 ml of 5.7 M HCl containing 0.05% 2-mercaptoethanol in evacuated and sealed tubes at 110°C for 24, 48 and 72 h and the amino acid analysis were performed on a Beckmann 121 MB Analyzer equipped with a Shimadzu Integrator model CRA. Protein was sequenced in a Beckman sequencer (model 890 M) according to the

328

method of Edman and Begg (1967). The sequence of the protein was determined using the Beckman protein/peptide/~cro/macrosequenc~ng program. To reduce peptide washout, 1 mg of polybrene was added when the sample was applied on the cup (Tarr et al., 1978). The PTH-amino acids were identified on a reverse-phase HPLC system based upon a Nova-Pak Cl8 column (Waters) and eluted with the following buffers: A, 35 mM sodium acetate, pH 5.0: acetonitrile (5 : 1); B, 2-propanol : water (3 : 2).

Comparison of amino acid sequences

Amino acid sequence alignments were carried out by computer analysis using the programs GAP and PRETTY from the University of Wisconsin Genetics Computer Group (Devereux et al., 1984).

Percentages of identity between two proteins were calculated dividing the number of matches by the total number of residues (including gaps) of the sequences compared, and multiplying by 100.

Results

The presence of a poly(A) tail at the 3’ end of several potyviruses has been described (Hari et al., 1979; Hari, 1981; Hellmann et al., 1980; Gough et al., 1987). Affinity chromatography in oligo(dT) cellulose indicated that PPV RNA also contains a poly(A) tail (Fig. 1A). Fig. 1B shows that the poly(A) tail is heteroge- neous in size. The length ranged from more than 500 to less than 15 residues of AMP; enzymatic sequencing showed that only two nucleotides (A and C) precede the poly(A) tract in the RNAse Tl fragments retained in the oligo(dT) cellulose column (not shown). Length distribution of PPV RNA poly(A) chains contained peaks spaced about 30 residues with an average length of 240 residues (Fig. 1C).

Fig. 1. (A) Electrophoresis on gel containing 0.8% agarose in Tris-borate buffer (Maniatis et al., 1982) with 0.1% SDS of PPV RNA fractionated by oligo dT-cellulose chromatography. Equivalent volumes of the samples were applied to each slot. (a) input PPV RNA (b) flow-through fractions (c) RNA eluted

with 10 mM Tris-HCl, pH 7.5. (B) Autoradiograms of 3’ terminal poly A chains released from PPV

RNA by RNase Tl digestion, [5’-32P] end-labeled, purified by retention in oligo dT cellulose and analyzed in thin gels containing 20% (a) or 6% (b) acrylamide in Tris-borate buffer (Maniatis et al., 1982) with 7 M urea. (C) Length distribution of 3’ terminal poly A chains determined as described by Ahlquist and Kaesberg (1979). The position and size, in mtmber of bases, of fragments generated by Hind111

digestion of phage $29 DNA (Yoshikawa and Ito, 1982) are shown beside the lane (a) of panel (A) and

lane (b) of panel (B) and the position of a twenty residue-long oligonudeotide is shown beside lane (a) of panel (B).

329

A a b

4545

2850 2480 2200 1950

-270

J 0 100 200 300 400 500 600

-72

3’ poly (A) chain length (no of A residues)

330

-2qsc -2:oo

C.... pPPV3 ’

-2:oo Too -‘4p0 -:“” -t 8 pPPV1

, pPPV2 I

1

d

Fig. 2. Sequencing strategy. PPV cDNA inserts cloned intorecombinant plasmids are shown in the top of the figure. The insert of plasmid pPPV3 span about 4000 nt upstream of the sequenced region. The horizontal arrows indicate the direction and extent of sequence determination. Only the restriction enzyme sites used for subcloning are shown on the map. When indicated, o~g~eoxynuclwtides d(CATCA~ATC~GACG) (a), d(AG~GGATG~A~AC) (b), d(TCACAGT~TCCATC) (c),

were used as primers, or RNA was sequenced directly (d).

Sequence of the 3’ terminal 2854 nucleotides

PPV cDNA, synthesized as described in Materials and Methods, was cloned in the plasmid pUC19. Plasmids pPPV1, pPPV2 and pPPV3, containing overlapping inserts, were isolated, characterized and used for sequencing (Fig. 2). DNA frag- ments were subcloned in Ml3 vectors and sequenced by the dideoxy chain termina- tion method (Sanger et al., 1977). All the nucleotides were sequenced at least once on each strand. Plasmid pPPV1 contained a poly(A) tract of about 100 residues; the sequence of more than 400 nucl~tides adjacent to the poly(A) tract was confirmed by sequencing the RNA directly. The 2854nucleotide sequence of the 3’ terminal region of the PPV genomic RNA is presented in Fig. 3.

Analysis of the nucleotide sequence of the + strand (virion polarity) revealed a single open reading frame (ORF) of 2634 nucleotides. The other reading frames of the + sense and the three of the -sense sequence contained numerous stop codons and few extended ORFs. The initiation codon of the large ORF was not identified in the region analyzed and presumably was located upstream of the sequence presented. The large ORF was terminated by a single TAG codon located at nucleotide position - 220.

The base composition of the 2854nucleotide sequence showed a high content of adenosine, 30.3%, and a correspon~n~y low content of cytosine, 20.3%. On the other hand the non-coding region contained a high percentage of uridine, 36.88, and a similar amount of the other three nucleotides.

The partial PPV RNA sequence contained one undecanucleotide, five decanuc- leotides and seventeen nonanucleotides repeated twice, three decanucleotides and thirteen nonanucleotides found in both senses, 5’-3’ and 3’-5’, and more than one hundred octanucleotides directly or oppositely repeated two or three times. Statisti- cally highly improbable numbers of repeated oligonucleotides in viral DNA (Escarrnis et al., 1984) and RNA (Go@ et al., 1987; Richards et al., 1978; Mekler, 1981; Villanueva et al., 1983). have also been reported. We do not know the significance of these repetitions, if any.

331

332

No similarity in length, nucleotide sequence or secondary structure could be detected when the 3’ untranslated region was compared with other RNA virus

genomes.

Sequence of PPV capsid protein

A common occurrence in potyviruses is heterogeneity of the capsid protein in

purified virus (Huttinga and Mosch, 1974; Hiebert and McDonald, 1976). Purified preparations of PPV contained a major and a minor protein with electrophoretic

mobilities corresponding to 36 and 28 kDa, respectively; the proportion of both

polypeptides varies between preparations (Fig. 4). Both proteins were purified by HPLC and their amino terminal sequences determined. The position of the nucleo-

tide sequence coding for these amino acids showed that the smallest protein is a truncated form of the capsid protein without the 70 ammo terminal amino acids.

The capsid protein was originated by cleavage at a Gln-Ala dipeptide as it occurs in pepper mottle virus (PeMV) (Dougherty et al., 1985), and the truncated form by

cleavage at a Gln-Thr dipeptide. The predicted amino acid sequence was in good agreement with the amino acid analysis of the intact capsid protein determined

chemically (Table 1). When the coat protein sequences of PeMV (Dougherty et al., 1985) potato virus

Y (PVY) (Shukla et al., 1986) sugarcane mosaic virus (SCMV) (Gough et al., 1987)

TEV (Allison et al., 1986) and TVMV (Domier et al., 1986) were compared with that of PPV, a high level of similarity was found except in the ammo terminal

a b d

67.6 -

49.6 -

33.5 -

29.5 - *-A

Fig. 4. Electrophoresis on 15% SDS-polyacrylamide gels of protein components of two different purified virus preparations (a and b) and the intact (c) and truncated (d) capsid proteins purified by HPLC. Phage

$29 stActura1 proteins (VlEek and Paces, 1986) were used as molecular weight markers.

333

TABLE 1

AMINO ACID COMPOSITION OF THE PPV CAPSID PROTEIN.

Amino acid Number of residues

residue Predicted from Determined

nucleotide sequence chemically a

Ala(A) 30 30.7

A%(R) 19 18.4

Asx(D and N) 41 39.8

Cys(C) 2 ND

Glx(E and Q) 34 38.8

GIy(G) 19 22.1

His(H) 1 7.3

WI) 13 12.6

Leu(L) 21 22.6

Lys(R) 13 13.2

Met(M) 13 12.3

Phe(F) 9 9.8

Pro(P) 24 22.4

WS) 19 19.5

Thi.0 21 21.2

Trp(W 3 ND

Tyr(Y) 12 10.9

VaW 24 23.4

’ Each value represents the average after 24,48 and 72 h of hydrolysis except the values for Thr and Ser,

which were extrapolated to zero hours of hydrolysis. ND = not determined.

region, where essentially no similarity could be detected (Fig. 5). The unlike amino terminal part started in a well-defined position in all the viruses but its length was heterogeneous. SCMV had an unlike region of 66 aa and PPV of 92 aa, whereas

TVMV had 27 aa, TEV 26 aa, and PVY and PeMV had 29 aa highly conserved in both viruses. Interestingly, a well-conserved sequence, (V/I)DAG, could be detected near the amino terminus of all the proteins. Table 2 shows the percentage of identity among the different coat proteins compared. PVY and PeMV coat proteins were very similar even when the amino terminal part of the molecule was included in the comparison. The rest of the proteins presented similar levels of identity when the

TABLE 2

AMINO ACID SEQUENCE SIMILARITY BETWEEN SIX POTYVIRUS CAPSID PROTEINS.

TVMV 58.4 (60.5)

PVY 61.7 (66.4) 53.2 (57.1)

PeMV 60.6 (65.1) 54.7 (58.8) 91.8 (94.1)

SCMV 49.7 (61.6) 44.7 (54.6) 49.3 (60.9) 49.7 (62.6) PPV 49.7 (67.2) 42.1 (56.3) 49.4 (65.5) 50.6 (67.6) 50.6 (61.3)

TEV TVMV PVY PeMV SCMV

The values are expressed as percentages of identity. Values in parentheses correspond to the comparison of the 237 carboxy terminal amino acids in TEV and SCMV, or 238 in the rest of the viruses.

ARPEQCSIQVNP .......

l-VW S ...... KU. .XAR QKLADXPTLA .........

41 80 PPV TASHLNPIFTPATTQPATXPVSQVSGPQLQTFGTYGNEDA

P&IV ........................................ PVY ........................................

SCHV TATADN ............ KPSSDNTSNAQGTSQTKCCCES TEN ........................................

TvtlV ........................................

200

PPV PCHV PVY scllv

T%

281 RiiVQ KSAQ KSAO RGS? RNSG PQFA

320

320 330 CV N?! NU I. Q . CV

Fig. 5. Alignment of six potyvirus capsid protein amino acid sequences. Amino acids identical in at least four of the sequences are boxed. Dots mark the gaps needed for optimal alignment.

335

***

Potyvirus (260-268) ; ;

***** *

-AFDFYE-V

Potexvirus (157-167) K F A A F D F F D G V

Fig. 6. A@ntent of the consensus sequences of the six potyviruses compared in Fig. 5 and of the potexviruses PVX, PMIV and WClMV in the region spanning from amino acid 260 to amino acid 268 of

PPV and from amino acid 157 to amino acid 167 of PVX. 2 indicates that amino acids are identical in all

the viruses. * indicates that amino acids are chemically similar (Dornier et al., 1987) in all the viruses. Dots mark gaps.

conserved regions were compared, although the percentages decreased in the long proteins of SCMV and PPV when the amino terminal variable region was taken into account. When the complete series of proteins was compared as a whole, a large coincidence of the homologous sequences was detected; 39% of the amino acids of the constant region were identical in all the proteins and this percentage increased to 62.2% if conservative substitutions (Domier et al., 1987) were included.

PPV capsid protein did not show significant homology to sequences in the NBRF protein sequence bank. However, when the consensus sequence of the capsid proteins of the six potyviruses analyzed in this paper was compared with that of the potexviruses papaya mosaic virus (PMV), potato virus X (PVX) and white clover mosaic virus (WClMV) (Short et al., 1986; Foster et al., 1988), besides some less stringent similarities, a conserved region was found, being the sequence FDF present in all the sequences compared (Fig. 6).

Sequence of the predicted PPV RNA-dependent RNA polymerase

The sequence coding for a protein homologous to the nuclear inch&on b proteins (NI,) of TEV and TVMV was localized upstream of the PPV coat protein cistron. NI, proteins have been postulated as the potyviral RNA-dependent RNA poly- merase based on the presence of amino acid clusters very conserved among viral proteins involved in replication. The alignment with TEV and TVMV NI, proteins (Fig. 7) indicated that the predicted PPV RNA polymerase probably originates from a Gln-Ser dipeptide cleavage and would consist of 518 amino acids giving rise to a molecular weight of 59.1 kDa. The two blocks characteristic of the viral RNA-de- pendent RNA polymerases (T/S)GXXXTXXXN(T/S) and GDD (Domier et al., 1987) could be identified at positions 310 and 351 in the PPV protein but the level of sequence similarity with the NI, proteins of TEV and TVMV was very high along the complete molecule (Fig. 7). PPV protein showed 62.0% and 60.6% identity with the NI, proteins of TEV and TVMV respectively, similar to the 62.2% identity found between them.

Discussion

C~omato~aphy in o~g~dT~~~ose of viral RNA and analysis of the RNase Tl fragments retained in this coltmm showed that, like other potyviral RNAs, PPV

336

PPV TFZV

lvnV

PPV TI~V

TVHV

PPV

4%

Ppv

lsz

z-z lv?lV

PW

G

PPV

s

g TVUV

PPV

Tz

PW TEZV

TVHV

z nMV

z TVnV

ALNMKAA

KEliVEANKT 200

s TAAPIDTLLGC IEKVENNKTR F k-l Ir.Tv

FTAAPIDTLL LIE RTFThAPImTLL

281 H,SNLYTEIVYTPIATPDCTIVKKFKG

dillNv] 1 .

320

NN’G;PsrvvDnr ILTPDGTI~KK@~KCNNSGQP~TVVDNT ULQNLYTEIVYTPISTPDGTIVKKFKGNNSG PSTVVDNT

441 480

Fig. 7. Alignment of three potyvirus NI, type protein amino acid sequences. Amino acids identical in at least two of the sequences are boxed.

RNA is polyadenylated. The size of PPV RNA poly(A) tail was very heterogeneous, ranging from more than 500 to less than 15 residues with a calculated average of about 240 nucleotides. These values are considerably higher than those described for

331

the potyvirus TEV (length distribution from about 200 to less than 33 residues) (Hari et al., 1981) and for other genomic RNAs like poliovirus (average length of 62 residues), human rhinovirus (74 residues), cowpea mosaic virus (CPMV) B compo- nent (87 residues), and similar to those of CPMV M component (162 residues) and Rous-associated virus 61 (RAV61) (213 residues) (Ahlquist and Kaesberg, 1979). The presence of preferred poly(A) chain lengths, uniformly spaced, is probably related with the number of nucleotides protected by the units used for the elonga- tion of the filamentous capsid.

In accord with the genomic structure proposed for potyviruses (Domier et al., 1986; Allison et al., 1986), we have found a unique open reading frame in the 3’ terminal 2854 nucleotides of PPV RNA, coding for part of a larger polypeptide which by proteolytic cleavage at Gin-Ser and Gln-Ala dipeptides would originate two proteins, one homologous to TVMV and TEV nuclear inclusion b proteins and the capsid protein. Similar dipeptides have been described as the cleavage sites recognized by the proteases of potyviruses (Carrington and Dougherty, 1987) and other RNA viruses (Wellink et al., 1986; Hanecak et al., 1982).

Capsid protein heterogeneity in purified preparations is a common feature in potyviruses. Conversion of slow to fast electrophoretic mobility forms after storage or incubation of partially purified virus, suggested that proteolytic degradation of the capsid protein originated the heterogeneity (Huttinga and Mosch, 1974; Hiebert and McDonald, 1976). We have determined the amino terminal sequence of the two major polypeptides present in purified PPV preparations and identified the nucleo- tide sequence coding for them, demonstrating that the slow-migrating band corre- sponds to the intact capsid protein and the fast-migrating one to a breakdown product, in which the first 70 N-terminal amino acids have been removed by proteolytic cleavage at a Gln-Thr dipeptide. The specificity of the breakdown and the similarity of the Gln-Thr cleavage site to the dipeptides recognized by viral proteases suggest that the same protease activity responsible for the normal processing of the potyviral polyprotein could be involved in the generation of the truncated capsid protein. We can not discriminate whether this truncated capsid protein originated from an alternative processing of the potyviral polyprotein and can assemble without the amino-terminal part of the molecule or it is formed by degradation of previously assembled intact capsid protein. The accessibility of the amino terminal part of the coat protein to proteolytic degradation suggests that it is located in the virion surface, in agreement with the results reported for the TEV capsid protein where only an amino terminal oligopeptide, containing a surface epitope, was released after treatment of intact virions with trypsin (Allison et al., 1985).

Sequence homology between potyviral proteins has been described (Domier et al., 1987). In accord with these data, the amino acid sequence deduced from the nucleotide sequence of PPV RNA described in this paper presented considerable similarity to the same region of the polyproteins of other potyviruses. The amino acid sequences of the coat proteins of PPV and 5 other potyviruses have been compared. All of them share a high peroentage of identity (from 54.6 to 94.1%) in the 238 carboxy terminal amino acids but only PVY and PeMV presented signifi-

338

cant homology in the amino terminal part of the molecule. These data suggest that the information for the protein-protein and protein-RNA interaction involved in the morphogenesis of the viral particles lies in the central and carboxy terminal regions of the coat protein. The existence of a truncated capsid protein, lacking an amino terminal fragment, in purified PPV preparations indicates that the amino terminal part is not necessary for the stability of the capsid, nor, perhaps, for its formation. The significance of the (I/V)DAG conserved block, present in the proximity of the amino end, is unknown; however, three of the six amino acid differences between the highly aphid-transmissible (HAT) and the not-aphid-trans- missible (NAT) isolates of TEV lie in the ten amino terminal residues (Allison et al., 1985), changing the sequence SGTDGAPADAG (HAT isolate) to _GGTVDASADyG (NAT isolate), suggesting that the conserved sequence can be important for the vector-virus interaction. Additional data are required to test these hypotheses.

Although the complete amino acid sequences of the capsid proteins of the potexviruses PVX, PMV and WClMV share only 18.2% identity, this percentage increases to 58.6% when amino acids 1444172 of PVX and 120-148 of PMV and WClMV are compared (Short et al., 1986; Forster et al,, 1988). This region showed significant sequence similarity to the potyviral capsid proteins, and contained a block rich in aromatic amino acids, present in all 6 potyviral and 3 potexviral capsid proteins, whose sequence has been reported. Amino acid blocks conserved in non-structural proteins of very distant viruses have been described, but the capsid proteins were much more divergent (Goldbach, 1986). The presence of homology between very conserved domains of capsid proteins of two viral groups which differ in genome structure and expression but share the flexuous rod morphology of the vii-ion, could suggest an important role of that region in the formation and/or maintenance of the capsid structure.

On the other hand, we have identified a polypeptide in the PPV sequence with a high degree of identity with the NI, proteins of TEV and TVMV. In this region we have localized the two clusters, (T/S)GXXXTXXXN(T/S) and GDD which are also present in the predicted sequences of TEV and TVMV NI, proteins and which are thought to form the core of the RNA-dependent RNA polymerase (Domier et al., 1987; Kamer and Argos, 1984). These findings suggest that this region of the genome encodes the potyviral polymerase. The NI, proteins of potyviruses present more similarity to comoviral and picomaviral RNA polymerases than to the replication proteins of other RNA viruses (manuscript in preparation), which could indicate a common mechanism of replication in the three groups of viruses. This is in agreement with the presence of a terminal protein at the 5’ end of potyviral, comoviral and picomaviral genomic RNAs, which is probably involved in the initiation of the replication of the RNA (Wimmer, 1982). The knowledge of the amino acid sequences of the RNA polymerases may permit defining precisely the protein regions involved in the RNA replication process, by means of in viva and in vitro functional analysis making use of cDNA clones obtained by directed mutagen- esis.

339

Acknowledgements

We thank Dr. M. Salas and Dr. E. Viiiuela for useful discussions and laboratory facilities, Dr. C. Lopez Otin for his contribution to the purification of capsid proteins and Dr. F. Garcia Arena1 for critical reading of the manuscript. We are also grateful to the Departamento de Quimica Agricola of the Universidad Autbnoma de Madrid and to the Departamento de Patologia Vegetal of the ETSI Agrbnomos de Madrid for greenhouse space. S.L. was a recipient of a fellowship from the Fondo de Investigaciones Sanitarias.

References

Ahlquist, P. and Kaesberg, P. (1979) Determination of the length distribution of poly(A) of the 3’

terminus of the virion RNAs of EMC virus, poliovirus, rhinovirus, RAV-61 and CPMV and of

mouseglobin mRNA. Nucleic Acids Res. 7,1195-1204.

Allison, R.F., Dougherty, W.G., Parks, T.D., Willis, L., Johnston, R.E., Kelly, M.E. and Armstrong, F.B.

(1985) Biochemical analysis of the capsid protein gene and capsid protein of tobacco etch virus:

N-terminal amino acids are located on the virion’s surface. Virology 147, 309-316.

Allison, R.F., Johnston, R.E. and Dougherty, W.G. (1986) The nucleotide sequence of the coding region

of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154,

9-20.

Carrington, J.C. and Dougherty, W.G. (1987) Processing of the tobacco etch virus 49K protease requires

autoproteolysis. Virology 160, 355-362.

Devereux, J., Haeberh, P. and Smithies, 0. (1984) A comprehensive set of sequence analysis programs for

the VAX. Nucleic Acids Res. 12, 387-395.

Domier, L.L., Franklin, K.M., Shahabuddin, M., Helhnann, G.M., Overmeyer, J.H., Hiremath, S.T.,

Siaw, M.F.E., Lomonossoff, G.P., Shaw, J.G. and Rhoads, R.E. (1986) The nucleotide sequence of

tobacco vein mottling virus RNA. Nucleic Acids Res. 14, 5417-5430.

Domier, L.L., Shaw, J.G. and Rhoads, R.E. (1987) Potyviral proteins share amino acid sequence

homology with picoma-, come-, and caulimoviral proteins. Virology 158, 20-27.

Dougherty, W.G., Allison, R.F., Parks, T.D., Johnston, R.E., Feild, M.J. and Armstrong, F.B. (1985)

Nucleotide sequence at the 3’ terminus of pepper mottle virus genomic RNA: evidence for an

alternative mode of potyvims capsid protein gene organization. Virology 146, 282-291.

Forster, R.L.S., Bevan, M.W., Harbison, S. and Gardner, R.C. (1988) The complete nucleotide sequence

of the potexvirus white clover mosaic virus. Nucleic Acids Res. 16, 291-303.

Edman, P. and Begg, G. (1967) A protein sequenator. Eur. J. B&hem. 1, 80-90.

Escarmis, C., Gomez, A., Garcia, E., Ronda, C., Lopez, R. and Salas, M. (1984) Nucleotide sequence at

the termini of the DNA of Streptococcus pneumoniae phage Cp-1. Virology 133, 166-171.

Goldbach, R. (1986) Molecular evolution of plant RNA viruses. Annu. Rev. Phytopathol. 24, 289-310.

Got@, K.H., Azad, A.A., Hanna, P.J. and Shukla, D.D. (1987) Nucleotide sequences of the capsid and

nuclear inclusion protein genes from the Johnson grass strain of sugarcane mosaic virus RNA. J. Gen. Virol. 68, 297-304.

Grunstein, M. and Hogness, D.S. (1975) Colony hybridization: a method for the isolation of cloned

DNAs that contain a specific gene. Proc. Natl. Acad. Sci. USA 72, 3961-3965.

Hanecak, R., Semler, B.L., Anderson, C.W. and Wimmer, E. (1982) Proteolytic processing of pohovims polypeptides: antibodies to polypeptide P3-7 inhibit cleavage at glutamine-glycine pairs. Proc. Natl.

Acad. Sci. USA 79, 3973-3977.

Hari, V., Siegel, A., Rozek, D. and Timberlake, W.E. (1979) The RNA of tobacco etch virus contains

poly(A). Virology 92, 568-571.

Hari, V. (1981). The RNA of tobacco etch virus: further characterization and detection of protein linked to RNA. Virology 112, 391-399.

340

Helhnann, G.M., Shaw, J.G., Lesnaw, J.A., Chu, L.Y., Pirone, T.P. and Rhoads, R.E. (1980) Cell-free translation of tobacco vein mottling RNA. Virology 106, 207-216.

Helhnann, G.M., Hiremath, S.T., Shaw, J.G. and Rboads, R.E. (1986) Cistron mapping of tobacco vein mottling virus. Virology 151, 159-171.

Hiebert, E. and McDonald, J.G. (1976) Capsid protein heterogeneity in turnip mosaic virus. Virology 70, 144-150.

Huttinga, H. and Mosch, W.H.M. (1974) Properties of viruses of the potyvirus group. 2. Buoyant density. S value, particle morphology and molecular weight of the coat protein subunit of bean yellow mosaic virus, pea mosaic virus, lettuce mosaic virus and potato virus Y. Neth. J. Plant. Pathol. 80, 19-27.

Kamer, G. and Argos, P. (1984) Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7282.

Laemmli, U.K. (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680-685.

Lapeyre, B. and Amalric, F. (1985) A powerful method for the preparation of cDNA libraries: isolation of cDNA encoding a IOO-kDa nucleolar protein. Gene 37, 215-220.

Mania&, J., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

Mekler, P. (1981) Determination of nucleotide sequences of the bacteriophage Q@ genome: organization and evolution of an RNA virus. Ph.D. Thesis, Universit%t Zurich.

Norrander, J., Kempe, T. and Messing, J. (1983) Construction of improved Ml3 vectors using oligodeoxynucleotide-directed mutagenesis. Gene 26, 101-106.

Richards, K., Guilley, H., Jonard, G. and Hirth, L. (1978) Nucleotide sequences at the 5’ extremity of tobacco mosaic virus RNA. Eur. J. B&hem. 84, 513-519.

Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5469.

Short, M.N., Turner, D.S., March. J.F., Pappin, D.J.C., Parente, A. and Davies, J.W. (1986) The primary structure of papaya mosaic virus coat protein. Virology 152,280-283.

Shukla, D.D., Inghs, AS., MaeKern, N.M. and Got&r, K.H. (1986) Coat protein of potyviruses. 2. Amino acid sequence of the coat protein of potato virus Y. ViroIogy 152, 118-125.

Siaw, M.F.E., Shahabuddin, M., Ballard, S., Shaw, J.G. and Rhoads, R.E. (1985) Identification of a protein covalently linked to the 5’ terminus of tobacco vein mottling virus RNA. Virology 142, 134-143.

Southern, EM. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 503-517.

Strauss, EC., Kobori, J.A., Siu, G. and Hood, L.E. (1986) Specific-primed-directed DNA sequencing. Anal. B&hem. 154, 353-360.

Tarr, G., Beecker, J.F., Bell, M. and McKean, D.L. (1978) Polyquartenary amines prevent peptide loss from sequenators. Anal. B&hem. 84, 622-627.

Vance, V.B. and Beachy, R.N. (1984) Translation of soybean mosaic virus RNA in vitro: evidence of protein processing. Virology 132, 271-281.

Villanueva, N., Davila, M., Ortin, J. and Domingo, E. (1983) Molecular cloning of cDNA from foot-and-mouth disease virus C, Santa Pau (C-SS). Seq uence of prote~-VPl-cuing segment. Gene 23,185-194.

Vl~k, C. and Pa&s, V. (1986) Nucleotide sequence of the late region of Bacillus phage +29 completes the 19285-bp sequence of the #29 genome. Comparison with the homologous sequence of pbage PZA. Gene 46215-225.

Wellink, K.J., Rezelman, G., Goldbach, R. and Beyreuther, K. (1986) Determination of the proteolytic processing sites in the polyprotein encoded by the bottom-component RNA of cowpea mosaic virus. J. Virol. 59, 50-58.

Wimmer, E. (1982) Genome-linked proteins of viruses. Cell 28, 199-201. Yanish-Perron, C., Vieira, J. and Messing, J. (1985) Improved Ml3 phage cloning vectors and host

strains; nucleotide sequences of the Ml3 and pUCl9 vectors. Gene 33,103-119. Yoshikawa, H. and Ito, J. (1982) Nucleotide sequence of the major early region of bacteriophage +29.

Gene 17,323-335.

343

Zimmem, D. (1975) The 5’ end group of tobacco mosaic virus RNA is ~7~G[S’~p~. Nucleic Acids Res. 2,3189-1201.

Zimmem, D. and Kaesberg, P. (1978) 3’-terminal nuckotide sequence of encephalomyocarditis virus RNA determined by reverse transcriptase and chain-terminating inhibitors. Proc. Natl. Aead. Sci. USA IS, 42574261.

(Received 21 January 1988; revision received 26 February 1988).