Cloning and sequence analysis of the Schistosoma mansoni membrane glycoprotein antigen gene GP22

Molecular and Biochemical Parasitology, 49 (1991) 83-98 © 1991 Elsevier Science Publishers B.V. All rights reserved. / 0166-6851/91/$03.50 ADONIS 0166685191003475

MOLBIO 01604

Mohamed El-Sherbeini 1, Naasa Ramadan 1, Keith A. Bostian 1 and Paul M. Knopf 2

1 Department of Microbiology and Molecular Genetics, Merck, Sharp and Dohme Research Laboratories, Rahway, NJ, U.S.A. and 2 Division of Biology and Medicine, Brown University, Providence, RI, U.S.A.

(Received 19 March 1991, accepted 31 May 1991)

A family of Schistosoma mansoni proteins (18-22 kDa, pI 5.3-5.8) are biosynthesized in juvenile worms and immunoprecipitated by antibodies uniquely present in protective Fischer rat antiserum. A cDNA clone, λgt11-40, expressing epitopes common to this protein family was used to obtain a genomic DNA clone, by hybridization with a λgt11-40 oligonucleotide probe. In the 1.37 kb of genomic DNA sequenced, an open reading frame of 182 amino acids was identified on the strand corresponding to λgt11-40 coding sequences [1], and those of identical independently isolated cDNA clones defining a 25-kDa surface membrane glycoprotein [2,3]. The new S. mansoni gene is termed GP22. There are two candidate promoters, confirmed by primer extension studies with worm RNA. Promoter 1 (P1) is preceded by a G+C-rich region and potential CAAT sequences, and is to the 5'-side of P2. Transcription from P1 is initiated at 2 different sites, apparently producing mRNAs with different translation start sites (ATG). Decoding these mRNAs yields protein products of 182 (P1), 175 (P1), 140 (P2) and 136 (P2) amino acids. The polypeptides share the following features: a hydrophobic segment near the carboxy terminus sufficient to span a lipid bilayer, with a consensus sequence for thio-esterification by a fatty acid; an external domain containing 2 potential N-linked glycosylation sites; and a candidate leucine-zipper motif, suggesting the protein may exist as a dimer on the worm surface. While sharing these common features in their carboxy terminal regions, the three proteins differ in the length and properties of their amino termini. The 140-amino acid protein has a short hydrophobic amino terminus, while the 175- and 182-amino acid proteins have more extensive hydrophobic sequences, each preceded by a hydrophilic amino terminal sequence. The heterogeneity observed in 2-dimensional gels of the antigen may be explained in part by the size and charge differences among the proteins deduced from the sequence and transcription pattern of this gene. The possibility of stage-specific regulated expression of this candidate vaccine antigen family is an attractive concept, potentially accounting for the phenomenon of concomitant immunity observed in the rat and perhaps other schistosome hosts.

Key words: Schistosoma mansoni; Multiple promoters; Membrane glycoprotein; Vaccine development

Correspondence address: Mohamed El-Sherbeini, Merck Sharp & Dohme Research Laboratories, Microbiology and Molecular Genetics, P.O. Box 2000, 80Y-300, Rahway, NJ 07065, U.S.A.

Note: Nucleotide sequence data reported in this paper have been submitted to the GenBank™ database with the accession number M34357.

Abbreviations: F-2x, twice infected Fischer rat protective antiserum; W-2x, twice infected Wistar-Furth rat non-protective antiserum; SDS, sodium dodecyl sulfate; SSC, sodium chloride, sodium citrate buffer; 2D, two-dimensional; aa, amino acid; ORF, open reading frame; 3' UTR, 3-prime untranslated region.

Introduction

We previously identified clones in a λgt11 cDNA expression library producing fusion proteins with epitopes uniquely reactive with antibodies in protective rat antiserum F-2x, from twice-infected Fischer rats [1]. The strategy applied was termed the 'contrasting antiserum' approach. It utilized a sequential screening of clones producing fusion products reactive with protective F-2x, but not reactive with nonprotective antiserum from twice-infected Wistar-Furth rats (W-2x). The latter antiserum contained comparable titers of anti-schistosome antibodies, but did not significantly protect rats in a passive immunization assay [4].

The cDNA insert of the clone producing the most highly antigenic, F-2x unique fusion protein (FP), was sequenced [1]. The clone, termed clone λgt11-40, has an insert of 129 bp, plus 12 bp for two 6-base flanking EcoRI sequences, totalling 141 bp. Rabbit antiserum prepared against gel-purified fusion protein from clone λgt11-40 is cross-reactive with worm protein of about 20 kDa, obtained from soluble extracts of metabolically [35S]methionine-radiolabeled 4-week-old mouse worms. This antigen species is among the subset uniquely recognized by F-2x. Further characterization of the anti-FP40 antigens by 2-dimensional (2D) gel electrophoresis [5] has revealed that the uniquely recognized metabolically radiolabeled antigen is a family of multiple protein species, ranging from 18-22 kDa, with pIs from 5.3-5.8.

Knight et al. [2] independently isolated an identical 141-bp clone. Antibodies to their fusion protein identified a glycoprotein antigen present in the tegumental membrane of surface radio-iodinated adult worms, but not in 3-h schistosomula. The Mr was reported to be 25 kDa, while the product of an in vitro translation system was 22 kDa. Based on nucleotide sequencing data presented here and previously [1-3], it is concluded that the integral membrane protein described by the Mill Hill group [2,3] is the same as that characterized in our laboratory [1,5]. Furthermore, limited biochemical data also suggest that this antigen may be the same as that independently described by 2 other groups [6,7]. In 2D gels, the 25-kDa worm membrane antigen has a pI of 4. Immunization with adult membranes induced resistance in mice, most reproducibly when saponin was used as adjuvant [8]. Resistance correlated with antibody to the 38-, 25-, and 20-kDa species. Thus, both the Mill Hill group and our laboratory [4] have demonstrated correlations between resistance and a subset of anti-schistosome antibodies.

In this paper we present the full-length sequence of GP22, the gene encoding the 22-kDa surface glycoprotein antigen. This gene was identified by screening a genomic library in λEMBL-3 [9] with a synthetic oligonucleotide probe based on the clone λ40 cDNA sequence. Primer extension studies, using adult or 4-week-old worm RNA, confirm the utilization of 2 promoters, with at least 3 transcription products. The possibility of stage-specific regulation of this gene, to account for the phenomenon of concomitant immunity observed in rats [10], is discussed.

Materials and Methods

Screening of the EMBL-3 genomic library. The genomic library of S. mansoni adult worm DNA in the replacement vector bacteriophage λEMBL-3 was a gift from P.T. LoVerde, constructed from a Sau3A partial digest as previously described [9]. Bacteriophage from the library were plated for screening on the bacterial host strain P2392 without prior amplification, and plaque lifts prepared, as described by Ausubel et al. [11]. Disks were blocked by incubation for 3 h at 42°C in prehybridization solution (0.45 M NaCl/0.09 M Tris-HCl, pH 8/3 mM EDTA/10× Denhardt's solution/0.2% SDS/50 mg ml⁻¹ salmon sperm DNA). Hybridization reactions were carried out for 16 h at 42°C in the same solution to which dextran sulfate (final concentration 10%) and labeled probe (final concentration 5 ng ml⁻¹) were added. After reaction, the disks were washed as described by Ausubel et al. [11], and analyzed by autoradiography. The screening probe was an end-labeled oligonucleotide 58-mer derived from the λgt11-40 cDNA sequence [1], complementary to bases 486-543, as shown in Fig. 3. End-labeling of oligodeoxynucleotides was performed with T4 polynucleotide kinase as described by Ausubel et al. [11] using [γ-32P]ATP (>5000 Ci mmol⁻¹, 140 mCi ml⁻¹, NEN). Prior to use, unincorporated ATP was removed by passage of the reaction mixture over a Sephadex G-25 (Select D) mini-spin column (5 Prime-3 Prime, Inc., West Chester, PA). Probes of specific activities 3-4 × 10⁸ cpm mg⁻¹ were routinely obtained.
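The relationship between the 58-mer probe and the sense strand can be sketched in a few lines of Python. This is only an illustration: the sense-strand bases 486-543 below were read from Fig. 3 of this transcript and should be checked against GenBank M34357, and the names `SENSE_486_543` and `reverse_complement` are hypothetical.

```python
# Sense-strand bases 486-543 as read from Fig. 3 of this transcript.
# Assumption: the transcription of the figure is correct; verify against
# GenBank M34357 before using the string for anything real.
SENSE_486_543 = (
    "GGCCAAGAACTTCATCAATTACAACTTATA"  # bases 486-515
    "TTAGATGAATTAAGTAGAAGAATAAGGG"    # bases 516-543
)

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# A probe complementary to bases 486-543, written 5' to 3'.
probe = reverse_complement(SENSE_486_543)
print(len(probe))  # 58
```

A probe built this way anneals to the sense strand, which matches the description of the 58-mer as corresponding to the negative strand of the cDNA.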

Southern hybridization and DNA sequence analysis. Southern analysis [12] was performed using a total of 10-30 μg of genomic DNA digested with the desired restriction enzymes, electrophoresed and transferred to Gene Screen Plus membrane (New England Nuclear, Wilmington, DE), as recommended by the manufacturer. The conditions for hybridization of oligodeoxynucleotide probes to genomic DNA blots were as described by Ausubel et al. [11] except for the temperature of hybridization (TH), which was determined empirically to be 50°C. Final washes were carried out for 1 h at 50°C in each of the following solutions: (a) 3× SSC, 10 mM phosphate buffer, pH 7.0/10× Denhardt's solution/5% SDS; (b) 1× SSC/1% SDS. Nucleotide sequencing was performed by the M13 dideoxy chain termination method [13] using [35S]dATP, Sequenase, and M13mp18 and M13mp19 vectors (United States Biochemical Corp., Cleveland, OH). Oligonucleotides used as sequencing primers were synthesized on an Applied Biosystems model 380B synthesizer, and purified prior to use by passage through an oligo purification cartridge. Analysis of nucleotide and deduced amino acid sequences was carried out with IBI MacVector sequence analysis software (International Biotechnologies, Inc.).

RNA preparation and primer extension. Total schistosomal RNA was extracted from 4-week stage or adult worms by the guanidinium thiocyanate/hot phenol method [14]. Worms were obtained from a Puerto Rican strain of S. mansoni in Biomphalaria glabrata snails and female outbred albino mice [15]. RNA pellets from the final ethanol precipitation of purification were lyophilized and resuspended in water. RNA concentrations were estimated by measuring absorbance at 260 nm (A260) and dividing by the extinction coefficient (ε = 24 ml mg⁻¹ RNA). From approx. 10 000 juvenile worms recovered from 80 mice infected 4 weeks previously with about 600 cercariae each, 2-4 mg of RNA was recovered [16]. Primer extensions were performed as described by MacKnight and Kingsbury [17] using total RNA extracted from 4-week or adult worms. At the end of extension reactions, the products were ethanol-precipitated at -20°C for at least 2 h. Precipitated DNA was pelleted, washed in 70% ethanol, resuspended in sequencing stop solution (95% formamide/20 mM EDTA/0.05% bromophenol blue/0.05% xylene cyanol) and boiled 5 min prior to loading onto an 8% polyacrylamide sequencing gel. To label the oligonucleotide primer, 100 ng primer was incubated in 20 μl of reaction mix [11] containing 200 μCi of [γ-32P]ATP (5000 Ci mmol⁻¹) with 7 units of T4 polynucleotide kinase for 30 min at 37°C. The unincorporated label was removed using a minispin column and 0.5 ng of the labeled primer was used for hybridization to 10-30 μg of total RNA in the primer extension reaction.
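The A260 calculation described above amounts to a single division. A minimal sketch follows, assuming a 1-cm light path and allowing for a dilution factor (the dilution factor and the function name `rna_conc_mg_per_ml` are illustrative additions, not details from the text).

```python
# Extinction coefficient quoted in the text, in ml per mg RNA.
# Assumption: a 1-cm light path, so A260 / epsilon gives mg per ml directly.
EXTINCTION_ML_PER_MG = 24.0


def rna_conc_mg_per_ml(a260: float, dilution_factor: float = 1.0,
                       epsilon: float = EXTINCTION_ML_PER_MG) -> float:
    """Estimate RNA concentration (mg/ml) of the undiluted sample from A260."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("A260 must be >= 0 and the dilution factor > 0")
    return a260 * dilution_factor / epsilon


# Hypothetical reading: A260 = 0.48 measured on a 1:100 dilution.
print(round(rna_conc_mg_per_ml(0.48, dilution_factor=100), 3))  # 2.0
```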

Antigenic index. The antigenic index was determined for the sequence by the method of Jameson and Wolf [18] using the MacVector sequence analysis computer program developed by International Biotechnologies Inc. (IBI). The method combines information from hydrophilicity, surface probability, backbone flexibility, along with secondary structures in order to produce a composite prediction of the surface contour of a protein.

Results

Isolation of genomic clones hybridizing with λgt11-40 cDNA sequences. A genomic library from S. mansoni adult worms, constructed in the replacement vector λEMBL-3 [9], was screened with a 58-mer oligodeoxynucleotide probe derived from the previously described cDNA clone λgt11-40 [1], with the objective of isolating a full-length genomic clone. The probe corresponds to the negative strand of clone λgt11-40 mRNA, and hybridizes with a single, 5.5-kb band of HindIII-digested adult worm genomic DNA (data not shown). The λEMBL-3 library was prepared by cloning Sau3A digested worm DNA into the BamHI site of the vector, which is externally flanked by very nearby SalI sites (<0.01 kb). Approximately 2 × 10⁴ bacteriophage were screened with this probe, without prior library amplification, resulting in the identification of 3 hybridizing plaques. DNA was prepared from each phage clone following plaque purification, digested with SalI and HindIII, and the products analyzed by agarose gel electrophoresis and Southern blotting, using the 58-mer oligonucleotide as a hybridization probe. One of the clones, λEM1.1, possessed a 5.5-kb HindIII fragment which strongly hybridized with the 58-mer probe (Fig. 1A, lane b). The insert size of this clone was approximately 16.5 kb, consisting of 3 SalI fragments of 7.5, 5.5, and 3.5 kb, the latter hybridizing with the clone λgt11-40 probe (Fig. 1A, lane a). Double digestions with SalI/EcoRI or HindIII/EcoRI yielded single hybridizing bands of 140 bp (Fig. 1A, lanes c and d), indicating that the cDNA EcoRI fragment upon which the 58-mer oligonucleotide is based is an internal sequence of both the 3.5-kb SalI and the 5.5-kb HindIII fragments. The remaining 2 hybridizing plaques were not further analyzed.

Fig. 1. Southern hybridization analysis of λEM1.1 and S. mansoni genomic DNA. (A) DNA prepared from clone λEM1.1 was restriction digested, and following electrophoresis on a 1% agarose gel and transfer to nitrocellulose filters, the DNA was hybridized with a radiolabeled 58-mer oligonucleotide probe corresponding to λgt11-40 insert sequences. Lanes are: (a) SalI, (b) HindIII, (c) SalI/EcoRI, or (d) HindIII/EcoRI. (B) Genomic DNA from S. mansoni (10 μg/lane) was subjected to restriction digestion, Southern blotting and hybridization, as above, using a radiolabeled 1.5-kb AccI restriction fragment of λEM1.1 as a probe. Lanes are: (a) SalI, (b) HindIII, (c) EcoRI.

The region of λEM1.1 containing the 140-bp EcoRI region was localized to a larger, 1500-bp AccI fragment, which was then used as a probe in Southern blots with adult worm genomic DNA (Fig. 1B). This yielded a single hybridizing SalI band of 3.5 kb (Fig. 1B, lane a), a single hybridizing HindIII band of 5.5 kb (Fig. 1B, lane b), and hybridizing EcoRI bands of 4.9 and 2.1 kb (Fig. 1B, lane c). The 140-bp EcoRI fragment was too small to be detected in the blot. The 2.1-kb fragment corresponded to a similarly sized fragment seen in EcoRI digestions of λEM1.1 (data not shown). The 4.9-kb fragment was not among the EcoRI products of λEM1.1 and presumably corresponds to a region of DNA extending beyond the clone. On the basis of these results, the 3.5-kb SalI and 5.5-kb HindIII fragments of λEM1.1 were subcloned into the SalI and HindIII sites (respectively) of the plasmid vector pUC18 for further analysis.

DNA from one of the pUC18 subclones, pCS3.5L, containing the 3.5-kb SalI fragment, was used for further restriction mapping and for subcloning into M13 vectors for sequencing. For restriction analysis, pCS3.5L DNA was digested singly and in combinations with EcoRI, SalI, HindIII, AccI, BamHI and PstI, and the restriction fragments analyzed by agarose gel electrophoresis and Southern blotting using the 58-mer probe. The resulting restriction map of the 3.5-kb SalI fragment is depicted in Fig. 2. As can be seen from the map, the EcoRI fragment representing the previously characterized λgt11-40 cDNA clone is positioned centrally within the SalI fragment, at a distance of 2.3 and 1.1 kb from either end. Subclones of the 5.5-kb HindIII fragment were not analyzed.

Fig. 2. Restriction map of the 3.5-kb SalI fragment and overall strategy for sequencing the S. mansoni GP22 gene region. A restriction map is shown above, and the sequencing strategy is depicted below. The 1371-bp region sequenced is indicated by the striped box. Sites are: (B) BamHI, (S) SalI, (H) HindIII, (A) AccI, (E) EcoRI, (P) PstI. NarI (N) and BglII (G) are unique restriction sites identified by sequence analysis. The large arrow indicates the open reading frame, and direction of translation. The location of internal priming sites (open boxes) and extent of sequencing (small arrows) is indicated. The location of primers used for 5' end mapping is indicated by solid boxes.

-460 -440 -420 -400
GAGACTCTAGGTTACATGGCTCAGAATCAATGACAATGGCGTAGGTGTATACACTCTTTATCTTCCCTTAAACCTTG
-380 -360 -340 -320
AGATTAAAATTGCCTCATAACTTTTTTCCTTCCTGTACTATATCGTTATATACAACCTATCTTTTATATACTACCACAAC
-300 -280 -260 -240
TAAATTACCTATCTCTATGAATTGCGTGTTCATCTTGTTGTGCTAACGGGGTATGGCAACTTGGACCGATGCATATGTGT
-220 -200 -180 -160
GCCTGGTCCTACGTTGTAGCTGACGACGACGACTTTGTGGATCAGATCTTTAGGTCAAAGGCTCCGGATGTGGCCCCCTA
-140 -120 -100 -80
AGTAAACCATCTGCTTCAGTTTGAGCACCCGGACAGTCACACGGCCCTCATACAAATCAAATCAGATTTGTGTGACGCAT
-60 -40 -20 1
ATCTGTATCTGGCGCCCCTTTGTATCAATATTTATGTGTTTTTTTAAACTTATGACAAGAAAGTAAAAACAAAAAAATTT
▲t1
20 40 60 80
ATCACATTTATAAGAAATATTCATGATGTTACTGAGCCAAAAAAAAGAAAAAAAAGAGAAAAAAGAAAAAAAAGAAAAAC
▲t2

  96  ATG TTT TCC TTT CAT GAA CGA ATG AAA AAA AAA CAT GTA ACC ATT AGG GTA TAT AAA CAT
      Met Phe Ser Phe His Glu Arg Met Lys Lys Lys His Val Thr Ile Arg Val Tyr Lys His
      ▲t3     ▲t4
 156  GAA CGG ATA TTA GTG TTC CTT TTT GTT TTA TTC ATT TCA ACT ACT GAT TTT TCT ACT GAA
      Glu Arg Ile Leu Val Phe Leu Phe Val Leu Phe Ile Ser Thr Thr Asp Phe Ser Thr Glu
                              ▲t5     ▲t6
 216  ATC ATA ATG ATT CAA ACA ATG AAT TGG ATC GTT TGG AAA CTA TTC ATT ATT TAT ATT AGT
      Ile Ile Met Ile Gln Thr Met Asn Trp Ile Val Trp Lys Leu Phe Ile Ile Tyr Ile Ser
 276  TTA GAT TTA TTC TCT TTA AAA TTG GTA AAT TCT GAA GAG AAC AGC AAC AGT ATC ATT ACC
      Leu Asp Leu Phe Ser Leu Lys Leu Val Asn Ser Glu Glu Asn Ser Asn Ser Ile Ile Thr
 336  GAT GAA GAT TAT GAT CAT TAT AAT AGC TCT CTT GAT TCA TCT AAT AAT GTC AAG CAT TCC
      Asp Glu Asp Tyr Asp His Tyr Asn Ser Ser Leu Asp Ser Ser Asn Asn Val Lys His Ser
 396  CAA GAA GCA TTC CAT AGA AAC TCG GAT CCT GAT GGA TTT CCG GAA TAC GAA TTC TTG AAT
      Gln Glu Ala Phe His Arg Asn Ser Asp Pro Asp Gly Phe Pro Glu Tyr Glu Phe Leu Asn
 456  GAA ACA TCT ATT GAA ATT AAA GAA GAA TTA GGC CAA GAA CTT CAT CAA TTA CAA CTT ATA
      Glu Thr Ser Ile Glu Ile Lys Glu Glu Leu Gly Gln Glu Leu His Gln Leu Gln Leu Ile
 516  TTA GAT GAA TTA AGT AGA AGA ATA AGG GCA ACT CCA AAT TCA GCA AAT AAA TAT ATG AAA
      Leu Asp Glu Leu Ser Arg Arg Ile Arg Ala Thr Pro Asn Ser Ala Asn Lys Tyr Met Lys
 576  AAT GAA TTC TTA ATG AGT AGT TGT ATT GTG ATT ACA TTG AAT CTA TTC ATA TTT ATG TAT
      Asn Glu Phe Leu Met Ser Ser Cys Ile Val Ile Thr Leu Asn Leu Phe Ile Phe Met Tyr
 636  AAA TCT TA ATG ATA ACA ATA ACA AAA ACA AAA CTA CTT ATT ATT CTT AAA AGT AGT TTA
      Lys Ser    Met Ile Thr Ile Thr Lys Thr Lys Leu Leu Ile Ile Leu Lys Ser Ser Leu
 695  TAT GTT TTA ATA AAA TTT ATC AAA AAG GTT GTT ATT ATA ATG TAT CAT ATA CTT TTC CTG
      Tyr Val Leu Ile Lys Phe Ile Lys Lys Val Val Ile Ile Met Tyr His Ile Leu Phe Leu
 755  TTA TTA AAA 2~A~TA~AGTGATTGTAATTTCAATAAATTCATTACTTTTTATTATTATACTACTTAAATATAATAA
      Leu Leu Lys

840 860 880 900
TGAGTAGTAACAAAATGTTAATTTCTAATCTTTTGAAGCGAGTTCTTGGATACGCATTGCTGAGAAGTCCCACAATAAG

Fig. 3. Nucleotide sequence and predicted amino acid sequences of the S. mansoni GP22 gene. Consensus sequences in the 5' and 3' regions referred to in the Results and Discussion, and the EcoRI sites of the λgt11-40 cDNA clone are underlined. Putative N-linked glycosylation sites and the putative thio-esterification and leucine zipper site are also underlined. Transcription start sites are labeled and marked underneath the sequence by solid triangles.

Nucleotide sequence analysis of the recombinant clone. The overall strategy for sequencing the pertinent region of λEM1.1 is shown in Fig. 2. The 3.5-kb SalI fragment from pCS3.5L was cloned into the SalI site of the vector M13mp18, and recombinants containing the SalI fragment in opposite orientations were obtained and used for sequencing. Oligonucleotides corresponding to both strands of the 140-bp EcoRI region of λgt11-40 cDNA were utilized to prime sequencing reactions, as were the synthetic nucleotides pKB1-KB10, derived from the nucleotide sequence data. The resulting 1371 nucleotide sequence is shown in Fig. 3. The λgt11-40 cDNA sequences (141 nucleotides, 47 codons) are located without interruption at position 444-584. The 2 EcoRI sites flanking the cDNA sequence are also present in the genomic DNA. A characteristic feature of the sequence is a high A+T content, amounting to 69.7% for the 1371-bp sequence and 72.2% for the ORF1 sequence (see below).
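The A+T percentages quoted above are simple base-composition ratios. A minimal sketch of the calculation is given here; the function name `at_content` is hypothetical, and the example input is just the EcoRI recognition sequence rather than the GP22 sequence itself.

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases of seq."""
    # Ignore anything that is not A, C, G or T (gaps, N, whitespace).
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)


# Example on the EcoRI recognition site: 4 of 6 bases are A or T.
print(round(at_content("GAATTC"), 1))  # 66.7
```

Applied to the full 1371-nucleotide sequence and to bases 96-641, the same function would be expected to give values close to the 69.7% and 72.2% reported, provided the sequence is taken from the database entry rather than from an OCR transcript.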

Translation of the genomic DNA sequence on both strands yielded only one extensive open reading frame (ORF), at positions 96-641 (ORF1), along with a second, shorter ORF (ORF2), 2 bp downstream of ORF1 (positions 644-763). ORF1 contains the predicted 47-codon cDNA sequences of clone λgt11-40 [1] followed by 3 tandem stop codons (TAA TGA TAA). The amino terminus starts with the fourth in-frame methionine codon upstream of the λgt11-40 cDNA sequences, resulting in a 182 codon ORF. ORF2 has a length of 40 codons. In addition to candidate translational initiation and termination sites, ORF1 contains 2 N-linked glycosylation sequences (Asn-X-Ser/Thr, positions 357 and 453), a candidate thio-esterification site (Cys-Ile-Val; position 597), and a putative leucine zipper motif (positions 450-539), all indicated in Fig. 3, and described further in the Discussion.
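The motif positions listed above can be cross-checked against the deduced protein with a short script. The sketch below assumes that the 182-residue sequence, transcribed here from Fig. 3, is correct (it should be verified against GenBank M34357); the helper names and the use of a regular expression are illustrative choices, not the procedure used in the paper.

```python
import re

ORF1_START = 96  # first base of the ORF1 initiator ATG, as stated in the text

# ORF1 translation as read from Fig. 3 of this transcript (assumption:
# the transcription is correct; check against GenBank M34357).
ORF1_PROTEIN = (
    "MFSFHERMKKKHVTIRVYKH"
    "ERILVFLFVLFISTTDFSTE"
    "IIMIQTMNWIVWKLFIIYIS"
    "LDLFSLKLVNSEENSNSIIT"
    "DEDYDHYNSSLDSSNNVKHS"
    "QEAFHRNSDPDGFPEYEFLN"
    "ETSIEIKEELGQELHQLQLI"
    "LDELSRRIRATPNSANKYMK"
    "NEFLMSSCIVITLNLFIFMY"
    "KS"
)


def codon_position(residue_index: int, orf_start: int = ORF1_START) -> int:
    """Map a 0-based residue index to the position of its codon's first base."""
    return orf_start + 3 * residue_index


def motif_positions(protein: str, pattern: str) -> list[int]:
    """Nucleotide positions of every (possibly overlapping) motif match."""
    # The lookahead makes overlapping matches visible to finditer.
    return [codon_position(m.start())
            for m in re.finditer(f"(?={pattern})", protein)]


# Asn-X-Ser/Thr sequons and the Cys-Ile-Val site.
print(motif_positions(ORF1_PROTEIN, "N.[ST]"))  # [357, 453]
print(motif_positions(ORF1_PROTEIN, "CIV"))     # [597]

# Product lengths if translation starts at each of the first four Met codons.
met_products = [len(ORF1_PROTEIN) - m.start()
                for m in re.finditer("M", ORF1_PROTEIN)][:4]
print(met_products)  # [182, 175, 140, 136]
```

The last line reproduces the four product sizes given in the abstract, which suggests that they correspond to the first four in-frame methionine codons of ORF1.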

In the 556 bp of sequence to the 5' side of ORF1 there are 6 candidate promoter 'TATA' boxes, beginning at positions -339, -321, -234, -39, +24 and +31, respectively. These are indicated by underlined regions in Fig. 3. Two G+C-rich regions, representing potential SP1 binding sites [19], are located 16 bp (GGCGCCCC) and 65 bp (CGGCCC) upstream of the candidate TATA box at position -39. The entire region contains only poor matches to the consensus CAAT box sequence (GC/TCAAT) [20]. However, a potential CAAT box-like sequence in tandem (CAAATCAAAT) is located in a relevant position between the two G+C-rich regions. A sequence (TGTGGATCAG) with high similarity to the enhancer core (C/EBP) sequence [21] is located at position -190. An additional candidate TATA box within the ORF1 (position 145-152) may be relevant, based on primer extension experiments described below.

In the 268 bp of sequence to the 3' side of ORF1 there are two candidate AATAAA polyadenylation signals, at position 703-708 and position 786-791. Thirteen nucleotides downstream from the first AATAAA element is a sequence (position 721-732) in good agreement with a G+T-rich element [22] thought to be involved in 3' end formation of some eukaryotic PolII genes. Two bp downstream from the second AATAAA sequence is a potential cleavage site (CATT), followed 2 bp later by a T-rich sequence (TTTTTATTA) thought to also play a role in 3' processing of some PolII genes [23]. The 268-bp region is also very A+T rich, containing an exact copy of an element (TTATTATT) and variants thought to have a potential role in destabilizing mRNA or precluding translation [24].

Fig. 4. Primer extension analysis of the 5' ends of GP22 RNA with oligonucleotides pKB1 and pE1. (A) Primer extension with oligonucleotide pKB1. Radiolabeled 20-mer oligonucleotide pKB1 was annealed to either 30 μg of total RNA, or 5 μg of poly(A) RNA prepared from 4-week worms. The same probe was also annealed to 30 μg of total RNA prepared from 8-week worms. Reactions were performed as described in Materials and Methods and the resulting products were analyzed on a 6% denaturing polyacrylamide gel. Unlabeled pKB1 was also used in sequencing reactions using as a template M13 clone carrying GP22. Lanes are: (X1) pKB1 extension products (4 μl) from total 4-week worm RNA; (X2) pKB1 extension products (1 μl) of same reaction as in lane X1; (X3) pKB1 extension products (4 μl) from 4-week poly(A) RNA; (X4) pKB1 extension products (1 μl) from 8-week total RNA. Lanes (G, A, T, C) are the sequencing products of M13-GP22 using pKB1 as a primer. (B) pKB1 extension products (4 μl) from 4-week poly(A) RNA (like panel A, lane X3) electrophoresed for a longer period of time. (C) Primer extension with oligonucleotide pE1. Radiolabeled 34-mer oligonucleotide pE1 was used in primer extension and sequencing reactions as described above for pKB1. Lanes are: (X) pE1 extension products (4 μl) from 4-week total RNA; (G, A, T, C) the sequencing products of M13-GP22 using pE1 as a primer.

Mapping of transcription start sites. A series of primer extension studies were undertaken to define the transcription start site(s) of the GP22 gene. Four synthetic oligonucleotides, pKB1, pE1, pE6, and pKB5, corresponding to positions 333-352, 241-274, 51-74, and 33-58, respectively (Fig. 2B), were used in primer extension reactions. Total RNA from 8-week adult worms and from 4-week juvenile worms, and poly(A)+ RNA from 4-week worms, were used as templates. Sequencing reactions were performed alongside the primer extension reactions, using the same primer and an M13 vector carrying a cloned SalI genomic fragment. With the exception of primer extension products terminating at t1 and t2 (see below), there is no ambiguity in mapping the positions of the 5' ends, since the extension products contain no potential intervening sequences which would leave the λgt11-40 cDNA reading frame intact (see Discussion).

Fig. 5. Primer extension analysis of the 5' ends of GP22 RNA with oligonucleotides pKB5 and pE6. (A) Primer extension with oligonucleotide pKB5. Radiolabeled 26-mer oligonucleotide pKB5 was annealed to 10 μg of total RNA, prepared from 7-week adult worms. Extension and sequencing reactions were performed as described in Fig. 4. Lanes are: (X) pKB5 extension products (1 μl); (C, T, A, G) the sequencing products of M13-GP22 using pKB5 as a primer. (B) Primer extension with oligonucleotide pE6. Primer extension and sequencing reactions were performed as above. Lanes are: (X1) pE6 extension products (1 μl); (X2) 3 μl of the same reaction mixture as in lane X1. Lanes (C, T, A, G) are the sequencing products of M13-GP22 using pE6 as a primer.

A total of 6 transcription start sites were identified in RNA from the adult or juvenile worms (Figs. 4 and 5). With 8-week worm RNA, two strong bands were seen at positions 96 (t3) and 174 (t5), in addition to a weak band just below each of these positions, at 102 (t4) and 180 (t6) (Fig. 4A, lane X4). Two longer products at t1 and t2 were also produced (Fig. 4A, lane X4); their exact positions were ascertained in other reactions (see below). Total and poly(A) RNA from the 4-week juvenile stage showed a predominant band corresponding to t5, accompanied by a lower band at t6, and weak bands at positions 96 (t3) and 102 (t4) (Fig. 4A, lanes X1 and X3). The positions of t3-t6 were more accurately determined in the reactions shown in Fig. 4B, lane X. To confirm the results obtained with pKB1, an independent 34-mer oligonucleotide (pE1), corresponding to sequences 51 nucleotides upstream of pKB1, was used in primer extension reactions using 4-week juvenile stage total RNA as a template. Like pKB1, this primer produced a predominant band at the t5 position (Fig. 4C, lane X). However, pE1 primed more poorly in these reactions, yielding lower amounts of extension products. Minor bands at t3, t4 and t6 were not readily seen.
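The arithmetic linking primer coordinates, start sites and extension-product sizes can be sketched as follows. The primer and start-site coordinates are those quoted in this section; the calculation assumes that each primer is antisense, so that its 3' end pairs with the lower coordinate, and that no splicing occurs between the start site and the primer. The function names are hypothetical.

```python
# Coordinates (first, last) of each primer on the sense strand, from the text.
PRIMERS = {"pKB1": (333, 352), "pE1": (241, 274),
           "pE6": (51, 74), "pKB5": (33, 58)}

# Transcription start sites, from the text.
START_SITES = {"t1": 1, "t2": 24, "t3": 96, "t4": 102, "t5": 174, "t6": 180}


def nucleotides_added(primer: str, site: str) -> int:
    """Nucleotides added to the primer's 3' end to reach the start site."""
    first, _last = PRIMERS[primer]
    tss = START_SITES[site]
    if tss > first:
        raise ValueError("start site lies downstream of the primer")
    return first - tss


def product_length(primer: str, site: str) -> int:
    """Total length of the extension product, primer included."""
    first, last = PRIMERS[primer]
    return nucleotides_added(primer, site) + (last - first + 1)


print(nucleotides_added("pE6", "t2"))  # 27
print(product_length("pKB1", "t5"))    # 179
```

Under these assumptions the pE6 product ending at t2 gains 27 nucleotides, which agrees with the 27-nucleotide extension reported for that primer, and the primer lengths implied by the coordinates (20, 34 and 26 nucleotides for pKB1, pE1 and pKB5) match the figure legends.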

Fig. 6. Hydrophilicity plot (top) and antigenic index (bottom) of the 182-amino acid GP22 gene product, plotted against amino acid position. Regions I, II, III, and IV are described in the text.

To more accurately map the positions of the 5'-most ends of GP22 RNAs (t1 and t2), primer extension reactions were performed with additional primers corresponding to regions further upstream of the location of pKB1. Synthetic oligonucleotides within the region from positions 100-200 were unsuccessful in priming reactions from RNA or from single-stranded DNA templates, probably due to high local secondary structure (data not shown). However, 2 primers, pKB5 and pE6, generated a major band at t1 (Fig. 5A and B), corresponding to position 1 in the sequence (Fig. 3), provided splicing does not occur within the 31-bp region of RNA corresponding to the pKB5 extension product. This is the case for at least one cloned and sequenced GP22 transcript [3]. However, there is a splice acceptor site 2 nucleotides upstream of pKB5 (position 15-30), in good agreement with the consensus sequence for S. mansoni [25-28] (14/16, WNYDWYWDY4HAGM) derived from the 13 published S. mansoni introns, which may be utilized for alternative splicing. If so, the 5' end of t1 would map further upstream, perhaps in close proximity to another candidate TATA box. Several appropriate splice donor sites for such a hypothetical reaction exist within the 461-bp 5' upstream region. ORF1 (see below) would be unaltered. The primer pE6, located 18 nucleotides downstream of pKB5, also produced a band at position 24 (Fig. 5B), presumably corresponding to t2. This product extended 27 nucleotides from the 3' end of pE6 and just beyond the region where abortive initiation took place.
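The assignment of t2 from the pE6 reaction is simple coordinate arithmetic, sketched below in Python. The sketch assumes that the primer's 3' end pairs with the lowest-numbered position of its annealing site (51 for pE6, which spans positions 51-74) and that each nucleotide added by reverse transcriptase moves the product one position upstream. The function name is illustrative and does not come from the paper.

```python
def five_prime_end(primer_3prime_pos: int, extension_nt: int) -> int:
    """Map the 5' end of a primer extension product onto the sequence.

    primer_3prime_pos: sequence position paired with the primer's 3' end
        (assumed to be the lowest-numbered position of the annealing site).
    extension_nt: nucleotides added to the primer by reverse transcriptase.
    """
    if extension_nt < 0:
        raise ValueError("extension length cannot be negative")
    # Each added nucleotide pairs with the next position upstream.
    return primer_3prime_pos - extension_nt


# pE6 anneals to positions 51-74; its product extended 27 nucleotides.
print(five_prime_end(51, 27))  # 24, the position assigned to t2
```

Under these assumptions the 27-nucleotide extension of pE6 lands on position 24, in agreement with the band position reported for t2.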

Analysis of amino acid sequences. The 182-aa product of ORF1 has a molecular weight (21 665) and pI (5.38) in close agreement with the empirical values established for the family of schistosome anti-FP40 proteins by 2D gel electrophoresis [4]. A hydrophobicity analysis was therefore performed on the 182-residue amino acid sequence, using the Kyte and Doolittle algorithm [29] with a span of 7 (Fig. 6), and the antigenic index was determined for the sequence by the method of Jameson and Wolf [18]. There is a hydrophilic segment of 23 residues at the amino terminus (ignoring 2 minor hydrophobic sections). This terminal segment (called Region I) is followed by an extensive hydrophobic segment (called Region II) of almost 50 residues (ignoring one minor hydrophilic section). Following Region II is another mainly hydrophilic segment of about 90 residues (called Region III), which is interrupted by a short hydrophobic section. Virtually all of the predicted B-cell antigenic epitopes are located at the beginning and end of Region III. Region IV at the carboxyl end is again a hydrophobic segment of 17 residues, followed by the final 2 hydrophilic amino acids. The hydrophobic segment of Region IV is the minimum length for a membrane-spanning domain, leaving the terminal two residues outside the membrane. In support of this transmembrane hypothesis, the Region IV sequence also possesses a candidate site for post-translational modification by acylation of the Cys residue (Cys Ile Val, at position 597-605) [30,31]. Palmityl residues added to such sites in other proteins, by formation of a thioester linkage, serve as lipid anchors. The short hydrophobic section of Region III identifies an amino acid sequence which is a candidate leucine-zipper (Fig. 3): Leu (position 452-454), Ile (position 471-473), Leu (position 495-497), Leu (position 516-518), Ile (position 537-539).
Perfect zippers are composed of helical regions in which L or I residues are separated by 6 intervening amino acids [32]. The span of 5 L and I residues in Region III of this sequence is imperfect, being separated by 7 amino acids in one case. Since the consequences of this distortion have not been evaluated, the properties of this hydrophobic segment are uncertain. The amino acid sequence deduced from this genomic clone is identical to that deduced from the cDNA clone recently published by Ali et al. [3], with one amino acid difference. They have an Ile at position 234-236, which is Met in the genomic clone due to a C versus a G at position 236.
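The hydropathy analysis described above (Kyte and Doolittle, span of 7) amounts to a sliding-window average over per-residue hydropathy values. The following is a minimal sketch, not the authors' software: the function name and the toy peptide are hypothetical, and the GP22 sequence itself is not reproduced here. In this convention positive values are hydrophobic; a hydrophilicity plot such as Fig. 6 may use the opposite sign.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, span: int = 7) -> list[float]:
    """Mean Kyte-Doolittle hydropathy for every window of `span` residues."""
    seq = seq.upper()
    if span < 1 or span > len(seq):
        raise ValueError("span must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + span]) / span
            for i in range(len(seq) - span + 1)]


# Hypothetical toy peptide: a charged stretch followed by an aliphatic one.
toy = "MKDEKRSDLLIVVALLIAVCIV"
profile = hydropathy_profile(toy)
print(len(profile))                 # one value per window: 22 - 7 + 1 = 16
print(profile[0] < 0 < profile[-1]) # hydrophilic start, hydrophobic end
```

A stretch of consecutive windows with high positive means is what marks segments such as Regions II and IV as candidate membrane-embedded sequences.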

Discussion

The 141-bp EcoRI λgt11-40 cDNA fragment of the S. mansoni GP22 gene [1] was used to obtain and analyze the corresponding genomic region of S. mansoni. From this analysis, several interesting features of the GP22 gene, its transcription/translation product(s), and the flanking genomic sequences were uncovered.

3' region. This is the first S. mansoni gene reported with three translation stop codon signals in tandem (TAA TGA TAA). Most other termination sites in published schistosome sequences (15/19) have single stop codons. The remainder have 2 stop codons in tandem [33-36]. Like several other schistosome genes, the 3'-UTR (untranslated region) contains at least 2 consensus polyadenylation signal sequences. Both candidate polyadenylation sites are associated with other sequence elements thought to influence 3' end formation. Upstream and immediately adjacent to the first AATAAA site is the tripartite sequence TAGT-TATGT-TTT, reportedly important in yeast transcription termination [37]. Just downstream from this AATAAA is a good match to a G+T-rich element (GTTGTGGT; ref. 22) that may be critical to poly(A) site formation. The second AATAAA element is associated with a potential CATT cleavage site, and a T-rich element (TTTTTATA, ref. 17), thought to be a second and alternate motif to GTTGTGGT. A third AATAAA element is present within the 3' end of ORF1 at position 561-566, and could serve as a poly(A) site for transcripts of a truncated protein. Also present in the 3' region is the sequence TTATTATT, and related A+T-rich elements. A sequence of this AT content in the 3'-UTR has been shown to destabilize mRNA in some eukaryotes [24]. Thus, one or more of the AATAAA-containing regions, or even one beyond the current range of sequenced DNA, may be used for differential cleavage and 3' end formation of GP22 transcripts. These events and differential mRNA stability may be exploited as a means for the genetic control of GP22 expression. Sequencing of multiple cDNA clones (in progress) may resolve this issue.
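Locating candidate polyadenylation signals and their companion elements, as done above, is a plain motif scan. The sketch below reports 1-based start positions of every occurrence of a motif, including overlapping ones. The function name is illustrative, and the toy sequence is a hypothetical arrangement of the motifs named in the text, not the GP22 3'-UTR.

```python
def find_motif(seq: str, motif: str = "AATAAA") -> list[int]:
    """Return 1-based start positions of all occurrences of `motif`."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)               # convert to 1-based numbering
        start = seq.find(motif, start + 1)   # step by one to allow overlaps
    return hits


# Hypothetical toy 3' region: tripartite element, AATAAA, G+T-rich element,
# then a second AATAAA followed by CATT.
toy_utr = "TAGTTATGTTTTAATAAAGGCGTTGTGGTCCAATAAACATT"
print(find_motif(toy_utr))              # [13, 32]
print(find_motif(toy_utr, "GTTGTGGT"))  # [22]
```

Comparing the hit lists for AATAAA with those for GTTGTGGT or TTTTTATA shows at a glance which signals have a downstream element nearby.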

5' region. There are as many as 6 candidate TATA box-like elements in the 461 bp of known sequence in the GP22 5' region. This is not atypical of eukaryotic promoter regions, and in fact, utilization of at least 2 of these elements for transcription initiation is strongly supported by primer extension data. The 2 most likely candidate promoters are P1 and P2. P1 is designated by the TATA box element at -39, 39 bp upstream from the tentatively assigned 5' end of the major primer extension product t1. P2 is designated by the TATA element beginning at position 145, 21 bp upstream from the assigned 5' end of the major primer extension product t5. Assignment of the 5' ends is based upon the lack of RNA splicing in the primer-extended regions. While no potential intervening sequences exist in the region defining t5 (P2), t1 (P1) could map to a position further upstream, if the candidate splice acceptor sequence at positions 15-30 is utilized. This would place the upstream TATA elements within an appropriate distance from ORF1 transcription and translation start sites.

Recent studies indicate that multiple TATA-binding proteins do exist in eukaryotic organisms, and that TATA elements with different sequences bind these proteins preferentially [38]. Several lines of evidence have also shown that TATA elements of different sequence have distinct functions [39,40]. The elements in the GP22 region fall into 3 general types of TATA elements found in eukaryotic systems. The type 1 sequence (TATATA), at positions -339 and -321, is present in the yeast CYC1 gene [41]. The type 2 (TATTTAT) sequence, at positions -39 and +24, is present in SV40. The type 3 sequence (TATATAAA), at position 145, is present in many eukaryotic promoters [39,40]. In the yeast CYC1 gene, 5 TATA-like elements lie close to or within the mRNA initiation region. Only the 2 most upstream TATA elements are required for normal expression. For these elements to function together they have to be of different types. Only when both upstream elements are inactivated do the 3rd and 4th TATAs become functional.

The candidate promoter P1 has 3 potential consensus promoter recognition elements: a pair of G+C-rich SP1 binding sequences which surround tandem CAAT-like sites and lie just upstream of ATATTTAT. Based on the primer extension data, two translation initiation sites 3' to P1 and in-frame with the λgt11-40 sequence are utilized. The first of these (ATGTTTT), 69 bases downstream from the TATAA, is contained within the best match to the consensus 5'-transcription initiation sequence ATCA(G/T)T(C/T), using insect data [42] and other schistosome genomic sequences ATCA(T/C) [43,44] and ATCTGTT [26]. This is the third ATG from the mapped 5' end of P1 mRNA, but this in-frame ATG has a much more favorable context for translation initiation than the first [45]. Seven codons downstream from this first in-frame ATG is the second ATG. This ATG is 22 bp downstream from the shorter transcript presumed to be derived from P1. Translation from these 2 ATG codons through to the stop codons yields proteins of 182 and 175 amino acids (aa), respectively. The TATATAA element for P2 is further downstream, with a translation initiation site 48 bases from the mapped 5' RNA end. Again, there is a second ATG, located at the fourth codon following the initiation site, and the possibility that it too may be utilized for initiation cannot be ignored. Translation from these ATG codons to the stop signals would yield proteins of 140 and 136 amino acids, respectively.
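The four predicted protein lengths follow directly from the spacing of the in-frame ATG codons. The sketch below works through that arithmetic using only numbers stated in the text; the function and variable names are illustrative. The coordinate cross-check at the end rests on an assumed counting convention (the P2 initiation codon beginning 48 bases past t5 at position 174, and ORF1 beginning at position 96).

```python
def length_from_downstream_atg(upstream_length: int, codons_downstream: int) -> int:
    """Protein length when translation starts at a later in-frame ATG."""
    if not 0 <= codons_downstream < upstream_length:
        raise ValueError("downstream ATG must lie inside the reading frame")
    return upstream_length - codons_downstream


# Values stated in the text.
p1_first = 182                                        # first in-frame ATG after P1
p1_second = length_from_downstream_atg(p1_first, 7)   # second ATG, 7 codons on
p2_first = 140                                        # first ATG after the P2 start
p2_second = length_from_downstream_atg(p2_first, 4)   # second ATG, 4 codons on

# Cross-check against nucleotide coordinates (assumed convention, see above).
p2_atg = 174 + 48
offset_codons, remainder = divmod(p2_atg - 96, 3)

print(p1_first, p1_second, p2_first, p2_second)  # 182 175 140 136
print(offset_codons, remainder)                  # 42 0
```

Under that convention the P2 initiation codon lies 42 codons into ORF1 and in the same frame, which is consistent with 182 - 42 = 140 residues.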

A precedent for multiple translation initiation sites from the same mRNA exists [46]. If this occurs for GP22, then the final 136 amino acids of all protein products would be identical. Such a situation becomes very interesting when the properties of these various amino acid sequences are examined. However, these conclusions need to be confirmed by other methods, including S1 nuclease mapping and isolation and analysis of additional cDNA clones. If the multiplicity of transcripts is confirmed, with their different translational initiation sites, it would account at least in part for the size/pI heterogeneity detected by immunoprecipitation of 4-week worm antigen by anti-FP40 antibodies [5]. Three or more transcripts of possibly differing lengths would also account for the approximately 100-nucleotide spread of the poly(A)+ RNA (700-800 bases) observed in the Northern blotting assay [16].

The presence of 2 (or more) promoters controlling the expression of a single copy gene has been well documented for several systems (reviewed by Schibler and Sierra, ref. 47): human porphobilinogen deaminase, 2 promoters/2 enzyme isoforms [48]; rat IGFII, 4 promoters/1 polypeptide [49] and several genes with 2 promoters/1 polypeptide [50-53]. In most of these cases, there is tissue-specific expression from the alternative promoters. In all instances, each promoter has binding sites for accessory transcription factors (e.g., GC- or CAAT-boxes) preceding each of the TATA-boxes, unlike the S. mansoni GP22 gene. Either the same GC- and CAAT-boxes are used for P1 and P2, or there are other uncharacterized cis-acting elements which precede P2. The relative abundance of primer extension products obtained from transcripts in juvenile versus adult stage RNA suggests that P1 and P2 may be preferentially utilized at different stages of the S. mansoni life-cycle.

In each of the published cases of multiple promoters cited above, the transcripts contained introns, and alternative splicing occurred. There is no evidence that any of the 3 possible intervening sequence elements within GP22 (see below) are utilized in juvenile or adult worms. The products of such transcripts, however, would encode overlapping proteins. The first intron would be contained within the 5' untranslated region of the longest transcripts. The second would remove the first 16 amino acids of the 182-aa product; the third would remove the last 4 amino acids of the 182-aa product, extending the protein by 50 residues.

Coding region. Based on the regions of GP22 RNA mapped by oligonucleotide hybridization (primer extension), and the reading-frame context of the λgt11-40 cDNA sequences, it is unlikely that ORF1 is interrupted by intervening sequences. All minimal splice-donor (GT) and acceptor (AG) sites between positions 142 and 590 can be ruled out by eliminating introns which would span known (hybridizing) regions of GP22 RNA, and those that would introduce stop codons interrupting the downstream λgt11-40 cDNA reading frame. The only remaining potential intron within ORF1 would interrupt the first methionine codon (position 96-98), by joining the AT with the G at position 143, creating a new start codon, and eliminating the first 16 codons of ORF1. The donor and acceptor sequences, however, are poor matches to the general eukaryotic consensus splice donor (4/9; MAGGTRAGT) and acceptor (10/16; Y11NYAGG), or to consensus S. mansoni donor (5/9; MADGTRAGT) or acceptor (12/16; WNYDWYWDY4HAGM) sequences [25-28]. A potential intron also exists at the end of ORF1, which could extend ORF1 into ORF2. A donor splice site at the fourth codon from the end of ORF1 (5/8; WKGTAWGT) could join to an acceptor site within codon 15 of ORF2 (11/15; YYNDYWY5MAGM), resulting in the removal of 56 nucleotides of intervening sequence and an RNA coding sequence of 228 amino acid residues. The product would have a molecular weight of 27,065 and a pI of 5.59. Neither of these splicing reactions took place in the known GP22 mRNA, cloned from adult worms [3]. Whether such introns are spliced in other stages in the life cycle remains to be demonstrated.
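Match scores such as 4/9 or 12/16 count the positions at which a candidate site agrees with a consensus written in IUPAC ambiguity codes. A minimal sketch of that scoring is given below. It assumes that a digit following a symbol (as in Y4 or Y11) denotes that many repeats of the symbol; the function names and the two test sites are hypothetical and are not GP22 sequences.

```python
import re

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand(consensus: str) -> str:
    """Expand run-length shorthand, e.g. 'Y4' becomes 'YYYY'."""
    return re.sub(r"([A-Z])(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  consensus.upper())


def match_score(site: str, consensus: str) -> tuple[int, int]:
    """Return (matching positions, consensus length) for an aligned site."""
    pattern = expand(consensus)
    site = site.upper()
    if len(site) != len(pattern):
        raise ValueError("site and consensus must have the same length")
    matches = sum(base in IUPAC[sym] for base, sym in zip(site, pattern))
    return matches, len(pattern)


print(len(expand("WNYDWYWDY4HAGM")))          # 16
print(match_score("CAGGTAAGT", "MAGGTRAGT"))  # (9, 9), a perfect donor match
print(match_score("TTGGTCAGT", "MAGGTRAGT"))  # (6, 9), hypothetical weak site
```

With the shorthand expanded, the acceptor consensus strings come out at 16 and 15 symbols, matching the denominators quoted in the text.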

At least 4 hypothetical translation products of unspliced transcripts of the GP22 gene have been identified (182, 175, 140, 136 aa), all sharing a common 136-aa carboxy terminus. The 2 N-linked glycosylation sites (Asn X Ser/Thr), the acylation site (Cys A A), where A is any aliphatic amino acid within a candidate transmembrane domain (hydrophobic segment of 17 aa), the 'leucine-zipper' dimerization site, and all putative B and T cell epitopes predicted by computer modeling are present in this terminal 136-aa region. The 182- and 175-aa proteins would contain, in addition, an extensive hydrophobic region separating the common segment of these proteins from a hydrophilic amino terminus. The length of this extensive hydrophobic region may be sufficient to span the pentalaminate tegumental membrane of the parasite, or possibly fold within the membrane to expose the amino terminus. The 140-aa protein product lacks the hydrophilic amino terminus, and the truncated hydrophobic segment remaining is too short to span the membrane.
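The two sequence motifs named above can be located with short pattern scans. The sketch below follows the definitions as written in the text (Asn X Ser/Thr and Cys A A). The set of aliphatic residues (Ala, Val, Leu, Ile) is an assumption, as are the function name and the toy peptide. Note that a widely used refinement of the glycosylation sequon excludes proline at the X position; that rule is not applied here.

```python
import re

# Lookaheads let overlapping sites be reported individually.
N_GLYC = re.compile(r"(?=(N[A-Z][ST]))")   # Asn-X-Ser/Thr, as in the text
ACYL = re.compile(r"(?=(C[AVLI]{2}))")     # Cys-A-A; aliphatic set assumed


def find_sites(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based position, matched tripeptide) for every site."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical toy peptide, not the GP22 sequence.
toy = "MKNGTLLNASDECIVKK"
print(find_sites(toy, N_GLYC))  # [(3, 'NGT'), (8, 'NAS')]
print(find_sites(toy, ACYL))    # [(13, 'CIV')]
```

Run on each of the four predicted products, a scan of this kind would show whether a given site lies within the shared 136-aa carboxy-terminal region.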

Leucine zippers are found to function as regions of protein-protein interactions, e.g., in dimer formation [54], and the hydrophobic section containing the site may serve this function even if the putative zipper motif were to prove invalid. If Region IV is anchored to the membrane [2], then Region III would be on the opposite side from the carboxyl tail of two amino acids. In support of this orientation, there are 2 candidate glycosylation sites within Region III, a motif common to a large number of eukaryote transmembrane proteins.
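The spacing rule for a perfect zipper (L or I residues separated by 6 intervening amino acids) can be checked from residue numbers alone. The sketch below is illustrative only: the function names are not from the paper, and both position lists are hypothetical residue numbers chosen to show a perfect repeat and one with a single 7-residue gap.

```python
def heptad_gaps(positions: list[int]) -> list[int]:
    """Count intervening residues between successive zipper residues."""
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def is_perfect_zipper(positions: list[int], gap: int = 6) -> bool:
    """True when every successive pair is separated by exactly `gap` residues."""
    gaps = heptad_gaps(positions)
    return bool(gaps) and all(g == gap for g in gaps)


perfect = [1, 8, 15, 22, 29]     # hypothetical: every seventh residue
imperfect = [1, 8, 16, 23, 30]   # hypothetical: one gap of 7 residues

print(heptad_gaps(perfect), is_perfect_zipper(perfect))      # [6, 6, 6, 6] True
print(heptad_gaps(imperfect), is_perfect_zipper(imperfect))  # [6, 7, 6, 6] False
```

A single longer gap shifts the following L or I residues out of register on the helix, which is why the text treats the GP22 motif as a candidate rather than a confirmed zipper.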

Regions I and II are of interest with respect to the issues of initiation of transcription and translation. The putative 182- and 175-aa proteins would differ only in the length of the hydrophilic Region I. Both would contain the long Region II hydrophobic segment. Schistosomes produce the protein(s) encoded by GP22 after acquiring their pentalaminate tegument [2,3]. Thus, one could propose that the extensive hydrophobic sequence may be required to span such a structure, which would place Region I on the opposite side of the tegument from Region III. Alternatively, the Region II sequences may fold within one bilayer leaf of the tegument. If there is a single 'bend', then Region I would be exposed on the same side of the membrane as Region III; if there are two bends, then these regions would lie on opposite sides of the lipid bilayer, but not span the pentalaminate structure. Post-translational cleavage of the amino terminus at the threonine residue (positions 200-202) [3] would generate a mature protein having 147 amino acids. The other putative protein encoded by this gene, having 140 amino acid residues, would lack all of Region I and about half of Region II. This would leave a sufficient number of hydrophobic residues to 'bury' the amino terminus into the membrane. Thus, the GP22 products would share certain properties (Regions III and IV) but differ in their amino-terminal sequences.

Confirming these features of the protein products of the GP22 gene, and establishing criteria for their expression may hold important clues to successful development of an anti-schistosome vaccine. The possibility of stage-specific expression of the different members of this protein family is an attractive concept. We have conducted preliminary studies in which surface radio-iodinated 3-h schistosomula were shown to have antigens reactive with F-2x (protective Fischer rat serum) but not with anti-FP40 (El-Sherbeini, unpublished). The Mill Hill group also failed to find surface radio-iodinated antigen in 3-h schistosomula with their related antisera, but report unpublished results on finding antigen on lung stage parasites [55]. We detect a more heterogeneous array of antigens in biosynthetically radiolabeled 4-week worms than detected by the Mill Hill group in surface radio-iodinated 8-week worms, although there may be other explanations unrelated to stage-specific expression (e.g., differential tissue expression, incomplete post-translational processing). Expression of different forms of the antigen may be the mechanism used by the parasite to create specific immunity to reinfection, an immune response to which the adult worms which induced it are insensitive. This is the phenomenon of concomitant immunity [10]. For instance, the differences in N-termini could result in folding of the amino-terminal hydrophobic segment (Region II) within the membrane, permitting the hydrophilic amino termini of the 182- or 175-aa forms to be exposed on the external surface, where they could bind host proteins and shield the hydrophilic Region III sequences (containing predicted epitopes) common to all forms.

Acknowledgements

This work was supported by grants from the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases [T16/181/B2/47/E and B20/181/53] and the US Public Health Service, NIH [AI-21380 and AI-31224], and by an NIAID supply contract [AI-02650]. The technical assistance of Ms. Linda Fitzgibbons is greatly appreciated. We wish to thank Dr. P.T. LoVerde for the genomic λEMBL-3 library of S. mansoni adult worm DNA, Jim Occi for the preparation of the pKB oligonucleotides, and Michael Justice for help in designing and preparing the pE oligonucleotides. MES was the recipient of a fellowship from the Irvington Institute for Medical Research.

References

1 El-Sherbeini, M., Bostian, K.A. and Knopf, P.M. (1990) Schistosoma mansoni: cloning of antigen gene sequences in Escherichia coli. Exp. Parasitol. 70, 72-84.

2 Knight, M., Kelly, C., Rodrigues, V., Yi, X., Wamachi, A., Smithers, S.R. and Simpson, A.J.G. (1989) A cDNA clone encoding part of the major 25 000 dalton surface membrane antigen of adult Schistosoma mansoni. Parasitol. Res. 75, 280-286.

3 Ali, O.P., Jeffs, S.A., Meadows, H.M., Hollyer, T., Owen, C.A., Abath, F.G.C., Allen, R., Hackett, F., Smithers, R.S. and Simpson, A.J.G. (1991) Structure of Sm25, an antigenic integral membrane glycoprotein of adult Schistosoma mansoni. Mol. Biochem. Parasitol. 45, 215-222.

4 Barker, R.J. Jr., Srivastava, B.S., Suri, P., Goldberg, M. and Knopf, P.M. (1985) Immunoprecipitation analysis of radiolabeled protein antigens biosynthesized in vitro by Schistosoma mansoni: identification of antigens uniquely recognized by protective antibodies. J. Immunol. 134, 1192-1201.

5 Mark, H.F.L., El-Sherbeini, M., Goldberg, M., Suri, P.K., Sturley, S.L., Bostian, K.A. and Knopf, P.M. (1991) Schistosoma mansoni: two-dimensional gel electrophoretic analysis of antigens uniquely immunoreactive with protective rat serum. Exp. Parasitol. 72, 294-305.

6 Karcz, S.R., Barnard, B.J. and Podesta, R.B. (1988) Biochemical properties of a 24-kilodalton membrane glycoprotein antigen complex from Schistosoma mansoni. Mol. Biochem. Parasitol. 31, 163-172.

7 Rogers, M.V., Davern, K.M., Smyth, J.A. and Mitchell, G.F. (1988) Immunoblotting analysis of the major integral membrane protein antigens of Schistosoma mansoni. Mol. Biochem. Parasitol. 29, 77-88.

8 Smithers, S.R., Hackett, F., Omer Ali, P. and Simpson, A.J.G. (1989) Protective immunization of mice against Schistosoma mansoni with purified adult worm surface membranes. Parasite Immunol. 11, 301-318.

9 Bobek, L.A., Rekosh, D.M. and LoVerde, P.T. (1987) Isolation and analysis of adult-female-specific genes from three species of human schistosoma parasites. UCLA Symp. Mol. Cell. Biol. New Series 60, 149-158.

10 Smithers, S.R. and Terry, R.J. (1969) Immunity in Schistosomiasis. Ann. NY Acad. Sci. 160, 826-840.

11 Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A. and Struhl, K. (1987) Current Protocols in Molecular Biology. Wiley, New York.

12 Southern, E.M. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98, 503-517.

13 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.

14 Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

15 Knopf, P.M., Nutman, T.B. and Reasoner. (1977) Schistosoma mansoni: resistance to reinfection in the rat. Exp. Parasitol. 41, 74-82.

16 Suri, P.K. (1989) Use of recombinant DNA technology to identify candidate vaccine protein antigens of Schistosoma mansoni. Ph.D. Thesis, Brown University.

17 McKnight, S.L. and Kingsbury, R. (1982) Transcriptional control signals of a eukaryotic protein-coding gene. Science 217, 316-324.

18 Jameson, B.A. and Wolf, H. (1988) The antigenic index: a novel algorithm for predicting antigenic determinants. Comp. Appl. Biosci. 4, 181-186.

19 Dynan, W.S. and Tjian, R. (1983) The promoter-specific transcription factor SP1 binds to upstream sequences in the SV40 early promoter. Cell 35, 79-87.

20 Chodosh, L.A., Baldwin, A.S., Carthew, R.W. and Sharp, P. (1988) Human CCAAT-binding proteins have heterologous subunits. Cell 53, 11-24.

21 Landschulz, W.H., Johnson, P.F., Adashi, E.Y., Graves, B.J. and McKnight, S.L. (1988) Isolation of a recombinant copy of the gene encoding C/EBP. Genes Dev. 2, 786-800.

22 McDevitt, M.A., Hart, R.P., Wong, W.W. and Nevins, J.R. (1986) Sequences capable of restoring poly(A) site function define two distinct downstream elements. EMBO J. 5, 2907.

23 Henikoff, S., Kelly, J.D. and Cohen, E.H. (1983) Transcription terminates in yeast distal to a control sequence. Cell 33, 607.

24 Kruys, V., Marinx, O., Shaw, G., Deschamps, J. and Huez, G. (1989) Translational blockade imposed by cytokine-derived UA-rich sequences. Science 245, 852-855.

25 Simurda, M.C., vanKeulen, H., Rekosh, D.M. and LoVerde, P.T. (1988) Schistosoma mansoni: identification and analysis of an mRNA and a gene encoding superoxide dismutase (Cu/Zn). Exp. Parasitol. 67, 73-84.

26 Davis, R.E., Davis, A.H., Carroll, S.M., Rajkovic, A. and Rottman, F.M. (1988) Tandemly repeated exons encode 81-base repeats in multiple, developmentally regulated Schistosoma mansoni transcripts. Mol. Cell. Biol. 8, 4745-4755.

27 Craig, S.P., Muralidhar, M.G., McKerrow, J.H. and Wang, C.C. (1989) Evidence for a class of very small introns in the gene for hypoxanthine-guanine phosphoribosyltransferase in Schistosoma mansoni. Nucleic Acids Res. 17, 1635-1647.

28 Ram, D., Grossman, Z., Markovics, A., Aviv, A., Ziv, E., Lantner, F. and Schechter, I. (1989) Rapid changes in the expression of a gene encoding a calcium-binding protein in Schistosoma mansoni. Mol. Biochem. Parasitol. 34, 167-176.

29 Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.

30 Magee, T. and Hanley, M. (1988) Sticky fingers and CAAX boxes. Nature 335, 114-115.

31 Grand, R.J.A. (1989) Acylation of viral and eukaryotic proteins. Biochem. J. 258, 625-638.

32 Landschulz, W.H., Johnson, P. and McKnight, S.L. (1988) The Leucine Zipper: A hypothetical structure common to a new class of DNA binding proteins. Science 240, 1759-1764.

33 Goudot-Crozel, V., Caillol, D., Djabali, M. and Dessein, A.J. (1989) The major parasite surface antigen associated with human resistance to schistosomiasis is a 37-kD glyceraldehyde-3P-dehydrogenase. J. Exp. Med. 170, 2065-2080.

34 Klinkert, M-Q, Felleisen, R., Link, G., Ruppel, A. and Beck, E. (1989) Primary structures of Sm31/32 diagnostic proteins of Schistosoma mansoni and their identification as proteases. Mol. Biochem. Parasitol. 33, 113-122.

35 Stein, L.D. and David, J.R. (1986) Cloning of a developmentally regulated tegument antigen of Schistosoma mansoni. Mol. Biochem. Parasitol. 20, 253-264.

36 Xu, H., Miller, S., vanKeulen, H., Wawrzynski, M.R., Rekosh, D.M. and LoVerde, P.T. (1989) Schistosoma mansoni tropomyosin: cDNA characterization, sequence, expression and gene product localization. Exp. Parasitol. 69, 373-392.

37 Zaret, K.S. and Sherman, F. (1982) DNA sequence required for efficient transcription termination in yeast. Cell 28, 563-573.

38 Chen, W. and Struhl, K. (1988) Saturation mutagenesis of a yeast his3 'TATA element': genetic evidence for a specific TATA-binding protein. Proc. Natl. Acad. Sci. USA 85, 2691-2695.

39 Simon, M.C., Fisch, T.M., Benecke, B.J., Nevins, J.R. and Heintz, N. (1988) Definition of multiple, functionally distinct TATA elements, one of which is a target in the hsp70 promoter for E1A regulation. Cell 52, 723-729.

40 Wefald, F.C., Devlin, B.H. and Williams, R.S. (1990) Functional heterogeneity of mammalian TATA-box sequences revealed by interaction with a cell-specific enhancer. Nature 344, 260-262.

41 Wen-Zhuo, L. and Sherman, F. (1991) Two types of TATA elements for the CYC1 gene of the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 11, 666-676.

42 Hultmark, D., Klemenz, R. and Gehring, W.J. (1986) Translational and transcriptional control elements in the untranslated leader of heat-shock gene hsp22. Cell 44, 429.

43 Bobek, L., Rekosh, D.M. and LoVerde, P.T. (1988) Small gene family encoding an eggshell (chorion) protein of the human genomic parasite Schistosoma mansoni. Mol. Cell. Biol. 8, 3008-3016.

44 Scallon, B.J., Bogitsh, B.J. and Carter, C.E. (1989) Characterization of a large gene family in Schistosoma japonicum that encodes an immunogenic miracidial antigen. Mol. Biochem. Parasitol. 33, 105-112.

45 Kozak, M. (1989) The Scanning model for Translation: An Update. J. Cell Biol. 108, 229-241.

46 Blasi, U., Chang, C-Y., Zagotta, M.T., Nam, K. and Young, R. (1990) The lethal λ S gene encodes its own inhibitor. EMBO J. 9, 981-989.

47 Schibler, U. and Sierra, F. (1987) Alternate promoters in developmental gene expression. Annu. Rev. Genet. 21, 237-257.

48 Chretien, S., Dubart, A., Beaupain, D., Raich, N., Grandchamp, B., Rosa, J., Goossens, M. and Romeo, P.-H. (1988) Alternative transcription and splicing of the human porphobilinogen deaminase gene result either in tissue-specific or housekeeping expression. Proc. Natl. Acad. Sci. USA 85, 6-10.

49 Matsuguchi, T., Takahashi, K., Ikejiri, K., Ueno, T., Endo, H. and Yamamoto, M. (1990) Functional analysis of multiple promoters of the rat insulin-like growth factor II gene. Biochim. Biophys. Acta 1048, 165.

50 Perlino, E., Cortese, R. and Ciliberto, G. (1987) The human α1-antitrypsin gene is transcribed from two different promoters in macrophages and hepatocytes. EMBO J. 6, 2767-2771.

51 Benyajati, C., Spoerel, N., Haymerle, H. and Ashburner, M. (1983) The messenger RNA for alcohol dehydrogenase in Drosophila melanogaster differs in its 5' end in different developmental states. Cell 33, 125-133.

52 Shaw, P.H., Sordat, B. and Schibler, U. (1985) The two promoters of the mouse alpha-amylase gene Amy-1 are differentially activated during parotid gland differentiation. Cell 40, 907-912.

53 Leff, S.E. and Rosenfeld, M.G. (1986) Complex transcriptional units: diversity by alternative RNA processing. Annu. Rev. Biochem. 55, 1091-1117.

54 Kim, P., Rutkowski, R. and O'Shea, E.K. (1989) Evidence that the leucine zipper is a coiled coil. Science 243, 538-542.

55 Simpson, A.J.G. (1990) Schistosome surface antigens: developmental expression and immunological function. Parasitol. Today 6, 40-45.