Volume 10 Number 21 1982 Nucleic Acids Research

The ILtat 1.4 surface antigen gene family of Trypanosoma brucei

John E.Donelson*+, John R.Young, David Dorfman+, Phelix A.O.Majiwa* and Richard O.Williams

*International Laboratory for Research on Animal Diseases (I.L.R.A.D.), P.O. Box 30709, Nairobi, Kenya and +Department of Biochemistry, University of Iowa, Iowa City, IA 52242, USA

Received 31 August 1982; Accepted 11 October 1982

ABSTRACT

The cDNA sequence for the variable surface glycoprotein (VSG) expressed in Trypanosoma brucei clone ILtat 1.4 (called clone D for brevity) hybridizes strongly to three regions in trypanosome genomic DNA. These three regions were extensively characterized by Southern hybridization analyses, genomic DNA cloning and DNA sequence determinations. All three regions occur in the genomes of all trypanosome clones of the ILTAR 1 repertoire regardless of whether or not VSG D was being expressed. Extensive (clone dependent) DNA rearrangements and a (clone independent) double strand DNA break were found distal to the 3'-end of the VSG D coding sequence of one of the regions. VSG D mRNA is most likely synthesized from this region, but a recombinant DNA clone of the VSG coding sequence could not be obtained for confirmation. Recombinant clones of the other two regions were obtained. DNA sequence analyses revealed that their coding sequences differ from each other by 17%. They differ from the ILtat 1.4 cDNA sequence by 4% in one case, and 13% in the other. By analogy with another VSG gene system, one of these two regions may have originally given rise to the third region from which the mRNA is probably transcribed.

INTRODUCTION

Haemoparasitic African trypanosomes reside in the bloodstream of

mammalian hosts and evade destruction by the hosts' immune response by

random expression of a single member of a large repertoire of antigenically

distinct surface glycoproteins (variable surface glycoproteins or VSGs).

Most investigations of the molecular phenomenon responsible for the "switch"

from the expression of one VSG to another have emphasized the genomic DNA

rearrangements associated with the switch (for reviews, see 1-4).

Previous work from several laboratories has demonstrated that the VSGs

consist of single polypeptide chains of about 500 amino acids, which can be

classified into one of two groups based on extensive amino acid homologies

within the C-terminal 120 amino acids (5). There are a minimum of 100

different VSG genes in the trypanosome genome (6) and there may be an order of

magnitude more. Southern hybridization analyses have shown that at least

© IRL Press Limited, Oxford, England. 0305-1048/82/1021-6581$2.00/0

some of these genes are closely related (7, 8, 9) and that many genes have

similar, but not identical, 3'-terminal sequences, which is consistent with

the observed amino acid homologies at the C-terminus. Two different genomic DNA recombination events have been observed with different VSG genes. Some

genes are expressed via gene duplication and translocation, i.e. an

"expression-linked extra copy" (ELC) mechanism (8). The extra gene copy

appears to be the one that is transcribed (10), and in one case it is not a

faithful copy of the original basic copy gene (11). Other VSG genes are not

expressed via the appearance of an ELC. In these cases, DNA rearrangements occur distal to the 3'-terminus of the genes (7, 10, 13). These 3'-distal

rearrangements do not necessarily correlate with the expression of the

genes. In many instances they occur regardless of whether or not the gene

is being expressed (13). The relationship of the two recombination

mechanisms, if any, remains to be elucidated, as does the precise relationship of these rearrangements to the expression of particular VSG

genes.

In this communication we report the characterization of the structure of three very closely related regions within the genomic DNA, one of which codes for the VSG expressed in the Trypanosoma brucei clone ILtat 1.4. The

expression of the ILTAT 1.4 VSG described in this paper is not accompanied

by the generation of an ELC. Clone D1, expressing ILtat 1.4, is related to

the other clones used in these studies, which express different VSGs, as

shown below:

ILTAT 1.1 (Clone A) -> ILTAT 1.2 (Clone B1) -> ILTAT 1.3 (Clone C1)
                                            -> ILTAT 1.4 (Clone D1)

For convenience, these trypanosome clones and their four VSGs are called A, B, C and D in this paper instead of the formal nomenclature of ILtat 1.1, 1.2, 1.3 and 1.4 respectively.

EXPERIMENTAL PROCEDURES

Trypanosomes: All trypanosome clones used were derived from a single

clone (Clone A) which originated from Trypanosoma brucei stock 227 (14).

Trypanosome clones are identified by the letter indicating the VSG expressed

followed by a distinguishing number. For example, D1, D4 and D8 are three

individual trypanosome clones, each expressing VSG D (ILtat 1.4) as

determined using monospecific antiserum raised against the VSG expressed in

Clone D1. Clone B1 was derived from a relapse of Clone A in a rabbit and B2

was cloned from a population of B1. All clones expressing VSGs C and D were

cloned from the first relapse populations in normal mice infected with Clone

B1.

Populations of individual trypanosome clones were grown in lethally

irradiated (900 rads) rats or mice and shown by immunofluorescence to be at

least 99% homogeneous for expression of a single VSG (15).

DNA Techniques. Nuclear DNA from trypanosomes was isolated, and

kinetoplast DNA subsequently removed from the preparation as described

(16). Plasmid DNA and bacteriophage DNA were isolated as described (17,

18). Restriction enzyme digestions were performed as suggested by the

commercial suppliers (New England BioLabs and Bethesda Research Labs).

Individual fragments of recombinant phage or plasmid DNAs were isolated

after separation by agarose or acrylamide gel electrophoresis, as described

by Maxam and Gilbert (19). Plasmids and DNA fragments were labeled with 32P by nick translation (20) to specific activities of 1-2 x 10^8 cpm/µg

DNA.

Southern filter hybridization of genomic restriction fragments (21).

Nuclear DNA (1.5 µg) was digested with at least 20-fold excess of

restriction enzyme and DNA fragments were separated by electrophoresis in

0.6 or 0.8% agarose gels in 40 mM Tris-acetate, 2 mM EDTA, pH 8.2. The DNA

was partially depurinated by soaking the gel for 10-15 minutes in 0.2 M HCl,

denatured and cleaved by soaking in 0.5 M NaOH, 1 M NaCl for 30 minutes,

neutralized by soaking for 1 hour in 1 M Tris HCl, 3 M NaCl, pH 7.4, and

transferred from the gel to nitro-cellulose filters (Schleicher and Schuell)

in 20 x SSC. Hybridization was carried out using 10% dextran sulphate, as

described by Wahl et al. (22). After hybridization the filters were washed

twice in 0.1 x SSC, 0.1% SDS for an hour at 65°C, dried and exposed to Kodak

AR-5 film at -80°C with intensifying screens.

Construction and screening of genomic DNA libraries. Libraries of

trypanosome DNA fragments were made from partial EcoRI-digests inserted into

bacteriophage Charon 4A, and from partial Sau 3A digests inserted into

Charon 28, using methods essentially as described (23,24). The partial

EcoRI-libraries of B1-DNA and C1-DNA each contained about 2 x 10

individual phage clones. The partial EcoRI-library of D1-DNA contained over

10^6 individual phage clones. The number of phage in the three partial

EcoRI-libraries was amplified before screening for VSG genes. The partial

Sau3A-library of clone A-DNA contained about 5 x 10^5 phage clones. Phage

containing VSG coding sequences were identified and plaque purified via

three successive screenings using the procedure of Benton and Davis (25).

DNA sequence Determinations. DNA fragments were radiolabeled at

recessed 3'-termini generated by the restriction enzymes Eco RI, Sal I,

Bgl II and Hinf I, using the DNA repair reaction of Escherichia coli DNA

polymerase I (Boehringer Mannheim) and the appropriate [α-32P]dNTP (3000

Ci/mmole from Amersham) as described (26). Protruding 3'-termini generated

by the restriction enzyme Kpn I were labeled with [α-32P] dideoxy ATP (3000

Ci/mmol) and terminal deoxynucleotidyl transferase (P-L Biochemicals) as

described (27). Fragments labeled at one terminus were subjected to the

modification and cleavage reactions of Maxam and Gilbert (19) and

electrophoresed through the thin sequencing gels (80 cm length) of Sanger

and Coulson (28).

RESULTS

Southern filter hybridization analysis. We have previously reported

the construction, identification and isolation of plasmid pcB D1 which

contained a partial cDNA for the VSG D expressed in trypanosome clone D1

(13). The sequence of this cDNA was determined (5) and found to extend from

about midway through the coding region to about 54 codons before the

termination codon, a region of 687 nucleotides of the estimated 1800

nucleotides of the VSG mRNA. Since this rather short cDNA does not contain

much of the sequence coding for that region of the VSG outside of the region

of C-terminal homologies, other recombinant plasmids in the same cDNA

library were screened to find one which contains more of the 5' (N-terminal)

coding sequences. Such a plasmid, designated pcB D2, was found which

contains a 350 base pair overlap with pcB D1 and about 550 additional base

pairs on the 5'-side. The relationship between pcB D1 and pcB D2, based on

restriction enzyme analysis, is shown in Figure 1. The sequence of pcB D2

has not yet been determined.

Mixtures of the two plasmids were nick-translated (20) with 32P and

used as probes in a series of Southern filter hybridizations. In addition,

the two plasmids were digested individually with Bgl II which cuts once

within the cDNA (and does not cut in pBR322) and a restriction enzyme which

cuts once in only pBR322 so that a 5'-coding sequence probe could be

isolated from pcB D2 and a 3'-coding sequence probe could be isolated from

Figure 1. Abbreviated restriction maps of the two cDNA inserts of VSG D aligned to show their region of overlap. Numbers are in base pairs. P, H, S, K, B and A refer to cleavage sites of Pst I, Hinc II, Sal I, Kpn I, Bgl II and Ava I respectively. The symbol * shows the location of an Eco RI site in two of the corresponding genomic DNA coding regions (figure 3) which is not present in the cDNA (see text). The 5'- and 3'-fragments were isolated for use as probes in the filter hybridizations shown in Figure 2. The small boxes at the ends of the cDNAs indicate the poly dG: poly dC boundaries.

pcB D1 (see bottom of Fig. 1). These 5'- and 3'-terminal probes were also

made radioactive by nick-translation in vitro.

Figure 2 shows the results of two such Southern hybridization

experiments using the probes for the different parts of the coding

sequences. Figure 2a shows the result when the trypanosome genome is

digested with Bgl II which (as indicated above) cuts the cDNA inserts once.

When the cDNA mixture (containing about 80% of the coding sequence) is used

as the probe (the first six lanes), it hybridizes to six genomic Bgl II

fragments ranging in size from 20 kb to 1.3 kb. Since the cDNA contains a

Bgl II site, this suggests that three different genomic regions contain

sequences complementary to the cDNA. The sizes of five of the fragments are

independent of the trypanosome clone from which the genomic DNA was isolated

(B2, C1, D1, etc.). The remaining fragment varies with each genomic DNA

tested. For example, trypanosome clones B2 and B1 are both expressing VSG B

but the variable fragment to which the cDNA hybridizes is different in the

two clones. Likewise trypanosome clones D1, D4 and D8 are all expressing

VSG D (as determined immunologically), yet the variable band is not the same

size in all these clones. When just the 5'-terminal sequence is used as a

probe (the lane labeled 5'), it hybridizes to three of the constant

fragments. The 3'-terminal sequence probe (the lane labeled 3') hybridizes

to the other two constant fragments and the variable one. This suggests

that the variability occurs on the 3'-side of one of the three genomic

regions to which the cDNA hybridizes while no changes occur within the

immediate vicinity of the other two regions.
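The counting argument in this paragraph can be written out explicitly. The following is only an illustrative sketch and not part of the original analysis: the function names are hypothetical, and it assumes that every genomic copy retains the single Bgl II site found in the cDNA and that no two hybridizing fragments co-migrate on the gel.

```python
def expected_fragments(n_regions: int, cuts_in_probe: int) -> int:
    """Number of hybridizing fragments expected from a digest.

    A region cut `cuts_in_probe` times inside the probed sequence is split
    into cuts_in_probe + 1 pieces, each of which can bind the probe.
    """
    return n_regions * (cuts_in_probe + 1)


def infer_regions(n_fragments: int, cuts_in_probe: int) -> int:
    """Invert the relation above: fragments observed -> number of regions."""
    regions, remainder = divmod(n_fragments, cuts_in_probe + 1)
    if remainder:
        # The simple model does not fit, e.g. some copies lack the site.
        raise ValueError("fragment count is not consistent with the model")
    return regions


# Six Bgl II fragments and one Bgl II site in the cDNA suggest three regions.
print(infer_regions(6, 1))
```

Under the same assumptions, the split of the six bands into three that bind the 5' probe and three that bind the 3' probe is what one would expect if each region contributes one fragment on each side of the Bgl II site.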

Figure 2. Autoradiograms of Southern filter hybridizations of (a) Bgl II-digested nuclear DNAs and (b) Eco RI-digested nuclear DNAs using as probes either a mixture of the two cDNAs (the first six lanes of part a and the first four lanes of part b) or the 5'- and 3'-fragments (the last two lanes respectively of parts a and b). The trypanosome clones from which the nuclear DNAs were isolated are indicated across the top, i.e., B2, C1, D8, etc. The approximate sizes in kb of the fragments to which the cDNA hybridizes are shown at the side.

Figure 2b shows a similar experiment in which the genomic DNAs are

digested with Eco RI. The cDNA sequences do not contain an Eco RI site. In

this case the total cDNA probe (the first four lanes) hybridizes to a

minimum of five different fragments. None of the fragments vary in size

with the different genomes but one of the fragments is about 35 kb in

length, beyond the effective resolution limit of the agarose gel so that

variations of several kb in this fragment would not be detected. The

5'-terminal probe hybridizes to all five EcoRI fragments (the lane labeled 5' in Fig. 2b) while the 3'-terminal probe hybridizes to only three, the

8.5, 10.0 and 35 kb fragments (the lane labeled 3'). It does not hybridize to the 3.1 and 11.5 kb fragments, indicating that they contain only

5'-terminal sequences. Since Eco RI does not cut the cDNA, these results were initially confusing to interpret and appeared to suggest either (i)

more than three hybridizable regions in the genome, (ii) the occurrence of introns or (iii) sequence heterogeneity in the different coding regions.

Subsequent Southern experiments using other restriction enzymes and DNA

sequence determinations revealed that the last interpretation, sequence

heterogeneity, was correct. Two of the three genomic coding regions contain

an Eco RI site. The third region, like the cDNA, does not. This suggests,

but does not prove, that this third region is the one from which the mRNA

arose.
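How sequence heterogeneity alone can change a fragment pattern may be easier to see in a small in silico digest. This is a hedged sketch only: the two toy sequences are invented for illustration and are not taken from the trypanosome genome, the function names are hypothetical, and the molecule is treated as linear. The one fact relied on is that Eco RI recognizes GAATTC and cuts after the G.

```python
ECO_RI_SITE = "GAATTC"  # Eco RI recognition sequence; the cut is G^AATTC


def cut_positions(seq: str, site: str = ECO_RI_SITE, offset: int = 1) -> list:
    """Return 0-based top-strand cut positions for every occurrence of `site`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str = ECO_RI_SITE, offset: int = 1) -> list:
    """Lengths of the pieces produced by a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Two invented copies of a "coding region" that differ at a single base.
copy_without_site = "ATGC" * 5 + "GAGTTC" + "ATGC" * 5
copy_with_site = "ATGC" * 5 + "GAATTC" + "ATGC" * 5

print(fragment_lengths(copy_without_site))  # one uncut piece
print(fragment_lengths(copy_with_site))     # two pieces
```

A probe spanning the whole toy region would detect one band from the first copy and two from the second, which is the kind of pattern that more hybridizing regions or introns could also have produced.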

Southern hybridization experiments allowed the construction of

restriction enzyme maps of the three regions as described elsewhere (30).

These maps are reproduced in figure 3 with the addition of EcoR I sites

determined from analysis of cloned genomic DNA (see below). The large arrow

at the right of region III indicates a site, apparently cut by many

restriction enzymes, which is probably a natural double strand break in the

trypanosome genome, not a specific site for any of the enzymes (30). The 'V'

Figure 3. Summary of the three genomic regions, I, II and III, to which VSG D cDNA hybridizes as determined by Southern filter hybridizations and analysis of trypanosome nuclear DNA fragments cloned in bacteriophage Charon 4A. The thick lines indicate the three genomic regions and the solid rectangles show where the cDNA hybridizes. R, A, B, H, S and K are cleavage sites for Eco RI, Ava I, Bgl II, Hinc II, Sal I and Kpn I respectively. Numbers are units of kb. Thin lines indicate the different trypanosome nuclear DNA fragments found in the partial Eco RI-libraries that were screened. The subscript "ex" means that the DNA originated from a trypanosome clone expressing VSG D while the subscript "non-ex" means the DNA was from a trypanosome clone not expressing VSG D. For example, 30non-ex is from a library of B1 DNA or C1 DNA. The letter V refers to the variable segment of genomic region III as indicated by Southern filter hybridizations and discussed in the text.

indicates that the distance between this site and the gene is variable in

different trypanosome clones. This variation accounts for the changes in

the 3' Bgl II fragment shown in figure 2. The remaining Bgl II fragments

can be identified from the map. They are unaltered in different clones,

whether ILtat 1.4 is expressed or not. Thus the double stranded break, and

the length variation associated with it, affect only region III, probably

the region that is transcribed in clone D1.

Cloned genomic DNA analysis. The genomic regions homologous to VSG

cDNA were further investigated by identifying bacteriophage Charon 4A

clones in partial EcoRI-libraries of trypanosome DNA that contained

sequences homologous with pcB-D1. Libraries of nuclear DNA from trypanosome

B1, C1 and D1 were constructed and initial screenings revealed many plaques

in each library to which pcB-D1 hybridized. Phage was prepared from 20-25

plaques of each of the three libraries and DNA was isolated for restriction

enzyme digestions and hybridization analysis. Figure 3 summarizes the

information obtained from these experiments.

Two distinct genomic regions were found to contain DNA homologous to

the cDNA of pcB-D1. Region I contains adjacent 10.0 kb and 3.1 kb EcoRI

fragments to which pcB-D1 hybridizes and region II has an 8.5 kb fragment to

which pcB-D1 hybridizes. These three fragments co-migrate with the

corresponding EcoRI fragments shown in the genomic filter hybridizations of

Figure 2b. By inspecting DNAs from all three libraries it is clear that, at

the gross level, these two genomic regions remain the same whether or not

VSG-D is being expressed, i.e., the three EcoRI fragments of 175ex from

the "expresser" library are also present in clones of two "non-expresser"

libraries. Likewise the other genomic region is present in libraries of

both D1 "expresser" DNA and C1 or B1 "non-expresser" DNA.

The positions of the DNA segments in the two genomic regions that are

homologous to D1 cDNA were identified (i) by formation of RNA:DNA

heteroduplexes and measurement on electron micrographs of the resultant

R-loops relative to the ends of the DNA and (ii) by restriction enzyme

analysis. The R-loop analysis (not shown) was the less accurate of the two

but established the approximate locations and demonstrated the absence of

introns large enough to be seen by the R-loop methods, i.e. greater than

about 100 base pairs. To facilitate the restriction enzyme analysis, the

8.5, 10.0 and 3.1 kb fragments from both "expresser" and "non-expresser"

DNAs were first subcloned at the EcoRI-site of pBR322. Multiple digestions

of recombinant DNAs containing these subcloned fragments confirmed the

enzyme site locations as indicated in Figure 3. For example, D1 cDNA

contains a Kpn I site (see Figure 3) and the 8.5 kb and 10.0 kb fragments

each have only one Kpn I site - in each case about 300 base pairs from an

Eco RI site. The 3.1 kb fragment does not have a Kpn I site. Again

comparison of the restriction enzyme digestion patterns of the subcloned

fragments from "expresser" DNA and "non-expresser" DNA did not reveal any

differences.

The two genomic regions were also analyzed by DNA:DNA heteroduplex

formation of recombinant DNAs containing each of the two regions. Figure 4

shows one such example. The heteroduplex is between the DNAs of 170ex and

183ex (see Figure 3) and shows one homologous region which on the basis of

measurements of ten such heteroduplex molecules is 1.0 kb. The location of

this homologous region corresponds to the position at which the cDNA

hybridizes to each of the two phage DNAs.

Formation of various other combinations of DNA:DNA heteroduplexes

between the genomic clones did not reveal any additional homologies between

these genomic regions. In addition heteroduplexes were formed between DNAs

containing "expresser" DNA and "non-expresser" DNA of the same region - for

Figure 4. Electron micrograph (A) and corresponding diagram (B) of a heteroduplex between the DNAs of 170ex and 183ex (see Fig. 3) showing the region containing the trypanosome nuclear DNA inserts. The 20 kb arm is on the left and the 12 kb arm is on the right. The single stranded circle on the left side of the micrograph is φX174 DNA used as an internal length standard.

example, between 187ex and 30non-ex and between 183ex and

27non-ex. In these cases no difference was observed between the DNA from

trypanosomes expressing surface antigen D and those which do not (not shown). Again, the resolution limit is about 100 base pairs; a region of

non-homology or only partial homology less than this length would not have

been detected.

A major flaw in the analysis of these genomic clones is that no clone

was obtained which contains the third genomic region to which the cDNA

hybridizes, i.e., genomic region III shown in Figure 3. This is clearly

because the 35 kb Eco RI fragment containing region III is too large to be

cloned in Charon 4A which accepts a maximum insert size of about 22 kb

(23). As a result, the Charon 28-partial Sau 3A library of DNA from

trypanosome clone A was also screened for sequences homologous to D cDNA.

Several such clones were identified and characterized but their DNA inserts

all originated from either genomic region I or II. Due to technical

difficulties encountered in working with individual clones of the Charon 28 library, additional clones of the partial Sau 3A library were not investigated. Furthermore, because of the lack of many restriction sites

within region III and the putative double-stranded break in the DNA

downstream from the coding sequence, it seemed likely that the partial Sau

3A library might not include this region either.

DNA Sequence Analysis. Portions of the coding sequences in the cloned

genomic regions I and II were determined to see how similar they were to

each other and to the cDNA. Recombinant plasmids containing the subcloned 10.0 kb fragment (region I) and the 8.5 kb fragment (region II) were each found to contain a Kpn I and a Bgl II site 300-400 bp from one of the Eco RI

boundaries. It was (correctly) assumed that these sites were equivalent to the same sites in the cDNA which are about 100 bp apart. Sequences in this area of both the 10.0 and 8.5 kb fragments were obtained from fragments labeled at the sites of Eco RI, Kpn I, Bgl II, Sal I and Hinf I (see Fig. 5 legend). These two genomic sequences and the cDNA sequence in pcB D1 are

compared in Figure 5.

The genomic sequences begin within the region corresponding to the variable region coding sequence of the cDNA. Within the 658 nucleotides which the two genomic regions have in common, about 17% of the positions are different (119/658). Likewise the two genomic regions are different from the cDNA. Region I and the cDNA differ at about 4% of positions (27/770) while region II and the cDNA are different at 13% of positions (63/478).
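Pairwise figures of this kind are obtained by counting mismatched positions over the stretch that two aligned sequences share. The sketch below shows one way such a count could be made. It is an assumption-laden illustration rather than the procedure used here: the function name is hypothetical, the sequences are short invented strings, and positions where either sequence has a gap character are simply left out of the comparison.

```python
def percent_difference(seq_a: str, seq_b: str, gap: str = "-"):
    """Compare two pre-aligned sequences of equal length.

    Returns (mismatches, positions_compared, percent_different).
    Columns where either sequence carries `gap` are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no positions in common")
    return mismatches, compared, 100.0 * mismatches / compared


# Invented toy alignment: two mismatches in ten shared positions.
print(percent_difference("ACGTACGTAC", "ACGTTCGTAA"))
```

Because each pair of sequences overlaps over a different length, the denominators differ from one comparison to the next, as they do in the ratios quoted above.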

I TTC CTA GAC BAA O A B6T BBT TGC CT A BCATG6 ¢ CM

'I BBCABCBCABAA GA TT66_____GGT_GGTTGC CTA 6CAG__6_AG

G GCA GGC TGC ABA W GU CCA GTA TAT ACB GBA T ABC AM ACA

1I O";CG GCA GGC TGC AG AA G2CA CCA GTA TAT ACG GCA GS1M ACC SC CTA G8AA m AGC AAA ACA 666 TTC

cON AM GCA GGA GGA GAG BBC BAT GCA AAA BAC BBA6 ACG 6CC AAA T6T ATT TTB TTC AAM BC 6Cr 666 GCA 6CC BC GCT BCA C6TI AMA BCBGGA BG ATABBC GAT BC AMA GAC 6CC I BAl6CC AM TBT ATT TTB TTC AA GCA GCT G6tXA GC G

II AGt AAC 6 CC AAA TGT ATT TTG TTC

cDN4A TTC TAC CAB CAC CM ACC MG BT CAC CTT BGC 666 TAC CTG BAA ATA ACA TCA GGA GCA G6C AGA ACG ACG CTA GAA CT6 AAA AMCI TTC TAC CABCAC CM ACCAM6A CAC CTT GGC GGG TAC CTG GM ATA ACA TCA 66A GCA GGC AA ACG C MAAC

II TTC TAC CAB CAC CM GIT CAC CTT GGC 666 TAC CTB6M ATA A&ITC GGA GCA

cDKA CTC AAC BAC ATC GCA CAM GACBGT BTA CAC AAM ABC 666 CAM CTA TT6 GGA GAS ATC TAC ACA CCG CTU GCA ACA TTA AAC AGT BAA BACI CTCMG&C ATC GCA CAB BAC G6T GTA CAC MAGC CTA UB6A BAG ATC TAC CCG CTT GCA ACA_ AMCMT GA

II C -Cr ATC M ACTBTACAC ABC GGABAGATCTAC CC6CCTCAACA AAACABTG

cONA ACA ACA BAA ATT AM ACC ACA BAT GAA ABC ATA ATA GA6C6CA6U 6CT 6CC ABC ACA TTA GAG 6CC 6CC UTT CAGGS 6CT CTT AMI ACA ACA GM ATT AM ACC ACA BAT BAA ABC ATA A GAMA6C 6Cd GCA BCT 6CC ABC ACA TTA B:JBCC 6CC UTT CAM BAG GCT CTTCA

II hACA ACA CM ATT MA ACC ACA BAT 6A A C AMA AMC GCTGCA GCT BCC AMC ACA TTA BA6CC BCC UTT CAB BAG CT CTT AM

cDNA CTG GCA MT CCC M CTG AM BAA GAM GCA GCA BAC ATC ATC MB BAGm GTG BBA ABC BAA AAT ACA AAI CTG GCA AATC AAGMC64M CTG AAM GAA GAA GCA GCA GAC ATC ATC AAG GAGm GTG GGA AGC GAA MTIAMAl6

II _B

cDKA TCC AYAG 6CT TGG GAAAM CTA AAG TCG ACA AMGTGM6 GGC ACA GAG GCG AM CCC GAA ACA AAMAA GAG CTA AAA AC ATTCI TCC AAG 6LT T66 GAA A* CTA A2AG TCG ACA AAM GTG AAG GGC ACA GAG GCG AAM CCC GAA ACA GAA AAA GAG CTA AAA GAC AT CILII TCCMa TBGGG CTAAMTCG

cONA AAC GCT MA CTG GTG TCG GCA CTA MT TAC TAC ATC AGC AT GCT MAA TCT M6 CTACAA6M GCG GAM ACA AM CTA GCA 6I MC GCT MA CT6 6TG TCG BCA CTA MT TAC TAC ATC ABC AMT GCT BAA TCT M6 CTA CAB BAA 6CC BAG ABA ACTA GCA

cONA GCT WCA GCT AMMA GTG CCA ACA GCG CCT AM CCA GAT GM TGC AM GCT AMAAAG GG 6ACC TGC MAM AT GGA TGC AM TGG GATI BCT BCA BCT MA AM GTG CCA ACA GCG CCT AMA CCA BAT GMAT6C AMA CT MAM 666 BAC ACC T6C MA BAT GBA TBC AM T66 BAT

cDKA ABC GAC GGT BAA AMAAAAA GTA 6T6 GAC CCG MT TAC ACA AAMMI AGC BAC GGT AAMC MAMAAA 6TA GTG GAC CCMMT TAC ACA MAMA CAG GTA UTT GAA CA GCA GCC MAA GTT AT AM ACAMC

I ABC ACA GBA ABC MT CTTC AU

Figure 5. A comparison of the sequences of the cDNA and the corresponding coding segments of genomic regions I and II. The genomic sequences were determined from the subcloned 10.0 and 8.5 kb fragments (regions I and II respectively) labeled at the Eco RI site (position 1), the Kpn I site (position 310), the Bgl II site (position 425), a Sal I site (position 656) and a Hinf I site (position 767). Sequences displayed within the lines are identical. Locations at which the sequences differ are not included within the lines. The vertical bar at position 1008 indicates the predicted start of the codons specifying the C-terminal hydrophobic tail of the VSG (based on analogy with other VSG amino acid sequences, see reference 5).

Within the area in which all three sequences can be compared (478

nucleotides), the three are different at only two locations, positions 216

and 219. There is no apparent pattern to these differences although region

I is clearly more similar to the cDNA (and presumably region III) than is

region II.

Figure 6 shows the corresponding amino acid sequences predicted from

the nucleotide sequences. Clearly large portions of the two genomic regions

code for the same amino acids as the cDNA although there are sufficient

9 kb F L D E|;LIDL GTE T6 G-C L A E A S AGNV A N R E V A A A G C R KI| P V Y T A T E NE T G|E F S K T G FIQ8 kb KAQID G TID A G C L AEDASADGI V D H K I T|A A G C R K P V Y T A T E D S A E F S K T 6 P

CONA K GEGD|_DGET A K C I L F K A A G A A G A A R F Y Q H Q T K V N H L G G Y L E I T S G A G R T T L E L K N9 kb K A G G V G D A K D A A K C I L F K A A G A A G A A R F Y Q H Q T K V T H L G6 Y L E I T S G AG R T T L E L K N8 kb DPTA KCINLFA H L G6 Y L E I T S 6A6

CDNA L N D I A Q D G V H K S G Q L L G E I Y T P L A T L N S E D T T E I K T T D E S I IIR A A A A S T L E A A V Q E A L K9 kb L N D I A Q D G V H K S G Q L L G E I Y T P L A T L N S E D T T E I K T T D E S I I K S A A A A S T L E A A V Q E A L K9kb LNDIIAQIGV HKTnS GQIL TP LATLSElDTT EI KTTD E SIISAAAASTLE A A VQE A LK

9 kb L A N AD G| Q E K L K E E A A D I I K E F V G SEA K G S K A W E K L K S T K V K G T E A K P E T E K E L K D I TY

_______ _ _ _ _ _ _ _ _ _ ~~~~~~~~~~~~~~~44 4cDA NAKLVSALNYYISSAESKLQEAETKLAATKAAAEKV PT KPDECKAKKGDTCKDGCKWD9 kb N A K L V S A L N Y Y I S S A E S K L Q E A E T K L A A A K A A A E K V P T A P K P D E C K A K K G D T C K D G C K W D

4cDNA S D G E N K KC V V D P N Y T K K....9 kb S D G E N K K|S V V D P N Y T K K Q V V E A A A K V D K T N T T G S . . .

Figure 6. Amino acid sequences deduced from the corresponding nucleotide sequences shown in Figure 5. Lines are drawn around the sequences in common. Arrows point to the cysteine residues within the VSG homology region which are conserved in all VSGs of that homology subset which have been analyzed (5). The vertical bar indicates the cleavage site of the C-terminal hydrophobic tail from the nascent VSG (based on analogy with other VSGs, see reference 5).

differences to eliminate the possibility that the mRNA (from which the cDNA

was constructed) arose from either genomic region I or II. The arrows in

Fig. 6 point to cysteine residues in the VSG homology region which appear,

on the basis of comparison with other VSG cDNA sequences (5), to be

invariant. In genomic region I, the codon for one of these invariant

cysteines is replaced by a serine codon. This may suggest, but does not

prove, that the coding sequence in genomic region I is a VSG pseudogene as

discussed below. The sequence of genomic region II was not determined in

this area.

DISCUSSION

Of the three genomic regions to which VSG D cDNA hybridizes, all of the

evidence suggests that region III is the one from which VSG D mRNA is

transcribed. The variable fragment occurs within this region and sequences of the other two regions are not homologous with the cDNA. It is

unfortunate that extensive efforts to clone region III using recombinant DNA

techniques were unsuccessful. Since this region lies within a 35 kb or

greater Eco RI fragment, one end of which may not even be an Eco RI site, it


would be unlikely to appear in the partial Eco RI libraries that were

screened. Less clear is why it was also not detected in the partial Sau 3A

library. Two possibilities are (i) an additional lack of Sau 3A sites in

this region or (ii) instability of the DNA sequences in the cloning

vector. Neither possibility was investigated in detail.
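
The argument about restriction sites can be made roughly quantitative. The sketch below is not part of the original analysis: it assumes a random sequence with equal base frequencies (an idealization that no real genome satisfies exactly), uses a Poisson approximation, and the function names are illustrative. Under those assumptions a 6-bp site such as the Eco RI site (GAATTC) is expected about once every 4096 bp and a 4-bp site such as the Sau 3A site (GATC) about once every 256 bp, so a stretch of 35 kb without an Eco RI site would be unusual and a comparable stretch without Sau 3A sites far more so.

```python
import math


def expected_spacing(site_length: int) -> int:
    """Mean distance in bp between occurrences of a fixed site of the given
    length, assuming random sequence with equal base frequencies."""
    return 4 ** site_length


def prob_no_site(span_bp: int, site_length: int) -> float:
    """Poisson approximation to the chance that a span contains no site."""
    return math.exp(-span_bp / expected_spacing(site_length))


eco_ri_spacing = expected_spacing(6)  # GAATTC, 6-bp recognition site
sau_3a_spacing = expected_spacing(4)  # GATC, 4-bp recognition site

print(eco_ri_spacing, sau_3a_spacing)
print(prob_no_site(35_000, 6))  # 35 kb with no Eco RI site
print(prob_no_site(35_000, 4))  # 35 kb with no Sau 3A site
```

The numbers are only order-of-magnitude guides, since base composition and local sequence bias shift them considerably.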

If the cDNA can be taken as an index of region III, then a comparison

of the sequences in Figure 5 reveals that region I is much more similar to

the expressed region III than is region II (4% differences vs. 13%

differences). Furthermore, the two sequences are even more similar within

the area that codes for the C-terminal 120 amino acid homology region. Only

4 differences occur in the last 390 nucleotides of the comparison between

the cDNA and region I (1% difference). This pattern of nucleotide changes,

i.e., more differences in the variable region than in the homologous region, is very similar to the pattern observed previously in comparing the sequence

of the VSG A (ILtat 1.1) basic copy gene and its expression-linked extra

copy (ELC) (11). In this case the two VSG A sequences were found to possess

9% difference in the variable coding region and no difference in the

homology coding region (excluding the C-terminal hydrophobic tail codons

which may be involved in recombination that generates the ELC). This

analogy between the VSG A basic copy and ELC and the VSG D gene family might

suggest, but does not prove, that region III was originally generated as an

ELC of region I but was not subsequently lost in the switch to expression of

another VSG. This would imply that region III, as a "retained" ELC in the

genome, present in clone B1, was successfully re-expressed in clone D1.

Thus the region III present in clone B1 is not merely an insignificant

remnant of a pre-existing ELC. The 3' distal rearrangements adjacent to the

region III gene may be an important pre-requisite for the chance

re-expression of such a "retained" ELC. Although little additional evidence

is available for such a model, it is an attractive possibility because it

relates to the two general, and seemingly dissimilar, molecular phenomena

associated with VSG expression, i.e. the occurrence of an ELC or the

presence of 3'-distal rearrangements of pre-existing VSG genes.

Furthermore, various biological tests of this molecular model in the

laboratory may be possible.
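
The percentages quoted in this comparison follow from simple counts over aligned positions. A minimal Python sketch is given here for illustration; it is not the procedure used in the original work. The function names and the toy sequences are assumptions, and only the counts 4 and 390 are taken from the text.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length
    sequences differ (no gaps are handled)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return 100.0 * mismatches / len(seq_a)


def percent_from_counts(mismatches: int, length: int) -> float:
    """Same quantity computed from a mismatch count and a comparison length."""
    return 100.0 * mismatches / length


# Counts quoted in the text: 4 differences in the last 390 nucleotides.
print(f"{percent_from_counts(4, 390):.2f}%")  # 1.03%

# Toy sequences (hypothetical, not from the paper): 1 mismatch in 10 positions.
print(percent_difference("ACGTACGTAC", "ACGTTCGTAC"))  # 10.0
```

The gap-free comparison is adequate here because no deletions or insertions were detected between the sequences.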

An argument against the model, based on the sequence comparison in Fig.

5, is that one of the 4 changes in the last 390 nucleotides of the region I

and cDNA comparison affects a cysteine residue which appears to be invariant

in VSGs of that homology subgroup (see bottom arrow in Figure 6). This may


suggest that the region I gene, which codes for a serine in this position, is

a pseudogene which cannot give rise to a functional VSG. Alternatively,

this single nucleotide change could be a mutation in region I which occurred

after region III was generated as an ELC of the region I basic copy. This

possibility must, however, remain speculative.
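
That a cysteine codon can become a serine codon through a single nucleotide change can be checked against the standard genetic code. The sketch below enumerates the possibilities; it is illustrative only, the helper names are assumptions, and the text does not state which of these codon pairs actually occurs in region I.

```python
# Standard genetic code, written as DNA codons.
CYS_CODONS = ("TGT", "TGC")
SER_CODONS = ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_step_changes(sources, targets):
    """Codon pairs that differ by exactly one nucleotide."""
    return [(s, t) for s in sources for t in targets if hamming(s, t) == 1]


changes = single_step_changes(CYS_CODONS, SER_CODONS)
for cys, ser in changes:
    print(f"{cys} -> {ser}")
```

Each cysteine codon has two serine codons one substitution away, so four such changes exist in total.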

Regions I and II differ from each other at 17% of the nucleotide

positions, more than either differs from the cDNA. There is no apparent

pattern to the differences with respect to either location or purines vs.

pyrimidines. Single nucleotide changes predominate although there are a few

regions of 2-6 continuous nucleotides that differ. No deletions or

insertions were detected and no evidence for introns within the coding

sequence was obtained. Careful attention was given during the

characterization of the genomic clones to the possibility that other VSG

genes, containing a detectable C-terminal homology region, might be nearby.

However, no recombinant genomic clone was obtained which appeared to possess

two distinct VSG coding sequences as detected by hybridization.
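
The two properties examined in the comparison of regions I and II, the purine versus pyrimidine character of each change and the length of runs of consecutive differing nucleotides, can be tabulated as sketched below. This is an illustrative sketch rather than the authors' procedure; the function names and the toy sequences are assumptions, and the input is assumed to contain only A, C, G and T in a gap-free alignment.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a: str, b: str) -> str:
    """Label an aligned pair as a match, a transition (purine<->purine or
    pyrimidine<->pyrimidine) or a transversion (purine<->pyrimidine)."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def mismatch_runs(seq_a: str, seq_b: str) -> list:
    """Lengths of runs of consecutive differing positions, in order."""
    runs, current = [], 0
    for a, b in zip(seq_a, seq_b):
        if a != b:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Toy sequences (hypothetical, not from the paper).
seq_a = "ACGTACGTACGT"
seq_b = "ACATACCCACGT"
counts = Counter(classify(a, b) for a, b in zip(seq_a, seq_b))
print(counts["transition"], counts["transversion"])  # 2 1
print(mismatch_runs(seq_a, seq_b))  # [1, 2]
```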

The occurrence of a double stranded break in the DNA down-stream from

an expressed VSG gene has been recently reported by others (29) and has been

observed by us with other VSG genes as well (30). Since it has recently

been shown that a double stranded DNA break is an intermediate in the

rearrangements associated with yeast mating type switches (J. Hicks,

personal communication), it seems likely that such a double strand break may

also be involved in VSG switches. Experiments are underway to further

investigate the location and function of this double strand break.

REFERENCES

1. Marcu, K.B. and Williams, R.O. (1981) In: Genetic Engineering (Setlow, J.K. and Hollaender, A., Eds.) Plenum, New York, Vol. 3, 129-155.

2. Englund, P.T., Hajduck, S.J. and Marini, J. (1982) Annu. Rev. Biochem. 51 (in press).

3. Turner, M.J. and Cordingley, J.S. (1981) In: Molecular and Cellular Aspects of Microbial Evolution (Carlile, Collins and Moseley, eds.) Cambridge University Press.

4. Borst, P., Frasch, A.C.C., Bernards, A., Van der Ploeg, L.H.T., Hoeijmakers, J.H.J., Arnberg, A.C. and Cross, G.A.M. (1980) Cold Spring Harbor Symp. on Quant. Biol. 45, 935-943.

5. Rice-Ficht, A.C., Chen, K.K. and Donelson, J.E. (1981) Nature 294, 53-57.

6. Capern, A., Giroud, C., Baltz, T. and Mattern, P. (1977) Exp. Parasitol. 42, 6-13.

7. Williams, R.O., Young, J.R. and Majiwa, P.A.O. (1979) Nature 282, 847-849.

8. Hoeijmakers, J.H.J., Frasch, A.C.C., Bernards, A., Borst, P. and Cross, G.A.M. (1980) Nature 284, 78-80.

9. Pays, E., Van Meirvenne, N., LeRay, D. and Steinert, M. (1981) Proc. Natl. Acad. Sci. USA 78, 2673-2677.

10. Pays, E., Lheureux, M. and Steinert, M. (1981) Nature 292, 265-267.

11. Rice-Ficht, A.C., Chen, K.K. and Donelson, J.E. (1982) Nature (in press).

12. Pays, E., Lheureux, M. and Steinert, M. (1982) Nucleic Acids Research 10, 3149-3163.

13. Young, J.R., Donelson, J.E., Majiwa, P.A.O., Shapiro, S.Z. and Williams, R.O. (1982) Nucleic Acids Research 10, 803-819.

14. Doyle, J.J. (1977) In: Immunity to Blood Parasites in Animals and Man (Miller, L., Pino, J. and McKelvy, eds.) Plenum Press, New York, pp. 27-63.

15. Doyle, J.J., Behin, R., Mauel, J. and Rowe, D.S. (1975) Ann. N.Y. Acad. Sci. 254, 315-325.

16. Laurent, M., Van Assel, S. and Steinert, M. (1971) Biochem. Biophys. Res. Comm. 43, 278-284.

17. Citron, B.A., Feiss, M. and Donelson, J.E. (1979) Gene 6, 251-264.

18. Robbins, J., Freyer, G., Haynes, J.R., Rosteck, P., Cleary, M.L., Kalter, H.D., Smith, K. and Lingrel, J.B. (1979) J. Biol. Chem. 254, 6187-6195.

19. Maxam, A.M. and Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.

20. Rigby, P.W.J., Dieckmann, M., Rhodes, C. and Berg, P. (1979) J. Mol. Biol. 113, 237-251.

21. Southern, E.M. (1975) J. Mol. Biol. 98, 501-517.

22. Wahl, G.M., Stern, M. and Stark, G.R. (1979) Proc. Natl. Acad. Sci. USA 76, 3683-3687.

23. Blattner, F.R., Williams, B.G., Blechl, A.E., Denniston-Thompson, K., Faber, H.E., Furlong, L.-A., Grunwald, D.J., Kiefer, D.O., Moore, D.D., Schumm, J.W., Sheldon, E.L. and Smithies, O. (1977) Science 196, 161-169.

24. Maniatis, T., Hardison, R.C., Lacy, E., Lauer, J., O'Connell, C. and Quon, D. (1978) Cell 15, 687-701.

25. Benton, W.D. and Davis, R.W. (1977) Science 196, 180-182.

26. Nichols, B.P. and Donelson, J.E. (1978) J. Virol. 25, 429-434.

27. Tu, C.-P.D. and Cohen, S.N. (1980) Gene 10, 177-183.

28. Sanger, F. and Coulson, A.R. (1978) FEBS Lett. 87, 107-110.

29. Pays, E., Lheureux, M. and Steinert, M. (1982) Nucleic Acids Research 10, 3149-3163.

30. Williams, R.O., Young, J.R. and Majiwa, P.A.O. (1982) Nature (in press).
