Volume 10 Number 21 1982 Nucleic Acids Research
The ILtat 1.4 surface antigen gene family of Trypanosoma brucei
John E.Donelson*+, John R.Young, David Dorfman+, Phelix A.O.Majiwa* and Richard O.Williams
*International Laboratory for Research on Animal Diseases (I.L.R.A.D.), P.O. Box 30709, Nairobi,Kenya and +Department of Biochemistry, University of Iowa, Iowa City, IA 52242, USA
Received 31 August 1982; Accepted 11 October 1982
ABSTRACT
The cDNA sequence for the variable surface glycoprotein (VSG) expressed in Trypanosoma brucei clone ILtat 1.4 (called clone D for brevity) hybridizes strongly to three regions in trypanosome genomic DNA. These three regions were extensively characterized by Southern hybridization analyses, genomic DNA cloning and DNA sequence determinations. All three regions occur in the genomes of all trypanosome clones of the ILtat 1 repertoire regardless of whether or not VSG D was being expressed. Extensive (clone dependent) DNA rearrangements and a (clone independent) double strand DNA break were found distal to the 3'-end of the VSG D coding sequence of one of the regions. VSG D mRNA is most likely synthesized from this region, but a recombinant DNA clone of the VSG coding sequence could not be obtained for confirmation. Recombinant clones of the other two regions were obtained. DNA sequence analyses revealed that their coding sequences differ from each other by 17%. They differ from the ILtat 1.4 cDNA sequence by 4% in one case, and 13% in the other. By analogy with another VSG gene system, one of these two regions may have originally given rise to the third region from which the mRNA is probably transcribed.
INTRODUCTION
Haemoparasitic African trypanosomes reside in the bloodstream of
mammalian hosts and evade destruction by the hosts' immune response by
random expression of a single member of a large repertoire of antigenically
distinct surface glycoproteins (variable surface glycoproteins or VSGs).
Most investigations of the molecular phenomenon responsible for the "switch"
from the expression of one VSG to another have emphasized the genomic DNA
rearrangements associated with the switch (for reviews, see 1-4).
Previous work from several laboratories has demonstrated that the VSGs
consist of single polypeptide chains of about 500 amino acids, which can be
classified into one of two groups based on extensive amino acid homologies
within the C-terminal 120 amino acids (5). There are a minimum of 100
different VSG genes in the trypanosome genome (6), and there may be an order of
magnitude more. Southern hybridization analyses have shown that at least
© IRL Press Limited, Oxford, England. 0305-1048/82/1021-6581$2.00/0
some of these genes are closely related (7, 8, 9) and that many genes have
similar, but not identical, 3'-terminal sequences, which is consistent with
the observed amino acid homologies at the C-terminus. Two different genomic DNA recombination events have been observed with different VSG genes. Some
genes are expressed via gene duplication and translocation, i.e. an
"expression-linked extra copy" (ELC) mechanism (8). The extra gene copy
appears to be the one that is transcribed (10), and in one case it is not a
faithful copy of the original basic copy gene (11). Other VSG genes are not
expressed via the appearance of an ELC. In these cases, DNA rearrangements occur distal to the 3'-terminus of the genes (7, 10, 13). These 3'-distal
rearrangements do not necessarily correlate with the expression of the
genes. In many instances they occur regardless of whether or not the gene
is being expressed (13). The relationship of the two recombination
mechanisms, if any, remains to be elucidated, as does the precise relationship of these rearrangements to the expression of particular VSG
genes.
In this communication we report the characterization of the structure of three very closely related regions within the genomic DNA, one of which codes for the VSG expressed in the Trypanosoma brucei clone ILtat 1.4. The
expression of the ILtat 1.4 VSG described in this paper is not accompanied
by the generation of an ELC. Clone D1, expressing ILtat 1.4, is related to
the other clones used in these studies, which express different VSGs, as
shown below:
ILtat 1.1 (Clone A) -> ILtat 1.2 (Clone B1) -> ILtat 1.3 (Clone C1)
ILtat 1.2 (Clone B1) -> ILtat 1.4 (Clone D1)
For convenience, these trypanosome clones and their four VSGs are called A, B, C and D in this paper instead of the formal nomenclature of ILtat 1.1, 1.2, 1.3 and 1.4 respectively.
EXPERIMENTAL PROCEDURES
Trypanosomes: All trypanosome clones used were derived from a single
clone (Clone A) which originated from Trypanosoma brucei stock 227 (14).
Trypanosome clones are identified by the letter indicating the VSG expressed
followed by a distinguishing number. For example, D1, D4 and D8 are three
individual trypanosome clones, each expressing VSG D (ILtat 1.4) as
determined using monospecific antiserum raised against the VSG expressed in
Clone D1. Clone B1 was derived from a relapse of Clone A in a rabbit and B2
was cloned from a population of B1. All clones expressing VSGs C and D were
cloned from the first relapse populations in normal mice infected with Clone
B1.
Populations of individual trypanosome clones were grown in lethally
irradiated (900 rads) rats or mice and shown by immunofluorescence to be at
least 99% homogeneous for expression of a single VSG (15).
DNA Techniques. Nuclear DNA from trypanosomes was isolated, and
kinetoplast DNA subsequently removed from the preparation as described
(16). Plasmid DNA and bacteriophage DNA were isolated as described (17,
18). Restriction enzyme digestions were performed as suggested by the
commercial suppliers (New England BioLabs and Bethesda Research Labs).
Individual fragments of recombinant phage or plasmid DNAs were isolated
after separation by agarose or acrylamide gel electrophoresis, as described
by Maxam and Gilbert (19). Plasmids and DNA fragments were labeled with 32P by nick translation (20) to specific activities of 1-2 x 10^8 cpm/µg
DNA.
Southern filter hybridization of genomic restriction fragments (21).
Nuclear DNA (1.5 µg) was digested with at least 20-fold excess of
restriction enzyme and DNA fragments were separated by electrophoresis in
0.6 or 0.8% agarose gels in 40 mM Tris-acetate, 2 mM EDTA, pH 8.2. The DNA
was partially depurinated by soaking the gel for 10-15 minutes in 0.2 M HCl,
denatured and cleaved by soaking in 0.5 M NaOH, 1 M NaCl for 30 minutes,
neutralized by soaking for 1 hour in 1 M Tris HCl, 3 M NaCl, pH 7.4, and
transferred from the gel to nitro-cellulose filters (Schleicher and Schuell)
in 20 x SSC. Hybridization was carried out using 10% dextran sulphate, as
described by Wahl et al. (22). After hybridization the filters were washed
twice in 0.1 x SSC, 0.1% SDS for an hour at 65°C, dried and exposed to Kodak
AR-5 film at -80°C with intensifying screens.
Construction and screening of genomic DNA libraries. Libraries of
trypanosome DNA fragments were made from partial EcoRI-digests inserted into
bacteriophage Charon 4A, and from partial Sau 3A digests inserted into
Charon 28, using methods essentially as described (23,24). The partial
EcoRI-libraries of B1-DNA and C1-DNA each contained about 2 x 10
individual phage clones. The partial EcoRI-library of D1-DNA contained over
10^6 individual phage clones. The number of phage in the three partial
EcoRI-libraries was amplified before screening for VSG genes. The partial
Sau3A-library of clone A-DNA contained about 5 x 10^5 phage clones. Phage
containing VSG coding sequences were identified and plaque purified via
three successive screenings using the procedure of Benton and Davis (25).
DNA sequence determinations. DNA fragments were radiolabeled at
recessed 3'-termini generated by the restriction enzymes Eco RI, Sal I,
Bgl II and Hinf I, using the DNA repair reaction of Escherichia coli DNA
polymerase I (Boehringer Mannheim) and the appropriate [α-32P]dNTP (3000
Ci/mmole from Amersham) as described (26). Protruding 3'-termini generated
by the restriction enzyme Kpn I were labeled with [α-32P]dideoxy ATP (3000
Ci/mmol) and terminal deoxynucleotidyl transferase (P-L Biochemicals) as
described (27). Fragments labeled at one terminus were subjected to the
modification and cleavage reactions of Maxam and Gilbert (19) and
electrophoresed through the thin sequencing gels (80 cm length) of Sanger
and Coulson (28).
RESULTS
Southern filter hybridization analysis. We have previously reported
the construction, identification and isolation of plasmid pcB D1 which
contained a partial cDNA for the VSG D expressed in trypanosome clone D1
(13). The sequence of this cDNA was determined (5) and found to extend from
about midway through the coding region to about 54 codons before the
termination codon, a region of 687 nucleotides of the estimated 1800
nucleotides of the VSG mRNA. Since this rather short cDNA does not contain
much of the sequence coding for that region of the VSG outside of the region
of C-terminal homologies, other recombinant plasmids in the same cDNA
library were screened to find one which contains more of the 5' (N-terminal)
coding sequences. Such a plasmid, designated pcB D2, was found which
contains a 350 base pair overlap with pcB D1 and about 550 additional base
pairs on the 5'-side. The relationship between pcB D1 and pcB D2, based on
restriction enzyme analysis, is shown in Figure 1. The sequence of pcB D2
has not yet been determined.
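The lengths quoted above can be combined in a short calculation. The following is only an illustrative sketch: the variable names are ours, and the 1500-nucleotide coding length is an assumption derived from the figure of "about 500 amino acids" per VSG given in the Introduction.

```python
# Lengths (nucleotides) quoted in the text for the two cDNA inserts.
PCB_D1_LEN = 687          # sequenced insert of pcB D1
OVERLAP = 350             # overlap between pcB D1 and pcB D2
D2_EXTRA_5PRIME = 550     # additional pcB D2 sequence on the 5' side
ESTIMATED_MRNA_LEN = 1800

# Assumption (not stated as such in the text): about 500 codons of coding sequence.
ASSUMED_CODING_LEN = 500 * 3


def combined_span(d1_len, extra_5prime):
    """Span covered by two overlapping inserts; the overlap is counted once."""
    return d1_len + extra_5prime


span = combined_span(PCB_D1_LEN, D2_EXTRA_5PRIME)
d2_len = OVERLAP + D2_EXTRA_5PRIME

print("approximate pcB D2 insert length:", d2_len)
print("combined cDNA span:", span)
print("percent of assumed coding length:", round(100 * span / ASSUMED_CODING_LEN))
print("percent of estimated mRNA length:", round(100 * span / ESTIMATED_MRNA_LEN))
```

Under these assumptions the two inserts together span a little over 1200 nucleotides, which is in line with the statement that the cDNA mixture contains about 80% of the coding sequence.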
Mixtures of the two plasmids were nick-translated (20) with 32P and
used as probes in a series of Southern filter hybridizations. In addition,
the two plasmids were digested individually with Bgl II which cuts once
within the cDNA (and does not cut in pBR322) and a restriction enzyme which
cuts once in only pBR322 so that a 5'-coding sequence probe could be
isolated from pcB D2 and a 3'-coding sequence probe could be isolated from
[Figure 1: restriction maps of the pcB D2 and pcB D1 cDNA inserts on a 0-1600 base pair scale, with the 5' fragment (from pcB D2) and the 3' fragment (from pcB D1) marked beneath.]
Figure 1. Abbreviated restriction maps of the two cDNA inserts of VSG D aligned to show their region of overlap. Numbers are in base pairs. P, H, S, K, B and A refer to cleavage sites of Pst I, Hinc II, Sal I, Kpn I, Bgl II and Ava I respectively. The symbol * shows the location of an Eco RI site in two of the corresponding genomic DNA coding regions (Figure 3) which is not present in the cDNA (see text). The 5'- and 3'-fragments were isolated for use as probes in the filter hybridizations shown in Figure 2. The small boxes at the ends of the cDNAs indicate the poly dG:poly dC boundaries.
pcB D1 (see bottom of Fig. 1). These 5'- and 3'-terminal probes were also
made radioactive by nick-translation in vitro.
Figure 2 shows the results of two such Southern hybridization
experiments using the probes for the different parts of the coding
sequences. Figure 2a shows the result when the trypanosome genome is
digested with Bgl II which (as indicated above) cuts the cDNA inserts once.
When the cDNA mixture (containing about 80% of the coding sequence) is used
as the probe (the first six lanes), it hybridizes to six genomic Bgl II
fragments ranging in size from 20 kb to 1.3 kb. Since the cDNA contains a
Bgl II site, this suggests that three different genomic regions contain
sequences complementary to the cDNA. The sizes of five of the fragments are
independent of the trypanosome clone from which the genomic DNA was isolated
(B2, C1, D1, etc.). The remaining fragment varies with each genomic DNA
tested. For example, trypanosome clones B2 and B1 are both expressing VSG B
but the variable fragment to which the cDNA hybridizes is different in the
two clones. Likewise trypanosome clones D1, D4 and D8 are all expressing
VSG D (as determined immunologically), yet the variable band is not the same
size in all these clones. When just the 5'-terminal sequence is used as a
probe (the lane labeled 5'), it hybridizes to three of the constant
fragments. The 3'-terminal sequence probe (the lane labeled 3') hybridizes
to the other two constant fragments and the variable one. This suggests
that the variability occurs on the 3'-side of one of the three genomic
regions to which the cDNA hybridizes while no changes occur within the
immediate vicinity of the other two regions.
Figure 2. Autoradiograms of Southern filter hybridizations of (a) Bgl II-digested nuclear DNAs and (b) Eco RI-digested nuclear DNAs using as probes either a mixture of the two cDNAs (the first six lanes of part a and the first four lanes of part b) or the 5'- and 3'-fragments (the last two lanes respectively of parts a and b). The trypanosome clones from which the nuclear DNAs were isolated are indicated across the top, i.e., B2, C1, D8, etc. The approximate sizes in kb of the fragments to which the cDNA hybridizes are shown at the side.
Figure 2b shows a similar experiment in which the genomic DNAs are
digested with Eco RI. The cDNA sequences do not contain an Eco RI site. In
this case the total cDNA probe (the first four lanes) hybridizes to a
minimum of five different fragments. None of the fragments vary in size
with the different genomes but one of the fragments is about 35 kb in
length, beyond the effective resolution limit of the agarose gel so that
variations of several kb in this fragment would not be detected. The
5'-terminal probe hybridizes to all five EcoRI fragments (the lane labeled 5' in Fig. 2b) while the 3'-terminal probe hybridizes to only three, the
8.5, 10.0 and 35 kb fragments (the lane labeled 3'). It does not hybridize to the 3.1 and 11.5 kb fragments, indicating that they contain only
5'-terminal sequences. Since Eco RI does not cut the cDNA, these results were initially confusing to interpret and appeared to suggest either (i)
more than three hybridizable regions in the genome, (ii) the occurrence of introns or (iii) sequence heterogeneity in the different coding regions.
Subsequent Southern experiments using other restriction enzymes and DNA
sequence determinations revealed that the last interpretation, sequence
heterogeneity, was correct. Two of the three genomic coding regions contain
an Eco RI site. The third region, like the cDNA, does not. This suggests,
but does not prove, that this third region is the one from which the mRNA
arose.
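The band counts expected under the sequence heterogeneity interpretation can be checked with a small digest model. This is a sketch under stated assumptions rather than a reconstruction of the actual maps: the flank length and the position of the Bgl II site within the cDNA are hypothetical, and the Eco RI site is placed 425 bp upstream of the Bgl II site following the positions given in the Figure 5 legend.

```python
# Coordinates are in cDNA-relative base pairs; all values marked "hypothetical"
# are illustrative choices, not measurements from the paper.
FLANK = 5000                 # hypothetical distance to the flanking Eco RI sites
CDNA = (0, 1237)             # approximate combined span of the two cDNA inserts
BGL2 = 650                   # hypothetical Bgl II position within the cDNA
ECO_STAR = BGL2 - 425        # Eco RI site present in regions I and II only

PROBES = {
    "total": CDNA,
    "5'": (CDNA[0], BGL2),   # 5' fragment ends at the Bgl II site
    "3'": (BGL2, CDNA[1]),   # 3' fragment starts at the Bgl II site
}

# Internal Eco RI sites of each genomic coding region.
REGIONS = {"I": [ECO_STAR], "II": [ECO_STAR], "III": []}


def fragments(internal_sites):
    """Eco RI fragments of one region as (start, end) intervals."""
    bounds = [-FLANK] + sorted(internal_sites) + [CDNA[1] + FLANK]
    return list(zip(bounds[:-1], bounds[1:]))


def band_count(probe):
    """Number of fragments, summed over all regions, that overlap the probe."""
    lo, hi = probe
    return sum(
        1
        for sites in REGIONS.values()
        for start, end in fragments(sites)
        if start < hi and lo < end
    )


counts = {name: band_count(probe) for name, probe in PROBES.items()}
print(counts)
```

With one internal Eco RI site in two of the three regions, the model gives five bands for the total and 5' probes and three for the 3' probe, matching the pattern described for Figure 2b without invoking extra regions or introns.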
Southern hybridization experiments allowed the construction of
restriction enzyme maps of the three regions as described elsewhere (30).
These maps are reproduced in figure 3 with the addition of Eco RI sites
determined from analysis of cloned genomic DNA (see below). The large arrow
at the right of region III indicates a site, apparently cut by many
restriction enzymes, which is probably a natural double strand break in the
trypanosome genome, not a specific site for any of the enzymes (30). The 'V'
Figure 3. Summary of the three genomic regions, I, II and III, to which VSG D cDNA hybridizes as determined by Southern filter hybridizations and analysis of trypanosome nuclear DNA fragments cloned in bacteriophage Charon 4A. The thick lines indicate the three genomic regions and the solid rectangles show where the cDNA hybridizes. R, A, B, H, S and K are cleavage sites for Eco RI, Ava I, Bgl II, Hinc II, Sal I and Kpn I respectively. Numbers are units of kb. Thin lines indicate the different trypanosome nuclear DNA fragments found in the partial Eco RI-libraries that were screened. The subscript "ex" means that the DNA originated from a trypanosome clone expressing VSG D while the subscript "non-ex" means the DNA was from a trypanosome clone not expressing VSG D. For example, 30non-ex is from a library of B1 DNA or C1 DNA. The letter V refers to the variable segment of genomic region III as indicated by Southern filter hybridizations and discussed in the text.
indicates that the distance between this site and the gene is variable in
different trypanosome clones. This variation accounts for the changes in
the 3' Bgl II fragment shown in figure 2. The remaining Bgl II fragments
can be identified from the map. They are unaltered in different clones,
whether ILtat 1.4 is expressed or not. Thus the double stranded break, and
the length variation associated with it, affect only region III, probably
the region that is transcribed in clone D1.
Cloned genomic DNA analysis. The genomic regions homologous to VSG
cDNA were further investigated by identifying bacteriophage Charon 4A
clones in partial EcoRI-libraries of trypanosome DNA that contained
sequences homologous with pcB-D1. Libraries of nuclear DNA from trypanosome
B1, C1 and D1 were constructed and initial screenings revealed many plaques
in each library to which pcB-D1 hybridized. Phage was prepared from 20-25
plaques of each of the three libraries and DNA was isolated for restriction
enzyme digestions and hybridization analysis. Figure 3 summarizes the
information obtained from these experiments.
Two distinct genomic regions were found to contain DNA homologous to
the cDNA of pcB-D1. Region I contains adjacent 10.0 kb and 3.1 kb EcoRI
fragments to which pcB-D1 hybridizes and region II has an 8.5 kb fragment to
which pcB-D1 hybridizes. These three fragments co-migrate with the
corresponding EcoRI fragments shown in the genomic filter hybridizations of
Figure 2b. By inspecting DNAs from all three libraries it is clear that, at
the gross level, these two genomic regions remain the same whether or not
VSG-D is being expressed, i.e., the three EcoRI fragments of 175ex from
the "expresser" library are also present in clones of two "non-expresser"
libraries. Likewise the other genomic region is present in libraries of
both D1 "expresser" DNA and C1 or B1 "non-expresser" DNA.
The positions of the DNA segments in the two genomic regions that are
homologous to D1 cDNA were identified (i) by formation of RNA:DNA
heteroduplexes and measurement on electron micrographs of the resultant
R-loops relative to the ends of the DNA and (ii) by restriction enzyme
analysis. The R-loop analysis (not shown) was the less accurate of the two
but established the approximate locations and demonstrated the absence of
introns large enough to be seen by the R-loop methods, i.e. greater than
about 100 base pairs. To facilitate the restriction enzyme analysis, the
8.5, 10.0 and 3.1 kb fragments from both "expresser" and "non-expresser"
DNAs were first subcloned at the EcoRI-site of pBR322. Multiple digestions
of recombinant DNAs containing these subcloned fragments confirmed the
enzyme site locations as indicated in Figure 3. For example, D1 cDNA
contains a Kpn I site (see Figure 3) and the 8.5 kb and 10.0 kb fragments
each have only one Kpn I site - in each case about 300 base pairs from an
Eco RI site. The 3.1 kb fragment does not have a Kpn I site. Again
comparison of the restriction enzyme digestion patterns of the subcloned
fragments from "expresser" DNA and "non-expresser" DNA did not reveal any
differences.
The two genomic regions were also analyzed by DNA:DNA heteroduplex
formation of recombinant DNAs containing each of the two regions. Figure 4
shows one such example. The heteroduplex is between the DNAs of 170ex and
183ex (see Figure 3) and shows one homologous region which on the basis of
measurements of ten such heteroduplex molecules is 1.0 kb. The location of
this homologous region corresponds to the position at which the cDNA
hybridizes to each of the two phage DNAs.
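A length estimate of this kind is obtained by scaling contour lengths measured on the micrograph against an internal standard of known size. The sketch below illustrates the arithmetic only. The ten contour lengths and the standard's contour length are hypothetical values in arbitrary units, the φX174 genome is taken as 5386 nucleotides, and the simplifying assumption is made that single-stranded and duplex DNA have the same contour length per nucleotide, which a real analysis would need to correct for.

```python
from statistics import mean, stdev

PHIX174_NT = 5386  # length of the phiX174 genome in nucleotides


def contour_to_kb(contour, standard_contour, standard_nt=PHIX174_NT):
    """Convert a contour length to kb using a standard measured on the same grid.

    Assumes equal contour length per nucleotide for sample and standard.
    """
    return contour / standard_contour * standard_nt / 1000.0


# Hypothetical measurements (arbitrary units) of the duplex segment in ten
# heteroduplex molecules, and of the phiX174 circle used as the standard.
standard_contour = 10.0
duplex_contours = [1.80, 1.90, 1.85, 1.88, 1.83, 1.92, 1.86, 1.84, 1.87, 1.85]

lengths_kb = [contour_to_kb(c, standard_contour) for c in duplex_contours]
mean_kb = mean(lengths_kb)
sd_kb = stdev(lengths_kb)

print(f"mean duplex length: {mean_kb:.2f} kb (s.d. {sd_kb:.2f} kb, n={len(lengths_kb)})")
```

Averaging over several molecules, as was done for the ten heteroduplexes described above, reduces the effect of measurement scatter on any single molecule.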
Formation of various other combinations of DNA:DNA heteroduplexes
between the genomic clones did not reveal any additional homologies between
these genomic regions. In addition heteroduplexes were formed between DNAs
containing "expresser" DNA and "non-expresser" DNA of the same region - for
Figure 4. Electron micrograph (A) and corresponding diagram (B) of a heteroduplex between the DNAs of 170ex and 183ex (see Fig. 3) showing the region containing the trypanosome nuclear DNA inserts. The 20 kb arm is on the left and the 12 kb arm is on the right. The single stranded circle on the left side of the micrograph is φX174 DNA used as an internal length standard.
example, between 187ex and 30non-ex and between 183ex and
27non-ex. In these cases no difference was observed between the DNA from
trypanosomes expressing surface antigen D and those which do not (not shown). Again, the resolution limit is about 100 base pairs; a region of
non-homology or only partial homology less than this length would not have
been detected.
A major flaw in the analysis of these genomic clones is that no clone
was obtained which contains the third genomic region to which the cDNA
hybridizes, i.e., genomic region III shown in Figure 3. This is clearly
because the 35 kb Eco RI fragment containing region III is too large to be
cloned in Charon 4A which accepts a maximum insert size of about 22 kb
(23). As a result, the Charon 28-partial Sau 3A library of DNA from
trypanosome clone A was also screened for sequences homologous to D cDNA.
Several such clones were identified and characterized but their DNA inserts
all originated from either genomic region I or II. Due to technical
difficulties encountered in working with individual clones of the Charon 28 library, additional clones of the partial Sau 3A library were not investigated. Furthermore, because of the lack of many restriction sites
within region III and the putative double-stranded break in the DNA
downstream from the coding sequence, it seemed likely that the partial Sau
3A library might not include this region either.
DNA Sequence Analysis. Portions of the coding sequences in the cloned
genomic regions I and II were determined to see how similar they were to
each other and to the cDNA. Recombinant plasmids containing the subcloned 10.0 kb fragment (region I) and the 8.5 kb fragment (region II) were each found to contain a Kpn I and a Bgl II site 300-400 bp from one of the Eco RI
boundaries. It was (correctly) assumed that these sites were equivalent to the same sites in the cDNA which are about 100 bp apart. Sequences in this area of both the 10.0 and 8.5 kb fragments were obtained from fragments labeled at the sites of Eco RI, Kpn I, Bgl II, Sal I and Hinf I (see Fig. 5 legend). These two genomic sequences and the cDNA sequence in pcB D1 are
compared in Figure 5.
The genomic sequences begin within the region corresponding to the variable region coding sequence of the cDNA. Within the 658 nucleotides which the two genomic regions have in common, about 17% of the positions are different (119/658). Likewise the two genomic regions are different from the cDNA. Region I and the cDNA differ at about 4% of positions (27/770) while region II and the cDNA are different at 13% of positions (63/478).
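The pairwise differences quoted here are simple mismatch fractions over aligned positions. The following sketch shows the calculation on two short toy sequences, which are hypothetical and not taken from Figure 5, and then evaluates the three count pairs given in the text. Evaluated directly, 27/770 and 63/478 round to 4% and 13%, while 119/658 comes to roughly 18%, close to the rounded 17% quoted.

```python
def percent_difference(seq_a, seq_b):
    """Return (mismatch count, percent mismatches) for two aligned sequences.

    No gaps are modelled, consistent with the absence of deletions or
    insertions; the sequences must have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches, 100.0 * mismatches / len(seq_a)


# Hypothetical 18-nucleotide toy sequences, for illustration only.
toy_a = "TTCTACCAGCACCAAACC"
toy_b = "TTCTACCAGCACCAAGTT"
toy_mismatches, toy_percent = percent_difference(toy_a, toy_b)
print(f"toy example: {toy_mismatches} mismatches, {toy_percent:.1f}%")

# (differences, positions compared) as quoted in the text.
QUOTED_COUNTS = {
    "region I vs region II": (119, 658),
    "region I vs cDNA": (27, 770),
    "region II vs cDNA": (63, 478),
}
quoted_percent = {
    pair: 100.0 * diffs / length for pair, (diffs, length) in QUOTED_COUNTS.items()
}
for pair, value in quoted_percent.items():
    print(f"{pair}: {value:.1f}%")
```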
Figure 5. A comparison of the sequences of the cDNA and the corresponding coding segments of genomic regions I and II. The genomic sequences were determined from the subcloned 10.0 and 8.5 kb fragments (regions I and II respectively) labeled at the Eco RI site (position 1), the Kpn I site (position 310), the Bgl II site (position 425), a Sal I site (position 656) and a Hinf I site (position 767). Sequences displayed within the lines are identical. Locations at which the sequences differ are not included within the lines. The vertical bar at position 1008 indicates the predicted start of the codons specifying the C-terminal hydrophobic tail of the VSG (based on analogy with other VSG amino acid sequences, see reference 5).
Within the area in which all three sequences can be compared (478
nucleotides), the three are different at only two locations, positions 216
and 219. There is no apparent pattern to these differences although region
I is clearly more similar to the cDNA (and presumably region III) than is
region II.
Figure 6 shows the corresponding amino acid sequences predicted from
the nucleotide sequences. Clearly large portions of the two genomic regions
code for the same amino acids as the cDNA although there are sufficient
Figure 6. Amino acid sequences deduced from the corresponding nucleotide sequences shown in Figure 5. Lines are drawn around the sequences in common. Arrows point to the cysteine residues within the VSG homology region which are conserved in all VSGs of that homology subset which have been analyzed (5). The vertical bar indicates the cleavage site of the C-terminal hydrophobic tail from the nascent VSG (based on analogy with other VSGs, see reference 5).
differences to eliminate the possibility that the mRNA (from which the cDNA
was constructed) arose from either genomic region I or II. The arrows in
Fig. 6 point to cysteine residues in the VSG homology region which appear,
on the basis of comparison with other VSG cDNA sequences (5), to be
invariant. In genomic region I, the codon for one of these invariant
cysteines is replaced by a serine codon. This may suggest, but does not
prove, that the coding sequence in genomic region I is a VSG pseudogene as
discussed below. The sequence of genomic region II was not determined in
this area.
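A cysteine codon can become a serine codon through a single nucleotide substitution. The sketch below enumerates such changes using the standard genetic code. It does not identify which substitution is actually present in region I, and the function names are ours.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_substitutions(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt != base:
                yield codon[:i] + alt + codon[i + 1:]


def cys_to_ser_changes():
    """List (Cys codon, Ser codon) pairs separated by one nucleotide change."""
    changes = []
    for codon, aa in CODON_TABLE.items():
        if aa == "C":
            for mutant in single_substitutions(codon):
                if CODON_TABLE[mutant] == "S":
                    changes.append((codon, mutant))
    return changes


for original, mutant in cys_to_ser_changes():
    print(f"{original} (Cys) -> {mutant} (Ser)")
```

Each of the two cysteine codons has two serine codons one substitution away, so a cysteine to serine replacement of this kind does not by itself require more than a single point mutation.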
DISCUSSION
Of the three genomic regions to which VSG D cDNA hybridizes, all of the
evidence suggests that region III is the one from which VSG D mRNA is
transcribed. The variable fragment occurs within this region and sequences of the other two regions are not identical to the cDNA. It is
unfortunate that extensive efforts to clone region III using recombinant DNA
techniques were unsuccessful. Since this region lies within a 35 kb or
greater Eco RI fragment, one end of which may not even be an Eco RI site, it
would be unlikely to appear in the partial Eco RI libraries that were
screened. Less clear is why it was also not detected in the partial Sau 3A
library. Two possibilities are (i) an additional lack of Sau 3A sites in
this region or (ii) instability of the DNA sequences in the cloning
vector. Neither possibility was investigated in detail.
If the cDNA can be taken as an index of region III, then a comparison
of the sequences in Figure 5 reveals that region I is much more similar to
the expressed region III than is region II (4% differences vs. 13%
differences). Furthermore, the two sequences are even more similar within
the area that codes for the C-terminal 120 amino acid homology region. Only
4 differences occur in the last 390 nucleotides of the comparison between
the cDNA and region I (1% difference). This pattern of nucleotide changes,
i.e., more differences in the variable region than in the homologous region, is very similar to the pattern observed previously in comparing the sequence
of the VSG A (ILtat 1.1) basic copy gene and its expression-linked extra
copy (ELC) (11). In this case the two VSG A sequences were found to possess
9% difference in the variable coding region and no difference in the
homology coding region (excluding the C-terminal hydrophobic tail codons
which may be involved in recombination that generates the ELC). This
analogy between the VSG A basic copy and ELC and the VSG D gene family might
suggest, but does not prove, that region III was originally generated as an
ELC of region I but was not subsequently lost in the switch to expression of
another VSG. This would imply that region III, as a "retained" ELC in the
genome, present in clone B1, was successfully re-expressed in clone D1.
Thus the region III present in clone B1 is not merely an insignificant
remnant of a pre-existing ELC. The 3' distal rearrangements adjacent to the
region III gene may be an important pre-requisite for the chance
re-expression of such a "retained" ELC. Although little additional evidence
is available for such a model, it is an attractive possibility because it
relates to the two general, and seemingly dissimilar, molecular phenomena
associated with VSG expression, i.e. the occurrence of an ELC or the
presence of 3'-distal rearrangements of pre-existing VSG genes.
Furthermore, various biological tests of this molecular model in the
laboratory may be possible.
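The percentages quoted above follow from position-by-position counting over aligned coding sequences. A minimal Python sketch of that arithmetic is given here, assuming two gap-free sequences of equal length; the function name `percent_difference` and the short toy sequences are illustrative assumptions and are not the sequences of Figure 5.

```python
def percent_difference(seq_a, seq_b, last_n=None):
    """Percentage of aligned positions at which two sequences differ.

    If last_n is given, only the final last_n positions are compared,
    e.g. the 3' portion coding for the C-terminal homology region.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if last_n is not None:
        if last_n <= 0 or last_n > len(seq_a):
            raise ValueError("last_n must be between 1 and the sequence length")
        seq_a, seq_b = seq_a[-last_n:], seq_b[-last_n:]
    if not seq_a:
        raise ValueError("nothing to compare")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


# Hypothetical toy sequences: one mismatch in 12 positions, none in the last 6.
cdna = "ATGGCATTCGAA"
region = "ATGGCTTTCGAA"
print(percent_difference(cdna, region))
print(percent_difference(cdna, region, last_n=6))

# The figure quoted in the text: 4 differences in the last 390 nucleotides,
# which rounds to about 1%.
print(round(100.0 * 4 / 390, 2))
```

Restricting the comparison with `last_n` mirrors the distinction drawn in the text between the whole coding sequence and the last 390 nucleotides.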
An argument against the model, based on the sequence comparison in Fig.
5, is that one of the 4 changes in the last 390 nucleotides of the region I
and cDNA comparison affects a cysteine residue which appears to be invariant
in VSGs of that homology subgroup (see bottom arrow in Figure 6). This may
suggest that the region I gene, which codes for a serine in this position, is
a pseudogene which cannot give rise to a functional VSG. Alternatively,
this single nucleotide change could be a mutation in region I which occurred
after region III was generated as an ELC of the region I basic copy. This
possibility must, however, remain speculative.
Regions I and II differ from each other at 17% of the nucleotide
positions, more than either differs from the cDNA. There is no apparent
pattern to the differences with respect to either location or purines vs.
pyrimidines. Single nucleotide changes predominate although there are a few
regions of 2-6 continuous nucleotides that differ. No deletions or
insertions were detected and no evidence for introns within the coding
sequence was obtained. Careful attention was given during the
characterization of the genomic clones to the possibility that other VSG
genes, containing a detectable C-terminal homology region, might be nearby.
However, no recombinant genomic clone was obtained which appeared to possess
two distinct VSG coding sequences as detected by hybridization.
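The two checks described in the preceding paragraph (whether the changes favour purine or pyrimidine exchanges, and whether they cluster into runs of adjacent positions) can be sketched as follows. This is a hedged illustration, assuming gap-free aligned sequences containing only A, C, G and T; the function names and toy sequences are hypothetical, and the labels "transition" and "transversion" are the standard terms for within-class and between-class substitutions rather than wording used in the original analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_differences(seq_a, seq_b):
    """Count transitions (purine<->purine or pyrimidine<->pyrimidine)
    and transversions (purine<->pyrimidine) between aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        counts["transition" if same_class else "transversion"] += 1
    return counts


def mismatch_runs(seq_a, seq_b):
    """Return (start, length) for each run of consecutive differing positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    runs, start = [], None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            if start is None:
                start = i          # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the sequence
        runs.append((start, len(seq_a) - start))
    return runs


# Hypothetical toy sequences, not data from the paper.
a = "AAGGCCTTAA"
b = "AGGGTATTAA"
print(classify_differences(a, b))
print(mismatch_runs(a, b))
```

A roughly even spread of run start positions and no strong excess of either substitution class would correspond to the "no apparent pattern" described in the text.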
The occurrence of a double stranded break in the DNA down-stream from
an expressed VSG gene has been recently reported by others (29) and has been
observed by us with other VSG genes as well (30). Since it has recently
been shown that a double stranded DNA break is an intermediate in the
rearrangements associated with yeast mating type switches (J. Hicks,
personal communication), it seems likely that such a double strand break may
also be involved in VSG switches. Experiments are underway to further
investigate the location and function of this double strand break.
REFERENCES
1. Marcu, K.B. and Williams, R.O. (1981) In: Genetic Engineering (Setlow, J.K. and Hollaender, A., Eds.) Plenum, New York, Vol. 3, 129-155.
2. Englund, P.T., Hajduck, S.J. and Marini, J. (1982) Annu. Rev. Biochem. 51 (in press).
3. Turner, M.J. and Cordingley, J.S. (1981) In: Molecular and Cellular Aspects of Microbial Evolution (Carlile, Collins and Moseley, eds.) Cambridge University Press.
4. Borst, P., Frasch, A.C.C., Bernards, A., Van der Ploeg, L.H.T., Hoeijmakers, J.H.J., Arnberg, A.C. and Cross, G.A.M. (1980) Cold Spring Harbor Symp. on Quant. Biol. 45, 935-943.
5. Rice-Ficht, A.C., Chen, K.K. and Donelson, J.E. (1981) Nature 294, 53-57.
6. Capern, A., Giroud, C., Baltz, T. and Mattern, P. (1977) Exp. Parasitol. 42, 6-13.
7. Williams, R.O., Young, J.R. and Majiwa, P.A.O. (1979) Nature 282, 847-849.
8. Hoeijmakers, J.H.J., Frasch, A.C.C., Bernards, A., Borst, P. and Cross, G.A.M. (1980) Nature 284, 78-80.
9. Pays, E., Van Meirvenne, N., LeRay, D. and Steinert, M. (1981) Proc. Natl. Acad. Sci. USA 78, 2673-2677.
10. Pays, E., Lheureux, M. and Steinert, M. (1981) Nature 292, 265-267.
11. Rice-Ficht, A.C., Chen, K.K. and Donelson, J.E. (1982) Nature (in press).
12. Pays, E., Lheureux, M. and Steinert, M. (1982) Nucleic Acids Research 10, 3149-3163.
13. Young, J.R., Donelson, J.E., Majiwa, P.A.O., Shapiro, S.Z. and Williams, R.O. (1982) Nucleic Acids Research 10, 803-819.
14. Doyle, J.J. (1977) In: Immunity to Blood Parasites in Animals and Man (Miller, L., Pino, J. and McKelvy, eds.) Plenum Press, New York, pp. 27-63.
15. Doyle, J.J., Behin, R., Mauel, J. and Rowe, D.S. (1975) Ann. N.Y. Acad. Sci. 254, 315-325.
16. Laurent, M., Van Assel, S. and Steinert, M. (1971) Biochem. Biophys. Res. Comm. 43, 278-284.
17. Citron, B.A., Feiss, M. and Donelson, J.E. (1979) Gene 6, 251-264.
18. Robbins, J., Freyer, G., Haynes, J.R., Rosteck, P., Cleary, M.L., Kalter, H.D., Smith, K. and Lingrel, J.B. (1979) J. Biol. Chem. 254, 6187-6195.
19. Maxam, A.M. and Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
20. Rigby, P.W.J., Dieckmann, M., Rhodes, C. and Berg, P. (1979) J. Mol. Biol. 113, 237-251.
21. Southern, E.M. (1975) J. Mol. Biol. 98, 501-517.
22. Wahl, G.M., Stern, M. and Stark, G.R. (1979) Proc. Natl. Acad. Sci. USA 76, 3683-3687.
23. Blattner, F.R., Williams, B.G., Blechl, A.E., Denniston-Thompson, K., Faber, H.E., Furlong, L.-A., Grunwald, D.J., Kiefer, D.O., Moore, D.D., Schumm, J.W., Sheldon, E.L. and Smithies, O. (1977) Science 196, 161-169.
24. Maniatis, T., Hardison, R.C., Lacy, E., Lauer, J., O'Connell, C. and Quon, D. (1978) Cell 15, 687-701.
25. Benton, W.D. and Davis, R.W. (1977) Science 196, 180-182.
26. Nichols, B.P. and Donelson, J.E. (1978) J. Virol. 25, 429-434.
27. Tu, C.-P.D. and Cohen, S.N. (1980) Gene 10, 177-183.
28. Sanger, F. and Coulson, A.R. (1978) FEBS Lett. 87, 107-110.
29. Pays, E., Lheureux, M. and Steinert, M. (1982) Nucleic Acids Research 10, 3149-3163.
30. Williams, R.O., Young, J.R. and Majiwa, P.A.O. (1982) Nature (in press).