The structure and phylogeny of a new family of human endogenous retroviruses


Journal of General Virology (1996), 77, 1631-1641. Printed in Great Britain

Bengt Widegren,1 Christian Kjellman,1 Stefan Aminoff,1 Leif G. Sahlford2 and Hans-Olov Sjögren1

1 Department of Tumor Immunology, Institute of Cell and Molecular Biology, The Wallenberg Laboratory, Sölvegatan 33, S-223 62 Lund, Sweden

2 Department of Neurosurgery, The Academy Hospital, S-221 85 Lund, Sweden

A novel endogenous retrovirus (ERV) designated XA34 was isolated from a human glioma cDNA library using low stringency hybridization with an ERV-9 env probe. Southern blot hybridizations with human genomic DNA revealed the presence of approximately 16 genomic copies closely related to XA34. Sequencing of a 2303 bp cDNA clone of XA34 showed that it belongs to a new ERV family. The XA34 ERV has recombined with an ERV-9-like retrovirus resulting in a truncated ERV-9-like env region that ends with an Alu-like 3' LTR. By using PCR, we isolated ~940 bp pol fragments from three additional members of this family, XA35, XA36 and XA37. A fifth member, XA38, was isolated and sequenced as a 4729 bp genomic clone. The genomic XA38 clone spans from pol towards the 3' flanking region. The XA38 virus contains a more cryptic env region. The XA38 env is truncated in the transmembrane region and the virus then ends with three Alu repeats. Southern blot studies with human, chimpanzee, orangutan and squirrel monkey DNA show the presence of the XA34 family in all these species. That both the New and Old World monkeys have this ERV family means that the integration and/or amplification in the primate germ-line of XA34 probably took place about 40-45 million years ago. The phylogeny and the closest relatives to ERV XA34 are discussed.

Introduction

Probably all eukaryotes carry endogenous retroviruses (ERVs) in their genomes. For some species the proportion of the genome consisting of ERVs is substantial, e.g. 2% for mouse and 1% for man (Gallo, 1995). So far, many human ERVs (Larsson et al., 1989; Leib-Mösch et al., 1990) have been isolated and characterized, e.g. ERV-1 (Bonner et al., 1982), ERV-3 (O'Connell et al., 1984), ERV-9 (La Mantia et al., 1991), Humer 41 (Rabson et al., 1985; Repaske et al., 1985), RGH (Hirose et al., 1993) and RTVLH (Mager & Freeman, 1987). There is great variability in copy number among different human ERVs. For instance, there are single or low copy ERVs like ERV-1 (Bonner et al., 1982) and ERV-3 (O'Connell et al., 1984) and there are others like Humer 41 (Rabson et al., 1985; Repaske et al., 1985) with 50 to 100 representatives and RTVLH (Goodchild et al., 1993; Mager & Freeman, 1987; Wilkinson et al., 1993) with more than 1000 copies per haploid genome.

Author for correspondence: Bengt Widegren. Fax +46 46 104201. e-mail [email protected]

The accession numbers for the sequence data reported in this paper are U29659 (for XA34), U37054 (XA35), U37067 (XA36), U29658 (XA37) and U37066 (XA38).

ERVs enter the germ-line by means of infection and are then either lost or fixed in the population. For the large copy ERVs, the original integrations have probably been followed by a germ-line amplification by mechanisms other than infection (Wilkinson et al., 1990; Goodchild et al., 1993). ERVs represent new genetic material arriving at a specific chromosomal position at a specific time during evolution. The majority of ERVs that we can study in the human genome have been fixed. If the mutation (insertion of a retrovirus in the germ-line) does not affect the fitness, then fixation of the new genetic material will be a process of random drift. The time for fixation of a new endogenous retrovirus will then be 4N generations (Kimura, 1983), where N is the size of the population. Once fixed, it will take the same time for such a virus to be lost from the population by random drift. The localization and fixation at a specific chromosomal position of an ERV in a population or a species will be specific for that branch of population(s) or species. Hence, ERVs can be of great phylogenetic value.
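To give a feeling for the 4N estimate, the arithmetic can be sketched in a few lines of Python. The population size and generation time used here are hypothetical values chosen only for illustration; they do not come from this paper.

```python
def fixation_time_generations(population_size: int) -> int:
    """Expected fixation time of a neutral insertion, in generations (4N)."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    return 4 * population_size


def fixation_time_years(population_size: int, generation_time_years: float) -> float:
    """Convert the 4N estimate to years for an assumed generation time."""
    return fixation_time_generations(population_size) * generation_time_years


# Hypothetical inputs, for illustration only (not values from the paper).
N = 10_000
GENERATION_TIME_YEARS = 20.0

generations = fixation_time_generations(N)
years = fixation_time_years(N, GENERATION_TIME_YEARS)
print(f"{generations} generations, about {years:,.0f} years")
```

With these assumed inputs the estimate is 40000 generations, or roughly 800000 years, which is short compared with the time scale of primate divergence discussed in this paper.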

In this paper we describe sequencing data and phylogenetic relationships of a novel family of ERVs. We have identified several related sequences of this new ERV, XA34, which was originally isolated from a human glioma cDNA library. By means of PCR we have isolated and analysed ~940 bp pol fragments of three additional ERVs from human DNA that are closely related to XA34. A 4729 bp genomic clone representing an additional ERV of the XA34 ERV family with a 3' flanking region was also isolated and sequenced. The evolution of these ERVs is discussed.

0001-3861 © 1996 SGM

Methods

• PCR primers. The PCR primers used and referred to are:

390, 5' TTATGAGTATTTCTTCCAGGG 3'
490, 5' AGCAAGTTCAGCCTGGTTAAGT 3'
522, 5' CCTGGAGCCCGTCAGTAT 3'
523, 5' ACCAACTGGTAATGGTAGC 3'
560, 5' CCAAAACCGCTGAGGCCTAGA 3'
561, 5' GGCCCAAAGGCGAGTAACAGCA 3'
593, 5' CGATGATCAACTATTCATAGATGG 3'
693, 5' CAAGAATTTGGGAGACTTCTGC 3'
2793, 5' TGGTGAGAGCTATGAGTTCTGC 3'
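The primer list above can be checked mechanically. The following Python sketch tabulates length, G+C count and a rough melting temperature for each primer. The Wallace rule, Tm = 2(A+T) + 4(G+C) °C, is a common rule of thumb for short oligonucleotides and is an assumption introduced here; the paper does not state how annealing temperatures were chosen.

```python
# Primer sequences as listed in Methods (5' to 3').
PRIMERS = {
    "390": "TTATGAGTATTTCTTCCAGGG",
    "490": "AGCAAGTTCAGCCTGGTTAAGT",
    "522": "CCTGGAGCCCGTCAGTAT",
    "523": "ACCAACTGGTAATGGTAGC",
    "560": "CCAAAACCGCTGAGGCCTAGA",
    "561": "GGCCCAAAGGCGAGTAACAGCA",
    "593": "CGATGATCAACTATTCATAGATGG",
    "693": "CAAGAATTTGGGAGACTTCTGC",
    "2793": "TGGTGAGAGCTATGAGTTCTGC",
}


def gc_count(sequence: str) -> int:
    """Number of G and C bases in the sequence."""
    return sum(base in "GC" for base in sequence.upper())


def wallace_tm(sequence: str) -> int:
    """Rule-of-thumb Tm in degrees C: 2 per A/T plus 4 per G/C (an approximation)."""
    gc = gc_count(sequence)
    at = len(sequence) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name:>5}  length={len(seq):2d}  GC={gc_count(seq):2d}  Tm~{wallace_tm(seq)} C")
```

The lengths reported for primers 593 (24 bases) and 693 (22 bases) agree with the figures quoted later in Methods.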

• DNA probes. An env probe from ERV-9 (La Mantia et al., 1991) was made by PCR with the primer pair 560/561. The reaction mixture (100 µl) contained 11 pmol of each primer, 0.1 µg human male total DNA, 2.5 mM-MgCl2, 0.2 mM-dNTP, standard buffer (Perkin-Elmer Cetus) and 1 U of AmpliTaq (Perkin-Elmer Cetus). The PCR was run for 30 cycles with a thermal profile of 95 °C for 45 s, 50 °C for 45 s and 72 °C for 60 s. The PCR generated a product of 250 bp which was of the expected size. This was further isolated from an agarose gel with glassmilk (Bio101).

A 378 bp pol fragment from XA34 and 142-155 bp pol fragments from XA34, XA35, XA36, XA37 and XA38 were used as probes for the hybridizations (Fig. 1). The 378 bp XA34 pol probe was made from the 5' end of a 2303 bp XA34 cDNA fragment after digestion with EcoRV and EcoRI. The 142-155 bp pol probes from XA34, XA35, XA36, XA37 and XA38 were made by PCR from cloned material using primer pair 593/2793. The reaction mixtures (50 µl) contained 0.1 µg primer, 10 ng template, 2.5 mM-MgCl2, 0.2 mM-dNTP, standard buffer (Perkin-Elmer Cetus) and 1 U of AmpliTaq (Perkin-Elmer Cetus). The PCRs were run for 25 cycles with a thermal profile of 94 °C for 30 s, 56 °C for 45 s and 72 °C for 60 s. The fragments were isolated from an agarose gel.
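Primer amounts are given in pmol for the ERV-9 env probe reaction but in µg for the pol probe reactions. A minimal Python sketch of the conversion follows, assuming an average mass of about 330 g/mol per nucleotide of single-stranded DNA. That figure is a general approximation and not a value from this paper, so the results are rough estimates only.

```python
# Assumed average molar mass of one nucleotide in single-stranded DNA (g/mol).
AVERAGE_NUCLEOTIDE_MASS = 330.0


def micrograms_to_pmol(mass_ug: float, length_nt: int) -> float:
    """Approximate amount of an oligonucleotide in pmol from its mass in micrograms."""
    if length_nt <= 0:
        raise ValueError("oligonucleotide length must be positive")
    molar_mass = length_nt * AVERAGE_NUCLEOTIDE_MASS  # g/mol
    # ug -> g is 1e-6, mol -> pmol is 1e12, so the combined factor is 1e6.
    return mass_ug * 1e6 / molar_mass


# 0.1 ug of primer 593 (24 bases) and of primer 2793 (22 bases).
print(round(micrograms_to_pmol(0.1, 24), 1))
print(round(micrograms_to_pmol(0.1, 22), 1))
```

Under this approximation 0.1 µg of a 22-24 base primer corresponds to roughly 13 pmol, the same order of magnitude as the amount stated in pmol for the env probe reaction.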

The ERV-9 env probe and the 378 bp XA34 probe were both labelled with [α-32P]dCTP using a kit for random priming (T7 Quick Priming, Pharmacia). The specific activities of the probes were between 1 and 3 × 10^9 d.p.m./µg DNA. The 142-155 bp DNA fragments were labelled with [α-32P]dCTP by thermal cycling. The labelling reactions (20 µl) contained 3 ng of template, 10 ng of the 593 primer, 1 nmol each of dATP, dTTP and dGTP, 0.3 nmol [32P]dCTP (3000 Ci/mmol, Amersham), 2.5 mM-MgCl2, PCR buffer (Perkin-Elmer Cetus) and 1 U of AmpliTaq (Perkin-Elmer Cetus). The reactions were run for six cycles at 94 °C for 00 s, 54 °C for 50 s and at 72 °C for 4 min.

• cDNA library screening and cloning. A λgt11 cDNA library (1 × 10^6 plaques) from human glioma (Clontech #1049 b) was screened using the env probe from ERV-9. Plaque blotting to positively charged nylon filters (Hybond N+, Amersham) was done according to the manufacturer's recommendations. A low stringency hybridization was carried out in Rapid Hybridization solution (Amersham) containing 30% formamide at 42 °C. The washes were done at room temperature in 2 × SSC-0.2% SDS with several changes. Autoradiography was typically carried out overnight.

To amplify the positive cDNA clones we used PCR with the primer pair 522/523, which binds just outside the EcoRI cloning site of λgt11. The reaction mixture (100 µl) contained 0.1 µg of each primer, 5 µl phage suspension, 2.5 mM-MgCl2, 0.2 mM-dNTP, standard buffer (Perkin-Elmer Cetus) and 1 U of AmpliTaq (Perkin-Elmer Cetus). PCR was carried out for 30 cycles with a thermal profile of 95 °C for 45 s, 58 °C for 45 s and 72 °C for 90 s. The PCR products (cDNA inserts) were digested with EcoRI and cloned in EcoRI-digested and dephosphorylated M13mp18.

• Isolation of XA34 related retroviral elements. To get sequence information from ERVs closely related to XA34 we digested total DNA from a human male with EcoRI and size fractionated the digested DNA on a 0.8% agarose gel at 2 V/cm for 24 h. DNA from three regions that hybridized with the 378 bp XA34 probe was excised from the preparative gel and the DNA was extracted from the gel slices using Qiaex (Qiagen). The regions of interest were about 4.5, 7 and 20 kb in length. DNA from these three EcoRI gel slices was used for PCR amplification with the primer pair 593/693. The PCR fragments generated were cloned in the TA vector pT7Blue (AMS Biotechnology) and sequenced. This resulted in the identification of three human ERV sequences, XA35, XA36 and XA37, that were closely related to XA34. From the 4.5 kb band we identified XA35, from the 7 kb band we identified XA34 and XA35 and from the 20 kb band we identified XA37.

The 4.5 kb EcoRI fragment was ligated into λgt10 and packaged using a packaging extract (Gigapack II plus, Stratagene). Recombinant phages (50000) were screened with the 32P-labelled XA34 probe (see above). Three positive plaques were isolated. To amplify the positive clones we used PCR as described before with the primer pair 490/390, which binds outside the EcoRI cloning site of λgt10. The PCR products were separated and isolated from an agarose gel and the fragments were cloned in pT7Blue. The three clones were identical and after sequencing this human ERV was named XA38. The reason we did not amplify XA38 from the 4.5 kb slice by PCR is probably that the primers did not match this virus well enough. Primer 593 has four mismatches within 24 bases and primer 693 has four mismatches out of 22 bases. On the other hand, we probably would have picked up XA35 from the library of the 4.5 kb genomic EcoRI gel slice if we had screened more clones.
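Mismatch counts of this kind can be obtained by comparing a primer with the corresponding stretch of the target, position by position. The Python sketch below illustrates the idea. The target site is hypothetical: it is built by introducing four arbitrary substitutions into primer 593 and is not the actual XA38 sequence.

```python
def count_mismatches(primer: str, site: str) -> int:
    """Number of positions at which an ungapped primer and target site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


PRIMER_593 = "CGATGATCAACTATTCATAGATGG"

# Hypothetical substitutions (0-based position -> base), for illustration only.
SUBSTITUTIONS = {2: "T", 8: "G", 14: "C", 20: "C"}
hypothetical_site = "".join(
    SUBSTITUTIONS.get(index, base) for index, base in enumerate(PRIMER_593)
)

print(hypothetical_site)
print(count_mismatches(PRIMER_593, hypothetical_site), "mismatches in", len(PRIMER_593), "bases")
```

A simple count treats all positions alike. In practice, mismatches near the 3' end of a primer tend to impair amplification more than those near the 5' end, which this sketch does not model.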

• Sequencing and computer analysis. Dideoxy sequencing was performed according to Sanger et al. (1977), using Sequenase (USB) and [α-35S]dATP (Amersham). Electrophoresis was done on wedge-shaped sequencing gels (0.2-0.4 mm thick). The software package (version 7.0) from the University of Wisconsin Genetics Computer Group (Devereux et al., 1984) was used for the DNA sequence analysis. The programs used within this package were Bestfit, FastA, TFastA, Lineup, Pretty, Blast, Pileup, Distances and Growtree. The trees were constructed using Pileup to align the proteins. The phylogenies were then constructed using Distances with Kimura's parameters and Growtree with either the neighbour-joining algorithm or the UPGMA algorithm. The EMBL or GenBank nucleotide databases were used for homology searches using FastA, TFastA or Blast. We use continuously updated versions of these databases.
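The distance step can be illustrated with a short Python sketch. It computes the proportion of differing residues between two aligned protein sequences and applies a correction commonly attributed to Kimura (1983), d = -ln(1 - p - 0.2p^2). Whether this is exactly the calculation performed by the Distances program is an assumption, and the two peptides are hypothetical toy sequences, not data from this study.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of differing residues over aligned positions without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical correction for multiple substitutions in proteins."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


# Hypothetical aligned peptides, for illustration only.
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGHIKV"
p = p_distance(toy_a, toy_b)
print(f"p = {p:.2f}, corrected distance = {kimura_protein_distance(p):.4f}")
```

A matrix of such pairwise distances is the input that a clustering method such as neighbour-joining or UPGMA would then turn into a tree.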

• Southern blot hybridization with total DNA. Total human DNA was prepared from blood lymphocytes. The chimpanzee (Pan troglodytes) female DNA and testis tissue were kind gifts from Ulfur Arnason at the Department of Evolutionary Molecular Systematics, University of Lund. The orangutan (Pongo pygmaeus) material (blood lymphocytes from Anna and Dennis) was a kind gift from Jens Lilleor at the Aalborg Zoo in Denmark. The squirrel monkey (Saimiri sciureus) tissue (testis and kidney) was a kind gift from Maths Berlin at the Institute of Environmental Medicine, University of Lund. The macaque (Macaca fascicularis) kidney tissue was a kind gift from Thomas Brodin at Pharmacia in Lund. Rat DNA was prepared from the liver of a male Rattus norvegicus, strain Wistar Furth. DNA was isolated by standard methods. Total DNA was digested with either PstI (Boehringer Mannheim) or EcoRI (Boehringer Mannheim); 15 µg of the digested DNA was run on a 0.8% agarose gel at 1.3 V/cm. After electrophoresis the gel was treated according to standard techniques (Ausubel et al., 1987). The DNA was transferred from the gel to a Biodyne B membrane, baked at 80 °C for 2 h and hybridized at 65 °C in Rapid Hybridization solution (Amersham) overnight. The concentrations of the labelled probes in the hybridization solution were 5 × 10^5 d.p.m./ml. The filters were washed with two changes of 2 × SSC-0.5% SDS for 1.5 h each at room temperature and then with 1 × SSC-0.5% SDS for 1.5 h at 65 °C. Autoradiography of the filters was typically done overnight. Before reprobing, the filters were stripped according to the manufacturer's protocol and the blot was checked by autoradiography overnight at -70 °C.

• Retroviruses. The accession numbers for the sequences used are as follows: baboon endogenous retrovirus (BaEV), M16550; ERV-3, M12140; ERV-9, X57147; feline leukaemia virus (FeLV), M18247; HSRIRT, M64936; HSRTVE, D10450; HUMER41, M10976; RDl143, X51930; RESIV2DC, M16605; RESIVMPC, M12349; RGH1, D10083; RGH2, D11078; RTVLH2, M18048; SIVENVLTR, L38695; MLV, J02255, J02256, J02257; XA34, U29659; XA35, U37054; XA36, U37067; XA37, U29658; XA38, U37066.

Results

Low stringency screening of the λgt11 cDNA library from human glioma with the ERV-9 env probe yielded 50 positive clones. Seven of these clones were isolated and subcloned into the EcoRI site of M13mp18. Complete sequencing of four cDNA clones and partial sequencing of the three others showed that all were identical and came from the same ERV. The different clones varied in length from about 900 bp to 2303 bp but all had the 3' end with the poly(A) tail in common. This ERV was called XA34 after the longest cDNA clone with a length of 2303 bp. Of the other 43 cDNA clones that scored positive after the low stringency probing, none were of ERV-9 origin and none could be identified as being of retroviral origin.

Fig. 1. Schematic representation of the 2303 bp XA34 cDNA clone and the PCR-generated 936-939 bp clones XA35, XA36 and XA37. The 4.5 kb XA38 genomic clone is at the bottom. Restriction enzyme sites are indicated: E, EcoRV; P, PstI. The probes used for isolating the XA34 related elements and for the Southern blot analyses are indicated (bars) at the top of the figure. The positions of Alu repeats in XA34 and XA38 are marked.

Fig. 2. Aligned nucleotide and deduced amino acid sequences of XA34, XA38, XA35, XA36 and XA37. For legend see facing page.

XA34 starts from the C-terminal end of the reverse transcriptase of the pol region, followed by a partially truncated integrase and a truncated env region that ends with a 3' Alu-LTR (Fig. 1). The somewhat truncated integrase ends with a stop codon at position 1817 followed almost immediately (position 1819) by an incomplete transmembrane (TM) protein. The integrase of XA34 is about 80 amino acids shorter than the corresponding protein of XA38, FeLV or BaEV. The env region, which completely lacks the surface (SU) protein, is also truncated at the C-terminal end of the TM region. The transmembrane protein region has strong homology with the ERV-9 TM protein. So, it is rather likely that recombination of XA34 and an ERV-9-like retrovirus has taken place at about position 1817 resulting in XA34. Further, a second event has probably occurred where the C-terminal end has recombined with an Alu repeat. Interestingly, the truncated XA34 env region contains the conserved region that has been implicated in immune suppression (Cianciolo et al., 1985). However, it lacks the SU region and the C-terminal part of p15E including the TM region. The presence of several frame shift and stop codon mutations in the coding regions makes it unlikely that the pol-env transcript would result in either Pol or Env peptides or proteins.
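In-frame stop codons of the kind mentioned here can be located by walking through a sequence three bases at a time. The Python sketch below shows the idea on a hypothetical 12-base sequence; it is not taken from XA34 and is not the analysis software used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(sequence: str, frame: int = 0) -> list[int]:
    """0-based start positions of stop codons in the given forward reading frame."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = sequence.upper()
    return [
        start
        for start in range(frame, len(seq) - 2, 3)
        if seq[start:start + 3] in STOP_CODONS
    ]


# Hypothetical sequence, for illustration only: ATG TAA GGG TGA in frame 0.
toy_sequence = "ATGTAAGGGTGA"
for frame in range(3):
    print(f"frame {frame}: stops at {stop_positions(toy_sequence, frame)}")
```

Scanning all three forward frames in this way shows whether any frame remains open over a region. Frame shifts appear as the open frame switching from one frame to another along the sequence.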

We performed Southern blot analysis of EcoRI-digested human DNA using a 378 bp XA34 pol probe (data not shown). This Southern blot revealed that XA34 is a multi-copy ERV. After high stringency washes the blot showed strongest hybridization with mainly three fragments. These three EcoRI fragments were about 4.5 kb, 7 kb and 20 kb in length. From a preparative gel three gel slices containing the above EcoRI-digested DNA were excised. PCR amplification of DNA isolated from the three different gel slices using primers 593/693 resulted in the amplification of 940 bp fragments in each case. The primer pair 593/693 was chosen from XA34 to amplify a reasonably large part of pol. Primers were also chosen from somewhat conserved regions after comparison of XA34 with other ERVs. The PCR fragments were cloned and sequenced. Besides XA34, the sequencing identified three additional retroviruses closely related to XA34 that were named XA35, XA36 and XA37 (Figs 1 and 2).

In an attempt to isolate the env region from a member of the XA34 family we made a small library from a 4.5 kb EcoRI gel slice in λgt10. The small library was screened with the 378 bp XA34 pol probe and one positive clone of 4729 bp was further analysed. This ERV was, surprisingly, not identical to the clone XA35 that was identified after PCR of the 4.5 kb gel slice. This fifth ERV was designated XA38. The XA38 clone spanned from pol towards the 3' flanking side. The pol region spans from position 1 to position 1817 demonstrating an XA38 complete pol over this region.

The env region of XA34 is closely related to ERV-9 and shows almost no homology with the env region of the RGHs. This disparity, along with the missing splice acceptor site and gp70 region (see Fig. 3), clearly demonstrates that the env region of XA34 is a result of recombination. The ERV XA38 (Fig. 4) shows a cryptic env region. The XA38 pol stops at position 1877 and from position 2827 to 3000 there is an env region with strong homology to the env of RGH. This env region corresponds to a region from the end of the SU region to the beginning of the TM region. Computer analysis of the XA38 env region (Fig. 5) demonstrates that the closest relatives are RGH1 and RGH2, which coincides with the analysis of the pol region (see below). Hence, the env of XA38 is probably the original env, although this region of the retrovirus seems to have mutated more than the pol region and the start of the env region is difficult to recognize. The env region is also slightly truncated at the C-terminal end. Shortly after the beginning of


CCA CC43 GAG C-AC TGfl CAA ATA GAC TTC.~ AC<` CA<` ATT CAT CCC GTC AAG ACK~ ACA ~ TTT ATT C'yf aCT CTT ATA GAC ACC 1294 CCA C,~G GAG C.AC TGA CAG GTA C, AC TTC ACC CAC A~"~9 CCT CCC GTC AAG AAG ACA ~ TTT CTT C'I~I ~ ACT C'i'~T ATA GAC ACC IC08 CCA C.G~D CAG C4~C TCG CA). GTA GAC TTC ACC CAC ATO CCT CCC GTC AAG ACK~ ACA ~ TTT CTT Cri'~T ACT CTT ATA GAT ACC S?I CCA C,C*G A~KA C, AC TfJG CAG GTA GAC TTC ACC CAC ATG CCT CCC ATC ~ AGA ACA AAA TTT CT'f CT,i' ACT CTT ATA GAC A¢C 869 CCA C.CK~ GAG C,A.C TGG <`AA GTA GAC TTC ACC CAC ATC CTT CCC GTC AAA ~ AAT AAA TAT CTT CTT CTC CTC ATA C, AC ACC 872 P G E D W Q I D F T 14 I H p V K R T K F I L T L I D T P ~ ~ D • Q v D v T }~ M P p v K K T K F L n • L z D T p G Q D %q Q V D F T F~ M P P V K ~ T K F L L T L I D T P ~; K D W Q V D F T H M P P I K R T K F L L T L I D T P ~ ~ D W Q V U F T H I L P V K X N K V L L 5 L I D T

AAG GCC C, CA GAA GTC TCC CAA AT'i' CTT GTA ACA GAA ATe ATC COT ACA TT~ CC;T CTC CCT C,C,C TCC ATA CAA TCA GAC AAT 141~I AAG C, CC GCA GTA GTC TCT CAA AT<` CTT ACA ACA GAA ATC AT(: CCT ATA .TT CKIT CTC CCT ZAC TCC ATA CAA TCA GAC AAT 1127 AAO CCC C, CA GAA GTC TCC CAA ATT CTT G 938 AAG OCT C-CA GAA GTC TCC CAA ATT CTT G 936

GTC GC~, ~ GTC TGC C;~ ATT CTT G 9~9

K A A V V S I L T T E Z I p I x G L P H I S D N K A a ~ V s 0 z L K A A ~ V S Q I L K v A ~ v ~ e I L

TCT CAG TCC c T r GC-C ATe tAG TC, G CGT ¢'~C CAT ATC CCA TCC TGJ cC¢ tAG ACA T¢C C,~;~ A~a ~TC GAA AC,~ C-CA AAT ~ 1532 TTT CAG TCC CTT C.GC ~TC CAG AC-G :[~-,C TTC CGT AT(" CCA TAA TGG CCC CAG TCA TCC AGA ~ GTC AGA AGG GCA AAT C,C.G 1244

F S L ~ V R <̀ F R I P W P S $ R K V R R A N G

CAd~ AAA CCA KCK; ACC TCC CTT .~fG CCC ATA C, CA CTG GAG AGC AT'r AGA C.CC AGT CCA AAA C, CA CCC TCC TTC CTC AGT CCA 1652 CAA AAA CCA TC-G ACC TCC CTT TTG CCC ATA C, CA CTCI C.CC TC, C ATC AGA C, CA AOT CCG AAA TCA CCC TTC TTC CTT AC, C CCA 1364

K P X T S L L P I A L E S I R A S P K A p S F L S P K P W T S L L P I A L A ¢ I R A S P K S P F F L S P

AGG CCC CCT TCT AAC TCT CAG CTA C, GA GAA TAC CTC CCA ACA GTC TCC CTC ATG AC-C TAT C'fC CTC T<}C CAA CAA C.CC GAC I762 AC, G CCT CCT CTC GAC TCC CAG CTA C4ZA GAA TAT CTC CCA ACC TTC TCC CTT ATC CGT CAT CTC CTC TGT GAA ..... T GAT 1479 R P P S N S Q L G E Y L P T V S L M S Y L L C Q Q A D R P P L D $ Q L G E Y L P T F S L I R H L L C E X D

TAG 1822 ACT CTC CTA CCA GGA AAG TAT GTC T*'!'C CTA AAA ACT CTT AAC CCK ACA AGG CTA A~%A CCA AGG TC~.% GAA GC~C CCT TTT CAA 1599 K T L L P G K Y V F L K T L N p T R L K P R W E G P F Q

H a $ W Y H S R V K R P a V P T P Q T V L " Q S

T R N T L Q H T K Y •

Fig. 2. Alignment of po[ regions from XA34, XA35, XA36, XA37 and XA38. The deduced amino acids aFe shown below the

DNA sequences. Asterisks mark stop codons.

the TM region the sequence has an Alu repeat starting at about position 3050 (Fig. 4). No obvious 3' LTR can be identified but three Alu repeats downstream from the env region are recognizable.
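The deduced translations and stop codons referred to in the sequence figures can be illustrated with a short sketch. This is only a minimal example using the standard genetic code; the function names and the toy input string are assumptions made for illustration and are not taken from the study.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA string in the given frame (0, 1 or 2).

    Stop codons are written as '*'; codons with ambiguous bases become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def first_stop_position(dna, frame=0):
    """Return the 1-based position of the last base of the first stop codon,
    or None if the frame is open over the whole sequence."""
    index = translate(dna, frame).find("*")
    if index == -1:
        return None
    return frame + 3 * index + 3


# Hypothetical toy fragment, used only to show the calls.
toy = "CAGCATACCAAATATTAGAATCCC"
protein = translate(toy)
stop_at = first_stop_position(toy)
```

Scanning all three forward frames with `first_stop_position` is one simple way to see where an open reading frame such as pol ends in a cloned sequence.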

Computer analysis of the 940 bp pol fragments of XA34, XA35, XA36, XA37 and XA38 showed (Fig. 6) that these viruses are grouped together and that the closest relatives among the retroviruses described are RGH1 and RGH2 (Hirose et al., 1993). The RGHs belong to the RTVLH (Mager & Freeman, 1987) retroviral family, which, so far, is the most abundant retrovirus family found in the human genome. A recent publication by Mager & Freeman (1995) studies the pol region of a New World monkey ERV resembling the human RTVLH ERVs. This marmoset ERV, as well as other human ERVs from that study, groups clearly outside the XA34 ERVs when included in the analysis (data not shown).
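The kind of distance-based grouping described here (Kimura protein distance correction followed by UPGMA clustering, as named in the legends to Figs 5 and 6) can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the GCG Distances and Growtree programs used in the study: the correction is taken as d = -ln(1 - p - 0.2p²), gapped positions are skipped, and the sequence names and toy peptides are hypothetical.

```python
import math
from itertools import combinations


def kimura_protein_distance(seq_a, seq_b):
    """Kimura (1983) corrected distance between two aligned protein sequences."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)  # fraction of differing sites
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for the correction")
    return -math.log(arg)


def pairwise_distances(seqs):
    """Map frozenset({name_i, name_j}) to the corrected distance."""
    return {
        frozenset((a, b)): kimura_protein_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }


def upgma(names, dist):
    """Build a UPGMA tree as nested tuples from a pairwise distance map."""
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical aligned toy peptides (not data from the paper).
toy_seqs = {
    "s1": "KNQKLVLPQE",
    "s2": "KNQKLILPQE",
    "s3": "KSQKLILAQE",
    "s4": "KNRRAVFPGE",
}
tree = upgma(list(toy_seqs), pairwise_distances(toy_seqs))
```

UPGMA assumes a roughly constant rate of change along all branches, so a tree built this way is best read as a grouping by overall similarity.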

To generate probes that could discriminate between the different retroviruses we performed PCR with the primer pair 593/2793 on the cloned 940 bp fragments. Primer 2793 was chosen from a conserved region so that the primer pair 593/2793 could amplify all five cloned retroviruses. This PCR generated a 142-155 bp probe from each clone and these short probes were used for Southern blot analysis. Southern blot hybridizations of the 142-155 bp probes from XA34-XA38 to PstI-digested male DNAs from man, chimpanzee, orangutan, squirrel monkey and rat are shown in Fig. 7.

The hybridization pattern shown in Fig. 7 is well conserved among the primates. The chimpanzee is most similar to the human but the orangutan also shows striking homology. After the rather high stringency washes (1 x SSC at 65 °C for 90 min) hybridization to DNA from squirrel monkey, representing a New World monkey, gives conspicuous bands. The positions of the bands in the hybridization to the squirrel monkey DNA are different from those for the chimpanzee and the orangutan. Considering the stringency of the washes and the conspicuous bands, we interpret the bands in the squirrel monkey to represent viruses common to those in the other primates and not of separate origin. The different positions of the bands in the squirrel monkey probably reflect mutations in the restriction enzyme site(s) resulting in alternative fragment lengths. If this is so, the introduction of the XA34 endogenous retroviruses in the primates took place between 35 and 45 million years ago, which is the estimated time of divergence between the Old and New World monkeys.

Fig. 3. Sequence of XA34 with deduced translation of the pol region and the ERV-9-like env region (underlined). The Alu repeat is marked. Asterisks mark stop codons.

After hybridization with the specific 142-155 bp pol probes of XA34-XA38 (Fig. 7) bands that are not only cognate but also homologous can in some cases be resolved. It is obvious that, in the primate genomes, there are many more copies of endogenous retroviruses related to XA34 than just the five cloned members. In the hybridization with XA35 (Fig. 7) 16 bands can be seen. We chose the short pol probe since there are significant differences between the viruses within the inter-primer region. Moreover, there is no PstI site in the cloned retroviruses within the probes and the chance of having a PstI site in a ~150 bp sequence is low. Hence, it is likely that one band represents one endogenous retrovirus, which means that the XA34 family consists of about 16 members in the human genome. From the hybridization it can be noted that some of the bands are denser than others. Since the density varies with different probes in Fig. 7 this is probably caused by homologous hybridization and not by either multiple bands or a conserved PstI site within several endogenous retroviruses. On this basis it is likely that the number of XA34-like retroviruses in the genome is close to 16.
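The statement that a PstI site is unlikely to occur within a probe of about 150 bp can be checked with a rough calculation. The sketch below assumes equal base frequencies and treats each 6 bp window as independent, which is only an approximation; the PstI recognition sequence is CTGCAG, and the function names are hypothetical.

```python
PSTI_SITE = "CTGCAG"  # PstI recognition sequence (palindromic, so one strand suffices)


def find_sites(seq, site=PSTI_SITE):
    """Return the 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def prob_at_least_one_site(length, site_len=len(PSTI_SITE)):
    """Approximate chance that a random sequence of the given length
    contains at least one copy of a specific site of site_len bases."""
    windows = max(length - site_len + 1, 0)   # possible start positions
    per_window = 0.25 ** site_len             # chance of a match at one position
    return 1.0 - (1.0 - per_window) ** windows


chance_150bp = prob_at_least_one_site(150)
```

Under these assumptions the chance for a 150 bp probe is between 3% and 4%, which is in line with reading each band as a single element.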

From the hybridization with the XA34 probe (Fig. 8) weaker double bands of about 3 kb in size are seen in both the human and the chimpanzee male DNA, whereas only the shorter of these bands can be seen in the female DNAs. We propose that this band in the male represents an ERV on the Y chromosome. It is rather unlikely that this extra band should represent an RFLP that was recognized both in the human and chimpanzee DNA. Moreover, we see this extra band in other human male DNA of different racial origin (data not shown). However, RFLP can be seen and one example of this is the 5 kb band and the ~4.4 kb band seen in the human male and female DNA respectively. The female used in this blot is the only female where this shorter band of 4.4 kb has been detected. The human and chimpanzee male DNA 3 kb band cannot be detected in the orangutan. However, since the homology between this virus present on the Y chromosome and XA34 is more limited we cannot decide whether this virus is present or not in the orangutan lineage, although an extra band of about 2-3 kb can be detected in the male orangutan. We have previously reported that some endogenous retroviruses are over-represented on the Y chromosome (Kjellman et al., 1995). The hybridization to the female chimpanzee DNA is somewhat fuzzy because this DNA was of lower molecular mass and it was not possible to obtain good signals from longer fragments without loading about double the amount of DNA. This overloading of the female chimpanzee DNA led to bands that are somewhat out of position and, hence, give stronger signals with lower molecular mass DNA. We kept this lane in Fig. 8 as we think that it is still informative.

Fig. 4. Sequence of the XA38 4729 bp genomic EcoRI fragment. The pol region from position 1-1817 is underlined, ending with a stop codon (marked with an asterisk). The env region follows pol, and a region of the env at the end of the SU region and the beginning of the TM region is translated. Three Alu repeats that follow the truncated env are underlined.

Fig. 5. Phylogenetic relationships between Env proteins of XA38, RTVLH-RGH1, RTVLH-RGH2, ERV-3, HSRIRT, HUMER4-1, HSRTVE, BaEV, RD1143, RESIVMPC, SIVENVLTR and RESIV2DC given as a tree. The aligned protein of XA38 is the translation from Fig. 4 and the corresponding proteins from the other retroviruses. The programs used for aligning the sequences were Pileup and Lineup. The phylogeny was reconstructed using the programs Distances and Growtree. The Kimura protein distance correction algorithm was used together with the UPGMA algorithm.

Fig. 6. Phylogenetic relationships between Pol proteins and deduced Pol proteins of XA34, XA35, XA36, XA37, XA38, RTVLH2, RTVLH-RGH1, RTVLH-RGH2, ERV-9, HUMER4-1, MLV, FeLV and BaEV given as a tree. The aligned sequences of the XA34 family can be seen in Fig. 2. The Pol proteins for the other retroviruses are the corresponding sequences. The programs used for aligning the sequences were Pileup and Lineup from the University of Wisconsin Genetics Computer Group (Devereux et al., 1984). The phylogeny was reconstructed using the programs Distances and Growtree. The Kimura protein distance correction algorithm was used together with the UPGMA algorithm.

Discussion

We describe the isolation and characterization of the human ERVs XA34-XA38. The evolutionary relationships of these retroviruses (Fig. 6) imply that they should be classified as a new ERV family.

For XA34, DNA sequence comparisons suggest that this virus has recombined with an ERV-9-like retrovirus at about position 1817 and during the process of recombination lost the C-terminal end of pol and most of the ERV-9 env region (Fig. 3). In order to determine the correct structure of the env region for the XA34 family of endogenous retroviruses we cloned and sequenced the 3' flanking region of the related XA38. This virus had a complete pol region when compared with BaEV and FeLV. The XA38 env region is somewhat difficult to identify but there is a region from position 2827 to 3000, corresponding to the end of the SU and the beginning of the TM region, that can be tracked. This region shows the strongest homology with the RGHs (Fig. 5), which makes it plausible that this is a true env region.

This env region is not complete and starting at position 3050 we can identify an Alu repeat. This Alu repeat is followed by two additional ones and this makes it likely that the original XA38 has lost the end of the env and the 3' LTR completely.

Fig. 7. Southern blot analysis of male DNA from human, chimpanzee, orangutan and squirrel monkey hybridized with the 142-155 bp pol probes from XA34, XA35, XA36, XA37 and XA38 respectively. The DNAs were digested with PstI. Molecular size markers (HindIII-digested λ DNA) are indicated at the left and right.

The Southern blot analyses in Fig. 7 and the sequence analyses of pol fragments (Fig. 6) from five members of the XA34 family show that the elements within the family are closely related to each other. From the Southern blot analyses in Figs 7 and 8 it can be seen that the XA34 ERV family is present in all primate lineages tested. Although lemurs and other more primitive primates were not studied, 12 related retroviruses were detected in the squirrel monkey, a New World monkey. These 12 bands in the squirrel monkey are difficult to identify in Fig. 7, where no more than seven bands can be distinguished, but after prolonged exposure 12 bands are detectable. Hence, XA34 retroviruses probably entered the primate genome about 40-45 million years ago. The separation of the great apes (humans, chimpanzees, gorillas and orangutans) from the monkeys took place about 35 million years ago and was preceded by the separation of New World and Old World monkeys that took place about 40-45 million years ago. It is difficult to decide whether XA34 was introduced in the genome as one single insertion and later amplified or if multiple infections of the germ-line have taken place.

Fig. 8. Southern blot analysis of male and female DNA from human, chimpanzee, orangutan, squirrel monkey and of male rat DNA hybridized with the 142 bp pol probe from XA34. The DNAs were digested with PstI. Molecular size markers (HindIII-digested λ DNA) are indicated at the left.

It is likely that at least one copy of an XA34-related endogenous retrovirus is located on the Y chromosome. Expression of sequences on the Y chromosome is very limited, especially expression of sequences within or in the vicinity of the larger heterochromatic part of this chromosome. We argue that an over-representation of endogenous retroviruses (see Kjellman et al., 1995) on the Y chromosome is mediated by three main factors: (i) absence of recombination of the Y chromosome, which makes it more difficult for sequences to be lost; (ii) integration of retroviruses in or near heterochromatic regions is less harmful to the organism; and (iii) fixation on the Y chromosome is four times faster than elsewhere in the genome within a population that has an equal sex ratio. The normal time for fixation of a neutral mutation is 4N generations, where N is the population size. For fixation on the Y chromosome the figure is 2N generations with N taken as the population size of males alone, which is about half the total population; this gives about N generations, one quarter of the normal time.
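The fixation-time argument can be written out as a small calculation. The sketch below simply encodes the figures given in the text (4N generations normally, 2N generations for the Y chromosome with N counted over males only) under an equal sex ratio; the function names and the example population size are hypothetical.

```python
def fixation_time_normal(population_size):
    """Expected generations to fixation of a neutral mutation (4N)."""
    return 4 * population_size


def fixation_time_y(number_of_males):
    """Expected generations to fixation on the Y chromosome (2N, males only)."""
    return 2 * number_of_males


# Illustrative population with an equal sex ratio (assumed size).
population = 10_000
males = population // 2

normal = fixation_time_normal(population)  # 4 * 10000 generations
on_y = fixation_time_y(males)              # 2 * 5000 generations
speed_up = normal / on_y                   # how many times faster on the Y
```

With these numbers the ratio is exactly four, which is the factor quoted in point (iii).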

XA34 was first isolated as a cDNA from a glioma cDNA library. It is an intriguing question why this virus is expressed in this malignant tissue. A study has been initiated to analyse the expression of endogenous retroviruses of the XA34 family in a range of malignant and normal tissues (C. Kjellman and others, unpublished). What we can say so far is that the XA34 virus is abundantly expressed in normal human placenta. Several endogenous retroviruses have been reported previously to be expressed in the placenta (Kato et al., 1987; Lyden et al., 1994; Rabson et al., 1985). It is noteworthy that if an endogenous retrovirus that is present only on the Y chromosome is expressed and translated to a functional peptide, such a peptide would by definition represent an HY-antigen (Wiberg, 1987). It is also well known that females, after giving birth to several male offspring, have antibodies that react with male cells. Some of these HY-antigens could simply be of retroviral origin, having nothing to do with sex differentiation.

We would like to thank Ms Ingar Nilsson for skilful technical assistance. This work was supported by the Swedish Cancer Foundation, the Swedish Medical Research Council, the Nilsson-Ehle Foundation, the Blücher Foundation, the John and Augusta Persson's Foundation and the Medical Faculty of the University of Lund.

References

Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A. & Struhl, K. (1987). Current Protocols in Molecular Biology. New York: Greene Publishing Associates and Wiley-Interscience.

Bonner, T. I., O'Connell, C. & Cohen, M. (1982). Cloned endogenous retroviral sequences from human DNA. Proceedings of the National Academy of Sciences, USA 79, 4709-4713.

Cianciolo, G. J., Copeland, T. D., Oroszlan, S. & Snyderman, R. (1985). Inhibition of lymphocyte proliferation by a synthetic peptide homologous to envelope proteins of retroviruses. Science 230, 453-455.

Devereux, J., Haeberli, P. & Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research 12, 387-395.

Gallo, R. C. (1995). Human retroviruses in the second decade: a personal perspective. Nature Medicine 1, 753-759.

Goodchild, N. L., Wilkinson, D. A. & Mager, D. L. (1993). Recent evolutionary expansion of a subfamily of RTVL-H human endogenous retrovirus-like elements. Virology 196, 778-788.

Hirose, Y., Takamatsu, M. & Harada, F. (1993). Presence of env genes in members of the RTVL-H family of human endogenous retrovirus-like elements. Virology 192, 52-61.

Kato, N., Pfeifer-Ohlsson, S., Kato, M., Larsson, E., Rydnert, J., Ohlsson, R. & Cohen, M. (1987). Tissue-specific expression of human provirus ERV3 mRNA in human placenta: two of the ERV3 mRNAs contain human cellular sequences. Journal of Virology 61, 2182-2191.

Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.

Kjellman, C., Sjögren, H.-O. & Widegren, B. (1995). The Y chromosome: a graveyard for endogenous retroviruses. Gene 161, 163-170.

La Mantia, G., Maglione, D., Pengue, G., Di Christofano, A., Simeone, A., Lanfrancone, L. & Lania, L. (1991). Identification and characterization of novel human endogenous retroviral sequences, preferentially expressed in undifferentiated embryonal carcinoma cells. Nucleic Acids Research 19, 1513-1520.

Larsson, E., Kato, N. & Cohen, M. (1989). Human endogenous proviruses. Current Topics in Microbiology and Immunology 148, 115-132.

Leib-Mösch, C., Brack-Werner, R., Werner, T., Bachmann, M., Faff, O., Erfle, V. & Hehlmann, R. (1990). Endogenous retroviral elements of human DNA. Cancer Research 50, 5636s-5642s.

Lyden, T. W., Johnson, P. M., Mwenda, J. M. & Rote, N. S. (1994). Ultrastructural characterization of endogenous retroviral particles isolated from normal human placentas. Biology of Reproduction 51, 152-157.

Mager, D. L. & Freeman, J. D. (1987). Human endogenous retroviruslike genome with type C pol sequences and gag sequences related to human T-cell lymphotropic viruses. Journal of Virology 61, 4060-4066.

Mager, D. L. & Freeman, J. D. (1995). HERV-H endogenous retroviruses: presence in the New World branch but amplification in the Old World primate lineage. Virology 213, 395-404.

O'Connell, C., O'Brien, S., Nash, W. G. & Cohen, M. (1984). ERV3, a full-length human endogenous provirus: chromosomal localization and evolutionary relationships. Virology 138, 225-235.

Rabson, A. B., Hamagishi, Y., Steele, P. E., Tykocinski, M. & Martin, M. A. (1985). Characterization of human endogenous retroviral envelope RNA transcripts. Journal of Virology 56, 176-182.

Repaske, R., Steele, P. E., O'Neill, R. R., Rabson, A. B. & Martin, M. A. (1985). Nucleotide sequence of a full-length human endogenous retroviral segment. Journal of Virology 54, 764-772.

Sanger, F., Nicklen, S. & Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, USA 74, 5463-5467.

Swofford, D. L. (1993). Phylogenetic analysis using parsimony, version 3.1.1. Illinois Natural History Survey, Champaign, Ill., USA.

Wiberg, U. H. (1987). Facts and considerations about sex-specific antigens. Human Genetics 76, 207-219.

Wilkinson, D. A., Freeman, J. D., Goodchild, N. L., Kelleher, C. A. & Mager, D. L. (1990). Autonomous expression of RTVL-H endogenous retroviruslike elements in human cells. Journal of Virology 64, 2157-2167.

Wilkinson, D. A., Goodchild, N. L., Saxton, T. M., Wood, S. & Mager, D. L. (1993). Evidence for a functional subclass of the RTVLH family of human endogenous retrovirus-like sequences. Journal of Virology 67, 2981-2989.

Received 8 January 1996; Accepted 29 March 1996