Telomere and subtelomere of Trypanosoma cruzi chromosomes are enriched in (pseudo)genes of retrotransposon hot spot and trans-sialidase-like gene families: the origins of T. cruzi telomeres☆

Dong Kim a,1, Miguel Angel Chiurillo b,c,1, Najib El-Sayed d, Kristin Jones d, Márcia R.M. Santos a, Patricio E. Porcile a, Bjorn Andersson e, Peter Myler f, José Franco da Silveira a, José Luis Ramírez c,g,*

a Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, UNIFESP, Rua Botucatu, 862, CEP 04023-062, S. Paulo, Brazil
b Decanato de Medicina, Universidad Centro Occidental Lisandro Alvarado, Barquisimeto, Venezuela
c Centro de Biotecnologia, Instituto de Estudios Avanzados (IDEA), carretera Nacional Hoyo de la Puerta, Caracas 1080, Caracas, Venezuela
d Parasite Genomics, Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
e Center for Genomics and Bioinformatics, Karolinska Institutet, SE-171 77 Stockholm, Sweden
f Seattle Biomedical Research Institute, 307 Westlake Av. N, Suite 500, Seattle, WA 98109, USA
g Laboratorio de Genetica Molecular, Instituto de Biologia Experimental Universidad Central de Venezuela, calle Suapure, Colinas de Bello Monte, Caracas 1041-A, Venezuela

Received 16 May 2004; received in revised form 6 September 2004; accepted 14 October 2004
Available online 28 January 2005
Received by F.G. Alvarez-Valin

Gene 346 (2005) 153–161
www.elsevier.com/locate/gene

Abstract

Here, we sequenced two large telomeric regions obtained from the pathogenic protozoan Trypanosoma cruzi. These sequences, together with in silico assembled contigs, allowed us to establish the general features of telomeres and subtelomeres of this parasite. Our findings can be summarized as follows: we confirmed the presence of two types of telomeric ends; subtelomeric regions appeared to be enriched in (pseudo)genes of RHS (retrotransposon hot spot), TS (trans-sialidase)-like proteins, and putative surface protein DGF-1 (dispersed gene family-1). Sequence analysis of the ts-like genes located at the telomeres suggested that T. cruzi chromosomal ends could have been the site for generation of new variants of gp85, an important adhesin molecule involved in the invasion of mammalian cells by T. cruzi. Finally, a mechanism for the generation of T. cruzi telomeres by chromosome breakage and telomere healing is proposed.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Pathogen protozoan; Chromosome end organization; Surface protein genes; Retrotransposon Hot Spot multigene family; Telomere healing

Abbreviations: aa, amino acid(s); ASP-2, amastigote surface protein 2; BAC, bacterial artificial chromosome; dgf-1, dispersed gene family-1; bp, base pair(s); EST, expressed sequence tag; GPI, glycosylphosphatidylinositol; GP85, surface glycoprotein of 85 kDa; kbp, kilobase pair(s); LTR, long terminal repeat; nt, nucleotide; ORF, open reading frame; RHS, retrotransposon hot spot; SIRE, short interspersed repetitive element; SAS, SIRE-associated sequence; Tctel, Trypanosoma cruzi telomeric sequence; TS, trans-sialidase; UTR, untranslated region; VATc, Trypanosoma cruzi telomeric sequence cloned by vector-adaptor strategy; VIPER, vestigial interposed retroelement.

☆ Sequences in this work were deposited in GenBank® with accession numbers: BAC D6C, AY551440, and C6, AY552588.
* Corresponding author. Centro de Biotecnologia, Instituto de Estudios Avanzados (IDEA), carretera Nacional Hoyo de la Puerta, Caracas 1080, Caracas, Venezuela. Tel.: +58 212 962 1605; fax: +58 212 962 1120.
E-mail address: [email protected] (J.L. Ramírez).
1 These authors contributed equally to this work.

0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2004.10.014


D. Kim et al. / Gene 346 (2005) 153–161

1. Introduction

Trypanosoma cruzi is a protozoan parasite that causes Chagas disease, an incurable and debilitating illness affecting 16–18 million people in Latin America (WHO, 2002). Besides their medical importance, trypanosomatids are evolutionarily interesting since they belong to one of the earliest groups of mitochondria-containing eukaryotes, have a highly plastic genome, and have an unusual gene organization (Zingales et al., 1997; Myler et al., 1999; Anderson et al., 1998). Several parasites, such as Plasmodium and Trypanosoma brucei, have developed sophisticated evasion mechanisms to adapt to the hostile environment posed by the host, such as exposing variable surface antigens to escape the immune system. Genes coding for surface antigens in these organisms are located at subtelomeric regions, and it has been speculated that this preferred location facilitates gene switching and expression, and the generation of new variants (Cano, 2001; Barry et al., 2003).

In previous work, we described the basic elements of T. cruzi telomeres and found that they were enriched in (pseudo)genes from the ts (trans-sialidase)-like family and sequences related to VIPER (vestigial interposed retroelement) (Chiurillo et al., 1999, 2002). Members of the ts-like gene family display great sequence diversity and encode many surface proteins related to cell invasion, virulence, and evasion of the host immune system (Weston et al., 1999; Frasch, 2000). We speculated that the preferred telomeric location of the ts-like family genes could be connected to the generation of variants via non-homologous recombination (Chiurillo et al., 1999, 2002).

Here, we sequenced and analyzed two large telomeric regions from T. cruzi that, together with the information in the T. cruzi Genome Project database, allowed us to draw a finer picture of the organization of the telomeric and subtelomeric regions of this parasite. We also discuss the functional importance of this organization to explain the generation of genetic variability, and the origins of T. cruzi telomeres.

2. Materials and methods

2.1. Nucleotide (nt) sequencing of BAC and cosmid telomeric clones

The telomeric BAC (BAC D6C) and cosmid (C6) recombinants studied here were selected from T. cruzi (clone CL Brener) libraries constructed in pBeloBAC11 or cosmid Lawrist-7 by hybridization with an 18-mer telomeric probe (5′-CCCTAA-3′)3 (Chiurillo et al., 1999, 2002). Sheared DNA from the selected recombinants (1.6–2 kbp) was cloned into a modified pUC18 vector via BstXI linkers. Sequences were assembled using the TIGR assembler, and gaps were closed using a combination of BAC walking, directed PCR, or transposon insertion. In BAC D6C, end-sequence reads confirmed the presence of the telomeric adaptor and the BamHI site used for cloning. Open reading frames (ORFs) were assigned using the NCBI programs BLASTN, BLASTX, and ORF Finder.
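As a quick sanity check on the probe described above, the following minimal sketch confirms that the 18-mer (5′-CCCTAA-3′)3 is the reverse complement of three copies of the telomeric hexamer 5′-TTAGGG-3′, which is why it can hybridize to the G-rich telomeric strand. The helper name `reverse_complement` is illustrative and is not part of the original methods.

```python
# Illustrative helper (not from the original methods): reverse-complement a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

probe = "CCCTAA" * 3    # the 18-mer probe, written 5'->3'
target = "TTAGGG" * 3   # three copies of the telomeric hexamer, written 5'->3'

print(len(probe))                           # 18
print(reverse_complement(probe) == target)  # True
```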

2.2. In silico assembling of telomeric contigs

We searched the TIGR T. cruzi WGS database, which contains 994,060 sequences and represents approximately 21-fold coverage of the T. cruzi haploid genome (ca. 43 Mb in clone CL Brener), for contigs containing the 189-bp telomeric junction; these contigs were then extended towards the subtelomeric regions using DNASTAR MegAlign software. To validate the in silico assembly, primers based on the contig sequences were used to amplify specific fragments from T. cruzi genomic DNA by polymerase chain reaction. Amplified fragments of the expected size were cloned into a plasmid vector and sequenced.
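The database figures quoted above can be related to one another with a back-of-the-envelope calculation. The sketch below assumes the usual definition of coverage as (number of reads × mean read length) / genome size; this definition and the function name are assumptions made for illustration, and the resulting read length is only what the stated numbers imply, not a value reported in the text.

```python
# Figures taken from the text; the coverage formula itself is an assumption.
n_reads = 994_060        # sequences in the WGS database
genome_bp = 43_000_000   # haploid genome size, ca. 43 Mb
coverage = 21            # approximate fold coverage

def implied_mean_read_length(n_reads: int, genome_bp: int, coverage: float) -> float:
    """Mean read length (bp) implied by coverage = n_reads * read_length / genome_bp."""
    return coverage * genome_bp / n_reads

mean_len = implied_mean_read_length(n_reads, genome_bp, coverage)
print(round(mean_len))   # 908
```

Under that assumption, the stated numbers correspond to reads of roughly 0.9 kbp on average.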

3. Results

3.1. Sequence organization of T. cruzi telomeres

Assembly of shotgun sequences from recombinant BAC D6C and BLAST search analysis allowed the construction of the map presented in Fig. 1. The characteristics of the 29,248-bp contig in BAC D6C are summarized as follows, from telomere to centromere (right to left).

(1) A type II telomere sequence (Chiurillo et al., 1999) whose basic elements are 66 copies of the hexameric repeat 5′-TTAGGG-3′, followed by a 189-bp junction, a truncated 100-bp sequence from the 5′-UTR (untranslated region) of a gp85 gene (GenBank accession no. M64836) from group II of the ts-like family, the spacer between two gp85 genes, a 530-bp sequence without homology in the GenBank database, and the short interspersed repetitive element (SIRE)-associated sequence (SAS) SZ23 (Vazquez et al., 1999, 2000). In addition, we identified two sequences between nt 26,572 and 27,845 that share 84–86% identity with the recently described Seq3Tc sequences that flank the L1Tc non-long terminal repeat (LTR) retrotransposon (Olivares et al., 2000). These elements are part of (pseudo)genes of the T. cruzi rhs family (Bringaud et al., 2002a). Translation of the subtelomeric rhs-related sequences showed the presence of three putative peptides of 115, 344, and 178 aa (amino acids), and the derived aa sequences shared 28–30% identity with the T. brucei RHS proteins, including an ATP/GTP binding motif and a putative insertion site for retroelements (Bringaud et al., 2002a) (Fig. 2). This result confirmed the presence of T. brucei RHS-like sequences in the subtelomeric regions of T. cruzi (Bringaud et al., 2002a). The block containing the telomeric unit (except for the 189-bp junction) plus the rhs (pseudo)gene (Seq3Tc sequence) is 2.6 kbp long, and part of it was duplicated upstream of an asp-2 (amastigote surface protein-2) (pseudo)gene (coordinates 18,156 to 20,611) from group II of the ts-like family (Low and Tarleton, 1997).

Fig. 1. Sequence organization of the telomeric BAC D6C recombinant. Black arrows indicate the sense of genes. Different sequences are identified on top: ψgp85 and ψrhs, (pseudo)genes from the gp85 (group II of the ts-like family) and rhs multigene families, respectively; dgf-1 gene from dispersed gene family-1; L1Tc non-LTR retrotransposon; L1Tc flanking sequences RS13Tc, RS1Tc and Seq3Tc; SIRE-associated sequence SZ23; 189-bp junction; (TTAGGG)66 telomeric repeats. Black and white bars mark repeated sequence regions.


(2) After the previously described block, there is a 5-kbp region containing an asp-2 (pseudo)gene whose sense strand is oriented towards the telomere and which presents several frameshifts and stop codons. The translated peptides of the (pseudo)gene shared 51–73% identity with ASP-2 (Fig. 2). Several characteristic motifs of the ts-like gene family (Frasch, 2000), such as two Asp boxes (SxDxGxTW), the subterminal motif VTVxNVfLYNR, and the hydrophobic C-terminus characteristic of glycosylphosphatidylinositol (GPI)-anchored proteins, were also observed (Fig. 2). The region carrying the Asp boxes corresponds to a sialidase/neuraminidase domain that binds and hydrolyzes terminal sialic acid residues from glycoconjugates. Finally, the (pseudo)gene also encoded a 96-aa peptide corresponding to the ASP-2 amino-terminal domain, including two alternative initiator methionines and a typical amino-terminal signal sequence made of basic residues plus hydrophobic aa, common to several members of the ts-like family (Takle and Cross, 1991; Carmo et al., 2002).

(3) Next, there is a duplicated region of 2455 bp (see above). Computer translation of this region revealed four peptides corresponding to the central domain (aa 247 to 635) of RHS proteins. Again, this rhs (pseudo)gene shared sequence homology at the nucleotide level with the L1Tc flanking sequences RS13Tc (three blocks, 82–90% identity) and RS1Tc (five blocks, 83–91% identity), and with the SIRE-associated sequence SZ23. Recently, Olivares et al. (2000) have shown that the non-LTR retrotransposon L1Tc (Martin et al., 1995) is frequently found inserted between RS1Tc and Seq3Tc fragments, flanked by short direct repeats (~9 bp). BAC D6C did not contain L1Tc, and the short direct repeats were slightly changed from TGCAGACAT to TGCAGGCAT.

(4) After the rhs (pseudo)gene, there is a 10,425-bp ORF encoding a complete protein of 3475 aa sharing 82% identity with a putative T. cruzi surface protein called DGF-1 (dispersed gene family-1) (Wincker et al., 1992). A nucleotide sequence identity search of the GenBank database revealed a high percentage identity of dgf-1 with 15 ESTs (expressed sequence tags) and two clones (GenBank accession nos. AF480942, AF480943) from an amastigote stage-specific cDNA library, indicating that this is an actively transcribed gene.

(5) At the end of BAC D6C, there is another gp85 (pseudo)gene with 86–93% DNA sequence identity with the p85.1 and p85.2 genes from group II of the ts-like family (Weston et al., 1999). This (pseudo)gene had several frameshifts and stop codons, and the translated peptides share 63–72% sequence identity with the central and carboxy-terminal regions of P85.1 and P85.2 (Fig. 2). It is also flanked by a truncated form of the non-LTR retrotransposon L1Tc, and by the SIRE-associated sequences SZ10 and SZ31 at its 5′ and 3′ ends, respectively.

To compare the organization found in BAC D6C with other T. cruzi chromosome ends, we sequenced and analyzed the 34,387-bp insert in recombinant cosmid 6 (C6) (Chiurillo et al., 1999) (Fig. 3B). C6 presented an overall sequence organization similar to that of BAC D6C, with two copies of gp85 (pseudo)genes, the first located next to the telomeric hexameric repeats, and the second at ~30 kbp from the first (from telomere to centromere) and immediately adjacent to a dgf-1 gene. Both gp85 (pseudo)genes showed a high degree of identity with protein genes from group II of the ts-like gene family (Takle and Cross, 1991), and as in recombinant BAC D6C, the sense of transcription was oriented towards the telomere (Fig. 2). Between the dgf-1 gene and the telomeric end, C6 also displayed an rhs (pseudo)gene that shared sequence homology with the L1Tc flanking sequences (RS1Tc, Seq3Tc) and the SIRE-associated sequence SZ23. After the second gp85 (pseudo)gene, we also identified two sequences that shared 83–87% identity with Seq3Tc (Olivares et al., 2000). Again, translated peptides from this region shared identity with the central domain of T. brucei RHS proteins (Bringaud et al., 2002a) (Fig. 2). Remarkably, rhs (pseudo)genes found in several BAC genomic clones shared the same overall organization as those of BAC D6C and C6 (Fig. 3D), except that in some of these BACs, this cluster was interrupted by the retrotransposon L1Tc (Olivares et al., 2000; Bringaud et al., 2002a).
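The ts-like family motifs mentioned in item (2) above, the Asp box SxDxGxTW and the subterminal motif VTVxNVfLYNR, can be located in a translated sequence with a simple pattern search. The sketch below is illustrative only: it assumes that "x" stands for any residue and, as a further assumption, treats the lower-case "f" as a plain phenylalanine. The peptide used is a made-up toy string, not a real GP85 or ASP-2 sequence.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Convert a motif such as 'SxDxGxTW' to a regex; 'x' matches any residue.

    Assumption: any other lower-case letter (e.g. the 'f' in 'VTVxNVfLYNR')
    is treated as the corresponding upper-case residue.
    """
    return "".join("." if ch == "x" else re.escape(ch.upper()) for ch in motif)

def find_motif(motif: str, protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

# Made-up toy peptide used only to exercise the functions.
toy = "MKSADGGKTWAAVTVKNVFLYNRPL"
print(find_motif("SxDxGxTW", toy))     # [(2, 'SADGGKTW')]
print(find_motif("VTVxNVfLYNR", toy))  # [(12, 'VTVKNVFLYNR')]
```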

Fig. 2. Amino acid alignment of different domains of GP85 proteins encoded by subtelomere-associated loci. Alignments were done by Clustal W with the MegAlign program (DNASTAR). Sequences are as follows: ASP-2 (GenBank accession no. U77951); B6-GP85-1 from BAC D6C; C6-GP85-1 from cosmid C6; S2-GP85-1 and S2-GP85-3 from Silico-2. (Panel A) Amino-terminal domain. Potential initiator methionine residues are indicated by asterisks. The overline indicates a predicted amino-terminal signal peptide (N). (Panel B) Central domain encoding the sialidase conserved domain. (Panel C) Carboxy-terminal domain. Overlines indicate a typical subterminal motif VTVxNVfLYNR (VTV) of T. cruzi ts and ts-like gene families, and the hydrophobic C-terminal region characteristic of GPI-anchored proteins. Dashes were introduced to optimize the alignment. Conserved residues are shaded in black (100% conservation), gray (≥80% conservation), and light gray (≥60% conservation); no shading denotes residues with ≤60% conservation.


Previous screenings of the BAC telomeric library showed that some chromosomal ends lack the dgf-1 gene, gp85-like sequences, or both (Chiurillo et al., 1999, 2002). To confirm this finding, we examined the chromosomal distribution of dgf-1 in T. cruzi CL Brener. This experiment (data not shown) revealed that dgf-1 sequences were present in 15 out of 20 chromosomal bands separated by pulsed-field gel electrophoresis, confirming the absence of dgf-1 sequences from some chromosomes.

Fig. 3. Comparison of T. cruzi telomeric and subtelomeric regions: (A) telomeric BAC D6C, (B) telomeric cosmid C6, (C) in silico assembled contigs Silico-1 and Silico-2, and (D) genomic clones (Olivares et al., 2000): F1 from T. cruzi strain Maracay, and pBAC62 and BAC52 from strain CL Brener. All recombinants have been adjusted to scale. Black arrows indicate the sense of genes. Different sequences are identified on top: ψgp85 and ψrhs, (pseudo)genes from the gp85 (group II of the ts-like family) and rhs multigene families, respectively; dgf-1 (pseudo)genes from dispersed gene family-1; L1Tc flanking sequences RS13Tc, RS1Tc and Seq3Tc; LTR retrotransposon VIPER; repeated elements SIRE and F604; SIRE-associated sequence SZ23; sequences from telomeric BACs F7 and F3; 189-bp junction; (TTAGGG)n telomeric repeats.

3.2. In silico assembling of contigs containing T. cruzi telomeric sequences

To further investigate the organization of other chromosomal ends, we searched the TIGR WGS T. cruzi database for contigs containing the 189-bp telomeric junction and then assembled these contigs in silico. The search produced 77 hits that were assembled into 37 contigs. Fig. 4 shows the alignment of the ends of these contigs, which confirmed the presence of the two types of telomeres described by Chiurillo et al. (1999).
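The first step described above, collecting database sequences that carry the telomeric junction, can be pictured with the simplified sketch below. It uses exact substring matching on both strands as a stand-in for a similarity search, which is an assumption and not the procedure used in this work; the junction string and reads are hypothetical placeholders, and the function names are illustrative.

```python
# Simplified sketch: exact matching stands in for a similarity search.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def reads_with_query(query: str, reads: dict[str, str]) -> list[str]:
    """Return the IDs of reads that contain the query on either strand."""
    rc = reverse_complement(query)
    return [rid for rid, seq in reads.items() if query in seq or rc in seq]

# Hypothetical placeholder data; the real 189-bp junction is not reproduced here.
junction = "GATTACAGGC"
reads = {
    "read1": "TTAGGGTTAGGG" + junction + "ACGT",     # forward-strand hit
    "read2": "AAAA" + reverse_complement(junction),  # reverse-strand hit
    "read3": "ACGTACGTACGTACGT",                     # no hit
}
print(reads_with_query(junction, reads))  # ['read1', 'read2']
```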

Two large in silico assembled contigs, namely Silico-1 and Silico-2, were further analyzed. Silico-1 (17,263 bp) differed from BAC D6C and C6 in that it contained no gp85 genes or (pseudo)genes (Fig. 3C). In addition, Silico-1 presented a block of hexameric repeats 324 bp long, the basic telomeric units, and a degenerate rhs gene. The telomere sequence in Silico-1 was followed by an 8.8-kbp region with no similarity to other T. cruzi sequences deposited in the GenBank database, except for the rhs-related sequence Seq3Tc and the SIRE-associated sequence SZ23. Next to this block, we found a 684-bp sequence with 92% identity to the T. cruzi telomeric BACs TcTel BAC:F7 and TcTel BAC:F3 (GenBank accession nos. AF305884 and AF305885) (Chiurillo et al., 2002); this finding confirmed the validity of the in silico assembly using the TIGR WGS database. After this region, we found a truncated copy of the SIRE-associated sequence SZ23 and degenerate rhs sequences. Finally, after the rhs sequences, we found a whole copy of the dgf-1 gene (Wincker et al., 1992).
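The sizes of the hexameric repeat blocks can be related to copy numbers with simple arithmetic. The sketch below assumes that a block consists solely of perfect TTAGGG units, which is an idealization: under that assumption the 324-bp block in Silico-1 corresponds to 54 copies, and the 66 copies reported for BAC D6C correspond to 396 bp. The function names are illustrative, and the last call uses a made-up toy sequence.

```python
import re

HEXAMER = "TTAGGG"

def copies_from_length(block_bp: int, unit_bp: int = len(HEXAMER)) -> int:
    """Whole repeat units that fit in a block, assuming perfect tandem repeats."""
    return block_bp // unit_bp

def longest_tandem_run(seq: str, unit: str = HEXAMER) -> int:
    """Largest number of consecutive, perfect copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

print(copies_from_length(324))                              # 54
print(66 * len(HEXAMER))                                    # 396
print(longest_tandem_run("ACGT" + HEXAMER * 5 + "TTAGGC"))  # 5
```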

The second in silico assembled contig, Silico-2, had 23,322 bp and differed from BAC D6C, C6, and Silico-1 by containing a VIPER element, which is regarded as an LTR retrotransposon (Vazquez et al., 2000), and by lacking dgf-1 genes or sequences (Fig. 3C). Additionally, Silico-2 had three copies of gp85-like sequences: the first (~3.8 kbp long) is located next to the telomere repeats, the second (~1.8 kbp long) is 4365 bp away from the first, and the third is located at nt 3945 to 5452, towards the centromere. These three copies of gp85-like sequences seem to be "mosaic" (pseudo)genes made of segments derived from different members of the gp85 multigene family (Fig. 2). Between the two gp85 (pseudo)genes, there is a 2146-bp sequence (nt 9376 to 11,522) sharing 89% identity at the nucleotide level with the VIPER element (Vazquez et al., 2000). A truncated Seq3Tc sequence is located between nucleotides 1 and 3322 (Fig. 3C). Finally, we identified truncated forms of a T. cruzi chitin binding-like protein and the elongation factor γ-1 (GenBank accession nos. AF310256 and AB010288).

Fig. 4. Alignment of T. cruzi in silico telomeric ends. (Panel A) Alignment of in silico contigs containing the 189-bp telomeric junction. Conserved residues are shaded in black (100% conservation), dark gray (75% conservation) and light gray (50% conservation). (Panel B) Representation of T. cruzi telomere types; the type II telomere likely originated from the insertion of an rhs sequence into a type I telomere.

4. Discussion

4.1. T. cruzi subtelomeres are a patchwork of blocks of rhs and gp85 (pseudo)genes

Here, we analyzed in detail two long telomeric recombinants obtained from the pathogenic protozoan T. cruzi. Their telomeric origin was confirmed by the presence of hexameric repeats and the type II telomeric signature previously proposed for this parasite (Chiurillo et al., 1999). Moving away from the telomere towards the centromere, the subtelomeric region presented rhs (pseudo)genes, the LTR retrotransposon VIPER (Vazquez et al., 2000), and gp85 (pseudo)genes from group II of the ts-like family. In all T. cruzi telomeres studied here, there is at least one truncated rhs that shares homology with Seq3Tc and SZ23, indicating that these sequences are part of rhs genes. In T. brucei, RHS proteins belong to a family with apparent molecular masses from 85 to 110 kDa and an intranuclear and perinuclear location (Bringaud et al., 2002a,b). rhs genes are polymorphic, have a large number of (pseudo)genes, map preferentially to T. brucei subtelomeric regions, and contain a hot spot for the insertion of the retroelements RIME and/or ingi. Previous results suggested that T. cruzi contains polymorphic sequences similar to T. brucei rhs (Bringaud et al., 2002a); in addition, full-length copies of rhs-like genes are found scattered throughout the T. cruzi genome. More recently, proteomic studies have shown that RHS proteins are highly expressed in cultured T. cruzi epimastigote forms (Parodi-Talice et al., 2004).

In previous work (Verbisck et al., 2003), we described a reiterated family of transcribed oligo(A)-terminated interspersed DNA scattered throughout the genome of T. cruzi, including the subtelomeric regions. The members of this family (TcSx38, TcSx42, and TcSx12) display the same L1Tc flanking regions contained within rhs genes. Sequences homologous to TcSx38 and TcSx42 have also been found at the subtelomeric regions of BAC D6C, cosmid C6, and Silico-1. The high percentage identity of the sequences studied here with several T. cruzi retrotransposon-like ESTs (VIPER and L1Tc) from the GenBank database suggested that these sequences could be transcribed and processed. However, since no complete ORFs were found for these elements, these ESTs could have originated as a product of the generalized polycistronic transcription typical of trypanosomatids. The apparent inactivity of T. cruzi retrotransposons and the lack of sexual reproduction in this parasite, whose population structure seems to be predominantly clonal (Tibayrenc et al., 1991), agree with the model of transposon proliferation proposed by Hickey (1982) and confirmed by Arkhipova and Meselson (2000), in which deleterious transposons do not persist in long-term asexual organisms.

4.2. Are the subtelomeric regions nurseries for gp85 gene family diversity?

Subtelomeric gp85 sequences frequently contain frame disruptions (frameshifts, in-frame stop codons, insertion of unrelated sequences) and should therefore be considered (pseudo)genes. The presence of gp85 sequences at chromosomal tips raises the possibility that the subtelomere functioned as a nursery for the generation of diversity in this multigene family. Members of the gp85 gene family are scattered throughout the T. cruzi genome, but we propose that before this expansion took place, variants of this gene family originated by ectopic recombination of gp85 genes located at the subtelomeres. We suggest that the subtelomere could have functioned as a place where gp85 sequences were duplicated and modified without affecting functional interstitial copies. The high frequency of recombination in subtelomeric regions could have created a favorable environment for the generation of new surface protein variants. Thereafter, the mobilization of variants was likely facilitated by retrotransposon elements.
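One of the frame disruptions named above, the in-frame stop codon, is easy to detect computationally. The following minimal sketch assumes the standard genetic code for nuclear genes and scans a single reading frame; the function names and the two toy sequences are illustrative and do not come from the sequences analyzed in this work.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based positions of stop codons read in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]

def looks_intact(seq: str) -> bool:
    """True if the sequence starts with ATG, has a length divisible by 3,
    and contains no stop codon before its final codon."""
    stops = in_frame_stops(seq)
    return (seq.upper().startswith("ATG") and len(seq) % 3 == 0
            and all(pos == len(seq) - 3 for pos in stops))

intact = "ATGGCTGCTTAA"        # toy ORF: Met-Ala-Ala-stop
broken = "ATGGCTTAAGCTGCTTAA"  # toy sequence with a premature in-frame stop
print(looks_intact(intact), in_frame_stops(intact))  # True [9]
print(looks_intact(broken), in_frame_stops(broken))  # False [6, 15]
```

A frameshift would not be caught by this check directly; it usually shows up indirectly, as stop codons appearing downstream in the original frame.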

4.3. A model for the generation of telomeres harboring gp85 (pseudo)genes

Taking into account the results presented here, we propose that the events that generated the common T. cruzi telomeric block can be reconstructed from a tandem array of gp85 genes such as the one present in the cosmid with GenBank accession no. AC104490 (Fig. 5). In a first step, a deletion brought together a fragment containing the spacer between two gp85 genes and part of a gp85 5′-UTR with the 3′-UTR of the same gene. Subsequently, a break took place in the 3′-UTR, and the broken end was healed by telomerase. These two structures were eventually fixed as the T. cruzi telomere (Fig. 5). A similar process has been invoked to explain the generation of Giardia lamblia telomeres (Arkhipova and Morrison, 2001; Pardue et al., 2001). The alignment of sequences shown in Fig. 5 confirms that the 189-bp junction was originally part of the 3′-UTR of a gp85 gene.

In support of the idea of breakage in the 3′-UTR followed by telomerase healing, there are sequences that resemble the hexameric repeat next to the 189-bp homologous regions, where the putative breakage took place (Fig. 5). This situation is reminiscent of the breakage of the α-globin gene in cases of human α-thalassemia, where telomerase recognized and healed the broken ends (Wilkie et al., 1990).

Despite sequence variations, similar telomeric structures were detected in almost all T. cruzi chromosomes studied so far (Fig. 3); therefore, we assume that the event producing the chromosomal break occurred during the expansion of the gp85 gene family. The type II telomere was likely generated by the insertion of an rhs (pseudo)gene in a later event. The presence of rhs sequences in the same relative order is consistent with an ancient transposition event that took place during the gp85 gene family expansion. The presence of gp85 sequences and genes suggests that the subtelomeric zones may have served as an evolutionary test-bed for this multigene family, but we do not yet know whether telomeric gp85-like genes are functional.

4.4. Is there a subtelomere-linked expression site for dgf-1 gene family?

Strikingly, the recombinants examined here contained a dgf-1 gene oriented in the same direction, with the same relative location, and surrounded by the same set of rhs-related sequences as the BACs reported by other authors (Olivares et al., 2000). This fact made us wonder whether those BACs were of telomeric origin. In recombinant BAC D6C, the large dgf-1 ORF had no interruptions and shared 85% identity with a previously reported dgf-1 gene (Wincker et al., 1992). It was surprising to find another gp85 (pseudo)gene upstream of this dgf-1 ORF, because it raises the question of how this large dgf-1 ORF can be preserved in an apparently unstable environment, surrounded by dead-end (pseudo)genes of gp85 and rhs. The most logical answer is that the subtelomeric dgf-1 gene is under strong selective pressure and is consequently expressed.

Fig. 5. Model for the generation of T. cruzi telomeres by deletion and breakage of a gp85 gene repeat. (Panel A) Schematic representation of different recombinant clones sharing the 189-bp junction sequence found at all chromosome ends of T. cruzi. It summarizes the events that we propose took place to generate T. cruzi telomeres: a deletion event removed most of the 5′-UTR, the coding region of a gp85 gene, and part of the 3′-UTR; the remaining UTR sequences fused together; and, in a second event, a break occurred and the broken end was healed by telomerase. (Panel B) Comparison of nucleotide sequences from the region carrying the 189-bp sequence, including the hexameric telomeric repeat. Sequences are as follows: Tt34c1 (GenBank accession no. M64836), a gp85 cDNA clone; cosmid with GenBank accession no. AC104490, a genomic fragment carrying an array of gp85 genes; TcSx12 and TcSx38 (GenBank accession nos. AF510088 and AF510086), cDNA clones from a reiterated family of transcribed oligo(A)-terminated interspersed DNA; Tctel8.1.2 and VATc17 (GenBank accession nos. AF100646 and AF100652), telomere-associated sequences.

As an indication that the telomeres examined here
correspond to different chromosomes, the sequence of the dgf-1
gene in C6 (Fig. 3B) produced an interrupted ORF with
partial homology to that of BAC D6C. In addition, the
many hits that we found in the T. cruzi EST database when
querying with dgf-1 indicate that this gene is
transcribed. Considering the important role of subtelomeric
regions in specialized functions of pathogenic protozoa, the
role of this gene should be further investigated.

Acknowledgements

This work was supported by FONACIT Grants

G99000036 to JLR and CONICIT fellowship No. 9900034

to M. Chiurillo, and by FAPESP, CNPq and CYTED grants to

J. Franco da Silveira.

References

Andersson, B., Aslund, L., Tammi, M., Tran, A.-N., Hoheisel, J.D.,
Pettersson, U., 1998. Complete sequence of a 93.4-kb contig from
chromosome 3 of Trypanosoma cruzi containing a strand switch
region. Genome Res. 8, 809–816.


Arkhipova, I., Meselson, M., 2000. Transposable elements in sexual and

ancient asexual taxa. Proc. Natl. Acad. Sci. U. S. A. 97, 14473–14477.

Arkhipova, I.R., Morrison, H.G., 2001. Three retrotransposon families in

the genome of Giardia lamblia: two telomeric, one dead. Proc. Natl.

Acad. Sci. U. S. A. 98, 14497–14502.

Barry, J.D., Ginger, M.L., Burton, P., McCulloch, P.B., 2003. Why are

parasite contingency genes often associated with telomeres? Int. J.

Parasitol. 33, 29–45.

Bringaud, F., Biteau, N., Melville, S.E., Hez, S., El-Sayed, N.M., Leech, V.,

Berriman, M., Hall, N., Donelson, J.E., Baltz, T., 2002a. A new,

expressed multigene family containing hot spot for insertion of
retroelements is associated with polymorphic subtelomeric regions of

Trypanosoma brucei. Eukaryot. Cell 1, 137–151.

Bringaud, F., Garcia-Perez, J.L., Heras, S.R., Ghedin, E., El-Sayed, N.M.,

Andersson, B., Baltz, T., Lopez, M.C., 2002b. Identification of non-

autonomous non-LTR retrotransposons in the genome of Trypanosoma

cruzi. Mol. Biochem. Parasitol. 124, 73–78.

Cano, M.I.N., 2001. Telomere biology of trypanosomatids: more questions

than answers. Trends Parasitol. 17, 425–429.

Carmo, M.S., Santos, M.R.M., Cano, M.I., Araya, J.E., Yoshida, N., Franco

da Silveira, J., 2002. Expression and organization of the gene family

encoding a 90 kDa surface glycoprotein of metacyclic trypomastigotes

of Trypanosoma cruzi. Mol. Biochem. Parasitol. 125, 201–206.

Chiurillo, M.A., Cano, I., Franco da Silveira, J., Ramirez, J.L., 1999.

Organization of telomeric and sub-telomeric regions of chromosomes

from the protozoan parasite Trypanosoma cruzi. Mol. Biochem.

Parasitol. 100, 173–183.

Chiurillo, M.A., Santos, M.R.M., Franco da Silveira, J., Ramirez, J.L.,

2002. A general improved approach for cloning and characterization of

telomeres: the protozoan parasite Trypanosoma cruzi as model

organism. Gene 294, 197–204.

Frasch, A.C.C., 2000. Functional diversity in members of the trans-

sialidase and mucin families in Trypanosoma cruzi. Parasitol. Today

16, 282–286.

Hickey, D.A., 1982. Selfish DNA: a sexually-transmitted nuclear parasite.

Genetics 101, 519–531.

Low, H.P., Tarleton, R.L., 1997. Molecular cloning of the gene encoding

the 83 kDa amastigote surface protein and its identification as a member

of Trypanosoma cruzi sialidase superfamily. Mol. Biochem. Parasitol.

88, 137–149.

Martin, F., Maranon, C., Olivares, M., Alonso, C., Lopez, M.C., 1995.

Characterization of a non-long terminal repeat retrotransposon cDNA

(L1Tc) from Trypanosoma cruzi: homology of the first ORF with the

Ape family of DNA repair enzymes. J. Mol. Biol. 247, 49–59.

Myler, P.J., Audleman, L., deVos, T., Hixson, G., Kiser, P., Lemley, C.,

Magness, C., Rickel, E., Sisk, E., Sunkin, S., Swartzell, S., Westlake, T.,

Bastien, P., Fu, G., Ivens, A., Stuart, K., 1999. Leishmania major

Friedlin chromosome 1 has an unusual distribution of protein-coding

genes. Proc. Natl. Acad. Sci. U. S. A. 96, 2902–2906.

Olivares, M., del Carmen Thomas, M., Lopez-Barajas, A., Requena, J.M.,

Garcia-Perez, J.L., Angel, S., Alonso, C., Lopez, M.C., 2000. Genomic

clustering of the Trypanosoma cruzi nonlong terminal L1Tc retro-

transposon with defined interspersed repeated DNA elements. Electro-

phoresis 21, 2973–2982.

Pardue, M.-L., DeBaryshe, P.G., Lowenhaupt, K., 2001. Another protozoan

contributes to understanding telomeres and transposable elements. Proc.

Natl. Acad. Sci. U. S. A. 98, 14195–14197.

Parodi-Talice, A., Duran, R., Arrambide, N., Prieto, V., Pineyro, M.D.,

Pritsch, O., Cayota, A., Cervenansky, C., Robello, C., 2004. Proteome

analysis of the causative agent of Chagas disease: Trypanosoma cruzi.

Int. J. Parasitol. 34, 881–886.

Takle, G.B., Cross, G.A.M., 1991. An 85-kilodalton surface antigen gene

family of Trypanosoma cruzi encodes polypeptides homologous to

bacterial neuraminidases. Mol. Biochem. Parasitol. 48, 185–198.

Tibayrenc, M., Kjellberg, F., Arnaud, J., Oury, B., Breniere, S.F., Darde,

M., Ayala, F.J., 1991. Are eukaryotic microorganisms clonal or sexual?

A population genetics vantage. Proc. Natl. Acad. Sci. U. S. A. 88,

5129–5133.

Vazquez, M., Lorenzi, H., Schijman, A.G., Ben-Dov, C., Levin, M.J., 1999.

Analysis of the distribution of SIRE in the nuclear genome of

Trypanosoma cruzi. Gene 239, 207–216.

Vazquez, M., Lorenzi, H., Schijman, A.G., Ben-Dov, C., Levin, M.J., 2000.

The short interspersed repetitive element of Trypanosoma cruzi, SIRE,

is part of VIPER, an unusual retroelement related to long terminal

repeat retrotransposons. Proc. Natl. Acad. Sci. U. S. A. 97, 2128–2133.

Verbisck, N.V., dos Santos, M.R., Engman, D.M., Chiurillo, M.A.,

Ramirez, J.L., Araya, J.E., Mortara, R.A., Franco da Silveira, J.,

2003. A novel reiterated family of transcribed oligo(A)-terminated,

interspersed DNA elements in the genome of Trypanosoma cruzi. Mem.

Inst. Oswaldo Cruz 98, 129–133.

Weston, D., Patel, B., Van Voorhis, W.C., 1999. Virulence in Trypanosoma

cruzi infection correlates with the expression of a distinct family of

sialidase superfamily genes. Mol. Biochem. Parasitol. 85, 1–11.

WHO Technical Report Series. Control of Chagas disease. 2002. Report of

a WHO Expert Committee, Geneva, World Health Organization

Technical Report Series 905.

Wilkie, A.O., Lamb, J., Harris, P.C., Finney, R.D., Higgs, D.R., 1990. A

truncated human chromosome 16 associated with alpha thalassaemia is

stabilized by addition of telomeric repeat (TTAGGG)n. Nature 346,

868–871.

Wincker, P., Murto-Dovales, A.C., Goldenberg, S., 1992. Nucleotide

sequence of a representative member of a Trypanosoma cruzi dispersed

gene family. Mol. Biochem. Parasitol. 55, 217–220.

Zingales, B., Rondinelli, E., Degrave, W., Franco da Silveira, J., Levin, M.,

LePaslier, D., Modabber, F., Dobrokhotov, B., Swindle, J., Kelly, J.M.,

Aslund, L., Hoheisel, J.D., Ruiz, A.M., Cazzulo, J.J., Pettersson, U.,

Frasch, A.C.C., 1997. The Trypanosoma cruzi genome initiative.

Parasitol. Today 13, 16–22.