cDNA cloning and genomic organization of enhancer of split groucho gene from nematode Caenorhabditis...


Vol. 43, No. 2, October 1997 BIOCHEMISTRY and MOLECULAR BIOLOGY INTERNATIONAL Pages 327-337

cDNA CLONING AND GENOMIC ORGANIZATION OF ENHANCER OF SPLIT GROUCHO GENE FROM NEMATODE CAENORHABDITIS ELEGANS

Farida S. Sharief 1, Stephen C.-M. Tsoi 1 and Steven S.-L. Li 1,2,3

1 Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA; 2 Institute of Life Sciences, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424, ROC.

Received June 12, 1997

SUMMARY

The first genomic Enhancer of split groucho (ESG) gene and its full-length complementary DNA (cDNA) from the nematode C. elegans were cloned and sequenced via homology with the corresponding Drosophila groucho cDNA. The cDNA of 2.1 kb encodes a protein of 612 amino acids, and the nematode ESG protein is the smallest and most divergent in structure among all ESG-related proteins. The isolated gene is 4,246 bp in size, including a 1,219-bp promoter region. A putative TATA box at position -1166, two copies of the consensus sequence ACTGG, characteristic of leader binding protein-1 (LBP-1) binding motifs, at positions -563 and -211, and nine CAAT boxes were found in the promoter region of the ESG gene. The protein-coding sequence is interrupted by five introns. The lengths of introns 1 to 5 are 52, 252, 87, 53 and 518 bp, respectively. The overall structural relationships of the ESG-related proteins among human, mouse, rat, Xenopus, Drosophila and nematode were also analyzed.

INTRODUCTION

In the fruitfly Drosophila melanogaster, neurogenesis is under the control of several loci, namely Enhancer of split, Notch, Delta, mastermind, big brain and neuralized.

4 The nucleotide sequences have been deposited in the GenBank database under accession nos. AF001271 and AF001272.

3 To whom correspondence should be addressed at Institute of Life Sciences, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424, ROC. Tel: 886-7-525-2379; Fax: 886-7-525-2360; e-mail: [email protected].

1039-9712/97/020327-11 $05.00/0 Copyright © 1997 by Academic Press Australia. All rights of reproduction in any form reserved.

These neurogenic loci are required for the proper segregation of neural and epidermal progenitor cells during the formation of both the central and peripheral nervous systems [1,2,3,4]. Mutations in these genes cause most ventral ectodermal cells to become neuroblasts, with little or no formation of epidermis [5]. The Enhancer of split gene complex contains at least 13 transcription units. Molecular studies showed that most of these transcription units encode proteins with the basic helix-loop-helix (bHLH) motif characteristic of certain transcription factors [6,7,8]. Another, structurally unrelated transcript, the m9/10 group, was originally identified through a viable mutant, groucho, which has specific head bristle duplications. The groucho gene encodes a nuclear protein of 719 amino acids containing repeats demarcated by Trp-Asp (WD-40 repeats), as found in the β-subunit of guanine nucleotide binding proteins (G-proteins) [9].

In order to better understand the mechanisms regulating invertebrate neurogenic genes during neural development, we have cloned and sequenced the nematode ESG cDNA and the complete gene, including the promoter region. We report here the cDNA cloning, the nucleotide and deduced amino acid sequences, the genomic organization of the nematode ESG gene, and the structural relationships among groucho and related proteins from human, mouse, rat, Xenopus, Drosophila and nematode.

MATERIAL AND METHODS

Cloning and sequencing of nematode ESG cDNA and gene. A nematode embryo cDNA library in the Lambda Uni-ZAP XR vector (Stratagene) was screened using a probe generated from the mixed cDNA templates with PCR primers (5'-GCGTTGGCGATGTCACCAG-3' and 5'-GTCAAGACCACCTGACCAG-3') based on the partial ESG sequence of C. elegans (GenBank Accession no. T02011). After isolating the full-length cDNA, a specific probe from the 5' end was generated, using primers 5'-AAGGCATCGTATCTGG-3' and 5'-AATCCTCCAGCAGCACTA-3', to screen the nematode genomic library in the Lambda FIX II vector (Stratagene). The probe was labeled with the digoxigenin system, and the positive clones were identified with the chemiluminescent detection system according to the procedure recommended by the manufacturer (Boehringer Mannheim). The inserts excised into the Bluescript SK(-) phagemid were labeled with the Dye Terminator kit (Perkin Elmer), and their nucleotide sequences were determined using an automated DNA sequencer (Applied Biosystems Model 373A).
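As a quick check on the four primers listed above, the following sketch reports length, GC content and a rough melting temperature for each. The Wallace rule, Tm = 2(A+T) + 4(G+C), is an assumption introduced here for illustration; the text does not state how the primers were designed or evaluated, and the dictionary keys and function names are hypothetical.

```python
# Primer sequences as printed in the Methods (5' -> 3'), internal spaces removed.
# The dictionary keys are hypothetical labels, not names used by the authors.
PRIMERS = {
    "cDNA_probe_1": "GCGTTGGCGATGTCACCAG",
    "cDNA_probe_2": "GTCAAGACCACCTGACCAG",
    "genomic_probe_1": "AAGGCATCGTATCTGG",
    "genomic_probe_2": "AATCCTCCAGCAGCACTA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (assumed here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}, "
              f"Tm ~ {wallace_tm(seq)} C, revcomp = {reverse_complement(seq)}")
```

The Wallace rule is only a coarse estimate for short oligonucleotides; a nearest-neighbour model would be the usual choice for real primer design.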

Construction of evolutionary tree from 12 groucho related proteins. The complete amino acid sequence of ESG from nematode was deduced from the cDNA sequence determined in this investigation. Eleven other groucho related proteins previously reported [9,10,11,12,13,14,15,16] were obtained from the GenBank database. The scientific names of the organisms and the accession numbers of the published groucho related sequences are as follows: human, Homo sapiens, AES/hesp (X73358/U04241), TLE1 (M99435), TLE2 (M99436), TLE3 (M99438); mouse, Mus musculus, AES/Grg (X73361/L12140), ESG (X73360); rat, Rattus norvegicus, R-esp1 (L14462), R-esp2 (L14463); fruitfly, Drosophila melanogaster (M20571); African frog, Xenopus laevis, AES (U18776), ESG (U18775). The amino acid sequences were aligned using the PileUp program of the Wisconsin GCG package, based on the method of Feng and Doolittle [23]. The evolutionary tree (cladogram) was constructed from the distance matrix by the UPGMA method included in the GCG Wisconsin package.
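The clustering step described above can be sketched as follows. This is a minimal UPGMA illustration and not the GCG implementation: the function name, the leaf labels and every distance in the toy matrix are hypothetical and are not taken from the alignment of the 12 proteins.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster leaves by UPGMA and return the tree as nested tuples.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = {name: 1 for name in names}  # tree -> number of leaves it holds
    d = dict(dist)                          # working copy of the distance table
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))


# Toy example with made-up distances: two tight pairs and one distant sequence.
names = ["AES_a", "AES_b", "TLE_a", "TLE_b", "outgroup"]
pairs = {
    ("AES_a", "AES_b"): 0.1,
    ("TLE_a", "TLE_b"): 0.2,
    ("AES_a", "TLE_a"): 0.6, ("AES_a", "TLE_b"): 0.6,
    ("AES_b", "TLE_a"): 0.6, ("AES_b", "TLE_b"): 0.6,
    ("outgroup", "AES_a"): 0.9, ("outgroup", "AES_b"): 0.9,
    ("outgroup", "TLE_a"): 0.9, ("outgroup", "TLE_b"): 0.9,
}
dist = {frozenset(k): v for k, v in pairs.items()}
tree = upgma(names, dist)

if __name__ == "__main__":
    print(tree)
```

In this toy matrix the two close pairs are joined first, the pairs are then joined to each other, and the most distant sequence attaches last, which is the pattern the text describes for the nematode sequence.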

RESULTS

Nematode ESG cDNA and Genomic sequences. Several cDNA and genomic clones were isolated and the nucleotide sequences of the insert DNAs were completely determined (Fig. 1). Full-length cDNA clones 2, 84 and 88 contain an insert of 2,147 nucleotides, including the protein-coding sequence of 1,813 nucleotides, 5' and 3' noncoding regions of 15 and 251 nucleotides, respectively, and a poly(A) tail of 22 nucleotides (the complete cDNA and gene sequences were deposited in GenBank, Accession Nos. AF001271 and AF001272). A putative polyadenylation signal, AATAAA, was present 14 nucleotides 5' to the poly(A) tail. Partial cDNA clones 5, 77 and 78 are 1,335 bp long, 812 nucleotides shorter at the 5' end than the full-length cDNA. Genomic clone 110 contains 1,347 nucleotides, including the 5' promoter region of 1,219 nucleotides, exon 1 and partial exon 2 (Fig. 2). A putative TATATAA box at position -1166, two leader-binding protein-1 (LBP-1) binding motifs, ACTGG, at positions -563 and -211, and nine putative CAAT boxes were found in the promoter region of the nematode ESG gene. Genomic clones 108, 111, 113 and 116 contain 2,740 nucleotides, including complete exons 1 to 5, partial exon 6 and all five introns. The sizes of introns 1 to 5 are 52, 252, 87, 53 and 518 bp, respectively, while exons 1 to 6 contain 81, 144, 511, 198, 849 and 156 bp, respectively, giving a total of 4,246 bp for the complete ESG gene (Fig. 2). A search of the nematode ACeDB database at the Sanger Center by BLAST e-mail server detected 100% identity between a portion of the contig W02D3 sequence, containing 35,959 nucleotides located on chromosome I, and the 4,246-bp nematode ESG gene sequence determined in this investigation.
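The promoter motifs reported above (TATATAA, ACTGG and CAAT) can be located by a simple string scan, sketched below. Several points are assumptions made for illustration: the last base of the promoter string is taken to be position -1, only the given strand is searched, and the toy sequence is invented and is not the deposited promoter sequence. The function and variable names are hypothetical.

```python
import re

# Motifs named in the text; the dictionary keys are labels chosen for this sketch.
MOTIFS = {"TATA box": "TATATAA", "LBP-1 site": "ACTGG", "CAAT box": "CAAT"}


def scan_promoter(promoter, motifs=MOTIFS):
    """Return {motif label: [upstream positions]} for one strand.

    The last base of `promoter` is treated as position -1 (an assumption),
    so a motif starting at 0-based index i is reported at i - len(promoter).
    """
    promoter = promoter.upper()
    hits = {}
    for label, pattern in motifs.items():
        # A lookahead keeps overlapping occurrences from being skipped.
        hits[label] = [m.start() - len(promoter)
                       for m in re.finditer(f"(?={pattern})", promoter)]
    return hits


# Invented 30-nt sequence, used only to exercise the function.
toy_promoter = "GGTATATAAGCCAATCGACTGGTTCAATGC"
toy_hits = scan_promoter(toy_promoter)

if __name__ == "__main__":
    for label, positions in toy_hits.items():
        print(label, positions)
```

Short motifs such as CAAT occur frequently by chance, so hits from a scan like this are only candidate sites, in line with the text calling them putative.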

DISCUSSION

The deduced 612-amino-acid sequence of the nematode ESG protein was aligned with those of Drosophila groucho [9] and related proteins from human, mouse, rat and Xenopus reported previously [10,11,13,14,15,16] (data not presented). The nematode ESG protein is the smallest in size when compared with human transducin-like Enhancer of split (TLE3), mouse ESG, rat R-esp2, Xenopus ESG1 and the invertebrate Drosophila groucho, which contain 772, 771, 741, 767 and 719 amino acids, respectively. The amino acid sequence of the nematode ESG protein exhibits 47.71%, 47.71%, 47.22% and 45.1% identity to human TLE3, mouse ESG, human TLE1 and Drosophila groucho, respectively. The amino terminus of the nematode ESG protein is 17 amino acids shorter than those of human TLE1 and TLE3, and 19 amino acids shorter than that of Drosophila groucho. The central region of Drosophila groucho and the human TLE proteins was reported to contain a nuclear localization sequence (NLS) and casein kinase II (CKII) and cdc2 kinase (cdc2k) sites, and these proteins were shown to be present in the nucleus [9,10]. It is interesting to note that in nematode ESG and Drosophila groucho, deletions at several positions were observed in the central nuclear localization sites. Also, in the central species-specific domain, there is a deletion of 23 and 33 amino acids in nematode ESG and rat R-esp2, respectively, and an insertion of 9 amino acids in Xenopus ESG1. In the highly conserved C-terminal domain, all vertebrate and invertebrate species contain four WD-40 repeats, at positions 557, 611, 655 and 697, except nematode ESG, in which the third WD at position 655 is missing. Also, after the first WD-40 repeat, there is an insertion of 7 amino acids at position 523 in nematode ESG that is not found in any other species. These WD-40 repeats have been found in an expanding group of unrelated proteins, including the yeast proteins encoded by a cell cycle gene (CDC4), a mediator of glucose repression gene (TUP1), a suppressor gene for flocculation (SLF2), a gene for a1-α2 repression and cell control (AAR1) and a gene for control of heme-regulated and catabolite-repressed genes (AER2). The TUP1, SLF2, AAR1 and AER2 genes were cloned on the basis of different phenotypes, but were found to be identical [17,18,19,20,21].
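Identity percentages such as those quoted above are derived from pairwise comparison of aligned sequences. The sketch below shows one common way to compute such a figure. The text does not say which denominator was used, so counting only columns where neither sequence has a gap is an assumption, and the two short aligned strings are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap.

    Excluding gapped columns from the denominator is an assumption of this
    sketch; other conventions divide by the alignment or sequence length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with an insertion or deletion
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented aligned fragments: 8 ungapped columns, 6 of them identical.
toy_a = "MKT-WDLLSA"
toy_b = "MRTAWD-LSG"
toy_identity = percent_identity(toy_a, toy_b)

if __name__ == "__main__":
    print(f"{toy_identity:.2f}% identity")
```

Because the choice of denominator changes the result, values computed this way are comparable only when the same convention is applied to every pair.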

The overall structural relationships among groucho and related proteins from human, mouse, rat, Xenopus, Drosophila and nematode were analyzed, and the evolutionary tree was constructed using the UPGMA method [22], as illustrated in Fig. 3. The tree indicates that nematode groucho diverged earlier than fruitfly groucho. Also, the nematode groucho sequence was shown to be the best outgroup sequence for studying the molecular evolution of groucho related proteins by UPGMA. Two clear clusters, the AES and the ESG/TLE proteins, were formed. The AES cluster is highly conserved and was previously proposed to contain a leucine zipper motif [11]. Within the ESG/TLE proteins, isoforms originated by gene duplication. We propose that the first gene duplication event occurred at the early separation of the invertebrate and vertebrate ancestral groucho related proteins. Later, two independent gene duplication events gave rise to the at least four types of ESG/TLE related proteins identified in mammals [10,11,12,13,14,16] and Xenopus [15]. On the basis of the UPGMA tree analysis, the ESG/TLE type-2 groucho isoprotein evolved earlier than the other isoforms and is structurally closer to Xenopus ESG1 and fruitfly groucho. Finally, it will be of interest in the future to clone the homologs of all four human ESG/TLE related proteins in lower vertebrates such as zebrafish, in order to elucidate the molecular evolution of the ESG/TLE groucho related gene family by gene duplication.

FIG. 3. Evolutionary tree of 12 ESG and AES sequences. The evolutionary relationships among the amino acid sequences of 12 groucho related proteins are presented using the UPGMA method [22]. UPGMA indicated that nematode ESG branches off earlier than fruitfly groucho. Using nematode groucho ESG as an outgroup for the tree analysis, the four AES proteins are clustered into one group, and the groucho, ESG and TLE proteins are clustered into a separate group. Leaf labels legible in the tree, from top to bottom: Mouse AES/Grg, Rat R-esp1, Human AES/hesp, Xenopus AES, Human TLE3, Mouse ESG, Human TLE1, Rat R-esp2, Drosophila Groucho, Nematode ESG.


ACKNOWLEDGMENTS

This investigation was supported in part by NIEHS, National Institutes of Health, USA, and in part by grants NSC85-2732-B-110-002 and NSC86-2313-B-110-002 from the National Science Council of Taiwan, ROC. S.S.-L. Li is a recipient of the Outstanding Professor Chair from the Foundation for the Advancement of Outstanding Scholarship in Taiwan, ROC.

REFERENCES

1. Artavanis-Tsakonas, S., Delidakis, C., and Fehon, R.G. (1991) Annu. Rev. Cell. Biol. 7, 427-452.
2. Cabrera, C.V. (1992) Development 115, 893-901.
3. Campos-Ortega, J.A., and Jan, Y.N. (1991) Neurosci. 14, 399-420.
4. Ghysen, A., Dambly-Chaudiere, C., Jan, L.Y., and Jan, Y.N. (1993) Genes Dev. 7, 723-733.
5. Lehman, R., Jimenez, F., Dietrich, U., and Campos-Ortega, J.A. (1983) Arch. Dev. Biol. 192, 62-74.
6. Delidakis, C., and Artavanis-Tsakonas, S. (1992) Proc. Natl. Acad. Sci. USA 89, 8731-8735.
7. Knust, E., Schrons, H., Grawe, F., and Campos-Ortega, J.A. (1992) Genetics 132, 505-518.
8. Schrons, H., Knust, E., and Campos-Ortega, J.A. (1992) Genetics 132, 481-503.
9. Hartley, D.A., Preiss, A., and Artavanis-Tsakonas, S. (1988) Cell 55, 785-795.
10. Stifani, S., Blaumueller, C.M., Redhead, N.J., Hill, R.E., and Artavanis-Tsakonas, S. (1992) Nature Genet. 2, 119-127.
11. Miyasaka, H., Choudhury, B.K., Hou, E.W., and Li, S.S.-L. (1993) Eur. J. Biochem. 216, 343-352.
12. Mallo, M., Steingrimsson, E., Copeland, N.G., Jenkins, N.A., and Gridley, T. (1994) Genomics 21, 194-201.
13. Schmidt, C.J., and Sladek, T.E. (1993) J. Biol. Chem. 268, 25681-25686.
14. Scala, L.A., Tirumalai, P.T., Piparo, K.E., and Howells, R.D. (1994) FASEB J. 8, A1419.
15. Choudhury, B.K., Kim, J., Kung, H.-F., and Li, S.S.-L. (1997) Gene (In Press).
16. Mallo, M., Franco del Amo, F., and Gridley, T. (1993) Mech. Dev. 42, 67-76.
17. Yochem, J., and Byers, B. (1987) J. Mol. Biol. 195, 233-245.
18. Fujita, A., Matsumoto, S., Kuhara, S., Misumi, Y., and Kobayashi, H. (1990) Gene 89, 93-99.
19. Williams, F.E., and Trumbly, R.J. (1990) Mol. Cell. Biol. 10, 6500-6511.
20. Mukai, Y., Harashima, S., and Oshima, Y. (1991) Mol. Cell. Biol. 11, 3773-3779.
21. Zhang, M., Rosenblum-Vos, L.S., Lowry, C.V., Boakye, K.A., and Zitomer, R.S. (1991) Gene 97, 153-161.
22. Swofford, D.L., and Olsen, G.J. (1990) Molecular Systematics. In Hillis, D.M., and Moritz, C. (Eds.), Sinauer, Sunderland, MA, 411-501.
23. Feng, D.F., and Doolittle, R.F. (1987) J. Mol. Evol. 25, 353-360.