cDNA cloning and genomic organization of enhancer of split groucho gene from nematode Caenorhabditis...
Vol. 43, No. 2, October 1997 BIOCHEMISTRY and MOLECULAR BIOLOGY INTERNATIONAL Pages 327-337
cDNA CLONING AND GENOMIC ORGANIZATION OF ENHANCER OF SPLIT GROUCHO GENE FROM NEMATODE CAENORHABDITIS ELEGANS
Farida S. Sharief1, Stephen C.-M. Tsoi1 and Steven S.-L. Li1,2,3
1Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
2Institute of Life Sciences, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424, ROC.
Received June 12, 1997
SUMMARY
The first genomic Enhancer of split groucho (ESG) gene and its full-length complementary DNA (cDNA) from the nematode C. elegans were cloned and sequenced via homology with the corresponding Drosophila groucho cDNA. The cDNA of 2.1 kb encodes a protein of 612 amino acids, and the nematode ESG protein is the smallest and the most divergent in structure among the ESG-related proteins. The isolated gene is 4,246 bp in size, including a 1,219-bp promoter region. A putative TATA-box at position -1166, two copies of the consensus sequence ACTGG, characteristic of leader binding protein-1 (LBP-1) binding motifs, at positions -563 and -211, and nine CAAT boxes were found in the promoter region of the ESG gene. The protein-coding sequence is interrupted by five introns. The lengths of introns 1 to 5 are 52, 252, 87, 53 and 518 bp, respectively. The overall structural relationships of the ESG-related proteins among human, mouse, rat, Xenopus, Drosophila and nematode were also analyzed.
INTRODUCTION
In the fruitfly Drosophila melanogaster, neurogenesis is
under the control of several loci, namely Enhancer of
split, Notch, Delta, mastermind, big brain and neuralized.
4The nucleotide sequences have been deposited in the GenBank database under accession nos. AF001271 and AF001272.
3To whom correspondence should be addressed at Institute of Life Sciences, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424, ROC. Tel: 886-7-525-2379; Fax: 886-7-525-2360; e-mail: [email protected].
1039-9712/97/020327-11 $05.00/0 Copyright © 1997 by Academic Press Australia. All rights of reproduction in any form reserved.
These neurogenic loci are required for the proper
segregation of neural and epidermal progenitor cells during
the formation of both the central and peripheral nervous
systems [1,2,3,4]. Mutations in these genes cause most
ventral ectodermal cells to become neuroblasts with little
or no formation of epidermis [5]. The Enhancer of split
gene complex contains at least 13 transcription units.
Molecular studies showed that most of these transcription
units encode proteins with basic helix-loop-helix (bHLH)
motif characteristic of certain transcription factors
[6,7,8]. Another structurally unrelated transcript, the
m9/10 group, was originally identified by a viable mutant,
groucho, which has specific head bristle duplications. The
groucho gene encodes a nuclear protein of 719 amino acids
containing repeats demarcated by Trp-Asp (WD-40 repeats), as found in
the β-subunit of guanine nucleotide binding proteins (G-proteins) [9].
In order to better understand the mechanisms regulating
invertebrate neurogenic genes during neural development, we
have cloned and sequenced the nematode ESG cDNA and
complete gene including the promoter region. We report
here the cDNA cloning, nucleotide and deduced amino-acid
sequences, genomic organization of nematode ESG gene, and
structural relationships among the groucho and related
proteins from human, mouse, rat, Xenopus, Drosophila and
nematode.
MATERIALS AND METHODS
Cloning and sequencing of nematode ESG cDNA and gene. A nematode embryo cDNA library in the Lambda Uni-ZAP XR vector (Stratagene) was screened using a probe generated from the mixed cDNA templates with PCR primers (5'-GCGTTGGCGATGTCACCAG-3' and 5'-GTCAAGACCACCTGACCAG-3') based on the partial ESG sequence of C. elegans (GenBank Accession no. T02011). After isolating the full-length cDNA, a specific probe from the 5' end was generated, using primers (5'-AAGGCATCGTATCTGG-3' and 5'-AATCCTCCAGCAGCACTA-3'), to screen the nematode genomic library in the Lambda FIX II vector (Stratagene). The probe was labeled with the digoxigenin system, and the positive clones were identified with the chemiluminescent detection system according to the procedure recommended by the manufacturer (Boehringer Mannheim). The inserts excised into the Bluescript SK(-) phagemid were labeled with the Dye Terminator kit (Perkin Elmer), and their nucleotide sequences were determined using an automated DNA Sequencer (Applied Biosystems Model 373A).
Construction of evolutionary tree from 12 groucho related proteins. The complete amino acid sequence of ESG from nematode was deduced from the cDNA sequence determined in this investigation. Eleven other groucho related proteins previously reported [9,10,11,12,13,14,15,16] were obtained from the GenBank database. The scientific names of the organisms and the accession numbers of the published groucho related sequences are as follows: human, Homo sapiens, AES/hesp (X73358/U04241), TLE1 (M99435), TLE2 (M99436), TLE3 (M99438); mouse, Mus musculus, AES/Grg (X73361/L12140), ESG (X73360); rat, Rattus norvegicus, R-esp1 (L14462), R-esp2 (L14463); fruitfly, Drosophila melanogaster (M20571); African frog, Xenopus laevis, AES (U18776), ESG (U18775). The amino acid sequences were aligned using the pileup program of the Wisconsin GCG package, based on the method of Feng and Doolittle [23]. The evolutionary tree (cladogram) was constructed from the distance matrix by the UPGMA method included in the Wisconsin GCG package.
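The UPGMA step described above can be illustrated with a minimal sketch. It is not the GCG implementation: the function name upgma, the labels seqA to seqD and the distance values are all hypothetical and chosen only to show how clusters are merged by size-weighted average distance.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns nested tuples (left, right, height); leaves are label strings.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Distances between active clusters, keyed by the pair of cluster ids
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d[pair] / 2.0
        (tree_a, n_a), (tree_b, n_b) = clusters[a], clusters[b]
        # Distance from the merged cluster to every other cluster is the
        # average weighted by cluster size (the defining rule of UPGMA)
        merged = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[frozenset((a, c))]
            d_bc = d[frozenset((b, c))]
            merged[c] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        # Drop all distances involving the two merged clusters
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        del clusters[a], clusters[b]
        for c, value in merged.items():
            d[frozenset((next_id, c))] = value
        clusters[next_id] = ((tree_a, tree_b, height), n_a + n_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Toy data (hypothetical, not taken from the alignment in this paper)
toy_labels = ["seqA", "seqB", "seqC", "seqD"]
toy_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
toy_tree = upgma(toy_labels, toy_dist)
print(toy_tree)
```

In this toy case seqA and seqB join first, seqC joins next, and seqD is the last sequence to join, which is the position an outgroup-like sequence takes in a UPGMA tree.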
RESULTS
Nematode ESG cDNA and Genomic sequences. Several cDNA and
genomic clones were isolated and the nucleotide sequences
of the insert DNAs were completely determined (Fig. 1).
Full length cDNA clones 2, 84 and 88 contain an insert of
2,147 nucleotides, including the protein-coding sequence of
1,813 nucleotides, 5' and 3' noncoding region of 15 and 251
nucleotides respectively, and a poly(A) tail of 22
nucleotides (complete cDNA and gene sequences were
deposited in the GenBank, Accession No. AF001271 and
AF001272). A putative polyadenylation signal AATAAA was
present 14 nucleotides 5' to the poly(A) tail. The
partial cDNA clones 5, 77 and 78 are 1,335 bp long, lacking
812 nucleotides at the 5' end of the full-length cDNA.
The genomic clone 110 contains 1,347 nucleotides,
including the 5' end promoter region of 1,219 nucleotides,
exon 1 and partial exon 2 (Fig. 2). A putative TATATAA-box
at position -1166, two leader-binding protein-1 (LBP-1)
binding motifs ACTGG at positions -563 and -211, and nine
putative CAAT boxes were found in the promoter region of
the nematode ESG gene. Genomic clones 108, 111, 113 and
116 contain 2,740 nucleotides, including complete exons 1
to 5, partial exon 6 and all 5 introns. The sizes of
introns 1 to 5 are 52, 252, 87, 53 and 518 bp, respectively,
while exons 1 to 6 contain 81, 144, 511, 198, 849 and 156
bp, respectively, giving a total of 4,246 bp for the
complete ESG gene (Fig. 2). A search of the nematode ACeDB
database at the Sanger Center by the BLAST E-mail server
detected 100% identity between a portion of the contig W02D3
sequence, containing 35,959 nucleotides located on
chromosome I, and our 4,246-bp nematode ESG gene sequence
determined in this investigation.
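Locating promoter motifs such as the TATATAA-box, ACTGG and CAAT at negative positions can be sketched as below. This is only an illustration under stated assumptions: the function find_motif, the short toy sequence and the choice of reference point (index of the +1 base) are hypothetical, only the forward strand is scanned, and the actual sequence is the one deposited in GenBank.

```python
def find_motif(seq, motif, plus_one_index):
    """Return positions of every motif occurrence relative to the +1 base.

    Upstream bases are numbered -1, -2, ... and there is no position 0,
    so the base at plus_one_index is +1.
    """
    positions = []
    start = seq.find(motif)
    while start != -1:
        if start < plus_one_index:
            positions.append(start - plus_one_index)
        else:
            positions.append(start - plus_one_index + 1)
        # Continue one base further so overlapping matches are also found
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence (hypothetical): 24 upstream bases followed by 6 downstream bases
toy_upstream = "GGTATATAAGCACTGGTTCAATCG"
toy_seq = toy_upstream + "ATGAAA"
toy_plus_one = len(toy_upstream)

for motif in ("TATATAA", "ACTGG", "CAAT"):
    print(motif, find_motif(toy_seq, motif, toy_plus_one))
```

Whether positions are counted from the transcription start or from the start codon changes every reported coordinate by a constant offset, so the reference point has to be stated alongside the positions.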
DISCUSSION
The deduced 612 amino acid sequence of nematode ESG
protein was aligned with those of Drosophila groucho [9]
and related proteins from human, mouse, rat and Xenopus
reported previously [10,11,13,14,15,16] (data not
presented). Nematode ESG protein is found to be the
smallest in size when compared with the human transducin-like
Enhancer of split (TLE3), mouse ESG, rat R-esp2,
Xenopus ESG1 and invertebrate Drosophila groucho,
which contain 772, 771, 741, 767 and 719 amino acids,
respectively. The amino acid sequence of nematode ESG
protein exhibits 47.71%, 47.71%, 47.22% and 45.1% identity
to human TLE3, mouse ESG, human TLE1 and Drosophila
groucho, respectively. The amino terminus of nematode ESG
protein is 17 amino acids shorter than that of human
TLE1 and TLE3, and 19 amino acids shorter than that of
Drosophila groucho. The central region of Drosophila
groucho and human TLE proteins was reported to contain
nuclear localization sequence (NLS), casein kinase II
(CKII) and cdc2 kinase (cdc2k) sites, and these proteins
were shown to be present in the nucleus [9,10]. It is
interesting to note that in nematode ESG and Drosophila
groucho, deletions at several positions were observed in
the central nuclear localization sites. Also, in the
central species-specific domain, there is a deletion of 23
and 33 amino acids in nematode ESG and rat R-esp2,
respectively, and an insertion of 9 amino acids in Xenopus
ESG1. In the highly conserved C-terminal domain, all
vertebrate and invertebrate species contain four WD-40
repeats at positions 557, 611, 655 and 697, except nematode
ESG, in which the third WD at position 655 is missing. Also,
after the first WD-40 repeat, there is an insertion of 7
amino acids at position 523 in nematode ESG, not found in
any other species. These WD-40 repeats were found in an
expanding group of unrelated proteins, including the yeast
proteins encoded by a cell cycle gene (CDC4), a mediator
of glucose repression gene (TUP1), a suppressor gene for
flocculation (SFL2), a1-α2 repression and cell control
(AAR1) and control of heme-regulated and catabolite-repressed
gene (AER2). The TUP1, SFL2, AAR1 and AER2 genes
were cloned on the basis of different phenotypes, but were
found to be identical [17,18,19,20,21].
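The percent identities quoted earlier in this discussion are computed from aligned sequences. A minimal sketch of one common convention follows, assuming that columns with a gap in either sequence are excluded from the denominator; the paper does not state which convention the GCG programs applied, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap character
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical residues, not from the ESG alignment)
toy_a = "MKT-LLVA"
toy_b = "MKSALL-A"
toy_identity = percent_identity(toy_a, toy_b)
print(round(toy_identity, 2))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, and these give lower values for the same alignment when gaps are present.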
The overall structural relationships among groucho and
related proteins from human, mouse, rat, Xenopus,
Drosophila and nematode were analyzed and the evolutionary
tree was constructed using the UPGMA method [22], as
illustrated in Fig. 3. The nematode groucho ancestor
evolved earlier than that of the fruitfly. Also, the nematode
groucho sequence was shown to be the best outgroup sequence for
studying the molecular evolution of groucho related
proteins by UPGMA. Two clear clusters, AES and ESG/TLE
proteins, were formed. The AES cluster was highly
conserved and was previously proposed to contain a leucine
zipper motif [11]. Within the ESG/TLE proteins, isoforms
originated by gene duplication. We propose that the
first gene duplication event occurred at the early
separation of the invertebrate and vertebrate ancestral groucho
related proteins. Later, at least four types of ESG/TLE
related proteins were identified from mammals
[10,11,12,13,14,16] and Xenopus [15], having arisen by two
independent gene duplication events. On the basis of UPGMA tree
analysis, the ESG/TLE type-2 groucho isoproteins
evolved earlier than the other isoforms and are
structurally closer to Xenopus ESG1 and fruitfly groucho.
Finally, it will be of interest in the future to clone the
homologs of all four human ESG/TLE related proteins from
lower vertebrates such as zebrafish in order to elucidate
the molecular evolution of the ESG/TLE groucho related gene
family by gene duplication events.
Tip labels of the tree (top to bottom): Mouse AES/Grg, Rat R-esp1, Human AES/hesp, Xenopus AES, Human TLE3, Mouse ESG, Human TLE1, Rat R-esp2, Drosophila Groucho, Nematode ESG.
FIG. 3. Evolutionary tree of 12 ESG and AES sequences. The evolutionary relationships among the amino acid sequences of 12 groucho related proteins are presented using the UPGMA method [22]. UPGMA indicated that nematode ESG branches off earlier than fruitfly groucho. Using nematode groucho ESG as an outgroup for the tree analysis, the four AES proteins are clustered into a group, and the groucho, ESG and TLE proteins are clustered into a separate group.
ACKNOWLEDGMENTS
This investigation was supported in part by NIEHS, National Institutes of Health, USA and in part by grants NSC85-2732-B-110-002 and NSC86-2313-B-110-002 from the National Science Council of Taiwan, ROC. S.S.-L. Li is a recipient of the Outstanding Professor Chair from the Foundation for the Advancement of Outstanding Scholarship in Taiwan, ROC.
REFERENCES
1. Artavanis-Tsakonas, S., Delidakis, C., and Fehon, R.G. (1991) Annu. Rev. Cell. Biol. 7, 427-452.
2. Cabrera, C.V. (1992) Development 115, 893-901.
3. Campos-Ortega, J.A., and Jan, Y.N. (1991) Neurosci. 14, 399-420.
4. Ghysen, A., Dambly-Chaudiere, C., Jan, L.Y., and Jan, Y.N. (1993) Genes Dev. 7, 723-733.
5. Lehman, R., Jimenez, F., Dietrich, U., and Campos-Ortega, J.A. (1983) Arch. Dev. Biol. 192, 62-74.
6. Delidakis, C., and Artavanis-Tsakonas, S. (1992) Proc. Natl. Acad. Sci. USA 89, 8731-8735.
7. Knust, E., Schrons, H., Grawe, F., and Campos-Ortega, J.A. (1992) Genetics 132, 505-518.
8. Schrons, H., Knust, E., and Campos-Ortega, J.A. (1992) Genetics 132, 481-503.
9. Hartley, D.A., Preiss, A., and Artavanis-Tsakonas, S. (1988) Cell 55, 785-795.
10. Stifani, S., Blaumueller, C.M., Redhead, N.J., Hill, R.E., and Artavanis-Tsakonas, S. (1992) Nature Genet. 2, 119-127.
11. Miyasaka, H., Choudhury, B.K., Hou, E.W., and Li, S.S.-L. (1993) Eur. J. Biochem. 216, 343-352.
12. Mallo, M., Steingrimsson, E., Copeland, N.G., Jenkins, N.A., and Gridley, T. (1994) Genomics 21, 194-201.
13. Schmidt, C.J., and Sladek, T.E. (1993) J. Biol. Chem. 268, 25681-25686.
14. Scala, L.A., Tirumalai, P.T., Piparo, K.E., and Howells, R.D. (1994) FASEB J. 8, A1419.
15. Choudhury, B.K., Kim, J., Kung, H.-F., and Li, S.S.-L. (1997) Gene (In Press).
16. Mallo, M., Franco del Amo, F., and Gridley, T. (1993) Mech. Dev. 42, 67-76.
17. Yochem, J., and Byers, B. (1987) J. Mol. Biol. 195, 233-245.
18. Fujita, A., Matsumoto, S., Kuhara, S., Misumi, Y., and Kobayashi, H. (1990) Gene 89, 93-99.
19. Williams, F.E., and Trumbly, R.J. (1990) Mol. Cell. Biol. 10, 6500-6511.
20. Mukai, Y., Harashima, S., and Oshima, Y. (1991) Mol. Cell. Biol. 11, 3773-3779.
21. Zhang, M., Rosenblum-Vos, L.S., Lowry, C.V., Boakye, K.A., and Zitomer, R.S. (1991) Gene 97, 153-161.