Genomic organization, transcript variants and comparative analysis of the human nucleoporin 155 (NUP155) gene
Xiuqing Zhang (a,b), Huanming Yang (a,b,*), Jun Yu (a,c), Cong Chen (a), Guangyu Zhang (a), Jingyue Bao (a), Yutao Du (a), Miho Kibukawa (c), Zhijie Li (a,b,d), Jun Wang (a), Songnian Hu (a), Wei Dong (a), Jian Wang (a), Niels Gregersen (d), Erik Niebuhr (e), Lars Bolund (b)
(a) Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Datun Road, Beijing 100101, China
(b) Institute of Human Genetics, Aarhus University, Aarhus, Denmark
(c) Human Genome Center, University of Washington, Seattle, WA, USA
(d) Research Unit for Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
(e) Department of Medical Genetics, IMBG, Copenhagen University, Copenhagen, Denmark
Received 6 August 2001; received in revised form 21 January 2002; accepted 4 February 2002
Received by E. Sverdlov
Abstract
Nucleoporin 155 (Nup155) is a major component of the nuclear pore complex (NPC) involved in cellular nucleo-cytoplasmic transport.
We have acquired the complete sequence and interpreted the genomic organization of the Nup155 orthologs from human (Homo sapiens)
and pufferfish (Fugu rubripes), which are approximately 80 and 8 kb in length, respectively. The human gene is ubiquitously expressed in
the tissues analyzed and has two major transcript variants, resulting from alternative usage of the 5′ cryptic or consensus splice donor in
intron 1 and of two polyadenylation signals. We have also cloned DNA complementary to RNAs of the Nup155 orthologs from Fugu and
mouse. Comparative analysis of the Nup155 orthologs in many species, including H. sapiens, Mus musculus, Rattus norvegicus, F. rubripes,
Arabidopsis thaliana, Drosophila melanogaster, and Saccharomyces cerevisiae, has revealed two paralogs in S. cerevisiae but only a single
gene, with an increasing number of introns, in more complex organisms. The amino acid sequences of the Nup155 orthologs are highly
conserved in the evolution of eukaryotes. Different gene orders in the human and Fugu genomic regions harboring the Nup155 orthologs
advocate cautious interpretation of synteny in comparative genomic analysis even within the vertebrate lineage. © 2002 Elsevier Science
B.V. All rights reserved.
Keywords: Nuclear pore complex (NPC); Shotgun sequencing; DNA complementary to RNA cloning; Gene order; Nucleoporin 155 gene orthologs
1. Introduction
Nucleoporins are major components of the nuclear pore
complex (NPC). They are involved in regulating bi-direc-
tional trafficking of cellular macromolecules, especially
messenger RNAs (mRNAs) and proteins, between the nucleus
and cytosol (Bagley et al., 2000; Gorlich and Mattaj, 1996).
Studies on yeast and Drosophila have revealed that most of
the nucleoporin genes are essential for survival (Fabre and
Hurt, 1997; Kiger et al., 1999). More than 30 nucleoporins
have been identified in yeast, Drosophila, Arabidopsis,
Tritrichomonas, Fugu, zebrafish, rat, mouse, and humans
(Doye and Hurt, 1995; Kosova et al., 1999; Miller et al.,
2000; Belgareh et al., 2001). Malfunction of nucleoporins
has been suggested to be pathogenic in humans. For exam-
ple, overexpression of human CAN/Nup214, a well studied
nucleoporin and putative oncogene associated with myeloid
leukemia, was demonstrated to induce nucleo-cytoplasmic
transport defects, cell growth arrest, and apoptosis (Boer et
al., 1998; van Deursen et al., 1996). The disruption of the
human NUP98 gene and/or a produced fusion protein
appeared related to de novo childhood acute myeloid leuke-
mia (Jaju et al., 2001).
Gene 288 (2002) 9–18
0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0378-1119(02)00470-5
www.elsevier.com/locate/gene
Abbreviations: Nup155, nucleoporin 155 gene; NPC, nuclear pore complex; ORFs, open reading frames; RACE, rapid amplification of cDNA ends; kb, kilobase(s); kDa, kilodalton(s); cDNA, DNA complementary to RNA; mRNA, messenger RNA; EST, expressed sequence tag; BAC, bacterial artificial chromosome; UTR, untranslated region; RT-PCR, reverse transcription and PCR amplification; ARVCF, Armadillo Repeat gene deleted in Velo-Cardio-facial syndrome; BMP10, bone morphogenic protein 10 gene; LINEs, long interspersed elements; SINEs, short interspersed elements; LTR, long-terminal repeat
Sequence data from this article have been deposited with EMBL/GenBank Data Libraries under Accession Nos. AJ007558 (human NUP155 cDNA), AF165926 (a BAC clone containing human NUP155 gene), AF322375 (mouse Nup155 cDNA), AF301600 (Fugu Nup153 cDNA) and AF301601 (a cosmid clone containing Fugu Nup153 gene).
* Corresponding author. Tel.: +86-10-6487-1664; fax: +86-10-6488-9329.
E-mail address: [email protected] (H. Yang).
We have previously reported the identification of the full-
length human nucleoporin 155 gene (NUP155) DNA
complementary to RNA (cDNA) by exon trapping and in
silico cloning (Zhang et al., 1999). The gene was localized
to the 5p13 region, which might be involved in mental and
developmental retardation as observed in a collection of 5p-
syndrome patients. In this report we present the complete
sequence and genomic organization of the human NUP155
gene as well as a description of its transcript variants. We
have also cloned cDNAs of the Nup155 orthologs from
Fugu and mouse. The Fugu gene has a similar genomic
organization but much smaller introns than its human ortho-
log. We have further compared the neighboring genes
around the orthologous loci of Fugu and human NUP155.
To our surprise, no obvious common synteny is found to
exist between these two organisms because the five open
reading frames (ORFs), or genes, around the Fugu Nup155
ortholog are located in three different chromosome regions
in the human genome.
2. Materials and methods
2.1. Screening for human bacterial artificial chromosome
(BAC) clones containing the NUP155 gene
We screened the RPCI-11 Human Male BAC Library
(http://www.chori.org/bacpac/11framehmale.htm) with
three specific primer pairs designed from the most 5′ and
3′ regions as well as the middle part of the human NUP155
cDNA sequence by PCR on the DNA pools of the library.
Sequences of the primers are as follows (listed as the
forward and reverse primers with PCR product size and
primer annealing temperature in parentheses): 5′ region:
AGAACGGCGTCTTCCAGTTCC and AACAAGAAAAGATCCAAGAAG (127 bp, 55 °C);
3′ region: TGTGCCTGGCCTATTCCCTTC and AAGTAGACATGACAGAATTTTA (368 bp, 55 °C);
and middle region: TTCCTGGGCTCTTTCTGCG and GAGAGGAGAACAAATTTCTTC (141 bp, 55 °C).
Clones positive for at least two of the three primer pairs were
further tested by multiple restriction analysis with enzymes BglII,
EcoRI, and NsiI, respectively, using the software developed at the
Human Genome Center, University of Washington (Wong et al., 1997). The
clone that possessed the biggest insert, and was positive for
all three primer pairs, was selected for shotgun sequencing.
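The selection rule described above (shortlist clones positive for at least two primer pairs, then sequence the largest clone positive for all three) can be sketched as follows. This is only an illustration: the clone names, insert sizes and primer-pair labels are hypothetical, not the actual screening data.

```python
from dataclasses import dataclass

# Labels for the three primer pairs (5' region, middle region, 3' region).
# The label strings are illustrative, not taken from the paper.
ALL_PAIRS = frozenset({"5p", "middle", "3p"})


@dataclass(frozen=True)
class BacClone:
    name: str
    insert_kb: float           # insert size estimated from restriction fragments
    positive_pairs: frozenset  # primer pairs that gave a PCR product


def shortlist(clones):
    """Clones positive for at least two of the three primer pairs."""
    return [c for c in clones if len(c.positive_pairs & ALL_PAIRS) >= 2]


def pick_for_sequencing(clones):
    """Largest-insert clone among those positive for all three primer pairs."""
    complete = [c for c in shortlist(clones) if ALL_PAIRS <= c.positive_pairs]
    return max(complete, key=lambda c: c.insert_kb) if complete else None


# Hypothetical screening results, for illustration only.
clones = [
    BacClone("clone_A", 120.0, frozenset({"5p", "middle"})),
    BacClone("clone_B", 150.0, frozenset({"5p", "middle", "3p"})),
    BacClone("clone_C", 160.0, frozenset({"5p", "middle", "3p"})),
    BacClone("clone_D", 90.0, frozenset({"3p"})),
]
shortlisted = shortlist(clones)
chosen = pick_for_sequencing(clones)
```

With these made-up inputs, three clones pass the first filter and the largest fully positive clone is chosen for sequencing.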
2.2. Shotgun sequencing and data analysis
The selected BAC clone was sequenced by a shotgun
strategy (Bouck et al., 1998). The BAC DNA was prepared
by a modified alkaline lysis method described previously
(Wong et al., 1997) and sheared by sonication. The resulting
fragments were end-repaired with T4 DNA polymerase
(New England Biolabs). The 1.6–3.0 kb DNA fragments
were selected by agarose gel electrophoresis, purified with
the Wizard PCR Preps DNA Purification System (Promega)
and cloned into SmaI-digested and calf intestinal alkaline
phosphatase-treated plasmid pUC18. The ligation mixture
was used to transform DH5α competent cells (Gibco BRL).
The recombinant plasmid DNA was prepared in a system
with 96-well plates according to protocols developed at the
Human Genome Center, University of Washington (http://
www.genome.washington.edu/UWGC/protocols) and sequ-
enced using dye terminator or dye primer chemistry with
ABI 377 automated sequencers following the protocol
provided by the manufacturer. The Phred program was
used to call the bases from the ABI trace data and to assign
quality values. The Phrap program was used to assemble
sequence traces into contiguous sequences, which were
viewed using the Consed program (Ewing and Green,
1998; Ewing et al., 1998; Gordon et al., 1998; Smith et
al., 1996). Gaps in the assembly were closed experimentally
by using primer walking based on the sequence information
from the assembled contig ends and from PCR products
spanning the gap region. Finally, exon/intron boundaries
were identified by comparison of the assembled consensus
sequences with cDNA sequences using the Cross-Match
program (P. Green, unpublished results, http://www.genome.
washington.edu). The assembly was further confirmed by
comparison of the computer-predicted restriction sites in
the sequence with the size of restriction fragments estimated
by complete digestion with multiple enzymes. The genomic
sequence was searched against databases to identify other
potential genes in the BAC.
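For readers unfamiliar with the quality values assigned by Phred, the standard relation between a quality value Q and the estimated base-call error probability P is Q = -10 log10(P) (Ewing and Green, 1998). The sketch below shows that conversion and how per-base qualities sum to an expected error count; the function names are our own and are not part of the Phred, Phrap or Consed programs.

```python
import math


def phred_to_error_prob(q):
    """Estimated error probability for a Phred quality value q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality value for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases, given per-base quality values."""
    return sum(phred_to_error_prob(q) for q in qualities)


# A base with Q = 20 has an estimated 1% chance of being wrong,
# so 100 such bases are expected to contain about one error.
q20_errors = expected_errors([20] * 100)
```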
2.3. Screening and sequencing of Fugu genomic and cDNA
clones
The Fugu whole genome and cDNA-K libraries on high-
density membranes were purchased from UK HGMP
Resource Center (http://www.hgmp.mrc.ac.uk). The Fugu
clones were screened with the human NUP155 cDNA
clone as a probe, which was labeled with α-32P-dCTP
using nick translation with Prime-It RmT Random Primer
Labelling Kit (Stratagene). The biggest Fugu cosmid clone
and all the cDNA clones obtained were sequenced and
analyzed as described above.
2.4. cDNA cloning and alternative splicing analysis using
reverse transcription and PCR amplification (RT-PCR)
For Fugu cDNA cloning, nine PCR primer pairs were
designed based on Fugu genomic sequences (listed as
forward and reverse primers with predicted PCR sizes
from genomic and deduced cDNA sequences, respectively,
and primer annealing temperature in parentheses):
(1) ATGCCTTCCAGCGCTGGACCCAAC and TTTACAAGACCTACAGCCAGAATG (809, 437 bp, 60 °C);
(2) CATTCTGGCTGTAGGTCTTGTAAA and ACGACTCATTCCCTGACCATC (909, 505 bp, 60 °C);
(3) GATGGTCAGGGAATGAGTCGT and GATGGGGATCAGCTCTTTGTT (858, 543 bp, 60 °C);
(4) AACAAAGAGCTGATCCCCATC and CACTGTGAAACCTCTCTGTCACAA (437, 275 bp, 60 °C);
(5) CTTGTGACAGAGAGGTTTCACAGT and GAATGTCTGATTTCCCTTGGTTAT (640, 344 bp, 60 °C);
(6) TTATCTTCTCAGGCAAACACAATG and CTTAAAACCAACTCCCTTCATCTG (954, 512 bp, 60 °C);
(7) CAGATGAAGGGAGTTGGTTTTAAG and ACTCTGTCATCCTCGGGCTCT (961, 431 bp, 60 °C);
(8) AGAGCCCGAGGATGACAGAGT and GCTGCAGCTCTTCTCGTAGTACC (687, 412 bp, 60 °C); and
(9) CTGTGGCGGTACTACGAGAAG and GTCCATGAGCTCAGAGTCCAACT (483, 321 bp, 60 °C).
The primers were first
tested by PCR on the Fugu cosmid clones. RT-PCR was
performed with Access RT-PCR System as described by
the manufacturer (Promega) using total RNA prepared
from Fugu liver tissue with the SV total RNA Isolation
System (Promega). PCR products were cloned into plasmid
pUC18 for sequencing. The alternative splicing analysis
was performed on total RNA from human lymphoblastoid
cell lines, mouse cell lines (F9 and NIH/3T3) and rat cell
lines (PC12 and PC13). The forward primer (CAAGAGGACCGCATGTACCCG)
and reverse primer (CACAGCAAGAATAGTCTCACT)
were derived from sequences
located inside exons 1 and 4 of the human NUP155 gene,
respectively.
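Two simple checks on the Fugu primer list can be sketched in code. First, the difference between the predicted genomic and cDNA product sizes of each pair gives the amount of intronic sequence spanned by that amplicon. Second, taking the reverse complement of a reverse primer shows where it sits relative to the next forward primer; for pairs 1 and 2 the two sequences coincide exactly. The sizes and sequences below are those listed in the text, with line-break hyphens removed; the variable and function names are our own.

```python
# (genomic product size, cDNA product size) in bp for Fugu primer pairs 1-9.
FUGU_AMPLICONS = [
    (809, 437), (909, 505), (858, 543), (437, 275), (640, 344),
    (954, 512), (961, 431), (687, 412), (483, 321),
]

# Intronic sequence spanned by each amplicon: genomic size minus cDNA size.
intronic_bp = [genomic - cdna for genomic, cdna in FUGU_AMPLICONS]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


PAIR1_REVERSE = "TTTACAAGACCTACAGCCAGAATG"
PAIR2_FORWARD = "CATTCTGGCTGTAGGTCTTGTAAA"

# True when the reverse primer of pair 1 and the forward primer of pair 2
# anneal to the same stretch of sequence on opposite strands.
pairs_abut = reverse_complement(PAIR1_REVERSE) == PAIR2_FORWARD
```

Every genomic product is larger than its cDNA counterpart, which is what one expects when each amplicon spans at least one intron.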
2.5. Mouse cDNA cloning by 5′-rapid amplification of
cDNA ends (RACE) and in silico walking
Sequences of the human NUP155 cDNA were used for a
BLAST search against the mouse expressed sequence tag
(EST) database. The matched mouse ESTs were used to
design PCR primers. The 5′-RACE was performed with a
kit from Gibco BRL. Briefly, first-strand synthesis was
performed using 0.4 μg poly(A)+ RNA from mouse NIH/3T3
cell lines (ATCC) and the mouse Nup155 specific
primer (TGGCGAATCATTATTGGAA). The tailed cDNA
5′-end was amplified with a nested primer
(CAGGAATGACCATCAACAC) as well as an abridged anchor primer
provided by the manufacturer. The PCR product was cloned
into the pCR II vector (Original TA Cloning Kit, Invitrogen).
The selected clones were sequenced and the cDNA
sequence was extended by primer walking as described
previously (Zhang et al., 1999).
3. Results
3.1. Sequence assembly and analysis of a human BAC clone
containing the NUP155 gene
We initially identified five human BAC clones from the
Human Male RPCI-11 BAC Library by PCR screening.
Two of the clones were demonstrated to be positive with
all the three primer pairs, indicating that they contained
most, if not all, of the genomic sequences of the human
NUP155 gene. After restriction analysis, the biggest
clone, RP11-085O06, estimated to be 166.4 kb in length
according to the sum of restriction fragments, was chosen
for sequencing.
In all, over 2.2 Mb shotgun sequence data were assembled
into a single consensus sequence of 165.6 kb. The high
quality assembly had not only a very low error rate (0.06/
10 kb, estimated with Phred), but also a high fidelity since
all its computer-simulated restriction fragments matched
perfectly to the experimental data from restriction analysis.
The difficulties in assembly turned out to be caused by the
extremely high frequency of interspersed repeats in this
BAC clone (Table 1).
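The figures quoted above imply a sequencing depth and an expected number of residual errors that are easy to work out. The sketch below treats "over 2.2 Mb" as exactly 2.2 Mb and takes 165.6 kb as 165,600 bp, so the results are approximate lower bounds rather than values reported by the authors.

```python
def fold_coverage(read_bases, assembly_bases):
    """Average number of times each assembled base was sequenced."""
    return read_bases / assembly_bases


def expected_errors_in_assembly(errors_per_10kb, assembly_bases):
    """Expected number of errors in the consensus at a given error rate."""
    return errors_per_10kb * assembly_bases / 10_000


SHOTGUN_BASES = 2_200_000   # "over 2.2 Mb" of shotgun data (lower bound)
ASSEMBLY_BASES = 165_600    # 165.6 kb consensus sequence
ERRORS_PER_10KB = 0.06      # error rate estimated with Phred

coverage = fold_coverage(SHOTGUN_BASES, ASSEMBLY_BASES)
errors = expected_errors_in_assembly(ERRORS_PER_10KB, ASSEMBLY_BASES)
```

Under these assumptions the clone was covered roughly 13-fold, and the stated error rate corresponds to about one expected error over the whole consensus.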
The assembled BAC sequence contained the complete
human NUP155 gene and an incomplete sequence of the
FLJ10233 gene. The full NUP155 gene was approximately
80 kb in length, covering the whole cDNA sequence. The
FLJ10233 gene was located upstream of the NUP155 gene and
transcribed in the opposite direction. The two genes were
separated by only 8 kb of DNA sequence, presumably containing
the 5′ promoter regions of both genes. Promoter prediction
(http://www.fruitfly.org/seq_tools/promoter.html)
revealed that the transcription start site with the highest
score (0.99) would be located 757 bp upstream of the start
codon (ATG). The TATA box and other promoter elements
were not significant. A definitive determination of the tran-
scription start site and regulatory elements will require func-
tional testing.
Table 1
The contents of main interspersed repeats in the human BAC clone RP11-085O06

                                                                      Fraction (%)
Repeat type                  Total number    Total bp      RP11-085O06    Whole human
                             of elements     of repeats                   genome (a)
SINEs                        262             68,519        41.37          13.14
  (ALUs)                     (252)           (67,057)      (40.49)        (10.60)
LINEs                        39              15,988        9.65           20.42
LTR elements                 22              10,800        6.52           8.29
DNA elements                 18 (b)          4,178         2.52           2.84
Total interspersed repeats   341             99,485        60.07          44.83

(a) International Human Genome Sequencing Consortium, 2001.
(b) Including 11 MER1, three MER2, and two Mariner elements.
Analysis of the completely assembled sequence also
revealed that the BAC clone contained all major types of
interspersed repeats in high numbers (Table 1), as defined
by RepeatMasker (http://ftp.genome.washington.edu/
RM.RepeatMasker.html). The overall repeat content was
60.07%, in contrast to that of the whole human genome
(44.83%) (International Human Genome Sequencing
Consortium, 2001). The Alu repeats accounted for about
40%, in contrast to 10.60% in the whole human genome
(Table 1). The genomic segment had a GC-content of 42%,
close to the genome-wide average (41%) of humans (Inter-
national Human Genome Sequencing Consortium, 2001).
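The contrast with the genome-wide averages can be made explicit by dividing each fraction in Table 1 by the corresponding whole-genome value. The sketch below uses only the numbers from Table 1; the dictionary layout and the name `enrichment` are our own.

```python
# Repeat class -> (elements, bp, % of BAC clone, % of whole human genome),
# all taken from Table 1.
TABLE_1 = {
    "SINEs":        (262, 68_519, 41.37, 13.14),
    "LINEs":        (39, 15_988, 9.65, 20.42),
    "LTR elements": (22, 10_800, 6.52, 8.29),
    "DNA elements": (18, 4_178, 2.52, 2.84),
}
ALUS = (252, 67_057, 40.49, 10.60)   # subset of the SINEs row
TOTAL = (341, 99_485, 60.07, 44.83)  # "Total interspersed repeats" row

# Fold enrichment of each class in the BAC relative to the whole genome.
enrichment = {name: bac / genome for name, (_, _, bac, genome) in TABLE_1.items()}
enrichment["ALUs"] = ALUS[2] / ALUS[3]
enrichment["Total"] = TOTAL[2] / TOTAL[3]

# Consistency checks: the four classes should add up to the totals row.
total_elements = sum(row[0] for row in TABLE_1.values())
total_bp = sum(row[1] for row in TABLE_1.values())
```

On these figures Alu elements are nearly four times as frequent in the clone as in the genome overall, whereas LINEs are less than half as frequent.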
3.2. Tissue specific expression and alternative transcription
of the human NUP155 gene
Our previous analysis demonstrated that the human
NUP155 gene was expressed at different levels in all
eight tissues tested (heart, brain, placenta, lung, liver, skeletal
muscle, kidney, and pancreas) with two universal
variants, approximately 5.4 and 4.7 kb in length (Zhang et
al., 1999). It was postulated that this might be due to alternative
usage of two 3′ polyadenylation signals, which were
743 bp away from each other.
In a more detailed sequence analysis, we found another
size difference in the 5′ part of the transcripts. The cDNA
cloned from a testis cDNA library and two published ESTs
(Accession Nos. AA644462 and AL045174) did not contain
a 120 bp segment that was present in another NUP155 cDNA
clone (Accession No. NM_004298). To determine whether the
sequence discrepancy was due to alternative splicing of the 5′
sequences, we performed an RT-PCR analysis based on
the sequences of exons 1 and 4 of the NUP155 gene (Fig. 1A).
The results revealed two possible transcripts with a size
difference of 155 bp. Sequence data from the RT-PCR products
showed that the 155 bp difference resulted from alternative
usage of a 5′ cryptic splice donor signal (tcag/GTTTTT)
inside intron 1, located 155 bp downstream
of the 5′ consensus splice donor sequence (ccaa/GTGAGT)
of intron 1 (Fig. 1B). The smaller transcript,
which uses the consensus splice signal, seemed to be the
major species as judged from the intensity of the PCR products
on agarose gels (Fig. 1A). The results were consistent with
those obtained by RT-PCR from mouse and rat cell lines,
where only the shorter transcript variant was identified.
3.3. Isolation and analysis of a Fugu cosmid containing the
genomic sequence of the Nup155 ortholog
We screened a Fugu cosmid genomic library with the
human NUP155 cDNA as probe and obtained three positive
clones. The clone, 78-K9, with the biggest insert as esti-
mated by restriction analysis, was shotgun-sequenced. The
final assembly gave a sequence of 43.5 kb, containing the
entire Fugu ortholog of the human NUP155 gene. The Fugu
gene was identified to be approximately 8 kb long and to
have 33 exons (Fig. 2).
Analysing the Fugu genomic sequence at both the DNA
and protein levels, we came to the conclusion that there
were five intact or incomplete ORFs over the length of
this cosmid (Fig. 3). The first 5.0 kb of the cosmid sequence
encoded an incomplete ORF that was 85% identical at the
amino acid sequence level to the human IDN3 gene (Accession
No. NP_056199) at 5p13.3. This similarity was
even higher than that of the Fugu Nup155 ortholog situated
immediately downstream (from positions 6.0 to 15.0 kb in
the cosmid sequence), which was 83% identical to the
human NUP155 gene at 5p13.3. The region from positions
19.0 to 25.0 kb contained the third intact ORF, with
about 72% amino acid identity to the human KIAA1292
gene (Accession No. XP_000748) at 22q11.21. The
fourth ORF, from positions 30.0 to 36.0 kb, was 78% identical
to the human Armadillo Repeat gene deleted in Velo-Cardio-facial
syndrome (ARVCF) (Accession No.
NP_001661) at 22q11.21. The last ORF (positions
38.0–42.0 kb) showed 67% identity to the human bone
morphogenic protein 10 (BMP10) gene (Accession No.
NP_055297) at 2p14. ARVCF is a member of the
catenin family of genes, which play crucial roles in the formation
of adherens junction complexes thought to facilitate
communication with the outside environment of a cell
(Sirotkin et al., 1997). All five genes were determined
to be transcribed in the same direction according to their
cDNA or EST sequences (Fig. 3).
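The gene-order comparison can be summarized by listing the five cosmid ORFs in order and collapsing consecutive ORFs whose human counterparts share a cytogenetic band. The sketch below uses the positions, identities and human locations given in the text; treating the first ORF as spanning 0 to 5.0 kb is our reading of "the first 5.0 kb", and the helper names are our own.

```python
import re

# (human counterpart, start kb, end kb, human band, % amino acid identity),
# listed in their order along the Fugu cosmid 78-K9.
FUGU_COSMID_ORFS = [
    ("IDN3", 0.0, 5.0, "5p13.3", 85),
    ("NUP155", 6.0, 15.0, "5p13.3", 83),
    ("KIAA1292", 19.0, 25.0, "22q11.21", 72),
    ("ARVCF", 30.0, 36.0, "22q11.21", 78),
    ("BMP10", 38.0, 42.0, "2p14", 67),
]


def human_chromosome(band):
    """Chromosome name from a cytogenetic band such as '22q11.21'."""
    match = re.match(r"(\d+|X|Y)[pq]", band)
    if match is None:
        raise ValueError(f"unrecognized band: {band}")
    return match.group(1)


def regions_in_cosmid_order(orfs):
    """Human bands met along the cosmid, merging consecutive repeats."""
    regions = []
    for _name, _start, _end, band, _identity in orfs:
        if not regions or regions[-1] != band:
            regions.append(band)
    return regions


regions = regions_in_cosmid_order(FUGU_COSMID_ORFS)
chromosomes = {human_chromosome(band) for band in regions}
```

Five genes lying within about 42 kb in Fugu thus map to three regions on three different human chromosomes.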
Fig. 1. Alternative usage of a 5′ cryptic splice donor signal in intron 1. Total
RNA was isolated from human lymphoblastoid cell lines from seven individuals
(1–7), mouse cell lines F9 and NIH/3T3 (8, 9), and rat cell lines
PC12 and PC13 (10, 11). The RT-PCR products were analyzed on an
agarose gel (A). The middle and last lanes were loaded with DNA Molecular
Weight Marker VIII from Roche (M). The weaker bands are believed
to represent an alternatively spliced product due to a 5′ cryptic splice donor
signal in intron 1 (RT-PCR 1). The major bands represent the product using
the consensus splice signal of intron 1 (RT-PCR 2). The predicted sizes of
the RT-PCR products are labeled on the left. A schematic interpretation of
the alternative splicing process involving the first four exons (E1–E4) is
illustrated in (B). The positions of the forward (F) and reverse (R) primers
are indicated by arrows.
3.4. Cloning of Nup155 cDNA orthologs from Fugu and
mouse
RT-PCR assays were performed using Fugu liver mRNA
and primers designed based on the Fugu 78-K9 cosmid
sequence. The corresponding sizes of both RT-PCR and
genomic PCR products were in perfect agreement with the
deduced cDNA sequence of the Fugu Nup155 ortholog and
its genomic sequence. Finally, sequences of all the RT-PCR
products were assembled into a contig of 4316 bp for the
cDNA of the Fugu Nup155 ortholog.
The corresponding mouse cDNA fragments were
obtained by 5′-RACE and EST walking on mRNA isolated
from the mouse NIH/3T3 cell line. Sequences from overlapping
PCR products and/or clones were assembled into a
contig of 4361 bp for the mouse cDNA. The sequence of its
open reading frame (ORF) is highly homologous to the rat
Nup155 cDNA (94% identical at the nucleotide level and 98% at the
amino acid level), except for an insertion of a codon for
serine at position 18, which makes it the same size as the
human ortholog. Its predicted amino acid sequence is 96%
identical to that of the human NUP155 gene (Fig. 4).
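Percent identity figures such as those quoted above are computed from a pairwise alignment. The sketch below shows one common definition: identical residues divided by the number of aligned columns in which neither sequence has a gap. The source does not state which definition or program was used for the reported values, so this is an assumption, and the two short sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns that contain no gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (not real Nup155 sequence): one mismatch, one gapped column.
toy_identity = percent_identity("MPSSAG-PN", "MPSAAGTPN")
```

Other conventions (for example, counting gapped columns in the denominator) give slightly different percentages, which matters when comparing values across studies.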
4. Discussion
We have previously reported the cloning and character-
ization of a full-length cDNA of the human NUP155 gene
and localization of the gene to the 5p13 region (Zhang et al.,
1999). In the present study, we have sequenced a BAC clone
containing the whole human NUP155 gene and identified its
complete genomic sequence and organization. We have also
Fig. 3. Comparison of the genome regions with Nup155 and neighboring genes in human and Fugu. The directions of transcription are indicated by arrows.
Fig. 2. Comparison of the genomic organization of the human and Fugu Nup155 orthologs. Only the 5′ end and the regions around exons 17 and 36 (E17 & E36,
in solid boxes), and introns 6, 17 and 35 (I6, I17 & I35, in thicker lines), which are absent from the Fugu gene, are drawn to scale in the human gene. The sizes
of the biggest introns, (I1 in human and I21 in Fugu) are indicated.
Fig. 4. A multiple alignment of predicted amino acid sequences of the Nup155 orthologs in Homo sapiens (Man), M. musculus (Mouse), R. norvegicus (Rat), Fugu rubripes (Fugu), Arabidopsis thaliana
(Arabidopsis), Drosophila melanogaster (Drosophila), and S. cerevisiae (Yeast). The conserved regions or amino acids with various homology are emphasized by dark or light shading.
cloned and sequenced a Fugu cosmid containing the
Nup155 ortholog and another four genes, as well as
cDNAs of the Nup155 ortholog in Fugu and mouse. We
have also studied alternative transcript variants of the
human NUP155 gene and performed comparative analysis
of the genomic organization and gene order in the region
containing the Nup155 orthologs in different species.
4.1. The genomic organization and alternative transcripts of
the human NUP155 gene
The human NUP155 gene has 36 exons according to the
alignment with the cDNA sequence (Fig. 2). The biggest
exon, exon 1, which contains the 5′ untranslated region
(5′ UTR) and the translation start site
(ATG), is 588 bp in length or even longer, since the transcription
start site could be further upstream. The smallest exon,
exon 17, which is not present in the Fugu gene, is only 63 bp
in length. The biggest intron, intron 1, is 6281 bp in length,
whereas the smallest, intron 26, is only 213 bp in length. The
total size of all the introns is 75.705 kb. All the exon-intron
boundaries are conserved except the 5′ splice donor signal
of intron 35 (GC instead of GT), which, together with the
sequence corresponding to the human exon 36, is not
present in the Fugu gene.
Our previous Northern analyses and EST-derived information
have suggested that the human NUP155 gene is
widely expressed (eight tissues tested). We also found that
there are two main transcripts of the gene, around 5.4 and
4.7 kb in length, and suggested that they may result from
alternative usage of the two polyadenylation signals (Zhang
et al., 1999). In the present analysis we show, on the basis of
the RT-PCR results, that the alternative transcripts also
involve alternative usage of a 5′ cryptic splice donor
signal inside intron 1 (Fig. 1). The bigger and less abundant
PCR product (494 bp), which results from usage of the 5′
cryptic splice donor signal, might constitute a minor
species of the transcripts that is difficult to detect by
Northern analysis. Usage of the 5′ cryptic splice
donor signal inside intron 1 would create an in-frame stop
codon, so that a second ATG would have to be used as the
translation start site. A truncated gene product of 149 kDa
would be predicted, which is much smaller than the protein
that was characterized in rat (Radu et al., 1993).
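The sizes of the two RT-PCR products and the notion of an in-frame stop codon can be illustrated with a short sketch. The 339 bp size of the smaller product is derived here by subtracting the 155 bp difference from the 494 bp product and is not a figure quoted in the text; the stop-codon scan runs on invented toy sequences, not on the NUP155 transcript.

```python
CRYPTIC_PRODUCT_BP = 494   # larger product, cryptic donor used
EXTRA_BP = 155             # extra sequence retained from intron 1
CONSENSUS_PRODUCT_BP = CRYPTIC_PRODUCT_BP - EXTRA_BP   # derived: 339 bp

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def first_in_frame_stop(seq, frame_start=0):
    """Index of the first stop codon in the given reading frame, or None."""
    seq = seq.upper()
    for i in range(frame_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy sequences (invented): the first has TGA as its third codon,
# the second has no stop codon in frame 0.
with_stop = first_in_frame_stop("ATGGCCTGAAAA")
without_stop = first_in_frame_stop("ATGGCCTTGAAA")
```

A scan of this kind over the retained 155 bp, read in the frame set by the first ATG, is the sort of check that would reveal the premature stop described above.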
4.2. Comparative analyses of the Nup155 orthologs and
their evolution
Although the coding sequences of the Nup155 gene and its
orthologs are highly conserved (Fig. 4), the genomic orga-
nization has undergone many significant changes during the
evolution of eukaryotes. Firstly, there are two Nup155 para-
logs, Nup170 and Nup157, in yeast. Both are major consti-
tuents of the yeast nuclear pore complex. Although the
function of the yeast Nup170, which encodes a specialized
nucleoporin with a unique role in chromosome segregation
and possibly kinetochore function (Kerscher et al., 2001), is
replaceable with rat Nup155, its complete deletion gives rise
to a synthetic lethal phenotype (Aitchison et al., 1995a,b).
Strikingly, such dependence on two paralogous genes is not
conserved in higher eukaryotes since only a single Nup155
locus is present in all other eukaryotes examined so far,
indicating different evolutionary paths since the divergence
of unicellular and multicellular eukaryotic organisms.
Secondly, we are unable to find a Nup155 ortholog in the
complete sequence of the C. elegans genome by BLAST
search. In Drosophila, the Nup154 gene is identified as the
Nup155 ortholog, which is 47% identical to the human
NUP155 cDNA. The Nup154 gene is proven necessary for
survival. This protein is also essential when assembly of
new NPCs is required in proliferating or growing tissues
(Kiger et al., 1999), such as in male and female gametogen-
esis (Gigliotti et al., 1998). The Nup155 ortholog has also
been found in zebrafish (five ESTs, Accession Nos.
AA494635, AI558361, AW170971, AW175336,
AW422352). Finally, Nup155 orthologs are also found in
plants, including Arabidopsis thaliana (Accession No.
AAF79236) and in Tritrichomonas foetus (a partial
sequence with Accession No. AAB51116).
The Fugu ortholog presently characterized is only one-tenth
of the size of its human counterpart, since all of the corresponding
introns in Fugu are significantly smaller than
those in human (Fig. 2). It does not contain sequences
homologous to the human exon 17, which is the smallest
exon (63 bp) of the human NUP155 gene, nor sequences
corresponding to the human intron 17 (Fig. 2). The biological
significance of the corresponding protein domain is not
known. The Fugu 3′ untranslated region (3′ UTR) is similar
to that of the shorter (less abundant) transcript of the human
NUP155 gene, where exon 36 is not present. The sequences
homologous to exons 6 and 7 in human are fused into a
single intact exon in Fugu because of the absence of intron
6 (Fig. 2). The predicted total molecular weight of
the Fugu gene product is therefore only 153 kDa, 2 kDa smaller than its
human counterpart, and the gene should accordingly be named Nup153.
Most of the exon and intron structures seem well
conserved, indicating a similarity in intron phasing. In total,
the introns of the human NUP155 gene are 21.7 times as long as
those of the Fugu gene. The ratio of intron to exon length in
base pairs is also much higher in human (14.0:1 in human
and 1.2:1 in Fugu). Another observation is that the size
pattern among introns is not at all consistent (the biggest
intron in human is intron 1, whereas intron 21 is the biggest in
Fugu), indicating independent evolution (Fig. 2).
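A few quantities follow directly from the intron figures given for the human gene. The sketch below derives them from the stated totals (75.705 kb of intron sequence, 36 exons, a 21.7-fold difference in total intron length and a 14.0:1 intron-to-exon ratio); the derived values are back-of-the-envelope estimates, not numbers reported by the authors.

```python
HUMAN_INTRON_BP = 75_705       # total size of all human NUP155 introns
HUMAN_EXONS = 36
HUMAN_INTRON_EXON_RATIO = 14.0
HUMAN_TO_FUGU_INTRON_FOLD = 21.7

# n exons are separated by n - 1 introns.
human_introns = HUMAN_EXONS - 1

# Mean human intron length.
mean_human_intron_bp = HUMAN_INTRON_BP / human_introns

# Total exon length implied by the 14.0:1 ratio. It is close to the
# 5.4 kb size of the longer transcript.
implied_human_exon_bp = HUMAN_INTRON_BP / HUMAN_INTRON_EXON_RATIO

# Total Fugu intron length implied by the 21.7-fold difference.
implied_fugu_intron_bp = HUMAN_INTRON_BP / HUMAN_TO_FUGU_INTRON_FOLD
```

On these assumptions the average human intron is a little over 2 kb, whereas all Fugu introns together amount to roughly 3.5 kb.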
Other organizational changes among the Nup155 gene
orthologs have also occurred during evolution of eukar-
yotes. In yeast, both Nup157 on Chromosome V and
Nup170 on Chromosome II appear intronless. The Nup154
gene on Chromosome II in Drosophila has only 11 introns
and the Arabidopsis ortholog on Chromosome 1 has 12
introns. Fugu and human have about three times as many introns,
32 and 35, respectively. Such an
increase in intron number is quite commonly seen when
genes are compared over large evolutionary distances.
Detailed comparison of the Drosophila Nup154 gene and
the human NUP155 gene demonstrates that five out of the
11 introns are found at the same positions in the amino acid
sequence as those in the human NUP155 gene. The fact that
many introns only exist in some evolutionary lineages does
not necessarily mean that they are functionally unimportant,
but may indicate that some aspects of the evolutionary
process lie in the subtlety of the genomic structure.
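The intron counts quoted in this section can be tabulated to check the "three times as many" comparison and the share of Drosophila introns with conserved positions. The counts are those given in the text; the dictionary keys and variable names are our own.

```python
# Number of introns in each Nup155 ortholog or paralog, as stated in the text.
INTRON_COUNTS = {
    "S. cerevisiae Nup157": 0,
    "S. cerevisiae Nup170": 0,
    "D. melanogaster Nup154": 11,
    "A. thaliana ortholog": 12,
    "F. rubripes ortholog": 32,
    "H. sapiens NUP155": 35,
}

# Vertebrate intron counts relative to the fly and plant genes.
fugu_vs_fly = INTRON_COUNTS["F. rubripes ortholog"] / INTRON_COUNTS["D. melanogaster Nup154"]
human_vs_plant = INTRON_COUNTS["H. sapiens NUP155"] / INTRON_COUNTS["A. thaliana ortholog"]

# Five of the 11 Drosophila introns lie at the same positions as human introns.
SHARED_FLY_INTRONS = 5
shared_percent = 100 * SHARED_FLY_INTRONS / INTRON_COUNTS["D. melanogaster Nup154"]
```

Both ratios come out just under three, and a little under half of the Drosophila introns have a positional counterpart in the human gene.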
4.3. Gene order differences in the Nup155 orthologous
regions of the human and Fugu genomes
One role of comparative genomics is to provide informa-
tion for the assembly of contiguous clusters of sequence
data across orthologous segments in related genomes and
for the identification of gene structural and functional units.
It has been suggested that with a synteny similar to that in
man, the very small Fugu genome could be utilized in posi-
tional cloning (Davidson et al., 2000; Trower et al., 1996).
In our study, however, the five genes identified in the Fugu
78-K9 cosmid clone cast significant doubts on the synteny
similarities between the two vertebrate genomes. Human
orthologs of these five genes have been located to three
different regions in the genome (Fig. 3). This is in agree-
ment with other reports in the recent literature (Gilley and
Fried, 1999). The degree of synteny similarity between the
two genomes could be different from region to region
arguing for further comparative mapping and sequencing
of the two genomes. However, the difference in gene
order in the region containing the Nup155 orthologs in the
human and Fugu genomes advocates cautious interpretation
of synteny in comparative genomics.
Acknowledgements
This study was supported by Chinese Academy of
Sciences, Ministry of Sciences and Technologies and the
National Natural Science Foundation of China, as well as
by the Danish Karen Elise Jensens Fund and DANIDA,
Denmark.
References
Aitchison, J.D., Blobel, G., Rout, M.P., 1995a. Nup120p: a yeast nucleo-
porin required for NPC distribution and mRNA transport. J. Cell Biol.
131, 1659–1675.
Aitchison, J.D., Rout, M.P., Marelli, M., Blobel, G., Wozniak, R.W.,
1995b. Two novel related yeast nucleoporins Nup170p and Nup157p:
complementation with the vertebrate homologue Nup155p and func-
tional interactions with the yeast nuclear pore-membrane protein
Pom152p. J. Cell Biol. 131, 1133–1148.
Bagley, S., Goldberg, M.W., Cronshaw, J.M., Rutherford, S., Allen, T.D.,
2000. The nuclear pore complex. J. Cell Sci. 113, 3885–3886.
Belgareh, N., Rabut, G., Bai, S.W., van Overbeek, M., Beaudouin, J.,
Daigle, N., Zatsepina, O.V., Pasteau, F., Labas, V., Fromont-Racine,
M., Ellenberg, J., Doye, V., 2001. An evolutionarily conserved NPC
subcomplex, which redistributes in part to kinetochores in mammalian
cells. J. Cell Biol. 154, 1147–1160.
Boer, J., Bonten-Surtel, J., Grosveld, G., 1998. Overexpression of the
nucleoporin CAN/NUP214 induces growth arrest, nucleocytoplasmic
transport defects, and apoptosis. Mol. Cell Biol. 18, 1236–1247.
Bouck, J., Miller, W., Gorrell, J.H., Muzny, D., Gibbs, R.A., 1998. Analysis
of the quality and utility of random shotgun sequencing at low redun-
dancies. Genome Res. 8, 1074–1084.
Davidson, H., Taylor, M.S., Doherty, A., Boyd, A.C., Porteous, D.J., 2000.
Genomic sequence analysis of Fugu rubripes CFTR and flanking genes
in a 60 kb region conserving synteny with 800 kb of human chromo-
some 7. Genome Res. 10, 1194–1203.
Doye, V., Hurt, E.C., 1995. Genetic approaches to nuclear pore structure
and function. Trends Genet. 11, 235–241.
Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces
using phred. II. Error probabilities. Genome Res. 8, 186–194.
Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of auto-
mated sequencer traces using phred. I. Accuracy assessment. Genome
Res. 8, 175–185.
Fabre, E., Hurt, E., 1997. Yeast genetics to dissect the nuclear pore complex
and nucleocytoplasmic trafficking. Annu. Rev. Genet. 31, 277–313.
Gigliotti, S., Callaini, G., Andone, S., Riparbelli, M.G., Pernas-Alonso, R.,
Hoffmann, G., Graziani, F., Malva, C., 1998. Nup154, a new Droso-
phila gene essential for male and female gametogenesis is related to the
NUP155 vertebrate nucleoporin gene. J. Cell Biol. 142, 1195–1207.
Gilley, J., Fried, M., 1999. Extensive gene order differences within regions
of conserved synteny between the Fugu and human genomes: implica-
tions for chromosomal evolution and the cloning of disease genes. Hum.
Mol. Genet. 8, 1313–1320.
Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphical tool for
sequence finishing. Genome Res. 8, 195–202.
Gorlich, D., Mattaj, I.W., 1996. Nucleocytoplasmic transport. Science 271,
1513–1518.
International Human Genome Sequencing Consortium, 2001. Initial
sequencing and analysis of the human genome. Nature 409, 860–921.
Jaju, R.J., Fidler, C., Haas, O.A., Strickson, A.J., Watkins, F., Clark, K.,
Cross, N.C., Cheng, J.F., Aplan, P.D., Kearney, L., Boultwood, J.,
Wainscoat, J.S., 2001. A novel gene, NSD1, is fused to NUP98 in the
t(5;11)(q35;p15.5) in de novo childhood acute myeloid leukemia. Blood
98, 1264–1267.
Kerscher, O., Hieter, P., Winey, M., Basrai, M.A., 2001. Novel role for a
Saccharomyces cerevisiae nucleoporin, Nup170p, in chromosome
segregation. Genetics 157, 1543–1553.
Kiger, A.A., Gigliotti, S., Fuller, M.T., 1999. Developmental genetics of the
essential Drosophila nucleoporin nup154: allelic differences due to an
outward-directed promoter in the P-element 3′ end. Genetics 153, 799–812.
Kosova, B., Pante, N., Rollenhagen, C., Hurt, E., 1999. Nup192p is a
conserved nucleoporin with a preferential location at the inner site of
the nuclear membrane. J. Biol. Chem. 274, 22646–22651.
Miller, B.R., Powers, M., Park, M., Fischer, W., Forbes, D.J., 2000. Identi-
fication of a new vertebrate nucleoporin, nup188, with the use of a novel
organelle trap assay. Mol. Biol. Cell 11, 3381–3396.
Radu, A., Blobel, G., Wozniak, R.W., 1993. Nup155 is a novel nuclear pore
complex protein that contains neither repetitive sequence motifs nor
reacts with WGA. J. Cell Biol. 121, 1–9.
Sirotkin, H., O’Donnell, H., DasGupta, R., Halford, S., St.Jore, B., Puech,
A., Parimoo, S., Morrow, B., Skoultchi, A., Weissman, S.M., Scambler,
P., Kucherlapati, R., 1997. Identification of a new human catenin gene
family member (ARVCF) from the region deleted in velo-cardio-facial
syndrome. Genomics 41, 75–83.
Smith, T.M., Lee, M.K., Szabo, C.I., Jerome, N., McEuen, M., Taylor, M.,
Hood, L., King, M.C., 1996. Complete genomic sequence and analysis
of 117 kb of human DNA containing the gene BRCA1. Genome Res. 6,
1029–1049.
Trower, M.K., Orton, S.M., Purvis, I.J., Sanseau, P., Riley, J., Christodou-
lou, C., Burt, D., See, C.G., Elgar, G., Sherrington, R., Rogaev, E.I.,
St.George-Hyslop, P., Brenner, S., Dykes, C.W., 1996. Conservation of
synteny between the genome of the pufferfish (Fugu rubripes) and the
region on human chromosome 14 (14q24.3) associated with familial
Alzheimer disease (AD3 locus). Proc. Natl. Acad. Sci. 93, 1366–1369.
van Deursen, J., Boer, J., Kasper, L., Grosveld, G., 1996. G2 arrest and
impaired nucleocytoplasmic transport in mouse embryos lacking the
proto-oncogene CAN/Nup214. EMBO J. 15, 5574–5583.
Wong, G.K., Yu, J., Thayer, E.C., Olson, M.V., 1997. Multiple-complete-
digest restriction fragment mapping: generating sequence-ready maps
for large-scale DNA sequencing. Proc. Natl. Acad. Sci. 94, 5225–5230.
Zhang, X., Yang, H., Corydon, M.J., Pedersen, S., Korenberg, J.R., Chen,
X.N., Laporte, J., Gregersen, N., Niebuhr, E., Liu, G., Bolund, L., 1999.
Localization of a human nucleoporin 155 gene (NUP155) to the 5p13
region and cloning of its cDNA. Genomics 57, 144–151.