
Genomic organization, transcript variants and comparative analysis of the human nucleoporin 155 (NUP155) gene

Xiuqing Zhang a,b, Huanming Yang a,b,*, Jun Yu a,c, Cong Chen a, Guangyu Zhang a, Jingyue Bao a, Yutao Du a, Miho Kibukawa c, Zhijie Li a,b,d, Jun Wang a, Songnian Hu a, Wei Dong a, Jian Wang a, Niels Gregersen d, Erik Niebuhr e, Lars Bolund b

a Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Datun Road, Beijing 100101, China

b Institute of Human Genetics, Aarhus University, Aarhus, Denmark

c Human Genome Center, University of Washington, Seattle, WA, USA

d Research Unit for Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark

e Department of Medical Genetics, IMBG, Copenhagen University, Copenhagen, Denmark

Received 6 August 2001; received in revised form 21 January 2002; accepted 4 February 2002

Received by E. Sverdlov

Abstract

Nucleoporin 155 (Nup155) is a major component of the nuclear pore complex (NPC) involved in cellular nucleo-cytoplasmic transport. We have acquired the complete sequence and interpreted the genomic organization of the Nup155 orthologs from human (Homo sapiens) and pufferfish (Fugu rubripes), which are approximately 80 and 8 kb in length, respectively. The human gene is ubiquitously expressed in the tissues analyzed and has two major transcript variants, resulting from alternative usage of the 5′ cryptic or consensus splice donor in intron 1 and of two polyadenylation signals. We have also cloned DNA complementary to RNAs of the Nup155 orthologs from Fugu and mouse. Comparative analysis of the Nup155 orthologs in many species, including H. sapiens, Mus musculus, Rattus norvegicus, F. rubripes, Arabidopsis thaliana, Drosophila melanogaster, and Saccharomyces cerevisiae, has revealed two paralogs in S. cerevisiae but only a single gene, with an increasing number of introns, in more complex organisms. The amino acid sequences of the Nup155 orthologs are highly conserved in the evolution of eukaryotes. Different gene orders in the human and Fugu genomic regions harboring the Nup155 orthologs advocate cautious interpretation of synteny in comparative genomic analysis even within the vertebrate lineage. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Nuclear pore complex (NPC); Shotgun sequencing; DNA complementary to RNA cloning; Gene order; Nucleoporin 155 gene orthologs

1. Introduction

Nucleoporins are major components of the nuclear pore complex (NPC). They are involved in regulating bi-directional trafficking of cellular macromolecules, especially messenger RNAs (mRNAs) and proteins, between the nucleus and cytosol (Bagley et al., 2000; Gorlich and Mattaj, 1996). Studies on yeast and Drosophila have revealed that most of the nucleoporin genes are essential for survival (Fabre and Hurt, 1997; Kiger et al., 1999). More than 30 nucleoporins have been identified in yeast, Drosophila, Arabidopsis, Tritrichomonas, Fugu, zebrafish, rat, mouse, and humans (Doye and Hurt, 1995; Kosova et al., 1999; Miller et al., 2000; Belgareh et al., 2001). Malfunction of nucleoporins has been suggested to be pathogenic in humans. For example, overexpression of human CAN/Nup214, a well-studied nucleoporin and putative oncogene associated with myeloid leukemia, was demonstrated to induce nucleo-cytoplasmic transport defects, cell growth arrest, and apoptosis (Boer et al., 1998; van Deursen et al., 1996). Disruption of the human NUP98 gene and/or the resulting fusion protein appeared to be related to de novo childhood acute myeloid leukemia (Jaju et al., 2001).

Gene 288 (2002) 9–18

0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

PII: S0378-1119(02)00470-5

www.elsevier.com/locate/gene

Abbreviations: Nup155, nucleoporin 155 gene; NPC, nuclear pore complex; ORFs, open reading frames; RACE, rapid amplification of cDNA ends; kb, kilobase(s); kDa, kilodalton(s); cDNA, DNA complementary to RNA; mRNA, messenger RNA; EST, expressed sequence tag; BAC, bacterial artificial chromosome; UTR, untranslated region; RT-PCR, reverse transcription and PCR amplification; ARVCF, Armadillo Repeat gene deleted in Velo-Cardio-facial syndrome; BMP10, bone morphogenic protein 10 gene; LINEs, long interspersed elements; SINEs, short interspersed elements; LTR, long-terminal repeat

Sequence data from this article have been deposited with EMBL/GenBank Data Libraries under Accession Nos. AJ007558 (human NUP155 cDNA), AF165926 (a BAC clone containing human NUP155 gene), AF322375 (mouse Nup155 cDNA), AF301600 (Fugu Nup153 cDNA) and AF301601 (a cosmid clone containing Fugu Nup153 gene).

* Corresponding author. Tel.: +86-10-6487-1664; fax: +86-10-6488-9329.

E-mail address: [email protected] (H. Yang).

We have previously reported the identification of the full-length human nucleoporin 155 gene (NUP155) DNA complementary to RNA (cDNA) by exon trapping and in silico cloning (Zhang et al., 1999). The gene was localized to the 5p13 region, which might be involved in mental and developmental retardation as observed in a collection of 5p-syndrome patients. In this report we present the complete sequence and genomic organization of the human NUP155 gene as well as a description of its transcript variants. We have also cloned cDNAs of the Nup155 orthologs from Fugu and mouse. The Fugu gene has a similar genomic organization but much smaller introns than its human ortholog. We have further compared the neighboring genes around the orthologous loci of Fugu and human NUP155. To our surprise, no obvious common synteny was found between these two organisms: the five open reading frames (ORFs), or genes, around the Fugu Nup155 ortholog are located in three different chromosome regions in the human genome.

2. Materials and methods

2.1. Screening for human bacterial artificial chromosome (BAC) clones containing the NUP155 gene

We screened the RPCI-11 Human Male BAC Library (http://www.chori.org/bacpac/11framehmale.htm) by PCR on the DNA pools of the library, using three specific primer pairs designed from the most 5′ and 3′ regions and from the middle part of the human NUP155 cDNA sequence. Sequences of the primers are as follows (listed as the forward and reverse primers with PCR product size and primer annealing temperature in parentheses): 5′ region: AGAACGGCGTCTTCCAGTTCC and AACAAGAAAAGATCCAAGAAG (127 bp, 55 °C), 3′ region: TGTGCCTGGCCTATTCCCTTC and AAGTAGACATGACAGAATTTTA (368 bp, 55 °C), and middle region: TTCCTGGGCTCTTTCTGCG and GAGAGGAGAACAAATTTCTTC (141 bp, 55 °C). Clones positive for at least two of the three primer pairs were further tested by multiple restriction analysis with the enzymes BglII, EcoRI, and NsiI, using the software developed at the Human Genome Center, University of Washington (Wong et al., 1997). The clone that possessed the biggest insert and was positive for all three primer pairs was selected for shotgun sequencing.
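The selection rule described above (shortlist clones positive for at least two primer pairs, then sequence the largest clone positive for all three) can be sketched as follows. This is a minimal illustration rather than the authors' software; the clone names, insert sizes, and primer-pair labels are hypothetical.

```python
from dataclasses import dataclass

# Labels for the three primer pairs (5' region, middle region, 3' region).
REQUIRED_PAIRS = frozenset({"5prime", "middle", "3prime"})


@dataclass(frozen=True)
class BacClone:
    name: str                  # hypothetical clone identifier
    insert_kb: float           # insert size estimated by restriction analysis
    positive_pairs: frozenset  # primer pairs that gave a PCR product


def candidates(clones):
    """Clones positive for at least two of the three primer pairs."""
    return [c for c in clones if len(c.positive_pairs & REQUIRED_PAIRS) >= 2]


def pick_for_sequencing(clones):
    """Largest clone that is positive for all three primer pairs, or None."""
    complete = [c for c in candidates(clones) if REQUIRED_PAIRS <= c.positive_pairs]
    return max(complete, key=lambda c: c.insert_kb, default=None)


# Made-up screening results, for illustration only.
example = [
    BacClone("clone_A", 120.0, frozenset({"5prime", "middle"})),
    BacClone("clone_B", 150.0, frozenset({"5prime", "middle", "3prime"})),
    BacClone("clone_C", 140.0, frozenset({"5prime", "middle", "3prime"})),
    BacClone("clone_D", 170.0, frozenset({"3prime"})),
]
chosen = pick_for_sequencing(example)
print(chosen.name if chosen else "no clone covers all three regions")
```

Requiring all three primer pairs before ranking by size ensures that a large clone covering only one end of the cDNA (clone_D in the toy data) is never preferred over a smaller clone that spans the gene.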

2.2. Shotgun sequencing and data analysis

The selected BAC clone was sequenced by a shotgun strategy (Bouck et al., 1998). The BAC DNA was prepared by a modified alkaline lysis method described previously (Wong et al., 1997) and sheared by sonication. The resulting fragments were end-repaired with T4 DNA polymerase (New England Biolabs). The 1.6–3.0 kb DNA fragments were selected by agarose gel electrophoresis, purified with the Wizard PCR Preps DNA Purification System (Promega) and cloned into SmaI-digested and calf intestinal alkaline phosphatase-treated plasmid pUC18. The ligation mixture was used to transform DH5α competent cells (Gibco BRL). The recombinant plasmid DNA was prepared in a system with 96-well plates according to protocols developed at the Human Genome Center, University of Washington (http://www.genome.washington.edu/UWGC/protocols) and sequenced using dye terminator or dye primer chemistry with ABI 377 automated sequencers following the protocol provided by the manufacturer. The Phred program was used to call the bases from the ABI trace data and to assign quality values. The Phrap program was used to assemble sequence traces into contiguous sequences, which were viewed using the Consed program (Ewing and Green, 1998; Ewing et al., 1998; Gordon et al., 1998; Smith et al., 1996). Gaps in the assembly were closed experimentally by primer walking based on the sequence information from the assembled contig ends and from PCR products spanning the gap region. Finally, exon/intron boundaries were identified by comparison of the assembled consensus sequences with cDNA sequences using the Cross-Match program (P. Green, unpublished results, http://www.genome.washington.edu). The assembly was further confirmed by comparison of the computer-predicted restriction sites in the sequence with the size of restriction fragments estimated by complete digestion with multiple enzymes. The genomic sequence was searched against databases to identify other potential genes in the BAC.
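The restriction-based check of the assembly can be illustrated with a small in silico digest. The sketch below is an assumption about how such a comparison might be coded and is not the University of Washington software; it treats the sequence as linear, and the toy sequence is invented. The recognition sites used are the standard ones for BglII (A^GATCT), EcoRI (G^AATTC), and NsiI (ATGCA^T).

```python
import re

# Recognition sequence and cut position (bases from the start of the site,
# top strand) for the three enzymes named in the text.
ENZYMES = {
    "BglII": ("AGATCT", 1),  # A^GATCT
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "NsiI": ("ATGCAT", 5),   # ATGCA^T
}


def digest(sequence, enzyme):
    """Fragment sizes (bp, largest first) from a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # The lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def matches_gel(predicted, observed, tolerance=0.05):
    """True if every predicted size is within a relative tolerance of the observed one."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance * o
               for p, o in zip(sorted(predicted), sorted(observed)))


# Toy sequence with two EcoRI sites (invented for illustration).
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(digest(toy, "EcoRI"))
```

Comparing sorted fragment lists with a relative tolerance reflects the fact that gel-estimated sizes are approximate; the 5% default is an arbitrary choice for the sketch.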

2.3. Screening and sequencing of Fugu genomic and cDNA clones

The Fugu whole genome and cDNA-K libraries on high-density membranes were purchased from the UK HGMP Resource Center (http://www.hgmp.mrc.ac.uk). The Fugu clones were screened with the human NUP155 cDNA clone as a probe, which was labeled with α-32P-dCTP using nick translation with Prime-It RmT Random Primer Labelling Kit (Stratagene). The biggest Fugu cosmid clone and all the cDNA clones obtained were sequenced and analyzed as described above.

2.4. cDNA cloning and alternative splicing analysis using reverse transcription and PCR amplification (RT-PCR)

For Fugu cDNA cloning, nine PCR primer pairs were designed based on Fugu genomic sequences (listed as forward and reverse primers with predicted PCR sizes from genomic and deduced cDNA sequences, respectively, and primer annealing temperature in parentheses): (1) ATGCCTTCCAGCGCTGGACCCAAC and TTTACAAGACCTACAGCCAGAATG (809, 437 bp, 60 °C); (2) CATTCTGGCTGTAGGTCTTGTAAA and ACGACTCATTCCCTGACCATC (909, 505 bp, 60 °C); (3) GATGGTCAGGGAATGAGTCGT and GATGGGGATCAGCTCTTTGTT (858, 543 bp, 60 °C); (4) AACAAAGAGCTGATCCCCATC and CACTGTGAAACCTCTCTGTCACAA (437, 275 bp, 60 °C); (5) CTTGTGACAGAGAGGTTTCACAGT and GAATGTCTGATTTCCCTTGGTTAT (640, 344 bp, 60 °C); (6) TTATCTTCTCAGGCAAACACAATG and CTTAAAACCAACTCCCTTCATCTG (954, 512 bp, 60 °C); (7) CAGATGAAGGGAGTTGGTTTTAAG and ACTCTGTCATCCTCGGGCTCT (961, 431 bp, 60 °C); (8) AGAGCCCGAGGATGACAGAGT and GCTGCAGCTCTTCTCGTAGTACC (687, 412 bp, 60 °C) and (9) CTGTGGCGGTACTACGAGAAG and GTCCATGAGCTCAGAGTCCAACT (483, 321 bp, 60 °C). The primers were first tested by PCR on the Fugu cosmid clones. RT-PCR was performed with the Access RT-PCR System as described by the manufacturer (Promega) using total RNA prepared from Fugu liver tissue with the SV Total RNA Isolation System (Promega). PCR products were cloned into plasmid pUC18 for sequencing. The alternative splicing analysis was performed on total RNA from human lymphoblastoid cell lines, mouse cell lines (F9 and NIH/3T3) and rat cell lines (PC12 and PC13). The forward primer (CAAGAGGACCGCATGTACCCG) and reverse primer (CACAGCAAGAATAGTCTCACT) were derived from sequences located inside exons 1 and 4 of the human NUP155 gene, respectively.
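Because each primer pair is listed with both a genomic and a cDNA product size, the difference between the two gives the intron sequence spliced out of that amplicon. The short sketch below works this out from the sizes listed above; the variable and function names are illustrative, and the summed value is a derived figure, not one reported in the text.

```python
# (genomic bp, cDNA bp) for Fugu primer pairs 1-9, copied from the list above.
AMPLICONS = [
    (809, 437), (909, 505), (858, 543), (437, 275), (640, 344),
    (954, 512), (961, 431), (687, 412), (483, 321),
]


def intron_bp(genomic_bp, cdna_bp):
    """Intron sequence removed by splicing within one amplicon."""
    return genomic_bp - cdna_bp


per_pair = [intron_bp(g, c) for g, c in AMPLICONS]
for number, removed in enumerate(per_pair, start=1):
    print(f"pair {number}: {removed} bp of intron")
print("total intron bp covered by the nine amplicons:", sum(per_pair))
```

A genomic product larger than its RT-PCR counterpart is what one expects when the primers sit in different exons, which is why the size comparison doubles as a check against genomic DNA contamination of the RNA template.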

2.5. Mouse cDNA cloning by 5′-rapid amplification of cDNA ends (RACE) and in silico walking

Sequences of the human NUP155 cDNA were used for a BLAST search against the mouse expressed sequence tag (EST) database. The matched mouse ESTs were used to design PCR primers. The 5′-RACE was performed with a kit from Gibco BRL. Briefly, first-strand synthesis was performed using 0.4 μg poly(A)+ RNA from the mouse NIH/3T3 cell line (ATCC) and the mouse Nup155-specific primer (TGGCGAATCATTATTGGAA). The tailed cDNA 5′-end was amplified with a nested primer (CAGGAATGACCATCAACAC) as well as an abridged anchor primer provided by the manufacturer. The PCR product was cloned into the pCR II vector (Original TA Cloning Kit, Invitrogen). The selected clones were sequenced and the cDNA sequence was extended by primer walking as described previously (Zhang et al., 1999).

3. Results

3.1. Sequence assembly and analysis of a human BAC clone containing the NUP155 gene

We initially identified five human BAC clones from the Human Male RPCI-11 BAC Library by PCR screening. Two of the clones were positive with all three primer pairs, indicating that they contained most, if not all, of the genomic sequences of the human NUP155 gene. After restriction analysis, the biggest clone, RP11-085O06, estimated to be 166.4 kb in length according to the sum of restriction fragments, was chosen for sequencing.

In all, over 2.2 Mb of shotgun sequence data were assembled into a single consensus sequence of 165.6 kb. The high-quality assembly had not only a very low error rate (0.06/10 kb, estimated with Phred), but also high fidelity, since all its computer-simulated restriction fragments matched the experimental data from restriction analysis perfectly. The difficulties encountered in assembly turned out to be caused by the extremely high frequency of interspersed repeats in this BAC clone (Table 1).
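The assembly statistics quoted above can be put in perspective with a little arithmetic. The sketch below derives the approximate redundancy of the shotgun data, the expected number of errors in the consensus, and the Phred-scale quality that corresponds to the quoted error rate, using the standard definition Q = -10 log10(P). These derived values are not stated in the text, and 2.2 Mb is treated as a lower bound.

```python
import math

SHOTGUN_BP = 2.2e6       # "over 2.2 Mb" of shotgun reads (lower bound)
CONSENSUS_BP = 165_600   # 165.6 kb consensus sequence
ERRORS_PER_10KB = 0.06   # error rate estimated with Phred

# Average number of times each consensus base was sequenced.
coverage = SHOTGUN_BP / CONSENSUS_BP

# Per-base error probability and expected errors over the whole consensus.
per_base_error = ERRORS_PER_10KB / 10_000
expected_errors = per_base_error * CONSENSUS_BP

# Phred-scale quality equivalent to that per-base error probability.
phred_equivalent = -10 * math.log10(per_base_error)

print(f"coverage: about {coverage:.1f}-fold")
print(f"expected errors in the consensus: about {expected_errors:.2f}")
print(f"equivalent Phred quality: about Q{phred_equivalent:.0f}")
```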

The assembled BAC sequence contained the complete human NUP155 gene and an incomplete sequence of the FLJ10233 gene. The full NUP155 gene was approximately 80 kb in length, covering the whole cDNA sequence. The FLJ10233 gene was located upstream of the NUP155 gene and transcribed in the opposite direction. The two genes were separated by only 8 kb of DNA sequence, presumably containing the 5′ promoter regions of both genes. Promoter prediction (http://www.fruitfly.org/seq_tools/promoter.html) revealed that the transcription start site with the highest score (0.99) would be located 757 bp upstream of the start codon (ATG). The TATA box and other promoter elements were not significant. A definitive determination of the transcription start site and regulatory elements will require functional testing.

Table 1
The contents of main interspersed repeats in the human BAC clone RP11-085O06

Repeat type                   Total number   Total bp      Fraction (%)   Fraction (%)
                              of elements    of repeats    RP11-085O06    whole human genome (a)
SINEs                         262            68,519        41.37          13.14
  (Alus)                      (252)          (67,057)      (40.49)        (10.60)
LINEs                         39             15,988        9.65           20.42
LTR elements                  22             10,800        6.52           8.29
DNA elements                  18 (b)         4,178         2.52           2.84
Total interspersed repeats    341            99,485        60.07          44.83

(a) International Human Genome Sequencing Consortium, 2001.
(b) Including 11 MER1, three MER2, and two Mariner elements.

Analysis of the completely assembled sequence also revealed that the BAC clone contained all major types of interspersed repeats in high numbers (Table 1), as defined by RepeatMasker (http://ftp.genome.washington.edu/RM.RepeatMasker.html). The overall repeat content was 60.07%, in contrast to that of the whole human genome (44.83%) (International Human Genome Sequencing Consortium, 2001). The Alu repeats accounted for about 40%, in contrast to 10.60% in the whole human genome (Table 1). The genomic segment had a GC-content of 42%, close to the genome-wide average (41%) of humans (International Human Genome Sequencing Consortium, 2001).
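The percentages in Table 1 follow from dividing the repeat base pairs by the length of the assembled BAC sequence. The sketch below recomputes them from the tabulated base-pair totals, assuming the rounded consensus length of 165.6 kb given in the text (the exact length is not stated, so the last digit may differ slightly), and also derives the fold-enrichment of Alu elements relative to the genome-wide figure.

```python
BAC_BP = 165_600  # rounded consensus length from the text (assumption: exact value not given)

# repeat type -> (elements, bp, % in BAC, % in whole genome), from Table 1
TABLE1 = {
    "SINEs": (262, 68_519, 41.37, 13.14),
    "LINEs": (39, 15_988, 9.65, 20.42),
    "LTR elements": (22, 10_800, 6.52, 8.29),
    "DNA elements": (18, 4_178, 2.52, 2.84),
}
ALU = (252, 67_057, 40.49, 10.60)    # subset of SINEs
TOTAL = (341, 99_485, 60.07, 44.83)  # all interspersed repeats


def fraction_pct(repeat_bp, total_bp=BAC_BP):
    """Percentage of the sequence occupied by a repeat class."""
    return 100.0 * repeat_bp / total_bp


for name, (count, bp, reported, genome) in TABLE1.items():
    print(f"{name}: computed {fraction_pct(bp):.2f}%, reported {reported}%, genome {genome}%")

alu_enrichment = ALU[2] / ALU[3]
print(f"Alu enrichment relative to the genome average: about {alu_enrichment:.1f}-fold")
```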

3.2. Tissue-specific expression and alternative transcription of the human NUP155 gene

Our previous analysis demonstrated that the human NUP155 gene was expressed at different levels in all eight tissues tested (heart, brain, placenta, lung, liver, skeletal muscle, kidney, and pancreas), with two ubiquitous transcript variants approximately 5.4 and 4.7 kb in length (Zhang et al., 1999). It was postulated that this might be due to alternative usage of two 3′ polyadenylation signals, which were 743 bp away from each other.

In a more detailed sequence analysis, we found another size difference in the 5′ part of the transcripts. The cDNA cloned from a testis cDNA library and two published ESTs (Accession Nos. AA644462 and AL045174) did not contain a 120 bp segment that was present in another NUP155 cDNA clone (Accession No. NM_004298). To determine whether the sequence discrepancy was due to alternative splicing of the 5′ sequences, we performed an RT-PCR analysis with primers located in exons 1 and 4 of the NUP155 gene (Fig. 1A). The results revealed two possible transcripts with a size difference of 155 bp. Sequence data from the RT-PCR products showed that the 155 bp difference resulted from alternative usage of a 5′ cryptic splice donor signal (tcag/GTTTTT) inside intron 1, located 155 bp downstream of the 5′ consensus splice donor sequence (ccaa/GTGAGT) of intron 1 (Fig. 1B). The smaller transcript, which uses the consensus splice signal, seemed to be the major species as judged from the intensity of PCR products on agarose gels (Fig. 1A). The results were consistent with those obtained by RT-PCR from mouse and rat cell lines, where only the shorter transcript variant was identified.
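The effect of choosing the cryptic donor instead of the consensus donor can be sketched on a toy gene model. In the example below only the two donor motifs (ccaa/GTGAGT and tcag/GTTTTT) and the 155 bp spacing come from the text; the exon and intron sequences are invented filler, and the function name is hypothetical.

```python
CRYPTIC_OFFSET = 155  # cryptic donor lies 155 bp downstream of the consensus donor


def splice_exon1_to_exon2(exon1, intron1, exon2, donor_offset=0):
    """Join exon 1 to exon 2, cutting intron 1 at donor_offset bases past the exon 1 end.

    Any intron sequence upstream of the chosen donor is retained in the transcript.
    """
    donor = intron1[donor_offset:donor_offset + 2]
    if donor != "GT":
        raise ValueError(f"no GT donor dinucleotide at offset {donor_offset}")
    return exon1 + intron1[:donor_offset] + exon2


# Toy sequences: exon 1 ends in CCAA, the intron starts with the consensus donor
# GTGAGT, and the cryptic donor GTTTTT (preceded by TCAG) starts at offset 155.
toy_exon1 = "GGCCAA"
toy_intron1 = "GTGAGT" + "A" * 145 + "TCAG" + "GTTTTT" + "C" * 20 + "AG"
toy_exon2 = "CTGCTG"

major = splice_exon1_to_exon2(toy_exon1, toy_intron1, toy_exon2)
minor = splice_exon1_to_exon2(toy_exon1, toy_intron1, toy_exon2, CRYPTIC_OFFSET)
print(len(minor) - len(major))  # extra bases retained by the cryptic donor
```

On the real gene the same 155 bp of retained intron sequence accounts for the size difference between the two RT-PCR products.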

3.3. Isolation and analysis of a Fugu cosmid containing the genomic sequence of the Nup155 ortholog

We screened a Fugu cosmid genomic library with the human NUP155 cDNA as probe and obtained three positive clones. The clone with the biggest insert as estimated by restriction analysis, 78-K9, was shotgun-sequenced. The final assembly gave a sequence of 43.5 kb, containing the entire Fugu ortholog of the human NUP155 gene. The Fugu gene was found to be approximately 8 kb long and to have 33 exons (Fig. 2).

Analysing the Fugu genomic sequence at both the DNA and protein levels, we concluded that there were five intact or incomplete ORFs over the length of this cosmid (Fig. 3). The first 5.0 kb of the cosmid sequence encoded an incomplete ORF that was 85% identical at the amino acid level to the human IDN3 gene (Accession No. NP_056199) in 5p13.3. This similarity was even higher than that of the Fugu Nup155 ortholog situated immediately downstream (from positions 6.0 to 15.0 kb in the cosmid sequence), which was 83% identical to the human NUP155 gene in 5p13.3. The region from positions 19.0 to 25.0 kb contained the third intact ORF, with about 72% amino acid identity to the human KIAA1292 gene (Accession No. XP_000748) in 22q11.21. The fourth ORF, from positions 30.0 to 36.0 kb, was 78% identical to the human Armadillo Repeat gene deleted in Velo-Cardio-facial syndrome (ARVCF) (Accession No. NP_001661) in 22q11.21. The last ORF (positions 38.0–42.0 kb) showed 67% identity to the human bone morphogenic protein 10 (BMP10) gene (Accession No. NP_055297) in 2p14. ARVCF is a member of the catenin family of genes, which play crucial roles in the formation of adherens junction complexes thought to facilitate communication of a cell with its outside environment (Sirotkin et al., 1997). All five genes were determined to be transcribed in the same direction according to their cDNA or EST sequences (Fig. 3).
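The synteny argument rests on where the human counterparts of the five Fugu ORFs map. A minimal sketch of that bookkeeping is shown below; the positions, identities, and human cytogenetic bands are taken from the paragraph above, while the data layout and names are illustrative.

```python
from collections import defaultdict

# (human counterpart, start kb, end kb, % amino acid identity, human band),
# listed in their order along the Fugu cosmid 78-K9.
FUGU_COSMID_ORFS = [
    ("IDN3", 0.0, 5.0, 85, "5p13.3"),
    ("NUP155", 6.0, 15.0, 83, "5p13.3"),
    ("KIAA1292", 19.0, 25.0, 72, "22q11.21"),
    ("ARVCF", 30.0, 36.0, 78, "22q11.21"),
    ("BMP10", 38.0, 42.0, 67, "2p14"),
]


def group_by_human_region(orfs):
    """Map each human chromosome band to the Fugu-linked genes found there."""
    regions = defaultdict(list)
    for gene, _start, _end, _identity, band in orfs:
        regions[band].append(gene)
    return dict(regions)


regions = group_by_human_region(FUGU_COSMID_ORFS)
for band, genes in regions.items():
    print(band, genes)
print("distinct human regions:", len(regions))
```

Five genes that are contiguous within about 43.5 kb in Fugu fall into three separate human chromosome regions, which is the observation behind the caution about inferring synteny.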


Fig. 1. Alternative usage of a 5′ cryptic splice donor signal in intron 1. Total RNA was isolated from human lymphoblastoid cell lines from seven individuals (1–7), mouse cell lines F9 and NIH/3T3 (8, 9), and rat cell lines PC12 and PC13 (10, 11). The RT-PCR products were analyzed on an agarose gel (A). The middle and last lanes were loaded with DNA Molecular Weight Marker VIII from Roche (M). The weaker bands are believed to represent an alternatively spliced product due to a 5′ cryptic splice donor signal in intron 1 (RT-PCR 1). The major bands represent the product using the consensus splice signal of intron 1 (RT-PCR 2). The predicted sizes of the RT-PCR products are labeled on the left. A schematic interpretation of the alternative splicing process involving the first four exons (E1–E4) is illustrated in (B). The positions of the forward (F) and reverse (R) primers are indicated by arrows.

3.4. Cloning of Nup155 cDNA orthologs from Fugu and mouse

RT-PCR assays were performed using Fugu liver mRNA and primers designed based on the Fugu 78-K9 cosmid sequence. The sizes of both the RT-PCR and genomic PCR products were in perfect agreement with the deduced cDNA sequence of the Fugu Nup155 ortholog and its genomic sequence. Finally, sequences of all the RT-PCR products were assembled into a contig of 4316 bp for the cDNA of the Fugu Nup155 ortholog.

The corresponding mouse cDNA fragments were obtained by 5′-RACE and EST walking on mRNA isolated from the mouse NIH/3T3 cell line. Sequences from overlapping PCR products and/or clones were assembled into a contig of 4361 bp for the mouse cDNA. The sequence of its open reading frame (ORF) is highly similar to the rat Nup155 cDNA (94% identical at the nucleotide level and 98% at the amino acid level), except for an insertion of a codon for serine at position 18, which makes it the same size as the human ortholog. Its predicted amino acid sequence is 96% identical to that of the human NUP155 gene (Fig. 4).
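Percent identity figures such as those above are computed over the columns of a pairwise alignment. The text does not say how gaps were treated, so the sketch below makes one common assumption: columns containing a gap in either sequence are excluded from the comparison. The toy sequences are invented and are not Nup155 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Invented toy alignment with one substitution and one single-residue insertion.
toy_a = "MKTAALS-VE"
toy_b = "MRTAALSQVE"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

Other conventions (for example, dividing by the length of the shorter sequence or counting gaps as mismatches) give somewhat different values, so the definition should be stated whenever identities are compared across studies.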

4. Discussion

We have previously reported the cloning and characterization of a full-length cDNA of the human NUP155 gene and localization of the gene to the 5p13 region (Zhang et al., 1999). In the present study, we have sequenced a BAC clone containing the whole human NUP155 gene and identified its complete genomic sequence and organization. We have also cloned and sequenced a Fugu cosmid containing the Nup155 ortholog and another four genes, as well as cDNAs of the Nup155 ortholog in Fugu and mouse. In addition, we have studied alternative transcript variants of the human NUP155 gene and performed comparative analysis of the genomic organization and gene order in the region containing the Nup155 orthologs in different species.

Fig. 2. Comparison of the genomic organization of the human and Fugu Nup155 orthologs. Only the 5′ end and the regions around exons 17 and 36 (E17 & E36, in solid boxes), and introns 6, 17 and 35 (I6, I17 & I35, in thicker lines), which are absent from the Fugu gene, are drawn to scale in the human gene. The sizes of the biggest introns (I1 in human and I21 in Fugu) are indicated.

Fig. 3. Comparison of the genome regions with Nup155 and neighboring genes in human and Fugu. The directions of transcription are indicated by arrows.

Fig. 4. A multiple alignment of predicted amino acid sequences of the Nup155 orthologs in Homo sapiens (Man), M. musculus (Mouse), R. norvegicus (Rat), Fugu rubripes (Fugu), Arabidopsis thaliana (Arabidopsis), Drosophila melanogaster (Drosophila), and S. cerevisiae (Yeast). The conserved regions or amino acids with various homology are emphasized by dark or light shading.

4.1. The genomic organization and alternative transcripts of the human NUP155 gene

The human NUP155 gene has 36 exons according to the alignment with the cDNA sequence (Fig. 2). The biggest exon, exon 1, which contains the 5′ untranslated region (5′ UTR) and the translation start site (ATG), is 588 bp in length or even longer, since the transcription start site could be further upstream. The smallest exon, exon 17, which is not present in the Fugu gene, is only 63 bp in length. The biggest intron, intron 1, is 6281 bp in length whereas the smallest, intron 26, is only 213 bp in length. The total size of all the introns is 75.705 kb. All the exon-intron boundaries are conserved except the 5′ splice donor signal of intron 35 (GC instead of GT), which, together with the sequence corresponding to the human exon 36, is not present in the Fugu gene.

Our previous Northern analyses and EST-derived information suggested that the human NUP155 gene is widely expressed (eight tissues tested). We also found that there are two main transcripts of the gene, around 5.4 and 4.7 kb in length, and suggested that this may result from alternative usage of the two polyadenylation signals (Zhang et al., 1999). In the present analysis we show, on the basis of the RT-PCR results, that the alternative transcripts also involve alternative usage of a 5′ cryptic splice donor signal inside intron 1 (Fig. 1). The bigger and less abundant PCR product (494 bp), which results from usage of the 5′ cryptic splice donor signal, might represent a minor species of transcript that is difficult to detect by Northern analysis. Usage of the 5′ cryptic splice donor signal inside intron 1 would create an in-frame stop codon, so that a second ATG would have to be used as the translation start site. A truncated gene product of 149 kDa would be predicted, which is much smaller than the protein that was characterized in rat (Radu et al., 1993).

4.2. Comparative analyses of the Nup155 orthologs and their evolution

Although the coding sequences of the Nup155 gene and its orthologs are highly conserved (Fig. 4), the genomic organization has undergone many significant changes during the evolution of eukaryotes. Firstly, there are two Nup155 paralogs, Nup170 and Nup157, in yeast. Both are major constituents of the yeast nuclear pore complex. Although the function of the yeast Nup170, which encodes a specialized nucleoporin with a unique role in chromosome segregation and possibly kinetochore function (Kerscher et al., 2001), is replaceable with rat Nup155, its complete deletion gives rise to a synthetic lethal phenotype (Aitchison et al., 1995a,b). Strikingly, such dependence on two paralogous genes is not conserved in higher eukaryotes, since only a single Nup155 locus is present in all other eukaryotes examined so far, indicating different evolutionary paths since the divergence of unicellular and multicellular eukaryotic organisms.

Secondly, we were unable to find a Nup155 ortholog in the complete sequence of the C. elegans genome by BLAST search. In Drosophila, the Nup154 gene has been identified as the Nup155 ortholog, and is 47% identical to the human NUP155 cDNA. The Nup154 gene has been shown to be necessary for survival. The protein is also essential when assembly of new NPCs is required in proliferating or growing tissues (Kiger et al., 1999), such as in male and female gametogenesis (Gigliotti et al., 1998). A Nup155 ortholog has also been found in zebrafish (five ESTs, Accession Nos. AA494635, AI558361, AW170971, AW175336, AW422352). Finally, Nup155 orthologs are also found in plants, including Arabidopsis thaliana (Accession No. AAF79236), and in Tritrichomonas foetus (a partial sequence with Accession No. AAB51116).

The Fugu ortholog presently characterized is only one-tenth
of the size of its human counterpart since all of the corresponding
introns in Fugu are significantly smaller than

those in human (Fig. 2). It does not contain sequences

homologous to the human exon 17, which is the smallest

exon (63 bp) of the human NUP155 gene, nor sequences

corresponding to the human intron 17 (Fig. 2). The biological
significance of the corresponding protein domain is not
known. The Fugu 3′ untranslated region (3′ UTR) is similar

to that of the shorter (less abundant) transcript of the human

NUP155 gene, where exon 36 is not present. The sequences

homologous to exons 6 and 7 in human are fused into a

single intact exon in Fugu because of the absence of intron

6 (Fig. 2). Therefore, the predicted molecular weight of
the Fugu gene product is only 153 kDa, 2 kDa smaller than its
human counterpart, and it should accordingly be named Nup153.
Most of the exon and intron structures seem well
conserved, indicating a similarity in intron phasing. In total,
the introns of the human NUP155 gene are 21.7 times as long as
those of Fugu. The ratio of intron to exon length in base pairs
is also much higher in human (14.0:1 in human
and 1.2:1 in Fugu). Another observation is that the size

pattern among introns is not at all consistent (the biggest

intron in man is intron 2, whereas intron 21 is the biggest in

Fugu), indicating independent evolution (Fig. 2).
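The intron-to-exon length ratios and the fold difference in total intron length quoted above can be derived from exon coordinates alone. The sketch below shows one way to do so in Python. The coordinates are small hypothetical values chosen for easy checking, not the annotated human or Fugu exon positions, and the function names are our own.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based inclusive (start, end) on the genomic sequence


def exon_intron_lengths(exons: List[Exon]) -> Tuple[int, int]:
    """Return (total exon bp, total intron bp) for one gene.

    Introns are taken to be the gaps between consecutive exons, so the
    exons must not overlap.
    """
    ordered = sorted(exons)
    exon_bp = sum(end - start + 1 for start, end in ordered)
    intron_bp = sum(nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:]))
    return exon_bp, intron_bp


def intron_exon_ratio(exons: List[Exon]) -> float:
    """Total intron length divided by total exon length."""
    exon_bp, intron_bp = exon_intron_lengths(exons)
    return intron_bp / exon_bp


# Hypothetical toy genes: identical exons, different intron sizes.
toy_large_gene = [(1, 100), (1101, 1200), (2201, 2300)]
toy_small_gene = [(1, 100), (201, 300), (401, 500)]

large_exon_bp, large_intron_bp = exon_intron_lengths(toy_large_gene)
small_exon_bp, small_intron_bp = exon_intron_lengths(toy_small_gene)

print("intron:exon ratio, large gene:", round(intron_exon_ratio(toy_large_gene), 2))
print("intron:exon ratio, small gene:", round(intron_exon_ratio(toy_small_gene), 2))
print("fold difference in intron length:", large_intron_bp / small_intron_bp)
```

Applied to the real exon coordinates of the two orthologs, the same arithmetic would be expected to reproduce ratios of the kind reported in the text, provided the same exon boundaries are used.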

Other organizational changes among the Nup155 gene

orthologs have also occurred during evolution of eukar-

yotes. In yeast, both Nup157 on Chromosome V and

Nup170 on Chromosome II appear intronless. The Nup154

gene on Chromosome II in Drosophila has only 11 introns

and the Arabidopsis ortholog on Chromosome 1 has 12

introns. Fugu and human have about three times as many
introns, 32 and 35, respectively. Such an

increase in intron number is quite commonly seen when

genes are compared over large evolutionary distances.

Detailed comparison of the Drosophila Nup154 gene and

the human NUP155 gene demonstrates that five out of the

11 introns are found at the same positions in the amino acid

sequence as those in the human NUP155 gene. The fact that

many introns only exist in some evolutionary lineages does

not necessarily mean that they are functionally unimportant,

but may indicate that some aspects of the evolutionary

process lie in the subtlety of the genomic structure.
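A comparison of intron positions between two orthologs, such as the one that finds five of the 11 Drosophila introns at the same positions as human introns, can be sketched as follows. This is a minimal illustration under stated assumptions: intron positions are given as (residue index, phase) pairs on each protein, the two proteins are already aligned, and an intron counts as shared only when both the alignment column and the phase agree. The sequences, positions and function names are hypothetical, and the procedure actually used for the comparison is not described in the text.

```python
from typing import Dict, List, Tuple

Intron = Tuple[int, int]  # (1-based residue index, phase 0/1/2)


def residue_to_column(aligned: str) -> Dict[int, int]:
    """Map each 1-based residue index to its 0-based alignment column."""
    mapping: Dict[int, int] = {}
    residue = 0
    for column, symbol in enumerate(aligned):
        if symbol != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def shared_introns(
    aligned_a: str,
    introns_a: List[Intron],
    aligned_b: str,
    introns_b: List[Intron],
) -> List[Tuple[int, int]]:
    """Return (alignment column, phase) pairs present in both genes."""
    columns_a = residue_to_column(aligned_a)
    columns_b = residue_to_column(aligned_b)
    set_a = {(columns_a[res], phase) for res, phase in introns_a}
    set_b = {(columns_b[res], phase) for res, phase in introns_b}
    return sorted(set_a & set_b)


# Hypothetical toy alignment and intron positions.
toy_protein_a = "MKT-LLVAG"
toy_protein_b = "MKTQLL-AG"
toy_introns_a = [(2, 0), (5, 1), (8, 2)]
toy_introns_b = [(2, 0), (6, 1), (7, 2)]

common = shared_introns(toy_protein_a, toy_introns_a, toy_protein_b, toy_introns_b)
print(f"{len(common)} of {len(toy_introns_b)} introns share a position:", common)
```

Requiring an exact column and phase match is a strict criterion. Allowing a tolerance of a few residues would be a reasonable alternative where the alignment around an intron is uncertain.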

4.3. Gene order differences in the Nup155 orthologous regions of the human and Fugu genomes

One role of comparative genomics is to provide informa-

tion for the assembly of contiguous clusters of sequence

data across orthologous segments in related genomes and

for the identification of gene structural and functional units.

It has been suggested that with a synteny similar to that in

man, the very small Fugu genome could be utilized in posi-

tional cloning (Davidson et al., 2000; Trower et al., 1996).

In our study, however, the five genes identified in the Fugu

78-K9 cosmid clone cast significant doubts on the synteny

similarities between the two vertebrate genomes. Human

orthologs of these five genes have been located to three

different regions in the genome (Fig. 3). This is in agree-

ment with other reports in the recent literature (Gilley and

Fried, 1999). The degree of synteny similarity between the
two genomes may differ from region to region,
arguing for further comparative mapping and sequencing

of the two genomes. However, the difference in gene

order in the region containing the Nup155 orthologs in the

human and Fugu genomes advocates cautious interpretation

of synteny in comparative genomics.
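The synteny check described in this section amounts to asking how many distinct human regions hold the orthologs of genes that are neighbors on a single Fugu clone. A minimal Python sketch of that bookkeeping is given below. The gene and region labels are placeholders, not the genes or map positions shown in Fig. 3, and the function name is our own.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_region(ortholog_locations: Dict[str, str]) -> Dict[str, List[str]]:
    """Group genes by the region to which their ortholog maps."""
    regions: Dict[str, List[str]] = defaultdict(list)
    for gene, region in ortholog_locations.items():
        regions[region].append(gene)
    return dict(regions)


# Placeholder data: five neighboring genes on one clone and the (hypothetical)
# regions of the other genome where their orthologs are found.
toy_locations = {
    "gene_A": "region_1",
    "gene_B": "region_1",
    "gene_C": "region_2",
    "gene_D": "region_3",
    "gene_E": "region_3",
}

toy_groups = group_by_region(toy_locations)
synteny_conserved = len(toy_groups) == 1  # all orthologs in a single region

print("distinct regions:", len(toy_groups))
print("synteny conserved across the clone:", synteny_conserved)
```

With five genes falling into three regions, as in this toy input, the clone would not be counted as a conserved syntenic block. Gene order within a region would need a separate comparison.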

Acknowledgements

This study was supported by the Chinese Academy of
Sciences, the Ministry of Sciences and Technologies and the

National Natural Science Foundation of China, as well as

by the Danish Karen Elise Jensens Fund and DANIDA,

Denmark.

References

Aitchison, J.D., Blobel, G., Rout, M.P., 1995a. Nup120p: a yeast nucleo-

porin required for NPC distribution and mRNA transport. J. Cell Biol.

131, 1659–1675.

Aitchison, J.D., Rout, M.P., Marelli, M., Blobel, G., Wozniak, R.W.,

1995b. Two novel related yeast nucleoporins Nup170p and Nup157p:

complementation with the vertebrate homologue Nup155p and func-

tional interactions with the yeast nuclear pore-membrane protein

Pom152p. J. Cell Biol. 131, 1133–1148.

Bagley, S., Goldberg, M.W., Cronshaw, J.M., Rutherford, S., Allen, T.D.,

2000. The nuclear pore complex. J. Cell Sci. 113, 3885–3886.

Belgareh, N., Rabut, G., Bai, S.W., van Overbeek, M., Beaudouin, J.,

Daigle, N., Zatsepina, O.V., Pasteau, F., Labas, V., Fromont-Racine,

M., Ellenberg, J., Doye, V., 2001. An evolutionarily conserved NPC

subcomplex, which redistributes in part to kinetochores in mammalian

cells. J. Cell Biol. 154, 1147–1160.

Boer, J., Bonten-Surtel, J., Grosveld, G., 1998. Overexpression of the

nucleoporin CAN/NUP214 induces growth arrest, nucleocytoplasmic

transport defects, and apoptosis. Mol. Cell Biol. 18, 1236–1247.

Bouck, J., Miller, W., Gorrell, J.H., Muzny, D., Gibbs, R.A., 1998. Analysis

of the quality and utility of random shotgun sequencing at low redun-

dancies. Genome Res. 8, 1074–1084.

Davidson, H., Taylor, M.S., Doherty, A., Boyd, A.C., Porteous, D.J., 2000.

Genomic sequence analysis of Fugu rubripes CFTR and flanking genes

in a 60 kb region conserving synteny with 800 kb of human chromo-

some 7. Genome Res. 10, 1194–1203.

Doye, V., Hurt, E.C., 1995. Genetic approaches to nuclear pore structure

and function. Trends Genet. 11, 235–241.

Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces

using phred. II. Error probabilities. Genome Res. 8, 186–194.

Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of auto-

mated sequencer traces using phred. I. Accuracy assessment. Genome

Res. 8, 175–185.

Fabre, E., Hurt, E., 1997. Yeast genetics to dissect the nuclear pore complex

and nucleocytoplasmic trafficking. Annu. Rev. Genet. 31, 277–313.

Gigliotti, S., Callaini, G., Andone, S., Riparbelli, M.G., Pernas-Alonso, R.,

Hoffmann, G., Graziani, F., Malva, C., 1998. Nup154, a new Droso-

phila gene essential for male and female gametogenesis is related to the

NUP155 vertebrate nucleoporin gene. J. Cell Biol. 142, 1195–1207.

Gilley, J., Fried, M., 1999. Extensive gene order differences within regions

of conserved synteny between the Fugu and human genomes: implica-

tions for chromosomal evolution and the cloning of disease genes. Hum.

Mol. Genet. 8, 1313–1320.

Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphical tool for

sequence finishing. Genome Res. 8, 195–202.

Gorlich, D., Mattaj, I.W., 1996. Nucleocytoplasmic transport. Science 271,

1513–1518.

International Human Genome Sequencing Consortium, 2001. Initial

sequencing and analysis of the human genome. Nature 409, 860–921.

Jaju, R.J., Fidler, C., Haas, O.A., Strickson, A.J., Watkins, F., Clark, K.,

Cross, N.C., Cheng, J.F., Aplan, P.D., Kearney, L., Boultwood, J.,

Wainscoat, J.S., 2001. A novel gene, NSD1, is fused to NUP98 in the

t(5;11)(q35;p15.5) in de novo childhood acute myeloid leukemia. Blood

98, 1264–1267.

Kerscher, O., Hieter, P., Winey, M., Basrai, M.A., 2001. Novel role for a

Saccharomyces cerevisiae nucleoporin, Nup170p, in chromosome

segregation. Genetics 157, 1543–1553.

Kiger, A.A., Gigliotti, S., Fuller, M.T., 1999. Developmental genetics of the

essential Drosophila nucleoporin nup154: allelic differences due to an

outward-directed promoter in the P-element 3′ end. Genetics 153, 799–812.

Kosova, B., Pante, N., Rollenhagen, C., Hurt, E., 1999. Nup192p is a

conserved nucleoporin with a preferential location at the inner site of

the nuclear membrane. J. Biol. Chem. 274, 22646–22651.

Miller, B.R., Powers, M., Park, M., Fischer, W., Forbes, D.J., 2000. Identi-

fication of a new vertebrate nucleoporin, nup188, with the use of a novel

organelle trap assay. Mol. Biol. Cell 11, 3381–3396.

Radu, A., Blobel, G., Wozniak, R.W., 1993. Nup155 is a novel nuclear pore

complex protein that contains neither repetitive sequence motifs nor

reacts with WGA. J. Cell Biol. 121, 1–9.

Sirotkin, H., O’Donnell, H., DasGupta, R., Halford, S., St.Jore, B., Puech,

A., Parimoo, S., Morrow, B., Skoultchi, A., Weissman, S.M., Scambler,

P., Kucherlapati, R., 1997. Identification of a new human catenin gene

family member (ARVCF) from the region deleted in velo-cardio-facial

syndrome. Genomics 41, 75–83.

Smith, T.M., Lee, M.K., Szabo, C.I., Jerome, N., McEuen, M., Taylor, M.,

Hood, L., King, M.C., 1996. Complete genomic sequence and analysis

of 117 kb of human DNA containing the gene BRCA1. Genome Res. 6,

1029–1049.

Trower, M.K., Orton, S.M., Purvis, I.J., Sanseau, P., Riley, J., Christodou-

lou, C., Burt, D., See, C.G., Elgar, G., Sherrington, R., Rogaev, E.I.,

St.George-Hyslop, P., Brenner, S., Dykes, C.W., 1996. Conservation of

synteny between the genome of the pufferfish (Fugu rubripes) and the

region on human chromosome 14 (14q24.3) associated with familial

Alzheimer disease (AD3 locus). Proc. Natl. Acad. Sci. 93, 1366–1369.

van Deursen, J., Boer, J., Kasper, L., Grosveld, G., 1996. G2 arrest and

impaired nucleocytoplasmic transport in mouse embryos lacking the

proto-oncogene CAN/Nup214. EMBO J. 15, 5574–5583.

Wong, G.K., Yu, J., Thayer, E.C., Olson, M.V., 1997. Multiple-complete-

digest restriction fragment mapping: generating sequence-ready maps

for large-scale DNA sequencing. Proc. Natl. Acad. Sci. 94, 5225–5230.

Zhang, X., Yang, H., Corydon, M.J., Pedersen, S., Korenberg, J.R., Chen,

X.N., Laporte, J., Gregersen, N., Niebuhr, E., Liu, G., Bolund, L., 1999.

Localization of a human nucleoporin 155 gene (NUP155) to the 5p13

region and cloning of its cDNA. Genomics 57, 144–151.

X. Zhang et al. / Gene 288 (2002) 9–18