Gene cataloging and expression profiling in human gastric cancer cells by expressed sequence tags


www.elsevier.com/locate/ygeno

Genomics 83 (2004) 1024–1045


Nam-Soon Kim a, Yoonsoo Hahn a, Jung-Hwa Oh a, Ju-Yeon Lee a, Kyung-Jin Oh a, Jeong-Min Kim a, Hong-Seog Park a, Sangsoo Kim a, Kyu-Sang Song b, Seung-Moo Rho b, Hyang-Sook Yoo a, and Yong Sung Kim a,*

a Laboratory of Human Genomics, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-333, South Korea

b College of Medicine, ChungNam National University, Daejeon 301-040, South Korea

Received 7 July 2003; accepted 4 December 2003

Available online 8 March 2004

Abstract

To understand the molecular mechanism associated with gastric carcinogenesis, we identified genes expressed in gastric cancer cell lines and tissues. Of 97,609 high-quality ESTs sequenced from 36 cDNA libraries, 92,545 were coalesced into 10,418 human UniGene clusters (Build 151). The gene expression profile was produced by counting the cluster frequencies in each library. Although the profiles of highly expressed genes varied greatly from library to library, those genes related to cell structure formation, heat shock proteins, the glycolysis pathway, and the signaling pathway were highly represented in human gastric cancer cell lines and in primary tumors. Conversely, the genes encoding immunoglobulins, ribosomal proteins, and digestive proteins were down-regulated in gastric cancer cell lines and tissues compared to normal tissues. The transcription levels of some of these genes were confirmed by RT-PCR. We found that genes related to cell adhesion, apoptosis, and cytoskeleton formation were particularly up-regulated in the gastric cancer cell lines established from malignant ascites compared to those from primary tumors. This comprehensive molecular profiling of human gastric cancer should be useful for elucidating the genetic events associated with human gastric cancer.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Gastric cancer; EST; Expression profiling; Metastasis

Gastric cancer is one of the leading causes of cancer death in East Asia, including Korea and Japan [1]. Although advances in diagnostic tools and therapeutics have enabled an excellent long-term survival rate for patients with early gastric cancer, the prognosis of advanced gastric cancer remains poor [2]. Recent molecular analyses have revealed that many genetic alterations occur in genes such as p53 [3], β-catenin [4], E-cadherin [5], trefoil factor 1 [6], and c-met [7] in gastric cancer. However, this information is hardly sufficient to enable us to understand the common pathways of carcinogenesis and of the progression of gastric cancer. Furthermore, gastric cancer shows diverse clinical properties in terms of histological type, metastatic status, invasiveness, and responsiveness to chemotherapy, and little is known about the genes associated with these characteristics.

0888-7543/$ - see front matter © 2003 Elsevier Inc. All rights reserved.

doi:10.1016/j.ygeno.2003.12.002

Sequence data from this article have been deposited with the GenBank Data Library under Accession Nos. BM737564–BM750047, BM750051–BM770037, BM770039–BM773528, BM781444–BM799443, BM817422–BM820944, BM820950–BM838954, BM838958–BM844429, BM844437–BM857413, and BQ081846–BQ084582.

* Corresponding author. Fax: +82-42-860-4409.

E-mail address: [email protected] (Y.S. Kim).

The identification of the genes that specify a given cell type or tissue, or those associated with a pathological state of a particular tissue, provides valuable molecular insight into biological phenomena and cellular physiology. As with any specific tissue or cell population in human disease, the biological features of human gastric cancer might be affected primarily by the expression of a particular set of genes. High-throughput methods, such as cDNA or oligonucleotide microarrays, are increasingly being used to systematically compare the molecular features of individual cancers with key clinical parameters. Recent studies have reported expression profiles using microarray or SAGE in gastric cancer samples, identifying a few hundred genes that distinguish tumor from normal tissue [8–17]. Traditionally, however, expressed sequence tags (ESTs) generated by the single-pass sequencing of cDNA clones randomly selected from cDNA libraries have also been used to identify novel genes as well as to collect novel ESTs [18,19]. ESTs have also been found useful for the differential and quantitative analysis of expression patterns [20] and for the evaluation of gene expression profiles in a given tissue [18–21]. Identifying and cataloging the genes associated with gastric carcinogenesis by detailing gene expression profiles should therefore facilitate the elucidation of the molecular mechanisms of gastric carcinogenesis.

We set out to collect the entire set of genes expressed in gastric cancer cell lines or tissues and to identify those genes associated with gastric cancer by examining their expression profiles. In particular, we applied a strategy to obtain full-length cDNAs, as these clones are a valuable resource for the functional study of the genes. We also tried to discover novel or rarely expressed genes efficiently by subtracting and normalizing abundant cDNAs. As a first step, we constructed 36 cDNA libraries, including full-length enriched cDNA libraries [22,23], subtracted cDNA libraries, and a normalized cDNA library [24], from gastric cancer cell lines and the tissues of Korean patients. Here, we report the expression profiles of genes abnormally expressed in gastric cancer cell lines and tissues.

Results

Large-scale EST collection from human gastric cells

Twenty-two libraries, comprising 19 full-length enriched and 3 universal cDNA libraries, were constructed from 12 gastric cancer cell lines and one primary tumor as well as from six normal tissues obtained from Korean patients. Thirteen subtracted cDNA libraries were also generated by removing highly represented clones from the original libraries. In addition, one normalized cDNA library was prepared. In total, 123,360 clones were randomly picked from these 36 libraries and used for 5′-end single-pass sequencing. The sequences obtained were subjected to quality control procedures, namely, the trimming of the vector region and the removal of low-quality or short (fewer than 100 bp) sequences. Finally, 97,609 high-quality ESTs with an average length of 480 bp were collected (Table 1).
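The length criterion in the quality-control step above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names and the toy reads are hypothetical, and vector trimming and base-quality screening are assumed to have been done upstream. Only the 100-bp cutoff and the clone and EST counts are taken from the text.

```python
MIN_LENGTH_BP = 100  # cutoff stated in the text: reads shorter than 100 bp are removed


def filter_short_reads(trimmed_reads, min_length=MIN_LENGTH_BP):
    """Return the vector-trimmed reads that are at least min_length bp long."""
    return [read for read in trimmed_reads if len(read) >= min_length]


def sequencing_yield(high_quality, picked):
    """Fraction of picked clones that produced a high-quality EST."""
    return high_quality / picked


# Toy reads (hypothetical): 99 bp, 100 bp, and 480 bp long.
toy_reads = ["A" * 99, "C" * 100, "G" * 480]
kept = filter_short_reads(toy_reads)

# Counts reported in the text: 97,609 high-quality ESTs from 123,360 picked clones.
overall_yield = sequencing_yield(97_609, 123_360)
```

With the reported counts, roughly 79% of the picked clones yielded a usable EST.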

After screening out 2354 ESTs derived from mitochondrial DNAs, ribosomal DNAs, and human repetitive sequences, we submitted the remaining 95,255 sequences to NCBI's dbEST database. To identify distinct transcripts, we first used the CAP3 program to assemble all ESTs into 19,124 clusters. Of these, 9222 were contigs assembled from 85,353 sequences and the remaining 9902 were singlets (Table 2). However, only 92,545 of our ESTs were coalesced into 10,418 human UniGene clusters (Build 151). This difference may be due to the stringency and the strategy used for clustering. That is, 2710 ESTs were excluded during UniGene clustering, possibly because they lack a polyadenylation signal or tail. On the other hand, distinct CAP3-derived transcripts could be grouped into a single UniGene cluster, because transcripts or cDNAs produced by alternative splicing or from different transcription start sites may not be distinguished by UniGene. We based the subsequent expression profile analyses on the results of the UniGene clustering.
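The counts in this paragraph are mutually consistent, which can be checked with a few lines of Python. The sketch below only restates the arithmetic implied by the text and Table 2; the constant and function names are illustrative and do not come from the original work.

```python
# Counts reported in the text and in Table 2.
TOTAL_ESTS = 97_609
SCREENED_OUT = 2_354  # mitochondrial, ribosomal, and repetitive sequences
CAP3_CONTIGS = 9_222
CAP3_CONTIG_MEMBERS = 85_353
CAP3_SINGLETS = 9_902
UNIGENE_ESTS = 92_545


def cluster_accounting():
    """Derive the totals stated in the text from the underlying counts."""
    submitted = TOTAL_ESTS - SCREENED_OUT
    return {
        "submitted_to_dbest": submitted,
        "cap3_clusters": CAP3_CONTIGS + CAP3_SINGLETS,
        "cap3_ests": CAP3_CONTIG_MEMBERS + CAP3_SINGLETS,
        "excluded_by_unigene": submitted - UNIGENE_ESTS,
    }


summary = cluster_accounting()
```

The derived values match the text: 95,255 submitted sequences, 19,124 CAP3 clusters, and 2710 ESTs left out of the UniGene clusters.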

Identification of genes highly expressed in cDNA libraries of gastric cancer cell lines and tissues

Since it is likely that cellular functions are associated with genes abundantly expressed in the cell, we first examined the expression profiles of genes highly expressed in each gastric cancer cell line and tissue by analyzing those ESTs obtained from the cDNA libraries that were neither subtracted nor normalized. We thus selected 194 cDNAs with a frequency of over 0.6% within an individual cDNA library, as shown in Table 3.
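The selection criterion can be sketched as follows. The percentages in Table 3 are reproduced when the copy number is divided by the library's UniGene clone count from Table 1 (for S1SNU5, 29/1876 = 1.5458%), so that denominator is assumed here; this is an inference from the numbers, not something the text states. The function name and the `RARE_GENE` entry are hypothetical.

```python
ABUNDANCE_THRESHOLD = 0.006  # "over 0.6%" as stated in the text


def abundant_genes(cluster_counts, library_clones, threshold=ABUNDANCE_THRESHOLD):
    """Map each gene whose EST frequency exceeds the threshold to its percentage."""
    result = {}
    for gene, copies in cluster_counts.items():
        frequency = copies / library_clones
        if frequency > threshold:
            result[gene] = round(100 * frequency, 4)
    return result


# S1SNU5 contributed 1,876 clones to UniGene (Table 1); the EEF1A1 and ENO1
# copy numbers are from Table 3. RARE_GENE is a made-up entry below the cutoff.
s1snu5_counts = {"EEF1A1": 29, "ENO1": 12, "RARE_GENE": 11}
s1snu5_abundant = abundant_genes(s1snu5_counts, library_clones=1876)
```

In a library of this size, 12 copies clears the 0.6% cutoff and 11 copies does not, which is why the S1SNU5 block of Table 3 ends at 12 copies.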

Of the selected genes, 132 cDNAs were abundant in the libraries of the 12 gastric cancer cell lines or of the one primary tumor, but were rare or below 0.6% in frequency in libraries of normal gastric tissues. The cDNAs abundantly expressed in cancer cells are marked as "c" and "*c" in the first column of Table 3. The "c" indicates that the cDNA was expressed abundantly in at least two libraries; 22 cDNAs were included in this group. For example, RPS2 and ACTB were highly abundant in eight libraries; K-ALPHA-1 in six libraries; GAPD and KRT8 in five libraries; RPLP0 and FTH1 in four libraries; FKBP1A, PKM2, ENO1, LDHA, SURF4, GNB2L1, and PGK1 in three libraries; and HSPA8, RPL5, RPL10, SH3GLB2, TUBA6, RPL23A, TMSB4X, and SDC1 in two libraries. In particular, RPS2, RPLP0, FTH1, and GNB2L1 were highly abundant in the libraries from the single primary tumor and from the cancer cell lines.
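The marker assignment can be illustrated with a small sketch. It assumes that each gene has already been mapped to the set of cancer-derived libraries in which it exceeds the 0.6% cutoff, and it encodes only two rules: "c" for genes abundant in at least two such libraries, and "*c", which Table 3 uses for genes abundant in a single cancer-derived library. Genes that are also abundant in normal tissue are skipped. The function name and data layout are hypothetical.

```python
def classify_cancer_markers(cancer_libraries_by_gene, normal_abundant_genes):
    """Assign "c" or "*c" to genes abundant only in cancer-derived libraries."""
    markers = {}
    for gene, libraries in cancer_libraries_by_gene.items():
        if gene in normal_abundant_genes or not libraries:
            continue  # not cancer-specific; left unclassified in this sketch
        markers[gene] = "c" if len(libraries) >= 2 else "*c"
    return markers


# Library memberships below are read from Table 3.
example = {
    "SDC1": {"S6SNU620", "S8SNU638"},
    "ANXA8": {"S2SNU668"},
    "PGC": {"S6SNU620", "S21SNU520"},
}
markers = classify_cancer_markers(example, normal_abundant_genes={"PGC"})
```

Here SDC1 receives "c", ANXA8 receives "*c", and PGC is left out because it is also abundant in the normal-tissue library S14K402.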

We also identified 110 abundant cDNAs that were specifically present only in a given library from the cancer cell lines or from the single primary tumor; these are marked as "*c". For example, UNC93B1, NP, SQSTM1, CIZ1, and DGKZ were observed in the S1SNU5 library but were rare or had a frequency below 0.6% in all the other libraries, as shown in Table 3. The cDNAs abundantly expressed in a given cDNA library were also detected in all other libraries except for S21SNU520 and S13KMS5. Notably, two cDNAs, ANXA8 and GADD45B, were present at extremely high levels in SNU668 (5.7%) and SNU638 (8.2%), respectively, and a large proportion of the abundant cDNAs, representing 34 genes, was observed in the S8SNU638 library. These library-specific features suggest that a particular cellular function is reflected in the specific expression profile of a given cell, rather than arising from a sampling error in the picking of library clones, since about 3800 clones per library were randomly chosen from the libraries examined (Table 1) and many cDNAs abundantly expressed in cancer cells (c) were commonly shared by the libraries examined. In particular, three cDNA libraries (S4SNU1, S10SNU1, and S11SNU1) derived from SNU1 cells at different stages of growth showed a similar occurrence of abundant cDNAs, though the profiles of S10SNU1 and S11SNU1 were most similar. For example, EEF1A1 and ACTB were commonly present in all three libraries, but LDHA, K-ALPHA-1, ENO1, and TMSB4X were abundant only in S10SNU1 and S11SNU1.

Table 1

Summary of cDNA libraries

Source   Library      Vector      Type (a)  Reads   Clones (b)  Clusters (b)

Cancer cell line
SNU5     S1SNU5       pCNS        Full      1,994   1,876       805
         S1SNU5s1     pCNS        Full/sub  370     358         208
         S1SNU5s2     pCNS        Full/sub  1,136   1,117       274
SNU668   S2SNU668     pCNS        Univ      1,914   1,696       684
         S2SNU668s1   pCNS        Univ/sub  1,878   1,821       588
SNU16    S3SNU16      pTZ18RP1    Full      2,906   2,674       964
         S3SNU16s1    pTZ18RP1    Full/sub  1,818   1,748       702
         S22SNU16     pT7T3Pac    Univ      976     828         640
         S22SNU16n1   pT7T3Pac    Univ/norm 11,381  10,798      4,442
SNU1     S4SNU1       pTZ18RP1    Full      1,874   1,800       736
         S4SNU1s1     pTZ18RP1    Full/sub  79      79          48
         S10SNU1      pME18S-FL3  Full      2,023   1,973       1,007
         S11SNU1      pME18S-FL3  Full      6,344   6,089       2,417
SNU484   S5SNU484     pTZ18RP1    Univ      2,559   2,351       934
         S5SNU484s1   pTZ18RP1    Univ/sub  1,527   1,434       806
SNU620   S6SNU620     pCNS        Full      2,887   2,778       1,051
         S6SNU620s1   pCNS        Full/sub  1,996   1,930       825
SNU719   S7SNU719     pCNS        Full      2,942   2,781       853
         S7SNU719s1   pCNS        Full/sub  1,788   1,695       726
SNU638   S8SNU638     pCNS        Full      2,649   2,483       487
         S8SNU638s1   pCNS        Full/sub  55      53          31
SNU601   S9SNU601     pME18S-FL3  Full      7,467   6,883       2,902
SNU216   S12SNU216    pCNS        Full      7,901   7,634       1,780
         S12SNU216s1  pCNS        Full/sub  76      75          52
SNU520   S21SNU520    pTZ18RP1    Full      6,170   5,931       2,402
         S21SNU520s1  pTZ18RP1    Full/sub  1,483   1,437       858
KMS5     S13KMS5      pCNS        Full      5,549   5,444       2,025
         S13KMS5s1    pCNS        Full/sub  2,317   2,269       1,203

Normal tissue
NK402    S14K402      pTZ18RP1    Full      6,005   5,800       2,026
         S14K402s1    pTZ18RP1    Full/sub  1,613   1,569       583
N181907  S15N181907   pCNS        Full      33      28          27
N667673  S16N667673   pCNS        Full      113     97          56
N258215  S17N258215   pCNS        Full      696     527         263
N669761  S18N669761   pCNS        Full      2,057   1,808       521
N665307  S19N665307   pCNS        Full      1,410   1,310       469

Tumor tissue
T665307  S20T665307   pCNS        Full      3,623   3,371       1,274

Total                                       97,609  92,545      10,418

(a) Library type: Full, full-length enriched; Univ, universal; sub, subtracted; norm, normalized.
(b) Number of clones and clusters in NCBI UniGene Build 151 contributed by our EST sequences.

Table 2

EST clustering results

                              CAP3 (a)  UniGene (b)
Total sequences analyzed      97,609
Mitochondrial DNA and others  2,354
Number of ESTs in cluster     95,255    92,545
Number of clusters            19,124    10,418
Number of singletons          9,902     2,922 (c)

(a) Repetitive sequences were masked to N's and their quality values were set to 1. The CAP3 program was run with default parameters. Contigs and singlets were counted as clusters.
(b) EST clones in UniGene Build 151 contributed by our stomach ESTs were counted.
(c) Number of clusters in which only a single stomach EST was found.

Of the selected cDNAs with a frequency over 0.6% in

each cDNA library, we also identified 55 cDNAs abundant

Table 3

Genes highly expressed in each cDNA library

UniGene Copies (%) Gene Chromosome Title

S1SNU5

b Hs.181165 29 (1.5458) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.5662 23 (1.2260) GNB2L1 5q35.3 guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1

c Hs.288061 20 (1.0661) ACTB 7p15-p12 actin, beta

*c Hs.135187 17 (0.9062) UNC93B1 11q13 unc-93 homolog B1 (C. elegans)

c Hs.334842 14 (0.7463) K-ALPHA-1 12q12-12q14.3 tubulin, alpha, ubiquitous

*c Hs.75514 13 (0.6930) NP 14q13.1 nucleoside phosphorylase

b Hs.77961 12 (0.6397) HLA-B 6p21.3 major histocompatibility complex, class I, B

c Hs.350046 12 (0.6397) RPL23A 17q11 ribosomal protein L23a

*c Hs.182248 12 (0.6397) SQSTM1 5q35 sequestosome 1

*c Hs.23476 12 (0.6397) CIZ1 9q34.1 Cip1-interacting zinc finger protein

*c Hs.277445 12 (0.6397) DGKZ 11p11.2 diacylglycerol kinase, zeta (104kD)

c Hs.254105 12 (0.6397) ENO1 1p36.3-p36.2 enolase 1, (alpha)

S2SNU668

*c Hs.87268 97 (5.7193) ANXA8 10q11.2 annexin A8

*c Hs.78225 28 (1.6509) ANXA1 9q12-q21.2 annexin A1

*c Hs.110849 28 (1.6509) ESRRA 11q13 estrogen-related receptor alpha

*c Hs.77326 20 (1.1792) IGFBP3 7p13-p12 insulin-like growth factor binding protein 3

*c Hs.750 19 (1.1203) FBN1 15q21.1 fibrillin 1 (Marfan syndrome)

b Hs.181165 18 (1.0613) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

*c Hs.10958 17 (1.0024) DJ-1 1p36.33-p36.12 RNA-binding protein regulatory subunit

*c Hs.81328 17 (1.0024) NFKBIA 14q13 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha

*c Hs.7278 15 (0.8844) CRY2 11p11.11 cryptochrome 2 (photolyase-like)

*c Hs.101813 15 (0.8844) SLC9A3R2 16p13.3 solute carrier family 9 (sodium/hydrogen exchanger), isoform 3 regulatory factor 2

*c Hs.75415 14 (0.8255) B2M 15q21-q22.2 beta-2-microglobulin

c Hs.5662 14 (0.8255) GNB2L1 5q35.3 guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1

c Hs.284296 14 (0.8255) SURF4 9q34.2 surfeit 4

*c Hs.21568 13 (0.7665) FLJ22160 17q25.3 hypothetical protein FLJ22160

*c Hs.181243 12 (0.7075) ATF4 22q13.1 activating transcription factor 4 (tax-responsive enhancer element B67)

*c Hs.18759 11 (0.6486) C21orf108 21q22.11 chromosome 21 open reading frame 108

*c Hs.93764 11 (0.6486) CPA4 7q32 carboxypeptidase A4

c Hs.182426 11 (0.6486) RPS2 16p13.3 ribosomal protein S2

S3SNU16

b Hs.181165 86 (3.2162) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.29797 48 (1.7951) RPL10 Xq28 ribosomal protein L10

*c Hs.12142 39 (1.4585) WDR13 Xp11.23 WD repeat domain 13

c Hs.242463 34 (1.2715) KRT8 12q13 keratin 8

c Hs.182426 33 (1.2341) RPS2 16p13.3 ribosomal protein S2

*c Hs.180577 31 (1.1593) GRN 17q21.32 granulin

c Hs.350108 30 (1.1219) RPLP0 12q24.2 ribosomal protein, large, P0

*c Hs.334895 27 (1.0097) RPL10A 6p21.3-p21.2 ribosomal protein L10a

c Hs.169476 27 (1.0097) GAPD 12p13 glyceraldehyde-3-phosphate dehydrogenase

*c Hs.353218 27 (1.0097) – 1 Homo sapiens cDNA FLJ23867 fis, clone LNG09729

*c Hs.169610 23 (0.8601) CD44 11p13 CD44 antigen (homing function and Indian blood group system)

c Hs.62954 23 (0.8601) FTH1 11q13 ferritin, heavy polypeptide 1

b Hs.14376 22 (0.8227) ACTG1 17q25 actin, gamma 1

*c Hs.75812 20 (0.7479) PCK2 14q11.2 phosphoenolpyruvate carboxykinase 2 (mitochondrial)

*c Hs.182979 18 (0.6731) RPL12 9q34 ribosomal protein L12

*c Hs.108124 18 (0.6731) RPS4X Xq13.1 ribosomal protein S4, X-linked

*c Hs.153357 17 (0.6358) PLOD3 7q22 procollagen-lysine, 2-oxoglutarate 5-dioxygenase 3

S4SNU1

b Hs.181165 87 (4.8333) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.179661 22 (1.2222) FKBP1A 20p13 FK506 binding protein 1A (12kD)


c Hs.198281 22 (1.2222) PKM2 15q22 pyruvate kinase, muscle

c Hs.169476 21 (1.1667) GAPD 12p13 glyceraldehyde-3-phosphate dehydrogenase

c Hs.182426 20 (1.1111) RPS2 16p13.3 ribosomal protein S2

b Hs.14376 19 (1.0556) ACTG1 17q25 actin, gamma 1

*c Hs.82932 17 (0.9444) CCND1 11q13 cyclin D1 (PRAD1: parathyroid adenomatosis 1)

*c Hs.159557 17 (0.9444) KPNA2 17q23.1-q23.3 karyopherin alpha 2 (RAG cohort 1, importin alpha 1)

*c Hs.76927 16 (0.8889) TOMM34 20 translocase of outer mitochondrial membrane 34

c Hs.284296 14 (0.7778) SURF4 9q34.2 surfeit 4

c Hs.288061 13 (0.7222) ACTB 7p15-p12 actin, beta

*c Hs.155101 12 (0.6667) ATP5A1 18q12-q21 ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit, isoform 1, cardiac muscle

*c Hs.224005 12 (0.6667) – – ESTs

c Hs.78771 12 (0.6667) PGK1 Xq13 phosphoglycerate kinase 1

*c Hs.95110 11 (0.6111) LOC113026 17q21.32 similar to phospholipase C, delta

*c Hs.22391 11 (0.6111) C20orf3 20p11.22-p11.21 chromosome 20 open reading frame 3

c Hs.350108 11 (0.6111) RPLP0 12q24.2 ribosomal protein, large, P0

b Hs.2186 11 (0.6111) EEF1G 11q13.2 eukaryotic translation elongation factor 1 gamma

S10SNU1

c Hs.2795 190 (9.6251) LDHA 11p15.4 lactate dehydrogenase A

c Hs.334842 85 (4.3060) K-ALPHA-1 12q12-12q14.3 tubulin, alpha, ubiquitous

b Hs.181165 82 (4.1540) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.254105 20 (1.0132) ENO1 1p36.3-p36.2 enolase 1, (alpha)

c Hs.288061 19 (0.9625) ACTB 7p15-p12 actin, beta

*c Hs.286 14 (0.7092) RPL4 15q22 ribosomal protein L4

c Hs.75968 13 (0.6586) TMSB4X Xq21.3-q22 thymosin, beta 4, X chromosome

*c Hs.146381 12 (0.6079) RBMX Xq26 RNA binding motif protein, X chromosome

c Hs.78771 12 (0.6079) PGK1 Xq13 phosphoglycerate kinase 1

S11SNU1

c Hs.2795 153 (2.5000) LDHA 11p15.4 lactate dehydrogenase A

*c Hs.289088 104 (1.6993) HSPCA 1q21.2-q22 heat shock 90kD protein 1, alpha

c Hs.75968 92 (1.5033) TMSB4X Xq21.3-q22 thymosin, beta 4, X chromosome

b Hs.181165 62 (1.0131) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.198281 60 (0.9804) PKM2 15q22 pyruvate kinase, muscle

c Hs.180414 56 (0.9150) HSPA8 11q23.3-q25 heat shock 70kD protein 8

*c Hs.74335 48 (0.7843) HSPCB 6p12 heat shock 90kD protein 1, beta

*c Hs.183704 47 (0.7680) UBC 12q24.3 ubiquitin C

c Hs.334842 46 (0.7516) K-ALPHA-1 12q12-12q14.3 tubulin, alpha, ubiquitous

c Hs.254105 46 (0.7516) ENO1 1p36.3-p36.2 enolase 1, (alpha)

c Hs.288061 42 (0.6863) ACTB 7p15-p12 actin, beta

*c Hs.25155 37 (0.6046) NET1 10p15 neuroepithelial cell transforming gene 1

S5SNU484

b Hs.181165 45 (1.9141) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.288061 39 (1.6589) ACTB 7p15-p12 actin, beta

*c Hs.76392 34 (1.4462) ALDH1A1 9q21 aldehyde dehydrogenase 1 family, member A1

c Hs.179661 34 (1.4462) FKBP1A 20p13 FK506 binding protein 1A (12kD)

*c Hs.155986 23 (0.9783) DDX24 14q32 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 24

*c Hs.74573 23 (0.9783) HU-K4 19q13.2 similar to vaccinia virus HindIII K4L ORF

*c Hs.111577 22 (0.9358) ITM3 2q37 integral membrane protein 3

c Hs.334842 21 (0.8932) K-ALPHA-1 12q12-12q14.3 tubulin, alpha, ubiquitous

*c Hs.279946 21 (0.8932) MARS 12q13.2 methionine-tRNA synthetase

*c Hs.284280 17 (0.7231) DKFZp547H236 19q13.33 hypothetical protein DKFZp547H236

b Hs.2186 17 (0.7231) EEF1G 11q13.2 eukaryotic translation elongation factor 1 gamma

c Hs.180414 17 (0.7231) HSPA8 11q23.3-q25 heat shock 70kD protein 8

*c Hs.164280 16 (0.6806) SLC25A6 Xp22.32 or Yp solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 6

*c Hs.184640 16 (0.6806) C11orf9 11q12-q13.1 chromosome 11 open reading frame 9

*c Hs.73947 16 (0.6806) PEPD 19q12-q13.2 peptidase D

*c Hs.102950 15 (0.6380) LOC51137 3 coat protein gamma-cop


S6SNU620

*c Hs.1435 61 (2.1958) GMPR 6p23 guanosine monophosphate reductase

b Hs.181165 41 (1.4759) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

b Hs.1867 39 (1.4039) PGC 6p21.3-p21.1 progastricsin (pepsinogen C)

c Hs.30002 29 (1.0439) SH3GLB2 9q34 SH3-domain GRB2-like endophilin B2

c Hs.82109 27 (0.9719) SDC1 2p24.1 syndecan 1

*c Hs.174051 24 (0.8639) SNRP70 19q13.3 small nuclear ribonucleoprotein 70kD polypeptide (RNP antigen)

*c Hs.234734 22 (0.7919) LYZ 12q13.2 lysozyme (renal amyloidosis)

*c Hs.236556 21 (0.7559) STRAIT11499 Xp11.4 hypothetical protein STRAIT11499

*c Hs.139800 20 (0.7199) HMGIY 6p21 high-mobility group (nonhistone chromosomal) protein isoforms I and Y

*c Hs.12084 20 (0.7199) TUFM 16p11.2 Tu translation elongation factor, mitochondrial

c Hs.78771 19 (0.6839) PGK1 Xq13 phosphoglycerate kinase 1

*c Hs.72363 19 (0.6839) – – Homo sapiens, clone MGC:16771 IMAGE:3907551, mRNA, complete cds

c Hs.182426 18 (0.6479) RPS2 16p13.3 ribosomal protein S2

c Hs.29797 17 (0.6120) RPL10 Xq28 ribosomal protein L10

S7SNU719

b Hs.181165 82 (2.9486) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

*c Hs.79217 53 (1.9058) PYCR1 17q25.3 pyrroline-5-carboxylate reductase 1

c Hs.242463 52 (1.8698) KRT8 12q13 keratin 8

*c Hs.84285 47 (1.6900) UBE2I 16p13.3 ubiquitin-conjugating enzyme E2I (UBC9 homolog, yeast)

*c Hs.20313 40 (1.4383) PTK2B 8p21.1 protein tyrosine kinase 2 beta

*c Hs.115960 40 (1.4383) MGC3103 11q12.1 hypothetical protein MGC3103

*c Hs.127269 30 (1.0787) – – Homo sapiens, clone MGC:21990 IMAGE:4397794, mRNA, complete cds

*c Hs.279789 25 (0.8990) GPI 19q13.1 glucose phosphate isomerase

*c Hs.283429 23 (0.8270) SMCX Xp11.22-p11.21 Smcx homolog, X chromosome (mouse)

*c Hs.73452 19 (0.6832) MGC10791 16p13.11 hypothetical protein MGC10791

*c Hs.4835 18 (0.6472) EIF3S8 16p11.2 eukaryotic translation initiation factor 3, subunit 8 (110kD)

*c Hs.195219 18 (0.6472) FLJ14486 19q13.43 hypothetical protein FLJ14486

*c Hs.326588 18 (0.6472) GLTSCR2 19q13.3 glioma tumor suppressor candidate region gene 2

*c Hs.274317 17 (0.6113) ABS 5q35.3 DEAD-box protein abstrakt

c Hs.284296 17 (0.6113) SURF4 9q34.2 surfeit 4

S8SNU638

*c Hs.110571 204 (8.2159) GADD45B 19p13.3 growth arrest and DNA-damage-inducible, beta

*c Hs.78465 77 (3.1011) JUN 1p32-p31 v-jun sarcoma virus 17 oncogene homolog (avian)

*c Hs.28935 75 (3.0205) TLE1 19p13.3 transducin-like enhancer of split 1 (E(sp1) homolog, Drosophila)

*c Hs.1872 72 (2.8997) PCK1 20q13.31 phosphoenolpyruvate carboxykinase 1 (soluble)

*c Hs.184727 70 (2.8192) – – ESTs, Weakly similar to SYN1 MOUSE SYNAPSIN I [M.musculus]

*c Hs.75612 48 (1.9331) STIP1 11q13 stress-induced-phosphoprotein 1 (Hsp70/Hsp90-organizing protein)

c Hs.82109 47 (1.8929) SDC1 2p24.1 syndecan 1

*c Hs.18842 45 (1.8123) PACSIN2 22q13.2-13.33 protein kinase C and casein kinase substrate in neurons 2

*c Hs.12185 45 (1.8123) PPP1R16A 8q24.3 protein phosphatase 1, regulatory (inhibitor) subunit 16A

b Hs.226795 42 (1.6915) GSTP1 11q13 glutathione S-transferase pi

*c Hs.84120 40 (1.6110) MGC13016 3q21.3 hypothetical protein MGC13016

c Hs.182426 38 (1.5304) RPS2 16p13.3 ribosomal protein S2

*c Hs.301866 38 (1.5304) KIAA0819 22q11.1 KIAA0819 protein

*c Hs.172801 37 (1.4901) IARS 9q21 isoleucine-tRNA synthetase

*c Hs.238296 36 (1.4499) GGA1 22q13.31 golgi associated, gamma adaptin ear containing, ARF binding protein 1

*c Hs.177592 35 (1.4096) RPLP1 15q22 ribosomal protein, large, P1

c Hs.30002 35 (1.4096) SH3GLB2 9q34 SH3-domain GRB2-like endophilin B2

*c Hs.111244 34 (1.3693) RTP801 10pter-q26.12 HIF-1 responsive RTP801

*c Hs.183842 33 (1.3290) UBB 17p12-p11.2 ubiquitin B

*c Hs.178551 25 (1.0068) RPL8 8q24.3 ribosomal protein L8


*c Hs.209119 25 (1.0068) AGPAT2 9q34.3 1-acylglycerol-3-phosphate O-acyltransferase 2 (lysophosphatidic acid acyltransferase, beta)

*c Hs.254211 24 (0.9666) LOC135763 6p21.1 taube nuss

c Hs.350108 24 (0.9666) RPLP0 12q24.2 ribosomal protein, large, P0

*c Hs.237924 22 (0.8860) LOC51629 17q12 CGI-69 protein

*c Hs.290070 21 (0.8458) GSN 9q33 gelsolin (amyloidosis, Finnish type)

*c Hs.154890 19 (0.7652) FACL2 4q34-q35 fatty-acid-Coenzyme A ligase, long-chain 2

*c Hs.18069 19 (0.7652) LGMN 14q32.1 legumain

b Hs.181165 19 (0.7652) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

*c Hs.155342 19 (0.7652) PRKCD 3p21.31 protein kinase C, delta

c Hs.350046 17 (0.6847) RPL23A 17q11 ribosomal protein L23a

*c Hs.2055 16 (0.6444) UBE1 Xp11.23 ubiquitin-activating enzyme E1 (A1S9T and BN75 temperature sensitivity complementing)

*c Hs.109428 16 (0.6444) ABT1 6p21.31 TATA-binding protein-binding protein

c Hs.180946 16 (0.6444) RPL5 1p22.1 ribosomal protein L5

*c Hs.283664 16 (0.6444) ASPH 8q12.1 aspartate beta-hydroxylase

*c Hs.4147 15 (0.6041) TRAM 8q13.1 translocating chain-associating membrane protein

*c Hs.288042 15 (0.6041) FLJ14299 8p11.21 hypothetical protein FLJ14299

S9SNU601

b Hs.181165 284 (4.1142) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.2795 153 (2.2164) LDHA 11p15.4 lactate dehydrogenase A

c Hs.288061 151 (2.1875) ACTB 7p15-p12 actin, beta

c Hs.334842 116 (1.6804) K-ALPHA-1 12q12-12q14.3 tubulin, alpha, ubiquitous

*c Hs.111334 43 (0.6229) FTL 19q13.3-q13.4 ferritin, light polypeptide

S12SNU216

c Hs.62954 131 (1.7160) FTH1 11q13 ferritin, heavy polypeptide 1

b Hs.181165 112 (1.4671) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.242463 97 (1.2706) KRT8 12q13 keratin 8

c Hs.288061 77 (1.0086) ACTB 7p15-p12 actin, beta

c Hs.169476 72 (0.9431) GAPD 12p13 glyceraldehyde-3-phosphate dehydrogenase

c Hs.182426 66 (0.8646) RPS2 16p13.3 ribosomal protein S2

*c Hs.79361 58 (0.7598) KLK6 19q13.3 kallikrein 6 (neurosin, zyme)

*c Hs.342389 53 (0.6943) PPIA 7p13-p11.2 peptidylprolyl isomerase A (cyclophilin A)

b Hs.2186 49 (0.6419) EEF1G 11q13.2 eukaryotic translation elongation factor 1 gamma

S21SNU520

b Hs.181165 117 (1.9724) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

b Hs.91011 70 (1.1800) AGR2 7p21.3 anterior gradient 2 homolog (Xenopus laevis)

c Hs.242463 66 (1.1126) KRT8 12q13 keratin 8

b Hs.1867 63 (1.0620) PGC 6p21.3-p21.1 progastricsin (pepsinogen C)

c Hs.169476 60 (1.0115) GAPD 12p13 glyceraldehyde-3-phosphate dehydrogenase

c Hs.198281 40 (0.6743) PKM2 15q22 pyruvate kinase, muscle

c Hs.179661 38 (0.6406) FKBP1A 20p13 FK506 binding protein 1A (12kD)

c Hs.348557 38 (0.6406) TUBA6 12q12-q14 tubulin alpha 6

S13KMS5

b Hs.181165 181 (3.3248) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.288061 119 (2.1859) ACTB 7p15-p12 actin, beta

c Hs.182426 69 (1.2675) RPS2 16p13.3 ribosomal protein S2

b Hs.77961 65 (1.1940) HLA-B 6p21.3 major histocompatibility complex, class I, B

c Hs.62954 62 (1.1389) FTH1 11q13 ferritin, heavy polypeptide 1

c Hs.242463 61 (1.1205) KRT8 12q13 keratin 8

c Hs.169476 58 (1.0654) GAPD 12p13 glyceraldehyde-3-phosphate dehydrogenase

c Hs.348557 45 (0.8266) TUBA6 12q12-q14 tubulin alpha 6

b Hs.2186 42 (0.7715) EEF1G 11q13.2 eukaryotic translation elongation factor 1 gamma

c Hs.334842 36 (0.6613) K-ALPHA-1 12q12-12q14.3 tubulin, alpha, ubiquitous

c Hs.180946 35 (0.6429) RPL5 1p22.1 ribosomal protein L5

S14K402

n Hs.293441 124 (2.1372) – – Homo sapiens SNC73 protein (SNC73) mRNA, complete cds

*n Hs.423 119 (2.0510) PAP 2p12 pancreatitis-associated protein

N.-S. Kim et al. / Genomics 83 (2004) 1024–10451030

Table 3 (continued)

UniGene Copies (%) Gene Chromosome Title


n Hs.156110 93 (1.6029) IGKC 2p12 immunoglobulin kappa constant

*n Hs.1032 80 (1.3788) REG1A 2p12 regenerating islet-derived 1 alpha (pancreatic stone protein, pancreatic thread protein)

n Hs.69319 71 (1.2237) LOC56287 1 CA11

b Hs.91011 54 (0.9307) AGR2 7p21.3 anterior gradient 2 homolog (Xenopus laevis)

b Hs.1867 52 (0.8962) PGC 6p21.3-p21.1 progastricsin (pepsinogen C)

b Hs.181165 49 (0.8445) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

n Hs.279860 44 (0.7584) TPT1 13q12-q14 tumor protein, translationally-controlled 1

*n Hs.350470 42 (0.7239) TFF1 21q22.3 trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in)

n Hs.181125 41 (0.7067) IGLJ3 22q11.1-q11.2 immunoglobulin lambda joining 3

S18N669761

n Hs.293441 127 (7.0243) – – Homo sapiens SNC73 protein (SNC73) mRNA, complete cds

*n Hs.21396 80 (4.4248) MGC4796 1p34.2 hypothetical protein MGC4796

n Hs.300697 53 (2.9314) IGHG3 14q32.33 immunoglobulin heavy constant gamma 3 (G3m marker)

*n Hs.84298 49 (2.7102) CD74 5q32 CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)

b Hs.181165 46 (2.5442) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

b Hs.2186 41 (2.2677) EEF1G 11q13.2 eukaryotic translation elongation factor 1 gamma

*n Hs.119598 41 (2.2677) RPL3 22q13 ribosomal protein L3

n Hs.156110 35 (1.9358) IGKC 2p12 immunoglobulin kappa constant

*n Hs.181244 29 (1.6040) HLA-A 6p21.3 major histocompatibility complex, class I, A

n Hs.350038 29 (1.6040) – – Homo sapiens, clone IMAGE:4296901, mRNA

n Hs.69319 28 (1.5487) LOC56287 1 CA11

*n Hs.159161 26 (1.4381) ARHGDIA 17q25.3 Rho GDP dissociation inhibitor (GDI) alpha

*n Hs.153998 25 (1.3827) CKMT1 15q15 creatine kinase, mitochondrial 1 (ubiquitous)

*n Hs.135281 25 (1.3827) ALP 4q35 alpha-actinin-2-associated LIM protein

*n Hs.5372 25 (1.3827) CLDN4 7q11.23 claudin 4

*n Hs.16258 25 (1.3827) RAB24 1 Homo sapiens, clone MGC:29471 IMAGE:4329216, mRNA, complete cds

*n Hs.6191 24 (1.3274) DKFZP762I166 11q12.2 hypothetical protein DKFZp762I166

*n Hs.284146 23 (1.2721) DKFZp762N0610 1p36.13 hypothetical protein DKFZp762N0610

*n Hs.83347 21 (1.1615) AAMP 2q36.1 angio-associated, migratory cell protein

*n Hs.4943 21 (1.1615) MAGED2 Xp11.4-p11.1 melanoma antigen, family D, 2

*n Hs.21223 20 (1.1062) CNN1 19p13.2-p13.1 calponin 1, basic, smooth muscle

*n Hs.279818 18 (0.9956) AF093680 16q21 similar to mouse Glt3 or D. melanogaster transcription factor IIB

*n Hs.159177 17 (0.9403) LIPF 10q23.33 lipase, gastric

n Hs.153261 16 (0.8850) IGHM 14q32.33 immunoglobulin heavy constant mu

n Hs.181125 16 (0.8850) IGLJ3 22q11.1-q11.2 immunoglobulin lambda joining 3

*n Hs.273415 15 (0.8296) ALDOA 16q22-q24 aldolase A, fructose-bisphosphate

*n Hs.301175 14 (0.7743) RAC2 22q13.1 ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2)

*n Hs.6449 13 (0.7190) FLJ20542 1p36.33 hypothetical protein FLJ20542

b Hs.1867 13 (0.7190) PGC 6p21.3-p21.1 progastricsin (pepsinogen C)

*n Hs.75337 12 (0.6637) NOLC1 10q24.32 nucleolar and coiled-body phosphoprotein 1

*n Hs.192023 11 (0.6084) EIF3S2 1p34.1 eukaryotic translation initiation factor 3, subunit 2 (beta, 36kD)

*n Hs.183153 11 (0.6084) ARF4L 17q12-q21 ADP-ribosylation factor 4-like

*n Hs.30928 11 (0.6084) TOMM40 19q13 translocase of outer mitochondrial membrane 40 homolog (yeast)

S19N665307

n Hs.350038 88 (6.7176) – – Homo sapiens, clone IMAGE:4296901, mRNA

n Hs.293441 78 (5.9542) – – Homo sapiens SNC73 protein (SNC73) mRNA, complete cds

*n Hs.177516 39 (2.9771) HDLBP 2q37 high density lipoprotein binding protein (vigilin)

*n Hs.61265 30 (2.2901) LOC131177 3p21.1 hypothetical protein BC015359

*n Hs.334808 24 (1.8321) MGC4189 17p13.2 hypothetical protein MGC4189

*n Hs.249495 23 (1.7557) HNRPA1 12q13.1 heterogeneous nuclear ribonucleoprotein A1

*n Hs.5613 19 (1.4504) – 22 Homo sapiens mRNA; cDNA DKFZp564E2222 (from clone DKFZp564E2222)

*n Hs.41045 18 (1.3740) – – Homo sapiens mRNA for FLJ00067 protein, partial cds


*n Hs.74497 17 (1.2977) NSEP1 1p34 nuclease sensitive element binding protein 1

*n Hs.151604 15 (1.1450) RPS8 1p34.1-p32 ribosomal protein S8

n Hs.279860 15 (1.1450) TPT1 13q12-q14 tumor protein, translationally-controlled 1

b Hs.14376 14 (1.0687) ACTG1 17q25 actin, gamma 1

*n Hs.182278 14 (1.0687) CALM2 2p21 calmodulin 2 (phosphorylase kinase, delta)

*n Hs.32916 14 (1.0687) NACA 12q23-q24.1 nascent-polypeptide-associated complex alpha polypeptide

b Hs.77961 14 (1.0687) HLA-B 6p21.3 major histocompatibility complex, class I, B

*n Hs.331803 13 (0.9924) – 2 ESTs, Highly similar to CH60_HUMAN 60 KDA HEAT SHOCK PROTEIN, MITOCHONDRIAL PRECURSOR

*n Hs.42743 13 (0.9924) HSPC148 11q21 hypothetical protein HSPC148

*n Hs.180859 13 (0.9924) LOC51142 7p14.1 16.7 Kd protein

*n Hs.16762 12 (0.9160) – 3 Homo sapiens mRNA; cDNA DKFZp564B2062 (from clone DKFZp564B2062)

*n Hs.699 12 (0.9160) PPIB 15q21-q22 peptidylprolyl isomerase B (cyclophilin B)

n Hs.156110 12 (0.9160) IGKC 2p12 immunoglobulin kappa constant

b Hs.226795 12 (0.9160) GSTP1 11q13 glutathione S-transferase pi

*n Hs.25951 11 (0.8397) ARHGEF3 3p21-p13 Rho guanine nucleotide exchange factor (GEF) 3

*n Hs.7393 10 (0.7634) RIC-8 11p15.5 likely ortholog of mouse synembryn

*n Hs.28907 10 (0.7634) EPS8R1 19q13.42 epidermal growth factor receptor pathway substrate 8 related protein 1

*n Hs.107979 9 (0.6870) SMP1 1p36.11 small membrane protein 1

n Hs.300697 9 (0.6870) IGHG3 14q32.33 immunoglobulin heavy constant gamma 3 (G3m marker)

n Hs.153261 8 (0.6107) IGHM 14q32.33 immunoglobulin heavy constant mu

*n Hs.119140 8 (0.6107) EIF5A 17p13-p12 eukaryotic translation initiation factor 5A

*n Hs.33264 8 (0.6107) – 1 Homo sapiens, clone IMAGE:3909623, mRNA, partial cds

S20T665307

c Hs.350108 41 (1.2163) RPLP0 12q24.2 ribosomal protein, large, P0

b Hs.181165 41 (1.2163) EEF1A1 6q14.1 eukaryotic translation elongation factor 1 alpha 1

c Hs.62954 31 (0.9196) FTH1 11q13 ferritin, heavy polypeptide 1

c Hs.182426 31 (0.9196) RPS2 16p13.3 ribosomal protein S2

c Hs.5662 31 (0.9196) GNB2L1 5q35.3 guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1

*c Hs.77039 24 (0.7120) RPS3A 4q31.2-q31.3 ribosomal protein S3A

b Hs.14376 24 (0.7120) ACTG1 17q25 actin, gamma 1

*c Hs.9614 21 (0.6230) NPM1 5q35 nucleophosmin (nucleolar phosphoprotein B23, numatrin)

*c Hs.9911 21 (0.6230) FLJ11773 12q12 hypothetical protein FLJ11773

Genes showing abundant expression in each library are listed; a total of 194 genes are presented. Abundant and rare expression are defined as ≥0.6% and ≤0.1%, respectively. *c indicates a gene abundant in the given library of cancer cells and primary tumor, but rare in all other libraries (110 genes); c, a gene abundant in one or more libraries from cancer cells and primary tumor, but rare in all libraries from normal cells (22 genes); *n, a gene abundant in the given library from only normal tissues (47 genes); n, a gene abundant in one or more libraries from only normal tissues (8 genes); b, a gene abundant in one or more libraries from both cancer cells and normal samples (7 genes).
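The abundance thresholds in this footnote amount to a simple per-library calculation. The following Python sketch assumes that the percentage in the "Copies (%)" column is the EST copy number divided by the total number of ESTs sequenced from that library; the function names and the example library size of 10,000 ESTs are illustrative and are not taken from the paper.

```python
ABUNDANT_PCT = 0.6  # footnote threshold: abundant if >= 0.6% of a library's ESTs
RARE_PCT = 0.1      # footnote threshold: rare if <= 0.1% of a library's ESTs


def est_frequency_pct(copies, library_size):
    """Percentage of a library's ESTs that fall in one UniGene cluster."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return 100.0 * copies / library_size


def abundance_class(copies, library_size):
    """Label a cluster as 'abundant', 'rare', or 'intermediate' in one library."""
    pct = est_frequency_pct(copies, library_size)
    if pct >= ABUNDANT_PCT:
        return "abundant"
    if pct <= RARE_PCT:
        return "rare"
    return "intermediate"


# Hypothetical library of 10,000 ESTs (not a value from the paper).
for copies in (100, 30, 5):
    print(copies, abundance_class(copies, 10_000))
```

Under these assumptions, a gene would receive a marker such as *c or *n only after this label has been computed for every library and the labels compared across the cancer and normal libraries.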


in three libraries only from normal gastric tissues, and these

are marked as “n” or “*n” in the first column of Table 3.

Eight genes (n) were abundant in at least two libraries, and

47 cDNAs (*n) were specifically present only in a given

library, suggesting that these cDNAs from normal tissues

were rare or were of low frequency in the cancer cell

libraries. For example, IGKC and SNC73 were commonly

abundant in three libraries and IGHM, IGLJ3, IMAGE4296901, TPT1, IGHG3, and CA11 were abundant in

two. Three cDNAs from S14K402, 23 cDNAs from

S18N669761, and 21 cDNAs from S19N665307 were

observed as cDNAs abundantly expressed in each library.

We further analyzed expression profiles to identify those

genes associated with gastric carcinogenesis by examining

frequencies of the up- and down-regulated genes in each

library.

Identification of up- and down-regulated genes in gastric

cancer cells

To identify the candidate genes related to human gastric

carcinogenesis, we divided the constructed libraries into two

groups, i.e., “normal pool” and “cancer pool.” The normal pool comprised 3 libraries from normal tissues, and the cancer pool comprised 15 libraries from cancer cells and the single primary tumor. We

then selected 142 genes that had a frequency of over 0.1% on

average in each pool and that showed a significant difference

(p < 0.001) between the two pools. We found 43 up-regulated

genes and 99 down-regulated genes specifically in the cancer

pool, as shown in Table 4.
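The selection step described above can be illustrated with a small calculation. The Python sketch below is not the authors' procedure: the statistic behind the reported P values is not described in this section, so a two-proportion z-test is used here purely as a stand-in. The pool sizes (53,791 cancer-pool clones and 8918 normal-pool clones) are the figures quoted in this section; the gene counts are hypothetical, and the frequency filter is read as "over 0.1% in at least one pool", which is an assumption.

```python
import math

CANCER_POOL_CLONES = 53791  # pool size quoted in the text
NORMAL_POOL_CLONES = 8918   # pool size quoted in the text


def compare_pools(cancer_count, normal_count,
                  cancer_total=CANCER_POOL_CLONES,
                  normal_total=NORMAL_POOL_CLONES):
    """Return (cancer %, normal %, z, two-sided p) for one gene.

    Uses a pooled two-proportion z-test (a stand-in statistic).
    """
    f_cancer = cancer_count / cancer_total
    f_normal = normal_count / normal_total
    pooled = (cancer_count + normal_count) / (cancer_total + normal_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / cancer_total + 1 / normal_total))
    if se == 0.0:  # gene absent from both pools: nothing to compare
        return 100 * f_cancer, 100 * f_normal, 0.0, 1.0
    z = (f_cancer - f_normal) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return 100 * f_cancer, 100 * f_normal, z, p_value


def is_candidate(cancer_count, normal_count, min_pct=0.1, alpha=0.001):
    """Apply the frequency filter (assumed: either pool) and the p cutoff."""
    pct_c, pct_n, _, p = compare_pools(cancer_count, normal_count)
    return max(pct_c, pct_n) > min_pct and p < alpha


# Hypothetical gene: 500 ESTs in the cancer pool, 20 in the normal pool.
pct_c, pct_n, z, p = compare_pools(500, 20)
print(f"cancer {pct_c:.3f}%  normal {pct_n:.3f}%  z = {z:.2f}  p = {p:.2e}")
```

A positive z marks a gene as more frequent in the cancer pool and a negative z as more frequent in the normal pool; for genes with very low counts an exact test would be a more defensible choice than this normal approximation.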

Among the up-regulated genes in the cancer pool, clones

of GADD45B, PKM2, HSPCA, ANXA8, UBC, JUN, SDC1,

PGK1, PCK1, an EST (Hs.184727), EIF3S6IP, GPI, KLK6,


STIP1, GMPR, YWHAE, CD44, PPP1R16A, CD151, and

SKB1 had not been found previously in 8918 clones of the

normal pool (Table 1). Significant differences were observed

in gene groups associated with cell structure formation

(ACTB, K-ALPHA-1, and KRT8), the glycolysis pathway

(LDHA, GAPD, ENO1, PKM2, and PGK1), the heat shock

proteins (HSPA8, HSPCB, and HSPB1), and the signal

pathway (HMGIY, JUN, CD44, TLE1, and YWHAE). Al-

though the gene EEF1A1 was relatively abundant in the

normal pool, the frequency of this gene in the cancer pool

was significantly higher than in the normal pool.

In the case of down-regulated genes in the cancer pool, 19

genes (PAP, IMAGE 4296901, CA11, MGC4796, REG1A,

TFF1, an EST (Hs.16757), ALP, IGHM, GAS, CNN1,

MGC24130, RAC2, MSMB, DEFA5, HLA-DQA1, ARF4L,

SPINK1, and FLJ11068) had not been found previously in

the 53,791 clones of the cancer pool (Table 1), though some

of these genes were rare in the cancer pool. Significant

differences were observed in the groups of immunoglobulin

genes (SNC73, IGKC, IGHG3, IGLJ3, and IGHM), genes

involved in stomach digestion (PGC, LIPF, TFF1, TFF2,

GAS, MUC6, and CA11), genes involved in protein synthesis

(RPL3, RPS4, RPS8, RPL31, NACA, and RPL41), and MHC

class I genes (HLA-A and HLA-B).

To verify our data more quantitatively, we selected 14 up-regulated genes from the cancer pool (EEF1A1, ACTB, K-ALPHA-1, FTH1, KRT8, FKBP1A, GAPD, LDHA, PKM2,

PGK1, HSPCA, HSPA8, HSPCB, and HSPB1) and 4 com-

monly down-regulated genes from the cancer pool (SNC73,

IMAGE4296901, IGKC, and CA11) and performed semi-

quantitative RT-PCR. We found that all up-regulated genes

were highly expressed in most of the cancer cell lines or in

the single primary tumor, but were not expressed in normal

tissues, except one normal tissue (K402). On the other hand,

all down-regulated genes were detected at very low levels in

the primary tumor and in all cancer cell lines and had high

expression levels in normal tissues, as shown in Fig. 1. This

indicates that transcriptional activity by RT-PCR concurred

with the EST frequency data for all 18 genes.
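According to the legend of Fig. 1, the plotted mRNA levels were normalized against β2-MG and then expressed relative to the average of a reference group of samples. A minimal Python sketch of that two-step calculation follows; the function name, the sample labels, and all quantities are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def relative_expression(target, reference_gene, reference_samples):
    """Normalize target amounts to a reference gene, then scale so that
    the mean of the chosen reference samples equals 1.0."""
    normalized = {s: target[s] / reference_gene[s] for s in target}
    baseline = mean(normalized[s] for s in reference_samples)
    return {s: value / baseline for s, value in normalized.items()}


# Hypothetical amounts read from standard curves (arbitrary units).
target_amount = {"normal_1": 2.0, "normal_2": 4.0, "cancer_1": 30.0}
b2mg_amount = {"normal_1": 1.0, "normal_2": 2.0, "cancer_1": 3.0}

rel = relative_expression(target_amount, b2mg_amount,
                          reference_samples=("normal_1", "normal_2"))
print(rel)
```

For a gene that is up-regulated in cancer, the normal tissues would serve as the reference group, so values above 1.0 indicate higher expression than the normal average.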

Identification of the genes associated with gastric cancer

cell metastasis

We further analyzed EST frequencies between libraries of

cell lines established from the different tumor sites to identify

genes associated with metastasis. The gastric cancer cell

lines examined were divided into two groups based on the

original tumor sites: one group (“primary pool”) was of cells from primary tumors (SNU1, SNU484, SNU719, and SNU520) and the other group (“ascites pool”) was composed of malignant ascites cells (SNU5, SNU668, SNU16,

SNU620, SNU638, and SNU601) [15,16]. We then isolated

106 genes with frequencies over 0.1% on average from each

pool and which showed a significant difference (p < 0.001)
between the two pools, as shown in Table 5. The genes in the

ascites pool were found to contain 67 up-regulated genes

(versus the primary pool), which included GADD45B,

ANXA8, JUN, PCK1, and TLE1. In addition, we observed

39 down-regulated genes in the ascites pool. In this down-

regulated case, significant differences were found for LDHA,

HSPCA, PKM2, FKBP1A, KRT8, etc. We performed semi-

quantitative RT-PCR for 7 up-regulated genes from the

ascites pool (JUN, HMGIY, LMNA, ESRRA, PLK, CD44,

and IGFBP3) and 7 commonly down-regulated genes from

the ascites pool (TMSB4X, ATP5A1, PTMA, CALM2, NET1,

FKBP1A, and GAPD). Fig. 2 shows that up-regulated genes

in the ascites pool were more abundantly expressed than in

the primary pool in this analysis. In particular, IGFBP3 was differentially expressed in only two ascitic cancer cell lines, and overall expression levels varied somewhat from one cancer cell line to another. In the case of down-regulated genes, no dramatic difference in expression level was observed between the two pools, although expression was slightly lower in some cell lines of the ascites pool.
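The pooling used for this comparison can be sketched as follows. The cell-line membership of the two pools is taken from the text; the per-library EST totals and the gene counts are hypothetical, and the helper name is an illustration rather than part of the authors' pipeline.

```python
# Pool membership as listed in the text.
PRIMARY_POOL = ("SNU1", "SNU484", "SNU719", "SNU520")
ASCITES_POOL = ("SNU5", "SNU668", "SNU16", "SNU620", "SNU638", "SNU601")


def pool_frequency_pct(gene_counts, library_sizes, pool):
    """EST frequency (%) of one gene after summing counts over a pool."""
    copies = sum(gene_counts.get(lib, 0) for lib in pool)
    total = sum(library_sizes[lib] for lib in pool)
    return 100.0 * copies / total


# Hypothetical data: 5000 ESTs per library and one gene's EST counts.
library_sizes = {lib: 5000 for lib in PRIMARY_POOL + ASCITES_POOL}
gene_counts = {"SNU5": 30, "SNU16": 30, "SNU1": 2}

primary_pct = pool_frequency_pct(gene_counts, library_sizes, PRIMARY_POOL)
ascites_pct = pool_frequency_pct(gene_counts, library_sizes, ASCITES_POOL)
print(f"primary {primary_pct:.3f}%  ascites {ascites_pct:.3f}%")
```

Summing counts before dividing weights each library by its size, so a deeply sequenced library contributes more to the pool frequency than a shallow one; averaging per-library percentages would be a different design choice.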

Discussion

Identification of the complete set of human genes and the

isolation of disease-related genes are the next major tasks of

the human genome project. Large-scale EST sequencing in

different organs, tissues, or cells of the human body has been

used to elucidate gene functioning in human cells. In

particular, the analysis of ESTs generated by cDNA libraries

has been shown to provide an extensive and quantitative

measure of the transcriptional activity of expressed genes

[18,20]. Although wide coverage of human genes in the EST

database permits parallel gene expression monitoring based

on prior knowledge of gene sequences, the comprehensive

analysis of ESTs in human gastric cancers has not been

reported. Here we have undertaken the large-scale sequenc-

ing of cDNA libraries of human gastric cancer cell lines and

tissues as the first step in the investigation of genes associ-

ated with human gastric carcinogenesis. A preliminary

profile of the genes expressed in gastric cancer cells and

tissues was built based on the analysis of 95,255 ESTs. The

cluster number per total number of ESTs was estimated to be

about 20% (19,124/95,255) from transcript analysis using

the CAP3 program or about 11% (10,418/92,545) from the

gene analysis using UniGene Build 151. The complexity of

the collected genes, on average, was almost the same as that

of the Cap-trapper cDNA libraries collated by normalization

and subtraction [27]. Compared to the known genes, 52% of

our clusters turned out to have at least one full-length cDNA

clone with an initiation codon (data not shown).
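The two cluster-to-EST ratios quoted above can be reproduced directly from the reported counts. The short Python sketch below also derives the average number of ESTs per cluster, a figure that is not stated in the text and is included only as a convenient reading of the same numbers.

```python
def cluster_ratio_pct(clusters, ests):
    """Clusters per 100 ESTs; lower values mean more redundant sampling."""
    return 100.0 * clusters / ests


# Counts reported in the text.
cap3 = cluster_ratio_pct(19_124, 95_255)     # CAP3 transcript assembly
unigene = cluster_ratio_pct(10_418, 92_545)  # UniGene Build 151

print(f"CAP3: {cap3:.1f}% ({95_255 / 19_124:.2f} ESTs per cluster)")
print(f"UniGene: {unigene:.1f}% ({92_545 / 10_418:.2f} ESTs per cluster)")
```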

In this study, our initial goal was to explore quantitatively

the transcriptional activity of the expressed genes in gastric

cancer cells. When we compared EST frequencies for each

cDNA library from cancer cell lines and normal tissues, we

identified 194 abundant genes with a frequency of over 0.6%

in each cDNA library. Only a small portion of the abundant

genes (7 genes) commonly appeared in both gastric cancer

Table 4

Genes showing significant difference (P* > 0.999) in expression between normal and cancer pools


cells and normal tissues. Most of the abundant genes were

library specific: of 110 genes from cancer cells and 47 genes

from normal tissues, 22 genes of 110 and 8 genes of 47 were

commonly present within the libraries of cancer cells and

normal tissues, respectively. This result indicates that gene

functions are diversified in cells and may differ from cell to

cell and that cancer-specific genes or highly expressed genes

in cancer cells can be effectively identified. It has been

reported that genetic alterations in the original tumors are

sustained during the establishment of immortalized cancer

cells and that genetic differences between the cancer cell line

and the original tumor are no greater than those between the

different areas of the original tumor [28]. This presumption

was reconfirmed by our observation that the global expres-

sion profile of a primary gastric tumor (T665307) was very

similar to those of the gastric cancer cell lines (as shown in

Table 4). Therefore, the expression profiles of abundant

genes in different gastric cancer cells may be valuable for

the understanding of the diverse clinical properties of gastric

cancers, such as their different histological types, metastatic

statuses, and responsiveness to chemotherapy.

When the expression profiles of the cancer pool and the

normal pool were compared in terms of EST frequency,

significant differences were found in some of the gene

groups. That is, ACTB, K-ALPHA-1, and KRT8, which are

involved in cell structure formation; LDHA, GAPD, ENO1,

PKM2, and PGK1 in the glycolysis pathway; HSPA8,

HSPCB, and HSPB1 in heat shock proteins; and HMGIY,

JUN, CD44, TLE1, and YWHAE in the signal pathway were

highly represented in the cancer pool. The up-regulations of

some of these genes have been reported previously in other

cancers: GAPD in breast cancer [29] and renal cell carcinoma

[30], PYCR in prostate carcinoma [31], and KLK6 in ovarian

carcinoma [32]. In particular, it is intriguing that the genes

involved in the glycolysis pathway were up-regulated in the

cancer pool, because this result correlates well with the

observation that cancer cells rely on glycolytic metabolism,
even in the presence of sufficient oxygen, and produce high

levels of lactate and pyruvate. This phenomenon is known as

the Warburg effect [33]. LDHA has been described as a c-

Myc-responsive gene [34] and this has been supported by

previous findings that oncogenes such as ras, src, and myc

enhance aerobic glycolysis by increasing the expression of

glycolytic enzymes [35]. The up-regulation of these genes in

the cancer pool could be confirmed by data obtained by the

serial analysis of gene expression (SAGE) (http://cgap.nci.nih.gov/Tissues/). Therefore, the determined expressions of

GAPD, ENO1, PKM2, PGK1, and LDHA in this study

support the notion that cancer cells are selected in terms of

their acquired ability to perform aerobic glycolysis. It is likely

that the enhancement of the aerobic glycolysis metabolism is

correlated with the oncogenesis of human gastric cancer.

EEF1A1, which is known to be a cofactor of eukaryotic

protein synthesis, had the highest frequency in the cancer

pool in this study. The overexpression of this gene has also

been found in human and rat gastric cancers [36]. These

observations coincide well with the finding that increased

EEF1A1 expression is related to increased cell proliferation

[37] and oncogenic transformation [38]. SDC1, which is

known to be a heparan sulfate-bearing proteoglycan, was

also highly expressed in our cancer pool. Although this

gene has been reported to be down-regulated during ma-

lignancy in various human cancers, including hepatocellular

carcinoma [39], head and neck carcinoma [40], and colo-

rectal carcinoma [41], its expression was significantly

increased in Wnt-1-induced mammary tumorigenesis in

mice [42] and in adenocarcinoma of the pancreas [43].

Thus, it is likely that SDC1 expression in tumor cells may

depend on the developmental stage or type of tumor.

On the other hand, PGC, LIPF, TFF1, TFF2, GAS, and

MUC6, which are involved in digestion; RPL3, RPS4, RPS8,

RPL31, and RPL41, which are involved in protein synthesis;

SNC73, IGKC, IGHG3, and IGHM, which are immunoglob-

ulin genes; and HLA-A and HLA-B in MHC class I were

highly down-regulated in the cancer pool compared to the

normal pool. In particular, no copy of CA11, TFF1, IGHM,

or GAS was found in the cancer pool. CA11, which has an

unknown function but exhibits 98% homology with human

secreted frizzled-related protein (hsFRP), has been previous-

ly reported to be dramatically down-regulated in gastric

cancer [44,45]. Trefoil factors such as TFF1 and TFF2,

which are normally synthesized in the gastrointestinal mu-

cosa, also have been described to be down-regulated in

gastric cancers [45]. These trefoil factors are known to

promote cell migration and to protect and heal the mucosa

and are tumor-suppressor candidates [46]. PGC, which is

involved in the digestion of proteins in the stomach, has been

reported to be down-regulated in breast cancers [47], prostate

cancers [48], and hepatocellular carcinomas [49]. Semiquan-

titative RT-PCR analysis showed that our observations of the

up- or down-regulation of genes in the cancer pool corre-

spond well with the expression levels of mRNAs in the

cancer cells and normal tissues analyzed in the present

experiment. Therefore, these results imply that EST frequen-

cy analysis can efficiently identify genes expressed differ-

entially in human gastric cancer.

Advanced gastric cancer is often accompanied by metas-

tasis to the peritoneum, resulting in a high mortality rate.

Although it is known that adhesion molecules or apoptosis-

related genes may play a pivotal role in the metastasis of

gastric cancer [50,51], the mechanisms of gastric cancer

metastasis are not fully understood, because metastasis

involves multiple steps and requires a combination of altered

gene expressions. Thus, global analysis of the differential

gene expression of gastric cancer cell lines established from

primary tumors (primary pool) and from other cell lines

established from metastasis to the peritoneal cavity (ascites

pool) should provide information on the mechanism of

gastric cancer metastasis. Our examinations showed that in

total 106 genes were specifically up- or down-regulated in the

ascites pool. Of the up-regulated genes in the ascites pool,
ESRRA, which is a member of the steroid/thyroid/retinoid


receptor family, has been reported to be associated with

metastatic potential in human gastric cancers [52] and in

endometrial cancers [53]. KRT18 plays an important role in

tumor cell invasion by increasing cell flexibility and motility

[54]. CD44, which is a cell adhesion molecule and a signaling

receptor, has been described as a facilitator of intralymphatic

space invasion and lymph node metastasis [55]. The expres-

sion of CD44 affects not only tumor growth but also metas-

tasis in human colon carcinomas [56], mammary carcinomas

[57], and osteosarcomas [58]. SDC1, which is a cell surface

proteoglycan, has been suggested to be involved in facilitat-

ing the distant metastasis of tumor cells once they manage to

enter the bloodstream [59]. GADD45B, which constitutes a

family of evolutionarily conserved, small, acidic, nuclear

proteins, has been implicated in terminal differentiation,

growth suppression, and apoptosis [60]. GSN, which is an

actin-binding/severing protein, is a major regulator of the

formation and function of the actin cytoskeleton in most cells

[61]. Thus, it is reasonable to believe that these genes may be

related to the peritoneal dissemination of gastric cancers. The

results of this global gene expression analysis of gastric

cancer cells with peritoneal dissemination promise to provide new insight into the study of human gastric cancer

peritoneal dissemination.

Recently the application of high-density DNA micro-

array technology or SAGE has facilitated the determination

of the expression levels of thousands of genes in a single

experiment. This technology is a powerful tool for analyz-

ing genes whose expressions can be correlated with the

pathological phenotypes of different tumors. Several groups

have reported on the results of expression-profile analyses

in gastric cancers using these techniques [8–17]. Interestingly, a marked attenuation of trefoil factors, which were down-regulated in our cancer pool, has been described in previous microarray studies [11,13,15] and in a SAGE study [17]. Expression patterns of other down-regulated genes are in good concordance with the findings

of previous reports: LOC56287, IGHM, SURF4, REG1A,

and RNASE1 [11], PGC [11,13], and MUC6 and CLDN4

[13] from cDNA microarray studies and CA11, GAS,

SNC73, IGHG3, and IGHM [17] from the SAGE study.

On the other hand, overexpression of up-regulated genes in

our cancer pool agrees well with the cDNA microarray data

of RPL10 and HSPCB [11], GAPD and LDHA [14], and

HSPCA [14,15]. In addition, the up-regulation of CD44 in

our ascites pool is in agreement with the previous result

[12], which was performed to identify genes related to the

peritoneal dissemination of gastric cancer. These results

strongly indicate that our expression profiles based on EST analysis from cDNA libraries are compatible with expression profiling performed using high-density DNA microarray technology.

Fig. 1. The semiquantitative RT-PCR for the genes randomly selected from the panels of up- or down-regulated genes in the cancer pool. Total RNAs were extracted from normal tissues (S14, K402; S17, N258215; S18, N669761; S19, N665307) and gastric cell lines (S1, SNU5; S2, SNU668; S3, SNU16; S10, SNU1; S5, SNU484; S6, SNU620; S7, SNU719; S8, SNU638; S9, SNU601; S12, SNU216; S21, SNU520; S13, KMS5; S20, T665307). These RNAs were then used as templates for real-time RT-PCR on a LightCycler, according to the manufacturer’s instructions (for details, see Materials and methods). (a)–(n) Up-regulated genes in the cancer pool; (o)–(r) down-regulated genes in the cancer pool. The mRNA steady-state levels of the genes from (a) to (r) were calculated relative to the average amount of target gene in normal tissues for (a)–(n) or relative to that of cancer cells for (o)–(r), after normalization against β2-MG. The amount of β2-MG product in each cell line and tissue was calculated from a standard curve.

Our next goal is to collect more ESTs from gastric cancer cells and normal tissues and to set up a cDNA microarray system containing all gastric tissue-expressed genes. Recently we have collected about 21,000 unique ESTs in total (data not shown), and comprehensive cDNA microarray analysis using these clinical samples from several types of gastric cancers is under way. This further study should provide a valuable resource by revealing specific patterns that may reflect prognosis and the drug sensitivity of tumor cells and by identifying the genes involved in malignant transformation, progression, and the metastasis of tumors.

Materials and methods

Sample source and RNA preparation

Eleven different cell lines established from Korean stomach cancer patients, SNU1, SNU5, SNU16, SNU216, SNU484, SNU520, SNU601, SNU620, SNU638, SNU668, and SNU719 [25,26], were obtained from the Korean Cell Line Bank (http://cellbank.snu.ac.kr/). KMS5 was a cell line established from a primary gastric tumor classified as being poorly differentiated according to histological classification. All twelve cell lines were maintained at 37°C in humidified air containing 5% CO2 using RPMI 1640 medium (Gibco BRL, Gaithersburg, MD, USA) supplemented with 10% fetal bovine serum. Five normal stomach tissues (NK402, N181907, N258215, N665307, and N669761) and one tumor tissue (T665307) were obtained from the College of Medicine, ChungNam National University, Korea, with informed consent. The obtained tissues were immediately frozen in liquid nitrogen. Total RNAs were isolated from the cultured cells and tissues using a commercially available RNA isolation kit (Qiagen, Inc., Valencia, CA, USA). Poly(A)+ RNA was purified using the Oligotex mRNA Midi Kit (Qiagen).

Construction of full-length, subtracted, and normalized cDNA libraries

The full-length cDNA library was constructed by using a capping method [22] using a dT-tailed pTZ18RP1 or


Table 5

Genes showing significant difference (P* > 0.999) in expression between primary and ascitic cancer cells


Genes abundant (≥0.1%) in primary or ascitic cancer cells are listed. See Table 4 for the data manipulation.


Fig. 2. The semiquantitative RT-PCR for the genes randomly selected from the panels of up- or down-regulated genes in the ascitic pool. Total RNAs used were from primary cancer cell lines (S10, SNU1; S5, SNU484; S7, SNU719; S21, SNU520) and ascitic cancer cell lines (S1, SNU5; S2, SNU668; S3, SNU16; S6, SNU620; S8, SNU638; S9, SNU601). Real-time RT-PCR was performed by the same procedure as for Fig. 1. (a)–(g) Up-regulated genes in the ascitic pool; (h)–(n) down-regulated genes in the ascitic pool. The mRNA steady-state levels of the genes from (a) to (n) were calculated relative to the average amount of target gene in primary cancer cells for (a)–(g) or relative to that of ascitic cancer cells for (h)–(n), after normalization against β2-MG.


pCNS vector (GenBank Accession No. AF416744) primer. The universal cDNA library was prepared by using the capping method, excluding the bacterial alkaline phosphatase treatment step. A full-length cDNA library was also constructed by an oligo-capping method [23] using DraIII-digested pME18S-FL3 (GenBank Accession No. AB009864).

To isolate rarely expressed genes in each cDNA library,

previously sequenced highly expressed genes were select-

ed and used for preparing biotinylated RNA probe accord-

ing to a previously described method [62]. Single-stranded

DNAs (0.2 µg) prepared from these cDNA libraries were
hybridized with 1.5 µg of biotin-labeled RNA probes at
42°C for 8 h in hybridization solution containing 70%


formamide and 0.36 M potassium phosphate buffer (pH

7.6). After hybridization, highly expressed genes were

removed by affinity binding with Vectrex Avidin D

(Vector Laboratories, Inc., Burlingame, CA, USA). The

remaining single-stranded DNAs were concentrated by

ethanol treatment and then used to prepare the subtracted

cDNA library by electroporation into Escherichia coli

Top10FV. A normalized cDNA library was also constructed

to obtain genes rarely expressed by the previous method

[24].

DNA sequencing

Plasmid DNAs were extracted by using an MWG Plasmidprep 96 (MWG Biotech., Ebersberg, Germany). Sequencing reactions were performed on a GeneAmp PCR System 9700 thermal reactor (Perkin–Elmer, Inc., Boston, MA, USA) using a BigDye Terminator Sequencing kit with the appropriate sequencing primer for each cDNA library. After the unincorporated dye terminator was removed from the sequencing reaction, the reaction products were applied to an ABI Prism 3700 DNA analyzer (PE Applied Biosystems).

Semiquantitative RT-PCR

The reverse transcription reaction was performed in mixtures containing 5 μg of the isolated RNA, 2 pmol of oligo(dT)11, 1 μl of 10 mM dNTP, 4 μl of 5× buffer, 2 μl of 100 mM DTT, 1 μl of RNase (40 U/ml), and 1 μl of Superscript reverse transcriptase (200 U/ml; Gibco BRL) at 42°C for 1 h and diluted up to 500 μl. PCR was carried out in a 20-μl reaction mixture containing 2 μl of diluted RT reaction solution, 2 mM MgCl2, 0.5 μM each primer, and 2 μl of 10× SYBR Green I using a LightCycler (Roche Ltd., Basel, Switzerland). The PCR conditions were as recommended by the manufacturer except for the elongation condition in the amplification step: 72°C, 30 s. The PCR primers were designed from the coding region of each gene;

their product sizes ranged from 300 to 600 bp. The primer sequences used for more quantitative validation of the up- and down-regulated genes from the cancer pool were as follows: EEF1A1 forward primer, 5′-TGACCCACCAATGGAAGC-3′, and reverse, 5′-AGCCTTCTGAGCTTTCTGG-3′; ACTB forward primer, 5′-CTGGCACCCAGCACAATG-3′, and reverse, 5′-CACCTCCCCTGTGTGGAC-3′; K-ALPHA-1 forward primer, 5′-GGCTTCAAGGTTGGCATC-3′, and reverse, 5′-CCTCTCCTTCTTCCTCACC-3′; FTH1 forward primer, 5′-CATGACGACCGCGTCCA-3′, and reverse, 5′-TCACACAAATGGGGGTCA-3′; KRT8 forward primer, 5′-CTGCTGAGGGCGAGGA-3′, and reverse, 5′-CAGCGCAGGAGGGGTAG-3′; FKBP1A forward primer, 5′-GCTGCTGTCTTCCGTGGT-3′, and reverse, 5′-TGCCTTAGGCCTCCTCTT-3′; GAPD forward primer, 5′-CCAAGGCTGTGGGCAAG-3′, and reverse, 5′-ACCAGGAAATGAGCTTGACA-3′; LDHA forward primer, 5′-ATGGCAACTCTAAAGGATCAGC-3′, and reverse, 5′-GCAACTTGCAGTTCGGGC-3′; PKM2 forward primer, 5′-ATGTCGAAGCCCCATAGTGA-3′, and reverse, 5′-TCCACCTCTGCAGTGCC-3′; PGK1 forward primer, 5′-ATGTCGCTTTCTAACAAGCTGA-3′, and reverse, 5′-GCGGAGGTTCTCCAGCA-3′; HSPCA forward primer, 5′-ATGCCTGAGGAAACCCAGA-3′, and reverse, 5′-TCTGCACCAGCCTGCAA-3′; HSPA8 forward primer, 5′-CGTATGGTCCAGGAAGCTG-3′, and reverse, 5′-ATCAACCTCTTCAATGGTGGG-3′; HSPCB forward primer, 5′-GTGACCAGCACCTACGGC-3′, and reverse, 5′-ATCGACTTCTTCCATGCGAG-3′; HSPB1 forward primer, 5′-TCCCCTTCTCGCTCCTG-3′, and reverse, 5′-ATGTAGCCATGCTCGTCCT-3′; SNC73 forward primer, 5′-CAGGATGTGACTGTGCCC-3′, and reverse, 5′-GGCAGCAGTGCAAGTGAA-3′; IMAGE4296901 forward primer, 5′-CCCACCCTGGTAGATGAAC-3′, and reverse, 5′-GGTGTCAGAGATGCCTCCA-3′; IGKC forward primer, 5′-CCATTCACTTTCGGCCC-3′, and reverse, 5′-GGCTGGAACTGAGGAGCA-3′; CA11 forward primer,

5′-ATGCTTGCCTACTCCTCTGTC-3′, and reverse, 5′-GTTCTCCACCGTGTCTCCA-3′. The primer sequences used for more quantitative validation of the up- and down-regulated genes from the ascites pool were as follows: JUN forward primer, 5′-GGTATCCTGCCCAGTGTTGT-3′, and reverse, 5′-CTCCAGCCTCCTGAAACATC-3′; HMGIY forward primer, 5′-GCATCCCAGCCATCACTC-3′, and reverse, 5′-AGCGGAGCAAAGCTGTC-3′; LMNA forward primer, 5′-AAAAGCGCAAACTGGAGTC-3′, and reverse, 5′-CTCATCCTCGTCGTCCTCA-3′; ESRRA forward primer, 5′-GGCCTTCGCTGAGGACT-3′, and reverse, 5′-GTCCATCATGGCCTCGAG-3′; PLK forward primer, 5′-ACGGCACTGAGTCCTACCTC-3′, and reverse, 5′-GGAGGCCTTGAGACGGTT-3′; CD44 forward primer, 5′-GGCCAGCAAGTCTCAGGA-3′, and reverse, 5′-AGGCCTCCAAGTGGGAAC-3′; IGFBP3 forward primer, 5′-TCTGCGTCAACGCTAGTGC-3′, and reverse, 5′-CGCTTCCTGCCTTTGGA-3′; TMSB4X forward primer, 5′-CCTCCGCAACCATGTCT-3′, and reverse, 5′-GCACGCCTCATTACGATT-3′; ATP5A1 forward primer, 5′-ATGCTGTCCGTGCGCGTT-3′, and reverse, 5′-CCACAATGGCTCCTGTCC-3′; PTMA forward primer, 5′-ATGTCAGACGCAGCCGTAG-3′, and reverse, 5′-GTCATCCTCGTCGGTCTTCT-3′; CALM2 forward primer, 5′-ATGGCTGACCAACTGACTGAA-3′, and reverse, 5′-CTTTGCTGTCATCATTTGTACA-3′; NET1 forward primer, 5′-ATGGTGGCACATGATGAGACT-3′, and reverse, 5′-CCTGTTCACCTCGGGACA-3′; the primers for FKBP1A and GAPD were the same as those used to verify the genes in the cancer pool, as above. The mRNA steady-state level of each gene, which reflects a balance between transcription and degradation, was calculated relative to the average amount of the target gene in normal tissues for genes up-regulated in the cancer pool, or relative to that in cancer cells for genes down-regulated in the cancer pool. Likewise, the mRNA steady-state level of each up- or down-regulated gene in the ascites pool was calculated relative to that in the primary pool, after normalization against a β2-MG standard.
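The relative-level calculation described above can be illustrated with a minimal Python sketch. It assumes that per-sample amounts of the target gene and of β2-MG have already been estimated from the LightCycler runs; how those amounts were derived is not stated in the text. The function names and the numbers are hypothetical and serve only to show the arithmetic: each target amount is divided by the β2-MG amount of the same sample, and the result is expressed relative to the mean of a chosen baseline group (for example, normal tissues or primary cancer cell lines).

```python
from statistics import mean


def normalize(target, reference):
    """Divide each target-gene amount by the reference-gene amount of the same sample."""
    if len(target) != len(reference):
        raise ValueError("target and reference must have one value per sample")
    return [t / r for t, r in zip(target, reference)]


def relative_levels(target, reference, baseline_idx):
    """Express normalized levels relative to the mean of the baseline samples."""
    norm = normalize(target, reference)
    baseline = mean([norm[i] for i in baseline_idx])
    return [value / baseline for value in norm]


# Hypothetical amounts (arbitrary units): samples 0-1 form the baseline group,
# samples 2-3 form the comparison group.
target_amounts = [2.0, 4.0, 12.0, 18.0]
b2mg_amounts = [1.0, 2.0, 2.0, 3.0]

levels = relative_levels(target_amounts, b2mg_amounts, baseline_idx=[0, 1])
print(levels)
```

In this toy input the baseline samples come out at 1.0 and the comparison samples at 3.0, that is, a threefold higher steady-state level after normalization.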

Bioinformatic analysis of ESTs

Base-calling and quality assessment were performed with the PHRED program. Vector and linker sequences were located using the FASTA program and were subsequently removed. Low-quality bases defined by the PHRED program were trimmed from both ends of the EST sequences. ESTs of at least 100 bp after both vector and low-quality trimming were regarded as “high-quality” ESTs. Human repetitive elements and low complexity regions were masked using the RepeatMasker program. Those EST sequences having an identity of at least 90% with human mitochondrial DNAs or ribosomal RNAs over at least 90 bp were also excluded. The annotation of ESTs was carried out by stand-alone BLAST programs and locally installed databases. The individual ESTs were searched against the human mRNA subset extracted from the GenBank database and then against the UniGene database (Hs.seq.all, Build 151) for similarity comparisons using BLASTN. For protein similarity assessment, a comparison was performed against the nonredundant protein database using BLASTX. Contig assembly was performed with the EST sequences and quality values as input data by using the CAP3 program. The high-quality gastric ESTs identified were submitted to the GenBank database.
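The filtering criteria above can be sketched in a few lines of Python. This is an illustration only, not the pipeline that was actually run: the end-trimming rule is a simple stand-in for the PHRED-based trimming, and the quality threshold of 20 is an assumption, since no threshold is given in the text. The 100-bp minimum length and the 90% identity over at least 90 bp criterion are taken from the description; the function names are hypothetical.

```python
def trim_low_quality(seq, quals, min_q=20):
    """Strip bases with quality below min_q from both ends (min_q is an assumed value)."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def is_high_quality(seq, quals, min_len=100, min_q=20):
    """An EST counts as high-quality if at least min_len bases remain after trimming."""
    trimmed, _ = trim_low_quality(seq, quals, min_q)
    return len(trimmed) >= min_len


def is_mito_or_rrna(identity_pct, aligned_len, min_identity=90.0, min_aligned=90):
    """Flag an EST whose best hit to mitochondrial DNA or rRNA meets both cut-offs."""
    return identity_pct >= min_identity and aligned_len >= min_aligned


# Hypothetical read: 5 poor bases, 120 good bases, 5 poor bases.
read = "A" * 5 + "C" * 120 + "G" * 5
quality = [5] * 5 + [30] * 120 + [5] * 5
print(is_high_quality(read, quality))
```

Vector and linker removal and repeat masking are not modeled here; in practice they would be applied before the length check.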

Gene expression analysis

The frequency for each gene was analyzed by dividing the number of ESTs of a gene by the number of total clones merged into the UniGene database Build 151 in each library. Libraries with more than 1000 EST sequences were subjected to gene expression analysis. Subtracted or normalized libraries were excluded from the analysis. The genes abundantly expressed in each cDNA library were selected and listed from among ESTs showing an expression frequency of ≥0.6%. Significant differences in gene expression between the data sets were calculated by the method described previously [63]. Analysis of expression differences between the normal pool and the cancer pool was performed at a cut-off probability of 0.001. The same analysis was also performed for the primary pool and the ascites pool. The gene list was sorted according to gene frequency in the pool overexpressing the gene. The classification of known genes was performed using the Gene Ontology (GO) database (http://www.geneontology.org/#godatabase). Each UniGene cluster was mapped to LocusLink (as of June 13, 2002) and then assigned GO IDs as in the LocusLink database. The GO database used for functional classification was “process.ontology” for biological process version 2.427 (as of June 2002).
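As an illustration of the frequency calculation and of the significance test of Audic and Claverie [63], the following Python sketch computes the EST frequency of a gene and the conditional probability p(y | x) of observing y ESTs in a library of N2 clones given x ESTs in a library of N1 clones, under the null hypothesis of equal expression. The formula is the published one, p(y | x) = (N2/N1)^y (x + y)! / [x! y! (1 + N2/N1)^(x + y + 1)]. How that probability was converted into a significance call at the 0.001 cut-off is not detailed in the text, so the one-sided tail sum used here is an assumption, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def est_frequency(gene_ests, library_ests):
    """Percentage of the ESTs in a library that belong to one gene."""
    return 100.0 * gene_ests / library_ests


def ac_prob(x, y, n1, n2):
    """Audic-Claverie p(y | x) for library sizes n1 and n2, computed in log space."""
    r = n2 / n1
    log_p = (y * log(r)
             + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
             - (x + y + 1) * log(1.0 + r))
    return exp(log_p)


def ac_pvalue(x, y, n1, n2):
    """Smaller one-sided tail probability of y given x (assumed decision rule)."""
    lower = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))   # P(Y <= y)
    upper = max(0.0, 1.0 - lower) + ac_prob(x, y, n1, n2)      # P(Y >= y)
    return min(lower, upper, 1.0)


# Hypothetical counts: 2 ESTs in one pool and 40 in another, 5000 clones each.
p = ac_pvalue(2, 40, 5000, 5000)
print(est_frequency(40, 5000), p, p < 0.001)
```

With these toy counts the gene would be called differentially expressed at the 0.001 cut-off, whereas equal counts in equally sized pools would not.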

Acknowledgments

This work was supported by Grants FG-5-1-01 and FG-5-4 of the 21C Frontier Functional Human Genome Project from the Ministry of Science and Technology of Korea. We thank Dr. Sumio Sugano (Institute of Medical Science, University of Tokyo) for technical support in the construction of the full-length cDNA library and Dr. Marcelo Bento Soares (Iowa University) for technical support during the construction of the normalized cDNA library. We also thank Ms. Hee-Young Ahn, Ms. Mi-Rang Kim, Ms. Mi-Young Chu, and Ms. Wan Song for their excellent technical support in EST sequencing.

References

[1] A.I. Neugut, M. Hayek, G. Howe, Epidemiology of gastric cancer,

Semin. Oncol. 23 (1996) 281–291.

[2] D.H. Roukos, Current status and future perspectives in gastric cancer

management, Cancer Treat. Rev. 26 (2000) 243–255.

[3] H. Yokozaki, W. Yasui, E. Tahara, Genetic and epigenetic changes in

stomach cancer, Int. Rev. Cytol. 204 (2001) 49–95.

[4] W.S. Park, et al., Frequent somatic mutations of the β-catenin gene in intestinal-type gastric cancer, Cancer Res. 59 (1999) 4257–4260.

[5] G. Berx, K.F. Becker, H. Hofler, F. van Roy, Mutation of the human

E-cadherin (CDH1) gene, Hum. Mutat. 12 (1998) 226–237.

[6] W.S. Park, et al., Somatic mutations of the trefoil factor family 1 gene

in gastric cancer, Gastroenterology 119 (2000) 691–698.

[7] J.H. Lee, et al., A novel germ line juxtamembrane Met mutation in

human gastric cancer, Oncogene 19 (2000) 4947–4953.

[8] Y. Hippo, et al., Global gene expression analysis of gastric cancer by

oligonucleotide microarrays, Cancer Res. 62 (2002) 233–240.

[9] S. Lee, et al., Identification of genes differentially expressed between

gastric cancers and normal gastric mucosa with cDNA microarrays,

Cancer Lett. 184 (2002) 197–206.

[10] J. Ji, et al., Comprehensive analysis of the gene expression profiles in

human gastric cancer cell lines, Oncogene 21 (2002) 6549–6556.

[11] S. Hasegawa, et al., Genome-wide analysis of gene expression

in intestinal-type gastric cancers using a complementary DNA

microarray representing 23,040 genes, Cancer Res. 62 (2002)

7012–7017.

[12] C. Sakakura, et al., Differential gene expression profiles of gastric

cancer cells established from primary tumour and malignant ascites,

Br. J. Cancer 87 (2002) 1153–1161.

[13] X. Chen, et al., Variation in gene expression patterns in human gastric

cancers, Mol. Biol. Cell 14 (2003) 3208–3215.

[14] S.I. Meireles, et al., Differentially expressed genes in gastric tumors

identified by cDNA array, Cancer Lett. 190 (2003) 199–211.

[15] A. Boussioutas, et al., Distinctive patterns of gene expression in premalignant gastric mucosa and gastric cancer, Cancer Res. 63 (2003) 2569–2577.

[16] S.T. Tay, et al., A combined comparative genomic hybridization and expression microarray analysis of gastric cancer reveals novel molecular subtypes, Cancer Res. 63 (2003) 3309–3316.

[17] K.A. Oien, J.K. Vass, I. Downie, G. Fullarton, W.N. Keith, Profiling,

comparison and validation of gene expression in gastric carcinoma

and normal stomach, Oncogene 22 (2003) 4287–4300.

[18] M.D. Adams, et al., Complementary DNA sequencing: expressed

sequence tag and human genome project, Science 252 (1991)

1651–1656.

[19] M. Marra, et al., An encyclopedia of mouse genes, Nat. Genet. 21

(1999) 191–194.

[20] K. Okubo, et al., Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression, Nat. Genet. 2 (1992) 173–179.

[21] M. Mao, et al., Identification of genes expressed in human CD34(+)

hematopoietic stem/progenitor cells by expressed sequence tags and

efficient full-length cDNA cloning, Proc. Natl. Acad. Sci. USA 95

(1998) 8175–8180.

[22] S. Kato, et al., Construction of a human full-length cDNA bank, Gene

150 (1994) 243–250.

[23] K. Maruyama, S. Sugano, Oligo-capping: a simple method to replace

the cap structure of eukaryotic mRNAs with oligonucleotide, Gene

150 (1994) 171–174.

[24] M. Soares, et al., Construction and characterization of a normalized

cDNA library, Proc. Natl. Acad. Sci. USA 91 (1994) 9228–9232.

[25] J.G. Park, et al., Characteristics of cell lines established from human

gastric carcinoma, Cancer Res. 50 (1990) 2773–2780.

[26] J.G. Park, et al., Establishment and characterization of human gastric

carcinoma cell lines, Int. J. Cancer 70 (1997) 443–449.

[27] P. Carninci, et al., Normalization and subtraction of Cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes, Genome Res. 10 (2000) 1617–1630.

[28] S.I. Bae, J.G. Park, Y.I. Kim, W.H. Kim, Genetic alteration in gastric

cancer cell lines and their original tissues, Int. J. Cancer 87 (2000)

512–516.

[29] F. Revillion, V. Pawlowski, L. Hornez, J.P. Peyrat, Glyceraldehyde-3-

phosphate dehydrogenase gene expression in human breast cancer,

Eur. J. Cancer 36 (2000) 1038–1042.

[30] M.R. Vila, A. Nicolas, J. Morote, I. de Torres, A. Meseguer, Increased glyceraldehyde-3-phosphate dehydrogenase gene expression in renal cell carcinoma identified by RNA-based, arbitrarily primed polymerase chain reaction, Cancer 89 (2000) 152–164.

[31] T. Ernst, et al., Decrease and gain of gene expression are equally

discriminatory markers for prostate carcinoma: a gene expression

analysis on total and microdissected prostate tissue, Am. J. Pathol.

160 (2002) 2169–2180.

[32] E.P. Diamandis, G.M. Yousef, A.R. Soosaipillai, P. Bunting, Human

kallikrein 6 (zyme/protease M/neurosin): a new serum biomarker of

ovarian carcinoma, Clin. Biochem. 33 (2000) 579–583.

[33] E. Racker, M. Spector, Warburg effect revisited: merger of biochemistry and molecular biology, Science 213 (1981) 303–307.

[34] M. Hubank, D.G. Schatz, Identifying differences in mRNA expression by representational difference analysis of cDNA, Nucleic Acids Res. 22 (1994) 5640–5648.

[35] C.V. Dang, G.L. Semenza, Oncogenic alterations of metabolism,

Trends Biochem. Sci. 24 (1999) 68–72.

[36] M. Tanaka, H. Sasaki, I. Kino, T. Sugimura, M. Terada, Genes preferentially expressed in embryo stomach are predominantly expressed in gastric cancer, Cancer Res. 52 (1992) 3372–3377.

[37] J. Sanders, M. Brandsma, G.M.C. Janssen, J. Dijk, W. Moller, Immunofluorescence studies of human fibroblasts demonstrate the presence of the complex of elongation factor-1βγδ in the endoplasmic reticulum, J. Cell Sci. 109 (1996) 1113–1117.

[38] K. Chi, D.V. Jones, M.L. Frazier, Expression of an elongation factor 1

related sequence in adenocarcinomas of the colon, Gastroenterology

103 (1992) 98–102.

[39] A. Matsumoto, et al., Reduced expression of syndecan-1 in human

hepatocellular carcinoma with high metastatic potential, Int. J. Cancer

74 (1997) 482–491.

[40] P. Inki, H. Joensuu, R. Grenman, P. Klemi, M. Jalkanen, Association between syndecan-1 expression and clinical outcome in squamous cell carcinoma of the head and neck, Br. J. Cancer 70 (1994) 319–323.

[41] R.M. Day, et al., Changes in the expression of syndecan-1 in the

colorectal adenoma-carcinoma sequences, Virchows Arch. 434

(1999) 121–125.

[42] C.M. Alexander, et al., Syndecan-1 is required for Wnt-1-induced

mammary tumorigenesis in mice, Nat. Genet. 25 (2000) 329–332.

[43] J.R. Conejo, et al., Syndecan-1 expression is up-regulated in pancreatic but not in other gastrointestinal cancers, Int. J. Cancer 88 (2000) 12–20.

[44] Y. Yoshikawa, H. Mukai, F. Hino, K. Asada, I. Kato, Isolation of two

novel genes, down-regulated in gastric cancer, Jpn. J. Cancer Res. 91

(2000) 459–463.

[45] K. Shiozaki, et al., Human stomach-specific gene, CA11, is down-regulated in gastric cancer, Int. J. Oncol. 19 (2001) 701–707.

[46] O. Lefebvre, et al., Gastric mucosa abnormalities and tumorigenesis

in mice lacking the pS2 trefoil protein, Science 274 (1996) 259–262.

[47] I. Diez-Itza, et al., Expression of pepsinogen C in human breast tumors and correlation with clinicopathologic parameters, Br. J. Cancer 68 (1993) 637–640.

[48] N. Konishi, et al., Expression of pepsinogen II with androgen and

estrogen receptors in human prostate carcinoma, Pathol. Int. 49

(1999) 203–207.

[49] Y. Miyasaka, et al., Analysis of differentially expressed genes in human hepatocellular carcinoma using suppression subtractive hybridization, Br. J. Cancer 85 (2001) 228–234.

[50] G. Christofori, H. Semb, The role of the cell-adhesion molecule E-cadherin as a tumour-suppressor gene, Trends Biochem. Sci. 24 (1999) 73–76.

[51] A. Yawata, et al., Prolonged cell survival enhances peritoneal dissemination of gastric cancer cells, Oncogene 16 (1998) 2681–2686.

[52] N. Takano, et al., Expression of estrogen receptor-alpha and -beta

mRNAs in human gastric cancer, Cancer Lett. 176 (2002) 129–135.

[53] J. Fujimoto, et al., Steroid receptors and metastatic potential in endometrial cancers, J. Steroid Biochem. Mol. Biol. 75 (2000) 209–212.

[54] H.E. Schaafsma, et al., Increased expression of cytokeratins 8, 18 and vimentin in the invasion front of mucosal squamous cell carcinoma, J. Pathol. 170 (1993) 77–86.

[55] D. Naor, R.V. Sionov, D. Ish-Shalom, CD44 structure, function and

association with the malignant process, Adv. Cancer Res. 71 (1997)

241–319.

[56] K. Takahashi, I. Stamenkovic, M. Cutler, H. Saya, K.K. Tanabe, CD44 hyaluronate binding influences growth kinetics and tumorigenicity of human colon carcinomas, Oncogene 11 (1995) 2223–2232.

[57] Q. Yu, B.P. Toole, I. Stamenkovic, Induction of apoptosis of metastatic mammary carcinoma cells in vivo by disruption of tumor cell surface CD44 function, J. Exp. Med. 186 (1997) 1985–1996.

[58] G.F. Weber, et al., Absence of the CD44 gene prevents sarcoma

metastasis, Cancer Res. 62 (2002) 2281–2286.

[59] K. Hirabayashi, et al., Altered proliferative and metastatic potential

associated with increased expression of syndecan-1, Tumour Biol. 19

(1998) 454–463.

[60] M. Vairapandi, A.G. Balliet, B. Hoffman, D.A. Liebermann,

GADD45b and GADD45g are cdc2/cyclinB1 kinase inhibitors with

a role in S and G2/M cell cycle checkpoints induced by genotoxic

stress, J. Cell. Physiol. 192 (2002) 327–338.

[61] M.R. Crowley, K.L. Head, D.J. Kwiatkowski, H.L. Asch, B.B. Asch, The mouse mammary gland requires the actin-binding protein gelsolin for proper ductal morphogenesis, Dev. Biol. 15 (2000) 407–423.

[62] N.S. Kim, S. Kato, In-tube cDNA cloning method using a biotinylated RNA probe, Anal. Biochem. 217 (1994) 20–24.

[63] S. Audic, J.M. Claverie, The significance of digital gene expression

profiles, Genome Res. 7 (1997) 986–995.