[CANCER RESEARCH 63, 3309–3316, June 15, 2003]

A Combined Comparative Genomic Hybridization and Expression Microarray Analysis of Gastric Cancer Reveals Novel Molecular Subtypes¹,²

Su Ting Tay,³ Siew Hong Leong,³ Kun Yu, Amit Aggarwal, Soo Yong Tan, Chee How Lee, Keith Wong, Jaya Visvanathan, Dennis Lim, Wai Keong Wong, Khee Chee Soo, Oi Lian Kon, and Patrick Tan⁴

Cellular and Molecular Research [S. T. T., K. Y., A. A., C. H. L., K. W., P. T.], Division of Medical Sciences [S. H. L., J. V., D. L., K. C. S., O. L. K.], and Defence Medical Research Institute [P. T.], National Cancer Centre, Singapore 169610, Republic of Singapore; Department of Pathology and Laboratory Medicine, Tan Tock Seng Hospital, Singapore 308433, Republic of Singapore [S. Y. T.]; and Department of General Surgery, Singapore General Hospital, Singapore, Republic of Singapore [W. K. W.]

ABSTRACT

Comparative genomic hybridization (CGH), microsatellite instability (MSI) assays, and expression microarrays were used to molecularly subclassify a common set of gastric tumor samples. We identified a number of novel genomic aberrations associated with gastric cancer and discovered that gastric tumors could be grouped by their expression profiles into three broad classes: “tumorigenic,” “reactive,” and “gastric-like.” Patients with gastric-like tumors exhibited a significantly better overall survival than patients belonging to the other two classes (P < 0.05). A novel supervised learning methodology for multiclass prediction was used to identify optimal predictor gene sets that accurately predicted the class of an unknown tumor sample. These predictor sets may prove useful in the development of new diagnostic applications for gastric cancer staging and prognostication.

INTRODUCTION

Gastric adenocarcinoma is a leading cause of cancer mortality worldwide, surpassed only by lung and breast cancer (1). A major difficulty in the diagnosis and treatment of gastric cancer is that very few of the currently used classification schemes are strong predictors of clinical behavior. Traditional classifications of gastric cancer on the basis of mucin content, histological architecture, and cellular differentiation are highly subject to interobserver variation and are, thus, neither robust nor clinically meaningful (2). To date, only tumor staging is a proven prognosticator of gastric cancer and, therapeutically, only surgery has been shown to convey a survival benefit (3).

Recently, it has been shown that the resolving power of classification schemes based on molecular data can be sufficiently sensitive to detect new disease subtypes that have hitherto eluded traditional light microscopy approaches (4). In this study, we used various molecular assays such as CGH,⁵ MSI studies, and expression microarrays to characterize a common set of gastric tumors. We identified several novel genomic aberrations associated with gastric cancer and discovered that gastric cancers could be divided into three broad molecular subgroups (“tumorigenic,” “reactive,” and “gastric-like”) on the basis of their expression profiles. Patients belonging to one of these subgroups (gastric-like) exhibited a significantly better overall survival than patients belonging to the other groups. Using a recently described novel methodology for multiclass prediction, we defined various optimal predictor gene sets capable of accurately predicting the class of an unknown tumor sample. Our results show that molecular data can provide a useful framework for furthering our understanding of the taxonomy and pathology of gastric cancer.

MATERIALS AND METHODS

Tissue Samples and Histological Review. Gastric tissue specimens, peripheral blood samples, and clinical records were provided by the National Cancer Center Tissue Repository after approval by the Center’s Ethics Committee. Three surgical samples of normal gastric tissue were also obtained from patients with benign gastric disease. Paraffin sections of the 60 gastric cancer cases in this study were independently reviewed and classified by a single pathologist (S. Y. T.) using established criteria (5).

CGH. CGH was performed as described elsewhere (6). Of the 13 tumors with low CNA, all 13 (100%) had ≥50% tumor content. Of the 16 tumors with no CNA, 8 (50%) had ≥60% tumor content, and the remaining 50% had ≥50% tumor content.

Microsatellite Analysis of Tumors. Multiplex PCR was performed at five markers (Bat25, Bat26, D5S346, D2S123, D17S250) on tumor DNA and case-matched normal genomic DNA from peripheral blood or histologically verified normal gastric tissues of 59 patients. Microsatellite stability (MSS), MSI-H, and MSI-L were scored by consensus criteria (7).

Generation of Expression Profiles. cDNA microarrays of approximately 13K and 18K array targets were produced using established procedures (8) with cDNA clones from commercial vendors (Incyte and Research Genetics). Identities of array targets were confirmed by resequencing the parent clones. Expression profiles were generated from tumor specimens containing a minimum of 50% cancer cells as assessed by cryosections. Total RNA was extracted from homogenized gastric tissue and 5 µg were amplified using a single-round T7-polymerase-based linear amplification protocol (9). Each microarray hybridization used 1–2 µg aRNA (amplified RNA) and was compared with a common reference RNA pool (Universal Reference RNA; Stratagene).

Microarray Data Analysis. Microarray data sets are downloadable.⁶ An initial data set was created from array targets that were well measured across 90% of all of the arrays and normalized by median centering each sample (array) and array target (gene). A truncated data set (764 array targets) was then formed by selecting array targets exhibiting a minimal SD of 0.7 across all of the samples. Minor variations on the gene selection filter (e.g., using an SD of 0.6–0.8) did not significantly affect results of the clustering analysis (data not shown).⁷ “Semisupervised” clustering was performed using seven distinct expression clusters whose boundaries were visually determined using Treeview software. Supervised clustering was performed using the following algorithms: OVA SVM (10), nearest neighbor correlation analysis (NNCA), and GA/MLHD (Ref. 11; see Supplementary Information² for details). Accuracy of the supervised classification methodologies was assessed using LVO CV.
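The filtering and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the use of None for a poorly measured spot, the single column-then-row centering pass, and the choice of the sample SD are all hypothetical, and the values are assumed to be log expression ratios.

```python
from statistics import median, stdev

def well_measured(rows, min_fraction=0.9):
    """Keep array targets measured (not None) on at least min_fraction of arrays."""
    return [r for r in rows
            if sum(v is not None for v in r) >= min_fraction * len(r)]

def median_center(rows):
    """Median-center each array (column), then each array target (row)."""
    n_arrays = len(rows[0])
    col_medians = [median(r[j] for r in rows if r[j] is not None)
                   for j in range(n_arrays)]
    by_array = [[None if v is None else v - col_medians[j]
                 for j, v in enumerate(r)] for r in rows]
    centered = []
    for r in by_array:
        row_median = median(v for v in r if v is not None)
        centered.append([None if v is None else v - row_median for v in r])
    return centered

def variable_targets(rows, min_sd=0.7):
    """Keep array targets whose sample SD across arrays is at least min_sd."""
    return [r for r in rows
            if stdev([v for v in r if v is not None]) >= min_sd]
```

Calling the three functions in sequence (filter, center, then select by SD) mirrors the order given in the text. The order of the two centering passes and whether they were iterated is not stated in the source, so the sketch applies each once.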

Survival Curves. Kaplan-Meier survival curves were generated using SPSS software. To maximize sample size, clinical data were used from patients whose biological samples are in Fig. 3, as well as from four additional patients whose samples could be reliably assigned to a specific tumor class (one tumorigenic, one reactive, and two gastric-like) using two independent classification methodologies [OVA SVM and nearest neighbor correlation analysis (NNCA); see Supplementary Information²].
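For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation can be sketched as follows. The study itself used SPSS; the function name and the input convention (an event flag of 1 for an observed death and 0 for a censored patient) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed death.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        leaving = sum(1 for tt in times if tt == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve
```

Censored patients lower the number at risk without producing a step in the curve, which is why small groups with many censored observations yield wide uncertainty.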

Received 9/25/02; accepted 4/11/03.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ Supported in part by National Medical Research Council, National Cancer Centre, and the Lee Foundation.

² Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org).

³ S. T. T. and S. H. L. contributed equally to this work.

⁴ To whom requests for reprints should be addressed, at Defence Medical Research Institute, National Cancer Centre, 11 Hospital Drive, Singapore 169610, Republic of Singapore. E-mail: [email protected].

⁵ The abbreviations used are: CGH, comparative genomic hybridization; CNA, copy number abnormality; MSI, microsatellite instability; HC, hierarchical clustering; SVM, support vector machine; LVO, leave-one-out; CV, cross-validation; OVA, one-versus-all; MSI-H, high MSI; MSI-L, low MSI; GA/MLHD, genetic algorithm/maximum likelihood discriminant analysis.

⁶ The entire expression data set is available at www.omniarray.com/gastric_cancer.html.

⁷ P. Tan, unpublished observations.

© 2003 American Association for Cancer Research. Downloaded from cancerres.aacrjournals.org on January 2, 2015.

RESULTS

Identification of Novel Genomic Aberrations in Gastric Cancer by CGH

We first classified the 60 gastric tumors in our study by conventional clinical and histopathological criteria (Supplementary Information²) and confirmed that the demographic, anatomical, and histopathological features of the tumors (all adenocarcinomas) in this series were comparable with other published studies and representative of gastric cancers in general. Using CGH, we then determined that 16 (26.6%) of the 60 tumors had no CNAs. The remaining 44 aneuploid tumors were then stratified into high-, intermediate-, and low-frequency CNAs using criteria of ≥10, 5–9, and 1–4 chromosomal gains and/or losses per tumor, respectively, with tumors showing gains or losses of an entire chromosome being scored as having two separate abnormalities (of the p and q arms). On the basis of these criteria, 16 tumors (26.6%) had high CNAs, 15 (25%) had intermediate CNAs, and 13 (21.2%) had low CNAs. Similar to other smaller series, the mean genomic copy number change was 8.8 (range, 1–29) with gains (mean, 6.4/tumor; range, 1–27) exceeding losses (mean, 2.3/tumor; range, 1–10; Ref. 12).
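The stratification rule above is simple enough to state as code. The sketch below is illustrative only; the function names and the group labels are hypothetical, and the counting convention (a whole-chromosome gain or loss scored as two abnormalities, one per arm) follows the description in the text.

```python
def count_cnas(regional_changes, whole_chromosome_changes):
    """Total CNAs per tumor; a whole-chromosome change counts twice (p and q arms)."""
    return regional_changes + 2 * whole_chromosome_changes

def cna_group(n_cnas):
    """Assign a tumor to a CNA frequency group from its total CNA count."""
    if n_cnas < 0:
        raise ValueError("CNA count cannot be negative")
    if n_cnas == 0:
        return "none"
    if n_cnas <= 4:
        return "low"
    if n_cnas <= 9:
        return "intermediate"
    return "high"
```

For example, a tumor with three regional changes and one whole-chromosome gain has five CNAs under this convention and would fall in the intermediate group.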

In addition to previously reported gains in 20q, 8q, 7p, 13q, 20p, and 17q (Ref. 12; Fig. 1; Supplementary Information²), several tumors also exhibited novel chromosomal amplifications in 11p, 12p, 14q, 22q, 10q, 17p, 4p, 10p, 16q, 19p, and 4q. Strikingly, >13% of tumors exhibited a gain of 16q. High-level amplifications were also identified in 20q11.2-q13 (six cases), 6p21.1-p21.3, 16p12 and 19q12-q13 (four cases each), 8q24.1-q24.2, 11p13-p14, 12p11.2-q12, 12q14, and 17q12-q21 (three cases each). Although deletions on 18q, 4q, 5q, 17p, and 9p have been reported by others, we also observed deletions in 8p, 10q, 11q, and 1q. Chromosomal imbalances in 2p, 4p, and 5p occurred only in the intestinal-type tumors but were absent in all of the diffuse-type tumors in this series.

Classification of Tumors by Microsatellite Analysis

A high percentage of tumors (48%) in our series exhibited no or low CNAs. Because it has been shown in colon cancer that many no-CNA or low-CNA tumors often exhibit microlevel genomic instability resulting from defects in DNA mismatch repair components (e.g., MSH2 and MLH1; Ref. 13), we then performed MSI studies on the gastric tumors. Seven (12%) of 58 adenocarcinomas in this series were MSI-H, among which six had relatively few (≤7; n = 3) or no copy number changes (n = 3) by CGH. Five tumors were MSI-L. Immunostaining for MSH2 and MLH1 showed that very few of the gastric tumors had lost expression of either protein (S. Y. T., data not shown). Signet ring tumors were more likely to be MSI-H (3 of 15) than tubular tumors (4 of 39; P = 0.295 by Fisher’s exact test), and expansive gastric adenocarcinomas were more frequently MSI-H (2 of 10) than infiltrating tumors (5 of 48; P = 0.347).
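The two Fisher's exact tests quoted above can be recomputed directly from the counts in the text. The sketch below uses only the Python standard library; the function name is hypothetical. The source does not say whether the tests were one- or two-sided, so the one-sided (upper-tail) form is an assumption; with these counts it gives approximately 0.295 and 0.347, matching the reported values.

```python
from math import comb

def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the count in the top-left cell under the
    hypergeometric null distribution with all margins fixed.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of positives (here, MSI-H tumors)
    n = a + b + c + d     # all tumors in the comparison
    upper = min(row1, col1)
    favorable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                    for k in range(a, upper + 1))
    return favorable / comb(n, col1)

# Signet ring (3 MSI-H of 15) versus tubular (4 MSI-H of 39).
p_histology = fisher_upper_tail(3, 12, 4, 35)
# Expansive (2 MSI-H of 10) versus infiltrating (5 MSI-H of 48).
p_growth = fisher_upper_tail(2, 8, 5, 43)
```

Neither comparison approaches conventional significance, which is consistent with the small number of MSI-H tumors in the series.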

Identification of Biological Expression Signatures Using Unsupervised Clustering

Fig. 1. Chromosomal gains and losses in gastric cancer specimens. Gains are shown as green lines to the right of chromosomes, and losses as red lines on the left. Thick solid lines, highly amplified regions.

We then used cDNA microarrays to generate expression profiles for the gastric tumors, focusing initially on samples containing ≥50% tumor cells as determined by cryosections (47 tumors), and also profiling 3 surgical samples of normal gastric mucosae obtained from patients with benign gastric disease. Using various data filters, we defined a set of 746 array targets representing well-measured genes that exhibited considerable transcriptional variation across all of the gastric samples (see “Materials and Methods”). A two-way unsupervised HC algorithm was then used to order the gastric samples and genes on the basis of their similarity to one another (Fig. 2). The gastric tumors segregated into three broad subclasses (discussed in the next section), and the three normal gastric samples exhibited tight cosegregation, indicating that their expression profiles are highly correlated to one another.

Fig. 2. Unsupervised clustering of gastric cancer expression profiles. Two-way HC was used to order samples (columns) and array targets (rows). Samples include 47 tumor specimens and 3 samples of normal gastric tissue (dark blue bar under dendrogram). The relative expression level of an array target in each sample (compared with all other samples) is depicted according to the color scale bar (top right). All of the tumor specimens contained ≥50% tumor cells as assayed by cryosections.

The unsupervised clustering algorithm successfully grouped the majority of array targets/genes into distinct “expression signatures” based on their relative expression levels in the gastric samples (color bars, left of clustergram in Fig. 2). We identified at least seven specific signatures, each associated with a distinct biological process. In this report, only a few members of each signature are mentioned.⁶
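The hierarchical clustering used to order samples and genes can be illustrated with a small agglomerative routine. This is a toy sketch, not the software used in the study: the source does not state the distance metric or linkage rule, so the Pearson correlation distance and average linkage used here are assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order performed.

    Each cluster is a tuple of profile indices. At every step the two clusters
    with the smallest mean pairwise distance (1 - Pearson r) are merged.
    """
    def distance(i, j):
        return 1 - pearson(profiles[i], profiles[j])

    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: mean(distance(i, j)
                                         for i in pair[0] for j in pair[1]))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges
```

A two-way clustering applies the same routine twice: once to the sample profiles (columns) and once to the array-target profiles (rows), that is, to the matrix and to its transpose.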

Cell Growth and Proliferation. This expression signature contained several genes involved in different aspects of cell growth, e.g., energy metabolism (adenylate kinase), DNA and protein synthesis (various ribosomal proteins and nucleoside phosphorylase), and cell cycle regulation (cyclin D1). Notably, cyclin D1 overexpression has been reported in a gastric cancer subset (14). Other relevant genes in this signature (not depicted in Fig. 2) included thymidine kinase and replication factor C (data not shown).⁷

Intestinal Metaplasia. The genes in this expression signature were highly expressed in many of the tumor samples but were down-regulated in the samples of normal gastric mucosae. They included markers of intestinal differentiation such as villin-1, trefoil factor 3 (intestinal), the intestinal brush border protein galectin 4, and the intestinal enzyme glutathione peroxidase 2. The cytoskeletal proteins keratin 8 and 18 were also part of this signature, suggesting a specific “crypt”-like intestinal character (15). The presence of these intestinal markers in tumors but not in normal tissue supports the hypothesis that intestinal metaplasia is a predisposing factor in gastric carcinogenesis.

Immunity. We detected two distinct clusters (Immunity A and B) related to immunological function. Immunity A contained multiple MHC Class I genes (B, C, and G), whereas Immunity B was composed primarily of MHC Class II genes (DO, DP, DQ, and DR) and α2-macroglobulin. Previous reports have suggested that gastric cancer cells with a tendency for peritoneal dissemination are associated with the up-regulation of MHC Class I molecules, whereas gastric cancer cells with a tendency for lymph node metastasis tend to up-regulate MHC Class II genes instead (16, 17). Alternatively, it is also possible that the presence of distinct populations of immune cells may contribute to the differential expression of the MHC genes observed in this tumor series.

Tumor-like. This prominent expression signature contained several genes associated with an active tumorigenic phenotype, such as markers of tumor hypoxia (HIF-1) and reactive angiogenesis (VEGF). Also in this cluster were β1-integrin and matrix metalloproteinase 9 (MMP-9), both having been implicated in gastric tumor invasion and dissemination (18, 19). Tumor markers in this cluster included tumor rejection antigen gp96 and tumor-associated calcium signal transducer 1. Genes involved in protein degradation, e.g., several 26S proteasome subunits and the E1 ubiquitin-activating enzyme, were also prominent in this cluster, as was the transcription factor GATA6, previously shown to be strongly expressed in certain gastric cancer cell lines (20).

Remodeling. This cluster contained genes such as Mucin 5B and FGFR1, which have been reported to be expressed in a subset of gastric cancers (21, 22) but not in normal gastric tissue (22, 23). However, the most striking feature of this cluster was the presence of numerous genes involved in stromal remodeling and endothelial growth, suggesting the presence of an active desmoplastic reaction, which is frequently observed in gastric cancer. A number of smooth muscle genes (leiomodin 1, calponin 1) were highly up-regulated, as were the pan-endothelial markers hevin, IGFBP4, and matrix Gla protein (24). Also present were genes such as MMP-2 and COL1A1 that behave as specific markers of tumor endothelium (24).

Gastric-like. This final cluster was strongly expressed in the three benign gastric specimens, as well as in several tumor samples, and contained genes associated with gastric epithelia including the digestive proteins pepsinogen C, gastric lipase, and tryptase II/Granzyme K. The tight junction epithelial proteins p55 and desmoplakin were also present, as was the gastric-specific growth hormone ghrelin. The secreted frizzled-related protein hsFRP, shown to be expressed in normal gastric tissues and some gastric cancers (24), was also in this cluster. It is important to note that the tumor samples in this group were confirmed by histological examination of cryosections to contain a very high percentage (80–100%) of tumor cells. Thus, it is unlikely that the presence of this gastric-like expression signature in these tumor samples arises from the presence of contaminating normal gastric tissue, but instead is reflective of the endogenous tumor expression profile.

Molecular Subtypes of Gastric Cancer Have Distinct Clinical Behaviors

The expression signatures detected in the previous analysis suggest that several specific and possibly independent biological subprograms might be operating in the gastric cancer samples. Because these signatures appeared to be differentially regulated across the gastric cancer specimens, we hypothesized that they could be used to divide the gastric cancer samples into various molecular subtypes. To test this hypothesis, we performed a “semisupervised” clustering operation in which the gastric tissue specimens were reclustered on the basis of their expression levels in the seven signatures described in the previous section. This operation, using a combined total of 598 array targets/genes, subdivided the gastric cancer specimens into three broad groups, which we refer to as tumorigenic, reactive, and gastric-like based on the principal expression signature that defines each group (Fig. 3). Because the groupings defined in the purely unsupervised and semisupervised clusterings were highly comparable (Supplementary Information),² such semisupervised clustering was performed primarily to refine the distribution of specific tumors within each group and to minimize “noise” caused by extraneous genes.

We then attempted to determine whether the subtypes defined by the expression analysis might be associated with any clinical or histopathological criteria. To maximize our sample size, we included clinical data from four additional patients whose samples were not used in the initial expression analysis because they contained 40% tumor cells (as assessed by cryosections). These samples could subsequently be reliably assigned to a specific class using two independent classification methods (Supplementary Information).² No significant associations were discovered between the three molecular subgroups and age of diagnosis, patient sex, tumor site, Lauren classification (intestinal or diffuse), tumor differentiation status, or clinical stage at diagnosis (Supplementary Information).² However, when a survival analysis was performed, we discovered that patients with gastric-like tumors exhibited a significantly better overall survival (P < 0.05) than patients belonging to the other two groups (Fig. 4A), suggesting that subtyping gastric cancers by expression profiling might identify clinically relevant features of gastric adenocarcinoma.

The presenting tumor stages of patients belonging to each of the three molecular subtypes were comparable, and a multivariate analysis confirmed that tumor stage and molecular subtype were not significantly associated (P = 0.58 by χ², and P = 0.51 by subsequent evaluation using ANOVA). This result suggests that the improved prognosis of patients with gastric-like tumors might be attributable to factors independent of tumor stage, and that knowing the molecular subtype of a gastric tumor might serve as a useful adjunct to traditionally used staging systems for disease prognostication. To explore this possibility, we stratified our patients by tumor stage, and we confirmed that patients presenting at both extremes of the clinical spectrum (stages I and IV) were associated with statistically significant “good” and “bad” prognoses, respectively (good, Stage I versus II/III/IV, P < 0.001; bad, Stage IV versus II/III, P < 0.05; data not shown).⁷ However, although there was an observed tendency for stage II patients to have a better prognosis than stage III patients, this difference was not statistically significant, possibly because of the small sample sizes involved (Fig. 4B; P = 0.17). Nevertheless, when these same patients (stage II and III) were then restratified according to their molecular subtype, we once again found that, despite the reduced sample size, patients with gastric-like tumors still exhibited a significantly better overall survival than patients belonging to the other two groups (Fig. 4C; P < 0.05). These results suggest that, for gastric cancer prognostication, classical tumor staging may play a dominant role for early stage (I) and late stage (IV) patients. However, for patients presenting at intermediate clinical stages (Stage II and III, the majority of gastric cancer cases), prognostication by molecular subtype may prove more clinically useful than classical tumor stage.

Identification of Minimal Predictor Gene Sets for Gastric Cancer Classification

Fig. 3. Molecular subtypes of gastric cancer. A, two-way HC was used to order samples (columns) using the array targets (rows) corresponding to the seven expression signatures defined in Fig. 2 (color-codes of expression signatures are depicted in Fig. 2). B, identity of specific tissue specimens belonging to each molecular subtype. Color codes: black, tumorigenic; purple, reactive; and green, gastric-like.

We then attempted to define a minimal predictor gene set that could accurately predict the subtype of an unknown gastric tumor sample. A specific requirement of this gene set was the necessity to distinguish between three subgroups (i.e., multiclass prediction). Applying a supervised learning approach to this problem, we first adopted an OVA approach on the initial 746-gene set, reducing the multiclass set to a series of quasibinary class distinctions (i.e., T versus (RG), R versus (TG), and G versus (TR), where T, R, and G refer to tumorigenic, reactive, and gastric-like, respectively). An SVM was then used to classify the samples, and the accuracy of the algorithm was assessed through LVO CV studies. (Because of the limited number of samples, we were unable in this study to assess the accuracy of the algorithm through the ‘gold standard’ of an independent test of naïve samples. It is relatively challenging to obtain these samples, as reflected by the fact that our study, despite its limited size, actually constitutes one of the largest molecular profiling studies (CGH or expression microarray) performed on gastric cancer to date.) Nevertheless, the SVM algorithm, trained on all 746 genes, successfully classified the tumor samples to reasonably high degrees of sensitivity and specificity (81%; Table 1). Higher predictive accuracies (94%) were obtained, however, when the OVA SVM was trained on only the top 10 genes both positively and negatively correlated to each of the three classes (total number, 60 genes).
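The one-versus-all reduction and the leave-one-out loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the study trained an SVM on each OVA split, whereas this sketch swaps in a simple nearest-neighbor Pearson correlation classifier so that it runs with the standard library alone, and it is not claimed to match the NNCA procedure detailed in the Supplementary Information. All function names are hypothetical.

```python
from statistics import mean

def ova_labels(labels, positive):
    """Reduce multiclass labels to a binary one-versus-all labelling (1 or 0)."""
    return [1 if lab == positive else 0 for lab in labels]

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def nearest_neighbor_predict(train_profiles, train_labels, sample):
    """Stand-in classifier: label of the most highly correlated training sample."""
    best = max(range(len(train_profiles)),
               key=lambda i: pearson(train_profiles[i], sample))
    return train_labels[best]

def leave_one_out_accuracy(profiles, labels):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    hits = 0
    for i in range(len(profiles)):
        train_profiles = profiles[:i] + profiles[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbor_predict(train_profiles, train_labels, profiles[i])
        hits += predicted == labels[i]
    return hits / len(profiles)
```

Running `leave_one_out_accuracy` once per class on `ova_labels(labels, "T")`, `ova_labels(labels, "R")`, and `ova_labels(labels, "G")` gives the three quasibinary distinctions; any gene selection step would need to be repeated inside each fold to avoid an optimistic estimate.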

The SVM algorithm, although extremely powerful for binary class prediction scenarios, is associated with certain issues that render it less than ideal for use in a multiclass prediction setting, such as the reliance on rank-based gene selection and OVA reduction (see “Discussion”). As the requirement for small numbers of genes is particularly relevant in the development of diagnostic applications, we then applied a novel classification methodology (GA/MLHD) to classify the gastric tumors (11). One advantage of the GA/MLHD methodology is its ability to identify predictor gene sets of drastically fewer genes that nevertheless deliver comparable or slightly higher predictive accuracies than the conventional OVA SVM approach. We applied the GA/MLHD methodology to the gastric tumor data set and selected optimal predictor sets after multiple independent runs of 100 generations each. Using this approach, we identified several optimal predictor gene sets of minimal feature size (12–17 genes) that yielded a CV classification accuracy of 100% (see Supplemental Information² for predictor sets). Similar results were also obtained when the samples were randomly divided 100 times in a 66/33% training-/test-set manner, in which an independently generated predictor set was used for each split (Supplementary Information).² In contrast, a predictor set of comparable size (18 genes), created using the OVA SVM, exhibited a lower CV classification accuracy of 89–91% (Table 1).
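To make the idea of a genetic search over predictor gene sets concrete, a minimal sketch follows. It is not the published GA/MLHD implementation (11): a nearest-centroid rule stands in for the MLHD classifier, and the crossover and mutation operators, the population size, the size bounds, the tie-break toward smaller sets, and the toy data `expr` and `subtype` are all illustrative assumptions. Only the figure of 100 generations per run comes from the text.

```python
import random
from statistics import mean

def loo_accuracy(xs, ys, genes):
    """Leave-one-out accuracy of a nearest-centroid rule using only `genes`."""
    hits = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(ys)):
            rows = [r for j, r in enumerate(xs) if ys[j] == label and j != i]
            if not rows:
                continue
            dist = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        hits += best_label == y
    return hits / len(xs)

def crossover(a, b, rng, min_size, max_size):
    """Child draws a random-sized subset from the union of both parents."""
    pool = sorted(a | b)
    size = rng.randint(min_size, min(max_size, len(pool)))
    return frozenset(rng.sample(pool, size))

def mutate(genes, n_genes, rng, min_size, max_size):
    """Randomly drop and/or add one gene while respecting the size bounds."""
    genes = set(genes)
    if len(genes) > min_size and rng.random() < 0.5:
        genes.discard(rng.choice(sorted(genes)))
    if len(genes) < max_size and rng.random() < 0.5:
        genes.add(rng.randrange(n_genes))
    return frozenset(genes)

def evolve(xs, ys, generations=100, pop_size=20, min_size=1, max_size=4, seed=0):
    rng = random.Random(seed)
    n_genes = len(xs[0])
    cache = {}

    def fitness(genes):
        # Rank by accuracy first, then prefer the smaller gene set.
        if genes not in cache:
            cache[genes] = (loo_accuracy(xs, ys, sorted(genes)), -len(genes))
        return cache[genes]

    pop = [frozenset(rng.sample(range(n_genes), rng.randint(min_size, max_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents),
                             rng, min_size, max_size),
                   n_genes, rng, min_size, max_size)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=fitness)
    return sorted(best), fitness(best)[0]

# Hypothetical toy data: 9 samples, 6 genes; genes 0 and 1 carry the signal.
expr = [
    [5.0, 0.1, 0.3, 1.0, 0.2, 0.9], [5.2, -0.1, 0.8, 0.1, 0.7, 0.3],
    [4.8, 0.0, 0.5, 0.6, 0.4, 0.5], [0.1, 5.1, 0.2, 0.9, 0.8, 0.1],
    [-0.1, 4.9, 0.9, 0.3, 0.1, 0.6], [0.0, 5.0, 0.4, 0.5, 0.6, 0.8],
    [-5.0, -5.1, 0.6, 0.2, 0.9, 0.4], [-5.1, -4.9, 0.1, 0.8, 0.3, 0.7],
    [-4.9, -5.0, 0.7, 0.4, 0.5, 0.2],
]
subtype = ["T"] * 3 + ["R"] * 3 + ["G"] * 3

best_genes, best_acc = evolve(expr, subtype)
```

Because the set size is itself subject to mutation and crossover, the number of genes does not have to be fixed in advance, which is the property the text highlights for the GA-based approach.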

DISCUSSION

The pathogenesis of gastric cancer is complex and dependent on both extrinsic (e.g., microbial infections) and intrinsic factors (e.g., hypochlorhydria; Ref. 25). Although associations between gastric cancer and various genotypes [e.g., E-cadherin (26) and interleukin receptor IL-1RN gene polymorphisms (27)] have been reported, relatively little is still currently known about the fundamental pathobiology of gastric adenocarcinoma. In this report, several different molecular assays were used to analyze a common set of gastric cancer specimens. Using CGH, we identified several specific chromosomal aberrations that occur frequently in gastric cancer, including novel genomic aberrations such as amplifications at the 16p locus. It is likely that these chromosomal aberrations reflect the selective retention of genomic fragments housing “driver” genes whose products functionally contribute to gastric carcinogenesis. For example, the 20q region harbors several genes implicated in tumor formation, such as AIB1, BTAK, and PTPN1. For other frequently observed aberrations (e.g., 13q), there are as yet no specific driver genes that have been implicated. To identify these novel genes, we are currently attempting to integrate the CGH data with the microarray expression results. An initial attempt to combine these two data sets revealed a relatively poor correlation between the presence of an amplified chromosomal fragment and the transcriptional overexpression of genes on that fragment (data not shown),⁸ but this may be attributable to the relatively low resolution achievable by the chromosomal CGH assay. Addressing this issue through the use of higher-resolution assays such as array-CGH will be an important task for future research.

Fig. 4. Molecular subtypes of gastric cancer exhibit distinct clinical behaviors. A, Kaplan-Meier survival curves of all gastric cancer patients (n = 51) divided by molecular subtypes (gastric-like versus tumorigenic and reactive). Patients with gastric-like gastric cancers exhibited higher overall survival than patients belonging to the other two groups (P = 0.0496). No significant differences in the survival of patients belonging to the tumorigenic and reactive groups were observed (data not shown).⁷ B, survival curves of Stage II and Stage III gastric cancer patients divided by tumor stage (n = 30). No statistically significant differences are observed, possibly because of the small sample size (P = 0.16). C, survival curves of the same 30 stage II and stage III gastric cancer patients divided by molecular subtype (tumorigenic: 10 patients, 6 at stage II and 4 at stage III; reactive: 12 patients, 3 at stage II and 9 at stage III; gastric-like: 8 patients, 4 at stage II and 4 at stage III). Patients with gastric-like tumors still exhibit a significantly better overall survival (P = 0.042).

As an alternative to CGH, which detects large-scale genomic aberrations, we also used MSI studies to address the possibility that the high proportion of low- and no-CNA tumors in our series might be associated with microlevel genomic instability. Although only a small fraction of the no- and low-CNA tumors [i.e., 6 (19%) of 32] were MSI-H, this observation carries several caveats. For example, protocol discrepancies may explain much of the reported variation of MSI in gastric cancer (from 9 to �44%; Refs. 28 and 29), and the standard marker loci used in the MSI assay may not be equally mutable in all DNA mismatch repair-deficient tumors. Indeed, a study of MSI in gastric cancer showed that the rate of instability of different markers in the same tumors ranged from 0 to 77% (30). Furthermore, MSI-H gastric cancers are known to harbor mutations in genes that are distinct from those found in other tumor types (31). Nevertheless, if a majority of CNA-absent gastric cancers are truly mismatch repair-proficient (as our data indicate), then this may suggest the existence of an alternative pathway capable of driving the oncogenic potential of no/low-CNA gastric tumors in the absence of processes causing either classical micro- or macro-genomic instability (the former being measured by MSI and the latter by CGH).

We also discovered that the gastric cancers could be divided on the basis of their expression profiles into three major groups (tumorigenic, reactive, and gastric-like) and that patients with gastric-like tumors exhibited a significantly better overall survival than patients of the other two groups. The clinical usefulness of the molecular subtypes became more apparent when used to prognosticate patients presenting at intermediate clinical stages. Our current hypothesis is that each of the molecular subtypes is associated with a distinct biological behavior, which ultimately contributes to the differing survival rates. For example, tumorigenic tumors may be more clinically and metabolically aggressive, whereas gastric-like tumors may progress along a more indolent course. In addition, the expression signatures found in each subtype suggest that certain therapies may be more effective against some tumor subtypes than against others. For example, reactive tumors, by virtue of their association with numerous endothelial growth markers, may be more susceptible to antiangiogenic therapies and strategies that target the surrounding normal stroma.
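For readers unfamiliar with how survival curves of the kind compared between subtypes are built, the Kaplan-Meier product-limit estimate can be sketched as follows. The follow-up times and event flags are invented for illustration and are not patient data from this study, the function name is hypothetical, and no significance test between groups is included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed death.

    times  : follow-up duration for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up (months) for five patients; two are censored.
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
```

Censored patients reduce the number at risk without lowering the curve, which is why small groups with heavy censoring yield wide uncertainty in comparisons between subtypes.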

Finally, we used various supervised learning approaches to define a minimal predictor gene set that could accurately predict the class of an unknown gastric sample. To date, much less work has been done specifically on algorithms for multiclass prediction than for binary prediction. The popular OVA SVM approach (10), for example, is associated with certain issues that render it less than ideal for use in a multiclass setting. Because it relies on converting the multiclass scenario into a series of quasibinary class prediction problems (the OVA approach), distinct sets of predictor genes need to be selected for each quasibinary class distinction, leading to a final combined predictor gene set that can be fairly large and unwieldy, especially for the development of diagnostic assays. As an alternative, we used a methodology that we developed (GA/MLHD), which was created specifically for use in a multiclass prediction setting (11). In addition to being able to automatically determine the optimal number of genes that should belong to a predictor gene set (a number that normally has to be prespecified), the GA/MLHD approach does not rely on a rank-based gene selection strategy and deliberately selects genes that are uncorrelated in expression with each other. Although the strength of this approach is primarily seen in scenarios involving many classes (i.e., more than five; see Ref. 10), the application of the GA/MLHD methodology to the gastric cancer data set allowed us to define a series of small (�20) gene sets that delivered very high classification accuracies (100% CV accuracy, as compared with 87% for an OVA SVM based on 21 genes). We are hopeful that the GA/MLHD methodology will also prove useful in other complex multiclass prediction settings for other cancers. In conclusion, our results offer several insights and suggest multiple logical avenues for future research into gastric cancer, which may ultimately lead to improved methods of diagnosis, treatment, and prevention of this important and complex disease.

ACKNOWLEDGMENTS

We thank the NCC Tissue Repository for tissue specimens, Choon Wei Wee and Cheryl Lee for clone resequencing, Alwin Loh and Ivy Sng for assistance with histological review, National Medical Research Council and National Cancer Centre for financial support, the Lee Foundation for the purchase of clones, and Pulivarthi Rao for CGH training. P. T. thanks Hui Kam Man for his encouragement and support.

REFERENCES

1. Ferlay, J., Bray, F., Pisani, P., and Parkin, D. M. GLOBOCAN 2000: Cancer Incidence, Mortality and Prevalence Worldwide, IARC CancerBase No. 5. Lyon: IARC Press, 2001.

2. Dixon, M. F., Martin, I. G., Sue-Ling, H. M., Wyatt, J. I., Quirke, P., and Johnston, D. Goseki grading in gastric cancer: comparison with existing systems of grading and its reproducibility. Histopathology, 25: 309–316, 1994.

3. Wu, C. W., Hsieh, M. C., Lo, S. S., Tsay, S. H., Li, A. F., Lui, W. Y., and Peng, F. K. Prognostic indicators for survival after curative resection for patients with carcinoma of the stomach. Dig. Dis. Sci., 42: 1265–1269, 1997.

4. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson, J., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Levy, R., Wilson, W., Grever, M. R., Byrd, J. C., Botstein, D., Brown, P. O., and Staudt, L. M. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature (Lond.), 403: 503–511, 2000.

5. Sobin, L. H., and Wittekind, C. TNM Classification of Malignant Tumors: International Union Against Cancer. New York: Wiley, 1997.

6. Kallioniemi, A., Kallioniemi, O. P., Sudar, D., Rutovitz, D., Gray, J. W., Waldman, F. M., and Pinkel, D. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science (Wash. DC), 258: 818–821, 1992.

7. Boland, C. R., Thibodeau, S. N., Hamilton, S. R., Sidransky, D., Eshleman, J. R., Burt, R. W., Meltzer, S. J., Rodriguez-Bigas, M. A., Fodde, R., Ranzani, G. N., and Srivastava, S. A National Cancer Institute workshop on microsatellite instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res., 58: 5248–5257, 1998.

8. DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen, Y., Su, Y. A., and Trent, J. M. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet., 14: 457–460, 1996.

⁸ P. Tan and O. L. Kon, unpublished observations.

Table 1 Classification accuracies delivered by the OVA SVM algorithm

Predictive accuracy was assessed by LVO CV (47 samples). Numbers in parentheses refer to the total number of unique samples that were misclassified.

No. of genes    Misclassifications    Overall accuracy
746             9 (7)                 0.81 (0.85)
58              3 (3)                 0.94 (0.94)
18              5 (4)                 0.89 (0.91)
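The accuracies in Table 1 appear to follow directly from the misclassification counts and the 47 cross-validated samples. The short check below assumes that accuracy is computed as 1 minus misclassifications divided by 47 and rounded to two decimals; that formula is an inference from the tabulated values, not a statement from the text, and the variable names are hypothetical.

```python
N_SAMPLES = 47  # number of samples in the LVO CV, from the Table 1 legend

# (number of genes, total misclassifications, unique misclassified samples)
TABLE_1 = [(746, 9, 7), (58, 3, 3), (18, 5, 4)]

def accuracy(misclassified, n=N_SAMPLES):
    """Fraction of samples classified correctly, rounded to two decimals."""
    return round(1 - misclassified / n, 2)

# One row per gene set: (genes, accuracy from total count, accuracy from unique count)
rows = [(genes, accuracy(total), accuracy(unique))
        for genes, total, unique in TABLE_1]

for genes, acc_total, acc_unique in rows:
    print(f"{genes:>4} genes: {acc_total:.2f} ({acc_unique:.2f})")
```

Under this assumption, the computed values match the tabulated accuracies for all three gene sets.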


9. Wang, E., Miller, L. D., Ohnmacht, G. A., Liu, E. T., and Marincola, F. M. High-fidelity mRNA amplification for gene profiling. Nat. Biotechnol., 18: 457–459, 2000.

10. Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C. H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J. P., Poggio, T., Gerald, W., Loda, M., Lander, E. S., and Golub, T. R. Multi-class cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. USA, 98: 15149–15154, 2001.

11. Ooi, C. H., and Tan, P. Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics, 19: 37–44, 2002.

12. Sakakura, C., Mori, T., Sakabe, T., Ariyama, Y., Shinomiya, T., Date, K., Hagiwara, A., Yamaguchi, T., Takahashi, T., Nakamura, Y., Abe, T., and Inazawa, J. Gains, losses, and amplifications of genomic materials in primary gastric cancers analyzed by comparative genomic hybridization. Genes Chromosomes Cancer, 24: 299–305, 1999.

13. Curtis, L. J., Georgiades, I. B., White, S., Bird, C. C., Harrison, D. J., and Wyllie, A. H. Specific patterns of chromosomal abnormalities are associated with RER status in sporadic colorectal cancer. J. Pathol., 192: 440–445, 2000.

14. To, K. F., Chan, M. W., Leung, W. K., Tong, J. H., Lee, T. L., Chan, F. K., and Sung, J. J. Alterations of frizzled (FzE3) and secreted frizzled related protein (hsFRP) expression in gastric cancer. Life Sci., 70: 483–489, 2001.

15. Calnek, D., and Quaroni, A. Differential localization by in situ hybridization of distinct keratin mRNA species during intestinal epithelial cell development and differentiation. Differentiation, 5: 95–104, 1993.

16. Hippo, Y., Yashiro, M., Ishii, M., Taniguchi, H., Hirakawa, K., Kodama, T., and Aburatani, H. Differential gene expression profiles of scirrhous gastric cancer cells with high metastatic potential to peritoneum or lymph nodes. Cancer Res., 61: 889–895, 2001.

17. Hippo, Y., Taniguchi, H., Tsutsumi, S., Machida, N., Chong, J.-M., Fukayama, M., Kodama, T., and Aburatani, H. Global gene expression analysis of gastric cancer by oligonucleotide microarrays. Cancer Res., 62: 233–240, 2002.

18. Nishimura, S., Chung, Y. S., Yashiro, M., Inoue, T., and Sowa, M. Role of α2β1 and α3β1-integrin in the peritoneal implantation of scirrhous gastric carcinoma. Br. J. Cancer, 74: 1406–1412, 1996.

19. Torii, A., Kodera, Y., Ito, M., Shimizu, Y., Hirai, T., Yasui, K., Morimoto, T., Yamamura, Y., Kato, T., Hayakawa, T., Fujimoto, N., and Kito, T. Matrix metalloproteinase 9 in mucosally invasive gastric cancer. Gastric Cancer, 1: 142–145, 1998.

20. Bai, Y., Akiyama, Y., Nagasaki, H., Yagi, O. K., Kikuchi, Y., Saito, N., Takeshita, K., Iwai, T., and Yuasa, Y. Distinct expression of CDX2 and GATA4/5, development-related genes, in human gastric cancer cell lines. Mol. Carcinog., 28: 184–188, 2000.

21. Iida, S., Katoh, O., Tokunaga, A., and Terada, M. Expression of fibroblast growth factor gene family and its receptor gene family in the human upper gastrointestinal tract. Biochem. Biophys. Res. Commun., 199: 1113–1119, 1994.

22. Perrais, M., Pigny, P., Buisine, M. P., Porchet, N., Aubert, J. P., and Van Seuningen-Lempire, I. Aberrant expression of human mucin gene MUC5B in gastric carcinoma and cancer cells. Identification and regulation of a distal promoter. J. Biol. Chem., 276: 15386–15396, 2001.

23. Hughes, S. E. Differential expression of the fibroblast growth factor receptor (FGFR) multigene family in normal human adult tissues. J. Histochem. Cytochem., 45: 1005–1019, 1997.

24. Croix, B. S., Rago, C., Velculescu, V., Traverso, G., Romans, K. E., Montgomery, E., Lal, A., Riggins, G. J., Lengauer, C., Vogelstein, B., and Kinzler, K. W. Genes expressed in human tumor endothelium. Science (Wash. DC), 289: 1197–1202, 2000.

25. Fuchs, C. S., and Mayer, R. J. Gastric carcinoma. N. Engl. J. Med., 333: 32–41, 1995.

26. Guilford, P., Hopkins, J., Harraway, J., McLeod, M., McLeod, N., Harawira, P., Taite, H., Scoular, R., Miller, A., and Reeve, A. E. E-Cadherin germline mutations in familial gastric cancer. Nature (Lond.), 392: 402–405, 1998.

27. El-Omar, E. M., Carrington, M., Chow, W. H., McColl, K. E., Bream, J. H., Young, H. A., Herrera, J., Lissowska, J., Yuan, C. C., Rothman, N., Lanyon, G., Martin, M., Fraumeni, J. F. Jr., and Rabkin, C. S. Interleukin-1 polymorphisms associated with increased risk of gastric cancer. Nature (Lond.), 404: 398–402, 2000.

28. Hayden, J. D., Cawkwell, L., Quirke, P., Dixon, M. F., Goldstone, A. R., Sue-Ling, H., Johnston, D., and Martin, I. G. Prognostic significance of microsatellite instability in patients with gastric carcinoma. Eur. J. Cancer, 33: 2342–2346, 1997.

29. Wirtz, H. C., Muller, W., Noguchi, T., Scheven, M., Ruschoff, J., Hommel, G., and Gabbert, H. E. Prognostic value and clinicopathological profile of microsatellite instability in gastric cancer. Clin. Cancer Res., 4: 1749–1754, 1998.

30. Halling, K. C., Harper, J., Moskaluk, C. A., Thibodeau, S. N., Petroni, G. R., Yustein, A. S., Tosi, P., Minacci, C., Roviello, F., Piva, P., Hamilton, S. R., Jackson, C. E., and Powell, S. M. Origin of microsatellite instability in gastric cancer. Am. J. Pathol., 155: 205–211, 1999.

31. Menoyo, A., Alazzouzi, H., Espin, E., Armengol, M., Yamamoto, H., and Schwartz, S. J. Somatic mutations in the DNA damage-response genes ATR and CHK1 in sporadic stomach tumors with microsatellite instability. Cancer Res., 61: 7727–7730, 2001.


Su Ting Tay, Siew Hong Leong, Kun Yu, et al. A Combined Comparative Genomic Hybridization and Expression Microarray Analysis of Gastric Cancer Reveals Novel Molecular Subtypes. Cancer Res 2003;63:3309-3316.

Updated version: http://cancerres.aacrjournals.org/content/63/12/3309

Supplementary Material: http://cancerres.aacrjournals.org/content/suppl/2003/06/20/63.12.3309.DC1.html