
Molecular Changes From Dysplastic Nodule to Hepatocellular Carcinoma Through Gene Expression Profiling

Suk Woo Nam,1,2 Jik Young Park,1,2 Adaikalavan Ramasamy,3 Shirish Shevade,3 Amirul Islam,3 Philip M. Long,3 Cheol Keun Park,4 Soo Eun Park,1,2 Su Young Kim,1,2 Sug Hyung Lee,1,2 Won Sang Park,1,2 Nam Jin Yoo,1,2 Edison T. Liu,3 Lance D. Miller,3 and Jung Young Lee1,2

Progression of hepatocellular carcinoma (HCC) is a stepwise process that proceeds from pre-neoplastic lesions, including low-grade dysplastic nodules (LGDNs) and high-grade dysplastic nodules (HGDNs), to advanced HCC. The molecular changes associated with this progression are unclear, however, and the morphological cues thought to distinguish pre-neoplastic lesions from well-differentiated HCC are not universally accepted. To understand the multistep process of hepatocarcinogenesis at the molecular level, we used oligonucleotide microarrays to investigate the transcription profiles of 50 hepatocellular nodular lesions ranging from LGDNs to primary HCC (Edmondson grades 1-3). We demonstrated that gene expression profiles can discriminate not only between dysplastic nodules and overt carcinoma but also between different histological grades of HCC via unsupervised hierarchical clustering with 10,376 genes. We identified 3,084 grade-associated genes, correlated with tumor progression, using one-way ANOVA and a one-versus-all unpooled t test. Functional assignment of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC. Using both diagonal linear discriminant analysis and support vector machines, we identified 240 genes that could accurately classify tumors according to histological grade, especially when attempting to discriminate LGDNs, HGDNs, and grade 1 HCC. In conclusion, a clear molecular demarcation between dysplastic nodules and overt HCC exists. The progression from grade 1 through grade 3 HCC is associated with changes in gene expression consistent with plausible functional consequences. Supplementary material for this article can be found on the HEPATOLOGY website (http://www.interscience.wiley.com/jpages/0270-9139/suppmat/index.html). (HEPATOLOGY 2005;42:809-818.)

Hepatocellular carcinoma (HCC) is one of the most common malignancies worldwide. Chronic hepatitis resulting from infection with hepatitis B virus or hepatitis C virus and exposure to carcinogens such as aflatoxin B1 are known major risk factors for HCC.1 Molecular investigations have recently found that genetic alterations of tumor suppressor genes or oncogenes such as p53, β-catenin, and AXIN1 might be involved in the progression to HCC,2-4 but the frequency of these somatic mutations appears to be low in HCCs. Furthermore, it is unclear how these genetic changes reflect the clinical characteristics of the individual tumors. Therefore, the predominant molecular events underlying HCC in most patients remain unknown.

Abbreviations: HCC, hepatocellular carcinoma; LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; eHCC, early hepatocellular carcinoma; G1-3, Edmondson grades 1-3; HBV, hepatitis B virus; OVA, one-versus-all; SVM, support vector machine; DLDA, diagonal linear discriminant analysis.

From the 1Department of Pathology and 2Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea; the 3Genome Institute of Singapore, Singapore; and the 4Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.

Received May 3, 2005; accepted July 15, 2005.

Supported by the Korea Science & Engineering Foundation (KOSEF) through the Cell Death Disease Research Center at The Catholic University of Korea (R13-2002-005-01004-0) and the 21C Frontier Foundation Human Genome Project (Number F-1-1-02) from Ministry of Science and Technology of Korea.

Address reprint requests to: Jung Young Lee, M.D., Ph.D., Department of Pathology, College of Medicine, The Catholic University of Korea, #505 Banpodong, Seocho-gu, Seoul, South Korea, 137-701. E-mail: [email protected]; fax: (82) 2-537-6586; or Lance D. Miller, Ph.D., Microarray and Expression Genomics, Genome Institute of Singapore, Genome Building #02-01, 60 Biopolis Street, Singapore 138672. E-mail: [email protected]; fax: (65) 6478-9060.

Copyright © 2005 by the American Association for the Study of Liver Diseases.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/hep.20878
Potential conflict of interest: Nothing to report.

Because HCC typically develops in close association with pre-existing cirrhosis, it is widely believed that a liver with cirrhosis may contain pre-neoplastic nodules that are in an intermediate stage between nonneoplastic regenerating nodules and overtly malignant HCC.5,6 These nodular lesions have been designated as “dysplastic nodules” by the International Working Party and are further divided into low-grade dysplastic nodules (LGDNs) and high-grade dysplastic nodules (HGDNs) depending on the degree of cytological or architectural atypia on histological examination.7 That these nodules frequently contain one or more microscopic foci of HCC suggests that dysplastic nodules, especially HGDNs, might be precancerous lesions of HCC.8,9 Some investigators have adopted the concept of early HCC (eHCC), sometimes referred to as “carcinoma in situ” or “microinvasive carcinoma” of the liver, which is characterized by a small tumor mass lacking invasive growth properties such as vascular invasion or intrahepatic metastasis.10-12 However, there remains considerable controversy as to whether eHCC should be regarded as frank cancer or as a form of HGDN.13 As such, the distinction between precancerous and cancerous lesions remains debatable, and the developmental process from pre-neoplastic lesion to overt HCC is still unclear.

HCC can be classified into four different histological grades, known as Edmondson grades 1 through 4, which generally correspond to well-differentiated, moderately differentiated, poorly differentiated, and undifferentiated types of HCC, respectively.14,15 Most cancer nodules less than 1 cm in diameter consist of well-differentiated cancerous tissues and are completely replaced by less well-differentiated cancerous tissues when the tumor size reaches a diameter of approximately 3 cm.16 As such, tumor de-differentiation and increasing tumor size are thought to reflect a continuum of morphological change in a multistep hepatocarcinogenesis process, but the molecular underpinnings of this are largely unknown.

Recently, DNA microarray technology has enabled the genome-wide analysis of gene transcript levels, and as such has yielded great insight into the molecular nature of cancer. Although several reports have described tumor-associated molecular expression profiles of liver cancers,17-20 little insight into the molecular nature of early or multistep hepatocarcinogenesis has been gained. To better understand this multistep process at the molecular level, we analyzed global transcript levels in the context of three different histological grades of HCC as well as precancerous LGDNs and HGDNs using a high-density spotted oligonucleotide microarray.

In the present study, we show that an extensive and remarkably reproducible expression signature comprised of several thousand genes underlies the process of HCC progression. For the majority of genes, messenger RNA levels were either positively or negatively correlated with tumor progression. Functional analysis of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC, including cell proliferation, protein synthesis, and hepatocyte-specific functions. We also observed altered expression of known tumor suppressor genes and oncogenes that potentially contribute to this process. Additionally, we identified a subset of progression-associated genes that could accurately classify tumors according to grade, and readily distinguish dysplastic nodules from low-grade cancer.

Patients and Methods

Patients and Tissue Preparation. Primary HCCs, including Edmondson grade 1 (G1), grade 2 (G2), grade 3 (G3), and premalignant lesions of HCC (LGDNs and HGDNs), were obtained from 42 patients who underwent surgical treatment for HCC at Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea. Immediately after hepatectomy, freshly removed livers were serially sliced from the top edge to the bottom edge at 7- to 8-mm intervals and examined by a pathologist for the presence of nodular lesions. Any bulging nodules 10 mm or more in diameter or lesions macroscopically different in color from the surrounding liver, regardless of size, were snap-frozen in liquid nitrogen and stored at −80°C until use. Subsequent sections from the same nodule were fixed in 10% neutral formalin for confirmation of morphological diagnosis. The hematoxylin-eosin–stained sections were examined independently by two pathologists and classified as HCC with different histological grading according to the Edmondson and Steiner method or dysplastic nodules of low or high grade according to the guidelines of the International Working Party. In this way, we obtained a total of 30 HCCs (10 G1, 10 G2, and 10 G3), 10 LGDNs, and 10 HGDNs from 42 patients. To reduce experimental bias, we selected all specimens that had a background associated with cirrhosis and were hepatitis B virus (HBV) seropositive (Supplementary Table 1). Approval was obtained from the institutional review boards of the Catholic University of Korea College of Medicine and the Sungkyunkwan University School of Medicine. Informed consent was provided according to the Declaration of Helsinki.

DNA Microarrays. The Compugen/Sigma Human Oligolibrary (60-mers) representing 18,664 LEADS clusters (Compugen/Sigma-Genosys, The Woodlands, TX) was spotted onto poly-L-lysine-coated glass microscope slides using an OmniGrid robotic arrayer (GeneMachines, San Carlos, CA). All microarrays were manufactured at the Microarray and Expression Genomics Laboratory of the Genome Institute of Singapore essentially according to Eisen and Brown.21

810 NAM ET AL. HEPATOLOGY, October 2005

RNA Preparation and Microarray Hybridization. Total RNA was extracted from frozen tissues using TRIzol reagent following the manufacturer’s protocol (Life Technology, Rockville, MD). Human universal reference RNA (Stratagene, La Jolla, CA) was used as the reference RNA. Total RNA (20 μg) was used for DNA target synthesis as described.22 The reference RNA was labeled with Cyanine-3, and the test sample was labeled with Cyanine-5.

Data Processing and Analysis. All GenePix files were uploaded into the Genome Institute of Singapore microarray database and log expression ratios were normalized using the global median method. Microarray features with signal (foreground) intensities less than 50% above median local background intensity in both channels, and features automatically and manually flagged as “not found” were treated as missing values. Genes with expression values in 70% or more of the tumors within each of the five grades were retained for further analyses; 10,376 of 18,708 probes passed this filter. This probe set was used as the basis for all subsequent analyses. Hierarchical clustering of log ratios was performed using the software programs Cluster and Treeview23; Pearson correlation, mean centering, and average linkage were applied in all clustering applications. One-way ANOVA (F test) and one-versus-all (OVA) t test were performed in the R statistical package (http://www.r-project.org/). Support vector machines (SVM) and diagonal linear discriminant analysis (DLDA) were used to assess the classification accuracy of gene classifiers with grade prediction potential.24-26
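The normalization and presence-filter steps described above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names `median_normalize` and `passes_presence_filter`, the use of `None` to mark missing values, and the toy inputs are assumptions made purely for illustration.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def median_normalize(log_ratios: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Global median normalization of one array (assumed form): subtract the
    median of the non-missing log ratios so the array's median becomes 0."""
    present = [v for v in log_ratios if v is not None]
    center = statistics.median(present)
    return [None if v is None else v - center for v in log_ratios]


def passes_presence_filter(values: Sequence[Optional[float]],
                           grades: Sequence[str],
                           min_fraction: float = 0.7) -> bool:
    """Keep a probe only if it has a non-missing value in at least
    `min_fraction` of the samples within every grade."""
    counts: Dict[str, List[int]] = {}  # grade -> [present, total]
    for value, grade in zip(values, grades):
        tally = counts.setdefault(grade, [0, 0])
        tally[0] += int(value is not None)
        tally[1] += 1
    return all(present / total >= min_fraction for present, total in counts.values())


# Toy example: one probe measured on three LGDN and three G1 arrays.
grades = ["LGDN"] * 3 + ["G1"] * 3
print(passes_presence_filter([0.1, None, 0.3, 0.4, 0.5, 0.6], grades))  # False: 2/3 < 0.7
```

Applying the filter within each grade, rather than across all samples pooled, prevents a probe from being retained when it is well measured overall but largely missing in one grade.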

Class Prediction: Stratified Three-fold Cross-validation. Classification accuracies were assessed using a stratified three-fold cross-validation scheme. Here the arrays were randomly partitioned into three folds. Any random grouping that resulted in fewer than two members of any grade in any fold was discarded and resampled. Two folds were used to train the classifier, which was then tested on the remaining fold. At every training/test set selection, p genes were selected. The p genes used in the classifier comprised the p/2 genes most significantly upregulated and the p/2 genes most significantly downregulated for each grade according to the OVA Welch t test results. The three-fold cross-validation process was repeated 100 times, and the mean accuracies were reported.
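The partitioning rule above (shuffle, split into three folds, reject and resample any grouping with fewer than two members of a grade in a fold) can be sketched in Python as follows. The function names, the interleaved way the shuffled indices are dealt into folds, and the up-front feasibility check are assumptions for illustration; the source does not specify these details.

```python
import random
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def stratified_three_fold(grades: Sequence[str], rng: random.Random,
                          n_folds: int = 3, min_per_grade: int = 2) -> List[List[int]]:
    """Randomly partition array indices into folds, resampling until every
    fold holds at least `min_per_grade` arrays of every grade."""
    counts = Counter(grades)
    if min(counts.values()) < n_folds * min_per_grade:
        # Guard (an assumption, not from the source): avoids looping forever.
        raise ValueError("some grade has too few arrays for a valid partition")
    indices = list(range(len(grades)))
    while True:
        rng.shuffle(indices)
        folds = [indices[k::n_folds] for k in range(n_folds)]
        if all(Counter(grades[i] for i in fold)[g] >= min_per_grade
               for fold in folds for g in counts):
            return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists: two folds train, the remaining fold tests."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Early-stage problem sizes from the final dataset: 7 LGDNs, 7 HGDNs, 9 G1 HCCs.
grades = ["LGDN"] * 7 + ["HGDN"] * 7 + ["G1"] * 9
folds = stratified_three_fold(grades, random.Random(0))
print([len(fold) for fold in folds])  # [8, 8, 7]
```

Repeating this procedure 100 times with fresh random partitions, and selecting genes inside each training set only, keeps the test fold out of gene selection and gives the mean accuracies reported in the text.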

Results

Large-Scale Gene Expression Alterations Coincide With Different Histological Grade of HCC Progression. To determine whether global alterations in gene expression could discern histological grade differences ranging from pre-neoplastic lesion to advanced HCC, we examined the expression profiles in a series of 50 hepatocellular nodular lesions from 42 treated patients (see Patients and Methods). These specimens were hybridized onto spotted oligonucleotide microarrays, each containing 18,861 probes representing approximately 18,000 unique genes. Of these, 9 cases were not included in the final dataset as a result of suboptimal average signal intensities (owing to poor RNA quality) or unusually high background fluorescence. Therefore, 41 samples comprised the final dataset, which included 7 LGDNs, 7 HGDNs, and 9 G1, 10 G2, and 8 G3 HCCs. The relevant patient/tissue clinico-pathological variables are provided as supporting information (Supplementary Table 1).

First, we performed unsupervised hierarchical cluster analysis on the expression profiles of the 41 hepatocellular nodular lesions with 10,376 genes that passed the basic filtering criteria described in Patients and Methods. This resulted in two predominant tissue clusters: one cluster (CI) that contained all the dysplastic nodules (LGDNs and HGDNs), and a second cluster (CII) that contained all of the G2 and G3 HCCs and a majority of the G1 tumors (6/9) (Fig. 1). Within the CII cluster, all of the G3 HCCs were found together in a single G3-exclusive subcluster, which was flanked by the G2 HCCs that together with the G3s comprised a larger G2-G3 subcluster. The ostensible separation of dysplastic hepatocytes (CI) from HCC (CII), along with the occurrence of grade-specific subclusters in CII, demonstrates that reproducible large-scale changes in gene expression distinguish pre-neoplastic lesions and overt HCC as well as different histological grades of HCC. Of the 9 G1 HCCs, 3 cases (G1-05, -06, and -09) were found to have expression profiles more similar to the pre-neoplastic nodules, while the rest clustered with the overt HCCs. This observation suggests that G1 HCCs are molecularly heterogeneous, sitting on the border between premalignant lesion and overt malignant carcinoma, and that this heterogeneity is distinguishable at the molecular level. Additionally, we compared a small number of nontumorigenic surrounding tissues (i.e., “normal” tissue) to the HCCs and dysplastic nodules. The expression profiles of the nontumorigenic surrounding tissues consistently clustered apart from the dysplastic nodules, suggesting that dysplasia itself is marked by transcriptional alterations distinct from “normal” liver tissue (Supplementary Fig. 1).

HEPATOLOGY, Vol. 42, No. 4, 2005 NAM ET AL. 811

Identification and Pattern Analysis of HCC Grade-Associated Genes. To study in detail the genes most often correlated with tumor progression (referred to henceforth as “grade-associated” genes), we identified all genes associated with grade at P < .001 by either one-way ANOVA (F test) or OVA unpooled t test.26 The F test assigns the greatest significance to genes with expression profiles that show continuous variation among classes (e.g., grades), while the OVA t test assigns greater significance to genes with expression profiles that clearly distinguish one class from the rest. Consequently, the F test is biased toward selecting genes with profiles that progressively change from one class to the next, while the OVA t test is biased toward selection of those that show class-specific expression spikes. Thus, gene selection based on this combination of statistical measures allows for greater discovery of differentially expressed genes as it takes advantage of the inherent differences between the F test and the t test. We obtained 2,423 and 3,118 probes significant at P < .001 by F test and at least one of the five OVA t tests, respectively. After removing redundant discoveries, we were left with 3,084 probes with nonredundant gene identities with an expected maximum occurrence of false discoveries of 63 genes [10,376 × (0.001 + 0.001 × 5)]. Hierarchical clustering of these genes in Fig. 2 shows that the predominant grade-associated expression profiles are those with either positive or negative correlations with grade, rather than genes with spiking expression at precise stages of progression.
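As a rough illustration of the one-versus-all unpooled (Welch) t statistic and of the false-discovery arithmetic quoted above, consider the following Python sketch. The function names and toy expression values are hypothetical. Computing the P value would additionally require the t distribution with Welch-Satterthwaite degrees of freedom, which is omitted here, and rounding the bound up to reach the quoted 63 is an assumption about how the authors arrived at that figure.

```python
import math
import statistics
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Unpooled two-sample t statistic: the two group variances are kept separate."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def ova_welch_t(values: Sequence[float], grades: Sequence[str], target: str) -> float:
    """One-versus-all: compare samples of `target` grade against all other samples."""
    inside = [v for v, g in zip(values, grades) if g == target]
    outside = [v for v, g in zip(values, grades) if g != target]
    return welch_t(inside, outside)


def expected_false_discoveries(n_probes: int, alpha: float, n_ova_tests: int) -> float:
    """Upper bound on expected false positives: one F test plus
    `n_ova_tests` OVA t tests per probe, each at level `alpha`."""
    return n_probes * (alpha + alpha * n_ova_tests)


bound = expected_false_discoveries(10376, 0.001, 5)
print(math.ceil(bound))  # 63 (the unrounded bound is about 62.26)
```

The bound is conservative because it simply adds the six per-probe error rates, ignoring the overlap between probes flagged by the F test and by the OVA t tests.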

Biological Properties of Grade-Associated Genes. Hierarchical cluster analysis of the 3,084 grade-associated genes revealed several clusters of particularly highly correlated genes with ostensible biological implications, suggesting the coordination of certain biological activities with HCC progression. Genes of the top cluster (cluster 1) shown in Fig. 2 are characterized by a gradual increase in transcript levels, with the highest levels found consistently in G3 tumors. Using Gene Ontology terms, we observed enrichment in this cluster for genes associated with cell cycle functions, including numerous genes involved in DNA replication, chromatin remodeling, and cell proliferation. Comparative analysis of this cluster with the human cell cycle gene list defined by Whitfield et al.27 revealed further involvement of genes having periodic expression during the cell cycle (Fig. 2, cluster 1). Cluster 2 (middle cluster) is characterized by expression patterns showing a gradual increase from pre-neoplastic lesion to G2 HCC, followed by a relatively sharp increase in transcript abundance in most G3 tumors. These genes, which had the highest overall “within-cluster” correlation, are comprised predominantly of genes directly involved in protein synthesis, including ribosomal proteins, translation initiation and elongation factors, and constituents of the spliceosome. Finally, the genes comprising cluster 3 (Fig. 2, bottom cluster) are characterized by a gradual but large-magnitude decline in transcript levels from LGDNs to high-grade HCC, and are made up mostly of genes that have central roles in primary liver function or are expressed exclusively in hepatocytes. These include genes involved in fatty acid and lipid metabolism, detoxification pathways, and synthesis of complement and coagulation factors, suggesting a gradual loss of normal hepatocyte function coincident with progressive cellular de-differentiation.

Fig. 1. Unsupervised hierarchical clustering of pre-neoplastic lesions and primary HCCs can accurately partition tissues according to malignancy status and high tumor grade. (A) Two-dimensional clustergram of the 10,376 genes selected with minimal filtering criteria (see Patients and Methods). Each row represents a tumor profile; each column represents a probe’s measurements. The color saturation reflects the difference in expression between the tissue specimen and the common reference RNA. (B) Tissue dendrogram derived from clustering using the 10,376 gene set. Note that the two dominant clusters, cluster I (CI) and cluster II (CII), with the exception of G1 HCCs, accurately partition the pre-neoplastic and malignant tissues, and that the CII cluster is further subdivided into branches that are largely grade-specific. LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Further examination of the grade-associated genes revealed a number of genes that, through altered expression, may contribute directly to the increasing malignant behavior of advancing HCC. Figure 3 shows representative function-associated clusters. For example, the top cluster shows 24 such genes known or suspected to play roles in oncogenic transformation or tumor suppression. In addition, several growth factors, genes involved in apoptosis, and cell adhesion molecules that might have potential roles in HCC development and progression through altered expression were also extracted via categorical analysis using Gene Ontology as shown in Fig. 3.

Grade-Associated Genes Predict Stage of HCC Progression. We next sought to determine whether we could identify a subset of genes that could accurately classify the specimens according to grade. We addressed two classification problems: (1) discriminating among LGDNs, HGDNs, and G1, and (2) discriminating among G1, G2, and G3, because these are the most relevant problems in HCC diagnosis. Treating these as one five-grade problem could limit the use of some genes that otherwise might perform well in the two smaller clinically relevant problems.

Fig. 2. Clusters of grade-associated genes with biological implications. Hierarchical clustergram of 3,084 gene expression patterns with significant associations with grade (P < .001; F test and/or OVA t test) is shown (left). Three clusters of highly correlated genes are shown: cell cycle genes (cluster 1, top), genes involved in protein synthesis (cluster 2, middle), and genes involved in liver-specific functions (cluster 3, bottom). Unigene names are given (Unigene build #161). Note that the majority of genes show gradual but continuous change from low dysplastic expression to high malignant expression, or high dysplastic expression to low malignant expression. LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.


We compared three different classification methods (DLDA, SVM,24 and k-nearest neighbor28) for each of the two classification problems. Of note, approximately 4% of the values were missing and were therefore imputed according to the k-nearest neighbor imputation method.28 Classification accuracies were assessed using stratified three-fold cross-validation with 100 repetitions (see Patients and Methods).
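To make the simplest of these classifiers concrete, the following Python sketch shows one common formulation of diagonal linear discriminant analysis: per-class mean profiles, a single pooled variance per gene, and assignment to the class whose mean is closest under the variance-scaled squared distance. Equal class priors, the function names, and the two-gene toy data are assumptions for illustration; the source does not give the authors' exact implementation.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[str, List[float]], List[float]]


def fit_dlda(X: Sequence[Sequence[float]], y: Sequence[str]) -> Model:
    """Estimate per-class gene means and pooled within-class gene variances."""
    classes = sorted(set(y))
    n_genes = len(X[0])
    means: Dict[str, List[float]] = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
    dof = len(X) - len(classes)
    # Assumes every gene has nonzero within-class variance.
    pooled = [sum((row[j] - means[label][j]) ** 2 for row, label in zip(X, y)) / dof
              for j in range(n_genes)]
    return means, pooled


def predict_dlda(model: Model, x: Sequence[float]) -> str:
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, pooled = model

    def distance(c: str) -> float:
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], pooled))

    return min(means, key=distance)


# Hypothetical log ratios for two genes on four training arrays.
X = [[0.0, 1.0], [0.2, 0.8], [2.0, -1.0], [2.2, -1.2]]
y = ["LGDN", "LGDN", "G1", "G1"]
model = fit_dlda(X, y)
print(predict_dlda(model, [0.1, 1.0]))  # LGDN
```

Because the covariance matrix is restricted to its diagonal, the method needs only one variance estimate per gene, which suits settings like this one where genes far outnumber samples.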

Fig. 3. Expression patterns of grade-associated genes with possible roles in HCC pathogenesis. Grade-correlated genes were classified according to Gene Ontology terms or through a search of the literature. Subsets of these genes are shown for the following categories: (1) oncogenes and tumor suppressors, (2) growth factors, (3) apoptosis, and (4) cell adhesion. Unigene cluster symbols and names are shown (Unigene build #161). LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

The number of genes for each grade, p, was varied to find the optimal number of gene classifiers. Figure 4A shows the plot of classification accuracy as p was varied for DLDA and SVM classifiers. We found DLDA and two varieties of SVM (i.e., linear and RBF kernels) to be the most robust classifiers for both problems. (Note: because the k-nearest neighbor classifiers had inferior performance, only the SVM and DLDA accuracies are shown.) These results suggested that using 30 to 50 genes per grade was optimal. We therefore decided to use 40 genes (the 20 most significantly upregulated and the 20 most significantly downregulated in each grade), resulting in 120 total genes for each problem (Fig. 4B-C); that is, 120 “early-stage” genes for discriminating among early-stage samples, and 120 “late-stage” genes for discriminating among late-stage samples (Supplementary Tables 2 and 3), for a total of 240 grade classifier genes. As shown in Fig. 4, we would expect an approximately 95% classification accuracy in discriminating among early-stage samples and an approximately 91% classification accuracy in discriminating among late-stage samples.
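The selection of 40 genes per grade (20 up, 20 down) can be sketched as below. As a simplifying assumption, genes are ranked by the signed OVA t statistic rather than by P value; the two rankings can differ slightly because Welch degrees of freedom vary from gene to gene. The function name and the synthetic statistics are hypothetical.

```python
from typing import Dict, List


def select_classifier_genes(t_stats: Dict[str, Dict[str, float]],
                            genes_per_grade: int = 40) -> Dict[str, List[str]]:
    """For each grade, take the half with the most positive OVA t statistics
    (upregulated) and the half with the most negative (downregulated)."""
    if genes_per_grade < 2 or genes_per_grade % 2:
        raise ValueError("genes_per_grade must be a positive even number")
    half = genes_per_grade // 2
    selected: Dict[str, List[str]] = {}
    for grade, stats in t_stats.items():
        if len(stats) < genes_per_grade:
            raise ValueError("not enough genes to select from")
        ranked = sorted(stats, key=stats.get)  # ascending t statistic
        up = ranked[-half:][::-1]              # most upregulated first
        down = ranked[:half]                   # most downregulated first
        selected[grade] = up + down
    return selected


# Synthetic statistics for 100 genes in each of the three early-stage grades.
stats = {grade: {f"gene{i}": float(i - 50 + shift) for i in range(100)}
         for shift, grade in enumerate(["LGDN", "HGDN", "G1"])}
chosen = select_classifier_genes(stats)
print(sum(len(genes) for genes in chosen.values()))  # 120
```

With three grades per problem, 40 genes per grade gives the 120-gene classifier described in the text, and the two problems together give 240.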

The prediction confidence of a specimen can be assessed by the frequency with which it is correctly classified in 100 random partitions in three-fold cross-validation. A summary of the frequency of class assignments using the early- and late-stage genes is tabulated in Table 1. For the early-stage samples, 100% of the specimens were correctly classified the majority of the time by all three methods. For the late-stage samples, all but two arrays were classified correctly the majority of the time by all three methods. Specimens G2_02 and G3_08 were consistently misclassified into the adjacent lower grade.
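The per-specimen tally behind Table 1 can be sketched as follows, using two rows of that table as input. The function names are hypothetical, and "the majority of the time" is interpreted here as strictly more than half of the repetitions, which is an assumption.

```python
from collections import Counter
from typing import Dict, Sequence


def assignment_frequencies(predictions: Sequence[str]) -> Dict[str, int]:
    """Count how often a specimen was assigned to each class across repetitions."""
    return dict(Counter(predictions))


def classified_correctly_by_majority(predictions: Sequence[str], true_label: str) -> bool:
    """True if the specimen received its true label in more than half of the repetitions."""
    return Counter(predictions)[true_label] * 2 > len(predictions)


# G2_02 under DLDA in Table 1: assigned to G1 57 times and to G2 43 times.
g2_02 = ["G1"] * 57 + ["G2"] * 43
print(classified_correctly_by_majority(g2_02, "G2"))  # False

# HGDN_01 under the linear SVM in Table 1: LGDN 3, HGDN 57, G1 40.
hgdn_01 = ["LGDN"] * 3 + ["HGDN"] * 57 + ["G1"] * 40
print(classified_correctly_by_majority(hgdn_01, "HGDN"))  # True
```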

We next extended our validation of the 240 classifier genes to an independent set of specimens consisting of 5 new samples and the 9 samples that were previously excluded from the initial analysis. As shown in Table 2, we were able to correctly classify all 5 of the new fresh samples. Furthermore, despite RNA quality concerns, the majority of the remaining 9 samples (7 of 9) were also classified correctly. These data, though limited by a relatively small test set, suggest that these 240 progression-associated genes could be clinically useful classifiers for assisting diagnosis of all stages of hepatocarcinogenesis.

Fig. 4. Determination of optimal classification strategies and high-accuracy gene classifiers. (A) Classification accuracies as a function of gene (classifier) number are shown for each of three different classification methods (DLDA, SVM linear, and SVM radial), for each of the two grade problems: G1-G2-G3 and LGDN-HGDN-G1. (B,C) Genes were classified according to OVA t test–derived P values for the LGDN-HGDN-G1 problem (B) and the G1-G2-G3 problem (C), and the top 20 most highly expressed and 20 most underexpressed genes in each grade class (i.e., 40 genes per grade class) were selected to constitute a 120-gene classifier for each problem. G1-3, Edmondson grades 1-3; LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; DLDA, diagonal linear discriminant analysis; SVM, support vector machine.

Discussion

Patients with HCC have a poor prognosis because most HCCs are detected at a stage too late for curative treatment. Therefore, early detection of small HCC or precancerous lesions appears to be the best way to achieve better therapeutic results. However, morphological and molecular features of precancerous lesions are far from being fully elucidated. The terminology of nodular hepatocellular lesions adopted by the International Working Party of the 1995 World Congress of Gastroenterology suggests that there is a continuum in hepatocarcinogenesis that includes LGDNs, HGDNs, and dysplastic nodules with microscopic foci of HCC, which may enlarge and replace the nodule giving rise to a small HCC, and finally advanced HCC.7,9 Despite the fact that this group provided several morphological criteria to discriminate between well-differentiated HCC and HGDN and/or LGDN, they acknowledged that a strict line could not be drawn between premalignant and malignant lesions by simple microscopic observation.

The recent advance of DNA microarray technology, a high-throughput method of monitoring gene expression, has made it possible to analyze the expression of thousands of genes at once. Consequently, expression profiling by microarrays has been profitably applied to gene discovery and class determination in human cancers.29 To understand molecular changes associated with the developmental stages of HCC, we assessed gene expression profiles of the different histopathological stages of HCC, including LGDNs, HGDNs, and G1-G3 HCCs, using a high-density spotted oligo-nucleotide microarray analysis. We observed not only the clear separation of dysplastic nodule (CI) from overt cancer (CII) but also grade-specific subclusters of HCC in CII via unsupervised hierarchical clustering analysis (Fig. 1). These results indicate that there is a clear difference in molecular signature between each histological grade in the progression of HCC. However, there is some molecular heterogeneity in G1 HCC. Most G1 HCCs (6/9) had expression profiles showing more relation to the frank carcinoma and clustered adjacent to G2 and G3 HCCs as shown in Fig. 1; however, 3 cases of G1 HCCs (G1-5, G1-6, and G1-9) were grouped into precancerous nodules. Among these 3 cases, G1-9 was confirmed as eHCC upon histopathological review. In fact, histologically defined G1 HCC lesions can be further divided into small HCC with indistinct margin (eHCC) and small nodular HCC with distinct margin, with more than half of these encapsulated by a thin fibrous capsule.30 Unlike eHCC, which lacks an invasive growth pattern, the latter lesion revealed tumor invasion into the portal vein and intrahepatic metastasis in 27% (G1-5) and 10% (G1-6).31 This heterogeneity strongly suggests that G1 HCC might border between pre-neoplastic lesion and outright carcinoma, representing a transition state from dysplasia to carcinoma. Furthermore, we examined whether the replicative state of HBV infection (as measured by HBV DNA levels in serum) could influence the expression profiles. We were unable to find any significant correlation between our gene expression results and the replicative state of HBV in the samples, suggesting that the replicative state of HBV has little or no measurable effect on gene transcription in our HCC samples.

Table 1. LGDN-HGDN-G1 Confusion Matrix and G1-G2-G3 Confusion Matrix of the DLDA and SVM Classifiers

LGDN-HGDN-G1 problem

              Diagonal LDA        SVM Linear        SVM Radial
Sample    LGDN  HGDN    G1  LGDN  HGDN    G1  LGDN  HGDN    G1
LGDN_02     95     5     0    97     3     0    96     4     0
LGDN_03     65    35     0    80    20     0    76    24     0
LGDN_05    100     0     0   100     0     0   100     0     0
LGDN_07    100     0     0   100     0     0   100     0     0
LGDN_08     99     1     0    99     1     0    99     1     0
LGDN_09     99     1     0    98     2     0    97     3     0
LGDN_10    100     0     0   100     0     0    99     1     0
HGDN_01      1    93     6     3    57    40     3    62    35
HGDN_02      0   100     0     0    96     4     0    98     2
HGDN_05      0   100     0     0   100     0     0   100     0
HGDN_06      0   100     0     0    98     2     0    97     3
HGDN_07      0   100     0     0   100     0     0   100     0
HGDN_08      0   100     0     0   100     0     0   100     0
HGDN_10      0   100     0     1    99     0     3    97     0
G1_01        0     0   100     0     0   100     0     0   100
G1_03        0     0   100     0     0   100     0     0   100
G1_04        0     0   100     0     0   100     0     0   100
G1_05        2    10    88     0     0   100     1     0    99
G1_06        0     2    98     0     0   100     0     1    99
G1_07        0     2    98     0     9    91     0     7    93
G1_08        0     3    97     0     3    97     0     3    97
G1_09        0    42    58     1     6    93     1     7    92
G1_10        0     0   100     0     0   100     0     0   100

G1-G2-G3 problem

              Diagonal LDA        SVM Linear        SVM Radial
Sample      G1    G2    G3    G1    G2    G3    G1    G2    G3
G1_01       62    37     1    85    15     0    86    14     0
G1_03       94     6     0    82    18     0    87    13     0
G1_04       99     1     0    99     1     0    98     2     0
G1_05      100     0     0   100     0     0   100     0     0
G1_06      100     0     0   100     0     0   100     0     0
G1_07       88    12     0    95     5     0    98     2     0
G1_08       89    11     0    97     3     0    95     5     0
G1_09      100     0     0    99     1     0    99     1     0
G1_10      100     0     0   100     0     0   100     0     0
G2_01        0    95     5     0    97     3     0    94     6
G2_02*      57    43     0    60    40     0    66    34     0
G2_03        0   100     0     4    93     3     3    91     6
G2_04        1    99     0     5    94     1     5    95     0
G2_05        0   100     0     0    99     1     0    99     1
G2_06        0    92     8     0    90    10     0    86    14
G2_07        2    97     1    11    84     5    14    83     3
G2_08        0    89    11     0    68    32     0    62    38
G2_09        0    97     3     1    88    11     1    84    15
G2_10        1    99     0    12    88     0    18    82     0
G3_02        0     0   100     0     0   100     0     0   100
G3_04        0    17    83     0    22    78     0    26    74
G3_05        0     0   100     0     0   100     0     0   100
G3_06        0     1    99     0     7    93     0     7    93
G3_07        0     0   100     0     2    98     0     6    94
G3_08*       0    57    43     0    65    35     0    64    36
G3_09        0     2    98     0    10    90     0    11    89
G3_10        0     2    98     0     5    95     1     7    92

Abbreviation: LDA, linear discriminant analysis.
*Indicates misclassified sample.

Table 2. Good Overall Classification by DLDA on Two Independent Datasets

Dataset            Sample     Class   Prediction
New dataset        HCC_168    G1      G1
New dataset        HCC_141    G2      G2
New dataset        HCC_143    G1      G1
New dataset        HCC_203    HGDN    HGDN
New dataset        HCC_219    G2      G2
Excluded dataset   LGDN_01    LGDN    LGDN
Excluded dataset   LGDN_04    LGDN    LGDN
Excluded dataset   LGDN_06    LGDN    LGDN
Excluded dataset   HGDN_03    HGDN    HGDN
Excluded dataset   HGDN_04    HGDN    HGDN
Excluded dataset   HGDN_09    HGDN    HGDN
Excluded dataset   G1_02      G1      G1
Excluded dataset   G3_01*     G3      G2
Excluded dataset   G3_03*     G3      G1

NOTE. The new dataset comprises five previously unanalyzed tumor samples; the excluded dataset includes the nine arrays that were previously excluded from analysis due to suboptimal RNA or hybridization features.
*Indicates misclassified sample.

816 NAM ET AL. HEPATOLOGY, October 2005
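As a reading aid for Table 1, each sample row lists three numbers per classifier, one for each candidate grade. The sketch below assumes these can be treated as per-class scores and that the called grade is simply the class with the largest score; that interpretation, and the helper names `call_grade` and `misclassified`, are assumptions made for illustration. Only a few DLDA rows from Table 1 are included.

```python
# Class labels for the two grading problems described in the text.
EARLY = ("LGDN", "HGDN", "G1")
LATE = ("G1", "G2", "G3")

# sample -> (histological grade, class labels, DLDA scores), copied from Table 1.
ROWS = {
    "HGDN_01": ("HGDN", EARLY, (1, 93, 6)),
    "G1_09": ("G1", EARLY, (0, 42, 58)),
    "G2_02": ("G2", LATE, (57, 43, 0)),
    "G3_04": ("G3", LATE, (0, 17, 83)),
    "G3_08": ("G3", LATE, (0, 57, 43)),
}


def call_grade(labels, scores):
    """Return the label whose score is largest (assumed decision rule)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def misclassified(rows):
    """Return the sorted names of samples whose call differs from histology."""
    return sorted(
        name
        for name, (truth, labels, scores) in rows.items()
        if call_grade(labels, scores) != truth
    )


if __name__ == "__main__":
    for name, (truth, labels, scores) in ROWS.items():
        print(name, truth, "->", call_grade(labels, scores))
    print("misclassified:", misclassified(ROWS))
```

Under this rule the two samples flagged with an asterisk in Table 1 (G2_02 and G3_08) are the ones called differently from their histological grade, while borderline rows such as G1_09 (42 versus 58) are still called correctly.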

We also identified 3,084 grade-associated genes whose transcript levels were either positively or negatively correlated with tumor progression through a combination of one-way ANOVA and one-versus-all (OVA) unpooled t test. Functional analysis of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC, including cell proliferation, protein synthesis, and hepatocyte-specific function (Fig. 2). Using Gene Ontology terms, we performed categorical analysis according to gene function and extracted a number of well-known genes, such as tumor suppressors, oncogenes, growth factors, effectors of apoptosis, and cell adhesion molecules involved in cell–cell and cell–matrix interactions, whose expression patterns were associated with grade. For example, RARRES3 (retinoic acid receptor responder 3) is a class II tumor suppressor gene (i.e., downregulated in tumorigenesis rather than mutated) with growth suppressive and apoptosis-inducing activity.32 It has previously been found to be downregulated in a manner correlated with progression of B-CLL33 and cellular de-differentiation in colorectal adenocarcinoma,34 consistent with our observation that this gene is downregulated in G2 and G3 HCCs. The majority of oncogenes and tumor suppressors identified here demonstrate expression patterns that systematically change from dysplasia to carcinoma, in some cases with alterations in expression already detectable in the pre-neoplastic state. It is therefore interesting to speculate that these genes, acting together or separately, could be directly involved in common pathways of HCC pathogenesis in a grade-dependent fashion.
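To make the two statistics named above concrete, the sketch below computes a one-way ANOVA F statistic and a one-versus-all unpooled (Welch) t statistic for a single gene across grade groups. It is a minimal illustration under stated assumptions: the function names and the toy expression values are hypothetical, conversion of the statistics to P values is omitted, and nothing here reproduces the thresholds used in the study.

```python
from statistics import mean, variance


def anova_f(groups):
    """One-way ANOVA F statistic for one gene measured in several groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def welch_t(a, b):
    """Unpooled (Welch) t statistic; each group needs at least 2 values."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def ova_t(groups, target):
    """One-versus-all t: the target group against all other groups combined."""
    rest = [v for i, g in enumerate(groups) if i != target for v in g]
    return welch_t(groups[target], rest)


if __name__ == "__main__":
    # Hypothetical log-expression values of one gene in three grade groups.
    toy = [[1.0, 1.2, 0.8], [2.0, 2.1, 1.9], [3.1, 2.9, 3.0]]
    print("F =", round(anova_f(toy), 2))
    for i in range(len(toy)):
        print("OVA t for group", i, "=", round(ova_t(toy, i), 2))
```

The F statistic asks whether a gene differs across grades at all, whereas the sign of each one-versus-all t indicates whether the gene is higher or lower in one grade relative to the rest, which is why the two tests are complementary.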

Due to the recent advances in diagnostic imaging techniques and increased clinical and pathological interest, small hepatocellular nodular lesions even less than 1 cm in size are frequently detected in patients with cirrhosis who have been monitored as high-risk patients. These nodules could be LGDNs, HGDNs, or well-differentiated small HCCs, and sonographic- or CT-guided needle biopsies from these nodules are performed routinely for differential diagnosis. However, it is often difficult, even for a hepatopathologist, to differentiate among these lesions, especially in needle-biopsied specimens with limited material. For this reason, the discovery of an objective molecular marker or classifier genes that will help to standardize histological differential diagnosis of these nodules and lead to appropriate treatment is eagerly anticipated. In the present study, we identified a subset of genes that could accurately classify specimens according to histological grade. We considered these as two separate problems rather than one five-grade problem: (1) discriminating among LGDNs, HGDNs, or G1 HCCs (early-stage lesions) and (2) discriminating among G1, G2, or G3 HCCs (late-stage lesions). We selected the top 20 most highly expressed and 20 most underexpressed genes in each grade class (i.e., 40 genes per grade class), resulting in 120 total genes for each problem. Indeed, of 23 pre-neoplastic lesions and G1 HCCs, none was misclassified by the chosen set of 120 early-stage–associated genes, and 2 of 27 samples were misclassified in the case of overt HCC (Table 1). We extended our validation analysis for the 240 outlier genes to an independent set of specimens consisting of 5 new samples and the 9 samples that were previously excluded from analysis because of RNA quality concerns. We were able to correctly classify all samples except for two G3 cases, which were misclassified as G2 and G1, respectively. Although more testing on a larger, independent set of tumors graded by a different pathologist will be necessary to establish the accuracy and clinical value of the classifier, this implies that a series of 240 outlier genes could potentially be good classifiers, especially for distinguishing among LGDNs, HGDNs, and G1 HCCs, via both DLDA and SVM.
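The selection-then-classification strategy described in this paragraph can be sketched in a few lines. The example below ranks genes within each grade class by a one-versus-all Welch t statistic, keeps the k highest and k lowest per class (k = 20 in the text, giving 40 genes per class), and then classifies with a textbook diagonal linear discriminant rule. Ranking by the t statistic rather than by P values, the equal-prior DLDA rule, the function names, and the toy data are all assumptions made for illustration; this is not the authors' code.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Unpooled (Welch) t statistic; each group needs at least 2 values."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / max(se, 1e-12)  # guard against zero variance


def select_genes(X, y, k=20):
    """Per class, keep the k most under- and k most over-expressed genes."""
    chosen = set()
    for cls in sorted(set(y)):
        scores = []
        for g in range(len(X[0])):
            inside = [x[g] for x, lab in zip(X, y) if lab == cls]
            outside = [x[g] for x, lab in zip(X, y) if lab != cls]
            scores.append((welch_t(inside, outside), g))
        scores.sort()
        chosen.update(g for _, g in scores[:k])   # k most underexpressed
        chosen.update(g for _, g in scores[-k:])  # k most highly expressed
    return sorted(chosen)


def dlda_fit(X, y, genes):
    """Class means plus one pooled within-class variance per selected gene."""
    classes = sorted(set(y))
    means = {
        c: [mean(x[g] for x, lab in zip(X, y) if lab == c) for g in genes]
        for c in classes
    }
    pooled = []
    for j, g in enumerate(genes):
        ss = sum((x[g] - means[lab][j]) ** 2 for x, lab in zip(X, y))
        pooled.append(max(ss / (len(X) - len(classes)), 1e-12))
    return classes, means, pooled


def dlda_predict(model, genes, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    classes, means, pooled = model

    def dist(c):
        return sum((x[g] - means[c][j]) ** 2 / pooled[j] for j, g in enumerate(genes))

    return min(classes, key=dist)


if __name__ == "__main__":
    # Hypothetical data: 9 samples x 4 genes; gene 3 carries no grade signal.
    X = [[5.0, 1.1, 0.9, 2.0], [5.2, 0.9, 1.0, 2.1], [4.8, 1.0, 1.1, 1.9],
         [1.0, 5.1, 1.0, 2.1], [1.1, 4.9, 0.9, 1.9], [0.9, 5.0, 1.1, 2.0],
         [1.0, 1.0, 5.0, 1.9], [0.9, 1.1, 5.2, 2.0], [1.1, 0.9, 4.8, 2.1]]
    y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
    genes = select_genes(X, y, k=1)
    model = dlda_fit(X, y, genes)
    print("selected genes:", genes)
    print("prediction:", dlda_predict(model, genes, [4.9, 1.0, 1.0, 2.0]))
```

In a real evaluation the gene selection would need to be repeated inside each cross-validation fold, since selecting genes on all samples before testing inflates apparent accuracy.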

In conclusion, despite numerous investigations of hepato-carcinogenesis, only limited or incomplete data are available regarding gene expression profiles during the development and progression of HCC in humans.18,19,35-37 Systematic approaches such as the simultaneous evaluation of genome-wide transcripts and regulatory pathways in precancerous lesions and HCCs are necessary to gain much-needed molecular insight into hepato-carcinogenesis. We uncovered the molecular signatures of pre-neoplastic lesions and early- and advanced-stage HCC. Our 240 classifier genes for distinguishing the early and advanced stages of HCC exhibited high fidelity in classification from pre-neoplastic lesions to HCCs. Through further informative analysis of these outlier genes and intensive clinical validations, we hope to identify clinically useful biomarkers that will facilitate early detection of liver cancer, and perhaps further elucidate the underlying molecular pathology of HCC.

References

1. Thorgeirsson SS, Grisham JW. Molecular pathogenesis of human hepatocellular carcinoma. Nat Genet 2002;31:339-346.
2. Pang A, Ng IO, Fan ST, Kwong YL. Clinicopathologic significance of genetic alterations in hepatocellular carcinoma. Cancer Genet Cytogenet 2003;146:8-15.
3. de La CA, Romagnolo B, Billuart P, Renard CA, Buendia MA, Soubrane O, et al. Somatic mutations of the betacatenin gene are frequent in mouse and human hepatocellular carcinomas. Proc Natl Acad Sci U S A 1998;95:8847-8851.
4. Satoh S, Daigo Y, Furukawa Y, Kato T, Miwa N, Nishiwaki T, et al. AXIN1 mutations in hepatocellular carcinomas, and growth suppression in cancer cells by virus-mediated transfer of AXIN1. Nat Genet 2000;24:245-250.
5. Takayama T, Makuuchi M, Hirohashi S, Sakamoto M, Okazaki N, Takayasu K, et al. Malignant transformation of adenomatous hyperplasia to hepatocellular carcinoma. Lancet 1990;336:1150-1153.
6. Theise ND, Park YN, Kojiro M. Dysplastic nodules and hepatocarcinogenesis. Clin Liver Dis 2002;6:497-512.
7. Terminology of nodular hepatocellular lesions. International Working Party. HEPATOLOGY 1995;22:983-993.
8. Tornillo L, Carafa V, Sauter G, Moch H, Minola E, Gambacorta M, et al. Chromosomal alterations in hepatocellular nodules by comparative genomic hybridization: high-grade dysplastic nodules represent early stages of hepatocellular carcinoma. Lab Invest 2002;82:547-553.
9. Sakamoto M, Hirohashi S, Shimosato Y. Early stages of multistep hepatocarcinogenesis: adenomatous hyperplasia and early hepatocellular carcinoma. Hum Pathol 1991;22:172-178.
10. Kanai T, Hirohashi S, Upton MP, Noguchi M, Kishi K, Makuuchi M, et al. Pathology of small hepatocellular carcinoma. A proposal for a new gross classification. Cancer 1987;60:810-819.
11. Takayama T, Makuuchi M, Hirohashi S, Sakamoto M, Yamamoto J, Shimada K, et al. Early hepatocellular carcinoma as an entity with a high rate of surgical cure. HEPATOLOGY 1998;28:1241-1246.
12. Kojiro M. Premalignant lesions of hepatocellular carcinoma: pathologic viewpoint. J Hepatobiliary Pancreat Surg 2000;7:535-541.
13. Theise ND, Park YN, Kojiro M. Dysplastic nodules and hepatocarcinogenesis. Clin Liver Dis 2002;6:497-512.
14. Edmondson HA, Steiner PE. Primary carcinoma of the liver: a study of 100 cases among 48,900 necropsies. Cancer 1954;7:462-503.
15. Ferrell LD, Crawford JM, Dhillon AP, Scheuer PJ, Nakanuma Y. Proposal for standardized criteria for the diagnosis of benign, borderline, and malignant hepatocellular lesions arising in chronic advanced liver disease. Am J Surg Pathol 1993;17:1113-1123.
16. Kenmochi K, Sugihara S, Kojiro M. Relationship of histologic grade of hepatocellular carcinoma (HCC) to tumor size, and demonstration of tumor cells of multiple different grades in single small HCC. Liver 1987;7:18-26.
17. Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, et al. Gene expression patterns in human liver cancers. Mol Biol Cell 2002;13:1929-1939.
18. Xu XR, Huang J, Xu ZG, Qian BZ, Zhu ZD, Yan Q, et al. Insight into hepatocellular carcinogenesis at transcriptome level by comparing gene expression profiles of hepatocellular carcinoma with those of corresponding noncancerous liver. Proc Natl Acad Sci U S A 2001;98:15089-15094.
19. Okabe H, Satoh S, Kato T, Kitahara O, Yanagawa R, Yamaoka Y, et al. Genome-wide analysis of gene expression in human hepatocellular carcinomas using cDNA microarray: identification of genes involved in viral carcinogenesis and tumor progression. Cancer Res 2001;61:2129-2137.
20. Lee JS, Chu IS, Heo J, Calvisi DF, Sun Z, Roskams T, et al. Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. HEPATOLOGY 2004;40:667-676.
21. Eisen MB, Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol 1999;303:179-205.
22. DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997;278:680-686.
23. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95:14863-14868.
24. Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A 2000;97:262-267.
25. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000;16:906-914.
26. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 2001;98:15149-15154.
27. Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002;13:1977-2000.
28. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics 2001;17:520-525.
29. Kim I, Kang HC, Park J. Microarray application in cancer research. Cancer Research and Treatment 2004;36:207-213.
30. Kojiro M. Pathology of early hepatocellular carcinoma: progression from early to advanced. Hepatogastroenterology 1998;45(Suppl 3):1203-1205.
31. Kojiro M, Yano H, Nakashima O. Pathology of early hepatocellular carcinoma: progression from early to advanced. Semin Surg Oncol 1996;12:197-203.
32. Huang SL, Shyu RY, Yeh MY, Jiang SY. The retinoid-inducible gene I: effect on apoptosis and mitogen-activated kinase signal pathways. Anticancer Res 2002;22:799-804.
33. Casanova B, de la Fuente MT, Garcia-Gila M, Sanz L, Silva A, Garcia-Marco JA, Garcia-Pardo A. The class II tumor-suppressor gene RARRES3 is expressed in B cell lymphocytic leukemias and down-regulated with disease progression. Leukemia 2001;15:1521-1526.
34. Shyu RY, Jiang SY, Chou JM, Shih YL, Lee MS, Yu JC, et al. RARRES3 expression positively correlated to tumour differentiation in tissues of colorectal adenocarcinoma. Br J Cancer 2003;89:146-151.
35. Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, et al. Gene expression patterns in human liver cancers. Mol Biol Cell 2002;13:1929-1939.
36. Chuma M, Sakamoto M, Yamazaki K, Ohta T, Ohki M, Asaka M, et al. Expression profiling in multistage hepatocarcinogenesis: identification of HSP70 as a molecular marker of early hepatocellular carcinoma. HEPATOLOGY 2003;37:198-207.
37. Lee JS, Chu IS, Heo J, Calvisi DF, Sun Z, Roskams T, et al. Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. HEPATOLOGY 2004;40:667-676.
