Hierarchical Clustering of Flow Cytometry Data for the Study of Conventional Central Chondrosarcoma


ORIGINAL ARTICLE

Journal of Cellular Physiology

JOSE DIAZ-ROMERO,1* SALVATORE ROMEO,2 JUDITH V.M.G. BOVÉE,2 PANCRAS C.W. HOGENDOORN,2 PAUL F. HEINI,3 AND PIERRE MAINIL-VARLET1

1 Osteoarticular Research Group, Institute of Pathology, University of Bern, Bern, Switzerland
2 Department of Pathology, Leiden University Medical Centre, Leiden, The Netherlands
3 Department of Orthopedic Surgery, Inselspital University of Bern, Bern, Switzerland

We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, and several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped into two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but also clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to the discovery of new cancer subtypes.

J. Cell. Physiol. 225: 601–611, 2010. © 2010 Wiley-Liss, Inc.

Additional Supporting Information may be found in the online version of this article.

Contract grant sponsor: European Commission; Contract grant number: 018814.

Salvatore Romeo’s present address is Department of Pathology, Treviso Regional Hospital, Treviso, Italy.

Paul F. Heini’s present address is Department of Neurosurgery and Spinal Orthopedics, Sonnenhof Hospital, Bern, Switzerland.

*Correspondence to: Jose Diaz-Romero, Osteoarticular Research Group, Institute of Pathology, University of Bern, Murtenstrasse 31, 3010 Bern, Switzerland. E-mail: [email protected]

Received 10 March 2010; Accepted 7 May 2010

Published online in Wiley Online Library (wileyonlinelibrary.com), 19 May 2010. DOI: 10.1002/jcp.22245

Chondrosarcomas are a heterogeneous group of malignant bone tumors characterized by the production of cartilaginous matrix, with conventional chondrosarcomas being the most common variant (Bovee et al., 2005). Conventional chondrosarcomas can be further divided, according to their location in bone, into central and peripheral, with the majority located centrally within the medullary cavity. The histological grading of conventional chondrosarcomas into three grades of malignancy is until now the only statistically significant way to predict clinical behavior, even though the grading is subject to considerable inter-observer variability (Aigner et al., 2002; Boeuf et al., 2008; Eefting et al., 2009). In spite of advances in the understanding of chondrosarcomas at the molecular level, no established molecular markers for tumor identification or classification are available (Aigner, 2002), and the putative cell of origin of these tumors remains uncertain (Aigner, 2002; Boeuf et al., 2008). A debate currently exists as to whether central chondrosarcoma arises from a cartilaginous remnant in the bone after closure of the growth plate or from a mesenchymal stem cell.

The emerging fields of functional genomics and proteomics have revolutionized the approach to disease classification, with simultaneous analysis of thousands of genes or proteins to delineate expression patterns (Habib and Finn, 2006). Although often not considered as such, flow cytometry (FCM) is at its essence a proteomic method, albeit on a relatively small scale. Although limited in the number of proteins simultaneously analyzable, FCM is widely available, can isolate specific cell populations, and more importantly, it measures protein expression in the cell, an intact functional unit. Although currently used more for marker detection or validation, newer approaches to FCM analysis involve the treatment of data as single high-dimensional datasets and subsequent analysis through cluster algorithms (Petrausch et al., 2006). Cluster analysis aims to group either the data or the variables into clusters such that the elements within a cluster have a high degree of association among themselves while clusters remain relatively distinct from one another. Hierarchical clustering (HC) methods partition the objects without a predetermined number of clusters into a tree of nodes, with each node representing a cluster. HC is popular in biology because the results can be easily visualized using heat maps and dendrograms, facilitating the identification of potential relationships within a biologically relevant context, and is widely used in gene expression (de Souto et al., 2008) and proteomics (Meunier et al., 2007; Sardiu et al., 2009) studies. However, HC approaches to analyze FCM biomedical data have been only sporadically used to investigate normal and malignant hematopoietic cells (Maynadie et al., 2002; Habib and Finn, 2006; Rawstron et al., 2006; Zucchetto et al., 2006; Lugli et al., 2007).

TABLE 1. Chondrosarcoma samples used in the present study

Case    Grade   Location     Gender   Age at diagnosis (years)   Follow-up (months)   Outcome
L1812   GI      Femur        F        27                         42                   NED
L1892   GI      Humerus      F        55                         38                   NED
L2174   GI      Femur        F        49                         32                   NED
L2182   GI      Humerus      F        40                         15                   NED
L2252   GI      Femur        F        29                         17                   NED
L2279   GI      Humerus      F        43                         27                   Recurrence
L2345   GI      Fibula       F        65                         20                   NED
L1662   GII     Sternum      M        59                         49                   NED
L1729   GII     Rib          F        46                         41                   NED
L2388   GII     Acetabulum   M        42                         16                   Recurrence
L1515   GIII    Pelvis       M        60                         61                   Metastasis
L1158   GIII    Femur        M        69                         96                   Metastasis, DOD
L1679   GIII    Pelvis       F        40                         57                   Recurrence, DOD

F, female; M, male; NED, no evidence of disease; DOD, died of disease.

There is no one-size-fits-all solution to clustering, and the choice of a suitable approach depends both on the type of data and on the particular purpose of the study (D’Haeseleer, 2005; Sardiu et al., 2009). HC is a multiple-choice process in which different selections of criteria for analysis can yield very different results for the same data set (Meunier et al., 2007; Sardiu et al., 2009). In order to put samples and variables on comparable scales, data preprocessing is needed before clustering can be performed (Verhaak et al., 2006; Meunier et al., 2007). Furthermore, the selection of an appropriate metric (measuring the distance between pairs of observations) and a linkage criterion (specifying the dissimilarity of sets as a function of the pairwise distances of observations in the sets) is required. HC is subdivided into agglomerative methods (the more commonly used), which proceed by a series of fusions of the "n" objects into groups, and divisive methods, which separate the "n" objects successively into finer groupings (Belacel et al., 2006). Therefore, HC needs extensive parameter fine-tuning in order to reveal biologically meaningful results.
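
As an illustration of how the choice of metric changes what "similar" means, the sketch below computes three distances between two marker profiles: Euclidean, Manhattan, and a correlation-based distance. It is written in Python with only the standard library and is not the software used in this study. The function names and the two profiles are hypothetical, and the correlation distance is taken as 1 - r, which is one common convention and an assumption here.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """Correlation-based distance, assumed here to be 1 - Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical profiles: same shape, tenfold difference in magnitude.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [10.0, 20.0, 30.0, 40.0]

d_euclidean = euclidean(profile_a, profile_b)
d_manhattan = manhattan(profile_a, profile_b)
d_pearson = pearson_distance(profile_a, profile_b)
```

The two profiles differ only in scale, so the Euclidean and Manhattan distances are large while the correlation-based distance is zero. This is the kind of disagreement that makes the metric a real analysis choice rather than a detail.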

In the present study, we have investigated the use of different data preprocessing strategies, distance metrics, and a wide variety of clustering algorithms in order to group conventional central chondrosarcomas according to their cell surface marker expression profiles obtained by FCM. By using known cell types for which an independent knowledge of cluster membership was available (human articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines) we could: (a) assess cluster accuracy and guide the choice of the different clustering approaches tested; and (b) establish similarities with chondrosarcoma cells that can provide clues concerning the cellular origin of these tumors.

Materials and Methods

Cell isolation and cell culture

Human articular cartilage samples were obtained from the lateral femoral condyles of knee joints from 10 subjects at autopsy within 24 h post-mortem (in agreement with the ethical guidelines of the Institute of Pathology, University of Bern). None of the subjects had a clinical history of arthritis or any other pathology affecting cartilage, and the specimens appeared normal by morphological examination. Human articular chondrocytes were isolated from articular cartilage samples by sequential pronase/collagenase digestion as previously described (Diaz-Romero et al., 2005). Human bone marrow samples were obtained by standard aspiration from the iliac crest of 10 donors undergoing spinal fusion surgery at the Inselspital (University of Bern) according to local ethical committee guidelines. The mononuclear cell fraction from each bone marrow sample was obtained by Ficoll-Paque Plus (GE Healthcare, Otelfingen, Switzerland) separation of low-density cells from red blood cells and granulocytes. Mesenchymal stem cells were isolated from the mononuclear cell fraction by adherence selection to plastic in vitro (Diaz-Romero et al., 2008). Human primary fibroblasts were isolated from non-fixed mastoplasty reduction specimens by mechanical and enzymatic digestion as previously described (Szuhai and Tanke, 2006). The human chondrosarcoma cell lines CH2879 (Gil-Benso et al., 2003) and OUMS27 (Kunisada et al., 1998) were generously provided by Professor A. Llombart-Bosch and Dr. M. Namba, respectively. The human chondrosarcoma cell line SW1353 (Ouyang, 1998), the human cervical carcinoma cell lines CaSki (Pattillo et al., 1977) and HeLa (Gey et al., 1952), the human colon adenocarcinoma cell line SW48 (Leibovitz et al., 1976), the human epidermoid lung carcinoma cell line Calu-1 (Perucho et al., 1981), and the human prostatic carcinoma cell line PC-3 (Kaighn et al., 1979) were obtained from the American Type Culture Collection (ATCC, Rockville, MD). Conventional central chondrosarcoma samples were obtained from resected specimens at the Leiden University Medical Center. According to the histological grading (Eefting et al., 2009), the chondrosarcoma samples included 7 grade I, 3 grade II, and 3 grade III chondrosarcomas (Table 1). All specimens were handled according to the ethical guidelines as described in the Code for Proper Secondary Use of Human Tissue in The Netherlands of the Dutch Federation of Medical Scientific Societies. Human primary central chondrosarcoma cells were isolated from non-fixed chondrosarcoma samples by mechanical and enzymatic digestion as previously described (Szuhai and Tanke, 2006). All cells were cultured in monolayer in a humidified 37°C/5% CO2 incubator in culture medium containing D-MEM/F-12 (Life Technologies, Basel, Switzerland), 10% fetal bovine serum (Hyclone, Lausanne, Switzerland), 100 U/ml penicillin + 100 µg/ml streptomycin (Life Technologies), 1 ng/ml of TGF-β1 (Acris Antibodies, Herford, Germany), and 5 ng/ml of FGF-2 (Acris Antibodies). TGF-β1 and FGF-2 have been shown to increase the proliferation of chondrocytes during monolayer culture (Jakob et al., 2001), and preliminary experiments in our group have shown a similar effect on primary chondrosarcoma cells, which otherwise proliferate very poorly (data not shown). Cells were passaged at subconfluence until passage 3, harvested with trypsin/EDTA (Life Technologies), and cryopreserved in liquid nitrogen until FCM analysis as previously described (Diaz-Romero et al., 2005).

Flow cytometry

Immunophenotypic analysis was performed as previously described (Diaz-Romero et al., 2005). Cells were stained with one of the following monoclonal antibodies directly conjugated to either FITC or PE: anti-CD90 FITC, anti-CD105 FITC, anti-CD10 PE, anti-CD49b PE (Serotec, Dusseldorf, Germany), anti-CD26 FITC, anti-CD49a PE, anti-CD49c PE, anti-CD49f PE, anti-CD104 PE, anti-CD221 PE (Becton Dickinson, Allschwil, Switzerland), and anti-CD14 PE (Diatec, Oslo, Norway). 7-aminoactinomycin D (7-AAD, Invitrogen, Basel, Switzerland) was added to each sample to exclude dead (permeable) cells. Non-specific staining was assessed using isotype controls. Cells were analyzed on a FACScan flow cytometer (Becton Dickinson), and for each sample, a region for live cells (cells excluding 7-AAD) was defined and at least 10,000 cells were acquired. Data analysis was performed with FlowJo software (version 3.4, Tree Star Inc., San Carlos, CA) to calculate the mean fluorescence intensity (MFI) of the cells. For background normalization, levels of expression were expressed as MFI ratios. These values represent the MFI determined for each specific antibody divided by the MFI of the appropriate isotype control. Tukey’s box-and-whisker plots of the level of expression of the different markers were generated using GraphPad Prism 5 (GraphPad Software, Inc., San Diego, CA).
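
The background normalization described above is a simple ratio, and a minimal Python sketch makes the convention explicit. The function name and all fluorescence values below are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
def mfi_ratio(mfi_antibody, mfi_isotype):
    """Level of expression as MFI of the specific antibody over MFI of its isotype control."""
    if mfi_isotype <= 0:
        raise ValueError("isotype control MFI must be positive")
    return mfi_antibody / mfi_isotype


# Hypothetical MFI readings for one sample (arbitrary fluorescence units).
stained = {"CD90": 1520.0, "CD14": 4.1}
isotype = {"CD90": 4.0, "CD14": 4.0}

ratios = {marker: mfi_ratio(stained[marker], isotype[marker]) for marker in stained}
```

A ratio close to 1 means the specific staining is indistinguishable from the isotype control, that is, no detectable expression, while a strongly expressed marker can reach values several orders of magnitude higher.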

Hierarchical clustering

All analyses were performed using the open-source statistical package R (Gentleman et al., 2004), based on the level of expression matrix, with rows corresponding to cell surface markers and columns corresponding to samples. Prior to clustering, data cleaning and data transformation steps were used for data preprocessing. Multivariate outlier detection was used for data cleaning of the four cell types with known membership (human articular chondrocytes, mesenchymal stem cells, fibroblasts, and tumor cell lines) based on the Euclidean interpoint distance matrix for each class/cell type (Marchette and Solka, 2003), after z-score normalization of the columns/samples (where the column mean is subtracted from each value and the result is divided by the standard deviation). Using these matrices, the interpoint distance for each sample, calculated as the average distance from any given sample to the rest of the members of the class, and the interpoint distance for each class, calculated as the average distance of all samples inside a class, were obtained. Samples exceeding two standard deviations from the corresponding class interpoint distance were considered outliers and removed from further analysis. Two popular data transformation procedures in clustering analysis, logarithmic transformation (base 2) and z-score normalization (of rows/markers), were evaluated and compared to non-transformed data. The logarithmic transformation compresses the dynamic range of levels of expression, reducing the effect of large influential values, while the z-score normalization puts all of the values on the same scale and centers the data on 0.

Each type of transformed data was applied to eight different hierarchical clustering algorithms to systematically compare and contrast the results. Single linkage, complete linkage, average linkage, weighted average linkage, centroid linkage, weighted centroid linkage, and Ward’s linkage are commonly used agglomerative methods (Morgan and Ray, 1995). Single and complete linkage calculate the distance between two clusters based on individual objects (the closest or the farthest, respectively), whereas average linkage uses average distances between all pairs of objects, and centroid linkage uses distances between centroids (average points in the multidimensional space defined by the dimensions) (de Souto et al., 2008). Weighted average and weighted centroid (median) linkage are identical to average and centroid linkage, respectively, apart from the fact that weighting is introduced into the computations to take into consideration differences in cluster sizes. Ward’s linkage does not compute distances but rather forms clusters by minimizing the within-cluster variance. DIANA, in contrast to the other algorithms, is divisive and, starting with all objects in one cluster, forms clusters by splitting them based on maximum pair-wise dissimilarities (Kaufman and Rousseeuw, 2005).

Each of the algorithms was implemented with three different distance metrics that are commonly used for gene and protein expression clustering analysis: the Pearson correlation, the Euclidean distance, and the Manhattan distance (D’Haeseleer, 2005). While Euclidean and Manhattan measure the distance between pairs of data points in space, Pearson quantifies the correlation between them. Agglomerative clustering was performed with the hcluster function in the "amap" package (http://cran.r-project.org/web/packages/amap/index.html), divisive clustering was performed with the diana function in the "cluster" package, and the heatmap.2 function in the "gplots" package (http://cran.r-project.org/web/packages/gplots/index.html) was used for clustering visualization.

For cluster validation, both external and internal validation techniques were used (Handl et al., 2005). External validations evaluate clustering results based on the knowledge of the correct class membership, permitting an entirely objective evaluation and comparison of clustering algorithms on benchmark data. The corrected Rand index, calculated with the classAgreement function in the "e1071" package (http://cran.r-project.org/web/packages/e1071/index.html), was used as an external validation method (Hubert and Arabie, 1985; de Souto et al., 2008) to compare the a priori known class membership of the data (human articular chondrocytes, mesenchymal stem cells, fibroblasts, and tumor cell lines) with the different clustering solutions when a number of four clusters was imposed (using the cutree function in the "stats" package). Internal validation techniques do not use additional knowledge in the form of class membership, but base their quality estimate on the information intrinsic to the data alone. The cophenetic correlation coefficient was used as an internal validation method (Sokal and Rohlf, 1962). This coefficient is defined as the Pearson correlation (calculated with the cor function in the "stats" package) between the distance matrix of the data before clustering (calculated with the Dist function in the "amap" package), and the cophenetic distance matrix derived from the clustering solution (calculated with the cophenetic function in the "stats" package).
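
To make the external validation step concrete, the following sketch computes the corrected (adjusted) Rand index in the Hubert and Arabie form from two label vectors. It is a standard-library Python illustration, not the classAgreement implementation used in the study; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def corrected_rand_index(true_labels, cluster_labels):
    """Adjusted Rand index between a known partition and a clustering solution."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label vectors must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Contingency table counts and its row/column totals.
    cells = Counter(zip(true_labels, cluster_labels))
    rows = Counter(true_labels)
    cols = Counter(cluster_labels)

    # Number of sample pairs that fall together in each table entry or margin.
    sum_cells = sum(comb(count, 2) for count in cells.values())
    sum_rows = sum(comb(count, 2) for count in rows.values())
    sum_cols = sum(comb(count, 2) for count in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (sum_rows + sum_cols)        # agreement of a perfect match
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical example: six samples from three known classes.
known = ["HAC", "HAC", "MSC", "MSC", "FIB", "FIB"]
cri_perfect = corrected_rand_index(known, [1, 1, 2, 2, 3, 3])
cri_one_error = corrected_rand_index(known, [1, 1, 2, 3, 3, 3])
```

In this toy case a single misplaced sample already lowers the index from 1 to about 0.44, which shows how sharply the measure penalizes disagreement in small data sets.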

Results

Data set

The FCM data set (available as Supplemental Information) consists of 10 human articular chondrocytes (HAC), 10 mesenchymal stem cells (MSC), 6 fibroblasts (FIB), 8 tumor cell lines (TCL), and 13 primary central conventional chondrosarcoma cells (CS) of different grades. All cells were cultured in monolayer under the same conditions and analyzed by FCM for the level of expression, based on mean fluorescence intensity (MFI) ratios, of 11 surface markers (Fig. 1). While most of these markers were selected based on the potential to distinguish between cell types of mesenchymal lineage (Diaz-Romero et al., 2008), CD221 (insulin-like growth factor-I receptor) and CD104 (β4 integrin subunit) were included due to the reported overexpression in tumors, with CD221 expressed in a variety of malignancies and CD104 more restricted to epithelial-derived cancers (Li et al., 2009; Yang et al., 2009). All markers showed a unimodal profile by FCM in all samples (data not shown), indicative of the lack of cellular subpopulations (Diaz-Romero et al., 2008). A large dynamic range for the level of expression of the cell surface markers analyzed was found, with some markers, such as CD14, exhibiting very low values, and others, such as CD90, displaying values several orders of magnitude higher. While the samples included in the HAC, MSC, and FIB groups could be considered as biological replicates, the TCL group consisted of cell lines of different origin (three chondrocytic and five epithelial), which is reflected in the heterogeneous expression of the cell surface markers investigated in this group (Supplemental Fig. 1).

Data cleaning by multivariate outlier detection

Fig. 1. Boxplots displaying the level of expression of cell surface markers on chondrocytes (HAC), bone-marrow derived mesenchymal stem cells (MSC), fibroblasts (FIB), tumor cell lines (TCL), and primary chondrosarcoma cells (CS). The solid circles outside of any boxplot represent outliers with the number indicating the identity of the sample: 1 = CH2879, 2 = L2388, 3 = CALU1, 4 = HAC10, 5 = MSC8, 6 = CALU1, 7 = L1515, 8 = MSC10, 9 = CALU1, 10 = HAC4, 11 = CH2879, 12 = L1679, 13 = L2345, 14 = CH2879, 15 = L1662, 16 = L1812, 17 = CH2879, 18 = SW48. Dashed lines indicate a level of expression of 1 (no expression).

To guide the choice of the HC method to be used for clustering CS, we took advantage of the fact that the data set utilized in the current study contains cell types with known class labels (HAC, MSC, FIB, and TCL). Therefore, different clustering solutions can be evaluated by measuring the agreement of the cluster partition with the known class labels (Yeung et al., 2003). However, an important consideration for this approach is to discard, prior to clustering, potential outliers inside each class that can reduce the quality of data analysis and lead to erroneous results. While the detection of univariate outliers (as shown in Fig. 1) is relatively straightforward, techniques for detecting multivariate outliers become more complicated, especially for small data sets such as the one used in the present study. In this study we have used a novel approach based on sample interpoint distance matrices, previously proposed as a visualization technique for multivariate outlier detection (Marchette and Solka, 2003). A multivariate outlier in each of the classes analyzed was defined as a sample whose average Euclidean interpoint distance was above a predetermined threshold (2 SD of the class interpoint distance), and the identified multivariate outliers (HAC10 and MSC10) were removed before further analysis (Fig. 2).
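
The outlier rule can be sketched in a few lines of Python (standard library only). This is an illustrative reading of the description above rather than the authors' code: it assumes that the threshold is the class interpoint distance plus 2 SD of the per-sample average distances, and that the sample standard deviation (n - 1) is used. The function names and the eight marker profiles are hypothetical.

```python
import math
import statistics


def zscore(values):
    """Center one sample's marker profile on 0 and scale it to unit SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return [(v - mean) / sd for v in values]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def find_outliers(samples, n_sd=2.0):
    """Return names of samples whose average interpoint distance exceeds the threshold.

    `samples` maps a sample name to its marker profile; all samples belong to one class.
    """
    names = list(samples)
    scaled = {name: zscore(samples[name]) for name in names}

    # Sample interpoint distance: mean distance to every other member of the class.
    sample_dist = {}
    for name in names:
        distances = [euclidean(scaled[name], scaled[other])
                     for other in names if other != name]
        sample_dist[name] = statistics.mean(distances)

    values = list(sample_dist.values())
    class_dist = statistics.mean(values)  # class interpoint distance
    threshold = class_dist + n_sd * statistics.stdev(values)
    return [name for name in names if sample_dist[name] > threshold]


# Hypothetical class of eight samples measured on four markers.
one_class = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.0, 3.0, 4.0],
    "s3": [1.0, 2.1, 3.0, 4.0],
    "s4": [1.0, 2.0, 3.1, 4.0],
    "s5": [1.0, 2.0, 3.0, 4.1],
    "s6": [1.2, 2.0, 3.0, 4.0],
    "s7": [1.0, 2.0, 3.2, 4.0],
    "s8": [4.0, 3.0, 2.0, 1.0],  # reversed profile, expected to stand out
}
outliers = find_outliers(one_class)
```

One practical caveat of any mean-plus-2-SD rule is that in a very small class a single extreme sample inflates the SD it is judged against, so the rule becomes harder to trigger as the class size shrinks.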

Evaluation of hierarchical clustering approaches

Fig. 2. Detection of multivariate outliers based on sample interpoint distances. Left parts show the sample Euclidean distance matrices (after z-score normalization of the samples) for (A) HAC, (B) MSC, (C) FIB, and (D) TCL. The color intensity is related to the interpoint distance between samples, with lower values represented as green colors, progressing to black for intermediate values, and reds for higher values. Right parts show plots of average interpoint distance for each sample, calculated from each row in the corresponding sample Euclidean distance matrix (A, B, C, or D). Outliers (gray bars) were defined as samples exceeding 2 SD above the class (cell type) interpoint distance (dashed line), obtained by averaging sample interpoint distances. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Recovery of cluster structure, measured via the corrected Rand index (cRi), was used to evaluate the effect of different combinations of data transformation, distance metrics, and algorithms on clustering results for HAC, MSC, FIB, and TCL. By setting the number of clusters to the true number (4) of known classes in the data set, cRi provides information about how well the clustering procedure respects the known class boundaries, with a value of 1 indicating perfect agreement of the clustering solution with the known class membership. Figure 3 shows the impact of data transformation on clustering results. While very poor results were obtained with non-transformed data, a combination of logarithmic transformation and z-score normalization for data preprocessing prior to clustering yielded high cRi values. Among the distance metrics evaluated, the Pearson correlation clearly outperformed the Euclidean and Manhattan distance in most of the combinations with the different algorithms tested, although a cRi of 1 was obtained combining Manhattan distance with the Ward algorithm. The impact of the algorithm selection on the clustering results was not as pronounced as the choice of the data preprocessing approach or distance metric, although single and median linkage performed substantially worse compared to other algorithms. In addition to Ward combined with Manhattan, two other algorithms (average linkage and DIANA combined with Pearson) reached the top score for cRi. A prerequisite for achieving this excellent performance was the removal of multivariate outliers (Supplemental Fig. 2), supporting the method described above as an approach that can significantly improve the quality of HC results.
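
The preprocessing combination reported to work best, logarithmic transformation together with z-score normalization of markers, can be sketched as follows. This is a standard-library Python illustration under two stated assumptions: the logarithm is applied before the z-score, and the sample standard deviation (n - 1) is used. The function names and the small expression matrix are hypothetical.

```python
import math
import statistics


def log2_transform(matrix):
    """Compress the dynamic range; every value must be positive (MFI ratios are)."""
    return [[math.log2(value) for value in row] for row in matrix]


def zscore_rows(matrix):
    """Center each row (marker) on 0 and scale it to unit SD across samples."""
    result = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # sample SD; a constant row would give sd == 0
        result.append([(value - mean) / sd for value in row])
    return result


# Hypothetical matrix: rows are markers, columns are samples.
expression = [
    [256.0, 512.0, 1024.0],  # a marker with very high MFI ratios
    [1.0, 2.0, 4.0],         # a marker with very low MFI ratios
]
processed = zscore_rows(log2_transform(expression))
```

After both steps the two markers, which differ by more than two orders of magnitude in raw values, end up with the same profile across samples, so neither dominates a distance computed over all markers.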

Unfortunately, for many biological data there is no a priori knowledge of which objects should cluster together, and alternative validation methods must be employed. Therefore we compared cRi to an internal validation method that does not require class labeling information and has been commonly used for HC validation: the cophenetic correlation coefficient (Sokal and Rohlf, 1962). This coefficient estimates directly the degree to which distance information in the original data is preserved in a partitioning, with values closer to 1 supposedly reflecting more accurate clustering solutions. But in line with previous reports (Holgersson, 1978), our data clearly showed that the cophenetic correlation coefficient is quite misleading, with very poor accuracy in discriminating the performance of HC algorithms when compared to cRi (Supplemental Fig. 3).
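
For readers unfamiliar with the coefficient, the sketch below shows how a cophenetic correlation can be computed for average linkage. It is a deliberately naive standard-library Python illustration, not the R functions used in the study: the clustering loop is a simple implementation suitable only for small inputs, and the function names and the four-point distance matrix are hypothetical.

```python
import math
from itertools import combinations


def average_linkage_cophenetic(dist):
    """Run average linkage on a square distance matrix and return cophenetic distances.

    The cophenetic distance of two objects is the height at which they are first merged.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all pairwise distances between the two clusters.
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            height = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or height < best[0]:
                best = (height, a, b)
        height, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return coph


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic pairwise distances."""
    coph = average_linkage_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


# Hypothetical example: four points on a line at positions 0, 1, 5, and 6.
positions = [0.0, 1.0, 5.0, 6.0]
distances = [[abs(p - q) for q in positions] for p in positions]
coefficient = cophenetic_correlation(distances)
```

The coefficient only measures how faithfully the tree reproduces the input distances. It says nothing about whether the resulting clusters match known classes, which is consistent with the discrepancy between this coefficient and cRi reported above.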

The best combinations of distance metrics and algorithms evaluated, establishing a maximum misclassification error of 1, are shown in Figure 4. The presence of inversions (clusters joined at a certain step of the algorithm at a level of similarity that is lower than the preceding step) was observed for the centroid linkage, a characteristic known to be associated with this algorithm (Morgan and Ray, 1995). Although inversions can cause difficulties in the interpretation of clustering results, this did not seem to be the case in this particular situation.

Hierarchical clustering of CS

The combinations of distance metrics and algorithms shown in Figure 4 were used for HC of CS. Considering CS as an unknown class, our aim was to find out whether CS samples belong to any of the four known classes (HAC, MSC, FIB, and TCL), or whether they may form their own new class or classes. Dendrograms of HC for HAC, MSC, FIB, TCL, and CS are shown in Figure 5. Of the six combinations evaluated, Manhattan–Ward and Pearson–DIANA showed inaccuracy in the classification of known classes, offered different solutions for CS clustering and, in the case of DIANA, incorrectly classified two technical replicates of a CS sample in different clusters. On the other hand, Pearson combined with centroid, Ward, weighted average, or average linkage classified all samples of known classes correctly, providing identical or very similar solutions for the clustering of CS. With the exception of L2388-GII, all CS samples were classified as belonging to the MSC cluster (8 of the 13 CS) or to the FIB cluster (4 of the 13 CS). While MSC and CS were intercalated in the MSC cluster, FIB and CS could still be clearly resolved into two separate sub-clusters inside the FIB cluster. L2388-GII clustered with TCL, except in the case of centroid linkage, where it formed a separate cluster of a single sample. The similarity observed using different HC approaches emphasizes the reliability of the CS clustering results, although no clear separation according to histological grading was observed.

To identify the minimum feature/marker subset required for correctly classifying HAC, MSC, FIB, and TCL, an iterative analysis was performed. At each iteration, a marker was removed and the results were compared with those obtained using the whole panel of markers. Figure 6 shows that reducing the number of variables (cell surface markers) from 11 to 8, by removing CD14, CD104, and CD221, produces identical clustering results using the average linkage algorithm. The same results were obtained with Ward, weighted average, and centroid linkage (Supplemental Fig. 4), the only difference for the latter being that L2388-GII no longer clustered alone but formed part of the TCL cluster, as observed for the other algorithms.
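The marker-removal comparison can be sketched in Python as follows. This is a simplified, single-pass illustration and not the procedure used in the study: it assumes average linkage with a Pearson-based dissimilarity (1 minus the correlation), cuts the tree at a fixed number of clusters, and tests each marker individually rather than cumulatively. All function names, marker names, and values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_distance(x, y):
    """Dissimilarity 1 - r, where r is the Pearson correlation of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(ss_x * ss_y)

def average_linkage_partition(profiles, k):
    """Merge samples bottom-up with average linkage until k clusters remain."""
    pair_distance = {
        frozenset((s, t)): pearson_distance(profiles[s], profiles[t])
        for s, t in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean of all between-cluster sample distances.
        total = sum(pair_distance[frozenset((s, t))] for s in c1 for t in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in profiles]
    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)

def removable_markers(profiles, markers, k):
    """Markers whose individual removal leaves the k-cluster partition unchanged."""
    reference = average_linkage_partition(profiles, k)
    removable = []
    for i, marker in enumerate(markers):
        reduced = {name: values[:i] + values[i + 1:] for name, values in profiles.items()}
        if average_linkage_partition(reduced, k) == reference:
            removable.append(marker)
    return removable

# Hypothetical, already preprocessed profiles (five markers, two classes).
markers = ["m1", "m2", "m3", "m4", "m5"]
profiles = {
    "A1": [2.0, 1.9, -1.0, -1.1, 0.1],
    "A2": [1.8, 2.1, -0.9, -1.2, -0.1],
    "B1": [-1.0, -1.2, 2.0, 1.8, 0.0],
    "B2": [-1.1, -0.9, 1.9, 2.1, 0.2],
}
print(removable_markers(profiles, markers, k=2))
```

In this toy example the two classes are so well separated that any single marker can be dropped without changing the partition; with real data, markers would be removed one after another and the comparison repeated at each iteration.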

Differences in the level of expression of cell surface markers for CS grouped according to the clustering results are illustrated in Figure 7. CS clustering with MSC showed a higher expression of CD49b and CD221 compared to CS clustering with FIB, while the opposite was true for CD10 expression. The only CS sample clustering with TCL was characterized by an extremely high expression of CD49a compared to the rest of the CS samples.

Fig. 3. Impact of data preprocessing and algorithms on hierarchical clustering of HAC, MSC, FIB, and TCL. Clustering results without data transformation (not transformed), logarithmic in base 2 (log 2) transformation, z-score normalization of rows/markers (z-normalized), and a combination of log 2 transformation followed by z-score normalization (log 2 → z-normalized) were compared using three different distance metrics: Pearson correlation (P, left part), Euclidean distance (E, center part), and Manhattan distance (M, right part), combined with eight different hierarchical clustering algorithms: average linkage (Unweighted Pair Group Method using Arithmetic mean, UPGMA), weighted average linkage (Weighted Pair Group Method using Arithmetic mean, WPGMA), single linkage (SING), complete linkage (COMP), centroid linkage (Unweighted Pair Group Method using Centroids, UPGMC), median linkage (Weighted Pair Group Method using Centroids, WPGMC), Ward’s linkage (WARD), and Divisive Analysis (DIANA). Clustering results were scored using the corrected Rand index (cRi). All data shown were analyzed after exclusion of outliers.


Discussion

The versatility and power of FCM, boosted by the latest technical developments, have opened up new vistas to phenotype in detail different cell types and their malignant derivatives (Chattopadhyay et al., 2008; Steinbrich-Zollner et al., 2008). Despite a strong need for new advances, the least developed of FCM technologies is data analysis. FCM data analysis is usually performed manually, and the choice of the relationships to be examined is typically hypothesis-driven, with a limited number of parameters to be compared. Analysis techniques developed for genomic and proteomic studies, such as HC, have been sporadically used in FCM (Maynadie et al., 2002; Anichini et al., 2006; Habib and Finn, 2006; Rawstron et al., 2006; Zucchetto et al., 2006; Kitsos et al., 2007; Lugli et al., 2007; Steinbrich-Zollner et al., 2008). However, in spite of its tremendous potential, HC has not yet made its way into the field of FCM as a commonly used procedure (Lizard, 2007).

Validation of specific procedures in cluster analysis presents significant challenges. Most clustering algorithms produce a clustering even in the absence of actual structure, leaving to the user the task of detecting the significance of the results returned (Handl et al., 2005). The choice of the optimal clustering solution is often subjective, debatable and, sometimes, inadequate, and both the reproducibility and validity of many findings have been challenged (Dupuy and Simon, 2007). Therefore, the first aim of this study was to perform a systematic assessment of HC methodologies for FCM data analysis. While not uncommon in the genomic and proteomic fields (Meunier et al., 2007; de Souto et al., 2008), this is, to our knowledge, the first time that such an assessment has been performed with FCM data.

The first choice to be made in HC of FCM data is how to report the expression of the surface markers to be used for clustering. While several studies have used percent of positive cells for this purpose (Habib and Finn, 2006; Zucchetto et al., 2006; Lugli et al., 2007), other groups have favored the use of MFI, either directly (Rawstron et al., 2006) or after background normalization by subtracting (Anichini et al., 2006) or ratioing (Maynadie et al., 2002) against an autofluorescence control. Percent of positive cells relies on a subjective manual setting of a delimiter between negative and positive events, and involves an inherent loss of information by disregarding relative intensities of positive cells (Diaz-Romero et al., 2008). On the other hand, background subtraction of MFI can produce zeros and negative values, an undesirable property for cluster analysis due to restrictions imposed by certain data transformation approaches. Given that none of these limitations apply to the MFI ratio, this seems the best choice for FCM clustering when no obvious subpopulations are observed in the samples.
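The practical difference between the two background normalizations can be illustrated with a few hypothetical readings. In the Python sketch below, the marker names and MFI values are invented for illustration, and strictly positive raw MFI values are assumed; it shows that a logarithm can always be taken of the ratio, whereas the background-subtracted value may be zero or negative.

```python
from math import log2

def mfi_ratio(stained_mfi, control_mfi):
    """Ratio of the stained MFI to the autofluorescence-control MFI."""
    if stained_mfi <= 0 or control_mfi <= 0:
        raise ValueError("raw MFI values are assumed to be strictly positive")
    return stained_mfi / control_mfi

# Hypothetical (stained, control) MFI pairs for three markers.
readings = {
    "marker_negative": (4.8, 5.0),
    "marker_dim": (12.0, 5.0),
    "marker_bright": (640.0, 5.0),
}

for name, (stained, control) in readings.items():
    ratio = mfi_ratio(stained, control)
    difference = stained - control
    # log2 is defined for every ratio, but not for a non-positive difference.
    log_difference = log2(difference) if difference > 0 else None
    print(name, round(log2(ratio), 3), log_difference)
```

With the ratio, a value of 1 corresponds to a signal equal to the autofluorescence control, so unexpressed markers map to a log value near 0 rather than to an undefined one.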

Data preprocessing aims to transform the data obtained during acquisition in order to improve the quality and relevance of HC results. Two different approaches for data preprocessing were evaluated here: data cleaning and data transformation. A new multivariate outlier detection method was introduced as a first step for data cleaning in the workflow of the HC process. This method allows outlier removal from the known label classes used for refining the HC process, resulting in increased intra-class homogeneity and improved performance of many of the clustering algorithms tested. Data transformation addresses bias in clustering due to the presence of extreme values caused by variables with large dynamic ranges by standardizing the range of each variable (de Souto et al., 2008), and it is often omitted or poorly documented in HC of FCM data (Maynadie et al., 2002; Habib and Finn, 2006; Petrausch et al., 2006; Zucchetto et al., 2006; Lugli et al., 2007). A combination of logarithmic transformation in base 2 followed by z-score normalization was shown to be optimal in this study.
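Both preprocessing steps can be sketched in a few lines of Python. The code below is an illustrative reading of the approach rather than the original implementation: the function names are hypothetical, the sample standard deviation is assumed throughout, and the 2 SD cutoff is assumed to be computed from the per-sample average interpoint distances within one class.

```python
from math import dist, log2
from statistics import mean, stdev

def log2_then_zscore(values):
    """Log2-transform one marker's MFI ratios, then z-score them across samples."""
    logged = [log2(v) for v in values]
    center, spread = mean(logged), stdev(logged)
    return [(x - center) / spread for x in logged]

def flag_outliers(profiles, n_sd=2.0):
    """Flag samples of one class with an unusually high average interpoint distance.

    profiles maps a sample name to its preprocessed marker profile.
    """
    names = list(profiles)
    # Average Euclidean distance from each sample to every other sample of the class.
    average_distance = {
        s: mean(dist(profiles[s], profiles[t]) for t in names if t != s)
        for s in names
    }
    class_distance = mean(average_distance.values())
    cutoff = class_distance + n_sd * stdev(average_distance.values())
    return [s for s in names if average_distance[s] > cutoff]

# Hypothetical class of eight samples: seven similar profiles and one distant one.
class_profiles = {f"s{i}": (0.0, 0.1 * i) for i in range(7)}
class_profiles["far"] = (10.0, 10.0)
print(flag_outliers(class_profiles))
print(log2_then_zscore([1.0, 2.0, 4.0, 8.0]))
```

One consequence of a cutoff based on the sample standard deviation is that very small classes cannot yield outliers at 2 SD: with n samples, no value can lie more than (n − 1)/√n sample standard deviations above the mean, which is below 2 for n ≤ 5.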

Fig. 4. Hierarchical clustering of HAC, MSC, FIB and TCL. Dendrograms were obtained applying the best algorithms (as defined for the higher cRi shown in the figure) selected from Figure 3. Cell identity of each sample is indicated by a color code. Four separate clusters are indicated by colored boxes with identical coloring to the corresponding cell type. Note that branch inversions (as the one indicated by the arrowhead) are observed using P/UPGMC as clustering algorithm. Misclassification of samples is indicated by open boxes. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Another factor with potential impact on HC is the choice of an adequate distance metric, a mathematical description of similarity between the data. Euclidean distance, Manhattan distance, and Pearson correlation are commonly used distance metrics in HC (D’Haeseleer, 2005; Sardiu et al., 2009), with the latter being the most applied in FCM (Maynadie et al., 2002; Anichini et al., 2006; Habib and Finn, 2006; Petrausch et al., 2006; Rawstron et al., 2006; Lugli et al., 2007), although the use of Euclidean distance has also been reported (Zucchetto et al., 2006). As expected, HC results in this study were highly dependent on the metric used to quantify differences between data, with Pearson correlation clearly outperforming Euclidean distance, in agreement with other comparative HC studies in proteomics and genomics (Meunier et al., 2007; de Souto et al., 2008; Sardiu et al., 2009). The low performance of Euclidean distance was similar to that observed with Manhattan distance, although the combination of Manhattan distance with Ward’s linkage provided excellent results for clustering of known classes (HAC, MSC, FIB, and TCL).
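The three metrics respond differently to the shape and the magnitude of a marker profile, as the following Python sketch illustrates. The conversion of Pearson correlation to a dissimilarity as 1 − r is a common convention assumed here, not a detail taken from this study, and the two profiles are hypothetical.

```python
from math import sqrt

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute marker-wise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_distance(x, y):
    """1 - r: near 0 for profiles of the same shape, near 2 for opposite shapes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(ss_x * ss_y)

# Two hypothetical profiles with the same shape but a twofold difference in magnitude.
low = [1.0, 2.0, 3.0, 4.0]
high = [2.0, 4.0, 6.0, 8.0]

print(euclidean(low, high), manhattan(low, high), pearson_distance(low, high))
```

Euclidean and Manhattan distances treat the two profiles as far apart because of the difference in magnitude, whereas the correlation-based dissimilarity is essentially zero because the pattern across markers is the same.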

The best way to infer biological knowledge from a clustering experiment is to use different algorithms that can provide the basis for the synthesis of accurate and reliable results (Azuaje and Bolshakova, 2002). Clustering algorithms are biased towards partitions that are in accordance with their own clustering criterion (Handl et al., 2005). Average (Maynadie et al., 2002; Petrausch et al., 2006; Kitsos et al., 2007; Steinbrich-Zollner et al., 2008) and complete linkage (Anichini et al., 2006; Habib and Finn, 2006; Zucchetto et al., 2006; Lugli et al., 2007) seem to be the preferred choices for clustering of FCM data, a fact that is arguably due to their wide availability in software packages rather than intrinsic merits (Handl et al., 2005; Belacel et al., 2006). In this study, conceptually different algorithms, such as average, centroid and Ward linkage, generated highly similar partitions after recursive feature elimination to identify the subset of markers potentially relevant for distinguishing the classes of samples (Li et al., 2004). This is a good indicator that the clustering results reflect actual structure of the data rather than simply random aggregation (Handl et al., 2005). The remaining question is how meaningful these findings are from the biological point of view.

Fig. 5. Hierarchical clustering of HAC, MSC, FIB, TCL, and primary chondrosarcoma cells (CS). Dendrograms were obtained applying the algorithms from Figure 4. Cell identity of samples is indicated by a color code, with CS in gray including information about the histologic grading (GI, GII or GIII). Four separate clusters are indicated by colored boxes with identical coloring to the corresponding cell type. Two technical replicates were performed on one CS sample (L1515-GIII and L1515b-GIII) generated by splitting the sample during monolayer culture. Note that branch inversions and the presence of a fifth cluster containing a single sample were observed using P/UPGMC as clustering algorithm. Misclassification of samples is indicated by open boxes. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The second aim of this study was to investigate whether the clustering pattern of CS could be correlated with histological grading and/or provide insights regarding the cellular origin of these tumors by determining relationships among CS samples with different cell types of mesenchymal lineage. To our knowledge, this is the first study in which HC of FCM data has been applied to cells of this lineage. The lack of correlation between histological grading and clustering pattern of CS found in the current study is not completely unexpected. Although a recent study has proposed a gene set predicting the clinical behavior of conventional chondrosarcoma (Boeuf et al., 2008), cytogenetics (Sandberg, 2004), biochemistry (Mankin et al., 1980), and gene expression analysis (Soderstrom et al., 2002) of these tumors have revealed an extraordinary heterogeneity independent of the histological malignancy grade. To explain this variability, it has been proposed that the clinical denomination of “chondrosarcoma” could represent different species of tumor with distinctive characteristics (Mankin et al., 1980). Indeed, the appearance of subtypes among tumors of similar anatomical origin is not uncommon (Sorlie et al., 2003), and based on similarities between normal cells and their malignant counterparts, these tumor subtypes are regarded as different biological entities arising from different cell types. Our findings provide some support for the potential existence of such subtypes in conventional chondrosarcoma, given that CS samples clustered according to cell types rather than to histological grading, although additional studies including a larger number of samples with known clinical outcome will be needed to confirm whether the chondrosarcoma groups defined in this study represent biologically distinct disease entities. In agreement with the hypothesis that chondrogenic neoplasms do not originate from adult chondrocytes (Aigner, 2002; Boeuf et al., 2008), not a single CS sample clustered with HAC.

Currently, a debate exists whether chondrosarcoma arises from a cartilage remnant from the pre-existent growth plate, or from MSC (Boeuf et al., 2008). Since a substantial number of the CS samples analyzed cluster with MSC, our results partially support the latter possibility. Furthermore, an additional group of CS samples was shown to cluster close to FIB. The possibility exists that these results were due to overgrowth of primary CS cultures by fibroblasts potentially present in the tumor. However, the clear distinction between FIB and these CS samples shown by the separate clustering in different branches of the HC dendrogram suggests that these CS could originate from cells with characteristics similar but not identical to FIB.

Fig. 6. Hierarchical clustering of HAC, MSC, FIB, TCL, and CS using the P/UPGMA algorithm before (A) and after (B) reduction of the number of cell surface markers used for clustering (exclusion of CD14, CD104 and CD221). For visualization, dendrograms were combined with heat maps where each cell surface marker is represented by a single row of boxes colorized based on the level of expression of the marker in a certain sample. The color scale ranges from saturated green for values of −3.0 and below to saturated red for values of 3.0 and above. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Identification of molecular markers showing differential expression between distinct groups of CS samples was possible after grouping of these samples according to the phenotype defined by clustering. The potential significance of these markers is illustrated by CD221, the insulin-like growth factor-I receptor (IGF1R). IGF1R targeting is currently under evaluation in clinical trials for different tumors (Li et al., 2009), and has been recently proposed as a therapeutic approach for chondrosarcoma (Ho et al., 2009). A key aspect in the development of this therapy is the appropriate selection of responsive tumors, and several studies have indicated a strong association between IGF1R levels on cancer cells and tumor responsiveness to anti-IGF1R antibodies (Gong et al., 2009; Zha et al., 2009). The current finding of differential expression of IGF1R on chondrosarcoma cells could be potentially helpful for future development of new IGF1R-targeting therapies for chondrosarcoma, allowing the stratification of patients in clinical trials according to the clusters described in the present study.

In addition to categorizing CS cells according to surface marker expression, HC of FCM data also allowed investigation of differences between primary tumor cells and cell lines. Cell lines derived from human tumors are extensively used as experimental models of neoplastic disease (van Staveren et al., 2009). Chondrosarcoma cell lines, in addition to being commonly used as a replacement for normal human chondrocytes in basic research, have been employed to study biological characteristics, carcinogenesis and development of tumor treatments for chondrosarcoma (Kunisada et al., 1998; Gebauer et al., 2005). However, the potential of these cell lines as substitutes for normal human chondrocytes is controversial, due to the lack of significant expression levels of many cartilage-specific gene products (Gebauer et al., 2005), and the validity of chondrosarcoma cell lines to mimic primary tumors remains unexplored. Several studies have consistently demonstrated that cell lines sharing a common tissue origin are more similar to each other than to the tumors they derived from (van Staveren et al., 2009), and our data point in the same direction. The three chondrosarcoma cell lines investigated not only segregated apart from primary tumor cells (with the exception of L2388-GII) and normal tissue cells (HAC, MSC, and FIB), but clustered together with cell lines from a different lineage (epithelial). Given that all cells analyzed in the study were cultured in vitro before HC analysis, it is unlikely that the observed differences between cell lines and primary tumor cells/normal tissue cells can be merely explained by adaptation of the cell lines to cell culture conditions (Ertel et al., 2006). Our results suggest that these differences can be inherent characteristics underlying the transformed phenotype of established cell lines and emphasize the need for great caution when trying to extrapolate results obtained in chondrosarcoma cell lines to human chondrosarcomas in vivo.

Fig. 7. Box plots displaying the level of expression of cell surface markers on CS grouped according to the clustering results previously described: CS clustering with MSC (CS-MSC), CS clustering with fibroblasts (CS-FIB), and CS clustering with tumor cell lines (CS-TCL). CS-TCL is shown as horizontal thick lines for the single sample belonging to this group. All CS were negative for CD104 and CD14 and therefore box plots for these markers were omitted. The solid circle outside the CD26 box plot represents an outlier (L1662-GII). Dashed lines indicate a level of expression of 1 (no expression). Differences between CS-MSC and CS-FIB for the level of expression of a given marker considered statistically significant (two-tail unpaired Student’s t-test, P value < 0.05) are indicated (M).

It is worth mentioning that the methods demonstrated herein are not limited to the particular type of data collected in this study, and could be used as guidelines for HC of FCM data. The methodology and selection of parameters described in the present study should facilitate comparative analysis of multiparametric data obtained in FCM and potentiate the transformation of diagnostic flow cytometry into a cytomic platform for the development of standardized predictive disease classifiers for clinical purposes.

Acknowledgments

The authors would like to thank Mrs. Isabelle Estella (Institute of Pathology, Bern, Switzerland) for excellent technical assistance, Dr. Dobrila Nesic (Institute of Pathology, Bern, Switzerland), Dr. Eberhard Korsching (Institute of Pathology, Muenster, Germany), and Dr. Anton Gluck (Novartis Institutes for BioMedical Research, Basel, Switzerland) for critically reading the manuscript, and Dr. Una McKeever (Novartis Institutes for BioMedical Research, Basel, Switzerland) for help with the English editing. The Department of Pathology, Leiden University Medical Centre, and the Osteoarticular Research Group, Institute of Pathology, University of Bern, are partners of the EuroBoNeT consortium, a European Commission FP-6 granted Network of Excellence for studying the pathology and genetics of bone tumors.


Literature Cited

Aigner T. 2002. Towards a new understanding and classification of chondrogenic neoplasias of the skeleton–biochemistry and cell biology of chondrosarcoma and its variants. Virchows Arch 441:219–230.

Aigner T, Muller S, Neureiter D, Illstrup DM, Kirchner T, Bjornsson J. 2002. Prognostic relevance of cell biologic and biochemical features in conventional chondrosarcomas. Cancer 94:2273–2281.

Anichini A, Mortarini R, Nonaka D, Molla A, Vegetti C, Montaldi E, Wang X, Ferrone S. 2006. Association of antigen-processing machinery and HLA antigen phenotype of melanoma cells with survival in American Joint Committee on Cancer stage III and IV melanoma patients. Cancer Res 66:6405–6411.

Azuaje F, Bolshakova N. 2002. Clustering genomic expression data: Design and evaluation principles. In: Berrar D, Dubitzky W, Granzow M, editors. Understanding and using microarray analysis techniques: A practical guide. London: Kluwer Academic Publishers. pp. 230–245.

Belacel N, Wang Q, Cuperlovic-Culf M. 2006. Clustering methods for microarray gene expression data. Omics 10:507–531.

Boeuf S, Kunz P, Hennig T, Lehner B, Hogendoorn P, Bovee J, Richter W. 2008. A chondrogenic gene expression signature in mesenchymal stem cells is a classifier of conventional central chondrosarcoma. J Pathol 216:158–166.

Bovee JV, Cleton-Jansen AM, Taminiau AH, Hogendoorn PC. 2005. Emerging pathways in the development of chondrosarcoma of bone and implications for targeted treatment. Lancet Oncol 6:599–607.

Chattopadhyay PK, Hogerkorp CM, Roederer M. 2008. A chromatic explosion: The development and future of multiparameter flow cytometry. Immunology 125:441–449.

de Souto MC, Costa IG, de Araujo DS, Ludermir TB, Schliep A. 2008. Clustering cancer gene expression data: A comparative study. BMC Bioinformatics 9:497.

D’Haeseleer P. 2005. How does gene expression clustering work? Nat Biotechnol 23:1499–1501.

Diaz-Romero J, Gaillard JP, Grogan SP, Nesic D, Trub T, Mainil-Varlet P. 2005. Immunophenotypic analysis of human articular chondrocytes: Changes in surface markers associated with cell expansion in monolayer culture. J Cell Physiol 202:731–742.

Diaz-Romero J, Nesic D, Grogan SP, Heini P, Mainil-Varlet P. 2008. Immunophenotypic changes of human articular chondrocytes during monolayer culture reflect bona fide dedifferentiation rather than amplification of progenitor cells. J Cell Physiol 214:75–83.

Dupuy A, Simon RM. 2007. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst 99:147–157.

Eefting D, Schrage YM, Geirnaerdt MJ, Le Cessie S, Taminiau AH, Bovee JV, Hogendoorn PC. 2009. Assessment of interobserver variability and histologic parameters to improve reliability in classification and grading of central cartilaginous tumors. Am J Surg Pathol 33:50–57.

Ertel A, Verghese A, Byers SW, Ochs M, Tozeren A. 2006. Pathway-specific differences between tumor cell lines and normal and tumor tissue cells. Mol Cancer 5:55.

Gebauer M, Saas J, Sohler F, Haag J, Soder S, Pieper M, Bartnik E, Beninga J, Zimmer R, Aigner T. 2005. Comparison of the chondrosarcoma cell line SW1353 with primary human adult articular chondrocytes with regard to their gene expression profile and reactivity to IL-1beta. Osteoarthritis Cartilage 13:697–708.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J. 2004. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 5:R80.

Gey GO, Coffman WD, Kubicek MT. 1952. Tissue culture studies of the proliferative capacity of cervical carcinoma and normal epithelium. Cancer Res 12:264–265.

Gil-Benso R, Lopez-Gines C, Lopez-Guerrero JA, Carda C, Callaghan RC, Navarro S, Ferrer J, Pellin A, Llombart-Bosch A. 2003. Establishment and characterization of a continuous human chondrosarcoma cell line, ch-2879: Comparative histologic and genetic studies with its tumor of origin. Lab Invest 83:877–887.

Gong Y, Yao E, Shen R, Goel A, Arcila M, Teruya-Feldstein J, Zakowski MF, Frankel S, Peifer M, Thomas RK, Ladanyi M, Pao W. 2009. High expression levels of total IGF-1R and sensitivity of NSCLC cells in vitro to an anti-IGF-1R antibody (R1507). PLoS ONE 4:e7273.

Habib LK, Finn WG. 2006. Unsupervised immunophenotypic profiling of chronic lymphocytic leukemia. Cytometry B Clin Cytom 70:124–135.

Handl J, Knowles J, Kell DB. 2005. Computational cluster validation in post-genomic data analysis. Bioinformatics 21:3201–3212.

Ho L, Stojanovski A, Whetstone H, Wei QX, Mau E, Wunder JS, Alman B. 2009. Gli2 and p53 cooperate to regulate IGFBP-3-mediated chondrocyte apoptosis in the progression from benign to malignant cartilage tumors. Cancer Cell 16:126–136.

Holgersson M. 1978. Limited value of cophenetic correlation as a clustering criterion. Pattern Recognit 10:287–295.

Hubert L, Arabie P. 1985. Comparing partitions. J Classif 2:193–218.

Jakob M, Demarteau O, Schafer D, Hintermann B, Dick W, Heberer M, Martin I. 2001. Specific growth factors during the expansion and redifferentiation of adult human articular chondrocytes enhance chondrogenesis and cartilaginous tissue formation in vitro. J Cell Biochem 81:368–377.

Kaighn ME, Narayan KS, Ohnuki Y, Lechner JF, Jones LW. 1979. Establishment and characterization of a human prostatic carcinoma cell line (PC-3). Invest Urol 17:16–23.

Kaufman L, Rousseeuw PJ. 2005. Finding groups in data: An introduction to cluster analysis. Hoboken, NJ: Wiley. xiv, 342 pp.

Kitsos CM, Bhamidipati P, Melnikova I, Cash EP, McNulty C, Furman J, Cima MJ, Levinson D. 2007. Combination of automated high throughput platforms, flow cytometry, and hierarchical clustering to detect cell state. Cytometry A 71:16–27.


Kunisada T, Miyazaki M, Mihara K, Gao C, Kawai A, Inoue H, Namba M. 1998. A new human chondrosarcoma cell line (OUMS-27) that maintains chondrocytic differentiation. Int J Cancer 77:854–859.

Leibovitz A, Stinson JC, McCombs WB III, McCoy CE, Mazur KC, Mabry ND. 1976. Classification of human colorectal adenocarcinoma cell lines. Cancer Res 36:4562–4569.

Li T, Zhang C, Ogihara M. 2004. A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20:2429–2437.

Li R, Pourpak A, Morris SW. 2009. Inhibition of the insulin-like growth factor-1 receptor (IGF1R) tyrosine kinase as a novel cancer therapy approach. J Med Chem 52:4981–5004.

Lizard G. 2007. Flow cytometry analyses and bioinformatics: Interest in new softwares to optimize novel technologies and to favor the emergence of innovative concepts in cell research. Cytometry A 71:646–647.

Lugli E, Pinti M, Nasi M, Troiano L, Ferraresi R, Mussi C, Salvioli G, Patsekin V, Robinson JP, Durante C, Cocchi M, Cossarizza A. 2007. Subject classification obtained by cluster analysis and principal component analysis applied to flow cytometric data. Cytometry A 71:334–344.

Mankin HJ, Cantley KP, Lippiello L, Schiller AL, Campbell CJ. 1980. The biology of human chondrosarcoma. I. Description of the cases, grading, and biochemical analyses. J Bone Joint Surg Am 62:160–176.

Marchette DJ, Solka JL. 2003. Using data images for outlier detection. Comput Stat Data Anal 43:541–552.

Maynadie M, Picard F, Husson B, Chatelain B, Cornet Y, Le Roux G, Campos L, Dromelet A, Lepelley P, Jouault H, Imbert M, Rosenwadj M, Verge V, Bissieres P, Raphael M, Bene MC, Feuillard J. 2002. Immunophenotypic clustering of myelodysplastic syndromes. Blood 100:2349–2356.

Meunier B, Dumas E, Piec I, Bechet D, Hebraud M, Hocquette JF. 2007. Assessment of hierarchical clustering methodologies for proteomic data mining. J Proteome Res 6:358–366.

Morgan BJT, Ray APG. 1995. Nonuniqueness and inversions in cluster-analysis. J R Stat Soc Ser C Appl Stat 44:117–134.

Ouyang P. 1998. An in vitro model to study mesenchymal-epithelial transformation. Biochem Biophys Res Commun 246:771–776.

Pattillo RA, Hussa RO, Story MT, Ruckert AC, Shalaby MR, Mattingly RF. 1977. Tumor antigen and human chorionic gonadotropin in CaSki cells: A new epidermoid cervical cancer cell line. Science 196:1456–1458.

Perucho M, Goldfarb M, Shimizu K, Lama C, Fogh J, Wigler M. 1981. Human-tumor-derived cell lines contain common and different transforming genes. Cell 27:467–476.

Petrausch U, Haley D, Miller W, Floyd K, Urba WJ, Walker E. 2006. Polychromatic flow cytometry: A rapid method for the reduction and analysis of complex multiparameter data. Cytometry A 69:1162–1173.

Rawstron AC, de Tute R, Jack AS, Hillmen P. 2006. Flow cytometric protein expression profiling as a systematic approach for developing disease-specific assays: Identification of a chronic lymphocytic leukaemia-specific assay for use in rituximab-containing regimens. Leukemia 20:2102–2110.

Sandberg AA. 2004. Genetics of chondrosarcoma and related tumors. Curr Opin Oncol 16:342–354.

Sardiu ME, Florens L, Washburn MP. 2009. Evaluation of clustering algorithms for protein complex and protein interaction network assembly. J Proteome Res 8:2944–2952.

Soderstrom M, Bohling T, Ekfors T, Nelimarkka L, Aro HT, Vuorio E. 2002. Molecular profiling of human chondrosarcomas for matrix production and cancer markers. Int J Cancer 100:144–151.

Sokal RR, Rohlf FJ. 1962. The comparison of dendrograms by objective methods. Taxon 11:33–40.

Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM, Lonning PE, Brown PO, Borresen-Dale AL, Botstein D. 2003. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100:8418–8423.

Steinbrich-Zollner M, Grun JR, Kaiser T, Biesen R, Raba K, Wu P, Thiel A, Rudwaleit M, Sieper J, Burmester GR, Radbruch A, Grutzkau A. 2008. From transcriptome to cytome: Integrating cytometric profiling, multivariate cluster, and prediction analyses for a phenotypical classification of inflammatory diseases. Cytometry A 73:333–340.

Szuhai K, Tanke HJ. 2006. COBRA: Combined binary ratio labeling of nucleic-acid probes for multi-color fluorescence in situ hybridization karyotyping. Nat Protoc 1:264–275.

van Staveren WC, Solis DY, Hebrant A, Detours V, Dumont JE, Maenhaut C. 2009. Human cancer cell lines: Experimental models for cancer cells in situ? For cancer stem cells? Biochim Biophys Acta 1795:92–103.

Verhaak RG, Staal FJ, Valk PJ, Lowenberg B, Reinders MJ, de Ridder D. 2006. The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies. BMC Bioinformatics 7:105.

Yang X, Pursell B, Lu S, Chang TK, Mercurio AM. 2009. Regulation of β4-integrin expression by epigenetic modifications in the mammary gland and during the epithelial-to-mesenchymal transition. J Cell Sci 122:2473–2480.

Yeung KY, Medvedovic M, Bumgarner RE. 2003. Clustering gene-expression data with repeated measurements. Genome Biol 4:R34.

Zha J, O’Brien C, Savage H, Huw LY, Zhong F, Berry L, Lewis Phillips GD, Luis E, Cavet G, Hu X, Amler LC, Lackner MR. 2009. Molecular predictors of response to a humanized anti-insulin-like growth factor-I receptor monoclonal antibody in breast and colorectal cancer. Mol Cancer Ther 8:2110–2121.

Zucchetto A, Sonego P, Degan M, Bomben R, Dal Bo M, Bulian P, Benedetti D, Rupolo M, Del Poeta G, Campanini R, Gattei V. 2006. Surface-antigen expression profiling of B cell chronic lymphocytic leukemia: From the signature of specific disease subsets to the identification of markers with prognostic relevance. J Transl Med 4:11.