Quality Indicators Increase the Reliability of Microarray Data

10
GENOMICS Vol. 80, Number 4, October 2002 Copyright © 2002 Elsevier Science (USA). All rights reserved. 0888-7543/02 $35.00 385 Article doi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL INTRODUCTION Until recently only a limited number of genes were accessi- ble to gene expression profiling, as northern blot, RT-PCR, and ribonuclease protection assays are designed for single genes or small groups of genes at a time. During the course of the human genome project, comprehensive cDNA libraries became available allowing the development of techniques for massive parallel expression profiling. Two types of microar- rays emerged either using oligonucleotides directly synthe- sized on a chip surface (Affymetrix) [reviewed in 1,2] or depositing cDNA PCR products on glass slides [reviewed in 1,3]. In parallel, clustering algorithms for data analysis have been developed [4–7]. High-density microarrays allowed genome-wide screening programs for identification of target genes or expression profiles in disease and cancer [reviewed in 8–10]. Large amounts of data have been generated quickly, but several types of problems encourage the development of novel concepts for data evaluation. Large data sets with intrinsic variation (“noisy data”) have to be interpreted by recognizing and excluding outlier data from subsequent analysis in an automated and highly reliable way. Quality Indicators Increase the Reliability of Microarray Data Wolfgang Raffelsberger, 1 Doulaye Dembélé, 1 Mike G. Neubauer, 2 Marco M. Gottardis, 3 and Hinrich Gronemeyer 1,* 1 Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, B.P. 10142, F-67404 Illkirch Cedex, C. U. de Strasbourg, France Departments of 2 Applied Genomics and 3 Oncology Drug Discovery, Bristol-Myers Squibb Pharmaceutical Research Institute, Princeton, New Jersey 08543-4000, USA * To whom correspondence and reprint requests should be addressed. Fax (+33) 388 65 3201. E-mail: [email protected]. Large-scale gene expression profiling with DNA microarrays opens new dimensions to molecular biology but still lacks the overall precision of traditional low-scale techniques. We developed a novel strategy of data processing linking search stringency to quality indi- cators for efficient detection of low-level, regulated genes. Using retinoid-induced differ- entiation of NB-4 promyelocytic cells, the variation of expression profiles between biolog- ical duplicates was studied and compared with the changes induced by all-trans retinoic acid (atRA) treatment. An analysis of 4320 genes showed that retinoic acid has mainly gene- activating function in NB-4 cells. Treatment with atRA for 18 hours induced metabolic genes that may be associated with cell differentiation and signaling factors triggering later events leading to apoptosis; cytokine genes were among the highest stimulated by atRA. Notably, we identified a regulatory loop inhibiting MYC action: as MYC was downregulated, a cognate repressor of MYC was upregulated. Key Words: retinoic acid, cell differentiation, gene expression profiling, biostatistics Inconsistently reproducing replicates may be due to incor- rect normalization and/or imprecision caused by the limita- tions of cDNA spotting technology. In addition, every meas- urement has an instrument-specific random error that is proportionally higher at low fluorescence intensities. Large- scale verification of array results, by comparison with alter- native methods (like real-time RT-PCR or multiplex ribonu- clease protection assays), remains one of the most important means to determine the overall reliability. Improved reliability of results can be obtained by adding quality indicators, also called “quality scores” or “flags” [11,12]. We used the following parameters to define quality indicators: a gene’s signal/noise-ratio, saturation at slide-scanner, and its replicates’ reproducibility [11,13–15]. These flags allowed efficient identification and subsequent elimination of data not satisfying basic quality require- ments. Furthermore, we propose a novel method that uses quality indicators to gradually favor high-quality data when filtering for gene regulation. Such filtered data have a minimized risk of containing false positives (that is, genes expressed at constant levels but appearing differentially expressed due to imprecise measurements). This is partic- ularly important to identify regulated genes expressed at low rates.

Transcript of Quality Indicators Increase the Reliability of Microarray Data

Articledoi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

Quality Indicators Increase the Reliability of Microarray Data

Wolfgang Raffelsberger,1 Doulaye Dembélé,1 Mike G. Neubauer,2 Marco M. Gottardis,3and Hinrich Gronemeyer1,*

1Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, B.P. 10142, F-67404 Illkirch Cedex, C. U. de Strasbourg, FranceDepartments of 2Applied Genomics and 3Oncology Drug Discovery, Bristol-Myers Squibb Pharmaceutical Research Institute, Princeton, New Jersey 08543-4000, USA

*To whom correspondence and reprint requests should be addressed. Fax (+33) 388 65 3201. E-mail: [email protected].

Large-scale gene expression profiling with DNA microarrays opens new dimensions tomolecular biology but still lacks the overall precision of traditional low-scale techniques.We developed a novel strategy of data processing linking search stringency to quality indi-cators for efficient detection of low-level, regulated genes. Using retinoid-induced differ-entiation of NB-4 promyelocytic cells, the variation of expression profiles between biolog-ical duplicates was studied and compared with the changes induced by all-trans retinoicacid (atRA) treatment. An analysis of 4320 genes showed that retinoic acid has mainly gene-activating function in NB-4 cells. Treatment with atRA for 18 hours induced metabolicgenes that may be associated with cell differentiation and signaling factors triggering laterevents leading to apoptosis; cytokine genes were among the highest stimulated by atRA.Notably, we identified a regulatory loop inhibiting MYC action: as MYC was downregulated,a cognate repressor of MYC was upregulated.

Key Words: retinoic acid, cell differentiation, gene expression profiling, biostatistics

INTRODUCTION

Until recently only a limited number of genes were accessi-ble to gene expression profiling, as northern blot, RT-PCR,and ribonuclease protection assays are designed for singlegenes or small groups of genes at a time. During the courseof the human genome project, comprehensive cDNA librariesbecame available allowing the development of techniques formassive parallel expression profiling. Two types of microar-rays emerged either using oligonucleotides directly synthe-sized on a chip surface (Affymetrix) [reviewed in 1,2] ordepositing cDNA PCR products on glass slides [reviewed in1,3]. In parallel, clustering algorithms for data analysis havebeen developed [4–7]. High-density microarrays allowedgenome-wide screening programs for identification of targetgenes or expression profiles in disease and cancer [reviewedin 8–10].

Large amounts of data have been generated quickly, butseveral types of problems encourage the development ofnovel concepts for data evaluation. Large data sets withintrinsic variation (“noisy data”) have to be interpreted byrecognizing and excluding outlier data from subsequentanalysis in an automated and highly reliable way.

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.0888-7543/02 $35.00

Inconsistently reproducing replicates may be due to incor-rect normalization and/or imprecision caused by the limita-tions of cDNA spotting technology. In addition, every meas-urement has an instrument-specific random error that isproportionally higher at low fluorescence intensities. Large-scale verification of array results, by comparison with alter-native methods (like real-time RT-PCR or multiplex ribonu-clease protection assays), remains one of the most importantmeans to determine the overall reliability.

Improved reliability of results can be obtained byadding quality indicators, also called “quality scores” or“flags” [11,12]. We used the following parameters to definequality indicators: a gene’s signal/noise-ratio, saturation atslide-scanner, and its replicates’ reproducibility [11,13–15].These flags allowed efficient identification and subsequentelimination of data not satisfying basic quality require-ments. Furthermore, we propose a novel method that usesquality indicators to gradually favor high-quality datawhen filtering for gene regulation. Such filtered data havea minimized risk of containing false positives (that is, genesexpressed at constant levels but appearing differentiallyexpressed due to imprecise measurements). This is partic-ularly important to identify regulated genes expressed atlow rates.

385

Article doi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

FIG. 1. Comparison of signal intensity and reproducibility. (A) After normalizing six data sets for each sample, median expression values and correspondingCVs (coefficients of variation) were calculated and plotted against each other. (B) Cumulative data histogram for signal intensities. Expression values for eachof the six data sets originating from one sample were corrected for the edge effect and median-normalization factors were determined. To verify such normal-ization factors the resulting normalized values were plotted as cumulative frequencies. In the case that the curves were not overlaid satisfactorily, we manuallyadjusted the normalization factors.

A B

Retinoic-acid triggered transcriptional responses havebeen studied extensively in leukemic cells [reviewed in16–19]. NB-4 cells, which have been established from a patientwith acute promyelocytic t(15;17) leukemia, undergo termi-nal differentiation and apoptosis [20] once treated with all-trans retinoic acid (atRA). Several genes transcriptionally reg-ulated during this process have been described [5,21–23] andserved for additional comparisons.

RESULTS

Edge Effect and NormalizationThe microarrays used had a considerable edge effect: spotslocated close to the edge of a slide displayed lower fluores-cence signals than duplicate spots in the center of the slide.For each column a correction factor was introduced mini-mizing the normalized differences of spot-duplicate(left/right). As low spot intensity values have 3- to 10-foldelevated deviation (Fig. 1A), only the 60% most intense spotpairs were used. Spots at saturation were excluded.

All normalizations between replicate slides or subse-quently between different samples were based on theassumption that there are no major changes in expression lev-els for the bulk part of the genes tested. This was a validassumption—it is supported by near-identical shapes ofcumulative frequency histograms of fluorescence intensitiesfor different slides after median normalization (Fig. 1B).

Comparison with Quantitative RT-PCR and PreviousResults Obtained with Affymetrix GeneChipsFrom preliminary experiments 18 genes were selected andtheir atRA-induced expression was assessed by real-timePCR. In general, most results were in agreement with the

386

array results. Six representative genes exhibiting more thantwofold regulation are compared in Fig. 2. Only for one gene(G1P3) major differences were observed between array resultsand PCR. The regulation of three genes was underestimatedfrom the arrays (TGM2, LSP1, and BCL2A1 (alias Bfl1)) andin one case (G1P3) the array did not reveal any significantgene regulation.

The arrays used had 4608 sequences spotted, representing3907 different UniGene clusters and an additional 413 acces-sion numbers not listed in UniGene, yielding a total of 4320different genes. The remaining spots were considered as geneduplicates, but they were treated independently in data analy-sis. Similarly the list of accession numbers published for theAffymetrix (Santa Clara, CA) HU6000 GeneChips used byTamayo et al. [5] was analyzed to compare atRA-dependentgene regulation from our arrays with the Affymetrix study.The HU6000 GeneChip represents 4690 UniGene clusters and1881 accession numbers not listed in UniGene; both arraytypes matched for 1594 UniGene clusters. Most genes presenton both arrays had minimal changes in expression levels: 1506genes (94.5%) did not reach threefold regulation; and 149genes passing the threshold filter of 2.5-fold for at least one ofthe experiments are displayed in Fig. 3. Also, 13% of 149 geneswere regulated only in the cDNA array results and 11% wereonly regulated in the Affymetrix results: PTMS (parathy-mosin), splicing factor SFRS7, and CSF3R (colony stimulatingfactor 3 receptor) were upregulated only in the data generatedusing Affymetrix GeneChips, whereas SYCA13 (alias Mcp4)was only found to be upregulated in our data set. Retinoidinduction of LSP1 and G1P3 (confirmed by RT-PCR; Fig. 2)was not observed previously [5]. We detected upregulation ofLSP1 with the cDNA arrays, albeit less pronounced than byRT-PCR, and the HU6000 GeneChips failed to detect anyretinoic acid induced expression of LSP1. The two types of

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

Articledoi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

FIG. 2. Comparison of results obtained using microarrays and real-time PCR. An aliquot of the RNA used for array hybridizations was reverse transcribed andgene expression levels were measured using gene-specific oligonucleotides in quantitative real-time PCR. Samples were run as duplicates and quantified based onsix-point calibration curves. A total of 22 primer pairs were tested, and 4 of them representing housekeeping genes were used for data normalization. Three repli-cate microarrays were hybridized for each probe, the resulting expression data normalized against each other, and gene regulation determined using median expres-sion values. The Y axis is broken because +1 and –1 represent “no change of expression levels” on an axis of x-fold gene regulation; error bars represent the stan-dard error. Sample A is shown in dark gray, sample B in light gray, and sample C, representing the atRA-treated status, is shown as dashed bars.

arrays revealed upregulation of TGM2, but the degree wasunderestimated compared with RT-PCR. BCL2A1 was under-estimated from the cDNA arrays, whereas MYC and SER-PINB1 were underestimated using the HU6000 GeneChips.We note that other recent reports also showed that microar-rays may underestimate the extent of gene regulation [24,25].

Quality IndicatorsTo understand the parameters defining the quality of microar-ray data, we studied the principle factors of variability anddeveloped a system of quality indicators (“flagsample”). Qualityindicators were designed to grade every spot considering sig-nal saturation at fluorescence detection, low signal/noise ratios,and the degree of reproducibility of multiple replicates. Lownumbers indicated safe data, whereas increasing flagsample val-ues identified less reliable data. Comparison of flagsample andmedian fluorescent signal intensities revealed a sigmoidal rela-tionship (Fig. 4). Fluorescence signals at detection limit (47units) and spots < 80 units (n = 385 or 8%) had consistently highflagsample values. For the next group of spots (< 1000 units, n =2335 or 51%) fluorescence intensities correlated directly withflagsample, while even higher fluorescence measurements (> 1000

FIG. 3. Comparison with results obtained by Tamayo et al. [5] using AffymetrixHU6000 GeneChips. Of 1594 genes present on both arrays, 149 different genesthat had a minimum 2.5-fold regulation in at least one experiment were fur-ther analyzed for the degree of agreement between both data sets. Relativechanges in gene expression levels were plotted in the format of “(x-1)-foldregulation,” to obtain a continuous scale with 0 representing “no regulation.”The regression line was close to the expected diagonal but had only a corre-lation coefficient r2 of 0.58.

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

units, n = 1609 or 35%) had flagsample values varying inde-pendently of the apparent spot intensity. At very high fluo-rescence measurements (> 25,000 units, n = 279 or 6%) the cor-responding flagsample values were elevated and formed smallclusters (circled population in Fig. 4), representing genes withone or up to six measurements at saturation of fluorescencedetection.

387

Article doi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

Minimal Data Quality FilteringA predefined threshold of 1.0 for the quality indicator flagsamplewas used to filter all data that satisfied minimal quality require-ments. This resulted in the exclusion of 613 genes (13.3%); theremaining 3995 genes (86.7%) were used for subsequent dataanalysis. The 613 removed genes (for samples A, B, and C) com-prise genes whose expression was at detection limit (151 genesor 25%), low-level expressed genes with a high flagsample (n = 303or 49%), and genes for which more than two replicate spotswere at fluorescence saturation (n = 159 or 26%). Scatter plotregression lines improved from r2 values of 0.82 (n = 4608) to0.91 or 0.94 (n = 3995; Fig. 5). The complete data for all threesamples are available at http://www1-igbmc.u-strasbg.fr/~Gronemeyer.

Determination of Regulated Genes: “Secure Regulation”Changes in expression levels can be presented as the loga-rithm of the ratio of sample/reference [4,7], resulting in con-tinuous scales with “0” indicating no gene regulation. Toobtain an output format that is easy to read, we remainedwith the “x-fold regulated” style. To allow continuous scales,we adopted the format of “(x-1)-fold regulation” whereupregulations are shown with positive prefixes and down-regulations with negative prefixes (for example, threefoldgene induction yielding “+2,” fivefold downregulation yield-ing “–4”); no regulation is represented by “0.”

We suggest use of the flagsample by subtraction from a sam-ple’s calculated gene regulation (on (x-1)-scale). Such reducedvalues for gene regulation are referred to as “secure regula-tion.” When comparing the atRA treated sample with theaverage of both untreated samples, the number of weakly

FIG. 4. Correlation between fluorescence signal intensity on flagsample. Mediansignal intensities and their corresponding flagsample values were plotted againsteach other for all 4608 data points from sample C. Furthermore, the data weresegmented in 20 classes, and corresponding spot intensity and flagsample val-ues were determined for each class and plotted in white. Genes with one ormore replicate spots at saturation of fluorescence detection are marked by acircle.

388

regulated genes was dramatically reduced (Fig. 6). In fact, themajor part of weakly regulated genes originated from lessreliable measurements and consequently values were reducedto “0 secure regulation.” The occurrence of the class 0 (takenfrom –0.05 to +0.05) was boosted from 407 genes or 10% (con-ventional gene regulation, blue bars in Fig. 6) to 2444 genesor 61% at secure regulation (Fig. 6, red bars). This indicatedthat a major part of the proposed “regulated genes” may beartifacts. Most of the genes upregulated more than threefoldremained in the final results (49 of 59 genes, or 83.1%),whereas only 7 of 57 (12.3%) with more than threefold down-regulation remained. These results are in agreement with pre-vious studies on atRA-induced gene regulation reportingmore upregulated genes than repressed ones [25,26].

Curve Fitting and Statistical TestThe most common strategy for selecting regulated genes con-sists in applying arbitrarily defined thresholds [26,27]. In con-trast to this procedure we suggest using a statistical test todecide which thresholds to choose. Based on the hypothesisthat there are no global changes in gene expression after atRAtreatment, we assumed only genes with changes of expressionlevels exceeding 95% of a modeled test-distribution should beconsidered significantly regulated [28].

Looking at “secure gene regulation” (red bars in Fig. 6),we observed that the frequency distribution had slightly moreweakly downregulated genes than upregulated ones. But asthe working hypothesis was a stochastic model for scatteredsignal we restricted our search on symmetrical functions only.We obtained a best fit using a composed function with onepart of a “Cauchy” and an additional part “Laplace” proba-bility density, and the resulting function is shown as yellowline in Fig. 6 (sum of squared residues = 8.15 E–4). The result-ant threshold (� = 5%) was ± 1.55-fold of “secure regulation”on a (x-1)-scale. This resulted in 51 upregulated genes (53entries, but 2 genes out of this list were spotted as 2 inde-pendent sequences each) and 9 downregulated genes, corre-sponding to changes of at least ± 2.73-fold “traditional” generegulation (Table 1 in supplementary data) [21,22,29–37]. Theregulated genes LSP1 and SERPINB1 were present on thearrays as two independent spots each (with different acces-sion numbers) and the results were very close to each other:SERPINB1 with 10.5- and 8.3-fold and LSP1 with 4.0- and 3.6-fold gene induction. This confirmed the highly reproducibledesign of microarrays and constant hybridization conditions.

In summary, 60 genes were found to be significantly reg-ulated (� = 5). Notably, this represents 39.2% of all spots thatwere found to be regulated more than ± 2.73-fold withoutconsidering “secure regulation.” Thus nearly two-thirds ofthese genes do not comply with our quality criteria.

Fuzzy ClusteringAs an alternative method to identify retinoic acid regulatedgenes, we performed a fuzzy clustering approach with theaim of isolating high-quality data for regulated genes. All can-didate genes for gene regulation (average of A and B yielding

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

Articledoi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

FIG. 5. Scatter-plot comparison of expression values obtained from samples A, B, and C. Cy3-labeled probe was synthesized from each of the three RNA sam-ples and was hybridized to three microarrays each. Such replicates were normalized and median expression values compared on scatter plots (A). Low-qualitymeasurements were eliminated using the flagsample values and a cut-off value of 1.0. We found 3995 genes to fulfill these quality criteria and they were re-plot-ted (B). The remaining scatter when plotting sample B versus sample C represents genes regulated by atRA.

A

B

≥ twofold regulation compared with C, n = 266) were plotted against corresponding flagsample values and the distribution was devised in distinct sub-groups using a FCMapproach. We tested clustering into 6 to 12 clusters and choseto use results based on 7 clusters (Fig. 7). Cluster numbers 2,4, and 5 represented the best experimental data (that is, lowgene regulation paired with low flagsample values), consistingof 49 spots or 14 downregulated and 34 upregulated genes(SERPINB1 was present as 2 duplicate spots). Also, 42% of thegenes from the statistical test were not found during this

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

clustering approach. Two upregulated (CD79A, APLP2) andseven downregulated (CTSH and six ESTs) genes were iden-tified using only the clustering approach, and these genes wereregulated between 2.9- and 3.4-fold and had very low flagsam-

ple values. Five of the additional downregulated ESTs clearlyrepresented mixed effects of gene regulation: while beingrepressed by atRA all of them were upregulated in sample A(compared to samples B or C), therefore we consider themrather as sensitive to environmental factors and most proba-bly not atRA-regulated.

389

Article doi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

FIG. 6. Gene regulation by atRA. Normalizedreplicate spots were used to calculate medianexpression values and corresponding qualityindicators (“flags”). We completely eliminated613 gene measurements from further analysisas they did not fulfill a minimum qualitythreshold. For the remaining 3995 genes rela-tive changes in expression levels were deter-mined, results for the biological duplicate sam-ples combined and their frequencydistribution represented in blue bars. To obtain“secure gene regulation” the flag values weresubtracted from apparent gene regulation. Thecorresponding frequency distribution wasplotted as red bars. Finally a test distributionwas fitted into the “secure gene regulation”histogram, shown as a yellow line.

As expected, results obtained from increased stringencyfiltration converged to a high degree, and 18 of 21 genes (86%)from the � = 1% test statistics were also identified using theclustering approach. Two conceptually different approaches,“secure regulation” and fuzzy clustering, produced similarresults demonstrating novel ways of using data quality indi-cators (flags) in combination with gene regulation.

Similarity between Samples (Hierarchical ClusterAnalysis)We used hierarchical cluster analysis to address the question

of which of the three data sets might resemblemore, the biological duplicates (samples B andC) or the pair drug-treated/untreated cells(samples A and B). All six replicate data sets(with nreduced = 3995) for each sample clusteredtightly together in the same dendrogram sub-branch (Fig. 8). The distances of the branchpoints for the different RNA samples indicatedthat the biological duplicates were slightlycloser compared with the atRA treated sam-ple. However, the dendrogram indicates thatone should consider all three samples differentfrom each other.

Biological NoiseDifferential gene expression observed betweenthe biological duplicates (samples A and B) wasdefined as biological noise. Scatter-plot analysisrevealed that signal intensities for samples Aand B did not perfectly follow linear regression(Fig. 5B, left). For this reason we chose to searchfor regulated genes under high stringency con-ditions (� = 1%) and found six regulated genes,with four of them upregulated ≥ 6.1-fold (RCN1,LMAN1, MBP, and EST R06666). These genesexert different functions; as expected, no

390

common molecular target or biological function can be associ-ated with this transcription profile.

DISCUSSION

The purpose of this study was to test the quality and robust-ness of expression profiling experiments to define retinoicacid regulated genes in a leukemia cell model. Originally,methods used for evaluating microarray experiments focusedon the identification of the strongest regulated genes. At

FIG. 7. Cluster analysis of gene-regulation profiles using fuzzy C-means (FCM). All genes with anaverage gene regulation (of samples A and B versus sample C) > 2 and individual gene regulation> 1.5 were selected (n = 266) and plotted against their average flag values (for flagsample values fromsamples A, B, and C). These data were column-normalized and classified into seven FCM clusters.Clusters 2, 4, and 5 had profiles close to the expected type and were selected as results.

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

Articledoi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

ptitheVtmatt

CUTomoefcEPoEwsf

rtuoose

resent the challenge is to provide the basis for robust datahrough evaluating each raw signal’s reliability and disqual-fying artifacts. Two types of factors exist that may give riseo data deformation: systematic error based on cross-ybridization or incorrect normalization and random-typerror responsible for inconsistently reproducing replicates.irtually no cross-hybridization was observed in a competi-

ive hybridization experiment using a hybridization probeade from yeast total RNA under the conditions described

bove (data not shown). Below we discuss the factors con-ributing to random error and the development of methodso generate reliable results.

omparison with Quantitative RT-PCR and Data Derivedsing Affymetrix GeneChipso estimate the reliability we compared our results with thosebtained by quantitative RT-PCR and from an equivalenticroarray experiment using Affymetrix GeneChips. Results

btained by quantitative RT-PCR were considered as refer-nce values because PCR products were checked for speci-icity by resequencing and the dynamic range of all amplifi-ations was verified by linearity of calibration curves.xpression levels for 18 genes were measured by quantitativeCR, only the regulation of gene G1P3 was not revealed fromur array results. For the genes LSP1 and SERPINB1 (aliaslan), duplicate spots (holding different parts of the sequence)ere measured on the microarrays and their results corre-

ponded well. This confirmed highly reproducible conditionsor hybridizations.

We found that 1594 genes were present on both microar-ay platforms (cDNA arrays and oligonucleotide arrays) andhe results were in good agreement. Not a single case ofpregulation by one array type, but downregulation by thether, was observed. However, in 11% of the cases the degreef gene regulation did not match well, which could be due toeveral reasons: differences in conditions of cell culture (forxample, serum batches are known to vary significantly) or

FIG. 8. Similarity between samples A, B, and C. To study and rank the variousdifferences in gene expression values for samples A, B, and C on a global scalewe chose to use hierarchical cluster analysis. Weights and mean centers for“arrays” were determined for all normalized expression data and average link-age clustering with uncentered absolute correlation was performed. Only thedendrogram linking the 18 data sets in respect to their similarity was shown.The five-digit number represents array identifiers, and the letters A, B, or Cindicate the RNA probe that was used to hybridize to the particular array

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

RNA preparation; or array-related issues like linear range ofquantitation, stringency at hybridization, or different detec-tion of splice variants. Both platforms tended to underesti-mate bigger changes in RNA concentrations (like for TGM2)when compared with quantitative RT-PCR. In conclusion, thenumber of false positives from our array experiments can beestimated as low, whereas there is limited risk of missing truepositives for low-level regulated genes.

Quality Indicators and Concept of “Secure GeneRegulation”As data generated from microarray hybridizations haveintrinsic variation (often referred to as “noisy data”), we havebeen investigating algorithms determining the data qualityfor each gene in order to improve the reliability of results. Onesimple method to approach the problem of “noisy data” is todefine a threshold of minimum variation (like threefold generegulation) and to select all data above. Thresholds typicallyused are totally arbitrary [26,27,38], and several modificationsor statistical tests have been suggested [28,38,39]. The fact thatmost suggested regulated genes belong to the class of low-level gene regulation makes proper selection of thresholds achallenging task. Using conventional approaches, no efficientdistinction between true low-level regulations and artifactsdue to inaccuracy of measurement could be made.

An emerging number of groups have started using qual-ity indicators or “flag values” based on spot signal/noiseratios and/or reproducibility of spot replicates [11,14,15]. Thequality indicators or “flags” described in this manuscriptreveal that low-intensity spots typically yield less reliabledata. Signal/noise ratios were particularly important forexamining regions on the arrays with elevated non-specificbackground (for example, due to variable surface-coating orparticles sticking to the array surface). As an alternative tosimply removing low-quality spots/genes, we developed twonovel concepts of analysis considering the additional infor-mation provided by flagsample values throughout the entireanalysis. The approach of calculating “secure gene regula-tion” allows the search for regulated genes to be particularlystringent among low-level regulated genes. This feature isimportant to exclude artifacts because low-level gene regula-tions are easily distorted by random noise, particularly in thecase of low-level expressed genes.

Comparison of Samples on Global LevelScatter plots allow the comparison of only two data sets,where individual points differing between both data sets canbe easily identified. The large number of small differencesbetween the two biological duplicates (samples A and B) maygive rise to the impression that samples B and C might bemost similar. However, hierarchical cluster analysis allowsthe comparison and ranking of multiple samples with respectto similarity. As expected, the biological duplicates wereranked slightly closer. Surprisingly, in some cases data fromthe same position of the slides clustered even closer than toreplicates elsewhere on the slides. This indicates that the

391

Article doi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

slides were manufactured with high reproducibility; in sev-eral cases spots showed higher resemblance slightly betteramong different slides than spot duplicates elsewhere on theslide.

Factors of Variability, Biological NoiseThe principal reason for studying biological noise was toobtain an estimate for sample-to-sample variation. Althoughcell pellets and corresponding RNAs were prepared withextreme care, the biological duplicates were found to differin respect to expression levels of several genes. These genesare subject to regulation through environmental factors diffi-cult to control in standard laboratory conditions. Comparedwith atRA treatment, biological noise exhibited a differentprofile: although atRA induced a rather distinct set of genesthat can be distinguished from random error, the character-istics of biological noise resemble those of random error (thatis, a high number of very low-level changes). As the correla-tion of signal intensities obtained for samples A and B was notperfectly linear (Fig. 5B, left), gene regulations were slightlydistorted. To avoid listing false positives, we chose only themost extreme upregulated genes (� = 1%). These genes areinvolved in a variety of basic functions in cell maintenanceand metabolism, not suggesting a common stimulus or con-sensus molecular function.

Replicate hybridizations based on one RNA sample forestimating a measurement’s error are widely used [13,15], butthe presence of biological noise [40,41] demands an extendedconcept of “biological replicates” in order to cover this addi-tional level of variance.

Transcription Profiling for atRA, Biological SignificanceNB-4 cells undergo terminal differentiation and apoptosisseveral days after treatment with atRA [20]. We were inter-ested in targets of early transcriptional response and for thisreason we chose to harvest cells after 18 hours of atRA treat-ment. At this point first morphological changes indicatingthat differentiation has been initiated can be observed andthe cells slow down the rate of proliferation. In contrast, onsetof apoptosis can only be detected after 4 days of treatmentwith atRA. This explains why we found only one gene asso-ciated with apoptosis (BCL2A1 alias Bfl1). BCL2A1 has beenreported to have an anti-apoptotic function [42] and, in fact,after 18 hours of treatment with atRA no increase in the rateof apoptosis was detected [22].

Expression profiling allows us to take a snapshot of acell’s transcriptional activity. The transcription profile foratRA revealed two major groups of biological function: cellgrowth and maintenance; and genes involved in cell–cellsignaling and signal transduction. Downregulated genesbelonged to the class of cell growth and maintenance,whereas genes involved in cell communication were foundonly with upregulated genes. Among the strongest regu-lated genes we found two cytokines, confirming paracrineand/or autocrine signaling processes [21]. Basal expressionclose to the detection limit revealed that these cytokines

392

were activated from an initial inactive status. In contrast,genes responsible for metabolic functions were upregulatedless intensely—most had a detectable basal rate of tran-scriptional activity, indicating that only their rate of tran-scription was amplified.

We identified a double inhibition of the transcription fac-tor gene MYC: while MYC mRNA was downregulated, therepressor of MYC, ZFP161 [43], was upregulated.Interestingly, the number of downregulated genes was quitelow, with most of their functions remaining unknown (Table1 in supplementary data).

The concept of defining quality indicators or “flags” rep-resents a universal, unbiased, and objective system to dis-qualify and eliminate low-quality data giving rise to defor-mation of the overall results. Furthermore this concept’sexpansion to “secure regulation” of genes significantlyreduces the risk of picking false positives (specificity), whilesimultaneously increasing the rate of true positives (sensitiv-ity). This method is clearly of advantage as it automaticallyadopts to the quality of signals in a given experiment. In addi-tion, we show that a FCM clustering approach can be used asalternative to curve fitting of “secure-regulation” data.

Supplementary data for this article are available on IDEAL(http://www.idealibrary.com).

MATERIALS AND METHODS

Cell culture and preparation of RNA. NB-4 cells were cultivated in RPMI +10%FCS at a cell density of 0.5–1 � 105 cells/ml [20]. A flask holding 107 cells wassplit in two equal volumes, one of them treated with 1 �M atRA (Sigma, St.Louis, MO). Cells were harvested by centrifugation after 18 hours (sample C,treated atRA; sample B, no treatment) and washed once in ice-cold PBS, andpellets were frozen at –80�C. An additional cell pellet of untreated NB-4 cellswas prepared similarly at a different time (sample A). RNA was extractedusing the Qiagen RNeasy Midi protocol (Qiagen, Hilden, Germany).

Probe synthesis. The following protocol describes probe synthesis for thehybridization of three replicate slides: 60 �g RNA and 3 �l AmershamScoreCard spike controls (Amersham Pharmacia, Uppsala, Sweden) weremixed with 0.23 �g of anchored oligo dT(25) (Amersham Pharmacia) andwater added to 30 �l final volume, heated for 5 minutes at 70�C and allowedto cool at RT for 10 minutes. Reverse transcription was performed in 60 �lfinal volume using 10 U/�l Superscript II reverse transcriptase (Gibco LifeTechnologies, Gaithersburg, MD), 3 �l dNTPs (Amersham Pharmacia), and50 �M Cy3-dCTP (Amersham Pharmacia), incubated for 90 minutes at 42�Cand heat-inactivated by 3 minutes at 95�C. Residual RNA was destroyed byadding NaOH to 0.24 M final concentration and incubating as 37�C for 10 min-utes, the reaction terminated by an equimolar amount of HCl and bufferedat 0.185 M Tris, pH 6.8. Unincorporated nucleotides were removed using aQIAquick PCR purification kit (Qiagen, Hilden, Germany). In parallel a poolof all three RNAsamples was labeled using Cy5-dCTP (AmershamPharmacia). Purified Cy3 and Cy5 probes were combined and dried to com-pleteness using a speedvac, followed by resuspension in 110 �l 42�C pre-warmed hybridization buffer (50% formamide, 35.7 ng/�l A80, 1 � hybridiza-tion buffer V.2 (Amersham Pharmacia)), incubated for 30 minutes at 70�C,cooled at room temperature, and filtered through a 0.45 �m Ultrafree-MC(Millipore, Bedford, MA).

Microarray preparation. A subset of 4608 genes from the UniGene library(Research Genetics, Groningen, The Netherlands) was PCR amplified and spot-ted together with 96 internal controls on glass microarrays (Corning, Corning,

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

Articledoi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

NY) using a Molecular Dynamics generation 3 slide spotter (AmershamPharmacia). The 32 ScoreCard control DNAs (Amersham Pharmacia) werespotted in every thirteenth row for each sub-grid. Deposited DNA wascrosslinked using a Stratalinker (Stratagene, La Jolla, CA) and baked for 2 hoursat 80�C. To avoid positional effects, spot-duplicates were not located next toeach other but duplicates of all spot-subgrids (with 13 � 32 spots) on the lefthalf of the slide were respotted on the right half of the slide-surface.Immediately before use the slides were incubated for 2 hours in a blocking solu-tion (6� SSC, 0.2% SDS, 1 mg/ml BSA), washed quickly five times in distilledH2O, denatured for 3 minutes in boiling water, and dried by centrifugation for7 minutes at 500 rpm.

Hybridization and washing of the arrays. One-third of fluorescent labeledcDNA probe (containing approximately 2 �g cDNA) was pipetted on eacharray and covered by a 25 � 60 mm Lifterslip coverslip (Erie ScientificCompany, Portsmouth, NH). Then the arrays were incubated for 18 hours at42�C in a humidified atmosphere. To remove cover slips the slides were dippedin 32�C pre-warmed 1� SSC/0.2% SDS, and washed once for 5 minutes in 1�SSC/0.2% SDS and twice for 10 minutes each in 0.1� SSC/0.2% SDS (all solu-tions at 32�C) with gentle shaking. Finally, the slides were dipped in 1� SSCfollowed by quickly dipping in H2O and dried by centrifugation for 7 minutesat 500 rpm.

Spot measurement, data normalization, and quality indicators (“flags”).Slides were scanned on a ScanArray 2000 (Packard BioScience, Meriden, CT)and resulting images analyzed using ArrayVision (Imaging Research Inc., St.Catherines, ON, Canada), all measures were taken using the median function.Using Microsoft Excel spreadsheets the apparent edge effect was analyzed andcorrection factors were optimized for minimal residues between left/right spotduplicates. As spots with low fluorescence intensities exhibit very high devia-tion, only the 60% most intense data pairs were used to determine each slide’sedge correction factors. Replicate slides were normalized to their median spe-cific signals (Fig. 1B) using a user-interactive procedure before normalizingresulting medians for the three different RNA samples. Based on negative con-trols spots from the “ScoreCard” collection the median detection limit wasdetermined. Relative changes in expression levels were represented on a mod-ified scale as (x-1)-fold gene regulation (for example, threefold gene inductionyielding “+2”).

A primary quality indicator, “flag at slide-level,” was formed using sig-nal/noise ratios and data at scanner-saturation. If a gene’s specific signalintensity reached 98–100% of maximal scanner intensity (that is, saturation),a gradually increasing penalty value was given (maximum value: 0.6). Suchsaturation penalty values (if different from 0) were added to the square prod-uct of noise/signal ratios giving the “flag at slide-level.” Each gene’s vari-ability of expression data in the replicate data sets was combined with the“flags at slide-level” to generate the flagsample. The coefficient of variation(CV, standard deviation divided by mean) was determined for each geneusing the six normalized and edge-effect corrected expression values.Resultant CV values were divided by 10 and added to the correspondingaverage “flag at slide-level.” Genes with a flagsample reaching or exceeding thethreshold of 1.0 were eliminated from further analysis. “Secure gene regula-tion” data were obtained by subtracting flagsample from the amount of calcu-lated induction. If the flagsample was larger than the change of expression lev-els, corresponding “secure regulation” values were defined as 0. “Secure reg-ulation” of RNA from sample C compared with both reference samples (Aand B) was determined using the lower level regulation comparing (C ver-sus A) or (C versus B), this way we avoided picking up genes only regulatedin one of the reference samples.

Curve fitting and statistical test. The frequency distribution of “secure reg-ulation” data for sample C versus samples A and B was plotted and underthe condition of the integrated area under both curves remaining identical forthe observed and the fitted function, test distributions were fitted aiming tominimize the sum of squared residues. We tested functions for probabilitydensities type Gaussian, “Cauchy” (f(x) = 1 / (1 + x2)), “Laplace” (f(x) = exp -|x|), and composed functions with one part of a “Cauchy” and an additionalpart “Laplace” probability density [44]. Functions with multiple variableswere fitted using an iterative Newton method. From the resulting best fitfunction 95% and 99% cut-off values were determined indicating at whichlevel of “secure regulation” genes can be considered significantly regulated(5% risk or 1% risk).

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.

Cluster analysis. The Fuzzy C-Means (FCM) method allows the partition ofdata sets into k clusters where every cluster is characterized by its center [45].Each gene is linked to all cluster centers via a vector composed of indexes. Thevalues for such indexes range between 0 and 1 determining the membershipof a given gene to the clusters. As constraint, the sum of all indexes for everyvector is 1. Computations of cluster centers and indexes were performed alter-nately until convergence was reached, that is, stabilization of centers from oneiteration to the next. To search for atRA regulated genes we filtered out allgenes with the average of sample A and B being regulated at least twofold com-pared with sample C (n = 429). Absolute values for gene regulation were plot-ted against the corresponding average flagsample.

Edge-effect corrected gene expression data were used for the program“Cluster” from Stanford University [4] and before clustering normalized andweighted for columns (“arrays”). Analysis was performed as average linkageclustering, and distances for genes and arrays were calculated as absoluteuncentered correlation.

Real-time PCR. Each RNA (10 mg) was reverse transcribed using 25 ng/�loligo dT (Gibco BRL, Rockville, MD) and 2.5 U/�l AMV reverse transcriptase(Sigma, St. Louis, MO) for 100 minutes at 42�C. The remaining RNA wasdigested by 0.013 U/�l RNase H (Promega, Madison, WI), phenol-chloroformpurified, ethanol-precipitated, and resuspended in 120 �l H2O. Quantitativereal-time PCR was performed on a LightCycler (Roche Diagnostics, Mannheim,Germany) using a SYBR-green master-mix with 4 mM MgCl2 (Sigma) and 8 pMoligos (synthesized at the IGBMC core facility). The following genes wereamplified and measured: BCL2A1 (alias Bfl1), CAMK1, CDC6, CDK8, G1P3,HMGIY, LSP1, MPHOSPH10, MT1B, MYC, NDN, SERPINB1, STX3A, TCEA1,TGM2, TP53BP1 (alias p53), FLJ14297 (EST H91281), and EST R09153. IndividualcDNA samples were run as duplicates, quantitation was performed based onsix-point calibration curves, and all experiments were repeated a second time.Differential gene expression was calculated after normalization of the samplesagainst the average of four housekeeping genes (GAPD, RPLP0 (alias 36B4),ACTB (�-actin), and HPRT1), and deviation of replicates was calculated as stan-dard error.

ACKNOWLEDGMENTSWe thank Michéle Lieb and Astrid Pornon (IGBMC) for technical assistance; PaulKayne and Johnny Park (Bristol-Myers Squibb) for providing the microarrays andteaching protocols; Christine Bôle-Feysot and Stanislas Dumanoir (IGBMC) for dis-cussions; and Philippe Kastner (IGBMC) for contributing to development of FCM,Institut National de la Santé et de la Recherche Médicale, Centre National de laRecherche Scientifique and the Hôpital Universitaire de Strasbourg. W.R. was financedin part through a Marie-Curie Fellowship, E.C. contract QLG3-CT2000-00844 andalso was participant in EMBO-courses on Microarrays and Advanced Bioinformatics.

RECEIVED FOR PUBLICATION MARCH 28; ACCEPTED AUGUST 6, 2002.

REFERENCES1. Harrington, C. A., Rosenow, C., and Retief, J. (2000). Monitoring gene expression using

DNA microarrays. Curr. Opin. Microbiol. 3: 285–291.2. Lipshutz, R. J., Fodor, S. P., Gingeras, T. R., and Lockhart, D. J. (1999). High density syn-

thetic oligonucleotide arrays. Nat. Genet. 21: 20–24.3. Brown, P. O., and Botstein, D. (1999). Exploring the new world of the genome with DNA

microarrays. Nat. Genet. 21: 33–37.4. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and

display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95: 14863–14868.5. Tamayo, P., et al. (1999). Interpreting patterns of gene expression with self-organizing

maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci.USA 96: 2907–2912.

6. Brazma, A., and Vilo, J. (2000). Gene expression data analysis. FEBS Lett. 480: 17–24.7. Quackenbush, J. (2001). Computational analysis of microarray data. Nat. Rev. Genet. 2:

418–427.8. Cooper, C. S. (2001). Applications of microarray technology in breast cancer research.

Breast Cancer Res. 3: 158–175.9. Devaux, F., Marc, P., and Jacq, C. (2001). Transcriptomes, transcription activators and

microarrays. FEBS Lett. 498: 140–144.10. Alizadeh, A. A., Ross, D. T., Perou, C. M., and van de Rijn, M. (2001). Towards a novel

classification of human malignancies based on gene expression patterns. J. Pathol. 195:41–52.

11. Wang, X., Ghosh, S., and Guo, S. W. (2001). Quantitative quality control in microarrayimage processing and data acquisition. Nucleic Acids Res. 29: E75.

393

Article doi:10.1006/geno.2002.6848, available online at http://www.idealibrary.com on IDEAL

12. Yang, M. C., et al. (2001). A statistical method for flagging weak spots improves nor-malization and ratio estimates in microarrays. Physiol. Genomics 7: 45–53.

13. Lee, M. L., Kuo, F. C., Whitmore, G. A., and Sklar, J. (2000). Importance of replication inmicroarray gene expression studies: statistical methods and evidence from repetitivecDNA hybridizations. Proc. Natl. Acad. Sci. USA 97: 9834–9839.

14. Tseng, G. C., Oh, M. K., Rohlin, L., Liao, J. C., and Wong, W. H. (2001). Issues in cDNAmicroarray analysis: quality filtering, channel normalization, models of variations andassessment of gene effects. Nucleic Acids Res. 29: 2549–2557.

15. Wildsmith, S. E., Archer, G. E., Winkley, A. J., Lane, P. W., and Bugelski, P. J. (2001).Maximization of signal derived from cDNA microarrays. Biotechniques 30: 202–206, 208.

16. Benoit, G. R., Tong, J. H., Balajthy, Z., and Lanotte, M. (2001). Exploring (novel) geneexpression during retinoid-induced maturation and cell death of acute promyelocyticleukemia. Semin. Hematol. 38: 71–85.

17. Kastner, P., and Chan, S. (2001). Function of RAR� during the maturation of neutrophils.Oncogene 20: 7178–7185.

18. Melnick, A., and Licht, J. D. (1999). Deconstructing a disease: RAR�, its fusion partners,and their roles in the pathogenesis of acute promyelocytic leukemia. Blood 93: 3167–3215.

19. Pandolfi, P. P. (2001). Oncogenes and tumor suppressors in the molecular pathogenesisof acute promyelocytic leukemia. Hum. Mol. Genet. 10: 769–775.

20. Ruchaud, S., et al. (1994). Two distinctly regulated events, priming and triggering, dur-ing retinoid-induced maturation and resistance of NB4 promyelocytic leukemia cell line.Proc. Natl. Acad. Sci. USA 91: 8428–8432.

21. Altucci, L., et al. (2001). Retinoic acid-induced apoptosis in leukemia cells is mediatedby paracrine action of tumor-selective death ligand TRAIL. Nat. Med. 7: 680–686.

22. Benoit, G., et al. (1999). RAR-independent RXR signaling induces t(15;17) leukemia cellmaturation. EMBO J. 18: 7011–7018.

23. Liu, T. X., et al. (2000). Gene expression networks underlying retinoic acid-induced dif-ferentiation of acute promyelocytic leukemia cells. Blood 96: 1496–1504.

24. Leemans, R., et al. (2000). Quantitative transcript imaging in normal and heat-shockedDrosophila embryos by using high-density oligonucleotide arrays. Proc. Natl. Acad. Sci.USA 97: 12138–12143.

25. Taniguchi, M., Miura, K., Iwao, H., and Yamanaka, S. (2001). Quantitative assessmentof DNA microarrays—comparison with northern blot analyses. Genomics 71: 34–39.

26. Kelly, D. L., and Rizzino, A. (2000). DNA microarray analyses of genes regulated duringthe differentiation of embryonic stem cells. Mol. Reprod. Dev. 56: 113–123.

27. Bassett, D. E., Jr., Eisen, M. B., and Boguski, M. S. (1999). Gene expression informatics—it’s all in your mine. Nat. Genet. 21: 51–55.

28. Chen, Y., Dougherty, E. R., and Bittner, M. (1997). Ratio based decisions and the quanti-tative analysis of cDNA microarray images. J. Biomed. Opt. 2: 364–374.

29. Burn, T. C., Petrovick, M. S., Hohaus, S., Rollins, B. J., and Tenen, D. G. (1994). Monocytechemoattractant protein-1 gene is expressed in activated neutrophils and retinoic acid-

394

induced human myeloid cell lines. Blood 84: 2776–2783.30. Jing, Y., et al. (2001). Combined effect of all-trans retinoic acid and arsenic trioxide in acute

promyelocytic leukemia cells in vitro and in vivo. Blood 97: 264–269.31. Horie, S., et al. (2001). Acceleration of thrombomodulin gene transcription by retinoic acid:

retinoic acid receptors and Sp1 regulate the promoter activity through interactions withtwo different sequences in the 5�-flanking region of human gene. J. Biol. Chem. 276:2440–2450.

32. Platko, J. D., and Yen, A. (1997). Paxillin increases as retinoic acid or vitamin D3 induceHL-60 cell differentiation. In Vitro Cell Dev. Biol. Anim. 33: 84–87.

33. Benedetti, L., et al. (1996). Retinoid-induced differentiation of acute promyelocytic leukemiainvolves PML-RAR�-mediated increase of type II transglutaminase. Blood 87: 1939–1950.

34. Takenaga, K., Nakamura, Y., and Sakiyama, S. (1994). Expression of a calcium binding pro-tein pEL98 (mts1) during differentiation of human promyelocytic leukemia HL-60 cells.Biochem. Biophys. Res. Commun. 202: 94–101.

35. Katagiri, K., et al. (1996). Lyn and Fgr protein-tyrosine kinases prevent apoptosis duringretinoic acid-induced granulocytic differentiation of HL-60 cells. J. Biol. Chem. 271:11557–11562.

36. Rius, C., Vilaboa, N., Mata, F., and Aller, P. (1993). Differential modulation of the expres-sion of the intermediate filament proteins vimentin and nuclear lamins A and C by dif-ferentiation inducers in human myeloid leukemia (U-937, HL-60) cells. Exp. Cell Res. 208:115–120.

37. Delia, D., et al. (1995). Regulation of apoptosis induced by the retinoid N-(4-hydroxyphenyl)retinamide and effect of deregulated bcl-2. Blood 85: 359–367.

38. Kerr, M. K., and Churchill, G. A. (2001). Bootstrapping cluster analysis: assessing the reli-ability of conclusions from microarray experiments. Proc. Natl. Acad. Sci. USA 98:8961–8965.

39. Young, R. A. (2000). Biomedical discovery with DNA arrays. Cell 102: 9–15.40. Hughes, T. R., et al. (2000). Functional discovery via a compendium of expression profiles.

Cell 102: 109–126.41. Mills, J. C., Roth, K. A., Cagan, R. L., and Gordon, J. I. (2001). DNA microarrays and

beyond: completing the journey from tissue to cell. Nat. Cell Biol. 3: E175–178.42. D’Sa-Eipper, C., and Chinnadurai, G. (1998). Functional dissection of Bfl-1, a Bcl-2 homolog:

anti-apoptosis, oncogene-cooperation and cell proliferation activities. Oncogene 16:3105–3114.

43. Sobek-Klocke, I., et al. (1997). The human gene ZFP161 on 18p11.21-pter encodes a puta-tive c-myc repressor and is homologous to murine Zfp161 (Chr 17) and Zfp161-rs1 (XChr). Genomics 43: 156–164.

44. Papoulis, A. (1991). Probability, Random Variables and Stochastic Processes. McGraw Hill,New York.

45. Bezdek, J. C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. PlenumPress, New York.

GENOMICS Vol. 80, Number 4, October 2002Copyright © 2002 Elsevier Science (USA). All rights reserved.