GeneNote: whole genome expression profiles in normal human tissues

6
C. R. Biologies 326 (2003) 1067–1072 Molecular biology and genetics GeneNote: whole genome expression profiles in normal human tissues Orit Shmueli a,, Shirley Horn-Saban b , Vered Chalifa-Caspi b , Michael Shmoish a , Ron Ophir b , Hila Benjamin-Rodrig a,c , Marilyn Safran b , Eytan Domany c , Doron Lancet a a Departments of Molecular Genetics, The Weizmann Institute of Science, 76100 Rehovot, Israel b Biological Services, The Weizmann Institute of Science, 76100 Rehovot, Israel c Physics of Complex Systems, The Weizmann Institute of Science, 76100 Rehovot, Israel Received 16 September 2003; accepted 23 September 2003 Presented by François Gros Abstract A novel data set, GeneNote (Gene Normal Tissue Expression), was produced to portray complete gene expression profiles in healthy human tissues using the Affymetrix GeneChip HG-U95 set, which includes 62 839 probe-sets. The hybridization intensities of two replicates were processed and analyzed to yield the complete transcriptome for twelve human tissues. Abundant novel information on tissue specificity provides a baseline for past and future expression studies related to diseases. The data is posted in GeneNote (http://genecards.weizmann.ac.il/genenote/ ), a widely used compendium of human genes (http://bioinfo.weizmann.ac.il/genecards). To cite this article: O. Shmueli et al., C. R. Biologies 326 (2003). 2003 Académie des sciences. Published by Elsevier SAS. All rights reserved. Résumé GeneNote : profils d’expression complets dans des tissus humains normaux. Un nouveau jeu de données, GeneNote (Gene Normal Tissue Expression), a été produit pour décrire les profils d’expression complets des gènes dans les tissus humains sains, en utilisant les puces GeneChip HG-U95 d’Affymetrix, qui comprennent 62 839 jeux de sondes. Les intensités d’hybrida- tion de deux réplicats ont été traités et analysés pour décrire le transcriptome complet pour douze tissus humains. Les données nouvelles abondantes sur la spécificité tissulaire fournissent une base de référence pour les études d’expression passées et futures en relation avec les maladies. Les données sont accessibles dans GeneNote (http://genecards.weizmann.ac.il/genenote/ ) une compilaton de gènes humains largement utilisée (http://bioinfo.weizmann.ac.il/genecards). Pour citer cet article : O. Shmueli et al., C. R. Biologies 326 (2003). 2003 Académie des sciences. Published by Elsevier SAS. All rights reserved. Keywords: microarray; transcriptome; expression profile; normal human tissues; GeneCards Mots-clés : microréseau ; transcriptome ; profil d’expression ; tissus humains normaux ; GeneCards * Corresponding author. E-mail address: [email protected] (O. Shmueli). 1631-0691/$ – see front matter 2003 Académie des sciences. Published by Elsevier SAS. All rights reserved. doi:10.1016/j.crvi.2003.09.012

Transcript of GeneNote: whole genome expression profiles in normal human tissues

ofilesidization

tissuesdiseaseses

otehumains

rsquohybrida-s donneacuteeses et futures

C R Biologies 326 (2003) 1067ndash1072

Molecular biology and genetics

GeneNote whole genome expression profilesin normal human tissues

Orit Shmuelialowast Shirley Horn-Sabanb Vered Chalifa-Caspib Michael ShmoishaRon Ophirb Hila Benjamin-Rodrigac Marilyn Safranb Eytan Domanyc

Doron Lanceta

a Departments of Molecular Genetics The Weizmann Institute of Science 76100 Rehovot Israelb Biological Services The Weizmann Institute of Science 76100 Rehovot Israel

c Physics of Complex Systems The Weizmann Institute of Science 76100 Rehovot Israel

Received 16 September 2003 accepted 23 September 2003

Presented by Franccedilois Gros

Abstract

A novel data set GeneNote (Gene Normal TissueExpression) was produced to portray complete gene expression prin healthy human tissues using the Affymetrix GeneChip HG-U95 set which includes 62 839 probe-sets The hybrintensities of two replicates were processed and analyzed to yield the complete transcriptome for twelve humanAbundant novel information on tissue specificity provides a baseline for past and future expression studies related toThe data is posted in GeneNote (httpgenecardsweizmannacilgenenote) a widely used compendium of human gen(httpbioinfoweizmannacilgenecards) To cite this article O Shmueli et al C R Biologies 326 (2003) 2003 Acadeacutemie des sciences Published by Elsevier SAS All rights reserved

Reacutesumeacute

GeneNote profils drsquoexpression complets dans des tissus humains normauxUn nouveau jeu de donneacutees GeneN(Gene NormalTissueExpression) a eacuteteacute produit pour deacutecrire les profils drsquoexpression complets des gegravenes dans les tissussains en utilisant les puces GeneChip HG-U95 drsquoAffymetrix qui comprennent 62 839 jeux de sondes Les intensiteacutes dtion de deux reacuteplicats ont eacuteteacute traiteacutes et analyseacutes pour deacutecrire le transcriptome complet pour douze tissus humains Lenouvelles abondantes sur la speacutecificiteacute tissulaire fournissent une base de reacutefeacuterence pour les eacutetudes drsquoexpression passeacuteen relation avec les maladies Les donneacutees sont accessibles dans GeneNote (httpgenecardsweizmannacilgenenote) unecompilaton de gegravenes humains largement utiliseacutee (httpbioinfoweizmannacilgenecards) Pour citer cet article O Shmueliet al C R Biologies 326 (2003) 2003 Acadeacutemie des sciences Published by Elsevier SAS All rights reserved

Keywords microarray transcriptome expression profile normal human tissues GeneCards

Mots-cleacutes microreacuteseau transcriptome profil drsquoexpression tissus humains normaux GeneCards

Corresponding authorE-mail address oritshmueliweizmannacil (O Shmueli)

1631-0691$ ndash see front matter 2003 Acadeacutemie des sciences Publisdoi101016jcrvi200309012

hed by Elsevier SAS All rights reserved

1068 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

n byhernt ofideda-iestis-lyres-as

yn-n-in apar-ovelionor

us-2]

sestypescecalandaseishichnts

ntof

essed0]an

G-eentionsthetantion

sive

-alo

n-ainertaterd

neuf--G-of

ne-du-et

ue

asuita

itsvia

ata

aluetheentplesand

en-t ofde-

l-es

1 Introduction

The human body orchestrates gene expressioco-regulating genes whose products function togetMany challenges are posed due to the vast amougenomic and expression data that has to be divinto functional biological groups using biological sttistical and computational tools Many past studcentered on comparisons of diseased to healthysues and were limited to a subset of human genes

High-density oligonucleotide arrays enable highparallel and comprehensive studies of gene expsion The transcription patterns produced knownthe expression profile depict the subset of mRNA sthesized in a certain cell or tissue At its most fudamental level the expression profile describesquantitative way which genes are expressed in aticular tissue More sophisticated issues such as ngene functional characterization gene identificatin biological pathways genetic variation analysisidentification of drug targets could be surveyed bying bioinformatics tools such as cluster analysis ([1and self organizing-maps [34])

The construction of gene expression databais a high priority of todayrsquos research communiSuch databases closely integrated with other tyof genomic information promise not only to enhanour understanding of many fundamental biologiprocesses but also to accelerate drug discoverylead to customized diagnosis and treatment of dise[56] An example generated by our own grouppresented the recent releases of GeneCards wcontain expression data both from array experimeand from lsquoelectronic Northernrsquo analyses [7]

As a significant extension of previous relevaefforts we describe a whole-genome repertoireexpression profiles in twelve normal human tissuPreviously only studies that compare a single diseatissue with a healthy one were preformed [8ndash1Other surveys that were done on healthy humtissues were limited to one array type (ie HU95A [11] Hu6800 [12]) Here for the first time wpresent the expression analysis of the full complemof more than 60 000 gene and EST representatin 12 normal human tissues It is shown thatadditional less characterized genes harbor imporinformation and that a whole-genome express

analysis is essential for generating comprehentranscription analyses

2 Materials and methods

PolyA+ RNA samples from twelve normal human tissues were purchased from Clontech (PAlto CA) This collection of major human tissues icludes Bone marrow (catalog number 6573-1) br(6516-1) heart (6533-1) kidney (6538-1) liv(6510-1) lung (6524-1) pancreas (6539-1) pros(6546-1) skeletal muscle (6541-1) spinal co(6593-1) spleen (6542-1) and thymus (6536-1)

Preparation and hybridization of cRNA were doaccording to the manufacturersquos instructions [13] Sficient hybridization cocktail solution for five independent hybridizations reactions (ie the full set HU95A-E) was prepared The hybridization reactionsall the arrays in the set were carried out simultaously The mRNA from each tissue was reacted inplicates against the full human Affymetrix arrays s(HG-U95A-E) to yield two sets of results The 3prime5primesignal ratios of GAPDH were always below the valof 3 as expected for a fine labeling reaction

Arrays were analyzed and expression value wcalculated for each gene by using Microarray S(MAS) version 50 software (Affymetrix Santa ClarCA) Expression values for each gene calledsignalwere calculated using the MAS 50 software withdefault parameter settings Scaling was not donea MAS 50 option Instead we normalized our das follows the intensities of each array were log10transformed and scaled to a constant reference v(global normalization) This reference value wasmean of all log intensities in all of the tissues Prescalls percentages are presented in Fig 2 for all samaccording to the array type (see also ResultsDiscussion)

3 Results and discussion

Whole-genome gene expression profiles were gerated from the normalized signal values The se12 tissue expression values for a given gene wasfined as itstissue vector Normalization was used to alow a meaningful comparison among different tissu

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1069

m-liza-

bu-ys its Ablyar-

heig 2ngell

rrayulteost

ileen-Se-

sig-m-sueach

e des of

elec

rrayporttheing

s

asedsuefor

es isar-re-ake

isin-

fact

nlycr-entrsquo7]tiveueelu-

Fig 1 Intensity distribution in the different arrays set in heart saple Density plots of one heart sample before and after normation are presented PanelA presents the density plot for log10 rawsignals while panelB presents the normalized signals

Fig 1 shows a comparison of the intensities distrition of one heart sample for the five different arrabefore and after normalization In this density plotis observed that the highest intensities are in arrayand B while arrays C D and E produce consideralower intensities These results are similar to the elier observations by Bakay et al [8] Aspects of tdecreased array intensities are also presented in Fwhich displays the percentage of lsquopresentrsquo calls alothe samples for each array type For example it is wobserved that the percentage of lsquopresentrsquo calls in aD is consistently below 10 This could be the resof intrinsic hybridization differences It could also bdue to the average quality of the probe-sets since mof the well-studied genes are found on array A wharrays B to E are constructed with increasing propsities of less well-characterized ESTs (Expressedquence Tags)

Tissue specific genes are supposed to have anificant role in tissue functionality Hence we exained the counts of such genes in the different tissamples (Fig 3) One representative sample for etissue was chosen and tissue specific genes wertermined on the basis of the lsquoabsentndashpresentrsquo callthe Affymetrix MAS 50 software A similar criterionwas used in the past for housekeepings genes s

-

-

Fig 2 Percent of present calls in all samples according to the atype lsquoPresentrsquo call percentages were taken from the MAS 50 refiles and are presented for the various array types in all ofsamples The AndashE letters represent the array type The followare the tissue samples shortcutsBMR for bone marrowBRN forbrain HRT for heartKDN for kidney LNG for lung LVR forliver MSL for musclePNC for pancreasPST for prostateSPCfor spinal cordSPL for spleen andTMS for thymus The numberattached to the sample name represent the replicate number

tion [14] In this paper we define tissue specificityhaving a lsquopresentrsquo call in only one tissue It is observthat brain and thymus had the highest number of tisspecific genes almost twice the number observedother tissues The propensity of tissue specific gennot much lower in arrays BndashE when compared toray A suggesting that important information alsosides within the former However one should also tinto consideration that the amount of lsquopresentrsquo callsreduced on arrays CndashE (Fig 2) Consequently thecreased effect of tissue specificity may be an artiof the array quality and design

The MAS 50 package is one of the most commoused software tools for analyzing high-density miroarray results However its use of lsquopresentndashabscalls is a controversial issue in the literature [15ndash1To address this we have begun to develop alternamethods utilizing a normalized entropy-related TissSpecificity Index (TSI) computed directly from thtissue vectors This algorithm is currently being eva

1070 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

rrayesentsple

genes

t al

ub-ct

un-elseen

en-ex-delshtillof

the

n a 4m)t Patedtoa

e ofthe

forand

dgigherlored

ive

ofsionilarncentalionsueat

Fig 3 Tissue specific genes in all of the tissues sorted by the aset Tissue specific genes were found based on the lsquoabsentndashprcall of the Affymetrix MAS 50 software Tissue specificity wadefined as having a lsquopresentrsquo call in only one tissue One samfor each tissue was chosen for the search The tissue specificdistribution on the arrays set HG-U95A-E is shown for all tissue

ated as compared to other methods (O Shmueli emanuscript in preparation)

Criticism has also been directed towards the straction of the mismatch probes (MM) from the perfematch probes (PM) creating PM-MM values as anderlying computation for assessing background levand signal computations Several methods have bdeveloped to evaluate the background level of intsity above which the value truly reflects the genepression value [151618ndash20] Some of these mofit log(PM-Background) instead of PM-MM thougin low-intensity regions the current models are svery questionable [16] Consequently the validitythe zone of low-intensity probes which is close tobackground level remains undetermined

The tissue vectors for each gene were drawn oroot scale representation [7] as exemplified in Figfor the gene APCS (Amyloid P-component seru[21] APCS is a precursor for amyloid componenwhich is found in basement membrane and associwith amyloid deposits A root scale enables onevisualize several orders of magnitude similar tologarithmic scale but preserves some advantagthe linear scale ie a differential increase with

rsquo

s

Fig 4 Tissue vector for the gene APCS [21] Tissue vectorthe gene APCS was calculated from the normalized signalsis presented in a graphical way (panelA) Tissues were groupeaccording to their origin and the groups colored accordingly (enerve tissues in green) The range between the lower and hmeasurements was represented by a white box above the cominimal measurement bar The graph is presented on they-axiswith a special root scale [7]Y = X(1B) whereB = log210 PanelB presents the tissue vector colored map

different orders of magnitude This affords an effectview-at-a-glance of the tissue vectors

A major aim of this work is to enable predictionthe function of novel genes based on their expresprofiles It is expected that genes that display simexpression patterns are functionally related sithey are co-regulated under all of the developmeconditions [2223] On the basis of the expressprofile similarities ten tissues were ordered in a tiscorrelation matrix (Fig 5) It is clearly seen th

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1071

zedorre-cal-in a

re-

wonsof

eelyueondusents

and

d

n

e anesetataplayme

tedionoisisir

terroc

ing97

aly-12

ster7ndash

ro-e

ewhen-

a-ld-

erann

Fig 5 Tissue correlation matrix Tissue vectors of the normalisignals (log transformed and scaled) were used to calculate clations between twenty tissue samples The correlations wereculated using the Pearson correlation coefficient and presentedtissue vector correlation matrix Different correlation levels are psented according to the color scale on the right PanelA presents thetissue vector correlation matrix for array A while panelB presentsarrays BndashE

array A (Fig 5A) and arrays BndashE (Fig 5B) shothe same pattern of correlations The high correlatifound between tissue replicates is an indicationthe validity of the results [2425] In addition thcorrelation matrix revealed three groups of closrelated tissues The first group with high inter-tisscorrelation includes brain and spinal-cord the secis heart and muscle and the last includes thymspleen bone-marrow and lung This is in agreemwith the expected biological origin of the tissue

since they represent the nervous system muscleimmune system respectively

All expression data raw as well as normalizehave been stored in the GeneNote database (httpgenecardsweizmannacilgenenote) GeneNote tis-sue vectors are provided in GeneCards (httpbioinfoweizmannacilgenecards) for all genes that couldbe explicitly assciated via the Affymetrix annotatio(currentlysim20 000 genes)

The preliminary results presented above providclear indication of the importance of surveying geexpression in an as broad as possible a geneThe combination of comprehensive experimental dgathering computer analysis and database disprovide relevant and useful tools for the transcriptocommunity

Acknowledgements

The work described in this paper was supporby the Abraham and Judith Goldwasser foundatDoron Lancet is the incumbent of the Ralph and LSilver Chair in Human Genomics Eytan Domanythe incumbent of the Henry J Leir Professorial Cha

References

[1] MB Eisen PT Spellman PO Brown D Botstein Clusanalysis and display of genome-wide expression patterns PNatl Acad Sci USA 95 (1998) 14863ndash14868

[2] G Getz E Levine E Domany Coupled two-way clusteranalysis of gene microarray data Proc Natl Acad Sci USA(2000) 12079ndash12084

[3] M Kurella LL Hsiao T Yoshida JD Randall G ChowSS Sarang RV Jensen SR Gullans DNA microarray ansis of complex biologic processes J Am Soc Nephrol(2001) 1072ndash1078

[4] A Sturn J Quackenbush Z Trajanoski Genesis cluanalysis of microarray data Bioinformatics 18 (2002) 20208

[5] G Zweiger Knowledge discovery in gene-expression-micarray data mining the information output of the genomTrends Biotechnol 17 (1999) 429ndash436

[6] T Strachan M Abitbol D Davidson JS Beckmann A ndimension for the human genome project towards compresive expression maps Nat Genet 16 (1997) 126ndash132

[7] M Safran V Chalifa-Caspi O Shmueli T Olender M Lpidot N Rosen M Shmoish Y Peter G Glusman E Femesser A Adato I Peter M Khen T Atarot Y GronD Lancet Human Gene-Centric Databases at the Weizm

1072 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

E

iblecle

t-ines

anl-nsre- J

rhschmes

oual-m

217

engar-of

ics 7

deion

n-nbe

h-el

-u-at-

s-

e-ith3ndash

geentd Pe 1

tou-ak-

ery09ndash

liccale

jutal

MC

g--

sity9

Institute of Science GeneCards UDB CroW 21 and HORDNucleic Acids Res 31 (2003) 142ndash146

[8] M Bakay P Zhao J Chen EP Hoffman A web-accesscomplete transcriptome of normal human and DMD musNeuromuscul Disord Suppl 12 (1) (2002) S125ndashS141

[9] TJ Mariani V Budhraja BH Mecham CC Gu MA Wason Y Sadovsky A variable fold-change threshold determsignificance for expression microarrays FASEB J (2002)

[10] CA Iacobuzio-Donahue A Maitra GL Shen-Ong T vHeek R Ashfaq R Meyer K Walter K Berg MA Holingsworth JL Cameron CJ Yeo SE Kern M GoggiRH Hruban Discovery of novel tumor markers of pancatic cancer using global gene expression technology AmPathol 160 (2002) 1239ndash1249

[11] AI Su MP Cooke KA Ching Y Hakak JR WalkeT Wiltshire AP Orth RG Vega LM Sapinoso A MoqricA Patapoutian GM Hampton PG Schultz JB HogeneLarge-scale analysis of the human and mouse transcriptoProc Natl Acad Sci USA 99 (2002) 4465ndash4470

[12] PM Haverty Z Weng NL Best KR Auerbach LL HsiaRV Jensen SR Gullans HugeIndex a database with visization tools for high-density oligonucleotide array data fronormal human tissues Nucleic Acids Res 30 (2002) 214ndash

[13] Affymetrix Inc GeneChip Expression Analysis 2001[14] LL Hsiao F Dangond T Yoshida R Hong RV Jens

J Misra W Dillon KF Lee KE Clark P Haverty Z WenGL Mutter MP Frosch ME Macdonald EL MilfordCP Crum R Bueno RE Pratt M Mahadevappa JA Wrington G Stephanopoulos SR Gullans A compendiumgene expression in normal human tissues Physiol Genom(2001) 97ndash104

[15] C Li WH Wong Model-based analysis of oligonucleotiarrays expression index computation and outlier detectProc Natl Acad Sci USA 98 (2001) 31ndash36

[16] R Irizarry B Hobbs F Collin YD Beazer-Barclay KJ Atonellis U Scherf TP Speed Exploration Normalizatioand Summaries of High Density Oligonucleotide Array ProLevel Data 2002

[17] R Sasik E Calvo J Corbeil Statistical analysis of higdensity oligonucleotide arrays a multiplicative noise modBioinformatics 18 (2002) 1633ndash1640

[18] BM Bolstad RA Irizarry M Astrand TP Speed A comparison of normalization methods for high density oligoncleotide array data based on variance and bias Bioinformics 19 (2003) 185ndash193

[19] E Hubbell WM Liu R Mei Robust estimators for expresion analysis Bioinformatics 18 (2002) 1585ndash1592

[20] WM Liu R Mei X Di TB Ryder E Hubbell S DeeTA Webster CA Harrington MH Ho J Baid SP Smekens Analysis of high density expression microarrays wsigned-rank call algorithms Bioinformatics 18 (2002) 1591599

[21] EC Mantzouranis SB Dowton AS Whitehead MD EdGA Bruns HR Colten Human serum amyloid P componcDNA isolation complete sequence of pre-serum amyloicomponent and localization of the gene to chromosomJ Biol Chem 260 (1985) 7752ndash7756

[22] TR Hughes MJ Marton AR Jones CJ Roberts R Sghton CD Armour HA Bennett E Coffey H Dai YDHe MJ Kidd AM King MR Meyer D Slade PY LumSB Stepaniants DD Shoemaker D Gachotte K Chraburtty J Simon M Bard SH Friend Functional discovvia a compendium of expression profiles Cell 102 (2000) 1126

[23] JL DeRisi VR Iyer PO Brown Exploring the metaboand genetic control of gene expression on a genomic sScience 278 (1997) 680ndash686

[24] M Bakay YW Chen R Borup P Zhao K NagaraEP Hoffman Sources of variability and effect of experimenapproach on expression profiling data interpretation BBioinformatics 3 (2002) 4

[25] KR Coombes WE Highsmith TA Krogmann KA Bagerly DN Stivers LV Abruzzo Identifying and quantifying sources of variation in microarray data using high-dencDNA membrane arrays J Comput Biol 9 (2002) 655ndash66

1068 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

n byhernt ofideda-iestis-lyres-as

yn-n-in apar-ovelionor

us-2]

sestypescecalandaseishichnts

ntof

essed0]an

G-eentionsthetantion

sive

-alo

n-ainertaterd

neuf--G-of

ne-du-et

ue

asuita

itsvia

ata

aluetheentplesand

en-t ofde-

l-es

1 Introduction

The human body orchestrates gene expressioco-regulating genes whose products function togetMany challenges are posed due to the vast amougenomic and expression data that has to be divinto functional biological groups using biological sttistical and computational tools Many past studcentered on comparisons of diseased to healthysues and were limited to a subset of human genes

High-density oligonucleotide arrays enable highparallel and comprehensive studies of gene expsion The transcription patterns produced knownthe expression profile depict the subset of mRNA sthesized in a certain cell or tissue At its most fudamental level the expression profile describesquantitative way which genes are expressed in aticular tissue More sophisticated issues such as ngene functional characterization gene identificatin biological pathways genetic variation analysisidentification of drug targets could be surveyed bying bioinformatics tools such as cluster analysis ([1and self organizing-maps [34])

The construction of gene expression databais a high priority of todayrsquos research communiSuch databases closely integrated with other tyof genomic information promise not only to enhanour understanding of many fundamental biologiprocesses but also to accelerate drug discoverylead to customized diagnosis and treatment of dise[56] An example generated by our own grouppresented the recent releases of GeneCards wcontain expression data both from array experimeand from lsquoelectronic Northernrsquo analyses [7]

As a significant extension of previous relevaefforts we describe a whole-genome repertoireexpression profiles in twelve normal human tissuPreviously only studies that compare a single diseatissue with a healthy one were preformed [8ndash1Other surveys that were done on healthy humtissues were limited to one array type (ie HU95A [11] Hu6800 [12]) Here for the first time wpresent the expression analysis of the full complemof more than 60 000 gene and EST representatin 12 normal human tissues It is shown thatadditional less characterized genes harbor imporinformation and that a whole-genome express

analysis is essential for generating comprehentranscription analyses

2 Materials and methods

PolyA+ RNA samples from twelve normal human tissues were purchased from Clontech (PAlto CA) This collection of major human tissues icludes Bone marrow (catalog number 6573-1) br(6516-1) heart (6533-1) kidney (6538-1) liv(6510-1) lung (6524-1) pancreas (6539-1) pros(6546-1) skeletal muscle (6541-1) spinal co(6593-1) spleen (6542-1) and thymus (6536-1)

Preparation and hybridization of cRNA were doaccording to the manufacturersquos instructions [13] Sficient hybridization cocktail solution for five independent hybridizations reactions (ie the full set HU95A-E) was prepared The hybridization reactionsall the arrays in the set were carried out simultaously The mRNA from each tissue was reacted inplicates against the full human Affymetrix arrays s(HG-U95A-E) to yield two sets of results The 3prime5primesignal ratios of GAPDH were always below the valof 3 as expected for a fine labeling reaction

Arrays were analyzed and expression value wcalculated for each gene by using Microarray S(MAS) version 50 software (Affymetrix Santa ClarCA) Expression values for each gene calledsignalwere calculated using the MAS 50 software withdefault parameter settings Scaling was not donea MAS 50 option Instead we normalized our das follows the intensities of each array were log10transformed and scaled to a constant reference v(global normalization) This reference value wasmean of all log intensities in all of the tissues Prescalls percentages are presented in Fig 2 for all samaccording to the array type (see also ResultsDiscussion)

3 Results and discussion

Whole-genome gene expression profiles were gerated from the normalized signal values The se12 tissue expression values for a given gene wasfined as itstissue vector Normalization was used to alow a meaningful comparison among different tissu

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1069

m-liza-

bu-ys its Ablyar-

heig 2ngell

rrayulteost

ileen-Se-

sig-m-sueach

e des of

elec

rrayporttheing

s

asedsuefor

es isar-re-ake

isin-

fact

nlycr-entrsquo7]tiveueelu-

Fig 1 Intensity distribution in the different arrays set in heart saple Density plots of one heart sample before and after normation are presented PanelA presents the density plot for log10 rawsignals while panelB presents the normalized signals

Fig 1 shows a comparison of the intensities distrition of one heart sample for the five different arrabefore and after normalization In this density plotis observed that the highest intensities are in arrayand B while arrays C D and E produce consideralower intensities These results are similar to the elier observations by Bakay et al [8] Aspects of tdecreased array intensities are also presented in Fwhich displays the percentage of lsquopresentrsquo calls alothe samples for each array type For example it is wobserved that the percentage of lsquopresentrsquo calls in aD is consistently below 10 This could be the resof intrinsic hybridization differences It could also bdue to the average quality of the probe-sets since mof the well-studied genes are found on array A wharrays B to E are constructed with increasing propsities of less well-characterized ESTs (Expressedquence Tags)

Tissue specific genes are supposed to have anificant role in tissue functionality Hence we exained the counts of such genes in the different tissamples (Fig 3) One representative sample for etissue was chosen and tissue specific genes wertermined on the basis of the lsquoabsentndashpresentrsquo callthe Affymetrix MAS 50 software A similar criterionwas used in the past for housekeepings genes s

-

-

Fig 2 Percent of present calls in all samples according to the atype lsquoPresentrsquo call percentages were taken from the MAS 50 refiles and are presented for the various array types in all ofsamples The AndashE letters represent the array type The followare the tissue samples shortcutsBMR for bone marrowBRN forbrain HRT for heartKDN for kidney LNG for lung LVR forliver MSL for musclePNC for pancreasPST for prostateSPCfor spinal cordSPL for spleen andTMS for thymus The numberattached to the sample name represent the replicate number

tion [14] In this paper we define tissue specificityhaving a lsquopresentrsquo call in only one tissue It is observthat brain and thymus had the highest number of tisspecific genes almost twice the number observedother tissues The propensity of tissue specific gennot much lower in arrays BndashE when compared toray A suggesting that important information alsosides within the former However one should also tinto consideration that the amount of lsquopresentrsquo callsreduced on arrays CndashE (Fig 2) Consequently thecreased effect of tissue specificity may be an artiof the array quality and design

The MAS 50 package is one of the most commoused software tools for analyzing high-density miroarray results However its use of lsquopresentndashabscalls is a controversial issue in the literature [15ndash1To address this we have begun to develop alternamethods utilizing a normalized entropy-related TissSpecificity Index (TSI) computed directly from thtissue vectors This algorithm is currently being eva

1070 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

rrayesentsple

genes

t al

ub-ct

un-elseen

en-ex-delshtillof

the

n a 4m)t Patedtoa

e ofthe

forand

dgigherlored

ive

ofsionilarncentalionsueat

Fig 3 Tissue specific genes in all of the tissues sorted by the aset Tissue specific genes were found based on the lsquoabsentndashprcall of the Affymetrix MAS 50 software Tissue specificity wadefined as having a lsquopresentrsquo call in only one tissue One samfor each tissue was chosen for the search The tissue specificdistribution on the arrays set HG-U95A-E is shown for all tissue

ated as compared to other methods (O Shmueli emanuscript in preparation)

Criticism has also been directed towards the straction of the mismatch probes (MM) from the perfematch probes (PM) creating PM-MM values as anderlying computation for assessing background levand signal computations Several methods have bdeveloped to evaluate the background level of intsity above which the value truly reflects the genepression value [151618ndash20] Some of these mofit log(PM-Background) instead of PM-MM thougin low-intensity regions the current models are svery questionable [16] Consequently the validitythe zone of low-intensity probes which is close tobackground level remains undetermined

The tissue vectors for each gene were drawn oroot scale representation [7] as exemplified in Figfor the gene APCS (Amyloid P-component seru[21] APCS is a precursor for amyloid componenwhich is found in basement membrane and associwith amyloid deposits A root scale enables onevisualize several orders of magnitude similar tologarithmic scale but preserves some advantagthe linear scale ie a differential increase with

rsquo

s

Fig 4 Tissue vector for the gene APCS [21] Tissue vectorthe gene APCS was calculated from the normalized signalsis presented in a graphical way (panelA) Tissues were groupeaccording to their origin and the groups colored accordingly (enerve tissues in green) The range between the lower and hmeasurements was represented by a white box above the cominimal measurement bar The graph is presented on they-axiswith a special root scale [7]Y = X(1B) whereB = log210 PanelB presents the tissue vector colored map

different orders of magnitude This affords an effectview-at-a-glance of the tissue vectors

A major aim of this work is to enable predictionthe function of novel genes based on their expresprofiles It is expected that genes that display simexpression patterns are functionally related sithey are co-regulated under all of the developmeconditions [2223] On the basis of the expressprofile similarities ten tissues were ordered in a tiscorrelation matrix (Fig 5) It is clearly seen th

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1071

zedorre-cal-in a

re-

wonsof

eelyueondusents

and

d

n

e anesetataplayme

tedionoisisir

terroc

ing97

aly-12

ster7ndash

ro-e

ewhen-

a-ld-

erann

Fig 5 Tissue correlation matrix Tissue vectors of the normalisignals (log transformed and scaled) were used to calculate clations between twenty tissue samples The correlations wereculated using the Pearson correlation coefficient and presentedtissue vector correlation matrix Different correlation levels are psented according to the color scale on the right PanelA presents thetissue vector correlation matrix for array A while panelB presentsarrays BndashE

array A (Fig 5A) and arrays BndashE (Fig 5B) shothe same pattern of correlations The high correlatifound between tissue replicates is an indicationthe validity of the results [2425] In addition thcorrelation matrix revealed three groups of closrelated tissues The first group with high inter-tisscorrelation includes brain and spinal-cord the secis heart and muscle and the last includes thymspleen bone-marrow and lung This is in agreemwith the expected biological origin of the tissue

since they represent the nervous system muscleimmune system respectively

All expression data raw as well as normalizehave been stored in the GeneNote database (httpgenecardsweizmannacilgenenote) GeneNote tis-sue vectors are provided in GeneCards (httpbioinfoweizmannacilgenecards) for all genes that couldbe explicitly assciated via the Affymetrix annotatio(currentlysim20 000 genes)

The preliminary results presented above providclear indication of the importance of surveying geexpression in an as broad as possible a geneThe combination of comprehensive experimental dgathering computer analysis and database disprovide relevant and useful tools for the transcriptocommunity

Acknowledgements

The work described in this paper was supporby the Abraham and Judith Goldwasser foundatDoron Lancet is the incumbent of the Ralph and LSilver Chair in Human Genomics Eytan Domanythe incumbent of the Henry J Leir Professorial Cha

References

[1] MB Eisen PT Spellman PO Brown D Botstein Clusanalysis and display of genome-wide expression patterns PNatl Acad Sci USA 95 (1998) 14863ndash14868

[2] G Getz E Levine E Domany Coupled two-way clusteranalysis of gene microarray data Proc Natl Acad Sci USA(2000) 12079ndash12084

[3] M Kurella LL Hsiao T Yoshida JD Randall G ChowSS Sarang RV Jensen SR Gullans DNA microarray ansis of complex biologic processes J Am Soc Nephrol(2001) 1072ndash1078

[4] A Sturn J Quackenbush Z Trajanoski Genesis cluanalysis of microarray data Bioinformatics 18 (2002) 20208

[5] G Zweiger Knowledge discovery in gene-expression-micarray data mining the information output of the genomTrends Biotechnol 17 (1999) 429ndash436

[6] T Strachan M Abitbol D Davidson JS Beckmann A ndimension for the human genome project towards compresive expression maps Nat Genet 16 (1997) 126ndash132

[7] M Safran V Chalifa-Caspi O Shmueli T Olender M Lpidot N Rosen M Shmoish Y Peter G Glusman E Femesser A Adato I Peter M Khen T Atarot Y GronD Lancet Human Gene-Centric Databases at the Weizm

1072 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

E

iblecle

t-ines

anl-nsre- J

rhschmes

oual-m

217

engar-of

ics 7

deion

n-nbe

h-el

-u-at-

s-

e-ith3ndash

geentd Pe 1

tou-ak-

ery09ndash

liccale

jutal

MC

g--

sity9

Institute of Science GeneCards UDB CroW 21 and HORDNucleic Acids Res 31 (2003) 142ndash146

[8] M Bakay P Zhao J Chen EP Hoffman A web-accesscomplete transcriptome of normal human and DMD musNeuromuscul Disord Suppl 12 (1) (2002) S125ndashS141

[9] TJ Mariani V Budhraja BH Mecham CC Gu MA Wason Y Sadovsky A variable fold-change threshold determsignificance for expression microarrays FASEB J (2002)

[10] CA Iacobuzio-Donahue A Maitra GL Shen-Ong T vHeek R Ashfaq R Meyer K Walter K Berg MA Holingsworth JL Cameron CJ Yeo SE Kern M GoggiRH Hruban Discovery of novel tumor markers of pancatic cancer using global gene expression technology AmPathol 160 (2002) 1239ndash1249

[11] AI Su MP Cooke KA Ching Y Hakak JR WalkeT Wiltshire AP Orth RG Vega LM Sapinoso A MoqricA Patapoutian GM Hampton PG Schultz JB HogeneLarge-scale analysis of the human and mouse transcriptoProc Natl Acad Sci USA 99 (2002) 4465ndash4470

[12] PM Haverty Z Weng NL Best KR Auerbach LL HsiaRV Jensen SR Gullans HugeIndex a database with visization tools for high-density oligonucleotide array data fronormal human tissues Nucleic Acids Res 30 (2002) 214ndash

[13] Affymetrix Inc GeneChip Expression Analysis 2001[14] LL Hsiao F Dangond T Yoshida R Hong RV Jens

J Misra W Dillon KF Lee KE Clark P Haverty Z WenGL Mutter MP Frosch ME Macdonald EL MilfordCP Crum R Bueno RE Pratt M Mahadevappa JA Wrington G Stephanopoulos SR Gullans A compendiumgene expression in normal human tissues Physiol Genom(2001) 97ndash104

[15] C Li WH Wong Model-based analysis of oligonucleotiarrays expression index computation and outlier detectProc Natl Acad Sci USA 98 (2001) 31ndash36

[16] R Irizarry B Hobbs F Collin YD Beazer-Barclay KJ Atonellis U Scherf TP Speed Exploration Normalizatioand Summaries of High Density Oligonucleotide Array ProLevel Data 2002

[17] R Sasik E Calvo J Corbeil Statistical analysis of higdensity oligonucleotide arrays a multiplicative noise modBioinformatics 18 (2002) 1633ndash1640

[18] BM Bolstad RA Irizarry M Astrand TP Speed A comparison of normalization methods for high density oligoncleotide array data based on variance and bias Bioinformics 19 (2003) 185ndash193

[19] E Hubbell WM Liu R Mei Robust estimators for expresion analysis Bioinformatics 18 (2002) 1585ndash1592

[20] WM Liu R Mei X Di TB Ryder E Hubbell S DeeTA Webster CA Harrington MH Ho J Baid SP Smekens Analysis of high density expression microarrays wsigned-rank call algorithms Bioinformatics 18 (2002) 1591599

[21] EC Mantzouranis SB Dowton AS Whitehead MD EdGA Bruns HR Colten Human serum amyloid P componcDNA isolation complete sequence of pre-serum amyloicomponent and localization of the gene to chromosomJ Biol Chem 260 (1985) 7752ndash7756

[22] TR Hughes MJ Marton AR Jones CJ Roberts R Sghton CD Armour HA Bennett E Coffey H Dai YDHe MJ Kidd AM King MR Meyer D Slade PY LumSB Stepaniants DD Shoemaker D Gachotte K Chraburtty J Simon M Bard SH Friend Functional discovvia a compendium of expression profiles Cell 102 (2000) 1126

[23] JL DeRisi VR Iyer PO Brown Exploring the metaboand genetic control of gene expression on a genomic sScience 278 (1997) 680ndash686

[24] M Bakay YW Chen R Borup P Zhao K NagaraEP Hoffman Sources of variability and effect of experimenapproach on expression profiling data interpretation BBioinformatics 3 (2002) 4

[25] KR Coombes WE Highsmith TA Krogmann KA Bagerly DN Stivers LV Abruzzo Identifying and quantifying sources of variation in microarray data using high-dencDNA membrane arrays J Comput Biol 9 (2002) 655ndash66

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1069

m-liza-

bu-ys its Ablyar-

heig 2ngell

rrayulteost

ileen-Se-

sig-m-sueach

e des of

elec

rrayporttheing

s

asedsuefor

es isar-re-ake

isin-

fact

nlycr-entrsquo7]tiveueelu-

Fig 1 Intensity distribution in the different arrays set in heart saple Density plots of one heart sample before and after normation are presented PanelA presents the density plot for log10 rawsignals while panelB presents the normalized signals

Fig 1 shows a comparison of the intensities distrition of one heart sample for the five different arrabefore and after normalization In this density plotis observed that the highest intensities are in arrayand B while arrays C D and E produce consideralower intensities These results are similar to the elier observations by Bakay et al [8] Aspects of tdecreased array intensities are also presented in Fwhich displays the percentage of lsquopresentrsquo calls alothe samples for each array type For example it is wobserved that the percentage of lsquopresentrsquo calls in aD is consistently below 10 This could be the resof intrinsic hybridization differences It could also bdue to the average quality of the probe-sets since mof the well-studied genes are found on array A wharrays B to E are constructed with increasing propsities of less well-characterized ESTs (Expressedquence Tags)

Tissue specific genes are supposed to have anificant role in tissue functionality Hence we exained the counts of such genes in the different tissamples (Fig 3) One representative sample for etissue was chosen and tissue specific genes wertermined on the basis of the lsquoabsentndashpresentrsquo callthe Affymetrix MAS 50 software A similar criterionwas used in the past for housekeepings genes s

-

-

Fig 2 Percent of present calls in all samples according to the atype lsquoPresentrsquo call percentages were taken from the MAS 50 refiles and are presented for the various array types in all ofsamples The AndashE letters represent the array type The followare the tissue samples shortcutsBMR for bone marrowBRN forbrain HRT for heartKDN for kidney LNG for lung LVR forliver MSL for musclePNC for pancreasPST for prostateSPCfor spinal cordSPL for spleen andTMS for thymus The numberattached to the sample name represent the replicate number

tion [14] In this paper we define tissue specificityhaving a lsquopresentrsquo call in only one tissue It is observthat brain and thymus had the highest number of tisspecific genes almost twice the number observedother tissues The propensity of tissue specific gennot much lower in arrays BndashE when compared toray A suggesting that important information alsosides within the former However one should also tinto consideration that the amount of lsquopresentrsquo callsreduced on arrays CndashE (Fig 2) Consequently thecreased effect of tissue specificity may be an artiof the array quality and design

The MAS 50 package is one of the most commoused software tools for analyzing high-density miroarray results However its use of lsquopresentndashabscalls is a controversial issue in the literature [15ndash1To address this we have begun to develop alternamethods utilizing a normalized entropy-related TissSpecificity Index (TSI) computed directly from thtissue vectors This algorithm is currently being eva

1070 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

rrayesentsple

genes

t al

ub-ct

un-elseen

en-ex-delshtillof

the

n a 4m)t Patedtoa

e ofthe

forand

dgigherlored

ive

ofsionilarncentalionsueat

Fig 3 Tissue specific genes in all of the tissues sorted by the aset Tissue specific genes were found based on the lsquoabsentndashprcall of the Affymetrix MAS 50 software Tissue specificity wadefined as having a lsquopresentrsquo call in only one tissue One samfor each tissue was chosen for the search The tissue specificdistribution on the arrays set HG-U95A-E is shown for all tissue

ated as compared to other methods (O Shmueli emanuscript in preparation)

Criticism has also been directed towards the straction of the mismatch probes (MM) from the perfematch probes (PM) creating PM-MM values as anderlying computation for assessing background levand signal computations Several methods have bdeveloped to evaluate the background level of intsity above which the value truly reflects the genepression value [151618ndash20] Some of these mofit log(PM-Background) instead of PM-MM thougin low-intensity regions the current models are svery questionable [16] Consequently the validitythe zone of low-intensity probes which is close tobackground level remains undetermined

The tissue vectors for each gene were drawn oroot scale representation [7] as exemplified in Figfor the gene APCS (Amyloid P-component seru[21] APCS is a precursor for amyloid componenwhich is found in basement membrane and associwith amyloid deposits A root scale enables onevisualize several orders of magnitude similar tologarithmic scale but preserves some advantagthe linear scale ie a differential increase with

rsquo

s

Fig 4 Tissue vector for the gene APCS [21] Tissue vectorthe gene APCS was calculated from the normalized signalsis presented in a graphical way (panelA) Tissues were groupeaccording to their origin and the groups colored accordingly (enerve tissues in green) The range between the lower and hmeasurements was represented by a white box above the cominimal measurement bar The graph is presented on they-axiswith a special root scale [7]Y = X(1B) whereB = log210 PanelB presents the tissue vector colored map

different orders of magnitude This affords an effectview-at-a-glance of the tissue vectors

A major aim of this work is to enable predictionthe function of novel genes based on their expresprofiles It is expected that genes that display simexpression patterns are functionally related sithey are co-regulated under all of the developmeconditions [2223] On the basis of the expressprofile similarities ten tissues were ordered in a tiscorrelation matrix (Fig 5) It is clearly seen th

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1071

zedorre-cal-in a

re-

wonsof

eelyueondusents

and

d

n

e anesetataplayme

tedionoisisir

terroc

ing97

aly-12

ster7ndash

ro-e

ewhen-

a-ld-

erann

Fig 5 Tissue correlation matrix Tissue vectors of the normalisignals (log transformed and scaled) were used to calculate clations between twenty tissue samples The correlations wereculated using the Pearson correlation coefficient and presentedtissue vector correlation matrix Different correlation levels are psented according to the color scale on the right PanelA presents thetissue vector correlation matrix for array A while panelB presentsarrays BndashE

array A (Fig 5A) and arrays BndashE (Fig 5B) shothe same pattern of correlations The high correlatifound between tissue replicates is an indicationthe validity of the results [2425] In addition thcorrelation matrix revealed three groups of closrelated tissues The first group with high inter-tisscorrelation includes brain and spinal-cord the secis heart and muscle and the last includes thymspleen bone-marrow and lung This is in agreemwith the expected biological origin of the tissue

since they represent the nervous system muscleimmune system respectively

All expression data raw as well as normalizehave been stored in the GeneNote database (httpgenecardsweizmannacilgenenote) GeneNote tis-sue vectors are provided in GeneCards (httpbioinfoweizmannacilgenecards) for all genes that couldbe explicitly assciated via the Affymetrix annotatio(currentlysim20 000 genes)

The preliminary results presented above providclear indication of the importance of surveying geexpression in an as broad as possible a geneThe combination of comprehensive experimental dgathering computer analysis and database disprovide relevant and useful tools for the transcriptocommunity

Acknowledgements

The work described in this paper was supporby the Abraham and Judith Goldwasser foundatDoron Lancet is the incumbent of the Ralph and LSilver Chair in Human Genomics Eytan Domanythe incumbent of the Henry J Leir Professorial Cha

References

[1] MB Eisen PT Spellman PO Brown D Botstein Clusanalysis and display of genome-wide expression patterns PNatl Acad Sci USA 95 (1998) 14863ndash14868

[2] G Getz E Levine E Domany Coupled two-way clusteranalysis of gene microarray data Proc Natl Acad Sci USA(2000) 12079ndash12084

[3] M Kurella LL Hsiao T Yoshida JD Randall G ChowSS Sarang RV Jensen SR Gullans DNA microarray ansis of complex biologic processes J Am Soc Nephrol(2001) 1072ndash1078

[4] A Sturn J Quackenbush Z Trajanoski Genesis cluanalysis of microarray data Bioinformatics 18 (2002) 20208

[5] G Zweiger Knowledge discovery in gene-expression-micarray data mining the information output of the genomTrends Biotechnol 17 (1999) 429ndash436

[6] T Strachan M Abitbol D Davidson JS Beckmann A ndimension for the human genome project towards compresive expression maps Nat Genet 16 (1997) 126ndash132

[7] M Safran V Chalifa-Caspi O Shmueli T Olender M Lpidot N Rosen M Shmoish Y Peter G Glusman E Femesser A Adato I Peter M Khen T Atarot Y GronD Lancet Human Gene-Centric Databases at the Weizm

1072 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

E

iblecle

t-ines

anl-nsre- J

rhschmes

oual-m

217

engar-of

ics 7

deion

n-nbe

h-el

-u-at-

s-

e-ith3ndash

geentd Pe 1

tou-ak-

ery09ndash

liccale

jutal

MC

g--

sity9

Institute of Science GeneCards UDB CroW 21 and HORDNucleic Acids Res 31 (2003) 142ndash146

[8] M Bakay P Zhao J Chen EP Hoffman A web-accesscomplete transcriptome of normal human and DMD musNeuromuscul Disord Suppl 12 (1) (2002) S125ndashS141

[9] TJ Mariani V Budhraja BH Mecham CC Gu MA Wason Y Sadovsky A variable fold-change threshold determsignificance for expression microarrays FASEB J (2002)

[10] CA Iacobuzio-Donahue A Maitra GL Shen-Ong T vHeek R Ashfaq R Meyer K Walter K Berg MA Holingsworth JL Cameron CJ Yeo SE Kern M GoggiRH Hruban Discovery of novel tumor markers of pancatic cancer using global gene expression technology AmPathol 160 (2002) 1239ndash1249

[11] AI Su MP Cooke KA Ching Y Hakak JR WalkeT Wiltshire AP Orth RG Vega LM Sapinoso A MoqricA Patapoutian GM Hampton PG Schultz JB HogeneLarge-scale analysis of the human and mouse transcriptoProc Natl Acad Sci USA 99 (2002) 4465ndash4470

[12] PM Haverty Z Weng NL Best KR Auerbach LL HsiaRV Jensen SR Gullans HugeIndex a database with visization tools for high-density oligonucleotide array data fronormal human tissues Nucleic Acids Res 30 (2002) 214ndash

[13] Affymetrix Inc GeneChip Expression Analysis 2001[14] LL Hsiao F Dangond T Yoshida R Hong RV Jens

J Misra W Dillon KF Lee KE Clark P Haverty Z WenGL Mutter MP Frosch ME Macdonald EL MilfordCP Crum R Bueno RE Pratt M Mahadevappa JA Wrington G Stephanopoulos SR Gullans A compendiumgene expression in normal human tissues Physiol Genom(2001) 97ndash104

[15] C Li WH Wong Model-based analysis of oligonucleotiarrays expression index computation and outlier detectProc Natl Acad Sci USA 98 (2001) 31ndash36

[16] R Irizarry B Hobbs F Collin YD Beazer-Barclay KJ Atonellis U Scherf TP Speed Exploration Normalizatioand Summaries of High Density Oligonucleotide Array ProLevel Data 2002

[17] R Sasik E Calvo J Corbeil Statistical analysis of higdensity oligonucleotide arrays a multiplicative noise modBioinformatics 18 (2002) 1633ndash1640

[18] BM Bolstad RA Irizarry M Astrand TP Speed A comparison of normalization methods for high density oligoncleotide array data based on variance and bias Bioinformics 19 (2003) 185ndash193

[19] E Hubbell WM Liu R Mei Robust estimators for expresion analysis Bioinformatics 18 (2002) 1585ndash1592

[20] WM Liu R Mei X Di TB Ryder E Hubbell S DeeTA Webster CA Harrington MH Ho J Baid SP Smekens Analysis of high density expression microarrays wsigned-rank call algorithms Bioinformatics 18 (2002) 1591599

[21] EC Mantzouranis SB Dowton AS Whitehead MD EdGA Bruns HR Colten Human serum amyloid P componcDNA isolation complete sequence of pre-serum amyloicomponent and localization of the gene to chromosomJ Biol Chem 260 (1985) 7752ndash7756

[22] TR Hughes MJ Marton AR Jones CJ Roberts R Sghton CD Armour HA Bennett E Coffey H Dai YDHe MJ Kidd AM King MR Meyer D Slade PY LumSB Stepaniants DD Shoemaker D Gachotte K Chraburtty J Simon M Bard SH Friend Functional discovvia a compendium of expression profiles Cell 102 (2000) 1126

[23] JL DeRisi VR Iyer PO Brown Exploring the metaboand genetic control of gene expression on a genomic sScience 278 (1997) 680ndash686

[24] M Bakay YW Chen R Borup P Zhao K NagaraEP Hoffman Sources of variability and effect of experimenapproach on expression profiling data interpretation BBioinformatics 3 (2002) 4

[25] KR Coombes WE Highsmith TA Krogmann KA Bagerly DN Stivers LV Abruzzo Identifying and quantifying sources of variation in microarray data using high-dencDNA membrane arrays J Comput Biol 9 (2002) 655ndash66

1070 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

rrayesentsple

genes

t al

ub-ct

un-elseen

en-ex-delshtillof

the

n a 4m)t Patedtoa

e ofthe

forand

dgigherlored

ive

ofsionilarncentalionsueat

Fig 3 Tissue specific genes in all of the tissues sorted by the aset Tissue specific genes were found based on the lsquoabsentndashprcall of the Affymetrix MAS 50 software Tissue specificity wadefined as having a lsquopresentrsquo call in only one tissue One samfor each tissue was chosen for the search The tissue specificdistribution on the arrays set HG-U95A-E is shown for all tissue

ated as compared to other methods (O Shmueli emanuscript in preparation)

Criticism has also been directed towards the straction of the mismatch probes (MM) from the perfematch probes (PM) creating PM-MM values as anderlying computation for assessing background levand signal computations Several methods have bdeveloped to evaluate the background level of intsity above which the value truly reflects the genepression value [151618ndash20] Some of these mofit log(PM-Background) instead of PM-MM thougin low-intensity regions the current models are svery questionable [16] Consequently the validitythe zone of low-intensity probes which is close tobackground level remains undetermined

The tissue vectors for each gene were drawn oroot scale representation [7] as exemplified in Figfor the gene APCS (Amyloid P-component seru[21] APCS is a precursor for amyloid componenwhich is found in basement membrane and associwith amyloid deposits A root scale enables onevisualize several orders of magnitude similar tologarithmic scale but preserves some advantagthe linear scale ie a differential increase with

rsquo

s

Fig 4 Tissue vector for the gene APCS [21] Tissue vectorthe gene APCS was calculated from the normalized signalsis presented in a graphical way (panelA) Tissues were groupeaccording to their origin and the groups colored accordingly (enerve tissues in green) The range between the lower and hmeasurements was represented by a white box above the cominimal measurement bar The graph is presented on they-axiswith a special root scale [7]Y = X(1B) whereB = log210 PanelB presents the tissue vector colored map

different orders of magnitude This affords an effectview-at-a-glance of the tissue vectors

A major aim of this work is to enable predictionthe function of novel genes based on their expresprofiles It is expected that genes that display simexpression patterns are functionally related sithey are co-regulated under all of the developmeconditions [2223] On the basis of the expressprofile similarities ten tissues were ordered in a tiscorrelation matrix (Fig 5) It is clearly seen th

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1071

zedorre-cal-in a

re-

wonsof

eelyueondusents

and

d

n

e anesetataplayme

tedionoisisir

terroc

ing97

aly-12

ster7ndash

ro-e

ewhen-

a-ld-

erann

Fig 5 Tissue correlation matrix Tissue vectors of the normalisignals (log transformed and scaled) were used to calculate clations between twenty tissue samples The correlations wereculated using the Pearson correlation coefficient and presentedtissue vector correlation matrix Different correlation levels are psented according to the color scale on the right PanelA presents thetissue vector correlation matrix for array A while panelB presentsarrays BndashE

array A (Fig 5A) and arrays BndashE (Fig 5B) shothe same pattern of correlations The high correlatifound between tissue replicates is an indicationthe validity of the results [2425] In addition thcorrelation matrix revealed three groups of closrelated tissues The first group with high inter-tisscorrelation includes brain and spinal-cord the secis heart and muscle and the last includes thymspleen bone-marrow and lung This is in agreemwith the expected biological origin of the tissue

since they represent the nervous system muscleimmune system respectively

All expression data raw as well as normalizehave been stored in the GeneNote database (httpgenecardsweizmannacilgenenote) GeneNote tis-sue vectors are provided in GeneCards (httpbioinfoweizmannacilgenecards) for all genes that couldbe explicitly assciated via the Affymetrix annotatio(currentlysim20 000 genes)

The preliminary results presented above providclear indication of the importance of surveying geexpression in an as broad as possible a geneThe combination of comprehensive experimental dgathering computer analysis and database disprovide relevant and useful tools for the transcriptocommunity

Acknowledgements

The work described in this paper was supporby the Abraham and Judith Goldwasser foundatDoron Lancet is the incumbent of the Ralph and LSilver Chair in Human Genomics Eytan Domanythe incumbent of the Henry J Leir Professorial Cha

References

[1] MB Eisen PT Spellman PO Brown D Botstein Clusanalysis and display of genome-wide expression patterns PNatl Acad Sci USA 95 (1998) 14863ndash14868

[2] G Getz E Levine E Domany Coupled two-way clusteranalysis of gene microarray data Proc Natl Acad Sci USA(2000) 12079ndash12084

[3] M Kurella LL Hsiao T Yoshida JD Randall G ChowSS Sarang RV Jensen SR Gullans DNA microarray ansis of complex biologic processes J Am Soc Nephrol(2001) 1072ndash1078

[4] A Sturn J Quackenbush Z Trajanoski Genesis cluanalysis of microarray data Bioinformatics 18 (2002) 20208

[5] G Zweiger Knowledge discovery in gene-expression-micarray data mining the information output of the genomTrends Biotechnol 17 (1999) 429ndash436

[6] T Strachan M Abitbol D Davidson JS Beckmann A ndimension for the human genome project towards compresive expression maps Nat Genet 16 (1997) 126ndash132

[7] M Safran V Chalifa-Caspi O Shmueli T Olender M Lpidot N Rosen M Shmoish Y Peter G Glusman E Femesser A Adato I Peter M Khen T Atarot Y GronD Lancet Human Gene-Centric Databases at the Weizm

1072 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

E

iblecle

t-ines

anl-nsre- J

rhschmes

oual-m

217

engar-of

ics 7

deion

n-nbe

h-el

-u-at-

s-

e-ith3ndash

geentd Pe 1

tou-ak-

ery09ndash

liccale

jutal

MC

g--

sity9

Institute of Science GeneCards UDB CroW 21 and HORDNucleic Acids Res 31 (2003) 142ndash146

[8] M Bakay P Zhao J Chen EP Hoffman A web-accesscomplete transcriptome of normal human and DMD musNeuromuscul Disord Suppl 12 (1) (2002) S125ndashS141

[9] TJ Mariani V Budhraja BH Mecham CC Gu MA Wason Y Sadovsky A variable fold-change threshold determsignificance for expression microarrays FASEB J (2002)

[10] CA Iacobuzio-Donahue A Maitra GL Shen-Ong T vHeek R Ashfaq R Meyer K Walter K Berg MA Holingsworth JL Cameron CJ Yeo SE Kern M GoggiRH Hruban Discovery of novel tumor markers of pancatic cancer using global gene expression technology AmPathol 160 (2002) 1239ndash1249

[11] AI Su MP Cooke KA Ching Y Hakak JR WalkeT Wiltshire AP Orth RG Vega LM Sapinoso A MoqricA Patapoutian GM Hampton PG Schultz JB HogeneLarge-scale analysis of the human and mouse transcriptoProc Natl Acad Sci USA 99 (2002) 4465ndash4470

[12] PM Haverty Z Weng NL Best KR Auerbach LL HsiaRV Jensen SR Gullans HugeIndex a database with visization tools for high-density oligonucleotide array data fronormal human tissues Nucleic Acids Res 30 (2002) 214ndash

[13] Affymetrix Inc GeneChip Expression Analysis 2001[14] LL Hsiao F Dangond T Yoshida R Hong RV Jens

J Misra W Dillon KF Lee KE Clark P Haverty Z WenGL Mutter MP Frosch ME Macdonald EL MilfordCP Crum R Bueno RE Pratt M Mahadevappa JA Wrington G Stephanopoulos SR Gullans A compendiumgene expression in normal human tissues Physiol Genom(2001) 97ndash104

[15] C Li WH Wong Model-based analysis of oligonucleotiarrays expression index computation and outlier detectProc Natl Acad Sci USA 98 (2001) 31ndash36

[16] R Irizarry B Hobbs F Collin YD Beazer-Barclay KJ Atonellis U Scherf TP Speed Exploration Normalizatioand Summaries of High Density Oligonucleotide Array ProLevel Data 2002

[17] R Sasik E Calvo J Corbeil Statistical analysis of higdensity oligonucleotide arrays a multiplicative noise modBioinformatics 18 (2002) 1633ndash1640

[18] BM Bolstad RA Irizarry M Astrand TP Speed A comparison of normalization methods for high density oligoncleotide array data based on variance and bias Bioinformics 19 (2003) 185ndash193

[19] E Hubbell WM Liu R Mei Robust estimators for expresion analysis Bioinformatics 18 (2002) 1585ndash1592

[20] WM Liu R Mei X Di TB Ryder E Hubbell S DeeTA Webster CA Harrington MH Ho J Baid SP Smekens Analysis of high density expression microarrays wsigned-rank call algorithms Bioinformatics 18 (2002) 1591599

[21] EC Mantzouranis SB Dowton AS Whitehead MD EdGA Bruns HR Colten Human serum amyloid P componcDNA isolation complete sequence of pre-serum amyloicomponent and localization of the gene to chromosomJ Biol Chem 260 (1985) 7752ndash7756

[22] TR Hughes MJ Marton AR Jones CJ Roberts R Sghton CD Armour HA Bennett E Coffey H Dai YDHe MJ Kidd AM King MR Meyer D Slade PY LumSB Stepaniants DD Shoemaker D Gachotte K Chraburtty J Simon M Bard SH Friend Functional discovvia a compendium of expression profiles Cell 102 (2000) 1126

[23] JL DeRisi VR Iyer PO Brown Exploring the metaboand genetic control of gene expression on a genomic sScience 278 (1997) 680ndash686

[24] M Bakay YW Chen R Borup P Zhao K NagaraEP Hoffman Sources of variability and effect of experimenapproach on expression profiling data interpretation BBioinformatics 3 (2002) 4

[25] KR Coombes WE Highsmith TA Krogmann KA Bagerly DN Stivers LV Abruzzo Identifying and quantifying sources of variation in microarray data using high-dencDNA membrane arrays J Comput Biol 9 (2002) 655ndash66

O Shmueli et al C R Biologies 326 (2003) 1067ndash1072 1071

zedorre-cal-in a

re-

wonsof

eelyueondusents

and

d

n

e anesetataplayme

tedionoisisir

terroc

ing97

aly-12

ster7ndash

ro-e

ewhen-

a-ld-

erann

Fig 5 Tissue correlation matrix Tissue vectors of the normalisignals (log transformed and scaled) were used to calculate clations between twenty tissue samples The correlations wereculated using the Pearson correlation coefficient and presentedtissue vector correlation matrix Different correlation levels are psented according to the color scale on the right PanelA presents thetissue vector correlation matrix for array A while panelB presentsarrays BndashE

array A (Fig 5A) and arrays BndashE (Fig 5B) shothe same pattern of correlations The high correlatifound between tissue replicates is an indicationthe validity of the results [2425] In addition thcorrelation matrix revealed three groups of closrelated tissues The first group with high inter-tisscorrelation includes brain and spinal-cord the secis heart and muscle and the last includes thymspleen bone-marrow and lung This is in agreemwith the expected biological origin of the tissue

since they represent the nervous system muscleimmune system respectively

All expression data raw as well as normalizehave been stored in the GeneNote database (httpgenecardsweizmannacilgenenote) GeneNote tis-sue vectors are provided in GeneCards (httpbioinfoweizmannacilgenecards) for all genes that couldbe explicitly assciated via the Affymetrix annotatio(currentlysim20 000 genes)

The preliminary results presented above providclear indication of the importance of surveying geexpression in an as broad as possible a geneThe combination of comprehensive experimental dgathering computer analysis and database disprovide relevant and useful tools for the transcriptocommunity

Acknowledgements

The work described in this paper was supporby the Abraham and Judith Goldwasser foundatDoron Lancet is the incumbent of the Ralph and LSilver Chair in Human Genomics Eytan Domanythe incumbent of the Henry J Leir Professorial Cha

References

[1] MB Eisen PT Spellman PO Brown D Botstein Clusanalysis and display of genome-wide expression patterns PNatl Acad Sci USA 95 (1998) 14863ndash14868

[2] G Getz E Levine E Domany Coupled two-way clusteranalysis of gene microarray data Proc Natl Acad Sci USA(2000) 12079ndash12084

[3] M Kurella LL Hsiao T Yoshida JD Randall G ChowSS Sarang RV Jensen SR Gullans DNA microarray ansis of complex biologic processes J Am Soc Nephrol(2001) 1072ndash1078

[4] A Sturn J Quackenbush Z Trajanoski Genesis cluanalysis of microarray data Bioinformatics 18 (2002) 20208

[5] G Zweiger Knowledge discovery in gene-expression-micarray data mining the information output of the genomTrends Biotechnol 17 (1999) 429ndash436

[6] T Strachan M Abitbol D Davidson JS Beckmann A ndimension for the human genome project towards compresive expression maps Nat Genet 16 (1997) 126ndash132

[7] M Safran V Chalifa-Caspi O Shmueli T Olender M Lpidot N Rosen M Shmoish Y Peter G Glusman E Femesser A Adato I Peter M Khen T Atarot Y GronD Lancet Human Gene-Centric Databases at the Weizm

1072 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

E

iblecle

t-ines

anl-nsre- J

rhschmes

oual-m

217

engar-of

ics 7

deion

n-nbe

h-el

-u-at-

s-

e-ith3ndash

geentd Pe 1

tou-ak-

ery09ndash

liccale

jutal

MC

g--

sity9

Institute of Science GeneCards UDB CroW 21 and HORDNucleic Acids Res 31 (2003) 142ndash146

[8] M Bakay P Zhao J Chen EP Hoffman A web-accesscomplete transcriptome of normal human and DMD musNeuromuscul Disord Suppl 12 (1) (2002) S125ndashS141

[9] TJ Mariani V Budhraja BH Mecham CC Gu MA Wason Y Sadovsky A variable fold-change threshold determsignificance for expression microarrays FASEB J (2002)

[10] CA Iacobuzio-Donahue A Maitra GL Shen-Ong T vHeek R Ashfaq R Meyer K Walter K Berg MA Holingsworth JL Cameron CJ Yeo SE Kern M GoggiRH Hruban Discovery of novel tumor markers of pancatic cancer using global gene expression technology AmPathol 160 (2002) 1239ndash1249

[11] AI Su MP Cooke KA Ching Y Hakak JR WalkeT Wiltshire AP Orth RG Vega LM Sapinoso A MoqricA Patapoutian GM Hampton PG Schultz JB HogeneLarge-scale analysis of the human and mouse transcriptoProc Natl Acad Sci USA 99 (2002) 4465ndash4470

[12] PM Haverty Z Weng NL Best KR Auerbach LL HsiaRV Jensen SR Gullans HugeIndex a database with visization tools for high-density oligonucleotide array data fronormal human tissues Nucleic Acids Res 30 (2002) 214ndash

[13] Affymetrix Inc GeneChip Expression Analysis 2001[14] LL Hsiao F Dangond T Yoshida R Hong RV Jens

J Misra W Dillon KF Lee KE Clark P Haverty Z WenGL Mutter MP Frosch ME Macdonald EL MilfordCP Crum R Bueno RE Pratt M Mahadevappa JA Wrington G Stephanopoulos SR Gullans A compendiumgene expression in normal human tissues Physiol Genom(2001) 97ndash104

[15] C Li WH Wong Model-based analysis of oligonucleotiarrays expression index computation and outlier detectProc Natl Acad Sci USA 98 (2001) 31ndash36

[16] R Irizarry B Hobbs F Collin YD Beazer-Barclay KJ Atonellis U Scherf TP Speed Exploration Normalizatioand Summaries of High Density Oligonucleotide Array ProLevel Data 2002

[17] R Sasik E Calvo J Corbeil Statistical analysis of higdensity oligonucleotide arrays a multiplicative noise modBioinformatics 18 (2002) 1633ndash1640

[18] BM Bolstad RA Irizarry M Astrand TP Speed A comparison of normalization methods for high density oligoncleotide array data based on variance and bias Bioinformics 19 (2003) 185ndash193

[19] E Hubbell WM Liu R Mei Robust estimators for expresion analysis Bioinformatics 18 (2002) 1585ndash1592

[20] WM Liu R Mei X Di TB Ryder E Hubbell S DeeTA Webster CA Harrington MH Ho J Baid SP Smekens Analysis of high density expression microarrays wsigned-rank call algorithms Bioinformatics 18 (2002) 1591599

[21] EC Mantzouranis SB Dowton AS Whitehead MD EdGA Bruns HR Colten Human serum amyloid P componcDNA isolation complete sequence of pre-serum amyloicomponent and localization of the gene to chromosomJ Biol Chem 260 (1985) 7752ndash7756

[22] TR Hughes MJ Marton AR Jones CJ Roberts R Sghton CD Armour HA Bennett E Coffey H Dai YDHe MJ Kidd AM King MR Meyer D Slade PY LumSB Stepaniants DD Shoemaker D Gachotte K Chraburtty J Simon M Bard SH Friend Functional discovvia a compendium of expression profiles Cell 102 (2000) 1126

[23] JL DeRisi VR Iyer PO Brown Exploring the metaboand genetic control of gene expression on a genomic sScience 278 (1997) 680ndash686

[24] M Bakay YW Chen R Borup P Zhao K NagaraEP Hoffman Sources of variability and effect of experimenapproach on expression profiling data interpretation BBioinformatics 3 (2002) 4

[25] KR Coombes WE Highsmith TA Krogmann KA Bagerly DN Stivers LV Abruzzo Identifying and quantifying sources of variation in microarray data using high-dencDNA membrane arrays J Comput Biol 9 (2002) 655ndash66

1072 O Shmueli et al C R Biologies 326 (2003) 1067ndash1072

E

iblecle

t-ines

anl-nsre- J

rhschmes

oual-m

217

engar-of

ics 7

deion

n-nbe

h-el

-u-at-

s-

e-ith3ndash

geentd Pe 1

tou-ak-

ery09ndash

liccale

jutal

MC

g--

sity9

Institute of Science GeneCards UDB CroW 21 and HORDNucleic Acids Res 31 (2003) 142ndash146

[8] M Bakay P Zhao J Chen EP Hoffman A web-accesscomplete transcriptome of normal human and DMD musNeuromuscul Disord Suppl 12 (1) (2002) S125ndashS141

[9] TJ Mariani V Budhraja BH Mecham CC Gu MA Wason Y Sadovsky A variable fold-change threshold determsignificance for expression microarrays FASEB J (2002)

[10] CA Iacobuzio-Donahue A Maitra GL Shen-Ong T vHeek R Ashfaq R Meyer K Walter K Berg MA Holingsworth JL Cameron CJ Yeo SE Kern M GoggiRH Hruban Discovery of novel tumor markers of pancatic cancer using global gene expression technology AmPathol 160 (2002) 1239ndash1249

[11] AI Su MP Cooke KA Ching Y Hakak JR WalkeT Wiltshire AP Orth RG Vega LM Sapinoso A MoqricA Patapoutian GM Hampton PG Schultz JB HogeneLarge-scale analysis of the human and mouse transcriptoProc Natl Acad Sci USA 99 (2002) 4465ndash4470

[12] PM Haverty Z Weng NL Best KR Auerbach LL HsiaRV Jensen SR Gullans HugeIndex a database with visization tools for high-density oligonucleotide array data fronormal human tissues Nucleic Acids Res 30 (2002) 214ndash

[13] Affymetrix Inc GeneChip Expression Analysis 2001[14] LL Hsiao F Dangond T Yoshida R Hong RV Jens

J Misra W Dillon KF Lee KE Clark P Haverty Z WenGL Mutter MP Frosch ME Macdonald EL MilfordCP Crum R Bueno RE Pratt M Mahadevappa JA Wrington G Stephanopoulos SR Gullans A compendiumgene expression in normal human tissues Physiol Genom(2001) 97ndash104

[15] C Li WH Wong Model-based analysis of oligonucleotiarrays expression index computation and outlier detectProc Natl Acad Sci USA 98 (2001) 31ndash36

[16] R Irizarry B Hobbs F Collin YD Beazer-Barclay KJ Atonellis U Scherf TP Speed Exploration Normalizatioand Summaries of High Density Oligonucleotide Array ProLevel Data 2002

[17] R Sasik E Calvo J Corbeil Statistical analysis of higdensity oligonucleotide arrays a multiplicative noise modBioinformatics 18 (2002) 1633ndash1640

[18] BM Bolstad RA Irizarry M Astrand TP Speed A comparison of normalization methods for high density oligoncleotide array data based on variance and bias Bioinformics 19 (2003) 185ndash193

[19] E Hubbell WM Liu R Mei Robust estimators for expresion analysis Bioinformatics 18 (2002) 1585ndash1592

[20] WM Liu R Mei X Di TB Ryder E Hubbell S DeeTA Webster CA Harrington MH Ho J Baid SP Smekens Analysis of high density expression microarrays wsigned-rank call algorithms Bioinformatics 18 (2002) 1591599

[21] EC Mantzouranis SB Dowton AS Whitehead MD EdGA Bruns HR Colten Human serum amyloid P componcDNA isolation complete sequence of pre-serum amyloicomponent and localization of the gene to chromosomJ Biol Chem 260 (1985) 7752ndash7756

[22] TR Hughes MJ Marton AR Jones CJ Roberts R Sghton CD Armour HA Bennett E Coffey H Dai YDHe MJ Kidd AM King MR Meyer D Slade PY LumSB Stepaniants DD Shoemaker D Gachotte K Chraburtty J Simon M Bard SH Friend Functional discovvia a compendium of expression profiles Cell 102 (2000) 1126

[23] JL DeRisi VR Iyer PO Brown Exploring the metaboand genetic control of gene expression on a genomic sScience 278 (1997) 680ndash686

[24] M Bakay YW Chen R Borup P Zhao K NagaraEP Hoffman Sources of variability and effect of experimenapproach on expression profiling data interpretation BBioinformatics 3 (2002) 4

[25] KR Coombes WE Highsmith TA Krogmann KA Bagerly DN Stivers LV Abruzzo Identifying and quantifying sources of variation in microarray data using high-dencDNA membrane arrays J Comput Biol 9 (2002) 655ndash66