Transcriptome Profiles: Diagnostic Signature of Dolphin Populations


Annalaura Mancia & Gregory W. Warr & Jonas S. Almeida & Artur Veloso & Randall S. Wells & Robert W. Chapman

Received: 19 March 2009 / Revised: 24 February 2010 / Accepted: 9 March 2010 / Published online: 14 April 2010
© Coastal and Estuarine Research Federation 2010

Abstract Peripheral blood leukocyte samples were collected from 151 common bottlenose dolphins (Tursiops truncatus) in the course of capture/release health evaluation studies at four different sites: Charleston Harbor, SC; Indian River Lagoon, FL; Sarasota Bay, FL; and St. Joseph Bay, FL. RNA was extracted and hybridized to a first-generation dolphin microarray. We tested the hypothesis that individual dolphins could be assigned to their home regions (by machine learning methods) using only their transcriptomic signatures as classifiers. The machine learning approaches used in this study were artificial neural networks (ANNs), which were able to identify gene expression differences between males and females in some geographical locations. As the sex ratios sampled in each location were not the same and could influence the classification of individuals to locations, males and females at each location were considered separately. ANNs were able to correctly classify dolphins according to their site of sampling with a high degree of confidence. The basis for this result may lie in genetic differences between populations, or in environmental factors (including, for example, diet, infection, contaminant load, or exposure to biotoxins), or in combinations of these factors. These results suggest that a combination of microarrays and machine learning analytical approaches will be a powerful approach to understanding the interaction of dolphins with the marine environment.

This material is based in part on the work supported by the National Science Foundation. Any opinion, finding, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Electronic supplementary material The online version of this article (doi:10.1007/s12237-010-9287-0) contains supplementary material, which is available to authorized users.

A. Mancia, G. W. Warr
Department of Biochemistry and Molecular Biology, The Medical University of South Carolina, Hollings Marine Laboratory, 331 Ft Johnson Road, Charleston, SC 29412, USA

A. Mancia, G. W. Warr
Marine Biomedicine and Environmental Sciences Center, The Medical University of South Carolina, Hollings Marine Laboratory, 331 Ft Johnson Road, Charleston, SC 29412, USA

A. Mancia
Department of Experimental Evolutionary Biology, University of Bologna, Via Selmi, 3, Bologna 40126, Italy

J. S. Almeida
M.D. Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030, USA

A. Veloso
The College of Charleston, Hollings Marine Laboratory, 331 Ft Johnson Road, Charleston, SC 29412, USA

R. S. Wells
Chicago Zoological Society, c/o Mote Marine Laboratory, 1600 Ken Thompson Parkway, Sarasota, FL 34236, USA

R. W. Chapman (*)
The South Carolina Department of Natural Resources, Hollings Marine Laboratory, 331 Fort Johnson Road, Charleston, SC 29412, USA
e-mail: [email protected]

Present Address: G. W. Warr
Division of Molecular and Cellular Biosciences, National Science Foundation, Arlington, VA 22230, USA

Estuaries and Coasts (2010) 33:919–929
DOI 10.1007/s12237-010-9287-0

Keywords Bottlenose dolphins · Population biology · cDNA microarray · Artificial neural network

Introduction

The common bottlenose dolphin, Tursiops truncatus, has been the focus of numerous investigations over the past few decades, including studies of contaminant loads (Duffield and Wells 1991) and population identification and structure, including genetics (Duffield and Wells 1991, 2002; Natoli et al. 2004; Sellas et al. 2005). Primarily, this research has been driven by the protected status of the animal and by concerns that environmental contamination resulting from human activities may pose substantial risks for an apex predator such as T. truncatus. Following the first determination of local residency for a population of bottlenose dolphins, in Sarasota Bay, Florida in 1970 (Irvine et al. 1981), philopatry has been demonstrated in inshore waters through much of the species’ range, primarily through the use of tagging, radio-tracking, and photographic identification techniques (Reynolds et al. 2000). Ongoing studies in Sarasota Bay have shown that this philopatry can continue over decades and multiple generations (Scott et al. 1990; Wells 1991). Although adult males may range outside of the general community range on occasion, most of the resident animals remain within a limited, definable range for most of their lives (Wells 1991; Owen et al. 2002). Genetic data provide additional evidence that discrete bottlenose dolphin populations exist on a local scale and that regional variations in body burdens of some contaminants may be correlated to local environments and the restricted movements of these animals. Studies of contaminants (Houde et al. 2006; Hansen et al. 2004; Fair et al. 2007; Stavros et al. 2007; Montie et al. 2007) found significant elevation in a number of organic contaminants in Charleston Harbor, SC and Beaufort, NC animals relative to Indian River Lagoon and Sarasota Bay, Florida populations, which is consistent with this hypothesis.

These results, collectively, are significant in that they suggest that dolphin populations cannot escape the legacy of contamination of the areas in which they live, as their philopatric nature, for the most part, precludes occupation of new and less contaminated areas. Studies on the impact of organic contamination on dolphins have shown elevated CYP1A1 levels correlated to PCB levels (Wilson et al. 2007), elevated risks to first-born calves (Wells et al. 2005), and increases in contaminant loads with increasing age in males (Hansen et al. 2004; Wells et al. 2005). The risks to first-born calves, as well as lower contaminant levels in females, are generally attributed to the offloading of lipophilic organic contaminants into milk during lactation. We have developed a microarray specifically to study stress and immune responses in the blood of dolphins (Mancia et al. 2007, 2008), and the present report details the application of this tool to the study of local populations of T. truncatus in coastal regions of the South East Atlantic and Gulf of Mexico.

The advent of advanced molecular tools such as microarrays has presented wildlife biologists with a greatly expanded repertoire of tools to assess the impact of environmental stressors on natural populations. This technology permits investigators to examine, with a single complex biosensor, literally thousands of genes that may respond to a given stressor or community of stressors. However, the capacity to examine such a large repertoire of the genome comes with a cost: the Curse of Dimensionality (Bellman 1961; Bishop 2006). The Curse refers to the difficulty in fitting models, estimating parameters, or optimizing functions in many dimensions simultaneously. Popular microarray analysis tools such as Genespring and Bioconductor assess the change in expression of individual genes as a result of experimental conditions and adjust for multiple simultaneous tests using the Benjamini and Hochberg (1995) correction. There are many recent illustrations of this in the literature (Whitehead and Crawford 2006a, b; Kennerly et al. 2008), and while the statistical exactitude of the formulations cannot be disputed, the approach implies that expression of each gene is independent of all the others. Identifying complex interactions among genes could be achieved by reversing the approach, using gene expression levels as inputs: essentially, a discriminant function analysis. This would require unreasonably large sample sizes to examine even a small number of genes using linear statistics. Even then, the linear models detect only non-additive effects and do not produce useful models of interactive dynamics.
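The multiple-testing adjustment mentioned above can be illustrated with a minimal sketch of the Benjamini and Hochberg (1995) step-up procedure. The function name and the p values below are hypothetical and chosen only for illustration; they are not taken from the study, and packages such as limma provide their own implementations.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only, not from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_adjusted = benjamini_hochberg(example_p)
# approximately [0.005, 0.02, 0.05125, 0.05125, 0.6]
print(example_adjusted)
```

Note that each gene is still tested on its own here; the adjustment controls the false discovery rate but does not model interactions among genes, which is the limitation discussed above.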

The purpose of this study was to examine the differences in gene expression profiles among wild dolphin populations using advanced machine learning tools. While the ability of machine learning methods to fit any given input data set to an output is a general property of artificial intelligence (e.g., machine learning tools), the artificial neural networks (ANNs) that have been primarily used in this study have some particularly attractive properties for forecasting and modeling dynamical systems (Weigend et al. 1995; Voit and Almeida 2004). In particular, ANNs thrive in modeling non-linear interactions among input variables as output classifiers and are, therefore, well suited to dealing with transcriptomic signatures, which almost certainly involve non-linear interdependencies among the inputs. ANNs have been used extensively in medical science to classify disease status based upon expression profiles (Khan et al. 2001; Wei et al. 2004; Linder et al. 2004; Dankbar et al. 2007). While the approach is attractive for a number of mathematical and biological reasons, it does not escape the “Curse”, and dimensionality reduction is necessary to build robust models (Bishop 2006). Most studies employing ANN modeling approaches of microarray data have used linear analysis tools (significant up- or down-regulation), expert knowledge, principal component analysis, or some combination of these to select genes for inputs to the ANN. This re-introduces linear methods to the analyses, since linear tools are being used to select inputs for a non-linear analysis, and has the potential to degrade the predictive value of microarrays. However, ANNs also provide a potential solution to this problem in that they can select genes, using non-linear methods, for input into the analysis. Specifically, the models generated by an initial training session using all genes present on a microarray (after filtering non-responsive elements) can be interrogated for responses of the outputs associated with changes in these inputs. These are essentially the derivatives of the ANN function but are computed numerically and called “sensitivities”, as they are an assessment of the sensitivity of the output (dependent variable) to changes in the inputs (independent variables). The accuracy, as classifiers, of analytical methods using genes selected in this manner can be estimated by receiver operating characteristic (ROC) curves.
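The idea of a numerically computed sensitivity can be sketched as a central finite difference on a small network. This is a minimal illustration under stated assumptions: the one-hidden-layer architecture, the weights, the step size, and the function names are all hypothetical and do not describe the ANN software used in the study.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ann_output(inputs, hidden_weights, output_weights):
    """One-hidden-layer network with sigmoidal units and a single output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


def sensitivities(inputs, hidden_weights, output_weights, step=1e-4):
    """Central finite-difference estimate of d(output)/d(input_i)."""
    result = []
    for i in range(len(inputs)):
        up = list(inputs)
        down = list(inputs)
        up[i] += step
        down[i] -= step
        diff = (ann_output(up, hidden_weights, output_weights)
                - ann_output(down, hidden_weights, output_weights))
        result.append(diff / (2.0 * step))
    return result


# Hypothetical weights for three "genes"; the third gene has zero weight
# on every hidden unit, so its sensitivity is zero.
hidden_w = [[0.8, -0.5, 0.0],
            [0.3, 0.9, 0.0]]
output_w = [1.2, -0.7]
expression = [0.5, -1.0, 2.0]
gene_sensitivities = sensitivities(expression, hidden_w, output_w)
print(gene_sensitivities)
```

A gene whose perturbation leaves the output unchanged receives a sensitivity near zero and would rank low in a sensitivity-based selection.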

In this study, we used the machine learning approaches discussed above to test the hypothesis that the peripheral blood transcriptome of bottlenose dolphins can be used as a classifier of geographic location.

Materials and Methods

Sample Collection

Blood samples were collected during capture-release studies at four different locations in the USA. The locations were along the US east coast, in Charleston, SC and Indian River Lagoon, FL, and in the Gulf of Mexico, in Sarasota Bay, FL and St. Joseph Bay, FL. The samples were collected during health assessment capture-release operations (Wells et al. 2004) under National Marine Fisheries Service (NMFS) Permits # 998-1678-01, 522-1785, and 932-1489-09 to Gregory Bossart, Randall Wells, and Teri Rowles, respectively. A total of 151 individuals were sampled between June of 2003 and June of 2006, of which 59 individuals were from Charleston; 35 from Indian River Lagoon, FL; 32 from Sarasota Bay, FL; and 25 from St. Joseph Bay, FL. The animals’ IDs and other available information are listed in the Electronic supplementary material, Table S1. All of the dolphins sampled from Sarasota Bay, FL were identified as residents of the region based on long-term studies (Scott et al. 1990; Wells 1991).

Many of the individuals were sampled more than once as either part of an investigation of the effect of veterinary examination on transcript profiles or through capture and sample collection in multiple years. All duplicate samples have been removed from the current data analysis in order to avoid complications of replicate sampling. Blood samples were collected in PAXgene™ Blood tubes (Qiagen, Valencia, CA, USA), mixed immediately to lyse the blood cells and stabilize the RNA, and stored according to the manufacturer’s instructions, i.e., at room temperature for up to 24 h prior to RNA purification, and at 4°C when longer storage times were needed.

Probe Preparation and Complementary DNA Microarray Hybridization

RNA was extracted with PAXgene™ Blood RNA kits (Qiagen), and the quantity and integrity of the extracted RNA were determined by spectrophotometry and by electrophoresis in a 1% agarose gel. Total RNA (1–2 µg) was used to produce Cy3-labeled aminoallyl RNA (Cy3-aaRNA) probe using the Amino Allyl MessageAmp™ Kit (Ambion Inc, Austin, TX, USA), according to the manufacturer’s instructions, and 10 µg of the subsequently produced Cy3-aaRNA was used to probe a custom dolphin peripheral blood leukocyte complementary DNA (cDNA) microarray as previously described (Mancia et al. 2007). Microarray slides were scanned with ScanArray™ Express and SpotArray software at 80 V PMT and analyzed with QuantArray software (Perkin Elmer, Boston, MA, USA). The microarray hybridization data and the MIAME protocols have been deposited at the GEO site, www.ncbi.nlm.nih.gov/geo/ (Accession nos. GSM267587–GSM267737), and at www.marinegenomics.org.

Data Processing via Bioconductor

The QuantArray files were uploaded into R/Bioconductor (Gentleman et al. 2004) using the “limma” package (Smyth et al. 2005; Smyth 2005). The data were background-corrected using the “normexp” method, which fits a convolution model to the background and foreground intensities using maximum likelihood (Ritchie et al. 2007). The corrected data were normalized using the “loess” method to adjust for print-order effects that could have been generated by differences in the cDNA batches printed onto the microarrays (Smyth and Speed 2003). Following the normalization, the genes for which at least one of the replicates failed to exceed the intensity of non-dolphin genes (e.g., Duck IgY Heavy Chain and Karenia brevis photolyase) in more than 10% of the slides were eliminated from the analysis. The remaining spots were normalized using the “VSN” method (Huber et al. 2002), and the duplicates were averaged using a correlation factor calculated by the “limma” package (Smyth 2005). In order to select the genes best suited to discern differences between the sexes, an empirical Bayesian approach implemented in the package “limma” was taken to shrink the standard errors towards a common value (Smyth 2004). Next, moderated t-statistics for each of the probes in each of the populations were calculated (Smyth 2004), and the p values obtained were adjusted using the false discovery rate method (Benjamini and Hochberg 1995). Gene selection was also conducted via ANNs between the sexes within each location following procedures similar to those described below.
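The variance shrinkage behind a moderated t-statistic (Smyth 2004) can be sketched as follows. In limma, the prior variance and prior degrees of freedom are estimated from all genes by empirical Bayes; in this sketch they are simply supplied as hypothetical inputs, and all numerical values are illustrative rather than taken from the study.

```python
import math


def moderated_t(effect, sample_var, df, prior_var, prior_df, unscaled_var):
    """Moderated t: the gene-wise variance is shrunk towards a prior value.

    posterior_var = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    """
    posterior_var = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    return effect / math.sqrt(posterior_var * unscaled_var)


def ordinary_t(effect, sample_var, unscaled_var):
    """Classical t-statistic using only the gene's own variance."""
    return effect / math.sqrt(sample_var * unscaled_var)


# Hypothetical gene with an unusually small sample variance.
t_ord = ordinary_t(effect=1.0, sample_var=0.01, unscaled_var=0.5)
t_mod = moderated_t(effect=1.0, sample_var=0.01, df=4,
                    prior_var=0.25, prior_df=4, unscaled_var=0.5)
print(t_ord, t_mod)
```

Shrinking towards the common value reduces the statistic for a gene whose variance happens to be estimated as very small, which is the stabilizing effect the empirical Bayesian step is meant to provide.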

Gene Selection via ANNs

General machine learning approaches encapsulated in artificial neural networks have previously been applied to the analysis of microarray data (Khan et al. 2001; Wei et al. 2004); the basic formulation of our approach is shown in Fig. 1 and is similar to that employed by Wei et al. (2004). An initial training of ANNs was conducted using the entire set of VSN transformed expression data that passed Bioconductor filtering (i.e., above background). Five models were run for each population, keeping the sexes separate, using a one-vs-rest approach (e.g., Charleston Harbor males vs males in all other populations) and withholding a random selection (1/7) of the microarray records from each population as a cross validation set to prevent overtraining of the ANN (Haykin 1999). The model with the highest R-square for the training data from each population comparison was used to compute the sensitivities of the individual genes. Sensitivities in this context are the partial derivatives of the weight and sigmoidal transfer function for each gene. The sensitivities across all populations were then averaged and ranked to select the top 250 most “important” genes (Fig. 2).

Dolphin ESTs selected by Bioconductor and ANN analyses were blasted against the non-redundant protein sequences database (nr) at NCBI (http://www.ncbi.nlm.nih.gov/). The blastx cutoff value was 1.0E−3. The blastx search was updated on October 12, 2009.
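The averaging and ranking step can be sketched as below. The text does not state whether signed or absolute sensitivities were averaged; this sketch assumes absolute values so that opposite signs in different comparisons do not cancel. The function name, comparison labels, gene identifiers, and values are hypothetical.

```python
def rank_genes_by_sensitivity(sens_by_comparison, n_top=250):
    """Rank genes by mean absolute sensitivity across comparisons.

    sens_by_comparison maps a comparison label to {gene id: sensitivity}.
    """
    comparisons = list(sens_by_comparison.values())
    genes = list(comparisons[0].keys())
    # Assumption: average absolute values (not stated in the text).
    mean_abs = {
        gene: sum(abs(c[gene]) for c in comparisons) / len(comparisons)
        for gene in genes
    }
    ranked = sorted(genes, key=lambda gene: mean_abs[gene], reverse=True)
    return ranked[:n_top]


# Hypothetical sensitivities from two one-vs-rest models.
example = {
    "CHS_vs_rest": {"geneA": 0.9, "geneB": -0.1, "geneC": 0.4},
    "SAR_vs_rest": {"geneA": -0.7, "geneB": 0.2, "geneC": 0.5},
}
print(rank_genes_by_sensitivity(example, n_top=2))  # ['geneA', 'geneC']
```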

Machine Learning Analysis

Following the selection of genes using the sensitivity analysis in the ANNs, a second round of ANN training was conducted, withholding 10% of the available data as a cross validation set that was distinct from the data used in early stopping of the ANN training session. The top 250 genes were used to train 20 or more ANNs. The design was balanced by comparing each location pairwise to the others and taking 90% of the smaller of the two sample sizes and an equal number of the larger set for training the ANNs. The remaining samples were allocated to the cross validation (CV) set. Each round of training produces a model that can be used to predict the state of the cross validation set (which is known) and thus generate a correlation between observed and predicted values. This is useful in examining the predictive value of any ANN model to data that have not been used in the training session, using standard correlation coefficients comparing the observed value and that predicted by the ANN.
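The balanced pairwise design can be sketched as a simple sampling routine. The function name, the sample identifiers, the random seed, and the choice to round 90% of the smaller group down to a whole number are assumptions made for illustration; the group sizes in the example are the female sample sizes reported for CHS and IRL in Fig. 3.

```python
import random


def balanced_split(group_a, group_b, train_fraction=0.9, seed=0):
    """Draw equal-sized training sets from two locations.

    The training size per group is train_fraction of the smaller group
    (rounded down, an assumption); all remaining samples go to the
    cross validation (CV) set.
    """
    rng = random.Random(seed)
    n_train = int(train_fraction * min(len(group_a), len(group_b)))
    a = list(group_a)
    b = list(group_b)
    rng.shuffle(a)
    rng.shuffle(b)
    train = a[:n_train] + b[:n_train]
    cv = a[n_train:] + b[n_train:]
    return train, cv


# Hypothetical sample identifiers for 25 CHS and 15 IRL females.
chs = ["CHS_F%02d" % i for i in range(25)]
irl = ["IRL_F%02d" % i for i in range(15)]
train_set, cv_set = balanced_split(chs, irl)
print(len(train_set), len(cv_set))  # 26 14
```

With these sizes, 13 animals per location are used for training, and the CV set is dominated by the larger group, which is one reason to evaluate performance with a threshold-free measure such as the area under the ROC curve.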

Fig. 1 Schematic flow chart of the ANN analysis of microarray data

To compare the precision of the classifications derived from the ANNs, we used ROC curves and computed the area under the curve (AUC) as well as the standard error (SE) using only the CV data. For justifications of this method for comparing various classifiers, see Bradley (1997) and Stephan et al. (2003). Statistical differences between two comparisons can then be assessed using standard z-scores (Hanley and McNeil 1982). When comparing two diagnostic or classificatory procedures, the calculation of the SE should include a measure of the correlation between the two AUCs, due to sampling of the same set of cases. The sampling procedure outlined above does not exclude the possibility that some ROCs include some of the same individuals, but it is not clear how the correlations should be estimated when some, but not all, cases are common to both analyses. We assume that the correlations are sufficiently small to be ignored, but have adjusted the significance level (α = 0.01) to compensate for this potential bias.
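The AUC, its standard error, and the z-score comparison can be sketched as follows, using the standard error formula of Hanley and McNeil (1982) and, as assumed in the text, ignoring the correlation between the two AUCs. The function names and the scores in the example are hypothetical.

```python
import math


def auc(pos_scores, neg_scores):
    """Probability that a positive case outscores a negative (ties = 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of an AUC following Hanley and McNeil (1982)."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    variance = (a * (1.0 - a)
                + (n_pos - 1) * (q1 - a * a)
                + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(variance)


def z_between(auc_1, se_1, auc_2, se_2):
    """z-score for two AUCs, with their correlation assumed to be zero."""
    return (auc_1 - auc_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


# Hypothetical ANN outputs for CV animals from two locations.
location_a = [0.9, 0.8, 0.4]
location_b = [0.7, 0.3, 0.2]
area = auc(location_a, location_b)
se = hanley_mcneil_se(area, len(location_a), len(location_b))
print(area, se)
# For a two-sided test at alpha = 0.01, |z| must exceed about 2.576.
```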

Results and Discussion

Transcriptomic profiles were obtained from the blood of four populations of dolphins (Charleston Harbor, Indian River Lagoon, St. Joseph Bay, and Sarasota Bay), and the null hypothesis to be tested was that these transcriptomic profiles would not show location-specific signatures. However, within each population, there were unknown variables (such as genotype, health history, current health status, contaminant load) as well as known variables such as age and sex. Thus, an important question was whether, in comparing the different populations of dolphins, males and females should be considered separately.

Sex Differences Within Each Location Can Be Discriminated Using ANNs

As males and females are likely to express a somewhat different suite of genes and the individual locations were not always balanced with respect to sex ratios, it was possible that comparisons of locations using all individuals could be biased by differences in the proportions of each sex sampled in each location. To assess this bias, we compared the expression profiles between the sexes using ANNs as described above (Fig. 1) on the basis of their sensitivities (ANN selected genes are listed in the Electronic supplementary material, Table S2). These results (Table 1) clearly indicate gene expression differences between the sexes in Charleston Harbor and St. Joseph Bay, but not between the sexes in Indian River Lagoon and Sarasota Bay. These analyses indicated that differences in gene expression between the sexes could bias discrimination of dolphins from Charleston Harbor, St. Joseph Bay, and (to a lesser extent) Sarasota Bay, and that the assessment of between-location population differences should be conducted for each sex independently. We deem this finding an important cautionary note. While sexual differences in gene expression have received considerable attention in the past decade (Jiang and Machado 2009 and references therein), few investigators addressing the impacts of environmental conditions on natural populations have accounted for this potential bias in sexually dimorphic species (Waters et al. 2003; Whitehead and Crawford 2006a, b). If the sample sizes within contrasted groups are fairly large (>30) and the sex ratios relatively stable, it is unlikely that this complication will strongly influence the results. However, if sample sizes are small, unequal sampling of the sexes could influence the results.
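One way to read the Table 1 values is to compare each AUC with the chance value of 0.5 using its reported standard error. The sketch below assumes a normal approximation and a two-sided test, neither of which is specified in the text; the AUC and SE values are those listed in Table 1, while the function and variable names are hypothetical.

```python
import math

# AUC and SE for male vs female discrimination, as listed in Table 1.
table1 = {
    "CHS": (0.6598, 0.0416),
    "IRL": (0.5843, 0.0708),
    "SAR": (0.6275, 0.072),
    "SJB": (0.7019, 0.0749),
}


def z_vs_chance(auc_value, se):
    """z-score of an AUC against the chance expectation of 0.5."""
    return (auc_value - 0.5) / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


significant = []
for site, (auc_value, se) in table1.items():
    z = z_vs_chance(auc_value, se)
    p = two_sided_p(z)
    print(site, round(z, 2), round(p, 4))
    if p < 0.05:
        significant.append(site)

print(sorted(significant))  # ['CHS', 'SJB']
```

Under these assumptions, CHS and SJB fall below p = 0.05 while IRL and SAR do not, with SAR the closer of the two to the threshold, which agrees with the description of the results given above.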

Discrimination of Locations Using ANNs

The results of the ANN classification of individual transcript profiles to location are presented in Fig. 3. In all comparisons, both males and females were correctly assigned to location of collection above random expectations, and of the 10 total comparisons, all but 2 were greater than 90% accurate. Both of these involved comparison of male dolphins, but this should not be taken as an indication that classification of males should have been less precise than females, as this difference is not significant via a sign test. This being said, we would have expected the reverse, that when a difference in the discrimination appeared between the sexes, the classification of females should have been less precise due to their ability to shed certain contaminants (Wells et al. 2005) via lactation, and thus, their gene expression profiles would have been less discriminatory. This was not the case.

Fig. 2 Sensitivity plot of gene responses to the parameter “geographic location”. Sensitivities (plotted on the ordinate) provide an assessment of how changes in the input (gene number) impact the output

The Differentially Expressed Genes

Although the gene expression profiles, considered as overall signatures, provide useful classificatory tools as described above, nevertheless there is also a great deal of information that can be inferred from examining which genes show significantly different levels of expression.

The Bioconductor selection identified 171 unigenes for male dolphins and 195 for females as being significantly differentially regulated (less redundancies). These genes, listed in the Electronic supplementary material, Table S3, were used to generate heatmaps using Euclidean distance and hierarchical clustering as implemented in the R package “gplots”.
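The clustering underlying such heatmaps can be sketched as Euclidean distance between location profiles followed by agglomerative clustering. The study used the R package “gplots”; the Python sketch below is only an illustration, the choice of complete linkage is an assumption, and the expression profiles are hypothetical values constructed for the example, not data from the study.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order made.

    profiles maps a label to a list of expression values.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical mean expression profiles for three genes per location.
example_profiles = {
    "CHS": [1.0, 2.0, 0.5],
    "IRL": [1.1, 2.1, 0.4],
    "SJB": [2.0, 1.0, 1.5],
    "SAR": [4.0, 0.0, 3.0],
}
for left, right, distance in complete_linkage(example_profiles):
    print(left, right, round(distance, 3))
```

The order of the merges corresponds to the branching order of the dendrogram drawn alongside a heatmap: the first pair merged is the most similar, and the last group joined forms the basal branch.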

The results (Fig. 4a, b) showed identical clustering patterns for males and females, with the closest relationship being between the dolphins from the Southeast US (Charleston and Indian River Lagoon), with the St. Joseph Bay animals being the next most closely related and the Sarasota Bay animals forming a separate basal branch. From the ANN analyses, the 250 genes with the highest sensitivities were identified (Electronic supplementary material, Table S4) for both male and female dolphins and used to generate heatmaps (Fig. 5a, b). As for the Bioconductor-selected genes, these analyses showed that for both males and females the dolphins from Charleston and Indian River Lagoon clustered together. However, while the males showed the St. Joseph Bay dolphins as being the next most closely related (Fig. 5a), for females it was the Sarasota Bay dolphins (Fig. 5b). A global comparison of the genes differentially expressed in both male and female dolphins in the two types of analysis selected the 130 genes that have been used to generate the Venn diagram in Fig. 6. This shows the number of differentially expressed genes, for males and females, that were found to be significant in the determination of the location. Male and female dolphins share 45 genes uniquely identified by Bioconductor analysis and 44 genes uniquely identified by the ANN analysis. Overall, 41 genes were identified as significantly informative in both ANN and Bioconductor analyses (Fig. 6).
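The partition summarized in the Venn diagram can be sketched with set operations. The gene identifiers below are hypothetical placeholders constructed so that the counts match those reported in the text (45 Bioconductor-only, 44 ANN-only, 41 shared, 130 in total); they are not the actual gene lists.

```python
def venn_partition(bioconductor_genes, ann_genes):
    """Split two gene lists into method-specific and shared sets."""
    b = set(bioconductor_genes)
    a = set(ann_genes)
    return {
        "bioconductor_only": b - a,
        "ann_only": a - b,
        "both": b & a,
    }


# Hypothetical identifiers arranged to reproduce the reported counts.
bioconductor_list = ["g%03d" % i for i in range(0, 86)]   # 86 genes
ann_list = ["g%03d" % i for i in range(45, 130)]          # 85 genes
parts = venn_partition(bioconductor_list, ann_list)
counts = {name: len(genes) for name, genes in parts.items()}
print(counts)  # {'bioconductor_only': 45, 'ann_only': 44, 'both': 41}
print(sum(counts.values()))  # 130
```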

Although a large proportion of the genes identified as differentially regulated were not common to the ANN and Bioconductor analyses (Fig. 6), nevertheless when typical cluster analyses were carried out, the inferred relationships were almost identical (Figs. 4 and 5).

As would be expected with an array that was focused on stress and immune response, many of the genes found to be differentially expressed between populations reflect this bias of the array. However, this does not detract from the legitimacy of the observed changes in gene expression. It is interesting that while some of the genes that were differentially expressed between males and females are sex-specific (for example, the breast and ovarian cancer susceptibility protein, the ribosomal protein X-linked, dpy-30 homolog), major differences between the sexes were observed in immune function genes (Electronic supplementary material, Tables S3 and S4). The Bioconductor and ANN analyses identified in females a variety of innate and adaptive immune receptors (including TLR-1, TLR-4, TLR-7, TLR-8, TCR and CD48). In contrast, in males the informatively regulated genes included not only receptors and signal transduction molecules (TLR-1, TLR-3, TLR-6, BCR, CD47, CD79, STAT1) but also interleukins (IL-1β and IL-13). Male populations also showed differential expression of genes for muscle structural proteins such as myosin and tropomyosin, probably correlated with the relatively greater development of muscle mass in males. Some immune function genes were found to be significantly differently expressed in both males and females (including MHCβ, lymphotoxin, and a C-type lectin-related protein). A major component of the genes regulated significantly in both sexes (as identified in both ANN and Bioconductor analyses) consists of structural proteins, including those of the ribosome and mitochondrion. Not surprisingly, many of the genes identified with both analyses belong to the family of retrotransposons (endonuclease reverse transcriptase), as in mammals almost half the genome (45%) is comprised of transposons or remnants of transposons (Lander et al. 2001; Deininger et al. 2003).

[Figure 3: diagram of the four sampling locations with their sample sizes (IRL: Nf=15, Nm=18; CHS: Nf=25, Nm=23; SJB: Nf=15, Nm=13; SAR: Nf=15, Nm=22) and the classification values shown between locations (F=0.94, M=0.80; F=0.93, M=0.95; F=1.00, M=0.99; F=1.0, M=1.0)]

Fig. 3 Classification precision of dolphins to location based upon transcriptional signature. CHS, Charleston, South Carolina; IRL, Indian River Lagoon, Florida; SAR, Sarasota, Florida; SJB, St. Josephs Bay, Florida. Nf, female sample size. Nm, male sample size. F, percent correct classification for females. M, percent correct classification for males

Table 1 Discrimination between male and female dolphins at four locations

No. of genes   CHS AUC (SE)      IRL AUC (SE)      SAR AUC (SE)     SJB AUC (SE)
250            0.6598 (0.0416)   0.5843 (0.0708)   0.6275 (0.072)   0.7019 (0.0749)

Areas under the ROC curves (AUC) and their standard errors (SE) for classification of male and female dolphins via ANNs are shown within each sample location. The number of genes used in the test is indicated on the left (250). These genes (Electronic supplementary material, Table S4) were identified according to sensitivities from the best of five ANN models run on the full suite of genes on the microarray. The values in bold type are significantly different from random expectations at p<0.05

CHS, Charleston, South Carolina; IRL, Indian River Lagoon, Florida; SAR, Sarasota, Florida; SJB, St. Josephs Bay, Florida

Another noteworthy set of genes identified in the Bioconductor and ANN analyses in both sexes is metallothionein, ERp44, selenoprotein K, ferritin, and heat shock protein (Table 2). The expression of these genes is known to protect against oxidative stress-induced cellular injury (Itoh et al. 1999; Borghesi and Lynes 1996; Baird et al. 2006).

Fig. 4 Intensity values (scaled by row) for the genes selected by Bioconductor for the male (a) and female (b) populations. Column labels, left to right: a males: SA, IR, CH, SJ; b females: SA, IR, CH, SJ. CHS, Charleston, South Carolina; IRL, Indian River Lagoon, Florida; SAR, Sarasota, Florida; SJB, St. Josephs Bay, Florida

Fig. 5 Intensity values (scaled by row) for the genes selected by the ANN for the male (a) and female (b) populations. Column labels, left to right: a males: SA, IR, CH, SJ; b females: SJ, CH, IR, SA. CHS, Charleston, South Carolina; IRL, Indian River Lagoon, Florida; SAR, Sarasota, Florida; SJB, St. Josephs Bay, Florida


Table 2 shows the expression of these genes relative to the expression in Sarasota dolphins, which exhibited a lower level of expression.

Metallothioneins form complexes with heavy metal ions, binding physiological (zinc, copper, selenium) but also xenobiotic heavy metals (cadmium, mercury, silver, arsenic) via the thiol groups of the cysteine residues. Selenium, originally considered toxic, is now known to be an important micronutrient (Small-Howard and Berry 2005). Deficiency as well as too high a concentration of selenium can lead to several disorders and diseases (Small-Howard and Berry 2005).

Endoplasmic reticulum resident protein (ERp44 or thioredoxin domain-containing protein 4) is an ER-resident protein that contains a thioredoxin domain and is induced during ER stress. Its overexpression alters the equilibrium of the different Ero-1 (endoplasmic reticulum oxidoreductin) redox isoforms, suggesting that ERp44 may be involved in the control of oxidative protein folding (Anelli et al. 2002). Little is known of selenoprotein K (SelK). It was recently identified in Drosophila melanogaster, where a knock-down assay showed that SelK is necessary for normal development (Kryukov et al. 2003; Richardson 2005). SelK is also present in humans, where it exhibits a wider variety of functions; Lu et al. (2006) showed its function as an antioxidant in cardiomyocytes. Notably, SelK also appears to be expressed in dolphin blood. Knowledge of the importance of selenoproteins to the immune system is limited, but they are known to have roles in modulating the inflammatory response (Bellinger et al. 2009). The presence and expression of selenoprotein K in dolphins need further study.

Table 2 Relative expression of oxidative stress-response genes in all four geographic locations through the Bioconductor and ANN analyses

Dolphin microarray ESTs                                        Location                      Analysis/sex
Acc. no.   Sequence description                                CHS    IRL    SAR    SJB
DV468529   hsp70 subfamily B suppressor 1 (HBS1)               0.58   0.78   –      1.44     B/M
DV799541   Metallothionein 1                                   0.07   0.00   –      0.14     B/M
DT661135   ERp44 (thioredoxin domain-containing protein 4)     0.83   0.50   –      3.08     B/M
DV467972   Heat shock factor binding protein 1 (HSBP1)         0.59   0.83   –      1.50     A/M
DV467994   Selenoprotein K                                     1.84   0.84   –      0.93     B/F
DT660293   Ferritin I                                          0.44   0.04   –      0.89     A/F

Expression of oxidative stress-response genes relative to the expression of that gene for the SAR dolphins. Values are expressed as fold change in expression using Sarasota dolphin values (lower expression) as the reference set. The fold changes were obtained from VSN-transformed microarray data

B Bioconductor analysis, A ANN analysis, M male dolphins, F female dolphins; CHS, Charleston, South Carolina; IRL, Indian River Lagoon, Florida; SAR, Sarasota, Florida; SJB, St. Joseph Bay, Florida
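As an illustration of how fold changes relative to a reference population might be derived from transformed data, the following sketch assumes that the VSN-transformed intensities behave like base-2 logarithms, so that a difference of group means maps to a fold change by exponentiation. That scale, the function name `fold_change`, and the example values are all assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean

def fold_change(values_by_location, reference="SAR", log_base=2.0):
    """Fold change of each location's mean relative to the reference location.

    Assumes the input values are on a log-like scale with base `log_base`
    (an assumption; the actual scale depends on the transformation settings).
    """
    ref_mean = mean(values_by_location[reference])
    return {
        loc: log_base ** (mean(vals) - ref_mean)
        for loc, vals in values_by_location.items()
        if loc != reference
    }

# Hypothetical transformed intensities for one gene, grouped by location.
example = {
    "CHS": [7.0, 7.5],
    "IRL": [8.0, 8.5],
    "SAR": [8.0, 8.5],
    "SJB": [9.0, 9.5],
}
fc = fold_change(example)
```

With these made-up numbers, a location whose mean is one unit below the reference yields a fold change of 0.5, and one unit above yields 2.0. The reference location itself is omitted from the result, corresponding to the dash in the SAR column of Table 2.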

Fig. 6 Venn diagram of the significant genes for the prediction of the parameter location from the ANN and Bioconductor analyses in female and male populations. ANN genes are indicated with the symbols for the corresponding sex in pink and blue (on the left), while the Bioconductor genes are indicated with the symbols for the corresponding sex in green and yellow (on the right), as explained in the figure legend (left box)
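The overlaps summarized in a four-set Venn diagram such as Fig. 6 can be computed with plain set operations. The sketch below is only an illustration: the helper `venn_regions`, the set names, and the gene labels `g1` to `g6` are hypothetical placeholders, not the gene lists from the study.

```python
def venn_regions(gene_sets):
    """Map each combination of set names to the genes found in exactly those sets."""
    regions = {}
    all_genes = set().union(*gene_sets.values())
    for gene in all_genes:
        # Names of every set that contains this gene.
        members = frozenset(
            name for name, genes in gene_sets.items() if gene in genes
        )
        regions.setdefault(members, set()).add(gene)
    return regions

# Hypothetical gene lists for the two analyses and the two sexes.
gene_sets = {
    "ANN_male": {"g1", "g2", "g3"},
    "ANN_female": {"g2", "g4"},
    "Bioconductor_male": {"g2", "g3", "g5"},
    "Bioconductor_female": {"g2", "g6"},
}
regions = venn_regions(gene_sets)
shared_by_all = regions[frozenset(gene_sets)]
```

Each key of `regions` corresponds to one area of the diagram, so the genes shared by all four lists are found under the key containing all four set names.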


Conclusions

The use of transcriptomic data for studies of organisms in their natural environment faces many challenges, perhaps the most important of which is the development of appropriate techniques for data analysis. A second problem is that often the full transcriptome has not been characterized for species of ecological and environmental interest, and thus (as in this study), relatively small microarrays must be used. We recognize that recent developments in sequencing technologies are rapidly changing this landscape. Beyond this, a third important problem in the study of protected species, such as dolphins, is the difficulty in obtaining comprehensive, relevant data (such as life history, genetic, and health status information) on the animals that are being studied. Despite these potential problems, the study presented here shows that a machine learning approach (ANN) is applicable to the problem of classifying individual animals based on transcriptomic signatures. The dimensionality reduction, which is necessary for any interrogation of transcriptomic data and usually accomplished by linear techniques, can be substantially improved by machine learning approaches which take advantage of the dynamic interdependencies of the components of the transcriptome. This study also shows that accurate classifications reflecting geographic location could be achieved despite a lack of knowledge of the sources or extent of variability in populations of wild dolphins and despite the use of a relatively small microarray (one that accesses only a small proportion of the total transcriptome). We view the differences in gene expression between the sexes noted here as both a precaution to those using transcriptomic responses to assess environmental challenges and a validation of our data. Failure to find differences between the sexes would certainly lead to challenges concerning whether transcriptomic analysis is precise enough to deconvolve the impacts of environments on organismal responses. Failure to partition the data accordingly could lead to misrepresentations of response.

Acknowledgments This research was supported by awards from the National Ocean Services/NOAA (NCNS4000-4-0054) and the Office of Naval Research (N000140610296) and by fellowship support from the University of Bologna to A.M. The construction and characterization of the dolphin microarrays (which are publicly available) was supported by the Hollings Marine Laboratory (National Ocean Service, National Center for Coastal Ocean Science) and the Center of Excellence in Oceans and Human Health in the Hollings Marine Laboratory. Samples from Sarasota Bay dolphins were collected through the support of Dolphin Quest, NOAA Fisheries Service, the Chicago Zoological Society, and Mote Marine Laboratory. The samples were collected under National Marine Fisheries Service (NMFS) Permits # 998-1678-01, 522-1785, and 932-1489-09 to Gregory Bossart, Randall Wells, and Teri Rowles, respectively. We thank David White for expert assistance in the preparation of Fig. 1. The paper is contribution number 664 to the Marine Resources Division, SCDNR and 51 to the Marine Biomedicine and Environmental Sciences Center.

References

Anelli, T., M. Alessio, A. Mezghrani, T. Simmen, F. Talamo, A. Bachi, and R. Sitia. 2002. ERp44, a novel endoplasmic reticulum folding assistant of the thioredoxin family. EMBO Journal 21: 835–844.

Baird, S.K., T. Kurz, and U.T. Brunk. 2006. Metallothionein protects against oxidative stress-induced lysosomal destabilization. Biochemical Journal 394: 275–283.

Bellinger, F.P., A.V. Raman, M.A. Reeves, and M.J. Berry. 2009. Regulation and function of selenoproteins in human disease. Biochemical Journal 422: 11–22.

Bellman, R. 1961. Adaptive control processes: A guided tour. Princeton: Princeton University Press.

Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B: Methodological 57: 289–300.

Bishop, C.M. 2006. Pattern recognition and machine learning. New York: Springer.

Borghesi, L.A., and M.A. Lynes. 1996. Stress proteins as agents of immunological change: Some lessons from metallothionein. Cell Stress Chaperones 1(2): 99–108.

Bradley, A.P. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30: 1145–1159.

Dankbar, D.M., E.D. Dawson, M. Mehlmann, C.L. Moore, J.A. Smagala, M.W. Shaw, et al. 2007. Diagnostic microarray for influenza B viruses. Analytical Chemistry 79(5): 2084–2090.

Deininger, P.L., J.V. Moran, M.A. Batzer, and H.H. Kazazian Jr. 2003. Mobile elements and mammalian genome evolution. Current Opinion in Genetics and Development 13(6): 651–658.

Duffield, D.A., and R.S. Wells. 1991. The combined application of chromosome, protein and molecular data for the investigation of social unit structure and dynamics in Tursiops truncatus. In Genetic ecology of whales and dolphins, ed. A.R. Hoelzel, 155–169. Cambridge: The International Whaling Commission.

Duffield, D.A., and R.S. Wells. 2002. The molecular profile of a resident community of bottlenose dolphins, Tursiops truncatus. In Molecular and cell biology of marine mammals, ed. C.J. Pfeiffer, 3–11. Melbourne: Krieger.

Fair, P.A., G. Mitchum, T.C. Hulsey, J. Adams, E. Zolman, W. McFee, et al. 2007. Polybrominated diphenyl ethers (PBDEs) in blubber of free-ranging bottlenose dolphins (Tursiops truncatus) from two southeast Atlantic estuarine areas. Archives of Environmental Contamination and Toxicology 53(3): 483–494.

Gentleman, R.C., V.J. Carey, D.M. Bates, B. Bolstad, M. Dettling, S. Dudoit, et al. 2004. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5(10): R80.

Hanley, J.A., and B.J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1): 29–36.

Hansen, L.J., L.H. Schwacke, G.B. Mitchum, A.A. Hohn, R.S. Wells, E.S. Zolman, et al. 2004. Geographic variation in polychlorinated biphenyl and organochlorine pesticide concentrations in the blubber of bottlenose dolphins from the US Atlantic coast. Science of the Total Environment 319(1–3): 147–172.

Haykin, S. 1999. Neural networks: A comprehensive foundation. Upper Saddle River: Prentice Hall.

Houde, M., G. Pacepavicius, R.S. Wells, P.A. Fair, R.J. Letcher, M. Alaee, et al. 2006. Polychlorinated biphenyls and hydroxylated polychlorinated biphenyls in plasma of bottlenose dolphins (Tursiops truncatus) from the Western Atlantic and the Gulf of Mexico. Environmental Science and Technology 40(19): 5860–5866.

Huber, W., A. Von Heydebreck, H. Sultmann, A. Poustka, and M. Vingron. 2002. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18(Suppl 1): S96–S104.

Irvine, A.B., M.D. Scott, R.S. Wells, and J.H. Kaufmann. 1981. Movements and activities of the Atlantic bottlenose dolphin, Tursiops truncatus, near Sarasota, Florida. Fishery Bulletin US 79: 671–688.

Itoh, K., T. Ishii, N. Wakabayashi, and M. Yamamoto. 1999. Regulatory mechanisms of cellular response to oxidative stress. Free Radical Research 31: 319–324.

Jiang, Z.F., and C.A. Machado. 2009. Evolution of sex-dependent gene expression in three recently diverged species of Drosophila. Genetics 183: 1175–1185.

Kennerly, E., A. Ballmann, S. Martin, R. Wolfinger, S. Gregory, M. Stoskopf, et al. 2008. A gene expression signature of confinement in peripheral blood of red wolves (Canis rufus). Molecular Ecology 17(11): 2782–2791.

Khan, J., J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, et al. 2001. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7(6): 673–679.

Kryukov, H.K., S. Castellano, S.V. Novoselov, A.V. Lobanov, O. Zehtab, R. Guido, and V.N. Gladyshev. 2003. Characterization of mammalian selenoproteomes. Science 300: 1439–1443.

Lander, E.S., L.M. Linton, B. Birren, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409(6822): 860–921.

Linder, R., D. Dew, H. Sudhoff, D. Theegarten, K. Remberger, S.J. Poppl, et al. 2004. The ‘subsequent artificial neural network’ (SANN) approach might bring more classificatory power to ANN-based DNA microarray analyses. Bioinformatics 20(18): 3544–3552.

Lu, C., F. Qiu, H. Zhou, Y. Peng, W. Hao, J. Xu, J. Yuan, S. Wang, B. Qiang, and C. Xu. 2006. Identification and characterization of selenoprotein K: An antioxidant in cardiomyocytes. FEBS Letters 580(22): 5189–5197.

Mancia, A., M.L. Lundqvist, T.A. Romano, M.M. Peden-Adams, P.A. Fair, M.S. Kindy, et al. 2007. A dolphin peripheral blood leukocyte cDNA microarray for studies of immune function and stress reactions. Developmental and Comparative Immunology 31(5): 520–529.

Mancia, A., G.W. Warr, and R.W. Chapman. 2008. A transcriptomic analysis of the stress induced by capture-release health assessment studies in wild dolphins (Tursiops truncatus). Molecular Ecology 17(11): 2581–2589.

Montie, E.W., S.R. Garvin, P.A. Fair, G.D. Bossart, G.B. Mitchum, W.E. McFee, T. Speakman, V.R. Starczak, and M.E. Hahn. 2007. Blubber morphology in wild bottlenose dolphins (Tursiops truncatus) from the Southeastern United States: Influence of geographic location, age class, and reproductive state. Journal of Morphology 269: 496–511.

Natoli, A., V.M. Peddemors, and A.R. Hoelzel. 2004. Population structure and speciation in the genus Tursiops based on microsatellite and mitochondrial DNA analyses. Journal of Evolutionary Biology 17(2): 363–375.

Owen, E.C.G., S. Hofmann, and R.S. Wells. 2002. Ranging and social association patterns of paired and unpaired adult male bottlenose dolphins, Tursiops truncatus, in Sarasota, Florida, provide no evidence for alternative male strategies. Canadian Journal of Zoology 80: 2072–2089.

Reynolds, J.E.I., R.S. Wells, and S.D. Eide. 2000. The bottlenose dolphin: Biology and conservation. Gainesville: University Press of Florida. 289 pp.

Richardson, D.R. 2005. More roles for selenoprotein P; local selenium storage and recycling protein in the brain. Biochemical Journal 386: e5–e7.

Ritchie, M.E., J. Silver, A. Oshlack, M. Holmes, D. Diyagama, A. Holloway, et al. 2007. A comparison of background correction methods for two-colour microarrays. Bioinformatics 23(20): 2700–2707.

Scott, M., R.S. Wells, and A.B. Irvine. 1990. A long-term study of bottlenose dolphins on the west coast of Florida. In The bottlenose dolphin, ed. S. Leatherwood and R.R. Reeves, 235–244. San Diego: Academic.

Sellas, A., R.S. Wells, and P.E. Rosel. 2005. Mitochondrial and nuclear DNA analyses reveal fine scale geographic structure in bottlenose dolphins (Tursiops truncatus) in the Gulf of Mexico. Conservation Genetics 6: 715–728.

Small-Howard, A.L., and M.J. Berry. 2005. Unique features of selenocysteine incorporation function within the context of general eukaryotic translational processes. Biochemical Society Transactions 33: 1493–1497.

Smyth, G.K. 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3: 3.

Smyth, G.K. 2005. Limma: Linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor, ed. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, and W. Huber, 397–420. New York: Springer.

Smyth, G.K., and T. Speed. 2003. Normalization of cDNA microarray data. Methods 31(4): 265–273.

Smyth, G.K., J. Michaud, and H.S. Scott. 2005. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9): 2067–2075.

Stavros, H.C., G.D. Bossart, T.C. Hulsey, and P.A. Fair. 2007. Trace element concentrations in skin of free-ranging bottlenose dolphins (Tursiops truncatus) from the southeast Atlantic coast. Science of the Total Environment 388(1–3): 300–315.

Stephan, C., S. Wesseling, T. Schink, and K. Jung. 2003. Comparison of eight computer programs for receiver-operating characteristic analysis. Clinical Chemistry 49(3): 433–439.

Voit, E.O., and J. Almeida. 2004. Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics 20(11): 1670–1681.

Waters, M.D., K. Olden, and R.W. Tennant. 2003. Toxicogenomic approach for assessing toxicant-related disease. Mutation Research 544: 415–424.

Wei, J.S., B.T. Greer, F. Westermann, S.M. Steinberg, C.G. Son, Q.R. Chen, et al. 2004. Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma. Cancer Research 64(19): 6883–6891.

Weigend, A.S., M. Mangeas, and A.N. Srivastava. 1995. Nonlinear gated experts for time series: Discovering regimes and avoiding overfitting. International Journal of Neural Systems 6(4): 373–399.

Wells, R.S. 1991. The role of long-term study in understanding the social structure of a bottlenose dolphin community. In Dolphin societies: Discoveries and puzzles, ed. K. Pryor and K.S. Norris, 199–225. Berkeley: University of California Press.

Wells, R.S., H.L. Rhinehart, L.J. Hansen, J.C. Sweeney, F.I. Townsend, R. Stone, D. Casper, M.D. Scott, T. Hohn, and T.K. Rowles. 2004. Bottlenose dolphins as marine ecosystem sentinels: Developing a health monitoring system. EcoHealth 1: 246–254.

Wells, R.S., V. Tornero, A. Borrell, A. Aguilar, T.K. Rowles, H.L. Rhinehart, et al. 2005. Integrating life-history and reproductive success data to examine potential relationships with organochlorine compounds for bottlenose dolphins (Tursiops truncatus) in Sarasota Bay, Florida. Science of the Total Environment 349(1–3): 106–119.

Whitehead, A., and D.L. Crawford. 2006a. Neutral and adaptive variation in gene expression. Proceedings of the National Academy of Sciences of the United States of America 103(14): 5425–5430.

Whitehead, A., and D.L. Crawford. 2006b. Variation within and among species in gene expression: Raw material for evolution. Molecular Ecology 15(5): 1197–1211.

Wilson, J., R.S. Wells, A. Aguilar, A. Borrell, V. Tornero, P. Reijnders, M. Moore, and J.J. Stegeman. 2007. Correlates of cytochrome P450 1A1 expression in bottlenose dolphin (Tursiops truncatus) integument biopsies. Toxicological Sciences 97: 111–119.
