ORIGINAL ARTICLE

Validation of metabolomics for toxic mechanism of action screening with the earthworm Lumbricus rubellus

Qi Guo · Jasmin K. Sidhu · Timothy M. D. Ebbels · Faisal Rana · David J. Spurgeon · Claus Svendsen · Stephen R. Sturzenbaum · Peter Kille · A. John Morgan · Jacob G. Bundy

Received: 12 August 2008 / Accepted: 15 December 2008
© Springer Science+Business Media, LLC 2009

Abstract One of the promises of environmental metabolomics, together with other ecotoxicogenomic approaches, is that it can give information on toxic compound mechanism of action (MOA), by providing a specific response profile or fingerprint. This could then be used either for screening in the context of chemical risk assessment, or potentially in contaminated site assessment for determining what compound classes were causing a toxic effect. However, for either of these two ends to be achievable, it is first necessary to know if different compounds do indeed elicit specific and distinct metabolic profile responses. Such a comparative study has not yet been carried out for the earthworm Lumbricus rubellus. We exposed L. rubellus to sub-lethal concentrations of three very different toxicants (CdCl2, atrazine, and fluoranthene, representing three compound classes with different expected MOA), by semi-chronic exposures in a laboratory test, and used NMR spectroscopy to obtain metabolic profiles. We were able to use simple multivariate pattern-recognition analyses to distinguish different compounds to some degree. In addition, following the ranking of individual spectral bins according to their mutual information with compound concentrations, it was possible to identify both general and specific metabolite responses to different toxic compounds, and to relate these to concentration levels causing reproductive effects in the worms.

Keywords Environmental biomarker · Atrazine · Cadmium · Fluoranthene · Ecotoxicogenomics · Metabonomics

1 Introduction

The major application of metabolomics in the environmental sciences to date has been in ecotoxicology. Within this, a majority of groups are working in aquatic ecotoxicology, recently reviewed by Lin et al. (2006) and Viant (2007). In contrast, fewer studies have been carried out on terrestrial models; to date, most of the examples of application of metabolomics in terrestrial ecotoxicology are in earthworms. Earthworms have, of course, been widely used for assessing soil contamination for many years (Spurgeon et al. 2003b; Sanchez-Hernandez 2006). A host of different biochemical and biomolecular endpoints have been used as measures of earthworm response, and so it is not surprising that metabolite profiles have been added to the arsenal of molecular biomarkers used by terrestrial ecotoxicologists.

Qi Guo and Jasmin K. Sidhu contributed equally to this paper.

Electronic supplementary material The online version of this article (doi:10.1007/s11306-008-0153-z) contains supplementary material, which is available to authorized users.

Q. Guo · J. K. Sidhu · T. M. D. Ebbels · F. Rana · J. G. Bundy (✉)
Department of Biomolecular Medicine, Division of Surgery, Oncology, Reproductive Biology, and Anaesthetics, Faculty of Medicine, Imperial College London, Sir Alexander Fleming Building, London SW7 2AZ, UK
e-mail: [email protected]

D. J. Spurgeon · C. Svendsen
Centre for Ecology and Hydrology, Maclean Building, Benson Lane, Crowmarsh Gifford, Wallingford OX10 8BB, UK

S. R. Sturzenbaum
Pharmaceutical Science Division, King's College London, School of Biomedical & Health Sciences, Franklin Wilkins Building, Stamford Street, London SE1 9NH, UK

P. Kille · A. J. Morgan
University of Cardiff, School of Biosciences, Main Building, Park Place, Cardiff CF10 3TL, UK

Metabolomics

DOI 10.1007/s11306-008-0153-z

Recent publications have used NMR-based profiling for assessing field and semi-field studies on Lumbricus rubellus (Bundy et al. 2004, 2007, 2008; Jones et al. 2008). This species also has EST sequence data available (Owen et al. 2008), meaning that it is suitable for parallel transcriptomic studies. Furthermore, a project is ongoing to sequence the entire L. rubellus genome (see www.earthworms.org for more information). In addition, L. rubellus is widely distributed and can often be found at field sites, making it a useful choice for both lab and field studies. Eisenia fetida is another important terrestrial model, because it is required for regulatory testing (OECD 1984), although as an extreme epigeic species it is less relevant for soil testing than L. rubellus. Brown et al. (2008) have evaluated different sample preparation methods with E. fetida which, coupled with additional genomic information in the form of an EST library and resultant cDNA microarrays (Gong et al. 2007), makes it, like L. rubellus, a very suitable organism to combine future metabolomic research with other post-genomic analyses.

For metabolomics (and other toxicogenomic methods) to be adopted as a useful monitoring tool in ecotoxicology, it is not enough to do as well as existing methods: substantial improvements are needed (van Straalen and Roelofs 2008). Hence an important question is, what advantages are there to using ecotoxicogenomic methods in addition to current approaches based on chemical residue analysis? If biological response information is needed, why not make use of a known biomarker response? One possibility is that omic methods can potentially discriminate between specific compounds/toxic mode of action (MOA); it then becomes clear that being able to discriminate and assign MOAs is a critical need for environmental metabolomics studies. Metabolic profiling has been widely used for classifying or discriminating MOA in toxicology (Gartland et al. 1989, 1990, 1991; Ebbels et al. 2007), and has also been applied to earthworm multiple-toxicant studies (Bundy et al. 2002). However, there are currently no studies comparing the metabolic effect of toxicants with different MOA for the species L. rubellus. It is also important that enough baseline data are available to give a context for interpreting metabolic responses: i.e. is a metabolite change really a biomarker of a specific MOA, or is it instead a more general stress response (van Straalen and Roelofs 2008)?

Field exposure data are important in ecotoxicology, as laboratory exposures cannot fully model the complexity of the natural environment, particularly for a highly heterogeneous matrix such as soil. Nonetheless, controlled exposures to single toxicants are still essential, in order to provide a context for field observations. Here, we present data for NMR-based profiling of tissue extracts of L. rubellus following chronic soil exposure to three different toxicants: Cd, fluoranthene (FA), and atrazine (AZ), i.e. a toxic metal, a common organic pollutant, and an agrochemical (herbicide). We have identified both general and toxicant-specific biomarkers in the dataset. In addition, we tested if the profiles could be used to distinguish the effects of different toxicants, and thus potentially assign MOA. These data will provide a useful baseline for future metabolomic experiments using L. rubellus.

2 Materials and methods

2.1 Chemical exposures

A full description of the exposure protocol was given by Owen et al. (2008), and so we will give only a summary here. Briefly, earthworms were exposed in a spiked natural soil, using an existing 28-day test protocol (Spurgeon et al. 2003a). The toxicants Cd, fluoranthene (FA), and atrazine (AZ) were mixed with a commercially available loam soil (soil characteristics described by Spurgeon et al. 2004) at a range of concentrations, using an experimental design with replicated concentrations (n = 8 for controls, and n = 5 for replicated toxicant concentrations), supplemented by individual non-replicated exposures at intermediate and additional concentrations (Table 1). The concentrations were chosen based on previous experiments to cover the same sub-lethal effect range for each toxicant, as shown by similar reductions in reproduction for each toxicant across the concentration ranges used. Each replicate comprised a group of 8 clitellate earthworms in 1 kg of soil. Exposure was for 28 days, and survival, weight change, and cocoon production at the end of the period were all recorded. At the end of the 28 days, three individuals from each replicate were then pooled, snap-frozen, and ground, to give a single sample. In total, 104 samples were analysed.
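As a quick consistency check on the design, the replicate counts listed in Table 1 can be summed and compared with the stated total of 104 samples. The short sketch below simply transcribes the counts from Table 1 (one pooled sample per replicate microcosm); the variable names are illustrative and are not taken from the original analysis.

```python
# Replicate counts transcribed from Table 1 (one pooled sample per microcosm).
# Names are illustrative only.
controls = [8, 8, 7]  # AZ, Cd, FA control groups

# Replicated dose levels: nominal concentration (mg/kg soil) -> replicates
replicated = {9: 5, 13: 5, 14: 5, 20: 4, 35: 4, 43: 5,
              47: 5, 59: 4, 148: 5, 158: 5, 500: 5, 533: 5}

# Non-replicated (single) exposures at intermediate/additional concentrations
single = [5, 7, 8, 12, 15, 19, 21, 26, 29, 31, 45, 65, 70, 76,
          98, 100, 105, 222, 237, 333, 356, 750, 800, 1200]

total_samples = sum(controls) + sum(replicated.values()) + len(single)
print(total_samples)
```

The three parts contribute 23, 57, and 24 samples respectively, which agrees with the total reported in the text.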

2.2 Sample preparation for NMR analysis

We lyophilized ground tissues without allowing them to thaw, and then stored them at -80 °C until extraction. We extracted the tissues by homogenizing approx. 80 mg dry weight into 4 ml of 70% ice-cold acetonitrile solution, using a Heidolph SilentCrusher homogenizer. One ml of each of the extracts was then subjected to solid-phase extraction, using Strata-X-AW mixed-mode 6 ml cartridges with 200 mg adsorbent (Phenomenex, Macclesfield, UK), in order to remove 2-hexyl-5-ethyl-furan-3-sulfonic acid (HEFS). The extracts were diluted with water, loaded onto the cartridges, and eluted with 6 ml of HPLC-grade methanol. The eluate was then dried using a rotary vacuum concentrator at 45 °C, and resuspended in 0.65 ml of NMR buffer (0.1 M phosphate buffer, pH 7.0; 0.98 mM sodium trimethylsilyl-2,2,3,3-²H₄-propionate (TSP); made up in 90% v/v ²H₂O). This resuspended sample was then filtered through a 10 kDa Nanosep centrifugal membrane filter (VWR, Lutterworth, UK), which had previously been rinsed three times with ²H₂O to remove traces of glycerol. The samples (0.55 ml) were then transferred to 5 mm NMR tubes.

2.3 NMR analysis

We acquired the spectra on a Bruker Avance DRX600 spectrometer (Bruker, Rheinstetten, Germany) with a 14.1 T magnet and a resulting 1H resonance frequency of 600 MHz, which was equipped with a 5 mm triple-axis-inverse probe and BACS tube-changer autosampler. The samples were loaded onto the autosampler in randomized blocks, and were held at room temperature while on the autosampler, and at 300 K during acquisition. The spectra were acquired using previously described parameters for one-dimensional spectra using the first increment of a NOESY experiment together with solvent suppression on the water resonance (Beckonert et al. 2007); 256 transients were acquired per sample, following 8 dummy scans to approach a steady state. Data were acquired into 32 K points across a 12 kHz spectral width, with a resultant acquisition time of 1.36 s. An additional longitudinal relaxation delay of 3.5 s was included, giving a 5 s recycle time. We processed the spectra using iNMR 2.5.5 (Nucleomatica, Molfetta, Italy). We performed time-domain filtering of the residual solvent peak for each spectrum, and then multiplied each FID by a 0.5 Hz exponential apodization function, followed by Fourier transformation. Phase values and baselines were both adjusted automatically, using the 'metabolomic phase correction' option, and first-order polynomial baseline correction. The data were exported at three different resolutions for further analysis: (i) at the native spectral resolution; (ii) divided into 0.005 ppm bins; and (iii) divided into 148 bins with manually selected integral regions, chosen such that, as far as possible and considering all spectra simultaneously, each manually selected bin included resonances from a single metabolite only, and all resonances in the spectra were included (i.e. this does not equate to a variable selection procedure). This reduces the effects of peak shifts between spectra (as might be caused by slight pH variation, for instance), reduces the number of individual resonances spread across more than one bin, and excludes regions containing only noise. The boundaries between the bins are given in Table S1 (supporting information). Data for metabolite assignment were taken from previous work, together with additional information from BMRB (Ulrich et al. 2008).
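The fixed-width binning in option (ii) can be illustrated with a minimal sketch. The function names, the default spectral window, and the toy spectrum below are assumptions made for illustration; the original spectra were processed and exported with iNMR, not with this code. The helper `ppm_to_hz` also shows why a 0.005 ppm bin corresponds to 3 Hz at a 600 MHz 1H frequency.

```python
def bin_spectrum(ppm, intensity, width=0.005, lo=0.0, hi=10.0):
    """Sum spectral intensities into equal-width chemical-shift bins.

    The window [lo, hi) and the bin width are illustrative defaults,
    not values taken from the paper's processing software.
    """
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            idx = min(int((shift - lo) / width), n_bins - 1)
            bins[idx] += value
    return bins


def ppm_to_hz(width_ppm, spectrometer_mhz=600.0):
    """Convert a chemical-shift width in ppm to Hz at a given 1H frequency."""
    return width_ppm * spectrometer_mhz


# Hypothetical four-point spectrum, for illustration only.
shifts = [0.001, 0.002, 0.0075, 0.012]
signal = [1.0, 2.0, 3.0, 4.0]
binned = bin_spectrum(shifts, signal, width=0.005, lo=0.0, hi=0.015)
```

Manually selected bins, as in option (iii), would replace the equal-width index calculation with a lookup against a list of bin boundaries such as those in Table S1.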

2.4 Data analysis

All multivariate analyses were carried out using the software package Simca-P 11.5 (Umetrics UK, Windsor, UK). Data were pre-treated two ways: (a) for initial multivariate analyses, i.e. principal components analysis (PCA) of all data and partial least squares (PLS) regressions of individual compounds, we used 3 Hz (0.005 ppm) bins, which

Table 1 Number of replicate microcosms per dose level at different concentrations (nominal, mg kg⁻¹ soil) of toxicants dosed to earthworms

Concentration AZ Cd FA

0 8 8 7

5 1

7 1

8 1

9 5

12 1

13 5

14 5

15 1

19 1

20 4

21 1

26 1

29 1

31 1

35 4

43 5

45 1

47 5

59 4

65 1

70 1

76 1

98 1

100 1

105 1

148 5

158 5

222 1

237 1

333 1

356 1

500 5

533 5

750 1

800 1

1200 1


have previously been used for pattern-recognition of NMR data, and considered to give advantages over broader bins (Warne et al. 2000; Viant 2003). Difference profiles were then calculated for each of the separate toxicant groups by subtracting the mean control profile (by 'difference profile' we mean the equivalent of a difference spectrum, but for the binned data, not for the full resolution spectra). The data were then mean-centred, and analysed both with and without scaling to unit variance. (b) For SIMCA and PCA of the data for the two highest replicated concentration levels for each toxicant (Table 1), the manually selected bins were used. Difference profiles were scaled such that the three independent control groups of the original data would have been converted to unit variance (Malmendal et al. 2006).

In this study, we also used mutual information to investigate relationships between metabolites and external variables. Mutual information is a general information theoretic approach to measure the statistical dependence between variables. It has been applied in many areas such as analysis of gene expression data (Steuer et al. 2002; Daub et al. 2004; Meyer et al. 2008), independent component analysis (Hyvarinen et al. 2001), and image processing (Thevenaz and Unser 2000). The conventional method to quantify the linear dependence between variables is Pearson correlation. However, a vanishing Pearson correlation does not imply that two variables are independent (Steuer et al. 2002). Inter-metabolite relationships are frequently nonlinear (Camacho et al. 2005), and linear models are also often inadequate to describe biological dose-responses (Calabrese 2008; May and Bigelow 2005). We previously observed nonlinear responses of metabolites to copper in L. rubellus (Bundy et al. 2008). Therefore, it is highly desirable to investigate both the linear and nonlinear dependence between the metabolites and external variables, especially those nonlinear relations which cannot be detected by linear measures. Mutual information is able to detect any type of functional relationship and extends conventional methods, such as Pearson correlation.

Let $A$ be a system with $m$ finite states $\{a_1, a_2, \ldots, a_m\}$ and $p(a_i)$ be the probability of state $a_i$; then the Shannon entropy $H(A)$ is given as

$$H(A) = -\sum_{i=1}^{m} p(a_i) \log p(a_i) \qquad (1)$$

The Shannon entropy measures the uncertainty of the state of the system $A$. The entropy of the system $A$ is zero if $p(a_j) = 1$ and $p(a_i) = 0$ for all $i \neq j$, whereas the entropy becomes maximal if $p(a_i) = p(a_j)$ for all $i$ and $j$. The joint entropy $H(A,B)$ of two systems $A$ and $B$ (where $B$ has $n$ finite states) is defined as

$$H(A,B) = -\sum_{i=1,\,j=1}^{m,\,n} p(a_i, b_j) \log p(a_i, b_j) \qquad (2)$$

So we have

$$H(A,B) \le H(A) + H(B) \qquad (3)$$

The above equation fulfils equality only if $A$ and $B$ are independent. The mutual information $MI(A,B)$ can be defined as

$$MI(A,B) = H(A) + H(B) - H(A,B) \ge 0 \qquad (4)$$

Since mutual information is defined in terms of discrete data, its application to continuous data, such as metabolomics data, requires a binning procedure. Here, we calculated MI following the method of Daub et al. (2004) which is based on B-spline functions, and the results were visualized by plotting MI against Pearson correlation. These plots allow one to visually identify individual variables with high MI and/or correlation for further inspection.
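The entropy and mutual information definitions in Eqs. 1 to 4 can be sketched in a few lines for already-discretized data. Note the assumptions: this sketch uses plain counting of discrete states and, for continuous data, a simple equal-width binning helper; it is not the B-spline estimator of Daub et al. (2004) used in the study, and all function names are illustrative.

```python
import math
from collections import Counter


def entropy(states):
    """Shannon entropy (natural log) of a sequence of discrete states, Eq. 1."""
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in Counter(states).values())


def mutual_information(a, b):
    """MI(A,B) = H(A) + H(B) - H(A,B) for paired discrete sequences, Eq. 4."""
    joint = list(zip(a, b))  # each (a_i, b_j) pair is one joint state
    return entropy(a) + entropy(b) - entropy(joint)


def discretize(values, n_bins):
    """Equal-width binning of continuous values.

    A simplification for illustration; the paper's analysis used the
    B-spline approach of Daub et al. (2004) instead.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical example: y = x**2 has zero Pearson correlation with x,
# yet y is fully determined by x, so the mutual information is positive.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
mi_xy = mutual_information(x, y)
```

The toy example mirrors the point made in the text: a vanishing Pearson correlation does not imply independence, whereas MI still registers the nonlinear relationship.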

The statistical significance of the MI was estimated using the method proposed by Steuer et al. (2002). A surrogate dataset is generated by random permutations of the original data. From the mutual information of the original dataset $MI(X,Y)_{data}$, the average value obtained from surrogate data $\langle MI(X_{surr}, Y_{surr}) \rangle$, and its standard deviation $\sigma_{surr}$, the significance $S$ can be given as

$$S = \frac{MI(X,Y)_{data} - \langle MI(X_{surr}, Y_{surr}) \rangle}{\sigma_{surr}} \qquad (5)$$

In this study, $S$ for each variable was estimated by resampling the data 30 times and then visualized by colour scale.
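The surrogate-based significance in Eq. 5 can be sketched as follows. This is a minimal illustration under stated assumptions: the MI estimate is the simple discrete-count version rather than the B-spline estimator, surrogates are generated by shuffling one variable relative to the other, and the function names, seed handling, and toy data are all hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def entropy(states):
    """Shannon entropy (natural log) of a sequence of discrete states."""
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in Counter(states).values())


def mutual_information(a, b):
    """MI(A,B) = H(A) + H(B) - H(A,B) for paired discrete sequences."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def mi_significance(x, y, n_surrogates=30, seed=0):
    """Significance S of Eq. 5 from permutation surrogates.

    Each surrogate shuffles y relative to x, which destroys any dependence
    while preserving both marginal distributions.
    """
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    surrogate_mi = []
    for _ in range(n_surrogates):
        shuffled = list(y)
        rng.shuffle(shuffled)
        surrogate_mi.append(mutual_information(x, shuffled))
    sd = statistics.stdev(surrogate_mi)
    if sd == 0:
        return math.inf  # degenerate case: all surrogates identical
    return (observed - statistics.mean(surrogate_mi)) / sd


# Hypothetical data: y is an exact copy of x, so S should be large.
x = [i % 4 for i in range(40)]
s_dependent = mi_significance(x, list(x))
```

Thirty surrogates, as used in the study, give only a coarse estimate of the surrogate mean and standard deviation, so S is best read as a ranking aid rather than an exact test statistic.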

3 Results and discussion

The NMR spectra showed a typical mixture of small molecule metabolites, as expected from extracts of earthworm tissue (Fig. S1, supporting information). We have previously assigned metabolites from proton spectra of L. rubellus tissue extracts (Bundy et al. 2008), so will not here present extensive data on assignment. We did observe a novel unassigned compound (with triplet resonances at 2.98 and 3.15 ppm) in all samples, that we had not previously seen in earthworm tissue extracts.

We removed the compound HEFS from earthworm tissue extracts before NMR analysis, for two reasons: firstly, as this is found in very high concentration in earthworms, and has seven separate 1H resonances, it may obscure responses of other low-concentration metabolites. Secondly, our preliminary results indicated that, at least for this set of samples, the tissue extracts were not stable, with visible changes when the samples were left for several hours at room temperature (i.e. when running samples using an autosampler), including a decrease in HEFS and increase in resonances from a suspected HEFS degradation product at 6.60 and 1.16 ppm. We surmised that, given its amphiphilic properties, HEFS might stabilize macromolecules even in the solvent extracts, which could include degradative enzymes. In addition, we filtered samples through a 10 kDa-cutoff membrane filter. The extracts treated in this way were stable over the period of acquisition, as observed by NMR.

3.1 The three compounds induced distinct and well-defined metabolic responses at sub-lethal concentrations

Clear dose-responsive effects were observed for all toxicants at the whole profile level. All exposures were at a sub-lethal level, but sufficient to cause a significant reduction in an ecologically important endpoint (reproduction, as measured by rate of cocoon production). Pattern-recognition analysis showed that there were biological differences between the control groups for the three toxicants (data not shown), which was not surprising as, because of logistical constraints, the three toxicant exposures were carried out sequentially rather than in parallel. Because of this, we used difference spectra for further comparative analyses.

Initially, we analysed the effects of each toxicant separately. The supervised method partial least squares (PLS) regression offers the advantage of modelling the multivariate response across different concentrations, as opposed to just class discrimination. All three toxicants had highly significant PLS models for exposure (soil) concentrations (Fig. 1), for data both scaled and not scaled to unit variance (not shown). It is important to validate these models beyond fit alone (Broadhurst and Kell 2006); permutation plots show that for each compound, Q2Y distributions fall to zero as the data are randomized (Fig. S2, supporting information).

In addition to the PLS models for soil toxicant concentrations, we fitted models for the tissue concentration data, shown in Fig. 1d–f as predictions for data from an independent test set (data for training set are shown in Fig. S3, supporting information). This was with the aim of mimicking a more realistic scenario for use with earthworms sampled from contaminated sites (e.g. during monitoring of exposure to persistent organic pollutants in top predators), where (a) tissue concentrations might be used in preference to soil concentrations, because of contaminant heterogeneity; and (b) one would want to predict/classify samples based on existing data. Weak relationships were shown for all three toxicants. These were significant for AZ and Cd (P < 0.001 and P = 0.009, respectively), and approached significance at the 5% level for FA (P = 0.076). In our current study, which used well-mixed soil microcosms, the soil toxicant concentrations were generally a better indicator of exposure than tissue concentrations. The tissue levels could have been affected by either contaminant excretion, for all toxicants, or biotransformation, for FA and AZ, either of which phenomena would underestimate total exposure.

Metabolic profiling can be used as a biomarker discovery tool. Environmental biomarkers have great potential as a tool for effect-based rather than residue-based monitoring, although they have also been criticized as not translating to useful regulatory or monitoring tools (Forbes et al. 2006). Nonetheless, there is interest in ecotoxicogenomic techniques, including metabolomics, from regulators (Ankley et al. 2006). One reason is that the use of a highly multivariate profile as an endpoint promises the ability to distinguish between different chemicals; or, more plausibly, different toxicant/MOA classes. Thus, in order to prove usefulness beyond existing biomarkers, one important consideration for environmental metabolomics of L. rubellus is to ask if individual toxicants can be distinguished on the basis of their NMR spectral fingerprints.

3.2 Classification of individual samples

An initial unsupervised analysis of all data (PCA) did not separate the dosed samples into three groups in a straightforward manner, although there were dose- and concentration-related effects on different PC axes (supporting information, Fig. S4), and so we analysed a sub-set of the data in more detail. We chose the two highest-concentration sets of replicate dosed samples for each toxicant (dose levels given in Table 1), as we assumed the between-toxicant metabolic differences would be most distinct at the higher concentrations. A PCA of this reduced dataset did indeed show that the different exposed samples tended to cluster in toxicant-specific groups (Fig. 2a); in addition, the higher-concentration dose samples fell further from the origin than the lower-concentration samples. Given that PCA alone is not a classification method, we also used SIMCA (statistical isolinear multiple component analysis; or, soft independent modelling of class analogy (Wold 1974, as cited in Duewer et al. 1975; Wold 1976)) to perform a simple classification.

Our initial SIMCA was based on a PCA (two components) of each separate toxicant. Figure 2b shows, unsurprisingly, that each toxicant model successfully predicted all its own samples (with the exception of one AZ sample at the higher dose level, which fell outside all models, and is not shown in the figure). Four samples were classed as belonging to all three models (two FA and two Cd, both from the lower dose).



The Cd and FA models were both conservative, with only 3 and 2 false positives, respectively (17% and 11% of possible false positives), but the AZ model included several samples from the other toxicant groups: in fact, all but three samples fitted the AZ model, 17 false positives in all (85%). Only one of the AZ samples was classified as belonging to another group (Cd), and none of the higher-dose AZ samples.
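The reported false-positive percentages can be reproduced with simple arithmetic, where the denominator for each model is the number of samples that do not belong to that toxicant. The group sizes used below (10 Cd, 10 FA, and 8 AZ samples in the reduced dataset) are an assumption: they are not stated explicitly in the text, but they are consistent with the replicate numbers in Table 1 and they reproduce the reported percentages.

```python
# Assumed group sizes for the two highest replicated dose levels per toxicant
# (an inference for illustration, not stated explicitly in the text).
group_sizes = {"AZ": 8, "Cd": 10, "FA": 10}

# False-positive counts as reported in the text for each SIMCA model.
false_positives = {"Cd": 3, "FA": 2, "AZ": 17}

total = sum(group_sizes.values())
rates = {}
for model, n_false in false_positives.items():
    # Possible false positives = all samples outside the model's own group
    possible = total - group_sizes[model]
    rates[model] = round(100 * n_false / possible)

print(rates)
```

Under these assumptions the Cd and FA models each have 18 possible false positives and the AZ model has 20, giving 17%, 11%, and 85% respectively.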

As the validation for this model comes solely from the consideration of which samples were false-positive classifications, we also performed a more stringent test by considering each toxicant/dose combination as a separate treatment, i.e. 6 groups instead of 3. An additional level of validation of the SIMCA performance (using, in this case, one-component PCA models) can then be obtained by testing not only classification between toxicants, but also within toxicant dose levels. Figure 2c shows, again, good classification for the high-dose groups for FA and Cd, with no false-positive classifications, and 3/5 and 2/5 of the lower-dose samples, respectively, correctly classified as FA or Cd. The high-dose AZ model correctly predicted all lower-dose AZ samples, but again also included two-thirds of all other samples (13/20) as false positives. Interestingly, the lower-concentration models were in general poor at predicting the higher-concentration samples: only the FA model also included any of the higher-concentration samples. This emphasizes the need to cover a full dose–response range when parameterizing models. We speculate, though, that this may in itself lead to additional problems: the doses in this study were carefully chosen to give sub-lethal responses, and it seems a reasonable hypothesis that as effect levels increase, i.e. as doses start to induce severe cellular/tissue damage, the metabolic profiles may converge on a similar high-stress phenotype.
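The idea of one-component PCA class models can be sketched as follows. This is a simplified illustration, not the authors' implementation: a sample is accepted by a class if its residual distance to that class's principal-component line is below a fixed multiple (`factor`, assumed to be 3) of the root-mean-square training residual, whereas standard SIMCA uses an F-test on residual variances. The data, class names, and function names are hypothetical.

```python
import math


def residual_distance(model, sample):
    """Distance from a sample to the one-component PCA line of a class."""
    centred = [v - m for v, m in zip(sample, model["mean"])]
    score = sum(c * l for c, l in zip(centred, model["loading"]))
    return math.sqrt(sum((c - score * l) ** 2
                         for c, l in zip(centred, model["loading"])))


def fit_class_model(samples, factor=3.0, n_iter=200):
    """Fit a one-component PCA model to one class (rows = samples)."""
    n, p = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - mean[j] for j in range(p)] for row in samples]
    # Power iteration on X^T X converges to the first PC loading vector.
    loading = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(r[j] * loading[j] for j in range(p)) for r in centred]
        new = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break
        loading = [v / norm for v in new]
    model = {"mean": mean, "loading": loading}
    residuals = [residual_distance(model, row) for row in samples]
    rms = math.sqrt(sum(d * d for d in residuals) / n)
    model["limit"] = factor * rms  # assumed cut-off, not the SIMCA F-test
    return model


def accepted_by(models, sample):
    """Names of all class models that accept the sample (none, one, or several)."""
    return [name for name, m in models.items()
            if residual_distance(m, sample) <= m["limit"]]


# Hypothetical three-variable profiles for two classes.
training = {
    "A": [[1.0, 0.1, 0.0], [2.0, -0.1, 0.1], [3.0, 0.0, -0.1],
          [4.0, 0.1, 0.1], [5.0, -0.1, -0.1]],
    "B": [[0.0, 1.0, 5.0], [0.1, 2.0, 5.1], [-0.1, 3.0, 4.9],
          [0.1, 4.0, 5.0], [-0.1, 5.0, 5.1]],
}
models = {name: fit_class_model(rows) for name, rows in training.items()}
print(accepted_by(models, [2.5, 0.05, 0.0]))
```

Because each class is modelled separately, a sample can be accepted by several models or by none. This is why false positives arise in this kind of analysis.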

The tendency to false-positive classification of the AZ models indicates that it would be wise to use caution in interpreting the specificity of metabolomic classification. The distinction between Cd- and FA-dosed samples was good. Hence, if we had not had the AZ samples for comparison, it would have seemed easier to classify toxicant MOA by NMR-based profiling and SIMCA. On the same lines, we expect that if we had data for many more toxicants, we would observe a lot of overlap between them. It

Fig. 1 PLS regression analyses of individual compounds vs. 1H NMR data. Dashed black line represents ideal model, i.e. X = Y. a–c Fitted values (training set only) against added soil concentrations for AZ (black), Cd (blue), and FA (red), respectively; axes are fitted concentrations and nominal soil concentrations (log10 [mg kg-1 + 10]). d–f Model validation, tissue concentrations; predicted values (samples from test set not included in model at any stage) for two samples from each of the replicated dose groups. Measured against predicted concentrations (log10 mg kg-1). Statistics annotated on the plots: r2 = 0.60, P = 0.009; r2 = 0.34, P = 0.076; r2 = 0.79, P < 0.001

should be borne in mind that the models presented here were based on relatively small numbers of samples, and thus the results should be interpreted with care. For this particular study, the models for the AZ-dosed samples tended to misclassify samples dosed with the other toxicants as AZ (false positives). However, it would not be safe to assume that this would be a widely applicable inference about the effects of atrazine on L. rubellus biochemistry: more data, from repeated biological experiments using different soil types etc., would be required before we could draw such general conclusions.

We also note that SIMCA may not be the best choice for future real-world compound classification: simpler models, e.g. Fisher’s discriminant analysis (Fisher 1936), ideally based on a few selected biomarker compounds, offer advantages such as model transparency and transferability between different labs and users; and, conversely, approaches which do not assume specific distributions can more fully model the data complexity (e.g. Flaherty et al. 2005; Ebbels et al. 2007; Maere et al. 2008). However, SIMCA provided proof-of-principle for multivariate compound MOA classification, based on a simple model that uses all spectral data.

3.3 Responses of specific compounds (biomarker identification)

In addition to the sample-centred pattern-recognition analyses described above, we were also interested in a variable-centred analysis to determine which metabolites were changing as a result of compound exposure. PLS, like many data modelling approaches, assumes a linear dependence of response on predictor variables. However, in reality, many biomolecular responses can be expected to be non-linear. Hence we also calculated the mutual information (MI) and Pearson correlation (r) between each variable (i.e. NMR spectral region bin) and the compound concentration for that sample, using both nominal soil and measured tissue concentrations. The MI gives a measure of statistical dependence between two variables, which can be useful where there is a biological relationship which has zero or low r (Steuer et al. 2002), and has been used for assessing relationships between metabolites for urinary profile data (Bang et al. 2008). Plotting r against MI is an efficient way to display the results for highly multivariate data (Daub et al. 2004), as it can be seen at a glance which variables represent high MI and also high r, or else high MI but low r (probable non-linear relationship). Individual variables can then be selected for more detailed examination of their relationship to the external factor (e.g. contaminant concentration), and so this approach, as we have employed it here, is best thought of as a screening tool. As increasing the spectral resolution increases the number of points on this type of plot and consequently hinders interpretation, we used the manually selected bins to minimize the number of variables. Using data on

Fig. 2 a Principal components analysis of two highest replicated dose groups for three compounds. Small symbol size = lower dose level, and large symbol size = higher dose level. Black = AZ, blue = Cd, red = FA. b SIMCA analysis of three groups (AZ, Cd, FA). Sample within the appropriate ellipse indicates sample was classified as belonging to that compound group. Symbols are the same as for panel a. c SIMCA analysis of six groups (AZ, Cd, and FA, at both higher and lower doses). Y axis indicates model, X axis indicates sample. Both size and colour of point indicate proportion of samples classified as belonging to that model (1 = all belonging, 0 = none belonging), i.e. point indicates the proportion of samples in the jth column belonging to the model in the ith row. Thick black lines indicate within-compound comparisons, i.e. an ‘ideal’ metabolic response would have been 100% classification for squares within the thick black lines, and 0% for squares outside the lines. Axes in panel a: PC 1 scores (24% variance explained) and PC 2 scores (17% variance explained). Group labels in panel c: AZ 59, AZ 35, Cd 500, Cd 148, FA 533, FA 158

individual fitted compounds would have been better still (Wishart 2008), but was not practicable in the current study. It should also be borne in mind that more data are required to calculate MI accurately than Pearson r, and ideally we would have had more samples, especially for the single-toxicant analyses (Camacho et al. 2005; Khan et al. 2007). However, our use of the permutation procedure in the MI calculation protects against spuriously high values of the MI being interpreted as a significant relationship.
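A minimal sketch of this screening idea is given below: it computes Pearson r, a mutual-information estimate, and a permutation p-value for one variable against concentration. It is illustrative only. The MI estimator here is a simple equal-width histogram (plug-in) estimate, swapped in for brevity and not the estimator used in the study; the bin count, number of permutations, and the synthetic data are assumptions. The helper `gaussian_mi` gives the MI expected for a purely linear relationship under a bivariate Gaussian assumption, which is one common choice of reference curve.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _bin_index(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0.0:
        width = 1.0  # constant input: everything falls in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Plug-in MI estimate (in nats) from an equal-width 2D histogram."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for i, j in zip(_bin_index(x, n_bins), _bin_index(y, n_bins)):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Share of shuffled data sets whose MI is at least the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def gaussian_mi(r):
    """MI (nats) implied by correlation r for bivariate Gaussian data."""
    return -0.5 * math.log(1.0 - r * r)


# Synthetic, hypothetical data: a symmetric non-linear response.
x = [i / 4 for i in range(-12, 13)]
y = [v * v for v in x]
print("r =", round(pearson_r(x, y), 3))
print("MI =", round(mutual_information(x, y), 3))
print("permutation p =", permutation_p_value(x, y))
```

In this synthetic case the correlation is essentially zero while the MI is clearly above what shuffled data produce, which is the pattern (high MI, low r) that the plots are designed to flag. Plug-in estimates are biased upwards for small samples, so comparing against permuted data is more informative than the raw MI value.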

The MI plots show that, when calculated against either soil or tissue concentrations, there were some highly significant metabolic responses to the three compounds, but these were largely linear relationships (Fig. 3). For both tissue and soil AZ concentrations, a variable centred at 1.20 ppm was the most significant; inspection of the original NMR spectra shows that this region contains a doublet resonance. We have tentatively assigned this as β-hydroxybutyrate, on the basis of chemical shift, multiplicity, and J-coupling. In addition, fumarate (variable at 6.52 ppm) was identified as a significantly responsive metabolite (Fig. 3a). Atrazine exposure causes upregulation of transcripts associated with the citric acid cycle and oxidative phosphorylation in L. rubellus, and also a very significant upregulation of transcripts related to protein turnover (Owen et al. 2008). It is possible that the changes observed here represent some kind of biochemical starvation response induced by atrazine. Cadmium induced two very clear biochemical responses, a reduction in succinate (variable at 2.41 ppm), and an increase in nicotinic acid (variable at 8.72 ppm). The responses to FA included a

Fig. 3 Linear and non-linear relationships of metabolites to toxic chemicals: mutual information (abscissa) plotted against Pearson correlation r (ordinate). Parabola indicates the MI expected for linear relationship; points with high MI but low correlation are likely to show strong non-linear relationships against contaminants. Colour scale indicates significance of MI. a AZ (soil concentrations). b AZ (tissue concentrations). c Cd (soil concentrations). d Cd (tissue concentrations). e FA (tissue concentrations). f FA (soil concentrations). Labelled points in the panels include β-hydroxybutyrate, fumarate, malate, succinate, betaine, CTP, lactate, Asn, Asp, Lys, DMH, glucose, and variables labelled 2.39, 3.39, 3.55, 3.74, and 3.79

variable at 4.11 ppm (lactate; the methyl resonance at 1.33 ppm was not here identified as significantly different, presumably because at neutral pH this overlaps with a resonance from threonine in 1D NMR spectra), and a group of variables at 6.00, 6.13, 7.94, and 7.96 ppm. (The individual relationships for the variables listed here are all shown in supporting information, Fig. S5.) These last four variables correspond to three resonances (7.94 and 7.96 represent two halves of a doublet, binned separately because of partial overlap with another resonance): the chemical shifts, multiplicities, relative intensities, and J-couplings are all consistent with cytidine triphosphate (CTP), and the statistical correlation between these three resonances across all spectra is high, indicating they belong to a single compound, and so we have assigned this as CTP. (The 7.4 Hz J-coupling of the resonance at 6.13 ppm is particularly diagnostic, indicating this is not an anomeric ribose proton resonance.) This analysis clearly shows that the response of L. rubellus to the three toxicants does not just involve the same metabolites in each case, again supporting the future potential of metabolomics for MOA classification in earthworms.

Multivariate analysis of sample differences (pattern recognition) is especially useful as a screening tool for metabolic biomarker discovery. However, actual implementation of the results may well be better done through selection of a small number of robust biomarkers, rather than using the entire profile (although there may, of course, also be advantages to using the whole profile). Here, a simple scatterplot of just two metabolites identified from the MI/correlation plots (scyllo-inositol and β-hydroxybutyrate) shows that the different toxicant-dosed samples clearly tend to separate (Fig. 4). We could have also chosen other metabolites to demonstrate this, and probably selection of a small number would prove optimal for future development of discriminatory models. Still, this demonstrates the robust nature and potential value of metabolic biomarkers in L. rubellus, although, naturally, further tests with more toxicants would be required to fully validate this as a reliable diagnostic tool, especially if field samples were to be used as well as laboratory-dosed worms.
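To illustrate how a small number of biomarkers could feed a simple, transparent classifier, the sketch below applies a two-class Fisher linear discriminant to two variables. It is a hedged example only: the numbers are invented stand-ins for two metabolite intensities, the class labels "A" and "B" are hypothetical, and no such model was fitted in this study.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, m):
    """Within-class scatter for two variables: (sxx, sxy, syy)."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def fisher_discriminant(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # pooled scatter Sw
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("pooled scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Explicit inverse of the 2x2 matrix [[sxx, sxy], [sxy, syy]].
    w = [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]
    midpoint = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def classify(sample, w, threshold):
    score = w[0] * sample[0] + w[1] * sample[1]
    return "A" if score > threshold else "B"


# Invented values for (metabolite 1, metabolite 2) in two dosed groups.
class_a = [(40.0, 10.0), (50.0, 5.0), (45.0, 12.0), (55.0, 8.0)]
class_b = [(5.0, 30.0), (10.0, 25.0), (0.0, 35.0), (8.0, 28.0)]

w, threshold = fisher_discriminant(class_a, class_b)
print(classify((42.0, 9.0), w, threshold))
print(classify((3.0, 31.0), w, threshold))
```

The fitted model is just two weights and a threshold, which is what makes this kind of classifier easy to inspect and to transfer between laboratories.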

3.4 Comparing metabolomic responses to ecologically important endpoints

It is important within toxicogenomics, including environmental metabolomics, to anchor omic profile responses to well-understood and accepted phenotypic endpoints (Paules 2003; van Gestel and Weeks 2004; Hines et al. 2007). Within ecotoxicology, some kind of reproductive endpoint is often used, as reproduction is considered likely to be one of the most sensitive targets of environmental pollution and is, of course, highly relevant to effects on populations. In earthworms, reproduction can be assessed in toxicity tests by measuring the cocoon production rate (van Gestel et al. 1989; Spurgeon et al. 2003b). We have previously shown that both metabolomic and transcriptomic data in earthworms can be anchored in this way to macro phenotype data such as weight change (Bundy et al. 2008). It should be noted that, although this is often described as ‘linking’ molecular and functional endpoints (e.g. van Gestel and Weeks 2004), the links are usually purely statistical, with no direct mechanistic linkage. Nonetheless, even a statistical linkage is useful for putting the metabolomic profile results in context, and it has an additional use in our current study: it provides us with a biologically justifiable approach to assess the general metabolite responses to toxicity across all three compounds, by using the reproduction rate as a comparator. Indeed, we expect that some kind of functional endpoint that can be related in some way to long-term effects on individuals that are relevant to populations will have to be included for this reason in any environmental metabolomic study aiming for comparison of multiple compound MOA in a chemical compendium (Walker et al. 1999; Hughes et al. 2000; Hillenmeyer et al. 2008). Such an endpoint would probably be based on either reproductive success (preferable if feasible) or energetic balance (if the organism has a long generation time, or is difficult to breed in the laboratory).

Fig. 4 Two metabolites are sufficient to give considerable separation of all dosed worm samples, even at low toxicant concentrations. Colour of points indicates toxicant (black: atrazine; blue: cadmium; red: fluoranthene), and size of point represents dose level. The axes (β-hydroxybutyrate, abscissa; scyllo-inositol, ordinate) represent difference spectra of normalized concentrations, as described in Sect. 2, but data were not otherwise scaled or transformed in any way

It was possible to fit a significant PLS model against cocoon production rate for all three compounds simultaneously (Fig. 5a), which implies there were general metabolic responses representative of overall stress or toxicity. The MI for the three compounds against cocoon production rate was very low, with a maximum MI of only around 0.12 (Fig. 5b), although there were a number of significant variables. This particular case also exemplifies the use of MI in identifying potentially non-linear relationships: betaine showed a linear increase for all three compounds with increasing toxicity (i.e. decreasing reproduction; Fig. 5c), and hence would be a good candidate for a future general-toxicity biomarker. In contrast, scyllo-inositol increased with increasing toxicity for Cd and AZ, but did not respond or even decreased for the high-concentration FA doses (Fig. 5d). This complex response gives a very low correlation (point ii, Fig. 5b), and hence could have been missed by linear analyses.
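The way opposite compound-specific trends can cancel in a pooled linear analysis can be shown with a small numerical sketch. The values below are invented for illustration (they are not the study's data): a hypothetical metabolite rises as reproduction falls for two compounds and moves the other way for the third.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cocoon production rates shared by all three compounds.
reproduction = [0.5, 1.0, 1.5, 2.0]

# Invented metabolite responses (arbitrary units) per compound.
metabolite = {
    "Cd": [60.0, 40.0, 20.0, 0.0],     # higher when reproduction is low
    "AZ": [45.0, 30.0, 15.0, 0.0],     # same direction as Cd
    "FA": [-90.0, -60.0, -30.0, 0.0],  # opposite direction
}

per_compound = {name: pearson_r(reproduction, values)
                for name, values in metabolite.items()}

pooled_x = reproduction * len(metabolite)
pooled_y = [v for values in metabolite.values() for v in values]
pooled = pearson_r(pooled_x, pooled_y)

for name, r in per_compound.items():
    print(f"{name}: r = {r:+.2f}")
print(f"pooled: r = {pooled:+.2f}")
```

Each compound on its own shows a strong linear trend, yet the pooled correlation is close to zero, so a variable of this kind would rank poorly in a purely correlation-based screen.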

4 Conclusion

Metabolic profiling approaches have already been applied to earthworms with some success, for both laboratory and field studies. Here we have reported data for the peregrine species Lumbricus rubellus, which is of particular interest as a metabolomics model, given its wide distribution, suitability for both field and laboratory tests, and amount of sequence data available (Lumbribase, www.earthworms.org). In particular, we aimed to determine (as a proof-of-principle for MOA discrimination) if we could tell apart worms dosed with one of three very different chemicals, expected to have different MOA: a toxic metal salt (cadmium), a polycyclic aromatic hydrocarbon (fluoranthene), and an agrochemical biocide (atrazine). We could discriminate samples both by using multivariate chemometric methods, and also by selecting individual metabolite biomarkers; unsurprisingly, higher doses were better separated than lower doses. We were also able to show relationships between metabolic responses and the widely accepted and ecologically meaningful parameter cocoon production rate (i.e. reproduction). We conclude that NMR-based metabolic profiling is indeed capable of discriminating toxic MOA for L. rubellus, with both specific and general potential metabolic biomarkers identified.

Acknowledgements We thank Slawomir Zukowski for valuable assistance in developing the code for analysis of mutual information. The Natural Environment Research Council is acknowledged for funding.

Fig. 5 Metabolites related to reproduction (cocoons worm-1 week-1) across three different toxic compound responses. a Predicted vs. observed reproduction rates (PLS, 4 axes, cumulative Q2Y 0.54; validation plot given as Fig. S6, supporting information). Black: atrazine. Blue: cadmium. Red: fluoranthene. Grey: controls. Dotted line represents perfect prediction, X = Y. b Mutual information vs. Pearson correlation, relationship to cocoon production rate. Significance given as colour scale. Relationships for points labelled i. (betaine) and ii. (scyllo-inositol) are shown in c and d, respectively. Black: atrazine. Blue: cadmium. Red: fluoranthene. Data for control points not shown. Axes: a cocoon production rate (predicted) against cocoon production rate (observed); c cocoon production rate against betaine; d cocoon production rate against variable at 3.35 ppm

References

Ankley, G. T., et al. (2006). Toxicogenomics in regulatory ecotoxicology. Environmental Science and Technology, 40, 4055–4065. doi:10.1021/es0630184.

Bang, J. W., et al. (2008). Integrative top-down system metabolic modeling in experimental disease states via data-driven Bayesian methods. Journal of Proteome Research, 7, 497–503. doi:10.1021/pr070350l.

Beckonert, O., et al. (2007). Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nature Protocols, 2, 2692–2703. doi:10.1038/nprot.2007.376.

Broadhurst, D. I., & Kell, D. B. (2006). Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics, 2, 171–196. doi:10.1007/s11306-006-0037-z.

Brown, S. A., Simpson, A. J., & Simpson, M. J. (2008). Evaluation of sample preparation methods for nuclear magnetic resonance metabolic profiling studies with Eisenia fetida. Environmental Toxicology and Chemistry, 27, 828–836. doi:10.1897/07-412.1.

Bundy, J. G., et al. (2002). Metabonomic assessment of toxicity of 4-fluoroaniline, 3,5-difluoroaniline and 2-fluoro-4-methylaniline to the earthworm Eisenia veneta (Rosa): Identification of new endogenous biomarkers. Environmental Toxicology and Chemistry, 21, 1966–1972. doi:10.1897/1551-5028(2002)021<1966:MAOTOF>2.0.CO;2.

Bundy, J. G., et al. (2004). Environmental metabonomics: Applying combination biomarker analysis in earthworms at a metal contaminated site. Ecotoxicology (London, England), 13, 797–806. doi:10.1007/s10646-003-4477-1.

Bundy, J. G., et al. (2007). Metabolic profile biomarkers of metal contamination in a sentinel terrestrial species are applicable across multiple sites. Environmental Science and Technology, 41, 4458–4464. doi:10.1021/es0700303.

Bundy, J. G., et al. (2008). ‘Systems toxicology’ approach identifies coordinated metabolic responses to copper in a terrestrial non-model invertebrate, the earthworm Lumbricus rubellus. BMC Biology, 6, 25. doi:10.1186/1741-7007-6-25.

Calabrese, E. J. (2008). Hormesis: Why it is important to toxicology and toxicologists. Environmental Toxicology and Chemistry, 27, 1451–1474. doi:10.1897/07-541.1.

Camacho, D., de la Fuente, A., & Mendes, P. (2005). The origin of correlations in metabolomics data. Metabolomics, 1, 53–63. doi:10.1007/s11306-005-1107-3.

Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004). Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5, 118. doi:10.1186/1471-2105-5-118.

Duewer, D. L., Kowalski, B. R., & Schatzki, T. F. (1975). Source identification of oil spills by pattern recognition analysis of natural elemental composition. Analytical Chemistry, 47, 1573–1583. doi:10.1021/ac60359a051.

Ebbels, T. M., et al. (2007). Prediction and classification of drug toxicity using probabilistic modeling of temporal metabolic data: The consortium on metabonomic toxicology screening approach. Journal of Proteome Research, 6, 4407–4422. doi:10.1021/pr0703021.

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.

Flaherty, P., Giaever, G., Kumm, J., Jordan, M. I., & Arkin, A. P. (2005). A latent variable model for chemogenomic profiling. Bioinformatics (Oxford, England), 21, 3286–3293. doi:10.1093/bioinformatics/bti515.

Forbes, V. E., Palmqvist, A., & Bach, L. (2006). The use and misuse of biomarkers in ecotoxicology. Environmental Toxicology and Chemistry, 25, 272–280. doi:10.1897/05-257R.1.

Gartland, K. P., Beddell, C. R., Lindon, J. C., & Nicholson, J. K. (1990). A pattern recognition approach to the comparison of PMR and clinical chemical data for classification of nephrotoxicity. Journal of Pharmaceutical and Biomedical Analysis, 8, 963–968. doi:10.1016/0731-7085(90)80151-E.

Gartland, K. P., Beddell, C. R., Lindon, J. C., & Nicholson, J. K. (1991). Application of pattern recognition methods to the analysis and classification of toxicological data derived from proton nuclear magnetic resonance spectroscopy of urine. Molecular Pharmacology, 39, 629–642.

Gartland, K. P., Bonner, F. W., & Nicholson, J. K. (1989). Investigations into the biochemical effects of region-specific nephrotoxins. Molecular Pharmacology, 35, 242–250.

Gong, P., et al. (2007). Toxicogenomic analysis provides new insights into molecular mechanisms of the sublethal toxicity of 2,4,6-trinitrotoluene in Eisenia fetida. Environmental Science and Technology, 41, 8195–8202. doi:10.1021/es0716352.

Hillenmeyer, M. E., et al. (2008). The chemical genomic portrait of yeast: Uncovering a phenotype for all genes. Science, 320, 362–365. doi:10.1126/science.1150021.

Hines, A., Oladiran, G. S., Bignell, J. P., Stentiford, G. D., & Viant, M. R. (2007). Direct sampling of organisms from the field and knowledge of their phenotype: Key recommendations for environmental metabolomics. Environmental Science and Technology, 41, 3375–3381. doi:10.1021/es062745w.

Hughes, T. R., et al. (2000). Functional discovery via a compendium of expression profiles. Cell, 102, 109–126. doi:10.1016/S0092-8674(00)00015-5.

Hyvarinen, A., Karhunen, J., & Oja, E. (2001). Independent Component Analysis. New York: Wiley.

Jones, O. A., Spurgeon, D. J., Svendsen, C., & Griffin, J. L. (2008). A metabolomics based approach to assessing the toxicity of the polyaromatic hydrocarbon pyrene to the earthworm Lumbricus rubellus. Chemosphere, 71, 601–609. doi:10.1016/j.chemosphere.2007.08.056.

Khan, S., et al. (2007). Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 76, 026209. doi:10.1103/PhysRevE.76.026209.

Lin, C. Y., Viant, M. R., & Tjeerdema, R. S. (2006). Metabolomics: Methodologies and applications in the environmental sciences. Journal of Pesticide Science, 31, 245–251. doi:10.1584/jpestics.31.245.

Maere, S., Van Dijck, P., & Kuiper, M. (2008). Extracting expression modules from perturbational gene expression compendia. BMC Systems Biology, 2, 33. doi:10.1186/1752-0509-2-33.

Malmendal, A., et al. (2006). Metabolomic profiling of heat stress: Hardening and recovery of homeostasis in Drosophila. American Journal of Physiology: Regulatory, Integrative and Comparative Physiology, 291, R205–R212. doi:10.1152/ajpregu.00867.2005.

May, S., & Bigelow, C. (2005). Modeling nonlinear dose–response relationships in epidemiologic studies: Statistical approaches and practical challenges. Dose Response, 3, 474–490. doi:10.2203/dose-response.003.04.004.

Meyer, P. E., Lafitte, F., & Bontempi, G. (2008). minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics, 9, 461. doi:10.1186/1471-2105-9-461.

OECD. (1984). Guidelines for the testing of chemicals. 207. Earthworm acute toxicity tests. Paris: OECD.


Owen, J., et al. (2008). Transcriptome profiling of developmental and xenobiotic responses in a keystone soil animal, the oligochaete annelid Lumbricus rubellus. BMC Genomics, 9, 266. doi:10.1186/1471-2164-9-266.

Paules, R. (2003). Phenotypic anchoring: Linking cause and effect. Environmental Health Perspectives, 111, A338–A339.

Sanchez-Hernandez, J. C. (2006). Earthworm biomarkers in ecological risk assessment. Reviews of Environmental Contamination and Toxicology, 188, 85–126. doi:10.1007/978-0-387-32964-2-3.

Spurgeon, D. J., Svendsen, C., Kille, P., Morgan, A. J., & Weeks, J. M. (2004). Responses of earthworms (Lumbricus rubellus) to copper and cadmium as determined by measurement of juvenile traits in a specifically designed test system. Ecotoxicology and Environmental Safety, 57, 54–64. doi:10.1016/j.ecoenv.2003.08.003.

Spurgeon, D. J., Svendsen, C., Weeks, J. M., Hankard, P. K., Stubberud, H. E., & Kammenga, J. E. (2003a). Quantifying copper and cadmium impacts on intrinsic rate of population increase in the terrestrial oligochaete Lumbricus rubellus. Environmental Toxicology and Chemistry, 22, 1465–1472. doi:10.1897/1551-5028(2003)22<1465:QCACIO>2.0.CO;2.

Spurgeon, D. J., Weeks, J. M., & van Gestel, C. A. M. (2003b). A summary of eleven years progress in earthworm ecotoxicology. Pedobiologia, 47, 588–606.

Steuer, R., Kurths, J., Daub, C. O., Weise, J., & Selbig, J. (2002). The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics (Oxford, England), 18(Suppl 2), S231–S240.

Thevenaz, P., & Unser, M. (2000). Optimization of mutual information for multiresolution image registration. IEEE Transactions on Image Processing, 9, 2083–2099. doi:10.1109/83.887976.

Ulrich, E. L., et al. (2008). BioMagResBank. Nucleic Acids Research, 36, D402–D408. doi:10.1093/nar/gkm957.

van Gestel, C. A., van Dis, W. A., van Breemen, E. M., & Sparenburg, P. M. (1989). Development of a standardized reproduction toxicity test with the earthworm species Eisenia fetida andrei using copper, pentachlorophenol and 2,4-dichloroaniline. Ecotoxicology and Environmental Safety, 18, 305–312. doi:10.1016/0147-6513(89)90024-9.

van Gestel, C. A., & Weeks, J. M. (2004). Recommendations of the 3rd international workshop on earthworm ecotoxicology, Aarhus, Denmark, August 2001. Ecotoxicology and Environmental Safety, 57, 100–105.

van Straalen, N. M., & Roelofs, D. (2008). Genomics technology for assessing soil pollution. Journal of Biology (Online), 7, 19. doi:10.1186/jbiol80.

Viant, M. R. (2003). Improved methods for the acquisition and interpretation of NMR metabolomic data. Biochemical and Biophysical Research Communications, 310, 943–948. doi:10.1016/j.bbrc.2003.09.092.

Viant, M. R. (2007). Metabolomics of aquatic organisms: The new ‘omics’ on the block. Marine Ecology Progress Series, 332, 301–306. doi:10.3354/meps332301.

Walker, M. G., Volkmuth, W., Sprinzak, E., Hodgson, D., & Klingler, T. (1999). Prediction of gene function by genome-scale expression analysis: Prostate cancer-associated genes. Genome Research, 9, 1198–1203. doi:10.1101/gr.9.12.1198.

Warne, M. A., Lenz, E. M., Osborn, D., Weeks, J. M., & Nicholson, J. K. (2000). An NMR-based metabonomic investigation of the toxic effects of 3-trifluoromethyl-aniline on the earthworm Eisenia veneta. Biomarkers, 5, 56–72. doi:10.1080/135475000230541.

Wishart, D. S. (2008). Quantitative metabolomics using NMR. Trends in Analytical Chemistry, 27, 228–237. doi:10.1016/j.trac.2007.12.001.

Wold, S. (1976). Pattern recognition by means of disjoint principal components models. Pattern Recognition, 8, 127–139. doi:10.1016/0031-3203(76)90014-5.
