This article was originally published in a journal published by Elsevier, and the attached copy is provided by Elsevier for the author’s benefit and for the benefit of the author’s institution, for non-commercial research and educational use including without limitation use in instruction at your institution, sending it to specific colleagues that you know, and providing a copy to your institution’s administrator. All other uses, reproduction and distribution, including without limitation commercial reprints, selling or licensing copies or access, or posting on open internet sites, your personal or institution’s website or repository, are prohibited. For exceptions, permission may be sought for such use through Elsevier’s permissions site at: http://www.elsevier.com/locate/permissionusematerial


Author's personal copy

Determining the composition of mineral-organic mixes using UV–vis–NIR diffuse reflectance spectroscopy

R.A. Viscarra Rossel ⁎, R.N. McGlynn, A.B. McBratney

Australian Centre for Precision Agriculture, The University of Sydney, NSW 2006, Australia

Received 23 November 2005; received in revised form 29 May 2006; accepted 18 July 2006
Available online 12 September 2006

Abstract

This paper reports a first attempt to develop a simple, quantitative, non-destructive and inexpensive methodology to characterise the mineral composition of soil using diffuse reflectance spectroscopy (DRS). Although there are studies that qualitatively characterise soil minerals using DRS, few studies quantify their composition in soil. Our aims were to: (i) design an experiment that may be used to model mineral-organic mixes as a function of their ultra violet (UV), visible (vis) and near infrared (NIR) diffuse reflectance spectra and (ii) use these models to predict the mineral-organic composition of independent test mixes. Our experiments used a three-factor simplex lattice design with three levels, corresponding to kaolinite (K), illite (I) and smectite (S). To this simplex we added two levels of goethite (G) and two levels of a 50/50 mix of humic and fulvic acids (H–F). Finally, we added three levels of quartz (Q) to the mixes. We modelled the data using the partial least squares regression 1 algorithm (PLSR1) and made predictions using bootstrap aggregation-PLSR (or bagging-PLSR). The numbers of factors in the PLSR models were selected by leave-one-out cross-validation. These models were independently tested using mixes that we made up to roughly represent the mineral-organic composition of common Australian soils. Predictions of the amounts of K, I and S in the test mixes were accurate (RMSE = 3.6, 3.4 and 3.4%, respectively). Predictions of goethite and the H–F mix were biased. Predictions of Q were very poor because quartz does not have a spectral response in the UV–vis–NIR. A principal components analysis (PCA) was used to compare the spectra of these test mixes with those of corresponding Australian soils. In future work we plan to expand our spectral library and improve the experimental design and our models.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Soil diffuse reflectance spectroscopy; Ultra violet (UV); Visible (vis); Near infrared (NIR) spectroscopy; Soil mineral composition; Partial least squares regression

1. Introduction

Amongst the principal properties of a soil is its mineral composition. Soil minerals generally account for half of the soil volume (Schulze, 2002). Their type, proportion and concentration ultimately determine important properties such as texture and cation exchange capacity. These properties may in turn have a significant effect on many other soil properties, such as nutrient availability for plant uptake; for example, the availability of potassium in soil depends on its release from the weathering of primary minerals. Determining the mineral composition of a soil is therefore important, as is the development of a simple, quantitative technique to do it. Conventional methods, such as the commonly used X-ray diffraction (XRD) technique (e.g. Brown and Brindley, 1984), are primarily qualitative (Whittig and Allardice, 1986). Although quantitative modifications of the XRD methodology exist, e.g. X-ray powder diffraction (XRPD), they are usually involved, time-consuming and expensive (e.g. Hillier et al., 2003). This paper presents our initial attempt to develop a rapid, cheap, accurate and non-destructive method to determine the mineral-organic composition of soil using diffuse reflectance spectroscopy (DRS) in the ultra violet (UV), visible (vis) and near infrared (NIR) portions of the electromagnetic spectrum.

Geoderma 137 (2006) 70–82
www.elsevier.com/locate/geoderma

⁎ Corresponding author.
E-mail address: [email protected] (R.A. Viscarra Rossel).

0016-7061/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.geoderma.2006.07.004

Soil minerals absorb light in the UV–vis–NIR and in the mid infrared (MIR) portions of the electromagnetic spectrum. Absorption of light in these regions may occur through electronic transitions of atoms and also through vibrational stretching and bending of molecules and crystals, all of which are frequency dependent. This dependency allows us to obtain information about the chemistry of a mineral. The fundamental vibrations of most soil materials occur in the MIR, with overtones and combinations found in the NIR and electronic transitions in the UV–vis. A number of other publications have dealt with the use of the MIR region for mineral identification (e.g. Nguyen et al., 1991; Janik and Skjemstad, 1995). Here we focus on the UV–vis–NIR. Soil minerals such as phyllosilicates have distinct spectral signatures in the (short-wave) NIR, which are due to strong absorption by the overtones of OH, SO4 and CO3 groups as well as by combinations of fundamental features of H2O and CO2. Absorptions due to electronic transitions are primarily associated with minerals that contain iron (e.g. haematite, goethite), and their fundamentals may be found in the vis–NIR region. For comprehensive accounts of all of these processes the reader is directed to, amongst many others, Hunt and Salisbury (1970), Hunt (1977) and Clark (1999).

Visible light soil reflectance is widely used by soil scientists to characterise field soil colour (e.g. Viscarra Rossel et al., 2006a).

The use of DRS in soil science has received much attention in the last 15 years, and interest is still increasing (e.g. Ben-Dor and Banin, 1990; Stenberg et al., 1995; Viscarra Rossel and McBratney, 1998; Brown et al., 2005). Mostly, DRS has been used in conjunction with a multivariate calibration technique (e.g. partial least squares regression, PLSR) to relate soil spectra in the UV, vis, NIR and MIR portions of the electromagnetic (EM) spectrum to soil attributes such as organic carbon and clay content, and to estimate their concentrations (e.g. Viscarra Rossel et al., 2006b). Although there are studies that qualitatively characterise soil minerals using DRS (e.g. Hunt and Salisbury, 1970; Farmer, 1974; Hunt, 1977; Clark et al., 1990), few studies quantify their composition in soil. Ben-Dor and Banin (1990) used NIR to estimate the carbonate concentration in soils. Ben-Dor and Banin (1994) used vis–NIR (400–1100 nm) DRS with multiple regression to estimate CaCO3, Fe2O3, Al2O3, SiO2, free Fe oxides and K2O. The authors concluded that although the vis–NIR technique was not as precise as conventional chemical analyses, the precision obtained is likely to be useful for rapid soil characterisation and remote sensing. Brown et al. (2006) used vis–NIR DRS with boosted regression trees to predict ordinal levels (on a 0–5 scale) of the clay minerals kaolinite, montmorillonite and vermiculite; 96%, 88% and 83% of these predictions, respectively, fell within one ordinal unit of reference X-ray diffraction (XRD) values.

Fig. 1. (a) Three-factor simplex lattice experimental design with three levels corresponding to kaolinite (K), illite (I) and smectite (S). (b) Additionally, two levels of the iron oxide goethite (G) and two levels of a humic–fulvic acid (H–F) mix were added compositionally. Quartz (Q) was added as mass % of total. Fifteen end-member minerals were also used in the calibrations.

The aims of this paper are to: (i) design an experiment that may be used to model mineral-organic mixes as a function of their UV, vis and NIR diffuse reflectance spectra and (ii) use these models to predict the mineral-organic composition of independent test mixes. Eventually, we aim to apply similar, more refined models to predict the mineral-organic composition of soil and to use them with a spectral soil inference system (SPEC-SINFERS) (McBratney et al., in press) for the prediction of functional soil properties.

2. Materials and methods

2.1. Experimental design

We started with a three-factor simplex lattice design with three levels, corresponding to the three most common soil clay minerals, kaolinite (K), illite (I) and smectite (S) (Fig. 1).

To this simplex, that is, to the original K, I, S mixes, we first incorporated two levels of the iron oxide goethite (G) (1% and 15%), and then two levels of a 50/50 mix of humic and fulvic acids (H–F) (1% and 5%), with the constraint that all the components (K + I + S + G + H–F) summed to 100%. Finally, we also added three levels of quartz to the mixes, at 10%, 20% and 50% of total weight. We aimed to account for the range in composition of goethite, organic matter and sand in Australian soils. These data were used to calibrate our models. The design is shown in Fig. 1.
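The compositional bookkeeping of this design can be sketched in Python. This is our illustration, not the authors' procedure: the lattice spacing m is left as a parameter because the text does not list the lattice points, and the helper names `simplex_lattice`, `add_components` and `add_quartz` are hypothetical. The sketch assumes that adding goethite and H–F rescales K, I and S so that K + I + S + G + H–F = 1, and that quartz is then mixed in as a fixed fraction of the total weight, diluting all other components proportionally.

```python
from itertools import product


def simplex_lattice(q, m):
    """Return all {q, m} simplex-lattice points: q proportions in steps of 1/m summing to 1."""
    return [
        tuple(c / m for c in counts)
        for counts in product(range(m + 1), repeat=q)
        if sum(counts) == m
    ]


def add_components(base, extra):
    """Rescale the base mix so that base + extra components still sum to 1.

    `extra` maps component names to proportions, e.g. {"G": 0.15, "HF": 0.05}.
    """
    remaining = 1.0 - sum(extra.values())
    mix = {name: remaining * p for name, p in base.items()}
    mix.update(extra)
    return mix


def add_quartz(mix, quartz_fraction):
    """Dilute a mix with quartz so that Q makes up `quartz_fraction` of the total weight."""
    diluted = {name: (1.0 - quartz_fraction) * p for name, p in mix.items()}
    diluted["Q"] = quartz_fraction
    return diluted


# Usage: K-I-S lattice points, then one goethite/H-F/quartz combination.
kis_points = [dict(zip(("K", "I", "S"), p)) for p in simplex_lattice(3, 2)]
example = add_quartz(
    add_components({"K": 0.5, "I": 0.0, "S": 0.5}, {"G": 0.15, "HF": 0.05}), 0.20
)
```

Under these assumptions, the 50/50 kaolinite–smectite point combined with 15% goethite, 5% H–F and 20% quartz gives K = S = 0.32, G = 0.12, H–F = 0.04 and Q = 0.20.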

A number of other minerals that may occur in the clay fraction of soil were also used as additional factors (or ‘end-members’) in the development of the calibration models. The range of both primary and secondary minerals encompassed attapulgite, bentonite, brucite, carbonate, chlorite, dickite, K-feldspar, gibbsite, halloysite, hectorite, limonite, metabentonite, mica, oligoclase and vermiculite. These were included in the calibrations as either absent (0%) or present (100%).

2.2. Sample and mixture preparation

Our mineral samples originated from various sources. When necessary, samples of large minerals were crushed using a hydraulic press and then made into fine powders using a shatterbox. Samples were then ground to a size fraction smaller than 200 μm. To ensure complete mixing of the mineral components and the organic H–F mix, we mixed them in an agate mortar and pestle before transferring them to Petri dishes and wetting them with deionised water. We then oven-dried the mixes for 48 h at 40 °C and finally ground them to a size fraction smaller than 200 μm.

2.3. Collection of diffuse reflectance spectra

The reflectance spectra of the minerals and mixes were recorded using an ultra violet–visible–near infrared (UV–VIS–NIR) spectrometer (Cary 500, Varian Inc., CA, USA) with a diffuse reflectance accessory (DRA-CA-50D, Labsphere, NH, USA), equipped with a 150 mm diameter Spectralon integrating sphere. The spectral range of the instrument is 250–2500 nm (40,000–4000 cm−1) and spectra were collected at 2 nm intervals. A Spectralon® reference panel was used for white referencing. Percent reflectance was transformed to log(1/R) units. Due to the insensitivity of the spectrometer at the higher-frequency end of the spectrum, we used data in the range between 280 and 2500 nm. The sample mixes were presented to the spectrometer in small Petri dishes and two spectra were collected for each sample. The average of these was used for data analysis and modelling.
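The spectral bookkeeping described above is sketched below in Python. It converts wavelength to wavenumber (250 nm corresponds to 1e7/250 = 40,000 cm−1), converts percent reflectance R to log10(1/R), trims the data to 280–2500 nm and averages the two replicate scans. Two details are assumptions rather than statements from the text: the logarithm is taken as base 10, and replicates are averaged in reflectance before conversion (averaging after conversion is an equally plausible reading). The function names are hypothetical.

```python
import numpy as np

# 2 nm sampling over the instrument range of 250-2500 nm (end point included).
WAVELENGTHS_NM = np.arange(250, 2502, 2)


def wavenumber_cm1(wavelength_nm):
    """Convert wavelength in nm to wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / np.asarray(wavelength_nm, dtype=float)


def to_absorbance(percent_reflectance):
    """Percent reflectance -> apparent absorbance log10(1/R), with R as a fraction (base 10 assumed)."""
    r = np.asarray(percent_reflectance, dtype=float) / 100.0
    return np.log10(1.0 / r)


def prepare_spectrum(replicates_percent_r, wavelengths=WAVELENGTHS_NM, lo=280, hi=2500):
    """Average replicate scans (in reflectance, assumed), convert to log10(1/R), keep lo..hi nm."""
    mean_r = np.mean(np.asarray(replicates_percent_r, dtype=float), axis=0)
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[keep], to_absorbance(mean_r)[keep]
```

With 2 nm steps, the retained 280–2500 nm window holds 1111 points per spectrum.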

2.4. Exploratory data analysis and spectral preprocessing

The spectral data were compressed using principal components analysis (PCA) and the scores of the first 10 principal components (PCs) were analysed. Scatter plots of these components were used to visualise the structure of the multivariate data and to check for possible outliers. We tested a number of preprocessing techniques, alone and in combination. These included multiplicative scatter correction (MSC) (Martens and Næs, 1989), standard normal variate (SNV) with and without detrending (Barnes et al., 1989), and Savitzky–Golay smoothing and derivatives (Savitzky and Golay, 1964). Preprocessing using (i) a wavelet de-noising algorithm (Donoho, 1995) with a Daubechies wavelet with four vanishing moments and (ii) the first derivative of the de-noised spectra resulted in robust calibration mixture models.
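The preprocessing chain of wavelet de-noising followed by a first derivative can be sketched with NumPy alone. This is not the ParLeS implementation, whose settings the text does not give. The periodised discrete wavelet transform is written out by hand with the standard 8-tap Daubechies filter (four vanishing moments). The decomposition depth (`levels=4`), the edge padding, soft thresholding at the "universal" threshold σ√(2 ln N) with σ estimated from the median absolute finest-scale coefficient (in the spirit of Donoho, 1995), and the use of `np.gradient` for the derivative are all our assumptions.

```python
import numpy as np

# Daubechies scaling filter with four vanishing moments (8 taps, orthonormal).
DB4_LO = np.array([
    0.2303778133088964, 0.7148465705529154, 0.6308807679298587,
    -0.0279837694168599, -0.1870348117190931, 0.0308413818355607,
    0.0328830116668852, -0.0105974017850690,
])
# Quadrature-mirror wavelet filter: g[n] = (-1)**n * h[L - 1 - n].
DB4_HI = np.array([(-1) ** n * c for n, c in enumerate(DB4_LO[::-1])])


def _circular_windows(length, taps):
    """Index matrix for periodised filtering followed by downsampling by 2."""
    starts = 2 * np.arange(length // 2)
    return (starts[:, None] + np.arange(taps)[None, :]) % length


def dwt_step(x):
    """One level of the periodised orthogonal DWT: (approximation, detail)."""
    windows = x[_circular_windows(x.size, DB4_LO.size)]
    return windows @ DB4_LO, windows @ DB4_HI


def idwt_step(approx, detail):
    """Inverse of dwt_step: the analysis operator is orthogonal, so apply its transpose."""
    n = 2 * approx.size
    x = np.zeros(n)
    contributions = approx[:, None] * DB4_LO + detail[:, None] * DB4_HI
    np.add.at(x, _circular_windows(n, DB4_LO.size), contributions)
    return x


def wavelet_denoise(spectrum, levels=4):
    """Soft-threshold all detail coefficients at the universal threshold (assumed settings)."""
    x = np.asarray(spectrum, dtype=float)
    n = x.size
    padded = np.pad(x, (0, -n % 2 ** levels), mode="edge")  # length divisible by 2**levels
    approx, details = padded, []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745  # noise level from the finest scale
    thr = sigma * np.sqrt(2.0 * np.log(padded.size))
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    for d in reversed(details):
        approx = idwt_step(approx, d)
    return approx[:n]


def preprocess(spectrum, step_nm=2.0, levels=4):
    """Wavelet de-noising followed by the first derivative with respect to wavelength."""
    return np.gradient(wavelet_denoise(spectrum, levels), step_nm)
```

Because the transform is orthogonal, applying `idwt_step` to the output of `dwt_step` reproduces the input exactly (up to rounding), so any smoothing comes only from the thresholding step.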

2.5. Modelling of mineral-organic mixes

Initially, we used the partial least squares regression 2 (PLSR2) algorithm (Martens and Næs, 1989) for multiple y-variables to calibrate our spectroscopic data to the mineral-organic mixes. Although PLSR2 was the more intuitive approach to modelling additive mineral mixes, we found that the PLSR1 algorithm, which models a single y-variable, surpassed PLSR2 in terms of model interpretability and predictability. Therefore, we used the PLSR1 algorithm (Wold et al., 1983) to model our data. Leave-one-out cross-validation was used to approximate the number of bilinear factors to use in our PLSR1 calibration models. We tested up to 10 factors for each of our mixture components. Typically, the model with the lowest root mean squared error (RMSE) is selected; however, in this instance we used the Akaike Information Criterion (AIC) (Akaike, 1973) to preserve parsimony in the models, and selected the model with the lowest AIC value. The AIC was calculated as $\mathrm{AIC} = n\ln(\mathrm{RMSE}) + 2p$, where n is the number of observations and p is the number of PLSR factors used in the model. We plotted the vector of regression coefficients, b, for each of the kaolin, illite and smectite PLSR models. These vectors provide useful insight into the calibrations. The b coefficients are related to the pure-component spectra and account for the effects of interfering components, molecular interactions and baseline variations. They are also less dependent on the calibration design than the more commonly used PLSR loading weights, w (Haaland and Thomas, 1988).

Table 1
Composition of nine independent test mineral-organic mixes reflecting the mineral composition of some Australian Soil Classification (ASC) orders. Values are proportions of each mix; each row sums to 1.

ASC order           Kaolinite  Illite  Smectite  Goethite  H–F mix  Quartz
Vertosol a          0.250      0.290   0.170     0.005     0.020    0.265
Vertosol b          0.240      0.090   0.390     0.013     0.023    0.244
Kurosol a           0.440      0.033   0.150     0.017     0.013    0.347
Kurosol b           0.500      0.100   0.080     0.020     0.018    0.282
Dermosol            0.560      0.140   0.014     0.039     0.010    0.237
Sodosol/chromosol   0.320      0.440   0.007     0.016     0.015    0.202
Kandosol            0.740      0.029   0.004     0.010     0.008    0.209
Rudosol             0.350      0.190   0.060     0.042     0.012    0.346
Hydrosol            0.400      0.150   0.100     0.030     0.090    0.230

Fig. 2. Stacked UV–vis–NIR diffuse reflectance spectra of: (a) our main mix end-members (i.) quartz, (ii.) kaolin, (iii.) smectite (montmorillonite), (iv.) illite, (v.) goethite, (vi.) H–F mix; and (b) a sample series of the mineral mixes used in the experiments, showing decreasing amounts of kaolin (K) and varying amounts of primarily illite (I) and smectite (S), but also goethite (G) and the H–F (humic–fulvic acid) mix. Proportions shown as percentages.
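A minimal NumPy sketch of the calibration procedure of Section 2.5 follows: PLS1 fitted with the NIPALS algorithm, leave-one-out cross-validation, and selection of the number of factors p that minimises AIC = n ln(RMSE) + 2p. It also shows where the regression coefficient vector b comes from, b = W(PᵀW)⁻¹q, with W the loading weights, P the X loadings and q the y loadings. The function names are hypothetical and this is not the ParLeS code; note that the number of factors cannot exceed the rank of the centred training matrix.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """NIPALS PLS1 on mean-centred data; returns (x_mean, y_mean, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w = w / np.linalg.norm(w)   # loading weight (requires n_factors <= rank)
        t = E @ w                   # scores
        tt = t @ t
        p = E.T @ t / tt            # X loading
        qa = (f @ t) / tt           # y loading
        E = E - np.outer(t, p)      # deflate X
        f = f - qa * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficient vector b
    return x_mean, y_mean, b


def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


def loo_rmse(X, y, n_factors):
    """Leave-one-out cross-validated RMSE for a given number of PLS factors."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        model = pls1_fit(X[train], y[train], n_factors)
        errors[i] = pls1_predict(model, X[i:i + 1])[0] - y[i]
    return np.sqrt(np.mean(errors ** 2))


def select_factors_by_aic(X, y, max_factors=10):
    """Return the number of factors with the lowest AIC = n ln(RMSE) + 2p, and all AIC values."""
    n = len(y)
    aic = {p: n * np.log(loo_rmse(X, y, p)) + 2 * p for p in range(1, max_factors + 1)}
    return min(aic, key=aic.get), aic
```

The 2p penalty means an extra factor is only accepted when it lowers n ln(RMSE) by more than 2, which is how the AIC favours parsimonious models over the plain minimum-RMSE choice.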

2.6. An independent validation set of mineral-organic mixtures

To test our PLSR1 mixture models we made up nine independent artificial mineral-organic mixes, whose components were roughly based on the approximate mineral composition of Australian soils as described in Stace et al. (1972). The mixes were prepared as in Section 2.2, and their spectra were collected as in Section 2.3 and preprocessed as in Section 2.4. Stace et al.'s great soil groups were converted to orders of the more recent Australian Soil Classification (ASC) system (Isbell, 1996). Table 1 shows the mineral-organic compositions of these mixes.

2.7. Predictions using bootstrap aggregation or bagging (bagging-PLSR)

Bootstrap aggregating, or bagging (Breiman, 1996), was used in combination with PLSR (bagging-PLSR) to obtain a more robust, aggregated predictor. Bagging-PLSR was conducted as follows:

(i) Each training data set was randomly sampled with replacement using the bootstrap (e.g. Wehrens et al., 2000) to form 30 different bootstrap samples for each of our mineral-organic components being modelled. Briefly, the bootstrap assumes that the training data set is representative of the population, and that multiple realisations of the population may be simulated from a single data set (Hastie et al., 2001).

(ii) PLSR models were developed for each of the bootstrap samples and predictions were made on the corresponding test data. The bagging-PLSR estimates were calculated by aggregating these predictions. Their uncertainty was expressed as 95% confidence intervals derived from the standard error of the mean prediction.

Fig. 3. Stacked UV–vis–NIR diffuse reflectance spectra of our additional mineral end-members (i.) feldspar, (ii.) bentonite, (iii.) metabentonite, (iv.) carbonate, (v.) gibbsite, (vi.) dickite, (vii.) mica, (viii.) brucite, (ix.) oligoclase, (x.) halloysite, (xi.) attapulgite, (xii.) hectorite, (xiii.) vermiculite, (xiv.) chlorite, (xv.) limonite.
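Bagging-PLSR steps (i) and (ii) can be sketched as follows. The block includes a compact NIPALS PLS1 routine so that it runs on its own. We assume that the bagged estimate is the simple mean of the 30 bootstrap predictions and that the 95% interval is the mean ± 1.96 standard errors, with the standard error taken as the standard deviation of the bootstrap predictions divided by √30; the text does not spell out these details, and `bagging_pls1` is a hypothetical name.

```python
import numpy as np


def pls1_coefficients(X, y, n_factors):
    """Compact NIPALS PLS1; returns (intercept, b) so that y_hat = intercept + X @ b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, qa = E.T @ t / tt, (f @ t) / tt
        E, f = E - np.outer(t, p), f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return y_mean - x_mean @ b, b


def bagging_pls1(X_train, y_train, X_test, n_factors, n_boot=30, seed=0):
    """Bootstrap-aggregated PLS1 predictions with an approximate 95% interval."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = np.empty((n_boot, len(X_test)))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # step (i): resample with replacement
        b0, b = pls1_coefficients(X_train[idx], y_train[idx], n_factors)
        preds[k] = b0 + X_test @ b        # step (ii): predict the test data
    mean = preds.mean(axis=0)             # aggregate (assumed: simple average)
    se = preds.std(axis=0, ddof=1) / np.sqrt(n_boot)
    return mean, mean - 1.96 * se, mean + 1.96 * se
```

Averaging over bootstrap models mainly reduces the variance of an unstable predictor, which is consistent with the gains reported for kaolin, illite and smectite, while it cannot remove a systematic bias such as that seen for goethite and the H–F mix.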

2.8. Statistics used to assess the predictions of our independent validation mixtures

The quality of the bagging-PLSR predictions of the nine independent validation mineral-organic mix components was assessed using the root mean-squared error (RMSE), the mean error (ME) and the standard deviation of the error distribution (SDE), which measure the accuracy, bias and precision of the predictions, respectively. We also recorded the adjusted coefficient of determination, R²adj, which measures the proportion of the variation in the response that may be attributed to the model rather than to random error. The adjustment makes the coefficient more comparable across models with different numbers of parameters, as it uses the degrees of freedom in its computation. Finally, we recorded the ratio of performance to deviation (RPD), which is the ratio of the standard deviation of the laboratory-measured (reference) data to the RMSE of the validation (Williams, 1987). It is the factor by which the prediction accuracy has been increased compared to using the mean of the original data. We classified RPD values as follows: RPD < 1.0 indicates very poor models/predictions whose use is not recommended; RPD between 1.0 and 1.4 indicates poor models/predictions in which only high and low values are distinguishable; RPD between 1.4 and 1.8 indicates fair models/predictions, which may be used for assessment and correlation; RPD between 1.8 and 2.0 indicates good models/predictions, with which quantitative predictions are possible; RPD between 2.0 and 2.5 indicates very good, quantitative models/predictions; and RPD > 2.5 indicates excellent models/predictions.
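The statistics and RPD classes above can be computed as in the following sketch. Three choices are our assumptions: errors are defined as predicted minus observed, R² is the squared Pearson correlation between observed and predicted values, and the p in R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1) is the number of PLSR factors. SDE uses the sample standard deviation (n − 1 divisor), under which RMSE² = ME² + SDE²(n − 1)/n; this identity appears consistent with the rounded values in Table 2 (for quartz, 0.134² + 0.050² × 8/9 ≈ 0.142²). Function names are hypothetical.

```python
import numpy as np

# Upper bounds (exclusive) of the RPD classes described in the text.
RPD_CLASSES = [(1.0, "very poor"), (1.4, "poor"), (1.8, "fair"), (2.0, "good"), (2.5, "very good")]


def assessment_statistics(observed, predicted, n_factors):
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs                     # assumed sign convention: predicted - observed
    n = obs.size
    rmse = np.sqrt(np.mean(err ** 2))    # accuracy
    me = err.mean()                      # bias
    sde = err.std(ddof=1)                # precision
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_factors - 1)
    rpd = obs.std(ddof=1) / rmse
    return {"RMSE": rmse, "ME": me, "SDE": sde, "R2adj": r2_adj, "RPD": rpd}


def classify_rpd(rpd):
    """Map an RPD value to the verbal classes used in this paper."""
    for upper, label in RPD_CLASSES:
        if rpd < upper:
            return label
    return "excellent"
```

For instance, the goethite RPD of 0.36 in Table 2 falls in the "very poor" class, while the kaolin value of 4.41 is "excellent".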

2.9. Comparison of the artificial mineral-organic mix spectra with spectra of real soils

Using PCA, we compared the multivariate spectral space of the nine artificial mineral-organic mix spectra (Table 1) to the spectra of the B2 horizons of thirteen Australian soils that roughly correspond to the same orders. The ASC orders of these soils and the numbers used were: 4 Chromosols; 3 Tenosols; 2 Grey Sodosols; 1 Grey Vertosol; 1 Red Dermosol; 1 Rudosol; and 1 Kandosol. Both the PCA scores and loadings of these data were compared.
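One way to make such a comparison concrete is sketched below: PCA by singular value decomposition of the mean-centred calibration spectra, projection of other spectra (for example the soil B2 horizons) onto the same loadings, and a check of whether points fall inside a 99% density ellipse on the first two PCs, similar to the ellipse drawn in Fig. 5. Fitting on the calibration set and projecting the rest, and defining the ellipse through the squared Mahalanobis distance and the χ² quantile with two degrees of freedom (−2 ln(1 − 0.99) ≈ 9.21), are our assumptions; the paper does not state how ParLeS constructs the ellipse. The function names are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components=10):
    """PCA via SVD of the mean-centred data; returns mean, loadings (k x p), scores, explained."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components]
    scores = (X - mean) @ loadings.T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean, loadings, scores, explained


def pca_project(X_new, mean, loadings):
    """Scores of new spectra in the PC space of the fitted (calibration) data."""
    return (X_new - mean) @ loadings.T


def inside_density_ellipse(ref_scores_2d, new_scores_2d, level=0.99):
    """True where a point lies inside the bivariate-normal density ellipse of the reference scores."""
    centre = ref_scores_2d.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref_scores_2d, rowvar=False))
    diff = new_scores_2d - centre
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distance
    # The chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - level).
    return d2 <= -2.0 * np.log(1.0 - level)
```

Projecting the soil spectra, rather than refitting the PCA on all data, keeps the calibration PC axes fixed, so any soil lying outside the ellipse can be read directly as falling outside the calibration space.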

The preprocessing, PCA, PLSR modelling and bagging-PLSR predictions were made using the shareware ParLeS v2.1 (Viscarra Rossel, 2005).

3. Results and discussion

3.1. Spectral interpretation

The UV–vis–NIR spectra of our five main minerals and of the H–F mix are shown in Fig. 2a.

Fig. 4. UV–vis–NIR diffuse reflectance spectra of the preprocessed calibration data showing (a) the wavelet de-noised spectra and (b) their first derivatives.

Fig. 5. Principal components analysis scores plots for the preprocessed spectra, bounded by a 99% density ellipse (fine-dashed line). Preprocessing was realised using wavelet de-noising and first derivatives. Closed black circles represent kaolin (K), illite (I), smectite (S), goethite (G), organic mix (H–F), quartz (Q) and 15 additional end-member minerals. Open circles represent the 107 K, I, S, G, H–F, Q mixes based on the experimental design that is described in Section 2.1. The asterisks represent the nine independent mineral-organic mixes that we used to test our PLSR models. These were made up to approximate the mineral composition of some Australian soils.

Framework silicates such as quartz (Fig. 2a (i)) do not have prominent absorption features in the UV–vis–NIR region. Their intense fundamental vibrations occur in the mid infrared around 10,000 nm (Nguyen et al., 1991). The small absorption bands near 850, 1200, 1400 and 1900 nm may be due to vibrational combinations and overtones of molecular water contained in various locations in the mineral (Hunt, 1977).

The spectrum of kaolin (Fig. 2a (ii)) shows the characteristic doublet absorption peak near 1400 nm, which may be attributed to the first overtone of the O–H stretch, and a second doublet at 2200 nm due to the Al–OH bend plus O–H stretch combination. These metal–OH bend plus O–H stretch combinations near 2200 nm and 2300 nm are diagnostic absorption features in clay mineral identification (Clark et al., 1990).

Smectites form a group of expanding lattice clay minerals capable of varying amounts of isomorphous substitution in both the tetrahedral and octahedral sheets. The smectite used in this work was montmorillonite, the high-alumina member of the group. Water molecules are adsorbed between the sheets of the montmorillonite structure; hence the spectrum of montmorillonite is typical of a water-rich mineral, showing an intense absorption peak at 1900 nm (Fig. 2a (iii)), which is due to combinations of the H–O–H bend with the O–H stretches (Clark et al., 1990). Less pronounced absorption peaks also occur at 1400 nm and 2200 nm. Note that a spectrum with a 1400 nm band but no 1900 nm band indicates that only O–H is present; kaolinite (Fig. 2a (ii)), for example, has only a weak 1900 nm absorption, and therefore little water, but a large amount of O–H.

The absorption features of illite (Fig. 2a (iv)) are similar to those of kaolin and smectite; however, they are less well defined. Illite also shows absorption bands near 2200 and 2300 nm due to additional Al–OH features.

The iron oxide goethite shows strong absorption features near 450 nm and 900 nm (Fig. 2a (v)). The absorption band near 450 nm is caused by paired and single Fe3+ electron transitions to a higher energy state (Sherman and Waite, 1985), while the absorption near 900 nm is due to Laporte-forbidden transitions (Sherman, 1990). The small absorbance peak near 550 nm may be due to the FeOOH chromophore found in goethite, a yellowish mineral (Mortimore et al., 2004).

Fig. 6. Cross-validation results showing plots of the number of factors vs. the Akaike Information Criterion (AIC) for each of our mineral and organic mixes.

Table 2
Assessment statistics for predictions of mineral-organic components using bagging-PLSR

Component  Factors  RMSE           ME              SDE            R²adj        RPD
Kaolin     3        0.036 (0.044)  0.007 (−0.001)  0.038 (0.047)  0.94 (0.91)  4.41 (3.61)
Illite     2        0.034 (0.039)  0.022 (0.001)   0.027 (0.040)  0.96 (0.99)  3.89 (3.34)
Smectite   2        0.034 (0.038)  0.012 (0.027)   0.033 (0.028)  0.92 (0.96)  3.61 (3.19)
Goethite   2        0.036          0.032           0.017          0.55         0.36
H–F mix    5        0.028          0.023           0.017          0.60         0.18
Quartz     1        0.142          −0.134          0.050          0.16         0.38

Factors: number of PLSR factors. For comparison, predictions of kaolin, illite and smectite made using PLSR are shown in brackets.

The spectrum of the H–F mix shows a broad absorption peak in the visible range, between 400 nm and 700 nm, which is dominated by the darkness of the humic acid (Fig. 2a (vi)). The absorption peak near 1700 nm is the first overtone of the C–H stretch, whose fundamental occurs near 3400 nm; a C–H combination band also occurs near 2300 nm. This combination band at 2300 nm can sometimes be confused with hydroxyl and carbonate absorptions in minerals (Clark et al., 1990). The peak at 1900 nm indicates that the mix also contained some water molecules (Fig. 2a (vi)).

Fig. 2b shows the spectra of a series of our mineral mixes with decreasing amounts of kaolin and varying amounts of primarily illite and smectite (Fig. 2b (i) to (vii)). As the amount of kaolin in the mix decreases, the characteristic absorption doublets at 1400 nm and 2200 nm become less pronounced. Conversely, as the amount of smectite in the mix increases, the characteristic 1900 nm water absorption band becomes more intense (Fig. 2b). Fig. 3 shows the UV–vis–NIR spectra of the additional 15 end-member minerals we used in the PLSR calibration models.

Fig. 7. Bagging-PLSR predictions of kaolin, illite and smectite for the nine artificial mixes made up to approximate some Australian soils. (a), (b) and (c) show plots of the predictions (circles) with their 95% confidence intervals (lines) and the actual values (crosses); (d), (e) and (f) show plots of actual vs. predicted values with prediction statistics.

Feldspars do not have prominent absorption features in the UV–vis–NIR. The spectrum of feldspar (Fig. 3 (i)) shows some minor peaks that are mostly due to combinations and overtones of molecular water contained in the mineral, as well as to impurities in our sample. Metabentonite is a metamorphosed bentonite characterised by clay minerals (particularly illite) that are much less able to absorb water, hence the smaller peak at 1900 nm (Fig. 3 (ii)). Bentonite is an aluminium phyllosilicate clay, generally impure and consisting mostly of montmorillonite, as indicated by the characteristic water absorption peak at 1900 nm (Fig. 3 (iii)). Fig. 3 (iv) shows the spectrum of calcium carbonate. Its characteristic absorption peaks occur as a result of vibrational combinations and overtones of the CO3 fundamentals, which occur in the MIR region (Janik and Skjemstad, 1995). The spectrum shows weak absorption peaks near 1850 nm and 2160 nm and in the 2300–2500 nm region. Note that band positions in carbonates may vary with composition (Hunt and Salisbury, 1970).

The spectrum of gibbsite (Fig. 3 (v)) shows the typical Al–OH bend plus O–H stretch combination at about 2250 nm. Dickite is a kaolin polymorph and as such is a 1:1 aluminous dioctahedral phyllosilicate mineral. Like kaolin, its spectrum shows a doublet absorption peak at 1400 nm, although it is broader and less well defined (Fig. 3 (vi)). Absorption peaks typical of the Al–OH bend plus O–H stretch combination are also present near 2200 nm.

The spectrum of muscovite (Fig. 3 (vii)) shows absorption peaks at 1400 nm and 2200 nm. Brucite is closely related to gibbsite, but it has Mg2+ instead of Al3+ in octahedral coordination. Whereas in gibbsite one-third of the octahedra must be vacant of a central ion in order to maintain a neutral sheet, in brucite all of the octahedral sites are occupied. The spectrum of brucite shows the typical first overtone of the O–H stretch at 1400 nm and an Mg–OH bend plus O–H stretch combination near 2300 nm (Fig. 3 (viii)).

Oligoclase is a member of the plagioclase feldspar group, which comprises minerals ranging in chemical composition from pure albite (NaAlSi3O8) to pure anorthite (CaAl2Si2O8). By definition, oligoclase must contain 90–70% sodium to 10–30% calcium in the sodium/calcium position of the crystal structure. Its spectrum is shown in Fig. 3 (ix).

Halloysite is a hydrated kaolin; thus it shows spectral features similar to those of kaolin, except for the intense peak at 1900 nm, which is due to the presence of water in its structure (Fig. 3 (x)). The spectrum of attapulgite, a magnesium–aluminium silicate, shows absorption peaks at 1400 nm and 1900 nm and between 2200 nm and 2500 nm (Fig. 3 (xi)).

Hectorite is a lithium-bearing smectite, whose spectrum shows the first overtone of the O–H stretch at 1400 nm and the typical 1900 nm absorption peak common to water-rich minerals. The presence of magnesium in its structure also gives rise to an Mg–OH bend plus O–H stretch combination near 2300 nm (Fig. 3 (xii)). Similar features are observed in the spectrum of vermiculite (Fig. 3 (xiii)).

Chlorite is an iron-bearing silicate which shows absorption features due to electronic transitions near 400 nm, 700 nm, 900 nm and 1000 nm (Fig. 3 (xiv)). Chlorite also displays an Mg–OH bend plus O–H stretch combination near 2300 nm. Limonite is a mixture of hydrated iron oxides consisting mostly of goethite, hence the similarity of their spectra (cf. Figs. 3 (xv) and 2a (v)).

Fig. 8. Regression coefficients, b, used in bagging-PLSR predictions of (a) kaolin, (b) illite and (c) smectite. Also shown are the percent explained variation (EV) in the spectra (X) and the mineral-organic component (y) for the number of PLSR factors (NF) used in the models.

3.2. Quantitative analysis of soil mineral-organic UV–vis–NIR spectra

3.2.1. Preprocessing of spectra and principal components analysis

Spectra were preprocessed before modelling because preprocessing improved predictability. A sample of the de-noised log(1/R) spectra of selected sample mixtures is shown in Fig. 4a, while their first derivatives are shown in Fig. 4b.

The most characteristic absorption features of the mineral and organic spectra are highlighted in the first-derivative spectra (Fig. 4b), since the first derivative measures the local rate of change of absorbance with respect to wavelength. For example, the group of positive peaks near 900 nm may represent absorption caused by electronic transitions in goethite; the group of peaks near 1400 nm may be attributed to the first overtone of the O–H stretch; the peaks near 1700 nm to the first overtone of the C–H stretch; the prominent group of peaks near 1900 nm to water held within mineral structures; and the peaks near 2200 nm, 2300 nm and 2400 nm to metal–OH bend plus O–H stretch combinations (see Section 3.1 for more details).

The PCA scores plot of the first two principal components(Fig. 5) produced from the preprocessed spectra reflects thestructure of our experimental design, shown in Fig. 1.

Fig. 5 also shows that the nine validation mixes made up to represent the mineral composition of Australian soils were well represented by the calibration data.

3.2.2. PLSR modelling and predictions

The cross-validation results for our five main minerals and the H–F mix are shown in Fig. 6.

The numbers of factors selected from cross-validation were in good agreement with those that gave the best results with the external validation data. Predictions of the nine artificial mineral-organic test mixes were made using both PLSR (bracketed results, Table 2) and bagging-PLSR. Bagging-PLSR improved predictions of kaolin, illite and smectite, but it did not improve predictions of goethite, the H–F mix or quartz. Predictions for kaolin, illite and smectite, together with their uncertainties, are shown in Fig. 7.

Bagging-PLSR produced excellent predictions of the composition of kaolin, illite and smectite in the nine (independent) test mixes, with RPD values greater than 3.5 and an accuracy (as measured by the RMSE) of approximately 3.5% (Table 2 and Fig. 7). Predictions for goethite and the H–F mix were biased and, although their R²adj values were 0.55 and 0.60, respectively (Table 2), the models were poor (RPD ≪ 1). The main reason for the bias in these predictions may be the rather rudimentary experimental design that we used, particularly with respect to these two factors. The PLSR model for quartz was poor and inaccurate (Table 2).
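The statistics quoted here can be computed as in the sketch below. The exact definitions are assumptions. R² is taken as the squared Pearson correlation between observed and predicted values, the adjustment uses a hypothetical `n_params` predictors, and RPD is the sample standard deviation of the observations divided by the RMSE. The second call shows how a biased model can be perfectly correlated with the observations, giving a high R², yet still have an RPD close to 1.

```python
import numpy as np

def validation_stats(observed, predicted, n_params=1):
    """RMSE, bias, R2, adjusted R2 and RPD for an independent test set.
    Assumption: R2 = squared Pearson correlation; n_params is illustrative."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    n = obs.size
    resid = pred - obs
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(np.corrcoef(obs, pred)[0, 1] ** 2)
    return {
        "rmse": rmse,
        "bias": float(resid.mean()),                    # mean prediction error
        "r2": r2,
        "r2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1),
        "rpd": float(obs.std(ddof=1) / rmse),           # SD of observations / RMSE
    }

observed = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
good = validation_stats(observed, [12.0, 18.0, 33.0, 41.0, 48.0])
biased = validation_stats(observed, observed + 15.0)   # perfectly correlated but offset
```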

The first few PLSR regression coefficient vectors, b, for kaolin, illite and smectite are shown in Fig. 8.

Fig. 9. Principal components analysis scores plots for the preprocessed data, including those of the soil samples. Preprocessing was realised using the wavelet de-noising algorithm and a first derivative. Light grey open and closed circles represent the calibration data, the dark grey asterisks represent the nine independent mineral-organic mixes used to test our PLSR models, and the closed black circles represent the scores of thirteen Australian (B2) soil horizons representing 7 different orders in the Australian Soil Classification.

79 R.A. Viscarra Rossel et al. / Geoderma 137 (2006) 70–82

Although the regression coefficients of the first derivatives may be less useful for the assignment of spectral bands than those of the unprocessed log(1/R) spectra, large changes in the absolute values of the regression coefficients, |b| (Fig. 8), show the regions of the spectra that were important for the predictions of kaolin, illite and smectite. These regions are in good agreement with the characteristic spectral features of each of these minerals (see Section 3.1).
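One simple way to turn a coefficient vector b into a list of important spectral regions is to flag the wavelengths where |b| is among the largest values and merge adjacent flagged wavelengths into ranges. The sketch below assumes a top-5% cut-off (the hypothetical `quantile` parameter) and a synthetic b with features at 1400 nm and 2200 nm. The text does not say that any such rule was used; in the paper the regions are read from Fig. 8.

```python
import numpy as np

def important_regions(wavelengths, b, quantile=0.95):
    """Contiguous wavelength ranges where |b| is at or above the given quantile of |b|.
    Assumption: a fixed quantile cut-off is an illustrative heuristic only."""
    wl = np.asarray(wavelengths, dtype=float)
    abs_b = np.abs(np.asarray(b, dtype=float))
    mask = abs_b >= np.quantile(abs_b, quantile)
    regions, start = [], None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i                                  # a region opens
        elif not flagged and start is not None:
            regions.append((wl[start], wl[i - 1]))     # a region closes
            start = None
    if start is not None:
        regions.append((wl[start], wl[-1]))
    return regions

# Hypothetical coefficient vector with one positive and one negative feature.
wl = np.arange(400.0, 2501.0, 10.0)
b = np.exp(-((wl - 1400.0) / 20.0) ** 2) - np.exp(-((wl - 2200.0) / 20.0) ** 2)
regions = important_regions(wl, b)
```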

3.2.3. Comparison of the artificial mineral-organic mix spectra with spectra of real soils

We compared the spectral PC space of the thirteen different Australian soil (B2) horizons with that of the nine artificial test mixes, which correspond to similar Australian Soil Classification (ASC) orders. Fig. 9 shows the PCA scores plot of the calibration data together with the nine artificial mineral-organic test mixes and the thirteen soil B2 horizons.

Fig. 10. Comparison of the loadings of the first 5 principal components (PCs), (a)–(d), respectively, for the preprocessed spectra of the nine independent test mineral-organic mixes (Test mixes) and the thirteen Australian (B2) soil horizons (Soils) representing 7 different orders in the Australian Soil Classification. The correlation between each pair of PCs is given by the correlation coefficient (ρ).

Both the PC scores of the artificial test mixes and the scores of the preprocessed soil spectra are clustered in the right half of Fig. 9, towards the iron (goethite, limonite) and organic (H–F mix) vertex of the experimental design. This may be consistent with the more varied iron and organic composition of the soils. Nevertheless, the scores of the Vertosols, Sodosols and Chromosols of both the soils and the mixes occupy a similar PC space. All samples are within the 99% density ellipse that bounds the data. In Fig. 10 we compare the loadings of the first 5 PCs of the test mixes with those of the thirteen soils.
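A check of whether scores fall inside a 99% density ellipse can be sketched under the assumption that the reference scores on two PCs are bivariate normal. The squared Mahalanobis distance of each point from the reference mean is compared with the chi-square quantile for two degrees of freedom, which has the closed form -2 ln(1 - p), about 9.21 for p = 0.99. How the ellipse in Fig. 9 was actually estimated is not stated, so the choice of reference set and the normality assumption here are illustrative.

```python
import numpy as np

def inside_density_ellipse(ref_scores, new_scores, prob=0.99):
    """True where 2-D scores fall inside the prob-density ellipse of ref_scores.
    Assumption: reference scores are bivariate normal."""
    ref = np.asarray(ref_scores, dtype=float)
    new = np.asarray(new_scores, dtype=float)
    centre = ref.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))
    diff = new - centre
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    return d2 <= -2.0 * np.log(1.0 - prob)

# Hypothetical calibration scores on PC1 and PC2, plus two new samples.
rng = np.random.default_rng(3)
ref = rng.normal(size=(200, 2))
new = np.array([[0.0, 0.0], [10.0, 10.0]])
inside = inside_density_ellipse(ref, new)
```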

From Fig. 10, the first three PCA loadings of the soils and the artificial mixes were similar around 1400 nm, 1900 nm and 2200–2500 nm, which are characteristic regions for soil and soil mineralogy (see Section 3.1). Their correlations were 0.34, 0.68 and 0.14, respectively (Fig. 10). The first three PCs account for approximately 75% of the variation in the data. The loadings of the fourth and fifth PCs were poorly correlated (ρ < 0.05); these components account for a smaller amount of the variation in the data (Fig. 10). Their low correlations may be explained by the differences in particle size between the soils and our ground mineral mixes, the general homogeneity of our mixes compared with the heterogeneity of the soils, and errors in our experimental procedure.
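The comparison of loadings can be sketched as two separate PCAs, one per set of spectra, followed by the Pearson correlation between matching loading vectors. The explained-variance ratios give the share of variation carried by each PC, and their cumulative sum corresponds to statements such as the one about the first three PCs above. The sketch assumes both sets share the same wavelength grid and preprocessing, and the random spectra are placeholders. Because the sign of a principal component is arbitrary, each ρ can flip sign without any change in the data, so comparing |ρ| may be more informative.

```python
import numpy as np

def pc_loadings(X, n_components):
    """Loadings (components x wavelengths) and explained-variance ratios of centred X."""
    X = np.asarray(X, dtype=float)
    _, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:n_components], (s ** 2 / np.sum(s ** 2))[:n_components]

def loading_correlations(X_a, X_b, n_components=5):
    """Pearson correlation between matching PC loadings of two sets of spectra.
    Assumption: both sets are on the same wavelength grid; PC signs are arbitrary."""
    L_a, _ = pc_loadings(X_a, n_components)
    L_b, _ = pc_loadings(X_b, n_components)
    return np.array([np.corrcoef(L_a[k], L_b[k])[0, 1] for k in range(n_components)])

# Placeholder spectra: 9 test mixes and 13 soils at the same 100 wavelengths.
rng = np.random.default_rng(4)
mixes = rng.random((9, 100))
soils = rng.random((13, 100))
rho = loading_correlations(mixes, soils)
_, ratio = pc_loadings(mixes, 5)
cumulative = np.cumsum(ratio)   # share of variation carried by the first k PCs
```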

4. Conclusions

Partial least squares regression 1 (PLSR1) successfully modelled the mineral composition of our calibration data. The models accurately predicted the amounts of kaolin, illite and smectite in the nine independent test mixes. Predictions of goethite and the H–F (organic) mix were biased; we believe that these predictions could be improved by improving the experimental design. Predictions for quartz were poor because it does not have a spectral response in the UV–vis–NIR. Bagging-PLSR was useful as it gave better predictions than PLSR and provided a measure of prediction uncertainty. A comparison of the multivariate space of thirteen Australian soil horizons and our nine artificial mixes, made up to approximate them, showed similarities around the characteristic wavelengths used for the spectral characterisation of soil and soil mineralogy. The PCA also showed the inherent differences between the artificial mixes and the soils.

5. Future work

We plan to further improve this work by:

○ Enhancing our spectral library to include various forms of particular mineral groups.

○ Incorporating the various forms of organic materials in our models, not just humic and fulvic acids.

○ Improving the experimental design.

○ Supplementing our UV–vis–NIR library with MIR spectra for improved modelling of organic matter and quartz.

○ Predicting the mineral composition of soil and comparing the predictions with XRD analysis.

References

Akaike, H., 1973. Information theory and an extension of maximum likelihoodprinciple. In: Petrov, B.N., Csáki, F. (Eds.), Second InternationalSymposium on Information Theory. Akadémia Kiadó, Budapest, Hungary,pp. 267–281.

Barnes, R.J., Dhanoa, M.S., Lister, S.J., 1989. Standard normal variatetransformation and de-trending of near-infrared diffuse reflectance spectra.Applied Spectroscopy 43 (5), 772–777.

Ben-Dor, E., Banin, A., 1990. Near-infrared reflectance analysis of carbonateconcentration in soils. Applied Spectroscopy 44 (6), 1064–1069.

Ben-Dor, E., Banin, A., 1994. Visible and near-infrared (0.4–1.1 (μm) analysisof arid and semiarid soils. Remote Sensing of Environment 48 (3), 261–274.

Breiman, L., 1996. Bagging predictors. Machine Learning 24, 123–140.Brown, G., Brindley, G.W., 1984. X-ray diffraction procedures for clay mineral

identification. In: Brindley, G.W., Brown, G. (Eds.), Crystal Structures ofClay Minerals and their X-Ray Identification. Mineralogical Society,London, pp. 305–361.

Brown, D.J., Bricklemyer, R.S., Miller, P.R., 2005. Validation requirements fordiffuse reflectance soil characterization models with a case study of VNIRsoil C prediction. Geoderma 129, 251–267.

Brown, D.J., Shepherd, K.D., Walsh, M.G., Dewayne Mays, M., Reinsch, T.G.,2006. Global soil characterization with VNIR diffuse reflectance spectros-copy. Geoderma 132, 273–290.

Clark, R.N., 1999. Spectroscopy of rocks and minerals, and principles ofspectroscopy. In: Rencz, N. (Ed.), Remote Sensing for the Earth Sciences:Manual of Remote Sensing. John Wiley & Sons, New York, pp. 3–52.

Clark, R.N., King, T.V.V., Klejwa, M., Swayze, G., Vergo, N., 1990. Highspectral resolution reflectance spectroscopy of minerals. Journal ofGeophysical Research 95, 12653–12680.

Donoho, D.L., 1995. De-noising by soft-thresholding. Transactions onInformation Theory 413, 613–626.

Farmer, V.C., 1974. In: Farmer, V.C. (Ed.), The Infra-Red Spectra of Minerals.Mineralogical Society, London, p. 539.

Haaland, D.M., Thomas, E.V., 1988. Partial least-squares methods for spectralanalyses. 1. Relation to other quantitative calibration methods and theextraction of qualitative information. Analytical Chemistry 60, 1193–1202.

Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning:Data Mining, Inference, and Prediction. Springer–Verlag, New York.

Hillier, S., Roe, M.J., Geelhoed, J.S., Fraser, A.R., Farmer, J.G., Paterson, E.,2003. Role of quantitative mineralogical analysis in the investigation of sitescontaminated by chromite ore processing residue. The Science of the TotalEnvironment 308, 195–210.

Hunt, G.R., 1977. Spectral signatures of particulate minerals in the visible andnear infrared. Geophysics 42, 501–513.

Hunt, G.R., Salisbury, J.W., 1970. Visible and near-infrared spectra of mineralsand rocks I. Silicate Minerals. Modern Geology 1 (4), 283–300.

Isbell, R.F., 1996. The Australian Soil Classification. CSIRO Australia,Collingwood, VIC.

Janik, L.J., Skjemstad, J.O., 1995. Characterisation and analysis of soils usingmid-infrared partial least squares. II. Correlations with some laboratory data.Australian Journal of Soil Research 33, 637–650.

Martens, H., Næs, T., 1989. Multivariate Calibration. John Wiley & Sons,Chichester. 419 pp.

McBratney, A.B., Minasny, B., Viscarra Rossel, R.A., in press. Spectral soilanalysis and inference systems: a powerful combination for solving the soildata crisis. Geoderma (Available online 11 May 2006).

Mortimore, J.L., Marshall, L.-J.R., Almond, M.J., Hollins, P., Matthews, W.,2004. Analysis of red and yellow ochre samples from Clearwell Caves andÇatalhöyük by vibrational spectroscopy and other techniques. Spectro-chimica Acta. Part A 60, 1179–1188.

Nguyen, T.T., Janik, L.J., Raupach, M., 1991. Diffuse Reflectance InfraredFourier Transform (DRIFT) Spectroscopy in soil studies. Australian Journalof Soil Research 29, 49–67.

Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of databy simplified least squares procedures. Analytical Chemistry 36,1627–1639.

Schulze, D.G., 2002. An introduction to soil mineralogy. In: Dixon, J.B.,Schulze, D.G. (Eds.), Soil Mineralogy with Environmental Applications.Soil Science Society of America, Madison, WI, pp. 1–34.

Sherman, D.M., 1990. Crystal Chemistry, electronic structures and spectra of Fesites in clay minerals. In: Coyne, L.M., McKeever, S.W.S., Drake, D.F.(Eds.), Spectroscopic Characterization of Minerals and their Surfaces.American Chemical Society, Washington DC, pp. 284–309.

81R.A. Viscarra Rossel et al. / Geoderma 137 (2006) 70–82

Autho

r's

pers

onal

co

py

Sherman, D.M., Waite, T.D., 1985. Electronic spectra of Fe3+ oxides andoxyhydroxides in the near infrared to ultraviolet. American Mineralogist 70,1262–1269.

Stace, H.T.C., Hubble, G.D., Brewer, R., Northcote, K.H., Sleeman, J.R.,Mulcahy, M.J., Hallsworth, E.G., 1972. A Handbook of Australian Soils.Rellim Glenside, South Australia.

Stenberg, B., Nordkvist, E., Salomonsson, L., 1995. Use of near infraredreflectance spectra of soils for objective selection of samples. Soil Science159, 109–114.

Viscarra Rossel, R.A., 2005. ParLeS v2.1a. Shareware for spectroscopy andchemometrics. Australian Centre for Precision Agriculture, McMillanbuilding A05, The University of Sydney. At: www.usyd.edu.au/su/agric/acpa/people/rvrossel/soft01.htm.

Viscarra Rossel, R.A., McBratney, A.B., 1998. Laboratory evaluation of aproximal sensing technique for simultaneous measurement of soil clay andwater content. Geoderma 85, 19–39.

Viscarra Rossel, R.A., Minasny, B., Roudier, P., McBratney, A.B., 2006a.Colour space models for soil science. Geoderma 133, 320–337.

Viscarra Rossel, R.A., Walvoort, D.J.J., McBratney, A.B., Janik, L.J.,Skjemstad, J.O., 2006b. Visible, near-infrared, mid-infrared or combineddiffuse eflectance spectroscopy for simultaneous assessment of various soilproperties. Geoderma 131, 59–75.

Whittig, L.D., Allardice, W.R., 1986. X-ray diffraction techniques. In: Klute, A.(Ed.), Methods of Soil Analysis. Part 1. ASA and SSSA, Madison, WI.

Williams, P.C., 1987. Variables affecting near-infrared reflectance spectroscopicanalysis. In: Williams, P., Norris, K. (Eds.), Near-infrared Technology in theAgricultural and Food Industries. American Association of Cereal Chemists,St. Paul, MN, pp. 143–166.

Wold, S., Martens, H., Wold, H., 1983. The multivariate calibration method inchemistry solved by the PLS method. In: Ruhe, A., Kagstrom, B. (Eds.),Proc. Conf. Matrix Pencils, Lecture Notes in Mathematics. Springer–Verlag,Heidelberg, pp. 286–293.

Wehrens, R., Putter, H., Buydens, L.M.C., 2000. The bootstrap: a tutorial.Chemometrics and Intelligent Laboratory Systems 54, 35–52.

82 R.A. Viscarra Rossel et al. / Geoderma 137 (2006) 70–82