Analytica Chimica Acta 446 (2001) 329–343

Source apportionment of gasoline and diesel by multivariate calibration based on single particle mass spectral data

Xin-Hua Song a, Nicolaas (Klaas) M. Faber a, Philip K. Hopke a,∗, David T. Suess b, Kimberly A. Prather b, James J. Schauer c, Glen R. Cass d

a Department of Chemical Engineering, Clarkson University, Potsdam, NY 13699-5705, USA

b Department of Chemistry, University of California, Riverside, CA 92521, USA

c Department of Civil and Environmental Engineering, University of Wisconsin at Madison, Madison, WI 53706, USA

d School of Earth and Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA

Received 7 November 2000; accepted 25 June 2001

Abstract

The mass apportionment of gasoline and diesel particles in ambient aerosol samples is a difficult problem because both sources exhibit very similar chemical composition. However, individual particle analysis could provide additional information and help achieve source apportionment with good accuracy. Aerosol time-of-flight mass spectrometry (ATOFMS) has proven to be a powerful technique capable of simultaneously determining both the size and chemical composition of single particles in real time. Thus, samples of gasoline and diesel particles were analyzed by ATOFMS for their single particle information. In addition to the aerodynamic diameter from which the individual particle mass can be estimated, positive and negative mass spectra were obtained for each particle. A novel data analysis approach based on the combination of an adaptive resonance theory-based neural network (ART-2a), and a multivariate calibration method, partial least squares (PLS), has been developed to apportion the mass contributions of gasoline and diesel sources to mixture samples. The ART-2a neural network was used first to classify the particle-by-particle mass spectral data. The source profile for each source (gasoline/diesel) was obtained in terms of the mass fractions of the classified particle types. Next, PLS was applied to build a model relating the mass fractions of different particle classes and the mass contributions of the two sources to mixture samples. Artificial mixture samples obtained by randomly mixing some particles from the two source samples have been used to examine the feasibility of the proposed method. Satisfactory predictions for the mass contributions of gasoline and diesel exhaust to the mixture samples have been obtained. A recently proposed formula for prediction error variance is successfully modified to quantify the uncertainty in the PLS predictions. This study exemplifies the potential promise of multivariate calibration as applied to the aerosol source apportionment problem. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Chemometrics; Aerosol source apportionment; Aerosol source characterization; Time-of-flight mass spectrometry; Artificial neural network; Partial least squares; Prediction interval estimation

∗ Corresponding author. Tel.: +1-315-268-3861; fax: +1-315-268-6654. E-mail address: [email protected] (P.K. Hopke).

1. Introduction

There have been ever-increasing studies about the sources and effects of atmospheric particulate matter since Dockery et al. [1] published their study linking aerosol particles to human morbidity and mortality.

0003-2670/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0003-2670(01)01270-3


Particulate matter emissions from motor vehicles are among the major contributors to fine particle concentrations in the urban atmosphere [2]. A number of studies have been performed to analyze the chemical composition of particles from vehicular emissions. Early studies made use of the collection of particles on filters, with subsequent chemical analysis by, for example, gas chromatography-mass spectrometry [3]. Recently, the ability to determine the nature of individual particles that are sampled and analyzed on-line has been greatly improved through the development of real time single particle mass spectrometry techniques, such as aerosol time-of-flight mass spectrometry (ATOFMS) [4–6]. ATOFMS is capable of measuring the size and chemical composition of individual aerosol particles in real time. It has been successfully used for the on-line characterization of individual particles from automobile emissions [7,8]. Single particle source characterization studies are needed to determine the ATOFMS qualitative chemical signatures of specific aerosol sources for the application to ambient single particle data. Although the ATOFMS system produces a great deal of useful information regarding individual particles, it cannot currently provide a complete quantitative estimate of the composition of airborne particulate matter. The goal of the present study is to demonstrate the application of a multivariate calibration method to ATOFMS particle data, thus providing the link between single particle characteristics and quantitative aerosol compositions.

Being able to accurately estimate the mass contributions of similar emission sources to ambient particulate matter samples is of great importance in air quality management [9,10]. In this study, the source apportionment of two very similar sources, namely gasoline- and diesel-powered vehicles, is demonstrated. Generally, this is a difficult problem because of the similarity between the chemical signatures from these two aerosol sources. However, using single particle mass spectral data, it may be possible to differentiate these sources in complex ambient data sets.

Then, the question arises as to how to describe the chemical and physical properties (source profiles) of the two different sources (gasoline and diesel) based on the particle-by-particle mass spectral data obtained through ATOFMS. For this purpose, one of the most promising pattern classification techniques, the adaptive resonance theory-based neural network (ART-2a) [11–14], has been applied here. The ART-2a network appears to be a very useful tool for unsupervised pattern classification problems. For example, it has been successfully applied to airborne particle classification in environmental monitoring [15] and to the rapid sorting of post-consumer plastic [16]. Hopke and Song [17] studied the usefulness of ART-2a for the classification of single particles collected in a photographic printer cabinet. More recently, ART-2a has been reported to infer possible particle emission sources and distinguish among single particle types from ATOFMS data acquired during a 1996 field study in southern California [18].

In the present study, the ART-2a neural network is used to partition the particles from the gasoline and diesel source samples into different homogeneous classes based on the particle mass spectral data obtained by the ATOFMS analysis. The ability to adaptively detect outliers and new classes through the resonance mechanism is a typical feature of this type of neural network. Thus, ART-2a provides a dynamic classification system. After different particle classes are produced by ART-2a, their mass fractions are calculated. In this way, the source profiles for the two vehicular emission sources (gasoline and diesel) can be obtained in terms of the mass fractions for different particle types. The trained ART-2a weight matrix is saved so that it can later be applied to simulated ATOFMS mixture data.

To explore the feasibility of apportioning the mass contributions of the two emission sources to ambient mixture samples, partial least squares (PLS) has been applied. The theoretical and practical aspects of PLS have been extensively studied and reported over the years [19–25]. To examine the potential of PLS in the source apportionment of gasoline and diesel exhaust particles to mixture samples, calibration and prediction samples are required for model building and testing. As determined earlier [10], artificial ambient samples can be generated by randomly selecting particles from the source samples. That is, each mixture sample is composed of particles that are randomly selected from the two source samples. Then the true corresponding mass contributions of the two sources to the mixture sample are calculated. For these mixture samples, the mass fractions corresponding to different particle types can be obtained by the trained and saved ART-2a classifier in a manner identical to the source samples. Suppose these mass fractions of different particle classes serve as the X-block, and the fractions of mass contributions of the two sources to the mixture samples are the Y-block. Then, PLS builds the relationship between the X- and Y-block. The suitability of PLS is anticipated from its relative insensitivity to collinearity in the X-block. This study attempts to demonstrate the potential promise of the combination of ART-2a neural network and PLS modeling in aerosol source apportionment.

2. Data description

A total of 10 in-use light duty gasoline vehicles and one in-use light duty diesel truck were tested on a dynamometer in July 1998 at the California Air Resources Board Haagen-Smit Laboratory in El Monte, CA. The vehicles were tested under the Federal Test Procedure (FTP) to mimic urban driving conditions. Vehicular particulate emissions were sampled by ATOFMS after passing through a dilution sampler used to simulate proper ambient cooling and dilution [26–28]. The data analyzed herein are the single particle mass spectra acquired using a transportable ATOFMS during this vehicular particulate emission source characterization experiment. For a more detailed description of this specific experiment, see Suess et al. [8].

Single particle mass spectrometry is described in more detail by Suess and Prather [29]. The detailed operation of a transportable ATOFMS is given by Gard et al. [30]. Briefly, an aerosol is introduced into the ATOFMS through a converging nozzle. The nozzle is used to accelerate the particles to different terminal velocities depending on the particle's aerodynamic diameter. The particle beam is then collimated and introduced into the sizing region of the instrument. Each particle's aerodynamic diameter is determined from the time it takes to traverse the known distance between the beams from two 532 nm diode lasers. The particle is then introduced into the ion source region of the mass spectrometer. A Nd:YAG laser operating at 266 nm is used to desorb and ionize the material from the particle, producing both positively and negatively charged ions. These ions are analyzed by a dual polarity reflectron time-of-flight mass spectrometer and detected by two microchannel plate detectors. Each particle was characterized by both positive (350 mass-to-charge ratios) and negative (also 350 mass-to-charge ratios) mass spectra. A total of 1114 particles were collected and used in this study, with 744 particles from gasoline and 370 from diesel engine exhaust. Each pair of particle spectra was stored as one row in the data matrix, with the value at the nth column representing the area under any peak within 0.5 Da of n. In addition, the aerodynamic diameter was measured for each particle, where the aerodynamic diameter of a particle is the diameter of a unit density sphere that moves in an aerodynamically identical manner to the particle of interest. The individual particle mass values were then estimated from the aerodynamic diameter.
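The mass estimate mentioned above can be illustrated with a minimal sketch. It assumes a spherical particle of unit density (1 g cm^-3), in line with the definition of the aerodynamic diameter; the function name, the optional density argument, and the units (diameter in micrometres, mass in picograms) are illustrative assumptions and are not specified in the text.

```python
import math

# Assumption: unit-density sphere, as in the definition of aerodynamic diameter.
UNIT_DENSITY_G_PER_CM3 = 1.0


def particle_mass_pg(aerodynamic_diameter_um, density_g_per_cm3=UNIT_DENSITY_G_PER_CM3):
    """Estimate the mass (in pg) of a sphere with the given diameter (in um).

    Hypothetical helper: 1 g/cm^3 equals 1 pg/um^3, so the density value
    can be multiplied directly by the volume in cubic micrometres.
    """
    volume_um3 = math.pi / 6.0 * aerodynamic_diameter_um ** 3
    return density_g_per_cm3 * volume_um3
```

Because the mass scales with the cube of the diameter, doubling the diameter increases the estimated mass eightfold, which is why a few large particles can dominate the mass fractions used later.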

3. Theory and algorithm

Apportioning the mass contributions of gasoline and diesel to the mixture samples is based on the combination of ART-2a neural network and PLS modeling. The ART-2a neural network is used to classify the particles from the sources into different homogeneous particle types based on the mass spectra. This step yields the profile for each source (gasoline/diesel) in terms of the mass fractions of the classified particle types. Finally, PLS is used to build a model describing the relationship between the mass fractions of different particle classes and the mass contributions of the emission sources to mixture samples. The flow chart of the approach is shown in Fig. 1.

3.1. Adaptive resonance theory-based neural network (ART-2a)

Adaptive resonance theory (ART) was originally introduced as a mathematical model of the fundamental behavioral functions of the biological brain, such as learning, neglecting, parallel and distributed information storage, short-term and long-term memory and pattern recognition. The aim of ART-based models is to explain how a biological brain is able to recognize an unexpected event for what it is, that is, as belonging or not belonging to its existing knowledge base. Moreover, ART-based models try to describe the ability of the brain to expand its knowledge by learning deviating unknowns without disturbing or destroying stored knowledge. ART-2a has been chosen for this study because its algorithm is mathematically simple and computationally inexpensive as compared with other types of ART-based neural networks.

Fig. 1. Flow chart for the proposed methodology.

The essence of the ART-2a algorithm is the dynamic formation of a weight matrix $\mathbf{W}$ (dimension $m \times k$), where $m$ is the length of the input feature vector and $k$ is the number of detected classes in the data. Thus, each column of this weight matrix $\mathbf{W}$ gives the measure of the future centroid of a class. The class size is controlled by a vigilance parameter called $\rho_{\max}$ that lies between 0 and 1. If an input vector $\mathbf{x}$ forms a smaller angle with a particular weight vector $\mathbf{w}$ than $\arccos(\rho_{\max})$, then $\mathbf{x}$ is in resonance with the class corresponding to the weight $\mathbf{w}$. Otherwise, a novel event has been detected and a new class weight vector is added to the previous weight matrix. After the network has been trained on the data set for a given number of cycles, the training process is stopped if the change in the weight matrix is very small, or if the class membership obtained for the particles in the data no longer changes. The detailed algorithm has been described elsewhere [13,17] and is not repeated here.
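The resonance test and weight update described above can be sketched as follows. This is a simplified illustration and not the full ART-2a algorithm of [13,17]: it omits the contrast enhancement and noise thresholding steps, and the function names, the default parameter values, and the particular update rule (a normalized blend of the old weight and the input) are assumptions made for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def art2a_like(samples, vigilance=0.60, learning_rate=0.05, n_epochs=10):
    """Simplified ART-2a-style clustering (illustrative sketch only).

    Returns (weights, labels): one unit-length weight vector per detected
    class, and the class index assigned to each sample in the last pass.
    """
    weights = []
    labels = [-1] * len(samples)
    for _ in range(n_epochs):
        new_labels = []
        for raw in samples:
            x = normalize(raw)
            # Cosine of the angle between x and every stored class weight.
            sims = [sum(a * b for a, b in zip(x, w)) for w in weights]
            best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
            if best >= 0 and sims[best] > vigilance:
                # Resonance: move the winning weight slightly towards x.
                w = weights[best]
                blended = [(1 - learning_rate) * wi + learning_rate * xi
                           for wi, xi in zip(w, x)]
                weights[best] = normalize(blended)
            else:
                # Novel event: start a new class centred on x.
                weights.append(x)
                best = len(weights) - 1
            new_labels.append(best)
        if new_labels == labels:
            break  # class membership no longer changes
        labels = new_labels
    return weights, labels
```

A larger vigilance value corresponds to a smaller admissible angle and therefore to more, tighter classes; a smaller value merges more particles into each class.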

3.2. Partial least squares (PLS)

The theory and properties of partial least squares (PLS) have been extensively studied and reported in the literature, hence only a brief description will be presented. The basic idea in PLS is to find latent variables or scores that have good predictive ability. To this end, PLS constructs linear combinations of the predictor variables ($\mathbf{X}$) that have maximum covariance with the predictand variables ($\mathbf{Y}$) subject to being orthogonal. In this application, the $\mathbf{X}$ variables are the mass fractions of the ART-2a classified particle types and the $\mathbf{Y}$ variables contain the mass contributions of gasoline and diesel exhaust particles, expressed as fractions. Because the sum of these fractions equals unity, it suffices to consider only one of them. In the remainder of the paper, the mass contribution of gasoline exhaust particles serves as the predictand variable. Consequently, the models are built using PLS1, which, as opposed to PLS2, deals with a single predictand variable.

The preceding considerations lead to the following model for the calibration samples:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} \qquad (1)$$

where $\mathbf{y}$ ($n_c \times 1$) is the vector of the predictand variable, $\mathbf{X}$ ($n_c \times k$) the matrix of predictor variables, $\mathbf{b}$ ($k \times 1$) the vector of regression coefficients, $\mathbf{e}$ ($n_c \times 1$) the vector of residuals, and $n_c$ is the size of the calibration set. The regression coefficient vector is estimated as

$$\mathbf{b} = \mathbf{X}^{+}\mathbf{y} \qquad (2)$$

where $\mathbf{X}^{+}$ symbolizes the generalized inverse of $\mathbf{X}$, provided by the PLS1 algorithm. To obtain the PLS model with the best predictive performance, one must determine the optimal number of PLS components. This is achieved by monitoring the average prediction error obtained for an independent set of validation mixture samples while building models with increasing number of PLS components. The average prediction error or root-mean squared error of validation (RMSEV) is calculated as

$$\mathrm{RMSEV} = \left( \frac{1}{n_v} \sum_{i=1}^{n_v} (\hat{y}_i - y_i)^2 \right)^{1/2} \qquad (3)$$

where $n_v$ is the number of validation samples, and $\hat{y}_i$ denotes the prediction of $y_i$. The optimal number of PLS components has been reached when adding components does not significantly improve RMSEV.
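To make Eqs. (1)–(3) concrete, the following is a minimal sketch of a PLS1 fit on mean-centered data together with the root-mean squared error of Eq. (3). It uses a textbook NIPALS-style PLS1 recursion, which is assumed here and is not necessarily the implementation used by the authors; predictions are obtained by deflating the new sample rather than by forming the generalized inverse explicitly. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model (NIPALS-style sketch) on mean-centered data."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # X residuals
    f = [yi - y_mean for yi in y]                              # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance with y.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(k)]
        w_norm = math.sqrt(dot(w, w))
        if w_norm < 1e-12:
            break  # nothing left to model
        w = [wj / w_norm for wj in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(k)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one sample by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def rmse(y_true, y_pred):
    """Root-mean squared error as in Eq. (3)."""
    n = len(y_true)
    return math.sqrt(sum((yh - yt) ** 2 for yt, yh in zip(y_true, y_pred)) / n)
```

In use, one would fit models with 1, 2, 3, ... components on the calibration set, evaluate `rmse` on the validation set for each, and stop adding components once the validation error no longer improves appreciably.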

The ultimate goal of calibration is prediction. Thus, once the PLS model is built from the calibration set data, it is used to predict the mass contribution of gasoline exhaust particles to new test mixture samples that have not been seen in the calibration process. The model performance measure is root mean squared error of prediction (RMSEP), which is calculated using Eq. (3) with $n_v$ replaced by $n_p$, the size of the prediction set.

3.3. PLS prediction intervals

A well-known adage in analytical chemistry states that an analytical result is incomplete without a measure of its uncertainty. Assessing the prediction uncertainty when using PLS as a calibration method has received considerable attention, mainly in the chemometrics literature [31–42]. Here we will use a modification of a recently proposed formula for prediction error variance [37] that has worked well in two near-infrared applications [43,44].

It is derived elsewhere [45] that the variance in the prediction error

$$\mathrm{PE}_i = \hat{y}_i - y_i \qquad (4)$$

can be approximated by

$$V(\mathrm{PE}_i) = \left( 1 + \frac{1}{n_c} + h_i \right) \mathrm{MSEC} - V_{\Delta y} \qquad (5)$$

where $h_i$ denotes the sample leverage, which measures the distance of the prediction sample to the calibration samples in multivariate $\mathbf{X}$-space, MSEC is the mean squared error of calibration, and $V_{\Delta y}$ is the variance of the measurement error in the $\mathbf{Y}$-values of the calibration samples. It is noted that Eq. (5) is a modification of Eq. (8) in Faber and Kowalski [37].

MSEC is estimated from the fit errors as

$$\mathrm{MSEC} = \frac{1}{n_c - A - 1} \sum_{i=1}^{n_c} (\hat{y}_i - y_i)^2 \qquad (6)$$

where $A$ denotes the selected optimal number of PLS components. It is observed that Eq. (6) differs from Eq. (3) with respect to the denominator. The reason for this is that fit errors will be smaller than prediction errors, because the samples have been used for constructing the model: the factor $n_c - A - 1$ is introduced to neutralize this effect. It can be interpreted to account for the loss of degrees of freedom due to an $A$-dimensional model fit to the mean-centered data. However, attributing one degree of freedom to each PLS component can be a rather crude approximation. Van der Voet [46] has detailed that the low-numbered PLS components extract more than one degree of freedom from the data. The implication for the current work is that MSEC may be significantly underestimated, thus leading to optimistic values for prediction error variance when using Eq. (5).
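As an illustration, Eqs. (5) and (6) translate into a few lines of code. The sketch below is hedged in two respects: the function names are hypothetical, and the leverage helper assumes the common definition based on orthogonal PLS scores (the sum over components of the squared score divided by the calibration sum of squares of that score), which the text does not spell out. The measurement error variance of the calibration Y-values is passed in as `v_delta_y` and defaults to zero.

```python
def msec(y_true, y_fit, n_components):
    """Mean squared error of calibration, Eq. (6), with nc - A - 1 degrees of freedom."""
    nc = len(y_true)
    dof = nc - n_components - 1
    if dof <= 0:
        raise ValueError("need more calibration samples than A + 1")
    return sum((yf - yt) ** 2 for yt, yf in zip(y_true, y_fit)) / dof


def leverage_from_scores(t_new, score_sums_of_squares):
    """Leverage of a new sample from its PLS scores.

    Assumption: scores are mutually orthogonal, so the leverage is the sum
    of t_a**2 / (t_a' t_a) over components; the mean-centering term 1/nc is
    kept separate, as in Eq. (5).
    """
    return sum(t * t / ss for t, ss in zip(t_new, score_sums_of_squares))


def prediction_error_variance(leverage, msec_value, nc, v_delta_y=0.0):
    """Approximate prediction error variance, Eq. (5)."""
    return (1.0 + 1.0 / nc + leverage) * msec_value - v_delta_y
```

For example, with 10 calibration samples, a leverage of 0.1 and MSEC = 0.02, the sketch gives a variance of (1 + 0.1 + 0.1) × 0.02 = 0.024 when the reference values are error free.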

A potential remedy is derived as follows. Ideally, the prediction error variance can be used to construct a $100(1-\alpha)\%$ prediction interval for the true predictand as

$$\hat{y}_i - t_{\alpha/2,\nu}\,\sigma(\mathrm{PE}_i) \le y_i \le \hat{y}_i + t_{\alpha/2,\nu}\,\sigma(\mathrm{PE}_i) \qquad (7)$$

where $t_{\alpha/2,\nu}$ is the critical value of a t-distribution with $\nu$ degrees of freedom, and $\sigma(\mathrm{PE}_i) = V(\mathrm{PE}_i)^{1/2}$ (the degrees of freedom can be calculated using the method detailed in Appendix VI of [37]). From Eq. (7), it follows that the random variable

$$t_i = \frac{\hat{y}_i - y_i}{\sigma(\hat{y}_i - y_i)} \qquad (8)$$

should be distributed as Student's t with standard deviation one and $\nu$ degrees of freedom. This condition can be verified by examining the distribution of the "t-values" obtained for the validation set. Underestimating the degrees of freedom will yield a standard deviation of the "t-values" that is systematically larger than 1. Thus, Eq. (5) is effectively corrected by multiplying it by the squared standard deviation of the "t-values". It is noted that this procedure is closely related to van der Voet's calculation of pseudo-degrees of freedom for PLS [46], since it merely corrects for inserting an inadequate number of degrees of freedom in Eq. (6).
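The check and correction described above can be sketched as follows. The function names are hypothetical, the critical t-value is assumed to be supplied by the caller (for instance from a table), and the use of the sample standard deviation for the "t-values" is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def t_values(y_true, y_pred, pe_variances):
    """Standardized prediction errors, Eq. (8)."""
    return [(yh - yt) / math.sqrt(v)
            for yt, yh, v in zip(y_true, y_pred, pe_variances)]


def variance_correction_factor(t_vals):
    """Squared standard deviation of the validation 't-values'.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return statistics.stdev(t_vals) ** 2


def prediction_interval(y_pred, pe_variance, t_crit, correction=1.0):
    """Prediction interval of Eq. (7), with an optional variance correction."""
    half_width = t_crit * math.sqrt(correction * pe_variance)
    return y_pred - half_width, y_pred + half_width
```

A correction factor above 1 widens the intervals, compensating for a variance estimate that was too optimistic; a factor close to 1 indicates that Eq. (5) was already adequate.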

4. Results and discussion

The positive and negative spectra were put together side by side for each particle. Thus, each particle was characterized as a 700-element vector. All of the particles (1114) from both source samples were classified by the ART-2a neural network. Before being input to the ART-2a, the mass spectral data were pre-treated in two ways. First, the data at a given mass-to-charge ratio were removed if no particle showed any intensity at that ratio. Next, the data were double scaled as follows. After scaling each particle vector to unit length, the data at each mass-to-charge ratio were normalized with respect to the maximum intensity seen in a peak of that particular mass-to-charge ratio. Alternative scaling schemes including logarithmic and square root transformations were evaluated and it was found that the double scaling approach provided better classification results.
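The two pre-treatment steps can be written down compactly. The sketch below assumes non-negative peak areas stored as a list of rows (one row per particle); the function name and the returned list of retained column indices are illustrative choices, not part of the original description.

```python
import math


def double_scale(spectra):
    """Drop empty m/z columns, then apply the two-step ("double") scaling.

    Assumes non-negative peak areas. Returns (scaled_rows, kept_columns).
    """
    n_cols = len(spectra[0])
    # Step 0: remove mass-to-charge columns with no intensity in any particle.
    keep = [j for j in range(n_cols) if any(row[j] != 0 for row in spectra)]
    reduced = [[row[j] for j in keep] for row in spectra]
    # Step 1: scale each particle vector to unit length.
    unit = []
    for row in reduced:
        norm = math.sqrt(sum(v * v for v in row))
        unit.append([v / norm for v in row] if norm > 0 else row[:])
    # Step 2: divide each column by its maximum over all particles.
    col_max = [max(row[j] for row in unit) for j in range(len(keep))]
    scaled = [[row[j] / col_max[j] for j in range(len(keep))] for row in unit]
    return scaled, keep
```

After the second step every retained column has a maximum of 1, so that weak but characteristic peaks are not swamped by a few intense ones.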

With the vigilance parameter $\rho_{\max} = 0.60$ and a learning rate of 0.05, the ART-2a network produced 26 major classes to which at least five particles were assigned. The network's trained weight matrix was saved for future use. The centroid spectra (both positive and negative) of the 26 major particle classes are shown in Fig. 2a and b. In several of the negative ion spectra, there are large unresolved peaks below m/z = 24. These peaks are due to the production of electrons that are unrelated to the particle compositions and could have been eliminated from the analysis. However, removing them would not alter the classification results, which are based on the vector of 700 m/z values of the combined positive and negative ion spectra.

The source profiles of the two sources (gasoline and diesel) were described as the mass fractions of these 26 particle classes (see Fig. 3). The 26 classes accounted for 83.6% of the total mass in the gasoline exhaust particle sample and 81.5% of the total particulate matter mass in the diesel exhaust particle sample. It can be seen that the two source profiles have quite different mass fractions for certain particle classes, indicating that it is possible to apportion the mass contributions of the two sources to mixture samples with good accuracy.
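A source profile of this kind is simply the share of the total particle mass that falls in each major class. The following minimal sketch assumes that each particle carries a class label and an estimated mass, and that particles outside the major classes are marked with a label outside the range 0 to n_classes − 1 (here −1); these conventions and the function name are assumptions for illustration.

```python
def mass_fraction_profile(class_labels, masses, n_classes):
    """Mass fraction of each major class relative to the total sample mass.

    Particles whose label is outside 0..n_classes-1 (e.g. -1 for minor
    classes) count towards the total mass but not towards any class, so
    the profile may sum to less than 1.
    """
    total = sum(masses)
    profile = [0.0] * n_classes
    for label, mass in zip(class_labels, masses):
        if 0 <= label < n_classes:
            profile[label] += mass / total
    return profile
```

With this convention the profile entries sum to the fraction of mass covered by the major classes, which is how a figure such as 83.6% for the gasoline sample would arise.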

To achieve the source apportionment, the PLS model has been used. Artificial mixture samples were generated by randomly selecting some particles from the two source collections and mixing them. The mass fractions for different particle classes were obtained through the trained ART-2a weight matrix in a manner identical to the source samples. Clearly, the fractions of mass contributions of the two sources to each of these mixture samples are known from the simulation (truth). The data were mean-centered and subjected to the PLS analysis. To examine the effect of the number of calibration samples necessary for good prediction accuracy, different numbers of calibration samples (20, 30, 50, 100, 200 and 500) were randomly generated. Two additional independent sets of mixtures were generated and used as validation samples (500) and prediction samples (1000), respectively. The PLS modeling was repeated 1000 times with different randomly generated calibration, validation and prediction mixture samples. It has been found that generally four PLS components were optimal for the model predictive performance. Fig. 4 shows the average RMSEC and RMSEV versus the number of PLS components over 1000 repetitions when 500 calibration samples were used: no further significant improvement in RMSEV was observed when more than four PLS components were employed in the model.
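The generation of one artificial mixture and its known gasoline mass fraction can be sketched as below. The text does not state how many particles were drawn per mixture or whether sampling was done with replacement; the sketch assumes sampling without replacement with caller-chosen counts, and represents each particle as a hypothetical (class label, mass) pair.

```python
import random


def make_mixture(gasoline, diesel, n_gas, n_diesel, rng):
    """Draw a random mixture and return (particles, true gasoline mass fraction).

    gasoline, diesel: lists of (class_label, mass) tuples for the two sources.
    rng: a random.Random instance, passed in so that runs are reproducible.
    Assumption: particles are sampled without replacement.
    """
    gas_part = rng.sample(gasoline, n_gas)
    diesel_part = rng.sample(diesel, n_diesel)
    mass_gas = sum(mass for _, mass in gas_part)
    mass_diesel = sum(mass for _, mass in diesel_part)
    return gas_part + diesel_part, mass_gas / (mass_gas + mass_diesel)
```

Repeating such draws with varying particle counts would yield calibration, validation and prediction sets whose true contributions span the range between pure diesel and pure gasoline.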

Fig. 5 shows the average RMSEP for the prediction samples over 1000 repetitions with different numbers of calibration samples. It can be seen that the PLS prediction performance improves with increasing number of calibration samples. However, the PLS prediction results are already quite satisfactory with 50 or 100 calibration samples.

Fig. 2. Class centroid mass spectra: (a) positive ions; and (b) negative ions. The inserts show the mass spectra for m/e values greater than 200 with an expanded vertical scale. Lack of an insert indicates that there are no identified peaks above m/e = 200.

Fig. 3. Mass fraction source profiles for gasoline and diesel exhaust particles.

Fig. 4. RMSEC and RMSEV versus the number of PLS components. The calibration set contains 500 samples and the error measures are obtained as the average over 1000 repetitions.

Fig. 5. RMSEP as a function of number of calibration samples. The error measure is obtained as the average over 1000 repetitions.

A useful visualization of prediction quality is to plot the predicted data against the truth: the data should fall on the principal diagonal in this plot. Fig. 6 shows the predicted mass contributions of gasoline exhaust particles to 1000 prediction mixture samples against the true mass contributions in one run of PLS modeling with 500 calibration samples. The correlation coefficient between the predicted and the true values is 0.95, indicating good prediction.

Fig. 6. Predicted mass contribution of gasoline exhaust particles versus the true mass contribution in one run of PLS modeling with 500 calibration samples.

Once it has been verified that stable models are obtained, attention is turned towards calculating the uncertainty in individual predictions. Close examination of Fig. 4 shows that RMSEC is systematically underestimated: with 500 calibration samples and only approximately four degrees of freedom involved, the curves for RMSEC and RMSEV should be almost indistinguishable in this plot. From the predictions obtained for the validation set one obtains the correction factor 1.05. Applying this correction factor to improve Eq. (5) yields excellent results for the prediction set: the standard deviations of the "t-values" are nicely centered around unity (see Fig. 7). It was observed that the distributions were wider for smaller numbers of calibration samples, but that the centers were consistently close to unity (not shown). A final remark with respect to the use of Eq. (5) seems to be in order. Since the fractions of mass contributions of the two sources to the mixture samples are known, $V_{\Delta y} = 0$ and Eq. (5) simplifies. In real applications, however, obtaining a dependable estimate for $V_{\Delta y}$ is crucial.

Fig. 7. Distribution of standard deviation of "t-values" calculated using Eq. (8) after correction for inadequate number of degrees of freedom. The models were constructed with 500 calibration samples and 1000 repetitions were used.

5. Conclusion

A promising approach based on the combination of ART-2a neural network and PLS calibration using the information about individual particles analyzed by ATOFMS has been developed to apportion the mass contributions of gasoline and diesel exhaust particles to mixture samples. The ART-2a neural network is first used to classify the particle-by-particle data to characterize the source profiles for different sources. Then PLS is applied to build the model between the mass fractions of different particle classes and the mass contributions. Artificial mixture samples obtained by randomly mixing some particles from the two source samples have been used to examine the proposed method. Satisfactory predictions for the mass contributions of gasoline and diesel exhaust particles to the mixture samples have been obtained. Further examinations with real ambient data sets will be pursued when appropriate data become available.

Acknowledgements

This work was supported by the State of California Air Resources Board through Contract 97-321, the Strategic Environmental Research and Development Program (SERDP) (FO8637-98-C-6011), and the National Renewable Energy Laboratory (NREL).

References

[1] D.W. Dockery, et al., N. Engl. J. Med. 329 (1993) 1753.
[2] B.J. Finlayson-Pitts, J.N. Pitts, Atmospheric Chemistry: Fundamentals and Experimental Techniques, Wiley, New York, 1986.
[3] R. Westerholm, K.E. Egeback, Environ. Health Perspect. 102 (Suppl. 4) (1994) 13.
[4] K.A. Prather, T. Nordmeyer, K. Salt, Anal. Chem. 66 (1994) 1403.
[5] K. Salt, C.A. Noble, K.A. Prather, Anal. Chem. 68 (1996) 230.
[6] D.P. Fergenson, D.-Y. Liu, P.J. Silva, K.A. Prather, Chemom. Intell. Lab. Syst. 37 (1997) 197.
[7] P.J. Silva, K.A. Prather, Environ. Sci. Technol. 31 (1997) 3074.
[8] D.T. Suess, D.S. Gross, P.J. Silva, K.A. Prather, J.J. Schauer, G.R. Cass, Environ. Sci. Technol., 2001, submitted for publication.
[9] P.K. Hopke, Receptor Modeling for Air Quality Management, Elsevier, Amsterdam, 1991.
[10] X.-H. Song, L. Hadjiiski, P.K. Hopke, L.L. Ashbaugh, O. Carvacho, G.S. Casuccio, S. Schlaegle, J. Air Waste Manage. Assoc. 49 (1999) 773.
[11] S. Grossberg, Biol. Cybernet. 23 (1976) 121.
[12] S. Grossberg, Biol. Cybernet. 23 (1976) 187.
[13] G.A. Carpenter, S. Grossberg, D.B. Rosen, Neural Networks 4 (1991) 493.
[14] D. Wienke, L. Buydens, Trends Anal. Chem. 14 (1995) 398.
[15] Y. Xie, P.K. Hopke, D. Wienke, Environ. Sci. Technol. 28 (1994) 1921.
[16] D. Wienke, W. van den Broek, W. Melssen, L. Buydens, R. Feldhoff, T. Kantimm, T. Huth-Fehre, L. Quick, F. Winter, K. Cammann, Anal. Chim. Acta 317 (1995) 1.
[17] P.K. Hopke, X.-H. Song, Anal. Chim. Acta 348 (1997) 375.
[18] X.-H. Song, P.K. Hopke, D.P. Fergenson, K.A. Prather, Anal. Chem. 71 (1999) 860.
[19] P. Geladi, B.R. Kowalski, Anal. Chim. Acta 185 (1986) 1.
[20] E.V. Thomas, D.M. Haaland, Anal. Chem. 62 (1990) 1091.
[21] H. Martens, T. Naes, Multivariate Calibration, Wiley, New York, 1991.
[22] A. Lorber, L.E. Wangen, B.R. Kowalski, J. Chemom. 1 (1987) 19.
[23] X.-H. Song, P.K. Hopke, M.A. Bruns, K. Graham, K. Scow, Environ. Sci. Technol. 33 (1999) 3524.
[24] L. Hadjiiski, P. Geladi, P.K. Hopke, Chemom. Intell. Lab. Syst. 49 (1999) 91.
[25] A. Lorber, B.R. Kowalski, J. Chemom. 2 (1988) 93.
[26] L.M. Hildemann, G.R. Cass, G.R. Markowski, Aerosol Sci. Technol. 10 (1989) 193.
[27] J.J. Schauer, M.J. Kleeman, G.R. Cass, B.R.T. Simoneit, Environ. Sci. Technol. 33 (1999) 1566.
[28] J.J. Schauer, M.J. Kleeman, G.R. Cass, B.R.T. Simoneit, Environ. Sci. Technol. 33 (1999) 1578.
[29] D.T. Suess, K.A. Prather, Chem. Rev. 99 (1999) 3007.
[30] E. Gard, J.E. Mayer, B.D. Morrical, T. Dienes, D.P. Fergenson, K.A. Prather, Anal. Chem. 69 (1997) 4083.
[31] A. Höskuldsson, J. Chemom. 2 (1988) 211.
[32] T.V. Karstang, J. Toft, O.M. Kvalheim, J. Chemom. 6 (1992) 177.
[33] A. Phatak, P.M. Reilly, A. Penlidis, Anal. Chim. Acta 277 (1993) 495.
[34] S. De Vries, C.J.F. Ter Braak, Chemom. Intell. Lab. Syst. 30 (1995) 239.
[35] K. Faber, B.R. Kowalski, Chemom. Intell. Lab. Syst. 34 (1996) 283.
[36] M.C. Denham, J. Chemom. 11 (1997) 39.
[37] K. Faber, B.R. Kowalski, J. Chemom. 11 (1997) 181.
[38] T. Morsing, C. Ekman, J. Chemom. 12 (1998) 295.
[39] M. Høy, K. Steen, H. Martens, Chemom. Intell. Lab. Syst. 44 (1998) 123.
[40] M.C. Denham, J. Chemom. 14 (2000) 351.
[41] N.M. Faber, J. Chemom. 14 (2000) 363.
[42] N.M. Faber, Chemom. Intell. Lab. Syst., in press.
[43] N.M. Faber, D.L. Duewer, S.J. Choquette, T.L. Green, S.N. Chesler, Anal. Chem. 70 (1998) 2972.
[44] R. Boqué, M.S. Larrechi, F.X. Rius, Chemom. Intell. Lab. Syst. 45 (1999) 397.
[45] N.M. Faber, X.-H. Song, P.K. Hopke, TRAC, in preparation.
[46] H. van der Voet, J. Chemom. 13 (1999) 195.