Issues When Modeling Benzene, Toluene, and Xylene Exposures Using a Literature Database

This article was downloaded by: [Stephen B. Thacker CDC Library] On: 03 October 2014, At: 05:43
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Occupational and Environmental Hygiene
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uoeh20

Issues When Modeling Benzene, Toluene, and Xylene Exposures Using a Literature Database
Misty J. Hein (a), Martha A. Waters (a), Edwin van Wijngaarden (b), James A. Deddens (a, c) & Patricia A. Stewart (d)
(a) National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention, Cincinnati, Ohio
(b) Department of Community and Preventive Medicine, University of Rochester, Rochester, New York
(c) Department of Mathematical Sciences, University of Cincinnati, Cincinnati, Ohio
(d) Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland
Published online: 04 Dec 2007.

To cite this article: Misty J. Hein, Martha A. Waters, Edwin van Wijngaarden, James A. Deddens & Patricia A. Stewart (2007) Issues When Modeling Benzene, Toluene, and Xylene Exposures Using a Literature Database, Journal of Occupational and Environmental Hygiene, 5:1, 36-47, DOI: 10.1080/15459620701763947

To link to this article: http://dx.doi.org/10.1080/15459620701763947

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Journal of Occupational and Environmental Hygiene, 5: 36–47
ISSN: 1545-9624 print / 1545-9632 online
DOI: 10.1080/15459620701763947

Issues When Modeling Benzene, Toluene, and Xylene Exposures Using a Literature Database

Misty J. Hein,1 Martha A. Waters,1 Edwin van Wijngaarden,2

James A. Deddens,1,3 and Patricia A. Stewart4

1National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention, Cincinnati, Ohio
2Department of Community and Preventive Medicine, University of Rochester, Rochester, New York
3Department of Mathematical Sciences, University of Cincinnati, Cincinnati, Ohio
4Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland

A database of benzene, toluene, and xylene measurements was compiled from an extensive literature review; the database contained information on several exposure determinants, including job type, operation, mechanism of release, process type, ventilation, temperature, distance from the source, quantity, and location. The database was used to develop statistical models for benzene, toluene, and xylene exposure as a function of operation and other workplace determinants. These models can be used to predict exposure levels for subjects enrolled in community-based case-control studies. This article presents the derived parameter estimates for specific operations and additional workplace exposure determinants and describes a number of statistical and data-limitation issues that are inherent in determinants modeling of historical published data.

[Supplementary materials are available for this article. Go to the publisher’s online edition of Journal of Occupational and Environmental Hygiene for the following free supplemental resource(s): a PDF file of QQ plots and a Word file with references used in the benzene/toluene/xylene exposure database.]

Keywords case control studies, exposure assessment, exposure determinants, occupational exposure

Address correspondence to: Misty J. Hein, Division of Surveillance, Hazard Evaluations and Field Studies, National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention, 4676 Columbia Parkway, Mail-Stop R-13, Cincinnati, OH 45226; e-mail: [email protected].

The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Institute for Occupational Safety and Health.

INTRODUCTION

Exposure assessment for population- or hospital-based case-control studies is a challenging task. Unlike in cohort or industry-based case-control studies, in population- or hospital-based case-control studies a large number of agents may have been identified a priori for investigation, participants have worked in a wide variety of jobs and industries, and exposure estimation almost always relies on questionnaires or interviews rather than direct measurements.(1,2) Methods of collecting information for occupational exposure estimation in population-based case-control studies have evolved over time. Traditionally, the collection methods focused on self-reports of working with an exposure agent and on work histories, which were used to estimate risks by industry and occupation.(3)

Methods have since evolved to job-specific or exposure-specific questionnaires, which have been used to collect more detailed information on exposure determinants.(2,4–8) Participants can usually provide job history information accurately,(2) and experts can reasonably assess exposure determinants from that information.(9)

Along with improvements in data collection, the development of exposure estimates in case-control studies progressed from using qualitative responses directly (e.g., ever/never reporting exposure to a chemical or holding a particular type of job, such as farmer) to matching reported industries or occupations from work histories to a job exposure matrix (JEM).(10,11)

Because these methods have major limitations, subject-specific information beyond industry and occupation has more recently been used to adjust or refine estimates initially based on JEMs.(9,12–16) Estimates developed from JEMs or by experts, however, may not be comparable across studies because exposure assessors may use different evaluation criteria and methods to estimate exposure levels.

Furthermore, the lack of published details on the exposure estimation process makes it difficult to evaluate the credibility of the exposure assessment. These details include, but are not limited to: (a) the dimensions of exposure incorporated into the estimates (e.g., intensity, frequency, duration, and/or intermittency); (b) methods for adjusting JEM estimates based on exposure modifier information (e.g., use of protective equipment or controls); (c) completeness of information on exposure levels and exposure modifiers across subjects, jobs, or operations; (d) assumptions made when data are missing; (e) the methods by which, and the extent to which, published exposure data are used to calibrate the assessor and derive exposure estimates; and (f) how temporal changes in exposures are incorporated into estimates.

36 Journal of Occupational and Environmental Hygiene January 2008

The exposure information collected from work sites in cohort studies is usually directly applicable to the study subjects. In case-control studies, descriptive exposure information comes from the subject, and measurement data from the work sites reported by study subjects are not readily available. However, the methods used in cohort studies may be applied, with modification, to the case-control design. Published exposure measurement data and associated exposure determinants can be used to develop a model, which can then be applied to the exposure determinant information reported by the study subjects, or assigned by industrial hygienists to the study subjects’ jobs, to estimate the subjects’ exposure levels.

A database of benzene exposure levels and associated determinants was developed by van Wijngaarden and Stewart(17) for a community-based case-control study investigating the relationship between childhood brain cancer and parental occupation.(16,18) The database, constructed from information contained in published references, was expanded to include toluene and xylene measurements and associated determinants. One objective of the present work was to describe the development of statistical prediction models, built from the measurement data and the determinants, that can be used to systematically estimate quantitative levels of benzene, toluene, and xylene exposure. A second objective was to describe the decisions made in developing the models and the advantages and limitations of an exposure modeling approach for case-control studies. The models described here will be used to predict the exposure levels of study subjects from determinants assigned to the subjects by an industrial hygienist. The determinants will be derived from work history information (including, but not limited to, type of business, job title, and job tasks) reported by the study subjects.

METHODS

Exposure Databases

The benzene database was the product of an extensive literature review of studies conducted in North America in the 1980s and 1990s that identified the uses, occurrences, and exposure levels of benzene in a variety of industries.(17) Data from before the 1980s were not collected because they were of limited relevance to the study for which the database was developed. Databases for toluene and xylene were constructed in a similar manner. The publications provided measurements of one or more of the chemicals of interest from the years 1979–2001 (Table I). The data comprised primarily long-term (60 min or longer) personal air samples; however, where the published reports did not present data from long-term personal air samples, observations derived from short- and long-term area air samples and from short-term personal air samples were included. The area air samples included in the database were general room area samples, not source samples.

The publications containing the measurement data sometimes reported individual measurements and sometimes reported summary measures incorporating two or more observations (median = 11 observations). Because there were too few instances of either individual results or summary results alone, both types of reports were used for model development. Some summary measures comprised both personal and area long-term samples and are henceforth denoted as “mixed” samples. Summary measures were usually arithmetic means; however, some publications provided only a geometric mean (GM) and geometric standard deviation (GSD), only a GM, or only a range. Rather than lose summary measures that were not arithmetic means, the authors elected to retain these observations by using the reported information to estimate the arithmetic mean. When both the GM and GSD were provided, a lognormal distribution was assumed and

$$\mathrm{AM} = \mathrm{GM} \times \exp\left[\tfrac{1}{2}\left(\ln \mathrm{GSD}\right)^2\right] \qquad (1)$$

was used to provide an estimate of the arithmetic mean.(19)
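Equation (1) is the standard lognormal identity linking the arithmetic mean to the GM and GSD. The Python sketch below applies it; the function name `am_from_gm_gsd` is illustrative only and is not taken from the authors' SAS programs.

```python
import math


def am_from_gm_gsd(gm: float, gsd: float) -> float:
    """Arithmetic mean of a lognormal distribution from its GM and GSD (Equation 1)."""
    if gm <= 0 or gsd < 1:
        raise ValueError("GM must be positive and GSD must be at least 1")
    # AM = GM * exp(0.5 * (ln GSD)^2)
    return gm * math.exp(0.5 * math.log(gsd) ** 2)


# Usage with made-up numbers: a GM of 2.0 ppm and a GSD of 2.5.
am_example = am_from_gm_gsd(2.0, 2.5)
```

For a GM reported without a GSD, the assumed GSD of 3.5 described in the text would simply be passed as the second argument.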

If the GM was provided but not the GSD, the GSD was assumed to be 3.5 and a similar conversion was made. This value, although higher than what has often been observed,(20) was selected because many of the measurement data were collected across different jobs and work sites that probably were not homogeneous. If only the range was provided, the arithmetic mean was estimated by assuming a lognormal distribution according to the following algorithm: first, the midpoint of the log-transformed minimum and maximum levels provided an estimate of the mean of the log-transformed levels ($\mu_L$); second, the range of the log-transformed levels divided by four provided an estimate of the standard deviation of the log-transformed levels ($\sigma_L$); and finally

$$\mathrm{AM} = \exp\left(\mu_L + \tfrac{1}{2}\sigma_L^2\right) \qquad (2)$$

provided an estimate of the arithmetic mean. In the following, the term “reported level” refers to both individual measurements and reported or estimated arithmetic means.
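The range-based algorithm and Equation (2) can be sketched in Python as follows. The function name `am_from_range` is hypothetical. Dividing the log range by four presumably treats the reported minimum and maximum as lying about two standard deviations on either side of the log-scale mean; that reading is an interpretation, not a statement from the authors.

```python
import math


def am_from_range(minimum: float, maximum: float) -> float:
    """Arithmetic mean estimated from a reported range, assuming lognormality.

    mu_L    = midpoint of ln(minimum) and ln(maximum)
    sigma_L = (ln(maximum) - ln(minimum)) / 4
    AM      = exp(mu_L + sigma_L**2 / 2)        (Equation 2)
    """
    if not 0 < minimum <= maximum:
        raise ValueError("need 0 < minimum <= maximum")
    log_min, log_max = math.log(minimum), math.log(maximum)
    mu_l = (log_min + log_max) / 2
    sigma_l = (log_max - log_min) / 4
    return math.exp(mu_l + 0.5 * sigma_l ** 2)


# Usage with a made-up reported range of 0.1 to 10 ppm.
am_range_example = am_from_range(0.1, 10.0)
```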

The reported levels, which were based on varying numbers of observations, were the outcome variables for the modeling exercise, and the specific exposure determinants were the predictor variables. When modeling a statistic based on different numbers of observations, it is appropriate to weight each observation by a weight proportional to the inverse of the variance of the statistic. Most publications that provided an arithmetic mean did not provide the standard deviation; therefore, a logical alternative weight (i.e., the sample size associated with the reported level) was used. In the few instances where summary reported levels were provided without a corresponding sample size, the magnitude of the sample size (i.e., 1 or 10) was estimated from information provided in the published report. Information such as the purpose of the sample collection, the extent of measurements for other agents, the time span over which the measurements were collected, and nonquantitative comments in the published report that suggested the scale of the measurement collection effort were all used to estimate the magnitude of the sample size.

TABLE I. Characteristics of the Benzene, Toluene, and Xylene Exposure Database

| Characteristic | Benzene | Toluene | Xylene |
|---|---|---|---|
| Publications/reported levels^A | 56/454 | 82/596 | 84/587 |
| Levels excluded based on publication year | −50 | −18 | −20 |
| Levels excluded based on unknown type (personal or area) | −2 | −1 | −6 |
| Levels excluded based on unknown determinant(s) | −22 | −3 | −1 |
| Levels excluded for other reasons | −1 | 0 | 0 |
| Publications/reported levels available for modeling | 47/379 | 78/574 | 78/560 |
| Sample duration/type | | | |
| Long-term/personal | 284 (75%) | 473 (82%) | 455 (81%) |
| Long-term/area | 21 (6%) | 11 (2%) | 5 (<1%) |
| Long-term/mixed^B | 29 (8%) | 41 (7%) | 44 (8%) |
| Short-term/personal | 42 (11%) | 49 (9%) | 56 (10%) |
| Short-term/area | 3 (<1%) | 0 (0%) | 0 (0%) |
| Sample size source | | | |
| N provided in the publication | 339 (89%) | 557 (97%) | 545 (97%) |
| N estimated (not directly provided in the publication) | 40 (11%) | 17 (3%) | 15 (3%) |
| Measurements | | | |
| Individual (i.e., N = 1) | 136 (36%) | 439 (76%) | 424 (76%) |
| Summary measures (i.e., N > 1) | 243 (64%) | 135 (24%) | 136 (24%) |
| Measurement level | | | |
| Reported as non-detect^C | 5 (1.3%) | 24 (4.2%) | 51 (9.1%) |
| Reported as < LOD, LOD provided | 28 (7.4%) | 17 (3.0%) | 17 (3.0%) |
| Reported as < LOD, LOD not provided^C | 16 (4.2%) | 14 (2.4%) | 15 (2.7%) |
| Reported as level or AM | 320 (84.4%) | 517 (90.1%) | 473 (84.5%) |
| Based on GM, GSD | 0 (0%) | 0 (0%) | 1 (0.2%) |
| Based on GM only | 5 (1.3%) | 0 (0%) | 0 (0%) |
| Based on range | 5 (1.3%) | 2 (0.4%) | 3 (0.5%) |

^A The term “reported level” refers to both individual measurements and summary measures.
^B The mixed sample type denotes summary measures composed of both personal and area samples.
^C Levels reported as non-detect, or as below the limit of detection (LOD) where the LOD was not provided, were assigned a level of 0.05/√2, 0.14/√2, and 0.07/√2 mg/m³ for benzene, toluene, and xylene, respectively.

Results reported as below the limit of detection (LOD), nondetectable, or zero were replaced with the LOD divided by the square root of two if the LOD was provided in the publication, or with an assigned LOD divided by the square root of two if it was not. Most of the reported levels that received an assigned LOD were for long-term charcoal tube air samples collected by the National Institute for Occupational Safety and Health (NIOSH) for Health Hazard Evaluations (HHEs) in the 1980s. Assigned LODs were based on method LODs from NIOSH Manual of Analytical Methods (NMAM) Method 1501 for aromatic hydrocarbons (0.5, 0.7, and 0.8 µg/sample for benzene, toluene, and xylene, respectively) and typical sample volumes (10, 5, and 10 L for benzene, toluene, and xylene, respectively), which resulted in LODs of 0.05, 0.14, and 0.07 mg/m³ for benzene, toluene, and xylene, respectively.(21) Reported levels in mg/m³ were converted to ppm using standard conversion factors derived at 25°C and 1 atmosphere of pressure (1 ppm = 3.19 mg/m³ benzene, 1 ppm = 3.77 mg/m³ toluene, and 1 ppm = 4.34 mg/m³ xylene).(22)
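A minimal sketch of the censoring substitution and unit conversion described above, assuming a hypothetical record convention in which a censored result (non-detect, below the LOD, or zero) is passed as `None` or 0. The assigned LODs and conversion factors are the values quoted in the text; the helper names are illustrative.

```python
import math

# Assigned LODs (mg/m3) for publications that gave no LOD, as stated in the text.
ASSIGNED_LOD_MG_M3 = {"benzene": 0.05, "toluene": 0.14, "xylene": 0.07}

# mg/m3 equivalent to 1 ppm at 25 degrees C and 1 atm, as stated in the text.
MG_M3_PER_PPM = {"benzene": 3.19, "toluene": 3.77, "xylene": 4.34}


def substitute_nondetect(chemical, level_mg_m3=None, reported_lod_mg_m3=None):
    """Return a level in mg/m3, replacing censored results with LOD / sqrt(2).

    Sketch convention (an assumption): level_mg_m3 is None or 0 for a result
    reported as non-detect, below the LOD, or zero.
    """
    if level_mg_m3 is not None and level_mg_m3 > 0:
        return level_mg_m3  # quantified result: keep as reported
    if reported_lod_mg_m3 is not None:
        lod = reported_lod_mg_m3  # LOD given in the publication
    else:
        lod = ASSIGNED_LOD_MG_M3[chemical]  # NMAM 1501-based assigned LOD
    return lod / math.sqrt(2)


def mg_m3_to_ppm(chemical, level_mg_m3):
    """Convert mg/m3 to ppm with the fixed factors above."""
    return level_mg_m3 / MG_M3_PER_PPM[chemical]


# A benzene non-detect from a report with no LOD: 0.05 / sqrt(2) mg/m3, then to ppm.
benzene_ppm = mg_m3_to_ppm("benzene", substitute_nondetect("benzene"))
```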

An industrial hygienist characterized each reported level according to the operation being measured (brushing, chemical reaction, cleaning, coating, combustion, dipping, drying/off-gassing, gluing, laboratory work, loading and unloading, mixing, repair, rolling, separation, spraying, spreading, stacking, tank entry, and wiping). The additional workplace determinants evaluated were the type of process and ventilation (closed process, closed process with local exhaust ventilation (LEV) at points where the process was opened, open process with LEV, and open process with inadequate LEV); temperature (elevated above room temperature and room temperature); distance from the source (1.8 m or more from the source and less than 1.8 m from the source); quantity (less than 380 L/month, 380–3800 L/month, and greater than 3800 L/month); and location (indoor, enclosed [i.e., a confined space], and outdoor). The distance of 1.8 m was selected to represent an extended breathing zone, incorporating factors such as worker movement to perform manual work (e.g., laboratory work, brush painting) and mixing of room air with breathing zone air. In general, little information on job titles was provided in the literature. Measurement levels were not used by the coder in assigning the determinants.
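Determinants were assigned by an industrial hygienist, and the text does not describe any automated coding. Where a publication did report a numeric quantity or distance, the category boundaries above could be applied mechanically, as in these hypothetical Python helpers. Treating exactly 380 and 3800 L/month as part of the middle class is an assumption, since the text does not say how boundary values were handled.

```python
def quantity_category(litres_per_month: float) -> str:
    """Monthly quantity class; 380 and 3800 L/month fall in the middle class (assumed)."""
    if litres_per_month < 380:
        return "<380 L/month"
    if litres_per_month <= 3800:
        return "380-3800 L/month"
    return ">3800 L/month"


def distance_category(metres_from_source: float) -> str:
    """1.8 m is the extended-breathing-zone cut point."""
    return ">=1.8 m" if metres_from_source >= 1.8 else "<1.8 m"


# Example coding of one hypothetical reported level.
record = {
    "operation": "spraying",
    "quantity": quantity_category(1200),
    "distance": distance_category(0.5),
}
```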

When developing the models, some operations were combined based on similarity to eliminate operations with fewer than five observations (e.g., in the benzene models, the rolling and coating operations were combined, as were the dipping and mixing operations). An exception was the combustion operation, which was included in the modeling but not combined with any other operation even though there were only four reported levels, because the exothermic nature of combustion differed from that of the other operations. Indeed, the mean exposure level for combustion was substantially lower than the mean exposure levels for all the other operations (results not shown).

The indoor and enclosed locations were combined because most of the reported levels for the enclosed location were for the tank entry operation (e.g., 19 of 21 enclosed benzene results were tank entry) and most of the reported levels for the tank entry operation were classified as enclosed (e.g., 19 of 22 tank entry benzene results were enclosed). The mechanism of release (e.g., displacement, aerosol, agitation, evaporation, and miscellaneous) was evaluated and considered as an alternative to operation. When the year of sampling was not mentioned in the report, the publication year was used as a surrogate.
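The category merging and the year surrogate rule can be expressed as simple lookups. This is an illustrative Python sketch: the merge maps contain only the examples named in the text, the function names are hypothetical, and `sparse_categories` merely flags categories that remain below the five-observation threshold, with combustion exempted as described above.

```python
from collections import Counter

# Illustrative merge maps built only from the examples in the text (benzene operations).
BENZENE_OPERATION_MERGES = {
    "rolling": "rolling/coating",
    "coating": "rolling/coating",
    "dipping": "dipping/mixing",
    "mixing": "dipping/mixing",
}
LOCATION_MERGES = {"indoor": "indoor/enclosed", "enclosed": "indoor/enclosed"}


def collapse(values, merges):
    """Map each category to its merged label; unlisted categories pass through."""
    return [merges.get(v, v) for v in values]


def sparse_categories(values, min_count=5, exempt=("combustion",)):
    """Categories that still have fewer than min_count levels, ignoring exempt ones."""
    counts = Counter(values)
    return sorted(c for c, n in counts.items() if n < min_count and c not in exempt)


def analysis_year(sampling_year, publication_year):
    """The publication year stands in when the report gives no sampling year."""
    return publication_year if sampling_year is None else sampling_year
```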

Exposure Modeling

All statistical analyses were performed using SAS 9 software (SAS Institute Inc., Cary, N.C.). The distributions of the reported exposure levels were highly skewed to the right. Although Shapiro-Wilk tests rejected log-normality (as expected given the large sample sizes), visual examination of the QQ plots indicated that the data were consistent with log-normality. Consequently, the natural logarithm of the reported level was used as the dependent variable in the modeling exercise.
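The QQ-plot check can be reproduced numerically by pairing the sorted log levels with standard normal quantiles and computing their correlation (a probability-plot correlation, where values near 1 support log-normality). The sketch below uses Blom plotting positions, which is an assumption since the text does not say which positions were used, and it runs on simulated data rather than the study database.

```python
import math
import random
from statistics import NormalDist, mean


def normal_qq_points(levels):
    """Pair standard-normal quantiles (Blom positions) with sorted ln(levels)."""
    logs = sorted(math.log(x) for x in levels)
    n = len(logs)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 1 - 0.375) / (n + 0.25)) for i in range(n)]
    return list(zip(theoretical, logs))


def qq_correlation(points):
    """Pearson correlation of the QQ pairs; values near 1 suggest lognormality."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Usage with simulated lognormal levels (not the study's data).
rng = random.Random(1)
simulated = [rng.lognormvariate(0.0, 1.2) for _ in range(200)]
points = normal_qq_points(simulated)
r_qq = qq_correlation(points)
```

Plotting `points` gives the QQ plot itself; the correlation is only a numeric summary of how straight it is.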

Independent variables included operation and the other determinants, in addition to sample type (area, mixed, and personal), sample duration (short and long term), and year of publication. Type and duration were included in the model to control for possible confounding associated with these sampling characteristics. Reference categories for duration and type of sample were selected so that the model intercept would correspond to a long-term personal sample. Year of publication was treated as a categorical variable (1979–1984, 1985–1989, 1990–1994, and 1995–2001) in all models. Determinants were entered into the model as dummy variables. For some operations, there was little variability in the other determinants; therefore, testing for interactions between operation and the determinants was not feasible. The general form of the model was

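$$
\begin{aligned}
\ln(y_i) = {} & \beta_0 + \beta_1(\text{operation} = \text{brushing}) + \cdots + \beta_{18}(\text{operation} = \text{tank entry}) \\
& + \beta_{19}(\text{process-ventilation} = \text{closed}) + \beta_{20}(\text{process-ventilation} = \text{closed with LEV}) \\
& + \beta_{21}(\text{process-ventilation} = \text{open with LEV}) + \beta_{22}(\text{temperature} = \text{above room temperature}) \\
& + \beta_{23}(\text{distance} = \text{1.8 m or longer}) + \beta_{24}(\text{quantity} = \text{less than 380 L/month}) \\
& + \beta_{25}(\text{quantity} = \text{more than 3800 L/month}) + \beta_{26}(\text{location} = \text{outdoors}) \\
& + \beta_{27}(\text{year} = \text{1979–1984}) + \beta_{28}(\text{year} = \text{1985–1989}) + \beta_{29}(\text{year} = \text{1990–1994}) \\
& + \beta_{30}(\text{type} = \text{area}) + \beta_{31}(\text{type} = \text{mixed}) + \beta_{32}(\text{duration} = \text{short term}) + \varepsilon_i
\end{aligned}
\qquad (3)
$$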

where $y_i$ was the $i$th reported level (in ppm); $\beta_0$ was the intercept; $\beta_1$–$\beta_{18}$ were the parameters for operation; $\beta_{19}$–$\beta_{26}$ were the parameters for the additional exposure determinants; $\beta_{27}$–$\beta_{32}$ were the parameters for publication year, sample type, and sample duration; and $\varepsilon_i$ was the random error term. The GLM procedure in SAS was used to estimate the model parameters. Observations were weighted by the sample size associated with the reported level using the WEIGHT statement.
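A weighted least-squares fit of Equation (3) can be sketched with NumPy. Scaling each row of the design matrix and each log level by the square root of its weight reduces the problem to ordinary least squares, which minimizes the same weighted sum of squared residuals that a WEIGHT statement in PROC GLM uses. The helper names and toy data are hypothetical; only one determinant (operation) is coded here, and further determinants would be appended as extra blocks of dummy columns.

```python
import numpy as np


def dummy_columns(values, reference):
    """One 0/1 column per non-reference level (reference-cell coding)."""
    levels = sorted(set(values) - {reference})
    matrix = np.array([[1.0 if v == lvl else 0.0 for lvl in levels] for v in values])
    return levels, matrix


def fit_weighted_log_model(X, levels_ppm, weights):
    """Weighted least squares for ln(y) = b0 + X @ b, weighting by sample size."""
    design = np.column_stack([np.ones(len(levels_ppm)), X])  # intercept = reference cell
    y = np.log(np.asarray(levels_ppm, dtype=float))
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Row-scaling by sqrt(w) turns weighted least squares into ordinary least squares.
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta


# Usage on a toy data set with made-up values (not the study's data).
operations = ["wiping", "spraying", "mixing", "wiping", "spraying", "mixing"]
op_levels, X_op = dummy_columns(operations, reference="wiping")
levels_ppm = [1.0, 8.0, 3.0, 1.2, 6.5, 2.5]
sample_sizes = [1, 10, 3, 2, 5, 1]
beta = fit_weighted_log_model(X_op, levels_ppm, sample_sizes)
predicted_ppm = np.exp(beta[0] + X_op @ beta[1:])  # back-transformed predictions
```

With only operation in the model, the fitted intercept equals the sample-size-weighted mean of ln(level) for the reference operation, which is a quick sanity check on the weighting.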

Effects were evaluated using the multiple coefficient of determination (i.e., R-squared); however, it is important to note that because all models incorporated weights, the value of R-squared does not have a clear-cut interpretation.(23)

Because the primary objective of the modeling exercise was prediction, the weighted mean square error (MSEw), calculated as the mean of the squared differences between the observed and predicted log-transformed reported levels (weighted by sample size), was also used to evaluate the models. Two modeling strategies were compared: the first was to fit a model containing all available determinants, and the second was to fit a reduced model selected by manual backward stepwise elimination. In the latter, operation, sample duration, and sample type were forced into the final model. The remaining workplace determinants (process-ventilation, temperature, distance, quantity, location, and categorical year of publication) were assessed for significance in the models containing all variables. Variables were removed one at a time, beginning with the variable with the largest p-value, until all remaining variables were significant at the 0.05 level.
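The two evaluation tools in this paragraph can be sketched in Python. `weighted_mse` assumes the weighted mean is normalized by the sum of the sample-size weights, which the text implies but does not state. `backward_eliminate` follows the manual routine described above: forced terms are never dropped, and the term with the largest p-value is removed one at a time until every remaining term has p < 0.05. `p_values_for` is a hypothetical callback that would refit the model and return one p-value per determinant (for a multi-level determinant, presumably an overall test of the whole variable).

```python
def weighted_mse(observed_log, predicted_log, weights):
    """MSEw: sample-size-weighted mean of squared log-scale residuals (sum-of-weights normalization assumed)."""
    total_w = sum(weights)
    return sum(w * (o - p) ** 2 for o, p, w in zip(observed_log, predicted_log, weights)) / total_w


def backward_eliminate(candidates, forced, p_values_for, alpha=0.05):
    """Manual backward stepwise elimination, removing one term per step.

    p_values_for(terms) refits the model with `terms` and returns {term: p_value}
    for the candidate (non-forced) terms; it is supplied by the caller.
    """
    kept = list(candidates)
    while kept:
        p = p_values_for(forced + kept)
        worst = max(kept, key=lambda t: p[t])
        if p[worst] < alpha:  # every remaining term is significant: stop
            break
        kept.remove(worst)  # drop the least significant term and refit
    return forced + kept


# Usage with a stub returning fixed, made-up p-values (a real version would refit the model).
FAKE_P = {"temperature": 0.40, "distance": 0.01, "quantity": 0.20, "location": 0.03}
selected = backward_eliminate(
    candidates=["temperature", "distance", "quantity", "location"],
    forced=["operation", "duration", "type"],
    p_values_for=lambda terms: {t: FAKE_P[t] for t in terms if t in FAKE_P},
)
```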

Predicted exposure levels in ppm were obtained from the models by exponentiating the fitted values. Because the outcome variable comprised both individual measurements and summary measures, and because a log transformation had to be applied to the data before modeling, the predicted exposure levels cannot be strictly interpreted as arithmetic means (as they could have been had the data not been log transformed) or as geometric means (as they could have been had all of the data consisted of individual measurements); rather, they should be thought of as generic measures of central tendency.

Predicted exposure levels were calculated for each operation in the database using the predominant (i.e., the most common) levels of the determinants associated with each operation. The Spearman correlation coefficient was used to determine whether the ranking of the operations based on the prediction models was correlated with the ranking of the operations based on the unadjusted arithmetic or geometric means.
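Building the prediction profile for an operation and comparing rankings can be sketched as follows. `predominant_levels` picks the modal level of each determinant among an operation's records (ties go to the level seen first), and `spearman` computes the rank correlation using average ranks for ties. All names and example values are hypothetical.

```python
from collections import Counter


def predominant_levels(records, operation, determinants):
    """Most common level of each determinant among the records for one operation."""
    rows = [r for r in records if r["operation"] == operation]
    return {d: Counter(r[d] for r in rows).most_common(1)[0][0] for d in determinants}


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Usage with made-up records and made-up per-operation levels.
records = [
    {"operation": "spraying", "location": "indoor", "distance": "<1.8 m"},
    {"operation": "spraying", "location": "indoor", "distance": ">=1.8 m"},
    {"operation": "spraying", "location": "outdoor", "distance": "<1.8 m"},
]
profile = predominant_levels(records, "spraying", ["location", "distance"])
rho = spearman([2.1, 0.4, 5.0, 1.3], [3.0, 0.9, 7.5, 1.1])  # model vs. unadjusted means
```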

Model Validation

The importance of model validation in retrospective exposure assessment has been much discussed.(24–27) Exposure modelers have employed several methods for model validation. Collecting additional data from other plants or countries to validate the model was not feasible because the model was developed using a comprehensive data set of benzene, toluene, and xylene exposure measurements spanning industries and operations.(9,28–29) Comparing predicted exposure levels based on the model with predicted levels based on expert judgment (i.e., levels predicted by a single industrial hygienist or a panel of industrial hygienists)(30) was not done because this approach is typically used with a plant- or industry-specific model, as opposed to the exposure models described here, which represent a large number of industries and operations. A strict data-splitting method was not used here because, although the number of reported levels for each of benzene, toluene, and xylene was fairly high, the large number of operations and determinant combinations did not lend itself to the data-splitting approach.(31)

In addition, because a predicted exposure level from the model is neither an arithmetic nor a geometric mean, a direct comparison of observed and predicted exposure levels may not be entirely appropriate. Nevertheless, given the recognized need for model validation, the modeling process was validated using an internal cross-validation method that combined data splitting and Monte Carlo techniques.(32)

For each chemical, the validation was limited to the operations with 20 or more reported levels (e.g., for benzene, the coating, drying/off-gassing, laboratory work, loading and unloading, mixing, separation, and tank entry operations). For each Monte Carlo iteration, 80% of the reported exposure levels were randomly selected to estimate a prediction model using the terms in the reduced models described above. This model was applied to the remaining reported levels (20%), which allowed observed and predicted exposure levels to be compared for validation purposes. Limiting the data to operations with 20 or more reported levels ensured that, for each operation, there would be a reasonable number of observations in the modeling data set (i.e., at least 16) and in the validation data set (i.e., at least 4).

The overall association between observed and predicted exposure levels in the validation data set was measured using Spearman and Pearson correlation coefficients. The process was repeated 1000 times, with a different randomly selected set of exposure levels at each iteration. The mean of the correlation coefficients obtained over the 1000 iterations was used as a measure of the association between the observed and predicted exposure levels in the validation data sets; this mean in turn was used to estimate the validity of the modeling process, rather than the validity of any one specified model. The 2.5th and 97.5th percentiles of the correlation coefficients were used to provide 95% confidence intervals for the mean correlations.
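The Monte Carlo cross-validation loop can be sketched in Python under two loudly stated assumptions. First, the 80/20 split is drawn within each operation; the minimum of 16 modeling and 4 validation levels per operation suggests this, but the text does not say so explicitly. Second, a sample-size-weighted mean of ln(level) per operation stands in for the reduced GLM so that the example stays self-contained. Percentiles use a simple nearest-rank rule, and the data are simulated.

```python
import math
import random
from collections import defaultdict


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_by_operation(records, train_frac, rng):
    """Random 80/20 split within each operation (stratification is an assumption)."""
    groups = defaultdict(list)
    for r in records:
        groups[r["operation"]].append(r)
    train, test = [], []
    for rows in groups.values():
        rows = rows[:]
        rng.shuffle(rows)
        cut = round(train_frac * len(rows))
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


def fit_operation_means(train):
    """Stand-in model: sample-size-weighted mean ln(level) per operation."""
    sums, wts = defaultdict(float), defaultdict(float)
    for r in train:
        sums[r["operation"]] += r["weight"] * r["log_level"]
        wts[r["operation"]] += r["weight"]
    return {op: sums[op] / wts[op] for op in sums}


def monte_carlo_cv(records, fit, predict, iterations=1000, train_frac=0.8, seed=0):
    """Return mean r, (2.5th, 97.5th) percentile interval, and mean r squared."""
    rng = random.Random(seed)
    rs = []
    for _ in range(iterations):
        train, test = split_by_operation(records, train_frac, rng)
        model = fit(train)
        observed = [r["log_level"] for r in test]
        predicted = [predict(model, r) for r in test]
        rs.append(pearson(observed, predicted))
    rs.sort()
    lower = rs[int(0.025 * (len(rs) - 1))]  # simple nearest-rank percentiles
    upper = rs[int(0.975 * (len(rs) - 1))]
    mean_r = sum(rs) / len(rs)
    return mean_r, (lower, upper), sum(r * r for r in rs) / len(rs)


# Usage on simulated data: three operations, 20 levels each (not the study's data).
sim_rng = random.Random(42)
simulated = [
    {"operation": op, "log_level": mu + sim_rng.uniform(-0.1, 0.1), "weight": 1}
    for op, mu in (("coating", 0.0), ("mixing", 2.0), ("tank entry", 4.0))
    for _ in range(20)
]
mean_r, ci, mean_r2 = monte_carlo_cv(
    simulated, fit_operation_means, lambda m, r: m[r["operation"]], iterations=200
)
```

The third return value mirrors the cross-validation R² in Table II, defined there as the mean of the squared Pearson correlations; a Spearman version would substitute a rank correlation for `pearson`.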

RESULTS

Table II presents measures of fit for several models of benzene, toluene, and xylene. All models contain duration and type of sample. The first model contains no additional effects. Models 2–5 illustrate the effects of operation, mechanism of release, five other workplace determinants of exposure (i.e., process-ventilation, temperature, distance, quantity, and location), and publication year, respectively. Model 6 is the “full model” and includes all variables except mechanism of release. Model 7 represents the “reduced model,” which is described below separately for each chemical. Mechanism of release, a variable considered as an alternative to operation, was dropped from further consideration because it did not vary substantially within operations and did not perform better than operation.

Operation was an important effect for all three chemicals, but particularly for toluene and xylene. The five other workplace determinants, taken together, were important effects for all three chemicals. Publication year was also an important effect, particularly for benzene and toluene. The reduced models, selected based on the statistical significance of the variables, included some parameter estimates that were not always interpretable (i.e., they were in an unanticipated direction). For example, although the parameter estimates for the outdoor location were always negative (as expected when compared with working indoors), the parameter estimates for working at a distance of 1.8 m or more from the source, compared with working within 1.8 m of the source, were negative for benzene and toluene (as expected) but positive for xylene. In the benzene model only, the parameter estimate for low quantity was in an unanticipated direction, so the low and medium categories were combined, resulting in high vs. low/medium. In the xylene model only, the term for distance was removed because of the direction of its parameter estimate.

In addition to duration and type, the reduced model for benzene consisted of operation, distance, quantity, and publication year. For toluene, all determinants were statistically significant in the full model; consequently, a reduced model is not presented. For xylene, the reduced model consisted of operation, process-ventilation, and location. Other determinants were not retained in the various reduced models because they were not significant predictors and did not improve model fit.

40 Journal of Occupational and Environmental Hygiene January 2008

TABLE II. Measures of Fit for Exploratory Models of Natural Log Transformed Benzene, Toluene, and Xylene Levels

Model                                        Measure of Fit(A)   Benzene (ppm)      Toluene (ppm)      Xylene (ppm)
1: Duration, type                            R²                  0.7%               4.2%               3.1%
                                             MSEw                0.88               4.11               5.58
2: Duration, type, operation                 R²                  12.8%              45.4%              47.7%
                                             MSEw                0.78               2.34               3.01
3: Duration, type, mechanism of release      R²                  2.9%               27.0%              17.0%
                                             MSEw                0.86               3.13               4.78
4: Duration, type, process-ventilation,      R²                  12.5%              32.5%              16.1%
   temperature, distance, quantity, location MSEw                0.78               2.90               4.83
5: Duration, type, publication year          R²                  23.8%              17.7%              14.7%
                                             MSEw                0.68               3.53               4.91
6: Full model(B)                             R²                  38.5%              63.3%              54.7%
                                             MSEw                0.55               1.57               2.61
7: Reduced model(C)                          R²                  34.4%              Not applicable     53.1%
                                             MSEw                0.58               Not applicable     2.70
Cross-validation(D)                          Spearman            0.36 (0.19–0.51)   0.46 (0.34–0.57)   0.33 (0.19–0.44)
                                             Pearson             0.38 (0.24–0.50)   0.44 (0.32–0.54)   0.35 (0.24–0.46)
                                             R²                  15.0% (6%–25%)     19.5% (10%–29%)    12.9% (6%–22%)

(A) Measures of fit: R² is the multiple coefficient of determination reported by the GLM procedure. MSEw is a unitless quantity defined as the mean of the squared differences between the observed and predicted log-transformed exposure levels, weighted by sample size.
(B) In addition to operation, duration, and type, the full model included process-ventilation, temperature, distance, quantity, location, and publication year.
(C) In addition to operation, duration, and type, the reduced model for benzene included distance, quantity, and publication year; the reduced model for xylene included process-ventilation and location.
(D) Model cross-validation was limited to operations with 20 or more exposure levels. The data set was randomly split into a model data set (80%) and a validation data set (20%). The former was used to estimate model coefficients, which were applied to the data in the latter to estimate predicted values. The process was repeated 1000 times. Spearman and Pearson denote the mean correlation between the observed and predicted exposure levels in the validation data sets. R² denotes the mean of the squared Pearson correlations. Numbers in parentheses denote 95% confidence intervals estimated using the 2.5 and 97.5 percentiles.

Parameter estimates and standard errors from the full models (for benzene, toluene, and xylene) and the reduced models (for benzene and xylene only) are presented in Table III. Either the full or reduced models could be used to provide exposure estimates. For example, the models can be used to predict a mean long-term personal level of benzene exposure for laboratory workers. Laboratory workers in the early 1980s were generally assigned the following determinants: open process with inadequate LEV, room temperature, a distance of less than 1.8 m from the source, a quantity of less than 380 L/month, and an inside location. Based on the reduced model, the estimate for the mean long-term personal benzene exposure for laboratory workers with these determinant levels in the early 1980s would be given by exp[−2.07 − 0.59 + 0 + 0 + 1.93] = 0.49 ppm. The terms are the intercept, laboratory work, distance and quantity at their reference levels, and publication years 1979–1984, respectively. The 95% confidence interval for this estimate is 0.16–1.44 ppm. In the same manner, predicted exposures for these same workers in the late 1980s, early 1990s, and late 1990s would be 0.29 ppm, 0.22 ppm, and 0.07 ppm, respectively.
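The laboratory-worker calculation can be reproduced by summing the relevant reduced-model coefficients from Table III on the log scale and then exponentiating. The sketch below hard-codes the rounded coefficients as printed. The constant and function names are illustrative, not taken from the paper.

```python
import math

# Reduced benzene model, coefficients on the ln(ppm) scale as printed in Table III.
INTERCEPT = -2.07
LABORATORY_WORK = -0.59
DISTANCE_LT_1_8_M = 0.0      # reference level
QUANTITY_LT_380_L = 0.0      # reference level (low/medium combined)
PUBLICATION_PERIOD = {       # 1995-2001 is the reference category
    "1979-1984": 1.93,
    "1985-1989": 1.41,
    "1990-1994": 1.15,
    "1995-2001": 0.0,
}
# Personal sample type and long-term duration are also reference levels (0).


def predict_lab_benzene_ppm(period):
    """Back-transformed prediction for laboratory work with the stated determinants."""
    log_level = (INTERCEPT + LABORATORY_WORK + DISTANCE_LT_1_8_M
                 + QUANTITY_LT_380_L + PUBLICATION_PERIOD[period])
    return math.exp(log_level)


for period in PUBLICATION_PERIOD:
    print(period, round(predict_lab_benzene_ppm(period), 2))
```

The loop prints 0.48, 0.29, 0.22, and 0.07 ppm. The later periods match the values quoted above. The early-1980s value comes out at 0.48 rather than 0.49 ppm, a small difference that presumably reflects rounding of the published coefficients.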

The predicted values for each operation, computed using the predominant levels of the determinants associated with that operation in the database, were correlated with the operation-specific unadjusted arithmetic means of the measurements. The Spearman correlation coefficients were 0.70, 0.76, and 0.94 for benzene, toluene, and xylene, respectively. For the operation-specific unadjusted geometric means of the measurements, the Spearman correlation coefficients were 0.71, 0.92, and 0.97 for benzene, toluene, and xylene, respectively.
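Spearman's coefficient, as used in this comparison, is the Pearson correlation of the ranks. The following is a self-contained sketch in which tied values receive average ranks. The operation-level values in the usage example are hypothetical and are not taken from the database.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predicted levels and unadjusted arithmetic means for 5 operations.
predicted = [0.5, 1.2, 3.0, 0.1, 2.2]
arithmetic_means = [1.1, 1.0, 4.5, 0.3, 1.8]
print(round(spearman(predicted, arithmetic_means), 2))  # 0.9
```

Because only ranks enter the calculation, the coefficient gives the same result whether predictions are compared on the log or the original scale. This suits a comparison between back-transformed predictions and unadjusted means.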

Results of the limited validation performed using Monte Carlo and data-splitting techniques indicated only modest support for the modeling process. Pearson and Spearman correlation coefficients between observed levels in the validation data set and levels predicted by the model estimated from the modeling data set were similar (Table II). The means of the Pearson correlation coefficients were 0.38 (95% CI 0.24–0.50) for benzene, 0.44 (95% CI 0.32–0.54) for toluene, and 0.35 (95% CI 0.24–0.46) for xylene. The validation R-squared values (15.0%, 19.5%, and 12.9% for benzene, toluene, and xylene, respectively) indicate moderate amounts of “shrinkage” from the full/reduced model R-squared values. Shrinkage is an indicator of how well a model generalizes to other data sets. A moderate amount of shrinkage therefore indicates that the models’ generalizability to other situations is limited. The validation was performed on a subset of the measurement data, specifically operations with 20 or more reported levels. These values should therefore not be interpreted as a validation of the specific models presented in Table III; rather, they indicate the validity of the process used to obtain these models.
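The size of the shrinkage can be made concrete with the Table II values. The sketch below measures shrinkage simply as the drop from the full-model R² to the mean cross-validation R², and as the fraction of R² retained. This is an illustrative definition, not one stated in the paper. It also ignores the fact that the validation used only operations with 20 or more levels. Reduced-model values could be substituted for benzene and xylene.

```python
def shrinkage(model_r2, validation_r2):
    """Absolute drop in R^2 (percentage points) and the fraction retained."""
    return model_r2 - validation_r2, validation_r2 / model_r2


# Full-model and mean cross-validation R^2 values (%) from Table II.
full_r2 = {"benzene": 38.5, "toluene": 63.3, "xylene": 54.7}
validation_r2 = {"benzene": 15.0, "toluene": 19.5, "xylene": 12.9}
for chem in full_r2:
    drop, kept = shrinkage(full_r2[chem], validation_r2[chem])
    print(f"{chem}: {drop:.1f} points lower, {kept:.0%} of model R^2 retained")
```

With these inputs, roughly a quarter to two-fifths of the full-model R² is retained (39%, 31%, and 24% for benzene, toluene, and xylene).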


TABLE III. Parameter Estimates and Standard Errors for Models of Natural Log Transformed Benzene, Toluene, and Xylene Exposure Levels

                            Benzene (ppm)                   Toluene (ppm)   Xylene (ppm)
                            Full            Reduced         Full(A)         Full            Reduced
Term                        β       SE      β       SE      β       SE      β       SE      β       SE
Intercept                   −2.97   0.77    −2.07   0.73    2.55    0.36    1.60    0.55    2.07    0.43
Operation
  Brushing                  —       —       —       —       −0.68   0.58    −2.06   3.19    −2.07   3.22
  Chemical reaction         −1.68   0.85    −2.45   0.82    −0.39   0.80    −1.41   3.35    −1.61   3.34
  Cleaning                  −1.86   1.17    −2.32   1.15    −0.90   0.89    −1.56   0.64    −1.42   0.58
  Coating                   −0.07   0.97    −0.83   0.95    −2.07   0.45    −1.60   0.71    −1.32   0.69
  Combustion                −0.98   1.34    −1.12   1.31    −3.13   0.95    −1.98   2.98    −1.82   2.98
  Dipping                   0.37    0.80    −0.33   0.77    −4.66   0.59    −3.28   0.55    −2.99   0.50
  Drying/off-gassing        −0.09   0.76    −0.77   0.74    −3.09   0.38    −5.00   0.63    −4.47   0.52
  Gluing                    −1.11   0.91    −1.72   0.90    −2.50   0.50    −2.25   1.42    −2.25   1.43
  Laboratory work           −0.79   0.87    −0.59   0.88    −2.19   0.77    −3.01   1.12    −2.86   1.08
  Loading and unloading     −0.19   0.81    −1.33   0.74    −1.48   0.40    −3.65   0.62    −3.35   0.57
  Mixing                    —(B)    —       —       —       −0.74   0.42    −0.14   0.69    −0.51   0.63
  Repair                    −0.68   0.76    −1.16   0.76    −5.52   0.42    −6.18   0.60    −6.56   0.50
  Rolling                   —(B)    —       —       —       −1.28   0.35    −0.04   0.56    0.00    0.56
  Separation                −0.93   0.79    −1.68   0.74    −0.90   0.54    −0.42   1.01    −0.60   0.88
  Spraying                  −0.35   0.88    −0.99   0.87    −0.45   0.35    −2.31   0.53    −2.09   0.48
  Spreading                 —       —       —       —       −2.09   0.93    −0.42   1.66    −0.27   1.66
  Stacking                  −0.13   0.77    −0.75   0.75    −3.36   0.35    −4.02   0.55    −4.02   0.52
  Tank entry                REF(C)  —       REF     —       1.56    1.24    1.23    1.83    2.35    1.77
  Wiping                    —       —       —       —       REF     —       REF     —       REF     —
Process-ventilation
  Closed                    0.19    0.34                    −0.97   0.88    −5.11   2.78    −4.02   2.75
  Closed with LEV           0.13    0.41                    −2.31   0.41    −3.36   0.72    −2.99   0.53
  Open with LEV             −0.37   0.23                    −1.55   0.24    −0.55   0.33    −0.69   0.25
  Open with inadequate LEV  REF     —                       REF     —       REF     —       REF     —
Temperature
  >room temperature         −0.13   0.28                    0.59    0.28    0.02    0.50
  Room temperature          REF     —                       REF     —       REF     —
Distance
  ≥1.8 m from source        −0.81   0.32    −0.92   0.29    −0.63   0.22    1.07    0.34
  <1.8 m from source        REF     —       REF     —       REF     —       REF     —
Quantity
  <380 L/month              1.06    0.34    REF(D)  —       −1.05   0.24    0.13    0.30
  380–3800 L/month          REF     —       REF     —       REF     —       REF     —
  >3800 L/month             1.10    0.17    0.81    0.15    0.72    0.29    −0.26   0.42
Location
  Outdoors                  −0.43   0.29                    −2.78   0.26    −1.33   0.41    −1.77   0.38
  Indoors/enclosed          REF     —                       REF     —       REF     —       REF     —
Year of publication
  1979–1984                 2.09    0.27    1.93    0.26    0.71    0.26    0.48    0.32
  1985–1989                 1.72    0.21    1.41    0.20    0.44    0.28    0.46    0.39
  1990–1994                 1.40    0.30    1.15    0.29    −0.48   0.31    0.62    0.41
  1995–2001                 REF     —       REF     —       REF     —       REF     —
Type of sample
  Area                      −0.74   0.99    −0.65   1.01    1.36    0.40    1.03    2.02    0.93    2.03
  Mixed(E)                  −1.58   1.33    −1.56   1.36    −1.01   0.37    −0.21   0.55    −0.17   0.53
  Personal                  REF     —       REF     —       REF     —       REF     —       REF     —
Duration of sample
  Short-term                1.02    0.41    0.98    0.38    1.08    0.25    0.36    0.30    0.30    0.27
  Long-term                 REF     —       REF     —       REF     —       REF     —       REF     —

Blank cells indicate terms not included in that model.
(A) All exposure determinant terms were statistically significant, so a reduced model was not necessary.
(B) In the models for benzene, the rolling operation was combined with coating and the mixing operation was combined with dipping.
(C) REF denotes the reference category.
(D) In the reduced model for benzene, the low and medium categories combined served as the reference category for quantity.
(E) Mixed sample type denotes summary measures reportedly composed of both personal and area samples.


DISCUSSION

The models of exposure presented here are empirical rather than theoretical; that is, they are based on the exposure concentration data available. The overarching objective was the retrospective estimation of exposures for subjects for whom no measurements are available and for exposure scenarios that may no longer exist. Re-creating such exposure scenarios would require vast resources. In their absence, exposure levels in population-based case-control studies have been estimated by ad hoc industrial hygiene judgment. Such assessment may be improved somewhat by the use of several raters. This allows inter-rater reliability to be calculated, which can provide an estimate of misclassification.

Actual validation of the exposure assessment process, however, can be performed only on present-day exposure scenarios, either real or re-created, and cannot capture the uncertainty due to extrapolation to past exposures. The process described here estimates exposure levels from measurements and determinant information reported in the literature. It provides a more rigorous and transparent way to estimate exposure levels for case-control studies when limited subject- and/or work site-specific exposure information is available.

Prior to modeling, several issues related to the use of the data compiled from the published literature had to be resolved. Most of these issues stem from reporting differences among the publications. First, not all publications provided individual measurements. Because both individual measurements and summary measures were to be used, a decision was made to perform a weighted regression analysis. A weight proportional to the inverse of the variance of the natural log of the arithmetic mean would have been optimal, but not all publications reported measures of variability. Estimated weights consisting of the sample sizes associated with the reported levels were therefore used. The use of these estimated weights further requires an assumption of common variability; consequently, the effect of their use is difficult to assess. Simulations (results not shown) indicated, however, that using the sample size to estimate the weights provided estimates closer to the arithmetic mean than to the geometric mean, when compared with estimates obtained using a model that did not incorporate weights. Furthermore, not all publications provided the sample sizes associated with the reported levels. In a few instances, the magnitude of the sample size (i.e., 1 or 10) was therefore estimated from additional information provided in the publications. A second issue is that not all publications provided an arithmetic mean. Rather than lose those observations, the arithmetic mean was estimated from either the GM and GSD or the range. The effect of using these estimated values is not expected to be great, because reported levels based on estimated arithmetic means or estimated sample sizes constituted a small fraction of the overall database. The authors considered excluding these observations but judged that they contained valuable information.
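The two data-preparation steps above can be sketched briefly. Assuming the underlying measurements are lognormal, one standard way to recover an arithmetic mean from a reported GM and GSD is AM = GM × exp(½ ln²GSD). The estimation from the range is not described in enough detail here to sketch. With sample-size weights, an intercept-only weighted least-squares fit of ln(level) is simply the weighted mean of the logs. The function names and the example publication values are hypothetical.

```python
import math


def am_from_gm_gsd(gm, gsd):
    """Arithmetic mean of a lognormal distribution: GM * exp(0.5 * ln(GSD)^2)."""
    return gm * math.exp(0.5 * math.log(gsd) ** 2)


def weighted_log_mean(levels, sample_sizes):
    """Sample-size-weighted mean of ln(level), i.e., an intercept-only WLS fit."""
    total = sum(sample_sizes)
    return sum(n * math.log(x) for x, n in zip(levels, sample_sizes)) / total


# Hypothetical publication reporting GM = 2.0 ppm and GSD = 3.0 (N = 10).
am = am_from_gm_gsd(2.0, 3.0)
print(f"estimated AM = {am:.2f} ppm")  # estimated AM = 3.66 ppm
# Pool it with two individual measurements (N = 1 each); weights = sample sizes.
print(f"weighted mean ln(level) = {weighted_log_mean([am, 0.5, 1.5], [10, 1, 1]):.3f}")
```

Note that the estimated arithmetic mean (3.66 ppm) is well above the GM (2.0 ppm) for a GSD of 3. This is one reason why converting and then weighting summary values pulls model predictions toward arithmetic means.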

A third issue is that publications also varied in their reporting of samples below the LOD. A common LOD was therefore assigned to samples reported as below the LOD when the LOD was not reported. Because this did not occur very often, the effect was expected to be minimal. Replacing censored values with LOD/2 or LOD/√2 in regression modeling has been shown to produce biased estimates, especially when the censoring rate is above 10%.(33) Because the overall censoring rate in the database used here was around 12%, the imputation is likely to produce slightly higher mean estimates. Finally, publication date was used as a surrogate for the measurement date because publications did not consistently report the years during which the measurements were performed. This last limitation should have a minimal effect on the exposure estimates.
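The substitution step can be sketched as follows. The passage above does not state whether LOD/2 or LOD/√2 was used, so the divisor is a parameter, with LOD/√2 as an assumed default. The value passed as `common_lod` is a hypothetical placeholder, because the common LOD actually assigned is not given here.

```python
import math


def substitute_censored(values, lods, common_lod, divisor=math.sqrt(2)):
    """Replace non-detects with LOD/divisor and report the censoring rate.

    values: measured levels, with None marking a result reported as <LOD.
    lods: per-sample LODs, with None where the publication gave no LOD;
          common_lod (a hypothetical placeholder value) is used instead.
    divisor: math.sqrt(2) for LOD/sqrt(2), or 2 for LOD/2.
    """
    filled = []
    for value, lod in zip(values, lods):
        if value is None:
            lod = common_lod if lod is None else lod
            filled.append(lod / divisor)
        else:
            filled.append(value)
    censoring_rate = sum(v is None for v in values) / len(values)
    return filled, censoring_rate


levels, rate = substitute_censored(
    values=[0.8, None, 2.5, None, 1.1],
    lods=[None, 0.1, None, None, None],
    common_lod=0.2,
)
print([round(x, 3) for x in levels], f"censored: {rate:.0%}")
```

This prints `[0.8, 0.071, 2.5, 0.141, 1.1] censored: 40%`. Reporting the censoring rate alongside the filled values makes it easy to check it against the roughly 10% threshold discussed above.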

The exposure information suffers from several additional limitations that may adversely affect the modeling results. The database included short-term and area air samples because the availability of long-term personal measurements was limited. The absence of long-term personal air samples limited the ability of the model to predict long-term personal exposures for some operations. The impact of this limitation is unclear because information on the duration of exposure within the sampling period was not available. Publications in the database, dated from 1979 to 2001, comprised both published literature and reports from NIOSH HHEs; however, the latter were available primarily from the 1980s.(17) When reported levels from HHEs were excluded from the model for benzene (not shown), estimates for publication year were largely unchanged. For benzene, however, only 25% of the reported levels were from HHEs.

The databases for toluene and xylene, on the other hand, were based primarily on HHEs from the 1980s, with 76% and 80% of the reported levels based on HHEs for toluene and xylene, respectively. A publication year effect was observed for toluene, with higher levels in the 1980s than in the 1990s. Of the toluene levels, 85% of those from the 1980s were from HHEs and 80% of those from the 1990s were from the published literature. The observed publication year effect may therefore represent a data source effect; indeed, publication year was not significant in a model for toluene limited to HHEs from the 1980s. Alternatively, it may reflect reality.(34) Publication year was not retained in the reduced model for xylene.

Representativeness of the measurement data is necessary for accurate exposure estimates. The measurement data in the authors’ database do not represent a random sample of all jobs with measurements of benzene, toluene, and xylene in the time period of interest; rather, the data were obtained via a convenience sample of exposure data reported in the literature. The reason that the data were originally collected (i.e., to investigate a complaint, to test for compliance, or to conduct research) was neither evaluated nor included in the modeling, because this information was rarely presented in the literature and could not be inferred. It is possible that highly exposed jobs are overrepresented in the database.


Such a situation, however, should not bias the model if the determinants associated with the measurements of those highly exposed jobs are accurately characterized to reflect high exposure situations (e.g., no ventilation or elevated temperature). In the epidemiologic study, different jobs would be assigned different values for the same determinants and thus would be assigned different exposure levels. Another source of variability could have been the sampling and analytic methods employed. This information was not collected from the publications, but its effect is likely to be minimal. The measurements were made predominantly in the 1980s, when charcoal tubes were the standard sampling method.

Furthermore, it is well known that exposure levels within a given operation vary for a number of reasons. These include industry-specific differences,(35) differences in individual work practices, proximity to exposure sources, and differences in the use of personal protective equipment. In the exposure database, however, there was little variability in the exposure determinants for many operations. As a result, a test for interaction between operations and the various determinants was not feasible; the likely consequence is a decrease in the precision of the model coefficients. In spite of this, the inclusion of the additional workplace determinants in the model containing operation resulted in an improved model, particularly for benzene. Likewise, data for specific operations were not available for every year, and data for many operations were limited to only a few years.

Given this and the overall size of the measurement database, it was not possible to explore the interaction between operation and year. Decreasing trends were observed, particularly for benzene. The lack of an operation-by-year interaction is nonetheless an unfortunate limitation, given the decreasing trends generally observed in industrial exposures over a 30-year period by Symanski et al.(34) Although exposure data were abundant for some operations, several operations had only a small number of reported levels, so that many years were unrepresented. Some operations were combined for modeling purposes because of the limited number of reported levels, but others remained distinct.

Although expected to be similar, the reduced models for the three chemicals varied in the determinants that were retained and in the percentage of variability accounted for by the determinants. Several factors probably contributed to this result:

- the limited number of exposure scenarios for most operations;
- the high correlation among determinants for particular operations;
- the limited number of measurements for particular operations; and
- the limited amount of determinant information in many of the studies, which may have resulted in the assignment of erroneous determinants.

Using the expert rating approach to estimate exposures, Siemiatycki et al.(36) observed that some chemical agents resulted in higher reproducibility than other agents. The differences among the agents observed here may reflect a similar phenomenon; alternatively, they may reflect the limitations of this approach for specific chemicals.

The predicted estimates from these models, however, can be interpreted neither as arithmetic means nor as geometric means. This is because the data comprised both individual measurements and summary measures of exposure and additionally required a logarithmic transformation prior to modeling. The predicted exposure levels should be thought of as generic measures of central tendency.

In general, the actual value of the estimate would be determined by several factors, including the skewness of the underlying distribution, the proportion of data represented by summary measurement values, and the sample sizes associated with the summary values. Simulations (not shown) considered a moderately skewed distribution (GSD = 3) in which a majority of the reported levels were summary levels based on N > 1 data values (median of 10 data values per summary level). Even in this case, the predicted level fell between the arithmetic and geometric means but was closer to the arithmetic mean. The predicted exposure levels have some unknown uncertainty associated with them and do not represent absolute levels. Rather, they are quantitative exposure estimates that are likely sufficient to rank operation exposures and to provide at least the order of magnitude of the exposure levels. This concern, however, is of limited importance in an epidemiologic study evaluating causality, because the ranking of the subjects is more important than the actual exposure levels.
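A small simulation in the spirit of the one described above can be sketched with the standard library. It is not the authors' simulation. The assumed setup uses lognormal data with GM = 1 and GSD = 3. Most reported levels are arithmetic means of exactly 10 values, each weighted by its sample size of 10. A minority are single measurements with weight 1. The weighted mean of the log levels is then back-transformed. The counts (500 summary levels, 50 individual values) and the seed are illustrative choices.

```python
import math
import random


def simulate_predicted_level(gm=1.0, gsd=3.0, n_summary=500, per_summary=10,
                             n_individual=50, seed=7):
    """Back-transformed weighted mean of ln(reported level) for mixed data.

    Most reported levels are arithmetic means of `per_summary` lognormal values
    (weight = per_summary); the rest are single measurements (weight = 1).
    """
    rng = random.Random(seed)
    mu, sigma = math.log(gm), math.log(gsd)
    logs, weights = [], []
    for _ in range(n_summary):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(per_summary)]
        logs.append(math.log(sum(sample) / per_summary))  # ln of a reported AM
        weights.append(per_summary)
    for _ in range(n_individual):
        logs.append(rng.gauss(mu, sigma))  # ln of an individual measurement
        weights.append(1)
    predicted = math.exp(sum(w * y for w, y in zip(weights, logs)) / sum(weights))
    true_am = gm * math.exp(0.5 * sigma ** 2)
    return predicted, gm, true_am


predicted, gm, am = simulate_predicted_level()
print(f"GM = {gm:.2f}, predicted = {predicted:.2f}, AM = {am:.2f}")
```

Under these assumptions, the back-transformed prediction falls between the true GM (1.00) and AM (1.83) and lies closer to the AM. This is the qualitative pattern reported above. Changing `per_summary` or the share of individual values shows how the prediction moves between the two means.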

Initially, modeling was performed using a mixed-effects approach in which the source publication was treated as a random effect. The rationale for including a random publication effect was that observations from the same report or publication were likely to be correlated with each other. Including a random publication effect, which yielded estimates of within-publication and between-publication variability, seemed appropriate. It soon became clear, however, that its inclusion resulted in a model that was not useful for prediction. Because the primary goal was to develop a prediction model that could be applied to study subjects, as opposed to publications, the estimated publication effects could not be used. In addition, it became apparent that the main effects of operation and publication year were confounded by the random publication effect. Consequently, the mixed-effects models were abandoned in favor of the fixed-effects models presented here.

The models were developed to estimate exposure levels for a case-control study. In that study, only general information on job, industry, and tasks was available for the study subjects. This limited information meant that judgment was needed to identify determinant information in the exposure assessment process. To overcome the limitation of extrapolating determinant information from limited task information, questionnaires have been developed to collect detailed exposure information from study subjects, but these questionnaires can be quite long.(1)

In the authors’ experience with these questionnaires in several studies, subjects are able to respond to questions on mechanism of release (e.g., brush, roller, or spray paint), use and effectiveness of ventilation, temperature, and location. Others have shown that subjects are able to respond to questions on task.(37) Other determinants found important in the models developed here, such as process type, quantity, and distance, could be inferred by the industrial hygienist from the job, industry, tasks, and other information collected in the questionnaire. Identifying the important determinants in the models also indicates which types of questions should be considered in developing questionnaires. For a study of these three chemicals, the models developed here indicated that questions on temperature may not be necessary when information on operation is available, because this determinant was not in the reduced models for benzene and xylene. Alternatively, there may have been too few observations to detect the effect (e.g., for xylene, almost 90% of the reported levels were assigned to room temperature).

An external model validation could not be performed for two reasons. First, no suitable validation data set was available. Second, even if one were available, there are the problems previously noted with directly comparing model-based predicted values to actual measurements. The internal cross-validation of the modeling process on a subset of the complete data set yielded average Pearson correlation coefficients of 0.38 for benzene, 0.44 for toluene, and 0.35 for xylene. These values represent the correlation between observed and predicted levels in the validation subset. They are around half of the correlations implied by the R-squared values of the final models fitted to the complete data set: R-squared values of 34%, 63%, and 53% yield correlations of approximately 0.6, 0.8, and 0.7 for benzene, toluene, and xylene, respectively. These “validation correlations” are low to moderate at best and likely reflect the inherent limitations of using the published literature as the source of the exposure information.
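The conversion from R² to a correlation used in this comparison is a square root: the multiple correlation is √R². The sketch below applies it to the final-model R² values in Table II and divides the mean validation correlations by the result. The function name is illustrative.

```python
import math


def implied_r(r2_percent):
    """Multiple correlation implied by a model R^2 given in percent."""
    return math.sqrt(r2_percent / 100)


# Final-model R^2 (%) from Table II and mean validation Pearson r.
model_r2 = {"benzene": 34.4, "toluene": 63.3, "xylene": 53.1}
validation_r = {"benzene": 0.38, "toluene": 0.44, "xylene": 0.35}
for chem, r2 in model_r2.items():
    r_model = implied_r(r2)
    print(f"{chem}: model r = {r_model:.2f}, "
          f"validation/model = {validation_r[chem] / r_model:.2f}")
```

The implied correlations are 0.59, 0.80, and 0.73, and the validation-to-model ratios are 0.65, 0.55, and 0.48. In other words, the validation correlations are between roughly one-half and two-thirds of the model-implied correlations.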

Empirical statistical exposure models such as those presented here do not have the same degree of precision as deterministic or physical exposure models of data collected for the purpose of modeling. The models presented here must also be distinguished from determinants models developed from single-industry or single-plant data collected prospectively, such as for a cohort study. Vermeulen and Kromhout(38) describe limitations in determinant-based exposure groupings in a study in which exposure data were collected purposively. In that study, detailed determinant information, such as task and personal protective equipment, was obtained through individual participant interviews. Vermeulen and Kromhout concluded that while determinant-based groupings improve contrast between groups, they should be used with caution beyond the studied population or time period. The determinant information used in the authors’ model development was considerably less detailed than the task and protective equipment information available to Vermeulen and Kromhout, because the authors’ data were collected from the literature rather than purposively. In addition, the determinant information to be assigned to the measurements was often absent, and industrial hygiene judgment had to be used.

Despite these cautions, the authors believe there are strong reasons for expecting the determinants models presented here to improve on exposure estimates developed by judgment alone, the approach generally used for exposure assessment in case-control studies. The modeled estimates are quantitative, which is preferable to alternative approaches such as semiquantitative assessments. Exposure-response models have exhibited the strongest point estimates when exposure levels were assigned to individuals based on a determinants model, as opposed to the individuals’ own measurement means or their job means.(39,40) Those exposure models were based on individual-level exposure data; however, the authors believe that the modeled estimates presented in this paper should be more accurate than estimates obtained via alternative means.

The accuracy of industrial hygienists’ assessment of exposure levels in the context of a population-based case-control study is not well known, and further data are needed.(2,9,36)

Information on how industrial hygienists estimate exposure levels in population-based case-control studies is noticeably absent from the literature. In the authors’ experience of estimating levels for these types of studies, the estimation process is subjective and ad hoc. It is based on a sample of the readily available measurement data, without criteria for which determinants are considered and how they are weighted. The process is difficult, complex, and tedious and is therefore likely to be prone to substantial error. This is supported by information on how well industrial hygienists assess the level of exposure in the context of a population-based case-control study, which suggests there is room for improvement.(2,36)

Having exposure assessors evaluate determinants of exposure such as those described here, rather than estimate exposure levels directly, may improve the estimation process, because determinants of exposure may be easier to evaluate. If such an evaluation is then used in a model,(9) the exposure estimates may be more accurate. Even if the estimates are not more accurate, however, the use of determinants makes the assignment process more rigorous, and the use of a model is likely to increase the reproducibility of the estimates. This approach also documents how the estimates were developed, providing transparency and allowing the credibility of the estimates to be evaluated. It should therefore also improve the comparability of disease risk estimates across studies. Nonetheless, the authors consider these models only a first step in improving exposure assessment in community-based case-control studies.

CONCLUSIONS

In summary, it was possible to construct exposure models for benzene, toluene, and xylene based on measurement data reported in the published literature. These models are functions of operation and several additional workplace determinants of exposure. The models are not expected to be superior to exposure models that could be developed for a specific industry using data from plants within that industry. They will, however, allow exposures to be estimated across a wide variety of industries and operations with potential for exposure, which will be especially useful for community-based case-control studies. The authors encourage others to critique these models and to develop other models to improve exposure assessment in these types of studies.

ACKNOWLEDGMENTS

This research was supported, in part, by the Intramural Research Program of the National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics.

REFERENCES

1. Stewart, P.A., W.F. Stewart, J. Siemiatycki, E.F. Heineman, and M.Dosemeci: Questionnaires for collecting detailed occupational informa-tion for community-based case control studies. Am. Ind. Hyg. Assoc. J.58:39–44 (1998).

2. Teschke, K., A.F. Olshan, J.L. Daniels, et al.: Occupational exposureassessment in case-control studies: Opportunities for improvement.Occup. Environ. Med. 59:575–94 (2002).

3. Bouyer, J., and D. Hemon: Retrospective evaluation of occupationalexposures in population-based case-control studies: general overviewwith special attention to job exposure matrices. Int. J. Epidemiol.22(Suppl 2):S57–S64 (1993).

4. Gerin, M., J. Siemiatycki, H. Kemper, and D. Begin: Obtainingoccupational exposure histories in epidemiologic case-control studies. J.Occup. Med. 27(6):420–426 (1985).

5. Blair, A., A. Linos, P.A. Stewart, et al.: Evaluation of risks for non-Hodgkin’s lymphoma by occupation and industry exposures from a case-control study. Am. J. Ind. Med. 23(2):301–312 (1993).

6. Stewart, W.F., and P.A. Stewart: Occupational case-control studies: I.Collecting information on work histories and work-related exposures. Am.J. Ind. Med. 26(3):297–312 (1994).

7. McGuire, V., L.M. Nelson, T.D. Koepsell, H. Checkoway, and W.T.Longstreth Jr.: Assessment of occupational exposures in community-based case-control studies. Ann. Rev. Public Health 19:35–53 (1998).

8. Dosemeci, M., M.C.R. Alavanja, A.S. Rowland, et al.: A quantitativeapproach for estimating exposure to pesticides in the Agricultural HealthStudy. Ann. Occup. Hyg. 46(2):245–260 (2002).

9. Semple, S.E., F. Dick, and J.W. Cherrie: Exposure assessment for apopulation-based case-control study combining a job-exposure matrixwith interview data. Scand. J. Work, Environ. Health 30(3):241–248(2004).

10. Hoar, S.K., A.S. Morrison, P. Cole, and D.T. Silverman: An occupationand exposure linkage system for the study of occupational carcinogenesis.J. Occup. Med. 22(11):722–726 (1980).

11. Pannett, B., D. Coggon, and E.D. Acheson: A job-exposure matrix foruse in population based studies in England and Wales. Br. J. Ind. Med.42(11):777–783 (1985).

12. Dosemeci, M., P. Cocco, M. Gomez, P.A. Stewart, and E.F. Heineman:Effects of three features of a job-exposure matrix on risk estimates.Epidemiology 5:124–127 (1994).

13. McNamee, R: Retrospective assessment of occupational exposure tohydrocarbons – Job-exposure matrices versus expert evaluation of ques-tionnaires. Occup. Hyg. 3:137–143 (1996).

14. Rybicki, B.A., C.C. Johnson, E.L. Peterson, G.X. Kortsha, and J.M.Gorell: Comparability of different methods of retrospective exposureassessment of metals in manufacturing industries. Am. J. Ind. Med.31(1):36–43 (1997).

15. De Roos, A.J., K. Teschke, D.A. Savitz, et al.: Parental occupationalexposures to electromagnetic fields and radiation and the incidence ofneuroblastoma in offspring. Epidemiology 12(5):508–517 (2001).

16. van Wijngaarden, E., P.A. Stewart, A.F. Olshan, D.A. Savitz, and G.R. Bunin: Parental occupational exposure to pesticides and childhood brain cancer. Am. J. Epidemiol. 157(11):989–997 (2003).

17. van Wijngaarden, E., and P.A. Stewart: Critical literature review of determinants and levels of occupational benzene exposure for United States community-based case-control studies. Appl. Occup. Environ. Hyg. 18:678–693 (2003).

18. Bunin, G.R., R.R. Kuijten, J.D. Buckley, L.B. Rorke, and A.T. Meadows: Relation between maternal diet and subsequent primitive neuroectodermal brain tumors in young children. N. Engl. J. Med. 329:536–541 (1993).

19. Aitchison, J., and J.A.C. Brown: The Lognormal Distribution. Cambridge, England: Cambridge University Press, 1963. p. 8.

20. Buringh, E., and R. Lanting: Exposure variability in the workplace: its implications for the assessment of compliance. Am. Ind. Hyg. Assoc. J. 52:6–13 (1991).

21. National Institute for Occupational Safety and Health (NIOSH): Hydrocarbons, aromatic: Method 1501. In NIOSH Manual of Analytical Methods, Eller, P.M., and M.E. Cassinelli (eds.), DHHS (NIOSH) Pub. 94-113. Cincinnati, Ohio: NIOSH, 1994.

22. National Institute for Occupational Safety and Health (NIOSH): NIOSH Pocket Guide to Chemical Hazards. DHHS (NIOSH) Pub. 2005-149. Cincinnati, Ohio: NIOSH, 2005.

23. Neter, J., M.H. Kutner, C.J. Nachtsheim, and W. Wasserman: Applied Linear Statistical Models, 4th ed. Chicago: Irwin, 1996. pp. 400–409.

24. Schneider, T., and E. Holst: Validation of exposure assessment in occupational epidemiology. Occup. Hyg. 3:59–71 (1996).

25. Cherrie, J.W., and T. Schneider: Validation of a new method for structured subjective assessment of past concentrations. Ann. Occup. Hyg. 43(4):235–245 (1999).

26. Friesen, M.C., P.A. Demers, J.J. Spinelli, and N.D. Le: Validation of a semi-quantitative job exposure matrix at a Soderberg aluminum smelter. Ann. Occup. Hyg. 47(6):477–484 (2003).

27. Stewart, P.A., P.S.J. Lees, A. Correa, P. Breysse, M. Gail, and B.I. Graubard: Evaluation of three retrospective exposure assessment methods. Ann. Occup. Hyg. 47(5):399–411 (2003).

28. Hornung, R.W., R.F. Herrick, P.A. Stewart, et al.: An experimental design approach to retrospective exposure assessment. Am. Ind. Hyg. Assoc. J. 57:251–256 (1996).

29. Burstyn, I., P. Boffetta, G.A. Burr, et al.: Validity of empirical models of exposure in asphalt paving. Occup. Environ. Med. 59(9):620–624 (2002).

30. Hornung, R.W., A.L. Greife, L.T. Stayner, et al.: Statistical model for prediction of retrospective exposure to ethylene oxide in an occupational mortality study. Am. J. Ind. Med. 25:825–836 (1994).

31. Picard, R.R., and K.N. Berk: Data splitting. Am. Stat. 44(2):140–147 (1990).

32. Harrell, F.E. Jr.: Regression Modeling Strategies. With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer-Verlag New York, Inc., 2002. p. 93.

33. Lubin, J.H., J.S. Colt, D. Camann, et al.: Epidemiologic evaluation of measurement data in the presence of detection limits. Environ. Health Perspect. 112(17):1691–1696 (2004).

34. Symanski, E., L.L. Kupper, I. Hertz-Picciotto, and S.M. Rappaport: Comprehensive evaluation of long term trends in occupational exposure: Part 2. Predictive models for declining exposures. Occup. Environ. Med. 55:310–316 (1998).

35. Burstyn, I., and K. Teschke: Studying the determinants of exposure: A review of methods. Am. Ind. Hyg. Assoc. J. 60:57–72 (1999).

36. Siemiatycki, J., L. Fritschi, L. Nadon, and M. Gerin: Reliability of an expert rating procedure for retrospective assessment of occupational exposures in community-based case-control studies. Am. J. Ind. Med. 31(3):280–286 (1997).

37. Reeb-Whitaker, C.K., N.S. Seixas, L. Sheppard, and R. Neitzel: Accuracy of task recall for epidemiological exposure assessment to construction noise. Occup. Environ. Med. 61(2):135–142 (2004).


38. Vermeulen, R., and H. Kromhout: Historical limitations of determinant based exposure groupings in the rubber manufacturing industry. Occup. Environ. Med. 62:793–799 (2005).

39. Preller, L., H. Kromhout, D. Heederik, and M.J.M. Tielen: Modeling long-term average exposure in occupational exposure-response analysis. Scand. J. Work, Environ. Health 21:504–512 (1995).

40. Teschke, K., J. Spierings, S.A. Marion, P.A. Demers, H.W. Davies, and S.M. Kennedy: Reducing attenuation in exposure-response relationships by exposure modeling and grouping: the relationship between wood dust exposure and lung function. Am. J. Ind. Med. 46(6):663–667 (2004).
