
www.elsevier.com/locate/chemosphere

Chemosphere 68 (2007) 169–180

Percentile estimation using variable censored data

Samuel P. Caudill a,*, Lee-Yang Wong b, Wayman E. Turner a, Robin Lee b, Alden Henderson b, Donald G. Patterson Jr. a

a Division of Laboratory Sciences, National Center for Environmental Health, Centers for Disease Control and Prevention, 4770 Buford Highway NE, Atlanta, GA 30341, United States

b Division of Health Studies, Agency for Toxic Substances and Disease Registry, 1600 Clifton Road, Atlanta, GA 30333, United States

Received 23 March 2006; received in revised form 30 October 2006; accepted 6 December 2006
Available online 30 January 2007

Abstract

Much progress has been made in recent years on estimating summary statistics from data that are subject to censoring of results that fall below the limit of detection (LOD) of the measuring instrument. Truncated data methods (e.g., Tobit regression) and multiple-imputation are two approaches for analyzing results below the LOD. Applying these methods requires an assumption about the underlying distribution of the data. Because the log-normal distribution has been shown to fit many data sets obtained from environmental measurements, common practice is to assume that measurements of environmental factors can be described by log-normal distributions. This article describes methods for obtaining estimates of percentiles and their associated confidence intervals when the results are log-normal and a fraction of the results are below the LOD. We present limited simulations to demonstrate the bias of the proposed estimates and the coverage probability of their associated confidence intervals. The estimation methods are used to generate summary statistics for 2,3,7,8-tetrachlorodibenzo-p-dioxin (2,3,7,8-TCDD) using data from a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population. Because the congener measurements used in this study were subject to variable LODs, we also present simulation results to demonstrate the effect of variable LODs on the multiple-imputation process.

Published by Elsevier Ltd.

Keywords: cPCB; Limit of detection; Multiple-imputation; PCDD; PCDF; TCDD

1. Introduction

One problem that arises in characterizing environmental exposures is that contaminant levels in some or many individuals cannot be detected by the instrumentation. Nondetection can result from insufficient sample matrix or from extremely low exposure levels. Such results are said to be below the limit of detection (LOD) as determined by the sampling and analytic method. Despite continued improvements in the ability of assays to detect ever lower concentrations, exposure levels are also decreasing, so the percentage of results below the LOD is not declining and may actually be increasing. Several papers (Persson and Rootzen, 1977; Gleit, 1985; Haas and Scheff, 1990; Helsel, 1990; Hornung and Reed, 1990; Travis and Land, 1990; Huybrechts et al., 2002) have addressed the problem of estimating the mean or geometric mean of a population subject to results below the LOD. Lubin et al. (2004) took this a step further by focusing on regression models in which the dependent variable has results below the LOD.

0045-6535/$ - see front matter Published by Elsevier Ltd.

doi:10.1016/j.chemosphere.2006.12.013

* Corresponding author. Tel.: +1 770 488 4622; fax: +1 770 488 4192.
E-mail address: [email protected] (S.P. Caudill).

Regardless of the problem with results below the LOD, many investigators prefer to describe environmental exposure data with percentiles rather than with summary measures of central tendency and dispersion alone, because such data tend to be skewed and may not be exactly log-normal. In such cases, percentiles often provide a more thorough and accurate characterization than geometric means and standard deviations do. Huybrechts et al. (2002) recently published an evaluation of various maximum likelihood estimation (MLE) methods and MLE-based imputation methods for estimating the median and interquartile range, but they considered only fixed LODs. Our focus in this paper is on the estimation of percentiles and their confidence intervals, and on how these estimates are affected by: (1) fixed and variable LODs, (2) the percentage of results below the LOD, and (3) sample size. We compare single- and multiple-imputation methods for replacing results below the LOD, using a limited number of simulations to determine how these three factors affect the bias of various percentile estimates and the coverage probability of their confidence intervals. We also present percentile estimates and associated confidence intervals for 2,3,7,8-TCDD using data from a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population (Agency for Toxic Substances and Disease Registry, 2005).

2. Statistical methods and analysis

For multiple-imputation calculations, individual measurements are assumed to be log-normally distributed, and the second multiple-imputation procedure described by Lynn (2001) is used to impute values below the LOD. Using this method, missing values of the base-10 logarithm of congener results are sampled from $f(X_{\mathrm{mis}} \mid X_{\mathrm{obs}}, X_{\mathrm{mis}} < c;\ \mu^*, \sigma^{*2})$ by first drawing a random variance $\sigma^{*2} \sim (n-1)\hat{\sigma}^2/\chi^2_{n-1}$ and then a random mean $\mu^* \sim N(\hat{\mu}, \sigma^{*2}/n)$, where $n$ ($= n_{\mathrm{mis}} + n_{\mathrm{obs}}$) is the total number of subjects, $\hat{\mu}$ is the maximum likelihood estimate (MLE) of $\mu$, and $\hat{\sigma}^2$ is the MLE of $\sigma^2$. $X_{\mathrm{mis}}$ is then sampled from the lower tail of $N(\mu^*, \sigma^{*2})$, restricted to values less than $c$ [$= \log_{10}(\mathrm{LOD})$]. This process is repeated $m$ times, as described by Rubin (1987), to create $m$ sets of results. Each of these $m$ data sets is then analyzed independently to estimate various percentiles along with their confidence limits.
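The imputation draw described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `impute_below_lod` is hypothetical, the MLEs $\hat{\mu}$ and $\hat{\sigma}^2$ are assumed to have been computed beforehand (e.g., from a censored-normal fit), the $\chi^2_{n-1}$ variate is generated through its gamma representation (shape $(n-1)/2$, scale 2), and each missing log10 value is drawn by inverse-CDF sampling from $N(\mu^*, \sigma^{*2})$ truncated above at that result's own censoring limit $c$, which accommodates variable LODs.

```python
import math
import random
from statistics import NormalDist


def impute_below_lod(log_lods, mu_hat, var_hat, n, rng):
    """Draw one set of imputed log10 values for the results below the LOD.

    Hypothetical helper sketching a Lynn-style draw:
      log_lods -- censoring limits c = log10(LOD), one per censored result
      mu_hat, var_hat -- MLEs of the mean and variance of the log10 results
                         (assumed to be estimated beforehand)
      n -- total number of subjects, n_mis + n_obs
      rng -- a random.Random instance
    """
    # Random variance: sigma*^2 = (n - 1) * var_hat / chi-square(n - 1),
    # where a chi-square(k) variate is Gamma(shape=k/2, scale=2).
    chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
    var_star = (n - 1) * var_hat / chi2
    # Random mean: mu* ~ N(mu_hat, sigma*^2 / n)
    mu_star = rng.gauss(mu_hat, math.sqrt(var_star / n))
    dist = NormalDist(mu_star, math.sqrt(var_star))

    imputed = []
    for c in log_lods:
        # Inverse-CDF draw from N(mu*, sigma*^2) restricted to values below c
        p_c = dist.cdf(c)
        u = p_c * (1.0 - rng.random())        # uniform on (0, p_c]
        u = min(max(u, 1e-15), 1.0 - 1e-15)   # keep inv_cdf's argument in (0, 1)
        imputed.append(min(dist.inv_cdf(u), c))
    return imputed


# Repeating the draw m times yields m completed data sets (Rubin, 1987).
rng = random.Random(2007)
draws = [impute_below_lod([-0.6, -0.4, -0.5], -0.10, 0.397, 415, rng)
         for _ in range(5)]
```

Because a fresh $(\sigma^{*2}, \mu^*)$ pair is drawn for every imputation, the spread among the $m$ completed data sets reflects uncertainty in the distribution parameters as well as in the individual missing values.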

The $m$ within-imputation estimates of a given percentile are averaged to arrive at a final percentile estimate. According to Rubin (1987), the variance of a mean or geometric mean estimate is obtained by combining the average of the $m$ within-imputation variance estimates with the among-imputation variance of the $m$ estimates; for percentiles, however, within-imputation variance estimates are not readily available. A different approach is therefore used to obtain confidence intervals that account for the additional variability resulting from multiple-imputation. This approach uses the cumulative binomial distribution to estimate confidence limits after adjusting the sample size to account for the increase in variance resulting from multiple-imputation. The method, adapted from those described by Woodruff (1952) and by Korn and Graubard (1998), is as follows:

Step 1: Separately for each of the $m$ data sets, the empirical distribution of results is used to determine the value $X_i$ ($i = 1, 2, \ldots, m$) associated with a selected percentile. Then the number of results ($k_i$) below this $X_i$ value is computed. To simplify notation below, the subscript $i$ on $k$ is dropped.

Step 2: Next, the Clopper and Pearson (1934) 95% confidence interval limits, $P_L(k, n)$ and $P_U(k, n)$, for the $i$th data set are computed as follows:

$$P_L(k, n) = \frac{v_1 F_{v_1, v_2}(0.025)}{v_2 + v_1 F_{v_1, v_2}(0.025)} \quad \text{and} \quad P_U(k, n) = \frac{v_3 F_{v_3, v_4}(0.975)}{v_4 + v_3 F_{v_3, v_4}(0.975)} \qquad (1)$$

where $n$ is the total number of results as defined earlier (i.e., $n = n_{\mathrm{mis}} + n_{\mathrm{obs}}$), $k$ is the number of results below $X_i$, $v_1 = 2k$, $v_2 = 2(n - k + 1)$, $v_3 = 2(k + 1)$, $v_4 = 2(n - k)$, and $F_{d_1, d_2}(b)$ is the $b$ quantile of an $F$ distribution with $d_1$ and $d_2$ degrees of freedom.

Step 3: The empirical distribution of the sample results is then used again to determine the values $X_{L95}$ corresponding to $P_L(k, n)$ and $X_{U95}$ corresponding to $P_U(k, n)$.

Step 4: To estimate the relative increase in the variability of the percentile estimate resulting from multiple-imputation, one-quarter of the width of the 95% confidence interval around the base-10 logarithm of the percentile (the width being $\log_{10}[X_{U95}] - \log_{10}[X_{L95}]$) is first used to approximate the within-imputation variance. After this variance has been estimated for each of the $m$ imputations, the average is computed to represent the within-imputation variance, $\bar{W}$. The among-imputation variance, $B$, is calculated as the variance of the $m$ values of $\log_{10}[X_i]$ and represents the increase in variance resulting from multiple-imputation. The total variance is estimated by adding the average within-imputation variance estimate to $(1 + 1/m)$ times the among-imputation variance estimate. The multiple-imputation design effect is then computed as the ratio of this total variance estimate to the within-imputation variance estimate, $D = [\bar{W} + (1 + 1/m)B]/\bar{W}$. The factor $D$ is similar to the design effect in sample survey theory (i.e., the complex sample design variance divided by the simple random sample variance); here, it quantifies the increase in variance resulting from multiple-imputation.

Step 5: For the final percentile estimate, the $\log_{10}[X_i]$ values are averaged across the $m$ imputation data sets, and the empirical distribution of these averages is used to determine the average $\log_{10}[X_i]$ value (L10XAVE) associated with a selected percentile. Steps 2 and 3 are then repeated with the sample size $n$ replaced by a reduced sample size $n_r = n/D$. This reduced sample size incorporates the additional variability in the percentile estimate resulting from multiple-imputation. Changing the sample size also requires changing the value of $k$ in Eq. (1), which is replaced by $k_r = k/D$, where $k$ is now the number of results below the L10XAVE value. The resulting equations corresponding to the sample-size reduction are given by

$$P_L(k_r, n_r) = \frac{v_1 F_{v_1, v_2}(0.025)}{v_2 + v_1 F_{v_1, v_2}(0.025)} \quad \text{and} \quad P_U(k_r, n_r) = \frac{v_3 F_{v_3, v_4}(0.975)}{v_4 + v_3 F_{v_3, v_4}(0.975)} \qquad (2)$$

where $v_1 = 2k_r$, $v_2 = 2(n_r - k_r + 1)$, $v_3 = 2(k_r + 1)$, $v_4 = 2(n_r - k_r)$, and $F_{d_1, d_2}(b)$ is the $b$ quantile of an $F$ distribution with $d_1$ and $d_2$ degrees of freedom.

Step 6: The empirical distribution of the L10XAVE results is then used again to determine the values L10XAVE_L95 corresponding to $P_L(k_r, n_r)$ and L10XAVE_U95 corresponding to $P_U(k_r, n_r)$. Finally, L10XAVE, L10XAVE_L95, and L10XAVE_U95 are back-transformed to produce the final percentile estimate with its corresponding confidence limits.
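The confidence-limit and design-effect calculations of Steps 2 to 5 can be sketched in Python with only the standard library. This is a sketch under stated assumptions, not the authors' implementation. Eqs. (1) and (2) are evaluated through the equivalent beta-quantile form of the Clopper–Pearson limits: if $X \sim \mathrm{Beta}(v_1/2, v_2/2)$, then $v_2 X / [v_1(1 - X)]$ follows $F_{v_1, v_2}$, so $P_L$ is the 0.025 quantile of $\mathrm{Beta}(k, n - k + 1)$ and $P_U$ is the 0.975 quantile of $\mathrm{Beta}(k + 1, n - k)$. This form also accepts the non-integer $k_r$ and $n_r$ of Step 5. The incomplete beta function uses the standard continued-fraction expansion; all function names are hypothetical, and squaring one-quarter of the interval width to obtain a within-imputation variance is our reading of Step 4, not a detail stated in the text.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b):
    """q quantile of Beta(a, b) by bisection (betainc is increasing in x)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if betainc(a, b, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(k, n, alpha=0.05):
    """(P_L, P_U) of Eqs. (1)-(2); k and n need not be integers."""
    p_lower = 0.0 if k <= 0 else beta_ppf(alpha / 2, k, n - k + 1)
    p_upper = 1.0 if k >= n else beta_ppf(1 - alpha / 2, k + 1, n - k)
    return p_lower, p_upper


def design_effect(log_pcts, log_lowers, log_uppers):
    """Step 4 design effect D from m imputations (hypothetical helper).

    log_pcts[i] is log10(X_i); log_lowers[i] and log_uppers[i] are
    log10(X_L95) and log10(X_U95) for imputation i.
    """
    m = len(log_pcts)
    # Assumption: one-quarter of the 95% interval width is treated as a
    # standard error and squared to give each within-imputation variance.
    within = statistics.fmean(((u - lo) / 4.0) ** 2
                              for lo, u in zip(log_lowers, log_uppers))
    among = statistics.variance(log_pcts)  # variance of the m log10(X_i)
    total = within + (1.0 + 1.0 / m) * among
    return total / within


# Step 5: divide n and k by D, then recompute the limits as in Eq. (2).
D = design_effect([0.00, 0.20], [-0.20, 0.00], [0.20, 0.40])
n, k = 415, 207
p_l, p_u = clopper_pearson(k / D, n / D)
```

Because $D = 1 + (1 + 1/m)B/\bar{W} \geq 1$, the reduced sample size $n_r$ never exceeds $n$, so the recomputed limits are generally wider than those from any single imputed data set.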

2.1. Simulation study

To evaluate the effects of fixed and variable LODs on estimates of various percentiles, we generated 1000 log-normal data sets at each of two sample sizes, 100 and 500. The simulated log-normal data had a mean of −0.10 and a variance of 0.397 on the log base(10) scale. These values correspond to the MLEs of the mean and variance of the log base(10)-transformed 2,3,7,8-TCDD values in a sample of 415 subjects from a 2001 background exposure study conducted in Louisiana. We based the simulation on this specific data set so that we could explore the effects of variable censoring limits, which may be unique to this particular data set and to the Centers for Disease Control and Prevention (CDC) laboratory performing the analyses. The variance of the log base(10)-transformed censoring limits was 0.044, so we also simulated variable censoring limits with this variance. We censored results from each of these 2000 data sets using various fixed and variable censoring limits to achieve censoring fractions from approximately 5% to 70%, resulting in 2000 uncensored data sets, 2000 data sets with fixed censoring limits, and 2000 data sets with variable censoring limits. The fixed censoring limits were chosen so that approximately 5%, 25%, 50%, or 70% of the log-transformed data would be censored. Variable censoring limits were obtained by adding random normal deviates to the fixed censoring limits. For each of these 8000 data sets we computed the 25th, 50th, 75th, 90th, and 95th percentiles, their 95% confidence limits, and their biases relative to the true percentiles.

The censoring-limit variance of 0.044 mentioned earlier corresponds to the 255 subjects with results below the LOD in the Louisiana background exposure study. This censoring-limit variance is approximately 10% of the MLE of the variance of the base-10 logarithm of the 415 TCDD measurements. Because background TCDD exposure levels are age-dependent, stratification of the full data set into age groups is also of interest. Across the four age groups considered in the Louisiana study, the censoring-limit variance ranges from 10% to 20% of the full-sample variance of the TCDD measurements. Thus, to determine the effect of censoring-limit variance on the variable-censoring simulations, and to allow validity checks of the fixed-censoring and uncensored simulation results, we also re-ran all the simulations using a censoring-limit variance equal to 20% of the variance of the simulated data.
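One replicate of the simulation design described above might be generated as in the following Python sketch, which is not the authors' simulation code. The function name `simulate_replicate` is hypothetical; the fixed limit is assumed to be the theoretical quantile of the generating log10 distribution at the target censoring fraction; and each variable limit is assumed to be the fixed limit plus an independent normal deviate whose variance is the stated percentage (10% or 20%) of the data variance, following the table footnotes.

```python
import math
import random
from statistics import NormalDist

MEAN_LOG10 = -0.10   # mean of the simulated log10 values
VAR_LOG10 = 0.397    # variance of the simulated log10 values


def simulate_replicate(n, target_fraction, limit_var_pct, rng):
    """Return log10 values with fixed and variable censoring limits (sketch)."""
    sd = math.sqrt(VAR_LOG10)
    logs = [rng.gauss(MEAN_LOG10, sd) for _ in range(n)]
    # Assumed rule: fixed limit = quantile of the generating distribution
    fixed_limit = NormalDist(MEAN_LOG10, sd).inv_cdf(target_fraction)
    # Variable limits: fixed limit plus N(0, limit variance) deviates, where the
    # limit variance is a percentage (e.g., 10 or 20) of the data variance
    limit_sd = math.sqrt(VAR_LOG10 * limit_var_pct / 100.0)
    variable_limits = [fixed_limit + rng.gauss(0.0, limit_sd) for _ in range(n)]
    return {
        "log_values": logs,
        "fixed_censored": [x < fixed_limit for x in logs],
        "variable_censored": [x < c for x, c in zip(logs, variable_limits)],
        "variable_limits": variable_limits,
    }


rng = random.Random(42)
rep = simulate_replicate(500, 0.25, 10, rng)
frac_fixed = sum(rep["fixed_censored"]) / len(rep["log_values"])
frac_variable = sum(rep["variable_censored"]) / len(rep["log_values"])
```

Under this construction the fixed and variable censoring fractions differ slightly even for the same target, which is consistent with the separate F and V fractions reported in the tables.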

2.2. Comparison of single-imputation and multiple-imputation estimates of percentiles

We compared single-imputation and multiple-imputation estimates of the 25th, 50th, 75th, 90th, and 95th percentiles using variably censored 2,3,7,8-TCDD results from a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population. For the single-imputation estimates, each result below the LOD is replaced by its own (variable) censoring limit divided by either 2 or the square root of 2. For the multiple-imputation estimates, we use the method described above.
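The single-imputation substitution can be sketched in Python as follows. The helper names and the small data set are hypothetical, and the percentile is computed with the default ("exclusive") method of `statistics.quantiles`, which is only one of several empirical-percentile conventions; the text does not state which convention was used.

```python
import math
import statistics


def single_impute(values, lods, detected, divisor=2.0):
    """Replace each nondetect by its own LOD divided by `divisor` (2 or sqrt(2))."""
    return [v if det else lod / divisor
            for v, lod, det in zip(values, lods, detected)]


def percentile(data, p):
    """p-th percentile (p = 1..99) using the 'exclusive' quantile method."""
    return statistics.quantiles(data, n=100)[p - 1]


# Illustrative data: None marks a result below its (variable) LOD
values = [1.2, None, 3.4, None, 2.2, 5.1, 0.9, 4.0]
lods = [0.5, 0.8, 0.5, 0.6, 0.5, 0.5, 0.5, 0.5]
detected = [v is not None for v in values]

half = single_impute(values, lods, detected, divisor=2.0)
root2 = single_impute(values, lods, detected, divisor=math.sqrt(2.0))
medians = (percentile(half, 50), percentile(root2, 50))
```

In this toy example both substitution rules give the same median because fewer than half of the results are censored; lower percentiles, which fall among the substituted values, depend directly on the choice of divisor.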

3. Results

3.1. Simulation studies

Simulation results are presented in Tables 1–5, which display the average bias and coverage of the percentile estimates as a function of sample size, variable censoring-limit variance, fraction of results below the LOD, and type of imputation. Bias and coverage are also given for the uncensored data (i.e., the complete data before censoring). Thus, the column labeled “Fraction < LOD” does not apply to these estimates, except in the sense that they were obtained from the same data set that was later censored to the extent indicated. The standard error of the coverage estimates for 95% confidence limits is 0.0069. Differences among bias estimates in the uncensored column at the same sample size result from random simulation error, because these data were generated under identical conditions. Similarly, differences among bias estimates in the fixed censoring-limit columns at the same sample size and fraction of results below the LOD also result from random simulation error, because these data were likewise generated under identical conditions.
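As a check on the stated standard error, treating each coverage estimate as a binomial proportion from 1000 independent repetitions with a true coverage of 0.95 gives

$$\mathrm{SE} = \sqrt{\frac{0.95 \times 0.05}{1000}} = \sqrt{4.75 \times 10^{-5}} \approx 0.0069$$

By this reckoning, simulated coverages within about two standard errors of nominal, roughly 0.936 to 0.964, are consistent with 95% coverage.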

Table 1 shows slight bias in estimation of the 25th percentile for sample sizes of 100 and 500 even when no

Table 1
Bias of 25th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)

Each cell gives the bias of the 25th percentile estimates, with the coverage of their 95% limits in parentheses. LOD/2 and LOD/√2 denote single-imputation estimates and MI denotes multiple-imputation, under either a fixed or a variable censoring limit.

| Sample size | Variable censoring-limit variance (%)ᵃ | Fraction < LODᵇ (F/V)ᶜ | Uncensored | Fixed: LOD/2 | Fixed: LOD/√2 | Fixed: MI | Variable: LOD/2 | Variable: LOD/√2 | Variable: MI |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 10 | 0.04/0.01 | 0.026 (0.963) | 0.026 (0.963) | 0.026 (0.963) | 0.026 (1.00) | 0.024 (0.963) | 0.026 (0.963) | 0.024 (1.00) |
| | 20 | 0.06/0.10 | 0.021 (0.968) | 0.021 (0.968) | 0.021 (0.968) | 0.021 (1.00) | 0.014 (0.967) | 0.033 (0.965) | 0.009 (1.00) |
| | 10 | 0.35/0.24 | 0.017 (0.973) | −0.069 (0.999) | 0.070 (0.999) | −0.228 (1.00) | −0.013 (0.964) | 0.192 (0.663) | −0.148 (1.00) |
| | 20 | 0.28/0.32 | 0.020 (0.959) | −0.056 (0.997) | 0.077 (0.997) | −0.215 (1.00) | 0.078 (0.948) | 0.251 (0.588) | −0.096 (1.00) |
| | 10 | 0.55/0.57 | 0.028 (0.962) | 0.959 (0.000) | 1.33 (0.000) | −0.146 (0.999) | 0.395 (0.237) | 0.918 (0.001) | −0.246 (1.00) |
| | 20 | 0.41/0.37 | 0.023 (0.971) | 0.960 (0.000) | 1.33 (0.000) | −0.153 (1.00) | 0.388 (0.224) | 0.861 (0.147) | −0.297 (1.00) |
| | 10 | 0.69/0.69 | 0.020 (0.966) | 3.717 (0.000) | 4.610 (0.000) | −0.084 (0.967) | 1.818 (0.000) | 2.934 (0.000) | −0.223 (0.982) |
| | 20 | 0.61/0.80 | 0.018 (0.950) | 3.302 (0.000) | 4.117 (0.000) | −0.137 (0.981) | 1.460 (0.000) | 2.383 (0.000) | −0.346 (0.997) |
| 500 | 10 | 0.09/0.09 | 0.007 (0.952) | 0.007 (0.952) | 0.007 (0.952) | 0.007 (0.952) | 0.006 (0.952) | 0.007 (0.950) | 0.006 (1.00) |
| | 20 | 0.08/0.09 | 0.007 (0.961) | 0.007 (0.961) | 0.007 (0.961) | 0.007 (0.961) | 0.002 (0.960) | 0.013 (0.958) | 0.000 (1.00) |
| | 10 | 0.32/0.32 | 0.004 (0.955) | −0.149 (1.00) | 0.008 (1.00) | −0.297 (1.00) | −0.015 (0.947) | 0.179 (0.266) | −0.144 (1.00) |
| | 20 | 0.27/0.15 | 0.009 (0.948) | −0.148 (1.00) | 0.008 (1.00) | −0.295 (1.00) | 0.061 (0.830) | 0.216 (0.365) | −0.097 (1.00) |
| | 10 | 0.54/0.50 | 0.006 (0.953) | 0.932 (0.000) | 1.30 (0.000) | −0.152 (1.00) | 0.355 (0.257) | 0.866 (0.000) | −0.284 (1.00) |
| | 20 | 0.52/0.52 | 0.007 (0.942) | 0.932 (0.000) | 1.30 (0.000) | −0.148 (1.00) | 0.331 (0.033) | 0.796 (0.000) | −0.346 (1.00) |
| | 10 | 0.76/0.82 | 0.006 (0.948) | 3.638 (0.000) | 4.515 (0.000) | −0.139 (1.00) | 1.754 (0.000) | 2.857 (0.000) | −0.332 (1.00) |
| | 20 | 0.74/0.70 | 0.003 (0.950) | 3.637 (0.000) | 4.515 (0.000) | −0.147 (1.00) | 1.585 (0.000) | 2.575 (0.000) | −0.425 (1.00) |

ᵃ The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring-limit variance associated with the 255 2,3,7,8-TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring-limit variance observed in four sub-samples of the 415 subjects based on age groups.

ᵇ LOD is the limit of detection.

ᶜ The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.


Table 2
Bias of 50th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)

Each cell gives the bias of the 50th percentile estimates, with the coverage of their 95% limits in parentheses. LOD/2 and LOD/√2 denote single-imputation estimates and MI denotes multiple-imputation, under either a fixed or a variable censoring limit.

| Sample size | Variable censoring-limit variance (%)ᵃ | Fraction < LODᵇ (F/V)ᶜ | Uncensored | Fixed: LOD/2 | Fixed: LOD/√2 | Fixed: MI | Variable: LOD/2 | Variable: LOD/√2 | Variable: MI |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 10 | 0.04/0.01 | 0.012 (0.964) | 0.012 (0.964) | 0.012 (0.964) | 0.012 (0.964) | 0.012 (0.969) | 0.012 (0.964) | 0.012 (1.00) |
| | 20 | 0.06/0.10 | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.969) | 0.024 (0.969) | 0.024 (0.969) |
| | 10 | 0.35/0.24 | 0.007 (0.962) | 0.007 (0.962) | 0.007 (0.962) | 0.007 (0.974) | 0.001 (0.962) | 0.014 (0.963) | −0.002 (0.975) |
| | 20 | 0.28/0.32 | 0.017 (0.966) | 0.017 (0.966) | 0.017 (0.966) | 0.017 (0.975) | 0.013 (0.971) | 0.053 (0.963) | −0.013 (0.972) |
| | 10 | 0.55/0.57 | 0.020 (0.965) | −0.067 (0.986) | 0.012 (0.986) | −0.187 (0.988) | −0.022 (0.955) | 0.179 (0.584) | −0.289 (0.968) |
| | 20 | 0.41/0.37 | 0.013 (0.971) | −0.082 (0.991) | 0.003 (0.991) | −0.212 (0.991) | 0.021 (0.900) | 0.233 (0.401) | −0.323 (0.950) |
| | 10 | 0.69/0.69 | 0.030 (0.974) | 0.773 (0.000) | 1.109 (0.000) | −0.481 (0.202) | 0.542 (0.025) | 1.109 (0.000) | −0.536 (0.274) |
| | 20 | 0.61/0.80 | −0.003 (0.961) | 0.617 (0.000) | 0.923 (0.000) | −0.508 (0.184) | 0.506 (0.239) | 1.108 (0.000) | −0.590 (0.316) |
| 500 | 10 | 0.09/0.09 | 0.007 (0.949) | 0.007 (0.949) | 0.007 (0.949) | 0.007 (0.949) | 0.007 (0.949) | 0.007 (0.949) | 0.007 (0.949) |
| | 20 | 0.08/0.09 | 0.006 (0.956) | 0.006 (0.956) | 0.006 (0.956) | 0.006 (0.956) | 0.006 (0.956) | 0.006 (0.956) | 0.006 (0.956) |
| | 10 | 0.32/0.32 | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.972) | 0.001 (0.955) | 0.011 (0.959) | −0.000 (0.969) |
| | 20 | 0.27/0.15 | 0.004 (0.949) | 0.004 (0.949) | 0.004 (0.949) | 0.004 (0.960) | 0.002 (0.944) | 0.037 (0.914) | −0.016 (0.962) |
| | 10 | 0.54/0.50 | 0.004 (0.947) | −0.140 (0.984) | −0.053 (0.984) | −0.223 (0.981) | −0.075 (0.936) | 0.132 (0.246) | −0.342 (0.782) |
| | 20 | 0.52/0.52 | 0.004 (0.944) | −0.142 (0.982) | 0.054 (0.982) | −0.230 (0.981) | −0.024 (0.636) | 0.188 (0.320) | −0.371 (0.629) |
| | 10 | 0.76/0.82 | 0.005 (0.964) | 0.743 (0.000) | 1.073 (0.000) | −0.505 (0.000) | 0.501 (0.000) | 1.062 (0.000) | −0.596 (0.000) |
| | 20 | 0.74/0.70 | 0.002 (0.959) | 0.743 (0.000) | 1.073 (0.000) | −0.508 (0.000) | 0.579 (0.000) | 1.135 (0.000) | −0.640 (0.003) |

ᵃ The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring-limit variance associated with the 255 2,3,7,8-TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring-limit variance observed in four sub-samples of the 415 subjects based on age groups.

ᵇ LOD is the limit of detection.

ᶜ The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.


Table 3
Bias of 75th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)

Each cell gives the bias of the 75th percentile estimates, with the coverage of their 95% limits in parentheses. LOD/2 and LOD/√2 denote single-imputation estimates and MI denotes multiple-imputation, under either a fixed or a variable censoring limit.

| Sample size | Variable censoring-limit variance (%)ᵃ | Fraction < LODᵇ (F/V)ᶜ | Uncensored | Fixed: LOD/2 | Fixed: LOD/√2 | Fixed: MI | Variable: LOD/2 | Variable: LOD/√2 | Variable: MI |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 10 | 0.04/0.01 | 0.012 (0.961) | 0.012 (0.961) | 0.012 (0.961) | 0.012 (0.961) | 0.012 (0.961) | 0.012 (0.961) | 0.012 (0.961) |
| | 20 | 0.06/0.10 | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) | 0.024 (0.968) |
| | 10 | 0.35/0.24 | −0.005 (0.968) | −0.005 (0.968) | −0.005 (0.968) | −0.005 (0.968) | −0.005 (0.968) | −0.005 (0.968) | −0.005 (0.968) |
| | 20 | 0.28/0.32 | 0.016 (0.963) | 0.016 (0.963) | 0.016 (0.963) | 0.016 (0.963) | 0.016 (0.963) | 0.017 (0.963) | 0.015 (0.963) |
| | 10 | 0.55/0.57 | 0.020 (0.953) | 0.020 (0.953) | 0.020 (0.953) | 0.020 (0.953) | 0.018 (0.953) | 0.028 (0.953) | 0.016 (0.960) |
| | 20 | 0.41/0.37 | 0.013 (0.953) | 0.013 (0.95) | 0.013 (0.953) | 0.013 (0.960) | 0.017 (0.952) | 0.057 (0.949) | −0.003 (0.962) |
| | 10 | 0.69/0.69 | 0.025 (0.965) | −0.035 (0.981) | 0.012 (0.981) | −0.140 (0.991) | −0.017 (0.893) | 0.216 (0.355) | −0.354 (0.947) |
| | 20 | 0.61/0.80 | 0.001 (0.950) | −0.040 (0.983) | −0.009 (0.983) | −0.110 (0.985) | 0.021 (0.685) | 0.276 (0.358) | −0.456 (0.827) |
| 500 | 10 | 0.09/0.09 | 0.006 (0.961) | 0.006 (0.961) | 0.006 (0.961) | 0.006 (0.961) | 0.006 (0.961) | 0.006 (0.961) | 0.006 (0.961) |
| | 20 | 0.08/0.09 | 0.002 (0.962) | 0.002 (0.962) | 0.002 (0.962) | 0.002 (0.962) | 0.002 (0.962) | 0.002 (0.962) | 0.002 (0.962) |
| | 10 | 0.32/0.32 | 0.005 (0.953) | 0.005 (0.953) | 0.005 (0.953) | 0.005 (0.953) | 0.005 (0.953) | 0.005 (0.953) | 0.005 (0.953) |
| | 20 | 0.27/0.15 | 0.004 (0.943) | 0.004 (0.943) | 0.004 (0.943) | 0.004 (0.943) | 0.004 (0.943) | 0.005 (0.945) | 0.004 (0.948) |
| | 10 | 0.54/0.50 | 0.003 (0.952) | 0.003 (0.952) | 0.003 (0.952) | 0.003 (0.959) | 0.001 (0.952) | 0.008 (0.949) | 0.001 (0.963) |
| | 20 | 0.52/0.52 | −0.001 (0.940) | −0.001 (0.940) | −0.001 (0.940) | −0.001 (0.946) | 0.003 (0.946) | 0.039 (0.910) | −0.013 (0.958) |
| | 10 | 0.76/0.82 | 0.002 (0.950) | −0.048 (0.974) | −0.021 (0.974) | −0.095 (0.984) | −0.065 (0.651) | 0.178 (0.010) | −0.423 (0.675) |
| | 20 | 0.74/0.70 | −0.002 (0.961) | −0.048 (0.983) | −0.023 (0.983) | −0.092 (0.990) | 0.033 (0.053) | 0.323 (0.350) | −0.538 (0.437) |

ᵃ The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring-limit variance associated with the 255 2,3,7,8-TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring-limit variance observed in four sub-samples of the 415 subjects based on age groups.

ᵇ LOD is the limit of detection.

ᶜ The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.


Table 4
Bias of 90th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)

Each cell gives the bias of the 90th percentile estimates, with the coverage of their 95% limits in parentheses. LOD/2 and LOD/√2 denote single-imputation estimates and MI denotes multiple-imputation, under either a fixed or a variable censoring limit.

| Sample size | Variable censoring-limit variance (%)ᵃ | Fraction < LODᵇ (F/V)ᶜ | Uncensored | Fixed: LOD/2 | Fixed: LOD/√2 | Fixed: MI | Variable: LOD/2 | Variable: LOD/√2 | Variable: MI |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 10 | 0.04/0.01 | 0.020 (0.965) | 0.020 (0.965) | 0.020 (0.965) | 0.020 (0.965) | 0.020 (0.965) | 0.020 (0.965) | 0.020 (0.965) |
| | 20 | 0.06/0.10 | 0.033 (0.959) | 0.033 (0.959) | 0.033 (0.959) | 0.033 (0.959) | 0.033 (0.959) | 0.033 (0.959) | 0.033 (0.959) |
| | 10 | 0.35/0.24 | 0.010 (0.961) | 0.010 (0.961) | 0.010 (0.961) | 0.010 (0.961) | 0.010 (0.961) | 0.010 (0.961) | 0.010 (0.961) |
| | 20 | 0.278/0.32 | 0.030 (0.959) | 0.030 (0.959) | 0.030 (0.959) | 0.030 (0.959) | 0.030 (0.959) | 0.030 (0.959) | 0.030 (0.959) |
| | 10 | 0.55/0.57 | 0.017 (0.965) | 0.017 (0.965) | 0.017 (0.965) | 0.017 (0.965) | 0.017 (0.965) | 0.017 (0.965) | 0.017 (0.965) |
| | 20 | 0.41/0.37 | 0.021 (0.964) | 0.021 (0.964) | 0.021 (0.964) | 0.021 (0.964) | 0.021 (0.964) | 0.024 (0.964) | 0.021 (0.964) |
| | 10 | 0.69/0.69 | 0.031 (0.967) | 0.031 (0.967) | 0.031 (0.967) | 0.031 (0.967) | 0.029 (0.966) | 0.045 (0.966) | 0.026 (0.971) |
| | 20 | 0.61/0.80 | 0.023 (0.969) | 0.023 (0.969) | 0.023 (0.969) | 0.023 (0.969) | 0.032 (0.968) | 0.084 (0.954) | 0.002 (0.973) |
| 500 | 10 | 0.09/0.09 | 0.006 (0.959) | 0.006 (0.959) | 0.006 (0.959) | 0.006 (0.959) | 0.006 (0.959) | 0.006 (0.959) | 0.006 (0.959) |
| | 20 | 0.08/0.09 | −0.002 (0.954) | −0.002 (0.954) | −0.002 (0.954) | −0.002 (0.954) | −0.002 (0.954) | −0.002 (0.954) | −0.002 (0.954) |
| | 10 | 0.32/0.32 | 0.006 (0.952) | 0.006 (0.952) | 0.006 (0.952) | 0.006 (0.952) | 0.006 (0.952) | 0.006 (0.952) | 0.006 (0.952) |
| | 20 | 0.27/0.15 | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.960) | 0.006 (0.960) |
| | 10 | 0.54/0.50 | 0.008 (0.959) | 0.008 (0.959) | 0.008 (0.959) | 0.008 (0.959) | 0.008 (0.959) | 0.008 (0.959) | 0.008 (0.959) |
| | 20 | 0.52/0.52 | −0.000 (0.957) | −0.000 (0.957) | −0.000 (0.957) | −0.000 (0.957) | −0.000 (0.956) | 0.002 (0.955) | −0.001 (0.958) |
| | 10 | 0.76/0.82 | 0.006 (0.949) | 0.006 (0.949) | 0.006 (0.949) | 0.006 (0.953) | 0.004 (0.949) | 0.016 (0.948) | 0.003 (0.967) |
| | 20 | 0.74/0.70 | 0.001 (0.964) | 0.001 (0.964) | 0.001 (0.964) | 0.001 (0.964) | 0.015 (0.962) | 0.075 (0.861) | −0.016 (0.981) |

ᵃ The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring-limit variance associated with the 255 2,3,7,8-TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring-limit variance observed in four sub-samples of the 415 subjects based on age groups.

ᵇ LOD is the limit of detection.

ᶜ The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.


Table 5Bias of 95th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)

Sample size Variable censoringlimit variance (%)a

Fraction < LODb

F/VcUncensored Bias of 95th percentile estimates (coverage of 95% limits)

Fixed censoring limit Variable censoring limit

Single-imputation Multiple- imputation Single-imputation Multiple-imputationLOD/2 LOD/

p2 LOD/2 LOD/

p2

100 10 0.04/0.01 0.021 (0.955) 0.021 (0.955) 0.021 (0.955) 0.021 (0.955) 0.021 (0.955) 0.021 (0.955) 0.021 (0.955)20 0.06/0.10 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954)10 0.35/0.24 0.040 (0.954) 0.040 (0.954) 0.040 (0.954) 0.040 (0.954) 0.040 (0.954) 0.040 (0.954) 0.040 (0.954)20 0.28/0.32 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954) 0.041 (0.954)10 0.55/0.57 0.024 (0.952) 0.024 (0.952) 0.024 (0.952) 0.024 (0.952) 0.024 (0.952) 0.024 (0.952) 0.024 (0.952)20 0.41/0.37 0.036 (0.957) 0.036 (0.957) 0.036 (0.957) 0.036 (0.957) 0.036 (0.957) 0.036 (0.957) 0.036 (0.957)10 0.69/0.69 0.052 (0.964) 0.052 (0.964) 0.052 (0.964) 0.052 (0.964) 0.052 (0.964) 0.052 (0.964) 0.052 (0.964)20 0.61/0.80 0.032 (0.946) 0.032 (0.946) 0.032 (0.946) 0.032 (0.946) 0.034 (0.948) 0.046 (0.950) 0.030 (0.946)

500 10 0.09/0.09 0.007 (0.944) 0.007 (0.944) 0.007 (0.944) 0.007 (0.944) 0.007 (0.944) 0.007 (0.944) 0.007 (0.944)20 0.08/0.09 0.007 (0.949) 0.007 (0.949) 0.007 (0.949) 0.007 (0.949) 0.007 (0.949) 0.007 (0.949) 0.007 (0.949)10 0.32/0.32 0.008 (0.943) 0.008 (0.943) 0.008 (0.943) 0.008 (0.943) 0.008 (0.943) 0.008 (0.943) 0.008 (0.943)20 0.27/0.15 0.003 (0.945) 0.003 (0.945) 0.003 (0.945) 0.003 (0.945) 0.003 (0.945) 0.003 (0.945) 0.003 (0.945)10 0.54/0.50 0.014 (0.961) 0.014 (0.961) 0.014 (0.961) 0.014 (0.961) 0.014 (0.961) 0.014 (0.961) 0.014 (0.961)20 0.52/0.52 0.002 (0.966) 0.002 (0.966) 0.002 (0.966) 0.002 (0.966) 0.002 (0.966) 0.002 (0.966) 0.002 (0.966)10 0.76/0.82 0.003 (0.949) 0.003 (0.949) 0.003 (0.949) 0.003 (0.949) 0.003 (0.949) 0.004 (0.949) 0.003 (0.949)20 0.74/0.70 0.007 (0.957) 0.007 (0.957) 0.007 (0.957) 0.007 (0.957) 0.009 (0.959) 0.025 (0.963) 0.006 (0.964)

a The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring limit variance associated with the 255 2,3,7,8-TCDD (2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control and Prevention laboratories. Twenty percent corresponds to the largest censoring limit variance observed in four sub-samples of the 415 subjects based on age-groups.

b LOD is the limit of detection.
c The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.



censoring occurred (all rows of column 4, labeled “Uncensored”). For a sample size of 100 the average bias is 2.2% and for 500 the average bias is 0.6%. As long as the fraction of results below the LOD does not exceed 0.10, the 25th percentile estimates using single-imputation (LOD/2 or LOD/√2) or multiple-imputation under fixed censoring (columns 5 through 7) retain the same low bias as the uncensored results because all of the censored values are below the 25th percentile. Similar results are obtained for single- and multiple-imputation methods in estimating the 25th percentile when a variable censoring limit exists and the fraction of censored values does not exceed 0.10. Thus using multiple-imputation to estimate the 25th percentile does not appear to have an advantage over single-imputation when fixed or variable censoring limits exist and the fraction of results below the LOD is less than 10%. As the censoring fraction nears or exceeds 0.25 under fixed or variable censoring, the magnitude of bias relative to the average uncensored bias (which is 0.022 and 0.006 for sample sizes of 100 and 500, respectively) increases for all three imputation methods. When the fraction of results below the LOD is near 0.25 (rows 3 and 4 for each sample size), the variable censoring limit variance (column 2) does not seem to adversely affect the bias of the 25th percentile estimates for any of the three imputation methods except single-imputation using LOD/2 (third from last column, rows 3 and 4 for each sample size). The reason for this anomaly is not clear, but because none of the three methods has consistently low bias for this degree of censoring, it appears that a 25th percentile should not be estimated when close to 25% or more of the results are below the LOD.
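The fixed-limit behavior described above follows from ordering alone: with a single LOD, every imputed value (LOD/2 or LOD/√2) stays below every detected value, so the order statistics above the censored block, and any percentile interpolated between them, are unchanged. A minimal Python sketch under stated assumptions (log10-normal data, linear interpolation between order statistics, and hypothetical helper names `percentile`, `simulate_log10_normal`, and `single_impute` that do not come from the paper):

```python
import math
import random


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def simulate_log10_normal(n, mu, sigma, rng):
    """Draw n results whose base-10 logarithms are N(mu, sigma)."""
    return [10 ** rng.gauss(mu, sigma) for _ in range(n)]


def single_impute(results, lods, divisor):
    """Replace each result below its own LOD by LOD/divisor (2 or sqrt(2))."""
    return [x if x >= lod else lod / divisor for x, lod in zip(results, lods)]


rng = random.Random(1)
n = 500
data = simulate_log10_normal(n, mu=0.0, sigma=0.6, rng=rng)

# Fixed censoring limit placed so that about 10% of the results fall below it.
fixed_lod = percentile(data, 10)
lods = [fixed_lod] * n

p25_uncensored = percentile(data, 25)
for divisor in (2.0, math.sqrt(2.0)):
    p25_imputed = percentile(single_impute(data, lods, divisor), 25)
    print(f"divisor={divisor:.3f}  uncensored={p25_uncensored:.4f}  imputed={p25_imputed:.4f}")
```

Replacing `lods` with sample-specific limits removes this guarantee, because a high LOD can then censor a result that lies above detected results with lower LODs.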

Table 2 shows a slight bias in estimation of the 50th percentile for sample sizes of 100 and 500 even when no censoring occurred. For sample sizes of 100 the average bias is 1.5% and for 500 the average bias is 0.5% (all rows of column 4). As long as the fraction of censored values is no more than one-third, the 50th percentile estimates using single-imputation (LOD/2 or LOD/√2) or multiple-imputation under fixed censoring (columns 5 through 7) retain the same low bias as the uncensored results because all of the censored values are below the 50th percentile. Under variable censoring, when the fraction of censored results is at or near one-third and the variable censoring limit variance is 20%, single-imputation using LOD/√2 appears to be associated with a slight positive bias relative to uncensored results (next to last column, row 4 for both sample sizes). Bias associated with single-imputation using LOD/2 and with multiple-imputation under variable censoring may also differ slightly from that associated with uncensored results when one-third of results are censored, but the differences are minimal (compare column 4 with columns 8 and 10, rows 3 and 4 for both sample sizes). Thus there may be a slight advantage to multiple-imputation or single-imputation using LOD/2 as compared to single-imputation using LOD/√2 when estimating a 50th percentile with up to one-third of results censored and a variable censoring limit whose variance is as high as 20% of the variance of the measured samples. As the censoring fraction nears or exceeds one-half under fixed or variable censoring, the magnitude of bias relative to the average uncensored bias (which is 0.015 and 0.005 for sample sizes of 100 and 500, respectively) increases for all three imputation methods. As the censoring fraction exceeds 0.5 (see rows 7 and 8 for both sample sizes), single-imputation tends to be positively biased and multiple-imputation tends to be negatively biased. These results suggest that 50th percentile estimates should not be computed when the fraction of results below the LOD is near to or exceeds 0.5.
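Multiple-imputation, by contrast, fills each censored result with a random draw from the fitted log-normal distribution restricted to values below that result's own LOD, repeats the fill-in several times, and pools the completed-data percentiles. A minimal sketch, assuming the log10-scale mean and standard deviation are already known or estimated, inverse-CDF sampling from the truncated normal, and simple averaging across imputations as the pooling step (a simplification; the paper's own combining procedure is described in its methods section). The names `impute_below_lod` and `mi_percentile` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def impute_below_lod(lod, mu, sigma, rng):
    """One draw from a log10-normal(mu, sigma) distribution truncated above at lod."""
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(math.log10(lod))      # probability mass below the LOD (assumed 0 < upper < 1)
    u = upper * (1.0 - rng.random())       # uniform on (0, upper]
    return 10 ** dist.inv_cdf(u)           # inverse-CDF draw on the log10 scale


def mi_percentile(results, lods, mu, sigma, p, m=10, seed=0):
    """Average of the p-th percentile over m independently completed data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(m):
        completed = [x if x >= lod else impute_below_lod(lod, mu, sigma, rng)
                     for x, lod in zip(results, lods)]
        estimates.append(percentile(completed, p))
    return sum(estimates) / m


# Simulated example: log10 results ~ N(0, 0.6); variable LODs whose log10
# variance is 20% of the data's log10 variance. The true mu and sigma are
# passed in for brevity; in practice they would come from a censored fit.
rng = random.Random(2)
mu, sigma = 0.0, 0.6
data = [10 ** rng.gauss(mu, sigma) for _ in range(200)]
lods = [10 ** rng.gauss(0.1, math.sqrt(0.2) * sigma) for _ in data]
print(mi_percentile(data, lods, mu, sigma, p=50))
```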

Table 3 shows a slight bias in estimation of the 75th percentile for sample sizes of 100 and 500 even when no censoring occurred. For a sample size of 100, the average bias is 1.3% and for 500, it is 0.2%. As long as the fraction of censored values is no more than one-half, the 75th percentile estimates using single-imputation (LOD/2 or LOD/√2) or multiple-imputation under fixed censoring (columns 5 through 7) retain the same low bias as the uncensored results because all of the censored values are below the 75th percentile. Under variable censoring, when the fraction of censored results is at or near one-half and the variable censoring limit variance is 20%, single-imputation using LOD/√2 appears to be associated with a slight positive bias relative to uncensored results (next to last column, row 6 for both sample sizes). Bias associated with single-imputation using LOD/2 and with multiple-imputation under variable censoring may also differ slightly from that associated with uncensored results when one-half of results are censored, but the differences are minimal (compare column 4 with columns 8 and 10, rows 5 and 6 for both sample sizes). Thus there may be a slight advantage to multiple-imputation or single-imputation using LOD/2 as compared to single-imputation using LOD/√2 when estimating a 75th percentile with up to one-half of results censored and a variable censoring limit whose variance is as high as 20% of the variance of the measured samples. As the censoring fraction nears or exceeds seven-tenths under fixed or variable censoring, the magnitude of bias relative to the average uncensored bias (which is 0.013 and 0.002 for sample sizes of 100 and 500, respectively) increases for all three imputation methods. These results suggest that 75th percentile estimates should not be computed when the fraction of results below the LOD is near to or exceeds 0.7.
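Each table entry is a mean over 1000 simulated repetitions. The text reports bias as a relative quantity (for example 0.013, i.e. 1.3%), so one natural reading, assumed here, is the mean of (estimate − true percentile)/true percentile, with coverage as the share of repetitions whose 95% interval contains the true percentile. For log10-normal data with log10 mean μ and standard deviation σ, the true p-th percentile is 10^(μ + z_p σ), where z_p is the standard normal p-quantile. A minimal sketch with hypothetical names `true_percentile` and `summarize`:

```python
from statistics import NormalDist


def true_percentile(mu, sigma, p):
    """p-th percentile of a distribution whose log10 is N(mu, sigma)."""
    return 10 ** (mu + NormalDist().inv_cdf(p / 100.0) * sigma)


def summarize(estimates, intervals, truth):
    """Mean relative bias and coverage proportion across simulation repetitions."""
    reps = len(estimates)
    bias = sum((est - truth) / truth for est in estimates) / reps
    coverage = sum(lo <= truth <= hi for lo, hi in intervals) / reps
    return bias, coverage


# Toy input: three repetitions; the third interval misses the true value.
truth = true_percentile(0.0, 0.6, 75)
estimates = [truth * 1.02, truth * 0.99, truth * 1.03]
intervals = [(truth * 0.90, truth * 1.10), (truth * 0.95, truth * 1.05), (truth * 1.01, truth * 1.20)]
bias, coverage = summarize(estimates, intervals, truth)
print(f"bias={bias:.4f}  coverage={coverage:.3f}")
```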

Table 4 shows a slight bias in estimation of the 90th percentile for sample sizes of 100 and 500 even when no censoring occurred. For sample sizes of 100, the average bias is 2.3% and for 500, it is 0.4%. As long as the censoring fraction is no more than 0.7, the 90th percentile estimates using single- or multiple-imputation under fixed censoring are comparable to those that would have been obtained from uncensored samples; under variable censoring the same holds with a couple of exceptions (see the last three columns of the 7th and 8th rows for both sample sizes). When the fraction of results below the LOD is near to or above 0.70 and there is variable censoring, single-imputation appears to have an increased positive bias and multiple-imputation appears to have a small negative bias when the censoring limit variance is 20% of the variance associated with the base(10) logarithm of the measured results. So it appears that the size of the censoring limit variance can affect the estimation of a 90th percentile when the fraction of results below the LOD exceeds 70%.
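The exceptions under variable censoring have a mechanical explanation. With sample-specific LODs, a result can be censored by a high LOD even though its true value lies above the percentile being estimated, and a high LOD can also make LOD/2 or LOD/√2 land above that percentile. A quick diagnostic, sketched here with hypothetical names and simulated data (a fixed LOD at the 70th percentile versus variable LODs with the same center and a log10 variance equal to 20% of the data's log10 variance), counts censored results that single-imputation moves across the uncensored 90th percentile. Under a fixed LOD below the percentile the count is always zero.

```python
import math
import random


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def crossings(results, lods, p, divisor):
    """Censored results that LOD/divisor imputation moves across the uncensored p-th percentile."""
    cut = percentile(results, p)
    moved = 0
    for x, lod in zip(results, lods):
        if x < lod:                                  # result is censored
            imputed = lod / divisor
            if (x > cut) != (imputed > cut):         # true and imputed values on opposite sides
                moved += 1
    return moved


rng = random.Random(3)
n, sigma = 500, 0.6
data = [10 ** rng.gauss(0.0, sigma) for _ in range(n)]

fixed_lods = [percentile(data, 70)] * n
sd_log_lod = math.sqrt(0.2) * sigma                  # log10 LOD variance = 20% of data variance
variable_lods = [fixed_lods[0] * 10 ** rng.gauss(0.0, sd_log_lod) for _ in range(n)]

for divisor in (2.0, math.sqrt(2.0)):
    print(divisor, crossings(data, fixed_lods, 90, divisor), crossings(data, variable_lods, 90, divisor))
```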

Table 5 shows a slight bias in estimation of the 95th percentile for sample sizes of 100 and 500 even when no censoring occurred. For sample sizes of 100, the average bias is 3.6% and for 500, it is 0.6%. As long as the censoring fraction is no more than 0.75, the 95th percentile estimates using single- or multiple-imputation under fixed or variable censoring are equal to those that would have been obtained from uncensored samples, with the possible exception of single-imputation using LOD/√2 when there is a variable censoring limit with censoring limit variance as high as 20% of the variance of the measured samples (column 9, row 8 for both sample sizes). This result is not surprising because imputed values are not likely to take values at or above the 95th percentile. This is also why the entries in most rows of the table are identical across columns. Using multiple-imputation to estimate the 95th percentile does not appear to have an advantage over single-imputation using LOD/2 whether a fixed censoring limit or variable censoring limits exist, as long as the censoring fraction is less than 0.75.

3.2. Example

The simulation results were used to determine whether unbiased percentile estimates could be obtained for a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population. Subjects in this study had no known exposure to dioxin-like compounds other than exposure to background levels. Of the 415 measurements of 2,3,7,8-TCDD in pg/g lipid, 255 (61.5%) were below their corresponding limit of detection (LOD). The maximum likelihood estimate (MLE) of the mean of the log base(10) of these results was −0.086 and that of the standard deviation of the log base(10) of these results was 0.635.

Table 6
Percentile estimates and 95% confidence intervals for 2,3,7,8-TCDDa in pg/g lipid for 415 subjects from a 2001 study of a Louisiana population

Age-group  Sample size  Fraction < LODb  Method                      Percentile (95% confidence interval)
                                                                     50th             75th             90th             95th
All        415          0.615            Single-imputation LOD/2     0.9 (0.8, 1.2)   3.0 (2.6, 3.4)   5.1 (4.6, 6.0)   7.2 (5.9, 8.3)
                                         Single-imputation LOD/√2    1.2 (1.1, 1.5)   3.1 (2.7, 3.6)   5.3 (4.7, 6.0)   7.2 (6.0, 8.3)
                                         Multiple-imputation         0.6 (0.5, 0.7)   2.8 (2.5, 3.6)   5.1 (4.5, 6.0)   7.2 (5.9, 8.3)
[0, 30)    102          0.941            Single-imputation LOD/2     0.7 (0.6, 0.8)   0.9 (0.8, 1.1)   1.9 (1.0, 2.3)   2.3 (1.7, 5.0)
                                         Single-imputation LOD/√2    1.0 (0.8, 1.1)   1.3 (1.1, 1.6)   2.1 (1.4, 3.3)   3.3 (2.2, 7.0)
                                         Multiple-imputation         0.5 (0.4, 0.6)   0.6 (0.6, 0.8)   0.8 (0.6, 1.9)   1.9 (0.8, 2.4)
[30, 45)   101          0.693            Single-imputation LOD/2     0.8 (0.6, 0.9)   2.2 (1.2, 2.6)   3.3 (2.5, 4.3)   4.3 (3.1, 5.1)
                                         Single-imputation LOD/√2    1.1 (0.8, 1.2)   2.3 (1.5, 2.6)   3.3 (2.6, 4.4)   4.4 (3.1, 5.1)
                                         Multiple-imputation         0.6 (0.5, 0.7)   1.7 (0.7, 2.6)   3.2 (2.5, 4.3)   4.0 (3.1, 5.1)
[45, 60)   110          0.518            Single-imputation LOD/2     1.5 (0.8, 2.2)   3.2 (2.5, 3.7)   4.4 (3.7, 5.6)   5.6 (4.2, 8.4)
                                         Single-imputation LOD/√2    1.7 (1.1, 2.4)   3.4 (2.6, 3.8)   4.4 (3.7, 5.6)   5.6 (4.3, 8.4)
                                         Multiple-imputation         0.9 (0.5, 2.2)   3.2 (2.5, 3.8)   4.3 (3.7, 5.6)   5.9 (4.3, 8.4)
[60+)      102          0.314            Single-imputation LOD/2     3.5 (2.6, 4.6)   5.9 (4.8, 7.1)   8.3 (7.0, 11.7)  11.7 (7.5, 18.5)
                                         Single-imputation LOD/√2    3.5 (2.6, 4.6)   5.9 (4.8, 7.1)   8.3 (7.0, 11.7)  11.7 (7.5, 18.5)
                                         Multiple-imputation         3.5 (2.6, 4.6)   5.9 (4.8, 7.1)   8.3 (7.0, 11.7)  11.7 (7.5, 18.5)

Results presented by age-group, sample size, and method of estimation.
a 2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin.
b LOD is the limit of detection.


The MLE of the mean of the log10(LOD) values was 0.093 and the standard deviation of the log10(LOD) values was 0.215. Thus the variance of the log base(10) of the censoring limit values was approximately 10% of the variance of the log base(10) of the 2,3,7,8-TCDD measurements (0.215²/0.635² ≈ 0.11).
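The maximum likelihood fit quoted in this example treats each detected result as contributing the normal density of its log10 value and each result below the LOD as contributing the normal probability of falling below its own log10(LOD), a left-censored (Tobit-type) likelihood. A minimal, dependency-free sketch of that likelihood and a crude maximizer, assuming log10-normal data; `censored_loglik` and `fit_censored` are hypothetical names, the compass search stands in for the numerical optimizer a statistics package would use, and the simulated inputs (normally distributed log10 LODs) are an assumption shaped after the summary statistics above. The last lines reproduce the variance-ratio arithmetic.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def censored_loglik(mu, sigma, detected, censored_lods):
    """Left-censored normal log-likelihood; all inputs are on the log10 scale."""
    dist = NormalDist(mu, sigma)
    ll = sum(-0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
             for y in detected)
    ll += sum(math.log(max(dist.cdf(c), 1e-300)) for c in censored_lods)
    return ll


def fit_censored(detected, censored_lods, tol=1e-6):
    """Maximize the censored log-likelihood with a simple compass (pattern) search."""
    pooled = detected + censored_lods
    mu, sigma = fmean(pooled), stdev(pooled)         # crude starting values
    best = censored_loglik(mu, sigma, detected, censored_lods)
    step = 0.5
    while step > tol:
        improved = False
        for dmu, dsigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            m, s = mu + dmu, sigma + dsigma
            if s <= 0:
                continue
            ll = censored_loglik(m, s, detected, censored_lods)
            if ll > best:
                mu, sigma, best, improved = m, s, ll, True
        if not improved:
            step /= 2                                 # refine once no axis move helps
    return mu, sigma


# Simulated data: 415 results with log10 mean -0.086 and SD 0.635,
# and log10(LOD) assumed N(0.093, 0.215).
rng = random.Random(4)
logs = [rng.gauss(-0.086, 0.635) for _ in range(415)]
log_lods = [rng.gauss(0.093, 0.215) for _ in range(415)]
detected = [y for y, c in zip(logs, log_lods) if y >= c]
censored = [c for y, c in zip(logs, log_lods) if y < c]

mu_hat, sigma_hat = fit_censored(detected, censored)
print(f"censored fraction={len(censored) / 415:.3f}  mu_hat={mu_hat:.3f}  sigma_hat={sigma_hat:.3f}")

# Variance ratio from the text: var(log10 LOD) / var(log10 TCDD).
ratio = (0.215 / 0.635) ** 2
print(f"variance ratio = {ratio:.3f}")
```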

Given the simulation results for samples of size 500 and variable LODs with variance equal to 10% of the variance of the 2,3,7,8-TCDD measurements, it appears that we should be able to estimate the 75th, 90th, and 95th percentiles with less than 1% bias (Tables 3–5, row 13, columns 8–10). The 50th percentile, on the other hand, could be positively biased by as much as 13% (Table 2, row 13, column 9) using single-imputation with imputed values equal to LOD/√2 and negatively biased by as much as 34% (Table 2, row 13, column 10) using multiple-imputation.

The actual percentile estimates in pg/g lipid for all 415 subjects are presented in the rows of Table 6 labeled “All”. The single-imputation and multiple-imputation estimates of the 50th percentile differ substantially, as would be expected from the simulation results in Table 2 when variable LODs exist and 50% or more of results are below the LOD. The true 50th percentile is most likely between 0.6 and 1.2 because multiple-imputation tends to be negatively biased and single-imputation with imputed value equal to LOD/√2 tends to be positively biased. As expected from the simulations presented in Tables 3–5, the 75th, 90th, and 95th percentiles differ very little for the two methods.
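As a rough cross-check of that bracket, not a method the paper uses for Table 6, the log-normal distribution fitted by maximum likelihood (log10 mean −0.086 and log10 standard deviation 0.635, as reported in this example) implies percentiles 10^(μ̂ + z_p σ̂), where z_p is the standard normal p-quantile. A minimal sketch; the function name is hypothetical:

```python
from statistics import NormalDist

# MLEs of the log10 mean and SD of 2,3,7,8-TCDD (pg/g lipid) reported in the example.
MU_HAT, SIGMA_HAT = -0.086, 0.635


def lognormal_percentile(p, mu=MU_HAT, sigma=SIGMA_HAT):
    """p-th percentile implied by a fitted distribution whose log10 is N(mu, sigma)."""
    z = NormalDist().inv_cdf(p / 100.0)
    return 10 ** (mu + z * sigma)


for p in (50, 75, 90, 95):
    print(p, round(lognormal_percentile(p), 2))
```

The fitted median, 10^(−0.086) ≈ 0.82 pg/g lipid, falls inside the 0.6 to 1.2 range. The fitted upper percentiles need not agree closely with Table 6, since the imputation-based estimates there rest largely on the detected results and the data need not be exactly log-normal.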

Because there is interest in whether 2,3,7,8-TCDD levels are related to age, we also stratified the sample by age to see whether we could obtain unbiased percentile estimates for the resulting age-groups. The age-groups and their corresponding sample sizes are also presented in Table 6. Because the sample sizes are nearer to 100 than to 500, we used the simulation results for a sample size of 100 and a variable LOD with variance equal to about 10% of the variance of the 2,3,7,8-TCDD measurements to judge the likely reliability of the various estimates. More than 90% of results are below the LOD in the less-than-30-years age-group, so even the 95th percentile estimates will likely be biased.

For the 30-to-45-years age-group, which has almost 70% of results below the LOD, the results in Tables 4 and 5 suggest that we should be able to estimate the 90th percentile with approximately 3% bias and the 95th percentile with approximately 5% bias. The 50th percentile, on the other hand, could be positively biased by as much as 111% and the 75th percentile by as much as 22% (see the seventh row, next to last column of Tables 2 and 3 for a sample size of 100) using single-imputation with imputed values equal to LOD/√2. They could be negatively biased by as much as 54% for the 50th percentile and 35% for the 75th percentile (see the seventh row, last column of Tables 2 and 3 for a sample size of 100) using multiple-imputation.

For the 45-to-60-years age-group, which has approximately one-half of results below the LOD, the results in Tables 3–5 suggest that we should be able to estimate the 75th percentile with approximately 3% bias and the 90th and 95th percentiles with approximately 2% bias. The 50th percentile, on the other hand, could be positively biased by as much as 18% (see the fifth row, next to last column of Table 2 for a sample size of 100) using single-imputation with imputed values equal to LOD/√2 and negatively biased by as much as 29% (see the fifth row, last column of Table 2 for a sample size of 100) using multiple-imputation.

For the 60-plus-years age-group, which has close to one-third of results below the LOD, the results in Tables 2–5 suggest that we should be able to estimate the 50th, 75th, and 90th percentiles with approximately 1% bias, and the 95th percentile with approximately 4% bias.
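The per-group judgments above come from comparing each group's fraction below the LOD with the censoring fraction at which the simulations show each percentile degrading. The sketch below encodes that comparison. The cutoffs are a hypothetical reading of the thresholds stated in the text, with an assumed margin of 0.05 standing in for "near to"; they are an interpretation, not a rule published by the authors, and the names are hypothetical.

```python
# Hypothetical cutoffs read from the simulation discussion: a percentile is
# treated as estimable when the fraction of results below the LOD is at or
# below its cutoff. MARGIN is an assumed width for "near to".
MARGIN = 0.05
CUTOFFS = {
    25: 0.25 - MARGIN,  # "should not be estimated when close to 25% or more"
    50: 0.50 - MARGIN,  # "near to or exceeds 0.5"
    75: 0.70 - MARGIN,  # "near to or exceeds 0.7"
    90: 0.70,           # variable LODs matter once the fraction "exceeds 70%"
    95: 0.75,           # estimates match uncensored ones up to 0.75
}


def estimable(fraction_below_lod):
    """Percentiles whose simulated bias stays low at this censoring fraction."""
    return [p for p, cutoff in sorted(CUTOFFS.items()) if fraction_below_lod <= cutoff]


# Fractions below the LOD taken from Table 6.
GROUPS = {"All": 0.615, "[0, 30)": 0.941, "[30, 45)": 0.693, "[45, 60)": 0.518, "[60+)": 0.314}
for name, fraction in GROUPS.items():
    print(name, estimable(fraction))
```

With these cutoffs the sketch reproduces the conclusions stated for all subjects and for each age-group.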

Actual percentile estimates are presented in Table 6. The single-imputation and multiple-imputation percentile estimates for the less-than-30-years age-group differ substantially from one another, as would be expected from the simulation results in Table 2 when variable LODs exist and more than 75% of results are below the LOD. Except for the 50th and 75th percentile estimates for the 30-to-45-years age-group, the percentile estimates by either method are comparable to one another within each of the 30-to-45-years, 45-to-60-years, and 60-plus-years age-groups. Thus we can state with confidence, for instance, that the 95th percentile of 2,3,7,8-TCDD increases from around 4 ppt for 30-to-45-year-olds to about 12 ppt for persons who are 60 years old or older.
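This part of the paper does not restate how the 95% confidence intervals in Table 6 were obtained. Purely as an illustration of one standard distribution-free interval for a percentile (a textbook technique, not necessarily the procedure behind Table 6), the sketch below picks two order statistics whose ranks come from the Binomial(n, q) distribution of the number of observations falling below the true q-quantile. Function names are hypothetical.

```python
from math import comb


def binom_cdfs(n, q):
    """[P(Y <= k) for k = 0..n] with Y ~ Binomial(n, q)."""
    total, out = 0.0, []
    for i in range(n + 1):
        total += comb(n, i) * q**i * (1 - q) ** (n - i)
        out.append(total)
    return out


def order_stat_ranks(n, q, alpha=0.05):
    """1-based ranks (l, u) with P(X_(l) <= q-quantile <= X_(u)) >= 1 - alpha, plus that probability."""
    cdf = binom_cdfs(n, q)                         # cdf[k] = P(Y <= k)
    lower = [k for k in range(1, n + 1) if cdf[k - 1] <= alpha / 2]
    upper = [k for k in range(1, n + 1) if cdf[k - 1] >= 1 - alpha / 2]
    if not lower or not upper:
        raise ValueError("sample too small for a two-sided interval at this quantile")
    l, u = max(lower), min(upper)
    return l, u, cdf[u - 1] - cdf[l - 1]            # coverage = P(l <= Y <= u - 1)


l, u, cov = order_stat_ranks(100, 0.5)
print(l, u, round(cov, 4))
```

For n = 100 and the median this gives ranks 40 and 61 with coverage of about 0.965. For a sorted sample `xs`, the interval endpoints are `xs[l - 1]` and `xs[u - 1]`; with a single fixed LOD, both ranks must fall above the block of censored results for the endpoints to be detected values.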

4. Discussion

Chemical exposure data tend to be highly skewed and to include a large fraction of measurements that are subject to left censoring. For such data sets, it is often more appropriate to describe the distribution of results by presenting quantiles or percentiles. To estimate the upper percentiles of a distribution, the method of imputation has little effect on the bias of the estimates, as long as the LOD is fixed and the percentage of censored results is below the percentile being estimated. Thus, multiple-imputation appears to have no advantage over single-imputation when there is a fixed censoring limit and the percentage of censored results is less than the percentile being estimated. With variable LODs, however, the percentage of results below the LOD can affect percentile estimates even when the percentage of censored values is less than the percentile being estimated. The extent to which this occurs depends on the variability in the LODs.

The variance of the base(10) logarithm of the LODs associated with the 415 measurements of 2,3,7,8-TCDD in this report was about 10% of the variance in the base(10) logarithm of the congener measurements. Simulation results with an LOD variance of that relative magnitude suggest that both single-imputation and multiple-imputation lead to biased estimates of a given percentile when a variable censoring limit exists and the percentage of censored values is near the percentile being estimated.

Although single-imputation with a fixed LOD has been shown to lead to biased estimates of means or geometric means and their variances (Lubin et al., 2004), that does not seem to be the case for percentile estimation as long as the percentage of censored results is less than the percentile being estimated. When the censoring limits are variable with a variance near 10% of the measurement variance, multiple-imputation does not appear to be advantageous over single-imputation when estimating a 50th percentile. But multiple-imputation does appear to have an advantage over single-imputation when estimating a 75th or a 90th percentile if the censoring fraction is within 10 to 20 percentage points of the percentile being estimated. We expect this also holds for a 95th percentile, although we did not include simulations with more than 70% of results below the LOD. Thus, using multiple-imputation to estimate percentiles appears to have an advantage over single-imputation when variable censoring limits exist and the censoring fraction is within 10 to 20 percentage points of the percentile being estimated.
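The case for multiple-imputation rests on pooling several completed data sets. A standard way to pool, from Rubin (1987) in the reference list, averages the m point estimates and adds the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance. The sketch below implements those textbook rules only; it does not assume how a within-imputation variance for a percentile is obtained (that belongs to the methods section), so `within_vars` is simply an input, and `rubin_combine` is a hypothetical name.

```python
import math
from statistics import fmean, variance


def rubin_combine(estimates, within_vars):
    """Pool m completed-data estimates with Rubin's (1987) rules.

    Returns (pooled estimate, total variance, degrees of freedom).
    Assumes at least two imputations and a positive mean within-imputation variance.
    """
    m = len(estimates)
    q_bar = fmean(estimates)                 # pooled point estimate
    w_bar = fmean(within_vars)               # average within-imputation variance
    b = variance(estimates)                  # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b              # total variance
    if b == 0:
        return q_bar, t, math.inf            # imputations agree: no added uncertainty
    r = (1 + 1 / m) * b / w_bar              # relative increase in variance from imputation
    df = (m - 1) * (1 + 1 / r) ** 2          # Rubin's degrees of freedom
    return q_bar, t, df


# Illustrative numbers only (not from the paper): five imputations of one percentile.
print(rubin_combine([0.61, 0.58, 0.63, 0.60, 0.59], [0.004] * 5))
```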

Appendix

$\mu$ = population mean of a distribution.
$\sigma$ = population standard deviation of a distribution.
$\hat{\mu}$ = estimate of $\mu$.
$\hat{\sigma}$ = estimate of $\sigma$.
$f(Y \mid X)$ = density function of $Y$ given $X$.
$X \sim N(\mu, \sigma)$ indicates that variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.
$\chi^2_{n-1}$ is the symbol for a chi-square distribution with $n-1$ degrees of freedom.
$F_{\nu_1,\nu_2}(\alpha)$ is the symbol for the $\alpha$ quantile of an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.

References

Agency for Toxic Substances and Disease Registry (ATSDR), 2005. Serum dioxin levels in residents of Calcasieu Parish, Louisiana. Atlanta: US Department of Health and Human Services, ATSDR.

Clopper, C.J., Pearson, E.S., 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404–413.

Gleit, A., 1985. Estimation of small normal data sets with detection limits. Environ. Sci. Technol. 19, 1201–1206.

Haas, C.N., Scheff, P.A., 1990. Estimation of averages in truncated samples. Environ. Sci. Technol. 24, 912–919.

Helsel, D.R., 1990. Less than obvious – statistical treatment of data below the detection limit. Environ. Sci. Technol. 24, 1766–1774.

Hornung, R.W., Reed, L.D., 1990. Estimation of average concentration in the presence of nondetectable values. Appl. Occup. Environ. Hyg. 5, 46–51.

Huybrechts, T., Thas, O., Dewulf, J., Van Langenhove, H., 2002. How to estimate moments and quantiles of environmental data sets with non-detected observations? A case study on volatile organic compounds in marine water samples. J. Chromatogr. A 975, 123–133.

Korn, E.L., Graubard, B.I., 1998. Confidence intervals for proportions with small expected number of positive counts estimated from survey data. Survey Methodol. 24, 193–201.

Lubin, J.H., Colt, J.S., Camann, D., Davis, S., Cerhan, J.R., Severson, R.K., et al., 2004. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ. Health Perspect. 112, 1691–1696.

Lynn, H.S., 2001. Maximum likelihood inference for left-censored HIV RNA data. Statist. Med. 20, 33–45.

Persson, T., Rootzen, H., 1977. Simple and highly efficient estimators for a type I censored normal sample. Biometrika 64, 123–128.

Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York.

Travis, C.C., Land, M.L., 1990. Estimating the mean of data sets with nondetectable values. Environ. Sci. Technol. 24, 961–962.

Woodruff, R.S., 1952. Confidence intervals for medians and other position measures. J. Am. Stat. Assoc. 47, 635–647.