Efficacy of Quasi-Cutoff Sampling and Model-Based Estimation For Establishment Surveys - and Related Considerations

by James R. Knaub, Jr.

Abstract: Weighted least squares regression through the origin has many uses in statistical science. An important use is for the estimation of attribute totals from establishment survey samples, where we might use quasi-cutoff sampling. Two questions in particular will be explored here, with respect to survey statistics: (1) How do we know this is performing well? and (2) What if the smallest members of the population appear to behave differently? This review article contains a summary of conclusions from experimental findings, and explanations with numerous references.

Key Words: Establishment Surveys, Heteroscedasticity, Model-Based Classical Ratio Estimator, Multiple Attributes, Nonsampling Error, Prediction, Regression Through the Origin, Total Survey Error, Weighted Least Squares Regression

Author: James R. Knaub, Jr.

Editor: Richard G. Graf

InterStat, JANUARY 2014 #1

NOTE: This article was revised on April 3, 2014.

NOTE: The content of this article is the intellectual property of the authors, who retain all rights to future publication.

5/21/2016 - minor revision, p. 4


1. Introduction

Weighted least squares (WLS) regression has many uses in statistics. Often, in science, there are relationships between variables such that as one or more regressor variables approach zero, so does the variable/attribute of interest. For example, as the number of square meters of a forest searched approaches zero, so does the number of poplar trees found. For an example in survey statistics: as the size of a respondent, measured in a previous annual inventory, approaches zero, so does the corresponding current weekly inventory response or predicted value. Here we consider establishment surveys, where there are many such situations in which regression through the origin is required, and where the data typically show substantial heteroscedasticity. When estimating a total from a sample, with one or more sets of regressor data available, the smallest variance results from a cutoff sample of just the largest members of the population, as measured by the regressor or regressors. Unless the smallest members of the population are to be a major part of a subtotal to be published, dropping them from the sample may be very helpful: this often leads to lower nonsampling error as well, and so often results in lower total survey error. This also reduces burden on respondents and data collectors, and can truly result in ‘doing more with less,’ if one is careful to consider all aspects of total survey error. Note that because multiple attributes are simultaneously of interest (that is, multiple questions are asked on a survey), a respondent may be large for one attribute and contribute little for others. The result for each attribute may be a quasi-cutoff sample, in which the very largest members of the population are included, along with a smattering of others, including a few small ones. Those small observations, reported by an establishment that is large for some other attribute, may be fairly accurate. Two questions will be explored here: (1) How do we know this is performing well? and (2) What if the smallest members of the population appear to behave differently?

Author affiliation: Lead Mathematical Statistician; Statistical Methods Team, US Energy Information Administration. Disclaimer: The views expressed are those of the author, and are not official US Energy Information Administration (EIA) positions unless claimed to be in an official US Government document.

2. Definitions

A few definitions may be helpful. First, let us consider regression weights for weighted least squares (WLS) regression. To derive the regression coefficients for WLS regression, we consider a regression weight, $w_i$, such that the estimated residual is $e_i = e_{0i} w_i^{-1/2}$. Thus the estimated residual is factored into a random factor, $e_{0i}$, and a nonrandom factor based on the regression weight, $w_i^{-1/2}$. (Note that the regression weight is not to be confused with survey weights or calibration weights. See, for example, Knaub(2012a), page 19.) An example of deriving the regression coefficient for one regressor is found in Knaub(2009), pages 3 and 4, and for two regressors, in Knaub(1996), pages 597 and 598. In Knaub(2007a), Appendix IV, page 36, weights for multiple regression are defined further in a ratio context. Below we will concentrate on the single regressor model, but this can be applied to multiple regression, as indicated in the last reference. Note that in that case, a size measure may no longer be a single regressor value, $x_i$, but perhaps a linear combination of regressors.

These regression weights can be written in various formats. See Steel and Fay(1995) and Sweet and Sigman(1995). However, the simplest format is less likely to overfit a model to a given set of data, and is thus more broadly applicable, especially in a repeated data publication production environment, as found when an agency is producing Official Statistics. From Brewer et al.(1977) and Brewer(2002), one may use the following format: $w_i = x_i^{-2\gamma}$. Sometimes the format $w_i = x_i^{-\gamma}$ is used, as in Särndal, Swensson, and Wretman(1992), but by using $w_i = x_i^{-2\gamma}$, for one-regressor ratio estimates, we have this:

$$y_i = b x_i + e_{0i} x_i^{\gamma}$$

See page 4 in Knaub(2009). The exponent γ is the coefficient of heteroscedasticity. (See Brewer(2002), and Knaub(2007b, 2011c).) Under this definition, when γ=½, the result is the model-based classical ratio estimator (CRE). (See Knaub(2005, and 2012a, top of page 4).) Note that the study of heteroscedasticity, when modeling the data, also has application to design-based sampling and estimation, as shown in Brewer(1963), and Holmberg(2007).
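To make the weight format concrete, here is a minimal Python sketch (an illustration, not the author's production software) of the WLS slope through the origin. Minimizing $\sum w_i (y_i - b x_i)^2$ with $w_i = x_i^{-2\gamma}$ gives $b = \sum w_i x_i y_i / \sum w_i x_i^2$, and the model-based total is the sum of the observed y-values plus $b$ times the out-of-sample x-values. The function names and the numbers are hypothetical. With γ = ½ the weights are $1/x_i$, and $b$ reduces to $\sum y / \sum x$, the CRE.

```python
def wls_slope_through_origin(x, y, gamma):
    """WLS slope b for the model y_i = b*x_i + e_0i * x_i**gamma.

    Regression weights use the format w_i = x_i**(-2*gamma); all x_i must be > 0.
    Minimizing sum(w_i * (y_i - b*x_i)**2) gives b = sum(w*x*y) / sum(w*x*x).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    w = [xi ** (-2.0 * gamma) for xi in x]
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def predict_total(x_sample, y_sample, x_out_of_sample, gamma):
    """Model-based total: observed sample y-values plus b*x for out-of-sample units."""
    b = wls_slope_through_origin(x_sample, y_sample, gamma)
    return sum(y_sample) + b * sum(x_out_of_sample)


# Hypothetical numbers, for illustration only.
x_s = [120.0, 80.0, 45.0, 30.0]   # regressor values, sampled units
y_s = [130.0, 78.0, 50.0, 29.0]   # attribute values, sampled units
x_r = [10.0, 6.0, 4.0]            # regressor values, out-of-sample units

print(wls_slope_through_origin(x_s, y_s, gamma=0.5), sum(y_s) / sum(x_s))  # CRE check
print(predict_total(x_s, y_s, x_r, gamma=0.5))
```

Setting `gamma=0.0` gives ordinary least squares through the origin, so the same function covers the whole range of γ discussed here.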

Regression/model-based estimation will function with any sample, as long as appropriate regressor data are available. Knaub(2007c) considers model performance. See Douglas(2013) for a distinctive approach to making the most of available regressor data. In econometric applications, one may be restricted to the data that happen to be available. For surveys, one may have more of a choice. Regardless, one must be aware of possible bias due to the method of data selection. Small area estimation (Knaub(1999, 2011b)) can make bias more of a concern. But, as will be discussed in sections below, and as noted strongly in the conclusions of Karmel and Jain(1987), purposive sampling, such as cutoff and quasi-cutoff sampling, can provide the most accurate alternatives.

The classical ratio estimator (CRE) corresponds to a lower coefficient of heteroscedasticity (γ=½) than one will usually estimate from establishment survey data. (See Knaub(1992, page 880; 1993; 1997; 2005).) However, it does appear robust against model-failure and nonsampling error. (See Knaub(2005, 2010).) Holmberg and Swensson(2001), in another context, also found it best to underestimate γ, to some extent. Further, using the CRE to estimate sample size needs may be helpful (Knaub(2013b)), even if the value used for γ is changed later. The CRE may not often be the best estimator, but it rarely appears to be a very bad one. Also, even when only the sum of the regressor data corresponding to the out-of-sample cases is known, we may still estimate not only the totals, but also the variances of the estimated totals. See page 11, Knaub(2003), and pages 776-777 in Knaub(1991). The CRE is not only a “skeleton key,” opening many locks; it is also often the best/most practical solution. Under conditions one might consider rather ordinary, described in Cochran(1977), pages 159-160, one may find the CRE “…hard to beat” (page 160). That it is also economical is quite a bonus.
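The claim that only the out-of-sample regressor total is needed can be seen in a short sketch. It uses the standard Royall-type prediction-variance formula for the CRE under the model $y_i = b x_i + e_{0i} x_i^{1/2}$. This is an illustration and not necessarily the exact formulation in Knaub(2003) or Knaub(1991). With $x_s = \sum_{s} x_i$ and $X_r = \sum_{r} x_i$, the prediction error is $b̂ X_r - \sum_r y_i$, so $V = \sigma^2 X_r + X_r^2 \sigma^2 / x_s = \sigma^2 X_r (X_r + x_s)/x_s$, in which the out-of-sample units appear only through $X_r$. The function name and example values are hypothetical.

```python
import math


def cre_total_and_se(x_sample, y_sample, x_out_total):
    """CRE estimate of a total and the standard error of its prediction error.

    Assumed model: y_i = b*x_i + e_0i*sqrt(x_i), with independent e_0i of mean 0 and
    variance sigma**2. Only the SUM of the out-of-sample regressor values is needed.
    """
    n = len(x_sample)
    if n < 2:
        raise ValueError("need at least two sampled units to estimate sigma**2")
    x_s = sum(x_sample)
    b = sum(y_sample) / x_s                    # CRE slope
    t_hat = sum(y_sample) + b * x_out_total    # estimated total
    # sigma**2 from residuals weighted by 1/x_i, with n - 1 degrees of freedom
    s2 = sum((yi - b * xi) ** 2 / xi for xi, yi in zip(x_sample, y_sample)) / (n - 1)
    # V(T_hat - T) = sigma**2 * (X_r + X_r**2 / x_s) = sigma**2 * X_r * (X_r + x_s) / x_s
    var = s2 * x_out_total * (x_out_total + x_s) / x_s
    return t_hat, math.sqrt(var)


t_hat, se = cre_total_and_se([1.0, 4.0], [2.0, 2.0], x_out_total=5.0)
print(t_hat, se)
```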

For model-based estimation, the data are “…random or not, selected solely according to the values of the $x_i$,” as stated in Cochran(1977), page 158. Cutoff sampling (Knaub(2008b)) provides the smallest variance estimates for estimated totals. See Brewer(1963), Royall(1970), Cochran(1977, pages 158-160), and Knaub(2005). In Brewer(1963), this was called a “partial collection,” as noted in Knaub(2011c). It involves collecting a sample consisting solely of the largest members of the population, as measured by the regressor. Knaub(2012b) and Knaub(2013a,b) show how volume coverage – the fraction of the regressor population total that corresponds to the sample – relates to variance estimation for the total of the variable/attribute of interest. This technique reduces variance, but may raise concern about a type of bias: model-failure (Knaub(2010)). However, as discussed below, the advantages of cutoff sampling are not only financial, from reducing the sample size; cutoff sampling very often yields a lower total survey error as well.
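A minimal sketch of cutoff selection by volume coverage, assuming a single attribute and one regressor value per unit: units are taken from the largest down until the sampled share of the regressor total reaches a target. Taking the largest units first reaches any given coverage with the fewest units. The function name, the target value, and the data are hypothetical.

```python
def cutoff_sample(x, target_coverage):
    """Select the largest units, by regressor value x, until volume coverage is reached.

    Volume coverage = (sum of x over the sample) / (sum of x over the population).
    Taking units from the largest down reaches a given coverage with the fewest units.
    Returns the selected indices (in original order) and the coverage attained.
    """
    if not 0.0 < target_coverage <= 1.0:
        raise ValueError("target_coverage must be in (0, 1]")
    total = sum(x)
    chosen, covered = [], 0.0
    for i in sorted(range(len(x)), key=lambda j: x[j], reverse=True):
        if covered / total >= target_coverage:
            break
        chosen.append(i)
        covered += x[i]
    return sorted(chosen), covered / total


# Hypothetical regressor values (e.g., last year's census figures).
sizes = [50, 5, 30, 10, 3, 2]
print(cutoff_sample(sizes, 0.75))   # ([0, 2], 0.8)
print(cutoff_sample(sizes, 0.85))   # ([0, 2, 3], 0.9)
```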

Quasi-cutoff sampling, as noted in Knaub(2011a), results from the fact that surveys generally ask more than one question. When a respondent is asked for a datum because it is large for that attribute, it often answers for other attributes for which it may not be large. For those other attributes, data will then be collected corresponding to smaller $x_i$ than would have been taken in a strictly cutoff sample. Thus the cutoff may be raised, since these other data will contribute to the coverage, which in turn, based on the relationship of the $y_i$'s to the $x_i$'s, will determine the variance of the error in the estimated totals. (See Knaub(2013b).) Note that this is why some cutoffs seem to include so little in Douglas(2007). Because size measures vary by attribute, this is far more efficient than using one size measure for a probability proportionate to size (PPS) sample, an approach the US Energy Information Administration (EIA) has struggled with. Note that often, for agencies producing Official Statistics, there is one very good regressor available for each attribute, and here we rely on that relationship between x (regressor data) and y (attribute) to estimate missing y-values. This is not to be confused with multivariate (multi-attribute) estimation. Multiple attributes are considered when sampling, if respondents do respond for more than one attribute, but the estimation here is based on each attribute's relationship to a regressor or regressors.
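The quasi-cutoff effect can be illustrated with a hedged sketch, assuming each establishment has one regressor value per attribute. A cutoff sample is taken for each attribute, the union is what is actually collected, and each attribute's resulting coverage is then computed. The establishment data and names are hypothetical.

```python
def quasi_cutoff(sizes_by_attribute, target_coverage):
    """Union of per-attribute cutoff samples, and the coverage each attribute then gets.

    sizes_by_attribute maps an attribute name to regressor values, one per
    establishment, listed in the same establishment order for every attribute.
    """
    selected = set()
    for sizes in sizes_by_attribute.values():
        total, covered = sum(sizes), 0.0
        # Largest units first, until this attribute's own cutoff coverage is met.
        for i in sorted(range(len(sizes)), key=lambda j: sizes[j], reverse=True):
            if covered / total >= target_coverage:
                break
            selected.add(i)
            covered += sizes[i]
    # Coverage actually attained, per attribute, by the combined sample.
    coverage = {
        name: sum(sizes[i] for i in selected) / sum(sizes)
        for name, sizes in sizes_by_attribute.items()
    }
    return sorted(selected), coverage


# Hypothetical establishments 0-4, each reporting two attributes.
sizes = {"gas": [100, 5, 60, 2, 1], "oil": [3, 80, 1, 40, 2]}
sample, coverage = quasi_cutoff(sizes, 0.8)
print(sample)    # [0, 1, 2, 3]
print(coverage)
```

For gas alone, a strict cutoff would stop at establishments 0 and 2 (160/168, about 95% coverage). The establishments selected for oil add the small gas values 5 and 2, which raises gas coverage to 167/168. This is the mechanism that lets the cutoff be raised.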

3. How do we know that quasi-cutoff sampling and model-based estimation work?

Short Answer: Test as you would for any other case.

In 1989, the author developed an algorithm for selecting the certainty stratum for a stratified random sample survey of electric sales and revenue. (See Knaub(1989).) Soon afterward, a presentation on model-based estimation by Nancy Kirkendall was the inspiration for applying that methodology to this electric sales and revenue survey. The certainty stratum was used as a quasi-cutoff sample, and the other data were ignored. Resulting estimated totals and standard errors were closely comparable to previous probability sample results which used more data.

Testing, as others have done, has included taking an annual census, drawing a sample from it corresponding to a monthly sample, using an older/other census as the regressor data, and checking to see how well we estimate the numbers we actually had in that more recent census. (Often, but not always, the best single regressor is the same data element from a previous census.) Results have been good.
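The census-based check just described can be sketched as a small simulation. The data here are synthetic and hypothetical: `x` stands in for an older census (the regressor), and `y` for the more recent census, generated from a ratio model. A cutoff sample is drawn from the "recent census", the total is estimated with the CRE, and the result is compared with the actual census total. Because the synthetic data follow the model, this only shows the mechanics. With real data, the comparison also probes model-failure and nonsampling error.

```python
import random


def cre_total(x_sample, y_sample, x_out_total):
    """Model-based CRE estimate of a total (gamma = 1/2)."""
    b = sum(y_sample) / sum(x_sample)
    return sum(y_sample) + b * x_out_total


def census_test(n_units=500, coverage=0.8, seed=1):
    """Draw a cutoff sample from a synthetic 'current census', estimate, and compare.

    Synthetic, hypothetical data: x stands in for an older census (the regressor)
    and y for the more recent census, generated as y = 2x + e * x**0.75.
    Returns (estimated total, actual census total, relative error).
    """
    rng = random.Random(seed)
    x = [rng.lognormvariate(3.0, 1.0) for _ in range(n_units)]
    y = [2.0 * xi + rng.gauss(0.0, 0.1) * xi ** 0.75 for xi in x]
    actual = sum(y)

    # Cutoff sample: largest x first, until the target volume coverage is reached.
    total_x, covered, sample = sum(x), 0.0, []
    for j in sorted(range(n_units), key=lambda k: x[k], reverse=True):
        if covered / total_x >= coverage:
            break
        sample.append(j)
        covered += x[j]
    in_sample = set(sample)
    x_out_total = sum(x[j] for j in range(n_units) if j not in in_sample)

    estimate = cre_total([x[j] for j in sample], [y[j] for j in sample], x_out_total)
    return estimate, actual, (estimate - actual) / actual


print(census_test())
```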

One application at the EIA, by Jason Worrall, had no annual census, just a quasi-cutoff sample of oil and gas reserves by region, with production census data as regressor data. Some of the sample data were set aside and then estimated using the CRE, for comparison purposes. Results were fair. Improvements are being sought, but a probability sample is not an option, and the data have always been problematic: the smallest operators do not know their reserves. A probability proportionate to size (PPS) sample was previously attempted, but problems documenting the estimation that was used, along with resource problems, make that procedure untenable.

A great deal of electric power data have been checked by summing the estimated totals from twelve monthly samples and comparing the result with annual census data obtained later. Differences were examined in light of standard error estimates and nonsampling error. Results have been very favorable for this methodology. In particular, such a project was carried out by Joint Program in Survey Methodology (JPSM) Junior Fellows Brett Foster and Lisa Guo in the summer of 2010.
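A minimal sketch of that annual comparison, with one loudly stated simplification: monthly estimation errors are treated as independent, so the annual standard error is the root sum of squares of the monthly ones. If monthly errors are positively correlated, as they may well be, this understates the annual standard error. Nonsampling error is not included. The function name and numbers are hypothetical.

```python
import math


def annual_check(monthly_estimates, monthly_ses, annual_census_total):
    """Compare the sum of 12 monthly estimated totals with a later annual census.

    Simplifying assumption: monthly estimation errors are independent, so the
    annual standard error is the root sum of squares of the monthly ones.
    Returns (difference, its standard error, difference / standard error).
    """
    if len(monthly_estimates) != 12 or len(monthly_ses) != 12:
        raise ValueError("expected twelve monthly values")
    annual_estimate = sum(monthly_estimates)
    se = math.sqrt(sum(s * s for s in monthly_ses))
    diff = annual_estimate - annual_census_total
    return diff, se, diff / se


# Hypothetical: twelve monthly totals of 100 (SE 2 each) vs an annual census of 1190.
print(annual_check([100.0] * 12, [2.0] * 12, 1190.0))
```

A ratio of difference to standard error that is far outside about ±2 would suggest looking for model-failure or nonsampling error in the months concerned.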

For graphical analysis of variance and bias, consider Knaub(2001). For logical considerations see Knaub(2010).

4. What if the smallest members of the population may behave very differently?

Knaub(2010) can provide some guidance. The impact of such a problem may be smaller than most people assume. Using a probability sample can cause severe problems when it requests data from small establishments that may not be able to provide reliable data on a frequent basis. Further, a probability sample will generally require a larger sample size, which may add greatly to overall nonsampling error.

Following is a list of techniques that may alleviate problems when smaller members of the population may behave substantially differently from larger members, and one wishes to apply a cutoff or quasi-cutoff sample:

1) Stratify – by category (e.g. shale vs non-shale for gas production [suggested by Andrew Hoegh], if even approximately possible by a proxy* measure), or by size (see Karmel and Jain(1987), noted by Nancy Kirkendall). Stratification can often be helpful. For purposive sampling and model-based estimation, data groupings are very important, so experiment.

2) Consider total survey error, and the aggregation levels at which you publish, to decide whether some levels are not publishable. (This applies to design-based sampling and census surveys as well.)

3) Consider using a regression weight step-function. This was found to be useful when the smallest observations for petroleum-fired electric generation in a survey were found to be badly behaved. See Knaub(2009), top of page 6.

4) See whether multiple regression may be both feasible and helpful. (See Knaub(1996, 2003), but note Brewer(2002), pages 109-110.)

5) Experimentation has shown that the model-based classical ratio estimator is often quite robust to model-failure.

6) If the budget and respondent burden allow, occasionally drawing a random sample of the smallest members of the population, in a separate study, would be helpful.

*For example, depth-of-well might be used to stratify as follows: (1) less than a given depth, (2) greater than that given depth, and (3) unknown depth. The variance of y about the regression line in the first two strata should then be smaller, so that the overall variance may be reduced. Reduced bias is also a consideration here.

5/21/2016 JRK - revised to add acknowledgments
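Item 3 can be illustrated with a hypothetical step-function. The form used in Knaub(2009) is not given here, so this is only an assumed form. Below a size threshold, the usual weight $x_i^{-2\gamma}$ is multiplied by a constant factor less than 1. The effect is to give less weight to observations near the origin, where measurement error may be disproportionately large. The threshold, the factor, the function names, and the data are all hypothetical.

```python
def step_weights(x, gamma, threshold, small_factor):
    """Regression weights w_i = x_i**(-2*gamma), times small_factor when x_i < threshold.

    Hypothetical step-function form, for illustration: it simply down-weights
    observations near the origin, where data may be less reliable.
    """
    if not 0.0 < small_factor <= 1.0:
        raise ValueError("small_factor should be in (0, 1]")
    return [xi ** (-2.0 * gamma) * (small_factor if xi < threshold else 1.0) for xi in x]


def wls_slope(x, y, w):
    """WLS slope through the origin: b = sum(w*x*y) / sum(w*x*x)."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


# Hypothetical data: larger units follow y = 2x, the two smallest do not.
x = [1.0, 2.0, 10.0, 20.0, 40.0]
y = [5.0, 8.0, 20.0, 40.0, 80.0]

b_plain = wls_slope(x, y, step_weights(x, 0.5, threshold=5.0, small_factor=1.0))
b_step = wls_slope(x, y, step_weights(x, 0.5, threshold=5.0, small_factor=0.1))
print(b_plain, b_step)   # the step-weighted slope is pulled less by the two small units
```

As the quotation from Knaub(2009) in Section 5 cautions, any such modified weight needs a functional reason. It should not be chosen simply because it lowers the variance estimate.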

Note that in practice, model-based estimation with quasi-cutoff sampling seems almost always to work much better than many statisticians anticipate (note, for instance, the conclusions of Karmel and Jain(1987)), and for establishment surveys it is often not only cheaper, but much more accurate, than design-based methods.

It is often important to use weighted least squares (WLS) regression. The presence of heteroscedasticity is demonstrable and measurable in the data (Brewer(1963), Carroll and Ruppert(1988), and Knaub(1992, 1993, 1997)). This applies to other areas as well. See Lee(2013), and, regarding the independent variable, Blas and Sandoval(2010), on calibration in analytical chemistry. (Note: In analytical chemistry, “calibration” has a different meaning than in survey statistics.) WLS regression is likely the rule, rather than the exception, in many situations, and in particular when regression through the origin is reasonable. If it is reasonable that y-values approach zero as the regressor or regressors approach zero, and/or the standard error of an estimated intercept is comparable in size to that intercept, then one should use regression through the origin, and WLS regression. This would apply to many disciplines. The combination of WLS regression and cutoff, or quasi-cutoff, sampling is very useful for establishment surveys, where regression is very often through the origin. Nonlinear regression might be considered, but it needs to be justified. Suppose the data are stratified by size (the regressor x, or perhaps a linear combination of regressors) into a series of adjacent size-based strata, and a linear regression of y on that size measure is fit in each stratum. If the slopes change monotonically across the strata, with standard errors of those slopes indicating that this phenomenon is ‘real,’ then nonlinearity may have been demonstrated. In about a quarter of a century of research on establishment surveys, however, this author has not seen this happen demonstrably, though there may be cases in the literature. It seems that WLS linear regression is often ‘hard to beat.’
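The size-strata check for nonlinearity can be sketched as follows. Units are sorted by size and split into adjacent strata of nearly equal count. A WLS slope through the origin, with its standard error $\sqrt{\hat\sigma^2 / \sum w_i x_i^2}$, is fit in each stratum, and the code reports whether the slopes move monotonically. Whether the changes are ‘real’ is left to judgment using the reported standard errors, since a monotone sequence by itself proves nothing. Equal-count strata, γ = ½ as a default, and the function names are assumptions made for this illustration.

```python
import math


def slopes_by_size_stratum(x, y, n_strata, gamma=0.5):
    """WLS slope through the origin, with its standard error, in adjacent size strata.

    Units are sorted by size x and split into n_strata groups of (nearly) equal
    count. Weights are w_i = x_i**(-2*gamma); each stratum needs at least 2 units.
    Returns a list of (slope, standard error), from smallest to largest stratum.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    base, extra = divmod(len(order), n_strata)
    if base < 2:
        raise ValueError("each stratum needs at least two units")
    results, start = [], 0
    for k in range(n_strata):
        stop = start + base + (1 if k < extra else 0)
        idx, start = order[start:stop], stop
        w = [x[i] ** (-2.0 * gamma) for i in idx]
        sxx = sum(wi * x[i] ** 2 for wi, i in zip(w, idx))
        b = sum(wi * x[i] * y[i] for wi, i in zip(w, idx)) / sxx
        s2 = sum(wi * (y[i] - b * x[i]) ** 2 for wi, i in zip(w, idx)) / (len(idx) - 1)
        results.append((b, math.sqrt(s2 / sxx)))
    return results


def strictly_monotone(values):
    """True if values strictly increase throughout, or strictly decrease throughout."""
    steps = [v2 - v1 for v1, v2 in zip(values, values[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Hypothetical curved data, y = x**2, split into three size strata.
xs = [float(v) for v in range(1, 13)]
fits = slopes_by_size_stratum(xs, [v * v for v in xs], n_strata=3)
print(fits, strictly_monotone([b for b, _ in fits]))
```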

It is important to consider variance and bias for both sampling and nonsampling error. (See Knaub(2001, 2002, 2007c, 2010).) Note that the variance of prediction errors can be estimated readily enough (Knaub(1996, 1999, 2011b)), and the assumption of normality for a confidence interval may be reviewed (Knaub(2001)). When zero is included in that confidence interval, one can expect a skewed error distribution, and other distributional forms might be considered, using the estimated standard error, and a sensitivity analysis performed. One might also consider a nonparametric, extreme width for a confidence interval, as bounded by Chebyshev's inequality. (See, for example, Nonparametric Methods(2001).) Cochran(1977), Chapter 13, is one source dealing with nonsampling error measurement that might be used in a separate study of randomly selected data for out-of-sample (‘small’) cases. See the last paragraph on page 396 there. Arranging for this in a budget may be very difficult, however. Using the quasi-cutoff sample data here, and model-based estimation, we can still consider nonsampling error in the primary data, as indicated above. One option, on pages 8 and 9 of Knaub(1999), involves examining data revisions as a proxy for measurement of nonsampling error, and incorporating that knowledge into the prediction of the sampling error. At the EIA, revision ‘error’ data are collected as such proxies for nonsampling error, for a variety of energy sources, and tables of mean absolute changes are presented. Knaub(2001, 2002, and 2007c) may be helpful as well. Scatterplots can be enlightening. Fundamentally, one needs an idea of the magnitude of nonsampling error to be able to assess the behavior of the data.
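To see how much wider the Chebyshev bound is than a normal-theory interval, here is a short sketch. By Chebyshev's inequality, $P(|X - \mu| \ge k\sigma) \le 1/k^2$ for any distribution with finite variance, so coverage of at least $c$ needs $k = 1/\sqrt{1-c}$. The sketch assumes that the prediction error has mean zero (no bias) and treats the estimated standard error as if it were σ. The function name and values are hypothetical.

```python
import math
from statistics import NormalDist


def interval_half_widths(se, coverage=0.95):
    """Half-widths of a confidence interval for an estimated total, given its SE.

    normal:    z * se, with z the standard normal quantile for the coverage.
    chebyshev: k * se, k = 1 / sqrt(1 - coverage). By Chebyshev's inequality,
               P(|error| >= k * sigma) <= 1 / k**2 for ANY error distribution with
               mean 0 and standard deviation sigma; here se stands in for sigma.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    k = 1.0 / math.sqrt(1.0 - coverage)
    return z * se, k * se


normal_hw, chebyshev_hw = interval_half_widths(se=10.0, coverage=0.95)
print(normal_hw, chebyshev_hw)
```

For 95% coverage, the Chebyshev half-width is about 4.47 standard errors, compared with about 1.96 under normality. That gap is one way to gauge how much is riding on the normality assumption.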

5. As Noted In Previous Articles

Consider the following excerpts when dealing with these data. Note that if the smallest members of a population do not follow the same model as the larger members, this may be acceptable when their overall impact on all published numbers is small; but if a goal of the survey is to estimate specifically for those members of the population, it is not likely to be acceptable. From Knaub(2012b), page 15:

Notice that the smallest sample size that will achieve a given value of [volume coverage] is from a cutoff sample, as noted in Brewer(1963) and Royall(1970). This can be very helpful with highly skewed establishment surveys, in particular. However, many statisticians prefer not to leave some members of a population with no chance of being selected for a sample, and another such opinion is expressed in Brewer(1999). In practice, however, the current author has found cutoff sampling, or quasi-cutoff sampling caused by sampling for more than one attribute, to [appear to] lower the overall error, the total survey error (TSE), for establishment surveys. See Knaub(2010), and Knaub(2007a). In short, any data not sampled may be problematic, so why not try to leave out of a sample the data that will likely cause the least damage when one does so? Also, the smallest observations are often ones that are reported by the smallest establishments, and may have relatively large measurement errors, as opposed to large establishments which may have more expertise. In quasi-cutoff sampling, a small observation may be provided by an establishment selected for another attribute, and such data may be more reliable.

From Knaub(2009), pages 5 – 6:

Weights can be estimated from the data. (See Knaub(1993, 1997), Carroll and Ruppert(1988), Sweet and Sigman(1995), Steel and Fay(1995), and Brewer(1963), for example.) If the estimated [regression] weights cause an increase in estimated variance, it is not [permissible] to pick another weight solely to lower the variance estimate. Such an estimate would not be justified unless there was a functional reason for it. The object is to give less weight to the more uncertain data points, and those are generally the largest, but data near the origin can have disproportionately large measurement error in many practical situations. A good reason for using cutoff sampling is to avoid collecting data that are not reliable. Often with design-based sampling, the smallest observations are imputed by some model since they are either nonrespondents or their responses do not ‘pass the laugh test’ (badly fail reasonable edits). However, from Knaub(2008a), one may find a modified [regression] weight to be better for reasons of robustness. [*] …. In regression through the origin we may use 0.5<γ<1 (see Brewer(2002)), but near the origin, data quality problems can cause gamma to appear to be smaller (see Knaub(2002)). If one were to use WLS with those data, then one may overestimate variance.

*Perhaps better to use a step-function for the regression weight, as at the top of page 6 in Knaub(2009).
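One simple way to estimate γ from data can be sketched as follows. This is a common residual-based heuristic, offered for illustration only and not as the method of Knaub(1993) or the other references cited. First fit $y = b x$ by WLS with a starting γ. Under the model, $|e_i| = |e_{0i}| x_i^{\gamma}$, so the ordinary least squares slope of $\log|e_i|$ on $\log x_i$ estimates γ. As the excerpt warns, poor-quality data near the origin can pull such an estimate down. One may therefore wish to examine it with and without the smallest observations. The estimate is a description of the data, not a value to be tuned to reduce the variance estimate. Names and data are hypothetical.

```python
import math


def estimate_gamma(x, y, gamma_start=0.5):
    """Rough estimate of the coefficient of heteroscedasticity from data.

    1. Fit y = b*x through the origin by WLS with w_i = x_i**(-2*gamma_start).
    2. Regress log|residual_i| on log(x_i) by ordinary least squares; since
       |e_i| = |e_0i| * x_i**gamma under the model, the slope estimates gamma.
    Residuals that are exactly zero are skipped (their log is undefined).
    """
    w = [xi ** (-2.0 * gamma_start) for xi in x]
    b = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(
        wi * xi * xi for wi, xi in zip(w, x)
    )
    pts = [(math.log(xi), math.log(abs(yi - b * xi))) for xi, yi in zip(x, y) if yi != b * xi]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero residuals")
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return sxy / sxx


# Hypothetical data built with gamma = 0.75 and alternating-sign errors.
x_demo = [2.0 ** k for k in range(8)]
y_demo = [2.0 * xv + (1 if k % 2 == 0 else -1) * xv ** 0.75 for k, xv in enumerate(x_demo)]
print(estimate_gamma(x_demo, y_demo))   # roughly 0.7
```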

6. Summary

Testing, based on experiments and on the production of Official Statistics, along with academic research, has shown that quasi-cutoff sampling, with weighted least squares linear regression estimation, generally performs very well for establishment sample surveys when good regressor data are available. Generally, for Official Statistics, a very good regressor is available. Multiple regression is less often available or useful, but may be. One should look for the best use of regressor data (Douglas(2013)), but otherwise will generally find it best to keep modeling as simple as possible, to avoid reacting too strongly to ‘noise.’ Model-failure should be considered, but not to the exclusion of the ‘big picture.’ From a total survey error point of view, quasi-cutoff sampling with model-based (i.e., regression) estimation may often be the most accurate option, in addition to being the least expensive, lowest respondent burden option. Heteroscedasticity and regression through the origin should be considered for this and other scientific applications. Stratification is a very important consideration for survey data.

Acknowledgements

The author has learned from many statisticians and others over many years, personally and/or in the literature, and wishes to thank them all. It has been interesting.

 


Supporting Statements to the Office of Management and Budget, for Survey Clearance

Some information provided to the Office of Management and Budget by various US Energy Information Administration personnel, with regard to use of quasi-cutoff sampling and model-based estimation, is found at the URLs below. More applications should follow in subsequent years. (See, for example, Knaub(2011b), for petroleum supply data, etc.)

http://www.reginfo.gov/public/do/DownloadDocument?documentID=186666&version=2, electric power data, re “EIA-826, EIA-923,” 2011
http://www.reginfo.gov/public/do/DownloadDocument?documentID=277030&version=2, natural gas receipts, re “EIA-857,” 2011 (revisions needed)
http://www.reginfo.gov/public/do/DownloadDocument?documentID=355180&version=0, electric power data, re “EIA-923,” 2012
http://www.reginfo.gov/public/do/DownloadDocument?documentID=381390&version=0, natural gas and oil reserves, re “EIA-23,” 2013

For further information on US Energy Information Administration (EIA) surveys, see these web pages: http://www.eia.gov/survey/.

References

Blas, B., and Sandoval, M. C. (2010), “Heteroscedastic Controlled Calibration Model Applied to Analytical Chemistry,” Journal of Chemometrics, 24(5), 241-248.
Brewer, K.R.W. (1963), “Ratio Estimation in Finite Populations: Some Results Deducible from the Assumption of an Underlying Stochastic Process,” Australian Journal of Statistics, 5, pp. 93-105.
Brewer, K.R.W. (1999), “Design-based or Prediction-based Inference? Stratified Random vs Stratified Balanced Sampling,” Int. Statist. Rev., 67(1), 35-47.
Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu’s Elephants, Arnold, London.
Brewer, K.R.W., Foreman, E.K., Mellor, R.W. and Trewin, D.J. (1977), “Use of experimental design and population modelling in survey sampling,” Bull. Int. Statist. Inst., 47(3), 173-190.

 


Carroll, R.J., and Ruppert, D. (1988), Transformation and Weighting in Regression, Chapman & Hall.
Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
Douglas, J.R. (2007), “Model-Based Sampling Methodology for the new EIA-923,” http://www.eia.gov/pressroom/presentations/asa/asa_meeting_2007/fall/files/modeleia923.ppt. Presented to the American Statistical Association Committee on Energy Statistics, October 18, 2007.
Douglas, J.R. (2013), “Efficiently Utilizing Available Regressor Data Through a Multi-Tiered Survey Estimation Strategy,” InterStat, September 2013, http://interstat.statjournals.net/YEAR/2013/abstracts/1309001.php
Holmberg, A. (2007), “Using Unequal Probability Sampling in Business Surveys to Limit Anticipated Variances of Regression Estimators,” Proceedings of the Third International Conference on Establishment Surveys (Montreal, Quebec, Canada), American Statistical Association, pp. 550-556, http://www.amstat.org/meetings/ices/2007/proceedings/ICES2007-000162.PDF.
Holmberg, A., and Swensson, B. (2001), “On Pareto πps Sampling: Reflections on Unequal Probability Sampling Strategies,” Theory of Stochastic Processes, Vol. 7, pp. 142-155.
Karmel, T.S., and Jain, M. (1987), “Comparison of Purposive and Random Sampling Schemes for Estimating Capital Expenditure,” Journal of the American Statistical Association, Vol. 82, pp. 52-57.
Knaub, J.R., Jr. (1989), “Ratio Estimation and Approximate Optimum Stratification in Electric Power Surveys,” Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 848-853, http://www.amstat.org/sections/srms/Proceedings/papers/1989_157.pdf. (Note Appendix A, pp. 852-853.)
Knaub, J.R., Jr. (1991), “Some Applications of Model Sampling to Electric Power Data,” Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 773-778, http://www.amstat.org/sections/srms/Proceedings/papers/1991_133.pdf
Knaub, J.R., Jr. (1992), “More Model Sampling and Analyses Applied to Electric Power Data,” Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 876-881, http://www.amstat.org/sections/srms/Proceedings/papers/1992_148.pdf
Knaub, J.R., Jr. (1993), “Alternative to the Iterated Reweighted Least Squares Method: Apparent Heteroscedasticity and Linear Regression Model Sampling,” Proceedings of the International Conference on Establishment Surveys (Buffalo, NY, USA), American Statistical Association, pp. 520-525.

 


Knaub, J.R., Jr. (1996), “Weighted Multiple Regression Estimation for Survey Model Sampling,” Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 596-599, http://www.amstat.org/sections/srms/proceedings/papers/1996_101.pdf
Knaub, J.R., Jr. (1997), “Weighting in Regression for Use in Survey Methodology,” Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 153-157, http://www.amstat.org/sections/srms/proceedings/papers/1997_023.pdf
Knaub, J.R., Jr. (1999), “Using Prediction-Oriented Software for Survey Estimation,” InterStat, August 1999, http://interstat.statjournals.net/YEAR/1999/abstracts/9908001.php?Name=908001
- Short version: “Using Prediction-Oriented Software for Model-Based and Small Area Estimation,” Proceedings of the Survey Research Methods Section, American Statistical Association, http://www.amstat.org/sections/srms/proceedings/papers/1999_115.pdf
Knaub, J.R., Jr. (2001), “Using Prediction-Oriented Software for Survey Estimation - Part III: Full-Scale Study of Variance and Bias,” InterStat, June 2001, http://interstat.statjournals.net/YEAR/2001/abstracts/0106001.php?Name=106001
- Short version: Proceedings of the Survey Research Methods Section, American Statistical Association, http://www.amstat.org/sections/srms/proceedings/y2001/Proceed/00621.pdf
Knaub, J.R., Jr. (2002), “Practical Methods for Electric Power Survey Data,” InterStat, July 2002, http://interstat.statjournals.net/YEAR/2002/abstracts/0207001.php?Name=207001.
Knaub, J.R., Jr. (2003), “Applied Multiple Regression for Surveys with Regressors of Changing Relevance: Fuel Switching by Electric Power Producers,” InterStat, May 2003, http://interstat.statjournals.net/YEAR/2003/abstracts/0305002.php?Name=305002.
Knaub, J.R., Jr. (2005), “Classical Ratio Estimator,” InterStat, October 2005, http://interstat.statjournals.net/YEAR/2005/abstracts/0510004.php?Name=510004 - on model-based CRE.
Knaub, J.R., Jr. (2007a), “Cutoff Sampling and Inference,” InterStat, April 2007, http://interstat.statjournals.net/YEAR/2007/abstracts/0704006.php?Name=704006
Knaub, J.R., Jr. (2007b), “Heteroscedasticity and Homoscedasticity,” in Encyclopedia of Measurement and Statistics, Editor: Neil J. Salkind, Sage, Vol. 2, pp. 431-432.
Knaub, J.R., Jr. (2007c), “Model and Survey Performance Measurement by the RSE and RSESP,” Proceedings of the Survey Research Methods Section, American Statistical Association, http://www.amstat.org/sections/srms/proceedings/y2007/Files/JSM2007-000197.pdf.

Knaub, J.R., Jr. (2008a), “Cutoff vs. Design-Based Sampling and Inference For Establishment Surveys,” InterStat, June 2008, http://interstat.statjournals.net/YEAR/2008/abstracts/0806005.php?Name=806005
Knaub, J.R., Jr. (2008b), “Cutoff Sampling,” in Encyclopedia of Survey Research Methods, Editor: Paul J. Lavrakas, Sage.
Knaub, J.R., Jr. (2009), “Properties of Weighted Least Squares Regression for Cutoff Sampling in Establishment Surveys,” InterStat, December 2009, http://interstat.statjournals.net/YEAR/2009/abstracts/0912003.php?Name=912003
Knaub, J.R., Jr. (2010), “On Model-failure When Estimating from Cutoff Samples,” InterStat, July 2010, http://interstat.statjournals.net/YEAR/2010/abstracts/1007005.php?Name=007005
Knaub, J.R., Jr. (2011a), “Cutoff Sampling and Total Survey Error,” Journal of Official Statistics, Letter to the Editor, 27(1), pp. 135-138, http://www.jos.nu/Articles/abstract.asp?article=271135 (click on “Full Text”).
Knaub, J.R., Jr. (2011b), “Some Proposed Optional Estimators for Totals and their Relative Standard Errors for a set of Weekly Cutoff Sample Establishment Surveys,” InterStat, July 2011, http://interstat.statjournals.net/YEAR/2011/abstracts/1107004.php?Name=107004
Knaub, J.R., Jr. (2011c), “Ken Brewer and the Coefficient of Heteroscedasticity as Used in Sample Survey Inference,” Pakistan Journal of Statistics, Vol. 27(4), 2011, pp. 397-406, invited article for a special edition in honor of Ken Brewer’s 80th birthday, http://www.pakjs.com/journals/27(4)/27(4)6.pdf
Knaub, J.R., Jr. (2012a), “Use of Ratios for Estimation of Official Statistics at a Statistical Agency,” InterStat, May 2012, http://interstat.statjournals.net/YEAR/2012/abstracts/1205002.php?Name=205002
Knaub, J.R., Jr. (2012b), “Projected Variance for the Model-Based Classical Ratio Estimator,” InterStat, September 2012, http://interstat.statjournals.net/YEAR/2012/abstracts/1209001.php?Name=209001
Knaub, J.R., Jr. (2013a), “Projected Variance for the Model-Based Classical Ratio Estimator II: Sample Size Requirements,” InterStat, March 2013, http://interstat.statjournals.net/YEAR/2013/abstracts/1303001.php?Name=303001
Knaub, J.R., Jr. (2013b), “Projected Variance for the Model-Based Classical Ratio Estimator: Estimating Sample Size Requirements,” to be published in the Proceedings of the Survey Research Methods Section, American Statistical Association, for 2013, http://www.amstat.org/sections/srms/proceedings/, available online circa April 2014.

Lee, C.R. (2013), “Use of replicate calibration samples in analytical chemistry: uncertainties due to lack of knowledge of heteroscedasticity,” http://www.analyt.chrblee.net/calibration/calibscedastpost2.pdf, hosted by Exploring Analytical Chemistry, http://www.analyt.chrblee.net/topic/index.html, Christopher R. Lee.
“Nonparametric Methods,” April 2001, Contaminated Sites Statistical Applications Guidance Document No. 12-5, Ministry of Environment, British Columbia, Canada, http://www.env.gov.bc.ca/epd/remediation/guidance/technical/pdf/12/gd05_all.pdf
Royall, R.M. (1970), “On Finite Population Sampling Theory Under Certain Linear Regression Models,” Biometrika, 57, pp. 377-387.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, Springer.
Steel, P., and Fay, R.E. (1995), “Variance Estimation for Finite Populations with Imputed Data,” Proceedings of the Section on Survey Research Methods, Vol. I, American Statistical Association, pp. 374-379, http://www.amstat.org/sections/srms/Proceedings/papers/1995_063.pdf
Sweet, E.M., and Sigman, R.S. (1995), “Evaluation of Model-Assisted Procedures for Stratifying Skewed Populations Using Auxiliary Data,” Proceedings of the Section on Survey Research Methods, Vol. I, American Statistical Association, pp. 491-496, http://www.amstat.org/sections/srms/Proceedings/papers/1995_084.pdf