
Socio-Econ. Plann. Sci. Vol. 23, No. 6, pp. 387-397, 1989. 0038-0121/89 $3.00 + 0.00. Printed in Great Britain. All rights reserved. Copyright © 1989 Pergamon Press plc

An Interactive Decision Support System for Designing Appropriate and Adaptive Sampling Procedures in Electric Utility Load Research

KRISHNAMURTY MURALIDHAR, Department of Decision Sciences and Information Systems, Florida International University, Miami, FL 33199, U.S.A.

and

MARIETTA J. TRETTER, Department of Business Analysis and Research, Texas A&M University, College Station, TX 77843, U.S.A.

(Received June 1988)

Abstract. Confidence interval estimation of customer demand for electricity plays a vital role in the capacity and financial planning of electric utility companies. Inaccurate or inadequate estimation could severely affect the economic efficiency of these companies. Selecting the appropriate sampling procedure is, in turn, critical to ensuring success in the accurate estimation of electrical demand. Because of the nature and diversity of demand for electricity, however, classical sampling procedures have to be extensively modified. This paper describes a decision support system (DSS) that aids electric utility companies in selecting and designing appropriate sampling plans. The DSS adapts sophisticated statistical techniques to address all aspects of the sampling procedure.

INTRODUCTION

Load research plays a central role in the planning function of electric utility companies. A program for such research can be briefly described as the measurement and study of electrical load to provide reliable knowledge of load characteristics and the contribution of customer groups to system peak loads [1]. A load research program embraces several activities, including load forecasting, estimating customer demand and load management.

One of the key aspects of load research is the accurate estimation of customer electrical demand. Customer demand for electricity is estimated in 15-30 min intervals and reported to the Federal Energy Regulatory Commission (FERC), as well as to local public utility commissions (as part of the system peak load information), in 60 min intervals. Customer demand data are often used in determining public utility rates. The data are also used as one of the inputs to other planning operations, such as evaluating load management programs, rate experiments and cost of service studies. Incorrect or inadequate data can adversely affect the public utility companies and, ultimately, the general public. Hence, the Public Utilities Regulatory Policies Act (PURPA) of 1978 was enacted to specify regulations and guidelines for performing load research. PURPA also specifies the requirements for estimating customer demand for electricity so that all electric utility companies will have consistent and accurate information.

The objective of this paper is to describe a Decision Support System (DSS) that will aid electric utility companies by addressing the key issue of load research, namely, the sampling design needed for accurate estimation of customer demand. The following section provides a general discussion of estimating electrical demand, followed by a review of the special PURPA requirements. The third section describes the importance of sample size in designing a sample, followed by a section describing the need for a DSS. A description of the proposed DSS is provided in the fifth section. Use of the DSS for sensitivity analysis is then described. The final section offers a summary and conclusions.

ESTIMATING THE CUSTOMER DEMAND FOR ELECTRICITY

In estimating the customer demand for electricity, utility companies are required to estimate demand in 15-30 min intervals by sector (e.g. industrial, commercial and residential). This estimate of electrical demand per customer (measured in kW) is different from the estimate of consumption of electricity per customer (measured in kWh). Data on the population's consumption of electricity are readily available to electric utility companies from monthly utility bills (or bill frequencies). This is not the case in estimating demand in 15-30 min intervals, however, since most meters are capable of measuring electrical consumption but not demand.
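The kW/kWh distinction can be made concrete with a small sketch. Assuming an interval meter reports the energy (kWh) used in each 15 min interval, the average demand (kW) in that interval is the energy divided by the interval length in hours, while consumption is simply the sum of the readings. The function name and the readings below are hypothetical illustrations, not part of the DSS.

```python
import math

def interval_demand_kw(energy_kwh, interval_minutes=15):
    """Average demand (kW) in each interval from the energy (kWh) metered in it."""
    hours = interval_minutes / 60.0
    return [e / hours for e in energy_kwh]

# Four 15-minute readings for one customer (illustrative numbers)
readings_kwh = [0.5, 0.75, 1.25, 0.4]
demand_kw = interval_demand_kw(readings_kwh)   # about [2.0, 3.0, 5.0, 1.6] kW
consumption_kwh = math.fsum(readings_kwh)      # about 2.9 kWh over the hour
peak_kw = max(demand_kw)                       # 5.0 kW, reached in the third interval
```

A monthly bill only reveals the analogue of `consumption_kwh`; the interval peak such as `peak_kw` can only be observed with a meter that records each interval separately.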

In some cases, such as for large industrial or commercial users, special meters capable of recording demand are often used. However, for other customers, especially residential and general service customers, electric utility companies have to purchase, install and maintain special meters that are capable of recording in 15-30 min intervals. The cost of purchasing, installing and maintaining such meters for a random sample of customers was estimated at $1400 per observation in 1980 [1]. While this cost may now be lower due to advances in technology, it may still represent a significant outlay for some utility companies.

The process of estimating electrical demand in 15-30 min intervals is thus different from determining the consumption of electricity over a specified time, and may involve a significant cost outlay both in designing samples for different sectors and installing and monitoring appropriate meters. The DSS described in this study specifically addresses the selection of a sampling procedure for sectors (such as the residential and general service sectors) which require special meters to be installed. In some situations, due to the diversity of electrical demand, it may be necessary to split a single sector into smaller sampling units. In these cases, the DSS can be used to determine the appropriate sampling procedure for each sampling unit.

SPECIAL REQUIREMENTS IN ESTIMATING ELECTRICAL DEMAND

The estimation of electrical demand for load research purposes is different from confidence interval estimation in other situations. In the general case of confidence interval estimation of the mean, the level of confidence to be achieved and the width of the interval are specified. The demand for electricity, however, varies widely. If only the width is specified in estimating demand, this will lead to results that vary widely in accuracy. For some electric utility companies, the specified width may represent but a small fraction of the mean demand per customer in that sampling unit allowing for a very accurate estimate. For others, though, the same width may represent a large fraction of the mean demand per customer resulting in a correspondingly inaccurate estimate.

In order to achieve some consistency, the FERC has therefore specified that electric utility companies must have a reliability of 0.1 or less in the estimation procedure. Reliability (r) is defined as

$$r = \frac{z_{\alpha/2}\, s}{\bar{x}\, n^{1/2}} \qquad (1)$$

where

$z_{\alpha/2}$ = the upper $\alpha/2$ percentage point of the standard normal distribution;
$\bar{x}$ = the sample mean;
$n$ = the sample size;
$s$ = the sample standard deviation; and
$(1 - \alpha)$ = the required level of confidence.

The attractive feature of specifying r is that the width of the confidence interval becomes a function of the mean demand: by equation (1), the half-width $z_{\alpha/2}\, s / n^{1/2}$ equals $r\bar{x}$, so a reliability of 0.1 caps the half-width at 10% of the estimated mean. Consequently, all electric utility companies, irrespective of the level of demand, are held to the same relative precision. The requirements of PURPA state that all electric utility companies must achieve a confidence level of at least 90% and a reliability of 0.1 or less [1]. Thus, modifying the requirements of the general case of confidence interval estimation allows for a desired level of consistency among electric utility companies.
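Equation (1) and the PURPA check can be sketched directly. This is a minimal illustration, assuming the sample holds per-customer demands for one interval; the function names and the sample values are hypothetical, and $z_{\alpha/2}$ is taken from the standard normal distribution as in the definition above.

```python
import math
from statistics import NormalDist

def reliability(sample, confidence=0.90):
    """Sample reliability r = z_{alpha/2} * s / (xbar * sqrt(n)), as in equation (1)."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    s = math.sqrt(math.fsum((x - xbar) ** 2 for x in sample) / (n - 1))
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    return z * s / (xbar * math.sqrt(n))

def meets_purpa(sample, confidence=0.90, max_r=0.10):
    """True when the sample achieves reliability <= max_r at the given confidence."""
    return reliability(sample, confidence) <= max_r

demands_kw = [10.0, 10.5, 9.5, 10.2, 9.8]   # illustrative per-customer demands
r = reliability(demands_kw)                   # about 0.028
half_width = r * math.fsum(demands_kw) / len(demands_kw)  # interval is xbar +/- r * xbar
```

Because the interval is expressed as a fraction of $\bar{x}$, the same `max_r` applies whether a utility's mean demand is 2 kW or 200 kW.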

DETERMINATION OF SAMPLE SIZE

The Load Research Manual (LRM) [1], published by FERC as a guide for electric utility companies conducting load research, suggests the following sample size (n) for achieving the requirements of PURPA:

$$n = \left[\frac{z_{\alpha/2}\, s}{r\, \bar{x}}\right]^2 \qquad (2)$$
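Equation (2) depends on the population only through the coefficient of variation $s/\bar{x}$. A short sketch, assuming z is rounded to 1.645 and n is rounded up to a whole customer, appears to reproduce the LRM column of Table 1, because a Gamma distribution with shape k has coefficient of variation $1/\sqrt{k}$. The function names are ours.

```python
import math

def lrm_sample_size(cv, r=0.10, z=1.645):
    """Equation (2), n = (z * s / (r * xbar))**2, written with cv = s / xbar
    and rounded up to a whole number of customers."""
    return math.ceil((z * cv / r) ** 2)

def gamma_cv(shape):
    """Coefficient of variation of a Gamma distribution with shape k: 1 / sqrt(k)."""
    return 1.0 / math.sqrt(shape)

shapes = [4.0, 3.6, 3.2, 2.8, 2.4, 2.0, 1.6, 1.2, 0.8, 0.4, 0.2]
sizes = [lrm_sample_size(gamma_cv(k)) for k in shapes]
# sizes == [68, 76, 85, 97, 113, 136, 170, 226, 339, 677, 1354]
```

Any two populations with the same coefficient of variation receive the same n, which is exactly the limitation discussed next.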

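This sample size formula is based directly on the general case of confidence interval estimation. It is true that a confidence interval constructed using this sample size will contain the true mean in $100(1 - \alpha)$% of cases. However, choosing n using (2) does not address the special requirement on reliability. Since the sample mean and sample variance are random variables, each falls on either side of its population value with probability of roughly one half:

$$P(\bar{x} > \mu) \approx 0.5, \qquad P(s^2 < \sigma^2) \approx 0.5,$$

where $\mu$ and $\sigma^2$ are the population mean and variance. When n is chosen by (2) using the population values, equation (1) gives $r = 0.1\,(s/\sigma)(\mu/\bar{x})$, so r is 0.1 or less whenever $\bar{x}$ is greater than $\mu$ and $s^2$ is less than $\sigma^2$. According to Bonferroni's inequality [2], the probability that both events occur is bounded only by

$$P(\bar{x} > \mu \ \text{and}\ s^2 < \sigma^2) \ \ge\ 1 - P(\bar{x} \le \mu) - P(s^2 \ge \sigma^2) \approx 1 - 0.5 - 0.5 = 0.$$

In other words, while the sample size may satisfy the confidence level requirement, it provides no assurance that the reliability requirement will also be satisfied.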
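The argument can be checked with a small Monte Carlo sketch in the spirit of the experiment summarized in Table 1; it is not the authors' code. It draws samples of the LRM size from a Gamma population and counts how many meet both requirements. The paper does not spell out its counting rule, so `replicate_passes` uses one we assume: the sample reliability must be at most 0.1 and the interval $\bar{x}(1 \pm 0.1)$ must contain the true mean. All names are hypothetical.

```python
import math
import random

def replicate_passes(sample, mu, r=0.10, z=1.645):
    """Assumed rule: pass if sample reliability <= r AND xbar * (1 +/- r) contains mu."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    s = math.sqrt(math.fsum((x - xbar) ** 2 for x in sample) / (n - 1))
    r_hat = z * s / (xbar * math.sqrt(n))
    return r_hat <= r and abs(xbar - mu) <= r * xbar

def lrm_effectiveness(shape, reps=1000, seed=0, r=0.10, z=1.645):
    """Share of Gamma(shape, scale=1) samples of LRM size that pass both checks."""
    rng = random.Random(seed)
    n = math.ceil((z / (r * math.sqrt(shape))) ** 2)  # equation (2) with cv = 1/sqrt(k)
    mu = shape                                         # mean of Gamma(k, scale 1)
    passed = sum(
        replicate_passes([rng.gammavariate(shape, 1.0) for _ in range(n)], mu, r, z)
        for _ in range(reps)
    )
    return n, passed / reps

n, share = lrm_effectiveness(4.0)
```

For shape 4.0 the LRM size is 68, and the share of passing samples should come out far below 0.9, roughly one half, since the sample reliability falls below 0.1 in only about half of the samples. The exact figure depends on the seed and on the assumed counting rule.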

The sample size problem is further complicated by the distribution of electrical demand. An empirical investigation of the annual demand for electricity indicates that it is not normally distributed and is better described by other distributions [3]. Specifically, the two-parameter Gamma, Log-normal and Weibull distributions were found to closely describe the annual demand. Further investigation using both annual and monthly demand of residential customers in Texas confirmed these findings [4]. This suggests that the distribution of electrical demand in 15-30 min intervals is also well described by the two-parameter Gamma, Log-normal or Weibull distributions. The argument is strengthened by the fact that monthly and annual demands are aggregates of demand in 15-30 min intervals and, in addition, the sum of independent Gamma variables with a common scale parameter is itself Gamma distributed [5].

In the general case of confidence interval estimation it is only necessary to address the sampling distribution of the sample mean in determining the sample size. For most distributions and relatively large sample sizes, the sampling distribution of the sample mean is approximately normal. Hence, the underlying population distribution will have little or no impact on the determination of sample size. However, in estimating electrical demand for load research purposes, due to the unique nature of the reliability requirement, it is necessary to address the sampling distribution of the sample variance as well. Unlike the sampling distribution of the sample mean, the sampling distribution of the sample variance is not the same for all population distributions. For the Gamma, Log-normal and Weibull distributions (and for most other non-normal distributions), the sampling distribution of the sample variance is a function of the parameters that describe these distributions [6]. The existing procedure to determine sample size fails to consider this factor.
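The dependence of the sample variance on the population can be made explicit with a standard result for independent observations, added here as an illustration rather than taken from the paper. The variance of $s^2$ involves the population's excess kurtosis $\gamma_2$, which differs between the Gamma, Log-normal and Weibull families even when their means and variances agree:

$$\operatorname{Var}(s^2) = \sigma^4\left(\frac{2}{n-1} + \frac{\gamma_2}{n}\right), \qquad \gamma_2 = \frac{6}{k} \ \text{for a Gamma distribution with shape } k.$$

For a normal population $\gamma_2 = 0$ and the expression reduces to $2\sigma^4/(n-1)$. For a Gamma population with $k = 0.2$ (variance 5 when the mean is 1, as in Table 2), $\gamma_2 = 30$, so at $n = 1354$ the variance of $s^2$ is about 16 times the normal-theory value.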

Failing to consider these factors can be detrimental to the effectiveness of the estimation procedure. The results of a simulation experiment to determine the effectiveness of the LRM-suggested sample sizes are shown in Table 1. The last column of the table shows the percentage of samples satisfying both the confidence level and reliability requirements for several simulated Gamma populations. In all cases, the percentage of samples satisfying both requirements was found to be less than 50%. This implies that if an electric utility company uses the LRM-suggested sample size, the probability that it will satisfy the requirements is less than 50%. Stated differently, if several utility companies use this sample size, fewer than half of these companies can be expected to satisfy the requirements. Simulation experiments conducted using Log-normal and Weibull populations provide similar results [4].

Table 1. Effectiveness of LRM sample size for Gamma populations

Shape parameter   LRM sample size   Percentage of samples
4.0               68                48.1
3.6               76                47.5
3.2               85                48.7
2.8               97                46.5
2.4               113               47.6
2.0               136               48.4
1.6               170               48.2
1.2               226               49.9
0.8               339               47.6
0.4               677               49.2
0.2               1354              49.8

Table 2. Sample sizes determined by the empirical procedure

Population   Mean   Variance   LRM sample size   Empirical sample size
Gamma        1.0    5.00       1354              1655
Weibull      1.0    5.00       1354              1908
Log-normal   1.0    4.75       1287              4013

The importance of addressing the underlying population distribution is further highlighted by the results shown in Table 2, which provides the sample size required to satisfy both requirements for three populations. The Gamma and Weibull populations have the same mean (1.0) and variance (5.0). Since the LRM-suggested sample size addresses only the mean and variance, the sample size suggested for both populations is the same. However, the simulation results show that, due to the inherent nature of these distributions, the sample size required to satisfy the requirements for the Weibull population is significantly higher than that for the Gamma population (1908 versus 1655). In addition to these two populations, consider the Log-normal population with a mean of 1.0 and variance of 4.75. The LRM-suggested sample size for this population is lower than that of the Gamma and Weibull populations. The simulation results show, however, that the actual sample size required (4013) is more than twice that suggested for the Gamma and Weibull populations. Thus, it is clear that the existing procedure to determine sample size is inappropriate for estimating electrical demand.
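One way to see why populations with matched means and variances need different sample sizes is to compare their fourth moments. The following sketch, our illustration rather than the authors' computation, finds the parameter each family needs to match the Table 2 mean and variance and reports its excess kurtosis, using standard moment formulas. It does not reproduce the empirical sample sizes, but the ordering it produces (Gamma < Weibull < Log-normal) matches the ordering of those sizes. Function names are hypothetical.

```python
import math

def gamma_excess_kurtosis(mean, var):
    """Gamma shape k matching the mean and variance, and its excess kurtosis 6/k."""
    k = mean ** 2 / var
    return k, 6.0 / k

def weibull_cv2(k):
    """Squared coefficient of variation of a Weibull with shape k (independent of scale)."""
    return math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1

def weibull_excess_kurtosis(mean, var, lo=0.05, hi=50.0):
    """Weibull shape k matching the CV (found by bisection), and its excess kurtosis."""
    target = var / mean ** 2
    for _ in range(200):                 # CV^2 falls as k rises
        mid = 0.5 * (lo + hi)
        if weibull_cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    m1, m2, m3, m4 = (math.gamma(1 + j / k) for j in range(1, 5))  # raw moments, scale 1
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4        # fourth central moment
    return k, mu4 / (m2 - m1 ** 2) ** 2 - 3

def lognormal_excess_kurtosis(mean, var):
    """Log-scale sigma matching the CV, and the Log-normal excess kurtosis."""
    sigma2 = math.log(1 + var / mean ** 2)
    w = math.exp(sigma2)
    return math.sqrt(sigma2), w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 6

def var_of_sample_variance(var, excess_kurtosis, n):
    """Var(s^2) for n independent draws: sigma^4 * (2/(n-1) + excess/n)."""
    return var ** 2 * (2 / (n - 1) + excess_kurtosis / n)

populations = {
    "Gamma": gamma_excess_kurtosis(1.0, 5.00),
    "Weibull": weibull_excess_kurtosis(1.0, 5.00),
    "Log-normal": lognormal_excess_kurtosis(1.0, 4.75),
}
for name, (param, excess) in populations.items():
    var = 4.75 if name == "Log-normal" else 5.00
    print(f"{name:10s} parameter={param:.3f} excess kurtosis={excess:.1f} "
          f"Var(s^2) at n=1354: {var_of_sample_variance(var, excess, 1354):.3f}")
```

With mean 1 and variance 5, the matching Weibull has shape 0.5 and excess kurtosis of about 84.7, against 30 for the matching Gamma, so its sample variance is far more erratic at the same n.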

THE NEED FOR A DSS

Several authors have observed that a decision maker (DM) will be well served by a DSS in complex decision making situations [7-10]. The previous section indicates that the determination of sample size for estimating electrical demand is a complex and difficult task. In order to determine the appropriate sampling procedure, it is also necessary to address several other difficult factors, such as stratification and population changes, as components of sample design. An additional complication is that certain sectors require special meters to estimate demand in 15-30 min intervals. In most situations, this involves significant time and financial investment; it may take as long as 6 months to design and install an appropriate sample. The complexity of the task combined with the cost factor makes it almost mandatory that DMs use advanced analytic tools in an effort to make the most effective possible decision. The DSS proposed in this study serves this purpose. Specifically, the DSS will assist the DM in the following functions:

(1) identification of the distribution of electrical demand;
(2) determination of sample size;
(3) determination of the effectiveness of stratification and, if stratification is to be used:
    (a) determination of the optimum number of strata;
    (b) determination of strata boundaries; and
    (c) determination of the sample size within each stratum;
(4) identification of changes in the population over time; and
(5) responses to “what if” questions posed by the load research DM.

The proposed DSS has been designed with the widely accepted characteristics suggested by Sprague and Carlson [10], namely: (1) a combination of analytic techniques and traditional data access; (2) ease of use; and (3) flexibility and adaptability to accommodate changes in the environment. Special attention was also paid to the system attributes emphasized by Medsker [11] in developing a DSS for energy policy analysis, namely, adaptive filtering, system memory, processing facilities, communications and inquiry/response. Another attractive feature of this DSS is that it can be used by any electric utility (large or small), in any area of the country, with few or no changes, since it describes the distribution of electrical demand from each utility's own data and selects the appropriate sampling procedure for that demand.

The DSS described in this study has several unique features that are not generally available in DSSs currently used by electric utility companies. For example, it provides an integrated mechanism for addressing all aspects of the sampling procedure. It is also able to monitor changes in the population distribution. Finally, it is one of the first DSSs that incorporates the concepts of computer-intensive statistics to provide a comprehensive system for designing samples for load research purposes.

DESCRIPTION OF THE DSS

Identifying the distribution of electrical demand

Muralidhar [4] showed that the underlying distribution of electrical demand has a significant impact on the sampling procedure and, specifically, on the determination of the sample size. Hence, the first step in designing the sampling procedure is to identify the distribution of electrical demand. This requires historical data from electric utility companies regarding the consumption of electricity.

The data should preferably be in the form of small time units (15-30 min) rather than annual consumption, since the DSS is concerned with the estimation of electrical demand in 15-30 min intervals. If the data are, in fact, provided in the specified time unit, the system selects the period of peak load. The demand data for this peak period are then used in further analysis to determine the appropriate sampling procedure.
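The peak-period selection can be sketched as follows. The paper does not say how the DSS locates the peak; this sketch assumes it is the interval with the largest combined demand across the sampled customers (a utility might instead take the peak from system load records). The customer ids and readings are hypothetical.

```python
def peak_interval_demands(loads_kw):
    """loads_kw maps a (hypothetical) customer id to equal-length interval demands in kW.
    Returns the interval with the largest combined demand and each customer's value there."""
    series = list(loads_kw.values())
    n_intervals = len(series[0])
    if any(len(s) != n_intervals for s in series):
        raise ValueError("all customers need the same number of intervals")
    totals = [sum(s[t] for s in series) for t in range(n_intervals)]
    peak = max(range(n_intervals), key=totals.__getitem__)
    return peak, {cid: s[peak] for cid, s in loads_kw.items()}

loads = {
    "cust-01": [1.2, 2.8, 3.1, 1.0],
    "cust-02": [0.9, 3.5, 2.2, 0.8],
    "cust-03": [1.5, 2.1, 2.9, 1.7],
}
peak, demands_at_peak = peak_interval_demands(loads)
# peak == 1; demands_at_peak == {"cust-01": 2.8, "cust-02": 3.5, "cust-03": 2.1}
```

The per-customer values in `demands_at_peak` are the observations whose distribution is then fitted.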

Using such data, the DSS determines the parameters that best fit the data for each of the three distributions listed earlier [5]. The DM is given a plot of the actual demand and each of the three fitted distributions. The system also determines which of the three distributions fits the data best using the chi-square goodness-of-fit test [12]. Further, the DM is provided with the degree of variation for each of the three distributions. An example of the output is given in Table 3.
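A minimal, standard-library sketch of this fitting step follows. The paper does not say how the DSS estimates parameters, so we assume method-of-moments fits (the parameters therefore need not match Table 3) and a Pearson chi-square statistic with cut points at the sample deciles, where a smaller statistic indicates a closer fit. The Gamma CDF uses the usual power series for the regularized incomplete gamma function. Names and the synthetic data are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist

def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x/scale) via its power series."""
    if x <= 0:
        return 0.0
    y = x / scale
    term = total = 1.0 / shape
    n = 0
    while term > 1e-15 * total and n < 10_000:
        n += 1
        term *= y / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(y) - y - math.lgamma(shape)))

def weibull_shape_from_cv2(cv2, lo=0.05, hi=50.0):
    for _ in range(200):  # the Weibull CV^2 decreases as the shape k increases
        mid = 0.5 * (lo + hi)
        if math.gamma(1 + 2 / mid) / math.gamma(1 + 1 / mid) ** 2 - 1 > cv2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_candidates(data):
    """Method-of-moments fits; returns name -> (shape-type parameter, cdf)."""
    n = len(data)
    m = math.fsum(data) / n
    v = math.fsum((x - m) ** 2 for x in data) / (n - 1)
    k_g = m * m / v
    k_w = weibull_shape_from_cv2(v / (m * m))
    lam = m / math.gamma(1 + 1 / k_w)
    s2 = math.log(1 + v / (m * m))
    ln = NormalDist(math.log(m) - s2 / 2, math.sqrt(s2))
    return {
        "Gamma": (k_g, lambda x: gamma_cdf(x, k_g, v / m)),
        "Weibull": (k_w, lambda x: 1 - math.exp(-((x / lam) ** k_w)) if x > 0 else 0.0),
        "Log-normal": (math.sqrt(s2), lambda x: ln.cdf(math.log(x)) if x > 0 else 0.0),
    }

def chi_square(data, cdf, bins=10):
    """Pearson statistic with cut points at the sample deciles (roughly equal observed counts)."""
    xs = sorted(data)
    n = len(xs)
    edges = [xs[(i * n) // bins] for i in range(1, bins)]
    cum = [0.0] + [cdf(e) for e in edges] + [1.0]
    observed = [0] * bins
    for x in xs:
        observed[bisect.bisect_left(edges, x)] += 1
    stat = 0.0
    for o, lo_p, hi_p in zip(observed, cum, cum[1:]):
        expected = n * (hi_p - lo_p)
        if expected > 0:
            stat += (o - expected) ** 2 / expected
    return stat

rng = random.Random(42)
demand = [rng.weibullvariate(1.0, 1.83) for _ in range(2000)]  # synthetic peak demands
fits = {name: (param, chi_square(demand, cdf))
        for name, (param, cdf) in fit_candidates(demand).items()}
best = min(fits, key=lambda name: fits[name][1])
```

Comparing raw statistics is a simplification; a formal test would refer each statistic to a chi-square distribution with degrees of freedom reduced for the estimated parameters. As in the DSS, the final choice of distribution would still rest with the DM.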

While the system provides information to the DM regarding the best fit distribution, the actual decision to use a specific distribution for further analysis is left to the DM since a few data points may heavily influence this result. Further, it is necessary to incorporate the expertise of the DM at this critical stage, since this will be the basis for all further analysis. The DM is thus allowed to change the distribution and/or the parameter. The system would then provide the DM with a plot of demand and relative fit for his/her choice of distribution/parameter of best fit. The DM is allowed to iterate this procedure until satisfactory results are achieved. He/she can then either pick the best fit according to the system or provide his/her own best fit to be used in further analysis.

Determination of sample size

The sample size necessary to satisfy the requirements of PURPA can be determined using three methods. Each method has its own strengths and weaknesses. The DM will have the opportunity within the DSS to determine the method best suited for each specific case. A description of the three methods used in the DSS follows.

Analytical method. Muralidhar [4] developed an analytical method, jointly addressing the sampling distributions of the sample mean and sample variance, that can be used to determine a sample size that guarantees the requirements of PURPA will be satisfied. Details of the procedure are provided in the Appendix. The input required at this stage is the level of confidence/reliability to be achieved. Using this specified level and the best fit distribution from the previous stage of the DSS, the system will determine the sample size.

Table 3. Distribution fit of the sample data

Distribution    Gamma                    Weibull    Log-normal
Parameter       3.87                     1.83       0.32
Relative fit    2.5% higher variance†    Best fit   5.6% higher variance†

Distribution of best fit: Weibull
Parameter of best fit distribution: 1.83

†Relative to the best fit distribution/parameter.

Best fit distribution: Weibull
Parameter: 1.83
Sample size determined: 118

This procedure is easy to use. However, the sample size it determines is conservative (that is, the sample size is over-estimated). The degree of over-estimation (i.e. the difference between the determined and the minimum sample size) depends on the form of the underlying distribution; the difference may vary from 1 to 900 [4]. The DM must therefore be provided with this information. Using the data from Table 3, the output resulting from this procedure is shown in Table 4. Note that Table 4 provides not only the sample size as determined by the analytical procedure but also the degree of over-estimation that could arise from using this procedure. If the DM feels that the degree of over-estimation is high, he/she may decide to use one of the other two procedures that are available in the DSS.

Note further that while the analytical procedure provides the DM with the sample size necessary to satisfy PURPA requirements, it does not offer him/her the level of confidence/reliability achieved by a specified sample size for the same distribution. If the DM seeks this information, the simulation procedure is required.

Empirical determination. The sample size can also be determined by using simulated distributions to represent the annual demand for electricity [13]. The input required by this procedure is the level of confidence/reliability that is to be achieved. This level could be the level meeting PURPA requirements, or some other level desired by the DM.

By drawing repeated samples of a specific size from the simulated distribution of electrical demand, constructing the confidence intervals and computing the reliability, the DSS can determine whether this sample size satisfies the stated requirements. If the requirements are not satisfied, the sample size is incremented and the procedure repeated. The procedure terminates when a sample size that satisfies the requirements is reached. This procedure thus provides the DM with the minimum sample size that will satisfy the requirements. The number of repeated samples drawn from the simulated distribution has been set to 1000 (with 1000 samples, the estimated level is precise to roughly the second decimal place).
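The incremental search can be sketched as below; it is a hypothetical rendition, not the DSS code. Since the paper does not state exactly how a replicate is judged, `replicate_passes` assumes a replicate passes when its sample reliability is at most 0.1 and the interval $\bar{x}(1 \pm 0.1)$ contains the true mean; the search then returns the first size, on a grid of step 10 (the spacing of most rows in Table 5), at which the passing share reaches the target. Results under this rule need not match Table 5.

```python
import math
import random

def replicate_passes(sample, mu, r=0.10, z=1.645):
    """Assumed rule: pass if sample reliability <= r AND xbar * (1 +/- r) contains mu."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    s = math.sqrt(math.fsum((x - xbar) ** 2 for x in sample) / (n - 1))
    return z * s / (xbar * math.sqrt(n)) <= r and abs(xbar - mu) <= r * xbar

def achieved_level(draw, mu, n, reps=1000):
    """Share of `reps` simulated samples of size n that pass (standard error ~0.01 near 0.9)."""
    return sum(replicate_passes([draw() for _ in range(n)], mu) for _ in range(reps)) / reps

def minimum_sample_size(draw, mu, target=0.90, start=30, step=10, reps=1000, max_n=5000):
    """Increase n by `step` until the achieved level reaches `target`."""
    n = start
    while n <= max_n:
        level = achieved_level(draw, mu, n, reps)
        if level >= target:
            return n, level
        n += step
    raise ValueError("target level not reached by max_n")

rng = random.Random(1)
shape = 1.83                                  # Weibull shape with scale 1, as in Table 5
mu = math.gamma(1 + 1 / shape)                # population mean of that Weibull
n_min, level = minimum_sample_size(lambda: rng.weibullvariate(1.0, shape), mu, reps=400)
```

The usage line uses 400 replicates only to keep the run short; the DSS described in the text uses 1000. Because each step is a fresh simulation, the returned size is itself a random quantity near the true minimum.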

A sample output resulting from this procedure is provided in Table 5. For the distribution of best fit, Table 5 provides the minimum sample size that will satisfy the requirements of PURPA (or the level of confidence/reliability specified by the DM), and the level of confidence/reliability that can be achieved by specific sample sizes. Using this information, the DM can perform a cost/benefit analysis of increasing or decreasing the sample size. The information can also be used to determine the effectiveness of a currently existing sample.

Empirical determination of sample size using the simulation procedure thus provides the DM with the minimum sample size that will satisfy the requirements for a specified distribution. It should be noted, however, that this procedure is more computer-intensive than either the analytical procedure, described above, or the use of an existing data base, described next.

Using an existing data base. The third method that can be used by the DM to determine the sample size is to use an existing data base available within the DSS. This data base provides the DM with the same information as the simulation procedure, namely, the minimum sample size that will satisfy the (minimum) requirements of PURPA, and the level of confidence/reliability that can be achieved by specified sample sizes. The data base is created or revised automatically whenever sample size simulations are run. The results provided to the DM are therefore the simulation results for the parameter value in the data base that is closest to the best fit distribution, with the parameter value always selected on the worse side. For example, if the best fit distribution was a Gamma with shape parameter 3.28, and the data base contains sample size information for 3.25 and 3.3, then the distribution selected from the data base would be the Gamma with shape parameter 3.25, even though 3.3 is numerically closer. This is simply because a sample size determined for a Gamma distribution with shape parameter 3.3 may not satisfy the requirements for a Gamma distribution with shape parameter 3.28. By selecting a relatively worse case, however, the sample size determined using the data base procedure would be higher than the minimum sample size.

Table 5. Confidence/reliability estimates for sample sizes using simulation procedure

Best fit distribution: Weibull    Parameter: 1.83

Sample size   Level of confidence/reliability
30            0.000
40            0.005
50            0.020
60            0.116
70            0.362
80            0.614
90            0.834
100           0.888
110†          0.900
120           0.915
125‡          0.926

†Optimum sample size for PURPA requirements.
‡Sample size needed for requirements specified by the user. User specified level of confidence is 0.925 and reliability is 0.1.

Table 6. Approximate sample size determination using existing data base

Best fit distribution: Weibull    Parameter: 1.83
Distribution used: Weibull        Parameter: 1.8

Sample size   Level of confidence/reliability
30            0.000
40            0.000
50            0.004
60            0.018
70            0.066
80            0.226
90            0.507
100           0.756
110           0.878
120†          0.903

†Sample size to satisfy PURPA requirements.

Note: this is based on prior simulation results. The parameter used in the simulation is 1.8, while the actual parameter estimate of best fit is 1.83. Thus the optimal sample size represents a conservative estimate. The exact sample size would be in the range of 105-120.

The output resulting from this procedure is listed in Table 6. Table 6 not only provides the DM with the sample size for the distribution and parameter value selected, but also indicates that the sample size determined using the data base may not be the minimum. It also gives the range for the minimum sample size. If the DM feels that there is a significant increase in the sample size, then he/she can use the simulation procedure to determine the minimum sample size. As in the analytical determination of sample size, this trade-off between the cost of the additional samples and the cost (if any) of using the computer-intensive simulation procedure must be evaluated by the DM. It should also be noted that the data base procedure cannot be used to determine the sample size necessary to satisfy requirements that are stricter than that of PURPA (say, confidence of 0.95 and reliability of 0.08), since prior simulations used to create the data base had been run only with the minimum levels specified by PURPA.
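The worst-case lookup in the data base can be sketched with a sorted grid of stored parameter values. This assumes, consistent with Table 1 where the LRM size grows as the Gamma shape falls, that a smaller shape parameter is the worse case for Gamma and Weibull populations (their coefficient of variation rises as the shape falls), while for a Log-normal sigma the larger value is worse. The function name and the stored grid are hypothetical. An exact match is returned as-is, mirroring the reuse of stored results described below.

```python
import bisect

def worst_case_stored_parameter(fitted, stored, smaller_is_worse=True):
    """Closest stored parameter on the 'worse' (larger sample size) side of the fitted value.

    smaller_is_worse=True suits Gamma and Weibull shapes; pass False for a Log-normal
    sigma, where larger values are worse."""
    grid = sorted(stored)
    if smaller_is_worse:
        i = bisect.bisect_right(grid, fitted)   # entries <= fitted lie before index i
        if i == 0:
            raise LookupError("no stored value on the worse side; run the simulation")
        return grid[i - 1]
    i = bisect.bisect_left(grid, fitted)        # entries >= fitted start at index i
    if i == len(grid):
        raise LookupError("no stored value on the worse side; run the simulation")
    return grid[i]

gamma_grid = [3.15, 3.2, 3.25, 3.3, 3.35]       # hypothetical stored Gamma shapes
chosen = worst_case_stored_parameter(3.28, gamma_grid)   # 3.25, as in the text's example
```

When no stored value lies on the worse side, the sketch raises an error rather than returning an optimistic neighbour; in the DSS that case would call for the simulation procedure.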

Finally, if the DM wishes to use the simulation procedure for determining the sample size that satisfies the requirements for a specified distribution and parameter value for which the results already exist in the data base, then the system would revert to the data base rather than repeat the simulation procedure, thus saving both time and money.

The DM is thus provided with three methods to determine sample size that will satisfy the requirements of PURPA. Two of the methods described are quick, while the third (empirical determination using simulation) is relatively more computer-intensive and time consuming. On the other hand, the simulation procedure provides the DM with the minimum sample size that will satisfy the requirements. In this way, the DM is provided with information regarding the strengths and weaknesses of each procedure, allowing him/her to choose the procedure that is most suitable for a given situation.

Stratified sampling

The use of stratified sampling procedures, in general, provides better estimation efficiency than simple random sampling [14]. This may not be true, however, in specific cases. Further, the gains in estimation efficiency available from stratified sampling decrease with increasing sample size. These procedures also present several practical problems, such as the migration of observations from one stratum to another or, in some cases, out of the population entirely. This problem is particularly pronounced in strata where the number of observations is extremely low. In such cases, migration of a few observations may significantly affect the estimation results. If corrective action is not taken, the resulting estimates may not reflect the true population.

If stratification significantly reduces the sample size required to satisfy the stated requirements, however, the DM may wish to use stratified sampling procedures in spite of these problems. The DM must therefore be provided with the appropriate information so that he/she can consider the trade-off between using stratified or simple random sampling. If stratified sampling procedures are chosen, the system must provide the DM with the optimum number of strata, the strata boundaries and the allocation of samples within each stratum. The proposed DSS uses two methods to provide the DM with such information.

Empirical determination of the effectiveness of stratification. As in the determination of sample size, an empirical methodology using simulated distributions of electrical demand is used to determine the effectiveness of stratification. In this process, the DM can specify either the maximum number of strata to be used or the minimum sample size in each stratum, together with the required level of confidence/reliability. This allows the DM to control the sample size in each stratum so as to minimize the effect of migration. If the DM does not specify these values, then the system determines the sample size required to satisfy the minimum PURPA requirements. The technique proposed by Dalenius and Hodges [15] is used to determine the strata boundaries, and the Neyman allocation [14] is used to determine the number of observations to be collected from each stratum.
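The two rules named above can be sketched as follows, under stated assumptions. The Dalenius-Hodges cum √f rule forms a fine histogram of the demand values and accumulates the square roots of the class frequencies. It then places the strata boundaries at equal intervals on that cumulative scale. The Neyman allocation assigns n_h in proportion to N_h S_h. Several elements are illustrative choices, not taken from the DSS: the 200-class histogram, the largest-remainder rounding, and the simulated Weibull population (shape 1.83, hypothetical scale 700 kWh). The sketch also does not enforce the DM's minimum sample size per stratum, and it does not cap n_h at N_h.

```python
import math
import random
from statistics import pstdev


def cum_sqrt_f_boundaries(values, n_strata, n_classes=200):
    """Dalenius-Hodges cum sqrt(f) rule; returns at most n_strata - 1 boundaries."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        counts[min(int((v - lo) / width), n_classes - 1)] += 1
    cum, running = [], 0.0
    for c in counts:
        running += math.sqrt(c)              # cumulate sqrt of class frequencies
        cum.append(running)
    step = running / n_strata                # equal intervals on the cum sqrt(f) scale
    bounds, target = [], step
    for k, c in enumerate(cum):
        if len(bounds) == n_strata - 1:
            break
        if c >= target:
            bounds.append(lo + (k + 1) * width)  # upper edge of class k
            target += step
    return bounds


def neyman_allocation(values, bounds, n_total):
    """Neyman allocation: n_h proportional to N_h * S_h, rounded to sum to n_total."""
    edges = [-math.inf] + list(bounds) + [math.inf]
    strata = [[v for v in values if edges[h] <= v < edges[h + 1]]
              for h in range(len(edges) - 1)]
    weights = [len(s) * pstdev(s) if s else 0.0 for s in strata]
    total = sum(weights)
    raw = [n_total * w / total for w in weights]
    alloc = [math.floor(x) for x in raw]
    # largest-remainder rounding so the allocations add up to n_total
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[: n_total - sum(alloc)]:
        alloc[h] += 1
    return alloc


if __name__ == "__main__":
    rng = random.Random(1)
    pop = [rng.weibullvariate(700.0, 1.83) for _ in range(20000)]  # hypothetical population
    b = cum_sqrt_f_boundaries(pop, 6)
    print([round(x) for x in b], neyman_allocation(pop, b, 105))
```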

The output of this procedure gives the minimum sample size necessary to satisfy the requirements using stratified sampling, together with the optimum number of strata, the strata boundaries and the allocation of the sample to each stratum for this sample size. An example of the output is provided in Table 7. The DM may iterate this procedure by varying the parameters until acceptable confidence/reliability levels are met.
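Given a set of boundaries and an allocation, the achieved level for a stratified design can be estimated by simulation. The sketch below reuses the boundaries and allocation reported in Table 7, but it rests on three assumptions. First, it draws from a Weibull population with shape 1.83 and a hypothetical 700 kWh scale. Second, it estimates the variance of the stratified mean as Σ W_h² s_h²/n_h, without a finite-population correction. Third, it counts a sample as satisfying the requirement when Z times the estimated standard error is no more than r times the stratified mean. Because the scale and the exact criterion are assumptions, its output should not be expected to reproduce the levels in Table 7.

```python
import math
import random
from statistics import NormalDist


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def weibull_inv(u, shape, scale):
    u = min(u, 1.0 - 1e-15)  # guard against log(0) at the upper tail
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def stratified_level(bounds, alloc, shape, scale, confidence=0.90, r=0.10,
                     trials=1000, seed=0):
    """Share of simulated stratified samples whose own half-width is within r of the estimate."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    edges = [0.0] + list(bounds) + [math.inf]
    cdf = [weibull_cdf(e, shape, scale) for e in edges]
    w = [cdf[h + 1] - cdf[h] for h in range(len(alloc))]  # stratum weights W_h
    hits = 0
    for _ in range(trials):
        mean_st = var_st = 0.0
        for h, n_h in enumerate(alloc):
            # draw n_h values from the Weibull truncated to stratum h (inverse-CDF method)
            ys = [weibull_inv(rng.uniform(cdf[h], cdf[h + 1]), shape, scale)
                  for _ in range(n_h)]
            m = sum(ys) / n_h
            s2 = sum((y - m) ** 2 for y in ys) / (n_h - 1)
            mean_st += w[h] * m
            var_st += w[h] ** 2 * s2 / n_h  # estimated variance of the stratified mean
        hits += z * math.sqrt(var_st) <= r * mean_st
    return hits / trials


if __name__ == "__main__":
    BOUNDS = [300, 450, 600, 800, 1100]   # kWh, from Table 7
    ALLOC = [20, 5, 10, 19, 21, 30]       # sample within strata, from Table 7
    print(stratified_level(BOUNDS, ALLOC, 1.83, 700.0))
```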

Using an existing data base. The effectiveness of stratification can also be determined using an existing data base of results from the simulation procedure described above. These data include values for parameters of the Gamma, Weibull and Log-normal distributions. The results are not comprehensive, however, in that they are restricted to specified parameter values. Further, the DM does not have the option of specifying the maximum number of strata or the minimum sample size in each stratum. As in the procedure to determine the sample size using an existing data base, the parameter selected is the closest worse-case value relative to the distribution of best fit.
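The selection of the closest worse-case parameter can be made concrete by ranking tabulated parameters by the ratio of standard deviation to mean. This is the same measure the paper uses later to describe a change "for the worse". The sketch assumes this ranking rule. It uses the standard coefficient-of-variation formulas for the Weibull (shape k), Gamma (shape k) and Log-normal (log-scale σ) distributions. The parameter grids in the tests are hypothetical, apart from the 1.83 to 1.8 case reported in Table 8.

```python
import math

# Coefficient of variation (standard deviation / mean) as a function of the shape parameter.
CV = {
    "weibull": lambda k: math.sqrt(math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1),
    "gamma": lambda k: 1 / math.sqrt(k),
    "lognormal": lambda s: math.sqrt(math.exp(s * s) - 1),
}


def worse_case_parameter(dist, best_fit, tabulated):
    """Tabulated parameter with the smallest CV that is still >= the best-fit CV."""
    cv_fit = CV[dist](best_fit)
    worse = [p for p in tabulated if CV[dist](p) >= cv_fit]
    if not worse:
        raise ValueError("no tabulated parameter is at least as unfavorable as the best fit")
    return min(worse, key=lambda p: CV[dist](p))


if __name__ == "__main__":
    # Best fit Weibull 1.83; a data base holding 1.4, 1.6, ... would supply 1.8 (cf. Table 8).
    print(worse_case_parameter("weibull", 1.83, [1.4, 1.6, 1.8, 2.0, 2.2]))
```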

Table 8 shows an example of the output resulting from this procedure. The table provides, for several sample sizes, the level of confidence and reliability achieved by stratified sampling procedures. The table also indicates the degree of over-estimation that results from selecting the worse-case parameter.

It should be noted that the decision support system does not decide whether simple random sampling or stratified sampling is to be used. This determination must be made by the DM based on an evaluation of the trade-off between the relative complexity and the practical problems involved in stratified sampling versus the increased efficiency in estimation provided by this procedure. If the DM decides to use stratified sampling, then the results generated by the DSS can be used straightaway to construct the samples.

Monitoring changes in the population over time

As mentioned in the Introduction, load research is a continuing operation. According to PURPA, the results of load research must be reported at specified time intervals. Since the distribution of electrical demand is constantly changing over time, a sample designed for a specific time period may fail to satisfy the requirements in another time period. In such cases, electric utility companies must monitor the change in the population and adjust their samples to meet the requirements for the new population. One way to detect a change in the population is to study the complete population, but this is a time-consuming and expensive procedure. The proposed DSS, on the other hand, uses a statistical technique called bootstrapping to monitor the change in the distribution of electrical demand using only the sample data.

Table 7. Level of confidence/reliability with stratification using simulation procedure

Best fit distribution: Weibull    Parameter: 1.83

Sample size    Level of confidence/reliability
25             0.752
50             0.827
75             0.857
100            0.885
105†           0.905

†Optimal sample size for PURPA requirements.

Optimal stratification scheme
Best fit distribution: Weibull    Parameter: 1.83
Optimal number of strata: 6    Optimal total sample size: 105

Stratum    Lower limit (kWh)    Upper limit (kWh)    Sample within stratum
1          0                    300                  20
2          300                  450                  5
3          450                  600                  10
4          600                  800                  19
5          800                  1100                 21
6          1100                 INF                  30

Table 8. Level of confidence/reliability with stratification using existing data base

Best fit distribution: Weibull    Parameter: 1.83
Distribution used in the analysis: Weibull    Parameter: 1.8

Sample size    Level of confidence/reliability
25             0.732
50             0.823
75             0.869
100            0.889
115†           0.904

†Optimal sample size for PURPA requirements.

Optimal stratification scheme
Optimal number of strata: 6    Total sample size: 115

Stratum    Lower limit (kWh)    Upper limit (kWh)    Sample within stratum
1          0                    300                  24
2          300                  450                  5
3          450                  600                  12
4          600                  800                  17
5          800                  1100                 22
6          1100                 INF                  35

Note: the above results are based on the Weibull distribution with a parameter of 1.8. The best fit parameter is 1.83. Hence the results are a conservative estimate, and the design would provide a higher level of confidence/reliability. The exact sample size would be in the range 100-115.

Bootstrapping was proposed by Efron [16] as a means of empirically constructing the sampling distribution of a statistic using only the data from a single sample rather than from the entire population [16, 17]. To construct the empirical sampling distribution of the statistic (called the bootstrap distribution of the statistic), repeated realizations of the statistic (in the case of electric utility companies, the mean) are generated by taking random samples, with replacement, from the sample data. The frequency distribution of the statistic is then constructed. This frequency distribution is an estimate of the sampling distribution of that statistic. Research evidence indicates that the bootstrap distribution does provide a good estimate [18-20]. Muralidhar [4] has also shown that the bootstrap distribution of the sample mean is capable of effectively monitoring changes in the distribution of electrical demand. Hence, by constructing the bootstrap distribution of the sample mean and comparing it to that of a previous time period, the DSS can provide the DM with key information regarding changes in the population.
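The bootstrap construction just described can be sketched in a few lines of Python. This is a generic illustration, not the DSS code. It resamples the observed sample with replacement at the original sample size and records the mean of each resample. It then tabulates the resulting values in an equal-width frequency distribution. The number of resamples (2000) and of bins (20) are arbitrary choices.

```python
import random


def bootstrap_means(sample, n_boot=2000, seed=0):
    """Bootstrap distribution of the sample mean: resample with replacement, same size."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)]


def frequency_distribution(values, n_bins=20):
    """Equal-width histogram of the bootstrap means: (bin lower edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [lo + i * width for i in range(n_bins)], counts


if __name__ == "__main__":
    demand = [float(x) for x in range(1, 101)]  # placeholder data, not load-research data
    edges, counts = frequency_distribution(bootstrap_means(demand))
    print(counts)
```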

When the sample data for any period become available, the bootstrap distribution of the sample mean is constructed for this period. This bootstrap distribution is then compared to the bootstrap distribution of the sample mean corresponding to the worst case period for which the sample was constructed. The comparison of the two bootstrap distributions may indicate no change, a change for the better (lower ratio of standard deviation to the mean), or a change for the worse (higher ratio of standard deviation to the mean). No change would indicate that the sample design currently being used would continue to work. A change for the better would indicate that the current sample design would more than meet the requirements for the new distribution. A change for the worse would indicate that the current sample design may fail to satisfy the specified requirements, and that the manager must redesign the sample. In such cases, the stages of the DSS described in the earlier sections would again be used to determine the new appropriate sampling procedure.
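The comparison step can be sketched as follows. The sketch assumes that the classification rests on the ratio of standard deviation to mean of each period's bootstrap distribution. It uses a hypothetical 5% tolerance band to separate "no change" from a real change. The tolerance is a placeholder, and the final judgment remains with the DM.

```python
import random
from statistics import fmean, pstdev


def bootstrap_means(sample, n_boot=2000, seed=0):
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)]


def bootstrap_cv(sample, n_boot=2000, seed=0):
    """Ratio of standard deviation to mean of the bootstrap distribution of the mean."""
    means = bootstrap_means(sample, n_boot, seed)
    return pstdev(means) / fmean(means)


def compare_periods(reference_sample, new_sample, tol=0.05, n_boot=2000, seed=0):
    """Classify the new period relative to the worst-case (reference) period."""
    ratio = (bootstrap_cv(new_sample, n_boot, seed)
             / bootstrap_cv(reference_sample, n_boot, seed + 1))
    if ratio > 1.0 + tol:
        return "worse"    # current design may fail; consider redesigning the sample
    if ratio < 1.0 - tol:
        return "better"   # current design more than meets the requirements
    return "no change"


if __name__ == "__main__":
    rng = random.Random(3)
    ref = [rng.weibullvariate(700.0, 2.5) for _ in range(100)]  # hypothetical data
    new = [rng.weibullvariate(700.0, 1.2) for _ in range(100)]
    print(compare_periods(ref, new))
```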

In redesigning the sampling procedure, the DM has two options. The first is to repeat the complete procedure, starting from the identification of the distribution of electrical demand, when such data become available. The second is to use the estimate provided by the bootstrap distribution directly and then employ the identified distribution to redesign the sampling procedure.

It should be noted that while bootstrapping may be used to monitor changes in the population, the decision to redesign the sample must be initiated by the DM. To aid the DM in this effort, a plot of the two bootstrap distributions being compared is provided. Using this information, the DM may then decide to redesign the sampling procedure to satisfy the requirements for the new distribution.

The redesigning of the sampling procedure is a key element in the construction of the DSS. Electric utility companies realize that demand distributions change over time and that monitoring these changes is a critical requirement for effective planning. According to PURPA, if an electric utility company fails to meet the requirements in a specified time period, then the sampling procedure must be modified so as to show an improvement in the subsequent time period. The proposed DSS provides the DM with a tool to monitor important changes and to satisfy the specific requirements placed on load research.

SENSITIVITY ANALYSIS

The proposed DSS can also be used to answer the DM's “what if” questions; each of the stages described above provides this capability. In identifying the distribution of electrical demand, the DM uses the “what if” capability by changing the specification of the distribution and parameter value to determine the best fit distribution. This allows the DM's expertise to be incorporated.

The DM can, in addition, use the existing data base and the simulation procedures to determine how sensitive the sample sizes are to changes in the distribution and/or parameters. For example, one “what if” question that might be raised by the DM is: “What levels of confidence and reliability would a sample size of 100 achieve if the new population is a Gamma with shape parameter 3.8?” This question can be answered using either the simulation or the data base procedure. Even if the results for this specific distribution and parameter are not in the data base, interpolation of the existing results will provide the DM with a close approximation of the levels of confidence and reliability achieved. In general, the data base procedure is preferable to simulation for answering these “what if” questions when time is a critical consideration. Finally, it should be noted that the analytical procedure can also be used to determine the sample size required for hypothetical distributions, and that stratified sampling also allows sensitivity analyses to be performed.
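Interpolating the data base results can be as simple as piecewise-linear interpolation between adjacent tabulated entries. The sketch below shows one possible approach, not necessarily the one the DSS uses. It works the same way over a parameter grid (for example, Gamma shape values around 3.8) as over sample sizes. No Gamma results are tabulated in this paper, so the example interpolates the sample-size column of Table 7 instead. Extrapolation beyond the tabulated range is deliberately refused.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of tabulated ys at x (xs strictly increasing)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range; no extrapolation")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    sizes = [25, 50, 75, 100, 105]                 # Table 7, sample size
    levels = [0.752, 0.827, 0.857, 0.885, 0.905]   # Table 7, confidence/reliability
    print(interpolate(sizes, levels, 90))
```

For a sample size of 90, the value lies between the tabulated levels at 75 and 100, at approximately 0.874.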

CONCLUSIONS

Accurate estimation of customer demand for electricity plays an important role in the determination of electricity rates and is also used as input in several other utility planning functions such as load management programs, load forecasting and rate experiments. Errors in estimation can seriously affect company plans and consequently have a detrimental effect on the long-term financial welfare of electric utility companies and their customers. The DSS described in this study is designed to address all aspects of the sampling procedure for estimating customer demand (including the unique requirements of PURPA). It was shown that selection of the appropriate sampling procedure is crucial to the success of the demand estimation program.

There are, however, situations where electric utility companies may not be able to use the most appropriate sampling procedure because of severe cost constraints. The guidelines outlined by the FERC recognize such constraints and allow electric utility companies to improve their estimation procedures over time rather than requiring them to achieve the standards immediately. In such situations, the proposed DSS will aid utility companies in determining the impact of not choosing the most appropriate sampling procedure. It will also supply information on how modifications to the existing procedure can help improve their estimation results.

Finally, one of the truly attractive features of the DSS is that it can be applied in areas other than load research. The problem of confidence interval requirements being different from those of the general case is not limited to electric utility companies but can be found in such diverse fields as auditing [21], inventory control [22] and marketing [23]. The DSS developed in this paper can thus be applied in all situations where the estimation requirements are different from the general case and require significant modifications in determining the sampling procedure.


REFERENCES

1. Load Research Manual (Vols 1-3). Argonne National Laboratory, New York (1980).
2. R. A. Johnson and D. W. Wichern. Applied Multivariate Statistics. Prentice-Hall, N.J. (1982).
3. J. M. Liittschwager. Mathematical models for public utility rate revisions. Mgmt Sci. 17, B339-353 (1971).
4. K. Muralidhar. Appropriate sampling procedures in load research. Ph.D. dissertation, Texas A&M University (1986).
5. N. L. Johnson and S. Kotz. Continuous Univariate Distributions. Wiley, New York (1970).
6. W. Schulz. Sample size determination for the estimation of variance when the underlying distribution is not normal. Biomet. J. 18, 535-546 (1976).
7. S. Alter. A taxonomy of decision support systems. Sloan Mgmt Rev. 19, 39-56 (1977).
8. J. L. Bennet. Building Decision Support Systems. Addison-Wesley, Mass. (1983).
9. R. H. Bonczek, C. W. Holsapple and A. B. Whinston. Computer based support of organizational decision making. Decn Sci. 10, 268-291 (1979).
10. R. H. Sprague and E. D. Carlson. Building Effective Decision Support Systems. Prentice-Hall, N.J. (1982).
11. L. R. Medsker. An interactive decision support system for energy policy analysis. Commun. ACM 27, 1122-1128 (1984).
12. W. W. Daniel. Applied Nonparametric Statistics. Houghton-Mifflin, Boston, Mass. (1978).
13. K. Muralidhar and M. J. Tretter. A simulation procedure for sample size determination in electric utility load research. Simulation 52(2), 53-58 (1989).
14. W. G. Cochran. Sampling Techniques. Wiley, New York (1977).
15. T. Dalenius and J. L. Hodges Jr. Minimum variance stratification. J. Am. statist. Ass. 54, 88-101 (1959).
16. B. Efron. Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1-26 (1979).
17. P. Diaconis and B. Efron. Computer intensive methods in statistics. Scient. Am. 248, 116-130 (1983).
18. K. Singh. On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9, 1187-1195 (1981).
19. P. J. Bickel and R. J. Freedman. Asymptotic normality and bootstrap in stratified sampling. Ann. Statist. 12, 470-482 (1984).
20. D. A. Freedman and S. C. Peters. Bootstrapping a regression equation: some empirical results. J. Am. statist. Ass. 79, 97-106 (1984).
21. J. Johnson, R. Leitch and J. Neter. Characteristics of error in accounts receivable in inventory audits. Account. Rev. 56, 270-293 (1981).
22. T. A. Burgin. The gamma distribution in inventory control. Opl Res. Q. 23, 507-525 (1975).
23. E. Babakus, C. E. Ferguson Jr and K. G. Joreskog. The sensitivity of maximum likelihood factor analysis to violation of measurement scale and distributional assumption. J. Mktng Res. 24, 222-228 (1987).

APPENDIX

Analytical Procedure to Determine Sample Size

Step 0: Set the type of parent distribution and the associated parameter values. Set the initial sample size ($n_0$). Set $i = 1$.
Step 1: Determine $x^*$ using $n_{i-1}$ in equation (1).
Step 2: Determine $s^{*2}$ using $n_{i-1}$ in equation (2).
Step 3: Solve equation (3) for $n_i$ using $x^*$ and $s^{*2}$.
Step 4: If $|n_i - n_{i-1}|$ is less than or equal to one, go to Step 5. If not, increment $i$ by one and go to Step 1.
Step 5: The appropriate sample size is the larger of $n_{i-1}$ and $n_i$.

$$x^* = \mu - Z_{\alpha/2}\,\sigma / n^{1/2} \qquad (1)$$

$$n = \left[\frac{Z_{\alpha/2}\,(s^{*2})^{1/2}}{r\,x^*}\right]^2 \qquad (3)$$

where

$\mu$ = mean (or prior estimate) of the specified population
$\sigma^2$ = variance (or prior estimate) of the specified population
$Z_{\alpha/2}$ = the upper $\alpha/2$ percentage point of a standard normal distribution
$\nu = (n - 1)/\{1 + [(n - 1)/(2n)]\,\gamma_2\}$
$\gamma_2$ = the kurtosis of the specified population
$\chi^2_{\nu,\alpha/2}$ = the upper $\alpha/2$ percentage point of a chi-square distribution
$100(1 - \alpha)\%$ = the specified level of confidence
$r$ = the specified level of reliability.
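The iteration in Steps 0-5 can be sketched in Python under explicit assumptions. Equation (2) is not legible in this copy. Because ν and χ²_{ν,α/2} are defined only for it, the sketch assumes it takes the form s*² = σ² χ²_{ν,α/2}/ν. This is the upper bound on the variance obtained when νs²/σ² is treated as approximately chi-square with the kurtosis-adjusted ν. It is an assumption, not the authors' formula. The sketch also takes γ₂ as excess kurtosis (kurtosis minus 3), the convention under which the ν expression matches the usual large-sample variance of s². It approximates the chi-square point by the Wilson-Hilferty formula and rounds the Step 5 result up to an integer. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_upper(nu, p):
    """Upper-p point of chi-square with nu d.f. (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - p)
    c = 2.0 / (9.0 * nu)
    return nu * (1.0 - c + z * math.sqrt(c)) ** 3


def analytical_sample_size(mu, sigma2, gamma2, alpha=0.10, r=0.10, n0=30, max_iter=100):
    """Steps 0-5 of the appendix; gamma2 is treated as excess kurtosis (assumption)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)            # Z_{alpha/2}
    sigma = math.sqrt(sigma2)
    n_prev = float(n0)                                      # Step 0
    for _ in range(max_iter):
        x_star = mu - z * sigma / math.sqrt(n_prev)         # Step 1, equation (1)
        nu = (n_prev - 1.0) / (1.0 + (n_prev - 1.0) / (2.0 * n_prev) * gamma2)
        if x_star <= 0 or nu <= 0:
            raise ValueError("non-positive x* or nu; try a larger n0")
        s2_star = sigma2 * chi2_upper(nu, alpha / 2.0) / nu   # Step 2, ASSUMED form of (2)
        n_new = (z * math.sqrt(s2_star) / (r * x_star)) ** 2  # Step 3, equation (3)
        if abs(n_new - n_prev) <= 1.0:                       # Step 4
            return math.ceil(max(n_prev, n_new))             # Step 5, rounded up
        n_prev = n_new
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    # Illustrative population with coefficient of variation 0.5 (units cancel out).
    print(analytical_sample_size(mu=1.0, sigma2=0.25, gamma2=0.0))
```

With a heavier-tailed population (larger γ₂), ν shrinks, the worst-case variance grows, and the procedure returns a larger sample size.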