A random coefficient approach to the estimation of residential end-use load profiles


Journal of Econometrics 50 (1991) 297-327. North-Holland

A random coefficient approach to the estimation of residential end-use load profiles *

Denzil G. Fiebig and Robert Bartels University of Sydney, Sydney NSW 2006, Australia

Dennis J. Aigner University of California, Irvine, CA 92717, USA

Received January 1989, final version received June 1990

This paper develops some extensions to the statistical approach to the estimation of residential end-use load curves and provides a substantive application of these developments to a sample of households. Importantly, the typical assumption that the coefficients of the appliance dummies are fixed ignores two important sources of variation: during any particular hour the intensity of use of a particular appliance will vary from household to household; also the dummies indicate only absence or presence of the appliance and do not allow for variations in size or capacity. Our treatment of the coefficients of appliance dummies as random rather than fixed provides a structure for the heteroskedasticity that has been observed in previous studies of this kind. Also included in the analysis is the utilization of other sources of information, in particular from direct metering and a sample of diaries. The resultant single equation specifications for individual hours are then pooled and jointly estimated using an SUR structure.

1. Introduction

Conditional demand analysis (CDA) is a statistical method for allocating the total household electricity load during a period into its constituent components, each associated with a particular electricity-using appliance or end-use. The method exploits the fact that across a sample of households there is variation in the patterns of appliance ownership. As such, regressions of household load on appliance dummies can be used to estimate individual end-use or appliance load without actually metering the appliances directly. When the regressions are run for every hour in the day the method produces the daily load curve for each of the end-uses.

*This paper developed from a project carried out in the state of New South Wales in Australia in conjunction with an Electricity Industry Working Group (IWG) comprising representatives from the Electricity Commission of N.S.W., the N.S.W. Department of Energy, the Local Government Electricity Association, and the County Councils of Sydney, Prospect, Southern Riverina, Illawarra, and Shortland. The authors gratefully acknowledge the assistance of the IWG. We are especially indebted to the Commission and its officers Bob Lumsdaine and Mike Garben whose assistance was invaluable in the preparation of this paper. We also wish to thank Professor Bill Griffiths whose expertise in the area of random coefficient models was most helpful, Kerrie Legge who skilfully prepared the diagrams, and the Associate Editor and two anonymous referees whose comments led to several improvements of this paper.

0304-4076/91/$03.50 © 1991, Elsevier Science Publishers B.V. (North-Holland)

298 D.G. Fiebig et al., Estimation of residential end-use load profiles

This paper develops some extensions to the standard statistical approach to the estimation of residential end-use load curves and provides a substantive application of these developments to a sample of households. The typical assumption that the coefficients of the appliance dummies are fixed ignores two important sources of variation: during any particular hour the intensity of use of a particular appliance will vary from household to household; also the dummies indicate only absence or presence of the appliance and do not allow for variations in size or capacity.
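The standard fixed-coefficient CDA regression described above can be sketched as follows. This is a minimal illustration on made-up, noise-free data, not the authors' code; the function name `cda_ols` and all numbers are hypothetical.

```python
import numpy as np

def cda_ols(load, dummies):
    """OLS of household load on an intercept and appliance dummies.

    load:    (N,) household loads for one hour
    dummies: (N, k) 0/1 appliance ownership indicators
    Returns (k+1,) coefficients: intercept first, then one mean load per appliance.
    """
    X = np.column_stack([np.ones(len(load)), dummies])
    coef, *_ = np.linalg.lstsq(X, load, rcond=None)
    return coef

# Hypothetical illustration: 6 households, 2 appliances, noise-free loads.
dummies = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1]])
true_coef = np.array([0.3, 0.5, 1.2])   # base load, appliance 1, appliance 2
load = true_coef[0] + dummies @ true_coef[1:]
est = cda_ols(load, dummies)             # recovers true_coef in this noise-free case
```

Running the same regression once per hour would trace out a 24-point load curve for each appliance.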

To deal with these shortcomings we propose a random coefficient interpretation for the coefficients of the appliance dummies. Apart from its obvious intuitive appeal, the use of a random coefficient model (RCM) has several other attractive features:

(i) Statistically, it provides a structure for the heteroskedasticity that has been observed in previous studies of this kind.

(ii) An important innovation in our study is the availability of direct metering data. Unlike a fixed coefficient model where the incorporation of such information is problematic, a RCM provides a framework that is readily adapted to the inclusion of direct metering data.

(iii) It provides a source of useful extra information. For example, it is possible to predict the actual end-use loads for individual customers and hence to develop distributions of loads for appliances across individuals.

A natural extension to the analysis of loads for individual hours is to pool all 24 hourly regressions and estimate them as a seemingly unrelated regressions (SUR) system. Such an approach exploits the likely correlations between the disturbances across hours for the same customer.

While SUR estimation has been utilized in previous CDA research, we are able to refine the procedure in two important respects. For the conventional CDA model, where the explanatory variables are identical for all hours and the individual equation disturbances are assumed to have scalar identity covariance matrices, there is no efficiency gain from this extension. However, when the individual equation disturbances are not spherical, which is the case here because of our RCM, SUR will lead to efficiency gains even if the explanatory variables are identical for all hours. The other innovation occurs because of the availability of diary information. This information allows the identification of times of the day when certain appliances are not used. The resultant exclusion restrictions are likely to produce further efficiency gains.

The remainder of the paper is organised as follows: Section 2 develops the single-equation RCM that provides the basic structure on which further developments are built; estimation of this model is discussed in section 3; extensions to the basic model are presented in section 4; the data used in the analysis is described in section 5; section 6 includes a comparison of the alternative estimators and presents some results from our preferred approach; and the paper concludes with some summary comments and recommendations.

2. The basic random coefficient model

The starting point is to consider an arbitrary hour, for which the basic model is of the form

$$y_i = x_i'\beta + d_i'\gamma_i^+, \qquad i = 1, \ldots, N, \tag{1}$$

where

$y_i$ = load of customer $i$,

$x_i'$ = row vector of observations on $p$ explanatory variables,

$d_i'$ = row vector of observations on $k$ appliance dummies, the first of which is always unity.

There are two assumptions typically made with regard to the appliance dummies that are contentious; namely that the appliance ownership is exogenous and that the coefficients of the appliance dummies are fixed. With regard to the first of these note that in the short run appliance stocks are fixed. Although annual expected consumption might affect appliance ownership, it is highly unlikely that this is the case with hourly consumption which will be the focus of our empirical work. Moreover, existing evidence suggests that the bias from ignoring possible endogeneity may be qualitatively small; see Sebold and Parris (1989). For these reasons we were not overly concerned about the exogeneity issue, although it would have been useful to test this contention. Unfortunately no reasonable instruments were available to perform such tests.

The second assumption, that the coefficients of the appliance dummies are fixed, is unrealistic. There are two important sources of variation:

(i) During any particular hour the intensity of use of a particular appliance will vary from household to household.

(ii) The dummies indicate only absence or presence of the appliance and do not allow for variations in size or capacity.

Consequently, we assume that

$$\gamma_i^+ = \gamma + v_i, \tag{2}$$

where $\gamma$ is a $k \times 1$ vector of nonstochastic mean response coefficients and $v_i = (v_{1i}, v_{2i}, \ldots, v_{ki})'$ is a vector of random disturbances.

Notice that (1) is written without a separate disturbance term. This omission is deliberate, as a separate disturbance cannot be distinguished from $v_{1i}$, the disturbance associated with the intercept.

At this stage it is appropriate to recognize that some care needs to be exercised in defining random coefficients for dummy variables. According to (2), all elements of $\gamma_i^+$ are random. However, on observing the realized sample values for the dummy variables (here appliance holdings), it is possible to identify some elements as identically equal to zero. Now a modified version of (2) is appropriate, namely

$$\gamma_i^* = A_i\gamma_i^+, \tag{2'}$$

where $A_i$ is a $k$-dimensional diagonal matrix whose diagonal elements are the appliance dummies, which are zero or unity depending on the appliance holdings of the $i$th customer.

Because $d_i'A_i = d_i'$, the combination of (1) with either (2) or (2') yields a model of the form

$$y_i = x_i'\beta + d_i'\gamma + u_i, \tag{3}$$

where

$$u_i = d_i'v_i. \tag{4}$$

Assuming

$$E(v_i) = 0, \quad E(v_iv_i') = \Lambda, \quad E(v_iv_j') = 0 \ \text{for } i \neq j, \tag{5}$$

it follows that

$$E(u_i) = 0, \quad E(u_i^2) = d_i'\Lambda d_i, \quad E(u_iu_j) = 0 \ \text{for } i \neq j. \tag{6}$$

This is a variant of the Hildreth-Houck random coefficient model. In our empirical work, $\Lambda$ is assumed to be a diagonal matrix, which implies a heteroskedastic error variance of the form

$$\sigma_i^2 = E(u_i^2) = d_i'\alpha, \tag{7}$$

where $\alpha' = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ is a vector comprising the diagonal elements of $\Lambda$.
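The variance structure in (7) can be checked by simulation. The sketch below assumes normally distributed random coefficient disturbances with a diagonal covariance matrix; the variances and the ownership pattern are hypothetical.

```python
import numpy as np

def error_variance(D, alpha):
    """sigma_i^2 = d_i' alpha for each household (rows of D), as in (7)."""
    return np.asarray(D, dtype=float) @ np.asarray(alpha, dtype=float)

rng = np.random.default_rng(0)
alpha = np.array([0.04, 0.25, 0.09])   # hypothetical variances: intercept + 2 appliances
d = np.array([1.0, 1.0, 0.0])          # a household owning appliance 1 only

# Draw many realizations of the random coefficient disturbances for this household.
v = rng.normal(0.0, np.sqrt(alpha), size=(200_000, 3))
u = v @ d                               # composite disturbance, as in (4)
sigma2 = error_variance(d, alpha)       # intercept variance plus appliance 1 variance
```

The sample variance of `u` should be close to `sigma2`: households with more appliances have noisier loads, which is the heteroskedasticity the model is meant to capture.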

In obvious notation write (3) as

$$y_i = z_i'\delta + u_i, \tag{8}$$

and let the full error covariance matrix be $\Omega$, which is a diagonal matrix with typical diagonal element given by (7). Now for known $\Omega$ the GLS estimator of $\delta$ is given by

$$\hat{\delta} = (Z'\Omega^{-1}Z)^{-1}Z'\Omega^{-1}y. \tag{9}$$
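Because the error covariance matrix is diagonal, the GLS estimator in (9) reduces to weighted least squares. A minimal sketch, with a hypothetical helper name and toy data:

```python
import numpy as np

def gls_diagonal(Z, y, sigma2):
    """GLS estimate of delta when the error covariance is diag(sigma2).

    Equivalent to OLS after dividing each observation by sigma_i.
    """
    w = 1.0 / np.sqrt(np.asarray(sigma2, dtype=float))
    delta, *_ = np.linalg.lstsq(Z * w[:, None], y * w, rcond=None)
    return delta

# Toy data: intercept plus one appliance dummy, variances given by (7).
Z = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([0.4, 1.1, 0.2, 1.6, 0.9])
sigma2 = np.array([0.1, 0.3, 0.1, 0.3, 0.3])
delta_hat = gls_diagonal(Z, y, sigma2)
```

Rescaling rows avoids forming the full N x N covariance matrix, which matters little here but keeps the computation cheap for larger samples.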

It is also possible to predict the individual random response vector. The predictor

$$\hat{\gamma}_i^* = A_i\hat{\gamma} + \Lambda d_i(d_i'\alpha)^{-1}(y_i - z_i'\hat{\delta}) \tag{10}$$

is best linear unbiased; see Griffiths (1972). It is unbiased in the sense that $E[\hat{\gamma}_i^* - \gamma_i^*] = 0$, and best in the sense that the covariance matrix of the prediction error of any other linear unbiased predictor exceeds the covariance matrix of $[\hat{\gamma}_i^* - \gamma_i^*]$ by a nonnegative definite matrix.

For our particular problem these best linear unbiased predictors (BLUPs) are of great interest. They represent predictions of actual customer end-use loads and as such can be used to develop distributions of end-use loads over individuals. Previous conditional demand models have only been able to provide the means of these distributions.
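A small sketch of the predictor in (10) for a single household follows. It assumes the coefficient covariance matrix is diagonal with diagonal elements α, so the residual is shared among the owned appliances in proportion to their variances; the function name and numbers are hypothetical.

```python
import numpy as np

def blup_response(d_i, gamma_hat, alpha, resid_i):
    """Predicted end-use loads for one household under a diagonal covariance.

    d_i:       (k,) 0/1 appliance dummies (first element is 1 for the intercept)
    gamma_hat: (k,) estimated mean response coefficients
    alpha:     (k,) variances of the random coefficients
    resid_i:   scalar residual y_i - z_i' delta_hat
    """
    d_i = np.asarray(d_i, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    share = alpha * d_i / (d_i @ alpha)      # variance share of each owned appliance
    return d_i * np.asarray(gamma_hat, dtype=float) + share * resid_i

d_i = np.array([1.0, 1.0, 0.0])
gamma_hat = np.array([0.3, 0.5, 1.2])
alpha = np.array([0.1, 0.3, 0.2])
pred = blup_response(d_i, gamma_hat, alpha, resid_i=0.2)
```

In this sketch the predicted loads of the owned appliances add up to the mean appliance load plus the residual, and appliances the household does not own are predicted at exactly zero.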

Naturally, neither the estimator in (9) nor the predictor in (10) is operational because they rely on the unknown variances $\alpha$. These need to be estimated.

3. Estimation

Numerous feasible GLS estimators have been proposed for this particular model; see for example the discussion in Judge et al. (1985). Typically they rely on estimation of primary and secondary equations. Here the primary equation is given by (8), while the secondary equation has the form

$$e_i^2 = d_i'\alpha + \varepsilon_i, \tag{11a}$$

where $e_i$ denotes the OLS residuals from the primary equation. In matrix form this can be written as

$$e^2 = D\alpha + \varepsilon. \tag{11b}$$

Thus the OLS estimator of $\alpha$ is given by

$$\hat{\alpha} = (D'D)^{-1}D'e^2. \tag{12}$$
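The OLS step in (12) amounts to regressing squared primary-equation residuals on the appliance dummies. The sketch below uses a hypothetical helper and four made-up residuals, chosen so that one estimated variance comes out negative.

```python
import numpy as np

def secondary_ols(D, e):
    """OLS estimate of alpha from squared residuals regressed on the dummies."""
    D = np.asarray(D, dtype=float)
    e2 = np.asarray(e, dtype=float) ** 2
    alpha_hat, *_ = np.linalg.lstsq(D, e2, rcond=None)
    return alpha_hat

# Intercept column plus one appliance dummy; residuals are illustrative only.
D = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
e = np.array([0.2, -0.4, 0.1, -0.1])
alpha_hat = secondary_ols(D, e)   # second element is negative for these residuals
```

Here the owners happen to have smaller squared residuals than the non-owners, so the appliance variance estimate is negative even though a variance cannot be.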

However, this estimator ignores the heteroskedasticity inherent in the error covariance matrix of $\varepsilon$, which asymptotically is proportional to $\Omega^2$. An alternative EGLS estimator suggested by Amemiya (1977) is given by

$$\tilde{\alpha} = (D'\hat{\Omega}^{-2}D)^{-1}D'\hat{\Omega}^{-2}e^2, \tag{13}$$

where $\hat{\Omega}$ is a diagonal matrix with typical diagonal element $d_i'\hat{\alpha}$. The resultant EGLS estimator of $\delta$ is then defined by

$$\tilde{\delta} = (Z'\tilde{\Omega}^{-1}Z)^{-1}Z'\tilde{\Omega}^{-1}y, \tag{14}$$

where the new estimator of $\Omega$ utilizes the EGLS rather than OLS estimates of $\alpha$. Amemiya has shown that this estimator is asymptotically equivalent to any fully iterated ML estimator. Moreover, Buse (1984) has demonstrated that it is also the second round of the scoring algorithm designed to maximize the log-likelihood function.
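A sketch of the second-round estimate of α follows. It assumes the estimator in (13) is GLS applied to the secondary equation with weights equal to the inverse squared fitted variances, which is what an error covariance proportional to the square of the primary covariance matrix implies. The helper name and data are hypothetical.

```python
import numpy as np

def amemiya_alpha(D, e, alpha_init):
    """Weighted regression of squared residuals on the dummies.

    Weights are 1 / (d_i' alpha_init)^2; every fitted variance must be positive.
    """
    D = np.asarray(D, dtype=float)
    sigma2 = D @ np.asarray(alpha_init, dtype=float)
    if np.any(sigma2 <= 0):
        raise ValueError("fitted variances must be positive to form the weights")
    w = 1.0 / sigma2                       # square root of the weight 1 / sigma_i^4
    e2 = np.asarray(e, dtype=float) ** 2
    alpha, *_ = np.linalg.lstsq(D * w[:, None], e2 * w, rcond=None)
    return alpha

D = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
e = np.sqrt(np.array([0.05, 0.15, 0.30, 0.50]))   # illustrative residuals
alpha_tilde = amemiya_alpha(D, e, alpha_init=[0.1, 0.3])
```

The explicit check on the fitted variances reflects a practical point: the weights are undefined whenever a first-round variance estimate is zero or negative.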

Despite the attractiveness of the Amemiya two-step estimator, it is necessary to emphasize that there is nothing in the procedure that guarantees the $\alpha$ estimates will be nonnegative as they should be. In fact, this problem is not confined to the Amemiya estimator and seems to be endemic in the implementation of RCMs.

Notice that having nonnegative $\alpha_j$ estimates is not a necessary condition for nonnegative $\sigma_i^2$ estimates. Procedures such as Amemiya's that utilize a secondary equation generate these latter estimates as predictions of the conditional mean of the dependent variable in the secondary equation. In this regard the work of Ronning (1985) is relevant. He provides conditions under which these predictions are nonnegative with probability 1. Unfortunately, these conditions will rarely hold in practice. This, together with existing Monte Carlo evidence [again surveyed in Judge et al. (1985)], suggests that negative variance estimates represent an unavoidable small sample problem for such estimation procedures.

The problem of eliminating negative $\alpha_j$ and $\sigma_i^2$ estimates has prompted numerous remedies. Many of these remedies involve modifying the secondary-equation estimates: for example, one suggestion of Hildreth and Houck (1968) was to replace negative $\alpha_j$ estimates by zero. This is a somewhat curious suggestion as it can lead to some observations being associated with zero or extremely small variance estimates. This can result in very wild estimates in the GLS estimation of the secondary equation. The weighting matrix in this estimation is obtained by taking the reciprocal of the squares of these variance estimates and hence the observations that have been subject to adjustment can often dominate the outcome in the sense of having inordinately large weights relative to the bulk of the data. A more sensible approach may be to assign them a zero weight or at least to impose some upper bound on the weight.
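The zero-weight rule suggested above can be sketched as follows. The threshold `floor` is a hypothetical tuning value; the text does not state which value was used.

```python
import numpy as np

def secondary_weights(sigma2_hat, floor):
    """Weights 1 / sigma_i^4 for GLS on the secondary equation.

    Observations whose variance estimate falls below `floor` get zero weight
    instead of an inordinately large one.
    """
    sigma2_hat = np.asarray(sigma2_hat, dtype=float)
    ok = sigma2_hat >= floor
    w = np.zeros_like(sigma2_hat)
    w[ok] = 1.0 / sigma2_hat[ok] ** 2
    return w

# Two well-behaved variance estimates, one tiny and one negative.
w = secondary_weights([0.2, 0.001, -0.05, 0.5], floor=0.01)
```

Without the rule, the second observation would receive a weight of one million, which is the kind of dominance the paragraph warns about.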

Our initial approach to try and obtain nonnegative $\alpha$ was to follow Srivastava et al. (1981) and use mixed estimation to impose stochastic restrictions on the estimates. The basic idea relies on the existence of a priori knowledge of the bounds of the individual $\alpha_j$'s. Given that they represent the variance in the load of the $j$th appliance in the RCM, this is not unreasonable. However, experimentation with this approach proved somewhat disappointing. In particular the prior bounds needed to be quite narrow to ensure nonnegativity. In order to ensure the success of this procedure it would be necessary to experiment with a range of priors. Moreover, this experimentation would be necessary for each separate hour being investigated.

None of the above options was attractive and as a consequence it was decided to move away from the use of a secondary equation to a procedure suggested by the work of Lee and Griffiths (1979). Here variance estimates are calculated as sums of squares and hence positive estimates are guaranteed. The idea, presented for models involving cross-section time-series data, involves starting with initial $\alpha$ estimates and associated estimates of $\gamma$ and $\gamma_i^*$. New $\alpha$ estimates can then be derived from summing squared deviations of the elements of the predictions of $\gamma_i^*$ from the corresponding element of the estimate of $\gamma$.

However, using arguments similar to those in the appendix, it can be shown that for our particular problem this procedure generates inconsistent estimates of the $\alpha$'s. Specifically, these estimates constitute serious underestimates. Fortunately the source of the inconsistency is easily remedied. Consider predictors of the form

$$\tilde{\gamma}_i^* = A_i\hat{\gamma} + \Lambda^{1/2}d_i(d_i'\alpha)^{-1/2}(y_i - z_i'\hat{\delta}), \tag{15}$$

where $\Lambda^{1/2}\Lambda^{1/2} = \Lambda$. Obviously they constitute a simple adjustment to the BLUPs, retaining the properties of linearity and unbiasedness but not that of being best. Importantly though, they can be used to generate consistent estimates of the $\alpha$'s. In particular a consistent estimate of say $\alpha_j$ can be generated from

$$\hat{\alpha}_j = n_j^{-1}\sum_{i=1}^{N} d_{ij}(\tilde{\gamma}_{ji}^* - \hat{\gamma}_j)^2, \tag{16}$$

where $d_{ij}$ is the indicator of whether customer $i$ has appliance $j$, $n_j$ is the number of customers having the $j$th appliance, and $\tilde{\gamma}_{ji}^*$ and $\hat{\gamma}_j$ are the $j$th elements of the corresponding vectors. An operational version of (16) requires initial estimates of the $\alpha$'s. Our choice here was to take the $\alpha$'s generated by Amemiya's procedure with any negative estimates being replaced by reasonable guesses of the actual variances. These 'feasible' adjusted BLUPs will be referred to simply as adjusted BLUPs, but it should be stressed that they do not necessarily retain properties such as unbiasedness. (For further details regarding these estimators, including assertions regarding consistency properties, see the appendix.)
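The variance update built from adjusted predictors can be sketched as below. The sketch rests on one assumption that should be checked against the original typesetting of (15): the adjustment replaces the BLUP scaling by its square-root form, so that the deviation of the predictor for appliance j is sqrt(α_j) d_ij (y_i - z_i'δ̂) / sqrt(d_i'α). Names and simulated data are hypothetical.

```python
import numpy as np

def adjusted_alpha(D, alpha0, resid):
    """Variance estimates from squared deviations of adjusted predictors.

    D:      (N, k) 0/1 dummies (first column all ones)
    alpha0: (k,) initial positive variance estimates
    resid:  (N,) residuals y_i - z_i' delta_hat
    """
    D = np.asarray(D, dtype=float)
    alpha0 = np.asarray(alpha0, dtype=float)
    scale = resid / np.sqrt(D @ alpha0)          # residual standardized by sigma_i
    dev = D * np.sqrt(alpha0) * scale[:, None]   # deviation from gamma_hat_j, zero for non-owners
    n_j = D.sum(axis=0)                          # number of owners of each appliance
    return (dev ** 2).sum(axis=0) / n_j          # sums of squares, hence nonnegative

# Simulated check using the true disturbances in place of residuals.
rng = np.random.default_rng(1)
N = 20_000
D = np.column_stack([np.ones(N), rng.integers(0, 2, size=(N, 2))])
alpha_true = np.array([0.05, 0.20, 0.10])
u = (D * rng.normal(0.0, np.sqrt(alpha_true), size=(N, 3))).sum(axis=1)
alpha_new = adjusted_alpha(D, alpha_true, u)
```

Because each estimate is an average of squares it cannot be negative, and in this simulation it stays close to the variances used to generate the data.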

In summary, the estimation of our basic RCM proceeds according to the following steps:

(i) Estimate the primary equation by OLS.

(ii) Estimate the secondary equation by OLS and generate initial $\sigma_i^2$ estimates. When these fall below a prespecified value the associated weight is set to zero.

(iii) Re-estimate the secondary equation by GLS to produce new $\alpha$ and $\sigma_i^2$ estimates. Again there is an adjustment. We ensure that all $\alpha$ estimates are greater than some reasonable value.

At this stage the result is a variant of the two-step Amemiya (1977) estimator with some modification to ensure that the variance estimates are greater than a reasonable minimum value. The procedure continues as follows:

(iv) Use the resultant estimates to produce adjusted BLUPs according to (15) and the associated $\alpha$ estimates given in (16).

(v) Re-estimate the primary equation given the new variance estimates.

(vi) Finally perform one further iteration implicit in steps (iv) and (v).
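Steps (i) to (vi) can be strung together roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the threshold `min_var` is a hypothetical stand-in for the unspecified "reasonable value", the primary equation is re-estimated by GLS after step (iii), and the variance update uses the square-root scaling of the residual described for the adjusted predictors.

```python
import numpy as np

def wls(X, y, w):
    """Least squares with observation weights w (w_i = 1 / variance_i)."""
    s = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return coef

def estimate_rcm(Z, D, y, min_var=1e-3):
    """Z: (N, p+k) regressors [x_i', d_i']; D: (N, k) dummies; y: (N,) loads."""
    N = len(y)
    # (i) primary equation by OLS
    delta = wls(Z, y, np.ones(N))
    e2 = (y - Z @ delta) ** 2
    # (ii) secondary equation by OLS; zero weight where the fitted variance is too small
    sigma2 = D @ wls(D, e2, np.ones(N))
    w = np.where(sigma2 > min_var, 1.0 / np.maximum(sigma2, min_var) ** 2, 0.0)
    # (iii) secondary equation by GLS, with alpha bounded below
    alpha = np.maximum(wls(D, e2, w), min_var)
    delta = wls(Z, y, 1.0 / (D @ alpha))
    # (iv)-(vi) two rounds of: variance update from adjusted predictors, then GLS
    for _ in range(2):
        scale = (y - Z @ delta) / np.sqrt(D @ alpha)
        alpha = ((D * np.sqrt(alpha) * scale[:, None]) ** 2).sum(axis=0) / D.sum(axis=0)
        delta = wls(Z, y, 1.0 / (D @ alpha))
    return delta, alpha

# Simulated data with hypothetical parameter values.
rng = np.random.default_rng(3)
N = 20_000
D = np.column_stack([np.ones(N), rng.integers(0, 2, size=(N, 2))]).astype(float)
x = rng.normal(size=N)
Z = np.column_stack([x, D])
gamma, alpha_true = np.array([0.3, 0.5, 1.2]), np.array([0.02, 0.10, 0.30])
y = 0.1 * x + (D * (gamma + rng.normal(0.0, np.sqrt(alpha_true), size=(N, 3)))).sum(axis=1)
delta_hat, alpha_hat = estimate_rcm(Z, D, y)
```

The returned `delta_hat` stacks the coefficient on `x` and the mean response coefficients; `alpha_hat` holds the final variance estimates, which are nonnegative by construction.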

4. Extensions to the basic model

The RCM and associated estimation procedure presented in the previous two sections provide our basic point of reference for other innovations.


4.1. Direct metering

Conditional demand analysis arrives at estimates for the load contribution of different end-uses by statistically disaggregating the total household loads for a sample of households. This is an indirect approach and the estimates it generates are often imprecise. The RCM is likely to provide some efficiency gains relative to OLS procedures, but it remains the case that there is considerable room for improvement in these estimates. One obvious alternative is to directly meter specific appliances for a subsample of households. An important characteristic of our database is the availability of direct metering data. To our knowledge such information has never been used in the context of conditional demand modelling. As such the appropriate methodology for its inclusion required some development.

As it turns out, the suggested approach follows directly from our random coefficient framework; moreover it would seem that the incorporation of such information into a fixed coefficient model is problematic. Suppose direct metering information is available on the $k$th appliance for a total of $n$ households where $n$ is less than the number of households in our sample who have this appliance. For these households we observe a realization of the random response coefficient. This load can then be subtracted from the household's total observed load to yield

$$y_i - \gamma_{ki}^* = z_i'\delta + u_i, \qquad i = 1, \ldots, n, \tag{17a}$$

$$y_i = z_i'\delta + d_{ik}\gamma_k + u_i, \qquad i = n+1, \ldots, N, \tag{17b}$$

where $z_i$ and $\delta$ have been redefined to exclude the $k$th appliance dummy and its associated mean response coefficient respectively. Finally, the observations in (17) are augmented to include the additional $n$ observations that constitute the actual response coefficients of the $k$th appliance dummy. These are of the form

$$\gamma_{ki}^* = \gamma_k + v_{ki}, \qquad i = 1, \ldots, n. \tag{18}$$

The stacked regression allows joint estimation of yk, utilizing the data from households that were and were not directly metered.

The assumption that the covariance matrix for $v_i$ is diagonal ensures that the error covariance matrix for the stacked regression is also diagonal with a heteroskedastic structure of the form discussed previously. In fact, it is as if there is an additional sample of $n$ households with only one appliance. Also notice that in the limiting case where everyone with the $k$th appliance is directly metered, there are no observations of the type given in (17b) and there is no gain from joint estimation of the analogous forms of (17a) and (18). For a more complete discussion of direct metering and likely efficiency gains in the context of CDA see Bartels and Fiebig (1990).
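The stacking in (17a), (17b) and (18) can be sketched as a data-construction step. All argument names below are hypothetical; `var_rest` stands for the error variance contributed by every random coefficient other than that of the metered appliance.

```python
import numpy as np

def stack_direct_metering(y, Z, d_k, metered, load_k, var_rest, alpha_k):
    """Build the stacked regression for one directly metered appliance k.

    y: (N,) total loads; Z: (N, m) regressors excluding the dummy for appliance k;
    d_k: (N,) 0/1 ownership; metered: (N,) bool; load_k: (N,) metered loads
    (ignored where metered is False); alpha_k: variance of the kth coefficient.
    Returns (y_s, X_s, var_s); the last column of X_s multiplies gamma_k.
    """
    m = np.asarray(metered, dtype=bool)
    n = int(m.sum())
    # (17a): metered households, appliance load subtracted, gamma_k drops out
    ya, Xa, va = y[m] - load_k[m], np.column_stack([Z[m], np.zeros(n)]), var_rest[m]
    # (17b): unmetered households keep the dummy and its variance contribution
    yb, Xb = y[~m], np.column_stack([Z[~m], d_k[~m]])
    vb = var_rest[~m] + d_k[~m] * alpha_k
    # (18): metered loads are direct observations on gamma_k with variance alpha_k
    yc = load_k[m]
    Xc = np.column_stack([np.zeros((n, Z.shape[1])), np.ones(n)])
    vc = np.full(n, alpha_k)
    return np.concatenate([ya, yb, yc]), np.vstack([Xa, Xb, Xc]), np.concatenate([va, vb, vc])

# Four households, intercept-only Z, first household metered (illustrative numbers).
y = np.array([1.5, 1.2, 1.4, 0.5])
Z = np.ones((4, 1))
d_k = np.array([1.0, 1.0, 1.0, 0.0])
metered = np.array([True, False, False, False])
load_k = np.array([0.8, np.nan, np.nan, np.nan])
var_rest = np.full(4, 0.1)
y_s, X_s, var_s = stack_direct_metering(y, Z, d_k, metered, load_k, var_rest, alpha_k=0.2)
```

The stacked arrays can then be passed to any weighted least squares routine, with `1 / var_s` as weights, to estimate the remaining coefficients and the mean load of appliance k jointly.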


4.2. Joint estimation

A natural extension to analyzing individual hours separately is to pool all 24 hourly regressions and estimate them as an SUR system. For the conventional CDA model, where the explanatory variables are identical for all hours and the individual equation disturbances are assumed to have a scalar identity covariance matrix, there is no efficiency gain from this extension. The approach of Aigner et al. (1984) was to hypothesize that some discretionary appliances, such as clothes dryers, are not used in the early hours of the morning and hence could be excluded from those particular equations. With such restrictions imposed on the parameters of the hourly models, Aigner et al. are able to generate more precise estimates.

This approach is refined in two ways. The heteroskedastic structure of our RCM implies that SUR will lead to efficiency gains even if the explanatory variables are identical for all hours. Intuitively the estimation can be thought of as progressing in two steps: the first step weights all observations to produce homoskedastic errors in each equation, while the second is simply a standard SUR on the weighted data. However, the weighting is different for each equation, and hence at the second step the transformed explanatory variables are not the same for all hours. The other innovation occurs because of the availability of diary information. This information allows the identification of times of the day when certain appliances are not used. Again the restrictions imposed are exclusion restrictions but in our case we are able to base the decisions on collateral information.
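The two-step intuition can be sketched with a small feasible SUR routine. This is an illustrative implementation under simplifying assumptions (the cross-hour covariance is estimated from equation-by-equation OLS residuals on the weighted data, with no further iteration); the function name is hypothetical.

```python
import numpy as np

def weighted_sur(X_list, y_list, sigma_list):
    """Weight each hourly equation by 1/sigma_ih, then apply feasible SUR.

    X_list[h]: (N, p_h) regressors; y_list[h]: (N,) loads;
    sigma_list[h]: (N,) error standard deviations for hour h.
    Returns one coefficient vector per hour.
    """
    # Step 1: remove heteroskedasticity equation by equation.
    Xw = [X / s[:, None] for X, s in zip(X_list, sigma_list)]
    yw = [y / s for y, s in zip(y_list, sigma_list)]
    N, H = len(yw[0]), len(yw)
    # OLS residuals on the weighted data give the cross-hour covariance estimate.
    resid = np.column_stack(
        [y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xw, yw)]
    )
    Sigma_inv = np.linalg.inv(resid.T @ resid / N)
    # Step 2: GLS normal equations for the stacked system, built block by block.
    starts = np.concatenate([[0], np.cumsum([X.shape[1] for X in Xw])])
    A = np.zeros((starts[-1], starts[-1]))
    b = np.zeros(starts[-1])
    for g in range(H):
        for h in range(H):
            A[starts[g]:starts[g + 1], starts[h]:starts[h + 1]] = Sigma_inv[g, h] * (Xw[g].T @ Xw[h])
            b[starts[g]:starts[g + 1]] += Sigma_inv[g, h] * (Xw[g].T @ yw[h])
    coef = np.linalg.solve(A, b)
    return [coef[starts[h]:starts[h + 1]] for h in range(H)]

# With identical regressors and unit weights, SUR collapses to OLS hour by hour.
rng = np.random.default_rng(2)
N = 50
X = np.column_stack([np.ones(N), rng.integers(0, 2, size=N)])
Y = rng.normal(size=(N, 3))
ones = np.ones(N)
sur = weighted_sur([X, X, X], [Y[:, 0], Y[:, 1], Y[:, 2]], [ones, ones, ones])
ols = [np.linalg.lstsq(X, Y[:, h], rcond=None)[0] for h in range(3)]
```

The example reproduces the no-gain case for the conventional model. Passing a different `sigma_list` entry for each hour makes the weighted regressors differ across equations, which is where the efficiency gain arises.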

It is possible that the diary information could also be of use in specifying additional restrictions on estimated loads. In particular, sensible ‘load shapes’ were found when the diary usage information was averaged over customers. It is true that the level of these curves is meaningless but they could be used to produce a restriction of the form that the load of appliance r in hour j is say twice that in hour j - 1 but only a third of that in j + 1. The load shapes estimated from the diary information could be used to produce restrictions on the estimated loads over the entire day. For the present study this option was not pursued.

Because of the manner in which our cross-section and time-series data are pooled, the elements of the disturbance covariance matrix in the SUR model measure serial correlations. The $(j, k)$ element refers to the correlation between the $j$th and $k$th hours. Preliminary investigations of these estimated correlations revealed some patterns: the correlations were invariably positive and they tended to be inversely related to the difference or lag, $(j - k)$. However, for any one lag there was enough variability in the correlations across $(j - k)$ pairs to suggest something more than simply a Toeplitz form for the disturbance covariance matrix. For the current analysis we neglected the extension of imposing some structure on the autocorrelation of the disturbances and maintained the unconstrained formulation of the disturbance covariance matrix.
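One simple way to carry out the kind of check described above is to group the estimated correlations by lag and look at their spread within each group. The sketch below uses a made-up 4 x 4 correlation matrix; a Toeplitz structure would make every entry within a lag identical.

```python
import numpy as np

def correlations_by_lag(R):
    """Group the off-diagonal entries of a correlation matrix by lag |j - k|."""
    R = np.asarray(R, dtype=float)
    return {lag: np.diagonal(R, offset=lag) for lag in range(1, R.shape[0])}

R = np.array([
    [1.0, 0.8, 0.5, 0.3],
    [0.8, 1.0, 0.6, 0.4],
    [0.5, 0.6, 1.0, 0.9],
    [0.3, 0.4, 0.9, 1.0],
])
by_lag = correlations_by_lag(R)
spread = {lag: float(vals.max() - vals.min()) for lag, vals in by_lag.items()}
```

A large spread at a given lag is evidence against the Toeplitz form.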

5. Data

The data for this study were compiled as part of the Domestic End-Use study conducted for the state of New South Wales (N.S.W.) in Australia under the auspices of an Industry Working Group comprising representatives from the Electricity Commission of N.S.W., the N.S.W. Department of Energy, the Local Government Electricity Association, and the County Councils of Sydney, Prospect, Southern Riverina, Illawarra, and Shortland. In the study a sample of approximately 380 households had special meters installed to record each household's electricity load at quarter-hourly intervals over a 15-month period from July 1986 to September 1987. In addition, a survey was carried out to determine demographic characteristics and appliance holdings. The data available for analysis included three types of variables:

(i) load data, (ii) demographic variables, (iii) appliance dummies.

Eventually the Domestic End-Use study required the generation of results for both weekdays and weekends, all of N.S.W. and the subregions of the Sydney County Council and the rest of N.S.W., and for all months of the study period. Given the magnitude of this task and the numerous modelling questions that needed to be resolved it was decided to proceed in two stages: the first involving a comprehensive analysis of one month for one type of day for one region, while the second would draw on the qualitative results of this first stage as the basis for models of other days, months, and regions. The current paper documents this initial analysis. Total N.S.W. was chosen as the region, working days as the day type, and because of the importance of the winter peak, July 1986 was chosen as the month.

The load data used consisted of hourly integrated demands for each customer averaged over working days in July 1986. The resultant 24 observations for each customer represent the household's average working day load profile for that month.

Three demographic variables were chosen for consideration in the analysis:

PEOPLE = number of inhabitants of the dwelling,
SIZE = physical size of the dwelling (in 10s of square metres),
INCOME = household income (in $10,000s).

Over 20% of the income data were coded as missing. Neither omitting this important variable nor omitting those observations with missing income data was attractive. Instead it was decided to retain INCOME, with missing values being coded as zero, and to include a dummy variable, INCMSG, that indicated these missing cases. This is the modified zero-order regression method described in Maddala (1977) as a solution to missing values.
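The missing-income treatment just described can be sketched in a few lines. The helper name and the income values are hypothetical; missing values are represented here as NaN.

```python
import numpy as np

def zero_order_income(income):
    """Return (income with missing values set to zero, missing-value dummy)."""
    income = np.asarray(income, dtype=float)
    missing = np.isnan(income)
    return np.where(missing, 0.0, income), missing.astype(float)

income = np.array([3.5, np.nan, 2.0, np.nan, 6.1])   # hypothetical, in $10,000s
income_filled, incmsg = zero_order_income(income)
```

Both columns then enter the regression, so households with missing income are retained and the dummy absorbs their average income effect.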

These four variables are measured as deviations from their respective means. This modification enables the intercept to be interpreted as the load associated with appliances that are not directly included in the model, when this load is evaluated for an ‘average household’.

Price variables have not been included because there is virtually no price variation in the sample. Tariffs are essentially the same for almost all customers and they almost all fall into the same pricing block.

A selection of nine appliance dummies was chosen. These, together with their estimated population penetration rates, are provided in the following list:

FREEZ = separate freezer (47%),
FRIGAUT = automatic defrost fridge (53%),
COOK = electric oven or hotplates (73%),
DSH = dishwasher (22%),
DRYER = clothes dryer (52%),
HEAT = electric main or secondary heating (79%),
HWPK = main tariff water heater (32%),
HWOP = offpeak tariff water heater (51%),
POOLPUMP = pool pump (6%).

Notice that no variable has been included for washing machine or standard refrigerator. The reason for this is that CDA cannot capture the end-use loads for appliances owned by all households. In fact, the FRIGAUT variable is intended to capture the additional load due to the automatic defrosting mechanism. It cannot capture the refrigeration load common to all households.

A further indicator variable, MJROTH, was included to capture the load of uncommon appliances with large capacities such as spas, saunas, electrically heated pools, etc.

Several observations were deleted from the analysis. These included those that had missing data for any of the variables, observations that were included in the sample but which were not households, and three households with zero loads over the month. A total sample size of 348 remained after making these modifications.

Diary information for 202 customers over a period of two weeks gave frequency-of-use information for a number of appliances. Averaging this information over customers, for each hour, provided the basis for restrictions on some end-use loads. The particular loads that were restricted to zero are presented in table 1. For comparison the restrictions employed by Aigner et al. (1984) are also included.


Table 1

Exclusion restrictions.(a)

Appliance         Diary information   Aigner et al.
COOK              2-4                 2-5
DSH               4-6                 2-5
DRYER             1-6                 2-5
POOLPUMP          2-6                 none
Washing machine   n/a                 2-5

(a) Entries indicate hours at which end-use loads are set to zero. Our model does not include a separate dummy for washing machines.

Two other restrictions that were thought to be reasonable involved restricting the loads of FREEZ and FRIGAUT to be constant in the early morning. Where indicated, our estimators incorporate the exclusion restrictions mentioned above and have the end-use loads of FREEZ and FRIGAUT held constant over hours 2, 3, and 4.
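The exclusion and constancy restrictions described above can be encoded compactly. The sketch below is illustrative only: the array layout (24 hours by 9 appliance dummies) and the names `ZERO_HOURS` and `CONSTANT_HOURS` are assumptions, while the hour ranges are taken from the diary column of table 1.

```python
import numpy as np

APPLIANCES = ["FREEZ", "FRIGAUT", "COOK", "DSH", "DRYER",
              "HEAT", "HWPK", "HWOP", "POOLPUMP"]

# Diary-based exclusion hours from table 1 (inclusive ranges, hours 1..24).
ZERO_HOURS = {"COOK": (2, 4), "DSH": (4, 6), "DRYER": (1, 6), "POOLPUMP": (2, 6)}
# Appliances whose load is held constant over hours 2, 3 and 4.
CONSTANT_HOURS = {"FREEZ": (2, 4), "FRIGAUT": (2, 4)}

def zero_mask():
    """Return a 24 x 9 boolean array, True where a load is fixed at zero."""
    mask = np.zeros((24, len(APPLIANCES)), dtype=bool)
    for name, (first, last) in ZERO_HOURS.items():
        col = APPLIANCES.index(name)
        mask[first - 1:last, col] = True   # hour h is stored in row h - 1
    return mask

mask = zero_mask()
n_zero = int(mask.sum())
# Holding a load constant over k hours imposes k - 1 equality restrictions.
n_equal = sum(last - first for first, last in CONSTANT_HOURS.values())
print(n_zero, n_equal)
```

Under these assumptions the diary information fixes 17 hourly loads at zero and adds 4 equality restrictions across hours 2 to 4.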

Direct metering information was available for two appliances, namely HWOP and HWPK. For HWOP a total of 125 out of the 189 households owning the appliance were directly metered, while it was 21 out of 105 for HWPK.

6. Estimation results

6.1. Comparing estimators

At this stage it is useful to summarize the range of estimators that have been discussed. Table 2 provides a list of estimators that are distinguished by the form of the GLS estimation undertaken, whether direct metering data is included, whether diary data is used to specify exclusion restrictions, and finally whether the hourly models are estimated jointly by SUR or not.

Table 2

Alternative estimators.

Estimator   GLS       Direct metering   Exclusion restrictions   SUR
E1          No        No                No                       No
E2          No        No                Yes                      No
E3          No        No                Yes                      Yes
E4          Amemiya   No                No                       No
E5          BLUP      No                No                       No
E6          BLUP      Yes               No                       No
E7          BLUP      Yes               No                       Yes
E8          BLUP      Yes               Yes                      Yes


Estimators E1-E3 are methods that currently appear in the CDA literature, while the remainder represent innovations for the estimation of end-use loads. In fact, estimators utilizing the BLUP option are also new to the econometrics literature in general.

The estimation of the basic RCM involves an iterative procedure that starts with E1, then moves to E4, and finally to E5. The same basic steps applied to the data adapted to incorporate direct metering lead to E6. Joint estimation was only considered for the direct metering data. It proceeds by taking the E6 estimates of the heteroskedastic structure to provide appropriate weights. SUR estimation is then applied to the weighted data. The estimator that results if no restrictions are placed on the coefficients of the explanatory variables is denoted by E7, while E8 incorporates the above-mentioned restrictions.

A priori, E8 is the preferred estimator. However, the theoretical superiority attributed to it may not necessarily be realised in any one particular application. It is first necessary to quantify the relative merits of the alternatives for our particular estimation problem.

Theoretically, each of our suggested innovations is designed to increase the precision of the estimated coefficients. For the purposes of empirical comparison, the criterion chosen was the trace of the covariance matrices of the coefficient estimators. Two traces were calculated: one involved summing over all coefficients and the other is the partial trace over the coefficients for just the end-use dummies. As these invariably produced the same qualitative conclusion, only the comparisons on the basis of the partial traces have been reported.
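As a small illustration of the trace criterion, the following Python sketch computes the full trace and the partial trace over the end-use dummy coefficients. The two covariance matrices and the index positions are hypothetical and serve only to show the calculation.

```python
import numpy as np

def traces(cov, dummy_idx):
    """Return (full trace, partial trace over the end-use dummy coefficients)."""
    cov = np.asarray(cov, dtype=float)
    diag = np.diag(cov)
    return diag.sum(), diag[list(dummy_idx)].sum()

# Hypothetical 4-coefficient example: positions 0 and 1 are end-use dummies,
# positions 2 and 3 are demographic variables.
cov_a = np.diag([0.040, 0.090, 0.001, 0.002])
cov_b = np.diag([0.020, 0.045, 0.001, 0.002])

full_a, part_a = traces(cov_a, [0, 1])
full_b, part_b = traces(cov_b, [0, 1])
ratio = part_b / part_a   # below 1 means estimator B is more precise on the dummies
print(round(ratio, 2))
```

A ratio of partial traces below one indicates that the second estimator has the smaller summed variance for the end-use coefficients, which is how the comparisons in this section are read.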

Naturally the E1 (i.e., OLS) variances are biased in the presence of our hypothesized heteroskedastic structure. In order to overcome this potential source of distortion in the comparisons, consistent estimates of the variances were generated using E6 estimates of the heteroskedasticity.
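One way to form such variances is the usual sandwich expression for OLS under a known or estimated diagonal disturbance covariance. The sketch below assumes that the disturbance variance for household i is d_i'α, with α replaced by an estimate. The simulated design and the numerical values of `alpha` are hypothetical.

```python
import numpy as np

def ols_cov_under_het(X, sigma2):
    """Covariance of the OLS estimator when Var(u_i) = sigma2[i]."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * sigma2[:, None])      # X' diag(sigma2) X
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(1)
n = 348
D = np.column_stack([np.ones(n), (rng.random(n) < 0.5).astype(float)])
alpha = np.array([0.2, 0.6])      # hypothetical variance components
sigma2 = D @ alpha                # sigma_i^2 = d_i' alpha

V_consistent = ols_cov_under_het(D, sigma2)
# The textbook OLS formula assumes one common variance and is biased here.
V_naive = sigma2.mean() * np.linalg.inv(D.T @ D)
```

When all variances are equal the sandwich expression collapses to the textbook formula, so any difference between `V_consistent` and `V_naive` reflects the heteroskedasticity alone.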

6.2. Results from the comparison of estimators

The first step in our analysis involved estimating the basic model for all 24 hours. As an initial comment there was strong evidence of heteroskedasticity present in the E1 results. Fig. 1 provides a summary of the Breusch and Pagan (1979) statistics and the studentized versions suggested by Koenker (1981). In all 24 equations the Breusch-Pagan statistics indicate the presence of heteroskedasticity of the form hypothesized. (Recall that existing Monte Carlo evidence suggests that in small samples the Breusch-Pagan test tends to reject the null hypothesis of homoskedasticity too infrequently.)

Fig. 1. Graph of the Breusch-Pagan statistics and studentized Breusch-Pagan statistics for each hour. Statistics lying above the reference line are significant at the 95% level.

The studentized statistics provide less conclusive evidence of heteroskedasticity, and in fact they are uniformly less than the associated Breusch-Pagan statistics. Nevertheless, for a majority of the hours the null hypothesis of homoskedasticity is still rejected. Because these statistics are asymptotically equivalent under normality, it is reasonable to conjecture that the observed differences are a reflection of possible nonnormality of the disturbances.
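The two statistics can be computed from the same auxiliary regression of squared residuals on the variance regressors. The following Python sketch uses the standard textbook forms (half the explained sum of squares of the scaled squared residuals for Breusch-Pagan, and n times R-squared for Koenker's version). The function name and the simulated data are assumptions made for illustration.

```python
import numpy as np

def bp_and_koenker(resid, Z):
    """Breusch-Pagan LM statistic and Koenker's studentized version.

    resid : OLS residuals, shape (n,)
    Z     : variance regressors including a constant column, shape (n, p + 1)
    """
    n = resid.shape[0]
    u2 = resid ** 2
    fitted = Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]
    ess = np.sum((fitted - u2.mean()) ** 2)      # explained sum of squares
    tss = np.sum((u2 - u2.mean()) ** 2)          # total sum of squares
    bp = ess / (2.0 * u2.mean() ** 2)            # Breusch and Pagan (1979)
    koenker = n * ess / tss                      # n * R^2, Koenker (1981)
    return bp, koenker

rng = np.random.default_rng(2)
n = 348
d = (rng.random(n) < 0.5).astype(float)
Z = np.column_stack([np.ones(n), d])
u = rng.standard_normal(n) * np.sqrt(0.2 + 0.8 * d)   # hypothetical disturbances
resid = u - Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
bp, koenker = bp_and_koenker(resid, Z)
```

The two statistics differ only in the scaling of the explained sum of squares: Breusch-Pagan divides by twice the squared mean of the squared residuals, which is appropriate under normality, while Koenker divides by their sample variance. Heavy-tailed disturbances inflate that sample variance, which is one way the studentized statistic can fall below the Breusch-Pagan value.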

It is of some interest to report the magnitude of the negative variance problem associated with E4, the Amemiya estimator. Of the 240 estimated coefficient variances, 75 (i.e., 31%) proved to be negative. Like Buse (1984), we are therefore skeptical of the usefulness of the Amemiya estimator and will not discuss it further.
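The source of the negative variance problem can be illustrated with a simple unrestricted regression of squared residuals on the ownership dummies. This sketch is not the Amemiya estimator itself. It only shows, under simulated and hypothetical values, that least squares places no sign constraint on the estimated variance components, and that truncation at a small floor is one crude repair.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 348, 5
D = np.column_stack([np.ones(n), (rng.random((n, k - 1)) < 0.4).astype(float)])
alpha_true = np.array([0.30, 0.02, 0.01, 0.05, 0.00])   # hypothetical, all >= 0
u = rng.standard_normal(n) * np.sqrt(D @ alpha_true)

# Unrestricted least squares of u_i^2 on d_i: nothing keeps the estimates >= 0.
alpha_ls = np.linalg.lstsq(D, u ** 2, rcond=None)[0]
n_negative = int((alpha_ls < 0).sum())

# Truncation at a small positive floor (the floor value is an assumption here).
alpha_trunc = np.maximum(alpha_ls, 1e-4)
print(n_negative)
```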

The actual regression results for each of the alternative estimators have not been reported here. As one would expect, there are few differences and they tend to be relatively minor. Only the E8 results are provided and the associated discussion is found in the next subsection. However, one piece of summary information that is of interest in the evaluation of estimators is the number of negative end-use load estimates. One would expect these estimates to be positive, but there is nothing in our estimation procedure that ensures this.


Table 3

Number of negative end-use load estimates.

                   E1    E5    E6    E7    E8
All loads          65    40    43    30    18
Excluding FREEZ    42    21    21    10     2

For each estimator the number of negative estimates has been provided in table 3. The indication is clear; all the GLS estimators produce far fewer negative estimates than E1, while E8 is by far the best estimator in this regard. Out of the 240 end-use loads (including the intercept), E1 produces 65 negative estimates compared with 18 for E8. The comparison is even more dramatic if the poorly estimated FREEZ loads are ignored. In this case, the respective numbers are 42 and 2.

An indication of efficiency gains is given by the partial traces of the variance-covariance matrices of the alternative estimators in fig. 2. To highlight the gains from each innovation, these have also been reported in table 4 as ratios relative to the preceding estimator. For example, the first column provides the ratios of the sum of E5 estimated variances of the estimated end-uses relative to those of E1. They indicate the gain from use of the RCM to account for heteroskedasticity. These gains are considerable and are present for all hours except hour 16. The actual pattern of gains basically mirrors the pattern of the Breusch-Pagan statistics: heteroskedasticity was most pronounced early in the morning and late at night and this is where E5 is most efficient relative to E1.

The difference between E5 and E6 involves the use of the direct metering data; here the improvements are even more impressive. Understandably, these gains are the most dramatic in the early morning period where HWOP accounts for a large proportion of the load. However, unlike the comparison of E1 and E5, here the improvement is considerable for all hours.

Because direct metering was confined to only two of the end-uses, one might conclude that this improvement in overall performance is simply a reflection of precision gains for just these appliances. Inspection of the individual end-use estimates, however, shows that precision gains are not confined to those appliances that were directly metered. As an illustration, table 5 provides the coefficient standard errors for all E5 and E6 end-use load estimates for a selection of hours. While the improvement is most marked for HWOP and HWPK, there is uniform improvement for all end-uses at each of the hours reported.

Fig. 2. Partial traces of the variance-covariance matrices of the alternative estimators: E1, E5, E6, E7, and E8.

Consideration of the first of the joint estimators, E7, again reveals uniform efficiency gains. Here it is interesting to note that the gains are greatest around the evening peak. Recall that this estimator does not include any restrictions on the base model specification and as such these efficiency gains are purely those attributable to the cross-equation correlation between disturbances that can be exploited because of the different heteroskedasticity present in each of the equations.
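The mechanism can be sketched in a few lines of Python. This is a generic feasible SUR applied to equation-specific weighted data, not the authors' implementation. The function name `weighted_sur`, the two-equation simulated system, and all numerical values are assumptions. The point it illustrates is that, although every hour shares the same regressors, dividing each hour's equation by its own standard deviations makes the transformed regressors differ across hours, so joint estimation no longer collapses to equation-by-equation least squares.

```python
import numpy as np

def weighted_sur(y_list, X, sigma_list):
    """Feasible SUR after dividing each equation by its own std. deviations.

    y_list     : list of G response vectors, each of shape (n,)
    X          : common regressor matrix, shape (n, k)
    sigma_list : list of G vectors of disturbance std. deviations, shape (n,)
    Returns the stacked coefficient vector (G * k,) and its covariance.
    """
    G, (n, k) = len(y_list), X.shape
    ys = [y / s for y, s in zip(y_list, sigma_list)]
    Xs = [X / s[:, None] for s in sigma_list]

    # Step 1: equation-by-equation weighted least squares residuals.
    resid = np.column_stack([
        y - Xg @ np.linalg.lstsq(Xg, y, rcond=None)[0] for y, Xg in zip(ys, Xs)
    ])
    Sigma_inv = np.linalg.inv(resid.T @ resid / n)   # inverse G x G covariance

    # Step 2: GLS on the stacked system with covariance Sigma kron I_n.
    A = np.zeros((G * k, G * k))
    b = np.zeros(G * k)
    for g in range(G):
        for h in range(G):
            A[g*k:(g+1)*k, h*k:(h+1)*k] = Sigma_inv[g, h] * (Xs[g].T @ Xs[h])
            b[g*k:(g+1)*k] += Sigma_inv[g, h] * (Xs[g].T @ ys[h])
    cov = np.linalg.inv(A)
    return cov @ b, cov

rng = np.random.default_rng(4)
n = 300
X = np.column_stack([np.ones(n), (rng.random(n) < 0.5).astype(float), rng.random(n)])
# Hypothetical standard deviations that differ between the two hours.
s1 = np.sqrt(0.2 + 0.6 * X[:, 1])
s2 = np.sqrt(0.5 + 0.1 * X[:, 1])
common = rng.standard_normal(n)               # source of cross-hour correlation
y1 = X @ np.array([1.0, 0.5, 0.3]) + s1 * (0.8 * common + 0.6 * rng.standard_normal(n))
y2 = X @ np.array([0.7, 0.2, 0.1]) + s2 * (0.8 * common + 0.6 * rng.standard_normal(n))
beta_sur, cov_sur = weighted_sur([y1, y2], X, [s1, s2])
```

If the same weights are supplied for every equation, the transformed regressors are identical and the function returns the equation-by-equation estimates, which is the familiar case in which SUR offers no gain.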

The second joint estimator, E8, does incorporate restrictions on the coefficients of the base model and these are reflected in the pattern of precision gains that emerge. The majority of the restrictions are of the exclusion type and refer to the hypothesized absence of use of certain appliances early in the morning. These loads are taken to be zero with no associated estimation error. While the now familiar pattern of uniform gains for E8 over E7 emerges, the largest gains are in these morning hours. For the remaining hours the gains of E8 over E7 are only modest. Naturally the overall gains from joint estimation, E8 compared to E6, are considerable for all hours. As a final comparison, E8 is compared to El. This emphasises the overall improvement attributable to our estimation procedure relative to the more conventional one.


Table 4

Trace comparisons.^a

Hour   E5/E1   E6/E5   E7/E6   E8/E7   E8/E1
 1     0.37    0.47    0.86    0.69    0.10
 2     0.42    0.35    0.86    0.54    0.07
 3     0.39    0.43    0.85    0.48    0.07
 4     0.52    0.46    0.86    0.36    0.07
 5     0.73    0.44    0.87    0.31    0.08
 6     0.71    0.55    0.88    0.32    0.11
 7     0.63    0.67    0.87    0.75    0.27
 8     0.79    0.55    0.84    0.89    0.33
 9     0.70    0.61    0.84    0.94    0.34
10     0.72    0.59    0.85    0.96    0.34
11     0.86    0.52    0.87    0.96    0.37
12     0.86    0.56    0.87    0.95    0.40
13     0.84    0.54    0.86    0.95    0.37
14     0.90    0.57    0.85    0.96    0.41
15     0.90    0.61    0.81    0.96    0.43
16     1.05    0.63    0.75    0.92    0.45
17     0.97    0.55    0.78    0.92    0.38
18     0.76    0.65    0.79    0.90    0.35
19     0.82    0.71    0.76    0.90    0.40
20     0.94    0.83    0.67    0.91    0.48
21     0.95    0.73    0.70    0.94    0.46
22     0.76    0.68    0.76    0.91    0.36
23     0.61    0.66    0.81    0.88    0.29
24     0.48    0.53    0.83    0.84    0.17

^a Entries represent the ratios of the partial traces of the variance-covariance matrices of the estimators indicated.
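Because each of the first four columns of table 4 is a ratio relative to the preceding estimator, their product should reproduce the E8/E1 column up to rounding of the two-decimal entries. The short Python check below uses four rows copied from the table. The dictionary layout is only an illustrative choice.

```python
import numpy as np

# Rows of table 4 for hours 1, 8, 16 and 24:
# columns are E5/E1, E6/E5, E7/E6, E8/E7 and the reported E8/E1.
rows = {
    1:  [0.37, 0.47, 0.86, 0.69, 0.10],
    8:  [0.79, 0.55, 0.84, 0.89, 0.33],
    16: [1.05, 0.63, 0.75, 0.92, 0.45],
    24: [0.48, 0.53, 0.83, 0.84, 0.17],
}

# Chain the four step-by-step ratios and compare with the reported overall ratio.
implied = {hour: float(np.prod(r[:4])) for hour, r in rows.items()}
gap = {hour: abs(implied[hour] - r[4]) for hour, r in rows.items()}
for hour in rows:
    print(hour, round(implied[hour], 3), rows[hour][4])
```

For these rows the implied and reported ratios agree to within 0.01, which is what one would expect from rounding alone.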

Table 5

Comparison of coefficient standard errors.^a

Variable   Hour 1   Hour 8   Hour 19
INTER      0.64     0.88     0.91
FREEZ      0.78     0.79     0.95
FRIGAUT    0.74     0.87     0.96
COOK       0.67     0.68     0.82
DSH        0.63     0.88     0.93
DRYER      0.79     0.85     0.93
HEAT       0.77     0.82     0.93
HWPK       0.24     0.64     0.61
HWOP       0.76     0.40     0.16
POOLPUMP   0.69     0.74     0.94

^a Entries represent the estimated coefficient standard errors of E6 expressed as ratios of the standard errors for E5.


A referee and the Associate Editor suggested that much of the gain that we have attributed to our estimators could have been accommodated by including interaction terms amongst the explanatory variables in our basic model. It turns out that this is not the case. Models containing interaction terms were utilized in the second phase of the Domestic End-Use study where other regions, months, and day types have been analysed, and the heteroskedasticity remains. This observation is supported by the discussion in Lawrence and Parti (1984) in their survey of CDA models.

In summary, each of our innovations has been a success under the criteria used and combining them in the form of E8 produces an estimator that performs remarkably well.

6.3. Coefficient estimates

The E8 results look very reasonable. These results have been reported in two parts: table 6 contains the coefficients for the demographic variables and MJROTH, while figs. 3 to 12 provide the estimated end-use loads. Amongst the demographic variables, PEOPLE and SIZE are the most important explanatory variables. They are always positive, as expected, and with the exception of SIZE in the early morning hours, they are precisely estimated over the entire day. INCOME, INCMSG, and MJROTH on the other hand are rarely significant.

Table 6

E8 estimates for demographic variables.

Hour   PEOPLE    SIZE      INCOME    INCMSG    MJROTH
 1     0.08^a    0.01      0.01      0.10      -0.01
 2     0.07^b    0.00      0.00      0.08      -0.00
 3     0.08^a    0.00      0.00      0.08      0.01
 4     0.08^a    0.00      -0.01     0.06      0.02
 5     0.08^a    0.00      -0.01     0.06      0.08
 6     0.08^a    0.01      0.00      0.09      0.02
 7     0.09^a    -0.00     0.00      0.03      0.04
 8     0.11^a    0.00      0.03      0.09      -0.01
 9     0.13^a    0.01^b    0.00      0.08      -0.04
10     0.09^a    0.02^a    -0.00     0.05      0.01
11     0.09^a    0.02^a    0.01      0.07      0.10
12     0.08^a    0.02^a    -0.00     0.04      0.13
13     0.08^a    0.02^a    -0.01     0.05      0.05
14     0.07^a    0.02^a    -0.01     0.06      0.01
15     0.08^a    0.01^a    -0.02     0.03      0.06
16     0.10^a    0.01^a    -0.03^b   -0.01^b   -0.03
17     0.13^a    0.01^a    -0.05^a   -0.14^b   -0.17^b
18     0.12^a    0.01^b    -0.05^b   -0.19^b   -0.22^b
19     0.11^a    0.01^b    0.01      -0.06     -0.32^a
20     0.11^a    0.03^a    0.03      -0.00     -0.20
21     0.14^a    0.02^a    0.04      0.03      -0.11
22     0.10^a    0.01^b    0.04      0.02      -0.15
23     0.11^a    0.02^b    0.06^a    0.14      -0.09
24     0.10^a    0.01^a    0.04^b    0.19^b    -0.06

^a Statistical significance at the 99% level.
^b Statistical significance at the 95% level.

Fig. 3. Load profile: FREEZ.

The estimates of most interest are those of the end-use loads. These are most encouraging. Importantly, estimates are typically positive as they should be. As previously mentioned, a notable exception is FREEZ, where the associated load is poorly estimated over the entire day. Also noteworthy is the relative smoothness of the estimated end-use load profiles. A priori, smooth profiles are expected, but the manner in which the estimates are generated does not ensure that the estimates produced will necessarily be smooth. (There is the minor exception of the cross-equation restrictions on the FREEZ and FRIGAUT estimates.)

Recall that the intercept can be interpreted as the load associated with appliances that are not directly included in the model when this load is evaluated for an 'average household'. The estimates of this load are substantial and are always precisely estimated throughout the day. There is a morning peak before the estimates steadily grow to a more pronounced late-night peak.

The pattern of a morning peak followed by a more substantial afternoon peak is repeated for most other appliance end-use estimates. Naturally, HWOP is the major exception to this pattern. Here the loads are large late at night and early in the morning and small elsewhere. These HWOP loads are estimated precisely at all hours. HEAT and HWPK are also precisely estimated throughout the day. The pattern of significant end-use estimates for other appliances is mixed, but typically they are precisely estimated around the morning and afternoon peaks.

Fig. 6. Load profile: DSH.

Fig. 7. Load profile: DRYER.

Fig. 8. Load profile: HEAT.

Fig. 9.

Fig. 10. Load profile: HWOP.

Fig. 11. Load profile: POOLPUMP.

Fig. 12. Load profile: MISCELLANEOUS.

Fig. 14. HWPK loads over households at 7 p.m.

Fig. 15. POOLPUMP loads over households at 7 p.m.

The existence of directly metered data provides an opportunity to perform a check on our CDA model. In particular the estimated loads for HWPK and HWOP derived from our regression model can be compared with the average loads obtained from the direct metering data alone. Recall that partial direct metering is available here; only a subsample of households possessing the particular end-use have been metered. For HWOP, where a larger percentage of customers were directly metered, three estimates of the end-use loads are presented in fig. 13: the E5 estimates that ignore the direct metering information, the final E8 estimates, and the average of the direct metering data.

The results show that the E5 estimates tend to be smaller than the E8 estimates, and that the E8 estimates are very close to the direct metering estimates. The observed differences might be due to the subsample of households not being representative, but we do not have an explanation of why this would be the case.

6.4. Distribution of end-use loads across households

As has been mentioned, the BLUPs represent predictions of actual customer end-use loads and as such can be used to develop distributions of end-use loads over households. These distributions are available for each end-use at each hour. For the purposes of illustration, two examples corresponding to HWPK and POOLPUMP for the peak hour of 7 p.m. are presented in figs. 14 and 15. The extra information conveyed by these histograms has not previously been available and represents a further attractive feature of the RCM employed here.
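A rough sketch of how such a distribution might be assembled is given below in Python. It uses the textbook BLUP of the household-specific coefficient disturbance in a random coefficient model with independent components, namely α_j d_ij u_i / (d_i'α), rather than the adjusted BLUPs used in the paper. The function name, the simulated data, and the values of `eta_bar` and `alpha` are all hypothetical.

```python
import numpy as np

def blup_loads(resid, D, eta_bar, alpha):
    """Predict household-level coefficients eta_ij = eta_bar_j + v_ij.

    The disturbance v_ij is predicted by alpha_j * d_ij * u_i / (d_i' alpha),
    which splits each residual across the end-uses the household owns.
    """
    sigma2 = D @ alpha                                  # Var(u_i) = d_i' alpha
    v_hat = D * alpha[None, :] * (resid / sigma2)[:, None]
    return D * eta_bar[None, :] + v_hat                 # zero for non-owners

rng = np.random.default_rng(5)
n = 348
D = np.column_stack([np.ones(n), (rng.random(n) < 0.32).astype(float)])
eta_bar = np.array([0.40, 0.25])     # hypothetical mean loads
alpha = np.array([0.05, 0.10])       # hypothetical variance components
resid = rng.standard_normal(n) * np.sqrt(D @ alpha)

loads = blup_loads(resid, D, eta_bar, alpha)
owners = D[:, 1] == 1
counts, edges = np.histogram(loads[owners, 1], bins=10)   # distribution over owners
```

In this form the predicted disturbances for a household add up to its residual, so the histogram describes how the unexplained part of each household's load is attributed to the end-use in question.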

7. Conclusion

There are several conclusions that emerge from our study. In particular we have developed a menu of extensions to the conditional demand analysis approach that allows for dramatic improvements in the estimation of residential end-use load profiles. Central to this development is the use of a random coefficient model. The random coefficient formulation not only provides a simple framework for the incorporation of direct metering information, but it also accounts for the heteroskedasticity in the data.


In the process of implementing the random coefficient model we have also developed an approach to estimating the variance structure of such a model which is of wider interest. The problem of negative variance estimates is one that has been the cause for much concern without there being any consensus regarding the most appropriate solution. The solution provided in this paper is a definite candidate for much wider use in this area.

On the basis of the empirical results presented, the improvements attributable to each of the innovations have been considerable. The improvements afforded by the inclusion of partial direct metering information are especially noteworthy. These improvements occur for all appliances; they are not limited to those appliances being directly metered. This suggests that, in conjunction with conditional demand analysis, such data can provide quite accurate estimates of load profiles without the need for directly metering all appliances for all customers. The gains in efficiency are such that in future residential load studies every effort should be made to record the loads of all appliances such as offpeak water heaters or ranges which are on a separate circuit running from the main board.

In terms of refining our approach, there are two areas worthy of closer scrutiny. These are associated with the assumption that the disturbances of the individual random coefficients are independent and the treatment of appliance ownership patterns as given. Both of these extensions in the context of the current analysis represent challenging areas of future research.

Appendix: Generating consistent estimates of $\alpha$ using BLUPs

Consider the BLUP-based estimator $\hat{\alpha}_j$ defined in (16) as

$$\hat{\alpha}_j = n_j^{-1} \sum_i d_{ij} (\tilde{\eta}_{ij} - \bar{\eta}_j)^2. \tag{A.1}$$

Using the definition of $\tilde{\eta}_{ij}$ in (19), we can write

$$\hat{\alpha}_j = \alpha_j \, n_j^{-1} \sum_i d_{ij} (d_i'\alpha)^{-1} (y_i - z_i'\hat{\delta})^2. \tag{A.2}$$

Since $\hat{\delta}$ is a consistent estimator for $\delta$, we have that

$$(y_i - z_i'\hat{\delta})^2 \xrightarrow{p} u_i^2.$$

Noting that $E(u_i^2) = d_i'\alpha$ and assuming that $E(u_i^4)$ is finite, we can conclude from the law of large numbers that

$$n_j^{-1} \sum_i d_{ij} (d_i'\alpha)^{-1} (y_i - z_i'\hat{\delta})^2 \xrightarrow{p} 1,$$

and hence

$$\hat{\alpha}_j \xrightarrow{p} \alpha_j,$$

proving that $\hat{\alpha}_j$ is consistent.

To make $\hat{\alpha}_j$ operational we require an initial estimate for $\alpha_j$ so that the adjusted BLUPs $\tilde{\eta}_{ij}$ can be calculated. We see from (A.2) that provided this initial estimate is consistent, the consistency of $\hat{\alpha}_j$ is not affected. In this study we chose the Amemiya estimator for $\alpha_j$ discussed in section 3, truncated to ensure each $\hat{\alpha}_j$ reached a reasonable lower bound.

The iterative procedure for estimating the $\alpha_j$, based on the adjusted BLUPs, can also be written as

$$\hat{\alpha}_j^1 = \hat{\alpha}_j^0 \, n_j^{-1} \sum_i d_{ij} \hat{u}_i^2 / (\hat{\sigma}_i^0)^2, \tag{A.3}$$

where the superscripts 1 and 0 denote the current and previous iterations, respectively, and the $\hat{u}_i$ are the residuals from the most recent EGLS estimate of the primary equation. The initial value for $\hat{\alpha}_j^0$ is assumed to be nonnegative, otherwise the $\tilde{\eta}_{ij}$ cannot be calculated. Convergence in the iteration procedure is achieved when for all $j$

$$n_j^{-1} \sum_i d_{ij} \hat{u}_i^2 / (\hat{\sigma}_i^0)^2 = 1$$

or when

$$\hat{\alpha}_j^0 = 0.$$

This implies that the initial $\hat{\alpha}_j^0$ must in fact all be chosen positive if we don't wish to impose zero solutions a priori.
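The iteration just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the squared standard deviation for household i is taken to be d_i'α evaluated at the previous iterate, the primary equation is re-estimated by weighted least squares at each pass, and the function name, tolerance, and simulated data are all hypothetical.

```python
import numpy as np

def iterate_alpha(y, Z, D, alpha0, tol=1e-8, max_iter=500):
    """Iterate alpha_j <- alpha_j * (1/n_j) * sum_i d_ij u_i^2 / sigma_i^2."""
    alpha = np.asarray(alpha0, dtype=float).copy()
    n_j = D.sum(axis=0)                       # number of owners per end-use
    for _ in range(max_iter):
        sigma2 = D @ alpha                    # sigma_i^2 = d_i' alpha (previous iterate)
        w = 1.0 / np.sqrt(sigma2)
        # EGLS step: weighted least squares for the primary equation.
        delta = np.linalg.lstsq(Z * w[:, None], y * w, rcond=None)[0]
        u = y - Z @ delta
        new = alpha * (D.T @ (u ** 2 / sigma2)) / n_j
        if np.max(np.abs(new - alpha)) < tol:
            return new, delta
        alpha = new
    return alpha, delta

rng = np.random.default_rng(6)
n = 348
D = np.column_stack([np.ones(n), (rng.random(n) < 0.5).astype(float)])
Z = np.column_stack([D, rng.random(n)])       # dummies plus one hypothetical covariate
alpha_true = np.array([0.2, 0.6])
y = Z @ np.array([1.0, 0.5, 0.3]) + rng.standard_normal(n) * np.sqrt(D @ alpha_true)

alpha_hat, delta_hat = iterate_alpha(y, Z, D, alpha0=[0.5, 0.5])
```

Because each update multiplies the previous value by a nonnegative factor, a positive starting value can never turn negative, while a starting value of exactly zero stays at zero. This is the sense in which zero solutions are imposed a priori if the initial values are not all positive.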

A slight modification of the iterative procedure in (A.3) will produce maximum likelihood estimates for the $\alpha_j$ under normality. The first-order condition for $\alpha_j$ is given by

$$\frac{\partial \log L}{\partial \alpha_j} = \frac{1}{2} \sum_i d_{ij} \left( \hat{u}_i^2/\sigma_i^4 - 1/\sigma_i^2 \right) = 0. \tag{A.4}$$

This condition will be satisfied if

$$\frac{\sum_i d_{ij} (\hat{u}_i^2/\sigma_i^4)}{\sum_i d_{ij} (1/\sigma_i^2)} = 1. \tag{A.5}$$

Hence the iterative procedure,

$$\hat{\alpha}_j^1 = \hat{\alpha}_j^0 \, \frac{\sum_i d_{ij} \hat{u}_i^2/(\hat{\sigma}_i^0)^4}{\sum_i d_{ij}/(\hat{\sigma}_i^0)^2}, \tag{A.6}$$

will on convergence produce either $\hat{\alpha}_j = 0$ or $\hat{\alpha}_j$ satisfying the first-order condition (A.4). Thus if the initial $\hat{\alpha}_j^0$ is chosen to be positive then (A.6) will ensure nonnegative MLEs for $\alpha_j$ provided the iteration converges. Note that in this case there is no need to choose a consistent estimate for the initial $\hat{\alpha}_j^0$ since consistency for $\hat{\alpha}_j$ derives from the fact that the first-order conditions are satisfied.
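The link between the multiplicative update and the first-order condition can be checked numerically. The Python sketch below holds the residuals fixed for simplicity (in the procedure described above they would come from the most recent EGLS fit) and assumes that the variance for household i is d_i'α. The function names `ml_step` and `score` and the simulated residuals are illustrative assumptions.

```python
import numpy as np

def ml_step(alpha, u, D):
    """One multiplicative update: alpha_j times the ratio in the first-order condition."""
    sigma2 = D @ alpha
    num = D.T @ (u ** 2 / sigma2 ** 2)      # sum_i d_ij u_i^2 / sigma_i^4
    den = D.T @ (1.0 / sigma2)              # sum_i d_ij / sigma_i^2
    return alpha * num / den

def score(alpha, u, D):
    """Derivative of the normal log-likelihood with respect to each alpha_j."""
    sigma2 = D @ alpha
    return 0.5 * D.T @ (u ** 2 / sigma2 ** 2 - 1.0 / sigma2)

rng = np.random.default_rng(7)
n = 348
D = np.column_stack([np.ones(n), (rng.random(n) < 0.5).astype(float)])
u = rng.standard_normal(n) * np.sqrt(D @ np.array([0.2, 0.6]))  # hypothetical residuals

alpha_hat = np.array([0.5, 0.5])            # positive starting values
for _ in range(1000):
    alpha_hat = ml_step(alpha_hat, u, D)
grad = score(alpha_hat, u, D)               # close to zero once the iteration has settled
```

At a fixed point with a positive component the ratio `num / den` equals one, which is the same statement as the corresponding element of the score being zero.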

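If we calculate the adjusted BLUPs as

$$\tilde{\eta}_{ij} = d_{ij} \left[ \bar{\eta}_j + (\hat{\alpha}_j^0)^{1/2} \hat{u}_i / \hat{\sigma}_i^0 \right], \tag{A.7}$$

then $\hat{\alpha}_j^1$ in (A.6) can also be obtained as

(A.8)

That is, $\hat{\alpha}_j^1$ is a weighted sum of squares of the $\tilde{\eta}_{ij}$ around $\bar{\eta}_j$,

$$\hat{\alpha}_j^1 = \sum_i d_{ij} w_i (\tilde{\eta}_{ij} - \bar{\eta}_j)^2 \Big/ \sum_i d_{ij} w_i, \tag{A.9}$$

with $w_i = (\hat{\sigma}_i^0)^{-2}$.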
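The equivalence between the weighted sum of squares and the multiplicative ML update can be verified by direct substitution. The derivation below is a sketch under assumed notation: hats denote estimates, the superscript 0 denotes the previous iterate, $\bar{\eta}_j$ is the mean coefficient for end-use $j$, and the weight is read as $w_i = (\hat{\sigma}_i^0)^{-2}$.

```latex
\begin{align*}
\tilde{\eta}_{ij} - \bar{\eta}_j
  &= (\hat{\alpha}_j^0)^{1/2}\,\hat{u}_i / \hat{\sigma}_i^0
  && \text{for households with } d_{ij} = 1, \\
\sum_i d_{ij} w_i (\tilde{\eta}_{ij} - \bar{\eta}_j)^2
  &= \hat{\alpha}_j^0 \sum_i d_{ij}\, \hat{u}_i^2 / (\hat{\sigma}_i^0)^4 ,
  \qquad w_i = (\hat{\sigma}_i^0)^{-2}, \\
\frac{\sum_i d_{ij} w_i (\tilde{\eta}_{ij} - \bar{\eta}_j)^2}{\sum_i d_{ij} w_i}
  &= \hat{\alpha}_j^0\,
     \frac{\sum_i d_{ij}\, \hat{u}_i^2 / (\hat{\sigma}_i^0)^4}
          {\sum_i d_{ij} / (\hat{\sigma}_i^0)^2} .
\end{align*}
```

The right-hand side of the last line is the previous iterate multiplied by the ratio that the first-order condition sets equal to one, so the weighted sum of squares and the ML iteration give the same update under these assumptions.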

References

Aigner, D.J., C. Sorooshian, and P. Kerwin, 1984, Conditional demand analysis for estimating residential end-use load profiles, The Energy Journal 5, 81-97.

Amemiya, T., 1977, A note on a heteroskedastic model, Journal of Econometrics 6, 365-370.

Bartels, R. and D.G. Fiebig, 1990, Integrating direct metering and conditional demand analysis for estimating end-use loads, The Energy Journal 11, 79-97.

Breusch, T.S. and A.R. Pagan, 1979, A simple test for heteroskedasticity and random coefficient variation, Econometrica 47, 1287-1294.

Buse, A., 1984, Tests for additive heteroskedasticity: Goldfeld and Quandt revisited, Empirical Economics 9, 199-216.

Davidian, M. and R.J. Carroll, 1987, Variance function estimation, Journal of the American Statistical Association 82, 1079-1091.

Griffiths, W.E., 1972, Estimation of actual response coefficients in the Hildreth-Houck random coefficients model, Journal of the American Statistical Association 67, 633-635.

Judge, G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl, and T.-C. Lee, 1985, The theory and practice of econometrics, 2nd ed. (Wiley, New York, NY).

Koenker, R., 1981, A note on studentizing a test for heteroskedasticity, Journal of Econometrics 17, 107-112.

Lawrence, A.G. and M. Parti, 1984, Survey of conditional energy demand models for estimating residential unit energy consumption coefficients, Report no. EA-3410 (Electric Power Research Institute, Palo Alto, CA).

Lee, L.F. and W.E. Griffiths, 1979, The prior likelihood and best linear unbiased prediction in stochastic coefficient linear models, Working papers in econometrics and applied statistics no. 1 (University of New England, Armidale, Australia).

Maddala, G.S., 1977, Econometrics (McGraw-Hill, New York, NY).

Ronning, G., 1985, On the nonnegativity of XX+ and its relevance in econometrics, Metrika 32, 35-47.

Sebold, F.D. and K.M. Parris, 1989, Residential end-use energy consumption: A survey of conditional demand estimates, Report no. CU-6487 (Electric Power Research Institute, Palo Alto, CA).

Srivastava, V.K., G.D. Mishra, and A. Chaturvedi, 1981, Estimation of linear regression model with random coefficients ensuring almost non-negativity of variance estimators, Biometrical Journal 23, 3-8.