Large sample prediction intervals for a future sample mean: A comparative study


Journal of Statistical Computation and Simulation
Publisher: Taylor & Francis
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713650378

Mohammed A. Shayib (a); Adnan M. Awad (a); Ahmad M. Dawargeh (b)
(a) Mathematics Department, Kuwait University, Kuwait
(b) Department of Statistics, Yarmouk University, Irbid, Jordan

Online Publication Date: 01 July 1986

To cite this Article: Shayib, Mohammed A., Awad, Adnan M. and Dawargeh, Ahmad M. (1986) 'Large sample prediction intervals for a future sample mean: a comparative study', Journal of Statistical Computation and Simulation, 24:3, 255-270

To link to this article: DOI: 10.1080/00949658608810908
URL: http://dx.doi.org/10.1080/00949658608810908


J. Statist. Comput. Simul., 1986, Vol. 24, pp. 255-270. 0094-9655/86/2404-0255 $18.50/0. © 1986 Gordon and Breach, Science Publishers, Inc. Printed in Great Britain.

Large Sample Prediction Intervals for a Future Sample Mean: A Comparative Study

MOHAMMED A. SHAYIB and ADNAN M. AWAD†

Mathematics Department, Kuwait University, P.O. Box 5969, Kuwait

and

AHMAD M. DAWAGREH

Department of Statistics, Yarmouk University, Irbid, Jordan

(Received May 12, 1985)

This article gives a comparative study of several prediction intervals for the future sample mean. The observed sample used in the techniques and the future sample for which the prediction intervals were established share the same underlying distribution. Two assumed underlying distributions were used.

The first underlying distribution is the exponential with parameter $\theta$. Five different intervals were set for the prediction of the future sample mean.

The second underlying distribution is the normal with the parameter $\theta$ as the common value for the mean and the variance. Seven intervals were compared.

A simulation validation was done. Some criteria were set for the "best" interval in both cases. Moreover, the merits of the techniques were given, and a table of results of the simulations is supplied.

† On sabbatical leave from the Department of Statistics, Yarmouk University, Irbid, Jordan.


1. INTRODUCTION

Several procedures have been suggested in the literature to construct prediction intervals. Hahn and Nelson (1973), Aitchison and Dunsmore (1975), and the references cited by them, give a good survey of these procedures and provide several applications on statistical prediction analysis. In this paper we will be concerned with a comparative study of the behaviour of several prediction intervals for the future sample mean.

Let $X = (X_1, \ldots, X_n)'$ be an observed random sample from a population whose density function is $f(x \mid \theta)$, where $\theta \in \Theta \subset \mathbb{R}$. Let $Y = (Y_1, \ldots, Y_m)'$ be a future random sample from a population whose density is also $f(y \mid \theta)$. Assume that the two samples are independent of one another and that $\theta$ is an unknown parameter. Given the observed random sample $X$ we would like to construct a prediction interval for the future sample mean $\bar{Y}$, i.e., we would like to find two statistics $L(X)$ and $U(X)$ such that $P(L(X) \le \bar{Y} \le U(X)) = 1 - \alpha$. Since there is an infinite number of solutions for $L(X)$ and $U(X)$ we will consider a two-sided interval with equal tail probabilities. The following procedures will be used to construct the required intervals.

Procedure 1 (True)

If $\theta$ is known then the density of $\bar{Y} \mid \theta$ can be used to obtain what is termed the true interval. Such a case is of no practical use since it will be free of the observed sample. However, in some cases it may be possible to construct a density function which depends on both $\bar{X}$ and $\bar{Y}$, given $\theta$, and that density may be used to construct a prediction interval. This fact will be illustrated in Section 2.

Procedure 2 (Pivot)

Faulkenberry (1973) and others suggested removing the unknown parameter $\theta$ through pivots or the sufficiency principle. Then the resulting distribution may be used to obtain a prediction interval.

Procedure 3 (Maximum Likelihood Estimator, MLE)

Wald (1942) suggested removing the unknown parameter $\theta$ from the density $f_{\bar{Y}}(\bar{y} \mid \theta)$ by replacing $\theta$ by its maximum likelihood estimator $\hat{\theta}$ (MLE). Then $f_{\bar{Y}}(\bar{y} \mid \hat{\theta})$ may be used to construct a large sample prediction interval for $\bar{Y}$ given $X$.

Moreover, if $\theta$ is assumed to be known and $[L(X, \theta), U(X, \theta)]$ is a prediction interval for $\bar{Y}$, then a reasonable approximate prediction interval for $\bar{Y}$ when $\theta$ is unknown is $[L(X, \hat{\theta}), U(X, \hat{\theta})]$, where $\hat{\theta}$ is the MLE for $\theta$.

Procedure 4 (Central Limit Theorem, CLT)

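If $X$ has a mean $\mu_X$ and a finite non-zero variance $\sigma_X^2$, and the same holds for $\mu_Y$ and $\sigma_Y^2$, then the central limit theorem, CLT, implies that

$$\sqrt{n}\,(\bar{X} - \mu_X)/\sigma_X \xrightarrow{d} N(0, 1) \qquad (1)$$

and

$$\sqrt{m}\,(\bar{Y} - \mu_Y)/\sigma_Y \xrightarrow{d} N(0, 1). \qquad (2)$$

If it happens that one can remove the unknown parameter $\theta$ by applying the pivot method to the asymptotic distributions of $\bar{Y}$ and $\bar{X}$, we may construct a large sample prediction interval for $\bar{Y}$. This point will be illustrated in Section 2.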

Procedure 5 (Variance Stabilizing Transformation, VST)

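Awad (1985) used a variance stabilizing transformation in the cases where $\sigma_X^2$ is a function of $\mu_X$, say $\sigma_X^2 = \sigma^2(\mu_X)$. His procedure depends on the well-known fact that

$$\psi(\bar{X}) = \int^{\bar{X}} \frac{d\mu}{\sigma(\mu)} \qquad (3)$$

has a variance which is free of $\mu$. Then the asymptotic distributions of $\psi(\bar{X})$ and $\psi(\bar{Y})$ can be used to construct a large sample prediction interval for $\bar{Y}$.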
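The variance stabilizing idea can be checked numerically. The sketch below is an illustration under stated assumptions, not the computation used in the paper: it takes the exponential case, where the standard deviation equals the mean, so the transformation is assumed to be the natural logarithm (the antiderivative of $1/\mu$). The function name `scaled_variance_of_log_mean` and all parameter values are hypothetical choices made for this example.

```python
import math
import random
import statistics

def scaled_variance_of_log_mean(theta, n, reps, rng):
    """Estimate n * Var(ln(sample mean)) for exp(theta) samples of size n."""
    logs = []
    for _ in range(reps):
        # Mean of n exponential observations with rate theta (mean 1/theta).
        xbar = sum(rng.expovariate(theta) for _ in range(n)) / n
        logs.append(math.log(xbar))
    return n * statistics.variance(logs)

rng = random.Random(0)
# The rate changes by a factor of 100 across these runs; if the log
# transformation stabilizes the variance, n * Var(ln Xbar) should stay
# close to 1 in every case.
results = {
    theta: scaled_variance_of_log_mean(theta, n=50, reps=4000, rng=rng)
    for theta in (0.1, 1.0, 10.0)
}
```

The untransformed mean behaves differently: its variance is $1/(n\theta^2)$, which changes by a factor of $10^4$ over the same range of $\theta$.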

To illustrate these procedures we will consider two examples. The first one assumes that the underlying distribution is exponential with mean $1/\theta$, exp($\theta$). The second example assumes that the underlying distribution is normal with mean $\theta$ and variance $\theta$, $N(\theta, \theta)$. Sections 2 and 3 give the intervals derived by the above-mentioned procedures for these two examples. The validity and efficiency of these intervals will be given in Section 4, through simulation methods.

Finally, Section 6 provides some suggestions about the applicability of these intervals together with their ranks according to some criteria.

2. PREDICTION INTERVALS FOR $\bar{Y}$ IN THE EXPONENTIAL CASE

Let $X_1, X_2, \ldots, X_n$ be an observed (past) random sample from the exponential distribution with parameter $\theta$, exp($\theta$), with density

$$f(x \mid \theta) = \theta e^{-\theta x}, \qquad x > 0,\ \theta > 0.$$

Assume that we would like to observe a future sample $Y_1, Y_2, \ldots, Y_m$, independent from the past sample and sharing the same underlying distribution.

We are interested in constructing and comparing large sample prediction intervals for the future sample mean. We will apply the procedures stated in Section 1 to this case.

2.1 True interval-Procedure 1

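Assuming $\theta$ is known, it can be shown that $2\theta \sum_{j=1}^{m} Y_j$ has a chi-square distribution with $2m$ degrees of freedom (df). Therefore

$$P\left(\chi^2_{\alpha/2}(2m) \le 2m\theta\bar{Y} \le \chi^2_{1-\alpha/2}(2m)\right) = 1 - \alpha,$$

where $\chi^2_{p}(2m)$ denotes the $p$th percentile of the chi-square distribution with $2m$ df. Hence a $100(1-\alpha)\%$ prediction interval for $\bar{Y}$ is

$$\left[\frac{\chi^2_{\alpha/2}(2m)}{2m\theta},\ \frac{\chi^2_{1-\alpha/2}(2m)}{2m\theta}\right]. \qquad (4)$$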
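As an illustration, the true interval can be computed with the Python standard library alone, because a chi-square distribution with an even number of degrees of freedom has a closed-form (Erlang) distribution function. This is a sketch under the assumption that the interval takes the equal-tail form implied by the chi-square result above; the helper names `chi2_cdf_even`, `chi2_ppf_even` and `true_interval` are hypothetical.

```python
import math

def chi2_cdf_even(x, df):
    """CDF of a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = df // 2
    half = x / 2.0
    term, total = 1.0, 1.0           # k = 0 term of sum_{k<m} half**k / k!
    for k in range(1, m):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_ppf_even(p, df, tol=1e-12):
    """Percentile found by bisection on the closed-form CDF (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:  # expand until the percentile is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def true_interval(theta, m, alpha=0.05):
    """Equal-tail interval for the mean of m future exp(theta) observations."""
    lower_q = chi2_ppf_even(alpha / 2, 2 * m)
    upper_q = chi2_ppf_even(1 - alpha / 2, 2 * m)
    return lower_q / (2 * m * theta), upper_q / (2 * m * theta)

lower, upper = true_interval(theta=2.0, m=5)
```

The limits depend only on $\theta$ and $m$, which shows directly why this interval is free of the observed sample.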

2.2 Pivot method-Procedure 2

Lawless (1972) has used the fact that $\bar{Y}/\bar{X}$ has an F-distribution with $(2m, 2n)$ degrees of freedom to obtain the prediction interval

$$\left[\bar{X}\,F_{\alpha/2}(2m, 2n),\ \bar{X}\,F_{1-\alpha/2}(2m, 2n)\right], \qquad (5)$$

where $F_{p}(2m, 2n)$ denotes the $p$th percentile of the F-distribution with $(2m, 2n)$ df.
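A minimal sketch of the pivot interval follows, assuming the equal-tail form in which the past mean is multiplied by the lower and upper percentiles of the F-distribution with $(2m, 2n)$ degrees of freedom. To stay within the Python standard library, the F percentile is obtained from the Beta($m$, $n$) distribution, whose distribution function is a finite binomial sum when both parameters are integers. The names `beta_cdf_int`, `f_ppf_even` and `pivot_interval` are hypothetical.

```python
import math

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(a, n + 1))

def f_ppf_even(p, m, n, tol=1e-13):
    """Percentile of F(2m, 2n), using m*F/(m*F + n) ~ Beta(m, n)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, m, n) < p:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return (n / m) * b / (1 - b)

def pivot_interval(xbar, m, n, alpha=0.05):
    """Equal-tail interval for the future mean, built from Ybar/Xbar ~ F(2m, 2n)."""
    return (xbar * f_ppf_even(alpha / 2, m, n),
            xbar * f_ppf_even(1 - alpha / 2, m, n))

lower, upper = pivot_interval(xbar=0.5, m=5, n=20)
```

Unlike the true interval, this one needs no knowledge of $\theta$; the past data enter only through $\bar{X}$.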

2.3 MLE-Procedure 3

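It is well known that the MLE for $\theta$ is $\hat{\theta} = 1/\bar{X}$. So replacing $\theta$ in (4) by $\hat{\theta}$ we obtain an approximate $100(1-\alpha)\%$ prediction interval for $\bar{Y}$ as

$$\left[\frac{\bar{X}\,\chi^2_{\alpha/2}(2m)}{2m},\ \frac{\bar{X}\,\chi^2_{1-\alpha/2}(2m)}{2m}\right], \qquad (6)$$

provided that $n$ is large, with $\chi^2_{p}(2m)$ the $p$th chi-square percentile with $2m$ df. To validate interval (6), one should notice that

$$2m\theta\bar{Y} \sim \chi^2_{2m}$$

and

$$(1/\bar{X}) = \hat{\theta} \xrightarrow{p} \theta \quad \text{(the MLE is a consistent estimator).}$$

Hence

$$2m\bar{Y}/\bar{X} \xrightarrow{d} \chi^2_{2m}.$$

So interval (6) can be obtained.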
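The limiting argument can be illustrated by simulation. The sketch below, which uses hypothetical names and arbitrary parameter values, draws the ratio $2m\bar{Y}/\bar{X}$ repeatedly for a large past sample and compares its first two moments with those of a chi-square variable with $2m$ degrees of freedom (mean $2m$, variance $4m$). It checks moments only, so it is a plausibility check rather than a proof of convergence in distribution.

```python
import random
import statistics

def simulate_ratio(theta, m, n, reps, rng):
    """Draw reps values of 2*m*Ybar/Xbar for independent exp(theta) samples."""
    draws = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(theta) for _ in range(n)) / n   # past mean
        ybar = sum(rng.expovariate(theta) for _ in range(m)) / m   # future mean
        draws.append(2 * m * ybar / xbar)
    return draws

rng = random.Random(1)
m = 5
draws = simulate_ratio(theta=2.0, m=m, n=200, reps=5000, rng=rng)
mean_ratio = statistics.fmean(draws)
var_ratio = statistics.variance(draws)
# For large n these should be close to 2*m = 10 and 4*m = 20 respectively.
```

For small $n$ the ratio is noticeably more dispersed than the chi-square limit, which is why the interval is described as approximate and requires $n$ to be large.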

2.4 CLT-Procedure 4

Using the CLT for i.i.d. random variables with finite variance, it follows that

and

provided that $m$ and $n$ are large. This together with the main theorem in Billingsley (1968, p. 30) implies that

$$m\bar{Y}^2 / (n\bar{X}^2) \xrightarrow{d} F''_{1,1}(m, n),$$

where $F''_{1,1}(m, n)$ denotes a doubly non-central F distribution with $(1, 1)$ degrees of freedom and non-centrality parameters $(m, n)$. Therefore an approximate $100(1-\alpha)\%$ prediction interval for $\bar{Y}$ is

provided that $m$ and $n$ are large.

2.5 VST-Procedure 5

Since $\mu_X = 1/\theta$ and $\sigma_X^2 = 1/\theta^2$, Eq. (3) reduces to

Therefore

provided that $m$ and $n$ are large. Using this asymptotic result one obtains an approximate $100(1-\alpha)\%$ prediction interval for $\bar{Y}$ as
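A sketch of how such an interval could be computed is given below. It rests on assumptions that are not recoverable from the text above: that the transformation for the exponential case is the natural logarithm (since $\sigma_X = \mu_X$), and that $\ln\bar{Y} - \ln\bar{X}$ is treated as approximately normal with mean 0 and variance $1/m + 1/n$. The function name `vst_interval_exponential` is hypothetical, and the resulting limits need not coincide exactly with the interval numbered (8).

```python
import math
from statistics import NormalDist

def vst_interval_exponential(xbar, m, n, alpha=0.05):
    """Approximate interval for the future mean on the log scale.

    Assumes ln(Ybar) - ln(Xbar) is roughly N(0, 1/m + 1/n) for large m and n.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(1 / m + 1 / n)
    # Transform the symmetric log-scale limits back to the original scale.
    return xbar * math.exp(-half_width), xbar * math.exp(half_width)

lower, upper = vst_interval_exponential(xbar=0.5, m=30, n=50)
```

Because the limits are symmetric on the log scale, the interval is asymmetric around $\bar{X}$ on the original scale, with a longer upper arm.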

3. PREDICTION INTERVALS FOR $\bar{Y}$ IN THE $N(\theta, \theta)$ CASE

Let $X_1, X_2, \ldots, X_n$ be an observed (past) random sample from $N(\theta, \theta)$, and $Y_1, Y_2, \ldots, Y_m$ be a future random sample, independent from the past sample, with the same distribution. Now we will construct prediction intervals for $\bar{Y}$ using the procedures in Section 1.


3.1 True interval-Procedure 1

Assuming $\theta$ is known and $\theta > 0$, it is clear that

and

$$(m/n)^{1/2}\,(\bar{Y} - \theta)/(\bar{X} - \theta) \sim \text{Cauchy}(0, 1). \qquad (11)$$

Using Eqs. (9)-(11), we obtain the following $100(1-\alpha)\%$ prediction intervals for $\bar{Y}$ given $\bar{X}$ and $\theta$:

where $C_{1-\alpha/2}$ denotes the $(1-\alpha/2)$th percentile of the Cauchy distribution with parameters $(0, 1)$.

It is interesting to note that the true interval is not unique. Moreover, interval (12) is free of the past data while intervals (13) and (14) depend on the past data.
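The Cauchy percentile used above has a closed form, since the standard Cauchy distribution function is $1/2 + \arctan(x)/\pi$. The short sketch below evaluates it; the function name `cauchy_percentile` is a hypothetical choice.

```python
import math

def cauchy_percentile(p):
    """p-th percentile of the standard Cauchy(0, 1) distribution, 0 < p < 1."""
    return math.tan(math.pi * (p - 0.5))

alpha = 0.05
c_upper = cauchy_percentile(1 - alpha / 2)
z_like_ratio = c_upper / 1.96   # how much larger than the normal 97.5th percentile
```

For $\alpha = 0.05$ the percentile is several times larger than the corresponding standard normal percentile, which suggests that intervals built on the Cauchy pivot will tend to be wide.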

3.2 Pivot method-Procedure 2

The pivot method of Faulkenberry (1973) cannot be applied to this example.

3.3 MLE-Procedure 3

It can be shown that the MLE for $\theta$ is

$$\hat{\theta} = \frac{-1 + \sqrt{1 + 4\,\overline{X^2}}}{2},$$

where $\overline{X^2} = \sum_{i=1}^{n} X_i^2 / n$. So replacing $\theta$ in (12)-(14) by $\hat{\theta}$ we obtain the following approximate $100(1-\alpha)\%$ prediction intervals for $\bar{Y}$ given $\bar{X}$, and large $n$:

and

3.4 CLT-Procedure 4

Since $\bar{X}$ and $\bar{Y}$ are exactly normally distributed, the CLT is of no use.

3.5 VST-Procedure 5

Since $\mu_X = \theta$ and $\sigma_X^2 = \theta$, applying Eq. (3) we obtain that

provided that both $m$ and $n$ are large. Hence an approximate $100(1-\alpha)\%$ prediction interval for $\bar{Y}$ is
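A sketch of a variance stabilizing interval for this case is given below. It assumes that the transformation is $2\sqrt{x}$ (the antiderivative of $1/\sqrt{\mu}$ when the variance equals the mean) and that $2(\sqrt{\bar{Y}} - \sqrt{\bar{X}})$ is treated as approximately normal with mean 0 and variance $1/m + 1/n$. These are assumptions of the example; the name `vst_interval_normal` is hypothetical and the limits need not match the interval numbered (18) exactly.

```python
import math
from statistics import NormalDist

def vst_interval_normal(xbar, m, n, alpha=0.05):
    """Approximate interval for the future mean on the square-root scale.

    Assumes 2*(sqrt(Ybar) - sqrt(Xbar)) is roughly N(0, 1/m + 1/n).
    """
    if xbar <= 0:
        raise ValueError("the square-root transformation needs a positive past mean")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = 0.5 * z * math.sqrt(1 / m + 1 / n)
    root = math.sqrt(xbar)
    lower_root = max(root - half, 0.0)   # keep the lower limit non-negative
    # Square the root-scale limits to return to the original scale.
    return lower_root ** 2, (root + half) ** 2

lower, upper = vst_interval_normal(xbar=4.0, m=30, n=50)
```

The guard on `xbar` reflects a practical limitation of this sketch: with a small $\theta$ the past mean can be negative, in which case the square-root transformation is undefined.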

4. VALIDATION BY SIMULATION

Three criteria will be used to compare the prediction intervals obtained in Sections 2 and 3. These criteria are:


Criterion 1

A prediction interval procedure may be termed valid if the actual coverage of the true but unknown future sample mean $\bar{Y}$, in repeated sampling, is close to the stated nominal confidence level.

Criterion 2

A prediction interval procedure may be termed limit-efficient if the limits of the simulated interval are close to the true limits.

Criterion 3

A prediction interval procedure may be termed length-efficient if the length of the simulated interval is close to the true length.

To apply the above criteria simulation was carried out as follows.

1) A random sample of size $n$, generated from the underlying distribution, was used as the past data to calculate $\bar{X}$ and $\hat{\theta}$.

2) A random sample of size $m$ from the same distribution as in (1) above was considered as the future sample, and the sample mean $\bar{Y}$ was calculated.

3) The lower limits $L_i$, the upper limits $U_i$, and the length of the interval $W_i$ were calculated for n = 10(10)50 and m = 5(5)30, and also for n = 60(10)80 and m = 35(5)80.

4) Steps (1)-(3) above were repeated 1,000 times. The means of the lower limits $\bar{L}_i$, the upper limits $\bar{U}_i$ and the length $\bar{W}_i$ for each interval were calculated together with the standard errors for the average of the lower and upper limits.

5) The $\bar{Y}$'s obtained in Step (2) were compared with $L_i$ and $U_i$ for each procedure $i$. Then the percentages $P_1$ of $\bar{Y} < L_i$, and $P_2$ of $L_i \le \bar{Y} \le U_i$ were calculated.

The results of these simulations, for the two cases considered in Sections 2 and 3, are given in Tables I and II respectively.
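The simulation steps can be sketched in a few lines for the exponential case. The code below is an illustration, not a reproduction of the original program: the harness `simulate` accepts any interval constructor, and the one supplied here, `log_scale_interval`, is a hypothetical stand-in based on a normal approximation on the log scale. The seed, sample sizes and parameter value are arbitrary.

```python
import math
import random
import statistics
from statistics import NormalDist

def log_scale_interval(xbar, m, n, alpha):
    """Illustrative interval: xbar * exp(+/- z * sqrt(1/m + 1/n))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    h = z * math.sqrt(1 / m + 1 / n)
    return xbar * math.exp(-h), xbar * math.exp(h)

def simulate(interval_fn, theta, m, n, reps=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    lowers, uppers, below, inside = [], [], 0, 0
    for _ in range(reps):
        # Step 1: past sample of size n and its mean.
        xbar = sum(rng.expovariate(theta) for _ in range(n)) / n
        # Step 2: future sample of size m and its mean.
        ybar = sum(rng.expovariate(theta) for _ in range(m)) / m
        # Step 3: limits computed from the past sample only.
        lo, hi = interval_fn(xbar, m, n, alpha)
        lowers.append(lo)
        uppers.append(hi)
        # Step 5: record where the future mean falls.
        below += ybar < lo
        inside += lo <= ybar <= hi
    # Step 4: averages over the replications and their standard errors.
    p1, p2 = below / reps, inside / reps
    mean_lower, mean_upper = statistics.fmean(lowers), statistics.fmean(uppers)
    return {
        "mean_lower": mean_lower,
        "mean_upper": mean_upper,
        "mean_width": mean_upper - mean_lower,
        "se_lower": statistics.stdev(lowers) / math.sqrt(reps),
        "se_upper": statistics.stdev(uppers) / math.sqrt(reps),
        "P1": p1, "P2": p2, "P3": 1 - p1 - p2,
    }

summary = simulate(log_scale_interval, theta=1.0, m=10, n=20)
```

Passing a different `interval_fn` with the same signature would let the same harness compare several procedures on the validity criterion (through `P2`) and on the limit and length criteria (through the averaged limits and width).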

5. PREDICTION INTERVAL FOR HARMONIC MEAN

Let $X = (X_1, X_2, \ldots, X_n)'$ and $Y = (Y_1, Y_2, \ldots, Y_m)'$ be two independent samples of identically distributed random variables with finite mean and finite variance and satisfying $P(X_i = 0) = P(Y_j = 0) = 0$ $\forall\, i = 1, 2, \ldots, n$ and $\forall\, j = 1, 2, \ldots, m$. Moreover, let $V_i = 1/X_i$ and $W_j = 1/Y_j$, $i = 1, \ldots, n$ and $j = 1, \ldots, m$. Thus $H_X = \bar{V}^{-1}$ and $H_Y = \bar{W}^{-1}$ are called the harmonic means of $X$ and $Y$ respectively. Assume that the $V_i$'s and $W_j$'s have finite mean $\mu$ and finite variance $\sigma^2$. Using the CLT for i.i.d. random variables with finite variance, it follows that

or equivalently, that

If $n\mu^2/\sigma^2$ and $m\mu^2/\sigma^2$ are free of the unknown parameters of the model, then an approximate $100(1-\alpha)\%$ prediction interval for the future sample harmonic mean, $H_Y$, is

6. COMMENTS

On checking the tables, one can draw the following conclusions for each underlying distribution:

Exponential distribution (Table I)

Interval (6), obtained by the MLE procedure, is the best according to the three criteria above. The other intervals, namely (5), (7) and (8), show no difference among them.

Normal distribution (Table II)

Intervals (14) and (17) in the text, using the Cauchy approximation, are not acceptable according to criteria 1 and 2 above.

Interval (15) is the best based on criteria 1, 2 and 3 above.


Intervals (13), (16) and (18), using the doubly non-central F-distribution and the VST method, rank the same. In those intervals, the lower limit is less than the true lower limit, and the upper limit is greater than the true upper limit. As a result, the coverage probability is higher than the nominal coverage probability.

Acknowledgements

Thanks are due to the users and administration departments of the Computer Science Centre of Kuwait University for giving us more time on the CPU. The simulation was carried out on the UNIVAC 1100 time-sharing multi-processor system of Kuwait University.

Also, thanks are due to Mrs. E. Al-Khammash for typing the manuscript and for her tolerance in making the corrections needed.

References

Aitchison, J. and Dunsmore, I. R. (1975). Statistical Prediction Analysis. Cambridge University Press, London.

Awad, A. M. (1985). Large sample prediction intervals: A variance stabilizing approach. Submitted for publication.

Billingsley, P. (1968). Convergence of Probability Measures. John Wiley and Sons, New York.

Faulkenberry, G. D. (1973). A method of obtaining prediction intervals. JASA 68, 433-435.

Hahn, G. J. and Nelson, W. (1973). A survey of prediction intervals and their applications. J. Quality Tech. 5, 178-188.

Lawless, J. F. (1972). On prediction intervals for samples from the exponential distribution and prediction limits for system survival. Sankhya A. 34, 1-12.

Wald, A. (1942). Setting of tolerance limits when the sample is large. Annals of Mathematical Statistics 13, 389-399.

Appendix

The following tables consist of several sections controlled by the past sample size $n$ and the future sample size $m$. The symbols TL, TU and TW represent the lower limit, the upper limit, and the width of the true prediction interval for $\bar{Y}$ given $X$. The columns represent the prediction intervals calculated by the cited methods:

I. The lower limit, L.

II. The upper limit, U.

III. The width, W.

IV. The standard deviation of the mean of the lower limit, SDL.

V. The standard deviation of the mean of the upper limit, SDU.

VI. $P_1 = \Pr[\bar{Y} < L]$.

VII. $P_2 = \Pr[L \le \bar{Y} \le U]$.

VIII. $P_3 = 1 - P_1 - P_2$.

The interval numbers in the last column of the tables coincide with their numbers in the text. For example, 5 along the first line gives the pivot interval.


TABLE I

Exponential case

I II III IV V VI VII VIII Interval


TABLE II

Normal case

I II III IV V VI VII VIII Interval


TABLE II (Normal case) (continued)

III IV V VI VII VIII Interval

Interval 16 III IV VII VIII


TABLE II (Interval 16) (continued)

I II III IV V VI VII VIII