
This article was downloaded by: [University of Leicester]
On: 13 June 2013, At: 12:33
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Statistical Computation and Simulation
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/gscs20

Prediction of order statistics and record values based on ordered ranked set sampling
Mahdi Salehi a, Jafar Ahmadi a & N. Balakrishnan b c
a Department of Statistics, Ordered and Spatial Data Center of Excellence, Ferdowsi University of Mashhad, PO Box 1159, Mashhad, 91775, Iran
b Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada, L8S 4K1
c Department of Statistics, King Abdulaziz University, Jeddah, Saudi Arabia
Published online: 29 May 2013.

To cite this article: Mahdi Salehi, Jafar Ahmadi & N. Balakrishnan (2013): Prediction of order statistics and record values based on ordered ranked set sampling, Journal of Statistical Computation and Simulation, DOI: 10.1080/00949655.2013.803194

To link to this article: http://dx.doi.org/10.1080/00949655.2013.803194


Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Journal of Statistical Computation and Simulation, 2013
http://dx.doi.org/10.1080/00949655.2013.803194

Prediction of order statistics and record values based on ordered ranked set sampling

Mahdi Salehia, Jafar Ahmadia* and N. Balakrishnanb,c

aDepartment of Statistics, Ordered and Spatial Data Center of Excellence, Ferdowsi University of Mashhad, PO Box 1159, Mashhad 91775, Iran; bDepartment of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada L8S 4K1; cDepartment of Statistics, King Abdulaziz University, Jeddah, Saudi Arabia

(Received 6 April 2012; final version received 4 May 2013)

In this paper, we consider two-sample prediction problems. First, based on the ordered ranked set sampling (ORSS) introduced by Balakrishnan and Li [Ordered ranked set samples and applications to inference. Ann Inst Statist Math. 2006;58:757-777], we obtain prediction intervals for order statistics from a future sample and compare the results with those based on the usual order statistics. Next, we construct prediction intervals for record values from a future sequence based on ORSS and compare the results with those based on another independent record sequence developed recently by Ahmadi and Balakrishnan [Prediction of order statistics and record values from two independent sequences. Statistics. 2010;44:417-430].

Keywords: coverage probability; distribution-free; prediction interval; record data; ranked set sampling; ordered ranked set sampling

Mathematics Subject Classification: 62G30; 62G15

1. Introduction

Let {X_i, i ≥ 1} be a sequence of independent and identically distributed (iid) random variables. An observation X_j is called an upper (or lower) record value if its value exceeds (is less than) all previous observations, i.e. X_j is an upper (or lower) record if X_j > X_i (or X_j < X_i) for every i < j. For convenience of notation, let the first upper and lower records be taken as L_1 = U_1 ≡ X_1, and let the nth upper and lower records be denoted by U_n and L_n, respectively (for n ≥ 1). These types of data arise in a wide variety of practical situations such as industrial stress testing, meteorology, hydrology, sports, and stock market analysis. Interested readers may refer to the book by Arnold et al. [1] and the references contained therein.
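The definition above is easy to operationalize: scan the sequence once and keep any observation that strictly exceeds (or falls below) the current maximum (or minimum). The following routine is our illustrative sketch, not part of the paper:

```python
def records(seq):
    """Return (upper, lower) record values of a sequence.

    The first observation is both the first upper and first lower record;
    thereafter an observation is an upper (lower) record if it is strictly
    greater (smaller) than every previous observation.
    """
    upper, lower = [], []
    for x in seq:
        if not upper or x > upper[-1]:
            upper.append(x)
        if not lower or x < lower[-1]:
            lower.append(x)
    return upper, lower

up, low = records([3, 1, 4, 1, 5, 9, 2, 6])
print(up)   # upper records: [3, 4, 5, 9]
print(low)  # lower records: [3, 1]
```

Note that only strict comparisons generate records here, matching the continuous-distribution setting of the paper, where ties occur with probability zero.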

Now, independently of the X-sequence, suppose Y_1, Y_2, ..., Y_n are iid random variables from the same distribution. The corresponding order statistics are the Y_i's arranged in increasing order of magnitude, denoted by Y_{1:n} ≤ Y_{2:n} ≤ ··· ≤ Y_{n:n}. These order statistics play an important role in many problems including the characterization of probability distributions and goodness-of-fit tests, analysis of censored data, and reliability analysis; see, for example, [2-5] for more details concerning the theory and applications of order statistics.

*Corresponding author. Email: [email protected]

© 2013 Taylor & Francis


Several authors have discussed prediction problems in both one- and two-sample cases, with the data involving record values and order statistics. In this context, the prediction of records based on records and of order statistics based on order statistics has been addressed. One may refer to, among others, Dunsmore [6], who derived the mean coverage and guaranteed coverage tolerance regions for the (m + r)th record value based on the first m record values in the classical framework and also under the Bayesian setup. Kaminsky and Nelson [7] considered the prediction of order statistics in one-sample as well as two-sample cases, and obtained linear point predictors and prediction intervals based on samples from location-scale families. Ahmadi and Doostparast [8] discussed Bayesian estimation and prediction for some life distributions based on record values. Raqab and Balakrishnan [9] derived distribution-free prediction intervals for records from the Y-sequence based on record values from the X-sequence of iid random variables from the same continuous distribution. Recently, Vock and Balakrishnan [10] discussed non-parametric prediction intervals for order statistics based on ranked set sampling (RSS) using conditional and unconditional approaches, by making use of an algorithm proposed by Frey [11].

In this paper, we consider the prediction of order statistics and record values based on ordered ranked set sampling (ORSS), and in both cases, we obtain explicit expressions for prediction coefficients. Throughout the paper, we use the following simplifying notation:

\psi(f, t; m, n, k) = \sum_{S_t(m)} \sum_{b_1=j_1}^{m} \cdots \sum_{b_t=j_t}^{m} \sum_{b_{t+1}=0}^{j_{t+1}-1} \cdots \sum_{b_m=0}^{j_m-1} \left[ \prod_{s=1}^{m} \binom{m}{b_s} \right] f(B; m, n, k),  (1)

where f is an arbitrary real function, B := \sum_{i=1}^{m} b_i, t and m (t ≤ m) are positive integers, and \sum_{S_t(m)} denotes the summation over all permutations (j_1, ..., j_m) of {1, ..., m} for which j_1 < ··· < j_t and j_{t+1} < ··· < j_m.

The rest of this paper proceeds as follows. In Section 2, after briefly describing the concept of ORSS, we develop the exact distribution-free prediction interval for an order statistic from a future sample and also discuss the determination of an optimal interval. Next, in Sections 3 and 4, we develop the distribution-free prediction of the sample mean and of record values from a future sequence.

2. Prediction of order statistics

McIntyre [12] proposed a method for unbiased selective sampling using ranked sets, which has been termed RSS in the literature. This sample, in turn, yields more efficient estimators of many population parameters of interest than a simple random sample (SRS) of the same size does. Throughout this paper, we assume that {X_i, i ≥ 1} is a sequence of iid continuous random variables with cumulative distribution function (cdf) F(x) and probability density function (pdf) f(x), and Y_{1:n} ≤ Y_{2:n} ≤ ··· ≤ Y_{n:n} denote the order statistics from a future random sample of size n from the same distribution F(x). The observation process of a one-cycle RSS of size m can be described as follows:


1 : X_{(1:m)1}  X_{(2:m)1}  ···  X_{(m:m)1}  →  X_{1,1} = X_{(1:m)1}
2 : X_{(1:m)2}  X_{(2:m)2}  ···  X_{(m:m)2}  →  X_{2,2} = X_{(2:m)2}
⋮
m : X_{(1:m)m}  X_{(2:m)m}  ···  X_{(m:m)m}  →  X_{m,m} = X_{(m:m)m},

where X_{(i:m)j} denotes the ith order statistic from the jth simple random sample of size m. The vector of observations X^{RSS} = (X_{1,1}, ..., X_{m,m}) is a one-cycle RSS of size m; note that the X_{i,i}'s are not


necessarily ordered. Balakrishnan and Li [13] considered the arrangement of the X_{i,i}'s in increasing order of magnitude and called such an ordered sample an ORSS. They obtained confidence intervals for population quantiles and tolerance intervals based on ORSS. Also, Balakrishnan and Li [14] derived the best linear unbiased estimators (BLUEs) of the location and scale parameters of the population distribution on the basis of ORSS, and showed that they are more efficient than the BLUEs based on RSS for the two-parameter exponential, normal, and logistic distributions. Suppose X^{ORSS}_{1:m} ≤ X^{ORSS}_{2:m} ≤ ··· ≤ X^{ORSS}_{m:m} are the ORSS obtained from (X_{1,1}, ..., X_{m,m}). Then, the cdf of X^{ORSS}_{i:m}, denoted by F^{ORSS}_{i:m}(·), is as follows [13]:

F^{ORSS}_{i:m}(x) = \sum_{t=i}^{m} \sum_{S_t(m)} \left\{ \prod_{l=1}^{t} F_{j_l:m}(x) \prod_{l=t+1}^{m} \bar{F}_{j_l:m}(x) \right\},  (2)

where F_{j_l:m}(·) is as given in Equation (4) and \bar{F}_{j_l:m} = 1 - F_{j_l:m}. Recall that the pdf and cdf of Y_{k:n} are (1 ≤ k ≤ n)

f_{k:n}(y) = k \binom{n}{k} F^{k-1}(y) f(y) \bar{F}^{n-k}(y)  (3)

and

F_{k:n}(y) = \sum_{t=k}^{n} \binom{n}{t} F^{t}(y) \bar{F}^{n-t}(y),  (4)

respectively. Kaminsky and Nelson [7] considered two-sample prediction and obtained two-sided prediction intervals for an order statistic from a future sample based on observed order statistics from a current sample. More specifically, let X_{1:m} ≤ X_{2:m} ≤ ··· ≤ X_{m:m} denote the order statistics from a sample of size m. Then, (X_{r:m}, X_{s:m}) is a two-sided distribution-free prediction interval for Y_{k:n}, the kth order statistic from a future sample of size n, with the following coverage probability [7]:

\alpha_1(r, s; m, k, n) = \Pr\{X_{r:m} ≤ Y_{k:n} ≤ X_{s:m}\} = k \binom{n}{k} \sum_{t=r}^{s-1} \binom{m}{t} \beta(t + k, m + n + 1 - t - k),  (5)

where \beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b), for a, b > 0, is the complete beta function.

Here, along the lines of Kaminsky and Nelson [7], we are interested in the construction of two-sided prediction intervals of the form (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}) containing a future order statistic Y_{k:n}. We expect that the middle order statistics will be predicted well through ORSS. In the following theorem, an explicit expression is derived for the prediction coefficient of the prediction interval for future order statistics based on ORSS.
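Equations (4) and (5) involve only binomial coefficients and complete beta functions, so they can be evaluated exactly. The sketch below (our illustration, not the authors' code) implements both and checks two classical facts: F_{k:n}(y) is a binomial upper tail in F(y), and a single future draw (k = n = 1) falls inside the full sample range (X_{1:m}, X_{m:m}) with probability (m - 1)/(m + 1), since it is equally likely to land in any of the m + 1 gaps left by the ordered sample:

```python
from math import comb, exp, lgamma

def beta(a, b):
    """Complete beta function, computed via log-gamma for stability."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def cdf_order_stat(y, k, n, F):
    """Equation (4): cdf of the kth order statistic Y_{k:n}."""
    p = F(y)
    return sum(comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(k, n + 1))

def alpha1(r, s, m, k, n):
    """Equation (5): coverage probability of (X_{r:m}, X_{s:m}) for Y_{k:n}."""
    return k * comb(n, k) * sum(
        comb(m, t) * beta(t + k, m + n + 1 - t - k) for t in range(r, s)
    )

# Sanity checks under a U(0, 1) parent.
print(cdf_order_stat(0.5, 3, 10, lambda y: y))   # 968/1024 = 0.9453125
for m in (4, 10):
    print(round(alpha1(1, m, m, 1, 1), 6), (m - 1) / (m + 1))
```

The second loop prints matching pairs, confirming the (m - 1)/(m + 1) identity for the k = n = 1 case.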

Theorem 2.1 Let X^{ORSS}_{1:m} ≤ X^{ORSS}_{2:m} ≤ ··· ≤ X^{ORSS}_{m:m} be an ORSS of size m from a continuous population with cdf F(x) and pdf f(x). Also, let Y_{k:n} be the kth order statistic from a future random sample of size n from the same cdf F(x). Then, (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}), s > r ≥ 1, is a distribution-free two-sided prediction interval for Y_{k:n} with the coverage probability (1 ≤ k ≤ n)

\pi_1(r, s; m, n, k) = k \binom{n}{k} \sum_{t=r}^{s-1} \psi(f_1, t; m, n, k),  (6)

where \psi is as defined earlier in Equation (1) and

f_1 := \beta(B + k, mn + n - B - k + 1).  (7)


Proof First, upon substituting Equation (4) into Equation (2), we can express F^{ORSS}_{r:m}(x) (1 ≤ r ≤ m) as

F^{ORSS}_{r:m}(x) = \sum_{t=r}^{m} \psi(f_2, t; m),  (8)

where

f_2 := [F(x)]^{B} [\bar{F}(x)]^{mn-B}.  (9)

Now, for a fixed x, we readily find that (r < s)

\Pr\{X^{ORSS}_{r:m} < x < X^{ORSS}_{s:m}\} = F^{ORSS}_{r:m}(x) - F^{ORSS}_{s:m}(x).  (10)

Next, using Equation (8), we write

\Pr\{X^{ORSS}_{r:m} < x < X^{ORSS}_{s:m}\} = \sum_{t=r}^{s-1} \psi(f_2, t; m).  (11)

Then, using a conditional argument, we have

\pi_1(r, s; m, n, k) = \Pr\{X^{ORSS}_{r:m} < Y_{k:n} < X^{ORSS}_{s:m}\} = \int_{-\infty}^{+\infty} \Pr\{X^{ORSS}_{r:m} < Y_{k:n} < X^{ORSS}_{s:m} \mid Y_{k:n} = x\} f_{k:n}(x) \, dx.

Thus, from Equations (9) and (11) and the pdf of Y_{k:n} in Equation (3), we obtain

\pi_1(r, s; m, n, k) = k \binom{n}{k} \sum_{t=r}^{s-1} \psi(g, t; m, n, k),

where g := \int_0^1 u^{B+k-1} (1 - u)^{mn+n-B-k} \, du, and then the desired result follows. ∎
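Since the interval of Theorem 2.1 is distribution-free, its coverage probability can be checked by simulation under any convenient parent, e.g. U(0, 1): generate a one-cycle RSS, order it into an ORSS, draw a future sample, and count how often Y_{k:n} falls inside (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}). The sketch below is our own illustration; the exact values appear in Table 1:

```python
import random

def orss(m, rng):
    """One-cycle ORSS of size m: from the ith set of m draws keep the
    ith order statistic, then sort the m retained values."""
    return sorted(sorted(rng.random() for _ in range(m))[i] for i in range(m))

def coverage(r, s, m, n, k, reps=5000, seed=1):
    """Monte Carlo estimate of Pr{X^ORSS_{r:m} <= Y_{k:n} <= X^ORSS_{s:m}}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = orss(m, rng)
        y = sorted(rng.random() for _ in range(n))[k - 1]
        hits += x[r - 1] <= y <= x[s - 1]
    return hits / reps

# Table 1 reports pi1(1, 6; 6, 20, 10) = 0.990 for this configuration.
print(coverage(1, 6, 6, 20, 10))  # Monte Carlo estimate, close to 0.99
```

Note that each of the m calls to the inner comprehension draws a fresh set of m observations, matching the independent-sets structure of RSS.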

For symmetric distributions, we have the following result.

Remark 2.2 Suppose the cdf of X is symmetric, say about zero, without loss of generality. Then, (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}), r < s, is a two-sided prediction interval for Y_{k:n} with coverage probability γ if and only if (X^{ORSS}_{m-s+1:m}, X^{ORSS}_{m-r+1:m}) is a two-sided prediction interval for Y_{n-k+1:n} with the same coverage probability γ. In other words,

\pi_1(r, s; m, n, k) ≥ γ ⇔ \pi_1(m - s + 1, m - r + 1; m, n, n - k + 1) ≥ γ.  (12)

Corollary 2.3 From Theorem 2.1, we readily conclude that (X^{ORSS}_{r:m}, ∞) and (−∞, X^{ORSS}_{s:m}) are distribution-free one-sided prediction intervals for Y_{k:n} with coverage probabilities \pi_1(r, m + 1; m, n, k) and 1 − \pi_1(s, m + 1; m, n, k), respectively.

In Table 1, we have presented the values of \pi_1(r, s; m, n, k) for n = 20, 50, m = 4, 6, 8, 10, 12, and some selected values of r, s, and k.

From Table 1, we observe the following points:

• The middle order statistics Y_{k:n} are predicted better, based on ORSS, than the extreme order statistics, and so it would be logical to determine an optimal prediction interval of the form (X^{ORSS}_{r:m}, X^{ORSS}_{m-r+1:m}) for middle order statistics;


Table 1. The values of \pi_1(r, s; m, n, k) for n = 20, 50, m = 4, 6, 8, 10, 12, and some selected values of r, s, and k.

               m = 4   m = 6                 m = 8                 m = 10                m = 12
 n   k   r     s = 4   s = 4  s = 5  s = 6   s = 6  s = 7  s = 8   s = 8  s = 9  s = 10  s = 10 s = 11 s = 12
 20  1   1     0.180   0.256  0.256  0.256   0.322  0.322  0.322   0.379  0.379  0.379   0.427  0.427  0.427
         2     0.009   0.027  0.027  0.027   0.052  0.052  0.052   0.082  0.082  0.082   0.114  0.114  0.114
         3     0.000   0.001  0.001  0.001   0.006  0.006  0.006   0.013  0.013  0.013   0.024  0.024  0.024
     5   1     0.714   0.837  0.851  0.852   0.919  0.920  0.921   0.955  0.955  0.955   0.973  0.973  0.973
         2     0.214   0.426  0.441  0.441   0.624  0.625  0.625   0.752  0.752  0.752   0.834  0.834  0.834
         3     0.020   0.102  0.117  0.118   0.272  0.273  0.273   0.435  0.435  0.435   0.573  0.573  0.573
     10  1     0.941   0.730  0.943  0.990   0.919  0.988  0.998   0.976  0.997  1.000   0.992  0.999  0.999
         2     0.666   0.650  0.862  0.909   0.896  0.965  0.976   0.969  0.990  0.993   0.990  0.997  0.997
         3     0.210   0.372  0.585  0.632   0.782  0.851  0.861   0.926  0.947  0.950   0.973  0.980  0.981
     15  1     0.795   0.196  0.570  0.912   0.405  0.751  0.960   0.589  0.856  0.980   0.723  0.915  0.989
         2     0.755   0.193  0.568  0.909   0.405  0.750  0.960   0.589  0.856  0.980   0.723  0.915  0.989
         3     0.491   0.163  0.537  0.879   0.402  0.747  0.957   0.588  0.856  0.980   0.723  0.915  0.989
     18  1     0.490   0.026  0.189  0.639   0.078  0.313  0.740   0.150  0.427  0.808   0.232  0.525  0.855
         2     0.486   0.026  0.189  0.639   0.078  0.313  0.740   0.150  0.427  0.808   0.232  0.525  0.855
         3     0.413   0.025  0.187  0.637   0.078  0.313  0.740   0.150  0.427  0.808   0.232  0.525  0.855
     20  1     0.180   0.001  0.027  0.256   0.006  0.052  0.322   0.013  0.082  0.379   0.024  0.114  0.427
         2     0.180   0.001  0.027  0.256   0.006  0.052  0.322   0.013  0.082  0.379   0.024  0.114  0.427
         3     0.171   0.001  0.027  0.256   0.006  0.052  0.322   0.013  0.082  0.379   0.024  0.114  0.427
 50  5   1     0.366   0.508  0.508  0.508   0.621  0.621  0.621   0.708  0.708  0.708   0.774  0.774  0.774
         2     0.026   0.077  0.077  0.077   0.149  0.149  0.149   0.234  0.234  0.234   0.323  0.323  0.323
         3     0.000   0.003  0.003  0.003   0.014  0.014  0.014   0.036  0.036  0.036   0.070  0.070  0.070
     20  1     0.929   0.894  0.981  0.988   0.989  0.998  0.998   0.999  1.000  1.000   1.000  1.000  1.000
         2     0.527   0.743  0.830  0.837   0.946  0.954  0.954   0.988  0.988  0.988   0.997  0.997  0.997
         3     0.098   0.334  0.421  0.427   0.727  0.736  0.736   0.900  0.901  0.901   0.967  0.967  0.967
     25  1     0.962   0.735  0.961  0.997   0.942  0.995  1.000   0.990  0.999  1.000   0.998  1.000  1.000
         2     0.714   0.687  0.914  0.950   0.935  0.989  0.993   0.989  0.999  0.999   0.998  1.000  1.000
         3     0.218   0.427  0.654  0.689   0.863  0.917  0.921   0.974  0.984  0.984   0.996  0.997  0.997
     30  1     0.940   0.482  0.868  0.991   0.785  0.967  0.999   0.928  0.993  1.000   0.978  0.998  1.000
         2     0.822   0.472  0.858  0.982   0.784  0.967  0.998   0.928  0.993  1.000   0.978  0.998  1.000
         3     0.372   0.363  0.749  0.872   0.771  0.954  0.985   0.927  0.991  0.999   0.978  0.998  1.000
     40  1     0.692   0.064  0.378  0.849   0.186  0.584  0.928   0.349  0.739  0.965   0.514  0.842  0.983
         2     0.684   0.064  0.378  0.849   0.186  0.584  0.928   0.349  0.739  0.965   0.514  0.842  0.983
         3     0.531   0.060  0.374  0.846   0.186  0.584  0.928   0.349  0.739  0.965   0.514  0.842  0.983
     50  1     0.078   0.000  0.003  0.115   0.000  0.007  0.150   0.000  0.012  0.184   0.001  0.018  0.216
         2     0.078   0.000  0.003  0.115   0.000  0.007  0.150   0.000  0.012  0.184   0.001  0.018  0.216
         3     0.077   0.000  0.003  0.115   0.000  0.007  0.150   0.000  0.012  0.184   0.001  0.018  0.216

• For fixed values of m, n, and k, the prediction coefficient \pi_1(r, s; m, n, k) is a decreasing function of r and an increasing function of s;

• The prediction coefficient \pi_1(r, s; m, n, k) is an increasing function of m when all other arguments are fixed;

• The results in Table 1 also confirm the symmetry property mentioned in Remark 2.2. For example, we have \pi_1(1, 11; 12, 20, 1) = \pi_1(2, 12; 12, 20, 20) = 0.427;

• We observe empirically that, for fixed n, the coverage probability \pi_1(r, s; m, n, k) first increases and then decreases as k increases, attaining its maximum for k ≈ [(n + 1)/2], where [a] stands for the integer part of a.


2.1. Optimal prediction interval

It is important to mention that, for specified values of π_0, k, and n, the two-sided prediction interval (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}) exists if and only if, for a suitable m, the inequality max_{1≤r<s≤m} \pi_1(r, s; m, n, k) ≥ π_0 holds. From Equation (6), this condition is equivalent to

\pi_1(1, m; m, n, k) = k \binom{n}{k} \sum_{t=1}^{m-1} \psi(f_1, t; m, n, k) ≥ π_0,  (13)

where f_1 is as given in Equation (7).

For example, if n = 30, k = 15, and the given value of π_0 is 0.95, then the minimum m which satisfies the inequality in Equation (13), say m^{ORSS}_{opt}, is equal to 4. For more such cases, one may see the results in Table 3.

For choosing an optimal prediction interval, if n, k, and the desired prediction level π_0 are all specified, we should start with m = 2 and increase it until \pi_1(r, s; m, n, k) exceeds π_0. Suppose, in addition, m is known too. Then, since there can be different choices of r and s for which \pi_1(r, s; m, n, k) ≥ π_0, we follow this algorithm to find an optimal prediction interval, say (r_opt, s_opt):

(i) For given values of m, n, k, and π_0, take C = {(r, s) : \pi_1(r, s; m, n, k) ≥ π_0} and select from C the pairs (r, s) that minimize the difference s − r;

(ii) If several choices of (r, s) have the same difference, then choose r and s such that

(r_opt, s_opt) = argmin{E(X^{ORSS}_{s:m} − X^{ORSS}_{r:m}), r < s}.  (14)

As mentioned earlier, the prediction interval (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}) is distribution-free, and so we can compute E(X^{ORSS}_{s:m} − X^{ORSS}_{r:m}) for the case of the uniform U(0, 1) distribution. Table 2 presents 90% optimal prediction intervals based on the above algorithm for n = 20 and some selected choices of k and m.

As mentioned earlier, for some given prediction level π_0, k, and m, a prediction interval may not exist. For example, when k = 3 and m = 5, there is no prediction interval with level π_0 = 0.90. These cases are indicated by a dash (–) in Table 2.

2.2. Comparison

For the optimal interval, it is clear that we need m^{ORSS}_{opt}, i.e. the minimum m for constructing the prediction interval with the shortest length for a specified level π_0. Table 3 presents the values of m^{ORSS}_{opt} for π_0 = 0.90, 0.95, n = 10, 20, 30, 50, and some choices of k.

Table 2. 90% prediction intervals for the future order statistic Y_{k:20} based on ORSS of size m.

           k
 m    3    5       8       10                12      15       20
 3    –    –       –       –                 –       –        –
 4    –    –       [1, 4]  [1, 4]            [1, 4]  –        –
 5    –    –       [1, 4]  [1, 5]            [2, 5]  –        –
 6    –    –       [1, 5]  [2, 6], [1, 5]^a  [2, 6]  [2, 6]   –
 7    –    –       [1, 5]  [2, 6]            [3, 7]  [3, 7]   –
 8    –    [1, 5]  [1, 6]  [2, 6]            [3, 7]  [4, 8]   –
 9    –    [1, 5]  [1, 6]  [2, 7], [3, 8]^a  [4, 9]  [5, 9]   –
 10   –    [1, 5]  [2, 7]  [3, 8]            [4, 9]  [5, 10]  –

Note: ^a The same expected widths.


Table 3. The values of m^{ORSS}_{opt} and m^{OS}_{opt} for π_0 = 90%, 95% and some selected values of k and n.

               π_0 = 90%                    π_0 = 95%
 n    k    m^{ORSS}_{opt}  m^{OS}_{opt}   m^{ORSS}_{opt}  m^{OS}_{opt}
 10   3         8              11              11             16
 10   4         5               7               7             10
 10   5         4               6               5              8
 10   6         4               6               5              8
 10   7         5               7               7             10
 20   5         8              11              10             15
 20   8         5               6               6              8
 20   10        4               5               5              7
 20   12        4               6               5              7
 20   15        6               9               8             12
 30   10        5               7               6              9
 30   13        4               6               5              7
 30   15        4               5               4              6
 30   17        4               5               5              7
 30   20        5               6               6              8
 50   15        5               8               7             10
 50   20        4               6               5              7
 50   25        4               5               4              6
 50   30        4               5               5              7
 50   35        5               7               6              9

Table 4. Values of \pi_1 = \pi_1(1, m; m, n, k) and \alpha_1 = \alpha_1(1, m; m, n, k) for some selected values of m, k, and n.

              m = 4            m = 5            m = 6            m = 7            m = 8
 n    k     π_1     α_1      π_1     α_1      π_1     α_1      π_1     α_1      π_1     α_1
 20   5     0.7147  0.6286   0.7960  0.7058   0.8527  0.7634   0.8925  0.8076   0.9205  0.8421
      10    0.9410  0.8385   0.9770  0.9058   0.9906  0.9435   0.9959  0.9652   0.9982  0.9781
      15    0.7950  0.7002   0.8668  0.7764   0.9121  0.8296   0.9411  0.8682   0.9598  0.8967
 50   20    0.9289  0.8254   0.9707  0.8945   0.9879  0.9346   0.9950  0.9586   0.9980  0.9733
      25    0.9618  0.8602   0.9893  0.9249   0.9971  0.9590   0.9992  0.9773   0.9998  0.9873
      30    0.9401  0.8370   0.9773  0.9047   0.9914  0.9429   0.9967  0.9651   0.9987  0.9782

From Table 3, it is easy to see that m^{ORSS}_{opt} is smallest when constructing a prediction interval for the median of the future sample, which supports our intuition. We are now interested in comparing our results with those of Kaminsky and Nelson [7]. To this end, let us denote by m^{OS}_{opt} the minimum m that satisfies the inequality \alpha_1(1, m; m, n, k) ≥ π_0.

In Table 3, we have presented the values of m^{ORSS}_{opt} and m^{OS}_{opt} for π_0 = 0.90, 0.95 and some selected values of k and n. From these values, we observe that m^{ORSS}_{opt} is smaller than m^{OS}_{opt}, thus revealing the superiority of prediction intervals based on ORSS over the prediction intervals based on the usual order statistics.

Also, in Table 4, we have presented the values of \pi_1(1, m; m, n, k) and \alpha_1(1, m; m, n, k) for some selected choices of m, k, and n in order to show once again the superiority of ORSS in the prediction of future order statistics as compared with the usual order statistics.

3. Prediction of sample mean

Suppose we have observed X^{ORSS}_{1:m}, X^{ORSS}_{2:m}, ..., X^{ORSS}_{m:m} as an ORSS of size m, and we are interested in predicting the sample mean of a future sample of size n. With this in mind, let Y_1, ..., Y_n be a future random sample of size n from the same distribution F(x). Then,

[Y_{k:n} ∈ (X_{r_k:m}, X_{s_k:m}), k = 1, ..., n] ⇒ [\bar{Y} ∈ (L, U)],  (15)


where L = (1/n) \sum_{k=1}^{n} X_{r_k:m} and U = (1/n) \sum_{k=1}^{n} X_{s_k:m}. Equation (15) implies that

\Pr(\bar{Y} ∈ (L, U)) ≥ \Pr\left\{ \bigcap_{k=1}^{n} (X_{r_k:m} ≤ Y_{k:n} ≤ X_{s_k:m}) \right\} ≥ 1 − \sum_{k=1}^{n} [1 − \Pr(X_{r_k:m} ≤ Y_{k:n} ≤ X_{s_k:m})] = 1 − n + \sum_{k=1}^{n} \pi_1(r_k, s_k; m, n, k),  (16)

where \pi_1(r, s; m, n, k) is as given in Equation (6).
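The Bonferroni-type lower bound in Equation (16) is immediate to evaluate once the individual coverages \pi_1(r_k, s_k; m, n, k) are available. A small sketch of ours, with made-up coverage numbers purely for illustration:

```python
def mean_coverage_bound(coverages):
    """Equation (16): lower bound 1 - n + sum of the per-order-statistic
    coverages, clamped at 0 since a negative probability bound is vacuous."""
    n = len(coverages)
    return max(0.0, 1 - n + sum(coverages))

# Hypothetical per-k coverages for n = 3 future order statistics.
print(mean_coverage_bound([0.99, 0.98, 0.97]))  # approximately 0.94
print(mean_coverage_bound([0.50, 0.50, 0.50]))  # 0.0 (bound is vacuous)
```

As the second call shows, the bound is only informative when each individual coverage is close to one, which is the typical use case for simultaneous prediction.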

4. Prediction of record values

Ahmadi and Balakrishnan [15] derived distribution-free prediction intervals for future records on the basis of observed order statistics. Here, we shall derive a prediction interval for the kth future record value R_k (upper or lower), based on an observed ORSS of size m, i.e. we wish to find an interval of the form (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}), r < s, such that, for a given level π_0, we have

\Pr\{X^{ORSS}_{r:m} < R_k < X^{ORSS}_{s:m}\} ≥ π_0.  (17)

In this section, distribution-free two-sided prediction intervals of the form (17) are derived for upper and lower records from a future sequence based on the observed ORSS.

4.1. Results for upper records

Let {X_i, i ≥ 1} be a sequence of iid random variables from a population with cdf F(y) and pdf f(y), and let U_k be the corresponding kth upper record. Then, the pdf of U_k is [1]

f_{U_k}(u) = \frac{\{-\log \bar{F}(u)\}^{k-1}}{(k-1)!} f(u).  (18)

Due to the very nature of upper records, it is logical to seek optimal intervals of the form (X^{ORSS}_{r:m}, X^{ORSS}_{m:m}).

Theorem 4.1 Suppose the assumptions of Theorem 2.1 hold. Also, let {X_n, n ≥ 1} be a future sequence of iid random variables from the same cdf F(x) and pdf f(x), and let U_k be the associated kth upper record. Then, (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}), s > r ≥ 1, is a distribution-free two-sided prediction interval for U_k with its prediction coefficient (k ≥ 1) as

\pi^{(U)}_2(r, s; m, k) = \sum_{t=r}^{s-1} \psi(f_3, t; m, k),  (19)

where

f_3 := \sum_{d=0}^{B} \frac{(-1)^d \binom{B}{d}}{(mn + d + 1 - B)^k}.  (20)


Proof Using a conditional argument in Equation (11), we obtain

\pi^{(U)}_2(r, s; m, k) = \Pr\{X^{ORSS}_{r:m} ≤ U_k ≤ X^{ORSS}_{s:m}\}
= \int_{-\infty}^{+\infty} \Pr\{X^{ORSS}_{r:m} < u < X^{ORSS}_{s:m}\} f_{U_k}(u) \, du
= \int_{-\infty}^{+\infty} \sum_{t=r}^{s-1} \psi(f_2 f_{U_k}(u), t; m, k) \, du
= \sum_{t=r}^{s-1} \psi(h, t; m, k),  (21)

where f_2 is as given in Equation (9) and h := \sum_{d=0}^{B} (-1)^d \binom{B}{d} \int_{-\infty}^{+\infty} [\bar{F}(u)]^{mn-B+d} f_{U_k}(u) \, du. Upon substituting f_{U_k}(u) from Equation (18) into the function h and then using the substitution v = -\log \bar{F}(u), the proof follows readily. ∎
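As with Theorem 2.1, the record coverage probability can be checked by simulation under any convenient parent. For a standard exponential parent the kth upper record is distributed as a sum of k independent Exp(1) variables, which makes the record easy to generate; the sketch below is our own illustration, and its output can be compared with Table 5:

```python
import random

def coverage_record(r, s, m, k, reps=5000, seed=3):
    """Monte Carlo estimate of Pr{X^ORSS_{r:m} <= U_k <= X^ORSS_{s:m}}
    for a standard exponential parent distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One-cycle ORSS of size m from Exp(1): the ith set contributes
        # its ith order statistic, and the retained values are sorted.
        x = sorted(sorted(rng.expovariate(1.0) for _ in range(m))[i]
                   for i in range(m))
        # The kth upper record of an Exp(1) sequence ~ Gamma(k, 1).
        u = sum(rng.expovariate(1.0) for _ in range(k))
        hits += x[r - 1] <= u <= x[s - 1]
    return hits / reps

# Table 5 gives pi2^(U)(1, 14; 14, 1) = 0.903.
print(coverage_record(1, 14, 14, 1))  # Monte Carlo estimate, close to 0.90
```

Since the interval is distribution-free, the same estimate would be obtained (up to simulation error) under any other continuous parent.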

Corollary 4.2 Suppose the conditions of Theorem 4.1 hold. Then, X^{ORSS}_{r:m} and X^{ORSS}_{s:m} are, respectively, lower and upper prediction bounds for U_k, k ≥ 1, with prediction coefficients \pi^{(U)}_2(r, m + 1; m, k) and 1 − \pi^{(U)}_2(s, m + 1; m, k), respectively.

As in the preceding subsection, for a given k and level π_0, the necessary and sufficient condition for the existence of the prediction interval (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}) in Theorem 4.1 is

max_{1≤r<s≤m} \pi^{(U)}_2(r, s; m, k) = \pi^{(U)}_2(1, m; m, k) ≥ π_0.  (22)

In Table 5, we have presented the values of \pi^{(U)}_2(r, m; m, k) in Equation (19) for m = 14, k = 1, ..., 5, and some selected values of r.

From Table 5, it can be observed that \pi^{(U)}_2(r, s; m, k) in Equation (19) is a non-increasing function with respect to r. For fixed values of m and r, if we set k_max = argmax{\pi^{(U)}_2(r, m; m, k), k ≥ 1}, then from Table 5 we can see that k_max is a non-decreasing function of r, which is expected due to the very nature of upper record values.

4.2. Results for lower records

Let {X_i, i ≥ 1} be a sequence of iid random variables from a population with cdf F(y) and pdf f(y), and let L_k be the associated kth lower record. Then, the pdf of L_k is [1]

f_{L_k}(u) = \frac{[-\log F(u)]^{k-1}}{(k-1)!} f(u).  (23)

Table 5. The values of \pi^{(U)}_2(r, m; m, k) in Equation (19), for m = 14, k = 1, ..., 5, and some choices of r.

          k
 r     1      2      3      4      5
 1     0.903  0.818  0.630  0.435  0.275
 2     0.837  0.812  0.629  0.435  0.275
 3     0.768  0.800  0.628  0.435  0.275
 5     0.628  0.757  0.621  0.434  0.274
 10    0.275  0.502  0.516  0.402  0.267


Then, the following theorem can be established analogously to Theorem 4.1.

Theorem 4.3 In addition to the assumptions of Theorem 4.1, suppose L_k is the kth lower record from the X-sequence. Then, (X^{ORSS}_{r:m}, X^{ORSS}_{s:m}), s > r ≥ 1, is a distribution-free two-sided prediction interval for L_k with its prediction coefficient (k ≥ 1) as

\pi^{(L)}_2(r, s; m, k) = \sum_{t=r}^{s-1} \psi(f_4, t; m, k),  (24)

where

f_4 := \sum_{d=0}^{mn-B} \frac{(-1)^d \binom{mn-B}{d}}{(B + d + 1)^k}.  (25)

Corollary 4.4 Under the assumptions of Theorem 4.3, we have:

(i) the r.v. X^{ORSS}_{r:m} is a lower prediction bound for L_k, k ≥ 1, with prediction coefficient \pi^{(L)}_2(r, m + 1; m, k);

(ii) the r.v. X^{ORSS}_{s:m} is an upper prediction bound for L_k, k ≥ 1, with prediction coefficient 1 − \pi^{(L)}_2(s, m + 1; m, k).

Proposition 4.5 For symmetric distributions, a useful relation between the coverage probabilities \pi^{(U)}_2(r, s; m, k) and \pi^{(L)}_2(r, s; m, k) given in Equations (19) and (24), respectively, is as follows:

\pi^{(L)}_2(r, s; m, k) = \pi^{(U)}_2(m - s + 1, m - r + 1; m, k).  (26)

Proof Let us assume that F is symmetric, say about zero. Then, U_k \stackrel{d}{=} −L_k, k ≥ 1, and so

\pi^{(L)}_2(r, s; m, k) = \Pr\{X^{ORSS}_{r:m} ≤ L_k ≤ X^{ORSS}_{s:m}\}
= \Pr\{X^{ORSS}_{m-s+1:m} ≤ U_k ≤ X^{ORSS}_{m-r+1:m}\}
= \pi^{(U)}_2(m - s + 1, m - r + 1; m, k),

as required. ∎

One can use the above proposition to obtain \pi^{(L)}_2(r, s; m, k) for some choices of r, s, m, and k from the results in Table 5. It is logical to seek intervals of the form (X^{ORSS}_{1:m}, X^{ORSS}_{s:m}) for the prediction of lower record values.

4.3. Comparison

Recently, Ahmadi and Balakrishnan [15] obtained prediction intervals for the kth future upper record based on the usual order statistics arising from a SRS. Specifically, suppose X_{1:m} ≤ X_{2:m} ≤ ··· ≤ X_{m:m} are the observed order statistics from a SRS of size m from a population with cdf F(x), and U_k is the kth future upper record coming from a sequence of variables from the same population with the same cdf F(x). Then, (X_{r:m}, X_{s:m}) is a two-sided distribution-free prediction interval for U_k with its prediction coefficient as [15]

\alpha^{(U)}_2(r, s; m, k) = \sum_{t=r}^{s-1} \sum_{i=0}^{t} \binom{m}{t} \binom{t}{i} \frac{(-1)^i}{(m + i - t + 1)^k}.  (27)

In this subsection, we are interested in comparing the prediction interval presented in Theorem 4.1


Table 6. The values of (m^{ORSS}_{min}, m^{OS}_{min}) for some selected values of π_0 and k.

          π_0
 k     0.7      0.8       0.9
 1     (5, 6)   (7, 9)    (14, 19)
 2     (7, 10)  (13, 17)  (23, 40)

Table 7. The values of \pi^{(U)}_2 = \pi^{(U)}_2(1, m; m, k) and \alpha^{(U)}_2 = \alpha^{(U)}_2(1, m; m, k) for some selected values of m and k.

            k = 1                    k = 2                    k = 3                    k = 4
 m     π^{(U)}_2  α^{(U)}_2     π^{(U)}_2  α^{(U)}_2     π^{(U)}_2  α^{(U)}_2     π^{(U)}_2  α^{(U)}_2
 13    0.8960     0.8571        0.8078     0.7626        0.6154     0.5658        0.4203     0.3791
 14    0.9033     0.8667        0.8179     0.7743        0.6296     0.5800        0.4347     0.3926
 15    0.9096     0.8750        0.8270     0.7848        0.6424     0.5931        0.4480     0.4051
 16    0.9152     0.8824        0.8350     0.7942        0.6541     0.6051        0.4604     0.4169

with that of Ahmadi and Balakrishnan [15], whose prediction coefficient is as in Equation (27). To this end, for some fixed values of π_0 and k, we have presented in Table 6 the values of m^{OS}_{min} (the minimum m satisfying the inequality \alpha^{(U)}_2(1, m; m, k) ≥ π_0) and m^{ORSS}_{min} (the minimum m satisfying the inequality in (22)).

Table 6 shows that m^{ORSS}_{min} is smaller than m^{OS}_{min}, which means that the optimal prediction interval based on ORSS is narrower than that based on the usual order statistics.

Table 7 presents the prediction coefficients \pi^{(U)}_2(1, m; m, k) and \alpha^{(U)}_2(1, m; m, k) for some selected choices of m and k. It is readily observed that the former is larger than the latter, thus revealing the superiority of the prediction interval based on ORSS over that based on the usual order statistics.

5. Summary

In this paper, distribution-free two-sided prediction intervals for future order statistics as well as record values are developed based on an ORSS. We then compare these prediction intervals with those of Kaminsky and Nelson [7] and Ahmadi and Balakrishnan [15], respectively, and display the superiority of these prediction intervals over the corresponding ones based on the usual order statistics and on record values. We also describe the determination of optimal prediction intervals.

References

[1] Arnold BC, Balakrishnan N, Nagaraja HN. Records. New York: John Wiley & Sons; 1998.
[2] Arnold BC, Balakrishnan N, Nagaraja HN. A first course in order statistics. Classics in applied mathematics, 54. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM); 2008.
[3] Balakrishnan N, Rao CR, editors. Order statistics: theory and methods. Handbook of statistics, vol. 16. Amsterdam: North-Holland; 1998.
[4] Balakrishnan N, Rao CR, editors. Order statistics: applications. Handbook of statistics, vol. 17. Amsterdam: North-Holland; 1998.
[5] David HA, Nagaraja HN. Order statistics. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2003.
[6] Dunsmore IR. The future occurrence of records. Ann Inst Statist Math. 1983;35:267-277.
[7] Kaminsky KS, Nelson PI. Prediction of order statistics. In: Balakrishnan N, Rao CR, editors. Order statistics: applications. Handbook of statistics, vol. 17. Amsterdam: North-Holland; 1998. p. 431-450.
[8] Ahmadi J, Doostparast M. Bayesian estimation and prediction for some life distributions based on record values. Statist Papers. 2006;47:373-392.
[9] Raqab M, Balakrishnan N. Prediction intervals for future records. Stat Probab Lett. 2008;78:1955-1963.
[10] Vock M, Balakrishnan N. Nonparametric prediction intervals based on ranked set samples. Commun Statist Theory Methods. 2012;41:2256-2268.
[11] Frey J. A note on a probability involving independent order statistics. J Stat Comput Simul. 2007;77:969-975.
[12] McIntyre GA. A method for unbiased selective sampling, using ranked sets. Aust J Agri Res. 1952;3:385-390.
[13] Balakrishnan N, Li T. Ordered ranked set samples and applications to inference. Ann Inst Statist Math. 2006;58:757-777.
[14] Balakrishnan N, Li T. Ordered ranked set samples and applications to inference. J Statist Plann Inference. 2008;138:3512-3524.
[15] Ahmadi J, Balakrishnan N. Prediction of order statistics and record values from two independent sequences. Statistics. 2010;44:417-430.
