Infinitesimal sensitivity of posterior distributions



The Canadian Journal of Statistics / La Revue Canadienne de Statistique, Vol. 21, No. 2, 1993, Pages 195-203.


Infinitesimal sensitivity of posterior distributions

Fabrizio RUGGERI* and Larry WASSERMAN†

C.N.R.-I.A.M.I. and Carnegie Mellon University

Key words and phrases: Bayesian robustness, Fréchet derivatives, likelihood regions.
AMS 1985 subject classifications: Primary 62F15; secondary 62F35.

ABSTRACT

We measure the local sensitivity of a posterior expectation with respect to the prior by computing the norm of the Fréchet derivative of the posterior with respect to the prior over several different classes of measures. We compute the derivative of the posterior upper expectation when the prior varies in a restricted ε-contamination class. A bound on the global sensitivity of a class of priors is obtained. As an application, we show that of all sets with posterior probability 1 − α, the likelihood region minimizes the norm of the Fréchet derivative over the ε-contamination class and so is, in some sense, the most robust region with this posterior probability. But there exist counterexamples to this result for other classes of priors.

RÉSUMÉ

We measure the local sensitivity of a posterior expectation with respect to the prior by computing the norm of the Fréchet derivative of the posterior with respect to the prior over several different classes of measures. We use the Fréchet derivative to compute the derivative of the upper posterior expectation when the prior varies in a restricted ε-contamination class. A bound on the global sensitivity of a class of priors is obtained. As an application, we show that among all sets with posterior probability 1 − α, the likelihood region minimizes the norm of the Fréchet derivative over the ε-contamination class and is therefore, in a certain sense, the most robust region having this posterior probability. However, counterexamples to this result exist for other classes of priors.

1. INTRODUCTION

Sensitivity of the posterior distribution to the prior is a topic that has received much attention; see Berger (1984, 1985, 1986, 1990) and Wasserman (1992) and the references contained therein. Most of this literature is concerned with finding bounds on posterior quantities of interest as the prior varies in a class of distributions. We call this global sensitivity analysis. Less attention has been given to studying the effect on the posterior of small changes in the prior. We call this local sensitivity analysis. Local sensitivity can be useful when we want to compare the sensitivity of several posteriors or several posterior quantities of interest. Furthermore, much global sensitivity analysis depends on specifying a parameter that measures our belief in the prior. Local sensitivity analysis


*This research was carried out while the first author was visiting the Department of Statistics, Carnegie Mellon University.
†Research supported by NSF grant DMS-9005858.


does not require this, since it measures infinitesimal sensitivity. In many cases, both local and global sensitivity analyses are useful.

Diaconis and Freedman (1986) and Berger (1986) suggested using the Fréchet derivative of the posterior distribution with respect to the prior as a measure of the sensitivity of posterior inferences. Similar ideas are in Cuevas and Sanz (1988), Srinivasan and Truszczynska (1990), and Sivaganesan (1993). In this paper, we implement this idea. First, we follow Diaconis and Freedman by computing the norm of the Fréchet derivative of the posterior expectation of a function with respect to the prior over the class of all signed measures with zero mass. Next, we compute the norm of the derivative over ε-contaminated priors and restricted versions of this class. Recall that if P is a given prior for the parameter $\theta \in \Theta$, then the ε-contaminated class of priors is defined by

$$\Gamma_\varepsilon = \{\pi;\ \pi = (1 - \varepsilon)P + \varepsilon Q;\ Q \in \mathcal{P}\},$$

where $\varepsilon \in [0, 1]$ and $\mathcal{P}$ is the set of all priors. If $\mathcal{P}$ is replaced with some subset, S say, the resulting class of priors will be called a restricted ε-contamination class and will be denoted by $\Gamma_\varepsilon^S$. We shall compute the norm over some restricted classes as well. Ruggeri and Wasserman (1991) consider local sensitivity using a different type of derivative with density-based classes of priors.

Given a class $\Gamma_\varepsilon$, we shall call $\overline{P}(A) = \sup_{P \in \Gamma_\varepsilon} P(A)$ the upper prior probability and $\overline{P}(A|x) = \sup_{P \in \Gamma_\varepsilon} P(A|x)$ the upper posterior probability. Similarly, $\overline{E}(g) = \sup_{P \in \Gamma_\varepsilon} E_P(g)$ is the upper prior expectation of the measurable function g, where $E_P(g) = \int g(\theta)\,P(d\theta)$, and $\overline{E}(g|x) = \sup_{P \in \Gamma_\varepsilon} E_P(Lg)/E_P(L)$ is the upper posterior expectation of g, where $L(\theta)$ is the likelihood function. Lower bars will be used to indicate lower bounds, and the corresponding quantities will be called lower prior probabilities, etc.
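To make these upper and lower quantities concrete, here is a minimal numerical sketch (ours, not from the paper), assuming a N(θ, 1) model with a N(0, 2) base prior and illustrative values for x, A, and ε. It uses the standard fact that, over the unrestricted ε-contamination class, the extreme posterior probabilities are attained at point-mass contaminations.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical illustration: X | theta ~ N(theta, 1), base prior P = N(0, 2),
# observed x = 0.5, event A = [-1.0, 1.5], contamination level eps = 0.1.
x, eps = 0.5, 0.1
theta = np.linspace(-10.0, 10.0, 20001)   # grid over Theta
dx = theta[1] - theta[0]
lik = norm.pdf(x, loc=theta, scale=1.0)   # likelihood L(theta)
prior = norm.pdf(theta, 0.0, np.sqrt(2.0))
in_A = (theta >= -1.0) & (theta <= 1.5)

# Marginal masses under the base prior: m(A) = int_A L dP and m(A^c).
mA = np.sum(lik * prior * in_A) * dx
mAc = np.sum(lik * prior * ~in_A) * dx
m = mA + mAc

# For pi = (1 - eps) P + eps Q, the posterior probability of A is
#   [(1-eps) m(A) + eps int_A L dQ] / [(1-eps) m + eps int L dQ],
# and the extremes over Q are attained at point masses: inside A at the
# likelihood peak (upper), outside A at the likelihood peak on A^c (lower).
supL_A = lik[in_A].max()
supL_Ac = lik[~in_A].max()
upper = ((1 - eps) * mA + eps * supL_A) / ((1 - eps) * m + eps * supL_A)
lower = (1 - eps) * mA / ((1 - eps) * m + eps * supL_Ac)

print(f"P(A|x) under P alone       : {mA / m:.4f}")
print(f"upper posterior probability: {upper:.4f}")
print(f"lower posterior probability: {lower:.4f}")
```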

In the next section, we shall compute the norm of the Fréchet derivative. We shall obtain the norm over a variety of classes of priors. In Section 3 we discuss the interpretation of the norms. In Section 4, we use our results to prove that likelihood regions are maximally robust in a certain sense. This result may be regarded as an infinitesimal version of the result in Wasserman (1989), which says that likelihood regions are robust for ε-contaminated neighborhoods. We regard this as further evidence for the use of likelihood regions in Bayesian inference. However, it should be pointed out that there are classes of priors for which the likelihood regions are not maximally robust (Ruggeri 1991).

2. FRÉCHET DERIVATIVES

Let the sample space $\mathcal{X}$ and the parameter space Θ be complete, separable metric spaces, and let $\mathcal{B}(\mathcal{X})$ and $\mathcal{B}(\Theta)$ be their Borel σ-algebras. Let the model $\{Q_\theta;\ \theta \in \Theta\}$ be a family of probability measures on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ with common, σ-finite dominating measure λ. We write $f(x|\theta)$ for the value of $dQ_\theta/d\lambda$ at x, and we assume that $\sup_{\theta \in \Theta} f(x|\theta)$ is finite for each $x \in \mathcal{X}$.

For a prior distribution P on $(\Theta, \mathcal{B}(\Theta))$, let $T_g(P)$ be the posterior expectation of a measurable function g:

$$T_g(P) = \frac{N_P}{D_P},$$

where $N_P = \int g(\theta)\tilde{f}(x|\theta)\,P(d\theta)$, $D_P = \int \tilde{f}(x|\theta)\,P(d\theta)$, and $\tilde{f}(x|\theta) = f(x|\theta)/\sup_\theta f(x|\theta)$. Assume that $D_P$ is nonzero and that P is nonatomic. Unless otherwise indicated, the integrals are over all of Θ.

Consider the set $\{\pi;\ \pi = P + \delta\}$, where δ is a signed measure with $\delta(\Theta) = 0$. Note that the set Δ of such δ is a normed, linear space with norm given by $\|\delta\| = d(\delta, 0)$, where


$d(P, Q) = \sup_{A \in \mathcal{B}(\Theta)} |P(A) - Q(A)|$ is the total-variation metric. Note that $d(P, Q) = \frac{1}{2}\int |f(\theta) - g(\theta)|\,\mu(d\theta)$, where μ dominates P and Q, $f = dP/d\mu$, and $g = dQ/d\mu$ (Strasser 1985, Lemma 2.4). We regard $T_g$ as a nonlinear operator taking Δ into ℝ.
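As a quick illustration of this metric (our own sketch, with two arbitrarily chosen normal priors), the identity $d(P, Q) = \frac{1}{2}\int |f - g|\,d\mu$ can be evaluated on a grid:

```python
import numpy as np
from scipy.stats import norm

# Total-variation distance between P = N(0, 2) and Q = N(0.5, 2) via
# d(P, Q) = (1/2) * integral of |f - g| with respect to Lebesgue measure.
theta = np.linspace(-12.0, 12.0, 40001)
dx = theta[1] - theta[0]
f = norm.pdf(theta, 0.0, np.sqrt(2.0))   # f = dP/d(mu)
g = norm.pdf(theta, 0.5, np.sqrt(2.0))   # g = dQ/d(mu)
tv = 0.5 * np.sum(np.abs(f - g)) * dx
print(f"d(P, Q) = {tv:.4f}")
```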

Let $T_g'$ be the Fréchet derivative of the operator $T_g$. In other words, $T_g'$ is a linear map on Δ that satisfies $T_g'(\delta) = T_g(P + \delta) - T_g(P) + o(\|\delta\|)$. We regard $T_g'$ as a measure of how a small change in P affects the posterior expectation. In what follows, $\Delta_P$ will be the set of all signed measures δ of the form $\delta = \varepsilon(Q - P)$, where $Q \in \mathcal{P}$ and $\varepsilon \in (0, 1]$ (so that $P + \delta$ is a probability measure), and $\Delta_{\hat{P}}$ will be the set of all signed measures δ of the form $\delta = \varepsilon(Q - P)$ where Q is discrete. The norm of $T_g'$ over a class $M \subset \Delta$ is defined by $\|T_g'\|_M = \sup_{\delta \in M} |T_g'(\delta)|/\|\delta\|$. The norm $\|T_g'\|$ without a subscript denotes the norm over all of Δ. The first two theorems are similar to Theorem 4 in Diaconis and Freedman (1986). For completeness, we include the proofs in the Appendix.

THEOREM 1. Suppose that $g\tilde{f}$ is bounded. Then $T_g'(\delta) = (D_P)^{-1}(N_\delta - \rho D_\delta)$, where $\rho = T_g(P)$.

Define $h(\theta) = \tilde{f}(x|\theta)\{g(\theta) - \rho\}$, $\overline{h} = \sup_{\theta \in \Theta} h(\theta)$, $\underline{h} = \inf_{\theta \in \Theta} h(\theta)$, and $b = \overline{h} - \underline{h}$. We assume that $b > 0$. As a measure of the sensitivity of the posterior expectation to the prior, we follow Diaconis and Freedman (1986) in using the norm of the derivative.

THEOREM 2. $\|T_g'\| = (D_P)^{-1}\{\overline{h} - \underline{h}\}$.
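The norm in Theorem 2 is straightforward to evaluate numerically. The following sketch (ours, with an assumed normal model and prior) computes $\|T_g'\| = (D_P)^{-1}(\overline{h} - \underline{h})$ on a grid for the posterior mean, $g(\theta) = \theta$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: X | theta ~ N(theta, 1), P = N(0, 2), x = 0.5, g(theta) = theta.
x = 0.5
theta = np.linspace(-12.0, 12.0, 40001)
dx = theta[1] - theta[0]
prior = norm.pdf(theta, 0.0, np.sqrt(2.0))
ftilde = np.exp(-0.5 * (x - theta) ** 2)   # f(x|theta) / sup_theta f(x|theta)
g = theta

D_P = np.sum(ftilde * prior) * dx          # D_P = int ftilde dP
N_P = np.sum(g * ftilde * prior) * dx      # N_P = int g ftilde dP
rho = N_P / D_P                            # posterior expectation T_g(P)

h = ftilde * (g - rho)                     # h(theta) = ftilde(x|theta){g(theta) - rho}
norm_deriv = (h.max() - h.min()) / D_P     # Theorem 2: (D_P)^{-1}(hbar - hlow)
print(f"rho = {rho:.4f},  ||T_g'|| = {norm_deriv:.4f}")
```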

Now consider the special case where g is the indicator function for a set A. In this case, we write $T_A$ for $T_g$.

COROLLARY 1. If g is the indicator function for the set A, then

$$\|T_A'\| = (D_P)^{-1}\{qS_A + pS_{A^c}\},$$

where $p = P(A|x)$, $q = 1 - p$, and for any $B \in \mathcal{B}(\Theta)$, $S_B = \sup_{\theta \in B} \tilde{f}(x|\theta)$.

We now compute the norm over the set of ε-contaminated priors. Note that if $\pi = (1 - \varepsilon)P + \varepsilon Q \in \Gamma_\varepsilon$, where $Q \in \mathcal{P}$, then $\pi = P + \delta$, where $\delta = \varepsilon(Q - P)$, so that $\delta \in \Delta_P$. Also, $\|\delta\| \le \varepsilon$, and $\|\delta\| = \varepsilon$ if Q is discrete. Note that, for such a δ, $T_g'(\delta) = \varepsilon D_Q\{T(Q) - T(P)\}/D_P$. The derivative divided by the norm of δ is

$$\frac{\varepsilon D_Q\{T(Q) - T(P)\}}{\varepsilon D_P\,d(P, Q)} = \frac{D_Q\{T(Q) - T(P)\}}{D_P\,d(P, Q)} = \frac{\int h(\theta)\,Q(d\theta)}{d(P, Q)\int \tilde{f}(x|\theta)\,P(d\theta)}.$$

We shall now show that the norm over Ap is the same as over the whole class A. Thus, by formally allowing perturbations that are not probability measures we have not changed the norm.

THEOREM 3. $\|T_g'\|_P = (D_P)^{-1}\{\overline{h} - \underline{h}\}$.

Proof. Since $\Delta_P \subset \Delta$, it follows from Theorem 2 that $\|T_g'\|_P \le (D_P)^{-1}(\overline{h} - \underline{h})$. Let $\{c_i\}$ be a sequence of real numbers such that $\overline{h} > c_1 > c_2 > \cdots$ and $c_i \downarrow \underline{h}$. Let $H_i = \{\theta;\ h(\theta) \ge c_i\}$ and $J_i = H_i^c$. Let $\theta_i \in H_i$ be such that $h(\theta_i) \uparrow \overline{h}$, let $p_i = P(J_i)$, and let $\delta_i$ be a point mass at $\theta_i$. We set $Q_i = p_i\delta_i + P_i$, where $P_i(A) = P(A \cap H_i)$ for every $A \in \mathcal{B}(\Theta)$. Finally, let $h_i = \sup_{\theta \in J_i} h(\theta)$, and note that $h_i \downarrow \underline{h}$.


For every $A \in \mathcal{B}(\Theta)$, $|P(A) - Q_i(A)| = |P(A \cap J_i) - p_i\delta_i(A)| \le p_i$, so that $d(P, Q_i) \le p_i$; taking $A = J_i$ gives $|P(J_i) - Q_i(J_i)| = p_i$. Hence, $d(P, Q_i) = p_i$ as claimed. Now, noting that $\int_\Theta h(\theta)\,P(d\theta) = 0$ implies that $\int_{H_i} h(\theta)\,P(d\theta) = -\int_{J_i} h(\theta)\,P(d\theta)$, we have

$$\frac{\int h(\theta)\,Q_i(d\theta)}{d(P, Q_i)} = \frac{p_i h(\theta_i) + \int_{H_i} h(\theta)\,P(d\theta)}{p_i} = h(\theta_i) - \frac{1}{p_i}\int_{J_i} h(\theta)\,P(d\theta).$$

The first term converges to $\overline{h}$. We claim that the second term converges to $\underline{h}$. This follows because $(1/p_i)\int_{J_i} h(\theta)\,P(d\theta) - \underline{h} = \int_{J_i}\{h(\theta) - \underline{h}\}\,P(d\theta)/p_i \le (h_i - \underline{h})\int_{J_i} P(d\theta)/p_i = h_i - \underline{h} \to 0$.

Hence the result follows. Q.E.D.

It is interesting to compute the norm of the derivative over the set of $\delta = \varepsilon(Q - P)$ where Q is discrete. This is done in the next theorem. This might seem rather odd, but, as we shall see in Section 3, the resulting expression has a natural statistical interpretation.

THEOREM 4. $\|T_g'\|_{\hat{P}} = (D_P)^{-1}\max\{\overline{h}, -\underline{h}\}$.

Proof. Let $\delta = \varepsilon(Q - P)$ with Q discrete. Thus, $d(P, Q) = 1$, so that $D_P|T_g'(\delta)|/\|\delta\| = \varepsilon D_P|T_g'(Q - P)|/\{\varepsilon\,d(P, Q)\} = |\int h(\theta)\,Q(d\theta)|$. Let $\hat{\mathcal{P}}$ denote the set of discrete probability measures. Then $\Gamma_+ = \{Q \in \hat{\mathcal{P}};\ \int h(\theta)\,Q(d\theta) \ge 0\}$ and $\Gamma_- = \{Q \in \hat{\mathcal{P}};\ \int h(\theta)\,Q(d\theta) < 0\}$. Then

$$\sup_{Q \in \hat{\mathcal{P}}} \Big|\int h(\theta)\,Q(d\theta)\Big| = \max\Big\{\sup_{Q \in \Gamma_+} \int h(\theta)\,Q(d\theta),\ \sup_{Q \in \Gamma_-} -\int h(\theta)\,Q(d\theta)\Big\} = \max\{\overline{h}, -\underline{h}\}.$$

Q.E.D.

COROLLARY 2. If g is the indicator function for the set A, then $\|T_A'\|_{\hat{P}} = (D_P)^{-1}\max\{qS_A, pS_{A^c}\}$.

The restriction to point masses changes the norm of the derivative, since point masses are not dense with respect to the total-variation norm.
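To see the difference between the two norms concretely, this sketch (ours, continuing the assumed normal setup above) evaluates the Corollary 1 norm $(D_P)^{-1}\{qS_A + pS_{A^c}\}$ and the Corollary 2 norm $(D_P)^{-1}\max\{qS_A, pS_{A^c}\}$ for an interval A:

```python
import numpy as np
from scipy.stats import norm

# Continuing the hypothetical normal setup: X | theta ~ N(theta, 1), P = N(0, 2), x = 0.5.
x = 0.5
theta = np.linspace(-12.0, 12.0, 40001)
dx = theta[1] - theta[0]
prior = norm.pdf(theta, 0.0, np.sqrt(2.0))
ftilde = np.exp(-0.5 * (x - theta) ** 2)
in_A = (theta >= -1.0) & (theta <= 1.5)    # illustrative set A = [-1.0, 1.5]

D_P = np.sum(ftilde * prior) * dx
p = np.sum(ftilde * prior * in_A) * dx / D_P   # p = P(A|x)
q = 1.0 - p
S_A = ftilde[in_A].max()                       # S_B = sup over B of ftilde
S_Ac = ftilde[~in_A].max()

full_norm = (q * S_A + p * S_Ac) / D_P         # Corollary 1: all zero-mass signed measures
disc_norm = max(q * S_A, p * S_Ac) / D_P       # Corollary 2: discrete contaminations
print(f"||T_A'||          = {full_norm:.4f}")
print(f"||T_A'|| (discrete) = {disc_norm:.4f}")
```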

Another important class in Bayesian robustness is the quantile class $\Gamma_Q$ [see, among others, Berger (1990) and Ruggeri (1990)], containing all the priors sharing some given quantiles or, equivalently, giving fixed probabilities $p_j$ to sets $K_j$, $j = 1, \ldots, n$, that partition Θ. Given $P \in \Gamma_Q$, let $\Delta_Q$ be the set of all signed measures δ of the form $\delta = \varepsilon(Q - P)$, where $Q \in \Gamma_Q$.

THEOREM 5. Suppose that P is absolutely continuous with respect to Lebesgue measure. Then

$$\|T_g'\|_Q = (D_P)^{-1}\sum_j p_j(\overline{h}_j - \underline{h}_j),$$

where $\overline{h}_j = \sup_{\theta \in K_j} h(\theta)$ and $\underline{h}_j = \inf_{\theta \in K_j} h(\theta)$, $j = 1, \ldots, n$.

Proof. Since $\Delta_Q \subset \Delta_P$, it follows from Theorem 3 that $\|T_g'\|_Q \le (D_P)^{-1}\sum_j p_j(\overline{h}_j - \underline{h}_j)$.

For a fixed α, where $0 < \alpha < 1$, and $j = 1, \ldots, n$, let $\theta_j^{(i)} \in K_j$ be such that $h(\theta_j^{(i)}) \uparrow \overline{h}_j$, and let $H_j^{(i)} \subset K_j$ be such that $P(H_j^{(i)}) = (1 - \alpha)p_j$ and $\inf_{\theta \in H_j^{(i)}} h(\theta) > \underline{h}_j$, with $J_j^{(i)} = K_j - H_j^{(i)}$, so that there are points of $J_j^{(i)}$ at which h is arbitrarily close to $\underline{h}_j$. The proof is now similar to that of Theorem 3, taking $Q_i = \alpha\sum_j p_j\delta_j^{(i)} + P_i$, where $\delta_j^{(i)}$ is the point mass at $\theta_j^{(i)}$ and $P_i$ coincides with P over all the $H_j^{(i)}$'s and is null otherwise, and letting α ↓ 0. Q.E.D.

In the next theorem, the norm of the derivative is computed over $\Delta_{\hat{Q}}$, the restriction of $\Delta_Q$ to discrete Q's. The proof is omitted.

THEOREM 6. $\|T_g'\|_{\hat{Q}} = (D_P)^{-1}\max\{\sum_j p_j\overline{h}_j,\ -\sum_j p_j\underline{h}_j\}$.
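For the quantile class, the norms of Theorems 5 and 6 reduce to sums over the cells of the partition. A small sketch (ours, with an assumed partition and the same hypothetical normal setup) evaluates both:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical normal setup; quantile class fixing the prior probabilities of a
# partition of Theta into intervals K_j (two unbounded end cells included).
x = 0.5
theta = np.linspace(-12.0, 12.0, 40001)
dx = theta[1] - theta[0]
prior = norm.pdf(theta, 0.0, np.sqrt(2.0))
ftilde = np.exp(-0.5 * (x - theta) ** 2)
g = theta

D_P = np.sum(ftilde * prior) * dx
rho = np.sum(g * ftilde * prior) * dx / D_P
h = ftilde * (g - rho)

cuts = [-np.inf, -1.0, 0.0, 1.0, np.inf]       # partition K_1, ..., K_4 (assumed)
sum_range, sum_hi, sum_lo = 0.0, 0.0, 0.0
for a, b in zip(cuts[:-1], cuts[1:]):
    in_K = (theta >= a) & (theta < b)
    p_j = np.sum(prior * in_K) * dx            # p_j = P(K_j)
    h_hi, h_lo = h[in_K].max(), h[in_K].min()  # sup and inf of h over K_j
    sum_range += p_j * (h_hi - h_lo)
    sum_hi += p_j * h_hi
    sum_lo += p_j * h_lo

print(f"Theorem 5: ||T'||_Q     = {sum_range / D_P:.4f}")
print(f"Theorem 6: ||T'||_Q-hat = {max(sum_hi, -sum_lo) / D_P:.4f}")
```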

Since the norm depends heavily on the reference prior P, the supremum of such norms as P varies in $\Gamma_Q$ would in general be a better measure. The supremum is bounded provided that there exists at least one set $K_j$ over which the infimum of the likelihood function is greater than zero, so that this definition makes sense.

In general, it is not possible to find the norm of the derivative over a restricted class of measures. For some cases, in addition to those considered already, bounds are obtainable. For example, consider the following.

Given a, $0 < a \le 1$, and $\Gamma_P \subset \{Q^*;\ d(Q^*, P) = a\}$, consider $\Gamma_S = \{Q;\ Q = (1 - \lambda)P + \lambda Q^*,\ 0 \le \lambda \le 1,\ Q^* \in \Gamma_P\}$. Then it can be shown that $\|T_g'\|_S = (D_P)^{-1}\max\{\overline{E}_{Q^*}(h), -\underline{E}_{Q^*}(h)\}/a$, where the upper and lower expectations are over $\Gamma_P$. Similarly, given a, $0 < a \le 1$, consider $\Gamma_a = \{Q;\ d(Q, P) \le a\}$. Then, by using an argument similar to that of Theorem 3, $\|T_g'\|_a = (D_P)^{-1}(\overline{h} - \underline{h})$.

3. INTERPRETING POSTERIOR DERIVATIVES

The norms in Theorems 2 and 4 differ. Now we discuss the interpretation of these norms. Let S be a set of priors, and let $\Gamma_\varepsilon = \{(1 - \varepsilon)P + \varepsilon Q;\ Q \in S\}$. This is a restricted ε-contamination class. By choosing S to represent structural constraints on the prior, we get a class $\Gamma_\varepsilon$ that is rich and that captures essential features of the prior. For example, S may be the set of all unimodal priors with given mode, or a set of priors with given quantiles. See Sivaganesan and Berger (1989) for example.

Given such a class, we are interested in $\overline{E}_\varepsilon(g|x) = \sup_{P \in \Gamma_\varepsilon} E_P(g|x)$ and $\underline{E}_\varepsilon(g|x) = \inf_{P \in \Gamma_\varepsilon} E_P(g|x)$. These bounds are of direct interest and are easy to interpret. Indeed, they are the focus of most attention in Bayesian robustness. Now it is natural to define $\Delta^P = \lim_{\varepsilon \downarrow 0}\{\overline{E}_\varepsilon(g|x) - \underline{E}_\varepsilon(g|x)\}/\varepsilon$ as a measure of sensitivity. Here, $\rho = E_P(g|x)$. This is reasonable, since these quantities literally measure how fast our inferences are changing as we depart from P. But it is easy to see that this is essentially the norm of the Fréchet derivative. To see this, note that $T((1 - \varepsilon)P + \varepsilon Q) = T(P) + T'(\varepsilon(Q - P)) + o(\varepsilon)$

uniformly in $Q \in S$, since $\{Q - P;\ Q \in S\}$ is bounded in total-variation norm. Hence, $\overline{E}_\varepsilon(g|x) = T(P) + \varepsilon\sup_{Q \in S} T'(Q - P) + o(\varepsilon)$, so that

$$\lim_{\varepsilon \downarrow 0} \frac{\overline{E}_\varepsilon(g|x) - \rho}{\varepsilon} = \sup_{Q \in S} T'(Q - P).$$

Thus $\Delta^P = \sup_{Q \in S} T'(Q - P) - \inf_{Q \in S} T'(Q - P) = (D_P)^{-1}\{\overline{E}(h) - \underline{E}(h)\}$, where the upper and lower expectations are over S. If we take S to be all distributions, we get $\Delta^P = (D_P)^{-1}(\overline{h} - \underline{h}) = \|T_g'\|$.
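This identification can be checked numerically. In the sketch below (ours; the reduction of the extremes over contaminations to point masses is assumed, as in Section 2), $\{\overline{E}_\varepsilon(g|x) - \underline{E}_\varepsilon(g|x)\}/\varepsilon$ is computed for decreasing ε and compared with $\|T_g'\|$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical normal setup again: X | theta ~ N(theta, 1), P = N(0, 2), x = 0.5, g = theta.
x = 0.5
theta = np.linspace(-12.0, 12.0, 40001)
dx = theta[1] - theta[0]
prior = norm.pdf(theta, 0.0, np.sqrt(2.0))
ftilde = np.exp(-0.5 * (x - theta) ** 2)
g = theta

D_P = np.sum(ftilde * prior) * dx
N_P = np.sum(g * ftilde * prior) * dx
rho = N_P / D_P
h = ftilde * (g - rho)
norm_deriv = (h.max() - h.min()) / D_P     # Theorem 2 value

for eps in [0.1, 0.01, 0.001]:
    # Over the full eps-contamination class, the extreme posterior expectations
    # are attained at point-mass contaminations Q = delta_t (assumed reduction);
    # evaluate the resulting posterior expectation for every grid point t.
    post = ((1 - eps) * N_P + eps * g * ftilde) / ((1 - eps) * D_P + eps * ftilde)
    delta_eps = (post.max() - post.min()) / eps
    print(f"eps = {eps:6.3f}: (upper - lower)/eps = {delta_eps:.4f}")
print(f"||T_g'|| = {norm_deriv:.4f}")
```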


On the other hand, if we instead defined $\Delta^P$ to be the maximum of

$$\lim_{\varepsilon \downarrow 0} \frac{\overline{E}_\varepsilon(g|x) - \rho}{\varepsilon} \qquad\text{and}\qquad \lim_{\varepsilon \downarrow 0} \frac{\rho - \underline{E}_\varepsilon(g|x)}{\varepsilon},$$

we would find that $\Delta^P = \|T_g'\|_{\hat{P}}$, that is, the norm over the discrete measures.

Now consider how we might measure the global sensitivity of the class S. To this end

we define the global sensitivity of S by $\Delta = \sup_{P \in S} \Delta^P$. We now establish a bound on Δ. We assume that $\underline{D} > 0$, where $\underline{D} = \inf_{P \in S} D_P$. The proof is straightforward and is omitted.

THEOREM 7. $\Delta \le \overline{\Delta}$, where

$$\overline{\Delta} = \underline{D}^{-1}\{\overline{E}(h_a) - \underline{E}(h_b)\},$$

where $a = \inf_{P \in S} E_P(g|x)$, $b = \sup_{P \in S} E_P(g|x)$, and $h_c(\theta) = \tilde{f}(x|\theta)\{g(\theta) - c\}$.

$\overline{\Delta}$ could be useful in a variety of problems, such as choosing robust designs and selecting robust priors.

4. APPLICATION TO LIKELIHOOD REGIONS

In Wasserman (1989), the following result was proved. If $\mathcal{A}$ is the set of all subsets of Θ that have posterior probability $1 - \alpha$ under the prior P, with $1 - \alpha > 0.5$, then, under sufficient regularity assumptions, the set $A \in \mathcal{A}$ that minimizes $\overline{P}(A|x) - \underline{P}(A|x)$ is the likelihood region $L_c = \{\theta;\ L(\theta) > c\}$, where c is chosen so that $P(L_c|x) = 1 - \alpha$. Here, $L(\theta) = \tilde{f}(x|\theta)$, and the upper and lower bounds are over the set of ε-contaminations of the prior P. In this sense, it was argued that likelihood regions are robust and deserve more attention in Bayesian inference. Generally, highest-posterior-density regions are preferred in Bayesian inference, but likelihood regions play an important role in frequentist theory. The robustness property of likelihood regions is interesting because it provides some justification for their use in Bayesian inference as well. In this section, we prove an infinitesimal version of the above result.

To this end, let $\mathcal{A}$ be as above, and now assume that L is bounded and continuous and that P is mutually absolutely continuous with respect to Lebesgue measure. Also, assume that the maximum-likelihood estimate $\hat{\theta}$ exists. In the following theorem, we shall be taking the norm of the derivative over the unrestricted ε-contamination class. The theorem is also true over the set of discrete measures.

We caution the reader that this result depends on the class of priors one is dealing with. As stated in Ruggeri (1991), very particular counterexamples may be given to prove that over some classes of priors, the likelihood regions are not maximally robust. Such an example appears after the following theorem.

THEOREM 8. If $1 - \alpha > 0.5$, then for every $A \in \mathcal{A}$, $\|T_{L_c}'\| \le \|T_A'\|$.

Proof. By Corollary 1, $D_P\|T_A'\| = qS_A + pS_{A^c}$. Now, suppose that $A \in \mathcal{A}$. Then $D_P\|T_A'\| = \alpha S_A + (1 - \alpha)S_{A^c} = U(A)$, say. Following Wasserman (1989, Theorem 3.1) we define $\mathcal{A}^+ = \{A \in \mathcal{A};\ \hat{\theta} \in A\}$ and $\mathcal{A}^- = \{A \in \mathcal{A};\ \hat{\theta} \notin A\}$, and we consider two cases. First consider $A \in \mathcal{A}^+$. Then $U(A) = \alpha + (1 - \alpha)s$, where $s = S_{A^c}$. Let c be such that $P(L_c|x) = 1 - \alpha$. Now, $L_s \subset A$, so that $P(L_s|x) \le 1 - \alpha$. Thus, $L_s \subset L_c$. Hence,


$S_{L_c^c} = c \le s = S_{A^c}$. Thus, $\|T_{L_c}'\| \le \|T_A'\|$, so that the likelihood region $L_c$ minimizes the norm of the derivative over $\mathcal{A}^+$.

Now consider $A \in \mathcal{A}^-$. Then $U(A) = s\alpha + (1 - \alpha)$, where now $s = S_A$. Let $L_d$ be such that $P(L_d|x) = \alpha$, and let $B = L_d^c$. Then $L_s \subset A^c$, so that $P(L_s|x) \le P(A^c|x) = \alpha$. Hence, $L_s \subset L_d$. Thus, $S_A = s \ge d = S_B$. Therefore, $\|T_B'\| \le \|T_A'\|$, so that the complement of the likelihood region $L_d$ minimizes the norm of the derivative over $\mathcal{A}^-$.

Now we compare the minima over the two classes, and we claim that the minimum of these two occurs over $\mathcal{A}^+$. First, note that $\alpha < \frac{1}{2}$ implies $(1 - \alpha)/\alpha > 1$. Second, $\alpha < \frac{1}{2}$ also implies that $0 \le c \le d \le 1$, so that $1 \ge (1 - d)/(1 - c)$. Hence, $(1 - \alpha)/\alpha > (1 - d)/(1 - c)$, which in turn implies that $\alpha + (1 - \alpha)c \le \alpha d + (1 - \alpha)$. Q.E.D.
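Theorem 8 can also be checked numerically. The following sketch (ours, using the normal setup of Example 4.1 below) scans intervals with posterior probability 0.95 and confirms that $U(A) = \alpha S_A + (1 - \alpha)S_{A^c}$ is minimized by the interval symmetric about $\hat{\theta} = x$, i.e., the likelihood region:

```python
import numpy as np
from scipy.stats import norm

# X | theta ~ N(theta, 1), prior N(0, 2), x = 0.5  =>  posterior N(1/3, 2/3).
x, alpha = 0.5, 0.05
post = norm(loc=1/3, scale=np.sqrt(2/3))
ftilde = lambda t: np.exp(-0.5 * (x - t) ** 2)   # normalized likelihood, max 1 at x

# Scan intervals A = [lo, hi] with posterior probability 0.95; all contain
# theta-hat = x = 0.5, so S_A = 1 and U(A) = alpha + (1 - alpha) * S_{A^c}.
best = None
for lo in np.linspace(-2.0, -1.02, 981):
    hi = post.ppf(post.cdf(lo) + 1 - alpha)      # matching upper endpoint
    S_Ac = max(ftilde(lo), ftilde(hi))           # sup of ftilde outside [lo, hi]
    U = alpha * 1.0 + (1 - alpha) * S_Ac
    if best is None or U < best[0]:
        best = (U, lo, hi)

U, lo, hi = best
print(f"minimizer A = [{lo:.3f}, {hi:.3f}], U(A) = {U:.4f}")
print(f"midpoint (should be near theta-hat = 0.5): {(lo + hi) / 2:.3f}")
```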

Ruggeri (1991) showed that the result in Wasserman (1989) failed when other classes were considered. In particular he considered classes specified through the lower bound on the probability of a given subset. We will now consider a particular class among those considered by Ruggeri, and we will show that the maximal robustness of likelihood regions holds only under certain conditions. Then we will consider a modification of an example in Wasserman (1989) showing that highest-posterior-density intervals can be even more robust.

Let P be as above, and let $L_c$ and $A \in \mathcal{A}$ be given. Further, suppose there exists $K \in \mathcal{B}(\Theta)$ such that $\underline{w}_K = \inf_{\theta \in K} \tilde{f}(x|\theta) > 0$, $S_K = 1$, $A \supset K$, $L_c \cap K \ne \emptyset$, $L_c \not\subset K$, and there is a sequence $\{\theta_n\}$, $\theta_n \in K$, such that $\tilde{f}(x|\theta_n) \uparrow c$. Furthermore, suppose $P(K) \ge p$ for some given $p \in [0, 1]$. Consider the class $\mathcal{P}_K$ containing P and all the discrete distributions Q such that $Q(K) \ge p$.

THEOREM 9. $\|T_{L_c}'\| > \|T_A'\|$ if and only if

$$c > \frac{\alpha}{1 - \alpha} \qquad\text{and}\qquad c > (1 - p)S_{A^c} - \frac{p\alpha\,\underline{w}_K}{1 - \alpha}.$$

Proof. Let $\delta_Q = Q - P$, so that we can apply Theorem 1, taking $g(\theta) = I_H(\theta)$, where $H \in \mathcal{A}$. Then, since $\|\delta_Q\| = 1$, $D_P\|T_{L_c}'\| = \max\{\alpha, (1 - \alpha)c\}$ and $D_P\|T_A'\| = \max\{\alpha, (1 - \alpha)(1 - p)S_{A^c} - p\alpha\underline{w}_K\}$. Q.E.D.

Example 1 in Wasserman (1989) considered an ε-contamination neighborhood of a probability measure for which the likelihood region was maximally robust. Now we present a different neighborhood of the same probability measure in which the likelihood region is not maximally robust.

EXAMPLE 4.1. Consider a random variable X which is normally distributed, with unknown mean θ and variance 1. Also, θ itself is normally distributed with mean 0 and variance 2, and the sample x = 0.5 is given. It follows that the posterior distribution of θ is still normal, with mean 1/3 and variance 2/3. Consider the highest-posterior-density interval A = [−1.27, 1.93] and the likelihood set $L_c$ = [−1.13, 2.13] such that $P(L_c|x) = P(A|x) = 0.95$. Here $c \approx 0.2649 > \alpha/(1 - \alpha) \approx 0.0526$ and $S_{A^c} = \tilde{f}(0.5|1.93) \approx 0.3597$. The second requirement of Theorem 9 on c is fulfilled if we may pick p such that $p > 1 - c/S_{A^c} \approx 0.2636$. Since $P(A) \approx 0.7292$, we may choose $K = [k_1, k_2]$ with the above features, provided that we take $-1.27 < k_1 < -1.13$, $0.5 < k_2 < 1.93$, and $P(K) > p$ (e.g., p = 0.4 and K = [−1.2, 1], so that $P(K) \approx 0.5621$). Thus, in this example, $L_c$ is less robust than A, since $\|T_{L_c}'\| > \|T_A'\|$.
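The numbers in Example 4.1 can be reproduced directly; the following sketch (ours) recomputes c, $\alpha/(1 - \alpha)$, $S_{A^c}$, the bound on p, and the prior probabilities $P(A)$ and $P(K)$:

```python
import numpy as np
from scipy.stats import norm

# Example 4.1 setup: X | theta ~ N(theta, 1), prior N(0, 2), x = 0.5, so the
# posterior is N(1/3, 2/3); A = HPD interval, L_c = likelihood region, both 0.95.
x, alpha = 0.5, 0.05
prior = norm(0.0, np.sqrt(2.0))
ftilde = lambda t: np.exp(-0.5 * (x - t) ** 2)

A = (-1.27, 1.93)      # highest-posterior-density interval
Lc = (-1.13, 2.13)     # likelihood region, symmetric about theta-hat = x

c = ftilde(Lc[0])      # likelihood cutoff on the boundary of L_c
S_Ac = ftilde(A[1])    # sup of ftilde on A^c (attained at the endpoint nearer x)
print(f"c = {c:.4f}  (compare alpha/(1-alpha) = {alpha / (1 - alpha):.4f})")
print(f"S_Ac = {S_Ac:.4f}")
print(f"p must exceed 1 - c/S_Ac = {1 - c / S_Ac:.4f}")

K = (-1.2, 1.0)        # a valid choice of K, with p = 0.4
print(f"P(A) = {prior.cdf(A[1]) - prior.cdf(A[0]):.4f}")
print(f"P(K) = {prior.cdf(K[1]) - prior.cdf(K[0]):.4f}")
```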

5. DISCUSSION

The infinitesimal approach to evaluating the sensitivity of posterior distributions can be a useful robust Bayesian technique. The sensitivity of a large number of posterior quantities can be evaluated rapidly. Regarded as a function of the data, the norm of the derivative, as computed in either Section 2 or Section 3, identifies data points that lead to inferences that are highly sensitive to the choice of prior. This might be a valuable tool for deriving robust experimental designs. The notion of global sensitivity developed in Section 3 might be a basis for selecting a maximally robust prior from a set of candidate priors. More work will be needed to get explicit expressions for the global sensitivity. Finally, as shown in Section 4, likelihood regions are maximally robust in the sense that they minimize the norm of the Fréchet derivative over all priors. But as we pointed out, these regions may not minimize the norm over restricted classes. An open problem is to characterize those classes of priors for which likelihood regions are maximally robust.

APPENDIX

Proof of Theorem 1. We have that

$$T_g(P + \delta) - T_g(P) = \frac{N_P + N_\delta}{D_P + D_\delta} - \frac{N_P}{D_P} = \frac{N_\delta - \rho D_\delta}{D_P + D_\delta},$$

so that

$$|T_g(P + \delta) - T_g(P) - T_g'(\delta)| = \frac{|N_\delta - \rho D_\delta|\,|D_\delta|}{D_P|D_P + D_\delta|} \le C_P\|\delta\|^2,$$

where $C_P$ is a constant depending on P. The last inequality follows from the fact that $|D_\delta| = |\int \tilde{f}(x|\theta)\,\delta(d\theta)| \le \sup_{\theta \in \Theta} \tilde{f}(x|\theta)\,|\delta|(\Theta) \le 2\|\delta\|$. Similarly, $|N_\delta| = |\int g(\theta)\tilde{f}(x|\theta)\,\delta(d\theta)| \le 2a\|\delta\|$, where $a = \sup_\theta g(\theta)\tilde{f}(x|\theta)$. Finally, note that $D_\delta$ goes to zero as δ goes to zero, so that $|D_P + D_\delta|$ goes to $D_P$. Q.E.D.

Proof of Theorem 2. Take $\|\delta\| = 1$. By Theorem 1, we have that

$$D_P T_g'(\delta) = N_\delta - \rho D_\delta = \int h(\theta)\,\delta(d\theta) = \int_{\Theta_+} h(\theta)\,\delta^+(d\theta) - \int_{\Theta_-} h(\theta)\,\delta^-(d\theta),$$

where $\Theta_+$ is the subset of Θ such that $\delta^+(A) = \delta(\Theta_+ \cap A)$ for all A, $\delta = \delta^+ - \delta^-$ is the Hahn-Jordan decomposition of δ, and $\Theta_- = \Theta_+^c$. Since $\inf g \le \rho \le \sup g$, we have that $\overline{h} \ge 0$ and $\underline{h} \le 0$. Now, we may write the last expression as $Y_1 + Y_2 + Y_3 + Y_4$, where

$$Y_1 = \int_{\Theta_+} h^+\,d\delta^+,\quad Y_2 = -\int_{\Theta_+} h^-\,d\delta^+,\quad Y_3 = -\int_{\Theta_-} h^+\,d\delta^-,\quad Y_4 = \int_{\Theta_-} h^-\,d\delta^-,$$

and $h^+$ and $h^-$ are the positive and negative parts of h. Now, since δ has zero mass and norm one, we conclude that $\delta(\Theta_+) = |\delta(\Theta_-)| = 1$. Thus,

$$Y_1 + Y_4 \le \overline{h}\,\delta(\Theta_+) + (-\underline{h})|\delta(\Theta_-)| = \overline{h} - \underline{h}.$$

Similarly,

$$-(Y_2 + Y_3) \le (-\underline{h})\,\delta(\Theta_+) + \overline{h}\,|\delta(\Theta_-)| = \overline{h} - \underline{h}.$$

Hence, $D_P\|T_g'\| \le |\overline{h} - \underline{h}|$. Now we show that we can get arbitrarily close to this upper bound. Choose ε such that $b > \varepsilon > 0$, and $\theta_1$ and $\theta_2$ such that $\overline{h} - \varepsilon < h(\theta_1) \le \overline{h}$ and $\underline{h} \le h(\theta_2) < \underline{h} + \varepsilon$. Let $\delta_i$ be a point mass at $\theta_i$, $i = 1, 2$, and let $\delta = \delta_1 - \delta_2$. Then, using the expression above, we have that

$$D_P|T_g'(\delta)| = |h(\theta_1) - h(\theta_2)| = (\overline{h} - \underline{h}) - a,$$

where $0 \le a < 2\varepsilon$. This holds for every ε > 0; hence the result follows. Q.E.D.

ACKNOWLEDGEMENTS

The authors thank an associate editor and two referees for helpful comments.

REFERENCES

Berger, J. (1984). The robust Bayesian viewpoint (with discussion). Robustness in Bayesian Statistics (J. Kadane, ed.), North-Holland, Amsterdam.
Berger, J. (1985). Statistical Decision Theory. Second Edition. Springer-Verlag, New York.
Berger, J. (1986). Comment on: "On the consistency of Bayes estimates" by P. Diaconis and D. Freedman. Ann. Statist., 14, 30-37.
Berger, J. (1990). Robust Bayesian analysis: Sensitivity to the prior. J. Statist. Plann. Inference, 25, 303-328.
Cuevas, A., and Sanz, P. (1988). On differentiability properties of Bayes operators. Bayesian Statistics 3 (J.M. Bernardo, M.H. DeGroot, D.V. Lindley, and A.F.M. Smith, eds.), Oxford University Press, 569-577.
Diaconis, P., and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14, 1-67.
Ruggeri, F. (1990). Posterior ranges of functions of parameters under priors with specified quantiles. Comm. Statist. Theory Methods, 19(1), 127-144.
Ruggeri, F. (1991). Robust Bayesian analysis given a lower bound on the probability of a set. Comm. Statist. Theory Methods, 20, 1881-1891.
Ruggeri, F., and Wasserman, L. (1991). Density based classes of priors: Infinitesimal properties and approximations. Technical Report 528, Department of Statistics, Carnegie Mellon University.
Sivaganesan, S. (1993). Robust Bayesian diagnostics. J. Statist. Plann. Inference. In press.
Sivaganesan, S., and Berger, J. (1989). Ranges of posterior measures for priors with unimodal contaminations. Ann. Statist., 17, 868-889.
Srinivasan, C., and Truszczynska, H. (1990). On the ranges of posterior quantities. Technical Report 294, Department of Statistics, University of Kentucky.
Strasser, H. (1985). Mathematical Theory of Statistics. Walter de Gruyter, Berlin.
Wasserman, L. (1989). A robust Bayesian interpretation of likelihood regions. Ann. Statist., 17, 1387-1393.
Wasserman, L. (1992). Recent methodological advances in robust Bayesian inference. Bayesian Statistics 4: Proceedings of the Fourth Valencia International Meeting on Bayesian Statistics. Clarendon Press, Oxford, 483-503.

Received 15 November 1991 Revised 2 November 1992 Accepted 25 January 1993

Consiglio Nazionale delle Ricerche
Istituto per le Applicazioni della Matematica e dell'Informatica
Via A.M. Ampère 56
I-20131 Milano
Italy

Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213
U.S.A.