Empirical Processes Based On Pseudo-Observations


KILANI GHOUDI AND BRUNO REMILLARD

Dedicated to Miklós Csörgő on his 65th birthday

Abstract. Usually, empirical distribution functions are used to estimate the theoretical distribution function of known functions $\theta(X)$ of the observable random variable $X$. In practice, many researchers use empirical distribution functions constructed from residuals, which are estimates of the non-observable error terms in linear models. This falls under a class of more general problems in which one is interested in the estimation of the distribution function of a non-observable random variable $\theta(Q, X)$ depending on an observable random variable $X$ together with its unknown law $Q$. When $Q$ is estimated by some $Q_n$, the quantities $\theta(Q_n, X_i)$ are called pseudo-observations. Some work has been done recently when the pseudo-observations are the so-called residuals of linear models. The aim of this paper is to provide some tools to study the asymptotic behavior of empirical processes constructed from general pseudo-observations. Examples of pseudo-observations are given together with applications to copulas, weighted symmetry, regression and other statistical concepts.

1991 Mathematics Subject Classification: 60F05, 62E20. Key words and phrases: empirical processes, pseudo-observations, residuals, weak convergence. The authors are supported in part by the Fonds Institutionnel de Recherche, Université du Québec à Trois-Rivières, the Fonds pour la formation de chercheurs et l'aide à la recherche du Gouvernement du Québec, and by the Natural Sciences and Engineering Research Council of Canada, Grants No. OGP0042137 and No. OGP0184065.

1. Introduction

In a regression setting, including the study of time series, the series of residuals is used to construct empirical distribution functions for goodness-of-fit tests, for prediction intervals, and so on. Empirical processes built from residuals have received considerable attention lately; see for example [Loynes 1980, Shorack 1984, Meester and Lockhart 1988, Koul and Ossiander 1994, Koul 1996, Kulperger 1996, Mammen 1996].

In a more general setting, consider the problem of approximating the distribution function of a non-observable random variable $\theta(Q, X)$ depending on an observable random variable $X$ together with its unknown law $Q$. When $Q$ is estimated by some $Q_n$, the quantities $\theta(Q_n, X_i)$ are called pseudo-observations; obviously they are a generalization of residuals.

In [Genest and Rivest 1993], pseudo-observations were used to approximate the distribution function of $H(X)$, where $H$ is the distribution function of the bivariate random vector $X = (X^{(1)}, X^{(2)})$. Their estimation procedure was the following.

Take a random sample $(X_1, \ldots, X_n)$ and define the pseudo-observations $V_{i,n}$ via
$$V_{i,n} = \#\{X_j : X_j^{(1)} < X_i^{(1)},\ X_j^{(2)} < X_i^{(2)}\}/(n-1).$$
If $K_n$ is the empirical distribution function of the $V_{i,n}$'s, then $K_n$ converges almost surely to $K$, where $K(v) = P(H(X) \le v)$.

They defined Kendall's process as $\mathbb{K}_n(v) = \sqrt{n}\,\{K_n(v) - K(v)\}$. It was shown that $\mathrm{Var}\{\mathbb{K}_n(v)\}$ is asymptotic to
$$K(v)\{1 - K(v)\} + k(v)\{k(v)R(v) - 2v + 2vK(v)\},$$
where
$$R(v) = E\left\{H\bigl(X_1^{(1)} \wedge X_2^{(1)},\, X_1^{(2)} \wedge X_2^{(2)}\bigr) \,\middle|\, V_1 = V_2 = v\right\} - v^2.$$
The convergence of Kendall's process to a continuous Gaussian process was proven in [Barbe et al. 1996].

Another application of pseudo-observations is hypothesis testing. Many test statistics of the form $\frac{1}{n}\sum_{i=1}^n H_n(X_i)$ (e.g. Kendall's tau, Spearman's rho, Wilcoxon signed-rank) are used to test the hypothesis that the distribution function of $H(X)$ is a given $K$, where $H_n$ is a consistent estimate of $H$, and where $H$ may depend on the law of the $X_i$'s. Unfortunately, the null hypothesis will be accepted in general whenever the expectation of $H(X)$ coincides with that of the random variable $V$ having distribution function $K$.

More efficient tests should be based on the empirical distribution function $K_n$ constructed from the pseudo-observations $H_n(X_i)$, rather than on the mean of these pseudo-observations. Note that
$$\frac{1}{n}\sum_{i=1}^n H_n(X_i) - E(V) = -\int_{-\infty}^{\infty} \{K_n(t) - K(t)\}\,dt,$$
so if $\mathbb{K}_n = \sqrt{n}\,\{K_n - K\}$ converges to a continuous process $\mathbb{K}$, then $\sqrt{n}\,\bigl\{\frac{1}{n}\sum_{i=1}^n H_n(X_i) - E(V)\bigr\}$ will converge in general to $-\int_{-\infty}^{\infty} \mathbb{K}(t)\,dt$. In particular, Kolmogorov-Smirnov or Cramér-von Mises type statistics based on $\mathbb{K}_n$ should be more efficient in general than a test based on $\sqrt{n}\,\bigl\{\frac{1}{n}\sum_{i=1}^n H_n(X_i) - E(V)\bigr\}$.

This was the approach taken in [Abdous, Ghoudi and Remillard 1997] to test the hypothesis of weighted-symmetry, a notion generalizing the classical notion of symmetry. For more details on weighted-symmetry, see [Abdous and Remillard 1995].

The aim of this paper is to provide some tools to study the asymptotic behavior of empirical processes constructed from general pseudo-observations. More precisely, one wants to give conditions under which $\mathbb{K}_n$ converges to a continuous process.
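As a concrete numerical illustration of the objects just introduced (an illustration written for this transcript, not part of the original paper), the following Python sketch simulates an independent bivariate sample, forms the pseudo-observations $V_{i,n}$ of Genest and Rivest, and checks the displayed identity relating the mean of the pseudo-observations to $-\int \{K_n - K\}\,dt$. Under independence and $d = 2$ one has $K(t) = t\{1 + \log(1/t)\}$ (formula (3.1) below) and $E(V) = 1/4$; the sample size and grid are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Independent bivariate sample, so the law K of H(X) is known explicitly.
x = rng.random((n, 2))

# Pseudo-observations of Genest and Rivest (1993):
# V_{i,n} = #{j : X_j^(1) < X_i^(1), X_j^(2) < X_i^(2)} / (n - 1).
v = np.array([np.sum((x[:, 0] < x[i, 0]) & (x[:, 1] < x[i, 1]))
              for i in range(n)]) / (n - 1)

# Under independence and d = 2, K(t) = t{1 + log(1/t)} (formula (3.1) below),
# and E(V) = 1/4, i.e. Kendall's tau = 4 E(V) - 1 = 0.
t = np.linspace(1e-6, 1.0, 4001)
dt = t[1] - t[0]
K = t * (1.0 - np.log(t))
Kn = np.searchsorted(np.sort(v), t, side="right") / n

# Identity from the Introduction: mean of the pseudo-observations minus E(V)
# equals -int {K_n(t) - K(t)} dt; both sides agree up to grid error.
print(v.mean() - 0.25, -np.sum(Kn - K) * dt)

# Kendall's process and a Cramer-von Mises type statistic based on it.
IKn = np.sqrt(n) * (Kn - K)
print("Cramer-von Mises statistic:", np.sum(IKn ** 2) * dt)
```

A Kolmogorov-Smirnov type statistic would use $\sup_t |\mathbb{K}_n(t)|$ instead of the integrated square.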

Notations and sufficient conditions for the convergence of the process are stated in Section 2. Examples of pseudo-observations are given in Section 3, with applications to copulas, weighted-symmetry, regression and other statistical concepts. A sketch of the proof of the main result is then given in Section 4. The details of the argument can be found in Sections 5 and 6. In Section 7, more tractable conditions are given for some linear models in order to satisfy the hypotheses of Section 2.

2. Notations and main result

Consider an observable $\mathcal{X}$-valued random variable $X$ and let $\{X_i\}_{i \ge 1}$ be observations of $X$ such that the series is stationary and ergodic and such that the (non-observable) $\{\varepsilon_i = H(X_i)\}_{i \ge 1}$ are random variables with values in an interval $T$ of $\mathbb{R}$. In many cases, $H$ depends on the law $Q$ of $X$.

Given an estimate $H_n$ of $H$, the pseudo-observations $\{e_{i,n}\}_{1 \le i \le n}$ are defined by
$$e_{i,n} = H_n(X_i), \qquad 1 \le i \le n.$$
The empirical distribution $K_n$ based on these pseudo-observations is then defined by
$$K_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{e_{i,n} \le t\}, \qquad t \in T.$$
Let $K$ be the distribution function of the random variable $\varepsilon = H(X)$ taking values in $T$ with probability one, where $T$ is an interval of $\mathbb{R}$.

In order to establish the weak convergence of the process $\mathbb{K}_n(t) = \sqrt{n}\,\{K_n(t) - K(t)\}$, the following assumptions will be made.

Suppose that $\mathcal{X}$ is a complete and separable metric space. Let $r$ be a continuous positive function from $\mathcal{X}$ to $\mathbb{R}$ such that $\inf_{x \in \mathcal{X}} r(x) > 0$. Further let $C_r$ be a closed subset of the Banach space of all continuous functions $f$ from $\mathcal{X}$ to $\mathbb{R}$ such that $\|f\|_r = \sup_{x \in \mathcal{X}} |f(x)/r(x)|$ is finite. The condition $\inf_{x \in \mathcal{X}} r(x) > 0$ ensures that constant functions may belong to $C_r$. Note that when $\mathcal{X}$ is compact, the norm $\|\cdot\|_r$ is equivalent to the supremum norm.

Hypothesis I. Suppose that the law of $\varepsilon = H(X)$ admits a density $k(\cdot)$ on $T$ which is bounded on every compact subset of $T$, and that there exists a version of the conditional distribution of $X$ given $H(X) = t$, denoted by $P_t$, such that for any $f = rg \in C_r$, any continuous $\varphi$ on $\mathbb{R}$ and any continuous and bounded $\psi$ on $\mathbb{R}$, the mapping
$$t \mapsto \mu\bigl(t, (\varphi \circ g)(\psi \circ r)\bigr) = k(t)\, E\{\varphi(g(X))\,\psi(r(X)) \mid H(X) = t\}$$
is continuous on $T$. Finally suppose that for any compact subset $C$ of $T$,
$$\lim_{M \to \infty} \int_M^{\infty} \sup_{s \in C} P_s(r(X) > u)\,du = 0. \tag{2.1}$$

Hypothesis II. There exists a continuous version $\tilde{H}_n$ of $H_n$ such that $\sqrt{n}\,\sup_{x \in \mathcal{X}} |\tilde{H}_n(x) - H_n(x)|/r(x)$ converges in probability to zero.

Suppose also that for any $f = rg \in C_r$ and any continuous $\psi$ on $\mathbb{R}$ with $0 \le \psi \le 1$, the process
$$\alpha_{n,\psi \circ g}(s, t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\psi(g(X_i))\,\mathbb{1}\{\varepsilon_i \le t + s\,r(X_i)\} - E\{\psi(g(X))\,\mathbb{1}\{\varepsilon \le t + s\,r(X)\}\}\bigr]$$
is such that for any compact subset $C$ of $T$ and for $s \in \mathbb{R}$,
$$\sup_{t \in C} \bigl|\alpha_{n,\psi \circ g}(s/\sqrt{n}, t) - \alpha_{n,\psi \circ g}(0, t)\bigr| \tag{2.2}$$
converges in probability to zero. Finally suppose that, if $\alpha_n(t) = \alpha_{n,1}(0,t)$ and $\mathbb{H}_n = \sqrt{n}\,(\tilde{H}_n - H)$, then $(\alpha_n, \mathbb{H}_n)$ converges in $C(\mathbb{R}) \times C_r$ to a process $(\alpha, \mathbb{H})$.

Let $\mathcal{Q}$ be the set of all functions $q$ defined in a positive neighborhood of zero such that $q$ is positive and increasing, $q(t)/t$ is decreasing, and $q(2t)/q(t)$ is bounded above.

Hypothesis III. If $t_* = \inf T$ is finite and does not belong to $T$, then there exist $q_* \in \mathcal{Q}$ and a sequence $\{t_n\}$ of positive numbers decreasing to zero such that $\lim_{n \to \infty} \sqrt{n}\,K(t_* + t_n) = 0$, $\lim_{n \to \infty} q_*(t_n)/(t_n\sqrt{n}) = 0$, $\lim_{t \downarrow 0} k(t_* + t)\,q_*(t) = 0$, and the sequence
$$\Bigl\{\sup_{x :\, H(x) - t_* > t_n} |\mathbb{H}_n(x)|/q_*(H(x) - t_*)\Bigr\}_{n \ge 1}$$
is tight.

If $t^* = \sup T$ is finite and does not belong to $T$, then there exist $q^* \in \mathcal{Q}$ and a sequence $\{s_n\}$ of positive numbers decreasing to zero such that $\lim_{n \to \infty} \sqrt{n}\,\{1 - K(t^* - s_n)\} = 0$, $\lim_{n \to \infty} q^*(s_n)/(s_n\sqrt{n}) = 0$, $\lim_{t \downarrow 0} k(t^* - t)\,q^*(t) = 0$, and the sequence
$$\Bigl\{\sup_{x :\, t^* - H(x) > s_n} |\mathbb{H}_n(x)|/q^*(t^* - H(x))\Bigr\}_{n \ge 1}$$
is tight.

Remark 2.1. It follows easily from Hypothesis I that the mappings $t \mapsto \mu(t, r)$ and $t \mapsto \mu(t, f)$ are continuous on $T$ for any $f \in C_r$. When $H_n$ and $H$ are distribution functions, the tightness of the sequences in Hypothesis III can possibly be proven using strong approximation techniques (e.g. [Csörgő and Révész 1981]) or using Theorem 2.4 in [Alexander 1987].

With these notations, the main result of this paper is stated in the following way.

Theorem 2.1. Under Hypotheses I and II above, the empirical process $\mathbb{K}_n$ converges in $D(T)$ to a continuous process $\mathbb{K}$ with representation
$$\mathbb{K}(t) = \alpha(t) - \mu(t, \mathbb{H}). \tag{2.3}$$
If in addition Hypothesis III holds true, then $\mathbb{K}_n$ converges in $D(\mathbb{R})$ to a continuous process having representation (2.3) on $T$, and vanishing outside $T$.

3. Examples of Application

In this section, examples of application of the above result are presented. The first subsection presents an example dealing with dependence or copula functions. The second subsection is devoted to the weighted-symmetry process, which is a generalization of the classical symmetry process. The application given in the third subsection generalizes the Mann-Whitney statistic to what is called the Mann-Whitney process. The last application, presented in the fourth subsection, deals with the weak convergence of empirical processes constructed from the residuals of a linear model.

3.1. Copulas and Kendall's and Spearman's processes. Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution function $H$ and marginal distributions $F_1, \ldots, F_d$.

Specification of a dependence model is equivalent to the specification of the copula (dependence function) through the relation
$$H(x) = C\{F_1(x^{(1)}), \ldots, F_d(x^{(d)})\},$$
where the copula $C$ is a distribution function concentrated on the unit cube $[0,1]^d$ with uniform marginals.

A challenging inference problem is to choose a family of copulas instead of parameters of a given family. Among the copulas, there are natural families. As defined in [Genest and MacKay 1986], a copula is Archimedean if
$$C(x) = \phi^{-1}\!\left(\sum_{i=1}^d \phi(x^{(i)})\right),$$
where $\phi(1) = 0$ and $(-1)^i \frac{d^i}{dt^i}\phi^{-1}(t) > 0$ on $(0, \infty)$, $i = 1, \ldots, d$; $\phi$ is unique up to a constant. For example, the Gumbel ($\phi_\alpha(t) = \log^{1/\alpha}(1/t)$, $0 < \alpha \le 1$), Ali-Mikhail-Haq ($\phi_\alpha(t) = \frac{1}{1-\alpha}\log\bigl(\frac{1-\alpha+\alpha t}{t}\bigr)$, $0 < \alpha \le 1$), Clayton ($\phi_\alpha(t) = (t^{-\alpha} - 1)/\alpha$, $\alpha \ge 0$), and Frank ($\phi_\alpha(t) = -\log\bigl(\frac{1-\alpha^t}{1-\alpha}\bigr)$, $0 < \alpha < \infty$) are families of Archimedean copulas.

But the Farlie-Gumbel-Morgenstern class of distributions, often used in practice to model small departures from independence (e.g. [de la Horra and Fernandez 1995]), does not belong to the Archimedean family. Its associated copula is of the form
$$C(x) = x^{(1)}x^{(2)} + \alpha\, x^{(1)}x^{(2)}(1 - x^{(1)})(1 - x^{(2)}), \qquad |\alpha| \le 1.$$
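The following minimal sketch (an illustration written for this transcript, assuming $d = 2$ and the parametrizations quoted above) codes the Clayton generator and its inverse, builds the corresponding Archimedean copula $C(u,v) = \phi^{-1}\{\phi(u) + \phi(v)\}$, and codes the Farlie-Gumbel-Morgenstern copula for comparison. The checks verify the uniform-marginal property $C(u, 1) = u$.

```python
import numpy as np

def phi_clayton(t, a):          # Clayton generator, a > 0
    return (t ** (-a) - 1.0) / a

def phi_clayton_inv(s, a):      # inverse generator
    return (1.0 + a * s) ** (-1.0 / a)

def C_clayton(u, v, a):
    # Archimedean construction: C(u, v) = phi_inv(phi(u) + phi(v)).
    return phi_clayton_inv(phi_clayton(u, a) + phi_clayton(v, a), a)

def C_fgm(u, v, alpha):
    # Farlie-Gumbel-Morgenstern copula, |alpha| <= 1; not Archimedean.
    return u * v + alpha * u * v * (1.0 - u) * (1.0 - v)

u = np.linspace(0.01, 0.99, 5)
# Copula sanity checks: uniform marginals, C(u, 1) = u.
print(np.allclose(C_clayton(u, 1.0, 2.0), u))   # True
print(np.allclose(C_fgm(u, 1.0, 0.5), u))       # True
```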

For Archimedean copulas, the distribution function $K$ of $\varepsilon = H(X)$ is easy to compute; in the two-dimensional case, one obtains
$$K(t) = t - \lambda(t),$$
where $\lambda = \phi/\phi'$. Hence $\phi$ is determined by $K$ via
$$\phi(t) = \exp\left(\int_{0.5}^t \frac{ds}{s - K(s)}\right).$$
Thus an estimate of $K$ yields an estimate of $\phi$. One can also be interested in estimating the distribution function of $C(X)$ for a non-Archimedean copula $C$. That problem was first studied by [Genest and Rivest 1993] in a paper where they introduced Kendall's process for bivariate vectors.

Let $X_1, \ldots, X_n$ be a random sample from a continuous multivariate distribution $H(x) = P(X \le x)$ with marginals $F_1, \ldots, F_d$. Let $H_n$ be the empirical distribution of the $X_i$'s and consider the pseudo-observations $e_{i,n} = H_n(X_i)$. First observe that $e_{i,n}$ represents the proportion of observations $X_j$ such that $X_j \le X_i$ component-wise, namely
$$e_{i,n} = \#\{j \le n : X_j \le X_i\}/n.$$
Let $K_n$ be the empirical distribution function based on the pseudo-observations $e_{i,n}$, and let $K$ be the distribution function of the random variable $\varepsilon = H(X)$, taking values in $(0,1]$. Kendall's process is then given by $\mathbb{K}_n(t) = \sqrt{n}\,\{K_n(t) - K(t)\}$. One can easily verify that $\mathbb{K}_n(t)$ does not depend on the marginals of $X$. Hence if the marginals are continuous, one can assume without loss of generality that they are uniformly distributed. Therefore, it is assumed in the rest of this subsection that $X$ has uniform marginals, meaning that $H$ is assumed to be a copula.

Now, suppose that the following two conditions are satisfied. First, the distribution function $K(t)$ admits a continuous density $k(t)$ on $(0,1]$ that verifies $k(t) = o\{t^{-1/2}\log^{-1/2-\epsilon}(1/t)\}$ for some $\epsilon > 0$ as $t \to 0$. Second, there exists a version of the conditional distribution of the vector $(X^{(1)}, \ldots, X^{(d)})$ given $H(X) = t$ such that for any continuous real-valued function $f$ on $[0,1]^d$, the mapping
$$t \mapsto \mu(t, f) = k(t)\,E\bigl[f\{X^{(1)}, \ldots, X^{(d)}\} \mid H(X) = t\bigr]$$
is continuous on $(0,1]$ with $\mu(1, f) = k(1)f(1, \ldots, 1)$.

Note that the second condition is in fact a condition on the (unique) copula $C$ associated with $H$, and that these conditions imply Hypotheses I, II and III.

Take $\mathcal{X} = [0,1]^d$, $r \equiv 1$ and let $C_r$ be the Banach space of continuous real-valued functions on $\mathcal{X}$.
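The integral representation of $\phi$ above translates into a simple numerical recipe: plug an estimate of $K$ (in practice $K_n$) into $\phi(t) = \exp\{\int_{0.5}^t ds/(s - K(s))\}$. The sketch below (our illustration, not from the paper) uses the known Clayton case, where $K(t) = t + t(1 - t^\alpha)/\alpha$, to check that the recipe recovers the generator up to the multiplicative constant.

```python
import numpy as np
from scipy.integrate import quad

a = 2.0
K = lambda t: t + t * (1.0 - t ** a) / a         # Clayton's K, known here
phi_true = lambda t: (t ** (-a) - 1.0) / a       # Clayton generator

def phi_from_K(t):
    # phi(t) = exp( int_{0.5}^t ds / (s - K(s)) ); in practice K would be
    # replaced by the empirical estimate K_n.
    val, _ = quad(lambda s: 1.0 / (s - K(s)), 0.5, t)
    return np.exp(val)

# phi is unique up to a multiplicative constant, so compare after rescaling:
# phi_from_K(t) * phi_true(0.5) should match phi_true(t).
for t in (0.2, 0.4, 0.6, 0.8):
    print(t, phi_from_K(t) * phi_true(0.5), phi_true(t))
```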

It follows from well-known results on empirical distribution functions that Hypothesis II is satisfied.

It follows from Theorem 2.1 that the empirical process $\mathbb{K}_n(t) = \sqrt{n}\,\{K_n(t) - K(t)\}$, taking values in the space $D$ of càdlàg functions from $[0,1]$ to $\mathbb{R}$, converges to a continuous Gaussian process $\mathbb{K}$ with zero mean and covariance function
$$\Gamma(s,t) = K(s \wedge t) - K(s)K(t) + k(s)k(t)R(s,t) - k(t)Q(s,t) - k(s)Q(t,s),$$
where
$$Q(s,t) = P\{H(X_1) \le s,\ X_1 \le X_2 \mid H(X_2) = t\} - tK(s)$$
and
$$R(s,t) = P\{X_1 \le X_2 \wedge X_3 \mid H(X_2) = s,\ H(X_3) = t\} - st,$$
$0 \le s, t \le 1$, where $u \wedge v$ denotes the component-wise minimum of $u$ and $v$.

In particular, $Q(t,t) = t\{1 - K(t)\}$ and hence
$$\Gamma(t,t) = K(t)\{1 - K(t)\} + k(t)\bigl[k(t)R(t,t) - 2t\{1 - K(t)\}\bigr].$$
Note that $\int_0^1 t\,dK_n = \frac{1}{n}\sum_{i=1}^n e_{i,n}$ is an affine transformation of Kendall's measure of association in arbitrary dimension $d \ge 2$ (e.g. [Joe 1990]), hence the name Kendall's process. For more details on that process, see [Barbe et al. 1996], where the convergence was proven under stronger conditions.

Note also that if $X$ has independent components, then $K$ is given by
$$K(t) = t\left\{\sum_{j=0}^{d-1} \frac{\{\log(1/t)\}^j}{j!}\right\}. \tag{3.1}$$

Remark 3.1. One may also define Spearman's process in the following way. Let $H(x) = \prod_{j=1}^d F_j(x^{(j)})$, and set
$$H_n(x) = \prod_{j=1}^d H_n^{(j)}(x^{(j)}),$$
where $H_n^{(j)}$ is the empirical distribution function of the $X_i^{(j)}$'s. Then an extension of Spearman's rho is given by an affine transformation of the mean of the pseudo-observations $H_n(X_i)$. Under the same conditions as above, one obtains that $\mathbb{K}_n$ converges to a continuous centered Gaussian process. Moreover, under the independence hypothesis, $K$ is also given by (3.1). However, $K$ will be different in general, since it is the law of $\prod_{j=1}^d F_j(X^{(j)})$ instead of the law of $H(X)$.
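Formula (3.1) is easy to check by simulation; the following short sketch (with illustrative values of $d$ and the sample size) compares the empirical distribution function of $\prod_j U_j$ for independent uniforms with the right-hand side of (3.1).

```python
import numpy as np
from math import factorial, log

# Monte Carlo check of (3.1): for independent uniform components,
# P( prod_j U_j <= t ) = t * sum_{j=0}^{d-1} log(1/t)^j / j!.
rng = np.random.default_rng(1)
d, n = 4, 200_000
prod = rng.random((n, d)).prod(axis=1)

def K_indep(t, d):
    return t * sum(log(1.0 / t) ** j / factorial(j) for j in range(d))

for t in (0.05, 0.2, 0.5, 0.8):
    print(t, np.mean(prod <= t), K_indep(t, d))  # empirical vs. (3.1)
```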

3.2. Weighted-symmetry. Let $Y$ be a random variable with continuous distribution function $F$ on $\mathbb{R}$. For $\omega > 0$, set
$$|y|_\omega = \begin{cases} y & \text{if } y \ge 0, \\ -\omega y & \text{if } y < 0, \end{cases} \qquad \mathrm{sign}(y) = \begin{cases} 1 & \text{if } y > 0, \\ 0 & \text{if } y = 0, \\ -1 & \text{if } y < 0. \end{cases}$$
Following [Abdous and Remillard 1995], $Y$ is $(p,\omega)$-symmetric about $\theta$ if and only if $\mathrm{sign}(Y - \theta)$ and $|Y - \theta|_\omega$ are independent and $P(Y > \theta) = p \in (0,1)$. This is equivalent to saying that
$$p\,P(Y - \theta < -t) = (1-p)\,P(Y - \theta > \omega t), \qquad t > 0.$$
In that paper they introduced an extension of the Wilcoxon signed-rank statistic, namely
$$AR_n = \frac{1}{n}\sum_{i=1}^n \mathrm{sign}(Y_i - \theta)\,\mathrm{rank}(|Y_i - \theta|_\omega),$$
where $Y_1, \ldots, Y_n$ are i.i.d. observations of $Y$.

For simplicity, suppose that $\theta = 0$. Recalling the motivations listed in the Introduction, it is natural to define the pseudo-observations $e_{i,n}$ as follows:
$$e_{i,n} = \mathrm{sign}(Y_i)\,G_n(|Y_i|_\omega),$$
where $G_n$ is the empirical distribution function of the $|Y_i|_\omega$'s. Note that $G_n(y)$ converges to $G(y) = F(y) - F(-y/\omega)$, uniformly for $y \in [0,\infty)$.

Now, write $e_{i,n} = e_i\,G_n(|Y_i|_\omega)$, where $e_i = \mathrm{sign}(Y_i)$, $1 \le i \le n$.

Let $\mathcal{X} = \{x = (e,u) : e = \pm 1,\ u \in [0,\infty)\} = \{-1,1\} \times [0,\infty)$, $r \equiv 1$, and $C_r = \{f(x) = f(e,u) = e\,g(u) :\ g \text{ continuous on } [0,\infty),\ g(0) = 0 \text{ and } \lim_{u \to \infty} g(u) = 0\}$. Further let $H_n(x) = e\,G_n(u)$ and $H(x) = H(e,u) = e\,G(u)$.

Under the hypothesis of weighted-symmetry, one gets
$$K(t) = R(t,p) = \begin{cases} 1 - p + pt & \text{if } 0 < t \le 1, \\ (1-p)(1+t) & \text{if } -1 \le t \le 0. \end{cases}$$
Hence one can take $T = [-1,1]$, and the density $k$ is bounded on $T$ and continuous on $T \setminus \{0\}$. If $f(e,u) = e\,g(u)$, then $\mu(t,f) = k(t)\,\mathrm{sign}(t)\,g\{G^{-1}(|t|)\}$, which is clearly continuous if $G^{-1}$ is continuous, because $g$ is continuous and vanishes at $0$. Thus Hypothesis I is satisfied.

Next, $\mathbb{H}_n(e,u)$ converges to $e\,B \circ G(u)$, where $B$ is a Brownian bridge. Moreover, it is easy to check that
$$\alpha_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\mathbb{1}\{e_i\,G(|Y_i|_\omega) \le t\} - R(t,p)\bigr]$$
converges to
$$\begin{cases} -\sqrt{1-p}\;B_1(-t) - (1+t)\sqrt{p(1-p)}\;Z & \text{if } -1 \le t \le 0, \\ \sqrt{p}\;B_2(t) - (1-t)\sqrt{p(1-p)}\;Z & \text{if } 0 \le t \le 1, \end{cases}$$
where $B_1$ and $B_2$ are independent Brownian bridges, which are also independent of $Z$, a standard Gaussian random variable. It is easy to see that $B = \sqrt{1-p}\;B_1 + \sqrt{p}\;B_2$.

Thus Hypothesis II is also satisfied, and it follows that $\mathbb{K}_n$ converges to
$$\mathbb{K}(t) = \sqrt{p(1-p)}\;\bigl[\sqrt{1-p}\;B_2(|t|) - \sqrt{p}\;B_1(|t|) - (1-|t|)Z\bigr] = \sqrt{p(1-p)}\;\{B_3(|t|) - (1-|t|)Z\}, \qquad -1 \le t \le 1,$$
where $B_3 = \sqrt{1-p}\;B_2 - \sqrt{p}\;B_1$ is a Brownian bridge. Finally, one can check that $W(t) = B_3(1-t) - tZ$ is a Wiener process on $[0,1]$. Hence
$$\mathbb{K}(t) = \sqrt{p(1-p)}\;W(1-|t|), \qquad t \in [-1,1].$$
As a by-product, if $p$ is estimated by $p_n = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{Y_i > 0\}$, then $\sqrt{n}\,\{K_n(t) - R(t,p_n)\}/\sqrt{p_n(1-p_n)}$ converges to $B_3(|t|)$.

Parameters $\theta$ and $\omega$ can be estimated consistently by the Hodges-Lehmann method applied to the statistic $AR_n$, if one assumes additional conditions on the distribution of $Y$. For more details, see [Abdous, Ghoudi and Remillard 1997].
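To make the construction concrete, the sketch below (our illustration; the data-generating mechanism is a hypothetical $(p,\omega)$-symmetric model with $\theta = 0$ and uniform $G$) computes the pseudo-observations $e_{i,n} = \mathrm{sign}(Y_i)\,G_n(|Y_i|_\omega)$ and compares $K_n$ with $R(t,p)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, omega, p = 1000, 2.0, 2.0 / 3.0

# A (p, omega)-symmetric sample about 0: the sign has P(Y > 0) = p and is
# drawn independently of |Y|_omega, which is taken uniform on (0, 1).
u = rng.random(n)
e = np.where(rng.random(n) < p, 1.0, -1.0)
y = np.where(e > 0, u, -u / omega)       # then |y|_omega = u exactly

abs_w = np.where(y >= 0, y, -omega * y)  # |y|_omega
Gn = np.searchsorted(np.sort(abs_w), abs_w, side="right") / n
pseudo = np.sign(y) * Gn                 # e_{i,n} = sign(Y_i) G_n(|Y_i|_omega)

def R(t, p):
    # Weighted-symmetry law K(t) = R(t, p) on [-1, 1].
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, 1.0 - p + p * t, (1.0 - p) * (1.0 + t))

t = np.linspace(-1.0, 1.0, 9)
Kn = np.mean(pseudo[:, None] <= t[None, :], axis=0)
print(np.max(np.abs(Kn - R(t, p))))      # small for large n
```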

3.3. Mann-Whitney process. In this case, suppose there are two independent samples: $Z_i$, $1 \le i \le n$, with continuous distribution function $F$, and $Y_i$, $1 \le i \le m_n$, with continuous distribution function $G$. Suppose also that they have the same support.

Let $G_m$ be the empirical distribution of the $Y_i$'s and let $H_n(x) = G_m(x)$. Define the pseudo-observations $e_{i,n} = H_n(Z_i) = G_m(Z_i)$ and let $\varepsilon = H(Z) = G(Z)$. Observe that $K(t) = F \circ G^{-1}(t)$ and that the mean of the pseudo-observations is the Mann-Whitney statistic.

Set $\mathcal{X} = \mathbb{R}$, $r \equiv 1$ and $C_r = C_0(\mathbb{R})$, the set of continuous functions vanishing at infinity.

If $n/(n + m_n)$ converges to $\lambda \in (0,1)$, then it is easy to see that Hypothesis II is satisfied and that $(\alpha_n, \mathbb{H}_n)$ converges to $\bigl(B_1 \circ K,\ \sqrt{\lambda/(1-\lambda)}\;B_2 \circ H\bigr)$, where $B_1$ and $B_2$ are independent Brownian bridges. Note also that for any $\varphi \in C_r$, $\mu(t, \varphi) = k(t)\,\varphi\{G^{-1}(t)\}$.

Under the hypothesis $F = G$, $K(t) = t$ for all $t \in T = [0,1]$ and the density $k \equiv 1$ is continuous on $T$. It follows that Hypothesis I holds. Thus Theorem 2.1 applies and $\mathbb{K}_n$ converges to $B_1 - \sqrt{\lambda/(1-\lambda)}\;B_2$. Therefore $\{K_n - K\}/\sqrt{\frac{1}{n} + \frac{1}{m_n}}$ converges to $\sqrt{1-\lambda}\;B_1 - \sqrt{\lambda}\;B_2$, which is a Brownian bridge.

Note that this proves that the Kolmogorov-Smirnov statistic and the Mann-Whitney statistic are functionals of the same empirical process.

If $F \ne G$, then one must suppose that the densities of $F$ and $G$ are continuous and that $g$ is strictly positive on $\{x : 0 < G(x) < 1\}$. In that case, the density $k(t)$ is continuous on $T = (0,1)$. Therefore $\{K_n - K\}/\sqrt{\frac{1}{n} + \frac{1}{m_n}}$ converges to $\sqrt{1-\lambda}\;B_1 \circ K - k\sqrt{\lambda}\;B_2$ on $T$. Additional conditions must be imposed to obtain the convergence on $[0,1]$.

As indicated by [Parzen 1997], pooled estimators are more natural than unpooled estimators. This idea leads to defining the pseudo-observations $e_{i,n} = H_n(Z_i)$, where $H_n = \lambda_n F_n + (1 - \lambda_n)G_{m_n}$ is the pooled estimator of $F$ when $F = G$, instead of the unpooled estimator $G_m$ defined at the beginning of the subsection. Here $F_n$ is the empirical distribution function of the $Z_i$'s, and $\lambda_n = n/(n + m_n)$. Of course, one assumes that $\lambda_n$ tends to $\lambda \in (0,1)$ as $n$ tends to infinity. Then $H_n$ converges to $H = \lambda F + (1-\lambda)G$. It follows that $\sqrt{n}\,(H_n - H)$ converges in law to $\lambda B_1 \circ F + \sqrt{\lambda(1-\lambda)}\;B_2 \circ G$. Set $C_r = \{\varphi_1 \circ F + \varphi_2 \circ G :\ \varphi_1, \varphi_2 \text{ continuous on } [0,1] \text{ and vanishing at } 0 \text{ and } 1\}$, and set $K(t) = P(H(Z) \le t) = F \circ H^{-1}(t)$. Note that $K$ is continuous, since the support of $F$ is contained in the support of $H$. In [Parzen 1997], $K$ is called the pooled comparison function, as opposed to the unpooled comparison function $F \circ G^{-1}$, which requires additional assumptions on the supports of $F$ and $G$ in order to be continuous.

When $F = G$, the limiting process of $\{K_n - K\}/\sqrt{\frac{1}{n} + \frac{1}{m_n}}$ has the representation $\sqrt{1-\lambda}\;B_1 - \sqrt{\lambda}\;B_2$, which is a Brownian bridge. When $F \ne G$, if one assumes in addition that the density $k$ of $K$ exists and is continuous on $T = (0,1)$, then $\sqrt{n}\,(K_n - K)$ converges in law to a Gaussian process having representation
$$(1 - \lambda k)\,B_1 \circ K - k\sqrt{\lambda(1-\lambda)}\;B_2 \circ G \circ H^{-1},$$
since $\mu(t, \varphi_1 \circ F + \varphi_2 \circ G) = k(t)\{\varphi_1 \circ K(t) + \varphi_2 \circ G \circ H^{-1}(t)\}$.

Remark 3.2. Many (signed) rank statistics can give rise to pseudo-observations and to more efficient tests.
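A small numerical sketch of the unpooled construction (our illustration; normal samples with $F = G$, and illustrative sample sizes): the mean of the pseudo-observations is the Mann-Whitney statistic, and a Kolmogorov-Smirnov type statistic is obtained from the same process.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 400, 600
z = rng.normal(0.0, 1.0, n)    # sample with cdf F
y = rng.normal(0.0, 1.0, m)    # sample with cdf G (here F = G)

Gm = lambda x: np.searchsorted(np.sort(y), x, side="right") / m
pseudo = Gm(z)                 # e_{i,n} = G_m(Z_i)

# Mean of the pseudo-observations = (1/(nm)) * #{(i,j) : Y_j <= Z_i},
# i.e. the Mann-Whitney estimate of P(Y <= Z); about 1/2 when F = G.
print(pseudo.mean())

# Under F = G, K(t) = t and {K_n - K}/sqrt(1/n + 1/m) is approximately a
# Brownian bridge; its supremum gives a two-sample KS-type statistic.
t = np.linspace(0.0, 1.0, 501)
Kn = np.mean(pseudo[:, None] <= t[None, :], axis=0)
print("KS-type statistic:", np.max(np.abs(Kn - t)) / np.sqrt(1.0/n + 1.0/m))
```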

3.4. Regression. Set $\mathcal{X} = \mathbb{R}^{p+1}$, $x = (y,z)$, $y \in \mathbb{R}$ and $z \in \mathbb{R}^p$. The regression model is $Y = a + b'Z + \varepsilon$, where $Z$ and $\varepsilon$ are independent.

Hence one can define $H(x) = H(y,z) = y - a - b'z$ and $H_n(x) = H_n(y,z) = y - a_n - b_n'z$, where $a_n$ and $b_n$ are estimators of $a$ and $b$ respectively. In that case, the residuals $H_n(X_i)$ are the pseudo-observations.

If $(\sqrt{n}\,(a_n - a), \sqrt{n}\,(b_n - b), \alpha_n)$ converges to $(A, B, \alpha)$, then $\mathbb{H}_n$ converges in $C_r$ to the continuous process $\mathbb{H}(y,z) = A + B'z$, where $r(x) = r(y,z) = 1 + |z|$ and $C_r = \{c + d'z :\ c \in \mathbb{R} \text{ and } d \in \mathbb{R}^p\}$. Therefore Hypothesis II is verified.

If one supposes that the density $k$ of $\varepsilon = H(Y,Z)$ is continuous on the support $T$, and if $E(|Z|)$ is finite, then Hypothesis I is satisfied with $\mu(t,f) = \{c + d'E(Z)\}\,k(t)$ when $f(y,z) = c + d'z \in C_r$. Therefore Theorem 2.1 applies and $\mathbb{K}_n$ converges in $D(T)$ to a continuous process having representation
$$\mathbb{K}(t) = \alpha(t) - \{A + B'E(Z)\}\,k(t).$$
Moreover, $\alpha = B \circ K$, where $B$ is a Brownian bridge.
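The regression example translates directly into code; the sketch below (our illustration, with standard normal errors assumed known for the comparison, and hypothetical parameter values) computes least-squares residuals as pseudo-observations and evaluates $\mathbb{K}_n(t) = \sqrt{n}\{K_n(t) - K(t)\}$, together with a Cramér-von Mises type statistic.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, p = 1000, 3
a, b = 1.0, np.array([0.5, -1.0, 2.0])
Z = rng.normal(size=(n, p))
eps = rng.normal(size=n)          # true (non-observable) errors
Y = a + Z @ b + eps

# Least-squares estimates (a_n, b_n); residuals H_n(X_i) = Y - a_n - b_n' Z
# are the pseudo-observations.
Xmat = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
resid = Y - Xmat @ coef

t = np.linspace(-3.0, 3.0, 601)
dt = t[1] - t[0]
Kn = np.mean(resid[:, None] <= t[None, :], axis=0)
IKn = np.sqrt(n) * (Kn - norm.cdf(t))

# By Theorem 2.1, IK_n is approximately alpha(t) - {A + B'E(Z)} k(t);
# a Cramer-von Mises type statistic based on it:
print("CvM:", np.sum(IKn ** 2 * norm.pdf(t)) * dt)
```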

4. Sketch of Proof of Theorem 2.1

The empirical process $\mathbb{K}_n$ may be expressed in terms of $\mathbb{H}_n$ and the $\varepsilon_i$'s as
$$\mathbb{K}_n(t) = \sqrt{n}\left[\frac{1}{n}\sum_{i=1}^n \mathbb{1}\{\varepsilon_i \le t - \mathbb{H}_n(X_i)/\sqrt{n}\} - K(t)\right],$$
which may also be written as the sum of two subsidiary processes, namely $\alpha_n(t) = \alpha_{n,1}(0,t)$, as defined in the statement of Hypothesis II, and $\beta_n(t)$, defined by
$$\beta_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\mathbb{1}\{\varepsilon_i \le t - \mathbb{H}_n(X_i)/\sqrt{n}\} - \mathbb{1}\{\varepsilon_i \le t\}\bigr].$$
The convergence of the process $\beta_n$ will be studied in Sections 5 and 6. First, it will be shown that $\beta_n(t)$ converges in $D(T)$ to a continuous process, by proving that $\beta_n$ differs from a continuous function of the empirical process $\mathbb{H}_n$ by a quantity that tends to zero in probability, uniformly on any compact subset $C$ of $T$. More precisely:

Lemma 4.1. Under Hypotheses I and II,
$$\lim_{n \to \infty} P\Bigl(\sup_{t \in C} |\beta_n(t) + \mu(t, \mathbb{H}_n)| > \delta\Bigr) = 0$$
for any $\delta > 0$ and for any compact subset $C$ of $T$.

If Lemma 4.1 holds true, then the first part of Theorem 2.1 is proven, since $D(T)$ is the projective limit of the spaces $\{D(C) :\ C \text{ a compact subset of } T\}$ and since the continuity of the mapping $t \mapsto \mu(t, f)$ implies that $\beta_n(t)$ converges in $D(T)$ to $-\mu(t, \mathbb{H})$. Therefore, when $T$ is closed, Lemma 4.1 and representation (2.3) yield Theorem 2.1.

When $T$ is not closed, in order to extend the convergence to $\mathbb{R}$, it is sufficient to prove that, under the additional Hypothesis III, the restriction of $\beta_n(t)$ to $T \setminus C$ can be made arbitrarily small for some compact subset $C$ of $T$. More precisely:

Lemma 4.2. Under Hypotheses I, II and III, for any $\delta > 0$ one can find a compact subset $C$ of $T$ so that
$$\limsup_{n \to \infty} P\Bigl(\sup_{t \in T \setminus C} |\beta_n(t)| > \delta\Bigr) < \delta.$$

The proof of Lemma 4.2 is given in Section 6.
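The decomposition $\mathbb{K}_n = \alpha_n + \beta_n$ and the approximation $\beta_n \approx -\mu(\cdot, \mathbb{H}_n)$ can be checked numerically in the simplest location model, $H(x) = x - a$ and $H_n(x) = x - \bar{X}_n$ (an illustrative special case of Subsection 3.4), where $\mathbb{H}_n$ is the constant $\sqrt{n}(a - \bar{X}_n)$ and $\mu(t, c) = c\,k(t)$ for a constant $c$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, a = 500, 2.0
x = a + rng.normal(size=n)
eps = x - a                  # non-observable errors; K = standard normal cdf
pseudo = x - x.mean()        # pseudo-observations H_n(X_i)

t = np.linspace(-3.0, 3.0, 601)
K = norm.cdf(t)
IKn = np.sqrt(n) * (np.mean(pseudo[:, None] <= t, axis=0) - K)
alpha_n = np.sqrt(n) * (np.mean(eps[:, None] <= t, axis=0) - K)
beta_n = IKn - alpha_n       # the decomposition holds exactly by construction

# beta_n should be close to -mu(t, H_n) = sqrt(n)(mean(X) - a) k(t):
approx = np.sqrt(n) * (x.mean() - a) * norm.pdf(t)
print(np.max(np.abs(beta_n - approx)))   # small relative to sqrt(n)
```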

5. Convergence of $\beta_n$ on compact subsets of $T$

First, one may restrict the proof of Lemma 4.1 to continuous $H_n$'s. For if the result is true in the continuous case, it follows from Hypothesis II that there exists a continuous version $\tilde{H}_n$ of $H_n$ such that $\sqrt{n}\,\sup_{x \in \mathcal{X}} |\tilde{H}_n(x) - H_n(x)|/r(x)$ converges in probability to zero; moreover, $\mathbb{H}_n = \sqrt{n}\,(\tilde{H}_n - H)$ converges in $C_r$ to a process $\mathbb{H}$. Hence Lemma 4.1 will also be true for $\tilde{H}_n + ar/\sqrt{n}$, for any $a \in \mathbb{R}$. Since $\sqrt{n}\,\{\tilde{H}_n + ar/\sqrt{n} - H\} = \mathbb{H}_n + ar$, it follows that $\mathbb{H}_n + ar$ converges to $\mathbb{H} + ar$ in $C_r$. Hence if $\beta_n(\cdot, a)$ is the difference between the empirical process based on the pseudo-observations $\tilde{H}_n(X_i) + a\,r(X_i)/\sqrt{n}$ and the empirical process $\alpha_n$, that is,
$$\beta_n(t, a) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\mathbb{1}\{\tilde{H}_n(X_i) + a\,r(X_i)/\sqrt{n} \le t\} - \mathbb{1}\{\varepsilon_i \le t\}\bigr],$$
then
$$R_n(a) = \sup_{t \in C} \bigl|\beta_n(t, a) + \mu(t, \mathbb{H}_n) + a\,\mu(t, r)\bigr|$$
converges in probability to zero.

Next, for any $c > 0$, the probability of the event $E_c$ defined by
$$E_c = \bigl\{\sqrt{n}\,\sup_{x \in \mathcal{X}} |\tilde{H}_n(x) - H_n(x)|/r(x) > c\bigr\}$$
is as small as one wants if $n$ is large enough. Moreover, on the complement of $E_c$,
$$\beta_n(t, c) \le \beta_n(t) \le \beta_n(t, -c)$$
for all $t \in T$. Hence
$$-c\,\mu(t,r) - R_n(c) \le \beta_n(t,c) + \mu(t, \mathbb{H}_n + cr) - c\,\mu(t,r) \le \beta_n(t) + \mu(t, \mathbb{H}_n) \le \beta_n(t,-c) + \mu(t, \mathbb{H}_n - cr) + c\,\mu(t,r) \le c\,\mu(t,r) + R_n(-c).$$
It follows that on the complement of $E_c$,
$$\sup_{t \in C} |\beta_n(t) + \mu(t, \mathbb{H}_n)| \le \max\{R_n(-c), R_n(c)\} + c\,\sup_{t \in C} \mu(t,r),$$
which can be made as small as one wants with high probability, by first choosing $c$ small enough and then letting $n$ tend to infinity. Thus, under Hypotheses I and II, Lemma 4.1 also holds for $H_n$, even if it is not continuous.

For the rest of the section, suppose that $H_n$ is continuous. The next two lemmas are required for the proof of Lemma 4.1, but they have their own utility. The main idea of the proof of Lemma 4.1 is that, due to the tightness of $\mathbb{H}_n$, $\mathbb{H}_n$ is arbitrarily close to some non-random element of $C_r$ with high probability.

5.1. Auxiliary results.

Lemma 5.1. Suppose $f \in C_r$ and set
$$\mu_n(t, f) = \sqrt{n}\,\{P(\varepsilon \le t + f(X)/\sqrt{n}) - P(\varepsilon \le t)\}.$$
Under Hypothesis I, $\mu_n(t,f)$ converges to $\mu(t,f)$ uniformly on compact subsets of $T$.

Proof. Let $C$ be a compact subset of $T$. Set $Y = \max\{0, f(X)\}$ and $Z = \max\{0, -f(X)\}$. Then
$$\mu_n(t,f) = \sqrt{n}\,P\bigl(t < \varepsilon \le t + Y/\sqrt{n}\bigr) - \sqrt{n}\,P\bigl(t - Z/\sqrt{n} < \varepsilon \le t\bigr).$$
It is sufficient to show that $\sqrt{n}\,P(t < \varepsilon \le t + Y/\sqrt{n})$ and $\sqrt{n}\,P(t - Z/\sqrt{n} < \varepsilon \le t)$ converge respectively to $\mu(t, Y)$ and $\mu(t, Z)$, uniformly on $C$.

Since both proofs are similar, only the proof of the convergence of $\sqrt{n}\,P(t < \varepsilon \le t + Y/\sqrt{n})$ to $\mu(t,Y)$ will be given.

For any $\delta > 0$ such that $C_\delta = \{x + y :\ x \in C \text{ and } |y| \le \delta\}$ is a compact subset of $T$,
$$\sqrt{n}\,\sup_{t \in C} \bigl|P\bigl(t < \varepsilon \le t + Y/\sqrt{n}\bigr) - P\bigl(t < \varepsilon \le t + \min(\delta, Y/\sqrt{n})\bigr)\bigr| \le \sqrt{n}\,P(Y > \delta\sqrt{n}),$$
which goes to zero as $n$ tends to infinity, because $E(Y)$ is finite.

It follows that the asymptotic behavior of $\sqrt{n}\,P(t < \varepsilon \le t + Y/\sqrt{n})$ is the same as that of $A_n(t,\delta) = \sqrt{n}\,P(t < \varepsilon \le t + \min(\delta, Y/\sqrt{n}))$.

Next, if $M$ is given and $\sqrt{n} > M/\delta$,
$$A_n(t,\delta) = \sqrt{n}\int_t^{t+\delta} k(s)\,P_s\bigl(Y > \sqrt{n}\,(s-t)\bigr)\,ds = \int_0^{\delta\sqrt{n}} k\bigl(t + \tfrac{u}{\sqrt{n}}\bigr)\,P_{t+u/\sqrt{n}}(Y > u)\,du = \int_0^M k\bigl(t + \tfrac{u}{\sqrt{n}}\bigr)\,P_{t+u/\sqrt{n}}(Y > u)\,du + \int_M^{\delta\sqrt{n}} k\bigl(t + \tfrac{u}{\sqrt{n}}\bigr)\,P_{t+u/\sqrt{n}}(Y > u)\,du.$$
Uniformly for $t \in C$, $\int_M^{\delta\sqrt{n}} k(t + u/\sqrt{n})\,P_{t+u/\sqrt{n}}(Y > u)\,du$ is bounded by
$$\Bigl(\sup_{s \in C_\delta} k(s)\Bigr)\int_M^\infty \sup_{s \in C_\delta} P_s(Y > u)\,du,$$
which can be made arbitrarily small by choosing $M$ large enough.

Let $U$ be the distribution function of a uniformly distributed variable over $(0,1)$. For any $\eta > 0$, set $\psi(y,u) = U((y-u)/\eta)$ and $\Psi(y,u) = U((y-u+\eta)/\eta)$. Then
$$\mathbb{1}\{y - \eta > u\} \le \psi(y,u) \le \mathbb{1}\{y > u\} \le \Psi(y,u) \le \mathbb{1}\{y + \eta > u\},$$
so
$$\int_0^M \mu\bigl(t + \tfrac{u}{\sqrt{n}}, \psi(\cdot,u)\bigr)\,du \le \int_0^M k\bigl(t + \tfrac{u}{\sqrt{n}}\bigr)\,P_{t+u/\sqrt{n}}(Y > u)\,du \le \int_0^M \mu\bigl(t + \tfrac{u}{\sqrt{n}}, \Psi(\cdot,u)\bigr)\,du.$$

Now
$$\sup_{t \in C} \left|\int_0^M \bigl(\mu(t + \tfrac{u}{\sqrt{n}}, \psi(\cdot,u)) - \mu(t, \psi(\cdot,u))\bigr)\,du\right|$$
is bounded by
$$\int_0^M \sup_{t \in C}\ \sup_{t \le s \le t + M/\sqrt{n}} |\mu(s, \psi(\cdot,u)) - \mu(t, \psi(\cdot,u))|\,du. \tag{5.1}$$
Similarly,
$$\sup_{t \in C} \left|\int_0^M \bigl(\mu(t + \tfrac{u}{\sqrt{n}}, \Psi(\cdot,u)) - \mu(t, \Psi(\cdot,u))\bigr)\,du\right|$$
is bounded by
$$\int_0^M \sup_{t \in C}\ \sup_{t \le s \le t + M/\sqrt{n}} |\mu(s, \Psi(\cdot,u)) - \mu(t, \Psi(\cdot,u))|\,du. \tag{5.2}$$
It follows from Hypothesis I and from the uniform continuity of $U$ that the $\mu(s, \psi(\cdot,u))$ and $\mu(s, \Psi(\cdot,u))$ are uniformly continuous on $C_\delta \times [0,M]$. Therefore the quantities (5.1) and (5.2) both tend to zero as $n$ tends to infinity.

Next,
$$\int_0^M \mu(t, \Psi(\cdot,u))\,du \le k(t)\int_0^M P_t(Y + \eta > u)\,du = \mu(t,Y) + \eta\,k(t) - k(t)\int_{M-\eta}^\infty P_t(Y > u)\,du,$$
and
$$\int_0^M \mu(t, \psi(\cdot,u))\,du \ge k(t)\int_0^M P_t(Y - \eta > u)\,du \ge \mu(t,Y) - \eta\,k(t) - k(t)\int_{M+\eta}^\infty P_t(Y > u)\,du.$$
Therefore
$$\sup_{t \in C}\left|\int_0^M k\bigl(t + \tfrac{u}{\sqrt{n}}\bigr)\,P_{t+u/\sqrt{n}}(Y > u)\,du - \mu(t,Y)\right|$$
can be made arbitrarily small for all $n$ large enough, by choosing $M$ large enough and $\eta$ small enough.

It follows that $\sqrt{n}\,P(t < \varepsilon \le t + Y/\sqrt{n})$ converges uniformly to $\mu(t,Y)$ on $C$.

Lemma 5.2. Suppose $f \in C_r$. Set
$$\gamma_n(t, f) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\mathbb{1}\{\varepsilon_i \le t + f(X_i)/\sqrt{n}\} - \mathbb{1}\{\varepsilon_i \le t\}\bigr]. \tag{5.3}$$
Then, under Hypothesis II, $\sup_{t \in C} |\gamma_n(t,f) - \mu(t,f)|$ converges in probability to $0$ as $n$ tends to infinity, for any compact subset $C$ of $T$.

Proof. Let $C$ be a given compact subset of $T$. Since $g = f/r$ is bounded, there exists a compact set $K \subset \mathbb{R}$ such that $g(\mathcal{X}) \subset K$. For any $\delta > 0$, one can find $s_1, \ldots, s_k \in (0,\infty)$ so that $K$ is covered by balls $B_j$ centered at $s_j$ of radius $\delta$, $1 \le j \le k$.

Moreover, $K$ is a normal space, so it follows from Theorem 5.1 in Chapter 4 of [Munkres 1975] that there exists a partition of unity dominated by the covering; that is, there exist continuous positive functions $\psi_j$, $1 \le j \le k$, so that the support of $\psi_j$ is contained in $B_j$ and
$$\sum_{j=1}^k \psi_j(x) = 1 \quad \text{for all } x \in K.$$
Set $\gamma_{n,j}(t,f) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_j \circ g(X_i)\bigl[\mathbb{1}\{\varepsilon_i \le t + f(X_i)/\sqrt{n}\} - \mathbb{1}\{\varepsilon_i \le t\}\bigr]$. Then $\gamma_n = \sum_{j=1}^k \gamma_{n,j}$.

Next,
$$\gamma_{n,j}(t,f) \le \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_j \circ g(X_i)\bigl[\mathbb{1}\{\varepsilon_i \le t + (s_j+\delta)\,r(X_i)/\sqrt{n}\} - \mathbb{1}\{\varepsilon_i \le t\}\bigr] = \Bigl(\alpha_{n,\psi_j \circ g}\bigl(\tfrac{s_j+\delta}{\sqrt{n}}, t\bigr) - \alpha_{n,\psi_j \circ g}(0, t)\Bigr) + \sqrt{n}\,E\Bigl[\psi_j \circ g(X)\bigl(\mathbb{1}\{\varepsilon \le t + (s_j+\delta)\,r(X)/\sqrt{n}\} - \mathbb{1}\{\varepsilon \le t\}\bigr)\Bigr]$$
and
$$\gamma_{n,j}(t,f) \ge \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_j \circ g(X_i)\bigl[\mathbb{1}\{\varepsilon_i \le t + (s_j-\delta)\,r(X_i)/\sqrt{n}\} - \mathbb{1}\{\varepsilon_i \le t\}\bigr] = \Bigl(\alpha_{n,\psi_j \circ g}\bigl(\tfrac{s_j-\delta}{\sqrt{n}}, t\bigr) - \alpha_{n,\psi_j \circ g}(0, t)\Bigr) + \sqrt{n}\,E\Bigl[\psi_j \circ g(X)\bigl(\mathbb{1}\{\varepsilon \le t + (s_j-\delta)\,r(X)/\sqrt{n}\} - \mathbb{1}\{\varepsilon \le t\}\bigr)\Bigr],$$
because $|g(X_i) - s_j| < \delta$ if $\psi_j \circ g(X_i) > 0$.

Summing the last two inequalities over $j$ yields
$$\gamma_n(t,f) \le \sum_{j=1}^k \Bigl(\alpha_{n,\psi_j \circ g}\bigl(\tfrac{s_j+\delta}{\sqrt{n}}, t\bigr) - \alpha_{n,\psi_j \circ g}(0, t)\Bigr) + \mu_n(t, f + 2\delta r)$$
and
$$\gamma_n(t,f) \ge \sum_{j=1}^k \Bigl(\alpha_{n,\psi_j \circ g}\bigl(\tfrac{s_j-\delta}{\sqrt{n}}, t\bigr) - \alpha_{n,\psi_j \circ g}(0, t)\Bigr) + \mu_n(t, f - 2\delta r),$$
because $(s_j - \delta)\,r(x) \le f(x) \le (s_j + \delta)\,r(x)$ on $\{x :\ \psi_j \circ g(x) > 0\}$.

It follows from Hypothesis II that
$$\sum_{j=1}^k \sup_{t \in C} \Bigl|\alpha_{n,\psi_j \circ g}\bigl(\tfrac{s_j \pm \delta}{\sqrt{n}}, t\bigr) - \alpha_{n,\psi_j \circ g}(0, t)\Bigr|$$
converges in probability to zero. Therefore it remains to show that
$$\sup_{t \in C} |\mu_n(t, f \pm 2\delta r) - \mu(t,f)|$$
can be made arbitrarily small. This follows from Lemma 5.1, since
$$|\mu_n(t, f \pm 2\delta r) - \mu(t,f)| \le |\mu_n(t, f \pm 2\delta r) - \mu(t, f \pm 2\delta r)| + 2\delta\,\mu(t,r),$$
so
$$\limsup_{n \to \infty} \sup_{t \in C} |\mu_n(t, f \pm 2\delta r) - \mu(t,f)| \le \limsup_{n \to \infty} \sup_{t \in C} |\mu_n(t, f \pm 2\delta r) - \mu(t, f \pm 2\delta r)| + 2\delta \sup_{t \in C} \mu(t,r) = 0 + 2\delta \sup_{t \in C} \mu(t,r).$$
Since $\sup_{t \in C} \mu(t,r)$ is finite and $\delta$ can be chosen arbitrarily small, the proof is complete.

5.2. Proof of Lemma 4.1. By Hypothesis II, $\mathbb{H}_n = \sqrt{n}\,(H_n - H)$ is tight, so for any $m \ge 1$ there exists a compact subset $K_m$ of $C_r$ so that $P(\mathbb{H}_n \notin K_m) < 1/m$. Therefore Lemma 4.1 will be proven if one can show that, under Hypotheses I and II,
$$\lim_{n \to \infty} P\Bigl(\mathbb{H}_n \in K_m,\ \sup_{t \in C} |\beta_n(t) + \mu(t, \mathbb{H}_n)| > \delta\Bigr) = 0 \tag{5.4}$$
for any $\delta > 0$ and for any compact subset $C$ of $T$.

Let $C$ be a given compact subset of $T$. Since $K_m$ is compact, it is totally bounded, so for a given $\eta > 0$ one can find $a_1, \ldots, a_k \in C_r$ so that $K_m$ is covered by balls $B_j$ centered at $a_j$ of radius $\eta$, $1 \le j \le k$. Moreover, $K_m$ is a normal space, so it follows from Theorem 5.1 in Chapter 4 of [Munkres 1975] that there exists a partition of unity dominated by the covering; that is, there exist continuous positive functions $\Lambda_j$, $1 \le j \le k$, so that the support of $\Lambda_j$ is contained in $B_j$ and
$$\sum_{j=1}^k \Lambda_j(x) = 1 \quad \text{for all } x \in K_m.$$
Set $f_j = -(a_j - \eta r)$ and $g_j = -(a_j + \eta r)$, $1 \le j \le k$.

For the rest of the proof, suppose that $\mathbb{H}_n \in K_m$ and set $\beta_{n,j} = \Lambda_j(\mathbb{H}_n)\,\beta_n$. Then
$$\beta_n(t) = \sum_{j=1}^k \Lambda_j(\mathbb{H}_n)\,\beta_n(t) = \sum_{j=1}^k \beta_{n,j}(t),$$
and
$$\beta_{n,j}(t) \le \Lambda_j(\mathbb{H}_n)\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\mathbb{1}\{H(X_i) \le t + f_j(X_i)/\sqrt{n}\} - \mathbb{1}\{H(X_i) \le t\}\bigr],$$
$$\beta_{n,j}(t) \ge \Lambda_j(\mathbb{H}_n)\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\mathbb{1}\{H(X_i) \le t + g_j(X_i)/\sqrt{n}\} - \mathbb{1}\{H(X_i) \le t\}\bigr],$$
because if $\Lambda_j(\mathbb{H}_n) > 0$, then $\mathbb{H}_n \in B_j$, so $-f_j(x) < \mathbb{H}_n(x) < -g_j(x)$ for all $x \in \mathcal{X}$.

Using definition (5.3) introduced in Lemma 5.2, the last two inequalities yield
$$\sum_{j=1}^k \Lambda_j(\mathbb{H}_n)\,\gamma_n(t, g_j) \le \beta_n(t) \le \sum_{j=1}^k \Lambda_j(\mathbb{H}_n)\,\gamma_n(t, f_j). \tag{5.5}$$
Set
$$R_{n,1} = \max_{1 \le j \le k} \sup_{t \in C} |\gamma_n(t, f_j) - \mu(t, f_j)| \quad \text{and} \quad R_{n,2} = \max_{1 \le j \le k} \sup_{t \in C} |\gamma_n(t, g_j) - \mu(t, g_j)|.$$
Then (5.5) yields
$$-R_{n,2} + \sum_{j=1}^k \Lambda_j(\mathbb{H}_n)\,\mu(t, g_j) \le \beta_n(t) \le R_{n,1} + \sum_{j=1}^k \Lambda_j(\mathbb{H}_n)\,\mu(t, f_j). \tag{5.6}$$
Next, note that whenever $\Lambda_j(\mathbb{H}_n) > 0$,
$$\mu(t, f_j) \le 2\eta\,\mu(t,r) - \mu(t, \mathbb{H}_n) \quad \text{and} \quad \mu(t, g_j) \ge -2\eta\,\mu(t,r) - \mu(t, \mathbb{H}_n). \tag{5.7}$$
Since $\sum_{j=1}^k \Lambda_j(\mathbb{H}_n) = 1$ on $\{\mathbb{H}_n \in K_m\}$, it follows from (5.6) and (5.7) that
$$\sup_{t \in C} |\beta_n(t) + \mu(t, \mathbb{H}_n)| \le \max(R_{n,1}, R_{n,2}) + 2\eta \sup_{t \in C} \mu(t,r). \tag{5.8}$$
Using Hypothesis I, one obtains that $\sup_{t \in C} \mu(t,r)$ is finite. Using then Lemma 5.2 and choosing $\eta$ small enough, it is thus possible to make the right-hand side of (5.8) arbitrarily small with high probability when $n$ is large enough.

Since the left-hand side of (5.8) does not depend on the choice of the partition, the proof is complete.

6. Behavior of $\beta_n$ near the boundary

The proof of Lemma 4.2 is similar to the proof of Lemma 3 in [Barbe et al. 1996]. Suppose that $T$ is not closed and that the lower bound $t_*$ is finite and $t_* \notin T$. Set
$$\beta_{n,+}(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{1}\{t < H(X_i) \le t + (H_n - H)^-(X_i)\}\,\mathbb{1}\{H(X_i) > t_* + t_n\}$$
and
$$\beta_{n,-}(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{1}\{t - (H_n - H)^+(X_i) \le H(X_i) < t\}\,\mathbb{1}\{H(X_i) > t_* + t_n\}.$$

Then
$$|\beta_n(t)| \le \beta_{n,+}(t) + \beta_{n,-}(t) + |\alpha_n(t_* + t_n)| + \sqrt{n}\,K(t_* + t_n).$$
To prove Lemma 4.2 near the lower bound of $T$, it suffices to show that for any $\delta > 0$,
$$\lim_{t_0 \to 0} \limsup_{n \to \infty} P\Bigl(\sup_{t_* < t \le t_* + t_0} \beta_{n,+}(t) > \delta\Bigr) = 0, \tag{6.1}$$
$$\lim_{t_0 \to 0} \limsup_{n \to \infty} P\Bigl(\sup_{t_* < t \le t_* + t_0} \beta_{n,-}(t) > \delta\Bigr) = 0, \tag{6.2}$$
$$\limsup_{n \to \infty} P\bigl\{|\alpha_n(t_* + t_n)| > \delta\bigr\} = 0, \tag{6.3}$$
and
$$\limsup_{n \to \infty} \sqrt{n}\,K(t_* + t_n) = 0. \tag{6.4}$$
Now (6.3) follows from the fact that $\alpha_n(t_*) = 0$ and the tightness of $\alpha_n$, and (6.4) follows from Hypothesis III. It remains to prove (6.1) and (6.2).

Denote by $F_{n,M}$ the event
$$\sup_{x :\, H(x) > t_* + t_n} \frac{\sqrt{n}\,|H_n(x) - H(x)|}{q_*(H(x) - t_*)} \le M.$$
By Hypothesis III,
$$\lim_{M \to \infty} \liminf_{n \to \infty} P(F_{n,M}) = 1,$$
so one can assume that the event $F_{n,M}$ occurs.

It follows from the fact that $q_*(t)/t$ is decreasing that, for any $x$ such that $H(x) > t_* + t_n$, one has
$$\frac{|H_n(x) - H(x)|}{H(x) - t_*} \le \frac{|H_n(x) - H(x)|}{q_*(H(x) - t_*)}\;\frac{q_*(t_n)}{t_n} \le M\,q_*(t_n)/(t_n\sqrt{n}),$$
which goes to $0$ as $n \to \infty$ by Hypothesis III.

Choosing $n$ sufficiently large so that $M\,q_*(t_n)/(t_n\sqrt{n}) \le 1/2$, it follows from the last inequality that $H_n(X_i) \ge t_* + (H(X_i) - t_*)/2$ if $H(X_i) > t_* + t_n$. Since
$$\{t < H(X_i) \le t + (H_n - H)^-(X_i)\} = \{H_n(X_i) \le t < H(X_i)\},$$
it follows that on $F_{n,M} \cap \{H(X_i) > t_* + t_n\}$, $H(X_i) - t_*$ is bounded above by $2(t - t_*)$. Therefore
$$\beta_{n,+}(t) \le \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{1}\bigl\{t < H(X_i) \le t + M\,q_*(2(t - t_*))/\sqrt{n}\bigr\} = \alpha_n\bigl(t + M\,q_*(2(t - t_*))/\sqrt{n}\bigr) - \alpha_n(t) + \sqrt{n}\,\bigl\{K\bigl(t + M\,q_*(2(t - t_*))/\sqrt{n}\bigr) - K(t)\bigr\}.$$

Hence $\sup_{0 < t \le t_0} \beta_{n,+}(t_* + t)$ is bounded above by the sum of
$$\sup_{0 < t \le t_0} \bigl|\alpha_n\bigl(t_* + t + M\,q_*(2t)/\sqrt{n}\bigr) - \alpha_n(t_* + t)\bigr|$$
and
$$\sup_{0 < t \le t_0} \sqrt{n}\,\bigl[K\bigl(t_* + t + M\,q_*(2t)/\sqrt{n}\bigr) - K(t_* + t)\bigr].$$
The first term goes to zero in probability due to the tightness of $\alpha_n$. To handle the second term, set $\lambda_t = \sup_{0 < s \le t} k(t_* + s)\,q_*(s)$. Using Hypothesis III, $\lambda_t$ goes to zero as $t$ tends to zero, and
$$K(t_* + t) - K(t_* + s) \le \lambda_{t_0}\,(t - s)/q_*(s) \tag{6.5}$$
for $0 < s \le t \le t_0$.

Next, if $n$ is large enough so that $M\,q_*(t_0)/\sqrt{n} < t_0$,
$$\sup_{0 < t \le t_0} \sqrt{n}\,\bigl[K\bigl(t_* + t + M\,q_*(2t)/\sqrt{n}\bigr) - K(t_* + t)\bigr] \le M\,\lambda_{2t_0}\,q_*(2t)/q_*(t) \le c_0\,M\,\lambda_{2t_0},$$
where $c_0 = \sup_{0 < t \le t_0} q_*(2t)/q_*(t)$ is finite if $t_0$ is small enough. Therefore (6.1) holds true.

It remains to prove (6.2). Restricting again the attention to events in $F_{n,M}$, it follows that $\sup_{0 < t \le t_0} \beta_{n,-}(t_* + t)$ is bounded above by the sum of
$$\sup_{0 < t \le t_0} \bigl|\alpha_n(t_* + t) - \alpha_n\bigl(t_* + t - M\,q_*(t)/\sqrt{n}\bigr)\bigr|$$
and
$$\sup_{0 < t \le t_0} \sqrt{n}\,\bigl[K(t_* + t) - K\bigl(t_* + t - M\,q_*(t)/\sqrt{n}\bigr)\bigr].$$
The first term goes to zero in probability due to the tightness of $\alpha_n$.

Using (6.5), if $0 < t \le t_0$ and $t - M\,q_*(t)/\sqrt{n} \ge t/2$, then
$$\sqrt{n}\,\bigl\{K(t_* + t) - K\bigl(t_* + t - M\,q_*(t)/\sqrt{n}\bigr)\bigr\} \le \lambda_{t_0}\,M\,q_*(t)/q_*(t/2) \le c_0\,M\,\lambda_{t_0}.$$
On the other hand, if $0 < t \le t_0$ and $t - M\,q_*(t)/\sqrt{n} \le t/2$, it follows that $t < t_n$ if $n$ is large enough, because $t \ge t_n$ implies $q_*(t)/t \le q_*(t_n)/t_n$, which is impossible since $q_*(t_n)/(t_n\sqrt{n})$ tends to zero while $q_*(t)/(t\sqrt{n})$ is bounded below by $1/(2M)$. Hence
$$\sqrt{n}\,\bigl\{K(t_* + t) - K\bigl(t_* + t - M\,q_*(t)/\sqrt{n}\bigr)\bigr\} \le \sqrt{n}\,K(t_* + t_n).$$

Combining the two cases, it follows that
$$\limsup_{n \to \infty}\ \sup_{0 < t \le t_0} \sqrt{n}\,\bigl\{K(t_* + t) - K\bigl(t_* + t - M\,q_*(t)/\sqrt{n}\bigr)\bigr\}$$
converges to zero as $t_0$ tends to zero.

Finally, to prove Lemma 4.2 for the case of a finite upper bound $t^*$ with $t^* \notin T$, it suffices to apply the same arguments to $t^* - H$ and $1 - K(t^* - t)$, instead of $H - t_*$ and $K(t_* + t)$, replacing $q_*$ by $q^*$ and $t_n$ by $s_n$.

7. Verification of Hypothesis II for linear models

In this section it will be shown that Hypothesis II holds for most linear models. First it is established that the validity of Hypothesis II can be easily verified by checking the conditions of either of the following two lemmas.

Lemma 7.1. If Hypothesis I holds, if the $\varepsilon_i$'s are independent, and if for any $g = f/r$, $f \in C_r$, the conditional law of $(g(X_i), r(X_i))$ given
$$\sigma\{g(X_1), r(X_1), \ldots, g(X_{i-1}), r(X_{i-1}), \varepsilon_1, \ldots, \varepsilon_i\}$$
is the same as the one given $\sigma\{\varepsilon_i\}$, then
$$\sup_{t \in C} \bigl|\alpha_{n,\psi \circ g}(s/\sqrt{n}, t) - \alpha_{n,\psi \circ g}(0, t)\bigr|$$
converges to zero in probability.

Lemma 7.2. If, in addition to Hypothesis I, $\alpha_n$ is tight, $E(r^2(X)) < \infty$, for any $g = f/r$, $f \in C_r$, the conditional law of $\varepsilon_i$ given
$$\sigma\{g(X_1), r(X_1), \ldots, g(X_i), r(X_i), \varepsilon_1, \ldots, \varepsilon_{i-1}\}$$
is the same as the one given $\sigma\{g(X_i), r(X_i)\}$, and if for any $f = rg \in C_r$ the law of $\varepsilon$ given $(r(X), g(X))$ admits a density $k_f(\cdot\,; r(X), g(X))$ which is bounded and such that $\{k_f(\cdot\,; x),\ x \in \mathcal{X}\}$ is equicontinuous on $C$, for any compact subset $C$ of $T$, then
$$\sup_{t \in C} \bigl|\alpha_{n,\psi \circ g}(s/\sqrt{n}, t) - \alpha_{n,\psi \circ g}(0, t)\bigr|$$
converges to zero in probability.

Remark 7.1. The existence of the density $k_f$ is a strong assumption. However, it becomes trivial if $\varepsilon_i$ is independent of the sigma-field
$$\sigma\{g(X_1), r(X_1), \ldots, g(X_i), r(X_i), \varepsilon_1, \ldots, \varepsilon_{i-1}\}.$$
In such a case, $k_f = k$. This happens to be the case for most linear models; in particular, it holds for linear regression models (e.g. Subsection 3.4) and for autoregressive models of the form
$$Y_i - \theta = \varepsilon_i + \sum_{k=1}^p \phi_k(Y_{i-k} - \theta),$$
where the $\varepsilon_i$'s are independent and identically distributed. In that case, $X_i = (Y_i, Z_i) = (Y_i, Y_{i-1}, \ldots, Y_{i-p})$, $r(x) = r(y,z) = 1 + \sum_{k=1}^p |z^{(k)}|$ and $C_r = \{c + d'z :\ c \in \mathbb{R} \text{ and } d \in \mathbb{R}^p\}$. It follows that $\varepsilon_i$ is independent of
$$\sigma\{g(X_1), r(X_1), \ldots, g(X_i), r(X_i), \varepsilon_1, \ldots, \varepsilon_{i-1}\} \subset \sigma\{Y_1, \ldots, Y_{i-1}, \varepsilon_1, \ldots, \varepsilon_{i-1}\}.$$
The case of moving-average processes is not covered by the above, since $\varepsilon_i$ depends on $\varepsilon_{i-1}, \ldots, \varepsilon_{i-q}$. However, it can be handled relatively simply (e.g. [Bai 1994]). General pseudo-observations of the form $\varepsilon_i = H(X_i; \varepsilon_{i-1}, \ldots, \varepsilon_{i-q})$ will be studied in a forthcoming paper.
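For the autoregressive case of Remark 7.1, the residual pseudo-observations are easy to produce; the sketch below (our illustration, with hypothetical order, coefficients and error law) fits an AR($p$) model by least squares and forms the residual empirical process.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n_burn, n, p = 200, 2000, 2
phi = np.array([0.5, -0.3])            # stationary AR(2) coefficients

# Simulate the AR(p) model Y_i = sum_k phi_k Y_{i-k} + eps_i (theta = 0).
eps = rng.standard_normal(n_burn + n)
y = np.zeros(n_burn + n)
for i in range(p, n_burn + n):
    y[i] = y[i - p:i][::-1] @ phi + eps[i]
y = y[n_burn:]

# Regress Y_i on its p lags; X_i = (Y_i, Y_{i-1}, ..., Y_{i-p}).
Z = np.column_stack([y[p - k - 1:n - k - 1] for k in range(p)])
Xmat = np.column_stack([np.ones(n - p), Z])
coef, *_ = np.linalg.lstsq(Xmat, y[p:], rcond=None)
resid = y[p:] - Xmat @ coef            # pseudo-observations e_{i,n}

# Empirical process of the residuals against the true error law.
t = np.linspace(-3.0, 3.0, 301)
Kn = np.mean(resid[:, None] <= t, axis=0)
print(np.max(np.sqrt(n - p) * np.abs(Kn - norm.cdf(t))))
```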

Proof of Lemma 7.1. First, assume that $s > 0$ and let $s_n = s/\sqrt{n}$ and $Y_i = r(X_i)$. The case $s < 0$ is similar and is therefore omitted. Observe that
$$\alpha_{n,\psi \circ g}(s_n, t) - \alpha_{n,\psi \circ g}(0, t) = \Delta_n(t),$$
where
$$\Delta_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\psi(g(X_i))\,\mathbb{1}\{t < \varepsilon_i \le t + s_n Y_i\} - E\{\psi(g(X))\,\mathbb{1}\{t < \varepsilon \le t + s_n Y\}\}\bigr].$$
Now let $\delta > 0$ and set
$$\Delta_n(t, \delta) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl[\psi(g(X_i))\,\mathbb{1}\{t < \varepsilon_i \le t + s_n Y_i \le t + \delta\} - E\{\psi(g(X))\,\mathbb{1}\{t < \varepsilon \le t + s_n Y \le t + \delta\}\}\bigr].$$
Observe that $\sup_{t \in T} |\Delta_n(t) - \Delta_n(t,\delta)|$ converges to zero in probability, since it is bounded by
$$L_n(\delta) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{1}\{r(X_i) > \delta/s_n\} + \sqrt{n}\,P\{r(X) > \delta/s_n\},$$
and since $E(L_n(\delta)) = 2\sqrt{n}\,P\{r(X) > \delta/s_n\}$ goes to zero because $E\{r(X)\}$ is finite and $s_n$ goes to zero for any $s \ge 0$. It then follows that Lemma 7.1 holds if one shows that $\sup_{t \in C} |\Delta_n(t,\delta)|$ converges to $0$ in probability for every $\delta > 0$.

Let
$$F_{i-1,n} = \sigma\{g(X_1), r(X_1), \ldots, g(X_{i-1}), r(X_{i-1}), \varepsilon_1, \ldots, \varepsilon_i\},$$
and let $t_0, \ldots, t_{m(n)}$ be a partition of the compact set $C$ with mesh $\delta_n$ satisfying
$$\lim_{n \to \infty} \Bigl(\delta_n\sqrt{n} + \frac{1}{n\delta_n}\Bigr) = 0. \tag{7.1}$$
Then for any $t \in C$ there exists $1 \le r \le m(n)$ such that $t_{r-1} < t \le t_r$.

Next, for each $1 \le i \le n$ and each $1 \le r \le m(n)$, let
$$d_{i,n}(r) = \psi(g(X_i))\,\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\} - E\{\psi(g(X_i))\,\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\} \mid F_{i-1,n}\},$$
$$\gamma_{i,n}(r) = E\{\psi(g(X_i))\,\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\} \mid F_{i-1,n}\} - E\{\psi(g(X_i))\,\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\}\},$$
$$p(t, r) = E\{\psi(g(X_i))\,\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\}\} - E\{\psi(g(X_i))\,\mathbb{1}\{t < \varepsilon_i \le t + s_n Y_i \le t + \delta\}\},$$
and let $\bar{d}_{i,n}(r)$, $\bar{\gamma}_{i,n}(r)$ and $\bar{p}(t,r)$ be defined in the same way with $t_{r-1}$ in place of $t_r$. Further let $D_n(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n d_{i,n}(r)$, $\Gamma_n(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_{i,n}(r)$, $\bar{D}_n(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bar{d}_{i,n}(r)$ and $\bar{\Gamma}_n(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bar{\gamma}_{i,n}(r)$. Then
$$\Delta_n(t,\delta) \ge \bar{D}_n(r) + \bar{\Gamma}_n(r) + \sqrt{n}\,\bar{p}(t,r) - |\alpha_n(t_r) - \alpha_n(t_{r-1})| - \sqrt{n}\,P\{t_{r-1} < \varepsilon \le t_r\}, \tag{7.2}$$
$$\Delta_n(t,\delta) \le D_n(r) + \Gamma_n(r) + \sqrt{n}\,p(t,r) + |\alpha_n(t_r) - \alpha_n(t_{r-1})| + \sqrt{n}\,P\{t_{r-1} < \varepsilon \le t_r\}. \tag{7.3}$$
Since $\alpha_n$ is tight and $\delta_n$ goes to zero, $\max_{1 \le r \le m(n)} |\alpha_n(t_r) - \alpha_n(t_{r-1})|$ converges to zero in probability. It then follows that $\sup_{t \in C} |\Delta_n(t,\delta)|$ converges to $0$ in probability if $\sup_{t \in C} \sqrt{n}\,|p(t,r)|$, $\sup_{t \in C} \sqrt{n}\,|\bar{p}(t,r)|$ and $\max_{1 \le r \le m(n)} \sqrt{n}\,P\{t_{r-1} < \varepsilon \le t_r\}$ converge to zero, and if each of $\max_{1 \le r \le m(n)} |D_n(r)|$, $\max_{1 \le r \le m(n)} |\bar{D}_n(r)|$, $\max_{1 \le r \le m(n)} |\Gamma_n(r)|$ and $\max_{1 \le r \le m(n)} |\bar{\Gamma}_n(r)|$ converges to zero in probability.

First note that
$$\max_{1 \le r \le m(n)} \sqrt{n}\,P\{t_{r-1} < \varepsilon \le t_r\} \le \sqrt{n}\,\delta_n\,\sup_{t \in C} k(t),$$
which converges to zero by the choice of $\delta_n$ and since $k$ is bounded on $C$. Now consider $\sqrt{n}\,p(t,r)$ and note that, because $0 \le \psi \le 1$,
$$|\sqrt{n}\,p(t,r)| \le \bigl|\sqrt{n}\,P\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\} - \sqrt{n}\,P\{t < \varepsilon_i \le t + s_n Y_i \le t + \delta\}\bigr| = |B_n(t_r, \delta) - B_n(t, \delta)|,$$
where $B_n(t,\delta) = \sqrt{n}\,P\{t < \varepsilon_i \le t + s_n Y_i \le t + \delta\}$. As in the proof of Lemma 5.1, note that
$$|B_n(t_r,\delta) - B_n(t,\delta)| \le \left|\int_0^M k(t_r + s_n u)\,P_{t_r + s_n u}(Y > u)\,du - \int_0^M k(t + s_n u)\,P_{t + s_n u}(Y > u)\,du\right| + 2\Bigl(\sup_{s \in C_\delta} k(s)\Bigr)\int_M^\infty \sup_{s \in C_\delta} P_s(Y > u)\,du.$$

As argued in the proof of Lemma 5.1, the last part of the above inequality can be made arbitrarily small by choosing $M$ large enough. Let
$$\tilde{B}_n(t_r, t, \delta, M) = \int_0^M k(t_r + s_n u)\,P_{t_r + s_n u}(Y > u)\,du - \int_0^M k(t + s_n u)\,P_{t + s_n u}(Y > u)\,du.$$
Repeating the argument in the proof of Lemma 5.1, one sees that $|\tilde{B}_n(t_r, t, \delta, M)|$ is smaller than the maximum of
$$\int_0^M |\mu(t_r + s_n u, \psi(\cdot,u)) - \mu(t + s_n u, \Psi(\cdot,u))|\,du \tag{7.4}$$
and
$$\int_0^M |\mu(t_r + s_n u, \Psi(\cdot,u)) - \mu(t + s_n u, \psi(\cdot,u))|\,du. \tag{7.5}$$
By adding and subtracting $\mu(t, \psi(\cdot,u)) - \mu(t, \Psi(\cdot,u))$ inside the absolute values in (7.4) and (7.5), one bounds each of (7.4) and (7.5) by
$$\int_0^M \sup_{t \in C}\ \sup_{t \le s \le t + \delta_n + s_n M} |\mu(s, \psi(\cdot,u)) - \mu(t, \psi(\cdot,u))|\,du + \int_0^M \sup_{t \in C}\ \sup_{t \le s \le t + \delta_n + s_n M} |\mu(s, \Psi(\cdot,u)) - \mu(t, \Psi(\cdot,u))|\,du + \int_0^M \{\mu(t, \Psi(\cdot,u)) - \mu(t, \psi(\cdot,u))\}\,du.$$
The first two terms converge to zero by the argument given in the proof of Lemma 5.1, and the last term is bounded by $4\eta\,k(t)$. Because $k$ is bounded on $C$, by choosing $\eta$ appropriately one can make this term arbitrarily small. This completes the proof for $\sup_{t \in C} \sqrt{n}\,|p(t,r)|$. The proof for $\sup_{t \in C} \sqrt{n}\,|\bar{p}(t,r)|$ follows the same lines and is therefore omitted.

Next note that for any $\lambda > 0$,
$$P\Bigl(\max_{1 \le r \le m(n)} |\Gamma_n(r)| > \lambda\Bigr) \le m(n)\,\max_{1 \le r \le m(n)} P\{|\Gamma_n(r)| > \lambda\} \le \lambda^{-4}\,m(n)\,\max_{1 \le r \le m(n)} E\{\Gamma_n(r)^4\}.$$
Since $\Gamma_n(r)$ is a normalized sum of i.i.d. mean-zero random variables, one easily gets $E\{\Gamma_n(r)^4\} \le 3\,\{E[\gamma_{i,n}(r)^2]\}^2 + E\{\gamma_{i,n}(r)^4\}/n$. Now, going back to the definition of $\gamma_{i,n}(r)$, one sees that
$$E[\gamma_{i,n}(r)^2] \le P(t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta),$$
which is bounded by $C_0\,s_n$ for some constant $C_0$, by the argument in the proof for $p(t,r)$. Moreover, $E\{\gamma_{i,n}(r)^4\} \le 1$, so $E\{\Gamma_n(r)^4\} \le (3C_0^2 + 1)(s_n^2 + 1/n)$. By (7.2) and (7.3), and because $m(n) = O(1/\delta_n)$, the quantity $\max_{1 \le r \le m(n)} |\Gamma_n(r)|$ goes to zero in probability. Repeating the same argument, one gets the proof for $|\bar{\Gamma}_n(r)|$.

Now consider $D_n(r)$ and note that for any $\lambda > 0$,
$$P\Bigl(\max_{1 \le r \le m(n)} |D_n(r)| > \lambda\Bigr) \le \lambda^{-4}\,m(n)\,\max_{1 \le r \le m(n)} E\{D_n(r)^4\}.$$
Let $F_{i-1,n} = \sigma\{g(X_1), r(X_1), \ldots, g(X_{i-1}), r(X_{i-1}), \varepsilon_1, \ldots, \varepsilon_i\}$ and observe that $d_{i,n}(r)$ is a martingale difference with respect to the filtration $\mathcal{F} = \{F_{i,n}\}$. An application of Rosenthal's inequality (e.g. [Hall and Heyde 1980]) yields
$$E\{D_n(r)^4\} \le c\,n^{-2}\left\{E\left[\sum_{i=1}^n E\{d_{i,n}(r)^2 \mid F_{i-1,n}\}\right]^2 + \sum_{i=1}^n E\{d_{i,n}(r)^4\}\right\}$$
for some positive constant $c$. Now observe that $|d_{i,n}(r)| \le 1$ and $E\{d_{i,n}(r)^2 \mid F_{i-1,n}\} \le E[\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\} \mid \varepsilon_i]$. Replacing in the above and using the fact that the $\varepsilon_i$'s are independent gives
$$E\{D_n(r)^4\} \le c\,\bigl\{P(t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta)^2 + n^{-1}\bigr\} \le c'\,\bigl(s_n^2 + \tfrac{1}{n}\bigr).$$
The last inequality follows from the argument in the proof for $\Gamma_n(r)$, and the proof is then completed along the same lines. The convergence of $\max_{1 \le r \le m(n)} |\bar{D}_n(r)|$ follows from the same argument.

Proof of Lemma 7.2. The proof follows the same steps as that of Lemma 7.1, and the same notations are used, except that in this case the sigma-field is
$$F_{i-1,n} = \sigma\{g(X_1), r(X_1), \ldots, g(X_i), r(X_i), \varepsilon_1, \ldots, \varepsilon_{i-1}\}.$$
Fix $\eta > 0$ and choose $\delta$ such that
$$\sup_{|s - t| < \delta}\ \sup_{x \in \mathcal{X}} |k_f(s; r(x), g(x)) - k_f(t; r(x), g(x))| \le \eta.$$

Such a $\delta$ exists by the equicontinuity of $k_f$.

Now, repeating the steps of the previous proof, one only needs to verify again that $\max_{1 \le r \le m(n)} |\Gamma_n(r)|$ and $\max_{1 \le r \le m(n)} |D_n(r)|$ converge to zero in probability.

First consider $\max_{1 \le r \le m(n)} |\Gamma_n(r)|$. Observe that
$$\gamma_{i,n}(r) = \psi(g(X_i))\int_{t_r}^{t_r + \min(s_n Y_i, \delta)} k_f(s; r(X_i), g(X_i))\,ds - E\{\psi(g(X_i))\,\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\}\},$$
hence
$$\Gamma_n(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(g(X_i))\int_{t_r}^{t_r + \min(s_n Y_i, \delta)} \bigl\{k_f(s; r(X_i), g(X_i)) - k_f(t_r; r(X_i), g(X_i))\bigr\}\,ds$$
$$+ \frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\psi(g(X_i))\int_{t_r}^{t_r + \min(s_n Y_i, \delta)} k_f(t_r; r(X_i), g(X_i))\,ds - E\left\{\psi(g(X_i))\int_{t_r}^{t_r + \min(s_n Y_i, \delta)} k_f(t_r; r(X_i), g(X_i))\,ds\right\}\right)$$
$$+ \sqrt{n}\,E\left\{\psi(g(X_i))\int_{t_r}^{t_r + \min(s_n Y_i, \delta)} \bigl\{k_f(t_r; r(X_i), g(X_i)) - k_f(s; r(X_i), g(X_i))\bigr\}\,ds\right\}.$$
Letting
$$S_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \Bigl(\psi(g(X_i))\,\min(s_n Y_i, \delta)\,k_f(t; r(X_i), g(X_i)) - E\bigl[\psi(g(X_i))\,\min(s_n Y_i, \delta)\,k_f(t; r(X_i), g(X_i))\bigr]\Bigr),$$
it follows from the equicontinuity of $k_f(\cdot\,; r(X_i), g(X_i))$ that
$$|\Gamma_n(r)| \le \eta\,s\,\Bigl\{\frac{1}{n}\sum_{i=1}^n Y_i + E(Y)\Bigr\} + |S_n(t_r)|.$$
Next observe that, for any fixed $t \in C$, $S_n(t)$ converges to zero in probability, since the $X_i$'s are ergodic and stationary. Now let $s_1, \ldots, s_m$ be a finite partition of the set $C$ with mesh $\delta$. Then any $t \in C$ belongs to an interval $[s_j, s_{j+1}]$ for some $1 \le j \le m-1$. Moreover, $|S_n(t)| \le |S_n(t) - S_n(s_j)| + |S_n(s_j)|$. From the choice of $\delta$ one has
$$|S_n(t) - S_n(s_j)| \le \eta\,s\,\Bigl\{\frac{1}{n}\sum_{i=1}^n Y_i + E(Y)\Bigr\}.$$
This yields
$$\max_{1 \le r \le m(n)} |\Gamma_n(r)| \le 2\eta\,s\,\Bigl\{\frac{1}{n}\sum_{i=1}^n Y_i + E(Y)\Bigr\} + \max_{1 \le j \le m} |S_n(s_j)|.$$

This goes to zero in probability, since $\eta$ can be chosen arbitrarily small, $\frac{1}{n}\sum_{i=1}^n Y_i + E(Y)$ converges in probability to $2E(Y)$, and the number of $s_j$'s is finite.

Now consider $D_n(r)$ and note that for any $\lambda > 0$,
$$P\Bigl(\max_{1 \le r \le m(n)} |D_n(r)| > \lambda\Bigr) \le \lambda^{-4}\,m(n)\,\max_{1 \le r \le m(n)} E\{D_n(r)^4\}.$$
Since $d_{i,n}(r)$ is a martingale difference with respect to the filtration $\mathcal{F} = \{F_{i,n}\}$, an application of Rosenthal's inequality yields
$$E\{D_n(r)^4\} \le c\,n^{-2}\left\{E\left[\sum_{i=1}^n E\{d_{i,n}(r)^2 \mid F_{i-1,n}\}\right]^2 + \sum_{i=1}^n E\{d_{i,n}(r)^4\}\right\}$$
for some positive constant $c$. Again $|d_{i,n}(r)| \le 1$ and
$$E\{d_{i,n}(r)^2 \mid F_{i-1,n}\} \le E\bigl[\mathbb{1}\{t_r < \varepsilon_i \le t_r + s_n Y_i \le t_r + \delta\} \mid F_{i-1,n}\bigr] = \int_{t_r}^{t_r + \min(s_n Y_i, \delta)} k_f(s; r(X_i), g(X_i))\,ds \le M s_n Y_i.$$
The last inequality follows from the fact that $k_f$ is uniformly bounded by some $M$. Using Jensen's inequality, it follows that
$$E\{D_n(r)^4\} \le \frac{c_1}{n},$$
where $c_1 = c[M^2 s^2 E\{r(X)^2\} + 1]$, which implies
$$P\Bigl(\max_{1 \le r \le m(n)} |D_n(r)| > \lambda\Bigr) \le \frac{c_1\,m(n)}{n\,\lambda^4}.$$
The above goes to zero since $m(n)/n$ goes to zero.

REFERENCES

Abdous, B. and Remillard, B. (1995). Relating quantiles and expectiles under weighted-symmetry. Ann. Inst. Statist. Math., 47, 371-384.

Abdous, B., Ghoudi, K. and Remillard, B. (1997). Signed Kendall processes. Rapport technique du Laboratoire de Recherche en Probabilités et Statistique, U.Q.T.R., 7, 20 pages.

Alexander, K. S. (1987). The central limit theorem for weighted empirical processes indexed by sets. J. Multivariate Anal., 22, 313-339.

Bai, J. (1994). Weak convergence of the sequential empirical processes of residuals in ARMA models. The Annals of Statistics, 22, 2051-2061.

Barbe, P., Genest, C., Ghoudi, K. and Remillard, B. (1996). On Kendall's process. J. Multivariate Anal., 58, 197-229.

Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press.

de la Horra, J. and Fernandez, C. (1995). Sensitivity to prior independence via Farlie-Gumbel-Morgenstern model. Comm. Statist. Theory Meth., 24, 987-996.

Genest, C. and Rivest, L.-P. (1993). Statistical inference procedures for bivariate Archimedean copulas. J. Amer. Statist. Assoc., 88, 1034-1043.

Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. Academic Press.

Joe, H. (1990). Multivariate concordance. J. Multivariate Anal., 35, 12-30.

Koul, H. L. and Ossiander, M. (1994). Weak convergence of randomly weighted dependent residual empiricals with applications to autoregression. The Annals of Statistics, 22, 540-562.

Koul, H. L. (1996). Asymptotics of some estimators and sequential residual empiricals in nonlinear time series. The Annals of Statistics, 24, 380-404.

Kulperger, R. J. (1996). Bootstrapping empirical distribution functions of residuals from autoregressive model fitting. Commun. Statist. Simula., 25, 657-670.

Loynes, R. M. (1980). The empirical distribution function of residuals from generalized regression. The Annals of Statistics, 8, 285-298.

Mammen, E. (1996). Empirical process of residuals for high-dimensional linear models. The Annals of Statistics, 24, 307-335.

Meester, S. G. and Lockhart, R. A. (1988). Testing for normal errors in designs with many blocks. Biometrika, 75, 569-575.

Munkres, J. R. (1975). Topology: A First Course. Prentice-Hall.

Parzen, E. (1997). Statistical methods mining, two-sample data analysis, comparison distributions and quantile limit theorems. Unpublished work.

Shorack, G. R. (1984). Empirical and rank processes of observations and residuals. The Canadian Journal of Statistics, 12, 319-332.

Département de mathématiques et d'informatique, Université du Québec à Trois-Rivières, Trois-Rivières (Québec), Canada G9A 5H7

E-mail address: [email protected], [email protected]