
Journal of Statistical Planning and Inference 139 (2009) 1421–1434


Kernel regression estimation in a Banach space

Sophie Dabo-Niang (a,∗), Noureddine Rhomari (b)

(a) Laboratory GREMARS-EQUIPPE, University Charles De Gaulle, Villeneuve d'Ascq, France
(b) Department of Mathematics, Faculty of Sciences, University Mohammed I, Oujda, Morocco


Article history: Received 25 October 2006; Received in revised form 17 June 2008; Accepted 24 June 2008; Available online 19 August 2008.

MSC: 62G08

Keywords: Kernel estimates; Regression function; Banach-valued random vectors; Metric-valued random vectors

Abstract: In this paper, we study a nonparametric regression estimator when the response variable is in a separable Banach space and the explanatory variable takes values in a separable semi-metric space. Under general conditions, we establish some asymptotic results and give upper bounds for the p-mean and almost sure (pointwise and integrated) estimation errors. Finally, we present the case where the explanatory variable is the Wiener process.

Crown Copyright © 2008 Published by Elsevier B.V. All rights reserved.

1. Introduction

Let $(X, Y)$ be a pair of random vectors (r.v.) taking values in $\mathcal{X} \times \mathcal{Y}_o$, and let $\varphi : \mathcal{Y}_o \to \mathcal{Y}$ be a given measurable function, where $(\mathcal{X}, d)$ is a separable semi-metric space and $(\mathcal{Y}_o, \|\cdot\|_o)$ and $(\mathcal{Y}, \|\cdot\|)$ are separable Banach spaces (possibly infinite dimensional) equipped with their Borel $\sigma$-fields. We assume that $E\|\varphi(Y)\| < \infty$. The distribution of $(X, Y)$ is often unknown, and so is the regression function $m_\varphi(x) = E(\varphi(Y) \mid X = x)$. The goal of this work is the estimation of this unknown regression function $m_\varphi(\cdot)$ by $m_{\varphi,n}(\cdot)$, a function of a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ identically distributed as $(X, Y)$. The present framework includes the classical case where $\mathcal{X}$ and $\mathcal{Y}$ are finite dimensional, but also spaces of infinite dimension. This problem has been widely studied when the vector $(X, Y)$ lies in a finite dimensional space (there are many references on this topic), much less so in the infinite dimensional observation case. The present work generalizes and extends the paper of Dabo-Niang and Rhomari (2003), which deals with a scalar response variable $Y$.

The spaces $\mathcal{X}$, $\mathcal{Y}_o$ and $\mathcal{Y}$ may be, for example, separable function spaces or simply separable subspaces of some $L^\tau_{\mathbb{R}}(E, \mu)$, where $(E, \mathcal{E}, \mu)$ is a measure space and $0 < \tau \le \infty$. An interesting case arises when the $(X_i)_{i=1,\ldots,n}$ and $(Y_i)_{i=1,\ldots,n}$ are random curves, for instance in $C(\mathbb{R})$, $C[0, T]$ or some subspaces of $L^\tau(\mathbb{R})$. Indeed, many phenomena in various areas (e.g. weather, medicine) are continuous in time and may, or must, be represented by curves.

Recently, the statistics of functional data have received growing attention. These questions in infinite dimension are particularly interesting, not only for the fundamental problems they raise, but also for the many applications they may allow; see Ramsay and Silverman (1997, 2002), Bosq (2000) and Ferraty and Vieu (2006). For example, consider the weather data for a country. One can investigate to what extent the total annual precipitation at weather stations can be predicted from the pattern of temperature variation through the year: let $T_Y$ be some set, and let $Y = (Y(t),\, t \in T_Y)$ be the logarithm of complete annual

∗ Corresponding author at: Laboratory GREMARS-EQUIPPE, University Charles De Gaulle, Villeneuve d'Ascq, France. Tel.: +33 320 416 510; fax: +33 320 416 171. E-mail addresses: [email protected] (S. Dabo-Niang), [email protected] (N. Rhomari).

0378-3758/$ - see front matter. Crown Copyright © 2008 Published by Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.06.015


precipitation at a weather station, and let $X = (X(t),\, t \in [0, T])$ be its temperature function, where the interval $[0, T]$ is either $[0, 12]$ or $[0, 365]$ depending on whether monthly or daily data are used. Then the prediction problem of interest here is the estimation of $E(Y \mid X)$. For more examples, we refer to Ramsay and Silverman (1997, 2002) and the references therein.

The introduction of the function $\varphi$ allows us to cover and estimate various parameters (which may be functional) associated with the distribution of $(X, Y)$; for example: (i) if $\varphi(Y)(\cdot) = \langle Y, \cdot\rangle Y$, where $\langle y, y^*\rangle := y^*(y)$ for $y^*$ in the topological dual $\mathcal{Y}^*$ of $\mathcal{Y}$, then $m_\varphi(x)$ corresponds to the conditional covariance operator of $Y$; (ii) if, for a fixed measurable set $A$, $\varphi(Y) = 1_A(Y)$, then $m_\varphi(x)$ corresponds to the conditional distribution of $Y$ given $X = x$.

The popular nonparametric estimate of $m_\varphi(x)$, $x \in \mathcal{X}$, is a locally weighted average, defined by

$$m_{\varphi,n}(x) = \sum_{i=1}^{n} W_{ni}(x)\,\varphi(Y_i), \qquad (1)$$

where $(W_{n1}(x), \ldots, W_{nn}(x))$ is a probability vector of weights and each $W_{ni}(x)$ is a Borel measurable function of $x, X_1, \ldots, X_n$. The focus of the present paper is on the kernel-type estimate given by

$$W_{ni}(x) = \frac{K(d(X_i, x)/h_n)}{\sum_{j=1}^{n} K(d(X_j, x)/h_n)} \qquad (2)$$

if $\sum_{j=1}^{n} K(d(X_j, x)/h_n) \neq 0$, and $W_{ni}(x) = 0$ otherwise, where $(h_n)_n$ is a sequence of positive numbers and $K$ is a nonnegative measurable function on $\mathbb{R}$.
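For illustration, here is a minimal Python sketch of the estimate (1)-(2) for a generic semi-metric and possibly curve-valued responses $\varphi(Y_i)$; the default boxcar kernel is $K(u) = 1_{\{|u|\le 1\}}$, which satisfies condition (5) below with $a = b = r = 1$. The function and argument names are our own illustration, not code from the paper.

```python
import numpy as np

def kernel_regression(x, X, Y, h, metric,
                      K=lambda u: float(abs(u) <= 1.0)):
    """Kernel estimate (1)-(2): weighted average of the responses phi(Y_i)
    with weights K(d(X_i, x)/h) normalized to sum to one; the convention
    0/0 = 0 applies when no X_i falls within distance r*h of x."""
    w = np.array([K(metric(xi, x) / h) for xi in X], dtype=float)
    total = w.sum()
    if total == 0.0:
        return np.zeros_like(np.asarray(Y[0], dtype=float))
    return (w / total) @ np.asarray(Y, dtype=float)
```

With `metric` the sup-norm distance between discretized curves and the responses replaced by indicators $1_A(Y_i)$, the same call estimates the conditional probability $P(Y \in A \mid X = x)$, as in example (ii) above.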

When $(X_i, Y_i) \in \mathbb{R}^d \times \mathbb{R}^k$, the estimation of $m_{Id}$ ($Id$ := identity) was treated by many authors with various weights, including the nearest neighbor and kernel methods. For the references cited hereafter, we limit ourselves to the general case where no assumption is made concerning the existence of a density with respect to some reference measure. See, among others, Stone (1977), Devroye and Wagner (1980), Spiegelman and Sacks (1980), Devroye (1981), Györfi (1981), Greblicki et al. (1984), Krzyżak (1986) and Györfi et al. (2002).

The literature on this topic in infinite dimension is, to the best of our knowledge, relatively limited. Bosq (1983) and Bosq and Delecroix (1985) deal with general kernel predictors for Hilbert-valued Markovian processes. In the case of independent observations, Kulkarni and Posner (1995) studied nearest neighbor estimation in a general separable metric space $\mathcal{X}$, with $Y$ valued in a finite dimensional space and $\varphi = Id$. They derived a rate of convergence connected to metric covering numbers in $\mathcal{X}$. For the kernel method, when $Y$ is a scalar random variable, $X$ takes values in a semi-normed space, $\varphi = Id$, $Y$ is bounded and $m_{Id}$ is continuous at $x$, Ferraty and Vieu (2000) proved the almost sure (a.s.) convergence of $m_{\varphi,n}(x)$ under a fractal condition on the distribution of $X$. This work has been extended to the dependent case, under similar conditions, by Ferraty et al. (2002) and Masry (2005), who studied the a.s. convergence and the asymptotic normality, respectively. Kulkarni (2002) considered kernel and nearest neighbor estimation of the regression function of arbitrary processes valued in a metric space. Biau et al. (2005) studied the nearest-neighbor-type functional classification problem in a Hilbert space. Recently, Ferraty and Vieu (2006) used a concentration condition to prove the almost complete convergence of the regression estimate when the response variable is scalar and the explanatory variable is valued in a semi-metric space.

In Dabo-Niang and Rhomari (2003), we established the same statement as Ferraty and Vieu (2000) under general assumptions. We required that $\lim_{n\to\infty} nP(X \in B^x_{h_n})/\log n = \infty$ (where $B^x_h$ denotes the closed ball of radius $h$ and center $x$ for the semi-metric $d$ of $\mathcal{X}$), which is fulfilled for all random variables $X$ and some class of sequences $(h_n)$ converging to 0. We also assumed that the regression function satisfies a local differentiation condition. Dabo-Niang and Rhomari (2003) extends kernel regression estimation to the case where the response variable $Y$ is scalar and $X$ takes values in a general separable semi-metric space. We proved that $m_{Id,n}$ is consistent in p-mean and almost surely.

In this paper we extend the results of Dabo-Niang and Rhomari (2003), Devroye (1981) and Ferraty and Vieu (2000, 2006) to the case where the response variable $Y$ and $\varphi(Y)$ belong to separable Banach spaces and $X$ takes values in a general separable semi-metric space (possibly infinite dimensional). We also give rates of convergence which are optimal in the case of finite dimensional $\mathcal{X}$. Precisely, we study the kernel estimate and find sufficient conditions on $W_{ni}(x)$ that ensure

$$\lim_{n\to\infty} E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p) = 0 \qquad (3)$$

for $x$ in $\mathcal{X}$, when $E(\|\varphi(Y)\|^p) < \infty$, $p \ge 1$. We show the pointwise and integrated (i.e. $L^1(\mu)$-error: $\int \|m_{\varphi,n} - m_\varphi\|\,d\mu$) strong convergence of $m_{\varphi,n}$, when the r.v. $\varphi(Y)$ is bounded, under general conditions ($m_\varphi$ may be discontinuous). We make the rate of convergence precise for each type of consistency. The conditions and bounds are written in terms of $\mu(B^x_{h_n}) = P(X \in B^x_{h_n})$ so as to unify regression estimation with finite and infinite dimensional explanatory variables. The results, especially the bounds on the estimation errors, highlight how the probability of small balls for $X$ is linked with the rates of convergence. Finally, we apply these results to a special and important case where $X$ is the standard Wiener process ($\mu$ is the Wiener measure), with $\mathcal{X} = C[0,1]$ equipped with the sup-norm.


The rest of the paper is organized as follows. Section 2 presents the assumptions and comments on them. Section 3 contains results on pointwise consistency in p-mean and strong consistency; we also study the integrated error in the special case of bounded $\varphi(Y)$, and apply the results to the particular case of the Wiener process in $\mathcal{X} = C[0,1]$ (see e.g. Dabo-Niang, 2002). Section 4 gathers further comments, and the proofs are given in Section 5.

2. Assumptions and comments

We suppose that $\mathcal{Y}$ is a separable Banach space of type $s$, $1 < s \le 2$ (see e.g. Ledoux and Talagrand, 1991). Recall from Hoffmann-Jørgensen and Pisier (1976) that $\mathcal{Y}$ is a Banach space of type $s$ if and only if there exists a constant $C_s$ such that for all independent random variables $Y_1, \ldots, Y_n$ in $\mathcal{Y}$, with mean 0 and finite $s$th moment, $E\|\sum_{i=1}^n Y_i\|^s \le C_s \sum_{i=1}^n E\|Y_i\|^s$. For a measurable space $(\Omega, \mathcal{A}, \nu)$ with a $\sigma$-finite measure $\nu$, the function space $L^p_{\mathbb{R}}(\Omega, \mathcal{A}, \nu)$ is of type $s = \min(p, 2)$, for $1 \le p < \infty$. We also assume that the semi-metric space $(\mathcal{X}, d)$ is separable.

Let $\mu$ denote the probability distribution of $X$. The main assumption needed here is a differentiation condition for $m_\varphi$: for $x \in \mathcal{X}$, we suppose that, for some $p \ge 1$,

$$\lim_{h\to 0} \frac{1}{\mu(B^x_h)} \int_{B^x_h} \|m_\varphi(w) - m_\varphi(x)\|^p \, d\mu(w) = 0. \qquad (4)$$

In the finite dimensional case, this condition holds for all $\mu$, all functions in $L^p(\mu)$ and $\mu$-almost all $x$ (cf. e.g. Parthasarathy, 1967; Wheeden and Zygmund, 1977; Devroye, 1981; Vakhania et al., 1987), in contrast to the case of an infinite dimensional space (cf. Preiss, 1979; see Dabo-Niang and Rhomari, 2003 for some comments). However, condition (4) obviously holds if $m_\varphi$ is continuous at $x$ (also for all Borel locally finite measures); the converse is false in general. It may also be fulfilled for $\mu$-almost all $x$ and all functions in $L^p(\mu)$ for certain Gaussian measures $\mu$, as proved in Tišer (1988) (see Remark 1 below). Here, we prove that, as in finite dimension (Devroye, 1981), (4) and $E(\|\varphi(Y)\|^p) < \infty$, for some $p \ge 1$, imply (3), even for an infinite dimensional space $\mathcal{X}$.

Let us assume that there exist positive numbers $r$, $a$ and $b$ such that

$$a\,1_{\{|u| \le r\}} \le K(u) \le b\,1_{\{|u| \le r\}}, \qquad (5)$$

and for a fixed $x$ in $\mathcal{X}$ suppose that

$$h_n \to 0 \quad\text{and}\quad \lim_{n\to\infty} n\mu(B^x_{rh_n}) = \infty. \qquad (6)$$

We also assume that

$$E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh_n}] = o([n\mu(B^x_{rh_n})]^{p - p/s}). \qquad (7)$$

Remark 1.

1. An example where (4) does not imply that $m_\varphi$ is continuous at $x$ is Dirichlet's function, defined on $[0,1]$ by $m_\varphi(x) = 1_{\{x \text{ rational in } [0,1]\}}$. This function fulfills (4) when $\mu$ is a probability measure absolutely continuous with respect to Lebesgue measure on $[0,1]$, but it is nowhere continuous (example from Krzyżak, 1986). In infinite dimension one can, instead of the rationals, take some countable dense set and a continuous measure.

2. If $m_\varphi$ satisfies (4) for $p = 1$ and is bounded in a neighborhood of $x$, then (4) is true for all $p$.
3. When $\varphi(Y)$ is bounded, condition (7) is obviously satisfied.
4. Remark that

$$E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh}] = \frac{1}{\mu(B^x_{rh})}\, E[\|\varphi(Y) - m_\varphi(X)\|^p\, 1_{B^x_{rh}}(X)],$$

so condition (7) is satisfied, for example, when (6) holds and either $E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh_n}] = O(1)$ or $\lim_{n\to\infty} n(\mu(B^x_{rh_n}))^{1+s/(p(s-1))} = \infty$.
5. Let $\mathcal{Y} = \mathbb{R}$, let $\mathcal{X}$ be a Hilbert space and let $\mu$ be a Gaussian measure whose covariance operator has the spectral representation $Rx = \sum_i c_i \langle x, e_i\rangle e_i$, where $(e_i)$ is an orthonormal system in $\mathcal{X}$. If $c_{i+1} \le c_i\, i^{-\alpha}$ for a given $\alpha > 5/2$, then Eqs. (4) and (7) are valid for $\mu$-almost all $x$ in $\mathcal{X}$, when $E|\varphi(Y)|^{p'} < \infty$ and $p' > p \ge 1$ (see Tišer, 1988). In this case, Theorem 1 holds for $\mu$-almost all $x$ in $\mathcal{X}$.
6. Recall that Eqs. (4) and (7) hold for $\mu$-almost all $x$ and all $\mu$ in the finite dimensional space $\mathcal{X} = \mathbb{R}^d$ whenever $E|Y|^p < \infty$; see Devroye (1981) or Wheeden and Zygmund (1977). In this case, $h^d = O(\mu(B^x_h))$ for $\mu$-almost all $x$ and all probability measures $\mu$ on $\mathbb{R}^d$, so condition (6) is implied by $h_n \to 0$ and $\lim_{n\to\infty} nh^d_n = \infty$. The formulation of the assumptions and of the bounds on the estimation errors in terms of the probability $\mu(B^x_h)$ thus unifies the two cases of finite and infinite dimensional explanatory variables. It should be noted that the evaluation of $\mu(B^x_h)$, known as the probability of small balls, is very difficult in infinite dimension (see e.g. Lifshits, 1995).


3. Main results

3.1. p-Mean consistency

Theorem 1. If $E(\|\varphi(Y)\|^p) < \infty$ and if (5)–(7) hold, then

$$E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p) \longrightarrow 0 \quad \text{as } n \to \infty$$

for all $x$ in $\mathcal{X}$ such that $m_\varphi$ satisfies (4).

If $\mu$ is discrete, the result in Theorem 1 holds for all points in the support of $\mu$ whenever $E(\|\varphi(Y)\|^p) < \infty$ and $h_n \to 0$.

Rate of p-mean convergence: The next result gives a rate of p-mean convergence for $p \ge s$, where $s$ is the type of the Banach space $\mathcal{Y}$. To provide bounds on the estimation error, we evaluate the bias. To this end, we need to refine condition (4). We suppose that $m_\varphi$ is "p-mean Lipschitz" in a neighborhood of $x$, with parameters $0 < \beta = \beta_x \le 1$ and $c = c_x > 0$, that is,

$$\frac{1}{\mu(B^x_h)} \int_{B^x_h} \|m_\varphi(w) - m_\varphi(x)\|^p \, d\mu(w) \le c_x h^{\beta p} \quad \text{as } h \to 0. \qquad (8)$$

Condition (8) is satisfied if $m_\varphi$ is $\beta$-Lipschitz with parameters $\beta = \beta_x$ and $c_x$, but it is weaker than the Lipschitz condition and does not in general imply continuity at $x$ (see the example in Remark 1); a one-line verification of this implication is given below. A similar condition was used by Krzyżak (1986).
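For completeness, here is the one-step verification (a direct consequence of the definitions, spelled out by us) that a local Lipschitz bound implies (8): if $\|m_\varphi(w) - m_\varphi(x)\| \le c_x\, d(w,x)^{\beta}$ on $B^x_h$, then $d(w,x) \le h$ there, so

$$\frac{1}{\mu(B^x_h)} \int_{B^x_h} \|m_\varphi(w) - m_\varphi(x)\|^p\,d\mu(w) \le \frac{1}{\mu(B^x_h)} \int_{B^x_h} c_x^p\, h^{\beta p}\,d\mu(w) = c_x^p\, h^{\beta p},$$

which is (8) with the constant $c_x^p$ in place of $c_x$.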

Corollary 1. If $E(\|\varphi(Y)\|^p) < \infty$, $E(\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh_n}) = O(1)$ and if (5), (6) and (8) are satisfied, with $p \ge s$, then we have

$$E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p) = O\!\left(h_n^{\beta p} + \left(\frac{1}{n\mu(B^x_{rh_n})}\right)^{p - p/s}\right).$$

Note that the results, especially the bounds on the estimation errors, highlight how the probability of small balls related to the explanatory variable influences the convergence.

Remark 2 (when s = 2).

1. When the response variable $\varphi(Y)$ is scalar, i.e. $\mathcal{Y} = \mathbb{R}$, $s = 2$, we obtain the same bound as in Dabo-Niang and Rhomari (2003).
2. It is obvious that if $\varphi(Y)$ is bounded, we deduce from Theorem 1 that $E(\|m_{\varphi,n}(X) - m_\varphi(X)\|^p) = \int E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p)\,d\mu(x) \to 0$ as $n \to \infty$, for all $p > 0$, when (4) and (6) hold for $\mu$-almost all $x$.
3. If we assume that $\limsup_{h\to 0} h^{a(x)}/\mu(B^x_h) < \infty$ for some $a(x) > 0$, we obtain, for $h = cn^{-1/(2\beta + a(x))}$,

$$E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p) = O(n^{-p\beta/(2\beta + a(x))}), \quad p \ge 2$$

(see the balancing computation after this remark).
4. In the finite dimensional case $\mathcal{X} = \mathbb{R}^d$, the best rate derived from the bound in Corollary 1 is optimal (in the sense that it is both a lower and an achievable rate of convergence, see Stone, 1982), because $h^d = O(\mu(B^x_h))$ for $\mu$-almost all $x$ and all probability measures $\mu$ on $\mathbb{R}^d$; see Devroye (1981, Lemma 2.2 and its proof). In this case, we obtain the same rate as above with $d$ instead of $a(x)$.
5. Spiegelman and Sacks (1980) established in $\mathbb{R}^d$ the same rate for $E(|m_{Id,n}(X) - m_{Id}(X)|^2)$ in the case where $p = 2$, $\beta = 1$ and $Y$ is bounded. Krzyżak (1986) obtained, for a more general kernel, rates in probability and for a.s. convergence under condition (8) and unbounded response $Y$.
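To make the origin of the exponent in item 3 explicit, here is the standard balancing computation (spelled out by us, for $s = 2$ and $\mu(B^x_h) \asymp h^{a(x)}$). The bound of Corollary 1 is, up to constants, $h^{\beta p} + (nh^{a(x)})^{-p/2}$, and the two terms are of the same order when

$$h^{\beta p} \asymp (n h^{a(x)})^{-p/2} \iff h \asymp n^{-1/(2\beta + a(x))},$$

which gives $E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p) = O(n^{-p\beta/(2\beta + a(x))})$; with $a(x)$ replaced by $d$, this is Stone's (1982) optimal rate for $\beta$-smooth regression on $\mathbb{R}^d$.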

3.2. Strong consistency

Below, we give pointwise and $\mu$-integrated strong convergence results for the kernel estimate, and we make the rates precise. In order to simplify the proofs, we assume that $\|\varphi(Y)\| \le M_\varphi < \infty$ a.s.; with truncation techniques the results can be extended to unbounded $\varphi(Y)$, see Remark 4 below.

Theorem 2. If $\varphi(Y)$ is bounded, (5) holds, $h_n \to 0$ and $\lim_{n\to\infty} n\mu(B^x_{rh_n})/\log n = \infty$, then

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| \longrightarrow 0 \quad \text{a.s., as } n \to \infty \qquad (9)$$

for all $x$ in $\mathcal{X}$ such that $m_\varphi$ satisfies (4).


If in addition (4) holds for $\mu$-almost all $x$,

$$\lim_{n\to\infty} E(\|m_{\varphi,n}(X) - m_\varphi(X)\| \mid X_1, Y_1, \ldots, X_n, Y_n) = 0 \quad \text{a.s.} \qquad (10)$$

Rate of convergence: We now establish the rate of strong consistency.

Corollary 2. Let us assume that $\|\varphi(Y)\| \le M_\varphi$. If (5) and (8) for $p = 1$ hold, if $h_n \to 0$ and $\lim_{n\to\infty} n\mu(B^x_{rh_n})/\log n = \infty$, then we have

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| = O\!\left(h_n^{\beta} + \left(\frac{\log n}{n\mu(B^x_{rh_n})}\right)^{1/2} + \left(\frac{1}{n\mu(B^x_{rh_n})}\right)^{1 - 1/s}\right) \quad \text{a.s.}$$

More precisely, we obtain, for large $n$, a.s.,

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| \le c_x (b/a)(rh_n)^{\beta} + 6M_\varphi(b/a)\left(\frac{n\mu(B^x_{rh_n})}{\log n}\right)^{-1/2} + C_s M_\varphi 2^{2-1/s} (b/a)^{1-1/s} ((n-1)\mu(B^x_{rh_n}))^{-1+1/s}.$$

The $6M_\varphi$ can be replaced by some a.s. bound on

$$2\max\!\left(\sqrt{E[\|m_\varphi(X) - m_\varphi(x)\|^2 \mid X \in B^x_{rh_n}]},\; 2\sqrt{E[\|\varphi(Y)\|^2 \mid X]}\,1_{B^x_{rh_n}}(X)\right).$$

But if $m_\varphi$ is Lipschitzian in a neighborhood of $x$, with parameters $0 < \beta = \beta_x \le 1$ and $c_x > 0$, this bound becomes

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| \le c_x (b/a)(rh_n)^{\beta} + 2M_\varphi(b/a)\left(\frac{n\mu(B^x_{rh_n})}{\log n}\right)^{-1/2} + C_s M_\varphi 2^{2-1/s} (b/a)^{1-1/s} ((n-1)\mu(B^x_{rh_n}))^{-1+1/s}.$$

The $2M_\varphi$ can be replaced by some a.s. bound on

$$2\sqrt{E[\|\varphi(Y)\|^2 \mid X]}\,1_{B^x_{rh_n}}(X).$$

Remark 3. When $\mathcal{Y}$ is a separable Hilbert space, $s = 2$ and the terms where $s$ appears can be omitted, i.e. we have

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| \le c_x(b/a)(rh_n)^{\beta} + 6M_\varphi(b/a)\left(\frac{n\mu(B^x_{rh_n})}{\log n}\right)^{-1/2}.$$

As above, the same remarks about $M_\varphi$ remain valid.

Remark 4. The proof of Corollary 2 shows that one can replace the boundedness assumption on $\varphi(Y)$ by a Cramér condition, namely that there exists a positive constant $c$ such that $E(e^{c\|\varphi(Y)\|}) < \infty$; in this case we easily have $\max_{1\le i\le n}\|\varphi(Y_i)\| = O(\log n)$ a.s. Therefore $M_\varphi$ may be taken as $M_\varphi = O(\log n)$, and the condition linking $n$ and $h_n$ becomes $\lim_{n\to\infty} n\mu(B^x_{rh_n})/\log^3 n = \infty$. Then (9) is obtained whenever $h_n \to 0$, $\lim_{n\to\infty} n\mu(B^x_{rh_n})/\log^3 n = \infty$ and (4) holds. We could assume only the existence of a polynomial moment, but in that case the $\log n$ in the condition becomes a power of $n$.

Remark 5 (for Banach spaces of type s = 2).

1. When the response variable $\varphi(Y)$ is scalar ($s = 2$), a similar bound was obtained in Dabo-Niang and Rhomari (2003).
2. If $\limsup_{h\to 0} h^{a(x)}/\mu(B^x_h) < \infty$, we find, for $h = c(n/\log n)^{-1/(2\beta + a(x))}$, that

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| = O((n/\log n)^{-\beta/(2\beta + a(x))}) \quad \text{a.s.}$$

3. When $Y$ takes values in $\mathbb{R}$ (and $\varphi = Id$), Ferraty and Vieu (2000) gave a rate of convergence slower than that of Corollary 2. They assumed in addition that $m_\varphi$ is Lipschitzian and that a positive number $b(x)$ exists such that $\mu(B^x_h) = h^{a(x)}c(x) + O(h^{a(x)+b(x)})$. Under these conditions, they obtained the rate $O((n/\log n)^{-\gamma(x)/(2\gamma(x)+a(x))})$, where $\gamma(x) = \min\{b(x), \beta\}$.
4. In finite dimension, Krzyżak (1986) obtained similar rates in probability and for a.s. convergence in the bounded case, but weaker ones for unbounded $Y$ ($\varphi = Id$).


3.3. Example in infinite dimension

In the simple but interesting case where $\mathcal{X} = C[0,1]$ (equipped with the sup-norm) is the set of all continuous real functions defined on $[0,1]$, let $X = W$ be the standard Wiener process. Hence, $\mu = P_w$ is the Wiener measure. Define the set

$$S = \left\{ x \in C[0,1] : x(0) = 0,\ x \text{ absolutely continuous and } \int_0^1 x'(t)^2\,dt < \infty \right\},$$

where $'$ stands for the derivative. Then (see Csáki, 1980),

$$\exp\!\left(-\tfrac{1}{2}\int_0^1 x'(t)^2\,dt\right) P_w(B^0_h) \le P_w(B^x_h) \le P_w(B^0_h)$$

as $h \to 0$ and $x \in S$. In addition, $P_w(B^0_h)/\exp(-\pi^2/(8h^2)) \to 1$ as $h \to 0$, see Bogachev (1998) (cf. also Lifshits, 1995, Section 18). Hence, for $x \in S$,

$$\exp\!\left(-\tfrac{1}{2}\int_0^1 x'(t)^2\,dt\right) \le \liminf_{h\to 0} \frac{P_w(B^x_h)}{\exp(-\pi^2/(8h^2))} \le \limsup_{h\to 0} \frac{P_w(B^x_h)}{\exp(-\pi^2/(8h^2))} \le 1.$$

From these inequalities, we deduce that for $h = (c\log n)^{-1/2}$ we have $c(x)\,n^{-c\pi^2/8} \le P_w(B^x_h) \le n^{-c\pi^2/8}$ for large $n$. If $h = (c\log n)^{-1/2}$ with $c < 8/\pi^2$, then this last bound and Corollary 1 imply

$$E(\|m_{\varphi,n}(x) - m_\varphi(x)\|^p) = O((\log n)^{-\beta p/2}), \quad p \ge s.$$

A strong rate of convergence can also be given when $\varphi(Y)$ is bounded. Note that for all $x \in S$ and all $a > 0$ we have $\lim_{h\to 0} \mu(B^x_h)/h^a = 0$; hence the standard Wiener measure on $(C[0,1], \|\cdot\|_\infty)$ has an infinite fractal dimension and thus does not satisfy the fractal condition of Ferraty and Vieu (2000, 2006) and Ferraty et al. (2002) (cf. Remark 5).
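As a quick numerical illustration of these small-ball asymptotics (our own sketch, not part of the original paper), one can simulate Wiener paths on a grid and compare the empirical probability $P_w(B^0_h) = P(\sup_{t\le 1}|W(t)| \le h)$ with $\exp(-\pi^2/(8h^2))$. The agreement is only approximate for moderate $h$, since the equivalence is an $h \to 0$ statement, and the grid maximum slightly underestimates the true supremum.

```python
import numpy as np

rng = np.random.default_rng(0)

def wiener_paths(n_paths, n_steps=500):
    """Standard Wiener paths on [0, 1], simulated as cumulative sums of
    independent N(0, dt) increments over an equispaced grid."""
    dt = 1.0 / n_steps
    return np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)

paths = wiener_paths(20_000)
sup_abs = np.max(np.abs(paths), axis=1)
for h in (1.0, 0.75, 0.5):
    print(f"h={h}: empirical {np.mean(sup_abs <= h):.4f}, "
          f"exp(-pi^2/(8 h^2)) = {np.exp(-np.pi**2 / (8 * h**2)):.4f}")
```

For bandwidths as small as those used above ($h = (c\log n)^{-1/2}$), the small-ball probability is polynomially small in $n$, which is one way to see why convergence in this example is only logarithmic.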

3.4. How to choose the window h?

It is well known that the performance of the kernel estimate depends on the choice of the window parameter $h$. The bound in Corollary 2 is simple and easy to compute, so one can choose the window parameter $h$ that minimizes this bound. This choice of the bandwidth leads to optimal performance in the finite dimensional case and near optimal performance in the infinite dimensional case (see Remark 2 and Comment 6 of the following section). We can also use different methods such as cross-validation. For example, one can consider the bandwidth that minimizes the following cross-validation criterion (see Rachdi and Vieu, 2007):

$$CV(h) = \frac{1}{n} \sum_{j=1}^n \|\varphi(Y_j) - m^{-j}_{\varphi,n}(X_j)\|^2\, W(X_j), \qquad (11)$$

where

$$m^{-j}_{\varphi,n}(x) = \sum_{i=1,\, i\neq j}^n W_{ni}(x)\,\varphi(Y_i)$$

and $W$ is some known nonnegative weight function. The cross-validation bandwidth is then

$$h_{opt} = \arg\min_{h \in H} CV(h),$$

where $H \subset \mathbb{R}^+$ is a given subset. Rachdi and Vieu (2007) considered the case where $\varphi = Id$ and proved (under some conditions on the concentration of $X$ and on $W$, $H$ and $K$) that the bandwidth $h_{opt}$ is asymptotically optimal with respect to the average square error and the mean integrated square error. Such a result can be extended to general $\varphi$ under some assumptions; that is beyond the scope of this paper and deserves investigation.
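The following is a minimal leave-one-out sketch of criterion (11), reusing the weight construction of the earlier kernel regression sketch; the finite grid standing in for $H$, the weight function `W` and all names are our own illustrative assumptions, not Rachdi and Vieu's implementation.

```python
import numpy as np

def cv_bandwidth(X, Y, h_grid, metric,
                 K=lambda u: float(abs(u) <= 1.0), W=lambda x: 1.0):
    """Cross-validation criterion (11): for each h in the grid, predict
    phi(Y_j) by the leave-one-out estimate m_{phi,n}^{-j}(X_j), average
    the W-weighted squared (norm) errors, and return the minimizing h."""
    Y = np.asarray(Y, dtype=float)
    n = len(X)
    D = np.array([[metric(X[i], X[j]) for j in range(n)] for i in range(n)])
    scores = []
    for h in h_grid:
        Kmat = np.vectorize(K)(D / h)
        np.fill_diagonal(Kmat, 0.0)  # leave observation j out of its own fit
        err = 0.0
        for j in range(n):
            s = Kmat[:, j].sum()
            pred = (Kmat[:, j] / s) @ Y if s > 0 else np.zeros_like(Y[j])
            err += np.sum((Y[j] - pred) ** 2) * W(X[j])
        scores.append(err / n)
    return h_grid[int(np.argmin(scores))]
```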

4. Comments

(1) As we mentioned, since (4) may hold for a discontinuous regression function $m_\varphi$, all the main results given in this paper can hold for discontinuous regression operators. Further, if $K(u) = 1_{\{|u|\le r\}}$, $h_n \to 0$ and $\lim_{n\to\infty} n\mu(B^x_{rh_n}) = \infty$, then (4) is necessary.


(2) We conjecture that the conditions $h_n \to 0$ and $\lim_{n\to\infty} n\mu(B^x_{rh_n}) = \infty$ (resp. $\lim_{n\to\infty} n\mu(B^x_{rh_n})/\log n = \infty$) are necessary in Theorem 1 (resp. Theorem 2).

(3) The proofs used here do not require imposing a particular structure on the distribution of $X$. Thus, our results include the case where $X$ is a process having infinite fractal dimension, such as the Wiener process.

(4) $(\mathcal{X}, d)$ is a general metric space and may, for example, be any subset of the usual function spaces (e.g. a set of probability densities equipped with a usual metric, a subspace of functions which is not a linear space, obtained for example under some constraints, etc.), or
(a) $(\mathcal{P}, d_{pro})$, a set of probabilities defined on some measurable space, with $d_{pro}$ the Prohorov metric;
(b) $(\mathcal{P}_{\mathbb{R}}, d_K)$, a set of probabilities on $\mathbb{R}$ endowed with its Borel $\sigma$-field, with $d_K(P,Q) = \sup_x |P(-\infty, x] - Q(-\infty, x]|$ the Kolmogorov metric (a code sketch of this case follows this list);
(c) $(D[0,1], d_{sko})$, the set of càdlàg functions on $[0,1]$, where $d_{sko}$ is the Skorohod metric.
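As one concrete instance of case (b) (our own sketch), the Kolmogorov metric between two distributions represented by samples can be computed exactly from their empirical distribution functions, and the resulting function can be passed as the semi-metric $d$ of the kernel estimate:

```python
import numpy as np

def kolmogorov_distance(sample_p, sample_q):
    """Kolmogorov metric d_K(P, Q) = sup_x |F_P(x) - F_Q(x)| between the
    empirical distributions of two samples. Both empirical CDFs are
    right-continuous step functions, so the supremum is attained at (just
    after) one of the pooled sample points."""
    grid = np.sort(np.concatenate([sample_p, sample_q]))
    f_p = np.searchsorted(np.sort(sample_p), grid, side="right") / len(sample_p)
    f_q = np.searchsorted(np.sort(sample_q), grid, side="right") / len(sample_q)
    return float(np.max(np.abs(f_p - f_q)))
```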

(5) By means of the approximation of dependent r.v.'s by independent ones, we can drop the independence hypothesis and replace it by $\beta$-mixing (see Rhomari, 2005).

(6) In this paper, we provide upper bounds which can be used to identify rates of convergence for regression estimation. From these bounds we derive the well-known optimal rates of convergence for finite dimensional regression, see Stone (1982). For the infinite dimensional regression case, it seems that the upper rates given in this paper lead to near optimal performance. The problem of identifying lower rates for the kernel estimator, or minimax rates of convergence in infinite dimensional regression, is a challenging and interesting open problem. To obtain minimax rates of convergence, lower bounds are often needed for the targeted family of regression functions. These bounds are difficult to obtain in our general setting of infinite dimensional valued random variables without additional conditions. Perhaps, for the $L^p(\mu)$ metric, results determining lower bounds based on global metric entropy conditions on the targeted class of regression functions (for example Yang and Barron, 1999, Condition 2) can be obtained for our regression model by using Theorem 6 of Yang and Barron (1999). Lower and minimax rates for the regression estimation considered here exceed the scope of this paper and deserve investigation.

5. Proofs

We adapt the proofs in Dabo-Niang and Rhomari (2003), which are, with slight modifications, similar to those in Devroye (1981), except that we use our Lemma 1 below (see also Dabo-Niang and Rhomari, 2003) instead of Devroye's Lemma 2.1. We add some complements to Devroye's strong consistency proof, and we also obtain a rate of convergence as in Dabo-Niang and Rhomari (2003). For the sake of completeness we present the proofs in their entirety. Inequality (13) gives a sharper bound than Devroye's (1981) Inequality 2.7.

Let us first introduce some notation needed in the sequel. In accordance with the definition of the kernel estimate (1) and (2), we adopt the convention $0/0 = 0$. Define

$$\delta = 1_{\{d(X,x)\le rh_n\}}, \quad \delta_i = 1_{\{d(X_i,x)\le rh_n\}} \quad \text{and} \quad N_n = \sum_{t=1}^n \delta_t.$$

The following lemma is needed to prove the p-mean and a.s. convergences.

Lemma 1. For all measurable $f \ge 0$ and all $x$ in $\mathcal{X}$ we have, for all $n \ge 1$,

$$E\!\left(\sum_{i=1}^n W_{ni}(x) f(X_i) \,\Big|\, \delta_1, \ldots, \delta_n\right) \le \frac{(b/a)}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} f(w)\,d\mu(w)\; 1_{\{N_n \neq 0\}} \qquad (12)$$

and

$$E\!\left(\sum_{i=1}^n W_{ni}(x) f(X_i)\right) \le \frac{(b/a)[1 - (1 - \mu(B^x_{rh_n}))^n]}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} f(w)\,d\mu(w), \qquad (13)$$

with equalities when $a = b$, i.e. when $K(u) = 1_{\{|u|\le r\}}$.

Proof of Lemma 1. Define first

$$U_1 = \sum_{i=1}^n \frac{f(X_i)\,\delta_i}{\sum_{t=1}^n \delta_t}.$$

It is easy to see from the kernel condition (5) and the positivity of $f$ that

$$\sum_{i=1}^n W_{ni}(x) f(X_i) \le (b/a)\, U_1,$$


and remark that the i.i.d.-ness of $(X_1,\delta_1), \ldots, (X_n,\delta_n)$ implies

$$\delta_i E(f(X_i) \mid \delta_i) = \delta_i E(f(X) \mid \delta = 1) = \delta_i \frac{1}{\mu(B^x_{rh_n})} E(f(X)\, 1_{\{d(X,x)\le rh_n\}})$$

and $E(f(X_i) \mid \delta_1, \ldots, \delta_n) = E(f(X_i) \mid \delta_i)$. Then we have

$$E(U_1 \mid \delta_1, \ldots, \delta_n, N_n \neq 0) = \sum_{i=1}^n \frac{E(f(X_i) \mid \delta_i)\,\delta_i}{\sum_{t=1}^n \delta_t} = \sum_{i=1}^n \frac{E(f(X) \mid \delta = 1)\,\delta_i}{N_n} = E(f(X) \mid \delta = 1) = \frac{1}{\mu(B^x_{rh_n})} E(f(X)\, 1_{\{d(X,x)\le rh_n\}})$$

and $E(U_1 \mid \delta_1, \ldots, \delta_n, N_n = 0) = 0$. Thus, we can summarize as follows:

$$E(U_1 \mid \delta_1, \ldots, \delta_n) = \frac{1}{\mu(B^x_{rh_n})} E(f(X)\, 1_{\{d(X,x)\le rh_n\}})\; 1_{\{N_n \neq 0\}}.$$

Taking the expectation of this quantity, we obtain

$$E(U_1) = \frac{1}{\mu(B^x_{rh_n})} E(f(X)\, 1_{\{d(X,x)\le rh_n\}})\, P(N_n \neq 0) = \frac{1}{\mu(B^x_{rh_n})} E(f(X)\, 1_{\{d(X,x)\le rh_n\}})\,[1 - (1 - \mu(B^x_{rh_n}))^n].$$

This ends the proof of Lemma 1. □

Proof of Theorem 1 and Corollary 1. We use the same arguments as those of Theorem 2.1 in Devroye (1981), following the adaptation in Dabo-Niang and Rhomari (2003).

Applying successively the triangle and Jensen inequalities, we obtain for all $p \ge 1$

$$3^{1-p} E\!\left[\left\|\sum_{i=1}^n W_{ni}(x)\varphi(Y_i) - m_\varphi(x)\right\|^p\right] \le E\!\left[\left\|\sum_{i=1}^n W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\right\|^p\right] + E\!\left[\sum_{i=1}^n W_{ni}(x)\|m_\varphi(X_i) - m_\varphi(x)\|^p\right] + \|m_\varphi(x)\|^p P(N_n = 0). \qquad (14)$$

By positivity of the kernel and the independence of the $X_i$'s we have

$$P(N_n = 0) = [1 - \mu(B^x_{rh_n})]^n \le \exp(-n\mu(B^x_{rh_n})) \longrightarrow 0,$$

because $r$ is finite and $n\mu(B^x_{rh_n}) \to \infty$ for all $x$, so the last term of (14) tends to 0. Remark that $\exp(-n\mu(B^x_{rh_n})) = o(1/(n\mu(B^x_{rh_n})))$. By Lemma 1, the second term on the right-hand side of (14) is not greater than

$$\frac{(b/a)}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} \|m_\varphi(w) - m_\varphi(x)\|^p\,d\mu(w), \qquad (15)$$

which goes to 0 for $x$ such that $m_\varphi$ satisfies (4). For the rate of convergence, remark that quantity (15) is bounded by $b c_x (rh_n)^{\beta p}/a = O(h_n^{\beta p})$ if $m_\varphi$ satisfies (8).

Let us show that the first term on the right-hand side of (14) tends to 0 for $x$ such that (7) holds. We distinguish the cases $p \ge s$ and $1 \le p < s$ ($s$ is the type of the Banach space $\mathcal{Y}$). In the sequel we use the notation $E_{X(n)}(\cdot) = E(\cdot \mid X_1, \ldots, X_n)$.

• Suppose first that $p \ge s$.


Since $(X_1,Y_1), \ldots, (X_n,Y_n)$ are i.i.d. and $W_{ni}(x)$ depends only upon the $(X_j)_j$, the triangle and Hoffmann-Jørgensen (1974) inequalities (see also Ledoux and Talagrand, 1991, Proposition 6.8 or 6.20), applied conditionally on $(X_1, \ldots, X_n)$ (the summands below are conditionally i.i.d. centered r.v.'s), together with Jensen's inequality, yield a constant $C = C(p) > 0$ depending only on $p$ such that

$$E_{X(n)}\!\left[\left\|\sum_{i=1}^n W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\right\|^p\right] \le C\,E_{X(n)}\!\left[\max_{1\le i\le n}\left(W^p_{ni}(x)\|\varphi(Y_i) - m_\varphi(X_i)\|^p\right)\right] + C\left(E_{X(n)}\!\left[\left\|\sum_{i=1}^n W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\right\|\right]\right)^p =: C(A + B^p). \qquad (16)$$

We first study the term $A$. Since $(X_1,Y_1), \ldots, (X_n,Y_n)$ are i.i.d., it is easy to see that

$$E(A) \le \sum_{i=1}^n E[W^p_{ni}(x)\, E(\|\varphi(Y_i) - m_\varphi(X_i)\|^p \mid X_i)] = n E[W^p_{nn}(x)\, k_\varphi(X_n)], \qquad (17)$$

where $k_\varphi(w) = E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X = w]$. Now define $U = K(d(X_n,x)/h_n)$, $u = E(U)$, $V = \sum_{j=1}^{n-1} K(d(X_j,x)/h_n)$ and $Z_{n-1} = \min(1, b/V)$. Note that $a\mu(B^x_{rh_n}) \le u \le b\mu(B^x_{rh_n})$ and $W_{nj}(x) = K(d(X_j,x)/h_n)/(U + V) \le Z_{n-1}$ for all $j$. Then we have

$$W^{p-1}_{nn}(x) \le \left(\sup_j W_{nj}(x)\right)^{p-1} \le (Z_{n-1})^{p-1}.$$

This last bound gives an estimate of (17):

$$E(A) \le C n E[Z^{p-1}_{n-1} W_{nn}(x)\, k_\varphi(X_n)]. \qquad (18)$$

For all $c > 0$, $Z^{p-1}_{n-1} = Z^{p-1}_{n-1}1_{\{V<c\}} + Z^{p-1}_{n-1}1_{\{V\ge c\}} \le 1_{\{V<c\}} + (b/c)^{p-1}$. Then, for $c = (n-1)u/2$, we have

$$C n E[Z^{p-1}_{n-1} W_{nn}(x)\, k_\varphi(X_n)] \le C n P(V < (n-1)u/2)\, E(1_{\{d(X_n,x)\le rh_n\}}\, k_\varphi(X_n)) + C n \left(\frac{2b}{(n-1)u}\right)^{p-1} E(W_{nn}(x)\, k_\varphi(X_n)). \qquad (19)$$

Now we apply (13) to $k_\varphi(\cdot)$ in order to obtain

$$n E(W_{nn}(x)\, k_\varphi(X_n)) = E\!\left[\sum_{i=1}^n W_{ni}(x)\, k_\varphi(X_i)\right] \le \frac{(b/a)}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} k_\varphi(w)\,d\mu(w).$$

Therefore, the last term of (19) can be bounded by

$$C n \left(\frac{2b}{(n-1)u}\right)^{p-1} E(W_{nn}(x)\, k_\varphi(X_n)) \le \frac{2^{p-1} C (b/a)^p}{[(n-1)\mu(B^x_{rh_n})]^{p-1}\,\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} k_\varphi(w)\,d\mu(w).$$

This last expression converges to 0 if $n\mu(B^x_{rh_n}) \to \infty$ and (7) is satisfied.

For the rate of convergence, notice that, when $E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh_n}]$ is bounded, this last term is $O(1/(n\mu(B^x_{rh_n}))^{p-1})$.

To finish with the term $A$ in (16), it remains to show that the first term of (19) goes to 0. With $U$ and $V$ defined above and $u = E(U) = E(V)/(n-1) \ge a\mu(B^x_{rh_n})$, put $\sigma^2 = \operatorname{var}(U)$. Because $|U - EU| \le 2b$ and $\sigma^2 \le bu$, Bernstein's inequality for sums of bounded independent random variables (see Bennett, 1962) permits us to obtain

$$P(V < (n-1)u/2) = P(V - E(V) < -(n-1)u/2) \le \exp\!\left(\frac{-(n-1)(u/2)^2}{2\sigma^2 + bu/3}\right) \le \exp(-3(n-1)u/(28b)) \le \exp(-c_1 n\mu(B^x_{rh_n})) \qquad (20)$$


for $n \ge 2$, where $c_1 = 3a/(56b)$. Thus,

$$C n P(V < (n-1)u/2)\, E(1_{\{d(X_n,x)\le rh_n\}}\, k_\varphi(X_n)) \le C n \exp(-c_1 n\mu(B^x_{rh_n})) \int_{B^x_{rh_n}} k_\varphi(w)\,d\mu(w) = C n \mu(B^x_{rh_n}) \exp(-c_1 n\mu(B^x_{rh_n})) \frac{1}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} k_\varphi(w)\,d\mu(w), \qquad (21)$$

and (21) tends to 0 according to (7) and $n\mu(B^x_{rh_n}) \to \infty$.

For the proof of the corollary, remark that the second term of (19) is $O(1/(n\mu(B^x_{rh_n}))^{p-1})$ when $E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh_n}]$ is bounded.

For the second term $B$ of (16), recall that we suppose $p \ge s$. By Hölder's inequality and since $\mathcal{Y}$ is a Banach space of type $s$, there exists a constant $C = C(s)$ depending only upon $s$ such that

$$B^s \le E_{X(n)}\!\left[\left\|\sum_{i=1}^n W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\right\|^s\right] \le C \sum_{i=1}^n E_{X(n)}[\|W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\|^s] = C \sum_{i=1}^n W^s_{ni}(x)\, E(\|\varphi(Y_i) - m_\varphi(X_i)\|^s \mid X_i). \qquad (22)$$

Hence, by the Jensen and Hölder inequalities,

$$E(B^p) \le C E\!\left(\sum_{i=1}^n W^s_{ni}(x)\, E(\|\varphi(Y_i) - m_\varphi(X_i)\|^s \mid X_i)\right)^{p/s} \le C E\!\left(\sum_{i=1}^n W_{ni}(x)\, W^{p(s-1)/s}_{ni}(x)\, E(\|\varphi(Y_i) - m_\varphi(X_i)\|^p \mid X_i)\right) = C n E(W^{p-p/s}_{nn}(x)\, W_{nn}(x)\, E(\|\varphi(Y_n) - m_\varphi(X_n)\|^p \mid X_n)) \le C n E[Z^{p-p/s}_{n-1} W_{nn}(x)\, k_\varphi(X_n)].$$

This last bound is similar to that of $E(A)$ in (18), with $p - p/s$ instead of $p - 1$. Therefore, we obtain as before (the difference is only in the last term of inequality (19), which is not greater than the following bound)

$$C n \left(\frac{2b}{(n-1)u}\right)^{p-p/s} E(W_{nn}(x)\, k_\varphi(X_n)) \le \frac{C 2^{p-p/s} (b/a)^{p+1-p/s}}{[(n-1)\mu(B^x_{rh_n})]^{p-p/s}\,\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} k_\varphi(w)\,d\mu(w).$$

This last term converges to 0 if $n\mu(B^x_{rh_n}) \to \infty$ and (7) is satisfied.

For the proof of the corollary, notice that, when $E[\|\varphi(Y) - m_\varphi(X)\|^p \mid X \in B^x_{rh_n}]$ is bounded, this last term is $O(1/(n\mu(B^x_{rh_n}))^{p-p/s})$.

Thus, Theorem 1 and Corollary 1 are proved when $p \ge s$, because $p - p/s \le p - 1$ for $p \ge s$.

Remark 6. In fact, from (16), (18) and (22) we obtain, for $p \ge s$,

$$E\!\left[\left\|\sum_{i=1}^n W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\right\|^p\right] \le C_p\, n E[(W^p_{nn}(x) + C_{p,s} W^{p+1-p/s}_{nn}(x))\, k_\varphi(X_n)].$$

• For $1 \le p < s$, we use the usual truncation technique. Let $M$ be a positive number and define $\varphi'(Y) = \varphi(Y)1_{\{\|\varphi(Y)\|\le M\}}$ and $\varphi''(Y) = \varphi(Y)1_{\{\|\varphi(Y)\|>M\}}$ (that is, $\varphi'' = \varphi - \varphi'$, $\varphi' = \varphi 1_{\{\|\varphi\|\le M\}}$). Then Theorem 1 is true for $(X, \varphi'(Y))$, $m_{\varphi'}(x) = E(\varphi'(Y) \mid X = x)$,


and all fixed $M$. It suffices to prove that the remainder term related to $\varphi''(Y)$, that is,

$$E\!\left[\sum_{i=1}^n W_{ni}(x)\|\varphi''(Y_i) - m_{\varphi''}(X_i)\|^p\right] = E\!\left[\sum_{i=1}^n W_{ni}(x)\, g_{\varphi,M}(X_i)\right] \le \frac{(b/a)}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} g_{\varphi,M}(w)\,d\mu(w) \le \frac{2^p (b/a)}{\mu(B^x_{rh_n})}\, E\|\varphi(Y)1_{\{\|\varphi(Y)\|>M\}}\|^p,$$

tends to 0, where $m_{\varphi''}(x) = E(\varphi''(Y) \mid X = x)$ and $g_{\varphi,M}(w) = E[\|\varphi''(Y) - m_{\varphi''}(X)\|^p \mid X = w]$. The first inequality comes from Lemma 1. The last term tends to 0 as $M \to \infty$, for all $x$ and $h > 0$. This ends the proof of Theorem 1 and Corollary 1. □

To prove the a.s. convergence of the kernel estimate we need the following lemma on binomial random variables.

Lemma 2 (Devroye, 1981, Lemma 4.1). If $N_n$ is a binomial random variable with parameters $n$ and $p = p_n$ such that $np/\log n \to \infty$, then

$$\sum_{n=1}^{\infty} E(\exp(-sN_n)) < \infty \quad \text{for all } s > 0.$$

Proof of Theorem 2 and Corollary 2. We apply the same arguments as those for Theorem 4.2 in Devroye (1981), following Dabo-Niang and Rhomari (2003).

The conditional expectation in (10) is exactly $\int \|m_{\varphi,n} - m_\varphi\|\,d\mu$. Then (10) is deduced from (9), as in Devroye (1981), by applying Fubini's theorem and the dominated convergence theorem (see Glick, 1974). To prove (9), recall first that $\delta_i = 1_{\{d(X_i,x)\le rh_n\}}$ and $N_n = \sum_{t=1}^n \delta_t$, and define

$$U_1(x) = (b/a)\sum_{i=1}^n \|m_\varphi(X_i) - m_\varphi(x)\|\frac{\delta_i}{\sum_{t=1}^n \delta_t}, \qquad U_2(x) = \left\|\sum_{i=1}^n W_{ni}(x)(\varphi(Y_i) - m_\varphi(X_i))\right\|.$$

Then we have the following inequality:

$$\|m_{\varphi,n}(x) - m_\varphi(x)\| \le U_2(x) + U_1(x) + \|m_\varphi(x)\|\,1_{\{N_n = 0\}}. \qquad (23)$$

Note that (see the proof of Theorem 1)

$$P(N_n = 0) = [1 - \mu(B^x_{rh_n})]^n \le \exp(-n\mu(B^x_{rh_n})).$$

This last term is the general term of a summable series in $n$ when $\lim_{n\to\infty} n\mu(B^x_{rh_n})/\log n > 1$. So the last term in (23) is a.s. 0 for large $n$.

Let $Z^x_{1,i} = \|m_\varphi(X_i) - m_\varphi(x)\|\delta_i$ and $\delta(n) = (\delta_1, \ldots, \delta_n)$; then $|Z^x_{1,i} - E_{\delta(n)}(Z^x_{1,i})| \le 4M_\varphi\delta_i$ and $\operatorname{var}(Z^x_{1,i} \mid \delta_1, \ldots, \delta_n) \le 4M^2_\varphi\delta_i$, where $E_{\delta(n)}(\cdot) = E(\cdot \mid \delta_1, \ldots, \delta_n)$. Then we have, by Bernstein's inequality, for any $\epsilon > 0$,

$$P(|U_1(x) - E_{\delta(n)}(U_1(x))| > \epsilon \mid \delta_1, \ldots, \delta_n) \le P\!\left(\left|\sum_{i=1}^n (Z^x_{1,i} - E_{\delta_i}(Z^x_{1,i}))\right| > \epsilon(a/b)N_n \,\Big|\, \delta_1, \ldots, \delta_n\right) \le 2\exp(-c_1 N_n),$$

where $c_1 = a^2\epsilon^2/(8bM_\varphi(bM_\varphi + a\epsilon/3))$. We get from Lemma 1 that

$$E_{\delta(n)}(U_1(x)) \le \frac{(b/a)}{\mu(B^x_{rh_n})}\, E(\|m_\varphi(X) - m_\varphi(x)\|1_{\{d(X,x)\le rh_n\}}) = \frac{(b/a)}{\mu(B^x_{rh_n})} \int_{B^x_{rh_n}} \|m_\varphi(w) - m_\varphi(x)\|\,d\mu(w). \qquad (24)$$


Term (24) tends to 0 under (4). Thus, for large $n$, $P(E_{\delta(n)}(U_1(x)) > \epsilon \mid \delta_1, \ldots, \delta_n) = 0$ and

$$P(U_1(x) > 2\epsilon \mid \delta_1, \ldots, \delta_n) \le 2\exp(-c_1 N_n).$$

Therefore, for all $\epsilon > 0$,

$$P(U_1(x) > 2\epsilon) \le 2E\{\exp(-c_1 N_n)\}.$$

Since $N_n$ is a binomial random variable with parameters $n$ and $p(x) = \mu(B^x_{rh_n})$ such that $np(x)/\log n \to \infty$ as $n \to \infty$, the last term in the above inequality is summable in $n$ by Lemma 2. Hence $U_1(x) \to 0$ a.s., by the Borel–Cantelli lemma.

Remark 7. If $m_\varphi$ were assumed to be continuous at $x$, it would be easy to prove that $U_1(x)$ converges to 0. Indeed, $\max_i \|m_\varphi(X_i) - m_\varphi(x)\|\delta_i \le \epsilon$ for small $h$, and then $0 \le U_1(x) \le (b/a)\epsilon$ for large $n$.

The term $U_2(x)$ is treated in a similar way. Let

$$Z^x_{2,i} = K(d(X_i,x)/h_n)(\varphi(Y_i) - m_\varphi(X_i)) \quad\text{and}\quad z_i = K(d(X_i,x)/h_n)\, m_\varphi(X_i);$$

then $\|Z^x_{2,i} - z_i\| \le bM_\varphi\delta_i$ and $E(\|Z^x_{2,i} - z_i\|^2 \mid X_1, \ldots, X_n) \le b^2 M^2_\varphi\delta_i$. As above, we also have, by a Bernstein-type inequality for bounded Banach-valued independent random variables (see Pinelis and Sakhanenko, 1985, or the remark on page 605 of Pinelis, 1990),

$$P(|U_2(x) - E_{X(n)}(U_2(x))| > \epsilon \mid X_1, \ldots, X_n) \le P\!\left(\left|\,\left\|\sum_{i=1}^n Z^x_{2,i}\right\| - E_{X(n)}\left\|\sum_{i=1}^n Z^x_{2,i}\right\|\,\right| > \epsilon a N_n \,\Big|\, X_1, \ldots, X_n\right) \le 2\exp(-c_2 N_n),$$

where $c_2 = a^2\epsilon^2/(2bM_\varphi(bM_\varphi + a\epsilon/3))$. Since $\sum_{i=1}^n W_{ni}(x) E(\|\varphi(Y_i) - m_\varphi(X_i)\|^s \mid X_i) \le 2^s M^s_\varphi$, we get from (22)

$$E_{X(n)}(U_2(x)) \le C_s 2M_\varphi \left(\sup_j W_{nj}(x)\right)^{1-1/s} \le C_s 2M_\varphi (Z_{n-1})^{1-1/s}$$

and

$$P(E_{X(n)}(U_2(x)) > \epsilon \mid X_1, \ldots, X_n) = 1_{\{E_{X(n)}(U_2(x)) > \epsilon\}} \le 1_{\{2C_sM_\varphi(Z_{n-1})^{1-1/s} > \epsilon\}} \le 1_{\{V < b(2C_sM_\varphi/\epsilon)^{s/(s-1)}\}} \le 1_{\{V - E(V) < -(n-1)u/2\}}$$

if $\epsilon \ge C_s 2M_\varphi(2b/((n-1)u))^{(s-1)/s}$ (this is satisfied for large $n$, since $nu \ge an\mu(B^x_{rh_n}) \to \infty$). From inequality (20) and this last bound, we deduce that, for all $\epsilon > 0$ and large $n$ such that $\epsilon \ge C_s 2M_\varphi(2b/((n-1)u))^{(s-1)/s}$,

$$P(E_{X(n)}(U_2(x)) > \epsilon) \le \exp(-(3a/(56b))\, n\mu(B^x_{rh_n})).$$

Gathering the above inequalities and taking expectations, we obtain, for all $\epsilon > 0$ and large $n$ such that $\epsilon \ge C_s 2M_\varphi(2b/((n-1)u))^{(s-1)/s}$,

$$P(U_2(x) > 2\epsilon) \le 2E(\exp(-c_2 N_n)) + \exp(-(3a/(56b))\, n\mu(B^x_{rh_n})). \qquad (25)$$

Lemma 2, the fact that this last term is the general term of a summable series w.r.t. $n$, and the Borel–Cantelli lemma imply that $U_2(x) \to 0$ a.s. Then we deduce (9) from all the convergences above. This ends the proof of Theorem 2. □

Proof of Corollary 2. For the rate of convergence, using the p-mean Lipschitz hypothesis (8) on $m_\varphi$ in a neighborhood of $x$, we easily get from (24)

$$E_{\delta(n)}(U_1(x)) \le c(b/a)(rh_n)^{\beta}.$$


Then we have $P(E_{\delta(n)}(U_1(x)) > c(b/a)(rh_n)^{\beta} \mid \delta_1, \ldots, \delta_n) = 0$. Taking now $\epsilon_1 = c(b/a)(rh_n)^{\beta} + C_1(n\mu(B^x_{rh_n})/\log n)^{-1/2}$, for an appropriate positive number $C_1$, we obtain as before

$$P(U_1(x) > \epsilon_1) \le 2E\{\exp(-c_1 N_n)\}, \qquad (26)$$

where $c_1$ defined above corresponds to $\epsilon = C_1(n\mu(B^x_{rh_n})/\log n)^{-1/2}$. Remark that (see the proof of Lemma 4.1 in Devroye, 1981) $E(\exp(-\lambda N_n)) \le \exp(-\lambda' n\mu(B^x_{rh_n}))$, where $\lambda' = \min(\lambda/2, 1/10)$. Then the Borel–Cantelli lemma yields the claim.

Indeed, for $n$ large enough and $c'_1 = \min(c_1/2, 1/10) = c_1/2$, (26) is bounded by

$$2\exp(-c'_1 n\mu(B^x_{rh_n})) \le 2\exp\!\left(-\frac{a^2C_1^2\log n}{16bM_\varphi[bM_\varphi + aC_1(n\mu(B^x_{rh_n})/\log n)^{-1/2}/3]}\right) \le 2\exp(-C'_1\log n).$$

The last term is the general term of a summable series in $n$ for a suitable choice of $C_1$ ensuring that $C'_1 > 1$ (in fact $C_1 > 4bM_\varphi/a$).

Remark 8. If $m_\varphi$ were assumed to be $\beta$-Lipschitzian in a neighborhood of $x$, $U_1(x)$ could be treated directly, because $\max_i\|m_\varphi(X_i) - m_\varphi(x)\|\delta_i \le c(rh_n)^{\beta}$ for small $h$, and then $0 \le U_1(x) \le c(b/a)(rh_n)^{\beta}$ for large $n$.

To conclude, let us show, in the same way as before, that

$$U_2(x) = O((n\mu(B^x_{rh_n})/\log n)^{-1/2} + (n\mu(B^x_{rh_n}))^{-1+1/s}).$$

For this purpose, we replace in inequality (25) $2\epsilon$ by $\epsilon = \epsilon' + \epsilon''$, with $\epsilon' = C_2(n\mu(B^x_{rh_n})/\log n)^{-1/2}$ (for an appropriate positive number $C_2$) and $\epsilon'' = C_s 2M_\varphi(2b/((n-1)u))^{(s-1)/s}$. The last term in (25), which corresponds to $\epsilon''$, is summable. Recall that in the other term $c_2 = a^2\epsilon'^2/(2bM_\varphi(bM_\varphi + a\epsilon'/3))$. Then, for $n$ large enough and $c'_2 = \min(c_2/2, 1/10) = c_2/2$, the first term of (25) is bounded by

$$2\exp(-c'_2 n\mu(B^x_{rh_n})) \le 2\exp\!\left(-\frac{a^2C_2^2\log n}{4bM_\varphi[bM_\varphi + aC_2(n\mu(B^x_{rh_n})/\log n)^{-1/2}/3]}\right) \le 2\exp(-C'_2\log n).$$

This latter is the general term of a summable series in $n$ for a suitable choice of $C_2$ ensuring that $C'_2 > 1$ (in fact $C_2 > 2bM_\varphi/a$). Therefore we obtain, a.s., for large $n$, since $u \ge a\mu(B^x_{rh_n})$,

$$U_1(x) \le c(b/a)(rh_n)^{\beta} + (4bM_\varphi/a)(n\mu(B^x_{rh_n})/\log n)^{-1/2},$$

$$U_2(x) \le (2bM_\varphi/a)(n\mu(B^x_{rh_n})/\log n)^{-1/2} + C_s 2M_\varphi\big(2b/(a(n-1)\mu(B^x_{rh_n}))\big)^{1-1/s}.$$

This yields the proof. □

Acknowledgments

The authors would like to thank the referees and the editor for their helpful comments, which improved this work.

References

Bennett, G., 1962. Probability inequalities for sums of independent random variables. J. Amer. Statist. Assoc. 57, 33–45.
Biau, G., Bunea, F., Wegkamp, M.H., 2005. Functional classification in Hilbert spaces. IEEE Trans. Inform. Theory 51, 2163–2172.
Bogachev, V.I., 1998. Gaussian Measures. American Mathematical Society, Providence, RI.
Bosq, D., 1983. Sur la prédiction non paramétrique de variables aléatoires et mesures aléatoires. Zeit. Wahrs. Ver. Geb. 64, 541–553.
Bosq, D., 2000. Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, vol. 149. Springer, Berlin.
Bosq, D., Delecroix, M., 1985. Nonparametric prediction of a Hilbert space valued random variable. Stochastic Process. Appl. 19, 271–280.
Csáki, E., 1980. A relation between Chung's and Strassen's laws of the iterated logarithm. Zeit. Wahrs. Ver. Geb. 19, 287–301.
Dabo-Niang, S., 2002. Estimation de la densité dans un espace de dimension infinie: application aux diffusions. C. R. Acad. Sci. Paris I 334, 213–216.
Dabo-Niang, S., Rhomari, N., 2003. Estimation non paramétrique de la régression avec variable explicative dans un espace métrique. C. R. Acad. Sci. Paris I 336, 75–80.
Devroye, L., 1981. On the almost everywhere convergence of nonparametric regression function estimates. Ann. Statist. 9, 1310–1319.
Devroye, L., Wagner, T.J., 1980. Distribution-free consistency results in nonparametric discrimination and regression function estimation. Ann. Statist. 8, 231–239.
Ferraty, F., Vieu, P., 2000. Dimension fractale et estimation de la régression dans des espaces vectoriels semi-normés. C. R. Acad. Sci. Paris I 330, 139–142.
Ferraty, F., Vieu, P., 2006. Nonparametric Functional Data Analysis. Springer, New York.
Ferraty, F., Goia, A., Vieu, P., 2002. Functional nonparametric model for time series: a fractal approach for dimension reduction. TEST 11, 317–344.
Glick, N., 1974. Consistency conditions for probability estimators and integrals of density estimators. Utilitas Math. 6, 61–74.
Greblicki, W., Krzyżak, A., Pawlak, M., 1984. Distribution-free pointwise consistency of kernel regression estimate. Ann. Statist. 12, 1570–1575.
Györfi, L., 1981. Recent results on nonparametric regression estimate and multiple classification. Problems Control Inform. Theory 10, 43–52.
Györfi, L., Kohler, M., Krzyżak, A., Walk, H., 2002. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, New York.
Hoffmann-Jørgensen, J., 1974. Sums of independent Banach space valued random variables. Studia Math. 52, 159–186.
Hoffmann-Jørgensen, J., Pisier, G., 1976. The law of large numbers and the central limit theorem in Banach spaces. Ann. Probab. 4, 587–599.
Krzyżak, A., 1986. The rates of convergence of kernel regression estimates and classification rules. IEEE Trans. Inform. Theory IT-32, 668–679.
Kulkarni, S.R., 2002. Data-dependent k_n-NN and kernel estimators consistent for arbitrary processes. IEEE Trans. Inform. Theory 48 (10), 2785–2789.
Kulkarni, S.R., Posner, S.E., 1995. Rates of convergence of nearest neighbor estimation under arbitrary sampling. IEEE Trans. Inform. Theory 41, 1028–1039.
Ledoux, M., Talagrand, M., 1991. Probability in Banach Spaces. Springer, New York.
Lifshits, M.A., 1995. Gaussian Random Functions. Kluwer Academic Publishers, Dordrecht.
Masry, E., 2005. Nonparametric regression estimation for dependent functional data: asymptotic normality. Stochastic Process. Appl. 115, 155–177.
Parthasarathy, K.R., 1967. Probability Measures on Metric Spaces. Academic Press, New York.
Pinelis, I.F., 1990. Inequalities for distributions of sums of independent random vectors and their application to estimating a density. Theory Probab. Appl. 35, 605–607.
Pinelis, I.F., Sakhanenko, A.I., 1985. Remarks on inequalities for large deviation probabilities. Theory Probab. Appl. 30, 143–148.
Preiss, D., 1979. Gaussian measures and covering theorem. Comment. Math. Univ. Carolin. 20, 95–99.
Rachdi, M., Vieu, P., 2007. Nonparametric regression for functional data: automatic smoothing parameter selection. J. Statist. Plann. Inference 137, 2784–2801.
Ramsay, J.O., Silverman, B.W., 1997. Functional Data Analysis. Springer, New York.
Ramsay, J.O., Silverman, B.W., 2002. Applied Functional Data Analysis: Methods and Case Studies. Springer, New York.
Rhomari, N., 2005. Kernel regression estimation in Banach space under dependence. Preprint.
Spiegelman, C., Sacks, J., 1980. Consistent window estimation in nonparametric regression. Ann. Statist. 8, 240–246.
Stone, C.J., 1977. Consistent nonparametric regression. Ann. Statist. 5, 595–645.
Stone, C.J., 1982. Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10, 1040–1053.
Tišer, J., 1988. Differentiation theorem for Gaussian measures on Hilbert space. Trans. Amer. Math. Soc. 308, 655–666.
Vakhania, N.N., Tarieladze, V.I., Chobanyan, S.A., 1987. Probability Distributions on Banach Spaces. Reidel, Dordrecht.
Wheeden, R., Zygmund, A., 1977. Measure and Integral. Marcel Dekker, New York.
Yang, Y., Barron, A.R., 1999. Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27, 1564–1599.