Multivariate boundary kernels and a continuous least squares principle

H. G. Müller

University of California, Davis, USA

and U. Stadtmüller

Universität Ulm, Germany

[Received August 1997. Final revision August 1998]

Summary. Whereas there are many references on univariate boundary kernels, the construction of boundary kernels for multivariate density and curve estimation has not been investigated in detail. The use of multivariate boundary kernels ensures global consistency of multivariate kernel estimates as measured by the integrated mean-squared error or sup-norm deviation for functions with compact support. We develop a class of boundary kernels which work for any support, regardless of the complexity of its boundary. Our construction yields a boundary kernel for each point in the boundary region where the function is to be estimated. These boundary kernels provide a natural continuation of non-negative kernels used in the interior onto the boundary. They are obtained as solutions of the same kernel-generating variational problem which also produces the kernel function used in the interior as its solution. We discuss the numerical implementation of the proposed boundary kernels and their relationship to locally weighted least squares. Along the way we establish a continuous least squares principle and a continuous analogue of the Gauss–Markov theorem.

Keywords: Curve estimation; Density estimation; Edge effects; Kernel estimator; Local least squares; Smoothing

1. Introduction

Kernel methods are in common use for the nonparametric estimation of functions such as densities, spectral densities, hazard rates and regression functions. Among the various curve estimation techniques, kernel methods are simple to understand and to implement and were the first such technique to be thoroughly investigated and widely used in many applications. They remain particularly popular in nonparametric density estimation and also spectral density estimation.

An overview of the various versions and applications of kernel estimators can for instance be found in Müller (1988), Scott (1992) and more recently in Wand and Jones (1995). One problem that is common to all kernel estimators of the various types is that they are subject to boundary effects which occur near end points or boundaries of the support of the function to be estimated. These boundary problems occur whenever the function to be estimated has compact support.

The reason for the boundary problem for kernel estimators is that, at a boundary point, kernel mass falls outside the support of the function to be estimated and is lost. We can view

Address for correspondence: H. G. Müller, Division of Statistics, University of California at Davis, One Shields Avenue, Davis, CA 95616-8705, USA. E-mail: [email protected]

© 1999 Royal Statistical Society 1369–7412/99/61439

J. R. Statist. Soc. B (1999) 61, Part 2, pp. 439–458

the problem as being caused by a discontinuity in the function to be estimated across the boundary. The value of the function jumps from a non-zero value in the interior of its support to 0 for the area outside the support of the function. An unmodified kernel estimate of the function at or near the boundary then `averages out' these different function values and thus will lead to biased estimates in the boundary region. Mathematically, this is reflected by the fact that the familiar Taylor expansion for the bias cannot be carried out whenever the smoothness assumption is not satisfied, as in the case of a discontinuity.

The boundary effect phenomenon has long been recognized, starting with Gasser and Müller (1979) in the context of nonparametric regression and density estimation, and even earlier for spectral density estimation. Specific classes of boundary kernels for univariate density and curve estimation were studied by Rice (1984), Gasser et al. (1985), Hall and Wehrly (1991), Müller (1991, 1993) and Müller and Wang (1994). Recent investigations of methods to handle boundary effects for kernel function estimates include Jones (1993), Marron and Ruppert (1994), Cowling and Hall (1996), Jones and Foster (1996) and Cheng et al. (1997), among others.

Boundary problems are particularly severe in higher dimensions: if the support of the function to be estimated is a unit cube in $\mathbb{R}^d$ and the mean-squared optimal bandwidth $b = c\,n^{-1/(4+d)}$ is used for the situation of a twice continuously differentiable function to be estimated with a non-negative kernel, then the volume of the boundary region is $2db + o(b) \sim 2dc\,n^{-1/(4+d)}$, which is increasing in $d$. Thus the boundary region takes up an increasingly larger fraction of the support of the function to be estimated, as the dimension $d$ increases. A correction for boundary effects arising for multivariate kernel estimates has been discussed previously by Staniswalis et al. (1993), section 2.2, and Staniswalis and Messer (1996). This approach extends a method proposed by Rice (1984) for univariate boundary corrections to the multivariate case by linearly combining estimates with several different bandwidths. Although this approach is interesting, it is rather complicated both conceptually and practically for the multivariate situation, and some open problems remain regarding the asymptotics of multivariate boundary effects (see Section 2.3 later). These observations lead to the challenge which we address here, namely to extend the boundary kernel approach directly from the univariate to the multivariate case.
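
The growth of the boundary fraction with $d$ can be illustrated numerically (an illustrative sketch, not from the paper: we take $c = 1$ and use the exact cube volume $1 - (1 - 2b)^d$, which behaves like $2db$ for small $b$):

```python
# Fraction of the unit cube [0,1]^d lying within bandwidth b of the boundary.
# With the mean-squared optimal rate b = n^(-1/(4+d)), this fraction grows with d
# (c = 1 is an arbitrary illustrative constant).
def boundary_fraction(n, d):
    b = n ** (-1.0 / (4 + d))                        # optimal bandwidth rate
    return 1.0 - max(0.0, 1.0 - 2.0 * b) ** d        # exact volume; ~ 2*d*b for small b

for d in (1, 2, 5, 10):
    print(d, round(boundary_fraction(10_000, d), 3))
```

Even for a moderate sample size, the boundary region dominates the support once $d$ is large.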

The construction of boundary kernels which we propose here is based on the principle of extending the same criterion which is implicitly or explicitly applied to select kernels in the interior region to the selection of kernels for estimation in the boundary region. This selection criterion amounts to choosing the kernel as the minimizer of a variational problem. We show that every non-negative kernel function with compact support arises as the solution of such a problem, which we call the variational problem associated with the kernel. This is also true for most negative-valued kernels used in practice. We provide a complete theoretical solution for the boundary kernels obtained by continuing this variational problem onto the boundary. We also discuss the numerical implementation of these boundary kernels as a solution of a least squares problem and give illustrative examples.

The paper is organized as follows: the kernel-generating variational problem is introduced in Section 2, where the relationship of this problem to the mean-squared error of a kernel density estimate is discussed. It turns out that the asymptotics for the multivariate boundary problem are conceptually intricate, as will be discussed in Section 2.3. The general solution of the kernel-generating variational problem, which includes the case of boundary kernels, is provided in Section 3 (theorem 2). An interesting equivalence to a variational problem of the continuous weighted least squares type is established (theorem 3). The variational problems which give rise to this boundary kernel construction can be interpreted as continuous versions of the least squares principle and the minimum variance principle respectively, and their equivalence is reminiscent of the Gauss–Markov theorem. Results on the numerical implementation and discretized versions of the variational problems are discussed in Section 4 (theorem 4). This discretized version leads to connections between the boundary-corrected multivariate density estimate and histograms smoothed by locally weighted least squares. Illustrations and two examples of bivariate density estimation problems on compact supports are presented in Section 5. Further discussion and concluding remarks are in Section 6. Proofs and some additional results are compiled in Appendix A.

2. A kernel-generating variational problem

2.1. Preliminaries and mean-squared errors

We begin by discussing the mean-squared error for the case of multivariate kernel density estimation. The structure of the mean-squared error for other multivariate curve estimates such as multivariate kernel regression is the same, except for minor variations. Consequently, the variational problems which we discuss in the following pertain to virtually all kernel estimators. For the multivariate density estimation setting, a basic assumption is as follows (assumption 1).

Let $X_1, \ldots, X_n$ be a sample of independent and identically distributed random variables in $\mathbb{R}^d$, $X_i = (X_{i1}, \ldots, X_{id})^{\mathrm T}$, with a density function $f$ which is twice continuously differentiable on its support. We assume that the support of $f$ is known and is given by

$$S = \{x \in \mathbb{R}^d: f(x) \ge \delta\} = \operatorname{support}(f),$$

for some $\delta > 0$, and that $S$ is a compact connected set with a smooth boundary, which satisfies for instance $\mathrm{cl}(S^\circ) = S$, where $S^\circ$ denotes the interior and $\mathrm{cl}$ the closure of a set.

Since we are interested in a situation where genuine boundary effects occur, it makes sense to assume that $f(x) \ge \delta$ on $S$ for some $\delta > 0$, ensuring that a discontinuity occurs across the boundary $\partial S$ of $S$, while $f$ is smooth on $S$.

The classical multivariate kernel density estimator (see Cacoullos (1966) and Scott (1992)) at $x = (x_1, \ldots, x_d) \in S$ is

$$\hat f(x) = \frac{1}{n \prod_{k=1}^d b_k} \sum_{i=1}^n K\biggl(\frac{x_1 - X_{i1}}{b_1}, \ldots, \frac{x_d - X_{id}}{b_d}\biggr), \qquad (2.1)$$

where $b = (b_1, \ldots, b_d)$ is a vector of bandwidths, which depend on the sample size $n$, and $K: T \to \mathbb{R}$ is a non-negative kernel function with support $T$. It is assumed that $T$ is a compact and simply connected set, satisfying $\mathrm{cl}(T^\circ) = T$. To simplify matters, we assume that (assumption 2)

$$b_k = \gamma_k\, n^{-1/(4+d)}, \quad \text{with } \gamma_k > 0, \text{ for } k = 1, \ldots, d.$$

It is well known that the choices $b_k = \gamma_k\, n^{-1/(4+d)}$, $1 \le k \le d$, lead to the optimal rate of convergence of the mean-squared error for non-negative kernels under appropriate smoothness assumptions.
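
A minimal sketch of estimator (2.1) in $d = 2$ (a product Epanechnikov kernel is our illustrative choice of $K$; the paper allows general multivariate kernels with support $T$), which also exhibits the boundary effect discussed in Section 1:

```python
import random
from math import prod

def epa(u):
    # Univariate Epanechnikov kernel on [-1, 1].
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(x, data, b):
    # Estimator (2.1): fhat(x) = (n prod b_k)^(-1) sum_i K{(x - X_i)/b},
    # with the product kernel K(u) = prod_k epa(u_k).
    n = len(data)
    norm = n * prod(b)
    s = sum(prod(epa((xk - Xik) / bk) for xk, Xik, bk in zip(x, Xi, b)) for Xi in data)
    return s / norm

random.seed(1)
data = [(random.random(), random.random()) for _ in range(500)]   # uniform on [0, 1]^2
print(kde((0.5, 0.5), data, (0.2, 0.2)))   # near 1 in the interior
print(kde((0.0, 0.0), data, (0.2, 0.2)))   # near 0.25 at a corner: the boundary effect
```

At the corner, roughly three-quarters of the kernel mass falls outside the support, so the unmodified estimate is severely biased downwards.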

We are using multi-index notation, $\alpha = (\alpha_1, \ldots, \alpha_d)$, where the $\alpha_i$ are non-negative integers. This entails that $x^\alpha = x_1^{\alpha_1} \cdots x_d^{\alpha_d}$, $\alpha! = \alpha_1! \cdots \alpha_d!$, $|\alpha| = \alpha_1 + \cdots + \alpha_d$,

$$D^\alpha f(x) = \frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}} \frac{\partial^{\alpha_2}}{\partial x_2^{\alpha_2}} \cdots \frac{\partial^{\alpha_d}}{\partial x_d^{\alpha_d}} f(x),$$

as well as the componentwise products $xy = (x_1 y_1, \ldots, x_d y_d)$ and $vx = (v_1 x_1, \ldots, v_d x_d)$. For vectors $u, v \in \mathbb{R}^d$ such that $v_k > 0$, $k = 1, \ldots, d$, and sets $A, B \subset \mathbb{R}^d$, we define $v^{-1} = (v_1^{-1}, \ldots, v_d^{-1})$, and $B = u + vA = \{y \in \mathbb{R}^d: y = u + va \text{ for some } a \in A\}$. We do not distinguish between row and column vectors as long as there is no chance of confusion.
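
For reference, the multi-index conventions above can be sketched as minimal helpers (our own illustration, not from the paper):

```python
from math import factorial, prod

# Multi-index helpers: for alpha = (a_1, ..., a_d),
# x^alpha = prod_k x_k^{a_k}, alpha! = prod_k a_k!, |alpha| = sum_k a_k.
def mpow(x, alpha):
    return prod(xk ** ak for xk, ak in zip(x, alpha))

def mfact(alpha):
    return prod(factorial(a) for a in alpha)

def morder(alpha):
    return sum(alpha)

# Example: alpha = (2, 1), x = (3.0, 2.0): x^alpha = 9 * 2 = 18, alpha! = 2, |alpha| = 3.
print(mpow((3.0, 2.0), (2, 1)), mfact((2, 1)), morder((2, 1)))
```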

Defining

$$T_n(x) = x - bT, \qquad (2.2)$$

the `effective kernel support' or `smoothing window' of the density estimator (2.1) is $T_n(x) \cap S$. Here $T_n(x)$ depends on the sample size $n$ through the bandwidths $b$. Only data falling into this smoothing window contribute to the estimate $\hat f(x)$. The basic assumptions on $S$ and $T$ ensure that $T_n(x) \cap S$ is measurable and has positive measure for all $x \in S$.

For estimators (2.1) it is well known (see, for example, Wand and Jones (1995), equations (4.7) and (4.8)) that for any $x \in S^\circ$ it holds that, under appropriate moment conditions on the kernel function $K$,

$$E\{\hat f(x) - f(x)\}^2 = n^{-4/(4+d)} \biggl[ \biggl\{ \sum_{|\alpha| = 2} \frac{D^\alpha f(x)}{\alpha!} \int_T K(u)\, (\gamma u)^\alpha\, du \biggr\}^2 + f(x) \biggl( \prod_{k=1}^d \gamma_k \biggr)^{-1} \int_T K^2(u)\, du \biggr] \{1 + o(1)\}. \qquad (2.3)$$

The applicable moment conditions are given in conditions 1 below, replacing $T_x$ with $T$; see also assumption 2 for the definition of $\gamma$.

2.2. The method of boundary kernels

The convergence in equation (2.3) is not uniform in $x \in S^\circ$. Given a sample size $n$ and associated bandwidth $b$, for $x \in S^\circ$ sufficiently close to the boundary $\partial S$, we will encounter $T_n(x) \not\subset S$, which means that boundary effects arise when estimating the function at $x$ with unmodified kernels. The reason is that the usual Taylor expansion for the bias term requires that $T_n(x) \subset S$, so that moment conditions 1 below with $T_x = T$ can be applied. Since $T_n(x) \downarrow x$ as $n \to \infty$, for sufficiently large $n = n(x)$, $T_n(x) \cap S = T_n(x)$, and therefore equation (2.3) holds pointwise for all $x \in S^\circ$.

From a practical and finite sample standpoint, for any $x \in S$ for which boundary effects occur, i.e. where $T_n(x) \cap S \ne T_n(x)$, a part of the support of $T_n(x)$ is simply cut off. To obtain the usual bias expansion, the method of boundary kernels adopts kernels with accordingly modified support on which the moment conditions still hold. This amounts to using a varying kernel for each $x$ where $T_n(x) \cap S \ne T_n(x)$.

Transposing and rescaling the support of a boundary kernel leads to the support set

$$T_x = b^{-1}[x - \{T_n(x) \cap S\}] = \biggl\{ z \in \mathbb{R}^d: z = \frac{x - y}{b},\ y \in T_n(x) \cap S \biggr\}, \qquad (2.4)$$

and we note that $T_x \subset T$, where $T_x = T$ for those $x \in S$ which are in the `interior'. A boundary kernel to be used for obtaining $\hat f(x)$ is now a kernel with support $T_x$ which satisfies the moment conditions 1,

$$\int_{T_x} K(z)\, dz = 1 \quad \text{and} \quad \int_{T_x} K(z)\, z^\alpha\, dz = 0 \quad \text{for } |\alpha| = 1.$$
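
To see why a modification is needed, a quick one-dimensional numerical check (an illustrative sketch; the Epanechnikov kernel and the cut-off point $q$ are our choices) shows that an interior kernel restricted to a truncated support violates these moment conditions:

```python
# Illustrative check (d = 1): the Epanechnikov kernel restricted to a truncated
# support T_x = [-1, q] with q < 1 violates the moment conditions
# int K = 1 and int K z = 0, so a modified boundary kernel is needed.
def trapz(f, a, b, n=4000):
    # Composite trapezoid rule for numerical integration on [a, b].
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

epa = lambda z: 0.75 * (1.0 - z * z)        # Epanechnikov kernel on [-1, 1]
q = 0.3
m0 = trapz(epa, -1.0, q)                    # should be 1 for a valid kernel
m1 = trapz(lambda z: z * epa(z), -1.0, q)   # should be 0
print(m0, m1)   # m0 < 1 and m1 < 0: kernel mass and symmetry are lost at the boundary
```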

A heuristic argument would now give

$$E\{\hat f(x)\} = \biggl( \prod_{k=1}^d b_k \biggr)^{-1} \int_{S \cap T_n(x)} K\biggl(\frac{x - y}{b}\biggr) f(y)\, dy = \int_{T_x} K(z)\, f(x - zb)\, dz,$$

from which conditions 1 and the smoothness of $f$ would lead to

$$E\{\hat f(x)\} = f(x) + \sum_{|\alpha| = 2} \frac{D^\alpha f(x)}{\alpha!}\, b^\alpha \int_{T_x} K(z)\, z^\alpha\, dz + o(b^\alpha). \qquad (2.5)$$

Similarly, we obtain

$$\operatorname{var}\{\hat f(x)\} = f(x) \biggl( n \prod_{k=1}^d b_k \biggr)^{-1} \int_{T_x} K^2(z)\, dz\, \{1 + o(1)\}. \qquad (2.6)$$

Minimum variance boundary kernels are kernel functions which minimize the leading term of the asymptotic variance (2.6). Accordingly, they are obtained as solutions of the variational problem 1,

$$\int_{T_x} K^2(z)\, dz = \min, \quad \text{subject to moment conditions 1.}$$

2.3. Asymptotics for multivariate boundary effects

In the multidimensional setting not only the distance to the boundary but also the local shape of the boundary plays a role. This has the effect that the heuristic argument given above for equation (2.5) is not strictly correct. The reason is that the set $b^{-1}[x - \{T_n(x) \cap S\}]$ converges to $T$ for all $x \in S^\circ$. In particular, curved boundaries make the asymptotic analysis conceptually more complicated compared with the one-dimensional case. There, for the case $S = [0, 1]$, Gasser and Müller (1979) introduced the concept of an entire sequence of points $x_n = qb$ converging to the left end point 0 (and analogously for the right end point) and indexed by $q \in [0, 1]$, as $n \to \infty$ and $b \to 0$. This device stabilizes the asymptotic boundary behaviour for the entire sequence, as is necessary for asymptotically valid results: the effective kernel support for the entire sequence is found to be $[-1, q]$.

When keeping both $x \in S^\circ$ and $S$ fixed, eventually $T_n(x) \subset S$ for $n$ sufficiently large. Whereas in the one-dimensional case an asymptotically stable support is achievable by considering sequences $x_n \to \partial S$ as $n \to \infty$, this does not work for the case of curved boundaries in the multidimensional case; the reason is that any fixed curvature will not remain invariant under the necessary rescaling $b^{-1}S$ to accommodate $b \to 0$ as $n \to \infty$.

We note that in the one-dimensional case the standard method for asymptotic analysis described above of considering sequences $x_n = qb$ can be equivalently expressed as keeping $x$ fixed and moving the (left) end point $x_L$ of the support towards $x$, leading to $x_{L,n} = x - bq$. This alternative viewpoint of extending the finite sample situation to the asymptotic case in one dimension has the property that it indeed can be generalized to the multidimensional boundary problem. More technical details on this device are provided in Appendix A.1. This approach to extend the finite to the asymptotic situation by considering changing support sets $S_n$ and boundaries $\partial S_n$ then provides a stringent argument for equations (2.5) and (2.6), justifying the heuristics based on finite sample considerations.

2.4. Kernel-generating variational problems and their boundary extensions

An interesting generalization of problem 1 is obtained by introducing measures $d\mu(z) = dz/G(z)$, where $G$ is a given weight function. Measures $d\mu$ are generalizations of the Lebesgue measure $dz$. The weight function $G$ is assumed to satisfy assumption 3:

$$\operatorname{support}(G) = T, \quad G > 0 \text{ on } T^\circ, \quad \text{and } G \text{ is continuous on } T.$$

Let $U \subset T$ be the closure of a connected open set. Define the functional

$$V(K, G, U) = \int_U K^2(z)\, d\mu(z) = \int_U \frac{K^2(z)}{G(z)}\, dz.$$

For a given integer $k \ge 1$ and a multi-index $\beta$ such that $0 \le |\beta| < k$, we define the set of functions

$$\mathcal{M}_{\beta,k}(U) = \biggl\{ g: U \to \mathbb{R}: \int_U g(z)\, z^\alpha\, dz = 0 \text{ for } \alpha \text{ such that } 0 \le |\alpha| < k,\ \alpha \ne \beta, \text{ and } \int_U g(z)\, z^\beta\, dz = (-1)^{|\beta|}\, \beta! \biggr\},$$

and we write simply $\mathcal{M}_{\beta,k}$ if the set $U$ is not specified. Note that $\mathcal{M}_{\beta,k}$ consists of functions which are suitable as kernels for the estimation of the $\beta$th derivative of $f$. Now the extension of problem 1 to general measures $d\mu$ leads to the class of variational problems 2,

$$\arg\min_{K \in \mathcal{M}_{\beta,k}(U)} \{V(K, G, U)\},$$

with $U = T_x$, for all $x \in S$. For all $x$ for which $T_x = T$, i.e. the points in the `interior' of $S$, the solution of problem 2 is the same and is given in theorem 1, whereas for the general case it is given in theorem 2 (setting $U = T_x$).

Theorem 1. Let $G$ be a weight function satisfying assumption 3. Then there is a unique solution $K^*_G$ of problem 2,

$$K^*_G = \arg\min_{K \in \mathcal{M}_{0,2}(T)} \{V(K, G, T)\},$$

where $K^*_G(z) = G(z)\, P(z)\, 1_T(z)$ and $P$ is a polynomial. The polynomial $P$ has the smallest degree necessary to ensure the side-conditions $K^*_G \in \mathcal{M}_{\beta,k}(T)$ and is uniquely determined.

This result implies that every given non-negative kernel $K \in \mathcal{M}_{0,2}$ used in the interior is obtained as the solution of the variational problem 2 with $G = K$; in this case, the polynomial $P$ is constant, $P = 1$. This property justifies referring to problem 2 as the `kernel-generating' variational problem. Given any non-negative kernel $K \in \mathcal{M}_{0,2}(T)$, its kernel-generating variational problem is problem 2 with $G = K$. Special cases were considered previously in Epanechnikov (1969) and Granovsky and Müller (1991).

These connections suggest a natural extension of such kernels onto the boundary: since the bias expansion requires $K \in \mathcal{M}_{0,2}$, the extension of the variational problem generating a given non-negative kernel $\tilde K$ in the interior to cover the case of boundary kernels becomes simply problem 3:

$$\arg\min_{K \in \mathcal{M}_{0,2}(T_x)} \{V(K, \tilde K, T_x)\}.$$

If for example $G = 1$, the solution of problem 2 and the boundary extensions obtained through problem 3 for $G = 1$ are minimum variance kernels, as they minimize the kernel-dependent factor of $\operatorname{var}\{\hat f(x)\}$ according to equation (2.6) (compare Gasser et al. (1985) for the one-dimensional case). In the following section we give the solutions of variational problems 3, thus providing a construction of multivariate boundary kernels which are natural extensions of the kernels used in the interior region.

The solutions of problems 3, and in fact virtually all boundary kernels, will exhibit more sign changes than the initial interior kernels; this means that, for non-negative kernels in the interior, the associated boundary kernels in general will have negative values. The possibility therefore arises that the resulting density estimates are negative. Although the trivial solution to redefine the density estimate as $\max\{\hat f(x), 0\}$ is practically viable and asymptotically efficient, more sophisticated procedures have been devised recently (compare Jones and Foster (1996) and Brown and Chen (1998)).

3. Solutions for boundary kernels

3.1. The main construction

Throughout the remainder of the paper it will be assumed that $T, U \subset \mathbb{R}^d$ are compact sets of positive Lebesgue measure such that $\mathrm{cl}(U^\circ) = U \subset T$, that the weight function $G$ satisfies assumption 3, and that $A$ is an ordered set of multi-indices of size $p = |A|$. Define for a vector

$$\theta = (\theta_\alpha) \in \mathbb{R}^p, \quad \alpha \in A,$$

the set of functions

$$\mathcal{M}(\theta, U) = \biggl\{ K: U \to \mathbb{R}: K \text{ is integrable and } \int_U K(z)\, z^\alpha\, dz = \theta_\alpha,\ \alpha \in A \biggr\}. \qquad (3.1)$$

Then consider the following generalization of problems 2 and 3 (problem 4):

$$\arg\min_{K \in \mathcal{M}(\theta, U)} \{V(K, G, U)\},$$

i.e.

$$\int_U \frac{K^2(z)}{G(z)}\, dz = \min, \quad \text{subject to } K \in \mathcal{M}(\theta, U).$$

In some situations it is of interest to use more general (i.e. not necessarily non-negative) kernel functions $K \in \mathcal{M}_{0,k}(T)$ or $K \in \mathcal{M}_{\beta,k}(T)$, $k > 2$. When $k > 2$, $K$ cannot be chosen as a non-negative function (see Müller (1988)). These kernels are useful for estimating a derivative of order $\beta$ or when a faster asymptotic rate of convergence is desired under additional smoothness conditions. Such faster rates are sometimes required for applications of smoothing in semiparametric modelling. Also, more general kernels $K \in \mathcal{M}(\theta, T)$ and their boundary extensions, targeting linear functionals of $f$ and its derivatives, may be useful for some applications.

Under mild regularity conditions, a higher order kernel function $K \in \mathcal{M}_{\beta,k}(T)$ for $k > 2$, which must have negative parts, can be decomposed into $K = P_K G_K$ for a non-negative weight function $G_K$ and a polynomial $P_K$ of degree $k - 2$, which has zeros at the locations of the sign changes of $K$ (Müller, 1993). In the multivariate case, starting with a higher order kernel $K \in \mathcal{M}_{\beta,k}(T)$, $k > 2$, in the interior, assume that such a decomposition is given, $K = P_K G_K 1_T$, where $G_K \ge 0$; examples of such kernels can be found in Müller (1988), p. 83. Then the kernel-generating problem for the construction of corresponding boundary kernels is

$$K_0 = \arg\min_{K \in \mathcal{M}_{\beta,k}(T_x)} \{V(K, G_K, T_x)\},$$

which is also of the form of problem 4.

In the following, the support set $U$ will be either $U = T$ (interior) or $U = T_x$ (boundary region). The solution of problem 4 is given in the following result.

region). The solution of problem 4 is given in the following result.

Theorem 2. Problem 4 has a unique solution, which is given by

$$K^*(z) = P^*(z)\, G(z)$$

with a polynomial $P^*(z) = \sum_{\alpha \in A} d_\alpha z^\alpha$. This polynomial is determined by the vector $d = (d_\alpha)$ of its coefficients, $d = C^{-1}\theta$, where $C$ is a $p \times p$ matrix with entries

$$c_{\alpha,\beta} = \int_U z^{\alpha+\beta}\, G(z)\, dz, \quad \text{for } \alpha, \beta \in A.$$

Observe that $C$ is a matrix of Gram type and thus is positive definite. Hence the inverse $C^{-1}$ exists. Note that $U$ may depend on $x$.

According to theorem 2, unique multivariate boundary kernels can be found by extending the kernel-generating variational problems 2 to include more general supports $U = T_x$: we would normally start with an arbitrary non-negative kernel function $K \ge 0$, $K \in \mathcal{M}_{0,2}$, $\operatorname{support}(K) = T$, used for density or curve estimation in the interior where no boundary effects occur. Choosing $G = K$ and $T_x = b^{-1}[x - \{T_n(x) \cap S\}]$, we then construct the solution

$$K^*_x = \arg\min_{K \in \mathcal{M}_{0,2}(T_x)} \{V(K, G, T_x)\} = P^*_x K,$$

with the property that $K^*_x = K$ for all $x$ in the interior region where $T_n(x) \subset S$, or equivalently $T_x = T$.

We also note that the result of theorem 2 hints at a connection to local least squares methods. It was shown in Ruppert and Wand (1994) that the smoothing weights used implicitly in local linear fitting are obtained in a similar manner to $K^*$, by multiplying $G$ with an appropriate polynomial. We discuss this connection further in Sections 3.2, 4 and 6.
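
To make the construction concrete, a one-dimensional sketch (our illustration; the paper works in $\mathbb{R}^d$) computes $d = C^{-1}\theta$ for $A = \{0, 1\}$, $\theta = (1, 0)$ and the Epanechnikov weight $G$, and checks the resulting moment conditions:

```python
# Sketch of the theorem 2 construction in d = 1 (illustrative choices throughout):
# starting from the Epanechnikov kernel G on [-1, 1], the boundary kernel on
# T_x = [-1, q] is K*(z) = P*(z) G(z), with coefficients d = C^{-1} theta, where
# c_{a,b} = int_{T_x} z^(a+b) G(z) dz and theta = (1, 0) enforces the moment conditions.
def trapz(f, a, b, n=4000):
    # Composite trapezoid rule on [a, b].
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

G = lambda z: 0.75 * (1.0 - z * z)          # interior kernel, used as weight

def boundary_kernel(q):
    # 2x2 Gram matrix C and its inverse applied to theta = (1, 0)^T.
    c = [[trapz(lambda z, p=a + b: z ** p * G(z), -1.0, q) for b in (0, 1)] for a in (0, 1)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    d0 = c[1][1] / det
    d1 = -c[1][0] / det
    return lambda z: (d0 + d1 * z) * G(z)   # K*(z) = P*(z) G(z)

K = boundary_kernel(0.3)
print(trapz(K, -1.0, 0.3), trapz(lambda z: z * K(z), -1.0, 0.3))  # ~1 and ~0
```

For $q = 1$ the full support is recovered and the construction returns the interior kernel itself ($P^* \equiv 1$), as theorem 1 predicts.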

The solution of problem 4 given in theorem 2 has a simple form in the case $\theta_0 = 1$ and $\theta_\alpha = 0$ for $|\alpha| > 0$, i.e. $\theta_\alpha = \delta_{0,\alpha}$. Then, by the symmetry of the matrix $C^{-1} = (\bar c_{\alpha,\beta})$,

$$P^*(z) = \sum_{\alpha \in A} d_\alpha z^\alpha = \sum_{\alpha \in A} \bar c_{\alpha,0}\, z^\alpha = \sum_{\alpha \in A} \bar c_{0,\alpha}\, z^\alpha. \qquad (3.2)$$

This applies in particular to the case $K \in \mathcal{M}_{0,k}(U)$. Similarly, in the case $\theta_\alpha = \delta_{\alpha,\beta}$ for some fixed $\beta \in A$,

$$P^*(z) = \sum_{\alpha \in A} \bar c_{\beta,\alpha}\, z^\alpha. \qquad (3.3)$$

By a linear combination of these formulae, we obtain for general $\theta \in \mathbb{R}^p$ the representation

$$P^*(z) = \sum_{\beta \in A} \theta_\beta \sum_{\alpha \in A} \bar c_{\beta,\alpha}\, z^\alpha. \qquad (3.4)$$

We remark that another approach to the solution of problem 4 can be based on expansions in orthonormal polynomials on $U$.


3.2. Connections to a continuous least squares principle

Theorem 2 provides a complete and explicit solution for problems 4, and thus a construction of boundary kernels by extending the kernel-generating problem to arbitrary supports. We develop now a second closely related variational problem which has the same solutions $K^*$ as problem 4 and in addition provides an interesting link to weighted least squares. This link can be exploited for the construction of boundary kernels since it implies that commonly available statistical least squares software is all that is needed. Moreover, it provides a second interpretation of our proposed boundary kernel construction as it demonstrates what we call a continuous least squares principle. For related ideas see also Jones (1993), section 5.

The following result uses the notation of a $\delta$-function. For $U = T_x$ or $U = T$, and $z \in U^\circ$, let $\delta_z$ be such that for every continuous function $g$ on $U$

$$\int_U \delta_z(u)\, g(u)\, du = g(z). \qquad (3.5)$$

Theorem 3. Let $K^*$ be the solution of the optimization problem 4 with side-conditions $K \in \mathcal{M}(\theta, U)$, where $\theta_\alpha = \delta_{0,\alpha}$ for $\alpha \in A$. Let $U = T$ or $U = T_x$ for a given $x \in S^\circ$. Then, for any fixed $z \in U^\circ$, consider the optimization problem 5

$$\arg\min_{Q \in \mathcal{P}(A)} \biggl[ \int_U G(u) \{\delta_z(u) - 2\, \delta_z(u)\, Q(u) + Q(u)^2\}\, du \biggr],$$

where $\mathcal{P}(A) = \{Q: Q(v) = \sum_{\alpha \in A} c_\alpha v^\alpha,\ c_\alpha \in \mathbb{R}\}$. Problem 5 has a unique solution given by the polynomial

$$Q^*_z(u) = \sum_{\alpha \in A} q^*_\alpha(z)\, u^\alpha, \quad q^* = C^{-1} y, \quad y = \{G(z)\, z^\alpha\}_{\alpha \in A}, \qquad (3.6)$$

where $C$ is as in theorem 2. Furthermore

$$K^*(z) = Q^*_z(0). \qquad (3.7)$$

Although the solution given in theorem 3 covers only the case of special $\theta$s of the form $\theta_\alpha = \delta_{0,\alpha}$, it is not difficult to extend this result to the situation of general $\theta$s.

Corollary 1. If $\theta$ is arbitrary, then

$$K^*(z) = \sum_{\beta \in A} \theta_\beta\, \frac{D^\beta Q^*_z(u)}{\beta!} \biggr|_{u=0}. \qquad (3.8)$$

Theorem 3 demonstrates that, in somewhat sloppy notation, boundary kernels $K^*_x$ for estimating at $x \in S^\circ$, obtained as solutions of problem 4 with $U = T_x$, are given by

$$K^*_x(z) = \biggl[ \arg\min_{Q \in \mathcal{P}(A)} \biggl\{ \int_{T_x} G(u)\, \{\delta_z(u) - Q(u)\}^2\, du \biggr\}(v) \biggr] \bigg|_{v=0}, \qquad (3.9)$$

where $T_x$ is as in equation (2.4) and does not depend on $n$ or $b$, if we adopt the construction as described in Appendix A.1.

This allows an interpretation of $K^*_x(z)$ as a weighted least squares approximation of $\delta_z$ by a polynomial $Q \in \mathcal{P}(A)$, which is then evaluated at 0. Our construction thus corresponds to a continuous version of the locally weighted least squares principle. Another way to look at this is to say that solving problem 4, which has the minimum variance interpretation when integrating the variance with respect to the measure $G\, du$, is an equivalent implementation of the continuous weighted least squares principle. For more discussion on this see Section 6.
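
Theorem 3 and equation (3.7) can be checked numerically in one dimension (an illustrative sketch with our own grid discretization; the choices of $G$, $q$ and $z$ are arbitrary):

```python
# Numerical illustration of theorem 3 in d = 1 (Epanechnikov weight G on the
# truncated support T_x = [-1, q]): fitting a discretized delta-function at z by a
# weighted least squares line and evaluating the fit at 0 reproduces the boundary
# kernel value K*(z) from the theorem 2 construction, as in (3.7).
G = lambda u: 0.75 * (1.0 - u * u)
q, z = 0.3, -0.4
a, m = -1.0, 2000
h = (q - a) / m
u = [a + (i + 0.5) * h for i in range(m)]           # grid midpoints on [-1, q]
i0 = min(range(m), key=lambda i: abs(u[i] - z))     # grid point closest to z
y = [0.0] * m
y[i0] = 1.0 / h                                     # discretized delta_z, integrates to 1

# Weighted least squares fit of y by Q(u) = c0 + c1*u with weights G(u_i):
s = [sum(G(ui) * ui ** p for ui in u) for p in range(3)]
t0 = sum(G(ui) * yi for ui, yi in zip(u, y))
t1 = sum(G(ui) * yi * ui for ui, yi in zip(u, y))
det = s[0] * s[2] - s[1] ** 2
c0 = (s[2] * t0 - s[1] * t1) / det                  # Q*_z(0)

# The same value via theorem 2: K*(z) = (d0 + d1*z) G(z), d = C^{-1}(1, 0)^T,
# with Gram entries approximated by Riemann sums over the same grid.
C = [h * sp for sp in s]
detC = C[0] * C[2] - C[1] ** 2
Kstar = (C[2] - C[1] * z) / detC * G(z)
print(c0, Kstar)   # the two values agree up to the discretization error
```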

4. Approximate solutions

For the numerical implementation of the construction of multivariate boundary kernels using the weighted least squares principle, we approximate the $\delta$-function which appears in problem 5 by a sequence of functions. For this, consider a positive function $h \in L^1(\mathbb{R}^d)$ with $\int_{\mathbb{R}^d} h(x)\, dx = 1$ and use that $\delta_z(u) \approx m^d h\{m(u - z)\}$ as $m \to \infty$. Possible choices for $h$ are

$$h(u) = \frac{\exp(-u^{\mathrm T} u/2)}{(2\pi)^{d/2}} \quad \text{or} \quad h(u) = 1_{(-1/2,\,1/2)^d}(u).$$

Then consider the following approximation to problem 5 for fixed $x \in S$ (problem 6):

$$\arg\min_{Q \in \mathcal{P}(A)} \biggl[ \int_U G(u) \bigl[ m^d h\{m(u - z)\} - 2\, m^d h\{m(u - z)\}\, Q(u) + Q(u)^2 \bigr]\, du \biggr].$$

By arguments similar to those in the proof of theorem 3 (see Appendix A.3), we obtain for the coefficients of the solution $Q^*_{z,m}(u) = \sum_{\alpha \in A} q^*_{\alpha,m}(z)\, u^\alpha$

$$q^*_{\alpha,m}(z) = \sum_{\beta \in A} \bar c_{\alpha,\beta} \int_U m^d h\{m(u - z)\}\, G(u)\, u^\beta\, du =: \sum_{\beta \in A} \bar c_{\alpha,\beta}\, \theta^*_{\beta,m}(z), \qquad (4.1)$$

say. For the case $\theta_\alpha = \delta_{\alpha,0}$, the approximate multivariate boundary kernel is then obtained as

$$K^*_m(z) = Q^*_{z,m}(0).$$

The next result deals with the quality of the approximation of $K^*(z)$ by $K^*_m(z)$. We find the following bound on the approximation error, for the case of general $\theta$.

Theorem 4. Assume in addition that the weight function $G$ is continuously differentiable on $U$ (where $U = T$ or $U = T_x$) and that the function $h$ in problem 6 satisfies $\|w\|\, h(w) \in L^1(\mathbb{R}^d)$. Then

$$|K^*_m(z) - K^*(z)| = O\Bigl(\frac{1}{m}\Bigr) \quad \text{for } z \in U^\circ. \qquad (4.2)$$

The case where $z \in \partial U$ and some extensions are discussed in Appendix A.6. Theorem 4 shows that for $m$ sufficiently large it is sufficient to obtain the approximate solution $K^*_m$ when constructing multivariate boundary kernels. Therefore, this provides a viable option to construct explicit versions of the multivariate boundary kernels.

Note that

$$K^*_m\Bigl(\frac{x - z}{b}\Bigr) = \biggl[ \arg\min_{Q \in \mathcal{P}(A)} \biggl\{ \int_{T_x} G\Bigl(\frac{x - u}{b}\Bigr) \{\delta_z(u) - Q(u)\}^2\, du \biggr\}(v) \biggr] \bigg|_{v=0}, \qquad (4.3)$$

and a discretized version therefore is obtained as follows: assume that $z_i$, $i = 1, \ldots, m^d$, are the midpoints of a regular and dense grid of pixels covering $S$. An example would be equisized cubes, each of volume $m^{-d}$. Then solving the weighted least squares problem, setting $y_i = m^d 1_{\{z_i = z_j\}}$,

$$\arg\min_{Q \in \mathcal{P}(A)} \biggl[ \sum_{z_i \in S} G\Bigl(\frac{x - z_i}{b}\Bigr) \{y_i - Q(z_i)\}^2 \biggr], \qquad (4.4)$$

yields $Q^*_j$ and

$$K^*_m\Bigl(\frac{x - z_j}{b}\Bigr) = Q^*_j(0),$$

up to an additional approximation error of order $m^{-1}$. This error is caused by approximating the integral in equation (4.3) by a Riemann sum in expression (4.4).

Solving this discrete problem by taking first derivatives with respect to the coefficients of $Q$, it is easy to see that the solution of expression (4.4) is a linear functional of the $y_i$. Therefore, for any subsequence $z_j$, $j = 1, \ldots, n$, of the $z_i$, $i = 1, \ldots, m^d$ (assuming that $m^d > n$), we may set

$$y_i = m^d \Bigl( n \prod_{k=1}^d b_k \Bigr)^{-1} \sum_{j=1}^n 1_{\{z_i = z_j\}}, \quad i = 1, \ldots, m^d. \qquad (4.5)$$

We then obtain the solution $Q^*$, which satisfies

$$\Bigl( n \prod_{k=1}^d b_k \Bigr)^{-1} \sum_{j=1}^n K^*_m\Bigl(\frac{x - z_j}{b}\Bigr) = Q^*(0). \qquad (4.6)$$

Now, in the setting of the density estimation problem, let $C_i$, $i = 1, \ldots, m^d$, denote the $i$th cube of a partition of $S$ into $m^d$ equisized cubes, each with volume $|S|/m^d$ and such that $z_i \in C_i$ is the midpoint of $C_i$. Then, again up to an approximation error of order $O(m^{-1})$, setting

$$y_i = m^d \left(n \prod_{k=1}^d b_k\right)^{-1} \sum_{j=1}^n 1_{\{X_j \in C_i\}}, \qquad i = 1, \ldots, m^d,$$

and obtaining $Q^*$, we arrive at

$$\hat f(x) = \left(n \prod_{k=1}^d b_k\right)^{-1} \sum_{j=1}^n K^*_m\!\left(\frac{x - X_j}{b}\right) = Q^*(0). \tag{4.7}$$

This provides a boundary-modified multivariate density estimate. It can be interpreted as a histogram based on bins $C_i$ which is smoothed with a locally weighted least squares smoother. We find that as $m \uparrow \infty$ this procedure approximates the exact boundary kernels.
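The histogram-smoothing interpretation just described can be sketched directly. The following is our own illustration, not the authors' implementation: we take $S = [0, 1]^2$, a local quadratic fit and a product weight $G$, and we normalise the bin heights as an ordinary histogram density, so that the intercept of the local fit estimates $f(x)$ directly, rather than using the $m^d (n \prod b_k)^{-1}$ bookkeeping of equation (4.7).

```python
import numpy as np

def boundary_modified_density(x, X, b, m):
    """Density estimate at x in S = [0,1]^2, in the histogram-smoothing
    interpretation: (i) bin the data into an m x m grid of equisized pixels;
    (ii) form histogram density heights y_i = count_i / (n * pixel area);
    (iii) fit a local quadratic in (z_i - x)/b by least squares with weights
    G((x - z_i)/b); (iv) return the intercept, i.e. the fit at the point x."""
    n = len(X)
    counts, _, _ = np.histogram2d(X[:, 0], X[:, 1], bins=m, range=[[0, 1], [0, 1]])
    g = (np.arange(m) + 0.5) / m
    Z1, Z2 = np.meshgrid(g, g, indexing="ij")
    Z = np.column_stack([Z1.ravel(), Z2.ravel()])     # pixel midpoints z_i
    y = counts.ravel() / (n * (1.0 / m) ** 2)         # histogram density heights
    U = (Z - x) / b
    inside = np.all(np.abs(U) <= 1, axis=1)
    w = np.where(inside, (1 - U[:, 0]**2) * (1 - U[:, 1]**2), 0.0)
    terms = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]   # local quadratic
    B = np.column_stack([U[:, 0]**p * U[:, 1]**q for (p, q) in terms])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * B, np.sqrt(w) * y, rcond=None)
    return coef[0]
```

Unlike an unmodified kernel estimate, the local fit adapts its weights at edges and corners, so a uniform density is estimated at approximately the correct height right up to the boundary of $[0, 1]^2$.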

5. Illustrations and examples

5.1. An example of an interior kernel and its boundary extensions
For an illustration of the proposed multivariate boundary kernels, we implemented the approximate boundary kernels (4.1) in $R^2$, which are the approximate solutions of problems 4 and 5. We chose $m = 2500$, using interior kernels with circular support and the bivariate weight function $\{1 - (x_1^2 + x_2^2)\}^3$. The side-conditions $M(\mu, T)$ are based on those $\mu_\alpha$ corresponding to $1$, $x_1$, $x_2$, $x_1^2$, $x_2^2$, $x_1^3$, $x_2^3$, $x_1 x_2$, $x_1^2 x_2$ and $x_1 x_2^2$, and these are higher order kernels in $M_{0,4}$. The support includes an annular region on which these kernels are negative. The interior and associated boundary kernels are shown in Fig. 1 for various locations of $x$ with respect to the boundary of $S = [0, 1]^2$, the assumed support of the function to be estimated.


5.2. An example of bivariate density estimation with square support
We now present two examples which demonstrate that the application of multivariate boundary kernels does make a difference, in terms of interpretation and conclusions, compared with unmodified kernel density estimates in the two-dimensional case. The first example concerns the US draft lottery data of 1970, which are available from Statlib at

http://lib.stat.cmu.edu/DASL/Stories/DraftLottery.html

These data consist of $n = 365$ bivariate observations $(X_{i1}, X_{i2})$, where $X_{i1}, X_{i2} \in \{1, 2, \ldots, 365\}$; $X_{i1}$ is a given day of the 365 days of the year, whereas $X_{i2}$ is a `draft number', which corresponds to a priority score assigned to that day. A low priority score increases the chance that a male born on that day will be drafted, as priority score $X_{i2} = 1$ means that men born on day $X_{i1}$ are drafted first, and so on. The priority scores were supposedly assigned randomly to the days. Although these data are not strictly a bivariate random sample, since each day and each priority score occur exactly once, we expect the bivariate density to be constant under the assumption of a truly random assignment of priority scores. The density estimate serves here as a simple graphical check of the randomness of the assignment of priority scores.

Fig. 1. Boundary kernels for a higher order kernel with circular support (the kernels satisfy moment conditions $M_{0,4}$ and the weight function is $G(x_1, x_2) = \{1 - (x_1^2 + x_2^2)\}^3\, 1_{\{x_1^2 + x_2^2 \leq 1\}}$): (a) kernel weights in the interior, for estimation at the point (0.5, 0.5); (b) kernel weights for estimation at (1.0, 0.5), a boundary point; (c) kernel weights for estimation at (1.0, 0.75), at the boundary; (d) kernel weights for estimation at (1.0, 1.0), the lower right-hand corner point of the boundary

We apply our methods with weight function $(1 - x_1^2)(1 - x_2^2)\, 1_{\{x_1^2 \leq 1,\, x_2^2 \leq 1\}}$, bandwidths $b_1 = 100$ and $b_2 = 100$, and the approximate implementations (4.1) and (4.7) with $m = 3600$ for the boundary-modified method. The results are in Fig. 2. It is clear that the unmodified estimate is useless as a graphical device. The boundary-modified estimate, in contrast, shows a very clear pattern: for those born early in the year, the numerical values of the priority scores ($Y$) tend to be larger than for those born late in the year, which means that those with late birthdays ($X$) had a higher chance of being drafted than those with earlier birthdays. Similar conclusions can be reached with a nonparametric regression analysis, which is posted on the Statlib site.


5.3. An example of bivariate density estimation with triangular support
The data set for the next example consists of $n = 456$ observations on guesses of trinomial probabilities made by participants in experimental games, with the aim of empirically verifying assumptions made in economic game theory. The experiments and data are described in Haruvy (1998). These data are available at

http://www.eco.utexas.edu/Faculty/Stahl/experimental/mode_ds2.dat

and further information on these experiments can be obtained via

http://www.eco.utexas.edu/Faculty/Stahl/experimental

We use the data of the second experiment, which can also be obtained from

http://www.blackwellpublishers.co.uk/rss/

Participants were asked to report their `hypotheses' on an opponent's strategy, which can be stated as a trinomial probability. One goal was to identify subgroups of participants which would give rise to modes in the density of the reported probabilities. Since the three guessed probabilities of a hypothesis must sum to 1, the $i$th datum is $(p_{i1}, p_{i2})$ for the first two guessed probabilities, with constraints $0 \leq p_{i1}, p_{i2} \leq 1$ and $0 \leq p_{i1} + p_{i2} \leq 1$. Thus the problem can be viewed as the estimation of a bivariate density on a triangular support.

Applying our methods with bandwidths $b_1 = b_2 = 0.15$, $m = 2500$ and the same kernels and weight function as in the previous example led to the density estimates shown in Fig. 3, for both the unmodified (Figs 3(a) and 3(b)) and the boundary-modified (Figs 3(c) and 3(d)) density estimates, viewed from the front (Figs 3(a) and 3(c)) and the rear (Figs 3(b) and 3(d)). There is a strong mode at $x = 0$, $y = 0$, due to repeated observations with $(p_{i1}, p_{i2}) = (0, 0)$. The boundary-modified estimate reflects this as the highest peak in the density (because of its dominating size, it has been truncated in the plot), whereas the unmodified estimate shows only a feeble elevation in the density.

A further mode, shown by the boundary-modified estimate but not by the unmodified estimate, also occurs right at the boundary, at about $x = 0.5$, $y = 0$. These findings indicate that, in particular when modes which may occur near or at boundaries are of interest, boundary modifications are mandatory.


Fig. 2. Bivariate density estimates for the $n = 365$ draft lottery data, on the square support $0 \leq x_1, x_2 \leq 365$ (bandwidths $b_1 = b_2 = 100$ and non-negative kernels with weight function $G(x_1, x_2) = (1 - x_1^2)(1 - x_2^2)\, 1_{\{x_1^2 \leq 1,\, x_2^2 \leq 1\}}$ are used): both (a) unmodified and (b) boundary-modified density estimates are shown

6. Concluding remarks

The main message of our paper is that the direct construction of multivariate boundary kernels is feasible and not such a formidable challenge as once thought. Second, most practically occurring multivariate kernel functions and all non-negative kernels are obtained as solutions of a kernel-generating variational problem. Third, the most natural construction of such kernels is by extending a minimum variance type of criterion, incorporating a general measure based on a weight function. Fourth, a second and equivalent view of this extension points to what we call a `continuous weighted least squares principle'. This amounts to approximating a delta function by a polynomial which is closest in terms of a weighted $L_2$-norm but the choice of which is restricted by constraints. Fifth, these continuous versions of the kernel-generating variational problem and the continuous weighted least squares formulation have discrete counterparts which are useful for obtaining actual solutions. Sixth, if we use the discrete version, the approximate construction of the boundary-modified multivariate density estimate can be interpreted as a finely binned histogram which is smoothed by a classical locally weighted least squares smoother, fitting local planes or local higher order multivariate polynomials.

Fig. 3. Bivariate density estimates for the $n = 456$ data on guessed probabilities in experimental games, on the triangular support $0 \leq x_1, x_2 \leq 1$ and $0 \leq x_1 + x_2 \leq 1$ (bandwidths $b_1 = b_2 = 0.15$ and the same kernels as in Fig. 2 are used): (a) front view of the unmodified density estimate; (b) rear view of the unmodified density estimate; (c) front view of the boundary-modified density estimate; (d) rear view of the boundary-modified density estimate

We conclude this section with some additional observations. The problems and results are universal for multivariate kernel estimates, and the focus on kernel density estimation serves only as a convenient framework in which the problem can be described. An interesting by-product of our analysis is the continuous least squares principle. Its essence is the equivalence of the solutions to problems 4 and 5 in theorem 3. This is stated for a continuous analogue of weighted least squares, involving the weight function $G$. Assuming for simplicity that $G \equiv 1$, i.e. the unweighted case, and that $T_x = T$, theorem 3 in the interpretation (3.9) and theorem 2 imply that

$$\left[\arg\min_{K \in M(\mu, T)} \int_T K^2(u)\, du\right](z) = \left[\arg\min_{Q \in P(A)} \int_T \{\delta_z(u) - Q(u)\}^2\, du\right](v)\Big|_{v=0}.$$

The right-hand side reveals the kernel as the continuous analogue of the linear regression weights in

$$\hat g(x) = \sum_{i=1}^n w_i y_i,$$

where the $y_i$ are the regression data. The $j$th weight $w_j$ is obtained by calculating $\hat g$ for an input vector of $y_i$s consisting of 0s, except for $y_j = 1$. The left-hand side shows that this continuous least squares kernel produces the smallest variance among all unbiased continuous kernel functions, i.e. among those kernel functions satisfying the requisite moment conditions $M(\mu, T)$. In this sense, we have found not only a continuous least squares principle but also an extension of the classical Gauss-Markov theorem from linear regression analysis to the continuous least squares case. The well-known discrete version of this theorem was used in Müller (1987) to show the equivalence of versions of kernel regression and locally weighted least squares smoothing for regular designs. A consequence of the above results is that this equivalence extends to the multivariate case and to boundary regions.
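The discrete weight-extraction device described here is easy to demonstrate. The sketch below is our own one-dimensional illustration, with an Epanechnikov-type weight and a local linear fit (both our choices): the smoother is linear in the data, so feeding it the unit vectors $e_j$ recovers its weights $w_j$, which then satisfy the discrete moment conditions exactly, even at a boundary point.

```python
import numpy as np

def local_ls_fit(x, xs, ys, b=0.25):
    """Locally weighted least squares (local linear) estimate at x:
    minimise sum_i G((x_i - x)/b) {y_i - q0 - q1 (x_i - x)}^2 over (q0, q1);
    the estimate g_hat(x) is the intercept q0."""
    t = (xs - x) / b
    w = np.maximum(1.0 - t**2, 0.0)                   # Epanechnikov-type weight G
    B = np.column_stack([np.ones_like(xs), xs - x])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * B, np.sqrt(w) * ys, rcond=None)
    return coef[0]

# Extract the equivalent-kernel weights w_j by evaluating the linear smoother
# at the unit vectors e_j, exactly as described in the text.
xs = np.linspace(0.0, 1.0, 41)
x0 = 0.0                                              # a boundary point
W = np.array([local_ls_fit(x0, xs, np.eye(len(xs))[j]) for j in range(len(xs))])
```

By construction, $\hat g(x_0) = \sum_j W_j y_j$ for any data vector, $\sum_j W_j = 1$ and $\sum_j W_j (x_j - x_0) = 0$: discrete analogues of the moment conditions that make the kernel estimate locally unbiased for linear functions.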

Acknowledgements

We are grateful to E. Haruvy for sending us a preprint of his paper and for his permission to use the data from his second experiment. We also wish to thank two referees for most helpful comments and for bringing additional references to our attention. The research of Hans-Georg Müller was supported in part by National Science Foundation grant DMS 96-25984 and by National Security Agency grant MDA 904-96-10026.

Appendix A: Additional results and proofs

A.1. Derivation of equation (2.4)
As discussed in Section 2.3, to determine the pointwise asymptotics starting from a given finite situation, we assume that the support set $S$ is not fixed, but rather that it has a moving boundary depending on the chosen point $x$ and sample size $n$. In this way, the geometric relationship between the point of estimation $x$ and the boundary $\partial S$ can be stabilized asymptotically.

Let $S_n(x) = x + b(S - x)$ and, defining the transposed and rescaled version of the effective smoothing window $T_n(x)$ as $T_{x,n}$, we find that

$$T_{x,n} = b^{-1}\big[x - \{T_n(x) \cap S_n\}\big] = b^{-1}\big[x - \big[\{x + b(S - x)\} \cap (x - bT)\big]\big] = (x - S) \cap T =: T_x \subset T,$$

which is indeed independent of $n$. We assume that the function to be estimated on $S_n(x)$ is $f_n(y) = f(y)|_{S_n}$. Then

$$E\{\hat f(x)\} = \left(\prod_{k=1}^d b_k\right)^{-1} \int_{S_n(x) \cap T_n(x)} K\!\left(\frac{x-y}{b}\right) f_n(y)\, dy = \int_{T_x} K(z)\, f(x - zb)\, dz,$$

from which equation (2.4) follows.

A.2. Proofs of theorems 1 and 2
In the following, the support set $U$ denotes either $T$ or $T_x$. We consider, for a measurable function $\varphi$ on $U$, the functional

$$T(\varphi) := \int_U \frac{\varphi^2(u)}{G(u)}\, du.$$

Let $K^*$ be a kernel of the type $P^*G$, where $P^*(z) = \sum_{\alpha \in A} d_\alpha z^\alpha$ with as yet undefined coefficients $d_\alpha$. Then we can write any other integrable kernel $K$ satisfying the side-conditions $K \in M(\mu, U)$ as $K(z) = K^*(z) + \Delta K(z)$, with $\Delta K$ satisfying

$$\int_U \Delta K(z)\, z^\alpha\, dz = 0 \qquad \text{for all } \alpha \in A.$$

We conclude that

$$T(K) = T(K^* + \Delta K) = T(K^*) + 2\int_U \frac{K^*(z)\,\Delta K(z)}{G(z)}\, dz + \int_U \frac{\Delta K^2(z)}{G(z)}\, dz \geq T(K^*) + 2\sum_{\alpha \in A} d_\alpha \int_U z^\alpha\, \Delta K(z)\, dz = T(K^*).$$

Thus the functional attains its minimum at $K^*$. Obviously the minimum is strict unless $\Delta K(z) \equiv 0$. Observe now that we obtain by equation (3.1), for any $\gamma \in A$,

$$\mu_\gamma = \int_U K^*(z)\, z^\gamma\, dz = \sum_{\alpha \in A} d_\alpha \int_U z^{\alpha+\gamma}\, G(z)\, dz,$$

leading to $C d = \mu$. This yields the desired result for theorem 2. Theorem 1 follows from similar arguments.

A.3. Proof of theorem 3
Since the functional in problem 5 is strictly convex, there is a unique solution $Q^*_z(u) = \sum_{\alpha \in A} q^*_\alpha(z)\, u^\alpha$. Differentiating with respect to the coefficients and introducing the vectors $y = (z^\alpha)$, $\alpha \in A$, and $q = (q_\alpha)$, $\alpha \in A$, we arrive at the following linear equation for $q^*$:

$$-2\, G(z)\, y + 2\, C q = 0,$$

where the matrix $C$ was defined in theorem 1. Hence we find that

$$q^* = G(z)\, C^{-1} y,$$

and thus

$$Q^*_z(u) = \sum_{\alpha \in A} q^*_\alpha(z)\, u^\alpha = G(z) \sum_{\alpha \in A} \sum_{\gamma \in A} \tilde c_{\alpha,\gamma}\, z^\gamma u^\alpha,$$

where $C^{-1} = (\tilde c_{\alpha,\gamma})$. Finally, by equation (3.2),

$$Q^*_z(0) = G(z) \sum_{\gamma \in A} \tilde c_{0,\gamma}\, z^\gamma = K^*(z).$$

A.4. Proof of corollary 1
With $\mu_\alpha = \delta_{\alpha,\gamma}$ we find that

$$Q^{*(\gamma)}_z(u)\Big|_{u=0} = G(z) \sum_{\alpha,\beta \in A} \tilde c_{\alpha,\beta}\, z^\beta\, \big(u^\alpha\big)^{(\gamma)}\Big|_{u=0} = G(z) \sum_{\beta \in A} \tilde c_{\gamma,\beta}\, z^\beta,$$

and this yields the result for this special $\mu$ by equation (3.3). Now, writing $\mu = \sum_{\gamma \in A} \mu_\gamma e_\gamma$, where $e_\gamma = (\delta_{\alpha,\gamma})_{\alpha \in A}$, we obtain the general case.

A.5. Proof of theorem 4
For $a^*_{\gamma,m}$ as in equation (4.1) and $a^*_\gamma = G(z)\, z^\gamma$,

$$a^*_{\gamma,m}(z) - a^*_\gamma(z) = \int_U m^d\, h\{m(u - z)\}\, G(u)\, u^\gamma\, du - \int_{\mathbb{R}^d} h(w)\, dw\; G(z)\, z^\gamma$$
$$= \int_{\{w:\, z + w/m \in U\}} h(w) \big[G(z + w/m)\, (z + w/m)^\gamma - G(z)\, z^\gamma\big]\, dw - \int_{\{w:\, z + w/m \in U^c\}} h(w)\, dw\; G(z)\, z^\gamma =: I_m + II_m, \text{ say}.$$

With a $\theta = \theta(w, m) \in (0, 1)$, using Taylor's expansion and dominated convergence,

$$I_m = \int_{\{w:\, z + w/m \in U\}} h(w)\, \big(G(y)\, y^\gamma\big)'\Big|_{y = z + \theta w/m} \left(\frac{w}{m}\right) dw = \frac{1}{m} \int_{\mathbb{R}^d} h(w)\, \big(G(z)\, z^\gamma\big)'\, w\, dw\, \{1 + o(1)\},$$

denoting by $h'(u)$ the differential $(\partial h/\partial x_1, \ldots, \partial h/\partial x_d)(u)$. Since for $z \in U^\circ$ we have $\{w:\, z + w/m \in U\} \to \mathbb{R}^d$ as $m \to \infty$, there is a $\delta = \delta_z > 0$ such that $\{w:\, z + w/m \in U^c\} \subset \{w:\, \|w/m\| > \delta\}$. This yields

$$|II_m| \leq C \int_{\{w:\, \|w\| > m\delta\}} |h(w)|\, \frac{\|w\|}{m\delta}\, dw = o\!\left(\frac{1}{m}\right)$$

and

$$a^*_{\gamma,m}(z) - a^*_\gamma(z) = \frac{1}{m}\, \big(G(z)\, z^\gamma\big)' \int_{\mathbb{R}^d} h(w)\, w\, dw + o\!\left(\frac{1}{m}\right).$$

But this implies, using corollary 1, that

$$|K^*_m(z) - K^*(z)| = \left|\sum_{\alpha \in A} \mu_\alpha\, Q^{*(\alpha)}_{z,m}(u)\Big|_{u=0} - K^*(z)\right| = \left|\sum_{\alpha \in A} \mu_\alpha\, q^*_{\alpha,m}(z) - \sum_{\alpha \in A} \mu_\alpha \sum_{\gamma \in A} \tilde c_{\alpha,\gamma}\, G(z)\, z^\gamma\right|$$
$$= \left|\sum_{\alpha \in A} \mu_\alpha \sum_{\gamma \in A} \tilde c_{\alpha,\gamma}\, \{a^*_{\gamma,m}(z) - a^*_\gamma(z)\}\right| = O\!\left(\frac{1}{m}\right).$$


A.6. Problem 6 for points located on the boundary ∂S of S
The approximation procedure needs to be modified at the boundary $\partial S$ of $S$. One possibility is to replace in problem 6 the quantity $m^d\, h\{m(u - z)\}$ by

$$h\{m(u - z)\} \Big/ \int_S h\{m(v - z)\}\, dv, \tag{A.1}$$

for points $z \in \partial S$, provided that the denominator does not vanish. A calculation similar to that given in Appendix A.5 leads to the following result.

Corollary 2. For the modified problem 6, using the approximating functions (A.1), we obtain the following boundary kernel for any $z \in S$:

$$a^*_{\gamma,m}(z) - G(z)\, z^\gamma = \frac{\big(G(z)\, z^\gamma\big)'}{m}\; \frac{\displaystyle\int_{S_m} h(w)\, w\, dw}{\displaystyle\int_{S_m} h(w)\, dw} + o\!\left(\frac{1}{m}\right).$$

Here $S_m = \{w:\, z + w/m \in S\}$, and we have that $S_m \to \mathbb{R}^d$ if $z \in S^\circ$, whereas $S_m \to H$ with a half-plane $H$ when $z \in \partial S$, where $\partial S$ is assumed to be smooth at $z$. The half-plane $H$ is bounded by the tangent space of $S$ at $z$ and satisfies $H \cap S \cap B(z, \epsilon) \neq \emptyset$ for all balls with positive radius $\epsilon$, centred at $z$.

Under further assumptions we can improve the bound on the approximation error.

Corollary 3. Assume that $G$ is twice continuously differentiable,

$$\int_U h(w)\, w\, dw = 0 \qquad \text{and} \qquad \int_U |h(w)|\, \|w\|^2\, dw < \infty.$$

Then we have, for any $\gamma \in A$ and $z \in S^\circ$, that

$$a^*_{\gamma,m}(z) - G(z)\, z^\gamma = \frac{1}{2m^2} \int_{\mathbb{R}^d} h(w)\, w^{\mathrm T}\, \big(G(z)\, z^\gamma\big)^{\partial}\, w\, dw + o\!\left(\frac{1}{m^2}\right).$$

The proof is similar to that of theorem 4 in Appendix A.5, using one more step in the Taylor expansion of $G(z)\, z^\gamma$. Note that $h^{\partial}(z)$ denotes the Jacobian of the gradient $h'(z)$, i.e. the matrix of second-order partial derivatives.

References

Brown, B. M. and Chen, S. X. (1998) Beta-Bernstein smoothing for regression curves with compact support. Scand. J. Statist., to be published.
Cacoullos, T. (1966) Estimation of a multivariate density. Ann. Inst. Statist. Math., 18, 179-189.
Cheng, M.-Y., Fan, J. and Marron, J. S. (1997) On automatic boundary corrections. Ann. Statist., 25, 1691-1708.
Cowling, A. and Hall, P. (1996) On pseudodata methods for removing boundary effects in kernel density estimation. J. R. Statist. Soc. B, 58, 551-563.
Epanechnikov, V. A. (1969) Nonparametric estimation of a multivariate probability density. Theory Probab. Applic., 14, 153-158.
Gasser, T. and Müller, H. G. (1979) Kernel estimation of regression functions. Lect. Notes Math., 757, 23-68.
Gasser, T., Müller, H.-G. and Mammitzsch, V. (1985) Kernels for nonparametric curve estimation. J. R. Statist. Soc. B, 47, 238-252.
Granovsky, B. and Müller, H. G. (1991) Optimizing kernel methods: a unifying variational principle. Int. Statist. Rev., 59, 373-388.
Hall, P. and Wehrly, T. E. (1991) A geometrical method for removing edge effects from kernel-type nonparametric regression estimators. J. Am. Statist. Ass., 86, 665-672.
Haruvy, E. (1998) Testing modes in the population distribution of beliefs from experimental games. Technical Report. Department of Economics, University of Texas, Austin.
Jones, M. C. (1993) Simple boundary correction for kernel density estimation. Statist. Comput., 3, 135-146.
Jones, M. C. and Foster, P. J. (1996) A simple nonnegative boundary correction method for kernel density estimation. Statist. Sin., 6, 1005-1013.
Marron, J. S. and Ruppert, D. (1994) Transformations to reduce boundary bias in kernel density estimation. J. R. Statist. Soc. B, 56, 653-671.
Müller, H. G. (1987) Weighted local regression and kernel methods for nonparametric curve fitting. J. Am. Statist. Ass., 82, 231-238.
Müller, H. G. (1988) Nonparametric Regression Analysis of Longitudinal Data. New York: Springer.
Müller, H. G. (1991) Smooth optimum kernel estimators near endpoints. Biometrika, 78, 521-530.
Müller, H. G. (1993) On the boundary kernel method for nonparametric curve estimation near endpoints. Scand. J. Statist., 20, 313-328.
Müller, H. G. and Wang, J. L. (1994) Hazard rate estimation under random censoring with varying kernels and bandwidths. Biometrics, 50, 61-76.
Rice, J. (1984) Boundary modification for kernel regression. Communs Statist. Theory Meth., 13, 893-900.
Ruppert, D. and Wand, M. P. (1994) Multivariate locally weighted least squares regression. Ann. Statist., 22, 1346-1370.
Scott, D. W. (1992) Multivariate Density Estimation. New York: Wiley.
Staniswalis, J. G. and Messer, K. (1996) Addendum to `Kernel estimators for multiple regression'. J. Nonparam. Statist., 7, 67-68.
Staniswalis, J. G., Messer, K. and Finston, D. R. (1993) Kernel estimators for multiple regression. J. Nonparam. Statist., 3, 103-121.
Wand, M. P. and Jones, M. C. (1995) Kernel Smoothing. London: Chapman and Hall.
