Testing goodness-of-fit for nonlinear regression models with heterogeneous variances


Computational Statistics & Data Analysis 23 (1997) 491-507


Nathalie Caouder^a, Sylvie Huet^b,*

^a Université de Valenciennes, ISTV, Le Mont Houy, 59304 Valenciennes Cedex, France
^b Laboratoire de Biométrie, INRA, 78352 Jouy-en-Josas, France

Received January 1995; revised February 1996

Abstract

This paper describes a method for testing a parametric model for the regression function and for the variance function in a nonlinear regression model with heterogeneous variances. The procedure is based on the robustness properties of various estimators of the parameters, the test statistics being simply equal to the normalized differences between two different estimators. The estimation of the normalizing factor using first-order approximations of the distribution of the estimators is strongly dependent on the chosen model, and thus leads to low values of the power of the test. We propose to use the wild bootstrap method to estimate this normalizing factor. This method is robust to a misspecification of the variance function. Moreover, the simulation study shows that the wild bootstrap method has nearly no effect on the probability that the test rejects the null hypothesis.

Keywords: Nonlinear Regression; Goodness-of-fit testing; Resampling

1. Introduction

Let us consider a nonlinear regression model with independent observations and heterogeneous variances:

$Y_i = \mu(x_i) + \varepsilon_i, \qquad \mathrm{var}(\varepsilon_i) = \sigma^2(x_i), \qquad i = 1, \dots, n.$   (1)

The regression function $\mu$ and the variance function $\sigma^2$ are unknown, and they are approximated by parametric functions.

* Corresponding author.



In many applications of econometrics or biometrics, data are analyzed with the purpose of finding good estimates of the model parameters (consistency, efficiency) and of making further inferences (confidence intervals with known asymptotic level, tests, ...). Thus, it is of great interest to assess the adequacy of the assumptions defining the statistical model, especially the regression and variance functions. In practice, a variety of parametric functions can be appropriate to approximate the true functions $\mu$ and $\sigma^2$. Our aim, however, is not to find the best model for analyzing the phenomenon under consideration, but to test the adequacy of a parametric model. Model selection has already been studied by several authors. Bunke et al. (1994) proposed procedures based on the estimation of the actual squared prediction error; their selection criterion depends on the aim of the analysis (prediction, calibration, estimation of a function of the independent variables, ...). If the variance of the errors is constant, Haughton (1993) studied the consistency of criteria for model selection.

In this paper our purpose is to find a procedure for testing the parametric regression and variance functions against an alternative which is not defined by parametric functions. A natural idea is to compare the parametric regression function with a non-parametric estimate of $\mu$; see Härdle and Mammen (1993), Müller (1992) or Firth et al. (1991) for example. The conditional moment tests, introduced by Newey (1985) and Tauchen (1985), have been adapted to test a large set of alternatives: see Bierens (1990), Hansen (1992) and Horowitz and Härdle (1994). Some of these tests take into account the heteroscedasticity of the errors, but they never test the parametric variance function. The problem of heteroscedasticity is ruled out by estimating the variance function using, for example, a non-parametric regression of the squared residuals on the fitted values of the response. Such a method is not always well adapted to the data set, especially when the number of observations is small. Moreover, in some applications we are interested in modeling the variance, for example if we want to carry out a likelihood ratio test. Thus, we propose a method for testing both the parametric regression and variance functions chosen so as to approximate the unknown functions $\mu$ and $\sigma^2$.

In practice, diagnostics based on the analysis of residuals or studentized residuals are used to detect misspecifications in the model. This approach simply imitates the usual linear regression analysis. The asymptotic study of the behavior of the residuals in nonlinear regression when the model is misspecified (1993) does not give useful information on their small-sample behavior. Cook and Tsai (1985), assuming that the variance of the errors is constant, pointed out that the residuals can produce misleading results, and suggested calculating projected residuals instead. Nevertheless, in most cases a careful examination of residual plots gives good information on the adequacy of the model. But this approach requires a lot of expertise and lacks objectivity.

The procedure we propose is based on the robustness properties of parameter estimators in a nonlinear regression model with heterogeneous variances. For example, the maximum likelihood estimator is generally not consistent when the variance function is not well specified, while the least-squares estimator is consistent (see Section 2.1.3). This lack of robustness of the maximum likelihood estimator suggests that the test based on the difference between the maximum likelihood and least-squares estimators is powerful for detecting a discrepancy between the assumed variance function and the unknown function $\sigma^2$.


This idea was already explored by White (1981), who derived a test based on the difference between the least-squares estimator of the parameters occurring in the regression function and the weighted least-squares estimator.

This paper is organized as follows. In Section 2.1 we derive the asymptotic properties of a class of estimators defined by a set of estimating equations, under the null hypothesis that the parametric model coincides with the true model, and under a local alternative. We consider three of them: the least-squares estimator, the modified least-squares estimator and the maximum likelihood estimator (assuming Gaussian errors). In Section 2.2 we define three statistics as the differences between two of these estimators. We propose to estimate the asymptotic variance of these statistics using a resampling method which is robust to a misspecification of the error variances. Starting from the calculation of the bias of the statistics under a local alternative, we present a procedure to decide whether the regression function is misspecified, whether the variance function is misspecified, or possibly whether both functions are misspecified. Finally, in Section 3, we present the results of two simulation experiments.

2. Goodness-of-fit test

2.1. Asymptotic properties of the estimators

2.1.1. Notations

The regression function $\mu$ and the variance function $\sigma^2$ defined by Eq. (1) are unknown, so they are approximated by parametric functions. Let us denote by $H_0$ the parametric model:

$H_0$: $E(Y_i) = f(x_i, \theta_0)$, $\mathrm{var}(Y_i) = v(x_i, \theta_0, \beta_0)$.

The variations of $E(Y)$ and $\mathrm{var}(Y)$ with the independent variables $x$ are modeled by the known functions $f$ and $v$. These functions depend on unknown parameters $\theta_0$ and $\beta_0$. $x$ is a vector of $R^m$ taking the values $x_1, \dots, x_n$. $\theta$ denotes the vector of parameters, with $p$ components, occurring in the regression function and possibly in the variance function; $\theta$ belongs to a subset of $R^p$, say $\Theta$. $\beta$ denotes the vector of parameters, with $q$ components, occurring in the variance function; $\beta$ belongs to a subset of $R^q$, say $B$. For example, in several situations we assume that the variance of the response is proportional to a function of its expectation, $v(x, \theta, \beta) = \beta_1^2 V(f(x, \theta), \beta_2)$, where $V$ is a strictly positive function defined on $R \times R^{q-1}$; see Section 3 for precise examples.

Let $\psi_0^T = (\theta_0^T, \beta_0^T)$ be the vector of $p + q$ unknown parameters to be estimated under $H_0$.

From the data set and from the estimation of the parameters, we propose a procedure that allows detection of a bad choice of the model $H_0$.


Let $\hat\psi$ be an estimator of $\psi_0$. Several methods, described in Section 2.1.2, are available to calculate an $n^{1/2}$-consistent estimator of $\psi_0$ under $H_0$. Let $\hat\psi^{(1)}$ and $\hat\psi^{(2)}$ be two such estimators. Then the difference $\hat\psi^{(1)} - \hat\psi^{(2)}$ converges in probability to 0 under $H_0$. Our procedure is simply based on the idea that if the model $H_0$ is wrong, then the difference $\hat\psi^{(1)} - \hat\psi^{(2)}$ is biased.

We will study the ability of the tests to detect misspecification by calculating the local behavior of $\hat\psi^{(1)} - \hat\psi^{(2)}$ for small deviations from the true model. Let us define a local alternative $H_n$ to the parametric model $H_0$:

$H_n$: $E(Y_i) = \mu_n(x_i) = f(x_i, \theta_0) + n^{-1/2} g(x_i)$, $\mathrm{var}(Y_i) = \sigma_n^2(x_i) = v(x_i, \theta_0, \beta_0) + n^{-1/2} w(x_i)$.

The functions $g$ and $w$ may depend on $n$ and $\theta$. For example, if the true regression function belongs to a parametric family $h(x, \theta, \gamma)$ such that $h(x, \theta, \gamma_0) = f(x, \theta)$ for all $\theta$ and $x$, then our local alternative $H_n$ will be defined by $\mu_n(x) = h(x, \theta_0, \gamma_0 + n^{-1/2}\delta) = f(x, \theta_0) + n^{-1/2} \delta\, \dot h_\gamma(x, \theta_0, \gamma_0)$, where $\dot h_\gamma$ denotes the derivative of $h$ with respect to $\gamma$.

2.1.2. Properties of estimators defined by a set of estimating equations

Let $S_n(\psi)$ be the system of estimating equations

1 " = n B(x , , x, , qJ),

where $B(x_i, \psi)$ is a matrix of dimension $(p + q) \times 2$ and $\eta(Y_i, x_i, \psi)$ is defined as follows:

$\eta(Y_i, x_i, \psi) = \begin{pmatrix} Y_i - f(x_i, \theta) \\ Y_i^2 - f^2(x_i, \theta) - v(x_i, \psi) \end{pmatrix}.$

The choice of the estimator $\hat\psi$, defined by $S_n(\hat\psi) = 0$, is equivalent to the choice of the matrix $B(x, \psi)$. Let us define $B(\psi)$, the $2n \times (p + q)$ matrix such that $B_{i,s}(\psi) = B_{s,1}(x_i, \psi)$ and $B_{n+i,s}(\psi) = B_{s,2}(x_i, \psi)$ for $s = 1, \dots, p + q$, $i = 1, \dots, n$. Later on, we write

$S_n(\psi) = \frac{1}{n} B^T(\psi)\, \zeta(\psi),$

where $\zeta(\psi)$ is a vector with $2n$ components such that $\zeta_i(\psi) = Y_i - f(x_i, \theta)$ and $\zeta_{n+i}(\psi) = Y_i^2 - f^2(x_i, \theta) - v(x_i, \psi)$, for $i = 1, \dots, n$.

Let $\dot f_\theta$ be the $n \times p$ matrix of derivatives of $f$ with respect to $\theta$, and let $\dot v_\theta$ and $\dot v_\beta$ be defined in the same way. The most usual estimators of $\psi$ are defined as follows:

• If the $Y_i$ are distributed as Gaussian variables, under some regularity conditions detailed in Theorem 1, and if $\psi$ lies in the interior of $\Theta \times B$, the maximum likelihood estimator of $\psi$ is defined by $S_n(\psi) = 0$ with

$B = \begin{pmatrix} \mathrm{diag}(v^{-1})\dot f_\theta - \mathrm{diag}(f v^{-2})\dot v_\theta & -\mathrm{diag}(f v^{-2})\dot v_\beta \\ \tfrac{1}{2}\mathrm{diag}(v^{-2})\dot v_\theta & \tfrac{1}{2}\mathrm{diag}(v^{-2})\dot v_\beta \end{pmatrix},$


where $\mathrm{diag}(v^{-1})$ denotes the diagonal matrix with components $v^{-1}(x_i, \psi)$, $\mathrm{diag}(f v^{-2})$ denotes the diagonal matrix with components $f(x_i, \theta)\, v^{-2}(x_i, \psi)$ and $\tfrac{1}{2}\mathrm{diag}(v^{-2})$ denotes the diagonal matrix with components $\tfrac{1}{2} v^{-2}(x_i, \psi)$. $S_n(\psi)$ is, up to a constant term, the derivative of the log-likelihood with respect to the parameters.

• The modified least-squares estimator of $\psi$ is defined by $S_n(\psi) = 0$ with

$B = \begin{pmatrix} \mathrm{diag}(v^{-1})\dot f_\theta & -\mathrm{diag}(f v^{-2})\dot v_\beta \\ 0_{n\times p} & \tfrac{1}{2}\mathrm{diag}(v^{-2})\dot v_\beta \end{pmatrix},$   (2)

where $0_{n \times p}$ is a matrix of zeros. This estimator is directly defined by the estimating equations; there is no contrast associated with them. If we assume that the variance function is proportional to a function of $\theta$ alone, $\mathrm{var}(Y_i) = \beta_1^2 V(x_i, \theta_0)$, then we get the modified least-squares estimator (see Huet, 1986), or quasi-likelihood estimator, of $\theta_0$, solution of the following set of equations:

$\sum_{i=1}^{n} \frac{Y_i - f(x_i, \theta)}{V(x_i, \theta)}\, \frac{\partial f}{\partial \theta}(x_i, \theta) = 0.$

McCullagh (1983) and Morton (1981) studied such estimators. In the nonlinear regression context, the modified least-squares estimator of $\theta_0$ has the same asymptotic distribution as the generalized least-squares estimator of Carroll and Ruppert (1988); see Huet et al. (1992).

When the variance function depends on parameters $\beta_0$, we can generalize the above idea and we get the estimating equations with the matrix $B$ defined in Eq. (2).

• The ordinary least-squares estimator of $\theta$ is defined as the value of $\theta$ which minimizes the residual sum of squares $\sum_{i=1}^{n} (Y_i - f(x_i, \theta))^2$.
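To make these estimating methods concrete, here is a minimal numerical sketch (not the authors' code) for the simpler case $\mathrm{var}(Y_i) = \beta_1^2 V(x_i, \theta_0)$ treated in Section 2.1.3. The regression function, the variance shape and the optimization routines are illustrative assumptions: the modified least-squares equations are solved by iteratively reweighted least squares and the Gaussian likelihood is profiled over $\beta_1^2$.

```python
import numpy as np
from scipy.optimize import least_squares, minimize

def f(x, theta):
    # illustrative regression function: exponential decay
    return theta[0] * np.exp(-theta[1] * x)

def V(x, theta):
    # illustrative variance shape: constant coefficient of variation, var(Y) = beta1^2 * f^2
    return f(x, theta) ** 2

def ols(x, y, theta_init):
    # ordinary least squares: minimizes sum_i (y_i - f(x_i, theta))^2
    return least_squares(lambda th: y - f(x, th), theta_init).x

def mls(x, y, theta_init, n_iter=20):
    # modified least squares / quasi-likelihood: solves
    # sum_i (y_i - f_i) / V_i * df/dtheta = 0 by iteratively reweighted least squares,
    # the weights being frozen at the current value of theta at each pass
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(n_iter):
        w = 1.0 / np.sqrt(V(x, theta))
        theta = least_squares(lambda th: w * (y - f(x, th)), theta).x
    return theta

def ml(x, y, theta_init):
    # Gaussian maximum likelihood with beta1^2 profiled out:
    # beta1^2 = mean of (y - f)^2 / V, then minimize sum log(beta1^2 * V) over theta
    def neg_profile_loglik(th):
        v = V(x, th)
        beta1_sq = np.mean((y - f(x, th)) ** 2 / v)
        return np.sum(np.log(beta1_sq * v))
    return minimize(neg_profile_loglik, theta_init, method="Nelder-Mead").x

# toy data generated under the assumed model, just to exercise the three estimators
rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 60)
theta0, beta1 = np.array([10.0, 0.7]), 0.05
y = f(x, theta0) + beta1 * np.sqrt(V(x, theta0)) * rng.standard_normal(x.size)
print(ols(x, y, [5.0, 1.0]), mls(x, y, [5.0, 1.0]), ml(x, y, [5.0, 1.0]))
```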

Let $\zeta = \zeta(\psi_0)$, $\Delta(\psi) = E_{H_n}(\zeta(\psi))$ where $E_{H_n}$ stands for expectation under $H_n$, $\Delta = \Delta(\psi_0)$ and $\eta = \zeta - \Delta$. If the functions $g$ and $w$ are equal to zero, then $\Delta = 0$. Finally, let $\dot\zeta(\psi)$ be the $2n \times (p + q)$ matrix of derivatives of $\zeta$ with respect to $\psi$; $\dot\zeta$ stands for $\dot\zeta(\psi_0)$.

The expansion of $\hat\psi - \psi_0$ under $H_n$ is given by the following theorem.

Theorem 1. Let us assume that the following hypotheses are fulfilled:
1. $\Theta$ and $B$ are compact sets, and $\psi_0^T = (\theta_0^T, \beta_0^T)$ lies in the interior of $\Theta \times B$.
2. The function $\theta \mapsto f(x, \theta)$ (respectively $\psi \mapsto v(x, \psi)$) is twice continuously differentiable with respect to $\theta$ on $\Theta$ (respectively to $\psi$ on $\Theta \times B$).
3. $\hat\psi$ is defined by $S_n(\hat\psi) = 0$.
4. $(1/n)\, B^T \dot\zeta$ is positive definite and its limit exists when $n$ tends to infinity.
5. For all $\psi_1$ in $\Theta \times B$,

$S(\psi, \psi_1) = \lim_{n \to \infty} \frac{1}{n} B^T(\psi)\,(\zeta(\psi_1) - \Delta)$

exists and admits only one zero, at $\psi_1 = \psi$.


6. The second-order moments of $\eta(Y_i, x_i, \psi)$ exist.
7. Let $Z_i = B(x_i, \psi)\, \eta(Y_i, x_i, \psi)$ and let $F_{Z_i}$ be the distribution function of $Z_i$. For all $\varepsilon > 0$,

$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \int_{\|z\| > \varepsilon\sqrt{n}} \|z\|^2 \, dF_{Z_i}(z) = 0,$

where $\|z\|$ is the Euclidean norm of the vector $z$ in $R^{p+q}$.
8. Let $\mathcal{X}$ be the set of variation of $x$. For all $x$ in $\mathcal{X}$, the functions $f$, $v$, $w$, $g$, and the derivatives of $f$ and $v$ with respect to the parameters, are bounded on $\mathcal{X}$ uniformly on $\Theta \times B$.

Then the expansion of $\hat\psi - \psi_0$ is given by

$\hat\psi - \psi_0 = A_1 + b + o_p\!\left(\frac{1}{\sqrt{n}}\right),$

where $n^{1/2} A_1 = -n^{1/2} (B^T \dot\zeta)^{-1} B^T \eta$ converges in distribution under $H_n$ to a centered Gaussian variable. The bias term $b$ can be expanded according to powers of $n^{-1/2}$: let $b_1 = -(B^T \dot\zeta)^{-1} B^T \Delta$; then $b = b_1 + o_p(n^{-1/2})$ and $b_1 = O(n^{-1/2})$.

Proof. The consistency of $\hat\psi$ follows from the fact that $\hat\psi$ is a minimum-of-contrast estimator for the contrast function $|S(\psi_0, \psi)|$.

From a Taylor expansion we get

$\hat\psi - \psi_0 = -\left(I_{p+q} + (B^T \dot\zeta)^{-1} \dot B^T \Delta + o_p(1)\right)^{-1} \left(\tfrac{1}{n} B^T \dot\zeta\right)^{-1} \left(\tfrac{1}{n} B^T \eta + \tfrac{1}{n} B^T \Delta\right),$

where $I_{p+q}$ is the identity matrix of dimension $p + q$ and $\dot B$ denotes the derivative of $B$ with respect to $\psi$. Thus, $\hat\psi - \psi_0$ is the sum of the random term $A_1$, the bias term $b$ and a remainder term which is $o_p(n^{-1/2})$. The term $b$ can be written as

$b = b_1 + \left\{\left(I_{p+q} + (B^T \dot\zeta)^{-1} \dot B^T \Delta + o_p(1)\right)^{-1} - I_{p+q}\right\} b_1.$   (3)

It follows that the bias is proportional to $b_1$, which equals 0 when $\Delta$ equals 0.

Let us comment on the assumptions of Theorem 1. The compactness of the sets $\Theta$ and $B$ and assumption 5 are used to show the consistency of minimum-of-contrast estimators. The compactness condition on $\Theta$ and $B$ may be weakened: Läuter (1989) generalized Jennrich's (1969) result to an unbounded parameter space.

If $\hat\psi$ is the maximum likelihood estimator of $\psi$, assumption 4 means that the information matrix exists and is positive definite. If $\hat\theta$ is the least-squares estimator of $\theta$, the matrix $B^T \dot\zeta$ is simply the matrix $\dot f_\theta^T \dot f_\theta$.

Assumptions 6-8 are used when we apply the law of large numbers or the central limit theorem. They could be weakened, but for the sake of simplicity we prefer not to go into further detail.


The first term in the bias of $\hat\psi - \psi_0$, $b_1$, depends both on the discrepancy between $H_0$ and $H_n$, through the quantity $\Delta$, and on the choice of the estimating method, through the matrix $B$. The next section studies the robustness of the different estimators to a bad choice of the model $H_0$ by comparing the corresponding terms $b_1$.

2.1.3. Robustness

Firstly, let us consider the case where $\mathrm{var}(Y_i) = \beta_1^2 V(x_i, \theta_0)$. Let $\hat\theta^{ML}$ (respectively $\hat\theta^{MLS}$ and $\hat\theta^{OLS}$) be the maximum likelihood (respectively modified least-squares and ordinary least-squares) estimator of $\theta_0$ defined in the preceding section. Let $\hat\beta_1^2$ be the residual variance

$\hat\beta_1^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - f(x_i, \hat\theta))^2.$

Corollary 1. Assume that the hypotheses of Theorem 1 are fulfilled for $\hat\theta^{ML}$, $\hat\theta^{MLS}$ and $\hat\theta^{OLS}$. Let $b_{1,\hat\theta^{OLS}}$ be the first term of the bias in the expansion of $\sqrt{n}(\hat\theta^{OLS} - \theta_0)$, and let $b_{1,\hat\theta^{MLS}}$ and $b_{1,\hat\theta^{ML}}$ be defined in the same way. Then we have the following results:

$(\dot f_\theta^T \dot f_\theta)\, b_{1,\hat\theta^{OLS}} = \dot f_\theta^T (\mu_n - f)$   (4)

$(\dot f_\theta^T \mathrm{diag}(v^{-1}) \dot f_\theta)\, b_{1,\hat\theta^{MLS}} = \dot f_\theta^T \mathrm{diag}(v^{-1}) (\mu_n - f)$   (5)

$(\dot f_\theta^T \mathrm{diag}(v^{-1}) \dot f_\theta + \tfrac{1}{2} \dot v_\theta^T \mathrm{diag}(v^{-2}) \dot v_\theta)\, b_{1,\hat\theta^{ML}} = \dot f_\theta^T \mathrm{diag}(v^{-1}) (\mu_n - f) + \tfrac{1}{2} \dot v_\theta^T \mathrm{diag}(v^{-2}) (\sigma_n^2 - v)$   (6)

where $\mu_n - f$ is the vector with components $\mu_n(x_i) - f(x_i, \theta_0)$ and $\sigma_n^2 - v$ is the vector with components $\sigma_n^2(x_i) - \beta_1^2 V(x_i, \theta_0)$.

The robustness of these estimators to a bad choice of the regression or variance function now appears clearly. If $\sigma_n^2(x) = v(x, \psi_0)$ for all $x$ and all $n$, the bias of the three estimators depends on the discrepancy between $f$ and $\mu_n$; the first term of the bias, normalized by the matrix $B^T \dot\zeta$, is identical for the modified least-squares and the maximum likelihood estimators. Suppose now that $\mu_n(x) = f(x, \theta_0)$ for all $x$ and all $n$. Then the ordinary least-squares and the modified least-squares estimators are $n^{1/2}$-consistent estimators of $\theta_0$ (see Eq. (3) in the proof of Theorem 1). On the other hand, the bias of the maximum likelihood estimator of $\theta_0$ is proportional to the discrepancy between $\sigma_n^2(x)$ and $\beta_1^2 V(x, \theta_0)$. Carroll and Ruppert (1982) showed a similar result for the Gaussian linear model. These robustness properties will be used in the following section to build our test procedure.

In the more general case where $\mathrm{var}(Y_i) = v(x_i, \theta_0, \beta_0)$, the comparison between the estimators is similar, except that when $\mu_n(x) = f(x, \theta_0)$, the bias of the modified least-squares estimator of $\beta_0$ depends on the discrepancy between $\sigma_n^2(x)$ and $v(x, \psi_0)$.
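The bias formulae (4)-(6) can be evaluated numerically for a given design and a given deviation ($\mu_n - f$, $\sigma_n^2 - v$). The sketch below is a direct transcription of the three equations; the derivative matrices $\dot f_\theta$ and $\dot v_\theta$ and the deviation vectors are assumptions to be supplied by the user for the design at hand.

```python
# Direct numerical transcription of Eqs. (4)-(6): first-order bias terms of the
# OLS, modified least-squares and ML estimators under the local alternative.
# f_dot and v_dot are the n x p derivative matrices of f and v with respect to
# theta, v is the length-n vector of variances, all evaluated at (theta_0, beta_0).
import numpy as np

def first_order_biases(f_dot, v_dot, v, mu_dev, var_dev):
    """mu_dev = mu_n - f and var_dev = sigma_n^2 - v, both length-n vectors."""
    W1 = f_dot / v[:, None]                        # diag(v^-1) f_dot
    W2 = v_dot / v[:, None] ** 2                   # diag(v^-2) v_dot
    b_ols = np.linalg.solve(f_dot.T @ f_dot, f_dot.T @ mu_dev)        # Eq. (4)
    b_mls = np.linalg.solve(f_dot.T @ W1, W1.T @ mu_dev)              # Eq. (5)
    lhs = f_dot.T @ W1 + 0.5 * v_dot.T @ W2
    rhs = W1.T @ mu_dev + 0.5 * W2.T @ var_dev
    b_ml = np.linalg.solve(lhs, rhs)                                  # Eq. (6)
    return b_ols, b_mls, b_ml
```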


2.2. Goodness-of-fit test

2.2.1. Test statistic

Let $\hat\psi^{(1)}$ and $\hat\psi^{(2)}$ be two estimators of $\psi$ defined by the estimating equations $S_n^{(1)}(\psi) = 0$ and $S_n^{(2)}(\psi) = 0$, where

$S_n^{(1)}(\psi) = \frac{1}{n} B^{(1)T}(\psi)\, \zeta(\psi)$ and $S_n^{(2)}(\psi) = \frac{1}{n} B^{(2)T}(\psi)\, \zeta(\psi)$.

The following corollary gives the asymptotic properties of the difference $\hat\psi^{(1)} - \hat\psi^{(2)}$.

Corollary 2. If the hypotheses of Theorem 1 are fulfilled for $\hat\psi^{(1)}$ and $\hat\psi^{(2)}$, then

$\hat\psi^{(1)} - \hat\psi^{(2)} = \frac{1}{n} F^T \zeta + R_n,$

where

$F^T = (B^{(2)T} \dot\zeta)^{-1} B^{(2)T} - (B^{(1)T} \dot\zeta)^{-1} B^{(1)T}$ and $R_n = o_p(n^{-1/2})$.   (7)

It follows that under $H_0$, $n^{1/2}(\hat\psi^{(1)} - \hat\psi^{(2)})$ converges in distribution to a centered Gaussian variable with covariance matrix

$\Omega = \lim_{n \to \infty} \Omega_{n,H_0}$, where $\Omega_{n,H_0} = \frac{1}{n} F^T E_{H_0}(\zeta \zeta^T) F$.   (8)

If we can find $\hat\Omega_n$ which converges in probability to $\Omega$, then we can easily build a test of $H_0$ against $H_n$.

Theorem 2. Assume that the hypotheses of Theorem 1 are fulfilled for $\hat\psi^{(1)}$ and $\hat\psi^{(2)}$. Let $F$ be defined as in Corollary 2, let $\Omega_{n,H_0}$ be defined by Eq. (8) and let $\Omega_{n,H_n} = (1/n)\, F^T E_{H_n}(\eta\eta^T) F$. Assume that the following hypotheses are fulfilled: $\Omega$ exists and there exists $\hat\Omega_n$ such that $\hat\Omega_n - \Omega_{n,H_0}$ and $\hat\Omega_n - \Omega_{n,H_n}$ tend to 0 when $n$ tends to infinity. Let

$U_n = n\,(\hat\psi^{(1)} - \hat\psi^{(2)})^T\, \hat\Omega_n^{-}\, (\hat\psi^{(1)} - \hat\psi^{(2)}),$

where $\hat\Omega_n^{-}$ is a generalized inverse of $\hat\Omega_n$. Then:
• Under $H_0$, $U_n$ converges in distribution to a $\chi^2$ with $r$ degrees of freedom, where $r$ is the rank of $\Omega$.
• Let $\Omega_{n,H_0}^{-}$ be a generalized inverse of $\Omega_{n,H_0}$, and let

$\lambda_n = \frac{1}{n} \Delta^T F\, \Omega_{n,H_0}^{-}\, F^T \Delta, \qquad \lambda = \lim_{n \to \infty} \lambda_n.$

If $\lambda = 0$, $U_n$ converges in distribution under $H_n$ to a $\chi^2$ with $r$ degrees of freedom. If $\lambda > 0$, $U_n$ converges in distribution under $H_n$ to a non-central $\chi'^2$ with $r$ degrees of freedom and non-centrality parameter $\lambda$. If $\lambda = \infty$, $U_n$ tends to infinity.


Proof. The proof is a direct consequence of Corollary 2.

Assume for simplicity that $\hat\Omega_n$, $\Omega_{n,H_0}$ and $\Omega_{n,H_n}$ are non-singular. Under $H_n$,

$\hat\Omega_n^{-1/2}\, n^{1/2}(\hat\psi^{(1)} - \hat\psi^{(2)}) = \Omega_{n,H_n}^{-1/2}\, n^{-1/2} F^T \zeta + n^{1/2} R_n + n^{1/2} R'_n,$

where $R_n = o_p(n^{-1/2})$ is the remainder term appearing in Eq. (7) and

$n^{1/2} R'_n = (\hat\Omega_n^{-1/2} - \Omega_{n,H_n}^{-1/2})\, n^{-1/2} F^T \zeta$

converges in probability to 0 by hypothesis. This proves the second part of the theorem.

We have thus obtained a test of $H_0$ against $H_n$, $H_0$ being rejected when $U_n > u_{1-\alpha,r}$, where $u_{\alpha,r}$ is the $\alpha$-quantile of a $\chi^2_r$. The level of the test tends to $\alpha$ and its power depends on $\lambda$:

$\lim_{n \to \infty} \Pr_{H_n}(U_n > u_{1-\alpha,r}) = \alpha$ if $\lambda = 0$, $= 1$ if $\lambda = \infty$, $= p_\lambda$ otherwise,

where $p_\lambda = \Pr(\chi^2_r(\lambda) > u_{1-\alpha,r})$. To apply this result we need to examine two problems thoroughly:

• In Section 2.2.3 we build a procedure, based on the robustness properties detailed in Section 2.1.3, allowing detection of a misspecification of the regression function. We also adapt our procedure to the case of a constant variance function under the model $H_0$, because in that case the estimators of $\theta_0$ defined in Section 2.1.2 are identical.

• In the following section we propose two methods for estimating $\Omega$: a plug-in method and a resampling method. These methods are asymptotically equivalent but behave differently when applied to a finite number of observations.
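Whatever estimate of $\Omega$ is used, the decision step itself is short. The sketch below is an illustration under stated assumptions, not the authors' implementation: the generalized inverse is taken as the Moore-Penrose pseudo-inverse and the degrees of freedom as the numerical rank of the supplied matrix.

```python
# Minimal sketch of the test decision: given two estimates psi1, psi2 (length p+q)
# and an estimate omega_hat of the limiting covariance of sqrt(n)*(psi1 - psi2),
# form the quadratic statistic U_n and compare it with a chi-square quantile.
import numpy as np
from scipy.stats import chi2

def goodness_of_fit_test(psi1, psi2, omega_hat, n, alpha=0.05):
    d = np.sqrt(n) * (np.asarray(psi1) - np.asarray(psi2))
    omega_pinv = np.linalg.pinv(omega_hat)          # generalized inverse
    r = np.linalg.matrix_rank(omega_hat)            # degrees of freedom
    u_n = float(d @ omega_pinv @ d)
    p_value = chi2.sf(u_n, df=r)
    return u_n, p_value, p_value < alpha            # True means: reject H0
```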

2.2.2. Estimating the variance of the test statistic

$E_{H_0}(\zeta\zeta^T)$ is a function of $f(x_i, \theta_0)$, $v(x_i, \psi_0)$, $E_{H_0}(Y_i - f(x_i, \theta_0))^3$ and $E_{H_0}(Y_i - f(x_i, \theta_0))^4$. Using the plug-in method, $\Omega$ is estimated by $\hat\Omega_n$, obtained by replacing in $\Omega_{n,H_0}$ the parameters $\psi_0$ by their estimates ($\hat\psi^{(1)}$ or $\hat\psi^{(2)}$) and the 3rd and 4th moments by their empirical counterparts $(Y_i - f(x_i, \hat\theta))^3$ and $(Y_i - f(x_i, \hat\theta))^4$. If we assume that $Y_i$ is distributed as a Gaussian variable, then the 3rd and 4th moments can be replaced by 0 and $3 v^2(x_i, \hat\psi)$ respectively.

$\hat\Omega_n$ converges in probability to $\Omega$ under $H_0$ and under $H_n$. Nevertheless, it appears in the simulation studies (see Section 3) that when the variance function is wrong, the discrepancy between $\hat\Omega_n$ and the variance of $n^{1/2}(\hat\psi^{(1)} - \hat\psi^{(2)})$ can be very large. The consequence is that the power of the test is very small, and may even be estimated as 0.

The idea is to use a resampling method to estimate the variance of the statistic so as to reduce the effect due to a bad choice of the variance function v.

Several resampling methods are available for generating the pseudo-errors $\varepsilon_i^*$. The wild bootstrap proposed by Härdle and Mammen (1991), Härdle and Marron (1991), Wu (1986) and Liu (1988) seems appropriate because the distribution of $\varepsilon_i^*$ depends only on $\hat\varepsilon_i = Y_i - f(x_i, \hat\theta)$. For instance, the distribution of $\varepsilon_i^*$ could be $\gamma \delta_{a\hat\varepsilon_i} + (1 - \gamma) \delta_{b\hat\varepsilon_i}$ with $\gamma = \tfrac{1}{10}(5 + \sqrt{5})$, $a = \tfrac{1}{2}(1 - \sqrt{5})$, $b = \tfrac{1}{2}(1 + \sqrt{5})$, $\delta_x$ being the Dirac measure at $x$.
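As an illustration, this two-point distribution is straightforward to sample from; the following sketch (an implementation assumption, not part of the paper) draws one set of pseudo-errors.

```python
# Sketch of the wild-bootstrap pseudo-errors: each eps*_i equals a*eps_hat_i with
# probability gamma and b*eps_hat_i with probability 1-gamma; with this choice of
# gamma, a and b, eps*_i has mean 0, variance eps_hat_i^2 and third moment eps_hat_i^3.
import numpy as np

def wild_bootstrap_errors(eps_hat, rng):
    gamma = (5.0 + np.sqrt(5.0)) / 10.0
    a = 0.5 * (1.0 - np.sqrt(5.0))
    b = 0.5 * (1.0 + np.sqrt(5.0))
    coef = np.where(rng.random(eps_hat.size) < gamma, a, b)
    return coef * eps_hat
```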

Let $Y_i^* = f(x_i, \hat\theta) + \varepsilon_i^*$, $i = 1, \dots, n$, where $\hat\theta$ is an $n^{1/2}$-consistent estimator of $\theta_0$ under $H_0$. Let $\hat\psi^{*(1)}$ and $\hat\psi^{*(2)}$ be the bootstrap counterparts of $\hat\psi^{(1)}$ and $\hat\psi^{(2)}$, and let $D^* = n^{1/2}(\hat\psi^{*(1)} - \hat\psi^{*(2)})$. Let $D^{*b}$, $b = 1, \dots, B$, be a $B$-sample of $D^*$ and let

$V^* = \frac{1}{B} \sum_{b=1}^{B} (D^{*b} - \bar D^*)(D^{*b} - \bar D^*)^T,$

where $\bar D^*$ is the empirical mean of the $D^{*b}$. Then $V^*$ estimates $\Omega$. To illustrate the behavior of $\hat\Omega_n$ and $V^*$, let us consider the statistic $n^{1/2}(\hat\theta^{OLS} - \hat\theta^{MLS})$, because its variance depends only on the variance of $Y_i$:

$\Omega_{n,H_0} = \frac{1}{n} F^T \mathrm{diag}(v) F$ and $\Omega_{n,H_n} = \frac{1}{n} F^T \mathrm{diag}(\sigma_n^2) F,$

with $F^T = (\dot f_\theta^T \dot f_\theta)^{-1} \dot f_\theta^T - (\dot f_\theta^T \mathrm{diag}(v^{-1}) \dot f_\theta)^{-1} \dot f_\theta^T \mathrm{diag}(v^{-1})$. The difference between the asymptotic power of the test and its power for a finite number of observations depends, among other things, on the discrepancy between the estimate of $\Omega$ ($\hat\Omega_n$ or $V^*$) and $\Omega_{n,H_n}$, the variance under $H_n$ of the first term of the expansion of $\sqrt{n}(\hat\theta^{OLS} - \hat\theta^{MLS})$ (see the remainder term $R'_n$ in the proof of Theorem 2).

We can write

$\hat\Omega_n - \Omega_{n,H_n} = \frac{1}{n} F^T \mathrm{diag}(v - \sigma_n^2) F + \frac{1}{n} \hat F^T \mathrm{diag}(\hat v) \hat F - \frac{1}{n} F^T \mathrm{diag}(v) F,$

while

$V^* - \Omega_{n,H_n} = \frac{1}{n} F^T \mathrm{diag}(\hat\varepsilon^2 - \sigma_n^2) F + \frac{1}{n} \hat F^T \mathrm{diag}(\hat\varepsilon^2) \hat F - \frac{1}{n} F^T \mathrm{diag}(\hat\varepsilon^2) F + V^* - \frac{1}{n} \hat F^T \mathrm{diag}(\hat\varepsilon^2) \hat F,$

where $\hat F$ and $\hat v$ denote $F$ and $v$ evaluated at the estimated parameters and $\hat\varepsilon^2$ is the vector of squared residuals.

It appears that the difference between $V^*$ and $\Omega_{n,H_n}$ does not depend directly on the difference between the functions $v$ and $\sigma^2$. Thus, we can expect $V^*$ to be more robust than $\hat\Omega_n$ to a misspecification of the variance function when estimating the variance of $\hat\theta^{OLS} - \hat\theta^{MLS}$.
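Putting the pieces together, the resampling estimate $V^*$ of this section can be sketched as follows. `fit_estimator_1` and `fit_estimator_2` are hypothetical placeholders for two fitting routines (for instance the OLS and modified least-squares sketches given earlier), and the pseudo-errors are drawn from the two-point distribution above.

```python
# Sketch of the wild-bootstrap estimate V* of Omega: resample pseudo-responses
# under H0, refit both estimators and take the empirical covariance of the
# rescaled differences.
import numpy as np

def bootstrap_variance(x, y, f, theta_hat, fit_estimator_1, fit_estimator_2,
                       B=50, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    fitted = f(x, theta_hat)
    eps_hat = y - fitted                                   # residuals under H0
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    gamma = (5 + np.sqrt(5)) / 10
    d_star = []
    for _ in range(B):
        coef = np.where(rng.random(n) < gamma, a, b)       # two-point wild bootstrap
        y_star = fitted + coef * eps_hat
        psi1 = np.asarray(fit_estimator_1(x, y_star))
        psi2 = np.asarray(fit_estimator_2(x, y_star))
        d_star.append(np.sqrt(n) * (psi1 - psi2))
    return np.cov(np.asarray(d_star), rowvar=False)        # V*, an estimate of Omega
```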

2.2.3. How to detect a bad choice of the regression function or the variance function?

The power of the test depends both on $r$, the rank of $\Omega$, and on $\lambda$. In practice, the number of degrees of freedom of the $\chi^2$ equals the rank of the estimate of $\Omega$, say $\hat\Omega_n$ or $V^*_B$, and in all the examples we have studied $r = p + q$. Obviously, depending on the model $H_0$, on the experimental design and on the test statistic, $r$ can be lower than $p + q$. Let us focus the discussion of the power on $\lambda$. $\lambda$ depends on the robustness properties of the estimators $\hat\psi^{(1)}$ and $\hat\psi^{(2)}$ to a misspecification of the model. Considering the results of Section 2.1.3, we consider the tests based on the statistics

$D_{1,n} = n^{1/2}(\hat\theta^{ML} - \hat\theta^{MLS}),$
$D_{2,n} = n^{1/2}(\hat\theta^{MLS} - \hat\theta^{OLS}),$
$D_{3,n} = n^{1/2}(\hat\theta^{ML} - \hat\theta^{OLS}).$

If $\mu_n(x) = f(x, \theta_0)$, the biases $b_{1,\hat\theta^{MLS}}$ and $b_{1,\hat\theta^{OLS}}$ equal 0 and the power of the test of $H_0$ against $H_n$ based on the statistic $D_{2,n}$ tends to $\alpha$, the asymptotic level of the test. On the other hand, the tests based on $D_{1,n}$ and $D_{3,n}$ will be powerful for detecting a bad choice of the variance function.

Assume now that $\sigma_n^2(x) = v(x, \psi_0)$. The three statistics defined above are biased proportionally to the differences between $\mu_n(x)$ and $f(x, \theta_0)$. If we examine Eqs. (4)-(6), we can forecast that the tests based on $D_{2,n}$ and $D_{3,n}$ will be more powerful for detecting a bad choice of the regression function than the test based on $D_{1,n}$. Nevertheless, when the variance of the errors varies with the expectation of the observed response, $v(x, \theta, \beta) = \beta_1^2 V(f(x, \theta), \beta_2)$, a bad choice of the regression function induces a bad choice of the variance function. Thus, we propose to detect a misspecified model in the following way (a minimal sketch of this decision rule follows the list):
• if the tests based on the three statistics are significant, we suspect a bad choice of the regression function, and possibly a bad choice of the variance function;
• if the tests based on $D_{2,n}$ and $D_{3,n}$ are significant but not the test based on $D_{1,n}$, we suspect a bad choice of the regression function only;
• if the tests based on $D_{1,n}$ and $D_{3,n}$ are significant but not the test based on $D_{2,n}$, we suspect a bad choice of the variance function only.
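The rule translates directly into a small helper; the rejection indicators are assumed to come from a test routine such as the one sketched after Section 2.2.1, and the wording of the returned diagnoses is of course purely illustrative.

```python
# Illustrative transcription of the decision rule above: reject_1, reject_2 and
# reject_3 are the rejection indicators of the tests based on D_1n, D_2n and D_3n.
def diagnose(reject_1, reject_2, reject_3):
    if reject_1 and reject_2 and reject_3:
        return "regression function suspect (variance function possibly too)"
    if reject_2 and reject_3 and not reject_1:
        return "regression function suspect"
    if reject_1 and reject_3 and not reject_2:
        return "variance function suspect"
    return "no clear misspecification pattern"
```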

Until now we have not treated the case of a constant variance under $H_0$. Let us recall that $H_0$ is nothing more than a parametric model chosen so as to mimic as well as possible the unknown functions $\mu$ and $\sigma^2$. When there is no reason to suspect the presence of heterogeneous errors, we propose to model the variance with a nearly constant function instead of assuming a constant variance. For example, we choose $\mathrm{var}(Y_i) = \beta_1^2 V(f(x_i, \theta_0), h)$ where $V(\phi, h)$ is a positive function defined on $R^2$, equal to 1 when $h$ equals 0, and such that its derivative with respect to $\phi$ equals 0 when $h$ equals 0. The value of $h$ is chosen small enough to keep the variations of $V$ in a neighborhood of 1. Formulae (4)-(6) remain valid. If $n$ is fixed, neglecting the terms of order $h^2$, we find

$b_{1,\hat\theta^{OLS}} = (\dot f_\theta^T \dot f_\theta)^{-1} \dot f_\theta^T (\mu_n - f),$
$b_{1,\hat\theta^{MLS}} = b_{1,\hat\theta^{OLS}} + h M_1(\theta_0)(\mu_n - f) + O(h^2),$
$b_{1,\hat\theta^{ML}} = b_{1,\hat\theta^{MLS}} + h M_2(\theta_0)(\sigma_n^2 - v) + O(h^2),$


where $M_1$ and $M_2$ are bounded functions of $\theta_0$. These last equations confirm our procedure: the tests based on $D_{2,n}$ and $D_{3,n}$ detect a bad choice of the regression function with great power, while the tests based on $D_{1,n}$ and $D_{3,n}$ are powerful for detecting a bad choice of the variance function.

3. Simulation studies

We present the results of two simulation studies. In the first example we estimate a calibration curve used in the context of a cortisol assay described in Huet et al. (1992). The second example was used by Horowitz and Härdle (1994) to compare the so-called HH-test with Bierens' test.

3.1. Cortisol example

We simulated $n = 64$ observations $(Y_i, x_i)$, $i = 1, \dots, n$, such that $Y_i = \mu(x_i) + \sigma(x_i)\varepsilon_i$, where the $\varepsilon_i$ are i.i.d. Gaussian variables with mean 0 and variance 1. The functions $\mu(x)$ and $\sigma(x)$ are defined as follows:

$\mu(x) = \theta_1 + \dfrac{\theta_2 - \theta_1}{(1 + \exp(\theta_3 + \theta_4 \log x))^{\theta_5}},$   (9)

$\theta_1 = 135.4$, $\theta_2 = 2804$, $\theta_3 = 2.886$, $\theta_4 = 3.044$, $\theta_5 = 0.666$,

$\sigma^2(x) = \beta_1^2 \mu^{2.3}(x)$, $\beta_1 = 0.00853$.

The values of $x$ are the following: $x = (0, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.5, 2, 4, +\infty)$, with eight replications of the observed response for $x_1 = 0$ and four replications for the other values of $x$.
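For readers wishing to reproduce the design, here is a sketch of one simulated cortisol data set under $H_n$. The handling of the limiting doses $x = 0$ and $x = +\infty$ (where the curve equals $\theta_2$ and $\theta_1$ respectively) is an implementation choice, not something specified in the text.

```python
# Sketch of one simulated cortisol data set (Section 3.1): five-parameter logistic
# curve, variance proportional to mu^2.3, design points replicated as stated (n = 64).
import numpy as np

theta = np.array([135.4, 2804.0, 2.886, 3.044, 0.666])
beta1 = 0.00853

def mu(x):
    # limiting doses: mu(0) = theta_2 (maximal response), mu(inf) = theta_1
    out = np.empty_like(x, dtype=float)
    out[x == 0] = theta[1]
    out[np.isinf(x)] = theta[0]
    ok = (x > 0) & np.isfinite(x)
    out[ok] = theta[0] + (theta[1] - theta[0]) / (
        1.0 + np.exp(theta[2] + theta[3] * np.log(x[ok]))) ** theta[4]
    return out

doses = np.array([0.0, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8,
                  1.0, 1.5, 2.0, 4.0, np.inf])
reps = np.array([8] + [4] * (doses.size - 1))          # 8 + 14*4 = 64 observations
x = np.repeat(doses, reps)

rng = np.random.default_rng(0)
m = mu(x)
sigma = beta1 * m ** (2.3 / 2.0)                       # sd(Y) = beta1 * mu^1.15
y = m + sigma * rng.standard_normal(x.size)
```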

This model defines the hypothesis $H_n$. We study the statistics $D_{j,n}$, $j = 1, 2, 3$, for several models $H_0$.

• Under $H_{0,1}$ the regression function is assumed symmetrical, $\theta_5 = 1$:

$f(x, \theta) = \theta_1 + \dfrac{\theta_2 - \theta_1}{1 + \exp(\theta_3 + \theta_4 \log x)},$   $v(x, \theta, \beta) = \beta_1^2 f^{2.3}(x, \theta).$

• Under $H_{0,2}$ the variance function is nearly constant:

$f(x, \theta) = \theta_1 + \dfrac{\theta_2 - \theta_1}{(1 + \exp(\theta_3 + \theta_4 \log x))^{\theta_5}},$   $v(x, \theta, \beta) = \beta_1^2 f^{0.01}(x, \theta).$


• Under $H_{0,3}$ the regression function is assumed symmetrical and the variance function is nearly constant:

$f(x, \theta) = \theta_1 + \dfrac{\theta_2 - \theta_1}{1 + \exp(\theta_3 + \theta_4 \log x)},$   $v(x, \theta, \beta) = \beta_1^2 f^{0.01}(x, \theta).$

We estimate the variance of each statistic $D_{j,n}$ in three ways.
1. We apply the plug-in method to calculate $\hat\Omega_{H_0,n} = \hat\Omega_n$ by replacing in $\Omega_n$ the parameters by their estimates under $H_0$.
2. We calculate $\hat\Omega_{H_n,n}$ by replacing in $(1/n) F^T E_{H_n}(\eta\eta^T) F$ the matrix $F(\theta, \beta)$ by its estimate under $H_0$, $F(\hat\theta^{ML}, \hat\beta^{ML})$ for example. In practice this procedure is not available, because the variance of $\eta$ under $H_n$ is unknown; but we use it in this simulation study to appreciate the consequences of a bad choice of the variance function for the estimation of the variance of the statistic.
3. Finally, we apply the bootstrap method described in Section 2.2.2.

In this example, the assumed models $H_0$ are embedded in the true model $H_n$ and are rather far from $H_n$. Some simple diagnostic procedures, based on residual plots for example, would be able to detect the misspecifications of the models. This example is interesting, however, to test the effect of the estimation of $\Omega$, and also to verify that our procedure works for detecting misspecifications of the variance function.

We estimate the error of the first kind and the power of the test with $N = 100$ simulations. Thus, the estimated probabilities have an estimated standard error of about 2%. The results are given in Table 1.

When the variance function is well modeled (hypotheses $H_n$ and $H_{0,1}$), the results obtained when $\Omega$ is estimated by $\hat\Omega_{H_0,n}$ are identical to those obtained when $\Omega$ is estimated by $\hat\Omega_{H_n,n}$. On the other hand, when the variance function is wrong (hypotheses $H_{0,2}$ and $H_{0,3}$), the differences between the estimated powers of the tests using $\hat\Omega_{H_0,n}$ or $\hat\Omega_{H_n,n}$ are significant. Thus, the way of estimating the variance of the statistic greatly influences the results. Because $\hat\Omega_{H_n,n}$ is not available in practice, it is interesting to evaluate the behavior of the bootstrap estimate. The number of bootstrap simulations, $B$, for estimating the variance of the statistic was chosen equal to 50. We based this choice on the following empirical procedure: for a few simulations we calculated the test statistics $D_n^T V^{*-1} D_n$ for $B$ varying from 10 to 300 with step sizes of 10; $B = 50$ was in most cases the minimum value for which the values of the statistics remained stable (see Table 2).

The estimated errors of the first kind increase from 0 to 13% for the statistic $D_{1,n}$, and from 0 to 6% and 8% for the statistics $D_{2,n}$ and $D_{3,n}$ (for a nominal error of 5%). This simulation study shows that if we apply the bootstrap method to estimate $\Omega$, the tests based on the statistics $D_{2,n}$ and $D_{3,n}$ are powerful for detecting a bad choice of the regression function, while the tests based on $D_{1,n}$ and $D_{3,n}$ are powerful for detecting a bad choice of the variance function.

Finally, let us give the results of a simulation study evaluating the performance of our procedure in detecting a misspecified variance function. We calculate the statistics $D_{j,n}$, $j = 1, 2, 3$, for several models $H_0$ defined in the following way: the regression function is the five-parameter logistic curve defined by Eq. (9), and the variance function is $v(x, \theta, \beta) = \beta_1^2 f^h(x, \theta)$, where $h$ varies from 0.01 to 3.5 (see Table 2).


Table 1
Cortisol example: for each model, estimation of the probability that the test rejects the null hypothesis at nominal 0.05 level. Under $H_n$, the regression function $\mu$ is the five-parameter logistic curve and the variance function is proportional to $\mu^{2.3}$. Under $H_{0,1}$ $f$ is symmetrical, under $H_{0,2}$ $v$ is nearly constant, under $H_{0,3}$ $f$ is symmetrical and $v$ is nearly constant. The number of simulations is $N = 100$, the number of bootstrap simulations is $B = 50$.

The variance Omega is estimated     H_n                          H_{0,1}
                                    D_1,n    D_2,n    D_3,n      D_1,n    D_2,n    D_3,n
By Omega_{H_0,n}                    0        0        0          0        100%     100%
By Omega_{H_n,n}                    0        0        0          0        100%     100%
With wild bootstrap                 13%      6%       8%         0        100%     100%

The variance Omega is estimated     H_{0,2}                      H_{0,3}
                                    D_1,n    D_2,n    D_3,n      D_1,n    D_2,n    D_3,n
By Omega_{H_0,n}                    2%       7%       3%         0        0        0
By Omega_{H_n,n}                    100%     32%      100%       100%     100%     100%
With wild bootstrap                 100%     20%      94%        84%      89%      100%

Table 2
Cortisol example: for each model, estimation of the probability that the test rejects the null hypothesis at nominal 0.05 level. The regression function $f$ is the five-parameter logistic curve and the variance function is proportional to $f^h$. The results obtained for the true model $H_n$ ($h = 2.3$) and for the model $H_{0,2}$ ($h = 0.01$) are reproduced in this table for the purpose of comparison. The number of simulations is $N = 100$, the number of bootstrap simulations is $B = 50$.

h       D_1,n    D_2,n    D_3,n
0.01    100%     20%      94%
0.5     99%      18%      75%
1       99%      12%      24%
2       40%      10%      13%
2.3     13%      6%       8%
2.5     13%      8%       14%
3       36%      11%      49%
3.5     89%      13%      82%

The test based on the statistic $D_{1,n}$ is more powerful for detecting a misspecification of the variance function than the test based on $D_{3,n}$. This result is in accordance with the first terms of the bias of each estimator given in Corollary 1.

3.2. Second example

The data are generated by random sampling from the model $Y_i = 1 + x_i + 10\, b\, \phi(10 x_i) + \varepsilon_i$, using three values for $b$: $b = 0, 0.5, 1$. The $x_i$, $i = 1, \dots, 50$, are generated once (and kept fixed for the whole simulation experiment) as centered Gaussian variables with variance 1. The $\varepsilon_i$ are centered Gaussian random variables with variance 0.25. $\phi$ is the standard normal density function.
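A sketch of this design, under the stated assumptions (the $x_i$ drawn once and then held fixed), could be:

```python
# Sketch of the design of Section 3.2: x drawn once and kept fixed, then
# Y = 1 + x + 10*b*phi(10*x) + eps with eps ~ N(0, 0.25), for b in {0, 0.5, 1}.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(50)            # generated once, then held fixed

def simulate(b, rng):
    eps = np.sqrt(0.25) * rng.standard_normal(x.size)
    return 1.0 + x + 10.0 * b * norm.pdf(10.0 * x) + eps

y = simulate(b=0.5, rng=rng)
```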

Horowitz and Härdle (1994) used this model in a simulation study to illustrate the behavior of their test of a parametric model against a semi-parametric alternative. Our results are not quite comparable with theirs because we assume that the values of $x$ are fixed while they assume that $x$ is a random variable. But we thought it interesting to study a model for which a test based on a non-parametric estimate gives good results.

We study the tests based on the statistics $D_{j,n}$ for two models $H_0$. Under $H_{0,1}$ the regression function is assumed linear, $f(x, \theta) = \theta_1 + \theta_2 x$, and the variance function is nearly constant, $v(x, \theta, \beta) = \beta_1^2 \exp(f(x, \theta)/100)$. Nevertheless, looking at Eqs. (4)-(6), it appears that the power of the tests also depends on the choice of the variance function. Thus, to appreciate the effect of the variance function, we define another null hypothesis $H_{0,2}$ by choosing for the variance function $v(x, \theta, \beta) = \beta_1^2 \exp(f^2(x, \theta)/100)$, whose variations are more important than those of $v(x, \theta, \beta)$ under $H_{0,1}$. We estimate the variance of the statistics $D_{j,n}$ using the wild bootstrap method with $B = 150$ bootstrap simulations. In this example, the time needed for the calculations was small enough to allow such a large value of $B$.

The results are given in Table 3. If the calculations are done under the model $H_{0,1}$, the tests based on the statistics $D_{2,n}$ and $D_{3,n}$ are powerful for detecting a bad choice of the regression function. The results obtained under $H_{0,2}$ are not as good as those obtained under $H_{0,1}$. Thus, the effect of the choice of the nearly constant variance function appears clearly: under $H_{0,2}$ the variations of the variance function $v$ are big enough to introduce a significant bias in the maximum likelihood estimator.

More generally, if the model $H_0$ assumes that the variance function is nearly constant, the result of the test will depend on the choice of the pair $(V, h)$. For the cortisol example, with $h = 0.01$, the variations of $V(f, h) = f^h$ lie between 1.04 and 1.1. For the second example, the variations of $V(f, h) = \exp(h f)$ lie between 0.98 and 1.05, while the variations of $V(f, h) = \exp(h f^2)$ lie between 1.005 and 1.25. Obviously, we can imagine a large number of strictly positive functions $V(f, h)$ to model the nearly constant variance function, but we suggest choosing a pair $(V, h)$ such that the variations of $V(f, h)$ stay around 5%.
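The "variations around 5%" rule of thumb is easy to check numerically for a candidate pair $(V, h)$: the helper below simply scans $V(f, h)$ over the range of fitted values. The cortisol range quoted above is used as an illustrative assumption.

```python
# Quick check of the variation of a nearly constant variance shape V(f, h)
# over the range of fitted values, as suggested in the text (aim for about 5%).
import numpy as np

def variation_range(V, f_min, f_max, h, n_grid=200):
    grid = np.linspace(f_min, f_max, n_grid)
    vals = V(grid, h)
    return vals.min(), vals.max()

# cortisol-type choice V(f, h) = f**h with h = 0.01 over f in [135.4, 2804]
print(variation_range(lambda f, h: f ** h, 135.4, 2804.0, 0.01))
```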

The HH-test gives better results than our tests: the estimated power of their test is 90% when $b = 0.5$ and 98% when $b = 1$. Nevertheless, their test depends on the choice of several parameters: $K$ the kernel, $h$ and $s$ the bandwidths, and $w$ a weighting function.


Table 3
Horowitz and Härdle example: estimation of the probability that the test rejects the null hypothesis at nominal 0.05 level. The number of simulations is $N = 100$, the number of bootstrap simulations is $B = 150$, $h = 0.01$. The two last columns give the estimated power of the HH-test and of Bierens' test; the results in these columns are not based on the same simulations as the results in the six first columns.

       H_{0,1}: f linear,            H_{0,2}: f linear,            HH       Bierens'
       v ∝ exp(h f(x, θ))            v ∝ exp(h f²(x, θ))           test     test
b      D_1,n    D_2,n    D_3,n       D_1,n    D_2,n    D_3,n
0      1%       0        1%          5%       7%       3%          5%       5%
0.5    3%       42%      42%         36%      45%      41%         90%      42%
1      0        89%      87%         62%      89%      43%         98%      73%

Bierens' test also requires choosing several parameters. Moreover, they assume that the variance function is constant, $\sigma^2(x) = \beta_1^2$, and estimate it by the residual sum of squares calculated under the alternative.

References

Bierens, H.J., A consistent conditional moment test of functional form, Econometrica, 58 (1990) 1443-1458.
Bunke, O., B. Droge and J. Polzehl, Model selection, transformations and variance estimation in nonlinear regression, Discussion Paper 52/95, SFB 373, Humboldt-Universität zu Berlin.
Carroll, R.J. and D. Ruppert, A comparison between maximum likelihood and generalized least squares in a heteroscedastic linear model, J. Amer. Statist. Assoc., 77 (1982) 878-882.
Carroll, R.J. and D. Ruppert, Transformation and Weighting in Regression (Chapman and Hall, New York and London, 1988).
Cook, R.D. and C.L. Tsai, Residuals in nonlinear regression, Biometrika, 72 (1985) 23-29.
Firth, D., J. Glosup and D.V. Hinkley, Model checking with nonparametric curves, Biometrika, 78 (1991) 245-252.
Hansen, B.E., Testing the conditional mean specification in parametric regression using the empirical score process, Preprint (University of Manchester, 1992).
Härdle, W. and E. Mammen, Bootstrap methods for nonparametric regression, in: G. Roussas (Ed.), Nonparametric Functional Estimation and Related Topics, Series C: Mathematical and Physical Sciences, Vol. 335 (Kluwer, Dordrecht, 1991) 111-124.
Härdle, W. and E. Mammen, Comparing nonparametric versus parametric regression fits, Ann. Statist., 21 (1993) 1926-1947.
Härdle, W. and J.S. Marron, Bootstrap simultaneous error bars for nonparametric regression, Ann. Statist., 19 (1991) 778-796.
Haughton, D., Consistency of a class of information criteria for model selection in nonlinear regression, Theory Probab. Appl., 37 (1993) 47-53.
Horowitz, J. and W. Härdle, Testing a parametric model against a semiparametric alternative, Econometric Theory, 10 (1994) 821-848.
Huet, S., Maximum likelihood and least squares estimators for a nonlinear model with heterogeneous variances, Statistics, 17 (1986) 517-526.
Huet, S., E. Jolivet and A. Messéan, La régression non-linéaire: méthodes et applications en biologie (INRA Editions, Série: Mieux comprendre, 1992).
Liu, R.Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 16 (1988) 1696-1708.
McCullagh, P., Quasi-likelihood functions, Ann. Statist., 11 (1983) 58-67.
Morton, R., Efficiency of estimating equations and the use of pivots, Biometrika, 68 (1981) 227-233.
Müller, H.G., Goodness-of-fit diagnostics for regression models, Scand. J. Statist. (1992) 157-172.
Newey, W.K., Maximum likelihood specification testing and conditional moment tests, Econometrica, 53 (1985).
Tauchen, G., Diagnostic testing and evaluation of maximum likelihood models, J. Econom., 30 (1985) 415-443.
White, H., Consequences and detection of misspecified nonlinear regression models, J. Amer. Statist. Assoc., 76 (1981) 419-433.
Wu, C.F.J., Jackknife, bootstrap and other resampling methods in regression analysis, Ann. Statist., 14 (1986) 1261-1295.