A comparison of regression models for small counts

Tools and Technology Article

TRENT L. MCDONALD, West, Inc., 2003 Central Avenue, Cheyenne, WY 82001, USA

GARY C. WHITE, Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, CO 80523, USA

Journal of Wildlife Management 74(3):514–521; 2010; DOI: 10.2193/2009-270

ABSTRACT Count data with means <2 are often assumed to follow a Poisson distribution. However, in many cases these kinds of data, such as number of young fledged, are more appropriately considered to be multinomial observations due to naturally occurring upper truncation of the distribution. We evaluated the performance of several versions of multinomial regression, plus Poisson and normal regression, for analysis of count data with means <2 through Monte Carlo simulations. Simulated data mimicked observed counts of number of young fledged (0, 1, 2, or 3) by California spotted owls (Strix occidentalis occidentalis). We considered size and power of tests to detect differences among 10 levels of a categorical predictor, as well as tests for trends across 10-year periods. We found regular regression and analysis of variance procedures based on a normal distribution to perform satisfactorily in all cases we considered, whereas failure rate of multinomial procedures was often excessively high, and the Poisson model demonstrated inappropriate test size for data where the variance/mean ratio was <1 or >1.2. Thus, managers can use simple statistical methods with which they are likely already familiar to analyze the kinds of count data we described here.

KEY WORDS count data, cumulative logistic regression, multinomial regression, overdispersion, Poisson regression, Strix occidentalis occidentalis, underdispersion.

Analysis of count data, assuming they follow the Poisson distribution, is a widely recommended practice in standard statistical texts (e.g., McCullagh and Nelder 1989). Making the assumption that count data are Poisson allows analysis within the flexible framework of generalized linear models (GLM). This assumption is convenient because routines for generalized linear model estimation, such as Proc Genmod in SAS (SAS Institute, Cary, NC), have become standard in nearly all statistical computer packages. However, more than one paper has cautioned that Poisson regression is not robust to departures from the Poisson distribution. White and Bennetts (1996) demonstrated poor performance of Poisson regression when count data generated from the negative binomial distribution were analyzed, even when overdispersion of the data was modeled using an estimated variance inflation factor. White and Bennetts (1996) found procedures that assume counts come from a normal distribution worked far better than Poisson regression; that is, test size was approximately correct for the normal distribution methods, and power of the normal methods was equivalent to the power of methods that correctly assumed counts followed the true negative binomial distribution.

Other, more theoretically based approaches have also been suggested for count data. For instance, mixed Poisson models in which the negative binomial and Poisson-inverse Gaussian distributions are special cases have been considered (Puig and Valero 2006). Ecologists have also considered the zero-inflated Poisson distribution to handle overdispersion from excess zero observations (Martin et al. 2005).

We considered a different type of count data than considered by White and Bennetts (1996), Martin et al. (2005), and Puig and Valero (2006). Specifically, we considered counts ≤3. In these cases, ordinal counts are bounded and the data can be markedly non-Poisson in that the probability of values >3 equals zero, even when the mean value approaches 2. The motivation for this work was analysis of the number of young fledged per nest for California spotted owls (Strix occidentalis occidentalis; Franklin et al. 2004), which fledge 0, 1, 2, or 3 young. Similar data were analyzed by Anthony et al. (2006) for northern spotted owls (S. occidentalis caurina) using general linear models assuming normally distributed residuals. Our scenario is typical of many species of birds for which number of young fledged is biologically limited to some maximum. Such count data can be either underdispersed or overdispersed relative to a Poisson distribution, depending on the reproductive rate of the population. We compared size and power of tests for differences among levels of a categorical factor and for a continuous (trend) factor. We considered Poisson regression, normal regression, multinomial logit regression, cumulative logit regression, and a mean response model for ordinal data.

METHODS

We obtained all estimates via maximum likelihood estimation. Unless specified otherwise, we detected positive slopes if overall Wald $P$-values associated with $\beta$ coefficients were less than $\alpha = 0.05$. We used SAS code to produce estimates (Appendix).

Normal Regression Model

The first model we considered was linear regression assuming that count responses followed a normal distribution. Letting $y_i = 0, 1, \ldots, k$ denote the observed count on unit $i$, our normal model assumed $y_i \sim \text{Normal}(\mu_i, \sigma^2)$ and $\mu_i = \mathbf{x}_i\boldsymbol{\beta}$, where $\mathbf{x}_i = [1, x_{i1}, x_{i2}, \ldots, x_{i,p-1}]$ was a $p$-dimensional vector of predictor variables associated with unit $i$, $\boldsymbol{\beta}$ was a $p$-dimensional vector of coefficients, and $\sigma^2$ was the common variance estimated from the data. Interest lay in the size (Type I error rate) and power (1 − Type II error rate) to detect presence of ≥1 nonzero coefficient among the set of $p$ slope coefficients (excluding the
intercept). We tested for presence of a nonzero coefficient using $(p - 1)$ degree of freedom Type III $F$ statistics (Neter et al. 1985), which assumed our observed $F$s followed an $F$ distribution with $p - 1$ and $n - p$ degrees of freedom. We detected presence of a nonzero coefficient if the $P$-value of this test was less than $\alpha = 0.05$.
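As a rough illustration of the overall $F$ test just described, the following Python sketch computes the statistic for a categorical predictor (one group per level), where the $p - 1$ slopes are indicator coefficients. The function name `overall_f` and the toy counts are assumptions made for illustration; the paper's own estimates came from SAS, and the $P$-value lookup against the $F$ distribution is omitted here.

```python
def overall_f(groups):
    """F statistic for H0: all slopes are zero, with one group per factor level.

    Returns the statistic plus its numerator (p - 1) and denominator (n - p)
    degrees of freedom. Hypothetical helper, not the authors' SAS code.
    """
    n = sum(len(g) for g in groups)
    p = len(groups)  # intercept plus p - 1 indicator slopes
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group (model) sum of squares.
    ss_model = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares.
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    f_stat = (ss_model / (p - 1)) / (ss_error / (n - p))
    return f_stat, p - 1, n - p


# Toy counts of young fledged (0-3) for three hypothetical years.
counts_by_year = [[0, 0, 1, 2], [1, 2, 2, 3], [0, 1, 1, 2]]
f_stat, df1, df2 = overall_f(counts_by_year)
print(round(f_stat, 4), df1, df2)
```

The statistic would then be compared with the upper 5% point of an $F$ distribution with `df1` and `df2` degrees of freedom.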

The normal model was clearly not ideal for this type of count response because $y_i$ took on positive integer values in a narrow range, a clear violation of the assumption of normality. Nonetheless, we considered the normal model because it is common practice to fit a normal model to count data (White and Bennetts 1996). It is common knowledge that this model is robust to departures from normality (i.e., that it retains its statistical properties when fitted to nonnormal data; White and Bennetts 1996), and we wondered whether this robustness extended to the markedly nonnormal small bounded count data of the type we considered here.

Poisson Regression Model

The second model we investigated was a Poisson regression model. The Poisson model assumed $y_i \sim \text{Poisson}(\mu_i)$ and $\mu_i = \mathbf{x}_i\boldsymbol{\beta}$. Following the theory of GLMs (McCullagh and Nelder 1989), we estimated coefficients by maximum likelihood. We tested for presence of ≥1 nonzero slope coefficient using a drop-in-deviance $F$ test, adjusted for overdispersion. Letting $D_f$ equal the Poisson deviance from the full model (i.e., containing all coefficients), and $D_r$ equal deviance of the model without any coefficients (i.e., containing $\boldsymbol{\beta} = [\beta_0]$ only), the drop-in-deviance test was

$$f = \frac{D_r - D_f}{(p - 1)\,\phi} \qquad (1)$$

where we estimated the overdispersion parameter $\phi$ as the Pearson chi-square statistic divided by $n - p$. We compared the test statistic $f$ to an $F$ distribution with $p - 1$ and $n - p$ degrees of freedom.
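To make equation 1 concrete, here is a minimal Python sketch for the special case of a purely categorical predictor. In that case the maximum likelihood fitted means of the full model are simply the group means and those of the reduced model are the grand mean, so no iterative fitting is needed. The function names and toy data are assumptions for illustration, and the sketch assumes every group has a positive mean.

```python
import math


def poisson_deviance(ys, mus):
    """Poisson deviance 2 * sum[y * ln(y / mu) - (y - mu)], with 0 * ln(0) taken as 0."""
    return 2.0 * sum(
        (y * math.log(y / m) if y > 0 else 0.0) - (y - m) for y, m in zip(ys, mus)
    )


def drop_in_deviance_f(groups):
    """Equation 1 for a one-way layout; returns (f, phi). Hypothetical helper."""
    ys = [y for g in groups for y in g]
    n, p = len(ys), len(groups)
    grand_mean = sum(ys) / n
    # Fitted means under the full model: each observation gets its group mean.
    full_mu = [sum(g) / len(g) for g in groups for _ in g]
    d_full = poisson_deviance(ys, full_mu)
    d_reduced = poisson_deviance(ys, [grand_mean] * n)
    # Overdispersion estimate: Pearson chi-square divided by n - p.
    pearson = sum((y - m) ** 2 / m for y, m in zip(ys, full_mu))
    phi = pearson / (n - p)
    return (d_reduced - d_full) / ((p - 1) * phi), phi


counts_by_year = [[0, 0, 1, 2], [1, 2, 2, 3], [0, 1, 1, 2]]
f_stat, phi = drop_in_deviance_f(counts_by_year)
print(round(f_stat, 3), round(phi, 3))
```

For these toy counts the estimated $\phi$ is below 1, which is the underdispersed situation that bounded small counts can produce.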

Like the normal model, the Poisson model was also not ideal for this type of response because $y_i$ was bounded above by $k$. If $k$ was large (>10 approx.), we suspect that the Poisson distribution would provide a better approximation to the distribution of $y_i$ than it does when $k$ is small, but the Poisson distribution is still technically incorrect when $k$ is large. Furthermore, when $k$ is small, the mean of $y_i$ is often very different from its variance, a further departure from the Poisson distribution. Nonetheless, estimation of a Poisson model for count responses, regardless of their size or whether they are bounded, is more or less standard statistical practice (McCullagh and Nelder 1989).

Constrained Multinomial Regression (MulLogit) Model

For unordered multinomial responses (such as red, green, and yellow), there exists only one multinomial model consisting of one fewer logistic regression equations than there are categories, each equation with a common reference category. However, when responses are ordinal (such as 1, 2, and 3) as they are here, there are several choices of reference levels and several link functions that can quantify the relationship between probabilities and covariates differently.

The first multinomial regression model we investigated viewed $y_i$ as a realization of a multinomial process. Let $\mathbf{y}_i = [y_{i0}, y_{i1}, \ldots, y_{ik}] = [I(y_i = 0), I(y_i = 1), \ldots, I(y_i = k)]$, where $I(e)$ is an indicator function equaling 1 if the event $e$ is true and 0 otherwise. Each multinomial vector $\mathbf{y}_i$ had length $k + 1$ with exactly one element equal to 1; all others were 0. The multinomial model assumed $\mathbf{y}_i \sim \text{Multinomial}(\boldsymbol{\pi}_i, 1)$, where $\boldsymbol{\pi}_i = [\pi_{i0}, \pi_{i1}, \ldots, \pi_{ik}]$ subject to the constraint $\pi_{i0} + \pi_{i1} + \cdots + \pi_{ik} = 1$. The linear regression part of the multinomial model assumed common slopes across response levels, that is,

$$g_1(\boldsymbol{\pi}_i) = \left[\, \mathbf{I}_k \mid \mathbf{J}_k \mathbf{x}_{i1} \,\right] \begin{bmatrix} \boldsymbol{\alpha}_k \\ \boldsymbol{\beta}_1 \end{bmatrix} \qquad (2)$$

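where $\mathbf{I}_k$ was an identity matrix of size $k$, $\mathbf{J}_k$ was a vector of $k$ 1's, $\mathbf{x}_{i1} = [x_{i1}, x_{i2}, \ldots, x_{ip}]$, $\boldsymbol{\alpha}_k = [\alpha_1, \alpha_2, \ldots, \alpha_k]$ was a $k$ vector of intercept coefficients to be estimated, $\boldsymbol{\beta}_1 = [\beta_1, \beta_2, \ldots, \beta_p]$ did not contain an intercept term, and $g_1(\boldsymbol{\pi})$ was the multivariate logit link function,

$$g_1(\boldsymbol{\pi}) = \left[ \ln\!\left(\frac{\pi_1}{\pi_0}\right), \ln\!\left(\frac{\pi_2}{\pi_0}\right), \ldots, \ln\!\left(\frac{\pi_k}{\pi_0}\right) \right]' \qquad (3)$$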

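For this model, the inverse link function was

$$g_1^{-1}(\boldsymbol{\alpha} + \mathbf{J}\mathbf{x}_1\boldsymbol{\beta}_1) = \left[ \frac{e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1}}{1 + e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1} + \cdots + e^{\alpha_k + \mathbf{x}_1\boldsymbol{\beta}_1}},\; \frac{e^{\alpha_2 + \mathbf{x}_1\boldsymbol{\beta}_1}}{1 + e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1} + \cdots + e^{\alpha_k + \mathbf{x}_1\boldsymbol{\beta}_1}},\; \ldots,\; \frac{e^{\alpha_k + \mathbf{x}_1\boldsymbol{\beta}_1}}{1 + e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1} + \cdots + e^{\alpha_k + \mathbf{x}_1\boldsymbol{\beta}_1}} \right] \qquad (4)$$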

Note that the linearity assumption of this model specified $k$ equations of the form

$$\ln\!\left(\frac{\pi_j}{\pi_0}\right) = \alpha_j + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} \qquad (5)$$

($j = 1, 2, \ldots, k$). The key feature of these equations is the common slope parameters that we estimated for the relationship between all $k$ odds ratios and the covariates. This results in a proportional odds model because it forces the relationship between the odds ratios $\pi_1/\pi_0, \pi_2/\pi_0, \ldots, \pi_k/\pi_0$ and a particular $x$ to have the same slope. In other words, estimated odds ratios are proportional because

$$\frac{\pi_i}{\pi_j} = e^{\alpha_i - \alpha_j} \qquad (6)$$

does not depend on $x$.

Assuming counts were independent, the multinomial log likelihood function for this model was

$$\ln L(\boldsymbol{\alpha}_k, \boldsymbol{\beta}_1) = \sum_{i=1}^{n} \left[ y_{i0}\ln(\pi_{i0}) + y_{i1}\ln(\pi_{i1}) + \cdots + y_{ik}\ln(\pi_{ik}) \right] \qquad (7)$$

subject to the constraint that $\pi_{i0} + \pi_{i1} + \cdots + \pi_{ik} = 1$. The constraint on $\boldsymbol{\pi}$ was satisfied if we used the inverse link function in equation 4 and set $\pi_{i0}$ to

$$1 - \pi_{i1} - \cdots - \pi_{ik} = \left( 1 + e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1} + \cdots + e^{\alpha_k + \mathbf{x}_1\boldsymbol{\beta}_1} \right)^{-1} \qquad (8)$$
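The inverse link and log likelihood of the constrained multinomial model can be sketched in a few lines of Python. This is an illustration only: the function names, the intercept values, and the scalar `eta` (standing in for the common linear predictor $\mathbf{x}_1\boldsymbol{\beta}_1$) are assumptions, not part of the paper's SAS code.

```python
import math


def mullogit_probs(alphas, eta):
    """Cell probabilities [pi_0, ..., pi_k] from intercepts alpha_1..alpha_k.

    eta is the common linear predictor shared by all k logits (assumed scalar).
    """
    expo = [math.exp(a + eta) for a in alphas]
    denom = 1.0 + sum(expo)
    # pi_0 is the reference cell; the remaining cells follow the inverse link.
    return [1.0 / denom] + [e / denom for e in expo]


def log_likelihood(counts, alphas, etas):
    """Multinomial log likelihood: each count y contributes ln(pi_y)."""
    return sum(
        math.log(mullogit_probs(alphas, eta)[y]) for y, eta in zip(counts, etas)
    )


alphas = [-0.5, 0.2, -1.0]  # hypothetical intercepts for counts 1, 2, 3
probs_low = mullogit_probs(alphas, eta=-1.0)
probs_high = mullogit_probs(alphas, eta=1.5)
# The ratio pi_1 / pi_2 stays at exp(alpha_1 - alpha_2) whatever eta is.
ratio_low = probs_low[1] / probs_low[2]
ratio_high = probs_high[1] / probs_high[2]
loglik = log_likelihood([0, 2], [0.0, 0.0, 0.0], [0.0, 0.0])
```

Because the slopes are shared, changing `eta` moves probability between the zero cell and the positive cells but leaves ratios among the positive cells fixed, which is the proportionality property stated in the text.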

In this constrained multinomial model, choice of the baseline category makes a difference in the fit. Change of fit arises because slopes in this model are constrained to be equal across all probability ratios to produce proportionality. If slopes were not constrained, choice of baseline category would not matter. As a consequence, a model with reference level zero may fit well, but the same model with reference level one may not.

This constrained model is appropriate when covariates primarily influence probability of observing a nonreference (e.g., nonzero) response and are less influential in discriminating among probabilities of nonreference levels. We chose to investigate this model because it was hypothetically appropriate for our spotted owl fledgling data, for which we hypothesized that probability of producing ≥1 young was the quantity primarily related to covariates, and after that our covariates had little influence on whether a pair produced 1, 2, or 3 young. If our hypothesis proved correct, the constrained model was more parsimonious than a nonproportional odds model (see Agresti 2002) and therefore more powerful for detecting trends. Furthermore, we considered the proportional odds model appropriate because changes in the mean of small count data will likely be caused by changes in proportions of positive counts relative to zero. In this case, the ratios $\pi_1/\pi_0, \pi_2/\pi_0, \ldots, \pi_k/\pi_0$ will change by approximately the same amount and in the same direction. Estimating $p$ $\beta$ parameters in this case, rather than $pk$, will yield higher precision slope estimates. Note that there are $p$ $\beta$ parameters and $k$ $\alpha$ parameters, for a total of $p + k$ parameters.

Cumulative Multinomial (CumLogit) Model

The second multinomial regression model we considered appropriate for small counts was a cumulative multinomial (Aitchison and Silvey 1957, Ashford 1959, Walker and Duncan 1967, Cox and Snell 1989). Because counts have a natural and meaningful ordering, it was reasonable to model $\Pr(Y \le y)$ as a function of $x$, in addition to modeling $\Pr(Y = y)$ as above. Modeling a function of the cumulative probabilities of $Y$ given $x$ was reasonable because changes in the mean count must be reflected by changes in cumulative probabilities. Conversely, changes in cumulative probabilities, particularly those for positive counts, should be highly sensitive to changes in the mean count.

We accomplished the switch from modeling individual probabilities to modeling cumulative probabilities by changing the link function ($g_1$) of the previous multinomial model. Index changes were also necessary because the reference count for the cumulative model was $k$ rather than zero (see Appendix for SAS code that changes the index). Like the previous multinomial model, the cumulative multinomial model assumed $\mathbf{y}_i \sim \text{Multinomial}(\boldsymbol{\pi}_i, 1)$, where $\boldsymbol{\pi}_i = [\pi_{i0}, \pi_{i1}, \ldots, \pi_{ik}]$ subject to the constraint $\pi_{i0} + \pi_{i1} + \cdots + \pi_{ik} = 1$. The linear part of the cumulative multinomial model assumed

$$g_2(\boldsymbol{\pi}_i) = \left[\, \mathbf{I}_k \mid \mathbf{J}_k \mathbf{x}_{i1} \,\right] \begin{bmatrix} \boldsymbol{\alpha}_k \\ \boldsymbol{\beta}_1 \end{bmatrix} \qquad (9)$$

where intercepts in $\boldsymbol{\alpha}_k = [\alpha_0, \alpha_1, \ldots, \alpha_{k-1}]$ changed indices by $-1$ to reflect the change in reference level. All other matrices in the linear part of the cumulative model were the same as those in the MulLogit model.

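The real change between this and the previous model was the cumulative multinomial link function, that is,

$$g_2(\boldsymbol{\pi}) = \left[ \ln\!\left(\frac{\pi_0}{1 - \pi_0}\right),\; \ln\!\left(\frac{\pi_0 + \pi_1}{1 - (\pi_0 + \pi_1)}\right),\; \ln\!\left(\frac{\pi_0 + \pi_1 + \pi_2}{1 - (\pi_0 + \pi_1 + \pi_2)}\right),\; \ldots,\; \ln\!\left(\frac{\pi_0 + \cdots + \pi_{k-1}}{1 - (\pi_0 + \cdots + \pi_{k-1})}\right) \right] \qquad (10)$$

with inverse,

$$g_2^{-1}(\boldsymbol{\alpha} + \mathbf{J}\mathbf{x}_1\boldsymbol{\beta}_1) = \left[ \frac{e^{\alpha_0 + \mathbf{x}_1\boldsymbol{\beta}_1}}{1 + e^{\alpha_0 + \mathbf{x}_1\boldsymbol{\beta}_1}},\; \frac{e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1} - e^{\alpha_0 + \mathbf{x}_1\boldsymbol{\beta}_1}}{(1 + e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1})(1 + e^{\alpha_0 + \mathbf{x}_1\boldsymbol{\beta}_1})},\; \frac{e^{\alpha_2 + \mathbf{x}_1\boldsymbol{\beta}_1} - e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1}}{(1 + e^{\alpha_2 + \mathbf{x}_1\boldsymbol{\beta}_1})(1 + e^{\alpha_1 + \mathbf{x}_1\boldsymbol{\beta}_1})},\; \ldots,\; \frac{e^{\alpha_{k-1} + \mathbf{x}_1\boldsymbol{\beta}_1} - e^{\alpha_{k-2} + \mathbf{x}_1\boldsymbol{\beta}_1}}{(1 + e^{\alpha_{k-1} + \mathbf{x}_1\boldsymbol{\beta}_1})(1 + e^{\alpha_{k-2} + \mathbf{x}_1\boldsymbol{\beta}_1})} \right] \qquad (11)$$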

We satisfied the unity constraint on $\boldsymbol{\pi}$ by setting $\pi_k = (1 + e^{\alpha_{k-1} + \mathbf{x}_1\boldsymbol{\beta}_1})^{-1}$. Another difference between this model and the regular multinomial was that the additional constraint of $\alpha_0 \le \alpha_1 \le \cdots \le \alpha_{k-1}$ was needed on the intercepts to assure $\Pr(Y \le y) \le \Pr[Y \le (y + 1)]$ for all $y$. Note that this constraint assured increasing cumulative probabilities because we estimated common slopes for all odds ratios. Thus, like the previous multinomial model, the cumulative model estimated proportional odds at fixed levels of $x$, but unlike the previous model, a nonproportional odds version was not readily available. If we attempted estimation of separate slopes (i.e., nonproportionality), a much more complicated constraint structure would be required.
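The role of the ordering constraint on the intercepts can be checked numerically. The sketch below recovers cell probabilities as differences of cumulative logistic probabilities, which is algebraically the same as the inverse link given above; the function name, the intercept values, and the scalar `eta` (standing in for $\mathbf{x}_1\boldsymbol{\beta}_1$) are assumptions for illustration.

```python
import math


def cumlogit_probs(alphas, eta):
    """Cell probabilities [pi_0, ..., pi_k] from intercepts alpha_0..alpha_{k-1}.

    Pr(Y <= j) is the logistic function of alpha_j + eta, and Pr(Y <= k) = 1.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    # Successive differences of the cumulative probabilities give the cells.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


ordered = cumlogit_probs([-1.0, 0.0, 1.0], eta=0.3)    # intercepts nondecreasing
unordered = cumlogit_probs([1.0, 0.0, 2.0], eta=0.3)   # constraint violated
print([round(p, 3) for p in ordered])
print([round(p, 3) for p in unordered])
```

With nondecreasing intercepts every cell probability is nonnegative; when the ordering is violated, at least one difference becomes negative, which is why the constraint is needed during estimation.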

Multinomial Mean Response (CatMeans) Model

The third and final multinomial regression model we considered was the mean response model (Agresti 2002). Like the previous 2 multinomial models, the mean response model assumed $\mathbf{y}_i \sim \text{Multinomial}(\boldsymbol{\pi}_i, 1)$, subject to the constraint $\pi_{i0} + \pi_{i1} + \cdots + \pi_{ik} = 1$, but we defined the link function as $g_3(\boldsymbol{\pi}) = \sum_{j=0}^{k} j\,\pi_j$, which is the mean of the multinomial distribution with levels $0, 1, \ldots, k$. This model resembles normal regression models (i.e., the normal model above), and for large $k$, normal regression approximates the mean response model (Agresti 2002). The linear portion of this model that relates covariates to $g_3$ was $g_3(\boldsymbol{\pi}_i) = \alpha + \mathbf{x}_i\boldsymbol{\beta}$. Note that a unique inverse link function does not exist for this model because there are $k$ estimable probabilities, but the link is 1-dimensional, which means that unlike the MulLogit and CumLogit models, the mean response model cannot uniquely estimate cell probabilities $\pi_j$. This model can only estimate the mean and the relationship (i.e., $\beta$ coefficients) between that mean and the covariates. This model, like the normal model, does not constrain the mean to be >0 or <$k$.
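The lack of a unique inverse for the mean response link is easy to see with a small numerical example. In this Python sketch, the function name and the two probability vectors are hypothetical values chosen only to illustrate the point.

```python
def mean_response(probs):
    """Mean response link: sum of j * pi_j over levels j = 0, 1, ..., k."""
    return sum(j * p for j, p in enumerate(probs))


# Two different sets of cell probabilities for counts 0-3 (hypothetical values).
probs_a = [0.50, 0.00, 0.50, 0.00]
probs_b = [0.25, 0.50, 0.25, 0.00]
mean_a = mean_response(probs_a)
mean_b = mean_response(probs_b)
print(mean_a, mean_b)
```

Both vectors map to the same mean, so knowing the fitted mean alone cannot recover which set of cell probabilities generated the data.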

Simulation Methods

We simulated data to mimic counts of juveniles fledged per nest for California spotted owls (Franklin et al. 2004). California spotted owls can fledge 0, 1, 2, or 3 young, and proportions of each of these categories were available for 16 years from Blakesley et al. (2010; Table 1). The variance/mean ratio provided one measure of how closely the observed data followed a Poisson distribution because perfect Poisson data have a ratio of one. By this measure, the observed data varied from close to Poisson (e.g., 2004) to far from Poisson (e.g., 1992 and 2001), with most variance/mean ratios >1.0.
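The variance/mean ratios can be approximately reproduced from the yearly proportions in Table 1. The Python sketch below does so for 1992 and 2001; the function name is an assumption, and because the published proportions are rounded to 3 decimals, results agree with the table only to about 2 decimals.

```python
def dispersion(probs):
    """Return (mean, variance, variance/mean) for counts 0..k with cell probabilities probs."""
    mean = sum(j * p for j, p in enumerate(probs))
    second_moment = sum(j * j * p for j, p in enumerate(probs))
    variance = second_moment - mean ** 2
    return mean, variance, variance / mean


# Pr(Y = 0), ..., Pr(Y = 3) as listed in Table 1 for two contrasting years.
mean_1992, var_1992, ratio_1992 = dispersion([0.105, 0.210, 0.343, 0.343])
mean_2001, var_2001, ratio_2001 = dispersion([0.919, 0.027, 0.054, 0.000])
print(round(ratio_1992, 3), round(ratio_2001, 3))
```

The 1992 ratio falls near 0.5 (underdispersed) and the 2001 ratio near 1.67 (overdispersed), matching the two extremes noted in the text.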

We ran 2 analyses, distinguished by different sets of covariates, using each of the 5 regression models outlined above. Each analysis contained 10 years of simulated data. The first analysis (hereafter t, for time model) included 9 indicator variables in each of the 5 regression models, where each indicator variable equaled one during a particular year and zero otherwise. That is, the t model contained covariates $t_1, t_2, \ldots, t_9$, where $t_i$ was one during year $i$ and zero otherwise. This model estimated a separate mean each year of the simulated study, and we detected trend if the overall 9 degrees of freedom test for each model was significant. The second analysis (hereafter T, for trend model) included one covariate containing year of the simulated study and fit a straight line through time. This model estimated one long-term linear trend in mean number of chicks fledged, useful for determining if a trend in fledging rates took place in the population being monitored. We detected trend if the slope coefficient of the line was significantly >0.

To simulate the categorical response data for the t model, we randomly selected with replacement a year from the observed data (Table 1), and generated a random number of fledged young as one random deviate from the multinomial distribution where we used associated proportions in the observed data (Table 1) as true probabilities in each level. We evaluated size (Type I error rate, nominally $\alpha = 0.05$) of all regression models twice, once by generating all responses for all 10 years using multinomial proportions for the year with the lowest proportion of zero young fledged (1992) and again by generating responses using proportions for the year with the highest proportion of zero young fledged (2001). We evaluated the size of the tests at these 2 levels of fledged young to check that rejection rates did not vary as a function of mean number fledged. We conducted additional simulations of the size of the Poisson model's test, and its relationship to the variance/mean ratio, under model t by generating multinomial observations from proportions in each of the 16 years of observed data (Table 1). We then related size of the Poisson test to the year's variance/mean ratio (Table 1) to evaluate the test's size as a function of the degree to which data violated the Poisson assumption.

Table 1. California spotted owl fecundity data from the southern Cascades and Sierra Nevada, California, USA, 1990–2005, we used to simulate data for evaluating power and size of 5 regression models for small counts. The Pr(Y = j) columns give the proportion of nests fledging j young.

| Yr | Pr(Y = 0) | Pr(Y = 1) | Pr(Y = 2) | Pr(Y = 3) | Mean (x̄) | Variance | Variance/x̄ | No. of nests |
|---|---|---|---|---|---|---|---|---|
| 1990 | 0.432 | 0.250 | 0.318 | 0.000 | 0.886 | 0.737 | 0.832 | 44 |
| 1991 | 0.758 | 0.129 | 0.113 | 0.000 | 0.355 | 0.455 | 1.282 | 62 |
| 1992 | 0.105 | 0.210 | 0.343 | 0.343 | 1.924 | 0.966 | 0.502 | 105 |
| 1993 | 0.545 | 0.216 | 0.239 | 0.000 | 0.693 | 0.690 | 0.995 | 88 |
| 1994 | 0.563 | 0.204 | 0.223 | 0.010 | 0.680 | 0.723 | 1.063 | 103 |
| 1995 | 0.897 | 0.060 | 0.043 | 0.000 | 0.147 | 0.211 | 1.442 | 116 |
| 1996 | 0.851 | 0.059 | 0.089 | 0.000 | 0.238 | 0.359 | 1.512 | 101 |
| 1997 | 0.833 | 0.063 | 0.094 | 0.010 | 0.281 | 0.452 | 1.608 | 96 |
| 1998 | 0.612 | 0.212 | 0.165 | 0.012 | 0.576 | 0.644 | 1.117 | 85 |
| 1999 | 0.612 | 0.165 | 0.212 | 0.012 | 0.624 | 0.729 | 1.169 | 85 |
| 2000 | 0.593 | 0.160 | 0.247 | 0.000 | 0.654 | 0.720 | 1.100 | 81 |
| 2001 | 0.919 | 0.027 | 0.054 | 0.000 | 0.135 | 0.225 | 1.665 | 74 |
| 2002 | 0.351 | 0.234 | 0.319 | 0.096 | 1.160 | 1.028 | 0.886 | 94 |
| 2003 | 0.663 | 0.141 | 0.196 | 0.000 | 0.533 | 0.640 | 1.202 | 92 |
| 2004 | 0.479 | 0.125 | 0.375 | 0.021 | 0.938 | 0.934 | 0.996 | 96 |
| 2005 | 0.861 | 0.069 | 0.069 | 0.000 | 0.208 | 0.304 | 1.458 | 72 |
| x̄ or total | 0.629 | 0.149 | 0.176 | 0.045 | 0.637 | 0.855 | 1.341 | 1,394 |

To simulate multinomial responses for the T model, we used the average proportions for each count from the observed data (Table 1) as a baseline set of proportions. To induce linear trend in the mean response, we multiplied the 3 proportions associated with 1, 2, and 3 young by an increasing factor $x$, where $x$ = 0.70, 0.76, 0.82, …, 1.24, and we subtracted the sum of these 3 modified proportions from one each year to obtain the proportion of nests with zero fledglings. We selected this effect size to produce power of approximately 0.8 for the larger simulated sample size (provided below) and considerably lower power for the smaller simulated sample size, yet with power greater than the Type I error rate. We evaluated size (Type I error rate, nominally $\alpha = 0.05$) of the T model's test for trend assuming the baseline set of multinomial proportions did not change (i.e., factor $x$ = 0). In this model, it was possible for the normal regression model to produce an estimated mean response <0 or >$k$, so we tabulated the proportion of times that the normal procedure predicted responses <0 in years 1 or 10, the extremes. It was not possible for any of the other regression models to predict a mean number fledged <0 or >$k$.
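The construction of trending proportions for the T model can be sketched as follows. This Python illustration uses the average proportions from Table 1 as the baseline; the function names, the seed, and the use of the standard library `random` module are assumptions, since the paper's simulations were run in SAS.

```python
import random

# Average Pr(Y = 1), Pr(Y = 2), Pr(Y = 3) from the last row of Table 1.
BASELINE_POSITIVE = [0.149, 0.176, 0.045]


def trend_probs(factor):
    """Scale the positive-count proportions by factor; the remainder goes to zero young."""
    positive = [factor * p for p in BASELINE_POSITIVE]
    return [1.0 - sum(positive)] + positive


def simulate_year(probs, n_nests, rng):
    """Draw n_nests counts (0-3) from the multinomial cell probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=n_nests)


factors = [0.70 + 0.06 * t for t in range(10)]  # 0.70, 0.76, ..., 1.24
yearly_probs = [trend_probs(f) for f in factors]
yearly_means = [sum(j * p for j, p in enumerate(pr)) for pr in yearly_probs]

rng = random.Random(2010)  # arbitrary seed for repeatability
sample = simulate_year(yearly_probs[0], n_nests=50, rng=rng)
print([round(m, 3) for m in yearly_means])
```

Because only the positive-count proportions are scaled, the expected count in each year is the factor times the baseline sum of $j\,\pi_j$, so the mean rises linearly across the 10 years.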

We evaluated size and power of all regression models under both data-rich ($n$ = 50 nests/yr) and data-sparse ($n$ = 10 nests/yr) situations. We chose the low sample size to provide test power greater than Type I error rates, but not as high as typically desired power levels of ≥0.8. Every simulation involved generating 10,000 random sets of data, applying the regression model to each, and counting the number of times the regression model rejected the null hypothesis of zero trend. For the follow-on simulations evaluating size of the Poisson procedure as it relates to the degree to which the Poisson assumption was violated, we simulated 50 nests per year and generated 1,000, rather than 10,000, data sets due to the length of computations involved. Because it was theoretically possible for maximization to fail under all nonnormal regression models, and because no estimates were produced in this case, we tabulated the number of iterations during which each regression model failed to converge.

RESULTS

Variance/mean ratio in the actual owl data varied from 0.502 to 1.665 (Table 1), the extremes of which represent violations of the distributional assumption for a Poisson model. Size results for model t (Table 2) clearly demonstrate that the Poisson regression model performed improperly for non-Poisson distributed count data. Using 1992 data wherein proportion of zeros was low, the Poisson model rejected too infrequently. Using 2001 data wherein proportion of zeros was high, the Poisson model rejected too frequently. The CatMeans model also performed poorly under model t. In these simulations, the CatMeans model either rejected too often or failed to produce estimates, even in the data-rich situation. Under model t, the CumLogit procedure had approximately correct size when proportion of zeros was small [i.e., 1992 data with Pr(Y = 0) = 0.105] but rejected far too infrequently when proportion of zeros was high [i.e., 2001 data with Pr(Y = 0) = 0.919]. When the MulLogit procedure estimated model t, it either failed too often to be useful or failed to reject the proper number of 10,000 data sets. Only the normal regression demonstrated proper performance across all 4 cases simulated under model t (Table 2).

Size results under model T were generally better than those under model t (Table 3). For estimating the linear trend in model T, only the CatMeans model exhibited poor behavior by either rejecting too frequently or failing to produce estimates. The other 4 regression procedures exhibited approximately correct size, although rejection rate of the Poisson regression analysis was borderline too low.

When we included differences in fledging rates acrossyears in data simulated under model t, power of the normal

Table 3. Simulated size of nominal a 5 0.05 tests under model T (lineartrend through time) based on 10,000 simulations assuming the averagemultinomial probabilities from Table 1 were unchanged over 10 years. Weconducted simulations with sparse (n 5 10 nests/yr) and rich (n 5 50 nests/yr) data sets. Proper number of rejections in all cases to achieve correct sizewas 500.

Estimator

n = 10 n = 50

Accept Reject Failure Accept Reject Failure

CatMeansa 7,529 1,580 891 9,297 703 0CumLogitb 9,505 495 0 9,496 504 0MulLogitc 9,506 494 0 9,496 504 0Normal 9,501 499 0 9,494 506 0Poisson 9,517 483 0 9,525 475 0

a Multinomial Mean Response Model.b Cumulative Multinomial Model.c Constrained Multinomial Regression Model.

Table 2. Simulated size of nominal a 5 0.05 of tests under model t (categorical effects) based on 10,000 simulations with the 1992 [Pr(Y 5 0) 5 0.105] and2001 [Pr(Y 5 0) 5 0.919] years from Table 1 for 10 years of data with sparse (n 5 10 nests/yr) and rich (n 5 50 nests/yr) data sets. Proper number ofrejections in all cases to achieve correct size was 500.

Estimator

Pr(Y = 0) = 0.105 Pr(Y = 0) = 0.919

n = 10 n = 50 n = 10 n = 50

Accept Reject Failure Accept Reject Failure Accept Reject Failure Accept Reject Failure

CatMeansa 7,701 2,297 2 9,256 744 0 29 0 9,971 7,438 1,224 1,338CumLogitb 9,577 423 0 9,510 490 0 9,999 0 1 9,923 77 0MulLogitc 255 0 9,745 9,546 152 302 45 0 9,955 8,700 69 1,231Normal 9,535 465 0 9,510 490 0 9,531 468 1 9,522 478 0Poisson 9,928 72 0 9,947 53 0 1,167 1,812 7,021 3,276 6,722 2

a Multinomial Mean Response Model.b Cumulative Multinomial Model.c Constrained Multinomial Regression Model.

518 The Journal of Wildlife Management N 74(3)

regression model was far superior to that of the other 4 procedures in the data-sparse (n = 10 observations/yr) situation (Table 4). Power was approximately the same for all 5 procedures in the data-rich situation (n = 50 observations/yr). The Poisson and CumLogit models performed similarly in both the data-sparse and data-rich situations, whereas the CatMeans and MulLogit procedures resulted in an unacceptable number of failures in the data-sparse setting (Table 4). Both the CatMeans and MulLogit models failed under model t when ≥1 year in the simulated data contained only one response level. This happened, for example, when all 10 responses in a particular year were zero, which in turn happened most often when the proportion of zeros was high (e.g., 2001).
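The frequency of this failure mode can be approximated directly. The Python sketch below computes the probability that at least one of 10 years contains only zeros when every year shares Pr(Y = 0) = 0.919 (the 2001 value used in Table 2). It assumes independent nests and years and ignores other single-level outcomes (e.g., all counts equal to 1), so it is an approximation and not a calculation reported in the study. The function name is hypothetical.

```python
def prob_any_year_all_zero(p_zero, n_nests, n_years=10):
    """Probability that at least one year has all n_nests counts equal to zero."""
    p_year_all_zero = p_zero ** n_nests
    return 1.0 - (1.0 - p_year_all_zero) ** n_years

for n in (10, 50):
    p = prob_any_year_all_zero(0.919, n)
    print(f"n = {n}: P(at least one all-zero year) = {p:.3f}")
```

Under these assumptions the probabilities are roughly 0.996 for n = 10 and 0.137 for n = 50, which is broadly in line with the CatMeans and MulLogit failure counts in the Pr(Y = 0) = 0.919 columns of Table 2.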

Performance of all 5 regression procedures was similar when we inserted a positive linear trend into model T (Table 5). The CatMeans procedure failed regularly for n = 10 data. In this case, power of the CumLogit and MulLogit procedures was slightly greater than that of the normal and Poisson procedures. Given that the way we generated our simulated data in this case mimicked the theoretical model behind both the CumLogit and MulLogit models, this was not surprising. It was surprising that power of both the normal and Poisson procedures was so close to that of the CumLogit and MulLogit procedures. In this situation, in which we expected the multinomial models to perform well, power of the normal model was only approximately 9% less.

For the scenarios simulated in Table 3 (model T, n = 10 and n = 50), the means predicted by the normal model for years 1 or 10 were never <0 or >3. Even when we changed the simulation to a scenario in which we might expect a large number of negative estimates from the normal model [i.e., high Pr(Y = 0) in 2001], the normal trend test still exhibited a low rate of predicting means <0, with 917/10,000 (9.2%) of simulations with n = 10 observations/year and 1/10,000 (0.01%) of simulations with n = 50 observations/year having predicted means <0 in year 1 or year 10.

Follow-on simulations to investigate the relationship between a measure of the degree to which data follow the Poisson distribution (i.e., the variance/mean ratio) and size of the Poisson procedure’s test found that the Poisson test performed appropriately only for variance/mean ratios from slightly >1.0 to approximately 1.2 (Fig. 1). Size of the Poisson test for variance/mean ratios outside this range was either too small (ratio <1) or too large (ratio >1.2). These follow-on simulations corroborate and refine our results (Table 2), where 1992 data with Pr(Y = 0) = 0.105 correspond to a ratio of 0.502 and 2001 data with Pr(Y = 0) = 0.919 correspond to a ratio of 1.665.
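The variance/mean ratio of a bounded count distribution follows directly from its category probabilities. The sketch below shows the calculation in Python for counts 0 through 3. The probability vectors are hypothetical illustrations, not the values in Table 1, and the function name is an assumption of this sketch.

```python
def variance_mean_ratio(probs):
    """Variance/mean ratio for a count taking values 0, 1, ..., len(probs) - 1."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(y * p for y, p in enumerate(probs))
    second_moment = sum(y * y * p for y, p in enumerate(probs))
    variance = second_moment - mean ** 2
    return variance / mean

# Hypothetical probabilities for Pr(Y = 0), ..., Pr(Y = 3).
few_zeros = [0.10, 0.30, 0.50, 0.10]   # counts concentrated near the mean
many_zeros = [0.90, 0.02, 0.05, 0.03]  # mostly zeros with occasional broods

print(round(variance_mean_ratio(few_zeros), 3))
print(round(variance_mean_ratio(many_zeros), 3))
```

In this sketch the first vector is underdispersed relative to the Poisson distribution (ratio <1) and the second is overdispersed (ratio >1.2), the 2 regions in which the Poisson test had incorrect size.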

DISCUSSION

Overall, the normal model performed appropriately for all the simulations reported here, whereas the other 4 regression procedures exhibited problems with ≥1 scenario. The normal model almost always yielded answers, had correct test size in all cases, and only in a worst-case scenario yielded a few estimates outside the native range of the data. The only simulations in which a regression procedure had greater power than the normal model were those under the linear trend model, when the cumulative logit and multinomial logit models were slightly more powerful. However, as stated in the results, we expected the multinomial models to do well in this case because they had the advantage that data were generated in a manner that was nearly identical to the assumptions of these models.

For the data simulated in this evaluation, the normal models seldom predicted mean values <0. However, the worst-case scenario demonstrated that this issue could

Table 4. Power of 5 regression models for small counts under model t (categorical effects) based on 10,000 simulations of data containing 10 years of data, where we selected each year at random with replacement from the 16 years of observed data in Table 1. We conducted both data-sparse (n = 10) and data-rich (n = 50) simulations. Higher numbers of rejections indicate higher power. We conducted all tests with α = 0.05.

                      n = 10                     n = 50
Estimator     Accept   Reject  Failure   Accept   Reject  Failure
CatMeans^a       597    3,323    6,070       12    9,869        0
CumLogit^b     4,639    5,361        0       19    9,981        0
MulLogit^c     2,066    1,032    6,902       27    9,830        0
Normal         2,023    7,977        0       19    9,981        0
Poisson        1,512    5,881        0       22    9,956        2

a Multinomial Mean Response Model.
b Cumulative Multinomial Model.
c Constrained Multinomial Regression Model.

Table 5. Power of 5 regression models for small counts under model T (linear trend through time) based on 10,000 simulations where we modified average multinomial probabilities in the example data set to produce a positive trend over 10 years. We conducted both data-sparse (n = 10 nests/yr) and data-rich (n = 50 nests/yr) simulations. Higher numbers of rejections indicate higher power. We conducted all tests with α = 0.05.

                      n = 10                     n = 50
Estimator     Accept   Reject  Failure   Accept   Reject  Failure
CatMeans^a     5,325    3,017    1,558    1,906    8,094        0
CumLogit^b     7,478    2,522        0    1,632    8,368        0
MulLogit^c     7,379    2,621        0    1,440    8,560        0
Normal         7,709    2,291        0    2,210    7,790        0
Poisson        7,716    2,284        0    2,237    7,763        0

a Multinomial Mean Response Model.
b Cumulative Multinomial Model.
c Constrained Multinomial Regression Model.

Figure 1. Size of nominally 5% Poisson regression tests under model t (categorical effects) assuming probability levels for each of 16 years were equal to those in the example data set. We simulated each scenario 1,000 times with n = 50 observations per year.

McDonald and White N Regression Models for Small Count 519

become a problem. One solution to this problem would be to use a positive link function, such as μ_i = exp(x_i β), in the normal model. This link function would force the estimated means ≥0 but could also result in estimated values >k. Another solution would be to treat observed data as strictly binomial data with categories of 0 and >0. In this case, because responses are small bounded counts, it is likely that little information would be lost in combining counts of 1, 2, and 3.
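Both remedies are simple to express in code. The following Python sketch, using hypothetical coefficients and counts chosen only for illustration, shows that a log link keeps every predicted mean positive yet can exceed the upper bound k, and that collapsing counts to 0 versus >0 is a one-line recoding. The function names are assumptions of this sketch.

```python
import math

K = 3  # upper bound on counts, as in the owl example

def log_link_means(years, intercept, slope):
    """Predicted means under mu_i = exp(intercept + slope * year_i)."""
    return [math.exp(intercept + slope * year) for year in years]

def collapse_to_binary(counts):
    """Recode counts as 0 (no young) or 1 (at least one young)."""
    return [1 if count > 0 else 0 for count in counts]

years = list(range(1, 11))
# Hypothetical coefficients chosen only to illustrate the behavior.
means = log_link_means(years, intercept=-1.0, slope=0.25)
print([round(m, 2) for m in means])
print("all positive:", all(m > 0 for m in means))
print("any above k:", any(m > K for m in means))

print(collapse_to_binary([0, 2, 1, 0, 3]))
```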

Performance of the Poisson model was inadequate in all situations where variance/mean ratios were outside the narrow range (1.0, 1.2). This result corroborates previous results (White and Bennetts 1996) and suggests that Poisson regression is not robust to departures from the Poisson distribution. Even when data are almost perfectly Poisson, it appears that the normal model performs just as well as the Poisson model. Our results suggest that the Poisson regression model should only be used when data are clearly distributed as Poisson, although this assumption would appear to be difficult to assess and justify in many ecological situations. We qualify this remark by noting that we have only simulated performance of the Poisson under 2 models, a categorical effects model and a linear trend model. It is possible, although we feel it unlikely, that the Poisson model could perform better than or equivalent to the normal model in certain situations, particularly those where counts are larger.

Models based on the multinomial distribution failed more often than the normal model and were plagued by data sparseness. In particular, these models failed when no observations occurred in ≥1 level of the multinomial response. The owl data we used in our simulations (Table 1) frequently did not produce an observation of a level during ≥1 year because the owl data contained some small [e.g., 0.043 = Pr(Y = 2) in 1995] and sometimes zero [e.g., Pr(Y = 3) in 1990] probabilities for certain levels. When we fitted the MulLogit t model to data with missing levels, the logit link function used in these models did not allow estimates of p_ij to be exactly zero in missing cells. Consequently, coefficients in the model were infinite and the overall chi-square test failed. Even when cells were not missing and the multinomial models did converge, they generally displayed power equivalent to or only slightly better than that of the normal model. Given the multinomial models’ propensity for failure in what we consider a realistic simulation, and the more or less equivalent performance of the normal model, we see no reason to advocate use of these more complex models to analyze the type of data we simulated here. Our analyses support the use of analysis of variance (ANOVA) procedures as conducted by Anthony et al. (2006) and Franklin et al. (1999) in their meta-analysis of northern spotted owl fecundity.

MANAGEMENT IMPLICATIONS

Monitoring programs such as those we illustrated here with the California spotted owl often use counts of young to examine population status. These counts are characterized by small integers that biologically cannot exceed some upper bound. We found the simplest procedures, regular regression and ANOVA procedures based on a normal distribution, to perform satisfactorily in all cases we considered, whereas the failure rate of the multinomial procedures was often excessively high, and the Poisson model demonstrated inappropriate test size for data where the variance/mean ratio was <1 or >1.2. Thus, managers can use simple statistical methods with which they are likely already familiar to analyze the kinds of count data we described here.

ACKNOWLEDGMENTS

We thank M. Conner and J. Blakesley for suggesting this project as part of the California spotted owl meta-analysis conducted at Utah State University 3–7 January 2006. A. Franklin summarized the data (Table 1) provided by cooperators participating in the meta-analysis workshop.

LITERATURE CITED

Agresti, A. 2002. Categorical data analysis. Second edition. Wiley-Interscience, New York, New York, USA.

Aitchison, J., and S. D. Silvey. 1957. The generalization of probit analysis to the case of multiple responses. Biometrika 44:131–140.

Anthony, R. G., E. D. Forsman, A. B. Franklin, D. R. Anderson, K. P. Burnham, G. C. White, C. J. Schwarz, J. D. Nichols, J. E. Hines, G. S. Olson, S. H. Ackers, L. S. Andrews, B. L. Biswell, P. C. Carlson, L. V. Diller, K. M. Dugger, K. E. Fehring, T. L. Fleming, R. P. Gerhardt, S. A. Gremel, R. J. Gutierrez, P. J. Happe, D. R. Herter, J. M. Higley, R. B. Horn, L. L. Irwin, P. J. Loschl, J. A. Reid, and S. G. Sovern. 2006. Status and trends in demography of northern spotted owls, 1985–2003. Wildlife Monographs 163.

Ashford, J. R. 1959. An approach to the analysis of data for semi-quantal responses in biology. Biometrics 15:573–581.

Blakesley, J. A., M. E. Seamans, M. M. Conner, A. B. Franklin, G. C. White, R. J. Gutierrez, J. E. Hines, J. D. Nichols, T. E. Munton, D. W. H. Shaw, J. J. Keane, G. N. Steger, and T. L. McDonald. 2010. Population dynamics of spotted owls in the Sierra Nevada, California. Wildlife Monographs. In press.

Cox, D. R., and E. J. Snell. 1989. The analysis of binary data. Second edition. Chapman and Hall, London, United Kingdom.

Franklin, A. B., K. P. Burnham, G. C. White, R. G. Anthony, E. D. Forsman, C. Schwarz, J. D. Nichols, and J. Hines. 1999. Range-wide status and trends in northern spotted owl populations. Colorado Cooperative Fish and Wildlife Research Unit, Colorado State University, Fort Collins, USA.

Franklin, A. B., R. J. Gutierrez, J. D. Nichols, M. E. Seamans, G. C. White, G. S. Zimmerman, J. E. Hines, T. E. Munton, W. S. LaHaye, J. A. Blakesley, G. N. Steger, B. R. Noon, D. W. H. Shaw, J. J. Keane, T. L. McDonald, and S. Britting. 2004. Population dynamics of the California spotted owl (Strix occidentalis occidentalis): a meta-analysis. Ornithological Monograph 54:1–54.

Martin, T. G., B. A. Wintle, J. R. Rhodes, P. M. Kuhnert, S. A. Field, S. J. Low-Choy, A. J. Tyre, and H. P. Possingham. 2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters 8:1235–1246.

McCullagh, P., and J. A. Nelder. 1989. Generalized linear models. Second edition. Chapman and Hall, London, United Kingdom.

Neter, J., W. Wasserman, and M. H. Kutner. 1985. Applied linear statistical models, regression, analysis of variance, and experimental designs. Richard Irwin, Homewood, Illinois, USA.

Puig, P., and J. Valero. 2006. Count data distributions: some characterizations with applications. Journal of the American Statistical Association 101:332–340.

Walker, S. H., and D. B. Duncan. 1967. Estimation of the probability of an event as a function of several independent variables. Biometrika 54:167–179.


White, G. C., and R. E. Bennetts. 1996. Analysis of frequency count data using the negative binomial distribution. Ecology 77:2549–2557.

APPENDIX

We present the SAS code we used to provide maximum likelihood estimates for the 5 models we considered. We specify k = 3, and define the variables count containing the count responses, x1 a continuous covariate (such as year), and x2 a categorical covariate (such as age class).

Normal Regression Model

We used SAS Proc Reg to provide maximum likelihood estimates for the normal regression model with continuous covariates, and we used Proc Anova for categorical covariates.

    proc reg;
      model count = x1;

    proc anova;
      class x2;
      model count = x2;

Poisson Regression Model

We used SAS Proc Genmod to provide maximum likelihood estimates for the Poisson regression model.

    proc genmod;
      class x2;
      model count = x1 x2 / link=log dist=poisson type3 dscale;

Constrained Multinomial Regression (MulLogit) Model

We carried out maximization of the likelihood using Proc Catmod in SAS. By default, Catmod places the last level of a multinomial response in the denominator of the link function. We desired that level zero be placed in the denominator of the link function, and effected this change in SAS by recoding all zero counts to k + 1. The SAS statements to estimate the multinomial model were

    %let k=3;

    data example;
      set example;
      if count=0 then count=&k+1;

    proc catmod;
      direct x1;
      response logits;
      model count = _response_ x1 x2;

The _response_ keyword of the model statement was necessary to force the proportional odds assumption of the model (i.e., to estimate common slopes). Removing this keyword estimates a nonproportional odds model with (p + 1)k parameters (for more on this model see Agresti 2002).

Cumulative Multinomial (CumLogit) Model

Log likelihood for the cumulative models was identical to the regular multinomial (i.e., eq 7). We carried out actual estimation of the model using Proc Logistic in SAS. We left counts in their original order. Assuming again that variable count contained the count responses, x1 was a continuous covariate, and x2 was a categorical covariate, the SAS statements to estimate the cumulative multinomial model were

    proc logistic;
      class x2;
      model count = x1 x2;

Multinomial Mean Response (CatMeans) Model

Log likelihood for the multinomial mean response models was identical to the regular multinomial (i.e., eq 7). We carried out actual estimation of the model using Proc Catmod in SAS. We left counts in their original order. Assuming variables are defined as above, the SAS statements to estimate the mean response model were

    proc catmod;
      direct x1;
      response mean;
      model count = x1 x2;

Associate Editor: Steidl.
