Semiparametric transformation models for multivariate panel count data with dependent observation process

Ni Li, Do-Hwan Park, Jianguo Sun*, and KyungMann Kim
Department of Statistics, University of Missouri, Columbia, MO 65211, USA

Abstract

This article discusses regression analysis of multivariate panel count data in which the observation process may contain relevant information about, or be related to, the underlying recurrent event processes of interest. Such data occur if a recurrent event study involves several related types of recurrent events and the observation scheme or process may be subject-specific. For the problem, a class of semiparametric transformation models is presented, which provides great flexibility for modelling the effects of covariates on the recurrent event processes. For estimation of regression parameters, an estimating equation-based inference procedure is developed and the asymptotic properties of the resulting estimates are established. The proposed approach is also evaluated by simulation studies and applied to data arising from a skin cancer chemoprevention trial.

Key words and phrases

Counting processes; multivariate data analysis; panel count data; transformation models

1. INTRODUCTION

Recurrent event studies concern occurrence rates or patterns of recurrent events. In this article, we consider the analysis of recurrent event studies that involve several related types of recurrent events and in which each subject is observed only at a finite number of discrete time points instead of continuously. Furthermore, the observation times or process may be related to, or contain relevant information about, the underlying recurrent event processes of interest. In these situations, only the numbers of events that occur between observation times are known and no information is available on subjects between the observation time points. The areas that often produce such multivariate panel count data include prospective cohort studies, population-based epidemiological studies, reliability studies, and tumorigenicity experiments, since in these situations it is usually either impossible or impractical to maintain continuous observation of subjects.

An example of multivariate panel count data is given by a skin cancer chemoprevention trial conducted by the University of Wisconsin Comprehensive Cancer Center in Madison, Wisconsin. It was a double-blinded, placebo-controlled, randomized Phase III clinical trial. The primary objective of this trial was to evaluate the effectiveness of 0.5 g/m²/day PO difluoromethylornithine (DFMO) in reducing new skin cancers in a population of patients with a history of non-melanoma skin cancers: basal cell carcinoma and squamous cell carcinoma. During the study, the patients were scheduled to be assessed or observed every 6 months for the development of new skin cancers. As expected, however, the actual observation times differ from patient to patient, and so do the follow-up times. The study consists of 291 patients randomized to either the placebo group (147) or the DFMO group (144), and the data include the numbers of occurrences of both basal cell carcinoma and squamous cell carcinoma between observation times. More details about the study will be given below. He et al. (2008) discussed another example of multivariate panel count data, from a cohort study of the recurrence of joint damage in patients with psoriatic arthritis conducted at the University of Toronto Psoriatic Arthritis Clinic.

© 2011 Statistical Society of Canada
*Author to whom correspondence may be addressed. [email protected].

NIH Public Access Author Manuscript. Can J Stat. Author manuscript; available in PMC 2012 June 06.

Published in final edited form as: Can J Stat. 2011 September; 39(3): 458–474. doi:10.1002/cjs.10118.

A number of statistical procedures have been developed for univariate panel count data when the observation process can be regarded as being independent of the underlying recurrent event process (Sun & Kalbfleisch, 1993, 1995; Sun & Wei, 2000; Thall & Lachin, 1988; Zhang, 2002). For the situation where the two processes may be related, He, Tong, & Sun (2009), Huang, Wang, & Zhang (2006), and Sun, Tong, & He (2007), among others, proposed joint modelling approaches that model the relationship between the underlying recurrent event process and the observation process through the use of random effects. However, these approaches only apply to limited situations. For multivariate panel count data, Chen et al. (2005) proposed two approaches based on a mixed Poisson model with piecewise constant baseline intensities. One approach assumes that the different types of recurrent events are related through multivariate log-normal random effects and bases inference on the resulting full likelihood, while the other makes use of the marginal model approach. He et al. (2008) also discussed the same problem and proposed a marginal model approach. However, in both cases, the methods developed only apply to situations where the recurrent event processes and the observation process are independent, either completely or conditional on covariates.

The remainder of the article is organized as follows. We will begin in Section 2 by defining the notation and describing the models used throughout the article. In particular, a class of semiparametric transformation models is presented for modelling covariate effects and the dependence of the underlying recurrent event processes of interest on the observation process. A proportional rates model is employed for modelling the effect of covariates on the observation process. Note that here we focus on marginal models, and one major advantage of these models is that they leave the dependence structures for related types of recurrent events completely arbitrary. More comments on this are given below. In Section 3, some estimating equations are proposed for estimation of regression parameters and the asymptotic properties of the resulting estimates are established. Section 4 presents some simulation results obtained for the evaluation of the proposed estimates, and they suggest that the approach works well for practical situations. The method is applied in Section 5 to the bivariate panel count data from the skin cancer chemoprevention trial discussed above, and some concluding remarks are given in Section 6.

2. STATISTICAL MODELS

Consider a study that consists of n independent subjects experiencing some recurrent events and suppose that each subject may experience K different types of events. Let Yik(t) denote the point process that represents the total number of occurrences of the kth type of recurrent event up to time t for subject i, and let 0 < tik,1 < … < tik,mik denote the potential time points at which subject i is observed on the kth type of recurrent event. Also let Zi(t) be a covariate process and Ci the follow-up or censoring time for subject i, i = 1, …, n. Note that here, for simplicity, we assume that Zi(t) and Ci are the same for the different types of recurrent events. Define

$$N_{ik}(t) = \sum_{l=1}^{m_{ik}} I(t_{ik,l} \le t),$$

a counting process that records the number of observations on subject i with respect to the kth type of recurrent event up to time t. Then the process Yik(t) is observed only at the time points where Nik(t) jumps. In reality, Yik(t) is observed at tik,l only if tik,l ≤ Ci ≤ τ, where τ denotes the longest follow-up time. Define Ñik(t) = Nik{min(t, Ci)}, the actual observation process for subject i with respect to the kth type of recurrent event, and m̃ik = Ñik(τ), the total number of actual observations on subject i with respect to the kth type of recurrent event. It is easy to see that Δi(t)dNik(t) = dÑik(t), where Δi(t) = I(Ci ≥ t).
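As a minimal sketch of the notation above, the following Python fragment evaluates the observation counting process Nik(t), its censored version Ñik(t), and the at-risk indicator Δi(t) for one subject and one event type. The function names and the example values are hypothetical and are not part of the original article.

```python
import numpy as np

def obs_count(times, t):
    """N(t): number of potential observation times that are <= t."""
    ordered = np.sort(np.asarray(times, dtype=float))
    return int(np.searchsorted(ordered, t, side="right"))

def actual_obs_count(times, t, c):
    """N~(t) = N(min(t, C)): observations actually made by time t."""
    return obs_count(times, min(t, c))

def at_risk(t, c):
    """Delta(t) = I(C >= t): 1 while the subject is still under follow-up."""
    return int(c >= t)

# Hypothetical potential observation times and follow-up time for one subject.
times = [0.5, 1.2, 2.0, 3.7]
c = 2.5
print(obs_count(times, 4.0))            # counts all potential times
print(actual_obs_count(times, 4.0, c))  # counts only the times up to C
```

Only the jumps of Ñik(t) correspond to visits at which the panel count Yik(t) is actually recorded.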

For the dependence of the observation process on covariates, following Huang, Wang, & Zhang (2006) and Sun, Tong, & He (2007), we will assume that Nik(t) is a nonhomogeneous Poisson process with

$$E\{dN_{ik}(t) \mid Z_i(t)\} = \exp\{\gamma' Z_i(t)\}\, d\Lambda_{0k}(t), \tag{1}$$

i = 1, …, n. Here γ is a vector of unknown regression parameters and Λ0k(t) is an arbitrary, unknown nondecreasing function, representing the mean cumulative number of observations by time t. In the following, we will assume that dΛ0k(t) = λ0k(t)dt. Model (1) is often referred to as the proportional rate model (Cook & Lawless, 2007).

For subject i, define ℱikt = {Nik(s), 0 ≤ s < t}, the history or filtration of the observation process Nik up to time t−, i = 1, …, n. To characterize the relationship between the recurrent event process Yik(t) and the covariate process Zi(t), we will assume that the mean function of Yik(t) is specified by the following semiparametric transformation model

$$E\{Y_{ik}(t) \mid Z_i(t), \mathcal{F}_{ikt}\} = g\bigl[\mu_{0k}(t)\exp\{\beta' Z_i(t) + \alpha' H(\mathcal{F}_{ikt})\}\bigr] \tag{2}$$

given Zi(t) and ℱikt. In the model above, g is a known, twice continuously differentiable, and strictly increasing function, μ0k denotes an unspecified smooth function of t, β and α are vectors of unknown regression parameters, and H(·) is a vector of known functions of ℱikt. Model (2) assumes that the observation process Nik(t) may be informative, or contain relevant information, about the underlying recurrent event process Yik(t), and that Yik(t) depends on Nik(t) through α. The goal is to make inference about β and α.
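To make the role of the link function concrete, the following Python sketch evaluates a transformation-model mean for given covariate and history values. It assumes the multiplicative form g{μ0k(t) exp(β′Z + α′H)} used by Lin, Wei, & Ying (2001); the function name `transformed_mean`, the dictionary `LINKS`, and all numerical values are hypothetical illustrations.

```python
import numpy as np

# Candidate link functions g of the kind considered later in the article.
LINKS = {
    "identity": lambda x: x,
    "square": lambda x: x ** 2,
    "log": np.log,
}

def transformed_mean(t, z, h, beta, alpha, mu0, g):
    """Evaluate g{ mu0(t) * exp(beta'z + alpha'h) } (assumed form of the model)."""
    linear = np.dot(beta, z) + np.dot(alpha, h)
    return g(mu0(t) * np.exp(linear))

# With g = log and mu0 = exp, the mean reduces to t + beta'z + alpha'h.
value = transformed_mean(t=1.0, z=[1.0], h=[2.0], beta=[0.5], alpha=[0.1],
                         mu0=np.exp, g=LINKS["log"])
print(value)
```

The last call illustrates that the log link turns the multiplicative structure into an additive one, which is why that choice is computationally convenient.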

For the function vector H in model (2), as discussed in Sun et al. (2005), it can have different forms depending on the dependence of Yik(t) on Nik(t). For example, one may take H(ℱikt) = Nik(t−) if it is believed that Yik(t) may depend on the total number of observations with respect to the kth type of recurrent event up to time t. This could be the case in a medical study in which patients may pay more visits to their doctors because they feel worse than usual, either with or without treatment.

3. INFERENCE PROCEDURES

In this section, we will present the estimation procedure for the regression parameters β and α as well as the other parameters. For this, let β0, α0, and γ0 denote the true values of β, α, and γ. Also let Xik(t) = (Zi(t)′, H(ℱikt)′)′ and θ = (β′, α′)′ for ease of presentation. Define

$$M_{ik}(t; \beta, \alpha, \gamma) = \int_0^t \Delta_i(s)\Bigl[ Y_{ik}(s)\,dN_{ik}(s) - g\bigl\{\mu_{0k}(s)\,e^{\theta' X_{ik}(s)}\bigr\}\,e^{\gamma' Z_i(s)}\,d\Lambda_{0k}(s) \Bigr],$$

i = 1, …, n. Then under models (1) and (2), the Mik(t; β0, α0, γ0)'s are zero-mean stochastic processes. This suggests that if β, α, γ, and Λ0k are known, one can estimate μ0k(t) by the solution to

$$\sum_{i=1}^{n} dM_{ik}(t; \beta, \alpha, \gamma) = 0, \qquad 0 \le t \le \tau. \tag{3}$$

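Also it is natural to estimate θ by the estimating equation

$$U(\theta; \gamma) = \sum_{i=1}^{n}\sum_{k=1}^{K}\int_0^{\tau} W(t)\,X_{ik}(t)\,dM_{ik}(t; \beta, \alpha, \gamma) = 0, \tag{4}$$

where W(t) is a possibly data-dependent weight function.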

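Of course, in general, γ and Λ0k(t) are unknown. Fortunately, we have recurrent event data on the Nik(t)'s, and in this case a consistent estimator of γ is given by the solution, say γ̂, to the estimating equation

$$\sum_{i=1}^{n}\sum_{k=1}^{K}\int_0^{\tau} \{Z_i(t) - \bar Z(t; \gamma)\}\,\Delta_i(t)\,dN_{ik}(t) = 0$$

(Andersen et al., 1993; Cook & Lawless, 2007). In the above, Z̄(t; γ) = S1(t; γ)/S0(t; γ) with

$$S_j(t; \gamma) = \frac{1}{n}\sum_{i=1}^{n} \Delta_i(t)\,Z_i(t)^{\otimes j}\,e^{\gamma' Z_i(t)},$$

j = 0, 1, where υ^{⊗0} = 1 and υ^{⊗1} = υ. Furthermore, Λ0k(t) can be estimated by

$$\hat\Lambda_{0k}(t; \gamma) = \sum_{i=1}^{n}\int_0^{t} \frac{\Delta_i(s)\,dN_{ik}(s)}{n\,S_0(s; \gamma)}$$

with γ replaced by γ̂.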
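The estimation of γ and of the baseline Λ0k can be sketched numerically as follows. This is an illustration only: it assumes time-independent covariates, a single event type, the standard proportional rate score with a Breslow-type baseline estimator, and a plain Newton iteration. The function names and the simulated data are hypothetical and do not come from the article.

```python
import numpy as np

def score_and_info(gamma, z, c, times):
    """Score U(gamma) and its negative derivative for a proportional rate model.

    z     : (n, p) array of time-independent covariates
    c     : (n,) array of follow-up times
    times : list of n arrays; times[i] holds the observation times of
            subject i, all assumed to be <= c[i]
    """
    w = np.exp(z @ gamma)                    # exp(gamma' Z_j) for every subject
    p = z.shape[1]
    u = np.zeros(p)
    info = np.zeros((p, p))
    for i, subject_times in enumerate(times):
        for t in subject_times:
            risk = c >= t                    # Delta_j(t) = I(C_j >= t)
            wr, zr = w[risk], z[risk]
            s0 = wr.sum()
            s1 = wr @ zr                     # sum_j w_j Z_j
            s2 = (zr * wr[:, None]).T @ zr   # sum_j w_j Z_j Z_j'
            zbar = s1 / s0
            u += z[i] - zbar
            info += s2 / s0 - np.outer(zbar, zbar)
    return u, info

def fit_gamma(z, c, times, n_iter=25, tol=1e-10):
    """Solve U(gamma) = 0 by Newton's method, starting from gamma = 0."""
    gamma = np.zeros(z.shape[1])
    for _ in range(n_iter):
        u, info = score_and_info(gamma, z, c, times)
        step = np.linalg.solve(info, u)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

def baseline_hat(t, gamma, z, c, times):
    """Breslow-type estimate: sum over observation times s <= t of
    1 / sum_j Delta_j(s) exp(gamma' Z_j)."""
    w = np.exp(z @ gamma)
    total = 0.0
    for subject_times in times:
        for s in subject_times:
            if s <= t:
                total += 1.0 / w[c >= s].sum()
    return total

# Hypothetical data: binary covariate, uniform follow-up, homogeneous baseline.
rng = np.random.default_rng(2011)
n, tau = 100, 1.0
z = rng.binomial(1, 0.5, size=(n, 1)).astype(float)
c = rng.uniform(tau / 2, tau, size=n)
times = [np.sort(rng.uniform(0.0, c[i],
                             size=rng.poisson(10.0 * c[i] * np.exp(0.5 * z[i, 0]))))
         for i in range(n)]
gamma_hat = fit_gamma(z, c, times)
print(gamma_hat, baseline_hat(0.5, gamma_hat, z, c, times))
```

The double loop keeps the correspondence with the formulas explicit; a production implementation would sort the pooled observation times once and update the risk-set sums incrementally.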

Let θ̂ = (β̂′, α̂′)′ and μ̂0k(t) denote the estimates of θ and μ0k(t) given by the estimating Equations (3) and (4) with γ and Λ0k(t) replaced by γ̂ and Λ̂0k(t; γ̂), respectively. Then, by following the arguments used in Lin, Wei, & Ying (2001) and Sun et al. (2005), one can show that for large n, both θ̂ and μ̂0k(t) always exist and are unique and consistent. To establish the asymptotic normality of θ̂, define

and

In the above, ġ(t) = dg(t)/dt, and υ^{⊗2} = υυ′ for a vector υ. Then it can be shown that as n → ∞, n^{1/2}(θ̂ − θ0) has an asymptotically normal distribution with mean zero and a covariance matrix that can be consistently estimated by Â^{−1}Σ̂Â^{−1}, where

and

The proof is sketched in the Appendix.
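Once estimates Â and Σ̂ are available, the sandwich form Â^{−1}Σ̂Â^{−1} translates into standard errors and Wald-type confidence intervals in a few lines. The Python sketch below takes the two matrices as given inputs and does not reproduce their definitions; the function names and the numerical example are hypothetical.

```python
import numpy as np

def sandwich_se(a_hat, sigma_hat, n):
    """Standard errors of theta_hat from the sandwich covariance.

    n^{1/2}(theta_hat - theta_0) has approximate covariance A^{-1} Sigma A^{-1},
    so theta_hat itself has covariance A^{-1} Sigma A^{-1} / n (A is treated
    as symmetric here).
    """
    a_inv = np.linalg.inv(np.asarray(a_hat, dtype=float))
    cov = a_inv @ np.asarray(sigma_hat, dtype=float) @ a_inv.T / n
    return np.sqrt(np.diag(cov))

def wald_ci(theta_hat, se, z=1.96):
    """Two-sided Wald confidence limits (95% for z = 1.96)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical 2 x 2 example.
se = sandwich_se(2.0 * np.eye(2), 4.0 * np.eye(2), n=100)
print(se)
print(wald_ci([0.5, -0.2], se))
```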

For the determination of θ̂ and μ̂0k(t), let s1 < s2 < … < sJ denote the distinct ordered observation times of {tik,l, l = 1, …, mik; i = 1, …, n; k = 1, …, K}. Then one can first derive μ̂0k(t; θ, γ) from Equation (3), which can be rewritten as

j = 1, …, J. With μ0k(t) replaced by μ̂0k(t; θ, γ̂), Equation (4) has the form

It can be easily seen that, in general, there are no closed forms for μ̂0k(t; θ, γ) and θ̂, and some iterative algorithms have to be used to solve the equations above. For some special functions g, however, the determination of these estimators is quite straightforward. For example, assume g(t) = t^η, where η is a positive constant. In this case, μ̂0k(t; θ, γ) has an explicit expression given by

and U(θ; γ̂) = 0 becomes

where

If we take g(t) = log(t), the estimator μ̂0k(t; θ, γ) also has a closed form that can be obtained by

In this situation, we have

where

which yields

That is, θ̂ has a closed form.
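The two special cases can be understood from a short calculation. The sketch below assumes that the mean model has the multiplicative form g{μ0k(t) exp(θ′Xik(t))}, as in Lin, Wei, & Ying (2001); it only illustrates why these choices of g simplify the computation and does not reproduce the article's explicit expressions.

```latex
\begin{align*}
% Power link g(t) = t^{\eta}: the unknown baseline factors out.
g\bigl\{\mu_{0k}(t)\,e^{\theta' X_{ik}(t)}\bigr\}
  &= \mu_{0k}(t)^{\eta}\, e^{\eta\,\theta' X_{ik}(t)}
  && \text{for } g(t) = t^{\eta}, \\
% Log link g(t) = \log t: the mean is linear in \theta.
g\bigl\{\mu_{0k}(t)\,e^{\theta' X_{ik}(t)}\bigr\}
  &= \log \mu_{0k}(t) + \theta' X_{ik}(t)
  && \text{for } g(t) = \log t.
\end{align*}
```

Under the power link, μ0k(t)^η enters as a common multiplicative factor across subjects, so it can be solved for explicitly given θ. Under the log link, the mean is linear in θ once log μ0k(t) has been eliminated, so the estimating equation for θ is linear and can be solved in closed form.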

4. A SIMULATION STUDY

An extensive simulation study was conducted to evaluate the performance of the proposed inference procedure, with the focus on estimation of β and α. In the study, we assumed that Zi(t) was a Bernoulli random variable with success probability 0.5 and that the follow-up time Ci was generated from the uniform distribution over (τ/2, τ). For the generation of the observation process, it was assumed that Ni1(t) and Ni2(t) are Poisson processes satisfying model (1). In this case, given Zi, the number of observations, m̃ik, follows the Poisson distribution with mean Λ(Ci | Zi) = Λ0(Ci)e^{γZi}, and the observation times are the order statistics of a random sample of size m̃ik from the uniform distribution over (0, Ci).
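The generation of the observation process described above can be sketched as follows. The sketch assumes a constant baseline rate λ0(t) = lam0, so that Λ0(C) = lam0 · C; the function name `simulate_subject` and the parameter values are hypothetical.

```python
import numpy as np

def simulate_subject(tau, gamma, lam0, rng):
    """Draw covariate, follow-up time, and observation times for one subject.

    Assumes a homogeneous baseline rate lam0, i.e. Lambda_0(t) = lam0 * t.
    """
    z = rng.binomial(1, 0.5)                      # Bernoulli(0.5) covariate
    c = rng.uniform(tau / 2.0, tau)               # follow-up time on (tau/2, tau)
    mean_obs = lam0 * c * np.exp(gamma * z)       # Lambda_0(C) * exp(gamma * Z)
    m = rng.poisson(mean_obs)                     # number of observations
    times = np.sort(rng.uniform(0.0, c, size=m))  # ordered uniform times on (0, C)
    return z, c, times

rng = np.random.default_rng(1)
z, c, times = simulate_subject(tau=1.0, gamma=0.5, lam0=20.0, rng=rng)
print(z, round(c, 3), len(times))
```

Conditioning on the Poisson count and then drawing ordered uniform times is the standard way to simulate a homogeneous Poisson process on a fixed interval.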

To generate panel counts Yik(tik,l), we assumed that

and given the Qi’s, follow Poisson distributions with the mean functions

and

respectively, , k = 1, 2, i = 1, …, n. Here tik,0 = 0 and the Qi’s are a random sample from the gamma distribution with mean 1 and variance 0.1. With respect to the functions g and μ0k, we considered a number of choices, including g(t) = t, g(t) = t^2, and g(t) = log(t) for g, and μ0k(t) = t, μ0k(t) = t^{1/2}, and μ0k(t) = exp(t) for μ0k. For all situations, we took H(ℱikt) = Nik(t−), W(t) = 1, and n = 100 or 300. The results reported below are based on 500 replications.
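A gamma distribution with mean 1 and variance 0.1 has shape 10 and scale 0.1. The Python sketch below draws such subject-level effects and then generates cumulative panel counts from Poisson increments. It assumes that the Qi's act multiplicatively on the mean increments, and the vector `mean_at_times` is a hypothetical, user-supplied set of nondecreasing mean values at the observation times; the article's own mean functions are not reproduced here.

```python
import numpy as np

def gamma_frailty(n, rng, variance=0.1):
    """Draw n gamma variables with mean 1 and the given variance."""
    shape = 1.0 / variance          # mean = shape * scale = 1
    scale = variance                # variance = shape * scale**2
    return rng.gamma(shape, scale, size=n)

def panel_counts(mean_at_times, q, rng):
    """Cumulative counts at the observation times from Poisson increments.

    mean_at_times : nondecreasing mean values at t_1 < ... < t_m (mean 0 at t_0 = 0)
    q             : subject-level multiplicative effect (assumed form)
    """
    increments_mean = q * np.diff(mean_at_times, prepend=0.0)
    increments = rng.poisson(increments_mean)
    return np.cumsum(increments)

rng = np.random.default_rng(7)
q = gamma_frailty(5, rng)
print(q)
print(panel_counts(np.array([1.0, 2.5, 4.0]), q[0], rng))
```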

Table 1 presents the results for estimation of β and α based on the simulated data with g(t) = t, μ01(t) = t, μ02(t) = t^{1/2}, λ01(t) = 20/τ, and λ02(t) = 15/τ. The results include the estimated biases (Bias), given by the sample means of the point estimates β̂ and α̂ minus their true values, the sample means of the estimated standard errors of β̂ and α̂ (SEE), the sampling standard deviations of β̂ and α̂ (SSE), and the empirical coverage probabilities (CP) of the 95% confidence intervals for β and α. It can be seen from the table that the point estimates seem to be unbiased and that SEE and SSE are quite close to each other, indicating that the proposed variance estimation seems to work well. The coverage probabilities are also reasonable and consistent with the nominal level and, as expected, the estimated variances became smaller as the sample size increased.
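The four summary measures reported in the tables can be computed from the replicated estimates as in the following sketch. The function name `summarize` and the toy inputs are hypothetical, and normal-theory Wald intervals are assumed for the coverage probability.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, SEE, SSE, and empirical coverage over simulation replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "Bias": est.mean() - truth,   # mean estimate minus true value
        "SEE": se.mean(),             # mean of the estimated standard errors
        "SSE": est.std(ddof=1),       # sampling standard deviation of the estimates
        "CP": covered.mean(),         # fraction of Wald intervals covering the truth
    }

# Toy inputs standing in for 500 replications.
print(summarize([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], truth=1.0))
```

Agreement between SEE and SSE is the usual check that the variance estimator is tracking the true sampling variability.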

The results given in Tables 2 and 3 also concern the estimation of β and α and were obtained under set-ups similar to those used for Table 1, except for some of the underlying functions. Specifically, for Table 2 we employed g(t) = log(t), μ01(t) = μ02(t) = exp(t), and λ01(t) = λ02(t) = 10/τ, while for Table 3 we used g(t) = log(t), μ01(t) = exp(t), μ02(t) = exp(t^{1/2}), λ01(t) = 10/τ, and λ02(t) = 6/τ. Both Tables 2 and 3 lead to conclusions similar to those from Table 1 and again suggest that the proposed inference procedure performs well in practical situations. In addition to the results presented here, we also considered other set-ups in which the Zi’s follow the normal distribution and the observation process was generated from a nonhomogeneous Poisson process. The results obtained are similar to those described above.

5. APPLICATION TO THE SKIN CANCER CHEMOPREVENTION TRIAL

In this section, we apply the methodology proposed in the previous sections to the bivariate panel count data on the occurrences of the two types of related non-melanoma skin cancers, basal cell carcinoma and squamous cell carcinoma, from the skin cancer chemoprevention trial discussed before. As mentioned above, the study involves 291 skin cancer patients and two treatment groups, the placebo group and the DFMO group. For the analysis below, we will focus on the 290 patients (147 in the placebo group and 143 in the DFMO group) with at least one observation. Among these patients, the number of observations ranges from 1 to 17. With respect to the number of recurrent events, the number of new basal cell carcinomas ranges from 0 to 16, while the number of new squamous cell carcinomas ranges from 0 to 23. In addition to the treatment indicator, the baseline information available for each patient includes the patient’s gender, the age at diagnosis, and the number of prior skin cancers from first diagnosis to randomization. In the original analysis of the study, the two-sample t-test was used to compare the skin cancer occurrence rates, defined as the total number of new skin cancers divided by the length of the follow-up period, between the two treatment groups. It is clear that a great deal of useful information was lost in this analysis.

To analyze the bivariate skin cancer panel count data, for patient i, let Yi1(t) denote the total number of occurrences of basal cell carcinoma up to time t and Yi2(t) the total number of occurrences of squamous cell carcinoma. Also let Ni1(t) = Ni2(t) denote the observation process on patient i, which is the same for both types of skin cancer. Define Zi1 = 0 if patient i was in the placebo group and 1 otherwise, let Zi2 and Zi3 denote the number of prior skin cancers and the age of the patient, respectively, and let Zi4 = 0 if patient i is female and 1 otherwise, i = 1, …, 290. Assume that the Nik(t)’s and Yik(t)’s can be described by models (1) and (2), respectively. With respect to the function g, we considered the same three functions used in the simulation study: g(t) = t, g(t) = t^2, and g(t) = log(t). For the function H(ℱikt), two situations were investigated. One is to take H(ℱikt) = Nik(t−), assuming that the recurrence rate of skin cancer may depend on the total number of the patient’s visits, and the other is to let H(ℱikt) = Nik(t−) − Nik(t − 100), meaning that the recurrence rate may depend only on the number of the patient’s visits during the preceding 100-day period. The latter choice was motivated by the fact that sometimes it is the most recent visits that carry information about the response variable. For all analyses, we took W(t) = 1.
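The two history summaries can be computed directly from a patient's visit times, as in the following Python sketch. The function names and the visit times (in days) are hypothetical and serve only to show how Nik(t−) and Nik(t−) − Nik(t − 100) differ.

```python
import numpy as np

def n_before(times, t):
    """N(t-): number of visit times strictly before t."""
    ordered = np.sort(np.asarray(times, dtype=float))
    return int(np.searchsorted(ordered, t, side="left"))

def n_up_to(times, t):
    """N(t): number of visit times at or before t."""
    ordered = np.sort(np.asarray(times, dtype=float))
    return int(np.searchsorted(ordered, t, side="right"))

def h_total(times, t):
    """H = N(t-): all visits before time t."""
    return n_before(times, t)

def h_recent(times, t, window=100.0):
    """H = N(t-) - N(t - window): visits in the open interval (t - window, t)."""
    return n_before(times, t) - n_up_to(times, t - window)

visits = [30.0, 90.0, 150.0, 200.0]   # hypothetical visit days for one patient
print(h_total(visits, 200.0), h_recent(visits, 200.0))
```

For the visit at day 200, three earlier visits count toward the total history, but only the visit at day 150 falls within the preceding 100 days.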

The analysis results are given in Tables 4 and 5. The former is for H(ℱikt) = Nik(t−) and the latter is for H(ℱikt) = Nik(t−) − Nik(t − 100). They include the point estimates of the regression parameters, the estimated standard errors (SE), and the estimated 95% confidence intervals (CI). One can see from the tables that the DFMO treatment did not seem to have any significant effect on the recurrence of skin cancer, and the recurrence also did not seem to be significantly related to the age and gender of the patient. However, the occurrence of skin cancer seems to be positively related to the number of prior skin cancers. It can also be seen that the analysis results seem to be consistent with respect to the choice of the functions g and H(ℱikt). More comments on this are given in the next section. In terms of the relationship between the recurrence process of the types of skin cancer considered here and the patient’s observation process, the results suggest that the latter seems to contain some relevant information about the former. However, the information may depend on the follow-up period or timing, as the results indicate that a higher total number of observations was related to a higher recurrence rate of skin cancer, but a higher number of observations over a short period could cause a lower recurrence rate of skin cancer. A possible explanation is that more observations or visits mean a higher recurrence rate in general, but a high number of visits over a short period would leave no time for, and thus prevent, the occurrence of new skin cancers.

6. CONCLUDING REMARKS

In the preceding sections, we have discussed regression analysis of multivariate panel count data when the observation process on study subjects may contain relevant information about the underlying recurrent event processes of interest. Such data naturally occur when a recurrent event study involves several related types of recurrent events and the follow-up or observation processes on study subjects are subject-specific. The areas in which one often faces such data include clinical trials and medical or social follow-up studies. For the problem, we presented a class of flexible transformation models for the underlying recurrent event processes, and an estimating equation-based inference procedure was developed for estimation of regression parameters. The asymptotic properties of the proposed estimates have been established, their finite-sample performance was examined by simulation, and the simulation results indicated that the procedure seems to work well for practical situations. The methodology was applied to a set of bivariate panel count data arising from a skin cancer chemoprevention trial.

It is worth noting that both models (1) and (2) are marginal models, and we took the marginal approach for the problem considered here because the main focus of the article was on estimation of covariate effects. An advantage of the proposed approach, as with many other marginal approaches for multivariate data, is that it leaves the correlation between different types of recurrent events arbitrary. An alternative is to directly model the correlation structure, which would be appealing if the correlation is of main interest. In general, if the model is correct, the direct modelling of the correlation could increase the efficiency of the estimated covariate effects but has no effect on the unbiasedness of the estimate.

There exist several directions for future research. One is that in the proposed methodology, the observation process Nik(t) was assumed to be a nonhomogeneous Poisson process. It is clear that this may not be true in practice, and it would be useful to generalize the proposed approach to situations where the assumption does not hold. Another direction is model checking and the selection of the link function g. Although both models (1) and (2) are quite general, there may exist situations where they do not hold. To assess their appropriateness for a given data set, one way is to develop some model checking procedures for goodness-of-fit testing. Note that for the situation where the main interest is in covariate effects, as in the example discussed in Section 5, the selection of the function g may not be critical if different functions g give similar results. Also note that the regression parameters β corresponding to different g may have different meanings. A related problem or method is to assume that g belongs to some class of functions and to develop a procedure for its selection.

A third direction is that in the estimating function (4), we used a possibly data-dependent weight function W(t), and it would be useful to develop some procedures for the selection of an appropriate weight function for a given data set. It is clear that W(t) cannot be an arbitrary data-dependent function, as W(t) needs to be chosen such that the estimating function has zero expectation. In general, one would like to choose the W(t) that gives the most efficient estimate of covariate effects. However, this is usually quite difficult and sometimes may not be possible.

Acknowledgments

The authors wish to thank the reviewer and guest editors, Drs. Cook and Yi, for their many useful comments and suggestions. The work of the third author was partly supported by the NIH grant R01 CA152035.

APPENDIX: JOINT ASYMPTOTIC NORMALITY OF β̂ AND α̂

In this section, we will use the same notation defined above, and all limits are taken as n → ∞. Assume that g is a twice continuously differentiable function. Also assume that Zi(t), H(·), and W(t) have bounded variations and that W(t) converges almost surely to a deterministic function w(t) uniformly in t ∈ [0, τ]. Define

and

Let s0(t), s1(t), exk(t), and rk(t) denote the limits of S0(t; γ0), S1(t; γ0), EXk(t; θ0, γ0), and Rk(t; θ0, γ0), respectively. Also let z̄(t) = s1(t)/s0(t). First, by the linear expansion of g, we have

(A.1)

where lies between μ0k(t) and μ̂0k(t; θ0, γ0). Note that μ̂0k(t; θ, γ) satisfies

(A.2)

Likewise, the linear expansion of (A.2) with θ = θ0 and γ = γ0 yields

(A.3)

where lies between μ0k(t) and μ̂0k(t; θ0, γ0). By using the functional central limit theorem (Pollard, 1990, p. 53) and noting that sup_{0≤t≤τ} |Λ̂0k(t; γ0) − Λ0k(t)| = o_p(n^{−1/2}), it can be seen that

uniformly in t. Hence, combining (A.1) and (A.3) with the uniform convergence of μ̂0k(t; θ0, γ0), we have

where

It is well known that

Thus,

It then follows that

(A.4)

It can be checked that

(A.5)

where . Differentiation of (A.2) with respect to γ yields

(A.6)

where

Let P*(θ, γ) = −n^{−1}∂U(θ, γ)/∂γ′. Thus it follows from (A.5) and (A.6) that P*(θ, γ) converges almost surely to a nonrandom function P(θ, γ) uniformly in θ and γ. Denote P(θ0; γ0) by P. Then

Using the Taylor expansion of U(θ0; γ̂) at (θ0, γ0), we have

(A.7)

based on the consistency of γ̂ and Equation (A.5) of Lin et al. (2000), where

Hence it follows from (A.4), (A.7), and the multivariate central limit theorem that n^{−1/2}U(θ0; γ̂) converges in distribution to a mean-zero normal random vector with covariance matrix

By using the method in Equation (A.3) of Lin et al. (2000), it is easy to see that Σ can be consistently estimated by Σ̂ given in Section 3.

Differentiation of (A.2) with respect to θ gives

(A.8)

where

Thus, it follows from (A.8) and some simple algebra that −n^{−1}∂U(θ; γ̂)/∂θ′ converges almost surely to a nonrandom function A(θ) uniformly in θ, and

which can be consistently estimated by Â given in Section 3. The Taylor expansion of U(θ̂; γ̂) at (θ0, γ̂) yields

(A.9)

Therefore, it follows from (A.7) and (A.9) that n^{1/2}(θ̂ − θ0) asymptotically follows the normal distribution with mean zero and covariance matrix A(θ0)^{−1}ΣA(θ0)^{−1}, which can be consistently estimated by Â^{−1}Σ̂Â^{−1}.

BIBLIOGRAPHY

Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer-Verlag; 1993.

Chen BE, Cook RJ, Lawless JF, Zhan M. Statistical methods for multivariate interval-censored recurrent events. Statistics in Medicine. 2005; 24:671–691. [PubMed: 15558580]

Cook RJ, Lawless JF. The Statistical Analysis of Recurrent Events. New York: Springer-Verlag; 2007.

He X, Tong X, Sun J, Cook RJ. Regression analysis of multivariate panel count data. Biostatistics. 2008; 9:234–248. [PubMed: 17626224]

He X, Tong X, Sun J. Semiparametric analysis of panel count data with correlated observation and follow-up times. Lifetime Data Analysis. 2009; 15:177–196. [PubMed: 19082711]

Huang CY, Wang MC, Zhang Y. Analysing panel count data with informative observation times. Biometrika. 2006; 93:763–775.

Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society, Series B. 2000; 62:711–730.

Lin DY, Wei LJ, Ying Z. Semiparametric transformation models for point processes. Journal of the American Statistical Association. 2001; 96:620–628.

Pollard D. Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 2. Hayward, Philadelphia: Society for Industrial and Applied Mathematics, Institute of Mathematical Statistics and American Statistical Association; 1990.

Sun J, Kalbfleisch JD. The analysis of current status data on point processes. Journal of the American Statistical Association. 1993; 88:1449–1454.

Sun J, Kalbfleisch JD. Estimation of the mean function of point processes based on panel count data. Statistica Sinica. 1995; 5:279–290.


Sun J, Park D, Sun L, Zhao X. Semiparametric regression analysis of longitudinal data with informative observation times. Journal of the American Statistical Association. 2005; 100:882–889.

Sun J, Tong X, He X. Regression analysis of panel count data with dependent observation times. Biometrics. 2007; 63:1053–1059. [PubMed: 18078478]

Sun J, Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society, Series B. 2000; 62:293–302.

Thall PF, Lachin JM. Analysis of recurrent events: Nonparametric methods for random-interval count data. Journal of the American Statistical Association. 1988; 83:339–347.

Zhang Y. A semiparametric pseudolikelihood estimation method for panel count data. Biometrika. 2002; 89:39–48.


Table 1

Results on estimation of β and α based on the simulated data with g(t) = t, µ01(t) = t, µ02(t) = t^{1/2}, λ01(t) = 20/τ, and λ02(t) = 15/τ.

| θ | Summary | β̂ (n = 100, τ = [illegible]) | α̂ (n = 100, τ = [illegible]) | β̂ (n = 100, τ = 2) | α̂ (n = 100, τ = 2) | β̂ (n = 300, τ = [illegible]) | α̂ (n = 300, τ = [illegible]) | β̂ (n = 300, τ = 2) | α̂ (n = 300, τ = 2) |
|---|---|---|---|---|---|---|---|---|---|
| (0, 0) | BIAS | 0.0152 | 0.0047 | 0.0236 | 0.0028 | 0.0119 | 0.0047 | 0.0185 | 0.0038 |
| (0, 0) | SSE | 0.2574 | 0.0244 | 0.2048 | 0.0192 | 0.1462 | 0.0133 | 0.1168 | 0.0111 |
| (0, 0) | SEE | 0.2504 | 0.0235 | 0.1993 | 0.0187 | 0.1466 | 0.0140 | 0.1169 | 0.0110 |
| (0, 0) | CP | 0.954 | 0.924 | 0.924 | 0.936 | 0.952 | 0.938 | 0.950 | 0.928 |
| (0, 0.02) | BIAS | 0.0449 | 0.0023 | 0.0230 | 0.0030 | 0.0140 | 0.0048 | 0.0113 | 0.0032 |
| (0, 0.02) | SSE | 0.2333 | 0.0210 | 0.1898 | 0.0170 | 0.1335 | 0.0127 | 0.1078 | 0.0099 |
| (0, 0.02) | SEE | 0.2212 | 0.0201 | 0.1777 | 0.0157 | 0.1289 | 0.0118 | 0.1039 | 0.0095 |
| (0, 0.02) | CP | 0.928 | 0.938 | 0.928 | 0.932 | 0.948 | 0.912 | 0.938 | 0.932 |
| (0.5, 0) | BIAS | 0.0261 | 0.0029 | 0.0007 | 0.0049 | 0.0062 | 0.0063 | 0.0053 | 0.0040 |
| (0.5, 0) | SSE | 0.2314 | 0.0212 | 0.1925 | 0.0163 | 0.1346 | 0.0121 | 0.1127 | 0.0097 |
| (0.5, 0) | SEE | 0.2239 | 0.0205 | 0.1803 | 0.0162 | 0.1312 | 0.0123 | 0.1064 | 0.0098 |
| (0.5, 0) | CP | 0.934 | 0.934 | 0.932 | 0.934 | 0.936 | 0.920 | 0.942 | 0.934 |
| (0.5, 0.02) | BIAS | 0.0197 | 0.0029 | 0.0136 | 0.0042 | 0.0134 | 0.0040 | 0.0211 | 0.0029 |
| (0.5, 0.02) | SSE | 0.1918 | 0.0186 | 0.1643 | 0.0145 | 0.1153 | 0.0106 | 0.0959 | 0.0084 |
| (0.5, 0.02) | SEE | 0.1984 | 0.0172 | 0.1629 | 0.0141 | 0.1170 | 0.0104 | 0.0955 | 0.0085 |
| (0.5, 0.02) | CP | 0.956 | 0.922 | 0.936 | 0.930 | 0.966 | 0.922 | 0.950 | 0.930 |
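The summaries BIAS, SSE, SEE and CP reported in Table 1 can be computed from simulation replicates along the following lines. This is only a sketch under the assumption that the labels carry their usual meanings: BIAS is the average estimate minus the true value, SSE the sample standard deviation of the estimates, SEE the average of the estimated standard errors, and CP the empirical coverage of nominal 95% normal-approximation intervals. The function name `summarize_replicates` and the demo inputs (including the number of replicates) are hypothetical.

```python
import numpy as np


def summarize_replicates(estimates, std_errors, true_value, z=1.96):
    """Summarize one parameter over simulation replicates.

    estimates  : point estimates, one per replicate
    std_errors : estimated standard errors, one per replicate
    true_value : value of the parameter used to generate the data
    z          : normal quantile for the nominal coverage level (1.96 for 95%)
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "BIAS": estimates.mean() - true_value,   # average estimate minus truth
        "SSE": estimates.std(ddof=1),            # sample SD of the estimates
        "SEE": std_errors.mean(),                # mean of the estimated SEs
        "CP": covered.mean(),                    # empirical coverage proportion
    }


# Hypothetical replicates: normal estimates around 0.5 with a known SE of 0.2.
rng = np.random.default_rng(0)
demo_est = 0.5 + 0.2 * rng.standard_normal(500)
demo_se = np.full(500, 0.2)
demo_summary = summarize_replicates(demo_est, demo_se, true_value=0.5)
```

When the variance estimate is adequate, SSE and SEE should be close and CP should be near the nominal 0.95, which is the pattern to look for when reading the table.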


Table 2

Results on estimation of β and α based on the simulated data with g(t) = log t, µ01(t) = µ02(t) = exp(t), and λ01(t) = λ02(t) = 10/τ.

| θ | Summary | β̂ (n = 100, τ = [illegible]) | α̂ (n = 100, τ = [illegible]) | β̂ (n = 100, τ = 4) | α̂ (n = 100, τ = 4) | β̂ (n = 300, τ = [illegible]) | α̂ (n = 300, τ = [illegible]) | β̂ (n = 300, τ = 4) | α̂ (n = 300, τ = 4) |
|---|---|---|---|---|---|---|---|---|---|
| (0, 0) | BIAS | 0.0108 | −0.0003 | 0.0147 | −0.0001 | 0.0035 | 0.0006 | −0.0059 | 0.0003 |
| (0, 0) | SSE | 0.1743 | 0.0363 | 0.2066 | 0.0451 | 0.1014 | 0.0212 | 0.1222 | 0.0255 |
| (0, 0) | SEE | 0.1667 | 0.0350 | 0.1992 | 0.0424 | 0.0975 | 0.0207 | 0.1176 | 0.0252 |
| (0, 0) | CP | 0.924 | 0.942 | 0.950 | 0.938 | 0.950 | 0.942 | 0.934 | 0.956 |
| (0, 0.1) | BIAS | −0.0056 | −0.0022 | 0.0194 | −0.0020 | −0.0083 | 0.0012 | −0.0009 | −0.0001 |
| (0, 0.1) | SSE | 0.2112 | 0.0504 | 0.2400 | 0.0587 | 0.1289 | 0.0311 | 0.1434 | 0.0340 |
| (0, 0.1) | SEE | 0.2116 | 0.0480 | 0.2432 | 0.0557 | 0.1229 | 0.0290 | 0.1429 | 0.0334 |
| (0, 0.1) | CP | 0.960 | 0.926 | 0.950 | 0.932 | 0.922 | 0.936 | 0.938 | 0.944 |
| (0.5, 0) | BIAS | −0.0103 | −0.0015 | −0.0178 | −0.0032 | 0.0084 | −0.0005 | −0.0050 | 0.0020 |
| (0.5, 0) | SSE | 0.1973 | 0.0407 | 0.2256 | 0.0501 | 0.1112 | 0.0251 | 0.1392 | 0.0282 |
| (0.5, 0) | SEE | 0.1906 | 0.0403 | 0.2239 | 0.0470 | 0.1126 | 0.0242 | 0.1316 | 0.0283 |
| (0.5, 0) | CP | 0.942 | 0.936 | 0.936 | 0.924 | 0.954 | 0.948 | 0.922 | 0.952 |
| (0.5, 0.1) | BIAS | −0.0025 | 0.0016 | 0.0127 | −0.0015 | −0.0009 | −0.0017 | 0.0086 | 0.0016 |
| (0.5, 0.1) | SSE | 0.2301 | 0.0530 | 0.2570 | 0.0608 | 0.1329 | 0.0333 | 0.1538 | 0.0395 |
| (0.5, 0.1) | SEE | 0.2308 | 0.0532 | 0.2626 | 0.0592 | 0.1374 | 0.0320 | 0.1541 | 0.0363 |
| (0.5, 0.1) | CP | 0.942 | 0.940 | 0.948 | 0.930 | 0.952 | 0.938 | 0.936 | 0.938 |


Table 3

Results on estimation of β and α based on the simulated data with g(t) = log t, µ01(t) = exp(t), µ02(t) = exp(t^{1/2}), λ01(t) = 10/τ, and λ02(t) = 6/τ.

| θ | Summary | β̂ (n = 100, τ = [illegible]) | α̂ (n = 100, τ = [illegible]) | β̂ (n = 100, τ = 4) | α̂ (n = 100, τ = 4) | β̂ (n = 300, τ = [illegible]) | α̂ (n = 300, τ = [illegible]) | β̂ (n = 300, τ = 4) | α̂ (n = 300, τ = 4) |
|---|---|---|---|---|---|---|---|---|---|
| (0, 0) | BIAS | 0.0069 | 0.0012 | 0.0012 | −0.0012 | 0.0141 | −0.0019 | 0.0062 | −0.0004 |
| (0, 0) | SSE | 0.1728 | 0.0414 | 0.1976 | 0.0498 | 0.0976 | 0.0230 | 0.1178 | 0.0293 |
| (0, 0) | SEE | 0.1676 | 0.0396 | 0.1960 | 0.0473 | 0.0986 | 0.0236 | 0.1148 | 0.0282 |
| (0, 0) | CP | 0.950 | 0.934 | 0.948 | 0.936 | 0.942 | 0.950 | 0.946 | 0.948 |
| (0, 0.1) | BIAS | 0.0213 | −0.0034 | 0.0224 | −0.0057 | 0.0052 | −0.0016 | 0.0041 | −0.0011 |
| (0, 0.1) | SSE | 0.2240 | 0.0553 | 0.2474 | 0.0634 | 0.1229 | 0.0340 | 0.1383 | 0.0384 |
| (0, 0.1) | SEE | 0.2046 | 0.0531 | 0.2338 | 0.0608 | 0.1212 | 0.0325 | 0.1374 | 0.0370 |
| (0, 0.1) | CP | 0.920 | 0.934 | 0.934 | 0.930 | 0.948 | 0.942 | 0.952 | 0.944 |
| (0.5, 0) | BIAS | −0.0040 | −0.0053 | 0.0176 | −0.0021 | 0.0056 | −0.0021 | 0.0016 | −0.0028 |
| (0.5, 0) | SSE | 0.2002 | 0.0456 | 0.2331 | 0.0553 | 0.1114 | 0.0288 | 0.1307 | 0.0320 |
| (0.5, 0) | SEE | 0.1936 | 0.0457 | 0.2211 | 0.0530 | 0.1133 | 0.0272 | 0.1287 | 0.0314 |
| (0.5, 0) | CP | 0.948 | 0.934 | 0.932 | 0.930 | 0.946 | 0.934 | 0.962 | 0.946 |
| (0.5, 0.1) | BIAS | 0.0323 | −0.0040 | 0.0158 | −0.0003 | 0.0120 | −0.0029 | 0.0086 | −0.0014 |
| (0.5, 0.1) | SSE | 0.2437 | 0.0632 | 0.2692 | 0.0704 | 0.1384 | 0.0373 | 0.1483 | 0.0429 |
| (0.5, 0.1) | SEE | 0.2318 | 0.0591 | 0.2557 | 0.0676 | 0.1342 | 0.0363 | 0.1502 | 0.0408 |
| (0.5, 0.1) | CP | 0.934 | 0.932 | 0.934 | 0.932 | 0.946 | 0.938 | 0.950 | 0.928 |


Table 4

Estimation of regression parameters for the skin cancer chemoprevention trial with H(ℱit) = Ni(t−).

| Function g(t) | DFMO treatment: β̂1, SE(β̂1), 95% CI for β1 | Prior skin cancer number: β̂2, SE(β̂2), 95% CI for β2 | Age: β̂3, SE(β̂3), 95% CI for β3 | Gender: β̂4, SE(β̂4), 95% CI for β4 | Observation history: α̂, SE(α̂), 95% CI for α |
|---|---|---|---|---|---|
| g(t) = t | −0.2629, 0.1849, (−0.6252, 0.0995) | 0.0697, 0.0080, (0.0539, 0.0854) | −0.0016, 0.0085, (−0.0183, 0.0151) | 0.2419, 0.1896, (−0.1297, 0.6135) | 0.1657, 0.0469, (0.0738, 0.2575) |
| g(t) = t^2 | −0.1314, 0.0924, (−0.3126, 0.0498) | 0.0348, 0.0040, (0.0269, 0.0427) | −0.0008, 0.0043, (−0.0091, 0.0075) | 0.1210, 0.0948, (−0.0648, 0.3067) | 0.0828, 0.0234, (0.0369, 0.1288) |
| g(t) = log(t) | −0.1107, 0.1111, (−0.3284, 0.1070) | 0.0981, 0.0223, (0.0544, 0.1419) | −0.0035, 0.0047, (−0.0126, 0.0057) | 0.1478, 0.1106, (−0.0690, 0.3647) | 0.1718, 0.0736, (0.0275, 0.3162) |
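The confidence intervals in Table 4 appear consistent with the usual normal-approximation form, estimate ± 1.96 × SE. The short Python sketch below checks this for the DFMO treatment entry under g(t) = t (estimate −0.2629, SE 0.1849). That the authors used exactly this construction is an assumption, and the function name `wald_ci` is hypothetical.

```python
def wald_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval: estimate -/+ z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# DFMO treatment effect under g(t) = t, as reported in Table 4.
lower, upper = wald_ci(-0.2629, 0.1849)

# The interval contains zero exactly when |estimate / SE| < z.
contains_zero = lower <= 0.0 <= upper
```

The computed endpoints agree with the tabulated interval (−0.6252, 0.0995) up to rounding of the reported estimate and standard error, and the interval contains zero.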


Table 5

Estimation of regression parameters for the skin cancer chemoprevention trial with H(ℱit) = Ni(t−) − Ni(t − 100).

| Function g(t) | DFMO treatment: β̂1, SE(β̂1), 95% CI for β1 | Prior skin cancer number: β̂2, SE(β̂2), 95% CI for β2 | Age: β̂3, SE(β̂3), 95% CI for β3 | Gender: β̂4, SE(β̂4), 95% CI for β4 | Observation history: α̂, SE(α̂), 95% CI for α |
|---|---|---|---|---|---|
| g(t) = t | −0.3863, 0.2116, (−0.8011, 0.0284) | 0.0774, 0.0095, (0.0587, 0.0960) | 0.0044, 0.0094, (−0.0139, 0.0228) | 0.2050, 0.2060, (−0.1987, 0.6087) | −0.7768, 0.2744, (−1.3145, −0.2391) |
| g(t) = t^2 | −0.1932, 0.1058, (−0.4005, 0.0142) | 0.0387, 0.0048, (0.0293, 0.0480) | 0.0022, 0.0047, (−0.0070, 0.0114) | 0.1025, 0.1030, (−0.0994, 0.3044) | −0.3884, 0.1372, (−0.6573, −0.1195) |
| g(t) = log(t) | −0.1418, 0.1146, (−0.3664, 0.0829) | 0.1060, 0.0247, (0.0576, 0.1544) | −0.0008, 0.0048, (−0.0103, 0.0087) | 0.1149, 0.1108, (−0.1022, 0.3320) | −0.4621, 0.0983, (−0.6549, −0.2694) |
