Choice of time-scale in Cox's model analysis of epidemiologic cohort data: a simulation study


STATISTICS IN MEDICINE. Statist. Med. 2004; 23:3803–3820. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2098

Choice of time-scale in Cox’s model analysis of epidemiologic cohort data: a simulation study

Anne C. M. Thiébaut 1,2,∗,† and Jacques Bénichou 2

1 INSERM, E3N-EPIC, Institut Gustave Roussy, Villejuif, France
2 Unité de Biostatistique, CHU et Faculté de Médecine-Pharmacie de Rouen, Rouen, France

SUMMARY

Cox’s regression model is widely used for assessing associations between potential risk factors and disease occurrence in epidemiologic cohort studies. Although age is often a strong determinant of disease risk, authors have frequently used time-on-study instead of age as the time-scale, as for clinical trials. Unless the baseline hazard is an exponential function of age, this approach can yield estimates of relative hazards that differ from those obtained using age as the time-scale, even when age is adjusted for. We performed a simulation study in order to investigate the existence and magnitude of bias for different degrees of association between age and the covariate of interest. Age to disease onset was generated from exponential, Weibull or piecewise Weibull distributions, and both fixed and time-dependent dichotomous covariates were considered. We observed no bias upon using age as the time-scale. Upon using time-on-study, we verified the absence of bias for exponentially distributed age to disease onset. For non-exponential distributions, we found that bias could occur even when the covariate of interest was independent of age. It could be severe in the case of substantial association with age, especially with time-dependent covariates. These findings were illustrated on data from a cohort of 84 329 French women followed prospectively for breast cancer occurrence. In view of our results, we strongly recommend not using time-on-study as the time-scale for analysing epidemiologic cohort data. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: omitted covariate; model misspecification; left truncation; time-dependent covariate; epidemiologic methods; breast cancer

∗Correspondence to: Anne C. M. Thiébaut, Unité de Biostatistique, CHU Hôpitaux de Rouen, 1 rue de Germont, 76031 Rouen Cedex, France.

†E-mail: [email protected]

Contract/grant sponsor: Fondation de France; contract/grant number: 2002009738

Received December 2003; accepted October 2004. Copyright © 2004 John Wiley & Sons, Ltd.

1. INTRODUCTION

Cox’s semiparametric model [1] is widely used for the analysis of survival or duration-type data, both in clinical trials and in epidemiologic cohort studies. In this model, a time variable or time-scale, $W$, has to be defined. Based on this definition, at any time $W = w$, the hazard rate $h(w)$ is the product of a non-parametric baseline hazard $h_0(w)$ and a parametric function of explanatory covariates $Z$ and corresponding parameters $\beta$, usually taken to be the exponential function, such that $h(w \mid Z = z) = h_0(w)\exp(\beta' z)$.

Cox’s model was developed originally with clinical applications in mind, where individuals are generally followed up from the diagnosis of a given disease or the initiation of treatment until death, remission or relapse. In this context, the natural time-scale $W$ is the delay from inclusion, also called length of follow-up or time-on-study. Thus, the number of subjects at risk diminishes monotonically with time, so that successive risk sets are nested. Such a structure defines a closed cohort. Usually, adjustment for age at recruitment is performed through the inclusion of a continuous or categorical covariate in the parametric term, $\exp(\beta' z)$.

In most epidemiologic cohort studies, subjects are followed up prospectively for the occurrence of a given disease. When analysing such data, the effect of age needs to be tightly controlled because the incidence of most diseases, especially chronic diseases, is strongly determined by age. The natural time-scale $W$ is then (attained) age. Using time-on-study as the time-scale would generally not be relevant, especially when inclusion into the cohort coincides with an interview, which is not expected to modify one’s risk. Indeed, in epidemiologic cohort studies, contrary to clinical studies, the time when a subject first comes under observation usually does not coincide with the time when the subject becomes at risk for the disease of interest.

Although Cox’s model with age as the time-scale has a similar expression to that of Cox’s model with time-on-study as the time-scale, the implicit mechanisms differ, apart from right censoring. Indeed, at a given age, some subjects are not yet under observation whereas others no longer are. Thus, the number of subjects at risk does not vary monotonically with age and risk sets are not nested, unlike with the time-on-study scale. This structure defines an open cohort, under which a subject’s observation is conditional on his/her having been free of disease at his/her age of recruitment. Thus, using age as the time-scale implies staggered or delayed entry, with left truncation occurring at the age at inclusion [2].

When age is used as the time-scale, the hazard function can be directly interpreted as the age-specific incidence function [2]. Also, because age is taken into account in the non-parametric term of the hazard function without any need for modelling its effect, a more effective control of age ensues. Hence, several authors have recommended using age as the time-scale for the analysis of epidemiologic cohort studies [3–5].

Yet Cox’s regression with time-on-study as the time-scale, usually with some form of age adjustment, has commonly been used instead for the analysis of epidemiologic cohort data in order to assess associations between covariates and risk of disease occurrence. This approach can be misleading because it yields estimates that may differ from those resulting from using age as the time-scale. However, Korn et al. [4] derived two sufficient conditions under which Cox’s model with age as the time-scale and Cox’s model with time-on-study as the time-scale adjusting for age at recruitment as a continuous covariate will agree. First, these authors formally established that, when the baseline hazard is an exponential function of age (a family later referred to as the ‘exponential family’), that is, at any age $W = a$, $h_0(a) = c\exp(\gamma a)$ for given $c > 0$ and $\gamma$ (condition 1), the two approaches yield exactly identical estimates. Second, they claimed that, in case of independence between the covariate of interest and age at recruitment (condition 2), estimates from the two approaches should not be too far from each other. For this second condition, Korn et al. [4] relied on heuristic reasoning rather than a formal proof and referred to results given by Gail et al. [6] pertaining to the issue of covariate omission. Based on analytical results or simulations, Gail et al. [6] and other authors [7–13] showed that downward bias in estimating the effect of a covariate can result from omitting a prognostic factor from Cox’s regression model, even when the covariate of interest is independent of the omitted prognostic factor. Thus, carrying over the analogy between the issue of covariate omission and that of time-scale choice, these results suggest that omitting or mismodelling age as an adjustment factor in a follow-up-dependent Cox model, instead of using age as the time-scale, could also lead to bias, contrary to Korn et al.’s reasoning.

In order to clarify this issue, we performed a simulation study to determine whether independence between the covariate of interest and age at inclusion is also a sufficient condition for obtaining unbiased estimates. A second objective of the study was to investigate the magnitude of the bias for different degrees of association between age at inclusion and the covariate of interest, and for several types of incidence functions. This investigation was aimed at complementing limited results from small-scale simulations based on a single set of parameters [5] or from cohort study examples [4, 14]. Some features of our simulation study were borrowed from the E3N prospective cohort study [15], the main characteristics of which are described below.
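A short worked equation helps show why condition 1 matters. The notation $a_0$ (age at recruitment) and $t$ (time-on-study), with attained age $a = a_0 + t$, is introduced here for illustration and is not taken from Korn et al. [4]. Starting from the age-scale model $h(a \mid Z = z) = h_0(a)\exp(\beta' z)$ and assuming $h_0(a) = c\exp(\gamma a)$:

```latex
% a_0: age at recruitment; t: time-on-study; attained age a = a_0 + t (illustrative notation)
\begin{aligned}
h(t \mid Z = z, a_0) &= h_0(a_0 + t)\,\exp(\beta' z) \\
  &= c\,\exp\{\gamma (a_0 + t)\}\,\exp(\beta' z) \\
  &= \underbrace{c\,\exp(\gamma t)}_{\tilde h_0(t)}\;\exp(\gamma a_0 + \beta' z)
\end{aligned}
```

The time-on-study hazard thus has a baseline $\tilde h_0(t)$ that depends on $t$ only and a covariate term linear in $a_0$ with coefficient $\gamma$, which is exactly what adjustment for age at recruitment as a continuous covariate assumes. Conversely, if $\log h_0(a_0 + t) = \log \tilde h_0(t) + \gamma a_0$ held for all $a_0$ and $t \ge 0$, setting $t = 0$ would force $h_0(a) = \tilde h_0(0)\exp(\gamma a)$; so for any baseline outside the exponential family, such as a Weibull hazard with shape different from 1, a linear age-at-recruitment term leaves the time-on-study model misspecified.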

2. COHORT STUDY EXAMPLE

The E3N cohort study (‘Étude Épidémiologique auprès de Femmes de la Mutuelle Générale de l’Éducation Nationale’) is a prospective study whose main goal is to assess associations between lifestyle risk factors and cancer occurrence in women [15]. Women residing in metropolitan France (i.e. excluding Corsica and overseas territories) and aged 40–65 years were recruited among members of the health insurance plan of the national educational system. Overall, 98 997 volunteers (around 20 per cent of all eligible women) agreed to participate after a formal invitation, consent form and baseline questionnaire had been mailed to them in January or June 1989 for three pilot administrative areas (‘départements’) of metropolitan France, and in June 1990 for the remaining 91 administrative areas. Active follow-up was performed through biennial questionnaires. Pathological reports of women who reported cancer occurrence during follow-up were sought from their general practitioner.

In this paper, we focused on the relationship of breast cancer risk with menopausal status at recruitment (i.e. as a fixed covariate) and during follow-up (i.e. as a time-dependent covariate). Such use of menopausal status at recruitment alone is common in epidemiologic cohort studies when only baseline measurements are available. The study sample consisted of 84 329 French women without any previous diagnosis of cancer who were either postmenopausal with non-missing age at menopause or still premenopausal at the end of follow-up. The outcome of interest was incident primary invasive breast cancer as a first cancer. The cut-off date for this analysis was June 28, 2000, when the last validated follow-up questionnaire was sent (administrative censoring). Other censoring resulted from the occurrence of in situ breast cancer or cancer of another site during follow-up, from loss to follow-up, in which case the date of last response to a questionnaire was used as the censoring date, or from death. For an average follow-up of 9.7 years, 2461 invasive breast cancer cases were reported and included in the analysis, of which 97.9 per cent were histologically confirmed. The observed percentage of censoring was thus 97.1 per cent. Analyses of the example data were performed using Stata software [16].
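In a cohort such as E3N, where women enter at ages 40–65 and are followed for about ten years, the closed-cohort versus open-cohort contrast drawn in the Introduction can be made concrete by listing who is at risk on each time-scale. The four subjects below are a hypothetical toy example, not E3N data, and the sketch uses the usual delayed-entry convention that a subject is at risk at age $a$ when entry age $< a \le$ exit age.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    name: str
    entry_age: float  # age at recruitment (left-truncation point on the age scale)
    exit_age: float   # age at event or censoring


def risk_set_time_on_study(subjects, t):
    """Closed cohort: at risk at follow-up time t if follow-up lasted at least t."""
    return [s for s in subjects if s.exit_age - s.entry_age >= t]


def risk_set_age(subjects, a):
    """Open cohort with delayed entry: at risk at age a if entry_age < a <= exit_age."""
    return [s for s in subjects if s.entry_age < a <= s.exit_age]


# Hypothetical staggered-entry cohort: A, B, C followed 10 years; D has an event at 62.
toy = [Subject("A", 40, 50), Subject("B", 45, 55),
       Subject("C", 50, 60), Subject("D", 55, 62)]

# Time-on-study: sets shrink and are nested as t increases.
for t in (0, 5, 8, 10):
    print("t =", t, [s.name for s in risk_set_time_on_study(toy, t)])

# Age scale: set size rises then falls, and later sets are not subsets of earlier ones.
for a in (42, 48, 52, 58, 61):
    print("age =", a, [s.name for s in risk_set_age(toy, a)])
```

The age-scale sets ({A}, {A, B}, {B, C}, {C, D}, {D}) are neither monotone in size nor nested, whereas the time-on-study sets only lose members.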


3. SIMULATION STUDY

3.1. Methods

For each situation considered, we generated 1000 independent samples of 50 000 subjects each. To generate age at recruitment, we relied on the age distribution observed at entry in the E3N cohort among the 40–65-year-old participants without any previous cancer diagnosis [15]. In all simulations, a dichotomous covariate $Z$ was considered and was assumed to act multiplicatively on the hazard of disease onset on the age scale, with relative hazard $\exp(\beta)$ for the exposed category compared to the unexposed. Age to disease onset was generated from exponential, Weibull or piecewise Weibull distributions. For the exponential distribution, the special case of the exponential family with $\gamma = 0$ and $c = \lambda$ was selected, namely the constant-hazard exponential distribution (later referred to as the ‘exponential distribution’). In contrast, neither Weibull distributions with shape parameter $\rho \neq 1$ (where the baseline hazard function is defined at any age $W = a$ as $h_0(a) = \rho\lambda^{\rho} a^{\rho - 1}$) nor piecewise Weibull distributions belong to the exponential family. To avoid any assumptions about the shape of disease incidence before inclusion into the cohort, we generated distribution functions conditional on disease-free survival until age at inclusion. Right censoring was either administrative (at 10 years of follow-up) or due to death from causes other than breast cancer, with survival times generated using constant rates by 10-year age intervals, as given by the 1995 French national mortality statistics (available on http://sc8.vesinet.inserm.fr:1080).

We first explored situations in which condition 2 held, in order to verify whether it is a sufficient condition for the absence of bias in estimating the effect of a given covariate. Namely, we generated the covariate of interest, $Z_0$, as a balanced dichotomous variable irrespective of age at inclusion. We examined situations in which condition 1 held in addition to condition 2, that is, age to disease onset was generated from an exponential distribution, and situations in which condition 1 did not hold. For the latter case, we used Weibull distributions with shape parameter $\rho = 4$. This value was selected to obtain hazard rates increasing with age (i.e. $\rho > 1$), as is the case with breast cancer incidence up to age 65 years, and to provide a clear departure from exponential distributions (for which $\rho = 1$). To identify situations in which bias could occur, we considered various degrees of association between $Z_0$ and disease occurrence, with six values of relative hazard $\exp(\beta)$, namely 1, 1.5, 2, 5, 10 and 50, for the exposed compared to the unexposed category. Values of $\beta$ as large as $\ln 50$, although rarely observed in practice, were chosen because, by analogy with studies on covariate omission [8, 9], we expected bias magnitude to increase with that of $\beta$. Moreover, as one study also suggested that bias magnitude would be maximal for a censoring percentage of 50 per cent or less [11], we evaluated bias for 50 per cent censoring first, then for heavier censoring, namely 60, 70, 80, 90 and 95 per cent, expecting a decrease in bias magnitude with increasing censoring. Scale parameters $\lambda$ of exponential and Weibull distributions were adjusted to obtain the desired censoring percentages according to the value of $\beta$.

Second, we aimed at evaluating the magnitude of bias in estimating the effect of covariates associated with age. We examined the cases of two fixed covariates, $Z_1$ and $Z_2$, positively associated with age at inclusion, as well as the case of a time-dependent covariate, $Z_3$. Namely, covariate $Z_1$ was moderately associated with age, probabilities of the exposed category being 0.3, 0.4, 0.5, 0.6 and 0.7 in age groups 40–44, 45–49, 50–54, 55–59 and 60+ years, respectively. Covariate $Z_2$ was more strongly associated with age, with corresponding probabilities 0.1, 0.2, 0.5, 0.8 and 0.9. The time-dependent covariate $Z_3$ was defined as a dichotomous variable, changing from the at-birth unexposed level to the exposed level at a given age generated from a normal distribution with parameters derived from that of age at menopause in the E3N cohort (i.e. mean = 50 years, standard deviation = 4.5 years). For all distributions of age to disease onset considered, we selected scale parameters that provided a reasonable fit of the incidence in the unexposed category to the observed incidence of breast cancer in the French population in 1995 for the age range 40–75 years [17] (see Figure 1). The age range 40–75 years was considered in order to encompass the ten years of follow-up of 40–65-year-old women in the E3N study, and the year 1995 was considered because it was the midpoint of the follow-up period (from 1990 to 2000) in that study. Specifically, age to disease onset was generated from an exponential distribution with scale parameter $\lambda = 0.0022$, as well as from a Weibull distribution with shape parameter $\rho = 4$, as before, and scale parameter $\lambda = 0.0076$. Moreover, as the breast cancer incidence curve seemed to show a slight inflexion around age 60 years (Figure 1), a piecewise Weibull distribution was also considered in order to possibly improve the fit relative to a single Weibull distribution. Shape parameters were arbitrarily set at $\rho_1 = 4$ up to age 60 years (i.e. 40–59 years) and $\rho_2 = 0.25$ for age 60 years and over, and respective scale parameters $\lambda_1 = 0.0079$ and $\lambda_2 = 0.0031$ were selected. For all three distributions and all three covariates, the resulting censoring percentages were about 98 per cent in the unexposed category, close to what was observed in the whole E3N cohort. Results are presented for $\beta = 0$ and $\ln 5$.

Figure 1. Cumulative hazard of breast cancer occurrence for ages 40–75 years conditional on disease-free survival up to age 40, estimated from the French national incidence statistics in year 1995, and computed from an exponential distribution with scale parameter $\lambda = 0.0022$, a Weibull distribution with shape parameter $\rho = 4$ and scale parameter $\lambda = 0.0076$, and a piecewise Weibull distribution with parameters $\rho_1 = 4$ and $\lambda_1 = 0.0079$ up to age 60 years and $\rho_2 = 0.25$ and $\lambda_2 = 0.0031$ for age 60 and over.

Data were analysed with five different Cox regression models: using time-on-study as the time-scale without any adjustment for age (model 1), with adjustment for age at recruitment either as a continuous covariate (model 2) or as a categorical variable with a dummy variable for every five-year interval except the first one (model 3), or with stratification on age at recruitment in five categories of five years each (model 4); and using age as the time-scale (model 5). All models included the dichotomous covariate $Z$ and corresponding parameter $\beta$ in their parametric term. For each replication, we estimated $\beta$, its standard error and the standardized test statistic obtained as the ratio of these two estimates. Results presented are the average bias on log relative hazard estimates (i.e. $\beta$ estimates) and associated standard deviation derived from the distribution across the 1000 simulated samples, both multiplied by $10^3$. We assessed the presence of bias using the ratio of the average bias to its standard deviation as a standardized test statistic. As we performed up to $6 \times 6 \times 5 = 180$ tests of significance, bias was considered statistically significant at the 0.05 level for two-sided p-values less than $2.5 \times 10^{-4}$, in order to take into account multiple comparisons. The proportion of replications out of 1000 where the parameter estimate was found statistically different from 0 at the two-sided 0.05 level was used as an estimate of type I error for simulations with $\beta = 0$, or of power for simulations with $\beta \neq 0$. Simulations were all performed using SAS [18].
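The simulations themselves were run in SAS, and the generating algorithm is not given beyond the description above. One standard way to draw age at onset conditional on disease-free survival up to entry is inverse-transform sampling on the cumulative hazard. The Python sketch below assumes the Weibull hazard $h_0(a) = \rho\lambda^{\rho} a^{\rho - 1}$, whose cumulative hazard is $H_0(a) = (\lambda a)^{\rho}$, applies the relative hazard $\exp(\beta z)$ on the age scale, and ignores competing deaths; all function names are hypothetical.

```python
import math
import random


def weibull_cumhaz(age: float, lam: float, rho: float) -> float:
    """Assumed Weibull cumulative hazard H0(a) = (lam * a) ** rho."""
    return (lam * age) ** rho


def draw_onset_age(entry_age: float, z: int, beta: float,
                   lam: float, rho: float, rng: random.Random) -> float:
    """Draw age at onset given disease-free survival to entry_age.

    With h(a | z) = h0(a) * exp(beta * z), the conditional survival is
    S(a | a0) = exp(-exp(beta * z) * (H0(a) - H0(a0))) for a >= a0.
    Solving S(T | a0) = U for U ~ Uniform(0, 1] gives the draw below.
    """
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    target = weibull_cumhaz(entry_age, lam, rho) - math.log(u) / math.exp(beta * z)
    return target ** (1.0 / rho) / lam


def simulate_subject(entry_age: float, z: int, beta: float, lam: float,
                     rho: float, rng: random.Random,
                     max_follow_up: float = 10.0) -> tuple[float, int]:
    """Return (exit age, event indicator) with administrative censoring only."""
    onset = draw_onset_age(entry_age, z, beta, lam, rho, rng)
    end = entry_age + max_follow_up
    if onset <= end:
        return onset, 1
    return end, 0


if __name__ == "__main__":
    rng = random.Random(2004)
    # Illustrative only: subjects entering at 50, relative hazard 5 when exposed,
    # Weibull shape 4 and scale 0.0076 as in Figure 1.
    for z in (0, 1) * 5:
        print(z, simulate_subject(50.0, z, math.log(5.0), 0.0076, 4.0, rng))
```

Because the draw starts from $H_0(a_0)$ rather than from $H_0(0)$, nothing is assumed about incidence before entry, which mirrors the conditioning described in the Methods.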

3.2. Covariate independent from age at inclusion

We first explored situations in which condition 2 held, computing estimates of the log relative hazard $\beta$ associated with the exposed category of the age-independent covariate $Z_0$ from the five models of analysis described above. To begin with, we examined situations in which age to disease onset was generated from exponential distributions, so that both conditions 1 and 2 held. Whatever the degree of association between $Z_0$ and disease risk (as determined by $\beta$) and whatever the censoring percentage (achieved by adjusting scale parameter $\lambda$), bias was found non-significant and at most equal to $6 \times 10^{-3}$. This maximal value was reached for $\beta = \ln 50$ and 95 per cent censoring with all five Cox regression models.

Then, we considered single Weibull distributions with the same shape parameter $\rho = 4$, i.e. not meeting condition 1, and scale parameters adjusted to obtain overall censoring percentages of approximately 50 per cent on average (Table I). Under the null hypothesis, $\beta = 0$, no bias was apparent with any of the five Cox models, with bias equal to 0 up to the third decimal place. Consequently, type I errors were found not statistically different from the two-sided nominal value 0.05. For non-null values of $\beta$, however, the five models gave contrasting results. While the age-dependent Cox model (model 5) provided unbiased $\beta$ estimates in all situations, bias was apparent in some situations for the follow-up-dependent Cox models (models 1–4). When age was completely omitted (model 1), a pronounced and significant underestimation of $\beta$ was observed. Note that, although the magnitude of the mean absolute bias increased with that of $\beta$, the relative bias remained virtually constant, equal to −4.2 per cent to the first decimal place. A far less pronounced overestimation was seen when age at recruitment was included as a continuous adjustment factor (model 2), becoming significant from $\beta = \ln 5$ but remaining moderate. When age was adjusted for in a categorical form (model 3), bias was significant, albeit of very small magnitude, only for the extreme relative hazard of 50 for the exposed category of covariate $Z_0$. Where bias was significant with model 3, it was also significant with the follow-up-dependent Cox model stratified on age at recruitment (model 4), but in the opposite direction. Despite the strong underestimation with model 1, statistical power was estimated at 100 per cent for all five non-null values of $\beta$ considered, as well as with all four other models, because of the very large sample size (i.e. 25 000 events on average).

Table I. Average bias (empirical standard deviation), both $\times 10^3$, on estimates of log relative hazard $\beta$ associated with the exposed category of age-independent covariate $Z_0$.
Models 1–4 use length of follow-up (time-on-study) as the time-scale: model 1, not adjusted for age; model 2, adjusted for age at recruitment as a continuous covariate; model 3, adjusted for age at recruitment as a categorical covariate; model 4, stratified on age at recruitment. Model 5 uses age as the time-scale. Columns β and λ give the simulation parameters; censoring is the overall percentage.

β       λ       Censoring (%)  Model 1     Model 2     Model 3     Model 4     Model 5
ln 1    0.0182  50.3           0 (13)      0 (13)      0 (13)      0 (13)      0 (13)
ln 1.5  0.0173  50.2           −17∗ (13)   +1 (13)     0 (13)      −1 (13)     0 (13)
ln 2    0.0167  49.8           −29∗ (13)   +1 (13)     0 (14)      −1 (13)     0 (13)
ln 5    0.0146  50.2           −68∗ (15)   +3∗ (15)    0 (15)      −3∗ (15)    0 (15)
ln 10   0.0131  50.3           −96∗ (17)   +6∗ (18)    +1 (18)     −4∗ (18)    0 (18)
ln 50   0.0100  50.2           −163∗ (27)  +14∗ (28)   +5∗ (28)    −8∗ (28)    0 (27)

∗Different from the true parameter value at p (two-sided) < $2.5 \times 10^{-4}$.
Results from Cox proportional hazards analysis (five models) of 1000 independent samples of 50 000 individuals, with age to disease onset generated from Weibull distributions with shape parameter $\rho = 4$ and scale parameters $\lambda$ selected to yield approximately 50 per cent overall censoring on average.

We also examined the influence of censoring by considering five higher censoring percentages (i.e. 60, 70, 80, 90 and 95 per cent) in addition to 50 per cent. Again, no bias was observed when age was used as the time-scale (model 5), bias being non-significant and at most equal to $3 \times 10^{-3}$ for all six censoring percentages and all six values of $\beta$ considered. In the Cox regression model with time-on-study as the time-scale and age included as a categorical covariate (model 3), bias was at most $5 \times 10^{-3}$ and appeared significant in one extreme case only, i.e. for $\beta = \ln 50$ and 50 per cent censoring. In the Cox regression model stratified on age (model 4), significant underestimation was observed with values of $\beta$ greater than or equal to $\ln 5$, up to 60 per cent censoring only. Again, the largest bias, that is $-8 \times 10^{-3}$, was seen for $\beta = \ln 50$ and 50 per cent censoring. Results from the two remaining models, i.e. not including age at recruitment (model 1) or including it as a continuous covariate (model 2), are shown in Figure 2. With these two models, we observed no bias for $\beta = 0$ whatever the censoring percentage, and a decrease in bias magnitude with increasing censoring percentages for non-null values of $\beta$. For model 1, significant underestimation persisted up to 80 per cent censoring for $\beta = \ln 1.5$ and up to 90 per cent censoring for higher values of $\beta$. With $\beta = \ln 10$ and 80 per cent censoring, for instance, bias magnitude was moderate, with mean absolute bias equal to $-26 \times 10^{-3}$, corresponding to a relative bias of −1.1 per cent. For model 2, significant overestimation was seen only for values of $\beta$ greater than or equal to $\ln 5$, and persisted up to 60 per cent censoring for $\beta = \ln 5$ and $\ln 10$, and up to 70 per cent censoring for $\beta = \ln 50$. However, it remained less than $15 \times 10^{-3}$ in all cases. It is worth noting that for 95 per cent censoring, a value typical of epidemiologic cohort studies, none of the five models produced significantly biased estimates for any of the six values of $\beta$ considered.
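The near-constant relative bias reported for model 1 can be checked directly from the Table I entries: relative bias is the average bias divided by the true value $\beta = \ln(\text{relative hazard})$. This is a small arithmetic sketch on the published, integer-rounded values, so recomputed percentages are approximate.

```python
import math

# Model 1 (time-on-study, no age adjustment) average bias x 10^3 from Table I,
# keyed by the true relative hazard exp(beta); 50 per cent censoring.
MODEL1_BIAS_X1000 = {1.5: -17, 2: -29, 5: -68, 10: -96, 50: -163}


def relative_bias_percent(bias_x1000: float, relative_hazard: float) -> float:
    """Relative bias in per cent: 100 * bias / beta, with beta = ln(relative hazard)."""
    beta = math.log(relative_hazard)
    return 100.0 * (bias_x1000 / 1000.0) / beta


for rh, bias in MODEL1_BIAS_X1000.items():
    print(f"exp(beta) = {rh:>4}: relative bias = {relative_bias_percent(bias, rh):.1f}%")

# Model 1 with beta = ln 10 and 80 per cent censoring (bias -26 x 10^-3, quoted in the text).
print(f"ln 10, 80% censoring: {relative_bias_percent(-26, 10):.1f}%")
```

Every Table I row rounds to −4.2 per cent, and the 80 per cent censoring case gives −1.1 per cent, matching the figures quoted in the text.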


Figure 2. Average bias ($\times 10^3$) on estimates of log relative hazard $\beta$ associated with the exposed category of age-independent covariate $Z_0$. Results from Cox proportional hazards analysis with time-on-study as the time-scale, unadjusted for age (closed symbols) and adjusted for age as a continuous covariate (open symbols), of 1000 independent samples of 50 000 individuals, with age to disease onset generated from a Weibull distribution with shape parameter $\rho = 4$ and varying scale parameters selected to yield six censoring percentage values.
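The Methods describe the bias test as the ratio of the average bias to its standard deviation, without spelling out the denominator. One reading, which reproduces the starred and unstarred Table I entries checked below, divides the average bias by its Monte Carlo standard error, i.e. the empirical standard deviation over $\sqrt{1000}$. The sketch is an assumption-labelled illustration, not the authors' SAS code; note also that a Bonferroni correction would give $0.05/180 \approx 2.8 \times 10^{-4}$, slightly above the $2.5 \times 10^{-4}$ threshold used.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided standard normal p-value; erfc keeps precision for small p."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bias_test(mean_bias_x1000: float, sd_x1000: float, n_reps: int = 1000,
              alpha: float = 2.5e-4) -> tuple[float, float, bool]:
    """Assumed test: z = mean bias / (empirical SD / sqrt(n_reps)).

    Both inputs are on the x 10^3 scale of the tables; the scale cancels in z.
    alpha defaults to the multiplicity-adjusted threshold used in the paper.
    """
    se = sd_x1000 / math.sqrt(n_reps)
    z = mean_bias_x1000 / se
    p = two_sided_p(z)
    return z, p, p < alpha


# A few Table I cells: (average bias, empirical SD), both x 10^3.
cells = {
    "ln 5, model 2": (3, 15),    # starred in Table I
    "ln 10, model 3": (1, 18),   # not starred
    "ln 50, model 3": (5, 28),   # starred
    "ln 5, model 4": (-3, 15),   # starred
}
for label, (bias, sd) in cells.items():
    z, p, flagged = bias_test(bias, sd)
    print(f"{label}: z = {z:.2f}, p = {p:.1e}, significant = {flagged}")
```

Dividing by the empirical standard deviation itself (without the $\sqrt{1000}$ factor) would flag none of these small biases, which is why the standard-error reading seems the more plausible one.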

3.3. Covariate associated with age at inclusion

We then examined situations in which condition 2 did not hold. Namely, the �xed covariatesZ1 and Z2, respectively moderately and strongly associated with age at inclusion, as well as thetime-dependent covariate Z3, were considered successively. Table II shows the average bias on� estimates from the �ve Cox models of analysis when no e�ect of the covariate of interestwas assumed, that is �=0, and for exponential, Weibull and piecewise Weibull distributionsof age to disease onset. No bias was apparent when the exponential distribution was used togenerate age to disease onset or when data were analysed using Cox’s regression model withage as the time-scale (model 5). For both single and piecewise Weibull distributions, however,strong biases in the direction of overestimation were observed in the model where age wasnot at all taken into account (model 1), whichever the covariate under study. For instance, themean absolute bias was 0.439 for estimating the log relative hazard associated with covariateZ2 when age to disease onset was generated from a Weibull distribution. In the three modelsusing time-on-study as the time-scale and adjusting for or stratifying on age at recruitment,signi�cant biases of smaller magnitude (between 0.032 and 0.166) were observed with thetime-dependent covariate Z3 only. Adjusting for age as a categorical variable (model 3) ratherthan a continuous variable (model 2) reduced bias under the piecewise Weibull distribution butincreased it under the single Weibull distribution. Stratifying on age at recruitment (model 4)yielded overestimation of intermediate magnitude with the single Weibull distribution andof lower magnitude with the piecewise Weibull distribution. Where bias was signi�cant,

Copyright ? 2004 John Wiley & Sons, Ltd. Statist. Med. 2004; 23:3803–3820

CHOICE OF TIME-SCALE IN COX’S MODEL 3811

Table II. Average bias (empirical standard deviation, both × 103) on estimates of log rela-tive hazard � associated with the exposed category of age-associated covariates Z1; Z2; Z3 for

�=0.

Time-scale

Length of follow-up (time-on-study) Age

Distribution Overall Adjusted for ageof age to per cent Not adjusted Strati�eddisease onset Covariate censoring for age Continuous Categorical on age

Exponential Z1 97.9 +3(62) +3(66) +2(66) +3(66) +3(65)Z2 97.9 +3(64) +4(79) +4(82) +4(82) +4(77)Z3 97.9 +5(66) +5(79) +5(79) +4(82) +4(84)

Weibull Z1 97.8 +196∗(61) +3(64) +3(64) +3(64) +3(64)Z2 97.8 +439∗(60) +3(76) +5(77) +5(77) +5(74)Z3 97.8 +512∗(75) +32∗(89) +53∗(93) +41∗(97) −3(99)

Piecewise Weibull Z1 97.8 +93∗(59) +2(62) +3(62) +3(62) +3(61)Z2 97.8 +202∗(62) −7(77) +5(76) +5(76) +5(72)Z3 97.8 +337∗(72) +166∗(87) +104∗(90) +40∗(92) −2(94)

∗Di�erent from the true parameter value at p (two-sided) ¡2:5× 10−4.Results from Cox proportional hazards analysis (�ve models) of 1000 independent samples of 50 000 indi-viduals, with age to disease onset generated from an exponential distribution with scale parameter �=0:0022,a Weibull distribution with shape parameter �=4 and scale parameter �=0:0076, and a piecewise Weibulldistribution with shape parameters �1 = 4 up to age 60 and �2 = 0:25 for age 60 and over, and correspondingscale parameters �1 = 0:0079 and �2 = 0:0031.

type I error was far above its nominal 0.05 value, especially with model 1. With the piecewise Weibull distribution, for instance, type I errors were 32.5, 90.6 and 99.9 per cent for covariates Z1, Z2 and Z3, respectively, with model 1, and 47.1, 21.4 and 7.0 per cent for covariate Z3 with models 2, 3 and 4, respectively. Values with model 1 were even higher with the single Weibull distribution, namely 89.5, 100.0 and 100.0 per cent for covariates Z1, Z2 and Z3, respectively. They were lower, however, for covariate Z3 with models 2, 3 and 4, namely 6.5, 8.0 and 5.9 per cent.

In Table III, the disease hazard was five times as high in the exposed as in the unexposed group (β = ln 5). Results with β = ln 5 were very similar to those with β = 0. Again, no bias was apparent for the exponential incidence function or for Cox's regression models with age as the time-scale (model 5). For the single and piecewise Weibull distributions, bias was still large with model 1 for the effect of all three covariates Z1, Z2 and Z3, and its magnitude was virtually the same as for β = 0. For the analysis of the time-dependent covariate Z3, substantial bias persisted with models 2–4, with a possibly slightly higher magnitude than for β = 0 with models 2 and 3. Of note, bias became significant with model 2 for the analysis of the highly age-associated covariate Z2 for both the single and piecewise Weibull distributions, and for the analysis of the moderately age-associated covariate Z1 for the piecewise Weibull distribution only. Statistical power to detect an association between disease risk and each of the three covariates Z1, Z2 and Z3 was 100 per cent with all models because of the large sample size (i.e. over 2500 events on average).
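The entries of Tables II and III, and the type I error and power figures quoted above, are summaries over the 1000 simulated samples. The sketch below shows one way to compute them, assuming each sample yields an estimate of β and its model-based standard error. The function name is hypothetical, and both the z test used to flag bias (mean estimate against the true value, scaled by the empirical standard deviation) and the Wald test used for rejection rates are our assumptions; only the 2.5 × 10⁻⁴ threshold comes from the table footnotes.

```python
import math
import statistics
from statistics import NormalDist


def summarize_replicates(estimates, std_errors, beta_true,
                         bias_alpha=2.5e-4, test_alpha=0.05):
    """Summarize simulated estimates of a log relative hazard.

    estimates  -- beta-hat from each simulated sample
    std_errors -- model-based standard error of each beta-hat
    beta_true  -- log relative hazard used to generate the data
    """
    n = len(estimates)
    mean_bias = statistics.fmean(estimates) - beta_true
    emp_sd = statistics.stdev(estimates)  # empirical SD across samples

    # Assumed bias test: z statistic for "mean estimate == true value",
    # using the empirical SD of the estimates.
    z = mean_bias / (emp_sd / math.sqrt(n))
    p_bias = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

    # Share of samples in which the Wald test of beta = 0 rejects:
    # the type I error when beta_true == 0, the power otherwise.
    crit = NormalDist().inv_cdf(1.0 - test_alpha / 2.0)
    rejections = sum(abs(b / se) > crit for b, se in zip(estimates, std_errors))

    return {
        "bias_x1000": round(1000 * mean_bias),
        "sd_x1000": round(1000 * emp_sd),
        "flagged": p_bias < bias_alpha,
        "rejection_rate": rejections / n,
    }
```

For the β = ln 5 setting, the same function would be called with `beta_true=math.log(5)`, and `rejection_rate` then estimates power rather than type I error.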


Table III. Average bias (empirical standard deviation), both × 10³, on estimates of the log relative hazard β associated with the exposed category of the age-associated covariates Z1, Z2 and Z3 for β = ln 5.

Models 1–4 use time-on-study (length of follow-up) as the time-scale: not adjusted for age (model 1), adjusted for age as a continuous (model 2) or categorical (model 3) covariate, or stratified on age (model 4). Model 5 uses age as the time-scale.

Distribution of age   Covariate   Overall per cent   Model 1       Model 2       Model 3       Model 4      Model 5
to disease onset                  censoring
Exponential           Z1          94.3               +1 (46)       +1 (47)       +1 (47)       +1 (47)      +1 (47)
                      Z2          94.9               +1 (46)       +1 (54)       +2 (55)       +2 (55)      +2 (53)
                      Z3          92.5               +3 (57)       +3 (61)       +3 (62)       +3 (64)      +4 (66)
Weibull               Z1          93.9               +189* (46)    +3 (48)       +1 (48)       +1 (48)      +2 (48)
                      Z2          94.1               +433* (45)    +11* (53)     +2 (55)       +2 (55)      +3 (52)
                      Z3          91.5               +511* (65)    +38* (70)     +57* (71)     +40* (74)    −4 (75)
Piecewise Weibull     Z1          94.1               +89* (45)     +8* (46)      −1 (46)       0 (46)       0 (46)
                      Z2          94.5               +198* (45)    +51* (53)     −3 (53)       −1 (53)      −1 (51)
                      Z3          91.7               +350* (63)    +204* (67)    +129* (69)    +43* (70)    0 (71)

* Different from the true parameter value at two-sided p < 2.5 × 10⁻⁴.
Results from Cox proportional hazards analysis (five models) of 1000 independent samples of 50 000 individuals, with age to disease onset generated from an exponential distribution with scale parameter 0.0022, a Weibull distribution with shape parameter 4 and scale parameter 0.0076, and a piecewise Weibull distribution with shape parameters 4 up to age 60 and 0.25 for age 60 and over, and corresponding scale parameters 0.0079 and 0.0031.
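The table footnotes give only the numerical parameter values. As an illustration of how an age at onset could be drawn for a subject who is disease-free at inclusion (left truncation at the entry age), the sketch below assumes the Weibull cumulative hazard H(a) = (λa)^ν, with shape ν and scale λ (ν = 1 gives the exponential case), and a two-piece version that is continuous at age 60. This parameterization and all function names are our assumptions, not necessarily those used in the simulation study. The draw solves H(T) = H(entry age) + E with E ~ Exp(1), the usual inverse-cumulative-hazard method.

```python
import random
from functools import partial


def cum_hazard_weibull(age, shape, scale):
    """Assumed Weibull parameterization: H(a) = (scale * a) ** shape.
    shape == 1 gives the exponential distribution with rate `scale`."""
    return (scale * age) ** shape


def inv_cum_hazard_weibull(h, shape, scale):
    """Age a such that (scale * a) ** shape == h."""
    return h ** (1.0 / shape) / scale


def cum_hazard_piecewise(age, shape1, scale1, shape2, scale2, cut=60.0):
    """Two-piece Weibull cumulative hazard, continuous at `cut` (assumed form)."""
    if age < cut:
        return cum_hazard_weibull(age, shape1, scale1)
    return (cum_hazard_weibull(cut, shape1, scale1)
            + cum_hazard_weibull(age, shape2, scale2)
            - cum_hazard_weibull(cut, shape2, scale2))


def inv_cum_hazard_piecewise(h, shape1, scale1, shape2, scale2, cut=60.0):
    """Inverse of cum_hazard_piecewise."""
    h_cut = cum_hazard_weibull(cut, shape1, scale1)
    if h < h_cut:
        return inv_cum_hazard_weibull(h, shape1, scale1)
    return inv_cum_hazard_weibull(
        h - h_cut + cum_hazard_weibull(cut, shape2, scale2), shape2, scale2)


def draw_onset_age(entry_age, cum_hazard, inv_cum_hazard, rng):
    """Age at disease onset for someone disease-free at entry_age
    (left truncation): solve H(T) = H(entry_age) + E with E ~ Exp(1)."""
    return inv_cum_hazard(cum_hazard(entry_age) + rng.expovariate(1.0))


# Example with the piecewise Weibull values from the table footnotes.
pw = dict(shape1=4.0, scale1=0.0079, shape2=0.25, scale2=0.0031)
H = partial(cum_hazard_piecewise, **pw)
H_inv = partial(inv_cum_hazard_piecewise, **pw)
rng = random.Random(2004)
onset = draw_onset_age(45.0, H, H_inv, rng)
# Hypothetical administrative censoring at age 75.
observed_age, event = min(onset, 75.0), int(onset <= 75.0)
```

Generating onset conditionally on the entry age, rather than from birth, mirrors the fact that cohort members are disease-free when recruited.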

4. EXAMPLE

We first assessed whether our data were in agreement with condition 1, that is, h0(a) = c exp(γa) for some constants c and γ at any age W = a, by examining a plot of the Nelson–Aalen estimate of the cumulative hazard of breast cancer in the E3N cohort as a function of age (Figure 3). This plot provided some evidence against condition 1, and the possible change of slope observed around age 60 years appeared consistent with a two-piece distribution, such as a piecewise exponential or a piecewise Weibull distribution. The maximum likelihood estimate of the shape parameter of a single Weibull distribution, 2.13 (95 per cent confidence interval 1.83–2.42), suggested a departure from an exponential distribution. When a two-piece Weibull distribution (<60 and ≥60 years) was allowed, the shape estimates were greater than 1 for younger ages and less than 1 for older ages.

[Figure 3: Nelson–Aalen cumulative hazard estimate (vertical axis, 0.00 to 0.10) against age in years (horizontal axis, 40 to 75).]
Figure 3. Cumulative hazard estimates (using Nelson–Aalen's method) for breast cancer occurrence as a function of age in 84 329 women of the E3N cohort.

Table IV shows the estimated log relative hazards of postmenopausal status, as a fixed (top part) and a time-dependent (bottom part) covariate, from the five Cox models used in the simulation study. The fixed covariate menopausal status at recruitment is intrinsically dependent on age at recruitment, postmenopausal status becoming increasingly likely with increasing age. Indeed, this covariate showed an even stronger association with age at recruitment in our example than covariate Z2 in our simulations, with observed percentages of postmenopausal women at recruitment of 4.5, 19.7, 66.1, 96.3 and 99.7 per cent for age groups 40–44, 45–49, 50–54, 55–59 and 60+ years, respectively. From Cox's model with time-on-study as the time-scale unadjusted for age (model 1), no association with breast cancer risk could be inferred, whereas with all four models taking age into account (models 2–5), being postmenopausal at recruitment appeared to be associated with a decrease in breast cancer risk, the effect being slightly more pronounced when age was used as the time-scale (model 5). The corresponding decreases in breast cancer risk, computed as 100[1 − exp(β̂)], were 28, 27, 27 and 31 per cent with models 2, 3, 4 and 5, respectively.

Table IV. Estimates of the log relative hazard of breast cancer associated with postmenopausal status (standard error) upon analysing menopausal status as a fixed or time-dependent covariate.

Models 1–4 use time-on-study (length of follow-up) as the time-scale: not adjusted for age (model 1), adjusted for age as a continuous (model 2) or categorical (model 3) covariate, or stratified on age (model 4). Model 5 uses age as the time-scale.

Covariate                                     Model 1          Model 2          Model 3          Model 4          Model 5
Post- vs pre-menopausal at inclusion          +0.011 (0.041)   −0.329* (0.063)  −0.318* (0.062)  −0.318* (0.062)  −0.375* (0.058)
(fixed covariate)
Post- vs pre-menopausal over time             +0.212* (0.045)  +0.124* (0.057)  +0.036 (0.059)   +0.035 (0.062)   +0.010 (0.063)
(time-dependent covariate)

* Different from 0 at two-sided p < 0.05.
Results from Cox proportional hazards analysis (five models) of 84 329 women from the E3N cohort.
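Both checks of condition 1 described at the start of this section, the Nelson–Aalen curve of Figure 3 and the Weibull fit, must respect delayed entry on the age scale: a woman contributes to the risk set at age a only if she entered the cohort before a and is still under follow-up. The sketch below assumes data given as entry ages, exit ages and event indicators, and the Weibull cumulative hazard H(a) = (λa)^ν; the function names and the coarse grid search over ν are illustrative choices, not the procedure used for the published estimate.

```python
import math


def nelson_aalen_delayed_entry(entry, exit_age, event):
    """Nelson-Aalen cumulative hazard on the age scale with delayed entry.

    Subject i is in the risk set at age t if entry[i] < t <= exit_age[i].
    Returns (event age, cumulative hazard) pairs.
    """
    cum, curve = 0.0, []
    for t in sorted({x for x, d in zip(exit_age, event) if d}):
        at_risk = sum(1 for e, x in zip(entry, exit_age) if e < t <= x)
        cases = sum(1 for x, d in zip(exit_age, event) if d and x == t)
        cum += cases / at_risk
        curve.append((t, cum))
    return curve


def weibull_profile_mle(entry, exit_age, event, grid=None):
    """Weibull fit with left truncation, assuming H(a) = (lam * a) ** nu.

    For fixed nu, the log-likelihood
        D*log(nu) + D*log(theta) + (nu - 1)*sum(d_i*log x_i) - theta*S(nu),
    with theta = lam ** nu and S(nu) = sum(x_i**nu - e_i**nu), is maximized
    by theta = D / S(nu); nu is then chosen on a (coarse) grid.
    Returns (nu_hat, lam_hat).
    """
    if grid is None:
        grid = [0.05 * k for k in range(2, 201)]  # nu from 0.10 to 10.00
    n_events = sum(event)
    sum_log_x = sum(math.log(x) for x, d in zip(exit_age, event) if d)
    best_nu, best_ll = None, -math.inf
    for nu in grid:
        s = sum(x ** nu - e ** nu for e, x in zip(entry, exit_age))
        theta = n_events / s
        ll = (n_events * math.log(nu) + n_events * math.log(theta)
              + (nu - 1) * sum_log_x - n_events)
        if ll > best_ll:
            best_nu, best_ll = nu, ll
    s = sum(x ** best_nu - e ** best_nu for e, x in zip(entry, exit_age))
    return best_nu, (n_events / s) ** (1.0 / best_nu)
```

A shape estimate clearly above 1 argues against a constant hazard. Under this model the likelihood of a two-piece fit factorizes, so the same function can be applied to follow-up censored at age 60 and, separately, to follow-up after age 60 with entry set to max(entry, 60).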


When the model is not controlled for the effect of age (model 1), menopausal status at recruitment may act as a proxy for age at recruitment, which in turn is strongly determined by birth cohort, as recruitment took place over a relatively short period of calendar time. Because breast cancer incidence rates do not vary monotonically with age, the increasing rates up to age 60 years may tend to counterbalance the decreasing rates after age 60, possibly leading to the null association we observed with model 1 between postmenopausal status at recruitment and breast cancer risk. However, when model 1 was stratified on five-year birth cohort intervals, the log relative hazard estimate of postmenopausal status changed notably from +0.011 to −0.331 and became much closer to the estimates from models 2–5. For models using time-on-study as the time-scale and controlling for the effect of age (models 2–4), women postmenopausal at a given age are compared, for their risk of breast cancer after some length of follow-up, with women who were still premenopausal at the same age. Hence, the reduction in risk that we observed is consistent with the well-known protective effect of an earlier age at menopause on breast cancer risk (see Reference [19] for instance). When age is used as the time-scale (model 5), postmenopausal women are compared with premenopausal women for their risk of breast cancer at a given age, regardless of their age at recruitment. Because women postmenopausal at recruitment are likely to belong to earlier birth cohorts, we verified in the E3N cohort that breast cancer incidence rates tended to be lower in earlier than in later birth cohorts within each age interval. Therefore, the log relative hazard of postmenopausal status at recruitment, which was greater in magnitude with model 5, may encompass a birth cohort effect in addition to the effect of an earlier age at menopause. Besides, when stratifying on one-year birth cohort intervals, the log relative hazard estimate from model 5 (−0.338) became slightly closer to the estimates from models 2–4, whereas it remained virtually unchanged when stratifying on five-year birth cohort intervals (−0.373).

Such results appear to be in line with those from our simulations in Table III for covariate Z2 and the Weibull or piecewise Weibull distributions. Indeed, we observed a difference of 0.386 between the estimates from models 1 and 5 in the example, a value lying between the mean differences of 0.434 and 0.197 obtained in simulations for the single and piecewise Weibull distributions, respectively. Nonetheless, some caution may be needed in interpreting these results since, contrary to the simulation study, Cox's model with age as the time-scale may also be misspecified. For instance, the proportional hazards assumption appeared questionable on the age scale, with some evidence of crossing hazards early on and a possibly larger difference between women pre- and postmenopausal at recruitment after age 60 years (Figure 4). However, the proportional hazards assumption could not be rejected on the time-on-study scale (Figure 4). Since the log relative hazard estimate from model 5 and those from models 2–4 appear close, the latter finding seems to indicate that the departure from the proportional hazards assumption on the age scale did not incur large bias.

The bottom part of Table IV shows estimates for menopausal status considered as a time-dependent covariate, which therefore does not satisfy condition 2 either. This covariate is very similar by construction to covariate Z3 in our simulations, because Z3 was derived from the distribution of age at menopause in the E3N cohort. In the example data, postmenopausal status appeared to be associated with increased breast cancer risk with model 1, and to a lesser extent with model 2. No such association was apparent from models 3, 4 or 5. Hence, menopausal status did not seem to have an effect on breast cancer risk beyond that of age. The reason might be that, owing to the difficulty of assessing age at menopause precisely, menopausal status during follow-up mainly acts as a proxy for age.
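Analysing menopausal status as a time-dependent covariate requires splitting each woman's follow-up at her age at menopause, so that she contributes premenopausal person-time up to that age and postmenopausal person-time afterwards. A minimal sketch in counting-process (start, stop] form, assuming one record per woman with entry age, exit age, event indicator and age at menopause (None if menopause did not occur during follow-up); the record layout and names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Episode(NamedTuple):
    start: float          # age at start of interval (exclusive)
    stop: float           # age at end of interval (inclusive)
    event: int            # 1 if breast cancer was diagnosed at `stop`
    postmenopausal: int   # covariate value on this interval


def split_at_menopause(entry: float, exit_age: float, event: int,
                       menopause_age: Optional[float]) -> List[Episode]:
    """Split follow-up (entry, exit_age] at the age at menopause."""
    if menopause_age is None or menopause_age >= exit_age:
        # Premenopausal throughout follow-up.
        return [Episode(entry, exit_age, event, 0)]
    if menopause_age <= entry:
        # Already postmenopausal at inclusion.
        return [Episode(entry, exit_age, event, 1)]
    # Menopause during follow-up: no event can occur in the first interval.
    return [Episode(entry, menopause_age, 0, 0),
            Episode(menopause_age, exit_age, event, 1)]
```

Each episode then enters Cox's model with delayed entry at `start`, on either time-scale; with time-on-study as the time-scale, `start` and `stop` would be shifted by the entry age.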


Figure 4. Graphical check of the proportional hazards assumption for menopausal status at inclusion on the age (top panel) and time-on-study (bottom panel) scales. Plots of ln(−ln S(w)) for the Kaplan–Meier estimate S(w) of time to breast cancer onset from the analysis of 84 329 women from the E3N cohort.
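Under proportional hazards, the curves ln(−ln S(w)) for the two menopausal groups in Figure 4 should be roughly parallel, since ln(−ln S(w)) is the log cumulative hazard. A sketch of the computation for one group, assuming lists of entry ages, exit ages and event indicators: on the age scale the risk set must respect delayed entry, while on the time-on-study scale every woman enters at 0 and the same function applies with entry set to 0 and exit set to the length of follow-up. The function name is hypothetical.

```python
import math


def km_log_minus_log(entry, exit_age, event):
    """Kaplan-Meier estimate with delayed entry, returned as
    (w, ln(-ln S(w))) at each event time where 0 < S(w) < 1.

    Subject i is at risk at time t if entry[i] < t <= exit_age[i].
    With delayed entry, S is conditional on survival to the earliest
    entry time and is unstable while the risk set is still small.
    """
    surv, points = 1.0, []
    for t in sorted({x for x, d in zip(exit_age, event) if d}):
        at_risk = sum(1 for e, x in zip(entry, exit_age) if e < t <= x)
        cases = sum(1 for x, d in zip(exit_age, event) if d and x == t)
        surv *= 1.0 - cases / at_risk
        if 0.0 < surv < 1.0:
            points.append((t, math.log(-math.log(surv))))
    return points
```

Applying it separately to pre- and postmenopausal women and plotting both point sets against w reproduces the kind of display shown in Figure 4.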

Results from the example data for menopausal status as a time-dependent covariate seem consistent with those obtained from our simulations in Table II for covariate Z3 and the Weibull or piecewise Weibull distributions. Relative to model 5, overestimation decreased from model 1 to model 4, in parallel with the simulation results for the piecewise Weibull distribution. However, the difference between the estimates from models 3 and 5 was less marked in the example than in the simulation study.

5. DISCUSSION

We showed that bias can occur in Cox's regression analysis of epidemiologic cohort data when time-on-study is used as the time-scale instead of age, even when age is adjusted for (or stratified on). Such bias can be relatively severe in case of substantial association between age and the covariate under study, especially for time-dependent covariates such as menopausal status. Therefore, using time-on-study as the time-scale can be misleading in such situations and should be discouraged. These results agree with reports of similarly discrepant findings between the time-on-study and age scales for fixed covariates strongly associated with age, obtained from either limited simulations [5] or a few data examples [4, 14].

Moreover, our simulations showed that bias can occur even for covariates independent of age. Independence between the covariate of interest and age at recruitment is therefore not a sufficient condition for obtaining unbiased estimates in Cox's regression models with time-on-study as the time-scale, even when age is adjusted for, contrary to Korn et al.'s assertion [4]. However, although significant bias was observed after adjustment for age in our simulations, its magnitude was not large and it was mainly noticeable for mid-range censoring percentages, a situation much more typical of clinical studies than of epidemiologic cohort studies, which are usually characterized by heavy censoring (e.g. in the range 90–100 per cent). Hence, large bias is unlikely in the analysis of covariates independent of age in epidemiologic cohort studies, as long as age is adjusted for, so that Korn et al.'s claim holds in practice. The same conclusion holds when age at recruitment is stratified on.

There can be several reasons why bias can occur when using time-on-study rather than age as the time-scale, depending on whether or not the covariate of interest is associated with age. First, for covariates associated with age, age can act as a confounder of the association between these covariates and the risk of disease occurrence, unless age is unrelated to disease occurrence. Indeed, whereas no bias was found in simulations with a disease hazard constant with age (i.e. under an exponential distribution), even for the follow-up-dependent Cox model unadjusted for age (model 1), upward confounding bias was observed with model 1 in simulations under the single Weibull distribution for all three age-associated covariates Z1, Z2 and Z3. Such bias could be expected in view of the positive associations of age with these covariates and with disease hazard. In simulations under the piecewise Weibull distribution, with disease hazard increasing up to age 60 years and then decreasing, we still observed upward confounding bias with model 1, possibly because of the dominant weight of the 40–59 year age interval, in which more person-years accumulated than in the 60–75 year age interval. Of note, in simulations under the null hypothesis, for both the single and piecewise Weibull distributions, we found that upward confounding bias could cause spurious associations between disease risk and any of the three age-associated covariates considered.

While omitting age altogether can yield confounding bias, residual confounding bias is still possible if age is included but its effect is misspecified. As for confounding bias, however, age would not act as a residual confounder if it were not associated with disease hazard. Accordingly, in our simulations, we did not observe any bias under the exponential distribution, whether age was adjusted for in a continuous (model 2) or a categorical (model 3) manner, or stratified on (model 4). The lack of bias with model 2 could also be expected from Korn et al.'s formal proof about exponential family distributions [4], to which the exponential distribution belongs. In contrast, in simulations under the single and piecewise Weibull distributions, we observed upward residual confounding bias with models 2–4, but of markedly smaller magnitude than the confounding bias with model 1. Bias in models 2 and 3 may result from misspecification of the age effect if the parametric assumptions on the relationship between age and disease hazard on the time-on-study scale do not hold. Indeed, when age is included as a continuous covariate in a follow-up-dependent Cox model (as in model 2), a log-linear relationship between age and disease hazard is assumed, whereas including age in five-year intervals (as in model 3) implies a constant multiplicative effect on the hazard rate within each age group. Other forms of age adjustment, for instance fractional polynomial or spline regression [20], which we did not investigate here, might decrease bias but would not be expected to eliminate it completely, as long as they rely on parametric assumptions for the age effect. Upon stratifying on age at inclusion (as in model 4), no parametric assumptions are needed on the relationship between age and disease hazard; however, five-year age intervals may be too wide to allow effective control of the effect of age.

Second, for covariates independent of age, confounding bias is no longer an issue, but other types of bias, specific to non-linear regression models such as Cox's model, may occur owing to the omission of age or to model misspecification. Omitting a prognostic factor independent of the covariate of interest (i.e. age in our case) has been shown to bias estimates of the covariate effect toward the null, in situations with right censoring only [6–13]. This finding was illustrated by our observation of an underestimated relative hazard for the age-independent covariate Z0 with model 1. Furthermore, it has been reported that this underestimation may lead to a loss of power to detect an association of the covariate of interest with disease risk, despite a small gain in precision [9, 13]. In our simulations, the sample size was such that power was always estimated at 100 per cent, even in cases of strong underestimation, but a limited gain in precision was suggested with model 1 compared with models 2 and 3 in some instances. Also consistent with results from studies on covariate omission [8, 9], we observed that β = 0 was a sufficient condition for the absence of bias and that the magnitude of bias increased with that of β. In the case of an exponential distribution, no bias occurred, as expected because of the lack of association between the omitted factor, age, and disease risk [7–9]. Finally, we observed that the magnitude of bias decreased as censoring percentages increased above 50 per cent, again in agreement with previous findings [11].

Mismodelling age would be expected to have a similar, though smaller, consequence as omitting age on the estimated effect of an independent covariate. Two studies [7, 13] considered estimating or testing the log relative hazard associated with the exposed category of a dichotomous covariate T (e.g. treatment group) in the presence of an independent continuous covariate X. Fitting T along with a dichotomized version of X was shown to result in downward bias in estimating the log relative hazard associated with the exposed level of T [13], and in a loss of power in assessing the association between T and disease risk [7, 13]. Accordingly, we verified in additional simulations under Weibull distributions that adjusting for age dichotomized at the median age at inclusion in the E3N study (i.e. ≥48 vs <48 years) in Cox's models with time-on-study as the time-scale yielded downward bias in estimating the log relative hazard associated with Z0, as with model 1, though of smaller magnitude (data not shown).
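The role of condition 1 in the absence of bias under an exponential baseline can be made explicit. Write e for age at inclusion, t for time-on-study (so that age is a = e + t), z for the covariate, and condition 1 as h0(a) = c exp(γa) for constants c and γ. The hazard on the age scale then rewrites as a proportional hazards model on the time-on-study scale:

$$
h(t \mid e, z) = h_0(e + t)\,e^{\beta z}
= c\,e^{\gamma (e + t)}\,e^{\beta z}
= \underbrace{c\,e^{\gamma t}}_{\tilde h_0(t)}\;\exp(\gamma e + \beta z).
$$

Age at inclusion thus enters log-linearly alongside z with an unspecified baseline \(\tilde h_0(t)\), which is exactly model 2. When γ = 0 (the exponential case simulated here), e drops out altogether, which is why even model 1 was unbiased. Under a Weibull baseline h0(a) = νλ(λa)^{ν−1} (our notation), by contrast,

$$
\frac{h_0(e + t)}{h_0(e' + t)} = \left(\frac{e + t}{e' + t}\right)^{\nu - 1},
$$

which varies with t unless ν = 1, so age at inclusion cannot act as a fixed multiplier of a common time-on-study baseline and model 2 is misspecified.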


Other forms of model misspecification could explain the overestimation we observed in estimating the log relative hazard associated with covariate Z0 with models 2–4, such as departures from the proportional hazards assumption on the time-on-study scale for either the covariate of interest, Z0, or (in models 2 and 3) the adjustment covariate, age. Several authors have examined the issue of non-proportionality in Cox's regression models under a closed cohort structure and found that incorrectly assuming proportional hazards may yield biased estimates of associations. From simulations, Schemper [21] observed downward (respectively, upward) bias in case of converging (respectively, diverging) hazards. Lagakos and Schoenfeld [7] showed that the loss of power is expected to be moderate in such situations, and larger in case of crossing hazards, because hazard ratios before and after crossing can then compensate each other and make the effect of the covariate disappear. Departures from the proportional hazards assumption on the time-on-study scale for the adjustment covariate age could influence the assessment of the association of the covariate of interest with disease risk and cause bias in an unpredictable direction.

Besides age, calendar period and birth cohort may play a role in disease incidence. Because at most two of the age, calendar period and birth cohort effects are uniquely identifiable in longitudinal analyses, possible options are to use Cox's regression model with age as the time-scale stratified on birth cohort [4, 14] or on calendar period (using time-dependent strata) [3], or Cox's model with time-on-study as the time-scale stratified on birth cohort [22]. All these models should be equivalent to one another since they make no parametric assumptions on the effects of age, calendar period and birth cohort, but exact equivalence may depend on the precision of the strata used. In the analysis of epidemiologic cohort data, considering the birth cohort effect in addition to the age effect may be highly relevant for correct interpretation [4, 14], as illustrated with our example data.

Although age appears to be the most natural time variable in most situations, other choices such as time since first full-term pregnancy [23] or duration of employment [3] may be equally or more relevant in some situations. In such cases, it might be worth analysing the data with several time-scales in an exploratory fashion. Because subjects are rarely followed up from birth in epidemiologic studies, the choice of time-scale in these studies most often requires prior identification of the time point at which subjects first become at risk for the disease of interest. In contrast, the choice of time-scale is usually straightforward in clinical studies, with inclusion coinciding with the beginning of the at-risk period in most cases. However, it may sometimes be more problematic if age is itself the prognostic factor of interest. For instance, based on the SEER data, Cheung et al. [24] found that a younger age at breast cancer diagnosis was associated with higher mortality when they used Cox's model with age as the time-scale, whereas they found the opposite with time since diagnosis as the time-scale. It should be noted that formal statistical procedures have been derived for selecting an optimal time-scale or combination of time-scales [25], which may prove useful when there are several plausible time-scales.

In conclusion, we strongly recommend using age (or another meaningful time variable, if applicable) rather than time-on-study as the time-scale for the analysis of epidemiologic cohort studies. As most common statistical software packages now allow the analysis of left-truncated data [26], there is no practical impediment to using Cox's regression with age as the time-scale. Moreover, our results suggest that a critical review of findings from published studies that used Cox's regression with time-on-study as the time-scale may be warranted, particularly for studies whose covariates of interest were either time-dependent or strongly associated with age.
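Fitting Cox's model with age as the time-scale amounts to building each risk set from the subjects who entered before the event age and are still under follow-up, optionally within strata such as birth cohort. The sketch below fits a single 0/1 covariate by Newton–Raphson on the Breslow partial likelihood under those assumptions (one binary covariate, Breslow handling of tied event ages, quadratic-time risk-set construction). It is a teaching illustration with hypothetical names, not a substitute for the left-truncation facilities of established software; the data generator is likewise hypothetical and only serves to exercise the fit.

```python
import math
import random
from collections import defaultdict


def cox_delayed_entry(entry, exit_age, event, z, strata=None, iters=25, tol=1e-10):
    """Log relative hazard for a 0/1 covariate z, with age as the time-scale.

    Risk set at event age t in stratum s: members of s with entry < t <= exit.
    Breslow approximation for ties. Returns (beta_hat, standard error).
    """
    if strata is None:
        strata = [0] * len(entry)
    members_by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        members_by_stratum[s].append(i)
    # One term per distinct event age per stratum:
    # (number of events, exposed events, unexposed at risk, exposed at risk).
    terms = []
    for members in members_by_stratum.values():
        for t in sorted({exit_age[i] for i in members if event[i]}):
            cases = [i for i in members if event[i] and exit_age[i] == t]
            risk = [i for i in members if entry[i] < t <= exit_age[i]]
            n1 = sum(z[i] for i in risk)
            terms.append((len(cases), sum(z[i] for i in cases), len(risk) - n1, n1))
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for d, z_cases, n0, n1 in terms:
            p = n1 * math.exp(beta) / (n0 + n1 * math.exp(beta))  # risk-weighted mean of z
            score += z_cases - d * p
            info += d * p * (1.0 - p)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)  # SE from information at the last iterate


def simulate_left_truncated(n, beta, seed=0, shape=2.0, scale=0.02, follow_up=20.0):
    """Hypothetical test data: H0(a) = (scale * a) ** shape on the age scale,
    entry ages uniform on 40-60, censoring after `follow_up` years."""
    rng = random.Random(seed)
    entry, exit_age, event, z = [], [], [], []
    for _ in range(n):
        e, zi = rng.uniform(40.0, 60.0), rng.randint(0, 1)
        # Solve H0(T) = H0(e) + E * exp(-beta * zi) with E ~ Exp(1).
        h = (scale * e) ** shape + rng.expovariate(1.0) * math.exp(-beta * zi)
        t = h ** (1.0 / shape) / scale
        end = e + follow_up
        entry.append(e)
        exit_age.append(min(t, end))
        event.append(int(t <= end))
        z.append(zi)
    return entry, exit_age, event, z
```

Passing birth-cohort labels as `strata` gives the age-scale model stratified on birth cohort discussed above, and 100 × (1 − exp(β̂)) converts a negative estimate into a percentage decrease in risk; for example, β̂ = −0.375 corresponds to about 31 per cent.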

ACKNOWLEDGEMENTS

Anne Thiébaut is grateful to 'Fondation de France' for its financial support and to Françoise Clavel-Chapelon for making the E3N cohort data available. The authors also wish to thank Klervi Leuraud for her help in computations and all E3N members who permitted the use of a dozen computers. The E3N study is being carried out with the financial support of 'Ligue Nationale contre le Cancer', the European Community, Company 3M, 'Mutuelle Générale de l'Education Nationale', 'Institut National de la Santé et de la Recherche Médicale', Gustave Roussy Institute and several General Councils in France. The investigators are indebted to all participants for providing the data and to practitioners for their active collaboration in the E3N study.

REFERENCES

1. Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34(2):187–220.

2. Commenges D, Letenneur L, Joly P, Alioum A, Dartigues JF. Modelling age-specific risk: application to dementia. Statistics in Medicine 1998; 17(17):1973–1988.

3. Breslow NE, Lubin JH, Marek P, Langholz B. Multiplicative models and cohort analysis. Journal of the American Statistical Association 1983; 78(381):1–12.

4. Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. American Journal of Epidemiology 1997; 145(1):72–80.

5. Commenges D, Letenneur L, Joly P. Re: serum transferrin saturation, stroke incidence, and mortality in women and men. The NHANES I Epidemiologic Followup Study (letter). American Journal of Epidemiology 1997; 146(8):683–684.

6. Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 1984; 71(3):431–444.

7. Lagakos SW, Schoenfeld DA. Properties of proportional-hazards score tests under misspecified regression models. Biometrics 1984; 40(4):1037–1048.

8. Struthers CA, Kalbfleisch JD. Misspecified proportional hazard models. Biometrika 1986; 73(2):363–369.

9. Schumacher M, Olschewski M, Schmoor C. The impact of heterogeneity on the comparison of survival times. Statistics in Medicine 1987; 6(7):773–784.

10. Bretagnolle J, Huber-Carol C. Effects of omitting covariates in Cox's model for survival data. Scandinavian Journal of Statistics 1988; 15:125–138.

11. Chastang C, Byar D, Piantadosi S. A quantitative study of the bias in estimating the treatment effect caused by omitting a balanced covariate in survival models. Statistics in Medicine 1988; 7(12):1243–1255.

12. Ford I, Norrie J, Ahmadi S. Model inconsistency, illustrated by the Cox proportional hazards model. Statistics in Medicine 1995; 14(8):735–746.

13. Schmoor C, Schumacher M. Effects of covariate omission and categorization when analysing randomized trials with the Cox model. Statistics in Medicine 1997; 16(1–3):225–237.

14. Ingram DD, Makuc DM. Age versus length of follow-up as the time scale in Cox regression models: an empirical study. American Statistical Association 1996 Proceedings of the Biometrics Section. American Statistical Association: Alexandria, VA, 1997; 407–410.

15. Clavel-Chapelon F, van Liere MJ, Guibout C, Niravong MY, Goulard H, Le Corre C, Hoang LA, Amoyel J, Auquier A, Duquesnel E. E3N, a French cohort study on cancer risk factors. European Journal of Cancer Prevention 1997; 6(5):473–478.

16. StataCorp. Stata Statistical Software: Release 7.0. Stata Corporation: College Station, TX, 2001.

17. Ménégoz F, Chérié-Challine L. Le Cancer en France: Incidence et Mortalité. Situation en 1995. Evolution entre 1975 et 1995. Ministère de l'Emploi et de la Solidarité: Paris, 1998.

18. SAS Institute, Inc. SAS Software: Release 8.2. SAS Institute: Cary, NC, 2001.

19. Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and hormone replacement therapy: collaborative reanalysis of data from 51 epidemiological studies of 52,705 women with breast cancer and 108,411 women without breast cancer. Lancet 1997; 350(9084):1047–1059.

20. Greenland S. Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology 1995; 6(4):356–365.


21. Schemper M. Cox analysis of survival data with non-proportional hazard functions. The Statistician 1992; 41(4):455–465.

22. Cnaan A, Ryan L. Survival analysis in natural history studies of disease. Statistics in Medicine 1989; 8(10):1255–1268.

23. Farewell VT, Cox DR. A note on multiple time scales in life testing. Applied Statistics 1979; 28(1):73–75.

24. Cheung YB, Gao F, Khoo KS. Age at diagnosis and the choice of survival analysis methods in cancer epidemiology. Journal of Clinical Epidemiology 2003; 56(1):38–43.

25. Duchesne T, Lawless J. Alternative time scales and failure time models. Lifetime Data Analysis 2000; 6(2):157–179.

26. Goldstein R, Harrell F. Survival analysis, software. In Encyclopedia of Biostatistics, Armitage P, Colton T (eds). Wiley: Chichester, 1998; 4461–4466.
