Part VIII: Survival response data - Harvard University
Leukemia relapse
• Randomized study of treatment for leukemia
⋆ 21 patients in each of two treatment arms
(A) control
(B) 6-mercaptopurine (6-MP)
• Interest is in the time to relapse from treatment initiation
• Observed times to relapse among the patients in the control arm:
1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
11, 11, 12, 12, 15, 17, 22, 23
⋆ shown graphically on the next slide
• The outcome is continuous and we could choose to summarize these
observed times with the mean
⋆ sample mean = 8.7 weeks
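The naïve summary above can be reproduced directly; a minimal sketch in Python (the course itself uses R):

```python
# Observed times to relapse (weeks) in the control arm
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

# With complete (uncensored) data the sample mean is a valid summary
mean_relapse = sum(control) / len(control)
print(round(mean_relapse, 1))  # 8.7
```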
553 BIO 233, Spring 2015
[Figure: dot plots of the observed times to relapse (weeks) in the control and 6-MP arms; ‘]’ marks censored follow-up times in the 6-MP arm]
• Crucially, to be able to calculate the mean we had to observe all of the
relapse times
• In the 6-MP arm, the actual relapse time is only available for 10 of the 21
patients
⋆ for the other 11, all we know is that they had not relapsed by a certain
time
⋆ we don’t know what happened afterwards
• In a sense, the value of time to relapse is ‘incomplete’ for these 11
individuals
⋆ the sample mean of 17.1 weeks will be biased
• However, we do have partial information on these individuals
⋆ patient #4 in the 6-MP arm had not relapsed by week 6
⋆ patient #21 in the 6-MP arm had not relapsed by week 35
• Such observations are censored
⋆ we don’t observe the actual event time, T
⋆ instead, we follow individuals up to some point in time but not
thereafter
• Intuitively, the actual event times are ‘missing’ or ’incomplete’
• If we observed T for all individuals in our sample, analyses would be
straightforward
⋆ calculate the mean time to relapse in each of the treatment arms
⋆ linear regression with T as the outcome and some set of covariates, X
Q: Can we estimate a mean or fit a linear regression model with data that are
censored?
Q: Do individuals who are censored provide information about T ?
⋆ if so, how?
Censoring
• The censoring observed in the leukemia example is specifically referred to
as right censoring
• As with any type of missingness, we have to consider the mechanisms at
play that lead to the missingness
⋆ which observations are incomplete/censored and why?
⋆ potential selection bias?
• Examples of censoring mechanisms include:
⋆ end of study
⋆ loss to follow-up
⋆ withdrawal from the study
⋆ death
• Our focus will be on right censoring but there are other types of censoring
⋆ other types of ‘missingness’ or ‘incompleteness’
• Left truncation
⋆ first opportunity to observe the event comes some time after the origin
⋆ likely an issue if age is taken as the time scale
⋆ problematic because the event could have occurred prior to our starting
observation or follow-up
• Interval censoring
⋆ we only know that the event occurred during some particular interval
⋆ an issue if individuals are followed-up at intervals
∗ e.g., every 3 months
⋆ problematic because we don’t know the exact value of T
• Individuals who are censored do indeed provide some (partial) information
⋆ i.e. we know that they had not experienced the event by a certain time
• As we’ll see, to make use of that information we have to make a crucial
assumption:
⋆ the censoring mechanism is independent of the survival time
• Intuitively, the assumption states that
⋆ survival experience of those who were censored is the same as those
who were not censored
⋆ censoring carries no prognostic information
Notation
• Let Ti denote the ‘time to event’ for the ith individual
• Let Ci denote the corresponding censoring time
• In survival data, we either observe the event time or the censoring time,
whichever occurs first
⋆ i.e. we observe Yi = min(Ti, Ci)
• We distinguish which occurs with the notation
δi = 1 if Ti ≤ Ci, and δi = 0 if Ti > Ci
⋆ δi is often referred to as the status indicator
• So the observed data is: {(Yi, δi,Xi); i = 1, . . . , n}
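The observed-data construction Yi = min(Ti, Ci) can be sketched with hypothetical event and censoring times (illustrative values only, not from the leukemia study):

```python
# Hypothetical latent event times T and censoring times C (weeks)
T = [4.0, 10.0, 7.0]
C = [6.0, 8.0, 12.0]

# Observed follow-up time and status indicator
y = [min(t, c) for t, c in zip(T, C)]
delta = [1 if t <= c else 0 for t, c in zip(T, C)]

print(y)      # [4.0, 8.0, 7.0]
print(delta)  # [1, 0, 1]
```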
Survival distributions
• ‘Time to relapse’ is an example of a survival response
• More generally, we consider the time to some event
⋆ from a well-defined starting point, referred to as the ‘origin’
⋆ measure ‘survival’ until the event occurs
• To characterize the distribution of T , we often consider four functions
(1) probability density function, f(t)
(2) survival function, S(t)
(3) hazard function, λ(t)
(4) cumulative hazard function, Λ(t)
⋆ each fully characterizes the underlying distribution
⋆ four functions are interrelated
1. Probability density function:
f(t) = lim_{∆→0} (1/∆) P(t ≤ T < t + ∆) = λ(t) S(t) = λ(t) exp{ −∫₀ᵗ λ(s) ds }
2. Survival function:
S(t) = P(T > t) = 1 − P(T ≤ t) = 1 − F(t) = 1 − ∫₀ᵗ f(s) ds = exp{−Λ(t)}
⋆ F(t) is the cumulative distribution function (CDF)
3. Hazard function:
λ(t) = lim_{∆→0} (1/∆) P(t ≤ T < t + ∆ | T ≥ t) = f(t)/S(t) = −(d/dt) log S(t)
⋆ interpreted as the instantaneous rate at time t, given that the event
has not occurred prior to time t
4. Cumulative hazard function:
Λ(t) = ∫₀ᵗ λ(s) ds = −log S(t)
Distributions for T
• The random variable T is continuous and non-negative
• There are many distributions for continuous random variables that have
support on R+ = (0, ∞), including
⋆ Exponential
⋆ Weibull
⋆ Gamma
⋆ Log-normal
⋆ Log-logistic
⋆ Gompertz
⋆ Generalized gamma
• Collectively referred to as survival distributions
Exponential distribution
• The exponential distribution is one of the simplest survival distributions
⋆ also important historically in that early attempts at modeling survival
data were based on it
• For T ∼ Exp(µ)
f(t) = (1/µ) exp{−t/µ}
S(t) = exp{−t/µ}
λ(t) = 1/µ
Λ(t) = t/µ
⋆ E[T] = µ and V[T] = µ²
⋆ the hazard is constant as a function of time
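The constant-hazard property can be checked numerically from the definitions above; a small Python sketch:

```python
import math

mu = 2.0  # mean of the Exp(mu) distribution

def f(t):
    # density: (1/mu) * exp{-t/mu}
    return math.exp(-t / mu) / mu

def S(t):
    # survival function: exp{-t/mu}
    return math.exp(-t / mu)

# hazard lambda(t) = f(t)/S(t) equals 1/mu at every t
for t in [0.1, 1.0, 5.0]:
    print(round(f(t) / S(t), 6))  # 0.5 each time
```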
• As µ increases, the density assigns mass further away from T = 0
⋆ events occur later in time
[Figure: exponential pdf f(t) vs time for µ = 0.5, 1.0, 2.0]
• As µ increases, survival experience is ‘better’
⋆ events occur later in the time scale
[Figure: exponential survival function S(t) vs time for µ = 0.5, 1.0, 2.0]
• As µ increases, the cumulative hazard increases at a slower rate
⋆ the total number of events accrues more slowly
[Figure: exponential cumulative hazard Λ(t) vs time for µ = 0.5, 1.0, 2.0]
Weibull distribution
• The exponential distribution is limited in that it is completely
characterized by a single parameter, µ
• A more flexible distribution is the Weibull distribution
• For T ∼ Weibull(µ, σ)
f(t) = (σ/µ) (t/µ)^{σ−1} exp{−(t/µ)^σ}
S(t) = exp{−(t/µ)^σ}
λ(t) = (σ/µ) (t/µ)^{σ−1}
Λ(t) = (t/µ)^σ
• Notes:
⋆ µ is referred to as the ‘scale’ parameter
⋆ σ is referred to as the ‘shape’ parameter
⋆ Exp(µ) is a special case with σ = 1
∗ additional parameter provides increased flexibility
⋆ the mean and variance are given by
E[T] = µ Γ(1 + 1/σ)
V[T] = µ² {Γ(1 + 2/σ) − Γ(1 + 1/σ)²}
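These moment formulas are easy to evaluate with the gamma function; a sketch in Python:

```python
import math

def weibull_mean(mu, sigma):
    # E[T] = mu * Gamma(1 + 1/sigma)
    return mu * math.gamma(1 + 1 / sigma)

def weibull_var(mu, sigma):
    # V[T] = mu^2 * {Gamma(1 + 2/sigma) - Gamma(1 + 1/sigma)^2}
    g1 = math.gamma(1 + 1 / sigma)
    g2 = math.gamma(1 + 2 / sigma)
    return mu ** 2 * (g2 - g1 ** 2)

# sigma = 1 recovers the exponential special case: E[T] = mu, V[T] = mu^2
print(weibull_mean(3.0, 1.0))  # 3.0
print(weibull_var(3.0, 1.0))   # 9.0
```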
• For µ = 1, as σ increases, the shape changes so that less (relative) mass is
given to low event times and high event times
⋆ events occur ‘in the middle’
[Figure: Weibull pdf f(t) vs time for µ = 1 and σ = 0.5, 1.0, 2.0]
• For µ = 1, as σ increases, survival is (relatively) better early on but the
descent is much more dramatic
⋆ events occur ‘in the middle’
[Figure: Weibull survival function S(t) vs time for µ = 1 and σ = 0.5, 1.0, 2.0]
• For µ = 1,
⋆ if σ < 1, the hazard is initially higher but then decreases with time
⋆ if σ ≥ 1, the hazard is initially lower but then increases with time
[Figure: Weibull hazard function λ(t) vs time for µ = 1 and σ = 0.5, 1.0, 2.0]
• The impact of σ on λ(t) can also be seen in the cumulative hazard function
[Figure: Weibull cumulative hazard Λ(t) vs time for µ = 1 and σ = 0.5, 1.0, 2.0]
Parametric survival analysis
• Suppose we observed an i.i.d. sample of size n from an Exp(µ) distribution
• In the absence of censoring and (for the moment) ignoring covariates, the
observed data would be: {ti; i = 1, . . . , n}
• The likelihood is the usual product of n independent contributions
L(µ) = ∏_{i=1}^n f(ti; µ) = ∏_{i=1}^n (1/µ) exp{−ti/µ}
with the log-likelihood given by
ℓ(µ) = ∑_{i=1}^n { −log(µ) − ti/µ }
• Differentiating with respect to µ, setting equal to zero, and solving, the
MLE is
µ̂MLE = (1/n) ∑_{i=1}^n ti
⋆ i.e. the sample mean
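That the sample mean maximizes ℓ(µ) can be checked numerically on the control-arm data; a Python sketch:

```python
import math

control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

def loglik(mu):
    # l(mu) = sum_i { -log(mu) - t_i/mu } under Exp(mu)
    return sum(-math.log(mu) - t / mu for t in control)

mu_mle = sum(control) / len(control)  # the sample mean

# the log-likelihood is lower on either side of the sample mean
assert loglik(mu_mle) > loglik(mu_mle - 0.5)
assert loglik(mu_mle) > loglik(mu_mle + 0.5)
print(round(mu_mle, 2))  # 8.67
```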
Q: What if we don’t observe T for each individual?
⋆ i.e. some individuals are (right-)censored
• The observed data is
{(yi, δi); i = 1, . . . , n}
where Yi = min(Ti, Ci)
Q: What is the appropriate likelihood for these data?
⋆ how does each individual contribute ‘information’?
• Assuming independence between individuals, the likelihood will still be the
product of n contributions
• If Yi is the event time, Ti, then the contribution to the likelihood is the
usual contribution
⋆ i.e. the density function f(yi)
• If Yi is the censoring time, Ci, then we know that their actual event time
must be after Ci
⋆ assuming independent censoring (i.e. Ci ⊥⊥ Ti), the contribution to the
likelihood is the fact that they survived to Yi
⋆ i.e. the survival function S(yi)
• We can succinctly write the contributions as
Li(µ) = f(yi; µ) if δi = 1, and Li(µ) = S(yi; µ) if δi = 0
• We can therefore write the likelihood as:
L(µ) = ∏_{i=1}^n f(yi; µ)^{δi} S(yi; µ)^{1−δi}
• Recall that
λ(yi; µ) = f(yi; µ) / S(yi; µ)
so that the likelihood can be written as
L(µ) = ∏_{i=1}^n λ(yi; µ)^{δi} S(yi; µ) = ∏_{i=1}^n (1/µ)^{δi} exp{−yi/µ}
• The log-likelihood is therefore
ℓ(µ) = ∑_{i=1}^n { −δi log(µ) − yi/µ }
and differentiating with respect to µ, we get
(∂/∂µ) ℓ(µ) = ∑_{i=1}^n { −δi/µ + yi/µ² }
• Setting this expression to zero and solving, the MLE is given by
µ̂MLE = (∑_{i=1}^n yi) / (∑_{i=1}^n δi) = Yn / Dn
⋆ equal to the sample mean if all the δi’s = 1
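Applying this estimator to the 6-MP arm (observed times listed later in these notes, with ‘+’ for censoring) reproduces Yn = 359 and Dn = 10; a Python sketch:

```python
# 6-MP arm: observed times y and status delta (1 = relapse observed, 0 = censored)
y     = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17,
         19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,
         0, 0, 1, 1, 0, 0, 0, 0, 0]

Y_n = sum(y)      # total observed follow-up time
D_n = sum(delta)  # number of observed events

print(Y_n, D_n)   # 359 10
print(Y_n / D_n)  # 35.9, the MLE of mu
```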
• Differentiating again with respect to µ,
(∂²/∂µ²) ℓ(µ) = ∑_{i=1}^n { δi/µ² − 2yi/µ³ }
so that
V̂[µ̂MLE] = { −(∂²/∂µ²) ℓ(µ) |_{µ̂MLE} }⁻¹ = µ̂²MLE / Dn
⋆ use this to construct a 95% confidence interval
• Use the delta method to derive the variance for any other function of µ
⋆ e.g. the survival function: S(t) = exp{−t/µ}
• For the leukemia data, we get

                 Control arm    6-MP arm
                 (n = 21)       (n = 21)
  µ̂ (naïve)      8.7            17.1
  Yn             182            359
  Dn             21             10
  µ̂MLE           8.7            35.9
  95% CI         (5.0, 12.4)    (13.6, 58.2)
  T̂med           6.0            24.9
• Notice how biased the naïve estimate is in the 6-MP group
• Tmed is the median survival
⋆ the time by which 50% of individuals experience the event
⋆ for the exponential distribution Tmed = log(2)µ
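The confidence intervals and medians in the table follow from µ̂MLE, Dn, and Tmed = log(2)µ; a Python sketch:

```python
import math

# (mu_hat, D_n) for the control and 6-MP arms
arms = {"control": (182 / 21, 21), "6-MP": (359 / 10, 10)}

for arm, (mu, D) in arms.items():
    se = mu / math.sqrt(D)                    # square root of mu^2 / D_n
    lo, hi = mu - 1.96 * se, mu + 1.96 * se   # approximate 95% CI
    t_med = math.log(2) * mu                  # exponential median survival
    print(arm, round(mu, 1), (round(lo, 1), round(hi, 1)), round(t_med, 1))
# control 8.7 (5.0, 12.4) 6.0
# 6-MP 35.9 (13.6, 58.2) 24.9
```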
• Plugging µ̂MLE into S(t) = exp{−t/µ} and plotting against time we get
[Figure: fitted exponential survival curves S(t) vs time to relapse (weeks): 6-MP, µ̂₁ = 35.90; control, µ̂₀ = 8.67]
• Conclude that 6-MP is beneficial in that it
⋆ prolongs expected time to relapse by an estimated 27.2 weeks
⋆ extends median survival by 18.9 weeks
Exponential regression: AFT models
• To formally evaluate differences in survival between the two treatment
arms we could embed the comparison within a regression framework
• One modeling challenge is that T > 0
⋆ building a regression model directly for E[T ] may result in negative
fitted values
• If T ∼ Exp(µ), then
ε = log(T) − log(µ) ∼ extreme value distribution
⋆ a unimodal, negatively skewed distribution with
f(ε) = exp{ε − exp{ε}}
S(ε) = exp{−exp{ε}}
• We can use this to write the following
log(Ti) = log(E[Ti]) + εi
• The accelerated failure time (AFT) model structures the mean via
log(E[Ti]) = β0 + Xiᵀβ
⋆ E[Ti] = µi, the parameter that indexes the exponential distribution
⋆ X is an n × p design matrix with columns [X1, . . . , Xp]
⋆ β is a p-vector of regression coefficients
• Often succinctly write the AFT model as
log(Ti) = β0 + Xiᵀβ + εi
⋆ εi ∼ extreme value
• Going back to the leukemia treatment study, let Xi be a binary indicator
of treatment
⋆ 0/1 = control/6-MP
• Consider the AFT model:
log(µi) = β0 + β1Xi
• We can equivalently write
µi = µ0 exp{β1Xi}
⋆ µ0 = exp{β0} is the mean survival time among individuals with Xi = 0
⋆ θ = exp{β1} is the relative change in the mean survival time
comparing individuals with Xi = 1 vs. individuals with Xi = 0
θ = µ1 / µ0
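With the arm-specific exponential fits, θ and the corresponding coefficient β1 = log θ can be computed directly; a Python sketch:

```python
import math

mu0 = 182 / 21  # control arm MLE, 8.67
mu1 = 359 / 10  # 6-MP arm MLE, 35.9

theta = mu1 / mu0        # relative change in mean survival time
beta1 = math.log(theta)  # slope on the log-mean (AFT) scale

print(round(theta, 2))   # 4.14
print(round(beta1, 2))   # 1.42
```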
• We can also interpret θ in the broader context of the survival distribution
• Recall, for the exponential distribution:
S(t) = exp{−t/µ}
⋆ as µ increases, survival experience is better
• If θ = exp{β1} > 1 =⇒ µ1 > µ0
⋆ 6-MP extends survival
⋆ ‘deceleration’ in the event rate
• If θ = exp{β1} < 1 =⇒ µ1 < µ0
⋆ 6-MP shortens survival
⋆ ‘acceleration’ in the event rate
• Hence the name ‘accelerated failure time’ model
• For the actual leukemia data, θ = 4.14 = 35.9/8.7
[Figure: fitted exponential survival curves S(t) vs time to relapse (weeks): 6-MP, µ̂₁ = 35.90; control, µ̂₀ = 8.67]
⋆ treatment with 6-MP ‘extends’ survival
Exponential regression: models for the hazard
• The AFT model parameterizes a model for log(µ)
⋆ µ indexes the exponential distribution
⋆ µ is the mean of the exponential distribution
• Recall, for the exponential distribution
λ(t) = 1/µ = λ
⋆ the hazard is constant as a function of time
• We can use this to write down a model for the hazard:
λi = 1/µi = 1/(µ0 exp{β1Xi}) = λ0 exp{α1Xi}
• More generally, we can directly write a model for the hazard function
λi(t) = λ0 exp{Xiᵀα}
• Given an underlying exponential distribution for T, this model provides an
alternative, equivalent parameterization
⋆ in terms of the hazard, rather than the mean
• Parameters have different values and interpretations
⋆ λ0 is the baseline hazard
∗ i.e. when X = 0
⋆ exp{α1} is a hazard ratio
exp{α1} = λ1 / λ0
• Note that the hazard ratio is independent of time
⋆ this is our first encounter with a ‘proportional hazards’ model
• While different, the two sets of parameters are related
⋆ λ0 = 1/µ0
⋆ exp{α1} = exp{−β1}
• For the leukemia data, exp{α1} = 0.24
⋆ the hazard among individuals treated with 6-MP is estimated to be
approximately one quarter that of individuals in the control arm
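The hazard-scale quantities follow from the same fitted means; a Python sketch:

```python
mu0 = 182 / 21  # control arm mean
mu1 = 359 / 10  # 6-MP arm mean

lambda0 = 1 / mu0         # baseline hazard, 1/mu0
hazard_ratio = mu0 / mu1  # (1/mu1) / (1/mu0) = exp{alpha1} = exp{-beta1}

print(round(lambda0, 3))       # 0.115
print(round(hazard_ratio, 3))  # 0.241
```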
Estimation/inference
• We can perform likelihood-based estimation/inference for {β0,β} via a
likelihood based on the extreme value distribution:
L(β0, β) = ∏_{i=1}^n f(εi; β0, β)^{δi} S(εi; β0, β)^{1−δi}
where f(·) and S(·) were given before and
εi = log(yi) − log(µi) = log(yi) − [β0 + xiᵀβ]
⋆ observed data are: {(yi, δi,xi); i = 1, . . . , n}
⋆ maximize to get MLEs and base inference on the inverse of the
information matrix
• We could also parameterize the model in terms of {α0,α} from the
equivalent PH model
• In R, we can fit AFT models using the survreg() function in the
survival package:
>
> library(survival)
>
> load("Leukemia.dat")
> fitEx <- survreg(Surv(rTime, status) ~ Rx, dist="exponential", data=leuk)
> summary(fitEx)
...
(Intercept) 2.16 0.218 9.9 4.33e-23
Rx 1.42 0.384 3.7 2.16e-04
Scale fixed at 1
Exponential distribution
Loglik(model)= -112.2 Loglik(intercept only)= -119.6
Chisq= 14.97 on 1 degrees of freedom, p= 0.00011
>
> exp(coef(fitEx))
(Intercept) Rx
8.666667 4.142308
• See that survreg() parameterizes the AFT in terms of the log-mean,
rather than the hazard
⋆ can recover estimates of the baseline hazard and hazard ratio
>
> exp(-coef(fitEx))
(Intercept) Rx
0.1153846 0.2414113
Weibull regression: AFT models
• The AFT model based on Ti ∼ Exp(µi) is
log(Ti) = β0 + Xiᵀβ + εi
⋆ εi ∼ extreme value
⋆ µi = exp{β0 + Xiᵀβ}
• The (underlying) assumption that the event times are distributed
according to an exponential distribution may be restrictive
⋆ single-parameter family indexed by µ
⋆ constant hazard rate, as a function of time
• One option for increasing the flexibility in the specification of the AFT is
to scale the ‘error’ terms:
log(Ti) = β0 + Xiᵀβ + (1/σ)εi
⋆ writing the error scale as 1/σ keeps σ consistent with the Weibull shape
parameter introduced earlier
• If we retain the assumption that εi ∼ extreme value, then since
S(ε) = exp{−exp{ε}}
the model log(Ti) = β0 + Xiᵀβ + (1/σ)εi gives
Si(t) = exp{ −exp{ σ (log(t) − [β0 + Xiᵀβ]) } } = exp{ −(t/µi)^σ }
where, again,
µi = exp{β0 + Xiᵀβ}
• This is the survival function for a Weibull distribution
⋆ in particular, Ti ∼ Weibull(µi, σ)
• The increase in the flexibility induced by scaling the error terms is that we
make the underlying distribution of T more flexible
⋆ recall, the exponential distribution is a special case of the Weibull
Q: Does this change influence the interpretation of β0 or β?
• For Ti ∼ Weibull(µi, σ)
E[Ti] = µi Γ(1 + 1/σ)
• The ‘baseline’ mean time to event (i.e. when X = 0) is
E[Ti| Xi = 0] = exp{β0} Γ(1 + 1/σ)
⋆ as such, neither β0 nor exp{β0} have a particularly intuitive
interpretation
⋆ in practice, it may be better to calculate and report the above mean
• Consider two values for the covariate vector X:
x = (x1, . . . , xj , . . . , xp)
x′ = (x1, . . . , xj + 1, . . . , xp)
• Based on the Weibull AFT, we see that
E[Ti | Xi = x′] / E[Ti | Xi = x] = [µ(x′) Γ(1 + 1/σ)] / [µ(x) Γ(1 + 1/σ)]
= exp{β0 + x′ᵀβ} / exp{β0 + xᵀβ} = exp{βj}
⋆ exp{βj} is the relative change in the mean time to event associated
with a unit change in Xj , holding everything else constant
⋆ precisely the same interpretation as with the exponential AFT
Weibull regression: models for the hazard
• As with the exponential AFT, we can write down the induced model in
terms of the hazard function
• For Ti ∼ Weibull(µi, σ)
λi(t) = (σ/µi) (t/µi)^{σ−1} = σ µi^{−σ} t^{σ−1}
• Setting µi = exp{β0 + Xiᵀβ}, the expression for the hazard becomes
λi(t) = σ exp{β0 + Xiᵀβ}^{−σ} t^{σ−1} = λ0(t) exp{Xiᵀα}
where
⋆ λ0(t) = σ exp{−β0σ} t^{σ−1}
⋆ α = −βσ
• As with the exponential distribution, we could have written the model
directly for the hazard function
⋆ i.e. write a model for λ(t), independently of the specification of the
AFT
• For the Weibull hazard model:
⋆ the baseline hazard function is no longer constant as a function of time
∗ in contrast to the exponential hazards model
⋆ covariate effects are independent of time
∗ same as the exponential hazard model
∗ another example of a ‘proportional hazards model’
Estimation/inference
• The Weibull AFT model can be succinctly written as
log(Ti) = β0 + Xiᵀβ + (1/σ)εi
⋆ εi ∼ extreme value
• The unknown parameters are: {β0, β, σ}
• Since the εi terms still have an extreme value distribution, we can base
estimation/inference on the same likelihood we used for the exponential
AFT:
L(β0, β, σ) = ∏_{i=1}^n f(εi; β0, β, σ)^{δi} S(εi; β0, β, σ)^{1−δi}
where εi = σ (log(yi) − [β0 + xiᵀβ])
• We can fit the Weibull AFT in R using the survreg() function
>
> fitWB <- survreg(Surv(rTime, status) ~ Rx, dist="weibull", data=leuk)
>
> summary(fitWB)
Call:
survreg(formula = Surv(rTime, status) ~ Rx, data = leuk, dist = "weibull")
Value Std. Error z p
(Intercept) 2.25 0.166 13.54 9.49e-42
Rx 1.19 0.297 4.01 6.08e-05
Log(scale) -0.31 0.145 -2.14 3.27e-02
Scale= 0.733
Weibull distribution
Loglik(model)= -110.2 Loglik(intercept only)= -119.2
Chisq= 18.1 on 1 degrees of freedom, p= 2.1e-05
Number of Newton-Raphson Iterations: 5
n= 42
• There are many ways to parameterize the Weibull distribution
⋆ take care which parameterization you are working with
• survreg() parameterizes the Weibull AFT
log(Ti) = β0 + Xiᵀβ + (1/σ)εi
by setting
⋆ Intercept = β0
⋆ Scale = 1/σ
• So an estimate of the baseline mean,
E[Ti | Xi = 0] = exp{β0} Γ(1 + 1/σ)
can be obtained as follows:
> coef(fitWB)
(Intercept) Rx
2.247819 1.191325
> beta0Hat <- coef(fitWB)[1]
> betaXHat <- coef(fitWB)[2]
> sigmaHat <- 1/fitWB$scale
>
> ##
> exp(beta0Hat) * gamma(1 + 1/sigmaHat)
(Intercept)
8.666226
• As we noted, exponentiating the slope parameter yields a contrast that can
be interpreted as a ratio of expected survival times
>
> exp(betaXHat)
Rx
3.29144
⋆ the mean time to relapse among individuals who were treated with
6-MP is estimated to be 3.29 times longer than the mean time to
relapse among the controls
• We can compare the results obtained by taking the survival times to be
distributed according to an exponential distribution to those taking the
distribution to be a Weibull:
>
> getCI(fitEx)
Estimate lower upper
(Intercept) 8.67 5.65 13.29
Rx 4.14 1.95 8.80
> getCI(fitWB)
Estimate lower upper
(Intercept) 9.47 6.84 13.11
Rx 3.29 1.84 5.89
Scale 2.08 1.57 2.77
• Find that there is a change in the point estimate of the treatment effect
• Since we are performing likelihood-based estimation/inference, we can
formally evaluate whether or not the Weibull provides a better fit of the
data via a likelihood ratio test of the following hypotheses
H0 : σ = 1 vs Ha : σ ≠ 1
⋆ implicitly a test of the ‘constant’ baseline hazard assumption
>
> anova(fitEx, fitWB)
Terms Resid. Df -2*LL Test Df Deviance Pr(>Chi)
1 Rx 40 224.3131 NA NA NA
2 Rx 39 220.3470 = 1 3.9661 0.04642519
• Conclude that the Weibull distribution does indeed provide a better fit
Nonparametric and semi-parametric analyses
• So far the analysis of survival data has required the choice of some
distribution for T
⋆ e.g., Exp(µ), Weibull(µ, σ)
• Estimation/inference proceeds on the basis of the likelihood
L(θ) = ∏_{i=1}^n λ(yi; θ)^{δi} S(yi; θ)
⋆ estimation/inference in the frequentist paradigm
∗ maximize L(θ) to get the MLE
∗ inverse of the information matrix
⋆ estimation/inference in the Bayesian paradigm
∗ specify a prior for θ
∗ summarize features of the posterior distribution
• Once we ‘know’ θ we can characterize any feature of the distribution
⋆ mean survival, E[T ]
⋆ hazard function, λ(t)
⋆ survival function, S(t)
⋆ median survival, Tmed
Q: What if we specify the wrong distribution?
⋆ we have misspecified the statistical model
⋆ the behavior of statistical procedures under misspecification is often
uncertain
Q: Do we need to specify a distribution?
Q: Can we address questions regarding survival without specifying a
distribution?
• Recall, for linear regression analysis of continuous response data we don’t
have to assume any particular distribution for the error terms
Yi = Xiᵀβ + εi
⋆ as long as the mean-model is correctly specified, the OLS estimator is
∗ unbiased
∗ asymptotically Normally distributed
⋆ if the error terms are Normally distributed then by adopting the Normal
distribution we gain efficiency
∗ the OLS estimator is the MLE
• When we assume a specific distribution for the response and model the
parameters that index that distribution, we say that we are performing a
parametric analysis
⋆ if we know the values of the parameters that index the distribution, we
know everything about the distribution
• Estimation and inference based on OLS with the use of the robust
standard error is an example of a semi-parametric procedure
⋆ we focus and place structure on the conditional mean, E[Y |X]
⋆ we don’t (have to) make any further assumptions about Y |X
∗ e.g., its variance
∗ e.g., the family to which its distribution belongs
• In general, a semi-parametric procedure places structure on part of the
distribution of the response but does not completely specify it
• A nonparametric procedure places no structure on the distribution of the
response variable
Estimation and inference for S(t)
• In the analysis of time-to-event data, the survival function is a natural
target of estimation
S(t) = P(T > t)
= 1− P(T ≤ t)
Q: Can we estimate S(t) without making any assumptions about the
distribution of T ?
• Consider two cases:
(1) observed times are complete (i.e. no censoring)
(2) observed times are subject to right censoring
Estimation in the absence of censoring
• Consider the observed times to relapse among the patients in the control
arm of the leukemia trial:
1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
11, 11, 12, 12, 15, 17, 22, 23
• Since all of the observed times are actual relapse times, we can directly
estimate the survival function as
S(t) = (# individuals with T > t) / (total sample size)
⋆ S(5) = 12/21 = 0.571
⋆ S(10) = 8/21 = 0.381
⋆ S(15) = 3/21 = 0.143
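With no censoring this estimator is just an empirical proportion; a Python sketch reproducing the values above:

```python
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

def S_hat(t):
    # proportion of individuals with T > t (valid only without censoring)
    return sum(1 for x in control if x > t) / len(control)

for t in [5, 10, 15]:
    print(t, round(S_hat(t), 3))
# 5 0.571
# 10 0.381
# 15 0.143
```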
• Present the estimate graphically:
[Figure: nonparametric estimate of S(t) for the control arm vs time to relapse (weeks), overlaid with the exponential fit (µ = 8.67)]
• This estimate of S(t) is nonparametric in the sense that no assumptions
about the underlying distribution have been made
⋆ in contrast to the estimate of S(t) based on an underlying Exp(µ)
distribution
Estimation in the presence of censoring
• Now consider the observed times to relapse among the patients in the
6-MP arm of the leukemia trial:
6, 6, 6, 6+, 7, 9+, 10, 10, 11+, 13, 16, 17+,
19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+
⋆ times marked with a + are (right) censored
• Naïvely applying the previous approach we’d estimate
S(15) = 11/21 = 0.524
• This estimate implicitly assumes that the individuals who were censored
prior to t = 15 did not survive to t = 15
• All three could have survived to t = 15 which would give
S(15) = 14/21 = 0.667
• While these two estimates provide bounds for S(t), this is arguably not
very satisfactory
• One solution is to only consider those folks with complete data:
S(15) = (# individuals with T > 15) / (total with complete data at t = 15) = 11/18 = 0.611
⋆ under independent censoring this would be a reasonable strategy
• The problem is that you throw away some of the partial information that
the censored individuals provide
⋆ i.e. only use a portion of the sample to estimate S(t)
⋆ this estimator is therefore inefficient
• Another strategy is to make use of all the information in the sample by
exploiting a decomposition of the survival function in terms of conditional
probabilities
• Let {(yi, δi); i = 1, . . . , n} be the observed data
⋆ Yi = min(Ti, Ci)
⋆ δi is the status indicator
• Let t(1) < t(2) < . . . < t(K) denote the ordered observed event times
⋆ K unique event times
⋆ t(K) is the final observed event time
• For the patients in the 6-MP arm, the ordered event times are:
6, 7, 10, 13, 16, 22, 23
• Partition the time scale based on the ordered event times:
[0, t(1)) ∪ [t(1), t(2)) ∪ . . . ∪ [t(K−1), t(K)) ∪ [t(K),∞)
⋆ K + 1 mutually exclusive intervals
• Define the risk set Rk to be the set of individuals who were at risk of
being observed to experience the event at time t(k)
⋆ all individuals who had not experienced the event or been censored at
the start of the kth interval
• For risk set Rk, let
⋆ nk denote the number of individuals at risk
⋆ dk denote the number of failures at time t(k)
⋆ sk denote the number of non-failures at time t(k)
• For example, for the 5th ordered failure time:
[Figure: follow-up times (weeks) for the 21 patients in the 6-MP arm, with ‘]’ marking censored observations; the risk set at t(5) = 16 is highlighted]
⋆ t(5) = 16
⋆ [t(5), t(6)) = [16, 22)
⋆ R5 = {11, 12, 13, 14, . . ., 21}
⋆ nk = 11
⋆ dk = 1
⋆ sk = 10
• Across all of the risk sets for the 6-MP arm, we get:
------------------------------
t_(k) n_k d_k s_k
------------------------------
0 0 21 0 21
1 6 21 3 18
2 7 17 1 16
3 10 15 2 13
4 13 12 1 11
5 16 11 1 10
6 22 7 1 6
7 23 6 1 5
------------------------------
• For each risk set, we estimate the conditional probability of survival:
P(T > t(k) | T ≥ t(k)) = sk / nk
⋆ note this is 1 minus the hazard at time t(k)
• Calculating this for each of the event times, we get:
---------------------------------------
t_(k) n_k d_k s_k cpS_k
---------------------------------------
0 0 21 0 21 1.000
1 6 21 3 18 0.857
2 7 17 1 16 0.941
3 10 15 2 13 0.867
4 13 12 1 11 0.917
5 16 11 1 10 0.909
6 22 7 1 6 0.857
7 23 6 1 5 0.833
---------------------------------------
Q: What is the marginal probability of survival to time t = 10?
• We can calculate the marginal survival for each risk set as:
P(T > t(k)) = (sk/nk) × (sk−1/nk−1) × · · · × (s1/n1)
• Intuitively, to make it to any given risk set and survive, you had to have
made it through all of the prior risk sets
• Applying this to the 6-MP arm of the leukemia data, we get:
------------------------------------------------
t_(k) n_k d_k s_k cpS_k S_k
------------------------------------------------
0 0 21 0 21 1.000 1.000
1 6 21 3 18 0.857 0.857
2 7 17 1 16 0.941 0.807
3 10 15 2 13 0.867 0.699
4 13 12 1 11 0.917 0.641
5 16 11 1 10 0.909 0.583
6 22 7 1 6 0.857 0.499
7 23 6 1 5 0.833 0.416
------------------------------------------------
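The product of conditional survival probabilities can be computed directly from the observed (y, δ) pairs; a Python sketch reproducing the S_k column:

```python
# 6-MP arm: observed times and status (1 = relapse observed, 0 = censored)
y     = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17,
         19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,
         0, 0, 1, 1, 0, 0, 0, 0, 0]

event_times = sorted({t for t, d in zip(y, delta) if d == 1})

S, surv = 1.0, {}
for tk in event_times:
    n_k = sum(1 for t in y if t >= tk)                           # at risk
    d_k = sum(1 for t, d in zip(y, delta) if t == tk and d == 1) # events
    S *= (n_k - d_k) / n_k                                       # s_k / n_k
    surv[tk] = round(S, 3)

print(surv)
# {6: 0.857, 7: 0.807, 10: 0.699, 13: 0.641, 16: 0.583, 22: 0.499, 23: 0.416}
```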
• This estimate of S(t) is also nonparametric in the sense that no
assumptions about the underlying distribution have been made
⋆ solely used the assumption of independent censoring
• Again present the results graphically:
[Figure: nonparametric estimate of S(t) for the 6-MP arm vs time to relapse (weeks), overlaid with the exponential fit (µ = 35.9)]
Kaplan-Meier estimator
• The estimate of S(t) just derived is the Kaplan-Meier estimator:
SKM(t) = ∏_{k: t(k) ≤ t} sk/nk
⋆ product of conditional survival probabilities corresponding to all risk
sets prior to time t
• Right-continuous step-function
⋆ height only changes at the observed event times, t(k)
⋆ equal to 1.0 up to and including the first event time
⋆ only equals 0.0 if the last observed time is an event time
• As n −→ ∞, the number of events and the number of intervals increase
⋆ estimate becomes ‘smoother’ and SKM(t) −→ S(t)
⋆ product limit estimator
• SKM(t) can also be derived as the nonparametric maximum likelihood
estimate
⋆ NPMLE
• Recall the ordered event times: t(1) < t(2) < . . . < t(K)
• For notational convenience, let
⋆ t(0) = 0
⋆ t(K+1) = ∞
• Let ck denote the number of individuals who are censored in the kth
interval, [t(k), t(k+1))
⋆ for these individuals, we observe the following censoring times:
tk1, tk2, . . . , tkck
• For the 5th ordered failure time:
[Figure: follow-up times (weeks) for the 21 patients in the 6-MP arm, with ‘]’ marking censored observations; the interval [t(5), t(6)) = [16, 22) is highlighted]
⋆ [t(5), t(6)) = [16, 22)
⋆ nk = 11
⋆ dk = 1
⋆ ck = 3
⋆ observed censoring times: 17, 19, 20
• Under the independent censoring assumption, the likelihood is given by
L = ∏_{k=0}^{K} [ S(t(k)⁻) − S(t(k)) ]^{dk} ∏_{j=1}^{ck} S(tkj)
⋆ the dk ‘failures’ each contribute:
P(T = t(k)) = S(t(k)⁻) − S(t(k))
⋆ the ck censored individuals each contribute their own:
P(T > tkj) = S(tkj)
• Notes:
⋆ this represents a likelihood on the space of all survivor functions, S(t)
⋆ the NPMLE is the survival function S(t) that maximizes L
• Note, since tkj > t(k), S(tkj) is maximized by setting
S(tkj) = S(t(k)), j = 1, . . . , ck
⋆ maximize the contribution to the likelihood by maximizing S(·)
• So we can write
L = ∏_{k=0}^{K} [ S(t(k)⁻) − S(t(k)) ]^{dk} S(t(k))^{ck}
• From this, the MLE of S(t) will be a discrete survival function:
⋆ discontinuous at the ordered event times
∗ i.e. piecewise constant with jumps at the t(k)
⋆ hazard components λk at each of the t(k)
∗ in between events, the hazard is zero
• Therefore,

S(t_(k)^−) = ∏_{l=0}^{k−1} (1 − λ_l)   and   S(t_(k)) = ∏_{l=0}^{k} (1 − λ_l)

where the {λ_1, . . . , λ_K} maximize

L(λ) = ∏_{k=1}^{K} { λ_k^{d_k} ∏_{l=1}^{k−1} (1 − λ_l)^{d_k} ∏_{l=1}^{k} (1 − λ_l)^{c_k} }

     = ∏_{k=1}^{K} λ_k^{d_k} (1 − λ_k)^{n_k − d_k}

• This is the same form as the likelihood based on the Bernoulli distribution
⋆ maximizing yields:

λ̂_k = d_k / n_k
• Plugging this back into the expression for the survival function, we get:

Ŝ(t) = ∏_{k: t_(k) ≤ t} (1 − λ̂_k)

     = ∏_{k: t_(k) ≤ t} (1 − d_k / n_k)

     = ∏_{k: t_(k) ≤ t} s_k / n_k

which is the Kaplan-Meier estimator.
Inference for the KM estimator
• For fixed t0, the asymptotic variance of S_KM(t0) is

V[ S_KM(t0) ] = S_KM(t0)² ∑_{k: t_(k) ≤ t0} d_k / (n_k s_k)

⋆ known as Greenwood’s formula
⋆ denote the square root of this by SE_GW(t0)

• Use this to construct an approximate 95% confidence interval:

( S_KM(t0) − 1.96 SE_GW(t0), S_KM(t0) + 1.96 SE_GW(t0) )

• A problem with this confidence interval is that it may not respect the
fact that S(t) ∈ (0, 1)
• An alternative approach is to construct a 95% confidence interval for

log Λ(t0) = log[−log S(t0)]

and transform back to the S(t) scale:

⋆ Λ(t) ∈ (0,∞) ⇒ log Λ(t) ∈ (−∞,∞)

• The variance for log[−log S_KM(t0)] is

V[ log[−log S_KM(t0)] ] = { 1 / [−log S_KM(t0)] }² ∑_{k: t_(k) ≤ t0} d_k / (n_k s_k)

⋆ denote the square root of this by SE_ll(t0)

• A 95% confidence interval for S(t0) is

( S_KM(t0)^{exp{1.96 SE_ll(t0)}}, S_KM(t0)^{exp{−1.96 SE_ll(t0)}} )
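Both formulas can be checked numerically at t0 = 6 for the 6-MP arm, where S_KM(6) = 18/21 and the single contributing risk set gives d_k/(n_k s_k) = 3/(21 × 18). A Python sketch (the course code uses R):

```python
import math

# Greenwood's variance and the log(-log) confidence interval at t0 = 6
# for the 6-MP arm of the leukemia data.
s_km = 18 / 21                # S_KM(6)
gw_sum = 3 / (21 * 18)        # sum of d_k / (n_k * s_k) up to t0 = 6

se_gw = s_km * math.sqrt(gw_sum)                 # Greenwood SE
se_ll = math.sqrt(gw_sum) / abs(math.log(s_km))  # SE on log(-log) scale

# 95% CI on the log(-log) scale, transformed back: S^exp(+/-1.96 SE_ll)
lo = s_km ** math.exp( 1.96 * se_ll)
hi = s_km ** math.exp(-1.96 * se_ll)

print(round(se_gw, 4), round(lo, 3), round(hi, 3))
```

The printed values match the `std.err` and the `conf.type="log-log"` interval at time 6 in the survfit() output that follows (0.0764, 0.620, 0.952).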
• In R, one can obtain the KM estimator and (pointwise) 95% confidence
intervals using the survfit() function in the survival package
⋆ note, default is to construct a 95% confidence interval for log S(t0) and
transform
>
> library(survival)
> ##
> KMv1 <- survfit(Surv(rTime, status) ~ 1, data=leuk, subset=(Rx == 1))
> summary(KMv1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.720 1.000
7 17 1 0.807 0.0869 0.653 0.996
10 15 2 0.699 0.1034 0.523 0.934
13 12 1 0.641 0.1100 0.458 0.897
16 11 1 0.583 0.1144 0.397 0.856
22 7 1 0.499 0.1247 0.306 0.815
23 6 1 0.416 0.1287 0.227 0.763
>
> KMv2 <- survfit(Surv(rTime, status) ~ 1, data=leuk, subset=(Rx == 1),
conf.type="plain")
> summary(KMv2)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.707 1.000
7 17 1 0.807 0.0869 0.636 0.977
...
22 7 1 0.499 0.1247 0.255 0.744
23 6 1 0.416 0.1287 0.164 0.668
>
> KMv3 <- survfit(Surv(rTime, status) ~ 1, data=leuk, subset=(Rx == 1),
conf.type="log-log")
> summary(KMv3)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.620 0.952
7 17 1 0.807 0.0869 0.563 0.923
...
22 7 1 0.499 0.1247 0.245 0.710
23 6 1 0.416 0.1287 0.174 0.645
• Looking at the estimated survival functions for the two leukemia treatment
arms there seems to be clear evidence of a difference in survival
experience:

[Figure: Kaplan-Meier curves of time to relapse (weeks) for the 6-MP (N=21) and control (N=21) arms]
• We can formally evaluate whether or not this is the case using a
hypothesis test
• One approach would be to compare the survival at a particular time, t0
⋆ i.e. perform a hypothesis test of

H0 : S_0(t0) = S_1(t0)   vs   H1 : S_0(t0) ≠ S_1(t0)

using the test statistic

Z = [ S_1(t0) − S_0(t0) ] / √( V[S_1(t0)] + V[S_0(t0)] )  ∼H0  Normal(0, 1)

⋆ Kaplan-Meier estimator to obtain the numerator
⋆ Greenwood’s variance estimator for the denominator
• Problems with focusing on a single time point include:
⋆ clinically, there may not be an ‘optimal’ single time
⋆ we throw away a lot of information
Q: Can we compare two survival curves across the entire observed time
frame?
• Pooling across both treatment arms, let t1 < t2 < . . . < tK denote the
observed event times
⋆ K distinct event times in the entire sample
• At each failure time, consider a 2×2 table of the form:

                      Failure
                  No            Yes
  Group 0   n_0k − d_0k        d_0k   |  n_0k
  Group 1   n_1k − d_1k        d_1k   |  n_1k
            n_k − d_k          d_k    |  n_k

⋆ risk set associated with the kth failure time
• Under the null hypothesis H0 : S_0(t) = S_1(t), conditional on the risk set at
time t_k, the distribution of the number of events in Group 1 is

D_1k ∼ Hypergeometric(n_k, n_1k, d_k)

⋆ expected number of events is

E[d_1k] = n_1k d_k / n_k

• Let U_k = d_1k − E[d_1k] be the difference between the observed and
expected number of events at the kth failure time
⋆ under H0,

E[U_k] = 0

V[U_k] = n_1k n_0k (n_k − d_k) d_k / [ n_k² (n_k − 1) ] = V_k
• The log-rank test compares the survival curves across the observed time
frame

T_LR = [ ∑_{k=1}^{K} U_k ]² / ∑_{k=1}^{K} V_k  ∼H0  χ²_1

⋆ test statistic is the usual Cochran-Mantel-Haenszel χ² statistic applied
to the collection of risk sets
>
> survdiff(Surv(rTime, status) ~ Rx)
...
N Observed Expected (O-E)^2/E (O-E)^2/V
Rx=0 21 21 11.1 8.84 15.2
Rx=1 21 10 19.9 4.93 15.2
Chisq= 15.2 on 1 degrees of freedom, p= 9.81e-05
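The construction above translates directly into code: loop over the distinct event times, form the 2×2 table at each risk set, and accumulate U_k and V_k. A Python sketch (the course uses R's survdiff()); the weight argument is an added generalization anticipating the weighted variants discussed next, and the toy data are invented for illustration, not the leukemia data.

```python
# Log-rank statistic built from the per-event-time 2x2 tables.
# times/events/groups are parallel lists; weight(n_k) = 1 gives the
# log-rank test, weight(n_k) = n_k the Gehan-Breslow variant.
def logrank(times, events, groups, weight=lambda nk: 1.0):
    U = V = 0.0
    for t in sorted({y for y, e in zip(times, events) if e == 1}):
        at_risk = [g for y, e, g in zip(times, events, groups) if y >= t]
        nk  = len(at_risk)
        n1k = sum(1 for g in at_risk if g == 1)
        dk  = sum(1 for y, e, g in zip(times, events, groups)
                  if y == t and e == 1)
        d1k = sum(1 for y, e, g in zip(times, events, groups)
                  if y == t and e == 1 and g == 1)
        w = weight(nk)
        U += w * (d1k - n1k * dk / nk)            # observed - expected
        if nk > 1:                                 # hypergeometric variance
            V += w**2 * (n1k * (nk - n1k) * (nk - dk) * dk
                         / (nk**2 * (nk - 1)))
    return U * U / V

# Toy example: group 0 fails at times 1 and 2; group 1 fails at 3 and 4.
T = logrank([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1])
print(round(T, 3))
```

For the toy data the hand calculation gives ∑U_k = −7/6 and ∑V_k = 17/36, so T = 49/17 ≈ 2.882.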
• The log-rank test assigns equal weight to each risk set
⋆ sensitive to differences in the tails of the survivor function
⋆ where there is the least amount of information
• Most powerful for the alternative:

H1 : S_0(t) = S_1(t)^φ, φ ≠ 1

• Equivalently the test is most powerful for the hypotheses:

H0 : λ_0(t) = λ_1(t)   vs   H1 : λ_0(t) = φ λ_1(t), φ ≠ 1

⋆ i.e. proportional hazards
• The Gehan-Breslow test weights the contribution from the kth risk set by
the number of subjects at risk, nk
T_W = [ ∑_{k=1}^{K} n_k U_k ]² / ∑_{k=1}^{K} n_k² V_k  ∼H0  χ²_1
⋆ relatively larger weight is given to early risk sets
⋆ greater (relative) power under non-proportional hazards for which the
differences are large early on
Cox regression
• We saw that the AFT model provides a regression framework for
characterizing the relationship between the distribution of T and some
vector of covariates, X:
log(Ti) = β0 + Xi^T β + εi

• Require specification of some distribution for ε
⋆ needed to form the likelihood
⋆ corresponds to some specific underlying distribution for T
• As such, the AFT is a parametric model
⋆ completely specifies the distribution of T
Q: What if we specify the wrong distribution? What if the random
component of the model is misspecified?
• Recall, for the special cases where T ∼ Exp(µ) and T ∼ Weibull(µ, σ), the
AFT model corresponds to a model for the hazard that exhibits
proportionality over time in the covariate effects:
λ(ti) = λ0(ti) exp{Xi^T β}
⋆ note: the β in this model is not the same as the one in the AFT model
Q: Can we estimate the components of this model without making any
distributional assumptions?
⋆ baseline hazard function, λ0(·)
⋆ log-hazard ratios, β
• If primary interest lies with estimating β, then the baseline hazard function
is a nuisance parameter
⋆ ideally eliminate it somehow
Partial likelihood
• Observed data: {(yi, δi,xi); i = 1, . . . , n}
• Let t_(1) < t_(2) < . . . < t_(K) denote the ordered event times across the n
observations
• For the kth event time, identify the corresponding risk set Rk
⋆ all individuals who were at risk to experience the event
⋆ all individuals with yi ≥ t(k)
• Suppose that in each of the K risk sets there was only one event
⋆ no ties in the event times
Q: Who exactly, in the risk set, experienced the event?
Q: Of everyone who could have failed, what was so special about them?
• Let x(k) denote the covariate vector for the individual who experienced the
event
• The partial likelihood compares the ‘risk’ of the individual who experiences
the event to everyone who could have experienced the event, across the K
risk sets:
L_P(β) = ∏_{k=1}^{K} P(indiv. (k) fails at time t_(k)) / ∑_{i∈R_k} P(indiv. i fails at time t_(k))

       = ∏_{k=1}^{K} λ0(t_(k)) exp{x_(k)^T β} / ∑_{i∈R_k} λ0(t_(k)) exp{x_i^T β}

       = ∏_{k=1}^{K} exp{x_(k)^T β} / ∑_{i∈R_k} exp{x_i^T β}

⋆ Cox (1972)
• The log-partial likelihood is

ℓ_P(β) = ∑_{k=1}^{K} { x_(k)^T β − log( ∑_{i∈R_k} exp{x_i^T β} ) }

• Differentiating with respect to β we get

∂ℓ_P(β)/∂β = ∑_{k=1}^{K} { x_(k) − ∑_{i∈R_k} x_i exp{x_i^T β} / ∑_{j∈R_k} exp{x_j^T β} }

           = ∑_{k=1}^{K} { x_(k) − ∑_{i∈R_k} W_(k),i(β) x_i }
• Information about β comes from a comparison of the covariates for the
person who experienced the event to a weighted average of the covariates
of everyone in the risk set
⋆ weights depend, in part, on β
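For a single covariate with no tied event times, the log partial likelihood is short to code. A Python sketch with made-up data (three subjects); at β = 0 each risk set of size m contributes −log m.

```python
import math

# Log partial likelihood for a single covariate, assuming no tied event
# times: each event time contributes x_(k)*beta - log(sum over the risk
# set of exp(x_i*beta)). Data below are invented for illustration:
# (observed time y_i, event indicator delta_i, covariate x_i).
data = [(1, 1, 1.0), (2, 1, 0.0), (3, 0, 1.0)]

def log_plik(beta, data):
    ll = 0.0
    for y, d, x in data:
        if d == 1:
            # risk set: everyone still under observation at time y
            risk = [xi for yi, di, xi in data if yi >= y]
            ll += x * beta - math.log(sum(math.exp(xi * beta) for xi in risk))
    return ll

print(round(log_plik(0.0, data), 4))
```

At β = 0 the two risk sets (sizes 3 and 2) give ℓ_P(0) = −log 3 − log 2 = −log 6 ≈ −1.7918; in practice the MPLE would be found by Newton-Raphson rather than by hand.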
• Estimation and inference based on LP (β) can proceed as usual
⋆ obtain the maximum partial likelihood estimate (MPLE) via
Newton-Raphson
⋆ standard errors via the partial likelihood observed information matrix:
I_ββ = − ∂²ℓ_P(β) / ∂β ∂β^T
⋆ hypothesis testing via Wald, score and likelihood ratio tests
∗ theoretical justification is complicated by the fact that the terms are
not independent
• The popularity of the proportional hazards model is driven by
⋆ the ease with which estimation/inference for β proceeds
⋆ the fact that one does not need to specify a model for λ0(·)
• Specifying a model for the hazard and using PL for estimation/inference is
an example of a semi-parametric statistical analysis
• Note, the partial likelihood has exactly the same form as the conditional
likelihood for a matched case-control study
⋆ here the risk set is analogous to the set we created via the matching
process
⋆ used conditioning to eliminate the nuisance parameters introduced by
the sampling scheme
⋆ difference is that the partial likelihood is not a product of independent
conditional probabilities
Interpretation
• Consider the following model for the hazard function:
λ(t) = λ0(t) exp{β1X1,i + . . . + βpXp,i}
• λ0(t) is the baseline hazard function
⋆ when X1 = . . . = Xp = 0
⋆ interpretation requires care when any of the X are continuous
• exp{βj} is the hazard ratio for a unit change in Xj , holding other
components of the model constant
⋆ some use the terms rate ratio or relative risk
• Proportional hazards in the sense that the hazard ratio does not depend on
time
⋆ if the model holds, the hazard ratio truly does not depend on time
⋆ otherwise, exp{βj} characterizes an ‘average’ hazard ratio, averaging
across time
Bone marrow transplant data
• Dataset ‘marrow’, on the course website, has information on 137
individuals who underwent a bone marrow transplant
⋆ example from Klein and Moeschberger (2003)
• Outcome of interest is disease-free survival
⋆ the origin is time of transplant
⋆ measure time-to-event, where the ‘event’ is the first of:
∗ death
∗ relapse
⋆ if the patient makes it to the end of the study without dying or
experiencing a relapse, they are censored
• Observed data:
⋆ Yi = min(Tdeath,i, Trelapse,i, Ci)
⋆ δi = 1 if Yi corresponds to a ‘death’ or ‘relapse’ and 0 otherwise
• At the time of transplant, patients were categorized into one of three
disease groups:
⋆ acute lymphoblastic leukemia (ALL)
⋆ acute myeloid leukemia (AML)
∗ low risk category
∗ high risk category
• Load in the data and examine the Kaplan-Meier estimates
>
> library(survival)
> load("BoneMarrow_data.dat")
>
> ##
> marrow$time <- marrow$time / 365.25
> marrow$plateT <- marrow$plateT / 365.25
> marrow$group <- factor(marrow$group, levels=1:3,
labels=c(" ALL", " AML-lo", " AML-hi"))
>
> fitKM <- survfit(Surv(time, status) ~ group, data=marrow)
>
> plot(fitKM, mark.time=FALSE, xlab="Time to first of death or relapse, years",
ylab="Survival", lwd=3, col=c("red", "blue", "green"), axes=FALSE)
> axis(1, at=seq(from=0, to=7, by=1))
> axis(2, at=seq(from=0, to=1, by=0.2))
> legend(6, 1, c("ALL", "AML low risk", "AML high risk"),
lwd=3, col=c("red", "blue", "green"), bty="n")
[Figure: Kaplan-Meier curves of time to first of death or relapse (years) for the ALL, AML low risk, and AML high risk groups]
• Formally evaluate differences with either the log-rank test or within a Cox
model:
>
> survdiff(Surv(time, status) ~ group, data=marrow)
..
N Observed Expected (O-E)^2/E (O-E)^2/V
group= ALL 38 24 21.9 0.211 0.289
group= AML-lo 54 25 40.0 5.604 11.012
group= AML-hi 45 34 21.2 7.756 10.529
Chisq= 13.8 on 2 degrees of freedom, p= 0.00101
>
> coxph(Surv(time, status) ~ group, data=marrow)
...
coef exp(coef) se(coef) z p
group AML-lo -0.574 0.563 0.287 -2.00 0.046
group AML-hi 0.383 1.467 0.267 1.43 0.150
Likelihood ratio test=13.4 on 2 df, p=0.0012 n= 137, number of events= 83
• Fit an adjusted Cox model:
>
> marrow$waitCat <- 0
> marrow$waitCat[marrow$waittime > 90] <- 1
> marrow$waitCat[marrow$waittime > 180] <- 2
> marrow$waitCat[marrow$waittime > 365] <- 3
> marrow$waitCat <- factor(marrow$waitCat, levels=0:3,
labels=c(" 0-90", " 91-180", "181-365", " 365+"))
>
> fitPH1 <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat + group,
data=marrow)
> summary(fitPH1)
...
n= 137, number of events= 83
coef exp(coef) se(coef) z Pr(>|z|)
ageP 0.006301 1.006321 0.013296 0.474 0.6356
maleP -0.272790 0.761252 0.243990 -1.118 0.2636
cmvP 0.071114 1.073704 0.238417 0.298 0.7655
waitCat 91-180 -0.199449 0.819182 0.353061 -0.565 0.5721
waitCat181-365 -0.253918 0.775755 0.390037 -0.651 0.5150
waitCat 365+ -0.476419 0.621003 0.444332 -1.072 0.2836
group AML-lo -0.808177 0.445670 0.329754 -2.451 0.0143 *
group AML-hi 0.251429 1.285861 0.292381 0.860 0.3898
---
exp(coef) exp(-coef) lower .95 upper .95
ageP 1.0063 0.9937 0.9804 1.0329
maleP 0.7613 1.3136 0.4719 1.2280
cmvP 1.0737 0.9314 0.6729 1.7133
waitCat 91-180 0.8192 1.2207 0.4101 1.6365
waitCat181-365 0.7758 1.2891 0.3612 1.6662
waitCat 365+ 0.6210 1.6103 0.2599 1.4836
group AML-lo 0.4457 2.2438 0.2335 0.8505
group AML-hi 1.2859 0.7777 0.7250 2.2807
...
Q: Interpretation of exp{-0.808} = 0.45?
Q: Interpretation of exp{0.251} = 1.29?
• Perform a likelihood ratio test for the overall effect of ‘group’
H0 :
vs
H1 :
>
> ##
> fitPH0 <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat,
data=marrow)
> anova(fitPH0, fitPH1)
Analysis of Deviance Table
Cox model: response is Surv(time, status)
Model 1: ~ ageP + maleP + cmvP + waitCat
Model 2: ~ ageP + maleP + cmvP + waitCat + group
loglik Chisq Df P(>|Chi|)
1 -371.84
2 -365.01 13.664 2 0.001079 **
Special topics
• In the remainder of the class we are going to consider a series of special
topics that take us beyond the ‘basic’ Cox model
⋆ time-dependent covariates
⋆ time-varying effects
∗ i.e. non-proportional hazards
⋆ left truncation
• Focus on the concepts and implementation in R
• Formal justification given by the theory of counting processes
⋆ BIO 244: Analysis of Failure Time Data
Time-dependent covariates
• So far we have considered models for the hazard of the form:
λ(ti) = λ0(ti) exp{XTi β}
• Implicit in this model is that the covariates do not vary over time
⋆ i.e. the components of X are measured at baseline
• In some settings, covariates of interest may vary over time
• For example, individuals who receive bone marrow transplants have their
platelet levels monitored over time
⋆ at the outset their platelet levels will be too low
⋆ over time, assuming the transplant is successful, they will return to
normal levels
• See this in the marrow dataset
>
> marrow[1:10, c("id", "group", "plateS", "plateT", "status", "time")]
id group plateS plateT status time
1 1 ALL 0 1 1 1
2 2 AML-hi 0 2 1 2
3 3 AML-lo 0 10 1 10
4 4 AML-hi 0 16 1 16
5 5 AML-hi 1 16 1 32
6 6 AML-lo 0 35 1 35
...
• Patient #3:
⋆ experienced either death or relapse at day 10 without their platelets
having returned to normal levels
• Patient #5:
⋆ at 16 days, their platelets had returned to normal levels
⋆ experienced either death or relapse at day 32
• Define the time-dependent covariate:

X1(t) = 0 if platelets have not returned to normal levels by time t
        1 if platelets have returned to normal levels by time t
• Write a model for the hazard as:
λ(t) = λ0(t) exp{β1X1,i(t) + β2agei + . . .}
⋆ the value of the hazard depends on when it is evaluated
• The model still parameterizes the effect of X1 to be constant in time
⋆ that is, β1 does not depend on time
⋆ proportional hazards for a time-dependent covariate
⋆ interpretation of exp{β1} is as before
• Estimation proceeds via the partial likelihood
• Crucially, when we evaluate the partial likelihood we must update the
value of the covariates that are being compared within the risk set
L_P(β) = ∏_{k=1}^{K} exp{x_(k)(t_(k))^T β} / ∑_{i∈R_k} exp{x_i(t_(k))^T β}
⋆ as you move through time and your covariates are compared to those
of the individual that ‘failed’, we have to make sure to update the
values of your covariates
• Operationally, this involves creating multiple records for individuals
• Consider, again, patient #5
>
> marrow[5, c("id", "group", "plateS", "plateT", "status", "time")]
id group plateS plateT status time
5 5 AML-hi 1 16 1 32
• Their time-varying platelet covariate is as follows:

X1,5(t) = 0 if t ≤ 16
          1 if t > 16
• We can represent this shift by creating two records
⋆ record 1: time between 0-16
∗ time ‘starts’ at 0 and ‘ends’ at 16
∗ X1,5 = 0
∗ had not experienced the event at t5 = 16, so that δ5 = 0 for this
record
⋆ record 2: time between 16-32
∗ time ‘starts’ at 16 and ‘ends’ at 32
∗ X1,5 = 1
∗ experience the event at t5 = 32, so that δ5 = 1 for this record
⋆ covariates that were measured at baseline are the same for both records
∗ e.g., age at entry and sex
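The record-splitting logic can be sketched as a small function: one row before recovery with the event indicator forced to 0, and one row after recovery carrying the original indicator. A Python sketch (the slides do this in R on the following slides); argument names are made up for illustration.

```python
# Split one subject's record into counting-process (start, end] rows for
# a binary time-dependent covariate (platelet recovery). Mirrors the
# construction used for patient #5: recovery at day 16, event at day 32.
def split_record(time, status, plate_time, plate_seen):
    rows = []  # (start, end, plateTVC, delta)
    if plate_seen and plate_time < time:
        rows.append((0, plate_time, 0, 0))        # before recovery: no event
        rows.append((plate_time, time, 1, status))
    else:
        rows.append((0, time, 0, status))         # recovery never observed
    return rows

print(split_record(32, 1, 16, True))   # patient #5
print(split_record(10, 1, 10, False))  # patient #3
```

A degenerate case such as recovery at day 0 would produce a zero-length first row; the slides handle this by dropping rows with start ≥ end.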
• Code in R:
⋆ for each person create a ‘first’ observation
⋆ also create a ‘second’ observation if they experienced platelet recovery
>
> ## First observation prior to platelet recovery
> ##
> marrow0 <- marrow
> marrow0$plateTVC <- 0
> marrow0$start <- 0
> marrow0$end <- pmin(marrow0$plateT, marrow0$time)
> marrow0$delta <- marrow0$status
> marrow0$delta[marrow0$plateT < marrow0$time] <- 0
>
> ## Second observation after platelet recovery
> ##
> marrow1 <- marrow[marrow$plateS == 1,]
> marrow1$plateTVC <- 1
> marrow1$start <- marrow1$plateT
> marrow1$end <- marrow1$time
> marrow1$delta <- marrow1$status
• Combine records and sort
⋆ take care because of one patient who had platelet recovery immediately
>
> marrowTVC <- rbind(marrow0, marrow1)
> marrowTVC <- marrowTVC[order(marrowTVC$id, marrowTVC$start),]
> marrowTVC[1:10, c("id", "ageP", "plateTVC", "start", "end", "delta")]
id ageP plateTVC start end delta
1 1 42 0 0 1 1
2 2 20 0 0 2 1
3 3 34 0 0 10 1
4 4 27 0 0 16 1
5 5 36 0 0 16 0
510 5 36 1 16 32 1
...
>
> marrowTVC[marrowTVC$id == 20,
c("id", "ageP", "plateTVC", "start", "end", "delta")]
id ageP plateTVC start end delta
20 20 35 0 0 0 0
201 20 35 1 0 80 1
> marrowTVC <- marrowTVC[(marrowTVC$start < marrowTVC$end),]
• The new dataset has 256 ‘observations’ from 137 patients
>
> nrow(marrow)
[1] 137
> nrow(marrowTVC)
[1] 256
>
> fitTVC <- coxph(Surv(start, end, delta) ~ ageP + maleP + cmvP + waitCat
+ group + plateTVC, data=marrowTVC)
> summary(fitTVC)
...
n= 256, number of events= 83
exp(coef) exp(-coef) lower .95 upper .95
ageP 1.0099 0.9902 0.9845 1.0360
...
plateTVC 0.3202 3.1227 0.1596 0.6424
• The hazard for individuals who experience platelet recovery is estimated to
be 68% lower than the hazard for individuals who don’t experience platelet
recovery, holding other components of the model constant
Time-varying effects
• Consider, again, the ‘basic’ Cox model:

λ(t) = λ0(t) exp{X^T β}

• As we’ve noted, to isolate the effect of X1, we can compare the hazard
function between two populations who differ in their value by one unit:

x  = (x1, x2, . . . , xp)
x′ = (x1 + 1, x2, . . . , xp)

which gives

λ(t; x′) / λ(t; x) = λ0(t) exp{x′^T β} / [ λ0(t) exp{x^T β} ] = exp{β1}

• That is, under the above model, the effect of X1 does not vary with time
• Put another way, any change in the hazard function associated with a unit
increase in X1 is maintained throughout all time
⋆ hence the term proportional hazards
• In practice, adopting a proportional hazards model may be reasonable for
⋆ estimating treatment effects over a relatively short timeframe
⋆ comparing innate differences between individuals (e.g. sex)
⋆ settings where one is interested in a simple summary of the effect
• In some settings, this specification may not be
⋆ appropriate
∗ e.g., if the benefits of treatment wane over time
⋆ adequate
∗ e.g., residual confounding
• In practice, one might either
⋆ hypothesize non-proportional hazards at the outset
⋆ detect it via the analysis of residuals or an evaluation of goodness of fit
• Either way, if interest lies in moving beyond a proportional hazards model,
one could parameterize the model so that the effect of X1 is more flexible:
λ(t) = λ0(t) exp{β1(t) X1 + β2 X2 + . . .}

which gives

λ(t; x′) / λ(t; x) = exp{β1(t)}

⋆ say the effect is time-varying
⋆ interpretation of the hazard ratio requires specification of some actual
time
• In practice, analysts have considerable choice in how to specify β1(t)
• In the following sub-sections we are going to consider two approaches:
⋆ stratified baseline hazard functions
⋆ interactions with time
Non-proportional hazards: stratified models
• Returning to the bone marrow data, patients underwent their transplants
at one of four hospitals:

hosp_i = 1 The Ohio State University
         2 Alferd
         3 St. Vincent
         4 Hahnemann
• We may want to adjust for hospital as a potential confounder for the
association of disease group and disease-free survival
⋆ different hospitals may serve different patient populations
⋆ some hospitals provide better care than others
• Towards this, consider how the ‘basic’ model parameterizes the effect of
hospital:

λ(t) = λ0(t) exp{ I{hosp = 2} βh2 + I{hosp = 3} βh3 + I{hosp = 4} βh4 + . . . }

• Under this model:
⋆ λ0(t) indicates how ‘risk’ varies over time at The Ohio State University
hospital
∗ i.e. the hospital for which ‘hosp = 1’
⋆ risk in each of the other hospitals follows the same profile over time,
possibly increased/decreased by some constant multiplicative factor
• If we wanted to summarize differences between hospitals, this may be
reasonable
• If the primary purpose of including hospital in the model is the control of
confounding we may want to avoid a restrictive model specification
⋆ mitigate residual confounding
• One option for introducing flexibility would be to let each hospital have
their own baseline hazard function:
λ(t) = λ0s(t) exp{X^T β}, s = 1, . . . , 4
⋆ λ0s(t) is the stratum-specific baseline hazard function
⋆ β is a vector of common log-hazard ratios
• The hazard ratio comparing stratum s vs stratum 1, holding X constant, is

λ0s(t) / λ01(t)

⋆ a flexible function of time
• Estimation/inference for β is straightforward:
⋆ form the usual partial likelihood by only considering observations within
each stratum
⋆ take the product over the strata to ‘borrow strength’ in the estimation
of β

L_P(β) = ∏_{s=1}^{S} ∏_{k=1}^{K_s} exp{x_(k,s)^T β} / ∑_{i∈R_{k,s}} exp{x_i^T β}
• Estimation/inference for the stratification variable is not straightforward
⋆ estimate each of the baseline hazard functions separately
• The approach can be applied to continuous covariates by categorization
⋆ may also want to include the covariate in the linear predictor
• Adopting a proportional hazards model for between-hospital effects:
>
> fitPH <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat
+ group + hosp, data=marrow)
> summary(fitPH)
...
n= 137, number of events= 83
coef exp(coef) se(coef) z Pr(>|z|)
ageP 0.01380 1.01390 0.01436 0.961 0.3365
maleP -0.20074 0.81812 0.26053 -0.771 0.4410
cmvP -0.07385 0.92881 0.24374 -0.303 0.7619
waitCat 91-180 -0.23035 0.79425 0.35126 -0.656 0.5120
waitCat181-365 -0.44936 0.63803 0.40164 -1.119 0.2632
waitCat 365+ -0.69940 0.49688 0.45898 -1.524 0.1276
group AML-lo -0.55662 0.57314 0.33190 -1.677 0.0935 .
group AML-hi 0.45002 1.56835 0.29657 1.517 0.1292
hosp Alferd 0.76015 2.13860 0.35788 2.124 0.0337 *
hosp St.Vin -0.11097 0.89497 0.33610 -0.330 0.7413
hosp Hahnemann -0.92203 0.39771 0.42995 -2.145 0.0320 *
...
• Permitting hospital-specific baseline hazard functions:
>
> fitS <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat
+ group + strata(hosp), data=marrow)
> summary(fitS)
...
n= 137, number of events= 83
coef exp(coef) se(coef) z Pr(>|z|)
ageP 0.01191 1.01198 0.01457 0.817 0.414
maleP -0.25649 0.77376 0.26149 -0.981 0.327
cmvP -0.12152 0.88557 0.24550 -0.495 0.621
waitCat 91-180 -0.19452 0.82323 0.35227 -0.552 0.581
waitCat181-365 -0.33425 0.71587 0.39489 -0.846 0.397
waitCat 365+ -0.61556 0.54034 0.45513 -1.352 0.176
group AML-lo -0.54385 0.58051 0.33351 -1.631 0.103
group AML-hi 0.42925 1.53611 0.29958 1.433 0.152
...
• See differences in the estimated coefficients for several of the baseline
covariates
Non-proportional hazards: interactions with time
• An important drawback of using a stratified model is that the effect of the
covariate is not easily characterized
• An alternative is to directly parameterize β(·)
• One simple specification is to include an interaction with log(t) in the
linear predictor

λ(t) = λ0(t) exp{β1 X1 + β1* X1 log(t) + β2 X2 + . . .}

⋆ the corresponding hazard ratio is

λ(t; x′) / λ(t; x) = exp{β1 + β1* log(t)} = exp{β1} × t^{β1*}

⋆ when β1* = 0, we have proportional hazards
⋆ when β1* = 1, the hazard ratio is linear in time
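The two limiting cases can be checked by evaluating the hazard ratio as a function of time. A minimal Python sketch with illustrative (made-up) coefficients:

```python
import math

# Hazard ratio under the log(t) interaction model:
# HR(t) = exp(beta1 + beta1_star * log(t)) = exp(beta1) * t**beta1_star
def hazard_ratio(t, beta1, beta1_star):
    return math.exp(beta1) * t ** beta1_star

# beta1_star = 0 recovers proportional hazards: HR is constant in t ...
print(hazard_ratio(1.0, math.log(2), 0.0), hazard_ratio(5.0, math.log(2), 0.0))

# ... while beta1_star = 1 gives a hazard ratio linear in t
print(hazard_ratio(2.0, 0.0, 1.0))
```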
• Practically, one can fit this model by noting that log(t) is a
time-dependent covariate
⋆ use the same ‘multiple records’ approach to construct a dataset that
appropriately updates the value of ‘log(t)’ at each risk set
• Once the model has been fit one can evaluate the null hypothesis

H0 : β1* = 0

to see if there is any evidence of non-proportional hazards
⋆ note that the model structure is set up to focus on specific alternatives
to proportional hazards
Left truncation
• Throughout the notes we have assumed that for a given time-to-event
response variable, T
1. there is a well-defined origin, which corresponds to T=0
2. study participants are observed from T=0 onwards until either an event
occurs or they are right-censored
• Suppose interest lies in designing a study to establish risk factors for
Alzheimer’s Disease (AD)
• Since AD can manifest at (almost) any age, an ‘ideal’ study would
⋆ enroll a random sample of births
⋆ follow each individual over the course of their life, with regular
cognitive evaluations
• Clearly this isn’t practical and an alternative design would need to be used
• Consider the Adult Changes in Thought (ACT) study
⋆ an on-going prospective study of incident AD and dementia
⋆ western WA state
⋆ members of Group Health Cooperative who were aged 65 years or older
• Enrollment for ACT has involved three main phases:
⋆ the initial cohort, enrolled between 1994-1996
⋆ a supplementary cohort, enrolled between 2000-2002
⋆ continuous enrollment from 2005 onwards
• At the outset, each potential participant undergoes an initial evaluation
⋆ enrollment was restricted to individuals who were ‘cognitively intact’
• Thereafter, cognitive evaluations are performed biennially
⋆ ‘time of diagnosis’ taken to be the mid-point of the two-year interval
• Follow-up was censored administratively in 2012 or at age 89, whichever
occurs first
• The observed data consists of n=3,602 records
• Among the covariates measured at enrollment are
⋆ gender: 0/1 = male/female
⋆ education: 0/1/2/3 = < HS/HS/some college/graduate
⋆ marital status: 0/1 = not married/married
⋆ depression: 0/1 = no/yes
• Consider the following four ACT participants:
year female eduCat married depression ageEnroll ageDx ageDeath ageCensor
#1 2000 0 0 1 0 74 82 83 86
#2 2006 1 0 0 0 78 84 NA 84
#3 1994 0 2 0 0 74 NA 81 89
#4 2008 1 2 1 0 76 NA NA 80
• Graphically, the observed data for these individuals can be represented as:

[Figure: timelines on the age scale (65-90 years) for the four participants, distinguishing unobserved person-time, observed person-time, Alzheimer’s Disease events, death events, and censoring]
• The structure of these data does not conform to the paradigm for
time-to-event response data that we’ve considered so far
• As an alternative, we could consider time since study enrollment:

[Figure: the same four timelines plotted on the time-since-enrollment scale (0-25 years), distinguishing observed person-time, Alzheimer’s Disease events, death events, and censoring]
Q: Focusing on mortality, as we consider time to death from enrollment, what
do we mean by ‘time’? Is this notion of time interpretable beyond the
context of the study?
• Returning to age as the time scale, we have to address the problem that
the observed data is subject to left truncation
⋆ observation of person time did not begin at the origin for the age time
scale (i.e. at birth)
⋆ person-time prior to the start of observation cannot be considered
⋆ there is no potential for the individual to be at risk to be observed to
experience the event
• To formalize this, we can introduce a delayed entry time
⋆ random variable, V
⋆ in theory, this is well-defined for all individuals in the study
⋆ analogous to the right censoring time, C
• Given the observed data, we have no choice but to condition on the event
‘T > V ’ in our analyses
• For example, we can readily estimate hazard functions of the form:

λ(t | X, V) = lim_{∆→0} (1/∆) P(t ≤ T < t + ∆ | T ≥ t, X, T > V)

⋆ conditional hazard function, given that you made it to the delayed
entry time
Q: How does this impact the interpretability of the results?
• Fortunately, under the independence assumption:

T ⊥⊥ V | X

one can easily show that

λ(t | X, V) = λ(t | X) = lim_{∆→0} (1/∆) P(t ≤ T < t + ∆ | T ≥ t, X)
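Under delayed entry, subject i belongs to the risk set at time t only if v_i < t ≤ y_i. A Python sketch using the four ACT participants shown earlier (entry at ages 74, 78, 74, 76; follow-up ending at min(age at death, age at censoring) = 83, 84, 81, 80):

```python
# Risk set under left truncation: subject i is at risk at time t only if
# the delayed entry time v_i has passed and follow-up y_i has not ended,
# i.e. v_i < t <= y_i.
subjects = [
    {"id": 1, "entry": 74, "exit": 83},  # died at 83 (before censoring at 86)
    {"id": 2, "entry": 78, "exit": 84},  # censored at 84
    {"id": 3, "entry": 74, "exit": 81},  # died at 81
    {"id": 4, "entry": 76, "exit": 80},  # censored at 80
]

def risk_set(t, subjects):
    return [s["id"] for s in subjects if s["entry"] < t <= s["exit"]]

print(risk_set(75, subjects))
print(risk_set(82, subjects))
```

At age 75 only participants #1 and #3 have enrolled, so the risk set excludes #2 and #4 even though both would eventually contribute person-time; ignoring the entry condition would overstate the number at risk.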
• Consequently, we can perform valid estimation/inference for the hazard
function of primary interest
⋆ analogous to the independence assumption we routinely make for
right-censored data
Q: Is this assumption reasonable for ACT?
• Operationally, analyses in R can proceed as follows:
> ## Structure of the observed data
>
> act
year female eduCat married depression ageEnroll ageDx ageDeath
1 2010 0 3 1 0 71 NA NA
2 1995 1 2 1 0 76 NA NA
3 2008 1 3 0 0 84 NA NA
4 1995 0 3 1 0 68 NA 69
5 2002 0 3 0 0 74 NA 79
...
> ## Manipulations to get the analysis variables
>
> ## Time to diagnosis
> ##
> act$T1 <- act$ageDx
> act$T1[is.na(act$T1)] <- 999
>
> ## Time to death
> ##
> act$T2 <- act$ageDeath
> act$T2[is.na(act$T2)] <- 999
>
> ## Remove folks who did not have at least 2 years of follow-up
> ##
> act <- act[act$T2 >= (act$ageEnroll + 2),]
>
> ##
> act$Y0 <- act$ageEnroll
> act$Y1 <- pmin(act$T1, act$T2, act$ageCensor)
> act$Y2 <- pmin(act$T2, act$ageCensor)
> ##
> act$delta1 <- as.numeric(act$Y1 == act$T1)
> act$delta2 <- as.numeric(act$Y2 == act$T2)
> ## Observation on the "time since enrollment" scale
> ##
> act$Ystar <- act$Y2 - act$Y0
>
> ## Standardize age to make the baseline hazard function and age
> ## "contrast" more interpretable
> ##
> act$ageEnroll <- (act$ageEnroll - 75) / 5
>
> ##
> library(survival)
> getHR <- function(fit, alpha=0.05, digits=2)
+ {
+     beta  <- coef(fit)
+     se    <- sqrt(diag(fit$var))
+     value <- matrix(rep(beta, 3), ncol=3)
+     value <- value + qnorm(1-alpha/2) *
+         matrix(rep(c(0,-1,1), length(beta)), ncol=3, byrow=TRUE) *
+         matrix(rep(se, 3), ncol=3)
+     value <- round(exp(value), digits=digits)
+     dimnames(value) <- list(names(beta), c("HR", "Lower", "Upper"))
+     return(value)
+ }
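getHR() simply exponentiates Wald intervals, exp(β̂ ± z₁₋α/₂ · se). As a standalone check of that arithmetic, plugging in the female and depression coefficients and standard errors printed in the fit10 summary reproduces the rounded intervals that appear in the comparison table later on:

```r
## Standalone check of getHR()'s interval arithmetic, using the female and
## depression coefficients/SEs from the fit10 summary
beta <- c(female = -0.53783, depression = 0.51223)
se   <- c(female =  0.06331, depression = 0.08527)
z    <- qnorm(1 - 0.05/2)

HR    <- round(exp(beta), 2)         # 0.58, 1.67
lower <- round(exp(beta - z * se), 2)
upper <- round(exp(beta + z * se), 2)

cbind(HR, lower, upper)
```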
> ## Model on the "time since enrollment" time scale
> ## - missingness is due to the depression variable
> ##
> fit10 <- coxph(Surv(Ystar, delta2) ~ female + married + depression
+                + factor(eduCat) + ageEnroll, data=act)
> summary(fit10)
Call:
coxph(formula = Surv(Ystar, delta2) ~ female + married + depression +
factor(eduCat) + ageEnroll, data = act)
n= 3529, number of events= 1160
(73 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
female -0.53783 0.58401 0.06331 -8.496 < 2e-16 ***
married -0.17833 0.83667 0.06408 -2.783 0.00539 **
depression 0.51223 1.66900 0.08527 6.007 1.89e-09 ***
factor(eduCat)1 0.01047 1.01052 0.10261 0.102 0.91873
factor(eduCat)2 -0.10628 0.89918 0.07041 -1.509 0.13118
factor(eduCat)3 -0.33870 0.71270 0.08569 -3.953 7.72e-05 ***
ageEnroll 0.50363 1.65471 0.03425 14.706 < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
female 0.5840 1.7123 0.5159 0.6612
married 0.8367 1.1952 0.7379 0.9486
depression 1.6690 0.5992 1.4121 1.9726
factor(eduCat)1 1.0105 0.9896 0.8264 1.2356
factor(eduCat)2 0.8992 1.1121 0.7833 1.0322
factor(eduCat)3 0.7127 1.4031 0.6025 0.8430
ageEnroll 1.6547 0.6043 1.5473 1.7696
...
Q: What conclusions do we draw?
> ## Model on the "time since birth" (i.e. age) time scale
> ##
> fit11 <- coxph(Surv(Y0, Y2, delta2) ~ female + married + depression
+                + factor(eduCat), data=act)
> summary(fit11)
Call:
coxph(formula = Surv(Y0, Y2, delta2) ~ female + married + depression +
factor(eduCat), data = act)
n= 3529, number of events= 1160
(73 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
female -0.52442 0.59190 0.06340 -8.272 < 2e-16 ***
married -0.12708 0.88067 0.06355 -1.999 0.045555 *
depression 0.49500 1.64049 0.08501 5.823 5.78e-09 ***
factor(eduCat)1 0.01998 1.02018 0.10263 0.195 0.845638
factor(eduCat)2 -0.08641 0.91721 0.07028 -1.230 0.218833
factor(eduCat)3 -0.31308 0.73119 0.08551 -3.662 0.000251 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
...
> ## Compare the results between the two analyses
> ##
> cbind(getHR(fit10), rbind(getHR(fit11), c(NA, NA, NA)))
HR Lower Upper HR Lower Upper
female 0.58 0.52 0.66 0.59 0.52 0.67
married 0.84 0.74 0.95 0.88 0.78 1.00
depression 1.67 1.41 1.97 1.64 1.39 1.94
factor(eduCat)1 1.01 0.83 1.24 1.02 0.83 1.25
factor(eduCat)2 0.90 0.78 1.03 0.92 0.80 1.05
factor(eduCat)3 0.71 0.60 0.84 0.73 0.62 0.86
ageEnroll 1.65 1.55 1.77 NA NA NA
• Generally draw the same conclusions
• Now let’s consider AD/dementia as a time-dependent covariate
> ## Create a new dataset with AD/dementia diagnosis as a time-dependent
> ## covariate
> ##
> ## - 111 patients have a diagnosis of AD/dementia but it’s the same age as
> ## their date of death
> ## - something needs to be done or else we lose these folks
> ## - make a slight modification that gives them each 0.5 years
>
> ##
> actTV <- act[ ,c("female", "eduCat", "married", "depression",
+                 "Y0", "Y1", "Y2", "delta1", "delta2")]
> ##
> bad <- (actTV$Y1 == actTV$Y2) & (actTV$delta1 == 1)
> actTV$Y2[bad] <- actTV$Y2[bad] + 0.5
> ##
> n <- nrow(actTV)
> group0 <- c(1:n)[actTV$delta1 == 0]
> group1 <- c(1:n)[actTV$delta1 == 1]
> ##
> actTV <- actTV[c(group0, rep(group1, rep(2, length(group1)))),]
> actTV$RxAD <- c(rep(0, length(group0)), rep(c(0,1), length(group1)))
> ##
> actTV$YS <- actTV$Y0
> actTV$YE <- actTV$Y1
> actTV$deltaE <- actTV$delta2
> ##
> actTV$YS[actTV$RxAD == 1] <- actTV$Y1[actTV$RxAD == 1]
> actTV$YE[actTV$RxAD == 1] <- actTV$Y2[actTV$RxAD == 1]
> actTV$deltaE[actTV$RxAD == 0 & actTV$delta1 == 1] <- 0
>
> ##
> dim(actTV)
[1] 4297 14
> actTV[c(1:3, 4294:4297),
+          c("Y0", "Y1", "delta1", "Y2", "delta2", "RxAD", "YS", "YE", "deltaE")]
Y0 Y1 delta1 Y2 delta2 RxAD YS YE deltaE
1 71 73 0 73.0 0 0 71 73.0 0
2 76 89 0 89.0 0 0 76 89.0 0
3 84 88 0 88.0 0 0 84 88.0 0
4143 73 86 1 89.0 0 0 73 86.0 0
4143.1 73 86 1 89.0 0 1 86 89.0 0
4145 72 89 1 89.5 1 0 72 89.0 0
4145.1 72 89 1 89.5 1 1 89 89.5 1
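The splitting logic can be checked in isolation for one subject. Taking the last subject printed above (enrolled at 72, AD diagnosis at 89, death at 89.5), the two rows should cover (72, 89] with RxAD = 0 and no event, and (89, 89.5] with RxAD = 1 and the death:

```r
## One subject: enroll at 72, AD diagnosis at 89, death at 89.5
Y0 <- 72; Y1 <- 89; Y2 <- 89.5; delta2 <- 1

split <- data.frame(
  YS     = c(Y0, Y1),      # interval start
  YE     = c(Y1, Y2),      # interval end
  RxAD   = c(0, 1),        # AD/dementia status during the interval
  deltaE = c(0, delta2)    # death recorded only in the post-diagnosis row
)
split
```

This reproduces rows 4145 and 4145.1 of the printout: the pre-diagnosis row is artificially "censored" at the diagnosis age, and the post-diagnosis row carries the event.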
> ## Model with RxAD as a time-varying covariate
> ##
> fitTV <- coxph(Surv(YS, YE, deltaE) ~ RxAD + female + married + depression
+                + factor(eduCat), data=actTV)
> summary(fitTV)
Call:
coxph(formula = Surv(YS, YE, deltaE) ~ RxAD + female + married +
depression + factor(eduCat), data = actTV)
n= 4220, number of events= 1496
(77 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
RxAD 1.11216 3.04093 0.06575 16.916 < 2e-16 ***
female -0.49140 0.61177 0.05584 -8.801 < 2e-16 ***
married -0.15482 0.85657 0.05605 -2.762 0.00574 **
depression 0.47761 1.61221 0.07376 6.475 9.46e-11 ***
factor(eduCat)1 -0.03641 0.96425 0.09155 -0.398 0.69085
factor(eduCat)2 -0.11453 0.89179 0.06155 -1.861 0.06278 .
factor(eduCat)3 -0.40540 0.66671 0.07642 -5.305 1.13e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
RxAD 3.0409 0.3288 2.6733 3.4592
female 0.6118 1.6346 0.5484 0.6825
married 0.8566 1.1674 0.7675 0.9560
depression 1.6122 0.6203 1.3952 1.8630
factor(eduCat)1 0.9642 1.0371 0.8059 1.1538
factor(eduCat)2 0.8918 1.1213 0.7904 1.0061
factor(eduCat)3 0.6667 1.4999 0.5740 0.7744
Q: Interpretation of the hazard ratio estimate for RxAD?