Part VIII: Survival response data

BIO 233, Spring 2015

Leukemia relapse

• Randomized study of treatment for leukemia

⋆ 21 patients in each of two treatment arms

(A) control

(B) 6-mercaptopurine (6-MP)

• Interest is in the time to relapse from treatment initiation

• Observed times to relapse among the patients in the control arm:

1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,

11, 11, 12, 12, 15, 17, 22, 23

⋆ shown graphically on the next slide

• The outcome is continuous and we could choose to summarize these observed times with the mean

⋆ sample mean = 8.7 weeks

[Figure: observed times to relapse (weeks, 0–35) by treatment arm, Control and 6-MP; censored times in the 6-MP arm are marked with brackets]

• Crucially, to be able to calculate the mean we had to observe all of the relapse times

• In the 6-MP arm, the actual relapse time is only available for 10 of the 21 patients

⋆ for the other 11, all we know is that they had not relapsed by a certain time

⋆ we don’t know what happened afterwards

• In a sense, the value of time to relapse is ‘incomplete’ for these 11 individuals

⋆ the sample mean of 17.1 weeks will be biased

• However, we do have partial information on these individuals

⋆ patient #4 in the 6-MP arm had not relapsed by week 6

⋆ patient #21 in the 6-MP arm had not relapsed by week 35

• Such observations are censored

⋆ we don’t observe the actual event time, T

⋆ instead, we follow individuals up to some point in time but not thereafter

• Intuitively, the actual event times are ‘missing’ or ‘incomplete’

• If we observed T for all individuals in our sample, analyses would be straightforward

⋆ calculate the mean time to relapse in each of the treatment arms

⋆ linear regression with T as the outcome and some set of covariates, X

Q: Can we estimate a mean or fit a linear regression model with data that are censored?

Q: Do individuals who are censored provide information about T?

⋆ if so, how?

Censoring

• The censoring observed in the leukemia example is specifically referred to as right censoring

• As with any type of missingness, we have to consider the mechanisms at play that lead to the missingness

⋆ which observations are incomplete/censored and why?

⋆ potential selection bias?

• Examples of censoring mechanisms include:

⋆ end of study

⋆ loss to follow-up

⋆ withdrawal from the study

⋆ death

• Our focus will be on right censoring but there are other types of censoring

⋆ other types of ‘missingness’ or ‘incompleteness’

• Left truncation

⋆ the first opportunity to observe the event comes some time after the origin

⋆ likely an issue if age is taken as the time scale

⋆ problematic because the event could have occurred prior to our starting observation or follow-up

• Interval censoring

⋆ we only know that the event occurred during some particular interval

⋆ an issue if individuals are followed up at intervals

∗ e.g., every 3 months

⋆ problematic because we don’t know the exact value of T

• Individuals who are censored do indeed provide some (partial) information

⋆ i.e. we know that they had not experienced the event by a certain time

• As we’ll see, to make use of that information we have to make a crucial assumption:

⋆ the censoring mechanism is independent of the survival time

• Intuitively, the assumption states that

⋆ the survival experience of those who were censored is the same as that of those who were not censored

⋆ censoring carries no prognostic information

Notation

• Let Ti denote the ‘time to event’ for the ith individual

• Let Ci denote the corresponding censoring time

• In survival data, we observe either the event time or the censoring time, whichever occurs first

⋆ i.e. we observe Yi = min(Ti, Ci)

• We distinguish which occurs with the notation

δi = 1 if Ti ≤ Ci; 0 if Ti > Ci

⋆ δi is often referred to as the status indicator

• So the observed data are: {(Yi, δi, Xi); i = 1, . . . , n}

Survival distributions

• ‘Time to relapse’ is an example of a survival response

• More generally, we consider the time to some event

⋆ from a well-defined starting point, referred to as the ‘origin’

⋆ measure ‘survival’ until the event occurs

• To characterize the distribution of T , we often consider four functions

(1) probability density function, f(t)

(2) survival function, S(t)

(3) hazard function, λ(t)

(4) cumulative hazard function, Λ(t)

⋆ each fully characterizes the underlying distribution

⋆ four functions are interrelated


1. Probability density function:

f(t) = lim_{∆→0} (1/∆) P(t ≤ T < t + ∆)

     = λ(t) S(t) = λ(t) exp{−∫_0^t λ(s) ds}

2. Survival function:

S(t) = P(T > t) = 1 − P(T ≤ t)

     = 1 − F(t) = 1 − ∫_0^t f(s) ds

     = exp{−Λ(t)}

⋆ F(t) is the cumulative distribution function (CDF)

3. Hazard function:

λ(t) = lim_{∆→0} (1/∆) P(t ≤ T < t + ∆ | T ≥ t)

     = f(t)/S(t) = −(∂/∂t) log S(t)

⋆ interpreted as the instantaneous event rate at time t, given that the event has not occurred prior to time t

4. Cumulative hazard function:

Λ(t) = ∫_0^t λ(s) ds = −log S(t)
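As a quick numerical sanity check of these interrelations, here is a sketch using a hypothetical Exp(µ = 2) distribution evaluated at t = 1.5:

mu <- 2; t0 <- 1.5
f   <- dexp(t0, rate = 1/mu)       # density f(t)
S   <- 1 - pexp(t0, rate = 1/mu)   # survival function S(t)
lam <- 1/mu                        # hazard, constant for the exponential
Lam <- t0/mu                       # cumulative hazard
all.equal(f, lam * S)              # checks f(t) = lambda(t) S(t)
all.equal(Lam, -log(S))            # checks Lambda(t) = -log S(t)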

Distributions for T

• The random variable T is continuous and non-negative

• There are many distributions for continuous random variables that have support on R+ = (0, ∞), including

⋆ Exponential

⋆ Weibull

⋆ Gamma

⋆ Log-normal

⋆ Log-logistic

⋆ Gompertz

⋆ Generalized gamma

• Collectively referred to as survival distributions


Exponential distribution

• The exponential distribution is one of the simplest survival distributions

⋆ also important historically in that early attempts at modeling survival data were based on it

• For T ∼ Exp(µ):

f(t) = (1/µ) exp{−t/µ}

S(t) = exp{−t/µ}

λ(t) = 1/µ

Λ(t) = t/µ

⋆ E[T] = µ and V[T] = µ^2

⋆ the hazard is constant as a function of time
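The pdf, survival, and cumulative hazard curves shown on the following slides can be reproduced with a short sketch along these lines (the plotting choices here are illustrative, not the originals’):

tGrid <- seq(0.01, 5, length.out = 200)
mus   <- c(0.5, 1.0, 2.0)
matplot(tGrid, sapply(mus, function(mu) dexp(tGrid, rate = 1/mu)),
        type = "l", lty = 1, xlab = "Time", ylab = "pdf, f(t)")
matplot(tGrid, sapply(mus, function(mu) 1 - pexp(tGrid, rate = 1/mu)),
        type = "l", lty = 1, xlab = "Time", ylab = "Survival, S(t)")
matplot(tGrid, sapply(mus, function(mu) tGrid/mu),
        type = "l", lty = 1, xlab = "Time", ylab = "Cumulative hazard")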

• As µ increases, the density assigns mass further away from T = 0

⋆ events occur later in time

[Figure: exponential pdf f(t) over 0–5 for µ = 0.5, 1.0, 2.0]

• As µ increases, survival experience is ‘better’

⋆ events occur later in the time scale

[Figure: exponential survival function S(t) over 0–5 for µ = 0.5, 1.0, 2.0]

• As µ increases, the cumulative hazard increases at a slower rate

⋆ the total number of events accrues more slowly

[Figure: exponential cumulative hazard Λ(t) over 0–5 for µ = 0.5, 1.0, 2.0]

Weibull distribution

• The exponential distribution is limited in that it is completely characterized by a single parameter, µ

• A more flexible distribution is the Weibull distribution

• For T ∼ Weibull(µ, σ):

f(t) = (σ/µ) (t/µ)^(σ−1) exp{−(t/µ)^σ}

S(t) = exp{−(t/µ)^σ}

λ(t) = (σ/µ) (t/µ)^(σ−1)

Λ(t) = (t/µ)^σ

• Notes:

⋆ µ is referred to as the ‘scale’ parameter

⋆ σ is referred to as the ‘shape’ parameter

⋆ Exp(µ) is a special case with σ = 1

∗ additional parameter provides increased flexibility

⋆ the mean and variance are given by

E[T] = µ Γ(1 + 1/σ)

V[T] = µ^2 {Γ(1 + 2/σ) − Γ(1 + 1/σ)^2}

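Note that R’s pweibull()/dweibull() use exactly this parameterization, with shape = σ and scale = µ; a quick check with hypothetical values µ = 2, σ = 1.5:

mu <- 2; sigma <- 1.5; t0 <- 1.2
all.equal(1 - pweibull(t0, shape = sigma, scale = mu), exp(-(t0/mu)^sigma))
# mean via E[T] = mu * Gamma(1 + 1/sigma), checked against integrate(S)
all.equal(mu * gamma(1 + 1/sigma),
          integrate(function(u) 1 - pweibull(u, sigma, mu), 0, Inf)$value,
          tolerance = 1e-4)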

• For µ = 1, as σ increases, the shape changes so that less (relative) mass is given to low event times and high event times

⋆ events occur ‘in the middle’

[Figure: Weibull pdf f(t) over 0–5 for σ = 0.5, 1.0, 2.0]

• For µ = 1, as σ increases, survival is (relatively) better early on but the descent is much more dramatic

⋆ events occur ‘in the middle’

[Figure: Weibull survival function S(t) over 0–5 for σ = 0.5, 1.0, 2.0]

• For µ = 1,

⋆ if σ < 1, the hazard is initially higher but then decreases with time

⋆ if σ > 1, the hazard is initially lower but then increases with time (for σ = 1 it is constant)

[Figure: Weibull hazard function λ(t) over 0–5 for σ = 0.5, 1.0, 2.0]

• The impact of σ on λ(t) can also be seen in the cumulative hazard function

[Figure: Weibull cumulative hazard Λ(t) over 0–5 for σ = 0.5, 1.0, 2.0]

Parametric survival analysis

• Suppose we observed an i.i.d. sample of size n from an Exp(µ) distribution

• In the absence of censoring and (for the moment) ignoring covariates, the observed data would be: {ti; i = 1, . . . , n}

• The likelihood is the usual product of n independent contributions

L(µ) = ∏_{i=1}^n f(ti; µ) = ∏_{i=1}^n (1/µ) exp{−ti/µ}

with the log-likelihood given by

ℓ(µ) = ∑_{i=1}^n {−log(µ) − ti/µ}

• Differentiating with respect to µ, setting equal to zero, and solving, the MLE is

µ̂_MLE = (1/n) ∑_{i=1}^n ti

⋆ i.e. the sample mean

Q: What if we don’t observe T for each individual?

⋆ i.e. some individuals are (right-)censored

• The observed data are

{(yi, δi); i = 1, . . . , n}

where Yi = min(Ti, Ci)

Q: What is the appropriate likelihood for these data?

⋆ how does each individual contribute ‘information’?

• Assuming independence between individuals, the likelihood will still be the product of n contributions

• If Yi is the event time, Ti, then the contribution to the likelihood is the usual contribution

⋆ i.e. the density function f(yi)

• If Yi is the censoring time, Ci, then we know that their actual event time must be after Ci

⋆ assuming independent censoring (i.e. Ci ⊥⊥ Ti), the contribution to the likelihood is the fact that they survived to Yi

⋆ i.e. the survival function S(yi)

• We can succinctly write the contributions as

Li(µ) = fi(µ) if δi = 1; Si(µ) if δi = 0

• We can therefore write the likelihood as:

L(µ) = ∏_{i=1}^n f(yi; µ)^δi S(yi; µ)^(1−δi)

• Recall that

λ(yi; µ) = f(yi; µ) / S(yi; µ)

so that the likelihood can be written as

L(µ) = ∏_{i=1}^n λ(yi; µ)^δi S(yi; µ)

     = ∏_{i=1}^n (1/µ)^δi exp{−yi/µ}

• The log-likelihood is therefore

ℓ(µ) = ∑_{i=1}^n {−δi log(µ) − yi/µ}

and differentiating with respect to µ, we get

(∂/∂µ) ℓ(µ) = ∑_{i=1}^n {−δi/µ + yi/µ^2}

• Setting this expression to zero and solving, the MLE is given by

µ̂_MLE = (∑_{i=1}^n yi) / (∑_{i=1}^n δi) = Yn / Dn

⋆ equal to the sample mean if all the δi’s = 1

• Differentiating again with respect to µ,

(∂^2/∂µ^2) ℓ(µ) = ∑_{i=1}^n {δi/µ^2 − 2yi/µ^3}

so that

V̂[µ̂_MLE] = {−(∂^2/∂µ^2) ℓ(µ) |_{µ = µ̂_MLE}}^(−1) = µ̂_MLE^2 / Dn

⋆ use this to construct a 95% confidence interval

• Use the delta method to derive the variance for any other function of µ

⋆ e.g. the survival function: S(t) = exp{−t/µ}

• For the leukemia data, we get:

               Control arm    6-MP arm
               (n = 21)       (n = 21)
  µ̂ (naïve)    8.7            17.1
  Yn           182            359
  Dn           21             10
  µ̂_MLE        8.7            35.9
  95% CI       (5.0, 12.4)    (13.6, 58.2)
  T̂_med        6.0            24.9

• Notice how biased the naïve estimate is in the 6-MP group

• T̂_med is the median survival

⋆ the time by which 50% of individuals experience the event

⋆ for the exponential distribution, T_med = log(2) µ
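The 6-MP column can be reproduced by hand from the observed data (listed later in these notes); a sketch:

# 6-MP arm: observed times, with status 1 = relapse, 0 = censored
y <- c(6,6,6,6,7,9,10,10,11,13,16,17,19,20,22,23,25,32,32,34,35)
d <- c(1,1,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,0,0,0,0)
muHat <- sum(y)/sum(d)           # 359/10 = 35.9
se    <- muHat/sqrt(sum(d))      # from V[muHat] = muHat^2 / Dn
muHat + c(-1.96, 1.96)*se        # 95% CI: (13.6, 58.2)
log(2)*muHat                     # median survival: 24.9 weeks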

• Plugging µ̂_MLE into S(t) = exp{−t/µ} and plotting against time we get

[Figure: estimated exponential survival curves for time to relapse (weeks, 0–35); 6-MP, µ̂1 = 35.90; Control, µ̂0 = 8.67]

• Conclude that 6-MP is beneficial in that it

⋆ prolongs expected time to relapse by an estimated 27.2 weeks

⋆ extends median survival by 18.9 weeks

Exponential regression: AFT models

• To formally evaluate differences in survival between the two treatment arms we could embed the comparison within a regression framework

• One modeling challenge is that T > 0

⋆ building a regression model directly for E[T] may result in negative fitted values

• If T ∼ Exp(µ), then

ε = log(T) − log(µ) ∼ extreme value distribution

⋆ a unimodal, negatively skewed distribution with

f(ε) = exp{ε − exp{ε}}

S(ε) = exp{−exp{ε}}
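A small simulation check of this claim (the seed and value of µ below are hypothetical choices): if T ∼ Exp(µ), the histogram of log(T) − log(µ) should match the stated density.

set.seed(233)                      # hypothetical seed
mu  <- 3
eps <- log(rexp(1e5, rate = 1/mu)) - log(mu)
hist(eps, breaks = 60, freq = FALSE, main = "", xlab = "epsilon")
curve(exp(x - exp(x)), add = TRUE, lwd = 2)   # extreme value density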

• We can use this to write the following:

log(Ti) = log(E[Ti]) + εi

• The accelerated failure time (AFT) model structures the mean via

log(E[Ti]) = β0 + Xi^T β

⋆ E[Ti] = µi, the parameter that indexes the exponential distribution

⋆ X is an n × p design matrix with columns [X1, . . . , Xp]

⋆ β is a p-vector of regression coefficients

• We often succinctly write the AFT model as

log(Ti) = β0 + Xi^T β + εi

⋆ εi ∼ extreme value

• Going back to the leukemia treatment study, let Xi be a binary indicator of treatment

⋆ 0/1 = control/6-MP

• Consider the AFT model:

log(µi) = β0 + β1 Xi

• We can equivalently write

µi = µ0 exp{β1 Xi}

⋆ µ0 = exp{β0} is the mean survival time among individuals with Xi = 0

⋆ θ = exp{β1} is the relative change in the mean survival time comparing individuals with Xi = 1 vs. individuals with Xi = 0:

θ = µ1 / µ0

• We can also interpret θ in the broader context of the survival distribution

• Recall, for the exponential distribution:

S(t) = exp{−t/µ}

⋆ as µ increases, survival experience is better

• If θ = exp{β1} > 1 =⇒ µ1 > µ0

⋆ 6-MP extends survival

⋆ ‘deceleration’ in the event rate

• If θ = exp{β1} < 1 =⇒ µ1 < µ0

⋆ 6-MP contracts survival

⋆ ‘acceleration’ in the event rate

• Hence the name ‘accelerated failure time’ model

• For the actual leukemia data, θ̂ = 4.14 = 35.9/8.7

[Figure: estimated survival curves for time to relapse (weeks, 0–35); 6-MP, µ̂1 = 35.90; Control, µ̂0 = 8.67]

⋆ treatment with 6-MP ‘extends’ survival

Exponential regression: models for the hazard

• The AFT model parameterizes a model for log(µ)

⋆ µ indexes the exponential distribution

⋆ µ is the mean of the exponential distribution

• Recall, for the exponential distribution

λ(t) = 1/µ = λ

⋆ the hazard is constant as a function of time

• We can use this to write down a model for the hazard:

λi = 1/µi = 1/(µ0 exp{β1 Xi}) = λ0 exp{α1 Xi}

• More generally, we can directly write a model for the hazard function

λi(t) = λ0 exp{Xi^T α}

• Given an underlying exponential distribution for T, this model provides an alternative, equivalent parameterization

⋆ in terms of the hazard, rather than the mean

• Parameters have different values and interpretations

⋆ λ0 is the baseline hazard

∗ i.e. when X = 0

⋆ exp{α1} is a hazard ratio

exp{α1} = λ1 / λ0

• Note that the hazard ratio is independent of time

⋆ this is our first encounter with a ‘proportional hazards’ model

• While different, the two sets of parameters are related

⋆ λ0 = 1/µ0

⋆ exp{α1} = exp{−β1}

• For the leukemia data, exp{α̂1} = 0.24

⋆ the hazard among individuals treated with 6-MP is estimated to be approximately one quarter that of individuals in the control arm

Estimation/inference

• We can perform likelihood-based estimation/inference for {β0, β} via a likelihood based on the extreme value distribution:

L(β0, β) = ∏_{i=1}^n f(εi; β0, β)^δi S(εi; β0, β)^(1−δi)

where f(·) and S(·) were given before and

εi = log(yi) − log(µi) = log(yi) − [β0 + xi^T β]

⋆ observed data are: {(yi, δi, xi); i = 1, . . . , n}

⋆ maximize to get MLEs and base inference on the inverse of the information matrix

• We could also parameterize the model in terms of {α0, α} from the equivalent PH model

• In R, we can fit AFT models using the survreg() function in the survival package:

>

> library(survival)

>

> load("Leukemia.dat")

> fitEx <- survreg(Surv(rTime, status) ~ Rx, dist="exponential", data=leuk)

> summary(fitEx)

...

(Intercept) 2.16 0.218 9.9 4.33e-23

Rx 1.42 0.384 3.7 2.16e-04

Scale fixed at 1

Exponential distribution

Loglik(model)= -112.2 Loglik(intercept only)= -119.6

Chisq= 14.97 on 1 degrees of freedom, p= 0.00011

>

> exp(coef(fitEx))

(Intercept) Rx

8.666667 4.142308


• See that survreg() parameterizes the AFT in terms of the log-mean, rather than the hazard

⋆ can recover estimates of the baseline hazard and hazard ratio

>

> exp(-coef(fitEx))

(Intercept) Rx

0.1153846 0.2414113


Weibull regression: AFT models

• The AFT model based on Ti ∼ Exp(µi) is

log(Ti) = β0 + Xi^T β + εi

⋆ εi ∼ extreme value

⋆ µi = exp{β0 + Xi^T β}

• The (underlying) assumption that the event times are distributed according to an exponential distribution may be restrictive

⋆ single-parameter family indexed by µ

⋆ constant hazard rate, as a function of time

• One option for increasing the flexibility in the specification of the AFT is to scale the ‘error’ terms:

log(Ti) = β0 + Xi^T β + σεi

• If we retain the assumption that εi ∼ extreme value, then since

S(ε) = exp{−exp{ε}}

we get

Si(t) = exp{−exp{(log(t) − [β0 + Xi^T β]) / σ}}

      = exp{−(t/µi)^(1/σ)}

where, again,

µi = exp{β0 + Xi^T β}

• This is the survival function for a Weibull distribution

⋆ in particular, Ti ∼ Weibull(µi, σ)

• The increase in flexibility induced by scaling the error terms is that we make the underlying distribution of T more flexible

⋆ recall, the exponential distribution is a special case of the Weibull

Q: Does this change influence the interpretation of β0 or β?

• For Ti ∼ Weibull(µi, σ),

E[Ti] = µi Γ(1 + 1/σ)

• The ‘baseline’ mean time to event (i.e. when X = 0) is

E[Ti | Xi = 0] = exp{β0} Γ(1 + 1/σ)

⋆ as such, neither β0 nor exp{β0} has a particularly intuitive interpretation

⋆ in practice, it may be better to calculate and report the above mean

• Consider two values for the covariate vector X:

x = (x1, . . . , xj , . . . , xp)

x′ = (x1, . . . , xj + 1, . . . , xp)

• Based on the Weibull AFT, we see that

E[Ti | Xi = x′] / E[Ti | Xi = x] = µ(x′) Γ(1 + 1/σ) / [µ(x) Γ(1 + 1/σ)]

    = exp{β0 + x′^T β} / exp{β0 + x^T β} = exp{βj}

⋆ exp{βj} is the relative change in the mean time to event associated with a unit change in Xj , holding everything else constant

⋆ precisely the same interpretation as with the exponential AFT

Weibull regression: models for the hazard

• As with the exponential AFT, we can write down the induced model in terms of the hazard function

• For Ti ∼ Weibull(µi, σ),

λ(t) = (σ/µi) (t/µi)^(σ−1) = σ µi^(−σ) t^(σ−1)

• Setting µi = exp{β0 + Xi^T β}, the expression for the hazard becomes

λ(t) = σ exp{β0 + Xi^T β}^(−σ) t^(σ−1)

     = λ0(t) exp{Xi^T α}

where

⋆ λ0(t) = σ exp{−β0 σ} t^(σ−1)

⋆ α = −βσ

• As with the exponential distribution, we could have written the model directly for the hazard function

⋆ i.e. write a model for λ(t), independently of the specification of the AFT

• For the Weibull hazard model:

⋆ the baseline hazard function is no longer constant as a function of time

∗ in contrast to the exponential hazard model

⋆ covariate effects are independent of time

∗ same as the exponential hazard model

∗ another example of a ‘proportional hazards’ model

Estimation/inference

• The Weibull AFT model can be succinctly written as

log(Ti) = β0 + Xi^T β + σεi

⋆ εi ∼ extreme value

• The unknown parameters are: {β0, β, σ}

• Since the εi terms still have an extreme value distribution, we can base estimation/inference on the same likelihood we used for the exponential AFT:

L(β0, β, σ) = ∏_{i=1}^n f(εi; β0, β, σ)^δi S(εi; β0, β, σ)^(1−δi)

where εi = (log(yi) − [β0 + xi^T β]) / σ

• We can fit the Weibull AFT in R using the survreg() function

>

> fitWB <- survreg(Surv(rTime, status) ~ Rx, dist="weibull", data=leuk)

>

> summary(fitWB)

Call:

survreg(formula = Surv(rTime, status) ~ Rx, data = leuk, dist = "weibull")

Value Std. Error z p

(Intercept) 2.25 0.166 13.54 9.49e-42

Rx 1.19 0.297 4.01 6.08e-05

Log(scale) -0.31 0.145 -2.14 3.27e-02

Scale= 0.733

Weibull distribution

Loglik(model)= -110.2 Loglik(intercept only)= -119.2

Chisq= 18.1 on 1 degrees of freedom, p= 2.1e-05

Number of Newton-Raphson Iterations: 5

n= 42


• There are many ways to parameterize the Weibull distribution

⋆ take care which parameterization you are working with

• survreg() parameterizes the Weibull AFT

log(Ti) = β0 + Xi^T β + σεi

by setting

⋆ Intercept = β0

⋆ Scale = 1/σ

• So an estimate of the baseline mean,

E[Ti | Xi = 0] = exp{β0} Γ(1 + 1/σ),

can be obtained as follows:

> coef(fitWB)

(Intercept) Rx

2.247819 1.191325


> beta0Hat <- coef(fitWB)[1]

> betaXHat <- coef(fitWB)[2]

> sigmaHat <- 1/fitWB$scale

>

> ##

> exp(beta0Hat) * gamma(1 + 1/sigmaHat)

(Intercept)

8.666226

• As we noted, exponentiating the slope parameter yields a contrast that can be interpreted as a ratio of expected survival times

>

> exp(betaXHat)

Rx

3.29144

⋆ the mean time to relapse among individuals who were treated with 6-MP is estimated to be 3.29 times longer than the mean time to relapse among the controls

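Relatedly, using α = −βσ from the hazard parameterization (with σ̂ = 1/Scale, as computed above), the Weibull hazard ratio for 6-MP can be recovered from the same fit; a sketch:

alphaRx <- -betaXHat * sigmaHat    # alpha = -beta * sigma
exp(alphaRx)                       # estimated hazard ratio, approximately 0.20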

• We can compare the results obtained by taking the survival times to be distributed according to an exponential distribution to those taking the distribution to be a Weibull:

>

> getCI(fitEx)

Estimate lower upper

(Intercept) 8.67 5.65 13.29

Rx 4.14 1.95 8.80

> getCI(fitWB)

Estimate lower upper

(Intercept) 9.47 6.84 13.11

Rx 3.29 1.84 5.89

Scale 2.08 1.57 2.77

• Find that there is a change in the point estimate of the treatment effect


• Since we are performing likelihood-based estimation/inference, we can formally evaluate whether or not the Weibull provides a better fit to the data via a likelihood ratio test of the following hypotheses:

H0 : σ = 1 vs Ha : σ ≠ 1

⋆ implicitly a test of the ‘constant’ baseline hazard assumption

>
> anova(fitEx, fitWB)
  Terms Resid. Df    -2*LL Test Df Deviance   Pr(>Chi)
1    Rx        40 224.3131                NA         NA
2    Rx        39 220.3470   =1  1   3.9661 0.04642519

• Conclude that the Weibull distribution does indeed provide a better fit

Nonparametric and semi-parametric analyses

• So far the analysis of survival data has required the choice of some distribution for T

⋆ e.g., Exp(µ), Weibull(µ, σ)

• Estimation/inference proceeds on the basis of the likelihood

L(θ) = ∏_{i=1}^n λ(yi; θ)^δi S(yi; θ)

⋆ estimation/inference in the frequentist paradigm

∗ maximize L(θ) to get the MLE

∗ inverse of the information matrix

⋆ estimation/inference in the Bayesian paradigm

∗ specify a prior for θ

∗ summarize features of the posterior distribution

• Once we ‘know’ θ we can characterize any feature of the distribution

⋆ mean survival, E[T]

⋆ hazard function, λ(t)

⋆ survival function, S(t)

⋆ median survival, T_med

Q: What if we specify the wrong distribution?

⋆ we have misspecified the statistical model

⋆ the behavior of statistical procedures under misspecification is often uncertain

Q: Do we need to specify a distribution?

Q: Can we address questions regarding survival without specifying a distribution?

• Recall, for linear regression analysis of continuous response data we don’t have to assume any particular distribution for the error terms:

Yi = Xi^T β + εi

⋆ as long as the mean model is correctly specified, the OLS estimator is

∗ unbiased

∗ asymptotically Normally distributed

⋆ if the error terms are Normally distributed, then by adopting the Normal distribution we gain efficiency

∗ the OLS estimator is the MLE

• When we assume a specific distribution for the response and model the parameters that index that distribution, we say that we are performing a parametric analysis

⋆ if we know the values of the parameters that index the distribution, we know everything about the distribution

• Estimation and inference based on OLS with the use of the robust standard error is an example of a semi-parametric procedure

⋆ we focus on, and place structure on, the conditional mean, E[Y | X]

⋆ we don’t (have to) make any further assumptions about Y | X

∗ e.g., its variance

∗ e.g., the family to which its distribution belongs

• In general, a semi-parametric procedure places structure on part of the distribution of the response but does not completely specify it

• A nonparametric procedure places no structure on the distribution of the response variable

Estimation and inference for S(t)

• In the analysis of time-to-event data, the survival function is a natural target of estimation:

S(t) = P(T > t) = 1 − P(T ≤ t)

Q: Can we estimate S(t) without making any assumptions about the distribution of T?

• Consider two cases:

(1) observed times are complete (i.e. no censoring)

(2) observed times are subject to right censoring

Estimation in the absence of censoring

• Consider the observed times to relapse among the patients in the control arm of the leukemia trial:

1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

• Since all of the observed times are actual relapse times, we can directly estimate the survival function as

Ŝ(t) = (# individuals with T > t) / (total sample size)

⋆ Ŝ(5) = 12/21 = 0.571

⋆ Ŝ(10) = 8/21 = 0.381

⋆ Ŝ(15) = 3/21 = 0.143

• Present the estimate graphically:

[Figure: nonparametric estimate of S(t) vs. the exponential fit with µ̂ = 8.67, time to relapse (weeks, 0–35)]

• This estimate of S(t) is nonparametric in the sense that no assumptions about the underlying distribution have been made

⋆ in contrast to the estimate of S(t) based on an underlying Exp(µ) distribution

Estimation in the presence of censoring

• Now consider the observed times to relapse among the patients in the 6-MP arm of the leukemia trial:

6, 6, 6, 6+, 7, 9+, 10, 10, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

⋆ times marked with a + are (right) censored

• Naïvely applying the previous approach we’d estimate

Ŝ(15) = 11/21 = 0.524

• This estimate implicitly assumes that the individuals who were censored prior to t = 15 did not survive to t = 15

• All three could have survived to t = 15, which would give

Ŝ(15) = 14/21 = 0.667

• While these two estimates provide bounds for S(t), this is arguably not very satisfactory

• One solution is to only consider those folks with complete data:

Ŝ(15) = (# individuals with T > 15) / (total with complete data at t = 15) = 11/18 = 0.611

⋆ under independent censoring this would be a reasonable strategy

• The problem is that you throw away some of the partial information that the censored individuals provide

⋆ i.e. only use a portion of the sample to estimate S(t)

⋆ this estimator is therefore inefficient

• Another strategy is to make use of all the information in the sample by exploiting a decomposition of the survival function in terms of conditional probabilities

• Let {(yi, δi); i = 1, . . . , n} be the observed data

⋆ Yi = min(Ti, Ci)

⋆ δi is the status indicator

• Let t(1) < t(2) < . . . < t(K) denote the ordered observed event times

⋆ K unique event times

⋆ t(K) is the final observed event time

• For the patients in the 6-MP arm, the ordered event times are:

6, 7, 10, 13, 16, 22, 23

• Partition the time scale based on the ordered event times:

[0, t(1)) ∪ [t(1), t(2)) ∪ . . . ∪ [t(K−1), t(K)) ∪ [t(K), ∞)

⋆ K + 1 mutually exclusive intervals

• Define the risk set Rk to be the set of individuals who were at risk to be observed to experience the event at time t(k)

⋆ all individuals who had not experienced the event or been censored as of the start of the kth interval

• For risk set Rk, let

⋆ nk denote the number of individuals at risk

⋆ dk denote the number of failures at time t(k)

⋆ sk denote the number of non-failures at time t(k)

• For example, for the 5th ordered failure time:

[Figure: follow-up times for the 21 patients in the 6-MP arm (weeks, 0–35), with censored times marked by brackets]

⋆ t(5) = 16

⋆ [t(5), t(6)) = [16, 22)

⋆ R5 = {11, 12, 13, 14, . . . , 21}

⋆ n5 = 11

⋆ d5 = 1

⋆ s5 = 10

• Across all of the risk sets for the 6-MP arm, we get:

---------------------------------
  k   t_(k)   n_k   d_k   s_k
---------------------------------
  0     0      21    0     21
  1     6      21    3     18
  2     7      17    1     16
  3    10      15    2     13
  4    13      12    1     11
  5    16      11    1     10
  6    22       7    1      6
  7    23       6    1      5
---------------------------------

• For each risk set, we estimate the conditional probability of survival:

P̂(T > t(k) | T ≥ t(k)) = sk / nk

⋆ note this is 1 minus the hazard at time t(k)

• Calculating this for each of the event times, we get:

-------------------------------------------
  k   t_(k)   n_k   d_k   s_k   cpS_k
-------------------------------------------
  0     0      21    0     21   1.000
  1     6      21    3     18   0.857
  2     7      17    1     16   0.941
  3    10      15    2     13   0.867
  4    13      12    1     11   0.917
  5    16      11    1     10   0.909
  6    22       7    1      6   0.857
  7    23       6    1      5   0.833
-------------------------------------------

Q: What is the marginal probability of survival to time t = 10?

• We can calculate the marginal survival for each risk set as:

P̂(T > t(k)) = (sk/nk) × (s(k−1)/n(k−1)) × . . . × (s1/n1)

• Intuitively, to make it to any given risk set and survive, you had to have made it through all of the prior risk sets

• Applying this to the 6-MP arm of the leukemia data, we get:

----------------------------------------------------
  k   t_(k)   n_k   d_k   s_k   cpS_k     S_k
----------------------------------------------------
  0     0      21    0     21   1.000    1.000
  1     6      21    3     18   0.857    0.857
  2     7      17    1     16   0.941    0.807
  3    10      15    2     13   0.867    0.699
  4    13      12    1     11   0.917    0.641
  5    16      11    1     10   0.909    0.583
  6    22       7    1      6   0.857    0.499
  7    23       6    1      5   0.833    0.416
----------------------------------------------------
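The S_k column is simply a cumulative product of the cpS_k column; a minimal sketch reproducing it from the risk-set counts above:

n <- c(21, 17, 15, 12, 11, 7, 6)   # n_k at the event times 6, 7, 10, 13, 16, 22, 23
d <- c( 3,  1,  2,  1,  1, 1, 1)   # d_k
round(cumprod((n - d)/n), 3)       # 0.857 0.807 0.699 0.641 0.583 0.499 0.416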

• This estimate of S(t) is also nonparametric in the sense that no assumptions about the underlying distribution have been made

⋆ solely used the assumption of independent censoring

• Again present the results graphically:

[Figure: nonparametric estimate of S(t) for the 6-MP arm vs. the exponential fit with µ̂ = 35.9, time to relapse (weeks, 0–35)]

Kaplan-Meier estimator

• The estimate of S(t) just derived is the Kaplan-Meier estimator:

Ŝ_KM(t) = ∏_{k: t(k) ≤ t} sk/nk

⋆ the product of conditional survival probabilities corresponding to all risk sets prior to time t

• Right-continuous step function

⋆ height only changes at the observed event times, t(k)

⋆ equal to 1.0 up to (but not including) the first event time

⋆ only equals 0.0 if the last observed time is an event time

• As n −→ ∞, the number of events and the number of intervals increase

⋆ the estimate becomes ‘smoother’ and Ŝ_KM(t) −→ S(t)

⋆ hence the name ‘product limit estimator’

• Ŝ_KM(t) can also be derived as the nonparametric maximum likelihood estimate

⋆ the NPMLE

• Recall the ordered event times: t(1) < t(2) < . . . < t(K)

• For notational convenience, let

⋆ t(0) = 0

⋆ t(K+1) = ∞

• Let ck denote the number of individuals who are censored in the kth interval, [t(k), t(k+1))

⋆ for these individuals, we observe the following censoring times: tk1, tk2, . . . , tkck

• For the 5th ordered failure time:

[Figure: follow-up times for the 21 patients in the 6-MP arm (weeks, 0–35), with censored times marked by brackets]

⋆ [t(5), t(6)) = [16, 22)

⋆ n5 = 11

⋆ d5 = 1

⋆ c5 = 3

⋆ observed censoring times: 17, 19, 20

• Under the independent censoring assumption, the likelihood is given by

L = ∏_{k=0}^K [S(t(k)−) − S(t(k))]^dk ∏_{j=1}^{ck} S(tkj)

⋆ the dk ‘failures’ each contribute:

P(T = t(k)) = S(t(k)−) − S(t(k))

⋆ the ck censored individuals each contribute their own:

P(T > tkj) = S(tkj)

• Notes:

⋆ this represents a likelihood on the space of all survivor functions, S(t)

⋆ the NPMLE is the survival function Ŝ(t) that maximizes L

• Note, since tkj > t(k), S(tkj) is maximized by setting

S(tkj) = S(t(k)), j = 1, . . . , ck

⋆ maximize the contribution to the likelihood by maximizing S(·)

• So we can write

L = ∏_{k=0}^K [S(t(k)−) − S(t(k))]^dk ∏_{j=1}^{ck} S(t(k))

• From this, the MLE of S(t) will be a discrete survival function:

⋆ discontinuous at the ordered event times

∗ i.e. piecewise constant with jumps at the t(k)

⋆ hazard components λk at each of the t(k)

∗ in between events, the hazard is zero

• Therefore,

S(t(k)−) = ∏_{l=0}^{k−1} (1 − λl)  and  S(t(k)) = ∏_{l=0}^{k} (1 − λl)

where the {λ1, . . . , λK} maximize

L(λ) = ∏_{k=1}^K {λk^dk ∏_{l=1}^{k−1} (1 − λl)^dk ∏_{l=1}^{k} (1 − λl)^ck}

     = ∏_{k=1}^K λk^dk (1 − λk)^(nk − dk)

• This is the same form as the likelihood based on the Bernoulli distribution

⋆ maximizing yields:

λ̂k = dk/nk

• Plugging this back into the expression for the survival function, we get:

Ŝ(t) = ∏_{k: t(k) ≤ t} (1 − λ̂k) = ∏_{k: t(k) ≤ t} (1 − dk/nk) = ∏_{k: t(k) ≤ t} sk/nk

which is the Kaplan-Meier estimator.

Inference for the KM estimator

• For fixed t0, the asymptotic variance of Ŝ_KM(t0) is

V̂[Ŝ_KM(t0)] = Ŝ_KM(t0)^2 ∑_{k: t(k) ≤ t0} dk/(nk sk)

⋆ known as Greenwood’s formula

⋆ denote the square root of this by ŜE_GW(t0)

• Use this to construct an approximate 95% confidence interval:

(Ŝ_KM(t0) − 1.96 ŜE_GW(t0), Ŝ_KM(t0) + 1.96 ŜE_GW(t0))

• A problem with this confidence interval is that it may not respect the fact that S(t) ∈ (0, 1)

• An alternative approach is to construct a 95% confidence interval for

log Λ(t0) = log[−log S(t0)]

and transform back to the S(t) scale:

⋆ Λ(t) ∈ (0, ∞) ⇒ log Λ(t) ∈ (−∞, ∞)

• The variance for log[−log Ŝ_KM(t0)] is

V̂[log[−log Ŝ_KM(t0)]] = {1/log Ŝ_KM(t0)}^2 ∑_{k: t(k) ≤ t0} dk/(nk sk)

⋆ denote the square root of this by ŜE_ll(t0)

• A 95% confidence interval for S(t0) is then

(Ŝ_KM(t0)^exp{1.96 ŜE_ll(t0)}, Ŝ_KM(t0)^exp{−1.96 ŜE_ll(t0)})
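Both intervals can be computed by hand from the 6-MP risk sets; a sketch (the Greenwood SEs should match the std.err column of the survfit() output below):

n <- c(21, 17, 15, 12, 11, 7, 6); d <- c(3, 1, 2, 1, 1, 1, 1)
S    <- cumprod((n - d)/n)                     # KM estimate at each event time
seGW <- S * sqrt(cumsum(d/(n*(n - d))))        # Greenwood: 0.0764, 0.0869, ...
seLL <- sqrt(cumsum(d/(n*(n - d))))/(-log(S))  # SE on the log(-log) scale
cbind(lower = S^exp( 1.96*seLL),               # log-log interval, e.g.
      upper = S^exp(-1.96*seLL))               # (0.620, 0.952) at t = 6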

• In R, one can obtain the KM estimator and (pointwise) 95% confidence intervals using the survfit() function in the survival package

⋆ note, the default is to construct a 95% confidence interval for log S(t0) and transform

>

> library(survival)

> ##

> KMv1 <- survfit(Surv(rTime, status) ~ 1, data=leuk, subset=(Rx == 1))

> summary(KMv1)

time n.risk n.event survival std.err lower 95% CI upper 95% CI

6 21 3 0.857 0.0764 0.720 1.000

7 17 1 0.807 0.0869 0.653 0.996

10 15 2 0.699 0.1034 0.523 0.934

13 12 1 0.641 0.1100 0.458 0.897

16 11 1 0.583 0.1144 0.397 0.856

22 7 1 0.499 0.1247 0.306 0.815

23 6 1 0.416 0.1287 0.227 0.763


>

> KMv2 <- survfit(Surv(rTime, status) ~ 1, data=leuk, subset=(Rx == 1),

conf.type="plain")

> summary(KMv2)

time n.risk n.event survival std.err lower 95% CI upper 95% CI

6 21 3 0.857 0.0764 0.707 1.000

7 17 1 0.807 0.0869 0.636 0.977

...

22 7 1 0.499 0.1247 0.255 0.744

23 6 1 0.416 0.1287 0.164 0.668

>

> KMv3 <- survfit(Surv(rTime, status) ~ 1, data=leuk, subset=(Rx == 1),

conf.type="log-log")

> summary(KMv3)

time n.risk n.event survival std.err lower 95% CI upper 95% CI

6 21 3 0.857 0.0764 0.620 0.952

7 17 1 0.807 0.0869 0.563 0.923

...

22 7 1 0.499 0.1247 0.245 0.710

23 6 1 0.416 0.1287 0.174 0.645


• Looking at the estimated survival functions for the two leukemia treatment arms, there seems to be clear evidence of a difference in survival experience:

[Figure: Kaplan-Meier curves for 6-MP (N=21) and Control (N=21), time to relapse (weeks, 0–35)]

• We can formally evaluate whether or not this is the case using a hypothesis test

• One approach would be to compare the survival at a particular time, t0

⋆ i.e. perform a hypothesis test of

H0 : S0(t0) = S1(t0) vs H1 : S0(t0) ≠ S1(t0)

using the test statistic

Z = (Ŝ1(t0) − Ŝ0(t0)) / √(V̂[Ŝ1(t0)] + V̂[Ŝ0(t0)]) ∼_H0 Normal(0, 1)

⋆ Kaplan-Meier estimator to obtain the numerator

⋆ Greenwood’s variance estimator for the denominator
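A sketch of this comparison at, say, t0 = 15 weeks, pulling the pieces from survfit():

fit <- survfit(Surv(rTime, status) ~ Rx, data = leuk)
s15 <- summary(fit, times = 15)
Z   <- diff(s15$surv)/sqrt(sum(s15$std.err^2))   # (S1 - S0) / sqrt(V1 + V0)
2*pnorm(-abs(Z))                                 # two-sided p-value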

• Problems with focusing on a single time point include:

⋆ clinically, there may not be an ‘optimal’ single time

⋆ we throw away a lot of information

Q: Can we compare two survival curves across the entire observed time frame?

• Pooling across both treatment arms, let t1 < t2 < . . . < tK denote the observed event times

⋆ K distinct event times in the entire sample

• At each failure time, consider a 2×2 table of the form:

                  Failure
              No             Yes        Total
  Group 0     n0k − d0k      d0k        n0k
  Group 1     n1k − d1k      d1k        n1k
  Total       nk − dk        dk         nk

⋆ the risk set associated with the kth failure time

• Under the null hypothesis H0 : S0(t) = S1(t), conditional on the risk set at time tk, the distribution of the number of events in Group 1 is

D1k ∼ Hypergeometric(nk, n1k, dk)

⋆ the expected number of events is

E[d1k] = n1k dk / nk

• Let Uk = d1k − E[d1k] be the difference between the observed and expected number of events at the kth failure time

⋆ under H0,

E[Uk] = 0

V[Uk] = n1k n0k (nk − dk) dk / [nk^2 (nk − 1)] = Vk

• The log-rank test compares the survival curves across the observed time frame:

T_LR = [∑_{k=1}^K Uk]^2 / ∑_{k=1}^K Vk ∼_H0 χ2_1

⋆ the test statistic is the usual Cochran-Mantel-Haenszel χ2 statistic applied to the collection of risk sets

>
> survdiff(Surv(rTime, status) ~ Rx, data=leuk)
...
      N Observed Expected (O-E)^2/E (O-E)^2/V
Rx=0 21       21     11.1      8.84      15.2
Rx=1 21       10     19.9      4.93      15.2

Chisq= 15.2 on 1 degrees of freedom, p= 9.81e-05

• The log-rank test assigns equal weight to each risk set

⋆ sensitive to differences in the tails of the survivor function

⋆ where there is the least amount of information

• Most powerful for the alternative:

H1 : S0(t) = S1(t)^φ, φ ≠ 1

• Equivalently, the test is most powerful for the hypotheses:

H0 : λ0(t) = λ1(t) vs H1 : λ0(t) = φ λ1(t), φ ≠ 1

⋆ i.e. proportional hazards

• The Gehan-Breslow test weights the contribution from the kth risk set by the number of subjects at risk, nk:

T_W = [∑_{k=1}^K nk Uk]^2 / ∑_{k=1}^K nk^2 Vk ∼_H0 χ2_1

⋆ relatively larger weight is given to early risk sets

⋆ greater (relative) power under non-proportional hazards for which the differences are large early on
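In R, survdiff()’s rho argument gives the related G^ρ family of weighted tests; ρ = 1 weights each risk set by the KM estimate (the Peto-Peto modification), which, like Gehan-Breslow, up-weights early differences:

survdiff(Surv(rTime, status) ~ Rx, data = leuk, rho = 1)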

Cox regression

• We saw that the AFT model provides a regression framework for characterizing the relationship between the distribution of T and some vector of covariates, X:

log(Ti) = β0 + Xi^T β + εi

• It requires specification of some distribution for ε

⋆ needed to form the likelihood

⋆ corresponds to some specific underlying distribution for T

• As such, the AFT is a parametric model

⋆ it completely specifies the distribution of T

Q: What if we specify the wrong distribution? What if the random component of the model is misspecified?

• Recall, for the special cases where T ∼ Exp(µ) and T ∼ Weibull(µ, σ), the AFT model corresponds to a model for the hazard that exhibits proportionality over time in the covariate effects:

λ(ti) = λ0(ti) exp{Xi^T β}

⋆ note: the β in this model is not the same as the one in the AFT model

Q: Can we estimate the components of this model without making any distributional assumptions?

⋆ baseline hazard function, λ0(·)

⋆ log-hazard ratios, β

• If primary interest lies with estimating β, then the baseline hazard function is a nuisance parameter

⋆ ideally eliminate it somehow

Partial likelihood

• Observed data: {(yi, δi, xi); i = 1, . . . , n}

• Let t(1) < t(2) < . . . < t(K) denote the ordered event times across the n observations

• For the kth event time, identify the corresponding risk set Rk

⋆ all individuals who were at risk to experience the event

⋆ all individuals with yi ≥ t(k)

• Suppose that in each of the K risk sets there was only one event

⋆ no ties in the event times

Q: Who exactly, in the risk set, experienced the event?

Q: Of everyone who could have failed, what was so special about them?

• Let x(k) denote the covariate vector for the individual who experienced the event

• The partial likelihood compares the ‘risk’ of the individual who experienced the event to that of everyone who could have experienced the event, across the K risk sets:

L_P(β) = ∏_{k=1}^K P(indiv. (k) fails at time t(k)) / ∑_{i∈Rk} P(indiv. i fails at time t(k))

       = ∏_{k=1}^K λ0(t(k)) exp{x(k)^T β} / ∑_{i∈Rk} λ0(t(k)) exp{xi^T β}

       = ∏_{k=1}^K exp{x(k)^T β} / ∑_{i∈Rk} exp{xi^T β}

⋆ Cox (1972)

• The log-partial likelihood is

ℓ_P(β) = ∑_{k=1}^K {x(k)^T β − log(∑_{i∈Rk} exp{xi^T β})}

• Differentiating with respect to β we get

(∂/∂β) ℓ_P(β) = ∑_{k=1}^K {x(k) − ∑_{i∈Rk} xi exp{xi^T β} / ∑_{j∈Rk} exp{xj^T β}}

             = ∑_{k=1}^K {x(k) − ∑_{i∈Rk} W(k),i(β) xi}

• Information about β comes from a comparison of the covariates for the person who experienced the event to a weighted average of the covariates of everyone in the risk set

⋆ weights depend, in part, on β
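For untied data, the log-partial likelihood is easy to code directly; a minimal sketch (y, delta, and a covariate matrix x are assumed to be in hand):

logPL <- function(beta, y, delta, x) {
  eta <- drop(x %*% beta)
  sum(sapply(which(delta == 1), function(k) {
    atRisk <- y >= y[k]                  # risk set at the kth event time
    eta[k] - log(sum(exp(eta[atRisk])))
  }))
}
# e.g., maximize with optim(); the result should agree with coxph()
# when there are no tied event times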

• Estimation and inference based on L_P(β) can proceed as usual

⋆ obtain the maximum partial likelihood estimate (MPLE) via Newton-Raphson

⋆ standard errors via the partial likelihood observed information matrix:

I_ββ = −(∂^2/∂β∂β^T) ℓ_P(β)

⋆ hypothesis testing via Wald, score, and likelihood ratio tests

∗ theoretical justification is complicated by the fact that the terms are not independent

• The popularity of the proportional hazards model is driven by

⋆ the ease with which estimation/inference for β proceeds

⋆ the fact that one does not need to specify a model for λ0(·)

• Specifying a model for the hazard and using the partial likelihood for estimation/inference is an example of a semi-parametric statistical analysis

• Note, the partial likelihood has exactly the same form as the conditional likelihood for a matched case-control study

⋆ here the risk set is analogous to the set we created via the matching process

⋆ we used conditioning to eliminate the nuisance parameters introduced by the sampling scheme

⋆ the difference is that the partial likelihood is not a product of independent conditional probabilities

Interpretation

• Consider the following model for the hazard function:

λ(t) = λ0(t) exp{β1 X1,i + . . . + βp Xp,i}

• λ0(t) is the baseline hazard function

⋆ when X1 = . . . = Xp = 0

⋆ interpretation requires care when any of the X are continuous

• exp{βj} is the hazard ratio for a unit change in Xj , holding the other components of the model constant

⋆ some use the terms rate ratio or relative risk

• ‘Proportional hazards’ in the sense that the hazard ratio does not depend on time, in one of two senses:

⋆ the hazard ratio truly does not depend on time

⋆ exp{βj} characterizes an ‘average’ hazard ratio, averaging across time

Bone marrow transplant data

• Dataset ‘marrow’, on the course website, has information on 137 individuals who underwent a bone marrow transplant

⋆ example from Klein and Moeschberger (2003)

• The outcome of interest is disease-free survival

⋆ the origin is the time of transplant

⋆ measure time-to-event, where the ‘event’ is the first of

∗ death

∗ relapse

⋆ if the patient makes it to the end of the study without dying or experiencing a relapse, they are censored

• Observed data:

⋆ Yi = min(Tdeath,i, Trelapse,i, Ci)

⋆ δi = 1 if Yi corresponds to a ‘death’ or ‘relapse’ and 0 otherwise

• At the time of transplant, patients were categorized into one of three disease groups:

⋆ acute lymphoblastic leukemia (ALL)

⋆ acute myeloid leukemia (AML)

∗ low risk category

∗ high risk category

• Load in the data and examine the Kaplan-Meier estimates:

>

> library(survival)

> load("BoneMarrow_data.dat")

>

> ##

> marrow$time <- marrow$time / 365.25

> marrow$plateT <- marrow$plateT / 365.25

> marrow$group <- factor(marrow$group, levels=1:3,

labels=c(" ALL", " AML-lo", " AML-hi"))

>

> fitKM <- survfit(Surv(time, status) ~ group, data=marrow)


>

> plot(fitKM, mark.time=FALSE, xlab="Time to first of death or relapse, years",

ylab="Survival", lwd=3, col=c("red", "blue", "green"), axes=FALSE)

> axis(1, at=seq(from=0, to=7, by=1))

> axis(2, at=seq(from=0, to=1, by=0.2))

> legend(6, 1, c("ALL", "AML low risk", "AML high risk"),

lwd=3, col=c("red", "blue", "green"), bty="n")

[Figure: Kaplan-Meier curves of time to first of death or relapse (years, 0–7) for ALL, AML low risk, and AML high risk]

• Formally evaluate differences with either the log-rank test or within a Cox model:

>

> survdiff(Surv(time, status) ~ group, data=marrow)

..

N Observed Expected (O-E)^2/E (O-E)^2/V

group= ALL 38 24 21.9 0.211 0.289

group= AML-lo 54 25 40.0 5.604 11.012

group= AML-hi 45 34 21.2 7.756 10.529

Chisq= 13.8 on 2 degrees of freedom, p= 0.00101

>

> coxph(Surv(time, status) ~ group, data=marrow)

...

coef exp(coef) se(coef) z p

group AML-lo -0.574 0.563 0.287 -2.00 0.046

group AML-hi 0.383 1.467 0.267 1.43 0.150

Likelihood ratio test=13.4 on 2 df, p=0.0012 n= 137, number of events= 83


• Fit an adjusted Cox model:

>

> marrow$waitCat <- 0

> marrow$waitCat[marrow$waittime > 90] <- 1

> marrow$waitCat[marrow$waittime > 180] <- 2

> marrow$waitCat[marrow$waittime > 365] <- 3

> marrow$waitCat <- factor(marrow$waitCat, levels=0:3,

labels=c(" 0-90", " 91-180", "181-365", " 365+"))

>

> fitPH1 <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat + group,

data=marrow)

> summary(fitPH1)

...

n= 137, number of events= 83

coef exp(coef) se(coef) z Pr(>|z|)

ageP 0.006301 1.006321 0.013296 0.474 0.6356

maleP -0.272790 0.761252 0.243990 -1.118 0.2636

cmvP 0.071114 1.073704 0.238417 0.298 0.7655

waitCat 91-180 -0.199449 0.819182 0.353061 -0.565 0.5721

waitCat181-365 -0.253918 0.775755 0.390037 -0.651 0.5150

waitCat 365+ -0.476419 0.621003 0.444332 -1.072 0.2836


group AML-lo -0.808177 0.445670 0.329754 -2.451 0.0143 *

group AML-hi 0.251429 1.285861 0.292381 0.860 0.3898

---

exp(coef) exp(-coef) lower .95 upper .95

ageP 1.0063 0.9937 0.9804 1.0329

maleP 0.7613 1.3136 0.4719 1.2280

cmvP 1.0737 0.9314 0.6729 1.7133

waitCat 91-180 0.8192 1.2207 0.4101 1.6365

waitCat181-365 0.7758 1.2891 0.3612 1.6662

waitCat 365+ 0.6210 1.6103 0.2599 1.4836

group AML-lo 0.4457 2.2438 0.2335 0.8505

group AML-hi 1.2859 0.7777 0.7250 2.2807

...

Q: Interpretation of exp{-0.808} = 0.45?

Q: Interpretation of exp{0.251} = 1.29?


• Perform a likelihood ratio test for the overall effect of ‘group’

H0 :

vs

H1 :

>

> ##

> fitPH0 <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat,

data=marrow)

> anova(fitPH0, fitPH1)

Analysis of Deviance Table

Cox model: response is Surv(time, status)

Model 1: ~ ageP + maleP + cmvP + waitCat

Model 2: ~ ageP + maleP + cmvP + waitCat + group

loglik Chisq Df P(>|Chi|)

1 -371.84

2 -365.01 13.664 2 0.001079 **


Special topics

• In the remainder of the class we are going to consider a series of special topics that take us beyond the ‘basic’ Cox model:

⋆ time-dependent covariates

⋆ time-varying effects

∗ i.e. non-proportional hazards

⋆ left truncation

• Focus on the concepts and implementation in R

• Formal justification is given by the theory of counting processes

⋆ BIO 244: Analysis of Failure Time Data

Time-dependent covariates

• So far we have considered models for the hazard of the form:

λ(ti) = λ0(ti) exp{Xi^T β}

• Implicit in this model is that the covariates do not vary over time

⋆ i.e. the components of X are measured at baseline

• In some settings, covariates of interest may vary over time

• For example, individuals who receive bone marrow transplants have their platelet levels monitored over time

⋆ at the outset their platelet levels will be too low

⋆ over time, assuming the transplant is successful, they will return to normal levels

• See this in the marrow dataset

>

> marrow[1:10, c("id", "group", "plateS", "plateT", "status", "time")]

id group plateS plateT status time

1 1 ALL 0 1 1 1

2 2 AML-hi 0 2 1 2

3 3 AML-lo 0 10 1 10

4 4 AML-hi 0 16 1 16

5 5 AML-hi 1 16 1 32

6 6 AML-lo 0 35 1 35

...

• Patient #3:

⋆ experienced either death or relapse at day 10 without their platelets having returned to normal levels

• Patient #5:

⋆ at 16 days, their platelets had returned to normal levels

⋆ experienced either death or relapse at day 32

• Define the time-dependent covariate:

X1(t) = 0 if platelets have not returned to normal levels by time t; 1 if platelets have returned to normal levels by time t

• Write a model for the hazard as:

λ(t) = λ0(t) exp{β1 X1,i(t) + β2 agei + . . .}

⋆ the value of the hazard depends on when it is evaluated

• The model still parameterizes the effect of X1 to be constant in time

⋆ that is, β1 does not depend on time

⋆ proportional hazards for a time-dependent covariate

⋆ interpretation of exp{β1} is as before

• Estimation proceeds via the partial likelihood

• Crucially, when we evaluate the partial likelihood we must update the value of the covariates that are being compared within each risk set:

L_P(β) = ∏_{k=1}^K exp{x(k)(t(k))^T β} / ∑_{i∈Rk} exp{xi(t(k))^T β}

⋆ as you move through time and your covariates are compared to those of the individual that ‘failed’, we have to make sure to update the values of your covariates

• Operationally, this involves creating multiple records for individuals

• Consider, again, patient #5

>

> marrow[5, c("id", "group", "plateS", "plateT", "status", "time")]

id group plateS plateT status time

5 5 AML-hi 1 16 1 32


• Their time-varying platelet covariate is as follows:

X1,5(t) = 0 if t ≤ 16; 1 if t > 16

• We can represent this shift by creating two records

⋆ record 1: time between 0 and 16

∗ time ‘starts’ at 0 and ‘ends’ at 16

∗ X1,5 = 0

∗ had not experienced the event at t = 16, so that δ5 = 0 for this record

⋆ record 2: time between 16 and 32

∗ time ‘starts’ at 16 and ‘ends’ at 32

∗ X1,5 = 1

∗ experienced the event at t = 32, so that δ5 = 1 for this record

⋆ covariates that were measured at baseline are the same for both records

∗ e.g., age at entry and sex

• Code in R:

⋆ for each person create a ‘first’ observation

⋆ also create a ‘second’ observation if they experienced platelet recovery

>

> ## First observation prior to platelet recovery

> ##

> marrow0 <- marrow

> marrow0$plateTVC <- 0

> marrow0$start <- 0

> marrow0$end <- pmin(marrow0$plateT, marrow0$time)

> marrow0$delta <- marrow0$status

> marrow0$delta[marrow0$plateT < marrow0$time] <- 0

>

> ## Second observation after platelet recovery

> ##

> marrow1 <- marrow[marrow$plateS == 1,]

> marrow1$plateTVC <- 1

> marrow1$start <- marrow1$plateT

> marrow1$end <- marrow1$time

> marrow1$delta <- marrow1$status


• Combine records and sort

⋆ take care because of one patient who had platelet recovery immediately

>

> marrowTVC <- rbind(marrow0, marrow1)

> marrowTVC <- marrowTVC[order(marrowTVC$id, marrowTVC$start),]

> marrowTVC[1:10, c("id", "ageP", "plateTVC", "start", "end", "delta")]

id ageP plateTVC start end delta

1 1 42 0 0 1 1

2 2 20 0 0 2 1

3 3 34 0 0 10 1

4 4 27 0 0 16 1

5 5 36 0 0 16 0

510 5 36 1 16 32 1

...

>

> marrowTVC[marrowTVC$id == 20,

c("id", "ageP", "plateTVC", "start", "end", "delta")]

id ageP plateTVC start end delta

20 20 35 0 0 0 0

201 20 35 1 0 80 1

> marrowTVC <- marrowTVC[(marrowTVC$start < marrowTVC$end),]


• The new dataset has 256 ‘observations’ from 137 patients

>

> nrow(marrow)

[1] 137

> nrow(marrowTVC)

[1] 256

>

> fitTVC <- coxph(Surv(start, end, delta) ~ ageP + maleP + cmvP + waitCat

+ group + plateTVC, data=marrowTVC)

> summary(fitTVC)

...

n= 256, number of events= 83

exp(coef) exp(-coef) lower .95 upper .95

ageP 1.0099 0.9902 0.9845 1.0360

...

plateTVC 0.3202 3.1227 0.1596 0.6424

• The hazard for individuals who experience platelet recovery is estimated to be 68% lower than the hazard for individuals who don’t experience platelet recovery, holding other components of the model constant

Time-varying effects

• Consider, again, the ‘basic’ Cox model:

λ(t) = λ0(t) exp{X^T β}

• As we’ve noted, to isolate the effect of X1, we can compare the hazard function between two populations that differ in their value by one unit:

x = (x1, x2, . . . , xp)

x′ = (x1 + 1, x2, . . . , xp)

which gives

λ(t; x′) / λ(t; x) = λ0(t) exp{x′^T β} / [λ0(t) exp{x^T β}] = exp{β1}

• That is, under the above model, the effect of X1 does not vary with time

• Put another way, any change in the hazard function associated with a unit increase in X1 is maintained throughout all time

⋆ hence the term proportional hazards

• In practice, adopting a proportional hazards model may be reasonable for

⋆ estimating treatment effects over a relatively short timeframe

⋆ comparing innate differences between individuals (e.g. sex)

⋆ settings where one is interested in a simple summary of the effect

• In some settings, this specification may not be

⋆ appropriate

∗ e.g., if the benefits of treatment wane over time

⋆ adequate

∗ e.g., residual confounding

• In practice, one might either

⋆ hypothesize non-proportional hazards at the outset

⋆ detect it via the analysis of residuals or an evaluation of goodness of fit

• Either way, if interest lies in moving beyond a proportional hazards model, one could parameterize the model so that the effect of X1 is more flexible:

λ(t) = λ0(t) exp{β1(t) X1 + β2 X2 + . . .}

which gives

λ(t; x′) / λ(t; x) = exp{β1(t)}

⋆ we say the effect is time-varying

⋆ interpretation of the hazard ratio requires specification of some actual time

• In practice, analysts have considerable choice in how to specify β1(t)

• In the following sub-sections we are going to consider two approaches:

⋆ stratified baseline hazard functions

⋆ interactions with time

Non-proportional hazards: stratified models

• Returning to the bone marrow data, patients underwent their transplants at one of four hospitals:

hospi = 1 The Ohio State University; 2 Alferd; 3 St. Vincent; 4 Hahnemann

• We may want to adjust for hospital as a potential confounder for the association of disease group and disease-free survival

⋆ different hospitals may serve different patient populations

⋆ some hospitals provide better care than others

• Towards this, consider how the ‘basic’ model parameterizes the effect of hospital:

λ(t) = λ0(t) exp{I{hosp = 2} βh2 + I{hosp = 3} βh3 + I{hosp = 4} βh4 + . . .}

• Under this model:

⋆ λ0(t) indicates how ‘risk’ varies over time at The Ohio State University hospital

∗ i.e. the hospital for which ‘hosp = 1’

⋆ risk in each of the other hospitals follows the same profile over time, possibly increased/decreased by some constant multiplicative factor

• If we wanted to summarize differences between hospitals, this may be reasonable

• If the primary purpose of including hospital in the model is the control of confounding, we may want to avoid a restrictive model specification

⋆ mitigate residual confounding

• One option for introducing flexibility would be to let each hospital have its own baseline hazard function:

λ(t) = λ0s(t) exp{X^T β}, s = 1, . . . , 4

⋆ λ0s(t) is the stratum-specific baseline hazard function

⋆ β is a vector of common log-hazard ratios

• The hazard ratio comparing stratum s vs stratum 1, holding X constant, is

λ0s(t) / λ01(t)

⋆ a flexible function of time

• Estimation/inference is straightforward:

⋆ form the usual partial likelihood by only considering observations within each stratum

⋆ take the product over the strata to ‘borrow strength’ in the estimation of β

L_P(β) = ∏_{s=1}^S ∏_{k=1}^{Ks} exp{x(k,s)^T β} / ∑_{i∈Rk,s} exp{xi^T β}

• Estimation/inference for the stratification variable is not straightforward

⋆ estimate each of the baseline hazard functions separately

• The approach can be applied to continuous covariates by categorization

⋆ may also want to include the covariate in the linear predictor

• Adopting a proportional hazards model for between-hospital effects:

>

> fitPH <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat +
+                group + hosp, data=marrow)

> summary(fitPH)

...

n= 137, number of events= 83

coef exp(coef) se(coef) z Pr(>|z|)

ageP 0.01380 1.01390 0.01436 0.961 0.3365

maleP -0.20074 0.81812 0.26053 -0.771 0.4410

cmvP -0.07385 0.92881 0.24374 -0.303 0.7619

waitCat 91-180 -0.23035 0.79425 0.35126 -0.656 0.5120

waitCat181-365 -0.44936 0.63803 0.40164 -1.119 0.2632

waitCat 365+ -0.69940 0.49688 0.45898 -1.524 0.1276

group AML-lo -0.55662 0.57314 0.33190 -1.677 0.0935 .

group AML-hi 0.45002 1.56835 0.29657 1.517 0.1292

hosp Alferd 0.76015 2.13860 0.35788 2.124 0.0337 *

hosp St.Vin -0.11097 0.89497 0.33610 -0.330 0.7413

hosp Hahnemann -0.92203 0.39771 0.42995 -2.145 0.0320 *

...

672 BIO 233, Spring 2015

• Permitting hospital-specific baseline hazard functions:

>

> fitS <- coxph(Surv(time, status) ~ ageP + maleP + cmvP + waitCat +
+               group + strata(hosp), data=marrow)

> summary(fitS)

...

n= 137, number of events= 83

coef exp(coef) se(coef) z Pr(>|z|)

ageP 0.01191 1.01198 0.01457 0.817 0.414

maleP -0.25649 0.77376 0.26149 -0.981 0.327

cmvP -0.12152 0.88557 0.24550 -0.495 0.621

waitCat 91-180 -0.19452 0.82323 0.35227 -0.552 0.581

waitCat181-365 -0.33425 0.71587 0.39489 -0.846 0.397

waitCat 365+ -0.61556 0.54034 0.45513 -1.352 0.176

group AML-lo -0.54385 0.58051 0.33351 -1.631 0.103

group AML-hi 0.42925 1.53611 0.29958 1.433 0.152

...

• Note the differences in the estimated coefficients for several of the
baseline covariates

673 BIO 233, Spring 2015

Non-proportional hazards: interactions with time

• An important drawback of using a stratified model is that the effect of the

covariate is not easily characterized

• An alternative is to directly parameterize β(·)

• One simple specification is to include an interaction with log(t) in the
linear predictor

λ(t) = λ0(t) exp{β1X1 + β∗1 X1 log(t) + β2X2 + . . .}

⋆ the corresponding hazard ratio is

λ(t; x′) / λ(t; x) = exp{β1 + β∗1 log(t)} = exp{β1} × t^{β∗1}

⋆ when β∗1 = 0, we have proportional hazards

⋆ when β∗1 = 1, the hazard ratio is linear in time

674 BIO 233, Spring 2015

• Practically, one can fit this model by noting that log(t) is a

time-dependent covariate

⋆ use the same ‘multiple records’ approach to construct a dataset that

appropriately updates the value of ‘log(t)’ at each risk set

• Once the model has been fit one can evaluate the null hypothesis

H0 : β∗1 = 0

to see if there is any evidence of non-proportional hazards

⋆ note that the model structure is set up to focus on specific alternatives

to proportional hazards
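• Rather than constructing the expanded dataset by hand, coxph() in the survival package can perform the expansion internally through its tt (time-transform) argument; a minimal sketch, assuming the bone marrow data and, purely for illustration, a time-varying effect of patient age:

## Hedged sketch: interaction with log(t) via coxph()'s time-transform
## facility; the Wald test for the tt() term assesses H0: beta*_1 = 0
library(survival)
fitTT <- coxph(Surv(time, status) ~ ageP + tt(ageP),
               tt = function(x, t, ...) x * log(t),
               data = marrow)
summary(fitTT)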

675 BIO 233, Spring 2015

Left truncation

• Throughout the notes we have assumed that for a given time-to-event

response variable, T

1. there is a well-defined origin, which corresponds to T=0

2. study participants are observed from T=0 onwards until either an event

occurs or they are right-censored

• Suppose interest lies in designing a study to establish risk factors for

Alzheimer’s Disease (AD)

• Since AD can manifest at (almost) any age, an ‘ideal’ study would

⋆ enroll a random sample of births

⋆ follow each individual over the course of their life, with regular

cognitive evaluations

• Clearly this isn’t practical and an alternative design would need to be used

676 BIO 233, Spring 2015

• Consider the Adult Changes in Thought (ACT) study

⋆ an on-going prospective study of incident AD and dementia

⋆ western WA state

⋆ members of Group Health Cooperative who were aged 65 years or older

• Enrollment for ACT has involved three main phases:

⋆ the initial cohort, enrolled between 1994-1996

⋆ a supplementary cohort, enrolled between 2000-2002

⋆ continuous enrollment from 2005 onwards

• At the outset, each potential participant undergoes an initial evaluation

⋆ enrollment was restricted to individuals who were ‘cognitively intact’

• Thereafter, cognitive evaluations are performed biennially

⋆ ‘time of diagnosis’ taken to be the mid-point of the two-year interval

677 BIO 233, Spring 2015

• Follow-up was censored administratively in 2012 or at age 89, whichever
occurred first

• The observed data consists of n=3,602 records

• Among the covariates measured at enrollment are

⋆ gender: 0/1 = male/female

⋆ education: 0/1/2/3 = < HS/HS/some college/graduate

⋆ marital status: 0/1 = not married/married

⋆ depression: 0/1 = no/yes

• Consider the following four ACT participants:

     year  female  eduCat  married  depression  ageEnroll  ageDx  ageDeath  ageCensor
#1   2000       0       0        1           0         74     82        83         86
#2   2006       1       0        0           0         78     84        NA         84
#3   1994       0       2        0           0         74     NA        81         89
#4   2008       1       2        1           0         76     NA        NA         80

678 BIO 233, Spring 2015

• Graphically, the observed data for these individuals can be represented as:

[Figure: timelines of observed person-time for the four participants, plotted on the age scale (65 to 90 years); X marks an AD/dementia diagnosis, ] a death, and ) censoring, with unobserved person-time prior to enrollment distinguished from observed person-time]

679 BIO 233, Spring 2015

• The structure of these data does not conform to the paradigm for

time-to-event response data that we’ve considered so far

• As an alternative, we could consider time since study enrollment:

[Figure: the same four timelines plotted on the time-since-enrollment scale (0 to 25 years); X = AD/dementia event, ] = death, ) = censoring]

680 BIO 233, Spring 2015

Q: Focusing on mortality, as we consider time to death from enrollment, what

do we mean by ‘time’? Is this notion of time interpretable beyond the

context of the study?

• Returning to age as the time scale, we have to address the problem that

the observed data is subject to left truncation

⋆ observation of person time did not begin at the origin for the age time

scale (i.e. at birth)

⋆ person-time prior to the start of observation cannot be considered

⋆ during that time there was no potential for the individual to be observed
to experience the event

• To formalize this, we can introduce a delayed entry time

⋆ random variable, V

⋆ in theory, this is well-defined for all individuals in the study

⋆ analogous to the right censoring time, C

681 BIO 233, Spring 2015

• Given the observed data, we have no choice but to condition on the event

‘T > V ’ in our analyses

• For example, we can readily estimate hazard functions of the form:

λ(t | X, V) = lim_{∆→0} (1/∆) P(t ≤ T < t+∆ | T ≥ t, X, T > V)

⋆ conditional hazard function, given that you made it to the delayed

entry time

Q: How does this impact the interpretability of the results?

• Fortunately, under the independence assumption:

T ⊥⊥ V | X

one can easily show that

λ(t | X, V) = λ(t | X) = lim_{∆→0} (1/∆) P(t ≤ T < t+∆ | T ≥ t, X)

682 BIO 233, Spring 2015

• Consequently, we can perform valid estimation/inference for the hazard

function of primary interest

⋆ analogous to the independence assumption we routinely make for

right-censored data

Q: Is this assumption reasonable for ACT?

• Operationally, analyses in R can proceed as follows:

> ## Structure of the observed data

>

> act

year female eduCat married depression ageEnroll ageDx ageDeath

1 2010 0 3 1 0 71 NA NA

2 1995 1 2 1 0 76 NA NA

3 2008 1 3 0 0 84 NA NA

4 1995 0 3 1 0 68 NA 69

5 2002 0 3 0 0 74 NA 79

...

683 BIO 233, Spring 2015

> ## Manipulations to get the analysis variables

>

> ## Time to diagnosis

> ##

> act$T1 <- act$ageDx

> act$T1[is.na(act$T1)] <- 999

>

> ## Time to death

> ##

> act$T2 <- act$ageDeath

> act$T2[is.na(act$T2)] <- 999

>

> ## Remove folks who did not have at least 2 years of follow-up

> ##

> act <- act[act$T2 >= (act$ageEnroll + 2),]

>

> ##

> act$Y0 <- act$ageEnroll

> act$Y1 <- pmin(act$T1, act$T2, act$ageCensor)

> act$Y2 <- pmin(act$T2, act$ageCensor)

> ##

> act$delta1 <- as.numeric(act$Y1 == act$T1)

> act$delta2 <- as.numeric(act$Y2 == act$T2)

684 BIO 233, Spring 2015

> ## Observation on the "time since enrollment" scale

> ##

> act$Ystar <- act$Y2 - act$Y0

>

> ## Standardize age to make the baseline hazard function and age

> ## "contrast" more interpretable

> ##

> act$ageEnroll <- (act$ageEnroll - 75) / 5

>

> ##

> library(survival)

> getHR <- function(fit, alpha=0.05, digits=2)
+ {
+   beta <- coef(fit)
+   se <- sqrt(diag(fit$var))
+   ## columns: point estimate, lower limit, upper limit (log-HR scale)
+   value <- matrix(rep(beta, 3), ncol=3)
+   value <- value + qnorm(1-alpha/2) *
+     matrix(rep(c(0,-1,1), length(beta)), ncol=3, byrow=TRUE) *
+     matrix(rep(se, 3), ncol=3)
+   value <- round(exp(value), digits=digits)
+   dimnames(value) <- list(names(beta), c("HR", "Lower", "Upper"))
+   return(value)
+ }
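• e.g., getHR(fit10) returns a matrix with one row per coefficient, giving the estimated hazard ratio and its 95% confidence limits; it is used below to compare results across the two time scales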

685 BIO 233, Spring 2015

> ## Model on the "time since enrollment" time scale
> ## - missingness is due to the depression variable
> ##
> fit10 <- coxph(Surv(Ystar, delta2) ~ female + married + depression +
+                factor(eduCat) + ageEnroll, data=act)

> summary(fit10)

Call:

coxph(formula = Surv(Ystar, delta2) ~ female + married + depression +

factor(eduCat) + ageEnroll, data = act)

n= 3529, number of events= 1160

(73 observations deleted due to missingness)

coef exp(coef) se(coef) z Pr(>|z|)

female -0.53783 0.58401 0.06331 -8.496 < 2e-16 ***

married -0.17833 0.83667 0.06408 -2.783 0.00539 **

depression 0.51223 1.66900 0.08527 6.007 1.89e-09 ***

factor(eduCat)1 0.01047 1.01052 0.10261 0.102 0.91873

factor(eduCat)2 -0.10628 0.89918 0.07041 -1.509 0.13118

factor(eduCat)3 -0.33870 0.71270 0.08569 -3.953 7.72e-05 ***

ageEnroll 0.50363 1.65471 0.03425 14.706 < 2e-16 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

686 BIO 233, Spring 2015

exp(coef) exp(-coef) lower .95 upper .95

female 0.5840 1.7123 0.5159 0.6612

married 0.8367 1.1952 0.7379 0.9486

depression 1.6690 0.5992 1.4121 1.9726

factor(eduCat)1 1.0105 0.9896 0.8264 1.2356

factor(eduCat)2 0.8992 1.1121 0.7833 1.0322

factor(eduCat)3 0.7127 1.4031 0.6025 0.8430

ageEnroll 1.6547 0.6043 1.5473 1.7696

...

Q: What conclusions do we draw?

687 BIO 233, Spring 2015

> ## Model on the "time since birth" (i.e. age) time scale

> ##

> fit11 <- coxph(Surv(Y0, Y2, delta2) ~ female + married + depression +
+                factor(eduCat), data=act)

> summary(fit11)

Call:

coxph(formula = Surv(Y0, Y2, delta2) ~ female + married + depression +

factor(eduCat), data = act)

n= 3529, number of events= 1160

(73 observations deleted due to missingness)

coef exp(coef) se(coef) z Pr(>|z|)

female -0.52442 0.59190 0.06340 -8.272 < 2e-16 ***

married -0.12708 0.88067 0.06355 -1.999 0.045555 *

depression 0.49500 1.64049 0.08501 5.823 5.78e-09 ***

factor(eduCat)1 0.01998 1.02018 0.10263 0.195 0.845638

factor(eduCat)2 -0.08641 0.91721 0.07028 -1.230 0.218833

factor(eduCat)3 -0.31308 0.73119 0.08551 -3.662 0.000251 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

...

688 BIO 233, Spring 2015

> ## Compare the results between the two analyses

> ##

> cbind(getHR(fit10), rbind(getHR(fit11), c(NA, NA, NA)))

HR Lower Upper HR Lower Upper

female 0.58 0.52 0.66 0.59 0.52 0.67

married 0.84 0.74 0.95 0.88 0.78 1.00

depression 1.67 1.41 1.97 1.64 1.39 1.94

factor(eduCat)1 1.01 0.83 1.24 1.02 0.83 1.25

factor(eduCat)2 0.90 0.78 1.03 0.92 0.80 1.05

factor(eduCat)3 0.71 0.60 0.84 0.73 0.62 0.86

ageEnroll 1.65 1.55 1.77 NA NA NA

• We generally draw the same conclusions from either analysis

689 BIO 233, Spring 2015

• Now let’s consider AD/dementia as a time-dependent covariate

> ## Create a new dataset with AD/dementia diagnosis as a time-dependent
> ## covariate
> ##
> ## - 111 patients have a diagnosis of AD/dementia recorded at the same age
> ##   as their death
> ## - something needs to be done or else we lose these folks
> ## - make a slight modification that gives each of them an extra 0.5 years

>

> ##

> actTV <- act[ ,c("female", "eduCat", "married", "depression",
+                  "Y0", "Y1", "Y2", "delta1", "delta2")]

> ##

> bad <- (actTV$Y1 == actTV$Y2) & (actTV$delta1 == 1)

> actTV$Y2[bad] <- actTV$Y2[bad] + 0.5

> ##

> n <- nrow(actTV)

> group0 <- c(1:n)[actTV$delta1 == 0]

> group1 <- c(1:n)[actTV$delta1 == 1]

> ##

> actTV <- actTV[c(group0, rep(group1, rep(2, length(group1)))),]

690 BIO 233, Spring 2015

> actTV$RxAD <- c(rep(0, length(group0)), rep(c(0,1), length(group1)))

> ##

> actTV$YS <- actTV$Y0

> actTV$YE <- actTV$Y1

> actTV$deltaE <- actTV$delta2

> ##

> actTV$YS[actTV$RxAD == 1] <- actTV$Y1[actTV$RxAD == 1]

> actTV$YE[actTV$RxAD == 1] <- actTV$Y2[actTV$RxAD == 1]

> actTV$deltaE[actTV$RxAD == 0 & actTV$delta1 == 1] <- 0

>

> ##

> dim(actTV)

[1] 4297 14

> actTV[c(1:3, 4294:4297),
+        c("Y0", "Y1", "delta1", "Y2", "delta2", "RxAD", "YS", "YE", "deltaE")]

Y0 Y1 delta1 Y2 delta2 RxAD YS YE deltaE

1 71 73 0 73.0 0 0 71 73.0 0

2 76 89 0 89.0 0 0 76 89.0 0

3 84 88 0 88.0 0 0 84 88.0 0

4143 73 86 1 89.0 0 0 73 86.0 0

4143.1 73 86 1 89.0 0 1 86 89.0 0

4145 72 89 1 89.5 1 0 72 89.0 0

4145.1 72 89 1 89.5 1 1 89 89.5 1
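• As an aside, the survival package's tmerge() utility can automate this kind of record expansion. The following is a hedged sketch, not a reproduction of the construction above: it assumes a subject identifier id (not present in the printed data) and introduces a hypothetical variable adAge holding the age at AD/dementia diagnosis (NA if undiagnosed); tmerge() ignores missing times when creating a time-dependent covariate.

## Hedged sketch using survival::tmerge(); 'id' and 'adAge' are hypothetical
library(survival)
act$adAge <- ifelse(act$delta1 == 1, act$Y1, NA)  ## diagnosis age, NA if none
tv <- tmerge(act, act, id=id,
             tstart=Y0, tstop=Y2,                 ## delayed entry at age Y0
             death=event(Y2, delta2))             ## death event at age Y2
tv <- tmerge(tv, act, id=id,
             RxAD=tdc(adAge))                     ## 0/1, switching at diagnosis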

691 BIO 233, Spring 2015

> ## Model with RxAD as a time-varying covariate

> ##

> fitTV <- coxph(Surv(YS, YE, delta2) ~ RxAD + female + married + depression +
+                factor(eduCat), data=actTV)

> summary(fitTV)

Call:

coxph(formula = Surv(YS, YE, delta2) ~ RxAD + female + married +

depression + factor(eduCat), data = actTV)

n= 4220, number of events= 1496

(77 observations deleted due to missingness)

coef exp(coef) se(coef) z Pr(>|z|)

RxAD 1.11216 3.04093 0.06575 16.916 < 2e-16 ***

female -0.49140 0.61177 0.05584 -8.801 < 2e-16 ***

married -0.15482 0.85657 0.05605 -2.762 0.00574 **

depression 0.47761 1.61221 0.07376 6.475 9.46e-11 ***

factor(eduCat)1 -0.03641 0.96425 0.09155 -0.398 0.69085

factor(eduCat)2 -0.11453 0.89179 0.06155 -1.861 0.06278 .

factor(eduCat)3 -0.40540 0.66671 0.07642 -5.305 1.13e-07 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

692 BIO 233, Spring 2015

exp(coef) exp(-coef) lower .95 upper .95

RxAD 3.0409 0.3288 2.6733 3.4592

female 0.6118 1.6346 0.5484 0.6825

married 0.8566 1.1674 0.7675 0.9560

depression 1.6122 0.6203 1.3952 1.8630

factor(eduCat)1 0.9642 1.0371 0.8059 1.1538

factor(eduCat)2 0.8918 1.1213 0.7904 1.0061

factor(eduCat)3 0.6667 1.4999 0.5740 0.7744

Q: Interpretation of the hazard ratio estimate for RxAD?

693 BIO 233, Spring 2015