7.3 Approaches to Survival Modeling
Up to this point we have been concerned with a homogeneous population, where
the lifetimes of all units are governed by the same survival function
S(t). We now introduce the third distinguishing characteristic of
survival models-the presence of a vector of covariates or explanatory
variables that may affect survival time-and consider the general problem
of modeling these effects.
7.3.1 Accelerated Life Models*
Let Ti be a random variable representing the (possibly unobserved)
survival time of the i-th unit. Since Ti must be non-negative, we might
consider modeling its logarithm using a conventional linear model, say
where
ei is a suitable error term, with a distribution to be
specified. This model specifies the distribution of log-survival
for the i-th unit as a simple
shift of a standard or baseline
distribution represented by the error term.
Exponentiating this equation, we obtain a model for the
survival time itself
where we have written T
0i for the exponentiated error term.
It will also be convenient to use
gi as shorthand for the
multiplicative effect exp{
xiβ} of the covariates.
Interpretation of the parameters follows along standard lines.
Consider, for example, a model with a constant and a
dummy variable x representing a factor with two levels,
say groups one and zero. Suppose the
corresponding multiplicative effect is g = 2, so the
coefficient of x is β = log(2) = 0.6931. Then we
would conclude that people in group one live twice as long
as people in group zero.
There is an interesting alternative interpretation that
explains the name `accelerated life' used for this model.
Let S0(t) denote the survivor function in group zero,
which will serve as a reference group, and let S1(t)
denote the survivor function in group one. Under this
model,
In words, the probability that a member of group one will
be alive at age t is exactly the same as the probability
that a member of group zero will be alive at age t/
g.
For
g = 2, this would be half the age, so the probability
that a member of group one would be alive at age 40 (or 60)
would be the same as the probability that a member of group
zero would be alive at age 20 (or 30). Thus, we may think of
g as affecting the passage of time. In our example,
people in group zero age `twice as fast'.
For the record, the corresponding hazard functions are
related by
so if
g = 2, at any given age people in group one would be
exposed to half the risk of people in group zero half their age.
The name `accelerated life' stems from industrial applications
where items are put to test under substantially worse conditions
than they are likely to encounter in real life, so that tests
can be completed in a shorter time.
Different kinds of parametric models are obtained by assuming
different distributions for the error term.
If the ei are normally distributed, then we obtain
a log-normal model for the Ti. Estimation of this model for
censored data by maximum likelihood is known in the econometric
literature as a Tobit model.
Alternatively, if the ei have an
extreme value distribution with p.d.f.
|
f(e) = exp{ e- exp{ e} }, |
|
then T
0i has an exponential distribution, and we
obtain the exponential regression model, where T
i is
exponential with hazard
li satisfying the
log-linear model
An example of a demographic model that belongs to the
family of accelerated life models is the Coale-McNeil model
of first marriage frequencies, where the proportion ever married
at age a in a given population is written as
where F
0 is a model schedule of proportions married by
age, among women who will ever marry, based on historical
data from Sweden; c is the proportion who will eventually
marry, a
0 is the age at which marriage starts, and
k is the
pace at which marriage proceeds relative to
the Swedish standard.
Accelerated life models are essentially standard regression
models applied to the log of survival time, and except for
the fact that observations are censored, pose no new
estimation problems. Once the distribution of the error
term is chosen, estimation proceeds by maximizing the
log-likelihood for censored data described in the previous
subsection. For further details, see Kalbfleish and Prentice (1980).
7.3.2 Proportional Hazard Models
A large family of models introduced by Cox (1972) focuses
directly on the hazard function.
The simplest member of the family is the proportional
hazards model, where the hazard at time t for an
individual with covariates xi (not including a constant)
is assumed to be
|
li(t|xi) = l0(t) exp{ xiβ}. |
| (7.10) |
In this model
l0(t) is a baseline hazard function
that describes the risk for individuals with
xi = 0,
who serve as a reference cell or pivot, and
exp{
xiβ} is the relative risk, a proportionate
increase or reduction in risk, associated with the
set of characteristics
xi. Note that the increase
or reduction in risk is the same at all durations t.
To fix ideas consider a two-sample problem where we have
a dummy variable x which serves to identify groups one
and zero. Then the model is
Thus,
l0(t) represents the risk at time t in group zero,
and
g = exp{β} represents the ratio of the risk
in group one relative to group zero at any time t.
If
g = 1 (or β = 0) then the risks are the same
in the two groups. If
g = 2 (or β = 0.6931),
then the risk for an individual in group one at any given
age is twice the risk of a member of group zero who has the
same age.
Note that the model separates clearly the effect of time
from the effect of the covariates. Taking logs, we find
that the proportional hazards model is a simple
additive model for the log of the hazard, with
|
logli(t|xi) = α0(t) + xiβ, |
|
where α
0(t) = log
l0(t) is the log of the baseline hazard.
As in all additive models, we assume that the effect of the covariates
x is the same at all times or ages t. The similarity
between this expression and a standard analysis of covariance
model with parallel lines should not go unnoticed.
Returning to Equation 7.10, we can integrate
both sides from 0 to t to obtain the cumulative hazards
|
Li(t|xi) = L0(t) exp{ xiβ}, |
|
which are also proportional. Changing signs and exponentiating
we obtain the survivor functions
|
Si(t|xi) = S0(t) exp{ xiβ}, |
| (7.11) |
where S
0(t) = exp{ -
L0(t) } is a baseline
survival function. Thus, the effect of the covariate
values
xi on the survivor function is to raise it
to a power given by the relative risk exp{
xiβ}.
In our two-group example with a relative risk of g = 2,
the probability that a member of group one will be alive
at any given age t is the square of the probability that
a member of group zero would be alive at the same age.
7.3.3 The Exponential and Weibull Models
Different kinds of proportional hazard models
may be obtained by making different
assumptions about the baseline survival function, or equivalently,
the baseline hazard function. For example if the baseline
risk is constant over time, so l0(t) = l0, say,
we obtain the exponential regression model, where
Interestingly, the exponential regression model belongs to
both the proportional hazards and the accelerated life families.
If the baseline risk is a constant and you double or triple
the risk, the new risk is still constant (just higher).
Perhaps less obviously, if the baseline risk is constant
and you imagine time flowing twice or three times as fast,
the new risk is doubled or tripled but is still constant
over time, so we remain in the exponential family.
You may be wondering whether there are other cases where
the two models coincide. The answer is yes, but not many.
In fact, there is only one distribution where they do,
and it includes the exponential as a special case.
The one case where the two families coincide is the Weibull
distribution, which has survival function
and hazard function
for parameters
l > 0 and p > 0. If p = 1, this
model reduces to the exponential and has constant risk
over time. If p > 1, then the risk increases over time.
If p < 1, then the risk decreases over time. In fact,
taking logs in the expression for the hazard function, we
see that the log of the Weibull risk is a linear function
of log time with slope p-1.
If we pick the Weibull as a baseline risk and then
multiply the hazard by a constant g in a
proportional hazards framework, the resulting distribution
turns out to be still a Weibull, so the family is closed
under proportionality of hazards.
If we pick the Weibull as a baseline survival and then
speed up the passage of time in an accelerated life
framework, dividing time by a constant g,
the resulting distribution is still a Weibull, so the
family is closed under acceleration of time.
For further details on this distribution see Cox and
Oakes (1984) or Kalbfleish and Prentice (1980), who
prove the equivalence of the two Weibull models.
7.3.4 Time-varying Covariates
So far we have considered explicitly only covariates that are
fixed over time. The local nature of the proportional hazards
model, however, lends itself easily to extensions that
allows for covariates that change over time. Let us consider
a few examples.
Suppose we are interested in the analysis of birth spacing,
and study the interval from the birth of one child to the
birth of the next. One of the possible predictors of interest
is the mother's education, which in most cases can be taken
to be fixed over time.
Suppose, however, that we want to introduce breastfeeding status
of the child that begins the interval. Assuming the child
is breastfed, this variable would take the value one (`yes')
from birth until the child is weaned, at which time it
would take the value zero (`no'). This is a simple example
of a predictor that can change value only once.
A more elaborate analysis could rely on frequency of
breastfeeding in a 24-hour period. This variable could
change values from day to day. For example a sequence of
values for one woman could be 4,6,5,6,5,4,
Let xi(t) denote the value of a vector of covariates
for individual I at time or duration t. Then the
proportional hazards model may be generalized to
|
li(t, xi(t)) = l0(t) exp{xi(t)β}. |
| (7.12) |
The separation of duration and covariate effects is not so
clear now, and on occasion it may be difficult to identify
effects that are highly collinear with time. If all children
were weaned when they are around six months old, for example,
it would be difficult to identify effects of breastfeeding from
general duration effects without additional information.
In such cases one might still prefer a time-varying covariate,
however, as a more meaningful predictor of risk than the
mere passage of time.
Calculation of survival functions when we have time-varying
covariates is a little bit more complicated, because we
need to specify a path or trajectory for each variable.
In the birth intervals example one could calculate a
survival function for women who breastfeed for six months
and then wean. This would be done by using the hazard
corresponding to x(t) = 0 for months 0 to 6 and then the
hazard corresponding to x(t) = 1 for months 6 onwards.
Unfortunately, the simplicity of Equation 7.11
is lost; we can no longer simply raise the baseline
survival function to a power.
Time-varying covariates can be introduced in the context
of accelerated life models, but this is not so simple and
has rarely been done in applications. See Cox and
Oakes (1984, p.66) for more information.
7.3.5 Time-dependent Effects
The model may also be generalized to allow for effects
that vary over time, and therefore are no longer proportional.
It is quite possible, for example,
that certain social characteristics might have a large
impact on the hazard for children shortly after birth, but
may have a relatively small impact later in life.
To accommodate such models we may write
|
li(t,xi) = l0(t) exp{ xiβ(t)}, |
|
where the parameter β(t) is now a function of time.
This model allows for great generality. In the two-sample
case, for example, the model may be written as
which basically allows for two arbitrary hazard functions, one
for each group. Thus, this is a form of saturated model.
Usually the form of time dependence of the effects must be
specified parametrically in order to be able to identify the
model and estimate the parameters. Obvious candidates are
polynomials on duration, where β(t) is a linear or
quadratic function of time. Cox and Oakes (1984, p. 76)
show how one can use quick-dampening exponentials to model
transient effects.
Note that we have lost again the simple separation of time
and covariate effects. Calculation of the survival function
in this model is again somewhat complicated by the fact that
the coefficients are now functions of time, so they don't fall
out of the integral. The simple Equation 7.11 does
not apply.
7.3.6 The General Hazard Rate Model
The foregoing extensions to time-varying covariates and
time-dependent effects may be combined to give the most
general version of the hazard rate model, as
|
li(t,xi(t)) = l0(t) exp{ xi(t)β(t) }, |
|
where
xi(t) is a vector of time-varying covariates
representing the characteristics of individual I at time t,
and β(t) is a vector of time-dependent coefficients,
representing the effect that those characteristics have
at time or duration t.
The case of breastfeeding status and its effect on the length
of birth intervals is a good example that combines the two
effects. Breastfeeding status is itself a time-varying
covariate x(t), which takes the value one if the woman is
breastfeeding her child t months after birth.
The effect that breastfeeding may have in inhibiting
ovulation and therefore reducing the risk of pregnancy is
known to decline rapidly over time,
so it should probably be modeled as a time dependent effect
β(t). Again, further progress would require specifying
the form of this function of time.
7.3.7 Model Fitting
There are essentially three approaches to fitting survival
models:
- The first and perhaps most straightforward is
the parametric approach, where we assume a specific
functional form for the baseline hazard l0(t).
Examples are models based on the exponential, Weibull, gamma
and generalized F distributions.
- A second approach is what might be called a flexible
or semi-parametric strategy, where we make mild
assumptions about the baseline hazard l0(t).
Specifically, we may subdivide time into reasonably small
intervals and assume that the baseline hazard is constant
in each interval, leading to a piece-wise exponential model.
- The third approach is a non-parametric strategy
that focuses on estimation of the regression coefficients
β leaving the baseline hazard l0(t)
completely unspecified. This approach relies on a partial
likelihood function proposed by Cox (1972) in his original paper.
A complete discussion of these approaches in well beyond
the scope of these notes. We will focus on the intermediate
or semi-parametric approach because (1) it is sufficiently
flexible to provide a useful tool with wide applicability,
and (2) it is closely related to Poisson regression analysis.
Continue with 7.4. The Piece-Wise Exponential Model