Up to this point we have been concerned with a homogeneous population, where the lifetimes of all units are governed by the same survival function $S(t)$. We now introduce the third distinguishing characteristic of survival models, namely the presence of a vector of covariates or explanatory variables that may affect survival time, and consider the general problem of modeling these effects.
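Let $T_i$ be a random variable representing the (possibly unobserved) survival time of the $i$-th unit. Since $T_i$ must be non-negative, we might consider modeling its logarithm using a conventional linear model, say

$$\log T_i = x_i'\beta + \epsilon_i,$$

where $\epsilon_i$ is a suitable error term with a distribution to be specified.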
Exponentiating this equation, we obtain a model for the survival time itself
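$$T_i = \exp\{x_i'\beta\}\, T_{0i},$$

where $T_{0i} = \exp\{\epsilon_i\}$ is the exponentiated error term. We write $\gamma_i = \exp\{x_i'\beta\}$ for the multiplicative effect of the covariates on survival time.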
Interpretation of the parameters follows along standard lines. Consider, for example, a model with a constant and a dummy variable $x$ representing a factor with two levels, say groups one and zero. Suppose the corresponding multiplicative effect is $\gamma = 2$, so the coefficient of $x$ is $\beta = \log(2) = 0.6931$. Then we would conclude that people in group one live twice as long as people in group zero.
There is an interesting alternative interpretation that explains the name "accelerated life" used for this model. Let $S_0(t)$ denote the survivor function in group zero, which will serve as a reference group, and let $S_1(t)$ denote the survivor function in group one. Under this model,
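$$S_1(t) = S_0(t/\gamma),$$

where $\gamma = \exp\{\beta\}$. This follows because $T_1$ has the same distribution as $\gamma T_0$, so $\Pr\{T_1 > t\} = \Pr\{T_0 > t/\gamma\}$. In words, the probability that a member of group one is alive at age $t$ equals the probability that a member of group zero is alive at age $t/\gamma$. For $\gamma = 2$ this is half the age. The probability that a member of group one is alive at age 40 (or 60) therefore equals the probability that a member of group zero is alive at age 20 (or 30). We may think of $\gamma$ as changing the speed at which time passes; in this example, people in group zero age twice as fast as people in group one.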
For the record, the corresponding hazard functions are related by
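$$\lambda_1(t) = \lambda_0(t/\gamma)/\gamma,$$

so if $\gamma = 2$, people in group one at any given age are exposed to half the risk faced by people in group zero at half that age.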
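Both relations can be checked numerically. The sketch below assumes a hypothetical log-logistic baseline $S_0(t) = 1/(1 + (t/30)^2)$, which is not from the text. The names `s0`, `hazard0`, `s1` and `numeric_hazard` are illustrative. The code builds $S_1(t) = S_0(t/\gamma)$ and approximates its hazard by differentiating $-\log S_1(t)$. It then compares the result with $\lambda_0(t/\gamma)/\gamma$.

```python
import math

SCALE = 30.0  # hypothetical scale (in years) of the baseline distribution

def s0(t):
    """Hypothetical baseline survivor function: log-logistic with shape 2."""
    return 1.0 / (1.0 + (t / SCALE) ** 2)

def hazard0(t):
    """Baseline hazard, -d/dt log S0(t) = 2t / (SCALE^2 + t^2)."""
    return 2.0 * t / (SCALE ** 2 + t * t)

def s1(t, gamma):
    """Accelerated life: group one's survivor function is S0(t / gamma)."""
    return s0(t / gamma)

def numeric_hazard(surv, t, h=1e-5):
    """Approximate the hazard -d/dt log S(t) with a central difference."""
    return -(math.log(surv(t + h)) - math.log(surv(t - h))) / (2.0 * h)

gamma = 2.0
for age in (40.0, 60.0):
    # Survival of group one at age 40 (or 60) equals group zero's at 20 (or 30).
    print(age, s1(age, gamma), s0(age / gamma))
    # Hazard of group one versus lambda_0(t / gamma) / gamma.
    print(numeric_hazard(lambda u: s1(u, gamma), age), hazard0(age / gamma) / gamma)
```

The first check holds by construction. The second one confirms the hazard relation without using it to build $S_1$.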
The name "accelerated life" stems from industrial applications where items are put to the test under substantially worse conditions than they are likely to encounter in real life, so that tests can be completed in a shorter time.
Different kinds of parametric models are obtained by assuming different distributions for the error term. If the $\epsilon_i$ are normally distributed, then we obtain a log-normal model for the $T_i$. Estimation of this model for censored data by maximum likelihood is known in the econometric literature as a Tobit model.
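Alternatively, suppose the $\epsilon_i$ have an extreme value distribution with p.d.f.

$$f(\epsilon) = \exp\{\epsilon - \exp\{\epsilon\}\}.$$

Then $T_{0i} = \exp\{\epsilon_i\}$ has a unit exponential distribution. We obtain the exponential regression model, in which $T_i$ is exponential with hazard $\lambda_i$ satisfying the log-linear model

$$\log \lambda_i = -x_i'\beta.$$

The minus sign reflects the fact that covariates which lengthen survival time reduce the hazard.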
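A short simulation illustrates the link between the extreme value error and the exponential model. The sketch draws $\epsilon$ by inverting its c.d.f. $F(\epsilon) = 1 - \exp\{-e^{\epsilon}\}$. It forms $T = \exp\{x\beta + \epsilon\}$ with $\beta = \log 2$ and checks two properties. The mean of $T$ should be close to $\exp\{x\beta\}$, so the hazard is close to $\exp\{-x\beta\}$. In group zero, the fraction surviving past $t = 1$ should be close to $e^{-1}$. The function names, seeds and sample sizes are hypothetical choices.

```python
import math
import random

def draw_extreme_value(rng):
    """Draw eps with p.d.f. exp(eps - exp(eps)) by inverting its c.d.f.

    The c.d.f. is F(eps) = 1 - exp(-exp(eps)); since U and 1 - U have the
    same distribution, eps = log(-log(U)) with U uniform on (0, 1).
    """
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, where log is undefined
        u = rng.random()
    return math.log(-math.log(u))

def simulate_times(beta, x, n, seed):
    """Simulate T_i = exp(x * beta + eps_i) for n units sharing covariate x."""
    rng = random.Random(seed)
    return [math.exp(x * beta + draw_extreme_value(rng)) for _ in range(n)]

beta = math.log(2.0)  # gamma = 2, as in the two-group example
t0 = simulate_times(beta, 0, 100_000, seed=1)
t1 = simulate_times(beta, 1, 100_000, seed=2)

# For an exponential variable the hazard is 1 / mean.
print("group zero hazard:", 1.0 / (sum(t0) / len(t0)))  # near exp(0) = 1
print("group one hazard:", 1.0 / (sum(t1) / len(t1)))   # near exp(-beta) = 0.5
```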
Accelerated life models are essentially standard regression models applied to the log of survival time. Apart from the fact that observations are censored, they pose no new estimation problems. Once the distribution of the error term is chosen, estimation proceeds by maximizing the log-likelihood for censored data described in the previous subsection. For further details, see Kalbfleisch and Prentice (1980).
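As an example, consider the log-normal model with a scale parameter, $\log T_i = x_i'\beta + \sigma z_i$ with $z_i$ standard normal. The scale $\sigma$ is an assumption of this sketch and does not appear in the text. Units that die contribute the log density of $T_i$ to the log-likelihood, and censored units contribute the log survival probability. The function name `lognormal_aft_loglik` and its arguments are hypothetical. The function only evaluates the log-likelihood. Maximizing it would be left to a general-purpose optimizer, for example by minimizing its negative.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_logsf(z):
    """log(1 - Phi(z)), computed via erfc for accuracy in the upper tail."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def lognormal_aft_loglik(beta, sigma, times, died, covariates):
    """Censored-data log-likelihood for log T_i = x_i'beta + sigma * z_i.

    times: observed times (death or censoring); died: True if a death was observed;
    covariates: one list of covariate values per unit, matching beta.
    """
    total = 0.0
    for t, d, x in zip(times, died, covariates):
        mu = sum(b * xj for b, xj in zip(beta, x))
        z = (math.log(t) - mu) / sigma
        if d:
            # Density of T: phi(z) / (sigma * t), by change of variable from log T.
            total += norm_logpdf(z) - math.log(sigma) - math.log(t)
        else:
            # Censored: contributes Pr{T > t} = 1 - Phi(z).
            total += norm_logsf(z)
    return total

# One censored and one uncensored unit, each with a single covariate equal to 1.
print(lognormal_aft_loglik([0.0], 1.0, [1.0, 2.0], [False, True], [[1.0], [1.0]]))
```

Swapping in a different error distribution means changing only the two helper functions, which is the sense in which the choice of error term determines the model.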
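A large family of models introduced by Cox (1972) focuses directly on the hazard function. The simplest member of the family is the proportional hazards model, where the hazard at time $t$ for an individual with covariates $x_i$ (not including a constant) is assumed to be

$$\lambda_i(t \mid x_i) = \lambda_0(t)\exp\{x_i'\beta\}. \qquad (7.10)$$

Here $\lambda_0(t)$ is the baseline hazard function, the risk for an individual with $x_i = 0$. The factor $\exp\{x_i'\beta\}$ is the relative risk, a proportionate increase or reduction in risk associated with the characteristics $x_i$.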
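To fix ideas, consider a two-sample problem where we have a dummy variable $x$ which serves to identify groups one and zero. Then the model is

$$\lambda_i(t \mid x) = \begin{cases} \lambda_0(t), & x = 0,\\ \lambda_0(t)\,e^{\beta}, & x = 1. \end{cases}$$

Thus $\lambda_0(t)$ is the risk at time $t$ in group zero, and $\gamma = \exp\{\beta\}$ is the ratio of the risk in group one to the risk in group zero at any time $t$. If $\gamma = 2$ (or $\beta = 0.6931$), a member of group one faces, at any given age, twice the risk of a member of group zero of the same age.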
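Note that the model clearly separates the effect of time from the effect of the covariates. Taking logs, we find that the proportional hazards model is a simple additive model for the log of the hazard, with

$$\log \lambda_i(t \mid x_i) = \alpha_0(t) + x_i'\beta,$$

where $\alpha_0(t) = \log \lambda_0(t)$ is the log of the baseline hazard. As in all additive models, the same effect $\beta$ is assumed to apply at all times $t$.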
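The additive structure is easiest to see by contrast with the accelerated life model. The sketch uses a hypothetical log-logistic baseline hazard $\lambda_0(t) = 2t/(30^2 + t^2)$, which is not from the text. Under proportional hazards with $\beta = \log 2$, the log-hazard difference between the groups is $\beta$ at every $t$. Under an accelerated life model in which group one's survival times are doubled, the hazard ratio is $\lambda_0(t/2)/(2\lambda_0(t))$. For this baseline that ratio works out to $(900 + t^2)/(3600 + t^2)$, which changes with $t$. The function names are illustrative.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard (log-logistic, shape 2, scale 30)."""
    return 2.0 * t / (30.0 ** 2 + t * t)

def ph_hazard(t, beta, x):
    """Proportional hazards: lambda_0(t) * exp(x * beta)."""
    return baseline_hazard(t) * math.exp(x * beta)

def aft_hazard(t, gamma):
    """Accelerated life with survival times multiplied by gamma."""
    return baseline_hazard(t / gamma) / gamma

beta = math.log(2.0)
for t in (5.0, 20.0, 60.0):
    # Additive on the log scale: the difference is beta at every t.
    log_diff = math.log(ph_hazard(t, beta, 1)) - math.log(ph_hazard(t, beta, 0))
    # Under the accelerated life model, the hazard ratio drifts with t.
    ratio = aft_hazard(t, 2.0) / baseline_hazard(t)
    print(t, round(log_diff, 4), round(ratio, 4))
```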
Returning to Equation 7.10, we can integrate both sides from 0 to t to obtain the cumulative hazards
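$$\Lambda_i(t \mid x_i) = \Lambda_0(t)\exp\{x_i'\beta\},$$

which are also proportional. Since $S(t) = \exp\{-\Lambda(t)\}$, changing signs and exponentiating gives the survivor functions

$$S_i(t \mid x_i) = S_0(t)^{\exp\{x_i'\beta\}}, \qquad (7.11)$$

where $S_0(t) = \exp\{-\Lambda_0(t)\}$ is the baseline survival function. The effect of the covariates is thus to raise the baseline survival function to a power.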
In our two-group example with a relative risk of $\gamma = 2$, the probability that a member of group one will be alive at any given age $t$ is the square of the probability that a member of group zero would be alive at the same age.
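The squaring can be verified numerically. This sketch integrates a hypothetical baseline hazard with the trapezoidal rule to obtain $\Lambda_0(t)$, then computes $\exp\{-\gamma\Lambda_0(t)\}$ for $\gamma = 1$ and $\gamma = 2$. The baseline $\lambda_0(t) = 2t/(30^2 + t^2)$ is an assumption. It implies $S_0(t) = 1/(1 + (t/30)^2)$, so group zero's survival at age 30 should be one half and group one's one quarter. The function names are illustrative.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard (log-logistic, shape 2, scale 30)."""
    return 2.0 * t / (30.0 ** 2 + t * t)

def cumulative_hazard(hazard, t, steps=2000):
    """Integrate the hazard from 0 to t with the composite trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(k * h) for k in range(1, steps))
    return total * h

def ph_survival(t, relative_risk):
    """Proportional hazards survival: exp(-relative_risk * Lambda_0(t))."""
    return math.exp(-relative_risk * cumulative_hazard(baseline_hazard, t))

for age in (10.0, 30.0, 50.0):
    s_zero = ph_survival(age, 1.0)
    s_one = ph_survival(age, 2.0)
    # With relative risk 2, group one's survival is the square of group zero's.
    print(age, round(s_zero, 4), round(s_one, 4), round(s_zero ** 2, 4))
```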
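Different kinds of proportional hazard models may be obtained by making different assumptions about the baseline survival function, or equivalently, the baseline hazard function. For example, if the baseline risk is constant over time, so $\lambda_0(t) = \lambda_0$, say, we obtain the exponential regression model, where

$$\lambda_i(t \mid x_i) = \lambda_0 \exp\{x_i'\beta\}.$$

Interestingly, the exponential regression model belongs to both the proportional hazards and the accelerated life families. If the baseline risk is constant and we double or triple it, the new risk is still constant (just higher). Perhaps less obviously, if the baseline survival function is exponential and we double or triple the speed at which time passes, the new survival function is still exponential.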
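Here is a quick check that the exponential model is both proportional hazards and accelerated life. With a hypothetical constant baseline hazard of 0.1, doubling the risk gives $S_0(t)^2$, and halving every survival time gives $S_0(t/0.5)$. Both equal $\exp\{-0.2t\}$. The constant and function names are assumptions made for illustration.

```python
import math

LAMBDA0 = 0.1  # hypothetical constant baseline hazard (per year)

def s0(t):
    """Exponential baseline survivor function exp(-LAMBDA0 * t)."""
    return math.exp(-LAMBDA0 * t)

def s_proportional_hazards(t, relative_risk):
    """Hazard multiplied by relative_risk, so survival is S0(t) ** relative_risk."""
    return s0(t) ** relative_risk

def s_accelerated_life(t, time_factor):
    """Survival times multiplied by time_factor, so survival is S0(t / time_factor)."""
    return s0(t / time_factor)

for t in (1.0, 5.0, 20.0):
    # Doubling a constant risk is the same as halving every survival time.
    print(t, s_proportional_hazards(t, 2.0), s_accelerated_life(t, 0.5))
```

With a non-constant baseline the two constructions generally give different survival curves, which is why the coincidence is special.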
You may be wondering whether there are other cases where the two models coincide. The answer is yes, but not many. In fact, there is only one distribution for which they do, and it includes the exponential as a special case. This is the Weibull distribution, which has survival function

$$S(t) = \exp\{-(\lambda t)^p\}$$

and hazard function

$$\lambda(t) = p\lambda(\lambda t)^{p-1},$$

for parameters $\lambda > 0$ and $p > 0$. The exponential distribution is the special case $p = 1$.
Suppose we pick the Weibull as a baseline risk and multiply the hazard by a constant $\gamma$ in a proportional hazards framework. The resulting distribution is still a Weibull, so the family is closed under proportionality of hazards. Suppose instead we pick the Weibull as a baseline survival and speed up the passage of time in an accelerated life framework, dividing time by a constant $\gamma$. The resulting distribution is again a Weibull, so the family is closed under acceleration of time.
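Both closure properties follow in one line each. The derivation below assumes the parametrization $S(t) = \exp\{-(\lambda t)^p\}$, with hazard $p\lambda(\lambda t)^{p-1}$. The first line multiplies the hazard by $\gamma$. The second divides time by $\gamma$ in the survival function.

```latex
\begin{aligned}
\gamma\,p\lambda(\lambda t)^{p-1} &= p\lambda^{*}(\lambda^{*}t)^{p-1},
  \qquad \lambda^{*} = \gamma^{1/p}\lambda,\\
\exp\{-(\lambda t/\gamma)^{p}\} &= \exp\{-(\lambda^{\dagger}t)^{p}\},
  \qquad \lambda^{\dagger} = \lambda/\gamma .
\end{aligned}
```

Both results are Weibull with the same shape $p$ and a new scale. Setting $\lambda^{*} = \lambda^{\dagger}$ shows the link between the two families. Multiplying the hazard by $e^{\beta}$ is equivalent to multiplying survival time by $e^{-\beta/p}$. In regression terms, a proportional hazards coefficient $\beta$ therefore corresponds to an accelerated life coefficient $-\beta/p$.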
For further details on this distribution, see Cox and Oakes (1984) or Kalbfleisch and Prentice (1980), who prove the equivalence of the two Weibull models.
So far we have considered explicitly only covariates that are fixed over time. The local nature of the proportional hazards model, however, lends itself easily to extensions that allow for covariates that change over time. Let us consider a few examples.
Suppose we are interested in the analysis of birth spacing, and study the interval from the birth of one child to the birth of the next. One of the possible predictors of interest is the mother's education, which in most cases can be taken to be fixed over time.
Suppose, however, that we want to introduce the breastfeeding status of the child that begins the interval. Assuming the child is breastfed, this variable would take the value one ("yes") from birth until the child is weaned, at which time it would take the value zero ("no"). This is a simple example of a predictor that can change value only once.
A more elaborate analysis could rely on the frequency of breastfeeding in a 24-hour period. This variable could change values from day to day. For example, a sequence of values for one woman could be 4, 6, 5, 6, 5, 4, and so on.
Let $x_i(t)$ denote the value of a vector of covariates for individual $i$ at time or duration $t$. Then the proportional hazards model may be generalized to

$$\lambda_i(t \mid x_i(t)) = \lambda_0(t)\exp\{x_i(t)'\beta\}. \qquad (7.12)$$
Calculating survival functions with time-varying covariates is a little more complicated, because we need to specify a path or trajectory for each variable. In the birth intervals example, one could calculate a survival function for women who breastfeed for six months and then wean. This would be done by using the hazard corresponding to $x(t) = 1$ for months 0 to 6, and then the hazard corresponding to $x(t) = 0$ from month 6 onwards. Unfortunately, the simplicity of Equation 7.11 is lost. Because $x(t)$ changes inside the integral that defines the cumulative hazard, we can no longer simply raise the baseline survival function to a power.
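For a given covariate path, the survival function is $S(t) = \exp\{-\int_0^t \lambda_0(u)\exp\{x(u)'\beta\}\,du\}$. The sketch below evaluates this integral with the midpoint rule. It assumes a hypothetical constant monthly hazard of conception of 0.05 and a hypothetical breastfeeding effect $\beta = -1$. The covariate path has $x(u) = 1$ for months 0 to 6 and $x(u) = 0$ afterwards. These values and all function names are assumptions, not estimates.

```python
import math

def survival_with_path(baseline_hazard, beta, x_path, t, steps=10_000):
    """S(t) = exp(-integral_0^t lambda_0(u) * exp(beta * x(u)) du), midpoint rule.

    x_path gives the (scalar) covariate value at duration u.
    """
    h = t / steps
    integral = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        integral += baseline_hazard(u) * math.exp(beta * x_path(u))
    return math.exp(-integral * h)

def constant_baseline(u):
    """Hypothetical constant monthly hazard of conception."""
    return 0.05

def breastfeed_six_months(u):
    """Hypothetical path: breastfeeding (x = 1) for months 0-6, weaned (x = 0) after."""
    return 1 if u < 6.0 else 0

beta = -1.0  # hypothetical effect of breastfeeding on the log hazard
for month in (3.0, 6.0, 12.0, 24.0):
    print(month, round(survival_with_path(constant_baseline, beta, breastfeed_six_months, month), 4))
```

Because the path is piecewise constant, the result could also be obtained by multiplying survival probabilities over the two segments. The numerical integral is used here because it works for any path and any baseline.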
Time-varying covariates can be introduced in the context of accelerated life models, but this is not so simple and has rarely been done in applications. See Cox and Oakes (1984, p. 66) for more information.
The model may also be generalized to allow for effects that vary over time, and therefore are no longer proportional. It is quite possible, for example, that certain social characteristics might have a large impact on the hazard for children shortly after birth, but may have a relatively small impact later in life. To accommodate such models we may write
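$$\lambda_i(t \mid x_i) = \lambda_0(t)\exp\{x_i'\beta(t)\},$$

where the effect $\beta(t)$ may vary with time.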
This model allows for great generality. In the two-sample case, for example, the model may be written as
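$$\lambda_i(t \mid x) = \begin{cases} \lambda_0(t), & x = 0,\\ \lambda_0(t)\,e^{\beta(t)}, & x = 1, \end{cases}$$

which basically assumes two arbitrary hazard functions, one for each group. Since $\lambda_0(t)$ and $\beta(t)$ are both unrestricted, any pair of positive hazards can be written in this form.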
Usually the form of time dependence of the effects must be specified parametrically in order to be able to identify the model and estimate the parameters. Obvious candidates are polynomials in duration, where $\beta(t)$ is a linear or quadratic function of time. Cox and Oakes (1984, p. 76) show how one can use quick-dampening exponentials to model transient effects.
Note that we have again lost the simple separation of time and covariate effects. Calculating the survival function in this model is again somewhat complicated, because the coefficients are now functions of time and therefore cannot be taken outside the integral. The simple Equation 7.11 does not apply.
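Take the linear case $\beta(t) = b_0 + b_1 t$ in the two-sample model, with a constant baseline hazard $\lambda_0$. Group one's hazard is then $\lambda_0 e^{b_0 + b_1 t}$, which has Gompertz form. The cumulative hazard has the closed form $\lambda_0 e^{b_0}(e^{b_1 t} - 1)/b_1$. The sketch integrates the hazard numerically and checks the result against that closed form. The parameter values describe a hypothetical strong effect that fades over time, and the function names are illustrative.

```python
import math

def cumulative_hazard_numeric(lam0, b0, b1, t, steps=20_000):
    """Integrate lam0 * exp(b0 + b1 * u) over [0, t] (group one, x = 1), midpoint rule."""
    h = t / steps
    return h * sum(lam0 * math.exp(b0 + b1 * (k + 0.5) * h) for k in range(steps))

def cumulative_hazard_closed_form(lam0, b0, b1, t):
    """The same integral in closed form, valid for b1 != 0."""
    return lam0 * math.exp(b0) * (math.exp(b1 * t) - 1.0) / b1

lam0, b0, b1 = 0.05, 1.0, -0.1  # hypothetical values: effect exp(1) at t = 0, fading with t
for t in (1.0, 10.0, 30.0):
    s_numeric = math.exp(-cumulative_hazard_numeric(lam0, b0, b1, t))
    s_exact = math.exp(-cumulative_hazard_closed_form(lam0, b0, b1, t))
    print(t, round(s_numeric, 4), round(s_exact, 4))
```

For other forms of $\beta(t)$ or non-constant baselines no closed form may exist, and the numerical route is the general one.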
The foregoing extensions to time-varying covariates and time-dependent effects may be combined to give the most general version of the hazard rate model, as
$$\lambda_i(t \mid x_i(t)) = \lambda_0(t)\exp\{x_i(t)'\beta(t)\}.$$
The case of breastfeeding status and its effect on the length of birth intervals is a good example that combines the two effects. Breastfeeding status is itself a time-varying covariate $x(t)$, which takes the value one if the woman is breastfeeding her child $t$ months after birth. Breastfeeding may inhibit ovulation and therefore reduce the risk of pregnancy. This effect is known to decline rapidly over time, so it should probably be modeled as a time-dependent effect $\beta(t)$. Again, further progress would require specifying the form of this function of time.
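A minimal sketch of the general model for this example combines two ingredients. The first is a breastfeeding path $x(u)$ equal to one until weaning. The second is a transient effect $\beta(u) = \beta_0 e^{-\delta u}$, one possible choice of a quickly damped exponential. The values $\beta_0 = -2$ and $\delta = 0.3$, the constant monthly baseline hazard of 0.05, and all function names are hypothetical. The code computes the probability of no conception by 24 months for several weaning ages.

```python
import math

def general_survival(baseline_hazard, effect, x_path, t, steps=10_000):
    """S(t) = exp(-integral_0^t lambda_0(u) * exp(x(u) * beta(u)) du), midpoint rule.

    Both the covariate x(u) and its effect beta(u) may change with duration u.
    """
    h = t / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += baseline_hazard(u) * math.exp(x_path(u) * effect(u))
    return math.exp(-total * h)

def baseline(u):
    """Hypothetical constant monthly hazard of conception."""
    return 0.05

def damped_effect(u, beta0=-2.0, delta=0.3):
    """Hypothetical transient effect: beta(u) = beta0 * exp(-delta * u)."""
    return beta0 * math.exp(-delta * u)

def weaned_at(months):
    """Path with x(u) = 1 while breastfeeding (u < months) and 0 afterwards."""
    return lambda u: 1 if u < months else 0

for weaning in (0.0, 3.0, 12.0):
    print(weaning, round(general_survival(baseline, damped_effect, weaned_at(weaning), 24.0), 4))
```

Because the effect fades, extending breastfeeding from 3 to 12 months raises survival by less than the same extension would under a constant effect of $-2$.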
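There are essentially three approaches to fitting survival models. The first is the parametric approach, where we assume a specific functional form for the baseline hazard $\lambda_0(t)$, as in models based on the exponential or Weibull distributions. The second is a flexible or semi-parametric approach, where we make only mild assumptions about $\lambda_0(t)$, for example that it is constant within each of a set of reasonably small intervals of time. The third is a non-parametric approach that estimates the regression coefficients $\beta$ while leaving $\lambda_0(t)$ completely unspecified, relying on the partial likelihood proposed by Cox (1972).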
A complete discussion of these approaches is well beyond the scope of these notes. We will focus on the intermediate or semi-parametric approach because (1) it is sufficiently flexible to provide a useful tool with wide applicability, and (2) it is closely related to Poisson regression analysis.
Continue with 7.4. The Piece-Wise Exponential Model
Copyright © Germán Rodríguez, 1993-2000.
Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.