Germán Rodríguez

Generalized Linear Models
Princeton University
We discuss briefly two extensions of the proportional hazards model to discrete time, starting with a definition of the hazard and survival functions in discrete time and then proceeding to models based on the logit and the complementary log-log transformations.

Let \( T \) be a discrete random variable that takes the values \( t_1 < t_2 < \ldots \) with probabilities

\[ f(t_j) = f_j = \mbox{Pr}\{T = t_j\}. \]We define the survivor function at time \( t_j \) as the probability that the survival time \( T \) is at least \( t_j \)

\[ S(t_j) = S_j = \mbox{Pr}\{T \ge t_j\} = \sum_{k=j}^\infty f_k. \]Next, we define the hazard at time \( t_j \) as the conditional probability of dying at that time given that one has survived to that point, so that

\[\tag{7.17}\lambda(t_j) = \lambda_j = \mbox{Pr}\{T = t_j | T \ge t_j\} = \frac{f_j}{S_j}.\]Note that in discrete time the hazard is a conditional probability rather than a rate. However, the general result expressing the hazard as a ratio of the density to the survival function is still valid.

A further result of interest in discrete time is that the survival function at time \( t_j \) can be written in terms of the hazard at all prior times \( t_1,\ldots,t_{j-1} \), as

\[\tag{7.18}S_j = (1-\lambda_1) (1-\lambda_2) \ldots (1-\lambda_{j-1}).\]In words, this result states that in order to survive to time \( t_j \) one must first survive \( t_1 \), then one must survive \( t_2 \) given that one survived \( t_1 \), and so on, finally surviving \( t_{j-1} \) given survival up to that point. This result is analogous to the result linking the survival function in continuous time to the integrated or cumulative hazard at all previous times.

An example of a survival process that takes place in discrete time is time to conception measured in menstrual cycles. In this case the possible values of \( T \) are the positive integers, \( f_j \) is the probability of conceiving in the \( j \)-th cycle, \( S_j \) is the probability of conceiving in the \( j \)-th cycle or later, and \( \lambda_j \) is the conditional probability of conceiving in the \( j \)-th cycle given that conception had not occurred earlier. The result relating the survival function to the hazard states that in order to get to the \( j \)-th cycle without conceiving, one has to fail in the first cycle, then fail in the second given that one didn’t succeed in the first, and so on, finally failing in the \( (j-1) \)-st cycle given that one hadn’t succeeded yet.
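The relationships among \( \lambda_j \), \( S_j \) and \( f_j \) can be checked numerically. The following is a minimal sketch; the per-cycle hazards are made-up illustrative values, not data, and the function names are our own.

```python
def survival_from_hazard(hazards):
    """Return [S_1, ..., S_J], with S_j = prod_{k<j} (1 - lambda_k) (Equation 7.18)."""
    survival = []
    s = 1.0  # S_1 = 1: everyone is still at risk at the first time point
    for lam in hazards:
        survival.append(s)
        s *= 1.0 - lam
    return survival


def density_from_hazard(hazards):
    """Return [f_1, ..., f_J], with f_j = lambda_j * S_j (Equation 7.17 rearranged)."""
    return [lam * s for lam, s in zip(hazards, survival_from_hazard(hazards))]


# Hypothetical conditional probabilities of conceiving in cycles 1 to 4
hazards = [0.30, 0.25, 0.20, 0.15]
S = survival_from_hazard(hazards)
f = density_from_hazard(hazards)

for j, (lam, s_j, f_j) in enumerate(zip(hazards, S, f), start=1):
    print(f"cycle {j}: hazard={lam:.2f}  S_j={s_j:.4f}  f_j={f_j:.4f}")
```

In this sketch \( S_3 = (1-0.30)(1-0.25) = 0.525 \), the probability of reaching the third cycle without having conceived.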

Cox (1972) proposed an extension of the proportional hazards model to discrete time by working with the conditional odds of dying at each time \( t_j \) given survival up to that point. Specifically, he proposed the model

\[ \frac{\lambda(t_j|\boldsymbol{x}_i)}{1-\lambda(t_j|\boldsymbol{x}_i)} = \frac{\lambda_0(t_j)}{1-\lambda_0(t_j)} \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}, \]where \( \lambda(t_j|\boldsymbol{x}_i) \) is the hazard at time \( t_j \) for an individual with covariate values \( \boldsymbol{x}_i \), \( \lambda_0(t_j) \) is the baseline hazard at time \( t_j \), and \( \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\} \) is the relative risk associated with covariate values \( \boldsymbol{x}_i \).

Taking logs, we obtain a model on the *logit* of the hazard or conditional probability of dying at \( t_j \) given survival up to that time,

\[\tag{7.19}\mbox{logit}\lambda(t_j|\boldsymbol{x}_i) = \alpha_j + \boldsymbol{x}_i'\boldsymbol{\beta},\]where \( \alpha_j=\mbox{logit}\lambda_0(t_j) \) is the logit of the baseline hazard and \( \boldsymbol{x}_i'\boldsymbol{\beta} \) is the effect of the covariates on the logit of the hazard. Note that the model essentially treats time as a discrete factor by introducing one parameter \( \alpha_j \) for each possible time of death \( t_j \). Interpretation of the parameters \( \boldsymbol{\beta} \) associated with the other covariates follows along the same lines as in logistic regression.
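To illustrate the interpretation, the sketch below computes hazards under the logit model for a single binary covariate and confirms that the conditional odds ratio is the same at every time point. The baseline hazards and the coefficient are hypothetical values chosen only for illustration.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


baseline_hazards = [0.10, 0.20, 0.15]  # hypothetical lambda_0(t_j)
beta = 0.5                             # hypothetical coefficient of a binary covariate

alphas = [logit(lam0) for lam0 in baseline_hazards]  # alpha_j = logit lambda_0(t_j)

odds_ratios = []
for j, (lam0, alpha) in enumerate(zip(baseline_hazards, alphas), start=1):
    lam1 = inv_logit(alpha + beta)  # hazard for an individual with x = 1
    ratio = (lam1 / (1.0 - lam1)) / (lam0 / (1.0 - lam0))
    odds_ratios.append(ratio)
    print(f"t_{j}: baseline={lam0:.3f}  x=1 hazard={lam1:.3f}  odds ratio={ratio:.4f}")
```

Each odds ratio equals \( e^{0.5} \approx 1.65 \) regardless of the baseline hazard, which is the sense in which the model is proportional on the odds scale.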

In fact, the analogy with logistic regression goes further: we can fit the discrete-time proportional-hazards model by running a logistic regression on a set of pseudo-observations generated as follows. Suppose individual \( i \) dies or is censored at time point \( t_{j(i)} \). We generate death indicators \( d_{ij} \) that take the value one if individual \( i \) died at time \( t_j \) and zero otherwise, generating one for each discrete time from \( t_1 \) to \( t_{j(i)} \). To each of these indicators we associate a copy of the covariate vector \( \boldsymbol{x}_i \) and a label \( j \) identifying the time point. The proportional hazards model in Equation 7.19 can then be fit by treating the \( d_{ij} \) as independent Bernoulli observations with probability given by the hazard \( \lambda_{ij} \) for individual \( i \) at time point \( t_j \).
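The construction of pseudo-observations can be sketched in a few lines. The record layout and the function name `expand_to_pseudo_observations` are our own assumptions for illustration. A censored individual is taken to contribute zeros at every time point through \( t_{j(i)} \).

```python
def expand_to_pseudo_observations(records):
    """Expand one record per individual into one row per time point at risk.

    Each record is (x, last_j, died): the covariate vector, the 1-based index
    j(i) of the last time point observed, and True for a death or False for
    censoring.
    """
    rows = []
    for i, (x, last_j, died) in enumerate(records, start=1):
        for j in range(1, last_j + 1):
            # The indicator is one only at the time of death
            d_ij = 1 if (died and j == last_j) else 0
            rows.append({"id": i, "time": j, "d": d_ij, "x": x})
    return rows


# Two hypothetical individuals: one dies at t_3, one is censored at t_2
records = [
    ((1.0,), 3, True),
    ((0.0,), 2, False),
]
pseudo = expand_to_pseudo_observations(records)
for row in pseudo:
    print(row)
```

The resulting rows, with `time` treated as a factor, are what one would pass to any logistic regression routine.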

More generally, we can group pseudo-observations with identical covariate values. Let \( d_{ij} \) denote the number of deaths and \( n_{ij} \) the total number of individuals with covariate values \( \boldsymbol{x}_i \) observed at time point \( t_j \). Then we can treat \( d_{ij} \) as binomial with parameters \( n_{ij} \) and \( \lambda_{ij} \), where the latter satisfies the proportional hazards model.
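As a rough sketch of the grouping step, individual pseudo-observations can be aggregated into deaths and numbers at risk for each combination of covariate pattern and time point. The input rows below are hypothetical, and the tuple layout `(x, j, d)` is an assumption made for this example.

```python
from collections import defaultdict


def group_pseudo_observations(rows):
    """Aggregate (x, j, d) rows into {(x, j): (d_ij, n_ij)} binomial counts."""
    counts = defaultdict(lambda: [0, 0])  # [deaths, number observed]
    for x, j, d in rows:
        counts[(x, j)][0] += d
        counts[(x, j)][1] += 1
    return {key: tuple(value) for key, value in sorted(counts.items())}


# Hypothetical pseudo-observations: (covariate pattern, time index, death indicator)
rows = [
    ("treated", 1, 0), ("treated", 2, 1),
    ("treated", 1, 0), ("treated", 2, 0),
    ("control", 1, 1),
    ("control", 1, 0), ("control", 2, 1),
]
grouped = group_pseudo_observations(rows)
for (x, j), (d_ij, n_ij) in grouped.items():
    print(f"x={x:8s} t_{j}: deaths={d_ij} at risk={n_ij}")
```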

The proof of this result runs along the same lines as the proof of the equivalence of the Poisson likelihood and the likelihood for piece-wise exponential survival data under non-informative censoring in Section 7.4.3, and relies on Equation 7.18, which writes the probability of surviving to time \( t_j \) as a product of the complements of the conditional hazards at all previous times. It is important to note that we do not assume that the pseudo-observations are independent and have a Bernoulli or binomial distribution. Rather, we note that the likelihood function for the discrete-time survival model under non-informative censoring coincides with the binomial likelihood that would be obtained by treating the death indicators as independent Bernoulli or binomial.
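The coincidence of the two likelihoods can be verified numerically for a single individual. The sketch below assumes that a death at \( t_j \) contributes \( f_j = \lambda_j S_j \) and that an individual censored at \( t_j \) is known to have survived that time point, so it contributes \( (1-\lambda_j) S_j \). The hazards are hypothetical values.

```python
import math


def survival_loglik(hazards, last_j, died):
    """Log-likelihood contribution written directly from the survival model."""
    # S_j is the product of (1 - lambda_k) over k < j (Equation 7.18)
    s_j = math.prod(1.0 - lam for lam in hazards[: last_j - 1])
    lam_j = hazards[last_j - 1]
    return math.log(lam_j * s_j) if died else math.log((1.0 - lam_j) * s_j)


def bernoulli_loglik(hazards, last_j, died):
    """Log-likelihood treating the death indicators d_ij as independent Bernoulli."""
    total = 0.0
    for j in range(1, last_j + 1):
        d = 1 if (died and j == last_j) else 0
        lam = hazards[j - 1]
        total += d * math.log(lam) + (1 - d) * math.log(1.0 - lam)
    return total


hazards = [0.10, 0.20, 0.15, 0.30]  # hypothetical lambda_ij for one individual
for last_j, died in [(1, True), (3, True), (4, False)]:
    a = survival_loglik(hazards, last_j, died)
    b = bernoulli_loglik(hazards, last_j, died)
    print(f"j(i)={last_j} died={died}: survival={a:.6f}  Bernoulli={b:.6f}")
```

The two columns agree in every case, although the indicators \( d_{ij} \) for a given individual are clearly not independent.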

Time-varying covariates and time-dependent effects can be introduced in this model along the same lines as before. In the case of time-varying covariates, note that only the values of the covariates at the discrete times \( t_1 < t_2 < \ldots \) are relevant. Time-dependent effects are introduced as interactions between the covariates and the discrete factor (or set of dummy variables) representing time.

An alternative extension of the proportional hazards model to discrete time starts from the survival function, which in a proportional hazards framework can be written as

\[ S(t_j|\boldsymbol{x}_i) = S_0(t_j)^{\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}}, \]where \( S(t_j|\boldsymbol{x}_i) \) is the probability that an individual with covariate values \( \boldsymbol{x}_i \) will survive up to time point \( t_j \), and \( S_0(t_j) \) is the baseline survival function. Recalling Equation 7.18 for the discrete survival function, we obtain a similar relationship for the complement of the hazard function, namely

\[ 1-\lambda(t_j|\boldsymbol{x}_i) = [1-\lambda_0(t_j)]^{ \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}}, \]so that solving for the hazard for individual \( i \) at time point \( t_j \) we obtain the model

\[ \lambda(t_j|\boldsymbol{x}_i) = 1 - [1-\lambda_0(t_j)]^{ \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}}. \]The transformation that makes the right hand side a linear function of the parameters is the complementary log-log. Applying this transformation we obtain the model

\[\tag{7.20}\log(-\log(1-\lambda(t_j|\boldsymbol{x}_i))) = \alpha_j + \boldsymbol{x}_i'\boldsymbol{\beta},\]where \( \alpha_j = \log(-\log(1-\lambda_0(t_j))) \) is the complementary log-log transformation of the baseline hazard.
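A quick numerical check of this step: the hazard obtained by raising the complement of the baseline hazard to the power of the relative risk should equal the inverse complementary log-log of the linear predictor. The baseline hazards and the value of the linear predictor below are hypothetical.

```python
import math


def cloglog(p):
    """Complementary log-log transformation, log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta):
    """Inverse of the complementary log-log, 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


baseline_hazards = [0.10, 0.20, 0.15]  # hypothetical lambda_0(t_j)
xb = 0.4                               # hypothetical linear predictor x_i'beta

gaps = []
for j, lam0 in enumerate(baseline_hazards, start=1):
    direct = 1.0 - (1.0 - lam0) ** math.exp(xb)  # hazard from the power form
    alpha_j = cloglog(lam0)
    via_link = inv_cloglog(alpha_j + xb)         # hazard from Equation 7.20
    gaps.append(abs(direct - via_link))
    print(f"t_{j}: direct={direct:.6f}  via c-log-log={via_link:.6f}")
```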

This model can be fitted to discrete survival data by generating pseudo-observations as before and fitting a generalized linear model with binomial error structure and complementary log-log link. In other words, the equivalence between the binomial likelihood and the discrete-time survival likelihood under non-informative censoring holds both for the logit and complementary log-log links.

It is interesting to note that this model can be obtained by grouping time in the continuous-time proportional-hazards model. To see this point let us assume that time is continuous and we are really interested in the standard proportional hazards model

\[ \lambda(t|\boldsymbol{x}_i) = \lambda_0(t) \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}. \]Suppose, however, that time is grouped into intervals with boundaries \( 0=\tau_0 < \tau_1 < \ldots < \tau_J=\infty \), and that all we observe is whether an individual survives or dies in an interval. Note that this construction imposes some constraints on censoring. If an individual is censored at some point inside an interval, we do not know whether it would have survived the interval or not. Therefore we must censor it at the end of the previous interval, which is the last point for which we have complete information. Unlike the piece-wise exponential set-up, here we cannot use information about exposure to part of an interval. On the other hand, it turns out that we do not need to assume that the hazard is constant in each interval.

Let \( \lambda_{ij} \) denote the discrete hazard or conditional probability that individual \( i \) will die in interval \( j \) given that it was alive at the start of the interval. This probability is the same as the complement of the conditional probability of surviving the interval given that one was alive at the start, and can be written as

\[ \tag{7.21}\begin{eqnarray*} \lambda_{ij} &= &1- \mbox{Pr}\{T_i>\tau_j|T_i>\tau_{j-1}\} \cr &= & 1-\exp\{ - \int_{\tau_{j-1}}^{\tau_j} \lambda(t|\boldsymbol{x}_i) dt\} \cr &= & 1-\left[\exp\{-\int_{\tau_{j-1}}^{\tau_j} \lambda_0(t)dt\}\right]^{\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}} \cr &= & 1- (1-\lambda_j)^{\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}},\end{eqnarray*} \]where \( \lambda_j \) is the baseline probability of dying in interval \( j \) given survival to the start of the interval. The second line follows from Equation 7.4 relating the survival function to the integrated hazard, the third line follows from the proportional hazards assumption, and the last line defines \( \lambda_j \).
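The derivation can be illustrated with a baseline hazard that is not constant within intervals. The sketch below assumes a Weibull-type baseline with cumulative hazard \( t^{1.5} \), a handful of finite interval boundaries, and a hypothetical linear predictor; none of these values come from the text. It compares the covariate effect on the complementary log-log scale and on the logit scale across intervals.

```python
import math


def cloglog(p):
    return math.log(-math.log(1.0 - p))


def logit(p):
    return math.log(p / (1.0 - p))


def baseline_cum_hazard(t, shape=1.5):
    """Assumed baseline: the integral of lambda_0 from 0 to t equals t**shape."""
    return t ** shape


boundaries = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical boundaries tau_0 < ... < tau_4
xb = 0.7                                 # hypothetical linear predictor x_i'beta

cloglog_diffs, logit_diffs = [], []
for lo, hi in zip(boundaries[:-1], boundaries[1:]):
    delta = baseline_cum_hazard(hi) - baseline_cum_hazard(lo)
    lam_j = 1.0 - math.exp(-delta)                   # baseline interval hazard
    lam_ij = 1.0 - math.exp(-delta * math.exp(xb))   # second line of Equation 7.21
    cloglog_diffs.append(cloglog(lam_ij) - cloglog(lam_j))
    logit_diffs.append(logit(lam_ij) - logit(lam_j))

print("c-log-log differences:", [round(d, 6) for d in cloglog_diffs])
print("logit differences:    ", [round(d, 6) for d in logit_diffs])
```

On the complementary log-log scale the difference equals the linear predictor in every interval, however the boundaries are chosen, whereas on the logit scale it changes from one interval to the next.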

As noted by Kalbfleisch and Prentice (1980, p. 37), “this discrete model is then the uniquely appropriate one for grouped data from the continuous proportional hazards model”. In practice, however, the model with a logit link is used much more often than the model with a c-log-log link, probably because logistic regression is better known than generalized linear models with c-log-log links, and because software for the former is more widely available than for the latter. In fact, the logit model is often used in cases where the piece-wise exponential model would be more appropriate, probably because logistic regression is better known than Poisson regression.

In closing, it may be useful to provide some suggestions regarding the choice of approach to survival analysis using generalized linear models:

If time is truly discrete, then one should probably use the discrete model with a logit link, which has a direct interpretation in terms of conditional odds, and is easily implemented using standard software for logistic regression.

If time is continuous but one only observes it in grouped form, then the complementary log-log link would seem more appropriate. In particular, results based on the c-log-log link should be more robust to the choice of categories than results based on the logit link. However, one cannot take into account partial exposure in a discrete time context, no matter which link is used.

If time is continuous and one is willing to assume that the hazard is constant in each interval, then the piecewise exponential approach based on the Poisson likelihood is preferable. This approach is reasonably robust to the choice of categories and is unique in allowing the use of information from cases that have partial exposure.

Finally, if time is truly continuous and one wishes to estimate the effects of the covariates without making any assumptions about the baseline hazard, then Cox’s (1972) partial likelihood is a very attractive approach.