Germán Rodríguez

Generalized Linear Models
Princeton University
We will consider fitting a proportional hazards model of the usual form

\[\tag{7.13}\lambda_i(t|\boldsymbol{x}_i) = \lambda_0(t) \exp \{ \boldsymbol{x}b \}\]under relatively mild assumptions about the baseline hazard \( \lambda_0(t) \).

Consider partitioning duration into \( J \) intervals with cutpoints \( 0 = \tau_0 < \tau_1 < \ldots < \tau_J = \infty \). We will define the \( j \)-th interval as \( [\tau_{j-1},\tau_j) \), extending from the \( (j-1) \)-st boundary to the \( j \)-th and including the former but not the latter.

We will then assume that the baseline hazard is *constant*
within each interval, so that

Thus, we model the baseline hazard \( \lambda_0(t) \) using \( J \) parameters \( \lambda_1,\ldots,\lambda_J \), each representing the risk for the reference group (or individual) in one particular interval. Since the risk is assumed to be piece-wise constant, the corresponding survival function is often called a piece-wise exponential.

Clearly, judicious choice of the cutpoints should allow us to approximate reasonably well almost any baseline hazard, using closely-spaced boundaries where the hazard varies rapidly and wider intervals where the hazard changes more slowly.

Figure 7.2 Approximating a Survival Curve Using a

Piece-wise Constant Hazard Function

Figure 7.2 shows how a Weibull distribution with \( \lambda=1 \) and \( p=0.8 \) can be approximated using a piece-wise exponential distribution with boundaries at 0.5, 1.5 and 3.5. The left panel shows how the piece-wise constant hazard can follow only the broad outline of the smoothly declining Weibull hazard yet, as shown on the right panel, the corresponding survival curves are indistinguishable.

let us now introduce some covariates in the context of the proportional hazards model in Equation 7.13, assuming that the baseline hazard is piece-wise constant as in Equation 7.14. We will write the model as

\[\tag{7.15}\lambda_{ij} = \lambda_j \exp\{ \boldsymbol{x}_i'\boldsymbol{\beta} \},\]where \( \lambda_{ij} \) is the hazard corresponding to individual \( i \) in interval \( j \), \( \lambda_j \) is the baseline hazard for interval \( j \), and \( \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\} \) is the relative risk for an individual with covariate values \( \boldsymbol{x}_i \), compared to the baseline, at any given time.

Taking logs, we obtain the additive log-linear model

\[\tag{7.16}\log \lambda_{ij} = \alpha_j + \boldsymbol{x}_i'\boldsymbol{\beta},\]where \( \alpha_j=\log\lambda_j \) is the log of the baseline hazard. Note that the result is a standard log-linear model where the duration categories are treated as a factor. Since we have not included an explicit constant, we do not have to impose restrictions on the \( \alpha_j \). If we wanted to introduce a constant representing the risk in the first interval then we would set \( \alpha_1=0 \), as usual.

The model can be extended to introduce time-varying covariates and time-dependent effects, but we will postpone discussing the details until we study estimation of the simpler proportional hazards model.

Holford (1980) and Laird and Oliver (1981), in papers produced independently and published very close to each other, noted that the piece-wise proportional hazards model of the previous subsection was equivalent to a certain Poisson regression model. We first state the result and then sketch its proof.

Recall that we observe \( t_i \), the total time lived by the \( i \)-th individual, and \( d_i \), a death indicator that takes the value one if the individual died and zero otherwise. We will now define analogous measures for each interval that individual \( i \) goes through. You may think of this process as creating a bunch of pseudo-observations, one for each combination of individual and interval.

First we create measures of exposure. Let \( t_{ij} \) denote the time lived by the \( i \)-th individual in the \( j \)-th interval, that is, between \( \tau_{j-1} \) and \( \tau_j \). If the individual lived beyond the end of the interval, so that \( t_i > \tau_j \), then the time lived in the interval equals the width of the interval and \( t_{ij}=\tau_j-\tau_{j-1} \). If the individual died or was censored in the interval, i.e. if \( \tau_{j-1} < t_i < \tau_j \), then the timed lived in the interval is \( t_{ij}=t_i-\tau_{j-1} \), the difference between the total time lived and the lower boundary of the interval. We only consider intervals actually visited, but obviously the time lived in an interval would be zero if the individual had died before the start of the interval and \( t_i < \tau_{j-1} \).

Now we create death indicators.
Let \( d_{ij} \) take the value one if individual \( i \)
dies in interval \( j \) and zero otherwise.
Let \( j(i) \) indicate the interval where \( t_i \) falls,
i.e. the interval where individual \( i \) died or was censored.
We use functional notation to emphasize that this interval
will vary from one individual to another.
If \( t_i \) falls in interval \( j(i) \), say, then \( d_{ij} \) must be zero
for all \( j

Then, the piece-wise exponential model may be fitted to data by treating the death indicators \( d_{ij} \)’s as if they were independent Poisson observations with means

\[ \mu_{ij} = t_{ij} \lambda_{ij}, \]where \( t_{ij} \) is the exposure time as defined above and \( \lambda_{ij} \) is the hazard for individual \( i \) in interval \( j \). Taking logs in this expression, and recalling that the hazard rates satisfy the proportional hazards model in Equation 7.15, we obtain

\[ \log\mu_{ij} = \log t_{ij} + \alpha_j + \boldsymbol{x}_i'\boldsymbol{\beta}, \]where \( \alpha_j=\log\lambda_j \) as before.

Thus, the piece-wise exponential proportional hazards model is equivalent to a Poisson log-linear model for the pseudo observations, one for each combination of individual and interval, where the death indicator is the response and the log of exposure time enters as an offset.

It is important to note that we do not assume that the
\( d_{ij} \) have independent Poisson distributions, because they
clearly do not. If individual \( i \) died in interval \( j(i) \),
then it must have been alive in all prior intervals \( j

The proof is not hard. Recall from Section 7.2.2 that the contribution of the \( i \)-th individual to the log-likelihood function has the general form

\[ \log L_i = d_i \log \lambda_i(t_i) - \Lambda_i(t_i), \]where we have written \( \lambda_i(t) \) for the hazard and \( \Lambda_i(t) \) for the cumulative hazard that applies to the \( i \)-th individual at time \( t \). Let \( j(i) \) denote the interval where \( t_i \) falls, as before.

Under the piece-wise exponential model, the first term in the log-likelihood can be written as

\[ d_i \log \lambda_i(t_i) = d_{ij(i)}\log\lambda_{ij(i)}, \]using the fact that the hazard is \( \lambda_{ij(i)} \) when \( t_i \) is in interval \( j(i) \), and that the death indicator \( d_i \) applies directly to the last interval visited by individual \( i \), and therefore equals \( d_{j(i)} \).

The cumulative hazard in the second term is an integral, and can be written as a sum as follows

\[ \Lambda_i(t_i) = \int_0^{t_i} \lambda_i(t)dt = \sum_{j=1}^{j(i)} t_{ij}\lambda_{ij}, \]where \( t_{ij} \) is the amount of time spent by individual \( i \) in interval \( j \). To see this point note that we need to integrate the hazard from 0 to \( t_i \). We split this integral into a sum of integrals, one for each interval where the hazard is constant. If an individual lives through an interval, the contribution to the integral will be the hazard \( \lambda_{ij} \) multiplied by the width of the interval. If the individual dies or is censored in the interval, the contribution to the integral will be the hazard \( \lambda_{ij} \) multiplied by the time elapsed from the beginning of the interval to the death or censoring time, which is \( t_i-\tau_{j-1} \). But this is precisely the definition of the exposure time \( t_{ij} \).

One slight lack of symmetry in our results is that the hazard leads
to *one* term on \( d_{ij(i)}\log \lambda_{ij(i)} \),
but the cumulative hazard
leads to \( j(i) \) terms, one for each interval from \( j=1 \) to \( j(i) \).
However, we know that \( d_{ij}=0 \) for all \( j

The fact that the contribution of the individual to the log-likelihood is a
*sum* of several terms (so the contribution to the likelihood
is a product of several terms) means that we can treat each of the
terms as representing an independent observation.

The final step is to identify the contribution of each pseudo-observation, and we note here that it agrees, except for a constant, with the likelihood one would obtain if \( d_{ij} \) had a Poisson distribution with mean \( \mu_{ij} = t_{ij}\lambda_{ij} \). To see this point write the Poisson log-likelihood as

\[ \log L_{ij} = d_{ij}\log \mu_{ij} - \mu_{ij} = d_{ij}\log(t_{ij}\lambda_{ij}) - t_{ij}\lambda_{ij}. \]This expression agrees with the log-likelihood above except for the term \( d_{ij}\log(t_{ij}) \), but this is a constant depending on the data and not on the parameters, so it can be ignored from the point of view of estimation. This completes the proof.\( \Box \)

This result generalizes the observation made at the end of Section 7.2.2 noting the relationship between the likelihood for censored exponential data and the Poisson likelihood. The extension is that instead of having just one ‘Poisson’ death indicator for each individual, we have one for each interval visited by each individual.

Generating pseudo-observations can substantially increase the size of the dataset, perhaps to a point where analysis is impractical. Note, however, that the number of distinct covariate patterns may be modest even when the total number of pseudo-observations is large. In this case one can group observations, adding up the measures of exposure and the death indicators. In this more general setting, we can define \( d_{ij} \) as the number of deaths and \( t_{ij} \) as the total exposure time of individuals with characteristics \( \boldsymbol{x}_i \) in interval \( j \). As usual with Poisson aggregate models, the estimates, standard errors and likelihood ratio tests would be exactly the same as for individual data. Of course, the model deviances would be different, representing goodness of fit to the aggregate rather than individual data, but this may be a small price to pay for the convenience of working with a small number of units.

It should be obvious from the previous development that we can easily accommodate time-varying covariates provided they change values only at interval boundaries. In creating the pseudo-observations required to set-up a Poisson log-likelihood, one would normally replicate the vector of covariates \( \boldsymbol{x}_i \), creating copies \( \boldsymbol{x}_{ij} \), one for each interval. However, there is nothing in our development requiring these vectors to be equal. We can therefore redefine \( \boldsymbol{x}_{ij} \) to represent the values of the covariates of individual \( i \) in interval \( j \), and proceed as usual, rewriting the model as

\[ \log \lambda_{ij} = \alpha_j + \boldsymbol{x}_{ij}'\boldsymbol{\beta}. \]Requiring the covariates to change values only at interval boundaries may seem restrictive, but in practice the model is more flexible than it might seem at first, because we can always further split the pseudo observations. For example, if we wished to accommodate a change in a covariate for individual \( i \) half-way through interval \( j \), we could split the pseudo-observation into two, one with the old and one with the new values of the covariates. Each half would get its own measure of exposure and its own death indicator, but both would be tagged as belonging to the same interval, so they would get the same baseline hazard. All steps in the above proof would still hold.

Of course, splitting observations further increases the size of the dataset, and there will usually be practical limitations on how far one can push this approach, even if one uses grouped data. An alternative is to use simpler indicators such as the mean value of a covariate in an interval, perhaps lagged to avoid predicting current hazards using future values of covariates.

It turns out that the piece-wise exponential scheme lends itself easily to the introduction of non-proportional hazards or time-varying effects, provided again that we let the effects vary only at interval boundaries.

To fix ideas, suppose we have a single predictor taking the value \( x_{ij} \) for individual \( i \) in interval \( j \). Suppose further that this predictor is a dummy variable, so its possible values are one and zero. It doesn’t matter for our current purpose whether the value is fixed for the individual or changes from one interval to the next.

In a proportional hazards model we would write

\[ \log \lambda_{ij} = \alpha_j + \beta x_{ij}, \]where \( \beta \) represents the effect of the predictor on the log of the hazard at any given time. Exponentiating, we see that the hazard when \( x=1 \) is \( \exp\{\beta\} \) times the hazard when \( x=0 \), and this effect is the same at all times. This is a simple additive model on duration and the predictor of interest.

To allow for a time-dependent effect of the predictor, we would write

\[ \log \lambda_{ij} = \alpha_j + \beta_j x_{ij}, \]where \( \beta_j \) represents the effect of the predictor on the hazard during interval \( j \). Exponentiating, we see that the hazard in interval \( j \) when \( x=1 \) is \( \exp\{\beta_j\} \) times the hazard in interval \( j \) when \( x=0 \), so the effect may vary from one interval to the next. Since the effect of the predictor depends on the interval, we have a form of interaction between the predictor and duration, which might be more obvious if we wrote the model as

\[ \log \lambda_{ij} = \alpha_j + \beta x_{ij} + (\alpha\beta)_j x_{ij}. \]These models should remind you of the analysis of covariance models of Chapter 2. Here \( \alpha \) plays the role of the intercept and \( \beta \) the role of the slope. The proportional hazards model has different intercepts and a common slope, so it’s analogous to the parallel lines model. The model with a time-dependent effect has different intercepts and different slopes, and is analogous to the model with an interaction.

To sum up, we can accommodate non-proportionality of hazards simply by introducing interactions with duration. Obviously we can also test the assumption of proportionality of hazards by testing the significance of the interactions with duration. We are now ready for an example.