We now fit a Cox model with shared frailty to data on child mortality in Guatemala. We have analized this data using piecewise exponential models with Gamma heterogeneity, using Stata, see this page. We now try a Cox model with log-normal heterogeneity using R.
The dataset is available as a Stata file in long format (one record per child)
> library(foreign) > gu <- read.dta("https://data.princeton.edu/pop509/PebleyStupp.dta")
The first step is to conduct an analysis equivalent to Pebley and Stupp's original. We will split the data at durations 1, 6, 12, 24 and 60. This opens the option of reproducing their results exactly, and also allows us to create a key time-varying covaraite.
> library(survival) > gux <- survSplit(gu, cut = c(1, 6, 12, 24, 60), + end = "time", event="death", start="t0", episode="dur")
The model includes mother's age and age squared, a linear term on birth order, and an indicator of whether the previous kid in the family died. There are also indicators of the length of the previous interval (in lieu of a factor).
The model also includes a time-varying covariate with time-varying effects.
The variable is the length of the following birth interval, with indicators
f0011 for intervals under a year and
f1223 for intervals of one to two years,
capturing very short and short intervals where a child would face competition from
These are coded as time-varying covariates because a sibling is assumed to affect the life of the index child after it is born. So we consider very short intervals (< 12) only at ahes 12 months and higher, and short intervals (12-23 months) only at ages 24 months and higher. This is the time-varying part. But the effect of very short intervals is also allowed to vary, havving different effects at ages 12-23 and 24 or more. Here are all the variables we need:
> gux <- mutate(gux, + mage2 = mage^2, + i011a1223 = f0011 * (t0 == 12), + i011a24p = f0011 * (t0 == 24), + i1223a24p = f1223 * (t0 == 24))
We are now ready to fit the model using Cox's partial likelihood.
> phaz <- coxph(Surv(t0, time, death) ~ mage + mage2 + borde + pdead + + p0014 + p1523 + p2435 + p36up + i011a1223 + i011a24p + i1223a24p, + data = gux) > phaz Call: coxph(formula = Surv(t0, time, death) ~ mage + mage2 + borde + pdead + p0014 + p1523 + p2435 + p36up + i011a1223 + i011a24p + i1223a24p, data = gux) coef exp(coef) se(coef) z p mage -0.14915 0.86144 0.05824 -2.56 0.010 mage2 0.00257 1.00258 0.00103 2.49 0.013 borde 0.06128 1.06320 0.03344 1.83 0.067 pdead 0.09684 1.10168 0.14949 0.65 0.517 p0014 0.54084 1.71745 0.21254 2.54 0.011 p1523 -0.12197 0.88518 0.18637 -0.65 0.513 p2435 -0.25720 0.77321 0.18456 -1.39 0.163 p36up -0.39112 0.67630 0.20874 -1.87 0.061 i011a1223 0.81302 2.25472 0.71641 1.13 0.256 i011a24p 1.61570 5.03141 0.73659 2.19 0.028 i1223a24p 0.06856 1.07096 0.37692 0.18 0.856 Likelihood ratio test=47.6 on 11 df, p=1.65e-06 n= 13594, number of events= 403
The similarity of the results to those obtained using a piecewise exponential model
is remarkable. The easiest was to do the comparison in R is to use the Poisson trick,
defining exposure as
time - t0 and treating
death as Poisson with mean given by
the product of the hazard rate (which is allowed to depend on duration) and exposure.
We now introduce a shared frailty term at the mother's level to allow for intra-family
correlation in child survival. R allows fitting a frailty model via
coxph by adding
frailty() term to the model formula. There is a new and more general approach
coxme library, which includes the
coxme() function to
fit mixed Cox survival models with Gaussian random effects using a Laplace approximation.
In this example the two approaches give very similar answers, but this is not always the case.
Here's a run. All we do is add a term
(1 | momid) to the model formula to indicate
that we want a random intercept at the mother's level.
> library(coxme) > sfrail <- coxme(Surv(t0, time, death) ~ mage + mage2 + borde + pdead + + p0014 + p1523 + p2435 + p36up + i011a1223 + i011a24p + i1223a24p + + (1 | momid), data = gux)
Again, the results are remarkably similar to the estimates Guo and I obtained using a piecewise exponential model with gamma frailty, and which we have reproduced exactly using Stata. (This in spite of the fact that we have many ties, an average of almost ten deaths per distinct time.)
Let us compare the fixed effect estimates obtained with and without frailty:
> exp(cbind(coef(phaz), coef(sfrail))) [,1] [,2] mage 0.8614439 0.8563498 mage2 1.0025757 1.0026849 borde 1.0631985 1.0578345 pdead 1.1016790 0.9409469 p0014 1.7174514 1.7646553 p1523 0.8851791 0.9049408 p2435 0.7732102 0.7944649 p36up 0.6763022 0.6884454 i011a1223 2.2547157 2.2188139 i011a24p 5.0314096 5.0531009 i1223a24p 1.0709625 1.0696592
The estimates of the covariate effects are remarkably stable. The one change worth
mentioning is the coefficient for
pdead, which changes sign, from
10.3% higher risk to 7.3% lower risk when the previous child died. This variable
was clearly acting as a proxy for unobserved family effects.
The estimate of the variance of the random effect is 0.178. Because a log normal frailty term can be written in the log-scale as σ x where z is standard normal, we can interpret the estimate in the same way as other Cox coefficients. Specifically, exponentiating the standard deviation of 0.421 to obtain 1.524 we learn that children in families with frailty one standard deviation above the mean have 52.4% higher risk than children in average families with the same observed covariate values.
The piecewise exponential model with gamma frailty had a variance of 0.214. To compare results note that when log-frailty is N(0, σ2) frailty itself has variance (exp(σ2)-1) exp(σ2), so 0.178 translates to 0.232, a much closer result. (This also affects the baseline hazard, as mean frailty is not one but exp(σ2/2) or 1.093 in this case, but in a Cox model the point is moot.)
By the way
coxph() can also fit a model with shared frailty via penalized partial likelihood
by adding the model formula term
frailty(momid), for which the default is gamma frailty. It estimates the variance as 0.2. For gamma frailty the penalized likelihood produces exact
maximum likelihood estimates. Alternatively, one can fit a model using log-normal frailty by relying on a Laplace
approximation to the marginal likelihood.
Stata users: Stata can fit both models. The command
stcox with the
efron option gives
exactly the same results as here for the first model. Adding the shared frailty option
shared(momid) fits a model using gamma frailty, but is very slow, taking 46 minutes on
my home machine (compared to about 20 seconds in R).
The results, however, are very similar to those obtained here if you allow for the use
of a different frailty distribution (an almost identical to the R results using gamma
rather than lognormal frailty). In particular, the variance of gamma frailty is estimated as 0.210.