Germán Rodríguez

Survival Analysis
Princeton University
We continue our analysis of the Gehan data by fitting a proportional hazards model. This is the same dataset used as an example in Cox’s original paper: Cox, D.R. (1972) Regression Models and Life Tables (with discussion), *Journal of the Royal Statistical Society, Series B*, **34**: 187–220.

The first task is to read and `stset` the data. I also create a dummy variable for `treated`.

    . infile group weeks relapse using ///
    >   https://data.princeton.edu/wws509/datasets/gehan.raw, clear
    (42 observations read)

    . gen treated = group == 2

    . stset weeks, failure(relapse)

         failure event:  relapse != 0 & relapse < .
    obs. time interval:  (0, weeks]
     exit on or before:  failure

    ──────────────────────────────────────────────────────────────────────────────
             42  total observations
              0  exclusions
    ──────────────────────────────────────────────────────────────────────────────
             42  observations remaining, representing
             30  failures in single-record/single-failure data
            541  total analysis time at risk and under observation
                                                    at risk from t =         0
                                         earliest observed entry t =         0
                                              last observed exit t =        35

    > gehan <- read.table("https://data.princeton.edu/wws509/datasets/gehan.dat")
    > names(gehan)
    [1] "group"   "weeks"   "relapse"
    > library(dplyr)
    > summarize(gehan, events = sum(relapse), exposure = sum(weeks))
      events exposure
    1     30      541
    > gehan <- mutate(gehan, treated = as.numeric(group == "treated"))

Here’s a run fitting a Cox model with all the defaults:

    . stcox treated

             failure _d:  relapse
       analysis time _t:  weeks

    Iteration 0:   log likelihood =  -93.98505
    Iteration 1:   log likelihood = -86.385606
    Iteration 2:   log likelihood = -86.379623
    Iteration 3:   log likelihood = -86.379622
    Refining estimates:
    Iteration 0:   log likelihood = -86.379622

    Cox regression -- Breslow method for ties

    No. of subjects =           42                  Number of obs    =          42
    No. of failures =           30
    Time at risk    =          541
                                                    LR chi2(1)       =       15.21
    Log likelihood  =   -86.379622                  Prob > chi2      =      0.0001

    ─────────────┬────────────────────────────────────────────────────────────────
              _t │ Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ─────────────┼────────────────────────────────────────────────────────────────
         treated │   .2210887   .0905501    -3.68   0.000     .0990706    .4933877
    ─────────────┴────────────────────────────────────────────────────────────────

    . di _b[treated]
    -1.5091914

    > library(survival)
    > cm <- coxph(Surv(weeks, relapse) ~ treated, data = gehan)
    > cm
    Call:
    coxph(formula = Surv(weeks, relapse) ~ treated, data = gehan)

               coef exp(coef) se(coef)      z        p
    treated -1.5721    0.2076   0.4124 -3.812 0.000138

    Likelihood ratio test=16.35  on 1 df, p=5.261e-05
    n= 42, number of events= 30

Stata reports hazard ratios unless you specify the option `nohr`. R reports log-relative risks, but also exponentiates the coefficients to obtain hazard ratios. Either way, we see that the treatment reduced the risk of relapse by almost 80% at any duration.
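As a quick check on that interpretation, the following sketch converts the R estimate into a hazard ratio, a percent reduction in risk, and an approximate 95% Wald interval. The coefficient and standard error are copied from the output above; the object names are only illustrative.

```r
# Estimates copied from the coxph output above (Efron method for ties)
b  <- -1.5721   # log hazard ratio for treated
se <-  0.4124   # its standard error

hr        <- exp(b)                                  # hazard ratio
reduction <- 100 * (1 - hr)                          # percent reduction in risk
ci        <- exp(b + c(-1, 1) * qnorm(0.975) * se)   # Wald interval, ratio scale

round(c(hr = hr, reduction = reduction, lower = ci[1], upper = ci[2]), 3)
```

The interval is computed on the log scale and then exponentiated, which is how Stata obtains the confidence limits it reports for hazard ratios.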

There are various options for handling ties. Cox’s original proposal relies on the discrete partial likelihood. A closely-related alternative due to Kalbfleisch and Prentice uses the marginal likelihood of the ranks. Both methods are computationally intensive. A good fast approximation is due to Efron, and a simpler and faster, though somewhat less accurate, method is due to Breslow and Peto. See the notes for details.
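For reference, here is a sketch of the two approximations, in notation chosen here for illustration (the notes may use different symbols). Let $t_j$ denote the distinct failure times, $R_j$ the risk set at $t_j$, $D_j$ the set of $d_j$ subjects failing at $t_j$, and $s_j = \sum_{i \in D_j} x_i$. Breslow's approximation uses the full risk set for every tied failure:

$$
L_B(\beta) = \prod_j \frac{\exp(s_j'\beta)}{\left[\sum_{i \in R_j} \exp(x_i'\beta)\right]^{d_j}}
$$

Efron's approximation removes a fraction $k/d_j$ of the tied failures from the denominator for the $k$-th of them:

$$
L_E(\beta) = \prod_j \frac{\exp(s_j'\beta)}{\prod_{k=0}^{d_j-1}\left[\sum_{i \in R_j} \exp(x_i'\beta) - \frac{k}{d_j}\sum_{i \in D_j} \exp(x_i'\beta)\right]}
$$

When there are no ties ($d_j = 1$ for all $j$) both expressions reduce to Cox's partial likelihood.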

In terms of our software, Stata implements all four using the options `exactp`, `exactm`, `efron` and `breslow`. The default is `breslow`, but I recommend you always use `efron`. R implements all but the marginal likelihood, using the argument `ties`, with possible values “breslow”, “efron” and “exact”. The default is “efron”. Let us compare them all.

    . estimates store breslow

    . quietly stcox treated, efron

    . estimates store efron

    . quietly stcox treated, exactm

    . estimates store exactm

    . quietly stcox treated, exactp

    . estimates store exactp

    . estimates table breslow efron exactm exactp

    ─────────────┬────────────────────────────────────────────────────
        Variable │  breslow       efron        exactm       exactp
    ─────────────┼────────────────────────────────────────────────────
         treated │ -1.5091914   -1.5721251   -1.5981915    -1.628244
    ─────────────┴────────────────────────────────────────────────────

    > cmb <- coxph(Surv(weeks, relapse) ~ treated, data = gehan, ties="breslow")
    > cmp <- coxph(Surv(weeks, relapse) ~ treated, data = gehan, ties="exact")
    > data.frame(breslow = coef(cmb), efron = coef(cm), exact = coef(cmp))
              breslow     efron     exact
    treated -1.509191 -1.572125 -1.628244

As you can see, Efron’s approximation is closer to the exact partial likelihood than Breslow’s. The marginal likelihood is even closer. Cox reported an estimate of -1.65 for the coefficient in his paper, which he obtained by evaluating the likelihood in a grid of points. The more exact calculations here yield -1.63, so he did pretty well by hand (see page 197 in the paper).
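To make the two approximations concrete, here is a minimal base R sketch that maximizes the Breslow and Efron log partial likelihoods for a single coefficient. The data are typed in by hand so the example runs on its own, and are assumed to match `gehan.dat`; the helpers `logpl()` and `fit()` are hypothetical names, not part of any package.

```r
# Gehan data typed in by hand so the sketch is self-contained; assumed to
# match gehan.dat (42 patients, 30 relapses, 541 weeks of exposure)
weeks <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23,
           6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35)
relapse <- c(rep(1, 21),
             1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)
treated <- rep(c(0, 1), each = 21)
stopifnot(sum(relapse) == 30, sum(weeks) == 541)

# Log partial likelihood for one coefficient (hypothetical helper)
logpl <- function(beta, ties = c("breslow", "efron")) {
  ties <- match.arg(ties)
  ll <- 0
  for (tj in sort(unique(weeks[relapse == 1]))) {
    at_risk <- weeks >= tj                   # risk set at tj
    failed  <- weeks == tj & relapse == 1    # failures tied at tj
    d <- sum(failed)
    sum_risk   <- sum(exp(beta * treated[at_risk]))
    sum_failed <- sum(exp(beta * treated[failed]))
    # Breslow uses the full risk set d times; Efron removes a fraction
    # k/d of the tied failures for the k-th of them, k = 0, ..., d - 1
    k <- if (ties == "efron") (seq_len(d) - 1) / d else rep(0, d)
    ll <- ll + beta * sum(treated[failed]) - sum(log(sum_risk - k * sum_failed))
  }
  ll
}

# Maximize over a generous interval (hypothetical helper)
fit <- function(ties) {
  optimize(logpl, c(-5, 5), ties = ties, maximum = TRUE, tol = 1e-8)$maximum
}
round(c(breslow = fit("breslow"), efron = fit("efron")), 4)
```

If the data are transcribed correctly, the two estimates should come out close to the Breslow and Efron columns in the table above, and `logpl(0, "breslow")` should agree with the log-likelihood Stata reports at iteration 0.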

One way to test proportionality of hazards is to introduce interactions with duration. In his original paper Cox tried a linear interaction with time. We will do the same, except that he worked with *t - 10* to achieve more orthogonality and we will use *t*.

Stata makes it very easy to introduce interactions with time by providing two options:

- `tvc(varlist)` to specify the variable(s) that we want to interact with time, and
- `texp(expression)` to specify a function of time `_t`, typically just time, with `texp(_t)`, or log-time, with `texp(log(_t))`.

Stata will then create a variable equal to the product of the variable specified in `tvc()` and the time expression specified in `texp()`, and add it to the model. Let us use this technique to interact treatment and time:

    . stcox treated, tvc(treated) texp(_t) efron

             failure _d:  relapse
       analysis time _t:  weeks

    Iteration 0:   log likelihood =  -93.18427
    Iteration 1:   log likelihood =  -85.34729
    Iteration 2:   log likelihood = -85.008964
    Iteration 3:   log likelihood = -85.008326
    Iteration 4:   log likelihood = -85.008326
    Refining estimates:
    Iteration 0:   log likelihood = -85.008326

    Cox regression -- Efron method for ties

    No. of subjects =           42                  Number of obs    =          42
    No. of failures =           30
    Time at risk    =          541
                                                    LR chi2(2)       =       16.35
    Log likelihood  =   -85.008326                  Prob > chi2      =      0.0003

    ─────────────┬────────────────────────────────────────────────────────────────
              _t │ Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ─────────────┼────────────────────────────────────────────────────────────────
    main         │
         treated │   .2057005   .1595878    -2.04   0.042     .0449626    .9410648
    ─────────────┼────────────────────────────────────────────────────────────────
    tvc          │
         treated │   1.000865   .0617494     0.01   0.989       .88687    1.129514
    ─────────────┴────────────────────────────────────────────────────────────────
    Note: Variables in tvc equation interacted with _t.

We find no evidence that the treatment effect changes linearly with time. By the way, we didn’t need to specify `texp(_t)` because this is the default.
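In symbols, and using notation introduced here only for illustration, the model fitted with `tvc()` and `texp()` can be written as

$$
\lambda(t \mid x) = \lambda_0(t)\,\exp\{\beta x + \gamma\, x\, g(t)\},
$$

where $x$ is the treatment dummy and $g(t)$ is the time expression. The hazard ratio for treated versus control at time $t$ is then

$$
\mathrm{HR}(t) = \exp\{\beta + \gamma\, g(t)\},
$$

so the "main" row of the output estimates $e^{\beta}$ and the "tvc" row estimates $e^{\gamma}$. With $g(t) = t$ and the estimates above, $\mathrm{HR}(t) = 0.2057 \times 1.000865^{\,t}$, which moves only from about 0.206 at $t = 0$ to about 0.212 at $t = 35$ weeks.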

Another possibility is to allow different treatment effects at early and late durations, say before and after 10 weeks. This is easily done by changing the time expression:

    . stcox treated, tvc(treated) texp(_t > 10) efron

             failure _d:  relapse
       analysis time _t:  weeks

    Iteration 0:   log likelihood =  -93.18427
    Iteration 1:   log likelihood = -84.972656
    Iteration 2:   log likelihood =  -84.74237
    Iteration 3:   log likelihood = -84.740124
    Iteration 4:   log likelihood = -84.740124
    Refining estimates:
    Iteration 0:   log likelihood = -84.740124

    Cox regression -- Efron method for ties

    No. of subjects =           42                  Number of obs    =          42
    No. of failures =           30
    Time at risk    =          541
                                                    LR chi2(2)       =       16.89
    Log likelihood  =   -84.740124                  Prob > chi2      =      0.0002

    ─────────────┬────────────────────────────────────────────────────────────────
              _t │ Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ─────────────┼────────────────────────────────────────────────────────────────
    main         │
         treated │   .2702224   .1426515    -2.48   0.013     .0960215    .7604559
    ─────────────┼────────────────────────────────────────────────────────────────
    tvc          │
         treated │   .5475566   .4505425    -0.73   0.464     .1091542    2.746741
    ─────────────┴────────────────────────────────────────────────────────────────
    Note: Variables in tvc equation interacted with _t>10.

The point estimates indicate a 73% reduction in risk in the first ten weeks and an additional reduction after ten weeks, for a total of 85% in the later period. However, the difference in treatment effects between the two periods is not significant.
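The arithmetic behind those two percentages can be sketched as follows, using the hazard ratios copied from the Stata output above (the object names are illustrative only). Because the log-hazards add, the hazard ratio after ten weeks is the product of the two reported ratios.

```r
# Hazard ratios copied from the stcox output above
hr_main <- 0.2702224   # treated, first ten weeks
hr_tvc  <- 0.5475566   # extra multiplier for treated after ten weeks

hr_early <- hr_main
hr_late  <- hr_main * hr_tvc   # ratios multiply because log-hazards add

# Percent reduction in risk in each period
round(100 * (1 - c(early = hr_early, late = hr_late)))
```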

Because only times with observed failures contribute to the partial likelihood, we can introduce arbitrary interactions by splitting the data at each failure time. As a sanity check, we verify that we obtain the same estimate as before:

    . gen id = _n

    . streset, id(id)
    -> stset weeks, id(id) failure(relapse)

                    id:  id
         failure event:  relapse != 0 & relapse < .
    obs. time interval:  (weeks[_n-1], weeks]
     exit on or before:  failure

    ──────────────────────────────────────────────────────────────────────────────
             42  total observations
              0  exclusions
    ──────────────────────────────────────────────────────────────────────────────
             42  observations remaining, representing
             42  subjects
             30  failures in single-failure-per-subject data
            541  total analysis time at risk and under observation
                                                    at risk from t =         0
                                         earliest observed entry t =         0
                                              last observed exit t =        35

    . stsplit, at(failures)
    (17 failure times)
    (384 observations (episodes) created)

    . quietly stcox treated, efron

    . di _b[treated]
    -1.5721251

    . estimates store ph

    > # compare with 1 so the subset works whether relapse is coded 0/1 or logical
    > failure_times <- sort(unique(gehan$weeks[gehan$relapse == 1]))
    > gehanx <- survSplit(gehan, cut = failure_times,
    +     event = "relapse", start = "t0", end = "weeks")
    > coef(coxph(Surv(t0, weeks, relapse) ~ treated, data=gehanx))
      treated
    -1.572125
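To see where the extra episodes come from, note that a subject observed until week *w* contributes one episode for each distinct failure time strictly before *w*, plus a final episode ending at *w*. The following base R sketch counts them, with the Gehan data typed in by hand (assumed to match `gehan.dat`) so that it runs on its own.

```r
# Gehan data typed in by hand; assumed to match gehan.dat
weeks <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23,
           6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35)
relapse <- c(rep(1, 21),
             1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)

# Distinct times at which at least one relapse was observed
failure_times <- sort(unique(weeks[relapse == 1]))

# Episodes per subject: one per failure time before exit, plus the last one
episodes <- sapply(weeks, function(w) sum(failure_times < w) + 1)

c(failure_times = length(failure_times),
  episodes      = sum(episodes),
  created       = sum(episodes) - length(weeks))
```

The counts can be compared with the number of failure times and episodes that `stsplit` reports.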

We now introduce a linear interaction with time using the dummy variable for treated. (You could specify the model as `treated * t0`. R will omit `t0` because it is implicit in the baseline hazard, and complain that the model matrix is singular, but the results will be correct. My approach is a bit cleaner.)

    . stcox treated c.treated#c._t, efron

             failure _d:  relapse
       analysis time _t:  weeks
                     id:  id

    Iteration 0:   log likelihood =  -93.18427
    Iteration 1:   log likelihood =  -85.34729
    Iteration 2:   log likelihood = -85.008964
    Iteration 3:   log likelihood = -85.008326
    Iteration 4:   log likelihood = -85.008326
    Refining estimates:
    Iteration 0:   log likelihood = -85.008326

    Cox regression -- Efron method for ties

    No. of subjects =           42                  Number of obs    =         426
    No. of failures =           30
    Time at risk    =          541
                                                    LR chi2(2)       =       16.35
    Log likelihood  =   -85.008326                  Prob > chi2      =      0.0003

    ───────────────┬────────────────────────────────────────────────────────────────
                _t │ Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ───────────────┼────────────────────────────────────────────────────────────────
           treated │   .2057005   .1595878    -2.04   0.042     .0449626    .9410648
                   │
    c.treated#c._t │   1.000865   .0617494     0.01   0.989       .88687    1.129514
    ───────────────┴────────────────────────────────────────────────────────────────

    . lrtest ph .

    Likelihood-ratio test                                 LR chi2(1)  =      0.00
    (Assumption: ph nested in .)                          Prob > chi2 =    0.9888

    . stjoin // back to normal
    (option censored(0) assumed)
    (384 obs. eliminated)

    > cmx <- coxph(Surv(t0, weeks, relapse) ~ treated + treated:weeks, data=gehanx)
    > summary(cmx)
    Call:
    coxph(formula = Surv(t0, weeks, relapse) ~ treated + treated:weeks,
        data = gehanx)

      n= 426, number of events= 30

                        coef  exp(coef)   se(coef)      z Pr(>|z|)
    treated       -1.5813338  0.2057005  0.7758258 -2.038   0.0415 *
    treated:weeks  0.0008651  1.0008655  0.0616960  0.014   0.9888
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  exp(coef) exp(-coef) lower .95 upper .95
    treated          0.2057     4.8614   0.04496    0.9411
    treated:weeks    1.0009     0.9991   0.88687    1.1295

    Concordance= 0.69  (se = 0.045 )
    Rsquare= 0.038   (max possible= 0.354 )
    Likelihood ratio test= 16.35  on 2 df,   p=3e-04
    Wald test            = 14.51  on 2 df,   p=7e-04
    Score (logrank) test = 17.69  on 2 df,   p=1e-04

    > cat("chisq", 2*(logLik(cmx) - logLik(cm)), "\n")
    chisq 0.0001967492

We get a Wald test for the interaction term of *z = 0.01*, and twice the difference in log-likelihoods is 0.00, so clearly there is no evidence of an interaction between treatment and time at risk. Note that these are exactly the same results we got with `tvc()` and `texp()`.

As an alternative, we could allow different treatment effects before and after 10 weeks. We could use the current dataset, but all we really need is to split at 10, so we’ll do just that:

    . stsplit dur, at(10)
    (21 observations (episodes) created)

    . quietly stcox treated, efron // for lrtest

    . estimates store ph

    . gen after10 = dur == 10

    . stcox treated c.treated#c.after10, efron

             failure _d:  relapse
       analysis time _t:  weeks
                     id:  id

    Iteration 0:   log likelihood =  -93.18427
    Iteration 1:   log likelihood = -84.972656
    Iteration 2:   log likelihood =  -84.74237
    Iteration 3:   log likelihood = -84.740124
    Iteration 4:   log likelihood = -84.740124
    Refining estimates:
    Iteration 0:   log likelihood = -84.740124

    Cox regression -- Efron method for ties

    No. of subjects =           42                  Number of obs    =          63
    No. of failures =           30
    Time at risk    =          541
                                                    LR chi2(2)       =       16.89
    Log likelihood  =   -84.740124                  Prob > chi2      =      0.0002

    ────────────────────┬────────────────────────────────────────────────────────────────
                     _t │ Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ────────────────────┼────────────────────────────────────────────────────────────────
                treated │   .2702224   .1426515    -2.48   0.013     .0960215    .7604559
                        │
    c.treated#c.after10 │   .5475566   .4505425    -0.73   0.464     .1091542    2.746741
    ────────────────────┴────────────────────────────────────────────────────────────────

    . lrtest ph .

    Likelihood-ratio test                                 LR chi2(1)  =      0.54
    (Assumption: ph nested in .)                          Prob > chi2 =    0.4638

    . drop dur after10

    . stjoin
    (option censored(0) assumed)
    (21 obs. eliminated)

    > gehan10 <- survSplit(gehan, cut = 10,
    +     event = "relapse", start = "t0", end = "weeks") %>%
    +     mutate(after10 = as.numeric(t0 == 10),
    +         treated = as.numeric(group == "treated"))
    > cm10 <- coxph(Surv(t0, weeks, relapse) ~ treated + treated:after10,
    +     data=gehan10)
    > summary(cm10)
    Call:
    coxph(formula = Surv(t0, weeks, relapse) ~ treated + treated:after10,
        data = gehan10)

      n= 63, number of events= 30

                       coef exp(coef) se(coef)      z Pr(>|z|)
    treated         -1.3085    0.2702   0.5279 -2.479   0.0132 *
    treated:after10 -0.6023    0.5476   0.8228 -0.732   0.4642
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                    exp(coef) exp(-coef) lower .95 upper .95
    treated            0.2702      3.701   0.09602    0.7605
    treated:after10    0.5476      1.826   0.10915    2.7467

    Concordance= 0.69  (se = 0.043 )
    Rsquare= 0.235   (max possible= 0.948 )
    Likelihood ratio test= 16.89  on 2 df,   p=2e-04
    Wald test            = 15.31  on 2 df,   p=5e-04
    Score (logrank) test = 18.91  on 2 df,   p=8e-05

    > cat("chisq", 2*(logLik(cm10) - logLik(cm)), "\n")
    chisq 0.5366016

The Wald test now yields *z = -0.73* (a chi-squared of 0.54), and the likelihood ratio test concurs, with a chi-squared of 0.54 on one d.f. The estimated treatment effect is larger (the hazard ratio is smaller) after 10 weeks, but the difference is not significant. Note that these are exactly the same results we got with `tvc()` and `texp()`.
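The two tests can be reproduced from the numbers printed above. This is only a sketch using rounded values copied from the R output, so the last digits may differ slightly from what the software reports; the object names are illustrative.

```r
# Values copied from the summary(cm10) output and the cat() line above
z  <- -0.6023 / 0.8228   # Wald z for the treated:after10 term
lr <- 0.5366016          # twice the difference in log-likelihoods

wald_chisq <- z^2                                         # Wald chi-squared, 1 d.f.
p_wald <- pchisq(wald_chisq, df = 1, lower.tail = FALSE)
p_lr   <- pchisq(lr, df = 1, lower.tail = FALSE)

round(c(wald_chisq = wald_chisq, p_wald = p_wald, lr = lr, p_lr = p_lr), 4)
```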

Another way to check for proportionality of hazards is to use Schoenfeld residuals (and their scaled counterparts). You can obtain an overall test using the Schoenfeld residuals, or a variable-by-variable test based on the scaled variant. In this case, with just one predictor, there is only one test, but we’ll see later an example with several predictors.
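As a reminder, and in notation chosen here for illustration, the Schoenfeld residual for a subject $i$ who fails at time $t_j$ is the difference between the subject's covariate value and its risk-weighted average over the risk set $R_j$ (ignoring ties for simplicity):

$$
r_i = x_i - \bar{x}(\hat\beta, t_j), \qquad
\bar{x}(\beta, t_j) = \frac{\sum_{k \in R_j} x_k \exp(x_k'\beta)}{\sum_{k \in R_j} \exp(x_k'\beta)} .
$$

Under proportional hazards these residuals have expectation zero at every failure time, so a trend against time (or a function of time) points to a time-varying effect. A common scaling, due to Grambsch and Therneau, multiplies the residuals by $d \cdot \widehat{\mathrm{Var}}(\hat\beta)$, where $d$ is the total number of failures, so that adding $\hat\beta$ gives a rough estimate of the coefficient at each failure time.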

Stata and R offer several possible transformations of time for the test, including a user-specified function, but they chose different defaults. In Stata the default is `time`, but one of the options is `km` for the Kaplan-Meier estimate of overall survival. In R the default transform is “km” for the K-M estimate, but one of the options is “identity” for time.

    . quietly stcox treated, efron

    . estat phtest

          Test of proportional-hazards assumption

          Time:  Time
          ────────────┬───────────────────────────────────────────────────
                      │       chi2       df       Prob>chi2
          ────────────┼───────────────────────────────────────────────────
          global test │       0.00        1         0.9886
          ────────────┴───────────────────────────────────────────────────

    > zph <- cox.zph(cm, transform = "identity")
    > zph
                rho    chisq     p
    treated 0.00264 0.000205 0.989

The test yields no evidence against the proportional hazards assumption. If there had been, we could get a hint of the nature of the time dependence by plotting the (scaled) residuals against time and using a smoother to glean the trend, if any. In R the `cox.zph` class has a `plot()` method which uses a spline smoother. I specified `df=2` because of the small sample size. We use `ggfy()` as before.

    . estat phtest, plot(treated)

    . graph export phplot.png, width(500) replace
    (note: file phplot.png not found)
    (file phplot.png written in PNG format)

    > source("https://data.princeton.edu/pop509/ggfy.R.txt")
    > png("phplotr.png", width=500, height=400)
    > ggfy(zph, df=2)
    > dev.off()
    null device
              1

The residuals show no time trend at all, indicating that the treatment hazard ratio is fairly constant over time. (We will confirm this result below with a plot of cumulative hazards that provides more direct evidence.)

The emphasis in the Cox model is on hazard ratios, but one can calculate a Kaplan-Meier or a Nelson-Aalen estimate of the baseline survival, as shown in the notes. The baseline is defined as the case where all covariate values are zero, and this may not make sense in your data. A popular alternative is to estimate the “baseline” at average values of all covariates. In our case, a much better approach is to estimate and plot the estimated survival functions for the two groups. Stata makes this very easy via the `stcurve` command.

    . stcurve, surv at(treated=0) at(treated=1)

    . graph export coxsurv.png, width(500) replace
    (note: file coxsurv.png not found)
    (file coxsurv.png written in PNG format)

It is instructive to compute these “by hand” and compare them with separate Kaplan-Meier estimates for each group, which I will plot using different symbols for treated and controls. The plots connect the point estimates using a step function.
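The "by hand" calculation rests on a standard identity, sketched here in notation chosen for illustration. Under proportional hazards the cumulative hazard for a subject with covariate $x$ is $\Lambda(t \mid x) = \Lambda_0(t)\, e^{x\beta}$, so the survival function is

$$
S(t \mid x) = \exp\{-\Lambda_0(t)\, e^{x\beta}\} = S_0(t)^{\exp(x\beta)} .
$$

With $x = 0$ for controls and $x = 1$ for treated, the survival curve for the treated group is the baseline (control) curve raised to the estimated hazard ratio $e^{\hat\beta}$.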

    . predict S0, basesurv // control (not mean!)

    . gen S1 = S0^exp(_b[treated]) // treated

    . sts gen KM = s, by(treated) // two Kaplan-Meiers

    . twoway (scatter S0 _t, c(J) ms(none) sort) /// baseline
    >   (scatter S1 _t , c(J) ms(none) sort) /// treated
    >   (scatter KM _t if treated, msymbol(circle_hollow)) /// KM treated
    >   (scatter KM _t if !treated, msymbol(X)) /// KM base
    >   , legend(off) ///
    >   title(Kaplan-Meier and Proportional Hazards Estimates)

    . graph export coxkm.png, width(500) replace
    (note: file coxkm.png not found)
    (file coxkm.png written in PNG format)

    > sf <- survfit(cm, newdata=list(treated=c(1,0)))
    > km <- survfit(Surv(weeks, relapse) ~ treated, data=gehan)
    > dsf <- data.frame(time = rep(c(0,sf$time), 2),
    +     survival = c(1, sf$surv[,1], 1, sf$surv[,2]),
    +     group = factor(rep(c("treated","control"),
    +         rep(length(sf$time) + 1,2))))
    > # km strata are ordered treated=0, treated=1, so control comes first
    > dkm <- data.frame(time = km$time,
    +     survival = km$surv,
    +     group = factor(rep(c("control", "treated"), km$strata)))
    > library(ggplot2)
    > ggplot(dsf, aes(time, survival, color = group)) + geom_step() +
    +     geom_point(data = dkm, aes(time, survival, shape=group),color="black") +
    +     scale_shape_manual(values = c(1, 4))
    > ggsave("coxkmr.png", width=500/72, height=400/72, dpi=72)

The figure looks just like Figure 1 in Cox’s paper. If the purpose of the graph is to check the proportional hazards assumption, a much better alternative is to plot the log-log transformation of the survival function, namely -log(-log(S(t))), against log(t) for each group. Under the proportional hazards assumption, the resulting curves should be parallel. This plot is useful because the eye is much better at judging whether curves are parallel than whether they are proportional.
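To see why the curves should be parallel, write the survival function under proportional hazards as $S(t \mid x) = \exp\{-\Lambda_0(t)\, e^{x\beta}\}$, where $\Lambda_0(t)$ is the baseline cumulative hazard (notation chosen here for illustration). Taking logs twice gives

$$
-\log\{-\log S(t \mid x)\} = -\log \Lambda_0(t) - x\beta ,
$$

so the curves for $x = 0$ and $x = 1$ differ by the constant $\beta$ at every $t$, whatever the shape of the baseline. Plotting against $\log t$ rather than $t$ changes only the horizontal scale, not the vertical distance between the curves.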

    . stphplot, by(treated) legend(off) title(Plot of -log(-log(S(t))))

             failure _d:  relapse
       analysis time _t:  weeks
                     id:  id

    . graph export coxphplot.png, width(500) replace
    (note: file coxphplot.png not found)
    (file coxphplot.png written in PNG format)

    > dkm <- mutate(dkm, lls = -log(-log(survival)))
    > ggplot(dkm, aes(log(time), lls, color=group)) + geom_point() +
    +     geom_line() + ylab("-log(-log(S(t)))")
    > ggsave("coxphplotr.png", width=500/72, height=400/72, dpi=72)

The two lines look quite parallel indeed.