Germán Rodríguez
Demographic Methods Princeton University

Cox's Proportional Hazards Model

We continue our analysis of the leukemia remission times introduced in the context of the Kaplan-Meier estimator. This is the dataset used as an example in Cox's original paper: Cox, D.R. (1972) Regression Models and Life tables, (with discussion) Journal of the Royal Statistical Society, 34: 187--220.

We fit a model that assumes a proportional effect of treatment at all durations. There are different ways of handling ties and we choose the Efron method, which is an option in Stata and the default in R.

. // gen treated = group == "treated"
. stcox treated, efron

         failure _d:  relapse
   analysis time _t:  weeks

Iteration 0:   log likelihood =  -93.18427
Iteration 1:   log likelihood = -85.018612
Iteration 2:   log likelihood = -85.008426
Iteration 3:   log likelihood = -85.008425
Refining estimates:
Iteration 0:   log likelihood = -85.008425

Cox regression -- Efron method for ties

No. of subjects =           42                  Number of obs    =          42
No. of failures =           30
Time at risk    =          541
                                                LR chi2(1)       =       16.35
Log likelihood  =   -85.008425                  Prob > chi2      =      0.0001

─────────────┬────────────────────────────────────────────────────────────────
          _t │ Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
     treated │   .2076035    .085615    -3.81   0.000     .0925128    .4658729
─────────────┴────────────────────────────────────────────────────────────────
> gehan <- mutate(gehan, treated = as.numeric(group == "treated"))
> cm <- coxph(Surv(weeks, relapse) ~ treated, data = gehan)
> cm
Call:
coxph(formula = Surv(weeks, relapse) ~ treated, data = gehan)


          coef exp(coef) se(coef)     z       p
treated -1.572     0.208    0.412 -3.81 0.00014

Likelihood ratio test=16.4  on 1 df, p=5.26e-05
n= 42, number of events= 30 
> exp(coef(cm)) - 1
   treated 
-0.7923965 

The results show that at any given duration since remission, the risk of relapse is 7.6 lower in the treated group.

To check the assumption of proportionality of hazards one may introduce interactions with duration. In a two-group analysis like this it is also possible to plot the Kaplan-Meier estimates. Here's the plot in Cox's paper

. predict S0, basesurv            // control (not mean!)

. gen S1 = S0^exp(_b[treated])    // treated

. sts gen KM = s, by(treated)     // two Kaplan-Meiers

. twoway (scatter S0  _t, c(J) ms(none) sort)  /// baseline
>     (scatter S1 _t , c(J) ms(none) sort)     /// treated
>     (scatter KM _t if treated, msymbol(circle_hollow)) /// KM treated
>     (scatter KM _t if !treated, msymbol(X)) /// KM base
> , legend(off) title(Kaplan-Meier and Proportional Hazards Estimates)   

. graph export coxkm.png, width(500) replace          
(file coxkm.png written in PNG format)

Kaplan-Meier and Cox

> km <- survfit(Surv(weeks, relapse) ~ treated, data=gehan)
> sf <- survfit(cm, newdata=list(treated=c(0, 1)))   
> dsf <- data.frame(time = rep(c(0,sf$time), 2),
+   survival = c(1, sf$surv[,1], 1, sf$surv[,2]), 
+   group = factor(rep(c("control","treated"),
+   rep(length(sf$time) + 1, 2))))
> dkm <- data.frame(time = km$time, 
+   survival = km$surv,
+   group = factor(rep(c("control","treated"),
+   km$strata)))    
> ggplot(dsf, aes(time, survival, color = group)) + geom_step() +
+   geom_point(data = dkm, aes(time, survival, shape=group), color="black") +
+   scale_shape_manual(values = c(4, 1))  # x and o
> ggsave("coxkmr.png", width=500/72, height=400/72, dpi=72) 

Kaplan-Meier and Cox
Another way to check proportionality of hazards is to plot the log-log transformation of the survival functions versus log time, that is log(−log(S(t)) versus log(t), as the lines should then be parallel if the assumption holds.

. stphplot, by(treated) legend(off) title(Plot of log-log Survival)

         failure _d:  relapse
   analysis time _t:  weeks

. graph export coxphplot.png, width(500) replace
(file coxphplot.png written in PNG format)

Diagnostic Plot

> dkm <- mutate(dkm,  lls = -log(-log(survival)))
> ggplot(dkm, aes(log(time), lls, color=group)) + geom_point() +
+ geom_line() +   ylab("-log(-log(S(t)))")
> ggsave("coxphplotr.png", width=500/72, height=400/72, dpi=72)

Diagnostic Plot
The two lines look quite parallel indeed, showing a good fit of the proportional hazards assumption.

A more detailed treatment of these topics may be found in my survival analysis course. The discussion here is an excerpt from the pages dealing with Kaplan-Meier and Cox regression.