Germán Rodríguez
Survival Analysis Princeton University

The Gehan Survival Data

The WWS 509 datasets page has a copy of the Gehan survival data which were analyzed by Cox in his original proportional hazards paper. The data shows the length of remission in weeks for two groups of leukemia patients, treated and controls, click here for more information. We analyze the data using R and Stata.

As usual, the first task is to read the data and stset them. We also label the groups.

. infile group weeks relapse using ///
>    http://data.princeton.edu/wws509/datasets/gehan.raw
(42 observations read)

. label define group 1 "control" 2 "treated"

. label values group group

. stset weeks, failure(relapse)

     failure event:  relapse != 0 & relapse < .
obs. time interval:  (0, weeks]
 exit on or before:  failure

------------------------------------------------------------------------------
         42  total observations
          0  exclusions
------------------------------------------------------------------------------
         42  observations remaining, representing
         30  failures in single-record/single-failure data
        541  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        35
> gehan <- read.table("http://data.princeton.edu/wws509/datasets/gehan.dat")

> summarize(gehan, events = sum(relapse), exposure = sum(weeks))
  events exposure
1     30      541

As you can see, 30 of the 42 patients had a relapse, with a total observation time of 541 weeks. The weekly relapse rate is 7.8%.

Kaplan-Meier

The Kaplan-Meier estimators for the two groups are easily plotted using sts graph with the by(group) option computed using survfit in the survival package and plotted using plot (which I wrap in a function pngplot to style the plot and generate a png file)

. sts graph, by(group)

         failure _d:  relapse
   analysis time _t:  weeks

. graph export kmg.png, width(500) replace
(file kmg.png written in PNG format)

Kaplan-Meier

> library(survival)

> kmg <- survfit(Surv(weeks, relapse) ~ group, data=gehan)

> #pngplot(kmg, "kmgr.png")

Kaplan-Meier This shows that after 23 weeks all patients in the control group had relapsed, but about half those in the treated group remained in remission. This reproduces Figure 1 in the notes.

We can list the estimates using sts list using the summary() method

. sts list, by(group)

         failure _d:  relapse
   analysis time _t:  weeks

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
control 
     1       21      2      0             0.9048    0.0641     0.6700    0.9753
     2       19      2      0             0.8095    0.0857     0.5689    0.9239
     3       17      1      0             0.7619    0.0929     0.5194    0.8933
     4       16      2      0             0.6667    0.1029     0.4254    0.8250
     5       14      2      0             0.5714    0.1080     0.3380    0.7492
     8       12      4      0             0.3810    0.1060     0.1831    0.5778
    11        8      2      0             0.2857    0.0986     0.1166    0.4818
    12        6      2      0             0.1905    0.0857     0.0595    0.3774
    15        4      1      0             0.1429    0.0764     0.0357    0.3212
    17        3      1      0             0.0952    0.0641     0.0163    0.2612
    22        2      1      0             0.0476    0.0465     0.0033    0.1970
    23        1      1      0             0.0000         .          .         .
treated 
     6       21      3      1             0.8571    0.0764     0.6197    0.9516
     7       17      1      0             0.8067    0.0869     0.5631    0.9228
     9       16      0      1             0.8067    0.0869     0.5631    0.9228
    10       15      1      1             0.7529    0.0963     0.5032    0.8894
    11       13      0      1             0.7529    0.0963     0.5032    0.8894
    13       12      1      0             0.6902    0.1068     0.4316    0.8491
    16       11      1      0             0.6275    0.1141     0.3675    0.8049
    17       10      0      1             0.6275    0.1141     0.3675    0.8049
    19        9      0      1             0.6275    0.1141     0.3675    0.8049
    20        8      0      1             0.6275    0.1141     0.3675    0.8049
    22        7      1      0             0.5378    0.1282     0.2678    0.7468
    23        6      1      0             0.4482    0.1346     0.1881    0.6801
    25        5      0      1             0.4482    0.1346     0.1881    0.6801
    32        4      0      2             0.4482    0.1346     0.1881    0.6801
    34        2      0      1             0.4482    0.1346     0.1881    0.6801
    35        1      0      1             0.4482    0.1346     0.1881    0.6801
-------------------------------------------------------------------------------
> summary(kmg)
Call: survfit(formula = Surv(weeks, relapse) ~ group, data = gehan)

                group=control 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     21       2   0.9048  0.0641      0.78754        1.000
    2     19       2   0.8095  0.0857      0.65785        0.996
    3     17       1   0.7619  0.0929      0.59988        0.968
    4     16       2   0.6667  0.1029      0.49268        0.902
    5     14       2   0.5714  0.1080      0.39455        0.828
    8     12       4   0.3810  0.1060      0.22085        0.657
   11      8       2   0.2857  0.0986      0.14529        0.562
   12      6       2   0.1905  0.0857      0.07887        0.460
   15      4       1   0.1429  0.0764      0.05011        0.407
   17      3       1   0.0952  0.0641      0.02549        0.356
   22      2       1   0.0476  0.0465      0.00703        0.322
   23      1       1   0.0000     NaN           NA           NA

                group=treated 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    6     21       3    0.857  0.0764        0.720        1.000
    7     17       1    0.807  0.0869        0.653        0.996
   10     15       1    0.753  0.0963        0.586        0.968
   13     12       1    0.690  0.1068        0.510        0.935
   16     11       1    0.627  0.1141        0.439        0.896
   22      7       1    0.538  0.1282        0.337        0.858
   23      6       1    0.448  0.1346        0.249        0.807

The convention is to list the survival function immediately after each time. In the control group there are no censored observations, and the Kaplan-Meier estimate is simply the proportion alive after each distinct failure time. You can also check that the standard error is the usual binomial estimate. For example just after 8 weeks there are 8 out of 21 alive, the proportion is 8/21 = 0.381 and the standard error is √0.381(1 - 0.381)/21) = 0.106.

In the treated group 12 cases are censored and 9 relapse. Can you compute the estimate by hand? Only the distinct times of death --6, 7, 10, 13, 16, 22 and 23-- are relevant. The counts of relapses are 3, 1, 1, 1, 1, 1, 1. When there are ties between event and censoring times it is customary to assume that the event occurred first; that is, observations censored at t are assumed to be exposed at that time, effectively censored just after t. The counts of censored observations after each death time (but before the next) are 1, 1, 2, 0, 3, 0 and 5. When the last observation is censored the K-M estimate is greater than zero, and is usually considered undefined from that point on.

Nelson-Aalen

We can also compute the Nelson-Aalen estimate of the cumulative hazard, and then exponentiate minus the cumulative hazard to obtain an alternative estimate of the survival function. Stata's sts list can compute the Nelson-Aalen estimate using the na option, and sts generate can save either estimate in a new variable as shown below In R we can estimate the cumulative hazard "by hand" using counts of events and exposure

. sts generate kmS = s

. sts generate naH = na

. gen naS = exp(-naH)

. twoway (line kmS _t, c(J) sort) (line naS _t, c(J) sort)

. graph export kmna.png, width(500) replace
(file kmna.png written in PNG format)

Aalen and KM

> sf <- survfit(Surv(weeks, relapse) ~ 1, data = gehan)

> d <- data.frame(time = sf$ time, km = sf $surv, 
+   na = exp(-cumsum(sf$n.event / sf$n.risk)))    

> dl <- reshape(d, direction="long", idvar="time", timevar="method", 
+   v.names="survival", varying=c("km","na")) %>%
+   mutate(method = factor(method, labels=c("km","na")))

> ggplot(dl, aes(time, survival, color=method)) + geom_step()

> ggsave("kmnar.png", width=500/72, height=400/72, dpi=72)

Aalen and KM The two estimates are very similar.

Mantel-Haenszel

To test equality of survival curves in the two groups we can use the Mantel-Haenszel or log-rank test, available in the sts test command with the default logrank option available via the survdiff function

. sts test group

         failure _d:  relapse
   analysis time _t:  weeks


Log-rank test for equality of survivor functions

        |   Events         Events
group   |  observed       expected
--------+-------------------------
control |        21          10.75
treated |         9          19.25
--------+-------------------------
Total   |        30          30.00

              chi2(1) =      16.79
              Pr>chi2 =     0.0000
> survdiff(Surv(weeks, relapse) ~ group, data = gehan)
Call:
survdiff(formula = Surv(weeks, relapse) ~ group, data = gehan)

               N Observed Expected (O-E)^2/E (O-E)^2/V
group=control 21       21     10.7      9.77      16.8
group=treated 21        9     19.3      5.46      16.8

 Chisq= 16.8  on 1 degrees of freedom, p= 4.17e-05 

We see that the treated have fewer deaths than would be expected (and thus the controls have more) if the two groups had the same survival distribution: 9 instead of 19.25. The Mantel-Haenszel chi-squared satistics of 16.8 on one d.f. is highly significant.

It is instructive to compute this statistic "by hand".

The Mantel-Haenszel test is just one of several test statistics that we could use. There is a family of test based on a comparison of observed and expected counts at each distinct failure time, which differ in terms of the weights assigned to each time as shown in the table below

NameWeight
Wilcoxonni
Tarone-Ware√ ni
Peto-Peto-PrenticeS(ti) peto
Fleming-HarringtonS(ti)p (1-S(ti))q

The most popular of these is the Wilcoxon test, actually an extension of Wilcoxon's well-known non-parametric test proposed by Gehan and Breslow, which gives more weight to early failures. Stata can compute all four using the options wilcoxon, tware, peto and fh(p, q) R implements three using a parameter rho, p in our notation, with rho=0 for log-rank,
rho=1 for Peto-Peto, and rho=p for Fleming-Harrington with q=1. The tests using the survival as weight treat it as left-continuous.