Germán Rodríguez
Demographic Methods, Princeton University

Kaplan-Meier Survival

Stata has excellent facilities for survival analysis in continuous time, including the Kaplan-Meier estimator. I will illustrate the estimator using the Gehan data discussed in class. The data record the number of weeks until relapse or censoring for cancer patients in a control group and a treated group (coded 1 and 2, respectively).

. infile group weeks relapse using ///
>         http://data.princeton.edu/eco572/datasets/gehan.dat
(42 observations read)

. label define group 1 "Control" 2 "Treated"

. label values group group

The first thing you do in Stata is stset the data, specifying the variable that represents time and the variable that distinguishes failures from censored cases:

. stset weeks, fail(relapse)

     failure event:  relapse != 0 & relapse < .
obs. time interval:  (0, weeks]
 exit on or before:  failure

------------------------------------------------------------------------------
       42  total obs.
        0  exclusions
------------------------------------------------------------------------------
       42  obs. remaining, representing
       30  failures in single record/single failure data
      541  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        35

We get a useful statement about the number of failures and the total time at risk. To compute and plot the Kaplan-Meier estimate by group we use sts graph:

. sts graph, by(group) title("Gehan Data")

         failure _d:  relapse
   analysis time _t:  weeks

. graph export gehankm.png, replace
(file gehankm.png written in PNG format)

You can obtain pointwise confidence bands based on Greenwood standard errors using the gwood option. When combined with grouping, this results in side-by-side plots. To see the estimates in full glory, use sts list.
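As a brief aside before the listing, here is a sketch of the call with confidence bands. It simply adds the gwood option named above to the earlier sts graph command. The variant in the comment is an assumption about newer releases, which as far as I know document the same feature under the name ci.

```stata
* Kaplan-Meier curves by group with pointwise confidence bands
* (gwood is the option name used in this text)
sts graph, by(group) gwood title("Gehan Data")

* Assumed equivalent in more recent releases of Stata:
* sts graph, by(group) ci title("Gehan Data")
```

The listing produced by sts list follows.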

. sts list, by(group)

         failure _d:  relapse
   analysis time _t:  weeks

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
Control 
     1       21      2      0             0.9048    0.0641     0.6700    0.9753
     2       19      2      0             0.8095    0.0857     0.5689    0.9239
     3       17      1      0             0.7619    0.0929     0.5194    0.8933
     4       16      2      0             0.6667    0.1029     0.4254    0.8250
     5       14      2      0             0.5714    0.1080     0.3380    0.7492
     8       12      4      0             0.3810    0.1060     0.1831    0.5778
    11        8      2      0             0.2857    0.0986     0.1166    0.4818
    12        6      2      0             0.1905    0.0857     0.0595    0.3774
    15        4      1      0             0.1429    0.0764     0.0357    0.3212
    17        3      1      0             0.0952    0.0641     0.0163    0.2612
    22        2      1      0             0.0476    0.0465     0.0033    0.1970
    23        1      1      0             0.0000         .          .         .
Treated 
     6       21      3      1             0.8571    0.0764     0.6197    0.9516
     7       17      1      0             0.8067    0.0869     0.5631    0.9228
     9       16      0      1             0.8067    0.0869     0.5631    0.9228
    10       15      1      1             0.7529    0.0963     0.5032    0.8894
    11       13      0      1             0.7529    0.0963     0.5032    0.8894
    13       12      1      0             0.6902    0.1068     0.4316    0.8491
    16       11      1      0             0.6275    0.1141     0.3675    0.8049
    17       10      0      1             0.6275    0.1141     0.3675    0.8049
    19        9      0      1             0.6275    0.1141     0.3675    0.8049
    20        8      0      1             0.6275    0.1141     0.3675    0.8049
    22        7      1      0             0.5378    0.1282     0.2678    0.7468
    23        6      1      0             0.4482    0.1346     0.1881    0.6801
    25        5      0      1             0.4482    0.1346     0.1881    0.6801
    32        4      0      2             0.4482    0.1346     0.1881    0.6801
    34        2      0      1             0.4482    0.1346     0.1881    0.6801
    35        1      0      1             0.4482    0.1346     0.1881    0.6801
-------------------------------------------------------------------------------
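The columns of this listing can be tied to the usual formulas. The block below is a sketch in notation of my own: n_i is the number at risk at the start of time t_i (the "Beg. Total" column) and d_i the number of failures at t_i (the "Fail" column). The log-log form of the interval is what I understand sts list to use by default, and it reproduces the printed limits, but treat it as an assumption.

```latex
% Kaplan-Meier estimate and Greenwood standard error
\[
\hat S(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right),
\qquad
\widehat{\mathrm{se}}\{\hat S(t)\} = \hat S(t)
  \sqrt{ \sum_{i:\, t_i \le t} \frac{d_i}{n_i (n_i - d_i)} }
\]

% 95% interval on the log(-log) scale; because 0 < S < 1, the plus sign
% in the exponent gives the lower limit and the minus sign the upper limit
\[
\hat S(t)^{\exp\{\pm 1.96\, \hat\sigma(t)\}},
\qquad
\hat\sigma(t) = \frac{\widehat{\mathrm{se}}\{\hat S(t)\}}
                     {\hat S(t)\, \lvert \ln \hat S(t) \rvert}
\]

% Worked example: control group at week 2
\[
\hat S(2) = \frac{19}{21} \cdot \frac{17}{19} = \frac{17}{21} \approx 0.8095,
\qquad
\widehat{\mathrm{se}}\{\hat S(2)\} = 0.8095
  \sqrt{ \frac{2}{21 \cdot 19} + \frac{2}{19 \cdot 17} } \approx 0.0857
\]
```

At week 23 in the control group the last subject at risk fails, so n_i - d_i = 0 and the Greenwood term is undefined, which is why the standard error and interval are shown as missing in that row.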

You should be able to reproduce all these results 'by hand', as we did in class.
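A minimal sketch of one such hand check, using Stata as a calculator for the week-2 row of the control group (21 at risk with 2 relapses at week 1, then 19 at risk with 2 relapses at week 2). The scalar names S, se and sigma are my own, and the log-log construction of the interval is an assumption about how sts list builds its limits.

```stata
* Kaplan-Meier estimate: product of conditional survival probabilities
scalar S = (1 - 2/21) * (1 - 2/19)

* Greenwood standard error: S times the root of the sum of d/(n*(n-d))
scalar se = S * sqrt(2/(21*19) + 2/(19*17))

* Standard error on the log(-log) scale, used for the interval
scalar sigma = se / (S * abs(ln(S)))
scalar z = invnormal(0.975)

display "S = " %6.4f S "   se = " %6.4f se
display "95% CI: " %6.4f S^exp(z*sigma) " to " %6.4f S^exp(-z*sigma)
```

If the arithmetic is right, the two lines should agree with the week-2 row for the control group in the listing above (0.8095, 0.0857, 0.5689 and 0.9239), up to rounding.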