Kaplan-Meier Survival
Stata has excellent facilities for survival analysis in continuous time, including the Kaplan-Meier estimator. I will illustrate the estimator using the Gehan data discussed in class. The data are weeks to relapse for cancer patients in a control group and a treated group (coded 1 and 2, respectively).
. infile group weeks relapse using ///
> http://data.princeton.edu/eco572/datasets/gehan.dat
(42 observations read)
. label define group 1 "Control" 2 "Treated"
. label values group group
The first thing you do in Stata is stset the data, specifying the variable that represents time and the variable that distinguishes failures from censored cases:
. stset weeks, fail(relapse)
failure event: relapse != 0 & relapse < .
obs. time interval: (0, weeks]
exit on or before: failure
------------------------------------------------------------------------------
42 total obs.
0 exclusions
------------------------------------------------------------------------------
42 obs. remaining, representing
30 failures in single record/single failure data
541 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 35
The output is a useful summary: 30 failures among the 42 observations, and a total of 541 weeks of time at risk.
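To see what stset actually did, it can help to look at the variables it adds to the dataset. The lines below are a minimal sketch, assuming the stset command above has already been run; the choice of the first five observations is arbitrary.

```stata
* Assumes the data have been stset as shown above.
* stset stores its bookkeeping in four variables:
*   _t0  analysis time at entry
*   _t   analysis time at exit
*   _d   failure indicator (1 = failure, 0 = censored)
*   _st  1 if the record is used in the analysis, 0 otherwise
describe _st _d _t _t0

* Compare the original variables with the stset versions
list group weeks relapse _t0 _t _d in 1/5
```

With single-record data like these, _t should simply equal weeks and _d should equal relapse.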
To compute and plot the Kaplan-Meier estimate by group we use sts graph:
. sts graph, by(group) title("Gehan Data")
failure _d: relapse
analysis time _t: weeks
. graph export gehankm.png, replace
(file gehankm.png written in PNG format)

You can obtain pointwise confidence bands based on
Greenwood standard errors using the gwood option.
When combined with grouping this results in side-by-side plots.
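As a sketch, the call with confidence bands might look like the lines below. This assumes the data are still stset. To the best of my knowledge, newer Stata releases document this option as ci rather than gwood, so the second form is given as an alternative. The layout of the resulting plot may differ between releases.

```stata
* Pointwise Greenwood confidence bands, using the option named in the text
sts graph, by(group) gwood title("Gehan Data")

* Alternative spelling in newer releases (an assumption; check help sts graph)
* sts graph, by(group) ci title("Gehan Data")
```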
To see the estimate in full glory use sts list:
. sts list, by(group)
failure _d: relapse
analysis time _t: weeks
           Beg.           Net   Survivor      Std.
  Time    Total   Fail   Lost   Function     Error     [95% Conf. Int.]
-----------------------------------------------------------------------
Control
     1       21      2      0     0.9048    0.0641     0.6700    0.9753
     2       19      2      0     0.8095    0.0857     0.5689    0.9239
     3       17      1      0     0.7619    0.0929     0.5194    0.8933
     4       16      2      0     0.6667    0.1029     0.4254    0.8250
     5       14      2      0     0.5714    0.1080     0.3380    0.7492
     8       12      4      0     0.3810    0.1060     0.1831    0.5778
    11        8      2      0     0.2857    0.0986     0.1166    0.4818
    12        6      2      0     0.1905    0.0857     0.0595    0.3774
    15        4      1      0     0.1429    0.0764     0.0357    0.3212
    17        3      1      0     0.0952    0.0641     0.0163    0.2612
    22        2      1      0     0.0476    0.0465     0.0033    0.1970
    23        1      1      0     0.0000         .          .         .
Treated
     6       21      3      1     0.8571    0.0764     0.6197    0.9516
     7       17      1      0     0.8067    0.0869     0.5631    0.9228
     9       16      0      1     0.8067    0.0869     0.5631    0.9228
    10       15      1      1     0.7529    0.0963     0.5032    0.8894
    11       13      0      1     0.7529    0.0963     0.5032    0.8894
    13       12      1      0     0.6902    0.1068     0.4316    0.8491
    16       11      1      0     0.6275    0.1141     0.3675    0.8049
    17       10      0      1     0.6275    0.1141     0.3675    0.8049
    19        9      0      1     0.6275    0.1141     0.3675    0.8049
    20        8      0      1     0.6275    0.1141     0.3675    0.8049
    22        7      1      0     0.5378    0.1282     0.2678    0.7468
    23        6      1      0     0.4482    0.1346     0.1881    0.6801
    25        5      0      1     0.4482    0.1346     0.1881    0.6801
    32        4      0      2     0.4482    0.1346     0.1881    0.6801
    34        2      0      1     0.4482    0.1346     0.1881    0.6801
    35        1      0      1     0.4482    0.1346     0.1881    0.6801
-----------------------------------------------------------------------
You should be able to reproduce all these results 'by hand', as we did in class.
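As a reminder of what the hand calculation involves, here is a sketch using the second row of the treated group. The notation is mine, not Stata's: $t_i$ are the distinct times, $n_i$ is the number at risk just before $t_i$ (the "Beg. Total" column) and $d_i$ is the number of failures at $t_i$ (the "Fail" column). The confidence limits in the listing are consistent with an interval computed on the $\ln\{-\ln \hat S(t)\}$ scale, which I check for the first control row.

```latex
% Kaplan-Meier (product-limit) estimate and Greenwood variance
\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right),
\qquad
\widehat{\mathrm{var}}\{\hat S(t)\} = \hat S(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}

% Treated group, week 7
\hat S(7) = \left(1 - \tfrac{3}{21}\right)\left(1 - \tfrac{1}{17}\right)
          = 0.8571 \times 0.9412 = 0.8067

\widehat{\mathrm{se}}\{\hat S(7)\}
  = 0.8067 \sqrt{\frac{3}{21 \cdot 18} + \frac{1}{17 \cdot 16}}
  = 0.8067 \sqrt{0.007937 + 0.003676} = 0.0869

% Control group, week 1: interval on the log-log scale
\hat\sigma = \frac{\widehat{\mathrm{se}}\{\hat S(1)\}}{\hat S(1)\,\lvert \ln \hat S(1) \rvert}
           = \frac{0.0641}{0.9048 \times 0.1001} \approx 0.708,
\qquad
\hat S(1)^{\exp(\pm 1.96\,\hat\sigma)} \approx (0.670,\ 0.975)
```

Rows with no failures, such as week 9 in the treated group, contribute a factor of one to the product and nothing to the Greenwood sum, which is why the estimate and its standard error repeat on those lines.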
